The Wayback Machine - http://web.archive.org/web/20211120200724/https://github.com/ContinualAI/avalanche/issues/743

Add the possibility to pretrain on multiple tasks #743

Open

AlbinSou opened this issue Sep 15, 2021 · 3 comments

Comments

@AlbinSou (Contributor) commented Sep 15, 2021

At the moment, the nc_benchmark generator function provides an nc_first_task option, which is good for pre-training in the class-incremental learning scenario. However, no equivalent option exists for pre-training in the task-incremental scenario. It would be nice to have an option that can be used together with task_labels=True and allows pre-training on multiple tasks at the same time, in a multi-task training manner.

This kind of pre-training is used for instance in Lifelong Learning of Compositional Structures

A quick fix that I'm using for now, though it breaks some things (maybe this belongs under bugs?), is the following:

from avalanche.benchmarks.utils import AvalancheConcatDataset

# Number of tasks to pretrain on
pretrain = 4
pretrain_datasets = [exp.dataset for exp in scenario.train_stream[:pretrain]]

# Modify the first experience so that it contains the data of the first 4
first_experience = scenario.train_stream[0]
first_experience.dataset = AvalancheConcatDataset(pretrain_datasets)

# Train on the modified first experience
cl_strategy.train(first_experience)

# Train on the remaining experiences
for experience in scenario.train_stream[pretrain:]:
    cl_strategy.train(experience)

Doing this works as intended, except that it multiplies the batch size by the number of pre-training tasks for some reason:

  • size of strategy.mb_x when pretrain=4: (256, 3, 32, 32)
  • size of strategy.mb_x when pretrain=1: (64, 3, 32, 32)
@AntonioCarta (Collaborator) commented Sep 16, 2021

I agree about the nc_first_task option, we should also have it for multi-task scenarios.

Your snippet seems wrong. Instead of modifying the experiences in place, it's easier to create a new benchmark by first concatenating/splitting the datasets however you like, and then using one of the generic builders, like dataset_benchmark.

If you still get an error using dataset_benchmark, feel free to open a question on the Discussions.
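The suggested approach can be sketched without any Avalanche-specific types: re-partition a stream of per-task datasets so that the first k are merged into a single pre-training set, then hand the resulting list to a generic builder such as dataset_benchmark. The repartition helper and the toy sample lists below are illustrative assumptions, not Avalanche API; with Avalanche one would concatenate AvalancheDatasets instead.

```python
from itertools import chain

def repartition(task_datasets, pretrain):
    """Merge the first `pretrain` datasets into one pre-training set
    and keep the remaining ones as individual experiences.

    Hypothetical helper: `task_datasets` is a list of lists of samples
    here; with Avalanche the merged result would be passed on to a
    generic builder such as dataset_benchmark.
    """
    merged = list(chain.from_iterable(task_datasets[:pretrain]))
    return [merged] + [list(d) for d in task_datasets[pretrain:]]

# Six tasks of 2 samples each; pre-train on the first 4.
tasks = [[(i, t) for i in range(2)] for t in range(6)]
streams = repartition(tasks, pretrain=4)
print(len(streams))      # 3 experiences: 1 pre-training + 2 incremental
print(len(streams[0]))   # 8 samples in the merged pre-training experience
```

Because the benchmark is rebuilt from the re-partitioned datasets rather than patched in place, the stream stays consistent with its experiences.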


@AlbinSou (Contributor, Author) commented Sep 16, 2021

> I agree about the nc_first_task option, we should also have it for multi-task scenarios.
>
> Your snippet seems wrong. Instead of modifying the experiences in place, it's easier to create a new benchmark by first concatenating/splitting the datasets however you like, and then using one of the generic builders, like dataset_benchmark.
>
> If you still get an error using dataset_benchmark, feel free to open a question on the Discussions.

Yes, I agree that this is an ugly fix. I also tried it the way you suggested, but the batch size is still multiplied by the number of tasks in the first experience. I think this comes from TaskBalancedDataLoader, but I don't know whether it's intended that the batch size is increased that way.


@AntonioCarta (Collaborator) commented Sep 17, 2021

Ok, now I get it. Yes, it's normal: some of the dataloaders, like TaskBalancedDataLoader, add batch_size samples for each group (task/experience, ...). Maybe we should rename the parameter to avoid confusion.
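This behaviour can be reproduced with a minimal stdlib-only sketch of task-balanced loading (no Avalanche required, and not Avalanche's actual implementation): the loader below draws batch_size samples from each task group per step, so the effective minibatch grows linearly with the number of groups, which is why pre-training on 4 tasks yielded minibatches of 256 with batch_size=64.

```python
import random

def task_balanced_batches(groups, batch_size, steps):
    """Yield minibatches containing `batch_size` samples drawn from EACH
    group, mimicking how a task-balanced loader assembles its batches.
    The effective minibatch size is therefore batch_size * len(groups)."""
    for _ in range(steps):
        batch = []
        for g in groups:
            # Sample with replacement from this task's data (sketch only).
            batch.extend(random.choices(g, k=batch_size))
        yield batch

# Four task groups with batch_size=64 -> minibatches of 256, as observed
# in the issue; a single group yields the plain batch_size of 64.
groups = [list(range(t * 100, t * 100 + 100)) for t in range(4)]
first = next(task_balanced_batches(groups, batch_size=64, steps=1))
print(len(first))  # 256
```

Under this reading, batch_size is really a per-group quota rather than a total, which is the naming confusion the comment above points at.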

