Question on Convergence and Grad Norm Behavior During Training with GaLore #66

Open
chelouche9 opened this issue Nov 9, 2024 · 0 comments


Hi,

I have a question regarding model convergence behavior when using the GaLore optimizer. My use case involves training Llama-3.2-1B-Instruct on a new language.

During training, I observed that the gradient norm is consistently reported as 0. From my understanding, a gradient norm of 0 typically means the parameters are not being updated, so the model should not be learning. However, the training loss decreases steadily over time and the model’s performance improves when evaluated, which suggests learning is occurring.

Could the zero gradient norm be expected behavior due to how GaLore functions, or might there be something I am missing? I am particularly interested in understanding if this behavior might be influenced by the GaLore optimizer’s unique characteristics.

Additionally, I noticed log messages at the start of training indicating that GaLore only supports linear layers and that some of my model’s matched modules are being ignored. Could this be related?

[Image: Screenshot 2024-11-09 at 12 58 28]

I am using the following configuration:

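import torch
import trl
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer, TrainingArguments

args = TrainingArguments(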
    output_dir="./why-1B-v0.1",
    num_train_epochs=1,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_checkpointing=True,
    optim="galore_adamw_8bit_layerwise",
    optim_target_modules=["attn", "mlp"],
    logging_steps=5,
    save_steps=50,
    save_total_limit=5,
    report_to="wandb",
    lr_scheduler_type="cosine",
    learning_rate=5e-5,
)

model_id = "meta-llama/Llama-3.2-1B-Instruct"

config = AutoConfig.from_pretrained(model_id)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",
    torch_dtype=torch.bfloat16,
)

trainer = trl.SFTTrainer(
    model=model,
    args=args,
    train_dataset=chunked_ds,
    dataset_text_field='text',
    max_seq_length=5000,
)
trainer.train(
    # resume_from_checkpoint=True
)

These are the logs I get when I run the training:

model.layers.0.self_attn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.0.self_attn.rotary_emb has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.0.mlp has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.0.mlp.act_fn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.1.self_attn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.1.self_attn.rotary_emb has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.1.mlp has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.1.mlp.act_fn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.2.self_attn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.2.self_attn.rotary_emb has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.2.mlp has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.2.mlp.act_fn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.3.self_attn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.3.self_attn.rotary_emb has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.3.mlp has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.3.mlp.act_fn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.4.self_attn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.4.self_attn.rotary_emb has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.4.mlp has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.4.mlp.act_fn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.5.self_attn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.5.self_attn.rotary_emb has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.5.mlp has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.5.mlp.act_fn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.6.self_attn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.6.self_attn.rotary_emb has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.6.mlp has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.6.mlp.act_fn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.7.self_attn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.7.self_attn.rotary_emb has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.7.mlp has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.7.mlp.act_fn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.8.self_attn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.8.self_attn.rotary_emb has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.8.mlp has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.8.mlp.act_fn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.9.self_attn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.9.self_attn.rotary_emb has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.9.mlp has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.9.mlp.act_fn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.10.self_attn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.10.self_attn.rotary_emb has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.10.mlp has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.10.mlp.act_fn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.11.self_attn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.11.self_attn.rotary_emb has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.11.mlp has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.11.mlp.act_fn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.12.self_attn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.12.self_attn.rotary_emb has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.12.mlp has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.12.mlp.act_fn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.13.self_attn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.13.self_attn.rotary_emb has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.13.mlp has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.13.mlp.act_fn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.14.self_attn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.14.self_attn.rotary_emb has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.14.mlp has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.14.mlp.act_fn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.15.self_attn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.15.self_attn.rotary_emb has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.15.mlp has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
model.layers.15.mlp.act_fn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
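
The warnings above look consistent with plain substring matching of `optim_target_modules` against module names: `"attn"` and `"mlp"` match the container modules (`self_attn`, `mlp`) and their non-linear children (`rotary_emb`, `act_fn`) as well as the linear projections inside them. The sketch below is a simplified model of that behavior, not the library's implementation. The `classify` helper and the `MODULES` list are hypothetical, with names modelled on one Llama decoder layer.

```python
# Hypothetical (module name, is_linear) pairs for a single decoder layer.
MODULES = [
    ("model.layers.0.self_attn", False),
    ("model.layers.0.self_attn.q_proj", True),
    ("model.layers.0.self_attn.k_proj", True),
    ("model.layers.0.self_attn.v_proj", True),
    ("model.layers.0.self_attn.o_proj", True),
    ("model.layers.0.self_attn.rotary_emb", False),
    ("model.layers.0.mlp", False),
    ("model.layers.0.mlp.gate_proj", True),
    ("model.layers.0.mlp.up_proj", True),
    ("model.layers.0.mlp.down_proj", True),
    ("model.layers.0.mlp.act_fn", False),
    ("model.layers.0.input_layernorm", False),
]


def classify(modules, targets):
    """Split modules whose name contains a target into linear (used) and non-linear (ignored)."""
    selected, ignored = [], []
    for name, is_linear in modules:
        if not any(target in name for target in targets):
            continue  # not matched at all, so no warning would be expected
        (selected if is_linear else ignored).append(name)
    return selected, ignored


selected, ignored = classify(MODULES, ["attn", "mlp"])
print("would be optimized with GaLore:", selected)
print("matched but ignored:", ignored)
```

Under this assumption the four ignored names per layer line up with the four warnings per layer in the log, while the seven linear projections are still selected, so the warnings on their own would not explain the zero gradient norm.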