
I'm trying to apply DPO (Direct Preference Optimization) to a pre-trained model. However, during training, the scores given by the pre-trained (reference) model and the fine-tuned model are identical, and the loss stays the same across all batches, which leads me to believe the weights are not being updated. My training method is given below.

import torch

def train(model, optimizer, pref_set, dispref_set, epochs, beta, bs):
    model.train()
    for epoch in range(epochs):
        cur_pref = []
        cur_dispref = []
        for i in range(len(pref_set)):
            # collect preferred and dispreferred responses until a full batch is ready
            cur_pref.append(pref_set[i])
            cur_dispref.append(dispref_set[i])
            if (i + 1) % bs == 0:  # note: a trailing partial batch is skipped
                make_fastas(cur_pref, cur_dispref)  # sets up the files needed for scoring
                run_mpnn('model-DPO')  # scores responses in a separate script
                optimizer.zero_grad()
                b_ref, nb_ref, b_dpo, nb_dpo = collect_logps(cur_pref)  # collects scores
                loss = calc_loss(b_dpo, nb_dpo, b_ref, nb_ref, beta)  # computes DPO loss
                print(loss)
                loss.backward()
                optimizer.step()
                # save the updated model so the next round of scoring loads the new weights
                torch.save({
                        'epoch': epoch + 1,
                        'step': i,
                        'num_edges': 48,
                        'noise_level': 0.2,
                        'model_state_dict': model.state_dict(),
                        'optimizer_state_dict': optimizer.state_dict(),
                        }, "../ProteinMPNN/vanilla_model_weights/model-DPO.pt")
                cur_pref = []
                cur_dispref = []
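
For reference, calc_loss implements the standard DPO objective. A minimal sketch of it, assuming b_* holds per-sequence log-probabilities of the preferred responses and nb_* those of the dispreferred responses:

import torch.nn.functional as F

def calc_loss(b_dpo, nb_dpo, b_ref, nb_ref, beta):
    # DPO loss: -E[ log sigmoid( beta * ((logp_w - logp_w_ref) - (logp_l - logp_l_ref)) ) ]
    pref_logratios = b_dpo - b_ref        # preferred: target vs. reference
    dispref_logratios = nb_dpo - nb_ref   # dispreferred: target vs. reference
    return -F.logsigmoid(beta * (pref_logratios - dispref_logratios)).mean()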

In short, the scoring of the preferred and dispreferred responses has to be done in a separate script, so I must save the updated model after each batch so that the following round of scoring loads the new weights. But, as mentioned, the model weights never change, and the scores returned by the reference and target models are always the same. Any help in resolving this issue would be greatly appreciated.
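
collect_logps then reads those scores back from disk. The file names and loading code below are placeholders for my actual parsing logic, but the structure is the same: one log-probability per sequence comes back from the scoring script for each model/response-set pair.

import numpy as np
import torch

def collect_logps(cur_pref):
    # placeholder paths: the real scoring script writes one score file per model/set
    b_ref = torch.tensor(np.loadtxt('scores/ref_pref.txt'))       # reference model, preferred
    nb_ref = torch.tensor(np.loadtxt('scores/ref_dispref.txt'))   # reference model, dispreferred
    b_dpo = torch.tensor(np.loadtxt('scores/dpo_pref.txt'))       # target model, preferred
    nb_dpo = torch.tensor(np.loadtxt('scores/dpo_dispref.txt'))   # target model, dispreferred
    return b_ref, nb_ref, b_dpo, nb_dpo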

I've checked that the model parameters are initialized correctly, with requires_grad=True, and that they have no gradient before training (list(model.parameters())[0].grad is None). I've also verified that I'm not overwriting the updated model weights, or accidentally loading the vanilla weights during scoring. I double-checked my loss function and tried setting the loss and learning rate to arbitrarily large values to force the weights to update, but the scores never changed. The parameter gradient after the backward() call is still None, and I'm not sure why, since all model parameters are initialized with requires_grad=True.
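
Concretely, the checks described above look like this (simplified):

# every parameter is trainable
assert all(p.requires_grad for p in model.parameters())

param = list(model.parameters())[0]
print(param.grad)   # None before training, as expected

loss = calc_loss(b_dpo, nb_dpo, b_ref, nb_ref, beta)
loss.backward()
print(param.grad)   # still None after backward(), which should not happen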
