
I am working on building a chatbot for substance abuse support. My approach involves two main steps:

  • Fine-tuning LLaMA-2-Chat-HF: I fine-tuned the LLaMA-2-Chat-HF model on a dataset of mental health conversations, which I first converted into an instruction-template format (a sketch of that formatting follows this list).
  • Retrieval-based system: I retrieve relevant passages from a vector database built from a textbook on substance abuse support (theory and practice).

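For context, the instruction-template conversion was roughly along the lines of the sketch below; the helper name, system prompt, and field values are illustrative rather than my exact preprocessing code:

def to_llama2_instruction(user_msg, assistant_msg,
                          system_msg='You are a supportive mental health assistant.'):
    # LLaMA-2-Chat expects the [INST] ... [/INST] instruction format;
    # each training example wraps one user turn and its target response
    return (
        f'<s>[INST] <<SYS>>\n{system_msg}\n<</SYS>>\n\n'
        f'{user_msg} [/INST] {assistant_msg} </s>'
    )
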
After fine-tuning, I use the fine-tuned model to generate responses grounded in the context retrieved from the vector database: both the retrieved context and the user query are passed into a prompt template, and answers are generated through an LLM chain. However, I am running into the following issue:

Both the fine-tuned model and the pre-trained (base) model generate exactly the same responses to a given query, even though the context retrieved from the vector database is included in the prompt.

My questions:

  • Why might the fine-tuned LLaMA-2-Chat-HF model be generating the same responses as the pre-trained model? What could be causing this?
  • Is the fine-tuned LLaMA-2-Chat-HF model suitable for a retrieval-based task? If not, what adjustments or different approaches should I consider?

Any insights, suggestions, or alternative approaches would be greatly appreciated.

Code for retrieval using the fine-tuned model:

# Imports (module paths may differ slightly across LangChain versions)
from transformers import pipeline
from langchain.prompts import PromptTemplate
from langchain.chains import RetrievalQA
from langchain_community.llms import HuggingFacePipeline
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

DB_FAISS_PATH = 'vectorstores/db_faiss'

custom_prompt_template = """Use the following pieces of information to answer the user's question.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context:{context}
Question:{question}

Only return the helpful answer below and nothing else.
Helpful answer:
"""

def set_custom_prompt():
     """
     Prompt template for QA retrieval for each vector store
     """
     prompt = PromptTemplate(template=custom_prompt_template, input_variables=['context', 'question'])
     return prompt

def create_llm(model, tokenizer):
     # Wrap the model and tokenizer in a transformers text-generation pipeline
     # so the LangChain chain can call it like any other LLM
     text_generation_pipeline = pipeline("text-generation", model=model, tokenizer=tokenizer)
     llm = HuggingFacePipeline(pipeline=text_generation_pipeline)
     return llm

def retrieval_qa_chain(llm, prompt, db):
     # 'stuff' chain: the top-k retrieved chunks are inserted into the {context} slot of the prompt
     qa_chain = RetrievalQA.from_chain_type(
         llm=llm,
         chain_type='stuff',
         retriever=db.as_retriever(search_kwargs={'k': 2}),
         return_source_documents=True,
         chain_type_kwargs={'prompt': prompt}
     )
     return qa_chain

def qa_bot(model, tokenizer):
     # The embedding model here must match the one used to build the FAISS index
     embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')
     db = FAISS.load_local(DB_FAISS_PATH, embeddings, allow_dangerous_deserialization=True)
     llm = create_llm(model, tokenizer)
     qa_prompt = set_custom_prompt()
     qa = retrieval_qa_chain(llm, qa_prompt, db)
     return qa

def final_result(query, model, tokenizer):
     qa = qa_bot(model, tokenizer)
     qa_result = qa({'query': query})
     response = qa_result['result']
     # The text-generation pipeline returns the prompt followed by the completion,
     # so keep only the text after the 'Helpful answer:' marker
     helpful_answer = response.split('Helpful answer:')[-1].strip()
     return helpful_answer
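
For completeness, this is roughly how I invoke the chain; the checkpoint path and the example query below are placeholders, not my actual values:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned checkpoint (placeholder path)
model = AutoModelForCausalLM.from_pretrained('path/to/finetuned-llama-2-chat', device_map='auto')
tokenizer = AutoTokenizer.from_pretrained('path/to/finetuned-llama-2-chat')

print(final_result('How can I support someone going through withdrawal?', model, tokenizer))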
