
I am working on building a chatbot for substance abuse support. My approach involves two main steps:

  • Fine-tuning LLaMA-2-Chat-HF: I fine-tuned the LLaMA-2-Chat-HF model on a dataset of mental health conversations, which I first converted into an instruction-template format (a sketch of that formatting follows this list).
  • Retrieval-based system: I retrieve relevant passages from a vector database built from a textbook on substance abuse support (theory and practice).

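For context, the instruction-template conversion was roughly along the lines of the sketch below; the helper name, system prompt, and field values are illustrative rather than my exact preprocessing code:

def to_llama2_instruction(user_msg, assistant_msg,
                          system_msg='You are a supportive mental health assistant.'):
    # LLaMA-2-Chat expects the [INST] ... [/INST] instruction format;
    # each training example wraps one user turn and its target response
    return (
        f'<s>[INST] <<SYS>>\n{system_msg}\n<</SYS>>\n\n'
        f'{user_msg} [/INST] {assistant_msg} </s>'
    )
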
After fine-tuning, I use the fine-tuned model to generate responses grounded in the context retrieved from the vector database: both the retrieved context and the user query are passed into a prompt template, and answers are generated through an LLM chain. However, I am running into the following issue:

Both the fine-tuned model and the pre-trained (base) model generate exactly the same responses to a given query, even though the context retrieved from the vector database is included in the prompt.

My questions:

  • Why might the fine-tuned LLaMA-2-Chat-HF model be generating the same responses as the pre-trained model? What could be causing this?
  • Is the fine-tuned LLaMA-2-Chat-HF model suitable for a retrieval-based task? If not, what adjustments or different approaches should I consider?

Any insights, suggestions, or alternative approaches would be greatly appreciated.

Code for retrieval using the fine-tuned model:

# Imports (module paths may differ slightly across LangChain versions)
from transformers import pipeline
from langchain.prompts import PromptTemplate
from langchain.chains import RetrievalQA
from langchain_community.llms import HuggingFacePipeline
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

DB_FAISS_PATH = 'vectorstores/db_faiss'

custom_prompt_template = """Use the following pieces of information to answer the user's question.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context:{context}
Question:{question}

Only return the helpful answer below and nothing else.
Helpful answer:
"""

def set_custom_prompt():
     """
     Prompt template for QA retrieval for each vector store
     """
     prompt = PromptTemplate(template=custom_prompt_template, input_variables=['context', 'question'])
     return prompt

def create_llm(model, tokenizer):
     # Wrap the model and tokenizer in a transformers text-generation pipeline
     # so the LangChain chain can call it like any other LLM
     text_generation_pipeline = pipeline("text-generation", model=model, tokenizer=tokenizer)
     llm = HuggingFacePipeline(pipeline=text_generation_pipeline)
     return llm

def retrieval_qa_chain(llm, prompt, db):
     # 'stuff' chain: the top-k retrieved chunks are inserted into the {context} slot of the prompt
     qa_chain = RetrievalQA.from_chain_type(
         llm=llm,
         chain_type='stuff',
         retriever=db.as_retriever(search_kwargs={'k': 2}),
         return_source_documents=True,
         chain_type_kwargs={'prompt': prompt}
     )
     return qa_chain

def qa_bot(model, tokenizer):
     # The embedding model here must match the one used to build the FAISS index
     embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')
     db = FAISS.load_local(DB_FAISS_PATH, embeddings, allow_dangerous_deserialization=True)
     llm = create_llm(model, tokenizer)
     qa_prompt = set_custom_prompt()
     qa = retrieval_qa_chain(llm, qa_prompt, db)
     return qa

def final_result(query, model, tokenizer):
     qa = qa_bot(model, tokenizer)
     qa_result = qa({'query': query})
     response = qa_result['result']
     # The text-generation pipeline returns the prompt followed by the completion,
     # so keep only the text after the 'Helpful answer:' marker
     helpful_answer = response.split('Helpful answer:')[-1].strip()
     return helpful_answer
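
For completeness, this is roughly how I invoke the chain; the checkpoint path and the example query below are placeholders, not my actual values:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned checkpoint (placeholder path)
model = AutoModelForCausalLM.from_pretrained('path/to/finetuned-llama-2-chat', device_map='auto')
tokenizer = AutoTokenizer.from_pretrained('path/to/finetuned-llama-2-chat')

print(final_result('How can I support someone going through withdrawal?', model, tokenizer))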
