Hamid Shojanazeri 1 year ago
parent
commit
8d0714714f
2 changed files with 2 additions and 2 deletions
  1. +1 -1 tutorials/chatbot/README.md
  2. +1 -1 tutorials/chatbot/data_pipelines/REAME.md

+1 -1
tutorials/chatbot/README.md

@@ -17,7 +17,7 @@ Fine-tuning LLMs here LLama 2 involves several key steps:
 
 - Volume: The amount of data needed depends on the complexity of the task and the variability of the language in your domain. Generally, more data leads to a better-performing model, but aim for high-quality, relevant data.
 
-Here, we are going to use [self-instruct](https://arxiv.org/abs/2212.10560) idea and use OpenAI GPT3 model to build our dataset, for details please check this [doc](./data_pipelines/REAME.md).
+Here, we are going to use [self-instruct](https://arxiv.org/abs/2212.10560) idea and use Llama model to build our dataset, for details please check this [doc](./data_pipelines/REAME.md).
 
 2. **Data Formatting** 
 

+1 -1
tutorials/chatbot/data_pipelines/REAME.md

@@ -15,7 +15,7 @@ The idea here is to use Llama 70B using OctoAI APIs, to create question and answ
 
 ```bash
 export OCTOAI_API_TOKEN="OCTOAI_API_TOKEN"
-python generate_question_answers.py --url=https://llama.meta.com/get-started/
+python generate_question_answers.py 
 ```
 
 **NOTE** You need to be aware of your RPM (requests per minute), TPM (tokens per minute) and TPD (tokens per day) limits on your OpenAI account. In our case we had to process one document at a time, then merge all the Q&A `json` files to build our dataset. We aimed for a specific number of Q&A pairs per document, anywhere between 50-100. This is experimental and depends entirely on your documents, the wealth of information in them, and how you prefer to handle questions (short or longer answers, etc.).