# Ensure the required libraries are installed:
!pip install sentence-transformers qdrant-client requests IPython

## Step 1: Import necessary modules

In [60]:
import os
import uuid
import re
from pathlib import Path
from sentence_transformers import SentenceTransformer, CrossEncoder
from qdrant_client import QdrantClient, models
from qdrant_client.models import SearchRequest
import requests
from IPython.display import Markdown, display
import json



## Step 2: Define Configuration and Global Variables

To use this example, follow these steps to configure your environment:

1.  **Get a Llama API key**: This example uses the Llama API with a model such as `Llama-4-Maverick-17B-128E-Instruct-FP8`. You are not limited to this; any inference provider's endpoint that serves Llama models can be used instead.
2.  **Choose a Llama model**: Select a model for inference, such as `Llama-4-Maverick-17B-128E-Instruct-FP8`, or another Llama model offered by your chosen provider.
3.  **Create a Qdrant account**: Sign up for a Qdrant account and generate an API key.
4.  **Set up a Qdrant collection**: Use the provided script (`setup_qdrant_collection.py`) to create and populate a Qdrant collection.

For more information on setting up a Qdrant collection, refer to the `setup_qdrant_collection.py` script. This script demonstrates how to process files, split them into chunks, and store them in a Qdrant collection.
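
The script itself is not reproduced here. As a rough illustration of the "split them into chunks" step, the sketch below cuts text into overlapping character windows and pairs each chunk with an ID and payload. The function names, the `chunk_size` and `overlap` defaults, and the `source` payload field are assumptions for illustration, not taken from `setup_qdrant_collection.py`. The `text` payload key matches what `query_qdrant` reads later in this notebook.

```python
import uuid
from typing import Iterator


def split_into_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> Iterator[str]:
    """Yield overlapping character windows of `text` (hypothetical helper)."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():
            yield chunk
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break


def build_points(source: str, text: str) -> list[dict]:
    """Pair every chunk with a random ID and a payload ready for embedding and upload."""
    return [
        {"id": str(uuid.uuid4()), "payload": {"text": chunk, "source": source}}
        for chunk in split_into_chunks(text)
    ]
```

Each point would still need a vector (for example from the embedding model used below) before it is uploaded to the collection.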

Once you've completed these steps, you can define your configuration variables as follows:

In [65]:
LLAMA_API_KEY = os.getenv("LLAMA_API_KEY") 
if not LLAMA_API_KEY:
    raise ValueError("LLAMA_API_KEY not found. Please set it as an environment variable.")
API_URL = "https://api.llama.com/v1/chat/completions"  # Replace with your chosen inference provider's API URL
HEADERS = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {LLAMA_API_KEY}"
}
LLAMA_MODEL = "Llama-4-Maverick-17B-128E-Instruct-FP8"  # Choose a suitable Llama model or replace with your preferred model
# Qdrant Configuration
QDRANT_URL = "add your existing qdrant URL"  # Replace with your Qdrant instance URL
QDRANT_API_KEY = os.getenv("QDRANT_API_KEY") # Load from environment variable
if not QDRANT_API_KEY:
    raise ValueError("QDRANT_API_KEY not found. Please set it as an environment variable.")
# The Qdrant collection to be queried. This should already exist.
MAIN_COLLECTION_NAME = "readme_blogs_latest"
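
The two environment-variable checks above follow the same pattern. A minimal sketch of a shared helper is shown below; `require_env` is a hypothetical name introduced here, not part of the original notebook.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or blank."""
    value = os.getenv(name, "").strip()
    if not value:
        raise ValueError(f"{name} not found. Please set it as an environment variable.")
    return value


# Possible usage in the configuration cell:
# LLAMA_API_KEY = require_env("LLAMA_API_KEY")
# QDRANT_API_KEY = require_env("QDRANT_API_KEY")
```

Stripping whitespace also catches a variable that is set but contains only spaces, which the plain `if not ...` check would accept.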

## Step 3: Define Helper Functions

In this step, we define the helper functions used throughout the blog generation process:

1.  **`get_qdrant_client`**: Returns a Qdrant client instance configured with your Qdrant URL and API key.
2.  **`get_embedding_model`**: Returns the SentenceTransformer model used to embed queries.
3.  **`query_qdrant`**: Runs a vector search on the specified collection and reranks the hits with a cross-encoder.

These helpers keep the Qdrant interaction in one place, which makes the main function easier to read.


In [62]:
def get_qdrant_client():
    """
    Returns a Qdrant client instance.
    
    :return: QdrantClient instance

    """
    return QdrantClient(url=QDRANT_URL, api_key=QDRANT_API_KEY)

def get_embedding_model():
    """Returns the SentenceTransformer embedding model."""
    return SentenceTransformer('all-MiniLM-L6-v2')

def query_qdrant(query: str, client: QdrantClient, collection_name: str, top_k: int = 5) -> list:
    """
    Run a dense vector search on a Qdrant collection, then rerank the hits with a cross-encoder.
    
    :param query: Search query
    :param client: QdrantClient instance
    :param collection_name: Name of the Qdrant collection
    :param top_k: Number of results to return (default: 5)
    :return: List of relevant chunks
    """
    embedding_model = get_embedding_model()  # reuse the helper instead of repeating the model name
    query_embedding = embedding_model.encode(query).tolist()
    
    try:
        results = client.search(
            collection_name=collection_name,
            query_vector=query_embedding,
            limit=top_k*2
        )
    except Exception as e:
        print(f"Error during Qdrant search on collection '{collection_name}': {e}")
        return []
    
    if not results:
        print("No results found in Qdrant for the given query.")
        return []
    cross_encoder = CrossEncoder('cross-encoder/ms-marco-MiniLM-L6-v2')
    pairs = [(query, hit.payload["text"]) for hit in results]
    scores = cross_encoder.predict(pairs)
    
    sorted_results = [x for _, x in sorted(zip(scores, results), key=lambda pair: pair[0], reverse=True)]
    return sorted_results[:top_k]
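
To see the reranking step of `query_qdrant` in isolation, the sketch below applies the same "score every (query, text) pair, sort descending, keep `top_k`" logic to plain strings. The `rerank` helper and the toy `word_overlap_scores` scorer are assumptions made for illustration; in the notebook the scores come from `CrossEncoder.predict`.

```python
from typing import Callable, Sequence


def rerank(
    query: str,
    candidates: Sequence[str],
    score_fn: Callable[[list[tuple[str, str]]], Sequence[float]],
    top_k: int = 5,
) -> list[str]:
    """Return the `top_k` candidates with the highest scores, best first."""
    if not candidates:
        return []
    scores = score_fn([(query, text) for text in candidates])
    # Sort indices rather than (score, item) pairs so items never get compared.
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return [candidates[i] for i in order[:top_k]]


def word_overlap_scores(pairs: list[tuple[str, str]]) -> list[float]:
    """Toy stand-in for a cross-encoder: fraction of query words found in the text."""
    scores = []
    for query, text in pairs:
        query_words = set(query.lower().split())
        text_words = set(text.lower().split())
        scores.append(len(query_words & text_words) / len(query_words) if query_words else 0.0)
    return scores


docs = [
    "qdrant stores vectors",
    "llama chatbot for messenger",
    "messenger chatbot",
]
best = rerank("messenger chatbot with llama", docs, word_overlap_scores, top_k=2)
```

Fetching `top_k*2` candidates before reranking, as `query_qdrant` does, gives the reranker room to promote hits that the vector search ranked lower.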



## Step 4: Define the Main Blog Generation Function

The `generate_blog` function is the core of our blog generation process. It takes a topic as input and uses the following steps to generate a comprehensive blog post:

1.  **Retrieve relevant content**: Uses the `query_qdrant` function to retrieve relevant chunks from the Qdrant collection based on the input topic.
2.  **Construct a prompt**: Creates a prompt for the Llama model by combining the retrieved content with a system prompt and user input.
3.  **Generate the blog post**: Sends the constructed prompt to the Llama model via the chosen inference provider's API and retrieves the generated blog post.

This function orchestrates the entire blog generation process, making it easy to produce high-quality content based on your technical documentation.
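
The prompt-construction step can be sketched as a small pure function, which makes it easy to inspect what is sent to the model without calling the API. `build_messages` and its `max_context_chars` cap are hypothetical additions, and the system prompt is abbreviated compared with the one used in `generate_blog`.

```python
def build_messages(topic: str, chunks: list[str], max_context_chars: int = 12000) -> list[dict]:
    """Combine retrieved chunks and the topic into chat messages."""
    # Truncate the joined context so a large retrieval result cannot
    # overflow the model's context window (the limit here is an assumption).
    context = "\n\n".join(chunks)[:max_context_chars]
    system_prompt = (
        "You are a technical writer specializing in documentation-based blog posts.\n"
        f"Use the following context to write an in-depth blog post about {topic}.\n\n"
        f"Context:\n{context}"
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"Write a detailed technical blog post about {topic}"},
    ]


messages = build_messages("Qdrant basics", ["Chunk one.", "Chunk two."])
```

The resulting list has the same shape as the `messages` field of the request payload.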

In [63]:
def generate_blog(topic: str) -> str:
    """
    Generates a technical blog post based on a topic using RAG.
    
    :param topic: Topic for the blog post
    :return: Generated blog content
    """
    client = get_qdrant_client()
    relevant_chunks = query_qdrant(topic, client, MAIN_COLLECTION_NAME)
    
    if not relevant_chunks:
        error_message = "No relevant content found in the knowledge base. Cannot generate blog post."
        print(error_message)
        return error_message

    context = "\n".join([chunk.payload["text"] for chunk in relevant_chunks])
    system_prompt = f"""
    You are a technical writer specializing in creating comprehensive documentation-based blog posts. 
    Use the following context from technical documentation to write an in-depth blog post about {topic}.
    
    Requirements:
    1. Structure the blog with clear sections and subsections
    2. Include code structure and configuration details where relevant
    3. Explain architectural components using diagrams (describe in markdown)
    4. Add setup instructions and best practices
    5. Use technical terminology appropriate for developers
    
    Context:
    {context}
    """
    
    payload = {
        "model": LLAMA_MODEL,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": f"Write a detailed technical blog post about {topic}"}
        ],
        "temperature": 0.5,
        "max_tokens": 4096
    }
    
    try:
        # A timeout keeps the notebook from hanging if the endpoint stops responding
        response = requests.post(API_URL, headers=HEADERS, json=payload, timeout=120)
        
        if response.status_code == 200:
            response_json = response.json()
            blog_content = response_json.get('completion_message', {}).get('content', {}).get('text', '')
            
            markdown_content = f"# {topic}\n\n{blog_content}"
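            # Keep only filename-safe characters so topics containing "/" or ":" still produce a valid path
            output_path = Path(re.sub(r"[^\w-]+", "_", topic).strip("_") + "_blog.md")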
            with open(output_path, "w", encoding="utf-8") as f:
                f.write(markdown_content)
            
            print(f"Blog post generated and saved to {output_path}.")
            display(Markdown(markdown_content))
            return markdown_content
            
        else:
            error_message = f"Error: {response.status_code} - {response.text}"
            print(error_message)
            return error_message
    
    except Exception as e:
        error_message = f"An unexpected error occurred: {str(e)}"
        print(error_message)
        return error_message
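
`generate_blog` reads the generated text from `completion_message.content.text`. If you switch to a different inference provider, the response shape may differ. The sketch below is one possible way to handle both cases; the `choices[0].message.content` fallback assumes an OpenAI-style chat completions response, which you should verify against your provider's documentation. `extract_text` is a hypothetical helper.

```python
from typing import Any


def extract_text(response_json: dict[str, Any]) -> str:
    """Return the generated text from a chat completion response, or '' if none is found."""
    # Shape read by generate_blog: {"completion_message": {"content": {"text": ...}}}
    message = response_json.get("completion_message") or {}
    content = message.get("content")
    if isinstance(content, dict) and content.get("text"):
        return content["text"]
    if isinstance(content, str) and content:
        return content
    # Assumed OpenAI-style shape: {"choices": [{"message": {"content": ...}}]}
    choices = response_json.get("choices") or []
    if choices:
        return (choices[0].get("message") or {}).get("content") or ""
    return ""
```

An empty return value is worth checking before writing the file, since otherwise a blog post containing only the title would be saved.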

## Step 5: Execute the Blog Generation Process

Now that we've defined the necessary functions, let's put them to use! To generate a blog post, simply call the `generate_blog` function with a topic of your choice.

For example:
```python
topic = "Building a Messenger Chatbot with Llama 3"
blog_content = generate_blog(topic)
```

In [None]:
# Specify the topic for the blog post
topic = "Building a Messenger Chatbot with Llama 3"
try:
    blog_content = generate_blog(topic)
    print(blog_content)
except Exception as e:
    print(e)


Blog post generated and saved to Building_a_Messenger_Chatbot_with_Llama_3_blog.md.


# Building a Messenger Chatbot with Llama 3

Building a Messenger Chatbot with Llama 3: A Step-by-Step Guide
===========================================================

### Introduction

In this blog post, we'll explore the process of building a Llama 3 enabled Messenger chatbot using the Messenger Platform. We'll cover the architecture, setup instructions, and best practices for integrating Llama 3 with the Messenger Platform.

### Overview of the Messenger Platform

The Messenger Platform is a powerful tool that allows businesses to connect with their customers through a Facebook business page. With the Messenger Platform, businesses can build chatbots that can respond to customer inquiries, provide support, and even offer personalized recommendations.

### Architecture of the Llama 3 Enabled Messenger Chatbot

The diagram below illustrates the components and overall data flow of the Llama 3 enabled Messenger chatbot demo.

```markdown
+------------------+
|  Facebook        |
|  Business Page   |
+------------------+
         |
         |  (User Message)
         v
+------------------+
|  Messenger       |
|  Platform        |
+------------------+
         |
         |  (Webhook Event)
         v
+------------------+
|  Web Server      |
|  (e.g., Amazon   |
|  EC2 instance)   |
+------------------+
         |
         |  (API Request)
         v
+------------------+
|  Llama 3         |
|  Model           |
+------------------+
         |
         |  (Generated Response)
         v
+------------------+
|  Web Server      |
|  (e.g., Amazon   |
|  EC2 instance)   |
+------------------+
         |
         |  (API Response)
         v
+------------------+
|  Messenger       |
|  Platform        |
+------------------+
         |
         |  (Bot Response)
         v
+------------------+
|  Facebook        |
|  Business Page   |
+------------------+
```

The architecture consists of the following components:

*   Facebook Business Page: The page where customers interact with the chatbot.
*   Messenger Platform: The platform that handles user messages and sends webhook events to the web server.
*   Web Server: The server that receives webhook events from the Messenger Platform, sends API requests to the Llama 3 model, and returns API responses to the Messenger Platform.
*   Llama 3 Model: The AI model that generates responses to user messages.

### Setting Up the Messenger Chatbot

To set up the Messenger chatbot, follow these steps:

1.  **Create a Facebook Business Page**: Create a Facebook business page for your business.
2.  **Create a Facebook Developer Account**: Create a Facebook developer account and register your application.
3.  **Set Up the Messenger Platform**: Set up the Messenger Platform for your application and configure the webhook settings.
4.  **Set Up the Web Server**: Set up a web server (e.g., Amazon EC2 instance) to receive webhook events from the Messenger Platform.
5.  **Integrate with Llama 3**: Integrate the Llama 3 model with your web server to generate responses to user messages.

### Configuring the Webhook

To configure the webhook, follow these steps:

1.  Go to the Facebook Developer Dashboard and navigate to the Messenger Platform settings.
2.  Click on "Webhooks" and then click on "Add Subscription".
3.  Enter the URL of your web server and select the "messages" and "messaging_postbacks" events.
4.  Verify the webhook by clicking on "Verify" and entering the verification token.

### Handling Webhook Events

To handle webhook events, you'll need to write code that processes the events and sends API requests to the Llama 3 model. Here's an example code snippet in Python:
```python
import os
import json
from flask import Flask, request
import requests

app = Flask(__name__)

# Llama 3 API endpoint
LLAMA_API_ENDPOINT = os.environ['LLAMA_API_ENDPOINT']

# Verify the webhook
@app.route('/webhook', methods=['GET'])
def verify_webhook():
    # Messenger sends the verification parameters with a "hub." prefix
    mode = request.args.get('hub.mode')
    token = request.args.get('hub.verify_token')
    challenge = request.args.get('hub.challenge')

    if mode == 'subscribe' and token == 'YOUR_VERIFY_TOKEN':
        return challenge
    else:
        return 'Invalid request', 403

# Handle webhook events
@app.route('/webhook', methods=['POST'])
def handle_webhook():
    data = request.get_json()
    if data['object'] == 'page':
        for entry in data['entry']:
            for messaging_event in entry['messaging']:
                if messaging_event.get('message', {}).get('text'):  # skip messages without text, e.g. attachments
                    # Get the user message
                    user_message = messaging_event['message']['text']

                    # Send API request to Llama 3 model
                    response = requests.post(LLAMA_API_ENDPOINT, json={'prompt': user_message})

                    # Get the generated response
                    generated_response = response.json()['response']

                    # Send API response back to Messenger Platform
                    send_response(messaging_event['sender']['id'], generated_response)

    return 'OK', 200

# Send response back to Messenger Platform
def send_response(recipient_id, response):
    # Set up the API endpoint and access token
    endpoint = f'https://graph.facebook.com/v13.0/me/messages?access_token={os.environ["PAGE_ACCESS_TOKEN"]}'

    # Set up the API request payload
    payload = {
        'recipient': {'id': recipient_id},
        'message': {'text': response}
    }

    # Send the API request
    requests.post(endpoint, json=payload)

if __name__ == '__main__':
    app.run(debug=True)
```

### Best Practices

Here are some best practices to keep in mind when building a Messenger chatbot with Llama 3:

*   **Test thoroughly**: Test your chatbot thoroughly to ensure that it responds correctly to user messages.
*   **Use a robust web server**: Use a robust web server that can handle a high volume of webhook events.
*   **Implement error handling**: Implement error handling to handle cases where the Llama 3 model fails to generate a response.
*   **Monitor performance**: Monitor the performance of your chatbot to ensure that it's responding quickly to user messages.

### Conclusion

Building a Messenger chatbot with Llama 3 is a powerful way to provide customer support and improve customer experience. By following the steps outlined in this blog post, you can build a chatbot that responds to user messages and provides personalized recommendations. Remember to test thoroughly, use a robust web server, implement error handling, and monitor performance to ensure that your chatbot is successful.
