{ "cells": [ { "cell_type": "markdown", "id": "42a6fd1b", "metadata": {}, "source": [ "# Generating documentation for an entire codebase" ] }, { "cell_type": "markdown", "id": "72c37e61", "metadata": {}, "source": [ "*Copyright (c) Meta Platforms, Inc. and affiliates.\n", "This software may be used and distributed according to the terms of the Llama Community License Agreement.*" ] }, { "cell_type": "markdown", "id": "352a1d17", "metadata": {}, "source": [ "\"Open" ] }, { "cell_type": "markdown", "id": "36f56eac-5824-4b4d-8231-ca1d9a792cfc", "metadata": { "vscode": { "languageId": "raw" } }, "source": [ "This tutorial shows you how to build an automated documentation generator for source code repositories. Using Llama 4 Scout, you'll create a \"Repo2Docs\" system that analyzes an entire codebase and produces a comprehensive README with architectural diagrams and component summaries.\n", "\n", "While traditional documentation tools require manual annotation or simple extraction, this approach uses Llama 4's large context window and code understanding capabilities to generate meaningful, contextual documentation that explains not just what the code does, but how components work together.\n", "\n", "## What you will learn\n", "\n", "- **Build a multi-stage AI pipeline** that performs progressive analysis, from individual files to the complete architecture.\n", "- **Leverage Llama 4 Scout's large context window** to analyze entire source files and repositories without complex chunking strategies.\n", "- **Use the Meta Llama API** to access Llama 4 models.\n", "- **Generate production-ready documentation**, including Mermaid diagrams that visualize your repository's architecture.\n", "\n", "| Component | Choice | Why |\n", "|:----------|:-------|:----|\n", "| **Model** | Llama 4 Scout | Large context window (up to 10M tokens) and Mixture-of-Experts (MoE) architecture for efficient, high-quality analysis. |\n", "| **Infrastructure** | Meta Llama API | Provides serverless, production-ready access to Llama 4 models using the `llama_api_client` SDK. |\n", "| **Architecture** | Progressive Pipeline | Deconstructs the complex task of repository analysis into manageable, sequential stages for scalability and efficiency. |\n", "---\n", "\n", "**Note on Inference Providers:** This tutorial uses the Llama API for demonstration purposes. However, you can run Llama 4 models with any preferred inference provider. Common examples include [Amazon Bedrock](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-meta.html) and [Together AI](https://together.ai/llama). The core logic of this tutorial can be adapted to any of these providers." ] }, { "cell_type": "markdown", "id": "ebe11503-7483-4bd4-a0fa-ed6d75e70c59", "metadata": {}, "source": [ "## Problem: Documentation debt\n", "\n", "Documentation debt is a persistent challenge in software development. As codebases evolve, manual documentation efforts often fall behind, leading to outdated, inconsistent, or missing information. This slows down developer onboarding and makes maintenance more difficult.\n", "\n", "## Solution: An automated documentation pipeline\n", "\n", "This tutorial's solution is a multi-stage pipeline that systematically analyzes a repository to produce a comprehensive `README.md` file. The system works by progressively analyzing your repository in multiple stages:\n", "\n", "```mermaid\n", "flowchart LR\n", " A[GitHub Repo] --> B[Step 1: File Analysis]\n", " B --> C[Step 2:
Repository Overview]\n", " C --> D[Step 3:
Architecture Analysis]\n", "    D --> E[Step 4: Final README]\n", "```\n", "\n", "By breaking down the complex task of repository analysis into manageable stages, you can process repositories of any size efficiently. The large context window of Llama 4 Scout is sufficient to analyze entire source files without complex chunking strategies, resulting in high-quality documentation that captures both fine-grained details and architectural patterns." ] }, { "cell_type": "markdown", "id": "36a4c1eb-5b32-4328-9f23-566e07c5abc7", "metadata": { "vscode": { "languageId": "raw" } }, "source": [ "## Prerequisites\n", "\n", "Before you begin, ensure you have a Llama API key. If you do not have one, you can get one from [Meta Llama API](https://llama.developer.meta.com/).\n", "\n", "Remember, we use the Llama API for this tutorial, but you can adapt this section to use your preferred inference provider.\n", "\n", "## Install dependencies\n", "\n", "You will need a few libraries for this project: `tiktoken` for accurate token counting, `tqdm` for progress bars, and the official `llama-api-client`." ] }, { "cell_type": "code", "execution_count": 1, "id": "76934968-7e45-4604-bc05-8f6cda19f20f", "metadata": {}, "outputs": [], "source": [ "# Install dependencies\n", "!pip install --quiet tiktoken llama-api-client tqdm" ] }, { "cell_type": "markdown", "id": "ab850db6-9f0c-4633-b874-edd7b86fe5d1", "metadata": {}, "source": [ "## Imports & Llama API client setup\n", "\n", "Import the necessary modules and initialize the `LlamaAPIClient`. This requires a Llama API key to be available in the `LLAMA_API_KEY` environment variable." ] }, { "cell_type": "code", "execution_count": 2, "id": "93135cfe-ead0-4390-8631-259834c9b988", "metadata": {}, "outputs": [], "source": [ "import os, sys, re\n", "import tempfile\n", "import textwrap\n", "import urllib.request\n", "import zipfile\n", "from pathlib import Path\n", "from typing import Dict, List, Tuple\n", "from urllib.parse import urlparse\n", "import json\n", "import pprint\n", "from tqdm import tqdm\n", "import tiktoken\n", "from llama_api_client import LlamaAPIClient\n", "\n", "# --- Llama client ---\n", "API_KEY = os.getenv(\"LLAMA_API_KEY\")\n", "if not API_KEY:\n", "    sys.exit(\"āŒ Please set the LLAMA_API_KEY environment variable.\")\n", "\n", "client = LlamaAPIClient(api_key=API_KEY)" ] }, { "cell_type": "markdown", "id": "fae9842e-10c8-4fc0-a118-429c786fb63a", "metadata": { "vscode": { "languageId": "raw" } }, "source": [ "### Model selection\n", "\n", "For this tutorial, you'll use **Llama 4 Scout**. Its large context window is well-suited for ingesting and analyzing entire source code files, which is a key requirement for this use case. While Llama 4 Scout supports up to 10M tokens, the Llama API currently supports 128k tokens." ] }, { "cell_type": "code", "execution_count": 3, "id": "5d363e30-94c8-4420-a438-9c98f45585d0", "metadata": {}, "outputs": [], "source": [ "# --- Constants & Configuration ---\n", "LLM_MODEL = \"Llama-4-Scout-17B-16E-Instruct-FP8\"\n", "CTX_WINDOW = 128000 # Context window for Llama API" ] }, { "cell_type": "markdown", "id": "73eb5cfb-585c-4acb-b8b5-f6a5bb8eb271", "metadata": { "vscode": { "languageId": "raw" } }, "source": [ "## Step 1: Download the repository\n", "\n", "First, you'll download the target repository. 
This tutorial analyzes the official [Meta Llama repository](https://github.com/facebookresearch/llama), but you can adapt it to any public GitHub repository.\n", "\n", "The code downloads the repository as a ZIP archive (which is faster than a `git clone` and avoids `.git` metadata) and extracts it to a temporary directory for isolated processing." ] }, { "cell_type": "code", "execution_count": null, "id": "f99c109e-d084-4efd-bdd0-a12ee02a3d9d", "metadata": {}, "outputs": [], "source": [ "REPO_URL = \"https://github.com/facebookresearch/llama\"\n", "BRANCH_NAME = \"main\" # The default branch to download" ] }, { "cell_type": "code", "execution_count": null, "id": "2e99d1b2-251b-47c3-81d5-3b02a4863684", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "šŸ“„ Downloading repository from https://github.com/facebookresearch/llama/archive/refs/heads/main.zip...\n", "šŸ“¦ Extracting files...\n", "āœ… Extracted to: /var/folders/sz/kf8w7j1x1v790jxs8k2gl72c0000gn/T/tmptwo_kdt5/llama-main\n" ] } ], "source": [ "base_url = REPO_URL.rstrip(\"/\").removesuffix(\".git\")\n", "repo_zip_url = f\"{base_url}/archive/refs/heads/{BRANCH_NAME}.zip\"\n", "\n", "# Create a temporary directory to work in\n", "tmpdir_obj = tempfile.TemporaryDirectory()\n", "tmpdir = Path(tmpdir_obj.name)\n", "\n", "# Download the repository ZIP file\n", "zip_path = tmpdir / \"repo.zip\"\n", "print(f\"šŸ“„ Downloading repository from {repo_zip_url}...\")\n", "urllib.request.urlretrieve(repo_zip_url, zip_path)\n", "\n", "# Extract the archive\n", "print(\"šŸ“¦ Extracting files...\")\n", "with zipfile.ZipFile(zip_path, 'r') as zf:\n", "    zf.extractall(tmpdir)\n", "extracted_root = next(p for p in tmpdir.iterdir() if p.is_dir())\n", "print(f\"āœ… Extracted to: {extracted_root}\")" ] }, { "cell_type": "markdown", "id": "01474e16-f15c-4deb-bee9-1506b56153fe", "metadata": { "vscode": { "languageId": "raw" } }, "source": [ "## Step 2: Analyze individual files\n", "\n", "In this step, you'll generate a concise summary for each relevant file in the repository. This is the first step in the progressive analysis pipeline.\n", "\n", "**File selection strategy**: To ensure the analysis is both comprehensive and efficient, you'll selectively process files based on their extension and name (`should_include_file`). This avoids summarizing binary files, build artifacts, or other content that is not relevant to documentation.\n", "\n", "The list below provides a general-purpose starting point, but you should customize it for your target repository. For a large project, consider what file types contain the most meaningful source code and configuration, and start with those." 
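, "\n", "\n", "If your target repository contains vendored or generated code (for example `node_modules/` or build output), you may also want to skip entire directories before summarization. The snippet below is a minimal, illustrative sketch of that idea; the `EXCLUDE_DIRS` set and `in_excluded_dir` helper are hypothetical additions and are not used by the pipeline in the next cell:\n", "\n", "```python\n", "from pathlib import Path\n", "\n", "# Hypothetical directory-level exclusions (tune these per repository).\n", "EXCLUDE_DIRS = {\"node_modules\", \"dist\", \"build\", \"__pycache__\"}\n", "\n", "def in_excluded_dir(file_path: Path, extracted_root: Path) -> bool:\n", "    \"\"\"Returns True if any parent directory of the file is in EXCLUDE_DIRS.\"\"\"\n", "    rel_parts = file_path.relative_to(extracted_root).parts\n", "    return any(part in EXCLUDE_DIRS for part in rel_parts[:-1])\n", "```"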
] }, { "cell_type": "code", "execution_count": 6, "id": "304bb4f3-aea8-4b19-83b9-9f8329db74e4", "metadata": {}, "outputs": [], "source": [ "# Allowlist of file extensions to summarize\n", "INCLUDE_EXTENSIONS = {\n", " \".py\", # Python\n", " \".js\", \".jsx\", \".ts\", \".tsx\", # JS/Typescript\n", " \".md\", \".txt\", # Text\n", " \".json\", \".yaml\", \".yml\", \".toml\", # Config\n", " \".sh\", \".css\", \".html\",\n", "}\n", "INCLUDE_FILENAMES = {\"Dockerfile\", \"Makefile\"} # Common files without extension\n", "\n", "def should_include_file(file_path: Path, extracted_root: Path) -> bool:\n", " \"\"\"Checks if a file should be included for documentation based on its path and type.\"\"\"\n", " \n", " if not file_path.is_file(): # Must be a file.\n", " return False\n", "\n", " rel_path = file_path.relative_to(extracted_root)\n", " if any(part.startswith('.') for part in rel_path.parts): # Exclude hidden files/folders.\n", " return False\n", "\n", " if ( # Must be in our allow-list of extensions or filenames.\n", " file_path.suffix.lower() in INCLUDE_EXTENSIONS\n", " or file_path.name in INCLUDE_FILENAMES\n", " ):\n", " return True\n", "\n", " return False" ] }, { "cell_type": "markdown", "id": "2b1fbfc8-0347-41bc-af26-af80e22a3de7", "metadata": { "vscode": { "languageId": "raw" } }, "source": [ "**Prompt strategy for file summaries**: The prompt for this phase instructs Llama 4 to elicit summaries that focus on a file's purpose and its role within the project, rather than a line-by-line description of its implementation. This is a critical step for generating a high-level, conceptual understanding of the codebase." ] }, { "cell_type": "code", "execution_count": null, "id": "fe341e12-6e4d-4e7f-9a6e-a0fb57d6c795", "metadata": {}, "outputs": [], "source": [ "MAX_COMPLETION_TOKENS_FILE = 400 # Max tokens for file summary\n", "# To keep this tutorial straightforward, we'll skip files larger than 1MB.\n", "# For a production system, you might implement a chunking strategy for large files.\n", "MAX_FILE_SIZE = 1_000_000\n", "\n", "def summarize_file_content(file_path: str, file_content: str) -> str:\n", " \"\"\"Summarizes the content of a single file.\"\"\"\n", " sys_prompt = (\n", " \"You are a senior software engineer creating a concise summary of a \"\n", " \"source file for a project's README.md.\"\n", " )\n", " user_prompt = textwrap.dedent(\n", " f\"\"\"\\\n", " Please summarize the following file: `{file_path}`.\n", "\n", " The summary should be a **concise paragraph** (around 40-60 words) that \n", " explains the file's primary purpose, its main functions or classes, and how \n", " it fits into the broader project. 
Focus on the *what* and *why*, not a \n", " line-by-line explanation of the *how*.\n", "\n", " ```\n", " {file_content}\n", " ```\n", " \"\"\"\n", " )\n", " try:\n", " resp = client.chat.completions.create(\n", " model=LLM_MODEL,\n", " messages=[\n", " {\"role\": \"system\", \"content\": sys_prompt},\n", " {\"role\": \"user\", \"content\": user_prompt},\n", " ],\n", " temperature=0.1, # Low temperature for deterministic summaries\n", " max_tokens=MAX_COMPLETION_TOKENS_FILE,\n", " )\n", " return resp.completion_message.content.text\n", " except Exception as e:\n", " print(f\" Error summarizing file: {e}\")\n", " return \"\" # Return empty string on failure" ] }, { "cell_type": "code", "execution_count": null, "id": "f420b225-5abf-46ed-ad3a-a333ae59e3ad", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "--- Summarizing individual files ---\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "šŸ” Summarising files: 100%|ā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆ| 22/22 [00:28<00:00, 1.29s/file]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "āœ… Summarized 15 files.\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "# --- Summarize relevant files ---\n", "print(\"\\n--- Summarizing individual files ---\")\n", "file_summaries: Dict[str, str] = {}\n", "files_to_process = list(extracted_root.rglob(\"*\"))\n", "\n", "for file_path in tqdm(files_to_process, desc=\"šŸ” Summarizing files\", unit=\"file\"):\n", " # First, check if the file type is one we want to process.\n", " if (\n", " not should_include_file(file_path, extracted_root) # valid file for summarization\n", " or file_path.stat().st_size > MAX_FILE_SIZE\n", " or file_path.stat().st_size == 0\n", " ):\n", " continue\n", "\n", " rel_name = str(file_path.relative_to(extracted_root))\n", " try:\n", " text = file_path.read_text(encoding=\"utf-8\")\n", " except UnicodeDecodeError:\n", " continue\n", " \n", " if not text.strip():\n", " continue\n", " \n", " # With a large context window, we can summarize the whole file at once.\n", " summary = summarize_file_content(rel_name, text)\n", " if summary:\n", " file_summaries[rel_name] = summary\n", "\n", "print(f\"āœ… Summarized {len(file_summaries)} files.\")" ] }, { "cell_type": "code", "execution_count": 9, "id": "5e777b23-7271-4768-bf31-5807528b7151", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'CODE_OF_CONDUCT.md': 'The `CODE_OF_CONDUCT.md` file outlines the expected '\n", " 'behavior and standards for contributors and '\n", " 'maintainers of the project, aiming to create a '\n", " 'harassment-free and welcoming environment. It defines '\n", " 'acceptable and unacceptable behavior, roles and '\n", " 'responsibilities, and procedures for reporting and '\n", " 'addressing incidents, promoting a positive and '\n", " 'inclusive community.',\n", " 'CONTRIBUTING.md': 'Here is a concise summary of the `CONTRIBUTING.md` file:\\n'\n", " '\\n'\n", " 'The `CONTRIBUTING.md` file outlines the guidelines and '\n", " 'processes for contributing to the Llama project. 
It '\n", " 'provides instructions for submitting pull requests, '\n", " 'including bug fixes, improvements, and new features, as '\n", " 'well as information on the Contributor License Agreement, '\n", " 'issue tracking, and licensing terms, to ensure a smooth '\n", " 'and transparent contribution experience.',\n", " 'MODEL_CARD.md': 'The `MODEL_CARD.md` file provides detailed information '\n", " 'about the Llama 2 family of large language models (LLMs), '\n", " 'including model architecture, training data, performance '\n", " 'evaluations, and intended use cases. It serves as a '\n", " \"comprehensive model card, outlining the model's \"\n", " 'capabilities, limitations, and responsible use guidelines '\n", " 'for developers and researchers.',\n", " 'README.md': 'This `README.md` file serves as a deprecated repository for '\n", " 'Llama 2, a large language model, providing minimal examples for '\n", " 'loading models and running inference. It directs users to new, '\n", " 'consolidated repositories for Llama 3.1 and offers guidance on '\n", " 'downloading models, quick start instructions, and responsible '\n", " 'use guidelines.',\n", " 'UPDATES.md': 'Here is a concise summary of the `UPDATES.md` file:\\n'\n", " '\\n'\n", " 'The `UPDATES.md` file documents recent updates to the project, '\n", " 'specifically addressing issues with system prompts and token '\n", " 'sanitization. Updates aim to reduce false refusal rates and '\n", " 'prevent prompt injection attacks, enhancing model safety and '\n", " 'security. Changes include removing default system prompts and '\n", " 'sanitizing user-provided prompts to mitigate abuse.',\n", " 'USE_POLICY.md': 'Here is a concise summary of the `USE_POLICY.md` file:\\n'\n", " '\\n'\n", " 'The Llama 2 Acceptable Use Policy outlines the guidelines '\n", " 'for safe and responsible use of the Llama 2 tool. It '\n", " 'prohibits uses that violate laws, harm individuals or '\n", " 'groups, or facilitate malicious activities, and requires '\n", " 'users to report any policy violations, bugs, or concerns to '\n", " 'designated channels.',\n", " 'download.sh': 'The `download.sh` script downloads Llama 2 models and '\n", " 'associated files from a provided presigned URL. It prompts '\n", " 'for a URL and optional model sizes, then downloads the '\n", " 'models, tokenizer, LICENSE, and usage policy to a target '\n", " 'folder, verifying checksums for integrity.',\n", " 'example_chat_completion.py': 'This file, `example_chat_completion.py`, '\n", " 'demonstrates how to use a pretrained Llama '\n", " 'model for generating text in a conversational '\n", " 'setting. It defines a `main` function that '\n", " 'takes in model checkpoints, tokenizer paths, '\n", " 'and generation parameters, and uses them to '\n", " 'generate responses to a set of predefined '\n", " 'dialogs. The file serves as an example for '\n", " 'chat completion tasks in the broader project.',\n", " 'example_text_completion.py': 'This file, `example_text_completion.py`, '\n", " 'demonstrates text generation using a '\n", " 'pretrained Llama model. The `main` function '\n", " 'initializes the model, generates text '\n", " 'completions for a set of prompts, and prints '\n", " \"the results. 
It showcases the model's \"\n", " 'capabilities in natural language continuation '\n", " 'and translation tasks, serving as an example '\n", " 'for integrating Llama into broader projects.',\n", " 'llama/__init__.py': 'The `llama/__init__.py` file serves as the entry point '\n", " 'for the Llama project, exposing key classes and '\n", " 'modules. It imports and makes available the main '\n", " '`Llama` and `Dialog` generation classes, `ModelArgs` '\n", " 'and `Transformer` model components, and the `Tokenizer` '\n", " \"class, providing a foundation for the project's \"\n", " 'functionality.',\n", " 'llama/generation.py': 'The `llama/generation.py` file contains the core '\n", " 'logic for text generation using the Llama model. It '\n", " 'defines the `Llama` class, which provides methods for '\n", " 'building a model instance, generating text '\n", " 'completions, and handling conversational dialogs. The '\n", " 'class supports features like nucleus sampling, log '\n", " 'probability computation, and special token handling.',\n", " 'llama/model.py': 'The `llama/model.py` file defines a Transformer-based '\n", " 'model architecture, specifically the Llama model. It '\n", " 'includes key components such as RMSNorm, attention '\n", " 'mechanisms, feedforward layers, and a Transformer block, '\n", " 'which are combined to form the overall model. The model is '\n", " 'designed for efficient and scalable training and '\n", " 'inference.',\n", " 'llama/tokenizer.py': 'The `llama/tokenizer.py` file implements a tokenizer '\n", " 'class using SentencePiece, enabling text tokenization '\n", " 'and encoding/decoding. The `Tokenizer` class loads a '\n", " 'SentencePiece model, providing `encode` and `decode` '\n", " 'methods for converting text to token IDs and vice '\n", " 'versa, with optional BOS and EOS tokens.',\n", " 'requirements.txt': 'Here is a concise summary of the `requirements.txt` '\n", " 'file:\\n'\n", " '\\n'\n", " 'The `requirements.txt` file specifies the dependencies '\n", " 'required to run the project. It lists essential '\n", " 'libraries, including PyTorch, Fairscale, Fire, and '\n", " 'SentencePiece, which provide core functionality for the '\n", " 'project. This file ensures that all necessary packages '\n", " \"are installed, enabling the project's features and \"\n", " 'functionality to work as intended.',\n", " 'setup.py': 'The `setup.py` file is a build script that packages and '\n", " 'distributes the project. Its primary purpose is to define '\n", " 'project metadata and dependencies. It uses `setuptools` to find '\n", " 'and include packages, and loads required libraries from '\n", " '`requirements.txt`, enabling easy installation and setup of the '\n", " 'project.'}\n" ] } ], "source": [ "pprint.pprint(file_summaries)" ] }, { "cell_type": "markdown", "id": "0796a5c9-0f08-465e-9ec3-cb12f3426248", "metadata": {}, "source": [ "## Step 3: Create repository overview\n", "\n", "After summarizing each file, the next step is to synthesize this information into a high-level repository overview. This overview provides a starting point for a user to understand the project's purpose and structure.\n", "\n", "You'll prompt Llama 4 to generate three key sections based on the file summaries from the previous step:\n", "1. **Project Overview**: A short, descriptive paragraph that explains the repository's main purpose.\n", "2. **Key Components**: A bulleted list of the most important files, providing a quick look at the core logic.\n", "3. 
**Getting Started**: A brief instruction on how to install dependencies and run the project.\n", "\n", "This prompt leverages the previously generated file summaries as context, enabling the model to create an accurate and cohesive overview without re-analyzing the raw source code." ] }, { "cell_type": "code", "execution_count": null, "id": "6b57e502-0bb4-41f3-bc90-1986a2118c4a", "metadata": {}, "outputs": [], "source": [ "MAX_COMPLETION_TOKENS_REPO = 600 # Max tokens for repo overview\n", "\n", "def build_repo_overview(file_summaries: Dict[str, str]) -> str:\n", " \"\"\"Creates the high-level Overview and Key Components sections.\"\"\"\n", " bullets = \"\\n\".join(f\"- **{n}**: {s}\" for n, s in file_summaries.items())\n", " sys_prompt = (\n", " \"You are an expert technical writer. Draft a high-level overview \"\n", " \"for the root of a README.md.\"\n", " )\n", " user_prompt = textwrap.dedent(\n", " f\"\"\"\\\n", " Below is a list of source files with their summaries.\n", "\n", " 1. Write an **'Overview'** section (ā‰ˆ3-4 sentences) explaining the purpose of the repository.\n", " 2. Follow it with a **'Key Components'** bullet list (max 6 bullets) referencing the files.\n", " 3. Close with a short 'Getting Started' hint: `pip install -r requirements.txt` etc.\n", "\n", " ---\n", " FILE SUMMARIES\n", " {bullets}\n", " \"\"\"\n", " )\n", " try:\n", " resp = client.chat.completions.create(\n", " model=LLM_MODEL,\n", " messages=[\n", " {\"role\": \"system\", \"content\": sys_prompt},\n", " {\"role\": \"user\", \"content\": user_prompt},\n", " ],\n", " temperature=0.1,\n", " max_tokens=MAX_COMPLETION_TOKENS_REPO,\n", " )\n", " return resp.completion_message.content.text\n", " except Exception as e:\n", " print(f\" Error creating repo overview: {e}\")\n", " return \"\"" ] }, { "cell_type": "code", "execution_count": 11, "id": "c92ac3d4-dbc7-4577-bf75-2b56072daba3", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "--- Building high-level repository overview ---\n", "āœ… Overview created.\n" ] } ], "source": [ "# --- Create High-Level Repo Overview ---\n", "print(\"\\n--- Building high-level repository overview ---\")\n", "repo_overview = build_repo_overview(file_summaries)\n", "print(\"āœ… Overview created.\")" ] }, { "cell_type": "code", "execution_count": 12, "id": "a15aaa25-440c-4901-a677-ac5452d8775d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Here is a high-level overview for the root of a README.md:\n", "\n", "## Overview\n", "\n", "This repository provides a comprehensive framework for utilizing the Llama large language model, including model architecture, training data, and example usage. The project aims to facilitate the development of natural language processing applications, while promoting responsible use and community engagement. By providing a range of tools and resources, this repository enables developers and researchers to explore the capabilities and limitations of the Llama model. 
The repository is structured to support easy integration, modification, and extension of the model.\n", "\n", "## Key Components\n", "\n", "* **llama/generation.py**: Core logic for text generation using the Llama model\n", "* **llama/model.py**: Transformer-based model architecture definition\n", "* **llama/tokenizer.py**: Tokenizer class using SentencePiece for text encoding and decoding\n", "* **example_text_completion.py**: Example usage of the Llama model for text completion tasks\n", "* **example_chat_completion.py**: Example usage of the Llama model for conversational tasks\n", "* **requirements.txt**: Dependency specifications for project setup and installation\n", "\n", "## Getting Started\n", "\n", "To get started with this project, run `pip install -r requirements.txt` to install the required dependencies. You can then explore the example usage files, such as `example_text_completion.py` and `example_chat_completion.py`, to learn more about integrating the Llama model into your projects.\n" ] } ], "source": [ "print(repo_overview)" ] }, { "cell_type": "markdown", "id": "824f4439-b1d5-4c44-833a-573cfbc03ee7", "metadata": {}, "source": [ "## Step 4: Analyze repository architecture\n", "\n", "A high-level overview is useful, but a deep architectural understanding requires analyzing how components interact. This phase generates that deeper analysis.\n", "\n", "### Two-step approach to architecture analysis\n", "\n", "Analyzing an entire codebase for architectural patterns is complex. Instead of passing all the code to the model at once, you'll use a more strategic, two-step approach that mirrors how a human architect would work:\n", "\n", "1. **AI-driven file selection**: First, you use Llama 4 to identify the most architecturally significant files. The model is prompted to select files that represent the core logic, primary entry points, or key data structures, based on the summaries generated earlier. This step efficiently filters the codebase down to its most critical components.\n", "2. **Deep-dive analysis**: With the key files identified, you perform a much deeper analysis. While only the full source code of these selected files is provided, the model also receives the summaries of *all* files generated in the first step. This ensures it has broad, high-level context on the entire repository when it performs its deep analysis.\n", "\n", "This two-step process is highly effective because it focuses the model's analytical power on the most important parts of the code, enabling it to generate high-quality architectural insights that are difficult to achieve with a less focused approach." ] }, { "cell_type": "code", "execution_count": null, "id": "d107a46a-8cc3-4cfd-bfc3-3847076bf523", "metadata": {}, "outputs": [], "source": [ "def select_important_files(file_summaries: Dict[str, str]) -> List[str]:\n", " \"\"\"Uses an LLM to select the most architecturally significant files.\"\"\"\n", " bullets = \"\\n\".join(f\"- **{n}**: {s}\" for n, s in file_summaries.items())\n", " sys_prompt = (\n", " \"You are a senior software architect. Your task is to identify the \"\n", " \"most critical files for understanding a repository's architecture.\"\n", " )\n", " user_prompt = textwrap.dedent(\n", " f\"\"\"\\\n", " Based on the following file summaries, identify the most architecturally\n", " significant files. 
These files should represent the core logic,\n", " primary entry points, or key data structures of the project.\n", "\n", " Your response MUST be a comma-separated list of file paths, ordered from\n", " most to least architecturally significant. Do not add any other text.\n", " Please ensure that the file paths exactly match the file summaries \n", " below.\n", " Example: `README.md`,`src/main.py,src/utils.py,src/models.py`\n", "\n", " ---\n", " FILE SUMMARIES\n", " {bullets}\n", " \"\"\"\n", " )\n", " \n", " try:\n", " resp = client.chat.completions.create(\n", " model=LLM_MODEL,\n", " messages=[\n", " {\"role\": \"system\", \"content\": sys_prompt},\n", " {\"role\": \"user\", \"content\": user_prompt},\n", " ],\n", " temperature=0.1,\n", " )\n", " response = resp.completion_message.content.text\n", " \n", " # Parse the comma-separated list.\n", " if response:\n", " # Clean up the response to handle potential markdown code blocks\n", " cleaned_response = (response.strip()\n", " .removeprefix(\"```\")\n", " .removesuffix(\"```\")\n", " .strip())\n", " return [f.strip() for f in cleaned_response.split(',') if f.strip()]\n", " except Exception as e:\n", " print(f\" Error selecting important files: {e}\")\n", " return []" ] }, { "cell_type": "code", "execution_count": 14, "id": "22ad831e-0b3d-405c-9749-b9df65c2f4c3", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "--- Selecting important files for deep analysis ---\n", "āœ… LLM selected 6 files for analysis: ['llama/generation.py', 'llama/model.py', 'llama/__init__.py', 'llama/tokenizer.py', 'example_text_completion.py', 'example_chat_completion.py']\n" ] } ], "source": [ "print(\"\\n--- Selecting important files for deep analysis ---\")\n", "important_files = select_important_files(file_summaries)\n", "if important_files:\n", " print(f\"āœ… LLM selected {len(important_files)} files for analysis: \"\n", " f\"{important_files}\")\n", "else:\n", " print(\"ā„¹ļø No files were selected for architectural analysis.\")" ] }, { "cell_type": "code", "execution_count": 15, "id": "efa951e9-8584-4ac5-9d41-639faef9c5a4", "metadata": {}, "outputs": [], "source": [ "def token_estimate(text: str) -> int:\n", " \"\"\"Estimates the token count of a text string using tiktoken.\"\"\"\n", " enc = tiktoken.get_encoding(\"o200k_base\")\n", " return len(enc.encode(text))" ] }, { "cell_type": "markdown", "id": "68e91354", "metadata": {}, "source": [ "**Managing context for large repositories**\n", "\n", "In large repositories, the combined size of important files can still exceed the model's context window. The code below uses a simple budgeting strategy: it collects file contents until a token limit is reached, ensuring the request doesn't fail.\n", "\n", "For a production-grade system, a more sophisticated approach is recommended. For example, you could include the full content of the most critical files that fit, and supplement this with summaries of other important files to stay within the context limit." 
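, "\n", "\n", "The sketch below illustrates that fallback. It reuses `token_estimate` and the `typing` aliases imported earlier; the `contents` mapping, the `budget` value, and the choice to substitute a file's summary once the budget is exhausted are illustrative assumptions rather than the exact logic used in the next cell:\n", "\n", "```python\n", "def collect_with_fallback(\n", "    ordered_files: List[str],\n", "    contents: Dict[str, str],    # hypothetical map: path -> full file text\n", "    summaries: Dict[str, str],   # e.g. the file_summaries built earlier\n", "    budget: int,\n", ") -> List[Tuple[str, str]]:\n", "    \"\"\"Keeps full file text until the token budget is hit, then falls back to summaries.\"\"\"\n", "    selected: List[Tuple[str, str]] = []\n", "    used = 0\n", "    for name in ordered_files:\n", "        full_text = contents.get(name, \"\")\n", "        cost = token_estimate(full_text)\n", "        if full_text and used + cost <= budget:\n", "            selected.append((name, full_text))\n", "            used += cost\n", "        else:\n", "            # Over budget (or unreadable): include only the short summary.\n", "            selected.append((name, f\"(summary only) {summaries.get(name, '')}\"))\n", "    return selected\n", "```"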
] }, { "cell_type": "code", "execution_count": null, "id": "324a3a67-7f06-437e-a917-38eba6fba738", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "--- Step 5: Retrieving code for 6 selected files ---\n", "āœ… Retrieved content of 6 files for deep analysis.\n" ] } ], "source": [ "# --- Get code for selected files ---\n", "# The files are processed in order of importance as determined by the LLM, so\n", "# that the most critical files are most likely to be included if we hit the\n", "# context window budget.\n", "snippets: List[Tuple[str, str]] = []\n", "if important_files:\n", " print(f\"\\n--- Step 5: Retrieving code for {len(important_files)} \"\n", " f\"selected files ---\")\n", " tokens_used = 0\n", " for file_name in important_files:\n", " # It's possible the model returns paths with leading/trailing whitespace\n", " file_name = file_name.strip()\n", "\n", " fp = extracted_root / file_name\n", " if not fp.is_file():\n", " print(f\"āš ļø Selected path '{file_name}' is not a file, skipping.\")\n", " continue\n", "\n", " try:\n", " # Limit file size to avoid huge token counts for single files\n", " code = fp.read_text(encoding=\"utf-8\")[:20_000]\n", " except UnicodeDecodeError:\n", " continue\n", "\n", " token_count = token_estimate(code)\n", "\n", " # Reserve half of the context window for summaries and other prompt text\n", " if tokens_used + token_count > (CTX_WINDOW // 2):\n", " print(f\"āš ļø Context window budget reached. Stopping at \"\n", " f\"{len(snippets)} files.\")\n", " break\n", "\n", " snippets.append((file_name, code))\n", " tokens_used += token_count\n", "\n", " print(f\"āœ… Retrieved content of {len(snippets)} files for deep analysis.\")" ] }, { "cell_type": "markdown", "id": "84837f2e-46e8-4735-8ab3-fcf1ba3ee49e", "metadata": {}, "source": [ "**Deep Analysis Process**: Include full source code of selected files in context to generate:\n", "- Mermaid class diagrams\n", "- Component relationships \n", "- Architectural patterns\n", "- README-ready documentation" ] }, { "cell_type": "code", "execution_count": null, "id": "69f5b5d1-e95c-4c9d-bf87-d68f4c3cf63a", "metadata": {}, "outputs": [], "source": [ "# --- Cross-File Architectural Reasoning Function ---\n", "MAX_COMPLETION_TOKENS_ARCH = 900 # Max tokens for architecture overview\n", "\n", "def build_architecture(\n", " file_summaries: Dict[str, str], \n", " code_snippets: List[Tuple[str, str]], \n", " ctx_budget: int\n", ") -> str:\n", " \"\"\"Produces an Architecture & Key Concepts section using the large model.\"\"\"\n", " summary_lines = \"\\n\".join(f\"- **{n}**: {s}\" for n, s in file_summaries.items())\n", " prompt_sections = [\n", " \"[[FILE_SUMMARIES]]\",\n", " summary_lines,\n", " \"[[/FILE_SUMMARIES]]\",\n", " ]\n", " tokens_used = token_estimate(\"\\n\".join(prompt_sections))\n", "\n", " if code_snippets:\n", " code_block_lines = []\n", " for fname, code in code_snippets:\n", " added = \"\\n### \" + fname + \"\\n```code\\n\" + code + \"\\n```\\n\"\n", " t = token_estimate(added)\n", " if tokens_used + t > (ctx_budget // 2):\n", " break\n", " code_block_lines.append(added)\n", " tokens_used += t\n", " if code_block_lines:\n", " prompt_sections.extend(\n", " [\"[[RAW_CODE_SNIPPETS]]\"] + code_block_lines + \n", " [\"[[/RAW_CODE_SNIPPETS]]\"]\n", " )\n", "\n", " user_prompt = textwrap.dedent(\"\\n\".join(prompt_sections) + \"\"\"\n", " ---\n", " **Your tasks**\n", " 1. 
Identify the major abstractions (classes, services, data models) \n", " across the entire codebase.\n", " 2. Explain how they interact – include dependencies, data flow, and any \n", " cross-cutting concerns.\n", " 3. Output a concise *Architecture & Key Concepts* section suitable for a \n", " README, consisting of:\n", " • short Overview (≤ 3 sentences)\n", " • Mermaid diagram (`classDiagram` or `flowchart`) of components\n", " • bullet list of abstractions with brief descriptions.\n", " \"\"\")\n", "\n", " sys_prompt = (\n", " \"You are a principal software architect. Use the provided file \"\n", " \"summaries (and raw code if present) to infer high-level design. \"\n", " \"Be precise and avoid guesswork.\"\n", " )\n", " \n", " try:\n", " resp = client.chat.completions.create(\n", " model=LLM_MODEL,\n", " messages=[\n", " {\"role\": \"system\", \"content\": sys_prompt},\n", " {\"role\": \"user\", \"content\": user_prompt},\n", " ],\n", " temperature=0.2,\n", " max_tokens=MAX_COMPLETION_TOKENS_ARCH,\n", " )\n", " return resp.completion_message.content.text\n", " except Exception as e:\n", " print(f\" Error creating architecture analysis: {e}\")\n", " return \"\"" ] }, { "cell_type": "code", "execution_count": 18, "id": "09d61720-62b8-4dc7-a2c0-18c927382432", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "--- Performing cross-file architectural reasoning ---\n", "āœ… Architectural analysis complete.\n" ] } ], "source": [ "print(\"\\n--- Performing cross-file architectural reasoning ---\")\n", "architecture_section = build_architecture(\n", " file_summaries, snippets, CTX_WINDOW\n", ")\n", "print(\"āœ… Architectural analysis complete.\")" ] }, { "cell_type": "code", "execution_count": 19, "id": "5d9036ca-9aef-47f9-8c02-fe86f48ba427", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "## Architecture & Key Concepts\n", "\n", "### Overview\n", "\n", "The Llama project is a large language model implementation that provides a simple and efficient way to generate text based on given prompts. The project consists of several key components, including a Transformer-based model, a tokenizer, and a generation module. These components work together to enable text completion and chat completion tasks.\n", "\n", "### Mermaid Diagram\n", "\n", "```mermaid\n", "classDiagram\n", " class Llama {\n", " +build(ckpt_dir, tokenizer_path, max_seq_len, max_batch_size)\n", " +text_completion(prompts, temperature, top_p, max_gen_len, logprobs, echo)\n", " +chat_completion(dialogs, temperature, top_p, max_gen_len, logprobs)\n", " }\n", " class Transformer {\n", " +forward(tokens, start_pos)\n", " }\n", " class Tokenizer {\n", " +encode(s, bos, eos)\n", " +decode(t)\n", " }\n", " class ModelArgs {\n", " +dim\n", " +n_layers\n", " +n_heads\n", " +n_kv_heads\n", " +vocab_size\n", " +multiple_of\n", " +ffn_dim_multiplier\n", " +norm_eps\n", " +max_batch_size\n", " +max_seq_len\n", " }\n", " Llama --> Transformer\n", " Llama --> Tokenizer\n", " Transformer --> ModelArgs\n", "```\n", "\n", "### Abstractions and Descriptions\n", "\n", "* **Llama**: The main class that provides a simple interface for text completion and chat completion tasks. It uses a Transformer-based model and a tokenizer to generate text.\n", "* **Transformer**: A Transformer-based model that takes in token IDs and outputs logits. 
It consists of multiple layers, each with an attention mechanism and a feedforward network.\n", "* **Tokenizer**: A class that tokenizes and encodes/decodes text using SentencePiece.\n", "* **ModelArgs**: A dataclass that stores the model configuration parameters, such as the dimension, number of layers, and vocabulary size.\n", "* **Dialog**: A list of messages, where each message is a dictionary with a role and content.\n", "* **Message**: A dictionary with a role and content.\n", "\n", "## Interaction and Dependencies\n", "\n", "The Llama class depends on the Transformer and Tokenizer classes. The Transformer class depends on the ModelArgs dataclass. The Llama class uses the Transformer and Tokenizer classes to generate text.\n", "\n", "The data flow is as follows:\n", "\n", "1. The Llama class takes in a prompt or a dialog and tokenizes it using the Tokenizer class.\n", "2. The tokenized prompt or dialog is then passed to the Transformer class, which outputs logits.\n", "3. The logits are then used to generate text, which is returned by the Llama class.\n", "\n", "Cross-cutting concerns include:\n", "\n", "* **Model parallelism**: The Transformer class uses model parallelism to speed up computation.\n", "* **Caching**: The Transformer class caches the keys and values for attention to reduce computation.\n", "* **Error handling**: The Llama class and Transformer class handle errors, such as invalid input or out-of-range values.\n", "\n", "## Key Components and Their Responsibilities\n", "\n", "* **Llama**: Provides a simple interface for text completion and chat completion tasks.\n", "* **Transformer**: Implements the Transformer-based model for generating text.\n", "* **Tokenizer**: Tokenizes and encodes/decodes text using SentencePiece.\n", "* **ModelArgs**: Stores the model configuration parameters.\n", "\n", "## Generation Module\n", "\n", "The generation module is responsible for generating text based on given prompts. It uses the Transformer class and the Tokenizer class to generate text.\n", "\n", "The generation module provides two main functions:\n", "\n", "* **text_completion**: Generates text completions for a list of prompts.\n", "* **chat_completion**: Generates assistant responses for a list of conversational dialogs.\n", "\n", "These functions take in parameters such as temperature, top-p, and maximum generation length to control the generation process.\n", "\n", "## Conclusion\n", "\n", "The Llama project provides a simple and efficient way to generate text based on given prompts. The project consists of several key components, including a Transformer-based model, a tokenizer, and a generation module. These components work together to enable text completion and chat completion tasks.\n" ] } ], "source": [ "print(architecture_section)" ] }, { "cell_type": "markdown", "id": "68646af1-e362-4d1d-8a8d-ed311e54145b", "metadata": {}, "source": [ "## Step 5: Assemble final documentation\n", "\n", "The final phase assembles all the AI-generated content into a single, comprehensive `README.md` file. The goal is to create a document that is not only informative but also easy for developers to navigate and use.\n", "\n", "### Documentation structure\n", "\n", "The generated README follows a layered approach that enables readers to consume information at their preferred level of detail.\n", "\n", "1. **Repository Summary**: A high-level overview gives developers an immediate understanding of the project's purpose.\n", "2. 
**Architecture and Key Concepts**: A deeper technical analysis, including a Mermaid diagram, helps developers understand how the system is designed.\n", "3. **File Summaries**: A detailed breakdown of each component provides granular information for those who need it.\n", "4. **Attribution**: A concluding note clarifies that the document was generated by AI, which provides transparency about its origin.\n", "\n", "> **šŸŽÆ** The combination of Llama 4's code intelligence and large context window enables the automated generation of thorough, high-quality documentation that rivals manually-created content, requiring minimal human intervention." ] }, { "cell_type": "code", "execution_count": 20, "id": "a0594f6f-4dcb-4510-9f4b-aea1b892f6be", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "āœļø Writing final README to /Users/saip/Documents/GitHub/meta-documentation-shared/notebooks/Generated_README_llama-main.md...\n", "\n", "\n", "šŸŽ‰ Success! Documentation generated at: /Users/saip/Documents/GitHub/meta-documentation-shared/notebooks/Generated_README_llama-main.md\n" ] } ], "source": [ "OUTPUT_DIR = Path.cwd()\n", "readme_path = OUTPUT_DIR / f\"Generated_README_{extracted_root.name}.md\"\n", "print(f\"\\nāœļø Writing final README to {readme_path.resolve()}...\")\n", "with readme_path.open(\"w\", encoding=\"utf-8\") as fh:\n", " fh.write(f\"# Repository Summary for `{extracted_root.name}`\\n\\n\"\n", " f\"{repo_overview}\\n\\n\")\n", " fh.write(\"## Architecture & Key Concepts\\n\\n\")\n", " fh.write(architecture_section.strip() + \"\\n\\n\")\n", " fh.write(\"## File Summaries\\n\\n\")\n", " for n, s in sorted(file_summaries.items()):\n", " fh.write(f\"- **{n}** – {s}\\n\")\n", " fh.write(\n", " \"\\n---\\n*This README was generated automatically using \"\n", " \"Meta's **Llama 4** models.*\"\n", " )\n", "\n", "print(f\"\\n\\nšŸŽ‰ Success! Documentation generated at: \"\n", " f\"{readme_path.resolve()}\")" ] }, { "cell_type": "code", "execution_count": 21, "id": "82caa4b0-3f9f-4749-802a-348c941ed386", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "# Repository Summary for `llama-main`\n", "\n", "Here is a high-level overview for the root of a README.md:\n", "\n", "## Overview\n", "\n", "This repository provides a comprehensive framework for utilizing the Llama large language model, including model architecture, training data, and example usage. The project aims to facilitate the development of natural language processing applications, while promoting responsible use and community engagement. By providing a range of tools and resources, this repository enables developers and researchers to explore the capabilities and limitations of the Llama model. 
The repository is structured to support easy integration, modification, and extension of the model.\n", "\n", "## Key Components\n", "\n", "* **llama/generation.py**: Core logic for text generation using the Llama model\n", "* **llama/model.py**: Transformer-based model architecture definition\n", "* **llama/tokenizer.py**: Tokenizer class using SentencePiece for text encoding and decoding\n", "* **example_text_completion.py**: Example usage of the Llama model for text completion tasks\n", "* **example_chat_completion.py**: Example usage of the Llama model for conversational tasks\n", "* **requirements.txt**: Dependency specifications for project setup and installation\n", "\n", "## Getting Started\n", "\n", "To get started with this project, run `pip install -r requirements.txt` to install the required dependencies. You can then explore the example usage files, such as `example_text_completion.py` and `example_chat_completion.py`, to learn more about integrating the Llama model into your projects.\n", "\n", "## Architecture & Key Concepts\n", "\n", "## Architecture & Key Concepts\n", "\n", "### Overview\n", "\n", "The Llama project is a large language model implementation that provides a simple and efficient way to generate text based on given prompts. The project consists of several key components, including a Transformer-based model, a tokenizer, and a generation module. These components work together to enable text completion and chat completion tasks.\n", "\n", "### Mermaid Diagram\n", "\n", "```mermaid\n", "classDiagram\n", " class Llama {\n", " +build(ckpt_dir, tokenizer_path, max_seq_len, max_batch_size)\n", " +text_completion(prompts, temperature, top_p, max_gen_len, logprobs, echo)\n", " +chat_completion(dialogs, temperature, top_p, max_gen_len, logprobs)\n", " }\n", " class Transformer {\n", " +forward(tokens, start_pos)\n", " }\n", " class Tokenizer {\n", " +encode(s, bos, eos)\n", " +decode(t)\n", " }\n", " class ModelArgs {\n", " +dim\n", " +n_layers\n", " +n_heads\n", " +n_kv_heads\n", " +vocab_size\n", " +multiple_of\n", " +ffn_dim_multiplier\n", " +norm_eps\n", " +max_batch_size\n", " +max_seq_len\n", " }\n", " Llama --> Transformer\n", " Llama --> Tokenizer\n", " Transformer --> ModelArgs\n", "```\n", "\n", "### Abstractions and Descriptions\n", "\n", "* **Llama**: The main class that provides a simple interface for text completion and chat completion tasks. It uses a Transformer-based model and a tokenizer to generate text.\n", "* **Transformer**: A Transformer-based model that takes in token IDs and outputs logits. It consists of multiple layers, each with an attention mechanism and a feedforward network.\n", "* **Tokenizer**: A class that tokenizes and encodes/decodes text using SentencePiece.\n", "* **ModelArgs**: A dataclass that stores the model configuration parameters, such as the dimension, number of layers, and vocabulary size.\n", "* **Dialog**: A list of messages, where each message is a dictionary with a role and content.\n", "* **Message**: A dictionary with a role and content.\n", "\n", "## Interaction and Dependencies\n", "\n", "The Llama class depends on the Transformer and Tokenizer classes. The Transformer class depends on the ModelArgs dataclass. The Llama class uses the Transformer and Tokenizer classes to generate text.\n", "\n", "The data flow is as follows:\n", "\n", "1. The Llama class takes in a prompt or a dialog and tokenizes it using the Tokenizer class.\n", "2. 
The tokenized prompt or dialog is then passed to the Transformer class, which outputs logits.\n", "3. The logits are then used to generate text, which is returned by the Llama class.\n", "\n", "Cross-cutting concerns include:\n", "\n", "* **Model parallelism**: The Transformer class uses model parallelism to speed up computation.\n", "* **Caching**: The Transformer class caches the keys and values for attention to reduce computation.\n", "* **Error handling**: The Llama class and Transformer class handle errors, such as invalid input or out-of-range values.\n", "\n", "## Key Components and Their Responsibilities\n", "\n", "* **Llama**: Provides a simple interface for text completion and chat completion tasks.\n", "* **Transformer**: Implements the Transformer-based model for generating text.\n", "* **Tokenizer**: Tokenizes and encodes/decodes text using SentencePiece.\n", "* **ModelArgs**: Stores the model configuration parameters.\n", "\n", "## Generation Module\n", "\n", "The generation module is responsible for generating text based on given prompts. It uses the Transformer class and the Tokenizer class to generate text.\n", "\n", "The generation module provides two main functions:\n", "\n", "* **text_completion**: Generates text completions for a list of prompts.\n", "* **chat_completion**: Generates assistant responses for a list of conversational dialogs.\n", "\n", "These functions take in parameters such as temperature, top-p, and maximum generation length to control the generation process.\n", "\n", "## Conclusion\n", "\n", "The Llama project provides a simple and efficient way to generate text based on given prompts. The project consists of several key components, including a Transformer-based model, a tokenizer, and a generation module. These components work together to enable text completion and chat completion tasks.\n", "\n", "## File Summaries\n", "\n", "- **CODE_OF_CONDUCT.md** – The `CODE_OF_CONDUCT.md` file outlines the expected behavior and standards for contributors and maintainers of the project, aiming to create a harassment-free and welcoming environment. It defines acceptable and unacceptable behavior, roles and responsibilities, and procedures for reporting and addressing incidents, promoting a positive and inclusive community.\n", "- **CONTRIBUTING.md** – Here is a concise summary of the `CONTRIBUTING.md` file:\n", "\n", "The `CONTRIBUTING.md` file outlines the guidelines and processes for contributing to the Llama project. It provides instructions for submitting pull requests, including bug fixes, improvements, and new features, as well as information on the Contributor License Agreement, issue tracking, and licensing terms, to ensure a smooth and transparent contribution experience.\n", "- **MODEL_CARD.md** – The `MODEL_CARD.md` file provides detailed information about the Llama 2 family of large language models (LLMs), including model architecture, training data, performance evaluations, and intended use cases. It serves as a comprehensive model card, outlining the model's capabilities, limitations, and responsible use guidelines for developers and researchers.\n", "- **README.md** – This `README.md` file serves as a deprecated repository for Llama 2, a large language model, providing minimal examples for loading models and running inference. 
It directs users to new, consolidated repositories for Llama 3.1 and offers guidance on downloading models, quick start instructions, and responsible use guidelines.\n", "- **UPDATES.md** – Here is a concise summary of the `UPDATES.md` file:\n", "\n", "The `UPDATES.md` file documents recent updates to the project, specifically addressing issues with system prompts and token sanitization. Updates aim to reduce false refusal rates and prevent prompt injection attacks, enhancing model safety and security. Changes include removing default system prompts and sanitizing user-provided prompts to mitigate abuse.\n", "- **USE_POLICY.md** – Here is a concise summary of the `USE_POLICY.md` file:\n", "\n", "The Llama 2 Acceptable Use Policy outlines the guidelines for safe and responsible use of the Llama 2 tool. It prohibits uses that violate laws, harm individuals or groups, or facilitate malicious activities, and requires users to report any policy violations, bugs, or concerns to designated channels.\n", "- **download.sh** – The `download.sh` script downloads Llama 2 models and associated files from a provided presigned URL. It prompts for a URL and optional model sizes, then downloads the models, tokenizer, LICENSE, and usage policy to a target folder, verifying checksums for integrity.\n", "- **example_chat_completion.py** – This file, `example_chat_completion.py`, demonstrates how to use a pretrained Llama model for generating text in a conversational setting. It defines a `main` function that takes in model checkpoints, tokenizer paths, and generation parameters, and uses them to generate responses to a set of predefined dialogs. The file serves as an example for chat completion tasks in the broader project.\n", "- **example_text_completion.py** – This file, `example_text_completion.py`, demonstrates text generation using a pretrained Llama model. The `main` function initializes the model, generates text completions for a set of prompts, and prints the results. It showcases the model's capabilities in natural language continuation and translation tasks, serving as an example for integrating Llama into broader projects.\n", "- **llama/__init__.py** – The `llama/__init__.py` file serves as the entry point for the Llama project, exposing key classes and modules. It imports and makes available the main `Llama` and `Dialog` generation classes, `ModelArgs` and `Transformer` model components, and the `Tokenizer` class, providing a foundation for the project's functionality.\n", "- **llama/generation.py** – The `llama/generation.py` file contains the core logic for text generation using the Llama model. It defines the `Llama` class, which provides methods for building a model instance, generating text completions, and handling conversational dialogs. The class supports features like nucleus sampling, log probability computation, and special token handling.\n", "- **llama/model.py** – The `llama/model.py` file defines a Transformer-based model architecture, specifically the Llama model. It includes key components such as RMSNorm, attention mechanisms, feedforward layers, and a Transformer block, which are combined to form the overall model. The model is designed for efficient and scalable training and inference.\n", "- **llama/tokenizer.py** – The `llama/tokenizer.py` file implements a tokenizer class using SentencePiece, enabling text tokenization and encoding/decoding. 
The `Tokenizer` class loads a SentencePiece model, providing `encode` and `decode` methods for converting text to token IDs and vice versa, with optional BOS and EOS tokens.\n", "- **requirements.txt** – Here is a concise summary of the `requirements.txt` file:\n", "\n", "The `requirements.txt` file specifies the dependencies required to run the project. It lists essential libraries, including PyTorch, Fairscale, Fire, and SentencePiece, which provide core functionality for the project. This file ensures that all necessary packages are installed, enabling the project's features and functionality to work as intended.\n", "- **setup.py** – The `setup.py` file is a build script that packages and distributes the project. Its primary purpose is to define project metadata and dependencies. It uses `setuptools` to find and include packages, and loads required libraries from `requirements.txt`, enabling easy installation and setup of the project.\n", "\n", "---\n", "*This README was generated automatically using Meta's **Llama 4** models.*" ] } ], "source": [ "!cat $readme_path" ] }, { "cell_type": "code", "execution_count": 22, "id": "6a39ab8e-b3c3-455f-8dfd-9df80d8acf3f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "--- Cleaning up temporary directory /var/folders/sz/kf8w7j1x1v790jxs8k2gl72c0000gn/T/tmptwo_kdt5 ---\n", "āœ… Cleanup complete.\n" ] } ], "source": [ "print(f\"\\n--- Cleaning up temporary directory {tmpdir} ---\")\n", "try:\n", " tmpdir_obj.cleanup()\n", " print(\"āœ… Cleanup complete.\")\n", "except Exception as e:\n", " print(f\"āš ļø Error during cleanup: {e}\")" ] }, { "cell_type": "markdown", "id": "bce59302-6b03-403c-8399-22d17b824d97", "metadata": {}, "source": [ "## Next steps and upgrade paths\n", "\n", "This tutorial provides a solid foundation for automated documentation generation. You can extend it in several ways for a production-grade application.\n", "\n", "| Need | Recommended approach |\n", "| :----------------------------- | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |\n", "| **Private repositories** | For private GitHub repos, use authenticated requests with a personal access token. For GitLab or Bitbucket, adapt the download logic to their respective APIs. |\n", "| **Multiple languages** | Extend the `INCLUDE_EXTENSIONS` list and adjust prompts to handle language-specific documentation patterns. Consider using language-specific parsers for better code understanding. |\n", "| **Incremental updates** | Implement caching of file summaries with timestamps. Only reprocess files that have changed since the last run, significantly reducing API costs for large repositories. |\n", "| **Custom documentation formats** | Adapt the final assembly phase to generate different formats such as API documentation, developer guides, or architecture decision records (ADRs). |\n", "| **CI/CD integration** | Run the documentation generator as part of your continuous integration pipeline to keep documentation automatically synchronized with code changes. |\n", "| **Multi-repository analysis** | Extend the pipeline to analyze dependencies and generate documentation for entire microservice architectures or monorepos. 
|\n" ] } ], "metadata": { "kernelspec": { "display_name": "My Project (uv)", "language": "python", "name": "my-uv-project" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.7" } }, "nbformat": 4, "nbformat_minor": 5 }