瀏覽代碼

Merge pull request #16 from meta-llama/lg_inference_update

LG 1B and 11B vision inference update
albertodepaola 9 月之前
父節點
當前提交
b330247b76

+ 1 - 1
recipes/responsible_ai/README.md

@@ -4,7 +4,7 @@ The [Purple Llama](https://github.com/meta-llama/PurpleLlama/) project provides
 
 
 | Tool/Model | Description | Get Started
 | Tool/Model | Description | Get Started
 |---|---|---|
 |---|---|---|
-[Llama Guard](https://llama.meta.com/docs/model-cards-and-prompt-formats/llama-guard-3) | Provide guardrailing on inputs and outputs | [Inference](./llama_guard/inference.py), [Finetuning](./llama_guard/llama_guard_customization_via_prompting_and_fine_tuning.ipynb)
+[Llama Guard](https://llama.meta.com/docs/model-cards-and-prompt-formats/llama-guard-3) | Provide guardrailing on inputs and outputs | [Inference](./llama_guard/llama_guard_text_and_vision_inference.ipynb), [Finetuning](./llama_guard/llama_guard_customization_via_prompting_and_fine_tuning.ipynb)
 [Prompt Guard](https://llama.meta.com/docs/model-cards-and-prompt-formats/prompt-guard) | Model to safeguards against jailbreak attempts and embedded prompt injections | [Notebook](./prompt_guard/prompt_guard_tutorial.ipynb)
 [Prompt Guard](https://llama.meta.com/docs/model-cards-and-prompt-formats/prompt-guard) | Model to safeguards against jailbreak attempts and embedded prompt injections | [Notebook](./prompt_guard/prompt_guard_tutorial.ipynb)
 [Code Shield](https://github.com/meta-llama/PurpleLlama/tree/main/CodeShield) | Tool to safeguard against insecure code generated by the LLM | [Notebook](https://github.com/meta-llama/PurpleLlama/blob/main/CodeShield/notebook/CodeShieldUsageDemo.ipynb)
 [Code Shield](https://github.com/meta-llama/PurpleLlama/tree/main/CodeShield) | Tool to safeguard against insecure code generated by the LLM | [Notebook](https://github.com/meta-llama/PurpleLlama/blob/main/CodeShield/notebook/CodeShieldUsageDemo.ipynb)
 
 

+ 5 - 55
recipes/responsible_ai/llama_guard/README.md

@@ -2,62 +2,12 @@
 <!-- markdown-link-check-disable -->
 <!-- markdown-link-check-disable -->
 Meta Llama Guard is a language model that provides input and output guardrails for LLM inference. For more details and model cards, please visit the [PurpleLlama](https://github.com/meta-llama/PurpleLlama) repository.
 Meta Llama Guard is a language model that provides input and output guardrails for LLM inference. For more details and model cards, please visit the [PurpleLlama](https://github.com/meta-llama/PurpleLlama) repository.
 
 
-This folder contains an example file to run inference with a locally hosted model, either using the Hugging Face Hub or a local path.
+This [notebook](llama_guard_text_and_vision_inference.ipynb) shows how to load the models with the transformers library and how to customize the categories.
 
 
 ## Requirements
 ## Requirements
-1. Access to Llama guard model weights on Hugging Face. To get access, follow the steps described [here](https://github.com/facebookresearch/PurpleLlama/tree/main/Llama-Guard#download)
-2. Llama recipes package and it's dependencies [installed](https://github.com/meta-llama/llama-recipes?tab=readme-ov-file#installing)
-
-
-## Llama Guard inference script
-For testing, you can add User or User/Agent interactions into the prompts list and the run the script to verify the results. When the conversation has one or more Agent responses, it's considered of type agent.
-
-
-```
-    prompts: List[Tuple[List[str], AgentType]] = [
-        (["<Sample user prompt>"], AgentType.USER),
-
-        (["<Sample user prompt>",
-        "<Sample agent response>"], AgentType.AGENT),
-
-        (["<Sample user prompt>",
-        "<Sample agent response>",
-        "<Sample user reply>",
-        "<Sample agent response>",], AgentType.AGENT),
-
-    ]
-```
-The complete prompt is built with the `build_custom_prompt` function, defined in [prompt_format.py](../../../src/llama_recipes/inference/prompt_format_utils.py). The file contains the default Meta Llama Guard categories. These categories can adjusted and new ones can be added, as described in the [research paper](https://ai.meta.com/research/publications/llama-guard-llm-based-input-output-safeguard-for-human-ai-conversations/), on section 4.5 Studying the adaptability of the model.
-<!-- markdown-link-check-enable -->
-
-To run the samples, with all the dependencies installed, execute this command:
-
-`python recipes/responsible_ai/llama_guard/inference.py`
-
-This is the output:
-
-```
-['<Sample user prompt>']
-> safe
-
-==================================
-
-['<Sample user prompt>', '<Sample agent response>']
-> safe
-
-==================================
-
-['<Sample user prompt>', '<Sample agent response>', '<Sample user reply>', '<Sample agent response>']
-> safe
-
-==================================
-```
-
-To run it with a local model, you can use the `model_id` param in the inference script:
-
-`python recipes/responsible_ai/llama_guard/inference.py --model_id=/home/ubuntu/models/llama3/Llama-Guard-3-8B/ --llama_guard_version=LLAMA_GUARD_3`
-
-Note: Make sure to also add the llama_guard_version; by default it uses LLAMA_GUARD_3
+1. Access to Llama guard model weights on Hugging Face. To get access, follow the steps described in the top of the model card in [Hugging Face](https://huggingface.co/meta-llama/Llama-Guard-3-1B)
+2. Llama recipes package and its dependencies [installed](https://github.com/meta-llama/llama-recipes?tab=readme-ov-file#installing)
+3. Pillow package installed
 
 
 ## Inference Safety Checker
 ## Inference Safety Checker
 When running the regular inference script with prompts, Meta Llama Guard will be used as a safety checker on the user prompt and the model output. If both are safe, the result will be shown, else a message with the error will be shown, with the word unsafe and a comma separated list of categories infringed. Meta Llama Guard is always loaded quantized using Hugging Face Transformers library with bitsandbytes.
 When running the regular inference script with prompts, Meta Llama Guard will be used as a safety checker on the user prompt and the model output. If both are safe, the result will be shown, else a message with the error will be shown, with the word unsafe and a comma separated list of categories infringed. Meta Llama Guard is always loaded quantized using Hugging Face Transformers library with bitsandbytes.
@@ -66,7 +16,7 @@ In this case, the default categories are applied by the tokenizer, using the `ap
 
 
 Use this command for testing with a quantized Llama model, modifying the values accordingly:
 Use this command for testing with a quantized Llama model, modifying the values accordingly:
 
 
-`python examples/inference.py --model_name <path_to_regular_llama_model> --prompt_file <path_to_prompt_file> --quantization 8bit --enable_llamaguard_content_safety`
+`python inference.py --model_name <path_to_regular_llama_model> --prompt_file <path_to_prompt_file> --enable_llamaguard_content_safety`
 
 
 ## Llama Guard 3 Finetuning & Customization
 ## Llama Guard 3 Finetuning & Customization
 The safety categories in Llama Guard 3 can be tuned for specific application needs. Existing categories can be removed and new categories can be added to the taxonomy. The [Llama Guard Customization](./llama_guard_customization_via_prompting_and_fine_tuning.ipynb) notebook walks through the process.
 The safety categories in Llama Guard 3 can be tuned for specific application needs. Existing categories can be removed and new categories can be added to the taxonomy. The [Llama Guard Customization](./llama_guard_customization_via_prompting_and_fine_tuning.ipynb) notebook walks through the process.

+ 0 - 75
recipes/responsible_ai/llama_guard/inference.py

@@ -1,75 +0,0 @@
-# Copyright (c) Meta Platforms, Inc. and affiliates.
-# This software may be used and distributed according to the terms of the Llama 2 Community License Agreement.
-
-import fire
-from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
-
-
-from llama_recipes.inference.prompt_format_utils import build_default_prompt, create_conversation, LlamaGuardVersion
-from typing import List, Tuple
-from enum import Enum
-
-class AgentType(Enum):
-    AGENT = "Agent"
-    USER = "User"
-
-def main(
-    model_id: str = "meta-llama/Llama-Guard-3-8B",
-    llama_guard_version: str = "LLAMA_GUARD_3"
-):
-    """
-    Entry point for Llama Guard inference sample script.
-
-    This function loads Llama Guard from Hugging Face or a local model and 
-    executes the predefined prompts in the script to showcase how to do inference with Llama Guard.
-
-    Args:
-        model_id (str): The ID of the pretrained model to use for generation. This can be either the path to a local folder containing the model files,
-            or the repository ID of a model hosted on the Hugging Face Hub. Defaults to 'meta-llama/LlamaGuard-7b'.
-        llama_guard_version (LlamaGuardVersion): The version of the Llama Guard model to use for formatting prompts. Defaults to LLAMA_GUARD_1.
-    """
-    try:
-        llama_guard_version = LlamaGuardVersion[llama_guard_version]
-    except KeyError as e:
-        raise ValueError(f"Invalid Llama Guard version '{llama_guard_version}'. Valid values are: {', '.join([lgv.name for lgv in LlamaGuardVersion])}") from e
-
-    prompts: List[Tuple[List[str], AgentType]] = [
-        (["<Sample user prompt>"], AgentType.USER),
-
-        (["<Sample user prompt>",
-        "<Sample agent response>"], AgentType.AGENT),
-        
-        (["<Sample user prompt>",
-        "<Sample agent response>",
-        "<Sample user reply>",
-        "<Sample agent response>",], AgentType.AGENT),
-
-    ]
-
-    quantization_config = BitsAndBytesConfig(load_in_8bit=True)
-
-    tokenizer = AutoTokenizer.from_pretrained(model_id)
-    model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=quantization_config, device_map="auto")
-    
-    for prompt in prompts:
-        formatted_prompt = build_default_prompt(
-                prompt[1], 
-                create_conversation(prompt[0]),
-                llama_guard_version)
-
-
-        input = tokenizer([formatted_prompt], return_tensors="pt").to("cuda")
-        prompt_len = input["input_ids"].shape[-1]
-        output = model.generate(**input, max_new_tokens=100, pad_token_id=0)
-        results = tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True)
-       
-        
-        print(prompt[0])
-        print(f"> {results}")
-        print("\n==================================\n")
-
-if __name__ == "__main__":
-    try:
-        fire.Fire(main)
-    except Exception as e:
-        print(e)

文件差異過大導致無法顯示
+ 576 - 0
recipes/responsible_ai/llama_guard/llama_guard_text_and_vision_inference.ipynb


二進制
recipes/responsible_ai/llama_guard/resources/dog.jpg


二進制
recipes/responsible_ai/llama_guard/resources/pasta.jpeg