This folder contains a notebook with examples of running Llama model inference on Azure's serverless API offerings. We will cover:
- HTTP API usage for Llama 3 instruct models from the CLI
- HTTP API usage for Llama 3 instruct models in Python
- Plugging the APIs into LangChain
- Wiring the model into Gradio to build a simple chatbot with memory
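As a preview of the Python HTTP usage covered in the notebook, the sketch below builds a chat-completions request against a serverless endpoint using only the standard library. The endpoint URL, API key, and the `/v1/chat/completions` route are placeholders/assumptions here — copy the actual values from your own Azure deployment page.

```python
import json
import urllib.request

# Placeholder values -- replace with the "Endpoint" and "Key" shown
# on your Azure serverless deployment page.
ENDPOINT = "https://<your-deployment>.<region>.models.ai.azure.com"
API_KEY = "<your-api-key>"

def build_chat_request(messages, max_tokens=256, temperature=0.7):
    """Build an HTTP POST request for the chat-completions route
    (route and payload shape assumed to follow the OpenAI-style
    chat-completions convention used by serverless endpoints)."""
    body = json.dumps({
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{ENDPOINT}/v1/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )

req = build_chat_request([{"role": "user", "content": "What is Llama 3?"}])
# To actually send it (requires a live endpoint and valid key):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

The notebook walks through the same call with higher-level clients; this version is only meant to show the raw request shape.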