
# Amazon SageMaker deployment

Deploy Llama models using Amazon SageMaker with GPU instances.

## Overview

This Terraform configuration sets up a basic example deployment that demonstrates how to deploy and serve foundation models on Amazon SageMaker. SageMaker provides managed inference endpoints with auto-scaling capabilities.

This example shows how to use basic services such as:

  • IAM roles for permissions management
  • Service accounts for fine-grained access control
  • Connecting model artifacts in S3 with SageMaker for deployment
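For orientation, the resources in main.tf correspond roughly to three SageMaker API calls: register a model from the container image and the S3 artifacts, define an endpoint configuration, and create the endpoint. The following is a minimal boto3 sketch of that sequence, not the configuration itself; every name, the image URI, and the role ARN below are placeholders.

```python
import boto3

sm = boto3.client('sagemaker', region_name='us-east-1')

# 1. Register the model: container image + model artifacts in S3
#    (image URI, S3 path, and role ARN are placeholders)
sm.create_model(
    ModelName='llama-example-model',
    PrimaryContainer={
        'Image': '<account>.dkr.ecr.us-east-1.amazonaws.com/<repo>:<tag>',
        'ModelDataUrl': 's3://your-bucket/model/model.tar.gz',
    },
    ExecutionRoleArn='arn:aws:iam::<account>:role/<sagemaker-execution-role>',
)

# 2. Describe how the model should be hosted
sm.create_endpoint_config(
    EndpointConfigName='llama-example-config',
    ProductionVariants=[{
        'VariantName': 'AllTraffic',
        'ModelName': 'llama-example-model',
        'InstanceType': 'ml.p4d.24xlarge',
        'InitialInstanceCount': 1,
    }],
)

# 3. Create the managed inference endpoint
sm.create_endpoint(
    EndpointName='llama-example-endpoint',
    EndpointConfigName='llama-example-config',
)
```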

Our architecture patterns for private cloud guide outlines advanced patterns you may choose to implement in a more complete cloud deployment. These include:

  • Deployment into multiple regions or clouds
  • Managed keys/secrets services
  • Comprehensive logging systems for auditing and compliance
  • Backup and recovery systems

## Getting started

### Prerequisites

  • AWS account with access to Amazon SageMaker
  • Terraform installed
  • Model artifacts packaged as tar.gz (see model setup below)
  • Container image (AWS pre-built or custom ECR)
  • A Hugging Face account with access to the appropriate models (such as Llama 3.2 1B or Llama 3.3 70B)
  • GPU quota: Request quota increase for ml.p4d.24xlarge instances via AWS Service Quotas (default is 0)
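To confirm that the quota increase has been applied, you can query the Service Quotas API. A rough sketch using the boto3 `service-quotas` client; the substring filter is an assumption about how the ml.p4d.24xlarge endpoint quota is labeled, so adjust it to match what `list_service_quotas` actually returns in your account.

```python
import boto3

# List applied SageMaker service quotas and pick out p4d endpoint usage.
# Quota names are matched by substring because the exact label may vary.
client = boto3.client('service-quotas', region_name='us-east-1')

paginator = client.get_paginator('list_service_quotas')
for page in paginator.paginate(ServiceCode='sagemaker'):
    for quota in page['Quotas']:
        name = quota['QuotaName']
        if 'ml.p4d.24xlarge' in name and 'endpoint' in name.lower():
            print(name, '=>', quota['Value'])
```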

### Deploy

1. Configure AWS credentials:

   ```bash
   aws configure
   ```

2. Prepare Llama model artifacts:

   ```bash
   # Download the model using the Hugging Face CLI
   pip install huggingface-hub
   huggingface-cli download meta-llama/Llama-3.2-1B-Instruct --local-dir ./model

   # Package for Amazon SageMaker
   tar -czf model.tar.gz -C model .
   aws s3 cp model.tar.gz s3://your-bucket/model/
   ```

3. Create the configuration:

   ```bash
   cd terraform/amazon-sagemaker-default
   cp terraform.tfvars.example terraform.tfvars
   ```

4. Edit terraform.tfvars with your model S3 path and other variables.

5. Deploy:

   ```bash
   terraform init
   terraform plan
   terraform apply
   ```

## Usage

```python
import boto3
import json

# Invoke the deployed endpoint through the SageMaker runtime
client = boto3.client('sagemaker-runtime', region_name='us-east-1')

response = client.invoke_endpoint(
    EndpointName='your-endpoint-name',
    ContentType='application/json',
    Body=json.dumps({
        "inputs": "Hello, how are you?",
        "parameters": {
            "max_new_tokens": 256,
            "temperature": 0.7
        }
    })
)

# The response body is a streaming object; read and decode the JSON payload
result = json.loads(response['Body'].read())
print(result)
```
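If your serving container supports response streaming (Hugging Face TGI containers do, for example), you can also stream tokens as they are generated. A sketch, assuming the container honors a TGI-style `"stream": true` flag; check your container's documentation for the exact payload format.

```python
import boto3
import json

client = boto3.client('sagemaker-runtime', region_name='us-east-1')

response = client.invoke_endpoint_with_response_stream(
    EndpointName='your-endpoint-name',
    ContentType='application/json',
    Body=json.dumps({
        "inputs": "Hello, how are you?",
        "parameters": {"max_new_tokens": 256, "temperature": 0.7},
        "stream": True,  # assumption: container accepts a TGI-style stream flag
    })
)

# The response body is an event stream of PayloadPart chunks
for event in response['Body']:
    if 'PayloadPart' in event:
        print(event['PayloadPart']['Bytes'].decode('utf-8'), end='')
print()
```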

## Next steps