Deploy Llama models using Amazon SageMaker with GPU instances.
This Terraform configuration sets up a basic example deployment, demonstrating how to deploy/serve foundation models using Amazon SageMaker. Amazon SageMaker provides managed inference endpoints with auto-scaling capabilities.
This example shows how to use basic services such as Amazon SageMaker real-time inference endpoints and Amazon S3 for storing model artifacts.
In our architecture patterns for private cloud guide, we outline advanced patterns for cloud deployment that you may choose to implement in a more complete deployment.
Prerequisites:

- Llama model artifacts packaged as a model.tar.gz (see model setup below)
- A service quota for ml.p4d.24xlarge instances requested via AWS Service Quotas (the default is 0)

1. Configure AWS credentials:

```bash
aws configure
```
2. Prepare Llama model artifacts:

```bash
pip install huggingface-hub
huggingface-cli download meta-llama/Llama-3.2-1B-Instruct --local-dir ./model

# Package for Amazon SageMaker
tar -czf model.tar.gz -C model .
aws s3 cp model.tar.gz s3://your-bucket/model/
```
3. Create configuration:

```bash
cd terraform/amazon-sagemaker-default
cp terraform.tfvars.example terraform.tfvars
```

Edit terraform.tfvars with your model S3 path and other variables.
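A terraform.tfvars along these lines should work; the variable names below are assumptions for illustration, and the authoritative list is in variables.tf and terraform.tfvars.example:

```hcl
# Hypothetical values; variable names must match variables.tf.
model_s3_path = "s3://your-bucket/model/model.tar.gz"
instance_type = "ml.p4d.24xlarge"
aws_region    = "us-east-1"
```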
4. Deploy:

```bash
terraform init
terraform plan
terraform apply
```
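The invocation example below needs the endpoint name, which this configuration presumably exposes through outputs.tf. A minimal sketch of such an output, assumed rather than copied from the repo:

```hcl
# Hypothetical output; check outputs.tf for the actual names.
output "endpoint_name" {
  description = "Name of the deployed SageMaker endpoint"
  value       = aws_sagemaker_endpoint.llama.name
}
```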
Invoke the deployed endpoint:

```python
import boto3
import json

# SageMaker runtime client for the region the endpoint was deployed to
client = boto3.client('sagemaker-runtime', region_name='us-east-1')

response = client.invoke_endpoint(
    EndpointName='your-endpoint-name',  # replace with the deployed endpoint name
    ContentType='application/json',
    Body=json.dumps({
        "inputs": "Hello, how are you?",
        "parameters": {
            "max_new_tokens": 256,
            "temperature": 0.7
        }
    })
)

# The response body is a streaming object; read and decode the JSON payload
result = json.loads(response['Body'].read())
print(result)
```