# Google Cloud Platform Cloud Run deployment Deploy containerized Llama models using Google Cloud Run with auto-scaling. ## Overview This Terraform configuration sets up a basic example deployment, demonstrating how to deploy/serve foundation models using Google Cloud Run services. Google Cloud Run provides serverless container deployment with automatic scaling. This example shows how to use basic services such as: - IAM roles for permissions management - Service accounts for fine-grained access control - Containerization of Llama models on Cloud Run In our [architecture patterns for private cloud guide](/docs/open_source/private_cloud.md) we outline advanced patterns for cloud deployment that you may choose to implement in a more complete deployment. This includes: - Deployment into multiple regions or clouds - Managed keys/secrets services - Comprehensive logging systems for auditing and compliance - Backup and recovery systems ## Getting started ### Prerequisites * GCP project with **billing account enabled** (required for Google Cloud Run and Google Cloud Artifact Registry) * Terraform installed * Docker container image with Llama model (see container setup below) * Google Cloud CLI configured * Application Default Credentials: `gcloud auth application-default login` ### Deploy 1. Configure GCP authentication: ```bash gcloud auth login gcloud config set project YOUR_PROJECT_ID ``` 2. Prepare container image with vLLM. For speed and simplicity's sake, we will use a small 1B parameter model. You may choose to use a larger Llama model, and if so should increase the resource requirements in your tfvars file. ```bash # Create Dockerfile cat > Dockerfile << 'EOF' FROM vllm/vllm-openai:latest ENV MODEL_NAME=meta-llama/Llama-3.2-1B-Instruct CMD ["vllm", "serve", "$MODEL_NAME", "--host", "0.0.0.0", "--port", "8080"] EOF # Build and push docker build -t llama-inference . docker tag llama-inference gcr.io/YOUR_PROJECT_ID/llama-inference:latest docker push gcr.io/YOUR_PROJECT_ID/llama-inference:latest ``` 3. Edit terraform.tfvars with your project ID and container image. 4. Create configuration: ```bash cd terraform/gcp-cloud-run-default cp terraform.tfvars.example terraform.tfvars ``` 5. Deploy: ```bash terraform init terraform plan terraform apply ``` ### Usage ```bash # Get service URL SERVICE_URL=$(terraform output -raw service_url) # Make request curl -X POST $SERVICE_URL/predict \ -H "Content-Type: application/json" \ -d '{"prompt": "Hello, how are you?", "max_tokens": 100}' ``` ## Next steps * [Google Cloud Run Documentation](https://cloud.google.com/run/docs) * [Google Container Registry Guide](https://cloud.google.com/container-registry/docs)