Clone the repo
git clone https://github.com/NVIDIA/tensorrt-inference-server.git
Download models
cd tensorrt-inference-server/docs/examples/
./fetch_models.sh
Copy models to shared NFS location
cp -rp model_repository ensemble_model_repository /home/k8sSHARE
Prometheus collects metrics for viewing in Grafana. Install the prometheus-operator for these components. The serviceMonitorSelectorNilUsesHelmValues flag is needed so that Prometheus can find the inference server metrics in the example release deployed below:
helm install --name example-metrics --set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false stable/prometheus-operator
Setup port-forward to the Grafana service for local access
kubectl port-forward service/example-metrics-grafana 8080:80
Navigate in your browser to localhost:8080 for the Grafana login page.
username=admin password=prom-operator
Change to helm chart directory
cd ~/tensorrt-inference-server/deploy/single_server/
'
Modify values.yaml
changing modelRepositoryPath
image: imageName: nvcr.io/nvidia/tensorrtserver:20.01-py3 pullPolicy: IfNotPresent #modelRepositoryPath: gs://tensorrt-inference-server-repository/model_repository modelRepositoryPath: /data/model_repository numGpus: 1
Modify templates/deployment.yaml
in bold to add the local NFS mount:
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ template "tensorrt-inference-server.fullname" . }}
namespace: {{ .Release.Namespace }}
labels:
app: {{ template "tensorrt-inference-server.name" . }}
chart: {{ template "tensorrt-inference-server.chart" . }}
release: {{ .Release.Name }}
heritage: {{ .Release.Service }}
spec:
replicas: {{ .Values.replicaCount }}
selector:
matchLabels:
app: {{ template "tensorrt-inference-server.name" . }}
release: {{ .Release.Name }}
template:
metadata:
labels:
app: {{ template "tensorrt-inference-server.name" . }}
release: {{ .Release.Name }}
spec:
containers:
- name: {{ .Chart.Name }}
image: "{{ .Values.image.imageName }}"
imagePullPolicy: {{ .Values.image.pullPolicy }}
<b style='background-color:yellow'> volumeMounts:
- mountPath: /data/
name: work-volume</b>
resources:
limits:
nvidia.com/gpu: {{ .Values.image.numGpus }}
args: ["trtserver", "--model-store={{ .Values.image.modelRepositoryPath }}"]
ports:
- containerPort: 8000
name: http
- containerPort: 8001
name: grpc
- containerPort: 8002
name: metrics
livenessProbe:
httpGet:
path: /api/health/live
port: http
readinessProbe:
initialDelaySeconds: 5
periodSeconds: 5
httpGet:
path: /api/health/ready
port: http
securityContext:
runAsUser: 1000
fsGroup: 1000
volumes:
- name: work-volume
hostPath:
# directory locally mounted on host
path: /home/k8sSHARE
type: Directory
cd ~/tensorrt-inference-server/deploy/single_server/
$ helm install --name example .
helm ls
NAME REVISION UPDATED STATUS CHART APP VERSION NAMESPACE
example 1 Wed Feb 26 15:46:18 2020 DEPLOYED tensorrt-inference-server-1.0.0 1.0 default
example-metrics 1 Tue Feb 25 17:45:54 2020 DEPLOYED prometheus-operator-8.9.2 0.36.0 default
kubectl get pods
NAME READY STATUS RESTARTS AGE
example-tensorrt-inference-server-f45d865dc-62c46 1/1 Running 0 53m
kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
...
example-tensorrt-inference-server LoadBalancer 10.150.77.138 192.168.60.150 8000:31165/TCP,8001:31408/TCP,8002:30566/TCP 53m
kubectl create secret docker-registry <your-secret-name> --docker-server=<your-registry-server> --docker-username=<your-registry-username> --docker-password=<your-registry-apikey> --docker-email=<your-email>
Parameter Description:
docker-registry – the name you will use for this secret
docker-server – nvcr.io is the container registry for NGC
docker-username – for nvcr.io this is ‘$oauthtoken’ (including quotes)
docker-password – this is the API Key you obtained earlier
docker-email – your NGC email address
Example (you will need to generate your own oauth token)
kubectl create secret docker-registry ngc-secret --docker-server=nvcr.io --docker-username='$oauthtoken' --docker-password=clkaw309f3jfaJ002EIVCJAC0Cpcklajser90wezxc98wdn09ICJA09xjc09j09JV00JV0JVCLR0WQE8ACZz --docker-email=john@example.com
Verify your secret has been stored:
kubectl get secrets
NAME TYPE DATA AGE
...
ngc-secret kubernetes.io/dockerconfigjson 1 106m
kubectl apply -f trt-client.yaml
Verify it is running:
kubectl get pod tensorrt-client
NAME READY STATUS RESTARTS AGE
tensorrt-client 1/1 Running 0 5m
Run the inception test using the client Pod. The TensorRT Inference Service IP Address
kubectl exec -it tensorrt-client -- /bin/bash -c "image_client -u 192.168.60.150:8000 -m resnet50_netdef -s INCEPTION images/mug.jpg"
Request 0, batch size 1
Image 'images/mug.jpg':
504 (COFFEE MUG) = 0.723992