# Run Nvidia's TensorRT Inference Server on omnia Clone the repo `git clone https://github.com/NVIDIA/tensorrt-inference-server.git` Download models `cd tensorrt-inference-server/docs/examples/` `./fetch_models.sh` Copy models to shared NFS location `cp -rp model_repository ensemble_model_repository /home/k8sSHARE` Fix permissions on model files `chmod -R a+r /home/k8sSHARE/model_repository` ## Deploy Prometheus and Grafana Prometheus collects metrics for viewing in Grafana. Install the prometheus-operator for these components. The serviceMonitorSelectorNilUsesHelmValues flag is needed so that Prometheus can find the inference server metrics in the example release deployed below: `helm install --name example-metrics --set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false stable/prometheus-operator` Setup port-forward to the Grafana service for local access `kubectl port-forward service/example-metrics-grafana 8080:80` Navigate in your browser to localhost:8080 for the Grafana login page. `username=admin password=prom-operator` ## Setup TensorRT Inference Server Deployment Change to helm chart directory `cd ~/tensorrt-inference-server/deploy/single_server/` Modify `values.yaml` changing `modelRepositoryPath`
image: imageName: nvcr.io/nvidia/tensorrtserver:20.01-py3 pullPolicy: IfNotPresent #modelRepositoryPath: gs://tensorrt-inference-server-repository/model_repository modelRepositoryPath: /data/model_repository numGpus: 1Modify `templates/deployment.yaml` in **bold** to add the local NFS mount:
... spec: containers: - name: {{ .Chart.Name }} image: "{{ .Values.image.imageName }}" imagePullPolicy: {{ .Values.image.pullPolicy }} volumeMounts: - mountPath: /data/ name: work-volume ... volumes: - name: work-volume hostPath: # directory locally mounted on host path: /home/k8sSHARE type: Directory### Deploy the inference server
cd ~/tensorrt-inference-server/deploy/single_server/ helm install --name example .### Verify deployment
helm ls NAME REVISION UPDATED STATUS CHART APP VERSION NAMESPACE example 1 Wed Feb 26 15:46:18 2020 DEPLOYED tensorrt-inference-server-1.0.0 1.0 default example-metrics 1 Tue Feb 25 17:45:54 2020 DEPLOYED prometheus-operator-8.9.2 0.36.0 default
kubectl get pods NAME READY STATUS RESTARTS AGE example-tensorrt-inference-server-f45d865dc-62c46 1/1 Running 0 53m
kubectl get svc NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE ... example-tensorrt-inference-server LoadBalancer 10.150.77.138 192.168.60.150 8000:31165/TCP,8001:31408/TCP,8002:30566/TCP 53m## Setup NGC login secret for nvcr.io `kubectl create secret docker-registry
kubectl get secrets NAME TYPE DATA AGE ... ngc-secret kubernetes.io/dockerconfigjson 1 106m## Run TensorRT Client `kubectl apply -f trt-client.yaml` Verify it is running:
kubectl get pod tensorrt-client NAME READY STATUS RESTARTS AGE tensorrt-client 1/1 Running 0 5mRun the inception test using the client Pod. The TensorRT Inference IP Address can be found by running `kubectl get svc`
kubectl exec -it tensorrt-client -- /bin/bash -c "image_client -u 192.168.60.150:8000 -m resnet50_netdef -s INCEPTION images/mug.jpg" Request 0, batch size 1 Image 'images/mug.jpg': 504 (COFFEE MUG) = 0.723992Run inception test with batch size 2 and print top 3 classifications
kubectl exec -it tensorrt-client -- /bin/bash -c "image_client -u 192.168.60.150:8000 -m resnet50_netdef -s INCEPTION images/ -c 3 -b 2" Request 0, batch size 2 Image 'images//mug.jpg': 504 (COFFEE MUG) = 0.723992 968 (CUP) = 0.270953 967 (ESPRESSO) = 0.00115996 Image 'images//mug.jpg': 504 (COFFEE MUG) = 0.723992 968 (CUP) = 0.270953 967 (ESPRESSO) = 0.00115996