Clone the repository:
git clone https://github.com/NVIDIA/tensorrt-inference-server.git
Download the example models:
cd tensorrt-inference-server/docs/examples/
./fetch_models.sh
Copy the models to the shared NFS location:
cp -rp model_repository ensemble_model_repository /home/k8sSHARE
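A quick check that the copy landed where the chart will later look for it (the exact model directories depend on what fetch_models.sh downloaded):
# List the shared model repository; each model should have its own subdirectory
ls -l /home/k8sSHARE/model_repository/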
Prometheus collects metrics for viewing in Grafana. Install the prometheus-operator Helm chart to provide these components. The serviceMonitorSelectorNilUsesHelmValues flag is needed so that Prometheus can find the inference server metrics in the example release deployed below:
helm install --name example-metrics --set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false stable/prometheus-operator
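Before continuing, it can help to confirm the monitoring stack came up; this is just a sketch, since the exact pod and service names depend on the chart version:
# Check the Prometheus and Grafana components from the example-metrics release
kubectl get pods | grep example-metrics
kubectl get svc | grep example-metrics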
Set up a port-forward to the Grafana service for local access:
kubectl port-forward service/example-metrics-grafana 8080:80
Navigate in your browser to localhost:8080 for the Grafana login page and log in with username admin and password prom-operator.
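With the port-forward still running, a quick way to confirm Grafana is reachable (assumes curl is available locally):
# Expect an HTTP 200 from the Grafana login page
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080/login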
Change to the Helm chart directory:
cd ~/tensorrt-inference-server/deploy/single_server/
Modify values.yaml, changing modelRepositoryPath to point at the NFS-backed model repository:
image:
  imageName: nvcr.io/nvidia/tensorrtserver:20.01-py3
  pullPolicy: IfNotPresent
  #modelRepositoryPath: gs://tensorrt-inference-server-repository/model_repository
  modelRepositoryPath: /data/model_repository
  numGpus: 1
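Before installing, you can render the chart locally to confirm the new path is passed to trtserver; this assumes Helm 2.x, matching the --name syntax used elsewhere in this guide:
# Render the templates and check the --model-store argument
helm template . | grep model-store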
Modify templates/deployment.yaml to add the local NFS mount. The added volumeMounts and volumes sections are marked with comments below:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ template "tensorrt-inference-server.fullname" . }}
  namespace: {{ .Release.Namespace }}
  labels:
    app: {{ template "tensorrt-inference-server.name" . }}
    chart: {{ template "tensorrt-inference-server.chart" . }}
    release: {{ .Release.Name }}
    heritage: {{ .Release.Service }}
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app: {{ template "tensorrt-inference-server.name" . }}
      release: {{ .Release.Name }}
  template:
    metadata:
      labels:
        app: {{ template "tensorrt-inference-server.name" . }}
        release: {{ .Release.Name }}
    spec:
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.image.imageName }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          # Added: mount the shared model repository into the container
          volumeMounts:
            - mountPath: /data/
              name: work-volume
          resources:
            limits:
              nvidia.com/gpu: {{ .Values.image.numGpus }}
          args: ["trtserver", "--model-store={{ .Values.image.modelRepositoryPath }}"]
          ports:
            - containerPort: 8000
              name: http
            - containerPort: 8001
              name: grpc
            - containerPort: 8002
              name: metrics
          livenessProbe:
            httpGet:
              path: /api/health/live
              port: http
          readinessProbe:
            initialDelaySeconds: 5
            periodSeconds: 5
            httpGet:
              path: /api/health/ready
              port: http
          securityContext:
            runAsUser: 1000
            fsGroup: 1000
      # Added: hostPath volume pointing at the shared NFS directory
      volumes:
        - name: work-volume
          hostPath:
            # directory locally mounted on host
            path: /home/k8sSHARE
            type: Directory
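Because this is a hostPath volume, the NFS share must be mounted at /home/k8sSHARE on every node that can schedule the pod. A simple per-node check, run on each GPU node:
# Confirm the share is mounted and the model repository is visible
mount | grep k8sSHARE
ls /home/k8sSHARE/model_repository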
Deploy the inference server using the default configuration with:
cd ~/tensorrt-inference-server/deploy/single_server/
helm install --name example .
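Once the release is installed, a rough way to confirm the server is up and serving models; the service name below is an assumption based on the chart's fullname template (release name plus chart name), and the health path matches the readiness probe defined above:
# Watch for the inference server pod to become Ready
kubectl get pods | grep example

# Port-forward the HTTP port and hit the readiness endpoint (expect HTTP 200)
kubectl port-forward service/example-tensorrt-inference-server 8000:8000 &
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8000/api/health/ready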