# How to Run NVIDIA's TensorRT Inference Server

## Clone the repo

````
git clone https://github.com/NVIDIA/tensorrt-inference-server.git
````

## Download the example models

````
cd tensorrt-inference-server/docs/examples/
./fetch_models.sh
````

## Copy the models to the shared NFS location

````
cp -rp model_repository ensemble_model_repository /home/k8sSHARE
````

## Deploy Prometheus and Grafana

Prometheus collects the inference server metrics for viewing in Grafana. Install the prometheus-operator chart to set up both components. The serviceMonitorSelectorNilUsesHelmValues flag is needed so that Prometheus can find the inference server metrics in the example release deployed below:

````
helm install --name example-metrics --set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false stable/prometheus-operator
````

Set up a port-forward to the Grafana service for local access:

````
kubectl port-forward service/example-metrics-grafana 8080:80
````

Navigate in your browser to localhost:8080 to reach the Grafana login page. The default credentials for the chart are:

````
username=admin
password=prom-operator
````
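If the Grafana login page does not load, it is worth confirming that the pods from the example-metrics release are running and that the Grafana service name matches the one used in the port-forward above. The label selector and grep below are assumptions about how the prometheus-operator chart typically names and labels its resources; adjust them if your cluster differs:

````
# Check that the pods created by the example-metrics release are running
# (the "release" label is an assumption about how the chart labels its pods)
kubectl get pods -l release=example-metrics

# Confirm the exact name of the Grafana service used in the port-forward above
kubectl get svc | grep grafana
````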
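Grafana's dashboards are backed by the Prometheus instance that the operator creates. If you also want to inspect the raw metrics or scrape targets directly, a similar port-forward works against the Prometheus service. The service name below (`prometheus-operated`) is the headless service the operator typically creates, so treat it as an assumption and verify it in your cluster:

````
# Optional: port-forward the Prometheus UI (service name assumed; verify with
# "kubectl get svc" if it differs in your cluster)
kubectl port-forward service/prometheus-operated 9090:9090
````

Then browse to localhost:9090 to reach the Prometheus UI.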