{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "527a835c-afc5-4df7-a924-d5f61a417cf2",
   "metadata": {},
   "source": [
    "# Creating Evals with synthetic data and measuring hallucinations\n",
    "\n",
    "When you deploy Llama for your use case, it is good practice to have Evals tailored to that use case. Human-annotated Evals are ideal, but this notebook shows a strategy for building them from synthetic data instead. The generated Evals still require validation by a human before your production use case can rely on them. \n",
    "The notebook also shows how one could measure hallucinations accurately using Llama, without relying on the LLM-as-a-judge methodology."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "acfdf84f-0f4e-4684-83b0-9d2657441886",
   "metadata": {},
   "source": [
    "## Overall idea\n",
    "\n",
    "Let's assume our use case is generating a summary report from a given context, which is a common use case for LLMs. Both the context and the report contain a lot of factual information, and we want to make sure the generated report does not hallucinate.\n",
    "\n",
    "Since it's not trivial to find an open source dataset for this, the idea is to take synthetic tabular data and use Llama, through prompt engineering, to generate a story (the context) for every row. We then ask Llama to summarize the generated context as a report in a specific format, again through prompt engineering. Finally, we check the factual accuracy of the generated report with Llama by converting the check into a QA task that uses the tabular data as the ground truth.\n",
    "\n",
    "To generate synthetic data for this approach, we use an open source tool, [Synthetic Data Vault](https://github.com/sdv-dev/SDV) (SDV).\n",
    "\n",
    "The overall workflow is shown in the diagram below.\n",
    "\n",
    ""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1d1e65c0-e9ae-41eb-b1a2-165d09dfacbf",
   "metadata": {},
   "source": [
    "### Synthetic Data Vault installation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ff645d5c-b441-40cc-97c2-451a2c103ba2",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install sdv"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5e37994a-1683-4c86-b4fa-e3e49563137c",
   "metadata": {},
   "source": [
    "SDV has a number of single-table datasets. We choose the `student_placements` dataset for this notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "4f0281a7-027d-4e4c-bbd8-7bda766a2183",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\"><th></th><th>dataset_name</th><th>size_MB</th><th>num_tables</th></tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr><th>0</th><td>KRK_v1</td><td>0.06</td><td>1</td></tr>\n",
       "    <tr><th>1</th><td>adult</td><td>3.91</td><td>1</td></tr>\n",
       "    <tr><th>2</th><td>alarm</td><td>4.52</td><td>1</td></tr>\n",
       "    <tr><th>3</th><td>asia</td><td>1.28</td><td>1</td></tr>\n",
       "    <tr><th>4</th><td>census</td><td>98.17</td><td>1</td></tr>\n",
       "    <tr><th>5</th><td>census_extended</td><td>4.95</td><td>1</td></tr>\n",
       "    <tr><th>6</th><td>child</td><td>3.20</td><td>1</td></tr>\n",
       "    <tr><th>7</th><td>covtype</td><td>255.65</td><td>1</td></tr>\n",
       "    <tr><th>8</th><td>credit</td><td>68.35</td><td>1</td></tr>\n",
       "    <tr><th>9</th><td>expedia_hotel_logs</td><td>0.20</td><td>1</td></tr>\n",
       "    <tr><th>10</th><td>fake_companies</td><td>0.00</td><td>1</td></tr>\n",
       "    <tr><th>11</th><td>fake_hotel_guests</td><td>0.03</td><td>1</td></tr>\n",
       "    <tr><th>12</th><td>grid</td><td>0.32</td><td>1</td></tr>\n",
       "    <tr><th>13</th><td>gridr</td><td>0.32</td><td>1</td></tr>\n",
       "    <tr><th>14</th><td>insurance</td><td>3.34</td><td>1</td></tr>\n",
       "    <tr><th>15</th><td>intrusion</td><td>162.04</td><td>1</td></tr>\n",
       "    <tr><th>16</th><td>mnist12</td><td>81.20</td><td>1</td></tr>\n",
       "    <tr><th>17</th><td>mnist28</td><td>439.60</td><td>1</td></tr>\n",
       "    <tr><th>18</th><td>news</td><td>18.71</td><td>1</td></tr>\n",
       "    <tr><th>19</th><td>ring</td><td>0.32</td><td>1</td></tr>\n",
       "    <tr><th>20</th><td>student_placements</td><td>0.03</td><td>1</td></tr>\n",
       "    <tr><th>21</th><td>student_placements_pii</td><td>0.03</td><td>1</td></tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>\n",
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\"><th></th><th>student_id</th><th>gender</th><th>second_perc</th><th>high_perc</th><th>high_spec</th><th>degree_perc</th><th>degree_type</th><th>work_experience</th><th>experience_years</th><th>employability_perc</th><th>mba_spec</th><th>mba_perc</th><th>salary</th><th>placed</th><th>start_date</th><th>end_date</th><th>duration</th></tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr><th>0</th><td>17264</td><td>M</td><td>67.00</td><td>91.00</td><td>Commerce</td><td>58.00</td><td>Sci&amp;Tech</td><td>False</td><td>0</td><td>55.0</td><td>Mkt&amp;HR</td><td>58.80</td><td>27000.0</td><td>True</td><td>2020-07-23</td><td>2020-10-12</td><td>3.0</td></tr>\n",
       "    <tr><th>1</th><td>17265</td><td>M</td><td>79.33</td><td>78.33</td><td>Science</td><td>77.48</td><td>Sci&amp;Tech</td><td>True</td><td>1</td><td>86.5</td><td>Mkt&amp;Fin</td><td>66.28</td><td>20000.0</td><td>True</td><td>2020-01-11</td><td>2020-04-09</td><td>3.0</td></tr>\n",
       "    <tr><th>2</th><td>17266</td><td>M</td><td>65.00</td><td>68.00</td><td>Arts</td><td>64.00</td><td>Comm&amp;Mgmt</td><td>False</td><td>0</td><td>75.0</td><td>Mkt&amp;Fin</td><td>57.80</td><td>25000.0</td><td>True</td><td>2020-01-26</td><td>2020-07-13</td><td>6.0</td></tr>\n",
       "    <tr><th>3</th><td>17267</td><td>M</td><td>56.00</td><td>52.00</td><td>Science</td><td>52.00</td><td>Sci&amp;Tech</td><td>False</td><td>0</td><td>66.0</td><td>Mkt&amp;HR</td><td>59.43</td><td>NaN</td><td>False</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
       "    <tr><th>4</th><td>17268</td><td>M</td><td>85.80</td><td>73.60</td><td>Commerce</td><td>73.30</td><td>Comm&amp;Mgmt</td><td>False</td><td>0</td><td>96.8</td><td>Mkt&amp;Fin</td><td>55.50</td><td>42500.0</td><td>True</td><td>2020-07-04</td><td>2020-09-27</td><td>3.0</td></tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>\n",
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\"><th></th><th>start_date</th><th>end_date</th><th>salary</th><th>duration</th><th>student_id</th><th>high_perc</th><th>high_spec</th><th>mba_spec</th><th>second_perc</th><th>gender</th><th>degree_perc</th><th>placed</th><th>experience_years</th><th>employability_perc</th><th>mba_perc</th><th>work_experience</th><th>degree_type</th></tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr><th>0</th><td>2020-01-10</td><td>NaN</td><td>NaN</td><td>3.0</td><td>3040587</td><td>66.62</td><td>Science</td><td>Mkt&amp;Fin</td><td>75.01</td><td>M</td><td>75.76</td><td>True</td><td>1</td><td>85.98</td><td>58.37</td><td>True</td><td>Sci&amp;Tech</td></tr>\n",
       "    <tr><th>1</th><td>NaN</td><td>2020-11-07</td><td>39320.0</td><td>NaN</td><td>5940200</td><td>81.61</td><td>Commerce</td><td>Mkt&amp;HR</td><td>73.03</td><td>M</td><td>67.27</td><td>True</td><td>1</td><td>91.44</td><td>65.12</td><td>False</td><td>Comm&amp;Mgmt</td></tr>\n",
       "    <tr><th>2</th><td>2020-02-21</td><td>2020-07-08</td><td>36408.0</td><td>3.0</td><td>13408830</td><td>62.71</td><td>Arts</td><td>Mkt&amp;Fin</td><td>82.09</td><td>F</td><td>71.97</td><td>True</td><td>1</td><td>62.18</td><td>71.15</td><td>True</td><td>Comm&amp;Mgmt</td></tr>\n",
       "    <tr><th>3</th><td>2020-01-30</td><td>2020-09-29</td><td>36591.0</td><td>3.0</td><td>16186310</td><td>51.00</td><td>Commerce</td><td>Mkt&amp;Fin</td><td>62.04</td><td>M</td><td>65.32</td><td>True</td><td>1</td><td>61.87</td><td>58.90</td><td>False</td><td>Comm&amp;Mgmt</td></tr>\n",
       "    <tr><th>4</th><td>2020-01-16</td><td>NaN</td><td>33032.0</td><td>NaN</td><td>2086931</td><td>67.04</td><td>Commerce</td><td>Mkt&amp;Fin</td><td>53.53</td><td>M</td><td>51.08</td><td>True</td><td>1</td><td>58.65</td><td>56.32</td><td>False</td><td>Sci&amp;Tech</td></tr>\n",
       "    <tr><th>5</th><td>2020-07-20</td><td>NaN</td><td>31536.0</td><td>3.0</td><td>6414765</td><td>80.30</td><td>Science</td><td>Mkt&amp;HR</td><td>87.34</td><td>M</td><td>74.10</td><td>True</td><td>1</td><td>64.24</td><td>68.55</td><td>False</td><td>Sci&amp;Tech</td></tr>\n",
       "    <tr><th>6</th><td>2020-02-13</td><td>2020-11-26</td><td>32428.0</td><td>12.0</td><td>6180804</td><td>67.61</td><td>Commerce</td><td>Mkt&amp;HR</td><td>49.94</td><td>M</td><td>72.79</td><td>True</td><td>1</td><td>86.51</td><td>69.26</td><td>False</td><td>Sci&amp;Tech</td></tr>\n",
       "    <tr><th>7</th><td>2020-01-02</td><td>2020-07-14</td><td>36317.0</td><td>6.0</td><td>14357765</td><td>63.09</td><td>Commerce</td><td>Mkt&amp;Fin</td><td>86.17</td><td>M</td><td>83.25</td><td>True</td><td>1</td><td>71.89</td><td>75.90</td><td>False</td><td>Sci&amp;Tech</td></tr>\n",
       "    <tr><th>8</th><td>NaN</td><td>2020-05-10</td><td>27104.0</td><td>3.0</td><td>9499396</td><td>77.42</td><td>Science</td><td>Mkt&amp;HR</td><td>71.74</td><td>F</td><td>66.19</td><td>False</td><td>1</td><td>95.38</td><td>59.49</td><td>False</td><td>Sci&amp;Tech</td></tr>\n",
       "    <tr><th>9</th><td>2020-01-01</td><td>2020-04-15</td><td>NaN</td><td>3.0</td><td>10945558</td><td>57.54</td><td>Science</td><td>Mkt&amp;HR</td><td>57.63</td><td>F</td><td>72.51</td><td>True</td><td>1</td><td>86.40</td><td>60.99</td><td>True</td><td>Comm&amp;Mgmt</td></tr>\n",
       "    <tr><th>10</th><td>2020-01-01</td><td>2020-10-27</td><td>NaN</td><td>6.0</td><td>5714925</td><td>82.43</td><td>Science</td><td>Mkt&amp;Fin</td><td>68.14</td><td>M</td><td>76.55</td><td>True</td><td>1</td><td>95.86</td><td>67.78</td><td>False</td><td>Sci&amp;Tech</td></tr>\n",
       "    <tr><th>11</th><td>2020-01-01</td><td>2020-07-02</td><td>NaN</td><td>3.0</td><td>12273151</td><td>58.25</td><td>Commerce</td><td>Mkt&amp;Fin</td><td>65.04</td><td>M</td><td>61.35</td><td>True</td><td>1</td><td>65.73</td><td>55.15</td><td>True</td><td>Comm&amp;Mgmt</td></tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>\n",
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\"><th></th><th>Unnamed: 0</th><th>start_date</th><th>end_date</th><th>salary</th><th>duration</th><th>student_id</th><th>high_perc</th><th>high_spec</th><th>mba_spec</th><th>second_perc</th><th>gender</th><th>degree_perc</th><th>placed</th><th>experience_years</th><th>employability_perc</th><th>mba_perc</th><th>work_experience</th><th>degree_type</th></tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr><th>0</th><td>0</td><td>2020-01-10</td><td>NaN</td><td>NaN</td><td>3.0</td><td>3040587</td><td>66.62</td><td>Science</td><td>Mkt&amp;Fin</td><td>75.01</td><td>M</td><td>75.76</td><td>True</td><td>1</td><td>85.98</td><td>58.37</td><td>True</td><td>Sci&amp;Tech</td></tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "