Browse Source

Fixed Singularity related problems

bharatk-parallel 4 years ago
commit
547a4fbfe8
30 changed files with 464 additions and 5661 deletions
  1. + 3 - 3
      ai/DeepStream/README.md
  2. + 7 - 7
      ai/DeepStream_Perf_Lab/English/python/jupyter_notebook/Introduction_to_Performance_analysis.ipynb
  3. + 27 - 61
      ai/DeepStream_Perf_Lab/English/python/jupyter_notebook/Performance_Analysis_using_NSight_systems.ipynb
  4. + 1 - 0
      ai/DeepStream_Perf_Lab/English/python/source_code/reports/README.MD
  5. + 3 - 3
      ai/DeepStream_Perf_Lab/README.md
  6. + 41 - 0
      ai/DeepStream_Perf_Lab/Singularity
  7. BIN
      ai/RAPIDS/.Dockerfile.swp
  8. + 1 - 1
      ai/RAPIDS/Dockerfile
  9. + 178 - 3386
      ai/RAPIDS/English/Python/jupyter_notebook/Challenge/Bike-Rental-Prediction/Backup.ipynb
  10. + 8 - 8
      ai/RAPIDS/English/Python/jupyter_notebook/Challenge/Bike-Rental-Prediction/Challenge.ipynb
  11. + 8 - 8
      ai/RAPIDS/English/Python/jupyter_notebook/Challenge/Bike-Rental-Prediction/Solution.ipynb
  12. + 4 - 4
      ai/RAPIDS/English/Python/jupyter_notebook/Challenge/Gene-Expression-Classification/Backup.ipynb
  13. + 3 - 3
      ai/RAPIDS/English/Python/jupyter_notebook/Challenge/Gene-Expression-Classification/Challenge.ipynb
  14. + 4 - 4
      ai/RAPIDS/English/Python/jupyter_notebook/Challenge/Gene-Expression-Classification/Solution.ipynb
  15. + 3 - 44
      ai/RAPIDS/English/Python/jupyter_notebook/CuDF/02-Intro_to_cuDF_UDFs.ipynb
  16. + 21 - 603
      ai/RAPIDS/English/Python/jupyter_notebook/CuDF/Backup.ipynb
  17. + 10 - 39
      ai/RAPIDS/English/Python/jupyter_notebook/CuML/01-LinearRegression-Hyperparam.ipynb
  18. + 0 - 515
      ai/RAPIDS/English/Python/jupyter_notebook/CuML/02-SGD.ipynb
  19. + 6 - 9
      ai/RAPIDS/English/Python/jupyter_notebook/CuML/03_CuML_Exercise.ipynb
  20. + 5 - 10
      ai/RAPIDS/English/Python/jupyter_notebook/CuML/04_CuML_Solution.ipynb
  21. + 58 - 574
      ai/RAPIDS/English/Python/jupyter_notebook/CuML/Backup.ipynb
  22. + 16 - 232
      ai/RAPIDS/English/Python/jupyter_notebook/Dask/01-Intro_to_Dask.ipynb
  23. + 5 - 5
      ai/RAPIDS/English/Python/jupyter_notebook/Dask/02-CuDF_and_Dask.ipynb
  24. + 1 - 1
      ai/RAPIDS/English/Python/jupyter_notebook/Dask/03-CuML_and_Dask.ipynb
  25. + 2 - 2
      ai/RAPIDS/English/Python/jupyter_notebook/Dask/04-Challenge.ipynb
  26. + 3 - 3
      ai/RAPIDS/English/Python/jupyter_notebook/Dask/05-Challenge_Solution.ipynb
  27. + 33 - 128
      ai/RAPIDS/English/Python/jupyter_notebook/Dask/Backup.ipynb
  28. + 0 - 1
      ai/RAPIDS/English/Python/jupyter_notebook/START_HERE.ipynb
  29. + 7 - 3
      ai/RAPIDS/README.MD
  30. + 6 - 4
      ai/RAPIDS/Singularity

+ 3 - 3
ai/DeepStream/README.md

@@ -31,14 +31,14 @@ Start working on the lab by clicking on the `Start_Here.ipynb` notebook.
 ### Singularity Container
 
 To build the singularity container, run:
-`sudo singularity build <image_name>.simg Singularity`
+`sudo singularity build --sandbox <image_name>.simg Singularity`
 
 and copy the files to your local machine to make sure changes are stored locally:
-`singularity run <image_name>.simg cp -rT /opt/nvidia/deepstream/deepstream-5.0/ ~/workspace`
+`singularity run --writable <image_name>.simg cp -rT /opt/nvidia/deepstream/deepstream-5.0/ ~/workspace`
 
 
 Then, run the container:
-`singularity run --nv <image_name>.simg jupyter notebook --no-browser --allow-root --ip=0.0.0.0 --port=8888 --NotebookApp.token="" --notebook-dir=~workspace/python`
+`singularity run --nv --writable <image_name>.simg jupyter notebook --no-browser --allow-root --ip=0.0.0.0 --port=8888 --NotebookApp.token="" --notebook-dir=~/workspace/python`
 
 Then, open the jupyter notebook in browser: http://localhost:8888
 Start working on the lab by clicking on the `Start_Here.ipynb` notebook.
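
The three corrected commands in this diff can be collected into one copy-pasteable sequence. A sketch, assuming Singularity 3.x and an arbitrary image name `deepstream.simg`; the commands are gathered into a variable and printed rather than executed, since building and running require root and a GPU:

```shell
# Sketch of the corrected workflow from the diff above.
# Assumptions: Singularity 3.x is installed; "deepstream.simg" is an
# arbitrary image name. Printed, not executed.
IMG="deepstream.simg"
CMDS=$(cat <<EOF
sudo singularity build --sandbox ${IMG} Singularity
singularity run --writable ${IMG} cp -rT /opt/nvidia/deepstream/deepstream-5.0/ ~/workspace
singularity run --nv --writable ${IMG} jupyter notebook --no-browser --allow-root --ip=0.0.0.0 --port=8888 --NotebookApp.token="" --notebook-dir=~/workspace/python
EOF
)
echo "$CMDS"
```

Note that `--sandbox` builds the image as a writable directory tree, which is what allows the later commands to pass `--writable`; the pre-fix commands failed because the default image format is read-only.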

+ 7 - 7
ai/DeepStream_Perf_Lab/English/python/jupyter_notebook/Introduction_to_Performance_analysis.ipynb

@@ -738,7 +738,7 @@
    "source": [
     "### Effects on OSD, Tiler, and Queues\n",
     "\n",
-    "We notice an average of 40-50 FPS for three concurrent streams on a Tesla V100. In the above case, OSD ( On-screen display and Tiling ) can slow down the pipeline. We can design our pipeline such that we get the Inference metadata without the need for visual outputs. This is particularly useful when using Edge devices that only need to send real-time inference metadata to the cloud server for further processing.\n",
+    "In the above case, OSD ( On-screen display and Tiling ) can slow down the pipeline. We can design our pipeline such that we get the Inference metadata without the need for visual outputs. This is particularly useful when using Edge devices that only need to send real-time inference metadata to the cloud server for further processing.\n",
     "\n",
     "#### Disabling OSD & Tiler\n",
     "\n",
@@ -778,12 +778,12 @@
     "\n",
     "Let us summarise our above benchmarks using a table.\n",
     "\n",
-    "|Pipeline|Total time|Avg FPS-per-stream|Number of streams|\n",
-    "|---|----|---|---|\n",
-    "|Default Pipeline|33.787|42|3|\n",
-    "|With Queues|12.526|115|3|\n",
-    "|Without OSD |11.087|129|3|\n",
-    "|With Queues and without OSD|11.054|130|3|\n",
+    "|Pipeline|Relative Time|\n",
+    "|---|----|\n",
+    "|Default Pipeline|baseline|\n",
+    "|With Queues|~3x|\n",
+    "|Without OSD |~3.1x|\n",
+    "|With Queues and without OSD|~3.15x|\n",
     "\n",
     "\n",
     "We can now move on to benchmark our code further using NSight systems in the upcoming notebook."

File diff suppressed because it is too large
+ 27 - 61
ai/DeepStream_Perf_Lab/English/python/jupyter_notebook/Performance_Analysis_using_NSight_systems.ipynb


+ 1 - 0
ai/DeepStream_Perf_Lab/English/python/source_code/reports/README.MD

@@ -0,0 +1 @@
+This folder will store the reports created by Nsight Systems

+ 3 - 3
ai/DeepStream_Perf_Lab/README.md

@@ -36,14 +36,14 @@ Start working on the lab by clicking on the `Start_Here.ipynb` notebook.
 ### Singularity Container
 
 To build the singularity container, run:
-`sudo singularity build <image_name>.simg Singularity`
+`sudo singularity build --sandbox <image_name>.simg Singularity`
 
 and copy the files to your local machine to make sure changes are stored locally:
-`singularity run <image_name>.simg cp -rT /opt/nvidia/deepstream/deepstream-5.0/ ~/workspace`
+`singularity run --writable <image_name>.simg cp -rT /opt/nvidia/deepstream/deepstream-5.0/ ~/workspace`
 
 
 Then, run the container:
-`singularity run --nv <image_name>.simg jupyter notebook --no-browser --allow-root --ip=0.0.0.0 --port=8888 --NotebookApp.token="" --notebook-dir=~workspace/python`
+`singularity run --nv --writable <image_name>.simg jupyter notebook --no-browser --allow-root --ip=0.0.0.0 --port=8888 --NotebookApp.token="" --notebook-dir=~/workspace/python`
 
 Then, open the jupyter notebook in browser: http://localhost:8888
 Start working on the lab by clicking on the `Start_Here.ipynb` notebook.

+ 41 - 0
ai/DeepStream_Perf_Lab/Singularity

@@ -0,0 +1,41 @@
+Bootstrap: docker
+From: nvcr.io/nvidia/deepstream:5.0-20.07-triton
+
+%runscript
+
+    "$@"
+
+%post
+
+        apt-get -y update
+        DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends apt-transport-https ca-certificates gnupg  wget 
+        rm -rf /var/lib/apt/lists/*
+
+        wget -qO - https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub | apt-key add -
+        echo "deb https://developer.download.nvidia.com/devtools/repo-deb/x86_64/ /" >> /etc/apt/sources.list.d/nsight.list 
+        apt-get -y update 
+        DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends nsight-systems-2020.2.1 
+        rm -rf /var/lib/apt/lists/*
+
+        apt-get -y update
+        apt-get install -y ffmpeg python3-gi python3-dev python3-pip cmake unzip
+        pip3 install pybind11 jupyterlab gdown
+
+        cd /opt/nvidia/deepstream/deepstream/lib
+        python3 setup.py install
+        cd /opt/nvidia/deepstream/deepstream-5.0/python/source_code/dataset/
+        python3 /opt/nvidia/deepstream/deepstream-5.0/python/source_code/dataset/download_dataset.py
+        unzip deepstream_dataset.zip
+        cd /opt/nvidia/deepstream/deepstream/lib
+%files
+
+    English/* /opt/nvidia/deepstream/deepstream-5.0/
+
+%environment
+XDG_RUNTIME_DIR=
+
+%labels
+
+AUTHOR bharatk
+
+

BIN
ai/RAPIDS/.Dockerfile.swp


+ 1 - 1
ai/RAPIDS/Dockerfile

@@ -1,5 +1,5 @@
 # Select Base Image 
-FROM rapidsai/rapidsai-nightly:cuda10.2-runtime-ubuntu18.04-py3.7
+FROM rapidsai/rapidsai:cuda10.1-runtime-ubuntu18.04-py3.7
 
 # Update the repo
 RUN apt-get update -y

File diff suppressed because it is too large
+ 178 - 3386
ai/RAPIDS/English/Python/jupyter_notebook/Challenge/Bike-Rental-Prediction/Backup.ipynb


+ 8 - 8
ai/RAPIDS/English/Python/jupyter_notebook/Challenge/Bike-Rental-Prediction/Challenge.ipynb

@@ -439,7 +439,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "weather.drop_column('RTemp')\n",
+    "weather.drop(['RTemp'],axis=1,inplace=True)\n",
     "weather"
    ]
   },
@@ -822,7 +822,7 @@
    "outputs": [],
    "source": [
     "holidays['date'] = cudf.to_datetime(holidays['date'])\n",
-    "holidays.drop_column('Description')\n",
+    "holidays.drop(['Description'],axis=1,inplace=True)\n",
     "holidays['Holiday'] = 1\n",
     "holidays"
    ]
@@ -919,7 +919,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "gdf = gdf.drop(['Holiday', 'date'])"
+    "gdf = gdf.drop(['Holiday', 'date'],axis=1)"
    ]
   },
   {
@@ -969,7 +969,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "gdf = gdf.drop(['Weather', 'Weather_dummy_1'])"
+    "gdf = gdf.drop(['Weather', 'Weather_dummy_1'],axis=1)"
    ]
   },
   {
@@ -1003,9 +1003,9 @@
     "    \n",
     "    \n",
    "    ### TODO drop one of the newly added dummy variables (~ 1 line of code)\n",
-    "    gdf.drop_column(...)\n",
+    "    gdf.drop(...)\n",
     "    \n",
-    "    gdf.drop_column(item) # drop the original item\n",
+    "    gdf.drop(item) # drop the original item\n",
     "    \n",
     "# Inspect the resulting table\n",
     "gdf"
@@ -1024,7 +1024,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "gdf.to_csv('data/bike_sharing.csv')"
+    "gdf.to_csv('../../../data/bike_sharing.csv')"
    ]
   },
   {
@@ -1072,7 +1072,7 @@
    "outputs": [],
    "source": [
     "y = gdf['cnt']\n",
-    "X = gdf.drop('cnt')"
+    "X = gdf.drop('cnt',axis=1)"
    ]
   },
   {
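
The recurring change in this notebook replaces the removed cuDF `DataFrame.drop_column` method with the pandas-style `drop(..., axis=1)`. A minimal sketch of the new call pattern, shown here with pandas (whose `drop` API cuDF mirrors); the frame and column names are illustrative, not the notebook's data:

```python
import pandas as pd

# Illustrative frame; 'RTemp' stands in for the column dropped in the diff.
weather = pd.DataFrame({"Temp": [20.1, 21.3], "RTemp": [19.8, 21.0]})

# Old cuDF (removed):  weather.drop_column('RTemp')
# New pandas-style equivalent used throughout this commit:
weather.drop(["RTemp"], axis=1, inplace=True)

# Non-inplace variant, as used for the gdf frames in the diff:
X = weather.drop("Temp", axis=1)
print(list(weather.columns), list(X.columns))
```

Without `axis=1` (or `columns=`), `drop` looks for index labels instead of columns and raises a `KeyError`, which is why every converted call in this commit adds the argument.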

+ 8 - 8
ai/RAPIDS/English/Python/jupyter_notebook/Challenge/Bike-Rental-Prediction/Solution.ipynb

@@ -1181,7 +1181,7 @@
     }
    ],
    "source": [
-    "weather.drop_column('RTemp')\n",
+    "weather.drop(['RTemp'],axis=1,inplace=True)\n",
     "weather"
    ]
   },
@@ -2715,7 +2715,7 @@
    ],
    "source": [
     "holidays['date'] = cudf.to_datetime(holidays['date'])\n",
-    "holidays.drop_column('Description')\n",
+    "holidays.drop(['Description'],axis=1,inplace=True)\n",
     "holidays['Holiday'] = 1\n",
     "holidays"
    ]
@@ -3296,7 +3296,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "gdf = gdf.drop(['Holiday', 'date'])"
+    "gdf = gdf.drop(['Holiday', 'date'],axis=1)"
    ]
   },
   {
@@ -3481,7 +3481,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "gdf = gdf.drop(['Weather', 'Weather_dummy_1'])"
+    "gdf = gdf.drop(['Weather', 'Weather_dummy_1'],axis=1)"
    ]
   },
   {
@@ -3514,8 +3514,8 @@
     "    ### Todo implement one-hot encoding for item\n",
     "    codes = gdf[item].unique()\n",
     "    gdf = gdf.one_hot_encoding(item, item + '_dummy', codes)\n",
-    "    gdf = gdf.drop('{}_dummy_1'.format(item))\n",
-    "    gdf = gdf.drop(item) # drop the original item"
+    "    gdf = gdf.drop('{}_dummy_1'.format(item),axis=1)\n",
+    "    gdf = gdf.drop(item,axis=1) # drop the original item"
    ]
   },
   {
@@ -3971,7 +3971,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "gdf.to_csv('data/bike_sharing.csv')"
+    "gdf.to_csv('../../../data/bike_sharing.csv')"
    ]
   },
   {
@@ -4019,7 +4019,7 @@
    "outputs": [],
    "source": [
     "y = gdf['cnt']\n",
-    "X = gdf.drop('cnt')"
+    "X = gdf.drop('cnt',axis=1)"
    ]
   },
   {

+ 4 - 4
ai/RAPIDS/English/Python/jupyter_notebook/Challenge/Gene-Expression-Classification/Backup.ipynb

@@ -201,7 +201,7 @@
    ],
    "source": [
     "%%time\n",
-    "y = pd.read_csv('../../data/actual.csv')\n",
+    "y = pd.read_csv('../../../data/actual.csv')\n",
     "print(y.shape)\n",
     "y.head()"
    ]
@@ -248,11 +248,11 @@
    ],
    "source": [
     "# Import training data\n",
-    "df_train = pd.read_csv('../../data/data_set_ALL_AML_train.csv')\n",
+    "df_train = pd.read_csv('../../../data/data_set_ALL_AML_train.csv')\n",
     "print(df_train.shape)\n",
     "\n",
     "# Import testing data\n",
-    "df_test = pd.read_csv('../../data/data_set_ALL_AML_independent.csv')\n",
+    "df_test = pd.read_csv('../../../data/data_set_ALL_AML_independent.csv')\n",
     "print(df_test.shape)"
    ]
   },
@@ -2043,7 +2043,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.7.8"
+   "version": "3.6.2"
   }
  },
  "nbformat": 4,

+ 3 - 3
ai/RAPIDS/English/Python/jupyter_notebook/Challenge/Gene-Expression-Classification/Challenge.ipynb

@@ -107,7 +107,7 @@
    "outputs": [],
    "source": [
     "%%time\n",
-    "y = pd.read_csv('../../data/actual.csv')\n",
+    "y = pd.read_csv('../../../data/actual.csv')\n",
     "print(y.shape)\n",
     "y.head()"
    ]
@@ -145,11 +145,11 @@
    "outputs": [],
    "source": [
     "# Import training data\n",
-    "df_train = pd.read_csv('../../data/data_set_ALL_AML_train.csv')\n",
+    "df_train = pd.read_csv('../../../data/data_set_ALL_AML_train.csv')\n",
     "print(df_train.shape)\n",
     "\n",
     "# Import testing data\n",
-    "df_test = pd.read_csv('../../data/data_set_ALL_AML_independent.csv')\n",
+    "df_test = pd.read_csv('../../../data/data_set_ALL_AML_independent.csv')\n",
     "print(df_test.shape)"
    ]
   },

+ 4 - 4
ai/RAPIDS/English/Python/jupyter_notebook/Challenge/Gene-Expression-Classification/Solution.ipynb

@@ -201,7 +201,7 @@
    ],
    "source": [
     "%%time\n",
-    "y = pd.read_csv('../../data/actual.csv')\n",
+    "y = pd.read_csv('../../../data/actual.csv')\n",
     "print(y.shape)\n",
     "y.head()"
    ]
@@ -248,11 +248,11 @@
    ],
    "source": [
     "# Import training data\n",
-    "df_train = pd.read_csv('../../data/data_set_ALL_AML_train.csv')\n",
+    "df_train = pd.read_csv('../../../data/data_set_ALL_AML_train.csv')\n",
     "print(df_train.shape)\n",
     "\n",
     "# Import testing data\n",
-    "df_test = pd.read_csv('../../data/data_set_ALL_AML_independent.csv')\n",
+    "df_test = pd.read_csv('../../../data/data_set_ALL_AML_independent.csv')\n",
     "print(df_test.shape)"
    ]
   },
@@ -2084,7 +2084,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.7.8"
+   "version": "3.6.2"
   }
  },
  "nbformat": 4,

+ 3 - 44
ai/RAPIDS/English/Python/jupyter_notebook/CuDF/02-Intro_to_cuDF_UDFs.ipynb

@@ -302,6 +302,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
+    "#modify the code in this cell\n",
     "def haversine_distance_kernel(lat1, lon1, lat2, lon2, out, r):\n",
     "    \"\"\"Haversine distance formula taken from Michael Dunn's StackOverflow post:\n",
     "    https://stackoverflow.com/questions/4913349/haversine-formula-in-python-bearing-and-distance-between-two-gps-points\n",
@@ -332,20 +333,6 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "%%time\n",
-    "\n",
-    "df = df.apply_rows(haversine_distance_kernel,\n",
-    "                   incols=['lat1', 'lon1', 'lat2', 'lon2'],\n",
-    "                   outcols=dict(out=np.float64),\n",
-    "                   kwargs=dict(r=6371))\n"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
     "print(df.head())"
    ]
   },
@@ -398,6 +385,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
+    "#modify the code in this cell\n",
     "from math import atan2\n",
     "\n",
     "def bearing_kernel(lat1, lon1, lat2, lon2, out):\n",
@@ -423,20 +411,6 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "%%time\n",
-    "\n",
-    "df = df.apply_rows(bearing_kernel,\n",
-    "                   incols=['lat1', 'lon1', 'lat2', 'lon2'],\n",
-    "                   outcols=dict(out=np.float64),\n",
-    "                   kwargs=dict())"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
     "print(df.head())\n",
     "\n"
    ]
@@ -577,6 +551,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
+    "#modify the code in this cell\n",
     "from math import atan2\n",
     "\n",
     "\n",
@@ -606,22 +581,6 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "%%time\n",
-    "df = df.apply_chunks(bearing_kernel,\n",
-    "                     incols=['lat1', 'lon1', 'lat2', 'lon2'],\n",
-    "                     outcols=dict(out=np.float64),\n",
-    "                     kwargs=dict(),\n",
-    "                     chunks=16,\n",
-    "                     tpb=8)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "\n",
     "print(df.head())"
    ]
   },
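
The deleted cells timed `apply_rows` over a haversine kernel; the kernel body itself is plain scalar math, so the formula can be checked outside cuDF. A stdlib-only sketch of the same computation (the coordinates below are made-up sample points; `r=6371` is the Earth radius in km, as in the notebook):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_distance(lat1, lon1, lat2, lon2, r=6371.0):
    """Great-circle distance in km; the same formula the removed
    apply_rows kernel evaluated once per row."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * r * asin(sqrt(a))

# Sample pair of points (roughly New York -> London)
d = haversine_distance(40.7128, -74.0060, 51.5074, -0.1278)
print(round(d), "km")
```

`apply_rows` JIT-compiles a kernel like this with Numba and runs it element-wise over the named input columns, so validating the scalar function first is a cheap way to debug the GPU version.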

+ 21 - 603
ai/RAPIDS/English/Python/jupyter_notebook/CuDF/Backup.ipynb

@@ -56,7 +56,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 1,
+   "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -100,243 +100,9 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 3,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<style scoped>\n",
-       "    .dataframe tbody tr th:only-of-type {\n",
-       "        vertical-align: middle;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe tbody tr th {\n",
-       "        vertical-align: top;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe thead th {\n",
-       "        text-align: right;\n",
-       "    }\n",
-       "</style>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>TMC</th>\n",
-       "      <th>Severity</th>\n",
-       "      <th>Start_Lat</th>\n",
-       "      <th>Start_Lng</th>\n",
-       "      <th>End_Lat</th>\n",
-       "      <th>End_Lng</th>\n",
-       "      <th>Distance(mi)</th>\n",
-       "      <th>Number</th>\n",
-       "      <th>Temperature(F)</th>\n",
-       "      <th>Wind_Chill(F)</th>\n",
-       "      <th>Humidity(%)</th>\n",
-       "      <th>Pressure(in)</th>\n",
-       "      <th>Visibility(mi)</th>\n",
-       "      <th>Wind_Speed(mph)</th>\n",
-       "      <th>Precipitation(in)</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>count</th>\n",
-       "      <td>2.478818e+06</td>\n",
-       "      <td>3.513617e+06</td>\n",
-       "      <td>3.513617e+06</td>\n",
-       "      <td>3.513617e+06</td>\n",
-       "      <td>1.034799e+06</td>\n",
-       "      <td>1.034799e+06</td>\n",
-       "      <td>3.513617e+06</td>\n",
-       "      <td>1.250753e+06</td>\n",
-       "      <td>3.447885e+06</td>\n",
-       "      <td>1.645368e+06</td>\n",
-       "      <td>3.443930e+06</td>\n",
-       "      <td>3.457735e+06</td>\n",
-       "      <td>3.437761e+06</td>\n",
-       "      <td>3.059008e+06</td>\n",
-       "      <td>1.487743e+06</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>mean</th>\n",
-       "      <td>2.080226e+02</td>\n",
-       "      <td>2.339929e+00</td>\n",
-       "      <td>3.654194e+01</td>\n",
-       "      <td>-9.579151e+01</td>\n",
-       "      <td>3.755758e+01</td>\n",
-       "      <td>-1.004560e+02</td>\n",
-       "      <td>2.816170e-01</td>\n",
-       "      <td>5.975383e+03</td>\n",
-       "      <td>6.193512e+01</td>\n",
-       "      <td>5.355730e+01</td>\n",
-       "      <td>6.511427e+01</td>\n",
-       "      <td>2.974463e+01</td>\n",
-       "      <td>9.122644e+00</td>\n",
-       "      <td>8.219025e+00</td>\n",
-       "      <td>1.598300e-02</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>std</th>\n",
-       "      <td>2.076627e+01</td>\n",
-       "      <td>5.521930e-01</td>\n",
-       "      <td>4.883520e+00</td>\n",
-       "      <td>1.736877e+01</td>\n",
-       "      <td>4.861215e+00</td>\n",
-       "      <td>1.852879e+01</td>\n",
-       "      <td>1.550134e+00</td>\n",
-       "      <td>1.496624e+04</td>\n",
-       "      <td>1.862106e+01</td>\n",
-       "      <td>2.377334e+01</td>\n",
-       "      <td>2.275558e+01</td>\n",
-       "      <td>8.319760e-01</td>\n",
-       "      <td>2.885879e+00</td>\n",
-       "      <td>5.262847e+00</td>\n",
-       "      <td>1.928260e-01</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>min</th>\n",
-       "      <td>2.000000e+02</td>\n",
-       "      <td>1.000000e+00</td>\n",
-       "      <td>2.455527e+01</td>\n",
-       "      <td>-1.246238e+02</td>\n",
-       "      <td>2.457011e+01</td>\n",
-       "      <td>-1.244978e+02</td>\n",
-       "      <td>0.000000e+00</td>\n",
-       "      <td>0.000000e+00</td>\n",
-       "      <td>-8.900000e+01</td>\n",
-       "      <td>-8.900000e+01</td>\n",
-       "      <td>1.000000e+00</td>\n",
-       "      <td>0.000000e+00</td>\n",
-       "      <td>0.000000e+00</td>\n",
-       "      <td>0.000000e+00</td>\n",
-       "      <td>0.000000e+00</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>25%</th>\n",
-       "      <td>2.010000e+02</td>\n",
-       "      <td>2.000000e+00</td>\n",
-       "      <td>3.363784e+01</td>\n",
-       "      <td>-1.174418e+02</td>\n",
-       "      <td>3.399477e+01</td>\n",
-       "      <td>-1.183440e+02</td>\n",
-       "      <td>0.000000e+00</td>\n",
-       "      <td>8.640000e+02</td>\n",
-       "      <td>5.000000e+01</td>\n",
-       "      <td>3.570000e+01</td>\n",
-       "      <td>4.800000e+01</td>\n",
-       "      <td>2.973000e+01</td>\n",
-       "      <td>1.000000e+01</td>\n",
-       "      <td>5.000000e+00</td>\n",
-       "      <td>0.000000e+00</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>50%</th>\n",
-       "      <td>2.010000e+02</td>\n",
-       "      <td>2.000000e+00</td>\n",
-       "      <td>3.591687e+01</td>\n",
-       "      <td>-9.102601e+01</td>\n",
-       "      <td>3.779736e+01</td>\n",
-       "      <td>-9.703438e+01</td>\n",
-       "      <td>0.000000e+00</td>\n",
-       "      <td>2.798000e+03</td>\n",
-       "      <td>6.400000e+01</td>\n",
-       "      <td>5.700000e+01</td>\n",
-       "      <td>6.700000e+01</td>\n",
-       "      <td>2.995000e+01</td>\n",
-       "      <td>1.000000e+01</td>\n",
-       "      <td>7.000000e+00</td>\n",
-       "      <td>0.000000e+00</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>75%</th>\n",
-       "      <td>2.010000e+02</td>\n",
-       "      <td>3.000000e+00</td>\n",
-       "      <td>4.032217e+01</td>\n",
-       "      <td>-8.093299e+01</td>\n",
-       "      <td>4.105139e+01</td>\n",
-       "      <td>-8.210168e+01</td>\n",
-       "      <td>1.000000e-02</td>\n",
-       "      <td>7.098000e+03</td>\n",
-       "      <td>7.590000e+01</td>\n",
-       "      <td>7.200000e+01</td>\n",
-       "      <td>8.400000e+01</td>\n",
-       "      <td>3.009000e+01</td>\n",
-       "      <td>1.000000e+01</td>\n",
-       "      <td>1.150000e+01</td>\n",
-       "      <td>0.000000e+00</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>max</th>\n",
-       "      <td>4.060000e+02</td>\n",
-       "      <td>4.000000e+00</td>\n",
-       "      <td>4.900220e+01</td>\n",
-       "      <td>-6.711317e+01</td>\n",
-       "      <td>4.907500e+01</td>\n",
-       "      <td>-6.710924e+01</td>\n",
-       "      <td>3.336300e+02</td>\n",
-       "      <td>9.999997e+06</td>\n",
-       "      <td>1.706000e+02</td>\n",
-       "      <td>1.150000e+02</td>\n",
-       "      <td>1.000000e+02</td>\n",
-       "      <td>5.774000e+01</td>\n",
-       "      <td>1.400000e+02</td>\n",
-       "      <td>9.840000e+02</td>\n",
-       "      <td>2.500000e+01</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "                TMC      Severity     Start_Lat     Start_Lng       End_Lat  \\\n",
-       "count  2.478818e+06  3.513617e+06  3.513617e+06  3.513617e+06  1.034799e+06   \n",
-       "mean   2.080226e+02  2.339929e+00  3.654194e+01 -9.579151e+01  3.755758e+01   \n",
-       "std    2.076627e+01  5.521930e-01  4.883520e+00  1.736877e+01  4.861215e+00   \n",
-       "min    2.000000e+02  1.000000e+00  2.455527e+01 -1.246238e+02  2.457011e+01   \n",
-       "25%    2.010000e+02  2.000000e+00  3.363784e+01 -1.174418e+02  3.399477e+01   \n",
-       "50%    2.010000e+02  2.000000e+00  3.591687e+01 -9.102601e+01  3.779736e+01   \n",
-       "75%    2.010000e+02  3.000000e+00  4.032217e+01 -8.093299e+01  4.105139e+01   \n",
-       "max    4.060000e+02  4.000000e+00  4.900220e+01 -6.711317e+01  4.907500e+01   \n",
-       "\n",
-       "            End_Lng  Distance(mi)        Number  Temperature(F)  \\\n",
-       "count  1.034799e+06  3.513617e+06  1.250753e+06    3.447885e+06   \n",
-       "mean  -1.004560e+02  2.816170e-01  5.975383e+03    6.193512e+01   \n",
-       "std    1.852879e+01  1.550134e+00  1.496624e+04    1.862106e+01   \n",
-       "min   -1.244978e+02  0.000000e+00  0.000000e+00   -8.900000e+01   \n",
-       "25%   -1.183440e+02  0.000000e+00  8.640000e+02    5.000000e+01   \n",
-       "50%   -9.703438e+01  0.000000e+00  2.798000e+03    6.400000e+01   \n",
-       "75%   -8.210168e+01  1.000000e-02  7.098000e+03    7.590000e+01   \n",
-       "max   -6.710924e+01  3.336300e+02  9.999997e+06    1.706000e+02   \n",
-       "\n",
-       "       Wind_Chill(F)   Humidity(%)  Pressure(in)  Visibility(mi)  \\\n",
-       "count   1.645368e+06  3.443930e+06  3.457735e+06    3.437761e+06   \n",
-       "mean    5.355730e+01  6.511427e+01  2.974463e+01    9.122644e+00   \n",
-       "std     2.377334e+01  2.275558e+01  8.319760e-01    2.885879e+00   \n",
-       "min    -8.900000e+01  1.000000e+00  0.000000e+00    0.000000e+00   \n",
-       "25%     3.570000e+01  4.800000e+01  2.973000e+01    1.000000e+01   \n",
-       "50%     5.700000e+01  6.700000e+01  2.995000e+01    1.000000e+01   \n",
-       "75%     7.200000e+01  8.400000e+01  3.009000e+01    1.000000e+01   \n",
-       "max     1.150000e+02  1.000000e+02  5.774000e+01    1.400000e+02   \n",
-       "\n",
-       "       Wind_Speed(mph)  Precipitation(in)  \n",
-       "count     3.059008e+06       1.487743e+06  \n",
-       "mean      8.219025e+00       1.598300e-02  \n",
-       "std       5.262847e+00       1.928260e-01  \n",
-       "min       0.000000e+00       0.000000e+00  \n",
-       "25%       5.000000e+00       0.000000e+00  \n",
-       "50%       7.000000e+00       0.000000e+00  \n",
-       "75%       1.150000e+01       0.000000e+00  \n",
-       "max       9.840000e+02       2.500000e+01  "
-      ]
-     },
-     "execution_count": 3,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
    "source": [
     "df.describe()"
    ]
@@ -350,20 +116,9 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 4,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "3513617"
-      ]
-     },
-     "execution_count": 4,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
    "source": [
     "len(df)"
    ]
@@ -379,72 +134,9 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 5,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "<class 'cudf.core.dataframe.DataFrame'>\n",
-      "RangeIndex: 3513617 entries, 0 to 3513616\n",
-      "Data columns (total 49 columns):\n",
-      " #   Column                 Dtype\n",
-      "---  ------                 -----\n",
-      " 0   ID                     object\n",
-      " 1   Source                 object\n",
-      " 2   TMC                    float64\n",
-      " 3   Severity               int64\n",
-      " 4   Start_Time             object\n",
-      " 5   End_Time               object\n",
-      " 6   Start_Lat              float64\n",
-      " 7   Start_Lng              float64\n",
-      " 8   End_Lat                float64\n",
-      " 9   End_Lng                float64\n",
-      " 10  Distance(mi)           float64\n",
-      " 11  Description            object\n",
-      " 12  Number                 float64\n",
-      " 13  Street                 object\n",
-      " 14  Side                   object\n",
-      " 15  City                   object\n",
-      " 16  County                 object\n",
-      " 17  State                  object\n",
-      " 18  Zipcode                object\n",
-      " 19  Country                object\n",
-      " 20  Timezone               object\n",
-      " 21  Airport_Code           object\n",
-      " 22  Weather_Timestamp      object\n",
-      " 23  Temperature(F)         float64\n",
-      " 24  Wind_Chill(F)          float64\n",
-      " 25  Humidity(%)            float64\n",
-      " 26  Pressure(in)           float64\n",
-      " 27  Visibility(mi)         float64\n",
-      " 28  Wind_Direction         object\n",
-      " 29  Wind_Speed(mph)        float64\n",
-      " 30  Precipitation(in)      float64\n",
-      " 31  Weather_Condition      object\n",
-      " 32  Amenity                bool\n",
-      " 33  Bump                   bool\n",
-      " 34  Crossing               bool\n",
-      " 35  Give_Way               bool\n",
-      " 36  Junction               bool\n",
-      " 37  No_Exit                bool\n",
-      " 38  Railway                bool\n",
-      " 39  Roundabout             bool\n",
-      " 40  Station                bool\n",
-      " 41  Stop                   bool\n",
-      " 42  Traffic_Calming        bool\n",
-      " 43  Traffic_Signal         bool\n",
-      " 44  Turning_Loop           bool\n",
-      " 45  Sunrise_Sunset         object\n",
-      " 46  Civil_Twilight         object\n",
-      " 47  Nautical_Twilight      object\n",
-      " 48  Astronomical_Twilight  object\n",
-      "dtypes: bool(13), float64(14), int64(1), object(21)\n",
-      "memory usage: 1.4+ GB\n"
-     ]
-    }
-   ],
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
    "source": [
     "df.info()"
    ]
@@ -458,69 +150,9 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 6,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "ID                             0\n",
-       "Source                         0\n",
-       "TMC                      1034799\n",
-       "Severity                       0\n",
-       "Start_Time                     0\n",
-       "End_Time                       0\n",
-       "Start_Lat                      0\n",
-       "Start_Lng                      0\n",
-       "End_Lat                  2478818\n",
-       "End_Lng                  2478818\n",
-       "Distance(mi)                   0\n",
-       "Description                    1\n",
-       "Number                   2262864\n",
-       "Street                         0\n",
-       "Side                           0\n",
-       "City                         112\n",
-       "County                         0\n",
-       "State                          0\n",
-       "Zipcode                     1069\n",
-       "Country                        0\n",
-       "Timezone                    3880\n",
-       "Airport_Code                6758\n",
-       "Weather_Timestamp          43323\n",
-       "Temperature(F)             65732\n",
-       "Wind_Chill(F)            1868249\n",
-       "Humidity(%)                69687\n",
-       "Pressure(in)               55882\n",
-       "Visibility(mi)             75856\n",
-       "Wind_Direction             58874\n",
-       "Wind_Speed(mph)           454609\n",
-       "Precipitation(in)        2025874\n",
-       "Weather_Condition          76138\n",
-       "Amenity                        0\n",
-       "Bump                           0\n",
-       "Crossing                       0\n",
-       "Give_Way                       0\n",
-       "Junction                       0\n",
-       "No_Exit                        0\n",
-       "Railway                        0\n",
-       "Roundabout                     0\n",
-       "Station                        0\n",
-       "Stop                           0\n",
-       "Traffic_Calming                0\n",
-       "Traffic_Signal                 0\n",
-       "Turning_Loop                   0\n",
-       "Sunrise_Sunset               115\n",
-       "Civil_Twilight               115\n",
-       "Nautical_Twilight            115\n",
-       "Astronomical_Twilight        115\n",
-       "dtype: uint64"
-      ]
-     },
-     "execution_count": 6,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
    "source": [
     "df.isna().sum()"
    ]
@@ -534,7 +166,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 7,
+   "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -543,7 +175,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 8,
+   "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -586,7 +218,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 12,
+   "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -596,7 +228,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 13,
+   "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -633,7 +265,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 15,
+   "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -670,223 +302,9 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 17,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<style scoped>\n",
-       "    .dataframe tbody tr th:only-of-type {\n",
-       "        vertical-align: middle;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe tbody tr th {\n",
-       "        vertical-align: top;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe thead th {\n",
-       "        text-align: right;\n",
-       "    }\n",
-       "</style>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>Source</th>\n",
-       "      <th>TMC</th>\n",
-       "      <th>Severity</th>\n",
-       "      <th>Start_Lat</th>\n",
-       "      <th>Start_Lng</th>\n",
-       "      <th>End_Lat</th>\n",
-       "      <th>End_Lng</th>\n",
-       "      <th>Distance(mi)</th>\n",
-       "      <th>Temperature(F)</th>\n",
-       "      <th>Humidity(%)</th>\n",
-       "      <th>...</th>\n",
-       "      <th>WC_Thunderstorm</th>\n",
-       "      <th>WC_Thunderstorms and Rain</th>\n",
-       "      <th>WC_Thunderstorms and Snow</th>\n",
-       "      <th>WC_Tornado</th>\n",
-       "      <th>WC_Volcanic Ash</th>\n",
-       "      <th>WC_Widespread Dust</th>\n",
-       "      <th>WC_Widespread Dust / Windy</th>\n",
-       "      <th>WC_Wintry Mix</th>\n",
-       "      <th>WC_Wintry Mix / Windy</th>\n",
-       "      <th>out</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>0</th>\n",
-       "      <td>MapQuest</td>\n",
-       "      <td>201.0</td>\n",
-       "      <td>3</td>\n",
-       "      <td>39.865147</td>\n",
-       "      <td>-84.058723</td>\n",
-       "      <td>37.557578</td>\n",
-       "      <td>-100.455981</td>\n",
-       "      <td>0.01</td>\n",
-       "      <td>36.9</td>\n",
-       "      <td>91.0</td>\n",
-       "      <td>...</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>1443.524390</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>1</th>\n",
-       "      <td>MapQuest</td>\n",
-       "      <td>201.0</td>\n",
-       "      <td>2</td>\n",
-       "      <td>39.928059</td>\n",
-       "      <td>-82.831184</td>\n",
-       "      <td>37.557578</td>\n",
-       "      <td>-100.455981</td>\n",
-       "      <td>0.01</td>\n",
-       "      <td>37.9</td>\n",
-       "      <td>100.0</td>\n",
-       "      <td>...</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>1548.467903</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>2</th>\n",
-       "      <td>MapQuest</td>\n",
-       "      <td>201.0</td>\n",
-       "      <td>2</td>\n",
-       "      <td>39.063148</td>\n",
-       "      <td>-84.032608</td>\n",
-       "      <td>37.557578</td>\n",
-       "      <td>-100.455981</td>\n",
-       "      <td>0.01</td>\n",
-       "      <td>36.0</td>\n",
-       "      <td>100.0</td>\n",
-       "      <td>...</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>1440.697621</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>3</th>\n",
-       "      <td>MapQuest</td>\n",
-       "      <td>201.0</td>\n",
-       "      <td>3</td>\n",
-       "      <td>39.747753</td>\n",
-       "      <td>-84.205582</td>\n",
-       "      <td>37.557578</td>\n",
-       "      <td>-100.455981</td>\n",
-       "      <td>0.01</td>\n",
-       "      <td>35.1</td>\n",
-       "      <td>96.0</td>\n",
-       "      <td>...</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>1429.927497</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>4</th>\n",
-       "      <td>MapQuest</td>\n",
-       "      <td>201.0</td>\n",
-       "      <td>2</td>\n",
-       "      <td>39.627781</td>\n",
-       "      <td>-84.188354</td>\n",
-       "      <td>37.557578</td>\n",
-       "      <td>-100.455981</td>\n",
-       "      <td>0.01</td>\n",
-       "      <td>36.0</td>\n",
-       "      <td>89.0</td>\n",
-       "      <td>...</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>1430.383177</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "<p>5 rows × 1930 columns</p>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "     Source    TMC  Severity  Start_Lat  Start_Lng    End_Lat     End_Lng  \\\n",
-       "0  MapQuest  201.0         3  39.865147 -84.058723  37.557578 -100.455981   \n",
-       "1  MapQuest  201.0         2  39.928059 -82.831184  37.557578 -100.455981   \n",
-       "2  MapQuest  201.0         2  39.063148 -84.032608  37.557578 -100.455981   \n",
-       "3  MapQuest  201.0         3  39.747753 -84.205582  37.557578 -100.455981   \n",
-       "4  MapQuest  201.0         2  39.627781 -84.188354  37.557578 -100.455981   \n",
-       "\n",
-       "   Distance(mi)  Temperature(F)  Humidity(%)  ...  WC_Thunderstorm  \\\n",
-       "0          0.01            36.9         91.0  ...                0   \n",
-       "1          0.01            37.9        100.0  ...                0   \n",
-       "2          0.01            36.0        100.0  ...                0   \n",
-       "3          0.01            35.1         96.0  ...                0   \n",
-       "4          0.01            36.0         89.0  ...                0   \n",
-       "\n",
-       "   WC_Thunderstorms and Rain  WC_Thunderstorms and Snow  WC_Tornado  \\\n",
-       "0                          0                          0           0   \n",
-       "1                          0                          0           0   \n",
-       "2                          0                          0           0   \n",
-       "3                          0                          0           0   \n",
-       "4                          0                          0           0   \n",
-       "\n",
-       "   WC_Volcanic Ash  WC_Widespread Dust  WC_Widespread Dust / Windy  \\\n",
-       "0                0                   0                           0   \n",
-       "1                0                   0                           0   \n",
-       "2                0                   0                           0   \n",
-       "3                0                   0                           0   \n",
-       "4                0                   0                           0   \n",
-       "\n",
-       "   WC_Wintry Mix  WC_Wintry Mix / Windy          out  \n",
-       "0              0                      0  1443.524390  \n",
-       "1              0                      0  1548.467903  \n",
-       "2              0                      0  1440.697621  \n",
-       "3              0                      0  1429.927497  \n",
-       "4              0                      0  1430.383177  \n",
-       "\n",
-       "[5 rows x 1930 columns]"
-      ]
-     },
-     "execution_count": 17,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
    "source": [
     "df.head()"
    ]

Diff suppressed because it is too large
+ 10 - 39
ai/RAPIDS/English/Python/jupyter_notebook/CuML/01-LinearRegression-Hyperparam.ipynb


+ 0 - 515
ai/RAPIDS/English/Python/jupyter_notebook/CuML/02-SGD.ipynb

@@ -1,515 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "&emsp;&emsp;&emsp;&emsp;&emsp;\n",
-    "&emsp;&emsp;&emsp;&emsp;&emsp;\n",
-    "&emsp;&emsp;&emsp;&emsp;&emsp;\n",
-    "&emsp;&emsp;&emsp;&emsp;&emsp;\n",
-    "&emsp;&emsp;&emsp;&emsp;&emsp;\n",
-    "&emsp;&emsp;&ensp;\n",
-    "[Home Page](../START_HERE.ipynb)\n",
-    "\n",
-    "[Previous Notebook](01-LinearRegression-Hyperparam.ipynb)\n",
-    "&emsp;&emsp;&emsp;&emsp;&emsp;\n",
-    "&emsp;&emsp;&emsp;&emsp;&emsp;\n",
-    "&emsp;&emsp;&emsp;&emsp;&emsp;\n",
-    "&emsp;&emsp;&emsp;&emsp;&emsp;\n",
-    "[1](01-LinearRegression-Hyperparam.ipynb)\n",
-    "[2]\n",
-    "[3](03_CuML_Exercise.ipynb)\n",
-    "&emsp;&emsp;&emsp;&emsp;&emsp;\n",
-    "&emsp;&emsp;&emsp;&emsp;&emsp;\n",
-    "&emsp;&emsp;&emsp;&emsp;&emsp;\n",
-    "&emsp;&emsp;&emsp;&emsp;&emsp;\n",
-    "[Next Notebook](03_CuML_Exercise.ipynb)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Mini Batch SGD classifier and regressor\n",
-    "Mini Batch SGD (MBSGD) models are linear models which are fitted by minimizing a regularized empirical loss with mini-batch SGD. In this notebook we compare the performance of cuMl's MBSGD classifier and regressor models with their respective scikit-learn counterparts.\n",
-    "\n",
-    "The model can take array-like objects, either in host as NumPy arrays or in device (as Numba or cuda_array_interface-compliant), as well as cuDF DataFrames as the input.\n",
-    "\n",
-    "For information about cuDF, refer to the cuDF documentation: https://rapidsai.github.io/projects/cudf/en/latest/"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Here is the list of exercises and modules in the lab:\n",
-    "\n",
-    "- <a href='#ex1'>Define Parameters</a><br> First we will define the data and model parameters, as we will be generating the data based on them later and creating a model to fit on the data.\n",
-    "- <a href='#ex2'>Generate Data</a><br> We will generate the data on the host device and then make them available to GPU using CuDF dataframes.\n",
-    "- <a href='#ex3'>Scikit-learn model</a><br> Here we create the MBSGD model in Scikit-learn for easy conversion to CuML format later.\n",
-    "- <a href='#ex4'>CuML model</a><br> Now we will convert the Scikit-learn implementation to CuML.\n",
-    "- <a href='#ex5'>Evaluate Results</a><br> Evaluate the performance of both models with respect to speed and accuracy.\n"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import cudf as gd\n",
-    "import cuml\n",
-    "import numpy as np\n",
-    "import pandas as pd\n",
-    "import sklearn\n",
-    "\n",
-    "from sklearn import linear_model\n",
-    "from sklearn.datasets.samples_generator import make_classification, make_regression\n",
-    "from sklearn.metrics import accuracy_score, r2_score\n",
-    "from sklearn.model_selection import train_test_split"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "<a id='ex1'></a>\n",
-    "\n",
-    "## Define parameters\n",
-    "\n",
-    "### Data parameters"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "num_samples = 2**13\n",
-    "num_features = 300\n",
-    "n_informative = 270\n",
-    "random_state = 0\n",
-    "train_size = 0.8\n",
-    "datatype = np.float32"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### Model parameters\n",
-    "\n",
-    "- learning_ratestr, default=’optimal’\n",
-    "The learning rate schedule:\n",
-    "\n",
-    "    ‘constant’: eta = eta0\n",
-    "\n",
-    "    ‘optimal’: eta = 1.0 / (alpha * (t + t0)) where t0 is chosen by a heuristic proposed by Leon Bottou.\n",
-    "\n",
-    "    ‘invscaling’: eta = eta0 / pow(t, power_t)\n",
-    "\n",
-    "    ‘adaptive’: eta = eta0, as long as the training keeps decreasing. Each time n_iter_no_change consecutive epochs fail to decrease the training loss by tol or fail to increase validation score by tol if early_stopping is True, the current learning rate is divided by 5.\n",
-    "    \n",
-    "- penalty{‘l2’, ‘l1’, ‘elasticnet’}, default=’l2’\n",
-    "The penalty (aka regularization term) to be used. Defaults to ‘l2’ which is the standard regularizer for linear SVM models. ‘l1’ and ‘elasticnet’ might bring sparsity to the model (feature selection) not achievable with ‘l2’.\n",
-    "\n",
-    "- eta0 double, default=0.0\n",
-    "The initial learning rate for the ‘constant’, ‘invscaling’ or ‘adaptive’ schedules. The default value is 0.0 as eta0 is not used by the default schedule ‘optimal’.\n",
-    "\n",
-    "- max_iter int, default=1000\n",
-    "The maximum number of passes over the training data (aka epochs). It only impacts the behavior in the fit method, and not the partial_fit method.\n",
-    "\n",
-    "- fit_intercept bool, default=True\n",
-    "Whether the intercept should be estimated or not. If False, the data is assumed to be already centered.\n",
-    "\n",
-    "- tol float, default=1e-3\n",
-    "The stopping criterion. If it is not None, training will stop when (loss > best_loss - tol) for n_iter_no_change consecutive epochs. "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "learning_rate = 'constant'\n",
-    "penalty = 'elasticnet'\n",
-    "eta0 = 0.005\n",
-    "max_iter = 100\n",
-    "fit_intercept = True\n",
-    "tol=0.0\n",
-    "batch_size=2"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "<a id='ex2'></a>\n",
-    "\n",
-    "## Generate data\n",
-    "\n",
-    "### Host"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "%%time\n",
-    "X_class, y_class = make_classification(n_samples=num_samples, n_features=num_features,\n",
-    "                                       n_informative=n_informative, random_state=random_state)\n",
-    "# change the datatype of the input data\n",
-    "X_class = X_class.astype(datatype)\n",
-    "y_class = y_class.astype(datatype)\n",
-    "\n",
-    "# convert numpy arrays to pandas dataframe\n",
-    "X_class = pd.DataFrame(X_class)\n",
-    "y_class = pd.DataFrame(y_class)\n",
-    "\n",
-    "X_class_train, X_class_test, y_class_train, y_class_test = train_test_split(X_class, y_class,\n",
-    "                                                                            train_size=train_size,\n",
-    "                                                                            random_state=random_state)\n",
-    "X_reg, y_reg = make_regression(n_samples=num_samples, n_features=num_features,\n",
-    "                               n_informative=n_informative, random_state=random_state)\n",
-    "\n",
-    "# change the datatype of the input data\n",
-    "X_reg = X_reg.astype(datatype)\n",
-    "y_reg = y_reg.astype(datatype)\n",
-    "\n",
-    "# convert numpy arrays to pandas dataframe\n",
-    "X_reg = pd.DataFrame(X_reg)\n",
-    "y_reg = pd.DataFrame(y_reg)\n",
-    "\n",
-    "X_reg_train, X_reg_test, y_reg_train, y_reg_test = train_test_split(X_reg, y_reg,\n",
-    "                                                                    train_size=train_size,\n",
-    "                                                                    random_state=random_state)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### GPU"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "%%time\n",
-    "# classification dataset\n",
-    "X_class_cudf = gd.DataFrame.from_pandas(X_class_train)\n",
-    "X_class_cudf_test = gd.DataFrame.from_pandas(X_class_test)\n",
-    "\n",
-    "y_class_cudf = gd.Series(y_class_train.values[:,0])\n",
-    "\n",
-    "# regression dataset\n",
-    "X_reg_cudf = gd.DataFrame.from_pandas(X_reg_train)\n",
-    "X_reg_cudf_test = gd.DataFrame.from_pandas(X_reg_test)\n",
-    "\n",
-    "y_reg_cudf = gd.Series(y_reg_train.values[:,0])"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "<a id='ex3'></a>\n",
-    "\n",
-    "## Scikit-learn Model\n",
-    "\n",
-    "### Classification :\n",
-    "\n",
-    "#### Fit"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "%%time\n",
-    "skl_sgd_classifier = sklearn.linear_model.SGDClassifier(learning_rate=learning_rate,\n",
-    "                                                        eta0=eta0,\n",
-    "                                                        max_iter=max_iter,\n",
-    "                                                        fit_intercept=fit_intercept,\n",
-    "                                                        tol=tol,\n",
-    "                                                        penalty=penalty,\n",
-    "                                                        random_state=random_state)\n",
-    "\n",
-    "skl_sgd_classifier.fit(X_class_train, y_class_train)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "#### Predict"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "%%time\n",
-    "skl_class_pred = skl_sgd_classifier.predict(X_class_test)\n",
-    "skl_class_acc = accuracy_score(skl_class_pred, y_class_test)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Scikit-learn Model\n",
-    "\n",
-    "### Regression :\n",
-    "\n",
-    "#### Fit"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "%%time\n",
-    "skl_sgd_regressor = sklearn.linear_model.SGDRegressor(learning_rate=learning_rate,\n",
-    "                                                      eta0=eta0,\n",
-    "                                                      max_iter=max_iter,\n",
-    "                                                      fit_intercept=fit_intercept,\n",
-    "                                                      tol=tol,\n",
-    "                                                      penalty=penalty,\n",
-    "                                                      random_state=random_state)\n",
-    "\n",
-    "skl_sgd_regressor.fit(X_reg_train, y_reg_train)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "#### Predict"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "%%time\n",
-    "skl_reg_pred = skl_sgd_regressor.predict(X_reg_test)\n",
-    "skl_reg_r2 = r2_score(skl_reg_pred, y_reg_test)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "<a id='ex4'></a>\n",
-    "\n",
-    "## cuML Model\n",
-    "\n",
-    "### Classification:\n",
-    "\n",
-    "#### Fit"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "%%time\n",
-    "cu_mbsgd_classifier = cuml.linear_model.MBSGDClassifier(learning_rate=learning_rate,\n",
-    "                                                        eta0=eta0,\n",
-    "                                                        epochs=max_iter,\n",
-    "                                                        fit_intercept=fit_intercept,\n",
-    "                                                        batch_size=batch_size,\n",
-    "                                                        tol=tol,\n",
-    "                                                        penalty=penalty)\n",
-    "\n",
-    "cu_mbsgd_classifier.fit(X_class_cudf, y_class_cudf)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "#### Predict"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "%%time\n",
-    "cu_class_pred = cu_mbsgd_classifier.predict(X_class_cudf_test).to_array()\n",
-    "cu_class_acc = accuracy_score(cu_class_pred, y_class_test)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### Regression:\n",
-    "\n",
-    "#### Fit"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "%%time\n",
-    "cu_mbsgd_regressor = cuml.linear_model.MBSGDRegressor(learning_rate=learning_rate,\n",
-    "                                                      eta0=eta0,\n",
-    "                                                      epochs=max_iter,\n",
-    "                                                      fit_intercept=fit_intercept,\n",
-    "                                                      batch_size=batch_size,\n",
-    "                                                      tol=tol,\n",
-    "                                                      penalty=penalty)\n",
-    "\n",
-    "cu_mbsgd_regressor.fit(X_reg_cudf, y_reg_cudf)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "#### Predict"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "%%time\n",
-    "cu_reg_pred = cu_mbsgd_regressor.predict(X_reg_cudf_test).to_array()\n",
-    "cu_reg_r2 = r2_score(cu_reg_pred, y_reg_test)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "<a id='ex5'></a>\n",
-    "\n",
-    "## Evaluate Results\n",
-    "\n",
-    "### Classification"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "print(\"Sklearn's R^2 score for classification : %s\" % skl_class_acc)\n",
-    "print(\"cuML's R^2 score for classification : %s\" % cu_class_acc)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### Regression"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "print(\"Sklearn's R^2 score for regression : %s\" % skl_reg_r2)\n",
-    "print(\"cuML's R^2 score for regression : %s\" % cu_reg_r2)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Licensing\n",
-    "  \n",
-    "This material is released by NVIDIA Corporation under the Creative Commons Attribution 4.0 International (CC BY 4.0)."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Conclusion\n",
-    "\n",
-    "Now we know how to create both simple and complex machine learning models and deal with different data types using CuML and CUDf. If you would like to explore more models refer to the documentation here: https://docs.rapids.ai/api/cuml/stable/api.html#regression-and-classification or try out our bonus lab [here](Bonus_Lab-LogisticRegression.ipynb). If you are feeling fairly confident about CuML now, head over to the next lab which will test your skills with an interesting exercise."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "[Previous Notebook](01-LinearRegression-Hyperparam.ipynb)\n",
-    "&emsp;&emsp;&emsp;&emsp;&emsp;\n",
-    "&emsp;&emsp;&emsp;&emsp;&emsp;\n",
-    "&emsp;&emsp;&emsp;&emsp;&emsp;\n",
-    "&emsp;&emsp;&emsp;&emsp;&emsp;\n",
-    "[1](01-LinearRegression-Hyperparam.ipynb)\n",
-    "[2]\n",
-    "[3](03_CuML_Exercise.ipynb)\n",
-    "&emsp;&emsp;&emsp;&emsp;&emsp;\n",
-    "&emsp;&emsp;&emsp;&emsp;&emsp;\n",
-    "&emsp;&emsp;&emsp;&emsp;&emsp;\n",
-    "&emsp;&emsp;&emsp;&emsp;&emsp;\n",
-    "[Next Notebook](03_CuML_Exercise.ipynb)\n",
-    "\n",
-    "&emsp;&emsp;&emsp;&emsp;&emsp;\n",
-    "&emsp;&emsp;&emsp;&emsp;&emsp;\n",
-    "&emsp;&emsp;&emsp;&emsp;&emsp;\n",
-    "&emsp;&emsp;&emsp;&emsp;&emsp;\n",
-    "&emsp;&emsp;&emsp;&emsp;&emsp;\n",
-    "&emsp;&emsp;&ensp;\n",
-    "[Home Page](../START_HERE.ipynb)"
-   ]
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 3",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.6.2"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 4
-}
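The notebook deleted above compared cuML's `MBSGDClassifier` against scikit-learn's `SGDClassifier` on synthetic data. Its core flow can be sketched with just the CPU half, which is runnable without a GPU (dataset sizes are scaled down from the notebook's; the cuML counterpart is indicated in a comment and assumes a CUDA-capable environment):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic dataset mirroring the deleted notebook's setup (scaled down).
X, y = make_classification(n_samples=2**10, n_features=30,
                           n_informative=27, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=0)

# CPU baseline: SGD with the same hyperparameters the notebook used.
clf = SGDClassifier(learning_rate="constant", eta0=0.005, max_iter=100,
                    fit_intercept=True, tol=None, penalty="elasticnet",
                    random_state=0)
clf.fit(X_train, y_train)
acc = accuracy_score(y_test, clf.predict(X_test))
print(f"accuracy: {acc:.3f}")

# GPU counterpart (sketch, not run here): cuML exposes the batch size and
# names the iteration cap 'epochs' rather than 'max_iter':
# cu_clf = cuml.linear_model.MBSGDClassifier(
#     learning_rate="constant", eta0=0.005, epochs=100,
#     fit_intercept=True, batch_size=2, tol=0.0, penalty="elasticnet")
```

Note the classification comparison is scored with `accuracy_score`, so the "R^2 score for classification" label printed by the deleted notebook was actually an accuracy.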

+ 6 - 9
ai/RAPIDS/English/Python/jupyter_notebook/CuML/03_CuML_Exercise.ipynb

@@ -12,14 +12,13 @@
     "&emsp;&emsp;&ensp;\n",
     "[Home Page](../START_HERE.ipynb)\n",
     "\n",
-    "[Previous Notebook](02-SGD.ipynb)\n",
+    "[Previous Notebook](01-LinearRegression-Hyperparam.ipynb)\n",
     "&emsp;&emsp;&emsp;&emsp;&emsp;\n",
     "&emsp;&emsp;&emsp;&emsp;&emsp;\n",
     "&emsp;&emsp;&emsp;&emsp;&emsp;\n",
     "&emsp;&emsp;&emsp;&emsp;&emsp;\n",
     "[1](01-LinearRegression-Hyperparam.ipynb)\n",
-    "[2](02-SGD.ipynb)\n",
-    "[3]"
+    "[2]"
    ]
   },
   {
@@ -190,7 +189,7 @@
    "outputs": [],
    "source": [
     "%%time\n",
-    "link to label encoder\n",
+    "#link to label encoder\n",
     "label_encoder = preprocessing.LabelEncoder() \n",
     "df['County']= label_encoder.fit_transform(df['County']) \n",
     "df['State']= label_encoder.fit_transform(df['State'])\n",
@@ -339,9 +338,8 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "#Convert the data to CuDF dataframes here\n",
-    "\n",
     "%%time\n",
+    "#Convert the data to CuDF dataframes here\n",
     "X_cudf_train = cudf.DataFrame.from_pandas(X_train)\n",
     "X_cudf_test = cudf.DataFrame.from_pandas(X_test)\n",
     "\n",
@@ -696,14 +694,13 @@
     "&emsp;&emsp;&ensp;\n",
     "[Home Page](../START_HERE.ipynb)\n",
     "\n",
-    "[Previous Notebook](02-SGD.ipynb)\n",
+    "[Previous Notebook](01-LinearRegression-Hyperparam.ipynb)\n",
     "&emsp;&emsp;&emsp;&emsp;&emsp;\n",
     "&emsp;&emsp;&emsp;&emsp;&emsp;\n",
     "&emsp;&emsp;&emsp;&emsp;&emsp;\n",
     "&emsp;&emsp;&emsp;&emsp;&emsp;\n",
     "[1](01-LinearRegression-Hyperparam.ipynb)\n",
-    "[2](02-SGD.ipynb)\n",
-    "[3]"
+    "[2]"
    ]
   }
  ],

+ 5 - 10
ai/RAPIDS/English/Python/jupyter_notebook/CuML/04_CuML_Solution.ipynb

@@ -18,9 +18,7 @@
     "&emsp;&emsp;&emsp;&emsp;&emsp;\n",
     "&emsp;&emsp;&emsp;&emsp;&emsp;\n",
     "[1](01-LinearRegression-Hyperparam.ipynb)\n",
-    "[2](02-SGD.ipynb)\n",
-    "[3](03_CuML_Exercise.ipynb)\n",
-    "[4]"
+    "[2](03_CuML_Exercise.ipynb)"
    ]
   },
   {
@@ -478,7 +476,7 @@
    ],
    "source": [
     "%%time\n",
-    "link to label encoder\n",
+    "#link to label encoder\n",
     "label_encoder = preprocessing.LabelEncoder() \n",
     "df['County']= label_encoder.fit_transform(df['County']) \n",
     "df['State']= label_encoder.fit_transform(df['State'])\n",
@@ -669,9 +667,8 @@
     }
    ],
    "source": [
-    "#Convert the data to CuDF dataframes here\n",
-    "\n",
     "%%time\n",
+    "#Convert the data to CuDF dataframes here\n",
     "X_cudf_train = cudf.DataFrame.from_pandas(X_train)\n",
     "X_cudf_test = cudf.DataFrame.from_pandas(X_test)\n",
     "\n",
@@ -1207,9 +1204,7 @@
     "&emsp;&emsp;&emsp;&emsp;&emsp;\n",
     "&emsp;&emsp;&emsp;&emsp;&emsp;\n",
     "[1](01-LinearRegression-Hyperparam.ipynb)\n",
-    "[2](02-SGD.ipynb)\n",
-    "[3](03_CuML_Exercise.ipynb)\n",
-    "[4]\n",
+    "[2](03_CuML_Exercise.ipynb)\n",
     "&emsp;&emsp;&emsp;&emsp;&emsp;\n",
     "&emsp;&emsp;&emsp;&emsp;&emsp;\n",
     "&emsp;&emsp;&emsp;&emsp;&emsp;\n",
@@ -1242,7 +1237,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.7.8"
+   "version": "3.6.2"
   }
  },
  "nbformat": 4,

+ 58 - 574
ai/RAPIDS/English/Python/jupyter_notebook/CuML/Backup.ipynb

@@ -18,9 +18,7 @@
     "&emsp;&emsp;&emsp;&emsp;&emsp;\n",
     "&emsp;&emsp;&emsp;&emsp;&emsp;\n",
     "[1](01-LinearRegression-Hyperparam.ipynb)\n",
-    "[2](02-SGD.ipynb)\n",
-    "[3](03_CuML_Exercise.ipynb)\n",
-    "[4]"
+    "[2](03_CuML_Exercise.ipynb)"
    ]
   },
   {
@@ -51,18 +49,9 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 1,
+   "execution_count": null,
    "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "NumPy Version: 1.19.2\n",
-      "Scikit-Learn Version: 0.23.1\n"
-     ]
-    }
-   ],
+   "outputs": [],
    "source": [
     "import matplotlib.pyplot as plt\n",
     "import numpy as np; print('NumPy Version:', np.__version__)\n",
@@ -130,71 +119,9 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 2,
+   "execution_count": null,
    "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "CPU times: user 43 ms, sys: 10.1 ms, total: 53.1 ms\n",
-      "Wall time: 52.5 ms\n",
-      "       Unnamed: 0  Source    TMC  Severity  Start_Lat   Start_Lng    End_Lat  \\\n",
-      "0               0       1  201.0         3  39.865147  -84.058723  37.557578   \n",
-      "1               1       1  201.0         2  39.928059  -82.831184  37.557578   \n",
-      "2               2       1  201.0         2  39.063148  -84.032608  37.557578   \n",
-      "3               3       1  201.0         3  39.747753  -84.205582  37.557578   \n",
-      "4               4       1  201.0         2  39.627781  -84.188354  37.557578   \n",
-      "...           ...     ...    ...       ...        ...         ...        ...   \n",
-      "17317       17317       1  201.0         3  37.396164 -121.907578  37.557578   \n",
-      "17318       17318       1  201.0         3  37.825649 -122.304092  37.557578   \n",
-      "17319       17319       1  201.0         2  36.979454 -121.909035  37.557578   \n",
-      "17320       17320       1  201.0         2  37.314030 -121.827065  37.557578   \n",
-      "17321       17321       1  201.0         3  37.758404 -122.212173  37.557578   \n",
-      "\n",
-      "          End_Lng  Distance(mi)       County  ... Station  Stop  \\\n",
-      "0     -100.455981          0.01   Montgomery  ...     0.0   0.0   \n",
-      "1     -100.455981          0.01     Franklin  ...     0.0   0.0   \n",
-      "2     -100.455981          0.01     Clermont  ...     0.0   0.0   \n",
-      "3     -100.455981          0.01   Montgomery  ...     0.0   0.0   \n",
-      "4     -100.455981          0.01   Montgomery  ...     0.0   0.0   \n",
-      "...           ...           ...          ...  ...     ...   ...   \n",
-      "17317 -100.455981          0.01  Santa Clara  ...     0.0   0.0   \n",
-      "17318 -100.455981          0.01      Alameda  ...     0.0   0.0   \n",
-      "17319 -100.455981          0.00   Santa Cruz  ...     0.0   0.0   \n",
-      "17320 -100.455981          0.01  Santa Clara  ...     0.0   0.0   \n",
-      "17321 -100.455981          0.01      Alameda  ...     NaN   NaN   \n",
-      "\n",
-      "       Traffic_Calming  Traffic_Signal  Turning_Loop Sunrise_Sunset  \\\n",
-      "0                  0.0             0.0           0.0            0.0   \n",
-      "1                  0.0             0.0           0.0            0.0   \n",
-      "2                  0.0             0.0           0.0            0.0   \n",
-      "3                  0.0             0.0           0.0            0.0   \n",
-      "4                  0.0             0.0           0.0            1.0   \n",
-      "...                ...             ...           ...            ...   \n",
-      "17317              0.0             0.0           0.0            1.0   \n",
-      "17318              0.0             0.0           0.0            1.0   \n",
-      "17319              0.0             0.0           0.0            1.0   \n",
-      "17320              0.0             0.0           0.0            1.0   \n",
-      "17321              NaN             NaN           NaN            NaN   \n",
-      "\n",
-      "       Civil_Twilight  Nautical_Twilight  Astronomical_Twilight  cov_distance  \n",
-      "0                 0.0                0.0                    0.0   1443.524390  \n",
-      "1                 0.0                0.0                    1.0   1548.467903  \n",
-      "2                 0.0                1.0                    1.0   1440.697621  \n",
-      "3                 1.0                1.0                    1.0   1429.927497  \n",
-      "4                 1.0                1.0                    1.0   1430.383177  \n",
-      "...               ...                ...                    ...           ...  \n",
-      "17317             1.0                1.0                    1.0   1888.935551  \n",
-      "17318             1.0                1.0                    1.0   1918.251042  \n",
-      "17319             1.0                1.0                    1.0   1895.341155  \n",
-      "17320             1.0                1.0                    1.0   1883.025767  \n",
-      "17321             NaN                NaN                    NaN           NaN  \n",
-      "\n",
-      "[17322 rows x 34 columns]\n"
-     ]
-    }
-   ],
+   "outputs": [],
    "source": [
     "%time df = pd.read_csv('../../data/data_proc.csv')\n",
     "print(df)"
@@ -209,7 +136,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 3,
+   "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -225,216 +152,9 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 4,
+   "execution_count": null,
    "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<style scoped>\n",
-       "    .dataframe tbody tr th:only-of-type {\n",
-       "        vertical-align: middle;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe tbody tr th {\n",
-       "        vertical-align: top;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe thead th {\n",
-       "        text-align: right;\n",
-       "    }\n",
-       "</style>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>Source</th>\n",
-       "      <th>TMC</th>\n",
-       "      <th>Severity</th>\n",
-       "      <th>Start_Lat</th>\n",
-       "      <th>Start_Lng</th>\n",
-       "      <th>End_Lat</th>\n",
-       "      <th>End_Lng</th>\n",
-       "      <th>Distance(mi)</th>\n",
-       "      <th>County</th>\n",
-       "      <th>State</th>\n",
-       "      <th>...</th>\n",
-       "      <th>Station</th>\n",
-       "      <th>Stop</th>\n",
-       "      <th>Traffic_Calming</th>\n",
-       "      <th>Traffic_Signal</th>\n",
-       "      <th>Turning_Loop</th>\n",
-       "      <th>Sunrise_Sunset</th>\n",
-       "      <th>Civil_Twilight</th>\n",
-       "      <th>Nautical_Twilight</th>\n",
-       "      <th>Astronomical_Twilight</th>\n",
-       "      <th>cov_distance</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>0</th>\n",
-       "      <td>1</td>\n",
-       "      <td>201.0</td>\n",
-       "      <td>3</td>\n",
-       "      <td>39.865147</td>\n",
-       "      <td>-84.058723</td>\n",
-       "      <td>37.557578</td>\n",
-       "      <td>-100.455981</td>\n",
-       "      <td>0.01</td>\n",
-       "      <td>Montgomery</td>\n",
-       "      <td>OH</td>\n",
-       "      <td>...</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>1443.524390</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>1</th>\n",
-       "      <td>1</td>\n",
-       "      <td>201.0</td>\n",
-       "      <td>2</td>\n",
-       "      <td>39.928059</td>\n",
-       "      <td>-82.831184</td>\n",
-       "      <td>37.557578</td>\n",
-       "      <td>-100.455981</td>\n",
-       "      <td>0.01</td>\n",
-       "      <td>Franklin</td>\n",
-       "      <td>OH</td>\n",
-       "      <td>...</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>1548.467903</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>2</th>\n",
-       "      <td>1</td>\n",
-       "      <td>201.0</td>\n",
-       "      <td>2</td>\n",
-       "      <td>39.063148</td>\n",
-       "      <td>-84.032608</td>\n",
-       "      <td>37.557578</td>\n",
-       "      <td>-100.455981</td>\n",
-       "      <td>0.01</td>\n",
-       "      <td>Clermont</td>\n",
-       "      <td>OH</td>\n",
-       "      <td>...</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>1440.697621</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>3</th>\n",
-       "      <td>1</td>\n",
-       "      <td>201.0</td>\n",
-       "      <td>3</td>\n",
-       "      <td>39.747753</td>\n",
-       "      <td>-84.205582</td>\n",
-       "      <td>37.557578</td>\n",
-       "      <td>-100.455981</td>\n",
-       "      <td>0.01</td>\n",
-       "      <td>Montgomery</td>\n",
-       "      <td>OH</td>\n",
-       "      <td>...</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>1429.927497</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>4</th>\n",
-       "      <td>1</td>\n",
-       "      <td>201.0</td>\n",
-       "      <td>2</td>\n",
-       "      <td>39.627781</td>\n",
-       "      <td>-84.188354</td>\n",
-       "      <td>37.557578</td>\n",
-       "      <td>-100.455981</td>\n",
-       "      <td>0.01</td>\n",
-       "      <td>Montgomery</td>\n",
-       "      <td>OH</td>\n",
-       "      <td>...</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>1430.383177</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "<p>5 rows × 33 columns</p>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "   Source    TMC  Severity  Start_Lat  Start_Lng    End_Lat     End_Lng  \\\n",
-       "0       1  201.0         3  39.865147 -84.058723  37.557578 -100.455981   \n",
-       "1       1  201.0         2  39.928059 -82.831184  37.557578 -100.455981   \n",
-       "2       1  201.0         2  39.063148 -84.032608  37.557578 -100.455981   \n",
-       "3       1  201.0         3  39.747753 -84.205582  37.557578 -100.455981   \n",
-       "4       1  201.0         2  39.627781 -84.188354  37.557578 -100.455981   \n",
-       "\n",
-       "   Distance(mi)      County State  ...  Station  Stop  Traffic_Calming  \\\n",
-       "0          0.01  Montgomery    OH  ...      0.0   0.0              0.0   \n",
-       "1          0.01    Franklin    OH  ...      0.0   0.0              0.0   \n",
-       "2          0.01    Clermont    OH  ...      0.0   0.0              0.0   \n",
-       "3          0.01  Montgomery    OH  ...      0.0   0.0              0.0   \n",
-       "4          0.01  Montgomery    OH  ...      0.0   0.0              0.0   \n",
-       "\n",
-       "   Traffic_Signal Turning_Loop  Sunrise_Sunset  Civil_Twilight  \\\n",
-       "0             0.0          0.0             0.0             0.0   \n",
-       "1             0.0          0.0             0.0             0.0   \n",
-       "2             0.0          0.0             0.0             0.0   \n",
-       "3             0.0          0.0             0.0             1.0   \n",
-       "4             0.0          0.0             1.0             1.0   \n",
-       "\n",
-       "   Nautical_Twilight  Astronomical_Twilight  cov_distance  \n",
-       "0                0.0                    0.0   1443.524390  \n",
-       "1                0.0                    1.0   1548.467903  \n",
-       "2                1.0                    1.0   1440.697621  \n",
-       "3                1.0                    1.0   1429.927497  \n",
-       "4                1.0                    1.0   1430.383177  \n",
-       "\n",
-       "[5 rows x 33 columns]"
-      ]
-     },
-     "execution_count": 4,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
+   "outputs": [],
    "source": [
     "df.head()"
    ]
@@ -448,7 +168,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 5,
+   "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -464,21 +184,12 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 6,
+   "execution_count": null,
    "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "CPU times: user 14.6 ms, sys: 515 µs, total: 15.1 ms\n",
-      "Wall time: 14.9 ms\n"
-     ]
-    }
-   ],
+   "outputs": [],
    "source": [
     "%%time\n",
-    "link to label encoder\n",
+    "#link to label encoder\n",
     "label_encoder = preprocessing.LabelEncoder() \n",
     "df['County']= label_encoder.fit_transform(df['County']) \n",
     "df['State']= label_encoder.fit_transform(df['State'])\n",
@@ -536,33 +247,9 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 7,
+   "execution_count": null,
    "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "CPU times: user 30.1 ms, sys: 4.45 ms, total: 34.6 ms\n",
-      "Wall time: 34.1 ms\n"
-     ]
-    },
-    {
-     "data": {
-      "text/plain": [
-       "Severity\n",
-       "1    10584\n",
-       "2    10584\n",
-       "3    10584\n",
-       "4    10584\n",
-       "Name: Severity, dtype: int64"
-      ]
-     },
-     "execution_count": 7,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
+   "outputs": [],
    "source": [
     "%%time\n",
     "# Class Balancing | Using Up Sampling\n",
@@ -597,18 +284,9 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 8,
+   "execution_count": null,
    "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "CPU times: user 7.11 ms, sys: 5.16 ms, total: 12.3 ms\n",
-      "Wall time: 11.5 ms\n"
-     ]
-    }
-   ],
+   "outputs": [],
    "source": [
     "%%time\n",
     "# Set the target for the prediction\n",
@@ -640,7 +318,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 9,
+   "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -656,22 +334,13 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 10,
+   "execution_count": null,
    "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "CPU times: user 855 ms, sys: 445 ms, total: 1.3 s\n",
-      "Wall time: 1.31 s\n"
-     ]
-    }
-   ],
+   "outputs": [],
    "source": [
+    "%%time\n",
     "#Convert the data to CuDF dataframes here\n",
     "\n",
-    "%%time\n",
     "X_cudf_train = cudf.DataFrame.from_pandas(X_train)\n",
     "X_cudf_test = cudf.DataFrame.from_pandas(X_test)\n",
     "\n",
@@ -704,42 +373,9 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 19,
+   "execution_count": null,
    "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "CPU times: user 20 s, sys: 50.9 s, total: 1min 10s\n",
-      "Wall time: 1.9 s\n"
-     ]
-    },
-    {
-     "name": "stderr",
-     "output_type": "stream",
-     "text": [
-      "/opt/conda/envs/rapids/lib/python3.7/site-packages/sklearn/linear_model/_logistic.py:764: ConvergenceWarning: lbfgs failed to converge (status=1):\n",
-      "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n",
-      "\n",
-      "Increase the number of iterations (max_iter) or scale the data as shown in:\n",
-      "    https://scikit-learn.org/stable/modules/preprocessing.html\n",
-      "Please also refer to the documentation for alternative solver options:\n",
-      "    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n",
-      "  extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG)\n"
-     ]
-    },
-    {
-     "data": {
-      "text/plain": [
-       "LogisticRegression()"
-      ]
-     },
-     "execution_count": 19,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
+   "outputs": [],
    "source": [
     "%%time\n",
     "clf = skLogistic()\n",
@@ -755,19 +391,9 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 20,
+   "execution_count": null,
    "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "0.501299110306275\n",
-      "CPU times: user 81.7 ms, sys: 193 ms, total: 275 ms\n",
-      "Wall time: 7.28 ms\n"
-     ]
-    }
-   ],
+   "outputs": [],
    "source": [
     "%%time\n",
     "print(clf.score(X_test, y_test))"
@@ -786,33 +412,13 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 21,
+   "execution_count": null,
    "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "[E] [23:59:24.749199] L-BFGS line search failed\n",
-      "CPU times: user 686 ms, sys: 2.11 s, total: 2.8 s\n",
-      "Wall time: 74 ms\n"
-     ]
-    },
-    {
-     "data": {
-      "text/plain": [
-       "LogisticRegression(penalty='l2', tol=0.0001, C=1.0, fit_intercept=True, max_iter=1000, linesearch_max_iter=50, verbose=4, l1_ratio=None, solver='qn', handle=<cuml.raft.common.handle.Handle object at 0x7fd97c0ee1f0>, output_type='cudf')"
-      ]
-     },
-     "execution_count": 21,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
+   "outputs": [],
    "source": [
+    "%%time\n",
     "#Modify the code in this cell\n",
     "\n",
-    "%%time\n",
     "reg = LogisticRegression()\n",
     "reg.fit() # Pass the train cudf dataframes as arguments here"
    ]
@@ -826,23 +432,14 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 22,
+   "execution_count": null,
    "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "0.24864183366298676\n",
-      "CPU times: user 171 ms, sys: 523 ms, total: 695 ms\n",
-      "Wall time: 18.4 ms\n"
-     ]
-    }
-   ],
+   "outputs": [],
    "source": [
+    "%%time\n",
     "#Modify the code in this cell\n",
     "\n",
-    "%%time\n",
+    "\n",
     "print(reg.score())  # Pass the test cudf dataframes as arguments here"
    ]
   },
@@ -861,28 +458,9 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 31,
+   "execution_count": null,
    "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "CPU times: user 522 ms, sys: 4.43 ms, total: 527 ms\n",
-      "Wall time: 526 ms\n"
-     ]
-    },
-    {
-     "data": {
-      "text/plain": [
-       "KNeighborsClassifier(n_neighbors=3)"
-      ]
-     },
-     "execution_count": 31,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
+   "outputs": [],
    "source": [
     "%%time\n",
     "neigh = KNeighborsClassifier(n_neighbors=3)\n",
@@ -898,19 +476,9 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 32,
+   "execution_count": null,
    "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "0.8876466419966932\n",
-      "CPU times: user 1.15 s, sys: 5.22 ms, total: 1.15 s\n",
-      "Wall time: 1.15 s\n"
-     ]
-    }
-   ],
+   "outputs": [],
    "source": [
     "%%time\n",
     "print(neigh.score(X_test, y_test))"
@@ -929,32 +497,14 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 33,
+   "execution_count": null,
    "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "CPU times: user 14.5 ms, sys: 2.39 ms, total: 16.8 ms\n",
-      "Wall time: 16 ms\n"
-     ]
-    },
-    {
-     "data": {
-      "text/plain": [
-       "KNeighborsClassifier(weights='uniform')"
-      ]
-     },
-     "execution_count": 33,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
+   "outputs": [],
    "source": [
+    "%%time\n",
     "#Modify the code in this cell\n",
     "\n",
-    "%%time\n",
+    "\n",
     "knn = KNeighborsC(n_neighbors=10)\n",
     "knn.fit() # Pass the train cudf dataframes as arguments here"
    ]
@@ -968,23 +518,14 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 34,
+   "execution_count": null,
    "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "0.8689079880714417\n",
-      "CPU times: user 22.1 ms, sys: 126 ms, total: 148 ms\n",
-      "Wall time: 148 ms\n"
-     ]
-    }
-   ],
+   "outputs": [],
    "source": [
+    "%%time\n",
     "#Modify the code in this cell\n",
     "\n",
-    "%%time\n",
+    "\n",
     "print(knn.score()) # Pass the test cudf dataframes as arguments here"
    ]
   },
@@ -1004,28 +545,9 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 11,
+   "execution_count": null,
    "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "CPU times: user 163 ms, sys: 62.1 ms, total: 225 ms\n",
-      "Wall time: 226 ms\n"
-     ]
-    },
-    {
-     "data": {
-      "text/plain": [
-       "ElasticNet(alpha=1.0, l1_ratio=0.5, fit_intercept=True, normalize=False, max_iter=1000, tol=0.001, selection='cyclic', handle=<cuml.raft.common.handle.Handle object at 0x7fd97c163210>, output_type='numpy', verbose=4)"
-      ]
-     },
-     "execution_count": 11,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
+   "outputs": [],
    "source": [
     "%%time\n",
     "regr = ElasticNet()\n",
@@ -1041,19 +563,9 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 12,
+   "execution_count": null,
    "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "0.22519596677633613\n",
-      "CPU times: user 5.97 ms, sys: 2.98 ms, total: 8.96 ms\n",
-      "Wall time: 8.11 ms\n"
-     ]
-    }
-   ],
+   "outputs": [],
    "source": [
     "%%time\n",
     "X_test = X_test.astype(np.float64)\n",
@@ -1074,32 +586,14 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 13,
+   "execution_count": null,
    "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "CPU times: user 126 ms, sys: 3.94 ms, total: 130 ms\n",
-      "Wall time: 129 ms\n"
-     ]
-    },
-    {
-     "data": {
-      "text/plain": [
-       "ElasticNet(alpha=1.0, l1_ratio=0.5, fit_intercept=True, normalize=False, max_iter=1000, tol=0.001, selection='cyclic', handle=<cuml.raft.common.handle.Handle object at 0x7fd97c152b70>, output_type='cudf', verbose=4)"
-      ]
-     },
-     "execution_count": 13,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
+   "outputs": [],
    "source": [
+    "%%time\n",
     "#Modify the code in this cell\n",
     "\n",
-    "%%time\n",
+    "\n",
     "enet = ElasticNet()\n",
     "\n",
     "enet.fit() # Pass the train cudf dataframes as arguments here"
@@ -1114,23 +608,14 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 14,
+   "execution_count": null,
    "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "0.22519596677633613\n",
-      "CPU times: user 6.12 ms, sys: 2.09 ms, total: 8.21 ms\n",
-      "Wall time: 7.49 ms\n"
-     ]
-    }
-   ],
+   "outputs": [],
    "source": [
+    "%%time\n",
     "#Modify the code in this cell\n",
     "\n",
-    "%%time\n",
+    "\n",
     "X_cudf_test = X_cudf_test.astype(np.float64)\n",
     "y_cudf_test = y_cudf_test.astype(np.float64)\n",
     "print(enet.score()) # Pass the test cudf dataframes as arguments here"
@@ -1213,9 +698,8 @@
     "&emsp;&emsp;&emsp;&emsp;&emsp;\n",
     "&emsp;&emsp;&emsp;&emsp;&emsp;\n",
     "[1](01-LinearRegression-Hyperparam.ipynb)\n",
-    "[2](02-SGD.ipynb)\n",
-    "[3](03_CuML_Exercise.ipynb)\n",
-    "[4]\n",
+    "[2](03_CuML_Exercise.ipynb)\n",
+    "\n",
     "&emsp;&emsp;&emsp;&emsp;&emsp;\n",
     "&emsp;&emsp;&emsp;&emsp;&emsp;\n",
     "&emsp;&emsp;&emsp;&emsp;&emsp;\n",
@@ -1248,7 +732,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.7.8"
+   "version": "3.6.2"
   }
  },
  "nbformat": 4,

+ 16 - 232
ai/RAPIDS/English/Python/jupyter_notebook/Dask/01-Intro_to_Dask.ipynb

@@ -74,41 +74,9 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 2,
+   "execution_count": null,
    "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<table style=\"border: 2px solid white;\">\n",
-       "<tr>\n",
-       "<td style=\"vertical-align: top; border: 0px solid white\">\n",
-       "<h3 style=\"text-align: left;\">Client</h3>\n",
-       "<ul style=\"text-align: left; list-style: none; margin: 0; padding: 0;\">\n",
-       "  <li><b>Scheduler: </b>tcp://127.0.0.1:32851</li>\n",
-       "  <li><b>Dashboard: </b><a href='http://127.0.0.1:8787/status' target='_blank'>http://127.0.0.1:8787/status</a></li>\n",
-       "</ul>\n",
-       "</td>\n",
-       "<td style=\"vertical-align: top; border: 0px solid white\">\n",
-       "<h3 style=\"text-align: left;\">Cluster</h3>\n",
-       "<ul style=\"text-align: left; list-style:none; margin: 0; padding: 0;\">\n",
-       "  <li><b>Workers: </b>2</li>\n",
-       "  <li><b>Cores: </b>2</li>\n",
-       "  <li><b>Memory: </b>270.12 GB</li>\n",
-       "</ul>\n",
-       "</td>\n",
-       "</tr>\n",
-       "</table>"
-      ],
-      "text/plain": [
-       "<Client: 'tcp://127.0.0.1:32851' processes=2 threads=2, memory=270.12 GB>"
-      ]
-     },
-     "execution_count": 2,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
+   "outputs": [],
    "source": [
     "import dask\n",
     "from dask.distributed import Client, wait\n",
@@ -138,39 +106,9 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 13,
+   "execution_count": null,
    "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Thu Dec 17 13:04:09 2020       \n",
-      "+-----------------------------------------------------------------------------+\n",
-      "| NVIDIA-SMI 450.51.05    Driver Version: 450.51.05    CUDA Version: 11.0     |\n",
-      "|-------------------------------+----------------------+----------------------+\n",
-      "| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |\n",
-      "| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |\n",
-      "|                               |                      |               MIG M. |\n",
-      "|===============================+======================+======================|\n",
-      "|   0  Tesla V100-PCIE...  Off  | 00000000:18:00.0 Off |                    0 |\n",
-      "| N/A   65C    P0    48W / 250W |  13126MiB / 32510MiB |      0%      Default |\n",
-      "|                               |                      |                  N/A |\n",
-      "+-------------------------------+----------------------+----------------------+\n",
-      "|   1  Tesla V100-PCIE...  Off  | 00000000:AF:00.0 Off |                    0 |\n",
-      "| N/A   41C    P0    42W / 250W |  14026MiB / 32510MiB |      0%      Default |\n",
-      "|                               |                      |                  N/A |\n",
-      "+-------------------------------+----------------------+----------------------+\n",
-      "                                                                               \n",
-      "+-----------------------------------------------------------------------------+\n",
-      "| Processes:                                                                  |\n",
-      "|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |\n",
-      "|        ID   ID                                                   Usage      |\n",
-      "|=============================================================================|\n",
-      "+-----------------------------------------------------------------------------+\n"
-     ]
-    }
-   ],
+   "outputs": [],
    "source": [
     "!nvidia-smi"
    ]
@@ -188,7 +126,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 4,
+   "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -200,13 +138,13 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 5,
+   "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
     "rs = da.random.RandomState(RandomState=cp.random.RandomState, seed=12)  # <-- we specify cupy here\n",
     "\n",
-    "x = rs.random((1000000, 1000), chunks=(10000,1000))\n",
+    "x = rs.random((100000, 1000), chunks=(10000,1000))\n",
     "x = x.persist() # so quick we don't need to wait"
    ]
   },
@@ -224,76 +162,9 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 6,
+   "execution_count": null,
    "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<table>\n",
-       "<tr>\n",
-       "<td>\n",
-       "<table>\n",
-       "  <thead>\n",
-       "    <tr><td> </td><th> Array </th><th> Chunk </th></tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr><th> Bytes </th><td> 8.00 GB </td> <td> 80.00 MB </td></tr>\n",
-       "    <tr><th> Shape </th><td> (1000000, 1000) </td> <td> (10000, 1000) </td></tr>\n",
-       "    <tr><th> Count </th><td> 100 Tasks </td><td> 100 Chunks </td></tr>\n",
-       "    <tr><th> Type </th><td> float64 </td><td> cupy.ndarray </td></tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</td>\n",
-       "<td>\n",
-       "<svg width=\"75\" height=\"170\" style=\"stroke:rgb(0,0,0);stroke-width:1\" >\n",
-       "\n",
-       "  <!-- Horizontal lines -->\n",
-       "  <line x1=\"0\" y1=\"0\" x2=\"25\" y2=\"0\" style=\"stroke-width:2\" />\n",
-       "  <line x1=\"0\" y1=\"6\" x2=\"25\" y2=\"6\" />\n",
-       "  <line x1=\"0\" y1=\"12\" x2=\"25\" y2=\"12\" />\n",
-       "  <line x1=\"0\" y1=\"18\" x2=\"25\" y2=\"18\" />\n",
-       "  <line x1=\"0\" y1=\"25\" x2=\"25\" y2=\"25\" />\n",
-       "  <line x1=\"0\" y1=\"31\" x2=\"25\" y2=\"31\" />\n",
-       "  <line x1=\"0\" y1=\"37\" x2=\"25\" y2=\"37\" />\n",
-       "  <line x1=\"0\" y1=\"43\" x2=\"25\" y2=\"43\" />\n",
-       "  <line x1=\"0\" y1=\"50\" x2=\"25\" y2=\"50\" />\n",
-       "  <line x1=\"0\" y1=\"56\" x2=\"25\" y2=\"56\" />\n",
-       "  <line x1=\"0\" y1=\"62\" x2=\"25\" y2=\"62\" />\n",
-       "  <line x1=\"0\" y1=\"68\" x2=\"25\" y2=\"68\" />\n",
-       "  <line x1=\"0\" y1=\"75\" x2=\"25\" y2=\"75\" />\n",
-       "  <line x1=\"0\" y1=\"81\" x2=\"25\" y2=\"81\" />\n",
-       "  <line x1=\"0\" y1=\"87\" x2=\"25\" y2=\"87\" />\n",
-       "  <line x1=\"0\" y1=\"93\" x2=\"25\" y2=\"93\" />\n",
-       "  <line x1=\"0\" y1=\"100\" x2=\"25\" y2=\"100\" />\n",
-       "  <line x1=\"0\" y1=\"106\" x2=\"25\" y2=\"106\" />\n",
-       "  <line x1=\"0\" y1=\"112\" x2=\"25\" y2=\"112\" />\n",
-       "  <line x1=\"0\" y1=\"120\" x2=\"25\" y2=\"120\" style=\"stroke-width:2\" />\n",
-       "\n",
-       "  <!-- Vertical lines -->\n",
-       "  <line x1=\"0\" y1=\"0\" x2=\"0\" y2=\"120\" style=\"stroke-width:2\" />\n",
-       "  <line x1=\"25\" y1=\"0\" x2=\"25\" y2=\"120\" style=\"stroke-width:2\" />\n",
-       "\n",
-       "  <!-- Colored Rectangle -->\n",
-       "  <polygon points=\"0.0,0.0 25.412616514582485,0.0 25.412616514582485,120.0 0.0,120.0\" style=\"fill:#8B4903A0;stroke-width:0\"/>\n",
-       "\n",
-       "  <!-- Text -->\n",
-       "  <text x=\"12.706308\" y=\"140.000000\" font-size=\"1.0rem\" font-weight=\"100\" text-anchor=\"middle\" >1000</text>\n",
-       "  <text x=\"45.412617\" y=\"60.000000\" font-size=\"1.0rem\" font-weight=\"100\" text-anchor=\"middle\" transform=\"rotate(-90,45.412617,60.000000)\">1000000</text>\n",
-       "</svg>\n",
-       "</td>\n",
-       "</tr>\n",
-       "</table>"
-      ],
-      "text/plain": [
-       "dask.array<random_sample, shape=(1000000, 1000), dtype=float64, chunksize=(10000, 1000), chunktype=cupy.ndarray>"
-      ]
-     },
-     "execution_count": 6,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
+   "outputs": [],
    "source": [
     "x"
    ]
@@ -309,7 +180,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 7,
+   "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -318,76 +189,9 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 8,
+   "execution_count": null,
    "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<table>\n",
-       "<tr>\n",
-       "<td>\n",
-       "<table>\n",
-       "  <thead>\n",
-       "    <tr><td> </td><th> Array </th><th> Chunk </th></tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr><th> Bytes </th><td> 8.00 GB </td> <td> 80.00 MB </td></tr>\n",
-       "    <tr><th> Shape </th><td> (1000000, 1000) </td> <td> (10000, 1000) </td></tr>\n",
-       "    <tr><th> Count </th><td> 875 Tasks </td><td> 100 Chunks </td></tr>\n",
-       "    <tr><th> Type </th><td> float64 </td><td> numpy.ndarray </td></tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</td>\n",
-       "<td>\n",
-       "<svg width=\"75\" height=\"170\" style=\"stroke:rgb(0,0,0);stroke-width:1\" >\n",
-       "\n",
-       "  <!-- Horizontal lines -->\n",
-       "  <line x1=\"0\" y1=\"0\" x2=\"25\" y2=\"0\" style=\"stroke-width:2\" />\n",
-       "  <line x1=\"0\" y1=\"6\" x2=\"25\" y2=\"6\" />\n",
-       "  <line x1=\"0\" y1=\"12\" x2=\"25\" y2=\"12\" />\n",
-       "  <line x1=\"0\" y1=\"18\" x2=\"25\" y2=\"18\" />\n",
-       "  <line x1=\"0\" y1=\"25\" x2=\"25\" y2=\"25\" />\n",
-       "  <line x1=\"0\" y1=\"31\" x2=\"25\" y2=\"31\" />\n",
-       "  <line x1=\"0\" y1=\"37\" x2=\"25\" y2=\"37\" />\n",
-       "  <line x1=\"0\" y1=\"43\" x2=\"25\" y2=\"43\" />\n",
-       "  <line x1=\"0\" y1=\"50\" x2=\"25\" y2=\"50\" />\n",
-       "  <line x1=\"0\" y1=\"56\" x2=\"25\" y2=\"56\" />\n",
-       "  <line x1=\"0\" y1=\"62\" x2=\"25\" y2=\"62\" />\n",
-       "  <line x1=\"0\" y1=\"68\" x2=\"25\" y2=\"68\" />\n",
-       "  <line x1=\"0\" y1=\"75\" x2=\"25\" y2=\"75\" />\n",
-       "  <line x1=\"0\" y1=\"81\" x2=\"25\" y2=\"81\" />\n",
-       "  <line x1=\"0\" y1=\"87\" x2=\"25\" y2=\"87\" />\n",
-       "  <line x1=\"0\" y1=\"93\" x2=\"25\" y2=\"93\" />\n",
-       "  <line x1=\"0\" y1=\"100\" x2=\"25\" y2=\"100\" />\n",
-       "  <line x1=\"0\" y1=\"106\" x2=\"25\" y2=\"106\" />\n",
-       "  <line x1=\"0\" y1=\"112\" x2=\"25\" y2=\"112\" />\n",
-       "  <line x1=\"0\" y1=\"120\" x2=\"25\" y2=\"120\" style=\"stroke-width:2\" />\n",
-       "\n",
-       "  <!-- Vertical lines -->\n",
-       "  <line x1=\"0\" y1=\"0\" x2=\"0\" y2=\"120\" style=\"stroke-width:2\" />\n",
-       "  <line x1=\"25\" y1=\"0\" x2=\"25\" y2=\"120\" style=\"stroke-width:2\" />\n",
-       "\n",
-       "  <!-- Colored Rectangle -->\n",
-       "  <polygon points=\"0.0,0.0 25.412616514582485,0.0 25.412616514582485,120.0 0.0,120.0\" style=\"fill:#8B4903A0;stroke-width:0\"/>\n",
-       "\n",
-       "  <!-- Text -->\n",
-       "  <text x=\"12.706308\" y=\"140.000000\" font-size=\"1.0rem\" font-weight=\"100\" text-anchor=\"middle\" >1000</text>\n",
-       "  <text x=\"45.412617\" y=\"60.000000\" font-size=\"1.0rem\" font-weight=\"100\" text-anchor=\"middle\" transform=\"rotate(-90,45.412617,60.000000)\">1000000</text>\n",
-       "</svg>\n",
-       "</td>\n",
-       "</tr>\n",
-       "</table>"
-      ],
-      "text/plain": [
-       "dask.array<mul, shape=(1000000, 1000), dtype=float64, chunksize=(10000, 1000), chunktype=numpy.ndarray>"
-      ]
-     },
-     "execution_count": 8,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
+   "outputs": [],
    "source": [
     "u"
    ]
@@ -401,7 +205,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 11,
+   "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -417,29 +221,9 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 12,
+   "execution_count": null,
    "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "array([[ 9.93120508e-04, -5.33913544e-04,  4.62268101e-05,\n",
-       "        -3.45476128e-04, -1.65744840e-03],\n",
-       "       [ 1.00547562e-03, -3.77065720e-04, -1.66230680e-03,\n",
-       "        -2.56230601e-04, -4.87001391e-04],\n",
-       "       [ 9.70568388e-04, -7.14696011e-04, -3.99702364e-04,\n",
-       "        -5.90555975e-04,  5.82605644e-04],\n",
-       "       [ 9.96828082e-04,  3.19535134e-04,  6.19697709e-04,\n",
-       "        -1.27881587e-03,  4.81231481e-04],\n",
-       "       [ 9.95186590e-04, -1.79692605e-03, -2.55216038e-04,\n",
-       "         4.35893465e-04, -6.94865772e-04]])"
-      ]
-     },
-     "execution_count": 12,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
+   "outputs": [],
    "source": [
     "u[:5, :5].compute()"
    ]
@@ -521,7 +305,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.7.8"
+   "version": "3.6.2"
   }
  },
  "nbformat": 4,
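The chunked-array outputs stripped from `01-Intro_to_Dask.ipynb` above reported the array as 8.00 GB in 100 chunks of 80.00 MB each. As a sanity check on those figures, here is a minimal sketch (plain Python arithmetic only, no Dask required) for the shapes used in the notebook:

```python
# Verify the chunk layout Dask reported for the notebook's array:
# shape (1_000_000, 1_000) in chunks of (10_000, 1_000) float64 values.
shape = (1_000_000, 1_000)
chunk_shape = (10_000, 1_000)
itemsize = 8  # bytes per float64

n_chunks = (shape[0] // chunk_shape[0]) * (shape[1] // chunk_shape[1])
chunk_bytes = chunk_shape[0] * chunk_shape[1] * itemsize
total_bytes = shape[0] * shape[1] * itemsize

print(n_chunks)            # 100 chunks
print(chunk_bytes / 1e6)   # 80.0 MB per chunk
print(total_bytes / 1e9)   # 8.0 GB total
```

These totals match the HTML tables removed by this commit, so clearing the stored outputs loses no information that cannot be recomputed.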

+ 5 - 5
ai/RAPIDS/English/Python/jupyter_notebook/Dask/02-CuDF_and_Dask.ipynb

@@ -137,10 +137,10 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "df = cudf.DataFrame([('a', list(range(20))),\n",
-    "('b', list(reversed(range(20)))),\n",
-    "('c', list(range(20)))])\n",
-    "print(df)"
+    "df = cudf.DataFrame({'a': list(range(20)),\n",
+    "                     'b': list(reversed(range(20))),\n",
+    "                     'c': list(range(20))\n",
+    "                    })"
    ]
   },
   {
@@ -1381,7 +1381,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "df = cudf.read_parquet('example_output/temp_parquet/72706b163a0d4feb949005d22146ad83.parquet')\n",
+    "df = cudf.read_parquet('example_output/temp_parquet')\n",
     "print(df.to_pandas())"
    ]
   },
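The `02-CuDF_and_Dask.ipynb` change above replaces the old list-of-tuples `cudf.DataFrame` constructor with the dict-of-columns form. Since cuDF mirrors the pandas `DataFrame` API, a minimal sketch with pandas as a CPU stand-in (an assumption for illustration; the notebook itself uses `cudf`) shows the corrected construction:

```python
import pandas as pd  # stand-in for cudf; cuDF mirrors the pandas DataFrame API

# Dict-of-columns construction, as in the corrected cell.
df = pd.DataFrame({'a': list(range(20)),
                   'b': list(reversed(range(20))),
                   'c': list(range(20))})
print(df.head(3))
```

The same commit also points `read_parquet` at the output directory rather than a hash-named file inside it, so the cell no longer breaks when the written file name changes between runs.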

+ 1 - 1
ai/RAPIDS/English/Python/jupyter_notebook/Dask/03-CuML_and_Dask.ipynb

@@ -277,7 +277,7 @@
    "source": [
     "# Conclusion\n",
     "\n",
-    "We can see that the timing output reduces from 11 seconds to 4 seconds and the accuracy is consistent. We are familiar with CuML and Dask integration now. If you wish to explore this in detail, you can refer [here](https://github.com/rapidsai/cuml/tree/branch-0.18/notebooks). If you are confident with the concepts explained till now, you can move to the next lab and attempt to solve the Dask exercise."
+    "We can see that the runtime decreases while the accuracy remains similar. We are now familiar with CuML and Dask integration. If you wish to explore this in detail, you can refer [here](https://github.com/rapidsai/cuml/tree/branch-0.18/notebooks). If you are confident with the concepts explained so far, you can move on to the next lab and attempt the Dask exercise."
    ]
   },
   {

+ 2 - 2
ai/RAPIDS/English/Python/jupyter_notebook/Dask/04-Challenge.ipynb

@@ -83,7 +83,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "n_samples = 100000\n",
+    "n_samples = 10000\n",
     "n_features = 2\n",
     "\n",
     "n_clusters = 5\n",
@@ -339,7 +339,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "n_samples = 1000000\n",
+    "n_samples = 100000\n",
     "n_features = 2\n",
     "\n",
     "n_total_partitions = len(list(client.has_what().keys()))"

+ 3 - 3
ai/RAPIDS/English/Python/jupyter_notebook/Dask/05-Challenge_Solution.ipynb

@@ -84,7 +84,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "n_samples = 100000\n",
+    "n_samples = 10000\n",
     "n_features = 2\n",
     "\n",
     "n_clusters = 5\n",
@@ -425,7 +425,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "n_samples = 1000000\n",
+    "n_samples = 100000\n",
     "n_features = 2\n",
     "\n",
     "n_total_partitions = len(list(client.has_what().keys()))"
@@ -719,7 +719,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.7.8"
+   "version": "3.6.2"
   }
  },
  "nbformat": 4,

File diff suppressed because it is too large
+ 33 - 128
ai/RAPIDS/English/Python/jupyter_notebook/Dask/Backup.ipynb


+ 0 - 1
ai/RAPIDS/English/Python/jupyter_notebook/START_HERE.ipynb

@@ -37,7 +37,6 @@
     "    \n",
     "- CuML\n",
     "    - [Linear Regression and Hyperparameter Tuning](CuML/01-LinearRegression-Hyperparam.ipynb)\n",
-    "    - [SGD Algorithm](CuML/02-SGD.ipynb)\n",
     "    - [Exercise](CuML/03_CuML_Exercise.ipynb)\n",
     "    - [Backup](CuML/Backup.ipynb)\n",
     "    - [Bonus Lab- Logistic Regression and CuPy](CuML/Bonus_Lab-LogisticRegression.ipynb)\n",

+ 7 - 3
ai/RAPIDS/README.MD

@@ -30,13 +30,13 @@ Start working on the lab by clicking on the `Start_Here.ipynb` notebook.
 ### Singularity Container
 
 To build the singularity container, run: 
-`sudo singularity build <image_name>.simg Singularity`
+`sudo singularity build --sandbox <image_name>.simg Singularity`
 
 and copy the files to your local machine to make sure changes are stored locally:
-`singularity run <image_name>.simg cp -rT /workspace ~/workspace`
+`singularity run --writable <image_name>.simg cp -rT /workspace ~/workspace`
 
 Then, run the container:
-`singularity run --nv <image_name>.simg jupyter lab --no-browser --allow-root --ip=0.0.0.0 --port=8888 --NotebookApp.token="" --notebook-dir=/workspace/jupyter_notebook`
+`singularity run --nv --writable <image_name>.simg /opt/conda/envs/rapids/bin/jupyter lab --no-browser --allow-root --ip=0.0.0.0 --port=8888 --NotebookApp.token="" --notebook-dir=/workspace/jupyter_notebook`
 
 Then, open the jupyter notebook in browser: http://localhost:8888
 Start working on the lab by clicking on the `Start_Here.ipynb` notebook.
@@ -47,4 +47,8 @@ Q. Cannot write to /tmp directory
 
A. Some notebooks depend on writing logs to the /tmp directory. While creating the container, make sure the /tmp directory is accessible with write permission for the container. Alternatively, the user can change the tmp directory location.
 
+Q. Out of memory Error
+
+A. The bootcamp is designed for a GPU with a minimum of 16 GB of memory. Users can reduce the array sizes to lower the overall memory footprint if required, based on the GPU card's RAM.
+
 # For more information about RAPIDS applications and Docker, please refer <a href="https://hub.docker.com/r/rapidsai/rapidsai/"> here</a>
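The new out-of-memory FAQ entry above advises scaling array sizes to the GPU card's RAM. As a rough guide, a hypothetical helper (the function name and 50% budget are illustrative assumptions, not part of the bootcamp code) can pick a row count that fits a given card:

```python
# Hypothetical helper illustrating the README's out-of-memory advice: choose
# a float64 row count whose array fits within a fraction of the GPU's RAM.
def max_rows(gpu_ram_gb, n_cols, itemsize=8, budget_fraction=0.5):
    """Largest row count whose (rows x n_cols) float64 array fits the budget."""
    budget_bytes = gpu_ram_gb * 1e9 * budget_fraction
    return int(budget_bytes // (n_cols * itemsize))

# On the 16 GB card the bootcamp assumes, half the RAM holds a 1M x 1000 array:
print(max_rows(16, 1000))  # 1000000
```

Halving `gpu_ram_gb` halves the suggested row count, which matches the spirit of the commit's reductions of `n_samples` in the Dask challenge notebooks.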

+ 6 - 4
ai/RAPIDS/Singularity

@@ -1,15 +1,17 @@
 # Copyright (c) 2020 NVIDIA Corporation.  All rights reserved.
 
 Bootstrap: docker
-FROM: rapidsai/rapidsai-nightly:cuda10.2-runtime-ubuntu18.04-py3.7
+FROM: rapidsai/rapidsai:cuda10.1-runtime-ubuntu18.04-py3.7
 
 %environment
 %post
     apt-get update -y
-    apt-get install -y libsm6 libxext6 libxrender-dev git 
+    apt-get install -y libsm6 libxext6 libxrender-dev git
+    export PATH=/opt/conda/bin:/opt/conda/envs/rapids/bin/jupyter:$PATH
     pip install gdown
     python3 /workspace/source_code/dataset.py
-    
+    chmod 777 -R /workspace
+
 %files
     English/Python/* /workspace/
 
@@ -17,4 +19,4 @@ FROM: rapidsai/rapidsai-nightly:cuda10.2-runtime-ubuntu18.04-py3.7
     "$@"
 
 %labels
-    AUTHOR Infernolia
+    AUTHOR Infernolia