Updated Preliminaries, added new Kmeans examples, new notebook on blaze

Plus: updated dependencies
Valerio Maggio 10 years ago
parent
commit
049c302f92

+ 21 - 14
01 - Preliminaries.ipynb

@@ -53,9 +53,9 @@
     "- Vectorization\n",
     "- Using arrays in Conditions\n",
     "\n",
-    "** 15:30 - 16:00 ** (30 mins) Coffee Break\n",
+    "** 15:30 - 15:45 ** (15 mins) Coffee Break\n",
     "\n",
-    "** 16:00 - 16:30** (30 mins) Numpy Operations\n",
+    "** 15:45 - 16:15** (30 mins) Numpy Operations\n",
     "\n",
     "- Linear Algebra\n",
     "- Array and Matrix\n",
@@ -64,16 +64,18 @@
     "\n",
     "** PART 2 ** Advanced Numpy Functions and Applications (16:30 - 17:30)\n",
     "\n",
-    "** 16:30- 17:00 ** (30 mins) Data Processing\n",
+    "** 16:15- 16:55 ** (40 mins) Data Processing\n",
     "\n",
     "- File I/O\n",
     "- Data Processing\n",
     "- Memmap and Serialization\n",
     "- `numexpr`\n",
     "\n",
-    "** 12:55 - 13:25 ** Connecting Numpy with the Rest of the world\n",
+    "** 16:55 - 17:25 ** (30 mins) Numpy Application (Machine Learning)\n",
     "\n",
-    "- Machine Learning with scikit-learn\n",
+    "- Machine Learning Intro\n",
+    "- Clustering with scipy\n",
+    "- Clustering with scikit-learn\n",
     "\n",
     "** 17:25 - 17:30 ** A look at the future (of Numpy)"
    ]
@@ -131,7 +133,7 @@
     "\n",
     "The following command will install all required packages:\n",
     "\n",
-    "    $ conda install numpy scipy matplotlib scikit-learn ipython-notebook\n",
+    "    $ conda install numpy scipy matplotlib scikit-learn ipython-notebook numexpr\n",
     "    \n",
     "Alternatively, you can download and install the (very large) **Anaconda software distribution**, found at [https://store.continuum.io/](https://store.continuum.io/)."
    ]
@@ -157,7 +159,8 @@
     "    - `pip install scipy`\n",
     "    - `pip install matplotlib`\n",
     "    - `pip install \"ipython[all]\"  # don't forget the quotation!`\n",
-    "    - `pip install scikit-learn`"
+    "    - `pip install scikit-learn`\n",
+    "    - `pip install numexpr`"
    ]
   },
   {
@@ -248,7 +251,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 2,
+   "execution_count": 1,
    "metadata": {
     "collapsed": true
    },
@@ -257,7 +260,7 @@
     "import numpy as np\n",
     "import scipy as sp\n",
     "import matplotlib.pyplot as plt\n",
-    "import pandas as pd\n",
+    "import numexpr as ne\n",
     "import sklearn"
    ]
   },
@@ -270,7 +273,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 6,
+   "execution_count": 2,
    "metadata": {
     "collapsed": false
    },
@@ -280,10 +283,11 @@
      "output_type": "stream",
      "text": [
       "numpy: 1.9.2\n",
-      "scipy: 0.15.1\n",
+      "scipy: 0.16.0\n",
       "matplotlib: 1.4.3\n",
-      "iPython: 3.2.0\n",
-      "scikit-learn: 0.16.1\n"
+      "iPython: 4.0.0\n",
+      "scikit-learn: 0.16.1\n",
+      "numexpr: 2.4.3\n"
      ]
     }
    ],
@@ -301,7 +305,10 @@
     "print('iPython:', IPython.__version__)\n",
     "\n",
     "import sklearn\n",
-    "print('scikit-learn:', sklearn.__version__)"
+    "print('scikit-learn:', sklearn.__version__)\n",
+    "\n",
+    "import numexpr\n",
+    "print('numexpr:', numexpr.__version__)"
    ]
   },
   {

File diff suppressed because it is too large
+ 286 - 65
07_0_MachineLearning_Data.ipynb


+ 83 - 0
08_A_look_at_the_future.ipynb

@@ -0,0 +1,83 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# The future of Numpy"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The future of NumPy is **Blaze**, a new open source Python numerical library. \n",
+    "\n",
+    "<img src=\"images/blaze.png\" />\n",
+    "\n",
+    "Blaze is supposed to process *Big Data* better than NumPy ever can. \n",
+    "\n",
+    "Big Data is nowadays a sort of *buzzword*, and can be defined in many ways. \n",
+    "\n",
+    "Here, we will define Big Data as data that cannot be stored in memory or even on a single machine. \n",
+    "\n",
+    "Usually, the data is distributed amongst several servers. Blaze should also be able to handle large quantities of streaming data that is never stored."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**Blaze Project**: [http://blaze.pydata.org/](http://blaze.pydata.org/)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Blaze is centered around **general multidimensional array** and **table abstractions**. \n",
+    "\n",
+    "The classes in Blaze represent different data types and data structures as found in the real world. \n",
+    "\n",
+    "Blaze has a generic computation engine that can process data spread out over multiple servers and send instructions to specialized low-level kernels."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Blaze extends NumPy to provide custom-defined data types and heterogeneous shapes. \n",
+    "\n",
+    "This, of course, allows for greater flexibility and ease of use.\n",
+    "\n",
+    "Blaze is designed around `arrays`. Just like the NumPy `ndarray`, Blaze offers metadata with extra computational information. The metadata defines how data is stored, (`heterogeneously`) typed and indexed as multidimensional arrays. \n",
+    "\n",
+    "Computation\n",
+    "can be performed on various hardware including heterogeneous clusters of CPUs and GPUs.\n",
+    "\n",
+    "Blaze has the ambition to become the NumPy of multiple node clusters and distributed computing. The main idea, just as with NumPy, is to focus on arrays and array operations while abstracting the messy details away."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.4.3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}

+ 1 - 1
README.md

@@ -49,7 +49,7 @@ during the training:
 * numpy 1.9+
 * scipy 0.14+
 * scikit-learn 0.15+
-* pandas 0.8+
+* numexpr 2.4+
 
 
 ## Target Audience

BIN
images/blaze.png


BIN
images/scikit-learn.png


+ 33 - 0
utility/plot_clustering.py

@@ -0,0 +1,33 @@
+
+import matplotlib.pyplot as plt
+
+def plot_kmeans_clustering_results(c1, c2, c3, vqc1, vqc2, vqc3):
+
+    # Setting plot limits
+    x1, x2 = -10, 10
+    y1, y2 = -10, 10
+
+    fig = plt.figure()
+    fig.subplots_adjust(hspace=0.1, wspace=0.1)
+
+    ax1 = fig.add_subplot(121, aspect='equal')
+    ax1.scatter(c1[:, 0], c1[:, 1], lw=0.5, color='#00CC00')
+    ax1.scatter(c2[:, 0], c2[:, 1], lw=0.5, color='#028E9B')
+    ax1.scatter(c3[:, 0], c3[:, 1], lw=0.5, color='#FF7800')
+    ax1.xaxis.set_visible(False)
+    ax1.yaxis.set_visible(False)
+    ax1.set_xlim(x1, x2)
+    ax1.set_ylim(y1, y2)
+    ax1.text(-9, 8, 'Original')
+
+    ax2 = fig.add_subplot(122, aspect='equal')
+    ax2.scatter(vqc1[:, 0], vqc1[:, 1], lw=0.5, color='#00CC00')
+    ax2.scatter(vqc2[:, 0], vqc2[:, 1], lw=0.5, color='#028E9B')
+    ax2.scatter(vqc3[:, 0], vqc3[:, 1], lw=0.5, color='#FF7800')
+    ax2.xaxis.set_visible(False)
+    ax2.yaxis.set_visible(False)
+    ax2.set_xlim(x1, x2)
+    ax2.set_ylim(y1, y2)
+    ax2.text(-9, 8, 'VQ identified')
+
+    return fig
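
Usage note for `utility/plot_clustering.py`: the function expects six 2-D arrays, three for the original clusters and three for the clusters recovered by vector quantization. The commit does not show how the recovered arrays are built, so the following is only a minimal sketch under stated assumptions. The helper `split_by_nearest_centroid`, the synthetic blobs, and the fixed centroids are all hypothetical; a plain NumPy nearest-centroid assignment stands in for whatever clustering routine the notebook actually uses.

```python
import numpy as np


def split_by_nearest_centroid(points, centroids):
    """Hypothetical helper: group points by their nearest centroid."""
    # Pairwise squared Euclidean distances, shape (n_points, n_centroids)
    diffs = points[:, np.newaxis, :] - centroids[np.newaxis, :, :]
    sq_dists = (diffs ** 2).sum(axis=2)
    # Index of the closest centroid for every point
    labels = sq_dists.argmin(axis=1)
    # One array of points per centroid, in centroid order
    return [points[labels == k] for k in range(len(centroids))]


# Three synthetic 2-D blobs (assumed data, chosen to fit the -10..10 plot limits)
rng = np.random.RandomState(0)
c1 = rng.randn(100, 2) + [5, 5]
c2 = rng.randn(100, 2) + [-5, -5]
c3 = rng.randn(100, 2) + [5, -5]
points = np.vstack([c1, c2, c3])

# Centroids are fixed here for illustration; in practice they would come
# from a k-means run on `points`
centroids = np.array([[5.0, 5.0], [-5.0, -5.0], [5.0, -5.0]])
vq1, vq2, vq3 = split_by_nearest_centroid(points, centroids)

# With the plotting module importable, the figure would then be built with:
# fig = plot_kmeans_clustering_results(c1, c2, c3, vq1, vq2, vq3)
```

The arguments are passed positionally, so the call does not depend on the parameter names in the function signature.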