Updated Preliminaries, added new Kmeans examples, new notebook on blaze

Plus: updated dependencies
Valerio Maggio, 10 years ago
parent
commit 049c302f92
7 changed files with 424 additions and 80 deletions
  1. 01 - Preliminaries.ipynb (+21 -14)
  2. 07_0_MachineLearning_Data.ipynb (+286 -65)
  3. 08_A_look_at_the_future.ipynb (+83 -0)
  4. README.md (+1 -1)
  5. images/blaze.png (binary)
  6. images/scikit-learn.png (binary)
  7. utility/plot_clustering.py (+33 -0)

+ 21 - 14
01 - Preliminaries.ipynb

@@ -53,9 +53,9 @@
     "- Vectorization\n",
     "- Using arrays in Conditions\n",
     "\n",
-    "** 15:30 - 16:00 ** (30 mins) Coffee Break\n",
+    "** 15:30 - 15:45 ** (15 mins) Coffee Break\n",
     "\n",
-    "** 16:00 - 16:30** (30 mins) Numpy Operations\n",
+    "** 15:45 - 16:15** (30 mins) Numpy Operations\n",
     "\n",
     "- Linear Algebra\n",
     "- Array and Matrix\n",
@@ -64,16 +64,18 @@
     "\n",
     "** PART 2 ** Advanced Numpy Functions and Applications (16:30 - 17:30)\n",
     "\n",
-    "** 16:30- 17:00 ** (30 mins) Data Processing\n",
+    "** 16:15- 16:55 ** (40 mins) Data Processing\n",
     "\n",
     "- File I/O\n",
     "- Data Processing\n",
     "- Memmap and Serialization\n",
     "- `numexpr`\n",
     "\n",
-    "** 12:55 - 13:25 ** Connecting Numpy with the Rest of the world\n",
+    "** 16:55 - 17:25 ** (30 mins) Numpy Application (Machine Learning)\n",
     "\n",
-    "- Machine Learning with scikit-learn\n",
+    "- Machine Learning Intro\n",
+    "- Clustering with scipy\n",
+    "- Clustering with scikit-learn\n",
     "\n",
     "** 17:25 - 17:30 ** A look at the future (of Numpy)"
    ]
@@ -131,7 +133,7 @@
     "\n",
     "The following command will install all required packages:\n",
     "\n",
-    "    $ conda install numpy scipy matplotlib scikit-learn ipython-notebook\n",
+    "    $ conda install numpy scipy matplotlib scikit-learn ipython-notebook numexpr\n",
     "    \n",
     "Alternatively, you can download and install the (very large) **Anaconda software distribution**, found at [https://store.continuum.io/]()."
    ]
@@ -157,7 +159,8 @@
     "    - `pip install scipy`\n",
     "    - `pip install matplotlib`\n",
     "    - `pip install \"ipython[all]\"  # don't forget the quotation!`\n",
-    "    - `pip install scikit-learn`"
+    "    - `pip install scikit-learn`\n",
+    "    - `pip install numexpr`"
    ]
   },
   {
@@ -248,7 +251,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 2,
+   "execution_count": 1,
    "metadata": {
     "collapsed": true
    },
@@ -257,7 +260,7 @@
     "import numpy as np\n",
     "import scipy as sp\n",
     "import matplotlib.pyplot as plt\n",
-    "import pandas as pd\n",
+    "import numexpr as ne\n",
     "import sklearn"
    ]
   },
@@ -270,7 +273,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 6,
+   "execution_count": 2,
    "metadata": {
     "collapsed": false
    },
@@ -280,10 +283,11 @@
      "output_type": "stream",
      "text": [
       "numpy: 1.9.2\n",
-      "scipy: 0.15.1\n",
+      "scipy: 0.16.0\n",
       "matplotlib: 1.4.3\n",
-      "iPython: 3.2.0\n",
-      "scikit-learn: 0.16.1\n"
+      "iPython: 4.0.0\n",
+      "scikit-learn: 0.16.1\n",
+      "numexpr: 2.4.3\n"
      ]
     }
    ],
@@ -301,7 +305,10 @@
     "print('iPython:', IPython.__version__)\n",
     "\n",
     "import sklearn\n",
-    "print('scikit-learn:', sklearn.__version__)"
+    "print('scikit-learn:', sklearn.__version__)\n",
+    "\n",
+    "import numexpr\n",
+    "print('numexpr:', numexpr.__version__)"
    ]
   },
   {

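The version-check cell in this hunk imports each package unconditionally, so one missing dependency stops the whole cell. The following is a minimal sketch of a more tolerant check. The `REQUIRED` list and the `report_versions` helper are hypothetical names that do not exist in the repository; the package names are taken from the imports shown in the diff.

```python
import importlib

# Import names of the packages used in the notebook cells above (assumed list).
REQUIRED = ["numpy", "scipy", "matplotlib", "IPython", "sklearn", "numexpr"]


def report_versions(names=REQUIRED):
    """Map each import name to its version string, or None if it cannot be imported."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception:
            # A broken install can raise errors other than ImportError,
            # so any failure is recorded as "not usable".
            versions[name] = None
        else:
            # Not every module exposes __version__; fall back to a placeholder.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


for name, version in report_versions().items():
    print(name + ":", version if version is not None else "NOT INSTALLED")
```

Returning a dictionary rather than printing inside the helper keeps the result usable for later checks, for example comparing against the minimum versions listed in the README.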
+ 286 - 65
07_0_MachineLearning_Data.ipynb

File diff suppressed because it is too large


+ 83 - 0
08_A_look_at_the_future.ipynb

@@ -0,0 +1,83 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# The future of Numpy"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The future of NumPy is **Blaze**, a new open source Python numerical library. \n",
+    "\n",
+    "<img src=\"images/blaze.png\" />\n",
+    "\n",
+    "Blaze is supposed to process *Big Data* better than NumPy ever can. \n",
+    "\n",
+    "Big Data is nowadays a sort of *buzzword*, and can be defined in many ways. \n",
+    "\n",
+    "Here, we will define Big Data as data that cannot be stored in memory or even on a single machine. \n",
+    "\n",
+    "Usually, the data is distributed amongst several servers. Blaze should also be able to handle large quantities of streaming data that is never stored."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**Blaze Project**: [http://blaze.pydata.org/]()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Blaze is centered around **general multidimensional array** and **table abstractions**. \n",
+    "\n",
+    "The classes in Blaze represent different data types and data structures as found in the real world. \n",
+    "\n",
+    "Blaze has a generic computation engine that can process data spread out over multiple servers and send instructions to specialized low-level kernels."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Blaze extends NumPy to provide custom-defined data types and heterogeneous shapes. \n",
+    "\n",
+    "This, of course, allows for greater flexibility and ease of use.\n",
+    "\n",
+    "Blaze is designed around `arrays`. Just like the NumPy `ndarray`, Blaze offers metadata with extra computational information. The metadata defines how data is stored, (`heterogeneously`) typed and indexed as multidimensional arrays. \n",
+    "\n",
+    "Computation\n",
+    "can be performed on various hardware including heterogeneous clusters of CPUs and GPUs.\n",
+    "\n",
+    "Blaze has the ambition to become the NumPy of multiple node clusters and distributed computing. The main idea, just as with NumPy, is to focus on arrays and array operations while abstracting the messy details away."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.4.3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}

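The notebook mentions streaming data that is never stored. The sketch below is not Blaze and uses no Blaze API; it only illustrates that idea with the Python standard library, computing a mean and variance in a single pass with constant memory (Welford's method). `RunningStats` and `sensor_stream` are hypothetical names introduced for illustration, and the simulated readings are made up.

```python
import random


class RunningStats:
    """Mean and population variance of a stream, using O(1) memory."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        # Welford's update: adjust the mean, then accumulate the deviation.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self):
        return self._m2 / self.count if self.count else float("nan")


def sensor_stream(n, seed=0):
    """Simulated stream: yields n Gaussian readings one at a time."""
    rng = random.Random(seed)
    for _ in range(n):
        yield rng.gauss(10.0, 2.0)


stats = RunningStats()
for reading in sensor_stream(100000):
    stats.update(reading)  # each reading is discarded after the update

print("count:", stats.count)
print("mean:", stats.mean)
print("variance:", stats.variance)
```

Only three numbers are kept regardless of stream length, which is the property that matters once the data no longer fits in memory.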
+ 1 - 1
README.md

@@ -49,7 +49,7 @@ during the training:
 * numpy 1.9+
 * scipy 0.14+
 * scikit-learn 0.15+
-* pandas 0.8+
+* numexpr 2.4+
 
 
 ## Target Audience

Binary
images/blaze.png


Binary
images/scikit-learn.png


+ 33 - 0
utility/plot_clustering.py

@@ -0,0 +1,33 @@
+
+import matplotlib.pyplot as plt
+
+def plot_kmeans_clustering_results(c1, c2, c3, vqc1, vqc2, vqc3):
+
+    # Setting plot limits
+    x1, x2 = -10, 10
+    y1, y2 = -10, 10
+
+    fig = plt.figure()
+    fig.subplots_adjust(hspace=0.1, wspace=0.1)
+
+    ax1 = fig.add_subplot(121, aspect='equal')
+    ax1.scatter(c1[:, 0], c1[:, 1], lw=0.5, color='#00CC00')
+    ax1.scatter(c2[:, 0], c2[:, 1], lw=0.5, color='#028E9B')
+    ax1.scatter(c3[:, 0], c3[:, 1], lw=0.5, color='#FF7800')
+    ax1.xaxis.set_visible(False)
+    ax1.yaxis.set_visible(False)
+    ax1.set_xlim(x1, x2)
+    ax1.set_ylim(y1, y2)
+    ax1.text(-9, 8, 'Original')
+
+    ax2 = fig.add_subplot(122, aspect='equal')
+    ax2.scatter(vqc1[:, 0], vqc1[:, 1], lw=0.5, color='#00CC00')
+    ax2.scatter(vqc2[:, 0], vqc2[:, 1], lw=0.5, color='#028E9B')
+    ax2.scatter(vqc3[:, 0], vqc3[:, 1], lw=0.5, color='#FF7800')
+    ax2.xaxis.set_visible(False)
+    ax2.yaxis.set_visible(False)
+    ax2.set_xlim(x1, x2)
+    ax2.set_ylim(y1, y2)
+    ax2.text(-9, 8, 'VQ identified')
+
+    return fig
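
The plotting helper takes three original clusters and three identified clusters, each indexed as an `(N, 2)` array. The notebook that produces them is not shown in this diff, so the following is only a sketch of how such inputs could be built, assuming plain Lloyd iterations written directly in NumPy rather than the `scipy.cluster.vq` or scikit-learn routines the tutorial covers. `assign_to_centroids` and `lloyd_kmeans` are hypothetical names, and the blob centers are chosen only to fit the plot limits of -10 to 10.

```python
import numpy as np


def assign_to_centroids(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    # Broadcasting gives all pairwise differences: (n_points, n_centroids, n_dims).
    diff = points[:, np.newaxis, :] - centroids[np.newaxis, :, :]
    sq_dist = (diff ** 2).sum(axis=2)
    return sq_dist.argmin(axis=1)


def lloyd_kmeans(points, centroids, n_iter=10):
    """Run a fixed number of Lloyd iterations from the given starting centroids."""
    centroids = np.asarray(centroids, dtype=float).copy()
    for _ in range(n_iter):
        labels = assign_to_centroids(points, centroids)
        for k in range(len(centroids)):
            members = points[labels == k]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[k] = members.mean(axis=0)
    return centroids, assign_to_centroids(points, centroids)


# Three synthetic 2-D blobs (made-up data for illustration).
rng = np.random.RandomState(0)
c1 = rng.randn(100, 2) + [5, 5]
c2 = rng.randn(100, 2) + [-5, 5]
c3 = rng.randn(100, 2) + [0, -5]
data = np.vstack((c1, c2, c3))

# Start from one sample of each blob, then split the data by assigned label.
centroids, labels = lloyd_kmeans(data, data[[0, 100, 200]])
vqc1, vqc2, vqc3 = (data[labels == k] for k in range(3))
print(centroids)
```

The six arrays `c1, c2, c3, vqc1, vqc2, vqc3` have the shape the helper indexes with `[:, 0]` and `[:, 1]`, so they could be passed to it positionally.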