
Updated Preliminaries, added new Kmeans examples, new notebook on blaze

Plus: updated dependencies
Valerio Maggio 10 years ago
Parent
Commit
049c302f92
7 files changed, with 424 insertions and 80 deletions
  1. 21 14
      01 - Preliminaries.ipynb
  2. 286 65
      07_0_MachineLearning_Data.ipynb
  3. 83 0
      08_A_look_at_the_future.ipynb
  4. 1 1
      README.md
  5. Binary
      images/blaze.png
  6. Binary
      images/scikit-learn.png
  7. 33 0
      utility/plot_clustering.py

+ 21 - 14
01 - Preliminaries.ipynb

@@ -53,9 +53,9 @@
     "- Vectorization\n",
     "- Using arrays in Conditions\n",
     "\n",
-    "** 15:30 - 16:00 ** (30 mins) Coffee Break\n",
+    "** 15:30 - 15:45 ** (15 mins) Coffee Break\n",
     "\n",
-    "** 16:00 - 16:30** (30 mins) Numpy Operations\n",
+    "** 15:45 - 16:15** (30 mins) Numpy Operations\n",
     "\n",
     "- Linear Algebra\n",
     "- Array and Matrix\n",
@@ -64,16 +64,18 @@
     "\n",
     "** PART 2 ** Advanced Numpy Functions and Applications (16:30 - 17:30)\n",
     "\n",
-    "** 16:30- 17:00 ** (30 mins) Data Processing\n",
+    "** 16:15- 16:55 ** (40 mins) Data Processing\n",
     "\n",
     "- File I/0\n",
     "- Data Processing\n",
     "- Memmap and Serialization\n",
     "- `numexpr`\n",
     "\n",
-    "** 12:55 - 13:25 ** Connecting Numpy with the Rest of the world\n",
+    "** 16:55 - 17:25 ** (30 mins) Numpy Application (Machine Learning)\n",
     "\n",
-    "- Machine Learning with scikit-learn\n",
+    "- Machine Learning Intro\n",
+    "- Clustering with scipy\n",
+    "- Clustering with scikit-learn\n",
     "\n",
     "** 17:25 - 17:30 ** A look at the future (of Numpy)"
    ]
@@ -131,7 +133,7 @@
     "\n",
     "The following command will install all required packages:\n",
     "\n",
-    "    $ conda install numpy scipy matplotlib scikit-learn ipython-notebook\n",
+    "    $ conda install numpy scipy matplotlib scikit-learn ipython-notebook numexpr\n",
     "    \n",
     "Alternatively, you can download and install the (very large) **Anaconda software distribution**, found at [https://store.continuum.io/]()."
    ]
@@ -157,7 +159,8 @@
     "    - `pip install scipy`\n",
     "    - `pip install matplotlib`\n",
     "    - `pip install \"ipython[all]\"  # don't forget the quotation!`\n",
-    "    - `pip install scikit-learn`"
+    "    - `pip install scikit-learn`\n",
+    "    - `pip install numexpr`"
    ]
   },
   {
@@ -248,7 +251,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 2,
+   "execution_count": 1,
    "metadata": {
     "collapsed": true
    },
@@ -257,7 +260,7 @@
     "import numpy as np\n",
     "import scipy as sp\n",
     "import matplotlib.pyplot as plt\n",
-    "import pandas as pd\n",
+    "import numexpr as ne\n",
     "import sklearn"
    ]
   },
@@ -270,7 +273,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 6,
+   "execution_count": 2,
    "metadata": {
     "collapsed": false
    },
@@ -280,10 +283,11 @@
      "output_type": "stream",
      "text": [
       "numpy: 1.9.2\n",
-      "scipy: 0.15.1\n",
+      "scipy: 0.16.0\n",
       "matplotlib: 1.4.3\n",
-      "iPython: 3.2.0\n",
-      "scikit-learn: 0.16.1\n"
+      "iPython: 4.0.0\n",
+      "scikit-learn: 0.16.1\n",
+      "numexpr: 2.4.3\n"
      ]
     }
    ],
@@ -301,7 +305,10 @@
     "print('iPython:', IPython.__version__)\n",
     "\n",
     "import sklearn\n",
-    "print('scikit-learn:', sklearn.__version__)"
+    "print('scikit-learn:', sklearn.__version__)\n",
+    "\n",
+    "import numexpr\n",
+    "print('numexpr:', numexpr.__version__)"
    ]
   },
   {

+ 286 - 65
07_0_MachineLearning_Data.ipynb

File diff suppressed because it is too large to display


+ 83 - 0
08_A_look_at_the_future.ipynb

@@ -0,0 +1,83 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# The future of Numpy"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The future of NumPy is **Blaze**, a new open source Python numerical library. \n",
+    "\n",
+    "<img src=\"images/blaze.png\" />\n",
+    "\n",
+    "Blaze is supposed to process *Big Data* better than NumPy ever can. \n",
+    "\n",
+    "Big Data is nowadays a sort of *buzzword*, and can be defined in many ways. \n",
+    "\n",
+    "Here, we will define Big Data as data that cannot be stored in memory or even on a single machine. \n",
+    "\n",
+    "Usually, the data is distributed amongst several servers. Blaze should also be able to handle large quantities of streaming data that is never stored."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**Blaze Project**: [http://blaze.pydata.org/]()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Blaze is centered around **general multidimensional array** and **table abstractions**. \n",
+    "\n",
+    "The classes in Blaze represent different data types and data structures as found in the real world. \n",
+    "\n",
+    "Blaze has a generic computation engine that can process data spread out over multiple servers and send instructions to specialized low-level kernels."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Blaze extends NumPy to provide custom-defined data types and heterogeneous shapes. \n",
+    "\n",
+    "This, of course, allows for greater flexibility and ease of use.\n",
+    "\n",
+    "Blaze is designed around `arrays`. Just like the NumPy `ndarray`, Blaze offers metadata with extra computational information. The metadata defines how data is stored, (`heterogeneously`) typed and indexed as multidimensional arrays. \n",
+    "\n",
+    "Computation\n",
+    "can be performed on various hardware including heterogeneous clusters of CPUs and GPUs.\n",
+    "\n",
+    "Blaze has the ambition to become the NumPy of multiple node clusters and distributed computing. The main idea, just as with NumPy, is to focus on arrays and array operations while abstracting the messy details away."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.4.3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}

+ 1 - 1
README.md

@@ -49,7 +49,7 @@ during the training:
 * numpy 1.9+
 * scipy 0.14+
 * scikit-learn 0.15+
-* pandas 0.8+
+* numexpr: 2.4.+
 
 
 ## Target Audience

Binary
images/blaze.png


Binary
images/scikit-learn.png


+ 33 - 0
utility/plot_clustering.py

@@ -0,0 +1,33 @@
+
+import matplotlib.pyplot as plt
+
+def plot_kmeans_clustering_results(c1, c2, c3, vq1, vq2, vq3):
+
+    # Setting plot limits
+    x1, x2 = -10, 10
+    y1, y2 = -10, 10
+
+    fig = plt.figure()
+    fig.subplots_adjust(hspace=0.1, wspace=0.1)
+
+    ax1 = fig.add_subplot(121, aspect='equal')
+    ax1.scatter(c1[:, 0], c1[:, 1], lw=0.5, color='#00CC00')
+    ax1.scatter(c2[:, 0], c2[:, 1], lw=0.5, color='#028E9B')
+    ax1.scatter(c3[:, 0], c3[:, 1], lw=0.5, color='#FF7800')
+    ax1.xaxis.set_visible(False)
+    ax1.yaxis.set_visible(False)
+    ax1.set_xlim(x1, x2)
+    ax1.set_ylim(y1, y2)
+    ax1.text(-9, 8, 'Original')
+
+    ax2 = fig.add_subplot(122, aspect='equal')
+    ax2.scatter(vq1[:, 0], vq1[:, 1], lw=0.5, color='#00CC00')
+    ax2.scatter(vq2[:, 0], vq2[:, 1], lw=0.5, color='#028E9B')
+    ax2.scatter(vq3[:, 0], vq3[:, 1], lw=0.5, color='#FF7800')
+    ax2.xaxis.set_visible(False)
+    ax2.yaxis.set_visible(False)
+    ax2.set_xlim(x1, x2)
+    ax2.set_ylim(y1, y2)
+    ax2.text(-9, 8, 'VQ identified')
+
+    return fig
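
Usage note for `utility/plot_clustering.py`: the commit does not show how the six arguments are produced, so the following is only a minimal sketch of one plausible way to build them with `scipy.cluster.vq`. The helper names `make_clusters` and `split_by_kmeans`, the cluster centres, the seed, and the initial centroid guess are all assumptions made for illustration and do not come from the repository.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq


def make_clusters(seed=0, n=100):
    """Hypothetical helper: three well-separated 2-D Gaussian blobs."""
    rng = np.random.RandomState(seed)
    c1 = rng.randn(n, 2) + np.array([5.0, 5.0])
    c2 = rng.randn(n, 2) + np.array([-5.0, 5.0])
    c3 = rng.randn(n, 2) + np.array([0.0, -5.0])
    return c1, c2, c3


def split_by_kmeans(c1, c2, c3, initial_centroids):
    """Hypothetical helper: cluster the pooled points, then group them by label."""
    data = np.vstack((c1, c2, c3))
    # Passing an array as the second argument makes kmeans start from these
    # centroids instead of a random draw.
    codebook, _ = kmeans(data, initial_centroids)
    # vq assigns every observation to its nearest centroid in the codebook.
    codes, _ = vq(data, codebook)
    return tuple(data[codes == k] for k in range(len(codebook)))


c1, c2, c3 = make_clusters()
guess = np.array([[4.0, 4.0], [-4.0, 4.0], [0.0, -4.0]])
vq1, vq2, vq3 = split_by_kmeans(c1, c2, c3, guess)
```

The six arrays `c1, c2, c3, vq1, vq2, vq3` line up with the parameter list of `plot_kmeans_clustering_results`. The plotting call itself is left out so the sketch does not depend on matplotlib or on the repository layout.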