{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# What is Numpy" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "NumPy is the fundamental package for scientific computing with Python. \n", "It is a package that provides high-performance vector, matrix and higher-dimensional data structures for Python. \n", "It is implemented in C and Fortran, so when calculations are **vectorized**, performance is very good.\n", "\n", "So, in a nutshell:\n", "\n", "* a powerful Python extension for N-dimensional arrays\n", "* a tool for integrating C/C++ and Fortran code\n", "* designed for scientific computation: linear algebra and signal analysis\n", "\n", "If you are a MATLAB® user, we recommend reading [Numpy for MATLAB Users](http://www.scipy.org/NumPy_for_Matlab_Users) and [Benefit of Open Source Python versus commercial packages](http://www.scipy.org/NumPyProConPage). \n", "\n", "I'm a supporter of the **Open Science Movement**, so I humbly suggest you take a look at the [Science Code Manifesto](http://sciencecodemanifesto.org/)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Getting Started with Numpy Arrays" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "NumPy's main object is the **homogeneous** ***multidimensional array***. It is a table of elements (usually numbers), all of the same type. \n", "\n", "In NumPy, dimensions are called **axes**. \n", "\n", "The number of axes is called **rank**. \n", "\n", "The most important attributes of an ndarray object are:\n", "\n", "* **ndarray.ndim** - the number of axes (dimensions) of the array. \n", "* **ndarray.shape** - the dimensions of the array. For a matrix with n rows and m columns, shape will be (n,m). \n", "* **ndarray.size** - the total number of elements of the array. 
\n", "* **ndarray.dtype** - an object describing the type of the elements in the array; numpy.int32, numpy.int16, and numpy.float64 are some examples. \n", "* **ndarray.itemsize** - the size in bytes of each element of the array. For example, elements of type float64 have itemsize 8 (=64/8). " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "To use `numpy`, you need to import the module, for example:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "import numpy as np # conventional import alias" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Terminology Assumption" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "In the `numpy` package the terminology used for vectors, matrices and higher-dimensional data sets is *array*. " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Reference Documentation" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "* On the web: [http://docs.scipy.org](http://docs.scipy.org)\n", "\n", "* Interactive help:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "np.array?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "If you're looking for something:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "np.lookfor('create array')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "np.con*?" 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Help is your friend" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "When in doubt, the `help` function comes to the rescue." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "scrolled": false, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "# For example, try \n", "help(np.ndarray)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Numpy Array Object" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "`NumPy` has a multidimensional array object called ndarray. It consists of two parts:\n", " \n", " * The actual data\n", " * Some metadata describing the data\n", " \n", " \n", "The majority of array operations leave the raw data untouched. The only aspect that changes is the metadata." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Creating `numpy` arrays" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "There are a number of ways to initialize new numpy arrays, for example:\n", "\n", "* from Python lists or tuples\n", "* using functions that are dedicated to generating numpy arrays, such as `arange`, `linspace`, etc." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### From lists" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "For example, to create new vector and matrix arrays from Python lists we can use the `numpy.array` function." 
] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array([1, 2, 3, 4])" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "v = np.array([1,2,3,4])\n", "v" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array([[1, 2],\n", " [3, 4]])" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "M = np.array([[1, 2], [3, 4]])\n", "M" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The `v` and `M` objects are both of the type `ndarray` that the `numpy` module provides." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Type of v: <class 'numpy.ndarray'>\n", "Type of M: <class 'numpy.ndarray'>\n" ] } ], "source": [ "print('Type of v: ', type(v))\n", "print('Type of M: ', type(M))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The difference between the `v` and `M` arrays is only their shapes. 
\n", "\n", "To inspect the shape of an array, we can use the `numpy.shape` function:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Shape of v: (4,)\n", "Shape of M: (2, 2)\n" ] } ], "source": [ "print('Shape of v: ', np.shape(v))\n", "print('Shape of M: ', np.shape(M))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Alternatively, we can get information about the shape of an array by using the `ndarray.shape` **property**:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "((4,), (2, 2))" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "v.shape, M.shape" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Similarly, we can get information about the **size** of the two `ndarrays`, namely the *total number of elements* in each array." 
] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Size of v: 4\n", "Size of M: 4\n" ] } ], "source": [ "print('Size of v:', v.size)\n", "print('Size of M:', M.size)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### More properties of the `numpy array`" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "8" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "M.itemsize # bytes per element" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "32" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "M.nbytes # total number of bytes" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "2" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "M.ndim # number of dimensions" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Using array-generating functions" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "For larger arrays it is impractical to initialize the data manually, using explicit Python lists. \n", "\n", "Instead we can use one of the many **functions** in `numpy` that generate arrays of different forms. 
\n", "\n", "Some of the more common ones are: \n", "\n", "* `np.arange`; \n", "* `np.linspace`; \n", "* `np.logspace`; \n", "* `np.mgrid`;\n", "* `np.random.rand`;\n", "* `np.diag`;\n", "* `np.zeros`;\n", "* `np.ones`;\n", "* `np.empty`;\n", "* `np.tile`." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### `np.arange`" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0 1 2 3 4 5 6 7 8 9]\n" ] } ], "source": [ "x = np.arange(0, 10, 1) \n", "print(x)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ -1.00000000e+00 -9.00000000e-01 -8.00000000e-01 -7.00000000e-01\n", " -6.00000000e-01 -5.00000000e-01 -4.00000000e-01 -3.00000000e-01\n", " -2.00000000e-01 -1.00000000e-01 -2.22044605e-16 1.00000000e-01\n", " 2.00000000e-01 3.00000000e-01 4.00000000e-01 5.00000000e-01\n", " 6.00000000e-01 7.00000000e-01 8.00000000e-01 9.00000000e-01]\n" ] } ], "source": [ "# floating point step-wise range generation\n", "x = np.arange(-1, 1, 0.1) \n", "print(x)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### `np.linspace` and `np.logspace`" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "array([ 0. , 0.41666667, 0.83333333, 1.25 ,\n", " 1.66666667, 2.08333333, 2.5 , 2.91666667,\n", " 3.33333333, 3.75 , 4.16666667, 4.58333333,\n", " 5. , 5.41666667, 5.83333333, 6.25 ,\n", " 6.66666667, 7.08333333, 7.5 , 7.91666667,\n", " 8.33333333, 8.75 , 9.16666667, 9.58333333, 10. 
])" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# using linspace, both end points **ARE included**\n", "np.linspace(0, 10, 25)" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array([ 1.00000000e+00, 2.27278564e+00, 5.16555456e+00,\n", " 1.17401982e+01, 2.66829540e+01, 6.06446346e+01,\n", " 1.37832255e+02, 3.13263169e+02, 7.11980032e+02,\n", " 1.61817799e+03])" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.logspace(0, np.e**2, 10, base=np.e)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### `np.mgrid`" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": true, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "x, y = np.mgrid[0:5, 0:5] # similar to meshgrid in MATLAB" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "array([[0, 0, 0, 0, 0],\n", " [1, 1, 1, 1, 1],\n", " [2, 2, 2, 2, 2],\n", " [3, 3, 3, 3, 3],\n", " [4, 4, 4, 4, 4]])" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array([[0, 1, 2, 3, 4],\n", " [0, 1, 2, 3, 4],\n", " [0, 1, 2, 3, 4],\n", " [0, 1, 2, 3, 4],\n", " [0, 1, 2, 3, 4]])" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### `np.random.rand` & `np.random.randn`" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { 
"collapsed": false, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "array([[ 0.33658948, 0.28564552, 0.73183017, 0.7395105 , 0.66427382],\n", " [ 0.25942094, 0.43844615, 0.48250402, 0.24063916, 0.90171053],\n", " [ 0.51114245, 0.49587249, 0.61832302, 0.71996951, 0.22064571],\n", " [ 0.38625609, 0.44313367, 0.74975323, 0.57600147, 0.80771956],\n", " [ 0.84511666, 0.6064582 , 0.62365173, 0.62766319, 0.80129396]])" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# uniform random numbers in [0,1]\n", "np.random.rand(5,5)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": false, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "array([[ 0.65782724, 0.65168367, 0.58525852, 0.33781734, -0.00700978],\n", " [ 0.61574011, 0.59150639, -0.33797592, -0.2509655 , 0.77237429],\n", " [-0.15693266, -0.38377945, -0.28140147, 0.90558314, 0.25437408],\n", " [-1.136108 , 2.43964939, 0.28583627, -0.27540796, -0.57253111],\n", " [-0.79080395, 0.50525127, 2.1113386 , -0.33769711, -0.64914575]])" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# standard normal distributed random numbers\n", "np.random.randn(5,5)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### `np.diag`" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array([[1, 0, 0],\n", " [0, 2, 0],\n", " [0, 0, 3]])" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# a diagonal matrix\n", "np.diag([1,2,3])" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array([[0, 1, 0, 0],\n", " [0, 0, 2, 
0],\n", " [0, 0, 0, 3],\n", " [0, 0, 0, 0]])" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# diagonal with offset from the main diagonal\n", "np.diag([1,2,3], k=1) " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### `np.eye`" ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array([[ 1., 0., 0.],\n", " [ 0., 1., 0.],\n", " [ 0., 0., 1.]])" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# a diagonal matrix with ones on the main diagonal\n", "np.eye(3) # 3 is the number of rows (and, by default, columns)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### `np.zeros` and `np.ones`" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array([[ 0., 0., 0.],\n", " [ 0., 0., 0.],\n", " [ 0., 0., 0.]])" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.zeros((3,3))" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": false, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "array([[ 1., 1., 1.],\n", " [ 1., 1., 1.],\n", " [ 1., 1., 1.]])" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.ones((3, 3))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### DIY" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "***Try*** the following commands ***yourself***:\n", "\n", " np.zeros((3,4))\n", " np.ones((3,4))\n", " np.empty((2,3))\n", " np.eye(5)\n", " np.diag(np.arange(5))\n", " np.tile(np.array([[6, 7], [8, 9]]), (2, 2))" ] }, { "cell_type": "markdown", 
"metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## So, why is it useful then?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "So far the `numpy.ndarray` looks awefully much like a Python **list** (or **nested list**). \n", "\n", "*Why not simply use Python lists for computations instead of creating a new array type?*" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "There are several reasons:\n", "\n", "* Python lists are very general. \n", " - They can contain any kind of object. \n", " - They are dynamically typed. \n", " - They do not support mathematical functions such as matrix and dot multiplications, etc. \n", " - Implementing such functions for Python lists would not be very efficient because of the dynamic typing.\n", " \n", " \n", "* Numpy arrays are **statically typed** and **homogeneous**. \n", " - The type of the elements is determined when array is created.\n", " \n", " \n", "* Numpy arrays are memory efficient.\n", " - Because of the static typing, fast implementation of mathematical functions such as multiplication and addition of `numpy` arrays can be implemented in a compiled language (C and Fortran is used)." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "L = range(1000)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1000 loops, best of 3: 519 µs per loop\n" ] } ], "source": [ "%timeit [i**2 for i in L]" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "a = np.arange(1000)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The slowest run took 1538.53 times longer than the fastest. This could mean that an intermediate result is being cached.\n", "100000 loops, best of 3: 1.98 µs per loop\n" ] } ], "source": [ "%timeit a**2" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The slowest run took 9.51 times longer than the fastest. This could mean that an intermediate result is being cached.\n", "1000 loops, best of 3: 332 µs per loop\n" ] } ], "source": [ "%timeit [element**2 for element in a]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Exercises" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Simple arrays" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "* Create simple one- and two-dimensional arrays. First, redo the examples\n", "from above. 
And then create your own.\n", "\n", "* Use the functions `len`, `shape` and `ndim` on some of those arrays and\n", "observe their output." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Creating arrays using functions" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "* Experiment with `arange`, `linspace`, `ones`, `zeros`, `eye` and `diag`.\n", "\n", "* Create different kinds of arrays with random numbers.\n", "\n", "* Try setting the seed before creating an array with random values \n", " - *hint*: use `np.random.seed`\n", "\n", "* Look at the function `np.empty`. What does it do? When might this be\n", "useful?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Basic Data Type" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "You may have noticed that, in some instances, array elements are\n", "displayed with a trailing dot (e.g. `2.` vs `2`). 
This is due to a\n", "difference in the data-type used:" ] }, { "cell_type": "code", "execution_count": 59, "metadata": { "collapsed": false, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "dtype('int64')" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = np.array([1, 2, 3])\n", "a.dtype" ] }, { "cell_type": "code", "execution_count": 60, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "dtype('float64')" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b = np.array([1., 2., 3.])\n", "b.dtype" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Note\n", "\n", "Different data-types allow us to store data more compactly in memory,\n", "but most of the time we simply work with floating point numbers. Note\n", "that, in the example above, NumPy auto-detects the data-type from the\n", "input." 
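, "\n",
"\n",
"As a minimal sketch of the memory point (the names `small` and `large` are illustrative and not part of the original examples), the same number of elements takes eight times more memory as `float64` than as `int8`:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"small = np.zeros(1000, dtype=np.int8)     # 1 byte per element\n",
"large = np.zeros(1000, dtype=np.float64)  # 8 bytes per element\n",
"print(small.nbytes)  # 1000\n",
"print(large.nbytes)  # 8000\n",
"```"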
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "You can explicitly specify which data-type you want:" ] }, { "cell_type": "code", "execution_count": 61, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "dtype('float64')" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "c = np.array([1, 2, 3], dtype=float)\n", "c.dtype" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "For array-generating functions such as `np.ones`, the **default** data type is floating point:" ] }, { "cell_type": "code", "execution_count": 62, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "dtype('float64')" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = np.ones((3, 3))\n", "a.dtype" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Basic Data Types" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ " bool | This stores boolean (True or False) as a byte\n", "\n", " int_ | This is a platform integer (normally either int32 or int64)\n", " int8 | This is an integer ranging from -128 to 127\n", " int16 | This is an integer ranging from -32768 to 32767\n", " int32 | This is an integer ranging from -2 ** 31 to 2 ** 31 - 1\n", " int64 | This is an integer ranging from -2 ** 63 to 2 ** 63 - 1\n", " \n", " uint8 | This is an unsigned integer ranging from 0 to 255\n", " uint16 | This is an unsigned integer ranging from 0 to 65535\n", " uint32 | This is an unsigned integer ranging from 0 to 2 ** 32 - 1\n", " uint64 | This is an unsigned integer ranging from 0 to 2 ** 64 - 1\n", "\n", " float16 | This is a half precision float with sign bit, 5 bits exponent, and 10 bits mantissa\n", " float32 | This is a single 
precision float with sign bit, 8 bits exponent, and 23 bits mantissa\n", " float64 or float | This is a double precision float with sign bit, 11 bits exponent, and 52 bits mantissa\n", " complex64 | This is a complex number represented by two 32-bit floats (real and imaginary components)\n", " complex128 | This is a complex number represented by two 64-bit floats (real and imaginary components)\n", " (or complex)\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Conversions and Type Casting" ] }, { "cell_type": "code", "execution_count": 63, "metadata": { "collapsed": false, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "42.0" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.float64(42) # int to float" ] }, { "cell_type": "code", "execution_count": 64, "metadata": { "collapsed": false, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "42" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.int8(42.0) # float to int8" ] }, { "cell_type": "code", "execution_count": 67, "metadata": { "collapsed": false, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.bool(42) # int to bool" ] }, { "cell_type": "code", "execution_count": 68, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.bool(0) # \"special\" int to bool" ] }, { "cell_type": "code", "execution_count": 69, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 69, "metadata": {}, 
"output_type": "execute_result" } ], "source": [ "np.bool(42.0) # float to bool" ] }, { "cell_type": "code", "execution_count": 70, "metadata": { "collapsed": false, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "1.0" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.float(True) # bool to float" ] }, { "cell_type": "code", "execution_count": 71, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "0.0" ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.float(False)" ] }, { "cell_type": "code", "execution_count": 72, "metadata": { "collapsed": false, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "array([0, 1, 2, 3, 4, 5, 6], dtype=uint16)" ] }, "execution_count": 72, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.arange(7, dtype=np.uint16)" ] }, { "cell_type": "code", "execution_count": 74, "metadata": { "collapsed": false, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "ename": "TypeError", "evalue": "can't convert complex to int", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m42.0\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0;36m1.j\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# complex to int\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mTypeError\u001b[0m: can't convert complex to int" ] } ], "source": [ "np.int(42.0 + 1.j) # complex to int" ] }, { "cell_type": "code", "execution_count": 73, "metadata": { "collapsed": false, 
"slideshow": { "slide_type": "subslide" } }, "outputs": [ { "ename": "TypeError", "evalue": "can't convert complex to float", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfloat\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m42.0\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0;36m1.j\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# complex to float\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mTypeError\u001b[0m: can't convert complex to float" ] } ], "source": [ "np.float(42.0 + 1.j) # complex to float" ] }, { "cell_type": "code", "execution_count": 75, "metadata": { "collapsed": false, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "ename": "TypeError", "evalue": "can't convert complex to float", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfloat\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m42.0\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0;36m0.j\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# complex to float\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mTypeError\u001b[0m: can't convert complex to float" ] } ], "source": [ "np.float(42.0 + 0.j) # complex to float" ] }, { "cell_type": "code", "execution_count": 77, "metadata": { "collapsed": false, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(42+0j)\n" ] } ], "source": [ "cn = np.complex(42.0) # Btw, you can convert 
a float to a complex..\n", "print(cn)" ] }, { "cell_type": "code", "execution_count": 79, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "42.0" ] }, "execution_count": 79, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Extracting the Real part..\n", "cn.real" ] }, { "cell_type": "code", "execution_count": 80, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "0.0" ] }, "execution_count": 80, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# .. and the Imaginary part\n", "cn.imag" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Numerical Types and Representation" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The **numerical dtype** of an array should be selected very carefully, as it directly affects the numerical representation of elements, that is: \n", "\n", " * the number of **bytes** used; \n", " * the **numerical range**." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "So, then: **What happens if I try to represent a number that is out of range?**\n", "\n", "Let's have a go with **integers**, i.e., `int8` and `uint8`:" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array([0, 0, 0, 0], dtype=int8)" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = np.zeros(4, 'int8') # Integer ranging from -128 to 127\n", "x" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": false, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "array([127, 0, 0, 0], dtype=int8)" ] }, "execution_count": 28, "metadata": 
{}, "output_type": "execute_result" } ], "source": [ "x[0] = 127\n", "x" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array([-128, 0, 0, 0], dtype=int8)" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[0] = 128\n", "x" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array([-128, -127, 0, 0], dtype=int8)" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[1] = 129\n", "x" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array([-128, -127, 1, 0], dtype=int8)" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[2] = 257 # i.e. 
(128 x 2) + 1\n", "x" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "collapsed": false, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "array([0, 0, 0, 0], dtype=uint8)" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ux = np.zeros(4, 'uint8') # Integer ranging from 0 to 255\n", "ux" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "collapsed": false, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "array([255, 0, 1, 1], dtype=uint8)" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ux[0] = 255\n", "ux[1] = 256\n", "ux[2] = 257\n", "ux[3] = 513 # (256 x 2) + 1\n", "ux" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Data Type Object" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "**Data type objects** are instances of the `numpy.dtype` class. \n", "\n", "Once again, arrays have a data type. \n", "
\n", "To be precise, *every element* in a NumPy array has the same data type. \n", "\n", "The data type object can tell you the size in bytes of each element.\n", "
\n", "(**Recall**: The size in bytes is given by the `itemsize` attribute of the dtype class)" ] }, { "cell_type": "code", "execution_count": 81, "metadata": { "collapsed": false, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "a itemsize: 2\n", "a.dtype.itemsize: 2\n" ] } ], "source": [ "a = np.arange(7, dtype=np.uint16)\n", "print('a itemsize: ', a.itemsize)\n", "print('a.dtype.itemsize: ', a.dtype.itemsize)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "We may also have access to the `byteorder`, i.e. **Big Endian** or **Little Endian**" ] }, { "cell_type": "code", "execution_count": 82, "metadata": { "collapsed": false, "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/plain": [ "'='" ] }, "execution_count": 82, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a.dtype.byteorder" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "### Note:\n", "\n", "**Byte Order** can be one of:\n", "\n", "* `=\tnative`\n", "* `<\tlittle-endian`\n", "* `>\tbig-endian`\n", "* `|\tnot applicable`" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Character Codes\n", "\n", "Character codes are included for backward compatibility with **Numeric**. \n", "
\n", "Numeric is the predecessor of NumPy. Using character codes is not recommended, but they pop up in several places. \n", "\n", "You should use **dtype** objects instead. \n", "\n", "    integer                  i\n", "    Unsigned integer         u\n", "    Single precision float   f\n", "    Double precision float   d\n", "    bool                     b\n", "    complex                  D\n", "    string                   S\n", "    unicode                  U" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### `dtypes` properties" ] }, { "cell_type": "code", "execution_count": 83, "metadata": { "collapsed": true, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "t = np.dtype('float64')  # lowercase name; the capitalised 'Float64' alias is rejected by recent NumPy" ] }, { "cell_type": "code", "execution_count": 84, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'d'" ] }, "execution_count": 84, "metadata": {}, "output_type": "execute_result" } ], "source": [ "t.char" ] }, { "cell_type": "code", "execution_count": 85, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "numpy.float64" ] }, "execution_count": 85, "metadata": {}, "output_type": "execute_result" } ], "source": [ "t.type" ] }, { "cell_type": "code", "execution_count": 86, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'