{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# What is NumPy?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "NumPy is the fundamental package for scientific computing with Python. \n", "It provides high-performance vector, matrix and higher-dimensional data structures for Python. \n", "It is implemented in C and Fortran, so when calculations are **vectorized**, performance is very good.\n", "\n", "So, in a nutshell, NumPy is:\n", "\n", "* a powerful Python extension for N-dimensional arrays\n", "* a tool for integrating C/C++ and Fortran code\n", "* designed for scientific computation: linear algebra and signal analysis\n", "\n", "If you are a MATLAB® user, we recommend reading [Numpy for MATLAB Users](http://www.scipy.org/NumPy_for_Matlab_Users) and [Benefit of Open Source Python versus commercial packages](http://www.scipy.org/NumPyProConPage). \n", "\n", "I'm a supporter of the **Open Science Movement**, so I humbly suggest you take a look at the [Science Code Manifesto](http://sciencecodemanifesto.org/)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Getting Started with NumPy Arrays" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "NumPy's main object is the **homogeneous** ***multidimensional array***. It is a table of elements (usually numbers), all of the same type. \n", "\n", "In NumPy, dimensions are called **axes**. \n", "\n", "The number of axes is called **rank** (not to be confused with the rank of a matrix in linear algebra). \n", "\n", "The most important attributes of an ndarray object are:\n", "\n", "* **ndarray.ndim** - the number of axes (dimensions) of the array. \n", "* **ndarray.shape** - the dimensions of the array. For a matrix with n rows and m columns, shape will be (n,m). \n", "* **ndarray.size** - the total number of elements of the array. 
\n", "* **ndarray.dtype** - an object describing the type of the elements of the array; numpy.int32, numpy.int16, and numpy.float64 are some examples. \n", "* **ndarray.itemsize** - the size in bytes of each element of the array. For example, elements of type float64 have itemsize 8 (=64/8) " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "To use `numpy`, we need to import the module, for example:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "import numpy as np # conventional import name" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Terminology Assumption" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "In the `numpy` package, the terminology used for vectors, matrices and higher-dimensional data sets is *array*. " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Reference Documentation" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "* On the web: [http://docs.scipy.org](http://docs.scipy.org)\n", "\n", "* Interactive help:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "np.array?" 
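] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `?` suffix only works in IPython/Jupyter. In a plain Python session, a minimal alternative is the built-in `help` function, or `np.info`. Both print the documentation of an object:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "np.info(np.array)  # prints NumPy's help information for np.array\n", "help(np.array)     # Python's built-in help system" 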
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "If you're looking for something" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Creating `numpy` arrays\n", "\n", "### Get acquainted with NumPy" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Let's start by creating some `numpy.array` objects in order to get our hands on the details of the **basic NumPy data structure**.\n", "\n", "NumPy is a very flexible library, and provides many ways to create (and initialize) new numpy arrays. \n", "\n", "One way is **using specific functions dedicated to generating numpy arrays** \n", "(usually, *arrays of numbers*)\\[+\\]\n", "\n", "\n", "\n", "\\[+\\] More on data types, later on !-)\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# First `numpy array` example: array of numbers" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "NumPy provides many functions to generate arrays with specific properties (e.g. `size` or `shape`).\n", "\n", "Later on, we will see examples in which we generate an `ndarray` from explicit Python lists. \n", "\n", "However, for larger arrays, using Python lists is simply impractical. 
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### `np.arange`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In standard Python, we use the `range` function to generate an **iterable** object of **integers** within a specific range (at a specified `step`, default: `1`):" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]\n", "<class 'range'>\n" ] } ], "source": [ "r = range(10)\n", "print(list(r))\n", "\n", "print(type(r)) # NOTE: if this prints <type 'list'>, you're using Python 2.7" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Similarly, NumPy provides the `arange` function, which generates a `numpy.ndarray` instead:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0 1 2 3 4 5 6 7 8 9]\n", "<class 'numpy.ndarray'>\n" ] } ], "source": [ "ra = np.arange(10) \n", "print(ra)\n", "\n", "print(type(ra))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "However, we are working with the **Numerical Python** library, so we should expect more when it comes to numbers.\n", "\n", "In fact, we can create an array using a _floating-point step_:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[-1.00000000e+00 -9.00000000e-01 -8.00000000e-01 -7.00000000e-01\n", " -6.00000000e-01 -5.00000000e-01 -4.00000000e-01 -3.00000000e-01\n", " -2.00000000e-01 -1.00000000e-01 -2.22044605e-16 1.00000000e-01\n", " 2.00000000e-01 3.00000000e-01 4.00000000e-01 5.00000000e-01\n", " 6.00000000e-01 7.00000000e-01 8.00000000e-01 9.00000000e-01]\n" ] } ], "source": [ "# floating-point step-wise range 
generation\n", "raf = np.arange(-1, 1, 0.1) \n", 
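"# NOTE: with a non-integer step the values are affected by floating-point rounding\n", "# (see -2.22044605e-16 instead of 0 in the output); the NumPy docs suggest that\n", "# np.linspace, which takes the number of samples instead of the step, is often better.\n", 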
"print(raf)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Properties of `numpy array`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Apart from their content (which differs because the specified ranges differ), the `ra` and `raf` arrays also differ in their **`dtype`**:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "dtype of 'ra': int64, dtype of 'raf': float64\n" ] } ], "source": [ "print(f\"dtype of 'ra': {ra.dtype}, dtype of 'raf': {raf.dtype}\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### More properties of the `numpy array`" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "8" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ra.itemsize # bytes per element" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "80" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ra.nbytes # total number of bytes (size * itemsize)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ra.ndim # number of dimensions" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(10,)" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ra.shape # shape, i.e. 
number of elements per dimension (axis)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "## please replicate the same set of operations here for `raf`\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# your code here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q**: Do you notice any relevant difference?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### `np.linspace` and `np.logspace`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Besides `np.arange`, NumPy provides two other \"similar\" functions: \n", "\n", "- `np.linspace`\n", "- `np.logspace`\n", "\n", "Looking at the examples below, can you spot the difference?" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "array([ 0. , 0.52631579, 1.05263158, 1.57894737, 2.10526316,\n", " 2.63157895, 3.15789474, 3.68421053, 4.21052632, 4.73684211,\n", " 5.26315789, 5.78947368, 6.31578947, 6.84210526, 7.36842105,\n", " 7.89473684, 8.42105263, 8.94736842, 9.47368421, 10. 
])" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.linspace(0, 10, 20)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array([1.00000000e+00, 2.27278564e+00, 5.16555456e+00, 1.17401982e+01,\n", " 2.66829540e+01, 6.06446346e+01, 1.37832255e+02, 3.13263169e+02,\n", " 7.11980032e+02, 1.61817799e+03])" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.logspace(0, np.e**2, 10, base=np.e)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Random Number Generation\n", "\n", "### `np.random.rand` & `np.random.randn`" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "# uniform random numbers in [0, 1)\n", "ru = np.random.rand(10)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0.06629061, 0.56102955, 0.81081042, 0.80936217, 0.19182628,\n", " 0.78609316, 0.88379009, 0.45329187, 0.84304588, 0.56232631])" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ru" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "_Note: numbers and the content of the array may vary_" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "# normally distributed random numbers (mean 0, standard deviation 1)\n", "rs = np.random.randn(10)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0.45052791, -0.80566857, -0.10401981, 0.91948746, -0.0329787 ,\n", " -0.71872119, 1.42738938, -0.63292836, 0.5397375 , 0.89186053])" ] }, "execution_count": 16, "metadata": {}, "output_type": 
"execute_result" } ], "source": [ "rs" ] 
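}, { "cell_type": "markdown", "metadata": {}, "source": [ "Since NumPy 1.17, new code is encouraged to use a `Generator` object, created with `np.random.default_rng`, instead of the legacy functions above (which keep working). A minimal sketch of the closest equivalents of `rand` and `randn`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "rng = np.random.default_rng()  # new Generator, seeded with fresh entropy from the OS\n", "\n", "ru_new = rng.random(10)           # uniform in [0, 1), like np.random.rand(10)\n", "rs_new = rng.standard_normal(10)  # standard normal, like np.random.randn(10)\n", "\n", "print(ru_new)\n", "print(rs_new)" ] 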
}, { "cell_type": "markdown", "metadata": {}, "source": [ "_Note: numbers and the content of the array may vary_" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q**: What if I ask you to generate random numbers in a way that we both obtain the __very same__ numbers? (_Provided we share the same CPU architecture_)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Zeros and Ones (or Empty)\n", "\n", "### `np.zeros`, `np.ones`, `np.empty`\n", "\n", "Sometimes we need to initialise an array of a specific shape filled with `zeros`, with `ones`, or just with `rubbish` (i.e. `empty`: the memory is not initialised, so the array contains whatever values happened to be there):" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[0. 0. 0.]\n", " [0. 0. 0.]\n", " [0. 0. 0.]]\n" ] } ], "source": [ "Z = np.zeros((3,3))\n", "\n", "print(Z)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1. 1. 1.]\n", " [1. 1. 1.]\n", " [1. 1. 1.]]\n" ] } ], "source": [ "O = np.ones((3, 3))\n", "print(O)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0.45052791 0.80566857 0.10401981 0.91948746 0.0329787 0.71872119\n", " 1.42738938 0.63292836 0.5397375 0.89186053]\n" ] } ], "source": [ "E = np.empty(10)\n", "\n", "print(E)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# TRY THIS!\n", "\n", "np.empty(9)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# Other Specialised Functions\n", "\n", "## Diagonal Matrices\n", "\n", "### 1. 
`np.diag`" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array([[1, 0, 0],\n", " [0, 2, 0],\n", " [0, 0, 3]])" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# a diagonal matrix\n", "np.diag([1,2,3])" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array([[0, 0, 3, 0],\n", " [0, 2, 0, 0],\n", " [1, 0, 0, 0],\n", " [0, 0, 0, 0]])" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# k=-1 puts the values on the first diagonal below the main one (4x4 result);\n", "# [::-1] then reverses the order of the rows\n", "np.diag([1,2,3], k=-1)[::-1]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Identity Matrix $\\mathrm{I} \\mapsto$ `np.eye`" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array([[1, 0, 0],\n", " [0, 1, 0],\n", " [0, 0, 1]])" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# a diagonal matrix with ones on the main diagonal\n", "np.eye(3, dtype='int') # 3 is the number of rows (and columns)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# Create `numpy.ndarray` from `list`" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "To create new vector or matrix arrays from Python lists, we can use the \n", "`numpy.array` constructor function:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ 
"array([1, 2, 3, 4])" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], 
"source": [ "v = np.array([1,2,3,4])\n", "v" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<class 'numpy.ndarray'>\n" ] } ], "source": [ "print(type(v))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Alternatively**, there is also the `np.asarray` function, which converts a Python list into a NumPy array (unlike `np.array`, it does not copy its input when that is already an `ndarray` of the requested `dtype`):\n", "\n" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1, 2, 3, 4])" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "v = np.asarray([1, 2, 3, 4])\n", "v" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<class 'numpy.ndarray'>\n" ] } ], "source": [ "print(type(v))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can use the very same strategy for higher-dimensional arrays.\n", "\n", "For example, let's create a matrix from a list of lists:" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array([[1, 2],\n", " [3, 4]])" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "M = np.array([[1, 2], [3, 4]])\n", "M" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "((4,), (2, 2))" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "v.shape, M.shape" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## So, why is it useful then?" 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "So far, the `numpy.ndarray` looks awfully like a Python **list** (or **nested list**). 
\n", "\n", "*Why not simply use Python lists for computations instead of creating a new array type?*" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "There are several reasons:\n", "\n", "* Python lists are very general. \n", " - They can contain any kind of object. \n", " - They are dynamically typed. \n", " - They do not support mathematical operations such as matrix multiplication, dot products, etc. \n", " - Implementing such functions for Python lists would not be very efficient because of the dynamic typing.\n", " \n", " \n", "* NumPy arrays are **statically typed** and **homogeneous**. \n", " - The type of the elements is determined when the array is created.\n", " \n", " \n", "* NumPy arrays are memory efficient.\n", " - Elements are stored as raw, fixed-size values in a single block of memory, rather than as individual Python objects.\n", " - Because of the static typing, mathematical functions such as multiplication and addition of `numpy` arrays can be implemented in a compiled language (C and Fortran are used)." ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "L = range(100000)" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "41.7 ms ± 14.3 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n" ] } ], "source": [ "%timeit [i**2 for i in L]" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "a = np.arange(100000)" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "92.9 µs ± 10.1 µs per loop (mean ± std. dev. 
of 7 runs, 10000 loops each)\n" ] } ], "source": [ "%timeit a**2 # vectorized operation: the scalar 2 is broadcast to every element - more on broadcasting later!" 
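] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Speed only matters if the results agree. Here is a quick sanity check that the vectorized expression and the list comprehension compute the same values. This is a sketch: the explicit `int64` is our addition, so that squares up to `99999**2` cannot overflow on platforms where the default integer is 32-bit." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "L = range(100000)\n", "a = np.arange(100000, dtype=np.int64)\n", "\n", "vectorized = a**2                                     # element-wise, in compiled code\n", "looped = np.array([i**2 for i in L], dtype=np.int64)  # Python-level loop\n", "\n", "print(np.array_equal(vectorized, looped))  # True" 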
] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "48.4 ms ± 18.7 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n" ] } ], "source": [ "%timeit [element**2 for element in a]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Exercises: DIY" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "### Simple arrays" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "* Create simple one- and two-dimensional arrays. First, redo the examples\n", "from above, then create your own.\n", "\n", "* Use the functions `len`, `np.shape` and `np.ndim` on some of those arrays and\n", "observe their output." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Creating arrays using functions" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "* Experiment with `arange`, `linspace`, `ones`, `zeros`, `eye` and `diag`.\n", "\n", "* Create different kinds of arrays with random numbers.\n", "\n", "* Try setting the seed before creating an array with random values \n", " - *hint*: use `np.random.seed`\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## NumPy Array Object" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "`NumPy` has a 
multidimensional array object called `ndarray`. It consists of two parts:\n", " \n", " * The actual data\n", " * Some metadata describing the data\n", " \n", " \n", "Many operations that rearrange an array (e.g. reshaping or transposing) leave the raw data untouched: only the metadata changes." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data vs Metadata (Attributes)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "This internal separation between the actual data (i.e. the content of the array --> the `memory`) and the metadata (i.e. properties and attributes of the data) makes efficient memory management possible.\n", "\n", "For example, the shape of a NumPy array **can be modified without copying and/or affecting** the actual data, which makes it a fast operation even for large arrays." ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,\n", " 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,\n", " 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44])" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = np.arange(45)\n", "\n", "a" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(45,)" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a.shape" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 1, 2, 3, 4],\n", " [ 5, 6, 7, 8, 9],\n", " [10, 11, 12, 13, 14],\n", " [15, 16, 17, 18, 19],\n", " [20, 21, 22, 23, 24],\n", " [25, 26, 27, 28, 29],\n", " [30, 31, 32, 33, 
34],\n", " [35, 36, 
37, 38, 39],\n", " [40, 41, 42, 43, 44]])" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A = a.reshape(9, 5)\n", "\n", "A" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "collapsed": false, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "n, m = A.shape" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,\n", " 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,\n", " 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44]])" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "B = A.reshape((1,n*m))\n", "B" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q**: What is the difference (in terms of shape) between `B` and the original `a`?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Flattening\n", "\n", "Another (quite common) reshaping operation you will end up performing on n-dimensional arrays is **flattening**.\n", "\n", "Flattening means _collapsing all the axes into a single one_." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### `np.ravel`\n", "\n", "`numpy.ndarray` objects have a `ravel` method that returns the array as a `1D` vector. \n", "\n", "Again, whenever possible the original memory is left untouched and a _view_ (same data, different metadata) is returned; if the memory layout does not allow it, a copy is made." 
] }, { "cell_type": "code", "execution_count": 39, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array([1, 2, 3, 4, 5, 6])" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A = np.array([[1, 2, 3], [4, 5, 6]])\n", "A.ravel()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By default, `ravel` flattens the array _row-wise_, à la C (_row-major_ order). NumPy also supports a Fortran-style order of indices (i.e. _column-major_ order):" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1, 4, 2, 5, 3, 6])" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A.ravel('F') # order F (Fortran) is column-major, C (default) row-major" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "**Alternatively**, we can use the `flatten` method (`ndarray.flatten`) to turn a higher-dimensional array into a vector. Unlike `ravel`, however, `flatten` always returns a copy of the data." 
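] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One way to check whether a result is a view or a copy is `np.shares_memory`. Here is a minimal sketch on a small, C-ordered array:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "X = np.array([[1, 2, 3], [4, 5, 6]])\n", "\n", "print(np.shares_memory(X, X.ravel()))    # True: ravel returns a view of contiguous data\n", "print(np.shares_memory(X, X.flatten()))  # False: flatten always copies\n", "print(np.shares_memory(X, X.T.ravel()))  # False: reading X column by column is not a contiguous walk, so ravel must copy" 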
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Transpose\n", "\n", "Similarly, we can transpose a matrix (again, only the metadata changes):" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "collapsed": false, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "array([[1, 4],\n", " [2, 5],\n", " [3, 6]])" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A.T" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "collapsed": false, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "array([1, 4, 2, 5, 3, 6])" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A.T.ravel()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introducing `np.newaxis`\n", "\n", "In addition to the shape, we can also manipulate the axes of an array." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**(1)** We can add as many new axes (of length 1) as we want:" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(1, 10, 2)\n" ] } ], "source": [ "A = np.arange(20).reshape(10, 2)\n", "A = A[np.newaxis, ...] # ... (Ellipsis) selects all the remaining axes\n", "\n", "print(A.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**(2)** We can also _permute_ axes:" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(2, 10, 1)\n" ] } ], "source": [ "A = A.swapaxes(0, 2) # swap axis 0 with axis 2 --> new shape: (2, 10, 1)\n", "\n", "print(A.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Again, adding or permuting axes does not touch the memory: it only changes the parameters (i.e. `strides` and `offset`) used to navigate the data." 
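] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is a minimal sketch to see this in action. The `strides` (bytes to step along each axis) change, while `np.shares_memory` confirms that no data is copied. We fix `dtype=np.int64` so that the strides do not depend on the platform default integer." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "B = np.arange(20, dtype=np.int64).reshape(10, 2)\n", "S = B.swapaxes(0, 1)  # shape (2, 10)\n", "\n", "print(B.strides)  # (16, 8): 2 elements x 8 bytes to the next row, 8 bytes to the next column\n", "print(S.strides)  # (8, 16): the same two strides, swapped\n", "print(np.shares_memory(B, S))  # True: same buffer, different metadata" 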
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Numerical Types and Precision\n", "\n", "In NumPy, talking about `int` or `float` does not make \"real sense\". This is mainly for two reasons:\n", "\n", "(a) plain `int` or `float` map to a default, platform-dependent precision (typically `int64` and \n", "`float64`, respectively).\n", "\n", "(b) Different precisions imply different numerical ranges, and so different memory sizes (i.e. the _number of bytes_ required to represent all the numbers in the corresponding numerical range).\n", "\n", "NumPy supports the following numerical types:" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "| Type | Description |\n", "|---|---|\n", "| `bool` | Boolean (`True` or `False`), stored as a byte |\n", "| `int0` | Platform integer, normally either `int32` or `int64` (an alias of `intp`, deprecated in recent NumPy releases) |\n", "| `int8` | Integer ranging from -128 to 127 |\n", "| `int16` | Integer ranging from -32768 to 32767 |\n", "| `int32` | Integer ranging from -2 ** 31 to 2 ** 31 - 1 |\n", "| `int64` | Integer ranging from -2 ** 63 to 2 ** 63 - 1 |\n", "| `uint8` | Unsigned integer ranging from 0 to 255 |\n", "| `uint16` | Unsigned integer ranging from 0 to 65535 |\n", "| `uint32` | Unsigned integer ranging from 0 to 2 ** 32 - 1 |\n", "| `uint64` | Unsigned integer ranging from 0 to 2 ** 64 - 1 |\n", "| `float16` | Half-precision float: sign bit, 5-bit exponent, 10-bit mantissa |\n", "| `float32` | Single-precision float: sign bit, 8-bit exponent, 23-bit mantissa |\n", "| `float64` (or `float`) | Double-precision float: sign bit, 11-bit exponent, 52-bit mantissa |\n", "| `complex64` | Complex number represented by two 32-bit floats (real and imaginary components) |\n", "| `complex128` (or `complex`) | Complex number represented by two 64-bit floats (real and imaginary 
components) |\n" ] },
{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Numerical Types and Representation" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The **numerical dtype** of an array should be selected very carefully, as it directly affects the numerical representation of elements, that is: \n", "\n", " * the number of **bytes** used; \n", " * the *numerical range*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can **always specify** the `dtype` of an array when we create one. If we do not, the `dtype` will be inferred from the input, typically as `np.int_` (the default integer) or `np.float64`, depending on the case." ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0 1 2 3 4 5 6 7 8 9]\n", "int64\n" ] } ], "source": [ "a = np.arange(10)\n", "print(a)\n", "\n", "print(a.dtype)" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0 1 2 3 4 5 6 7 8 9]\n", "uint8\n" ] } ], "source": [ "au = np.arange(10, dtype=np.uint8)\n", "print(au)\n", "\n", "print(au.dtype)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "So, then: **What happens if I try to represent a number that is out of range?**\n", "\n", "Let's have a go with **integers**, e.g., `int8` and `uint8`:" ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array([0, 0, 0, 0], dtype=int8)" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = np.zeros(4, 'int8') # Integer ranging from -128 to 127\n", "x" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ">__Spoiler Alert__: _very simple example of 
indexing in NumPy_\n", ">\n", "> _Well...it works as expected, doesn't it?_" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "array([127, 0, 0, 0], dtype=int8)" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[0] = 127\n", "x" ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array([-128, 0, 0, 0], dtype=int8)" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[0] = 128\n", "x" ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array([-128, -127, 0, 0], dtype=int8)" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[1] = 129\n", "x" ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array([-128, -127, 1, 0], dtype=int8)" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[2] = 257 # i.e. 
(128 x 2) + 1\n", "x" ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "array([0, 0, 0, 0], dtype=uint8)" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ux = np.zeros(4, 'uint8') # Integer ranging from 0 to 255; the dtype can also be given as a string!\n", "ux" ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "array([255, 0, 1, 1], dtype=uint8)" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ux[0] = 255\n", "ux[1] = 256\n", "ux[2] = 257\n", "ux[3] = 513 # (256 x 2) + 1\n", "ux" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Machine Info and Supported Numerical Representation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "NumPy provides `np.iinfo` and `np.finfo` to inspect the machine limits of the supported integer and floating-point types, respectively:" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "iinfo(min=-2147483648, max=2147483647, dtype=int32)" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.iinfo(np.int32)" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "finfo(resolution=0.001, min=-6.55040e+04, max=6.55040e+04, dtype=float16)" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.finfo(np.float16)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In addition, the `MachAr` class provides information on the floating-point arithmetic of the current machine (note that `np.MachAr` has been deprecated since NumPy 1.22 and is not available in recent releases, where `np.finfo` should be used instead):" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [], "source": [ "machine_info = np.MachAr()" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, 
"outputs": [ { "data": { "text/plain": [ "2.220446049250313e-16" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "machine_info.epsilon" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1.7976931348623157e+308" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "machine_info.huge" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.finfo(np.float64).max == machine_info.huge" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# TRY THIS!\n", "\n", "help(machine_info)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Data Type Object" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "**Data type objects** are instances of the `numpy.dtype` class. \n", "\n", "Once again, arrays have a data type. \n", "
\n", "To be precise, *every element* in a NumPy array has the same data type. \n", "\n", "The data type object can tell you the size of each element in bytes.\n", "
\n", "(**Recall**: the size in bytes is given by the `itemsize` attribute, which is available both on the array and on its `dtype`.)" ] }, { "cell_type": "code", "execution_count": 60, "metadata": { "collapsed": false, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "a itemsize: 2\n", "a.dtype.itemsize: 2\n" ] } ], "source": [ "a = np.arange(7, dtype=np.uint16)\n", "print('a itemsize: ', a.itemsize)\n", "print('a.dtype.itemsize: ', a.dtype.itemsize)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Character Codes\n", "\n", "Character codes are included for backward compatibility with **Numeric**. \n", "
\n", "Numeric is the predecessor of NumPy. The use of character codes is not recommended, but they still pop up in several places. \n", "\n", "You should use **dtype** objects instead. \n", "\n", " integer i\n", " Unsigned integer u\n", " Single precision float f\n", " Double precision float d\n", " bool ?\n", " complex D\n", " string S\n", " unicode U" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### `dtype` constructors" ] }, { "cell_type": "code", "execution_count": 61, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "dtype('float64')" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.dtype(float)" ] }, { "cell_type": "code", "execution_count": 62, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "dtype('float32')" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.dtype('f')" ] }, { "cell_type": "code", "execution_count": 63, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "dtype('float64')" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.dtype('d')" ] }, { "cell_type": "code", "execution_count": 64, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "dtype('float64')" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.dtype('f8')" ] }, { "cell_type": "code", "execution_count": 65, "metadata": { "collapsed": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "dtype('