{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"# Indexing"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"#### Setting up the data\n",
"\n",
"Let's create the structures that will be used later in this notebook"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"np.random.seed(42) # Setting the random seed"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([0.37454012, 0.95071431, 0.73199394, 0.59865848, 0.15601864,\n",
" 0.15599452, 0.05808361, 0.86617615, 0.60111501, 0.70807258])"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# a vector: 10 random numbers drawn uniformly from [0, 1)\n",
"v = np.random.rand(10)\n",
"v"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([[0.02058449, 0.96990985],\n",
" [0.83244264, 0.21233911],\n",
" [0.18182497, 0.18340451],\n",
" [0.30424224, 0.52475643],\n",
" [0.43194502, 0.29122914],\n",
" [0.61185289, 0.13949386],\n",
" [0.29214465, 0.36636184],\n",
" [0.45606998, 0.78517596],\n",
" [0.19967378, 0.51423444],\n",
" [0.59241457, 0.04645041]])"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# a matrix: 10 rows and 2 columns of random numbers drawn uniformly from [0, 1)\n",
"M = np.random.rand(10, 2)\n",
"M"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"We can index elements in an array using square brackets and indices:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"0.3745401188473625"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# v is a vector, and has only one dimension, taking one index\n",
"v[0]"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"0.21233911067827616"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# M is a matrix, or a two-dimensional array, taking two indices\n",
"M[1,1]"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"If we omit an index of a multidimensional array, we get the whole row (or, in general, an array with N-1 dimensions):"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([0.83244264, 0.21233911])"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"M[1] "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"The same thing can be achieved by using `:` in place of an index:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([0.83244264, 0.21233911])"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"M[1,:] # row 1"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([0.96990985, 0.21233911, 0.18340451, 0.52475643, 0.29122914,\n",
" 0.13949386, 0.36636184, 0.78517596, 0.51423444, 0.04645041])"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"M[:,1] # column 1"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"We can assign new values to elements in an array using indexing:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"M[0,0] = 1"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([[1. , 0.96990985],\n",
" [0.83244264, 0.21233911],\n",
" [0.18182497, 0.18340451],\n",
" [0.30424224, 0.52475643],\n",
" [0.43194502, 0.29122914],\n",
" [0.61185289, 0.13949386],\n",
" [0.29214465, 0.36636184],\n",
" [0.45606998, 0.78517596],\n",
" [0.19967378, 0.51423444],\n",
" [0.59241457, 0.04645041]])"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"M"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"# also works for rows and columns\n",
"M[1,:] = 0\n",
"M[:,1] = -1"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 1. , -1. ],\n",
" [ 0. , -1. ],\n",
" [ 0.18182497, -1. ],\n",
" [ 0.30424224, -1. ],\n",
" [ 0.43194502, -1. ],\n",
" [ 0.61185289, -1. ],\n",
" [ 0.29214465, -1. ],\n",
" [ 0.45606998, -1. ],\n",
" [ 0.19967378, -1. ],\n",
" [ 0.59241457, -1. ]])"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"M"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Index slicing"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Index slicing is the technical name for the syntax `M[lower:upper:step]` to extract part of an array:"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([1, 2, 3, 4, 5])"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a = np.array([1,2,3,4,5])\n",
"a"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([2, 3])"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a[1:3]"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Array slices are **mutable**: if a new value is assigned to a slice, the original array from which the slice was extracted is modified:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([ 1, -2, -3, 4, 5])"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a[1:3] = [-2,-3]\n",
"\n",
"a"
]
},
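{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"The behaviour above follows from the fact that basic slicing returns a *view* on the original data rather than a copy. A minimal sketch, using the illustrative names `base` and `window` (assumptions, not used elsewhere in this notebook) so that the array `a` is left untouched:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"base = np.array([1, 2, 3, 4, 5])\n",
"window = base[1:3]   # basic slicing: a view on base, no data is copied\n",
"window[0] = -2       # writing through the view ...\n",
"\n",
"print(base)          # ... is visible in base as well\n",
"print(np.shares_memory(base, window))\n",
"```\n",
"\n",
"If an independent array is needed, the slice can be copied explicitly, for example with `base[1:3].copy()`."
]
},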
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"* We can omit any of the three parameters in `M[lower:upper:step]`:"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([ 1, -2, -3, 4, 5])"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a[::] # lower, upper, step all take the default values"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([ 1, -3, 5])"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a[::2] # step is 2, lower and upper default to the beginning and end of the array"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([ 1, -2, -3])"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a[:3] # first three elements"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([4, 5])"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a[3:] # elements from index 3"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"* Negative indices count from the end of the array (positive indices count from the beginning):"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"a = np.array([1,2,3,4,5])"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"5"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a[-1] # the last element in the array"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([3, 4, 5])"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a[-3:] # the last three elements"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"* Index slicing works exactly the same way for multidimensional arrays:"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 0, 1, 2, 3, 4],\n",
" [10, 11, 12, 13, 14],\n",
" [20, 21, 22, 23, 24],\n",
" [30, 31, 32, 33, 34],\n",
" [40, 41, 42, 43, 44]])"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"A = np.array([[n+m*10 for n in range(5)] \n",
" for m in range(5)])\n",
"A"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([[11, 12, 13],\n",
" [21, 22, 23],\n",
" [31, 32, 33]])"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# a block from the original array\n",
"A[1:4, 1:4]"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 0, 2, 4],\n",
" [20, 22, 24],\n",
" [40, 42, 44]])"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# strides\n",
"A[::2, ::2]"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Indexing and Array Memory Management"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Numpy arrays support two different ways of storing data in memory, namely\n",
"\n",
"* F-Contiguous \n",
"  - i.e. *column-wise* storage, Fortran-like\n",
"* C-Contiguous\n",
"  - i.e. *row-wise* storage, C-like\n",
"\n",
"The **storage** strategy is controlled by the `order` parameter of `np.array`."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"import numpy as np\n",
"FC = np.array([[1, 2, 3], [4, 5, 6], \n",
" [7, 8, 9], [10, 11, 12]], order='F')"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"CC = np.array([[1, 2, 3], [4, 5, 6], \n",
" [7, 8, 9], [10, 11, 12]], order='C')"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"* **Note**: the storage order does not change the meaning of indexing operations"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"2"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"FC[0, 1]"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"2"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"CC[0, 1]"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"(4, 3)"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"FC.shape"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"(4, 3)"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"CC.shape"
]
},
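{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"What does change is the layout in memory. One way to see this is to inspect the `flags` and `strides` attributes. The sketch below uses the illustrative names `c_arr` and `f_arr`, and fixes `dtype=np.int64` (an assumption made here so that the strides, which are measured in bytes, are the same on every platform):\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"data = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]\n",
"c_arr = np.array(data, dtype=np.int64, order='C')\n",
"f_arr = np.array(data, dtype=np.int64, order='F')\n",
"\n",
"# Same logical content, so indexing gives the same answers\n",
"print(np.array_equal(c_arr, f_arr))\n",
"\n",
"# Different physical layout\n",
"print(c_arr.flags['C_CONTIGUOUS'], f_arr.flags['F_CONTIGUOUS'])\n",
"\n",
"# Strides: number of bytes to skip to move one step along each axis\n",
"print(c_arr.strides)  # row-wise: the next row is 3 elements away\n",
"print(f_arr.strides)  # column-wise: the next column is 4 elements away\n",
"```\n",
"\n",
"With 8-byte integers and shape `(4, 3)`, the row-wise array should report strides `(24, 8)` and the column-wise one `(8, 32)`."
]
},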
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Fancy Indexing"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Fancy indexing is the name for when an array or list is used in place of an index:"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([[10, 11, 12, 13, 14],\n",
" [20, 21, 22, 23, 24],\n",
" [30, 31, 32, 33, 34]])"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"row_indices = [1, 2, 3]\n",
"A[row_indices]"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([11, 22, 34])"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"col_indices = [1, 2, -1] # remember, index -1 means the last element\n",
"A[row_indices, col_indices]"
]
},
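{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Note that passing two index lists pairs them up element by element, which is why the previous cell returns three values and not a `3 x 3` block. If the block is what we want, one option is `np.ix_`. A small sketch, rebuilding the same matrix under the illustrative name `grid`:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"grid = np.array([[n + m * 10 for n in range(5)] for m in range(5)])\n",
"rows = [1, 2, 3]\n",
"cols = [1, 2, -1]\n",
"\n",
"pairs = grid[rows, cols]          # elements (1, 1), (2, 2) and (3, -1)\n",
"block = grid[np.ix_(rows, cols)]  # every selected row crossed with every selected column\n",
"\n",
"print(pairs)\n",
"print(block)\n",
"```\n",
"\n",
"Unlike basic slicing, fancy indexing returns a copy of the selected data."
]
},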
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"* We can also index with **masks**: \n",
"\n",
"  - If the index mask is a Numpy array with data type `bool`, then each element is selected (`True`) or not (`False`) depending on the value of the mask at that element's position:"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([0, 1, 2, 3, 4])"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"b = np.array([n for n in range(5)])\n",
"b"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([0, 2])"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"row_mask = np.array([True, False, True, False, False])\n",
"b[row_mask]"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"* Alternatively:"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([0, 2])"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# same thing\n",
"row_mask = np.array([1,0,1,0,0], dtype=bool)\n",
"b[row_mask]"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"This feature is very useful to conditionally select elements from an array, using for example comparison operators:"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. , 5.5, 6. ,\n",
" 6.5, 7. , 7.5, 8. , 8.5, 9. , 9.5])"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x = np.arange(0, 10, 0.5)\n",
"x"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([False, False, False, False, False, False, False, False, False,\n",
" False, False, True, True, True, True, True, True, True,\n",
" True, True])"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mask = (5 < x)\n",
"\n",
"mask"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([5.5, 6. , 6.5, 7. , 7.5, 8. , 8.5, 9. , 9.5])"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x[mask]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Alternatively, we can use the condition (mask) expression directly within brackets to index the array:"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([5.5, 6. , 6.5, 7. , 7.5, 8. , 8.5, 9. , 9.5])"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x[(5 < x)]"
]
},
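{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Masks can also be combined. A brief sketch, assuming the element-wise operators `&` (and), `|` (or) and `~` (not); the variable names are illustrative only. The parentheses around each comparison are required because these operators bind more tightly than `<` and `>`:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"values = np.arange(0, 10, 0.5)\n",
"\n",
"inside = values[(values > 2) & (values < 4)]    # both conditions hold\n",
"outside = values[(values < 1) | (values >= 9)]  # at least one condition holds\n",
"not_big = values[~(values > 1)]                 # the condition does not hold\n",
"\n",
"print(inside)\n",
"print(outside)\n",
"print(not_big)\n",
"```\n",
"\n",
"The Python keywords `and`, `or` and `not` do not work element by element and raise an error when applied to arrays with more than one element."
]
},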
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Exercises on Indexing"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Ex 2.1\n",
"\n",
"Generate a three-dimensional array of any size containing random numbers taken from a uniform distribution (_guess the numpy function in `np.random`_). Then print out separately the first entry along each of the three axes (i.e. `x, y, z`) \n",
"\n",
"\n",
"* _hint_: Slicing with numpy arrays works quite like Python lists"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Ex 2.2\n",
"\n",
"Create a vector and print out elements in reverse order\n",
"\n",
"#### Hint: Use slicing for this exercise"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Ex 2.3\n",
"\n",
"Generate a $7 \\times 7$ matrix and replace all the elements in odd rows and even columns with `1`.\n",
"\n",
"#### Hint: Use slicing to solve this exercise!\n",
"\n",
"#### Note: Take a look at the original matrix afterwards."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Use fancy indexing** to get all the elements of the previous matrix that are equal to `1`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Ex 2.4 \n",
"\n",
"Generate a `10 x 10` matrix of numbers `A`. Then, generate a numpy array of integers in range `1-9`. Pick `5` random values (with no repetition) from this array and use these values to extract rows from the original matrix `A`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Ex 2.5 \n",
"\n",
"Repeat the previous exercise but this time extract columns from `A`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Ex 2.6\n",
"\n",
"Generate an array of numbers from `0` to `20` with step `0.5`. \n",
"Extract all the values greater than a randomly generated number in the same range.\n",
"\n",
"* _hint_: Try to write the condition as an expression and save it to a variable. Then, use this variable in square brackets to index.... this is when the magic happens!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Stacking and repeating arrays"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Using the functions `repeat`, `tile`, `vstack`, `hstack`, and `concatenate` we can create larger vectors and matrices from smaller ones:"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## `np.tile` and `np.repeat`"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"a = np.array([[1, 2], [3, 4]])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### `np.repeat`"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4])"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# repeat each element 3 times\n",
"np.repeat(a, 3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### `np.tile`"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([[1, 2, 1, 2, 1, 2],\n",
" [3, 4, 3, 4, 3, 4]])"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# tile the matrix 3 times \n",
"np.tile(a, 3)"
]
},
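{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Both functions accept further arguments that control the direction of the repetition. A short sketch, using the illustrative name `small` so that `a` is not modified: `np.repeat` takes an `axis` argument, and `np.tile` accepts a tuple with the number of repetitions per axis.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"small = np.array([[1, 2], [3, 4]])\n",
"\n",
"rows_repeated = np.repeat(small, 2, axis=0)  # each row repeated twice, shape stays 2-D\n",
"tiled_down = np.tile(small, (2, 1))          # whole matrix stacked twice vertically\n",
"\n",
"print(rows_repeated)\n",
"print(tiled_down)\n",
"```\n",
"\n",
"Without `axis`, `np.repeat` flattens its input first, which is why the earlier call returned a one-dimensional array."
]
},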
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## `np.concatenate`"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"b = np.array([[5, 6]])"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([[1, 2],\n",
" [3, 4],\n",
" [5, 6]])"
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.concatenate((a, b), axis=0)"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([[1, 2, 5],\n",
" [3, 4, 6]])"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.concatenate((a, b.T), axis=1)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## `np.hstack` and `np.vstack`"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([[1, 2],\n",
" [3, 4],\n",
" [5, 6]])"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.vstack((a,b))"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([[1, 2, 5],\n",
" [3, 4, 6]])"
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.hstack((a,b.T))"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Copy and \"deep copy\""
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"To achieve high performance, assignments in Python usually do not copy the underlying objects. \n",
"\n",
"This is important, for example, when objects are passed between functions, to avoid an excessive amount of memory copying when it is not necessary (technical term: **pass by reference**)."
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([[1, 2],\n",
" [3, 4]])"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"A = np.array([[1, 2], [3, 4]])\n",
"\n",
"A"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"# now B is referring to the same array data as A \n",
"B = A "
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([[10, 2],\n",
" [ 3, 4]])"
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# changing B affects A\n",
"B[0,0] = 10\n",
"\n",
"B"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([[10, 2],\n",
" [ 3, 4]])"
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"A"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"* If we want to **avoid** this behavior, so that we get a new, completely independent object `B` copied from `A`, then we need to do a so-called **deep copy** using the function `np.copy`:"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"B = np.copy(A)"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([[-5, 2],\n",
" [ 3, 4]])"
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# now, if we modify B, A is not affected\n",
"B[0,0] = -5\n",
"\n",
"B"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([[10, 2],\n",
" [ 3, 4]])"
]
},
"execution_count": 56,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"A"
]
},
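{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"When it is unclear whether two names refer to the same data, this can be checked rather than guessed. A minimal sketch with illustrative names (`original`, `alias`, `independent`), using the `is` operator and `np.shares_memory`:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"original = np.array([[1, 2], [3, 4]])\n",
"alias = original               # plain assignment: a second name for the same object\n",
"independent = original.copy()  # method form of np.copy: a new data buffer\n",
"\n",
"print(alias is original)                        # same object\n",
"print(np.shares_memory(original, independent))  # no shared data\n",
"\n",
"independent[0, 0] = -5\n",
"print(original[0, 0])                           # unchanged\n",
"```"
]
},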
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Iterating over array elements"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Generally, we want to avoid iterating over the elements of arrays whenever we can. The reason is that in an interpreted language like Python (or MATLAB), iterations are really slow compared to vectorized operations. \n",
"\n",
"However, sometimes iterations are unavoidable. For such cases, the Python `for` loop is the most convenient way to iterate over an array:"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1\n",
"2\n",
"3\n",
"4\n"
]
}
],
"source": [
"v = np.array([1,2,3,4])\n",
"\n",
"for element in v:\n",
"    print(element)"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"row [1 2]\n",
"1\n",
"2\n",
"row [3 4]\n",
"3\n",
"4\n"
]
}
],
"source": [
"M = np.array([[1,2], [3,4]])\n",
"\n",
"for row in M:\n",
"    print(\"row\", row)\n",
"\n",
"    for element in row:\n",
"        print(element)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"* When we need to iterate over each element of an array and modify its elements, it is convenient to use the `enumerate` function to obtain both the element and its index in the `for` loop: "
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"row_idx 0 row [1 2]\n",
"col_idx 0 element 1\n",
"col_idx 1 element 2\n",
"row_idx 1 row [3 4]\n",
"col_idx 0 element 3\n",
"col_idx 1 element 4\n"
]
}
],
"source": [
"for row_idx, row in enumerate(M):\n",
"    print(\"row_idx\", row_idx, \"row\", row)\n",
"\n",
"    for col_idx, element in enumerate(row):\n",
"        print(\"col_idx\", col_idx, \"element\", element)\n",
"\n",
"        # update the matrix M: square each element\n",
"        M[row_idx, col_idx] = element ** 2"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 1, 4],\n",
" [ 9, 16]])"
]
},
"execution_count": 60,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# each element in M is now squared\n",
"M"
]
},
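{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"For this particular update the loop is not actually needed. The sketch below, with the illustrative names `grid`, `looped` and `vectorized`, compares the nested loop with the equivalent vectorized expression:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"grid = np.array([[1, 2], [3, 4]])\n",
"\n",
"# Element-by-element version, as in the loop above\n",
"looped = grid.copy()\n",
"for row_idx, row in enumerate(looped):\n",
"    for col_idx, element in enumerate(row):\n",
"        looped[row_idx, col_idx] = element ** 2\n",
"\n",
"# Vectorized version: one expression, no Python-level loop\n",
"vectorized = grid ** 2\n",
"\n",
"print(np.array_equal(looped, vectorized))\n",
"```\n",
"\n",
"To update the array in place instead of creating a new one, `grid **= 2` could be used."
]
},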
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Vectorizing functions"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"As mentioned several times by now, to get good performance we should try to avoid looping over elements in our vectors and matrices, and instead use vectorized algorithms. The first step in converting a scalar algorithm to a vectorized algorithm is to make sure that the functions we write work with vector inputs."
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"def Theta(x):\n",
"    \"\"\"\n",
"    Scalar implementation of the Heaviside step function.\n",
"    \"\"\"\n",
"    if x >= 0:\n",
"        return 1\n",
"    else:\n",
"        return 0"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"ename": "ValueError",
"evalue": "The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()",
"output_type": "error",
"traceback": [
"ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()"
]
}
],
"source": [
"Theta(np.array([-3,-2,-1,0,1,2,3]))"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"OK, that didn't work because we didn't write the `Theta` function so that it can handle vector input...\n",
"\n",
"To get a vectorized version of `Theta` we can use the NumPy function `np.vectorize`. Keep in mind that it is provided mainly for convenience (internally it still loops over the elements in Python), but in many cases it can vectorize a function automatically:"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### `np.vectorize`"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"Theta_vec = np.vectorize(Theta)"
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([0, 0, 0, 1, 1, 1, 1])"
]
},
"execution_count": 65,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"Theta_vec(np.array([-3,-2,-1,0,1,2,3]))"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### `np.frompyfunc`"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
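"**Universal functions** (ufuncs) work on arrays, element-by-element, or on scalars. \n",
"\n",
"A ufunc wraps a function that takes a fixed number of scalar inputs and produces a fixed number of scalar outputs. `np.frompyfunc(func, nin, nout)` builds such a ufunc from a Python function with `nin` arguments and `nout` return values; the arrays it returns have `dtype=object`."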
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Result: [1 1 1 1]\n"
]
}
],
"source": [
"Theta_ufunc = np.frompyfunc(Theta, 1, 1)\n",
"print(\"Result: \", Theta_ufunc(np.arange(4)))"
]
},
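{
"cell_type": "markdown",
"metadata": {},
"source": [
"A small sketch of how the two wrappers differ (it re-defines `Theta` so the cell runs on its own): `np.vectorize` produces a numeric array, here forced to integers through the optional `otypes` argument, whereas `np.frompyfunc` returns an array of `dtype=object`. Neither removes the Python-level loop, so both are best seen as conveniences rather than optimizations. The names `theta_vec` and `theta_ufunc` are illustrative only."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def Theta(x):\n",
"    \"\"\"Scalar Heaviside step function (same definition as above).\"\"\"\n",
"    return 1 if x >= 0 else 0\n",
"\n",
"x = np.array([-3, -2, -1, 0, 1, 2, 3])\n",
"\n",
"theta_vec = np.vectorize(Theta, otypes=[int])  # explicit integer output\n",
"theta_ufunc = np.frompyfunc(Theta, 1, 1)       # 1 input, 1 output\n",
"\n",
"print(theta_vec(x).dtype)    # an integer dtype\n",
"print(theta_ufunc(x).dtype)  # object\n",
"\n",
"# Convert the object array to a numeric one when needed\n",
"print(theta_ufunc(x).astype(int))"
]
},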
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Exercise"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Avoiding Vectorize\n",
"\n",
"* Implement the function to accept vector input from the beginning \n",
" - This requires \"more effort\" but might give better performance"
]
},
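{
"cell_type": "markdown",
"metadata": {},
"source": [
"One possible solution, given only as a hedged sketch (try the exercise yourself first): `np.where` evaluates the condition on the whole array at once, so no Python-level loop is involved. The name `Theta_np` is hypothetical and not used elsewhere in this notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def Theta_np(x):\n",
"    \"\"\"Heaviside step function that accepts scalars and arrays.\"\"\"\n",
"    x = np.asarray(x)\n",
"    # The comparison yields booleans; np.where maps True -> 1 and False -> 0\n",
"    return np.where(x >= 0, 1, 0)\n",
"\n",
"print(Theta_np(np.array([-3, -2, -1, 0, 1, 2, 3])))\n",
"print(Theta_np(-1.5), Theta_np(2))"
]
},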
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Vectorisation and Broadcasting"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"**Vectorizing** code is the key to writing efficient numerical calculations with Python/NumPy. \n",
"\n",
"That means that as much as possible of a program should be formulated in terms of matrix and vector operations, like **matrix-matrix multiplication**."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"However, we sometimes need to perform operations between `ndarray` objects of different sizes and shapes, and **we might be tempted to perform them** explicitly (i.e. with an _explicit for loop_).\n",
"\n",
"Although this gives correct results (assuming we iterate over the arrays correctly), it produces **very inefficient** code.\n",
"\n",
"For this reason, NumPy has a very important concept called **Broadcasting**.\n",
"\n",
"_Broadcasting_ is NumPy's terminology for performing mathematical operations between arrays with different shapes (_scalars_ are treated as `0-dim` arrays)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Working example\n",
"\n",
"Assume we have some data in a matrix `D`:"
]
},
{
"cell_type": "code",
"execution_count": 72,
"metadata": {},
"outputs": [],
"source": [
"D = np.array([ [0.3, 2.5, 3.5],\n",
" [2.9, 27.5, 0],\n",
" [0.4, 1.3, 23.9],\n",
" [14.4, 6, 2.3]])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We also have a vector `adj` of adjusting factors that we want to apply to each sample (row) of data in `D`:"
]
},
{
"cell_type": "code",
"execution_count": 73,
"metadata": {},
"outputs": [],
"source": [
"adj = np.array([9, 4, 4])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Naive Solution"
]
},
{
"cell_type": "code",
"execution_count": 79,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"12.5 µs ± 375 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n"
]
}
],
"source": [
"%%timeit\n",
"\n",
"# Create a new array filled with zeros, of the same shape as D.\n",
"result = np.zeros_like(D)\n",
"\n",
"# Now multiply each row of D by adj. In NumPy, `*` is\n",
"# element-wise multiplication between two arrays.\n",
"for i in range(D.shape[0]):\n",
" result[i, :] = D[i, :] * adj\n",
" \n",
"result"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is a reasonable approach when coding in a low-level programming language: allocate the output, loop over the input performing some operation, write the result into the output. \n",
"\n",
"In NumPy, however, this is fairly bad for performance because the looping is done in (slow) Python code instead of internally by NumPy in (fast) C code.\n",
"\n",
"(We already saw this when we compared `ndarray` vs `list`.)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Using `np.tile`\n",
"\n",
"Idea: use the `np.tile` function to replicate `adj` once for each\n",
"row of `D` (i.e. `D.shape[0]` times)."
]
},
{
"cell_type": "code",
"execution_count": 80,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"8.78 µs ± 1.31 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n"
]
}
],
"source": [
"%%timeit \n",
"\n",
"adj_stretch = np.tile(adj, (D.shape[0], 1))\n",
"\n",
"D * adj_stretch"
]
},
{
"cell_type": "code",
"execution_count": 81,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[9, 4, 4],\n",
" [9, 4, 4],\n",
" [9, 4, 4],\n",
" [9, 4, 4]])"
]
},
"execution_count": 81,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
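"# Variables assigned inside a %%timeit cell are not kept, so rebuild it here\n",
"adj_stretch = np.tile(adj, (D.shape[0], 1))\n",
"adj_stretch"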
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Nice: it's shorter too, and slightly faster! **To better appreciate** the performance gain of our `np.tile` solution, we can increase the size of `D`:"
]
},
{
"cell_type": "code",
"execution_count": 84,
"metadata": {},
"outputs": [],
"source": [
"D_large = np.random.rand(10**6, 10)\n",
"adj_large = np.random.rand(10)"
]
},
{
"cell_type": "code",
"execution_count": 85,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"((1000000, 10), (10,))"
]
},
"execution_count": 85,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"D_large.shape, adj_large.shape"
]
},
{
"cell_type": "code",
"execution_count": 86,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1.48 s ± 126 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
]
}
],
"source": [
"%%timeit\n",
"\n",
"# Create a new array filled with zeros, of the same shape as D_large.\n",
"result_large = np.zeros_like(D_large)\n",
"\n",
"# Now multiply each row of D_large by adj_large. In NumPy, `*` is\n",
"# element-wise multiplication between two arrays.\n",
"for i in range(D_large.shape[0]):\n",
" result_large[i, :] = D_large[i, :] * adj_large"
]
},
{
"cell_type": "code",
"execution_count": 87,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"48.6 ms ± 1.14 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
]
}
],
"source": [
"%%timeit \n",
"\n",
"adj_large_stretch = np.tile(adj_large, (D_large.shape[0], 1))\n",
"\n",
"D_large * adj_large_stretch"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The loop-in-Python method takes `~1.5` seconds, the stretching method takes `~48` milliseconds: a `~30x` speedup."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And now, finally, comes the interesting part. \n",
"\n",
"You see, the operation we just did - stretching one array so that its shape matches that of another and then applying some element-wise operation between them - is actually pretty common. \n",
"\n",
"This often happens when we want to take a lower-dimensional array and use it to perform a computation along some axis of a higher-dimensional array. \n",
"\n",
"In fact, when taken to the extreme this is exactly what happens when we perform an operation between an array and a scalar - the scalar is stretched across the whole array so that the element-wise operation gets the same scalar value for each element it computes.\n",
"\n",
"Numpy generalizes this concept into **broadcasting** - a set of rules that permit element-wise computations between arrays of different shapes, as long as some constraints apply. \n",
"\n",
"Incidentally, this lets Numpy achieve two separate goals - **usefulness as well as more consistent and general semantics**. \n",
"\n",
"Binary operators like `*` are element-wise, but what happens when we apply them between arrays of different shapes? Should it work or should it be rejected? If it works, how should it work? \n",
"\n",
"**Broadcasting defines the semantics of these operations**."
]
},
{
"cell_type": "code",
"execution_count": 89,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1.79 µs ± 103 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)\n"
]
}
],
"source": [
"%%timeit\n",
"## Back to our example\n",
"D * adj # Broadcasting"
]
},
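{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check, the loop, the `np.tile` version, and the broadcast version should all produce the same array. This is a minimal sketch that re-creates `D` and `adj` so the cell is self-contained; the names `looped`, `tiled`, and `broadcasted` are illustrative only."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"D = np.array([[0.3, 2.5, 3.5],\n",
"              [2.9, 27.5, 0],\n",
"              [0.4, 1.3, 23.9],\n",
"              [14.4, 6, 2.3]])\n",
"adj = np.array([9, 4, 4])\n",
"\n",
"# 1) explicit Python loop over the rows\n",
"looped = np.zeros_like(D)\n",
"for i in range(D.shape[0]):\n",
"    looped[i, :] = D[i, :] * adj\n",
"\n",
"# 2) stretch adj with np.tile, then multiply element-wise\n",
"tiled = D * np.tile(adj, (D.shape[0], 1))\n",
"\n",
"# 3) let NumPy broadcast adj across the rows\n",
"broadcasted = D * adj\n",
"\n",
"print(np.allclose(looped, tiled), np.allclose(looped, broadcasted))"
]
},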
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### How Broadcasting works"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Element-wise operations on arrays are only valid when the arrays' shapes are either equal or **compatible**. \n",
"\n",
"The equal shapes case is trivial - this is the stretched array from the example above. What does **compatible** mean, though?\n",
"\n",
"To determine if two shapes are compatible, Numpy compares their dimensions, starting with the trailing ones and working its way backwards. \n",
"\n",
"**For example**, for the shape `(4, 3, 2)` the trailing dimension is `2`, and working from `2` \"backwards\" produces: `2`, then `3`, then `4`.\n",
"\n",
"If two dimensions are equal, or if one of them equals 1, the comparison continues. Otherwise, a `ValueError` is raised (saying something like `\"operands could not be broadcast together with shapes ...\"`).\n",
"\n",
"When one of the shapes runs out of dimensions (because it has fewer dimensions than the other shape), NumPy uses `1` in the comparison process until the other shape's dimensions run out as well.\n",
"\n",
"Once NumPy determines that two shapes are compatible, the shape of the result is simply the **maximum of the two shapes' sizes** in each dimension."
]
},
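{
"cell_type": "markdown",
"metadata": {},
"source": [
"The rules above can be sketched in a few lines of plain Python. The helper `broadcast_shape` below is hypothetical (it is not part of NumPy); its result is printed next to the shape reported by `np.broadcast`, which applies NumPy's own rules, so the two can be compared."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def broadcast_shape(shape_a, shape_b):\n",
"    \"\"\"Return the broadcast shape of two shapes, or raise ValueError.\"\"\"\n",
"    result = []\n",
"    # Walk both shapes from the trailing dimension backwards\n",
"    for i in range(1, max(len(shape_a), len(shape_b)) + 1):\n",
"        # A shape that has run out of dimensions contributes a 1\n",
"        a = shape_a[-i] if i <= len(shape_a) else 1\n",
"        b = shape_b[-i] if i <= len(shape_b) else 1\n",
"        if a != b and a != 1 and b != 1:\n",
"            raise ValueError(f'incompatible shapes {shape_a} and {shape_b}')\n",
"        # A dimension of size 1 is stretched to match the other one\n",
"        result.append(b if a == 1 else a)\n",
"    return tuple(reversed(result))\n",
"\n",
"for sa, sb in [((4, 3), (3,)), ((4, 3, 2), (3, 1)), ((5, 1), (1, 6))]:\n",
"    numpy_shape = np.broadcast(np.empty(sa), np.empty(sb)).shape\n",
"    print(sa, sb, '->', broadcast_shape(sa, sb), numpy_shape)"
]
},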
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**More here**: [Broadcasting Arrays in NumPy](https://eli.thegreenplace.net/2015/broadcasting-arrays-in-numpy/#id8)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Scalar-array operations"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"We can use the usual arithmetic operators to multiply, add, subtract, and divide arrays with scalar numbers."
]
},
{
"cell_type": "code",
"execution_count": 90,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"v1 = np.arange(0, 5)"
]
},
{
"cell_type": "code",
"execution_count": 91,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([0, 2, 4, 6, 8])"
]
},
"execution_count": 91,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"v1 * 2"
]
},
{
"cell_type": "code",
"execution_count": 92,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([2, 3, 4, 5, 6])"
]
},
"execution_count": 92,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"v1 + 2"
]
},
{
"cell_type": "code",
"execution_count": 93,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"A = np.array([[n+m*10 for n in range(5)] for m in range(5)])"
]
},
{
"cell_type": "code",
"execution_count": 94,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"A * 2: \n",
" [[ 0 2 4 6 8]\n",
" [20 22 24 26 28]\n",
" [40 42 44 46 48]\n",
" [60 62 64 66 68]\n",
" [80 82 84 86 88]]\n",
"A + 2: \n",
" [[ 2 3 4 5 6]\n",
" [12 13 14 15 16]\n",
" [22 23 24 25 26]\n",
" [32 33 34 35 36]\n",
" [42 43 44 45 46]]\n"
]
}
],
"source": [
"print('A * 2: ', '\\n', A * 2)\n",
"print('A + 2: ', '\\n', A + 2)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Element-wise array-array operations"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"When we add, subtract, multiply and divide arrays with each other, the default behaviour is **element-wise** operations:"
]
},
{
"cell_type": "code",
"execution_count": 95,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 0, 1, 4, 9, 16],\n",
" [ 100, 121, 144, 169, 196],\n",
" [ 400, 441, 484, 529, 576],\n",
" [ 900, 961, 1024, 1089, 1156],\n",
" [1600, 1681, 1764, 1849, 1936]])"
]
},
"execution_count": 95,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"A * A # element-wise multiplication"
]
},
{
"cell_type": "code",
"execution_count": 96,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([ 0, 1, 4, 9, 16])"
]
},
"execution_count": 96,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"v1 * v1"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"If we multiply arrays with different but compatible shapes, the smaller array is broadcast: here each row of `A` is multiplied element-wise by `v1`:"
]
},
{
"cell_type": "code",
"execution_count": 97,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"((5, 5), (5,))"
]
},
"execution_count": 97,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"A.shape, v1.shape"
]
},
{
"cell_type": "code",
"execution_count": 99,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 0, 1, 4, 9, 16],\n",
" [ 0, 11, 24, 39, 56],\n",
" [ 0, 21, 44, 69, 96],\n",
" [ 0, 31, 64, 99, 136],\n",
" [ 0, 41, 84, 129, 176]])"
]
},
"execution_count": 99,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"A * v1 #Broadcasting"
]
},
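{
"cell_type": "markdown",
"metadata": {},
"source": [
"`A * v1` multiplies column `j` of `A` by `v1[j]`, because `v1` (shape `(5,)`) is aligned with the trailing axis. To multiply row `i` by `v1[i]` instead, one option is to give `v1` a trailing axis of length 1 so that it has shape `(5, 1)`. A minimal sketch, re-creating `A` and `v1` so the cell runs on its own (the names `col_vec` and `scaled_rows` are illustrative only):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"A = np.array([[n + m * 10 for n in range(5)] for m in range(5)])\n",
"v1 = np.arange(0, 5)\n",
"\n",
"col_vec = v1[:, np.newaxis]  # shape (5, 1)\n",
"\n",
"# (5, 5) * (5, 1): row i of A is multiplied by v1[i]\n",
"scaled_rows = A * col_vec\n",
"\n",
"print(col_vec.shape)\n",
"print(scaled_rows)"
]
},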
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Matrix algebra"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"What about **matrix multiplication**? \n",
"\n",
"There are two ways. \n",
"\n",
"We can either use the `np.dot` function, which applies a **matrix-matrix**, **matrix-vector**, or **inner vector multiplication** to its two arguments: "
]
},
{
"cell_type": "code",
"execution_count": 100,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 300, 310, 320, 330, 340],\n",
" [1300, 1360, 1420, 1480, 1540],\n",
" [2300, 2410, 2520, 2630, 2740],\n",
" [3300, 3460, 3620, 3780, 3940],\n",
" [4300, 4510, 4720, 4930, 5140]])"
]
},
"execution_count": 100,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.dot(A, A)"
]
},
{
"cell_type": "code",
"execution_count": 101,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([ 30, 130, 230, 330, 430])"
]
},
"execution_count": 101,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.dot(A, v1)"
]
},
{
"cell_type": "code",
"execution_count": 102,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"30"
]
},
"execution_count": 102,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.dot(v1, v1)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## A *new* dedicated infix operator for matrix multiplication\n",
"\n",
"**Python 3.5** (released in **September 2015**) introduced a new binary operator for matrix multiplication, called `@`. (Mnemonic: `@` is `*` for `mATrices`.)\n",
"\n",
"Some useful hints:\n",
"\n",
"- [PEP 465 description](https://www.python.org/dev/peps/pep-0465/#abstract)\n",
"- [Motivations](https://www.python.org/dev/peps/pep-0465/#motivation)\n",
"- [Rationale for Specification Details](https://www.python.org/dev/peps/pep-0465/#rationale-for-specification-details)"
]
},
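{
"cell_type": "markdown",
"metadata": {},
"source": [
"For 1-D and 2-D arrays, `@` gives the same results as `np.dot`. A short check, re-creating `A` and `v1` so the cell is self-contained:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"A = np.array([[n + m * 10 for n in range(5)] for m in range(5)])\n",
"v1 = np.arange(0, 5)\n",
"\n",
"print(np.array_equal(A @ A, np.dot(A, A)))    # matrix-matrix\n",
"print(np.array_equal(A @ v1, np.dot(A, v1)))  # matrix-vector\n",
"print(v1 @ v1 == np.dot(v1, v1))              # inner product\n",
"\n",
"# Unlike np.dot, `@` does not accept scalar operands; use `*` for scaling"
]
},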
{
"cell_type": "code",
"execution_count": 103,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 30, 130, 230, 330, 430])"
]
},
"execution_count": 103,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"A @ v1"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.7 (NumPy EuroSciPy)",
"language": "python",
"name": "numpy-euroscipy"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}