{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# What is Numpy"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"NumPy is the fundamental package for scientific computing with Python. \n",
"It is a package that provides high-performance vector, matrix and higher-dimensional data structures for Python. \n",
"It is implemented in C and Fortran, so when calculations are **vectorized**, performance is very good.\n",
"\n",
"So, in a nutshell:\n",
"\n",
"* a powerful Python extension for N-dimensional arrays\n",
"* a tool for integrating C/C++ and Fortran code\n",
"* designed for scientific computation: linear algebra and signal analysis\n",
"\n",
"If you are a MATLAB® user, we recommend reading [Numpy for MATLAB Users](http://www.scipy.org/NumPy_for_Matlab_Users) and [Benefit of Open Source Python versus commercial packages](http://www.scipy.org/NumPyProConPage). \n",
"\n",
"I'm a supporter of the **Open Science Movement**, so I humbly suggest you take a look at the [Science Code Manifesto](http://sciencecodemanifesto.org/)."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Getting Started with Numpy Arrays"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"NumPy's main object is the **homogeneous** ***multidimensional array***. It is a table of elements (usually numbers), all of the same type. \n",
"\n",
"In NumPy, dimensions are called **axes**. \n",
"\n",
"The number of axes is called **rank**. \n",
"\n",
"The most important attributes of an ndarray object are:\n",
"\n",
"* **ndarray.ndim** - the number of axes (dimensions) of the array. \n",
"* **ndarray.shape** - the dimensions of the array. For a matrix with n rows and m columns, shape will be (n,m). \n",
"* **ndarray.size** - the total number of elements of the array. \n",
"* **ndarray.dtype** - an object describing the type of the elements in the array; numpy.int32, numpy.int16, and numpy.float64 are some examples. \n",
"* **ndarray.itemsize** - the size in bytes of each element of the array. For example, elements of type float64 have itemsize 8 (=64/8). "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"To use `numpy`, you need to import the module, for example:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"import numpy as np # naming import convention"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Terminology Assumption"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"source": [
"In the `numpy` package the terminology used for vectors, matrices and higher-dimensional data sets is *array*. "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Reference Documentation"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"* On the web: [http://docs.scipy.org/](http://docs.scipy.org/)\n",
"\n",
"* Interactive help:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"np.array?"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"If you're looking for something, you can search the docstrings by keyword (note that `np.lookfor` was removed in NumPy 2.0):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"np.lookfor('create array')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"np.con*?"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"#### Help is your friend"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Whenever in doubt, there is the `help` function to the rescue"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"scrolled": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"# For example, try \n",
"help(np.ndarray)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Numpy Array Object"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"`NumPy` has a multidimensional array object called ndarray. It consists of two parts as follows:\n",
" \n",
" * The actual data\n",
" * Some metadata describing the data\n",
" \n",
" \n",
"Many array operations, such as reshaping, transposing and slicing, leave the raw data untouched. The only aspect that changes is the metadata."
]
},
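{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"As a minimal sketch of the data/metadata split (the variable names `a` and `b` are illustrative, not part of the original material), reshaping an array typically returns a *view*: a new `ndarray` with different metadata that points to the same underlying data.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"a = np.arange(6)          # one data buffer holding 0..5\n",
"b = a.reshape(2, 3)       # new shape metadata, same buffer\n",
"\n",
"print(a.shape, b.shape)\n",
"print(np.shares_memory(a, b))\n",
"\n",
"b[0, 0] = 99              # writing through the view...\n",
"print(a[0])               # ...is visible in the original array\n",
"```\n",
"\n",
"Not every operation behaves this way; arithmetic, for instance, allocates a new data buffer for its result."
]
},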
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Creating `numpy` arrays"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"There are a number of ways to initialize new numpy arrays, for example:\n",
"\n",
"* from Python lists or tuples\n",
"* using functions that are dedicated to generating numpy arrays, such as `arange`, `linspace`, etc."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### From lists"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"For example, to create new vector and matrix arrays from Python lists we can use the `numpy.array` function."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([1, 2, 3, 4])"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"v = np.array([1,2,3,4])\n",
"v"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([[1, 2],\n",
" [3, 4]])"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"M = np.array([[1, 2], [3, 4]])\n",
"M"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"The `v` and `M` objects are both of the type `ndarray` that the `numpy` module provides."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Type of v:  <class 'numpy.ndarray'>\n",
"Type of M:  <class 'numpy.ndarray'>\n"
]
}
],
"source": [
"print('Type of v: ', type(v))\n",
"print('Type of M: ', type(M))"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"The only difference between the `v` and `M` arrays is their shape. \n",
"\n",
"To inspect the shape of an array, we can use the `numpy.shape` function:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Shape of v:  (4,)\n",
"Shape of M:  (2, 2)\n"
]
}
],
"source": [
"print('Shape of v: ', np.shape(v))\n",
"print('Shape of M: ', np.shape(M))"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Alternatively, we can get information about the shape of an array by using the `ndarray.shape` **property**:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"((4,), (2, 2))"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"v.shape, M.shape"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Similarly, we can get information about the **size** of the two `ndarrays`, namely the *total number of elements* in each array."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Size of v: 4\n",
"Size of M: 4\n"
]
}
],
"source": [
"print('Size of v:', v.size)\n",
"print('Size of M:', M.size)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"#### More properties of the `numpy array`"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"8"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"M.itemsize # bytes per element"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"32"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"M.nbytes # number of bytes"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"2"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"M.ndim # number of dimensions"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Using array-generating functions"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"For larger arrays it is impractical to initialize the data manually, using explicit Python lists. \n",
"\n",
"Instead we can use one of the many **functions** in `numpy` that generate arrays of different forms. \n",
"\n",
"Some of the more common are: \n",
"\n",
"* `np.arange`; \n",
"* `np.linspace`; \n",
"* `np.logspace`; \n",
"* `np.mgrid`;\n",
"* `np.random.rand`;\n",
"* `np.diag`;\n",
"* `np.zeros`;\n",
"* `np.ones`;\n",
"* `np.empty`;\n",
"* `np.tile`."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### `np.arange`"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0 1 2 3 4 5 6 7 8 9]\n"
]
}
],
"source": [
"x = np.arange(0, 10, 1) \n",
"print(x)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[ -1.00000000e+00 -9.00000000e-01 -8.00000000e-01 -7.00000000e-01\n",
" -6.00000000e-01 -5.00000000e-01 -4.00000000e-01 -3.00000000e-01\n",
" -2.00000000e-01 -1.00000000e-01 -2.22044605e-16 1.00000000e-01\n",
" 2.00000000e-01 3.00000000e-01 4.00000000e-01 5.00000000e-01\n",
" 6.00000000e-01 7.00000000e-01 8.00000000e-01 9.00000000e-01]\n"
]
}
],
"source": [
"# floating point step-wise range generation\n",
"x = np.arange(-1, 1, 0.1) \n",
"print(x)"
]
},
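{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"The `-2.22044605e-16` in the output above, where an exact zero might be expected, comes from floating point round-off with the non-integer step. As a hedged alternative (a sketch, assuming the number of points is known in advance), `np.linspace` with `endpoint=False` builds the same grid from a point count instead of a step:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# 20 evenly spaced points starting at -1, with the end point 1 excluded\n",
"x = np.linspace(-1, 1, 20, endpoint=False)\n",
"print(x)\n",
"\n",
"# same values as np.arange(-1, 1, 0.1), up to round-off\n",
"print(np.allclose(x, np.arange(-1, 1, 0.1)))\n",
"```\n",
"\n",
"Specifying the number of points also removes any doubt about how many elements the result has."
]
},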
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### `np.linspace` and `np.logspace`"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([ 0. , 0.41666667, 0.83333333, 1.25 ,\n",
" 1.66666667, 2.08333333, 2.5 , 2.91666667,\n",
" 3.33333333, 3.75 , 4.16666667, 4.58333333,\n",
" 5. , 5.41666667, 5.83333333, 6.25 ,\n",
" 6.66666667, 7.08333333, 7.5 , 7.91666667,\n",
" 8.33333333, 8.75 , 9.16666667, 9.58333333, 10. ])"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# using linspace, both end points **ARE included**\n",
"np.linspace(0, 10, 25)"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([ 1.00000000e+00, 2.27278564e+00, 5.16555456e+00,\n",
" 1.17401982e+01, 2.66829540e+01, 6.06446346e+01,\n",
" 1.37832255e+02, 3.13263169e+02, 7.11980032e+02,\n",
" 1.61817799e+03])"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.logspace(0, np.e**2, 10, base=np.e)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### `np.mgrid`"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"x, y = np.mgrid[0:5, 0:5] # similar to meshgrid in MATLAB"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([[0, 0, 0, 0, 0],\n",
" [1, 1, 1, 1, 1],\n",
" [2, 2, 2, 2, 2],\n",
" [3, 3, 3, 3, 3],\n",
" [4, 4, 4, 4, 4]])"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([[0, 1, 2, 3, 4],\n",
" [0, 1, 2, 3, 4],\n",
" [0, 1, 2, 3, 4],\n",
" [0, 1, 2, 3, 4],\n",
" [0, 1, 2, 3, 4]])"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### `np.random.rand` & `np.random.randn`"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 0.33658948, 0.28564552, 0.73183017, 0.7395105 , 0.66427382],\n",
" [ 0.25942094, 0.43844615, 0.48250402, 0.24063916, 0.90171053],\n",
" [ 0.51114245, 0.49587249, 0.61832302, 0.71996951, 0.22064571],\n",
" [ 0.38625609, 0.44313367, 0.74975323, 0.57600147, 0.80771956],\n",
" [ 0.84511666, 0.6064582 , 0.62365173, 0.62766319, 0.80129396]])"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# uniform random numbers in [0, 1)\n",
"np.random.rand(5,5)"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 0.65782724, 0.65168367, 0.58525852, 0.33781734, -0.00700978],\n",
" [ 0.61574011, 0.59150639, -0.33797592, -0.2509655 , 0.77237429],\n",
" [-0.15693266, -0.38377945, -0.28140147, 0.90558314, 0.25437408],\n",
" [-1.136108 , 2.43964939, 0.28583627, -0.27540796, -0.57253111],\n",
" [-0.79080395, 0.50525127, 2.1113386 , -0.33769711, -0.64914575]])"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# standard normal distributed random numbers\n",
"np.random.randn(5,5)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### `np.diag`"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([[1, 0, 0],\n",
" [0, 2, 0],\n",
" [0, 0, 3]])"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# a diagonal matrix\n",
"np.diag([1,2,3])"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([[0, 1, 0, 0],\n",
" [0, 0, 2, 0],\n",
" [0, 0, 0, 3],\n",
" [0, 0, 0, 0]])"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# diagonal with offset from the main diagonal\n",
"np.diag([1,2,3], k=1) "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### `np.eye`"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 1., 0., 0.],\n",
" [ 0., 1., 0.],\n",
" [ 0., 0., 1.]])"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# a diagonal matrix with ones on the main diagonal\n",
"np.eye(3) # 3 is the number of rows (and columns)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### `np.zeros` and `np.ones`"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 0., 0., 0.],\n",
" [ 0., 0., 0.],\n",
" [ 0., 0., 0.]])"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.zeros((3,3))"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 1., 1., 1.],\n",
" [ 1., 1., 1.],\n",
" [ 1., 1., 1.]])"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.ones((3, 3))"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### DIY"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"***Try by yourself*** the following commands:\n",
"\n",
" np.zeros((3,4))\n",
" np.ones((3,4))\n",
" np.empty((2,3))\n",
" np.eye(5)\n",
" np.diag(np.arange(5))\n",
" np.tile(np.array([[6, 7], [8, 9]]), (2, 2))"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## So, why is it useful then?"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"So far the `numpy.ndarray` looks awfully much like a Python **list** (or **nested list**). \n",
"\n",
"*Why not simply use Python lists for computations instead of creating a new array type?*"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"There are several reasons:\n",
"\n",
"* Python lists are very general. \n",
" - They can contain any kind of object. \n",
" - They are dynamically typed. \n",
" - They do not support mathematical functions such as matrix and dot multiplications, etc. \n",
" - Implementing such functions for Python lists would not be very efficient because of the dynamic typing.\n",
" \n",
" \n",
"* Numpy arrays are **statically typed** and **homogeneous**. \n",
" - The type of the elements is determined when the array is created.\n",
" \n",
" \n",
"* Numpy arrays are memory efficient.\n",
" - Because of the static typing, fast implementations of mathematical functions such as multiplication and addition of `numpy` arrays can be written in a compiled language (C and Fortran are used)."
]
},
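{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"To put a rough number on the memory argument, the sketch below compares a Python list of 1000 integers with the equivalent array. The names `py_list`, `arr`, `list_bytes` and `array_bytes` are illustrative, and the list figure is only an approximation based on `sys.getsizeof` in CPython.\n",
"\n",
"```python\n",
"import sys\n",
"import numpy as np\n",
"\n",
"n = 1000\n",
"py_list = list(range(n))\n",
"arr = np.arange(n)\n",
"\n",
"# the list stores references; every int is a separate Python object\n",
"list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(i) for i in py_list)\n",
"# the array stores the raw values in one contiguous buffer\n",
"array_bytes = arr.nbytes\n",
"\n",
"print('list :', list_bytes, 'bytes (approximate)')\n",
"print('array:', array_bytes, 'bytes of data')\n",
"```\n",
"\n",
"The exact figures depend on the platform and Python version, but the array's data buffer is typically several times smaller."
]
},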
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy as np"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"L = range(1000)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1000 loops, best of 3: 519 µs per loop\n"
]
}
],
"source": [
"%timeit [i**2 for i in L]"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"a = np.arange(1000)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The slowest run took 1538.53 times longer than the fastest. This could mean that an intermediate result is being cached.\n",
"100000 loops, best of 3: 1.98 µs per loop\n"
]
}
],
"source": [
"%timeit a**2"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The slowest run took 9.51 times longer than the fastest. This could mean that an intermediate result is being cached.\n",
"1000 loops, best of 3: 332 µs per loop\n"
]
}
],
"source": [
"%timeit [element**2 for element in a]"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Exercises"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Simple arrays"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"* Create simple one and two dimensional arrays. First, redo the examples\n",
"from above. And then create your own.\n",
"\n",
"* Use the functions `len`, `shape` and `ndim` on some of those arrays and\n",
"observe their output."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Creating arrays using functions"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"* Experiment with `arange`, `linspace`, `ones`, `zeros`, `eye` and `diag`.\n",
"\n",
"* Create different kinds of arrays with random numbers.\n",
"\n",
"* Try setting the seed before creating an array with random values \n",
" - *hint*: use `np.random.seed`\n",
"\n",
"* Look at the function `np.empty`. What does it do? When might this be\n",
"useful?"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Basic Data Type"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"You may have noticed that, in some instances, array elements are\n",
"displayed with a trailing dot (e.g. `2.` vs `2`). This is due to a\n",
"difference in the data-type used:"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"dtype('int64')"
]
},
"execution_count": 59,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a = np.array([1, 2, 3])\n",
"a.dtype"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"dtype('float64')"
]
},
"execution_count": 60,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"b = np.array([1., 2., 3.])\n",
"b.dtype"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Note\n",
"\n",
"Different data-types allow us to store data more compactly in memory,\n",
"but most of the time we simply work with floating point numbers. Note\n",
"that, in the example above, NumPy auto-detects the data-type from the\n",
"input."
]
},
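{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"To make the memory trade-off concrete, the following sketch stores the same four values with three different dtypes and compares the space they occupy (the `sizes` dictionary is just an illustrative helper):\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"values = [1, 2, 3, 4]\n",
"sizes = {}\n",
"for dt in (np.float64, np.float32, np.int8):\n",
"    arr = np.array(values, dtype=dt)\n",
"    sizes[arr.dtype.name] = arr.nbytes   # itemsize * number of elements\n",
"\n",
"print(sizes)\n",
"```\n",
"\n",
"Smaller dtypes save memory at the price of a reduced range or precision."
]
},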
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"You can explicitly specify which data-type you want:"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"dtype('float64')"
]
},
"execution_count": 61,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"c = np.array([1, 2, 3], dtype=float)\n",
"c.dtype"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"The **default** data type of array-generating functions such as `np.ones` is floating point:"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"dtype('float64')"
]
},
"execution_count": 62,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a = np.ones((3, 3))\n",
"a.dtype"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Basic Data Types"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
" bool | This stores a boolean (True or False) as a byte\n",
"\n",
" int_ | This is the default platform integer (normally either int32 or int64)\n",
" int8 | This is an integer ranging from -128 to 127\n",
" int16 | This is an integer ranging from -32768 to 32767\n",
" int32 | This is an integer ranging from -2 ** 31 to 2 ** 31 - 1\n",
" int64 | This is an integer ranging from -2 ** 63 to 2 ** 63 - 1\n",
" \n",
" uint8 | This is an unsigned integer ranging from 0 to 255\n",
" uint16 | This is an unsigned integer ranging from 0 to 65535\n",
" uint32 | This is an unsigned integer ranging from 0 to 2 ** 32 - 1\n",
" uint64 | This is an unsigned integer ranging from 0 to 2 ** 64 - 1\n",
"\n",
" float16 | This is a half precision float with sign bit, 5 bits exponent, and 10 bits mantissa\n",
" float32 | This is a single precision float with sign bit, 8 bits exponent, and 23 bits mantissa\n",
" float64 or float | This is a double precision float with sign bit, 11 bits exponent, and 52 bits mantissa\n",
"\n",
" complex64 | This is a complex number represented by two 32-bit floats (real and imaginary components)\n",
" complex128 or complex | This is a complex number represented by two 64-bit floats (real and imaginary components)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Conversions and Type Casting"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"42.0"
]
},
"execution_count": 63,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.float64(42) # int to float"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"42"
]
},
"execution_count": 64,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.int8(42.0) # float to int8"
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 67,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.bool_(42) # int to bool"
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 68,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.bool_(0) # \"special\" int to bool"
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 69,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.bool_(42.0) # float to bool"
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"1.0"
]
},
"execution_count": 70,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.float64(True) # bool to float"
]
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"0.0"
]
},
"execution_count": 71,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.float64(False) # bool to float"
]
},
{
"cell_type": "code",
"execution_count": 72,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([0, 1, 2, 3, 4, 5, 6], dtype=uint16)"
]
},
"execution_count": 72,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.arange(7, dtype=np.uint16)"
]
},
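{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"The scalar conversions above have an array counterpart. As a small sketch (the names `f`, `i` and `b` are illustrative), `ndarray.astype` returns a converted copy of an entire array:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"f = np.array([1.7, -1.7, 0.0])\n",
"i = f.astype(np.int64)    # fractional part is discarded (truncation toward zero)\n",
"b = f.astype(bool)        # any non-zero value becomes True\n",
"\n",
"print(i)\n",
"print(b)\n",
"print(f.dtype, i.dtype)   # astype returns a new array; f itself is unchanged\n",
"```\n",
"\n",
"Note that the float-to-int conversion truncates rather than rounds, so `np.round` may be needed first if rounding is intended."
]
},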
{
"cell_type": "code",
"execution_count": 74,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"ename": "TypeError",
"evalue": "can't convert complex to int",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m42.0\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0;36m1.j\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# complex to int\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m: can't convert complex to int"
]
}
],
"source": [
"int(42.0 + 1.j) # complex to int"
]
},
{
"cell_type": "code",
"execution_count": 73,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"ename": "TypeError",
"evalue": "can't convert complex to float",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mfloat\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m42.0\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0;36m1.j\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# complex to float\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m: can't convert complex to float"
]
}
],
"source": [
"float(42.0 + 1.j) # complex to float"
]
},
{
"cell_type": "code",
"execution_count": 75,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"ename": "TypeError",
"evalue": "can't convert complex to float",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mfloat\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m42.0\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0;36m0.j\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# complex to float\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m: can't convert complex to float"
]
}
],
"source": [
"float(42.0 + 0.j) # complex to float"
]
},
{
"cell_type": "code",
"execution_count": 77,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(42+0j)\n"
]
}
],
"source": [
"cn = np.complex128(42.0) # Btw, you can convert a float to a complex..\n",
"print(cn)"
]
},
{
"cell_type": "code",
"execution_count": 79,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"42.0"
]
},
"execution_count": 79,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Extracting the Real part..\n",
"cn.real"
]
},
{
"cell_type": "code",
"execution_count": 80,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"0.0"
]
},
"execution_count": 80,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# .. and the Imaginary part\n",
"cn.imag"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Numerical Types and Representation"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"The **numerical dtype** of an array should be selected carefully, because it directly determines how each element is represented, that is:\n",
"\n",
" * the number of **bytes** used;\n",
" * the **numerical range** that can be stored."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"So: **what happens if I try to represent a number that is out of range?**\n",
"\n",
"Let's have a go with **integers**, i.e., `int8` and `uint8`"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([0, 0, 0, 0], dtype=int8)"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x = np.zeros(4, 'int8') # Integer ranging from -128 to 127\n",
"x"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([127, 0, 0, 0], dtype=int8)"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x[0] = 127\n",
"x"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([-128, 0, 0, 0], dtype=int8)"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x[0] = 128\n",
"x"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([-128, -127, 0, 0], dtype=int8)"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x[1] = 129\n",
"x"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([-128, -127, 1, 0], dtype=int8)"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x[2] = 257 # i.e. (128 x 2) + 1\n",
"x"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([0, 0, 0, 0], dtype=uint8)"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ux = np.zeros(4, 'uint8') # Integer ranging from 0 to 255\n",
"ux"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([255, 0, 1, 1], dtype=uint8)"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ux[0] = 255\n",
"ux[1] = 256\n",
"ux[2] = 257\n",
"ux[3] = 513 # (256 x 2) + 1\n",
"ux"
]
},
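{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"The wrap-around seen in the cells above is modular arithmetic: an 8-bit integer keeps only the lowest 8 bits of the value.\n",
"\n",
"The next cell is a minimal sketch added for illustration. It uses `np.iinfo` to query the representable range and an explicit `astype` cast to reproduce the same wrap-around. The names `values`, `as_int8` and `as_uint8` are illustrative. Recent NumPy releases may reject a direct out-of-range assignment such as `x[0] = 128` instead of wrapping, which is why an explicit cast is used here."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Representable range of each 8-bit integer type\n",
"int8_info = np.iinfo(np.int8)\n",
"uint8_info = np.iinfo(np.uint8)\n",
"print(int8_info.min, int8_info.max)\n",
"print(uint8_info.min, uint8_info.max)\n",
"\n",
"# Same values as the int8 assignments above, cast explicitly\n",
"values = np.array([127, 128, 129, 257])\n",
"as_int8 = values.astype(np.int8)\n",
"print(as_int8)\n",
"\n",
"# Same values as the uint8 assignments above\n",
"as_uint8 = np.array([255, 256, 257, 513]).astype(np.uint8)\n",
"print(as_uint8)"
]
},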
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Data Type Object"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"**Data type objects** are instances of the `numpy.dtype` class.\n",
"\n",
"Once again, arrays have a data type.\n",
"\n",
"To be precise, *every element* in a NumPy array has the same data type.\n",
"\n",
"The data type object can tell you the size of a single element in bytes.\n",
"\n",
"(**Recall**: the size in bytes is given by the `itemsize` attribute of the `dtype` object.)"
]
},
{
"cell_type": "code",
"execution_count": 81,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"a itemsize: 2\n",
"a.dtype.itemsize: 2\n"
]
}
],
"source": [
"a = np.arange(7, dtype=np.uint16)\n",
"print('a itemsize: ', a.itemsize)\n",
"print('a.dtype.itemsize: ', a.dtype.itemsize)"
]
},
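{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Because every element has the same `itemsize`, the memory taken by the array data is simply the number of elements times the item size.\n",
"\n",
"The cell below is a small sketch of that relationship, assuming the same `uint16` array as above. The variable `total_bytes` is an illustrative name; `nbytes` is the array attribute that reports the same quantity."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"a = np.arange(7, dtype=np.uint16)\n",
"\n",
"# 7 elements of 2 bytes each\n",
"total_bytes = a.size * a.itemsize\n",
"print(total_bytes, a.nbytes)"
]
},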
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"The dtype also exposes its `byteorder`, i.e. whether multi-byte values are stored **big-endian** or **little-endian**."
]
},
{
"cell_type": "code",
"execution_count": 82,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'='"
]
},
"execution_count": 82,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a.dtype.byteorder"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"### Note:\n",
"\n",
"**Byte Order** can be one of:\n",
"\n",
"* `=` native\n",
"* `<` little-endian\n",
"* `>` big-endian\n",
"* `|` not applicable"
]
},
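{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"To make the byte-order codes concrete, the following sketch stores the same 16-bit value with both layouts and inspects the raw bytes. The names `little`, `big` and `native_code` are illustrative. It also shows that an explicit order matching the machine is reported as `=`, and that single-byte types report `|`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"import sys\n",
"import numpy as np\n",
"\n",
"# The same value stored with explicit little- and big-endian layouts\n",
"little = np.array([1], dtype='<u2')\n",
"big = np.array([1], dtype='>u2')\n",
"print(little.tobytes())  # least significant byte first\n",
"print(big.tobytes())     # most significant byte first\n",
"\n",
"# An explicit order that matches the machine is reported as '='\n",
"native_code = '<' if sys.byteorder == 'little' else '>'\n",
"print(np.dtype(native_code + 'u2').byteorder)\n",
"\n",
"# Single-byte types have no byte order to speak of\n",
"print(np.dtype(np.uint8).byteorder)"
]
},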
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Character Codes\n",
"\n",
"Character codes are included for backward compatibility with **Numeric**, the predecessor of NumPy.\n",
"\n",
"Their use is not recommended, but these codes pop up in several places.\n",
"\n",
"You should use the **dtype** objects instead.\n",
"\n",
"    integer                   i\n",
"    Unsigned integer          u\n",
"    Single precision float    f\n",
"    Double precision float    d\n",
"    bool                      b\n",
"    complex                   D\n",
"    string                    S\n",
"    unicode                   U"
]
},
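{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"As a hedged illustration of how these one-letter codes relate to dtype objects, the sketch below builds dtypes from the codes `f`, `d` and `D` and compares them with the corresponding NumPy types. It then reads the `kind` attribute, which reports the one-letter family of a dtype. The names `families` and `kinds` are illustrative."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# One-letter codes are accepted wherever a dtype is expected\n",
"print(np.dtype('f') == np.float32)\n",
"print(np.dtype('d') == np.float64)\n",
"print(np.dtype('D') == np.complex128)\n",
"\n",
"# Going the other way: the family letter of each dtype object\n",
"families = (np.bool_, np.int32, np.uint16, np.float64, np.complex128)\n",
"kinds = [np.dtype(dt).kind for dt in families]\n",
"for dt, kind in zip(families, kinds):\n",
"    print(np.dtype(dt).name, kind)"
]
},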
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### `dtypes` properties"
]
},
{
"cell_type": "code",
"execution_count": 83,
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"t = np.dtype('float64')  # lowercase name; the capitalized 'Float64' alias is not accepted by recent NumPy"
]
},
{
"cell_type": "code",
"execution_count": 84,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'d'"
]
},
"execution_count": 84,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"t.char"
]
},
{
"cell_type": "code",
"execution_count": 85,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"numpy.float64"
]
},
"execution_count": 85,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"t.type"
]
},
{
"cell_type": "code",
"execution_count": 86,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'