# What is NumPy

NumPy is the fundamental package for scientific computing with Python.
It is a package that provides high-performance vector, matrix, and higher-dimensional data structures for Python.
It is implemented in C and Fortran, so when calculations are **vectorized**, performance is very good.

So, in a nutshell:

* a powerful Python extension for N-dimensional arrays
* a tool for integrating C/C++ and Fortran code
* designed for scientific computation: linear algebra and signal analysis

If you are a MATLAB&reg; user, we recommend reading [NumPy for MATLAB Users](http://www.scipy.org/NumPy_for_Matlab_Users) and [Benefit of Open Source Python versus commercial packages](http://www.scipy.org/NumPyProConPage).

I'm a supporter of the **Open Science Movement**, so I humbly suggest you take a look at the [Science Code Manifesto](http://sciencecodemanifesto.org/).

# Getting Started with NumPy Arrays

NumPy's main object is the **homogeneous** ***multidimensional array***. It is a table of elements (usually numbers), all of the same type. 

In NumPy, dimensions are called **axes**.

The number of axes is called **rank**. 

The most important attributes of an ndarray object are:

* **ndarray.ndim**     - the number of axes (dimensions) of the array. 
* **ndarray.shape**    - the dimensions of the array. For a matrix with n rows and m columns, shape will be (n,m). 
* **ndarray.size**     - the total number of elements of the array. 
* **ndarray.dtype**    - an object describing the type of the elements in the array; numpy.int32, numpy.int16, and numpy.float64 are some examples. 
* **ndarray.itemsize** - the size in bytes of each element of the array. For example, elements of type float64 have itemsize 8 (=64/8).

To use `numpy`, you need to import the module, for example:

In [2]:
import numpy as np  # naming import convention

### Terminology Assumption

In the `numpy` package the terminology used for vectors, matrices and higher-dimensional data sets is *array*. 

### Reference Documentation

* On the web: [http://docs.scipy.org/](http://docs.scipy.org/)

* Interactive help:

In [None]:
np.array?

If you're looking for something, you can search the documentation by keyword or list matching names with a wildcard:

In [None]:
np.lookfor('create array')

In [None]:
np.con*?

#### Help is your friend

Whenever in doubt, there is the `help` function to the rescue

In [None]:
# For example, try 
help(np.ndarray)

## NumPy Array Object

`NumPy` has a multidimensional array object called ndarray. It consists of two parts as follows:
   
   * The actual data
   * Some metadata describing the data
    
    
The majority of array operations leave the raw data untouched. The only aspect that changes is the metadata.
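As a rough illustration of this point, the sketch below reshapes and transposes an array and then checks that all three objects still refer to the same data buffer. The variable names are illustrative, and `np.shares_memory` is used here only as a convenient way to check.

```python
import numpy as np

a = np.arange(6)        # one block of six integers
b = a.reshape(2, 3)     # same data, new shape metadata
c = b.T                 # same data again, axes swapped

# All three arrays are views onto the same buffer
print(np.shares_memory(a, b), np.shares_memory(a, c))   # True True

# Only the metadata differs
print(a.shape, b.shape, c.shape)   # (6,) (2, 3) (3, 2)

# Writing through one view is visible through the others
b[0, 0] = 99
print(a[0], c[0, 0])    # 99 99
```

Because no data is copied, operations like these stay cheap even for very large arrays.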

<img src="images/ndarray_with_details.png" />

## Creating `numpy` arrays

There are a number of ways to initialize new numpy arrays, for example:

* from Python lists or tuples
* using functions that are dedicated to generating numpy arrays, such as `arange`, `linspace`, etc.

### From lists

For example, to create new vector and matrix arrays from Python lists we can use the `numpy.array` function.

In [3]:
# a vector: the argument to the array function is a Python list
v = np.array([1,2,3,4])
v

array([1, 2, 3, 4])

In [4]:
# a matrix: the argument to the array function is a nested Python list
M = np.array([[1, 2], [3, 4]])
M

array([[1, 2],
       [3, 4]])

The `v` and `M` objects are both of the type `ndarray` that the `numpy` module provides.

In [5]:
print('Type of v: ', type(v))
print('Type of M: ', type(M))

Type of v:  <class 'numpy.ndarray'>
Type of M:  <class 'numpy.ndarray'>


The only difference between the `v` and `M` arrays is their shape. 

To inspect the shape of an array, we can use the `numpy.shape` function:

In [12]:
print('Shape of v: ', np.shape(v))
print('Shape of M: ', np.shape(M))

Shape of v:  (4,)
Shape of M:  (2, 2)


Alternatively, we can get information about the shape of an array by using the `ndarray.shape` **property**:

In [10]:
v.shape, M.shape

((4,), (2, 2))

Similarly, we can get the **size** of the two `ndarrays`, namely the *total number of elements* in each array.

In [13]:
print('Size of v:', v.size)
print('Size of M:', M.size)

Size of v: 4
Size of M: 4


#### More properties of the `numpy array`

In [32]:
M.itemsize # bytes per element

8

In [33]:
M.nbytes # number of bytes

32

In [34]:
M.ndim # number of dimensions

2
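These attributes are tied together by simple arithmetic. The sketch below checks the relations for the same matrix; the `dtype` is fixed to `np.int64` as an assumption, because the default integer type can differ between platforms.

```python
import numpy as np

M = np.array([[1, 2], [3, 4]], dtype=np.int64)  # dtype fixed so sizes are predictable

print(M.size == M.shape[0] * M.shape[1])   # True: size is the product of the shape
print(M.itemsize)                          # 8 bytes per int64 element
print(M.nbytes == M.size * M.itemsize)     # True: 4 elements * 8 bytes = 32
```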

## Using array-generating functions

For larger arrays it is impractical to initialize the data manually, using explicit Python lists. 

Instead we can use one of the many **functions** in `numpy` that generate arrays of different forms. 

Some of the more common are: 

* `np.arange`; 
* `np.linspace`; 
* `np.logspace`; 
* `np.mgrid`;
* `np.random.rand`;
* `np.diag`;
* `np.zeros`;
* `np.ones`;
* `np.empty`;
* `np.tile`.

### `np.arange`

In [15]:
# create a range
x = np.arange(0, 10, 1) # arguments: start, stop, step
print(x)

[0 1 2 3 4 5 6 7 8 9]


In [17]:
x = np.arange(-1, 1, 0.1)  # floating-point step-wise range generation
print(x)

[ -1.00000000e+00  -9.00000000e-01  -8.00000000e-01  -7.00000000e-01
  -6.00000000e-01  -5.00000000e-01  -4.00000000e-01  -3.00000000e-01
  -2.00000000e-01  -1.00000000e-01  -2.22044605e-16   1.00000000e-01
   2.00000000e-01   3.00000000e-01   4.00000000e-01   5.00000000e-01
   6.00000000e-01   7.00000000e-01   8.00000000e-01   9.00000000e-01]
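The entry `-2.22044605e-16` in the output above is floating-point round-off rather than an exact zero. When a floating-point grid is needed, a common alternative is `np.linspace`, where the number of points is given explicitly instead of a step. A minimal sketch, assuming the same interval as above:

```python
import numpy as np

# Same 20 points as np.arange(-1, 1, 0.1), but the count is stated explicitly;
# endpoint=False leaves out the stop value, as arange does
x = np.linspace(-1, 1, 20, endpoint=False)

print(len(x))   # 20
print(x)
```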


### `np.linspace` and `np.logspace`

In [18]:
# using linspace, both end points **ARE included**
np.linspace(0, 10, 25)

array([  0.        ,   0.41666667,   0.83333333,   1.25      ,
         1.66666667,   2.08333333,   2.5       ,   2.91666667,
         3.33333333,   3.75      ,   4.16666667,   4.58333333,
         5.        ,   5.41666667,   5.83333333,   6.25      ,
         6.66666667,   7.08333333,   7.5       ,   7.91666667,
         8.33333333,   8.75      ,   9.16666667,   9.58333333,  10.        ])

In [36]:
np.logspace(0, np.e**2, 10, base=np.e)

array([  1.00000000e+00,   2.27278564e+00,   5.16555456e+00,
         1.17401982e+01,   2.66829540e+01,   6.06446346e+01,
         1.37832255e+02,   3.13263169e+02,   7.11980032e+02,
         1.61817799e+03])
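Note that the first two arguments of `np.logspace` are *exponents*, not end points: the result is `base` raised to evenly spaced powers. The sketch below checks this against `np.linspace`; the variable names are illustrative.

```python
import numpy as np

stop = np.e ** 2
a = np.logspace(0, stop, 10, base=np.e)
b = np.exp(np.linspace(0, stop, 10))   # e raised to evenly spaced exponents

print(np.allclose(a, b))               # True

# With the default base of 10, the exponents 0..3 give 1, 10, 100, 1000
print(np.logspace(0, 3, 4))
```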

### `np.mgrid`

In [21]:
x, y = np.mgrid[0:5, 0:5]  # similar to meshgrid in MATLAB

In [22]:
x

array([[0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1],
       [2, 2, 2, 2, 2],
       [3, 3, 3, 3, 3],
       [4, 4, 4, 4, 4]])

In [23]:
y

array([[0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4]])
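For comparison, NumPy also provides a `np.meshgrid` function. As far as the layout goes, `np.mgrid` matches `np.meshgrid` called with `indexing='ij'`, while the default `indexing='xy'` returns the transposed layout. A short sketch of the equivalence:

```python
import numpy as np

x, y = np.mgrid[0:5, 0:5]

# np.meshgrid with matrix ('ij') indexing builds the same pair of grids
X, Y = np.meshgrid(np.arange(5), np.arange(5), indexing='ij')
print(np.array_equal(x, X), np.array_equal(y, Y))       # True True

# The default 'xy' indexing swaps the roles of rows and columns
Xc, Yc = np.meshgrid(np.arange(5), np.arange(5))
print(np.array_equal(x, Xc.T), np.array_equal(y, Yc.T))  # True True
```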

### `np.random.rand` & `np.random.randn`

In [24]:
# uniform random numbers in [0, 1)
np.random.rand(5,5)

array([[ 0.33658948,  0.28564552,  0.73183017,  0.7395105 ,  0.66427382],
       [ 0.25942094,  0.43844615,  0.48250402,  0.24063916,  0.90171053],
       [ 0.51114245,  0.49587249,  0.61832302,  0.71996951,  0.22064571],
       [ 0.38625609,  0.44313367,  0.74975323,  0.57600147,  0.80771956],
       [ 0.84511666,  0.6064582 ,  0.62365173,  0.62766319,  0.80129396]])

In [25]:
# standard normal distributed random numbers
np.random.randn(5,5)

array([[ 0.65782724,  0.65168367,  0.58525852,  0.33781734, -0.00700978],
       [ 0.61574011,  0.59150639, -0.33797592, -0.2509655 ,  0.77237429],
       [-0.15693266, -0.38377945, -0.28140147,  0.90558314,  0.25437408],
       [-1.136108  ,  2.43964939,  0.28583627, -0.27540796, -0.57253111],
       [-0.79080395,  0.50525127,  2.1113386 , -0.33769711, -0.64914575]])
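Recent NumPy versions also offer a generator-based interface, `np.random.default_rng`, as an alternative to the module-level functions used above. The sketch below draws the same two kinds of samples; the seed value `42` is an arbitrary choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)      # seed chosen arbitrarily for illustration

u = rng.random((5, 5))               # uniform samples in [0, 1)
n = rng.standard_normal((5, 5))      # standard normal samples

# A generator created with the same seed reproduces the same numbers
u_again = np.random.default_rng(42).random((5, 5))
print(np.array_equal(u, u_again))    # True
```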

### `np.diag`

In [27]:
# a diagonal matrix
np.diag([1,2,3])

array([[1, 0, 0],
       [0, 2, 0],
       [0, 0, 3]])

In [29]:
# diagonal with offset from the main diagonal
np.diag([1,2,3], k=1) 

array([[0, 1, 0, 0],
       [0, 0, 2, 0],
       [0, 0, 0, 3],
       [0, 0, 0, 0]])

### `np.eye`

In [50]:
# a diagonal matrix with ones on the main diagonal
np.eye(3)  # 3 is the number of rows (and columns)

array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])

### `np.zeros` and `np.ones`

In [30]:
np.zeros((3,3))

array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

In [31]:
np.ones((3, 3))

array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])

### DIY

***Try by yourself*** the following commands:

    np.zeros((3,4))
    np.ones((3,4))
    np.empty((2,3))
    np.eye(5)
    np.diag(np.arange(5))
    np.tile(np.array([[6, 7], [8, 9]]), (2, 2))

## So, why is it useful then?

So far the `numpy.ndarray` looks an awful lot like a Python **list** (or **nested list**). 

*Why not simply use Python lists for computations instead of creating a new array type?*

There are several reasons:

* Python lists are very general. 
    - They can contain any kind of object. 
    - They are dynamically typed. 
    - They do not support mathematical functions such as matrix multiplication, dot products, etc. 
    - Implementing such functions for Python lists would not be very efficient because of the dynamic typing.
    
    
* NumPy arrays are **statically typed** and **homogeneous**. 
    - The type of the elements is determined when the array is created.
    
    
* NumPy arrays are memory efficient and fast.
    - Because of the static typing, mathematical functions such as multiplication and addition of `numpy` arrays can be implemented efficiently in a compiled language (C and Fortran are used).
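The static typing described above can be observed directly. In the sketch below (variable names are illustrative), a list happily accepts a string, while an integer array either converts the assigned value to its own type or refuses it.

```python
import numpy as np

lst = [1, 2, 3]
lst[0] = "one"            # a list accepts any kind of object
print(lst)                # ['one', 2, 3]

arr = np.array([1, 2, 3])
print(arr.dtype.kind)     # i  (an integer type, fixed at creation)

arr[0] = 3.7              # the float is converted to the array's integer type
print(arr[0])             # 3

try:
    arr[1] = "two"        # a non-numeric string cannot be converted
except ValueError as err:
    print("ValueError:", err)
```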

In [51]:
L = range(1000)

In [52]:
%timeit [i**2 for i in L]

1000 loops, best of 3: 558 µs per loop


In [53]:
a = np.arange(1000)

In [54]:
%timeit a**2

The slowest run took 52.96 times longer than the fastest. This could mean that an intermediate result is being cached 
100000 loops, best of 3: 2.19 µs per loop
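Outside IPython, where the `%timeit` magic is not available, a comparable measurement can be sketched with the standard-library `timeit` module. The repetition count of 1000 is an arbitrary choice, and the timings will vary from machine to machine.

```python
import timeit
import numpy as np

L = range(1000)
a = np.arange(1000)

# Total seconds for 1000 repetitions of each statement
t_list = timeit.timeit(lambda: [i**2 for i in L], number=1000)
t_numpy = timeit.timeit(lambda: a**2, number=1000)

print(f"list comprehension: {t_list / 1000 * 1e6:.1f} µs per loop")
print(f"numpy array:        {t_numpy / 1000 * 1e6:.1f} µs per loop")
```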


## Exercises

### Simple arrays

* Create simple one and two dimensional arrays. First, redo the examples
from above. And then create your own.

* Use the functions `len`, `shape` and `ndim` on some of those arrays and
observe their output.

### Creating arrays using functions

* Experiment with `arange`, `linspace`, `ones`, `zeros`, `eye` and `diag`.

* Create different kinds of arrays with random numbers.

* Try setting the seed before creating an array with random values 
    - *hint*: use `np.random.seed`

* Look at the function `np.empty`. What does it do? When might this be
useful?

# Basic Data Type

You may have noticed that, in some instances, array elements are
displayed with a trailing dot (e.g. `2.` vs `2`). This is due to a
difference in the data-type used:

In [59]:
a = np.array([1, 2, 3])
a.dtype

dtype('int64')

In [60]:
b = np.array([1., 2., 3.])
b.dtype

dtype('float64')

### Note

Different data-types allow us to store data more compactly in memory,
but most of the time we simply work with floating point numbers. Note
that, in the example above, NumPy auto-detects the data-type from the
input.

You can explicitly specify which data-type you want:

In [61]:
c = np.array([1, 2, 3], dtype=float)
c.dtype

dtype('float64')

The **default** data type is floating point:

In [62]:
a = np.ones((3, 3))
a.dtype

dtype('float64')

## Basic Data Types

    bool                  | This stores a boolean (True or False) as a byte

    int_                  | This is the default platform integer (normally either int32 or int64)
    int8                  | This is an integer ranging from -128 to 127
    int16                 | This is an integer ranging from -32768 to 32767
    int32                 | This is an integer ranging from -2 ** 31 to 2 ** 31 - 1
    int64                 | This is an integer ranging from -2 ** 63 to 2 ** 63 - 1

    uint8                 | This is an unsigned integer ranging from 0 to 255
    uint16                | This is an unsigned integer ranging from 0 to 65535
    uint32                | This is an unsigned integer ranging from 0 to 2 ** 32 - 1
    uint64                | This is an unsigned integer ranging from 0 to 2 ** 64 - 1

    float16               | This is a half precision float with sign bit, 5 bits exponent, and 10 bits mantissa
    float32               | This is a single precision float with sign bit, 8 bits exponent, and 23 bits mantissa
    float64 or float      | This is a double precision float with sign bit, 11 bits exponent, and 52 bits mantissa
    complex64             | This is a complex number represented by two 32-bit floats (real and imaginary components)
    complex128 or complex | This is a complex number represented by two 64-bit floats (real and imaginary components)


## Conversions and Type Casting

In [63]:
np.float64(42)  # int to float

42.0

In [64]:
np.int8(42.0)  # float to int8

42

In [67]:
np.bool_(42)  # int to bool

True

In [68]:
np.bool_(0)   # "special" int to bool

False

In [69]:
np.bool_(42.0)  # float to bool

True

In [70]:
np.float64(True)  # bool to float

1.0

In [71]:
np.float64(False)

0.0

In [72]:
np.arange(7, dtype=np.uint16)

array([0, 1, 2, 3, 4, 5, 6], dtype=uint16)

In [74]:
int(42.0 + 1.j)  # complex to int (np.int was an alias of the builtin int)

TypeError: can't convert complex to int

In [73]:
float(42.0 + 1.j)  # complex to float (np.float was an alias of the builtin float)

TypeError: can't convert complex to float

In [75]:
float(42.0 + 0.j)  # complex to float fails even when the imaginary part is zero

TypeError: can't convert complex to float

In [77]:
cn = np.complex128(42.0)  # Btw, you can convert a float to a complex..
print(cn)

(42+0j)


In [79]:
# Extracting the Real part..
cn.real

42.0

In [80]:
# .. and the Imaginary part
cn.imag

0.0
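The conversions above act on single values. For whole arrays, the `astype` method returns a converted copy. A minimal sketch, with array values chosen only for illustration:

```python
import numpy as np

f = np.array([1.7, -2.3, 3.9])

print(f.astype(np.int64))                 # [ 1 -2  3]  (fraction dropped, not rounded)
print(np.array([0, 1, 2]).astype(bool))   # [False  True  True]  (non-zero -> True)

# The real and imaginary parts of a complex array are available as attributes
z = np.array([42.0 + 1.0j, 3.0 - 2.0j])
print(z.real)                             # [42.  3.]
print(z.imag)                             # [ 1. -2.]
```

Note that `astype` leaves the original array `f` unchanged.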

## Numerical Types and Representation

The **numerical dtype** of an array should be selected very carefully, as it directly affects the numerical representation of elements, that is: 

   * the number of **bytes** used;
   * the *numerical range*.

So, then: **What happens if I try to represent a number that is out of range?**

Let's have a go with **integers**, i.e., `int8` and `uint8`

In [27]:
x = np.zeros(4, 'int8')  # Integer ranging from -128 to 127
x

array([0, 0, 0, 0], dtype=int8)

In [28]:
x[0] = 127
x

array([127,   0,   0,   0], dtype=int8)

In [29]:
x[0] = 128
x

array([-128,    0,    0,    0], dtype=int8)

In [30]:
x[1] = 129
x

array([-128, -127,    0,    0], dtype=int8)

In [31]:
x[2] = 257  # i.e. (128 x 2) + 1
x

array([-128, -127,    1,    0], dtype=int8)

In [32]:
ux = np.zeros(4, 'uint8')  # Integer ranging from 0 to 255
ux

array([0, 0, 0, 0], dtype=uint8)

In [33]:
ux[0] = 255
ux[1] = 256
ux[2] = 257
ux[3] = 513  # (256 x 2) + 1
ux

array([255,   0,   1,   1], dtype=uint8)
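The valid range of each integer type can be queried with `np.iinfo`. Depending on the NumPy version, assigning an out-of-range Python integer directly, as in the cells above, may raise an error instead of wrapping around. Converting an existing integer array with `astype` is one way to observe the same wrap-around explicitly; the sketch below reuses the values from the cells above.

```python
import numpy as np

print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)     # -128 127
print(np.iinfo(np.uint8).min, np.iinfo(np.uint8).max)   # 0 255

# Converting to an 8-bit type keeps only the lowest 8 bits (wrap-around modulo 256)
wrapped = np.array([127, 128, 129, 257]).astype(np.int8)
print(wrapped)     # [ 127 -128 -127    1]

uwrapped = np.array([255, 256, 257, 513]).astype(np.uint8)
print(uwrapped)    # [255   0   1   1]
```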

## Data Type Object

**Data type objects** are instances of the `numpy.dtype` class. 

Once again, arrays have a data type. 
<br>
To be precise, *every element* in a NumPy array has the same data type. 

The data type object can tell you the `size` of the data in bytes.
<br>
(**Recall**: The size in bytes is given by the `itemsize` attribute of the dtype class)

In [81]:
a = np.arange(7, dtype=np.uint16)
print('a itemsize: ', a.itemsize)
print('a.dtype.itemsize: ', a.dtype.itemsize)

a itemsize:  2
a.dtype.itemsize:  2


We can also access the `byteorder`, i.e. **Big Endian** or **Little Endian**

In [82]:
a.dtype.byteorder

'='

### Note:

**Byte Order** can be one of:

* `=` native
* `<` little-endian
* `>` big-endian
* `|` not applicable
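To make the byte-order codes more concrete, the sketch below stores the integer 1 with an explicit big-endian and little-endian 4-byte type and prints the raw bytes. The type strings `'>i4'` and `'<i4'` are chosen only for illustration.

```python
import sys
import numpy as np

print(sys.byteorder)                    # 'little' or 'big', depending on the machine

big = np.array([1], dtype='>i4')        # 4-byte integer, big-endian
little = np.array([1], dtype='<i4')     # 4-byte integer, little-endian

print(big.tobytes())                    # b'\x00\x00\x00\x01'
print(little.tobytes())                 # b'\x01\x00\x00\x00'

# Single-byte types have no byte order
print(np.dtype(np.uint8).byteorder)     # |
```

Both arrays hold the same value; only the layout of the bytes in memory differs.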

### Character Codes

Character codes are included for backward compatibility with **Numeric**, the predecessor of NumPy. 
<br>
Using these codes is not recommended, but they pop up in several places. 

You should use the **dtype** objects instead. 

    integer                     i
    Unsigned integer            u
    Single precision float      f
    Double precision float      d
    bool                        ?
    complex                     D
    string                      S
    unicode                     U

### `dtypes` properties

In [83]:
t = np.dtype('float64')

In [84]:
t.char

'd'

In [85]:
t.type

numpy.float64

In [86]:
t.str

'<f8'

### `dtype` constructors

In [87]:
np.dtype(float)

dtype('float64')

In [88]:
np.dtype('f')

dtype('float32')

In [89]:
np.dtype('d')

dtype('float64')

In [90]:
np.dtype('f8')

dtype('float64')

In [91]:
np.dtype('float64')

dtype('float64')

**Note**: A listing of all data type names can be found by calling `np.sctypeDict.keys()`

## Creating a Record of `dtype`

* Define the new record type

In [102]:
rt = np.dtype([('name', np.str_, 40), ('numitems', np.int32), ('price', np.float32)])

In [103]:
rt['name']  # see the difference with Python 2

dtype('<U40')

In [104]:
rt['numitems']

dtype('int32')

In [105]:
rt['price']

dtype('float32')

* Instantiate an array with `dtype` equal to `rt` (the record type)


In [106]:
itemz = np.array([('Meaning of life DVD', 42, 3.14), ('Butter', 13, 2.72)], dtype=rt)

In [108]:
print(itemz)

[('Meaning of life DVD', 42, 3.140000104904175)
 ('Butter', 13, 2.7200000286102295)]
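Once such an array exists, each field can be accessed by name, which returns an ordinary array of that field's type. The sketch below rebuilds the same record type and array so that it runs on its own; the name `expensive` and the price threshold are illustrative choices.

```python
import numpy as np

rt = np.dtype([('name', np.str_, 40), ('numitems', np.int32), ('price', np.float32)])
itemz = np.array([('Meaning of life DVD', 42, 3.14), ('Butter', 13, 2.72)], dtype=rt)

print(itemz['name'])             # all names, as an array of strings
print(itemz['numitems'].sum())   # 55
print(itemz[0]['price'])         # price of the first record

# Boolean masks built from one field select whole records
expensive = itemz[itemz['price'] > 3.0]
print(expensive['name'])         # ['Meaning of life DVD']
```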


## Exercise

### Creating your own record type

* Practice by creating your own record type

### Record Types and Array Creation

1. Define a record type made of a pair of numbers: the first number is a ranking position, and the second one is a degree value.
2. Create an array of this new type, filling its elements from a dictionary, sorted with respect to the ranking positions.