# What is Numpy

NumPy is the fundamental package for scientific computing with Python. 
It is a package that provides high-performance vector, matrix and higher-dimensional data structures for Python. 
It is implemented in C and Fortran, so when calculations are **vectorized**, performance is very good.

So, in a nutshell:

* a powerful Python extension for N-dimensional array
* a tool for integrating C/C++ and Fortran code
* designed for scientific computation: linear algebra and Signal Analysis

If you are a MATLAB&reg; user we recommend reading [Numpy for MATLAB Users](http://www.scipy.org/NumPy_for_Matlab_Users) and [Benefit of Open Source Python versus commercial packages](http://www.scipy.org/NumPyProConPage). 

I'm a supporter of the **Open Science Movement**, thus I humbly suggest you take a look at the [Science Code Manifesto](http://sciencecodemanifesto.org/).

# Getting Started with Numpy Arrays

NumPy's main object is the **homogeneous** ***multidimensional array***. It is a table of elements (usually numbers), all of the same type. 

In Numpy dimensions are called **axes**. 

The number of axes is called **rank**. 

The most important attributes of an ndarray object are:

* **ndarray.ndim**     - the number of axes (dimensions) of the array. 
* **ndarray.shape**    - the dimensions of the array. For a matrix with n rows and m columns, shape will be (n,m). 
* **ndarray.size**     - the total number of elements of the array. 
* **ndarray.dtype**    - an object describing the type of the elements in the array; numpy.int32, numpy.int16, and numpy.float64 are some examples. 
* **ndarray.itemsize** - the size in bytes of each element of the array. For example, elements of type float64 have itemsize 8 (=64/8). 

To use `numpy` you need to import the module, for example:

In [1]:
import numpy as np  # naming import convention
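As a quick illustration of the attributes listed above, here is a minimal sketch on a small 2 x 3 array. The array `m` and its explicit `dtype` are assumptions chosen only for this example:

```python
import numpy as np

# A hypothetical 2 x 3 matrix with an explicit dtype, so the values below
# do not depend on the platform default type.
m = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float64)

print(m.ndim)      # number of axes: 2
print(m.shape)     # (rows, columns): (2, 3)
print(m.size)      # total number of elements: 6
print(m.dtype)     # float64
print(m.itemsize)  # bytes per element: 8
```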

### Terminology Assumption

In the `numpy` package the terminology used for vectors, matrices and higher-dimensional data sets is *array*. 

### Reference Documentation

* On the web: [http://docs.scipy.org/](http://docs.scipy.org/)

* Interactive help:

In [None]:
np.array?

If you're looking for something

# Creating `numpy` arrays

### Get acquainted with NumPy

Let's start by creating some `numpy.array` objects in order to get our hands into the very details of **numpy basic data structure**.

NumPy is a very flexible library, and provides many ways to create (and initialize) new numpy arrays. 

One way is **using specific functions dedicated to generate numpy arrays** 
(usually, *array of numbers*)\[+\]



\[+\] More on data types, later on !-)



# First `numpy array` example: array of numbers

NumPy provides many functions to generate arrays with specific properties (e.g. `size` or `shape`).

We will see later examples in which we will generate `ndarray` using explicit Python lists. 

However, for larger arrays, using Python lists is simply impractical. 

### `np.arange`

In standard Python, we use the `range` function to generate an **iterable** object of **integers** within a specific range (at a specified `step`, default: `1`)

In [2]:
r = range(10)
print(list(r))

print(type(r))  # NOTE: if this prints <type 'list'>, you are using Python 2.7

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
<class 'range'>


Similarly, in numpy there is the `arange` function which instead generates a `numpy.ndarray`

In [3]:
ra = np.arange(10) 
print(ra)

print(type(ra))

[0 1 2 3 4 5 6 7 8 9]
<class 'numpy.ndarray'>


However, we are working with the **Numerical Python** library, so we should expect more when it comes to numbers.

In fact, we can create an array within a _floating point step-wise range_:

In [4]:
# floating point step-wise range generation
raf = np.arange(-1, 1, 0.1)  
print(raf)

[-1.00000000e+00 -9.00000000e-01 -8.00000000e-01 -7.00000000e-01
 -6.00000000e-01 -5.00000000e-01 -4.00000000e-01 -3.00000000e-01
 -2.00000000e-01 -1.00000000e-01 -2.22044605e-16  1.00000000e-01
  2.00000000e-01  3.00000000e-01  4.00000000e-01  5.00000000e-01
  6.00000000e-01  7.00000000e-01  8.00000000e-01  9.00000000e-01]
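The tiny value `-2.22044605e-16` in the output above (where one would expect `0`) comes from floating point rounding. The sketch below checks this and compares the result with `np.linspace`, which takes the number of points rather than the step; using `np.linspace` here is a suggestion of this example, not something the original cell does:

```python
import numpy as np

# The values are computed in floating point arithmetic, so rounding errors
# show up (a tiny non-zero value instead of exactly 0.0).
raf = np.arange(-1, 1, 0.1)
print(raf.size)                    # number of generated elements
print(raf[10] == 0.0)              # False: not exactly zero
print(np.isclose(raf[10], 0.0))    # True: zero up to rounding error

# np.linspace takes the number of points instead of the step.
lin = np.linspace(-1, 1, 20, endpoint=False)
print(np.allclose(raf, lin))       # True: same grid up to rounding error
```

When the number of points matters more than the exact step, specifying it explicitly avoids surprises caused by rounding near the end of the interval.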


### Properties of `numpy array`

Apart from the actual content, which is of course different because the specified ranges are different, the `ra` and `raf` arrays differ in their **`dtype`**:

In [5]:
print(f"dtype of 'ra': {ra.dtype}, dtype of 'raf': {raf.dtype}")

dtype of 'ra': int64, dtype of 'raf': float64


#### More properties of the `numpy array`

In [6]:
ra.itemsize # bytes per element

8

In [7]:
ra.nbytes # number of bytes

80

In [8]:
ra.ndim # number of dimensions

1

In [9]:
ra.shape # shape, i.e. number of elements per-dimension/axis

(10,)

In [10]:
## please replicate the same set of operations here for `raf`


In [11]:
# your code here

**Q**: Do you notice any relevant difference?

### `np.linspace` and `np.logspace`

Like `np.arange`, in numpy there are two other "similar" functions: 

- np.linspace
- np.logspace

Looking at the examples below, can you spot the difference?

In [12]:
np.linspace(0, 10, 20)

array([ 0.        ,  0.52631579,  1.05263158,  1.57894737,  2.10526316,
        2.63157895,  3.15789474,  3.68421053,  4.21052632,  4.73684211,
        5.26315789,  5.78947368,  6.31578947,  6.84210526,  7.36842105,
        7.89473684,  8.42105263,  8.94736842,  9.47368421, 10.        ])

In [13]:
np.logspace(0, np.e**2, 10, base=np.e)

array([1.00000000e+00, 2.27278564e+00, 5.16555456e+00, 1.17401982e+01,
       2.66829540e+01, 6.06446346e+01, 1.37832255e+02, 3.13263169e+02,
       7.11980032e+02, 1.61817799e+03])
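To make the comparison concrete, here is a small sketch (the ranges are chosen only for illustration): `np.linspace` returns a fixed number of evenly spaced points with both endpoints included by default, while `np.logspace` builds the same kind of grid on the exponents.

```python
import numpy as np

# linspace: `num` evenly spaced points, both endpoints included by default.
lin = np.linspace(0, 10, 20)
steps = np.diff(lin)
print(np.allclose(steps, 10 / 19))   # True: spacing is (stop - start) / (num - 1)

# logspace: an evenly spaced grid of exponents; base defaults to 10.
log = np.logspace(0, 3, 4)           # values 1, 10, 100, 1000
print(np.allclose(log, 10.0 ** np.linspace(0, 3, 4)))   # True
```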

## Random Number Generation

### `np.random.rand` & `np.random.randn`

In [14]:
# uniform random numbers in [0, 1)
ru = np.random.rand(10)

In [15]:
ru

array([0.1008622 , 0.10457894, 0.28894687, 0.48408672, 0.46725924,
       0.5321588 , 0.13349925, 0.81708913, 0.60289559, 0.48526737])

_Note: numbers and the content of the array may vary_

In [16]:
# standard normal distributed random numbers
rs = np.random.randn(10)

In [17]:
rs

array([ 0.51706645, -2.34518602,  1.15502622, -2.46003476, -1.17790258,
        3.56947467,  0.93810606, -1.70039961, -0.5510843 ,  0.38638648])

_Note: numbers and the content of the array may vary_

**Q**: What if I ask you to generate random numbers in a way that we both obtain the __very same__ numbers? (_Provided we share the same CPU architecture_)
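One possible answer is to fix the seed of the random number generator. The sketch below shows two ways of doing so; the seed value `42` is arbitrary, and the `default_rng` variant is an alternative to the `np.random.rand` style used in this notebook:

```python
import numpy as np

# Legacy global seeding, matching the np.random.rand API used above.
np.random.seed(42)          # 42 is an arbitrary seed chosen for this sketch
first = np.random.rand(5)
np.random.seed(42)
second = np.random.rand(5)
print(np.array_equal(first, second))   # True

# Generator-based alternative: each generator carries its own state.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)
draw_a = rng_a.random(5)
draw_b = rng_b.random(5)
print(np.array_equal(draw_a, draw_b))  # True
```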

## Zeros and Ones (or Empty)

### `np.zeros`, `np.ones`, `np.empty`

Sometimes we need to initialise an array of a specific shape filled with `zeros`, with `ones`, or with whatever `rubbish` happens to be in memory (i.e. `empty`):

In [18]:
Z = np.zeros((3,3))

print(Z)

[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]


In [19]:
O = np.ones((3, 3))
print(O)

[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]


In [20]:
E = np.empty(10)

print(E)

[0.51706645 2.34518602 1.15502622 2.46003476 1.17790258 3.56947467
 0.93810606 1.70039961 0.5510843  0.38638648]


In [None]:
# TRY THIS!

np.empty(9)

# Other specialised Functions

## Diagonal Matrices

### 1. `np.diag`

In [22]:
# a diagonal matrix
np.diag([1,2,3])

array([[1, 0, 0],
       [0, 2, 0],
       [0, 0, 3]])

In [23]:
# diagonal offset by one below the main diagonal (k=-1); [::-1] then reverses the row order
np.diag([1,2,3], k=-1)[::-1]

array([[0, 0, 3, 0],
       [0, 2, 0, 0],
       [1, 0, 0, 0],
       [0, 0, 0, 0]])

### Identity Matrix $\mathrm{I} \mapsto$  `np.eye`

In [24]:
# a diagonal matrix with ones on the main diagonal
np.eye(3, dtype='int')  # 3 is the number of rows (and, by default, of columns)

array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1]])

---

# Create `numpy.ndarray` from `list`

To create new vector or matrix arrays from Python lists we can use the 
`numpy.array` constructor function:

In [25]:
v = np.array([1,2,3,4])
v

array([1, 2, 3, 4])

In [26]:
print(type(v))

<class 'numpy.ndarray'>


**Alternatively** there is also the `np.asarray` function, which easily converts a Python list into a numpy array:



In [27]:
v = np.asarray([1, 2, 3, 4])
v

array([1, 2, 3, 4])
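For a plain list the two functions behave the same. They differ when the input is already an array, as this small sketch illustrates (the variable names are chosen only for this example):

```python
import numpy as np

v = np.array([1, 2, 3, 4])

same = np.asarray(v)   # input is already an ndarray: no copy is made
copy = np.array(v)     # np.array copies by default

print(same is v)       # True
print(copy is v)       # False

v[0] = 100
print(same[0], copy[0])   # 100 1
```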

In [28]:
print(type(v))

<class 'numpy.ndarray'>


We can use the very same strategy for higher-dimensional arrays.

E.g. Let's create a matrix from a list of lists:

In [29]:
M = np.array([[1, 2], [3, 4]])
M

array([[1, 2],
       [3, 4]])

In [30]:
v.shape, M.shape

((4,), (2, 2))

## So, why is it useful then?

So far the `numpy.ndarray` looks awfully much like a Python **list** (or **nested list**). 

*Why not simply use Python lists for computations instead of creating a new array type?*

There are several reasons:

* Python lists are very general. 
    - They can contain any kind of object. 
    - They are dynamically typed. 
    - They do not support mathematical functions such as matrix and dot multiplications, etc. 
    - Implementing such functions for Python lists would not be very efficient because of the dynamic typing.
    
    
* Numpy arrays are **statically typed** and **homogeneous**. 
    - The type of the elements is determined when array is created.
    
    
* Numpy arrays are memory efficient.
    - Because of the static typing, fast implementations of mathematical functions such as multiplication and addition of `numpy` arrays can be written in a compiled language (C and Fortran are used).
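The points above can be seen in a few lines. This is only an illustrative sketch with made-up values:

```python
import numpy as np

values = [1, 2, 3]

# For a list, * means repetition (and + means concatenation).
print(values * 2)            # [1, 2, 3, 1, 2, 3]

# For an array, the same operators act element by element.
arr = np.array(values)
print(arr * 2)               # [2 4 6]
print(arr + arr)             # [2 4 6]

# Homogeneous typing: mixing ints and floats gives one common dtype.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)           # float64
```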

In [31]:
L = range(100000)

In [32]:
%timeit [i**2 for i in L]

32.1 ms ± 6.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [33]:
a = np.arange(100000)

In [34]:
%timeit a**2  # Vectorized element-wise operation (the scalar 2 is broadcast to every element) - more on Broadcasting later!

55.5 µs ± 187 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [35]:
%timeit [element**2 for element in a]

36 ms ± 2.49 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


---

## Exercises: DIY

### Simple arrays

* Create simple one and two dimensional arrays. First, redo the examples
from above. And then create your own.

* Use the functions `len`, `shape` and `ndim` on some of those arrays and
observe their output.

### Creating arrays using functions

* Experiment with `arange`, `linspace`, `ones`, `zeros`, `eye` and `diag`.

* Create different kinds of arrays with random numbers.

* Try setting the seed before creating an array with random values 
    - *hint*: use `np.random.seed`


---

## Numpy Array Object

`NumPy` has a multidimensional array object called ndarray. It consists of two parts as follows:
   
   * The actual data
   * Some metadata describing the data
    
    
Many array operations leave the raw data untouched and change only the metadata.

<img src="images/ndarray_with_details.png" />

## Data vs Metadata (Attributes)

This internal separation between the actual data (i.e. the content of the array --> the `memory`) and the metadata (i.e. properties and attributes of the data) allows, for example, for efficient memory management.

For example, the shape of a NumPy array **can be modified without copying and/or affecting** the actual data, which makes it a fast operation even for large arrays.

In [36]:
a = np.arange(45)

a

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44])

In [37]:
a.shape

(45,)

In [38]:
A = a.reshape(9, 5)

A

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24],
       [25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34],
       [35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44]])

In [39]:
n, m = A.shape

In [40]:
B = A.reshape((1,n*m))
B

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,
        16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
        32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44]])

**Q**: What is the difference (in terms of shape) between `B` and the original `a`?
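To check that reshaping really does not copy the data, one can ask NumPy whether two arrays share memory. A minimal sketch, reusing the names `a` and `A` from the cells above:

```python
import numpy as np

a = np.arange(45)
A = a.reshape(9, 5)

print(np.shares_memory(a, A))   # True: A is a view on the data of a
print(A.base is a)              # True

A[0, 0] = -1                    # writing through the view...
print(a[0])                     # -1: ...changes the original array too
```

Note that the last two lines modify `a` as well, which is exactly the consequence of sharing one data buffer.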

### Introducing `np.newaxis`

In addition to the shape, we can also manipulate the axes of an array.

**(1)** We can always add as many axes as we want:

In [41]:
A = np.arange(20).reshape(10, 2)
A = A[np.newaxis, ...]  # this is called ellipsis

print(A.shape)

(1, 10, 2)


**(2)** We can also _permute_ axes:

In [42]:
A = A.swapaxes(0, 2)  # swap axis 0 with axis 2 --> new shape: (2, 10, 1)

print(A.shape)

(2, 10, 1)


Again, changing and manipulating the axes will not touch the memory; it will just change the parameters (i.e. `strides` and `offset`) used to navigate the data.
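The `strides` attribute makes this visible: it stores, for each axis, how many bytes to skip to reach the next element. A small sketch, with the `dtype` fixed to `int64` as an assumption so that the numbers are predictable:

```python
import numpy as np

# dtype fixed to int64 (8 bytes per element) so the strides are predictable.
A = np.arange(20, dtype=np.int64).reshape(10, 2)
print(A.strides)                 # (16, 8): bytes to step along each axis

S = A.swapaxes(0, 1)             # shape (2, 10)
print(S.strides)                 # (8, 16): same buffer, strides swapped
print(np.shares_memory(A, S))    # True
```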

---

## Numerical Types and Precision

In NumPy, talking about `int` or `float` does not make "real sense". This is mainly for two reasons:

(a) `int` or `float` are assumed at the maximum precision available on your machine (presumably `int64` and 
`float64`, respectively).

(b) Different precisions imply different numerical ranges, and so different memory sizes (i.e. _number of bytes_ required to represent all the numbers in the corresponding numerical range).

NumPy supports the following numerical types:

    bool                  | This stores a boolean (True or False) as a byte

    int_                  | This is a platform integer (normally either int32 or int64)
    int8                  | This is an integer ranging from -128 to 127
    int16                 | This is an integer ranging from -32768 to 32767
    int32                 | This is an integer ranging from -2 ** 31 to 2 ** 31 - 1
    int64                 | This is an integer ranging from -2 ** 63 to 2 ** 63 - 1
    
    uint8                 | This is an unsigned integer ranging from 0 to 255
    uint16                | This is an unsigned integer ranging from 0 to 65535
    uint32                | This is an unsigned integer ranging from 0 to 2 ** 32 - 1
    uint64                | This is an unsigned integer ranging from 0 to 2 ** 64 - 1

    float16               | This is a half precision float with sign bit, 5 bits exponent, and 10 bits mantissa
    float32               | This is a single precision float with sign bit, 8 bits exponent, and 23 bits mantissa
    float64 or float      | This is a double precision float with sign bit, 11 bits exponent, and 52 bits mantissa
    complex64             | This is a complex number represented by two 32-bit floats (real and imaginary components)
    complex128 or complex | This is a complex number represented by two 64-bit floats (real and imaginary components)


### Numerical Types and Representation

The **numerical dtype** of an array should be selected very carefully, as it directly affects the numerical representation of elements, that is: 

   * the number of **bytes** used; 
   * the *numerical range*

We can **always specify** the `dtype` of an array when we create one. If we do not, the `dtype` of the array will be inferred, namely `np.int_` or `np.float64` depending on the case.

In [43]:
a = np.arange(10)
print(a)

print(a.dtype)

[0 1 2 3 4 5 6 7 8 9]
int64


In [44]:
au = np.arange(10, dtype=np.uint8)
print(au)

print(au.dtype)

[0 1 2 3 4 5 6 7 8 9]
uint8
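The choice of `dtype` directly determines how much memory the array needs. A minimal sketch comparing the two cases above, with `int64` written explicitly (an assumption, since the default integer type depends on the platform):

```python
import numpy as np

a64 = np.arange(10, dtype=np.int64)
a8 = np.arange(10, dtype=np.uint8)

print(a64.itemsize, a64.nbytes)   # 8 80
print(a8.itemsize, a8.nbytes)     # 1 10
```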


So, then: **What happens if I try to represent a number that is out of range?**

Let's have a go with **integers**, i.e., `int8` and `uint8`

In [45]:
x = np.zeros(4, 'int8')  # Integer ranging from -128 to 127
x

array([0, 0, 0, 0], dtype=int8)

>__Spoiler Alert__: _very simple example of indexing in NumPy_
>
> _Well...it works as expected, doesn't it?_

In [46]:
x[0] = 127
x

array([127,   0,   0,   0], dtype=int8)

In [47]:
x[0] = 128
x

array([-128,    0,    0,    0], dtype=int8)

In [48]:
x[1] = 129
x

array([-128, -127,    0,    0], dtype=int8)

In [49]:
x[2] = 257  # i.e. (128 x 2) + 1
x

array([-128, -127,    1,    0], dtype=int8)

In [50]:
ux = np.zeros(4, 'uint8')  # Integer ranging from 0 to 255, dtype also as string!
ux

array([0, 0, 0, 0], dtype=uint8)

In [51]:
ux[0] = 255
ux[1] = 256
ux[2] = 257
ux[3] = 513  # (256 x 2) + 1
ux

array([255,   0,   1,   1], dtype=uint8)
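The behaviour of assigning an out-of-range Python integer directly to an element has changed across NumPy versions (newer releases may raise an `OverflowError` instead of wrapping silently). The same wrap-around can be reproduced in a version-independent way by casting an existing array with `astype`, as in this sketch (the array `big` is made up for the example):

```python
import numpy as np

big = np.array([127, 128, 129, 257])    # default integer dtype, all values fit

# astype uses casting='unsafe' by default, so out-of-range values wrap around.
as_int8 = big.astype(np.int8)
as_uint8 = big.astype(np.uint8)

print(as_int8)     # 127, -128, -127, 1
print(as_uint8)    # 127, 128, 129, 1
```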

### Machine Info and Supported Numerical Representation

Numpy provides two functions to inspect the information of supported integer and floating-point types, namely `np.iinfo` and `np.finfo`:

In [52]:
np.iinfo(np.int32)

iinfo(min=-2147483648, max=2147483647, dtype=int32)

In [53]:
np.finfo(np.float16)

finfo(resolution=0.001, min=-6.55040e+04, max=6.55040e+04, dtype=float16)

In addition, the `MachAr` class provides information on the current machine:

In [54]:
machine_info = np.MachAr()

In [55]:
machine_info.epsilon

2.220446049250313e-16

In [57]:
machine_info.huge

1.7976931348623157e+308

In [58]:
np.finfo(np.float64).max == machine_info.huge

True

In [None]:
# TRY THIS!

help(machine_info)
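`np.MachAr` has been deprecated and may not be available in recent NumPy releases. As a hedged alternative, the same quantities can be read from `np.finfo`, as sketched here:

```python
import numpy as np

info = np.finfo(np.float64)

print(info.eps)    # 2.220446049250313e-16
print(info.max)    # 1.7976931348623157e+308
print(info.tiny)   # smallest positive normal number

# eps is the gap between 1.0 and the next representable float64.
print(1.0 + info.eps > 1.0)        # True
print(1.0 + info.eps / 2 == 1.0)   # True
```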

# Data Type Object

**Data type objects** are instances of the `numpy.dtype` class. 

Once again, arrays have a data type. 
<br>
To be precise, *every element* in a NumPy array has the same data type. 

The data type object can tell you the `size` of the data in bytes.
<br>
(**Recall**: The size in bytes is given by the `itemsize` attribute of the dtype class)

In [59]:
a = np.arange(7, dtype=np.uint16)
print('a itemsize: ', a.itemsize)
print('a.dtype.itemsize: ', a.dtype.itemsize)

a itemsize:  2
a.dtype.itemsize:  2


### Character Codes

Character codes are included for backward compatibility with **Numeric**, the predecessor of NumPy. 
<br>
Their use is not recommended, but these codes pop up in several places. 

You should use the **dtype** objects instead. 

    integer                     i
    unsigned integer            u
    single precision float      f
    double precision float      d
    bool                        ?
    complex                     D
    string                      S
    unicode                     U

### `dtype` constructors

In [60]:
np.dtype(float)

dtype('float64')

In [61]:
np.dtype('f')

dtype('float32')

In [62]:
np.dtype('d')

dtype('float64')

In [63]:
np.dtype('f8')

dtype('float64')

In [64]:
np.dtype('U10')  # Unicode string of up to 10 chars

dtype('<U10')

**Note**: A listing of all data type names can be found by calling `np.sctypeDict.keys()`
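Character codes and dtype objects describe the same types, so they can be compared directly. A short sketch of that equivalence, in both directions:

```python
import numpy as np

print(np.dtype('f') == np.float32)    # True
print(np.dtype('d') == np.float64)    # True
print(np.dtype('?') == np.bool_)      # True
print(np.dtype('i4') == np.int32)     # True: 'i' followed by the size in bytes

# Going the other way: the character code of a dtype object.
print(np.dtype(np.float32).char)      # f
print(np.dtype(np.float64).char)      # d
```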

## Custom `dtype`

We can use the `np.dtype` constructor to create a **custom** record type.

In [65]:
rt = np.dtype([('name', np.str_, 40), ('numitems', np.int32), ('price', np.float32)])

In [66]:
rt['name']  # see the difference with Python 2

dtype('<U40')

In [67]:
rt['numitems']

dtype('int32')

In [68]:
rt['price']

dtype('float32')

* Instantiate an array of `dtype` equal to `rt` (record type)


In [69]:
record_items = np.array([('Meaning of life DVD', 42, 3.14), ('Butter', 13, 2.72)], 
                        dtype=rt)

In [70]:
print(record_items)

[('Meaning of life DVD', 42, 3.14) ('Butter', 13, 2.72)]
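Once such a record array exists, its fields can be accessed by name. The following sketch rebuilds the same `rt` and `record_items` so that it runs on its own; the price threshold of 3 is arbitrary:

```python
import numpy as np

rt = np.dtype([('name', np.str_, 40), ('numitems', np.int32), ('price', np.float32)])
record_items = np.array([('Meaning of life DVD', 42, 3.14), ('Butter', 13, 2.72)],
                        dtype=rt)

names = record_items['name']              # one field, for all records
total = record_items['numitems'].sum()    # 42 + 13
second_name = record_items[1]['name']     # one field of one record

# Boolean selection on one field, then read another field.
expensive = record_items[record_items['price'] > 3]['name']

print(names)
print(total)          # 55
print(second_name)    # Butter
print(expensive)
```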


---

# Exercises - Basic Numpy

## Ex 1. Create an array containing integers from $2$ to $2^6$


## 1.1 Print `ndarray` attributes and properties
(e.g. `type`, `dtype`, `shape`, ...) using the previous one

## Ex 2. Create a 3x3 Matrix array and fill it with random integer numbers

- _hint_: Take a look at `np.random.randint`

## Ex 3. Create a list containing $5$ other lists of integers, all of the same size. Convert this list of lists into a matrix (i.e. `numpy.ndarray`)


#### Ex 3.1 What happens if we generate an array converting a list of lists of different size/length?