{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction to SciPy\n", "Tutorial at EuroSciPy 2019, Bilbao" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Getting started" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We first import a number of packages." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "np.__version__" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import scipy\n", "scipy.__version__" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from scipy import fftpack, optimize, signal, stats" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib notebook\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Rocking motion of a TGV" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our data are stored in a compressed file `TGV_data.csv.bz2`. Each row of the uncompressed file contains entries separated by commas and the first row contains labels explaining the content of the respective column.\n", "\n", "NumPy provides rather universal tools to import data from files into NumPy arrays. We will use `genfromtxt` which allows to deal with `bz2` compressed data files and also handles the labels in the first row of the data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.info(np.genfromtxt)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data = np.genfromtxt('TGV_data.csv.bz2', delimiter=',', names=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are five columns identified by names:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data.dtype.names" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "time = data['Time_s']\n", "omega_x = data['Gyroscope_x_rads']\n", "omega_y = data['Gyroscope_y_rads']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us first get an idea of the data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.plot(time, omega_x)\n", "plt.plot(time, omega_y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Statistical analysis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We use the data for $\\omega_x$ to demonstrate some statistical analysis. Let us first take a look at a histogram of the data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "n, bins = np.histogram(omega_x, bins=100, density=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Because we have set `density=True`, the data can be considered as a normalized probability distribution." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.sum(n)*(bins[1]-bins[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The array `bins` contains the edges of the bins. To plot the histogram, we need their centers." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "bincenters = 0.5*(bins[1:]+bins[:-1])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.plot(bincenters, n)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With SciPy we can easily determine some statistical characteristics from the original data:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "description = stats.describe(omega_x, ddof=False)\n", "description" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can compare our histogram with a Gaussian for the mean and variance just obtained." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "loc = description.mean\n", "scale = np.sqrt(description.variance)\n", "loc, scale" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = np.linspace(-0.1, 0.1, 200)\n", "plt.plot(x, stats.norm.pdf(x, loc, scale))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A maximum likelihood fit with a Gaussian will yield the same result." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "stats.norm.fit(omega_x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How can we fit the histogram to a Gaussian? Use nonlinear least-squares curve fitting." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def gaussian(x, loc, scale):\n", " return stats.norm.pdf(x, loc, scale)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "popt, pcov = optimize.curve_fit(gaussian, bincenters, n)\n", "popt" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.plot(x, gaussian(x, *popt))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How likely is it that our data follow a normal distribution?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "stats.normaltest(omega_x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Normally distributed data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "normdata = stats.norm.rvs(1, 0.2, 5000)\n", "plt.plot(normdata)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "n, bins = np.histogram(normdata, bins=100, density=True)\n", "bincenters = 0.5*(bins[1:]+bins[:-1])\n", "plt.plot(bincenters, n)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = np.linspace(0, 1.6, 200)\n", "plt.plot(x, stats.norm.pdf(x, *stats.norm.fit(normdata)))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "popt, pcov = optimize.curve_fit(gaussian, bincenters, n)\n", "popt" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.plot(x, gaussian(x, *popt))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "stats.normaltest(normdata)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Distribution with skewness" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for a in range(-4, 5, 2):\n", " x = np.linspace(stats.skewnorm.ppf(0.001, a),\n", " stats.skewnorm.ppf(0.999, a), 100)\n", " plt.plot(x, stats.skewnorm.pdf(x, a))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Spectral properties" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before analyzing the data any further, let us take a look at the time elapsed between subsequent measurements." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "time_intervals = time[1:]-time[:-1]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "_ = plt.hist(time_intervals, bins=100)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will ignore these small differences and assume the data to be equally spaced in time with the mean time difference between subsequent data points." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "delta_t = np.mean(time_intervals)\n", "delta_t" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will do the spectral analysis with the data for $\\omega_y$." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.plot(*signal.periodogram(omega_y, 1/delta_t))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "z = fftpack.rfft(omega_y)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.plot(z[::2])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.plot(z[1::2])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "freqs = fftpack.rfftfreq(omega_y.shape[0], delta_t)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.plot(freqs[::2], z[::2])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.plot(freqs[1::2], z[1::2])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We are interested in the signal around 1.4 Hz. Filter out frequencies beyond 2 Hz." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "filter_coeffs = signal.firwin(301, 3, pass_zero=True, fs=1/delta_t)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "freqs, response = signal.freqz(filter_coeffs, fs=1/delta_t)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.plot(freqs, 20*np.log10(np.abs(response)))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.polar(np.angle(response), np.abs(response))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.plot(time, omega_y)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "omega_y_filtered = signal.convolve(omega_y, filter_coeffs, mode='valid')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "omega_y.shape, omega_y_filtered.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.plot(time[150:-150], omega_y[150:-150])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.plot(time[150:-150], omega_y_filtered)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Filter out anything but the range from 53-70 Hz" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "filter_coeffs = signal.firwin(301, [53, 70], pass_zero='bandpass', fs=1/delta_t)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "freqs, response = signal.freqz(filter_coeffs, fs=1/delta_t)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.plot(freqs, 20*np.log10(abs(response)))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "omega_y_filtered = signal.convolve(omega_y, filter_coeffs, mode='valid')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.plot(time[150:-150], omega_y[150:-150])\n", "plt.plot(time[150:-150], omega_y_filtered)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.plot(fftpack.rfftfreq(omega_y_filtered.shape[0], delta_t)[::2],\n", " fftpack.rfft(omega_y_filtered)[::2])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "freq, sp_time, Sxx = signal.spectrogram(omega_y, fs=1/delta_t, nperseg=512)\n", "plt.pcolormesh(freq, sp_time, Sxx)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from matplotlib import cm\n", "from mpl_toolkits.mplot3d import Axes3D\n", "fig = plt.figure()\n", "ax = Axes3D(fig)\n", "ax.plot_surface(freq[:, np.newaxis], time, Sxx, cmap=cm.viridis)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 2 }