
revision of statistics part

Gert-Ludwig Ingold 5 years ago
commit a23ab12f95
1 changed file with 62 additions and 17 deletions

+ 62 - 17
notebooks/1_tgv_stats.ipynb

@@ -32,9 +32,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Our data are stored in a compressed file `TGV_data.csv.bz2`. Each row of the uncompressed file contains entries separated by commas and the first row contains labels explaining the content of the respective column.\n",
-    "\n",
-    "NumPy provides rather universal tools to import data from files into NumPy arrays. We will use `genfromtxt`, which can read `bz2` compressed data files and also handles the labels in the first row of the data."
+    "Our gyroscope data are stored in a compressed file `data/TGV_data.csv.bz2`. Each row of the uncompressed file contains entries separated by commas and the first row contains labels explaining the content of the respective column."
    ]
   },
   {
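A minimal sketch of the import step described in the hunk above, assuming `np.genfromtxt` with `delimiter=','` and `names=True`. The file contents are invented for illustration: the header labels "Time (s)" and "Gyroscope x (rad/s)" are a guess, chosen because NumPy's default name validation (spaces replaced by `_`, characters such as `(`, `)` and `/` deleted) turns them into the field names `Time_s` and `Gyroscope_x_rads` used in the notebook.

```python
import bz2
import os
import tempfile

import numpy as np

# Hypothetical miniature of the data file: labels in the first row,
# comma-separated values below (the real labels and values may differ).
CSV_TEXT = (
    "Time (s),Gyroscope x (rad/s),Gyroscope y (rad/s)\n"
    "0.00,0.010,-0.002\n"
    "0.01,0.012,-0.001\n"
    "0.02,0.008,0.000\n"
)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "TGV_sample.csv.bz2")
    # Write the text bz2-compressed, mimicking the layout of the real file.
    with bz2.open(path, "wt") as f:
        f.write(CSV_TEXT)
    # genfromtxt decompresses .bz2 files transparently; names=True turns the
    # first row into the field names of a structured array.
    data = np.genfromtxt(path, delimiter=",", names=True)

# Columns are selected by field name, as in the notebook.
time = data["Time_s"]
omega_x = data["Gyroscope_x_rads"]
```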
@@ -69,8 +67,7 @@
    "outputs": [],
    "source": [
     "time = data['Time_s']\n",
-    "omega_x = data['Gyroscope_x_rads']\n",
-    "omega_y = data['Gyroscope_y_rads']"
+    "omega_x = data['Gyroscope_x_rads']"
    ]
   },
   {
@@ -86,8 +83,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "plt.plot(time, omega_x)\n",
-    "plt.plot(time, omega_y)"
+    "_ = plt.plot(time, omega_x)"
    ]
   },
   {
@@ -101,7 +97,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "We use the data for $\\omega_x$ to demonstrate some statistical analysis. Let us first take a look at a histogram of the data."
+    "We use the data for $\\omega_x$ to demonstrate some aspects of statistical analysis with SciPy. Let us first take a look at a histogram of the data."
    ]
   },
   {
@@ -133,6 +129,15 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
+    "<div style=\"color: DarkBlue;background-color: LightYellow\">\n",
+    "    <i>Exercise:</i> Check the values of <code>n</code>. What happens if you set <code>density=False</code>?\n",
+    "</div>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
     "The array `bins` contains the edges of the bins. To plot the histogram, we need their centers."
    ]
   },
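For the exercise on `density` and the computation of bin centers, a sketch with `np.histogram`, which takes the same `bins` and `density` arguments as `plt.hist` and returns `n` and `bins` without plotting. Normally distributed random numbers stand in for `omega_x`; the choice of 50 bins is arbitrary.

```python
import numpy as np

# Synthetic stand-in for omega_x (the notebook uses the gyroscope data).
rng = np.random.default_rng(seed=0)
samples = rng.normal(loc=0.0, scale=0.02, size=10_000)

# density=True: n holds probability densities, normalized so that the
# histogram integrates to 1.
n, bins = np.histogram(samples, bins=50, density=True)
area = np.sum(n * np.diff(bins))

# density=False: n holds raw counts that add up to the number of samples.
counts, _ = np.histogram(samples, bins=50, density=False)

# bins holds the 51 edges of the 50 bins; the centers are the midpoints.
bincenters = 0.5 * (bins[:-1] + bins[1:])
```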
@@ -151,7 +156,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "plt.plot(bincenters, n)"
+    "_ = plt.plot(bincenters, n)"
    ]
   },
   {
@@ -175,7 +180,38 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "We can compare our histogram with a Gaussian for the mean and variance just obtained."
+    "* mean $\\langle x\\rangle$\n",
+    "\n",
+    "\n",
+    "* variance $\\langle (x-\\langle x\\rangle)^2\\rangle$\n",
+    "\n",
+    "\n",
+    "* skewness $\\dfrac{\\langle(x-\\langle x\\rangle)^3\\rangle}{\\langle(x-\\langle x\\rangle)^2\\rangle^{3/2}}$\n",
+    "\n",
+    "\n",
+    "* kurtosis $\\dfrac{\\langle(x-\\langle x\\rangle)^4\\rangle}{\\langle(x-\\langle x\\rangle)^2\\rangle^2}-3$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<div style=\"color: DarkBlue;background-color: LightYellow\">\n",
+    "    <i>Exercise:</i> Check the result for the kurtosis. Keep in mind that two definitions are in use, one with 3 subtracted and one without.\n",
+    "</div>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We can compare our histogram with a Gaussian for the mean and variance just obtained.\n",
+    "\n",
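+    "The probability density function of the standard normal distribution is\n",
+    "$$p(x) = \\frac{1}{\\sqrt{2\\pi}}\\exp\\left(-\\frac{x^2}{2}\\right)$$\n",
+    "\n",
+    "For a general mean $\\mathrm{loc}$ and standard deviation $\\mathrm{scale}$, the variable is shifted and rescaled,\n",
+    "$$x = \\frac{y-\\mathrm{loc}}{\\mathrm{scale}},$$\n",
+    "and the density of $y$ is $p(x)/\\mathrm{scale}$."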
    ]
   },
   {
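A sketch that evaluates the four definitions from the list of moments directly with NumPy and compares them with SciPy, using synthetic normal data in place of `omega_x`. It also bears on the kurtosis exercise: `stats.kurtosis` subtracts 3 by default (`fisher=True`), and `stats.describe` reports the variance with `ddof=1`, so its variance differs slightly from the plain average $\langle (x-\langle x\rangle)^2\rangle$.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the gyroscope data omega_x.
rng = np.random.default_rng(seed=1)
x = rng.normal(loc=0.0, scale=0.02, size=100_000)

# Moments following the definitions, with <...> taken as the sample average.
mean = np.mean(x)
dx = x - mean
variance = np.mean(dx**2)
skewness = np.mean(dx**3) / variance**1.5
kurtosis = np.mean(dx**4) / variance**2 - 3  # excess kurtosis, 3 subtracted

# SciPy counterparts. kurtosis() subtracts 3 by default (fisher=True);
# fisher=False returns the definition without the subtraction.
scipy_skew = stats.skew(x)
scipy_kurt_excess = stats.kurtosis(x)
scipy_kurt_plain = stats.kurtosis(x, fisher=False)

# describe() reports the variance with ddof=1, i.e. divided by N-1 instead of N.
summary = stats.describe(x)
```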
@@ -196,7 +232,7 @@
    "outputs": [],
    "source": [
     "x = np.linspace(-0.1, 0.1, 200)\n",
-    "plt.plot(x, stats.norm.pdf(x, loc, scale))"
+    "_ = plt.plot(x, stats.norm.pdf(x, loc, scale))"
    ]
   },
   {
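To check the relation between `loc`, `scale` and the standard normal density numerically, a sketch comparing a hand-written $p(x)$, shifted, rescaled and divided by `scale`, with `stats.norm.pdf(y, loc, scale)`. The values of `loc` and `scale` are illustrative, not the ones obtained from the gyroscope data.

```python
import numpy as np
from scipy import stats


def standard_normal_pdf(x):
    """Standard normal density p(x) = exp(-x**2/2) / sqrt(2*pi)."""
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)


loc, scale = 0.001, 0.02  # illustrative values only
y = np.linspace(-0.1, 0.1, 200)

# Shift and rescale the variable, then divide by scale so that the
# density of y still integrates to 1.
p_manual = standard_normal_pdf((y - loc) / scale) / scale
p_scipy = stats.norm.pdf(y, loc, scale)
```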
@@ -248,7 +284,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "plt.plot(x, gaussian(x, *popt))"
+    "_ = plt.plot(x, gaussian(x, *popt))"
    ]
   },
   {
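The hunk above plots `gaussian(x, *popt)`, but the definition of `gaussian` and the fit producing `popt` are not part of this diff. A plausible version, assuming a three-parameter Gaussian fitted to the histogram with `scipy.optimize.curve_fit`, might look like this; the parameterization `(a, x0, sigma)` and the synthetic data are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit


def gaussian(x, a, x0, sigma):
    """Hypothetical model: Gaussian with amplitude a, center x0, width sigma."""
    return a * np.exp(-((x - x0) ** 2) / (2 * sigma**2))


# Synthetic stand-in for the histogram of omega_x.
rng = np.random.default_rng(seed=2)
samples = rng.normal(loc=0.0, scale=0.02, size=20_000)
n, bins = np.histogram(samples, bins=60, density=True)
bincenters = 0.5 * (bins[:-1] + bins[1:])

# p0 gives curve_fit a starting point; without it every parameter starts
# at 1, far from a width of order 0.02.
p0 = [n.max(), samples.mean(), samples.std()]
popt, pcov = curve_fit(gaussian, bincenters, n, p0=p0)
perr = np.sqrt(np.diag(pcov))  # one-standard-deviation parameter uncertainties
```

Since only `sigma**2` enters the model, the fitted width may come out with either sign; its absolute value is the meaningful quantity.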
@@ -271,7 +307,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Normally distributed data"
+    "### Generation of normally distributed data"
    ]
   },
   {
@@ -281,7 +317,7 @@
    "outputs": [],
    "source": [
     "normdata = stats.norm.rvs(1, 0.2, 5000)\n",
-    "plt.plot(normdata)"
+    "_ = plt.plot(normdata)"
    ]
   },
   {
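A sketch of the generation step with the same `stats.norm.rvs(1, 0.2, 5000)` call, plus a `random_state` argument (an addition, not in the notebook) so that the realization is reproducible. For the normal distribution, `stats.norm.fit` returns the maximum-likelihood estimates, which are the sample mean and the standard deviation with divisor $N$.

```python
import numpy as np
from scipy import stats

# Same call as in the notebook (loc=1, scale=0.2, size=5000),
# with a fixed random_state for reproducibility.
normdata = stats.norm.rvs(1, 0.2, 5000, random_state=42)

# Maximum-likelihood estimates of (loc, scale).
loc_hat, scale_hat = stats.norm.fit(normdata)
```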
@@ -302,7 +338,7 @@
    "outputs": [],
    "source": [
     "x = np.linspace(0, 1.6, 200)\n",
-    "plt.plot(x, stats.norm.pdf(x, *stats.norm.fit(normdata)))"
+    "_ = plt.plot(x, stats.norm.pdf(x, *stats.norm.fit(normdata)))"
    ]
   },
   {
@@ -321,7 +357,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "plt.plot(x, gaussian(x, *popt))"
+    "_ = plt.plot(x, gaussian(x, *popt))"
    ]
   },
   {
@@ -337,7 +373,16 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Distribution with skewness"
+    "<div style=\"color: DarkBlue;background-color: LightYellow\">\n",
+    "    <i>Exercise:</i> Do these results depend on the number of samples and the realization of the random variates?\n",
+    "</div>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Distribution with skewness"
    ]
   },
   {
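One way to approach the exercise on sample size and realization: repeat the generation and fit for several sample sizes and look at the spread of the fitted parameters across realizations. The sample sizes, the 20 repetitions and the seed are arbitrary choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=3)
results = {}
for size in (100, 1_000, 10_000, 100_000):
    # Fit 20 independent realizations of the same distribution.
    fits = np.array([
        stats.norm.fit(stats.norm.rvs(1, 0.2, size, random_state=rng))
        for _ in range(20)
    ])
    # Spread of the fitted (loc, scale) across realizations.
    results[size] = fits.std(axis=0)

for size, (sd_loc, sd_scale) in results.items():
    print(f"{size:>7d}: std(loc) = {sd_loc:.4f}, std(scale) = {sd_scale:.4f}")
```

For independent samples, the spread of the fitted `loc` is expected to shrink roughly like $1/\sqrt{N}$.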