{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lab 2: Feature Processing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Feature standardization" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `vinho verde` data set contains physico-chemical measurements for a number of Portuguese wines, together with their ratings by human tasters.\n", "\n", "Our goal is to use these data to automatically predict the rating of a wine, so as to assist oenologists, improve wine production, and target the taste of niche consumers.\n", "\n", "This data set is available from the UCI Machine Learning Repository, one of the oldest and best-known repositories of ML problems.\n", "\n", "It can be downloaded from http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/ (a copy is already in your repository). We will focus on white wines here." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "data = pd.read_csv('data/winequality-white.csv', sep=\";\")" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "pandas.core.frame.DataFrame" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have loaded the data into a _pandas DataFrame_ object. Let us examine what information is available:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|   | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.0 | 0.27 | 0.36 | 20.7 | 0.045 | 45.0 | 170.0 | 1.0010 | 3.00 | 0.45 | 8.8 | 6 |
| 1 | 6.3 | 0.30 | 0.34 | 1.6 | 0.049 | 14.0 | 132.0 | 0.9940 | 3.30 | 0.49 | 9.5 | 6 |
| 2 | 8.1 | 0.28 | 0.40 | 6.9 | 0.050 | 30.0 | 97.0 | 0.9951 | 3.26 | 0.44 | 10.1 | 6 |
| 3 | 7.2 | 0.23 | 0.32 | 8.5 | 0.058 | 47.0 | 186.0 | 0.9956 | 3.19 | 0.40 | 9.9 | 6 |
| 4 | 7.2 | 0.23 | 0.32 | 8.5 | 0.058 | 47.0 | 186.0 | 0.9956 | 3.19 | 0.40 | 9.9 | 6 |
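
The wine features shown above live on very different scales (chlorides are around 0.05, total sulfur dioxide is above 100), which is the usual motivation for standardization. As a minimal sketch, assuming z-score standardization with the population standard deviation (the convention used by scikit-learn's `StandardScaler`), the idea can be illustrated on three columns copied from the five rows above. The names `sample` and `standardize` are hypothetical and not part of the lab.

```python
import pandas as pd

# Three features copied from the five rows displayed above.
sample = pd.DataFrame({
    "residual sugar": [20.7, 1.6, 6.9, 8.5, 8.5],
    "chlorides": [0.045, 0.049, 0.050, 0.058, 0.058],
    "total sulfur dioxide": [170.0, 132.0, 97.0, 186.0, 186.0],
})

def standardize(frame):
    """Center each column to mean 0 and scale it to unit variance.

    ddof=0 selects the population standard deviation (an assumption
    made here to match the StandardScaler convention).
    """
    return (frame - frame.mean()) / frame.std(ddof=0)

standardized = standardize(sample)
print(standardized.round(3))
```

After this transformation every column has mean 0 and standard deviation 1, so no feature dominates a distance computation merely because of its units.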
|   | class | cap-shape | cap-surface | cap-color | bruises | odor | gill-attachment | gill-spacing | gill-size | gill-color | ... | stalk-surface-below-ring | stalk-color-above-ring | stalk-color-below-ring | veil-type | veil-color | ring-number | ring-type | spore-print-color | population | habitat |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | p | x | s | n | t | p | f | c | n | k | ... | s | w | w | p | w | o | p | k | s | u |
| 1 | e | x | s | y | t | a | f | c | b | k | ... | s | w | w | p | w | o | p | n | n | g |
| 2 | e | b | s | w | t | l | f | c | b | n | ... | s | w | w | p | w | o | p | n | n | m |
| 3 | p | x | y | w | t | p | f | c | n | n | ... | s | w | w | p | w | o | p | k | s | u |
| 4 | e | x | s | g | f | n | f | w | b | k | ... | s | w | w | p | w | o | e | n | a | g |

5 rows × 23 columns

|   | class | cap-shape | cap-surface | cap-color | bruises | odor | gill-attachment | gill-spacing | gill-size | gill-color | ... | stalk-surface-below-ring | stalk-color-above-ring | stalk-color-below-ring | veil-type | veil-color | ring-number | ring-type | spore-print-color | population | habitat |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 5 | 2 | 4 | 1 | 6 | 1 | 0 | 1 | 4 | ... | 2 | 7 | 7 | 0 | 2 | 1 | 4 | 2 | 3 | 5 |
| 1 | 0 | 5 | 2 | 9 | 1 | 0 | 1 | 0 | 0 | 4 | ... | 2 | 7 | 7 | 0 | 2 | 1 | 4 | 3 | 2 | 1 |
| 2 | 0 | 0 | 2 | 8 | 1 | 3 | 1 | 0 | 0 | 5 | ... | 2 | 7 | 7 | 0 | 2 | 1 | 4 | 3 | 2 | 3 |
| 3 | 1 | 5 | 3 | 8 | 1 | 6 | 1 | 0 | 1 | 5 | ... | 2 | 7 | 7 | 0 | 2 | 1 | 4 | 2 | 3 | 5 |
| 4 | 0 | 5 | 2 | 3 | 0 | 5 | 1 | 1 | 0 | 4 | ... | 2 | 7 | 7 | 0 | 2 | 1 | 0 | 3 | 0 | 1 |

5 rows × 23 columns
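
The two tables above show the same five rows before and after each letter code was replaced by an integer. The code that produced the second table is not part of this excerpt, so the following is only a sketch of one way to obtain such an encoding, assuming each column's categories are numbered in sorted order. It uses pandas categorical codes on a toy frame built from the first three columns; `toy` and `encode_labels` are hypothetical names.

```python
import pandas as pd

# Toy frame copied from the first three columns of the five rows shown above.
toy = pd.DataFrame({
    "class": ["p", "e", "e", "p", "e"],
    "cap-shape": ["x", "x", "b", "x", "x"],
    "cap-surface": ["s", "s", "s", "y", "s"],
})

def encode_labels(frame):
    """Map each column's categories to integers 0..k-1 in sorted order."""
    return frame.apply(lambda col: col.astype("category").cat.codes)

encoded = encode_labels(toy)
print(encoded)
```

On this toy frame `class` becomes 1, 0, 0, 1, 0, which matches the table above. The other columns get smaller codes than in the table (for example `x` maps to 1 here rather than 5) because the toy frame contains only the categories present in these five rows, whereas the full data set contains more.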
\n", "