{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lab 6: Trees and forests" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The goal of this lab is to explore and understand tree-based models for classification problems.\n", "\n", "We will look in turn at decision trees, bagged trees, and random forests." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Import required libraries" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Populating the interactive namespace from numpy and matplotlib\n" ] } ], "source": [ "# import required libraries\n", "import time\n", "import math\n", "import pandas as pd\n", "%pylab inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Classification data\n", "We will use the same data as in Lab 4: the samples are tumors, each described by the expression (i.e., the abundance) of 3,000 genes. The goal is to distinguish the endometrium tumors from the uterine ones." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|  | ID_REF | 1554530_at | 1553185_at | 1554340_a_at | 1556202_at | 1553957_at | 1555469_a_at | 1553660_at | 1554681_a_at | 1554938_a_at | ... | 1553967_at | 1553362_at | 1553002_at | 1556194_a_at | 1556420_s_at | 1555855_at | 1554508_at | 1555097_a_at | 1556371_at | Tissue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 117722 | 10.8 | 13233.7 | 27.2 | 167.8 | 450.7 | 283.8 | 6.4 | 8.6 | 26.7 | ... | 165.2 | 43.7 | 77.0 | 42.2 | 154.8 | 266.6 | 444.0 | 66.9 | 50.6 | Endometrium |
| 1 | 76638 | 12.6 | 4986.8 | 1.7 | 221.1 | 380.8 | 394.3 | 121.2 | 8.0 | 153.8 | ... | 190.7 | 3.2 | 84.0 | 183.0 | 288.0 | 20.6 | 99.3 | 6.4 | 12.2 | Endometrium |
| 2 | 88952 | 16.6 | 6053.8 | 121.4 | 342.7 | 217.6 | 367.9 | 159.7 | 10.8 | 124.4 | ... | 95.9 | 17.1 | 72.3 | 292.9 | 209.5 | 11.6 | 51.3 | 33.8 | 33.4 | Endometrium |
| 3 | 76632 | 9.9 | 6109.1 | 23.0 | 139.3 | 501.8 | 289.9 | 101.7 | 9.7 | 204.8 | ... | 235.1 | 37.9 | 81.5 | 109.3 | 537.7 | 58.7 | 73.9 | 58.9 | 15.4 | Endometrium |
| 4 | 88966 | 13.1 | 8430.9 | 17.4 | 29.4 | 449.1 | 248.2 | 104.1 | 11.2 | 94.5 | ... | 125.0 | 59.9 | 186.8 | 122.5 | 355.2 | 65.1 | 139.9 | 14.1 | 11.2 | Endometrium |
5 rows × 3002 columns
\n", "