add new examples

aymericdamien 5 years ago
parent
commit
bb4daed5bc

+ 3 - 0
README.md

@@ -56,6 +56,9 @@ It is suitable for beginners who want to find clear and concise examples about T
 #### 5 - Data Management
 - **Build an image dataset** ([notebook](https://github.com/aymericdamien/TensorFlow-Examples/blob/master/notebooks/5_DataManagement/build_an_image_dataset.ipynb)) ([code](https://github.com/aymericdamien/TensorFlow-Examples/blob/master/examples/5_DataManagement/build_an_image_dataset.py)). Build your own images dataset with TensorFlow data queues, from image folders or a dataset file.
 - **TensorFlow Dataset API** ([notebook](https://github.com/aymericdamien/TensorFlow-Examples/blob/master/notebooks/5_DataManagement/tensorflow_dataset_api.ipynb)) ([code](https://github.com/aymericdamien/TensorFlow-Examples/blob/master/examples/5_DataManagement/tensorflow_dataset_api.py)). Introducing TensorFlow Dataset API for optimizing the input data pipeline.
+- **Load and Parse data** ([notebook](https://github.com/aymericdamien/TensorFlow-Examples/blob/master/notebooks/5_DataManagement/load_data.ipynb)). Build efficient data pipelines (NumPy arrays, images, CSV files, custom data, ...).
+- **Build and Load TFRecords** ([notebook](https://github.com/aymericdamien/TensorFlow-Examples/blob/master/notebooks/5_DataManagement/tfrecords.ipynb)). Convert data into TFRecords format, and load them.
+- **Image Transformation (i.e. Image Augmentation)** ([notebook](https://github.com/aymericdamien/TensorFlow-Examples/blob/master/notebooks/5_DataManagement/image_transformation.ipynb)). Apply various image augmentation techniques, to generate distorted images for training.
 
 #### 6 - Multi GPU
 - **Basic Operations on multi-GPU** ([notebook](https://github.com/aymericdamien/TensorFlow-Examples/blob/master/notebooks/6_MultiGPU/multigpu_basics.ipynb)) ([code](https://github.com/aymericdamien/TensorFlow-Examples/blob/master/examples/6_MultiGPU/multigpu_basics.py)). A simple example to introduce multi-GPU in TensorFlow.

+ 418 - 0
notebooks/5_DataManagement/image_tranformation.ipynb

File diff suppressed because it is too large


+ 577 - 0
notebooks/5_DataManagement/load_data.ipynb

@@ -0,0 +1,577 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Load and parse data with TensorFlow\n",
+    "\n",
+    "A TensorFlow example to build input pipelines for loading data efficiently.\n",
+    "\n",
+    "\n",
+    "- Numpy Arrays\n",
+    "- Images\n",
+    "- CSV file\n",
+    "- Custom data from a Generator\n",
+    "\n",
+    "For more information about creating and loading TensorFlow's `TFRecords` data format, see: [tfrecords.ipynb](tfrecords.ipynb)\n",
+    "\n",
+    "- Author: Aymeric Damien\n",
+    "- Project: https://github.com/aymericdamien/TensorFlow-Examples/"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from __future__ import absolute_import, division, print_function\n",
+    "\n",
+    "import numpy as np\n",
+    "import random\n",
+    "import requests\n",
+    "import string\n",
+    "import tarfile\n",
+    "import tensorflow as tf"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Load Numpy Arrays\n",
+    "\n",
+    "Build a data pipeline over numpy arrays."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Create a toy dataset (even and odd numbers, with respective labels of 0 and 1).\n",
+    "evens = np.arange(0, 100, step=2, dtype=np.int32)\n",
+    "evens_label = np.zeros(50, dtype=np.int32)\n",
+    "odds = np.arange(1, 100, step=2, dtype=np.int32)\n",
+    "odds_label = np.ones(50, dtype=np.int32)\n",
+    "# Concatenate arrays\n",
+    "features = np.concatenate([evens, odds])\n",
+    "labels = np.concatenate([evens_label, odds_label])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "with tf.Graph().as_default():\n",
+    "    # Create TF session.\n",
+    "    sess = tf.Session()\n",
+    "    \n",
+    "    # Slice the numpy arrays (each row becoming a record).\n",
+    "    data = tf.data.Dataset.from_tensor_slices((features, labels))\n",
+    "    # Refill data indefinitely.  \n",
+    "    data = data.repeat()\n",
+    "    # Shuffle data.\n",
+    "    data = data.shuffle(buffer_size=100)\n",
+    "    # Batch data (aggregate records together).\n",
+    "    data = data.batch(batch_size=4)\n",
+    "    # Prefetch batch (pre-load batch for faster consumption).\n",
+    "    data = data.prefetch(buffer_size=1)\n",
+    "    \n",
+    "    # Create an iterator over the dataset.\n",
+    "    iterator = data.make_initializable_iterator()\n",
+    "    # Initialize the iterator.\n",
+    "    sess.run(iterator.initializer)\n",
+    "\n",
+    "    # Get next data batch.\n",
+    "    d = iterator.get_next()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[82 58 80 23] [0 0 0 1]\n",
+      "[16 91 74 96] [0 1 0 0]\n",
+      "[ 4 17 32 34] [0 1 0 0]\n",
+      "[16  8 77 21] [0 0 1 1]\n",
+      "[20 99 48 18] [0 1 0 0]\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Display data.\n",
+    "for i in range(5):\n",
+    "    x, y = sess.run(d)\n",
+    "    print(x, y)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Load CSV files\n",
+    "\n",
+    "Build a data pipeline from features stored in a CSV file. This example uses the Titanic dataset, stored in CSV format, as a toy dataset."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Titanic Dataset\n",
+    "\n",
+    "\n",
+    "\n",
+    "survived|pclass|name|sex|age|sibsp|parch|ticket|fare\n",
+    "--------|------|----|---|---|-----|-----|------|----\n",
+    "1|1|\"Allen, Miss. Elisabeth Walton\"|female|29|0|0|24160|211.3375\n",
+    "1|1|\"Allison, Master. Hudson Trevor\"|male|0.9167|1|2|113781|151.5500\n",
+    "0|1|\"Allison, Miss. Helen Loraine\"|female|2|1|2|113781|151.5500\n",
+    "0|1|\"Allison, Mr. Hudson Joshua Creighton\"|male|30|1|2|113781|151.5500\n",
+    "...|...|...|...|...|...|...|...|..."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Download Titanic dataset (in csv format).\n",
+    "d = requests.get(\"https://raw.githubusercontent.com/tflearn/tflearn.github.io/master/resources/titanic_dataset.csv\")\n",
+    "with open(\"titanic_dataset.csv\", \"wb\") as f:\n",
+    "    f.write(d.content)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Load Titanic dataset.\n",
+    "# Original features: survived,pclass,name,sex,age,sibsp,parch,ticket,fare\n",
+    "# Select specific columns: survived,pclass,name,sex,age,fare\n",
+    "column_to_use = [0, 1, 2, 3, 4, 8]\n",
+    "record_defaults = [tf.int32, tf.int32, tf.string, tf.string, tf.float32, tf.float32]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "with tf.Graph().as_default():\n",
+    "    # Create TF session.\n",
+    "    sess = tf.Session()\n",
+    "    \n",
+    "    # Load the whole dataset file, and slice each line.\n",
+    "    data = tf.data.experimental.CsvDataset(\"titanic_dataset.csv\", record_defaults, header=True, select_cols=column_to_use)\n",
+    "    # Refill data indefinitely.  \n",
+    "    data = data.repeat()\n",
+    "    # Shuffle data.\n",
+    "    data = data.shuffle(buffer_size=1000)\n",
+    "    # Batch data (aggregate records together).\n",
+    "    data = data.batch(batch_size=2)\n",
+    "    # Prefetch batch (pre-load batch for faster consumption).\n",
+    "    data = data.prefetch(buffer_size=1)\n",
+    "    \n",
+    "    # Create an iterator over the dataset.\n",
+    "    iterator = data.make_initializable_iterator()\n",
+    "    # Initialize the iterator.\n",
+    "    sess.run(iterator.initializer)\n",
+    "\n",
+    "    # Get next data batch.\n",
+    "    d = iterator.get_next()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[1 0]\n",
+      "[3 1]\n",
+      "['Lam, Mr. Ali' 'Widener, Mr. Harry Elkins']\n",
+      "['male' 'male']\n",
+      "[ 0. 27.]\n",
+      "[ 56.4958 211.5   ]\n",
+      "\n",
+      "[0 1]\n",
+      "[1 1]\n",
+      "['Baumann, Mr. John D' 'Daly, Mr. Peter Denis ']\n",
+      "['male' 'male']\n",
+      "[ 0. 51.]\n",
+      "[25.925 26.55 ]\n",
+      "\n",
+      "[0 1]\n",
+      "[3 1]\n",
+      "['Assam, Mr. Ali' 'Newell, Miss. Madeleine']\n",
+      "['male' 'female']\n",
+      "[23. 31.]\n",
+      "[  7.05  113.275]\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Display data.\n",
+    "for i in range(3):\n",
+    "    survived, pclass, name, sex, age, fare = sess.run(d)\n",
+    "    print(survived)\n",
+    "    print(pclass)\n",
+    "    print(name)\n",
+    "    print(sex)\n",
+    "    print(age)\n",
+    "    print(fare)\n",
+    "    print(\"\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Load Images\n",
+    "\n",
+    "Build a data pipeline by loading images from disk. This example uses the Oxford 17 Flowers dataset."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Download Oxford 17 flowers dataset.\n",
+    "d = requests.get(\"http://www.robots.ox.ac.uk/~vgg/data/flowers/17/17flowers.tgz\")\n",
+    "with open(\"17flowers.tgz\", \"wb\") as f:\n",
+    "    f.write(d.content)\n",
+    "# Extract archive.\n",
+    "with tarfile.open(\"17flowers.tgz\") as t:\n",
+    "    t.extractall()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Create a file listing every image path and its corresponding label.\n",
+    "with open('jpg/dataset.csv', 'w') as f:\n",
+    "    c = 0\n",
+    "    for i in range(1360):\n",
+    "        f.write(\"jpg/image_%04i.jpg,%i\\n\" % (i+1, c))\n",
+    "        if (i+1) % 80 == 0:\n",
+    "            c += 1"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "with tf.Graph().as_default():\n",
+    "    \n",
+    "    # Load Images.\n",
+    "    with open(\"jpg/dataset.csv\") as f:\n",
+    "        dataset_file = f.read().splitlines()\n",
+    "    \n",
+    "    # Create TF session.\n",
+    "    sess = tf.Session()\n",
+    "\n",
+    "    # Load the whole dataset file, and slice each line.\n",
+    "    data = tf.data.Dataset.from_tensor_slices(dataset_file)\n",
+    "    # Refill data indefinitely.\n",
+    "    data = data.repeat()\n",
+    "    # Shuffle data.\n",
+    "    data = data.shuffle(buffer_size=1000)\n",
+    "\n",
+    "    # Load and pre-process images.\n",
+    "    def load_image(path):\n",
+    "        # Read image from path.\n",
+    "        image = tf.io.read_file(path)\n",
+    "        # Decode the jpeg image to array [0, 255].\n",
+    "        image = tf.image.decode_jpeg(image)\n",
+    "        # Resize images to a common size of 256x256.\n",
+    "        image = tf.image.resize(image, [256, 256])\n",
+    "        # Rescale values to [-1, 1] (inverted: 0 maps to 1, 255 maps to -1).\n",
+    "        image = 1. - image / 127.5\n",
+    "        return image\n",
+    "    # Decode each line from the dataset file.\n",
+    "    def parse_records(line):\n",
+    "        # File is in csv format: \"image_path,label_id\".\n",
+    "        # TensorFlow requires a default value, but it will never be used.\n",
+    "        image_path, image_label = tf.io.decode_csv(line, [\"\", 0])\n",
+    "        # Apply the function to load images.\n",
+    "        image = load_image(image_path)\n",
+    "        return image, image_label\n",
+    "    # Use 'map' to apply the above functions in parallel.\n",
+    "    data = data.map(parse_records, num_parallel_calls=4)\n",
+    "\n",
+    "    # Batch data (aggregate image arrays together).\n",
+    "    data = data.batch(batch_size=2)\n",
+    "    # Prefetch batch (pre-load batch for faster consumption).\n",
+    "    data = data.prefetch(buffer_size=1)\n",
+    "    \n",
+    "    # Create an iterator over the dataset.\n",
+    "    iterator = data.make_initializable_iterator()\n",
+    "    # Initialize the iterator.\n",
+    "    sess.run(iterator.initializer)\n",
+    "\n",
+    "    # Get next data batch.\n",
+    "    d = iterator.get_next()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[[[[ 0.1294117   0.05098033  0.46666664]\n",
+      "   [ 0.1368872   0.05098033  0.48909312]\n",
+      "   [ 0.0931372   0.0068627   0.46029407]\n",
+      "   ...\n",
+      "   [ 0.23480386  0.0522058   0.6102941 ]\n",
+      "   [ 0.12696075 -0.05416667  0.38063723]\n",
+      "   [-0.10024512 -0.28848052  0.10367644]]\n",
+      "\n",
+      "  [[ 0.04120708 -0.06118262  0.36256123]\n",
+      "   [ 0.08009624 -0.02229345  0.41640145]\n",
+      "   [ 0.06797445 -0.04132879  0.41923058]\n",
+      "   ...\n",
+      "   [ 0.2495715   0.06697345  0.6251221 ]\n",
+      "   [ 0.12058818 -0.06094813  0.37577546]\n",
+      "   [-0.05184889 -0.24009418  0.16777915]]\n",
+      "\n",
+      "  [[-0.09234071 -0.22738981  0.20484066]\n",
+      "   [-0.03100491 -0.17312062  0.2811274 ]\n",
+      "   [ 0.01051998 -0.13237214  0.3376838 ]\n",
+      "   ...\n",
+      "   [ 0.27787983  0.07494056  0.64203525]\n",
+      "   [ 0.11533964 -0.09005249  0.3869906 ]\n",
+      "   [-0.02704227 -0.23958337  0.19454747]]\n",
+      "\n",
+      "  ...\n",
+      "\n",
+      "  [[ 0.07913595 -0.13069856  0.29874384]\n",
+      "   [ 0.10140878 -0.09445572  0.35912937]\n",
+      "   [ 0.08869672 -0.08415675  0.41446364]\n",
+      "   ...\n",
+      "   [ 0.25821072  0.22463232  0.69197303]\n",
+      "   [ 0.31636214  0.25750512  0.79362744]\n",
+      "   [ 0.09552741  0.01709598  0.57395875]]\n",
+      "\n",
+      "  [[ 0.09019601 -0.12156868  0.3098039 ]\n",
+      "   [ 0.17446858 -0.02271283  0.43218917]\n",
+      "   [ 0.06583172 -0.10818791  0.39230233]\n",
+      "   ...\n",
+      "   [ 0.27021956  0.23664117  0.70269513]\n",
+      "   [ 0.19560927  0.1385014   0.6740407 ]\n",
+      "   [ 0.04364848 -0.03478289  0.5220798 ]]\n",
+      "\n",
+      "  [[ 0.02830875 -0.18345594  0.24791664]\n",
+      "   [ 0.12937105 -0.06781042  0.38709164]\n",
+      "   [ 0.01120263 -0.162817    0.33767325]\n",
+      "   ...\n",
+      "   [ 0.25989532  0.22631687  0.69237083]\n",
+      "   [ 0.1200884   0.06298059  0.5985198 ]\n",
+      "   [ 0.05961001 -0.01882136  0.53804135]]]\n",
+      "\n",
+      "\n",
+      " [[[ 0.3333333   0.25490195  0.05882347]\n",
+      "   [ 0.3333333   0.25490195  0.05882347]\n",
+      "   [ 0.3340686   0.24705875  0.03039211]\n",
+      "   ...\n",
+      "   [-0.5215688  -0.4599266  -0.14632356]\n",
+      "   [-0.5100491  -0.47083342 -0.03725493]\n",
+      "   [-0.43419123 -0.39497554  0.05992639]]\n",
+      "\n",
+      "  [[ 0.34117645  0.26274508  0.0666666 ]\n",
+      "   [ 0.35646445  0.2630821   0.0744791 ]\n",
+      "   [ 0.3632046   0.2548713   0.04384762]\n",
+      "   ...\n",
+      "   [-0.9210479  -0.84267783 -0.4540485 ]\n",
+      "   [-0.9017464  -0.8390626  -0.3507018 ]\n",
+      "   [-0.83339334 -0.7632048  -0.2534927 ]]\n",
+      "\n",
+      "  [[ 0.3646446   0.2706495   0.06678915]\n",
+      "   [ 0.37248772  0.27837008  0.07445425]\n",
+      "   [ 0.38033658  0.27053267  0.05950326]\n",
+      "   ...\n",
+      "   [-0.94302344 -0.84222686 -0.30278325]\n",
+      "   [-0.91017747 -0.8090074  -0.18615782]\n",
+      "   [-0.83437514 -0.7402575  -0.08192408]]\n",
+      "\n",
+      "  ...\n",
+      "\n",
+      "  [[ 0.64705884  0.654902    0.67058825]\n",
+      "   [ 0.6318321   0.63967526  0.65536153]\n",
+      "   [ 0.63128924  0.6391324   0.65481865]\n",
+      "   ...\n",
+      "   [ 0.6313726   0.57647055  0.51372546]\n",
+      "   [ 0.6078431   0.53725487  0.4823529 ]\n",
+      "   [ 0.6078431   0.53725487  0.4823529 ]]\n",
+      "\n",
+      "  [[ 0.654902    0.654902    0.6704657 ]\n",
+      "   [ 0.654902    0.654902    0.6704657 ]\n",
+      "   [ 0.64778835  0.64778835  0.6492474 ]\n",
+      "   ...\n",
+      "   [ 0.6392157   0.5843137   0.5215686 ]\n",
+      "   [ 0.6393325   0.56874424  0.5138422 ]\n",
+      "   [ 0.63106614  0.5604779   0.50557595]]\n",
+      "\n",
+      "  [[ 0.654902    0.64705884  0.6313726 ]\n",
+      "   [ 0.6548728   0.64702964  0.63134336]\n",
+      "   [ 0.64705884  0.63210785  0.6377451 ]\n",
+      "   ...\n",
+      "   [ 0.63244915  0.5775472   0.5148021 ]\n",
+      "   [ 0.6698529   0.5992647   0.5443627 ]\n",
+      "   [ 0.6545358   0.5839475   0.5290455 ]]]] [5 9]\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Display data.\n",
+    "for i in range(1):\n",
+    "    batch_x, batch_y = sess.run(d)\n",
+    "    print(batch_x, batch_y)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Load data from a Generator\n",
+    "\n",
+    "Build a data pipeline from a custom generator. This example uses a toy generator that yields a random string, a random vector, and a random integer."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Create a dummy generator.\n",
+    "def generate_features():\n",
+    "    # Function to generate a random string.\n",
+    "    def random_string(length):\n",
+    "        return ''.join(random.choice(string.ascii_letters) for _ in range(length))\n",
+    "    # Return a random string, a random vector, and a random int.\n",
+    "    yield random_string(4), np.random.uniform(size=4), random.randint(0, 10)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "with tf.Graph().as_default():\n",
+    "\n",
+    "    # Create TF session.\n",
+    "    sess = tf.Session()\n",
+    "\n",
+    "    # Create TF dataset from the generator.\n",
+    "    data = tf.data.Dataset.from_generator(generate_features, output_types=(tf.string, tf.float32, tf.int32))\n",
+    "    # Refill data indefinitely.\n",
+    "    data = data.repeat()\n",
+    "    # Shuffle data.\n",
+    "    data = data.shuffle(buffer_size=100)\n",
+    "    # Batch data (aggregate records together).\n",
+    "    data = data.batch(batch_size=4)\n",
+    "    # Prefetch batch (pre-load batch for faster consumption).\n",
+    "    data = data.prefetch(buffer_size=1)\n",
+    "\n",
+    "    # Create an iterator over the dataset.\n",
+    "    iterator = data.make_initializable_iterator()\n",
+    "    # Initialize the iterator.\n",
+    "    sess.run(iterator.initializer)\n",
+    "\n",
+    "    # Get next data batch.\n",
+    "    d = iterator.get_next()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "['AvCS' 'kAaI' 'QwGX' 'IWOI'] [[0.6096093  0.32192084 0.26622605 0.70250475]\n",
+      " [0.72534287 0.7637426  0.19977213 0.74121326]\n",
+      " [0.6930984  0.09409562 0.4063325  0.5002103 ]\n",
+      " [0.05160935 0.59411395 0.276416   0.98264974]] [1 3 5 6]\n",
+      "['EXjS' 'brvx' 'kwNz' 'eFOb'] [[0.34355283 0.26881003 0.70575935 0.7503411 ]\n",
+      " [0.9584373  0.27466875 0.27802315 0.9563204 ]\n",
+      " [0.19129485 0.07014314 0.0932724  0.20726128]\n",
+      " [0.28744072 0.81736153 0.37507302 0.8984588 ]] [1 9 7 0]\n",
+      "['vpSa' 'UuqW' 'xaTO' 'milw'] [[0.2942028  0.8228986  0.5793326  0.16651365]\n",
+      " [0.28259405 0.599063   0.2922477  0.95071274]\n",
+      " [0.23645316 0.00258607 0.06772221 0.7291911 ]\n",
+      " [0.12861755 0.31435087 0.576638   0.7333119 ]] [3 5 8 4]\n",
+      "['UBBb' 'MUXs' 'nLJB' 'OBGl'] [[0.2677402  0.17931737 0.02607645 0.85898155]\n",
+      " [0.58647937 0.727203   0.13329858 0.8898983 ]\n",
+      " [0.13872191 0.47390288 0.7061665  0.08478573]\n",
+      " [0.3786016  0.22002582 0.91989636 0.45837343]] [ 5  8  0 10]\n",
+      "['kiiz' 'bQYG' 'WpUU' 'AuIY'] [[0.74781317 0.13744462 0.9236441  0.63558507]\n",
+      " [0.23649399 0.35303807 0.0951511  0.03541444]\n",
+      " [0.33599988 0.6906629  0.97166294 0.55850506]\n",
+      " [0.90997607 0.5545979  0.43635726 0.9127501 ]] [8 1 4 4]\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Display data.\n",
+    "for i in range(5):\n",
+    "    batch_str, batch_vector, batch_int = sess.run(d)\n",
+    "    print(batch_str, batch_vector, batch_int)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "tf1",
+   "language": "python",
+   "name": "tf1"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 2
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython2",
+   "version": "2.7.15+"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
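
The `load_image` function in this notebook computes `1. - image / 127.5`, and the listing cell assigns one label per run of 80 consecutive images. The sketch below checks both pieces of arithmetic in plain Python, without TensorFlow. The function names are hypothetical, and `rescale_conventional` is a common alternative shown only for comparison, not something the notebook uses.

```python
def rescale_as_written(pixel):
    # Expression used in the notebook's load_image: 1. - image / 127.5
    return 1.0 - pixel / 127.5


def rescale_conventional(pixel):
    # Common alternative (an assumption for comparison, not the notebook's code).
    return pixel / 127.5 - 1.0


def label_for(image_number, images_per_class=80):
    # Images are numbered from 1; each run of 80 consecutive images shares a label.
    return (image_number - 1) // images_per_class


for pixel in (0.0, 127.5, 255.0):
    print(pixel, rescale_as_written(pixel), rescale_conventional(pixel))

for image_number in (1, 80, 81, 1360):
    print("image_%04i.jpg -> label %i" % (image_number, label_for(image_number)))
```

Both expressions map pixel values into [-1, 1], but the one in the notebook sends black (0) to 1 and white (255) to -1. That is consistent with the printed batch, although a model trained with one convention would likely need the same convention at inference time.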

+ 271 - 0
notebooks/5_DataManagement/tfrecords.ipynb

@@ -0,0 +1,271 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Create and Load TFRecords\n",
+    "\n",
+    "A simple TensorFlow example to parse a dataset into TFRecord format, and then read that dataset.\n",
+    "\n",
+    "In this example, the Titanic Dataset (in CSV format) will be used as a toy dataset, for parsing all the dataset features into TFRecord format, and then building an input pipeline that can be used for training models.\n",
+    "\n",
+    "- Author: Aymeric Damien\n",
+    "- Project: https://github.com/aymericdamien/TensorFlow-Examples/"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Titanic Dataset\n",
+    "\n",
+    "The Titanic dataset is a popular dataset for ML that lists the passengers aboard the Titanic, along with features such as their age, sex, and class (1st, 2nd, 3rd), and whether each passenger survived the disaster.\n",
+    "\n",
+    "It shows that, although some luck was involved in surviving the sinking, some groups were more likely to survive than others, such as women, children, and upper-class passengers.\n",
+    "\n",
+    "#### Overview\n",
+    "survived|pclass|name|sex|age|sibsp|parch|ticket|fare\n",
+    "--------|------|----|---|---|-----|-----|------|----\n",
+    "1|1|\"Allen, Miss. Elisabeth Walton\"|female|29|0|0|24160|211.3375\n",
+    "1|1|\"Allison, Master. Hudson Trevor\"|male|0.9167|1|2|113781|151.5500\n",
+    "0|1|\"Allison, Miss. Helen Loraine\"|female|2|1|2|113781|151.5500\n",
+    "0|1|\"Allison, Mr. Hudson Joshua Creighton\"|male|30|1|2|113781|151.5500\n",
+    "...|...|...|...|...|...|...|...|...\n",
+    "\n",
+    "\n",
+    "#### Variable Descriptions\n",
+    "```\n",
+    "survived        Survived\n",
+    "                (0 = No; 1 = Yes)\n",
+    "pclass          Passenger Class\n",
+    "                (1 = 1st; 2 = 2nd; 3 = 3rd)\n",
+    "name            Name\n",
+    "sex             Sex\n",
+    "age             Age\n",
+    "sibsp           Number of Siblings/Spouses Aboard\n",
+    "parch           Number of Parents/Children Aboard\n",
+    "ticket          Ticket Number\n",
+    "fare            Passenger Fare\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from __future__ import absolute_import, division, print_function\n",
+    "\n",
+    "import csv\n",
+    "import requests\n",
+    "import tensorflow as tf"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Download Titanic dataset (in csv format).\n",
+    "d = requests.get(\"https://raw.githubusercontent.com/tflearn/tflearn.github.io/master/resources/titanic_dataset.csv\")\n",
+    "with open(\"titanic_dataset.csv\", \"wb\") as f:\n",
+    "    f.write(d.content)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Create TFRecords"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Generate Integer Features.\n",
+    "def build_int64_feature(data):\n",
+    "    return tf.train.Feature(int64_list=tf.train.Int64List(value=[data]))\n",
+    "\n",
+    "# Generate Float Features.\n",
+    "def build_float_feature(data):\n",
+    "    return tf.train.Feature(float_list=tf.train.FloatList(value=[data]))\n",
+    "\n",
+    "# Generate String Features.\n",
+    "def build_string_feature(data):\n",
+    "    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[data]))\n",
+    "\n",
+    "# Generate a TF `Example`, parsing all features of the dataset.\n",
+    "def convert_to_tfexample(survived, pclass, name, sex, age, sibsp, parch, ticket, fare):\n",
+    "    return tf.train.Example(\n",
+    "        features=tf.train.Features(\n",
+    "            feature={\n",
+    "                'survived': build_int64_feature(survived),\n",
+    "                'pclass': build_int64_feature(pclass),\n",
+    "                'name': build_string_feature(name),\n",
+    "                'sex': build_string_feature(sex),\n",
+    "                'age': build_float_feature(age),\n",
+    "                'sibsp': build_int64_feature(sibsp),\n",
+    "                'parch': build_int64_feature(parch),\n",
+    "                'ticket': build_string_feature(ticket),\n",
+    "                'fare': build_float_feature(fare),\n",
+    "            })\n",
+    "    )"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Open dataset file.\n",
+    "with open(\"titanic_dataset.csv\") as f:\n",
+    "    # Output TFRecord file.\n",
+    "    with tf.io.TFRecordWriter(\"titanic_dataset.tfrecord\") as w:\n",
+    "        # Generate a TF Example for each row in the dataset.\n",
+    "        # CSV reader will read and parse all rows.\n",
+    "        reader = csv.reader(f, skipinitialspace=True)\n",
+    "        for i, record in enumerate(reader):\n",
+    "            # Skip header.\n",
+    "            if i == 0:\n",
+    "                continue\n",
+    "            survived, pclass, name, sex, age, sibsp, parch, ticket, fare = record\n",
+    "            # Parse each csv row to TF Example using the above functions.\n",
+    "            example = convert_to_tfexample(int(survived), int(pclass), name, sex, float(age), int(sibsp), int(parch), ticket, float(fare))\n",
+    "            # Serialize each TF Example to string, and write to TFRecord file.\n",
+    "            w.write(example.SerializeToString())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Load TFRecords"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Build features template, with types.\n",
+    "features = {\n",
+    "    'survived': tf.io.FixedLenFeature([], tf.int64),\n",
+    "    'pclass': tf.io.FixedLenFeature([], tf.int64),\n",
+    "    'name': tf.io.FixedLenFeature([], tf.string),\n",
+    "    'sex': tf.io.FixedLenFeature([], tf.string),\n",
+    "    'age': tf.io.FixedLenFeature([], tf.float32),\n",
+    "    'sibsp': tf.io.FixedLenFeature([], tf.int64),\n",
+    "    'parch': tf.io.FixedLenFeature([], tf.int64),\n",
+    "    'ticket': tf.io.FixedLenFeature([], tf.string),\n",
+    "    'fare': tf.io.FixedLenFeature([], tf.float32),\n",
+    "}"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "WARNING:tensorflow:From /home/orus/tf1/lib/python2.7/site-packages/tensorflow/python/data/ops/dataset_ops.py:1419: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.\n",
+      "Instructions for updating:\n",
+      "Colocations handled automatically by placer.\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Create TensorFlow session.\n",
+    "sess = tf.Session()\n",
+    "\n",
+    "# Load TFRecord data.\n",
+    "filenames = [\"titanic_dataset.tfrecord\"]\n",
+    "data = tf.data.TFRecordDataset(filenames)\n",
+    "\n",
+    "# Parse features, using the above template.\n",
+    "def parse_record(record):\n",
+    "    return tf.io.parse_single_example(record, features=features)\n",
+    "# Apply the parsing to each record from the dataset.\n",
+    "data = data.map(parse_record)\n",
+    "\n",
+    "# Refill data indefinitely.\n",
+    "data = data.repeat()\n",
+    "# Shuffle data.\n",
+    "data = data.shuffle(buffer_size=1000)\n",
+    "# Batch data (aggregate records together).\n",
+    "data = data.batch(batch_size=4)\n",
+    "# Prefetch batch (pre-load batch for faster consumption).\n",
+    "data = data.prefetch(buffer_size=1)\n",
+    "\n",
+    "# Create an iterator over the dataset.\n",
+    "iterator = data.make_initializable_iterator()\n",
+    "# Initialize the iterator.\n",
+    "sess.run(iterator.initializer)\n",
+    "\n",
+    "# Get next data batch.\n",
+    "x = iterator.get_next()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'fare': array([ 35.5   ,  73.5   , 133.65  ,  19.2583], dtype=float32), 'name': array(['Sloper, Mr. William Thompson', 'Davies, Mr. Charles Henry',\n",
+      "       'Frauenthal, Dr. Henry William', 'Baclini, Miss. Marie Catherine'],\n",
+      "      dtype=object), 'age': array([28., 18., 50.,  5.], dtype=float32), 'parch': array([0, 0, 0, 1]), 'pclass': array([1, 2, 1, 3]), 'sex': array(['male', 'male', 'male', 'female'], dtype=object), 'survived': array([1, 0, 1, 1]), 'sibsp': array([0, 0, 2, 2]), 'ticket': array(['113788', 'S.O.C. 14879', 'PC 17611', '2666'], dtype=object)}\n",
+      "\n",
+      "{'fare': array([ 18.75 , 106.425,  78.85 ,  90.   ], dtype=float32), 'name': array(['Richards, Mrs. Sidney (Emily Hocking)', 'LeRoy, Miss. Bertha',\n",
+      "       'Cavendish, Mrs. Tyrell William (Julia Florence Siegel)',\n",
+      "       'Hoyt, Mrs. Frederick Maxfield (Jane Anne Forby)'], dtype=object), 'age': array([24., 30., 76., 35.], dtype=float32), 'parch': array([3, 0, 0, 0]), 'pclass': array([2, 1, 1, 1]), 'sex': array(['female', 'female', 'female', 'female'], dtype=object), 'survived': array([1, 1, 1, 1]), 'sibsp': array([2, 0, 1, 1]), 'ticket': array(['29106', 'PC 17761', '19877', '19943'], dtype=object)}\n",
+      "\n",
+      "{'fare': array([19.9667, 15.5   , 15.0458, 66.6   ], dtype=float32), 'name': array(['Hagland, Mr. Konrad Mathias Reiersen', 'Lennon, Miss. Mary',\n",
+      "       'Richard, Mr. Emile', 'Pears, Mr. Thomas Clinton'], dtype=object), 'age': array([ 0.,  0., 23., 29.], dtype=float32), 'parch': array([0, 0, 0, 0]), 'pclass': array([3, 3, 2, 1]), 'sex': array(['male', 'female', 'male', 'male'], dtype=object), 'survived': array([0, 0, 0, 0]), 'sibsp': array([1, 1, 0, 1]), 'ticket': array(['65304', '370371', 'SC/PARIS 2133', '113776'], dtype=object)}\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Dequeue data and display.\n",
+    "for i in range(3):\n",
+    "    print(sess.run(x))\n",
+    "    print(\"\")"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "tf1",
+   "language": "python",
+   "name": "tf1"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 2
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython2",
+   "version": "2.7.15+"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}

+ 5 - 0
tensorflow_v2/README.md

@@ -33,6 +33,11 @@
 - **Save and Restore a model** ([notebook](https://github.com/aymericdamien/TensorFlow-Examples/blob/master/tensorflow_v2/notebooks/4_Utils/save_restore_model.ipynb)). Save and Restore a model with TensorFlow 2.0.
 - **Build Custom Layers & Modules** ([notebook](https://github.com/aymericdamien/TensorFlow-Examples/blob/master/tensorflow_v2/notebooks/4_Utils/build_custom_layers.ipynb)). Learn how to build your own layers / modules and integrate them into TensorFlow 2.0 Models.
 
+#### 5 - Data Management
+- **Load and Parse data** ([notebook](https://github.com/aymericdamien/TensorFlow-Examples/blob/master/tensorflow_v2/notebooks/5_DataManagement/load_data.ipynb)). Build efficient data pipeline with TensorFlow 2.0 (Numpy arrays, Images, CSV files, custom data, ...).
+- **Build and Load TFRecords** ([notebook](https://github.com/aymericdamien/TensorFlow-Examples/blob/master/tensorflow_v2/notebooks/5_DataManagement/tfrecords.ipynb)). Convert data into TFRecords format, and load them with TensorFlow 2.0.
+- **Image Transformation (i.e. Image Augmentation)** ([notebook](https://github.com/aymericdamien/TensorFlow-Examples/blob/master/tensorflow_v2/notebooks/5_DataManagement/image_transformation.ipynb)). Apply various image augmentation techniques with TensorFlow 2.0, to generate distorted images for training.
+
 ## Installation
 
 To install TensorFlow 2.0, simply run:

File diff suppressed because it is too large
+ 408 - 0
tensorflow_v2/notebooks/5_DataManagement/image_transformation.ipynb


+ 530 - 0
tensorflow_v2/notebooks/5_DataManagement/load_data.ipynb

@@ -0,0 +1,530 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Load and parse data with TensorFlow 2.0 (tf.data)\n",
+    "\n",
+    "A TensorFlow 2.0 example to build input pipelines for loading data efficiently. It covers:\n",
+    "\n",
+    "- Numpy Arrays\n",
+    "- Images\n",
+    "- CSV file\n",
+    "- Custom data from a Generator\n",
+    "\n",
+    "For more information about creating and loading TensorFlow's `TFRecords` data format, see: [tfrecords.ipynb](tfrecords.ipynb)\n",
+    "\n",
+    "- Author: Aymeric Damien\n",
+    "- Project: https://github.com/aymericdamien/TensorFlow-Examples/"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from __future__ import absolute_import, division, print_function\n",
+    "\n",
+    "import numpy as np\n",
+    "import random\n",
+    "import requests\n",
+    "import string\n",
+    "import tarfile\n",
+    "import tensorflow as tf"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Load Numpy Arrays\n",
+    "\n",
+    "Build a data pipeline over numpy arrays."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Create a toy dataset (even and odd numbers, with respective labels of 0 and 1).\n",
+    "evens = np.arange(0, 100, step=2, dtype=np.int32)\n",
+    "evens_label = np.zeros(50, dtype=np.int32)\n",
+    "odds = np.arange(1, 100, step=2, dtype=np.int32)\n",
+    "odds_label = np.ones(50, dtype=np.int32)\n",
+    "# Concatenate arrays\n",
+    "features = np.concatenate([evens, odds])\n",
+    "labels = np.concatenate([evens_label, odds_label])\n",
+    "\n",
+    "# Load a numpy array using tf data api with `from_tensor_slices`.\n",
+    "data = tf.data.Dataset.from_tensor_slices((features, labels))\n",
+    "# Refill data indefinitely.\n",
+    "data = data.repeat()\n",
+    "# Shuffle data.\n",
+    "data = data.shuffle(buffer_size=100)\n",
+    "# Batch data (aggregate records together).\n",
+    "data = data.batch(batch_size=4)\n",
+    "# Prefetch batch (pre-load batch for faster consumption).\n",
+    "data = data.prefetch(buffer_size=1)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "tf.Tensor([ 9 94 29 85], shape=(4,), dtype=int32) tf.Tensor([1 0 1 1], shape=(4,), dtype=int32)\n",
+      "tf.Tensor([68 57 88 41], shape=(4,), dtype=int32) tf.Tensor([0 1 0 1], shape=(4,), dtype=int32)\n",
+      "tf.Tensor([51 19 18 56], shape=(4,), dtype=int32) tf.Tensor([1 1 0 0], shape=(4,), dtype=int32)\n",
+      "tf.Tensor([70 84 99 32], shape=(4,), dtype=int32) tf.Tensor([0 0 1 0], shape=(4,), dtype=int32)\n",
+      "tf.Tensor([40  0 25 28], shape=(4,), dtype=int32) tf.Tensor([0 0 1 0], shape=(4,), dtype=int32)\n"
+     ]
+    }
+   ],
+   "source": [
+    "for batch_x, batch_y in data.take(5):\n",
+    "    print(batch_x, batch_y)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "tf.Tensor([ 9 94 29 85], shape=(4,), dtype=int32) tf.Tensor([1 0 1 1], shape=(4,), dtype=int32)\n",
+      "tf.Tensor([68 57 88 41], shape=(4,), dtype=int32) tf.Tensor([0 1 0 1], shape=(4,), dtype=int32)\n",
+      "tf.Tensor([51 19 18 56], shape=(4,), dtype=int32) tf.Tensor([1 1 0 0], shape=(4,), dtype=int32)\n",
+      "tf.Tensor([70 84 99 32], shape=(4,), dtype=int32) tf.Tensor([0 0 1 0], shape=(4,), dtype=int32)\n",
+      "tf.Tensor([40  0 25 28], shape=(4,), dtype=int32) tf.Tensor([0 0 1 0], shape=(4,), dtype=int32)\n",
+      "tf.Tensor([20 38 22 79], shape=(4,), dtype=int32) tf.Tensor([0 0 0 1], shape=(4,), dtype=int32)\n",
+      "tf.Tensor([20 22 96 27], shape=(4,), dtype=int32) tf.Tensor([0 0 0 1], shape=(4,), dtype=int32)\n",
+      "tf.Tensor([34 58 86 67], shape=(4,), dtype=int32) tf.Tensor([0 0 0 1], shape=(4,), dtype=int32)\n",
+      "tf.Tensor([ 2 98 24 21], shape=(4,), dtype=int32) tf.Tensor([0 0 0 1], shape=(4,), dtype=int32)\n",
+      "tf.Tensor([16 45 18 35], shape=(4,), dtype=int32) tf.Tensor([0 1 0 1], shape=(4,), dtype=int32)\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Note: If you plan to fetch batches across multiple loops,\n",
+    "# you can use an iterator instead:\n",
+    "ite_data = iter(data)\n",
+    "for i in range(5):\n",
+    "    batch_x, batch_y = next(ite_data)\n",
+    "    print(batch_x, batch_y)\n",
+    "\n",
+    "for i in range(5):\n",
+    "    batch_x, batch_y = next(ite_data)\n",
+    "    print(batch_x, batch_y)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Load CSV files\n",
+    "\n",
+    "Build a data pipeline from features stored in a CSV file. This example uses the Titanic dataset, stored in CSV format, as a toy dataset."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Titanic Dataset\n",
+    "\n",
+    "\n",
+    "\n",
+    "survived|pclass|name|sex|age|sibsp|parch|ticket|fare\n",
+    "--------|------|----|---|---|-----|-----|------|----\n",
+    "1|1|\"Allen, Miss. Elisabeth Walton\"|female|29|0|0|24160|211.3375\n",
+    "1|1|\"Allison, Master. Hudson Trevor\"|male|0.9167|1|2|113781|151.5500\n",
+    "0|1|\"Allison, Miss. Helen Loraine\"|female|2|1|2|113781|151.5500\n",
+    "0|1|\"Allison, Mr. Hudson Joshua Creighton\"|male|30|1|2|113781|151.5500\n",
+    "...|...|...|...|...|...|...|...|..."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Download Titanic dataset (in csv format).\n",
+    "d = requests.get(\"https://raw.githubusercontent.com/tflearn/tflearn.github.io/master/resources/titanic_dataset.csv\")\n",
+    "with open(\"titanic_dataset.csv\", \"wb\") as f:\n",
+    "    f.write(d.content)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Load Titanic dataset.\n",
+    "# Original features: survived,pclass,name,sex,age,sibsp,parch,ticket,fare\n",
+    "# Select specific columns: survived,pclass,name,sex,age,fare\n",
+    "column_to_use = [0, 1, 2, 3, 4, 8]\n",
+    "record_defaults = [tf.int32, tf.int32, tf.string, tf.string, tf.float32, tf.float32]\n",
+    "\n",
+    "# Load the whole dataset file, and slice each line.\n",
+    "data = tf.data.experimental.CsvDataset(\"titanic_dataset.csv\", record_defaults, header=True, select_cols=column_to_use)\n",
+    "# Refill data indefinitely.\n",
+    "data = data.repeat()\n",
+    "# Shuffle data.\n",
+    "data = data.shuffle(buffer_size=1000)\n",
+    "# Batch data (aggregate records together).\n",
+    "data = data.batch(batch_size=2)\n",
+    "# Prefetch batch (pre-load batch for faster consumption).\n",
+    "data = data.prefetch(buffer_size=1)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[1 1]\n",
+      "[2 2]\n",
+      "['Richards, Master. George Sibley' 'Rugg, Miss. Emily']\n",
+      "['male' 'female']\n",
+      "[ 0.8333 21.    ]\n",
+      "[18.75 10.5 ]\n"
+     ]
+    }
+   ],
+   "source": [
+    "for survived, pclass, name, sex, age, fare in data.take(1):\n",
+    "    print(survived.numpy())\n",
+    "    print(pclass.numpy())\n",
+    "    print(name.numpy())\n",
+    "    print(sex.numpy())\n",
+    "    print(age.numpy())\n",
+    "    print(fare.numpy())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Load Images\n",
+    "\n",
+    "Build a data pipeline by loading images from disk. This example uses the Oxford 17 Flowers dataset."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Download Oxford 17 flowers dataset\n",
+    "d = requests.get(\"http://www.robots.ox.ac.uk/~vgg/data/flowers/17/17flowers.tgz\")\n",
+    "with open(\"17flowers.tgz\", \"wb\") as f:\n",
+    "    f.write(d.content)\n",
+    "# Extract archive.\n",
+    "with tarfile.open(\"17flowers.tgz\") as t:\n",
+    "    t.extractall()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
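+    "# Build a dataset file listing each image path and its label (17 classes, 80 consecutive images each).\n",
+    "with open('jpg/dataset.csv', 'w') as f:\n",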
+    "    c = 0\n",
+    "    for i in range(1360):\n",
+    "        f.write(\"jpg/image_%04i.jpg,%i\\n\" % (i+1, c))\n",
+    "        if (i+1) % 80 == 0:\n",
+    "            c += 1"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Load Images\n",
+    "with open(\"jpg/dataset.csv\") as f:\n",
+    "    dataset_file = f.read().splitlines()\n",
+    "\n",
+    "# Load the whole dataset file, and slice each line.\n",
+    "data = tf.data.Dataset.from_tensor_slices(dataset_file)\n",
+    "# Refill data indefinitely.\n",
+    "data = data.repeat()\n",
+    "# Shuffle data.\n",
+    "data = data.shuffle(buffer_size=1000)\n",
+    "\n",
+    "# Load and pre-process images.\n",
+    "def load_image(path):\n",
+    "    # Read image from path.\n",
+    "    image = tf.io.read_file(path)\n",
+    "    # Decode the jpeg image to array [0, 255].\n",
+    "    image = tf.image.decode_jpeg(image)\n",
+    "    # Resize images to a common size of 256x256.\n",
+    "    image = tf.image.resize(image, [256, 256])\n",
+    "    # Rescale values to [-1, 1] (note: `1 - x / 127.5` maps 0 to 1 and 255 to -1).\n",
+    "    image = 1. - image / 127.5\n",
+    "    return image\n",
+    "# Decode each line from the dataset file.\n",
+    "def parse_records(line):\n",
+    "    # File is in csv format: \"image_path,label_id\".\n",
+    "    # TensorFlow requires a default value, but it will never be used.\n",
+    "    image_path, image_label = tf.io.decode_csv(line, [\"\", 0])\n",
+    "    # Apply the function to load images.\n",
+    "    image = load_image(image_path)\n",
+    "    return image, image_label\n",
+    "# Use 'map' to apply the above functions in parallel.\n",
+    "data = data.map(parse_records, num_parallel_calls=4)\n",
+    "\n",
+    "# Batch data (aggregate image arrays together).\n",
+    "data = data.batch(batch_size=2)\n",
+    "# Prefetch batch (pre-load batch for faster consumption).\n",
+    "data = data.prefetch(buffer_size=1)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "scrolled": false
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "tf.Tensor(\n",
+      "[[[[-0.90260804 -0.9550551  -0.9444355 ]\n",
+      "   [-0.9538603  -0.9715073  -0.9136642 ]\n",
+      "   [-0.41687727 -0.37570083 -0.25462234]\n",
+      "   ...\n",
+      "   [ 0.4617647   0.422549    0.3754902 ]\n",
+      "   [ 0.4934436   0.45422792  0.4071691 ]\n",
+      "   [ 0.5530829   0.5138672   0.46680838]]\n",
+      "\n",
+      "  [[-0.9301815  -0.98563874 -0.9595933 ]\n",
+      "   [-0.9379289  -0.95557594 -0.89773285]\n",
+      "   [-0.68581116 -0.6446346  -0.5305033 ]\n",
+      "   ...\n",
+      "   [ 0.46960783  0.43039215  0.38333333]\n",
+      "   [ 0.5009191   0.46170342  0.4146446 ]\n",
+      "   [ 0.56071925  0.52150357  0.4744447 ]]\n",
+      "\n",
+      "  [[-0.9480392  -0.9862745  -0.96889937]\n",
+      "   [-0.93367803 -0.9485103  -0.8916054 ]\n",
+      "   [-0.9224341  -0.9033165  -0.7915518 ]\n",
+      "   ...\n",
+      "   [ 0.48045343  0.44123775  0.39417893]\n",
+      "   [ 0.51623774  0.47702205  0.42996323]\n",
+      "   [ 0.5740809   0.5348652   0.48780638]]\n",
+      "\n",
+      "  ...\n",
+      "\n",
+      "  [[ 0.0824219   0.37201285  0.5615885 ]\n",
+      "   [ 0.09744179  0.3858226   0.57758886]\n",
+      "   [ 0.1170305   0.4023859   0.59906554]\n",
+      "   ...\n",
+      "   [ 0.02599955  0.65661     0.7460593 ]\n",
+      "   [-0.0751493   0.6735256   0.7022212 ]\n",
+      "   [-0.06794965  0.73861444  0.7482958 ]]\n",
+      "\n",
+      "  [[ 0.10942864  0.39136028  0.5135914 ]\n",
+      "   [ 0.18471968  0.4658088   0.5954542 ]\n",
+      "   [ 0.21578586  0.4813496   0.6320619 ]\n",
+      "   ...\n",
+      "   [ 0.22432214  0.676777    0.8324946 ]\n",
+      "   [ 0.10089612  0.73174024  0.7959444 ]\n",
+      "   [ 0.00907248  0.74025357  0.7495098 ]]\n",
+      "\n",
+      "  [[ 0.15197992  0.43433285  0.54413676]\n",
+      "   [ 0.20049018  0.48284316  0.60343134]\n",
+      "   [ 0.2664752   0.5252987   0.6713772 ]\n",
+      "   ...\n",
+      "   [ 0.24040669  0.6644263   0.8296224 ]\n",
+      "   [ 0.10060894  0.7192364   0.78786385]\n",
+      "   [ 0.05363435  0.77765393  0.78206575]]]\n",
+      "\n",
+      "\n",
+      " [[[-0.49571514 -0.2133621   0.6807555 ]\n",
+      "   [-0.52243936 -0.2322433   0.66971743]\n",
+      "   [-0.5502666  -0.24438429  0.6732628 ]\n",
+      "   ...\n",
+      "   [-0.61084557 -0.22653186  0.7019608 ]\n",
+      "   [-0.60784316 -0.21568632  0.65843004]\n",
+      "   [-0.6197916  -0.22585356  0.6411722 ]]\n",
+      "\n",
+      "  [[-0.5225973  -0.24024439  0.6538732 ]\n",
+      "   [-0.54144406 -0.26501226  0.64094764]\n",
+      "   [-0.56139374 -0.27119768  0.6341878 ]\n",
+      "   ...\n",
+      "   [-0.6186887  -0.22824419  0.67053366]\n",
+      "   [-0.59662986 -0.22015929  0.6358456 ]\n",
+      "   [-0.6119485  -0.23387194  0.6130515 ]]\n",
+      "\n",
+      "  [[-0.54999995 -0.26764703  0.61539805]\n",
+      "   [-0.56739867 -0.28504562  0.6056473 ]\n",
+      "   [-0.58733106 -0.297135    0.5988358 ]\n",
+      "   ...\n",
+      "   [-0.62097263 -0.22653186  0.62466395]\n",
+      "   [-0.60171235 -0.21739864  0.5984136 ]\n",
+      "   [-0.614951   -0.23063731  0.579271  ]]\n",
+      "\n",
+      "  ...\n",
+      "\n",
+      "  [[-0.49420047 -0.25567698 -0.29812205]\n",
+      "   [-0.5336498  -0.31243873 -0.34749448]\n",
+      "   [-0.5600954  -0.35433567 -0.38869584]\n",
+      "   ...\n",
+      "   [ 0.4558211   0.22837007  0.47150737]\n",
+      "   [ 0.49019605  0.24705881  0.4980392 ]\n",
+      "   [ 0.5021446   0.25900733  0.5099877 ]]\n",
+      "\n",
+      "  [[-0.50617576 -0.29696214 -0.31009734]\n",
+      "   [-0.47532892 -0.28324962 -0.28901553]\n",
+      "   [-0.45759463 -0.28628123 -0.28675795]\n",
+      "   ...\n",
+      "   [ 0.46366423  0.2362132   0.4793505 ]\n",
+      "   [ 0.4980392   0.25490195  0.5058824 ]\n",
+      "   [ 0.5099877   0.26685047  0.51783085]]\n",
+      "\n",
+      "  [[-0.45882356 -0.254902   -0.26274514]\n",
+      "   [-0.4185791  -0.23034382 -0.23034382]\n",
+      "   [-0.37365198 -0.21194851 -0.20410538]\n",
+      "   ...\n",
+      "   [ 0.46366423  0.2362132   0.4793505 ]\n",
+      "   [ 0.4980392   0.25490195  0.5058824 ]\n",
+      "   [ 0.5099877   0.26685047  0.51783085]]]], shape=(2, 256, 256, 3), dtype=float32) tf.Tensor([8 8], shape=(2,), dtype=int32)\n"
+     ]
+    }
+   ],
+   "source": [
+    "for batch_x, batch_y in data.take(1):\n",
+    "    print(batch_x, batch_y)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Load data from a Generator"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Create a dummy generator.\n",
+    "def generate_features():\n",
+    "    # Function to generate a random string.\n",
+    "    def random_string(length):\n",
+    "        return ''.join(random.choice(string.ascii_letters) for _ in range(length))\n",
+    "    # Return a random string, a random vector, and a random int.\n",
+    "    yield random_string(4), np.random.uniform(size=4), random.randint(0, 10)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Build a dataset from the generator using the tf.data API with `from_generator`.\n",
+    "data = tf.data.Dataset.from_generator(generate_features, output_types=(tf.string, tf.float32, tf.int32))\n",
+    "# Refill data indefinitely.\n",
+    "data = data.repeat()\n",
+    "# Shuffle data.\n",
+    "data = data.shuffle(buffer_size=100)\n",
+    "# Batch data (aggregate records together).\n",
+    "data = data.batch(batch_size=4)\n",
+    "# Prefetch batch (pre-load batch for faster consumption).\n",
+    "data = data.prefetch(buffer_size=1)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "tf.Tensor(['snDw' 'NvMp' 'sXsw' 'qwuk'], shape=(4,), dtype=string) tf.Tensor(\n",
+      "[[0.22296238 0.03515657 0.3893014  0.6875752 ]\n",
+      " [0.05003363 0.27605608 0.23262134 0.10671499]\n",
+      " [0.8992419  0.34516433 0.29739627 0.8413017 ]\n",
+      " [0.91913974 0.7142106  0.48333576 0.04300505]], shape=(4, 4), dtype=float32) tf.Tensor([ 2 10  4  1], shape=(4,), dtype=int32)\n",
+      "tf.Tensor(['vdUx' 'InFi' 'nLzy' 'oklE'], shape=(4,), dtype=string) tf.Tensor(\n",
+      "[[0.6512162  0.8695475  0.7012295  0.6849636 ]\n",
+      " [0.00812997 0.01264008 0.7774404  0.44849646]\n",
+      " [0.92055863 0.894824   0.3628448  0.85603875]\n",
+      " [0.32219294 0.9767527  0.0307372  0.12051418]], shape=(4, 4), dtype=float32) tf.Tensor([9 7 4 0], shape=(4,), dtype=int32)\n",
+      "tf.Tensor(['ULGI' 'dBbm' 'URgs' 'Pkpt'], shape=(4,), dtype=string) tf.Tensor(\n",
+      "[[0.39586228 0.7472     0.3759462  0.9277406 ]\n",
+      " [0.44489694 0.38694733 0.9592599  0.82675934]\n",
+      " [0.12597603 0.299358   0.6940909  0.34155408]\n",
+      " [0.3401377  0.97620344 0.6047712  0.51667166]], shape=(4, 4), dtype=float32) tf.Tensor([ 4 10  0  0], shape=(4,), dtype=int32)\n",
+      "tf.Tensor(['kvao' 'wWvG' 'vrzf' 'cMgG'], shape=(4,), dtype=string) tf.Tensor(\n",
+      "[[0.8090979  0.65837437 0.9732402  0.9298921 ]\n",
+      " [0.67059356 0.91655296 0.52894515 0.8964492 ]\n",
+      " [0.05753202 0.45829964 0.74948853 0.41164723]\n",
+      " [0.42602295 0.8696292  0.57220364 0.9475169 ]], shape=(4, 4), dtype=float32) tf.Tensor([6 7 6 2], shape=(4,), dtype=int32)\n",
+      "tf.Tensor(['kyLQ' 'kxbI' 'CkQD' 'PHlJ'], shape=(4,), dtype=string) tf.Tensor(\n",
+      "[[0.29089147 0.6438517  0.31005543 0.31286424]\n",
+      " [0.0937152  0.8887667  0.24011584 0.25746483]\n",
+      " [0.47577712 0.53731906 0.9178111  0.3249844 ]\n",
+      " [0.38328    0.39294246 0.08126572 0.5995307 ]], shape=(4, 4), dtype=float32) tf.Tensor([3 1 3 2], shape=(4,), dtype=int32)\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Display data.\n",
+    "for batch_str, batch_vector, batch_int in data.take(5):\n",
+    "    print(batch_str, batch_vector, batch_int)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 2",
+   "language": "python",
+   "name": "python2"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 2
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython2",
+   "version": "2.7.15"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
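Every pipeline in the notebook above calls `shuffle(buffer_size=...)`. As a rough illustration of what a shuffle buffer does, here is a minimal pure-Python sketch. It is not TensorFlow's implementation: the helper names `shuffle_buffer` and `batch` are hypothetical, and the sketch only assumes the documented behaviour of filling a fixed-size buffer and then drawing elements from it at random.

```python
import random


def shuffle_buffer(items, buffer_size, seed=None):
    """Yield items in pseudo-random order using a fixed-size buffer.

    Hypothetical helper mimicking the idea behind a shuffle buffer:
    fill the buffer, then swap each incoming item with a random slot.
    """
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            # Still filling the buffer: nothing is emitted yet.
            buffer.append(item)
            continue
        # Emit a random buffered item and put the new one in its slot.
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Input exhausted: emit what is left in random order.
    rng.shuffle(buffer)
    for item in buffer:
        yield item


def batch(items, batch_size):
    """Group consecutive items into lists of `batch_size` (hypothetical helper)."""
    current = []
    for item in items:
        current.append(item)
        if len(current) == batch_size:
            yield current
            current = []
    if current:
        yield current


# Same toy data as the notebook: even numbers labelled 0, odd numbers labelled 1.
evens = [(n, 0) for n in range(0, 100, 2)]
odds = [(n, 1) for n in range(1, 100, 2)]
records = evens + odds

shuffled = list(shuffle_buffer(records, buffer_size=100, seed=0))
batches = list(batch(shuffled, batch_size=4))
print(batches[:2])
```

With a buffer as large as the dataset, the order is fully random; with `buffer_size=1` the input order is preserved. In the notebook, `repeat()` comes before `shuffle()`, so the buffer can hold records from two consecutive passes over the data, which is likely why the value 20 shows up in two adjacent batches of the recorded output.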

+ 244 - 0
tensorflow_v2/notebooks/5_DataManagement/tfrecords.ipynb

@@ -0,0 +1,244 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Create and Load TFRecords\n",
+    "\n",
+    "A simple TensorFlow 2.0 example to parse a dataset into TFRecord format, and then read that dataset.\n",
+    "\n",
+    "In this example, the Titanic Dataset (in CSV format) will be used as a toy dataset, for parsing all the dataset features into TFRecord format, and then building an input pipeline that can be used for training models.\n",
+    "\n",
+    "- Author: Aymeric Damien\n",
+    "- Project: https://github.com/aymericdamien/TensorFlow-Examples/"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Titanic Dataset\n",
+    "\n",
+    "The Titanic dataset is a popular ML dataset that lists the passengers aboard the Titanic, along with features such as their age, sex, and class (1st, 2nd, 3rd), and whether each passenger survived the disaster.\n",
+    "\n",
+    "It can be used to show that, although some luck was involved in surviving the sinking, some groups of people, such as women, children, and the upper class, were more likely to survive than others.\n",
+    "\n",
+    "#### Overview\n",
+    "survived|pclass|name|sex|age|sibsp|parch|ticket|fare\n",
+    "--------|------|----|---|---|-----|-----|------|----\n",
+    "1|1|\"Allen, Miss. Elisabeth Walton\"|female|29|0|0|24160|211.3375\n",
+    "1|1|\"Allison, Master. Hudson Trevor\"|male|0.9167|1|2|113781|151.5500\n",
+    "0|1|\"Allison, Miss. Helen Loraine\"|female|2|1|2|113781|151.5500\n",
+    "0|1|\"Allison, Mr. Hudson Joshua Creighton\"|male|30|1|2|113781|151.5500\n",
+    "...|...|...|...|...|...|...|...|...\n",
+    "\n",
+    "\n",
+    "#### Variable Descriptions\n",
+    "```\n",
+    "survived        Survived\n",
+    "                (0 = No; 1 = Yes)\n",
+    "pclass          Passenger Class\n",
+    "                (1 = 1st; 2 = 2nd; 3 = 3rd)\n",
+    "name            Name\n",
+    "sex             Sex\n",
+    "age             Age\n",
+    "sibsp           Number of Siblings/Spouses Aboard\n",
+    "parch           Number of Parents/Children Aboard\n",
+    "ticket          Ticket Number\n",
+    "fare            Passenger Fare\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from __future__ import absolute_import, division, print_function\n",
+    "\n",
+    "import csv\n",
+    "import requests\n",
+    "import tensorflow as tf"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Download Titanic dataset (in csv format).\n",
+    "d = requests.get(\"https://raw.githubusercontent.com/tflearn/tflearn.github.io/master/resources/titanic_dataset.csv\")\n",
+    "with open(\"titanic_dataset.csv\", \"wb\") as f:\n",
+    "    f.write(d.content)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Create TFRecords"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Generate Integer Features.\n",
+    "def build_int64_feature(data):\n",
+    "    return tf.train.Feature(int64_list=tf.train.Int64List(value=[data]))\n",
+    "\n",
+    "# Generate Float Features.\n",
+    "def build_float_feature(data):\n",
+    "    return tf.train.Feature(float_list=tf.train.FloatList(value=[data]))\n",
+    "\n",
+    "# Generate String Features (`BytesList` expects bytes, so text is encoded first).\n",
+    "def build_string_feature(data):\n",
+    "    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[tf.compat.as_bytes(data)]))\n",
+    "\n",
+    "# Generate a TF `Example`, parsing all features of the dataset.\n",
+    "def convert_to_tfexample(survived, pclass, name, sex, age, sibsp, parch, ticket, fare):\n",
+    "    return tf.train.Example(\n",
+    "        features=tf.train.Features(\n",
+    "            feature={\n",
+    "                'survived': build_int64_feature(survived),\n",
+    "                'pclass': build_int64_feature(pclass),\n",
+    "                'name': build_string_feature(name),\n",
+    "                'sex': build_string_feature(sex),\n",
+    "                'age': build_float_feature(age),\n",
+    "                'sibsp': build_int64_feature(sibsp),\n",
+    "                'parch': build_int64_feature(parch),\n",
+    "                'ticket': build_string_feature(ticket),\n",
+    "                'fare': build_float_feature(fare),\n",
+    "            })\n",
+    "    )"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Open dataset file.\n",
+    "with open(\"titanic_dataset.csv\") as f:\n",
+    "    # Output TFRecord file.\n",
+    "    with tf.io.TFRecordWriter(\"titanic_dataset.tfrecord\") as w:\n",
+    "        # Generate a TF Example for each row in our dataset.\n",
+    "        # CSV reader will read and parse all rows.\n",
+    "        reader = csv.reader(f, skipinitialspace=True)\n",
+    "        for i, record in enumerate(reader):\n",
+    "            # Skip header.\n",
+    "            if i == 0:\n",
+    "                continue\n",
+    "            survived, pclass, name, sex, age, sibsp, parch, ticket, fare = record\n",
+    "            # Parse each csv row to TF Example using the above functions.\n",
+    "            example = convert_to_tfexample(int(survived), int(pclass), name, sex, float(age), int(sibsp), int(parch), ticket, float(fare))\n",
+    "            # Serialize each TF Example to string, and write to TFRecord file.\n",
+    "            w.write(example.SerializeToString())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Load TFRecords"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Build features template, with types.\n",
+    "features = {\n",
+    "    'survived': tf.io.FixedLenFeature([], tf.int64),\n",
+    "    'pclass': tf.io.FixedLenFeature([], tf.int64),\n",
+    "    'name': tf.io.FixedLenFeature([], tf.string),\n",
+    "    'sex': tf.io.FixedLenFeature([], tf.string),\n",
+    "    'age': tf.io.FixedLenFeature([], tf.float32),\n",
+    "    'sibsp': tf.io.FixedLenFeature([], tf.int64),\n",
+    "    'parch': tf.io.FixedLenFeature([], tf.int64),\n",
+    "    'ticket': tf.io.FixedLenFeature([], tf.string),\n",
+    "    'fare': tf.io.FixedLenFeature([], tf.float32),\n",
+    "}"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Load TFRecord data.\n",
+    "filenames = [\"titanic_dataset.tfrecord\"]\n",
+    "data = tf.data.TFRecordDataset(filenames)\n",
+    "\n",
+    "# Parse features, using the above template.\n",
+    "def parse_record(record):\n",
+    "    return tf.io.parse_single_example(record, features=features)\n",
+    "# Apply the parsing to each record from the dataset.\n",
+    "data = data.map(parse_record)\n",
+    "\n",
+    "# Refill data indefinitely.\n",
+    "data = data.repeat()\n",
+    "# Shuffle data.\n",
+    "data = data.shuffle(buffer_size=1000)\n",
+    "# Batch data (aggregate records together).\n",
+    "data = data.batch(batch_size=4)\n",
+    "# Prefetch batch (pre-load batch for faster consumption).\n",
+    "data = data.prefetch(buffer_size=1)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[0 1 0 0]\n",
+      "['Gallagher, Mr. Martin' 'Fortune, Miss. Mabel Helen'\n",
+      " 'Andersson, Mr. Johan Samuel' 'Jensen, Mr. Niels Peder']\n",
+      "[  7.7417 263.       7.775    7.8542]\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Fetch one batch and display a few of its features.\n",
+    "for record in data.take(1):\n",
+    "    print(record['survived'].numpy())\n",
+    "    print(record['name'].numpy())\n",
+    "    print(record['fare'].numpy())"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 2",
+   "language": "python",
+   "name": "python2"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 2
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython2",
+   "version": "2.7.15"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
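The "Create TFRecords" cell above relies on `csv.reader` to split each row and on explicit casts to give every field the type that the matching feature builder expects. The sketch below isolates that parsing step with the standard library only, so it can be checked without TensorFlow. The inline `SAMPLE_CSV` text and the `parse_row` helper are assumptions for illustration; the sample mirrors the first two rows of the overview table and is not read from the real file.

```python
import csv
import io

# Assumption: a tiny inline sample with the same columns as titanic_dataset.csv.
SAMPLE_CSV = '''survived,pclass,name,sex,age,sibsp,parch,ticket,fare
1,1,"Allen, Miss. Elisabeth Walton",female,29,0,0,24160,211.3375
1,1,"Allison, Master. Hudson Trevor",male,0.9167,1,2,113781,151.5500
'''


def parse_row(record):
    """Cast one CSV row to the types used by the TFRecord feature builders.

    Hypothetical helper: ints for int64 features, floats for float
    features, and plain strings for the bytes features.
    """
    survived, pclass, name, sex, age, sibsp, parch, ticket, fare = record
    return (int(survived), int(pclass), name, sex, float(age),
            int(sibsp), int(parch), ticket, float(fare))


# Quoted names contain commas, which csv.reader keeps as a single field.
reader = csv.reader(io.StringIO(SAMPLE_CSV), skipinitialspace=True)
next(reader)  # Skip the header row.
rows = [parse_row(record) for record in reader]

for row in rows:
    print(row)
```

Keeping the casts in one place makes it easier to spot rows that fail to convert (for example an empty `age` field) before they reach the TFRecord writer.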