
Added STREET model for FSNS dataset

Ray Smith 9 years ago
parent commit
4157e58ec0

+ 234 - 0
street/README.md

@@ -0,0 +1,234 @@
+# StreetView Tensorflow Recurrent End-to-End Transcription (STREET) Model.
+
+A TensorFlow implementation of the STREET model described in the paper:
+
+"End-to-End Interpretation of the French Street Name Signs Dataset"
+
+Raymond Smith, Chunhui Gu, Dar-Shyang Lee, Huiyi Hu, Ranjith
+Unnikrishnan, Julian Ibarz, Sacha Arnoud, Sophia Lin.
+
+*International Workshop on Robust Reading, Amsterdam, 9 October 2016.*
+
+Available at: http://link.springer.com/chapter/10.1007%2F978-3-319-46604-0_30
+
+
+## Contact
+***Author:*** Ray Smith (rays@google.com).
+
+***Pull requests and issues:*** @theraysmith.
+
+## Contents
+* [Introduction](#introduction)
+* [Installing and setting up the STREET model](#installing-and-setting-up-the-street-model)
+* [Downloading the datasets](#downloading-the-datasets)
+* [Confidence Tests](#confidence-tests)
+* [Training a full FSNS model](#training-a-full-fsns-model)
+* [The Variable Graph Specification Language](#the-variable-graph-specification-language)
+
+## Introduction
+
+The *STREET* model is a deep recurrent neural network that learns how to
+identify the name of a street (in France) from an image containing up to four
+different views of the street name sign. The model merges information from the
+different views and normalizes the text to the correct format. For example:
+
+<center>
+![Example image](g3doc/avdessapins.png)
+
+Avenue des Sapins
+</center>
+
+
+## Installing and setting up the STREET model
+[Install Tensorflow](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/get_started/os_setup.md#virtualenv-installation)
+
+Install numpy:
+
+```
+sudo pip install numpy
+```
+
+Build the LSTM op:
+
+```
+cd cc
+TF_INC=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())')
+g++ -std=c++11 -shared rnn_ops.cc -o rnn_ops.so -fPIC -I $TF_INC -O3 -mavx
+```
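+
+If the build succeeded, the resulting `rnn_ops.so` can be loaded directly from
+Python. A minimal sketch of a smoke test, assuming you run it from the
+`python` directory (the relative path, shapes and zero-valued tensors here are
+illustrative only):
+
+```python
+import numpy as np
+import tensorflow as tf
+
+# load_op_library exposes the VariableLSTM op from cc/rnn_ops.cc as a
+# generated snake_case Python wrapper.
+rnn_ops = tf.load_op_library('../cc/rnn_ops.so')
+
+# Shapes follow the op doc: input is [batch, seq_len, 4, num_nodes].
+x = np.zeros((2, 5, 4, 3), dtype=np.float32)  # pre-transformed input
+h0 = np.zeros((2, 3), dtype=np.float32)       # initial state
+c0 = np.zeros((2, 3), dtype=np.float32)       # initial memory
+w = np.zeros((3, 4, 3), dtype=np.float32)     # recurrent weights
+with tf.Session() as sess:
+  act, raw, mem = sess.run(rnn_ops.variable_lstm(x, h0, c0, w))
+  print(act.shape)  # (2, 5, 3)
+```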
+
+Run the unittests:
+
+```
+cd ../python
+python decoder_test.py
+python errorcounter_test.py
+python shapes_test.py
+python vgslspecs_test.py
+python vgsl_model_test.py
+```
+
+## Downloading the datasets
+
+The French Street Name Signs (FSNS) datasets can be downloaded from:
+`https://download.tensorflow.org/data/fsns-20160927`
+Note that these datasets are very large. The approximate sizes are:
+
+*   Train: 512 files of 300MB each.
+*   Validation: 64 files of 40MB each.
+*   Test: 64 files of 50MB each.
+*   Testdata: some smaller data files of a few MB for testing.
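+
+Any HTTP client will do for fetching the shards. A hedged Python 2 sketch
+(matching the `xrange`-era code in this package); the per-shard file names are
+an assumption based on the shard counts above, so verify them against the
+download location before relying on this:
+
+```python
+import os
+import urllib
+
+BASE = 'https://download.tensorflow.org/data/fsns-20160927'
+out_dir = '../data/train'
+if not os.path.exists(out_dir):
+  os.makedirs(out_dir)
+for i in xrange(512):
+  # Assumed naming: train-00000-of-00512 ... train-00511-of-00512.
+  name = 'train-%05d-of-00512' % i
+  urllib.urlretrieve('%s/train/%s' % (BASE, name), os.path.join(out_dir, name))
+```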
+
+## Confidence Tests
+
+The datasets download includes a directory `testdata` that contains some small
+datasets that are big enough to test that models can actually learn something.
+Assuming that you have put the downloads in directory `data` alongside
+`python` then you can run the following tests:
+
+### Mnist for zero-dimensional data
+
+```
+cd python
+train_dir=/tmp/mnist
+rm -rf $train_dir
+python vgsl_train.py --model_str='16,0,0,1[Ct5,5,16 Mp3,3 Lfys32 Lfxs64]O0s12' \
+  --max_steps=1024 --train_data=../data/testdata/mnist-sample-00000-of-00001 \
+  --initial_learning_rate=0.001 --final_learning_rate=0.001 \
+  --num_preprocess_threads=1 --train_dir=$train_dir
+python vgsl_eval.py --model_str='16,0,0,1[Ct5,5,16 Mp3,3 Lfys32 Lfxs64]O0s12' \
+  --num_steps=256 --eval_data=../data/testdata/mnist-sample-00000-of-00001 \
+  --num_preprocess_threads=1 --decoder=../testdata/numbers.charset_size=12.txt \
+  --eval_interval_secs=0 --train_dir=$train_dir --eval_dir=$train_dir/eval
+```
+
+Depending on your machine, this should run in about 1 minute, and should obtain
+error rates below 50%. Actual error rates will vary according to random
+initialization.
+
+### Fixed-length targets for number recognition
+
+```
+cd python
+train_dir=/tmp/fixed
+rm -rf $train_dir
+python vgsl_train.py --model_str='8,16,0,1[S1(1x16)1,3 Lfx32 Lrx32 Lfx32]O1s12' \
+  --max_steps=3072 --train_data=../data/testdata/numbers-16-00000-of-00001 \
+  --initial_learning_rate=0.001 --final_learning_rate=0.001 \
+  --num_preprocess_threads=1 --train_dir=$train_dir
+python vgsl_eval.py --model_str='8,16,0,1[S1(1x16)1,3 Lfx32 Lrx32 Lfx32]O1s12' \
+  --num_steps=256 --eval_data=../data/testdata/numbers-16-00000-of-00001 \
+  --num_preprocess_threads=1 --decoder=../testdata/numbers.charset_size=12.txt \
+  --eval_interval_secs=0 --train_dir=$train_dir --eval_dir=$train_dir/eval
+```
+
+Depending on your machine, this should run in about 1-2 minutes, and should
+obtain a label error rate between 50 and 80%, with word error rates probably
+not dropping below 100%. Actual error rates will vary according to random
+initialization.
+
+### OCR-style data with CTC
+
+```
+cd python
+train_dir=/tmp/ctc
+rm -rf $train_dir
+python vgsl_train.py --model_str='1,32,0,1[S1(1x32)1,3 Lbx100]O1c105' \
+  --max_steps=4096 --train_data=../data/testdata/arial-32-00000-of-00001 \
+  --initial_learning_rate=0.001 --final_learning_rate=0.001 \
+  --num_preprocess_threads=1 --train_dir=$train_dir &
+python vgsl_eval.py --model_str='1,32,0,1[S1(1x32)1,3 Lbx100]O1c105' \
+  --num_steps=256 --eval_data=../data/testdata/arial-32-00000-of-00001 \
+  --num_preprocess_threads=1 --decoder=../testdata/arial.charset_size=105.txt \
+  --eval_interval_secs=15 --train_dir=$train_dir --eval_dir=$train_dir/eval &
+tensorboard --logdir=$train_dir
+```
+
+Depending on your machine, the background training should run for about 3-4
+minutes, and should obtain a label error rate between 10 and 50%, with
+correspondingly higher word error rates and even higher sequence error rate.
+Actual error rates will vary according to random initialization.
+The background eval will run forever, and will have to be terminated by hand.
+The tensorboard command runs a visualizer that can be viewed in a browser.
+Go to the link that it prints to see the training progress. See the
+[Tensorboard](https://www.tensorflow.org/versions/r0.10/how_tos/summaries_and_tensorboard/index.html)
+introduction for more information.
+
+
+### Mini FSNS dataset
+
+You can test the actual STREET model on a small FSNS data set. The model will
+overfit to this small dataset, but will give some confidence that everything
+is working correctly. *Note* that this test runs the training and evaluation
+in parallel, as you should when training any substantial system, so you can
+monitor progress.
+
+
+```
+cd python
+train_dir=/tmp/fsns
+rm -rf $train_dir
+python vgsl_train.py --max_steps=10000 --num_preprocess_threads=1 \
+  --train_data=../data/testdata/fsns-00000-of-00001 \
+  --initial_learning_rate=0.0001 --final_learning_rate=0.0001 \
+  --train_dir=$train_dir &
+python vgsl_eval.py --num_steps=256 --num_preprocess_threads=1 \
+   --eval_data=../data/testdata/fsns-00000-of-00001 \
+   --decoder=../testdata/charset_size=134.txt \
+   --eval_interval_secs=300 --train_dir=$train_dir --eval_dir=$train_dir/eval &
+tensorboard --logdir=$train_dir
+```
+
+Depending on your machine, the training should finish in about 1-2 *hours*.
+As with the CTC testset above, the eval and tensorboard will have to be
+terminated manually.
+
+## Training a full FSNS model
+
+After running the tests above, you are ready to train the real thing!
+*Note* that you might want to use a train_dir somewhere other than /tmp:
+you can stop the training, reboot if needed, and continue as long as you keep
+the data intact, but /tmp is deleted on reboot.
+
+```
+cd python
+train_dir=/tmp/fsns
+rm -rf $train_dir
+python vgsl_train.py --max_steps=100000000 --train_data=../data/train/train* \
+  --train_dir=$train_dir &
+python vgsl_eval.py --num_steps=1000 \
+  --eval_data=../data/validation/validation* \
+  --decoder=../testdata/charset_size=134.txt \
+  --eval_interval_secs=300 --train_dir=$train_dir --eval_dir=$train_dir/eval &
+tensorboard --logdir=$train_dir
+```
+
+Training will take a very long time (probably many weeks) to reach minimum
+error rate on a single machine, although it will probably take substantially
+fewer iterations than with parallel training. Faster training can be obtained
+with parallel training on a cluster.
+Since the setup is likely to be very site-specific, please see the TensorFlow
+documentation on
+[Distributed TensorFlow](https://www.tensorflow.org/versions/r0.10/how_tos/distributed/index.html)
+for more information. Some code changes may be needed in the `Train` function
+in `vgsl_model.py`.
+
+With 40 parallel training workers, nearly optimal error rates (about 25%
+sequence error on the validation set) are obtained in about 30 million steps,
+although the error continues to fall slightly over the next 30 million, to
+perhaps as low as 23%.
+
+With a single machine the number of steps could be substantially lower.
+Although untested on this problem, on other problems the ratio is typically
+5 to 1 so low error rates could be obtained as soon as 6 million iterations,
+which could be reached in about 4 weeks.
+
+
+## The Variable Graph Specification Language
+
+The STREET model makes use of a graph specification language (VGSL) that
+enables rapid experimentation with different model architectures. The language
+defines a TensorFlow graph that can be used to process images of variable sizes
+to output a 1-dimensional sequence, like a transcription/OCR problem, or a
+0-dimensional label, as for image identification problems. For more information
+see [vgslspecs](g3doc/vgslspecs.md).
+

+ 538 - 0
street/cc/rnn_ops.cc

@@ -0,0 +1,538 @@
+/* Copyright 2016 The TensorFlow Authors. All Rights Reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+==============================================================================*/
+
+// OpKernel of LSTM Neural Networks:
+//
+//   LSTM: VariableLSTMOp (VariableLSTMGradOp)
+//
+// where (.*) are the ops to compute gradients for the corresponding ops.
+
+#define EIGEN_USE_THREADS
+
+#include <vector>
+#ifdef GOOGLE_INCLUDES
+#include "third_party/eigen3/Eigen/Core"
+#include "third_party/tensorflow/core/framework/op.h"
+#include "third_party/tensorflow/core/framework/op_kernel.h"
+#include "third_party/tensorflow/core/framework/tensor.h"
+#else
+#include "Eigen/Core"
+#include "tensorflow/core/framework/op.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/tensor.h"
+#endif  // GOOGLE_INCLUDES
+
+namespace tensorflow {
+
+using Eigen::array;
+using Eigen::DenseIndex;
+using IndexPair = Eigen::IndexPair<int>;
+
+Status AreDimsEqual(int dim1, int dim2, const string& message) {
+  if (dim1 != dim2) {
+    return errors::InvalidArgument(message, ": ", dim1, " vs. ", dim2);
+  }
+  return Status::OK();
+}
+
+// ------------------------------- VariableLSTMOp -----------------------------
+
+// Kernel to compute the forward propagation of a Long Short-Term Memory
+// network. See the doc of the op below for more detail.
+class VariableLSTMOp : public OpKernel {
+ public:
+  explicit VariableLSTMOp(OpKernelConstruction* ctx) : OpKernel(ctx) {
+    OP_REQUIRES_OK(ctx, ctx->GetAttr("clip", &clip_));
+    OP_REQUIRES(
+        ctx, clip_ >= 0.0,
+        errors::InvalidArgument("clip_ needs to be greater than or equal to 0"));
+  }
+
+  void Compute(OpKernelContext* ctx) override {
+    // Inputs.
+    const auto input = ctx->input(0).tensor<float, 4>();
+    const auto initial_state = ctx->input(1).tensor<float, 2>();
+    const auto initial_memory = ctx->input(2).tensor<float, 2>();
+    const auto w_m_m = ctx->input(3).tensor<float, 3>();
+    const int batch_size = input.dimension(0);
+    const int seq_len = input.dimension(1);
+    const int output_dim = input.dimension(3);
+
+    // Sanity checks.
+    OP_REQUIRES_OK(ctx, AreDimsEqual(4, input.dimension(2), "Input num"));
+    OP_REQUIRES_OK(ctx, AreDimsEqual(batch_size, initial_state.dimension(0),
+                                     "State batch"));
+    OP_REQUIRES_OK(
+        ctx, AreDimsEqual(output_dim, initial_state.dimension(1), "State dim"));
+    OP_REQUIRES_OK(ctx, AreDimsEqual(batch_size, initial_memory.dimension(0),
+                                     "Memory batch"));
+    OP_REQUIRES_OK(ctx, AreDimsEqual(output_dim, initial_memory.dimension(1),
+                                     "Memory dim"));
+    OP_REQUIRES_OK(
+        ctx, AreDimsEqual(output_dim, w_m_m.dimension(0), "Weight dim 0"));
+    OP_REQUIRES_OK(ctx, AreDimsEqual(4, w_m_m.dimension(1), "Weight dim 1"));
+    OP_REQUIRES_OK(
+        ctx, AreDimsEqual(output_dim, w_m_m.dimension(2), "Weight dim 2"));
+
+    // Outputs.
+    Tensor* act_tensor = nullptr;
+    OP_REQUIRES_OK(ctx, ctx->allocate_output(
+                            0, {batch_size, seq_len, output_dim}, &act_tensor));
+    auto act = act_tensor->tensor<float, 3>();
+    act.setZero();
+
+    Tensor* gate_raw_act_tensor = nullptr;
+    OP_REQUIRES_OK(ctx,
+                   ctx->allocate_output(1, {batch_size, seq_len, 4, output_dim},
+                                        &gate_raw_act_tensor));
+    auto gate_raw_act = gate_raw_act_tensor->tensor<float, 4>();
+    gate_raw_act.setZero();
+
+    Tensor* memory_tensor = nullptr;
+    OP_REQUIRES_OK(ctx,
+                   ctx->allocate_output(2, {batch_size, seq_len, output_dim},
+                                        &memory_tensor));
+    auto memory = memory_tensor->tensor<float, 3>();
+    memory.setZero();
+
+    // Const and scratch tensors.
+    Tensor ones_tensor;
+    OP_REQUIRES_OK(ctx, ctx->allocate_temp(DT_FLOAT, {batch_size, output_dim},
+                                           &ones_tensor));
+    auto ones = ones_tensor.tensor<float, 2>();
+    ones.setConstant(1.0);
+
+    Tensor state_tensor;
+    OP_REQUIRES_OK(ctx, ctx->allocate_temp(DT_FLOAT, {batch_size, output_dim},
+                                           &state_tensor));
+    auto state = state_tensor.tensor<float, 2>();
+    state = initial_state;
+
+    Tensor scratch_tensor;
+    OP_REQUIRES_OK(ctx,
+                   ctx->allocate_temp(DT_FLOAT, {batch_size, 4, output_dim},
+                                      &scratch_tensor));
+    auto scratch = scratch_tensor.tensor<float, 3>();
+    scratch.setZero();
+
+    // Uses the most efficient order for the contraction depending on the batch
+    // size.
+
+    // This is the code shared by both cases. Implicit capture in lambda
+    // functions is discouraged, but it should be clear what is done here.
+    auto Forward = [&](int i) {
+      // Each pre-activation value is stored in the following order (See the
+      // comment of the op for the meaning):
+      //
+      //   i: 0
+      //   j: 1
+      //   f: 2
+      //   o: 3
+
+      // Adds one to the pre-activation values of the forget gate. This is a
+      // heuristic to make the training easier.
+      scratch.chip(2, 1) += ones;
+
+      gate_raw_act.chip(i, 1) = scratch;
+
+      // c_t = f_t * c_{t-1} + i_t * j_t
+      if (i == 0) {
+        state = initial_memory * scratch.chip(2, 1).sigmoid();
+      } else {
+        state = memory.chip(i - 1, 1) * scratch.chip(2, 1).sigmoid();
+      }
+      state += scratch.chip(0, 1).sigmoid() * scratch.chip(1, 1).tanh();
+
+      if (clip_ > 0.0) {
+        // Clips the values if required.
+        state = state.cwiseMax(-clip_).cwiseMin(clip_);
+      }
+
+      memory.chip(i, 1) = state;
+
+      // h_t = o_t * tanh(c_t)
+      state = scratch.chip(3, 1).sigmoid() * state.tanh();
+
+      act.chip(i, 1) = state;
+    };
+    if (batch_size == 1) {
+      // Reshapes the weight tensor to pretend as if it is a matrix
+      // multiplication which is more efficient.
+      auto w_m_m_r =
+          w_m_m.reshape(array<DenseIndex, 2>{output_dim, 4 * output_dim});
+      // Dimensions for the contraction.
+      const array<IndexPair, 1> m_m_dim = {IndexPair(1, 0)};
+      for (int i = 0; i < seq_len; ++i) {
+        // Computes the pre-activation value of the input and each gate.
+        scratch = input.chip(i, 1) +
+                  state.contract(w_m_m_r, m_m_dim)
+                      .reshape(array<DenseIndex, 3>{batch_size, 4, output_dim});
+        Forward(i);
+      }
+    } else {
+      // Shuffles the dimensions of the weight tensor to be efficient when used
+      // in the left-hand side. Allocates memory for the shuffled tensor for
+      // efficiency.
+      Tensor w_m_m_s_tensor;
+      OP_REQUIRES_OK(ctx,
+                     ctx->allocate_temp(DT_FLOAT, {output_dim * 4, output_dim},
+                                        &w_m_m_s_tensor));
+      auto w_m_m_s = w_m_m_s_tensor.tensor<float, 2>();
+      w_m_m_s = w_m_m.shuffle(array<int, 3>{2, 1, 0})
+                    .reshape(array<DenseIndex, 2>{output_dim * 4, output_dim});
+      // Dimensions for the contraction.
+      const array<IndexPair, 1> m_m_dim = {IndexPair(1, 1)};
+      for (int i = 0; i < seq_len; ++i) {
+        // Computes the pre-activation value of the input and each gate.
+        scratch = input.chip(i, 1) +
+                  w_m_m_s.contract(state, m_m_dim)
+                      .reshape(array<DenseIndex, 3>{output_dim, 4, batch_size})
+                      .shuffle(array<int, 3>{2, 1, 0});
+        Forward(i);
+      }
+    }
+  }
+
+ private:
+  // Threshold to clip the values of memory cells.
+  float clip_ = 0;
+};
+
+REGISTER_KERNEL_BUILDER(Name("VariableLSTM").Device(DEVICE_CPU),
+                        VariableLSTMOp);
+REGISTER_OP("VariableLSTM")
+    .Attr("clip: float = 0.0")
+    .Input("input: float32")
+    .Input("initial_state: float32")
+    .Input("initial_memory: float32")
+    .Input("w_m_m: float32")
+    .Output("activation: float32")
+    .Output("gate_raw_act: float32")
+    .Output("memory: float32")
+    .Doc(R"doc(
+Computes the forward propagation of a Long Short-Term Memory Network.
+
+It computes the following equation recursively for `0<t<=T`:
+
+  i_t  = sigmoid(a_{i,t})
+  j_t  = tanh(a_{j,t})
+  f_t  = sigmoid(a_{f,t} + 1.0)
+  o_t  = sigmoid(a_{o,t})
+  c_t  = f_t * c_{t-1} + i_t * j_t
+  c'_t = min(max(c_t, -clip), clip) if clip > 0 else c_t
+  h_t  = o_t * tanh(c'_t)
+
+where
+
+  a_{l,t} = w_{l,m,m} * h_{t-1} + x'_{l,t}
+
+where
+
+  x'_{l,t} = w_{l,m,i} * x_{t}.
+
+`input` corresponds to the concatenation of `X'_i`, `X'_j`, `X'_f`, and `X'_o`
+where `X'_l = (x'_{l,1}, x'_{l,2}, ..., x'_{l,T})`, `initial_state` corresponds
+to `h_{0}`, `initial_memory` corresponds to `c_{0}` and `weight` corresponds to
+`w_{l,m,m}`. `X'_l` (the transformed input) is computed outside of the op in
+advance, so w_{l,m,i} is not passed in to the op.
+
+`activation` corresponds to `H = (h_1, h_2, ..., h_T)`, `gate_raw_act`
+corresponds to the concatenation of `A_i`, `A_j`, `A_f` and `A_o`, and `memory`
+corresponds to `C = (c_1, c_2, ..., c_T)`.
+
+All entries in the batch are propagated to the end, and are assumed to be the
+same length.
+
+input: 4-D with shape `[batch_size, seq_len, 4, num_nodes]`
+initial_state: 2-D with shape `[batch_size, num_nodes]`
+initial_memory: 2-D with shape `[batch_size, num_nodes]`
+w_m_m: 3-D with shape `[num_nodes, 4, num_nodes]`
+activation: 3-D with shape `[batch_size, seq_len, num_nodes]`
+gate_raw_act: 4-D with shape `[batch_size, seq_len, 4, num_nodes]`
+memory: 3-D with shape `[batch_size, seq_len, num_nodes]`
+)doc");
+
+// ----------------------------- VariableLSTMGradOp ----------------------------
+
+// Kernel to compute the gradient of VariableLSTMOp.
+class VariableLSTMGradOp : public OpKernel {
+ public:
+  explicit VariableLSTMGradOp(OpKernelConstruction* ctx) : OpKernel(ctx) {}
+
+  void Compute(OpKernelContext* ctx) override {
+    // Inputs.
+    const auto initial_state = ctx->input(0).tensor<float, 2>();
+    const auto initial_memory = ctx->input(1).tensor<float, 2>();
+    const auto w_m_m = ctx->input(2).tensor<float, 3>();
+    const auto act = ctx->input(3).tensor<float, 3>();
+    const auto gate_raw_act = ctx->input(4).tensor<float, 4>();
+    const auto memory = ctx->input(5).tensor<float, 3>();
+    const auto act_grad = ctx->input(6).tensor<float, 3>();
+    const auto gate_raw_act_grad = ctx->input(7).tensor<float, 4>();
+    const auto memory_grad = ctx->input(8).tensor<float, 3>();
+    const int batch_size = act.dimension(0);
+    const int seq_len = act.dimension(1);
+    const int output_dim = act.dimension(2);
+
+    // Sanity checks.
+    OP_REQUIRES_OK(ctx, AreDimsEqual(batch_size, initial_state.dimension(0),
+                                     "State batch"));
+    OP_REQUIRES_OK(
+        ctx, AreDimsEqual(output_dim, initial_state.dimension(1), "State dim"));
+    OP_REQUIRES_OK(ctx, AreDimsEqual(batch_size, initial_memory.dimension(0),
+                                     "Memory batch"));
+    OP_REQUIRES_OK(ctx, AreDimsEqual(output_dim, initial_memory.dimension(1),
+                                     "Memory dim"));
+    OP_REQUIRES_OK(
+        ctx, AreDimsEqual(output_dim, w_m_m.dimension(0), "Weight dim 0"));
+    OP_REQUIRES_OK(ctx, AreDimsEqual(4, w_m_m.dimension(1), "Weight dim 1"));
+    OP_REQUIRES_OK(
+        ctx, AreDimsEqual(output_dim, w_m_m.dimension(2), "Weight dim 2"));
+    OP_REQUIRES_OK(ctx, AreDimsEqual(batch_size, gate_raw_act.dimension(0),
+                                     "Gate raw activation batch"));
+    OP_REQUIRES_OK(ctx, AreDimsEqual(seq_len, gate_raw_act.dimension(1),
+                                     "Gate raw activation len"));
+    OP_REQUIRES_OK(ctx, AreDimsEqual(4, gate_raw_act.dimension(2),
+                                     "Gate raw activation num"));
+    OP_REQUIRES_OK(ctx, AreDimsEqual(output_dim, gate_raw_act.dimension(3),
+                                     "Gate raw activation dim"));
+    OP_REQUIRES_OK(
+        ctx, AreDimsEqual(batch_size, memory.dimension(0), "Memory batch"));
+    OP_REQUIRES_OK(ctx,
+                   AreDimsEqual(seq_len, memory.dimension(1), "Memory len"));
+    OP_REQUIRES_OK(ctx,
+                   AreDimsEqual(output_dim, memory.dimension(2), "Memory dim"));
+    OP_REQUIRES_OK(ctx, AreDimsEqual(batch_size, act_grad.dimension(0),
+                                     "Activation gradient batch"));
+    OP_REQUIRES_OK(ctx, AreDimsEqual(seq_len, act_grad.dimension(1),
+                                     "Activation gradient len"));
+    OP_REQUIRES_OK(ctx, AreDimsEqual(output_dim, act_grad.dimension(2),
+                                     "Activation gradient dim"));
+    OP_REQUIRES_OK(ctx, AreDimsEqual(batch_size, gate_raw_act_grad.dimension(0),
+                                     "Gate raw activation gradient batch"));
+    OP_REQUIRES_OK(ctx, AreDimsEqual(seq_len, gate_raw_act_grad.dimension(1),
+                                     "Gate raw activation gradient len"));
+    OP_REQUIRES_OK(ctx, AreDimsEqual(4, gate_raw_act_grad.dimension(2),
+                                     "Gate raw activation gradient num"));
+    OP_REQUIRES_OK(ctx, AreDimsEqual(output_dim, gate_raw_act_grad.dimension(3),
+                                     "Gate raw activation gradient dim"));
+    OP_REQUIRES_OK(ctx, AreDimsEqual(batch_size, memory_grad.dimension(0),
+                                     "Memory gradient batch"));
+    OP_REQUIRES_OK(ctx, AreDimsEqual(seq_len, memory_grad.dimension(1),
+                                     "Memory gradient len"));
+    OP_REQUIRES_OK(ctx, AreDimsEqual(output_dim, memory_grad.dimension(2),
+                                     "Memory gradient dim"));
+
+    // Outputs.
+    std::vector<Tensor*> collections(4, nullptr);
+    OP_REQUIRES_OK(ctx,
+                   ctx->allocate_output(0, {batch_size, seq_len, 4, output_dim},
+                                        &collections[0]));
+    auto input_grad = collections[0]->tensor<float, 4>();
+    input_grad.setZero();
+
+    OP_REQUIRES_OK(ctx, ctx->allocate_output(1, {batch_size, output_dim},
+                                             &collections[1]));
+    auto init_state_grad = collections[1]->tensor<float, 2>();
+    init_state_grad.setZero();
+
+    OP_REQUIRES_OK(ctx, ctx->allocate_output(2, {batch_size, output_dim},
+                                             &collections[2]));
+    auto init_memory_grad = collections[2]->tensor<float, 2>();
+    init_memory_grad.setZero();
+
+    OP_REQUIRES_OK(ctx, ctx->allocate_output(3, {output_dim, 4, output_dim},
+                                             &collections[3]));
+    auto w_m_m_grad = collections[3]->tensor<float, 3>();
+    w_m_m_grad.setZero();
+
+    // Const and scratch tensors.
+    Tensor ones_tensor;
+    OP_REQUIRES_OK(ctx, ctx->allocate_temp(DT_FLOAT, {batch_size, output_dim},
+                                           &ones_tensor));
+    auto ones = ones_tensor.tensor<float, 2>();
+    ones.setConstant(1.0);
+
+    Tensor scratch_tensor;
+    OP_REQUIRES_OK(ctx,
+                   ctx->allocate_temp(DT_FLOAT, {batch_size, 4, output_dim},
+                                      &scratch_tensor));
+    auto scratch = scratch_tensor.tensor<float, 3>();
+    scratch.setZero();
+
+    Tensor tmp1_tensor;
+    OP_REQUIRES_OK(ctx, ctx->allocate_temp(DT_FLOAT, {batch_size, output_dim},
+                                           &tmp1_tensor));
+    auto tmp1 = tmp1_tensor.tensor<float, 2>();
+    tmp1.setZero();
+
+    Tensor tmp2_tensor;
+    OP_REQUIRES_OK(ctx, ctx->allocate_temp(DT_FLOAT, {batch_size, output_dim},
+                                           &tmp2_tensor));
+    auto tmp2 = tmp2_tensor.tensor<float, 2>();
+    tmp2.setZero();
+
+    // Uses the most efficient order for the contraction depending on the batch
+    // size.
+
+    // Shuffles the dimensions of the weight tensor to be efficient when used in
+    // the left-hand side. Allocates memory for the shuffled tensor for
+    // efficiency.
+    Tensor w_m_m_s_tensor;
+    OP_REQUIRES_OK(ctx,
+                   ctx->allocate_temp(DT_FLOAT, {4, output_dim, output_dim},
+                                      &w_m_m_s_tensor));
+    auto w_m_m_s = w_m_m_s_tensor.tensor<float, 3>();
+    if (batch_size == 1) {
+      // Populates the shuffled tensor only when it is actually used.
+      w_m_m_s = w_m_m.shuffle(array<int, 3>{1, 2, 0});
+    }
+
+    // Dimensions for the contraction with the weight tensor.
+    const array<IndexPair, 1> m_m_dim =
+        batch_size == 1 ? array<IndexPair, 1>{IndexPair(1, 0)}
+                        : array<IndexPair, 1>{IndexPair(1, 1)};
+    // Dimensions for the contraction of the batch dimensions.
+    const array<IndexPair, 1> b_b_dim = {IndexPair(0, 0)};
+    for (int i = seq_len - 1; i >= 0; --i) {
+      if (i == seq_len - 1) {
+        init_state_grad = act_grad.chip(i, 1);
+      } else {
+        w_m_m_grad +=
+            act.chip(i, 1)
+                .contract(scratch.reshape(
+                              array<DenseIndex, 2>{batch_size, 4 * output_dim}),
+                          b_b_dim)
+                .reshape(array<DenseIndex, 3>{output_dim, 4, output_dim});
+        if (batch_size == 1) {
+          init_state_grad.device(ctx->eigen_cpu_device()) =
+              scratch.chip(0, 1).contract(w_m_m_s.chip(0, 0), m_m_dim) +
+              scratch.chip(1, 1).contract(w_m_m_s.chip(1, 0), m_m_dim) +
+              scratch.chip(2, 1).contract(w_m_m_s.chip(2, 0), m_m_dim) +
+              scratch.chip(3, 1).contract(w_m_m_s.chip(3, 0), m_m_dim);
+        } else {
+          init_state_grad.device(ctx->eigen_cpu_device()) =
+              (w_m_m.chip(0, 1).contract(scratch.chip(0, 1), m_m_dim) +
+               w_m_m.chip(1, 1).contract(scratch.chip(1, 1), m_m_dim) +
+               w_m_m.chip(2, 1).contract(scratch.chip(2, 1), m_m_dim) +
+               w_m_m.chip(3, 1).contract(scratch.chip(3, 1), m_m_dim))
+                  .shuffle(array<int, 2>{1, 0});
+        }
+        init_state_grad += act_grad.chip(i, 1);
+      }
+
+      auto gate_raw_act_t = gate_raw_act.chip(i, 1);
+      auto gate_raw_act_grad_t = gate_raw_act_grad.chip(i, 1);
+
+      // Output gate.
+      tmp1 = memory.chip(i, 1);
+      tmp1 = tmp1.tanh();                          // y_t
+      tmp2 = gate_raw_act_t.chip(3, 1).sigmoid();  // o_t
+      scratch.chip(3, 1) = init_state_grad * tmp1 * tmp2 * (ones - tmp2) +
+                           gate_raw_act_grad_t.chip(3, 1);
+
+      init_memory_grad += init_state_grad * tmp2 * (ones - tmp1.square()) +
+                          memory_grad.chip(i, 1);
+
+      // Input gate.
+      tmp1 = gate_raw_act_t.chip(0, 1).sigmoid();  // i_t
+      tmp2 = gate_raw_act_t.chip(1, 1);
+      tmp2 = tmp2.tanh();  // j_t
+      scratch.chip(0, 1) = init_memory_grad * tmp2 * tmp1 * (ones - tmp1) +
+                           gate_raw_act_grad_t.chip(0, 1);
+
+      // Input.
+      scratch.chip(1, 1) = init_memory_grad * tmp1 * (ones - tmp2.square()) +
+                           gate_raw_act_grad_t.chip(1, 1);
+
+      // Forget gate.
+      tmp1 = gate_raw_act_t.chip(2, 1).sigmoid();  // f_t
+      if (i == 0) {
+        scratch.chip(2, 1) =
+            init_memory_grad * initial_memory * tmp1 * (ones - tmp1) +
+            gate_raw_act_grad_t.chip(2, 1);
+      } else {
+        scratch.chip(2, 1) =
+            init_memory_grad * memory.chip(i - 1, 1) * tmp1 * (ones - tmp1) +
+            gate_raw_act_grad_t.chip(2, 1);
+      }
+
+      // Memory.
+      init_memory_grad *= tmp1;
+
+      input_grad.chip(i, 1) = scratch;
+    }
+    w_m_m_grad += initial_state
+                      .contract(scratch.reshape(array<DenseIndex, 2>{
+                                    batch_size, 4 * output_dim}),
+                                b_b_dim)
+                      .reshape(array<DenseIndex, 3>{output_dim, 4, output_dim});
+    if (batch_size == 1) {
+      init_state_grad.device(ctx->eigen_cpu_device()) =
+          (scratch.chip(0, 1).contract(w_m_m_s.chip(0, 0), m_m_dim) +
+           scratch.chip(1, 1).contract(w_m_m_s.chip(1, 0), m_m_dim) +
+           scratch.chip(2, 1).contract(w_m_m_s.chip(2, 0), m_m_dim) +
+           scratch.chip(3, 1).contract(w_m_m_s.chip(3, 0), m_m_dim));
+    } else {
+      init_state_grad.device(ctx->eigen_cpu_device()) =
+          (w_m_m.chip(0, 1).contract(scratch.chip(0, 1), m_m_dim) +
+           w_m_m.chip(1, 1).contract(scratch.chip(1, 1), m_m_dim) +
+           w_m_m.chip(2, 1).contract(scratch.chip(2, 1), m_m_dim) +
+           w_m_m.chip(3, 1).contract(scratch.chip(3, 1), m_m_dim))
+              .shuffle(array<int, 2>{1, 0});
+    }
+  }
+};
+
+REGISTER_KERNEL_BUILDER(Name("VariableLSTMGrad").Device(DEVICE_CPU),
+                        VariableLSTMGradOp);
+
+REGISTER_OP("VariableLSTMGrad")
+    .Input("initial_state: float32")
+    .Input("initial_memory: float32")
+    .Input("w_m_m: float32")
+    .Input("activation: float32")
+    .Input("gate_raw_act: float32")
+    .Input("memory: float32")
+    .Input("act_grad: float32")
+    .Input("gate_raw_act_grad: float32")
+    .Input("memory_grad: float32")
+    .Output("input_grad: float32")
+    .Output("initial_state_grad: float32")
+    .Output("initial_memory_grad: float32")
+    .Output("w_m_m_grad: float32")
+    .Doc(R"doc(
+Computes the gradient for VariableLSTM.
+
+This is to be used in conjunction with VariableLSTM. It ignores the clipping used
+in the forward pass.
+
+initial_state: 2-D with shape `[batch_size, num_nodes]`
+initial_memory: 2-D with shape `[batch_size, num_nodes]`
+w_m_m: 3-D with shape `[num_nodes, 4, num_nodes]`
+activation: 3-D with shape `[batch_size, seq_len, num_nodes]`
+gate_raw_act: 4-D with shape `[batch_size, seq_len, 4, num_nodes]`
+memory: 3-D with shape `[batch_size, seq_len, num_nodes]`
+act_grad: 3-D with shape `[batch_size, seq_len, num_nodes]`
+gate_raw_act_grad: 4-D with shape `[batch_size, seq_len, 4, num_nodes]`
+memory_grad: 3-D with shape `[batch_size, seq_len, num_nodes]`
+input_grad: 4-D with shape `[batch_size, seq_len, 4, num_nodes]`
+initial_state_grad: 2-D with shape `[batch_size, num_nodes]`
+initial_memory_grad: 2-D with shape `[batch_size, num_nodes]`
+w_m_m_grad: 3-D with shape `[num_nodes, 4, num_nodes]`
+)doc");
+
+}  // namespace tensorflow

BIN
street/g3doc/avdessapins.png


+ 324 - 0
street/g3doc/vgslspecs.md

@@ -0,0 +1,324 @@
+# VGSL Specs - rapid prototyping of mixed conv/LSTM networks for images.
+
+Variable-size Graph Specification Language (VGSL) enables the specification of a
+TensorFlow graph, composed of convolutions and LSTMs, that can process
+variable-sized images, from a very short definition string.
+
+## Applications: What are VGSL Specs good for?
+
+VGSL Specs are designed specifically to create TF graphs for:
+
+*   Variable size images as the input. (In one or BOTH dimensions!)
+*   Output an image (heat map), sequence (like text), or a category.
+*   Convolutions and LSTMs are the main computing components.
+*   Fixed-size images are OK too!
+
+But wait, aren't there other systems that simplify generating TF graphs? There
+are indeed, but something they all have in common is that they are designed for
+fixed size images only. If you want to solve a real OCR problem, you either have
+to cut the image into arbitrary sized pieces and try to stitch the results back
+together, or use VGSL.
+
+## Basic Usage
+
+A full model, including input and the output layers, can be built using
+vgsl_model.py. Alternatively you can supply your own tensors and add your own
+loss function layer if you wish, using vgslspecs.py directly.
+
+### Building a full model
+
+Provided your problem matches the one addressed by vgsl_model, you are good to
+go.
+
+Targeted problems:
+
+*   Images for input, either 8 bit greyscale or 24 bit color.
+*   Output is 0-d (a category, like cat, dog, train, car).
+*   Output is 1-d, with either a variable-length or a fixed-length sequence,
+    e.g. OCR and transcription problems in general.
+
+Currently only softmax (1 of n) outputs are supported, but it would not be
+difficult to extend to logistic.
+
+Use vgsl_train.py to train your model, and vgsl_eval.py to evaluate it. They
+just call Train and Eval in vgsl_model.py.
+
+### Model string for a full model
+
+The model string for a full model includes the input spec, the output spec and
+the layers spec in between. Example:
+
+```
+'1,0,0,3[Ct5,5,16 Mp3,3 Lfys64 Lfx128 Lrx128 Lfx256]O1c105'
+```
+
+The first 4 numbers specify the standard TF tensor dimensions: [batch, height,
+width, depth], except that height and/or width may be zero, allowing them to be
+variable. Batch is specific only to training, and may be a different value at
+recognition/inference time. Depth needs to be 1 for greyscale and 3 for color.
+
+The model string in square brackets [] is the main model definition, which is
+described [below.](#basic-layers-syntax) The output specification takes the
+form:
+
+```
+O(2|1|0)(l|s|c)n output layer with n classes.
+  2 (heatmap) Output is a 2-d vector map of the input (possibly at
+    different scale). (Not yet supported.)
+  1 (sequence) Output is a 1-d sequence of vector values.
+  0 (category) Output is a 0-d single vector value.
+  l uses a logistic non-linearity on the output, allowing multiple
+    hot elements in any output vector value. (Not yet supported.)
+  s uses a softmax non-linearity, with one-hot output in each value.
+  c uses a softmax with CTC. Can only be used with 1 (sequence).
+  NOTE Only O0s, O1s and O1c are currently supported.
+```
+
+The number of classes must match the encoding of the TF Example data set.
+
+### Layers only - providing your own input and loss layers
+
+You don't have to use the canned input/output modules, if you provide your
+separate code to read TF Example and loss functions. First prepare your inputs:
+
+*   A TF-conventional batch of: `images = tf.float32[batch, height, width,
+    depth]`
+*   A tensor of the width of each image in the batch: `widths = tf.int64[batch]`
+*   A tensor of the height of each image in the batch: `heights =
+    tf.int64[batch]`
+
+Note that these can be created from individual images using
+`tf.train.batch_join` with `dynamic_pad=True`, as in the sketch below.
+
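+A hedged sketch of that batching step, where `image`, `width` and `height`
+stand in for the per-example tensors produced by your own TF-Example parsing
+code (the names and batch size here are illustrative):
+
+```python
+import tensorflow as tf
+
+# dynamic_pad=True pads each image to the largest height/width in its batch;
+# the true per-image sizes travel alongside for the LSTMs downstream.
+images, widths, heights = tf.train.batch_join(
+    [(image, width, height)], batch_size=8, capacity=256, dynamic_pad=True)
+```
+
+You can then build the VGSL layers on top of these tensors:
+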
+```python
+import vgslspecs
+...
+spec = '[Ct5,5,16 Mp3,3 Lfys64 Lfx128 Lrx128 Lfx256]'
+vgsl = vgslspecs.VGSLSpecs(widths, heights, is_training=True)
+last_layer = vgsl.Build(images, spec)
+...
+AddSomeLossFunction(last_layer)....
+```
+
+With some appropriate training data, this would create a world-class OCR engine!
+
+## Basic Layers Syntax
+
+NOTE that *all* ops input and output the standard TF convention of a 4-d tensor:
+`[batch, height, width, depth]` *regardless of any collapsing of dimensions.*
+This greatly simplifies things, and allows the VGSLSpecs class to track changes
+to the values of widths and heights, so they can be correctly passed in to LSTM
+operations, and used by any downstream CTC operation.
+
+NOTE: in the descriptions below, `<d>` is a numeric value, and literals are
+described using regular expression syntax.
+
+NOTE: Whitespace is allowed between ops.
+
+### Naming
+
+Each op gets a unique name by default, based on its spec string plus its
+character position in the overall specification. All the Ops take an optional
+name argument in braces after the mnemonic code, but before any numeric
+arguments.
+
+### Functional ops
+
+```
+C(s|t|r|l|m)[{name}]<y>,<x>,<d> Convolves using a y,x window, with no shrinkage,
+  SAME infill, d outputs, with s|t|r|l|m non-linear layer.
+F(s|t|r|l|m)[{name}]<d> Fully-connected with s|t|r|l|m non-linearity and d
+  outputs. Reduces height, width to 1. Input height and width must be constant.
+L(f|r|b)(x|y)[s][{name}]<n> LSTM cell with n outputs.
+  The LSTM must have one of:
+    f runs the LSTM forward only.
+    r runs the LSTM reversed only.
+    b runs the LSTM bidirectionally.
+  It will operate on either the x- or y-dimension, treating the other dimension
+  independently (as if part of the batch).
+  (Full 2-d and grid are not yet supported).
+  s (optional) summarizes the output in the requested dimension,
+     outputting only the final step, collapsing the dimension to a
+     single element.
+Do[{name}] Insert a dropout layer.
+```
+
+In the above, `(s|t|r|l|m)` specifies the type of the non-linearity:
+
+```
+s = sigmoid
+t = tanh
+r = relu
+l = linear (i.e., None)
+m = softmax
+```
+
+Examples:
+
+`Cr5,5,32` Runs a 5x5 Relu convolution with 32 depth/number of filters.
+
+`Lfx{MyLSTM}128` runs a forward-only LSTM, named 'MyLSTM' in the x-dimension
+with 128 outputs, treating the y dimension independently.
+
+`Lfys64` runs a forward-only LSTM in the y-dimension with 64 outputs, treating
+the x-dimension independently and collapses the y-dimension to 1 element.
+
+### Plumbing ops
+
+The plumbing ops allow the construction of arbitrarily complex graphs. Something
+currently missing is the ability to define macros for generating say an
+inception unit in multiple places.
+
+```
+[...] Execute ... networks in series (layers).
+(...) Execute ... networks in parallel, with their output concatenated in depth.
+S[{name}]<d>(<a>x<b>)<e>,<f> Splits one dimension, moves one part to another
+  dimension.
+Mp[{name}]<y>,<x> Maxpool the input, reducing the (y,x) rectangle to a single
+  value.
+```
+
+In the `S` op, `<a>, <b>, <d>, <e>, <f>` are numbers.
+
+`S` is a generalized reshape. It splits input dimension `d` into `a` x `b`,
+sending the high/most significant part `a` to the high/most significant side of
+dimension `e`, and the low part `b` to the high side of dimension `f`.
+Exception: if `d=e=f`, then dimension `d` is internally transposed to
+`bxa`. *At least one* of `e`, `f` must be equal to `d`, so no dimension can be
+totally destroyed. Either `a` or `b` can be zero, meaning whatever is left after
+taking out the other, allowing dimensions to be of variable size.
+
+NOTE: Remember the standard TF convention of a 4-d tensor: `[batch, height,
+width, depth]`, so `batch=0, height=1, width=2, depth=3.`
+
+E.g. `S3(3x50)2,3` will split the 150-element depth into 3x50, with the 3 going
+to the most significant part of the width, and the 50 part staying in depth.
+This will rearrange a 3x50 output parallel operation to spread the 3 output sets
+over width.
+
+### Full Examples
+
+Example 1: A graph capable of high quality OCR.
+
+`1,0,0,1[Ct5,5,16 Mp3,3 Lfys64 Lfx128 Lrx128 Lfx256]O1c105`
+
+As layer descriptions: (Input layer is at the bottom, output at the top.)
+
+```
+O1c105: Output layer produces 1-d (sequence) output, trained with CTC,
+  outputting 105 classes.
+Lfx256: Forward-only LSTM in x with 256 outputs
+Lrx128: Reverse-only LSTM in x with 128 outputs
+Lfx128: Forward-only LSTM in x with 128 outputs
+Lfys64: Dimension-summarizing LSTM, summarizing the y-dimension with 64 outputs
+Mp3,3: 3 x 3 Maxpool
+Ct5,5,16: 5 x 5 Convolution with 16 outputs and tanh non-linearity
+[]: The body of the graph is always expressed as a series of layers.
+1,0,0,1: Input is a batch of 1 image of variable size in greyscale
+```
+
+Example 2: The STREET network for reading French street name signs end-to-end.
+For a detailed description see the [FSNS dataset
+paper](http://link.springer.com/chapter/10.1007%2F978-3-319-46604-0_30)
+
+```
+1,150,600,3[S2(4x150)0,2 Ct5,5,16 Mp2,2 Ct5,5,64 Mp3,3
+  ([Lrys64 Lbx128][Lbys64 Lbx128][Lfys64 Lbx128]) S3(3x0)2,3
+  Lfx128 Lrx128 S0(1x4)0,3 Lfx256]O1c134
+```
+
+Since networks are usually illustrated with the input at the bottom, the input
+layer is at the bottom, output at the top, with 'headings' *below* the section
+they introduce.
+
+```
+O1c134: Output is a 1-d sequence, trained with CTC and 134 output softmax.
+Lfx256: Forward-only LSTM with 256 outputs
+S0(1x4)0,3: Reshape transferring the batch of 4 tiles to the depth dimension.
+Lrx128: Reverse-only LSTM with 128 outputs
+Lfx128: Forward-only LSTM with 128 outputs
+(Final section above)
+S3(3x0)2,3: Split the outputs of the 3 parallel summarizers and spread over the
+  x-dimension
+  [Lfys64 Lbx128]: Summarizing LSTM downwards on the y-dimension with 64
+    outputs, followed by a bi-directional LSTM in the x-dimension with 128
+    outputs
+  [Lbys64 Lbx128]: Summarizing bi-directional LSTM on the y-dimension with
+    64 outputs, followed by a bi-directional LSTM in the x-dimension with 128
+    outputs
+  [Lrys64 Lbx128]: Summarizing LSTM upwards on the y-dimension with 64 outputs,
+    followed by a bi-directional LSTM in the x-dimension with 128 outputs
+(): In parallel (re-using the inputs and concatenating the outputs):
+(Summarizing section above)
+Mp3,3: 3 x 3 Maxpool
+Ct5,5,64: 5 x 5 Convolution with 64 outputs and tanh non-linearity
+Mp2,2: 2 x 2 Maxpool
+Ct5,5,16: 5 x 5 Convolution with 16 outputs and tanh non-linearity
+S2(4x150)0,2: Split the x-dimension into 4x150, converting each tiled 600x150
+image into a batch of 4 150x150 images
+(Convolutional input section above)
+[]: The body of the graph is always expressed as a series of layers.
+1,150,600,3: Input is a batch of one 600x150 image in 24-bit color
+```
+
+## Variable size Tensors Under the Hood
+
+Here are some notes about handling variable-sized images since they require some
+consideration and a little bit of knowledge about what goes on inside.
+
+A variable-sized image is an input for which the width and/or height are not
+known at graph-building time, so the tensor shape contains unknown/None/-1
+sizes.
+
+Many standard NN layers, such as convolutions, are designed to cope naturally
+with variable-sized images in TF and produce a variable sized image as the
+output. For other layers, such as 'Fully connected', variable size is
+fundamentally difficult, if not impossible, to deal with, since by definition
+*all* of its inputs are connected via a weight to an output. The number of
+inputs therefore must be fixed.
+
+It is possible to handle variable sized images by using sparse tensors. Some
+implementations make a single variable dimension a list instead of part of the
+tensor. Both these solutions suffer from completely segregating the world of
+variable size from the world of fixed size, making models and their descriptions
+completely non-interchangeable.
+
+In VGSL, we use a standard 4-d Tensor, `[batch, height, width, depth]` and
+either use a batch size of 1 or put up with padding of the input images to the
+largest size of any element of the batch. The other price paid for this
+standardization is that the user must supply a pair of tensors of shape [batch]
+specifying the width and height of each input in a batch. This allows the LSTMs
+in the graph to know how many iterations to execute and how to correctly
+back-propagate the gradients.
+
+The standard TF implementation of CTC also requires a tensor giving the sequence
+lengths of its inputs. If the output of VGSL is going into CTC, the lengths can
+be obtained using:
+
+```python
+import vgslspecs
+...
+spec = '[Ct5,5,16 Mp3,3 Lfys64 Lfx128 Lrx128 Lfx256]'
+vgsl = vgslspecs.VGSLSpecs(widths, heights, is_training=True)
+last_layer = vgsl.Build(images, spec)
+seq_lengths = vgsl.GetLengths()
+```
+
+The above will provide the widths that were given in the constructor, scaled
+down by the max-pool operator. The heights may be obtained using
+`vgsl.GetLengths(1)`, specifying the index of the y-dimension.
+
+NOTE that currently the only way of collapsing a dimension of unknown size to
+known size (1) is through the use of a summarizing LSTM. A single summarizing
+LSTM will collapse one dimension (x or y), leaving a 1-d sequence. The 1-d
+sequence can then be collapsed in the other dimension to make a 0-d categorical
+(softmax) or embedding (logistic) output.
+
+Using the (parallel) op it is entirely possible to run multiple [series] of ops
+that collapse x first in one and y first in the other, reducing both eventually
+to a single categorical value! For example, the following description may do
+something useful with ImageNet-like problems:
+
+```
+[Cr5,5,16 Mp2,2 Cr5,5,64 Mp3,3 ([Lfxs64 Lfys256] [Lfys64 Lfxs256]) Fr512 Fr512]
+```

+ 243 - 0
street/python/decoder.py

@@ -0,0 +1,243 @@
+# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+
+"""Basic CTC+recoder decoder.
+
+Decodes a sequence of class-ids into UTF-8 text.
+For basic information on CTC See:
+Alex Graves et al. Connectionist Temporal Classification: Labelling Unsegmented
+Sequence Data with Recurrent Neural Networks.
+http://www.cs.toronto.edu/~graves/icml_2006.pdf
+"""
+import collections
+import re
+
+import errorcounter as ec
+import tensorflow as tf
+
+# Named tuple Part describes a part of a multi (1 or more) part code that
+# represents a utf-8 string. For example, Chinese character 'x' might be
+# represented by 3 codes of which (utf8='x', index=1, num_codes=3) would be the
+# middle part. (The actual code is not stored in the tuple).
+Part = collections.namedtuple('Part', 'utf8, index, num_codes')
+
+
+# Class that decodes a sequence of class-ids into UTF-8 text.
+class Decoder(object):
+  """Basic CTC+recoder decoder."""
+
+  def __init__(self, filename):
+    r"""Constructs a Decoder.
+
+    Reads the text file describing the encoding and build the encoder.
+    The text file contains lines of the form:
+    <code>[,<code>]*\t<string>
+    Each line defines a mapping from a sequence of one or more integer codes to
+    a corresponding utf-8 string.
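+    For example, in the charset used by decoder_test.py, the line "6\tf"
+    maps the single code 6 to 'f', and "4,5\tm" maps the two-code sequence
+    (4, 5) to 'm'.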
+    Args:
+      filename:   Name of file defining the decoding sequences.
+    """
+    # self.decoder is a list of lists of Part(utf8, index, num_codes).
+    # The index to the top-level list is a code. The list given by the code
+    # index is a list of the parts represented by that code. E.g. if the code 42
+    # represents the 2nd (index 1) out of 3 parts of Chinese character 'x', then
+    # self.decoder[42] = [..., (utf8='x', index=1, num_codes=3), ...] where ...
+    # means all other uses of the code 42.
+    self.decoder = []
+    if filename:
+      self._InitializeDecoder(filename)
+
+  def SoftmaxEval(self, sess, model, num_steps):
+    """Evaluate a model in softmax mode.
+
+    Computes char and word recall and sequence error rates over num_steps
+    batches of evaluation data, and returns them.
+    TODO(rays) Add LogisticEval.
+    Args:
+      sess:  A tensor flow Session.
+      model: The model to run in the session. Requires a VGSLImageModel or any
+        other class that has a using_ctc attribute and a RunAStep(sess) method
+        that returns a softmax result with corresponding labels.
+      num_steps: Number of steps to evaluate for.
+    Returns:
+      ErrorRates named tuple.
+    Raises:
+      ValueError: If an unsupported number of dimensions is used.
+    """
+    coord = tf.train.Coordinator()
+    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
+    # Run the requested number of evaluation steps, gathering the outputs of the
+    # softmax and the true labels of the evaluation examples.
+    total_label_counts = ec.ErrorCounts(0, 0, 0, 0)
+    total_word_counts = ec.ErrorCounts(0, 0, 0, 0)
+    sequence_errors = 0
+    for _ in xrange(num_steps):
+      softmax_result, labels = model.RunAStep(sess)
+      # Collapse softmax to same shape as labels.
+      predictions = softmax_result.argmax(axis=-1)
+      # Exclude batch from num_dims.
+      num_dims = len(predictions.shape) - 1
+      batch_size = predictions.shape[0]
+      null_label = softmax_result.shape[-1] - 1
+      for b in xrange(batch_size):
+        if num_dims == 2:
+          # TODO(rays) Support 2-d data.
+          raise ValueError('2-d label data not supported yet!')
+        else:
+          if num_dims == 1:
+            pred_batch = predictions[b, :]
+            labels_batch = labels[b, :]
+          else:
+            pred_batch = [predictions[b]]
+            labels_batch = [labels[b]]
+          text = self.StringFromCTC(pred_batch, model.using_ctc, null_label)
+          truth = self.StringFromCTC(labels_batch, False, null_label)
+          # Note that recall_errs is false negatives (fn) aka drops/deletions.
+          # Actual recall would be 1-fn/truth_words.
+          # Likewise precision_errs is false positives (fp) aka adds/insertions.
+          # Actual precision would be 1-fp/ocr_words.
+          total_word_counts = ec.AddErrors(total_word_counts,
+                                           ec.CountWordErrors(text, truth))
+          total_label_counts = ec.AddErrors(total_label_counts,
+                                            ec.CountErrors(text, truth))
+          if text != truth:
+            sequence_errors += 1
+
+    coord.request_stop()
+    coord.join(threads)
+    return ec.ComputeErrorRates(total_label_counts, total_word_counts,
+                                sequence_errors, num_steps * batch_size)
+
+  def StringFromCTC(self, ctc_labels, merge_dups, null_label):
+    """Decodes CTC output to a string.
+
+    Extracts only sequences of codes that are allowed by self.decoder.
+    Labels that make illegal code sequences are dropped.
+    Note that, by its nature of taking only top choices, this is much weaker
+    than a full-blown beam search that considers all the softmax outputs.
+    For languages without many multi-code sequences, this doesn't make much
+    difference, but for complex scripts the accuracy will be much lower.
+    Args:
+      ctc_labels: List of class labels including null characters to remove.
+      merge_dups: If True, duplicate labels will be merged.
+      null_label: Label value to ignore.
+
+    Returns:
+      Labels decoded to a string.
+    """
+    # Run regular ctc on the labels, extracting a list of codes.
+    codes = self._CodesFromCTC(ctc_labels, merge_dups, null_label)
+    length = len(codes)
+    if length == 0:
+      return ''
+    # strings and partials are both indexed by the same index as codes.
+    # strings[i] is the best completed string upto position i, and
+    # partials[i] is a list of partial code sequences at position i.
+    # Warning: memory is squared-order in length.
+    strings = []
+    partials = []
+    for pos in xrange(length):
+      code = codes[pos]
+      parts = self.decoder[code]
+      partials.append([])
+      strings.append('')
+      # Iterate over the parts that this code can represent.
+      for utf8, index, num_codes in parts:
+        if index > pos:
+          continue
+        # We can use code if it is an initial code (index==0) or continues a
+        # sequence in the partials list at the previous position.
+        if index == 0 or partials[pos - 1].count(
+            Part(utf8, index - 1, num_codes)) > 0:
+          if index < num_codes - 1:
+            # Save the partial sequence.
+            partials[-1].append(Part(utf8, index, num_codes))
+          elif not strings[-1]:
+            # A code sequence is completed. Append to the best string that we
+            # had where it started.
+            if pos >= num_codes:
+              strings[-1] = strings[pos - num_codes] + utf8
+            else:
+              strings[-1] = utf8
+      if not strings[-1] and pos > 0:
+        # We didn't get anything here so copy the previous best string, skipping
+        # the current code, but it may just be a partial anyway.
+        strings[-1] = strings[-2]
+    return strings[-1]
+
+  def _InitializeDecoder(self, filename):
+    """Reads the decoder file and initializes self.decoder from it.
+
+    Args:
+      filename: Name of text file mapping codes to utf8 strings.
+    Raises:
+      ValueError: if the input file is not parsed correctly.
+    """
+    line_re = re.compile(r'(?P<codes>\d+(,\d+)*)\t(?P<utf8>.+)')
+    with tf.gfile.GFile(filename) as f:
+      for line in f:
+        m = line_re.match(line)
+        if m is None:
+          raise ValueError('Unmatched line:', line)
+        # codes is the sequence that maps to the string.
+        str_codes = m.groupdict()['codes'].split(',')
+        codes = []
+        for code in str_codes:
+          codes.append(int(code))
+        utf8 = m.groupdict()['utf8']
+        num_codes = len(codes)
+        for index, code in enumerate(codes):
+          while code >= len(self.decoder):
+            self.decoder.append([])
+          self.decoder[code].append(Part(utf8, index, num_codes))
+
+  def _CodesFromCTC(self, ctc_labels, merge_dups, null_label):
+    """Collapses CTC output to regular output.
+
+    Args:
+      ctc_labels: List of class labels including null characters to remove.
+      merge_dups: If True, duplicate labels will be merged.
+      null_label: Label value to ignore.
+
+    All trailing zeros are removed!!
+    TODO(rays) This may become a problem with non-CTC models.
+    If using charset, this should not be a problem as zero is always space.
+    tf.pad can only append zero, so we have to be able to drop them, as a
+    non-ctc will have learned to output trailing zeros instead of trailing
+    nulls. This is awkward, as the stock ctc loss function requires that the
+    null character be num_classes-1.
+    Returns:
+      (List of) Labels with null characters removed.
+    """
+    out_labels = []
+    prev_label = -1
+    zeros_needed = 0
+    for label in ctc_labels:
+      if label == null_label:
+        prev_label = -1
+      elif label != prev_label or not merge_dups:
+        if label == 0:
+          # Count zeros and only emit them when it is clear there is a non-zero
+          # after, so as to truncate away all trailing zeros.
+          zeros_needed += 1
+        else:
+          if merge_dups and zeros_needed > 0:
+            out_labels.append(0)
+          else:
+            out_labels += [0] * zeros_needed
+          zeros_needed = 0
+          out_labels.append(label)
+        prev_label = label
+    return out_labels

+ 57 - 0
street/python/decoder_test.py

@@ -0,0 +1,57 @@
+# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Tests for decoder."""
+import os
+
+import tensorflow as tf
+import decoder
+
+
+def _testdata(filename):
+  return os.path.join('../testdata/', filename)
+
+
+class DecoderTest(tf.test.TestCase):
+
+  def testCodesFromCTC(self):
+    """Tests that the simple CTC decoder drops nulls and duplicates.
+    """
+    ctc_labels = [9, 9, 9, 1, 9, 2, 2, 3, 9, 9, 0, 0, 1, 9, 1, 9, 9, 9]
+    decode = decoder.Decoder(filename=None)
+    non_null_labels = decode._CodesFromCTC(
+        ctc_labels, merge_dups=False, null_label=9)
+    self.assertEqual(non_null_labels, [1, 2, 2, 3, 0, 0, 1, 1])
+    idempotent_labels = decode._CodesFromCTC(
+        non_null_labels, merge_dups=False, null_label=9)
+    self.assertEqual(idempotent_labels, non_null_labels)
+    collapsed_labels = decode._CodesFromCTC(
+        ctc_labels, merge_dups=True, null_label=9)
+    self.assertEqual(collapsed_labels, [1, 2, 3, 0, 1, 1])
+    non_idempotent_labels = decode._CodesFromCTC(
+        collapsed_labels, merge_dups=True, null_label=9)
+    self.assertEqual(non_idempotent_labels, [1, 2, 3, 0, 1])
+
+  def testStringFromCTC(self):
+    """Tests that the decoder can decode sequences including multi-codes.
+    """
+    #             -  f  -  a  r  -  m(1/2)m     -junk sp b  a  r  -  n  -
+    ctc_labels = [9, 6, 9, 1, 3, 9, 4, 9, 5, 5, 9, 5, 0, 2, 1, 3, 9, 4, 9]
+    decode = decoder.Decoder(filename=_testdata('charset_size_10.txt'))
+    text = decode.StringFromCTC(ctc_labels, merge_dups=True, null_label=9)
+    self.assertEqual(text, 'farm barn')
+
+
+if __name__ == '__main__':
+  tf.test.main()
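
For reference, `_InitializeDecoder` expects one mapping per line: a comma-separated code sequence, a tab, then the utf8 string. The entries below are one plausible charset consistent with the test above (an assumption; the actual `../testdata/charset_size_10.txt` may differ). Note how 'n' is a single code while 'm' is the two-code sequence 4,5; resolving that shared-prefix ambiguity is exactly what the partials list in `StringFromCTC` is for.

```
import re

# Hypothetical charset entries in the "codes\tutf8" format parsed by
# _InitializeDecoder; the real test file may differ.
charset_lines = [
    '0\t ',    # Code 0 is always space.
    '1\ta',
    '2\tb',
    '3\tr',
    '4\tn',    # 'n' is a single code...
    '4,5\tm',  # ...but 'm' is the two-code sequence 4,5 (shares code 4).
    '6\tf',
]
line_re = re.compile(r'(?P<codes>\d+(,\d+)*)\t(?P<utf8>.+)')
for line in charset_lines:
  m = line_re.match(line)
  codes = [int(c) for c in m.groupdict()['codes'].split(',')]
  print('%s -> %r' % (codes, m.groupdict()['utf8']))
```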

+ 123 - 0
street/python/errorcounter.py

@@ -0,0 +1,123 @@
+# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+
+"""Some simple tools for error counting.
+
+"""
+import collections
+
+# The named tuple ErrorCounts describes the counts needed to accumulate errors
+# over multiple trials:
+#   fn:          false negatives (aka drops or deletions),
+#   fp:          false positives (aka adds or insertions),
+#   truth_count: number of elements in the ground truth = denominator for fn,
+#   test_count:  number of elements in the test string = denominator for fp.
+# Note that recall = 1 - fn/truth_count, precision = 1 - fp/test_count,
+# accuracy = 1 - (fn + fp) / (truth_count + test_count).
+ErrorCounts = collections.namedtuple('ErrorCounts', ['fn', 'fp', 'truth_count',
+                                                     'test_count'])
+
+# Named tuple for error rates, as a percentage. Accuracies are just 100-error.
+ErrorRates = collections.namedtuple('ErrorRates',
+                                    ['label_error', 'word_recall_error',
+                                     'word_precision_error', 'sequence_error'])
+
+
+def CountWordErrors(ocr_text, truth_text):
+  """Counts the word drop and add errors as a bag of words.
+
+  Args:
+    ocr_text:    OCR text string.
+    truth_text:  Truth text string.
+
+  Returns:
+    ErrorCounts named tuple.
+  """
+  # Convert to lists of words.
+  return CountErrors(ocr_text.split(), truth_text.split())
+
+
+def CountErrors(ocr_text, truth_text):
+  """Counts the drops and adds between 2 bags of iterables.
+
+  A simple bag-of-objects count returns the number of dropped and added
+  elements, regardless of order, for anything that is iterable, e.g.
+  a pair of strings gives character errors, and a pair of word lists gives
+  word errors.
+  Args:
+    ocr_text:    OCR text iterable (e.g. string for chars, word list for words).
+    truth_text:  Truth text iterable.
+
+  Returns:
+    ErrorCounts named tuple.
+  """
+  counts = collections.Counter(truth_text)
+  counts.subtract(ocr_text)
+  drops = sum(c for c in counts.values() if c > 0)
+  adds = sum(-c for c in counts.values() if c < 0)
+  return ErrorCounts(drops, adds, len(truth_text), len(ocr_text))
+
+
+def AddErrors(counts1, counts2):
+  """Adds the counts and returns a new sum tuple.
+
+  Args:
+    counts1: First ErrorCounts named tuple to sum.
+    counts2: Second ErrorCounts named tuple to sum.
+  Returns:
+    Sum of counts1, counts2.
+  """
+  return ErrorCounts(counts1.fn + counts2.fn, counts1.fp + counts2.fp,
+                     counts1.truth_count + counts2.truth_count,
+                     counts1.test_count + counts2.test_count)
+
+
+def ComputeErrorRates(label_counts, word_counts, seq_errors, num_seqs):
+  """Returns an ErrorRates corresponding to the given counts.
+
+  Args:
+    label_counts: ErrorCounts for the character labels.
+    word_counts:  ErrorCounts for the words.
+    seq_errors:   Number of sequence errors.
+    num_seqs:     Total number of sequences.
+  Returns:
+    ErrorRates corresponding to the given counts.
+  """
+  label_errors = label_counts.fn + label_counts.fp
+  num_labels = label_counts.truth_count + label_counts.test_count
+  return ErrorRates(
+      ComputeErrorRate(label_errors, num_labels),
+      ComputeErrorRate(word_counts.fn, word_counts.truth_count),
+      ComputeErrorRate(word_counts.fp, word_counts.test_count),
+      ComputeErrorRate(seq_errors, num_seqs))
+
+
+def ComputeErrorRate(error_count, truth_count):
+  """Returns a sanitized percent error rate from the raw counts.
+
+  Prevents div by 0 and clips return to 100%.
+  Args:
+    error_count: Number of errors.
+    truth_count: Number to divide by.
+
+  Returns:
+    100.0 * error_count / truth_count clipped to 100.
+  """
+  if truth_count == 0:
+    truth_count = 1
+    error_count = 1
+  elif error_count > truth_count:
+    error_count = truth_count
+  return error_count * 100.0 / truth_count
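
To make the bag-of-objects arithmetic in `CountErrors` concrete, here is its core worked by hand on one of the cases from errorcounter_test.py below (plain Python, no TF needed):

```
import collections

truth, ocr = 'farm barn', 'farmbarn'  # The OCR result dropped the space.
counts = collections.Counter(truth)
counts.subtract(ocr)
drops = sum(c for c in counts.values() if c > 0)   # fn: chars only in truth.
adds = sum(-c for c in counts.values() if c < 0)   # fp: chars only in OCR.
assert (drops, adds, len(truth), len(ocr)) == (1, 0, 9, 8)
# So recall = 1 - 1/9, precision = 1 - 0/8, accuracy = 1 - 1/17.
```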

+ 124 - 0
street/python/errorcounter_test.py

@@ -0,0 +1,124 @@
+# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Tests for errorcounter."""
+import tensorflow as tf
+import errorcounter as ec
+
+
+class ErrorcounterTest(tf.test.TestCase):
+
+  def testComputeErrorRate(self):
+    """Tests that the percent calculation works as expected.
+    """
+    rate = ec.ComputeErrorRate(error_count=0, truth_count=0)
+    self.assertEqual(rate, 100.0)
+    rate = ec.ComputeErrorRate(error_count=1, truth_count=0)
+    self.assertEqual(rate, 100.0)
+    rate = ec.ComputeErrorRate(error_count=10, truth_count=1)
+    self.assertEqual(rate, 100.0)
+    rate = ec.ComputeErrorRate(error_count=0, truth_count=1)
+    self.assertEqual(rate, 0.0)
+    rate = ec.ComputeErrorRate(error_count=3, truth_count=12)
+    self.assertEqual(rate, 25.0)
+
+  def testCountErrors(self):
+    """Tests that the error counter works as expected.
+    """
+    truth_str = 'farm barn'
+    counts = ec.CountErrors(ocr_text=truth_str, truth_text=truth_str)
+    self.assertEqual(
+        counts, ec.ErrorCounts(
+            fn=0, fp=0, truth_count=9, test_count=9))
+    # With a period on the end, we get a char error.
+    dot_str = 'farm barn.'
+    counts = ec.CountErrors(ocr_text=dot_str, truth_text=truth_str)
+    self.assertEqual(
+        counts, ec.ErrorCounts(
+            fn=0, fp=1, truth_count=9, test_count=10))
+    counts = ec.CountErrors(ocr_text=truth_str, truth_text=dot_str)
+    self.assertEqual(
+        counts, ec.ErrorCounts(
+            fn=1, fp=0, truth_count=10, test_count=9))
+    # Space is just another char.
+    no_space = 'farmbarn'
+    counts = ec.CountErrors(ocr_text=no_space, truth_text=truth_str)
+    self.assertEqual(
+        counts, ec.ErrorCounts(
+            fn=1, fp=0, truth_count=9, test_count=8))
+    counts = ec.CountErrors(ocr_text=truth_str, truth_text=no_space)
+    self.assertEqual(
+        counts, ec.ErrorCounts(
+            fn=0, fp=1, truth_count=8, test_count=9))
+    # Lose them all.
+    counts = ec.CountErrors(ocr_text='', truth_text=truth_str)
+    self.assertEqual(
+        counts, ec.ErrorCounts(
+            fn=9, fp=0, truth_count=9, test_count=0))
+    counts = ec.CountErrors(ocr_text=truth_str, truth_text='')
+    self.assertEqual(
+        counts, ec.ErrorCounts(
+            fn=0, fp=9, truth_count=0, test_count=9))
+
+  def testCountWordErrors(self):
+    """Tests that the error counter works as expected.
+    """
+    truth_str = 'farm barn'
+    counts = ec.CountWordErrors(ocr_text=truth_str, truth_text=truth_str)
+    self.assertEqual(
+        counts, ec.ErrorCounts(
+            fn=0, fp=0, truth_count=2, test_count=2))
+    # With a period on the end, we get a word error.
+    dot_str = 'farm barn.'
+    counts = ec.CountWordErrors(ocr_text=dot_str, truth_text=truth_str)
+    self.assertEqual(
+        counts, ec.ErrorCounts(
+            fn=1, fp=1, truth_count=2, test_count=2))
+    counts = ec.CountWordErrors(ocr_text=truth_str, truth_text=dot_str)
+    self.assertEqual(
+        counts, ec.ErrorCounts(
+            fn=1, fp=1, truth_count=2, test_count=2))
+    # Space is special.
+    no_space = 'farmbarn'
+    counts = ec.CountWordErrors(ocr_text=no_space, truth_text=truth_str)
+    self.assertEqual(
+        counts, ec.ErrorCounts(
+            fn=2, fp=1, truth_count=2, test_count=1))
+    counts = ec.CountWordErrors(ocr_text=truth_str, truth_text=no_space)
+    self.assertEqual(
+        counts, ec.ErrorCounts(
+            fn=1, fp=2, truth_count=1, test_count=2))
+    # Lose them all.
+    counts = ec.CountWordErrors(ocr_text='', truth_text=truth_str)
+    self.assertEqual(
+        counts, ec.ErrorCounts(
+            fn=2, fp=0, truth_count=2, test_count=0))
+    counts = ec.CountWordErrors(ocr_text=truth_str, truth_text='')
+    self.assertEqual(
+        counts, ec.ErrorCounts(
+            fn=0, fp=2, truth_count=0, test_count=2))
+    # With a space in ba rn, there is an extra add.
+    sp_str = 'farm ba rn'
+    counts = ec.CountWordErrors(ocr_text=sp_str, truth_text=truth_str)
+    self.assertEqual(
+        counts, ec.ErrorCounts(
+            fn=1, fp=2, truth_count=2, test_count=3))
+    counts = ec.CountWordErrors(ocr_text=truth_str, truth_text=sp_str)
+    self.assertEqual(
+        counts, ec.ErrorCounts(
+            fn=2, fp=1, truth_count=3, test_count=2))
+
+
+if __name__ == '__main__':
+  tf.test.main()
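
Following the last test case, the percentages produced by `ComputeErrorRates` reduce to simple arithmetic on the counts; a worked example for the 'farm ba rn' OCR result:

```
# Word counts for ocr='farm ba rn' vs truth='farm barn' (from the test above).
fn, fp, truth_count, test_count = 1, 2, 2, 3
word_recall_error = 100.0 * fn / truth_count      # 50.0
word_precision_error = 100.0 * fp / test_count    # 66.66...
assert word_recall_error == 50.0
assert abs(word_precision_error - 200.0 / 3) < 1e-9
```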

+ 253 - 0
street/python/nn_ops.py

@@ -0,0 +1,253 @@
+# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+
+"""Ops and utilities for neural networks.
+
+For now, just an LSTM layer.
+"""
+import shapes
+import tensorflow as tf
+rnn = tf.load_op_library("../cc/rnn_ops.so")
+
+
+def rnn_helper(inp,
+               length,
+               cell_type=None,
+               direction="forward",
+               name=None,
+               *args,
+               **kwargs):
+  """Adds ops for a recurrent neural network layer.
+
+  This function calls an actual implementation of a recurrent neural network
+  based on `cell_type`.
+
+  There are three modes depending on the value of `direction`:
+
+    forward: Adds a forward RNN.
+    backward: Adds a backward RNN.
+    bidirectional: Adds both forward and backward RNNs and creates a
+                   bidirectional RNN.
+
+  Args:
+    inp: A 3-D tensor of shape [`batch_size`, `max_length`, `feature_dim`].
+    length: A 1-D tensor of shape [`batch_size`] and type int64. Each element
+            represents the length of the corresponding sequence in `inp`.
+    cell_type: Cell type of RNN. Currently can only be "lstm".
+    direction: One of "forward", "backward", "bidirectional".
+    name: Name of the op.
+    *args: Other arguments to the layer.
+    **kwargs: Keyword arguments to the layer.
+
+  Returns:
+    A 3-D tensor of shape [`batch_size`, `max_length`, `num_nodes`].
+  """
+
+  assert cell_type is not None
+  rnn_func = None
+  if cell_type == "lstm":
+    rnn_func = lstm_layer
+  assert rnn_func is not None
+  assert direction in ["forward", "backward", "bidirectional"]
+
+  with tf.variable_scope(name):
+    if direction in ["forward", "bidirectional"]:
+      forward = rnn_func(
+          inp=inp,
+          length=length,
+          backward=False,
+          name="forward",
+          *args,
+          **kwargs)
+      if isinstance(forward, tuple):
+        # lstm_layer returns a tuple (output, memory). We only need the first
+        # element.
+        forward = forward[0]
+    if direction in ["backward", "bidirectional"]:
+      backward = rnn_func(
+          inp=inp,
+          length=length,
+          backward=True,
+          name="backward",
+          *args,
+          **kwargs)
+      if isinstance(backward, tuple):
+        # lstm_layer returns a tuple (output, memory). We only need the first
+        # element.
+        backward = backward[0]
+    if direction == "forward":
+      out = forward
+    elif direction == "backward":
+      out = backward
+    else:
+      out = tf.concat(2, [forward, backward])
+  return out
+
+
+@tf.RegisterShape("VariableLSTM")
+def _variable_lstm_shape(op):
+  """Shape function for the VariableLSTM op."""
+  input_shape = op.inputs[0].get_shape().with_rank(4)
+  state_shape = op.inputs[1].get_shape().with_rank(2)
+  memory_shape = op.inputs[2].get_shape().with_rank(2)
+  w_m_m_shape = op.inputs[3].get_shape().with_rank(3)
+  batch_size = input_shape[0].merge_with(state_shape[0])
+  batch_size = batch_size.merge_with(memory_shape[0])
+  seq_len = input_shape[1]
+  gate_num = input_shape[2].merge_with(w_m_m_shape[1])
+  output_dim = input_shape[3].merge_with(state_shape[1])
+  output_dim = output_dim.merge_with(memory_shape[1])
+  output_dim = output_dim.merge_with(w_m_m_shape[0])
+  output_dim = output_dim.merge_with(w_m_m_shape[2])
+  return [[batch_size, seq_len, output_dim],
+          [batch_size, seq_len, gate_num, output_dim],
+          [batch_size, seq_len, output_dim]]
+
+
+@tf.RegisterGradient("VariableLSTM")
+def _variable_lstm_grad(op, act_grad, gate_grad, mem_grad):
+  """Gradient function for the VariableLSTM op."""
+  initial_state = op.inputs[1]
+  initial_memory = op.inputs[2]
+  w_m_m = op.inputs[3]
+  act = op.outputs[0]
+  gate_raw_act = op.outputs[1]
+  memory = op.outputs[2]
+  return rnn.variable_lstm_grad(initial_state, initial_memory, w_m_m, act,
+                                gate_raw_act, memory, act_grad, gate_grad,
+                                mem_grad)
+
+
+def lstm_layer(inp,
+               length=None,
+               state=None,
+               memory=None,
+               num_nodes=None,
+               backward=False,
+               clip=50.0,
+               reg_func=tf.nn.l2_loss,
+               weight_reg=False,
+               weight_collection="LSTMWeights",
+               bias_reg=False,
+               stddev=None,
+               seed=None,
+               decode=False,
+               use_native_weights=False,
+               name=None):
+  """Adds ops for an LSTM layer.
+
+  This adds ops for the following operations:
+
+    input => (forward-LSTM|backward-LSTM) => output
+
+  The direction of the LSTM is determined by `backward`. If it is false, the
+  forward LSTM is used, the backward one otherwise.
+
+  Args:
+    inp: A 3-D tensor of shape [`batch_size`, `max_length`, `feature_dim`].
+    length: A 1-D tensor of shape [`batch_size`] and type int64. Each element
+            represents the length of the corresponding sequence in `inp`.
+    state: If specified, uses it as the initial state.
+    memory: If specified, uses it as the initial memory.
+    num_nodes: The number of LSTM cells.
+    backward: If true, reverses the `inp` before adding the ops. The output is
+              also reversed so that the direction is the same as `inp`.
+    clip: Value used to clip the cell values.
+    reg_func: Function used for the weight regularization such as
+              `tf.nn.l2_loss`.
+    weight_reg: If true, regularize the filter weights with `reg_func`.
+    weight_collection: Collection to add the weights to for regularization.
+    bias_reg: If true, regularize the bias vector with `reg_func`.
+    stddev: Standard deviation used to initialize the variables.
+    seed: Seed used to initialize the variables.
+    decode: If true, does not add ops which are not used for inference.
+    use_native_weights: If true, uses weights in the same format as the native
+                        implementations.
+    name: Name of the op.
+
+  Returns:
+    A 3-D tensor of shape [`batch_size`, `max_length`, `num_nodes`].
+  """
+  with tf.variable_scope(name):
+    if backward:
+      if length is None:
+        inp = tf.reverse(inp, [False, True, False])
+      else:
+        inp = tf.reverse_sequence(inp, length, 1, 0)
+
+    num_prev = inp.get_shape()[2]
+    if stddev:
+      initializer = tf.truncated_normal_initializer(stddev=stddev, seed=seed)
+    else:
+      initializer = tf.uniform_unit_scaling_initializer(seed=seed)
+
+    if use_native_weights:
+      with tf.variable_scope("LSTMCell"):
+        w = tf.get_variable(
+            "W_0",
+            shape=[num_prev + num_nodes, 4 * num_nodes],
+            initializer=initializer,
+            dtype=tf.float32)
+        w_i_m = tf.slice(w, [0, 0], [num_prev, 4 * num_nodes], name="w_i_m")
+        w_m_m = tf.reshape(
+            tf.slice(w, [num_prev, 0], [num_nodes, 4 * num_nodes]),
+            [num_nodes, 4, num_nodes],
+            name="w_m_m")
+    else:
+      w_i_m = tf.get_variable("w_i_m", [num_prev, 4 * num_nodes],
+                              initializer=initializer)
+      w_m_m = tf.get_variable("w_m_m", [num_nodes, 4, num_nodes],
+                              initializer=initializer)
+
+    if not decode and weight_reg:
+      tf.add_to_collection(weight_collection, reg_func(w_i_m, name="w_i_m_reg"))
+      tf.add_to_collection(weight_collection, reg_func(w_m_m, name="w_m_m_reg"))
+
+    batch_size = shapes.tensor_dim(inp, dim=0)
+    num_frames = shapes.tensor_dim(inp, dim=1)
+    prev = tf.reshape(inp, tf.pack([batch_size * num_frames, num_prev]))
+
+    if use_native_weights:
+      with tf.variable_scope("LSTMCell"):
+        b = tf.get_variable(
+            "B",
+            shape=[4 * num_nodes],
+            initializer=tf.zeros_initializer,
+            dtype=tf.float32)
+      biases = tf.identity(b, name="biases")
+    else:
+      biases = tf.get_variable(
+          "biases", [4 * num_nodes], initializer=tf.constant_initializer(0.0))
+    if not decode and bias_reg:
+      tf.add_to_collection(
+          weight_collection, reg_func(
+              biases, name="biases_reg"))
+    prev = tf.nn.xw_plus_b(prev, w_i_m, biases)
+
+    prev = tf.reshape(prev, tf.pack([batch_size, num_frames, 4, num_nodes]))
+    if state is None:
+      state = tf.fill(tf.pack([batch_size, num_nodes]), 0.0)
+    if memory is None:
+      memory = tf.fill(tf.pack([batch_size, num_nodes]), 0.0)
+
+    out, _, mem = rnn.variable_lstm(prev, state, memory, w_m_m, clip=clip)
+
+    if backward:
+      if length is None:
+        out = tf.reverse(out, [False, True, False])
+      else:
+        out = tf.reverse_sequence(out, length, 1, 0)
+
+  return out, mem
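
A minimal usage sketch for `rnn_helper`, assuming `../cc/rnn_ops.so` has been built as described in the README (the tensor shapes and the `demo_lstm` name are illustrative, and the API is the TensorFlow version of this commit):

```
import tensorflow as tf
import nn_ops  # Assumes ../cc/rnn_ops.so has been built (see the README).

# A batch of 8 sequences, 20 frames each, 32 features per frame.
inp = tf.placeholder(tf.float32, shape=[8, 20, 32])
length = tf.constant([20] * 8, dtype=tf.int64)
out = nn_ops.rnn_helper(inp, length, cell_type='lstm',
                        direction='bidirectional', num_nodes=64,
                        name='demo_lstm')
# Forward and backward outputs are concatenated on the depth axis,
# so out has shape [8, 20, 128].
```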

+ 216 - 0
street/python/shapes.py

@@ -0,0 +1,216 @@
+# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+
+"""Shape manipulation functions.
+
+rotate_dimensions: prepares for a rotating transpose by returning a rotated
+  list of dimension indices.
+transposing_reshape: allows a dimension to be factorized, with one of the pieces
+  transferred to another dimension, or to transpose factors within a single
+  dimension.
+tensor_dim: gets a shape dimension as a constant integer if known otherwise a
+  runtime usable tensor value.
+tensor_shape: returns the full shape of a tensor as the tensor_dim.
+"""
+import tensorflow as tf
+
+
+def rotate_dimensions(num_dims, src_dim, dest_dim):
+  """Returns a list of dimension indices that will rotate src_dim to dest_dim.
+
+  src_dim is moved to dest_dim, with all intervening dimensions shifted towards
+  the hole left by src_dim. Eg:
+  num_dims = 4, src_dim=3, dest_dim=1
+  Returned list=[0, 3, 1, 2]
+  For a tensor with dims=[5, 4, 3, 2] a transpose would yield [5, 2, 4, 3].
+  Args:
+    num_dims: The number of dimensions to handle.
+    src_dim:  The dimension to move.
+    dest_dim: The dimension to move src_dim to.
+
+  Returns:
+    A list of rotated dimension indices.
+  """
+  # List of dimensions for transpose.
+  dim_list = range(num_dims)
+  # Shuffle src_dim to dest_dim by swapping to shuffle up the other dims.
+  step = 1 if dest_dim > src_dim else -1
+  for x in xrange(src_dim, dest_dim, step):
+    dim_list[x], dim_list[x + step] = dim_list[x + step], dim_list[x]
+  return dim_list
+
+
+def transposing_reshape(tensor,
+                        src_dim,
+                        part_a,
+                        part_b,
+                        dest_dim_a,
+                        dest_dim_b,
+                        name=None):
+  """Splits src_dim and sends one of the pieces to another dim.
+
+  Terminology:
+  A matrix is often described as 'row-major' or 'column-major', which doesn't
+  help if you can't remember which index is the row and which is the column,
+  even if you know what 'major' means, so here is a simpler explanation:
+  When TF stores a tensor of size [d0, d1, d2, d3] indexed by [i0, i1, i2, i3],
+  the memory address of an element is calculated using:
+  ((i0 * d1 + i1) * d2 + i2) * d3 + i3, so, d0 is the MOST SIGNIFICANT dimension
+  and d3 the LEAST SIGNIFICANT, just like in the decimal number 1234, 1 is the
+  most significant digit and 4 the least significant. In both cases the most
+  significant is multiplied by the largest number to determine its 'value'.
+  Furthermore, if we reshape the tensor to [d0'=d0, d1'=d1 x d2, d2'=d3], then
+  the MOST SIGNIFICANT part of d1' is d1 and the LEAST SIGNIFICANT part of d1'
+  is d2.
+
+  Action:
+  transposing_reshape splits src_dim into factors [part_a, part_b], and sends
+  the most significant part (of size part_a) to be the most significant part
+  of dest_dim_a (exception: see NOTE2), and the least significant part (of
+  size part_b) to be the most significant part of dest_dim_b.
+  This is basically a combination of reshape, rotating transpose, reshape.
+  NOTE1: At least one of dest_dim_a and dest_dim_b must equal src_dim, i.e.
+  one of the parts always stays put, so src_dim is never totally destroyed and
+  the output number of dimensions is always the same as the input.
+  NOTE2: If dest_dim_a == dest_dim_b == src_dim, then parts a and b are simply
+  transposed within src_dim to become part_b x part_a, so the most significant
+  part becomes the least significant part and vice versa. Thus if you really
+  wanted to make one of the parts the least significant side of the
+  destination, the destination dimension can be internally transposed with a
+  second call to transposing_reshape.
+  NOTE3: One of part_a and part_b may be -1 to allow src_dim to be of unknown
+  size with one known-size factor. Otherwise part_a * part_b must equal the size
+  of src_dim.
+  NOTE4: The reshape preserves as many known-at-graph-build-time dimension sizes
+  as are available.
+
+  Example:
+  Input dims=[5, 2, 6, 2]
+  tensor=[[[[0, 1][2, 3][4, 5][6, 7][8, 9][10, 11]]
+           [[12, 13][14, 15][16, 17][18, 19][20, 21][22, 23]]
+          [[[24, 25]...
+  src_dim=2, part_a=2, part_b=3, dest_dim_a=3, dest_dim_b=2
+  output dims =[5, 2, 3, 4]
+  output tensor=[[[[0, 1, 6, 7][2, 3, 8, 9][4, 5, 10, 11]]
+                  [[12, 13, 18, 19][14, 15, 20, 21][16, 17, 22, 23]]]
+                 [[[24, 25, 30, 31]...
+  Example2:
+  Input dims=[phrases, words, letters]=[2, 6, x]
+  tensor=[[[the][cat][sat][on][the][mat]]
+         [[a][stitch][in][time][saves][nine]]]
+  We can factorize the 6 words into 3x2 = [[the][cat]][[sat][on]][[the][mat]]
+  or 2x3=[[the][cat][sat]][[on][the][mat]] and
+  src_dim=1, part_a=3, part_b=2, dest_dim_a=1, dest_dim_b=1
+  would yield:
+  [[[the][sat][the][cat][on][mat]]
+   [[a][in][saves][stitch][time][nine]]], but
+  src_dim=1, part_a=2, part_b=3, dest_dim_a=1, dest_dim_b=1
+  would yield:
+  [[[the][on][cat][the][sat][mat]]
+   [[a][time][stitch][saves][in][nine]]], and
+  src_dim=1, part_a=2, part_b=3, dest_dim_a=0, dest_dim_b=1
+  would yield:
+  [[[the][cat][sat]]
+   [[a][stitch][in]]
+   [[on][the][mat]]
+   [[time][saves][nine]]]
+  Now remember that the words above represent any least-significant subset of
+  the input dimensions.
+
+  Args:
+    tensor:     A tensor to reshape.
+    src_dim:    The dimension to split.
+    part_a:     The first factor of the split.
+    part_b:     The second factor of the split.
+    dest_dim_a: The dimension to move part_a of src_dim to.
+    dest_dim_b: The dimension to move part_b of src_dim to.
+    name:       Optional base name for all the ops.
+
+  Returns:
+    Reshaped tensor.
+
+  Raises:
+    ValueError: If the args are invalid.
+  """
+  if dest_dim_a != src_dim and dest_dim_b != src_dim:
+    raise ValueError(
+        'At least one of dest_dim_a, dest_dim_b must equal src_dim!')
+  if part_a == 0 or part_b == 0:
+    raise ValueError('Zero not allowed for part_a or part_b!')
+  if part_a < 0 and part_b < 0:
+    raise ValueError('At least one of part_a and part_b must be positive!')
+  if not name:
+    name = 'transposing_reshape'
+  prev_shape = tensor_shape(tensor)
+  expanded = tf.reshape(
+      tensor,
+      prev_shape[:src_dim] + [part_a, part_b] + prev_shape[src_dim + 1:],
+      name=name + '_reshape_in')
+  dest = dest_dim_b
+  if dest_dim_a != src_dim:
+    # We are just moving part_a to dest_dim_a.
+    dest = dest_dim_a
+  else:
+    # We are moving part_b to dest_dim_b.
+    src_dim += 1
+  dim_list = rotate_dimensions(len(expanded.get_shape()), src_dim, dest)
+  expanded = tf.transpose(expanded, dim_list, name=name + '_rot_transpose')
+  # Reshape identity except dest,dest+1, which get merged.
+  ex_shape = tensor_shape(expanded)
+  combined = ex_shape[dest] * ex_shape[dest + 1]
+  return tf.reshape(
+      expanded,
+      ex_shape[:dest] + [combined] + ex_shape[dest + 2:],
+      name=name + '_reshape_out')
+
+
+def tensor_dim(tensor, dim):
+  """Returns int dimension if known at a graph build time else a tensor.
+
+  If the size of the dim of tensor is known at graph building time, then that
+  known value is returned, otherwise (instead of None), a Tensor that will give
+  the size of the dimension when the graph is run. The return value will be
+  accepted by tf.reshape in multiple (or even all) dimensions, even when the
+  sizes are not known at graph building time, unlike -1, which can only be used
+  in one dimension. It is a bad idea to use tf.shape all the time, as some ops
+  demand a known (at graph build time) size. This function therefore returns
+  the best available, most useful dimension size.
+  Args:
+    tensor: Input tensor.
+    dim:    Dimension to find the size of.
+
+  Returns:
+    An integer if shape is known at build time, otherwise a tensor of int32.
+  """
+  result = tensor.get_shape().as_list()[dim]
+  if result is None:
+    result = tf.shape(tensor)[dim]
+  return result
+
+
+def tensor_shape(tensor):
+  """Returns a heterogeneous list of tensor_dim for the tensor.
+
+  See tensor_dim for a more detailed explanation.
+  Args:
+    tensor: Input tensor.
+
+  Returns:
+    A heterogeneous list of integers and int32 tensors.
+  """
+  result = []
+  for d in xrange(len(tensor.get_shape())):
+    result.append(tensor_dim(tensor, d))
+  return result
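
The index shuffle performed by `rotate_dimensions` is easy to check in plain Python; this standalone copy reproduces the docstring example (`list()` added so it also runs where `range()` is not indexable):

```
def rotate_dims(num_dims, src_dim, dest_dim):
  # Standalone copy of rotate_dimensions above.
  dim_list = list(range(num_dims))
  step = 1 if dest_dim > src_dim else -1
  for x in range(src_dim, dest_dim, step):
    dim_list[x], dim_list[x + step] = dim_list[x + step], dim_list[x]
  return dim_list

assert rotate_dims(4, 3, 1) == [0, 3, 1, 2]  # As in the docstring example.
```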

+ 171 - 0
street/python/shapes_test.py

@@ -0,0 +1,171 @@
+# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Tests for shapes."""
+
+import numpy as np
+import tensorflow as tf
+import shapes
+
+
+def _rand(*size):
+  return np.random.uniform(size=size).astype('f')
+
+
+class ShapesTest(tf.test.TestCase):
+  """Tests just the shapes from a call to transposing_reshape."""
+
+  def __init__(self, other):
+    super(ShapesTest, self).__init__(other)
+    self.batch_size = 4
+    self.im_height = 24
+    self.im_width = 36
+    self.depth = 20
+
+  def testReshapeTile(self):
+    """Tests that a tiled input can be reshaped to the batch dimension."""
+    fake = tf.placeholder(
+        tf.float32, shape=(None, None, None, self.depth), name='inputs')
+    real = _rand(self.batch_size, self.im_height, self.im_width, self.depth)
+    with self.test_session() as sess:
+      outputs = shapes.transposing_reshape(
+          fake, src_dim=2, part_a=3, part_b=-1, dest_dim_a=0, dest_dim_b=2)
+      res_image = sess.run([outputs], feed_dict={fake: real})
+      self.assertEqual(
+          tuple(res_image[0].shape),
+          (self.batch_size * 3, self.im_height, self.im_width / 3, self.depth))
+
+  def testReshapeDepth(self):
+    """Tests that depth can be reshaped to the x dimension."""
+    fake = tf.placeholder(
+        tf.float32, shape=(None, None, None, self.depth), name='inputs')
+    real = _rand(self.batch_size, self.im_height, self.im_width, self.depth)
+    with self.test_session() as sess:
+      outputs = shapes.transposing_reshape(
+          fake, src_dim=3, part_a=4, part_b=-1, dest_dim_a=2, dest_dim_b=3)
+      res_image = sess.run([outputs], feed_dict={fake: real})
+      self.assertEqual(
+          tuple(res_image[0].shape),
+          (self.batch_size, self.im_height, self.im_width * 4, self.depth / 4))
+
+
+class DataTest(tf.test.TestCase):
+  """Tests that the data is moved correctly in a call to transposing_reshape.
+
+  """
+
+  def testTransposingReshape_2_2_3_2_1(self):
+    """Case: dest_a == src, dest_b < src: Split with Least sig part going left.
+    """
+    with self.test_session() as sess:
+      fake = tf.placeholder(
+          tf.float32, shape=(None, None, None, 2), name='inputs')
+      outputs = shapes.transposing_reshape(
+          fake, src_dim=2, part_a=2, part_b=3, dest_dim_a=2, dest_dim_b=1)
+      # Make real inputs. The tensor looks like this:
+      # tensor=[[[[0, 1][2, 3][4, 5][6, 7][8, 9][10, 11]]
+      #          [[12, 13][14, 15][16, 17][18, 19][20, 21][22, 23]]
+      #         [[[24, 25]...
+      real = np.arange(120).reshape((5, 2, 6, 2))
+      np_array = sess.run([outputs], feed_dict={fake: real})[0]
+      self.assertEqual(tuple(np_array.shape), (5, 6, 2, 2))
+      self.assertAllEqual(np_array[0, :, :, :],
+                          [[[0, 1], [6, 7]], [[12, 13], [18, 19]],
+                           [[2, 3], [8, 9]], [[14, 15], [20, 21]],
+                           [[4, 5], [10, 11]], [[16, 17], [22, 23]]])
+
+  def testTransposingReshape_2_2_3_2_3(self):
+    """Case: dest_a == src, dest_b > src: Split with Least sig part going right.
+    """
+    with self.test_session() as sess:
+      fake = tf.placeholder(
+          tf.float32, shape=(None, None, None, 2), name='inputs')
+      outputs = shapes.transposing_reshape(
+          fake, src_dim=2, part_a=2, part_b=3, dest_dim_a=2, dest_dim_b=3)
+      # Make real inputs. The tensor looks like this:
+      # tensor=[[[[0, 1][2, 3][4, 5][6, 7][8, 9][10, 11]]
+      #          [[12, 13][14, 15][16, 17][18, 19][20, 21][22, 23]]
+      #         [[[24, 25]...
+      real = np.arange(120).reshape((5, 2, 6, 2))
+      np_array = sess.run([outputs], feed_dict={fake: real})[0]
+      self.assertEqual(tuple(np_array.shape), (5, 2, 2, 6))
+      self.assertAllEqual(
+          np_array[0, :, :, :],
+          [[[0, 1, 2, 3, 4, 5], [6, 7, 8, 9, 10, 11]],
+           [[12, 13, 14, 15, 16, 17], [18, 19, 20, 21, 22, 23]]])
+
+  def testTransposingReshape_2_2_3_2_2(self):
+    """Case: dest_a == src, dest_b == src. Transpose within dimension 2.
+    """
+    with self.test_session() as sess:
+      fake = tf.placeholder(
+          tf.float32, shape=(None, None, None, 2), name='inputs')
+      outputs = shapes.transposing_reshape(
+          fake, src_dim=2, part_a=2, part_b=3, dest_dim_a=2, dest_dim_b=2)
+      # Make real inputs. The tensor looks like this:
+      # tensor=[[[[0, 1][2, 3][4, 5][6, 7][8, 9][10, 11]]
+      #          [[12, 13][14, 15][16, 17][18, 19][20, 21][22, 23]]
+      #         [[[24, 25]...
+      real = np.arange(120).reshape((5, 2, 6, 2))
+      np_array = sess.run([outputs], feed_dict={fake: real})[0]
+      self.assertEqual(tuple(np_array.shape), (5, 2, 6, 2))
+      self.assertAllEqual(
+          np_array[0, :, :, :],
+          [[[0, 1], [6, 7], [2, 3], [8, 9], [4, 5], [10, 11]],
+           [[12, 13], [18, 19], [14, 15], [20, 21], [16, 17], [22, 23]]])
+
+  def testTransposingReshape_2_2_3_1_2(self):
+    """Case: dest_a < src, dest_b == src. Split with Most sig part going left.
+    """
+    with self.test_session() as sess:
+      fake = tf.placeholder(
+          tf.float32, shape=(None, None, None, 2), name='inputs')
+      outputs = shapes.transposing_reshape(
+          fake, src_dim=2, part_a=2, part_b=3, dest_dim_a=1, dest_dim_b=2)
+      # Make real inputs. The tensor looks like this:
+      # tensor=[[[[0, 1][2, 3][4, 5][6, 7][8, 9][10, 11]]
+      #          [[12, 13][14, 15][16, 17][18, 19][20, 21][22, 23]]
+      #         [[[24, 25]...
+      real = np.arange(120).reshape((5, 2, 6, 2))
+      np_array = sess.run([outputs], feed_dict={fake: real})[0]
+      self.assertEqual(tuple(np_array.shape), (5, 4, 3, 2))
+      self.assertAllEqual(np_array[0, :, :, :],
+                          [[[0, 1], [2, 3], [4, 5]],
+                           [[12, 13], [14, 15], [16, 17]],
+                           [[6, 7], [8, 9], [10, 11]],
+                           [[18, 19], [20, 21], [22, 23]]])
+
+  def testTransposingReshape_2_2_3_3_2(self):
+    """Case: dest_a < src, dest_b == src. Split with Most sig part going right.
+    """
+    with self.test_session() as sess:
+      fake = tf.placeholder(
+          tf.float32, shape=(None, None, None, 2), name='inputs')
+      outputs = shapes.transposing_reshape(
+          fake, src_dim=2, part_a=2, part_b=3, dest_dim_a=3, dest_dim_b=2)
+      # Make real inputs. The tensor looks like this:
+      # tensor=[[[[0, 1][2, 3][4, 5][6, 7][8, 9][10, 11]]
+      #          [[12, 13][14, 15][16, 17][18, 19][20, 21][22, 23]]
+      #         [[[24, 25]...
+      real = np.arange(120).reshape((5, 2, 6, 2))
+      np_array = sess.run([outputs], feed_dict={fake: real})[0]
+      self.assertEqual(tuple(np_array.shape), (5, 2, 3, 4))
+      self.assertAllEqual(
+          np_array[0, :, :, :],
+          [[[0, 1, 6, 7], [2, 3, 8, 9], [4, 5, 10, 11]],
+           [[12, 13, 18, 19], [14, 15, 20, 21], [16, 17, 22, 23]]])
+
+
+if __name__ == '__main__':
+  tf.test.main()
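
The reshape-transpose-reshape pipeline these tests exercise can also be replayed with numpy, e.g. the `_2_2_3_3_2` case:

```
import numpy as np

# Numpy replay of testTransposingReshape_2_2_3_3_2:
# src_dim=2, part_a=2, part_b=3, dest_dim_a=3, dest_dim_b=2.
x = np.arange(120).reshape(5, 2, 6, 2)
y = x.reshape(5, 2, 2, 3, 2)      # Split dim 2 into (part_a, part_b).
y = y.transpose(0, 1, 3, 2, 4)    # Rotate part_a next to dest_dim_a=3.
y = y.reshape(5, 2, 3, 4)         # Merge part_a into the depth dim.
assert y[0].tolist() == [[[0, 1, 6, 7], [2, 3, 8, 9], [4, 5, 10, 11]],
                         [[12, 13, 18, 19], [14, 15, 20, 21],
                          [16, 17, 22, 23]]]
```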

+ 49 - 0
street/python/vgsl_eval.py

@@ -0,0 +1,49 @@
+# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Model eval separate from training."""
+from tensorflow import app
+from tensorflow.python.platform import flags
+
+import vgsl_model
+
+flags.DEFINE_string('eval_dir', '/tmp/mdir/eval',
+                    'Directory where to write event logs.')
+flags.DEFINE_string('graph_def_file', None,
+                    'Output eval graph definition file.')
+flags.DEFINE_string('train_dir', '/tmp/mdir',
+                    'Directory where to find training checkpoints.')
+flags.DEFINE_string('model_str',
+                    '1,150,600,3[S2(4x150)0,2 Ct5,5,16 Mp2,2 Ct5,5,64 Mp3,3'
+                    '([Lrys64 Lbx128][Lbys64 Lbx128][Lfys64 Lbx128])S3(3x0)2,3'
+                    'Lfx128 Lrx128 S0(1x4)0,3 Do Lfx256]O1c134',
+                    'Network description.')
+flags.DEFINE_integer('num_steps', 1000, 'Number of steps to run evaluation.')
+flags.DEFINE_integer('eval_interval_secs', 60,
+                     'Time interval between eval runs.')
+flags.DEFINE_string('eval_data', None, 'Evaluation data filepattern')
+flags.DEFINE_string('decoder', None, 'Charset decoder')
+
+FLAGS = flags.FLAGS
+
+
+def main(argv):
+  del argv
+  vgsl_model.Eval(FLAGS.train_dir, FLAGS.eval_dir, FLAGS.model_str,
+                  FLAGS.eval_data, FLAGS.decoder, FLAGS.num_steps,
+                  FLAGS.graph_def_file, FLAGS.eval_interval_secs)
+
+
+if __name__ == '__main__':
+  app.run()
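
The flags above simply feed `vgsl_model.Eval`, so the same evaluation can be run programmatically. A sketch follows; every path and the spec string are placeholders, not values from this commit (see vgslspecs.py for real network specifications):

```
import vgsl_model

rates = vgsl_model.Eval(
    train_dir='/tmp/mdir',                 # Placeholder checkpoint dir.
    eval_dir='/tmp/mdir/eval',             # Placeholder summary dir.
    model_str='1,32,0,1[Lfx64]O1c105',     # Placeholder VGSL spec.
    eval_data='/path/to/eval*',            # Placeholder file pattern.
    decoder_file='/path/to/charset.txt',   # Placeholder charset file.
    num_steps=100,
    eval_interval_secs=0)                  # 0 = evaluate once and return.
print('Error rates: %s' % (rates,))
```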

+ 150 - 0
street/python/vgsl_input.py

@@ -0,0 +1,150 @@
+# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+
+"""String network description language to define network layouts."""
+import collections
+import tensorflow as tf
+from tensorflow.python.ops import parsing_ops
+
+# Named tuple for the standard tf image tensor Shape.
+# batch_size:     Number of images to batch-up for training.
+# height:         Fixed height of image or None for variable.
+# width:          Fixed width of image or None for variable.
+# depth:          Desired depth in bytes per pixel of input images.
+ImageShape = collections.namedtuple('ImageShape',
+                                    ['batch_size', 'height', 'width', 'depth'])
+
+
+def ImageInput(input_pattern, num_threads, shape, using_ctc, reader=None):
+  """Creates an input image tensor from the input_pattern filenames.
+
+  TODO(rays) Expand for 2-d labels, 0-d labels, and logistic targets.
+  Args:
+    input_pattern:  Filenames of the dataset(s) to read.
+    num_threads:    Number of preprocessing threads.
+    shape:          ImageShape with the desired shape of the input.
+    using_ctc:      Take the unpadded_class labels instead of padded.
+    reader:         Function that returns an actual reader to read Examples from
+      input files. If None, uses tf.TFRecordReader().
+  Returns:
+    images:   Float Tensor containing the input image scaled to [-1.28, 1.27].
+    heights:  Tensor int64 containing the heights of the images.
+    widths:   Tensor int64 containing the widths of the images.
+    labels:   Dense int64 Tensor containing the padded labels.
+    sparse_labels:   SparseTensor containing the int32 labels.
+    truths:   Tensor string of the utf8 truth texts.
+  Raises:
+    AssertionError: if no files match input_pattern.
+  """
+  data_files = tf.gfile.Glob(input_pattern)
+  assert data_files, 'no files found for dataset ' + input_pattern
+  queue_capacity = shape.batch_size * num_threads * 2
+  filename_queue = tf.train.string_input_producer(
+      data_files, capacity=queue_capacity)
+
+  # Create a subgraph with its own reader (but sharing the
+  # filename_queue) for each preprocessing thread.
+  images_and_label_lists = []
+  for _ in range(num_threads):
+    image, height, width, labels, text = _ReadExamples(filename_queue, shape,
+                                                       using_ctc, reader)
+    images_and_label_lists.append([image, height, width, labels, text])
+  # Create a queue that produces the examples in batches.
+  images, heights, widths, labels, truths = tf.train.batch_join(
+      images_and_label_lists,
+      batch_size=shape.batch_size,
+      capacity=16 * shape.batch_size,
+      dynamic_pad=True)
+  # Deserialize back to sparse, because the batcher doesn't do sparse.
+  labels = tf.deserialize_many_sparse(labels, tf.int64)
+  sparse_labels = tf.cast(labels, tf.int32)
+  labels = tf.sparse_tensor_to_dense(labels)
+  labels = tf.reshape(labels, [shape.batch_size, -1], name='Labels')
+  # Crush the other shapes to just the batch dimension.
+  heights = tf.reshape(heights, [-1], name='Heights')
+  widths = tf.reshape(widths, [-1], name='Widths')
+  truths = tf.reshape(truths, [-1], name='Truths')
+  # Give the images a nice name as well.
+  images = tf.identity(images, name='Images')
+
+  tf.image_summary('Images', images)
+  return images, heights, widths, labels, sparse_labels, truths
+
+
+def _ReadExamples(filename_queue, shape, using_ctc, reader=None):
+  """Builds network input tensor ops for TF Example.
+
+  Args:
+    filename_queue: Queue of filenames, from tf.train.string_input_producer
+    shape:          ImageShape with the desired shape of the input.
+    using_ctc:      Take the unpadded_class labels instead of padded.
+    reader:         Function that returns an actual reader to read Examples from
+      input files. If None, uses tf.TFRecordReader().
+  Returns:
+    image:   Float Tensor containing the input image scaled to [-1.28, 1.27].
+    height:  Tensor int64 containing the height of the image.
+    width:   Tensor int64 containing the width of the image.
+    labels:  Serialized SparseTensor containing the int64 labels.
+    text:    Tensor string of the utf8 truth text.
+  """
+  if reader:
+    reader = reader()
+  else:
+    reader = tf.TFRecordReader()
+  _, example_serialized = reader.read(filename_queue)
+  example_serialized = tf.reshape(example_serialized, shape=[])
+  features = tf.parse_single_example(
+      example_serialized,
+      {'image/encoded': parsing_ops.FixedLenFeature(
+          [1], dtype=tf.string, default_value=''),
+       'image/text': parsing_ops.FixedLenFeature(
+           [1], dtype=tf.string, default_value=''),
+       'image/class': parsing_ops.VarLenFeature(dtype=tf.int64),
+       'image/unpadded_class': parsing_ops.VarLenFeature(dtype=tf.int64),
+       'image/height': parsing_ops.FixedLenFeature(
+           [1], dtype=tf.int64, default_value=1),
+       'image/width': parsing_ops.FixedLenFeature(
+           [1], dtype=tf.int64, default_value=1)})
+  if using_ctc:
+    labels = features['image/unpadded_class']
+  else:
+    labels = features['image/class']
+  labels = tf.serialize_sparse(labels)
+  image = tf.reshape(features['image/encoded'], shape=[], name='encoded')
+  image = _ImageProcessing(image, shape)
+  height = tf.reshape(features['image/height'], [-1])
+  width = tf.reshape(features['image/width'], [-1])
+  text = tf.reshape(features['image/text'], shape=[])
+
+  return image, height, width, labels, text
+
+
+def _ImageProcessing(image_buffer, shape):
+  """Convert a PNG string into an input tensor.
+
+  We allow for fixed and variable sizes.
+  Does fixed conversion to floats in the range [-1.28, 1.27].
+  Args:
+    image_buffer: Tensor containing a PNG encoded image.
+    shape:          ImageShape with the desired shape of the input.
+  Returns:
+    image:        Decoded, normalized image in the range [-1.28, 1.27].
+  """
+  image = tf.image.decode_png(image_buffer, channels=shape.depth)
+  image.set_shape([shape.height, shape.width, shape.depth])
+  image = tf.cast(image, tf.float32)
+  image = tf.sub(image, 128.0)
+  image = tf.mul(image, 1 / 100.0)
+  return image
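
The feature keys read by `_ReadExamples` are fixed, so a compatible dataset can be written with the standard Example proto API. Here is a sketch of building one such record (the helper and its arguments are illustrative, not part of this commit):

```
import tensorflow as tf

def make_street_example(png_bytes, text, padded, unpadded, height, width):
  """Builds one Example with the features _ReadExamples expects (sketch)."""
  def _bytes(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))
  def _ints(values):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=values))
  return tf.train.Example(features=tf.train.Features(feature={
      'image/encoded': _bytes(png_bytes),       # PNG-encoded image.
      'image/text': _bytes(text),               # utf8 truth text.
      'image/class': _ints(padded),             # Padded labels (0 = space).
      'image/unpadded_class': _ints(unpadded),  # Labels for CTC training.
      'image/height': _ints([height]),
      'image/width': _ints([width]),
  }))
```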

+ 599 - 0
street/python/vgsl_model.py

@@ -0,0 +1,599 @@
+# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+
+"""String network description language to define network layouts."""
+import re
+import time
+
+import decoder
+import errorcounter as ec
+import shapes
+import tensorflow as tf
+import vgsl_input
+import vgslspecs
+import tensorflow.contrib.slim as slim
+from tensorflow.core.framework import summary_pb2
+from tensorflow.python.platform import tf_logging as logging
+
+
+# Parameters for rate decay.
+# We divide the learning_rate_halflife by DECAY_STEPS_FACTOR and use DECAY_RATE
+# as the decay factor for the learning rate, i.e. we apply the
+# DECAY_STEPS_FACTORth root of 1/2 as the decay rate every
+# halflife/DECAY_STEPS_FACTOR steps to achieve the desired halflife.
+DECAY_STEPS_FACTOR = 16
+DECAY_RATE = pow(0.5, 1.0 / DECAY_STEPS_FACTOR)
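
As a quick check of the decay arithmetic: applying DECAY_RATE every halflife/DECAY_STEPS_FACTOR steps compounds to exactly one half per halflife.

```
DECAY_STEPS_FACTOR = 16
DECAY_RATE = pow(0.5, 1.0 / DECAY_STEPS_FACTOR)  # The 16th root of 1/2.
assert abs(DECAY_RATE ** DECAY_STEPS_FACTOR - 0.5) < 1e-12
```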
+
+
+def Train(train_dir,
+          model_str,
+          train_data,
+          max_steps,
+          master='',
+          task=0,
+          ps_tasks=0,
+          initial_learning_rate=0.001,
+          final_learning_rate=0.001,
+          learning_rate_halflife=160000,
+          optimizer_type='Adam',
+          num_preprocess_threads=1,
+          reader=None):
+  """Testable trainer with no dependence on FLAGS.
+
+  Args:
+    train_dir: Directory to write checkpoints.
+    model_str: Network specification string.
+    train_data: Training data file pattern.
+    max_steps: Number of training steps to run.
+    master: Name of the TensorFlow master to use.
+    task: Task id of this replica running the training. (0 will be master).
+    ps_tasks: Number of tasks in ps job, or 0 if no ps job.
+    initial_learning_rate: Learning rate at start of training.
+    final_learning_rate: Asymptotic minimum learning rate.
+    learning_rate_halflife: Number of steps over which to halve the difference
+      between initial and final learning rate.
+    optimizer_type: One of 'GradientDescent', 'AdaGrad', 'Momentum', 'Adam'.
+    num_preprocess_threads: Number of input threads.
+    reader: Function that returns an actual reader to read Examples from input
+      files. If None, uses tf.TFRecordReader().
+  """
+  if master.startswith('local'):
+    device = tf.ReplicaDeviceSetter(ps_tasks)
+  else:
+    device = '/cpu:0'
+  with tf.Graph().as_default():
+    with tf.device(device):
+      model = InitNetwork(train_data, model_str, 'train', initial_learning_rate,
+                          final_learning_rate, learning_rate_halflife,
+                          optimizer_type, num_preprocess_threads, reader)
+
+      # Create a Supervisor.  It will take care of initialization, summaries,
+      # checkpoints, and recovery.
+      #
+      # When multiple replicas of this program are running, the first one,
+      # identified by --task=0, is the 'chief' supervisor.  It is the only one
+      # that takes care of initialization, etc.
+      sv = tf.train.Supervisor(
+          logdir=train_dir,
+          is_chief=(task == 0),
+          saver=model.saver,
+          save_summaries_secs=10,
+          save_model_secs=30,
+          recovery_wait_secs=5)
+
+      step = 0
+      while step < max_steps:
+        try:
+          # Get an initialized, and possibly recovered session.  Launch the
+          # services: Checkpointing, Summaries, step counting.
+          with sv.managed_session(master) as sess:
+            while step < max_steps:
+              _, step = model.TrainAStep(sess)
+              if sv.coord.should_stop():
+                break
+        except tf.errors.AbortedError as e:
+          logging.error('Received error:%s', e)
+          continue
+
+
+def Eval(train_dir,
+         eval_dir,
+         model_str,
+         eval_data,
+         decoder_file,
+         num_steps,
+         graph_def_file=None,
+         eval_interval_secs=0,
+         reader=None):
+  """Restores a model from a checkpoint and evaluates it.
+
+  Args:
+    train_dir: Directory to find checkpoints.
+    eval_dir: Directory to write summary events.
+    model_str: Network specification string.
+    eval_data: Evaluation data file pattern.
+    decoder_file: File to read to decode the labels.
+    num_steps: Number of eval steps to run.
+    graph_def_file: File to write graph definition to for freezing.
+    eval_interval_secs: How often to run evaluations, or once if 0.
+    reader: Function that returns an actual reader to read Examples from input
+      files. If None, uses tf.TFRecordReader().
+  Returns:
+    (char error rate, word recall error rate, sequence error rate) as percent.
+  Raises:
+    ValueError: If unimplemented feature is used.
+  """
+  decode = None
+  if decoder_file:
+    decode = decoder.Decoder(decoder_file)
+
+  # Run eval.
+  rates = ec.ErrorRates(
+      label_error=None,
+      word_recall_error=None,
+      word_precision_error=None,
+      sequence_error=None)
+  with tf.Graph().as_default():
+    model = InitNetwork(eval_data, model_str, 'eval', reader=reader)
+    sw = tf.train.SummaryWriter(eval_dir)
+
+    while True:
+      sess = tf.Session('')
+      if graph_def_file is not None:
+        # Write the eval version of the graph to a file for freezing.
+        if not tf.gfile.Exists(graph_def_file):
+          with tf.gfile.FastGFile(graph_def_file, 'w') as f:
+            f.write(
+                sess.graph.as_graph_def(add_shapes=True).SerializeToString())
+      ckpt = tf.train.get_checkpoint_state(train_dir)
+      if ckpt and ckpt.model_checkpoint_path:
+        step = model.Restore(ckpt.model_checkpoint_path, sess)
+        if decode:
+          rates = decode.SoftmaxEval(sess, model, num_steps)
+          _AddRateToSummary('Label error rate', rates.label_error, step, sw)
+          _AddRateToSummary('Word recall error rate', rates.word_recall_error,
+                            step, sw)
+          _AddRateToSummary('Word precision error rate',
+                            rates.word_precision_error, step, sw)
+          _AddRateToSummary('Sequence error rate', rates.sequence_error, step,
+                            sw)
+          sw.flush()
+          print 'Error rates=', rates
+        else:
+          raise ValueError('Non-softmax decoder evaluation not implemented!')
+      if eval_interval_secs:
+        time.sleep(eval_interval_secs)
+      else:
+        break
+  return rates
+
+
+def InitNetwork(input_pattern,
+                model_spec,
+                mode='eval',
+                initial_learning_rate=0.00005,
+                final_learning_rate=0.00005,
+                halflife=1600000,
+                optimizer_type='Adam',
+                num_preprocess_threads=1,
+                reader=None):
+  """Constructs a python tensor flow model defined by model_spec.
+
+  Args:
+    input_pattern: File pattern of the data in tfrecords of Example.
+    model_spec: Concatenation of input spec, model spec and output spec.
+      See Build below for input/output spec. For model spec, see vgslspecs.py
+    mode: One of 'train', 'eval'
+    initial_learning_rate: Initial learning rate for the network.
+    final_learning_rate: Final learning rate for the network.
+    halflife: Number of steps over which to halve the difference between
+              initial and final learning rate for the network.
+    optimizer_type: One of 'GradientDescent', 'AdaGrad', 'Momentum', 'Adam'.
+    num_preprocess_threads: Number of threads to use for image processing.
+    reader: Function that returns an actual reader to read Examples from input
+      files. If None, uses tf.TFRecordReader().
+  Note: eval tasks need only specify input_pattern and model_spec.
+
+  Returns:
+    A VGSLImageModel class.
+
+  Raises:
+    ValueError: if the model spec syntax is incorrect.
+  """
+  model = VGSLImageModel(mode, model_spec, initial_learning_rate,
+                         final_learning_rate, halflife)
+  left_bracket = model_spec.find('[')
+  right_bracket = model_spec.rfind(']')
+  if left_bracket < 0 or right_bracket < 0:
+    raise ValueError('Failed to find [] in model spec! ', model_spec)
+  input_spec = model_spec[:left_bracket]
+  layer_spec = model_spec[left_bracket:right_bracket + 1]
+  output_spec = model_spec[right_bracket + 1:]
+  model.Build(input_pattern, input_spec, layer_spec, output_spec,
+              optimizer_type, num_preprocess_threads, reader)
+  return model
+
+
+class VGSLImageModel(object):
+  """Class that builds a tensor flow model for training or evaluation.
+  """
+
+  def __init__(self, mode, model_spec, initial_learning_rate,
+               final_learning_rate, halflife):
+    """Constructs a VGSLImageModel.
+
+    Args:
+      mode:        One of "train", "eval"
+      model_spec:  Full model specification string, for reference only.
+      initial_learning_rate: Initial learning rate for the network.
+      final_learning_rate: Final learning rate for the network.
+      halflife: Number of steps over which to halve the difference between
+                initial and final learning rate for the network.
+    """
+    # The string that was used to build this model.
+    self.model_spec = model_spec
+    # The layers between input and output.
+    self.layers = None
+    # The train/eval mode.
+    self.mode = mode
+    # The initial learning rate.
+    self.initial_learning_rate = initial_learning_rate
+    self.final_learning_rate = final_learning_rate
+    self.decay_steps = halflife / DECAY_STEPS_FACTOR
+    self.decay_rate = DECAY_RATE
+    # Tensor for the labels.
+    self.labels = None
+    self.sparse_labels = None
+    # Debug data containing the truth text.
+    self.truths = None
+    # Tensor for loss
+    self.loss = None
+    # Train operation
+    self.train_op = None
+    # Tensor for the global step counter
+    self.global_step = None
+    # Tensor for the output predictions (usually softmax)
+    self.output = None
+    # True if we are using CTC training mode.
+    self.using_ctc = False
+    # Saver object to load or restore the variables.
+    self.saver = None
+
+  def Build(self, input_pattern, input_spec, model_spec, output_spec,
+            optimizer_type, num_preprocess_threads, reader):
+    """Builds the model from the separate input/layers/output spec strings.
+
+    Args:
+      input_pattern: File pattern of the data in tfrecords of TF Example format.
+      input_spec: Specification of the input layer:
+        batchsize,height,width,depth (4 comma-separated integers)
+          Training will run with batches of batchsize images, but runtime can
+          use any batch size.
+          height and/or width can be 0 or -1, indicating variable size,
+          otherwise all images must be the given size.
+          depth must be 1 or 3 to indicate greyscale or color.
+          NOTE 1-d image input, treating the y image dimension as depth, can
+          be achieved using S1(1x0)1,3 as the first op in the model_spec, but
+          the y-size of the input must then be fixed.
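+          For example, '8,0,0,1' (as in the unit tests) means batches of 8
+          variable-sized greyscale images.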
+      model_spec: Model definition. See vgslspecs.py
+      output_spec: Output layer definition:
+        O(2|1|0)(l|s|c)n output layer with n classes.
+          2 (heatmap) Output is a 2-d vector map of the input (possibly at
+            different scale).
+          1 (sequence) Output is a 1-d sequence of vector values.
+          0 (value) Output is a 0-d single vector value.
+          l uses a logistic non-linearity on the output, allowing multiple
+            hot elements in any output vector value.
+          s uses a softmax non-linearity, with one-hot output in each value.
+          c uses a softmax with CTC. Can only be used with 1 (sequence).
+          NOTE Only O1s and O1c are currently supported.
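+          For example, 'O1c134' (the default in vgsl_train.py) is a 1-d
+          softmax+CTC output over 134 classes.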
+      optimizer_type: One of 'GradientDescent', 'AdaGrad', 'Momentum', 'Adam'.
+      num_preprocess_threads: Number of threads to use for image processing.
+      reader: Function that returns an actual reader to read Examples from input
+        files. If None, uses tf.TFRecordReader().
+    """
+    self.global_step = tf.Variable(0, name='global_step', trainable=False)
+    shape = _ParseInputSpec(input_spec)
+    out_dims, out_func, num_classes = _ParseOutputSpec(output_spec)
+    self.using_ctc = out_func == 'c'
+    images, heights, widths, labels, sparse, _ = vgsl_input.ImageInput(
+        input_pattern, num_preprocess_threads, shape, self.using_ctc, reader)
+    self.labels = labels
+    self.sparse_labels = sparse
+    self.layers = vgslspecs.VGSLSpecs(widths, heights, self.mode == 'train')
+    last_layer = self.layers.Build(images, model_spec)
+    self._AddOutputs(last_layer, out_dims, out_func, num_classes)
+    if self.mode == 'train':
+      self._AddOptimizer(optimizer_type)
+
+    # For saving the model across training and evaluation
+    self.saver = tf.train.Saver()
+
+  def TrainAStep(self, sess):
+    """Runs a training step in the session.
+
+    Args:
+      sess: Session in which to train the model.
+    Returns:
+      loss, global_step.
+    """
+    _, loss, step = sess.run([self.train_op, self.loss, self.global_step])
+    return loss, step
+
+  def Restore(self, checkpoint_path, sess):
+    """Restores the model from the given checkpoint path into the session.
+
+    Args:
+      checkpoint_path: File pathname of the checkpoint.
+      sess:            Session in which to restore the model.
+    Returns:
+      global_step of the model.
+    """
+    self.saver.restore(sess, checkpoint_path)
+    return tf.train.global_step(sess, self.global_step)
+
+  def RunAStep(self, sess):
+    """Runs a step for eval in the session.
+
+    Args:
+      sess:            Session in which to run the model.
+    Returns:
+      output tensor result, labels tensor result.
+    """
+    return sess.run([self.output, self.labels])
+
+  def _AddOutputs(self, prev_layer, out_dims, out_func, num_classes):
+    """Adds the output layer and loss function.
+
+    Args:
+      prev_layer:  Output of last layer of main network.
+      out_dims:    Number of output dimensions, 0, 1 or 2.
+      out_func:    Output non-linearity. 's' or 'c'=softmax, 'l'=logistic.
+      num_classes: Number of outputs/size of last output dimension.
+    """
+    height_in = shapes.tensor_dim(prev_layer, dim=1)
+    logits, outputs = self._AddOutputLayer(prev_layer, out_dims, out_func,
+                                           num_classes)
+    if self.mode == 'train':
+      # Setup loss for training.
+      self.loss = self._AddLossFunction(logits, height_in, out_dims, out_func)
+      tf.scalar_summary('loss', self.loss, name='loss')
+    elif out_dims == 0:
+      # Be sure the labels match the output, even in eval mode.
+      self.labels = tf.slice(self.labels, [0, 0], [-1, 1])
+      self.labels = tf.reshape(self.labels, [-1])
+
+    logging.info('Final output=%s', outputs)
+    logging.info('Labels tensor=%s', self.labels)
+    self.output = outputs
+
+  def _AddOutputLayer(self, prev_layer, out_dims, out_func, num_classes):
+    """Add the fully-connected logits and SoftMax/Logistic output Layer.
+
+    Args:
+      prev_layer:  Output of last layer of main network.
+      out_dims:    Number of output dimensions, 0, 1 or 2.
+      out_func:    Output non-linearity. 's' or 'c'=softmax, 'l'=logistic.
+      num_classes: Number of outputs/size of last output dimension.
+
+    Returns:
+      logits:  Pre-softmax/logistic fully-connected output shaped to out_dims.
+      outputs: Post-softmax/logistic shaped to out_dims.
+
+    Raises:
+      ValueError: if syntax is incorrect.
+    """
+    # Reduce dimensionality appropriate to the output dimensions.
+    batch_in = shapes.tensor_dim(prev_layer, dim=0)
+    height_in = shapes.tensor_dim(prev_layer, dim=1)
+    width_in = shapes.tensor_dim(prev_layer, dim=2)
+    depth_in = shapes.tensor_dim(prev_layer, dim=3)
+    if out_dims:
+      # Combine any remaining height and width with batch and unpack after.
+      shaped = tf.reshape(prev_layer, [-1, depth_in])
+    else:
+      # Everything except batch goes to depth, and therefore has to be known.
+      shaped = tf.reshape(prev_layer, [-1, height_in * width_in * depth_in])
+    logits = slim.fully_connected(shaped, num_classes, activation_fn=None)
+    if out_func == 'l':
+      raise ValueError('Logistic not yet supported!')
+    else:
+      output = tf.nn.softmax(logits)
+    # Reshape to the desired output.
+    if out_dims == 2:
+      output_shape = [batch_in, height_in, width_in, num_classes]
+    elif out_dims == 1:
+      output_shape = [batch_in, height_in * width_in, num_classes]
+    else:
+      output_shape = [batch_in, num_classes]
+    output = tf.reshape(output, output_shape, name='Output')
+    logits = tf.reshape(logits, output_shape)
+    return logits, output
+
+  def _AddLossFunction(self, logits, height_in, out_dims, out_func):
+    """Add the appropriate loss function.
+
+    Args:
+      logits:  Pre-softmax/logistic fully-connected output shaped to out_dims.
+      height_in:  Height of logits before going into the softmax layer.
+      out_dims:   Number of output dimensions, 0, 1 or 2.
+      out_func:   Output non-linearity. 's' or 'c'=softmax, 'l'=logistic.
+
+    Returns:
+      loss: That which is to be minimized.
+
+    Raises:
+      ValueError: if logistic is used.
+    """
+    if out_func == 'c':
+      # Transpose batch to the middle.
+      ctc_input = tf.transpose(logits, [1, 0, 2])
+      # Compute the widths of each batch element from the input widths.
+      widths = self.layers.GetLengths(dim=2, factor=height_in)
+      cross_entropy = tf.nn.ctc_loss(ctc_input, self.sparse_labels, widths)
+    elif out_func == 's':
+      if out_dims == 2:
+        self.labels = _PadLabels3d(logits, self.labels)
+      elif out_dims == 1:
+        self.labels = _PadLabels2d(
+            shapes.tensor_dim(
+                logits, dim=1), self.labels)
+      else:
+        self.labels = tf.slice(self.labels, [0, 0], [-1, 1])
+        self.labels = tf.reshape(self.labels, [-1])
+      cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
+          logits, self.labels, name='xent')
+    else:
+      # TODO(rays) Labels need an extra dimension for logistic, so different
+      # padding functions are needed, as well as a different loss function.
+      raise ValueError('Logistic not yet supported!')
+    return tf.reduce_sum(cross_entropy)
+
+  def _AddOptimizer(self, optimizer_type):
+    """Adds an optimizer with learning rate decay to minimize self.loss.
+
+    Args:
+      optimizer_type: One of 'GradientDescent', 'AdaGrad', 'Momentum', 'Adam'.
+    Raises:
+      ValueError: if the optimizer type is unrecognized.
+    """
+    learn_rate_delta = self.initial_learning_rate - self.final_learning_rate
+    learn_rate_dec = tf.add(
+        tf.train.exponential_decay(learn_rate_delta, self.global_step,
+                                   self.decay_steps, self.decay_rate),
+        self.final_learning_rate)
+    if optimizer_type == 'GradientDescent':
+      opt = tf.train.GradientDescentOptimizer(learn_rate_dec)
+    elif optimizer_type == 'AdaGrad':
+      opt = tf.train.AdagradOptimizer(learn_rate_dec)
+    elif optimizer_type == 'Momentum':
+      opt = tf.train.MomentumOptimizer(learn_rate_dec, momentum=0.9)
+    elif optimizer_type == 'Adam':
+      opt = tf.train.AdamOptimizer(learning_rate=learn_rate_dec)
+    else:
+      raise ValueError('Invalid optimizer type: ' + optimizer_type)
+    tf.scalar_summary('learn_rate', learn_rate_dec, name='lr_summ')
+
+    self.train_op = opt.minimize(
+        self.loss, global_step=self.global_step, name='train')
+
+
+def _PadLabels3d(logits, labels):
+  """Pads or slices 3-d labels to match logits.
+
+  Covers the case of 2-d softmax output, when labels is [batch, height, width]
+  and logits is [batch, height, width, onehot].
+  Args:
+    logits: 4-d Pre-softmax fully-connected output.
+    labels: 3-d, but not necessarily matching in size.
+
+  Returns:
+    labels: Resized by padding or clipping to match logits.
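+
+  For example (mirroring the unit tests), logits of shape [1, 3, 4, 42] and
+  labels of shape [1, 2, 3] yield labels of shape [1, 3, 4], with rows and
+  columns kept intact and zero-padded.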
+  """
+  logits_shape = shapes.tensor_shape(logits)
+  labels_shape = shapes.tensor_shape(labels)
+  labels = tf.reshape(labels, [-1, labels_shape[2]])
+  labels = _PadLabels2d(logits_shape[2], labels)
+  labels = tf.reshape(labels, [labels_shape[0], -1])
+  labels = _PadLabels2d(logits_shape[1] * logits_shape[2], labels)
+  return tf.reshape(labels, [labels_shape[0], logits_shape[1], logits_shape[2]])
+
+
+def _PadLabels2d(logits_size, labels):
+  """Pads or slices the 2nd dimension of 2-d labels to match logits_size.
+
+  Covers the case of 1-d softmax output, when labels is [batch, seq] and
+  logits is [batch, seq, onehot].
+  Args:
+    logits_size: Tensor returned from tf.shape giving the target size.
+    labels:      2-d, but not necessarily matching in size.
+
+  Returns:
+    labels: Resized by padding or clipping the last dimension to logits_size.
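+
+  For example (mirroring the unit tests), with logits_size 97, labels of
+  shape [4, 85] are padded to [4, 97], while labels of shape [4, 100] are
+  sliced to [4, 97].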
+  """
+  pad = logits_size - tf.shape(labels)[1]
+
+  def _PadFn():
+    return tf.pad(labels, [[0, 0], [0, pad]])
+
+  def _SliceFn():
+    return tf.slice(labels, [0, 0], [-1, logits_size])
+
+  return tf.cond(tf.greater(pad, 0), _PadFn, _SliceFn)
+
+
+def _ParseInputSpec(input_spec):
+  """Parses input_spec and returns the numbers obtained therefrom.
+
+  Args:
+    input_spec:  Specification of the input layer. See Build.
+
+  Returns:
+    shape:      ImageShape with the desired shape of the input.
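+
+  For example, '32,42,256,3' yields ImageShape(32, 42, 256, 3), and '1,0,0,3'
+  yields ImageShape(1, None, None, 3), since zero sizes become None.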
+
+  Raises:
+    ValueError: if syntax is incorrect.
+  """
+  pattern = re.compile(R'(\d+),(\d+),(\d+),(\d+)')
+  m = pattern.match(input_spec)
+  if m is None:
+    raise ValueError('Failed to parse input spec:' + input_spec)
+  batch_size = int(m.group(1))
+  y_size = int(m.group(2)) if int(m.group(2)) > 0 else None
+  x_size = int(m.group(3)) if int(m.group(3)) > 0 else None
+  depth = int(m.group(4))
+  if depth not in [1, 3]:
+    raise ValueError('Depth must be 1 or 3, had: ' + str(depth))
+  return vgsl_input.ImageShape(batch_size, y_size, x_size, depth)
+
+
+def _ParseOutputSpec(output_spec):
+  """Parses the output spec.
+
+  Args:
+    output_spec: Output layer definition. See Build.
+
+  Returns:
+    out_dims:     2|1|0 for 2-d, 1-d, 0-d.
+    out_func:     l|s|c for logistic, softmax, softmax+CTC
+    num_classes:  Number of classes in output.
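+
+  For example, 'O1c142' yields (1, 'c', 142): a 1-d CTC output over 142
+  classes.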
+
+  Raises:
+    ValueError: if syntax is incorrect.
+  """
+  pattern = re.compile(R'(O)(0|1|2)(l|s|c)(\d+)')
+  m = pattern.match(output_spec)
+  if m is None:
+    raise ValueError('Failed to parse output spec:' + output_spec)
+  out_dims = int(m.group(2))
+  out_func = m.group(3)
+  if out_func == 'c' and out_dims != 1:
+    raise ValueError('CTC can only be used with a 1-D sequence!')
+  num_classes = int(m.group(4))
+  return out_dims, out_func, num_classes
+
+
+def _AddRateToSummary(tag, rate, step, sw):
+  """Adds the given rate to the summary with the given tag.
+
+  Args:
+    tag:   Name for this value.
+    rate:  Value to add to the summary. Perhaps an error rate.
+    step:  Global step of the graph for the x-coordinate of the summary.
+    sw:    Summary writer to which to write the rate value.
+  """
+  sw.add_summary(
+      summary_pb2.Summary(value=[summary_pb2.Summary.Value(
+          tag=tag, simple_value=rate)]), step)

+ 248 - 0
street/python/vgsl_model_test.py

@@ -0,0 +1,248 @@
+# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Tests for vgsl_model."""
+import os
+
+import numpy as np
+import tensorflow as tf
+import vgsl_input
+import vgsl_model
+
+
+def _testdata(filename):
+  return os.path.join('../testdata/', filename)
+
+
+def _rand(*size):
+  return np.random.uniform(size=size).astype('f')
+
+
+class VgslModelTest(tf.test.TestCase):
+
+  def testParseInputSpec(self):
+    """The parser must return the numbers in the correct order.
+    """
+    shape = vgsl_model._ParseInputSpec(input_spec='32,42,256,3')
+    self.assertEqual(
+        shape,
+        vgsl_input.ImageShape(
+            batch_size=32, height=42, width=256, depth=3))
+    # Nones must be inserted for zero sizes.
+    shape = vgsl_model._ParseInputSpec(input_spec='1,0,0,3')
+    self.assertEqual(
+        shape,
+        vgsl_input.ImageShape(
+            batch_size=1, height=None, width=None, depth=3))
+
+  def testParseOutputSpec(self):
+    """The parser must return the correct args in the correct order.
+    """
+    out_dims, out_func, num_classes = vgsl_model._ParseOutputSpec(
+        output_spec='O1c142')
+    self.assertEqual(out_dims, 1)
+    self.assertEqual(out_func, 'c')
+    self.assertEqual(num_classes, 142)
+    out_dims, out_func, num_classes = vgsl_model._ParseOutputSpec(
+        output_spec='O2s99')
+    self.assertEqual(out_dims, 2)
+    self.assertEqual(out_func, 's')
+    self.assertEqual(num_classes, 99)
+    out_dims, out_func, num_classes = vgsl_model._ParseOutputSpec(
+        output_spec='O0l12')
+    self.assertEqual(out_dims, 0)
+    self.assertEqual(out_func, 'l')
+    self.assertEqual(num_classes, 12)
+
+  def testPadLabels2d(self):
+    """Must pad timesteps in labels to match logits.
+    """
+    with self.test_session() as sess:
+      # Make placeholders for logits and labels.
+      ph_logits = tf.placeholder(tf.float32, shape=(None, None, 42))
+      ph_labels = tf.placeholder(tf.int64, shape=(None, None))
+      padded_labels = vgsl_model._PadLabels2d(tf.shape(ph_logits)[1], ph_labels)
+      # Make actual inputs.
+      real_logits = _rand(4, 97, 42)
+      real_labels = _rand(4, 85)
+      np_array = sess.run([padded_labels],
+                          feed_dict={ph_logits: real_logits,
+                                     ph_labels: real_labels})[0]
+      self.assertEqual(tuple(np_array.shape), (4, 97))
+      real_labels = _rand(4, 97)
+      np_array = sess.run([padded_labels],
+                          feed_dict={ph_logits: real_logits,
+                                     ph_labels: real_labels})[0]
+      self.assertEqual(tuple(np_array.shape), (4, 97))
+      real_labels = _rand(4, 100)
+      np_array = sess.run([padded_labels],
+                          feed_dict={ph_logits: real_logits,
+                                     ph_labels: real_labels})[0]
+      self.assertEqual(tuple(np_array.shape), (4, 97))
+
+  def testPadLabels3d(self):
+    """Must pad height and width in labels to match logits.
+
+    The tricky thing with 3-d is that the rows and columns need to remain
+    intact, so we'll test it with small known data.
+    """
+    with self.test_session() as sess:
+      # Make placeholders for logits and labels.
+      ph_logits = tf.placeholder(tf.float32, shape=(None, None, None, 42))
+      ph_labels = tf.placeholder(tf.int64, shape=(None, None, None))
+      padded_labels = vgsl_model._PadLabels3d(ph_logits, ph_labels)
+      # Make actual inputs.
+      real_logits = _rand(1, 3, 4, 42)
+      # Test all 9 combinations of height x width in [small, ok, big]
+      real_labels = np.arange(6).reshape((1, 2, 3))  # Height small, width small
+      np_array = sess.run([padded_labels],
+                          feed_dict={ph_logits: real_logits,
+                                     ph_labels: real_labels})[0]
+      self.assertEqual(tuple(np_array.shape), (1, 3, 4))
+      self.assertAllEqual(np_array[0, :, :],
+                          [[0, 1, 2, 0], [3, 4, 5, 0], [0, 0, 0, 0]])
+      real_labels = np.arange(8).reshape((1, 2, 4))  # Height small, width ok
+      np_array = sess.run([padded_labels],
+                          feed_dict={ph_logits: real_logits,
+                                     ph_labels: real_labels})[0]
+      self.assertEqual(tuple(np_array.shape), (1, 3, 4))
+      self.assertAllEqual(np_array[0, :, :],
+                          [[0, 1, 2, 3], [4, 5, 6, 7], [0, 0, 0, 0]])
+      real_labels = np.arange(10).reshape((1, 2, 5))  # Height small, width big
+      np_array = sess.run([padded_labels],
+                          feed_dict={ph_logits: real_logits,
+                                     ph_labels: real_labels})[0]
+      self.assertEqual(tuple(np_array.shape), (1, 3, 4))
+      self.assertAllEqual(np_array[0, :, :],
+                          [[0, 1, 2, 3], [5, 6, 7, 8], [0, 0, 0, 0]])
+      real_labels = np.arange(9).reshape((1, 3, 3))  # Height ok, width small
+      np_array = sess.run([padded_labels],
+                          feed_dict={ph_logits: real_logits,
+                                     ph_labels: real_labels})[0]
+      self.assertEqual(tuple(np_array.shape), (1, 3, 4))
+      self.assertAllEqual(np_array[0, :, :],
+                          [[0, 1, 2, 0], [3, 4, 5, 0], [6, 7, 8, 0]])
+      real_labels = np.arange(12).reshape((1, 3, 4))  # Height ok, width ok
+      np_array = sess.run([padded_labels],
+                          feed_dict={ph_logits: real_logits,
+                                     ph_labels: real_labels})[0]
+      self.assertEqual(tuple(np_array.shape), (1, 3, 4))
+      self.assertAllEqual(np_array[0, :, :],
+                          [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]])
+      real_labels = np.arange(15).reshape((1, 3, 5))  # Height ok, width big
+      np_array = sess.run([padded_labels],
+                          feed_dict={ph_logits: real_logits,
+                                     ph_labels: real_labels})[0]
+      self.assertEqual(tuple(np_array.shape), (1, 3, 4))
+      self.assertAllEqual(np_array[0, :, :],
+                          [[0, 1, 2, 3], [5, 6, 7, 8], [10, 11, 12, 13]])
+      real_labels = np.arange(12).reshape((1, 4, 3))  # Height big, width small
+      np_array = sess.run([padded_labels],
+                          feed_dict={ph_logits: real_logits,
+                                     ph_labels: real_labels})[0]
+      self.assertEqual(tuple(np_array.shape), (1, 3, 4))
+      self.assertAllEqual(np_array[0, :, :],
+                          [[0, 1, 2, 0], [3, 4, 5, 0], [6, 7, 8, 0]])
+      real_labels = np.arange(16).reshape((1, 4, 4))  # Height big, width ok
+      np_array = sess.run([padded_labels],
+                          feed_dict={ph_logits: real_logits,
+                                     ph_labels: real_labels})[0]
+      self.assertEqual(tuple(np_array.shape), (1, 3, 4))
+      self.assertAllEqual(np_array[0, :, :],
+                          [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]])
+      real_labels = np.arange(20).reshape((1, 4, 5))  # Height big, width big
+      np_array = sess.run([padded_labels],
+                          feed_dict={ph_logits: real_logits,
+                                     ph_labels: real_labels})[0]
+      self.assertEqual(tuple(np_array.shape), (1, 3, 4))
+      self.assertAllEqual(np_array[0, :, :],
+                          [[0, 1, 2, 3], [5, 6, 7, 8], [10, 11, 12, 13]])
+
+  def testEndToEndSizes0d(self):
+    """Tests that the output sizes match when training/running real 0d data.
+
+    Uses mnist with dual summarizing LSTMs to reduce to a single value.
+    """
+    filename = _testdata('mnist-tiny')
+    with self.test_session() as sess:
+      model = vgsl_model.InitNetwork(
+          filename,
+          model_spec='4,0,0,1[Cr5,5,16 Mp3,3 Lfys16 Lfxs16]O0s12',
+          mode='train')
+      tf.initialize_all_variables().run(session=sess)
+      coord = tf.train.Coordinator()
+      tf.train.start_queue_runners(sess=sess, coord=coord)
+      _, step = model.TrainAStep(sess)
+      self.assertEqual(step, 1)
+      output, labels = model.RunAStep(sess)
+      self.assertEqual(len(output.shape), 2)
+      self.assertEqual(len(labels.shape), 1)
+      self.assertEqual(output.shape[0], labels.shape[0])
+      self.assertEqual(output.shape[1], 12)
+
+  # TODO(rays) Support logistic and test with Imagenet (as 0d, multi-object.)
+
+  def testEndToEndSizes1dCTC(self):
+    """Tests that the output sizes match when training with CTC.
+
+    Basic bidi LSTM on top of convolution and summarizing LSTM with CTC.
+    """
+    filename = _testdata('arial-32-tiny')
+    with self.test_session() as sess:
+      model = vgsl_model.InitNetwork(
+          filename,
+          model_spec='2,0,0,1[Cr5,5,16 Mp3,3 Lfys16 Lbx100]O1c105',
+          mode='train')
+      tf.initialize_all_variables().run(session=sess)
+      coord = tf.train.Coordinator()
+      tf.train.start_queue_runners(sess=sess, coord=coord)
+      _, step = model.TrainAStep(sess)
+      self.assertEqual(step, 1)
+      output, labels = model.RunAStep(sess)
+      self.assertEqual(len(output.shape), 3)
+      self.assertEqual(len(labels.shape), 2)
+      self.assertEqual(output.shape[0], labels.shape[0])
+      # This is ctc - the only cast-iron guarantee is labels <= output.
+      self.assertLessEqual(labels.shape[1], output.shape[1])
+      self.assertEqual(output.shape[2], 105)
+
+  def testEndToEndSizes1dFixed(self):
+    """Tests that the output sizes match when training/running 1 data.
+
+    Convolution, summarizing LSTM with fwd rev fwd to allow no CTC.
+    """
+    filename = _testdata('numbers-16-tiny')
+    with self.test_session() as sess:
+      model = vgsl_model.InitNetwork(
+          filename,
+          model_spec='8,0,0,1[Cr5,5,16 Mp3,3 Lfys16 Lfx64 Lrx64 Lfx64]O1s12',
+          mode='train')
+      tf.initialize_all_variables().run(session=sess)
+      coord = tf.train.Coordinator()
+      tf.train.start_queue_runners(sess=sess, coord=coord)
+      _, step = model.TrainAStep(sess)
+      self.assertEqual(step, 1)
+      output, labels = model.RunAStep(sess)
+      self.assertEqual(len(output.shape), 3)
+      self.assertEqual(len(labels.shape), 2)
+      self.assertEqual(output.shape[0], labels.shape[0])
+      # Not CTC, output lengths match.
+      self.assertEqual(output.shape[1], labels.shape[1])
+      self.assertEqual(output.shape[2], 12)
+
+  # TODO(rays) Get a 2-d dataset and support 2d (heat map) outputs.
+
+
+if __name__ == '__main__':
+  tf.test.main()

+ 55 - 0
street/python/vgsl_train.py

@@ -0,0 +1,55 @@
+# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Model trainer for single or multi-replica training."""
+from tensorflow import app
+from tensorflow.python.platform import flags
+
+import vgsl_model
+
+flags.DEFINE_string('master', '', 'Name of the TensorFlow master to use.')
+flags.DEFINE_string('train_dir', '/tmp/mdir',
+                    'Directory where to write event logs.')
+flags.DEFINE_string('model_str',
+                    '1,150,600,3[S2(4x150)0,2 Ct5,5,16 Mp2,2 Ct5,5,64 Mp3,3'
+                    '([Lrys64 Lbx128][Lbys64 Lbx128][Lfys64 Lbx128])S3(3x0)2,3'
+                    'Lfx128 Lrx128 S0(1x4)0,3 Do Lfx256]O1c134',
+                    'Network description.')
+flags.DEFINE_integer('max_steps', 10000, 'Number of steps to train for.')
+flags.DEFINE_integer('task', 0, 'Task id of the replica running the training.')
+flags.DEFINE_integer('ps_tasks', 0, 'Number of tasks in the ps job. '
+                     'If 0, no ps job is used.')
+flags.DEFINE_string('train_data', None, 'Training data filepattern')
+flags.DEFINE_float('initial_learning_rate', 0.00002, 'Initial learning rate')
+flags.DEFINE_float('final_learning_rate', 0.00002, 'Final learning rate')
+flags.DEFINE_integer('learning_rate_halflife', 1600000,
+                     'Halflife of learning rate')
+flags.DEFINE_string('optimizer_type', 'Adam',
+                    'Optimizer from: GradientDescent, AdaGrad, Momentum, Adam')
+flags.DEFINE_integer('num_preprocess_threads', 4, 'Number of input threads')
+
+FLAGS = flags.FLAGS
+
+
+def main(argv):
+  del argv
+  vgsl_model.Train(FLAGS.train_dir, FLAGS.model_str, FLAGS.train_data,
+                   FLAGS.max_steps, FLAGS.master, FLAGS.task, FLAGS.ps_tasks,
+                   FLAGS.initial_learning_rate, FLAGS.final_learning_rate,
+                   FLAGS.learning_rate_halflife, FLAGS.optimizer_type,
+                   FLAGS.num_preprocess_threads)
+
+
+if __name__ == '__main__':
+  app.run()

+ 533 - 0
street/python/vgslspecs.py

@@ -0,0 +1,533 @@
+# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+
+"""String network description language mapping to TF-Slim calls where possible.
+
+See vgslspecs.md for a detailed description.
+"""
+
+import re
+from string import maketrans
+
+import nn_ops
+import shapes
+import tensorflow as tf
+import tensorflow.contrib.slim as slim
+
+
+# Class that builds a set of ops to manipulate variable-sized images.
+class VGSLSpecs(object):
+  """Layers that can be built from a string definition."""
+
+  def __init__(self, widths, heights, is_training):
+    """Constructs a VGSLSpecs.
+
+    Args:
+      widths:  Tensor of size batch_size of the widths of the inputs.
+      heights: Tensor of size batch_size of the heights of the inputs.
+      is_training: True if the graph should be built for training.
+    """
+    # The string that was used to build this model.
+    self.model_str = None
+    # True if we are training
+    self.is_training = is_training
+    # Tensor for the size of the images, of size batch_size.
+    self.widths = widths
+    self.heights = heights
+    # Overall reduction factors of this model so far for each dimension.
+    # TODO(rays) consider building a graph from widths and heights instead of
+    # computing a scale factor.
+    self.reduction_factors = [1.0, 1.0, 1.0, 1.0]
+    # List of Op parsers.
+    # TODO(rays) add more Op types as needed.
+    self.valid_ops = [self.AddSeries, self.AddParallel, self.AddConvLayer,
+                      self.AddMaxPool, self.AddDropout, self.AddReShape,
+                      self.AddFCLayer, self.AddLSTMLayer]
+    # Translation table to convert characters that may occur in op strings
+    # but are not acceptable in op names.
+    self.transtab = maketrans('(,)', '___')
+
+  def Build(self, prev_layer, model_str):
+    """Builds a network with input prev_layer from a VGSLSpecs description.
+
+    Args:
+      prev_layer: The input tensor.
+      model_str:  Model definition similar to Tesseract as follows:
+        ============ FUNCTIONAL OPS ============
+        C(s|t|r|l|m)[{name}]<y>,<x>,<d> Convolves using a y,x window, with no
+          shrinkage, SAME infill, d outputs, with s|t|r|l|m non-linear layer.
+          (s|t|r|l|m) specifies the type of non-linearity:
+          s = sigmoid
+          t = tanh
+          r = relu
+          l = linear (i.e., None)
+          m = softmax
+        F(s|t|r|l|m)[{name}]<d> Fully-connected with s|t|r|l|m non-linearity and
+          d outputs. Reduces height, width to 1. Input height and width must be
+          constant.
+        L(f|r|b)(x|y)[s][{name}]<n> LSTM cell with n outputs.
+          f runs the LSTM forward only.
+          r runs the LSTM reversed only.
+          b runs the LSTM bidirectionally.
+          x runs the LSTM in the x-dimension (on data with or without the
+             y-dimension).
+          y runs the LSTM in the y-dimension (data must have a y dimension).
+          s (optional) summarizes the output in the requested dimension,
+             outputting only the final step, collapsing the dimension to a
+             single element.
+          Examples:
+          Lfx128 runs a forward-only LSTM in the x-dimension with 128
+                 outputs, treating any y dimension independently.
+          Lfys64 runs a forward-only LSTM in the y-dimension with 64 outputs
+                 and collapses the y-dimension to 1 element.
+          NOTE that Lbxsn is implemented as (LfxsnLrxsn) since the summaries
+          need to be taken from opposite ends of the output.
+        Do[{name}] Insert a dropout layer.
+        ============ PLUMBING OPS ============
+        [...] Execute ... networks in series (layers).
+        (...) Execute ... networks in parallel, with their output concatenated
+          in depth.
+        S[{name}]<d>(<a>x<b>)<e>,<f> Splits one dimension, moves one part to
+          another dimension.
+          Splits input dimension d into a x b, sending the high part (a) to the
+          high side of dimension e, and the low part (b) to the high side of
+          dimension f. Exception: if d=e=f, then then dimension d is internally
+          transposed to bxa.
+          Either a or b can be zero, meaning whatever is left after taking out
+          the other, allowing dimensions to be of variable size.
+          Eg. S3(3x50)2,3 will split the 150-element depth into 3x50, with the 3
+          going to the most significant part of the width, and the 50 part
+          staying in depth.
+          This will rearrange a 3x50 output parallel operation to spread the 3
+          output sets over width.
+        Mp[{name}]<y>,<x>[,<y_stride>,<x_stride>] Maxpool the input, reducing
+          each (y,x) rectangle to a single value. Strides default to the
+          window size.
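+
+        For example, '[Cr5,5,16 Mp3,3 Lfys16 Lbx100]' (used in the unit tests)
+        builds a 5x5 relu convolution with 16 outputs, a 3x3 maxpool, a
+        summarizing y-LSTM that collapses the height to 1, and a bidirectional
+        x-LSTM with 100 outputs per direction.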
+
+    Returns:
+      Output tensor
+    """
+    self.model_str = model_str
+    final_layer, _ = self.BuildFromString(prev_layer, 0)
+    return final_layer
+
+  def GetLengths(self, dim=2, factor=1):
+    """Returns the lengths of the batch of elements in the given dimension.
+
+    WARNING: The returned sizes may not exactly match TF's calculation.
+    Args:
+      dim: Dimension to get the sizes of: 1 (height) or 2 (width) only.
+      factor: A scalar value to multiply by.
+
+    Returns:
+      The original heights/widths scaled by the current scaling of the model and
+      the given factor.
+
+    Raises:
+      ValueError: If the args are invalid.
+    """
+    if dim == 1:
+      lengths = self.heights
+    elif dim == 2:
+      lengths = self.widths
+    else:
+      raise ValueError('Invalid dimension given to GetLengths')
+    lengths = tf.cast(lengths, tf.float32)
+    if self.reduction_factors[dim] is not None:
+      lengths = tf.div(lengths, self.reduction_factors[dim])
+    else:
+      lengths = tf.ones_like(lengths)
+    if factor != 1:
+      lengths = tf.mul(lengths, tf.cast(factor, tf.float32))
+    return tf.cast(lengths, tf.int32)
+
+  def BuildFromString(self, prev_layer, index):
+    """Adds the layers defined by model_str[index:] to the model.
+
+    Args:
+      prev_layer: Input tensor.
+      index:      Position in model_str to start parsing
+
+    Returns:
+      Output tensor, next model_str index.
+
+    Raises:
+      ValueError: If the model string is unrecognized.
+    """
+    index = self._SkipWhitespace(index)
+    for op in self.valid_ops:
+      output_layer, next_index = op(prev_layer, index)
+      if output_layer is not None:
+        return output_layer, next_index
+    raise ValueError('Unrecognized model string:' + self.model_str[index:])
+
+  def AddSeries(self, prev_layer, index):
+    """Builds a sequence of layers for a VGSLSpecs model.
+
+    Args:
+      prev_layer: Input tensor.
+      index:      Position in model_str to start parsing
+
+    Returns:
+      Output tensor of the series, end index in model_str.
+
+    Raises:
+      ValueError: If [] are unbalanced.
+    """
+    if self.model_str[index] != '[':
+      return None, None
+    index += 1
+    while index < len(self.model_str) and self.model_str[index] != ']':
+      prev_layer, index = self.BuildFromString(prev_layer, index)
+    if index == len(self.model_str):
+      raise ValueError('Missing ] at end of series!' + self.model_str)
+    return prev_layer, index + 1
+
+  def AddParallel(self, prev_layer, index):
+    """tf.concats outputs of layers that run on the same inputs.
+
+    Args:
+      prev_layer: Input tensor.
+      index:      Position in model_str to start parsing
+
+    Returns:
+      Output tensor of the parallel,  end index in model_str.
+
+    Raises:
+      ValueError: If () are unbalanced or the elements don't match.
+    """
+    if self.model_str[index] != '(':
+      return None, None
+    index += 1
+    layers = []
+    num_dims = 0
+    # Each parallel must output the same, including any reduction factor, in
+    # all dimensions except depth.
+    # We have to save the starting factors, so they don't get reduced by all
+    # the elements of the parallel, only once.
+    original_factors = self.reduction_factors
+    final_factors = None
+    while index < len(self.model_str) and self.model_str[index] != ')':
+      self.reduction_factors = original_factors
+      layer, index = self.BuildFromString(prev_layer, index)
+      if num_dims == 0:
+        num_dims = len(layer.get_shape())
+      elif num_dims != len(layer.get_shape()):
+        raise ValueError('All elements of parallel must return same num dims')
+      layers.append(layer)
+      if final_factors:
+        if final_factors != self.reduction_factors:
+          raise ValueError('All elements of parallel must scale the same')
+      else:
+        final_factors = self.reduction_factors
+    if index == len(self.model_str):
+      raise ValueError('Missing ) at end of parallel!' + self.model_str)
+    return tf.concat(num_dims - 1, layers), index + 1
+
+  def AddConvLayer(self, prev_layer, index):
+    """Add a single standard convolutional layer.
+
+    Args:
+      prev_layer: Input tensor.
+      index:      Position in model_str to start parsing
+
+    Returns:
+      Output tensor, end index in model_str.
+    """
+    pattern = re.compile(R'(C)(s|t|r|l|m)({\w+})?(\d+),(\d+),(\d+)')
+    m = pattern.match(self.model_str, index)
+    if m is None:
+      return None, None
+    name = self._GetLayerName(m.group(0), index, m.group(3))
+    width = int(m.group(4))
+    height = int(m.group(5))
+    depth = int(m.group(6))
+    fn = self._NonLinearity(m.group(2))
+    return slim.conv2d(
+        prev_layer, depth, [height, width], activation_fn=fn,
+        scope=name), m.end()
+
+  def AddMaxPool(self, prev_layer, index):
+    """Add a maxpool layer.
+
+    Args:
+      prev_layer: Input tensor.
+      index:      Position in model_str to start parsing
+
+    Returns:
+      Output tensor, end index in model_str.
+    """
+    pattern = re.compile(R'(Mp)({\w+})?(\d+),(\d+)(?:,(\d+),(\d+))?')
+    m = pattern.match(self.model_str, index)
+    if m is None:
+      return None, None
+    name = self._GetLayerName(m.group(0), index, m.group(2))
+    height = int(m.group(3))
+    width = int(m.group(4))
+    y_stride = height if m.group(5) is None else int(m.group(5))
+    x_stride = width if m.group(6) is None else int(m.group(6))
+    self.reduction_factors[1] *= y_stride
+    self.reduction_factors[2] *= x_stride
+    return slim.max_pool2d(
+        prev_layer, [height, width], [y_stride, x_stride],
+        padding='SAME',
+        scope=name), m.end()
+
+  def AddDropout(self, prev_layer, index):
+    """Adds a dropout layer.
+
+    Args:
+      prev_layer: Input tensor.
+      index:      Position in model_str to start parsing
+
+    Returns:
+      Output tensor, end index in model_str.
+    """
+    pattern = re.compile(R'(Do)({\w+})?')
+    m = pattern.match(self.model_str, index)
+    if m is None:
+      return None, None
+    name = self._GetLayerName(m.group(0), index, m.group(2))
+    layer = slim.dropout(
+        prev_layer, 0.5, is_training=self.is_training, scope=name)
+    return layer, m.end()
+
+  def AddReShape(self, prev_layer, index):
+    """Reshapes the input tensor by moving each (x_scale,y_scale) rectangle to.
+
+       the depth dimension. NOTE that the TF convention is that inputs are
+       [batch, y, x, depth].
+
+    Args:
+      prev_layer: Input tensor.
+      index:      Position in model_str to start parsing
+
+    Returns:
+      Output tensor, end index in model_str.
+    """
+    pattern = re.compile(R'(S)(?:{(\w)})?(\d+)\((\d+)x(\d+)\)(\d+),(\d+)')
+    m = pattern.match(self.model_str, index)
+    if m is None:
+      return None, None
+    name = self._GetLayerName(m.group(0), index, m.group(2))
+    src_dim = int(m.group(3))
+    part_a = int(m.group(4))
+    part_b = int(m.group(5))
+    dest_dim_a = int(m.group(6))
+    dest_dim_b = int(m.group(7))
+    if part_a == 0:
+      part_a = -1
+    if part_b == 0:
+      part_b = -1
+    prev_shape = tf.shape(prev_layer)
+    layer = shapes.transposing_reshape(
+        prev_layer, src_dim, part_a, part_b, dest_dim_a, dest_dim_b, name=name)
+    # Compute scale factors.
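+    # Each factor is multiplied by the ratio of the previous dimension size
+    # to the new one, so GetLengths() keeps mapping the original image sizes
+    # to post-reshape sizes.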
+    result_shape = tf.shape(layer)
+    for i in xrange(len(self.reduction_factors)):
+      if self.reduction_factors[i] is not None:
+        factor1 = tf.cast(self.reduction_factors[i], tf.float32)
+        factor2 = tf.cast(prev_shape[i], tf.float32)
+        divisor = tf.cast(result_shape[i], tf.float32)
+        self.reduction_factors[i] = tf.div(tf.mul(factor1, factor2), divisor)
+    return layer, m.end()
+
+  def AddFCLayer(self, prev_layer, index):
+    """Parse expression and add Fully Connected Layer.
+
+    Args:
+      prev_layer: Input tensor.
+      index:      Position in model_str to start parsing
+
+    Returns:
+      Output tensor, end index in model_str.
+    """
+    pattern = re.compile(R'(F)(s|t|r|l|m)({\w+})?(\d+)')
+    m = pattern.match(self.model_str, index)
+    if m is None:
+      return None, None
+    fn = self._NonLinearity(m.group(2))
+    name = self._GetLayerName(m.group(0), index, m.group(3))
+    depth = int(m.group(4))
+    input_depth = shapes.tensor_dim(prev_layer, 1) * shapes.tensor_dim(
+        prev_layer, 2) * shapes.tensor_dim(prev_layer, 3)
+    # The slim fully connected is actually a 1x1 conv, so we have to crush the
+    # dimensions on input.
+    # Everything except batch goes to depth, and therefore has to be known.
+    shaped = tf.reshape(
+        prev_layer, [-1, input_depth], name=name + '_reshape_in')
+    output = slim.fully_connected(shaped, depth, activation_fn=fn, scope=name)
+    # Width and height are collapsed to 1.
+    self.reduction_factors[1] = None
+    self.reduction_factors[2] = None
+    return tf.reshape(
+        output, [shapes.tensor_dim(prev_layer, 0), 1, 1, depth],
+        name=name + '_reshape_out'), m.end()
+
+  def AddLSTMLayer(self, prev_layer, index):
+    """Parse expression and add LSTM Layer.
+
+    Args:
+      prev_layer: Input tensor.
+      index:      Position in model_str to start parsing
+
+    Returns:
+      Output tensor, end index in model_str.
+    """
+    pattern = re.compile(R'(L)(f|r|b)(x|y)(s)?({\w+})?(\d+)')
+    m = pattern.match(self.model_str, index)
+    if m is None:
+      return None, None
+    direction = m.group(2)
+    dim = m.group(3)
+    summarize = m.group(4) == 's'
+    name = self._GetLayerName(m.group(0), index, m.group(5))
+    depth = int(m.group(6))
+    if direction == 'b' and summarize:
+      fwd = self._LSTMLayer(prev_layer, 'forward', dim, True, depth,
+                            name + '_forward')
+      back = self._LSTMLayer(prev_layer, 'backward', dim, True, depth,
+                             name + '_reverse')
+      return tf.concat(3, [fwd, back], name=name + '_concat'), m.end()
+    if direction == 'f':
+      direction = 'forward'
+    elif direction == 'r':
+      direction = 'backward'
+    else:
+      direction = 'bidirectional'
+    outputs = self._LSTMLayer(prev_layer, direction, dim, summarize, depth,
+                              name)
+    if summarize:
+      # The x or y dimension is getting collapsed.
+      if dim == 'x':
+        self.reduction_factors[2] = None
+      else:
+        self.reduction_factors[1] = None
+    return outputs, m.end()
+
+  def _LSTMLayer(self, prev_layer, direction, dim, summarize, depth, name):
+    """Adds an LSTM layer with the given pre-parsed attributes.
+
+    Always maps 4-D to 4-D regardless of summarize.
+    Args:
+      prev_layer: Input tensor.
+      direction:  'forward' 'backward' or 'bidirectional'
+      dim:        'x' or 'y', dimension to consider as time.
+      summarize:  True if we are to return only the last timestep.
+      depth:      Output depth.
+      name:       Some string naming the op.
+
+    Returns:
+      Output tensor.
+    """
+    # If the target dimension is y, we need to transpose.
+    if dim == 'x':
+      lengths = self.GetLengths(2, 1)
+      inputs = prev_layer
+    else:
+      lengths = self.GetLengths(1, 1)
+      inputs = tf.transpose(prev_layer, [0, 2, 1, 3], name=name + '_ytrans_in')
+    input_batch = shapes.tensor_dim(inputs, 0)
+    num_slices = shapes.tensor_dim(inputs, 1)
+    num_steps = shapes.tensor_dim(inputs, 2)
+    input_depth = shapes.tensor_dim(inputs, 3)
+    # Reshape away the other dimension.
+    inputs = tf.reshape(
+        inputs, [-1, num_steps, input_depth], name=name + '_reshape_in')
+    # We need to replicate the lengths by the size of the other dimension, and
+    # any changes that have been made to the batch dimension.
+    tile_factor = tf.to_float(input_batch *
+                              num_slices) / tf.to_float(tf.shape(lengths)[0])
+    lengths = tf.tile(lengths, [tf.cast(tile_factor, tf.int32)])
+    lengths = tf.cast(lengths, tf.int64)
+    outputs = nn_ops.rnn_helper(
+        inputs,
+        lengths,
+        cell_type='lstm',
+        num_nodes=depth,
+        direction=direction,
+        name=name,
+        stddev=0.1)
+    # Output depth is doubled if bi-directional.
+    if direction == 'bidirectional':
+      output_depth = depth * 2
+    else:
+      output_depth = depth
+    # Restore the other dimension.
+    if summarize:
+      outputs = tf.slice(
+          outputs, [0, num_steps - 1, 0], [-1, 1, -1], name=name + '_sum_slice')
+      outputs = tf.reshape(
+          outputs, [input_batch, num_slices, 1, output_depth],
+          name=name + '_reshape_out')
+    else:
+      outputs = tf.reshape(
+          outputs, [input_batch, num_slices, num_steps, output_depth],
+          name=name + '_reshape_out')
+    if dim == 'y':
+      outputs = tf.transpose(outputs, [0, 2, 1, 3], name=name + '_ytrans_out')
+    return outputs
+
+  def _NonLinearity(self, code):
+    """Returns the non-linearity function pointer for the given string code.
+
+    For forwards compatibility, allows the full names for stand-alone
+    non-linearities, as well as the single-letter names used in ops like C,F.
+    Args:
+      code: String code representing a non-linearity function.
+    Returns:
+      non-linearity function represented by the code.
+    """
+    if code in ['s', 'Sig']:
+      return tf.sigmoid
+    elif code in ['t', 'Tanh']:
+      return tf.tanh
+    elif code in ['r', 'Relu']:
+      return tf.nn.relu
+    elif code in ['m', 'Smax']:
+      return tf.nn.softmax
+    return None
+
+  def _GetLayerName(self, op_str, index, name_str):
+    """Generates a name for the op, using a user-supplied name if possible.
+
+    Args:
+      op_str:     String representing the parsed op.
+      index:      Position in model_str of the start of the op.
+      name_str:   User-supplied {name}, braces included, or None.
+
+    Returns:
+      Selected name.
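+
+    For example, op_str 'Mp3,3' at index 10 with no user-supplied name yields
+    'Mp3_3_10'.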
+    """
+    if name_str:
+      return name_str[1:-1]
+    else:
+      return op_str.translate(self.transtab) + '_' + str(index)
+
+  def _SkipWhitespace(self, index):
+    """Skips any leading whitespace in the model description.
+
+    Args:
+      index:      Position in model_str to start parsing
+
+    Returns:
+      Index of the first non-whitespace character at or after index.
+    """
+    pattern = re.compile(R'([ \t\n]+)')
+    m = pattern.match(self.model_str, index)
+    if m is None:
+      return index
+    return m.end()

+ 122 - 0
street/python/vgslspecs_test.py

@@ -0,0 +1,122 @@
+# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Tests for vgslspecs."""
+
+import numpy as np
+import tensorflow as tf
+import vgslspecs
+
+
+def _rand(*size):
+  return np.random.uniform(size=size).astype('f')
+
+
+class VgslspecsTest(tf.test.TestCase):
+
+  def __init__(self, other):
+    super(VgslspecsTest, self).__init__(other)
+    self.max_width = 36
+    self.max_height = 24
+    self.batch_size = 4
+
+  def SetupInputs(self):
+    # Make placeholders for standard inputs.
+    # Everything is variable in the input, except the depth.
+    self.ph_image = tf.placeholder(
+        tf.float32, shape=(None, None, None, 3), name='inputs')
+    self.ph_widths = tf.placeholder(tf.int64, shape=(None,), name='w')
+    self.ph_heights = tf.placeholder(tf.int64, shape=(None,), name='h')
+    # Make actual inputs.
+    self.in_image = _rand(self.batch_size, self.max_height, self.max_width, 3)
+    self.in_widths = [24, 12, self.max_width, 30]
+    self.in_heights = [self.max_height, 18, 12, 6]
+
+  def ExpectScaledSize(self, spec, target_shape, factor=1):
+    """Tests that the output of the graph of the given spec has target_shape."""
+    with tf.Graph().as_default():
+      with self.test_session() as sess:
+        self.SetupInputs()
+        # Only the placeholders are given at construction time.
+        vgsl = vgslspecs.VGSLSpecs(self.ph_widths, self.ph_heights, True)
+        outputs = vgsl.Build(self.ph_image, spec)
+        # Compute the expected output widths from the given scale factor.
+        target_widths = tf.div(self.in_widths, factor).eval()
+        target_heights = tf.div(self.in_heights, factor).eval()
+        # Run with the 'real' data.
+        tf.initialize_all_variables().run()
+        res_image, res_widths, res_heights = sess.run(
+            [outputs, vgsl.GetLengths(2), vgsl.GetLengths(1)],
+            feed_dict={self.ph_image: self.in_image,
+                       self.ph_widths: self.in_widths,
+                       self.ph_heights: self.in_heights})
+        self.assertEqual(tuple(res_image.shape), target_shape)
+        if target_shape[1] > 1:
+          self.assertEqual(tuple(res_heights), tuple(target_heights))
+        if target_shape[2] > 1:
+          self.assertEqual(tuple(res_widths), tuple(target_widths))
+
+  def testSameSizeConv(self):
+    """Test all types of Conv. There is no scaling."""
+    self.ExpectScaledSize(
+        '[Cs{MyConv}5,5,16 Ct3,3,12 Cr4,4,24 Cl5,5,64]',
+        (self.batch_size, self.max_height, self.max_width, 64))
+
+  def testSameSizeLSTM(self):
+    """Test all non-reducing LSTMs. Output depth is doubled with BiDi."""
+    self.ExpectScaledSize('[Lfx16 Lrx8 Do Lbx24 Lfy12 Do{MyDo} Lry7 Lby32]',
+                          (self.batch_size, self.max_height, self.max_width,
+                           64))
+
+  def testSameSizeParallel(self):
+    """Parallel affects depth, but not scale."""
+    self.ExpectScaledSize('[Cs5,5,16 (Lfx{MyLSTM}32 Lrx32 Lbx16)]',
+                          (self.batch_size, self.max_height, self.max_width,
+                           96))
+
+  def testScalingOps(self):
+    """Test a heterogeneous series with scaling."""
+    self.ExpectScaledSize('[Cs5,5,16 Mp{MyPool}2,2 Ct3,3,32 Mp3,3 Lfx32 Lry64]',
+                          (self.batch_size, self.max_height / 6,
+                           self.max_width / 6, 64), 6)
+
+  def testXReduction(self):
+    """Test a heterogeneous series with reduction of x-dimension."""
+    self.ExpectScaledSize('[Cr5,5,16 Mp2,2 Ct3,3,32 Mp3,3 Lfxs32 Lry64]',
+                          (self.batch_size, self.max_height / 6, 1, 64), 6)
+
+  def testYReduction(self):
+    """Test a heterogeneous series with reduction of y-dimension."""
+    self.ExpectScaledSize('[Cl5,5,16 Mp2,2 Ct3,3,32 Mp3,3 Lfys32 Lfx64]',
+                          (self.batch_size, 1, self.max_width / 6, 64), 6)
+
+  def testXYReduction(self):
+    """Test a heterogeneous series with reduction to 0-d."""
+    self.ExpectScaledSize(
+        '[Cr5,5,16 Lfys32 Lfxs64 Fr{MyFC}16 Ft20 Fl12 Fs32 Fm40]',
+        (self.batch_size, 1, 1, 40))
+
+  def testReshapeTile(self):
+    """Tests that a tiled input can be reshaped to the batch dimension."""
+    self.ExpectScaledSize('[S2(3x0)0,2 Cr5,5,16 Lfys16]',
+                          (self.batch_size * 3, 1, self.max_width / 3, 16), 3)
+
+  def testReshapeDepth(self):
+    """Tests that depth can be reshaped to the x dimension."""
+    self.ExpectScaledSize('[Cl5,5,16 Mp3,3 (Lrys32 Lbys16 Lfys32) S3(3x0)2,3]',
+                          (self.batch_size, 1, self.max_width, 32))
+
+
+if __name__ == '__main__':
+  tf.test.main()

BIN
street/testdata/arial-32-tiny


+ 112 - 0
street/testdata/arial.charset_size=105.txt

@@ -0,0 +1,112 @@
+0	 
+104	<nul>
+1	G
+2	r
+3	a
+4	s
+5	l
+6	n
+7	d
+8	.
+9	B
+10	C
+11	O
+12	W
+13	Y
+14	,
+15	(
+16	u
+17	z
+18	i
+19	e
+20	)
+21	1
+22	9
+23	2
+24	-
+25	6
+26	o
+27	L
+28	P
+29	'
+30	t
+31	m
+32	K
+33	c
+34	k
+35	V
+36	S
+37	D
+38	J
+39	h
+40	M
+41	x
+42	E
+43	q
+44	;
+45	A
+46	y
+47	f
+48	5
+49	7
+50	b
+51	4
+52	0
+53	3
+54	N
+55	I
+56	T
+57	/
+58	p
+59	w
+60	g
+61	H
+62	“
+63	F
+62	”
+62	"
+29	’
+64	R
+24	—
+65	8
+66	v
+67	?
+68	é
+69	%
+70	:
+71	j
+72	\
+73	{
+74	}
+75	|
+76	U
+77	$
+78	°
+79	*
+80	!
+81	]
+82	Q
+29	‘
+83	Z
+84	X
+85	[
+86	=
+87	+
+88	§
+89	_
+90	£
+91	&
+92	#
+93	>
+94	<
+95	~
+96	€
+97	@
+98	¢
+99	»
+100	«
+47,5	fl
+47,18	fi
+101	®
+102	©
+103	¥

+ 139 - 0
street/testdata/charset_size=134.txt

@@ -0,0 +1,139 @@
+0	 
+133	<nul>
+1	l
+2	’
+3	é
+4	t
+5	e
+6	i
+7	n
+8	s
+9	x
+10	g
+11	u
+12	o
+13	1
+14	8
+15	7
+16	0
+17	-
+18	.
+19	p
+20	a
+21	r
+22	è
+23	d
+24	c
+25	V
+26	v
+27	b
+28	m
+29	)
+30	C
+31	z
+32	S
+33	y
+34	,
+35	k
+36	É
+37	A
+38	h
+39	E
+40	»
+41	D
+42	/
+43	H
+44	M
+45	(
+46	G
+47	P
+48	ç
+2	'
+49	R
+50	f
+51	"
+52	2
+53	j
+54	|
+55	N
+56	6
+57	°
+58	5
+59	T
+60	O
+61	U
+62	3
+63	%
+64	9
+65	q
+66	Z
+67	B
+68	K
+69	w
+70	W
+71	:
+72	4
+73	L
+74	F
+75	]
+76	ï
+2	‘
+77	I
+78	J
+79	ä
+80	î
+81	;
+82	à
+83	ê
+84	X
+85	ü
+86	Y
+87	ô
+88	=
+89	+
+90	\
+91	{
+92	}
+93	_
+94	Q
+95	œ
+96	ñ
+97	*
+98	!
+99	Ü
+51	“
+100	â
+101	Ç
+102	Œ
+103	û
+104	?
+105	$
+106	ë
+107	«
+108	€
+109	&
+110	<
+51	”
+111	æ
+112	#
+113	®
+114	Â
+115	È
+116	>
+117	[
+17	—
+118	Æ
+119	ù
+120	Î
+121	Ô
+122	ÿ
+123	À
+124	Ê
+125	@
+126	Ï
+127	©
+128	Ë
+129	Ù
+130	£
+131	Ÿ
+132	Û

+ 10 - 0
street/testdata/charset_size_10.txt

@@ -0,0 +1,10 @@
+0	 
+9	<nul>
+1	a
+2	b
+3	r
+4	n
+4,5	m
+6	f
+7	.
+8	,

BIN
street/testdata/mnist-tiny


BIN
street/testdata/numbers-16-tiny


+ 12 - 0
street/testdata/numbers.charset_size=12.txt

@@ -0,0 +1,12 @@
+0	 
+11	<nul>
+1	9
+2	8
+3	7
+4	6
+5	1
+6	4
+7	0
+8	3
+9	5
+10	2