
Rewrote introduction after feedback from Prof. Waibel

Martin Thoma 10 years ago
parent
commit
4804ad91d5

BIN
documents/write-math-ba-paper/write-math-ba-paper.pdf


+ 68 - 34
documents/write-math-ba-paper/write-math-ba-paper.tex

@@ -8,6 +8,7 @@
 \usepackage{booktabs}
 \usepackage{multirow}
 \usepackage{pgfplots}
+\usepackage{wasysym}
 \usepackage[noadjust]{cite}
 \usepackage[nameinlink,noabbrev]{cleveref} % has to be after hyperref, ntheorem, amsthm
 \usepackage[binary-units]{siunitx}
@@ -33,59 +34,77 @@
 \begin{document}
 \maketitle
 \begin{abstract}
-Writing mathematical formulas with \LaTeX{} is easy as soon as one is used to
-commands like \verb+\alpha+ and \verb+\propto+. However, for people who have
-never used \LaTeX{} or who don't know the English name of the command, it can
-be difficult to find the right command. Hence the automatic recognition of
-handwritten mathematical symbols is desirable. This paper presents a system
+
+The automatic recognition of single handwritten symbols has three main
+applications. The first is to support users who know what a symbol looks
+like, but not its name, such as $\saturn$. The second is to provide the
+commands necessary for professional publishing in books or on websites,
+e.g. as \LaTeX{} commands, as MathML, or as code points. The third is to
+serve as a building block for formula recognition.
+
+This paper presents a system
 which uses the pen trajectory to classify handwritten symbols. Five
 preprocessing steps, one data multiplication algorithm, five features and five
 variants for multilayer Perceptron training were evaluated using $\num{166898}$
 recordings which were collected with two crowdsourcing projects. The evaluation
 results of these 21~experiments were used to create an optimized recognizer
 which has a TOP-1 error of less than $\SI{17.5}{\percent}$ and a TOP-3 error of
-$\SI{4.0}{\percent}$. This is a relative improvement of $\SI{18.5}{\percent}$ for the
-TOP-1 error and $\SI{29.7}{\percent}$ for the TOP-3 error compared to the
-baseline system.
+$\SI{4.0}{\percent}$. This is a relative improvement of $\SI{18.5}{\percent}$
+for the TOP-1 error and $\SI{29.7}{\percent}$ for the TOP-3 error compared to
+the baseline system.
 \end{abstract}
 
 \section{Introduction}
 On-line recognition makes use of the pen trajectory. This means the data is
 given as groups of sequences of tuples $(x, y, t) \in \mathbb{R}^3$, where each
 group represents a stroke, $(x, y)$ is the position of the pen on a canvas and
-$t$ is the time. One handwritten symbol in the described format is called a
-\textit{recording}. One approach to classify recordings into symbol classes
-assigns a probability to each class given the data. The classifier can be
-evaluated by using recordings which were classified by humans and were not used
-to train the classifier. The set of those recordings is called \textit{test
-set}. The TOP-$n$ error is defined as the fraction of the symbols where
-the correct class was not within the top $n$ classes of the highest
-probability.
+$t$ is the time.
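The recording format described above can be sketched in Python; the names and sample coordinates below are illustrative and not taken from the paper's software:

```python
# A stroke is a sequence of (x, y, t) tuples; a recording is a list of
# strokes. The coordinates and timestamps here are made-up sample data.
recording = [
    [(10.0, 12.0, 0.0), (11.5, 14.0, 20.0), (13.0, 17.5, 40.0)],  # first stroke
    [(30.0, 12.0, 310.0), (30.5, 20.0, 350.0)],                   # second stroke
]

def bounding_box(rec):
    """Smallest axis-aligned box containing all points of a recording."""
    xs = [x for stroke in rec for (x, y, t) in stroke]
    ys = [y for stroke in rec for (x, y, t) in stroke]
    return (min(xs), min(ys), max(xs), max(ys))
```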
+
+On-line data has been used to classify handwritten natural-language text in
+many variants. For example, the NPen++ system classified cursive handwriting
+into English words by using hidden Markov models and neural
+networks~\cite{Manke1995}.
+
+% One handwritten symbol in the described format is called a
+% \textit{recording}. One approach to classify recordings into symbol classes
+% assigns a probability to each class given the data. The classifier can be
+% evaluated by using recordings which were classified by humans and were not used
+% to train the classifier. The set of those recordings is called \textit{test
+% set}. The TOP-$n$ error is defined as the fraction of the symbols where
+% the correct class was not within the top $n$ classes of the highest
+% probability.
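The TOP-$n$ error used throughout the paper can be sketched as a short Python function; this is a minimal illustration under the definition above, not the paper's implementation:

```python
def top_n_error(probabilities, true_labels, n):
    """Fraction of samples whose true class is not among the n classes
    with the highest predicted probability."""
    errors = 0
    for probs, label in zip(probabilities, true_labels):
        # class indices sorted by descending probability
        ranked = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)
        if label not in ranked[:n]:
            errors += 1
    return errors / len(true_labels)
```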
 
 Several systems for mathematical symbol recognition with on-line data have been
-described so far~\cite{Kosmala98,Mouchere2013}, but most of them have neither
-published their source code nor their data which makes it impossible to re-run
-experiments to compare different systems. This is unfortunate as the choice of
-symbols is crucial for the TOP-$n$ error and all systems used different symbol
-sets. For example, the symbols $o$, $O$, $\circ$ and $0$ are very similar and
-systems which know all those classes will certainly have a higher TOP-$n$ error
-than systems which only accept one of them.
+described so far~\cite{Kosmala98,Mouchere2013}, but no standard test set
+existed to compare the results of different classifiers. The symbol sets
+differed between papers. This is unfortunate as the choice of symbols is
+crucial for the TOP-$n$ error. For example, the symbols $o$, $O$, $\circ$
+and $0$ are very similar, and systems which know all of those classes will
+certainly have a higher TOP-$n$ error than systems which only accept one of
+them. Not only did the classes differ; each author also had to collect the
+training and test data anew.
 
 Daniel Kirsch describes in~\cite{Kirsch} a system called Detexify which uses
-time warping to classify on-line handwritten symbols and reports a
-TOP-3 error of less than $\SI{10}{\percent}$ for a set of $\num{100}$~symbols.
-He also published his data on \url{https://github.com/kirel/detexify-data},
+time warping to classify on-line handwritten symbols and reports a TOP-3 error
+of less than $\SI{10}{\percent}$ for a set of $\num{100}$~symbols. He also
+recently published his data on \url{https://github.com/kirel/detexify-data},
 which was collected by a crowdsourcing approach via
 \url{http://detexify.kirelabs.org}. Those recordings as well as some recordings
 which were collected by a similar approach via \url{http://write-math.com} were
 used to train and evaluate different classifiers. A complete description of
 all involved software, data and experiments is given in~\cite{Thoma:2014}.
 
+In this paper we present a baseline system for the classification of on-line
+handwriting into $369$ classes, some of which are very similar. We also
+present an optimized classifier which achieves a $\SI{29.7}{\percent}$
+relative improvement of the TOP-3 error over the baseline. This was achieved
+by using better features and layer-wise supervised pretraining. The absolute
+improvement contributed by each of those changes will also be shown.
 
-\section{Steps in Handwriting Recognition}
-
-The following steps are used for symbol classification:
 
+\section{Steps in Handwriting Recognition}
+The following steps are used for symbol classification:\nobreak
 \begin{enumerate}
     \item \textbf{Preprocessing}: Recorded data is never perfect. Devices have
           errors and people make mistakes while using the devices. To tackle
@@ -108,8 +127,9 @@ The following steps are used for symbol classification:
           recognition, this step will not be further discussed.
     \item \textbf{Feature computation}: A feature is high-level information
           derived from the raw data after preprocessing. Some systems like
-          Detexify take the result of the preprocessing step, but many
-          compute new features. This might have the advantage that less
+          Detexify take the result of the preprocessing step, but many compute
+          new features. Those features can either be designed by a human
+          engineer or learned. Hand-designed features can have the advantage that less
           training data is needed since the developer can use knowledge about
           handwriting to compute highly discriminative features. Various
           features are explained in \cref{sec:features}.
@@ -121,8 +141,7 @@ The following steps are used for symbol classification:
 After these steps, we are faced with a classification learning task which
 consists of two parts:
 \begin{enumerate}
-    \item \textbf{Learning} parameters for a given classifier. This process is
-          also called \textit{training}.
+    \item \textbf{Learning} parameters for a given classifier.
     \item \textbf{Classifying} new recordings, sometimes called
           \textit{evaluation}. This should not be confused with the evaluation
           of the classification performance which is done for multiple
@@ -135,6 +154,21 @@ of input features is the same for every recording. There are many ways how to
 adjust \glspl{MLP} and how to adjust their training. Some of them are
 described in~\cref{sec:mlp-training}.
 
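As a rough illustration of the kind of \gls{MLP} adjusted here, a minimal forward pass with tanh hidden layers and a softmax output can be sketched as follows; this is a didactic sketch, not the paper's implementation:

```python
import math

def mlp_forward(x, weights, biases):
    """Forward pass of a multilayer perceptron: each hidden layer applies
    an affine map followed by tanh; the output layer applies softmax."""
    a = x
    for i, (W, b) in enumerate(zip(weights, biases)):
        z = [sum(w * v for w, v in zip(row, a)) + bi
             for row, bi in zip(W, b)]
        if i < len(weights) - 1:
            a = [math.tanh(v) for v in z]       # hidden layer activation
        else:
            m = max(z)                          # numerically stable softmax
            e = [math.exp(v - m) for v in z]
            s = sum(e)
            a = [v / s for v in e]
    return a
```

The output is a probability distribution over the symbol classes, so TOP-$n$ errors can be computed directly from it.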
+
+\section{Data and Implementation}
+The combined data of Detexify and \href{http://write-math.com}{write-math.com}
+can be downloaded via \href{http://write-math.com/data}{write-math.com/data} as
+a compressed tar archive. It contains $369$ symbols which are used in a
+mathematical context. Each symbol has at least $50$ labeled examples, but most
+symbols have more than $200$ labeled examples and some have more than $2000$.
+In total, more than $\num{160000}$ labeled recordings were collected.
+
+Preprocessing and feature computation algorithms were implemented and are
+publicly available as open-source software in the Python package \texttt{hwrt}
+and \gls{MLP} algorithms are available in the Python package
+\texttt{nntoolkit}.
+
+
 \section{Algorithms}
 \subsection{Preprocessing}\label{sec:preprocessing}
 Preprocessing in symbol recognition is done to improve the quality and
@@ -485,7 +519,7 @@ this improved the classifiers again.
 \end{table}
 
 
-\section{Conclusion}
+\section{Discussion}
 Four baseline recognition systems were adjusted in many experiments and their
 recognition capabilities were compared in order to build a recognition system
 that can recognize 369 mathematical symbols with low error rates as well as to