@@ -8,6 +8,7 @@
\usepackage{booktabs}
\usepackage{multirow}
\usepackage{pgfplots}
+\usepackage{wasysym}
\usepackage[noadjust]{cite}
\usepackage[nameinlink,noabbrev]{cleveref} % has to be after hyperref, ntheorem, amsthm
\usepackage[binary-units]{siunitx}
@@ -33,59 +34,77 @@
\begin{document}
\maketitle

\begin{abstract}
-Writing mathematical formulas with \LaTeX{} is easy as soon as one is used to
-commands like \verb+\alpha+ and \verb+\propto+. However, for people who have
-never used \LaTeX{} or who don't know the English name of the command, it can
-be difficult to find the right command. Hence the automatic recognition of
-handwritten mathematical symbols is desirable. This paper presents a system
+
+The automatic recognition of single handwritten symbols has three main
+applications. The first is to support users who know what a symbol looks
+like, but not what it is called, such as $\saturn$. The second is to provide
+the commands needed for professional publishing in books or on websites,
+e.g. as \LaTeX{} commands, as MathML, or as code points. The third is to
+serve as a building block for formula recognition.
+
+This paper presents a system
which uses the pen trajectory to classify handwritten symbols. Five
preprocessing steps, one data multiplication algorithm, five features and five
variants for multilayer Perceptron training were evaluated using $\num{166898}$
recordings which were collected with two crowdsourcing projects. The evaluation
results of these 21~experiments were used to create an optimized recognizer
which has a TOP-1 error of less than $\SI{17.5}{\percent}$ and a TOP-3 error of
-$\SI{4.0}{\percent}$. This is a relative improvement of $\SI{18.5}{\percent}$ for the
-TOP-1 error and $\SI{29.7}{\percent}$ for the TOP-3 error compared to the
-baseline system.
+$\SI{4.0}{\percent}$. This is a relative improvement of $\SI{18.5}{\percent}$
+for the TOP-1 error and $\SI{29.7}{\percent}$ for the TOP-3 error compared to
+the baseline system.
\end{abstract}

\section{Introduction}
On-line recognition makes use of the pen trajectory. This means the data is
given as groups of sequences of tuples $(x, y, t) \in \mathbb{R}^3$, where each
group represents a stroke, $(x, y)$ is the position of the pen on a canvas and
-$t$ is the time. One handwritten symbol in the described format is called a
-\textit{recording}. One approach to classify recordings into symbol classes
-assigns a probability to each class given the data. The classifier can be
-evaluated by using recordings which were classified by humans and were not used
-to train the classifier. The set of those recordings is called \textit{test
-set}. The TOP-$n$ error is defined as the fraction of the symbols where
-the correct class was not within the top $n$ classes of the highest
-probability.
+$t$ is the time.
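To make the data format concrete, a single recording can be sketched in plain Python as follows. The field names and the list-of-strokes layout are illustrative assumptions, not necessarily the exact serialization used by the data sources discussed later in the paper:

```python
# One symbol = a list of strokes; one stroke = a list of (x, y, t) points.
# (x, y) is the pen position on the canvas; t is a timestamp.
# All concrete values below are made up for illustration.
recording = [
    [  # first stroke
        {"x": 12.0, "y": 40.0, "t": 1407842870000},
        {"x": 14.5, "y": 38.2, "t": 1407842870021},
    ],
    [  # second stroke (the pen was lifted in between)
        {"x": 30.1, "y": 41.7, "t": 1407842870503},
    ],
]

num_strokes = len(recording)
num_points = sum(len(stroke) for stroke in recording)
```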
+
+On-line data has been used to classify handwritten natural language text in
+many variants. For example, the NPen++ system classified cursive handwriting
+into English words by using hidden Markov models and neural
+networks~\cite{Manke1995}.
+
+% One handwritten symbol in the described format is called a
+% \textit{recording}. One approach to classify recordings into symbol classes
+% assigns a probability to each class given the data. The classifier can be
+% evaluated by using recordings which were classified by humans and were not used
+% to train the classifier. The set of those recordings is called \textit{test
+% set}. The TOP-$n$ error is defined as the fraction of the symbols where
+% the correct class was not within the top $n$ classes of the highest
+% probability.

Several systems for mathematical symbol recognition with on-line data have been
-described so far~\cite{Kosmala98,Mouchere2013}, but most of them have neither
-published their source code nor their data which makes it impossible to re-run
-experiments to compare different systems. This is unfortunate as the choice of
-symbols is crucial for the TOP-$n$ error and all systems used different symbol
-sets. For example, the symbols $o$, $O$, $\circ$ and $0$ are very similar and
-systems which know all those classes will certainly have a higher TOP-$n$ error
-than systems which only accept one of them.
+described so far~\cite{Kosmala98,Mouchere2013}, but no standard test set
+existed to compare the results of different classifiers. The symbol sets
+differed between all papers. This is unfortunate as the choice of symbols is
+crucial for the TOP-$n$ error. For example, the symbols $o$, $O$, $\circ$ and
+$0$ are very similar, and systems which know all those classes will certainly
+have a higher TOP-$n$ error than systems which only accept one of them. Not
+only did the classes differ; the data used for training and testing also had
+to be collected anew by each author.

Daniel Kirsch describes in~\cite{Kirsch} a system called Detexify which uses
-time warping to classify on-line handwritten symbols and reports a
-TOP-3 error of less than $\SI{10}{\percent}$ for a set of $\num{100}$~symbols.
-He also published his data on \url{https://github.com/kirel/detexify-data},
+time warping to classify on-line handwritten symbols and reports a TOP-3 error
+of less than $\SI{10}{\percent}$ for a set of $\num{100}$~symbols. He also
+recently published his data on \url{https://github.com/kirel/detexify-data},
which was collected by a crowdsourcing approach via
\url{http://detexify.kirelabs.org}. Those recordings as well as some recordings
which were collected by a similar approach via \url{http://write-math.com} were
used to train and evaluate different classifiers. A complete description of
all involved software, data and experiments is given in~\cite{Thoma:2014}.

+In this paper we present a baseline system for the classification of on-line
+handwriting into $369$ classes, some of which are very similar, as well as an
+optimized classifier which achieves a $\SI{29.7}{\percent}$ relative
+improvement of the TOP-3 error. This was achieved by using better features and
+layer-wise supervised pretraining. The absolute improvements of these changes
+over the baseline are also shown.

-\section{Steps in Handwriting Recognition}
-
-The following steps are used for symbol classification:

+\section{Steps in Handwriting Recognition}
+The following steps are used for symbol classification:\nobreak
\begin{enumerate}
    \item \textbf{Preprocessing}: Recorded data is never perfect. Devices have
          errors and people make mistakes while using the devices. To tackle
@@ -108,8 +127,9 @@ The following steps are used for symbol classification:
          recognition, this step will not be further discussed.
    \item \textbf{Feature computation}: A feature is high-level information
          derived from the raw data after preprocessing. Some systems like
-          Detexify take the result of the preprocessing step, but many
-          compute new features. This might have the advantage that less
+          Detexify take the result of the preprocessing step, but many compute
+          new features. Those features can be designed by a human engineer or
+          learned. Hand-crafted features can have the advantage that less
          training data is needed since the developer can use knowledge about
          handwriting to compute highly discriminative features. Various
          features are explained in \cref{sec:features}.
@@ -121,8 +141,7 @@ The following steps are used for symbol classification:
After these steps, we are faced with a classification learning task which
consists of two parts:
\begin{enumerate}
-    \item \textbf{Learning} parameters for a given classifier. This process is
-          also called \textit{training}.
+    \item \textbf{Learning} parameters for a given classifier.
    \item \textbf{Classifying} new recordings, sometimes called
          \textit{evaluation}. This should not be confused with the evaluation
          of the classification performance which is done for multiple
@@ -135,6 +154,21 @@ of input features is the same for every recording. There are many ways how to
adjust \glspl{MLP} and how to adjust their training. Some of them are
described in~\cref{sec:mlp-training}.
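Since every result in this paper is reported as a TOP-$n$ error, a minimal sketch of how that measure can be computed may be useful. The function below is a hypothetical illustration and not part of any of the packages mentioned here:

```python
def top_n_error(predictions, true_labels, n=3):
    """Fraction of recordings whose true class is not among the n classes
    with the highest predicted probability.

    predictions: list of dicts mapping class label -> probability
    true_labels: list of correct class labels, same order
    """
    errors = 0
    for probs, truth in zip(predictions, true_labels):
        # sort class labels by descending probability, keep the top n
        top = sorted(probs, key=probs.get, reverse=True)[:n]
        if truth not in top:
            errors += 1
    return errors / len(true_labels)

# Example with two recordings and three (made-up) classes:
preds = [{"alpha": 0.7, "propto": 0.2, "infty": 0.1},
         {"alpha": 0.5, "propto": 0.3, "infty": 0.2}]
print(top_n_error(preds, ["alpha", "infty"], n=1))  # 0.5
```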

+
+\section{Data and Implementation}
+The combined data of Detexify and \href{http://write-math.com}{write-math.com}
+can be downloaded via \href{http://write-math.com/data}{write-math.com/data} as
+a compressed tar archive. It contains a list of $369$ symbols which are used in
+a mathematical context. Each symbol has at least $50$ labeled examples, but
+most symbols have more than $200$ labeled examples and some have more than
+$2000$. In total, more than $\num{160000}$ labeled recordings were collected.
+
+Preprocessing and feature computation algorithms were implemented and are
+publicly available as open-source software in the Python package
+\texttt{hwrt}; the \gls{MLP} algorithms are available in the Python package
+\texttt{nntoolkit}.
+
+
\section{Algorithms}
\subsection{Preprocessing}\label{sec:preprocessing}
Preprocessing in symbol recognition is done to improve the quality and
@@ -485,7 +519,7 @@ this improved the classifiers again.
\end{table}


-\section{Conclusion}
+\section{Discussion}
Four baseline recognition systems were adjusted in many experiments and their
recognition capabilities were compared in order to build a recognition system
that can recognize $369$ mathematical symbols with low error rates as well as to