							%!TEX root = write-math-ba-paper.tex

\section{Summary}
Four baseline recognition systems were adjusted in many experiments and their
recognition capabilities were compared. The goals were to build a recognition
system that can recognize 396 mathematical symbols with low error rates and to
evaluate which preprocessing steps and features improve the recognition rate.

All recognition systems were trained and evaluated with
$\num{\totalCollectedRecordings{}}$ recordings for \totalClassesAnalyzed{}
symbols. These recordings were collected by two crowdsourcing projects
(\href{http://detexify.kirelabs.org/classify.html}{Detexify} and
\href{write-math.com}{write-math.com}) and created with various devices. While
some recordings were created with standard touch devices such as tablets and
smartphones, others were created with the mouse.

\Glspl{MLP} were used for the classification task. Four baseline systems with
different numbers of hidden layers were used, as the number of hidden layers
influences the capabilities and problems of \glspl{MLP}.

All baseline systems used the same preprocessing queue. The recordings were
scaled and shifted as described in \ref{sec:preprocessing} and then resampled
with linear interpolation so that every stroke had exactly 20~points spread
equidistantly in time. The 80~($x,y$) coordinates of the first 4~strokes
were used to get exactly $160$ input features for every recording. The baseline
system $B_{hl=2}$ has a top-3 error of $\SI{5.7}{\percent}$.
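
As a sketch of the resampling step, assume a stroke is given as points
$(x_i, y_i)$ recorded at times $t_0 < t_1 < \dots < t_n$ (this notation is
introduced here for illustration only). The 20~resampled time stamps and the
interpolated $x$-coordinate (analogously for $y$) would then be
\[
    t'_k = t_0 + \frac{k}{19}\,(t_n - t_0), \qquad
    x(t'_k) = x_i + \frac{t'_k - t_i}{t_{i+1} - t_i}\,(x_{i+1} - x_i)
    \quad \mathrm{for}\ t_i \le t'_k \le t_{i+1},
\]
with $k = 0, \dots, 19$. With 4~strokes, 20~points per stroke and
2~coordinates per point, this gives $4 \cdot 20 \cdot 2 = 160$ input features.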

Adding two slightly rotated variants for each recording and hence tripling the
training set made the systems $B_{hl=3}$ and $B_{hl=4}$ perform much worse, but
improved the performance of the smaller systems.

The global features re-curvature, ink, stroke count and aspect ratio improved
the systems $B_{hl=1}$--$B_{hl=3}$, whereas the stroke center point feature
made $B_{hl=2}$ perform worse.

Denoising auto-encoders were evaluated as one way to use pretraining, but they
increased the error rate notably. In contrast, \acrlong{SLP} improved the
performance decidedly.

The stroke connection algorithm was added to the preprocessing steps of the
baseline system, and the re-curvature feature, the ink feature, the number of
strokes and the aspect ratio were added to its features. The training setup of
the baseline system was changed to \acrlong{SLP} and the resulting model was
then trained again with a lower learning rate. This optimized recognizer
$B_{hl=2,c}'$ had a top-3
error of $\SI{4.0}{\percent}$. This means that the top-3 error dropped by over
$\num{1.7}$ percentage points in comparison to the baseline system $B_{hl=2}$.
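
Using the rounded values reported above, the improvement can be worked out as
\[
    \SI{5.7}{\percent} - \SI{4.0}{\percent} = 1.7~\mathrm{percentage\ points},
    \qquad \frac{1.7}{5.7} \approx 0.30,
\]
that is, roughly $\SI{30}{\percent}$ of the top-3 errors of $B_{hl=2}$ were
eliminated.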

A top-3 error of $\SI{4.0}{\percent}$ makes the system usable for symbol
lookup. It could also be used as a starting point for the development of a
multiple-symbol classifier.

The aim of this work was to develop a symbol recognition system that is easy
to use, fast and has high recognition rates, and to evaluate ideas for
single-symbol classifiers. Some of those goals were reached. The recognition
system $B_{hl=2,c}'$ evaluates new recordings in a fraction of a second and has
acceptable recognition rates.

% Many algorithms were evaluated. However, there are still many other
% algorithms which could be evaluated and, at the time of this work, the best
% classifier $B_{hl=2,c}'$ is only available through the Python package
% \texttt{hwrt}. It is planned to add an web version of that classifier online.

\section{Optimized Recognizer}
All preprocessing steps and features that proved useful were combined to
create the best-performing recognizer.

All resulting models performed much better than every system tried before. The
results of this experiment show that single-symbol recognition with
\totalClassesAnalyzed{} classes, based on recordings from common touch devices
and the mouse, can be done with a top-1 error rate of $\SI{18.6}{\percent}$
and a top-3 error of
$\SI{4.1}{\percent}$. This was
achieved by a \gls{MLP} with a $167{:}500{:}500{:}\totalClassesAnalyzed{}$ topology.

It used the stroke connection algorithm to connect strokes whose ends were less
than $\SI{10}{\pixel}$ apart, scaled each recording to a unit square and shifted
it as described in \ref{sec:preprocessing}. After that, a linear resampling step
was applied to the first 4 strokes to resample them to 20 points each. All
other strokes were discarded.

\goodbreak
The 167 features were\mynobreakpar%
\begin{itemize}
     \item the first 4 strokes with 20 points per stroke resulting in 160
           features,
     \item the re-curvature for the first 4 strokes,
     \item the ink,
     \item the number of strokes and
     \item the aspect ratio of the bounding box
\end{itemize}
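
Counting the entries of the list above, and assuming one re-curvature value per
stroke and one value each for the ink, the number of strokes and the aspect
ratio, the input dimension is
\[
    4 \cdot 20 \cdot 2 + 4 + 1 + 1 + 1 = 167,
\]
which matches the size of the input layer.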

\Gls{SLP} was applied with $\num{1000}$ epochs per layer, a
learning rate of $\eta=0.1$ and a momentum of $\alpha=0.1$. After that, the
complete model was trained again for $\num{1000}$ epochs with standard
mini-batch gradient descent, resulting in the systems $B_{hl=1,c}$ --
$B_{hl=4,c}$.
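
For reference, a common form of the mini-batch gradient descent update with
momentum is sketched below. The exact variant used in the experiments is not
specified here, so the symbols $w$ (weights), $E$ (error on the current
mini-batch) and $\Delta w$ (weight change) are assumptions made for
illustration:
\[
    \Delta w^{(t)} = -\eta \, \nabla_w E\bigl(w^{(t)}\bigr)
                     + \alpha \, \Delta w^{(t-1)}, \qquad
    w^{(t+1)} = w^{(t)} + \Delta w^{(t)}.
\]
Under this form, $\eta$ scales the current gradient step, while $\alpha$
determines how much of the previous weight change is carried over.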

After the models $B_{hl=1,c}$ -- $B_{hl=4,c}$ had been trained for the first
$\num{1000}$ epochs, they were trained for another $\num{1000}$ epochs with a
reduced learning rate of $\eta = 0.05$.
\Cref{table:complex-recognizer-systems-evaluation} shows that this improved
the classifiers again.

\begin{table}[htb]
    \centering
    \begin{tabular}{lrrrr}
    \toprule
    \multirow{2}{*}{System}  & \multicolumn{4}{c}{Classification error}\\
    \cmidrule(l){2-5}
              & Top-1                 & Change                & Top-3                & Change\\\midrule
    $B_{hl=1,c}$ & $\SI{21.0}{\percent}$ & $\SI{-2.2}{\percent}$ & $\SI{5.2}{\percent}$ & $\SI{-1.5}{\percent}$\\
    $B_{hl=2,c}$ & $\SI{18.3}{\percent}$ & $\SI{-3.3}{\percent}$ & $\SI{4.1}{\percent}$ & $\SI{-1.6}{\percent}$\\
    $B_{hl=3,c}$ & \underline{$\SI{18.2}{\percent}$} & $\SI{-3.7}{\percent}$ & \underline{$\SI{4.1}{\percent}$} & $\SI{-1.6}{\percent}$\\
    $B_{hl=4,c}$ & $\SI{18.6}{\percent}$ & $\SI{-5.3}{\percent}$ & $\SI{4.3}{\percent}$ & $\SI{-1.9}{\percent}$\\\midrule
    $B_{hl=1,c}'$ & $\SI{19.3}{\percent}$ & $\SI{-3.9}{\percent}$ & $\SI{4.8}{\percent}$ & $\SI{-1.9}{\percent}$ \\
    $B_{hl=2,c}'$ & \underline{$\SI{17.5}{\percent}$} & $\SI{-4.1}{\percent}$ & \underline{$\SI{4.0}{\percent}$} & $\SI{-1.7}{\percent}$\\
    $B_{hl=3,c}'$ & $\SI{17.7}{\percent}$ & $\SI{-4.2}{\percent}$ & $\SI{4.1}{\percent}$ & $\SI{-1.6}{\percent}$\\
    $B_{hl=4,c}'$ & $\SI{17.8}{\percent}$ & $\SI{-6.1}{\percent}$ & $\SI{4.3}{\percent}$ & $\SI{-1.9}{\percent}$\\
    \bottomrule
    \end{tabular}
    \caption{Error rates of the optimized recognizer systems. The systems
             $B_{hl=i,c}'$ were trained for another $\num{1000}$ epochs with a
             learning rate of $\eta=0.05$.}
\label{table:complex-recognizer-systems-evaluation}
\end{table}