- %!TEX root = write-math-ba-paper.tex
- \section{Summary}
- Four baseline recognition systems were adjusted in many experiments and their
- recognition capabilities were compared. The goals were to build a recognition
- system that can recognize 396 mathematical symbols with low error rates and to
- evaluate which preprocessing steps and features help to improve the recognition
- rate.
- All recognition systems were trained and evaluated with
- $\num{\totalCollectedRecordings{}}$ recordings for \totalClassesAnalyzed{}
- symbols. These recordings were collected by two crowdsourcing projects
- (\href{http://detexify.kirelabs.org/classify.html}{Detexify} and
- \href{write-math.com}{write-math.com}) and created with various devices. While
- some recordings were created with standard touch devices such as tablets and
- smartphones, others were created with a mouse.
- \Glspl{MLP} were used for the classification task. Four baseline systems with
- different numbers of hidden layers were trained, as the number of hidden layers
- influences the capabilities and problems of \glspl{MLP}.
- All baseline systems used the same preprocessing queue. The recordings were
- scaled and shifted as described in \ref{sec:preprocessing} and then resampled
- with linear interpolation so that every stroke had exactly 20~points that were
- spaced equidistantly in time. The 80~$(x,y)$ coordinate pairs of the first
- 4~strokes were used to get exactly $160$ input features for every recording.
- The baseline system $B_{hl=2}$ had a top-3 error of $\SI{5.7}{\percent}$.
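- As a sketch of the resampling step, assume a stroke is given as points
- $(x_i, y_i)$ with strictly increasing timestamps $t_0 < t_1 < \dots < t_m$
- (this notation is introduced here for illustration only and is not taken from
- \ref{sec:preprocessing}). The 20~target times and the linearly interpolated
- $x$~coordinate for a target time $\tau_k \in [t_i, t_{i+1}]$ would then be
- \[
-   \tau_k = t_0 + \frac{k}{19}\,(t_m - t_0), \quad k = 0, \dots, 19, \qquad
-   x(\tau_k) = x_i + \frac{\tau_k - t_i}{t_{i+1} - t_i}\,(x_{i+1} - x_i),
- \]
- and analogously for $y$. With 4~strokes, 20~points per stroke and
- 2~coordinates per point this gives $4 \cdot 20 \cdot 2 = 160$ features.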
- Adding two slightly rotated variants for each recording and hence tripling the
- training set made the systems $B_{hl=3}$ and $B_{hl=4}$ perform much worse, but
- improved the performance of the smaller systems.
- The global features re-curvature, ink, stroke count and aspect ratio improved
- the systems $B_{hl=1}$--$B_{hl=3}$, whereas the stroke center point feature
- made $B_{hl=2}$ perform worse.
- Denoising auto-encoders were evaluated as one way to use pretraining, but
- they increased the error rate notably. However, \acrlong{SLP} improved the
- performance decidedly.
- The stroke connection algorithm was added to the preprocessing steps of the
- baseline system, and the re-curvature feature, the ink feature, the number of
- strokes and the aspect ratio were added to its features. The training setup of
- the baseline system was changed to \acrlong{SLP} and the resulting model was
- trained again with a lower learning rate. This optimized recognizer
- $B_{hl=2,c}'$ had a top-3 error of $\SI{4.0}{\percent}$. This means that the
- top-3 error dropped by about $\num{1.7}$ percentage points in comparison to the
- baseline system $B_{hl=2}$.
- A top-3 error of $\SI{4.0}{\percent}$ makes the system usable for symbol
- lookup. It could also be used as a starting point for the development of a
- multiple-symbol classifier.
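- The improvement can be made explicit with the two top-3 errors of $B_{hl=2}$
- and $B_{hl=2,c}'$ stated above. The relative figure is derived here from the
- rounded values and is not reported elsewhere in this work:
- \[
-   5.7 - 4.0 = 1.7, \qquad \frac{1.7}{5.7} \approx 0.30 .
- \]
- The absolute drop is therefore about $\num{1.7}$ percentage points, which
- corresponds to a relative reduction of the top-3 error by roughly
- $\SI{30}{\percent}$.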
- The aim of this work was to develop a symbol recognition system which is easy
- to use, fast and has high recognition rates, and to evaluate ideas for
- single-symbol classifiers. Some of those goals were reached. The recognition
- system $B_{hl=2,c}'$ evaluates new recordings in a fraction of a second and has
- acceptable recognition rates.
- % Many algorithms were evaluated. However, there are still many other
- % algorithms which could be evaluated and, at the time of this work, the best
- % classifier $B_{hl=2,c}'$ is only available through the Python package
- % \texttt{hwrt}. It is planned to make a web version of that classifier available online.
- \section{Optimized Recognizer}
- All preprocessing steps and features that were useful were combined to create a
- recognizer that performs best.
- All models were much better than everything that was tried before. The results
- of this experiment show that single-symbol recognition with
- \totalClassesAnalyzed{} classes and usual touch devices and the mouse can be
- done with a top-1 error rate of $\SI{18.6}{\percent}$ and a top-3 error of
- $\SI{4.1}{\percent}$. This was
- achieved by a \gls{MLP} with a $167{:}500{:}500{:}\totalClassesAnalyzed{}$ topology.
- It used the stroke connection algorithm to connect strokes whose ends were less
- than $\SI{10}{\pixel}$ apart, scaled each recording to a unit square and shifted
- it as described in \ref{sec:preprocessing}. After that, a linear resampling step
- was applied to the first 4 strokes to resample them to 20 points each. All
- other strokes were discarded.
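- One way to write down the connection criterion and the scaling is sketched
- here. It assumes that consecutive strokes are compared, that distances are
- Euclidean and that the aspect ratio is preserved; none of this is specified in
- this section, and the symbols are introduced for illustration only. Let $e_j$
- be the last point of stroke $j$, $s_{j+1}$ the first point of stroke $j+1$,
- and $w$, $h$ the width and height of the bounding box of the recording with
- $\max(w, h) > 0$. Strokes $j$ and $j+1$ would then be merged if
- \[
-   \| e_j - s_{j+1} \|_2 < \SI{10}{\pixel},
- \]
- and every point $p$ of the recording would be scaled to
- \[
-   p' = \frac{1}{\max(w, h)}\, p
- \]
- before the shift described in \ref{sec:preprocessing} is applied.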
- \goodbreak
- The 167 features were\mynobreakpar%
- \begin{itemize}
- \item the first 4 strokes with 20 points per stroke resulting in 160
- features,
- \item the re-curvature for the first 4 strokes,
- \item the ink,
- \item the number of strokes and
- \item the aspect ratio of the bounding box
- \end{itemize}
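- The feature count can be checked term by term, assuming one re-curvature
- value per stroke and one value each for the ink, the number of strokes and the
- aspect ratio:
- \[
-   4 \cdot 20 \cdot 2 + 4 + 1 + 1 + 1 = 160 + 7 = 167 .
- \]
- This matches the 167~input neurons of the topology given above.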
- \Gls{SLP} was applied with $\num{1000}$ epochs per layer, a
- learning rate of $\eta=0.1$ and a momentum of $\alpha=0.1$. After that, the
- complete model was trained again for $\num{1000}$ epochs with standard
- mini-batch gradient descent, resulting in the systems
- $B_{hl=1,c}$--$B_{hl=4,c}$.
- After the models $B_{hl=1,c}$--$B_{hl=4,c}$ had been trained for these
- $\num{1000}$ epochs, they were trained for another $\num{1000}$ epochs with a
- learning rate of $\eta = 0.05$, resulting in the systems
- $B_{hl=1,c}'$--$B_{hl=4,c}'$.
- \Cref{table:complex-recognizer-systems-evaluation} shows that this lowered
- the top-1 error of every classifier again.
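- For reference, a common form of the mini-batch gradient descent update with
- momentum is sketched here. Whether exactly this variant was used is an
- assumption; $w$ denotes the weights, $E$ the error on the current mini-batch
- and $\Delta w^{(t)}$ the weight change in step $t$:
- \[
-   \Delta w^{(t)} = -\eta \, \nabla_w E\bigl(w^{(t)}\bigr)
-                    + \alpha \, \Delta w^{(t-1)}, \qquad
-   w^{(t+1)} = w^{(t)} + \Delta w^{(t)} .
- \]
- In this notation, \gls{SLP} used $\eta = 0.1$ and $\alpha = 0.1$, and the
- final training phase used $\eta = 0.05$.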
- \begin{table}[htb]
- \centering
- \begin{tabular}{lrrrr}
- \toprule
- \multirow{2}{*}{System} & \multicolumn{4}{c}{Classification error}\\
- \cmidrule(l){2-5}
- & Top-1 & Change & Top-3 & Change\\\midrule
- $B_{hl=1,c}$ & $\SI{21.0}{\percent}$ & $\SI{-2.2}{\percent}$ & $\SI{5.2}{\percent}$ & $\SI{-1.5}{\percent}$\\
- $B_{hl=2,c}$ & $\SI{18.3}{\percent}$ & $\SI{-3.3}{\percent}$ & $\SI{4.1}{\percent}$ & $\SI{-1.6}{\percent}$\\
- $B_{hl=3,c}$ & \underline{$\SI{18.2}{\percent}$} & $\SI{-3.7}{\percent}$ & \underline{$\SI{4.1}{\percent}$} & $\SI{-1.6}{\percent}$\\
- $B_{hl=4,c}$ & $\SI{18.6}{\percent}$ & $\SI{-5.3}{\percent}$ & $\SI{4.3}{\percent}$ & $\SI{-1.9}{\percent}$\\\midrule
- $B_{hl=1,c}'$ & $\SI{19.3}{\percent}$ & $\SI{-3.9}{\percent}$ & $\SI{4.8}{\percent}$ & $\SI{-1.9}{\percent}$ \\
- $B_{hl=2,c}'$ & \underline{$\SI{17.5}{\percent}$} & $\SI{-4.1}{\percent}$ & \underline{$\SI{4.0}{\percent}$} & $\SI{-1.7}{\percent}$\\
- $B_{hl=3,c}'$ & $\SI{17.7}{\percent}$ & $\SI{-4.2}{\percent}$ & $\SI{4.1}{\percent}$ & $\SI{-1.6}{\percent}$\\
- $B_{hl=4,c}'$ & $\SI{17.8}{\percent}$ & $\SI{-6.1}{\percent}$ & $\SI{4.3}{\percent}$ & $\SI{-1.9}{\percent}$\\
- \bottomrule
- \end{tabular}
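- \caption{Error rates of the optimized recognizer systems. The systems
- $B_{hl=i,c}'$ were trained another $\num{1000}$ epochs with a learning rate
- of $\eta=0.05$. The columns labeled ``Change'' give the difference to the
- corresponding baseline system $B_{hl=i}$ in percentage points.}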
- \label{table:complex-recognizer-systems-evaluation}
- \end{table}
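- The top-$n$ errors in \cref{table:complex-recognizer-systems-evaluation} can
- be read as follows; the notation is introduced here for illustration and
- assumes the usual definition. For a test set of $N$ recordings $r_1, \dots,
- r_N$ with true classes $c_1, \dots, c_N$, let $T_n(r_i)$ be the set of the
- $n$~classes to which the classifier assigns the highest scores for $r_i$. Then
- \[
-   \mathrm{err}_n = \frac{1}{N} \,
-   \bigl| \{\, i : c_i \notin T_n(r_i) \,\} \bigr| .
- \]
- A top-3 error of $\SI{4.0}{\percent}$ thus means that for
- $\SI{4.0}{\percent}$ of the test recordings the correct symbol was not among
- the three highest-ranked classes.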