
%!TEX root = write-math-ba-paper.tex
\section{Summary}
Four baseline recognition systems were tuned in a series of experiments and
their recognition capabilities were compared, both to build a recognition
system that can recognize 396 mathematical symbols with low error rates and to
evaluate which preprocessing steps and features improve the recognition rate.
All recognition systems were trained and evaluated with
$\num{\totalCollectedRecordings{}}$ recordings for \totalClassesAnalyzed{}
symbols. These recordings were collected by two crowdsourcing projects
(\href{http://detexify.kirelabs.org/classify.html}{Detexify} and
\href{write-math.com}{write-math.com}) and created with various devices. While
some recordings were created with standard touch devices such as tablets and
smartphones, others were created with a mouse.
\Glspl{MLP} were used for the classification task. Four baseline systems with
different numbers of hidden layers were used, as the number of hidden layers
influences the capabilities and problems of \glspl{MLP}.
All baseline systems used the same preprocessing queue. The recordings were
scaled and shifted as described in \cref{sec:preprocessing} and resampled with
linear interpolation so that every stroke had exactly 20~points, spread
equidistantly in time. The 80~($x,y$) coordinates of the first 4~strokes
were used to get exactly $160$ input features for every recording. The baseline
system $B_{hl=2}$ has a top-3 error of $\SI{5.7}{\percent}$.
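The resampling step can be sketched in a few lines of Python. This is a
minimal illustration, not the implementation of the evaluated systems; it
assumes that a stroke is a list of $(x, y, t)$ tuples with at least two
points.
\begin{verbatim}
def resample_stroke(stroke, n_points=20):
    """Resample a stroke (list of (x, y, t) tuples, len >= 2) to
    n_points points that are spread equidistantly in time."""
    t0, t1 = stroke[0][2], stroke[-1][2]
    step = (t1 - t0) / (n_points - 1)
    resampled, j = [], 0
    for i in range(n_points):
        t = t0 + i * step
        # advance to the segment that contains time t
        while j < len(stroke) - 2 and stroke[j + 1][2] < t:
            j += 1
        (xa, ya, ta), (xb, yb, tb) = stroke[j], stroke[j + 1]
        w = 0.0 if tb == ta else (t - ta) / (tb - ta)
        resampled.append((xa + w * (xb - xa), ya + w * (yb - ya), t))
    return resampled
\end{verbatim}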
Adding two slightly rotated variants for each recording and hence tripling the
training set made the systems $B_{hl=3}$ and $B_{hl=4}$ perform much worse, but
improved the performance of the smaller systems.
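This augmentation step can be sketched as follows. The rotation angle of
$\pm 3$ degrees and the rotation around the centroid of the recording are
assumptions made for this illustration; the text above only states that the
variants were slightly rotated.
\begin{verbatim}
import math

def rotated_variant(recording, degrees):
    """Rotate every point of a recording (a list of strokes, each a
    list of (x, y, t) tuples) around the centroid of all points."""
    points = [p for stroke in recording for p in stroke]
    cx = sum(x for x, y, t in points) / len(points)
    cy = sum(y for x, y, t in points) / len(points)
    phi = math.radians(degrees)
    def rot(x, y):
        return (cx + (x - cx) * math.cos(phi) - (y - cy) * math.sin(phi),
                cy + (x - cx) * math.sin(phi) + (y - cy) * math.cos(phi))
    return [[rot(x, y) + (t,) for x, y, t in stroke]
            for stroke in recording]

# two slightly rotated variants per recording triple the training set
def augmented(recording):
    return [recording, rotated_variant(recording, 3),
            rotated_variant(recording, -3)]
\end{verbatim}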
The global features re-curvature, ink, stroke count and aspect ratio improved
the systems $B_{hl=1}$--$B_{hl=3}$, whereas the stroke center point feature
made $B_{hl=2}$ perform worse.
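These global features are cheap to compute. The following sketch shows
plausible definitions; in particular, the re-curvature (height of a stroke
divided by its length) is stated here as an assumption about the exact
implementation.
\begin{verbatim}
import math

def ink(recording):
    """Total length of all strokes, i.e. the amount of ink used."""
    return sum(math.hypot(xb - xa, yb - ya)
               for stroke in recording
               for (xa, ya, _), (xb, yb, _) in zip(stroke, stroke[1:]))

def aspect_ratio(recording):
    """Width divided by height of the bounding box."""
    xs = [x for stroke in recording for x, y, t in stroke]
    ys = [y for stroke in recording for x, y, t in stroke]
    return (max(xs) - min(xs)) / ((max(ys) - min(ys)) or 1)

def re_curvature(stroke):
    """Assumed definition: stroke height divided by stroke length."""
    ys = [y for x, y, t in stroke]
    length = ink([stroke])
    return (max(ys) - min(ys)) / length if length else 0.0
\end{verbatim}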
Denoising auto-encoders were evaluated as one way to use pretraining, but this
increased the error rate notably. However, \acrlong{SLP} improved the
performance considerably.
The stroke connection algorithm was added to the preprocessing steps of the
baseline system, and the re-curvature feature, the ink feature, the stroke
count and the aspect ratio were added to the features. The training setup of
the baseline system was changed to \acrlong{SLP} and the resulting model was
trained again with a lower learning rate. This optimized recognizer
$B_{hl=2,c}'$ had a top-3 error of $\SI{4.0}{\percent}$. This means that the
top-3 error dropped by over $\num{1.7}$ percentage points in comparison to the
baseline system $B_{hl=2}$.
A top-3 error of $\SI{4.0}{\percent}$ makes the system usable for symbol
lookup. It could also be used as a starting point for the development of a
multiple-symbol classifier.
The aim of this work was to develop a symbol recognition system which is easy
to use, fast and has high recognition rates, as well as to evaluate ideas for
single-symbol classifiers. Some of those goals were reached. The recognition
system $B_{hl=2,c}'$ evaluates new recordings in a fraction of a second and has
acceptable recognition rates.
% Many algorithms were evaluated. However, there are still many other
% algorithms which could be evaluated and, at the time of this work, the best
% classifier $B_{hl=2,c}'$ is only available through the Python package
% \texttt{hwrt}. It is planned to add a web version of that classifier online.
\section{Optimized Recognizer}
All preprocessing steps and features that proved useful were combined to
create an optimized recognizer.
All resulting models performed much better than every system evaluated before.
The results of this experiment show that single-symbol recognition with
\totalClassesAnalyzed{} classes and standard touch devices as well as the
mouse can be done with a top-1 error rate of $\SI{18.6}{\percent}$ and a top-3
error of $\SI{4.1}{\percent}$. This was
achieved by a \gls{MLP} with a $167{:}500{:}500{:}\totalClassesAnalyzed{}$ topology.
It used the stroke connection algorithm to connect strokes whose ends were less
than $\SI{10}{\pixel}$ apart, scaled each recording to a unit square and
shifted it as described in \cref{sec:preprocessing}. After that, a linear
resampling step was applied to the first 4 strokes to resample them to
20 points each. All other strokes were discarded.
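A minimal sketch of such a stroke connection step follows. Only the
$\SI{10}{\pixel}$ threshold is taken from the text; the greedy merging of
consecutive strokes is an assumption about the implementation.
\begin{verbatim}
import math

def connect_strokes(strokes, threshold=10.0):
    """Merge consecutive strokes whose end and start points are less
    than `threshold` pixels apart (greedy sketch)."""
    connected = [strokes[0]]
    for stroke in strokes[1:]:
        xe, ye, _ = connected[-1][-1]   # end of the previous stroke
        xs, ys, _ = stroke[0]           # start of the current stroke
        if math.hypot(xs - xe, ys - ye) < threshold:
            connected[-1] = connected[-1] + stroke
        else:
            connected.append(stroke)
    return connected
\end{verbatim}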
\goodbreak
The 167 features were\mynobreakpar%
\begin{itemize}
\item the first 4 strokes with 20 points per stroke, resulting in 160
      features,
\item the re-curvature for the first 4 strokes,
\item the ink,
\item the number of strokes and
\item the aspect ratio of the bounding box.
\end{itemize}
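The 167 dimensions therefore decompose as
$4 \cdot 20 \cdot 2 + 4 + 1 + 1 + 1 = 167$. Assembling the feature vector can
be sketched as follows; the sketch reuses the hypothetical helpers
\texttt{ink}, \texttt{aspect\_ratio} and \texttt{re\_curvature} from above and
assumes that every preprocessed recording has exactly 4 strokes (shorter
recordings would need padding, which is not shown).
\begin{verbatim}
def feature_vector(recording):
    """Assemble the 167 input features of a preprocessed recording
    (first 4 strokes, 20 points per stroke)."""
    features = []
    for stroke in recording[:4]:
        for x, y, t in stroke:
            features += [x, y]                # 4 * 20 * 2 = 160
    features += [re_curvature(s) for s in recording[:4]]  # + 4
    features.append(ink(recording))           # + 1
    features.append(len(recording))           # + 1
    features.append(aspect_ratio(recording))  # + 1 -> 167 in total
    return features
\end{verbatim}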
\Gls{SLP} was applied with $\num{1000}$ epochs per layer, a
learning rate of $\eta=0.1$ and a momentum of $\alpha=0.1$. After that, the
complete model was trained again for $1000$ epochs with standard mini-batch
gradient descent, resulting in the systems $B_{hl=1,c}$ -- $B_{hl=4,c}$.
After the models $B_{hl=1,c}$ -- $B_{hl=4,c}$ had been trained for the first
$1000$ epochs, they were trained again for $\num{1000}$ epochs with a learning
rate of $\eta = 0.05$, resulting in the systems $B_{hl=1,c}'$ -- $B_{hl=4,c}'$.
\Cref{table:complex-recognizer-systems-evaluation} shows that
this improved the classifiers again.
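The training schedule can be summarized in pseudocode. The \texttt{MLP} and
\texttt{train} helpers below are hypothetical and only the order of the steps
reflects the text; this is a sketch of one plausible reading of \gls{SLP}, not
the actual training code.
\begin{verbatim}
# Hypothetical helpers: MLP(layers) builds a network, train(...) runs
# mini-batch gradient descent. Only the schedule reflects the text.

# supervised layer-wise pretraining: grow the network one hidden
# layer at a time and train for 1000 epochs after each new layer
net = MLP([167, 396])
for width in [500, 500]:
    net.insert_hidden_layer_before_output(width)  # hypothetical
    train(net, data, epochs=1000, learning_rate=0.1, momentum=0.1)

# fine-tune the complete 167:500:500:396 model -> B_{hl=2,c}
train(net, data, epochs=1000, learning_rate=0.1, momentum=0.1)

# continue with a lower learning rate -> B_{hl=2,c}'
train(net, data, epochs=1000, learning_rate=0.05)
\end{verbatim}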
\begin{table}[htb]
\centering
\begin{tabular}{lrrrr}
\toprule
\multirow{2}{*}{System} & \multicolumn{4}{c}{Classification error}\\
\cmidrule(l){2-5}
& Top-1 & Change & Top-3 & Change\\\midrule
$B_{hl=1,c}$ & $\SI{21.0}{\percent}$ & $\SI{-2.2}{\percent}$ & $\SI{5.2}{\percent}$ & $\SI{-1.5}{\percent}$\\
$B_{hl=2,c}$ & $\SI{18.3}{\percent}$ & $\SI{-3.3}{\percent}$ & $\SI{4.1}{\percent}$ & $\SI{-1.6}{\percent}$\\
$B_{hl=3,c}$ & \underline{$\SI{18.2}{\percent}$} & $\SI{-3.7}{\percent}$ & \underline{$\SI{4.1}{\percent}$} & $\SI{-1.6}{\percent}$\\
$B_{hl=4,c}$ & $\SI{18.6}{\percent}$ & $\SI{-5.3}{\percent}$ & $\SI{4.3}{\percent}$ & $\SI{-1.9}{\percent}$\\\midrule
$B_{hl=1,c}'$ & $\SI{19.3}{\percent}$ & $\SI{-3.9}{\percent}$ & $\SI{4.8}{\percent}$ & $\SI{-1.9}{\percent}$ \\
$B_{hl=2,c}'$ & \underline{$\SI{17.5}{\percent}$} & $\SI{-4.1}{\percent}$ & \underline{$\SI{4.0}{\percent}$} & $\SI{-1.7}{\percent}$\\
$B_{hl=3,c}'$ & $\SI{17.7}{\percent}$ & $\SI{-4.2}{\percent}$ & $\SI{4.1}{\percent}$ & $\SI{-1.6}{\percent}$\\
$B_{hl=4,c}'$ & $\SI{17.8}{\percent}$ & $\SI{-6.1}{\percent}$ & $\SI{4.3}{\percent}$ & $\SI{-1.9}{\percent}$\\
\bottomrule
\end{tabular}
\caption{Error rates of the optimized recognizer systems. The systems
$B_{hl=i,c}'$ were trained another $\num{1000}$ epochs with a learning rate
of $\eta=0.05$.}
\label{table:complex-recognizer-systems-evaluation}
\end{table}