%!TEX root = write-math-ba-paper.tex
\section{Introduction}
On-line recognition makes use of the pen trajectory. One possible
representation of the data is a list of strokes, where each stroke is a
sequence of tuples $(x, y, t) \in \mathbb{R}^3$, $(x, y)$ is the position of
the pen on a canvas and $t$ is the time.
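
As a sketch of this representation (the symbols $R$, $s_i$, $n$ and $m_i$ are
notation assumed here for illustration only), a recording with $n$ strokes
could be written as
\[
  R = (s_1, \dots, s_n), \qquad
  s_i = \bigl((x_{i,1}, y_{i,1}, t_{i,1}), \dots,
              (x_{i,m_i}, y_{i,m_i}, t_{i,m_i})\bigr),
\]
where stroke $s_i$ consists of $m_i$ sampled points and the time stamps are
assumed to be non-decreasing within a stroke, that is
$t_{i,1} \le \dots \le t_{i,m_i}$.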

% On-line data was used to classify handwritten natural language text in many
% different variants. For example, the $\text{NPen}^{++}$ system classified
% cursive handwriting into English words by using hidden Markov models and neural
% networks~\cite{Manke1995}.
% Several systems for mathematical symbol recognition with on-line data have been
% described so far~\cite{Kosmala98,Mouchere2013}, but no standard test set
% existed to compare the results of different classifiers for single-symbol
% classification of mathematical symbols. The used symbols differed in most
% papers. This is unfortunate as the choice of symbols is crucial for the top-$n$
% error. For example, the symbols $o$, $O$, $\circ$ and $0$ are very similar and
% systems which know all those classes will certainly have a higher top-$n$ error
% than systems which only accept one of them. But not only the classes differed,
% also the used data to train and test had to be collected by each author again.

\cite{Kirsch} describes a system called Detexify which uses
time warping to classify on-line handwritten symbols and reports a top-3 error
of less than $\SI{10}{\percent}$ for a set of $\num{100}$~symbols. The author
also recently published the data on \url{https://github.com/kirel/detexify-data},
which was collected by a crowdsourcing approach via
\url{http://detexify.kirelabs.org}. Those recordings, as well as some
recordings collected by a similar approach via \url{http://write-math.com},
were merged into a single data set. The labels were semi-automatically checked
for correctness, and the data set was used to train and evaluate different
classifiers. A more detailed description of all used software, data and
experiments is given in~\cite{Thoma:2014}.

In this paper we present a baseline system for the classification of on-line
handwriting into $369$ classes, some of which are very similar. An optimized
classifier was developed which reduces the top-3 error by
$\SI{29.7}{\percent}$ relative to the baseline. This was achieved by using
better features and \gls{SLP}. The absolute improvement of each of those
changes over the baseline will also be shown.
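
The relative improvement is presumably meant in the usual sense. As a sketch,
with $e_{\mathrm{base}}$ and $e_{\mathrm{opt}}$ denoting the top-3 errors of
the baseline and of the optimized classifier (symbols introduced here for
illustration only), the reported figure would correspond to
\[
  \frac{e_{\mathrm{base}} - e_{\mathrm{opt}}}{e_{\mathrm{base}}} = 0.297,
\]
whereas an absolute improvement is the difference
$e_{\mathrm{base}} - e_{\mathrm{opt}}$, measured in percentage points.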

In the following, we give a general overview of the system design, describe
the data and implementation used, explain the algorithms we used to classify
the data, report the results of our experiments and present the optimized
recognizer we created.