%!TEX root = write-math-ba-paper.tex
\section{Algorithms}
\subsection{Preprocessing}\label{sec:preprocessing}
Preprocessing in symbol recognition is done to improve the quality and
expressive power of the data. It makes follow-up tasks like feature extraction
and classification easier, more effective or faster. It does so by resolving
errors in the input data, reducing duplicate information and removing
irrelevant information.

Preprocessing algorithms fall into two groups: normalization and noise
reduction algorithms.

A very important normalization algorithm in single-symbol recognition is
\textit{scale-and-shift}~\cite{Thoma:2014}. It scales the recording so that
its bounding box fits into a unit square. As the aspect ratio of a recording is
almost never 1:1, only one dimension will fit exactly in the unit square. For
this paper, it was chosen to shift the recording along its bigger dimension
into the $[0,1] \times [0,1]$ unit square. After that, the recording is
shifted along its smaller dimension such that its bounding box is centered
around zero.
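
A minimal sketch of this scale-and-shift normalization could look as follows.
The data layout (a recording as a list of strokes, each stroke a list of
$(x, y, t)$ tuples) is an assumption made for illustration and not a format
prescribed by this paper.

\begin{verbatim}
def scale_and_shift(recording):
    # recording: list of strokes; each stroke: list of (x, y, t) tuples
    xs = [x for stroke in recording for (x, y, t) in stroke]
    ys = [y for stroke in recording for (x, y, t) in stroke]
    min_x, min_y = min(xs), min(ys)
    width, height = max(xs) - min_x, max(ys) - min_y
    factor = 1.0 / max(width, height) if max(width, height) > 0 else 1.0

    # The bigger dimension is shifted into [0, 1]; the smaller dimension
    # is centered around zero.
    shift_x = -(width * factor) / 2.0 if width < height else 0.0
    shift_y = -(height * factor) / 2.0 if height <= width else 0.0

    return [[((x - min_x) * factor + shift_x,
              (y - min_y) * factor + shift_y,
              t)
             for (x, y, t) in stroke]
            for stroke in recording]
\end{verbatim}
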
Another normalization preprocessing algorithm is
resampling~\cite{Guyon91,Manke01}. As the data points on the pen trajectory are
generated asynchronously and with different time resolutions depending on the
hardware and software used, it is desirable to resample the recordings to have
points spread equally in time for every recording. This was done by linear
interpolation of the $(x,t)$ and $(y,t)$ sequences and taking a fixed number
of equally spaced points per stroke.
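
A minimal sketch of such a resampling with linear interpolation, using NumPy
and the assumed data layout from above, might look as follows. The number of
points per stroke is a free parameter and not a value taken from this paper.

\begin{verbatim}
import numpy as np

def resample_stroke(stroke, n_points=20):
    # stroke: list of (x, y, t) tuples
    xs = np.array([x for (x, y, t) in stroke], dtype=float)
    ys = np.array([y for (x, y, t) in stroke], dtype=float)
    ts = np.array([t for (x, y, t) in stroke], dtype=float)

    # n_points equally spaced points in time over the stroke duration
    new_ts = np.linspace(ts[0], ts[-1], n_points)

    # linear interpolation of the (x, t) and (y, t) sequences
    new_xs = np.interp(new_ts, ts, xs)
    new_ys = np.interp(new_ts, ts, ys)
    return list(zip(new_xs, new_ys, new_ts))
\end{verbatim}
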
\textit{Stroke connection} is a noise reduction algorithm which is mentioned
in~\cite{Tappert90}. Sometimes the hardware detects that the user lifted the
pen although the user certainly did not do so. This can be detected
by measuring the Euclidean distance between the end of one stroke and the
beginning of the next stroke. If this distance is below a threshold, the
strokes are connected.
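
The stroke connection step could be sketched as follows; the threshold value
is only a placeholder and not a value taken from this paper.

\begin{verbatim}
import math

def connect_strokes(strokes, threshold=0.05):
    # strokes: list of strokes, each a list of (x, y, t) tuples
    connected = [strokes[0]]
    for stroke in strokes[1:]:
        (x1, y1, _), (x2, y2, _) = connected[-1][-1], stroke[0]
        distance = math.hypot(x2 - x1, y2 - y1)
        if distance < threshold:
            connected[-1] = connected[-1] + stroke  # merge with previous
        else:
            connected.append(stroke)
    return connected
\end{verbatim}
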
Due to a limited resolution of the recording device and due to erratic
handwriting, the pen trajectory might not be smooth. One way to smooth is to
replace each point by a weighted average of its own coordinates and its
neighbors' coordinates. Another way to do smoothing is to reduce the number of
points with the Douglas-Peucker algorithm to those points that are more
relevant for the overall shape of a stroke and then interpolate the stroke
between those points.
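
A minimal sketch of the first variant, weighted-average smoothing, could look
like this; the weights $(1, 2, 1)$ are an illustrative choice, not values from
this paper.

\begin{verbatim}
def smooth_stroke(stroke, weights=(1, 2, 1)):
    # stroke: list of (x, y, t) tuples
    if len(stroke) < 3:
        return stroke
    w_prev, w_self, w_next = weights
    total = float(w_prev + w_self + w_next)
    smoothed = [stroke[0]]  # keep the first point unchanged
    for prev, cur, nxt in zip(stroke, stroke[1:], stroke[2:]):
        x = (w_prev * prev[0] + w_self * cur[0] + w_next * nxt[0]) / total
        y = (w_prev * prev[1] + w_self * cur[1] + w_next * nxt[1]) / total
        smoothed.append((x, y, cur[2]))
    smoothed.append(stroke[-1])  # keep the last point unchanged
    return smoothed
\end{verbatim}
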
The Douglas-Peucker stroke simplification algorithm is usually used in
cartography to simplify the shape of roads. It works recursively to find a
subset of points of a stroke that is simpler and still similar to the original
shape. The algorithm adds the first and the last point $p_1$ and $p_n$ of a
stroke to the simplified set of points $S$. Then it searches for the point
$p_i$ in between that has maximum distance from the line $p_1 p_n$. If this
distance is above a threshold $\varepsilon$, the point $p_i$ is added to $S$.
The algorithm is then applied recursively to $p_1 p_i$ and $p_i p_n$. It is
described as \enquote{Algorithm 1} in~\cite{Visvalingam1990}.
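
A minimal sketch of the Douglas-Peucker simplification described above,
operating on the $(x, y)$ points of a single stroke, is given below; the
subsequent interpolation between the remaining points is omitted.

\begin{verbatim}
import math

def point_line_distance(p, a, b):
    # perpendicular distance of point p from the line through a and b
    (px, py), (ax, ay), (bx, by) = p, a, b
    if (ax, ay) == (bx, by):
        return math.hypot(px - ax, py - ay)
    num = abs((by - ay) * px - (bx - ax) * py + bx * ay - by * ax)
    return num / math.hypot(bx - ax, by - ay)

def douglas_peucker(points, epsilon):
    # points: list of (x, y) tuples of one stroke
    d_max, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = point_line_distance(points[i], points[0], points[-1])
        if d > d_max:
            d_max, index = d, i
    if d_max > epsilon:
        # recurse on both parts; drop the duplicated split point once
        left = douglas_peucker(points[:index + 1], epsilon)
        right = douglas_peucker(points[index:], epsilon)
        return left[:-1] + right
    return [points[0], points[-1]]
\end{verbatim}
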
\subsection{Features}\label{sec:features}
Features can be \textit{global}, that is, calculated for the complete
recording or for complete strokes. Other features are calculated for single
points on the pen trajectory and are called \textit{local}.

Global features include the \textit{number of strokes} in a recording, the
\textit{aspect ratio} of a recording's bounding box and the
\textit{ink} used for a recording. The ink feature is calculated as the
combined length of all strokes. The re-curvature, which was
introduced in~\cite{Huang06}, is defined as
\[\text{re-curvature}(stroke) := \frac{\text{height}(stroke)}{\text{length}(stroke)}\]
and is a stroke-global feature.
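
A minimal sketch of how the ink, aspect ratio and re-curvature features could
be computed with the assumed data layout from above:

\begin{verbatim}
import math

def ink(recording):
    # combined length of all strokes
    total = 0.0
    for stroke in recording:
        for (x1, y1, _), (x2, y2, _) in zip(stroke, stroke[1:]):
            total += math.hypot(x2 - x1, y2 - y1)
    return total

def aspect_ratio(recording):
    xs = [x for stroke in recording for (x, y, t) in stroke]
    ys = [y for stroke in recording for (x, y, t) in stroke]
    height = max(ys) - min(ys)
    return (max(xs) - min(xs)) / height if height > 0 else 0.0

def re_curvature(stroke):
    # height(stroke) / length(stroke) as defined above
    ys = [y for (x, y, t) in stroke]
    height = max(ys) - min(ys)
    length = sum(math.hypot(x2 - x1, y2 - y1)
                 for (x1, y1, _), (x2, y2, _) in zip(stroke, stroke[1:]))
    return height / length if length > 0 else 0.0
\end{verbatim}
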
The simplest local feature is the coordinate of the point itself. Speed,
curvature and a local small-resolution bitmap around the point, which was
introduced by Manke, Finke and Waibel in~\cite{Manke1995}, are other local
features.
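
As an example of a local feature, the speed at a point can be approximated
with finite differences over the neighboring points; the handling of the
stroke ends below is an arbitrary choice for illustration.

\begin{verbatim}
import math

def speed(stroke, i):
    # speed at point i of a stroke of (x, y, t) tuples
    j, k = max(i - 1, 0), min(i + 1, len(stroke) - 1)
    (x1, y1, t1), (x2, y2, t2) = stroke[j], stroke[k]
    dt = t2 - t1
    return math.hypot(x2 - x1, y2 - y1) / dt if dt > 0 else 0.0
\end{verbatim}
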
\subsection{Multilayer Perceptrons}\label{sec:mlp-training}
\Glspl{MLP} are explained in detail in~\cite{Mitchell97}. They can have
different numbers of hidden layers; the number of neurons per layer and the
activation functions can also be varied. The learning algorithm is
parameterized by the learning rate $\eta \in (0, \infty)$, the momentum
$\alpha \in [0, \infty)$ and the number of epochs.

The topology of \glspl{MLP} will be denoted in the following by separating the
number of neurons per layer with colons. For example, the notation
$160{:}500{:}500{:}500{:}369$ means that the input layer gets 160~features,
there are three hidden layers with 500~neurons per layer and one output layer
with 369~neurons.

Training of \glspl{MLP} can be executed in various ways, for example
with \acrfull{SLP}. In case of a \gls{MLP} with the topology
$160{:}500{:}500{:}500{:}369$, \gls{SLP} works as follows: At first, a \gls{MLP}
with one hidden layer ($160{:}500{:}369$) is trained. Then the output layer is
discarded, a new hidden layer and a new output layer are added and the network
is trained again, resulting in a $160{:}500{:}500{:}369$ \gls{MLP}. The output
layer is discarded again, a new hidden layer and a new output layer are added
and the training is executed again.
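
A rough sketch of this supervised layer-wise pretraining scheme is given
below. Keras is used purely for illustration and is not the toolkit used in
this paper; the sigmoid activation, the plain SGD optimizer and the number of
epochs are assumptions.

\begin{verbatim}
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

def train_slp(x_train, y_train, epochs=10):
    # start with one hidden layer: 160:500:369
    model = Sequential([
        Dense(500, activation='sigmoid', input_shape=(160,)),
        Dense(369, activation='softmax'),
    ])
    model.compile(optimizer='sgd', loss='categorical_crossentropy')
    model.fit(x_train, y_train, epochs=epochs)

    for _ in range(2):
        # discard the output layer, keep the trained hidden layers,
        # add a new hidden layer and a new output layer, train again
        model = Sequential(model.layers[:-1] + [
            Dense(500, activation='sigmoid'),
            Dense(369, activation='softmax'),
        ])
        model.compile(optimizer='sgd', loss='categorical_crossentropy')
        model.fit(x_train, y_train, epochs=epochs)
    return model
\end{verbatim}
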
Denoising auto-encoders are another way of pretraining. An
\textit{auto-encoder} is a neural network that is trained to restore its input.
This means the number of input neurons is equal to the number of output
neurons. The weights define an \textit{encoding} of the input that allows
restoring the input. As the neural network finds the encoding by itself, it is
called an auto-encoder. If the hidden layer is smaller than the input layer, it
can be used for dimensionality reduction~\cite{Hinton1989}. If only one hidden
layer with linear activation functions is used, then the hidden layer contains
the principal components after training~\cite{Duda2001}.
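
A minimal sketch of a plain auto-encoder with one hidden layer, again using
Keras only for illustration; the hidden layer size and the mean squared error
loss are assumptions.

\begin{verbatim}
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

autoencoder = Sequential([
    Dense(100, activation='sigmoid', input_shape=(160,)),  # encoding
    Dense(160, activation='linear'),                       # reconstruction
])
autoencoder.compile(optimizer='sgd', loss='mse')
# trained to restore its input: the input is also the target
# autoencoder.fit(x_train, x_train, epochs=10)
\end{verbatim}
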
Denoising auto-encoders are a variant introduced in~\cite{Vincent2008} that
is more robust to partial corruption of the input features. Robustness is
achieved by adding noise to the input features during training.
There are multiple ways in which noise can be added: Gaussian noise and
randomly masking elements with zero are two possibilities.
\cite{Deeplearning-Denoising-AE} describes how such a denoising auto-encoder
with masking noise can be implemented. The corruption $\varkappa \in [0, 1)$ is
the probability of a feature being masked.
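
A minimal sketch of the masking corruption, where each feature is set to zero
with probability $\varkappa$:

\begin{verbatim}
import numpy as np

def mask_noise(features, corruption, rng=None):
    # features: NumPy array; corruption: probability of masking a feature
    rng = rng or np.random.default_rng()
    mask = rng.random(features.shape) >= corruption
    return features * mask
\end{verbatim}
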