
%!TEX root = main.tex
\section{Introduction}
Publicly available datasets have helped the computer vision community to
compare new algorithms and develop applications. MNIST~\cite{LeNet-5} in
particular was used thousands of times to train and evaluate models
for classification. However, even rather simple models consistently achieve about
$\SI{99.2}{\percent}$ accuracy on MNIST~\cite{TF-MNIST-2016}. The best models
classify everything except for about 20~instances correctly. This makes
meaningful statements about improvements in classifiers hard. Possible reasons
why current models perform so well on MNIST are:
\begin{enumerate*}
\item MNIST has only 10~classes,
\item there are very few (probably no) labeling errors in MNIST,
\item every class has \num{6000}~training samples,
\item the feature dimensionality is comparatively low.
\end{enumerate*}
Also, applications which need to recognize only Arabic numerals are rare.
Similar to MNIST, \dbName{} is of very low resolution. In contrast to MNIST,
the \dbNameVersion~dataset contains \dbTotalClasses~classes, including Arabic
numerals and Latin characters. Furthermore, \dbNameVersion{} has far fewer
recordings per class than MNIST and is only black and white, whereas
MNIST is grayscale.
\dbName{} could be used to train models for semantic segmentation of
non-cursive handwritten documents like mathematical notes or forms.
\section{Terminology}
A \textit{symbol} is an atomic semantic entity which has exactly one visual
appearance when it is handwritten. Examples of symbols are:
$\alpha, \propto, \cdot, x, \int, \sigma, \dots$
%\footnote{The first symbol is an \verb+\alpha+, the second one is a \verb+\propto+.}
While a symbol is a single semantic entity with a given visual appearance, a
\textit{glyph} is a single typesetting entity. Symbols, glyphs and \LaTeX{} commands do
not map one-to-one:
\begin{itemize}
\item Two different symbols can have the same glyph. For example, the symbols
\verb+\sum+ and \verb+\Sigma+ both render to $\Sigma$, but they have different
semantics and hence they are different symbols.
\item Two different glyphs can correspond to the same semantic entity. An example is
\verb+\varphi+ ($\varphi$) and \verb+\phi+ ($\phi$): Both represent the small
Greek letter \enquote{phi}, but they exist in two different variants. Hence
\verb+\varphi+ and \verb+\phi+ are two different symbols.
\item Examples of different \LaTeX{} commands that represent the same symbol are
\verb+\alpha+ ($\alpha$) and \verb+\upalpha+ ($\upalpha$): Both have the same
semantics and are hand-drawn the same way. This is the case for all \verb+\up+
variants of Greek letters.
\end{itemize}
All elements of the data set are called \textit{recordings} in the following.
\section{How HASY was created}
\dbName{} is derived from the HWRT dataset which was first used and described
in~\cite{Thoma:2014}. HWRT is an on-line recognition dataset, meaning it does
not contain the handwritten symbols as images, but as point sequences. Hence
HWRT contains strictly more information than \dbName. The larger dimension
of each recording's bounding box was scaled to \SI{32}{\pixel}. The image
was then centered within the $\SI{32}{\pixel} \times \SI{32}{\pixel}$ bounding
box, as sketched below.
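The following sketch illustrates this rendering step. It is not the original
code; a recording is assumed to be a list of strokes, each a list of
$(x, y)$ points, and all names are chosen for illustration only:
\begin{verbatim}
# Illustrative sketch of the rendering step (not the original code).
# A recording is assumed to be a list of strokes; a stroke is a list
# of (x, y) points.
from PIL import Image, ImageDraw

def render(strokes, size=32):
    xs = [x for stroke in strokes for x, y in stroke]
    ys = [y for stroke in strokes for x, y in stroke]
    width, height = max(xs) - min(xs), max(ys) - min(ys)
    scale = (size - 1) / max(width, height, 1)  # scale larger dimension
    # Offsets which center the scaled symbol in the size x size box
    off_x = (size - width * scale) / 2 - min(xs) * scale
    off_y = (size - height * scale) / 2 - min(ys) * scale
    img = Image.new("1", (size, size), color=255)  # white background
    draw = ImageDraw.Draw(img)
    for stroke in strokes:
        draw.line([(x * scale + off_x, y * scale + off_y)
                   for x, y in stroke], fill=0)  # black ink
    return img
\end{verbatim}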
\begin{figure}[h]
\centering
\includegraphics*[width=\linewidth, keepaspectratio]{figures/sample-images.png}
\caption{100 recordings of the \dbNameVersion{} data set.}
\label{fig:100-data-items}
\end{figure}
HWRT contains exactly the same recordings and classes as \dbName, but \dbName{}
is rendered to make it easy to use.

HWRT and hence \dbName{} is a merged dataset. $\SI{91.93}{\percent}$ of HWRT
was collected by Detexify~\cite{Kirsch,Kirsch2014}. The remaining recordings
were collected by \href{http://write-math.com}{write-math.com}. Both
projects aim at helping users to find \LaTeX{} commands in cases where the
users know how to write the symbol, but not the symbol's name. The user writes
the symbol on a blank canvas in the browser (either via touch devices or with a
mouse). Then the website shows the top-$k$ symbols the user could have
meant. The user then clicks on the correct symbol to accept it as the
correct symbol. On \href{http://write-math.com}{write-math.com}, other users
can also suggest which symbol could be the correct one.

After collecting the data, Martin Thoma manually inspected each recording. This
manual inspection is a necessary step as anonymous web users could submit any
drawing they wanted for any symbol. This includes many creative recordings as
shown in~\cite{Kirsch,Thoma:2014} as well as loose associations. In some cases,
the correct label was unambiguous and the recording was relabeled. In other
cases, the recordings were removed from the data set.

It is not possible to determine the exact number of people who contributed
handwritten symbols to the Detexify part of the dataset. The part which was
collected with \href{http://write-math.com}{write-math.com} stems from
477~user~IDs. Although user IDs are given in the dataset, they are not
reliable. On the one hand, all Detexify data has the user ID 16925,
although many users contributed to it. Also, some users lent their phone to
others while being logged in to show how write-math.com works. This leads to
multiple users per user ID. On the other hand, some users don't register and
use write-math.com multiple times. This can lead to multiple user IDs for one
person.
\section{Classes}
The \dbNameVersion~dataset contains \dbTotalClasses~classes. Those classes include the
Latin uppercase and lowercase characters (\verb+A-Z+, \verb+a-z+), the Arabic
numerals (\verb+0-9+), 32~different types of arrows, Fraktur and calligraphic
Latin characters, brackets and more. See \cref{table:symbols-of-db-0,table:symbols-of-db-1,table:symbols-of-db-2,table:symbols-of-db-3,table:symbols-of-db-4,table:symbols-of-db-5,table:symbols-of-db-6,table:symbols-of-db-7,table:symbols-of-db-8} for more information.
\section{Data}
The \dbNameVersion~dataset contains \dbTotalInstances{} black and white images
of the size $\SI{32}{\pixel} \times \SI{32}{\pixel}$. Each image is labeled
with one of \dbTotalClasses~labels. An example of 100~elements of the
\dbNameVersion{} data set is shown in~\cref{fig:100-data-items}.
The average fraction of black pixels is \SI{16}{\percent}, but it is highly
class-dependent, ranging from \SI{3.7}{\percent} for \enquote{$\dotsc$} to
\SI{59.2}{\percent} for \enquote{$\blacksquare$}.
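The per-class statistics above can be reproduced with a short script. The
following sketch assumes the labels ship as a CSV file with \verb+path+ and
\verb+symbol_id+ columns; this file layout is an assumption made for
illustration:
\begin{verbatim}
# Sketch: average fraction of black pixels per class (assumed CSV layout).
import csv
from collections import defaultdict

import numpy as np
from PIL import Image

sums, counts = defaultdict(float), defaultdict(int)
with open("hasy-data-labels.csv") as f:
    for row in csv.DictReader(f):
        img = np.array(Image.open(row["path"]).convert("1"))
        sums[row["symbol_id"]] += (img == 0).mean()  # 0 = black pixel
        counts[row["symbol_id"]] += 1

for label in sorted(sums, key=lambda c: sums[c] / counts[c]):
    print(label, sums[label] / counts[label])
\end{verbatim}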
The ten classes with the most recordings are
\[\int, \sum, \infty, \alpha, \xi, \equiv, \partial, \mathds{R}, \in, \square\]
Together, those classes have \num{26780} recordings and thus account for
\SI{16}{\percent} of the data set. 47~classes have more than \num{1000}
recordings each. The number of recordings of the remaining classes is distributed
as visualized in~\cref{fig:class-data-distribution}.
\begin{figure}[h]
\centering
\includegraphics*[width=\linewidth, keepaspectratio]{figures/data-dist}
\caption{Distribution of the data among classes. 47~classes with
more than \num{1000} recordings are not shown.}
\label{fig:class-data-distribution}
\end{figure}
A weakness of \dbNameVersion{} is the amount of available data per class. For
some classes, there are only 51~elements in the test set.
The data has $32\cdot 32 = 1024$ features with values in $\Set{0, 255}$.
As~\cref{table:pca-explained-variance} shows, \SI{32}{\percent} of the features
can explain~\SI{90}{\percent} of the variance, \SI{54}{\percent} of the
features explain \SI{95}{\percent} of the variance and \SI{86}{\percent} of the
features explain \SI{99}{\percent} of the variance.
\begin{table}[h]
\centering
\begin{tabular}{lccc}
\toprule
Principal Components & 331 & 551 & 882 \\
Explained Variance & \SI{90}{\percent} & \SI{95}{\percent} & \SI{99}{\percent} \\
\bottomrule
\end{tabular}
\caption{The number of principal components necessary to explain
\SI{90}{\percent}, \SI{95}{\percent} and \SI{99}{\percent}
of the variance.}
\label{table:pca-explained-variance}
\end{table}
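The numbers in~\cref{table:pca-explained-variance} can be checked with a short
script. The following is a minimal sketch using sklearn, where \verb+X+ is
assumed to be the $n \times 1024$ matrix of flattened images:
\begin{verbatim}
# Sketch: principal components needed to reach a variance threshold.
import numpy as np
from sklearn.decomposition import PCA

pca = PCA()  # keep all 1024 components
pca.fit(X)   # X: (n_samples, 1024) array of flattened images
cumulative = np.cumsum(pca.explained_variance_ratio_)
for threshold in (0.90, 0.95, 0.99):
    # searchsorted returns a 0-based index, hence the +1
    print(threshold, int(np.searchsorted(cumulative, threshold)) + 1)
\end{verbatim}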
The Pearson correlation coefficient was calculated for all features. The
closer two pixels are to each other, the more correlated the corresponding
features are, as one can see in~\cref{fig:feature-correlation}. The block-like
structure with a period of 32~features comes from the fact that the features
were flattened row by row for this visualization.
The secondary diagonal, offset by 32~features, shows the correlation between
features which are one pixel apart vertically in the
image. Those correlations are expected as symbols are written with continuous
lines. Less easy to explain are the correlations between high-index
and low-index features in the upper right corner of the figure.
Those correspond to correlations of features in the upper left corner with
features in the lower right corner of the image. One explanation is that
symbols which have a line in the upper left corner are likely to be
$\blacksquare$.
\begin{figure}[h]
\centering
\includegraphics*[width=\linewidth, keepaspectratio]{figures/feature-correlation.pdf}
\caption{Correlation of all $32 \cdot 32 = 1024$ features. The diagonal
shows the correlation of a feature with itself.}
\label{fig:feature-correlation}
\end{figure}
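The correlation matrix in~\cref{fig:feature-correlation} can be reproduced
directly from the flattened images. A minimal sketch, again with \verb+X+ as
the $n \times 1024$ data matrix, follows:
\begin{verbatim}
# Sketch: Pearson correlation of all 1024 pixel features.
import matplotlib.pyplot as plt
import numpy as np

# Note: pixels which are constant over the whole dataset (e.g. always
# white corners) have zero variance and yield NaN rows/columns.
corr = np.corrcoef(X.astype(float), rowvar=False)  # shape (1024, 1024)
plt.imshow(corr, vmin=-1, vmax=1)
plt.colorbar()
plt.savefig("feature-correlation.pdf")
\end{verbatim}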
\section{Classification Challenge}
\subsection{Evaluation}
\dbName{} defines 10~folds which should be used for calculating the accuracy
of any classifier evaluated on \dbName{} as follows:
\begin{algorithm}[H]
\begin{algorithmic}
\Function{CrossValidation}{Folds $F$}
\State $D \gets \cup_{i=1}^{10} F_i$\Comment{Complete Dataset}
\For{($i=1$; $\;i \leq 10$; $\;i$++)}
\State $A \gets D \setminus F_i$\Comment{Train set}
\State $B \gets F_i$\Comment{Test set}
\State Train Classifier $C_i$ on $A$
\State Calculate accuracy $a_i$ of $C_i$ on $B$
\EndFor
\State \Return ($\frac{1}{10}\sum_{i=1}^{10} a_i$, $\min_i(a_i)$, $\max_i(a_i)$)
\EndFunction
\end{algorithmic}
\caption{Calculate the mean accuracy, the minimum accuracy, and the maximum
accuracy with 10-fold cross-validation}
\label{alg:seq1}
\end{algorithm}
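The evaluation protocol of \cref{alg:seq1} translates directly into code. The
following sketch assumes \verb+folds+ is a list of ten \verb+(X, y)+ pairs and
\verb+make_classifier()+ returns a fresh, untrained model with the usual
sklearn \verb+fit+/\verb+predict+ interface; both names are placeholders:
\begin{verbatim}
# Sketch of the 10-fold evaluation protocol defined above.
import numpy as np

def cross_validation(folds, make_classifier):
    accuracies = []
    for i, (X_test, y_test) in enumerate(folds):
        # Train set: union of all folds except fold i
        X_train = np.concatenate(
            [X for j, (X, y) in enumerate(folds) if j != i])
        y_train = np.concatenate(
            [y for j, (X, y) in enumerate(folds) if j != i])
        clf = make_classifier()
        clf.fit(X_train, y_train)
        accuracies.append((clf.predict(X_test) == y_test).mean())
    return np.mean(accuracies), min(accuracies), max(accuracies)
\end{verbatim}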
\subsection{Model Baselines}
Eight standard algorithms were evaluated by their accuracy on the raw image
data. The neural networks were implemented with
TensorFlow~0.12.1~\cite{tensorflow2015-whitepaper}. All other algorithms were
implemented with sklearn~0.18.1~\cite{scikit-learn}. \Cref{table:classifier-results}
shows the results of the models trained and tested on MNIST and on
\dbNameVersion{}:
\begin{table}[h]
\centering
\begin{tabular}{lrrr}
\toprule
\multirow{2}{*}{Classifier} & \multicolumn{3}{c}{Test Accuracy} \\%& \multirow{2}{*}{\parbox{1.2cm}{\centering HASY\\Test time}}\\
& MNIST & HASY & min -- max\hphantom{00 } \\\midrule% &
TF-CNN & \SI{99.20}{\percent} & \SI{81.0}{\percent} & \SI{80.6}{\percent} -- \SI{81.5}{\percent}\\% & \SI{3.1}{\second}\\
Random Forest & \SI{96.41}{\percent} & \SI{62.4}{\percent} & \SI{62.1}{\percent} -- \SI{62.8}{\percent}\\% & \SI{19.0}{\second}\\
MLP (1 Layer) & \SI{89.09}{\percent} & \SI{62.2}{\percent} & \SI{61.7}{\percent} -- \SI{62.9}{\percent}\\% & \SI{7.8}{\second}\\
LDA & \SI{86.42}{\percent} & \SI{46.8}{\percent} & \SI{46.3}{\percent} -- \SI{47.7}{\percent}\\% & \SI{0.2}{\second}\\
$k$-NN ($k=3$)& \SI{92.84}{\percent} & \SI{28.4}{\percent} & \SI{27.4}{\percent} -- \SI{29.1}{\percent}\\% & \SI{196.2}{\second}\\
$k$-NN ($k=5$)& \SI{92.88}{\percent} & \SI{27.4}{\percent} & \SI{26.9}{\percent} -- \SI{28.3}{\percent}\\% & \SI{196.2}{\second}\\
QDA & \SI{55.61}{\percent} & \SI{25.4}{\percent} & \SI{24.9}{\percent} -- \SI{26.2}{\percent}\\% & \SI{94.7}{\second}\\
Decision Tree & \SI{65.40}{\percent} & \SI{11.0}{\percent} & \SI{10.4}{\percent} -- \SI{11.6}{\percent}\\% & \SI{0.0}{\second}\\
Naive Bayes & \SI{56.15}{\percent} & \SI{8.3}{\percent} & \SI{7.9}{\percent} -- \hphantom{0}\SI{8.7}{\percent}\\% & \SI{24.7}{\second}\\
AdaBoost & \SI{73.67}{\percent} & \SI{3.3}{\percent} & \SI{2.1}{\percent} -- \hphantom{0}\SI{3.9}{\percent}\\% & \SI{9.8}{\second}\\
\bottomrule
\end{tabular}
\caption{Classification results for the evaluated classifiers.
% The test time is the time needed for all test samples in average.
The number of
test samples differs between the folds, but is $\num{16827} \pm
166$. The decision tree was trained with a maximum depth of~5. The
exact structure of the CNNs is explained
in~\cref{subsec:CNNs-Classification}. For $k$~nearest neighbor,
the number of samples per class had to be reduced to 50 for HASY
due to the extraordinarily long testing time this algorithm
needs.}
\label{table:classifier-results}
\end{table}
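Such a baseline can be reproduced with a few lines of sklearn. The sketch
below trains one of the simpler models on a single fold split;
\verb+X_train+, \verb+y_train+, \verb+X_test+ and \verb+y_test+ are assumed
to be flattened images with labels, and the hyperparameters are illustrative,
not necessarily those behind \cref{table:classifier-results}:
\begin{verbatim}
# Sketch: a single sklearn baseline on one train/test split.
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

clf = RandomForestClassifier(n_estimators=50)
clf.fit(X_train, y_train)
print(accuracy_score(y_test, clf.predict(X_test)))
\end{verbatim}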
The following observations are noteworthy:
\begin{itemize}
\item All algorithms achieve much higher accuracy on MNIST than on
\dbNameVersion{}.
\item While a single Decision Tree performs much better on MNIST than
QDA, it is exactly the other way around for~\dbName{}. One possible
explanation is that MNIST has grayscale images while \dbName{} has
black and white images.
\end{itemize}
\subsection{Convolutional Neural Networks}\label{subsec:CNNs-Classification}
Convolutional Neural Networks (CNNs) are the state of the art on several computer
vision benchmarks like MNIST~\cite{wan2013regularization}, CIFAR-10, CIFAR-100
and SVHN~\cite{huang2016densely},
ImageNet~2012~\cite{deep-residual-networks-2015} and more. Experiments on
\dbNameVersion{} without preprocessing also showed that even the
simplest CNNs achieve much higher accuracy on \dbNameVersion{} than all other
classifiers (see~\cref{table:classifier-results}).
\Cref{table:cnn-results} shows the 10-fold cross-validation results for four
architectures.
\begin{table}[H]
\centering
\begin{tabular}{lrrrr}
\toprule
\multirow{2}{*}{Network} & \multirow{2}{*}{Parameters} & \multicolumn{2}{c}{Test Accuracy} & \multirow{2}{*}{Time} \\
& & mean & min -- max\hphantom{00 } & \\\midrule
2-layer & \num{3023537} & \SI{73.8}{\percent} & \SI{72.9}{\percent} -- \SI{74.3}{\percent} & \SI{1.5}{\second}\\
3-layer & \num{1530609} & \SI{78.4}{\percent} & \SI{77.6}{\percent} -- \SI{79.0}{\percent} & \SI{2.4}{\second}\\
4-layer & \num{848753} & \SI{80.5}{\percent} & \SI{79.2}{\percent} -- \SI{80.7}{\percent} & \SI{2.8}{\second}\\
TF-CNN & \num{4592369} & \SI{81.0}{\percent} & \SI{80.6}{\percent} -- \SI{81.5}{\percent} & \SI{2.9}{\second}\\
\bottomrule
\end{tabular}
\caption{Classification results for CNN architectures. The test time is,
as before, the mean test time for all examples on the ten folds.}
\label{table:cnn-results}
\end{table}
The following architectures were evaluated:
\begin{itemize}
\item 2-layer: A convolutional layer with 32~filters of size $3 \times 3 \times 1$
is followed by a $2 \times 2$ max pooling layer with stride~2. The output
layer is --- as in all explored CNN architectures --- a fully
connected softmax layer with 369~neurons.
\item 3-layer: Like the 2-layer CNN, but before the output layer is another
convolutional layer with 64~filters of size $3 \times 3 \times 32$
followed by a $2 \times 2$ max pooling layer with stride~2.
\item 4-layer: Like the 3-layer CNN, but before the output layer is another
convolutional layer with 128~filters of size $3 \times 3 \times 64$
followed by a $2 \times 2$ max pooling layer with stride~2.
\item TF-CNN: A convolutional layer with 32~filters of size $3 \times 3 \times 1$
is followed by a $2 \times 2$ max pooling layer with stride~2.
Another convolutional layer with 64~filters of size $3 \times 3 \times 32$
and a $2 \times 2$ max pooling layer with stride~2 follow. A fully
connected layer with 1024~units and tanh activation function, a
dropout layer with dropout probability 0.5 and the output softmax
layer are last. This network is described in~\cite{tf-mnist}; a sketch
of the architecture follows this list.
\end{itemize}
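For illustration, the TF-CNN architecture can be written down compactly. The
following is a minimal sketch in modern Keras rather than the original
TensorFlow~0.12.1 code; the padding and the ReLU activations of the
convolutional layers are assumptions:
\begin{verbatim}
# Sketch of the TF-CNN architecture (modern Keras, for illustration).
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Conv2D(32, 3, padding="same", activation="relu",
                  input_shape=(32, 32, 1)),
    layers.MaxPooling2D(2, strides=2),
    layers.Conv2D(64, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(2, strides=2),
    layers.Flatten(),
    layers.Dense(1024, activation="tanh"),
    layers.Dropout(0.5),
    layers.Dense(369, activation="softmax"),  # one neuron per class
])
model.compile(optimizer="adam",  # ADAM, as used for all architectures
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
\end{verbatim}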
For all architectures, ADAM~\cite{kingma2014adam} was used for training. The
combined training and testing time was always less than 6~hours for the 10-fold
cross-validation on a Nvidia GeForce GTX Titan Black with CUDA~8 and cuDNN~5.1.
\clearpage
\subsection{Class Difficulties}
The class-wise accuracy
\[\text{class-accuracy}(c) = \frac{\text{correctly predicted samples of class } c}{\text{total number of samples of class } c}\]
is used to estimate how difficult a class is.
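A sketch of this computation follows; \verb+y_true+ and \verb+y_pred+ are
assumed to be arrays containing the test labels and the predictions of a
trained model:
\begin{verbatim}
# Sketch: class-wise accuracy from true and predicted labels.
import numpy as np

def class_accuracies(y_true, y_pred):
    return {c: (y_pred[y_true == c] == c).mean()
            for c in np.unique(y_true)}
\end{verbatim}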
32~classes were never classified correctly by TF-CNN and hence have
a class-accuracy of~0. They are shown in~\cref{table:hard-classes}. Some, like
\verb+\mathsection+ and \verb+\S+, are not distinguishable at all. Others, like
\verb+\Longrightarrow+ and
\verb+\Rightarrow+, are only distinguishable in some people's handwriting.
Those classes account for \SI{2.8}{\percent} of the data.
\begin{table}[h]
\centering
\begin{tabular}{lcrlc}
\toprule
\LaTeX & Rendered & Total & Confused with & Rendered \\\midrule
\verb+\mid+ & $\mid$ & 34 & \verb+|+ & $|$ \\
\verb+\triangle+ & $\triangle$ & 32 & \verb+\Delta+ & $\Delta$ \\
\verb+\mathds{1}+ & $\mathds{1}$ & 32 & \verb+\mathbb{1}+ & \includegraphics{symbols/mathbb1.pdf} \\
\verb+\checked+ & {\mbox {\wasyfamily \char 8}} & 28 & \verb+\checkmark+ & $\checkmark$ \\
\verb+\shortrightarrow+ & $\shortrightarrow$ & 28 & \verb+\rightarrow+ & $\rightarrow$ \\
\verb+\Longrightarrow+ & $\Longrightarrow$ & 27 & \verb+\Rightarrow+ & $\Rightarrow$ \\
\verb+\backslash+ & $\backslash$ & 26 & \verb+\setminus+ & $\setminus$ \\
\verb+\O+ & \O & 24 & \verb+\emptyset+ & $\emptyset$ \\
\verb+\with+ & $\with$ & 21 & \verb+\&+ & $\&$ \\
\verb+\diameter+ & {\mbox {\wasyfamily \char 31}} & 20 & \verb+\emptyset+ & $\emptyset$ \\
\verb+\triangledown+ & $\triangledown$ & 20 & \verb+\nabla+ & $\nabla$ \\
\verb+\longmapsto+ & $\longmapsto$ & 19 & \verb+\mapsto+ & $\mapsto$ \\
\verb+\dotsc+ & $\dotsc$ & 15 & \verb+\dots+ & $\dots$ \\
\verb+\fullmoon+ & {\mbox {\wasyfamily \char 35}} & 15 & \verb+\circ+ & $\circ$ \\
\verb+\varpropto+ & $\varpropto$ & 14 & \verb+\propto+ & $\propto$ \\
\verb+\mathsection+ & $\mathsection$ & 13 & \verb+\S+ & $\S$ \\
\verb+\vartriangle+ & $\vartriangle$ & 12 & \verb+\Delta+ & $\Delta$ \\
\verb+O+ & $O$ & 9 & \verb+\circ+ & $\circ$ \\
\verb+o+ & $o$ & 7 & \verb+\circ+ & $\circ$ \\
\verb+c+ & $c$ & 7 & \verb+\subset+ & $\subset$ \\
\verb+v+ & $v$ & 7 & \verb+\vee+ & $\vee$ \\
\verb+x+ & $x$ & 7 & \verb+\times+ & $\times$ \\
\verb+\mathbb{Z}+ & $\mathbb{Z}$ & 7 & \verb+\mathds{Z}+ & $\mathds{Z}$ \\
\verb+T+ & $T$ & 6 & \verb+\top+ & $\top$ \\
\verb+V+ & $V$ & 6 & \verb+\vee+ & $\vee$ \\
\verb+g+ & $g$ & 6 & \verb+9+ & $9$ \\
\verb+l+ & $l$ & 6 & \verb+|+ & $|$ \\
\verb+s+ & $s$ & 6 & \verb+\mathcal{S}+ & $\mathcal{S}$ \\
\verb+z+ & $z$ & 6 & \verb+\mathcal{Z}+ & $\mathcal{Z}$ \\
\verb+\mathbb{R}+ & $\mathbb{R}$ & 6 & \verb+\mathds{R}+ & $\mathds{R}$ \\
\verb+\mathbb{Q}+ & $\mathbb{Q}$ & 6 & \verb+\mathds{Q}+ & $\mathds{Q}$ \\
\verb+\mathbb{N}+ & $\mathbb{N}$ & 6 & \verb+\mathds{N}+ & $\mathds{N}$ \\
\bottomrule
\end{tabular}
\caption{The 32~classes which were never classified correctly by
the best CNN.}
\label{table:hard-classes}
\end{table}
In contrast, 21~classes have an accuracy of more than \SI{99}{\percent} with
TF-CNN (see~\cref{table:easy-classes}).
\begin{table}[h]
\centering
\begin{tabular}{lcr}
\toprule
\LaTeX & Rendered & Total\\\midrule
\verb+\forall + & $\forall $ & 214 \\
\verb+\sim + & $\sim $ & 201 \\
\verb+\nabla + & $\nabla $ & 122 \\
\verb+\cup + & $\cup $ & 93 \\
\verb+\neg + & $\neg $ & 85 \\
\verb+\setminus + & $\setminus $ & 52 \\
\verb+\supset + & $\supset $ & 42 \\
\verb+\vdots + & $\vdots $ & 41 \\
\verb+\boxtimes + & $\boxtimes $ & 22 \\
\verb+\nearrow + & $\nearrow $ & 21 \\
\verb+\uplus + & $\uplus $ & 19 \\
\verb+\nvDash + & $\nvDash $ & 15 \\
\verb+\AE + & \AE & 15 \\
\verb+\Vdash + & $\Vdash $ & 14 \\
\verb+\Leftarrow + & $\Leftarrow $ & 14 \\
\verb+\upharpoonright+ & $\upharpoonright$ & 14 \\
\verb+- + & $- $ & 12 \\
\verb+\guillemotleft + & $\guillemotleft $ & 11 \\
\verb+R + & $R $ & 9 \\
\verb+7 + & $7 $ & 8 \\
\verb+\blacktriangleright+ & $\blacktriangleright$ & 6 \\
\bottomrule
\end{tabular}
\caption{21~classes with a class-wise accuracy of more than \SI{99}{\percent}
with TF-CNN.}
\label{table:easy-classes}
\end{table}
\section{Verification Challenge}
In the setting of an online symbol recognizer like
\href{http://write-math.com}{write-math.com}, it is important to recognize when
the user enters a symbol which is not known to the classifier. One way to achieve
this is by training a binary classifier to recognize whether two recordings belong to
the same symbol. This kind of task is similar to face verification: the task of
deciding whether two face images show the same person.
For the verification challenge, a training-test split is given. The training
data contains images with their class labels. The test set
contains 32~symbols which were not seen by the classifier before. The elements
of the test set are pairs of recorded handwritten symbols $(r_1, r_2)$. There
are three groups of tests:
\begin{enumerate}[label=V\arabic*]
\item $r_1$ and $r_2$ both belong to symbols which are in the training set,
\item $r_1$ belongs to a symbol in the training set, but $r_2$
might not,
\item $r_1$ and $r_2$ don't belong to symbols in the training set.
\end{enumerate}
When evaluating models, the models may not take advantage of knowing whether
a recording $r_1$ / $r_2$ is an instance of the training symbols.
For all test sets, the following numbers should be reported: True Positives (TP),
True Negatives (TN), False Positives (FP), False Negatives (FN), and
accuracy $= \frac{TP+TN}{TP+TN+FP+FN}$.
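The required bookkeeping is straightforward. A minimal sketch follows, where
\verb+verifier+ is assumed to be any function mapping a pair of recordings to
a same-symbol prediction:
\begin{verbatim}
# Sketch: evaluation counts for the verification challenge.
def evaluate(pairs, same_symbol, verifier):
    """pairs: list of (r1, r2); same_symbol: ground-truth booleans."""
    tp = tn = fp = fn = 0
    for (r1, r2), truth in zip(pairs, same_symbol):
        prediction = verifier(r1, r2)
        if prediction and truth:
            tp += 1
        elif prediction and not truth:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return tp, tn, fp, fn, accuracy
\end{verbatim}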
% \section{Open Questions}
% There are still a couple of open questions about \dbNameVersion:
% \begin{enumerate}
% \item What is the accuracy of human expert labelers?
% \item What is the variance between human experts labeling the samples?
% \end{enumerate}
\section{Acknowledgment}
I want to thank \enquote{Begabtenstiftung Informatik Karls\-ruhe}, the Foundation
for Gifted Informatics Students in Karlsruhe. Their support helped me to write
this work.