%!TEX root = write-math-ba-paper.tex
\section{Data and Implementation}
We used $\num{369}$ symbol classes with a total of $\num{166898}$ labeled
recordings. Each class has at least $\num{50}$ labeled recordings, more than
$\num{200}$ classes have more than $\num{200}$ labeled recordings, and more
than $\num{100}$ classes have more than $\num{500}$ labeled recordings.
The data was collected by two crowd-sourcing projects, Detexify and
\href{http://write-math.com}{write-math.com}. In both projects, users wrote a
symbol, were shown a list of symbols ordered by an early classification
system, and clicked on the symbol they had written.
The data of Detexify and \href{http://write-math.com}{write-math.com} was
combined and filtered semi-automatically. It can be downloaded from
\href{http://write-math.com/data}{write-math.com/data} as a compressed tar
archive of CSV files.
All of the following preprocessing and feature computation algorithms were
implemented in the Python package \texttt{hwrt}, which is publicly available
as open-source software.
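
As a minimal sketch of how a compressed tar archive of CSV files, such as the
one described above, might be read and how the number of recordings per symbol
class might be counted, the following Python code uses only the standard
library. The delimiter and the column name \texttt{symbol\_id} are assumptions
made for illustration and not the documented layout of the published files.
The helper names are hypothetical and not part of \texttt{hwrt}.

```python
import csv
import io
import tarfile
from collections import Counter


def iter_csv_rows(archive_path, delimiter=","):
    """Yield (member_name, row_dict) for every CSV file in a tar archive.

    The delimiter is an assumption; adjust it to the actual files.
    """
    # Mode "r:*" lets tarfile detect the compression (gzip, bz2, xz) itself.
    with tarfile.open(archive_path, mode="r:*") as archive:
        for member in archive.getmembers():
            # Skip directories and anything that is not a CSV file.
            if not (member.isfile() and member.name.endswith(".csv")):
                continue
            raw = archive.extractfile(member)
            # Decode the bytes lazily; newline="" is what the csv module expects.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            for row in csv.DictReader(text, delimiter=delimiter):
                yield member.name, row


def count_per_class(rows, class_column="symbol_id"):
    """Count rows per class label; class_column is a hypothetical name."""
    counts = Counter()
    for _member_name, row in rows:
        counts[row[class_column]] += 1
    return counts
```

Reading rows lazily keeps memory usage low even for large archives. Statements
such as ``each class has at least $\num{50}$ labeled recordings'' could then
be checked with \texttt{min(counts.values())}.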