Summary::
Motivation::
Results::
Keywords: machine-learning, nlp
Ch2: Regular Expressions, Text Normalisation and Edit Distance
- 2 – Text data must be normalized (preprocessed) before it can serve as a basis for analysis
- Tokenization (we-dont-know-how-to-define-what-a-word-is)
- Lemmatization
- Stemming
- Segmentation
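
The four normalization steps listed above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the method from the book: the function names, the tiny `LEMMAS` lookup table and the suffix list are all hypothetical, and real systems use far more careful rules (for example the Porter stemmer, or a dictionary-backed lemmatizer).

```python
import re

# Hypothetical toy lemma table; a real lemmatizer relies on a full lexicon.
LEMMAS = {"was": "be", "were": "be", "mice": "mouse", "better": "good"}


def tokenize(text: str) -> list[str]:
    """Split text into word tokens (keeping internal apostrophes) and punctuation."""
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)


def lemmatize(token: str) -> str:
    """Map a token to its dictionary form if it is in the toy table."""
    return LEMMAS.get(token.lower(), token)


def naive_stem(token: str) -> str:
    """Crude suffix stripping (NOT the Porter algorithm)."""
    for suffix in ("ing", "ed", "s"):
        # Only strip if at least three characters of stem remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def segment_sentences(text: str) -> list[str]:
    """Split after '.', '!' or '?' followed by whitespace; ignores abbreviations."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
```

Even this toy version shows why "what is a word" is hard to pin down: `tokenize` has to decide whether `don't` is one token or two and whether punctuation counts as a token, and `segment_sentences` would wrongly split after an abbreviation such as "Dr.".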
Todo
Metadata
- Highlights:: highlights-zu-jurafsky2009