Summary:: This paper addresses authorship obfuscation by paraphrasing a text so that its character trigram frequencies change. Bevendorff et al. develop prototypes of both a greedy and a heuristic obfuscation method and evaluate them against common authorship verification techniques such as unmasking.
Motivation:: Achieve flexibility and maintain readability.
Results:: Unmasking performance decreased by about 3-9%, and the performance of the compression-based approach by 15%. Path cost decreased by up to 75%. However, the obfuscated text remains barely readable and contains some nonsensical phrases.
- 1098 | Use Jensen-Shannon divergence to compute the distance between the original and the paraphrased “forgery”
- 1099 | Literature review
- 1100 | Stylistic distance is measured between the character trigram frequency distributions of two texts as the JS distance, a metric derived from the Jensen-Shannon divergence
- 1101 | Strategy 1: Greedy authorship-obfuscation-by-reducing-idiosyncratic-text-features
- 1102 | Strategy 2: Heuristic obfuscation as a search problem, with a cost function guiding the optimisation → generating candidate solutions
- 1103 | cost-intensity-of-author-obfuscation → obfuscation-operators
- 1105 | Longer texts decrease obfuscation performance; most of the text is left unobfuscated
- Highlights:: highlights-zu-bevendorff2019
- Read on:: 2022-10-31
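
The stylistic distance noted for p. 1100 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the textbook definition of the Jensen-Shannon divergence with a base-2 logarithm and takes its square root as the distance, so the paper's exact scaling may differ. The function names (`trigram_distribution`, `js_divergence`, `js_distance`) are hypothetical and chosen only for this note.

```python
import math
from collections import Counter


def trigram_distribution(text):
    """Relative frequencies of overlapping character trigrams in `text`."""
    if len(text) < 3:
        raise ValueError("text must contain at least three characters")
    counts = Counter(text[i:i + 3] for i in range(len(text) - 2))
    total = sum(counts.values())
    return {gram: n / total for gram, n in counts.items()}


def js_divergence(p, q):
    """Textbook Jensen-Shannon divergence with base-2 log, bounded to [0, 1]."""
    jsd = 0.0
    for gram in set(p) | set(q):
        pi, qi = p.get(gram, 0.0), q.get(gram, 0.0)
        mi = (pi + qi) / 2  # mixture distribution M = (P + Q) / 2
        # Terms with zero probability contribute nothing (0 * log 0 := 0).
        if pi > 0:
            jsd += 0.5 * pi * math.log2(pi / mi)
        if qi > 0:
            jsd += 0.5 * qi * math.log2(qi / mi)
    return jsd


def js_distance(text_a, text_b):
    """Assumed distance: square root of the JSD of the trigram distributions."""
    p = trigram_distribution(text_a)
    q = trigram_distribution(text_b)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(js_divergence(p, q), 0.0))


if __name__ == "__main__":
    original = "the quick brown fox jumps over the lazy dog"
    paraphrase = "a fast brown fox leaps over the idle dog"
    print(js_distance(original, paraphrase))
```

Under these assumptions, two identical texts have distance 0, and two texts that share no trigram at all have distance 1. An obfuscator in the spirit of the notes above would try to push this value up while applying as few changes as possible.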
Bevendorff, Janek, Martin Potthast, Matthias Hagen & Benno Stein. 2019. Heuristic Authorship Obfuscation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 1098–1108. Florence, Italy: Association for Computational Linguistics. https://doi.org/10.18653/v1/P19-1104.