Takeaway 🏃

Summary:: This paper deals with authorship obfuscation by paraphrasing text at the level of character trigrams. Bevendorff et al. develop prototypes of both a greedy and a heuristic obfuscation method and evaluate them against common attribution techniques such as unmasking.

Motivation:: Achieve flexibility and maintain readability.

Results:: Unmasking performance decreased by ~3-9%. Compression performance decreased by 15%. Path cost decreased by up to 75%. However, the text is barely readable after obfuscation and contains some nonsensical phrases.

Code: https://github.com/webis-de/acl19-heuristic-authorship-obfuscation

Keywords: authorship

📇 Index

  • 1098 | Use Jensen-Shannon divergence to compute the distance between the original and the paraphrased “forgery”
  • 1099 | Literature review
    • authorship-attribution: Abbasi and Chen (2008) Writeprints, Teahan et al. Compression, Koppel and Schler (2004) Unmasking
    • authorship-obfuscation: mono- and multilingual machine translation (lack of data), reverse unmasking
  • 1100 | Stylistic distance between character trigram frequencies, measured as Jensen-Shannon (JS) distance
  • 1101 | Strategy 1: Greedy authorship-obfuscation-by-reducing-idiosyncratic-text-features
  • 1102 | Strategy 2: Heuristic obfuscation as a search guided by a cost function to optimise the process → Generating solutions
  • 1103 | cost-intensity-of-author-obfuscation → obfuscation-operators
  • 1105 | Longer texts decrease performance; most of the text is left unobfuscated
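
To make the stylistic distance from the index (p. 1100) concrete, here is a minimal Python sketch that computes a Jensen-Shannon distance between the character trigram frequencies of two texts. It uses the common textbook definition (square root of the base-2 JS divergence, so values lie in [0, 1]); the exact scaling used in the paper may differ. The function names `trigram_freqs`, `js_divergence` and `js_distance` are illustrative and are not taken from the paper or its repository.

```python
from collections import Counter
from math import log2, sqrt


def trigram_freqs(text: str) -> dict[str, float]:
    """Relative frequencies of overlapping character trigrams in `text`."""
    counts = Counter(text[i:i + 3] for i in range(len(text) - 2))
    total = sum(counts.values())
    return {gram: n / total for gram, n in counts.items()}


def js_divergence(p: dict[str, float], q: dict[str, float]) -> float:
    """JSD(P || Q) = 1/2 KL(P || M) + 1/2 KL(Q || M), with M = (P + Q) / 2.

    With base-2 logarithms the result lies in [0, 1].
    """
    div = 0.0
    for gram in set(p) | set(q):
        pg, qg = p.get(gram, 0.0), q.get(gram, 0.0)
        mg = (pg + qg) / 2
        # Terms with zero probability contribute nothing (0 * log 0 := 0).
        if pg > 0:
            div += 0.5 * pg * log2(pg / mg)
        if qg > 0:
            div += 0.5 * qg * log2(qg / mg)
    return div


def js_distance(p: dict[str, float], q: dict[str, float]) -> float:
    """Square root of the JS divergence (assumed normalisation, see lead-in)."""
    return sqrt(max(js_divergence(p, q), 0.0))


if __name__ == "__main__":
    original = trigram_freqs("the quick brown fox jumps over the lazy dog")
    forgery = trigram_freqs("a fast brown fox leaps over the idle dog")
    print(js_distance(original, forgery))
```

A distance of 0 means identical trigram distributions; a distance of 1 means the two texts share no trigrams at all. An obfuscator in the sense of this note would try to push the distance between the original and the “forgery” upwards while changing as little text as possible.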

Metadata


Bevendorff, Janek, Martin Potthast, Matthias Hagen & Benno Stein. 2019. Heuristic Authorship Obfuscation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 1098–1108. Florence, Italy: Association for Computational Linguistics. https://doi.org/10.18653/v1/P19-1104.