Takeaway 🏃

Summary:: Evert et al. formally define and evaluate Burrows’ Delta in order to improve authorship attribution results.

Motivation:: Progress in attribution performance had stagnated, and there was still no real understanding of why Burrows’ Delta is relatively reliable… and why it fails when it does.

Results:: Using the most frequent words as features and standardizing them yields better results. Burrows’ Delta improves significantly with vector normalization; Cosine Delta already normalizes implicitly, since the angle between two vectors does not depend on their lengths.

Keywords: authorship

  • 79 | Definition Authorship Attribution
    • Idiosyncrasies/habitual tendencies in a person’s language use
    • Clustering and classification tasks
    • Burrows’ Delta (@burrows2005) is a distance measure, very robust but outperformed by Cosine Delta (Smith and Aldridge 2011)
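  • 80 | z-score: $z_i(D) = \frac{f_i(D) - \mu_i}{\sigma_i}$, where $f_i(D)$ is the relative frequency of word $w_i$ in document $D$, $\mu_i$ is the mean of the distribution of $f_i$ across the documents, and $\sigma_i$ is the standard deviation for word $w_i$
  • 80 | Burrows’ Delta: $\Delta_B(D, D') = \lVert z(D) - z(D') \rVert_1 = \sum_i \lvert z_i(D) - z_i(D') \rvert$ (Burrows 2002)
  • 80 | Cosine Delta: $\Delta_\angle(D, D') = \alpha$, where $\cos \alpha = \frac{z(D) \cdot z(D')}{\lVert z(D) \rVert_2 \, \lVert z(D') \rVert_2}$ (Smith and Aldridge 2011)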
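
The two measures noted above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the toy frequencies, the function names, and the use of the population standard deviation are all assumptions. Each word's relative frequency is standardized across the collection (z-scores), Burrows' Delta is then taken as the Manhattan distance between two z-score vectors, and Cosine Delta as the angle between them.

```python
import math
from statistics import mean, pstdev

# Hypothetical toy data: relative frequencies of three frequent words
# in four documents (one list per document, one entry per word).
freqs = {
    "doc_a": [0.050, 0.030, 0.020],
    "doc_b": [0.052, 0.028, 0.021],
    "doc_c": [0.040, 0.035, 0.015],
    "doc_d": [0.041, 0.036, 0.014],
}

def z_scores(freqs):
    """Standardize each word's relative frequency across the collection."""
    columns = list(zip(*freqs.values()))
    mus = [mean(col) for col in columns]
    sigmas = [pstdev(col) for col in columns]  # assumption: population stdev
    return {
        doc: [(f - mu) / sigma for f, mu, sigma in zip(row, mus, sigmas)]
        for doc, row in freqs.items()
    }

def burrows_delta(z1, z2):
    """Manhattan (L1) distance between two z-score vectors."""
    return sum(abs(a - b) for a, b in zip(z1, z2))

def cosine_delta(z1, z2):
    """Angle in radians between two z-score vectors."""
    dot = sum(a * b for a, b in zip(z1, z2))
    norm1 = math.sqrt(sum(a * a for a in z1))
    norm2 = math.sqrt(sum(b * b for b in z2))
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    return math.acos(max(-1.0, min(1.0, dot / (norm1 * norm2))))

z = z_scores(freqs)
print(burrows_delta(z["doc_a"], z["doc_b"]), burrows_delta(z["doc_a"], z["doc_c"]))
print(cosine_delta(z["doc_a"], z["doc_b"]), cosine_delta(z["doc_a"], z["doc_c"]))
```

In this toy collection, `doc_a` and `doc_b` deviate from the mean in the same direction for every word, so both measures place them closer to each other than to `doc_c`.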


  • [b] What is vector normalization?
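
A possible answer, as a sketch: vector normalization rescales a document's feature vector (here, its z-scores) to length 1 under some norm, so that only the direction of the profile matters and not its overall magnitude. The helper names and example values below are hypothetical, and whether the L1 or L2 norm is the right choice for a given Delta variant is an assumption left open here.

```python
import math

def normalize(vec, p=2):
    """Rescale vec so that its L1 (p=1) or L2 (p=2) norm equals 1."""
    if p == 1:
        norm = sum(abs(x) for x in vec)
    elif p == 2:
        norm = math.sqrt(sum(x * x for x in vec))
    else:
        raise ValueError("only p=1 and p=2 are supported in this sketch")
    if norm == 0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in vec]

def manhattan(u, v):
    """Manhattan (L1) distance, as used by Burrows' Delta."""
    return sum(abs(a - b) for a, b in zip(u, v))

# Hypothetical z-score profiles: same direction, different magnitude.
profile = [1.5, -0.5, 2.0]
amplified = [2 * x for x in profile]

# Without normalization the two profiles look far apart ...
print(manhattan(profile, amplified))
# ... after normalization they coincide, because only direction is kept.
print(manhattan(normalize(profile, 1), normalize(amplified, 1)))
```

This also illustrates why an angle-based measure such as Cosine Delta needs no separate normalization step: scaling a vector by a positive factor does not change the angle it forms with any other vector.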


Evert, Stefan, Thomas Proisl, Thorsten Vitt, Christof Schöch, Fotis Jannidis & Steffen Pielström. 2015. Towards a better understanding of Burrows’s Delta in literary authorship attribution. In Proceedings of the Fourth Workshop on Computational Linguistics for Literature, 79–88. Denver, Colorado, USA: Association for Computational Linguistics. https://doi.org/10.3115/v1/W15-0709.