Takeaway 🏃

Summary:: In this study, Burrows outlines several methods of authorship attribution based on word frequencies and statistical analysis.

Motivation:: Authorship attribution is an important field in literary studies. Identifying who wrote a text can help trace its origins and date of composition, and it can also shape possible interpretations. Burrows chose Shamela, a parody of Samuel Richardson's Pamela, because it was already widely attributed to Henry Fielding.

Results:: Burrows develops several strategies for identifying the author of Shamela, focusing on word frequencies and word types. Since Fielding's writing has certain idiosyncrasies, such as his use of the archaic forms “hath” and “doth”, this approach succeeds. Burrows's Delta, which he introduced in 2002, comes from the same line of work.
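Burrows's Delta, in its commonly cited form, turns the relative frequencies of common words into z-scores across a reference corpus and averages the absolute z-score differences between a disputed text and each candidate; the smaller Delta points to the likelier author. The sketch below assumes that formulation. The texts, word list and tiny corpus are hypothetical stand-ins for the large reference corpus and long word list used in practice, and this is not a reproduction of the paper's own procedures:

```python
import statistics


def relative_frequencies(tokens: list[str], words: list[str]) -> dict[str, float]:
    """Share of the text's tokens taken up by each chosen word."""
    return {w: tokens.count(w) / len(tokens) for w in words}


def delta(test: dict[str, float], candidate: dict[str, float],
          corpus: list[dict[str, float]], words: list[str]) -> float:
    """Mean absolute difference between the z-scores of two texts."""
    diffs = []
    for w in words:
        values = [text[w] for text in corpus]
        mean = statistics.mean(values)
        sd = statistics.stdev(values)
        if sd == 0:
            continue  # same rate everywhere, so the word cannot discriminate
        z_test = (test[w] - mean) / sd
        z_candidate = (candidate[w] - mean) / sd
        diffs.append(abs(z_test - z_candidate))
    return sum(diffs) / len(diffs)


# Hypothetical toy texts, not Burrows's data
texts = {
    "A": "the man hath a horse and he doth ride and the horse hath a name".split(),
    "B": "the man has a horse and he does ride and the horse has a name".split(),
    "disputed": "she hath a book and she doth read the book".split(),
}
words = ["the", "and", "hath", "doth"]
freqs = {name: relative_frequencies(toks, words) for name, toks in texts.items()}
corpus = list(freqs.values())

for name in ("A", "B"):
    print(name, round(delta(freqs["disputed"], freqs[name], corpus, words), 3))
```

The disputed text shares “hath” and “doth” with A, so its Delta to A comes out smaller than its Delta to B; “the” and “and” add the same amount to both distances because A and B use them at identical rates.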

Keywords: authorship


  • 438 | Some genres are quite homogeneous and leave little room for an individual narrative voice, which makes them harder targets for attribution.
  • 440 | Non-fiction is thus harder to analyse as well, perhaps because of how pronouns and attributions are used in the text.
  • 441 | Common-word procedures
  • 442 | Some words are so rare, especially within a specific period, that they can be attributed directly to a certain author.
    • Issue: even when only this author uses these words, they use them sparingly rather than piling them up in one text → not enough evidence
    • moreover, new coinages spread quickly to other writers
  • 442 | A different approach is to look for words an author uses very frequently, almost out of habit. These make it easier to distinguish between, say, two authors being compared.
  • 448 | Comparison with a “counter set” (essentially a control group of other authors) to check whether the first intuition was right: does this author stand apart from the rest in the words he uses frequently?

Todo

  • [b] What is principal component analysis?
  • [b] What is a z-score?
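A rough answer to both questions, as a pure-Python sketch with hypothetical numbers and function names. A z-score states how far a value lies from the mean in standard deviations: z = (x - mean) / sd. Principal component analysis finds the direction along which standardized data varies most; here only the first component is computed, using power iteration instead of a library eigen-solver so the steps stay visible:

```python
import math
import statistics


def z_scores(values: list[float]) -> list[float]:
    """Distance of each value from the mean, in (sample) standard deviations."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def first_principal_component(rows: list[list[float]], iterations: int = 200):
    """Leading PCA direction of standardized data and each row's score on it."""
    # Standardize every column (word) so that very frequent words don't dominate
    columns = [z_scores(list(col)) for col in zip(*rows)]
    data = [list(row) for row in zip(*columns)]
    n, k = len(data), len(columns)
    # Covariance of standardized columns, i.e. the correlation matrix
    cov = [[sum(data[r][i] * data[r][j] for r in range(n)) / (n - 1)
            for j in range(k)] for i in range(k)]
    # Power iteration: repeated multiplication converges to the eigenvector
    # with the largest eigenvalue (the first principal component)
    vec = [1.0] * k
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    # Project each text onto the component
    scores = [sum(row[i] * vec[i] for i in range(k)) for row in data]
    return vec, scores


# Hypothetical rates per 1,000 words of "hath", "doth", "has" in four texts
rows = [[5.0, 3.0, 0.5],   # text by author A
        [4.0, 2.5, 1.0],   # text by author A
        [0.5, 0.2, 6.0],   # text by author B
        [0.8, 0.1, 5.0]]   # text by author B
print(z_scores([2, 4, 6]))
direction, scores = first_principal_component(rows)
print([round(s, 2) for s in scores])
```

The two A texts and the two B texts land on opposite sides of zero on the first component; which side is positive is arbitrary, since an eigenvector's sign carries no meaning.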

Metadata


Burrows, John. 2005. Who wrote Shamela? Verifying the Authorship of a Parodic Text. Digital Scholarship in the Humanities 20(4). 437–450. fqi049.