Highlights 💡: A Girl Has A Name

Source: @mahmood2020

Added on 2022-11-07

Such powerful authorship attribution methods pose a threat to privacy-conscious users such as journalists and activists who may wish to publish anonymously (Times, 2018; Anonymous, 2018). (p. 1)

in addition to evading attribution and preserving semantics, it is important that authorship obfuscation methods are “stealthy” – i.e., they need to hide the fact that text was obfuscated from the adversary (p. 1)

existing authorship obfuscation methods themselves leave behind stylistic signatures that can be detected using neural language models (p. 1)

make modifications such that text semantics is still preserved (p. 3)

the intuition behind our obfuscation detectors is to exploit the differences in text smoothness between human written and obfuscated texts (p. 3)

A text with a relatively greater proportion of high likelihood words is likely to be more smooth. (p. 3)

instead of word prediction, we extract the likelihood from the language model (either as a probability or as a rank) for each word in the text (p. 3)
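A minimal sketch of what "probability or rank" extraction could look like for a single word position. The dictionary `toy_distribution` is a made-up stand-in for a language model's predicted distribution, and `word_likelihood` is a hypothetical helper name, neither is taken from the paper or from GLTR.

```python
def word_likelihood(distribution, actual_word):
    """Return (probability, rank) of actual_word under one predicted distribution.

    distribution: dict mapping candidate word -> probability (a toy stand-in
    for a language model's output at one position; an assumption for this sketch).
    Rank 1 means the model considered the word the most likely candidate.
    """
    prob = distribution.get(actual_word, 0.0)
    # Rank = 1 + number of candidates with strictly higher probability.
    rank = 1 + sum(1 for p in distribution.values() if p > prob)
    return prob, rank


# Hypothetical distribution over the next word at one position.
toy_distribution = {"the": 0.5, "a": 0.3, "cat": 0.15, "zebra": 0.05}

print(word_likelihood(toy_distribution, "the"))
print(word_likelihood(toy_distribution, "zebra"))
```

Repeating this for every word yields one list of probabilities and one list of ranks per document, which is the raw material for the features described in the later highlights.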

GPT-2 (p. 3)

we used the small and medium versions containing 117M and 345M parameters, respectively (p. 3)

BERT (p. 4)

pre-trained BERT: BERT BASE with 110M parameters and BERT LARGE with 340M (p. 4)

We implement likelihood extraction…using code made available by the Giant Language Model Test Room (GLTR) (p. 4)

Binning based features (p. 4)

We aggregate this information using fixed size bins representing different likelihood ranges. For probabilities we create bin sizes of 0.001, 0.005 and 0.010. For ranks we create bin sizes of 10, 50 and 100. (p. 4)
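The binning step can be sketched roughly as follows. The function name `bin_features`, the normalization to fractions, the exact bin edges, and the upper bound used for ranks are all assumptions of this sketch; the highlight only specifies the bin sizes.

```python
def bin_features(values, bin_size, upper):
    """Histogram of likelihood values using fixed-size bins.

    values:   per-word probabilities or ranks for one document
    bin_size: e.g. 0.001, 0.005, 0.010 for probabilities; 10, 50, 100 for ranks
    upper:    assumed upper end of the value range (1.0 for probabilities;
              for ranks, some cap such as the vocabulary size)
    Returns the fraction of words per bin (normalization is an assumption).
    """
    n_bins = max(1, round(upper / bin_size))
    counts = [0] * n_bins
    for v in values:
        # Values at or beyond the upper bound fall into the last bin.
        idx = min(int(v / bin_size), n_bins - 1)
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts] if total else [0.0] * n_bins


# Toy ranks for a five-word document, bin size 10, ranks capped at 50.
rank_features = bin_features([1, 5, 12, 49, 1000], bin_size=10, upper=50)
print(rank_features)
```

With probabilities and a bin size of 0.001 the same function produces a 1000-dimensional feature vector; the larger bin sizes give 200 and 100 dimensions respectively.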

Image based features (p. 4)

we explore obfuscation detection via image classification. Specifically, we explore a transfer learning approach wherein we use the VGG-19 classifier trained for image classification on ImageNet dataset (p. 4)

we sort the extracted likelihood values for the text in descending order and then plot these values (p. 4)

With 4 language models giving probabilities or ranks as output, 4 features (3 binning based features and 1 image based feature) and 5 different classifiers, we experiment with a total of 160 distinct architectures (p. 5)
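One reading of how the 160 comes about is 4 language models × 2 output types × 4 features × 5 classifiers. The sketch below enumerates that grid. The classifier names are placeholders because these notes do not list them, and the feature labels are informal shorthand.

```python
from itertools import product

language_models = ["GPT-2 small", "GPT-2 medium", "BERT base", "BERT large"]
outputs = ["probability", "rank"]
# Three bin sizes plus the image-based feature (labels are shorthand).
features = ["bins (small)", "bins (medium)", "bins (large)", "image"]
# Placeholder names: the notes do not say which five classifiers were used.
classifiers = [f"classifier_{i}" for i in range(1, 6)]

architectures = list(product(language_models, outputs, features, classifiers))
print(len(architectures))  # 4 * 2 * 4 * 5
```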

Document Simplification (Castro-Castro et al., 2017). (p. 5)

replace all contractions with expansions (p. 5)

removing parenthetical texts that do not contain any named entity, discourse markers or appositions (p. 5)
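As an illustration of the contraction-expansion step only, here is a small sketch. The mapping `CONTRACTIONS` is a tiny hypothetical sample, not the list used by Castro-Castro et al., and ambiguous cases (for example "it's" meaning "it has") are resolved naively.

```python
import re

# Hypothetical sample mapping; a real tool would need a much longer list.
CONTRACTIONS = {
    "don't": "do not",
    "can't": "cannot",
    "it's": "it is",
    "i'm": "i am",
}

_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in CONTRACTIONS) + r")\b",
    re.IGNORECASE,
)


def expand_contractions(text):
    """Replace each known contraction with its expansion, keeping initial case."""
    def repl(match):
        found = match.group(0)
        expansion = CONTRACTIONS[found.lower()]
        if found[0].isupper():
            expansion = expansion[0].upper() + expansion[1:]
        return expansion
    return _PATTERN.sub(repl, text)


print(expand_contractions("I don't think it's done."))
```

Uniform rewrites of this kind are exactly the sort of regularity that could make obfuscated text look different from naturally written text.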

Style Neutralization (Karadzhov et al., 2017) (p. 5)

move the document’s stylometric feature values towards the corpus averages (p. 5)

MUTANT-X (Mahmood et al., 2019) (p. 5)

genetic algorithm (GAs) (p. 5)

identify words that when changed would have the highest positive effect (p. 5)

Extended Brennan Greenstadt corpus. This text corpus from (Brennan et al., 2012) contains 699 documents written by 45 unique authors. (p. 5)

Blog authorship corpus. This text corpus which is from (Schler et al., 2006) contains more than 600,000 blogger.com blog posts written by 19,320 unique authors. (p. 5)

A document is considered obfuscated if it has been processed by an authorship obfuscation tool. (p. 5)

An obfuscated document is viewed as an evaded document if it successfully evades authorship attribution. (p. 5)

conduct a total of 640 distinct obfuscation detection experiments (p. 6)

GLTR (Gehrmann et al., 2019) (p. 6)

pretrained language models to extract word likelihoods and presents their plot to humans making the decision (p. 6)

Character trigrams + KNN (Juola, 2012) (p. 6)

detect manually obfuscated (p. 6)

Writeprints + SVM (Afroz et al., 2012) (p. 6)

After averaging we find that for obfuscation detection, 25% of all 160 architectures achieve F1 score greater than 0.76, 50% achieve F1 score greater than 0.72 and a high 75% of them were able to achieve F1 score greater than 0.52. (p. 7)

evaded documents achieve a higher maximum F1 score than obfuscated documents (p. 7)

using the best of methods the adversary can detect evaded and obfuscated documents with F1 score of 0.77 or higher (p. 7)

Ahmed Abbasi and Hsinchun Chen. 2008. Writeprints: A stylometric approach to identity-level identification and similarity detection in cyberspace. ACM Transactions on Information Systems (TOIS), 26(2):7. (p. 9)

Anton Bakhtin, Sam Gross, Myle Ott, Yuntian Deng, Marc’Aurelio Ranzato, and Arthur Szlam. 2019. Real or Fake? Learning to Discriminate Machine from Human Generated Text. arXiv preprint arXiv:1906.03351. (p. 9)

Michael Brennan, Sadia Afroz, and Rachel Greenstadt. 2012. Adversarial stylometry: Circumventing authorship recognition to preserve privacy and anonymity. ACM Transactions on Information and System Security (TISSEC), 15(3):12. (p. 10)