Context:
This is based on an activity from Coulthard et al. (2017/2018), Chapter 8.
Background:
Below are five very short emails (a-e), all written by the same person.
(a) How sweet. Its doing ok. A better question is hows the lovelife? Thats doing ok too although I wish I could find the perfect woman. Mine is wanting to take some trips Maine, New York, EUROPE!!!!!
(b) I was out yesterday afternoon but the price for that product, all in is $5.06. Let me know if you need it broken down and if you are still interested.
Thanks,
[First name + Surname]
(c) Thanks for coming. It was great to have you with us last weekend. I shall ask Vin about Blackbird. I think he should get a job ASAP. Otherwisw [sic], he will go insane staying home with two aging parents.
(d) Attached is our sample agreement. Our legal guys are standing by to answer any questions. Let me know if there is anything else that you need.
(e) [Name],
We have a contract extension for Sempra, 30,000/d from La Plata to I/B Link extending it from 12/31/01 to 12/31/02. Max rate. Standard language. I would like to approve this contract in your absence. Legal and Regulatory have already approved. Thanks, [First name]
Instructions:
There are four candidates who may be the author of the five emails above. Your job as a forensic linguist is to identify which of the four candidates shown in Table 8.2 is the most likely author.
The first stage of analysis has been done for you: Table 8.2 (below) lists style markers that have been found to be consistent and distinctive in the known writings of each of the four candidates. Using these style markers, decide which candidate has a style closest to that of the disputed texts (a-e).
Questions to respond to:
- State which author (A, B, C, or D below) you selected as the author of the disputed emails, and give a clear justification for your choice.
- Cite specific similarities between the disputed emails and the style of the author you selected that you think single that author out from the other three. In other words, justify your response clearly with evidence from the data you are given.
- Explain any challenges you faced when making your decision.
- Also, give an estimate of how confident you are in your analysis and why you feel that way.
Table 8.2 Consistent and distinctive style markers in four email authors

| Author A | Author B |
| --- | --- |
| Multiple exclamation marks | Month [the] number x (e.g., May the 25th) |
| mm/dd/yy | Day, Month + Number(x) (e.g., Saturday, May 25th) |
| Hey buddy | Shall |
| Hi Team | I am attaching |
| Name, (same line as rest of email) | |

| Author C | Author D |
| --- | --- |
| Single/double exclamation marks | Day+Month.Number (e.g., Saturday May.25) |
| month.number | Day-Month+Number format (e.g., Saturday- May 25) |
| Tomarrow | regards |
| Probobly | Enron online semantic field |
| thanks, + first name | Please find attached |
| thanks, + full name | thanks + first name |
| thanks! first name | thanks + full name |
| Hey + name | Name (same line as rest of email) |
| Let me know if you have any questions | |
| Let me know if you need | |
| Let me know if there is anything | |
| Let me know when you [guys] are available | |
| Attached is | |

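
The marker comparison described in the instructions can also be sketched in code. The Python snippet below is a minimal illustration and not part of the original activity. It encodes a subset of the Table 8.2 markers as regular expressions and lists which of each candidate's markers occur in the disputed emails. The regex patterns, the marker labels, and the reading of "multiple" as three or more exclamation marks are all assumptions made for this sketch.

```python
import re

# Disputed emails (a)-(e), copied from the activity text above.
DISPUTED = {
    "a": "How sweet. Its doing ok. A better question is hows the lovelife? "
         "Thats doing ok too although I wish I could find the perfect woman. "
         "Mine is wanting to take some trips Maine, New York, EUROPE!!!!!",
    "b": "I was out yesterday afternoon but the price for that product, all in "
         "is $5.06. Let me know if you need it broken down and if you are "
         "still interested.\nThanks,\n[First name + Surname]",
    "c": "Thanks for coming. It was great to have you with us last weekend. "
         "I shall ask Vin about Blackbird. I think he should get a job ASAP. "
         "Otherwisw [sic], he will go insane staying home with two aging parents.",
    "d": "Attached is our sample agreement. Our legal guys are standing by to "
         "answer any questions. Let me know if there is anything else that you need.",
    "e": "[Name],\nWe have a contract extension for Sempra, 30,000/d from La Plata "
         "to I/B Link extending it from 12/31/01 to 12/31/02. Max rate. Standard "
         "language. I would like to approve this contract in your absence. Legal "
         "and Regulatory have already approved. Thanks, [First name]",
}

# Hypothetical regex encodings of a SUBSET of the Table 8.2 markers.
# Redacted names appear as "[...]" placeholders, so "name" is matched as "[".
MARKERS = {
    "A": {
        "multiple exclamation marks": r"!{3,}",  # assumption: multiple = 3+
        "mm/dd/yy": r"\b\d{1,2}/\d{1,2}/\d{2}\b",
        "Hey buddy": r"\bhey buddy\b",
        "Hi Team": r"\bhi team\b",
    },
    "B": {
        "Shall": r"\bshall\b",
        "I am attaching": r"\bi am attaching\b",
    },
    "C": {
        "single/double exclamation marks": r"(?<!!)!{1,2}(?!!)",
        "Tomarrow": r"\btomarrow\b",
        "Probobly": r"\bprobobly\b",
        "thanks, + name": r"\bthanks,\s+\[",
        "Let me know if you have any questions": r"\blet me know if you have any questions\b",
        "Let me know if you need": r"\blet me know if you need\b",
        "Let me know if there is anything": r"\blet me know if there is anything\b",
        "Let me know when you": r"\blet me know when you\b",
        "Attached is": r"\battached is\b",
    },
    "D": {
        "regards": r"\bregards\b",
        "Please find attached": r"\bplease find attached\b",
        "thanks + name (no comma)": r"\bthanks\s+\[",
    },
}


def matched_markers(texts, markers):
    """Return, per author, the set of marker labels found in at least one text."""
    texts = list(texts)
    return {
        author: {
            label
            for label, pattern in patterns.items()
            if any(re.search(pattern, text, re.IGNORECASE) for text in texts)
        }
        for author, patterns in markers.items()
    }


hits = matched_markers(DISPUTED.values(), MARKERS)
for author in sorted(hits):
    print(author, len(hits[author]), sorted(hits[author]))
```

A raw tally like this treats every marker as equally informative, which a forensic linguist would not do: a rare marker shared with one candidate can outweigh several common ones. The output is a starting point for the written justification, not a substitute for it.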
More context about the Enron corpus (not necessary for completing activity):
Coulthard et al. say this about the corpus (collection of texts) from which the data for this activity was drawn:
“Recent research in forensic authorship analysis has begun to experimentally test the ways in which corpora can be used in [authorship attribution]. Wright (2013) and Johnson and Wright (2014) used the Enron email corpus as relevant population data. Wright (2013) found that two email greetings used by employees of Enron, Hello: and Hey: (with colons), were respectively 555 and 269 times more likely to appear in an email written by one particular employee than in one written by any of the other 175 employees in the dataset. Later, Johnson and Wright (2014) demonstrated that, even though please occurred over 11,000 times in the emails of 165 out of 176 authors in the corpus, it was still possible to identify individual and idiolectal variation within please-initial word strings (i.e., strings of words that begin with please). For example, please print the message, please format and print and please proceed with your only occur in emails sent by one employee. Such strings, it was argued, hold population-level distinctiveness, and represent textbites that can characterise an author's writing style (Chapter 8, p. 162).”
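
A "times more likely" figure of the kind quoted above can be read as a ratio of two rates: how often a feature appears in one author's emails versus how often it appears in everyone else's. The quoted passage does not say how the published figures were computed, so the Python sketch below shows only one plausible reading. The function name and all counts are hypothetical and invented for illustration; they are not taken from Wright (2013).

```python
def relative_likelihood(author_hits, author_emails, other_hits, other_emails):
    """Ratio of the feature's rate in one author's emails to its rate elsewhere.

    Equivalent to (author_hits / author_emails) / (other_hits / other_emails),
    rearranged so that integer inputs are multiplied before dividing.
    """
    if author_emails <= 0 or other_emails <= 0:
        raise ValueError("email totals must be positive")
    if other_hits <= 0:
        raise ValueError("ratio is undefined when no other author uses the feature")
    return (author_hits * other_emails) / (author_emails * other_hits)


# Hypothetical counts: the greeting opens 50 of one author's 100 emails,
# but only 10 of the 10,000 emails written by all other authors.
ratio = relative_likelihood(author_hits=50, author_emails=100,
                            other_hits=10, other_emails=10_000)
print(f"{ratio:.0f} times more likely")
```

With these invented counts the author's rate is 0.5 and the population rate is 0.001, so the greeting is 500 times more likely in that author's emails.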