3 Organizational Style & Structure of Response for a Robo-Grader

Alise Lamoreaux

Automated essay graders (AEG) are programmed to “read” for certain types of words that signal content and structure of the essay.  AEG is limited in the type of essay it can effectively score.  AEG does not independently assess the merits of an essay.  AEG is designed to mirror or predict the score an expert human reader would assign to the essay or extended response.

The software programming behind the AEG system has been trained to look for similarities between the test set data and the response currently being evaluated.  The basis for the trait analysis is that good writing should look like good writing.  Automated essay graders are good at evaluating writing with specific parameters and defined vocabulary selection.  Essays with a narrative format are difficult for AEG to evaluate due to the wide-open possibilities of language use.  Using automated essay graders puts an emphasis on rhetorical essays of the argumentative/persuasive or informational styles.  Unfortunately for teachers and students, the argumentative response can be one of the most difficult essay styles to teach.  William Jolliff (1998), in a faculty publication for George Fox University, states that “Most of [his] students have apparently seldom witnessed how real argument works….”  It could be argued that in today’s (2020) environment of conflict, it’s hard to find a model for understanding the basis of evidence and persuasion via argument.

Allison Rose Greenwald, in her 2007 thesis research at Iowa State University, cites four major difficulties in teaching the argumentative style of writing to students:

  1. Students’ limitation in comprehending logic
  2. Argument is a difficult form of discourse to teach
  3. Lack of guidance provided by standard textbooks
  4. Poor teacher training as most composition teachers lack the background and skills in rhetoric and logic to teach argumentation effectively

Three major types/models of argumentation formats are academically hailed.  Each one has a different methodology and audience expectation.

  1. Classical: argues an issue using evidence and refutation expecting an “opponent” with an open mind to change.
  2. Rogerian: argues an issue emphasizing similarities with the “opponent’s” beliefs, attempting to establish a “win-win” outcome with no losers.
  3. Toulmin: argues an issue emphasizing the strength of evidence to a close-minded audience. Practical arguments rest on probability rather than certainty.  Best used when the audience is logical and rational.

Different types of arguments lend themselves to each of the three models of argumentation.  For example, Universal Health Care is a topic with “gray” areas of debate.  The Rogerian model, looking for a “win-win” outcome, might be better suited for this topic.  When facing a Robo-Grader, the Toulmin model may be the best choice.  Toulmin arguments are most effective where there is a clear split between ideas on both sides of the issue.  The Toulmin model focuses on removing the credibility of the opposition and showing the strength of the position supported.  The topic of environmental damage being done by humans would fit the Toulmin model of argumentation.

The Toulmin method is good to use when the goal is to put facts at the forefront of the argument.  It is also a good format for addressing the scientific community.  Toulmin’s ideas about logical arguments are relatively easy to explain to students and lend themselves to the tokenization that AEG will apply to the sentences.

One of the benefits of the Toulmin method is that it offers an uncomplicated system for presenting an argument.  It offers a structural model for building and analyzing rhetorical arguments.  A complaint about the Toulmin method may be that it seems a bit like a formula for organizing an essay; however, that could also be its strength.  The Toulmin method is a way to help writers think about connections and how to link the evidence to the claim.  It’s important to remember that evidence allows for judgement.  It is not the same as proof, which means something is absolute and not able to be contested.

In June of 2015, at a teacher training session provided by GEDTS® on preparing students to take the GED® Reasoning Through Language Arts test, which includes an Extended Response that is scored by AEG, the presenters specifically suggested using the Toulmin method for structuring the response (https://www.youtube.com/watch?v=DAwXSOan3KQ).  A more recent training by the same organization (Aug. 2019) still suggests using the Toulmin method but no longer refers to the format by name.

Components of the Toulmin Method for creating a structured argument:

  1. Make a claim.
    The claim answers the question of “So, what’s the point?”
  2. What are the grounds/data?
    The grounds/data answer the question of “How come?” or “Why?”
  3. State the warrant/bridge that connects the claim to the grounds.
    “Why do these things go together?”
  4. Provide backing to the warrant.
    Provide additional logic or supporting evidence for the warrant.
  5. Include qualifiers to show the strength of the argument.
    Examples: so, some, many, in general, usually, typically, 75%
  6. Create a counterclaim to the claim.
    Anticipate the opposing perspective and state it. Responding to counterclaims makes you seem unbiased.
  7. State a rebuttal that provides evidence to disagree with the counterclaim.

Example #1 of the foundation of the Toulmin Model (Simplified)

Claim: There is a forest fire nearby.
Grounds:  Smoke is in the air.
Warrant: Fires produce smoke…
Qualifier: …so chances are, where there is smoke there is fire.
Backing:  It is summer and that’s fire season.
Counterclaim: It rained all last week and the ground is wet.
Rebuttal: A helicopter with a water bucket just flew overhead heading in the direction of the smoke.

Example #2 of the foundation of the Toulmin Model:

Grounds:  My thoroughbred horse was born in the state of New York
Claim: ,so my horse is eligible for extra winnings in races specifically for New York bred horses
Warrant:  ,since an equine born in New York will be considered a New York bred horse.

How the sentence would read to AEG:

My thoroughbred horse was born in the state of New York, so my horse is eligible for extra winnings in races specifically for New York bred horses, since an equine born in New York will be considered a New York bred horse.

Analyzing the above sentence regarding the horse, as an AEG might “read” it, the sentence has the following features:

  • 42 words and 181 characters
  • Reading level of 16.1
  • The word “so” signals additional information and acts as a “qualifier”
  • The word “since” signals a relationship
  • The word “equine” is a less commonly used/unique word and a synonym for horse
  • Longer sentences are correlated to higher skills in language usage
  • There are no misspelled words
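
The countable features in the list above can be reproduced with a few lines of Python. This is only a minimal sketch, not the code of any actual AEG: the function name `surface_features` and the cue-word set are hypothetical, and "characters" is assumed here to mean every character except spaces. Reading level and spelling are left out because they depend on the particular formula and dictionary a scoring engine uses.

```python
import re

SENTENCE = (
    "My thoroughbred horse was born in the state of New York, so my horse "
    "is eligible for extra winnings in races specifically for New York bred "
    "horses, since an equine born in New York will be considered a New York "
    "bred horse."
)

# Hypothetical cue words, taken from the bullet list above.
CUE_WORDS = {"so", "since"}


def surface_features(text):
    """Return simple countable features of a piece of text."""
    words = text.split()  # whitespace-separated tokens
    # Lowercase each token and strip punctuation before matching cue words.
    bare = [re.sub(r"[^a-z]", "", w.lower()) for w in words]
    return {
        "words": len(words),
        # Assumption: count every character that is not whitespace.
        "characters": sum(len(w) for w in words),
        "cue_words": sorted(set(bare) & CUE_WORDS),
    }


features = surface_features(SENTENCE)
print(features)
```

Under these assumptions the sketch reports 42 words and 181 characters, matching the figures in the list, and it flags both “so” and “since”.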

As a writing instructor, I may not like the sentence construction, and feel it uses too many words, but looking at it from the perspective of its “features”, I might think differently.  The sentence will likely score high in the “eyes” of a Robo-grader.

Organization, Style, & Word Value

We know from the research that automated essay graders can’t make judgements about evidence or content within the writing being scored.  We know from the original research around Project Essay Grader (PEG®) that the intrinsic value associated with good writing can’t be measured, so instead features that approximate the intrinsic qualities are defined and quantified.  Key words signal the complexity of the writing.  One such word is “because” and it is also linked to style (Shermis, Burstein, Higgins, & Zechner 2013).

One method of measurement used by AEG is to tie two or more features together to assess the complexity of the writing.  It appears that AEG likes clauses, and especially dependent clauses, because they show relationships and can be qualifiers.  Dependent clauses also make sentences longer.  Dependent clauses signal that more information is coming.  They also signal reasoning.  Subordinating conjunctions are almost always associated with dependent clauses and can be interpreted as “cue” words.  A word like “before” can cue the AEG into potential sequencing and organization.  “Rather than” can cue a turn in reasoning or topic.  The word “because” implies that a reason for the action or behavior will follow.  To a Robo-grader, “power words” like “because” not only show a relationship, but also increase sentence length, which increases reading level, which increases sentence complexity, and subsequently, equals a higher score.
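
The chain from sentence length to reading level can be illustrated with a readability formula. The source does not say which formula an AEG uses; the Flesch-Kincaid grade level is used below only as one well-known example, and the syllable count of 58 is an assumed hand count, not a figure from the research. The sketch compares the same 42 words written as three sentences and as one.

```python
def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level from raw counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Same words and syllables (assumed counts), different sentence lengths.
three_sentences = flesch_kincaid_grade(words=42, sentences=3, syllables=58)
one_sentence = flesch_kincaid_grade(words=42, sentences=1, syllables=58)

print(round(three_sentences, 1), round(one_sentence, 1))
```

Because words per sentence is one of the two terms in the formula, joining clauses with a word like “because” raises the computed grade level even when the vocabulary does not change.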

Robo-graders also like discourse markers, the words that help the text flow by showing time, cause and effect, contrast, comparison, qualification, and so on.  Examples of discourse markers can be words like however, likewise, until, consequently, and therefore.  Discourse markers are words that help connect sentences and ideas.  They are basically transition words or conjunctions.  These words match features that the AEG is looking for.
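
Spotting the cue words and discourse markers described above is, at its simplest, a matter of pattern matching. The following is a rough sketch under that assumption; the `MARKERS` list is hypothetical and contains only the examples named in this chapter, whereas a real system would presumably use a much longer list.

```python
import re

# Hypothetical marker list drawn from the examples in the text.
MARKERS = [
    "because", "before", "rather than",
    "however", "likewise", "until", "consequently", "therefore",
]


def count_markers(text):
    """Count how often each marker appears as a whole word or phrase."""
    lowered = text.lower()
    counts = {}
    for marker in MARKERS:
        # \b keeps a marker from matching inside a longer word.
        pattern = r"\b" + re.escape(marker) + r"\b"
        hits = len(re.findall(pattern, lowered))
        if hits:
            counts[marker] = hits
    return counts


sample = (
    "The trail was closed because of smoke. However, hikers stayed until "
    "dark rather than leaving before sunset."
)
print(count_markers(sample))
```

The sketch only counts; it has no way of telling whether the reason that follows “because” is a good one.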

The AEG can look for words with similar meanings.  The coding behind AEG will have clustered these similar meaning words together.  The words that are longer, thus containing more letters, will have a higher value.  The synonyms that are used less often and therefore considered more unique will also have a higher value.
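
A toy version of this word valuing might look like the sketch below. Everything in it is an assumption made for illustration: the two-word `CLUSTER`, the `word_value` formula, and the choice to measure how often a word is used by counting it in the horse sentence from Example #2. A real system would presumably draw its usage counts from a much larger body of text.

```python
import re
from collections import Counter

SENTENCE = (
    "My thoroughbred horse was born in the state of New York, so my horse "
    "is eligible for extra winnings in races specifically for New York bred "
    "horses, since an equine born in New York will be considered a New York "
    "bred horse."
)

# Hypothetical cluster of similar-meaning words.
CLUSTER = ["horse", "equine"]


def word_value(word, counts):
    """Toy formula: one point per letter plus a bonus for rarer words."""
    return len(word) + 1 / counts[word]


# Count each lowercase word; "horses" is counted separately from "horse".
counts = Counter(re.findall(r"[a-z]+", SENTENCE.lower()))
values = {word: word_value(word, counts) for word in CLUSTER}
print(values)
```

In this sketch “equine” outscores “horse” on both counts: it has one more letter and it appears once rather than three times.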

AEG cannot detect polysemy, the coexistence of many meanings for the same word.  Strictly counting the appearances of a word could therefore be misleading.  For example, consider the word “mine.”  It could be a personal pronoun, a hole in the ground, an explosive device, or part of the name of the 2009 Kentucky Derby winner, Mine That Bird.
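
The problem is easy to see in a short sketch. The three sentences below are invented for illustration and the helper `count_word` is hypothetical; the point is only that a plain word count gives the same credit to three unrelated senses.

```python
import re

# Invented sentences using three different senses of "mine".
sentences = [
    "That book is mine.",
    "The crew reopened the old coal mine.",
    "Mine That Bird won the 2009 Kentucky Derby.",
]


def count_word(word, text):
    """Count whole-word matches, ignoring case and meaning."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text.lower()))


total = sum(count_word("mine", s) for s in sentences)
print(total)  # 3
```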

Another aspect of automated essay grading is looking for word matches to the test/sample essays that set the basis for the scoring.  AEG has been trained to look for words that look like the highest scoring essays and then award value to the essay being scored based on its similarity to the sample set.
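
One very simplified way to picture this matching is to compare the set of words in a response with the set of words in each scored sample and borrow the score of the closest match. The sketch below does this with Jaccard overlap. It is an assumption-laden illustration, not how any particular AEG works: the two-item `SAMPLES` list and its scores are made up, and real systems rely on statistical models trained on many essays.

```python
import re


def word_set(text):
    """Lowercase the text and return its distinct words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a, b):
    """Shared words divided by all words across both sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Made-up miniature sample set: (text, score from a human reader).
SAMPLES = [
    ("Smoke is in the air because there is a forest fire nearby.", 4),
    ("I like summer.", 1),
]


def predict_score(response):
    """Return the score of the sample sharing the most vocabulary."""
    target = word_set(response)
    best = max(SAMPLES, key=lambda s: jaccard(target, word_set(s[0])))
    return best[1]


print(predict_score("There is a forest fire nearby because smoke is in the air."))
```

Note that the response in the usage line reorders the first sample's words and still receives its score, which reflects the earlier point that word matching does not judge the merits of the argument.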

Length of the essay is another vital component to pay attention to.  If a suggested word length is given, for example, 300-500 words, it is an important piece of information.  Failing to meet the suggested minimum number of words may leave the AEG unable to find a paper in the sample set to match.  Essays that are short will lack many of the features the AEG is looking for.  The available research suggests that the programming assumes shorter essays equate to lower quality writing.  At the opposite end of the spectrum, going beyond the suggested length may not gain the writer additional value, as the essay has already been assigned the value of length and the assumption would be that more words aren’t necessary.  Demonstrating what 300 words looks like in print can be a helpful tool for students to increase their awareness and understanding of the expectation for the AEG.
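
Students can check a draft against a suggested range themselves. The helper below is a hypothetical classroom aid rather than part of any scoring engine; its default limits simply reuse the 300-500 word example, and it counts words by splitting on spaces.

```python
def length_check(text, minimum=300, maximum=500):
    """Return the word count and where it falls against the range."""
    count = len(text.split())
    if count < minimum:
        return count, "below suggested minimum"
    if count > maximum:
        return count, "above suggested maximum"
    return count, "within suggested range"


draft = "word " * 250  # stand-in for a 250-word draft
print(length_check(draft))
```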

Organizational style and structure can take on a different meaning when an automated essay grader is the final evaluator of an essay.  An argumentative essay may not be the style of essay the student is familiar with, and the student may therefore need additional guidance in “thinking like a Robo-grader”.  The style of essay students will be asked to write will involve Evidence Based Writing (EBW), which is not the typical essay students learn to write.  Traditionally, students are taught to engage with writing on a personal level.  Robo-graders cannot handle the nuances of expression and may penalize the writer for vocabulary choices.  Longer writing will rate higher, while fragments will decrease the score, even if they are stylistically appropriate.  Word choice can take on a different meaning and significance when preparing to write for an automated essay grader.