Can Computers Score Essays?

We live in an age in which machines can be programmed to do almost anything imaginable, and automation has become a common part of everyday life. Yet most of us remain wary of relying on automated systems for tasks that require intelligent human thought, especially those related to learning. This leads to the concept of artificial intelligence and raises the question, “Can computers learn?” According to Ellis Batten Page, Ph.D., Professor of Educational Psychology at the University of Connecticut, whose pioneering work in computational linguistics has distinguished him as the father of computer-based essay scoring, the answer is a resounding “Yes.”

Dr. Page first proposed the idea of using computing technology to evaluate and score written prose in the early 1960s. As a former high-school teacher, Page knew that the best way to improve writing skills was to practice writing, and to do so often. He also knew that, among the many demands facing teachers in the classroom, the time and effort required to read and evaluate student writing were the primary constraints. He hypothesized that if the time-consuming drudgery of evaluating written prose could be reduced through automated evaluation and scoring, teachers would be inclined to assign more writing exercises. In turn, students given more opportunities to write, with the expectation of receiving feedback on their writing, could substantially improve their writing skills.

To address the need for an automated method of evaluating and scoring prose, Page and his associates first defined a set of characteristics intrinsic to the art of writing well (including diction, grammar, vocabulary, cohesion, and fluency). Next, they identified more than 300 mathematical approximations, or correlates, of these characteristics that a computer program could measure. These included sentence length, word count, the number of verbs per sentence, the number of misspelled words, and the presence or absence of transitional words and phrases. Finally, using proven statistical techniques, Page built mathematical models designed to analyze the characteristics of written text, associate those characteristics with qualitative scores rendered by human judges, and produce an independent computer-generated score. This research yielded one of the leading automated essay-scoring software solutions, known today as Project Essay Grade, or PEG. Initial results showed that PEG could produce scores highly comparable to those awarded by human judges, validating Page’s belief that human scoring could be reliably modeled and simulated by a computer (thus satisfying the test of artificial intelligence).
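
To make the idea of measurable approximations concrete, the following is a minimal Python sketch of how a few of the surface measures named above (word count, sentence length, transitional words) might be computed. It is an illustration only, not PEG's actual code: the function name `extract_features`, the abbreviated `TRANSITIONS` list, and the crude sentence-splitting rule are all assumptions made for this example.

```python
import re

# Hypothetical, abbreviated list of transitional words (not PEG's actual list).
TRANSITIONS = {"however", "therefore", "moreover", "furthermore", "consequently"}


def extract_features(essay: str) -> dict:
    """Compute a few simple surface measures of an essay."""
    # Crude sentence split on runs of ., ! or ?; real text needs more care.
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    # Lowercased word tokens made of letters and apostrophes.
    words = re.findall(r"[A-Za-z']+", essay.lower())
    word_count = len(words)
    return {
        "word_count": word_count,
        "sentence_count": len(sentences),
        "avg_sentence_length": word_count / len(sentences) if sentences else 0.0,
        "transition_count": sum(1 for w in words if w in TRANSITIONS),
    }


sample = "The plan was sound. However, it failed. Therefore, we tried again."
features = extract_features(sample)
print(features)
```

Each measure is a number that stands in for a quality a human reader would judge directly; on its own it says little, which is why such measures are combined statistically with human scores.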

Although significant, PEG’s contribution to computational linguistics remained largely academic until advances in computing technology enabled a more practical application. Driven by a variety of factors during the 1990s, including the rising costs of human scoring and the demand for instantaneous results, automated essay scoring was fast becoming a viable scoring alternative for diagnostic writing assessment and test preparation. Page and his colleagues re-engineered the software to take advantage of object-oriented programming techniques, Web-based technologies, and even more advanced statistical methodologies. In 2003, Measurement Incorporated acquired the PEG technology and continues to develop and extend Page’s pioneering efforts in service to its customers in education and business.

Even with more than 40 years of research, study, and enhancement, the accuracy of PEG scores ultimately depends on how well the software trains itself (or learns) to grade an essay. To that end, PEG analyzes hundreds of “real world” essays and compiles a wealth of statistics about their characteristics. In addition, this “training set” is graded by experienced professional judges at Measurement Incorporated using a standard set of scoring criteria. By analyzing the judges’ scores and the computed characteristics of the training set, PEG learns to identify the prominent factors that judges use in making scoring decisions. Combining these factors, PEG produces a mathematical model that predicts how the judges might score an essay. In short, the computer does not really grade an essay at all. It uses “predictive modeling” to simulate how expert professional judges would grade the essay, based on their previous decisions. The model used to score essays therefore reflects the quality of the decisions made by those who graded the training set.
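
The training process described above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not PEG's method: it assumes a single feature (word count), an ordinary least-squares line as the statistical technique, a six-point score scale, and invented training data. The names `fit_line` and `predict_score` are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_score(model, x, lowest=1, highest=6):
    """Predict a judge's score, rounded and clipped to the score scale."""
    slope, intercept = model
    raw = slope * x + intercept
    return max(lowest, min(highest, round(raw)))


# Invented training set: one feature per essay plus the judges' score.
word_counts = [100, 200, 300, 400, 500]
judge_scores = [2, 3, 4, 5, 6]

model = fit_line(word_counts, judge_scores)
print(predict_score(model, 300))
```

The fitted model never sees what "good writing" is; it only learns the relationship between the measured feature and the scores the judges assigned, so any inconsistency in those scores is carried into its predictions.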

With more than 25 years of experience in scoring high-stakes writing assessments for state departments of education, Measurement Incorporated has earned a reputation for scoring quality unrivaled in the testing industry. The industry uses several measures of score reliability, of which percent agreement is the most common. It measures how often human readers agree with one another, within one point, when grading the same essay. Although human scoring environments can differ significantly depending on factors such as the level of training or the clarity of the scoring criteria, well-trained judges typically agree 85 to 95 percent of the time (on a six-point scale). As a matter of policy, Measurement Incorporated will not release for production use a PEG model that yields less than 90% agreement with the judges who scored the training set. This policy is intended to ensure the reliability of PEG scores. Most of the MI-trained PEG models in production today yield between 93% and 97% agreement.
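
As a worked illustration of percent agreement, the Python sketch below counts how often two sets of scores for the same essays fall within one point of each other, then checks the result against the 90% release threshold mentioned above. The function name `percent_agreement` and the score lists are invented for this example and do not come from any actual PEG evaluation.

```python
def percent_agreement(scores_a, scores_b, tolerance=1):
    """Percentage of essays whose two scores differ by at most `tolerance`."""
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must be the same length")
    if not scores_a:
        raise ValueError("score lists must not be empty")
    agreeing = sum(
        1 for a, b in zip(scores_a, scores_b) if abs(a - b) <= tolerance
    )
    return 100.0 * agreeing / len(scores_a)


# Invented scores on a six-point scale for ten essays.
judge_scores = [4, 3, 5, 2, 6, 4, 3, 5, 1, 4]
model_scores = [4, 4, 5, 4, 5, 4, 3, 3, 1, 4]

agreement = percent_agreement(judge_scores, model_scores)
meets_release_threshold = agreement >= 90.0
print(agreement, meets_release_threshold)
```

In this made-up sample, two of the ten pairs differ by two points, so agreement is 80% and the hypothetical model would fall short of the 90% threshold.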

Measurement Incorporated has conducted writing research projects for the Connecticut, Georgia, and Tennessee state departments of education, indicating that PEG’s essay scoring is comparable in accuracy to that of trained human judges. In 2006, PEG software scored more than 110,000 student writing samples for clients who provide test preparation services for assessments such as the SAT and GED, as well as a variety of writing improvement programs aimed at grades K-12 and adult learners.

Measurement Incorporated’s corporate clients using PEG technology today include LearningExpress, Educational Records Bureau (ERB), and ProWrite. Since 2000, PEG has been used for a variety of practice writing assessments in the LearningExpress Library, an interactive online learning platform of practice tests and tutorial course series that is available in more than 3,000 public, college, and school libraries in the United States. For LearningExpress Advantage, Measurement Incorporated provides PEG automated essay scoring for hundreds of online practice tests for the SAT, ACT, GED, and middle- and high-school writing exams.

Since 2005, Measurement Incorporated has developed, hosted, and provided support for ERB’s Writing Practice Program, a Web-based program that provides practice and immediate evaluation (using PEG) in six areas of writing. The program incorporates a state-of-the-art scoring rubric as well as unlimited access to tutorials. Approximately 20,000 students in ERB schools around the world use the program each year.

The ProWrite program includes online writing practice tests, scored by PEG, and online tutorials linked to the specific writing skill areas diagnosed by the assessment. It allows businesses and organizations to assess the writing skills of job applicants or improve the writing of current employees. Clients include law enforcement organizations, educational agencies, corporations, and financial institutions.

In the final analysis, the quality of the scoring model depends on the quality of the human decisions being modeled. Users can have confidence in a PEG automated essay-scoring solution because it reflects not only PEG’s rich history but also MI’s 25+ years as the nation’s preeminent provider of professional hand-scoring services.
