AI as a Foreign Language Assignment Grader
by Michael Cai on Jul 1st, 2024
Introduction
When it comes to recent major influences on education, few things are as prominent as generative AI chatbots. Encompassing tools like ChatGPT and Google’s Bard, generative AI has swept through schools across the country at an unprecedented rate. Its ability to solve problems and produce written work that can seemingly bypass plagiarism checkers (N.B. there are now AI-powered tools that claim to detect AI-generated work) has forced curricula to evolve rapidly with this new “AI revolution.” It’s understandable, then, that terms like “ChatGPT” and “AI helper” have earned a poor reputation among teachers and instructors. But ironically enough, many educators have themselves begun to use AI tools, and for good reason! Many tasks required of instructors are quite tedious. From grading multiple choice answers to writing trivial emails, these duties are time-consuming and detract from time that could be spent communicating with students and designing quality lesson plans. AI help with the former can be especially valuable, since grading is often little more than mind-numbing tedium. And given the customizable nature of AI tools, they serve as a fitting solution, especially for groups like world language instructors.
Why AI for Language Educators?
Before jumping into applications, it is important to understand why AI tools in particular are so powerful. Unlike traditional software tools, modern AI models are built on what is known as a neural network: a complex, multilayer web of connections between nodes (or “neurons”) whose nonlinearity can, in theory, capture almost any type of pattern. These networks give AI models a malleability that lets them fit nearly any mold a user requires, which is what makes customization possible. For instance, if a teacher properly queries ChatGPT for particular grading help (using something called a “prompt”), the chatbot can, in principle, perform whatever grading task was asked.
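To make “multilayer” and “nonlinearity” a little more concrete, here is a toy sketch in Python. Everything about it is invented for illustration (the sizes, the random weights); real chatbots work the same way in spirit but with billions of learned weights:

```python
import numpy as np

def relu(x):
    # The nonlinearity: without it, stacking layers would collapse
    # into one linear formula and lose most of its pattern-finding power.
    return np.maximum(0, x)

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # connections from 3 inputs to 4 hidden "neurons"
W2 = rng.normal(size=(2, 4))   # connections from 4 hidden neurons to 2 outputs

def forward(x):
    hidden = relu(W1 @ x)      # each hidden neuron mixes all inputs, then "bends"
    return W2 @ hidden         # the outputs mix the hidden neurons

print(forward(np.array([1.0, 0.5, -0.2])))
```

In a real model, training nudges those weight values until the network's outputs match the patterns in its training data; that adjustability is the malleability described above.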
So why is this particularly useful for language teachers? Unlike in other subjects, automatically grading language assignments has traditionally required multiple tools for multiple levels of processing: a program would first have to translate the text into something it could understand (e.g., English for programs designed around the English language), and only then could it perform the grading task at hand. Generative AI condenses this into one step. Chatbots like ChatGPT are trained on information scraped from all over the internet, so users can expect these models to understand any language that is well documented online. The chatbot can therefore interpret a submission in its original language and, depending on the prompt, output something in the same language or an entirely different one.
Additionally, fast grading matters especially for language learning. When first learning a language, exercises can feel much more like math problems than anything you would find in an English class. Writing style, tone, and sentence flow are typically less important than black-and-white topics like basic grammar, and AI thrives in these settings, since questions with objective answers are the easiest for it to evaluate. Also, in beginning language exercises, quantity can matter more than quality: to learn a language fluently, individuals need constant exposure, which means repetition on a massive scale until the new grammar and vocabulary feel like second nature. As a result, perhaps more than in any other subject, quick grading is incredibly important.
Simple Question Graders
So how exactly can educators use AI to help with grading? The most obvious candidates are short answer questions and multiple choice questions. Interestingly, generative AI performs better on short answer questions than on multiple choice. The reason is that generative AI chatbots form sentences word by word, where each word is the one most statistically likely to appear given the words already written. They have no built-in mechanism for comparing multiple choice answers against a solution key, so on something that binary, the mistakes stand out. Short answer questions, on the other hand, offer more leeway and thus allow these chatbots to shine.
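As a rough illustration of that word-by-word process, consider this toy sketch. The vocabulary and probabilities are entirely made up; a real model chooses among tens of thousands of word pieces using a neural network, but the loop is the same idea:

```python
# Toy "language model": for each word, the possible next words and
# their probabilities. All values here are invented for illustration.
next_word = {
    "<start>": {"the": 0.7, "my": 0.3},
    "the": {"student": 0.6, "answer": 0.4},
    "student": {"wrote": 1.0},
    "wrote": {"well": 0.6, "poorly": 0.4},
}

def generate():
    word, sentence = "<start>", []
    while word in next_word:
        # Pick the statistically most likely continuation,
        # exactly the word-by-word process described above.
        word = max(next_word[word], key=next_word[word].get)
        sentence.append(word)
    return " ".join(sentence)

print(generate())  # "the student wrote well"
```

Notice that nothing in this loop compares an answer sheet against a key; the model only ever asks “what word plausibly comes next?” That is why tasks like multiple choice scoring sit awkwardly with it.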
To be explicit, here is an example prompt tool that can help check the solution to a short answer question. The tool is provided on Lowtech AI, but for variety, here are the prompt and response as seen on ChatGPT:
INPUT: Here is a prompt "introduce yourself in Latin." Evaluate how well the following response answers that prompt and provide a grade. Here is the response "Salvete omnes! Mihi nomen est Michael. Como te habes?"
OUTPUT: The provided response is a good attempt at introducing oneself in Latin. The response translates to:
"Hello, everyone! My name is Michael. How are you doing?"
The introduction contains the basic elements, such as a greeting and stating the name. However, it lacks some essential information that is often included in a self-introduction, such as age, occupation, or background. Additionally, the last part, "Como te habes?" is not a correct translation for "How are you doing?" in Latin. The correct Latin phrase for "How are you?" would be "Quomodo vales?" or "Quid agis?".
Considering these points, I would grade the response as follows:
Grade: B-
The response demonstrates a basic understanding of introducing oneself in Latin but lacks some key information and has a minor error in the phrase used to ask how someone is doing. With a few improvements, it could receive a higher grade.
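For teachers (or a technically inclined colleague) who want to run this kind of prompt over a whole stack of submissions rather than pasting each one into the chat window, the same idea can be scripted. Here is a minimal sketch using OpenAI's Python library; the model name and the exact wording are just one possible choice, not a recommendation from the prompt tool itself:

```python
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

def grade_short_answer(assignment_prompt, student_response):
    # Mirrors the prompt shown above: state the assignment, then ask
    # for an evaluation and a grade of the student's response.
    message = (
        f'Here is a prompt "{assignment_prompt}". Evaluate how well the '
        'following response answers that prompt and provide a grade. '
        f'Here is the response "{student_response}"'
    )
    reply = client.chat.completions.create(
        model="gpt-3.5-turbo",  # any chat model would work here
        messages=[{"role": "user", "content": message}],
    )
    return reply.choices[0].message.content

print(grade_short_answer(
    "introduce yourself in Latin",
    "Salvete omnes! Mihi nomen est Michael. Como te habes?",
))
```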
One thing to note (as mentioned in the prompt tool): it is important to include the foreign language in the "prompt" field if applicable. Otherwise, the grader might assign a very low grade, thinking the student responded in some random language unrelated to the task. Although this tool works best on computer text, handwritten work can easily be converted. Another AI technology known as OCR (optical character recognition) can turn a picture of text into computer text that AI tools can understand. As of now, ChatGPT and other AI chatbots do not have free, built-in OCR capabilities, but there are many public OCR tools on the internet.
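As one concrete option among those public tools, the open-source Tesseract engine (via the pytesseract Python wrapper) can pull text out of a scanned worksheet. This sketch assumes Tesseract and its Latin language pack are installed locally, and the file name is hypothetical:

```python
from PIL import Image
import pytesseract

# Convert a photo or scan of a student's work into plain text that can
# be pasted into a chatbot prompt. Assumes the Tesseract engine and its
# Latin ("lat") language data are installed on this machine.
image = Image.open("student_response.png")  # hypothetical file name
text = pytesseract.image_to_string(image, lang="lat")
print(text)
```

Accuracy on handwriting varies, so it is worth skimming the extracted text before grading from it.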
In contrast, here is what a hypothetical multiple choice grader would return:
INPUT: Here are a student's answers for a multiple choice set A, B, C, True, D, E, False. Here are the solutions A, C, C, False, D, E, False. Return the grade the student received as a fraction of the points earned divided by the total amount of points possible.
OUTPUT: Let's calculate the grade the student received as a fraction.
Total points possible = Number of questions * Points per question
Total points possible = 7 * 1 = 7
Number of points earned = Number of correct answers * Points per question
Number of points earned = (3 A's + 2 C's + 1 False + 1 E) * 1 = 3 + 2 + 1 + 1 = 7
Grade = Points earned / Total points possible
Grade = 7 / 7 = 1
So, the student received a grade of 1, which means they answered all the questions correctly and earned full marks.
As previously mentioned, the AI performs poorly with multiple choice. Notice that the chatbot's arithmetic above is confidently wrong: the student's second and fourth answers do not match the key, so the true score is 5/7, not a perfect 7/7. Currently, these forms of AI are not well suited to applications with fine margins of error; more specialized AI tools are used in high-risk areas like cancer detection or stock prediction, since they are specifically tuned to limit error on one task. Thankfully, many online multiple choice graders existed well before today's AI boom, though an OCR step may be needed to convert handwritten answers to computer text, as in the previous example.
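In fact, the comparison the chatbot fumbles above takes only a few lines of ordinary, deterministic code, with no AI involved; a minimal sketch:

```python
def grade_multiple_choice(answers, key):
    # Plain comparison against the answer key: no statistics, no guessing.
    if len(answers) != len(key):
        raise ValueError("answer sheet and key have different lengths")
    earned = sum(a == k for a, k in zip(answers, key))
    return earned, len(key)

student = ["A", "B", "C", "True", "D", "E", "False"]
key = ["A", "C", "C", "False", "D", "E", "False"]
earned, total = grade_multiple_choice(student, key)
print(f"{earned}/{total}")  # prints 5/7, the score the chatbot missed
```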
Writing Analysis
Another good application of AI is text analysis. Simple text analysis, such as identifying grammar errors or inappropriate wording and connotation, is discussed in another article found here. There you will find examples and prompt tools concerning these topics.
One added benefit of using AI for error detection is that, in addition to grading and finding mistakes, the AI can quickly provide a list of correct uses of whatever construction the student misused, or generate corresponding practice problems. This can be achieved with follow-up prompts, i.e., prompts asked in response to the chatbot's answers. Within a conversation, the chatbot has a form of memory of its previous responses, so it can build on details already mentioned.
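Under the hood, that "memory" is simply the conversation so far being sent back with each new request. Here is a hedged sketch using OpenAI's Python library (the model name and the Latin example sentence are just illustrative choices):

```python
from openai import OpenAI

client = OpenAI()
history = [{"role": "user", "content": "Evaluate this Latin sentence for "
            "grammar errors: 'Discipuli in via ambulat.'"}]

def ask(history):
    reply = client.chat.completions.create(
        model="gpt-3.5-turbo", messages=history)
    answer = reply.choices[0].message.content
    # Append the model's answer so the next question can refer back to it.
    history.append({"role": "assistant", "content": answer})
    return answer

print(ask(history))

# Follow-up prompt: the whole history travels with it, which is what
# lets the chatbot "remember" the error it just found.
history.append({"role": "user",
                "content": "Generate three practice problems on that error."})
print(ask(history))
```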
A newer tool more specific to grading is the custom text analysis prompt, found here. This prompt produces an evaluation of a sample response based on specified criteria. An example is provided below:
INPUT: Evaluate the following student's work based on the use of the imperative. Respond in English. Here is the student's work: "Discipule," magister dixit, "venis huc."
OUTPUT: The student's work shows an attempt to use the imperative form in Latin. However, there are a few issues that need to be addressed:
- In Latin, the imperative form is used to give commands or instructions. The verb "venis" should be in the imperative form to match the command, but it is in the indicative form, which indicates a statement or a fact.
- The word "Discipule" should be written as "Discipule" (if addressing one male student) or "Discipula" (if addressing one female student). Alternatively, if addressing multiple students, it should be "Discipuli" (for male students) or "Discipulae" (for female students).
- The word "huc" is correct, meaning "here," indicating the place where the student should come.
A corrected version of the student's work using the imperative form would be:
"Discipule/Discipula/Discipuli/Discipulae, veni huc."
This translates to: "Student, come here."
Remember, the imperative form in Latin is essential for giving commands or instructions, so it's important to use it correctly to convey the intended meaning.
As you can see, the AI chatbot provides additional information it thinks could be useful. And again, even more information (such as sample practice problems or correct uses) can be extracted with further follow-up prompts.
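One way to reuse this pattern across a whole class is a small template that slots in whatever criterion the lesson calls for. The helper below is hypothetical, simply mirroring the wording of the custom prompt above:

```python
def build_analysis_prompt(criterion, student_work):
    # Hypothetical helper modeled on the custom text analysis prompt:
    # swap in any grading criterion (use of the imperative, the
    # subjunctive, chapter vocabulary, etc.) and the student's work.
    return (
        f"Evaluate the following student's work based on {criterion}. "
        f"Respond in English. Here is the student's work: \"{student_work}\""
    )

print(build_analysis_prompt(
    "the use of the imperative",
    '"Discipule," magister dixit, "venis huc."',
))
```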
Conclusion
Indeed, AI has real potential to expedite grading, a task especially important for language teachers. But as hinted at before, there are caveats. First, as the multiple choice example shows, AI chatbots are not perfect: they can make mistakes when grading, so each evaluation should be checked. Additionally, the evaluations an AI model returns will lack context and emotion. It is nearly impossible to fit into a single prompt every experience with or fact about a specific student, so AI tools will always know less about the person behind the answer than the teacher does. These are drawbacks that everyone intending to use AI should keep in mind. So, even though AI can be a fantastic auxiliary tool, it should not replace the human connection and guidance necessary for quality educator-student interactions.