AI Text Analysis for Language-Learning

by Michael Cai on Jun 17th, 2023

Introduction

Making mistakes is inevitable when learning a language. After all, every language is rife with linguistic subtleties and nuances that only become understandable after years of exposure. As valuable as mistakes can be, they’re practically useless if left ignored or undetected. While instructors and native speakers can certainly help identify mistakes, students often don’t have these resources at their disposal. That’s where artificial intelligence comes in.

AI text analysis isn’t anything new: spell-checkers have existed since the late 20th century, and tools like Grammarly have been around for over a decade. However, most of these tools are either hidden behind a paywall or limited in what they can do. Each tool is dedicated to a certain task, whether checking grammar or spelling, and is restricted to a particular set of capabilities. Additionally, many of these tools only work on English, which immediately rules out a large share of existing spelling and grammar checkers when it comes to learning a new language. Recently, however, a more versatile form of AI has emerged in ChatGPT, Google’s Bard, and other generative AI chatbots. These tools are unique in that they aren’t programmed to complete one specific task; rather, they learn from a large dataset consisting of information from all over the internet, so as to be able to “generate” responses to any question asked (hence their name). Not surprisingly, text analysis is a major use of generative AI tools. And while tools built specifically for grammar help are confined to a small repertoire of capabilities, these models can be prompted to help with almost any question, no matter how random or obscure.

Let’s take a look at what generative AI can actually do.

Error detection

One of the most obvious uses of generative AI (or any type of AI, for that matter) is error detection. More specifically, AI tools can help detect spelling and grammar errors in a given block of text. Here is an example of a prompt that could be used with such a tool: “Given the text [paste the desired text here], return as many grammar and spelling errors as possible that appear within the text. Format the response as a list.” The prompt is quite lengthy, and deliberately so: the more information the AI is given, the more closely the response will align with what the user has in mind. As with all generative AI prompts, the one provided above is customizable. If the user does not want the output as a list, they could replace “Format the response as a list” with “Format the response in a paragraph” or “Format the response in a single sentence.” No matter the change, each instruction should read as a command. This means building it around an action word (a verb such as “return,” “provide,” or “list”).
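The customization described above can be sketched as a small template function. This is only an illustration: the function name `build_error_prompt` and its `output_format` parameter are hypothetical and are not part of LowTech AI or ChatGPT. It simply fills the article's prompt with the learner's text and the preferred response format.

```python
def build_error_prompt(text: str, output_format: str = "as a list") -> str:
    """Fill the error-detection prompt with the learner's text.

    `output_format` completes the sentence "Format the response ...",
    e.g. "as a list", "in a paragraph", or "in a single sentence".
    """
    cleaned = text.strip()
    if not cleaned:
        raise ValueError("text must not be empty")
    return (
        f'Given the text "{cleaned}", return as many grammar and spelling '
        "errors as possible that appear within the text. "
        f"Format the response {output_format}."
    )


# Example: the Latin sentence used later in this article.
prompt = build_error_prompt("sallve, pueri in villa est")
print(prompt)
```

The resulting string can be pasted into any chatbot; only the final sentence changes when a different format is requested.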

The prompt provided is available here through LowTech AI. It’s important to note that it is not necessary to specify the language in which the text is written. This is because generative AI models typically have billions of parameters and are trained on billions of words. Due to their large datasets (also known as “corpora”), these models have most likely encountered every language documented on the internet, and thus are able to identify the language by themselves. That said, generative AI models’ knowledge is limited to the corpora on which they are trained. So if the user inputs an extremely obscure language, the model may misinterpret the input text. In cases where languages can be confused due to similar syntax, orthography, or vocabulary, specifying the language explicitly in the prompt can be very helpful (for instance, include “The text is in language X”).
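As a rough sketch of the optional language hint, the helper below appends “The text is in language X” to an existing prompt only when a language is supplied. The name `add_language_hint` is an assumption made for illustration, not a feature of any particular tool.

```python
from typing import Optional


def add_language_hint(prompt: str, language: Optional[str] = None) -> str:
    """Append an explicit language statement to a prompt, if one is given.

    With no language, the prompt is returned unchanged and the model is
    left to identify the language on its own.
    """
    if language is None or not language.strip():
        return prompt
    return f"{prompt} The text is in {language.strip()}."


base = "Return as many grammar and spelling errors as possible."
print(add_language_hint(base))            # unchanged
print(add_language_hint(base, "Latin"))   # with the explicit hint
```

Leaving the hint off by default mirrors the advice above: state the language only when closely related languages might be confused.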

All this comes with a caveat: if the model is struggling to identify the language, chances are the language isn't well documented online. Generative AI models learn a language from a large number of high-quality sentences and phrases, and then generate sentences based on the patterns and statistical relationships observed in that training data. As a result, when the training data is limited in size or low in quality, the output might end up incoherent or outright gibberish. At that point it might be more worthwhile to consult a native speaker than AI.

Due to this limitation, we will focus on widely known languages. For example, when the input text is written in Latin, the LowTech AI interface produces the following output:

Indeed, there are still some errors in the output, such as point number five, which states “‘in’ should be lowercase as it is a preposition, not a proper noun” even though “in” is already lowercase. Different AI models will have different degrees of accuracy. One of the most reliable is ChatGPT, which gave the following output in response to the same prompt:

INPUT: Given the text “sallve, pueri in villa est,” return as many grammar and spelling errors as possible that appear within the text. Format the response in a list.

OUTPUT: Here is a list of grammar and spelling errors within the given text:

"sallve" - The correct spelling should be "salve" (assuming you meant to say "salve," which means "hello" or "greetings" in Latin).
"pueri in villa est" - The verb "est" should agree with the subject "pueri" (which means "boys" in Latin). The correct form should be "pueri in villa sunt" to indicate that "boys are in the villa."

Please note that I have assumed the context to be Latin based on the presence of Latin words in the text. If you intended it to be in a different language or if there are specific grammar rules you were following, please provide more details for a more accurate analysis.

As of now, the ChatGPT model provides clearer, more concise, and more accurate information than other chatbots. It also tends to provide supplementary information that benefits the task at hand (here, it notes that it had to “guess” what language the input is written in). Consequently, examples from here on out will be given as ChatGPT outputs. The prompts are easily transferable from LowTech AI to ChatGPT: users can retrieve the prompt by pressing “Show Prompt” on LowTech AI and paste it into ChatGPT. Note that there may be a bolded and underlined “text” that serves as a placeholder for the input text. Just replace the placeholder with the actual input text when submitting to ChatGPT.

Writing suggestions

Writing suggestions can be broken down into two categories: help with vocabulary and help with grammatical constructions.

Help with vocabulary is probably the more straightforward of the two: the AI will return synonyms for particular words that it considers more fitting for the situation. Here is an example prompt on LowTech AI. And below is an example of the prompt in ChatGPT:

INPUT: Given the following text, return what words should be substituted for synonyms that better describe the situation or synonyms that avoid repetition (format the output in a bullet-pointed list): el hombre que camina por la calle verá un gran globo. decidirá comprárselo a su hijo, que cumple años mañana.

OUTPUT: Here is a bullet-pointed list suggesting word substitutions for synonyms or to avoid repetition:

el hombre que camina por la calle verá un gran globo. decidirá comprárselo a su hijo, que cumple años mañana.
- el hombre que camina por la calle -> un transeúnte
- verá -> observará
- un gran globo -> un enorme globo
- decidirá -> optará por
- comprárselo a su hijo -> adquirírselo a su hijo
- que cumple años mañana -> cuyo cumpleaños es mañana

Revised sentence: Un transeúnte observará un enorme globo y optará por adquirírselo a su hijo, cuyo cumpleaños es mañana.
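A response in this shape can also be processed programmatically. The sketch below assumes the chatbot keeps the “- original -> replacement” bullet format shown above, which is not guaranteed from one response to the next; `parse_substitutions` is a hypothetical helper name.

```python
from typing import List, Tuple


def parse_substitutions(response: str) -> List[Tuple[str, str]]:
    """Extract (original, suggestion) pairs from '- a -> b' bullet lines.

    Lines that are not bullets, or that lack the ' -> ' separator,
    are skipped rather than treated as errors.
    """
    pairs = []
    for line in response.splitlines():
        line = line.strip()
        if not line.startswith("- ") or " -> " not in line:
            continue
        original, suggestion = line[2:].split(" -> ", 1)
        pairs.append((original.strip(), suggestion.strip()))
    return pairs


# A few bullets copied from the ChatGPT output above.
sample = """\
- verá -> observará
- un gran globo -> un enorme globo
- decidirá -> optará por
"""
for original, suggestion in parse_substitutions(sample):
    print(f"{original!r} could become {suggestion!r}")
```

Turning the list into pairs makes it easy to review suggestions one at a time instead of accepting the revised sentence wholesale.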

The second category (help with grammatical constructions) means providing different ways to express the same information in more concise, less repetitive, or more appropriate ways. Oftentimes this requires changing the grammatical or syntactical structure of the sentence, which is why this category relates to “grammatical constructions.” Here is an example prompt on LowTech AI. And below is an example of the prompt in ChatGPT. To further clarify the task at hand, the input text is given in English as opposed to a foreign language.

INPUT: Given the following text, return what phrases should be substituted for grammatical constructions that better describe the situation or that avoid repetition (format the output in a bullet-pointed list).

The man walking down the street will see a big balloon. He will decide to buy it for his son, who has his birthday tomorrow.

OUTPUT: Here is a bullet-pointed list suggesting phrase substitutions for grammatical constructions to better describe the situation or avoid repetition:

the man walking down the street will see a big balloon. he will decide to buy it for his son, who has his birthday tomorrow.
- the man walking down the street -> a pedestrian strolling along the road
- will see -> will come across
- a big balloon -> a large balloon
- he will decide to buy it for his son -> he will make the decision to purchase it for his son
- who has his birthday tomorrow -> whose birthday is tomorrow

Revised sentence: A pedestrian strolling along the road will come across a large balloon. He will make the decision to purchase it for his son, whose birthday is tomorrow.

Sentiment analysis

Another major roadblock when learning new languages is understanding the connotations of words. One word can take on many different meanings depending on the scenario, and traditionally, students learn to grasp these subtleties only after extensive exposure to the language. However, through sentiment analysis, AI can help expedite the process. Given a text input, the AI can return whether the connotation of the text is positive, negative, or something else. Here is the prompt on LowTech AI. The ChatGPT example below is again in English for clarity, but it can be applied to any language. Also, two examples are juxtaposed to show how a single word changes the sentiment.

EXAMPLE 1

INPUT: Given the following text, provide a sentiment analysis of the text (whether the sentiment is positive, negative, or has some other quality), and provide a reason why the sentiment was chosen. The bug is too large

OUTPUT: Sentiment Analysis: Negative

Reasoning: The sentiment of the text is considered negative because the phrase "too large" implies that the bug's size exceeds a desirable or acceptable level. The word "too" indicates an excessive or undesirable extent, suggesting that the bug's size is seen as a negative attribute or inconvenience. The use of "large" alone might have been neutral or positive, but when combined with "too," it conveys a negative sentiment.

EXAMPLE 2

INPUT: (the same prompt as in Example 1, with the text changed to “The bug is very large”)

OUTPUT: Sentiment Analysis: Neutral/Positive

Reasoning: The sentiment of the text is considered neutral or slightly positive. The phrase "very large" indicates that the bug's size is significant or substantial. While the word "large" itself is neutral, the addition of "very" intensifies the magnitude, suggesting a more positive perception. The text does not convey any explicit negative connotations or indications of dissatisfaction, making it overall a neutral or slightly positive statement about the bug's size.
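To compare several sentences side by side, the label and the explanation can be pulled out of each response. The following is a minimal sketch that assumes the response keeps the “Sentiment Analysis: … / Reasoning: …” layout seen in the two examples; a chatbot may phrase its answer differently, in which case the hypothetical `parse_sentiment` helper returns `None` for the missing part.

```python
import re
from typing import Optional, Tuple


def parse_sentiment(response: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (label, reasoning) from a response in the layout shown above."""
    # The label is whatever follows "Sentiment Analysis:" on the same line.
    label = re.search(r"Sentiment Analysis:\s*(.+)", response)
    # The reasoning may span several lines, so let "." match newlines too.
    reason = re.search(r"Reasoning:\s*(.+)", response, flags=re.DOTALL)
    return (
        label.group(1).strip() if label else None,
        reason.group(1).strip() if reason else None,
    )


reply = (
    "Sentiment Analysis: Negative\n\n"
    'Reasoning: The phrase "too large" implies an excessive size.'
)
label, reasoning = parse_sentiment(reply)
print(label)
print(reasoning)
```

Collecting only the labels for “too large” and “very large” makes the contrast between the two examples easy to see at a glance.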

Conclusion

Generative AI is an extremely powerful tool when it comes to language learning. Whereas traditional methods are often both time- and resource-intensive, tools like ChatGPT can be convenient and effective, helping prepare students for real conversation and communication in their desired languages. That said, AI is not a substitute for native speakers. At its core, language learning means interacting with different societies and the world around us. And while AI can help, as of now there isn’t a good replacement for exploring the language through interaction.