- Large language models like ChatGPT exhibit impressive linguistic abilities but lack genuine cognitive capabilities, leading researchers to investigate their rationality and decision-making abilities.
- Experiments revealed that these models struggle with rational decision-making and understanding expected gain, although they can be trained to make relatively rational decisions in specific contexts.
- For high-stakes decision-making applications, human oversight, review, and editing are essential as researchers continue to explore ways to endow these models with a general sense of rationality.
Over the past few years, large language model artificial intelligence systems have advanced rapidly, with models like ChatGPT leading the charge. These models can write poetry, hold human-like conversations, and even pass medical school exams. Their potential social and economic impact ranges from job displacement and increased misinformation to massive productivity gains.
While these large language models display impressive linguistic abilities, it is essential to remember that they do not possess actual cognitive capabilities. They are prone to making elementary errors and even fabricating information. Despite this, their fluency in language often prompts people to engage with them as if they can genuinely think.
This has led researchers to investigate the models’ apparent cognitive abilities and biases, a field of study that has grown in importance as large language models become widely available. The origins of this research lie in early large language models like Google’s BERT, which is integrated into its search engine; the resulting line of study has been dubbed “BERTology” and has revealed much about what these models can achieve and where they fall short.
For example, ingeniously designed experiments have demonstrated that many language models struggle with negation (e.g., “what is not”) and performing simple calculations. These models can also exhibit overconfidence in their answers, even when incorrect. As with other modern machine learning algorithms, they often find it challenging to explain their reasoning when asked about their responses.
Language and Cognition: Are They Rational?
Inspired by the extensive research in BERTology and related fields like cognitive science, my student Zhisheng Tang and I sought to answer a seemingly straightforward question: Are large language models rational? The term “rational” may commonly be used as a synonym for sane or reasonable, but it bears a specific meaning within the realm of decision-making.
A decision-making system, be it an individual human or a complex entity like an organization, is considered rational if it can maximize expected gain when faced with a set of choices. The qualifier “expected” is crucial, as it implies that decisions are made amid significant uncertainty.
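To make the idea concrete, here is a minimal Python sketch of what it means to pick the option with the higher expected gain. It is not code from the study; the bets and payoffs are invented for illustration.

```python
# Each choice is a list of (probability, payoff) pairs.
# A rational decision-maker picks the choice with the highest expected gain.

def expected_gain(outcomes):
    """Sum of probability-weighted payoffs for one choice."""
    return sum(p * payoff for p, payoff in outcomes)

# Hypothetical bet: a fair coin pays $10 on heads and nothing on tails,
# versus a guaranteed $4.
coin_bet = [(0.5, 10.0), (0.5, 0.0)]
sure_thing = [(1.0, 4.0)]

print(expected_gain(coin_bet))    # 5.0 -> the rational pick
print(expected_gain(sure_thing))  # 4.0
```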
It might seem peculiar to assume that a model designed to make accurate predictions about words and sentences without truly understanding their meanings could grasp the concept of expected gain. However, a vast body of research indicates that language and cognition are inextricably linked.
A prime example is the work of the early 20th-century linguists Edward Sapir and Benjamin Lee Whorf, who posited that one’s native language and vocabulary can shape the way an individual thinks. The degree to which this holds true remains a contentious subject, but supporting anthropological evidence exists, such as the study of Native American cultures.
For example, speakers of the Zuñi language, spoken by the Zuñi people in the American Southwest, do not have separate words for orange and yellow, and therefore cannot distinguish between these colors as effectively as speakers of languages that do.
Investigating the Rationality of Language Models
We conducted an extensive set of experiments to determine whether large language models like BERT can understand expected gain and make rational decisions. Our results showed that, in their original form, these models behaved randomly when presented with bet-like choices, and the randomness persisted even when they were given trick questions.
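As an illustration of the kind of bet-like choice involved, one can pose a question to a model and check whether its answer matches the option with the higher expected gain. This is a hedged sketch, not the study’s actual materials; `query_model` is a hypothetical stand-in for whatever model is being tested.

```python
def query_model(prompt: str) -> str:
    """Placeholder for a call to the language model under test."""
    return "B"  # dummy answer so the sketch runs end to end

question = (
    "You must choose one of two bets.\n"
    "A: A fair coin is flipped; heads pays $10, tails pays $0.\n"
    "B: You receive $4 for certain.\n"
    "Which do you choose? Answer with A or B."
)

rational_choice = "A"  # expected gain 0.5 * 10 = 5, versus 4 for the sure thing
model_choice = query_model(question).strip().upper()
print("rational" if model_choice == rational_choice else "not rational")
```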
Interestingly, we discovered that the model could be trained to make relatively rational decisions using a small set of example questions and answers. While this may initially suggest that the models can do more than merely “play” with language, further experiments revealed that the situation is far more complex.
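One way to supply such example questions and answers is in context, as a few solved bets shown before a new question. The sketch below is a hypothetical illustration of that setup, not necessarily the training procedure our experiments used; the bets are invented.

```python
# A few worked examples, followed by a new bet for the model to answer.
few_shot_examples = (
    "Q: Bet A pays $6 with probability 0.5, else $0. Bet B pays $2 for sure.\n"
    "A: Bet A (expected gain 3 > 2)\n\n"
    "Q: Bet A pays $1 with probability 0.1, else $0. Bet B pays $0.50 for sure.\n"
    "A: Bet B (expected gain 0.50 > 0.10)\n\n"
)

new_question = (
    "Q: Bet A pays $10 with probability 0.5, else $0. Bet B pays $4 for sure.\n"
    "A:"
)

prompt = few_shot_examples + new_question
# response = query_model(prompt)  # stand-in for the model under test
```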
For example, when we used cards or dice instead of coins to frame our bet questions, the models’ performance dropped sharply, by over 25%, although it remained above random selection. This finding suggests that these models’ ability to learn general principles of rational decision-making remains, at best, uncertain.
Our more recent case studies using ChatGPT confirmed that decision-making remains a nontrivial and unsolved problem even for larger and more advanced language models.
The Importance of Getting the Decision Right
This area of research is crucial because rational decision-making under uncertain conditions is vital for building systems that can comprehend costs and benefits. If an intelligent system could balance expected costs and benefits effectively, it might outperform humans in tasks like planning around supply chain disruptions during the COVID-19 pandemic, managing inventory, or serving as a financial advisor.
However, our findings emphasize that when employing large language models for such purposes, human oversight, review, and editing are necessary. Until researchers successfully endow these models with a general sense of rationality, they should be approached with caution, particularly in high-stakes decision-making applications.