Scientists have identified the main weakness of artificial intelligence
An international team of researchers tested leading language models using the Stroop test, a classic psychological tool for measuring attention and the ability to suppress automatic responses. The results were unexpected: the longer the task, the worse the AI performed, to the point of near-complete failure. The study was published in the journal PNAS Nexus.
The Stroop test works as follows: a subject is shown color words printed in colored ink and asked to name the ink color while ignoring the word itself. For example, the word "red" printed in blue ink requires the answer "blue." Humans handle this consistently even with long lists, because the brain is able to suppress the automatic urge to read the word.
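To make the scoring concrete, here is a minimal Python sketch of how a Stroop list might be generated and graded. It is a hypothetical illustration, not the study's protocol: the four-color palette, the choice to make every item incongruent (word and ink always differ), and the exact-string grading are all assumptions, and the names `make_trials` and `score` are invented for this example. It shows why falling back on reading the words is so costly: on incongruent items, every word-reading answer is wrong.

```python
import random

# Assumed palette; the article does not say which colors the study used.
COLORS = ["red", "blue", "green", "yellow"]

def make_trials(n, seed=0):
    """Return n hypothetical incongruent Stroop items as (word, ink) pairs."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n):
        word = rng.choice(COLORS)
        # Incongruent by assumption: the ink color never matches the word.
        ink = rng.choice([c for c in COLORS if c != word])
        trials.append((word, ink))
    return trials

def score(trials, answers):
    """Fraction of answers that name the ink color rather than the word."""
    if len(answers) != len(trials):
        raise ValueError("need exactly one answer per trial")
    correct = sum(ans == ink for (_, ink), ans in zip(trials, answers))
    return correct / len(trials)

trials = make_trials(10)
follows_instruction = [ink for _, ink in trials]  # names the ink every time
reads_words = [word for word, _ in trials]        # reverts to reading the word
print(score(trials, follows_instruction))  # 1.0
print(score(trials, reads_words))          # 0.0
```

Under these assumptions, a responder that names the ink scores 1.0 and one that reads the words scores 0.0; a real responder that drifts from the first behavior to the second partway through a long list would land somewhere in between.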
The scientists, led by Suketu Patel, administered this test to GPT-4o, Claude 3.5 Sonnet, GPT-5, Claude Opus 4.1, and Gemini 2.5. With short lists of 5 words, all of the systems performed well. As the lists grew longer, accuracy dropped sharply: GPT-4o answered 91% of items correctly with 5 words, only 57% with 10, and just 15% with 40. Claude 3.5 Sonnet held up through 20 words, after which its accuracy plummeted to 24%.
According to the authors, the models "forget" the instruction and revert to the behavior they were most strongly trained on: reading the words. This fundamentally distinguishes them from humans, who are capable of maintaining sustained voluntary attention.