OpenAI’s GPT-4.5 has officially become the first AI model to pass the Turing Test, according to researchers from the University of California, San Diego (UC San Diego).
The Turing Test (originally called the “imitation game”), named after British mathematician and World War II codebreaker Alan Turing, has for decades been regarded as the gold standard for determining whether a machine can demonstrate human-like intelligence. An AI model that passes the test has, by this measure, reached a level of communication comparable to that of humans.
The researchers evaluated four systems: ELIZA (Weizenbaum, 1966), GPT-4o (OpenAI, 2024), LLaMa-3.1-405B (Grattafiori et al., 2024), and GPT-4.5 (OpenAI, 2025).
Researchers administered the Turing Test to 126 undergraduate students from the University of California, San Diego, and another 158 individuals recruited through the online participant pool Prolific.
The participants engaged in a three-party version of the Turing Test: five minutes of simultaneous conversations with another human participant and one of the systems, after which they had to decide which conversational partner was the human.
Each system was instructed to adopt a human persona using the same prompts. In the tests, GPT-4.5 was judged to be human 73% of the time, meaning interrogators picked it more often than they picked the actual human participant.
LLaMa-3.1 was judged to be human 56% of the time, while the baseline systems ELIZA and GPT-4o achieved much lower success rates of 23% and 21%, respectively.
While the persona-prompted GPT-4.5 succeeded in fooling participants, removing the persona instructions sharply reduced its effectiveness, cutting its win rate to just 36%.
Cameron Jones, the lead researcher at UC San Diego’s Language and Cognition Lab, shared a thread on X summarizing the study. He noted that in previous research using a two-party setup, GPT-4 was judged to be human almost half the time, while the original three-party setup used in this study yielded stronger results.
The success of GPT-4.5 has sparked both excitement and concern.
On the one hand, the development suggests AI could take on roles that involve low-level human interaction, such as customer service and mental health support, without users realizing they are talking to a machine.
On the other hand, it raises serious concerns about the future of work and the potential for AI to displace people in fields such as customer service and education. Scammers could also exploit such systems for social engineering, including phishing and other fraudulent schemes.
Despite the study’s striking results, Psychology Today argued that the research does not so much show that GPT-4.5 can fool us as reveal that people can be fooled far more easily than we think.
“The Turing Test has inverted: It’s no longer a test of machines; it’s a test of us. And increasingly, we’re failing,” they added.