ChatGPT Education Study Retracted
A study widely cited in academia and on social media as significant evidence that ChatGPT improves student learning outcomes has recently been officially retracted by its publisher. The retraction followed multiple discrepancies and methodological problems in the meta-analysis, which severely undermined the reliability of its conclusions.
The paper, published in May 2025 in the Springer Nature journal Humanities & Social Sciences Communications, attempted to synthesize results from 51 studies to compare student performance with and without ChatGPT. It claimed that using ChatGPT had a “significant positive impact” on “improving academic performance,” a moderate positive impact on “enhancing learning perception,” and could “promote higher-order thinking skills.”

After its publication, the study quickly gained traction in both academic and public discourse. By Springer Nature's own count it was cited 262 times, while overall citations exceeded 500 and the paper was read nearly 500,000 times. Because it kept circulating on social media, it ranked in the top percentile of journal articles, and many cited it as “the first hard evidence” that ChatGPT benefits learning. As the paper's influence spread, however, criticisms accumulated and ultimately led to its retraction.
Ben Williamson, a senior lecturer at the University of Edinburgh’s Centre for Digital Education and Edinburgh Futures Institute, said the authors' conclusions were highly sensational: the claim that ChatGPT could significantly improve learning outcomes spread widely on social media as a “gold standard” of evidence. He criticized how the meta-analysis combined its source studies. On one hand, it appeared to include “very low-quality” research; on the other, it pooled results from studies that differed so much in methods, subjects, and samples that they could not be compared directly. In an interview with Ars Technica, Williamson said it appeared to be a paper that “should not have been published.”
The timeline also raised alarms in the academic community. ChatGPT only became publicly available at the end of 2022, leaving researchers a very narrow window in which to complete multiple high-quality, peer-reviewed empirical studies and then compile them into a meta-analysis. Williamson argued that it was nearly impossible to produce, in so short a time, dozens of high-quality studies sufficient to support a rigorous meta-analysis, which called the paper’s “sample basis” into fundamental question.
Williamson was not the only researcher to issue early warnings about the study. Ilkka Tuomi, chief scientist at Meaning Processing Ltd., argued on LinkedIn that such meta-analyses often mix results from studies that are not actually comparable, leading to conclusions based on vague or inconsistent metrics. He cautioned that complex statistical tools can easily create an illusion of being “highly scientific” and produce credible-looking numbers and charts even when the underlying data is unreliable.
As the paper continued to be shared on social media, many of the “qualifying conditions” and research details in the original text dropped away, leaving only the headline conclusion that “ChatGPT significantly enhances learning outcomes” to circulate repeatedly. Williamson noted that this kind of dissemination, which leaves behind “only slogans without context,” deepened public misjudgment of AI's role in education and left the academic community less room to debate the quality of the evidence. He expressed concern that even though the paper has been officially retracted, researchers and practitioners who previously cited or shared it may not be aware of the retraction. As a result, the core message, “ChatGPT can significantly improve learning performance,” may continue to be treated as established fact in many contexts.
The retraction comes amid ongoing debates about generative AI in education. Some schools and universities are still trying to limit the misuse of AI in assignments and exams, particularly cheating via chatbots. At the same time, tech companies keep launching “learning assistant” and “homework tutoring” features and marketing chatbots as the next generation of learning tools. The “fully digital classroom” is also being reconsidered, with some countries emphasizing physical textbooks and handwritten assignments in an attempt to correct over-reliance on screens and online platforms.
For researchers like Williamson, the frustration goes beyond a single paper and reflects the overall atmosphere. In recent years, discussions of generative AI have often been dominated by “hype” and optimistic narratives, while rigorous research backed by sufficient evidence has been noticeably lacking. He believes the retraction is a reminder that, rather than rushing to announce that “AI has completely transformed education,” it is more important to first answer a more fundamental question: how, and under what conditions, do these tools actually influence the behaviors and outcomes of students and teachers in specific educational practices?