In a development that has reignited debate over the risks of generative artificial intelligence, recent testing has revealed that ChatGPT fails to correctly identify nearly 92 percent of fake videos created using OpenAI’s video-generation tool, Sora. The findings underscore a growing gap between the rapid advancement of AI systems that generate realistic content and the limited ability of existing tools to reliably detect what is real and what is fabricated.
Sora, OpenAI’s text-to-video model, is capable of producing short, highly detailed video clips from simple written prompts. These videos often feature realistic motion, lighting, environments, and human-like behavior, making them difficult for both humans and machines to distinguish from authentic footage. When a set of such videos was presented to ChatGPT and the system was asked to determine whether the clips were real or AI-generated, the model misclassified the vast majority, frequently labeling synthetic content as genuine.
What makes the result particularly striking is that both Sora and ChatGPT are products of the same company. This has raised questions about whether advanced language and multimodal models can meaningfully serve as safeguards against the misuse of generative media, even when they are closely related to the tools creating that content. Observers say the findings challenge common assumptions that artificial intelligence can police itself.
In many cases, ChatGPT not only failed to identify the videos as fake but also went further, describing the fabricated scenes as plausible real-world events. Some responses included confident explanations or contextual interpretations of scenes that never occurred, reinforcing concerns about AI “hallucinations” — instances in which systems produce authoritative-sounding but incorrect information. Such behavior can be especially problematic when users rely on AI tools to verify the authenticity of visual material.
Experts note that the limitations are partly rooted in how these systems are designed. ChatGPT is primarily optimized for understanding and generating language, not for forensic analysis of visual media. While it can interpret images and videos at a basic level, it lacks specialized capabilities to analyze subtle inconsistencies in physics, motion, lighting, or digital artifacts that might reveal synthetic origins. As AI-generated videos become increasingly polished, these cues are becoming harder to detect even for trained analysts.
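For contrast, purpose-built forensic tools work on low-level signal statistics rather than on semantic description of a scene. The sketch below is a hypothetical illustration, not any production detector or OpenAI method: it assumes OpenCV and NumPy are available and uses an invented file name, and it computes frame-to-frame difference statistics, one crude temporal-consistency cue of the kind specialized systems combine with trained classifiers.

```python
# Minimal sketch of one low-level forensic cue, for illustration only.
# Assumes OpenCV (cv2) and NumPy are installed; "clip.mp4" is a hypothetical input.
# Real synthetic-media detectors combine many such cues with trained classifiers.
import cv2
import numpy as np

def frame_difference_stats(path: str, max_frames: int = 300) -> dict:
    """Collect mean absolute frame-to-frame differences as a crude temporal-consistency signal."""
    cap = cv2.VideoCapture(path)
    prev = None
    diffs = []
    while len(diffs) < max_frames:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
        if prev is not None:
            diffs.append(float(np.mean(np.abs(gray - prev))))
        prev = gray
    cap.release()
    return {
        "mean_diff": float(np.mean(diffs)) if diffs else 0.0,
        "std_diff": float(np.std(diffs)) if diffs else 0.0,
    }

if __name__ == "__main__":
    # Unusually uniform temporal statistics are a weak hint worth deeper analysis,
    # not proof of synthetic origin.
    print(frame_difference_stats("clip.mp4"))
```

A single statistic like this proves nothing on its own; it simply shows the kind of low-level evidence a language-first system is not designed to weigh.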

The findings also highlight a broader industry-wide challenge. Detection technologies have consistently lagged behind generative tools, creating an imbalance that favors the creation of convincing fake content over the ability to identify it. As text-to-video systems like Sora improve, they are narrowing the perceptual gap between real and artificial media, making detection increasingly complex and resource-intensive.
The implications extend far beyond technical performance. Undetectable or easily misidentified fake videos pose serious risks to public trust, particularly in areas such as politics, journalism, finance, and social media. Deepfake videos could be used to spread disinformation, manipulate public opinion, fabricate evidence, or damage reputations, all while appearing credible to viewers and even to AI-based verification tools.
OpenAI has emphasized that ChatGPT is not intended to function as a definitive detector of AI-generated media. Instead, the company has relied on other safeguards, such as watermarking and metadata tagging, to signal when content is produced by tools like Sora. However, critics argue that these measures are insufficient on their own, as watermarks can be removed and metadata can be stripped during editing or re-uploading across platforms.
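How fragile metadata-based safeguards can be is easy to demonstrate. The sketch below is a minimal illustration, assuming ffmpeg and ffprobe are installed and using hypothetical file names: it prints a file's container tags and then re-muxes the file with all metadata removed, roughly what can happen when a clip is edited or re-uploaded across platforms.

```python
# Illustration of why container metadata alone is a fragile provenance signal.
# Assumes ffmpeg and ffprobe are installed; the file names are hypothetical.
import json
import subprocess

def show_tags(path: str) -> dict:
    """Return container-level metadata tags reported by ffprobe."""
    out = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json", "-show_format", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out).get("format", {}).get("tags", {})

def strip_metadata(src: str, dst: str) -> None:
    """Re-mux the file with all container metadata dropped; streams are copied untouched."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", src, "-map_metadata", "-1", "-c", "copy", dst],
        check=True,
    )

if __name__ == "__main__":
    print("before:", show_tags("sora_clip.mp4"))
    strip_metadata("sora_clip.mp4", "reuploaded.mp4")
    print("after:", show_tags("reuploaded.mp4"))  # typically empty after re-muxing
```

This is why more robust provenance proposals try to bind signed claims to the content itself rather than to tags that are discarded in ordinary workflows.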
The failure rate revealed by the tests has fueled calls for a more comprehensive approach to content authenticity. Researchers and policy experts argue that detection should not rely on a single AI model or method. Instead, they advocate for a layered system that combines technical standards, platform-level enforcement, independent verification tools, and public awareness. Some also stress the importance of educating users about the limitations of AI, warning against treating conversational systems as authoritative fact-checkers.
There are also growing demands for clearer labeling of AI-generated content and stronger regulatory frameworks. Governments and technology companies are under increasing pressure to establish rules that ensure transparency without stifling innovation. Proposals include mandatory disclosure of synthetic media, standardized provenance systems that track content from creation to distribution, and penalties for malicious misuse.
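To make the idea of a provenance record concrete, the toy sketch below hashes a file and wraps the hash in a signed manifest. Everything in it is a stand-in: real standards such as C2PA Content Credentials use public-key signatures and embed the manifest in the media file, whereas this example uses Python's standard-library HMAC and a hypothetical shared key purely for brevity.

```python
# Toy sketch of a content-provenance record: a file hash plus a signed manifest.
# Real standards (e.g., C2PA Content Credentials) use public-key signatures and embed
# the manifest in the media container; the HMAC key and file paths here are stand-ins.
import hashlib
import hmac
import json
import time

SIGNING_KEY = b"hypothetical-issuer-key"  # a real issuer would hold a private signing key

def _sha256(path: str) -> str:
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def make_manifest(path: str, generator: str) -> dict:
    """Create a signed claim recording which tool generated the content and its exact bytes."""
    claim = {"sha256": _sha256(path), "generator": generator, "created": int(time.time())}
    payload = json.dumps(claim, sort_keys=True).encode()
    claim["signature"] = hmac.new(SIGNING_KEY, payload, "sha256").hexdigest()
    return claim

def verify_manifest(path: str, manifest: dict) -> bool:
    """Check that the file is unchanged and the claim was signed with the issuer's key."""
    claim = {k: v for k, v in manifest.items() if k != "signature"}
    payload = json.dumps(claim, sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, "sha256").hexdigest()
    return _sha256(path) == manifest["sha256"] and hmac.compare_digest(expected, manifest["signature"])
```

Even in this toy form, the approach only helps if platforms actually check the manifest at distribution time, which is where standardized rules and enforcement come in.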
At the same time, developers face a difficult balancing act. Improving detection capabilities without undermining creative tools or compromising user privacy remains a complex technical and ethical challenge. As generative models grow more capable, the line between real and artificial content is becoming less distinct, raising fundamental questions about authenticity in the digital age.
The revelation that ChatGPT fails to spot most fake videos produced by Sora serves as a reminder that artificial intelligence is not a neutral or infallible arbiter of truth. While AI tools can assist users in navigating information, they are not substitutes for critical thinking, human judgment, or robust verification systems.
As AI-generated video becomes more accessible and widespread, the stakes will only rise. The current gap between creation and detection suggests that society may be entering a period where seeing is no longer believing — and where even the most advanced AI systems struggle to tell the difference.