A newly surfaced internal memo has revealed that OpenAI has accused Chinese artificial intelligence startup DeepSeek of training its advanced AI systems by distilling outputs from leading U.S.-developed models, raising fresh concerns over intellectual property protection, competitive fairness, and AI safety in the rapidly intensifying global technology race.
According to the memo, which was circulated to U.S. policymakers and congressional staff, OpenAI alleges that DeepSeek used a technique known as model distillation to replicate the performance of OpenAI's own systems, systematically querying high-end American models and using their responses as training data. The company argues that this approach, when conducted without authorization, effectively transfers the reasoning patterns and capabilities of proprietary systems into competing models.
Model distillation is a recognized technical method in artificial intelligence development, typically used by organizations internally to create smaller, faster, and cheaper versions of their own large models. However, OpenAI’s memo distinguishes between authorized internal distillation and what it describes as external extraction — where a third party uses another company’s deployed model outputs at scale to train a rival system.
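In its legitimate, internal form, distillation is typically implemented as a "soft target" objective: a smaller student model is trained to match the full output distribution of a larger teacher, not just its top answer. A minimal sketch of that objective in Python (function names and toy logits are illustrative, not drawn from any company's actual training code):

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature softens the distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between softened teacher and student output distributions.

    This is the classic soft-target distillation objective: the student
    is pushed to reproduce the teacher's entire probability distribution,
    which carries more information than the teacher's single best answer.
    """
    p = softmax(teacher_logits, temperature)  # teacher (target)
    q = softmax(student_logits, temperature)  # student (prediction)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

The loss is zero when the student exactly matches the teacher and grows as the distributions diverge; the alleged "external extraction" differs only in that the teacher's outputs are harvested through an API rather than generated in-house.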
OpenAI claims that usage patterns and technical signals indicated automated, high-volume querying consistent with dataset generation rather than normal human use. The memo states that such activity appeared designed to capture structured outputs across a wide range of domains, including technical reasoning, coding, and analytical tasks. According to the document, safeguards were triggered after internal monitoring systems flagged behavior associated with scripted access and routing through intermediary infrastructure intended to obscure origin points.
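The kind of signal described above can be illustrated with a toy heuristic: scripted harvesting tends to be high-volume and evenly paced, while human traffic is bursty and irregular. A hypothetical sketch (the function name and thresholds are invented for illustration and are not OpenAI's actual monitoring criteria):

```python
import statistics

def looks_scripted(timestamps, min_requests=100, max_cv=0.1):
    """Flag a client whose request timing suggests automation.

    Heuristic: flag when volume is high AND the coefficient of
    variation (stdev / mean) of the gaps between requests is very
    low, i.e. the requests arrive on a near-metronomic schedule.
    Thresholds are illustrative placeholders only.
    """
    if len(timestamps) < min_requests:
        return False
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    mean = statistics.mean(gaps)
    if mean == 0:
        return True  # simultaneous bursts: clearly automated
    cv = statistics.stdev(gaps) / mean
    return cv < max_cv
```

Real detection systems combine many such features (query diversity, content patterns, network routing) rather than relying on timing alone.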
The allegations come at a time when competition between U.S. and Chinese AI firms has accelerated sharply. Over the past year, DeepSeek has drawn international attention for releasing high-performance large language models that observers say approach the capabilities of leading Western systems while operating with lower reported training costs. That rapid progress has prompted debate within the industry over whether newer entrants are achieving breakthroughs through efficiency innovations, alternative architectures, or heavy reliance on existing frontier model outputs.
OpenAI’s memo frames the issue as not only commercial but strategic. The company warns that unauthorized distillation could undermine the economic incentives that support large-scale AI research and development. Training frontier models requires vast computational infrastructure, specialized talent, and extensive safety testing. If competitors can shortcut that investment by extracting behavior from deployed systems, the memo argues, it could distort market dynamics and weaken the motivation for long-term foundational research.
The document also raises safety concerns. OpenAI notes that its models include layered safeguards and behavioral tuning designed to limit harmful outputs and reduce misuse risk. Distilled replicas, it argues, may reproduce advanced capabilities without reliably inheriting the same safety controls, monitoring hooks, or update mechanisms. This could lead to powerful systems circulating without consistent guardrails or accountability channels.
Industry experts note that detecting distillation from outside access alone is technically complex. AI systems are trained on enormous mixtures of data, and similarities in output style or reasoning do not automatically prove improper training practices. However, unusually patterned query behavior, especially when conducted at scale, can raise red flags for platform providers. Companies increasingly deploy anomaly detection and rate-limit systems to prevent automated harvesting of model outputs.
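The rate-limit systems mentioned above are commonly built as a per-client sliding window: each client may make at most N requests within any rolling time span, and excess calls are rejected. A minimal sketch (the class name and limits are placeholder defaults, not any provider's real policy):

```python
import time
from collections import deque

class SlidingWindowRateLimiter:
    """Per-client sliding-window rate limiter.

    Each client may make at most `max_requests` calls in any
    `window_seconds` span; calls beyond that are rejected until
    older requests age out of the window.
    """
    def __init__(self, max_requests=60, window_seconds=60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.history = {}  # client_id -> deque of request timestamps

    def allow(self, client_id, now=None):
        now = time.monotonic() if now is None else now
        q = self.history.setdefault(client_id, deque())
        # Drop timestamps that have aged out of the window.
        while q and now - q[0] >= self.window_seconds:
            q.popleft()
        if len(q) >= self.max_requests:
            return False
        q.append(now)
        return True
```

Production systems layer this with per-key quotas and anomaly scoring, but the sliding window is the basic building block that makes large-scale automated harvesting expensive.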
The dispute highlights a broader gray zone in AI governance. While copyright and trade secret laws cover code and training data in many contexts, legal frameworks around model behavior and generated outputs remain unsettled in many jurisdictions. Whether large-scale output harvesting constitutes infringement, contract violation, or unfair competition is still being tested through policy debates and early court cases worldwide.
Technology policy analysts say the memo could influence upcoming regulatory discussions in Washington around AI export controls, access restrictions, and cross-border model use. Some lawmakers have already expressed concern that advanced AI capabilities could spread through indirect channels even when hardware and chip exports are tightly controlled. If model distillation across borders becomes easier, enforcement strategies may need to shift from hardware controls toward service-level protections and usage verification.
DeepSeek has not publicly detailed its full training pipeline and has not formally responded to the specific allegations outlined in the memo. The company has previously stated that it relies on a combination of publicly available data, licensed material, and internally generated synthetic data to build its models. Like many AI developers, it also uses teacher-student training approaches — a broad category that can include distillation — though the provenance of teacher models is central to the present controversy.
The episode underscores how AI competition is moving beyond raw computing power into questions of method, access, and provenance. As leading models become more capable, their outputs themselves gain value as structured knowledge sources. That creates both opportunity and risk: opportunity for efficiency gains through compression techniques, and risk of capability transfer without compensation or oversight.
For now, OpenAI says it is strengthening detection systems, tightening usage controls, and expanding investigation capacity to identify large-scale automated extraction attempts. The company’s memo calls for clearer rules and cooperative enforcement mechanisms to define acceptable AI training practices across borders.
As global AI development accelerates, disputes over how models are trained — and from whose knowledge — are likely to become more frequent, more technical, and more consequential for the future structure of the industry.