OpenAI is pushing back forcefully against a federal court order that would require the company to turn over 20 million ChatGPT conversation logs as part of an ongoing copyright lawsuit brought by a coalition of major news organizations. The dispute, which has escalated rapidly in recent weeks, has become one of the most consequential legal battles yet in the fast-evolving world of artificial intelligence—pitting concerns about user privacy against demands for transparency in how AI systems are trained.
The lawsuit centers on allegations that OpenAI used copyrighted news articles without permission to train its models, including the version of ChatGPT widely used today. The plaintiffs, led by The New York Times, argue that to determine the extent of the alleged misuse, they need access to a massive archive of chat logs from real users. Their lawyers contend that the conversations could contain model outputs reproducing portions of their copyrighted articles, which they say would be critical evidence in the case.

Earlier this month, a federal magistrate judge agreed with the plaintiffs, issuing an order requiring OpenAI to produce the full set of 20 million anonymized conversations once it completes its de-identification process. The judge found that protective measures—including redactions and confidentiality restrictions—would adequately protect user privacy while allowing the plaintiffs to perform their analysis.
OpenAI strongly disagrees. In sharply worded court filings and a public statement to its users, the company warned that complying with the order would set a dangerous precedent. According to OpenAI, only a minuscule fraction of the requested data—far less than one percent—could plausibly be relevant to the plaintiffs’ claims. The remaining millions of chats, the company contends, are private conversations that would be swept up in what it describes as an overbroad, speculative search for evidence.
The company says the court order would require the turnover of entire conversations, not just individual responses that might be linked to copyrighted text. Many of these conversations, OpenAI notes, include sensitive personal information: health concerns, financial questions, emotional struggles, workplace issues, and other intimate details users expected would remain private. Even after anonymization, OpenAI believes the risk of unintended disclosure would be too high.
OpenAI’s Chief Information Security Officer has been at the forefront of the response, arguing that no U.S. court has ever compelled a technology company to disclose such a large volume of personal user content in the discovery phase of a lawsuit. The company maintains that the scope of the order is “unprecedented” and could undermine public trust not just in AI platforms but in digital services more broadly.
To address the court’s concerns while safeguarding user privacy, OpenAI has offered a set of alternatives. One proposal would limit discovery to conversations that specifically reference the plaintiffs’ publications or contain clear indications that copyrighted material may have been reproduced. Another would allow plaintiffs to search within a smaller, carefully selected dataset rather than the full 20 million logs. OpenAI argues that these narrower approaches would still allow the plaintiffs to gather relevant evidence without exposing millions of unrelated users to privacy risks.
Thus far, the court has not accepted these proposals. OpenAI has filed a motion asking the judge to reconsider the order and is prepared to take the dispute to a higher court if the ruling stands. Its legal team emphasizes that the issue is not simply about protecting the company’s business interests but about defending users’ expectation of confidentiality.
The legal battle has stirred intense debate within the broader tech and policy communities. Privacy advocates have expressed alarm at the scale of the disclosure being sought, warning that even anonymized chat data can be re-identified under certain circumstances—especially when conversations include unusual details or references to unique events. Some experts fear the case could weaken legal norms that protect user data from broad discovery demands.
Meanwhile, organizations supporting the plaintiffs argue that powerful AI companies should not be shielded from scrutiny, especially when the technology in question has been trained on massive amounts of human-generated text. They insist that meaningful accountability is impossible without visibility into how systems like ChatGPT behave at scale, including how they might reproduce copyrighted content.
Caught in the middle are millions of ChatGPT users, many of whom are just now learning that their past conversations could potentially be shared—albeit in anonymized form—as part of the lawsuit. OpenAI has sought to reassure users, saying it is fighting “as hard as possible” to minimize or prevent disclosure. The company also announced plans to accelerate the rollout of enhanced privacy features, including options that would prevent certain types of chats from being stored or made accessible in response to legal requests.
The case is shaping up to be a major test of how courts will balance privacy, intellectual property, and the demands of transparency in the AI era. Its outcome could influence how future lawsuits against AI companies unfold and determine what kinds of user data can be compelled in discovery.
For now, OpenAI remains firm: while it is willing to cooperate within reasonable limits, it will not hand over millions of user conversations without exhausting every possible avenue of appeal. The stakes, the company argues, extend far beyond a single lawsuit—touching on the foundational question of how private our digital conversations truly are.