When “Open” Becomes Extraction

The OER–AI Conundrum and What It Reveals About Educational Infrastructure

The Open Access movement promised to democratize knowledge.
Two decades later, the bargain has shifted: free to read is being treated as free to train.

Academic knowledge—produced through public investment, stewarded by universities and libraries, and shared in the public interest—is now being absorbed at scale into commercial AI systems. Attribution dissolves. Intellectual lineages blur. Enormous private value is generated, while the public infrastructure that created that value weakens.

This is not a copyright technicality or a licensing oversight.
It is what happens when we fail to think infrastructurally about education technology.

When we mistake access for equity, openness for sustainability, and consumption for stewardship, extraction becomes normalized.

The Problem We Didn’t See Coming

When researchers chose Creative Commons licenses, they imagined human readers: students without institutional access, practitioners in under-resourced contexts, public audiences engaging with publicly funded research.

They did not imagine large language models ingesting their work at scale—fragmenting it into tokens and recombining it into AI outputs that obscure original sources entirely.

The mechanism of harm is specific. When AI systems synthesize information from many unattributed sources, they create what Stephanie Decker calls “citation laundering.” Intellectual lineages become impossible to trace. Researchers who built foundational knowledge lose credit. Academic incentive structures begin to erode.

This matters beyond individual careers. Citation practices are how claims remain contestable—how knowledge can be verified, challenged, and refined. When lineage disappears, so does the ability to locate where an idea came from or to evaluate how it traveled.

What weakens here is not just attribution, but the infrastructure of knowledge validation itself.

Why Licenses Can’t Solve Infrastructure Problems

The instinctive response has been legal: more restrictive licenses, clearer terms of use, stronger enforcement.

But Creative Commons’ own guidance makes the limits explicit. AI training is often permitted under existing copyright law. License conditions frequently do not apply to machine reuse. Fair use doctrine in the United States and text-and-data-mining exceptions in the European Union were never designed for extraction at this scale.

Creative Commons’ emerging preference signals framework is an important development—not because it resolves the problem, but because it acknowledges something fundamental: legal tools alone cannot govern infrastructure.

When educational resources become training data for commercial systems without reciprocity, the failure is not legal. It is architectural.

The Pattern of Private Capture

This is not confined to scholarly publishing. We have seen the same pattern across education technology:

Public institutions generate knowledge and data.
Openness enables scale.
Commercial platforms extract value.
Public systems receive little back—no attribution, no governance role, no reinvestment.

As Anna Tumadóttir, CEO of Creative Commons, has argued, innovation built on the commons carries an obligation to the commons. That obligation is largely absent in current AI development.

The central question is not how to license content better.
It is who stewards educational knowledge, and in whose interest.

Through the Lens of Sustainable Learning

The Sustainable Learning Framework treats technology as infrastructure, not tools. From that lens, the OER–AI conundrum exposes four structural fault lines.

Dependency over capacity
Academic institutions do not control how their knowledge is used, represented, or attributed in AI systems trained on their work. They are positioned as consumers of tools built from their own intellectual labor.

Extraction over reciprocity
Commercial AI systems benefit from decades of public investment in research and open access infrastructure, while contributing little back to libraries, repositories, or research communities.

Platforms over infrastructure
AI interfaces increasingly mediate access to knowledge, bypassing peer review, citation systems, and disciplinary context. What is lost is not only credit, but the conditions that make knowledge trustworthy.

Consumption over stewardship
Open Access positioned universities and libraries as stewards—entities responsible for preservation, curation, and public accountability. AI training treats them as content sources, valuable only for what can be extracted.

These dynamics are not accidental. They are the predictable outcome of treating openness as sufficient without asking: open for whom, and to what end?

What UNESCO Sees—and What It Misses

UNESCO’s recent guidance has rightly raised concerns about ethical, human-centered AI and warned against “open-washing.” Contributions from open education scholars emphasize that genuine openness must be rooted in public, transparent, collaborative systems.

But the center of gravity remains responsible adoption, not public infrastructure. The guidance focuses on how education systems should use AI, more than on whether they should be building and governing their own AI systems, models, and data commons.

That distinction matters. It is the difference between becoming a sophisticated consumer of AI and remaining a steward of educational infrastructure.

The absence of meaningful attention to AI training in recent Open Access policy consultations underscores how unprepared existing governance frameworks are. AI is treated as an application layer, not as infrastructure. Institutions are imagined as content producers, not infrastructure actors.

The Hinge Point

The OER–AI tensions are making infrastructure questions unavoidable. Researchers are watching attribution systems weaken. Libraries are seeing their stewardship role bypassed. Universities are recognizing that decades of investment in open access infrastructure are being captured by private AI ecosystems they do not control.

What happens next is not predetermined.

Accommodation deepens dependency.
Infrastructure reclamation opens the possibility of public-interest AI.

This is not a technical choice. It is a political one—about who governs educational systems and whose interests they ultimately serve.

Resources

The following resources inform this week’s UNBOUND analysis of how open educational resources, scholarly publishing, and AI training infrastructures intersect—and why governance, attribution, and reciprocity are now central educational infrastructure questions.

Open Access, AI Training, and Scholarly Attribution

Creative Commons and the Limits of Licensing

Data, Training, and Transparency

Global Governance and Policy Context
