No Knowledge Without Source: Collecting, Preserving and Sharing Software in a Risky World
26 November 2025
Session 2: Diversity of interdependencies between open science and artificial intelligence
- No Knowledge Without Source: Collecting, Preserving and Sharing Software in a Risky World - Roberto Di Cosmo (Software Heritage)
Abstract: Software is a public good, and today it is also a public risk if we fail to preserve it. Software Heritage was created to collect, preserve, and share all publicly available source code at planetary scale (now 26+ billion unique files from ~400 million projects) and to make each artifact citable and verifiable via the intrinsic SWHID identifier (ISO 18670).
This talk shares what we have learned while operating a universal archive at scale:
- handling GDPR-driven requests (e.g., author name changes) and takedowns with due process;
- the huge challenge posed by the massive sharing of unlicensed code, where copyright law in practice nullifies a large part of existing software altruism;
- maintaining provenance, integrity and accountability across diverse forges;
- and, more recently, confronting the AI wave and the fragility of modern digital infrastructure.
I will summarize our “LLMs for code” stance—three simple principles: giving back the foundation models, transparency about training data, respect for authorship and licensing—then discuss how the CodeCommons initiative operationalizes them to enable responsible, open AI on software.
Finally, I’ll address a clear and present danger: Europe’s dependence on non-EU platforms for critical code and packages. A single disruption can stall research pipelines, and much more, overnight.
We propose a coalition effort—grounded in data altruism—to fund a rapid, massive expansion of Software Heritage (mirrors, package-manager fallbacks, 24/7 resilience) so academia can fulfill its duty to provide trustworthy digital resources to society, inform policy, and foster fair innovation.
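The "intrinsic" property of the SWHID mentioned in the abstract means that the identifier is computed from the artifact itself rather than assigned by a registry, so anyone holding a copy can check it. As a minimal sketch for the simplest case, the contents of a single file: the identifier is the prefix `swh:1:cnt:` followed by the SHA-1 that Git computes for a blob. The function names below are illustrative and are not part of any Software Heritage library; other object types (directories, revisions, releases, snapshots) are not covered here.

```python
import hashlib


def swhid_for_content(data: bytes) -> str:
    """Compute the SWHID of a file's contents (object type `cnt`).

    The hash is the Git blob hash: SHA-1 over the header
    b"blob <length>\\0" followed by the raw bytes.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    digest = hashlib.sha1(header + data).hexdigest()
    return f"swh:1:cnt:{digest}"


def verify_content(swhid: str, data: bytes) -> bool:
    """Check that `data` matches a cited content SWHID (illustrative helper)."""
    return swhid_for_content(data) == swhid


if __name__ == "__main__":
    # The empty file has a well-known Git blob hash.
    print(swhid_for_content(b""))
    # → swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the identifier depends only on the bytes, a citation stays verifiable even if the forge that originally hosted the file disappears, which is the resilience concern the abstract raises.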
Keywords: artificielle cdga cellule cnrs code cosmo days di donnees generative gricad heritage intelligence logiciels open ouverte roberto science software uga
Information
- Gricad Vidéos
- 5 January 2026 15:07
- Conferences
- French