No Knowledge Without Source: Collecting, Preserving and Sharing Software in a Risky World

Licence: Creative Commons

26 November 2025
Duration: 00:42:47

Session 2: Diversity of interdependencies between open science and artificial intelligence

  • No Knowledge Without Source: Collecting, Preserving and Sharing Software in a Risky World, Roberto Di Cosmo (Software Heritage)

Abstract: Software is a public good, and today it is also a public risk if we fail to preserve it. Software Heritage was created to collect, preserve, and share all publicly available source code at planetary scale (now 26+ billion unique files from ~400 million projects) and to make each artifact citable and verifiable via its intrinsic identifier, the SWHID (ISO/IEC 18670).
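To illustrate what "intrinsic" means here, the following is a minimal sketch, not Software Heritage's own tooling. It assumes the identifier for a single file's content is the Git-compatible blob hash (SHA-1 over a `blob <length>\0` header followed by the bytes) prefixed with `swh:1:cnt:`. The function names `swhid_for_content` and `verify_content` are hypothetical, chosen for this example only.

```python
import hashlib


def swhid_for_content(data: bytes) -> str:
    """Compute a content SWHID from raw bytes (illustrative sketch).

    Assumption: content identifiers reuse Git's blob hashing, i.e.
    SHA-1 over b"blob <length>\\0" followed by the file bytes.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    digest = hashlib.sha1(header + data).hexdigest()
    return f"swh:1:cnt:{digest}"


def verify_content(data: bytes, swhid: str) -> bool:
    """Check that the bytes in hand match the identifier that was cited."""
    return swhid_for_content(data) == swhid


if __name__ == "__main__":
    source = b"hello world\n"
    identifier = swhid_for_content(source)
    print(identifier)
    # Any change to the bytes produces a different identifier.
    print(verify_content(source, identifier))
    print(verify_content(source + b"x", identifier))
```

Because the identifier is derived from the bytes themselves, anyone holding a copy of the file can recompute it and check a citation without consulting a registry.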

This talk shares what we have learned while operating a universal archive at scale:

  • handling GDPR-driven requests (e.g., author name changes) and takedowns with due process;
  • the challenge posed by the massive sharing of unlicensed code, and how copyright law in practice nullifies a large part of existing software altruism;
  • maintaining provenance, integrity and accountability across diverse forges;
  • and, more recently, confronting the AI wave and the fragility of modern digital infrastructure.

I will summarize our “LLMs for code” stance—three simple principles: giving back the foundation models, transparency about training data, respect for authorship and licensing—then discuss how the CodeCommons initiative operationalizes them to enable responsible, open AI on software.

Finally, I’ll address a clear and present danger: Europe’s dependence on non-EU platforms for critical code and packages. A single disruption can stall research pipelines, and much more, overnight.
We propose a coalition effort—grounded in data altruism—to fund a rapid, massive expansion of Software Heritage (mirrors, package-manager fallbacks, 24/7 resilience) so academia can fulfill its duty to provide trustworthy digital resources to society, inform policy, and foster fair innovation.

Keywords: artificielle cdga cellule cnrs code cosmo days di donnees generative gricad heritage intelligence logiciels open ouverte roberto science software uga
