Continuing Discussion on “Open Source AI” Initiates Attempts to Define It Clearly

### The Movement for Clarity in AI: OSI’s Initiative to Establish “Open Source AI”

In the fast-changing field of artificial intelligence, the term “open source” has become a point of contention. As AI systems become part of more people’s daily lives, calls for transparency about how these systems are built, distributed, and used have grown louder. The Open Source Initiative (OSI), the organization that maintains the Open Source Definition for software, has now taken a notable step toward settling the question by publishing a draft definition of “open source AI.”

#### The Uncertainty of “Open Source” in AI

Traditionally, “open source” refers to software whose source code anyone can freely access, modify, and share. Applying that idea to AI, however, has proved complicated. Meta, for example, has released AI models such as Llama 3 at no cost but under usage restrictions. Those restrictions, which can limit how the model is used based on factors such as the size of the company using it or the type of content it produces, have prompted free software advocates to question what “open source” actually means when applied to AI.

For instance, although Meta’s Llama 3 model is available to download, it does not meet the OSI’s Open Source Definition for software. The AI image synthesis model Flux is another example of a model described as “open” that does not fully meet open source standards. This ambiguity has given rise to alternative terms such as “open-weights” and “source-available” for AI models whose code or weights are released with restrictions, or without the accompanying training data.

#### OSI’s Initiative to Define Open Source AI

To resolve the question, the OSI has assembled a diverse group of about 70 participants, including researchers, legal experts, policymakers, and advocates, to draft a definition of “open source AI.” Representatives of major technology companies such as Meta, Google, and Amazon have also taken part. The current draft (version 0.0.9) centers on “four essential freedoms” modeled on those that define free software: the freedom to use the AI system for any purpose, to study how it works, to modify it, and to share it with or without modifications.

By setting clear criteria for open source AI, the OSI aims to provide a benchmark against which AI systems can be evaluated. Such a benchmark could help developers, researchers, and users make better-informed decisions about the AI tools they build, study, or use.

#### The Significance of Transparency

A key benefit of truly open source AI is greater transparency. When researchers can inspect how an AI model works, they can more readily identify and fix potential vulnerabilities. That contrasts sharply with closed systems such as OpenAI’s ChatGPT, a proprietary product whose architecture is closely guarded.

According to the OSI’s project timeline, a stable version of the “open source AI” definition is expected to be announced in October at the All Things Open 2024 conference in Raleigh, North Carolina.

#### “Unrestricted Innovation”

In a May press release, the OSI stressed the necessity of accurately defining what open source AI entails. Stefano Maffulli, the OSI’s executive director, remarked, “AI differs from standard software and compels all stakeholders to reassess how Open Source principles pertain to this domain. OSI believes everyone should retain agency and oversight of technology. We also acknowledge that markets thrive when clear definitions facilitate transparency, collaboration, and unrestricted innovation.”

The latest draft goes beyond the AI model or its weights to cover the entire system and its components. For an AI system to qualify as open source, it must provide access to what the OSI calls the “preferred form for modifications.” That includes detailed information about the training data, the complete source code used to train and run the system, and the model weights and parameters. All of these components must be made available under OSI-approved licenses or terms.

Notably, the draft does not require developers to release raw training data. Instead, it requires “data information”: detailed metadata about the training data and the methodologies used. This approach is meant to provide transparency and reproducibility without necessarily publishing the dataset itself, which could help address privacy and copyright concerns while keeping to open source principles.
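To make the draft’s component requirements more concrete, here is a minimal Python sketch of a release checklist. It is an illustration only, not an OSI tool: the `AISystemRelease` class, the `missing_components` function, and the tiny `APPROVED_TERMS` set are hypothetical, and the OSI’s real list of approved licenses is far longer. The sketch checks for data information, training and inference code, and weights under approved terms, and deliberately does not require the raw dataset.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical, deliberately tiny stand-in for "OSI-approved licenses or terms".
# The real list is maintained by the OSI and is much longer.
APPROVED_TERMS = {"Apache-2.0", "MIT", "BSD-3-Clause", "GPL-3.0-only"}


@dataclass
class AISystemRelease:
    """Hypothetical description of what a publisher makes available."""
    has_data_information: bool             # metadata about training data and methods
    training_code_license: Optional[str]   # None means the code is not released
    inference_code_license: Optional[str]
    weights_license: Optional[str]
    raw_dataset_released: bool = False     # recorded, but not required by the draft


def missing_components(release: AISystemRelease) -> list[str]:
    """Return the draft requirements (as described above) that a release misses."""
    problems = []
    if not release.has_data_information:
        problems.append("data information")
    components = {
        "training code": release.training_code_license,
        "inference code": release.inference_code_license,
        "model weights": release.weights_license,
    }
    for name, license_id in components.items():
        if license_id is None:
            problems.append(f"{name} not released")
        elif license_id not in APPROVED_TERMS:
            problems.append(f"{name} under non-approved terms ({license_id})")
    # raw_dataset_released is intentionally not checked: the draft asks for
    # data information, not the dataset itself.
    return problems


# A hypothetical "open-weights" style release with a restrictive custom license.
weights_only = AISystemRelease(
    has_data_information=False,
    training_code_license=None,
    inference_code_license="MIT",
    weights_license="Custom-Community-License",
)
print(missing_components(weights_only))
# → ['data information', 'training code not released',
#    'model weights under non-approved terms (Custom-Community-License)']
```

A release that ships data information plus code and weights under approved terms returns an empty list, even if the raw dataset stays private, which mirrors the draft’s choice to ask for metadata rather than the data itself.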

#### A Diverse Coalition with a Clear Objective

The OSI began work on the “open source AI” definition in 2022, when it first asked organizations for input on the term. Since then, the effort has included a series of workshops around the world that brought together participants from a wide range of backgrounds. According to the OSI, 53 percent of participants in its Open Source AI working groups identified as people of color, and 28 percent identified as women.

“After nearly two years of gathering perspectives from around the globe to pinpoint the principles of Open Source fit for AI systems, we’re launching a worldwide roadshow to refine…