position-statement

Trust is Good, Assurance is Better

Why the EU Code of Practice for General-Purpose AI Must Mandate Third-Party Assessments

Authors

Lisa Soder

Amin Oueslati

Programmes

Artificial Intelligence

Published by

Interface

March 20, 2025

As the EU finalises its Code of Practice for general-purpose AI, the question of mandatory external assessments before deployment remains fiercely contested. For the Code to live up to its promise, stringent third-party assessments are vital, as they ensure the integrity of safety processes, protect the rights of citizens and provide the necessary assurance to foster European AI innovation.

The EU AI Act presents the first comprehensive framework for governing AI. For the most powerful models, so-called general-purpose AI (GPAI), a Code of Practice is currently being developed, designed to help providers comply with the rules of the AI Act. Recently, the third draft of the Code has been released, marking the beginning of the final drafting round.

One measure has arguably proven the most contentious throughout the drafting process: mandatory third-party assessments prior to deploying a GPAI model. In practice, this involves an independent review of the provider’s risk management and a series of tests (“evaluations”), typically with the aim of eliciting harmful or otherwise unwanted output from the model.

Background on the Code of Practice: Starting in August 2025, the EU AI Act’s rules for providers of GPAI models will come into force. To clarify how companies can meet these new requirements (in particular Article 53 & 55), the EU AI Office facilitates the drafting of a corresponding Code of Practice through an iterative, co-regulatory process. Four expert-chaired working groups, involving nearly 1,000 stakeholders—including industry representatives, civil society, independent experts, and EU member state officials—are currently drafting the Code. Its latest iteration includes 18 commitments, many of which closely reflect existing best practices and the frontier AI safety commitments large model providers have previously signed at an international AI summit in Seoul. Once finalised and approved by the Commission, the Code of Practice can serve as a central reference for GPAI model providers aiming to comply with the AI Act’s requirements. It also holds broader international significance, potentially influencing AI governance standards beyond Europe through the “Brussels Effect”.

Confronted with mounting opposition from industry representatives, we remain convinced of external assessments’ merit, their essential role in protecting European citizens and their capacity to bolster European AI innovation.

From syringes and boilers: How external assessments assure safety

Why not rely on internal-testing only? Historic industry scandals, from Enron to Wirecard and Boeing, provide the answer: profit incentives and internal conflicts of interest undermine reliable safety assurance. This is particularly true for the testing of GPAI models, which gives assessors great discretion over the exact testing methodology, introducing potential avenues to bias outcomes. Relying on internal assessments only is like model providers writing their own tests and subsequently grading themselves.

The EU requires external assessments in all safety-critical industries. The Digital Service Act (DSA) provides a particularly powerful analogy for GPAI: despite the sensitivity of the technology, the nascency of the assessment ecosystem and the structure of the industry, Very Large Online Platforms (VLOPs) must undergo external scrutiny every year. But beyond the digital realm, whenever risks are high, complex or uncertain, EU law prescribes the involvement of independent third parties, from medical devices to machines. If syringes and boilers are subject to third-party assessment, how can GPAI models be exempted?

Stifling or Supercharging Innovation? Europe's Opportunity in GPAI

But won’t this stifle innovation, one might worry? Quite the opposite, we think. Europe's competitive advantage in AI is unlikely to arise from pouring hundreds of billions into building the largest foundational models. Instead, it will come from industrial adoption, effectively integrating GPAI into useful downstream applications–an approach that plays to Europe’s true strengths: rich data pools, world-class applied engineering capabilities and dynamic SMEs, which make up 99% of all businesses.

For startups and SMEs, however, building downstream applications on top of GPAI models without independent validation can be extremely risky, both legally and reputationally. In other sectors such as automotive, car manufacturers typically have enough leverage to demand extensive certifications from their suppliers. By contrast, given Big Tech’s monopolistic dominance, they usually dictate the conditions for using their models to European startups and SMEs, often amounting to "take it or leave it". And while the cost of third-party assessments is minimal for the handful of big players spending millions—or even billions—on training, pushing the assurance onus onto smaller downstream providers is a true innovation killer. Instead, making third-party assessments for GPAI models mandatory, empowers smaller innovators, by providing independent, credible information on the robustness of the underlying model, and ultimately promoting European AI diffusion.

At the same time, small GPAI model providers would benefit from a more mature third-party ecosystem. Evaluating GPAI requires both broad and deep domain expertise – from bringing in cybersecurity experts to social scientists and disinformation experts. While major companies can hire that expertise in-house, smaller providers depend to a much greater extent on external expertise. Mandating third-party assessments incentivises a more mature ecosystem, driving down cost and expanding capacity, which disproportionately benefits smaller GPAI model providers.

A bold vision for third-party assessment of GPAI models in Europe

With the final drafting round of the Code, Europe stands at a defining moment: one in which responsible governance can be both a safeguard and a powerful driver of AI innovation. To translate Europe's ambition for safe and trustworthy AI into practice, we offer three recommendations:

The Code of Practice should require third-party assessments by default, granting exemptions only if no qualified assessor can be found within a four-week search. This approach accounts for the current nascency of the external AI evaluation ecosystem, while creating the necessary certainty for new evaluation organisations to enter the market.
Second, the AI Office must nurture a robust, trusted ecosystem of independent assessors that model providers feel confident partnering and sharing information with. Through active oversight and a clear accreditation scheme for the network of evaluators, the AI Office can ensure high standards of competence, independence, and organizational integrity. By creating a safe environment where commercially sensitive details remain protected, the code can pave the way for truly robust external evaluations—and fuels a broader AI assurance market, already estimated to reach USD 276 bn billion in annual global value by 2030.
Third, both the AI Office and EU member states should support the creation of a third-party assessment market through competitions, grants, and funding, ensuring sufficient capacity will meet growing demand. At the same time, they should explore establishing or expanding governmental evaluation capabilities, drawing inspiration from the recently established France’s National Institute for the Security of AI and analogous efforts like the UK AI Security Institute or the US Safety Institute.

Done right, third-party assessments for GPAI would not only protect the rights of citizens, but also instill much-needed assurance to European downstream deployers, fueling AI adoption across sectors—and enabling Europe to thrive at the global forefront of trustworthy AI innovation.