An eye on AI

The largest AI models should be monitored by a third-party auditor - who would basically check an AI system to ascertain its capabilities and the risk it poses. Both OpenAI and Anthropic - two of the AI labs with the most advanced systems - commissioned a third-party auditor called ARC Evals to act as a ‘third-party evaluator to assess potentially dangerous capabilities of today’s state-of-the-art ML models.’

A safety evaluation of an AI system, known among AI labs as an ‘eval’, checks an AI system’s capabilities to ensure that pre-deployment they are developed and deployed responsibly and with human interests in mind. When ARC Evals stress-tested OpenAI’s pre-aligned GPT-4 it did so in a controlled environment and in essence tried to make the model misbehave.

They managed to make GPT-4 lie to a human and get that same human to perform a task for them on TaskRabbit, make long term strategic plans, and write and run code: ‘As AI systems improve [...] It is important to have systematic, controlled testing of these capabilities in place before models pose an imminent risk, so that labs can have advance warning when they’re getting close and know to stop scaling up models further until they have robust safety and security guarantees.’

ARC Evals is particularly worried that future and more advanced systems might exploit financial arbitrage, create new pathogens, and impersonate online humans. 

With this in mind, British-based Deepmind got together a very stellar cast of AI researchers including Turing Award winners, to hash out *exactly* how one monitors the risks from the increasingly advanced and potentially more dangerous AI models. They find that: ‘Current approaches to building general-purpose AI systems tend to produce systems with both beneficial and harmful capabilities [and that] Further progress in AI development could lead to capabilities that pose extreme risks, such as offensive cyber capabilities or strong manipulation skills.’

And because of this, they go on to explain ‘why model evaluation is critical for addressing extreme risks [...]. These evaluations will become critical for keeping policymakers and other stakeholders informed, and for making responsible decisions about model training, deployment, and security.’

Model ‘evals’ to uncover the risk of extreme risks of catastrophe and existential risk ‘should be a priority area for AI safety and governance.’ Major labs such as Google Deepmind, OpenAI and Anthropic have perhaps the biggest responsibility in the whole AI ecosystem, as they are the ones developing the model - which can be used for great good or for great (even unintentional) ill. Perhaps an International Atomic Agency for AI would be a suitable house for such a system of monitoring.

The Adam Smith Institute’s paper will be released next month and I cannot wait to share with you the fantastic innovation-led policies for the safe deployment of AI that we have cooked up.