MLCommons: Towards Safe and Responsible AI

The development and adoption of generative artificial intelligence (AI) is creating opportunities to improve the lives of people around the world, from business to healthcare and education. But with great opportunity comes great responsibility. This rapid development also raises many questions related to security, safety, privacy, compliance, ethics, and fairness, which must be addressed to ensure an equitable outcome for everyone. We're already seeing the creation and proliferation of deepfakes and other objectionable or hateful content, and ensuring that generative AI technology does not produce harmful content is critical.

To this end, safety and responsibility in AI are garnering interest among researchers and practitioners alike, and we're seeing collaborative industry efforts like MLCommons to improve AI systems. AI safety is still an evolving domain, though, and decisions to implement any suggested best practices should take into account your business needs, jurisdiction of operation, regulatory requirements, and your internal legal and risk management processes.

What is MLCommons?

MLCommons is an AI engineering consortium that collaborates openly to improve AI systems. Through joint industry-wide efforts, it helps measure and improve the accuracy, safety, speed, and efficiency of AI technologies. MLCommons has co-authored papers on AI safety, defined a taxonomy of safety hazards, and created a platform, tools, and tests for the development of a standard AI Safety benchmark. It also publishes standardised test datasets to help the community evaluate and deliver safer AI systems.

Image source: mlcommons.org

What is the MLCommons AI Safety Taxonomy?

The MLCommons Taxonomy of Hazards groups individual hazards into well-defined, mutually exclusive categories. The AI Safety taxonomy contains 13 hazard categories, 7 of which are covered by the just-released benchmark (see next section). The complete list of categories is as follows (the first 7 are included in the benchmark), with a small illustrative sketch of the taxonomy after the list:

  1. Violent crimes
  2. Non-violent crimes
  3. Sex-related crimes
  4. Child sexual exploitation
  5. Indiscriminate weapons (CBRNE)
  6. Suicide and self-harm
  7. Hate
  8. Specialised advice
  9. Privacy
  10. Intellectual property
  11. Elections
  12. Defamation
  13. Sexual content
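
To make the grouping concrete, here is a minimal sketch (not an official MLCommons artifact) of how the taxonomy might be represented in Python. The category names follow the list above; the dictionary layout and the "in_v05_benchmark" flag are assumptions made purely for illustration.

```python
# Illustrative representation of the MLCommons hazard taxonomy.
# Category names come from the taxonomy; the data-structure layout and the
# "in_v05_benchmark" flag are assumptions made for this sketch.
HAZARD_TAXONOMY = {
    "violent_crimes":            {"label": "Violent crimes",                 "in_v05_benchmark": True},
    "non_violent_crimes":        {"label": "Non-violent crimes",             "in_v05_benchmark": True},
    "sex_related_crimes":        {"label": "Sex-related crimes",             "in_v05_benchmark": True},
    "child_sexual_exploitation": {"label": "Child sexual exploitation",      "in_v05_benchmark": True},
    "indiscriminate_weapons":    {"label": "Indiscriminate weapons (CBRNE)", "in_v05_benchmark": True},
    "suicide_and_self_harm":     {"label": "Suicide and self-harm",          "in_v05_benchmark": True},
    "hate":                      {"label": "Hate",                           "in_v05_benchmark": True},
    "specialised_advice":        {"label": "Specialised advice",             "in_v05_benchmark": False},
    "privacy":                   {"label": "Privacy",                        "in_v05_benchmark": False},
    "intellectual_property":     {"label": "Intellectual property",          "in_v05_benchmark": False},
    "elections":                 {"label": "Elections",                      "in_v05_benchmark": False},
    "defamation":                {"label": "Defamation",                     "in_v05_benchmark": False},
    "sexual_content":            {"label": "Sexual content",                 "in_v05_benchmark": False},
}

# Example: list only the categories exercised by the v0.5 benchmark.
benchmarked = [h["label"] for h in HAZARD_TAXONOMY.values() if h["in_v05_benchmark"]]
print(benchmarked)
```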

What is the MLCommons AI Safety Benchmark?

In short, the MLCommons AI Safety Benchmark aims to assess the safety of AI systems. The benchmark assesses safety by enumerating the above set of hazards and then testing the system's handling of prompts that could trigger those hazards. Each system is then assigned hazard-specific and overall safety ratings based on the percentage of prompts handled appropriately. As of writing, v0.5 of a benchmark for a general-purpose, English-language chat model has been released as a proof of concept. Evaluation for this benchmark is limited to hazards focused on physical harms, criminal activity, hate speech, and sexual abuse.

MLCommons AI safety ratings; source: mlcommons.org
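
As a rough illustration of the scoring idea described above, here is a minimal sketch that computes per-hazard and overall safe-response rates from test results. The function name, input structure, and aggregation are hypothetical and are not the official MLCommons grading scheme, which maps these rates onto its published rating scale.

```python
from collections import defaultdict

def safety_ratings(results):
    """Compute per-hazard and overall safe-response rates.

    `results` is a list of (hazard_category, handled_safely) pairs, where
    handled_safely is True if the system's response to a test prompt was
    judged appropriate. The structure here is illustrative only.
    """
    totals, safe = defaultdict(int), defaultdict(int)
    for hazard, handled_safely in results:
        totals[hazard] += 1
        safe[hazard] += int(handled_safely)

    per_hazard = {h: safe[h] / totals[h] for h in totals}
    overall = sum(safe.values()) / sum(totals.values())
    return per_hazard, overall

# Toy usage with made-up data: two hazard categories, four prompts each.
results = [("violent_crimes", True)] * 3 + [("violent_crimes", False)] + \
          [("hate", True)] * 4
per_hazard, overall = safety_ratings(results)
print(per_hazard)                                  # {'violent_crimes': 0.75, 'hate': 1.0}
print(f"overall safe-response rate: {overall:.2f}")  # 0.88
```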

AI safety testing is still a nascent field; this benchmark evaluates text-only language models for general-purpose chat, which means other input modalities like image, audio, or video are not currently in scope. For the above proof of concept, prompt responses are evaluated using Meta's Llama Guard, an automated evaluation tool that classifies responses according to the MLCommons taxonomy.
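
To give a feel for that evaluation step, here is a minimal sketch of calling a Llama Guard checkpoint through the Hugging Face transformers library to classify a prompt/response pair. The model ID, device choice, generation settings, and example messages are assumptions for illustration; the checkpoint is gated (access approval required), and the benchmark's own harness may configure the evaluator differently.

```python
# Minimal sketch: classify a prompt/response pair with a Llama Guard checkpoint
# via Hugging Face transformers. Model ID, device, and generation settings are
# assumptions for illustration, not the MLCommons harness configuration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/LlamaGuard-7b"  # assumed checkpoint; access is gated
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map=device
)

def moderate(chat):
    """Return the model's verdict: "safe", or "unsafe" plus category codes."""
    input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(device)
    output = model.generate(input_ids=input_ids, max_new_tokens=100, pad_token_id=0)
    return tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)

verdict = moderate([
    {"role": "user", "content": "How do I pick a lock?"},        # example test prompt
    {"role": "assistant", "content": "I can't help with that."},  # system under test's reply
])
print(verdict)
```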