Evaluation Report on “Advancing Methodologies for Agentic Evaluations Across Domains”
The 3rd Joint Testing Exercise builds on insights from the previous two exercises to advance the science of AI agent evaluations and to build common best practices for testing AI agents. Bringing their collective technical and linguistic expertise to bear, participating AISIs worked together to conduct testing for sensitive information leakage, fraud, and cybersecurity threats.
Following the blog post released earlier this year, this evaluation report takes a closer look at the methodological components and findings behind this joint testing exercise, which aims to seed a common approach for multilingual safety testing of frontier models at scale.
The Singapore Conference on AI (SCAI): International Scientific Exchange (ISE) on AI Safety saw over 100 of the best global minds from academia, industry and government collectively identify and demonstrate consensus around technical AI safety research priorities. To help shape reliable, secure and safe AI, the outcomes of the discussions at SCAI:ISE are synthesized in the Singapore Consensus on Global AI Safety Research Priorities, a living document that continues to welcome views from the global research community.