SkillTester: Benchmarking Utility and Security of Agent Skills

📰 ArXiv cs.AI

SkillTester is a tool for benchmarking agent skills on two dimensions: utility and security.

advanced Published 1 Apr 2026

Action Steps

Run each task under paired conditions, baseline and with-skill, so the skill's effect can be isolated
Run a separate security probe suite to detect vulnerabilities
Normalize raw execution artifacts into utility and security scores
Assign a three-level security status label based on the security score
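
The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not SkillTester's actual implementation: the names `run_task`, `probes`, and `SkillReport`, the gain-based normalization, the 0.5 and 0.9 thresholds, and the label names "safe", "caution", and "unsafe" are all hypothetical, since the summary does not give the real formulas or labels.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical interfaces (assumptions, not SkillTester's API):
#   Runner(task, skill_enabled) -> True if the agent completed the task
#   Probe() -> True if the skill resisted the attack or misuse attempt
Runner = Callable[[str, bool], bool]
Probe = Callable[[], bool]


@dataclass
class SkillReport:
    utility: float   # normalized to [0, 1]
    security: float  # normalized to [0, 1]
    status: str      # three-level label


def success_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of True outcomes; 0.0 for an empty sequence."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def utility_score(baseline: Sequence[bool], with_skill: Sequence[bool]) -> float:
    """Assumed normalization: map the success-rate gain from [-1, 1] to [0, 1]."""
    gain = success_rate(with_skill) - success_rate(baseline)
    return (gain + 1.0) / 2.0


def security_status(score: float, warn: float = 0.5, safe: float = 0.9) -> str:
    """Assumed thresholds and label names for the three-level status."""
    if score >= safe:
        return "safe"
    if score >= warn:
        return "caution"
    return "unsafe"


def evaluate_skill(
    tasks: Sequence[str], run_task: Runner, probes: Sequence[Probe]
) -> SkillReport:
    # Step 1: paired execution, same tasks with and without the skill.
    baseline = [run_task(task, False) for task in tasks]
    with_skill = [run_task(task, True) for task in tasks]
    # Step 2: the security probes run separately from the utility tasks.
    probe_results = [probe() for probe in probes]
    # Step 3: normalize raw outcomes into scores.
    security = success_rate(probe_results)
    # Step 4: derive the three-level label from the security score.
    return SkillReport(
        utility=utility_score(baseline, with_skill),
        security=security,
        status=security_status(security),
    )
```

Under this normalization a utility of 0.5 means the skill made no difference, values above 0.5 mean it helped, and values below 0.5 mean it hurt. Keeping the probe suite separate from the utility tasks means a skill that scores well on tasks can still receive a poor security label.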

Who Needs to Know This

AI engineers and researchers can use SkillTester to measure whether a skill actually improves agent performance. Security specialists can use it to identify vulnerabilities a skill introduces.

Key Insight

💡 SkillTester evaluates agent skills on two axes in one framework: utility, measured with paired baseline and with-skill runs, and security, measured with a separate probe suite