Evals
Evaluation (Evals) in AgentScope AI ensures the performance, accuracy, and reliability of deployed agents and workflows. Structured evaluation lets developers assess their models and tools systematically and make data-driven improvements.
This document outlines how to set up, configure, and run Evals within AgentScope AI for continuous performance monitoring and enhancement.
Setting Up Evals
Before running evaluations, ensure that your AgentScope AI deployment is properly configured with the necessary datasets and benchmarking tools.
Prerequisites
A deployed instance of AgentScope AI
Access to relevant datasets and evaluation metrics
Logging and monitoring enabled for detailed analysis
Configuration
To configure Evals, define the evaluation criteria in your config.json or a dedicated YAML file:
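A minimal sketch of what such a file might contain, assuming a dedicated evals.yaml; the keys shown (`dataset`, `metrics`, `thresholds`) are illustrative rather than a fixed schema:

```yaml
# evals.yaml — illustrative evaluation configuration (keys are hypothetical)
evals:
  - name: support-agent-accuracy
    dataset: datasets/support_queries.jsonl   # labeled examples to evaluate against
    metrics:
      - accuracy
      - latency_p95
    thresholds:
      accuracy: 0.90        # fail the run if accuracy drops below 90%
      latency_p95: 2000     # milliseconds
```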
Running Evals
Evals can be triggered manually or automatically as part of your CI/CD pipeline.
Manual Execution
To run an evaluation manually, invoke the evaluation command against your configuration file.
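The exact CLI entry point is deployment-specific; a minimal sketch, assuming a hypothetical `agentscope eval` subcommand and illustrative flag names:

```bash
# Hypothetical invocation — substitute your actual CLI and flag names
agentscope eval --config evals.yaml --output results/latest.json
```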
Automated Execution
Integrate Evals into your CI/CD pipeline to validate performance continuously as changes are merged.
Example GitHub Actions workflow:
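The workflow below is a sketch only; it assumes the hypothetical `agentscope eval` command and the evals.yaml file from the earlier examples:

```yaml
# .github/workflows/evals.yml — illustrative; command and flag names are assumptions
name: Run Evals
on:
  pull_request:
  push:
    branches: [main]

jobs:
  evals:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Run evaluations
        run: agentscope eval --config evals.yaml --output results/ci_run.json
      - name: Upload results
        uses: actions/upload-artifact@v4
        with:
          name: eval-results
          path: results/ci_run.json
```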
Analyzing Evaluation Results
Once evaluations complete, results are logged and can be reviewed with built-in visualization tools or exported for further analysis.
Sample Output
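The exact output format depends on your configuration and metrics; a hypothetical JSON result, matching the illustrative config above, might look like this:

```json
{
  "eval": "support-agent-accuracy",
  "examples": 250,
  "metrics": {
    "accuracy": 0.93,
    "latency_p95": 1840
  },
  "passed": true
}
```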
Debugging Poor Performance
If an agent or workflow performs below expectations, consider the following (a sketch for comparing eval runs follows this list):
Adjusting hyperparameters
Refining dataset selection
Debugging tool integration issues
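When iterating on any of these, it helps to diff metrics between a baseline run and a candidate run. A minimal sketch, assuming results are saved as JSON files with a top-level `metrics` object (as in the hypothetical sample output above):

```python
import json


def load_metrics(path: str) -> dict:
    """Load the metrics object from an eval result file."""
    with open(path) as f:
        return json.load(f)["metrics"]


def diff_metrics(baseline_path: str, candidate_path: str) -> None:
    """Print per-metric deltas between a baseline run and a candidate run."""
    baseline = load_metrics(baseline_path)
    candidate = load_metrics(candidate_path)
    # Only compare metrics present in both runs.
    for name in sorted(baseline.keys() & candidate.keys()):
        delta = candidate[name] - baseline[name]
        print(f"{name}: {baseline[name]} -> {candidate[name]} ({delta:+.3f})")


if __name__ == "__main__":
    # Hypothetical file paths — point these at your own result files.
    diff_metrics("results/baseline.json", "results/ci_run.json")
```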