VeriBot is a lightweight, configurable framework for automated testing of AI language models. It allows testers to validate AI responses against expected keywords and criteria, providing a structured approach to quality assurance for conversational AI systems.
- Test Case Management: Parse and execute test cases from structured text files
- Keyword Validation: Verify AI responses contain expected keywords and phrases
- Multi-turn Conversation Testing: Support for contextual tests that span multiple exchanges
- Detailed Reporting: Generate CSV reports with test results and failure details
- Progress Tracking: Real-time visibility into test execution status
- Configurable API Integration: Currently supports DeepSeek API with extensible design
```bash
# Clone the repository
git clone https://github.com/antonyga/VeriBot.git
cd VeriBot

# Install dependencies
pip install requests
```
- Create a `config.py` file in the project root with your API credentials:
```python
# DeepSeek API configuration
DEEPSEEK_API_KEY = 'your-api-key-here'
```
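`test_runner.py` can then pick up the credential with a plain import. A minimal illustration, assuming a standard Bearer-token header for the request:

```python
from config import DEEPSEEK_API_KEY

headers = {"Authorization": f"Bearer {DEEPSEEK_API_KEY}"}
```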
- Customize test cases in `testCases.txt` following the format:
```text
**1. Test Name**
Prompt: "Your test prompt here"
Expected Keywords: ["keyword1", "keyword2"]
Pass criteria: Response contains "keyword1", "keyword2"
```
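For reference, a block in this format can be pulled apart with a few regular expressions. This is only an illustrative sketch; the actual parser in `test_runner.py` may be structured differently:

```python
import ast
import re

def parse_test_case(block: str) -> dict:
    """Extract the name, prompt, and expected keywords from one test block."""
    name = re.search(r"\*\*\d+\.\s*(.+?)\*\*", block).group(1)
    prompt = re.search(r'Prompt:\s*"(.*)"', block).group(1)
    keywords = ast.literal_eval(re.search(r"Expected Keywords:\s*(\[.*\])", block).group(1))
    return {"name": name, "prompt": prompt, "keywords": keywords}
```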
Run the test suite:

```bash
python test_runner.py
```
The script will:
- Parse all test cases from your test file
- Send each prompt to the AI service
- Validate responses against expected keywords
- Generate a detailed report of results
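The validation step in the list above is plain keyword containment. A minimal sketch of that check, assuming case-insensitive substring matching (the shipped implementation may differ):

```python
def validate_response(response, expected_keywords):
    """Return (passed, missing) where missing lists keywords absent from the response."""
    missing = [kw for kw in expected_keywords if kw.lower() not in response.lower()]
    return len(missing) == 0, missing

# Example: a response mentioning "1969" passes the moon-landing case
passed, missing = validate_response("Apollo 11 landed on the Moon in 1969.", ["1969"])
```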
VeriBot supports various test case types:
```text
**1. Factual Q&A**
Prompt: "What year did humans first land on the moon?"
Expected Keywords: ["1969"]
Pass criteria: Response contains "1969"
```
```text
**5. Multi-Turn Context**
Prompt 1: "Who wrote Romeo and Juliet?"
Expected Keywords: ["Shakespeare"]
Pass criteria: Response contains "Shakespeare"

Prompt 2 (follow-up): "What other tragedies did they write?"
Expected Keywords: ["Hamlet", "Macbeth"]
Pass criteria: Response contains "Hamlet", "Macbeth"
```
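For multi-turn cases the first exchange has to be carried into the follow-up request. One common approach with chat-style APIs is to accumulate a message list, roughly as sketched below (this assumes `call_deepseek_api()` accepts a full message history, which may not match its actual signature):

```python
# Accumulate chat-style messages so the follow-up prompt is answered in context.
messages = [{"role": "user", "content": "Who wrote Romeo and Juliet?"}]
first_reply = call_deepseek_api(messages)                      # checked for "Shakespeare"
messages.append({"role": "assistant", "content": first_reply})
messages.append({"role": "user", "content": "What other tragedies did they write?"})
second_reply = call_deepseek_api(messages)                     # checked for "Hamlet", "Macbeth"
```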
The included test cases cover a wide range of AI capabilities:
- Factual knowledge retrieval
- Creative content generation
- Instructional responses
- Role-playing scenarios
- Contextual understanding
- Ambiguity handling
- Mathematical computations
- Linguistic capabilities
- Cultural knowledge
Results are saved to `test_results.csv` with the following information:
- Test number and name
- Prompt used
- Expected keywords
- Pass/fail status
- Missing keywords (if any)
- Response snippet
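A sketch of how such a row could be written with the standard `csv` module; the column names here are illustrative and may not match the generated file exactly:

```python
import csv

def write_report(rows, path="test_results.csv"):
    """Write one dict per executed test case to the results CSV."""
    fieldnames = ["test_number", "test_name", "prompt", "expected_keywords",
                  "status", "missing_keywords", "response_snippet"]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```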
Add new test cases to `testCases.txt` following the established format.
To use a different AI provider:
- Update the API endpoint in `test_runner.py`
- Modify the request structure in `call_deepseek_api()`
- Adjust the response parsing logic if needed
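As a rough example, a replacement call for an OpenAI-compatible chat endpoint might look like the following; the URL, model name, and payload shape are assumptions you would adapt to your provider:

```python
import requests
from config import DEEPSEEK_API_KEY  # swap in your provider's key

API_URL = "https://api.example.com/v1/chat/completions"  # hypothetical endpoint

def call_chat_api(prompt):
    """Send a single prompt and return the model's text reply."""
    payload = {
        "model": "your-model-name",
        "messages": [{"role": "user", "content": prompt}],
    }
    headers = {"Authorization": f"Bearer {DEEPSEEK_API_KEY}"}
    resp = requests.post(API_URL, json=payload, headers=headers, timeout=60)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```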
```text
VeriBot/
├── test_runner.py     # Main execution script
├── config.py          # API credentials
├── testCases.txt      # Test case definitions
└── test_results.csv   # Generated test results
```
- QA Testing: Verify AI responses meet quality standards
- Regression Testing: Ensure new model versions maintain expected behavior
- Response Validation: Check factual accuracy and keyword presence
- Multi-turn Validation: Test conversational memory and context handling
- Support for more complex validation beyond keyword matching
- Response time measurement and performance benchmarking
- HTML report generation with interactive visualizations
- Integration with CI/CD pipelines
This project is licensed under the MIT License - see the LICENSE file for details.
Created by @antonyga for Q-Aware Labs - ISTQB Certified AI Software Tester, specializing in AI system validation and prompt engineering.