A comparative analysis of how leading AI models retrieve, process, and present web-sourced information.
This repository contains a study comparing the web search capabilities of four AI assistants:
Gemini 2.0 Flash, ChatGPT-4 Turbo, DeepSeekR1, and Grok 3. The research evaluates response speed, source credibility, citation practices, and ranking algorithm transparency.
Key Questions Explored:
- Which model returns an answer fastest?
- Which model balances speed and accuracy best?
- What factors influence ranking of web search results?
- How do these systems navigate the web?
- How do AI assistants prioritize search results?
- Do they cite reliable sources—or lean on social media?
- More...
Models evaluated:

| Model | Default Search Engine | Tier |
|---|---|---|
| DeepSeekR1 | Bing | Free Tier |
| Gemini 2.0 Flash | Google Search | Free Tier |
| ChatGPT-4 Turbo | Bing | Free Tier |
| Grok 3 | Custom Engine | Free Tier |
Test queries spanned four categories (12 queries total):
- Factual Queries (e.g., "Boiling point of water")
- Recent Events (e.g., "2024 UEFA Champions League winner")
- Controversial Topics (e.g., "Ethics of genetic engineering")
- Numerical/Statistical Data (e.g., "Global temperature anomaly 2023")
Methodology:
- Response Timing: Manually triggered Python timer script (a minimal sketch follows this list).
- Citation Tracking: Sources cited in each response were recorded.
- Search Engine Disclosure: Each model was asked directly which search engine it uses.
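As a rough illustration of the timing setup, the sketch below shows what a manually triggered stopwatch could look like. The query constants, model list, and CSV output file are illustrative assumptions, not the exact contents of the script in `src/`.

```python
import csv
import time

# One example query per category from the study (placeholders; the full
# set of 12 queries lives in data/).
QUERIES = {
    "factual": "Boiling point of water",
    "recent_events": "2024 UEFA Champions League winner",
    "controversial": "Ethics of genetic engineering",
    "numerical": "Global temperature anomaly 2023",
}

MODELS = ("Gemini 2.0 Flash", "ChatGPT-4 Turbo", "DeepSeekR1", "Grok 3")


def time_response(model: str, query: str) -> float:
    """Manually time one response: press Enter when the query is sent,
    and press Enter again when the full answer has been displayed."""
    input(f"[{model}] Press Enter as you submit: {query!r} ")
    start = time.perf_counter()
    input("Press Enter when the answer has finished rendering ")
    return time.perf_counter() - start


def main() -> None:
    rows = []
    for category, query in QUERIES.items():
        for model in MODELS:
            elapsed = time_response(model, query)
            rows.append({"model": model, "category": category,
                         "query": query, "seconds": round(elapsed, 2)})
    # Append results to a CSV for later analysis (file name is an assumption).
    with open("response_times.csv", "a", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["model", "category", "query", "seconds"])
        if fh.tell() == 0:
            writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    main()
```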
Results summary:

| Model | Avg. Response Time (s) | Total Citations | Social Media Citations |
|---|---|---|---|
| Gemini 2.0 Flash | 4.2 | 84 | 0 |
| ChatGPT-4 Turbo | 6.8 | 88 | 0 |
| DeepSeekR1 | 9.5 | 79 | 1 (Facebook) |
| Grok 3 | 5.1 | 71 | 8 (Twitter/YouTube) |
- 🏆 Speed vs. Depth: Gemini 2.0 Flash was fastest; DeepSeekR1 provided the most detailed answers.
- 📚 Source Bias: Wikipedia was the single most-cited source (26 citations), but Gemini prioritized .gov/.edu domains.
- ⚠️ Social Media Reliance: 8 of Grok 3's citations pointed to Twitter or YouTube; no other model cited social media more than once.
- 🔍 Algorithm Diversity: Models using the same engine (Bing) ranked results differently.
Repository structure:

```
├── data/       # Raw response data (Excel)
├── docs/       # Final report (DOCX) and visuals
├── src/        # Python timer script
├── LICENSE     # MIT License
└── README.md   # This overview
```
- Replicate the Study:
- Run the timer script to measure AI response times.
- Review the raw data for citations and response details.
- Extend the Research:
- Add new queries to test additional categories (e.g., non-English prompts).
- Replace manual timing with browser automation (e.g., Selenium); a rough sketch follows this list.
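If the timer were automated with Selenium, the measurement loop might look roughly like the following. The URL, CSS selectors, and the "stop when the text stops changing" heuristic are placeholders; each assistant's web UI differs, so they would need to be adapted per model.

```python
import time

# Requires: pip install selenium (plus a matching chromedriver on PATH).
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

# Placeholder values: real URLs and selectors differ for each assistant.
CHAT_URL = "https://example-assistant.test/chat"
INPUT_SELECTOR = "textarea"
RESPONSE_SELECTOR = ".assistant-message"


def time_query(driver: webdriver.Chrome, query: str,
               settle_seconds: float = 2.0, timeout: float = 120.0) -> float:
    """Send a query and time until the response text stops growing."""
    box = driver.find_element(By.CSS_SELECTOR, INPUT_SELECTOR)
    box.send_keys(query)
    start = time.perf_counter()
    box.send_keys(Keys.RETURN)

    last_text, last_change = "", start
    while time.perf_counter() - start < timeout:
        elements = driver.find_elements(By.CSS_SELECTOR, RESPONSE_SELECTOR)
        text = elements[-1].text if elements else ""
        now = time.perf_counter()
        if text != last_text:
            last_text, last_change = text, now
        # Treat the answer as complete once it has stopped changing.
        elif text and now - last_change > settle_seconds:
            return last_change - start
        time.sleep(0.2)
    raise TimeoutError("no stable response within the timeout")


if __name__ == "__main__":
    driver = webdriver.Chrome()
    try:
        driver.get(CHAT_URL)
        print(f"Elapsed: {time_query(driver, 'Boiling point of water'):.2f}s")
    finally:
        driver.quit()
```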
Limitations:
- Manual timing may introduce human error.
- Free-tier models may have rate limits or reduced features.
- Small sample size (12 queries total).
Contributions are welcome! Ideas for expansion:
- Add automated timing tools.
- Test paid-tier models (e.g., Claude).
- Analyze bias in cited sources (see the domain-classification sketch below).
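As a starting point for the source-bias idea above, cited URLs could be bucketed by domain type and tallied per model. The CSV layout (`model`, `url` columns) is an assumed export of the raw Excel data, not a file that ships with this repository.

```python
import csv
from collections import Counter
from urllib.parse import urlparse

# Coarse, illustrative domain buckets.
SOCIAL = {"twitter.com", "x.com", "youtube.com", "facebook.com"}
ENCYCLOPEDIA = {"wikipedia.org"}


def classify(url: str) -> str:
    """Map a cited URL to a source-type bucket."""
    host = urlparse(url).netloc.lower().removeprefix("www.")
    if host.endswith((".gov", ".edu")):
        return "gov/edu"
    if any(host == d or host.endswith("." + d) for d in SOCIAL):
        return "social media"
    if any(host == d or host.endswith("." + d) for d in ENCYCLOPEDIA):
        return "encyclopedia"
    return "other"


def tally(path: str) -> dict[str, Counter]:
    """Count citation buckets per model from a CSV with 'model' and 'url' columns."""
    counts: dict[str, Counter] = {}
    with open(path, newline="") as fh:
        for row in csv.DictReader(fh):
            counts.setdefault(row["model"], Counter())[classify(row["url"])] += 1
    return counts


if __name__ == "__main__":
    for model, buckets in tally("citations.csv").items():
        print(model, dict(buckets))
```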
Submit issues or PRs with your improvements.
MIT License. See LICENSE for details.
For questions or collaborations:
- Email: antony.garcia@qawarelabs.com
- LinkedIn: Antony Garcia