
🔍 Evaluating Web Search Performance Across Top AI Assistants

A comparative analysis of how leading AI models retrieve, process, and present web-sourced information.


📌 Overview

This repository contains a study comparing the web search capabilities of four AI assistants:
Gemini 2.0 Flash, ChatGPT-4 Turbo, DeepSeekR1, and Grok 3. The research evaluates response speed, source credibility, citation practices, and ranking algorithm transparency.

Key Questions Explored:

  • Which model delivers an answer fastest?
  • Which model best balances speed and accuracy?
  • What factors influence the ranking of web search results?
  • How do these systems navigate the web?
  • How do AI assistants prioritize search results?
  • Do they cite reliable sources, or lean on social media?
  • ...and more.

🤖 Models Tested

| Model            | Default Search Engine | Version   |
|------------------|-----------------------|-----------|
| DeepSeekR1       | Bing                  | Free Tier |
| Gemini 2.0 Flash | Google Search         | Free Tier |
| ChatGPT-4 Turbo  | Bing                  | Free Tier |
| Grok 3           | Custom Engine         | Free Tier |

🧪 Methodology

Query Categories

  1. Factual Queries (e.g., "Boiling point of water")
  2. Recent Events (e.g., "2024 UEFA Champions League winner")
  3. Controversial Topics (e.g., "Ethics of genetic engineering")
  4. Numerical/Statistical Data (e.g., "Global temperature anomaly 2023")
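For replication, the categories and their sample queries can be kept in a small structure. The sketch below simply encodes the examples listed above; the dictionary layout is illustrative and not taken from src/.

```python
# Illustrative query set grouped by category. The sample queries are the
# ones listed above; the dictionary layout is an assumption, not the
# format used by the actual scripts in src/.
QUERIES = {
    "factual": ["Boiling point of water"],
    "recent_events": ["2024 UEFA Champions League winner"],
    "controversial": ["Ethics of genetic engineering"],
    "numerical": ["Global temperature anomaly 2023"],
}

for category, queries in QUERIES.items():
    for query in queries:
        print(f"[{category}] {query}")
```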

Testing Process

  • Response Timing: manual timing using a timer script written in Python (see the sketch after this list).
  • Citation Tracking: sources were recorded for each response.
  • Search Engine Disclosure: each model was asked directly which search engine it uses.
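A minimal sketch of what such a manual timer might look like; the actual script in src/ may differ.

```python
# Minimal manual timer: the operator presses Enter when the query is
# submitted and again when the full answer has rendered.
import time

def manual_timer() -> float:
    input("Press Enter the moment you submit the query...")
    start = time.perf_counter()
    input("Press Enter when the full answer has appeared...")
    elapsed = time.perf_counter() - start
    print(f"Response time: {elapsed:.1f}s")
    return elapsed

if __name__ == "__main__":
    manual_timer()
```

Human reaction time adds a few hundred milliseconds of noise in each direction, which is one reason manual timing appears under Limitations below.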

🚀 Key Findings

Performance Metrics

| Model            | Avg. Response Time | Total Citations | Social Media Citations |
|------------------|--------------------|-----------------|------------------------|
| Gemini 2.0 Flash | 4.2s               | 84              | 0                      |
| ChatGPT-4 Turbo  | 6.8s               | 88              | 0                      |
| DeepSeekR1       | 9.5s               | 79              | 1 (Facebook)           |
| Grok 3           | 5.1s               | 71              | 8 (Twitter/YouTube)    |
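These aggregates can be recomputed from the raw data. The sketch below assumes a hypothetical file data/responses.xlsx with one row per citation and columns "model", "response_time_s", and "source_domain"; the actual schema in data/ may differ.

```python
# Sketch: rebuild the per-model metrics table from raw data.
# The file name and column names are assumptions, not the real schema.
import pandas as pd

SOCIAL = {"twitter.com", "x.com", "youtube.com", "facebook.com"}

df = pd.read_excel("data/responses.xlsx")
summary = df.groupby("model").agg(
    avg_response_time_s=("response_time_s", "mean"),
    total_citations=("source_domain", "count"),
    social_media_citations=("source_domain", lambda s: s.isin(SOCIAL).sum()),
)
print(summary)
```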

Insights

  • 🏆 Speed vs. Depth: Gemini 2.0 Flash was fastest; DeepSeekR1 provided the most detailed answers.
  • 📚 Source Bias: Wikipedia dominated citations (26/350 total), but Gemini prioritized .gov/.edu domains.
  • ⚠️ Social Media Reliance: Grok 3 cited Twitter 7x more than other models.
  • 🔍 Algorithm Diversity: Models using the same engine (Bing) ranked results differently.
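The ranking-diversity point can be probed directly: comparing each model's most-cited domains shows whether two models on the same engine surface the same sources. The sketch below again assumes the hypothetical data/responses.xlsx schema used above.

```python
# Sketch: top five cited domains per model, for comparing how models
# that share a search engine (e.g., Bing) rank sources differently.
import pandas as pd

df = pd.read_excel("data/responses.xlsx")
top_domains = (
    df.groupby("model")["source_domain"]
      .value_counts()   # citation counts per (model, domain) pair
      .groupby(level=0)
      .head(5)          # keep the five most-cited domains per model
)
print(top_domains)
```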

The full findings are in the final report under docs/.


📂 Repository Structure

```
├── data/       # Raw response data (Excel)
├── docs/       # Final report (DOCX) and visuals
├── src/        # Python timer script
├── LICENSE     # MIT License
└── README.md   # This overview
```

🛠 How to Use This Repository

  1. Replicate the Study:
    • Run the timer script to measure AI response times.
    • Review the raw data for citations and response details.
  2. Extend the Research:
    • Add new queries to test additional categories (e.g., non-English prompts).
    • Automate timing with a browser automation tool such as Selenium (see the sketch below).
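A rough sketch of what automated timing could look like with Selenium. The URL and CSS selectors are hypothetical placeholders; each assistant's web UI needs its own selectors.

```python
# Sketch: automated response timing via browser automation.
# "https://example-assistant.test/chat", "textarea", and ".answer" are
# hypothetical placeholders standing in for a real chat UI.
import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()
driver.get("https://example-assistant.test/chat")

box = driver.find_element(By.CSS_SELECTOR, "textarea")
box.send_keys("2024 UEFA Champions League winner", Keys.ENTER)
start = time.perf_counter()

# Block until at least one answer element is rendered (up to 60 s).
WebDriverWait(driver, 60).until(
    lambda d: d.find_elements(By.CSS_SELECTOR, ".answer")
)
print(f"Response time: {time.perf_counter() - start:.1f}s")
driver.quit()
```

Note that detecting when a streamed answer has finished is harder than detecting when it starts; a sturdier version would wait for the answer text to stop changing before stopping the clock.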

⚠️ Limitations

  • Manual timing may introduce human error.
  • Free-tier models may have rate limits or reduced features.
  • Small sample size (12 queries total).

🤝 Contributing

Contributions are welcome! Ideas for expansion:

  • Add automated timing tools.
  • Test paid-tier models (e.g., Claude).
  • Analyze bias in cited sources.

Submit issues or PRs with your improvements.

📜 License

MIT License. See LICENSE for details.


📬 Contact

For questions or collaborations:
