🚫 Google Scholar Profile Fetching Not Working After Deployment (Cheerio Blocked by Google)
📋 Description
The Google Scholar profile fetching feature, which uses Cheerio for web scraping, works as expected in the local development environment. However, after deployment (e.g., on platforms like Vercel or Netlify), the feature fails to function. This is due to Google blocking scraping attempts from cloud-hosted environments, leading to failed or blocked requests.
🔁 Steps to Reproduce
- Run the app locally and fetch a Google Scholar profile – it works as expected.
- Deploy the app (e.g., on Vercel or Netlify).
- Attempt to fetch a Google Scholar profile.
- Observe that the request fails or returns a blocked response.
✅ Expected Behavior
Google Scholar profile data should be successfully fetched and displayed, just as it is in the local development environment.
❌ Actual Behavior
After deployment, profile fetching fails due to Google blocking scraping attempts from Vercel IPs.
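To tell a blocked request apart from an ordinary failure, the fetch code could classify the response before handing the HTML to Cheerio. The sketch below is only an illustration: `classifyScholarResponse`, `FetchOutcome`, and the `BLOCK_MARKERS` strings are hypothetical names and values, and the status codes and markers are assumptions about what a block looks like, so they should be checked against a real blocked response from the deployed app.

```typescript
// Possible outcomes when requesting a Google Scholar profile page.
type FetchOutcome = "ok" | "blocked" | "error";

// Assumed substrings of a block/CAPTCHA page (not confirmed by this issue).
// Compared case-insensitively against the response body.
const BLOCK_MARKERS: string[] = ["unusual traffic", "captcha"];

// Classify a response from its HTTP status and body.
// Assumption: blocks show up as 403/429 or as a 2xx page containing a marker.
function classifyScholarResponse(status: number, html: string): FetchOutcome {
  if (status === 403 || status === 429) {
    return "blocked";
  }
  const lower = html.toLowerCase();
  if (BLOCK_MARKERS.some((marker) => lower.includes(marker))) {
    return "blocked";
  }
  if (status >= 200 && status < 300) {
    return "ok";
  }
  return "error";
}
```

Logging the returned outcome in the deployed environment would help confirm whether the failures are blocks or some other error, such as a timeout or a changed page layout.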
💡 Possible Solution / Notes
- Google aggressively blocks scraping requests from well-known serverless and hosting platforms like Vercel.
- This issue is specific to the deployment environment: the same code works fine locally.
- A simple and effective workaround is to deploy the app on a private or self-hosted server, such as:
  - AWS EC2
  - DigitalOcean Droplets
  - Self-managed VPS
- These environments offer dedicated IPs that are less likely to be flagged or rate-limited by Google.
- We are also considering additional solutions, including:
  - Leveraging headless browsers with IP rotation via proxies (used cautiously and in compliance with relevant policies).
  - Adding documentation to inform users of this limitation and possible deployment recommendations.
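Alongside the documentation, the app could warn at startup when it appears to be running on an affected platform. The following is a minimal sketch, not part of the current code: `detectHostingPlatform` and `scholarDeploymentWarning` are hypothetical helpers, and it assumes the platforms expose `VERCEL=1` and `NETLIFY=true` in the environment where the code runs, which should be confirmed in each platform's documentation.

```typescript
// Environment variables as a plain record, e.g. process.env in Node.js.
type Env = Record<string, string | undefined>;

// Guess the hosting platform from environment variables.
// Assumption: VERCEL is "1" on Vercel and NETLIFY is "true" on Netlify.
function detectHostingPlatform(env: Env): string | null {
  if (env["VERCEL"] === "1") {
    return "Vercel";
  }
  if (env["NETLIFY"] === "true") {
    return "Netlify";
  }
  return null;
}

// Build a warning for affected platforms, or null when none is detected.
function scholarDeploymentWarning(env: Env): string | null {
  const platform = detectHostingPlatform(env);
  if (platform === null) {
    return null;
  }
  return (
    `Google Scholar fetching may be blocked on ${platform}; ` +
    "consider deploying to a self-hosted server instead."
  );
}
```

Calling `scholarDeploymentWarning(process.env)` once at startup and logging a non-null result would surface the limitation to anyone deploying the app without reading the documentation first.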
📎 Additional Context
This issue is documented to help users and contributors understand the current limitation with Google Scholar integration after deployment on platforms like Vercel.
If you have suggestions, workarounds, or better solutions, please share them in the comments below.