Feat 23 add fast linkedin option #43

Draft
wants to merge 2 commits into
base: main

qosha1 commented Aug 15, 2025

Optional implementation of the fast-linkedin-scraper library, with a configurable switch between the Selenium and Playwright backends.

If you're interested, @stickerdaniel, I can go back and clean it up to make it merge-worthy, but I figured I'd share the raw version regardless.

RE: #23

(btw, thanks for the MCP server. It took a bit to get going, but it's pretty slick.)

Q

qosha1 and others added 2 commits August 14, 2025 18:05
Easier to connect to streamable HTTP MCPs if you can link remotely
- Add scraper_factory.py for backend selection between linkedin-scraper and fast-linkedin-scraper
- Add scraper_adapter.py with unified interface for both scrapers
- Add playwright_wrapper.py for Playwright session management (used by fast scraper)
- Update configuration to support scraper_type selection
- Update all tools (person, company, job) to use scraper adapter
- Fix type checking issues in config loaders and playwright wrapper
- Maintain backward compatibility with existing linkedin-scraper backend
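
The backend selection described in the commit message above could look roughly like the following. This is only a minimal sketch of a factory plus a shared adapter interface, not the code in this PR: `ScraperAdapter`, `SeleniumAdapter`, `PlaywrightAdapter`, `Config`, `create_scraper`, `get_person`, and the two `scraper_type` string values are all hypothetical names chosen for illustration, and the adapters return placeholder data instead of calling the real libraries.

```python
from dataclasses import dataclass
from typing import Protocol


class ScraperAdapter(Protocol):
    """Hypothetical unified interface; the PR's real method names may differ."""

    def get_person(self, username: str) -> dict: ...


class SeleniumAdapter:
    """Placeholder standing in for the linkedin-scraper (Selenium) backend."""

    def get_person(self, username: str) -> dict:
        return {"backend": "linkedin-scraper", "username": username}


class PlaywrightAdapter:
    """Placeholder standing in for the fast-linkedin-scraper (Playwright) backend."""

    def get_person(self, username: str) -> dict:
        return {"backend": "fast-linkedin-scraper", "username": username}


@dataclass
class Config:
    # Assumed option name and values. Defaulting to the original backend
    # keeps existing setups working without any config change.
    scraper_type: str = "linkedin-scraper"


# Registry mapping the config value to an adapter class.
_BACKENDS = {
    "linkedin-scraper": SeleniumAdapter,
    "fast-linkedin-scraper": PlaywrightAdapter,
}


def create_scraper(config: Config) -> ScraperAdapter:
    """Return the adapter selected by config.scraper_type."""
    try:
        return _BACKENDS[config.scraper_type]()
    except KeyError:
        raise ValueError(f"Unknown scraper_type: {config.scraper_type!r}") from None
```

With this shape, the person, company, and job tools would only depend on the adapter interface, so switching backends is a configuration change, for example `create_scraper(Config(scraper_type="fast-linkedin-scraper"))`.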

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@stickerdaniel
Owner

Hey, really appreciate the effort you put into this.
The factory pattern is solid work, but I'd like to propose a simpler direction.

This would avoid the threading complexity in the scraper adapter.

Thanks for pushing this forward!

@qosha1
Author

qosha1 commented Aug 15, 2025

Cool! Yeah, that would def make things easier lol. I had to hack in a hard workaround for the sync/async mismatch.

Are you essentially deprecating the original linkedin-scraper? E.g., is the primary build path going forward the fast-linkedin-scraper? 'Cause if so, I can start focusing on that too and likely help out on a few things as I build my workflow out. It would be good to be working on the one that's more likely to be kept up.

Happy to push those as well should anything be of interest.

Q
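
For context on the sync/async workaround mentioned above: the adapter code from this PR is not shown in the thread, so the following is only a sketch of one common way to call an async (Playwright-style) API from synchronous code, by running a dedicated event loop on a background thread. `AsyncBridge`, `fetch_profile_async`, `fetch_profile_sync`, and `fetch_profile_native` are hypothetical names, and the coroutine returns placeholder data.

```python
import asyncio
import threading


async def fetch_profile_async(username: str) -> dict:
    """Hypothetical async scraper call; returns placeholder data."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return {"username": username}


class AsyncBridge:
    """Runs coroutines on a dedicated event-loop thread so sync code can call them."""

    def __init__(self) -> None:
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def run(self, coro, timeout: float = 30.0):
        # Schedule the coroutine on the background loop and block until it finishes.
        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
        return future.result(timeout=timeout)

    def close(self) -> None:
        # Stop the loop from its own thread, then clean up.
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()


def fetch_profile_sync(bridge: AsyncBridge, username: str) -> dict:
    """Sync entry point: needs the bridge and its extra thread."""
    return bridge.run(fetch_profile_async(username))


async def fetch_profile_native(username: str) -> dict:
    """Async entry point: no bridge or thread, just await."""
    return await fetch_profile_async(username)
```

The contrast between the last two functions is the point: once the calling code is async end to end, the bridge and its thread are no longer needed.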

@stickerdaniel
Owner

stickerdaniel commented Aug 15, 2025

Yes, exactly. The plan is to remove the original Selenium scraper as a dependency entirely.
While a few fixes were merged in the original linkedin-scraper, it's become quite outdated and the maintainer isn't very active anymore. Fast-linkedin-scraper should cover all the functionality from the original library in the long term.
Feel free to jump in on the new repo.

I'll finish the async conversion today.


2 participants