
feat: add configurable user agent support #28


Merged
2 commits merged on Jul 14, 2025


stickerdaniel
Owner

Summary

  • Add --user-agent CLI argument and USER_AGENT environment variable support
  • Implement platform-specific default user agents (Windows, macOS, Linux)
  • Update configuration schema and loaders to support user agent customization
  • Add user agent documentation to the README

Addresses Issues

Changes

  • Configuration: Added user_agent field to ChromeConfig schema
  • CLI: Added --user-agent argument with help text
  • Environment: Added USER_AGENT environment variable support
  • Platform Detection: Automatic platform-specific user agents to reduce fingerprinting
  • Documentation: Updated README.md


claude bot commented Jul 14, 2025

Claude finished @stickerdaniel's task.


Code Review Complete ✅

I've thoroughly reviewed the configurable user agent support implementation. Overall, this is a well-implemented feature that follows the existing codebase patterns and addresses the stated requirements effectively.

Strengths

Clean Architecture

  • Follows established configuration layering pattern (CLI → ENV → defaults)
  • Consistent with existing code style and naming conventions
  • Proper separation of concerns across modules
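The CLI → ENV → defaults layering can be sketched roughly as follows. This is not the project's loader: `build_parser`, `resolve_user_agent`, and the `"platform-default"` placeholder are hypothetical, and only the `--user-agent` flag and the `USER_AGENT` variable come from the PR. The help text also shows the example string suggested further down in the review.

```python
import argparse
import os
from typing import Mapping, Sequence


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--user-agent",  # argparse stores this as args.user_agent
        default=None,
        help='Custom browser user agent, e.g. "Mozilla/5.0 (X11; Linux x86_64) ..."',
    )
    return parser


def resolve_user_agent(
    argv: Sequence[str],
    environ: Mapping[str, str] = os.environ,
    default: str = "platform-default",  # stand-in for get_default_user_agent()
) -> str:
    """Resolve the user agent with CLI > USER_AGENT env var > default precedence."""
    args = build_parser().parse_args(argv)
    if args.user_agent:  # a non-empty CLI value wins
        return args.user_agent
    env_value = environ.get("USER_AGENT")
    if env_value:  # an empty string falls through to the default
        return env_value
    return default
```

Passing `environ` explicitly keeps the function testable without touching the real process environment.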

Platform-Aware Design

  • Smart platform-specific default user agents to reduce fingerprinting
  • Helps address anti-scraping measures LinkedIn may employ
  • Good fallback strategy (custom → platform default)

Configuration Flexibility

  • Multiple input methods: --user-agent CLI arg and USER_AGENT env variable
  • Well-integrated into existing configuration system
  • Proper precedence handling

⚠️ Areas for Improvement

1. Hardcoded Chrome Version (drivers/chrome.py:38-42)

# Current implementation has fixed version numbers
"Chrome/137.0.0.0 Safari/537.36"

Issue: Chrome ships a new major version roughly every four weeks, so the pinned 137.0.0.0 will fall behind the current stable release quickly. An outdated version makes the user agent less effective against anti-bot detection.

Recommendation: Consider deriving the version dynamically (for example, from the locally installed Chrome) or documenting that these strings need periodic updates.
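One possible dynamic approach, sketched under assumptions: ask a Chrome or Chromium binary on PATH for its version and fall back to a pinned constant. The binary names, `detect_chrome_version`, and `parse_chrome_version` are hypothetical. This mainly suits Linux; on Windows `chrome --version` may not print anything to the console, and on macOS the binary usually lives inside the app bundle rather than on PATH. Recent Chrome releases report only the major version in the user agent (the `Chrome/137.0.0.0` form used in this PR), so the sketch zeroes the remaining components.

```python
import re
import shutil
import subprocess
from typing import Optional

FALLBACK_CHROME_VERSION = "137.0.0.0"  # pinned value used when detection fails
# Hypothetical executable names to look up on PATH.
_CHROME_BINARIES = ("google-chrome", "google-chrome-stable", "chromium", "chromium-browser")
_VERSION_RE = re.compile(r"(\d+)\.\d+\.\d+\.\d+")


def parse_chrome_version(output: str) -> Optional[str]:
    """Turn e.g. 'Google Chrome 137.0.7151.55' into the reduced form '137.0.0.0'."""
    match = _VERSION_RE.search(output)
    return f"{match.group(1)}.0.0.0" if match else None


def detect_chrome_version() -> str:
    """Ask the first Chrome/Chromium binary found on PATH for its version."""
    for name in _CHROME_BINARIES:
        path = shutil.which(name)
        if path is None:
            continue
        try:
            result = subprocess.run(
                [path, "--version"], capture_output=True, text=True, timeout=5
            )
        except (OSError, subprocess.SubprocessError):
            continue  # not runnable or timed out; try the next candidate
        version = parse_chrome_version(result.stdout)
        if version:
            return version
    return FALLBACK_CHROME_VERSION
```

Detection would run once at startup; keeping the fallback means a missing or unusual installation degrades to today's behavior instead of failing.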

2. Input Validation Missing (config/loaders.py:274-275)

if args.user_agent:
    config.chrome.user_agent = args.user_agent  # No validation

Issue: No validation on user-provided user agent strings.

Recommendation: Add basic validation to reject user agent strings that contain problematic characters (such as control characters or line breaks) or are excessively long.
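A minimal validation sketch, assuming a 512-character cap (an arbitrary choice for illustration, not a Chrome or HTTP limit) and rejecting empty values, non-ASCII text, and control characters such as CR/LF. `validate_user_agent` and `MAX_USER_AGENT_LENGTH` are hypothetical names.

```python
MAX_USER_AGENT_LENGTH = 512  # arbitrary cap chosen for this sketch


def validate_user_agent(value: str) -> str:
    """Return the stripped user agent, or raise ValueError if it looks malformed."""
    user_agent = value.strip()
    if not user_agent:
        raise ValueError("user agent must not be empty")
    if len(user_agent) > MAX_USER_AGENT_LENGTH:
        raise ValueError(f"user agent is longer than {MAX_USER_AGENT_LENGTH} characters")
    if not user_agent.isascii():
        raise ValueError("user agent must contain only ASCII characters")
    # Control characters such as CR/LF could corrupt the User-Agent header line.
    if any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in user_agent):
        raise ValueError("user agent contains control characters")
    return user_agent
```

Passing it as `type=validate_user_agent` to argparse's `add_argument` would turn a bad `--user-agent` value into a normal usage error, because argparse treats a `ValueError` raised by a `type` callable as an invalid argument. The `USER_AGENT` path would need an explicit call.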

3. Minor Performance Optimization (drivers/chrome.py:88)

user_agent = config.chrome.user_agent or get_default_user_agent()

Issue: get_default_user_agent() is called every time create_chrome_options() runs without a custom user agent, even though the platform doesn't change.

Recommendation: Consider caching the default user agent since platform detection is static during execution.
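Caching could be as simple as wrapping the function in `functools.lru_cache`. The sketch below uses a compact stand-in for `get_default_user_agent()` (its strings are illustrative assumptions) and a hypothetical `user_agent_argument` helper that builds Chrome's `--user-agent` switch from the same `custom or default` expression.

```python
import functools
import platform
from typing import Optional


@functools.lru_cache(maxsize=None)
def get_default_user_agent() -> str:
    """Detect the platform once; later calls return the cached string."""
    token = {
        "Windows": "Windows NT 10.0; Win64; x64",
        "Darwin": "Macintosh; Intel Mac OS X 10_15_7",
    }.get(platform.system(), "X11; Linux x86_64")  # illustrative tokens (assumed)
    return (
        f"Mozilla/5.0 ({token}) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/137.0.0.0 Safari/537.36"
    )


def user_agent_argument(custom_user_agent: Optional[str]) -> str:
    """Build Chrome's --user-agent switch, preferring a non-empty custom value."""
    user_agent = custom_user_agent or get_default_user_agent()
    return f"--user-agent={user_agent}"
```

The saving is small since platform detection is cheap, so the main benefit is making the "computed once per process" intent explicit.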

🔒 Security Assessment

  • Low Risk: The user agent string is passed to Chrome as a launch argument rather than through a shell, which limits the room for injection
  • Input Sanitization: Consider basic validation, though Chrome argument handling provides some protection
  • Anti-Detection: Platform-specific defaults keep the reported OS consistent with the actual host, which is good practice

🧪 Test Coverage Recommendations

The PR lacks test coverage for the new functionality. Consider adding tests for:

  1. Platform Detection Logic

    def test_get_default_user_agent_by_platform()
  2. Configuration Precedence

    def test_user_agent_precedence_cli_over_env()
    def test_user_agent_precedence_env_over_default()
  3. Edge Cases

    def test_empty_user_agent_string()
    def test_very_long_user_agent_string()
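A self-contained sketch of some of these tests using the standard-library `unittest` and `unittest.mock`. The two functions at the top are hypothetical stand-ins so the example runs on its own; in the project, the tests would import the real `get_default_user_agent()` and exercise the real loader instead.

```python
import platform
import unittest
from typing import Mapping, Optional
from unittest import mock


# Hypothetical stand-ins for the project's functions, for a self-contained example.
def get_default_user_agent() -> str:
    token = {
        "Windows": "Windows NT 10.0; Win64; x64",
        "Darwin": "Macintosh; Intel Mac OS X 10_15_7",
    }.get(platform.system(), "X11; Linux x86_64")
    return (
        f"Mozilla/5.0 ({token}) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/137.0.0.0 Safari/537.36"
    )


def resolve_user_agent(cli_value: Optional[str], environ: Mapping[str, str]) -> str:
    # CLI beats USER_AGENT, which beats the platform default; empty values fall through.
    return cli_value or environ.get("USER_AGENT") or get_default_user_agent()


class UserAgentTests(unittest.TestCase):
    def test_get_default_user_agent_by_platform(self):
        expected = {"Windows": "Windows NT 10.0", "Darwin": "Macintosh", "Linux": "X11; Linux"}
        for system, token in expected.items():
            # Patch platform.system so each OS branch runs on any host.
            with self.subTest(system=system), mock.patch("platform.system", return_value=system):
                self.assertIn(token, get_default_user_agent())

    def test_user_agent_precedence_cli_over_env(self):
        self.assertEqual(resolve_user_agent("cli", {"USER_AGENT": "env"}), "cli")

    def test_user_agent_precedence_env_over_default(self):
        self.assertEqual(resolve_user_agent(None, {"USER_AGENT": "env"}), "env")

    def test_empty_user_agent_string(self):
        # Empty values should not override the platform default.
        self.assertEqual(resolve_user_agent("", {"USER_AGENT": ""}), get_default_user_agent())
```

Saved as a `test_*.py` file, this runs under `python -m unittest` or pytest, which also collects `unittest.TestCase` classes.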

📝 Minor Suggestions

  1. Documentation: Consider adding an example user agent string in the CLI help text
  2. Constants: The Chrome version numbers could be extracted to constants for easier maintenance
  3. Logging: Consider adding debug logging when custom user agents are applied
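The constants and logging suggestions could look roughly like this. `CHROME_VERSION`, `UA_TEMPLATE`, `build_user_agent`, and `select_user_agent` are hypothetical names, and the platform token is passed in rather than detected to keep the example short.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)

# Single place to bump when Chrome releases a new major version (value assumed).
CHROME_VERSION = "137.0.0.0"
UA_TEMPLATE = (
    "Mozilla/5.0 ({platform}) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/{version} Safari/537.36"
)


def build_user_agent(platform_token: str, version: str = CHROME_VERSION) -> str:
    return UA_TEMPLATE.format(platform=platform_token, version=version)


def select_user_agent(custom: Optional[str], platform_token: str) -> str:
    """Return the custom user agent if set, else the default, logging which one applies."""
    if custom:
        logger.debug("Using custom user agent: %s", custom)
        return custom
    user_agent = build_user_agent(platform_token)
    logger.debug("Using platform default user agent: %s", user_agent)
    return user_agent
```

Logging at debug level keeps normal output quiet while still making it easy to confirm which user agent a session actually used.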

🎯 Overall Assessment

This is a solid implementation.

The identified issues are minor and don't block the functionality. The feature works as intended and provides good value for users dealing with LinkedIn's anti-scraping measures.

Recommendation: ✅ Approve with minor suggestions - The implementation is ready for merge, with the suggested improvements being candidates for future enhancements rather than blocking issues.


stickerdaniel merged commit 19dd0ae into main on Jul 14, 2025
3 checks passed
stickerdaniel deleted the feature/27-configurable-user-agent branch on July 14, 2025 at 23:33
Development

Successfully merging this pull request may close these issues.

[FEATURE] Option to set useragent