Skip to content

Conversation

carter293
Copy link

What this PR does

  • Updates firecrawl class to include api_url in instantiation, mirroring underlying FirecrawlApp class behaviour.

Why include this?

  • To enable crewAI firecrawl tools to easily use firecrawl's self and local hosting options.
  • Currently can set the env var FIRECRAWL_API_URL and the FirecrawlApp class will use it, however this is not surfaced in the crewAI docs or the tool's class initialiser.

Why not include this in the existing 'config' variable within crewAI's existing Firecrawl tools.

  • The config variable in is associated with the relevant Firecrawl methods, e.g. scrape_url, crawl_url, etc, not the instantiation of the FirecrawlApp class. However, the api_key and the api_url are.

Why has the error handling changed?

  • The associated firecrawl tools had inconsistent error handling around the missing package firecrawl-py. This change just standardises it.
  • Also removes the try/except around the firecrawl-py installation attempt. This is to not obfuscate the actual error message with "Failed to install firecrawl-py package".

@carter293 carter293 changed the title Introduce Firecrawl api-url variable in initialisation of associated tools Chore: Introduce Firecrawl api-url variable in initialisation of associated tools Jun 4, 2025
@joaomdmoura
Copy link
Collaborator

Disclaimer: This review was made by a crew of AI Agents.

Code Review Comment: Firecrawl Tools Standardization

Summary of Changes

This pull request introduces significant improvements to the Firecrawl tools, primarily by standardizing the initialization of the CrawlWebsite, ScrapeWebsite, and Search tools through the inclusion of the new api_url parameter. This modification aims to enhance the flexibility and maintainability of how parameters are handled.

Positive Aspects

  1. Consistent Parameter Management: The addition of the api_url parameter across all tools ensures a uniform approach, reducing the cognitive load on users.
  2. Improved Error Handling: Enhancements made to error handling during package installations significantly improve user experience.
  3. Documentation Updates: The updates to README files are commendable, providing clear instructions on new parameters and ensuring users are well-informed.

Areas for Improvement

1. Code Duplication

The initialization logic is repeated across multiple tool classes. To avoid redundancy, consider creating a base class, BaseFirecrawlTool, that standardizes this logic. Here's a proposed structure:

class BaseFirecrawlTool(BaseTool):
    api_key: Optional[str] = None
    api_url: Optional[str] = None
    _firecrawl: Optional["FirecrawlApp"] = PrivateAttr(None)

    def __init__(self, api_key: Optional[str] = None, api_url: Optional[str] = None, **kwargs):
        super().__init__(**kwargs)
        self.api_key = api_key
        self.api_url = api_url
        self._initialize_firecrawl()

2. Enhanced Error Handling

Incorporate more robust error handling for potential exceptions during API initialization:

def _initialize_firecrawl(self) -> None:
    try:
        from firecrawl import FirecrawlApp
        self._firecrawl = FirecrawlApp(api_key=self.api_key, api_url=self.api_url)
    except ImportError:
        self._handle_missing_package()
    except Exception as e:
        raise RuntimeError(f"Failed to initialize Firecrawl: {str(e)}")

3. Configuration Validation

Add validation to ensure essential parameters are provided:

def _validate_config(self) -> None:
    required_fields = {'api_key'}  # Specify required parameters
    missing_fields = required_fields - set(self.config.keys())
    if missing_fields:
        raise ValueError(f"Missing required configuration fields: {missing_fields}")

4. Type Hints Enhancement

Improve the type hints for better clarity and safety:

from typing import TypedDict

class FirecrawlConfig(TypedDict, total=False):
    max_depth: int
    max_pages: int
    formats: list[str]
    filter_urls: list[str]

5. Environment Variable Handling

Add handling for environment variables to enhance parameter sourcing:

def __init__(self, api_key: Optional[str] = None, api_url: Optional[str] = None, **kwargs):
    super().__init__(**kwargs)
    self.api_key = api_key or os.getenv('FIRECRAWL_API_KEY')
    self.api_url = api_url or os.getenv('FIRECRAWL_API_URL', 'https://api.firecrawl.dev')
    if not self.api_key:
        raise ValueError("API key must be provided either through parameter or FIRECRAWL_API_KEY environment variable")

Historical Context and Related PRs

  • The need for standardization has been noted in previous PR discussions where the initialization across tools was deemed inconsistent. This PR makes significant strides towards addressing those patterns, reflecting lessons learned from earlier implementations regarding configurability and maintainability.

Testing Recommendations

  • Ensure comprehensive unit tests are added for:
    • The new api_url parameter functionality.
    • Fallback mechanisms for environment variables.
    • Robust handling of error scenarios.

Conclusion

The changes introduced in this PR substantially improve the maintainability and usability of the Firecrawl tools. Focusing on the above recommendations will further solidify the improvements made and ensure that the codebase remains robust and user-friendly.

By providing a cleaner structure with a base class and enhancing error handling, the team will position itself for easier future updates and better overall user satisfaction.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants