feat: Adding keboola storage api tools #329
…ocumentation. The tool follows the Storage API flows and downloads the table based on the underlying stack (S3, GCP, Azure).
Disclaimer: This review was made by a crew of AI Agents.
Code Review Comment for Keboola Storage API Tool
Overall Impression
The implementation of the Keboola Storage API Tool is a solid foundation, allowing for the extraction of data from the Keboola Storage API with support across multiple cloud providers. While it demonstrates a structured approach, there are opportunities for improvement in documentation, error handling, testing, and configuration management that can enhance overall usability and reliability.
Documentation (README.md)
Strengths:
Suggestions for Improvement:
Main Implementation (keboola_table_extract_tool.py)
A. Input Validation
Current Implementation:
class ExtractInput(BaseModel):
    table_id: str = Field(..., description="Full table ID like 'in.c-usage.usage_data'")
    api_token: str = Field(..., description="Keboola Storage API token")
    base_url: str = Field(..., description="Keboola base API URL")
Suggested Improvements:
class ExtractInput(BaseModel):
    # Raw strings keep the regex escapes intact; Pydantic v2 names this argument `pattern` (v1 used `regex`)
    table_id: str = Field(..., pattern=r"^(in|out)\.c-[a-zA-Z0-9_-]+\.[a-zA-Z0-9_-]+$")
    api_token: str = Field(..., min_length=32)
    base_url: str = Field(..., pattern=r"^https://connection\..*\.keboola\.com$")
B. Error Handling
Current Implementation Lacks Specificity:
class KeboolaAPIError(Exception):
    """Custom exception for Keboola API related errors"""
    pass
C. Resource Management
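import os
import tempfile
from contextlib import contextmanager

@contextmanager
def temporary_file():
    # Create the file, then close the handle so only the path is handed out
    tmp = tempfile.NamedTemporaryFile(delete=False)
    tmp.close()
    try:
        yield tmp.name
    finally:
        # Remove the file even if the caller raised
        if os.path.exists(tmp.name):
            os.remove(tmp.name)
D. Configuration Management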
class KeboolaConfig(BaseSettings):
    max_retries: int = 30
    # Additional configurations
E. Testing Improvements
@pytest.mark.parametrize("cloud_backend", ["s3", "gcp", "azure"])
def test_backend_detection(cloud_backend, tool):
    # Implementation of the test logic
    ...
Recommendations for Future Improvements
Conclusion:
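The table ID pattern suggested under Input Validation can be checked in isolation. The following is a minimal sketch using only the standard library; the helper name `is_valid_table_id` is hypothetical and not part of the PR, and the pattern is copied from the suggestion above.

```python
import re

# Pattern copied from the review's suggested `table_id` constraint.
TABLE_ID_PATTERN = re.compile(r"^(in|out)\.c-[a-zA-Z0-9_-]+\.[a-zA-Z0-9_-]+$")


def is_valid_table_id(table_id: str) -> bool:
    """Return True when table_id looks like <stage>.c-<bucket>.<table>.

    Hypothetical helper for illustration; the PR itself may validate differently.
    """
    return TABLE_ID_PATTERN.fullmatch(table_id) is not None


if __name__ == "__main__":
    for candidate in ["in.c-usage.usage_data", "usage_data", "sys.c-usage.table"]:
        print(candidate, is_valid_table_id(candidate))
```

Note that the pattern only accepts the `in` and `out` stages, so any other stage prefix is rejected by this check.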
Also, please add the tool to this init file:
crewai_tools/__init__.py
### Usage Example (Manual)

```python
from keboola_storage_api_tool.keboola_table_extract_tool import KeboolaTableExtractTool
```
Change this to import from crewai_tools, like so:
from crewai_tools import KeboolaTableExtractTool
```python
from crewai import Agent, Task, Crew
from keboola_storage_api_tool.keboola_table_extract_tool import KeboolaTableExtractTool
```
Same with this one; see the comment above.
…l test coverage
Summary of changes:
- Implemented KeboolaTableExtractTool for asynchronous table export via Keboola Storage API.
- Added support for auto-detection and download from AWS S3, GCP, and Azure based on manifest URLs.
- Split cloud-specific logic into modular utility files:
  - s3_slice_download.py
  - gcp_slice_download.py
  - azure_slice_download.py
- Introduced utils.py with reusable polling and metadata helpers.
- Added config.py using pydantic-settings for polling configuration.
- Defined and raised consistent custom exceptions via exceptions.py.
- Added full unit test suite:
  - Tool behavior tests (success, empty table, failure, backend detection, timeout)
  - Separate tests for each cloud downloader with mocked credentials and I/O
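Two ideas from this commit message, backend auto-detection from a URL and a reusable polling helper, can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: the function names `detect_backend` and `poll_until` are hypothetical, the hostname suffixes are the public endpoints of each provider, and the default of 30 retries mirrors the `max_retries` value suggested in the review.

```python
import time
from typing import Callable, Optional, TypeVar
from urllib.parse import urlparse

T = TypeVar("T")

# Public endpoint suffixes per provider; the PR may detect backends differently.
_BACKEND_SUFFIXES = {
    "amazonaws.com": "s3",
    "googleapis.com": "gcp",
    "blob.core.windows.net": "azure",
}


def detect_backend(url: str) -> str:
    """Guess the cloud backend from the hostname of an HTTPS file URL."""
    host = (urlparse(url).hostname or "").lower()
    for suffix, backend in _BACKEND_SUFFIXES.items():
        if host == suffix or host.endswith("." + suffix):
            return backend
    raise ValueError(f"Unrecognized storage host: {host!r}")


def poll_until(
    check: Callable[[], Optional[T]],
    max_retries: int = 30,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call check() until it returns a non-None value or retries run out."""
    for attempt in range(max_retries):
        result = check()
        if result is not None:
            return result
        if attempt < max_retries - 1:
            sleep(delay)  # wait before the next attempt
    raise TimeoutError(f"No result after {max_retries} attempts")
```

Passing `sleep` as a parameter is a design choice that lets a timeout test run without real waiting, which fits the mocked-I/O approach described in the test suite bullet above.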
…stency with YAML task action
- Updated `KeboolaTableExtractTool.name` to "download_keboola_table_tool"
Hello @tonykipkemboi 👋,
Thank you very much for your initial review. I spent quality time yesterday addressing both your comments and the automated feedback I had received earlier.
Summary of Changes
Additional Notes
I've also thoroughly tested the tool manually by integrating it into a test CrewAI project and validating it against live exports from AWS, GCP, and Azure. Everything seems to be working as expected.
I'd really appreciate it if you could take another look when you have a moment. Let me know if you'd like me to share anything specific - I’d be happy to invite you to a Keboola project and provide you additional credits on top of the free tier for hands-on testing if that helps.
Thanks again, and have a wonderful day!
Radek
Description
Add Keboola Storage API integration tool enabling CrewAI agents to access and extract structured data directly from Keboola projects.
This tool is the first in a planned series of Keboola-native tools intended to simplify AI-powered workflows and analytics inside and outside of Keboola. It's already in use for internal use-cases within Keboola and contributes toward broader enterprise adoption of CrewAI.
Tool Added
KeboolaTableExtractTool
- Downloads a Keboola table using asynchronous export (multi-cloud supported: AWS, Azure, GCP) and returns its content as a CSV string.
Features
Testing
For integration testing: Please email radek.tomasek@keboola.com to request access to a test Keboola project where you can validate the tool end-to-end.
Dependencies
Uses existing `requests`, `boto3`, `pandas`, and `google-auth` libraries.
Breaking Changes
None – this is a purely additive contribution.