Export datasets to JSONL for fine-tuning #85

Merged: 2 commits, Feb 28, 2025

Conversation

jack-devhub
Contributor

Added the ability to export evaluation datasets in JSONL format from the Judgment platform.

Changes:

  • ✨ Implement export_jsonl() method in EvalDatasetClient with streaming support
  • 🔗 Add new API endpoint constant JUDGMENT_DATASETS_EXPORT_JSONL_API_URL
  • ✅ Create comprehensive e2e test for JSONL export functionality
  • 📦 Add JSON serialization/deserialization dependencies in test suite
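
As a rough illustration of what a streaming export call could look like, here is a minimal sketch using only the Python standard library. It is not the code from this PR: the URL value, the `{"alias": ...}` request body, the Bearer authorization scheme, and the `opener` and `chunk_size` parameters are all assumptions made for the example. Only the names `export_jsonl` and `JUDGMENT_DATASETS_EXPORT_JSONL_API_URL` come from the description above.

```python
import json
import urllib.request
from typing import Callable, Iterator

# Hypothetical placeholder value; the PR names the constant but not its URL.
JUDGMENT_DATASETS_EXPORT_JSONL_API_URL = "https://example.invalid/datasets/export_jsonl/"


def export_jsonl(
    alias: str,
    api_key: str,
    opener: Callable = urllib.request.urlopen,
    chunk_size: int = 8192,
) -> Iterator[bytes]:
    """Yield the exported JSONL body in chunks instead of loading it all at once."""
    # Assumed request shape: a JSON body carrying the dataset alias.
    payload = json.dumps({"alias": alias}).encode("utf-8")
    request = urllib.request.Request(
        JUDGMENT_DATASETS_EXPORT_JSONL_API_URL,
        data=payload,
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; the PR only says API key authentication is used.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    # With the default opener, urlopen raises urllib.error.HTTPError for 4xx/5xx
    # responses (for example a 404 for a missing dataset), so errors propagate.
    with opener(request) as response:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            yield chunk
```

A caller could write each yielded chunk straight to a file opened in binary mode, so that memory use stays bounded by `chunk_size` regardless of dataset size.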

Testing Performed:

  1. End-to-End Validation

    • Creates test dataset with examples and ground truths
    • Verifies successful API response (200 status)
    • Validates JSONL format integrity
    • Checks field presence based on record type (examples vs ground truths)
    • Ensures correct example/ground truth counts
  2. Error Handling

    • 404 handling for non-existent datasets
    • HTTP error propagation
    • Authentication validation
  3. Integration

    • Verified compatibility with existing dataset operations
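
The end-to-end checks listed above (format integrity, per-type field presence, record counts) could be expressed roughly as follows. This is a sketch under stated assumptions, not the test added in this PR: the `record_type` key, the type names, and the field names in the usage example are hypothetical, since the export schema is not shown in this conversation.

```python
import json
from collections import Counter
from typing import Dict, Set


def validate_jsonl(
    text: str,
    required_fields: Dict[str, Set[str]],
    type_key: str = "record_type",  # hypothetical discriminator field
) -> Counter:
    """Parse JSONL text, check required fields per record type, and count records."""
    counts: Counter = Counter()
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines such as a trailing newline
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number} is not valid JSON: {exc}") from exc
        if not isinstance(record, dict):
            raise ValueError(f"line {line_number} is not a JSON object")
        record_type = record.get(type_key)
        if record_type not in required_fields:
            raise ValueError(f"line {line_number} has unknown type {record_type!r}")
        missing = set(required_fields[record_type]) - set(record)
        if missing:
            raise ValueError(f"line {line_number} is missing fields {sorted(missing)}")
        counts[record_type] += 1
    return counts


# Usage with made-up field names: two examples and one ground truth.
sample = (
    '{"record_type": "example", "input": "q1", "actual_output": "a1"}\n'
    '{"record_type": "example", "input": "q2", "actual_output": "a2"}\n'
    '{"record_type": "ground_truth", "input": "q1", "expected_output": "a1"}\n'
)
schema = {
    "example": {"input", "actual_output"},
    "ground_truth": {"input", "expected_output"},
}
counts = validate_jsonl(sample, schema)
```

Comparing `counts` against the number of examples and ground truths that were uploaded gives the count check described in the list.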

Improvements:

  • Requires updating the JUDGMENT_DATASETS_EXPORT_JSONL_API_URL environment variable
  • Maintains existing security patterns with API key authentication
  • Uses chunked streaming response for memory efficiency with large datasets
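
One detail of consuming a chunked response is that chunk boundaries do not line up with JSONL line boundaries, so a consumer has to buffer partial lines. A minimal sketch of that idea, assuming the chunks arrive as `bytes` from some iterable (the function name `iter_jsonl_records` is hypothetical and not part of this PR):

```python
import json
from typing import Any, Iterable, Iterator


def iter_jsonl_records(chunks: Iterable[bytes]) -> Iterator[Any]:
    """Yield one parsed JSON value per line, regardless of how the bytes are chunked."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Everything before the last newline is complete; the remainder is carried over.
        *complete_lines, buffer = buffer.split(b"\n")
        for line in complete_lines:
            if line.strip():
                yield json.loads(line)
    # Handle a final line that is not terminated by a newline.
    if buffer.strip():
        yield json.loads(buffer)


# Usage: the second record is split across two chunks.
records = list(iter_jsonl_records([b'{"a": 1}\n{"b"', b': 2}\n', b'{"c": 3}']))
```

Because only one partial line is held in memory at a time, this keeps the memory benefit of streaming while still giving the caller whole records.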

alanzhang25 force-pushed the export-datasets-to-JSONL branch from aebcdf2 to 63df9f6 on February 28, 2025 18:31
alanzhang25 force-pushed the export-datasets-to-JSONL branch from 63df9f6 to 27f4fb5 on February 28, 2025 18:33
Collaborator

alanzhang25 left a comment

Run UTs

Collaborator

alanzhang25 left a comment

Run UT

alanzhang25 merged commit 5ba1301 into main on Feb 28, 2025
3 checks passed
alanzhang25 deleted the export-datasets-to-JSONL branch on March 25, 2025 19:28