Releases: adeism/AMZ-CodeFusion
AMZ-CodeFusion v.1
✨ Key Features for Code Documentation, Datasets & RAG
- Human-in-the-Loop Documentation Focus: Prepare your codebase for enhanced documentation efforts by consolidating code into manageable, structured outputs, ready for human review and annotation.
- Source Code Dataset Generation: Create clean, combined source code datasets suited to training and evaluating RAG models designed for code understanding and generation.
- Source Code Archiving: Efficiently archive entire projects or specific code sections into single files for better organization, searchability, and long-term storage.
- RAG-Optimized Output: Generate output files specifically structured for optimal performance with Retrieval-Augmented Generation systems, enhancing code retrieval and context.
- Intuitive GUI: User-friendly graphical interface powered by `tkinter` for effortless configuration and operation.
- Flexible Input: Select any source directory to process your code files.
- Customizable Output: Choose the output file name and location for your code dataset or archive.
- File Extension Filtering: Include only specific code file types (e.g., `.py`, `.java`, `.js`, `.c`, `.cpp`, `.html`, `.css`).
- Folder Exclusion: Exclude development-related folders (like `.git`, `node_modules`, `venv`) to focus on source code.
- Regex Pattern Exclusion: Define regular expression patterns to exclude specific files or paths within your codebase.
- File Size Limit: Manage dataset size by setting a maximum file size to skip processing very large code files.
- Content Enhancements (Optional):
  - Include line numbers for referencing specific lines of code in documentation.
  - Add timestamps for tracking code versions or archival dates.
  - Display file sizes for dataset analysis.
  - Opt in to syntax highlighting in the output for improved readability in documentation and datasets.
- Code-Focused Exclusion Options:
  - Exclude images and non-code assets.
  - Exclude executable files and build artifacts.
  - Exclude temporary and backup files commonly found in development environments.
  - Exclude hidden files and folders.
  - NEW! Exclude comments (`/* ... */`) to create cleaner code datasets, focusing on the core logic.
- Detailed Logging: Comprehensive logging to track the code dataset generation and archiving process, including skipped files and folders.
- Summary Reports: Includes a summary header and a combination summary in the output file, detailing code files processed, dataset size, and skipped items.
- Zip Archive Creation: Optionally create a `.zip` archive of the output code dataset or archive for easy sharing and distribution.
- Multi-threaded Processing: Leverages multi-threading to accelerate the processing of large codebases.
- Open Output File: Automatically opens the generated code dataset or archive file after processing.
- Skipped Items Detail: Option to include detailed lists of skipped folders and files in the output summary for complete transparency in dataset creation.
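The release notes do not show how AMZ-CodeFusion applies its filters internally. Below is a minimal sketch of how the selection options could be combined into one directory walk: extension filtering, folder exclusion, regex exclusion, hidden-file skipping and the file size limit. It assumes a plain-Python backend. `collect_files`, its parameters and the reason strings are hypothetical names chosen for illustration, not the tool's API.

```python
import os
import re
from pathlib import Path

# Hypothetical defaults, taken from the examples in the feature list.
DEFAULT_EXTENSIONS = frozenset({".py", ".java", ".js", ".c", ".cpp", ".html", ".css"})
DEFAULT_EXCLUDED_DIRS = frozenset({".git", "node_modules", "venv"})


def collect_files(root, extensions=DEFAULT_EXTENSIONS, excluded_dirs=DEFAULT_EXCLUDED_DIRS,
                  exclude_patterns=(), max_bytes=None, skip_hidden=True):
    """Walk `root` and return (included, skipped).

    `included` lists relative POSIX paths; `skipped` holds (relative_path, reason)
    pairs that a summary report could print.
    """
    root = Path(root)
    compiled = [re.compile(p) for p in exclude_patterns]
    included, skipped = [], []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded and hidden folders in place so os.walk never descends into them.
        kept = []
        for d in dirnames:
            rel = Path(dirpath, d).relative_to(root).as_posix()
            if d in excluded_dirs or (skip_hidden and d.startswith(".")):
                skipped.append((rel + "/", "excluded folder"))
            else:
                kept.append(d)
        dirnames[:] = sorted(kept)  # sorted for a deterministic traversal order
        for name in sorted(filenames):
            full = Path(dirpath, name)
            rel = full.relative_to(root).as_posix()
            # Checks run from cheapest to most expensive; stat() is only reached last.
            if skip_hidden and name.startswith("."):
                reason = "hidden file"
            elif full.suffix.lower() not in extensions:
                reason = "extension not selected"
            elif any(p.search(rel) for p in compiled):
                reason = "matched exclude pattern"
            elif max_bytes is not None and full.stat().st_size > max_bytes:
                reason = "exceeds size limit"
            else:
                included.append(rel)
                continue
            skipped.append((rel, reason))
    return included, skipped
```

A call such as `collect_files("my_project", exclude_patterns=[r"_test\.py$"], max_bytes=1_000_000)` would return the files to combine together with a per-file skip reason. That skip list is the kind of data a "skipped items detail" section could be built from. Regex patterns are matched against the relative POSIX path, so one pattern can target either a file name or a folder prefix.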
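The combining stage can be sketched the same way under stated assumptions. The sketch below reads the selected files in a thread pool and can optionally add line numbers and file sizes, strip `/* ... */` block comments, stamp a timestamp into a summary header and zip the result. `render_file`, `combine`, the `=====` section header and the summary line format are all hypothetical. The comment stripper is a naive regular expression that does not understand string literals, so it is not necessarily how the tool's comment exclusion works.

```python
import re
import zipfile
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime
from pathlib import Path

# Naive C-style block-comment matcher: a "/*" inside a string literal
# would also be treated as the start of a comment.
BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)


def render_file(root, rel, line_numbers=False, strip_comments=False, show_size=False):
    """Return one file's section of the combined output."""
    path = Path(root, rel)
    text = path.read_text(encoding="utf-8", errors="replace")
    if strip_comments:
        text = BLOCK_COMMENT.sub("", text)
    header = f"===== {rel}"
    if show_size:
        header += f" ({path.stat().st_size} bytes)"
    lines = text.splitlines()
    if line_numbers:
        width = len(str(len(lines)))  # right-align numbers to the widest one
        lines = [f"{i:>{width}} | {line}" for i, line in enumerate(lines, start=1)]
    return "\n".join([header, *lines, ""])


def combine(root, files, output, make_zip=False, timestamp=False, workers=4, **options):
    """Write all files into `output`; return (output_path, zip_path_or_None)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order even though reads overlap.
        sections = list(pool.map(lambda rel: render_file(root, rel, **options), files))
    summary = f"# Combined {len(files)} files from {root}"
    if timestamp:
        summary += f" at {datetime.now().isoformat(timespec='seconds')}"
    out = Path(output)
    out.write_text("\n".join([summary, "", *sections]), encoding="utf-8")
    zip_path = None
    if make_zip:
        zip_path = out.with_name(out.name + ".zip")
        with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            zf.write(out, arcname=out.name)
    return out, zip_path
```

`ThreadPoolExecutor.map` keeps the output order deterministic while file reads overlap. CPython threads share the GIL, so the speed-up from this kind of multi-threading comes mainly from overlapping I/O rather than from parallel text processing.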