GFFtk is a toolkit for working with genome annotation files in GFF3, GTF, and TBL formats. It converts, filters, and manipulates genome annotations.
- Format Conversion: Convert between GFF3, GTF, TBL, and GenBank formats
- Combined GFF3+FASTA: Support for combined files containing both annotations and sequences
- Sequence Extraction: Extract protein and transcript sequences from annotations
- Advanced Filtering: Filter annotations using flexible regex patterns
- Consensus Models: Generate consensus gene models from multiple sources
- Non-Standard Features: Support for intron, noncoding_exon, five_prime_UTR_intron, and pseudogenic_exon features
- File Manipulation: Sort, sanitize, and rename features in annotation files
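The GFF3 specification allows sequences to be appended to an annotation file after a ##FASTA directive. Assuming GFFtk's combined files follow that layout (the feature list above does not say so explicitly), splitting such a file could be sketched roughly as follows. The function name split_combined is hypothetical and is not part of GFFtk.

```python
from typing import Dict, List, Tuple


def split_combined(text: str) -> Tuple[List[str], Dict[str, str]]:
    """Split combined GFF3+FASTA text into annotation lines and sequences.

    Assumption: sequences follow a '##FASTA' directive, as in the GFF3 spec.
    """
    annotations: List[str] = []
    chunks: Dict[str, List[str]] = {}
    in_fasta = False
    current = None
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line:
            continue  # skip blank lines
        if line.startswith("##FASTA"):
            in_fasta = True  # everything after this point is FASTA
            continue
        if not in_fasta:
            annotations.append(line)
        elif line.startswith(">"):
            # Use the first whitespace-delimited token as the sequence ID.
            header = line[1:].split()
            current = header[0] if header else ""
            chunks[current] = []
        elif current is not None:
            chunks[current].append(line)
    sequences = {name: "".join(parts) for name, parts in chunks.items()}
    return annotations, sequences
```

Joining the sequence lines only once, at the end, avoids repeated string concatenation on large contigs.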
To install a release version, use pip:
python -m pip install gfftk
To install the latest code from the master branch, run:
python -m pip install git+https://github.com/nextgenusfs/gfftk.git
# Convert GFF3 to GTF
gfftk convert -i input.gff3 -f genome.fasta -o output.gtf
# Extract protein sequences
gfftk convert -i input.gff3 -f genome.fasta -o proteins.faa --output-format proteins
# Create a combined file from separate GFF3 and FASTA files
gfftk convert -i input.gff3 -f genome.fasta -o combined.gff --output-format combined
# Read a combined file (no separate FASTA file needed)
gfftk convert -i combined.gff -o output.gff3 --output-format gff3
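To script the conversions above from Python, one option is to build the argument list and pass it to subprocess. This is only a sketch: the helper names build_convert_command and run_convert are hypothetical, and it uses only the command-line flags that appear in this README (-i, -f, -o, --output-format, --grep, --grepv). It assumes the gfftk executable is on PATH.

```python
import shlex
import subprocess
from typing import List, Optional, Sequence


def build_convert_command(
    input_path: str,
    output_path: str,
    fasta: Optional[str] = None,
    output_format: Optional[str] = None,
    grep: Sequence[str] = (),
    grepv: Sequence[str] = (),
) -> List[str]:
    """Assemble a 'gfftk convert' argument list from the flags shown in this README."""
    cmd = ["gfftk", "convert", "-i", input_path]
    if fasta:
        cmd += ["-f", fasta]  # genome FASTA; omitted for combined input files
    cmd += ["-o", output_path]
    if output_format:
        cmd += ["--output-format", output_format]
    for spec in grep:
        cmd += ["--grep", spec]
    for spec in grepv:
        cmd += ["--grepv", spec]
    return cmd


def run_convert(cmd: List[str]) -> None:
    """Run the command and raise CalledProcessError on a non-zero exit status."""
    subprocess.run(cmd, check=True)


# Print the command instead of running it, so the sketch works without gfftk installed.
example = build_convert_command("input.gff3", "output.gtf", fasta="genome.fasta")
print(shlex.join(example))
```

Passing a list to subprocess.run rather than a single shell string avoids quoting problems with file names that contain spaces.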
# Keep only kinase genes
gfftk convert -i input.gff3 -f genome.fasta -o kinases.gff3 --grep product:kinase
# Remove augustus predictions
gfftk convert -i input.gff3 -f genome.fasta -o filtered.gff3 --grepv source:augustus
# Case-insensitive filtering with regex
gfftk convert -i input.gff3 -f genome.fasta -o results.gff3 --grep product:KINASE:i
# Combined filtering
gfftk convert -i input.gff3 -f genome.fasta -o filtered.gff3 \
--grep product:kinase --grepv source:augustus
- key:pattern - Basic string matching
- key:pattern:i - Case-insensitive matching
- key:regex - Regular expression patterns
- Multiple --grep or --grepv flags for complex filtering
Common filter keys: product, source, name, note, contig, strand, type, db_xref, go_terms
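The filter syntax above can be illustrated with a small standalone sketch. This is not GFFtk's implementation: the function names parse_filter and keep are hypothetical, records are modelled as plain dictionaries, and the sketch assumes a record is kept when it matches every --grep pattern and no --grepv pattern (GFFtk may combine multiple flags differently).

```python
import re
from typing import Callable, Dict, Sequence

Record = Dict[str, str]


def parse_filter(spec: str) -> Callable[[Record], bool]:
    """Turn 'key:pattern' or 'key:pattern:i' into a predicate over a record."""
    key, _, pattern = spec.partition(":")
    if not key or not pattern:
        raise ValueError(f"expected key:pattern, got {spec!r}")
    flags = 0
    # A trailing ':i' requests case-insensitive matching.
    if pattern.endswith(":i"):
        pattern, flags = pattern[:-2], re.IGNORECASE
    regex = re.compile(pattern, flags)
    # re.search treats a plain string as a substring match.
    return lambda record: regex.search(record.get(key, "")) is not None


def keep(record: Record, grep: Sequence[str] = (), grepv: Sequence[str] = ()) -> bool:
    """Assumed semantics: match all grep filters and none of the grepv filters."""
    if not all(parse_filter(spec)(record) for spec in grep):
        return False
    return not any(parse_filter(spec)(record) for spec in grepv)


records = [
    {"product": "Protein kinase", "source": "augustus"},
    {"product": "protein kinase", "source": "genemark"},
    {"product": "hypothetical protein", "source": "genemark"},
]
kept = [r for r in records if keep(r, grep=["product:kinase"], grepv=["source:augustus"])]
```

With these sample records, only the second one survives: the first is excluded by the source:augustus filter and the third has no kinase in its product.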
For more examples and detailed documentation, see the tutorial.
This project uses pre-commit to ensure code quality and consistency. The pre-commit hooks run Black (code formatter), isort (import sorter), and flake8 (linter).
To set up pre-commit:
- Install pre-commit:
pip install pre-commit
- Install the git hooks:
pre-commit install
- (Optional) Run against all files:
pre-commit run --all-files
After installation, the pre-commit hooks will run automatically on each commit to ensure your code follows the project's style guidelines.