nextgenusfs/gfftk

GFFtk: genome annotation tool kit

GFFtk is a toolkit for working with genome annotation files in GFF3, GTF, and TBL formats. It converts between formats, filters annotations, and manipulates the features they contain.

Features

  • Format Conversion: Convert between GFF3, GTF, TBL, and GenBank formats
  • Combined GFF3+FASTA: Support for combined files containing both annotations and sequences
  • Sequence Extraction: Extract protein and transcript sequences from annotations
  • Advanced Filtering: Filter annotations using flexible regex patterns
  • Consensus Models: Generate consensus gene models from multiple sources
  • Non-Standard Features: Support for intron, noncoding_exon, five_prime_UTR_intron, and pseudogenic_exon features
  • File Manipulation: Sort, sanitize, and rename features in annotation files

Installation

To install a release version, use pip:

python -m pip install gfftk

To install the latest code from the master branch, run:

python -m pip install git+https://github.com/nextgenusfs/gfftk.git
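To confirm which version ended up in the active environment, the standard-library package metadata can be queried. This is a minimal sketch: the helper name `installed_version` is illustrative, and it assumes the distribution is registered under the name `gfftk`, as in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "gfftk") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"gfftk {found}" if found else "gfftk is not installed")
```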

Quick Start

Basic Format Conversion

# Convert GFF3 to GTF
gfftk convert -i input.gff3 -f genome.fasta -o output.gtf

# Extract protein sequences
gfftk convert -i input.gff3 -f genome.fasta -o proteins.faa --output-format proteins

Combined GFF3+FASTA Format

# Create a combined file from separate GFF3 and FASTA files
gfftk convert -i input.gff3 -f genome.fasta -o combined.gff --output-format combined

# Read a combined file (no separate FASTA file needed)
gfftk convert -i combined.gff -o output.gff3 --output-format gff3
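In the GFF3 specification, a `##FASTA` directive marks the point where annotation lines end and FASTA records begin. The sketch below splits such text into its two parts in plain Python. It assumes the combined files written by GFFtk follow that convention, and `split_combined` is a hypothetical helper, not part of the GFFtk API.

```python
from typing import List, Tuple


def split_combined(text: str) -> Tuple[List[str], List[str]]:
    """Split combined GFF3+FASTA text into (annotation lines, FASTA lines)."""
    gff_lines: List[str] = []
    fasta_lines: List[str] = []
    in_fasta = False
    for line in text.splitlines():
        # The directive itself belongs to neither part, so it is skipped.
        if line.strip() == "##FASTA":
            in_fasta = True
            continue
        (fasta_lines if in_fasta else gff_lines).append(line)
    return gff_lines, fasta_lines


if __name__ == "__main__":
    example = "\n".join([
        "##gff-version 3",
        "chr1\tsrc\tgene\t1\t9\t.\t+\t.\tID=gene1",
        "##FASTA",
        ">chr1",
        "ATGAAATAG",
    ])
    gff, fasta = split_combined(example)
    print(len(gff), "annotation lines;", len(fasta), "FASTA lines")
```

A file with no `##FASTA` directive comes back with an empty second list, which is a simple way to tell whether a separate `-f` FASTA file is still needed.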

Advanced Filtering

# Keep only kinase genes
gfftk convert -i input.gff3 -f genome.fasta -o kinases.gff3 --grep product:kinase

# Remove augustus predictions
gfftk convert -i input.gff3 -f genome.fasta -o filtered.gff3 --grepv source:augustus

# Case-insensitive filtering (the :i suffix)
gfftk convert -i input.gff3 -f genome.fasta -o results.gff3 --grep product:KINASE:i

# Combined filtering
gfftk convert -i input.gff3 -f genome.fasta -o filtered.gff3 \
    --grep product:kinase --grepv source:augustus
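When the same conversion has to run over many files, it can help to assemble the command in a script. The sketch below builds an argument list using only the flags shown above (`-i`, `-f`, `-o`, `--grep`, `--grepv`). `build_convert_command` is a hypothetical helper, not something GFFtk provides.

```python
from typing import List, Sequence


def build_convert_command(
    gff: str,
    fasta: str,
    out: str,
    keep: Sequence[str] = (),
    drop: Sequence[str] = (),
) -> List[str]:
    """Build a `gfftk convert` argument list with optional filters."""
    cmd = ["gfftk", "convert", "-i", gff, "-f", fasta, "-o", out]
    for spec in keep:
        cmd += ["--grep", spec]   # one --grep flag per pattern to keep
    for spec in drop:
        cmd += ["--grepv", spec]  # one --grepv flag per pattern to remove
    return cmd


if __name__ == "__main__":
    cmd = build_convert_command(
        "input.gff3",
        "genome.fasta",
        "filtered.gff3",
        keep=["product:kinase"],
        drop=["source:augustus"],
    )
    print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, assuming `gfftk` is on the `PATH`. Passing a list rather than a single string avoids shell quoting problems with regex patterns.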

Filter Pattern Syntax

  • key:pattern - Basic string matching
  • key:pattern:i - Case-insensitive matching
  • key:regex - Regular expression patterns
  • Multiple --grep or --grepv flags for complex filtering

Common filter keys: product, source, name, note, contig, strand, type, db_xref, go_terms
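To make the pattern syntax concrete, here is a small Python sketch of how a `key:pattern[:i]` specification could be interpreted. It is an illustration under stated assumptions, not GFFtk's own parser: the helper names `parse_filter` and `matches` are hypothetical, the key is taken to be everything before the first colon, a trailing `:i` is read as the case-insensitive flag, and the pattern is applied with `re.search`.

```python
import re
from typing import Dict, Tuple


def parse_filter(spec: str) -> Tuple[str, "re.Pattern"]:
    """Parse 'key:pattern' or 'key:pattern:i' into (key, compiled regex)."""
    key, sep, rest = spec.partition(":")
    if not sep or not key or not rest:
        raise ValueError(f"expected key:pattern[:i], got {spec!r}")
    flags = 0
    # Assumption: a trailing ':i' is the case-insensitive marker.
    if rest.endswith(":i") and len(rest) > 2:
        rest = rest[:-2]
        flags = re.IGNORECASE
    return key, re.compile(rest, flags)


def matches(spec: str, record: Dict[str, str]) -> bool:
    """Return True if the record's value for the key matches the pattern."""
    key, pattern = parse_filter(spec)
    return pattern.search(record.get(key, "")) is not None


if __name__ == "__main__":
    record = {"product": "Serine/threonine protein kinase", "source": "augustus"}
    for spec in ("product:kinase", "product:KINASE", "product:KINASE:i",
                 "source:^(augustus|genemark)$"):
        print(spec, matches(spec, record))
```

In this sketch a plain string such as `kinase` is simply a regex with no special characters, so basic matching and regex matching share one code path.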

For more examples and detailed documentation, see the tutorial.

Development

Code Formatting

This project uses pre-commit to ensure code quality and consistency. The pre-commit hooks run Black (code formatter), isort (import sorter), and flake8 (linter).

To set up pre-commit:

  1. Install pre-commit:
pip install pre-commit
  2. Install the git hooks:
pre-commit install
  3. (Optional) Run against all files:
pre-commit run --all-files

After installation, the pre-commit hooks will run automatically on each commit to ensure your code follows the project's style guidelines.
