GitHub - theislab/bartseq-pipeline: Demultiplexing pipeline for BART-Seq

BART-Seq pipeline

This is a pipeline for BART-Seq, implemented with Snakemake.

The primer design code and an older version of the pipeline live in theislab/bartSeq.

Entry point

The pipeline can be run via snakemake [-j 4] [-s …/bartseq/Snakefile] [-d …/mydata], where -j specifies the number of threads, and the other parameters default to ./Snakefile and ., respectively.

Within the data directory, the following structure is expected:

in/
- reads/
  - <libname>_R1_001.fastq.gz
  - <libname>_R2_001.fastq.gz
  - optionally additional libraries…
- amplicons.fa or amplicons/<libname>.fa for all libraries
- barcodes.fa or barcodes/<libname>.fa for all libraries
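Before starting a run, it can help to verify that the data directory matches the layout above. The following is a minimal sketch, not part of the pipeline: `check_layout` is a hypothetical helper, and it only checks for the files listed above.

```python
from pathlib import Path


def check_layout(data_dir):
    """Return a list of problems with the expected in/ layout (empty if none)."""
    in_dir = Path(data_dir) / "in"
    problems = []
    r1_files = sorted((in_dir / "reads").glob("*_R1_001.fastq.gz"))
    if not r1_files:
        problems.append("no *_R1_001.fastq.gz files in in/reads/")
    for r1 in r1_files:
        # Library name is the file name without the Read1 suffix.
        lib = r1.name[: -len("_R1_001.fastq.gz")]
        if not (r1.parent / f"{lib}_R2_001.fastq.gz").is_file():
            problems.append(f"missing Read2 file for library {lib}")
        # Either one shared FASTA file or one FASTA file per library is accepted.
        for kind in ("amplicons", "barcodes"):
            shared = in_dir / f"{kind}.fa"
            per_lib = in_dir / kind / f"{lib}.fa"
            if not (shared.is_file() or per_lib.is_file()):
                problems.append(f"missing {kind}.fa or {kind}/{lib}.fa")
    return problems
```

Calling `check_layout("mydata")` returns an empty list when every library has both read files plus amplicon and barcode FASTA files.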

config.yml – A file with the defaults

amplicon-min-length: null  # You can set an integer like 70
allow-mismatch:      True  # You can set this to False

Because of the way Snakemake works, you need to create this file. Leave it empty to use the defaults.
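To illustrate why an empty file is equivalent to the defaults, here is a small sketch of merging user settings over the default values. It is an assumption about the general idea, not the pipeline's actual code: `effective_config` is a hypothetical helper, and it takes the already parsed file content (YAML parsers typically return `None` for an empty file).

```python
# Defaults as documented above.
DEFAULTS = {"amplicon-min-length": None, "allow-mismatch": True}


def effective_config(user_config):
    """Merge the parsed config.yml over the defaults.

    user_config is a dict, or None when the file is empty.
    """
    return {**DEFAULTS, **(user_config or {})}
```

For example, `effective_config({"amplicon-min-length": 70})` keeps `allow-mismatch` at its default while overriding the minimum amplicon length.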

Output

The pipeline creates a process and an out directory.

The out directory contains plots and summary spreadsheets.

process/3-tagged contains the tagged FASTQ reads without alignment. To add a tag for the mapped amplicon, run e.g. python -m bartseq browse NGS16 Lib1_S1_L001 | gzip >Lib1.fq.gz

Command line interface

python -m bartseq tag [<options>] [in_1] [out_1]

in_1: Read1 file to read from. Supported compression: see --in-compression
out_1: Read1 file to write to. Supported compression: see --out-compression

`--in-2 IN_2`	Read2 file to read from. Supported compression: see --in-compression
`--out-2 OUT_2`	Read2 file to write to. Supported compression: see --out-compression
`--bc-file=BC_FILE, -b BC_FILE`
	Barcode file in the format `<ID> <Sequence>` (with header)
`--stats-file=STATS_FILE, -s STATS_FILE`
	File to write final stats to (in JSON format)
`--bc-table=BC_TABLE, -B BC_TABLE`
	File name for the HTML table of barcode mismatches
`--total=TOTAL, -t TOTAL`
	Number of FASTQ records in file. “0” means no progress bar
`--len-primer=LEN_PRIMER, -p LEN_PRIMER`
	Primer length for stats
`--len-linker=LEN_LINKER, -l LEN_LINKER`
	Linker length to cut out
`--in-compression=<gz\|xz\|bz2>, -i <gz\|xz\|bz2>`
	Specify compression if reading from stdin or a file with unusual suffix
`--out-compression=<gz\|xz\|bz2>, -o <gz\|xz\|bz2>`
	Specify compression if writing to stdout or a file with unusual suffix
`--dry-run, -n`	Only print what would be done and exit
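As an illustration of how the options above combine, the sketch below assembles a `tag` invocation for a paired-end library from Python. `tag_command` and all file names are hypothetical; only the flags listed above are used.

```python
import shlex


def tag_command(in_1, out_1, in_2, out_2, bc_file, stats_file=None, dry_run=False):
    """Build the argument list for `python -m bartseq tag` (options first, then positionals)."""
    cmd = ["python", "-m", "bartseq", "tag",
           "--in-2", in_2, "--out-2", out_2, "-b", bc_file]
    if stats_file is not None:
        cmd += ["-s", stats_file]
    if dry_run:
        cmd.append("-n")
    return cmd + [in_1, out_1]


if __name__ == "__main__":
    # File names are placeholders; pass the list to subprocess.run() to execute it.
    cmd = tag_command("Lib1_R1.fastq.gz", "tagged_R1.fastq.gz",
                      "Lib1_R2.fastq.gz", "tagged_R2.fastq.gz",
                      "barcodes.txt", dry_run=True)
    print(shlex.join(cmd))
```

Using `-n` first shows what would be done without writing any output files.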

python -m bartseq count [<options>] data_dir [library]

data_dir: Data directory to read from. Needs to have the directories “./process/{3-tagged,4-mapped}” filled.
library: Library name. E.g. “Lib1_S1_L001” for input files named “Lib1_S1_L001_R{12}_001.fastq.gz”. Omittable if only one library exists.

`--no-mismatch`	Ignore barcodes with mismatches while counting.
`--both`	Print the count results for both to stdout. Default: Write to “./process/5-counts” instead
`--one`	Print the count results for one to stdout. Default: Write to “./process/5-counts” instead
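The library name is the part of the read file name before `_R1_001.fastq.gz` or `_R2_001.fastq.gz`. A minimal sketch for listing the libraries in a set of file names follows; `libraries` is a hypothetical helper, not a function of the bartseq package.

```python
import re

# Matches e.g. "Lib1_S1_L001_R1_001.fastq.gz" and captures "Lib1_S1_L001".
READ_FILE = re.compile(r"^(?P<library>.+)_R(?P<read>[12])_001\.fastq\.gz$")


def libraries(file_names):
    """Map each library name to the set of read numbers (1 and/or 2) found for it."""
    found = {}
    for name in file_names:
        match = READ_FILE.match(name)
        if match:
            found.setdefault(match["library"], set()).add(int(match["read"]))
    return found
```

If the result contains exactly one key, the `library` argument can be omitted as described above.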

python -m bartseq browse [<options>] data_dir [library] [out]

data_dir: Data directory to read from. Needs to have the directories “./process/{3-tagged,4-mapped}” filled.
library: Library name. E.g. “Lib1_S1_L001” for input files named “Lib1_S1_L001_R{12}_001.fastq.gz”. Omittable if only one library exists.
out: FASTA file to write to. Supported compression: see --out-compression

`--out-compression <gz\|xz\|bz2>, -o <gz\|xz\|bz2>`
	Specify compression if writing to stdout or a file with unusual suffix

Data and statistics

Read structure

trash
3nt protection CCA
8nt barcode (known from set)
Linker (one for left bcs, one for right bcs)
Primer + Rest of Amplicon
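The read structure above suggests how a barcode could be located: skip a variable amount of leading junk and look for an 8 nt window that matches a known barcode. The sketch below illustrates that idea with a plain Hamming-distance scan. It is not the pipeline's actual algorithm; `find_barcode`, `max_start`, and `max_mismatches` are hypothetical names, and the barcode sequences in the usage note are made up.

```python
from typing import Dict, NamedTuple, Optional

BC_LEN = 8  # barcode length from the read structure above


class Hit(NamedTuple):
    barcode_id: str
    start: int       # index of the first barcode nucleotide in the read
    mismatches: int


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def find_barcode(read: str, barcodes: Dict[str, str],
                 max_start: int = 20, max_mismatches: int = 1) -> Optional[Hit]:
    """Return the best barcode hit within the first max_start positions, or None.

    Fewer mismatches win; among equally good hits the earliest start is kept.
    """
    best = None
    for start in range(min(max_start, len(read) - BC_LEN) + 1):
        window = read[start:start + BC_LEN]
        for bc_id, seq in barcodes.items():
            dist = hamming(window, seq)
            if dist <= max_mismatches and (best is None or dist < best.mismatches):
                best = Hit(bc_id, start, dist)
    return best
```

With `barcodes = {"bc1": "ACGTACGT"}`, a read consisting of two junk nucleotides, `CCA`, the barcode, and further sequence yields a hit for `bc1` at start position 5 with no mismatches.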

Interesting Statistics

Make statistics: How many reads have a barcode, ...

from reads tagged with info:

Barcode available?
Trash before bc?
Where is the bc?
Concatemer? (bc[-bc-bc…]-linker-primer)
Which nucleotides are where the bc should be?
Amplicon maps to which gene?
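The questions above can be answered by tallying per-read annotations. The following sketch assumes a hypothetical `TaggedRead` record; its field names are illustrative and do not reflect the tags the pipeline actually writes.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class TaggedRead:
    barcode: Optional[str]           # barcode ID, or None if no barcode was found
    junk_len: int = 0                # nucleotides before the barcode
    n_barcodes: int = 1              # more than 1 indicates a concatemer
    amplicon: Optional[str] = None   # gene the amplicon maps to, if any


def summarize(reads: Iterable[TaggedRead]) -> Counter:
    """Count how many reads fall into each category of interest."""
    stats = Counter()
    for read in reads:
        stats["total"] += 1
        if read.barcode is None:
            stats["no_barcode"] += 1
            continue
        stats["with_barcode"] += 1
        if read.junk_len > 0:
            stats["junk_before_bc"] += 1
        if read.n_barcodes > 1:
            stats["concatemer"] += 1
        if read.amplicon is not None:
            stats["mapped"] += 1
    return stats
```

Dividing `stats["with_barcode"]` by `stats["total"]` then gives the fraction of reads that have a barcode.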

Possible Problems

No Amplicons: Only bc and linker
Amplicon quality bad at the end
Trash at the beginning
Barcodes can have mismatches
