Skip to content

All in one script for RNAseq ab initio assembly, mapping, differential expression, and annotation with reference

Notifications You must be signed in to change notification settings

noncodo/EZassembly

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

16 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

EZassembly

Integrated pipeline for all in one ab initio assembly, mapping, annotation, and differential expression of RNAseq data when a reference transcriptome is available (reference agnostic).

Requires SGE HPC environment and several common and easy to compile dependencies.

What it does

Splits up the Trinity pipeline into 3 stages, each using optimal server resources ensuring that the job finishes quickly, and that you are not monopolosing resources. Based on this analysis http://dx.doi.org/10.1145/2335755.2335842

How to use it:

(1) Copy this folder tree to somewhere in your $HOME or /share

(2) Edit the "custom.cfg" file with some specific parameters in the "Custom Variables" section (email, group quota, and additional params). The .cfg file contains information on how to setup your email account to receive SGE notifications when your job starts and finishes.

(3) Try a 'dry run', like so" ./dynomite.bash ./example/R1_list.txt ./example/R2_list.txt 1 ./trinity_out

(4) Generate the R1.txt and R2.txt files, which contain the full paths of all fastq files to be assembled (one per line).

for file in /path/to/dir_1/*_R1.fastq /path/to/dir_2/*_R1.fastq; do echo $file ; done > R1.txt
for file in /path/to/dir_1/*_R2.fastq /path/to/dir_2/*_R2.fastq; do echo $file ; done > R2.txt

(5) Run a full job. Example: ./dynomite.bash ./R1.txt ./R2.txt This will default to 8 CPU (ideal if not trimming) and output written to: /share/ClusterScratch/your_username/trinity

If your jobs did not complete, you can re-run this script without performing all of the analysis again as there are builtin recovery checkpoints. However, should you wonder why your jobs didn't complete, check your output files (TNTpre.oXXXXXX, TNTpre.eXXXXXX, TNT.o/TNT.e/tnt.o/tnt.e).

Some situations often do not produce informative error messages. Common hard-to-diagnose problems include: -Using compressed input files (decompress them first); -Going over quota for RAM (increase the number of CPUs to auto-adjust RAM). This often happens during the normalization stage. -Going over quota for # output files (try deleting the "read_components" folder and start over). This may be problematic with large and complex datasets. If it reoccurs, try using /share/Temp for your output or contact the sysadmin to request a larger file # quota on ClusterScratch.

Email m.smith@garvan.org.au for help.

N.B. Any custom troubleshooting may require compensation in the form of fermented malt with hops.

About

All in one script for RNAseq ab initio assembly, mapping, differential expression, and annotation with reference

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages