
Commit 468b8e0

Update v0.2 workflow (#14)
* Add updated workflow
* Update workflow diagram
* Use updated bracken adjustment tool
* Add krona chart
* Fix unit tests
* whitespace
* Capture top 5 species, set default confidence to 0.1
* Add proportion_unclassified to table of metadata fields
* whitespace
1 parent 2138c00 commit 468b8e0

10 files changed: 623 additions, 261 deletions


README.md

Lines changed: 28 additions & 24 deletions
@@ -32,12 +32,15 @@ This can be used to estimate the relative abundance of sequence reads originatin
3232
In order to use this pipeline, you will also have to install the [kraken2][] and [bracken][] Galaxy tools and their data
3333
managers within your Galaxy instance. These can be found at:
3434

35-
| Name | Version | Revision |
36-
|------------------------------------|-----------------------|--------------------------------------------------------------------------------------------|
37-
| kraken2 | `2.1.1+galaxy1` | [`e674066930b2`](https://toolshed.g2.bx.psu.edu/view/iuc/kraken2/e674066930b2) |
38-
| bracken | `2.6.1+galaxy0` | [`b08ac10aed96`](https://toolshed.g2.bx.psu.edu/view/iuc/bracken/b08ac10aed96) |
39-
| data_manager_build_kraken2_database| `2.1.1` | [`2f27f3b86827`](https://toolshed.g2.bx.psu.edu/view/iuc/data_manager_build_kraken2_database/2f27f3b86827) |
40-
| data_manager_build_bracken_database| `2.5.1+galaxy1` | [`3c7d2c84cb09`](https://toolshed.g2.bx.psu.edu/view/iuc/data_manager_build_bracken_database/3c7d2c84cb09) |
35+
| Name | Version | Owner | Metadata Revision | Galaxy Toolshed Link |
36+
|---------------------------------------|------------------|--------------------------------|-------------------|----------------------------------------|
37+
| fastp | `0.23.2+galaxy0` | `iuc` | 10 (2022-02-03) | [fastp-10:65b93b623c77](https://toolshed.g2.bx.psu.edu/view/iuc/fastp/65b93b623c77) |
38+
| fastp_json_to_tabular | `0.1.0` | `public-health-bioinformatics` | 0 (2022-03-10) | [fastp_json_to_tabular-0:091a2fb2e7ad](https://toolshed.g2.bx.psu.edu/view/public-health-bioinformatics/fastp_json_to_tabular/091a2fb2e7ad) |
39+
| kraken2 | `2.1.1+galaxy1` | `iuc` | 4 (2021-02-17) | [kraken2-4:e674066930b2](https://toolshed.g2.bx.psu.edu/view/iuc/kraken2/e674066930b2) |
40+
| bracken | `2.6.1+galaxy0` | `iuc` | 4 (2021-06-07) | [bracken-4:b08ac10aed96](https://toolshed.g2.bx.psu.edu/view/iuc/bracken/b08ac10aed96) |
41+
| adjust_bracken_for_unclassified_reads | `0.1.0` | `public-health-bioinformatics` | 1 (2021-03-10) | [adjust_bracken_for_unclassified_reads-1:3cde438eb222](https://toolshed.g2.bx.psu.edu/view/iuc/bracken/b08ac10aed96) |
42+
| data_manager_build_kraken2_database | `2.1.2+galaxy0` | `iuc` | 6 (2022-06-24) | [`data_manager_build_kraken2_database-6:9002633b4737`](https://toolshed.g2.bx.psu.edu/view/iuc/data_manager_build_kraken2_database/9002633b4737) |
43+
| data_manager_build_bracken_database | `2.5.1+galaxy1` | `iuc` | 3 (2021-11-08) | [`data_manager_build_bracken_database-3:3c7d2c84cb09`](https://toolshed.g2.bx.psu.edu/view/iuc/data_manager_build_bracken_database/3c7d2c84cb09) |
4144

4245
## Preparing Databases
4346

@@ -132,24 +135,25 @@ classification report, and a `bracken` estimate of the relative abundance of rea
132135
And, you should be able to save and view these results in the IRIDA metadata table. The following fields are written to
133136
the IRIDA 'Line List':
134137

135-
| Field Name | Description |
136-
|--------------------------------------|-------------------------------------------------------------------------------------|
137-
| `species-abundance/taxonomy_level` | The taxonomic level at which reads were aggregated ('S' for species) |
138-
| `species-abundance/taxon_name` | The scientific name of the most abundant species in the sample |
139-
| `species-abundance/taxonomy_id` | The NCBI taxonomy ID for the most abundant species in the sample |
140-
| `species-abundance/proportion` | The proportion of reads in this sample assigned to the most abundant species |
141-
| `species-abundance/taxon_name_2` | The scientific name of the second-most abundant species in the sample |
142-
| `species-abundance/taxonomy_id_2` | The NCBI taxonomy ID for the second-most abundant species in the sample |
143-
| `species-abundance/proportion_2` | The proportion of reads in this sample assigned to the second-most abundant species |
144-
| `species-abundance/taxon_name_3` | The scientific name of the third-most abundant species in the sample |
145-
| `species-abundance/taxonomy_id_3` | The NCBI taxonomy ID for the third-most abundant species in the sample |
146-
| `species-abundance/proportion_3` | The proportion of reads in this sample assigned to the third-most abundant species |
147-
| `species-abundance/taxon_name_4` | The scientific name of the fourth-most abundant species in the sample |
148-
| `species-abundance/taxonomy_id_4` | The NCBI taxonomy ID for the fourth-most abundant species in the sample |
149-
| `species-abundance/proportion_4` | The proportion of reads in this sample assigned to the fourth-most abundant species |
150-
| `species-abundance/taxon_name_5` | The scientific name of the fifth-most abundant species in the sample |
151-
| `species-abundance/taxonomy_id_5` | The NCBI taxonomy ID for the fifth-most abundant species in the sample |
152-
| `species-abundance/proportion_5` | The proportion of reads in this sample assigned to the fifth-most abundant species |
138+
| Field Name | Description |
139+
|---------------------------------------------|-------------------------------------------------------------------------------------|
140+
| `species-abundance/taxonomy_level` | The taxonomic level at which reads were aggregated ('S' for species) |
141+
| `species-abundance/taxon_name` | The scientific name of the most abundant species in the sample |
142+
| `species-abundance/taxonomy_id` | The NCBI taxonomy ID for the most abundant species in the sample |
143+
| `species-abundance/proportion` | The proportion of reads in this sample assigned to the most abundant species |
144+
| `species-abundance/taxon_name_2` | The scientific name of the second-most abundant species in the sample |
145+
| `species-abundance/taxonomy_id_2` | The NCBI taxonomy ID for the second-most abundant species in the sample |
146+
| `species-abundance/proportion_2` | The proportion of reads in this sample assigned to the second-most abundant species |
147+
| `species-abundance/taxon_name_3` | The scientific name of the third-most abundant species in the sample |
148+
| `species-abundance/taxonomy_id_3` | The NCBI taxonomy ID for the third-most abundant species in the sample |
149+
| `species-abundance/proportion_3` | The proportion of reads in this sample assigned to the third-most abundant species |
150+
| `species-abundance/taxon_name_4` | The scientific name of the fourth-most abundant species in the sample |
151+
| `species-abundance/taxonomy_id_4` | The NCBI taxonomy ID for the fourth-most abundant species in the sample |
152+
| `species-abundance/proportion_4` | The proportion of reads in this sample assigned to the fourth-most abundant species |
153+
| `species-abundance/taxon_name_5` | The scientific name of the fifth-most abundant species in the sample |
154+
| `species-abundance/taxonomy_id_5` | The NCBI taxonomy ID for the fifth-most abundant species in the sample |
155+
| `species-abundance/proportion_5` | The proportion of reads in this sample assigned to the fifth-most abundant species |
156+
| `species-abundance/proportion_unclassified` | The proportion of unclassified reads in the sample | |
153157

154158
Note that by default, these fields will not appear in sorted order in the line list. Refer to the [IRIDA Documentation on metadata management](https://phac-nml.github.io/irida-documentation/user/user/sample-metadata/#project-metadata-line-list) to create a customized view of these fields.
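The field names in the table above follow a simple pattern: the most abundant species uses the bare field name, and ranks 2 through 5 append `_2` through `_5`. The Java sketch below generates the full list of names, which may help when building a customized line list view. It is illustrative only: the class and method names (`LineListFieldNames`, `fieldKey`, `allFieldKeys`) are hypothetical and are not part of the plugin.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the plugin: builds the metadata field
// names listed in the table above.
class LineListFieldNames {

    // Rank 1 uses the bare field name; ranks 2 and up get a "_<rank>" suffix.
    static String fieldKey(String workflowName, String field, int rank) {
        String suffix = (rank == 1) ? "" : "_" + rank;
        return workflowName + "/" + field + suffix;
    }

    // Returns the field names in the same order as the table above.
    static List<String> allFieldKeys(String workflowName, int numSpecies) {
        List<String> keys = new ArrayList<>();
        keys.add(workflowName + "/taxonomy_level");
        for (int rank = 1; rank <= numSpecies; rank++) {
            keys.add(fieldKey(workflowName, "taxon_name", rank));
            keys.add(fieldKey(workflowName, "taxonomy_id", rank));
            keys.add(fieldKey(workflowName, "proportion", rank));
        }
        keys.add(workflowName + "/proportion_unclassified");
        return keys;
    }

    public static void main(String[] args) {
        for (String key : allFieldKeys("species-abundance", 5)) {
            System.out.println(key);
        }
    }
}
```

With `numSpecies` set to 5 this yields 17 names, one per row of the table.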
155159


src/main/java/org/publichealthbioinformatics/irida/plugin/speciesabundance/SpeciesAbundancePluginUpdater.java

Lines changed: 18 additions & 13 deletions
@@ -89,12 +89,16 @@ public void update(Collection<Sample> samples, AnalysisSubmission analysis) thro
8989
String workflowName = iridaWorkflow.getWorkflowDescription().getName();
9090

9191
List<Map<String, String>> speciesAbundances = parseSpeciesAbundanceFile(speciesAbundanceFilePath);
92+
Map<String, String> unclassifiedAbundances = speciesAbundances.remove(0);
93+
String key;
94+
String value;
95+
PipelineProvidedMetadataEntry entry;
96+
value = unclassifiedAbundances.get("bracken_fraction_total_seqs");
97+
entry = new PipelineProvidedMetadataEntry(value, "float", analysis);
98+
key = workflowName + "/" + "proportion_unclassified";
99+
metadataEntries.put(key, entry);
92100
int speciesNum = 1;
93101
for (Map<String, String> species : speciesAbundances) {
94-
String key;
95-
String value;
96-
PipelineProvidedMetadataEntry entry;
97-
98102
value = species.get("taxonomy_lvl");
99103
entry = new PipelineProvidedMetadataEntry(value, "text", analysis);
100104
// taxonomy_level is only recorded once per sample. (should be identical for all lines in a report.)
@@ -121,7 +125,7 @@ public void update(Collection<Sample> samples, AnalysisSubmission analysis) thro
121125
}
122126
metadataEntries.put(key, entry);
123127

124-
value = species.get("fraction_total_reads");
128+
value = species.get("bracken_fraction_total_seqs");
125129
entry = new PipelineProvidedMetadataEntry(value, "float", analysis);
126130
if (speciesNum == 1) {
127131
key = workflowName + "/" + "proportion";
@@ -150,8 +154,8 @@ public void update(Collection<Sample> samples, AnalysisSubmission analysis) thro
150154
* should look like:
151155
*
152156
* <pre>
153-
* name taxonomy_id taxonomy_lvl kraken_assigned_reads added_reads new_est_reads fraction_total_reads
154-
* Salmonella enterica 28901 S 433515 32457 465972 0.99016
157+
* name taxonomy_id taxonomy_lvl kraken_assigned_seqs bracken_assigned_seqs total_seqs kraken_fraction_total_seqs bracken_fraction_total_seqs
158+
* Salmonella enterica 28901 S 228436 841169 848138 0.269338 0.991783
155159
* </pre>
156160
*
157161
* @return A {@link Map<String, String>} containing the read count.
@@ -162,13 +166,14 @@ List<Map<String, String>> parseSpeciesAbundanceFile(Path speciesAbundanceFilePat
162166
BufferedReader speciesAbundanceReader = new BufferedReader(new FileReader(speciesAbundanceFilePath.toFile()));
163167
List<Map<String, String>> abundances = new ArrayList<>();
164168
ArrayList<String> expectedHeaderFields = new ArrayList<String>(Arrays.asList(
165-
"name",
169+
"name",
166170
"taxonomy_id",
167171
"taxonomy_lvl",
168-
"kraken_assigned_reads",
169-
"added_reads",
170-
"new_est_reads",
171-
"fraction_total_reads"
172+
"kraken_assigned_seqs",
173+
"bracken_assigned_seqs",
174+
"total_seqs",
175+
"kraken_fraction_total_seqs",
176+
"bracken_fraction_total_seqs"
172177
));
173178
try {
174179
String headerLine = speciesAbundanceReader.readLine();
@@ -187,7 +192,7 @@ List<Map<String, String>> parseSpeciesAbundanceFile(Path speciesAbundanceFilePat
187192
}
188193
abundances.add(speciesAbundanceMap);
189194
}
190-
for (int i = 0; i < (NUM_SPECIES_TO_REPORT - 1); i++) {
195+
for (int i = 0; i < NUM_SPECIES_TO_REPORT; i++) {
191196
abundancesLine = speciesAbundanceReader.readLine();
192197
ArrayList<String> speciesAbundanceFields = new ArrayList<String>(Arrays.asList(abundancesLine.split("\t")));
193198
Map<String, String> speciesAbundanceMap = new HashMap<>();
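
The hunks above change the parser to expect the eight-column header of the adjusted bracken report, and the updater to treat the first data row as the unclassified row, followed by the ranked species. The Java sketch below shows that idea in a self-contained form. It rests on several assumptions: the class `AbundanceReportSketch` and the method `parseLines` are hypothetical, the values in the `unclassified` row are illustrative rather than real tool output, and the sketch reads in-memory lines and tolerates reports with fewer rows than requested, which may differ from the plugin's behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the plugin's implementation.
class AbundanceReportSketch {

    // Header fields expected after this commit (taken from the diff above).
    static final List<String> EXPECTED_HEADER = Arrays.asList(
            "name", "taxonomy_id", "taxonomy_lvl",
            "kraken_assigned_seqs", "bracken_assigned_seqs", "total_seqs",
            "kraken_fraction_total_seqs", "bracken_fraction_total_seqs");

    // Parses the header plus up to (numSpecies + 1) data rows:
    // one unclassified row followed by the ranked species rows.
    static List<Map<String, String>> parseLines(List<String> lines, int numSpecies) {
        if (lines.isEmpty()) {
            throw new IllegalArgumentException("Report is empty");
        }
        List<String> header = Arrays.asList(lines.get(0).split("\t"));
        if (!header.equals(EXPECTED_HEADER)) {
            throw new IllegalArgumentException("Unexpected header: " + header);
        }
        List<Map<String, String>> rows = new ArrayList<>();
        int lastRow = Math.min(lines.size() - 1, numSpecies + 1);
        for (int i = 1; i <= lastRow; i++) {
            String[] fields = lines.get(i).split("\t");
            if (fields.length != header.size()) {
                throw new IllegalArgumentException("Wrong field count on row " + i);
            }
            Map<String, String> row = new LinkedHashMap<>();
            for (int j = 0; j < fields.length; j++) {
                row.put(header.get(j), fields[j]);
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                String.join("\t", EXPECTED_HEADER),
                // Illustrative values only, not real tool output.
                "unclassified\t0\tU\t6969\t6969\t848138\t0.008217\t0.008217",
                "Salmonella enterica\t28901\tS\t228436\t841169\t848138\t0.269338\t0.991783");

        List<Map<String, String>> rows = parseLines(lines, 5);
        // The first data row is split off as the unclassified entry.
        Map<String, String> unclassified = rows.remove(0);
        System.out.println("proportion_unclassified = "
                + unclassified.get("bracken_fraction_total_seqs"));

        int speciesNum = 1;
        for (Map<String, String> species : rows) {
            String suffix = (speciesNum == 1) ? "" : "_" + speciesNum;
            System.out.println("proportion" + suffix + " = "
                    + species.get("bracken_fraction_total_seqs"));
            speciesNum++;
        }
    }
}
```

Validating the header before reading any rows means a report produced by an older version of the bracken adjustment tool, with the previous column names, is rejected early instead of yielding null lookups later.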

src/main/resources/workflows/0.2.0/irida_workflow.xml

Lines changed: 37 additions & 4 deletions
@@ -9,7 +9,7 @@
99
<requiresSingleSample>true</requiresSingleSample>
1010
</inputs>
1111
<parameters>
12-
<parameter name="kraken2-1-confidence" defaultValue="0.0">
12+
<parameter name="kraken2-1-confidence" defaultValue="0.1">
1313
<toolParameter toolId="toolshed.g2.bx.psu.edu/repos/iuc/kraken2/kraken2/2.1.1+galaxy1" parameterName="confidence" label="Confidence" type="float"/>
1414
</parameter>
1515
<parameter name="kraken2-1-kraken2_database" required="true">
@@ -22,20 +22,35 @@
2222
<toolParameter toolId="toolshed.g2.bx.psu.edu/repos/iuc/kraken2/kraken2/2.1.1+galaxy1" parameterName="min_base_quality" label="Minimum Base Quality" type="integer"/>
2323
</parameter>
2424
<parameter name="bracken-2-threshold" defaultValue="10">
25-
<toolParameter toolId="toolshed.g2.bx.psu.edu/repos/iuc/bracken/est_abundance/2.6.1+galaxy0" parameterName="threshold"/>
25+
<toolParameter toolId="toolshed.g2.bx.psu.edu/repos/iuc/bracken/est_abundance/2.7+galaxy0" parameterName="threshold"/>
2626
</parameter>
2727
<parameter name="bracken-2-kmer_distr" required="true">
2828
<dynamicSource>
2929
<galaxyToolDataTable name="bracken_databases" displayColumn="name" parameterColumn="value" />
3030
</dynamicSource>
31-
<toolParameter toolId="toolshed.g2.bx.psu.edu/repos/iuc/bracken/est_abundance/2.6.1+galaxy0" parameterName="kmer_distr"/>
31+
<toolParameter toolId="toolshed.g2.bx.psu.edu/repos/iuc/bracken/est_abundance/2.7+galaxy0" parameterName="kmer_distr"/>
3232
</parameter>
3333
</parameters>
3434
<outputs>
35+
<output name="fastp_report" fileName="fastp_report.html" />
36+
<output name="fastp_json" fileName="fastp.json" />
3537
<output name="species_abundance" fileName="species_abundance.tsv" />
3638
<output name="kraken2_report" fileName="kraken2_report.tsv" />
39+
<output name="krona_chart" fileName="krona_chart.html" />
3740
</outputs>
3841
<toolRepositories>
42+
<repository>
43+
<name>fastp</name>
44+
<owner>iuc</owner>
45+
<url>https://toolshed.g2.bx.psu.edu</url>
46+
<revision>65b93b623c77</revision>
47+
</repository>
48+
<repository>
49+
<name>fastp_json_to_tabular</name>
50+
<owner>public-health-bioinformatics</owner>
51+
<url>https://toolshed.g2.bx.psu.edu</url>
52+
<revision>091a2fb2e7ad</revision>
53+
</repository>
3954
<repository>
4055
<name>kraken2</name>
4156
<owner>iuc</owner>
@@ -48,11 +63,29 @@
4863
<url>https://toolshed.g2.bx.psu.edu</url>
4964
<revision>b08ac10aed96</revision>
5065
</repository>
66+
<repository>
67+
<name>adjust_bracken_for_unclassified_reads</name>
68+
<owner>public-health-bioinformatics</owner>
69+
<url>https://toolshed.g2.bx.psu.edu</url>
70+
<revision>899a650587ed</revision>
71+
</repository>
72+
<repository>
73+
<name>krakentools</name>
74+
<owner>jvolkening</owner>
75+
<url>https://toolshed.g2.bx.psu.edu</url>
76+
<revision>d491c23394f9</revision>
77+
</repository>
78+
<repository>
79+
<name>taxonomy_krona_chart</name>
80+
<owner>crs4</owner>
81+
<url>https://toolshed.g2.bx.psu.edu</url>
82+
<revision>e9005d1f3cfd</revision>
83+
</repository>
5184
<repository>
5285
<name>data_manager_build_kraken2_database</name>
5386
<owner>iuc</owner>
5487
<url>https://toolshed.g2.bx.psu.edu</url>
55-
<revision>2f27f3b86827</revision>
88+
<revision>9002633b4737</revision>
5689
</repository>
5790
<repository>
5891
<name>data_manager_build_bracken_database</name>
