Public-Health-Bioinformatics
diff --git a/README.md b/README.md
Lines changed: 28 additions & 24 deletions
diff --git a/doc/images/galaxy-workflow-diagram.png b/doc/images/galaxy-workflow-diagram.png
15.5 KB
diff --git a/src/main/java/org/publichealthbioinformatics/irida/plugin/speciesabundance/SpeciesAbundancePluginUpdater.java b/src/main/java/org/publichealthbioinformatics/irida/plugin/speciesabundance/SpeciesAbundancePluginUpdater.java
Lines changed: 18 additions & 13 deletions
diff --git a/src/main/resources/workflows/0.2.0/irida_workflow.xml b/src/main/resources/workflows/0.2.0/irida_workflow.xml
Lines changed: 37 additions & 4 deletions
@@ -32,12 +32,15 @@ This can be used to estimate the relative abundance of sequence reads originatin
 In order to use this pipeline, you will also have to install the [kraken2][] and [bracken][] Galaxy tools and their data 
 managers within your Galaxy instance. These can be found at:
 
-| Name                               | Version               | Revision                                                                                   |
-|------------------------------------|-----------------------|--------------------------------------------------------------------------------------------|
-| kraken2                            | `2.1.1+galaxy1`       | [`e674066930b2`](https://toolshed.g2.bx.psu.edu/view/iuc/kraken2/e674066930b2)             |
-| bracken                            | `2.6.1+galaxy0`       | [`b08ac10aed96`](https://toolshed.g2.bx.psu.edu/view/iuc/bracken/b08ac10aed96)             |
-| data_manager_build_kraken2_database| `2.1.1`               | [`2f27f3b86827`](https://toolshed.g2.bx.psu.edu/view/iuc/data_manager_build_kraken2_database/2f27f3b86827) |
-| data_manager_build_bracken_database| `2.5.1+galaxy1`       | [`3c7d2c84cb09`](https://toolshed.g2.bx.psu.edu/view/iuc/data_manager_build_bracken_database/3c7d2c84cb09) |
+| Name                                  | Version          | Owner                          | Metadata Revision | Galaxy Toolshed Link                                            |
+|---------------------------------------|------------------|--------------------------------|-------------------|----------------------------------------|
+| fastp                                 | `0.23.2+galaxy0` | `iuc`                          |   10 (2022-02-03) | [fastp-10:65b93b623c77](https://toolshed.g2.bx.psu.edu/view/iuc/fastp/65b93b623c77) |
+| fastp_json_to_tabular                 | `0.1.0`          | `public-health-bioinformatics` |    0 (2022-03-10) | [fastp_json_to_tabular-0:091a2fb2e7ad](https://toolshed.g2.bx.psu.edu/view/public-health-bioinformatics/fastp_json_to_tabular/091a2fb2e7ad) |
+| kraken2                               | `2.1.1+galaxy1`  | `iuc`                          |    4 (2021-02-17) | [kraken2-4:e674066930b2](https://toolshed.g2.bx.psu.edu/view/iuc/kraken2/e674066930b2) |
+| bracken                               | `2.6.1+galaxy0`  | `iuc`                          |    4 (2021-06-07) | [bracken-4:b08ac10aed96](https://toolshed.g2.bx.psu.edu/view/iuc/bracken/b08ac10aed96) |
+| adjust_bracken_for_unclassified_reads | `0.1.0`          | `public-health-bioinformatics` |    1 (2021-03-10) | [adjust_bracken_for_unclassified_reads-1:3cde438eb222](https://toolshed.g2.bx.psu.edu/view/public-health-bioinformatics/adjust_bracken_for_unclassified_reads/3cde438eb222) |
+| data_manager_build_kraken2_database   | `2.1.2+galaxy0`  | `iuc`                          |    6 (2022-06-24) | [`data_manager_build_kraken2_database-6:9002633b4737`](https://toolshed.g2.bx.psu.edu/view/iuc/data_manager_build_kraken2_database/9002633b4737) |
+| data_manager_build_bracken_database   | `2.5.1+galaxy1`  | `iuc`                          |    3 (2021-11-08) | [`data_manager_build_bracken_database-3:3c7d2c84cb09`](https://toolshed.g2.bx.psu.edu/view/iuc/data_manager_build_bracken_database/3c7d2c84cb09) |
 
 ## Preparing Databases
 
@@ -132,24 +135,25 @@ classification report, and a `bracken` estimate of the relative abundance of rea
 And, you should be able to save and view these results in the IRIDA metadata table. The following fields are written to
 the IRIDA 'Line List':
 
-| Field Name                           | Description                                                                         |
-|--------------------------------------|-------------------------------------------------------------------------------------|
-| `species-abundance/taxonomy_level`   | The taxonomic level at which reads were aggregated ('S' for species)                |
-| `species-abundance/taxon_name`       | The scientific name of the most abundant species in the sample                      |
-| `species-abundance/taxonomy_id`      | The NCBI taxonomy ID for the most abundant species in the sample                    |
-| `species-abundance/proportion`       | The proportion of reads in this sample assigned to the most abundant species        |
-| `species-abundance/taxon_name_2`     | The scientific name of the second-most abundant species in the sample               |
-| `species-abundance/taxonomy_id_2`    | The NCBI taxonomy ID for the second-most abundant species in the sample             |
-| `species-abundance/proportion_2`     | The proportion of reads in this sample assigned to the second-most abundant species |
-| `species-abundance/taxon_name_3`     | The scientific name of the third-most abundant species in the sample                |
-| `species-abundance/taxonomy_id_3`    | The NCBI taxonomy ID for the third-most abundant species in the sample              |
-| `species-abundance/proportion_3`     | The proportion of reads in this sample assigned to the third-most abundant species  |
-| `species-abundance/taxon_name_4`     | The scientific name of the fourth-most abundant species in the sample               |
-| `species-abundance/taxonomy_id_4`    | The NCBI taxonomy ID for the fourth-most abundant species in the sample             |
-| `species-abundance/proportion_4`     | The proportion of reads in this sample assigned to the fourth-most abundant species |
-| `species-abundance/taxon_name_5`     | The scientific name of the fifth-most abundant species in the sample                |
-| `species-abundance/taxonomy_id_5`    | The NCBI taxonomy ID for the fifth-most abundant species in the sample              |
-| `species-abundance/proportion_5`     | The proportion of reads in this sample assigned to the fifth-most abundant species  |
+| Field Name                                  | Description                                                                         |
+|---------------------------------------------|-------------------------------------------------------------------------------------|
+| `species-abundance/taxonomy_level`          | The taxonomic level at which reads were aggregated ('S' for species)                |
+| `species-abundance/taxon_name`              | The scientific name of the most abundant species in the sample                      |
+| `species-abundance/taxonomy_id`             | The NCBI taxonomy ID for the most abundant species in the sample                    |
+| `species-abundance/proportion`              | The proportion of reads in this sample assigned to the most abundant species        |
+| `species-abundance/taxon_name_2`            | The scientific name of the second-most abundant species in the sample               |
+| `species-abundance/taxonomy_id_2`           | The NCBI taxonomy ID for the second-most abundant species in the sample             |
+| `species-abundance/proportion_2`            | The proportion of reads in this sample assigned to the second-most abundant species |
+| `species-abundance/taxon_name_3`            | The scientific name of the third-most abundant species in the sample                |
+| `species-abundance/taxonomy_id_3`           | The NCBI taxonomy ID for the third-most abundant species in the sample              |
+| `species-abundance/proportion_3`            | The proportion of reads in this sample assigned to the third-most abundant species  |
+| `species-abundance/taxon_name_4`            | The scientific name of the fourth-most abundant species in the sample               |
+| `species-abundance/taxonomy_id_4`           | The NCBI taxonomy ID for the fourth-most abundant species in the sample             |
+| `species-abundance/proportion_4`            | The proportion of reads in this sample assigned to the fourth-most abundant species |
+| `species-abundance/taxon_name_5`            | The scientific name of the fifth-most abundant species in the sample                |
+| `species-abundance/taxonomy_id_5`           | The NCBI taxonomy ID for the fifth-most abundant species in the sample              |
+| `species-abundance/proportion_5`            | The proportion of reads in this sample assigned to the fifth-most abundant species  |
+| `species-abundance/proportion_unclassified` | The proportion of unclassified reads in the sample                                  |
 
 Note that by default, these fields will not appear in sorted order in the line list. Refer to the [IRIDA Documentation on metadata management](https://phac-nml.github.io/irida-documentation/user/user/sample-metadata/#project-metadata-line-list) to create a customized view of these fields.
 
 
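The field names in the README table above follow a regular pattern: the workflow name, a slash, the field, and a `_<rank>` suffix for every species after the first. The following is a minimal sketch of that naming scheme, not the plugin's actual code. The class `LineListKeys`, its methods, and the value `5` for `NUM_SPECIES_TO_REPORT` are assumptions inferred from the five ranks listed in the table.

```java
import java.util.ArrayList;
import java.util.List;

final class LineListKeys {
    // Assumption: five species are reported, matching the README table.
    static final int NUM_SPECIES_TO_REPORT = 5;

    // Hypothetical helper: rank 1 has no suffix, later ranks get "_<rank>".
    static String metadataKey(String workflowName, String field, int rank) {
        String suffix = (rank == 1) ? "" : "_" + rank;
        return workflowName + "/" + field + suffix;
    }

    // Lists every Line List field described in the README table.
    static List<String> allKeys(String workflowName) {
        List<String> keys = new ArrayList<>();
        // taxonomy_level is recorded once per sample.
        keys.add(workflowName + "/taxonomy_level");
        for (int rank = 1; rank <= NUM_SPECIES_TO_REPORT; rank++) {
            keys.add(metadataKey(workflowName, "taxon_name", rank));
            keys.add(metadataKey(workflowName, "taxonomy_id", rank));
            keys.add(metadataKey(workflowName, "proportion", rank));
        }
        // New in this change: one extra field for unclassified reads.
        keys.add(workflowName + "/proportion_unclassified");
        return keys;
    }
}
```

Calling `LineListKeys.allKeys("species-abundance")` yields the 17 field names in the table, which can be handy when building a customized Line List view.
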
@@ -89,12 +89,16 @@ public void update(Collection<Sample> samples, AnalysisSubmission analysis) thro
 			String workflowName = iridaWorkflow.getWorkflowDescription().getName();
 
 			List<Map<String, String>> speciesAbundances = parseSpeciesAbundanceFile(speciesAbundanceFilePath);
+			Map<String, String> unclassifiedAbundances = speciesAbundances.remove(0);
+			String key;
+			String value;
+			PipelineProvidedMetadataEntry entry;
+			value = unclassifiedAbundances.get("bracken_fraction_total_seqs");
+			entry = new PipelineProvidedMetadataEntry(value, "float", analysis);
+			key = workflowName + "/" + "proportion_unclassified";
+			metadataEntries.put(key, entry);
 			int speciesNum = 1;
 			for (Map<String, String> species : speciesAbundances) {
-				String key;
-				String value;
-				PipelineProvidedMetadataEntry entry;
-
 				value = species.get("taxonomy_lvl");
 				entry = new PipelineProvidedMetadataEntry(value, "text", analysis);
 				// taxonomy_level is only recorded once per sample. (should be identical for all lines in a report.)
@@ -121,7 +125,7 @@ public void update(Collection<Sample> samples, AnalysisSubmission analysis) thro
 				}
 				metadataEntries.put(key, entry);
 
-				value = species.get("fraction_total_reads");
+				value = species.get("bracken_fraction_total_seqs");
 				entry = new PipelineProvidedMetadataEntry(value, "float", analysis);
 				if (speciesNum == 1) {
 					key = workflowName + "/" + "proportion";
@@ -150,8 +154,8 @@ public void update(Collection<Sample> samples, AnalysisSubmission analysis) thro
 	 *                      should look like:
 	 *
 	 *                      <pre>
-	 *                      name	taxonomy_id	taxonomy_lvl	kraken_assigned_reads	added_reads	new_est_reads	fraction_total_reads
-	 *                      Salmonella enterica	28901	S	433515	32457	465972	0.99016
+	 *                      name	taxonomy_id	taxonomy_lvl	kraken_assigned_seqs	bracken_assigned_seqs	total_seqs	kraken_fraction_total_seqs	bracken_fraction_total_seqs
+	 *                      Salmonella enterica	28901	S	228436	841169	848138	0.269338	0.991783
 	 *                      </pre>
 	 *
 	 * @return A {@link List} of {@link Map}s, one per row of the report, keyed by column name.
@@ -162,13 +166,14 @@ List<Map<String, String>> parseSpeciesAbundanceFile(Path speciesAbundanceFilePat
 		BufferedReader speciesAbundanceReader = new BufferedReader(new FileReader(speciesAbundanceFilePath.toFile()));
 		List<Map<String, String>> abundances = new ArrayList<>();
 		ArrayList<String> expectedHeaderFields = new ArrayList<String>(Arrays.asList(
-				"name",
+		        "name",
 		        "taxonomy_id",
 		        "taxonomy_lvl",
-		        "kraken_assigned_reads",
-		        "added_reads",
-		        "new_est_reads",
-		        "fraction_total_reads"
+		        "kraken_assigned_seqs",
+		        "bracken_assigned_seqs",
+		        "total_seqs",
+		        "kraken_fraction_total_seqs",
+		        "bracken_fraction_total_seqs"
 		));
 		try {
 			String headerLine = speciesAbundanceReader.readLine();
@@ -187,7 +192,7 @@ List<Map<String, String>> parseSpeciesAbundanceFile(Path speciesAbundanceFilePat
 				}
 				abundances.add(speciesAbundanceMap);
 			}
-			for (int i = 0; i < (NUM_SPECIES_TO_REPORT - 1); i++) {
+			for (int i = 0; i < NUM_SPECIES_TO_REPORT; i++) {
 				abundancesLine = speciesAbundanceReader.readLine();
 				ArrayList<String> speciesAbundanceFields = new ArrayList<String>(Arrays.asList(abundancesLine.split("\t")));
 				Map<String, String> speciesAbundanceMap = new HashMap<>();
 
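The parsing change above can be sketched in isolation. This is a simplified illustration under stated assumptions, not the plugin's implementation: the class `AbundanceTableSketch` and its methods are hypothetical, and treating the first data row as the unclassified row is inferred from the `speciesAbundances.remove(0)` call in `update`. Unlike the loop in the diff, this sketch stops at end of input rather than assuming the report always contains enough rows.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class AbundanceTableSketch {
    // The eight column names expected after this change.
    static final List<String> EXPECTED_HEADER = Arrays.asList(
            "name", "taxonomy_id", "taxonomy_lvl",
            "kraken_assigned_seqs", "bracken_assigned_seqs", "total_seqs",
            "kraken_fraction_total_seqs", "bracken_fraction_total_seqs");

    // Reads the header, validates it, then reads at most maxRows data rows.
    static List<Map<String, String>> parse(BufferedReader reader, int maxRows) throws IOException {
        String headerLine = reader.readLine();
        if (headerLine == null
                || !Arrays.asList(headerLine.split("\t")).equals(EXPECTED_HEADER)) {
            throw new IllegalArgumentException("Unexpected header: " + headerLine);
        }
        List<Map<String, String>> rows = new ArrayList<>();
        String line;
        // Stop at end of input so a short report does not cause a null dereference.
        while (rows.size() < maxRows && (line = reader.readLine()) != null) {
            String[] fields = line.split("\t", -1);
            if (fields.length != EXPECTED_HEADER.size()) {
                throw new IllegalArgumentException("Unexpected field count: " + line);
            }
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < fields.length; i++) {
                row.put(EXPECTED_HEADER.get(i), fields[i]);
            }
            rows.add(row);
        }
        return rows;
    }

    // Convenience overload for in-memory text; wraps the checked IOException.
    static List<Map<String, String>> parse(String tsv, int maxRows) {
        try (BufferedReader reader = new BufferedReader(new StringReader(tsv))) {
            return parse(reader, maxRows);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A caller would request one more row than the number of species to report, then remove the first row and read its `bracken_fraction_total_seqs` value as the unclassified proportion, mirroring what `update` does in the hunk above.
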
@@ -9,7 +9,7 @@
     <requiresSingleSample>true</requiresSingleSample>
   </inputs>
   <parameters>
-    <parameter name="kraken2-1-confidence" defaultValue="0.0">
+    <parameter name="kraken2-1-confidence" defaultValue="0.1">
       <toolParameter toolId="toolshed.g2.bx.psu.edu/repos/iuc/kraken2/kraken2/2.1.1+galaxy1" parameterName="confidence" label="Confidence" type="float"/>
     </parameter>
     <parameter name="kraken2-1-kraken2_database" required="true">
@@ -22,20 +22,35 @@
       <toolParameter toolId="toolshed.g2.bx.psu.edu/repos/iuc/kraken2/kraken2/2.1.1+galaxy1" parameterName="min_base_quality" label="Minimum Base Quality" type="integer"/>
     </parameter>
     <parameter name="bracken-2-threshold" defaultValue="10">
-      <toolParameter toolId="toolshed.g2.bx.psu.edu/repos/iuc/bracken/est_abundance/2.6.1+galaxy0" parameterName="threshold"/>
+      <toolParameter toolId="toolshed.g2.bx.psu.edu/repos/iuc/bracken/est_abundance/2.7+galaxy0" parameterName="threshold"/>
     </parameter>
     <parameter name="bracken-2-kmer_distr" required="true">
       <dynamicSource>
         <galaxyToolDataTable name="bracken_databases" displayColumn="name" parameterColumn="value" />
       </dynamicSource>
-      <toolParameter toolId="toolshed.g2.bx.psu.edu/repos/iuc/bracken/est_abundance/2.6.1+galaxy0" parameterName="kmer_distr"/>
+      <toolParameter toolId="toolshed.g2.bx.psu.edu/repos/iuc/bracken/est_abundance/2.7+galaxy0" parameterName="kmer_distr"/>
     </parameter>
   </parameters>
   <outputs>
+    <output name="fastp_report" fileName="fastp_report.html" />
+    <output name="fastp_json" fileName="fastp.json" />
     <output name="species_abundance" fileName="species_abundance.tsv" />
     <output name="kraken2_report" fileName="kraken2_report.tsv" />
+    <output name="krona_chart" fileName="krona_chart.html" />
   </outputs>
   <toolRepositories>
+    <repository>
+      <name>fastp</name>
+      <owner>iuc</owner>
+      <url>https://toolshed.g2.bx.psu.edu</url>
+      <revision>65b93b623c77</revision>
+    </repository>
+    <repository>
+      <name>fastp_json_to_tabular</name>
+      <owner>public-health-bioinformatics</owner>
+      <url>https://toolshed.g2.bx.psu.edu</url>
+      <revision>091a2fb2e7ad</revision>
+    </repository>
     <repository>
       <name>kraken2</name>
       <owner>iuc</owner>
@@ -48,11 +63,29 @@
       <url>https://toolshed.g2.bx.psu.edu</url>
       <revision>b08ac10aed96</revision>
     </repository>
+    <repository>
+      <name>adjust_bracken_for_unclassified_reads</name>
+      <owner>public-health-bioinformatics</owner>
+      <url>https://toolshed.g2.bx.psu.edu</url>
+      <revision>899a650587ed</revision>
+    </repository>
+    <repository>
+      <name>krakentools</name>
+      <owner>jvolkening</owner>
+      <url>https://toolshed.g2.bx.psu.edu</url>
+      <revision>d491c23394f9</revision>
+    </repository>
+    <repository>
+      <name>taxonomy_krona_chart</name>
+      <owner>crs4</owner>
+      <url>https://toolshed.g2.bx.psu.edu</url>
+      <revision>e9005d1f3cfd</revision>
+    </repository>
     <repository>
       <name>data_manager_build_kraken2_database</name>
       <owner>iuc</owner>
       <url>https://toolshed.g2.bx.psu.edu</url>
-      <revision>2f27f3b86827</revision>
+      <revision>9002633b4737</revision>
     </repository>
     <repository>
       <name>data_manager_build_bracken_database</name>