
R and Python scripts to perform microbial network analysis of the AirBiome project, from data preprocessing to network comparisons and metadata correlations.


Multiomics-Analytics-Group/airbiome


AirBiome Project: Microbial Network Analysis Workflow 🦠📊

This repository contains R and Python scripts, organized into modules, for microbial association network (MAN) analysis, from raw data preprocessing to network comparisons and metadata correlations. The workflow preprocesses abundance and metadata files, builds MANs with the NetCoMi R package, filters and analyzes the networks, visualizes them in Cytoscape, and correlates network modules with metadata variables.


Workflow Overview 🚀

The general workflow for analyzing MANs in this repository follows these steps:

  1. Preprocessing: Prepare raw abundance and metadata for analysis.

  2. Network Inference: Construct microbial association networks.

  3. Network Filtering: Split networks based on various thresholds and remove singletons or low-connectivity nodes.

  4. Threshold Analysis: Evaluate topological metrics of the networks across thresholds and identify optimal networks based on modularity and average clustering coefficient.

  5. Network Visualization: Generate interactive visualizations of networks in Cytoscape.

  6. Network Analysis: Perform network analysis using NetCoMi and custom Python scripts.

  7. Metadata Correlation: Investigate relationships between network modules and metadata variables.

  8. Network Comparisons: Compare the structure and composition of two distinct networks, including their overlaps and clustering similarities.
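Steps 3 and 4 above can be illustrated with a minimal sketch. It is not the code in this repository, which relies on NetCoMi and the scripts in the Scripts directory. It assumes an edge list of (node, node, weight) tuples, filters by absolute weight, drops singletons by keeping only nodes that still have an edge, and computes the average clustering coefficient. The taxon names, weights, and thresholds are hypothetical, and modularity is left out because it requires a community detection step.

```python
from collections import defaultdict


def filter_edges(edges, threshold):
    """Keep edges whose absolute association weight is >= threshold."""
    return [(u, v, w) for u, v, w in edges if abs(w) >= threshold]


def build_adjacency(edges):
    """Build undirected adjacency sets.

    Nodes with no surviving edge never appear in the result,
    which is how singletons are removed in this sketch.
    """
    adj = defaultdict(set)
    for u, v, _ in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    if not adj:
        return 0.0
    total = 0.0
    for neighbours in adj.values():
        k = len(neighbours)
        if k < 2:
            continue
        nbrs = sorted(neighbours)
        # Count edges that connect pairs of this node's neighbours.
        links = sum(
            1
            for i, a in enumerate(nbrs)
            for b in nbrs[i + 1:]
            if b in adj[a]
        )
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


# Hypothetical edge list: (taxon, taxon, association weight).
edges = [
    ("taxon_A", "taxon_B", 0.62),
    ("taxon_B", "taxon_C", -0.55),
    ("taxon_A", "taxon_C", 0.48),
    ("taxon_C", "taxon_D", 0.41),
    ("taxon_D", "taxon_E", 0.12),
]

# Hypothetical thresholds; the repository's actual values may differ.
for threshold in (0.1, 0.4, 0.5):
    adj = build_adjacency(filter_edges(edges, threshold))
    print(threshold, len(adj), round(average_clustering(adj), 3))
```

Scanning several thresholds this way shows how the number of retained nodes and the clustering coefficient change together, which is the kind of trend the threshold analysis step inspects.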

Scripts & Their Purpose 📁

Each subfolder within the Scripts directory contains specialized tools. For detailed usage instructions and parameters for individual scripts, refer to the README.md file located within each respective subfolder.

| Folder | Purpose | Outputs |
|---|---|---|
| Preprocessing | Initial data handling: parsing raw data into usable formats and exploratory data analysis (EDA). | Cleaned abundance, taxonomic, and metadata files; EDA plots. |
| Network_inference | Building microbial interaction networks with the cclasso method in NetCoMi. | microNet R objects and corresponding edgelist CSVs, often saved by network type and threshold. |
| Threshold_analysis | Calculating and visualizing topological metrics of networks across different thresholds. | CSV files with calculated network metrics, PNG/SVG plots illustrating metric trends, and node clustering data for "best" networks. |
| Network_filtering | Refining and cleaning inferred networks by applying thresholds and removing low-connectivity nodes. | Filtered microNet objects and cleaned edgelist CSVs (typically with nosinglt in the filename). |
| Network_visualization | Creating dynamic visualizations of networks in Cytoscape. | Networks imported and visually styled within a running Cytoscape instance, mapping node properties (e.g., degree, cluster) and edge properties (e.g., weight, sign). |
| Network_analysis | Advanced network analyses. | Various analytical results, potentially including centrality measures, multi-layer network structures, and other network-specific metrics. |
| Metadata_correlation | Correlating microbial community modules with sample metadata. | CSV reports of module-metadata correlations (including statistical tests and FDR correction), and corresponding plots. |
| Network_comparisons | Comparing different networks. | Metrics and reports on network overlap, unique features, and clustering similarities. |
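As an illustration of what a network overlap metric can look like, the sketch below compares the edge sets of two networks with the Jaccard index. The Jaccard index is an assumption chosen for illustration; the scripts in Network_comparisons may use different metrics. The taxon names and both edge lists are hypothetical.

```python
def edge_set(edges):
    """Store undirected edges as frozensets so (a, b) equals (b, a)."""
    return {frozenset((u, v)) for u, v in edges if u != v}


def jaccard(a, b):
    """Size of the intersection over size of the union; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Two hypothetical networks given as lists of node pairs.
net1 = edge_set([
    ("taxon_A", "taxon_B"),
    ("taxon_B", "taxon_C"),
    ("taxon_C", "taxon_D"),
])
net2 = edge_set([
    ("taxon_B", "taxon_A"),  # same edge as in net1, reversed order
    ("taxon_C", "taxon_D"),
    ("taxon_D", "taxon_E"),
])

shared = net1 & net2   # edges present in both networks
only_1 = net1 - net2   # edges unique to the first network
only_2 = net2 - net1   # edges unique to the second network

print(len(shared), len(only_1), len(only_2), jaccard(net1, net2))
```

The same two helpers work on node sets instead of edge sets if the comparison of interest is taxon composition.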

For detailed instructions on running each script and understanding specific outputs, please refer to the individual README.md files located within each script subfolder (e.g., Scripts/Network_filtering/README.md).
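The FDR correction mentioned for Metadata_correlation can be sketched as follows. This example assumes the Benjamini-Hochberg procedure, which is a common choice but is not confirmed by this README; the p-values are hypothetical and stand in for the results of module-metadata correlation tests.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values, one per module-metadata test.
p_values = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(p_values)

for raw, adj in zip(p_values, adjusted):
    print(f"raw={raw:.3f} adjusted={adj:.4f}")
```

Adjusting all tests together keeps the expected share of false positives among the reported associations under control when many modules and metadata variables are tested at once.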

Output Folders 📦

The Output directory is structured to store results from various analysis stages:

  • Best_nets: Stores information about optimal networks identified during threshold analysis. The .rds files were omitted from this repo because of their large size, but they can be generated by running the scripts in the Threshold_analysis folder.

  • EDA: Contains results from initial exploratory data analysis.

  • Threshold_analysis: Stores calculated metrics and plots from threshold analysis.

Summary Reports

We created summary reports with the main results of this project using VueGen, a tool that automates the generation of scientific reports. The reports are available in two formats: HTML5 and Streamlit.

Further details on the reports and the source code used to generate them are available in this GitHub repository.
