This repository contains R and Python scripts organized into modules to perform microbial association network (MAN) analysis, from raw data preprocessing to network comparisons and metadata correlations. Here, we preprocess abundance and metadata files, build MANs using the NetCoMi R package, filter and analyze these networks, visualize them in Cytoscape, and correlate network modules with metadata variables.
The general workflow for analyzing MANs in this repository follows these steps:
- Preprocessing: Prepare raw abundance and metadata for analysis.
- Network Inference: Construct microbial association networks.
- Network Filtering: Split networks based on various thresholds and remove singletons or low-connectivity nodes.
- Threshold Analysis: Evaluate topological metrics across networks built at different thresholds and identify optimal networks based on modularity and average clustering coefficient.
- Network Visualization: Generate interactive visualizations of networks in Cytoscape.
- Network Analysis: Perform network analysis using NetCoMi and custom Python scripts.
- Metadata Correlation: Investigate relationships between network modules and metadata variables.
- Network Comparisons: Compare the structure and composition of two distinct networks, including their overlaps and clustering similarities.
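
As a rough illustration of the filtering and threshold-analysis steps, the following Python sketch applies several thresholds to a small edgelist, drops nodes left without edges, and reports the average clustering coefficient of each resulting network. It is not the repository's implementation (the scripts rely on NetCoMi and their own code): the edgelist, the threshold values, and the function names `filter_edges`, `build_adjacency`, and `average_clustering` are hypothetical, and modularity is left out to keep the example short.

```python
from collections import defaultdict
from itertools import combinations


def filter_edges(edges, threshold):
    """Keep edges whose absolute association weight is at least `threshold`."""
    return [(u, v, w) for u, v, w in edges if abs(w) >= threshold]


def build_adjacency(edges):
    """Build an undirected adjacency map.

    Nodes that lose all their edges (singletons) never appear in the map,
    so they are removed implicitly.
    """
    adj = defaultdict(set)
    for u, v, _ in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def average_clustering(adj):
    """Mean local clustering coefficient over all nodes (0 for degree < 2)."""
    if not adj:
        return 0.0
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count connected pairs among this node's neighbours.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


# Hypothetical edgelist: (taxon_a, taxon_b, association weight)
edges = [
    ("A", "B", 0.8), ("B", "C", 0.7), ("A", "C", -0.6),
    ("C", "D", 0.5), ("D", "E", 0.2),
]

# Hypothetical thresholds; the real values are set in the repository scripts.
for threshold in (0.1, 0.4, 0.65):
    adj = build_adjacency(filter_edges(edges, threshold))
    print(threshold, len(adj), round(average_clustering(adj), 3))
```

Comparing such metrics across thresholds is one way to see how sparsification changes network structure before choosing a network for downstream analysis.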
Each subfolder within the Scripts directory contains specialized tools. For detailed usage instructions and parameters for individual scripts, refer to the README.md file located within each respective subfolder.
Folder | Purpose | Outputs |
---|---|---|
Preprocessing | Initial data handling: parsing raw data into usable formats and exploratory data analysis (EDA). | Cleaned abundance, taxonomic, and metadata files, EDA plots. |
Network_inference | Building microbial interaction networks with the cclasso method in NetCoMi. | microNet R objects and corresponding edgelist CSVs, often saved by network type and threshold. |
Threshold_analysis | Calculating and visualizing topological metrics of networks across different thresholds. | CSV files with calculated network metrics, PNG/SVG plots illustrating metric trends, and node clustering data for "best" networks. |
Network_filtering | Refining and cleaning inferred networks by applying thresholds and removing low-connectivity nodes. | Filtered microNet objects and cleaned edgelist CSVs (typically with nosinglt in the filename). |
Network_visualization | Creating dynamic visualizations of networks in Cytoscape. | Networks imported and visually styled within a running Cytoscape instance, mapping node properties (e.g., degree, cluster) and edge properties (e.g., weight, sign). |
Network_analysis | Advanced network analyses. | Various analytical results, potentially including centrality measures, multi-layer network structures, and other network-specific metrics. |
Metadata_correlation | Correlating microbial community modules with sample metadata. | CSV reports of module-metadata correlations (including statistical tests and FDR correction), and corresponding plots. |
Network_comparisons | Comparing different networks. | Metrics and reports on network overlap, unique features, and clustering similarities. |
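
The Metadata_correlation outputs mention FDR correction of the module-metadata tests. The table does not say which procedure the scripts use, so the sketch below assumes the widely used Benjamini-Hochberg method. The p-values, the test labels, and the function name `benjamini_hochberg` are hypothetical and serve only to show how raw p-values turn into adjusted ones.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from module-vs-metadata correlation tests.
p_values = {
    "module1~pH": 0.01,
    "module1~depth": 0.04,
    "module2~pH": 0.03,
    "module2~depth": 0.20,
}
q_values = dict(zip(p_values, benjamini_hochberg(list(p_values.values()))))

for test_name, q in q_values.items():
    print(f"{test_name}: adjusted p = {q:.4f}")
```

Adjusting all module-metadata tests together, rather than per module, controls the false discovery rate across the whole family of comparisons; whether the repository scripts group the tests this way is not stated here.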
For detailed instructions on running each script and understanding specific outputs, please refer to the individual README.md files located within each script subfolder (e.g., Scripts/Network_filtering/README.md).
The Output directory is structured to store results from various analysis stages:
- Best_nets: Stores information about optimal networks identified during threshold analysis. The .rds files were omitted from this repo because of their large size, but they can be generated by running the scripts in the Threshold_analysis folder.
- EDA: Contains results from initial exploratory data analysis.
- Threshold_analysis: Stores calculated metrics and plots from threshold analysis.
We created summary reports with the main results of this project using VueGen, a tool to automate the generation of scientific reports.
Further details on the reports and the source code to generate them are available on this GitHub repository.