ema-pe/fluidc-large-network-analysis

FluidC on large and real complex networks

This small project evaluates the FluidC algorithm for community detection in large-scale and real-data complex networks. It includes Python scripts to run experiments, analyze results, and generate plots.

This project was developed in July 2025 as part of the exam for "Graph Theory and Algorithms", a computer science doctoral course at the University of Milano-Bicocca, for the 2024-2025 academic year. A brief report on the experiments and their results is available for download. You can also get the raw results files (plots and CSV files). The project will not be updated after submission, and the code is provided as-is.

The Git repository is available online on both GitLab and GitHub. GitLab is the primary repository; GitHub is only a mirror of it.

How to run

  1. Prepare the virtual environment (Python 3 is required):

    $ git clone https://gitlab.com/ema-pe/fluidc-large-network-analysis.git
    $ cd fluidc-large-network-analysis
    $ python3 -m venv .env
    $ source .env/bin/activate
    $ pip install -r requirements.txt
  2. Download the complex networks and the ground-truth communities (from SNAP). A Bash script is provided to automatically download them:

    $ chmod u+x dataset/download.sh
    $ ./dataset/download.sh
  3. Get the number of ground-truth communities for each network using the ground_truth.py script. This number is a required parameter (k) for the FluidC algorithm. You may need to update the networks variable in run.py with the correct number of ground-truth communities for each network.

    $ python ground_truth.py --communities dataset/com-amazon.all.dedup.cmty.txt.gz
    # Example output: "dataset/com-amazon.all.dedup.cmty.txt.gz": 75149
  4. Run the FluidC algorithm with various configurations. Warning: this process runs sequentially and can be very time-consuming, depending on the size of the networks. The communities are saved in the results/ directory. If you want to run just a single FluidC execution, you can call the fluidc.py script directly.

    $ python run.py
  5. Finally, use the plot.py script to analyze the output from the experiments. This script generates metric CSV files and plots and saves them in the results/ and results/plots/ directories. The calculated metrics are execution time, normalized mutual information (NMI), adjusted Rand index (ARI), and cluster purity for each FluidC execution compared to the ground truth.

    $ python plot.py --results-dir results --graph-name com-amazon com-dblp com-youtube
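
As an illustration of step 3, the sketch below shows one way the number of ground-truth communities could be counted. It is not the code of ground_truth.py. It assumes the SNAP community files list one community per line (whitespace-separated node IDs), and the function name count_communities is hypothetical.

```python
import gzip


def count_communities(path):
    """Count the communities in a gzipped SNAP ground-truth file.

    Assumption: each non-empty line holds the node IDs of one community.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        # Blank lines are skipped so they are not counted as communities.
        return sum(1 for line in handle if line.strip())
```

Calling count_communities("dataset/com-amazon.all.dedup.cmty.txt.gz") after running the download script would then give a candidate value of k for that network.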

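Of the metrics listed in step 5, cluster purity is the simplest to compute by hand. The sketch below is a minimal version, not the implementation in plot.py. It assumes every node has exactly one predicted and one ground-truth label, whereas the SNAP ground-truth communities can overlap, so the project may handle that case differently. The function name cluster_purity is hypothetical.

```python
from collections import Counter, defaultdict


def cluster_purity(predicted, truth):
    """Return the purity of a clustering against ground-truth labels.

    predicted and truth map each node to a single label (an assumption;
    overlapping communities are not handled here).
    """
    if not predicted:
        raise ValueError("predicted must contain at least one node")

    # For each predicted cluster, count how often each true label occurs.
    by_cluster = defaultdict(Counter)
    for node, cluster in predicted.items():
        by_cluster[cluster][truth[node]] += 1

    # Each cluster contributes the size of its most common true label.
    matched = sum(max(counts.values()) for counts in by_cluster.values())
    return matched / len(predicted)


predicted = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b"}
truth = {1: "x", 2: "x", 3: "y", 4: "y", 5: "y", 6: "y"}
purity = cluster_purity(predicted, truth)  # 5 of 6 nodes match their cluster's majority label
```

A purity of 1.0 means every predicted cluster contains nodes from a single ground-truth community. Because purity tends to increase as the number of clusters grows, it is usually read together with NMI and ARI.
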
License

Copyright (c) 2025 Emanuele Petriglia. This project is licensed under the MIT License. See the LICENSE file for details.
