PILLAR Benchmark Evaluation

This repository contains the code to evaluate the LINDDUN GO implementation of PILLAR.

Running the Benchmark

The benchmark has already been run with a variety of models, and the results are available in the benchmarks directory. To run the benchmark yourself, execute the following commands:

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

python main.py
python main.py --multiagent 3  # number of rounds of multi-agent discussion
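To compare several discussion lengths in one pass, you can wrap the two commands above in a small driver script. The following is a minimal sketch and is not part of the repository. The helper names `build_commands` and `run_sweep` are hypothetical, and the script assumes that `main.py` accepts any positive integer for `--multiagent`. Save it in the repository root and run it from inside the activated virtual environment, so that `sys.executable` resolves to the venv's Python. By default (`dry_run=True`) it only prints the commands it would run.

```python
import subprocess
import sys


def build_commands(max_rounds):
    """Return a plain main.py run plus one --multiagent run per round count 1..max_rounds.

    Assumption: main.py accepts any positive integer for --multiagent.
    """
    commands = [[sys.executable, "main.py"]]
    for rounds in range(1, max_rounds + 1):
        commands.append([sys.executable, "main.py", "--multiagent", str(rounds)])
    return commands


def run_sweep(max_rounds, dry_run=True):
    """Print each command, and execute it only when dry_run is False."""
    for command in build_commands(max_rounds):
        print(" ".join(command))
        if not dry_run:
            # check=True stops the sweep if any run exits with a non-zero status
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run_sweep(max_rounds=3, dry_run=True)
```

Set `dry_run=False` only after the printed commands look right. Each run may take a long time and, depending on the configured models, may incur API costs.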

After the evaluation, run

deactivate

to close the virtual environment.

Visualization of Results

To visualize the results, you can use the following command:

python results_viewer.py

To visualize the results as box plots, run:

python boxplot_viewer.py

License

This project is licensed under the Apache License, Version 2.0 - see the LICENSE file for details.
