LLMxCPG: Context-Aware Vulnerability Detection Through Code Property Graph-Guided Large Language Models

This repository contains the source code for LLMxCPG, a framework that combines Code Property Graphs (CPGs) with Large Language Models (LLMs) for vulnerability detection.

The core methodology is a two-phase process:

1. Slice Construction: an LLM (LLMxCPG-Q) generates CPGQL queries over a Code Property Graph to extract a minimal, relevant "slice" of code that may contain a vulnerability.
2. Vulnerability Detection: a second LLM (LLMxCPG-D) analyzes the extracted slice and classifies it as either vulnerable or safe.
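The two phases can be sketched as a small Python pipeline. This is a hypothetical illustration, not the code in `inference/`: `CodeSlice`, `construct_slice`, `detect`, and the three `fake_*` stubs are invented names. The stubs replace LLMxCPG-Q, Joern query execution, and LLMxCPG-D with simple string matching so the example runs without any model or CPG.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CodeSlice:
    """Hypothetical container for the code lines selected by CPG queries."""
    lines: List[str]


def construct_slice(
    source: str,
    generate_queries: Callable[[str], List[str]],
    run_query: Callable[[str, str], List[str]],
) -> CodeSlice:
    """Phase 1: ask the query model for queries, run each one, merge the results."""
    collected: List[str] = []
    for query in generate_queries(source):
        for line in run_query(source, query):
            if line not in collected:  # drop duplicates, keep first-seen order
                collected.append(line)
    return CodeSlice(lines=collected)


def detect(code_slice: CodeSlice, classify: Callable[[str], str]) -> str:
    """Phase 2: hand the slice text to the detection model and validate its label."""
    label = classify("\n".join(code_slice.lines))
    if label not in ("vulnerable", "safe"):
        raise ValueError(f"unexpected label: {label!r}")
    return label


# Hypothetical stubs so the sketch runs without LLMxCPG-Q, Joern, or LLMxCPG-D.
def fake_generate_queries(source: str) -> List[str]:
    return ["calls to strcpy", "declarations of buf"]


def fake_run_query(source: str, query: str) -> List[str]:
    needle = "strcpy" if "strcpy" in query else "buf["
    return [line for line in source.splitlines() if needle in line]


def fake_classify(slice_text: str) -> str:
    return "vulnerable" if "strcpy" in slice_text else "safe"


program = "char buf[8];\nint n = read_len();\nstrcpy(buf, input);\nreturn 0;"
code_slice = construct_slice(program, fake_generate_queries, fake_run_query)
print(code_slice.lines)  # ['strcpy(buf, input);', 'char buf[8];']
print(detect(code_slice, fake_classify))  # vulnerable
```

Passing the model and query calls in as functions keeps the two phases independent, so either side can be swapped without changing the control flow. Whether the real pipeline merges and deduplicates query results this way is an assumption.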

Repository Structure

```
.
├── baselines/      # Implementations of baseline models for comparison.
├── data/           # Information on datasets used.
├── inference/      # Scripts for running the LLMxCPG-Q and LLMxCPG-D models.
├── prompts/        # Prompt templates for query generation and classification.
├── queries/        # LLMxCPG-Q generation process and examples of generated CPGQL queries.
├── training/       # Scripts and configurations for fine-tuning the models.
├── LICENSE
├── README.md
├── example.env
├── pipeline.sh
└── requirements.txt
```

Getting Started

Models

Our fine-tuned models (LLMxCPG-Q and LLMxCPG-D) are available on Hugging Face in the 🤗 LLMxCPG Collection.

Prerequisites

- Docker
- Python 3.8+
- Joern, for CPG generation and querying (tested with v4.0.408)
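A quick way to confirm the prerequisites is a small Python check. This is a hypothetical helper, not part of the repository. It assumes the Docker CLI and Joern's launcher are on `PATH` as `docker` and `joern`, and it does not verify the Joern version, which should still be compared against v4.0.408 by hand.

```python
import shutil
import sys
from typing import Dict


def check_prerequisites() -> Dict[str, bool]:
    """Report whether each listed prerequisite appears to be available."""
    return {
        # Python 3.8 or newer is required.
        "python>=3.8": sys.version_info >= (3, 8),
        # Assumption: the Docker CLI is installed as `docker` on PATH.
        "docker": shutil.which("docker") is not None,
        # Assumption: Joern's launcher is on PATH as `joern`; version is not checked.
        "joern": shutil.which("joern") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name:12} {'ok' if ok else 'MISSING'}")
```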

Installation

Clone the repository:

```
git clone https://github.com/qcri/llmxcpg
cd llmxcpg
```

Install Python dependencies:
```
pip install -r requirements.txt
```

Training

The models can be fine-tuned using the scripts provided in the training/ directory.

- Query Generation Model (LLMxCPG-Q): fine-tuned from Qwen2.5-Coder-32B-Instruct.
- Detection Model (LLMxCPG-D): fine-tuned from QwQ-32B-Preview.

The training process uses the Unsloth framework and employs Low-Rank Adaptation (LoRA) for efficient fine-tuning. Refer to the scripts and configurations in the training/ directory for details.
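To show why LoRA makes fine-tuning 32B models tractable, the sketch below counts the parameters LoRA trains for one weight matrix and implements the adapted forward pass y = W x + (alpha / r) * B (A x) in plain Python. It illustrates standard LoRA only; it does not use or reproduce the Unsloth API, and the 5120 x 5120 matrix size and the ranks are illustrative values, not the configuration in `training/`.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def lora_trainable_params(d: int, k: int, r: int) -> int:
    """Parameters LoRA trains for one frozen d x k matrix at rank r.

    W (d x k) is frozen; only B (d x r) and A (r x k) are trained,
    giving r * (d + k) trainable values instead of d * k.
    """
    return r * (d + k)


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector, alpha: float, r: int) -> Vector:
    """y = W x + (alpha / r) * B (A x); computing A x first avoids forming B A."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


# Illustrative sizes only; these are not the settings used in training/.
d = k = 5120
for r in (8, 16, 64):
    trainable = lora_trainable_params(d, k, r)
    print(f"r={r}: {trainable:,} trainable of {d * k:,} ({trainable / (d * k):.2%})")

# In the standard LoRA initialization B starts at zero,
# so the adapted layer initially matches the frozen one.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]      # r x k = 1 x 2
B = [[0.0], [0.0]]    # d x r = 2 x 1
print(lora_forward(W, A, B, [1.0, 2.0], alpha=2.0, r=1))  # [1.0, 2.0]
```

Because only A and B receive gradients, optimizer state scales with r * (d + k) rather than d * k per adapted matrix, which is the source of LoRA's memory savings.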

Citation

If you use this codebase in your research, please cite the associated paper:

To appear in USENIX Security 2025
