Commit 00f20fb

Update to version 1.2.0 with comprehensive documentation updates
- Update all Maven modules to version 1.2.0
- Update README.md with new sourceDirectories XML configuration format
- Add comprehensive CHANGELOG.md entry for version 1.2.0 covering all major changes
- Remove redundant ScannerTest.kt file
- Update model performance data with latest evaluation results
- Document multi-directory scanning and enhanced false positive tracking
1 parent 90c2697 commit 00f20fb

7 files changed, 78 insertions(+), 119 deletions(-)

CHANGELOG.md

Lines changed: 29 additions & 0 deletions
@@ -5,6 +5,34 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## [1.2.0] - 2025-06-22
+
+### Added
+- **File Chunking System** - Advanced chunking for large files with 40-line chunks and 5-line overlap preservation
+- **False Positive Rate Tracking** - New metric showing percentage of clean code incorrectly flagged as vulnerable
+- **Multi-Directory Support** - Changed `sourceDirectory` to `sourceDirectories` list for scanning multiple directories
+- **Issue Deduplication** - Sophisticated deduplication system for chunk-based analysis preventing duplicate issues
+- **Enhanced Evaluation Metrics** - Added `DetectionMetrics` and `DetectionResults` models for comprehensive performance tracking
+- **Negative Test Cases** - Comprehensive 238-line Java file with false positive test scenarios
+- **Package Reorganization** - Moved classes to domain-specific packages: `files/`, `llm/`, `service/`, `util/`
+- **New Service Components** - Added `PromptGenerator`, `IssueDeduplicator`, `ScannerDefaults` for better code organization
+
+### Changed
+- **Detection Rate Calculation** - Fixed calculation to prevent values >100% by tracking unique matched expected issues
+- **False Positive Rate Logic** - Returns 100% when files cannot be analyzed (timeout/error scenarios)
+- **Code Structure** - Refactored `CodeAnalyzer` following Single Responsibility Principle
+- **Class Naming** - Renamed `FileScanner` to `FileFinder`, `AnalysisResultMapper` to `IssueParser`
+- **Configuration Management** - Extracted constants to `ScannerDefaults` object for better maintainability
+- **Evaluation Directory Structure** - Reorganized to `test-cases/positive/` and `test-cases/negative/`
+- **Model Performance** - Updated with latest evaluation results showing significant improvements:
+  - `ai/phi4:latest` now achieves 93.8% detection rate (up from 76.7%)
+  - Added zero false positive models: `ai/deepcoder-preview:latest`, `ai/mistral-nemo:latest`
+
+### Fixed
+- **Duplicate Scanner Icons** - Removed duplicate 🔍 emoji from evaluation output
+- **Expected Files Exclusion** - Added `**/expected/**` pattern to exclude JSON expected results from scanning
+- **Detection Rate Accuracy** - Fixed double-counting of issues in chunk-based analysis
+
 ## [1.1.0] - 2025-06-20

 ### Added
@@ -121,5 +149,6 @@ This is the first stable release of LLM Secret Scanner - an AI-powered security

 For detailed usage instructions, see the [README.md](README.md) file.

+[1.2.0]: https://github.com/CyclingBits/llm-secret-scanner/compare/v1.1.0...v1.2.0
 [1.1.0]: https://github.com/CyclingBits/llm-secret-scanner/compare/v1.0.0...v1.1.0
 [1.0.0]: https://github.com/CyclingBits/llm-secret-scanner/releases/tag/v1.0.0
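
The changelog entry above mentions 40-line chunks with a 5-line overlap and deduplication of issues reported from overlapping chunks. The Java sketch below illustrates one way such a scheme could work; it is not the project's implementation. `ChunkingSketch`, `Chunk`, `Issue`, and both method names are hypothetical, and treating two issues as duplicates when their file line and type match is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

final class ChunkingSketch {
    /** A chunk covering file lines startLine..endLine (1-based, inclusive). */
    record Chunk(int startLine, int endLine) {}

    /** Hypothetical issue shape: absolute file line plus a type label. */
    record Issue(int fileLine, String type) {}

    /** Splits a file of totalLines lines into overlapping chunks. */
    static List<Chunk> chunk(int totalLines, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("need chunkSize > overlap >= 0");
        }
        List<Chunk> chunks = new ArrayList<>();
        int step = chunkSize - overlap; // 35 for the 40/5 values in the changelog
        for (int start = 1; start <= totalLines; start += step) {
            int end = Math.min(start + chunkSize - 1, totalLines);
            chunks.add(new Chunk(start, end));
            if (end == totalLines) {
                break; // the last chunk already reaches the end of the file
            }
        }
        return chunks;
    }

    /** Drops exact duplicates (same line and type), keeping first-seen order. */
    static List<Issue> deduplicate(List<Issue> issues) {
        return new ArrayList<>(new LinkedHashSet<>(issues));
    }

    public static void main(String[] args) {
        System.out.println(chunk(100, 40, 5));
        // The same secret on line 38 is reported by the first and second chunk.
        List<Issue> raw = List.of(new Issue(38, "API key"), new Issue(38, "API key"));
        System.out.println(deduplicate(raw));
    }
}
```

For a 100-line file with a chunk size of 40 and an overlap of 5, the sketch produces the ranges 1-40, 36-75, and 71-100. Lines 36-40 and 71-75 are analyzed twice, which is why a deduplication step is needed after chunk-based analysis.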

README.md

Lines changed: 45 additions & 33 deletions
@@ -3,7 +3,7 @@
 > AI-powered security scanner that detects secrets, API keys, and sensitive data in source code using local Large Language Models.

 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
-[![Changelog](https://img.shields.io/badge/changelog-v1.1.0-blue.svg)](CHANGELOG.md)
+[![Changelog](https://img.shields.io/badge/changelog-v1.2.0-blue.svg)](CHANGELOG.md)
 [![Build Status](https://img.shields.io/badge/build-passing-brightgreen.svg)](https://github.com/cyclingbits/llm-secret-scanner)
 [![Test Coverage](https://img.shields.io/badge/coverage-80%25-green.svg)](https://github.com/cyclingbits/llm-secret-scanner)
 [![Java](https://img.shields.io/badge/Java-17+-blue.svg)](https://openjdk.java.net/)
@@ -16,10 +16,12 @@
 - 🔒 **Privacy-First** - All analysis happens locally, no data leaves your machine
 - 🎯 **Smart Detection** - Identifies API keys, passwords, certificates, database credentials, and more
 - 🧠 **Adaptability** - Can detect unusual secret patterns that would escape traditional regex-based scanners
+- 📄 **File Chunking** - Advanced chunking system for analyzing large files with overlapping context preservation
 - 🚀 **Maven Integration** - Seamlessly integrates with your build pipeline
 - ⚙️ **Highly Configurable** - Flexible file patterns, model selection, and timeout settings
 - 🐳 **Containerized** - Automatic Docker container lifecycle management
 - 🎨 **Beautiful Output** - Colorful, structured logging with emojis and clear issue reporting
+- 🔍 **False Positive Reduction** - Enhanced accuracy with sophisticated issue deduplication

 ## 📋 Requirements

@@ -62,9 +64,9 @@ You'll also need to authenticate with GitHub Packages. Add to your `~/.m2/settin
 #### Option B: Download from GitHub Releases (Recommended)
 ```bash
 # Download the latest release JARs
-wget https://github.com/cyclingbits/llm-secret-scanner/releases/latest/download/llm-secret-scanner-maven-plugin-1.1.0.jar
+wget https://github.com/cyclingbits/llm-secret-scanner/releases/latest/download/llm-secret-scanner-maven-plugin-1.2.0.jar
 # Install to local Maven repository
-mvn install:install-file -Dfile=llm-secret-scanner-maven-plugin-1.1.0.jar -DgroupId=net.cyclingbits -DartifactId=llm-secret-scanner-maven-plugin -Dversion=1.1.0 -Dpackaging=jar
+mvn install:install-file -Dfile=llm-secret-scanner-maven-plugin-1.2.0.jar -DgroupId=net.cyclingbits -DartifactId=llm-secret-scanner-maven-plugin -Dversion=1.2.0 -Dpackaging=jar
 ```

 #### Option C: Build from Source
@@ -82,7 +84,7 @@ Add the plugin to your `pom.xml` (minimal configuration):
 <plugin>
     <groupId>net.cyclingbits</groupId>
     <artifactId>llm-secret-scanner-maven-plugin</artifactId>
-    <version>1.1.0</version>
+    <version>1.2.0</version>
 </plugin>
 ```

@@ -99,33 +101,29 @@ The scanner supports various LLM models via [Docker Model Runner](https://hub.do
 ### 🎯 **Recommended Models**

 **`ai/phi4:latest` ⭐ (Default Choice)**
-With a 74.3% detection rate and 100% scan success, this **15B parameter** model offers the best balance of accuracy and performance. At **8.43 GB**, it provides excellent results in just 3m 50s analysis time, making it ideal for most use cases.
+With an outstanding **93.8%** detection rate, only **2.0%** false positives, and 100% scan success, this **15B parameter** model offers the best overall performance. At **8.43 GB**, it provides excellent results in just 1m 47s analysis time, making it ideal for most use cases.

-**`ai/llama3.2:latest` 🚀 (Fast & Lightweight)**
-Perfect for quick scans and resource-constrained environments. This **3B parameter** model delivers 70.4% detection rate with 100% reliability in only 1m 20s. At just **1.87 GB**, it's the smallest model that maintains high accuracy and speed.

 ### 📊 **All Available Models**

-| Model | Detection Rate | Scan Success | Analysis Time | Parameters | Context Window | Size | Best For |
-|-------|----------------|--------------|---------------|------------|----------------|------|-----------------------------------------------|
-| `ai/llama3.3:latest` | **82.7%** | 100% | 17m 26s | 70B | 131K tokens | 39.59 GB | Highest accuracy |
-| `ai/phi4:latest`| **74.3%** | 100% | 3m 50s | 15B | 16K tokens | 8.43 GB | **Default choice** |
-| `ai/llama3.2:latest` | **70.4%** | 100% | 1m 20s | 3B | 131K tokens | 1.87 GB | Fast & lightweight |
-| `ai/deepcoder-preview:latest` | **69.3%** | 100% | 11m 15s | 14B | 131K tokens | 8.37 GB | - |
-| `ai/mistral-nemo:latest` | **65.7%** | 100% | 3m 5s | 12B | 131K tokens | 6.96 GB | - |
-| `ai/llama3.1:latest` | **64.8%** | 100% | 2m 1s | 8B | 131K tokens | 4.58 GB | - |
-| `ai/qwq:latest` | **64.3%** | 52.9% | 88m 22s | 32B | 41K tokens | 18.48 GB | JSON generation errors and model API timeouts |
-| `ai/qwen3:latest` | **64.2%** | 76.5% | 21m 51s | 8B | 41K tokens | 4.68 GB | JSON generation errors |
-| `ai/qwen2.5:latest` | **60.5%** | 94.1% | 1m 49s | 7B | 33K tokens | 4.36 GB | JSON generation errors |
-| `ai/gemma3:latest` | **56.4%** | 100% | 1m 43s | 4B | 131K tokens | 2.31 GB | - |
-| `ai/gemma3-qat:latest` | **55.3%** | 100% | 1m 41s | 3.88B | 131K tokens | 2.93 GB | - |
-| `ai/deepseek-r1-distill-llama:latest` | **54.1%** | 100% | 4m 51s | 8B | 131K tokens | 4.58 GB | - |
-| `ai/mistral:latest` | **52.5%** | 100% | 2m 10s | 7B | 33K tokens | 4.07 GB | - |
-| `ai/smollm2:latest` | **0.0%** | 0% | 9m 44s | 360M | 8K tokens | 256.35 MB | JSON generation errors |
-
-> 📊 **Performance data** based on analysis of test fixtures with known vulnerabilities. All models available from [Docker Hub AI](https://hub.docker.com/u/ai).
+| Model | Detection Rate | False Positive Rate | Scan Success | Analysis Time | Parameters | Context Window | Size | Best For |
+|-------|----------------|---------------------|--------------|---------------|------------|----------------|------|-----------------------------------------------|
+| `ai/phi4:latest`| **93.8%** | 2.0% | 100% | 1m 47s | 15B | 16K tokens | 8.43 GB | **Default choice - highest accuracy** |
+| `ai/llama3.3:latest` | **90.6%** | 2.8% | 100% | 7m 59s | 70B | 131K tokens | 39.59 GB | Maximum accuracy for critical environments |
+| `ai/deepcoder-preview:latest` | **84.4%** | 0.0% | 100% | 7m 19s | 14B | 131K tokens | 8.37 GB | Code-specialized with zero false positives |
+| `ai/llama3.1:latest` | **75.0%** | 7.4% | 100% | 7m 19s | 8B | 131K tokens | 4.58 GB | Balanced performance |
+| `ai/mistral:latest` | **71.9%** | 2.8% | 100% | 1m 27s | 7B | 33K tokens | 4.07 GB | Fast scanning |
+| `ai/mistral-nemo:latest` | **68.8%** | 0.0% | 100% | 1m 11s | 12B | 131K tokens | 6.96 GB | Zero false positives |
+| `ai/qwen3:latest` | **68.8%** | 0.0% | 100% | 9m 34s | 8B | 41K tokens | 4.68 GB | Zero false positives |
+| `ai/gemma3:latest` | **65.6%** | 14.5% | 100% | 1m 40s | 4B | 131K tokens | 2.31 GB | Fast & lightweight |
+| `ai/gemma3-qat:latest` | **59.4%** | 9.9% | 100% | 1m 27s | 3.88B | 131K tokens | 2.93 GB | Fast & lightweight |
+| `ai/llama3.2:latest` | **56.3%** | 100.0% | 50% | 4m 14s | 3B | 131K tokens | 1.87 GB | ⚠️ High false positive rate |
+| `ai/deepseek-r1-distill-llama:latest` | **56.3%** | 100.0% | 50% | 3m 28s | 8B | 131K tokens | 4.58 GB | ⚠️ High false positive rate |
+
+> 📊 **Performance data** based on analysis of test fixtures with known vulnerabilities and clean code samples. All models available from [Docker Hub AI](https://hub.docker.com/u/ai).
 >
 > **Detection Rate** indicates the percentage of known security issues correctly identified by the model (with line number accuracy verification ±1).
+> **False Positive Rate** indicates the percentage of clean code incorrectly flagged as containing secrets.
 > **Scan Success** indicates the percentage of files that were successfully analyzed without errors (e.g., timeouts, JSON parsing failures).

 ## 📖 Usage Examples
@@ -145,9 +143,15 @@ mvn llm-secret-scanner:scan -Dscan.modelName=ai/llama3.2:latest
 mvn llm-secret-scanner:scan -Dscan.failOnError=true
 ```

-### Custom Source Directory
-```bash
-mvn llm-secret-scanner:scan -Dscan.sourceDirectory=./custom-src
+### Multiple Source Directories
+```xml
+<configuration>
+    <sourceDirectories>
+        <sourceDirectory>${project.basedir}/src/main</sourceDirectory>
+        <sourceDirectory>${project.basedir}/src/test</sourceDirectory>
+        <sourceDirectory>${project.basedir}/config</sourceDirectory>
+    </sourceDirectories>
+</configuration>
 ```

 ## ⚙️ Advanced Configuration
@@ -158,9 +162,11 @@ For more control, you can customize the plugin configuration:
 <plugin>
     <groupId>net.cyclingbits</groupId>
     <artifactId>llm-secret-scanner-maven-plugin</artifactId>
-    <version>1.1.0</version>
+    <version>1.2.0</version>
     <configuration>
-        <sourceDirectory>${project.basedir}/src</sourceDirectory>
+        <sourceDirectories>
+            <sourceDirectory>${project.basedir}/src</sourceDirectory>
+        </sourceDirectories>
         <includes>**/*.java,**/*.kt,**/*.properties,**/*.yml,**/*.env</includes>
         <excludes>**/target/**,**/test/**</excludes>
         <modelName>ai/phi4:latest</modelName>
@@ -189,7 +195,7 @@ For more control, you can customize the plugin configuration:

 | Parameter | Default | Description |
 |-----------|---------|-------------|
-| `sourceDirectory` | `${project.basedir}` | Directory to scan |
+| `sourceDirectories` | `${project.basedir}` | List of directories to scan |
 | `includes` | `**/*.java,**/*.kt,...` | File patterns to include |
 | `excludes` | `**/target/**` | File patterns to exclude |
 | `modelName` | `ai/phi4:latest` | LLM model to use |
@@ -264,15 +270,21 @@ mvn test
 Quick evaluation (Java files only, single model):
 ```bash
 cd evaluator
-mvn exec:java -Dexec.mainClass="net.cyclingbits.llmsecretscanner.evaluator.QuickEvaluator"
+mvn exec:java -Dexec.mainClass="net.cyclingbits.llmsecretscanner.evaluator.QuickEvaluation"
 ```

 Full evaluation (all file types, all models):
 ```bash
 cd evaluator
-mvn exec:java -Dexec.mainClass="net.cyclingbits.llmsecretscanner.evaluator.FullEvaluator"
+mvn exec:java -Dexec.mainClass="net.cyclingbits.llmsecretscanner.evaluator.FullEvaluation"
 ```

+The evaluator now includes comprehensive metrics:
+- **Detection Rate**: Percentage of known vulnerabilities correctly identified
+- **False Positive Rate**: Percentage of clean code incorrectly flagged as vulnerable
+- **Scan Success Rate**: Percentage of files successfully analyzed without errors
+- **Performance Timing**: Analysis time for each model
+
 ## 🚨 What It Detects

 The LLM scanner is trained to identify various types of sensitive information:
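
The Detection Rate described in this README diff (known issues matched with a line tolerance of ±1), together with the changelog's fix of counting unique matched expected issues, can be illustrated with a short Java sketch. This is an illustration under stated assumptions, not the evaluator's code: `DetectionRateSketch` and `detectionRate` are hypothetical names, issues are reduced to line numbers within a single file, and returning 0 for an empty expected list is an arbitrary choice.

```java
import java.util.List;

final class DetectionRateSketch {
    /** Line tolerance taken from the README note ("±1"). */
    static final int LINE_TOLERANCE = 1;

    /**
     * Percentage of expected issue lines that have at least one reported
     * line within the tolerance. Each expected issue counts at most once,
     * so repeated reports of the same secret cannot push the rate past 100%.
     */
    static double detectionRate(List<Integer> expectedLines, List<Integer> reportedLines) {
        if (expectedLines.isEmpty()) {
            return 0.0; // assumption: nothing to detect is reported as 0%
        }
        long matched = expectedLines.stream()
                .filter(expected -> reportedLines.stream()
                        .anyMatch(reported -> Math.abs(reported - expected) <= LINE_TOLERANCE))
                .count();
        return 100.0 * matched / expectedLines.size();
    }

    public static void main(String[] args) {
        // Expected secrets on lines 10, 20, 30; the model reports line 10
        // three times (once off by one), line 20 once, and a stray line 99.
        System.out.println(detectionRate(List.of(10, 20, 30), List.of(11, 11, 10, 20, 99)));
    }
}
```

In the `main` example two of the three expected issues are matched, so the rate is two thirds regardless of how many times line 10 is reported. Counting reported issues instead of unique expected ones would give four matches out of three and a rate above 100%, which is the failure mode the changelog says was fixed.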

core/pom.xml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,7 @@
55
<parent>
66
<groupId>net.cyclingbits</groupId>
77
<artifactId>llm-secret-scanner-parent</artifactId>
8-
<version>1.1.0</version>
8+
<version>1.2.0</version>
99
</parent>
1010

1111
<artifactId>llm-secret-scanner-core</artifactId>

core/src/test/kotlin/net/cyclingbits/llmsecretscanner/core/ScannerTest.kt

Lines changed: 0 additions & 82 deletions
This file was deleted.

evaluator/pom.xml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,7 @@
55
<parent>
66
<groupId>net.cyclingbits</groupId>
77
<artifactId>llm-secret-scanner-parent</artifactId>
8-
<version>1.1.0</version>
8+
<version>1.2.0</version>
99
</parent>
1010

1111
<artifactId>llm-secret-scanner-evaluator</artifactId>

maven-plugin/pom.xml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,7 @@
55
<parent>
66
<groupId>net.cyclingbits</groupId>
77
<artifactId>llm-secret-scanner-parent</artifactId>
8-
<version>1.1.0</version>
8+
<version>1.2.0</version>
99
</parent>
1010

1111
<artifactId>llm-secret-scanner-maven-plugin</artifactId>

pom.xml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,7 @@
44

55
<groupId>net.cyclingbits</groupId>
66
<artifactId>llm-secret-scanner-parent</artifactId>
7-
<version>1.1.0</version>
7+
<version>1.2.0</version>
88
<packaging>pom</packaging>
99

1010
<name>LLM Secret Scanner Parent</name>
