Commit 9cd046f

Update README.md
1 parent 7704b1c commit 9cd046f

1 file changed: 67 additions, 20 deletions

README.md

# .NET Bad Word Detector
This is a fast and robust library that detects offensive language within text strings. It currently supports English, Portuguese, and Spanish.

## How It Works

This library uses a logistic regression [ML.NET](https://dotnet.microsoft.com/en-us/apps/machinelearning-ai/ml-dotnet) model trained on thousands of human-labeled words. The trained model is embedded as a resource in this library and is consulted on every prediction.
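
The README does not describe which features the ML.NET model extracts from a word, so the following is only a toy illustration of what consulting a logistic regression model involves: each feature found in the word contributes a learned weight, the weights and a bias are summed into a score z, and the sigmoid function maps z to a probability between 0 and 1. The character-trigram features, the weight values, and the `Trigrams` and `Score` helpers are assumptions made for this sketch, not the library's trained model.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical weights for a few character trigrams. A real model learns
// its weights from labeled data; these values are made up for illustration.
var weights = new Dictionary<string, double>
{
    ["bad"] = 2.5,
    ["ood"] = -1.0,
};
const double Bias = -1.5; // hypothetical intercept

// Overlapping character trigrams, e.g. "word" -> "wor", "ord".
IEnumerable<string> Trigrams(string word)
{
    for (int i = 0; i + 3 <= word.Length; i++)
        yield return word.Substring(i, 3);
}

// Logistic regression: z = bias + sum of feature weights, p = 1 / (1 + e^-z).
double Score(string word)
{
    double z = Bias;
    foreach (var gram in Trigrams(word.ToLowerInvariant()))
        if (weights.TryGetValue(gram, out var w))
            z += w;
    return 1.0 / (1.0 + Math.Exp(-z));
}

double bad = Score("badword");   // z = 1.0, so p is about 0.73
double good = Score("goodword"); // z = -2.5, so p is about 0.08
Console.WriteLine($"badword: {bad:F3}, goodword: {good:F3}");
```

A trained model uses many more features, but for logistic regression the prediction step keeps this shape: a weighted sum followed by the sigmoid.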
## Why Use This Library?
Unlike .NET profanity detection libraries that rely solely on static, hard-coded lists of bad words, such as [ProfanityDetector](https://github.com/stephenhaunts/ProfanityDetector) (which uses [this list stored in memory](https://github.com/stephenhaunts/ProfanityDetector/blob/main/ProfanityFilter/ProfanityFilter/ProfanityList.cs)), this library uses a machine learning model that can detect creative character substitutions and misspellings (e.g., "h0us3" instead of "house"). This makes it much harder to bypass.
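
To see why a static list is easy to bypass, here is a minimal list-based check in the style of such libraries. The `blockList` contents and the `IsListed` helper are illustrative assumptions, with "house" standing in for an offensive word as in the example above. Exact matching catches the listed spelling, but a single character substitution slips through.

```csharp
using System;
using System.Collections.Generic;

// Illustrative hard-coded list; "house" stands in for an offensive word.
var blockList = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "house" };

// List-based detection: a word is flagged only if it appears verbatim.
bool IsListed(string word) => blockList.Contains(word);

Console.WriteLine(IsListed("house")); // True
Console.WriteLine(IsListed("HOUSE")); // True (case-insensitive comparer)
Console.WriteLine(IsListed("h0us3")); // False: the substitution defeats exact matching
```

Catching "h0us3" this way would require adding that spelling, and every other variant, to the list by hand.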

## Performance

In the benchmark below, a single prediction with this library was about **618 times faster** than with the most downloaded .NET package for detecting profanity. For 100 successive predictions, it was approximately **24 times faster**.

| Package                | 1 Prediction | 10 Predictions | 100 Predictions |
| ---------------------- | ------------ | -------------- | --------------- |
| .NET Bad Word Detector | 0.0462 ms    | 1.5508 ms      | 4.1887 ms       |
| ProfanityDetector      | 28.5823 ms   | 42.4606 ms     | 102.0750 ms     |

**PC specs:** Dell Inspiron 13, i7 8th gen, 16 GB RAM.
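
The speed-up figures above follow from dividing ProfanityDetector's time by this library's time in each column of the table:

```math
\frac{28.5823}{0.0462} \approx 618.7, \qquad
\frac{42.4606}{1.5508} \approx 27.4, \qquad
\frac{102.0750}{4.1887} \approx 24.4
```

The advantage is therefore largest for a single prediction and narrows as the number of predictions grows.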

## Installation

```bash
dotnet add package DotnetBadWordDetector
```

## How to Use

### Create the detector

```csharp
using DotnetBadWordDetector;

// English only (default)
var detector = new ProfanityDetector();

// Or load all supported languages: English, Spanish, and Portuguese
var detectorAll = new ProfanityDetector(allLocales: true);
```

---

### Check if a word is offensive

```csharp
if (detector.IsProfane("example")) {
    // Word is classified as offensive
}
```

---

### Check if a phrase contains any offensive words

```csharp
if (detector.IsPhraseProfane("this is an example")) {
    // Phrase contains at least one offensive word
}
```
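
The comment in the phrase example states the rule: a phrase is flagged when at least one of its words is. A minimal sketch of that rule, assuming simple whitespace-and-punctuation tokenization; `IsWordFlagged` is a hypothetical stand-in for a word-level check such as `IsProfane`, and `IsPhraseFlagged` is not the library's internal code.

```csharp
using System;
using System.Linq;

// Hypothetical word-level check; "darn" stands in for an offensive word.
bool IsWordFlagged(string word) =>
    word.Equals("darn", StringComparison.OrdinalIgnoreCase);

// Tokenize on whitespace and common punctuation (an assumption), then
// flag the phrase if any single token is flagged.
bool IsPhraseFlagged(string phrase) =>
    phrase.Split(new[] { ' ', '\t', '\n', ',', '.', '!', '?' },
                 StringSplitOptions.RemoveEmptyEntries)
          .Any(IsWordFlagged);

Console.WriteLine(IsPhraseFlagged("well, darn it!"));     // True
Console.WriteLine(IsPhraseFlagged("this is an example")); // False
```

Because each word is judged on its own, a rule like this never flags a phrase built only from inoffensive words, which matches the context-free limitation described under Notes.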

---

### Get the probability that a word or phrase is offensive

```csharp
float probWord = detector.GetProfanityProbability("example");
float probPhrase = detector.GetPhraseProfanityProbability("this is an example");
```
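
Both methods return a `float`. The README does not state which cut-off `IsProfane` applies internally, so an application that needs a stricter or looser policy could apply its own threshold to the probability. A minimal sketch, where the `ExceedsThreshold` helper, the 0.5 default, and the sample value 0.72 are all assumptions:

```csharp
using System;

// Hypothetical helper: turn a probability into a yes/no decision.
// A higher threshold flags fewer words (fewer false positives, more misses).
static bool ExceedsThreshold(float probability, float threshold = 0.5f)
{
    if (probability is < 0f or > 1f)
        throw new ArgumentOutOfRangeException(nameof(probability));
    return probability >= threshold;
}

// Stand-in for a value returned by detector.GetProfanityProbability(...).
float prob = 0.72f;
Console.WriteLine(ExceedsThreshold(prob));       // True with the default cut-off
Console.WriteLine(ExceedsThreshold(prob, 0.9f)); // False with a stricter cut-off
```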

---

### Mask offensive words in a phrase

```csharp
string cleanText = detector.MaskProfanity("this is an example", '*');
```

This replaces each detected offensive word with the character passed as the second argument (`'*'` here).
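
The README does not specify whether a masked word keeps its original length. A minimal sketch of one common approach, which replaces each flagged word with a run of the mask character of the same length and leaves spaces and punctuation untouched; `IsWordFlagged` and `Mask` are hypothetical stand-ins, not the library's implementation.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical word-level check; "darn" stands in for an offensive word.
bool IsWordFlagged(string word) =>
    word.Equals("darn", StringComparison.OrdinalIgnoreCase);

// Match each run of word characters and mask it only if it is flagged.
string Mask(string text, char maskChar) =>
    Regex.Replace(text, @"\w+", m =>
        IsWordFlagged(m.Value) ? new string(maskChar, m.Length) : m.Value);

Console.WriteLine(Mask("well, darn it!", '*')); // well, **** it!
```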

## Model Quality

```text
Model quality metrics evaluation
--------------------------------
Accuracy: 98.43%
AUC: 99.49%
F1 Score: 97.25%
```
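
For reference, these are the standard definitions behind the reported metrics, treating "offensive" as the positive class. TP, TN, FP, and FN count true positives, true negatives, false positives, and false negatives on the evaluation set, and P and R are precision and recall:

```math
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2PR}{P + R}
```

AUC is the area under the ROC curve; equivalently, it is the probability that a randomly chosen offensive example receives a higher score than a randomly chosen inoffensive one.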

## Notes

This library is not perfect: it is not 100% accurate, and it is context-free, meaning it cannot detect profane phrases made up of individually inoffensive words.
Definitions of "profanity" vary across cultures. This library relies on human-labeled data, which may not match every context.

## Tips

* Keep a single detector instance in memory and reuse it for better performance; it uses very little memory (less than 100 KB).
* Be cautious when enabling all locales together, as this may produce more false positives in multilingual text.
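
One way to follow the first tip in a multi-threaded app is `Lazy<T>`, whose default thread-safety mode runs the factory only once even when several threads request the value at the same time. So that this sketch runs without the package, a counting stand-in factory replaces `() => new ProfanityDetector()`; the `constructions` counter exists only for illustration.

```csharp
using System;
using System.Threading.Tasks;

// Counts how many times the factory runs. In an app the factory would
// be () => new ProfanityDetector(); here a plain object stands in.
int constructions = 0;
var shared = new Lazy<object>(() =>
{
    constructions++;
    return new object();
}); // default mode (ExecutionAndPublication): the factory runs at most once

// Many concurrent callers all receive the same instance.
Parallel.For(0, 100, i => { _ = shared.Value; });

Console.WriteLine(constructions); // 1
```

A plain `static readonly` field is a simpler alternative when creating the detector at startup is acceptable.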

## Contributing

Contributions are welcome! Feel free to open an issue or submit a pull request with suggestions for new features, languages, or improvements.
