Commit 1c5ef98

FelipeLuzLuz
authored and committed
Update README.md
1 parent 691813c commit 1c5ef98

2 files changed, 61 additions and 5 deletions

README.md

Lines changed: 58 additions & 1 deletion
@@ -1 +1,58 @@
1-
# dotnet-bad-word-detector
1+
# .NET Bad Word Detector
2+
3+
This is a fast and robust library that detects offensive language in text strings. It currently supports English only; more languages will be added soon.
4+
5+
## How It Works
6+
7+
This library uses a logistic regression [ML.NET](https://dotnet.microsoft.com/en-us/apps/machinelearning-ai/ml-dotnet) model trained on thousands of human-labeled words. The trained model is embedded in the library as a resource and is consulted on every prediction.
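As a rough illustration of what a logistic regression classifier computes, here is the standard textbook form. It is not taken from this library's source: the feature vector $x$, weights $w$, and bias $b$ are symbols assumed for the sketch, and the README does not say how the input text is turned into features.

$$
P(\text{profane} \mid x) = \sigma(w^\top x + b) = \frac{1}{1 + e^{-(w^\top x + b)}}
$$

A text would then be flagged when this probability crosses a decision threshold; 0.5 is the usual default and is assumed here.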
8+
9+
## Why use this library?
10+
11+
Until now, .NET profanity detection libraries have relied on hard-coded lists of bad words. For instance, [ProfanityDetector](https://github.com/stephenhaunts/ProfanityDetector) uses this [list stored in memory](https://github.com/stephenhaunts/ProfanityDetector/blob/main/ProfanityFilter/ProfanityFilter/ProfanityList.cs). This approach has obvious problems: although list-based libraries can be performant, they are not comprehensive, and they are easily defeated by misspellings and by people creatively swapping letters for look-alike characters to produce new spellings that still read as curse words (e.g. house and h0us3).
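The weakness of exact list matching can be sketched in a few lines. The class below is a hypothetical stand-in written for this illustration, not the code of ProfanityDetector or any other real package, and it uses "house" as the placeholder blocked word from the example above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a list-based filter. This is NOT the code of any
// real package; "house" is only the placeholder blocked word from the example.
public static class ListBasedFilter
{
    private static readonly HashSet<string> Blocked =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "house" };

    // Returns true only when a space-separated token matches a list entry
    // (ignoring case).
    public static bool IsBlocked(string text)
    {
        foreach (var token in text.Split(' ', StringSplitOptions.RemoveEmptyEntries))
        {
            if (Blocked.Contains(token))
            {
                return true;
            }
        }
        return false;
    }
}

// ListBasedFilter.IsBlocked("my House")  -> true  (listed word, case-insensitive)
// ListBasedFilter.IsBlocked("my h0us3")  -> false (obfuscated spelling slips through)
```

Every new spelling has to be added to the list by hand, which is the maintenance burden a trained model is meant to avoid.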
12+
13+
## Performance
14+
15+
For a single prediction, this library was about 618 times faster than the most downloaded .NET package for detecting profanity (ProfanityDetector in the table below). For 100 successive predictions, it was around 24 times faster.
16+
17+
| Package | 1 Prediction | 10 Predictions | 100 Predictions |
18+
|------------------------|--------------|----------------|-----------------|
19+
| .NET Bad Word Detector | 0.0462 ms | 1.5508 ms | 4.1887 ms |
20+
| ProfanityDetector | 28.5823 ms | 42.4606 ms | 102.0750 ms |
21+
22+
Test machine: Dell Inspiron 13, 8th-gen Intel Core i7, 16 GB RAM.
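The speedup figures quoted above follow directly from the table. The sketch below recomputes them; the class and member names are made up for this illustration, and the timings are copied from the table rather than measured again.

```csharp
using System;

// Timings in milliseconds, copied from the table above (not re-measured).
// The class and member names are hypothetical, chosen for this sketch.
public static class Speedup
{
    public static readonly int[] BatchSizes = { 1, 10, 100 };
    public static readonly double[] DetectorMs = { 0.0462, 1.5508, 4.1887 };
    public static readonly double[] ListBasedMs = { 28.5823, 42.4606, 102.0750 };

    // Ratio of the list-based package's time to this library's time
    // for the table column at index i.
    public static double Ratio(int i) => ListBasedMs[i] / DetectorMs[i];
}

// Speedup.Ratio(0) is about 618.7 (1 prediction)
// Speedup.Ratio(1) is about 27.4  (10 predictions)
// Speedup.Ratio(2) is about 24.4  (100 predictions)
```

The advantage narrows as the batch grows, from roughly 619x for one prediction to roughly 24x for one hundred.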
23+
24+
## How to install
25+
26+
```bash
27+
dotnet add package DotnetBadWordDetector
28+
```
29+
30+
## How to use it
31+
32+
```csharp
33+
using DotnetBadWordDetector;
34+
35+
var detector = new ProfanityDetector();
36+
if (detector.IsProfane("foo bar")) {
37+
    // do something
38+
}
39+
```
40+
For best performance, keep the detector loaded in memory rather than recreating it; it uses very little memory (less than 100 KB).
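One way to keep a single detector alive for the whole process is a lazily created shared instance. The helper below is a hypothetical sketch and not part of the library. It serializes calls with a lock because the detector calls an ML.NET prediction engine internally (see `_predictionEngine.Predict` in the source diff), and ML.NET documents `PredictionEngine` as not thread-safe; in a single-threaded program the lock is unnecessary.

```csharp
using System;

// Hypothetical helper (not part of the library): creates one instance on first
// use and funnels every call through a lock.
public sealed class SharedInstance<T> where T : class
{
    private readonly Lazy<T> _lazy;
    private readonly object _gate = new object();

    public SharedInstance(Func<T> factory)
    {
        _lazy = new Lazy<T>(factory);
    }

    // Runs the given function against the single shared instance.
    public TResult Use<TResult>(Func<T, TResult> action)
    {
        lock (_gate)
        {
            return action(_lazy.Value);
        }
    }
}

// Assumed usage with this library's ProfanityDetector:
//   static readonly SharedInstance<ProfanityDetector> Detector =
//       new SharedInstance<ProfanityDetector>(() => new ProfanityDetector());
//   bool profane = Detector.Use(d => d.IsProfane("foo bar"));
```

In an application that already uses dependency injection, registering the detector as a singleton would serve the same purpose.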
41+
## Accuracy, AUC and F1 score
42+
43+
```text
44+
Model quality metrics evaluation
45+
--------------------------------
46+
Accuracy: 98.43%
47+
Auc: 99.49%
48+
F1Score: 97.25%
49+
```
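For reference, these are the standard definitions of the reported metrics. The README does not describe the evaluation set, and treating "profane" as the positive class is an assumption made here. TP, TN, FP, and FN are the counts of true positives, true negatives, false positives, and false negatives.

$$
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}
$$

$$
F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
$$

AUC is the area under the ROC curve, that is, the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one.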
50+
51+
## Caveat
52+
53+
This library is not perfect: it is not 100% precise, and it is context-free, so it cannot detect profane phrases composed of individually decent words. People also disagree on what counts as profane.
54+
55+
56+
57+
58+

src/DotnetBadWordDetector/ProfanityDetector.cs

Lines changed: 3 additions & 4 deletions
@@ -2,7 +2,6 @@
2 2
using DotnetBadWordDetector.Model;
3 3

4 4
namespace DotnetBadWordDetector;
5-
6 5
public class ProfanityDetector
7 6
{
8 7

@@ -29,7 +28,7 @@ private Stream GetModelStream()
29 28
}
30 29

31 30
/// <summary>
32-
/// Predicts if the phrase is profane
31+
/// Predicts if the word or short sentence is profane
33 32
/// </summary>
34 33
/// <param name="word"></param>
35 34
/// <returns>true if classified as profane</returns>
@@ -40,11 +39,11 @@ public bool IsProfane(string word)
40 39
}
41 40

42 41
/// <summary>
43-
///
42+
/// Gets the probability of a given word or short sentence being profane
44 43
/// </summary>
45 44
/// <param name="word"></param>
46 45
/// <returns> 0 < prediction < 1</returns>
47-
public float GetProbability(string word)
46+
public float GetProfanityProbability(string word)
48 47
{
49 48
var obj = new BadWord { Word = word };
50 49
return _predictionEngine.Predict(obj).Probability;
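Because `GetProfanityProbability` exposes the raw score, a caller can apply its own cut-offs instead of relying on the yes/no answer from `IsProfane`. The helper below is a hypothetical sketch, not part of the library, and both thresholds are illustrative values rather than recommendations from the author.

```csharp
using System;

// Hypothetical outcome type for this sketch (not part of the library).
public enum Verdict { Clean, NeedsReview, Profane }

// Hypothetical policy helper: turns a probability into one of three outcomes.
// Both default thresholds are illustrative assumptions.
public static class ProbabilityPolicy
{
    public static Verdict Classify(float probability, float reviewAt = 0.5f, float blockAt = 0.9f)
    {
        if (reviewAt > blockAt)
        {
            throw new ArgumentException("reviewAt must not exceed blockAt");
        }
        if (probability >= blockAt) return Verdict.Profane;
        if (probability >= reviewAt) return Verdict.NeedsReview;
        return Verdict.Clean;
    }
}

// Assumed usage, given a ProfanityDetector instance named detector:
//   Verdict v = ProbabilityPolicy.Classify(detector.GetProfanityProbability("foo bar"));
```

A middle "needs review" band is one way to handle borderline scores, since people disagree on what counts as profane.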
