|
1 | 1 | # .NET Bad Word Detector
|
2 | 2 |
|
3 |
| -This is a fast and robust library that detects offensive language within text strings. It currently supports only English language, more languages will be added soon. |
| 3 | +This is a fast and robust library that detects offensive language within text strings. It currently supports English, Portuguese, and Spanish. |
4 | 4 |
|
5 | 5 | ## How It Works
|
6 | 6 |
|
7 |
| -This library uses a logistic regression [ML.NET](https://dotnet.microsoft.com/en-us/apps/machinelearning-ai/ml-dotnet) model trained on thousands of human-labeled words. The trained model then was loaded as a resource for this lib and it is consulted on every new prediction. |
| 7 | +This library uses a logistic regression [ML.NET](https://dotnet.microsoft.com/en-us/apps/machinelearning-ai/ml-dotnet) model trained on thousands of human-labeled words. The trained model is embedded as a resource in this library and is consulted on every prediction. |
8 | 8 |
|
9 |
| -## Why to use this library? |
| 9 | +## Why Use This Library? |
10 | 10 |
|
11 |
| -Up to this moment all .NET profanity detection libraries use hard-coded lists of bad words to detect profanity, for instance, [ProfanityDetector](https://github.com/stephenhaunts/ProfanityDetector) uses this [list stored in memory](https://github.com/stephenhaunts/ProfanityDetector/blob/main/ProfanityFilter/ProfanityFilter/ProfanityList.cs), there are obvious glaring issues with this approach, and while they might be performant, these list based libraries are not comprehensive, they are easily outperformed by misspelling and by the human creativity to replace letters for meaningless chars creating new words that are perceived as curse words (e.g. house and h0us3). |
| 11 | +Unlike other .NET profanity detection libraries, which rely solely on static hard-coded lists of bad words (for instance, [ProfanityDetector](https://github.com/stephenhaunts/ProfanityDetector) uses [this list stored in memory](https://github.com/stephenhaunts/ProfanityDetector/blob/main/ProfanityFilter/ProfanityFilter/ProfanityList.cs)), this library uses a machine learning approach that can detect creative substitutions and misspellings (e.g., "h0us3" instead of "house"). This makes it much harder to bypass. |
12 | 12 |
|
13 | 13 | ## Performance
|
14 | 14 |
|
15 |
| -In a single prediction this library was 618 times faster than the most downloaded .NET package for detecting profanity. For 100 successive predictions it was around 24 times faster. |
| 15 | +In benchmarks, a single prediction with this library was about **618 times faster** than with the most downloaded .NET package for detecting profanity. For 100 successive predictions, it was approximately **24 times faster**. |
16 | 16 |
|
17 |
| -| Package | 1 Prediction | 10 Predicitons | 100 predictions | |
18 |
| -|------------------------|--------------|----------------|-----------------| |
19 |
| -| .Net Bad Word Detector | 0.0462 ms | 1.5508 ms | 4.1887 ms | |
| 17 | +| Package | 1 Prediction | 10 Predictions | 100 Predictions | |
| 18 | +| ---------------------- | ------------ | -------------- | --------------- | |
| 19 | +| .NET Bad Word Detector | 0.0462 ms | 1.5508 ms | 4.1887 ms | |
20 | 20 | | ProfanityDetector | 28.5823 ms | 42.4606 ms | 102.0750 ms |
|
21 | 21 |
|
22 |
| -PC specs: Dell Inspiron 13, I7 8th gen, 16 GB. |
| 22 | +**PC specs:** Dell Inspiron 13, i7 8th gen, 16 GB RAM. |
23 | 23 |
|
24 |
| -## How to install |
| 24 | +## Installation |
25 | 25 |
|
26 | 26 | ```bash
|
27 | 27 | dotnet add package DotnetBadWordDetector
|
28 | 28 | ```
|
29 | 29 |
|
30 |
| -## How to use it |
| 30 | +## How to Use |
| 31 | + |
| 32 | +### Create the detector |
31 | 33 |
|
32 | 34 | ```csharp
|
| 35 | +using DotnetBadWordDetector; |
| 36 | + |
| 37 | +// English only (default) |
33 | 38 | var detector = new ProfanityDetector();
|
34 | 39 |
|
35 |
| -if(detector.IsProfane("foo bar")){ |
36 |
| - //do something |
| 40 | +// Or load all supported languages: English, Spanish, and Portuguese |
| 41 | +var detectorAll = new ProfanityDetector(allLocales: true); |
| 42 | +``` |
| 43 | + |
| 44 | +--- |
| 45 | + |
| 46 | +### Check if a word is offensive |
| 47 | + |
| 48 | +```csharp |
| 49 | +if (detector.IsProfane("example")) { |
| 50 | + // Word is classified as offensive |
37 | 51 | }
|
| 52 | +``` |
| 53 | + |
| 54 | +--- |
| 55 | + |
| 56 | +### Check if a phrase contains any offensive words |
38 | 57 |
|
| 58 | +```csharp |
| 59 | +if (detector.IsPhraseProfane("this is an example")) { |
| 60 | + // Phrase contains at least one offensive word |
| 61 | +} |
39 | 62 | ```
|
40 |
| -It is strongly suggested to keep the library always loaded in memory to increase its performance, it uses very little memory (less than 100 KB). |
41 |
| -## Accuracy, AUC and F1 score |
| 63 | + |
| 64 | +--- |
| 65 | + |
| 66 | +### Get the probability that a word or phrase is offensive |
| 67 | + |
| 68 | +```csharp |
| 69 | +float probWord = detector.GetProfanityProbability("example"); |
| 70 | +float probPhrase = detector.GetPhraseProfanityProbability("this is an example"); |
| 71 | +``` |
| 72 | + |
| 73 | +--- |
| 74 | + |
| 75 | +### Mask offensive words in a phrase |
| 76 | + |
| 77 | +```csharp |
| 78 | +string cleanText = detector.MaskProfanity("this is an example", '*'); |
| 79 | +``` |
| 80 | + |
| 81 | +This will replace any detected offensive words with asterisks or your chosen character. |
| 82 | + |
| 83 | +## Model Quality |
42 | 84 |
|
43 | 85 | ```bash
|
44 | 86 | Model quality metrics evaluation
|
45 | 87 | --------------------------------
|
46 | 88 | Accuracy: 98.43%
|
47 |
| -Auc: 99.49% |
48 |
| -F1Score: 97.25% |
| 89 | +AUC: 99.49% |
| 90 | +F1 Score: 97.25% |
49 | 91 | ```
|
50 | 92 |
|
51 |
| -## Caveat |
52 |
| - |
53 |
| -This library is not perfect, it is not 100% precise, and it is context-free, e.g. it can not detect profane phrases consisted of decent words. Also people diverge on what is considered profane. |
| 93 | +## Notes |
54 | 94 |
|
| 95 | +This library is not perfect: it is not 100% accurate and it is context-free, meaning it cannot detect profane phrases made of individually inoffensive words. |
| 96 | +Definitions of "profanity" can vary by culture. This library uses human-labeled data, which might not align perfectly with all contexts. |
55 | 97 |
|
| 98 | +## Tips |
56 | 99 |
|
| 100 | +* Keep the detector instance in memory for better performance; it uses very little memory (less than 100 KB). |
| 101 | +* Be cautious when enabling all locales together, as it may produce more false positives in multilingual texts. |
57 | 102 |
|
| 103 | +## Contributing |
58 | 104 |
|
| 105 | +Contributions are welcome! Feel free to open an issue or submit a pull request with suggestions for new features, languages, or improvements. |
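
The tip about keeping the detector instance in memory could be followed along the lines below. This is only a sketch under stated assumptions: `SharedProfanityDetector`, `Gate`, and `ContainsProfanity` are hypothetical names that are not part of the library, and the lock is a cautious choice because this README does not say whether one instance may be called from several threads at once. Only `ProfanityDetector` and `IsPhraseProfane` come from the usage examples in this README.

```csharp
using System;
using DotnetBadWordDetector;

// Hypothetical wrapper (not part of the library): creates one detector
// and reuses it for every check instead of constructing a new one per call.
public static class SharedProfanityDetector
{
    // Lazy<T> runs the factory once, on first access, and is thread-safe by default.
    private static readonly Lazy<ProfanityDetector> Detector =
        new Lazy<ProfanityDetector>(() => new ProfanityDetector());

    // Assumption: concurrent calls on a single instance may not be safe,
    // so access is serialized with a lock.
    private static readonly object Gate = new object();

    public static bool ContainsProfanity(string phrase)
    {
        lock (Gate)
        {
            return Detector.Value.IsPhraseProfane(phrase);
        }
    }
}
```

Callers would then use `SharedProfanityDetector.ContainsProfanity("this is an example")` wherever a check is needed. If the library turns out to be safe for concurrent use, the lock can be dropped.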