|
1 |
| -# dotnet-bad-word-detector |
| 1 | +# .NET Bad Word Detector |
| 2 | + |
| 3 | +This is a fast and robust library that detects offensive language within text strings. It currently supports only English; more languages will be added soon. |
| 4 | + |
| 5 | +## How It Works |
| 6 | + |
| 7 | +This library uses a logistic regression [ML.NET](https://dotnet.microsoft.com/en-us/apps/machinelearning-ai/ml-dotnet) model trained on thousands of human-labeled words. The trained model is embedded in the library as a resource and is consulted on every prediction. |
| 8 | + |
| 9 | +## Why use this library? |
| 10 | + |
| 11 | +At the time of writing, all .NET profanity detection libraries use hard-coded lists of bad words. For instance, [ProfanityDetector](https://github.com/stephenhaunts/ProfanityDetector) uses this [list stored in memory](https://github.com/stephenhaunts/ProfanityDetector/blob/main/ProfanityFilter/ProfanityFilter/ProfanityList.cs). This approach has obvious issues: list-based libraries may be performant, but they are not comprehensive. They are easily defeated by misspellings and by people's creativity in replacing letters with other characters, which produces new strings that are still perceived as curse words (e.g. house and h0us3). |
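
To make the weakness concrete, here is a minimal sketch of a list lookup. It is a deliberately simplified illustration, not the code of any real library: the `ListBasedFilter` class and its one-word list are hypothetical, and "house" only stands in for a blocked word, as in the example above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified list-based filter used only for illustration.
public static class ListBasedFilter
{
    // Placeholder list: "house" stands in for a blocked word.
    private static readonly HashSet<string> Blocked =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "house" };

    // Returns true when any whitespace-separated token is in the list.
    public static bool IsProfane(string text) =>
        text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            .Any(token => Blocked.Contains(token));
}
```

With this list, `ListBasedFilter.IsProfane("my house")` returns `true`, while `ListBasedFilter.IsProfane("my h0us3")` returns `false`: the obfuscated spelling is simply not in the list.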
| 12 | + |
| 13 | +## Performance |
| 14 | + |
| 15 | +For a single prediction, this library was 618 times faster than the most downloaded .NET package for detecting profanity. For 100 successive predictions, it was around 24 times faster. |
| 16 | + |
| 17 | +| Package | 1 prediction | 10 predictions | 100 predictions | |
| 18 | +|------------------------|--------------|----------------|-----------------| |
| 19 | +| .NET Bad Word Detector | 0.0462 ms | 1.5508 ms | 4.1887 ms | |
| 20 | +| ProfanityDetector | 28.5823 ms | 42.4606 ms | 102.0750 ms | |
| 21 | + |
| 22 | +Test machine: Dell Inspiron 13, Intel Core i7 (8th gen), 16 GB RAM. |
| 23 | + |
| 24 | +## How to install |
| 25 | + |
| 26 | +```bash |
| 27 | +dotnet add package DotnetBadWordDetector |
| 28 | +``` |
| 29 | + |
| 30 | +## How to use it |
| 31 | + |
| 32 | +```csharp |
| 33 | +var detector = new ProfanityDetector(); |
| 34 | + |
| 35 | +if (detector.IsProfane("foo bar")) { |
| 36 | +    // do something |
| 37 | +} |
| 38 | + |
| 39 | +``` |
| 40 | +It is strongly recommended to keep the library loaded in memory at all times for better performance; it uses very little memory (less than 100 KB). |
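
One possible way to follow this advice is sketched below, assuming that "keeping the library loaded" means creating one `ProfanityDetector` and reusing it. The `SharedProfanityDetector` helper is hypothetical and not part of the library, and the `DotnetBadWordDetector` namespace is an assumption taken from the package name. Whether `IsProfane` is thread-safe is not stated here, so the sketch serializes calls with a lock.

```csharp
using System;
using DotnetBadWordDetector; // assumed namespace, taken from the package name

// Hypothetical helper, not part of the library: builds one detector and reuses it.
public static class SharedProfanityDetector
{
    // Lazy<T> defers construction to first use and runs the factory only once.
    private static readonly Lazy<ProfanityDetector> Detector =
        new Lazy<ProfanityDetector>(() => new ProfanityDetector());

    // Thread safety of IsProfane is not documented here, so calls are serialized.
    private static readonly object Gate = new object();

    public static bool IsProfane(string text)
    {
        lock (Gate)
        {
            return Detector.Value.IsProfane(text);
        }
    }
}
```

Callers then use `SharedProfanityDetector.IsProfane("foo bar")` instead of constructing a new detector for every check.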
| 41 | +## Accuracy, AUC and F1 score |
| 42 | + |
| 43 | +```text |
| 44 | +Model quality metrics evaluation |
| 45 | +-------------------------------- |
| 46 | +Accuracy: 98.43% |
| 47 | +Auc: 99.49% |
| 48 | +F1Score: 97.25% |
| 49 | +``` |
| 50 | + |
| 51 | +## Caveat |
| 52 | + |
| 53 | +This library is not perfect: it is not 100% precise, and it is context-free, e.g. it cannot detect profane phrases composed of decent words. Also, people disagree on what is considered profane. |
| 54 | + |
| 55 | + |
| 56 | + |
| 57 | + |
| 58 | + |