A curated list of stopwords tailored for Nenglish, a common blend of Nepali and English used in everyday digital communication. This list is designed to assist researchers and developers in preprocessing informal chat data from platforms like WhatsApp, Facebook Messenger, Viber, and Telegram.
In Nepal and among Nepali-speaking communities, digital conversations often contain a mix of Nepali and English words, also known as Nenglish. These conversations follow an informal structure and are filled with semantically weak words (stopwords) that add noise to natural language processing (NLP) tasks like:
- Sentiment analysis
- Keyword extraction
- Chat summarization
- Intent recognition in chatbots
- Code-mixed language modeling
This project offers a comprehensive stopword list combining both Nepali and English terms commonly used in informal chats. It's ideal for preprocessing code-mixed text data for research or production NLP applications.
Original Sentence | After Stopword Removal |
---|---|
"tmi ra ma aja meet garxau" | "aja meet garxau" |
"k gardai chau bro?" | "gardai bro?" |
"Let's go khana khana" | "go khana khana" |
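
The removals shown in the table can be reproduced with a few lines of Python. This is a minimal sketch, not the project's own code: `SAMPLE_STOPWORDS` and `remove_stopwords` are hypothetical names, and the set holds only the handful of words inferred from the three example rows, not the full list. It assumes whitespace tokenization and case-insensitive matching with surrounding punctuation ignored.

```python
import string

# Illustrative subset only, inferred from the example rows in the table;
# the real stopword list is larger and may differ.
SAMPLE_STOPWORDS = {"tmi", "ra", "ma", "k", "chau", "let's"}


def remove_stopwords(text, stopwords=SAMPLE_STOPWORDS):
    """Drop whitespace-separated tokens whose normalized form is a stopword."""
    kept = []
    for token in text.split():
        # Normalize for comparison only: lowercase and strip punctuation
        # from both ends, so "Chau?" would match "chau".
        normalized = token.lower().strip(string.punctuation)
        if normalized not in stopwords:
            # Keep the token exactly as written (case and punctuation intact).
            kept.append(token)
    return " ".join(kept)


for sentence in [
    "tmi ra ma aja meet garxau",
    "k gardai chau bro?",
    "Let's go khana khana",
]:
    print(remove_stopwords(sentence))
```

Matching on a normalized copy while keeping the original token preserves punctuation such as the `?` in `bro?`. Romanized Nepali has many spelling variants (for example `timi` and `tmi`), so a real pipeline would likely need each variant listed or a separate normalization step.
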
- Social media comment analysis
- Chat-based sentiment classification
- WhatsApp or Messenger conversation mining
- Preprocessing for Nepali-English chatbots
- Code-mixed NLP dataset cleaning