Describe the bug
The segmenter removes the space between English words in a code-mixed sentence. For example, take this sentence:
這是Career Centre
To reproduce
Here is the code:
import pycantonese
from pycantonese.word_segmentation import Segmenter
segmenter = Segmenter()
pyseg = pycantonese.segment("這是Career Centre", cls=segmenter)
for word in pyseg:
    print(word)
The output is:
這是
CareerCentre
Expected behavior
The expected output is:
這是
Career Centre
or
這是
Career
Centre
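Until this is fixed, one possible workaround is to pull the English words out before segmenting and pass only the remaining Chinese chunks to the segmenter. The sketch below is a rough illustration of that idea, not PyCantonese behaviour: `segment_mixed` and `_LATIN_RUN` are hypothetical names, and the regex assumes that "English words" are plain ASCII letter/digit runs.

```python
import re

# Assumption: an "English word" is a run of ASCII letters/digits,
# optionally joined by internal apostrophes or hyphens.
_LATIN_RUN = re.compile(r"[A-Za-z0-9]+(?:['-][A-Za-z0-9]+)*")


def segment_mixed(text, segment_cjk):
    """Segment code-mixed text, keeping each English word as its own token.

    segment_cjk is any callable that maps a string to a list of words.
    (Hypothetical helper, not part of PyCantonese.)
    """
    words = []
    pos = 0
    for match in _LATIN_RUN.finditer(text):
        # Segment whatever sits between the previous English word and this one.
        chunk = text[pos:match.start()].strip()
        if chunk:
            words.extend(segment_cjk(chunk))
        words.append(match.group())
        pos = match.end()
    # Segment any trailing text after the last English word.
    tail = text[pos:].strip()
    if tail:
        words.extend(segment_cjk(tail))
    return words
```

With `pycantonese.segment` passed as `segment_cjk`, the English words in the sentence above should come back as separate tokens (`Career`, `Centre`), which matches the second expected output. How the Chinese part is split still depends on the segmenter.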
System (please complete the following information):
- Operating System: macOS Sonoma 14.0 (23A344)
- PyCantonese version: 3.4.0