📝 docs(api): enhance docstring with usage notes and parameter details
- Added guidelines for handling line breaks in input text.
- Clarified parameter descriptions for better user understanding.
- Highlighted the importance of large models for accuracy.
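The line-break guideline above exists because fastText's Python `predict` works on one line at a time and raises `ValueError` when the input contains `\n`. Here is a minimal sketch of that preprocessing step. It assumes that replacing each run of CR/LF characters with a single space is acceptable for the caller's text. `flatten_line_breaks` is a hypothetical helper and is not part of fast_langdetect:

```python
import re

# Hypothetical helper (not part of fast_langdetect): fastText's predict()
# handles one line at a time and raises ValueError if the text contains "\n",
# so multi-line input is flattened before detection.
_LINE_BREAKS = re.compile(r"[\r\n]+")


def flatten_line_breaks(text: str) -> str:
    """Replace every run of CR/LF characters with one space and trim the ends."""
    return _LINE_BREAKS.sub(" ", text).strip()


# Usage with fast_langdetect (needs the package and, for low_memory=False,
# a model download):
#   from fast_langdetect import detect
#   detect(flatten_line_breaks(raw_text), low_memory=False)

if __name__ == "__main__":
    sample = "Hello world.\nThis is a second line.\r\nAnd a third."
    print(flatten_line_breaks(sample))
```

Other whitespace is left untouched. The removed docstring line notes that fastText already splits words on whitespace and control characters, so only the newline has to go.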
src/fast_langdetect/ft_detect/infer.py
Lines changed: 19 additions & 4 deletions
@@ -192,12 +192,15 @@ def detect(
 ) -> Dict[str, Union[str, float]]:
     """
     Detect the language of a text using FastText.
-    This function assumes to be given a single line of text. We split words on whitespace (space, newline, tab, vertical tab) and the control characters carriage return, formfeed and the null character.
-    If the model is not supervised, this function will throw a ValueError.
+
+    - You MUST manually remove line breaks(`\n`) from the text to be processed in advance, otherwise a ValueError is raised.
+
+    - In scenarios **where accuracy is important**, you should not rely on the detection results of small models, use `low_memory=False` to download larger models!
+
     :param text: The text for language detection
-    :param low_memory: Whether to use a memory-efficient model
+    :param low_memory: Whether to use the compressed version of the model (https://fasttext.cc/docs/en/language-identification.html)
     :param model_download_proxy: Download proxy for the model if needed
-    :param use_strict_mode: If it was enabled, strictly loads large model or raises error if it fails
+    :param use_strict_mode: When this parameter is enabled, the fallback after loading failure will be disabled.
     :return: A dictionary with detected language and confidence score
     :raises LanguageDetectionError: If detection fails
     """
@@ -227,6 +230,18 @@ def detect_multilingual(
 ) -> List[Dict[str, Any]]:
     """
     Detect the top-k probable languages for a given text.
+
+    - You MUST manually remove line breaks(`\n`) from the text to be processed in advance, otherwise a ValueError is raised.
+
+    - In scenarios **where accuracy is important**, you should not rely on the detection results of small models, use `low_memory=False` to download larger models!
+
+    :param text: The text for language detection
+    :param low_memory: Whether to use the compressed version of the model (https://fasttext.cc/docs/en/language-identification.html)
+    :param model_download_proxy: Download proxy for the model if needed
+    :param k: Number of top languages to return
+    :param threshold: Minimum confidence score to consider
+    :param use_strict_mode: When this parameter is enabled, the fallback after loading failure will be disabled.
+    :return: A list of dictionaries with detected languages and confidence scores
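`detect_multilingual` exposes `k` and `threshold`. To illustrate what those two parameters mean, the hypothetical `top_k_languages` below drops candidates scoring under `threshold`, orders the rest by score, and keeps at most `k` of them. This is not the library's implementation. The `{"lang": ..., "score": ...}` dictionary shape is an assumption about the return format:

```python
from typing import Dict, List, Sequence, Tuple, Union


def top_k_languages(
    predictions: Sequence[Tuple[str, float]],
    k: int = 5,
    threshold: float = 0.0,
) -> List[Dict[str, Union[str, float]]]:
    """Hypothetical sketch: keep labels scoring at least `threshold`, best first, at most `k`."""
    # Drop low-confidence candidates first.
    kept = [(lang, score) for lang, score in predictions if score >= threshold]
    # Highest score first.
    kept.sort(key=lambda item: item[1], reverse=True)
    # Truncate to k and shape each entry as an assumed {"lang", "score"} dict.
    return [{"lang": lang, "score": score} for lang, score in kept[:k]]
```

For example, with candidate scores en 0.7, fr 0.2, es 0.06 and de 0.04, calling it with `k=2` and `threshold=0.1` keeps only en and fr.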