Commit 8577da5

📝 docs(api): enhance docstring with usage notes and parameter details
- Added guidelines for handling line breaks in input text.
- Clarified parameter descriptions for better user understanding.
- Highlighted the importance of large models for accuracy.
1 parent 1859e86 commit 8577da5

src/fast_langdetect/ft_detect/infer.py

Lines changed: 19 additions & 4 deletions
@@ -192,12 +192,15 @@ def detect(
 ) -> Dict[str, Union[str, float]]:
     """
     Detect the language of a text using FastText.
-    This function assumes to be given a single line of text. We split words on whitespace (space, newline, tab, vertical tab) and the control characters carriage return, formfeed and the null character.
-    If the model is not supervised, this function will throw a ValueError.
+
+    - You MUST manually remove line breaks(`\n`) from the text to be processed in advance, otherwise a ValueError is raised.
+
+    - In scenarios **where accuracy is important**, you should not rely on the detection results of small models, use `low_memory=False` to download larger models!
+
     :param text: The text for language detection
-    :param low_memory: Whether to use a memory-efficient model
+    :param low_memory: Whether to use the compressed version of the model (https://fasttext.cc/docs/en/language-identification.html)
     :param model_download_proxy: Download proxy for the model if needed
-    :param use_strict_mode: If it was enabled, strictly loads large model or raises error if it fails
+    :param use_strict_mode: When this parameter is enabled, the fallback after loading failure will be disabled.
     :return: A dictionary with detected language and confidence score
     :raises LanguageDetectionError: If detection fails
     """
@@ -227,6 +230,18 @@ def detect_multilingual(
 ) -> List[Dict[str, Any]]:
     """
     Detect the top-k probable languages for a given text.
+
+    - You MUST manually remove line breaks(`\n`) from the text to be processed in advance, otherwise a ValueError is raised.
+
+    - In scenarios **where accuracy is important**, you should not rely on the detection results of small models, use `low_memory=False` to download larger models!
+
+    :param text: The text for language detection
+    :param low_memory: Whether to use the compressed version of the model (https://fasttext.cc/docs/en/language-identification.html)
+    :param model_download_proxy: Download proxy for the model if needed
+    :param k: Number of top languages to return
+    :param threshold: Minimum confidence score to consider
+    :param use_strict_mode: When this parameter is enabled, the fallback after loading failure will be disabled.
+    :return: A list of dictionaries with detected languages and confidence scores
     """
     model = load_model(
         low_memory=low_memory,
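
Both docstrings in this diff require the caller to strip line breaks before detection. A minimal sketch of one way to do that follows. The helper names `flatten_line_breaks` and `detect_single_line` are hypothetical and not part of this commit, and replacing each run of line breaks with a single space is an assumption about what suits most inputs. The detection function is passed in as an argument, so the sketch does not depend on any particular import path.

```python
import re
from typing import Any, Callable, Dict

# Matches one or more consecutive carriage returns or newlines.
_LINE_BREAKS = re.compile(r"[\r\n]+")


def flatten_line_breaks(text: str) -> str:
    """Replace each run of line breaks with one space and trim the ends.

    Hypothetical helper: the docstrings only say line breaks must be
    removed; substituting a space (rather than nothing) is an assumption
    made here so that words on adjacent lines do not get glued together.
    """
    return _LINE_BREAKS.sub(" ", text).strip()


def detect_single_line(
    text: str,
    detect_fn: Callable[..., Dict[str, Any]],
    **kwargs: Any,
) -> Dict[str, Any]:
    """Flatten the text, then forward it and any keyword arguments.

    detect_fn is expected to be a callable such as the detect() function
    touched by this commit; it is injected instead of imported so this
    sketch stays runnable without the library installed.
    """
    return detect_fn(flatten_line_breaks(text), **kwargs)
```

Assuming `detect` has been imported from the package, a call might look like `detect_single_line("Hello\nworld", detect, low_memory=False)`, which forwards `low_memory=False` as the docstring recommends when accuracy matters.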
