@zoobereq zoobereq commented Jun 14, 2024

What does this PR do?

Fixes an issue where the sentence-final period in a sentence ending with a domain was incorrectly normalized as part of the domain. This PR also adds support for normalizing social media tags and includes updated tests.
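To illustrate the behavior being fixed, here is a minimal sketch in plain Python. It is not the code from this PR: the grammars in this repository are built with pynini, while the regex and the `split_final_period` helper below are hypothetical names made up for illustration. The sketch only shows the intended outcome, where a trailing period is treated as sentence punctuation rather than as part of the domain.

```python
import re
from typing import Tuple

# Hypothetical pattern (an assumption, not taken from the PR): one or more
# dot-separated labels, optionally followed by a single trailing period.
DOMAIN_RE = re.compile(
    r"^(?P<domain>[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)(?P<final>\.?)$"
)


def split_final_period(token: str) -> Tuple[str, str]:
    """Split a sentence-final period off a domain-like token.

    Returns (domain, final), where final is "." or "". Tokens that do not
    look like a domain are returned unchanged with an empty final part.
    """
    match = DOMAIN_RE.match(token)
    if match is None:
        return token, ""
    return match.group("domain"), match.group("final")


if __name__ == "__main__":
    for token in ("nvidia.com.", "nvidia.com", "hello."):
        print(token, "->", split_final_period(token))
```

With this split, the domain part can be normalized on its own and the period handled as ordinary punctuation. An ordinary word followed by a period, such as "hello.", does not match the pattern and is left untouched.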

Before your PR is "Ready for review"

Pre checks:

  • Have you signed your commits? Use git commit -s to sign.
  • Do all unit tests finish successfully before sending the PR?
    1. pytest or (if your machine does not have a GPU) pytest --cpu from the root folder (given you marked your test cases accordingly with @pytest.mark.run_only_on('CPU')).
    2. Sparrowhawk tests: bash tools/text_processing_deployment/export_grammars.sh --MODE=test ...
  • If you are adding a new feature: have you added test cases for both pytest and Sparrowhawk?
  • Have you added __init__.py for every folder and subfolder, including the data folder, which has .TSV files?
  • Have you followed the CodeQL results and removed unused variables and imports (the report is at the bottom of the PR in the GitHub review box)?
  • Have you added the correct license header Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. to all newly added Python files?
  • If you copied nemo_text_processing/text_normalization/en/graph_utils.py your header's second line should be Copyright 2015 and onwards Google, Inc.. See an example here.
  • Remove import guards (try import: ... except: ...) if not already done.
  • If you added a new language or a new feature, please update the NeMo documentation (which lives in a different repo).
  • Have you added your language support to tools/text_processing_deployment/pynini_export.py?

PR Type:

  • New Feature
  • Bugfix
  • Documentation
  • Test

If you haven't finished some of the above items, you can still open a "Draft" PR.

zoobereq and others added 2 commits June 14, 2024 17:46
Signed-off-by: Simon Zuberek <szuberek@nvidia.com>
@zoobereq zoobereq changed the title from "Fixes issue #166" to "HU TN Fixes issue #166" Jun 14, 2024
@zoobereq zoobereq requested a review from tbartley94 June 26, 2024 15:29
@zoobereq zoobereq marked this pull request as ready for review June 26, 2024 15:29
@zoobereq zoobereq requested a review from mgrafu July 10, 2024 15:10
zoobereq and others added 2 commits July 16, 2024 10:45
zoobereq and others added 2 commits July 16, 2024 16:27
Signed-off-by: Simon Zuberek <szuberek@nvidia.com>
@zoobereq zoobereq requested review from tbartley94 and removed request for mgrafu July 17, 2024 13:04

@tbartley94 tbartley94 left a comment

LGTM

@zoobereq zoobereq merged commit aa7cf17 into main Jul 18, 2024
5 checks passed
@ekmb ekmb deleted the HU-TN-Fixes branch July 19, 2024 00:31
zoobereq added a commit that referenced this pull request Jul 22, 2024
zoobereq added a commit that referenced this pull request Jul 23, 2024
This reverts commit aa7cf17.

Signed-off-by: Simon Zuberek <szuberek@nvidia.com>
BuyuanCui pushed a commit that referenced this pull request Aug 20, 2024
* Fixes issue #166

Signed-off-by: Simon Zuberek <szuberek@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Implements aliases for common string literals

Signed-off-by: Simon Zuberek <szuberek@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixes the period variable

Signed-off-by: Simon Zuberek <szuberek@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Simon Zuberek <szuberek@nvidia.com>
Co-authored-by: Simon Zuberek <szuberek@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>
BuyuanCui pushed a commit that referenced this pull request Sep 19, 2024
BuyuanCui pushed a commit that referenced this pull request Sep 26, 2024
BuyuanCui pushed a commit that referenced this pull request Sep 26, 2024
BuyuanCui pushed a commit that referenced this pull request Oct 16, 2024
ngachchi pushed a commit to ngachchi/NeMo-text-processing that referenced this pull request Jun 23, 2025
Signed-off-by: Namrata Gachchi <ngachchi@nvidia.com>
FredHaa pushed a commit to FredHaa/NeMo-text-processing that referenced this pull request Aug 15, 2025