PredPatt Integration and Python 3.12+ Modernization #31
Merged
…uces type-hints to pass all mypy checks.
…`mypy.ini` for improved readability and added tests for argument filtering, predicate filtering, and integrated filtering to ensure consistent behavior with the original PredPatt implementation.
…ing configuration settings. Introduces new test files for differential testing of argument and predicate classes, ensuring compatibility with the original PredPatt implementation. Updates `pyproject.toml` for linting configurations and removes deprecated dependencies from `requirements.txt`.
…nization and consistency. Updates argument and predicate filtering functions to follow naming conventions. Enhances test files by ensuring compatibility with the original PredPatt implementation and improving readability. Additionally, minor formatting adjustments and code cleanups are applied throughout the codebase.
…prove code clarity and robustness. Updates the `pyproject.toml` to include new dependencies and removes deprecated ones. Enhances test coverage for argument and predicate classes, ensuring proper handling of edge cases and improving overall test reliability.
…s across various classes. Introduces a new typing module for shared type definitions, improves docstrings for clarity, and refines method signatures to ensure type safety. Updates the UDS corpus and document classes to better manage sentence and document-level graphs, including improved metadata handling and annotation methods. Additionally, refactors existing code for consistency and readability.
…and semantics modules to enhance type safety and clarity. Updates the UDS annotation system with new type definitions for better consistency. Improves error handling in various methods and enhances test coverage for the corpus and graph converters, ensuring robust functionality and compatibility with existing implementations.
…aset loading process. Enhances the `__init__.py` file with detailed module description and usage examples, improving overall code organization and readability.
…on for graph corpus management. Introduces detailed class and type alias documentation, improving clarity and usability for developers implementing corpus readers in the decomp framework.
…tions for the UDS corpus, annotation, and metadata classes. Refines type hints for improved clarity and consistency, ensuring better type safety throughout the UDS annotation system. Updates method signatures and docstrings to reflect changes, enhancing usability for developers working with UDS datasets.
… parameter to support both PredPattCorpus and a dictionary of UDSSentenceGraph. Updates the _validate_arguments method to reflect this change. Additionally, improves the get_ontologies function to prioritize loading metadata from annotation files, with fallback to the UDS corpus, enhancing the ontology collection process.
…hances class descriptions and method signatures for clarity, ensuring better type safety and usability. Updates type aliases to use `type` instead of `TypeAlias` for consistency, and improves error messages for better debugging. Additionally, restructures nested dictionary types for improved readability.
… graph modules. Updates type aliases to use `type` instead of `TypeAlias` for consistency, enhances method signatures for clarity, and improves error messages. Additionally, restructures docstrings for better readability and usability, ensuring a more robust and user-friendly API for developers working with UDS datasets.
…on files, enhancing type checking flexibility. Removes outdated test file for differential imports, streamlining the codebase. Updates type casting in PredPattCorpus for improved type safety and clarity, ensuring consistent handling of corpus data.
…ngs for classes and functions, improving clarity and usability. Refines type hints for better type safety and consistency, and restructures method signatures for improved readability. Updates the `get_ontologies` function to enhance metadata loading from annotation files, ensuring a more robust ontology collection process.
…rpus.py`, and `graph.py` files. Enhances documentation with detailed descriptions of classes and methods, improving clarity and usability. Introduces the `PredPattCorpus` and `PredPattGraphBuilder` classes for better management of semantic extractions and graph construction. Updates type hints for improved type safety and consistency across the module.
…dule. Updates the module docstring to provide clearer descriptions of key components, including the `HasPosition` protocol and `UDSchema` type alias. Refines type alias declaration for `UDSchema` to improve consistency and clarity across the PredPatt framework.
…ates the module and class docstrings to enhance clarity and detail regarding token representation and its attributes. Improves comments for better readability and understanding of the code structure.
…ew PredicateType enumeration for better type safety and clarity. Updates the documentation to reflect changes in predicate type handling, enhancing usability and consistency across the module. Modifies various components to utilize the new enumeration, ensuring a more robust implementation of predicate types.
… and usability. Updates class and function docstrings in various files, including `__init__.py`, `corpus.py`, `graph.py`, and `typing.py`, to provide detailed descriptions of components and their functionalities. Introduces structured sections for classes, functions, and constants, improving the overall organization of the documentation. Additionally, refines type hints and comments for better readability and consistency throughout the module.
…rules modules for improved clarity and consistency. Updates comment styles to lowercase and enhances readability in various files, including `__init__.py`, `argument_filters.py`, `predicate_filters.py`, and `base.py`. This change aims to standardize documentation practices and improve the overall usability of the codebase.
… for clarity and consistency. Updates comments in `corpus.py`, `__init__.py`, `nx.py`, and `graph.py` to standardize formatting and improve readability. This change aims to provide clearer descriptions of methods and properties, enhancing the overall usability of the codebase.
…ty and consistency. Updates `from_conll_and_annotations`, `from_json`, `add_annotation`, and various other methods in `corpus.py`, `document.py`, and `graph.py` to use a more structured format. This change enhances code clarity and maintains uniformity in method definitions throughout the codebase.
…ngine.py`, and `linearization.py`, by refining docstrings for clarity and consistency. Updates comments to standardize formatting and improve readability. This change aims to provide clearer descriptions of classes, methods, and their functionalities, enhancing the overall usability of the codebase.
- Introduced a new CHANGELOG.md to document notable changes and version history for the Decomp project.
- Added a CI workflow in .github/workflows/ci.yml for automated testing, linting, and type checking using Python 3.12.
- Updated README.md with badges for CI status, GitHub link, and license information.
- Enhanced documentation across various modules, including installation instructions, release notes, and detailed API references for the new PredPatt integration and Python 3.12+ compatibility.

…tegration
- Changed the base image in Dockerfile to jupyter/datascience-notebook with Python 3.12.
- Updated working directory and copy commands in Dockerfile for better ownership management.
- Modified installation commands to use editable mode and pre-build the UDS corpus.
- Enhanced README.md and install.rst with updated instructions for building and running the Docker image, including starting a Jupyter Lab server.
- Updated requirements.txt to reflect new package versions and added development dependencies for testing.

- Updated README.md and install.rst to clarify installation methods, including direct installation from GitHub and from source.
- Added requirements for Python 3.12 or higher and detailed steps for development installation with dependencies.
- Improved documentation structure and content in various files, including sentence-graphs.rst and predpatt.rst, for better clarity and usability.
- Refined comments and docstrings across multiple modules to enhance readability and consistency.
bd88ca2 to c024247
07c3fc9 to 1c0e24d
1c0e24d to 568cb89
- Modifies the Dockerfile to install the toolkit in editable mode with visualization dependencies, removing the requirements.txt file.
- Updates the tests/README.md to clarify installation steps for running tests, emphasizing the use of editable mode for development dependencies.
- Removes tests/requirements.txt as its contents are now integrated into the main installation process.
This PR represents a significant modernization of the Decomp toolkit with full integration of PredPatt predicate-argument structure extraction functionality and comprehensive Python 3.12+ compatibility updates.
Summary
This PR integrates the standalone PredPatt library directly into decomp as `decomp.semantics.predpatt`, modernizes the codebase for Python 3.12+, and adds comprehensive CI/CD infrastructure. The integration maintains complete compatibility with the original PredPatt implementation while providing seamless interoperability with the UDS framework.
Key Changes
1. PredPatt Integration (~7,000 lines)
`decomp.semantics.predpatt`:
- `core/`: Core data structures (Token, Predicate, Argument, PredPattOpts)
- `extraction/`: Main extraction engine with linguistic rule application
- `parsing/`: Universal Dependencies parsing utilities
- `rules/`: Modular linguistic rules for predicate/argument identification
- `filters/`: Configurable filtering system
- `utils/`: Visualization and debugging utilities
2. Python 3.12+ Modernization
- `X | Y` instead of `Union[X, Y]`
- `list[str]` instead of `List[str]`
- `type` aliases instead of `TypeAlias`
- `setup.py` to `pyproject.toml`
- `[dev]` extras
3. CI/CD Infrastructure
- `.github/workflows/ci.yml`
- `ruff.toml`: Linting and formatting configuration
- `mypy.ini`: Type checking configuration
4. Documentation Enhancements
- `CHANGELOG.md` with complete release history
- `releases.rst` documentation page
5. Bug Fixes
- `document_ids` → `documentids` in documentation
Testing
All tests pass including the comprehensive PredPatt differential test suite:
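As a rough illustration of what a differential test does, the sketch below runs a reference function and a ported function on the same inputs and reports any input where their outputs differ. Both extraction functions are hypothetical stand-ins written for this example; the actual suite in this PR compares against the original PredPatt implementation.

```python
from collections.abc import Callable, Iterable

# Type of the functions under comparison (assumed shape, for illustration only).
Extractor = Callable[[list[str]], list[str]]


def reference_extract(tokens: list[str]) -> list[str]:
    """Stand-in for the original implementation: keep tokens ending in 's'."""
    return [t for t in tokens if t.endswith("s")]


def ported_extract(tokens: list[str]) -> list[str]:
    """Stand-in for the ported implementation, deliberately written differently."""
    kept = []
    for token in tokens:
        if token[-1:] == "s":
            kept.append(token)
    return kept


def differential_mismatches(
    reference: Extractor,
    candidate: Extractor,
    inputs: Iterable[list[str]],
) -> list[list[str]]:
    """Return every input on which the two implementations disagree."""
    return [item for item in inputs if reference(item) != candidate(item)]


# An empty result means the port matched the reference on every input.
mismatches = differential_mismatches(
    reference_extract, ported_extract, [["runs", "ran"], [], ["s", ""]]
)
```

Returning the failing inputs, rather than a single pass/fail flag, makes it easier to see which cases diverge when a port is incomplete.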
Breaking Changes
None for existing users. The integration is additive:
- `decomp` API remains unchanged
- `decomp.semantics.predpatt`
Migration Guide
For users of standalone PredPatt:
Example Usage
Future Work
- `parsing` feature. Currently this feature has `concrete` as a dependency, but this dependency is likely to be removed, since it is unlikely to be necessary and requires many old dependencies.
Checklist