Releases: whythawk/whyqd

1.0.8

10 Aug 09:16
  • CategoryModel terms now use StrictBool to avoid Pydantic's liberal interpretation of booleans.
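
As a rough illustration of what the liberal interpretation looks like, the sketch below compares a plain `bool` field with a `StrictBool` field. The `LaxTerm` and `StrictTerm` models are hypothetical stand-ins, not whyqd's actual CategoryModel, and the `accepts` helper exists only for this example.

```python
from pydantic import BaseModel, StrictBool, ValidationError


class LaxTerm(BaseModel):
    # Hypothetical model: a plain bool field coerces values such as "yes" or 1.
    name: bool


class StrictTerm(BaseModel):
    # Hypothetical model: StrictBool only accepts genuine True / False.
    name: StrictBool


def accepts(model, value):
    """Return True if the model validates the value, False otherwise."""
    try:
        model(name=value)
        return True
    except ValidationError:
        return False


# The string "yes" is silently coerced to True by the plain bool field.
lax_value = LaxTerm(name="yes").name

# The strict field rejects the same string but still accepts a real boolean.
strict_accepts_string = accepts(StrictTerm, "yes")
strict_accepts_bool = accepts(StrictTerm, True)
```

With a plain `bool`, a category term such as "yes" would be stored as `True` and lose its original text, which is presumably the kind of coercion this release avoids.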

1.0.7

09 Aug 16:13
  • Minor but crucial fix for the continuing ambiguity issue with category terms. Source fields, destination fields and terms can all share a name, which causes painful and tedious issues. Hopefully this fix resolves it.

1.0.5

09 Aug 11:49
  • Additional ambiguity checks for the category term edge case where source and destination fields share category names.
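
To see why shared category names are ambiguous, consider the minimal sketch below. It does not reflect whyqd's internal data structures: the field names, the terms and the `qualify` helper are all hypothetical. It only shows that a lookup keyed by term name alone collapses terms from different fields, while a lookup keyed by field and term keeps them apart.

```python
# Hypothetical fields that happen to share the same category term names.
field_terms = {
    "is_occupied": ["true", "false"],
    "is_vacant": ["true", "false"],
}

# Keyed by term name alone: later fields overwrite earlier ones.
naive = {}
for field, terms in field_terms.items():
    for term in terms:
        naive[term] = field


def qualify(field_terms):
    """Build a lookup keyed by (field name, term name) so no term is lost."""
    return {
        (field, term): f"{field}::{term}"
        for field, terms in field_terms.items()
        for term in terms
    }


qualified = qualify(field_terms)
```

The naive lookup ends up with two entries that both point at the last field, whereas the qualified lookup retains all four field and term pairs.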

1.0.4

08 Aug 08:48
  • Dependency updates.
  • Permitting nrow limit on Parquet files.
  • Disambiguation where schema subject and object field categories have the same name.
  • Ambiguity checks for string blank space. If the source data includes blank space, it is not removed, so that the original structure is preserved.
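
The blank-space point can be illustrated with a small sketch in plain Python. This is not whyqd's code: the `unique_terms` helper and the sample values are hypothetical. It shows that stripping blank space merges values that are distinct in the source data.

```python
def unique_terms(values, strip=False):
    """Return unique values in first-seen order, optionally stripping blank space."""
    seen = []
    for value in values:
        key = value.strip() if strip else value
        if key not in seen:
            seen.append(key)
    return seen


# Hypothetical source column where some values carry stray blank space.
raw = ["Yes", "Yes ", " No", "No"]

preserved = unique_terms(raw)             # keeps all four distinct source values
stripped = unique_terms(raw, strip=True)  # merges them into two values
```

Keeping the values exactly as they appear means a term defined as "Yes " still matches the rows it came from, at the cost of treating "Yes" and "Yes " as separate terms.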

1.0.1

05 Jul 15:37

Minor convenience update.

  • Permitting nrow limit on Parquet files.

1.0.0

10 May 14:15
96b77c8

This version shares some features with the previous version, but is a complete refactoring and conceptual redesign. It is
not backward compatible. Future versions will maintain compatibility with this one.

  • Separated data models from schema models so that crosswalks are schema-to-schema.
  • Complete revision of the API into four discrete Definition classes: SchemaDefinition, DataSourceDefinition,
    CrosswalkDefinition and TransformDefinition.
  • Removed filters and actions that are no longer relevant (including REBASE and MERGE).
  • Simplified CATEGORISE since it no longer requires deriving terms as part of the crosswalk.
  • Crosswalks are designed to support continuous integration.
  • Pydantic models are more transparent via each Definition's .get property.
  • Refactored Pandas to support Modin and Ray for data >1 million rows.
  • Mime type support for data sources in Parquet and Feather.
  • Migrated documentation from Sphinx to MkDocs.
  • Revised all tutorials and documentation.

whyqd: simplicity, transparency, speed

23 Aug 17:31

whyqd provides an intuitive method for restructuring messy data to conform to a standardised metadata schema. It supports data managers and researchers looking to rapidly, and continuously, normalise any messy spreadsheets using a simple series of steps. Once complete, you can import wrangled data into more complex analytical systems or full-feature wrangling tools.

Read the docs. There are also two worked tutorials that demonstrate
how you can use whyqd to support source data curation transparency.

Install using pip:

pip install whyqd

Version 0.5.0 introduced a new, simplified API, along with script-based transformation actions. You can import and
transform any saved method.json files with:

import whyqd

SCHEMA = whyqd.Schema(source=SCHEMA_SOURCE)
# Parse a saved version 1 method.json file into transformation scripts
schema_scripts = whyqd.parsers.LegacyScript().parse_legacy_method(
    version="1", schema=SCHEMA, source_path=METHOD_SOURCE_V1
)

Here SCHEMA_SOURCE is a path to your schema and METHOD_SOURCE_V1 is a path to a saved method.json file. Existing schema.json files should still work.

whyqd: simplicity, transparency, speed

02 Nov 15:08

whyqd provides an intuitive method for restructuring messy data to conform to a standardised metadata schema. It supports data managers and researchers looking to rapidly, and continuously, normalise any messy spreadsheets using a simple series of steps. Once complete, you can import wrangled data into more complex analytical systems or full-feature wrangling tools.

Read the docs and the full tutorial.

Install using pip:

pip install whyqd