Releases: whythawk/whyqd
1.0.8
1.0.7
1.0.5
1.0.4
- Dependency updates.
- Permitting an `nrow` limit on Parquet files.
- Disambiguation where schema subject and object field categories have the same name.
- Ambiguity checks for string blank space. If source data includes blank space, it should not be removed, so that the original structure is preserved.
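The blank-space rule above can be illustrated with a small sketch. It does not use `whyqd`: `is_ambiguous_blank` and `flag_blank_cells` are hypothetical helpers, and the sample row is invented. It shows one way to detect empty or whitespace-only strings and report where they occur without stripping them, so the original structure of the source data is kept.

```python
from typing import Iterable, List


def is_ambiguous_blank(value: object) -> bool:
    """Hypothetical helper: True for strings that are empty or whitespace-only."""
    return isinstance(value, str) and value.strip() == ""


def flag_blank_cells(values: Iterable[object]) -> List[int]:
    """Return the positions of ambiguous blank cells without altering any value."""
    return [i for i, value in enumerate(values) if is_ambiguous_blank(value)]


# Invented sample row; the blank cells are reported, not stripped or dropped.
row = ["Total", " ", "", 42, "  Region A  "]
print(flag_blank_cells(row))  # [1, 2]
print(row)  # ['Total', ' ', '', 42, '  Region A  ']
```

Reporting positions rather than cleaning values leaves the decision about what a blank cell means to whoever defines the crosswalk.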
1.0.1
1.0.0
This version shares some features with the previous version, but is a complete refactoring and conceptual redesign. It is not backwardly compatible. Future versions will maintain compatibility with this one.
- Separated data models from schema models so that crosswalks are schema-to-schema.
- Complete revision of the API into four discrete `Definition` classes: `SchemaDefinition`, `DataSourceDefinition`, `CrosswalkDefinition` and `TransformDefinition`.
- Removed `filters` and `actions` that are no longer relevant (including `REBASE` and `MERGE`).
- Simplified `CATEGORISE`, since it no longer requires deriving terms as part of the crosswalk.
- Crosswalks are designed to support continuous integration.
- Pydantic models are more transparent via each `Definition`'s `.get` property.
- Refactored Pandas usage to support Modin and Ray for data with more than 1 million rows.
- MIME type support for Parquet and Feather data sources.
- Rewrote documentation in MkDocs, migrating from Sphinx.
- Revised all tutorials and documentation.
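The idea of a schema-to-schema crosswalk described in these notes can be sketched in plain Python. This is not the `whyqd` API: `Crosswalk`, `rename`, `categorise` and `transform` are hypothetical names, with `categorise` only loosely modelled on the `CATEGORISE` action, and the field names and terms are invented. Because each action refers only to schema field names rather than to a particular file, the same crosswalk can, under these assumptions, be re-run on every new release of source data, which is one way to read the note that crosswalks support continuous integration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class Crosswalk:
    """Hypothetical schema-to-schema crosswalk (illustration only, not the whyqd API).

    Actions refer only to schema field names, never to a specific file,
    so the same crosswalk can be re-applied to each new data release.
    """

    actions: List[Callable[[Row], Row]] = field(default_factory=list)

    def rename(self, source: str, destination: str) -> "Crosswalk":
        """Move a source-schema field into a destination-schema field."""

        def action(row: Row) -> Row:
            out = dict(row)  # work on a copy so the input row is untouched
            out[destination] = out.pop(source, None)
            return out

        self.actions.append(action)
        return self

    def categorise(self, source: str, destination: str, terms: Dict[str, str]) -> "Crosswalk":
        """Map raw source values onto destination category terms (unmatched -> None)."""

        def action(row: Row) -> Row:
            out = dict(row)  # work on a copy so the input row is untouched
            out[destination] = terms.get(out.pop(source, None))
            return out

        self.actions.append(action)
        return self

    def transform(self, rows: List[Row]) -> List[Row]:
        """Apply every action, in order, to each row and return the new rows."""
        result = []
        for row in rows:
            for action in self.actions:
                row = action(row)
            result.append(row)
        return result


# Hypothetical source data and destination terms, for illustration only.
crosswalk = (
    Crosswalk()
    .rename("Country Name", "country")
    .categorise("Sector", "sector", {"Agric.": "agriculture", "Manuf.": "manufacturing"})
)
rows = [
    {"Country Name": "Ghana", "Sector": "Agric."},
    {"Country Name": "Kenya", "Sector": "Manuf."},
]
print(crosswalk.transform(rows))
# [{'country': 'Ghana', 'sector': 'agriculture'}, {'country': 'Kenya', 'sector': 'manufacturing'}]
```

Leaving unmatched values as `None` rather than guessing a category is a deliberate choice in this sketch; a real crosswalk tool may handle unmatched terms differently.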
whyqd: simplicity, transparency, speed
whyqd provides an intuitive method for restructuring messy data to conform to a standardised metadata schema. It supports data managers and researchers looking to rapidly, and continuously, normalise any messy spreadsheets using a simple series of steps. Once complete, you can import wrangled data into more complex analytical systems or full-feature wrangling tools.
Read the docs. There are two worked tutorials demonstrating how you can use `whyqd` to support source data curation transparency.
Install using `pip`:

```shell
pip install whyqd
```
Version 0.5.0 introduced a new, simplified API, along with script-based transformation actions. You can import and transform any saved `method.json` file with:
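```python
import whyqd

SCHEMA_SOURCE = "schema.json"  # placeholder: path to your existing schema
METHOD_SOURCE_V1 = "method.json"  # placeholder: path to your saved legacy method
SCHEMA = whyqd.Schema(source=SCHEMA_SOURCE)
schema_scripts = whyqd.parsers.LegacyScript().parse_legacy_method(
    version="1", schema=SCHEMA, source_path=METHOD_SOURCE_V1
)
```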
Here, `SCHEMA_SOURCE` is the path to your schema and `METHOD_SOURCE_V1` is the path to your saved `method.json`. Existing `schema.json` files should still work.