
FAIR DATA: Pipeline for processing archived datasets into a common format #15

@jeanpaulrsoucy


Once a common output data format is established (#10), a large number of workflows will need to be developed, one per dataset, to transform the raw, archived data into FAIR data.

The exact nature of these data workflows has not yet been decided, but will likely include one or more of: SQL, Python, R and related tools.
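As a rough illustration of the "one workflow per dataset" idea, here is a minimal Python sketch. Everything in it is an assumption made for the sake of example: the common columns (`dataset`, `date`, `region`, `value`) stand in for whatever format #10 settles on, and the dataset name `example_cases` and its raw layout (`Date`, `Province`, `Cases`) are invented. It is not a proposal for the final design, only one possible shape: each dataset registers a small function that maps its raw text to rows in the common format, and a shared `process` step checks the output columns.

```python
import csv
import io
from typing import Callable, Dict, List

# Hypothetical common output columns; the real format is still to be settled in #10.
COMMON_COLUMNS = ["dataset", "date", "region", "value"]

Row = Dict[str, str]
Workflow = Callable[[str], List[Row]]

# Registry mapping a dataset name to its workflow function.
WORKFLOWS: Dict[str, Workflow] = {}


def workflow(name: str) -> Callable[[Workflow], Workflow]:
    """Register a per-dataset workflow under the given dataset name."""
    def register(func: Workflow) -> Workflow:
        WORKFLOWS[name] = func
        return func
    return register


@workflow("example_cases")  # hypothetical dataset name
def example_cases(raw_text: str) -> List[Row]:
    """Convert an invented raw CSV layout (Date, Province, Cases) to the common columns."""
    reader = csv.DictReader(io.StringIO(raw_text))
    return [
        {
            "dataset": "example_cases",
            "date": row["Date"],
            "region": row["Province"],
            "value": row["Cases"],
        }
        for row in reader
    ]


def process(name: str, raw_text: str) -> List[Row]:
    """Run the registered workflow and check that every row has the common columns."""
    rows = WORKFLOWS[name](raw_text)
    for row in rows:
        if list(row) != COMMON_COLUMNS:
            raise ValueError(f"{name}: unexpected columns {list(row)}")
    return rows
```

With this shape, adding a dataset means adding one decorated function, and the shared column check lives in a single place. The same pattern could equally be written in R or driven by SQL.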

These workflows require a few different features:

Since May 2021, automation has been used to maintain the Covid19Canada and CovidTimelineCanada datasets. This process involved writing and maintaining a significant amount of R code to process dozens of existing datasets; see the R packages Covid19CanadaETL, Covid19CanadaData and Covid19CanadaDataProcess for this existing and ongoing effort.

    Labels

    Canadian COVID-19 Data Archive: Issues directly related to the Canadian COVID-19 Data Archive
    data processing / FAIR data: Processing raw data into FAIR datasets
    metadata: Metadata for archived or derived datasets
