Labels
- Canadian COVID-19 Data Archive: Issues directly related to the Canadian COVID-19 Data Archive
- data processing / FAIR data: Processing raw data into FAIR datasets
- metadata: Metadata for archived or derived datasets
Description
Once a common output data format is established (#10), a large number of workflows will need to be developed, one per dataset, to transform raw archived data into FAIR (Findable, Accessible, Interoperable, Reusable) data.
The tooling for these workflows has not yet been decided but will likely include one or more of SQL, Python, R and related tools.
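As a rough illustration of the "one workflow per dataset" idea, here is a Python sketch of a registry that maps each dataset ID to its own transformation function. Python is only one of the candidate languages, and everything here is hypothetical: the dataset ID `example_cases_on`, the source column names, and the `date`/`region`/`value` output fields are placeholders for the common format that is still to be defined in #10.

```python
from typing import Callable, Dict, List

# Hypothetical common output record; the real schema is still to be defined in #10.
Record = Dict[str, object]
Workflow = Callable[[List[dict]], List[Record]]

# One entry per dataset: dataset ID -> function that turns raw rows into records.
WORKFLOWS: Dict[str, Workflow] = {}


def register(dataset_id: str) -> Callable[[Workflow], Workflow]:
    """Decorator that registers exactly one workflow per dataset ID."""
    def wrap(func: Workflow) -> Workflow:
        if dataset_id in WORKFLOWS:
            raise ValueError(f"workflow already registered for {dataset_id!r}")
        WORKFLOWS[dataset_id] = func
        return func
    return wrap


@register("example_cases_on")  # hypothetical dataset ID
def process_example_cases(raw: List[dict]) -> List[Record]:
    # Map source-specific column names (hypothetical) onto the common fields.
    return [
        {"date": row["Reported Date"], "region": "ON", "value": int(row["Cases"])}
        for row in raw
    ]


def run(dataset_id: str, raw: List[dict]) -> List[Record]:
    """Look up and apply the workflow registered for a dataset."""
    return WORKFLOWS[dataset_id](raw)


print(run("example_cases_on", [{"Reported Date": "2021-05-01", "Cases": "12"}]))
# [{'date': '2021-05-01', 'region': 'ON', 'value': 12}]
```

Keeping the registry in one place makes it easy to list which archived datasets already have a workflow. It also ensures that an error is raised if two workflows claim the same dataset.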
These workflows require the following features:
- Ability to link several datasets together when they represent a continuous data series (e.g., if the name or URL of the dataset changed over time; see also the comment in METADATA: Metadata taxonomy for FAIR data #7)
- Ability to incorporate manual data entry to fill gaps and to correct errors or other problematic entries (each manual entry should be clearly flagged in the metadata along with a reason)
- Ability to easily establish data provenance (see FAIR DATA: Data provenance for FAIR datasets #12)
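The three requirements listed above can be sketched together in Python. This is a minimal illustration under assumed names, not a proposed implementation. `Observation`, `link_series` and `apply_manual_entries` are hypothetical, and dates are assumed to be ISO 8601 strings. Each observation carries a `source` field (an archived dataset ID or `"manual"`) and a `note` holding the reason for any manual entry, so the provenance of every value can be read directly from the record.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Observation:
    date: str              # ISO 8601 date, so string order equals date order
    value: Optional[int]   # None marks a gap in the source data
    source: str            # provenance: archived dataset ID, or "manual"
    note: str = ""         # reason for a manual entry; empty otherwise


def link_series(segments: List[Tuple[str, Dict[str, Optional[int]]]]) -> List[Observation]:
    """Join datasets that form one continuous series (e.g. after a URL change).

    Segments are passed oldest first; where two segments share a date,
    the value from the later segment is kept.
    """
    merged: Dict[str, Observation] = {}
    for source_id, values in segments:
        for date, value in values.items():
            merged[date] = Observation(date, value, source_id)
    return [merged[d] for d in sorted(merged)]


def apply_manual_entries(
    series: List[Observation], entries: List[Tuple[str, Optional[int], str]]
) -> List[Observation]:
    """Overlay manual (date, value, reason) entries; a reason is mandatory."""
    by_date = {obs.date: obs for obs in series}
    for date, value, reason in entries:
        if not reason.strip():
            raise ValueError(f"manual entry for {date} needs a reason")
        by_date[date] = Observation(date, value, "manual", reason)
    return [by_date[d] for d in sorted(by_date)]


# Hypothetical example: one series archived under two dataset IDs.
series = link_series([
    ("dataset_v1", {"2021-05-01": 10, "2021-05-02": None}),
    ("dataset_v2", {"2021-05-03": 14}),
])
series = apply_manual_entries(series, [("2021-05-02", 12, "gap in archived file")])
for obs in series:
    print(obs.date, obs.value, obs.source, obs.note)
```

Because every value carries its own `source`, tracing where a number came from becomes a simple lookup rather than a reconstruction. Manual corrections also stay visibly separate from archived data.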
Since May 2021, automation has been used to maintain the Covid19Canada and CovidTimelineCanada datasets. This process has involved writing and maintaining a significant amount of R code to process dozens of existing datasets; see the R packages Covid19CanadaETL, Covid19CanadaData and Covid19CanadaDataProcess for this existing and ongoing effort.