Ingest data into the HRL platform

Goal

Document the workflows for harvesting published datasets (from HRL and external sources) and standardizing them for program use.

Preconditions

Access to HRL GitHub and infrastructure, plus credentials for the source repositories and APIs.
Metadata about source datasets (DOI, version, schema expectations, sensitivity flags).
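The metadata prerequisites above can be captured as a small structured record and checked before any files are pulled. This is a minimal sketch assuming Python 3.9+: the class name, field names, and the loose DOI pattern are hypothetical illustrations rather than an HRL-defined format, and the DOI shown is a placeholder.

```python
import re
from dataclasses import dataclass, field

# Loose DOI shape check (assumption): "10." + registrant digits + "/" + non-space suffix.
# This is not a full validation of DOI syntax.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


@dataclass
class SourceDatasetMetadata:
    """Hypothetical record of what must be known about a source dataset before ingestion."""
    doi: str
    version: str
    expected_columns: dict[str, str]  # column name -> expected type, e.g. {"site_id": "string"}
    sensitivity_flags: list[str] = field(default_factory=list)  # empty list = nothing declared

    def problems(self) -> list[str]:
        """Return reasons this record is not ready for ingestion; empty means ready."""
        issues = []
        if not DOI_PATTERN.match(self.doi):
            issues.append(f"malformed DOI: {self.doi!r}")
        if not self.version.strip():
            issues.append("missing version")
        if not self.expected_columns:
            issues.append("no schema expectations recorded")
        return issues


# Placeholder values for illustration only
meta = SourceDatasetMetadata(
    doi="10.0000/example-dataset",
    version="v2.1.0",
    expected_columns={"site_id": "string", "value_mg_l": "number"},
    sensitivity_flags=["restricted_location"],
)
print(meta.problems() or "ready for ingestion")
```

Recording sensitivity flags in the same record means they travel with the dataset from the first step, rather than being rediscovered at publication time.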

Workflow outline

Retrieve the static dataset or synthesis output using the DOI/API and stage files securely.
Record provenance (source release, commit hashes) in ingestion configs.
Align schemas to HRL standards (columns, units, vocabularies, CRS) and apply data dictionaries.
Run automated validation suites, log issues, and resolve discrepancies with data producers.
Publish the harmonized dataset plus machine-readable schema to the storage/serving environment.
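The middle steps of the workflow (provenance, schema alignment, validation, and producing a machine-readable schema) can be sketched on a toy table. This is a minimal, standard-library-only illustration under stated assumptions: the data dictionary, column names, unit conversion, validation rules, provenance fields, the placeholder DOI and commit string, and the JSON field-list schema format are all hypothetical stand-ins, not HRL's actual standards or tooling. Retrieval and publishing are omitted because they depend on the source API and the serving environment.

```python
import hashlib
import json

# Hypothetical data dictionary: source column -> (HRL column, multiplier to the HRL unit or None)
DATA_DICTIONARY = {
    "SiteID": ("site_id", None),
    "Conc_ug_L": ("value_mg_l", 0.001),  # micrograms per litre -> milligrams per litre
}
# Hypothetical HRL target schema: column -> type name
HRL_SCHEMA = {"site_id": "string", "value_mg_l": "number"}


def harmonize(rows):
    """Rename source columns and convert units per DATA_DICTIONARY; empty strings become None."""
    out = []
    for row in rows:
        new = {}
        for src, (dst, factor) in DATA_DICTIONARY.items():
            value = row.get(src)
            if value == "":
                value = None
            if factor is not None and value is not None:
                value = float(value) * factor
            new[dst] = value
        out.append(new)
    return out


def validate(rows):
    """Return issue strings for the validation report; an empty list means every row passed."""
    issues = []
    for i, row in enumerate(rows):
        if row["site_id"] is None:
            issues.append(f"row {i}: missing site_id")
        value = row["value_mg_l"]
        if value is None:
            issues.append(f"row {i}: missing value_mg_l")
        elif value < 0:
            issues.append(f"row {i}: negative concentration {value}")
    return issues


def provenance(doi, release, commit, raw_bytes):
    """Provenance block for an ingestion config, plus a SHA-256 checksum of the raw payload."""
    return {
        "doi": doi,
        "release": release,
        "commit": commit,
        "sha256": hashlib.sha256(raw_bytes).hexdigest(),
    }


def schema_document():
    """Machine-readable schema as a JSON field list (the format is an assumption)."""
    fields = [{"name": name, "type": kind} for name, kind in HRL_SCHEMA.items()]
    return json.dumps({"fields": fields}, indent=2)


# Made-up example rows standing in for a retrieved source file
raw = [{"SiteID": "A1", "Conc_ug_L": "250"}, {"SiteID": "", "Conc_ug_L": "-5"}]
rows = harmonize(raw)
issues = validate(rows)
prov = provenance("10.0000/placeholder", "v1.0", "<commit-hash>", json.dumps(raw).encode("utf-8"))
print(json.dumps(prov, indent=2))
print("\n".join(issues) or "no issues")
print(schema_document())
```

Keeping the data dictionary as data rather than code means a standards change agreed with producers or governance leads updates one mapping instead of the transformation logic. The checksum complements the release and commit fields by detecting a source file that changed without a new release.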

Deliverables

Versioned curated dataset, validation reports, ingestion notes, and catalog-ready metadata.
Sensitive-data flags, routed to the storage/serving and reporting teams.

Collaboration and escalation

Guidance for coordinating with Data Producers/Synthesis Teams when questions arise.
Criteria for involving HRL Science Committee or governance leads when standards need updates.