Ingest data into the HRL platform

Goal

  • Capture workflows for harvesting published datasets (HRL and external) and standardizing them for program use.

Preconditions

  • HRL GitHub and infrastructure access, along with credentials for source repositories/APIs.
  • Metadata about source datasets (DOI, version, schema expectations, sensitivity flags).
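
As an illustration of the metadata listed above, a source dataset could be described by a small record like the following. This is a minimal sketch, not an HRL-defined schema: the class name, field names, DOI, and flag value are all hypothetical placeholders.

```python
from dataclasses import dataclass, asdict
import json


@dataclass(frozen=True)
class SourceDatasetMetadata:
    """Hypothetical pre-ingestion record for one source dataset."""

    doi: str
    version: str
    # Column name -> short note on the expected type and unit.
    expected_columns: dict[str, str]
    # Empty tuple means no sensitivity concerns were declared.
    sensitivity_flags: tuple[str, ...] = ()

    def is_sensitive(self) -> bool:
        """Return True when at least one sensitivity flag is set."""
        return len(self.sensitivity_flags) > 0

    def to_json(self) -> str:
        """Serialize the record so it can sit next to an ingestion config."""
        return json.dumps(asdict(self), sort_keys=True, indent=2)


# Placeholder values for illustration only.
example = SourceDatasetMetadata(
    doi="10.0000/example-doi",
    version="1.2.0",
    expected_columns={"site_id": "string", "depth_m": "float, metres"},
    sensitivity_flags=("location-restricted",),
)
```

Keeping the record immutable (frozen) makes it harder to change the declared expectations by accident partway through an ingestion run.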

Workflow outline

  • Retrieve the static dataset or synthesis output using the DOI/API and stage files securely.
  • Record provenance (source release, commit hashes) in ingestion configs.
  • Align schemas to HRL standards (columns, units, vocabularies, CRS) and apply data dictionaries.
  • Run automated validation suites, log issues, and resolve discrepancies with data producers.
  • Publish the harmonized dataset plus machine-readable schema to the storage/serving environment.
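
The provenance, alignment, and validation steps above can be sketched in a few lines of standard-library Python. Everything named here is an assumption made for illustration: the column rules, the required columns, the negative-depth check, and the DOI and commit values are hypothetical and do not represent HRL standards or tooling.

```python
import hashlib

# Hypothetical rules: source column -> (standard column, converter).
COLUMN_RULES = {
    "SiteID": ("site_id", str),
    "Depth_cm": ("depth_m", lambda cm: float(cm) / 100.0),  # cm -> m
}
# Hypothetical set of columns every harmonized row must contain.
REQUIRED_COLUMNS = {"site_id", "depth_m"}


def build_provenance(doi, source_release, pipeline_commit, payload):
    """Collect provenance fields plus a checksum of the staged bytes."""
    return {
        "doi": doi,
        "source_release": source_release,
        "pipeline_commit": pipeline_commit,
        "sha256": hashlib.sha256(payload).hexdigest(),
    }


def align_row(row):
    """Rename columns and convert units according to COLUMN_RULES.

    Columns without a rule are passed through unchanged.
    """
    aligned = {}
    for source_name, value in row.items():
        target, convert = COLUMN_RULES.get(source_name, (source_name, lambda v: v))
        aligned[target] = convert(value)
    return aligned


def validate_rows(rows):
    """Return a list of human-readable issues; an empty list means no issues."""
    issues = []
    for index, row in enumerate(rows):
        missing = REQUIRED_COLUMNS - set(row)
        if missing:
            issues.append(f"row {index}: missing columns {sorted(missing)}")
        depth = row.get("depth_m")
        if depth is not None and depth < 0:
            issues.append(f"row {index}: negative depth_m {depth}")
    return issues


# Placeholder inputs for illustration only.
raw_rows = [
    {"SiteID": "A1", "Depth_cm": "250"},
    {"SiteID": "B2", "Depth_cm": "-10"},
]
provenance = build_provenance(
    doi="10.0000/example-doi",
    source_release="v1.2.0",
    pipeline_commit="0000000",
    payload=b"SiteID,Depth_cm\nA1,250\nB2,-10\n",
)
aligned_rows = [align_row(row) for row in raw_rows]
issues = validate_rows(aligned_rows)
```

Returning issues as a list, rather than raising on the first failure, keeps the full set of discrepancies available for the validation report and for follow-up with data producers.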

Deliverables

  • Versioned curated dataset, validation reports, ingestion notes, and catalog-ready metadata.
  • Sensitive-data flags, routed to the storage/serving and reporting teams.

Collaboration and escalation

  • Guidance for coordinating with Data Producers/Synthesis Teams when questions arise.
  • Criteria for involving the HRL Science Committee or governance leads when standards need updates.