Purpose
- Explain how the Central Data Team harvests HRL and external datasets, then converts them into interoperable program assets.
- Show how ingestion supports timely analysis, catalog updates, and downstream reporting.
Pipeline requirements
- Version-controlled R/Python pipelines with containers, automation, and CI/CD checks.
- Provenance capture (source DOI, source version, commit hashes, processing parameters); see the provenance sketch after this list.
- Ability to ingest static publication releases and synthesis products.
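A minimal sketch of recording the provenance fields listed above when ingesting a static release. The file names, sidecar-JSON convention, and example DOI are illustrative assumptions, not fixed program conventions.

```python
# Illustrative provenance sidecar writer; paths, field names, and the
# sidecar-JSON convention are assumptions, not program standards.
import hashlib
import json
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def write_provenance(data_path: Path, source_doi: str, source_version: str,
                     params: dict) -> Path:
    """Record source DOI, source version, pipeline commit hash, and
    processing parameters in a JSON sidecar next to the ingested file."""
    # Checksum so downstream users can verify the file they received.
    sha256 = hashlib.sha256(data_path.read_bytes()).hexdigest()
    # Commit hash of the pipeline repository (assumes a git checkout;
    # falls back to "unknown" outside one).
    try:
        commit = subprocess.check_output(
            ["git", "rev-parse", "HEAD"], text=True).strip()
    except (subprocess.CalledProcessError, FileNotFoundError):
        commit = "unknown"
    record = {
        "source_doi": source_doi,
        "source_version": source_version,
        "pipeline_commit": commit,
        "processing_parameters": params,
        "file_sha256": sha256,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }
    sidecar = data_path.parent / (data_path.name + ".provenance.json")
    sidecar.write_text(json.dumps(record, indent=2))
    return sidecar


if __name__ == "__main__":
    # Hypothetical static release written locally so the example runs as-is.
    demo = Path("dataset.csv")
    demo.write_text("site,species,count\nA,cod,3\n")
    write_provenance(demo, source_doi="10.0000/example",
                     source_version="v1.2", params={"crs": "EPSG:4326"})
```

Capturing the pipeline commit hash alongside the source DOI, version, and parameters lets any derived asset be traced back to the exact pipeline state that produced it.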
Harmonization standards
- Schema alignment (column names, units, data types) and tidy-data expectations; see the harmonization sketch after this list.
- Controlled vocabularies for species, habitats, locations, and QA codes.
- Missing-value conventions and spatial reference requirements.
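A minimal harmonization sketch in Python/pandas. The column map, unit conversion, species vocabulary, and missing-value sentinels below are illustrative assumptions; the program's controlled vocabularies and schema definitions remain authoritative, and spatial reference handling (e.g., reprojection) is omitted here.

```python
# Illustrative harmonization step; column names, vocabulary codes, and
# sentinel values are assumptions for the example only.
import numpy as np
import pandas as pd

COLUMN_MAP = {"Site": "site_id", "SpeciesName": "species", "Len_mm": "length_cm"}
SPECIES_VOCAB = {"atlantic cod": "GADMOR", "haddock": "MELAEG"}
MISSING_SENTINELS = [-999, "NA", ""]


def harmonize(df: pd.DataFrame) -> pd.DataFrame:
    out = df.rename(columns=COLUMN_MAP)
    # Apply the missing-value convention before unit conversion so sentinel
    # values are not swept into real measurements.
    out = out.replace(MISSING_SENTINELS, np.nan)
    # Unit alignment: source reports length in millimetres, schema wants cm.
    out["length_cm"] = pd.to_numeric(out["length_cm"], errors="coerce") / 10.0
    # Map free-text species names onto controlled-vocabulary codes;
    # unmapped names become NaN and are surfaced by later QA checks.
    out["species"] = out["species"].str.strip().str.lower().map(SPECIES_VOCAB)
    # Enforce the data types expected by the program schema.
    out["site_id"] = out["site_id"].astype("string")
    return out


if __name__ == "__main__":
    raw = pd.DataFrame({
        "Site": ["A1", "A2"],
        "SpeciesName": ["Atlantic cod", "Haddock "],
        "Len_mm": [523, -999],
    })
    print(harmonize(raw))
```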
Quality management
- Automated schema validation, cross-dataset consistency checks, and program-level gates (row counts, uniqueness, bounding boxes); see the gate-check sketch after this list.
- Error logs stored alongside each dataset, plus a defined remediation workflow.
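A minimal sketch of program-level gate checks with a co-located error log. The row-count threshold, key columns, coordinate column names, and bounding box are illustrative assumptions, not the program's actual gate values.

```python
# Illustrative gate checks; thresholds, key columns, and the bounding box
# are assumptions for the example only.
import json
from pathlib import Path

import pandas as pd

MIN_ROWS = 1
KEY_COLUMNS = ["site_id", "sample_date", "species"]
BBOX = {"lon_min": -180.0, "lon_max": 180.0, "lat_min": -90.0, "lat_max": 90.0}


def run_gates(df: pd.DataFrame, dataset_name: str, out_dir: Path) -> bool:
    errors = []
    # Gate 1: minimum row count.
    if len(df) < MIN_ROWS:
        errors.append(f"row count {len(df)} below minimum {MIN_ROWS}")
    # Gate 2: uniqueness of the composite key.
    dup = int(df.duplicated(subset=KEY_COLUMNS).sum())
    if dup:
        errors.append(f"{dup} duplicate rows on key {KEY_COLUMNS}")
    # Gate 3: coordinates fall inside the expected bounding box.
    bad_coords = int((
        (df["longitude"] < BBOX["lon_min"]) | (df["longitude"] > BBOX["lon_max"])
        | (df["latitude"] < BBOX["lat_min"]) | (df["latitude"] > BBOX["lat_max"])
    ).sum())
    if bad_coords:
        errors.append(f"{bad_coords} rows outside bounding box {BBOX}")
    # Write the error log next to the dataset so it travels with the asset.
    report = {"dataset": dataset_name, "passed": not errors, "errors": errors}
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / f"{dataset_name}.qa.json").write_text(json.dumps(report, indent=2))
    return not errors
```

Storing the QA report beside the dataset is what allows the remediation workflow to pick up failures without a separate tracking system.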
Infrastructure and access
- Cloud-native deployment guidance, container registries, and scheduling/orchestration patterns.
- Flagging and routing sensitive datasets for special handling during storage and serving; see the routing sketch after this list.
- Deliverables for downstream teams (harmonized dataset, machine-readable schema, QA reports).
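A minimal sketch of sensitivity-flag routing. The metadata field name, tier values, and local storage prefixes stand in for whatever buckets or access zones the program actually uses and are assumptions for illustration only.

```python
# Illustrative sensitivity routing; the flag name, tiers, and storage
# prefixes are assumptions, not the program's access policy.
import json
import shutil
from pathlib import Path

# Assumed mapping from sensitivity tier to storage root.
ROUTES = {
    "public": Path("storage/public"),
    "restricted": Path("storage/restricted"),
}


def route_dataset(data_path: Path, metadata_path: Path) -> Path:
    """Read the dataset's sidecar metadata and copy it into the storage
    area matching its sensitivity tier; unknown tiers default to restricted."""
    meta = json.loads(metadata_path.read_text())
    tier = meta.get("sensitivity", "restricted")
    dest_root = ROUTES.get(tier, ROUTES["restricted"])
    dest_root.mkdir(parents=True, exist_ok=True)
    dest = dest_root / data_path.name
    shutil.copy2(data_path, dest)
    # Keep the metadata sidecar with the data so provenance travels too.
    shutil.copy2(metadata_path, dest_root / metadata_path.name)
    return dest
```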