Data governance and roles
The HRL Science Program relies on a Science Committee, a distributed, multi-agency governance structure designed to support high-quality, management-relevant science across the Sacramento River watershed and Bay-Delta estuary over the program’s eight-year duration. Part of the HRL Science Committee’s governance structure addresses data governance: who does what in data collection, publication, and analysis; how data move through the program; and how decisions are made at each stage of the HRL data lifecycle.
Data governance is a set of processes that ensures that data assets are formally managed throughout an enterprise. A data governance model establishes authority and management and decision-making parameters related to the data produced or managed by the enterprise.
Definition drawn from the National Institute of Standards and Technology.
The HRL data governance model balances the independence of Data Producers with shared program-wide standards, centralized data engineering capacity, and strategic oversight from the HRL Science Committee.
For a description of how data move from collection to analysis and reporting, see the HRL Data Lifecycle section of this site.
Why governance matters
HRL is evaluating dozens of hypotheses across multiple spatial scales, habitat types, and time periods. The HRL Science Program will last at least eight years. The complexity and duration of this collaborative research program necessitate:
- Clear accountability for data collection and stewardship
- Documented standards for data collection, management, publication, and analysis
- Trusted data repositories and open, reproducible workflows for cleaning, publishing, and synthesizing data
- Feedback loops that allow HRL to learn and adapt
- Transparent decision-making around priorities, resources, and changes to standards
Well-defined governance ensures that HRL’s annual and triennial synthesis products are generated collaboratively and are scientifically defensible, reproducible, and comparable across years, tributaries, and agencies.
Governance structure overview
The HRL data governance framework is built around four interacting groups:
- Data Producers generate and publish data.
- The Central Data Team curates, standardizes, and serves data.
- Synthesis Teams analyze curated data and generate synthesis products.
- The HRL Science Committee provides program-level oversight and prioritization.
Each group plays a distinct role in guiding data through collection, static publication, ingestion and standardization, storage and serving, analysis and synthesis, and reporting and communication, with clearly defined responsibilities and decision authorities.
The figure below illustrates how these roles interact across the HRL data lifecycle. The HRL Science Committee plays an oversight role for the entire data lifecycle but does not directly implement any phase of the data lifecycle.
Roles and responsibilities
Data Producers
- Primary domain: field, lab, and model-based data collection
- Key outputs: quality-controlled and published datasets and metadata
- Lifecycle focus: collection, static publication
Who they are:
System governance entities, HRL signatories, consultants, and partner organizations responsible for creating original data under the HRL program.
Core responsibilities:
- Collect or generate raw data using standardized protocols approved by the HRL Science Committee through system science plan review
- Capture complete metadata at the point of collection
- Apply quality management (QA/QC) practices
- Prepare and publish static datasets and metadata in trusted HRL program repositories (e.g., EDI) using reproducible code hosted under the HRL GitHub organization
- Notify the Central Data Team of new data publications and updates
Lifecycle phases:
- Collection
- Static publication
Decision authority:
- Select data collection methods within agreed-upon protocols
- Determine timing of dataset release within governance rules (typically within one year of data collection)
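As an illustration of the release-timing rule above, the following minimal Python sketch checks whether a dataset was published within the expected window. It assumes that "within one year" means 365 days after the last day of data collection; the names `RELEASE_WINDOW`, `release_deadline`, and `is_release_timely` are hypothetical and are not part of any HRL tool.

```python
from datetime import date, timedelta

# Assumption: "within one year of data collection" is read as
# 365 days after the final day of collection. Adjust as needed.
RELEASE_WINDOW = timedelta(days=365)


def release_deadline(collection_end: date) -> date:
    """Return the latest publication date that still meets the window."""
    return collection_end + RELEASE_WINDOW


def is_release_timely(collection_end: date, published: date) -> bool:
    """Return True if the dataset was published on or before the deadline."""
    return published <= release_deadline(collection_end)


if __name__ == "__main__":
    collection_end = date(2025, 6, 30)
    print(release_deadline(collection_end))
    print(is_release_timely(collection_end, date(2026, 3, 15)))
```

Because the governance rule says "typically", a check like this is better suited to flagging datasets for follow-up than to blocking a publication.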
Central Data Team
- Primary domain: data engineering, architecture, and shared analysis and visualization tools
- Key outputs: curated datasets, data catalogs, APIs, software development kits (SDKs), dashboards, decision support tools, and standards
- Lifecycle focus: ingestion and standardization, storage and serving, analysis and synthesis (support), reporting and communication
Who they are:
Interagency technical experts (data engineers and data scientists) who design and operate the program’s shared data, analysis, and technology infrastructure.
Core responsibilities:
- Develop and maintain the central data and technology architecture
- Maintain the HRL Style and Development Guide and program-wide coding standards
- Ingest, harmonize, and standardize datasets (ETL/ELT), including HRL-produced and relevant external datasets
- Maintain databases, catalogs, APIs, and SDKs for accessing HRL data
- Provide documentation, tools, dashboards, and automated reporting
- Guide policies and implementations for sensitive data management, including CARE-aligned practices for Tribal data
- Support synthesis teams with modeling infrastructure, workflows, and technical assistance
- Facilitate feedback cycles across HRL parties to improve data quality, usability, and documentation
Lifecycle phases:
- Ingestion and standardization
- Storage and serving
- Analysis and synthesis (in support role)
- Reporting and communication
Decision authority:
- Set technical standards for metadata, schema consistency, coding and development practices, and program-level quality management
- Manage curated dataset updates and semantic versioning on a reasonable timeline
- Recommend resourcing, staffing, and prioritization to the HRL Science Committee
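To make semantic versioning of curated datasets concrete, here is a minimal Python sketch of a MAJOR.MINOR.PATCH version that is bumped according to the kind of change. The mapping of change types to version parts is an assumption for illustration (breaking schema change, additive change, or value fix), and the `DatasetVersion` class is hypothetical; the HRL Style and Development Guide remains the authority on the actual rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class DatasetVersion:
    """A MAJOR.MINOR.PATCH version for a curated dataset (illustrative)."""

    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "DatasetVersion":
        """Parse a string such as '1.4.2' into a DatasetVersion."""
        major, minor, patch = (int(part) for part in text.split("."))
        return cls(major, minor, patch)

    def bump(self, change: str) -> "DatasetVersion":
        """Return the next version for a given change type.

        Assumed mapping (hypothetical, not an HRL rule):
          'breaking' -> column removed or renamed, units changed
          'additive' -> new columns or new records added
          'fix'      -> corrected values, no schema change
        """
        if change == "breaking":
            return DatasetVersion(self.major + 1, 0, 0)
        if change == "additive":
            return DatasetVersion(self.major, self.minor + 1, 0)
        if change == "fix":
            return DatasetVersion(self.major, self.minor, self.patch + 1)
        raise ValueError(f"unknown change type: {change!r}")

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"


if __name__ == "__main__":
    current = DatasetVersion.parse("1.4.2")
    print(current.bump("additive"))
    print(current.bump("breaking"))
```

Comparing versions as integer tuples rather than strings keeps the ordering correct once a component reaches two digits, so 1.10.0 sorts after 1.9.5.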
Synthesis Teams
- Primary domain: modeling, analysis, and interpretation
- Key outputs: synthesis datasets, indicators, models, and reports
- Lifecycle focus: analysis and synthesis, reporting and communication (support)
Who they are:
Interdisciplinary scientists working on HRL hypotheses and synthesis products.
Core responsibilities:
- Synthesize curated, analysis-ready datasets provided by the Central Data Team
- Conduct reproducible analyses and modeling in accordance with the HRL Style and Development Guide
- Produce synthesis datasets, indicators, models, and reports using version-controlled workflows
- Provide structured feedback on data quality, usability, and gaps to Data Producers and the Central Data Team
- Suggest adaptive management actions to the HRL Science Committee based on synthesis findings and research needs
- Identify future data and analysis priorities based on synthesis findings
Lifecycle phases:
- Analysis and synthesis
- Reporting and communication (in support role)
Decision authority:
- Influence prioritization through documented feedback
- Propose new data needs and analytical directions
- Determine best statistical and modeling methods to investigate HRL hypotheses
HRL Science Committee
- Primary domain: program-level governance and prioritization
- Key outputs: resourcing decisions, policy guidance, and governance approvals
- Lifecycle focus: cross-cutting oversight across all phases
Who they are:
HRL Science Committee members who ensure that HRL’s science, data practices, and investments remain aligned with program goals and charter commitments.
Core responsibilities:
- Oversee resource allocation and prioritization across HRL science workstreams
- Resolve tensions between data producer independence and program-wide standardization
- Identify and support capacity and training needs related to data, modeling, and open science
- Request decision support tools, dashboards, and frontend products from the Central Data Team
Lifecycle phases:
- Cross-cutting across the entire lifecycle, with particular focus on major program decisions and trade-offs
Decision authority:
- Allocate financial and human resources
- Approve or revise major governance policies and standards
- Escalate critical unresolved issues to the HRL Systemwide Governance Committee
How these roles work together
Across the data lifecycle, governance mechanisms and data product quality gates ensure that:
- Every dataset has clear provenance, from field collection and static publication to curated HRL datasets and synthesis products
- Every dataset and analysis is traceable, citable, and reproducible
- Data can be harmonized and compared across systems, habitat types, and years
- Synthesis teams have timely, reliable access to analysis-ready data and tools
- The HRL Science Program learns and adapts based on documented feedback and evaluation
Together, these roles and structures provide the foundation for evidence-based decision-making, adaptive management, long-term data infrastructure stewardship, and transparent public reporting on HRL’s ecological outcomes.
For more detail on data practices at each stage of the lifecycle, see the HRL Data Lifecycle section.