Reproducible R workflows

targets, renv, and GitHub Actions

Lucy Andrews

2026-05-14

What do we mean by reproducibility?

The same inputs and code can regenerate the same outputs
The project records the software environment needed to run
The workflow makes dependencies and execution order explicit
Changes can be checked automatically before results are trusted

Sometimes projects are reproducible only in the author’s head!

Scripts must be run in the “right” order
Some outputs are saved manually
Package versions are unknown
Results may be stale relative to the code
No automated check confirms that the project still runs

A reproducible project has a visible chain of evidence

Three recurring reproducibility problems

Problem	Tool	What it protects against
“What order do I run the scripts in?”	`targets`	Hidden dependencies, stale outputs
“Why does this work on my machine but not yours?”	`renv`	Package version drift
“How do I know this still works?”	GitHub Actions	Silent breakage, collaboration risk

Worked example: Palmer penguins

Analytical question: How do body mass patterns differ among penguin species and islands?

Illustration by Allison Horst

Worked example: Palmer penguins

The workflow produces:

Cleaned analysis dataset
Summary tables by species and island
Exploratory figure
Simple linear regression model
Rendered Quarto report

The workflow produces analysis artifacts

outputs/
├── penguins_clean.rds
├── species_summary.csv
├── body_mass_plot.png
├── model_summary.rds
└── report.html

The pipeline defines what the products are and how to make them.

A reproducible repository has a shape

reproducible-r-workflow-demo/
├── _targets.R
├── renv.lock
├── .Rprofile
├── README.md
├── *.Rproj
│
├── R/
│   ├── data.R
│   ├── clean.R
│   ├── summarize.R
│   ├── visualize.R
│   └── model.R
│
├── reports/
│   └── penguins_report.qmd
│
├── slides/
│   └── reproducible_r_workflows.qmd
│
├── outputs/
│
├── data/
│
└── .github/
    └── workflows/
        ├── run-targets.yaml
        └── publish-slides.yaml

What belongs where?

Location	Purpose
`_targets.R`	Defines the workflow
`R/`	Stores reusable analysis functions
`reports/`	Quarto reports that consume pipeline outputs
`slides/`	Presentation materials
`renv.lock`	Records package versions
`.github/workflows/`	Automated checks
`outputs/`	Derived analysis products

`targets` turns an analysis into a pipeline

Defines the steps of an analysis as a list of targets
Tracks dependencies among objects and files
Skips work that is already up to date
Rebuilds downstream results when upstream code or data change

Scripts suggest order implicitly; pipelines make it explicit

Script sequence

source("01_clean_data.R")
source("02_summarize.R")
source("03_figures.R")
source("04_report.R")

This order is a convention. It is not enforced.

Pipeline

library(targets)
library(tarchetypes)  # provides tar_quarto()

tar_source()  # load the functions defined in R/

list(
  tar_target(raw_penguins,
    get_penguins()),
  tar_target(clean_penguins,
    clean_penguins_data(raw_penguins)),
  tar_target(species_summary,
    summarize_by_species(clean_penguins)),
  tar_quarto(report,
    "reports/penguins_report.qmd")
)
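
The pipeline calls functions that live in R/, but their bodies are not shown on these slides. The following is only a minimal base R sketch of what clean_penguins_data() and summarize_by_species() might look like. The column names (species, island, body_mass_g) follow the palmerpenguins dataset; the function bodies and the toy data frame are assumptions for illustration, not the demo repository's code.

```r
# Hypothetical sketch of two functions that could live in R/clean.R and
# R/summarize.R. The real implementations in the demo repository may differ.

# Keep only rows with a recorded body mass.
clean_penguins_data <- function(raw) {
  cleaned <- raw[!is.na(raw$body_mass_g), , drop = FALSE]
  rownames(cleaned) <- NULL
  cleaned
}

# Mean body mass and sample size per species.
summarize_by_species <- function(clean) {
  means <- aggregate(body_mass_g ~ species, data = clean, FUN = mean)
  counts <- aggregate(body_mass_g ~ species, data = clean, FUN = length)
  data.frame(
    species = means$species,
    n = counts$body_mass_g,
    mean_body_mass_g = means$body_mass_g
  )
}

# Toy stand-in for the raw data (made-up values, not real measurements).
toy_raw <- data.frame(
  species = c("Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap"),
  island = c("Torgersen", "Biscoe", "Biscoe", "Biscoe", "Dream"),
  body_mass_g = c(3700, NA, 5000, 5200, 3800)
)

toy_clean <- clean_penguins_data(toy_raw)
toy_summary <- summarize_by_species(toy_clean)
print(toy_summary)
```

Because each step is a plain function of its inputs, targets can call it, store the result, and detect when the function body changes.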

The workflow can be visualized

targets::tar_visnetwork() renders the dependency graph interactively.

What needs to rerun?

If clean_penguins_data() changes, targets rebuilds:

clean_penguins
species_summary, island_summary
body_mass_plot, body_mass_model
report

Unrelated targets are skipped.

targets::tar_outdated()   # shows what needs to run
targets::tar_make()       # runs only what is out of date
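
To see this behavior without touching a real project, the sketch below builds a throwaway three-target pipeline in a temporary directory, edits the upstream command, and asks targets what is stale. The target names (raw, total, unrelated) and their toy commands are invented for this demo; only tar_script(), tar_make(), and tar_outdated() come from the targets package.

```r
library(targets)

# Work in a throwaway directory so the demo leaves no files in your project.
demo_dir <- tempfile("targets_demo_")
dir.create(demo_dir)
old_wd <- setwd(demo_dir)

# Write a three-target _targets.R: total depends on raw; unrelated stands alone.
tar_script({
  library(targets)
  list(
    tar_target(raw, c(1, 2, 3)),
    tar_target(total, sum(raw)),
    tar_target(unrelated, "constant")
  )
}, ask = FALSE)

tar_make()                         # first run builds everything
outdated_before <- tar_outdated()  # nothing should need rerunning now

# Change only the upstream command, then ask what is stale.
tar_script({
  library(targets)
  list(
    tar_target(raw, c(1, 2, 3, 4)),
    tar_target(total, sum(raw)),
    tar_target(unrelated, "constant")
  )
}, ask = FALSE)

outdated_after <- tar_outdated()
print(outdated_before)
print(outdated_after)

setwd(old_wd)
```

In this sketch, raw and its downstream target total should be reported as outdated after the edit, while unrelated stays current.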

The same code can behave differently in a different R environment

Packages change over time
Collaborators may have different package versions installed
A lockfile records the exact package environment
Continuous integration starts from a completely clean machine

`renv` records the project environment

# Start project-specific dependency tracking
renv::init()

# Record current package versions to renv.lock
renv::snapshot()

# Recreate the recorded environment elsewhere
renv::restore()

The lockfile becomes part of the reproducible record — commit it.
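
Since renv.lock is plain JSON, it can be inspected like any other data file. The sketch below writes a tiny illustrative lockfile to a temporary path and lists the versions it pins. It assumes the jsonlite package is installed; the overall shape (an "R" section plus a "Packages" section) follows renv's lockfile format, but the package version numbers are placeholders, not the demo repository's actual versions.

```r
library(jsonlite)

# Illustrative lockfile fragment with placeholder package versions.
lock_path <- tempfile(fileext = ".lock")
writeLines(c(
  '{',
  '  "R": { "Version": "4.5.2" },',
  '  "Packages": {',
  '    "targets": { "Package": "targets", "Version": "0.0.1", "Source": "Repository" },',
  '    "renv":    { "Package": "renv",    "Version": "0.0.2", "Source": "Repository" }',
  '  }',
  '}'
), lock_path)

# Parse into nested lists rather than simplified data frames.
lock <- fromJSON(lock_path, simplifyVector = FALSE)

# One row per locked package: the name and the exact version recorded.
locked_versions <- data.frame(
  package = names(lock$Packages),
  version = vapply(lock$Packages, function(p) p$Version, character(1)),
  row.names = NULL
)

print(lock$R$Version)
print(locked_versions)
```

In a real project, pointing lock_path at "renv.lock" would list what renv::restore() is going to install.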

What gets committed?

File or folder	Commit?	Why
`renv.lock`	✅ Yes	Records package versions
`renv/activate.R`	✅ Yes	Activates the project environment
`renv/settings.json`	✅ Yes	Records project renv settings
`renv/library/`	❌ No	Project package library; machine-specific and rebuilt by `renv::restore()`

Continuous integration asks: does this project still run from scratch?

Runs checks on a clean machine with no local state
Detects missing or unrecorded dependencies
Detects broken code after changes
Makes reproducibility visible to collaborators

The GitHub Action restores and runs the workflow

name: Run targets workflow
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  run-targets:
    runs-on: ubuntu-latest
    steps:
      - name: Check out repository
        uses: actions/checkout@v4
      - name: Set up R
        uses: r-lib/actions/setup-r@v2
        with:
          r-version: '4.5.2'
          use-public-rspm: true
      - name: Set up pandoc
        uses: r-lib/actions/setup-pandoc@v2
      - name: Set up Quarto
        uses: quarto-dev/quarto-actions/setup@v2
      - name: Install system dependencies
        run: sudo apt-get update && sudo apt-get install -y libglpk-dev
      - name: Restore R package environment
        uses: r-lib/actions/setup-renv@v2
      - name: Run targets pipeline
        run: Rscript -e 'targets::tar_make()'
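
A run that exits cleanly does not by itself show that every expected artifact was produced. As one possible follow-up step, the sketch below checks a directory for the files named in the outputs/ listing earlier in the slides. The helper missing_outputs() is hypothetical and is not part of the demo workflow; the demo uses a temporary directory containing only two of the five files.

```r
# Expected artifacts, taken from the outputs/ listing earlier in the slides.
expected_outputs <- c(
  "penguins_clean.rds",
  "species_summary.csv",
  "body_mass_plot.png",
  "model_summary.rds",
  "report.html"
)

# Hypothetical helper: return the expected files that are absent from `dir`.
missing_outputs <- function(dir, expected = expected_outputs) {
  expected[!file.exists(file.path(dir, expected))]
}

# Demo in a temporary directory that contains only two of the five files.
demo_outputs <- tempfile("outputs_")
dir.create(demo_outputs)
file.create(file.path(demo_outputs, c("penguins_clean.rds", "report.html")))

still_missing <- missing_outputs(demo_outputs)
print(still_missing)

# In CI, a non-empty result could be turned into a failing step, e.g.
# if (length(still_missing) > 0) stop("Missing outputs: ", toString(still_missing))
```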

CI is a reproducibility check, not scientific peer review

CI helps confirm

The workflow runs on a clean machine
Dependencies are declared
Outputs can be regenerated
Code changes do not break execution

CI does not confirm

The analysis is scientifically correct
The model is appropriate
Input data are valid
Results are meaningful

The reproducibility stack

Layer	Question it answers
Repository structure	Where is everything?
R functions	What does each step do?
`targets`	What depends on what?
`renv`	What software versions are needed?
GitHub Actions	Does it still run elsewhere?
Quarto	How are results communicated?

Adoption can be incremental

Put the analysis in a well-structured R project
Move repeated code into functions
Add renv and commit renv.lock
Add targets for the main workflow
Add GitHub Actions to run the workflow automatically
Add Quarto reports or slides that draw from pipeline outputs

Common pitfalls and better practices

Pitfall	Better practice
One giant script	Small functions plus a pipeline
Manual intermediate files	Define outputs as named targets
Untracked package versions	Commit `renv.lock`
CI installs the latest packages	Restore from `renv` lockfile
Reports do hidden analysis	Reports consume pipeline outputs
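
Two of these practices can be sketched together: an output file declared as a named target, and a report that only reads stored results. The example below runs in a temporary directory. The helper write_summary_csv() and the toy summary table are assumptions made for this sketch; format = "file" and tar_read() are part of the targets package.

```r
library(targets)

# Work in a throwaway directory so the demo leaves no files in your project.
demo_dir <- tempfile("targets_files_")
dir.create(demo_dir)
old_wd <- setwd(demo_dir)

tar_script({
  library(targets)

  # Hypothetical helper: write a table to disk and return the path, which is
  # what a format = "file" target must return so targets can track the file.
  write_summary_csv <- function(data, path) {
    dir.create(dirname(path), showWarnings = FALSE, recursive = TRUE)
    write.csv(data, path, row.names = FALSE)
    path
  }

  list(
    # Toy summary table with made-up counts.
    tar_target(
      species_summary,
      data.frame(species = c("Adelie", "Gentoo"), n = c(2L, 3L))
    ),
    # The CSV is a tracked pipeline product, not a manual export.
    tar_target(
      species_summary_file,
      write_summary_csv(species_summary, "outputs/species_summary.csv"),
      format = "file"
    )
  )
}, ask = FALSE)

tar_make()

# What a report chunk might do: read the stored target, compute nothing new.
summary_for_report <- tar_read(species_summary)
csv_written <- file.exists("outputs/species_summary.csv")
print(summary_for_report)

setwd(old_wd)
```

Because the CSV is a target, deleting or editing it by hand should mark it as outdated, so the next tar_make() regenerates it from the pipeline.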

Tutorial repo to be shared

https://github.com/lucy-dwr/reproducible-r-workflow-demo

Open _targets.R — each target is an analysis product.
Run targets::tar_visnetwork()
Run targets::tar_make()
Modify one plot label or summary function.
Run targets::tar_outdated()
Run targets::tar_make() again — only affected targets rebuild.
Open GitHub Actions and show the most recent successful run.

Reproducibility is a workflow, not a single tool

targets makes the analysis workflow explicit
renv records the software environment
GitHub Actions checks that the project still runs
Quarto turns reproducible outputs into communication products

The goal is not to make analyses more complicated. The goal is to make the complexity visible, testable, and easier to maintain.
