Posit Data Science Platform

Author

Lucy Andrews

Published

May 2026

Warning: Working draft

This is an internal planning document, not a policy or design specification.

HRL data scientists work primarily in R and Python and are familiar with RStudio, Positron, JupyterLab, and Visual Studio Code. The Azure stack and pipelines described on the Data Pipeline Architecture page handle data ingestion, validation, standardization, storage, and public-facing APIs, but their services (Container Apps, API Management, App Service, etc.) are designed for infrastructure engineers, not for ecologists who run analyses and build Shiny applications. A data scientist should not need to understand Azure DevOps to share a dashboard with a colleague.

Posit Workbench and Posit Connect fill that gap. Together they give HRL data scientists a familiar, cloud-hosted development environment and a publishing platform for analytical content — for example, Shiny applications, Quarto reports, scheduled scripts, and APIs — without requiring them to learn the underpinning cloud infrastructure. These are distinct products that serve different purposes; both are made by Posit, PBC (formerly RStudio).

Note

This page describes the data science tooling layer. It complements the data engineering layer described on the Data Pipeline Architecture page; it does not replace it. Both are needed.

How the two stacks relate

The Azure data pipeline is concerned with how data gets in, gets cleaned, and gets stored. The Posit platform is concerned with how data scientists access and publish from that store. Data scientists working in Posit Workbench read from the same PostgreSQL/PostGIS database and Blob Storage that the pipeline writes to; they do not interact with the pipeline itself. They can also import data from other sources, such as external repositories (for example, EDI) and manually uploaded or ad-hoc files.

flowchart LR
    subgraph pipe ["Azure data pipeline"]
        direction TB
        DB[("<b>PostgreSQL + PostGIS</b><br/>Standardized spatial data")]
        ST[("<b>Blob Storage</b><br/>Export files")]
    end

    subgraph ext ["Other data sources"]
        direction TB
        EDI["<b>EDI</b> and other<br/>external repositories"]
        MAN["Manual uploads<br/>and ad-hoc files"]
    end

    subgraph posit ["Posit platform"]
        direction TB
        PW["<b>Posit Workbench</b><br/>Development environment<br/>RStudio · VS Code · Jupyter"]
        PC["<b>Posit Connect</b><br/>Publishing platform"]
    end

    DB -->|"read (e.g., via DBI,<br/>RPostgres, sf)"| PW
    ST -->|"read (e.g., via<br/>AzureStor, arrow)"| PW
    EDI -->|"read (e.g., via<br/>EDIutils, httr2)"| PW
    MAN --> PW
    PW -->|publish| PC

    PC --> A["Shiny apps"]
    PC --> B["Quarto documents<br/>and reports"]
    PC --> C["Plumber APIs"]
    PC --> D["Scheduled scripts<br/>and email reports"]

    classDef azure fill:#D2EAF4,stroke:#2E7DA1,stroke-width:2px,color:#0C425C
    classDef posit fill:#C4D8C6,stroke:#2E6E3D,stroke-width:2px,color:#2E6E3D
    classDef external fill:#F7F7F7,stroke:#777777,stroke-width:1.5px,color:#333333
    class DB,ST azure
    class PW,PC posit
    class EDI,MAN external
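
For Python users, the database read path in the diagram might look like the following sketch. It is an illustration under stated assumptions, not a description of the deployed configuration: it assumes connection settings are exposed as the standard libpq environment variables (PGHOST, PGPORT, PGDATABASE, PGUSER, PGPASSWORD), that SQLAlchemy, a PostgreSQL driver such as psycopg2, and GeoPandas are installed, and that a table called `monitoring_sites` with a geometry column `geom` exists. The table, column names, and default values are hypothetical.

```python
import os
from urllib.parse import quote


def build_pg_url(env=None):
    """Assemble a SQLAlchemy-style PostgreSQL URL from libpq-style variables."""
    env = os.environ if env is None else env
    # Percent-encode credentials so characters such as "@" cannot break the URL
    user = quote(env.get("PGUSER", "analyst"), safe="")   # hypothetical default
    password = quote(env.get("PGPASSWORD", ""), safe="")
    host = env.get("PGHOST", "localhost")
    port = env.get("PGPORT", "5432")
    database = env.get("PGDATABASE", "hrl")               # hypothetical name
    auth = f"{user}:{password}" if password else user
    return f"postgresql://{auth}@{host}:{port}/{database}"


def read_sites(url):
    """Read a spatial layer into a GeoDataFrame with a read-only query."""
    # Imported lazily so build_pg_url() works without these packages
    import geopandas as gpd
    from sqlalchemy import create_engine

    # Hypothetical table and columns; replace with a real standardized layer
    sql = "SELECT site_id, site_name, geom FROM monitoring_sites"
    engine = create_engine(url)
    return gpd.read_postgis(sql, engine, geom_col="geom")
```

In a session, `read_sites(build_pg_url())` would return a GeoDataFrame. Keeping credentials in environment variables rather than in scripts means the same code can later be published to Connect without edits, assuming the same variables are set there.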

Posit Workbench

Posit Workbench is a server-based development environment — think of it as an integrated development environment (IDE) such as RStudio running in the cloud rather than on a local laptop, accessible through a browser. Multiple users can connect simultaneously, each with their own isolated session, and an individual user can run multiple sessions simultaneously.

For HRL data scientists, Workbench means:

  • A consistent, pre-configured R and Python environment that does not vary by machine or operating system
  • Access to the DWR data layers (e.g., PostgreSQL, Blob Storage, Snowflake, Databricks) using standard R and Python packages, without needing to configure Azure credentials locally
  • Enough memory and compute to work with large datasets (spatial or otherwise) that would be slow or impractical on a laptop
  • Support for RStudio, VS Code, and Jupyter IDEs — users can work in whichever they prefer

Workbench is where code is written, data are explored, models are built, analyses are run, and applications are developed before they are published.
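
The point about reading Blob Storage without locally configured Azure credentials could be sketched as follows. This is a minimal example under assumptions: it presumes the Workbench host has an identity that `DefaultAzureCredential` can resolve (for example, a managed identity), that the `azure-identity`, `azure-storage-blob`, `pandas`, and `pyarrow` packages are installed, and that exports are stored as Parquet. The account, container, and blob names a caller would pass in are hypothetical.

```python
import io


def blob_account_url(account_name):
    """Return the standard public Blob endpoint for a storage account."""
    return f"https://{account_name}.blob.core.windows.net"


def read_export(account_name, container, blob_name):
    """Download one Parquet export into a pandas DataFrame."""
    # Lazy imports: third-party packages are only needed for the download
    import pandas as pd
    from azure.identity import DefaultAzureCredential
    from azure.storage.blob import BlobClient

    client = BlobClient(
        account_url=blob_account_url(account_name),
        container_name=container,
        blob_name=blob_name,
        # No keys in code: the credential is resolved from the host environment
        credential=DefaultAzureCredential(),
    )
    data = client.download_blob().readall()
    return pd.read_parquet(io.BytesIO(data))
```

Because no account key or connection string appears in the script, access can be granted or revoked centrally rather than per laptop.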

Posit Connect

Posit Connect is a publishing platform. Once a data scientist has built a Shiny or other application, a Quarto report or Jupyter Notebook, or a scheduled analysis in Workbench, they publish it to Connect with a single button click from their IDE of choice. Connect handles the rest: hosting, authentication, scheduling, and sharing.

Content types Connect supports:

  • Shiny, Streamlit, and related applications — interactive R and Python apps
  • Quarto documents and Jupyter Notebooks — rendered reports, dashboards, and websites
  • Plumber, Flask, and FastAPI REST APIs — R- and Python-based APIs that can serve data to dashboards, apps, and public consumers
  • Scheduled scripts — R or Python scripts that run on a cron schedule and optionally email results to designated recipients

From a data scientist’s perspective, Connect makes deployment invisible. They do not configure servers, write Dockerfiles (i.e., set up containers), or touch Azure App Service. From an infrastructure perspective, Connect is a managed deployment target that keeps all published content centralized, versioned, and access-controlled.
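
As an illustration of the API content type listed above, here is a minimal FastAPI sketch of the kind of code a data scientist might write before publishing. It is not an HRL service: the `READINGS` records, the `/sites/{site}` route, and the function names are all hypothetical, and the example assumes the `fastapi` package is installed wherever the app is built.

```python
from statistics import mean

# Hypothetical in-memory records standing in for a database query result
READINGS = [
    {"site": "A", "temp_c": 14.0},
    {"site": "A", "temp_c": 15.0},
    {"site": "B", "temp_c": 11.5},
]


def summarize(readings, site):
    """Return count and mean temperature for one site, or None if absent."""
    values = [r["temp_c"] for r in readings if r["site"] == site]
    if not values:
        return None
    return {"site": site, "n": len(values), "mean_temp_c": round(mean(values), 2)}


def create_app():
    """Build the FastAPI app; imported lazily so summarize() is testable alone."""
    from fastapi import FastAPI, HTTPException

    app = FastAPI(title="Site summary (sketch)")

    @app.get("/sites/{site}")
    def site_summary(site: str):
        result = summarize(READINGS, site)
        if result is None:
            raise HTTPException(status_code=404, detail="Unknown site")
        return result

    return app
```

Keeping the analysis logic in a plain function, separate from the web framework, lets it be tested in Workbench without starting a server.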

How this differs from the Azure pipeline’s application layer

The Data Pipeline Architecture page describes hosting data engineering functions, applications, and public APIs on Azure App Service or Azure Container Apps, backed by a full Azure stack. Those services are appropriate for applications that meet any of the following conditions:

  • They have complex compute or deployment needs
  • They are written primarily in languages other than R or Python, or use frameworks other than Shiny and Streamlit
  • They need to be delivered before Posit Connect is procured and configured
  • They are built by developers who prefer containerized deployment workflows

Posit Connect is appropriate for applications that data scientists build in R or Python and want to share with internal colleagues or controlled external audiences without an infrastructure intermediary.

In practice, the two may coexist: some applications run on Azure, others on Posit Connect, depending on deployment and developer needs.
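
The selection criteria in this section can be written down as a small decision helper. This is only a sketch of the reasoning: the `AppProfile` fields and the `suggest_host` function are hypothetical names introduced here, and a real decision would weigh factors that a few booleans cannot capture.

```python
from dataclasses import dataclass


@dataclass
class AppProfile:
    language: str                          # e.g. "r", "python", "typescript"
    needs_complex_compute: bool = False    # complex compute or deployment needs
    connect_available: bool = True         # False until Connect is procured and configured
    team_prefers_containers: bool = False  # developers prefer containerized workflows


def suggest_host(app):
    """Return "azure" or "posit-connect" following the criteria in the text."""
    if not app.connect_available:
        return "azure"
    if app.language.lower() not in {"r", "python"}:
        return "azure"
    if app.needs_complex_compute or app.team_prefers_containers:
        return "azure"
    return "posit-connect"
```

For example, `suggest_host(AppProfile(language="R"))` suggests Posit Connect, while the same profile with `connect_available=False` falls back to Azure.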

Hosting on Azure

Posit Workbench and Posit Connect are server products. Unlike the managed PaaS services that make up the rest of the HRL stack, they run on Azure virtual machines — either provisioned directly in Azure or via the Posit offerings in the Azure Marketplace. This is a deliberate exception to the PaaS-first principle: Posit does not currently offer a fully managed cloud-native hosting option equivalent to Azure App Service, so a VM (Azure IaaS) is the standard deployment path.

DWR IT will need to be involved in provisioning. Key decisions to make:

  • VM size (RAM and CPU requirements depend on the number of concurrent users and the scale of datasets being processed)
  • Authentication integration (both products support Microsoft Entra ID, which DWR uses for identity management)
  • Network configuration
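
To make the VM-size decision concrete, a first-pass estimate can be framed as per-session demand multiplied by concurrent sessions, plus overhead and headroom. The sketch below is a planning aid under assumptions, not a sizing recommendation: the function name, the default overhead and headroom values, and any inputs passed to it are hypothetical and should be replaced with measured figures.

```python
import math


def estimate_vm(concurrent_sessions, gb_per_session, cores_per_session=1.0,
                os_overhead_gb=4.0, headroom=0.25):
    """Rough RAM/CPU estimate: per-session demand plus overhead and headroom."""
    # Total memory: all sessions plus operating-system overhead, scaled by headroom
    ram = (concurrent_sessions * gb_per_session + os_overhead_gb) * (1 + headroom)
    # Total cores: all sessions, scaled by the same headroom factor
    cores = concurrent_sessions * cores_per_session * (1 + headroom)
    return {"ram_gb": math.ceil(ram), "vcpus": math.ceil(cores)}
```

With illustrative inputs of 8 concurrent sessions at 6 GB each, `estimate_vm(8, 6)` returns 65 GB of RAM and 10 vCPUs, which would then be rounded up to the nearest available VM size.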