ETC5523: Communicating with Data

Data storytelling on the web

Lecturer: Michael Lydeamore

Department of Econometrics and Business Statistics



Aim

  • Understand the structure of a website and how it differs from a webpage
  • Learn how to use Quarto
  • Adopt reproducible workflows using Quarto
  • Host web content using either Quarto Pub, GitHub Pages or Netlify

Why

  • Communication on the web is increasingly common
  • Streamlining reproducible data analysis on the web is challenging

R Markdown (Rmd)

(Assumed knowledge from ETC5513)

R Markdown system

  • Better reproducibility for analytical results via R
  • Change output document type easily (thanks to Pandoc)
  • Actively maintained and developed by the Posit (formerly RStudio) team

Quarto (qmd)

(multi-language, next-generation version of R Markdown)

Quarto system

Changes

  • The reproducible workflow is no longer dependent on R
  • Better multi-language support (e.g. Python, Julia, JavaScript, R) and multi-engine support (e.g. Jupyter, Knitr, Observable)
  • Consistent features across all formats (e.g. layouts, cross-references)
  • Some changes to how YAML and chunk options are specified

Overall syntax comparison

Rmd

---
title: "My document"
output:
  html_document:
    toc: true
    css: styles.css
---

```{r, warning = FALSE, message = FALSE}
knitr::opts_chunk$set(echo = FALSE,
                      fig.width = 8, 
                      fig.height = 6)
library(tidyverse)
```

Qmd

---
title: "My document"
execute: 
  echo: false
format:
  html:
    toc: true
    css: styles.css
    fig-width: 8
    fig-height: 6
    html-math-method: katex 
---

```{r}
#| warning: false
#| message: false
library(tidyverse)
```

Do we use Rmd or Qmd?

  • If your computation uses only R, Rmd is completely fine.
  • In this unit, we will be using Quarto for making:
    • websites (including blogs) and
    • presentation slides.

How to use Quarto

  • Quarto is quite NEW – v1 was released only on 20th July 2022!

  • The best documentation is at https://quarto.org/

Making Websites with Quarto

Webpage vs. Website

What is the difference?

  • A webpage is a single document written in HTML.
  • A website is a collection of webpages that usually share a common navigation bar (or tabs) and possibly a common footer.

Web server directory index

When a URL points to a directory, a web server typically serves that directory's index.html. This means you end up with a structure like this:

    cwd.numbat.space/
    ├── index.html
    ├── lectures/
    │   ├── lecture-01
    │   │   └── index.html
    │   ├── lecture-02
    │   │   └── index.html
    │   └── lecture-03
    │       └── index.html
    └── assignments/
        ├── assignment-01
        │   └── index.html
        └── assignment-02
            └── index.html
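
The lookup a directory index implies can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular web server is implemented; the function name `resolve_path` and its `site_root` argument are made up for this example.

```python
from pathlib import PurePosixPath

def resolve_path(url_path, site_root="cwd.numbat.space"):
    """Map a URL path to the file a directory-index server would serve."""
    rel = PurePosixPath(url_path.lstrip("/"))
    # Assumption: a path with no file extension names a directory,
    # so the server falls back to the index.html inside it.
    if rel.suffix == "":
        rel = rel / "index.html"
    return str(PurePosixPath(site_root) / rel)

for url in ["/", "/lectures/lecture-01/", "/assignments/assignment-02/index.html"]:
    print(url, "->", resolve_path(url))
```

This is why every page in the structure above can be called index.html: the folder name, not the file name, is what ends up in the URL.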

There is a VS Code/Positron setting that can help you tell these files apart by showing their folder in the editor tab:

"workbench.editor.labelFormat": "short"

If you're on RStudio, you're on your own, sorry.

Getting started with Quarto blog

Using RStudio IDE

File > New Project > New Directory > Quarto Blog

Command line

Run in the terminal:

quarto create-project myblog --type website:blog

This creates a basic file structure in the myblog directory.

Quarto blog template (demo)

Quarto blog structure

├── _quarto.yml
├── index.qmd
├── about.qmd
├── profile.jpg
├── styles.css
└── posts
    ├── _metadata.yml
    ├── welcome
    │   ├── thumbnail.jpg
    │   └── index.qmd
    └── post-with-code
        ├── image.jpg
        └── index.qmd
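
In this template each post lives in its own folder under posts/ with an index.qmd inside. As a minimal sketch of that convention (the helper `find_posts` is hypothetical and is not part of Quarto), the following Python builds a throwaway copy of the layout and lists the post folders:

```python
from pathlib import Path
import tempfile

def find_posts(blog_dir):
    """Return the names of subfolders of posts/ that contain an index.qmd."""
    return sorted(p.parent.name for p in Path(blog_dir).glob("posts/*/index.qmd"))

# Recreate just enough of the template layout to try the helper on
with tempfile.TemporaryDirectory() as tmp:
    for name in ["welcome", "post-with-code"]:
        post_dir = Path(tmp) / "posts" / name
        post_dir.mkdir(parents=True)
        # Placeholder content; a real post has a full YAML header and body
        (post_dir / "index.qmd").write_text("---\ntitle: placeholder\n---\n")
    # _metadata.yml sits beside the posts and is not itself a post
    (Path(tmp) / "posts" / "_metadata.yml").write_text("")
    found = find_posts(tmp)

print(found)
```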

Quarto workflow

  • For a live preview of the website (when developing):
quarto preview 
  • For rendering the website (default folder is _site):
quarto render 

_quarto.yml

project:
  type: website

website:
  title: "myblog"
  navbar:
    right:
      - about.qmd
      - icon: github
        href: https://github.com/
      - icon: twitter
        href: https://twitter.com
format:
  html:
    theme: cosmo
    css: styles.css

index.qmd


---
title: "myblog"
listing:
  contents: posts
  sort: "date desc"
  type: default
  categories: true
  sort-ui: false
  filter-ui: false
page-layout: full
title-block-banner: true
---
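
To make `sort: "date desc"` concrete: conceptually, the listing gathers the metadata of every post under posts/ and orders it newest first. The sketch below mimics that with made-up titles and dates; it illustrates the ordering only and is not Quarto's implementation.

```python
from datetime import date

# Hypothetical post metadata, standing in for each post's YAML header
posts = [
    {"title": "Welcome", "date": date(2024, 8, 1)},
    {"title": "Post with code", "date": date(2024, 8, 15)},
    {"title": "Another post", "date": date(2024, 7, 20)},
]

def sort_listing(posts, field="date", descending=True):
    """Order posts by one metadata field, like a 'date desc' sort."""
    return sorted(posts, key=lambda p: p[field], reverse=descending)

for post in sort_listing(posts):
    print(post["date"].isoformat(), post["title"])
```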

Publishing websites

Web hosting

Sharing on the web with Quarto Pub

quarto publish quarto-pub
  • The website will be published at https://username.quarto.pub/mysite/ where
    • username is your Quarto Pub username
    • mysite is the site name
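
The address pattern can be written as a one-line helper. The function name `quarto_pub_url` is hypothetical; it only fills in the pattern stated above.

```python
def quarto_pub_url(username, site):
    """Fill in the Quarto Pub address pattern for a username and site name."""
    return f"https://{username}.quarto.pub/{site}/"

print(quarto_pub_url("username", "mysite"))
```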

Sharing on the web with GitHub Pages

usethis::use_git()
usethis::use_github() # or manually link with your local folder
  1. Push your directory to your GitHub repo, say mysite.
  2. Go to your GitHub repo settings and enable “GitHub Pages”.
  3. Your website will be available at the URL: http://username.github.io/mysite

Note: it may take 10 minutes or so for the site to appear the first time.

Alternatively use

quarto publish gh-pages

Sharing on the web with Netlify

  1. Go to https://app.netlify.com and log in.
  2. Drag and drop your rendered site folder (e.g. _site), which contains the index.html, onto the Netlify deploy area.
  3. Go to Site settings > Change site name for a more sensible domain name.

Alternatively use

quarto publish netlify
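
If you script your workflow, the three `quarto publish` commands from these slides can be wrapped in a small helper. This is a sketch under assumptions: the names `publish_command` and `publish` are made up, and the wrapper only runs the command when a `quarto` executable is found on your PATH.

```python
import shutil
import subprocess

# The three publishing targets named in these slides
PROVIDERS = ("quarto-pub", "gh-pages", "netlify")

def publish_command(provider):
    """Build the command line for one of the known providers."""
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider: {provider!r}")
    return ["quarto", "publish", provider]

def publish(provider, project_dir="."):
    """Run the publish command in project_dir if quarto is installed."""
    cmd = publish_command(provider)
    if shutil.which("quarto") is None:
        print("quarto not found on PATH; would run:", " ".join(cmd))
        return None
    return subprocess.run(cmd, cwd=project_dir, check=True)

print(" ".join(publish_command("netlify")))
```

Calling `publish("netlify")` from the project folder then behaves like typing the command in the terminal, including any interactive prompts Quarto shows.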

Some cool Quarto stuff

webR

webR is a version of R compiled to WebAssembly that lets you run R code directly in the browser, without an R server

Check out the REPL

webR with Quarto

  • Community-developed extension to run R code in the browser
  • Works with HTML documents, RevealJS presentations, websites and blogs

Setup

---
title: "Your slide title"
format: revealjs
engine: knitr
filters:
  - webr
webr:
  channel-type: "post-message"
---

And run:

quarto add coatless/quarto-webr

An example slide:

```{webr-r}
#| context: interactive
summary(lm(mpg ~ wt, data = mtcars))
```

Preload packages

You can specify packages to install in YAML:

webr:
  packages: ['palmerpenguins', 'ggplot2']
  show-startup-message: false

and then use them as per normal:

library(ggplot2)
library(palmerpenguins)
ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm, colour = species)) +
  geom_point() +
  labs(title = "Penguin bill dimensions")

Mermaid Diagrams

At some point you'll almost certainly need to draw a flow diagram. Quarto can do this using Mermaid.

```{mermaid}
%%| echo: false
graph LR
    A[Data Sources] --> B[Extract]
    B --> C[Staging Area]
    C --> D[Clean & Transform]
    D --> E[Data Quality Checks]
    
    %% Branch: some data skips transformation
    C --> E
    
    E --> F[Load]
    F --> G[Data Warehouse]
    
    %% Consumption paths
    G --> H[Reports & Dashboards]
    G --> I[Analytics & Modeling]
    
    %% Feedback loop from analytics to transform
    I --> D
```

Maps

```{r}
library(leaflet)
library(dplyr)

# Create a data frame of locations
locations <- tibble::tribble(
  ~name,                           ~lat,       ~lng,
  "Melbourne CBD",                 -37.8136,  144.9631,
  # These coordinates fall on Monash's Caulfield campus, not Clayton
  "Monash University (Caulfield)", -37.8768,  145.0450,
  "University of Melbourne",       -37.7964,  144.9633
)

# Make the map
leaflet(locations) |>
  addTiles() |>
  addMarkers(
    lng = ~lng,
    lat = ~lat,
    popup = ~name
  )
```

Week 4 Lesson

Summary

  • We looked at the structure of a website
  • We built a website using the Quarto system
  • We learnt how to host websites using Quarto Pub, GitHub Pages or Netlify