---
title: 'ETC5523: Communicating with Data'
subtitle: "Tutorial 2 Solution"
author: "Emi Tanaka"
date: "Week 2"
output:
  html_document:
    toc: true
    css: "tutorial.css"
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(message = FALSE, warning = FALSE)
```

## 🛠 Exercise 2B

**Test your knowledge in HTML/CSS**

See Moodle for the solution file.

## 🛠 Exercise 2C

**File storage size**

First, let's make the `Titanic` data tidy. There are many ways to do this; I show only one below.

```{r tidy-data}
library(tidyverse)
library(glue)

# Build every combination of the four categorical variables, then look up
# the matching cell count in the 4-dimensional `Titanic` table.
df1 <- expand_grid(class = c("1st", "2nd", "3rd", "Crew"),
                   sex = c("Male", "Female"),
                   age = c("Child", "Adult"),
                   survived = c("No", "Yes")) %>%
  pmap_dfr(function(class, sex, age, survived) {
    data.frame(class = class,
               sex = sex,
               age = age,
               survived = survived,
               count = Titanic[class, sex, age, survived])
  }) %>%
  as_tibble()
df1
```

This tidy data has `r nrow(df1)` rows and `r ncol(df1)` variables. Below I make two other forms of the data, in which `class`, `sex`, `age`, and `survived` are stored as factors or as integers.

```{r tidy-data-transformations}
# df2: recode each categorical variable as a factor labelled 1, 2, ...
df2 <- df1 %>%
  mutate(across(class:survived, ~factor(.x, labels = 1:n_distinct(.x))))
df2

# df3: convert every column, including the factors, to integer
df3 <- df2 %>%
  mutate(across(everything(), as.integer))
df3
```

We save each form of the data into separate files below.
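The integer form `df3` relies on `as.integer()` returning a factor's internal level codes. Here is a minimal sketch of that behaviour, using a made-up `sex` vector (an assumption for illustration, not the tutorial data):

```r
# Hypothetical toy vector, not taken from the Titanic data
sex <- c("Male", "Female", "Female", "Male")

# factor() sorts the distinct values alphabetically to form the levels,
# so "Female" is level 1 and "Male" is level 2; `labels` renames them "1", "2"
sex_fct <- factor(sex, labels = 1:2)

# as.integer() on a factor returns the underlying level codes
sex_int <- as.integer(sex_fct)
sex_int
# 2 1 1 2
```

Because the labels here match the codes, the integers line up with the factor labels shown in `df2`.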
```{r write-data}
# Plain-text CSV
write_csv(df1, "titanic1.csv")
write_csv(df2, "titanic2.csv")
write_csv(df3, "titanic3.csv")

# RDS: a single serialized R object per file
saveRDS(df1, "titanic1.rds")
saveRDS(df2, "titanic2.rds")
saveRDS(df3, "titanic3.rds")

# RDA: stores the object together with its name
save(list = "df1", file = "titanic1.rda")
save(list = "df2", file = "titanic2.rda")
save(list = "df3", file = "titanic3.rda")
```

```{r file-sizes}
# File size in bytes for every combination of data form and file extension
info <- expand_grid(format = 1:3,
                    ext = c("csv", "rds", "rda")) %>%
  pmap_dfr(function(format, ext) {
    data.frame(format = format,
               ext = ext,
               size = file.info(glue("titanic{format}.{ext}"))$size)
  })

knitr::kable(info) %>%
  kableExtra::kable_classic(full_width = FALSE)
```

For the same form of the data, the `rda` file is always larger than the `rds` file because `rda` also stores the name of the object that holds the data.

```{r diff}
# Absolute difference in bytes between the csv and rds files for the first form
size_diff <- info %>%
  filter(format == 1 & ext %in% c("csv", "rds")) %>%
  pull(size) %>%
  diff() %>%
  abs()
```

The difference in file size between `csv` and `rds` for the first form of the data is `r size_diff` bytes. This is for data with `r nrow(df1)` rows, so for similar data with 100,000,000 observations, scaling linearly gives a rough estimate of `r scales::comma(size_diff * 100000000 / nrow(df1) / 1e9, 0.001)` GB for the difference in size.

## 🛠 Exercise 2D

**Modify the look of your Rmd HTML documents**

Here is the solution file: `tutorial-02-suppAsol.Rmd`.

## 🛠 Exercise 2E

**JavaScript in R Markdown documents with `htmlwidgets`**

This is the solution file: `tutorial-02-suppBsol.Rmd`.