Please complete the Preparation section before the tutorial!

🎯 Objectives

🔧 Preparation

Install the following packages (if you don’t have them already!)

install.packages(c("plotly", "crosstalk"))
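If you would rather not reinstall packages you already have, a minimal sketch like the one below checks first. The helper name `missing_packages()` is made up for this illustration and is not part of any package; the install call is left commented out so that nothing is installed by accident.

```r
# Hypothetical helper: which of the `needed` packages are not installed yet?
missing_packages <- function(needed, installed = rownames(installed.packages())) {
  setdiff(needed, installed)
}

to_install <- missing_packages(c("plotly", "crosstalk"))
if (length(to_install) > 0) {
  message("Not yet installed: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install only what is missing
}
```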

In this tutorial, we are going to attempt to reproduce two interactive graphics from the Our World in Data website on the topic of air pollution.

The two charts we are going to recreate are embedded below.

The charts explore the relationships between the wealth of a country, the proportion of its population with access to clean cooking fuels, and the death rate of children from respiratory infections.

Exercise 7A

To begin with, let’s try to recreate a static version of the graphic with ggplot2.

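library(tidyverse) # read_csv(), the dplyr verbs and ggplot2
library(plotly)    # ggplotly() and config(), used later on
fuels <- read_csv("tutorial-07data/access-to-clean-fuels-for-cooking-vs-gdp-per-capita.csv")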
# save the column names for plotting labels
plot_labels <- names(fuels)
# then clean up the names with the janitor package
fuels <- fuels %>% 
  janitor::clean_names()

We’ll focus on the most recent available data:

fuels_2016 <- fuels %>% 
  filter(year == 2016)
fuels_2016
## # A tibble: 290 x 7
##    entity   code   year access_to_clean_fue… gdp_per_capita_p… total_population…
##    <chr>    <chr> <dbl>                <dbl>             <dbl>             <dbl>
##  1 Afghani… AFG    2016                32.4              2057.          35383028
##  2 Africa   <NA>   2016                NA                  NA         1213040542
##  3 Africa … <NA>   2016                17.6              3547.                NA
##  4 Africa … <NA>   2016                 9.63             4107.                NA
##  5 Albania  ALB    2016                77.4             12292.           2886427
##  6 Algeria  DZA    2016                92.6             11826.          40551398
##  7 America… ASM    2016                NA                  NA              55739
##  8 Andorra  AND    2016               100                  NA              77295
##  9 Angola   AGO    2016                48.0              7569.          28842482
## 10 Anguilla AIA    2016                NA                  NA              14435
## # … with 280 more rows, and 1 more variable: continent <chr>

You might have noticed a slight issue: continent names are only available for 2015 (the joys of other people’s data!). So we will have to join the 2015 continent names back onto the 2016 data to get them for the chart’s legend.

country_continent <- fuels %>% 
  filter(year == 2015, !is.na(continent)) %>% 
  select(entity, code, continent)

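# restrict to countries only, dropping higher-level entities
fuels_2016_countries_only <- fuels_2016 %>% 
  select(-continent) %>% 
  # name the join keys explicitly rather than relying on the default
  left_join(country_continent, by = c("entity", "code")) %>% 
  filter(!is.na(code)) # drop higher level entities like "Latin America"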
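To see what the join is doing, here is the same pattern on a tiny hand-made table. The `toy` data below is invented purely for illustration; only the column names mirror the real data.

```r
library(dplyr)

# Made-up rows: continent is only filled in for 2015, as in the real data.
toy <- tibble::tribble(
  ~entity,   ~code, ~year, ~continent,
  "Albania", "ALB", 2015,  "Europe",
  "Albania", "ALB", 2016,  NA,
  "Africa",  NA,    2015,  NA,
  "Africa",  NA,    2016,  NA
)

# Lookup table built from the one year that has continent names.
lookup <- toy %>%
  filter(year == 2015, !is.na(continent)) %>%
  select(entity, code, continent)

# Attach the continent to the 2016 rows and keep real countries only.
toy_2016 <- toy %>%
  filter(year == 2016) %>%
  select(-continent) %>%
  left_join(lookup, by = c("entity", "code")) %>%
  filter(!is.na(code))
```

Only the Albania row survives, now carrying the continent that was recorded in its 2015 row.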

We now have the data in place to make a ggplot. Here’s some starter code:

clean_fuels_scatter <- fuels_2016_countries_only %>% 
  ggplot(aes(
    x = gdp_per_capita_ppp_constant_2017_international,
    y = access_to_clean_fuels_and_technologies_for_cooking_percent_of_population, 
    color = continent)) + 
  geom_point()

Modify the code for the basic chart so that the x-axis uses a log scale, the y-axis is labelled as a percentage, and the axes have readable titles:

clean_fuels_scatter <- fuels_2016_countries_only %>% 
  ggplot(aes(
    x = gdp_per_capita_ppp_constant_2017_international,
    y = access_to_clean_fuels_and_technologies_for_cooking_percent_of_population, 
    color = continent)
  ) + 
  geom_point() +
  scale_x_log10() + 
  scale_size_continuous(trans = "log10") + # no effect unless a size aesthetic is mapped
  scale_y_continuous(labels = scales::label_percent(scale = 1)) +
  labs(x = "GDP per capita", 
       y = "Access to clean fuels and technologies for cooking",
       color = "")
clean_fuels_scatter
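The `scale = 1` argument matters because the access column is already stored as a percentage (for example 32.4) rather than as a proportion. The short sketch below contrasts the two cases; `accuracy = 1` is added here only to make the rounding explicit and is not part of the chart code above.

```r
# Values already on a 0-100 scale: multiply by 1 before appending "%".
already_percent <- scales::label_percent(scale = 1, accuracy = 1)(c(32.4, 100))

# Proportions on a 0-1 scale: the default scale = 100 does the conversion.
from_proportion <- scales::label_percent(accuracy = 1)(c(0.324, 1))

print(already_percent)
print(from_proportion)
```

Both calls give the same labels, which is a quick way to check that you have picked the right `scale` for your data.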

Now we are ready to make an interactive chart. Play around with the default view and see if you can hide certain continents by clicking their entries in the legend.

ggplotly(clean_fuels_scatter)

Modify the tooltip of the interactive chart so that, when you hover over a point, the country name, percentage with access, GDP per capita (bonus: format it as dollars), and population appear. Note that you may have to modify the aesthetics and recreate the ggplot object you made above. There are many ways to do this, so I recommend looking over the plotly documentation.
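One possible building block is a multi-line label created with `glue::glue_data()`, which pastes its unnamed string arguments together and fills in the `{}` expressions row by row. The sketch below uses a two-row toy data frame (values taken from the table printed earlier) and a shortened column name `gdp`, which is an assumption made for brevity.

```r
# Toy data: two countries with their GDP per capita from the 2016 table.
toy <- data.frame(
  entity = c("Albania", "Algeria"),
  gdp    = c(12292, 11826)
)

# Each string is concatenated; "\n" starts a new line in the tooltip.
tip <- glue::glue_data(
  toy,
  "Country: {entity}",
  "\nGDP per capita: {scales::dollar(gdp, accuracy = 1)}"
)

print(tip)
```

The result has one element per row, so it can be stored as a column and mapped to a `text` aesthetic.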

fuels_2016_countries_only$tooltip <- 
  glue::glue_data(fuels_2016_countries_only, 
                  "Country: {entity}", 
                  "\nPopulation: {scales::label_number_auto()(total_population_gapminder_hyde_un)}",
                  "\nProportion: {scales::percent(access_to_clean_fuels_and_technologies_for_cooking_percent_of_population, scale = 1, accuracy = 1)}",
                  "\nGDP per capita: {scales::dollar(gdp_per_capita_ppp_constant_2017_international)}")


clean_fuels_scatter_w_tooltip <- fuels_2016_countries_only %>% 
  ggplot(aes(
    x = gdp_per_capita_ppp_constant_2017_international,
    y = access_to_clean_fuels_and_technologies_for_cooking_percent_of_population, 
    color = continent)
  ) + 
  geom_point(aes(text = tooltip)) +
  scale_x_log10() + 
  scale_size_continuous(trans = "log10") + # no effect unless a size aesthetic is mapped
  scale_y_continuous(labels = scales::label_percent(scale = 1)) +
  labs(x = "GDP per capita", 
       y = "Access to clean fuels and technologies for cooking",
       color = "")
  
p <- ggplotly(
  clean_fuels_scatter_w_tooltip,
  tooltip = "text"
)
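Since there are many ways to do this, here is a hedged sketch of one alternative that skips ggplot2 and builds the chart directly with `plot_ly()`. The three rows come from the table printed earlier; the short column names (`gdp`, `access`, `label`) and the axis titles are assumptions made for this illustration.

```r
library(plotly)

# Three countries from the 2016 data, with shortened column names.
toy <- data.frame(
  gdp    = c(2057, 12292, 11826),
  access = c(32.4, 77.4, 92.6),
  label  = c("Afghanistan", "Albania", "Algeria")
)

p_native <- plot_ly(
  toy,
  x = ~gdp, y = ~access,
  text = ~label,
  hoverinfo = "text",          # show only the text column on hover
  type = "scatter", mode = "markers"
) %>%
  layout(
    xaxis = list(type = "log", title = "GDP per capita"),
    yaxis = list(title = "Access to clean fuels (%)")
  )

p_native
```

The ggplot2 route keeps the static and interactive charts in sync, whereas the native route gives more direct control over hover behaviour; which one suits you depends on how much of the ggplot2 styling you want to reuse.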

Disable some of the features of the modebar that appears when you place your cursor over the plot. In particular, remove the zoom and pan buttons and the plotly logo.

p %>% 
  config(displaylogo = FALSE, 
         modeBarButtonsToRemove = c("zoomIn2d", "zoomOut2d", "zoom2d", "pan2d"))
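If removing individual buttons is not enough, `config()` can also hide the modebar altogether. A minimal sketch on a throwaway plot (the toy values are arbitrary):

```r
library(plotly)

# A throwaway scatter plot just to have something to configure.
p_toy <- plot_ly(
  x = c(1, 2, 3), y = c(3, 1, 2),
  type = "scatter", mode = "markers"
)

# displayModeBar = FALSE hides the whole toolbar, not just some buttons.
p_no_bar <- p_toy %>%
  config(displayModeBar = FALSE)

p_no_bar
```

Hiding the bar entirely also removes the download button, so the button-by-button approach above is usually the gentler choice.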

What are some other ways the chart could be improved? As a challenge, you could try reproducing the animation of the chart over time.
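As a starting point for the challenge, one approach is the `frame` aesthetic that `ggplotly()` understands: each distinct value becomes one animation frame. The sketch below uses entirely made-up data for two hypothetical entities "A" and "B", so treat it as a pattern to adapt rather than a solution. ggplot2 will warn that `frame` is an unknown aesthetic; plotly picks it up regardless.

```r
library(ggplot2)
library(plotly)

# Made-up values for two hypothetical entities over three years.
toy_years <- data.frame(
  entity = rep(c("A", "B"), each = 3),
  year   = rep(2014:2016, times = 2),
  gdp    = c(1000, 1100, 1200, 9000, 9500, 10000),
  access = c(20, 25, 30, 80, 85, 90)
)

animated <- ggplot(toy_years, aes(x = gdp, y = access, color = entity)) +
  geom_point(aes(frame = year)) + # one animation frame per year
  scale_x_log10()

p_animated <- ggplotly(animated)
p_animated
```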