
Lecturer: Michael Lydeamore
Department of Econometrics and Business Statistics
Aim
Why

🎯 Characterise mpg in terms of wt.
We fit the model:
\[\texttt{mpg}_i = \beta_0 + \beta_1\texttt{wt}_i + e_i\]
fit?fit even contain?List of 12
 $ coefficients : Named num [1:2] 37.29 -5.34
  ..- attr(*, "names")= chr [1:2] "(Intercept)" "wt"
 $ residuals    : Named num [1:32] -2.28 -0.92 -2.09 1.3 -0.2 ...
  ..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
 $ effects      : Named num [1:32] -113.65 -29.116 -1.661 1.631 0.111 ...
  ..- attr(*, "names")= chr [1:32] "(Intercept)" "wt" "" "" ...
 $ rank         : int 2
 $ fitted.values: Named num [1:32] 23.3 21.9 24.9 20.1 18.9 ...
  ..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
 $ assign       : int [1:2] 0 1
 $ qr           :List of 5
  ..$ qr   : num [1:32, 1:2] -5.657 0.177 0.177 0.177 0.177 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
  .. .. ..$ : chr [1:2] "(Intercept)" "wt"
  .. ..- attr(*, "assign")= int [1:2] 0 1
  ..$ qraux: num [1:2] 1.18 1.05
  ..$ pivot: int [1:2] 1 2
  ..$ tol  : num 1e-07
  ..$ rank : int 2
  ..- attr(*, "class")= chr "qr"
 $ df.residual  : int 30
 $ xlevels      : Named list()
 $ call         : language lm(formula = mpg ~ wt, data = mtcars)
 $ terms        :Classes 'terms', 'formula'  language mpg ~ wt
  .. ..- attr(*, "variables")= language list(mpg, wt)
  .. ..- attr(*, "factors")= int [1:2, 1] 0 1
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:2] "mpg" "wt"
  .. .. .. ..$ : chr "wt"
  .. ..- attr(*, "term.labels")= chr "wt"
  .. ..- attr(*, "order")= int 1
  .. ..- attr(*, "intercept")= int 1
  .. ..- attr(*, "response")= int 1
  .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
  .. ..- attr(*, "predvars")= language list(mpg, wt)
  .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
  .. .. ..- attr(*, "names")= chr [1:2] "mpg" "wt"
 $ model        :'data.frame':  32 obs. of  2 variables:
  ..$ mpg: num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
  ..$ wt : num [1:32] 2.62 2.88 2.32 3.21 3.44 ...
  ..- attr(*, "terms")=Classes 'terms', 'formula'  language mpg ~ wt
  .. .. ..- attr(*, "variables")= language list(mpg, wt)
  .. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
  .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. ..$ : chr [1:2] "mpg" "wt"
  .. .. .. .. ..$ : chr "wt"
  .. .. ..- attr(*, "term.labels")= chr "wt"
  .. .. ..- attr(*, "order")= int 1
  .. .. ..- attr(*, "intercept")= int 1
  .. .. ..- attr(*, "response")= int 1
  .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
  .. .. ..- attr(*, "predvars")= language list(mpg, wt)
  .. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
  .. .. .. ..- attr(*, "names")= chr [1:2] "mpg" "wt"
 - attr(*, "class")= chr "lm"Accessing model parameter estimates:
(Intercept)          wt 
  37.285126   -5.344472 (Intercept)          wt 
  37.285126   -5.344472 This gives us the estimates of \(\beta_0\) and \(\beta_1\).
But what about \(\sigma^2\)? Recall \(e_i \sim NID(0, \sigma^2)\).
You can also get a summary of the model object:
Call:
lm(formula = mpg ~ wt, data = mtcars)
Residuals:
    Min      1Q  Median      3Q     Max 
-4.5432 -2.3647 -0.1252  1.4096  6.8727 
Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
wt           -5.3445     0.5591  -9.559 1.29e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.046 on 30 degrees of freedom
Multiple R-squared:  0.7528,    Adjusted R-squared:  0.7446 
F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10So how do I extract these summary values out?
# A tibble: 2 × 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)    37.3      1.88      19.9  8.24e-19
2 wt             -5.34     0.559     -9.56 1.29e-10# A tibble: 32 × 9
   .rownames           mpg    wt .fitted .resid   .hat .sigma .cooksd .std.resid
   <chr>             <dbl> <dbl>   <dbl>  <dbl>  <dbl>  <dbl>   <dbl>      <dbl>
 1 Mazda RX4          21    2.62    23.3 -2.28  0.0433   3.07 1.33e-2    -0.766 
 2 Mazda RX4 Wag      21    2.88    21.9 -0.920 0.0352   3.09 1.72e-3    -0.307 
 3 Datsun 710         22.8  2.32    24.9 -2.09  0.0584   3.07 1.54e-2    -0.706 
 4 Hornet 4 Drive     21.4  3.22    20.1  1.30  0.0313   3.09 3.02e-3     0.433 
 5 Hornet Sportabout  18.7  3.44    18.9 -0.200 0.0329   3.10 7.60e-5    -0.0668
 6 Valiant            18.1  3.46    18.8 -0.693 0.0332   3.10 9.21e-4    -0.231 
 7 Duster 360         14.3  3.57    18.2 -3.91  0.0354   3.01 3.13e-2    -1.31  
 8 Merc 240D          24.4  3.19    20.2  4.16  0.0313   3.00 3.11e-2     1.39  
 9 Merc 230           22.8  3.15    20.5  2.35  0.0314   3.07 9.96e-3     0.784 
10 Merc 280           19.2  3.44    18.9  0.300 0.0329   3.10 1.71e-4     0.100 
# ℹ 22 more rowsBut how do these functions work?
Rows: 32
Columns: 4
$ mpg <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8, …
$ wt  <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3.150, 3.4…
$ vs  <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, …
$ cyl <dbl> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4, 8, …df <- mtcars |> mutate(cyl = factor(cyl),
                        vs = factor(vs))
table1::table1(~ mpg + wt + vs | cyl, data = df)| 4 (N=11) | 6 (N=7) | 8 (N=14) | Overall (N=32) | |
|---|---|---|---|---|
| mpg | ||||
| Mean (SD) | 26.7 (4.51) | 19.7 (1.45) | 15.1 (2.56) | 20.1 (6.03) | 
| Median [Min, Max] | 26.0 [21.4, 33.9] | 19.7 [17.8, 21.4] | 15.2 [10.4, 19.2] | 19.2 [10.4, 33.9] | 
| wt | ||||
| Mean (SD) | 2.29 (0.570) | 3.12 (0.356) | 4.00 (0.759) | 3.22 (0.978) | 
| Median [Min, Max] | 2.20 [1.51, 3.19] | 3.22 [2.62, 3.46] | 3.76 [3.17, 5.42] | 3.33 [1.51, 5.42] | 
| vs | ||||
| 0 | 1 (9.1%) | 3 (42.9%) | 14 (100%) | 18 (56.3%) | 
| 1 | 10 (90.9%) | 4 (57.1%) | 0 (0%) | 14 (43.8%) | 
df <- mutate(mtcars, 
             cyl = as.factor(cyl))
fit1 <- lm(mpg ~ wt + vs + cyl, data = df)
fit2 <- lm(mpg ~ wt + cyl,  data = df)
summary(fit1)
Call:
lm(formula = mpg ~ wt + vs + cyl, data = df)
Residuals:
    Min      1Q  Median      3Q     Max 
-4.6506 -1.3138 -0.4689  1.3457  5.6718 
Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  33.4197     2.2016  15.179 9.65e-15 ***
wt           -3.2977     0.7838  -4.207 0.000255 ***
vs            0.8598     1.6413   0.524 0.604675    
cyl6         -3.8887     1.5693  -2.478 0.019763 *  
cyl8         -5.1314     2.4533  -2.092 0.046012 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.591 on 27 degrees of freedom
Multiple R-squared:  0.8391,    Adjusted R-squared:  0.8152 
F-statistic: 35.19 on 4 and 27 DF,  p-value: 2.402e-10
Call:
lm(formula = mpg ~ wt + cyl, data = df)
Residuals:
    Min      1Q  Median      3Q     Max 
-4.5890 -1.2357 -0.5159  1.3845  5.7915 
Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  33.9908     1.8878  18.006  < 2e-16 ***
wt           -3.2056     0.7539  -4.252 0.000213 ***
cyl6         -4.2556     1.3861  -3.070 0.004718 ** 
cyl8         -6.0709     1.6523  -3.674 0.000999 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.557 on 28 degrees of freedom
Multiple R-squared:  0.8374,    Adjusted R-squared:   0.82 
F-statistic: 48.08 on 3 and 28 DF,  p-value: 3.594e-11| Characteristic | 
 | 
 | ||||
|---|---|---|---|---|---|---|
| Beta | 95% CI1 | p-value | Beta | 95% CI1 | p-value | |
| wt | -3.3 | -4.9, -1.7 | <0.001 | -3.2 | -4.7, -1.7 | <0.001 | 
| vs | 0.86 | -2.5, 4.2 | 0.6 | |||
| cyl | ||||||
| 4 | — | — | — | — | ||
| 6 | -3.9 | -7.1, -0.67 | 0.020 | -4.3 | -7.1, -1.4 | 0.005 | 
| 8 | -5.1 | -10, -0.10 | 0.046 | -6.1 | -9.5, -2.7 | <0.001 | 
| 1 CI = Confidence Interval | ||||||
This depends on what you want to convey and your audience.
There are two key purposes of the table:
In general, tables tend to be about display of information and graphs are preferred for communication.
However, if precision matters then tables can be better at communicating this than graphs.
| name | Mean | SD | 1. | 2. | 
|---|---|---|---|---|
| mpg | 20.09062 | 6.0269481 | ||
| wt | 3.21725 | 0.9784574 | -0.8676594 | |
| hp | 146.68750 | 68.5628685 | -0.7761684 | 0.6587479 | 
options(knitr.kable.NA = '')
mtcars |>
  select(mpg, wt, hp) |>
  pivot_longer(everything()) |>
  mutate(name = factor(name, levels = c("mpg", "wt", "hp"))) |>
  group_by(name) |>
  summarise(
    Mean = mean(value),
    SD = sd(value)
  ) |>
  mutate(
    `1.` = case_when(
      name == "wt" ~ cor(mtcars$mpg, mtcars$wt),
      name == "hp" ~ cor(mtcars$mpg, mtcars$hp),
      TRUE ~ NA_real_
    ),
    `2.` = case_when(
      name == "hp" ~ cor(mtcars$wt, mtcars$hp),
      TRUE ~ NA_real_
    )
  ) |>
  knitr::kable(table.attr = "class='cor-table'") |> 
  kable_classic(full_width = FALSE)| cyl/gear | 3 | 4 | 5 | Total | 
|---|---|---|---|---|
| 4 | 3.1% (1) | 25.0% (8) | 6.2% (2) | 34.4% (11) | 
| 6 | 6.2% (2) | 12.5% (4) | 3.1% (1) | 21.9% (7) | 
| 8 | 37.5% (12) | 0.0% (0) | 6.2% (2) | 43.8% (14) | 
| Total | 46.9% (15) | 37.5% (12) | 15.6% (5) | 100.0% (32) | 
library(janitor)
mtcars |> 
  tabyl(cyl, gear) |> 
  adorn_totals(where = c("row", "col")) |> 
  adorn_percentages("all") |>
  adorn_pct_formatting(digits = 1) |>
  adorn_ns() |> 
  adorn_title("combined") |> 
  knitr::kable(table.attr = "class='xtab'") |> 
  kable_styling() |> 
  row_spec(4, extra_css = "border-top: 1px solid black;") |> 
  kable_classic(full_width = FALSE)*.| Df | Sum Sq | Mean Sq | F value | Pr(>F) | |
|---|---|---|---|---|---|
| supp | 1 | 205.350 | 205.35000 | 15.571979 | 0.0002312 | 
| factor(dose) | 2 | 2426.434 | 1213.21717 | 91.999965 | 0.0000000 | 
| supp:factor(dose) | 2 | 108.319 | 54.15950 | 4.106991 | 0.0218603 | 
| Residuals | 54 | 712.106 | 13.18715 | 
| 
Weight
 | |||
|---|---|---|---|
| Car brand | 5 d.p. | 2 d.p. | 0 d.p. | 
| Mazda RX4 | 2.61937 | 2.62 | 3 | 
| Mazda RX4 Wag | 2.87518 | 2.88 | 3 | 
| Datsun 710 | 2.31916 | 2.32 | 2 | 
| Hornet 4 Drive | 3.21660 | 3.21 | 3 | 
| 
Trailing zeroes
 | |
|---|---|
| Yes | No | 
| 0.233 | 0.233 | 
| 0.320 | 0.32 | 
| 0.400 | 0.4 | 
| 0.343 | 0.343 | 
| Car brand | Weight (lbs) | Weight (1000 lbs) | Weight (kg) | Weight (g) | Weight (mg) | 
|---|---|---|---|---|---|
| Mazda RX4 | 2620 | 2.620 | 1188 | 1190000 | 1,188,000,000 | 
| Mazda RX4 Wag | 2875 | 2.875 | 1304 | 1300000 | 1,304,000,000 | 
| Datsun 710 | 2320 | 2.320 | 1052 | 1050000 | 1,052,000,000 | 
| Hornet 4 Drive | 3215 | 3.215 | 1458 | 1460000 | 1,458,000,000 | 
1000000 is harder to read than 1,000,000.| 
Car brand
 | 
Horsepower
 | ||||
|---|---|---|---|---|---|
| Left | Center | Right | Left | Center | Right | 
| Mazda RX4 | Mazda RX4 | Mazda RX4 | 110 | 110 | 110 | 
| Mazda RX4 Wag | Mazda RX4 Wag | Mazda RX4 Wag | 110 | 110 | 110 | 
| Datsun 710 | Datsun 710 | Datsun 710 | 93 | 93 | 93 | 
| Hornet 4 Drive | Hornet 4 Drive | Hornet 4 Drive | 110 | 110 | 110 | 
| car | wt | disp | 
|---|---|---|
| Mazda RX4 | 2620 lbs | 160 cubic inches | 
| Mazda RX4 Wag | 2875 lbs | 160 cubic inches | 
| Datsun 710 | 2320 lbs | 108 cubic inches | 
| Hornet 4 Drive | 3215 lbs | 258 cubic inches | 
| Hornet Sportabout | 3440 lbs | 360 cubic inches | 
| Valiant | 3460 lbs | 225 cubic inches | 
| Car brand | Weight (1000 lbs) | Displacement (cubic inches) | 
|---|---|---|
| Mazda RX4 | 2.620 | 160 | 
| Mazda RX4 Wag | 2.875 | 160 | 
| Datsun 710 | 2.320 | 108 | 
| Hornet 4 Drive | 3.215 | 258 | 
| Hornet Sportabout | 3.440 | 360 | 
| Valiant | 3.460 | 225 | 
There are many packages that make table in R, including ones that wrangle the data for you to make specialised table output. E.g. knitr::kable, kableExtra, formattable, gt, DT, pander, xtable, stargazer.
You can read the documentation for each packages to make the table you want (I mainly use knitr::kable, kableExtra and DT).
Whatever package you use, it’s important that you understand the output and how it works with the medium you are trying to display the table.
In markdown file:
First Header  | Second Header
------------- | -------------
Content Cell  | Content Cell
Content Cell  | Content CellPossible display:
| First Header | Second Header | 
|---|---|
| Content Cell | Content Cell | 
| Content Cell | Content Cell | 
HTML → Web browser
<table>
<thead>
<tr>
<th>First Header</th> <th>Second Header</th>
</tr>
</thead>
<tbody>
<tr>
<td>Content Cell</td> <td>Content Cell</td>
</tr>
<tr>
<td>Content Cell</td> <td>Content Cell</td>
</tr>
</tbody>
</table>gtSource: https://gt.rstudio.com/
gtgtgtlibrary(gt)
df <- select(head(mtcars), wt, disp, cyl)
gt(df) |> 
  tab_header(title = "Motor Trend Car Road Tests", 
             subtitle = "Design and performance of 1973-74 automobile models")| Motor Trend Car Road Tests | ||
|---|---|---|
| Design and performance of 1973-74 automobile models | ||
| wt | disp | cyl | 
| 2.620 | 160 | 6 | 
| 2.875 | 160 | 6 | 
| 2.320 | 108 | 4 | 
| 3.215 | 258 | 6 | 
| 3.440 | 360 | 8 | 
| 3.460 | 225 | 6 | 
gtlibrary(gt)
df <- select(head(mtcars), wt, disp, cyl)
gt(df) |> 
  tab_header(title = "Motor Trend Car Road Tests",
             subtitle = "Design and performance of 1973-74 automobile models") |> 
  tab_source_note(md("Source: 1974 *Motor Trend* US magazine"))| Motor Trend Car Road Tests | ||
|---|---|---|
| Design and performance of 1973-74 automobile models | ||
| wt | disp | cyl | 
| 2.620 | 160 | 6 | 
| 2.875 | 160 | 6 | 
| 2.320 | 108 | 4 | 
| 3.215 | 258 | 6 | 
| 3.440 | 360 | 8 | 
| 3.460 | 225 | 6 | 
| Source: 1974 Motor Trend US magazine | ||
gtlibrary(gt)
df <- select(head(mtcars), wt, disp, cyl)
gt(df) |> 
  tab_header(title = "Motor Trend Car Road Tests",
             subtitle = "Design and performance of 1973-74 automobile models") |> 
  tab_source_note(md("Source: 1974 *Motor Trend* US magazine")) |> 
  cols_label( 
    wt = html("Weight<br>(1000lbs)"), 
    disp = html("Displacement<br> (inch<sup>3</sup>)"), 
    cyl = html("Number of<br>cylinders") 
  ) | Motor Trend Car Road Tests | ||
|---|---|---|
| Design and performance of 1973-74 automobile models | ||
| Weight (1000lbs) | Displacement (inch3) | Number of cylinders | 
| 2.620 | 160 | 6 | 
| 2.875 | 160 | 6 | 
| 2.320 | 108 | 4 | 
| 3.215 | 258 | 6 | 
| 3.440 | 360 | 8 | 
| 3.460 | 225 | 6 | 
| Source: 1974 Motor Trend US magazine | ||
DTDT::datatable works best for HTML documents.webshot of the HTML table and inserts it as an image.formattable| Variables | mpg | disp | wt | hp | 
|---|---|---|---|---|
| mpg | 1.0000000 | -0.8475514 | -0.8676594 | -0.7761684 | 
| disp | -0.8475514 | 1.0000000 | 0.8879799 | 0.7909486 | 
| wt | -0.8676594 | 0.8879799 | 1.0000000 | 0.6587479 | 
| hp | -0.7761684 | 0.7909486 | 0.6587479 | 1.0000000 | 
sparkline| Australian wool yarn production — sparklines by decade | |||||
|---|---|---|---|---|---|
| Sparkline: quarterly values within each decade · 5-number summary at right | |||||
| Decade | Last | Min | Max | Median | |
| 1960s | 7,402 | 6,172 | 7,402 | 6,858 | |
| 1970s | 5,079 | 3,324 | 7,819 | 5,618 | |
| 1980s | 4,868 | 3,642 | 6,590 | 5,178 | |
| 1990s | 6,396 | 3,827 | 6,396 | 4,868 | |
Summary

ETC5523 Week 5A