.circle.w2.h2.monash-bg-blue.white[**1**] Working with model objects

.circle.w2.h2.monash-bg-blue.white[**2**] Presenting information as a table ] --- # 📈 Statistical models * All models are approximations of the unknown data generating process - whether how good of an approximation depends on the collected data and the model choice. -- .grid[ .item[ # .blue[Example 1] ```{r, echo = FALSE, fig.height = 4, fig.width = 6} ggplot(mtcars, aes(wt, mpg)) + geom_point() + labs(y = "Miles per gallon\n(mpg)", x = "Weight\n(wt)", title = "Motor Trend Car Road Tests") ``` **Aim**: characterise `mpg` in terms of `wt`. ] .item[ {{content}} ] ] --

* We fit the model: $$\texttt{mpg}_i = \beta_0 + \beta_1\texttt{wt}_i + e_i$$

- \(\texttt{mpg}_i\) is the miles per gallon of the \(i\)-th car,
- \(\texttt{wt}_i\) is the weight of the \(i\)-th car,
- \(\beta_0\) is the intercept,
- \(\beta_1\) is the slope, and
- \(e_i\) is random error, usually assumed \(e_i \sim NID(0, \sigma^2)\)

So how do I get these summary values out?

But how do these functions work?

.pl3[ ```{r} f1 <- function(x) sum(x) / length(x) ``` ]

] .w-40.pl3[ {{content}} ] ] .footnote.f4[ Wickham (2019) Advanced R, 2nd edition, Chapman & Hall ] -- ```{r} formals(f1) body(f1) environment(f1) ``` --- count: false # Functions .f4[Part 2/3] .flex[ .w-50[ using the function ```{r avg1, eval = FALSE} x1 <- c(1, 1, 2, 2) f1(x1) ``` {{content}} ] .w-50[ ] ] -- ```{r avg1, echo = FALSE} ``` {{content}} -- what if there are missing values in the vector? ```{r avg2, eval = FALSE} x2 <- c(1, 1, 2, 2, NA) f1(x2) ``` {{content}} -- ```{r avg2, echo = FALSE} ``` {{content}} -- what about on a vector of date objects? ```{r avg3, eval = FALSE} x3 <- as.Date(c("2021-08-04", "2021-08-11")) f1(x3) ``` {{content}} -- ```{r avg3, echo = FALSE, error = TRUE} ``` --- # Functions .f4[Part 2/3] .flex[ .w-50.br[ using the function ```{r avg1} ``` what if there are missing values in the vector? ```{r avg2} ``` what about on a vector of date objects? ```{r avg3, error = TRUE} ``` ] .w-50.pl3[ ```{r avg} f2 <- function(x, na.rm = TRUE) { n <- sum(!is.na(x)) sum(x, na.rm = na.rm) / n } ``` {{content}} ] ] -- ```{r, error = TRUE} f2(x1) f2(x2) f2(x3) ``` --- # Functions .f4[Part 3/3] ```{r, error = TRUE} f3 <- function(x, na.rm = TRUE) { n <- sum(!is.na(x)) out <- sum(as.numeric(x), na.rm = na.rm) / n if(class(x)=="Date") { #<< return(as.Date(out, #<< origin = "1970-01-01")) #<< } #<< out } ``` -- .flex[ .w-33[ ```{r} f3(x1) ``` ] .w-33[ ```{r} f3(x2) ``` ] .w-33[ ```{r} f3(x3) ``` ] ] -- * What about for another object class? ```{r} x4 <- as.POSIXct(c("2021-08-11 18:00", "2021-08-11 20:00"), tz = "UTC") ``` --- # .gray[S3] Object oriented programming (OOP) .f4[Part 1/3] * The **_S3 system_** is the most widely used OOP system in R but there are other OOP systems in R, e.g. the S4 system is used for model objects in `lme4` R-package, but it will be out of scope for this unit .flex[ .w-35.br[ ```{r} class(x1) class(x2) class(x3) class(x4) ``` ] .w-65[ {{content}} ] ] -- * Here I create a generic called `f4`: ```{r} f4 <- function(x, ...) UseMethod("f4") ``` {{content}} -- * And an associated default **method**: ```{r} f4.default <- function(x, na.rm = TRUE) { sum(x, na.rm = na.rm) / sum(!is.na(x)) } ``` {{content}} -- * And an associated specific **method** for the `Date` class: ```{r} f4.Date <- function(x, na.rm = TRUE) { out <- f4.default(as.numeric(x), na.rm = na.rm) as.Date(out, origin = "1970-01-01") } ``` --- # .gray[S3] Object oriented programming (OOP) .f4[Part 2/3] .flex[ .w-35[ ```{r} f4(x1) f4(x2) f4(x3) ```

{{content}} ] ] -- ```{r, error = TRUE} class(x4) ``` --- count: false # .gray[S3] Object oriented programming (OOP) .f4[Part 2/3] .flex[ .w-35[ ```{r} f4(x1) f4(x2) f4(x3) ```

```{r} class(x4) ``` ] .w-65.bl.pl3[ ```{r} f4.POSIXct <- function(x, na.rm = TRUE) { out <- f4.default(as.numeric(x), na.rm = na.rm) as.POSIXct(out, tz = attr(x, "tzone"), origin = "1970-01-01") } ``` {{content}} ] ] -- ```{r} f4(x4) ``` --- # .gray[S3] Object oriented programming (OOP) .f4[Part 3/3] * A method is created by using the form `generic.class`. * When using a method for `class`, you can omit the `.class` from the function. * E.g. `f4(x4)` is the same as `f4.POSIXct(x4)` since the class of `x4` is `POSIXct` (and `POSIXt`). -- * But notice `f4.numeric` doesn't exist, instead there is `f4.default`. * `default` is a special class and when a generic doesn't have a method for the corresponding class, it falls back to `generic.default` --- class: transition # Designing functions

*for humans* --- # Human-centered design .info-box.w-60[ **Syntactic sugar** means using function name or syntax in a programming language that is designed to make things _**easier to read or to express for humans**_. ] -- * We used functions names `f1`, `f2`, `f3` and `f4` previously -- * The function was trying to calculate the average (or sample mean) -- * We could have called the function `average` say -- .f2.w-60.pa3.monash-bg-blue.white[ Keep in mind that a human reads your code, so write your function in a way that reads and works well for humans ] --- class: transition # Working with *model objects*

in --- # Modelling in R * There are many R-packages that fit all kinds of models, e.g. * `mgcv` fits generalized additive models, * `glmnet` fits a generalised linear models with elastic net, * `rstanarm` fits Bayesian regression models using Stan, * `BLR` also fits Bayesian linear regression models, and * `fable` .f4[(written by your tutor Mitch O'Hara-Wild)] fits forecast models, * many other contributions by the community. -- .w-70[ * There are a lot of new R-packages contributed — some implementing the latest research results * This means that if you want to use the state-of-the-art research, then you need to work with model objects beyond the standard `lm` and `glm` ] --- # Example with Baysian regression ```{r, message = FALSE, warning = FALSE, results='hide', error = TRUE} library(rstanarm) fit_stan <- stan_lm(mpg ~ 1 + wt, data = mtcars, prior = R2(0.7528, what = "mean")) ``` -- ```{r, message = FALSE, warning = FALSE, error = TRUE} broom::tidy(fit_stan) ``` -- Huh? -- ```{r} coef(fit_stan) ``` -- Okay this works, but why not for `broom::tidy`? --- # .gray[S3] Object classes * So how do you find out the functions that work with model objects? -- * First notice the class of the object `fit`: .f4[ ```{r} class(fit) ``` ] -- * The methods associated with this can be found using: .f4.overflow-scroll.h5[ ```{r} methods(class = "lm") ``` ] --

Where is

---
# .gray[S3] Inspecting functions
* Let's have a look inside of `coef` function:
.item[
```{r}
coef
```
]
--
`coef()`

?
Try also looking inside

---
# Case study .circle.white.bg-black[1] `broom::tidy` .f4[Part 1/2]
```{r}
class(fit_stan)
```
--
```{r}
broom::tidy
```
--
There is no `tidy.stanreg` method so uses the `broom:::tidy.glm` instead.
---
# Case study .circle.white.bg-black[1] `broom::tidy` .f4[Part 2/2]
```{r}
library(broom)
tidy.stanreg <- function(x, ...) {
est <- x$coefficients
tibble(term = names(est),
estimate = unname(est),
std.error = x$ses)
}
tidy(fit_stan)
```
---
# Working with model objects
* Is this only for R though? How do you work with model objects in general?
--
## Python
```{python}
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
x = np.array(r.mtcars['wt']).reshape(-1, 1)
y = np.array(r.mtcars['mpg'])
model = LinearRegression().fit(x, y)
```
--
```{python}
[model.intercept_, model.coef_]
```
--
.border-box[
```{r, echo = FALSE}
coef(fit) # in R
```
]
---
class: middle
.w-70.f2[
* Model objects are usually a `list` returning multiple output from the model fit
{{content}}
]
--
* When working with model objects, check the object structure and find the methods associated with it {{content}}
--
(and of course check the documentation)
{{content}}
--
`lm`

You should be able to work (or at least know how to get started) with all sort of model objects

---
```{r endslide, child="assets/endslide.Rmd"}
```