Multiple Regression & Fundamentals of Causal Inference

class: center, middle, inverse, title-slide

.title[
# Multiple Regression & <br><br> Fundamentals of Causal Inference
]
.subtitle[
## Randomized Controlled Trials (RCTs) <br><br> The Gold Standard of Causal Inference
]
.author[
### Merlin Schaeffer<br> Department of Sociology
]
.date[
### 2024-10-09
]

---

class: clear

<img src="https://pbs.twimg.com/media/EouSsOKUUAM0P_y.jpg" width="60%" style="display: block; margin: auto;" />
.backgrnote[.center[*Source:* Polack, Thomas, Kitchin, Absalon, Gurtman, Lockhart, Perez, Pérez Marc, Moreira, Zerbini, Bailey, Swanson, Roychoudhury, Koury, Li, Kalina, Cooper, Frenck, Hammitt, Türeci, Nell, Schaefer, Ünal, Tresnan, Mather, Dormitzer, Şahin, Jansen, and Gruber (2020)]]

---
# Goal of empirical sociology

.font130[.center[Use data to discover patterns, <br> and the .alert[social mechanisms that bring them about.]]]

---
class: inverse middle
# Today's schedule

1. **Today's research question**: The Integration Paradox

2. **Experiments**
  + Randomized Controlled Trials (RCTs)
  + RCTs, potential outcomes, and DAGs

---
class: inverse
# The Integration Paradox .font60[Research question of the day]

.right-column[
<img src="img/Shocking.jpeg" width="60%" style="display: block; margin: auto;" />

<img src="img/Steinmann3.png" width="100%" style="display: block; margin: auto;" />
.font70[.center[*Source:* Steinmann (2019)]]
]

.left-column[.center[.font110[
**Does news media consumption** <br><br> _**increase**_ <br><br> **immigrant minorities' reports of discrimination?**
]]]

---
# Preparation

.panelset[
.panel[.panel-name[Packages for today's session]

``` r
pacman::p_load( # Load several R packages using the pacman package manager
  tidyverse,  # A collection of packages for data manipulation and visualization
  ggplot2,    # Powerful package for creating static, animated and interactive visualizations
  estimatr,   # Fast regression estimators with robust standard errors (lm_robust)
  modelr,     # Provides functions for modelling and prediction
  kableExtra, # Enhances table creation in R
  modelsummary) # Creates tables and plots to summarize statistical models
```
]
.panel[.panel-name[The APAD survey]
.left-column[.font80[
- Schaeffer, Kas, and Hagedorn (2023)
- 1,093 immigrants and children of immigrants.
- Berlin, Hamburg, Munich, Frankfurt, and Cologne.
- Interviewed in August 2021.
- Financed by the [German Research Foundation (DFG)](https://gepris.dfg.de/gepris/projekt/428878477?language=en)
]]

.right-column[.font80[
1. > On a typical day, about how much time do you spend watching, reading, or listening to news about politics and current affairs? *Please give your answer in hours and then minutes.*

2. > Now we would like to ask you about discrimination. How often were you personally discriminated against in the following situations here in Germany? .backgrnote[
Discrimination means that a person is treated worse than others for specific reasons and without factual justification. Discrimination can take different forms, such as insults, ostracism, or sexual harassment. Rules and laws that disadvantage people are also discrimination.]
> ... When looking for work or an apprenticeship<br>
> ... At work / in professional life<br>
> ... While attending school or higher education<br>
> ... When looking for housing<br>
> ... When having contact with government officials or public administrators<br>
> ... When you were out in public during your free time<br><br>
> (1) Never, (2) Rarely, (3) Sometimes, (4) Often, (5) Very often<br>
]]
]

.panel[.panel-name[Get the APAD data]
.push-left[.font80[

``` r
load("APAD.RData") # Load APAD dataset
```

``` r
APAD <- APAD %>% mutate( # Process APAD data
  # Convert news consumption to minutes
  # Example: 2 hours and 30 minutes becomes 2*60 + 30 = 150 minutes
  news = news_hrs*60 + news_mins, 
  # Create binary variable for news consumption
  news_yn = case_when(
    news < 15 ~ 0,  # 0 if less than 15 minutes
    news >= 15 ~ 1, # 1 if 15 minutes or more
    TRUE ~ as.numeric(NA)), # Handle other cases as missing
  # Calculate average discrimination index across multiple domains
  dis_index = rowMeans( # Calculate the mean for each row (participant)
    select(., # Choose specific discrimination columns from "." (APAD)
           dis_trainee, dis_job, dis_school, 
           dis_house, dis_gov, dis_public),
    na.rm = TRUE), # Ignore NA values
  # Standardize discrimination index
  # scale() standardizes; as.numeric() converts to numeric vector
  z_dis_index = scale(dis_index) %>% as.numeric()
)
```
]]

.push-right[.font80[
<br>

```
# # A tibble: 1,093 × 24
#    antidiscr_law dis_index news_hrs news_mins dis_trainee dis_job dis_school dis_house dis_gov dis_public gewFAKT
#            <dbl>     <dbl>    <dbl>     <dbl>       <dbl>   <dbl>      <dbl>     <dbl>   <dbl>      <dbl>   <dbl>
#  1             5      3.17        1         0           3       3          4         5       2          2   0.105
#  2             4      1.8         1         1           5      NA          1         1       1          1   0.103
#  3             3      2           0        30           1       2          1         3       3          2   0.802
#  4             5      2.83        0        10           3       3          2         5       1          3   0.104
#  5             4      3.2         0        30           2       3         NA         5       3          3   0.639
#  6             5      2.75        2         0          NA       2         NA         4       3          2   0.2  
#  7             2      1           3        58           1       1          1         1       1          1   2.42 
#  8             4      1.67        5         0           3       1          1         3       1          1   0.102
#  9             5      4           1        15           4       4          4         4       4          4   6.19 
# 10             4      1.33        0        30           1       2          1         1       1          2   2.08 
# # ℹ 1,083 more rows
# # ℹ 13 more variables: gender <fct>, age <dbl>, imor <fct>, german <fct>, nbh_exposed <dbl>, appearance <fct>,
# #   article <chr>, ment_happy <dbl>, leftright <dbl>, gen_trust <dbl>, news <dbl>, news_yn <dbl>,
# #   z_dis_index <dbl>
```
]]

]]
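
The two derived variables can be checked by hand. The sketch below is not part of the original analysis: it recomputes `dis_index` for the first two respondents shown in the printed tibble, and it uses a small hypothetical vector `x` to show what `scale()` does.

``` r
# Item responses of the first two respondents, copied from the printed tibble
# (order: dis_trainee, dis_job, dis_school, dis_house, dis_gov, dis_public)
items <- rbind(c(3, 3, 4, 5, 2, 2),
               c(5, NA, 1, 1, 1, 1))

# Mean over the answered items only, as in the mutate() call
dis_index <- rowMeans(items, na.rm = TRUE)
round(dis_index, 2) # Compare with the dis_index column of the tibble

# z-standardization: subtract the mean, divide by the standard deviation
x        <- c(1, 2, 3, 4, 5)           # Hypothetical values, for illustration only
z_manual <- (x - mean(x)) / sd(x)      # By hand
z_scale  <- as.numeric(scale(x))       # Same steps as used for z_dis_index
all.equal(z_manual, z_scale)
```

Note that `na.rm = TRUE` means the second respondent's index is the average of five answers rather than six.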

---
# Fruitless, naïve comparison

.push-left[
<img src="6-RCTees_files/figure-html/naiv-1.png" width="98%" style="display: block; margin: auto;" />
]

.push-right[.font80[

``` r
ols <- lm_robust(dis_index ~ news_yn, # Run weighted OLS
                 weights = gewFAKT, data = APAD)

modelsummary( # Create summary table of OLS results
  list("Discr." = ols), # List of OLS model objects
  stars = TRUE, # Indicate significance level
  # Choose Goodness of Fit indicators
  gof_map = c("nobs", "r.squared"), 
  output = "kableExtra") # Format as HTML
```

]]

---
# Why? Selection bias!

`$$\begin{equation} \begin{split}
\underbrace{Avg_{n}[Y_{1i}|D_{i} = 1] - Avg_{n}[Y_{0i}|D_{i} = 0]}_{\text{Difference in observed group means}} = \underbrace{Avg_{n}[Y_{1i}|D_{i} = 1] \color{gray}{(-  Avg_{n}[Y_{0i}|D_{i} = 1]}}_{\text{Average causal effect } among \text{ } the \text{ } treated} \color{gray}{+} \underbrace{\color{gray}{Avg_{n}[Y_{0i}|D_{i} = 1])} -  Avg_{n}[Y_{0i}|D_{i} = 0]}_{\text{Selection bias}}.
\end{split} \end{equation}$$`

.content-box-red[.center[
`$\text{Selection bias} = \underbrace{Avg_{n}[Y_{0i} | D_{i} = 1]}_{\text{Unobserved!}} - Avg_{n}[Y_{0i} | D_{i} = 0].$`

`$\rightarrow$` The difference in `$Avg_{n}[Y_{0i}]$`, the baseline level of the outcome without treatment, between the groups we compare.
]]
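
The decomposition can be illustrated with simulated data, where both potential outcomes are known. This is a minimal sketch under invented assumptions, not an analysis of APAD: the confounder `conf`, the effect size of 0.25, and the selection probabilities are all hypothetical.

``` r
set.seed(42) # Reproducible simulated data (hypothetical, not APAD)
n    <- 10000
conf <- rbinom(n, 1, 0.5)                    # Hypothetical confounder
y0   <- 2 + 0.5 * conf + rnorm(n, sd = 0.5)  # Potential outcome without treatment
y1   <- y0 + 0.25                            # Potential outcome with treatment
# Self-selection: the confounder raises the probability of treatment
d    <- rbinom(n, 1, ifelse(conf == 1, 0.8, 0.3))
y    <- ifelse(d == 1, y1, y0)               # Only one potential outcome is observed

naive     <- mean(y[d == 1]) - mean(y[d == 0])   # Difference in observed group means
att       <- mean(y1[d == 1]) - mean(y0[d == 1]) # Average causal effect among the treated
selection <- mean(y0[d == 1]) - mean(y0[d == 0]) # Selection bias

round(c(naive = naive, att = att, selection = selection), 3)
```

In real data the term `mean(y0[d == 1])` is unobserved, which is why the selection bias cannot be computed directly.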

---
# (Im-)balance .font70[.alert[of observed variables!]]

.panelset[
.panel[.panel-name[R code]

``` r
APAD %>% # Start with the APAD dataset, then pipe
  # Select specific variables for the balance test
  select(news_yn, age, nbh_exposed, imor, german, gewFAKT) %>%
* # Rename the weights variable so that the following
* # command (datasummary_balance) automatically treats it as a weight
* rename(weights = gewFAKT) %>%
* # Create a balance table
* datasummary_balance(
*   # Formula specifies to compare groups based on news_yn
*   formula = ~ news_yn,
*   data = ., # Use the data piped in from above
*   # Provide a title for the table
*   title = "Socio-demographic characteristics of those who read news and those who do not",
*   output = "kableExtra" # Specify the output format as kableExtra
* )
```

]
.panel[.panel-name[Balance table]
<table class="table" style="color: black; width: auto !important; margin-left: auto; margin-right: auto;">
<caption>Socio-demographic characteristics of those who read news and those who do not</caption>
 <thead>
<tr>
<th style="empty-cells: hide;border-bottom:hidden;" colspan="2"></th>
<th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">0</div></th>
<th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">1</div></th>
<th style="empty-cells: hide;border-bottom:hidden;" colspan="2"></th>
</tr>
  <tr>
   <th style="text-align:left;">   </th>
   <th style="text-align:left;">    </th>
   <th style="text-align:right;"> Mean </th>
   <th style="text-align:right;"> Std. Dev. </th>
   <th style="text-align:right;"> Mean </th>
   <th style="text-align:right;"> Std. Dev. </th>
   <th style="text-align:right;"> Diff. in Means </th>
   <th style="text-align:right;"> Std. Error </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> age </td>
   <td style="text-align:left;">  </td>
   <td style="text-align:right;"> 36.7 </td>
   <td style="text-align:right;"> 12.3 </td>
   <td style="text-align:right;"> 43.8 </td>
   <td style="text-align:right;"> 15.4 </td>
   <td style="text-align:right;"> 7.1 </td>
   <td style="text-align:right;"> 2.2 </td>
  </tr>
  <tr>
   <td style="text-align:left;box-shadow: 0px 1.5px"> nbh_exposed </td>
   <td style="text-align:left;box-shadow: 0px 1.5px">  </td>
   <td style="text-align:right;box-shadow: 0px 1.5px"> 4.2 </td>
   <td style="text-align:right;box-shadow: 0px 1.5px"> 0.8 </td>
   <td style="text-align:right;box-shadow: 0px 1.5px"> 4.2 </td>
   <td style="text-align:right;box-shadow: 0px 1.5px"> 1.0 </td>
   <td style="text-align:right;box-shadow: 0px 1.5px"> 0.0 </td>
   <td style="text-align:right;box-shadow: 0px 1.5px"> 0.1 </td>
  </tr>
  <tr>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;">  </td>
   <td style="text-align:right;"> N </td>
   <td style="text-align:right;"> Pct. </td>
   <td style="text-align:right;"> N </td>
   <td style="text-align:right;"> Pct. </td>
   <td style="text-align:right;">  </td>
   <td style="text-align:right;">  </td>
  </tr>
  <tr>
   <td style="text-align:left;"> imor </td>
   <td style="text-align:left;"> Child of immigrant </td>
   <td style="text-align:right;"> 64 </td>
   <td style="text-align:right;"> 29.4 </td>
   <td style="text-align:right;"> 399 </td>
   <td style="text-align:right;"> 45.6 </td>
   <td style="text-align:right;">  </td>
   <td style="text-align:right;">  </td>
  </tr>
  <tr>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;"> Immigrant </td>
   <td style="text-align:right;"> 154 </td>
   <td style="text-align:right;"> 70.6 </td>
   <td style="text-align:right;"> 476 </td>
   <td style="text-align:right;"> 54.4 </td>
   <td style="text-align:right;">  </td>
   <td style="text-align:right;">  </td>
  </tr>
  <tr>
   <td style="text-align:left;"> german </td>
   <td style="text-align:left;"> Ja </td>
   <td style="text-align:right;"> 89 </td>
   <td style="text-align:right;"> 40.8 </td>
   <td style="text-align:right;"> 561 </td>
   <td style="text-align:right;"> 64.1 </td>
   <td style="text-align:right;">  </td>
   <td style="text-align:right;">  </td>
  </tr>
  <tr>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;"> Nein </td>
   <td style="text-align:right;"> 129 </td>
   <td style="text-align:right;"> 59.2 </td>
   <td style="text-align:right;"> 314 </td>
   <td style="text-align:right;"> 35.9 </td>
   <td style="text-align:right;">  </td>
   <td style="text-align:right;">  </td>
  </tr>
</tbody>
</table>

]]

---
# Directed Acyclic Graphs (DAGs)

.backgrnote[
The red bi-directed arrow is not officially part of the DAG, but it helps us understand that the correlation between Discrimination and Reading the news is not causal in nature. The reason for this non-causal relationship is the existence of a backdoor path. The backdoor path is opened by German citizenship, which influences both whether people read the news and how much discrimination they experience.
]

---
class: inverse middle center
# Experiments

---
layout: true
# Experiment

.left-column[
- We do not *passively observe*, <br> we .alert[actively intervene].

- Thereby *we* control who gets `$D$` and who does not!

<img src="./img/randomization1.png" width="80%" style="display: block; margin: auto;" />
]

---

.push-right[
<img src="6-RCTees_files/figure-html/DAG3-1.png" width="80%" style="display: block; margin: auto;" />
]

---

.push-right[
<img src="6-RCTees_files/figure-html/unnamed-chunk-12-1.png" width="70%" style="display: block; margin: auto;" />

.content-box-green[.center[
How can we `$\color{red}{I}$`ntervene to eliminate correlation between the treatment `$D$` and potential confounders `$C$`?
]]]

---
layout: false
class: middle center
background-image: url("https://gummibaerenland.de/cdn/shop/products/212669_lakritz_schnecken_1.jpg?v=1649245056&width=1200")
background-position: center
background-size: cover

---
layout: false
class: inverse middle center
# Break

---
class: middle clear

.left-column[
<img src="https://www.laserfiche.com/wp-content/uploads/2014/10/femalecoder.jpg" width="80%" style="display: block; margin: auto;" />
]

.right-column[
<br>
<iframe src='exercise1.html' width='1000' height='600' frameborder='0' scrolling='yes'></iframe>
]

---
layout: false
class: inverse middle center
# Break

---
layout: true
# Randomized Controlled Trials (RCTs)

.push-left[

- We .alert[randomly] decide who gets: 
  + `$\text{Read news = 0} \rightarrow$` *Control* group, 
  + `$\text{Read news = 1} \rightarrow$` *Treatment* group.
]

---

.push-right[

Remember, randomization is "fair":

Everyone has the same probability of ending up in the treatment or control group, .alert[regardless of who they are]!

.content-box-green[
Why does this result in equal `$Y_{0i}$` baselines?
`$$Avg_{n}[Y_{0i}|\text{News} = 1] = Avg_{n}[Y_{0i}|\text{News} = 0]$$`
]]

---
layout: false
# Randomized Controlled Trials (RCTs)

.push-left[

If we *randomly* divide subjects into treatment and control groups, .alert[they come from the same underlying population]. 
  <br> <br> `$\rightarrow$` They will be similar, on average, *in every way*;<br> **including their `$Y_{0}$`**!
  <br> <br> `$\rightarrow E[Y_{0i}|D=1] = E[Y_{0i}|D=0]$`!
  
**Beware**, in practice randomization can fail, especially if your sample is small.
]

.push-right[
<img src="./img/randomization2.png" width="90%" style="display: block; margin: auto;" />
]
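
The warning about small samples can be made concrete with a simulation. The following sketch uses invented data (a hypothetical `age` covariate and a helper `imbalance()` that is not part of any package) to compare the chance imbalance between groups in a small and a large experiment.

``` r
set.seed(1) # Reproducible simulated data (hypothetical, not APAD)

# Average absolute difference in mean age between treatment and control,
# taken over many repeated randomizations of n subjects
imbalance <- function(n, reps = 500) {
  mean(replicate(reps, {
    age <- rnorm(n, mean = 42, sd = 15)          # Hypothetical covariate
    d   <- sample(rep(c(0, 1), length.out = n))  # Random assignment, equal group sizes
    abs(mean(age[d == 1]) - mean(age[d == 0]))   # Absolute difference in group means
  }))
}

small <- imbalance(n = 20)   # Small experiment
large <- imbalance(n = 2000) # Large experiment
round(c(small = small, large = large), 2)
```

Randomization balances the groups in expectation; any single small experiment can still end up with noticeably different groups, which is one reason to inspect a balance table.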

---
# RCTs and potential outcomes

.push-left[
<img src="./img/randomization2.png" width="60%" style="display: block; margin: auto;" />

If we *randomly* divide subjects into treatment and control groups, .alert[they come from the same underlying population]. 
    <br> <br> `$\rightarrow$` They will be similar, on average, *in every way*;<br> **including their `$Y_{0}$`**!
    <br> <br> `$\rightarrow E[Y_{0i}|D=1] = E[Y_{0i}|D=0]$`!
    
**Beware**, in practice randomization can fail, especially if your sample is small.
]

.push-right[

`$$\begin{equation}\begin{split} &  \underbrace{E[Y_{1i}|D=1] - E[Y_{0i}|D=0]}_{\text{Comparison between treatment and control group}} \\  \\ & = E[Y_{0i} + \color{red}{\kappa} |D=1] - E[Y_{0i}|D=0], \\ \\ &= \color{red}{\kappa} + \underbrace{E[Y_{0i} |D=1] - E[Y_{0i}|D=0]}_{\underbrace{0}_{\text{Selection bias (if randomization has worked)}}}, \\ \\ & = \underbrace{\color{red}{\kappa}.}_{\text{The average causal effect}} \end{split}\end{equation}$$`
]
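
The derivation can be checked on simulated data. The sketch below rests on invented assumptions (a constant effect `kappa` of 0.25 and a hypothetical confounder `conf`) and uses plain `lm()` rather than the weighted `lm_robust()` from the slides.

``` r
set.seed(7) # Reproducible simulated data (hypothetical, not APAD)
n     <- 10000
kappa <- 0.25                                  # Assumed constant causal effect
conf  <- rbinom(n, 1, 0.5)                     # Hypothetical confounder
y0    <- 2 + 0.5 * conf + rnorm(n, sd = 0.5)   # Potential outcome without treatment
d     <- sample(rep(c(0, 1), length.out = n))  # Random assignment, independent of conf
y     <- y0 + kappa * d                        # Observed outcome

selection <- mean(y0[d == 1]) - mean(y0[d == 0]) # Close to 0 under randomization
naive     <- mean(y[d == 1]) - mean(y[d == 0])   # Close to kappa
ols       <- coef(lm(y ~ d))[["d"]]              # OLS slope on the treatment dummy

round(c(selection = selection, naive = naive, ols = ols), 3)
```

The OLS slope on a binary treatment indicator is identical to the difference in group means, so a regression is a convenient way to analyze an RCT.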

---
# RCTs and DAGs

.left-column[
Because of the randomization, there is no backdoor path. That is, no path from `$I$` to `$Y$` that starts with an arrow into `$I$`.

`$\Rightarrow$` No selection/confounder bias!

**Beware**, in practice randomization can fail, especially if your sample is small.
]

.right-column[
<img src="6-RCTees_files/figure-html/unnamed-chunk-18-1.png" width="70%" style="display: block; margin: auto;" />
]

---
layout: true
class: clear
.push-left[
<br>

.font160[.center[**APAD survey experiment**]]

We asked APAD subjects to read a news article.

<br>

We .alert[randomly] decided who got:

+ Venus `$\rightarrow$` *Control* group,

+ Discrimination `$\rightarrow$` **_Treatment_ group 1**,

+ Acculturation `$\rightarrow$` **_Treatment_ group 2**.
]

---

.push-right[
<img src="./img/Contr.png" width="100%" style="display: block; margin: auto;" />
]

---

.push-right[
<img src="./img/Exp2.png" width="89%" style="display: block; margin: auto;" />
]

---

.push-right[
<img src="./img/Exp1.png" width="75%" style="display: block; margin: auto;" />
]

---
layout: false
# Balance test

.panelset[
.panel[.panel-name[Balance table (surveyed news reading)]
<table class="table" style="color: black; width: auto !important; margin-left: auto; margin-right: auto;">
<caption>Socio-demographic characteristics of those who read news and those who do not</caption>
 <thead>
<tr>
<th style="empty-cells: hide;border-bottom:hidden;" colspan="2"></th>
<th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">0</div></th>
<th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">1</div></th>
<th style="empty-cells: hide;border-bottom:hidden;" colspan="2"></th>
</tr>
  <tr>
   <th style="text-align:left;">   </th>
   <th style="text-align:left;">    </th>
   <th style="text-align:right;"> Mean </th>
   <th style="text-align:right;"> Std. Dev. </th>
   <th style="text-align:right;"> Mean </th>
   <th style="text-align:right;"> Std. Dev. </th>
   <th style="text-align:right;"> Diff. in Means </th>
   <th style="text-align:right;"> Std. Error </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> age </td>
   <td style="text-align:left;">  </td>
   <td style="text-align:right;"> 36.7 </td>
   <td style="text-align:right;"> 12.3 </td>
   <td style="text-align:right;"> 43.8 </td>
   <td style="text-align:right;"> 15.4 </td>
   <td style="text-align:right;"> 7.1 </td>
   <td style="text-align:right;"> 2.2 </td>
  </tr>
  <tr>
   <td style="text-align:left;box-shadow: 0px 1.5px"> nbh_exposed </td>
   <td style="text-align:left;box-shadow: 0px 1.5px">  </td>
   <td style="text-align:right;box-shadow: 0px 1.5px"> 4.2 </td>
   <td style="text-align:right;box-shadow: 0px 1.5px"> 0.8 </td>
   <td style="text-align:right;box-shadow: 0px 1.5px"> 4.2 </td>
   <td style="text-align:right;box-shadow: 0px 1.5px"> 1.0 </td>
   <td style="text-align:right;box-shadow: 0px 1.5px"> 0.0 </td>
   <td style="text-align:right;box-shadow: 0px 1.5px"> 0.1 </td>
  </tr>
  <tr>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;">  </td>
   <td style="text-align:right;"> N </td>
   <td style="text-align:right;"> Pct. </td>
   <td style="text-align:right;"> N </td>
   <td style="text-align:right;"> Pct. </td>
   <td style="text-align:right;">  </td>
   <td style="text-align:right;">  </td>
  </tr>
  <tr>
   <td style="text-align:left;"> imor </td>
   <td style="text-align:left;"> Child of immigrant </td>
   <td style="text-align:right;"> 64 </td>
   <td style="text-align:right;"> 29.4 </td>
   <td style="text-align:right;"> 399 </td>
   <td style="text-align:right;"> 45.6 </td>
   <td style="text-align:right;">  </td>
   <td style="text-align:right;">  </td>
  </tr>
  <tr>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;"> Immigrant </td>
   <td style="text-align:right;"> 154 </td>
   <td style="text-align:right;"> 70.6 </td>
   <td style="text-align:right;"> 476 </td>
   <td style="text-align:right;"> 54.4 </td>
   <td style="text-align:right;">  </td>
   <td style="text-align:right;">  </td>
  </tr>
  <tr>
   <td style="text-align:left;"> german </td>
   <td style="text-align:left;"> Ja </td>
   <td style="text-align:right;"> 89 </td>
   <td style="text-align:right;"> 40.8 </td>
   <td style="text-align:right;"> 561 </td>
   <td style="text-align:right;"> 64.1 </td>
   <td style="text-align:right;">  </td>
   <td style="text-align:right;">  </td>
  </tr>
  <tr>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;"> Nein </td>
   <td style="text-align:right;"> 129 </td>
   <td style="text-align:right;"> 59.2 </td>
   <td style="text-align:right;"> 314 </td>
   <td style="text-align:right;"> 35.9 </td>
   <td style="text-align:right;">  </td>
   <td style="text-align:right;">  </td>
  </tr>
</tbody>
</table>

]
.panel[.panel-name[Balance table (RCT news)]
<table class="table" style="color: black; width: auto !important; margin-left: auto; margin-right: auto;">
<caption>Socio-demographic characteristics of those who read news and those who do not</caption>
 <thead>
<tr>
<th style="empty-cells: hide;border-bottom:hidden;" colspan="2"></th>
<th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Contr (N=358)</div></th>
<th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Treat_1 (N=375)</div></th>
<th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Treat_2 (N=360)</div></th>
</tr>
  <tr>
   <th style="text-align:left;">   </th>
   <th style="text-align:left;">    </th>
   <th style="text-align:right;"> Mean </th>
   <th style="text-align:right;"> Std. Dev. </th>
   <th style="text-align:right;"> Mean </th>
   <th style="text-align:right;"> Std. Dev. </th>
   <th style="text-align:right;"> Mean </th>
   <th style="text-align:right;"> Std. Dev. </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> news_yn </td>
   <td style="text-align:left;">  </td>
   <td style="text-align:right;"> 0.8 </td>
   <td style="text-align:right;"> 0.4 </td>
   <td style="text-align:right;"> 0.8 </td>
   <td style="text-align:right;"> 0.4 </td>
   <td style="text-align:right;"> 0.8 </td>
   <td style="text-align:right;"> 0.4 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> age </td>
   <td style="text-align:left;">  </td>
   <td style="text-align:right;"> 41.8 </td>
   <td style="text-align:right;"> 15.7 </td>
   <td style="text-align:right;"> 41.2 </td>
   <td style="text-align:right;"> 12.4 </td>
   <td style="text-align:right;"> 43.8 </td>
   <td style="text-align:right;"> 16.9 </td>
  </tr>
  <tr>
   <td style="text-align:left;box-shadow: 0px 1.5px"> nbh_exposed </td>
   <td style="text-align:left;box-shadow: 0px 1.5px">  </td>
   <td style="text-align:right;box-shadow: 0px 1.5px"> 4.2 </td>
   <td style="text-align:right;box-shadow: 0px 1.5px"> 0.9 </td>
   <td style="text-align:right;box-shadow: 0px 1.5px"> 4.2 </td>
   <td style="text-align:right;box-shadow: 0px 1.5px"> 1.1 </td>
   <td style="text-align:right;box-shadow: 0px 1.5px"> 4.2 </td>
   <td style="text-align:right;box-shadow: 0px 1.5px"> 1.0 </td>
  </tr>
  <tr>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;">  </td>
   <td style="text-align:right;"> N </td>
   <td style="text-align:right;"> Pct. </td>
   <td style="text-align:right;"> N </td>
   <td style="text-align:right;"> Pct. </td>
   <td style="text-align:right;"> N </td>
   <td style="text-align:right;"> Pct. </td>
  </tr>
  <tr>
   <td style="text-align:left;"> imor </td>
   <td style="text-align:left;"> Child of immigrant </td>
   <td style="text-align:right;"> 156 </td>
   <td style="text-align:right;"> 43.6 </td>
   <td style="text-align:right;"> 157 </td>
   <td style="text-align:right;"> 41.9 </td>
   <td style="text-align:right;"> 150 </td>
   <td style="text-align:right;"> 41.7 </td>
  </tr>
  <tr>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;"> Immigrant </td>
   <td style="text-align:right;"> 202 </td>
   <td style="text-align:right;"> 56.4 </td>
   <td style="text-align:right;"> 218 </td>
   <td style="text-align:right;"> 58.1 </td>
   <td style="text-align:right;"> 210 </td>
   <td style="text-align:right;"> 58.3 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> german </td>
   <td style="text-align:left;"> Ja </td>
   <td style="text-align:right;"> 215 </td>
   <td style="text-align:right;"> 60.1 </td>
   <td style="text-align:right;"> 222 </td>
   <td style="text-align:right;"> 59.2 </td>
   <td style="text-align:right;"> 213 </td>
   <td style="text-align:right;"> 59.2 </td>
  </tr>
  <tr>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;"> Nein </td>
   <td style="text-align:right;"> 143 </td>
   <td style="text-align:right;"> 39.9 </td>
   <td style="text-align:right;"> 153 </td>
   <td style="text-align:right;"> 40.8 </td>
   <td style="text-align:right;"> 147 </td>
   <td style="text-align:right;"> 40.8 </td>
  </tr>
</tbody>
</table>

]

.panel[.panel-name[R code]

``` r
APAD %>% # Start with the APAD dataset, then pipe
  # Select specific variables for the balance test
  select(article, news_yn, age, nbh_exposed, imor, german, gewFAKT) %>%
* # Rename the weights variable so that the following
* # command (datasummary_balance) automatically treats it as a weight
* rename(weights = gewFAKT) %>%
* # Create a balance table
* datasummary_balance(
*   # Formula specifies to compare groups based on treatment/control
*   formula = ~ article,
*   data = ., # Use the data piped in from above
*   # Provide a title for the table
*   title = "Socio-demographic characteristics of those who read news and those who do not",
*   output = "kableExtra" # Specify the output format as kableExtra
* )
```
]]

---
# Causal effect of news articles

.push-left[.font80[
<br>

``` r
# Weighted OLS to analyze survey RCT.
ols <- lm_robust(dis_index ~ article, 
                 weights = gewFAKT, data = APAD)
# Weighted and z-standardized OLS to analyze survey RCT.
zols <- lm_robust(z_dis_index ~ article, 
                  weights = gewFAKT, data = APAD)

modelsummary( # Formatted table comparing both OLS models
  list("Discr." = ols, "Z-Discr." = zols), # List of models
  stars = TRUE, # Show significance stars
  coef_map = c("(Intercept)" = "Intercept (Venus control)", 
               "articleTreat_1" = "Article on discrimination", 
               "articleTreat_2" = "Article on Acculturation"),
  # 'coef_map' renames coefficients for clarity
  gof_map = c("nobs", "r.squared"), # Goodness-of-fit
  output = "kableExtra") # Output format for the table
```
]]

.push-right[
<br>

<table style="border-bottom: 0; color: black; width: auto !important; margin-left: auto; margin-right: auto;" class="table">
 <thead>
  <tr>
   <th style="text-align:left;">   </th>
   <th style="text-align:center;"> Discr. </th>
   <th style="text-align:center;"> Z-Discr. </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Intercept (Venus control) </td>
   <td style="text-align:center;"> 1.979*** </td>
   <td style="text-align:center;"> −0.171* </td>
  </tr>
  <tr>
   <td style="text-align:left;">  </td>
   <td style="text-align:center;"> (0.067) </td>
   <td style="text-align:center;"> (0.074) </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Article on discrimination </td>
   <td style="text-align:center;"> 0.239* </td>
   <td style="text-align:center;"> 0.264* </td>
  </tr>
  <tr>
   <td style="text-align:left;">  </td>
   <td style="text-align:center;"> (0.107) </td>
   <td style="text-align:center;"> (0.118) </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Article on Acculturation </td>
   <td style="text-align:center;"> 0.137 </td>
   <td style="text-align:center;"> 0.151 </td>
  </tr>
  <tr>
   <td style="text-align:left;box-shadow: 0px 1.5px">  </td>
   <td style="text-align:center;box-shadow: 0px 1.5px"> (0.118) </td>
   <td style="text-align:center;box-shadow: 0px 1.5px"> (0.130) </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Num.Obs. </td>
   <td style="text-align:center;"> 1085 </td>
   <td style="text-align:center;"> 1085 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> R2 </td>
   <td style="text-align:center;"> 0.012 </td>
   <td style="text-align:center;"> 0.012 </td>
  </tr>
</tbody>
<tfoot><tr><td style="padding: 0; " colspan="100%">
<sup></sup> + p &lt; 0.1, * p &lt; 0.05, ** p &lt; 0.01, *** p &lt; 0.001</td></tr></tfoot>
</table>

]
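
As a reading aid, the coefficients in the first column can be turned into the average discrimination index of each group. This worked sketch assumes the usual dummy-coding interpretation (the intercept is the control-group mean and each coefficient is a difference from it) and uses the rounded estimates from the table.

`$$\begin{equation}\begin{split} \text{Venus (control)}: &\quad 1.979, \\ \text{Discrimination}: &\quad 1.979 + 0.239 = 2.218, \\ \text{Acculturation}: &\quad 1.979 + 0.137 = 2.116. \end{split}\end{equation}$$`

On the five-point answer scale, reading the discrimination article thus shifts the average reported discrimination by roughly a quarter of a scale point.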

---
# Visualize

.panelset[
.panel[.panel-name[R code]

``` r
plotdata <- zols %>% # Prepare data for plotting
  tidy() %>% # Convert regression results to a tidy data frame
  filter(term != "(Intercept)") %>% # Remove the intercept term
  mutate( # Rename treatment variables for clarity
    term = case_when(
      term == "articleTreat_1" ~ "Discrimination",
      term == "articleTreat_2" ~ "Acculturation"))

# Create the plot
ggplot(data = plotdata, aes(y = estimate, x = term)) +
  geom_hline(yintercept = 0, color = "orange", 
             lty = "dashed") + # Add a horizontal line at y=0
  # Plot point estimates and confidence intervals
  geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) + 
  coord_flip() + # Flip coordinates for horizontal display
  labs(title = "Causal effect of news articles", 
       x = "Article on",
       y = "Comparison to control group (article on Venus) 
    in standard deviations",
    caption = "Note: Results are based on a weighted OLS with robust standard errors.
    N = 1085, and R2 = 0.012.") + 
  theme_minimal() # Use a minimal theme for clean appearance
```
]

.panel[.panel-name[Plot]
<img src="6-RCTees_files/figure-html/unnamed-chunk-26-1.png" width="65%" style="display: block; margin: auto;" />
]]

---
# Learning goal achieved!

.push-left[
<img src="./img/SchaefferKas.png" width="75%" style="display: block; margin: auto;" />
.backgrnote[.center[*Source:* Schaeffer and Kas (2024)]]
]

.push-right[
.center[.font80[Effects of experimentally induced awareness of discrimination]]
<img src="https://onlinelibrary.wiley.com/cms/asset/9a6ef318-ef35-4789-8c8a-6bccadf24f18/pops13027-fig-0001-m.jpg" width="70%" style="display: block; margin: auto;" />
.backgrnote[.font70[
Point estimates with 90 and 95% confidence intervals based on post-stratification weighted OLS regression with (cluster-)robust standard errors.
]]]

---
class: middle clear

.left-column[
<img src="https://www.laserfiche.com/wp-content/uploads/2014/10/femalecoder.jpg" width="80%" style="display: block; margin: auto;" />
]

.right-column[
<iframe src='exercise2.html' width='1000' height='600' frameborder='0' scrolling='yes'></iframe>
]

---
class: inverse
# Today's general lessons

1. **Experiment**: A study in which the researcher manipulates one or more variables (the "treatment") and then observes the effect of this manipulation on another variable (the "outcome").

2. **Randomized Controlled Trial (RCT)**: An experiment in which participants are randomly assigned to a treatment group or a control group. The treatment group receives the treatment, while the control group does not. Because assignment is random, the groups are similar on average at the start of the experiment and differ systematically only in whether they receive the treatment.

3. **Randomization**: The process of randomly assigning participants to the treatment group or the control group in an RCT. Everyone has the same probability of being treated, regardless of who they are, which removes selection bias in expectation. In small samples, chance imbalances can remain.

4. **OLS regression**: A statistical technique that can be used to analyze the results of an RCT. Regressing the outcome on the treatment indicator estimates the average causal effect as the difference in mean outcomes between the treatment and control groups. Additional covariates can be included to adjust for chance imbalances.

---
# References

.font80[
Polack, F. P., S. J. Thomas, N. Kitchin, et al. (2020). "Safety and Efficacy of the BNT162b2 mRNA
Covid-19 Vaccine". In: _New England Journal of Medicine_, pp. 2603-2615.

Schaeffer, M. and J. Kas (2024). "The integration paradox: Does awareness of the extent of
ethno-racial discrimination increase reports of discrimination?" In: _Political Psychology_.

Schaeffer, M., J. Kas, and P. Hagedorn (2023). "The Association between Actual and Perceived
Discrimination (APAD): Technical Report". In: _SocArXiv_.

Steinmann, J. (2019). "The paradox of integration: why do higher educated new immigrants perceive more
discrimination in Germany?" In: _Journal of Ethnic and Migration Studies_, pp. 1377-1400.
]