1 Inspecting the Data from MIMIC-IV dataset

We begin by loading and inspecting the required data from MIMIC-IV:

# load information about ICU admission episode
adm_info <- load_concepts("adm_episode", "miiv", verbose = FALSE)
# subset to only first admission episode
patient_ids <- id_col(adm_info[adm_episode == 1])

# load relevant concepts for investigating mortality
dat <- load_concepts(c("acu_24", "diag", "age", "sex", "charlson",
                       "lact_24", "pafi_24", "ast_24",
                       "race", "death"), "miiv", patient_ids = patient_ids,
                     verbose = FALSE)

# compare only White and African-American patients
dat <- dat[race %in% c("Caucasian", "African American")]
dat[, majority := as.integer(race == "Caucasian")]
dat[, race := NULL]

# remove time variable
dat[, c(index_var(dat)) := NULL]

# impute normal/median values for missing covariates
imp_lst <- list(
  age = 65,
  acu_24 = 0,
  charlson = 0,
  lact_24 = 1,
  ast_24 = 20,
  pafi_24 = 500,
  death = FALSE
)

for (i in seq_len(ncol(dat))) {

  var <- names(dat)[i]
  if (any(is.na(dat[[var]])) & !is.null(imp_lst[[var]]))
    dat[is.na(get(var)), c(var) := imp_lst[[var]]]
}

# visualize the data
knitr::kable(head(dat), caption = "MIMIC-IV Mortality data.")
MIMIC-IV Mortality data.
stay_id acu_24 diag age sex charlson lact_24 pafi_24 ast_24 death majority
30000153 3 TRAUM 61 Male 0 1.7 430.0000 20 FALSE 1
30001396 3 MED 40 Male 2 1.8 361.9048 23 FALSE 1
30001446 11 MED 56 Male 3 1.7 466.6667 114 FALSE 1
30001656 3 NMED 68 Female 0 1.0 500.0000 20 FALSE 1
30001947 2 SURG 43 Male 1 0.7 535.0000 36 FALSE 1
30002415 4 CSURG 72 Female 0 1.0 240.0000 20 FALSE 1

We consider the cohort of all patients in the database admitted to the ICU (and we only consider the first admission of each patient). We load information on the SOFA score, admission diagnosis, age, sex, Charlson comorbidity index, worst values of lactate, PaO2/FiO2, and AST in the first 24 hours of ICU stay. We further load the race information (protected attribute \(X\)) and the death indicator (outcome \(Y\)). We want to investigate the disparities in outcome between "Caucasian" and "African American" groups.

2 Constructing the Graphical Model

We next construct the mapping of the covariates to our graphical model.

# constructing the causal diagram
X <- "majority" # protected attribute, "treatment"
Z <- c("age") # confounding variable
# mediators: SOFA, admission diagnosis, Charlson index, lactate, PaO2/FiO2, AST
W <- c("acu_24", "diag", "charlson", "lact_24", "pafi_24", "ast_24")
Y <- "death" # outcome variable

3 Decomposing the Disparity

We use the faircause R-package, which performs the decomposition of the TV measure into direct, indirect, and spurious components.

fcb <- fairness_cookbook(data = dat, X = X, Z = Z, W = W, Y = Y, x0 = 0, x1 = 1)

We can then inspect the decomposition by calling autoplot() on the fcb object which is an S3 class of type faircause:

autoplot(fcb) +
  labs(title="Causal decomposition of mortality difference: MIMIC-IV race effect",
       y="Mortality difference (%)") +
  theme_minimal() +
  scale_x_discrete(labels = c("Total Variation", "Direct", "Indirect",
                                "Confounded"), name = "Pathway") +
  scale_fill_discrete(labels = c("Total Variation", "Direct", "Indirect",
                                 "Confounded"), name = "Pathway")

4 Understanding the Decomposition

We now analyze each of the spurious, indirect, and direct effects, to better understand what drives them.

4.1 Zooming-in on the Spurious Effect

To better understand the spurious effect, we plot the density of the age distributions between groups:

4.2 Zooming-in on the Indirect Effect

For the indirect effect, we want to inspect the distribution of the mediators W. Here, we will specifically focus on the Charlson comorbidity index, and the admission diagnosis. However, we must be careful when using a visualization: the TV decomposition tells us that minority patients are likely more sick along the indirect effect; but if we simply look at the conditional distribution \(P(W \mid X = x)\), we have not adjusted for the effect of age, which we know is important since we saw previously that minority patients are younger on average, and age is known to affect illness severity and/or pre-morbid health status. To resolve this issue, we need to make sure our visualizations adjust for the effect of age. We write a function called mc_age_adjust():

mc_age_adjust <- function(dat, n_mc = list(10^5, 10^5)) {
  
  dat[, age := round(age)]
  
  # pre-compute age look-ups
  lookup <- replicate(17, NULL)
  for (i in seq(range(dat$age)[1], range(dat$age)[2])) {
    
    lookup[[i]] <- list(NULL)
    lookup[[i]][[1]] <- which(dat$age == i & dat$majority == 0)
    lookup[[i]][[2]] <- which(dat$age == i & dat$majority == 1)
  }
  
  if (max(dat$age) == 100) {
    
    if (length(lookup[[100]][[1]]) == 0) 
      lookup[[100]][[1]] <- lookup[[99]][[1]]
    
    if (length(lookup[[98]][[1]]) == 0) 
      lookup[[98]][[1]] <- lookup[[99]][[1]]
  }
  
  ret <- c()
  for (maj in c(0, 1)) {
    
    # sample age distribution
    if (maj == 0 || n_mc[[1]] != n_mc[[2]])
      ag <- sample(dat$age, n_mc[[maj + 1]], replace = TRUE)
    
    rows <- lapply(
      unique(ag),
      function(agi) {
        
        n_samp <- sum(ag == agi)
        sample(x = lookup[[agi]][[maj + 1]], n_samp, replace = TRUE)
      }
    )
    
    ret <- if (maj == 0) dat[do.call(c, rows)] else rbind(ret, dat[do.call(c, rows)])
  }
  
  ret
}

adj_dat <- mc_age_adjust(dat)

Using the adjusted data, we quantify the difference in CCI

ggplot(adj_dat, aes(x = charlson, fill = factor(majority))) +
  geom_histogram(position = "identity", alpha = 0.5, color = "black") +
  scale_fill_discrete(name = "Majority") +
  theme_minimal()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

After adjusting for age, minority patients have higher Charlson comorbidity indices. We further investigate the proportion of medical admissions in each group, since medical admissions are associated with a much higher mortality rate:

adj_dat[, med_adm := diag %in% c("CMED", "MED", "NMED", "OMED")]
adj_dat[, list(prop_med = mean(med_adm)), by = "majority"]

Minority patients are far more likely to be admitted for a medical reason (76%) compared to majority patients (60%), further explaining the indirect effect.

4.3 Zooming-in on the Direct Effect

The direct effect shows better survival of minority patients after adjusting for both the confounders and the mediators. This is an unexpected finding, since usually unobserved covariates would show that minority patients are worse-off (e.g., think of socioeconomic status). However, in our case, the sign is opposite.

To investigate this further, we analyze the heterogeneity of the direct effect along age and admission groups. For this,

dat <- merge(dat, load_concepts("elective", "miiv", verbose=FALSE), all.x = TRUE)
diag_dt <- data.table(
  diag_index = 0:17, 
  diag = c("MED", "CMED", "NMED", "OMED", "PSYCH", "GU", "TRAUM", 
           "ENT", "CSURG", "NSURG", "ORTHO", "PSURG", "SURG", 
           "TSURG", "VSURG", "GYN", "OBS", "DENT")
)
dat <- merge(dat, diag_dt,by = "diag")

# split elective and non-elective surgeries
dat[diag_index >= 5, diag_index := diag_index + 20 * elective]

W_cnd <- c(setdiff(W, "diag"), "diag_index", "elective")

source("r/cnd-effects.R")
source("r/one-step-debiased.R")

cndE_obj <- cnd_effect(dat, X = X, Z = Z, W = W_cnd, Y = Y, method = "osd")
sel_covs <- c("diag_grp", "age")
E_lst <- cmbn_E(sel_covs, "miiv")
cde <- infer(cndE_obj, E_lst)
plt_E_cnd(cde, sel_covs, "DE Heterogeneity")

We can see that the direct effect is heterogeneous; modified by age, and likely by type of admission. In several places, however, the confidence intervals are wide, and we cannot establish the effect sign with certainty.

4.4 Possible Explanation of the Direct Effect

We investigate further, and look at the same effect estimates on the ANZICS database, which has millions of admissions. We plot the same effect heterogeneity (LHS):

knitr::include_graphics(c("img/aics-de-hetero.png", "img/age-diag-risks.png"))

The pattern appears more clearly with larger sample size: the protective effect for the minority group is clustered around younger patients, and specifically for medical admissions.

So how could such an effect be explained? To connect this effect, we plot the baseline risk ratio of ICU admission. Our hypothesis can be described as follows: minority patients are more likely to be admitted in the first place, As the above plot (RHS) shows, the pattern across age groups and admission groups correlates very strongly: increased risk of admission for minority compared to majority patients is correlated with the direct effect!