8 Case Study 2: Breast Cancer Mortality in Massachusetts
By: Enjoli Hall
8.1 Overview
Our outcome of interest is breast cancer mortality. We investigate disparities in breast cancer mortality by racialized/ethnic group and socioeconomic position. We primarily focus on comparisons of risk in Black non-Hispanic and White non-Hispanic populations, motivated by longstanding health inequities between these groups and sufficiently large population sizes in Massachusetts to support the analyses. We will be working with mortality data from the Massachusetts Registry of Vital Records and Statistics [https://www.mass.gov/lists/death-data] for the years 2013-2017. Vital statistics and disease registries are how states (which submit their data to the federal government) keep an official enumeration of deaths (Friedman et al., 2005; Hetzel, 1997; Krieger, 2019); because the death certificates require data on residential address at time of death, they can be useful tools for monitoring trends in health and health equity. Having data on the residential address of each individual allows us to geocode each observation to the physical and social environment in which the person lived. Death records also typically include demographic information about the deceased, including categories for racialized groups that conform to the 1997 US Office of Management and Budget (OMB) standards for the classification of federal data on “race and ethnicity” (OMB, 1997). We will explore how social membership in racialized groups and area-based social metrics (ABSMs) might be associated with breast cancer mortality across Massachusetts. To do this, we will pair our mortality data with demographic and socioeconomic data from the U.S. Census Bureau’s 5-year American Community Survey (ACS) files.
8.2 Background and Significance
Breast cancer is the most commonly diagnosed cancer worldwide and is one of the leading causes of cancer death. In the United States, breast cancer incidence and mortality rates vary widely across geographic regions and racialized groups. For example, while breast cancer deaths in the United States have declined overall by 42 percent over the last 30 years, there is a persistent mortality gap between Black women and White women. Breast cancer incidence rates among Black and White women are close, but mortality rates are markedly different, with Black women having a 41 percent higher death rate from breast cancer compared to White women in the United States (Breast Cancer Research Foundation, 2022).
Nevertheless, the routine stratification and presentation of cancer data by “race” in the absence of socioeconomic data such as occupation, educational level, or income, perpetuates the view that “race”—wrongly construed as a biological variable—explains racialized inequities in breast cancer mortality and other cancer outcomes. Hidden from view are ways that economic forms of discrimination and inequality might drive “racial”/ethnic inequities in breast cancer mortality. Rather than taking an either/or approach to analyzing and interpreting cancer data by “race/ethnicity” and socioeconomic position, it is important for public health researchers to stratify cancer data by both “race/ethnicity” and socioeconomic position as neither by itself is sufficient to capture how membership in racialized groups and class relations, separately and together, affect the health of populations. However, U.S. public health surveillance systems typically do not include data relating health status to socioeconomic position for individual records. One possible solution to these gaps is to combine data from health surveillance systems with socioeconomic data derived from the U.S. Census to analyze breast cancer mortality in relation to area-based socioeconomic measures for domains of socioeconomic position such as income and poverty, thereby permitting calculation of population-based breast cancer mortality rates stratified by area-based socioeconomic position and making visible socioeconomic gradients in breast cancer mortality.
We can also monitor racialized and socioeconomic cancer and other health inequities using not only conventional individual- and area-level socioeconomic measures but also measures of racialized and economic segregation and polarization at the neighborhood, city or town, and regional levels; these latter measures bring into focus the full range of concentrations of privilege and deprivation in an area. Over 20 years ago, Williams and Collins (2001) explained how racial residential segregation acts as a fundamental cause of racial disparities in health by exposing Black communities to less healthy neighborhood and housing conditions, fewer economic and educational opportunities, and lower quality health care resources compared to White communities. Since that time, hundreds of empirical studies have examined segregation as a key driver of patterns of population health and health inequities, but relatively few studies have examined cancer outcomes. Together, these studies suggest that residential segregation can generate racialized economic inequities across the cancer continuum, in part through producing differential access to medical care and unequal exposures to social and environmental cancer risks.
8.3 Motivation, Research Questions, and Learning Objectives
The goal of this case study is to develop familiarity with methods of exploring and visualizing racialized and socioeconomic inequities in health. Our specific goals will be to:
Download and merge different datasets for our analysis
Visualize and map estimates of area-based social metrics and breast cancer mortality
Identify relationships between racialized group, area-based social metrics, and breast cancer mortality
Model how space may impact inequities in breast cancer mortality
The research questions we will seek to answer throughout this case study include:
What is the overall socioeconomic gradient in breast cancer mortality? (hint: we can visualize this with a spatial model)
What is the racialized disparity in breast cancer mortality overall? (hint: we can visualize this with stratified aggregate analyses)
How do area-based socioeconomic measures interact with individual-level membership in racialized groups to affect patterns of breast cancer mortality (i.e., interactions between socioeconomic position and racialized groups, not just socioeconomic inequities within racialized groups)? (hint: we can explore this with a Poisson model using an interaction term)
8.4 Getting and Wrangling Your Data
We are providing datasets for you to use throughout these case studies that have been wrangled and reshaped for your use, and we also provide code to show how you could go through that process on your own. You can look at whole datasets in RStudio using the view()
command, or look at summaries of the datasets by simply typing the dataset name into the console window. You can skip ahead to the “Approach” section and follow along with the case study without issue.
8.4.1 Running RStudio and Setting Up Your Working Directory
Download the Breast Cancer Mortality project folder [https://hu-my.sharepoint.com/:f:/g/personal/denajavadi_g_harvard_edu/EpyUmRZB-hBGvdygLgRLTmMByYr_lmlP6pqY09f_bz8QBg?e=tDxIIg] that contains all of the data files and geographic shapefiles as well as maps and figures that we will use for this case study. Save the folder to your Desktop or another easily accessible location on your computer. Note: Do not edit or change any of the file names as our code corresponds to the file names in the folder.
Next, you will need to open RStudio and create a new R Project file to work on the case study. Create a new R Project file by selecting File > New Project…
from the menu bar. Select New Directory
from the popup window. Next, select New Project
. Pick a meaningful name for your project folder, i.e. the Directory Name
. Ensure this project folder is created in the right place. You can change the subdirectory
by clicking on Browse…
. The subdirectory should be the place where the Breast Cancer Mortality folder and files that you just downloaded are saved on your computer. Lastly, tick Open in new session
. This will open your R Project in a new RStudio window. Once you are happy with your choices, you can click Create Project
. This will open a new R Session, and you can start working on the case study. To make sure all of your project files for the case study are properly loaded and to avoid potential errors when running the code, navigate to your project folder and files in the Files/Plots/Packages/Help
window in the bottom-right corner of your RStudio window and go through and double-click each of the file names to open and run all of the files for this project.
8.4.2 Dependencies
Run the lines of code below to load the R packages that you will need throughout the case study. If this code does not run for you, you may need to run install.packages(“package_name”)
.
# Libraries - if this code does not run for you, you may need to run install.package("package_name")
library(knitr)
library(tidyverse)
library(readxl)
library(ggplot2)
library(cowplot)
library(tidycensus)
library(tigris)
options(tigris_use_cache = TRUE)
library(sf)
library(spdep)
library(viridis)
library(Hmisc)
library(fastDummies)
library(lme4)
library(INLA)
library(broom)
8.4.3 Health Surveillance Data
Data on breast cancer deaths were obtained from the Massachusetts Department of Public Health. Each mortality record included data on the decedent’s age, gender, racialized group (using U.S. census categories), residential address at the time of death and coded cause of death following the International Classification of Diseases 10th Revision (ICD-10). We employed R software and Google Maps API to geocode the residential address of each case to its latitude and longitude, which were used to assign census tract and city/town geocodes.
The mortality data has been aggregated from individual observations into death counts by age, racialized group, census tract, and city/town. When you get unrestricted mortality files from government sources for research, you will likely receive files with one observation per death. After you have geocoded these observations, you will need to aggregate them up to the level of interest for your analysis. This requires aggregation not only up to the census tract level, but also age groups so we can do an appropriate age standardization, sex groups and racialized groups so we can stratify our analyses by these groups, and towns so we can explore a second areal level of analysis.
Below is some code (for example) to turn a raw mortality file into what has been provided to you, by merging census tracts and towns to the data, and then aggregating into groups. The vast majority of our area-based social metrics come from the U.S. Census, so the most detailed level for analysis available will be the census tract (however for this case study, because we are analyzing a health outcome with relatively small numbers of cases, we need to perform our analysis at a larger geography such as city/town instead of census tract to ensure that there are a sufficient number of cases, especially inclusive of different racialized groups, in our units of analysis). We can use the tigris
package to download census tract geometries and the sf package to link our geocoded observations to the appropriate census tracts.
Ideally when performing an analysis that might include two or more levels the smaller level (e.g., census tracts) would be nested entirely within the larger level (e.g., towns). In Massachusetts though, there are many towns that have such small populations that they are smaller than census tracts. For these towns, we have created a crosswalk wherein small towns have been combined to create larger super towns. These super towns each make up one census tract, so that each super town in the analysis will now have at least one census tract nested within it. Please note that for this analysis we will be focusing on breast cancer mortality in women, designated as such in the MA cancer registry records.
# the raw mortality data, with the geocoded coordinates converted into spatial feature (sf) data
ma_mort <- read_rds("data/08-breast-cancer-mortality/ma_mort_geo.RDS") %>%
st_as_sf(coords=c("lon","lat"), crs = 4269)
# Importing tract shapefile using tigris package
tracts_sf <- tracts(state = "25", year = "2010")
# Linking Census Tracts and Super Towns File
supertowns <- read_excel("data/08-breast-cancer-mortality/ma ct to supertinytowns.xlsx", col_names = FALSE, col_types = "text") %>%
transmute(GEOID10 = ...1,
super_town = str_to_lower(...2)) %>%
mutate(super_town = case_when(GEOID10 == "25023500101" ~ "hull",
super_town == "stt1" ~ "Super Town 1",
super_town == "stt2" ~ "Super Town 2",
super_town == "stt3" ~ "Super Town 3",
super_town == "stt4" ~ "Super Town 4",
super_town == "stt5" ~ "Super Town 5",
super_town == "stt6" ~ "Super Town 6",
super_town == "stt7" ~ "Super Town 7",
super_town == "stt8" ~ "Super Town 8",
super_town == "stt9" ~ "Super Town 9",
super_town == "stt10" ~ "Super Town 10",
super_town == "stt11" ~ "Super Town 11",
super_town == "stt12" ~ "Super Town 12",
super_town == "stt13" ~ "Super Town 13",
super_town == "stt14" ~ "Super Town 14",
super_town == "stt15" ~ "Super Town 15",
super_town == "stt16" ~ "Super Town 16",
super_town == "stt17" ~ "Super Town 17",
super_town == "stt18" ~ "Super Town 18",
super_town == "stt19" ~ "Super Town 19",
super_town == "stt20" ~ "Super Town 20",
super_town == "stt21" ~ "Super Town 21",
TRUE ~ super_town))
tracts_sf <- inner_join(tracts_sf,supertowns, by = "GEOID10")
#Creating a supertown-level map
town_geometry <- st_read("citytown_shp",
layer = "TOWNSSURVEY_POLYM") %>%
transmute(town = str_to_lower(TOWN),
geometry = geometry,
super_town = case_when(town %in% c("charlemont", "colrain", "hawley", "heath", "monroe", "rowe") ~ "Super Town 1",
town %in% c("cummington", "middlefield", "plainfield", "worthington") ~ "Super Town 2",
town %in% c("monterey", "tyringham") ~ "Super Town 3",
town %in% c("hardwick", "new braintree") ~ "Super Town 4",
town %in% c("bernardston", "gill", "leyden") ~ "Super Town 5",
town %in% c("blandford", "chester", "granville", "montgomery", "russell", "tolland") ~ "Super Town 6",
town %in% c("sunderland", "whateley") ~ "Super Town 7",
town %in% c("becket", "washington") ~ "Super Town 8",
town %in% c("holland", "wales") ~ "Super Town 9",
town %in% c("peru", "windsor") ~ "Super Town 10",
town %in% c("goshen", "williamsburg") ~ "Super Town 11",
town %in% c("hancock", "new ashford", "richmond") ~ "Super Town 12",
town %in% c("leverett", "new salem", "shutesbury") ~ "Super Town 13",
town %in% c("sandisfield", "otis") ~ "Super Town 14",
town %in% c("erving", "warwick", "wendell") ~ "Super Town 15",
town %in% c("buckland", "shelburne") ~ "Super Town 16",
town %in% c("alford", "egremont", "mount washington") ~ "Super Town 17",
town %in% c("ashfield", "conway") ~ "Super Town 18",
town %in% c("aquinnah", "chilmark", "gosnold", "west tisbury") ~ "Super Town 19",
town %in% c("petersham", "phillipston") ~ "Super Town 20",
town %in% c("florida", "savoy") ~ "Super Town 21",
TRUE ~ town)) %>%
group_by(super_town) %>%
mutate(geometry = st_combine(geometry)) %>%
select(super_town, geometry) %>%
unique()
# Merging to geocoded mortality and dropping geometries
# The geometries of a shapefile can be very unweildy and slow data wrangling commands. Since we only need them to map, we will remove them for now
ma_mort_ct <- st_join(tracts_sf, ma_mort, left = FALSE) %>%
st_drop_geometry() %>%
arrange(id) %>%
select(id, year, date,
age, sex, starts_with("race"), hisp, immigrant, starts_with("educ"),
contains("icd10"),
address, zip, starts_with("geo"), super_town, contains("type"), north, south, east, west,
GEOID10)
# Aggregating the data by our selected variables of interest
#Filtering to only breast cancer deaths
ma_mort_bc <- ma_mort_ct %>%
filter(str_detect(icd10, "C50"),
sex == "Female") %>%
mutate(age_cat = cut(age,
breaks = c(-1,4,9,14,19,24,29,34,39,44,49,54,59,64,69,74,79,84,200),
labels = c("0-4", "5-9", "10-14","15-19","20-24",
"25-29","30-34","35-39","40-44","45-49",
"50-54","55-59","60-64","65-69","70-74",
"75-79","80-84","85+")),
race_group = racegroup) %>%
select(-c(age, race, immigrant, educ_years, contains("icd10"),
contains("address"), zip, geo_town, contains("type"),
north, south, east, west, date, hisp)) %>%
group_by(year,
age_cat,
sex,
race_group,
super_town,
GEOID10) %>%
summarise(deaths = n()) %>%
ungroup()
town_ma_mort_bc <- ma_mort_bc %>%
group_by(super_town, year, age_cat, sex, race_group) %>%
summarise(deaths = sum(deaths, na.rm = TRUE)) %>%
ungroup()
## # A tibble: 6 × 6
## super_town year age_cat sex race_group deaths
## <chr> <dbl> <fct> <fct> <fct> <int>
## 1 abington 2013 70-74 Female Non-Hispanic White 1
## 2 abington 2014 75-79 Female Non-Hispanic White 1
## 3 abington 2016 40-44 Female Non-Hispanic White 2
## 4 abington 2016 50-54 Female Non-Hispanic White 1
## 5 abington 2016 85+ Female Non-Hispanic White 1
## 6 abington 2017 70-74 Female Non-Hispanic White 1
8.5 Approach
Now that we have our data, let’s revisit our questions of interest: 1. What is the overall socioeconomic gradient in breast cancer mortality?
What is the racialized disparity in breast cancer mortality overall?
How do area-based socioeconomic measures interact with individual-level membership in racialized groups to affect patterns of breast cancer mortality (i.e., interactions between socioeconomic position and racialized groups, not just socioeconomic inequities within racialized groups)?
8.5.1 Question 1: What is the overall socioeconomic gradient in breast cancer mortality?
First, let’s just look at the spatial distribution of the area-based socioeconomic measure of poverty that we constructed: the percentage of the city/town population below poverty. Here is a map of the percentage of population below poverty by Massachusetts city/town poverty quantile for years 2013-2017.
ma_poverty_map <- town_ma_absm_sum %>%
left_join(town_geometry, by= "super_town") %>%
ggplot() +
geom_sf(mapping = aes(geometry=geometry,
fill=pov_qt),
lwd = 0.1,
color = "black") +
scale_fill_viridis_d() +
labs(title = expression(atop("Percentage of population below poverty", "by MA city/town, 2013-2017")),
caption = "5-year ACS files from end-years 2015-2019",
fill = "Poverty quantile", x="", y="") +
theme_void() +
theme(axis.text.x=element_blank(), #remove x axis labels
axis.ticks.x=element_blank(), #remove x axis ticks
axis.text.y=element_blank(), #remove y axis labels
axis.ticks.y=element_blank(),
legend.position = c(0.25, 0.25),
legend.key.size = unit(0.4, "cm"))
ggsave("ma_poverty_map.png")
Now, let’s look at the spatial distribution of the other area-based socioeconomic measure that we constructed: the ICE measure for racialized economic segregation. Compare this map of city/town ICE (racialized group + income) quantile to the first map of city/town poverty quantile. Do you notice any differences? Do you find one of these area-based socioeconomic measures to be more visually clear and compelling than the other, and why or why not?
ma_ice_map <- town_ma_absm_sum %>%
left_join(town_geometry, by= "super_town") %>%
ggplot() +
geom_sf(mapping = aes(geometry=geometry,
fill=ICE_qt),
lwd = 0.1,
color = "black") +
scale_fill_viridis_d(option = "E") +
labs(title = expression(atop("Index of Concentration at the Extremes for Racialized Economic Segregation", "by MA city/town, 2013-2017")),
caption = "5-year ACS files from end-years 2015-2019",
fill = "ICE quantile", x="", y="") +
theme_void() +
theme(axis.text.x=element_blank(), #remove x axis labels
axis.ticks.x=element_blank(), #remove x axis ticks
axis.text.y=element_blank(), #remove y axis labels
axis.ticks.y=element_blank(),
legend.position = c(0.25, 0.25),
legend.key.size = unit(0.4, "cm"))
ggsave("ma_ice_map.png")
Next, we will look at how breast cancer mortality varies across the ICE measure for racialized economic segregation. Using the 2015-2019 American Community Survey (ACS) 5-year estimates counts, we compute age- and sex-standardized 2013-2017 breast cancer mortality rates (per 1000) for each MA city/town ICE quantile using the direct method. This method will require a reference population, which we have downloaded (and wrangled) from the National Cancer Institute. (https://seer.cancer.gov/stdpopulations/).
## # A tibble: 13 × 3
## age_cat pop wt
## <chr> <dbl> <dbl>
## 1 0-4 69135 0.0691
## 2 10-14 73032 0.0730
## 3 15-19 72169 0.0722
## 4 20-24 66478 0.0665
## 5 25-29 64529 0.0645
## 6 30-34 71044 0.0710
## 7 35-44 162613 0.163
## 8 45-54 134834 0.135
## 9 5-9 72533 0.0725
## 10 55-64 87247 0.0872
## 11 65-74 66037 0.0660
## 12 75-84 44841 0.0448
## 13 85+ 15508 0.0155
ma_mort_ice_qt <- town_ma_mort_bc %>%
inner_join(town_ma_demo_acs_bc, by = c("year","super_town","race_group","sex","age_cat")) %>%
left_join(town_ma_absm_sum, by= "super_town") %>%
filter(age_cat != "Total") %>%
group_by(ICE_qt, age_cat) %>%
summarise(num = sum(deaths, na.rm=T),
den = sum(population, na.rm=T))%>%
mutate(den = den + 0.001) %>%
left_join(seer_std, by="age_cat") %>%
mutate(rate_i = wt*num/den,
var_rate_i = (num*wt^2)/den^2) %>%
group_by(ICE_qt) %>%
summarise(num = sum(num, na.rm=T),
den = sum(den, na.rm=T),
std_rate = sum(rate_i, na.rm=T),
var_std_rate = sum(var_rate_i, na.rm=T),
sumwt = sum(wt),
sumwt2 = sum(wt^2)) %>%
mutate(std_rate = std_rate / sumwt *1000,
var_std_rate = var_std_rate / sumwt2 *1000,
std_rate_lo95 = std_rate - 1.96*sqrt(var_std_rate),
std_rate_up95 = std_rate + 1.96*sqrt(var_std_rate)) %>%
ungroup() %>%
ggplot(aes(x=ICE_qt, y=std_rate)) +
geom_bar(stat="identity", fill = "limegreen",color = "black") +
labs(title = expression(atop("Age-standardized breast cancer mortality rate by MA city/town","ICE for racialized group & income quantile, 2013-2017")),
caption = expression(atop("Source: Massachusetts Mortality Data 2013-2017",
"5-year ACS files from end-years 2015-2019"))) +
ylab("Breast cancer mortality per 1000") +
xlab("ICE quantile") +
theme(legend.position="bottom") +
theme_minimal()
ggsave("ma_mort_ice_qt.png")
We can see a clear socioeconomic gradient in breast cancer mortality rates at the city/town level in Massachusetts for years 2013-2017. How would you interpret this socioeconomic gradient? Is it what you expected, why or why not?
8.5.2 Question 2: What is the racialized disparity in breast cancer mortality overall?
We can describe inequities by racialized groups much the same way we describe inequities by ABSMs: by aggregating death and population data. We’ve already got this data aggregated, so we can stratify our analysis to compare Black non-Hispanic and White non-Hispanic populations. Note: Throughout the case study we may refer to these groups as Black and White as a shorthand (see footnote in preface regarding capitalization conventions used throughout the manual).
As we did for the first question, we age-adjust our aggregated data by racialized group to compare standardized mortality rates for Black and White populations. We can see lots of variation here as the mortality rates increase - what is causing that?
ma_mort_stratified <- town_ma_mort_bc %>%
inner_join(town_ma_demo_acs_bc, by = c("year","super_town","race_group","sex","age_cat"))%>%
filter(race_group %in% c("Non-Hispanic White","Non-Hispanic Black"),
age_cat != "Total") %>%
group_by(super_town, race_group, age_cat) %>%
summarise(num = sum(deaths, na.rm=T),
den = sum(population, na.rm=T))%>%
mutate(den = den + 0.001) %>%
left_join(seer_std, by="age_cat") %>%
mutate(rate_i = wt*num/den,
var_rate_i = (num*wt^2)/den^2) %>%
group_by(super_town, race_group) %>%
summarise(num = sum(num, na.rm=T),
den = sum(den, na.rm=T),
std_rate = sum(rate_i, na.rm=T),
var_std_rate = sum(var_rate_i, na.rm=T),
sumwt = sum(wt),
sumwt2 = sum(wt^2)) %>%
mutate(std_rate = std_rate / sumwt *1000,
var_std_rate = var_std_rate / sumwt2 *1000,
std_rate_lo95 = std_rate - 1.96*sqrt(var_std_rate),
std_rate_up95 = std_rate + 1.96*sqrt(var_std_rate)) %>%
ungroup()
bc_dotplot_agg_rates <- ma_mort_stratified %>%
filter(!is.na(std_rate),
den > 0.01) %>%
select(super_town,race_group, contains("std_rate")) %>%
arrange(std_rate) %>%
group_by(race_group) %>%
mutate(orderID = row_number()) %>%
ungroup() %>%
ggplot(aes(x=orderID, y=std_rate)) +
geom_point(color = "limegreen", alpha = 0.8, size = 1) +
geom_errorbar(aes(ymin = std_rate_lo95, ymax=std_rate_up95), size = 0.1) +
labs(title = "Age-standardized breast cancer mortality rate",
caption = expression(atop("Source: Massachusetts Mortality Data 2013-2017",
"5-year ACS files from end-years 2015-2019"))) +
xlab("City/towns") +
ylab("Mortality rates") +
facet_wrap(vars(race_group), ncol = 2, scales = "free_x") +
theme_minimal() +
theme(axis.ticks.x = element_blank(),
axis.text.x = element_blank())
ggsave("bc_dotplot_agg_rates.png")
Before we explore that variation, as a reminder, we age-adjust our aggregated data by racialized group so we can compare standardized mortality rates, as differences in breast cancer mortality between the two groups may be due to differential distribution of ages by racialized group. We can see that the age distributions for each racialized group are quite different. How would you describe the differences?
bc_tab_age_race <- town_ma_demo_acs_bc %>%
filter(race_group %in% c("Non-Hispanic White","Non-Hispanic Black"),
sex == "Female",
age_cat != "Total") %>%
group_by(super_town, race_group, sex, age_cat) %>%
summarise(pop = sum(population, na.rm=T)) %>%
inner_join(town_ma_absm_sum, by = c("super_town")) %>%
mutate(age_cat_broad = case_when(age_cat %in% c("0-4","5-9","10-14","15-19","20-24") ~ "0-24",
age_cat %in% c("25-29","30-34","35-39","35-44","40-44") ~ "25-44",
age_cat %in% c("45-49","45-54","50-54","55-59","55-64","60-64") ~ "45-64",
age_cat %in% c("65-69","65-74","70-74","75-59","75-84","80-84","85+") ~ "65+")) %>%
group_by(race_group, age_cat_broad) %>%
summarise(pop = sum(pop, na.rm=T)) %>%
group_by(race_group) %>%
mutate(percentage = pop/sum(pop)) %>%
ggplot(aes(x=race_group, y=percentage, fill= age_cat_broad)) +
geom_bar(position="stack", stat="identity") +
scale_fill_viridis_d(option = "A") +
labs(title ="Age distribution by racialized group",
fill = "Age category",
caption = "5-year ACS files from end-years 2015-2019") +
ylab("Percentage of population") +
xlab("Racialized group") +
theme(legend.position="bottom") +
theme_minimal()
ggsave("bc_tab_age_race.png")
include_graphics("images/08-breast-cancer-mortality/bc_tab_age_race.png")
Now, to return to this question of why we see lots of variation in mortality rates. As we might expect, these extremely variable mortality rates are occurring in small populations. What do these maps look like?
plotlist <- vector(mode = "list", length = 2)
names(plotlist) <- c("Non-Hispanic White","Non-Hispanic Black")
for (plt in names(plotlist)) {
map.state <- ma_mort_stratified %>%
filter(race_group == plt) %>%
mutate(std_rate = ifelse(std_rate > 10, 10, std_rate)) %>%
left_join(town_geometry, by= "super_town") %>%
ggplot() +
geom_sf(mapping = aes(geometry = geometry,
fill = std_rate),
lwd = 0.1,
color = "black") +
scale_fill_viridis(option = "B", limits = c (0, 10)) +
labs(title = plt,
fill = "Age Standardized Mortality Rate", x="", y="") +
theme_void() +
theme(axis.text.x=element_blank(), #remove x axis labels
axis.ticks.x=element_blank(), #remove x axis ticks
axis.text.y=element_blank(), #remove y axis labels
axis.ticks.y=element_blank(),
legend.position = c(0.25, 0.25),
legend.key.size = unit(0.2, "cm"))
map.boston <- ma_mort_stratified %>%
filter(race_group == plt,
super_town == "boston") %>%
mutate(std_rate = ifelse(std_rate > 10, 10, std_rate)) %>%
left_join(town_geometry, by= "super_town") %>%
ggplot() +
geom_sf(mapping = aes(geometry = geometry,
fill = std_rate),
lwd = 0) +
scale_fill_viridis(option = "B", limits = c (0, 10)) +
labs(title = "Boston") +
theme_void() +
theme(strip.text.x = element_blank(),
legend.position = "None",
plot.title = element_text(hjust = 0.5))
plotlist[[plt]] <- ggdraw() +
draw_plot(map.state , x = 0.00, y = 0.00, width = 0.80, height = 1.00) +
draw_plot(map.boston, x = 0.65, y = 0.50, width = 0.30, height = 0.30)
}
title <- ggdraw() +
draw_label("Age-standardized breast cancer mortality rates", size = 12, fontface='bold', hjust = 0.2)
caption1 <- ggdraw() +
draw_label("Source: Massachusetts Mortality Data 2013-2017", size = 8, hjust = 0.5)
caption2 <- ggdraw() +
draw_label("5-year ACS files from end-years 2015-2019", size = 8, hjust = 0.5)
plots <- plot_grid(plotlist = plotlist,
ncol = 1)
bc_byrace_std_rates <- plot_grid(title, plots, caption1, caption2,
rel_heights = c(0.05, 1.0, 0.05, 0.05),
ncol = 1)
ggsave("bc_byrace_std_rates.png")
We are seeing some really extreme rates in the Black population, where we have smaller overall sample sizes. We are also seeing many cities and towns with no recorded breast cancer deaths for both Black and White populations, but this is particularly the case for the Black population (it is likely that these are cities and towns with very small or no resident Black populations due to racialized economic segregation). What else do you observe?
We can also display these two maps as one, by calculating the rate difference, or the rate ratio. We will calculate and display the incidence rate ratio comparing the age-standardized mortality rates for the Black non-Hispanic and White non-Hispanic populations, as we visualized at the start of this section.
irr_data <- ma_mort_stratified %>%
select(super_town,race_group, std_rate, var_std_rate) %>%
pivot_wider(id_cols = super_town,
names_from = race_group,
values_from = c(std_rate, var_std_rate)) %>%
mutate(irr = ifelse(`std_rate_Non-Hispanic White` == 0, NA_real_, `std_rate_Non-Hispanic Black` / `std_rate_Non-Hispanic White`),
irr_var = `var_std_rate_Non-Hispanic Black` + `var_std_rate_Non-Hispanic White`,
irr_lo95 = irr - 1.96*sqrt(irr_var),
irr_up95 = irr + 1.96*sqrt(irr_var))
irr_plot <- irr_data %>%
arrange(irr) %>%
filter(!is.na(irr)) %>%
mutate(orderID = row_number()) %>%
ungroup() %>%
ggplot(aes(x=orderID, y=irr)) +
geom_point(color = "limegreen", alpha = 0.8, size = 1) +
geom_errorbar(aes(ymin = irr_lo95, ymax=irr_up95), size = 0.1) +
labs(title = expression(atop("Incidence Rate Ratio Comparing Non-Hispanic Black",
"and Non Hispanic White Populations")),
caption = expression(atop("Source: Massachusetts Mortality Data 2013-2017",
"5-Year ACS files from end-years 2015-2019"))) +
xlab("Census Tracts") +
ylab("Incidence Rate Ratio") +
theme_minimal()
irr_plot
We can also map this like the other maps.
map.irr.bc <- irr_data %>%
mutate(irr = case_when(irr > 4 ~ 4,
TRUE ~ irr)) %>%
left_join(town_geometry, by= "super_town") %>%
ggplot() +
geom_sf(mapping = aes(geometry = geometry,
fill = irr),
lwd = 0.1) +
scale_fill_viridis(option = "E",
trans = scales::pseudo_log_trans(sigma=0.01),
limits = exp(c(-1,1)*log(4)),
breaks=c(0.25, 0.5,1,2,4),
na.value = "white") +
labs(title = expression(atop("Incidence Rate Ratio Comparing Non-Hispanic Black",
"and Non Hispanic White Populations")),
caption = expression(atop("Source: Massachusetts Mortality Data 2013-2017",
"5-Year ACS files from end-years 2015-2019")),
fill = "IRR", x="", y="") +
theme_void() +
theme(legend.position = c(0.25, 0.25),
legend.key.size = unit(0.3, "cm"),
plot.title = element_text(hjust = 0.1),
plot.subtitle = element_text(hjust = 0.1))
map.boston.irr <- irr_data %>%
filter(super_town == "boston") %>%
mutate(irr = case_when(irr > 4 ~ 4,
irr == 0 ~ 0.0000001,
TRUE ~ irr)) %>%
left_join(town_geometry, by= "super_town") %>%
ggplot() +
geom_sf(mapping = aes(geometry = geometry,
fill = irr),
lwd = 0.1) +
scale_fill_viridis(option = "E",
trans = scales::pseudo_log_trans(sigma=0.01),
limits = exp(c(-1,1)*log(4)),
breaks=c(0.25, 0.5,1,2,4),
na.value = "white") +
labs(title = "Boston") +
theme_void() +
theme(strip.text.x = element_blank(),
legend.position = "None",
plot.title = element_text(hjust = 0.5))
ggsave("map.irr.bc.png")
Note that the boundaries for some cities and towns are completely blank. These areas did not have any recorded breast cancer deaths for Black or White women so they don’t get visualized because there is literally no data. Cities and towns that are visualized, but are white or “empty,” only have data for White women. Again, we see that only a handful of cities and towns recorded breast cancer deaths for Black women, so these cities and towns get visualized as white or “empty” because an IRR can’t be calculated. Key is understanding that these rates are influenced by the small sample sizes and large potential errors we have previously visualized. It’s for researchers and communities to interpret how “real” the effects we see are.
8.5.3 Question 3: How do area-based socioeconomic measures interact with individual-level membership in racialized groups to affect patterns of breast cancer mortality (i.e., interactions between socioeconomic position and racialized groups, not just socioeconomic inequities within racialized groups)?
poisson_data <- town_ma_mort_bc %>%
inner_join(town_ma_demo_acs_bc, by = c("year","super_town","race_group","sex","age_cat")) %>%
filter(race_group %in% c("Non-Hispanic White","Non-Hispanic Black"),
age_cat != "Total") %>%
group_by(super_town, race_group, age_cat) %>%
summarise(num = sum(deaths, na.rm=T),
den = sum(population, na.rm=T))%>%
mutate(den = den + 0.001,
race_group = factor(race_group)) %>%
inner_join(town_ma_absm_sum, by = c("super_town")) %>%
fastDummies::dummy_cols(select_columns=c("pov_cat", "ICE_qt")) %>%
rename(pov_cat_1 = "pov_cat_0-4.9%",
pov_cat_2 = "pov_cat_5-9.9%",
pov_cat_3 = "pov_cat_10-19.9%",
pov_cat_4 = "pov_cat_20-100%",
ICE_qt_1 = "ICE_qt_[-0.272,0.0426]",
ICE_qt_2 = "ICE_qt_(0.0426,0.237]" ,
ICE_qt_3 = "ICE_qt_(0.237,0.337]" ,
ICE_qt_4 = "ICE_qt_(0.337,0.45]" ,
ICE_qt_5 = "ICE_qt_(0.45,0.687]")
# Null poisson model
model0 <- glm(num ~ race_group + factor(age_cat) + offset(log(den)),
family=poisson(link=log),
data=poisson_data)
summary.model0 <- summary(model0)
saveRDS(summary.model0, file = "poisson0_bc.rds")
poisson_data$fit <- as_tibble(predict(model0)) %>%
transmute(fit = exp(value))
poisson_irr <- poisson_data %>%
group_by(super_town, race_group) %>%
summarise(num = sum(fit, na.rm=T),
den = sum(den, na.rm=T)) %>%
mutate(fit_rate = num/den * 1000) %>%
pivot_wider(id_cols = super_town,
names_from = race_group,
values_from = fit_rate) %>%
mutate(fit_irr = ifelse(`Non-Hispanic White` == 0, NA_real_, `Non-Hispanic Black` / `Non-Hispanic White`))
poisson_irr_plot <- poisson_irr %>%
ungroup() %>%
arrange(fit_irr) %>%
mutate(orderID = row_number())%>%
ggplot(aes(x=orderID, y=fit_irr)) +
geom_point(color = "limegreen", alpha = 0.8, size = 1) +
labs(title = expression(atop("Incidence Rate Ratio Comparing Non-Hispanic Black",
"and Non Hispanic White Populations")),
caption = expression(atop("Source: Massachusetts Mortality Data 2013-2017",
"5-Year ACS files from end-years 2015-2019"))) +
xlab("Census Tracts") +
ylab("IRR") +
theme_minimal()
ggsave("poisson_irr_plot_bc.png")
And of course, now we can map that value as well.
An adjusted model would allow for us to see how poverty rates are impacting the incidence rate ratio.
# Adjusted Poisson Model
model1 <- glm(num ~ race_group + factor(age_cat) + (race_group * factor(pov_cat, exclude = NULL)) + offset(log(den)),
family=poisson(link=log),
data=poisson_data)
summary.model1 <- summary(model1)
saveRDS(summary.model1, file = "poisson1_bc.rds")
poisson_data$adj_fit <- as_tibble(predict(model1)) %>%
transmute(adj_fit = exp(value))
adj_poisson_irr <- poisson_data %>%
group_by(super_town, race_group) %>%
summarise(num = sum(adj_fit, na.rm=T),
den = sum(den, na.rm=T)) %>%
mutate(adj_fit_rate = num/den * 1000) %>%
pivot_wider(id_cols = super_town,
names_from = race_group,
values_from = adj_fit_rate) %>%
mutate(adj_fit_irr = ifelse(`Non-Hispanic White` == 0, NA_real_, `Non-Hispanic Black` / `Non-Hispanic White`))
adj_poisson_irr_plot <- adj_poisson_irr %>%
ungroup() %>%
arrange(adj_fit_irr) %>%
mutate(orderID = row_number())%>%
ggplot(aes(x=orderID, y=adj_fit_irr)) +
geom_point(color = "limegreen", alpha = 0.8, size = 1) +
labs(title = expression(atop("Poverty Adjusted Incidence Rate Ratio",
"Comparing Non-Hispanic Black and Non Hispanic White Populations")),
caption = expression(atop("Source: Massachusetts Mortality Data 2013-2017",
"5-Year ACS files from end-years 2015-2019"))) +
xlab("Census Tracts") +
ylab("IRR") +
theme_minimal()
ggsave("adj_poisson_irr_plot_bc.png")
And now the map.
adj_map.state.p.irr <- adj_poisson_irr %>%
mutate(adj_fit_irr = case_when(adj_fit_irr > 4 ~ 4,
TRUE ~ adj_fit_irr)) %>%
left_join(town_geometry, by= "super_town") %>%
ggplot() +
geom_sf(mapping = aes(geometry = geometry,
fill = adj_fit_irr),
lwd = 0.1) +
scale_fill_viridis(option = "E",
trans = scales::pseudo_log_trans(sigma=0.01),
limits = exp(c(-1,1)*log(4)),
breaks=c(0.25, 0.5,1,2,4),
na.value = "white") +
labs(title = expression(atop("Poverty Adjusted Incidence Rate Ratio",
"Comparing Non-Hispanic Black and Non Hispanic White Populations")),
caption = expression(atop("Source: Massachusetts Mortality Data 2013-2017",
"5-Year ACS files from end-years 2015-2019")),
fill = "IRR", x="", y="") +
theme_void() +
theme(legend.position = c(0.25, 0.25),
legend.key.size = unit(0.3, "cm"),
plot.title = element_text(hjust = 0.1),
plot.subtitle = element_text(hjust = 0.1))
ggsave("adj_map.p.irr_bc.png")
To understand statistically how city/town poverty level is impacting the relationship between racialized groups and breast cancer mortality, we can compare the models with and without poverty. We can see the impact of social membership in racialized group is reduced with the presence of the city/town level poverty.
##
## Call:
## glm(formula = num ~ race_group + factor(age_cat) + offset(log(den)),
## family = poisson(link = log), data = poisson_data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.4188 -0.0709 0.7356 1.3605 7.0549
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -6.1048 1.0000 -6.105 1.03e-09 ***
## race_groupNon-Hispanic Black 0.3307 0.1640 2.016 0.0438 *
## factor(age_cat)25-29 -2.7375 1.0541 -2.597 0.0094 **
## factor(age_cat)30-34 -1.2184 1.0240 -1.190 0.2341
## factor(age_cat)85+ 0.2629 1.0006 0.263 0.7928
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 962.55 on 250 degrees of freedom
## Residual deviance: 649.01 on 246 degrees of freedom
## AIC: 1360.4
##
## Number of Fisher Scoring iterations: 8
##
## Call:
## glm(formula = num ~ race_group + factor(age_cat) + (race_group *
## factor(pov_cat, exclude = NULL)) + offset(log(den)), family = poisson(link = log),
## data = poisson_data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -4.5916 -0.2185 0.4568 1.2308 7.0101
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -6.00791 1.00397 -5.984 2.17e-09 ***
## race_groupNon-Hispanic Black 3.60000 0.74960 4.803 1.57e-06 ***
## factor(age_cat)25-29 -2.21154 1.05829 -2.090 0.036641 *
## factor(age_cat)30-34 -1.13313 1.02978 -1.100 0.271174
## factor(age_cat)85+ 0.42021 1.00158 0.420 0.674820
## factor(pov_cat, exclude = NULL)5-9.9% -0.09688 0.08916 -1.087 0.277233
## factor(pov_cat, exclude = NULL)10-19.9% -0.36987 0.10031 -3.687 0.000227 ***
## factor(pov_cat, exclude = NULL)20-100% -0.67203 0.11095 -6.057 1.39e-09 ***
## race_groupNon-Hispanic Black:factor(pov_cat, exclude = NULL)5-9.9% 2.64245 0.84169 3.139 0.001692 **
## race_groupNon-Hispanic Black:factor(pov_cat, exclude = NULL)10-19.9% -2.01362 0.80967 -2.487 0.012884 *
## race_groupNon-Hispanic Black:factor(pov_cat, exclude = NULL)20-100% -3.30795 0.77136 -4.288 1.80e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 962.55 on 250 degrees of freedom
## Residual deviance: 505.50 on 240 degrees of freedom
## AIC: 1228.9
##
## Number of Fisher Scoring iterations: 10
8.5.4 References
Breast Cancer Research Foundation. (2022). Black women and breast cancer: Why disparities persist and how to end them. https://www.bcrf.org/blog/black-women-and-breast-cancer-why-disparities-persist-and-how-end-them/; accessed June 14, 2022.
Friedman D, Hunter E, Parrish R (eds). (2005). Health statistics. New York: Oxford University Press.
Hetzel AM. (1997). History and organization of vital statistics systems. Bethesda, MA: National Center for Health Statistics. https://www.cdc.gov/nchs/data/misc/usvss.pdf; accessed June 14, 2022.
Krieger N. (2019). The US Census and the People’s Health: Public Health Engagement From Enslavement and “Indians Not Taxed” to Census Tracts and Health Equity (1790-2018). Am J Public Health. 2019 Aug;109(8):1092-1100. doi: 10.2105/AJPH.2019.305017. Epub 2019 Jun 20.
Krieger N, Singh, N, Waterman, PD. (2016). Metrics for monitoring cancer inequities: residential segregation, the Index of Concentration at the Extremes (ICE), and breast cancer estrogen receptor status (USA, 1992-2012). Cancer Causes Control. 2016 Sep;27(9):1139-51. doi: 10.1007/s10552-016-0793-7. Epub 2016 Aug 8.
US Office of Management and Budget. Revisions to the Standards for the Classification of Federal Data on Race and Ethnicity. Federal Register 1997; 62(210):58782-58790. https://www.govinfo.gov/content/pkg/FR-1997-10-30/pdf/97-28653.pdf; accessed June 14, 2022.
Williams DR, Collins C. (2001). Racial residential segregation: a fundamental cause of racial disparities in health. Public Health Rep. Sep-Oct 2001;116(5):404-16. doi: 10.1093/phr/116.5.404.