5 Visualizing your data

By: Christian Testa, Enjoli Hall

Data visualization is a critical component in communicating and advocating for health equity as it makes data accessible and transparent. However, it is not without its pitfalls, and in this chapter we will discuss some of the important points to consider when visualizing data for advancing health equity.

Firstly, as a matter of accessibility, we strive to use colorblind friendly color palettes when using color so that individuals with one of the different kinds of colorblindness can still interpret our visualizations. We encourage you to learn more about colorblindness and colorblind friendly palettes from a number of resources (Katsnelson, 2021) (Ou, 2021). The color vision deficiency simulator from the colorblindr package is especially helpful in testing if a visual you are creating is colorblind friendly.

If you are looking for help learning how to create data visualizations in R, we recommend checking out the online, free book: ggplot2: Elegant Graphics for Data Analysis.

If you are looking for help learning how to work with spatial data in R, we recommend the following free, online books:

Spatial Data Science with applications in R by Edzer Pebesma and Roger Bivand
https://r-spatial.org/book/
Geocomputation with R by Robin Lovelace, Jakub Nowosad, and Jannes Muenchow
https://geocompr.robinlovelace.net/
Analyzing US Census Data: Methods, Maps, and Models in R, by Kyle Walker
https://walker-data.com/census-r/
Geospatial Health Data: Modeling and Visualization with R-INLA and Shiny by Paula Moraga. https://www.paulamoraga.com/book-geospatial/index.html.

One of the main points we urge caution around with respect to the visualization and mapping of area based health outcome data are the presentation of rates which are unstable due to small population sizes. As a broader principle, we emphasize the need for careful choice of the area level at which results are presented. In part, this is because it is well established that changing the areal units into which data are aggregated and analyzed can change the relationships observed in the data. This is known as the Modifiable Areal Unit Problem, which can be read about in (Wong 2004) and (Buzzelli 2020). Of note, choice of area should not be arbitrary, but should be guided by a priori reasons, including which areas make the most sense to use to answer which questions. For example, analyses of population-wide health inequities within the entire US may wish to use census tract or county level data, whereas analyses more specifically focused on the political geography of health inequities may wish to use areas with political boundaries, e.g., state or congressional legislative districts (Keena et al, 2021; Krieger, 2019; Krieger et al, 2022).

For example at small area levels like the Census tract level, the underlying population may be so small that the health outcomes observed are quite noisy because a single additional case represents a large shift in the rate. In such a situation, it it could potentially be erroneous to infer that the Census tract with the highest observed rate has the greatest underlying risk because it could be an artifact of noise. This motivates the need for spatial smoothing, and we encourage the use of spatially smoothed estimates for choropleth maps showing health outcome rates to avoid such a potential pitfall. As an introduction to these pitfalls, we recommend the Pitfalls to avoid chapter from (Gimond 2022).

It is worthwhile to remark on how different areal units can either center or marginalize social groups based on the geographic boundaries employed. It is also crucial to be clear about which social groups are excluded or inaccurately represented in the area-based data, including but not limited to people who are unhoused, incarcerated, otherwise institutionalized or experiencing other kinds of marginalization. Warranting scrutiny are the protocols employed by the data holders to assign addresses and/or georeferenced codes to the records of persons who are not living in non-institutional residences, as well as who is counted towards the population totals of the specified geographic areas.As an example, one lesson we have learned from analyzing Boston area mortality data is that it can be quite useful to know the locations of and areas containing homeless shelters, because we have found that individuals designated as experiencing homelessness at the time of their death have had their place of death in the residential field of the death certificate listed as the address of the homeless shelter. The net impact can be to inflate the mortality rate of the census tracts in which these shelters are located.

Oftentimes georeferenced and geocoded data will have anomalous or idiosyncratic features such as areas where the rate appears to be infinite because population estimates are zero for that area despite having observed health outcomes, especially if the data are for small areas or particular sub-populations. Although there is not a single solution to these problems, a key starting place is to examine the data critically to understand the socially-produced data protocols and social distribution of the populations at issue, so as to be on the lookout for these very real problems that arise from and manifest social spatial inequities. We accordingly encourage data analysts and visualizers to consider carefully the impacts the choice of areas, their boundaries, the processes by which the locations of people are assigned to these areas (whether as “numerators” or “denominators”), and the power relations affecting who is likely to be geocoded to these areas – and who is missed – whether as a “case” or member of the population from which the cases arise (i.e., “denominator”).

To start to develop a wider perspective on how data visualization and mapping can be used, we list below a range of recommended books and articles. Because data visualization and mapping are powerful tools to communicate (or miscommunicate) data, it is critical to be aware of how they can reflect bias and tell lies (Deluca and Nelson, 2017), (Fleckenstein 1991), (Monmonier 2021), as well as reveal powerful truths (Koch 2017). Tom Koch outlines the history of disease mapping in his book Disease Maps, Epidemics on the Ground (2011), and in his 2017 follow-up book, Ethics in Everyday Places: Mapping Moral Stress, Distress, and Injury, he takes a broader perspective.

5.1 Health Equity

To understand the distribution of health and disease in place, it is necessary to collect, analyze, and visualize health data and area-based social metrics. It is equally important, however, that when documenting and mapping health inequities, we contextualize such data with adequate analysis of social, political, and ecological context. Shared observations of disparities in health do not necessarily translate to common understandings of cause, especially when population patterns of disease and health mirror population distributions of deprivation and privilege. Mapping and visualizing the uneven social and spatial distribution of health and disease in areas – which can be a particularly effective way of communicating health inequities to decision makers – without offering some explanatory context of the allocation of resources and hazards in these areas can perpetuate harmful ideas and actions that actually undermine the goal of eliminating health inequities.

With the increasing availability of local health data such as the CDC PLACES program, as well as advancements in the availability and accessibility of geocoding services, there is ample opportunity to disaggregate health data, particularly to the neighborhood (census tract) level. Geographic disaggregation allows for more fine-grained analyses, including multilevel spatial modeling, which can inform more “targeted” interventions. But when presented by themselves with no explanatory context, such granular data can create or reinforce what sociologist Loïc Wacquant refers to as “territorial stigmatization,” whereby the characteristics or features of a place are associated with the moral character and behavior of its residents, or vice versa—especially for people and places who are already politically, economically, and socially marginalized and/or materially deprived of important resources (Chowkwanyun and Reed 2020). For example, if some places are found to have a high concentration of illness or disease, narratives and representations of those places as “diseased,” “contaminated,” could produce or reinforce existing stigma and lead to targeted interventions such as heightened policing and surveillance in an attempt to contain and control residents, reclamation or demolition of physical structures, and social neglect and abandonment (There are too many historical case studies of this, for these potential risks to be downplayed or ignored: Craddock 2004; Molina 2006; Roberts 2009; Lopez 2009; Krupar and Ehlers 2017).

There are various approaches to countering territorial stigmatization. Mapping place-based risks and resource deficits that might help explain the spatial distribution of disease, illness, and injury along racial and socioeconomic lines can focus public and policy attention on shifting the context for health rather than individual behaviors and attitudes. For example, in the case of Covid-19, this could look like mapping and visualizing the uneven geographic distribution of preventive health care facilities or the concentration of respiratory hazards in areas of racialized concentrated poverty. Furthermore, one could map historical and political variables such as historical redlining that can offer important insight into how and why the social and spatial patterning of life-enhancing and harmful resources exists in an area as a result of racist policies and practices (Rothstein 2017; Mapping Inequality 2022; Krieger et al 2020a, 2020b; Wright et al. 2022). Additionally, asset mapping can provide helpful information about the strengths and resources of a community to facilitate discussion and action around building on these assets to address community needs and improve community health.

Analyzing and visualizing patterns of White wealth and health is also important to understanding and addressing patterns of population health and health inequities. Mapping and visualizing “racially concentrated areas of affluence” can help move research and policy attention away from a predominant concern for racially concentrated areas of poverty and toward a more holistic consideration of the full range of health outcomes, resources, and hazards in an area (Goetz et al. 2019). A focus on racially concentrated areas of affluence underscores the reality that structural racism produces both racialized concentrations of poverty (and hazards) and racialized concentrations of wealth (and health-enabling resources). Other measures such as the Index of Concentration of the Extremes (ICE), which quantifies the distribution of persons at the extremes of relationships of privilege and deprivation, also bring the full population and power relations into view, and can be scaled for use at multiple levels of geography (e.g., census block, census block group, census tract, city/town, county, etc.) (Massey 2001; Krieger et al. 2015, 2016, 2017, 2018). Initially developed to measure spatial polarization in economic terms (i.e., economic residential segregation), in public health studies we have extended its use to include novel measures of racialized residential segregation and racialized economic segregation (Krieger et al. 2015, 2016, 2017, 2018).

In summary, addressing health inequities requires a relational understanding of how systems of power and resource allocation simultaneously produce poor health for some and good health for others. This approach may require analyzing and visualizing patterns in population health and health inequities at large geographic scales such as counties and regions, rather than at the city level for example, to capture a wider range of values for health outcome data and area-based social metrics.

REFERENCES

Buzzelli M. (2020) ‘Modifiable Areal Unit Problem’, International Encyclopedia of Human Geography, pp. 169–173. doi:10.1016/B978-0-08-102295-5.10406-8.

Chowkwanyun M and Reed Jr AL. (2020). Racial health disparities and Covid-19—Caution and context. New England Journal of Medicine, 383(3), 201-203. doi:10.1056/NEJMp2012910

Craddock, S. (2004). City of Plagues: Disease, Poverty, and Deviance in San Francisco. University of Minnesota Press.

Deluca E. and Nelson S. (2017) ‘Lying With Maps’. Available at: https://open.lib.umn.edu/mapping/chapter/7-lying-with-maps/ (Accessed: 7 June 2022).

Dorling D, Fairbairn D (1997). Mapping: Ways of Representing the World. Old Tappen, UK: Routledge.

Fleckenstein L. (1991) ‘How Maps Lie’, Syracuse University Magazine, December. Available at: https://surface.syr.edu/cgi/viewcontent.cgi?article=1245&context=sumagazine.

Gimond M. (2022) Intro to GIS and Spatial Analysis. Available at: https://mgimond.github.io/Spatial/index.html (Accessed: 7 June 2022).

Goetz EG, Damiano A, and Williams RA. Racially concentrated areas of affluence: A preliminary investigation. Cityscape, 21(1), 99-123.

Katsnelson, A. (2021) ‘Colour me better: fixing figures for colour blindness’, Nature, 598(7879), pp. 224–225. doi:10.1038/d41586-021-02696-z.

Keena A, Latner M, McGann AJM, Smith CA. Gerrymandering the States: Partisanship, Race, and the Transformation of American Federalism. Cambridge, UK: Cambridge University Press, 2021.

Koch T. (2011) Disease Maps: Epidemics on the Ground. Chicago, IL: University of Chicago Press. Available at: https://press.uchicago.edu/ucp/books/book/chicago/D/bo8490164.html (Accessed: 7 June 2022).

Koch T. (2017) Ethics in Everyday Places: Mapping Moral Stress, Distress, and Injury | Esri Press (2017). Available at: https://www.esri.com/en-us/esri-press/browse/ethics-in-everyday-places-mapping-moral-stress-distress-and-injury (Accessed: 7 June 2022).

Krieger N, Van Wye G, Huynh M, Waterman PD, Maduro G, Li W, Gwynn C, Barbot O, Bassett MT. Historical redlining, structural racism, and preterm birth risk in New York City (2013-2017). Am J Public Health 2020; 110(7):1046-1053.

Krieger N, Wright E, Chen JT, Waterman PD, Huntley ER, Arcaya M. Cancer stage at diagnosis, historical redlining, and current neighborhood characteristics: breast, cervical, lung, and colorectal cancer, Massachusetts, 2001-2015. Am J Epidemiol 2020; 189(10):1065-1075.

Krieger N, Waterman PD, Spasojevic J, Li W, Maduro G, Van Wye G. Public health monitoring of privilege and deprivation using the Index of Concentration at the Extremes (ICE). Am J Public Health 2016; 106: 256-253

Krieger N, Waterman PD, Gryparis A, Coull BA. Black carbon exposure, socioeconomic and racial/ethnic spatial polarization, and the Index of Concentration at the Extremes (ICE). Health & Place 2015; 34:215-228.

Krieger N, Feldman JM, Waterman PD, Chen JT, Coull BA, Hemenway D. Local residential segregation matters: stronger association of census tract compared to conventional city-level measures with fatal and non-fatal assaults (total and firearm related), using the Index of Concentration at the Extremes (ICE) for racial, economic, and racialized economic segregation, Massachusetts (US), 1995-2010. J Urban Health 2017; 94:244-258.

Krieger N, Kim R, Feldman J, Waterman PD. Using the Index of Concentration at the Extremes at multiple geographic levels to monitor health inequities in an era of growing spatial social polarization: Massachusetts, USA (2010-2014). Int J Epidemiol 2018; 47:788-819.

Krieger N. The US Census and the people’s health: Public health engagement from enslavement and “Indians Not Taxed” to census tracts and health equity (1790-2018). Am J Public Health 2019; 109(8):1092-1100.

Krieger N, Testa C, Chen JT, Hanage WP, McGregor AJ. Relationship of political ideology of US federal and state elected officials and key COVID pandemic outcomes during the vaccine era: April 2021-March 2022. Lancet – Regional Health Americas (in press).

Krupar S, & Ehlers N. (2017). Biofutures: Race and the governance of health. Environment and Planning D: Society and Space, 35(2), 222-240. doi:10.1177/0263775816654475

Lopez RP. (2009). Public health, the APHA, and urban renewal. American Journal of Public Health, 99(9), 1603-1611. doi:10.2105/AJPH.2008.150136

Lovelace R, Nowosad J, and Muenchow J. (2022) Geocomputation with R. Available at: https://geocompr.robinlovelace.net/ (Accessed: 7 June 2022).

Molina N. (2006). Fit to be Citizens? Public Health and Race in Los Angeles, 1879-1939. University of California Press.

Monmonier M. (2018) How to Lie with Maps, Third Edition. Chicago, IL: University of Chicago Press. Available at: https://press.uchicago.edu/ucp/books/book/chicago/H/bo27400568.html (Accessed: 7 June 2022).

Nelson R, Winling L, Connolly NDB, Madron J, Marciano, R. Mapping Inequality: Redlining in New Deal America. https://dsl.richmond.edu/panorama/redlining/#loc=5/39.1/-94.58&text=about ; accessed June 17, 2022.

Ou, J. (2021) Safe colorsets. Available at: https://cran.r-project.org/web/packages/colorBlindness/vignettes/colorBlindness.html (Accessed: 7 June 2022).

Roberts Jr SK. (2009). Infectious Fear: Politics, disease, and the Health Effects of Segregation. University of North Carolina Press.

Rothstein R. (2017). The Color of Law: A Forgotten History of How Our Government Segregated America. New York: Liveright Publishing Co., W.W. Norton & Co.

Tufte D. (2001). The Visual Display of Quantitative Information. 2nd ed. Cheshire, CT: Graphics Press.

Tufte ER. (2020). Seeing with Fresh Eyes: Meaning, Space, Data, Truth. Cheshire, CT: Graphics Press.

Wong D. (2004) ‘The modifiable areal unit problem (MAUP)’, in WorldMinds: Geographical Perspectives on 100 Problems. Springer Netherlands. Available at: https://link.springer.com/chapter/10.1007/978-1-4020-2352-1_93.

Wright E, Waterman PD, Testa C, Chen JT, Krieger N. Breast Cancer Incidence, Hormone Receptor Status, Historical Redlining, and Current Neighborhood Characteristics in Massachusetts, 2005-2015. JNCI Cancer Spectrum 2022; 6(2); https://doi.org/10.1093/jncics/pkac016; epub on-line: Feb 18, 2022.

4 Getting your data

6 Analyzing your data