BEST PRACTICES - MAPPING HEALTH DATA

Working with area data

Data about individuals are often available only at an aggregated level in order to protect personal information. For example, average income levels for census tracts are readily available, but the income of an individual person in that census tract is usually not available. Similarly, the total number of people with asthma in a health service area might be known, but not each person's individual location within that area. When using area-based data, there are a number of issues to consider:
Ecological fallacy and data analysis

An ecological fallacy occurs whenever a researcher makes assumptions about individuals based on data that have been summarized for areas. For example, a researcher might examine the relationship between low income and heart disease using data for health regions across Canada. They might find a positive association, suggesting that areas with higher percentages of people with low income also have higher percentages of people with heart disease. Concluding from this that low income is a causative factor for heart disease in individuals would be an ecological fallacy. It may be that the people with heart disease are not the same people with low income in any given area, and there is no way of determining this without individual-level data. Studies that use only area-based data are called 'ecological studies' and should always be considered exploratory in nature. Case examples by Robinson (1950) and Openshaw (1984) are the best-known and most in-depth reviews of the effects of the ecological fallacy when using Census or other health registry data.

There are also studies that combine individual-level data with area-based data. These are commonly called multi-level models (MLM), and are meant to include effects at the individual level (for example, heart disease) and at the ecological level (for example, average income for the neighbourhood) for each person in the study. MLM separately analyze the variance at different levels of data (e.g. individual vs. neighbourhood) when analyzing the effects on health outcomes. Theoretically, MLM also allow the researcher to analyze at what level, or scale, variations in individual-level health outcomes are best explained. The papers by Diez-Roux (2000) and Stafford et al. (2001) contain excellent reviews of MLM and provide theoretical as well as analytical summaries of its strengths and limitations for population health research.
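To make the ecological fallacy described above concrete, the following is a minimal sketch using made-up data (the three regions, the counts, and the helper names `pearson` and `make_region` are all hypothetical and chosen only for illustration). It builds three small regions in which no individual has both low income and heart disease, then compares the correlation seen at the area level with the correlation among the individuals themselves.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def make_region(n_people, n_low_income, n_heart_disease):
    """Hypothetical region: the first people are low income, the last have heart disease.

    As long as n_low_income + n_heart_disease <= n_people, no person has both.
    """
    return [
        (int(i < n_low_income), int(i >= n_people - n_heart_disease))
        for i in range(n_people)
    ]


# Made-up data: (low_income, heart_disease) flags per person in three regions.
regions = {
    "A": make_region(10, 2, 2),
    "B": make_region(10, 4, 4),
    "C": make_region(10, 5, 5),
}

# Area-level view: proportions per region, as an ecological study would see them.
pct_low_income = [sum(p[0] for p in r) / len(r) for r in regions.values()]
pct_heart_disease = [sum(p[1] for p in r) / len(r) for r in regions.values()]
area_r = pearson(pct_low_income, pct_heart_disease)

# Individual-level view: the same people, not aggregated.
people = [p for r in regions.values() for p in r]
individual_r = pearson([p[0] for p in people], [p[1] for p in people])
both = sum(1 for low, sick in people if low and sick)

print(f"area-level correlation:       {area_r:.2f}")
print(f"individual-level correlation: {individual_r:.2f}")
print(f"people with both low income and heart disease: {both}")
```

In this constructed case the area-level association is perfectly positive, while among individuals the association is negative and nobody has both characteristics. Real data are rarely this extreme, but the sketch shows why an area-level association cannot, on its own, support a conclusion about individuals.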
In many instances, however, data restrictions or availability limit the ability to obtain individual-level data. For example, many microdata sources may not contain all of the 'causal' variables that are required for analyzing a particular health condition. In these instances, it may be necessary to link the microdata with a separate dataset that contains a population average that can be used as a surrogate indicator that is otherwise known to be related to the particular health condition. Census data are some of the most frequently used surrogate measures in health research. Case examples of how to construct proxy measures of individual socio-economic status can be found in the following papers:

When using area-based data, the following points should be considered:
For more information on multi-level modelling: http://www.paho.org/English/DD/AIS/be_v24n3-multilevel.htm

Modifiable areal unit problem

A wide range of health-related data are available only in a summarized form in order to protect individual confidentiality. Examples include Census data, health outcome rates for health services jurisdictions, and vital statistics data. Whenever individual data are summarized for areas, the statistic of interest (total count, percent of low income, and so on) depends on the area boundaries used. If different boundaries are used, even for the same individual-level data, different statistics can result. This is commonly referred to as the modifiable areal unit problem (MAUP). It means that analysis results might change depending on the area boundaries used. Analyses using area-based data may also lead to ecological fallacies. There are no solutions for MAUP, but the following approaches can be useful for minimizing or understanding MAUP effects in analyses using area-based data.
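As a small illustration of the MAUP itself, the sketch below aggregates the same made-up households under two hypothetical zoning schemes. The household flags, the zone boundaries, and the helper name `zone_percentages` are assumptions for illustration only and do not come from any real dataset.

```python
# Made-up data: 12 households along a street, 1 = low income, 0 = not.
households = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0]


def zone_percentages(flags, zones):
    """Percent low income in each zone; zones are (start, end) index pairs, end exclusive."""
    return [100 * sum(flags[start:end]) / (end - start) for start, end in zones]


# Two hypothetical zoning schemes covering exactly the same households.
scheme_1 = [(0, 4), (4, 8), (8, 12)]
scheme_2 = [(0, 2), (2, 10), (10, 12)]

pct_1 = zone_percentages(households, scheme_1)
pct_2 = zone_percentages(households, scheme_2)
overall = 100 * sum(households) / len(households)

print("scheme 1, percent low income by zone:", pct_1)
print("scheme 2, percent low income by zone:", pct_2)
print(f"overall percent low income (same for both): {overall:.1f}")
```

The underlying households never change, yet the two schemes give different zone-level percentages and even disagree on which end of the street looks most deprived. Rerunning an analysis under alternative boundaries in this way is one simple check on how sensitive a result is to the zoning used.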
Rate instability

Comparing rates of health outcomes among different areas is commonly used for disease surveillance purposes, i.e., to identify areas where disease rates are higher or lower than expected. However, incidence rates computed for areas can be highly unreliable or 'unstable', especially when calculated for sparsely populated rural or remote areas, or for rare diseases. Spatial patterns of rates may vary for a number of reasons:
When working with rate data for different areas which may be unstable due to small populations, researchers and analysts should consider the following:
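The instability of small-population rates can be illustrated with a short calculation. The sketch below uses two made-up areas with the same underlying rate and shows how much one additional case changes each rate. The case counts and populations are hypothetical, and the relative standard error shown assumes that case counts follow a Poisson distribution, which is a common simplifying assumption and not something stated in this document.

```python
from math import sqrt


def rate_per_100k(cases, population):
    """Crude rate expressed per 100,000 population."""
    return cases / population * 100_000


def relative_standard_error(cases):
    """Relative standard error of a rate, ASSUMING a Poisson-distributed count.

    Under that assumption the standard error of the count is sqrt(cases),
    so the relative standard error is 1 / sqrt(cases).
    """
    return 1 / sqrt(cases)


# Made-up areas: name -> (cases, population). Both have the same crude rate.
areas = {
    "remote area": (2, 1_500),
    "urban area": (200, 150_000),
}

for name, (cases, population) in areas.items():
    base = rate_per_100k(cases, population)
    plus_one = rate_per_100k(cases + 1, population)
    change = 100 * (plus_one - base) / base
    rse = 100 * relative_standard_error(cases)
    print(
        f"{name}: rate {base:.1f} per 100,000, "
        f"one extra case -> {plus_one:.1f} ({change:.1f}% change), "
        f"relative standard error about {rse:.0f}%"
    )
```

Both areas start from the same crude rate, but a single extra case raises the remote area's rate by half while barely moving the urban area's rate. This is why a map of raw rates can make sparsely populated areas look like extremes when the difference may be due to chance alone.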
Useful links:

Population data for BC (includes by age/sex): http://www.bcstats.gov.bc.ca/data/pop/popstart.asp

References:

Carstairs, V. and R. Morris (1989). "Deprivation and Mortality - an Alternative to Social-Class." Community Medicine 11(3): 210-219.

Diez-Roux, A. V. (2000). "Multilevel analysis in public health research." Annual Review of Public Health 21: 171-192.

Macintyre, S., S. Maciver, et al. (1993). "Area, Class and Health - Should We Be Focusing on Places or People." Journal of Social Policy 22: 213-234.

Openshaw, S. (1984). "Ecological Fallacies and the Analysis of Areal Census-Data." Environment and Planning A 16(1): 17-31.

Pampalon, R. and G. Raymond (2000). "A Deprivation Index for Health and Welfare Planning in Quebec." Chronic Diseases in Canada 21(3): 104-113.

Robinson, W. S. (1950). "Ecological Correlations and the Behavior of Individuals." American Sociological Review 15: 351-357.

Stafford, M., M. Bartley, et al. (2001). "Characteristics of individuals and characteristics of areas: investigating their influence on health in the Whitehall II study." Health & Place 7(2): 117-129.

Krieger, N., J. T. Chen, P. D. Waterman, M. J. Soobader, S. V. Subramanian and R. Carson (2002). "Geocoding and monitoring of US socioeconomic inequalities in mortality and cancer incidence: Does the choice of area-based measure and geographic level matter? The Public Health Disparities Geocoding Project." American Journal of Epidemiology 156(5): 471-482.

Nakaya, T. (2000). "An information statistical approach to the modifiable areal unit problem in incidence rate maps." Environment and Planning A 32(1): 91-109.

Soobader, M. J., F. B. LeClere, W. Hadden and B. Maury (2001). "Using aggregate geographic data to proxy individual socioeconomic status: Does size matter?" American Journal of Public Health 91(4): 632-636.

Richardson, S., A. Thomson, N. Best and P. Elliott (2004). "Interpreting posterior relative risk estimates in disease-mapping studies." Environmental Health Perspectives 112: 1016-1025.

1. Population counts are extrapolated between census years, which can significantly alter rates if the extrapolation is erroneous. For example, one census tract may have experienced significant immigration which increased observed disease rates, but the denominator (population) used to calculate risk may not have reflected this immigration because the count was extrapolated between census years.

