BEST PRACTICES - MAPPING HEALTH DATA

Working with point data

Some data are available for specific geographic locations: for example, hospital or other service addresses, or the residential postal codes of everyone with asthma in a community. GIS is used to geocode data that have address or postal code information so that they can be mapped and analyzed, but maintaining individual confidentiality is critical.

Geocoding
What is geocoding?

Geocoding is the process of linking geographic coordinates to addresses or other location information, such as postal codes, street intersections or unique identifiers, so that records can be placed on a map.

What type of information can be geocoded?

Almost any information that includes a location reference can be geocoded. In some cases this location is exact, such as the street address of a person's residence. Health data can also be geocoded using postal codes. Occasionally, data are collected using the Global Positioning System (GPS) and already provide a latitude and longitude coordinate for mapping.

What type of output do you get from geocoding data?

In most instances, geocoding produces a map of points representing the location of each record in a health data table. Each point retains all of the information originally associated with the record (e.g. health status, age, weight).

When should geocoding be performed?

Geocoding can be performed whenever you wish to map health data that are currently in tabular form, or when you need to add a spatial identifier to health data in order to perform a spatial analysis.

What is address matching?

Address geocoding, or address matching, is a multi-step process that combines spatial reference data (e.g. a street network file) with an event table (e.g. a hospital registry). It is governed by a set of rules that define how descriptors in the event table, such as street addresses, are transformed into spatial data before being displayed as features on a map. Most GIS software packages support multiple address styles that you can use to link health data to a spatial identifier. Choosing the appropriate address style depends on knowing what type of reference data you have access to, as well as the descriptors in your tabular data. For example, if the address style supports alias names (e.g. St. Paul's Hospital; Cambie Library) in addition to street addresses, it is helpful to list this information in a separate field of your event table. This helps an automatic geocoding engine match records to their correct location even when the street name is a common one (e.g. Main St.).
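The matching rules described above can be illustrated with a minimal Python sketch. It assumes two hypothetical in-memory reference tables, one keyed by alias (place) name and one keyed by street address, each mapping to placeholder (x, y) coordinates; real geocoding engines use much richer standardization and fuzzy matching. The `standardize` function and its small suffix table are illustrative only: they upper-case the text, drop punctuation and reduce suffix variants such as 'Avenue' and 'Ave.' to one form. The `match` function then tries the alias first and the street address second.

```python
import re
from typing import Optional

# Illustrative suffix table (an assumption, not a standard): full and abbreviated
# street types are reduced to one form so 'Avenue' and 'Ave.' compare equal.
SUFFIXES = {
    "AVENUE": "AVE", "AVE": "AVE",
    "STREET": "ST", "ST": "ST",
    "ROAD": "RD", "RD": "RD",
    "DRIVE": "DR", "DR": "DR",
}

Coord = tuple[float, float]


def standardize(text: str) -> str:
    """Upper-case, remove punctuation (periods, apostrophes) and unify suffixes."""
    tokens = re.sub(r"[^\w\s]", "", text.upper()).split()
    return " ".join(SUFFIXES.get(tok, tok) for tok in tokens)


def match(record: dict, alias_table: dict, address_table: dict) -> Optional[Coord]:
    """Try the alias (place name) field first, then the street address field."""
    alias = record.get("alias", "")
    if alias:
        hit = alias_table.get(standardize(alias))
        if hit is not None:
            return hit
    return address_table.get(standardize(record.get("address", "")))


# Hypothetical reference tables with placeholder coordinates; keys are standardized.
alias_table = {standardize("St. Paul's Hospital"): (10.0, 20.0)}
address_table = {standardize("100 Main Street"): (30.0, 40.0)}

# Hypothetical event table: an alias-only record, an address with different
# punctuation and suffix spelling, and an address with no match.
records = [
    {"id": 1, "alias": "St Pauls Hospital", "address": ""},
    {"id": 2, "alias": "", "address": "100 Main St."},
    {"id": 3, "alias": "", "address": "999 Nowhere Rd"},
]

matched = {r["id"]: match(r, alias_table, address_table) for r in records}
match_rate = sum(v is not None for v in matched.values()) / len(matched)
print(matched, f"match rate = {match_rate:.0%}")
```

In this example the first two records match despite differences in punctuation and suffix spelling, while the third does not, giving a match rate of two out of three.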
How accurate is geocoded data?

In GIS, accuracy refers to the degree to which the geocoded location approximates the true location of the object on the ground. Geocoding is only as accurate as the information contained in the health data table and the quality of the spatial reference data used to determine locations. If your reference data is a street network file that contains the location of every individual address along each road, health data joined to it will be located with high accuracy. However, most street reference datasets store only address ranges for the street segments between intersections. In these instances, the address location of each health record is interpolated along the segment between the intersections, so the result is an estimate rather than an exact location.
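To make the interpolation step concrete, here is a minimal Python sketch of linear address-range interpolation. It assumes a hypothetical `Segment` record holding the coordinates of the two intersections at either end of a street segment (in a projected system measured in metres) and the lowest and highest house numbers on it. Real street network files also distinguish the odd and even sides of a street, which is not modelled here. Because the method assumes houses are evenly spaced along the segment, the interpolated point can differ from the true location.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical street segment between two intersections."""
    start: tuple[float, float]  # (x, y) of the intersection at the low end of the range
    end: tuple[float, float]    # (x, y) of the intersection at the high end of the range
    low: int                    # lowest house number on the segment
    high: int                   # highest house number on the segment


def interpolate(seg: Segment, house_number: int) -> tuple[float, float]:
    """Place a house number proportionally between the segment's intersections."""
    if not seg.low <= house_number <= seg.high:
        raise ValueError("house number is outside this segment's address range")
    # Fraction of the way along the address range (0 at `start`, 1 at `end`).
    frac = 0.5 if seg.high == seg.low else (house_number - seg.low) / (seg.high - seg.low)
    x = seg.start[0] + frac * (seg.end[0] - seg.start[0])
    y = seg.start[1] + frac * (seg.end[1] - seg.start[1])
    return (x, y)


# Hypothetical block: house numbers 100-198 between two intersections 100 m apart.
block = Segment(start=(0.0, 0.0), end=(100.0, 0.0), low=100, high=198)
print(interpolate(block, 149))  # -> (50.0, 0.0)
```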
Non-spatial challenges associated with address matching are mostly due to spelling errors and case-sensitivity mismatches between the health data table and the spatial reference file. An initial review of both your health data and the spatial reference data is always recommended to maximize the match rate. Most often, poor match rates result from subtle differences between the health and spatial reference tables, such as a period after ‘St.’ or full versus abbreviated street suffixes (e.g. Avenue vs. Ave.).

Six-character postal codes are also frequently used to approximate the latitude and longitude of individual health records. The accuracy of postal code locations depends on a number of factors. In urban areas, one side of a single street block may have a unique postal code, and large apartment buildings sometimes have their own postal code. In rural areas, postal codes often represent the location of a community mailbox, which may serve a large number of addresses located at varied distances from the mailbox. Postal codes can also represent the physical location of the rural post office rather than the location where residents live. See step three in address matching for a demonstration of this concept.

Maintaining confidentiality

Maintaining an individual's confidentiality when using health data is a primary concern. This page covers location offsets, geomasking and data aggregation as ways to protect confidentiality, and also includes references to further information.

Location offsets/geomasking

Exact locations can be obscured using location offsets, in which a new location is generated by randomly displacing the real location. The offset must not be so large that spatial analysis suffers, so one approach is to carry out the analysis with the real locations and use offset data only for published maps (or not to publish point maps at all).
Another solution is to publish maps that use graduated symbols to summarize counts within polygons. There are three main techniques for geomasking:
Point aggregation - This method aggregates several closely spaced points into a single point, whose size typically represents the number of points aggregated. This removes individual points from the map while preserving trends in the data (see Figure 1).
Figure 1. On the left, points are shown at their original locations. On the right, points are aggregated and snapped to the nearest intersection.

Transformation - This method uses scaling to shift points a pre-determined distance from their original locations, or rotation to turn the point pattern around a fixed point. The two methods may also be combined (see Figure 2).
Figure 2. On the left, points are displayed at their original locations. On the right, points have been scaled and rotated to shift their locations. The red point represents the same point before and after geomasking.

Researchers should clearly state in any published map or report that points were shifted from their true locations.

Random perturbation - This method moves each point a random distance in a random direction. The maximum distance is pre-determined based on population density: the higher the population density, the smaller the distance should be (see Figure 3).
Figure 3. The red point is the true location. By assigning a random distance and angle, the point can be moved to any location within the assigned distance.

Kwan et al. (2004) (see References below) examined three types of geographic masks and their effect on the accuracy of geocoded health data. Data were masked using point pattern analysis methods, by moving points within a number of set distances, and by exposing more or less area based on the area's population density. The authors found that most masks showed an inverse relationship between the level of privacy protection and spatial accuracy. However, the findings were not consistent across techniques. This suggests that geographic masking techniques should be chosen for a specific purpose, as different techniques may affect confidentiality or accuracy in different ways.

Aggregating data

Data can be rendered unidentifiable through aggregation. In this scenario, the analysis is completed using point locations, but the results are aggregated for map publication.

Pre-aggregated data

You may only be able to obtain health data that have been pre-aggregated to health service areas or census areas, and there are a number of issues to consider when using this type of data. If this is the case, please see Working with Area Data.

References

The North American Association of Central Cancer Registries has produced a handbook of best practices on the use of GIS with cancer data, covering geocoding, confidentiality, spatial analysis and cartography.

Kwan, M.-P., I. Casas, et al. (2004). "Protection of Geoprivacy and Accuracy of Spatial Information: How Effective Are Geographical Masks?" Cartographica 39(2): 15-28.
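The three geomasking techniques described under Location offsets/geomasking (point aggregation, transformation and random perturbation) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a vetted anonymization tool. Coordinates are assumed to be projected and in metres. The regular grid in `aggregate` stands in for the nearest-intersection snapping shown in Figure 1. The rule that the maximum perturbation distance equals a constant `k` divided by population density is hypothetical, chosen only to reflect the principle that denser areas need smaller displacements. All function names and example values are illustrative.

```python
import math
import random
from collections import Counter

Point = tuple[float, float]  # projected (x, y) coordinates, assumed to be in metres


def aggregate(points: list[Point], cell: float) -> Counter:
    """Point aggregation: snap each point to the centre of a square grid cell and
    count the points per cell (the count would set the symbol size). The regular
    grid is an assumed stand-in for snapping to the nearest intersection."""
    centres = (
        (math.floor(x / cell) * cell + cell / 2, math.floor(y / cell) * cell + cell / 2)
        for x, y in points
    )
    return Counter(centres)


def transform(points: list[Point], origin: Point, scale: float,
              angle_deg: float) -> list[Point]:
    """Transformation: scale the pattern about `origin`, then rotate it by `angle_deg`."""
    a = math.radians(angle_deg)
    ox, oy = origin
    out = []
    for x, y in points:
        dx, dy = (x - ox) * scale, (y - oy) * scale
        out.append((ox + dx * math.cos(a) - dy * math.sin(a),
                    oy + dx * math.sin(a) + dy * math.cos(a)))
    return out


def perturb(points: list[Point], densities: list[float], k: float,
            rng: random.Random) -> list[Point]:
    """Random perturbation: move each point a random distance in a random direction.
    The maximum distance is k / density (a hypothetical rule), so points in denser
    areas are moved less."""
    out = []
    for (x, y), density in zip(points, densities):
        max_dist = k / density
        dist = rng.uniform(0.0, max_dist)
        angle = rng.uniform(0.0, 2 * math.pi)
        out.append((x + dist * math.cos(angle), y + dist * math.sin(angle)))
    return out


# Hypothetical example data: three points and the population density around each.
pts = [(12.0, 7.0), (14.0, 9.0), (55.0, 61.0)]
print(aggregate(pts, cell=20.0))
print(transform(pts, origin=(0.0, 0.0), scale=1.5, angle_deg=30.0))
masked = perturb(pts, densities=[2.0, 2.0, 0.5], k=100.0, rng=random.Random(42))
print(masked)
```

Consistent with the Kwan et al. (2004) findings, the parameters (`cell`, `scale`, `angle_deg`, `k`) trade spatial accuracy against privacy and should be chosen for the purpose at hand. A scale-and-rotate transformation can also be reversed by anyone who learns the origin, scale and angle, so these parameters should not be published alongside the masked data.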





