Step 4: Resolution

In this step, all candidate matches from all the pockets are resolved to find the best link. A comparison outcome string, component weights and the total weight for each potential match are available to establish the best link. The use of comparison outcome strings reduces the need for manual resolution and speeds up the process of linking large files, while providing for more refined selection than using a single threshold weight.

The first rule of resolution is that if a record matches an entry in the Population Directory with perfect agreement on all of the linkage fields, that match is considered the best link. As records are linked, the record and its best match from the Population Directory are moved to a separate file, and all other potential matches for that record are removed from the working file, leaving only unresolved potential matches. The remaining records then go through a series of resolution rules, and at each step the next best links are taken. For example, the next best link might be a perfect match on the majority of the linkage fields, with one or two missing fields (such as middle name). The final resolution step may involve manual resolution by the programmer.
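
The tiered approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual resolution program: the outcome codes ("A" for agree, "M" for missing, "D" for disagree), the two rules, the field names and the use of total weight to choose between several candidates that pass the same rule are all hypothetical.

```python
from collections import defaultdict

# Hypothetical outcome codes, one character per linkage field:
# "A" = agree, "M" = missing, "D" = disagree.

def perfect_agreement(outcome):
    return bool(outcome) and set(outcome) == {"A"}

def agree_with_few_missing(outcome):
    return bool(outcome) and set(outcome) <= {"A", "M"} and outcome.count("M") <= 2

# Rules are applied from strictest to loosest (assumed ordering).
RULES = [
    ("perfect agreement", perfect_agreement),
    ("agreement with one or two missing fields", agree_with_few_missing),
]

def resolve(candidates):
    """Resolve candidate matches rule by rule.

    candidates: list of dicts with keys record_id, directory_id,
    outcome (comparison outcome string) and weight (total weight).
    Returns (linked, working): linked maps record_id to
    (directory_id, rule name); working holds unresolved candidates.
    """
    working = list(candidates)
    linked = {}
    for name, rule in RULES:
        passing = defaultdict(list)
        for c in working:
            if rule(c["outcome"]):
                passing[c["record_id"]].append(c)
        for record_id, matches in passing.items():
            # Assumption: the highest total weight wins among candidates
            # that satisfy the same rule.
            best = max(matches, key=lambda c: c["weight"])
            linked[record_id] = (best["directory_id"], name)
        # Drop all other potential matches for records that are now linked.
        working = [c for c in working if c["record_id"] not in linked]
    return linked, working
```

Whatever remains in the working list after the last rule corresponds to the records that would be left for manual resolution.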

Various statistics are documented for the linkage process. At each resolution step, the percentage of records linked by that resolution rule is recorded. Linkage rates are also calculated for the dataset as a whole, by Local Health Area, and by age and sex group. These rates are examined for potential problems or issues; for example, a low linkage rate among newborns would be compared to the rates found in previous years for the same data.
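
As a rough illustration of these statistics, the sketch below computes the share of records linked by each rule and the linkage rate for any grouping. The record layout (a "rule" field that is None for unlinked records, plus "lha", "age_group" and "sex" fields) is an assumption made for the example, not the actual file format.

```python
from collections import Counter

def rule_shares(records):
    """Percentage of all records that linked under each resolution rule."""
    counts = Counter(r["rule"] for r in records if r["rule"] is not None)
    return {rule: 100.0 * n / len(records) for rule, n in counts.items()}

def linkage_rates(records, key):
    """Percentage of records with a link, grouped by key(record)."""
    totals, linked = Counter(), Counter()
    for r in records:
        group = key(r)
        totals[group] += 1
        if r["rule"] is not None:
            linked[group] += 1
    return {group: 100.0 * linked[group] / totals[group] for group in totals}
```

For example, linkage_rates(records, key=lambda r: r["lha"]) would give the rate for each Local Health Area, and key=lambda r: (r["age_group"], r["sex"]) the rate for each age and sex group; a constant key gives the rate for the dataset as a whole.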

Once the new data set is linked to the Population Directory, the “Linkage ID”, which is the consistent, encrypted identifier that uniquely identifies each individual in the Population Directory, is placed in a new file along with the Record ID from the original content data. Records for which no link was found are assigned a missing Linkage ID. Each Linkage ID is then matched to its PopData ID, a generated number unique to each individual, and the PopData ID is applied to the content data. The content data now has PopData IDs that are consistent across all PopData holdings and so can be used as a base to link data sets across time and content areas without needing to access the personal identifiers again.
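
The flow of identifiers described above might look like the following sketch. All names here (build_crosswalk, attach_popdata_ids, the "record_id" and "popdata_id" fields, and the lookup table from Linkage ID to PopData ID) are hypothetical and chosen only for illustration; a missing Linkage ID is represented as None.

```python
def build_crosswalk(record_ids, links):
    """Pair every Record ID with its Linkage ID, or None if no link was found."""
    return {rid: links.get(rid) for rid in record_ids}

def attach_popdata_ids(content_rows, crosswalk, linkage_to_popdata):
    """Return copies of the content rows with a PopData ID added.

    Only the Record ID is needed to do this; the personal identifiers
    used during linkage are not touched.
    """
    out = []
    for row in content_rows:
        linkage_id = crosswalk.get(row["record_id"])
        if linkage_id is None:
            popdata_id = None  # unlinked record keeps a missing ID
        else:
            popdata_id = linkage_to_popdata.get(linkage_id)
        out.append({**row, "popdata_id": popdata_id})
    return out
```

Because every holding receives the same PopData ID for the same individual, two content data sets prepared this way can be joined on that field alone.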

Page last revised: November 4, 2014