DATA ACCESS - FREQUENTLY ASKED QUESTIONS

How do I load & read my data?

WARNING: Data extract files are very large and cannot be opened or viewed in text editors (Notepad, Textpad, Wordpad) or in Microsoft Excel. You will need a statistical program, such as SAS, SPSS, or Stata.

The data are provided to you as ASCII flat files, as this data format can be imported into a wide variety of software programs for analysis. In addition, researchers who receive their data on CD/DVD will find that, due to the large size of the files, we have 'zipped' (compressed) the data files.

'Unzipping' / decompressing data received on CD/DVD

- If you're running Windows XP, Windows Vista, or Mac OS X:
Your operating system has built-in support for Zip files. Select the drive where the CD (or DVD) is located. Double-click one of the Zip files, and a window will prompt you to enter a password. Once the password has been entered, you will see a list of files with the extension .dat.gz. These files are also compressed. To open them, double-click the file and select where you would like to save the extracted files.

- If you are using an older Windows operating system or an older Mac OS, you will likely need to install a special application to open Zip files, if you don't already have one installed.
Several options exist: WinZip offers an evaluation version of its decompression software, and 7-Zip and StuffIt Expander are available as free decompression software. After you have installed the decompression software of your choice, start the program. To decompress the files, select the drive where the CD (or DVD) is located. Open one of the Zip files, and a window will prompt you to enter a password. Once the password has been entered, you will see a list of files with the extension .dat.gz. These files are also compressed. To open them, double-click the file, or select the file and click Extract, then choose where you would like to save the files.

Reading / loading data received on CD/DVD

Once you have unzipped all the files (as described above) and securely saved them on your desktop computer or server, you will be able to open the files in your chosen statistical program. To load the data files into your statistical/analysis software, you first need to use the provided data dictionary layout (described below) to write the syntax code that reads the data.

Delivered with your data, you will find:

- a data dictionary layout that contains the length, start position, and end position of each field in your requested data files. The data dictionary also provides coding information that will assist in interpreting the data.
- a file called 'sizes.txt' which contains a count of the records for each year of your requested data files.
- various supplementary documents which will help you interpret the data.
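
For researchers working outside SAS, SPSS, or Stata, the way a data dictionary layout drives the loading step can be sketched in Python. This is only an illustration: the field names, positions, and sample record below are hypothetical, and the sketch assumes the start and end positions in your dictionary are 1-based and inclusive. Substitute the values from the data dictionary delivered with your extract.

```python
# Hypothetical layout entries: (field name, start position, end position).
# Positions are ASSUMED to be 1-based and inclusive; check your data dictionary.
LAYOUT = [
    ("RECORD_ID", 1, 6),
    ("YEAR", 7, 10),
    ("DIAG1", 11, 15),
]


def parse_record(line, layout=LAYOUT):
    """Slice one fixed-width record into a dict, keeping blanks inside fields."""
    line = line.rstrip("\r\n")  # remove only the line terminator, never spaces
    return {name: line[start - 1:end] for name, start, end in layout}


def read_records(path, layout=LAYOUT):
    """Yield one dict per record, reading line by line so that a very large
    file never has to fit in memory at once."""
    with open(path, "r", encoding="ascii", newline="") as fh:
        for line in fh:
            yield parse_record(line, layout)


# A made-up 15-character record, used only to show the slicing.
sample = "0000012019 V270"
record = parse_record(sample)
print(record)
```

Because each field is taken by position rather than split on white space, a blank in the first position of a field is kept as part of the value.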
Cautions when Loading the Data

- Many researchers encounter a problem with some statistical programs, in particular SAS, left-justifying the data, i.e. stripping leading white space (blanks) from the data fields. This is a particular problem with the DIAG 1-25 fields in the Hospital data. Many of the diagnosis codes are supposed to have a blank in the first position; for example, a V-code should always have a blank in the first position. If your program strips that blank, the 'V' will be shifted to the first position and you will have difficulty interpreting the codes.
- As a check, the total number of characters present (blanks included) at each character position of a field should equal your final cumulative frequency value.
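
One way to read the two cautions above is as a per-position frequency check. The Python sketch below tallies the characters found at each position of a fixed-width field. The diagnosis values are made up for illustration, and treating the "final cumulative frequency value" as the total record count is an assumption, not something the documentation states.

```python
from collections import Counter


def position_frequencies(values, width):
    """Tally the characters seen at each position of a fixed-width field."""
    counts = [Counter() for _ in range(width)]
    for value in values:
        padded = value.ljust(width)  # pad short values on the right only
        for i, ch in enumerate(padded[:width]):
            counts[i][ch] += 1
    return counts


# Made-up 5-character DIAG values, as sliced from the file with blanks intact.
diag_values = [" V270", "4280 ", " V300", "25000"]

freqs = position_frequencies(diag_values, 5)
totals = [sum(c.values()) for c in freqs]
print(totals)          # every position should total the number of records
print(freqs[0][" "])   # how many values have a blank in the first position

# The same values after a program has stripped the blanks:
stripped_freqs = position_frequencies([v.strip() for v in diag_values], 5)
print(stripped_freqs[0]["V"])  # 'V' now appears in the first position
```

If a 'V' shows up in the first position of a DIAG field, or the blank count there drops to zero, the leading blanks were most likely removed during loading.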
Explanation of File Extensions:

- *.dat.gz = zipped data file
- *.dat = ASCII data file
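
If you prefer to script the step from *.dat.gz to *.dat, the following Python sketch uses only the standard library. It assumes the password-protected Zip file from the CD/DVD has already been opened and its *.dat.gz contents copied to a local folder. The function name `gunzip` and the file name in the usage comment are hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(src, dest_dir=None):
    """Decompress name.dat.gz to name.dat, streaming the data in chunks so a
    very large file is never held in memory all at once."""
    src = Path(src)
    dest_dir = Path(dest_dir) if dest_dir else src.parent
    dest = dest_dir / src.with_suffix("").name  # drops only the trailing ".gz"
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest


# Usage (hypothetical file name):
# gunzip("hospital_2005.dat.gz")  # writes hospital_2005.dat alongside it
```

The decompressed *.dat file can be much larger than the *.dat.gz file, so make sure the destination drive has enough free space before running this.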