AcaStat statistical software includes this Handbook, a search-and-expand statistics glossary, and an affordable, easy-to-use analytical tool. Available on CD-ROM or instantly as a download. AcaStat Software, All Rights Reserved http://www.acastat.com
Data File Basics

There are two general sources of data. One source is called primary data: data collected specifically for a research design you developed to answer specific research questions. The other source is called secondary data: data collected by others for purposes that may or may not match your research goals. An example of a primary data source would be an employee survey you designed and implemented for your organization to evaluate job satisfaction. An example of a secondary data source would be census data or other publicly available data such as the General Social Survey.

Designing data files

The best way to envision a data file is by analogy to common spreadsheet software. In spreadsheets, you have columns and rows. For many data files, a spreadsheet provides an easy means of organizing and entering data. In a rectangular data file, columns represent variables and rows represent observations.

Variables are commonly formatted as either numerical or string. A numerical variable is used whenever you wish to manipulate the data mathematically. Examples would be age, income, temperature, and job satisfaction rating. A string variable is used whenever you wish to treat the data entries like words. Examples would be names, cities, case identifiers, and race.

Variables that could be treated as strings are often coded as numeric. As an example, data for the variable "sex" might be coded 1 for male and 2 for female instead of using a string variable that would require letters (e.g., "Male" and "Female"). This has two benefits. First, numerical entries are easier and quicker to enter. Second, statistical software generally handles numerical data much more easily than string variables.

Data file format

There are many different formats of data files. As a general rule, however, data files are either system files or text files.
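As a small illustration of the numeric coding of "sex" described under Designing data files, the Python sketch below stores the 1/2 codes and translates them back to labels only when results are displayed. The responses are made up for illustration, and the names SEX_CODES and SEX_LABELS are hypothetical; the handbook does not prescribe any particular implementation.

```python
from collections import Counter

# Coding scheme from the handbook's example: 1 = male, 2 = female.
SEX_CODES = {"Male": 1, "Female": 2}
# Reverse mapping, used only when results are displayed.
SEX_LABELS = {code: label for label, code in SEX_CODES.items()}

# Made-up responses, for illustration only.
responses = ["Female", "Male", "Female", "Female"]

# What gets entered into the data file: numbers, not words.
coded = [SEX_CODES[r] for r in responses]
print(coded)  # [2, 1, 2, 2]

# Translating the codes back to labels recovers the original entries.
decoded = [SEX_LABELS[c] for c in coded]
print(decoded == responses)  # True

# Tabulate the numeric codes, then attach labels for presentation.
counts = Counter(coded)
print({SEX_LABELS[code]: n for code, n in sorted(counts.items())})  # {'Male': 1, 'Female': 3}
```

Keeping the code-to-label mapping in one place plays the same role as recording the coding scheme in a data dictionary: the data file holds only the numbers.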
System files are created by and for specific software applications. Examples would be Microsoft Access, dBase, SAS, and SPSS. Text files contain data in ASCII format and are nearly universal, in that they can be imported into most statistical programs.

Text files

Text files can be either fixed or free formatted.

Fixed: In fixed formatted data, each variable occupies a specific set of column locations. When importing fixed formatted data into a statistical package, you must specify these column locations. The following is an example:

10123HoustonTX12Female1
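The handbook does not give the column layout for this record, so the Python sketch below assumes one purely for illustration: the variable names and positions in LAYOUT are hypothetical. It shows the essential step in reading fixed formatted data, which is recovering each variable by slicing the record at its declared start and end columns.

```python
# Hypothetical layout for the example record "10123HoustonTX12Female1".
# The handbook does not specify these positions; names and columns are assumed.
# Each entry: (variable name, first column, last column), 1-based and inclusive.
LAYOUT = [
    ("id", 1, 3),
    ("age", 4, 5),
    ("city", 6, 12),
    ("state", 13, 14),
    ("years", 15, 16),
    ("sex", 17, 22),
    ("group", 23, 23),
]


def parse_fixed(record, layout=LAYOUT):
    """Slice one fixed formatted record into named string fields."""
    fields = {}
    for name, first, last in layout:
        # Convert 1-based inclusive columns to a 0-based Python slice;
        # strip() removes any padding spaces inside a field (e.g. "Male  ").
        fields[name] = record[first - 1:last].strip()
    return fields


row = parse_fixed("10123HoustonTX12Female1")
print(row)
# {'id': '101', 'age': '23', 'city': 'Houston', 'state': 'TX',
#  'years': '12', 'sex': 'Female', 'group': '1'}
```

Nothing in the record itself marks where one variable ends, so a layout that is off by one column misreads every later variable. This is why the column locations must be documented and supplied at import time.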
Free: In free formatted data, either a space or a special character separates each variable. Common separators are tabs and commas. When importing free formatted data into a statistical package, the software assumes that a separator marks the end of the previous variable and that the next character begins another variable. The following is an example of a comma separated value data file (known as a csv file):

Data dictionary

A data dictionary defines the variables contained in a data file (and sometimes the format of the data file). The following is an example of a description of variable coding for a three-question survey.
To properly define and document a data file, you need to record the following information:
For the employee survey, the data dictionary for a comma separated data file would look like the following:
If the data for four completed surveys were entered into a spreadsheet, it would look like the following:
The data would look like the following if saved in a text file as comma separated values (note: because each record ends with a comma, the number of commas in each record equals the number of variables):

1001, 2, 5, Admin,
1002, 5, 10, MIS,
1003, 1, 23, Accounting,
1004, 4, 3, Legal,

Hint: Use the Output Viewer to open one of the data files provided with StatCalc or to practice creating your own data file. If you open one of StatCalc's data files, try adding a few observations (rows) and saving the file as "practice.csv". Then import the data into the continuous data module for analysis.
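The four comma separated records in this example can be read back with Python's standard csv module, as in the sketch below. The variable names in DICTIONARY (id, q1, q2, q3) are hypothetical placeholders, since the survey's data dictionary is not reproduced here; only the numeric-versus-string types follow from the values themselves. Because every record ends with a comma, the reader sees one extra empty field, which is dropped before the values are checked against the dictionary.

```python
import csv
import io

# The four records from the example, saved exactly as shown (trailing commas included).
RAW = """1001, 2, 5, Admin,
1002, 5, 10, MIS,
1003, 1, 23, Accounting,
1004, 4, 3, Legal,
"""

# Hypothetical data dictionary: placeholder names, with a type per variable.
# The first three variables are numeric; the last is a string.
DICTIONARY = [("id", int), ("q1", int), ("q2", int), ("q3", str)]


def read_records(text, dictionary=DICTIONARY):
    """Parse comma separated records and convert each value to its declared type."""
    records = []
    # skipinitialspace=True ignores the space that follows each comma.
    for fields in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not fields:
            continue  # csv.reader yields [] for a blank line
        # The trailing comma produces one extra empty field; drop it.
        if fields[-1] == "":
            fields = fields[:-1]
        if len(fields) != len(dictionary):
            raise ValueError(
                f"expected {len(dictionary)} values, got {len(fields)}: {fields}"
            )
        records.append(
            {name: cast(value) for (name, cast), value in zip(dictionary, fields)}
        )
    return records


records = read_records(RAW)
for record in records:
    print(record)

# Numeric variables can be manipulated mathematically, e.g. averaged.
mean_q1 = sum(r["q1"] for r in records) / len(records)
print(mean_q1)  # 3.0
```

Checking each record's field count against the dictionary catches a missing or extra separator at import time, before it silently shifts values into the wrong variables.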