Thursday, 08 November 2012


CHAPTER 9
Secondary Data Analysis: Finding and Analyzing
Existing Data

In this chapter you will learn
1.    Strategies for identifying, accessing, and evaluating quality of secondary data
2.    Advantages and disadvantages of secondary data analysis
3.    The general content of major U.S. Census Bureau population surveys and vital records
4.    Procedures used by the Census Bureau for selecting and training interviewers, assuring data quality, and determining question content

                 In our eagerness to get a study underway, we may overlook the possibility that appropriate data may already be available. Similarly, we may avoid examining problems that have data needs that exceed our data-gathering capacity. Secondary data can be inexpensive, high-quality data adequate to define or solve a problem.

                 Secondary data are existing data that investigators collected for a purpose other than the given research study. Secondary data may result from the research efforts of an individual researcher, a research team, an agency division, or a research organization. The data may have been collected for a specific study, for a management system, or to maintain a database.
                
                 The individual researcher or research team may have gathered, compiled, and analyzed the data and written the report. Typically, researchers collect more data than they analyze. They ask questions that turn out to be unreliable or not operationally valid. Investigators drop questions from the analysis because they complicate the model, they do not improve the solution, or they are simply overlooked. In writing a questionnaire the investigators may act as if information is free. They add questions that they might want to analyze later, but they run out of time and their interests shift.

                 Organizations collect and store data for many purposes. Managers consult data to monitor spending, personnel activity, resources acquired and spent, and productivity. Managers review specific pieces of data, such as performance indicators, at regular intervals as part of their management responsibilities. They examine data to make routine decisions, for example, whether the budget allows a certain expenditure, or how to charge employee leave.

                 They rely on existing data to estimate demand for services and the resources needed to meet the demand. Managers depend on data to track and react to the performance of agency subdivisions. No matter why data are collected and stored, they may be retrieved and combined to answer a range of questions beyond those originally asked.

                 Organizations, including the U.S. Census Bureau, state offices of vital statistics, public opinion polling firms, and university research groups, such as the Inter-university Consortium for Political and Social Research, exist to collect, compile, and interpret data. Professional associations, such as the International City Management Association (ICMA), and public interest groups collect and publish survey data on topics of interest to their members; some of the surveys are conducted at regular intervals. Investigators with differing backgrounds, needs, and questions consult and use such data regularly.

                 In this chapter we consider how you identify appropriate databases, gain access to them, and evaluate their quality and applicability to a study. The discussion on finding appropriate databases leads to a consideration of the advantages and disadvantages of secondary data analysis. Next, we introduce you to a few important data sources. The chapter discusses in some detail three population surveys that the U.S. Census Bureau conducts periodically, and it briefly considers vital statistics. In conjunction with our discussion of census surveys we point out Bureau procedures that affect data quality.

FINDING AND EVALUATING SECONDARY DATA

Finding Secondary Data
If you want to find out whether data exist to meet your needs, where do you begin? The most obvious resources are the American Statistical Index (ASI) and the Statistical Reference Index (SRI). The ASI indexes federal data, and the SRI indexes nonfederal data. The ASI and SRI will lead you immediately to specific data, if they are available. The referenced works should indicate where the data came from, alerting you to an existing database that you may wish to examine. Some works cite privately held data, that is, data owned by the researcher or a private organization. You may or may not have access to these data, depending on the willingness of the researcher or her sponsors to make them available or to perform requested analyses for you. Even your access to public data may be limited by confidentiality guarantees to the respondents.

       To learn which, if any, database has the information you need may take creative and persistent searching. Some data are never reported in publicly available works; thus, you may be ignorant of an appropriate database. Nor can you assume that either ASI or SRI has identified all variables in a database. Appendix A contains recent information on large, publicly available databases on political and social characteristics. The appendix lists directories of databases, and catalogues of major government and academic databases. You may infer that a database contains variables of interest from the description of its sample, purpose, or general content, or from the identity of the agency sponsor(s).

       Some databases are located almost by luck. Agencies may store appropriate data in automated information systems. Throughout agencies and universities individuals hold data. Administrators or analysts may conduct a survey, analyze the data, and write up the findings. The data may have been entered into a computer file with little or no formal documentation of their existence, or paper questionnaires may be stored on an office shelf. Inventories of agency statistics and databases may reduce the waste of underanalyzed data and redundant studies; however, the cost of documenting, storing, and protecting confidentiality may offset the benefits.

Database Access

       Identifying a database is only half the battle. The researcher must determine whether he can access it and he must review its documentation. Poor documentation or the inability to access a database may eliminate it from consideration.

       For some research questions investigators can confine themselves to aggregated and published data. Such questions tend to define states, counties, towns, or institutions as the units of analysis. Example 9.1 shows how a researcher compiled information from four published sources to study factors


Example 9.1    Combining Aggregated Databases

Situation: Academic researchers wanted to identify factors associated with work stoppages from strikes, walkouts, and lockouts. They collected data for one year (1973) measuring:
number of public employee work stoppages
number of public employees involved
number of public employee-days lost

Strategy:
1.    Identify sets of variables measuring: state social and economic characteristics; local government structure and finance; local government work force characteristics; state policies regarding local government collective bargaining.
2.    Identify data sources measuring variables.
3.    Conduct analysis: use factor analysis to combine variables into summary measures (see Chapter 10); covary summary measures with numbers of work stoppages, of employees participating in work stoppages, and of employee-days idle from work stoppages.

Unit of analysis:   States of the United States.
Sample:   All 50 states.
Variables measuring state social and economic characteristics:
1.    Percent state population urban, 1970.
2.    Percent of nonagricultural work force that has union membership, 1972.
3.    Right to Work Law.
4.    Percent state population below low-income level, 1969.
5.    State per capita income, 1973.
6.    State median family income, 1969.
7.    Percent state population black, 1970.
8.    Percent state population employed in nonagricultural establishments, 1973.
9.    Employee-days idle/million nonagricultural employees, 1972.
10.  State population density, 1970.

Data sources for all variables in study:
1.    Council of State Governments, The Book of the States (Lexington: Council of State Governments).
2.    U.S. Dept. of Labor, Summary of State Policy Regulations for Public Sector Labor Relations (Washington, D.C.: U.S. Government Printing Office, 1973).
3.    U.S. Census Bureau, Public Employment in 1972 (Washington, D.C.: U.S. Government Printing Office, 1973).
4.    Statistical Abstract of the United States (Washington, D.C.: U.S. Government Printing Office).

Illustration of data compilation procedures for Alabama:
1.    From Summary of State Policy Regulations, determine whether Alabama has right-to-work law; code "yes" or "no."
2.    From Public Employment in 1972, copy data for Alabama on: percent nonagricultural work force that has union membership, employee-days idle/million nonagricultural employees.
3.    Copy Alabama data on other variables from specified data sources.

Discussion: First, the actual study collected data on 40 variables. Second, all data were collected or aggregated at the state level. A preferable unit of analysis would have been local governments. The researchers did not sample local governments because statistics on some variables of interest were not published for local governments. Third, all the dependent variable data were for 1973 and the independent variable data were for 1973 or earlier. Earlier data were the most recent available statistics and were assumed to be the best estimate of 1973 values.

SOURCE: J. L. Perry and L. J. Berkes, "Predicting Local Government Strike Activity," WESTERN POLITICAL QUARTERLY, Vol. 30, No. 4 (1977).

       associated with public employee work stoppages. The researchers studied variations in work stoppages among the states, because local government statistics were not published at the time the research was conducted.
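The compilation step in Example 9.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the variable names, the helper compile_state_file, and every value shown are hypothetical placeholders, not figures from the published sources. The sketch keeps one table per source, keyed by state, and combines them into one record per state, leaving a visible gap where a source has no entry.

```python
# Hypothetical illustration: every value below is a made-up placeholder,
# not a figure copied from the published sources.

# One small table per published source, keyed by state name.
right_to_work = {"Alabama": "yes", "Alaska": "no"}
union_membership_pct = {"Alabama": 20.0, "Alaska": 25.0}
work_stoppages = {"Alabama": 3}  # Alaska is left out on purpose


def compile_state_file(sources):
    """Combine per-source tables into one record per state.

    `sources` maps a variable name to a {state: value} table.
    A state missing from a source gets None, so gaps stay visible.
    """
    states = sorted(set().union(*(table.keys() for table in sources.values())))
    return {
        state: {name: table.get(state) for name, table in sources.items()}
        for state in states
    }


state_file = compile_state_file({
    "right_to_work": right_to_work,
    "union_membership_pct": union_membership_pct,
    "work_stoppages": work_stoppages,
})
```

Keeping the missing value as None, rather than dropping the state, makes it easy to see which published source still has to be consulted before analysis begins.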

       Compiling and copying aggregated data can be tedious. For some surveys, such as the Decennial Census, summary machine-readable tapes are available. Nevertheless, a researcher who relies on aggregated and summarized data is limited in the analysis he can perform. For maximum flexibility investigators require access to individual records for their analysis. They may get access in one of three ways:
1.    Through extracted files, containing a portion of the data
       a. a sample of cases, or
       b. a subset of variables on all cases
2.    Through direct access to the database
       a. purchase of the computer tape(s)
       b. a direct link allowing the investigator to access the database
3.    Through an agreement for the database holder to perform the requested manipulations

       Of course, the investigator has no guarantee that he can access the database at all, especially if it is held by a nonpublic agency. Furthermore, confidentiality considerations may limit his access to the database or his ability to perform certain tabulations.

                 Accessing census data is relatively easy. The Census Bureau has a Data Users Division at its headquarters and in its regional offices. A user can buy necessary tapes from the division or he can obtain a list of private vendors who analyze census data. Alternatively, a user may contact his state data center to learn how to access tapes, to get copies of tabulations from tapes, to have special tabulations performed, or to have questions about the data answered. University faculty and research centers may help in accessing and manipulating census data. Planning schools and sociology departments, which teach demography, often have faculty who regularly use census data.
                 Agency policy and your affiliation with the agency affect whether you can access data stored on an agency information system. Investigator goodwill largely determines whether you can access individually held data; however, contractual agreements may either guarantee or prohibit public access.
                 Accessing data involves more than locating the database and receiving authorization to use it. The documentation accompanying the database affects the ability of a researcher to access it, manipulate it, and interpret the results. The documentation should supply basic information, such as the computer file name, how the data are organized and structured, and the number of records.
                 After a database arrives at a research site its content and scaling should be verified. To verify a database's content an analyst confirms that: (1) the number of records on the computer tape conforms to the number indicated in the documentation; (2) variables listed in the documentation are on the tape; (3) summary statistics reported in the documentation can be replicated from the tape. Any discrepancies should be resolved. The wrong database could have been sent. Furthermore, the possibility that errors can be traced and resolved decreases as time passes and people involved in the collection, documentation, and compilation of the original database can no longer be reached.
                 Next, the scaling should be verified. This may have been done as part of the content verification, but if not, the researcher should review the variables of interest or a sample of variables. He should check that codes appearing on the computer printout conform to the codes found in the documentation. For example, if the documentation indicates that males are coded as "1," females as "2," and missing as "9," no other codes should appear on the printout. Also, the frequency of each code should conform to the frequency found in the documentation. For example, if the documentation states that 49% of the sample was male, 50% female, and 1% unknown or missing, the printout should report the same percentages for each category. We have found that discrepancies occur because of errors in transferring the data into the computer file, or typographical errors in the documentation. Typographical errors may be especially serious since an analyst may be manipulating the wrong variable or assuming the wrong values for a variable.
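The content and scaling checks described above lend themselves to a short script. The following is a minimal sketch, assuming the file has already been read into a list of records and the documentation has been summarized by hand in a small dictionary; the function verify_file and the layout of the codebook dictionary are hypothetical, chosen only for illustration. It checks the record count, the presence of each documented variable, the absence of undocumented codes, and the agreement of code frequencies with the documented percentages.

```python
from collections import Counter


def verify_file(records, documented):
    """Return a list of discrepancies between a data file and its codebook.

    `records` is a list of dicts (one per case); `documented` is a
    hypothetical codebook summary with the keys used below.
    """
    problems = []
    total = max(len(records), 1)  # avoid dividing by zero on an empty file

    # Content check: the number of records matches the documentation.
    if len(records) != documented["n_records"]:
        problems.append(f"record count {len(records)} != {documented['n_records']}")

    for name, spec in documented["variables"].items():
        # Content check: every documented variable is on every record.
        if any(name not in rec for rec in records):
            problems.append(f"variable {name!r} missing from some records")
            continue

        counts = Counter(rec[name] for rec in records)

        # Scaling check: no undocumented codes appear.
        bad = sorted(set(counts) - set(spec["percent"]))
        if bad:
            problems.append(f"{name!r} has undocumented codes {bad}")

        # Frequency check: each code's share matches the documented percent.
        for code, expected in spec["percent"].items():
            observed = 100 * counts.get(code, 0) / total
            if abs(observed - expected) > documented["tolerance"]:
                problems.append(
                    f"{name!r} code {code}: {observed:.1f}% vs documented {expected}%"
                )
    return problems


# Codebook following the illustration in the text: 1 = male, 2 = female, 9 = missing.
codebook = {
    "n_records": 100,
    "tolerance": 0.5,  # allowed difference in percentage points (an assumption)
    "variables": {"sex": {"percent": {1: 49, 2: 50, 9: 1}}},
}
clean_file = [{"sex": 1}] * 49 + [{"sex": 2}] * 50 + [{"sex": 9}]
flawed_file = [{"sex": 1}] * 49 + [{"sex": 2}] * 50 + [{"sex": 3}]
```

Calling verify_file(clean_file, codebook) returns an empty list, while the flawed file is flagged both for the undocumented code 3 and for the shortfall in code 9. The tolerance is a judgment call; documentation that reports rounded percentages needs some allowance.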

                 If an analyst is unfamiliar with the database, unsure about the quality of the original survey or the accuracy of the documentation, he may wish to verify the accuracy of the sample. The analyst may apply techniques to answer the question, "Could the data on this tape have been produced by the methodology described in the documentation?" The analyst may be especially concerned with checking out the nature of nonresponses and missing data. Sample verification may be especially important when databases are being merged and the study will be used to estimate parameters. In merging databases the number of cases may be reduced, altering the representativeness of the original samples. The techniques for sample verification can become quite complex. The reader interested in learning more about how to verify a sample should consult the article by Fortune and McBee1 and the references they cite.

Evaluating the Database

Quantitative researchers are continually reminded about their vulnerability to having the "tail wag the dog." They may unconsciously define a problem so that they can solve it using their methodological skills. Similarly, they may radically alter a problem to make it conform to available secondary data. In working with secondary data some shifts in the original research question will be necessary. Nevertheless, a researcher should determine the impact of the shift and whether the change will undermine his ability to accomplish the study's purpose.

       In working with secondary data a researcher must first remember that the secondary data can be no better than the research that produced them. Analytical sophistication cannot compensate for poorly conceived measures and sloppy data collection. Investigators may be wary of working with data accompanied by haphazard documentation. To decide whether the secondary data meet his needs the researcher needs to know the following:
1.    What constituted the sample?
       a. What was the population?
       b. What was the sampling frame?
       c. What sampling strategy was used?
       d. What was the response rate?
2.    When were the data collected?
3.    How were the data collected?
4.    How were the data coded and edited?
5.    What were the operational definitions of measures?

       Information on the sample population lets the researcher know whether the data represent the population of interest. If the population of interest and the database's population do not coincide, the researcher must decide the impact of the difference. In the research on work stoppages the researchers decided to settle on available state data, rather than collect local data; they reasoned that state data would uncover patterns of association that could be followed up on later.

       The sampling frame, sampling strategy, and response rate all affect the quality of the sample. Furthermore, a low response rate may render the data inadequate for the researcher's purpose. He may also decide that a low response rate indicates low research quality.

       Knowing when the data were collected can be extremely important in correctly interpreting some study findings. Consider school data. Achievement scores gathered on 4th graders in October should be interpreted differently from achievement scores gathered in May. Time can also affect public opinion data. If you needed to examine the public's attitude toward space exploration, you would want to know whether the data were gathered before the Challenger explosion, shortly after the explosion, or well afterward.

       Information on how the data were collected helps an investigator make inferences about data quality. He wants to know if respondents were surveyed by mail, telephone, or in person, and how interviewers were trained and supervised. Information on data coding and editing procedures also relates to data quality. Specifically, the investigator wants to know whether someone checked for errors in coding data and transferring them into a computer. He also wants to learn how miscellaneous responses and atypical answers were handled. In evaluating the information the investigator uses his own judgment to decide whether the evidence suggests sound research procedures.

       Knowing the operational definitions of measures allows the investigator to determine the reliability, operational validity, and sensitivity of the measures. Ideally, the documentation explains and evaluates the measures used. Studies generated with the database may also provide evidence of the quality of the measures. Finally, a copy of the instrument used to collect the data helps the investigator reach his own conclusions.
      
       Data published by the Bureau of the Census are accompanied by fairly detailed documentation. In Example 9.2 we have quoted sections of the

Example 9.2   Documenting Secondary Data:
An Example from Current Population Reports
Situation: In March 1984 the Current Population Survey collected data on households and persons who received specified noncash benefits in 1983. The survey findings were reported in Current Population Reports, Series P-60, No. 148, Characteristics of Households and Persons Receiving Selected Noncash Benefits: 1983 (Washington, D.C.: Government Printing Office, 1985).

Population: The CPS sample covers the civilian noninstitutional population of the U.S. and Armed Forces members living off post or with their families on post, but excludes all other members of the Armed Forces.

Sampling frame and strategy: Selected from 1970 census files and updated to reflect new construction. Current CPS sample is located in 629 areas comprising 1,148 counties, independent cities, and minor civil divisions. Further details are available in a referenced document.

Response rate: Approximately 62,200 occupied households were eligible for interview. About 3,100 occupied units were visited but interviews were not obtained because the occupants were not found at home after repeated calls or were unavailable for some other reason.

When data were collected: March 1984
How data were collected: As part of the CPS survey (readers unfamiliar with CPS surveying procedures would have to find other documentation).

How data were coded and edited: Not discussed in the accompanying documentation; however, census editing and processing procedures are well documented.

Data quality: Facsimile of questionnaire included in appendix. Conceptual and operational definitions are provided; limitations of the data are discussed; sources of nonsampling error are identified.
      
-      Conceptual definition of noncash benefits: benefits received in a form other than money that serve to enhance or improve the economic well-being of the recipient.
-      Operational definition (partial): data collection concentrated on benefits that could be defined as public transfers or that could be categorized as employer or union-provided benefits to employees. The survey covered: the Food Stamp Program....
-      Data quality (underreporting): three aspects of underreporting: (1) failure to report receipt of noncash benefit by type; (2) underreporting of the amount received; (3) misclassification of the amount received. Independent sources have been consulted to estimate underreporting, but estimates should be used with caution because of: problems in obtaining comparable data; assumptions made in adjusting data to CPS concepts; errors in the independent sources.


Other limitations noted: Definition of household membership may mean findings do not always reflect true economic status of household; institutional populations may receive studied noncash benefits, e.g., Medicaid; sources of nonsampling errors.

Database access: Not explicitly indicated. Last page of report advertises the Bureau of Census Catalog, which includes availability of data on paper, tape, or microfiche, and lists Census Bureau specialists, libraries and data centers.

appendices of an issue of Current Population Reports. The purpose of the example is to show you what to look for in reviewing documentation. In addition, if you conduct your own surveys you should consider how to document them so that others can use them. The example suggests the type of information you would want to include.
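The documentation in Example 9.2 reports counts rather than a response rate, so the reader has to work the rate out. A minimal sketch follows, assuming the rate is defined as completed interviews divided by eligible occupied households; that definition, and the variable names, are our assumptions rather than something stated in the quoted report.

```python
# Figures quoted in Example 9.2 (both are approximate in the report).
eligible_households = 62_200   # occupied households eligible for interview
noninterviews = 3_100          # visited, but no interview obtained

# Assumed definition: completed interviews / eligible occupied households.
completed = eligible_households - noninterviews
response_rate = completed / eligible_households

print(f"Completed interviews: {completed:,}")
print(f"Response rate: {response_rate:.1%}")
```

Under this definition the rate comes to roughly 95 percent. A different definition, for example one that counts unoccupied units in the denominator, would give a different figure, which is why the documentation should state how the rate was computed.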

U.S. CENSUS DATA
The U.S. Census Bureau conducts periodic and special studies to describe the characteristics of the American people, their governments, and their businesses. The periodic studies include:
Current Population Survey (CPS)
Survey of Income and Program Participation (SIPP)
Decennial Census of Population and Housing
Census of Governments
Census of Agriculture
Economic Censuses

In this section we discuss the content, frequency, population, and primary use of the Current Population Survey (CPS), the Survey of Income and Program Participation (SIPP), and the Census of Population and Housing. As part of our discussion we review Census Bureau procedures for interviewer selection and training, data editing, and questionnaire content. The procedures vary relatively little among census studies. The procedures result from the Bureau's continuous efforts to get sound data from a large, geographically dispersed sample.

To avoid overwhelming or boring you with specific details we divide up our discussion of the data collection and compilation strategies. Our comments should reinforce and add to your knowledge of data collection, and give you an appreciation of Census Bureau procedures. The section on the CPS concentrates on interviewer selection and training. The section on the SIPP considers data editing procedures. The section on the Decennial Census examines issues of questionnaire content.

The U.S. Bureau of the Census

The U.S. Bureau of the Census traces its origins back to the Constitutional requirement that the U.S. population be counted every ten years. The head count forms the basis for reapportioning the number of seats a state holds in the U.S. House of Representatives. By law, within nine months of "Census Day" the Census Bureau must give the President a count of state populations so that congressional delegations can be apportioned. Within a year state legislatures must receive population totals for specified political subdivisions, so that they can divide up legislative districts.

       A hallmark of the Census Bureau has been its record for protecting the confidentiality of the information it collects. The principle of confidentiality has contributed to its success in getting people and businesses to answer questions about themselves. Current law on confidentiality states:

Census data may be used only for statistical purposes.
Publication of census data must not enable a user to identify individuals or establishments.
Only authorized employees of the Department of Commerce or Census Bureau may examine individual census forms.

       The Bureau releases its information on an individual only with that person's specific consent. For example, some individuals have asked for Census records to corroborate their age and demonstrate their eligibility for Social Security benefits. The availability of census data on computer tapes has complicated the ability to maintain confidentiality. Census officials must anticipate how users can manipulate data to uncover the identity and private information on specific individuals or establishments.

       The Census Bureau does not escape data collection problems, but in general it does as well as or better than other organizations that collect data. The Bureau's resources, including its reputation, add to its advantage. In general, people are more willing to respond to government requests for information; consequently, census surveys have lower nonresponse rates than similar surveys conducted by nongovernment agencies and researchers.

Current Population Survey (CPS)
The CPS is a monthly household survey to gather current population and labor-force data. The U.S. Bureau of Labor Statistics releases the CPS data each month in its report on the nation's employment and unemployment rates. The Census Bureau analyzes the population data and reports them in Current Population Reports.

       The data describe the personal characteristics of the labor force, including the age distribution, race, and sex of American workers. Data on who works, who works full-time, who works only part-time, and who is unemployed give us a picture of who gets ahead or falls behind in the labor force. For instance, the CPS reports separately the employment patterns for whites and blacks, for men and women, for teenagers and adults, and for rural residents and urban dwellers. What can be done with this information depends on one's responsibilities. Interest groups, journalists, and legislators cite data to document social problems or to advocate policy changes. Program managers, especially in education and job training programs, consult the data to structure programs to meet their clients' needs or simply to give their clients accurate information and advice.

       Planners in programs that deliver services to specific age groups, such as school children or the elderly, need current data on the population's age distribution so that they can estimate demands for services. Business analysts examine the data to identify population trends that may change the demands for products and services; administrators can undertake similar studies to improve their program planning or implementation.
       CPS data are gathered monthly from in-person or telephone interviews; the initial contact and first interview with a CPS household take place in person. The CPS sample consists of about 75,000 housing units. Interviewers collect data from approximately 80 percent of the sample units. Interviewers will find roughly 17 percent of the units unoccupied, and will be unable to obtain interviews with an additional 5 to 6 percent of the sample households. So the data usually represent about 60,000 households.

      

       Census analysts construct the CPS sample so that it is representative of the nation's population. Normally, one can estimate parameters for states and large metropolitan areas from CPS. Occasionally, budget constraints have forced the Bureau to limit the sample, so that investigators could accurately estimate parameters only for the nation as a whole and the most populous states. Example 9.3 refreshes your memory on sampling and illustrates why you may be able to use a given database to estimate parameters for a state but not its communities.
      
       Listed below is an outline of the Census Bureau's requirements for CPS interviewers.3 The information on interviewer requirements and training gives you a sense of the costs of mounting in-person interviewing efforts and the steps associated with assuring interviewer consistency. The reliability of the data depends on an interviewer's ability to identify accurately and record the requested information.

Census Bureau basic requirements for interviewers:
U.S. citizen.   
At least 18 years old (or 16 years old with a high school diploma or equivalent).
Able to read instructions and maps (documented by passing a test).
Able to do clerical work accurately (documented by passing a test).
Have an available automobile.
Be in good physical condition.
Able to work in all types of weather.
Able to attend training sessions.
Have a home telephone.
Available to work during the day, evening and Saturday.

Prior to beginning their jobs interviewers spend approximately one day on home study followed by three days of classroom training. Classroom training includes lectures, quizzes, and role-playing. After the interviewer completes training, a supervisor accompanies him on his first assignment. Later, after the interviewer has been on his own awhile, the supervisor again observes



Example 9.3    Estimating Parameters from CPS
A Hypothetical Example

Situation: Can an analyst use CPS data to estimate the proportion of a city's households with a certain characteristic? What will be the amount of sampling error if the characteristic is split 50-50? 75-25?

                         50-50 split                    75-25 split
Sample      Size       SE       95% Confidence       SE       95% Confidence
                                Interval                      Interval
National    50,000     .0023    49.55-50.45%         .0019    74.63-75.37%
Regional    11,000     .0048    49.06-50.93%         .0041    74.19-75.81%
State        1,700     .012     47.65-52.35%         .0105    72.94-77.06%
City            85     .054     39.37-60.63%         .047     65.79-84.20%

Discussion: The sampling error is larger for interval data, because of their greater variability. In addition, this example assumes that the analyst is manipulating data for all city households included in the subsample. The sample sizes become considerably smaller, and the sampling error larger, when one tries to estimate the distribution of the characteristic among a specific group, for example, income of Hispanic families.
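The standard errors and confidence intervals in Example 9.3 are consistent with the usual formula for a proportion, SE = sqrt(p(1 - p)/n), with the 95% interval taken as p plus or minus 1.96 standard errors. The sketch below recomputes them under that assumption. It treats each subsample as a simple random sample, which is a simplification, since the CPS uses a more complex design; the function name proportion_interval is ours.

```python
import math


def proportion_interval(p, n, z=1.96):
    """Standard error and confidence interval for a sample proportion.

    Assumes simple random sampling; returns (se, lower, upper) as proportions.
    """
    se = math.sqrt(p * (1 - p) / n)
    return se, p - z * se, p + z * se


# Sample sizes taken from Example 9.3.
samples = {"National": 50_000, "Regional": 11_000, "State": 1_700, "City": 85}

for label, n in samples.items():
    for p in (0.50, 0.75):
        se, low, high = proportion_interval(p, n)
        print(f"{label:<9}{n:>7,}  p={p:.2f}  SE={se:.4f}  "
              f"{100 * low:.2f}-{100 * high:.2f}%")
```

Results may differ from the printed table in the last digit, because the table appears to round the standard error before computing the interval. The pattern is the point: with only 85 city households the interval spans roughly twenty percentage points, too wide for most planning purposes.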

him. Within six months of the first assignment the supervisor observes the interviewer a third time. The initial training period ends when an interviewer's error rate, production level, and completion rate fall within Bureau standards. Recall that the error rate affects the reliability of the data, the production level affects the efficiency of the survey effort, and the completion rate affects the representativeness of the sample.

       CPS interviewers receive in-service training; typically, they attend refresher courses. Their performance (error rates, productivity, and completion rate) continues to be monitored and remedial training is required if their performance is judged inferior. In addition, supervisors observe interviewers' performance at least once a year.



Survey of Income and Program Participation (SIPP)

       The growth of federal assistance programs brought about demands to assess their impact. The Census Bureau and the Social Services Administration collaborated to create SIPP to gather monthly data on Americans' income, employment, and receipt of government assistance. The data should give an accurate picture of the effect of federal assistance on recipients and on the level of federal spending. With this information officials can estimate the impact of program changes and respond to detrimental effects of changes. Specific questions SIPP data should answer are:

How changes in eligibility requirements or the amount of benefits affect recipients and the total amount of federal spending.
How excessive or inadequate are the combined benefits received by individuals and families participating in several federal assistance programs simultaneously.
Why changes in benefit status, employment, and household composition occur.

The Census Bureau began collecting SIPP data in late 1983, and the survey was fully implemented in 1986. The survey includes approximately 30,000 households. Each SIPP household participates in the survey for 2.5 years, with an interviewer gathering household data once every four months. Interviewers survey 25 percent of the SIPP households every month.

       Each of the Census Bureau's 12 regional offices controls the quality of the SIPP data it collects and sends the data to Bureau headquarters. Quality control procedures, summarized below, include: interview assignments, monitoring and following up on data collection progress, reinterviewing of subjects, and data entry.

Quality Control Procedures for Data Collection and Preparation
SIPP Survey
Assignment Control
       Make interview assignments.
       Specify deadlines for completion of interviews.
       Monitor progress toward completing interviews.
Interviewer Control
       Monitor interviewer performance.
       Review completed questionnaires for missing, incomplete, or inconsistent answers.
       Reinterview a sample of subjects.
       Provide remedial training.
       Provide in-service training.
Data Entry Control
       Check sample of each operator's entries.

       Assignment control begins with assigning each interviewer an equivalent number of interviews and requiring their completion within a specified time. After completing an interview or determining that an interview cannot be conducted, the interviewer returns the questionnaire to the regional office. Supervisors track interviewer progress and assist interviewers who fall behind.

       Clerks check questionnaires for missing, incomplete, or inconsistent information. They check all questionnaires returned by new interviewers or interviewers whose performance has fallen below error standards. They review only selected items on questionnaires returned by other interviewers. Clerks may use available information, such as ZIP code directories, to complete some missing items. A clerk may contact the interviewer to see if he can correct an error. If an error cannot be corrected or a piece of data cannot be supplied, the item is left unanswered. The Census Bureau then uses an imputation procedure4 to assign a value to the missing item(s).

To assure the quality of the data collected, the Bureau reinterviews a sample of subjects. Each month approximately one out of six interviewers is randomly selected to have one-third of their cases sampled and reinterviewed. Supervisors contact the sampled cases and ask selected questions from the SIPP questionnaire. The supervisor compares the reinterview answers with the original answers and determines the reasons for observed variations. The Bureau fires any interviewer whom it confirms has falsified data. If disparities result from interviewer error, additional training and closer supervision may be arranged.
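
The two-stage selection described above (one out of six interviewers, then one-third of each selected interviewer's cases) can be sketched as follows. This is an illustration under our own assumptions, not the Bureau's actual procedure: we assume simple random selection at both stages, and the function name, interviewer labels, and case labels are all hypothetical.

```python
import random


def draw_reinterview_sample(caseloads, seed=None):
    """Select about 1/6 of interviewers, then about 1/3 of each one's cases.

    caseloads maps an interviewer id to a list of case ids.
    Both stages use simple random sampling (an assumption).
    """
    rng = random.Random(seed)
    interviewers = sorted(caseloads)
    n_interviewers = max(1, round(len(interviewers) / 6))
    chosen = rng.sample(interviewers, n_interviewers)

    sample = {}
    for interviewer in chosen:
        cases = caseloads[interviewer]
        n_cases = max(1, round(len(cases) / 3)) if cases else 0
        sample[interviewer] = rng.sample(cases, n_cases)
    return sample


# Hypothetical regional office: 12 interviewers with 30 cases each.
caseloads = {
    f"interviewer_{i:02d}": [f"case_{i:02d}_{j:02d}" for j in range(30)]
    for i in range(12)
}
reinterview = draw_reinterview_sample(caseloads, seed=1)
for interviewer, cases in reinterview.items():
    print(interviewer, len(cases), "cases to reinterview")
```

With 12 interviewers carrying 30 cases each, the sketch selects 2 interviewers and 10 cases from each, so about one case in eighteen is reinterviewed in a given month.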

The reinterview process is not foolproof. The person conducting the reinterview has a copy of the original responses. He can copy the original responses or, if a subject gives an ambiguous response, assume the correctness of the original answer. Better control would be maintained if a third person, not the reinterviewer, saw the original and reinterview questionnaires and compared them. The SIPP reinterview is conducted solely to assure the quality of interviewers' work. The CPS uses a portion of its reinterviews to determine the reliability of the data.5

       The final step in quality control is the monitoring of the data entry clerks' performance. Data entry clerks are responsible for transferring data from SIPP questionnaires into a computer file. One-sixth of a clerk's entries are checked by a supervisor. If a clerk's error rate exceeds the standard (in 1985 the minimum standard error rate was .043 percent), then all of the clerk's entries are reviewed.

The Census of Population and Housing

       The mainstay of the Census Bureau is the Decennial Census of Population and Housing, the constitutional reason for its existence. As we have suggested, counting the population is neither an easy task nor one that is done with complete accuracy. From the first census government officials added to the work of enumerators and asked them to collect additional information about the population. Thus, the Census Bureau has had to develop procedures to accurately count the population and to determine what additional information to gather and from whom to gather it.

       The Census Bureau, government officials, and researchers have given considerable attention to how the census can obtain an accurate population count. The goal of an accurate count has to be balanced against costs. The 1980 Census cost over $1 billion, compared to $222 million for the 1970 census. The General Accounting Office (GAO) studied how the Bureau could reduce costs and avoid a million census in 1990. GAO singled out programs to reduce the number of uncounted persons as the least cost-effective; in 1980 $342 million was budgeted for programs to reduce the undercount. In its report to Congress GAO stated that "Attempting to get a complete count is an impossible task that is becoming increasingly costly and complex."6

      

       Neither improved counting procedures nor statistical procedures can solve the undercount problem. The undercount goes beyond underestimating the total size of the U.S. population. Certain groups, particularly urban minorities, are more likely to be missed, and their need or demand for services may be seriously underestimated. Urban areas with a sizeable uncounted population may end up shortchanged in their number of legislative representatives or in the size of their allotment of government program funds. You can infer the success of the solutions tried to date from one bureau official's comment about solving the undercount, "The courts have determined that the bureau is the only agency that is qualified technically to do what apparently cannot be done."

       What to include in the Census requires negotiation and evaluation. Other federal agencies, state and local government users, interest groups, and individual citizens suggest questions. To decide what additional questions to ask, census officials assess whether the proposed information will serve a broad public interest. Census officials approach the issue of the public interest by deciding whether the information justifies the expenditure of public monies. The criterion of public interest commonly leads to the elimination of questions primarily of interest to businesses, such as information on the number of pet owners in the U.S.

       The Census Bureau, like other surveyors, wants to avoid overburdening respondents. It limits the number and content of questions to what its staff believe a respondent can complete within a reasonable amount of time. The Decennial Census has two forms. One form asks all households no more than two facing pages of information; this form is referred to as the "100 percent count." A sample of approximately 20 percent of all households receives a longer form. The longer form asks the same questions as the 100 percent count, plus 1.5 pages of housing information and 2 additional pages for information on each household member. Thus, space limits the total number of questions asked.8

       The 100 percent count asks a member of a household to give each household member's: name, relationship to the respondent, sex, race, marital status, age, and if applicable, Hispanic origin. In 1980 the housing portion of the 100 percent count covered 12 topics. Some of the housing questions helped the Census Bureau identify missing households. Other questions, analyzed either singly or in combination with other items, measured housing quality, overcrowding, and the proportion of owner-occupied and rental property. In asking the population and housing questions, the Bureau considers historical comparability. If a question is reworded or response categories altered, the change affects the answers. The Bureau must weigh whether the benefit of a change balances the loss of comparability of information from one census to another.

       The history of the race and ethnicity questions, as shown in Example 9.4, suggests the trade-off between consistency and the need to make question changes so that they reflect current social conditions. From this abbreviated summary you can infer the social conditions that gave rise to the particular data gathering strategy and the problems in comparing racial or ethnic changes from one census to the next. For example, note that until 1960 racial classification was not based on self-identification; rather, by asking specific questions on parentage or by observation, census takers decided a household's race. In 1960 and 1970 households who received a mailed form identified their own race, but if a census enumerator collected data he or she decided,

Example 9.4    Comparability Among Censuses
Summary of the Recent History of Racial and Ethnic Questions
1920: Enumerator decided appropriate category.
       Categories: White, Negro, Mulatto (Black-White mixture), Chinese,
              Japanese, Indian, Other
1930: Mulatto category dropped; a person identified as Mulatto counted as Negro.           
1940: Mexican (Mexican birth or Mexican parents) added; persons who
       qualified as Mexican but who were identifiable as Negro,
       Chinese, Japanese, or Indian were no longer counted as
       Mexican.
1960: Self-identification of race on mailed census forms.
       If data collected by an enumerator, the enumerator observed and filled in the racial data.
1970: Combination of self-identification and enumerator observation continued.
       Categories: White, Black or Negro, Japanese, Chinese, Filipino, Korean, Vietnamese, Indian (Amer.), Asian Indian, Hawaiian, Guamanian, Samoan, Eskimo, Aleut, Other
1980: Question added to 100 percent count, "Is this person of Spanish/Hispanic origin or descent?"
       Categories: No (not Spanish/Hispanic); Yes, Mexican-Amer.; Yes, Chicano; Yes, Puerto Rican; Yes, Cuban; Yes, Other

SOURCE: C. F. Citro and M. L. Cohen (eds.), THE BICENTENNIAL CENSUS: NEW DIRECTIONS FOR METHODOLOGY IN 1990 (Washington, D.C.: National Academy Press, 1985), pp. 205-214.

based on observation, the appropriate racial category. Also, in 1980 respondents in all parts of the country could identify themselves as Eskimos or Aleuts; previously, these groups were listed only on census forms distributed in Alaska.

       In 1980 the race question included 15 categories, mixing traditional concepts of race with ethnic or geographic identities. The race and ethnic categories can be combined or aggregated to conform to the Office of Management and Budget's standard categories for federal agencies collecting racial or ethnic data. The standard combined categories are: white (not Hispanic), black (not Hispanic), Hispanic, American Indian or Alaskan Native, Asian or Pacific Islander.
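
Aggregating detailed responses into the combined categories amounts to applying a lookup table. The sketch below is our own illustration: the assignment of each detailed category to a combined category, the rule that a "yes" on the Hispanic origin question takes precedence over the race response, and the "unclassified" label for "Other" are all assumptions, not the Bureau's or OMB's published rules. The responses are hypothetical.

```python
from collections import Counter

# Assumed mapping from detailed race responses to the combined categories.
RACE_TO_STANDARD = {
    "White": "white (not Hispanic)",
    "Black or Negro": "black (not Hispanic)",
    "Japanese": "Asian or Pacific Islander",
    "Chinese": "Asian or Pacific Islander",
    "Filipino": "Asian or Pacific Islander",
    "Korean": "Asian or Pacific Islander",
    "Vietnamese": "Asian or Pacific Islander",
    "Asian Indian": "Asian or Pacific Islander",
    "Hawaiian": "Asian or Pacific Islander",
    "Guamanian": "Asian or Pacific Islander",
    "Samoan": "Asian or Pacific Islander",
    "Indian (Amer.)": "American Indian or Alaskan Native",
    "Eskimo": "American Indian or Alaskan Native",
    "Aleut": "American Indian or Alaskan Native",
}


def standard_category(race, hispanic_origin):
    """Return the combined category for one person's responses.

    Assumption: Hispanic origin takes precedence over the race response.
    """
    if hispanic_origin:
        return "Hispanic"
    return RACE_TO_STANDARD.get(race, "unclassified")


# Hypothetical responses: (race, answered yes to Hispanic origin question).
responses = [
    ("White", False), ("White", True), ("Korean", False),
    ("Aleut", False), ("Black or Negro", False), ("Other", False),
]
counts = Counter(standard_category(race, hisp) for race, hisp in responses)
for category, count in counts.items():
    print(category, count)
```

The "unclassified" bucket makes visible the difficulty raised by respondents who check "Other": their answers cannot be placed in a combined category without further information.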

       The 1980 question, which did not include the word "race" or "color," resulted in an increased number of people who checked themselves as "other." Various respondents looked in vain for their nationality on the list. For example, a Thai household could have reasonably expected to see "Thai" listed as a possible response. Since 1980 new groups of refugees have entered the country; their presence suggests that the racial list could become even more unwieldy. In 1990 the Bureau is considering asking one question to gather both the general racial and specific Hispanic information. Responses representing the standardized federal combined categorizations may appear on the shorter form. A more comprehensive list of racial and ethnic groups would be on the longer form. The Bureau plans to phrase and aggregate the responses so that investigators can compare 1980 and 1990 data.

       The longer form asks for information that the Census Bureau decides is important, but unnecessary to count precisely. The development of a longer form allows the census to reduce the costs of data collection. First, the short form information can be processed earlier to meet deadlines for releasing data. Second, the short form relieves roughly 80 percent of the respondents from a more burdensome questionnaire. Third, the Census Bureau can trade off the problems of missing data, associated with longer questionnaires, with the desirability of gathering more information. In 1980 the longer form asked questions about education, language, place of birth, previous residence, employment, and income. The housing questions gathered more complete data on the physical nature of the housing stock (including mobile homes and boats), accessibility to physically disabled persons, energy needs, residential stability, and housing quality and adequacy.

       Approximately five years before "Census Day" a series of census pretests begin. The pretests gather data on diverse aspects of the census process. The 1986 pretest included assessments of automation procedures, follow-up with nonrespondents, questionnaire design, and enumerator selection and hiring policies.8 The pretest of the questionnaire examined alternative formats for the racial and ethnic questions. The pretest also tried shifting some housing questions to a "knowledgeable" respondent. For example, a resident manager was asked for some housing data, such as the age of the building and number of units, rather than asking renters to supply these data.
      
       The Bureau conducts post-census evaluations of the census coverage and content. The evaluation findings indicate data quality and suggest future changes. Post-census evaluations largely consist of information supplied by respondents, who are reinterviewed. The Bureau also checks administrative records. For example, public utilities records have been compared with respondents' reports of utility expenditures. Medicare, income tax returns, and similar governmental records have been used to improve the accuracy of the population count.

       Reinterviews and administrative records have their flaws. Reinterviewing has some of the problems associated with using test-retest to estimate item reliability. First, answers between testings may not change because respondents remember and duplicate their answers, or interviewers, who know the original answers, fit ambiguous answers into the originally selected categories. Second, answers between interviews may change because a household member other than the person who originally answered the census may be reinterviewed.

       The value of administrative records largely depends on the accuracy of the records themselves and the ability to correctly match administrative records with a completed census form. Some analysts have suggested that the Census Bureau undertake long-range research to investigate substituting administrative records for some housing items, such as age of structure. The investigators argue that consulting records to get housing information will improve the data's accuracy, reduce respondent burden, and will not involve confidential records.
Uses of Census Data
The Census Bureau aggregates and releases data from the decennial census by political and statistical areas. The political areas are states, counties, minor civil divisions (such as townships and New England towns), and incorporated areas. Statistical areas have been created to describe relatively homogeneous and functionally integrated areas. The smallest statistical area is the block, a well-defined piece of land bounded by streets, railroad tracks, or similar physical features. At the block level the Census Bureau releases 100 percent data, but not sampled data. Sampling errors make the longer questionnaire data too variable for accurate interpretation at the block level.

       Traditionally, census data have been publicly available as macrodata, that is, aggregated by political or statistical area. Beginning with the 1960 census the Census Bureau compiled samples of individual records for public use; since then analogous samples from the 1940 and 1950 censuses have been drawn and released. These samples of individuals are referred to as microdata samples. Microdata samples are systematic, stratified samples of long questionnaires. A microdata record includes the responses reported to the Census Bureau except geographic locations or data on very small and visible population subgroups. The geographic and subgroup data are eliminated to protect the confidentiality of respondents.

       Macrodata are available in books, on microfiche, and on computer tapes. Microdata are available on computer tapes. Major statistical software can handle the structure of the census data, but the computer user should be prepared to spend some time learning how to access and manipulate the data.

       The most obvious uses of the Decennial data are for legislative reapportionment, funding allocations, and policy decisions. The national, state, and local government data give politicians, administrators, and journalists a snapshot of the nation's population on one day. The population can be viewed cross-sectionally or longitudinally. Cross-sectional studies look at the patterns among variables in the census dataset; longitudinal studies trace changes from one census to another.
      
       Census data may be viewed longitudinally to evaluate public policies. For example, investigators can compare data from two adjacent censuses to see whether state policies to improve the housing stock reached their objective. Investigators can also check to see whether housing improvements in targeted communities were offset by deterioration of the housing stock in other communities. One way to compare two communities would be to compare the percentage of change in standard housing between the two censuses for each community studied.
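
The comparison of percentage change just described can be sketched in a few lines. The community names and housing counts below are hypothetical, invented only to show the arithmetic; they are not census figures.

```python
def percent_change(earlier, later):
    """Percentage change from the earlier census count to the later one."""
    return 100 * (later - earlier) / earlier


# Hypothetical counts of standard housing units in two adjacent censuses.
standard_units = {
    "Targeted community": {"1970": 2000, "1980": 2300},
    "Other community": {"1970": 1500, "1980": 1410},
}

changes = {
    name: percent_change(counts["1970"], counts["1980"])
    for name, counts in standard_units.items()
}
for name, change in changes.items():
    print(f"{name}: {change:+.1f} percent change in standard housing")
```

In this invented case the targeted community gains standard housing while the other community loses it, the kind of offsetting pattern an investigator would want to examine before declaring the policy a success.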

       An important characteristic of census data is its coverage of small geographic areas. As we implied in our discussion of the CPS, at best national surveys estimate population characteristics of individual states and large metropolitan areas. Example 9.5 illustrates how a community can use census data to select an area for a pilot project serving impoverished preschoolers.

Example 9.5    Using Census Data on Census Tracts To Select a Site For a Program
Situation: A local human services program plans to launch a pilot program for impoverished preschoolers. The program will combine a full day nursery school program with nutritional and health services. A human services analyst has reviewed data on city census tracts to identify communities where the program could be located; see Table 9.1.

                                                  Census Tract
                                         A        B        C        D        E
Population                           4,390    3,458    2,670    5,262    3,147
Below poverty level,                   151      114       91      173       41
  less than six years old
Number of related children              77       51       56       63       12
  less than six years old in a
  household with no husband
Percentage of population                28       25       29       21       16
  below poverty

Census tract: statistical subdivisions within a metropolitan area. Tracts have an average population of 4,000, should be relatively homogeneous with respect to population characteristics, economic status, and living conditions, and must not cross county lines.



Decision: Look for a site in Census Tracts A or D. Indicators for both communities show a high level of need: number of young children living in poverty; number of young children living in a female-headed household; large percentage of population below poverty level. Census Tract E is an inappropriate site for the community project, because of the small number of children in the target population and a distinctly lower level of need evidenced by indicators.

In practice, information other than census data will determine the final siting decision; however, census data distinguish between appropriate and inappropriate locations. Furthermore, census data may suggest services needed by the community.
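
One simple way to screen the tracts in Example 9.5 is to rank them on each indicator and add the ranks. The rank-sum device below is our own illustration, not the method the analyst in the example necessarily used; it weights the three indicators equally, which is an assumption, and the field names are hypothetical. The figures come from Table 9.1.

```python
# Indicators from Table 9.1 for each census tract.
tracts = {
    "A": {"poor_under_6": 151, "under_6_no_husband": 77, "pct_below_poverty": 28},
    "B": {"poor_under_6": 114, "under_6_no_husband": 51, "pct_below_poverty": 25},
    "C": {"poor_under_6": 91, "under_6_no_husband": 56, "pct_below_poverty": 29},
    "D": {"poor_under_6": 173, "under_6_no_husband": 63, "pct_below_poverty": 21},
    "E": {"poor_under_6": 41, "under_6_no_husband": 12, "pct_below_poverty": 16},
}
indicators = ["poor_under_6", "under_6_no_husband", "pct_below_poverty"]


def rank_sum(tracts, indicators):
    """Sum each tract's rank across indicators (rank 1 = greatest need)."""
    totals = {name: 0 for name in tracts}
    for indicator in indicators:
        ordered = sorted(tracts, key=lambda name: tracts[name][indicator],
                         reverse=True)
        for rank, name in enumerate(ordered, start=1):
            totals[name] += rank
    return totals


totals = rank_sum(tracts, indicators)
ranking = sorted(totals, key=totals.get)  # lowest rank sum = greatest need
for name in ranking:
    print(f"Tract {name}: rank sum {totals[name]}")
```

Under equal weighting the two tracts with the lowest rank sums are A and D, and Tract E falls last on every indicator, which agrees with the decision stated in the example.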

       Administrators can aggregate census data, so that the data describe the appropriate service area. For example, city administrators can add together blocks to describe city school districts, police precincts or fire departments. Then, the administrators can examine the number of people in the service area, the population density, and its characteristics. With this information they can assign personnel, make budget decisions, and redraw service areas.
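
Adding blocks together to describe a service area is a grouping operation. The sketch below assumes each block record has already been assigned to a police precinct; the block numbers, precinct names, populations, and land areas are hypothetical.

```python
from collections import defaultdict

# Hypothetical block-level records with an assigned service area.
blocks = [
    {"block": "101", "precinct": "North", "population": 120, "sq_miles": 0.02},
    {"block": "102", "precinct": "North", "population": 95, "sq_miles": 0.03},
    {"block": "201", "precinct": "South", "population": 310, "sq_miles": 0.04},
    {"block": "202", "precinct": "South", "population": 60, "sq_miles": 0.06},
]


def summarize(blocks, area_key):
    """Total population and land area by service area, plus density."""
    totals = defaultdict(lambda: {"population": 0, "sq_miles": 0.0})
    for block in blocks:
        area = totals[block[area_key]]
        area["population"] += block["population"]
        area["sq_miles"] += block["sq_miles"]
    return {
        name: {**area, "density": area["population"] / area["sq_miles"]}
        for name, area in totals.items()
    }


summary = summarize(blocks, "precinct")
for name, area in summary.items():
    print(f"{name}: population {area['population']}, "
          f"{area['density']:.0f} persons per square mile")
```

The same records can be regrouped by school district or fire service area simply by adding another assignment field and passing its name as area_key.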

       Think for a minute about the importance of comparability and consistency. A community can survey its population, but to do it within a short time period normally requires a mammoth effort. Nevertheless, data collection spread over a month or two may be adequate for community decision makers. If the investigators want to compare their findings with a neighboring community they will be stymied unless their neighbors have collected the same information, in the same way, at approximately the same time.

       Census data are important components of demographic analysis, such as studies of patterns of fertility, mortality, and the population's age distribution. Public services rely on demographic data to help with planning. Consider school planning. A community benefits by advance notice that its school capacity is greater or lesser than the anticipated number of children. If school capacity is too great for future demand, the community can plan to decrease the number of teachers, classrooms, and school buildings. If school capacity is below future demand, the community can plan for orderly and efficient acquisition of the needed resources. In recent years communities have turned their attention toward the needs of their aging population. Communities with an increasing proportion of elderly persons may experience a marked change in the demand for services, for example, a need for more nursing homes and specialized medical care.
       An obvious problem with decennial data is its timeliness. Over the course of a decade the counts become less and less accurate. Population changes at the block level can be rapid and dramatic. Within a matter of months vacant lots, fields, and wooded areas may be replaced by residential housing. Conversely, marginal housing may be condemned and disappear. Mid-decade censuses have been suggested to update population estimates. A mid-decade census was planned for 1985, but never funded. Given current budgetary constraints the probability of a mid-decade census in 1995 seems remote.
      
       The Census Bureau periodically develops post-census estimates of state populations. The Bureau adjusts the state's census population with data on state births, deaths, and migration. Birth and death data are obtained from the state office of vital statistics. Migration data are estimated by "symptomatic indicators," for example, unexpected changes in school enrollments. One state updates its census data by annually enumerating the following data for each county.11
       school enrollment, grades 1-8
       births and deaths by race
       auto and truck registration
       enrollment in Medicare
       population of institutions with 200 or more residents
       population of major military bases

The derived data are then entered into a formula to arrive at a current estimate of the population for each county. A political subdivision of a county may estimate its population by assuming that its rate of population change is identical to the county's. For example, if Smith County has an estimated population decrease of 6 percent since the last census, a planner in Smithville, a town in Smith County, may adjust the town's most recent census figure downward by 6 percent. Alternatively, the planner may consult available records on town births and deaths, school enrollments, motor vehicle registration, and housing units to estimate the town population. The specific details of population estimation are well beyond the scope of this text. Nevertheless, we want you to be aware of this limitation in census data.
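
The Smithville adjustment can be written out as a short calculation. The sketch assumes, as the text does, that the town changes at the same rate as the county; the census and estimate figures are hypothetical, chosen so that the county shows the 6 percent decrease used in the example.

```python
def apply_county_rate(town_census_pop, county_census_pop, county_estimate):
    """Estimate a town's population by applying the county's rate of change.

    Returns (town estimate, county rate of change as a proportion).
    """
    rate = (county_estimate - county_census_pop) / county_census_pop
    return town_census_pop * (1 + rate), rate


# Hypothetical figures: Smith County fell from 50,000 to an estimated 47,000.
estimate, rate = apply_county_rate(
    town_census_pop=8_000,
    county_census_pop=50_000,
    county_estimate=47_000,
)
print(f"County change: {rate:.1%}")
print(f"Smithville estimate: {estimate:,.0f}")
```

The assumption of identical rates is the weak point: a town that is growing while the rest of its county declines would be badly misestimated, which is why the planner may prefer to consult local records.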

       We have only scratched the surface in our description of census data and their uses. We have not considered data gathered in the census of governments nor the economic censuses; these censuses are conducted every five years. Nor have we mentioned all the products derived from the Decennial Census. Similarly, we have ignored other important federal statistics and statistical agencies. Federal agencies regularly gather statistics describing the nation's health, education, agriculture, crime, and criminal justice systems.

       We had three reasons for writing at length about census data. First, we wanted to alert you to their availability. Second, the Bureau of the Census continually appraises its data collection and compilation procedures; thus, it is an important source of information on current developments in survey research methodology. Third, as we discuss later in this chapter, the information accompanying Census Bureau data serves as a model for documenting primary data so researchers can decide whether the data are suited to their needs.

VITAL STATISTICS
Vital records are another secondary data source, used by investigators from different disciplines with diverse interests. Vital records and the resulting vital statistics give information on births, deaths, marriages, divorces, abortions, communicable diseases, and hospitalizations. Federal, state, and local government agencies cooperate to collect, compile, and report vital statistics. Investigators use vital statistics to assess the state of a community's mental and physical health. Policy makers can examine vital statistics to evaluate the effectiveness of current programs, change policies or programs to better meet existing needs, and forecast future needs.

       Typically, a county official collects data required by state statute and reports them to the state. Hospital administrators, physicians, funeral directors, and medical examiners may collect the actual data. In most states the state health department maintains vital records and releases them to the public in printed form or on computer tapes after removing information that identifies individuals. State offices of vital statistics also issue periodic statistical reports describing the health of the state and its communities.
      
       The state data are forwarded to the National Center for Health Statistics, which microfilms each state's records. The Center publishes U.S. vital statistics reports and provides technical assistance to state agencies and other data users.
      
      
       The information gathered on a live birth illustrates the extensive information included in vital records:
Where birth occurred
Institution of delivery
Mother's residency
Mother's marital status
Mother's race
Mother's total pregnancies
Mother's previous number of live births
Mother's previous number of fetal deaths
Date of mother's last live birth
Date of mother's last fetal death
Outcome of mother's last delivery
Mother's number of previous children still living
Prenatal care
Baby's Apgar score (a medical rating scale done at birth)
Complications with this pregnancy
Congenital malformations

          At first vital records may appear objective, and a user may not doubt the quality of the data. Actually, the accuracy of any one vital record is subject to many possible errors. Think of distortions that can occur in the information on a live birth. For example, information on previous pregnancies may be misreported. Fear of censure may cause a woman not to tell her physician or midwife about previous pregnancies that resulted in an abortion, miscarriage or adoption. Many women may not have recognized miscarriages early in a pregnancy. Added to the problem of misreporting are possible errors in recording the data and different standards in diagnoses; considering these factors you can begin to appreciate the difficulties of maintaining data quality.
         
          Similarly, social values affect death reports. Vital statistics on causes of death are obtained from death certificates and are coded according to an international code. Consider the current problems in getting accurate data on deaths from AIDS. As we are writing, the public considers AIDS a disease that is largely confined to homosexual males and intravenous drug abusers. AIDS patients and their families fear the social stigma attached to the disease. Physicians sensitive to the feelings of patients and their families have admitted to indicating cancer or another related disease on the death certificate rather than AIDS. Similarly, accurate reporting of suicide varies widely. Societies that consider suicide a shameful act are likely to underreport its occurrence. Depending on community mores, then, a physician may decide to attribute a death to diseases or events that cause a family less embarrassment.

SUMMARY
Secondary data are existing data that investigators collected for a purpose other than the given research study. Secondary data can be inexpensive, high quality data adequate to define or solve a problem. Analyzing an existing database requires fewer resources than collecting original data. Some databases have higher quality data than a researcher can hope to achieve. Government-sponsored surveys usually have a higher response rate than nongovernment surveys. Organizations that specialize in collecting data typically have well-trained, professional staff: to check the reliability and operational validity of measures; to design, implement, and document a sound sampling procedure; to collect and compile data.
         
          One may also argue that secondary analysis contributes to the quality of primary databases. First, secondary analysis requires that the original researchers fully document a database. Second, secondary analysis enables researchers to see whether they can replicate the original researcher's findings. Both the need to document and the ability to check findings will encourage researchers to attend to research quality.
         
          As documentation becomes routine and database inventories are created, investigators may increasingly turn to existing data to conduct preliminary research and to hone their research questions. Whether or not secondary data are appropriate for the final research question depends on both the question and the data. As investigators work on a research problem they begin to understand what population and what measures are needed to answer their questions. Sometimes they may modify the question so that it is consistent with the existing database. Such adjustments should be made only after the investigators fully consider what is lost in making such a shift.

          The U.S. Census Bureau is a major source of data on the country's population, governments, and businesses. In addition to its regular censuses the Bureau conducts the Current Population Survey and the Survey of Income and Program Participation. The surveys give current information on the economic well-being of the American population, which can help in program planning and evaluation.

          Investigators find census data valuable because of their content and the quality of the data collection. In addition, the Census Bureau is an important source of information on various survey research issues, ranging from questionnaire design to computerization and confidentiality.

          Other federal bureaus, state offices of vital statistics, survey organizations, and professional associations routinely collect data that others study. Virtually any organization is a potential source of data, as are individual researchers. ASI and SRI may help you locate existing data. The organizations and surveys listed in Appendix A may also lead you to an appropriate database. Occasionally, by asking agency personnel, individual investigators may uncover a fugitive or unknown database.

         Once investigators locate a database they have to find a way to access it. Some questions can be answered by working with published statistics, but often one needs access to the database. Depending on who holds the database, and the contractual provisions for releasing data, a researcher may either access the database, or a portion of it, directly or have the database holder perform the analysis. Nevertheless, researchers cannot assume that access to a database is guaranteed. Organizational policies, contractual guarantees, and researcher inclination may become important factors in any agreement to allow someone to access data.
        
         If access is obtained the investigators need to verify the content of the database. Occasionally, the wrong database is accessed or incorrect information about a variable and its coding is sent. The researchers also need to review information on the sample, the measures, when the data were collected, how they were collected, and coding procedures to infer data quality.
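The verification step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the variable name "position", its codes, and the documented counts are all hypothetical, and the helper names compare_counts and undocumented_codes are invented for this sketch rather than taken from any particular package.

```python
from collections import Counter


def compare_counts(records, variable, documented):
    """Return {code: (observed, documented)} for every code whose counts differ."""
    observed = Counter(row[variable] for row in records)
    codes = set(observed) | set(documented)
    return {
        code: (observed.get(code, 0), documented.get(code, 0))
        for code in codes
        if observed.get(code, 0) != documented.get(code, 0)
    }


def undocumented_codes(records, variable, valid_codes):
    """Return the sorted codes that appear in the data but not in the codebook."""
    return sorted({row[variable] for row in records} - set(valid_codes))


# Hypothetical data file: 1 = manager, 2 = nonmanager per the (assumed) codebook.
records = [
    {"position": 1}, {"position": 1},
    {"position": 2}, {"position": 2}, {"position": 2},
    {"position": 9},
]
# Counts the (assumed) documentation claims the file contains.
documented = {1: 3, 2: 3}

mismatches = compare_counts(records, "position", documented)
unknown = undocumented_codes(records, "position", valid_codes=[1, 2])
```

A nonempty mismatches or unknown result does not say which source is wrong. It only signals that the investigators should go back to the database holder before analyzing the data.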

         Chapter 10 examines techniques to combine indicators to form a single measure. Data are collected on each indicator and stored as separate variables in a database. Investigators may, and often do, analyze each variable separately. Nevertheless, as you should observe, combining variables can give a more accurate, fuller picture of the phenomenon under consideration. Examining a long list of single variables can easily mislead a busy decision maker, who may be mistakenly impressed by one or two striking findings.
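
As a rough preview of what combining indicators can look like, the sketch below standardizes each indicator and averages the standardized scores for each case. This is one simple, equal-weight approach chosen for illustration; it is an assumption of this sketch and not necessarily the technique Chapter 10 presents. The indicator names and values are hypothetical, and the approach assumes every indicator is scored in the same direction (higher means more of the concept).

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert raw values to z-scores; a constant indicator becomes all zeros."""
    center = mean(values)
    spread = pstdev(values)
    if spread == 0:
        return [0.0 for _ in values]
    return [(value - center) / spread for value in values]


def composite_index(indicators):
    """Average the z-scores of several indicators, case by case.

    indicators maps an indicator name to a list of values, one per case,
    with cases in the same order in every list.
    """
    standardized = [standardize(values) for values in indicators.values()]
    return [mean(case_scores) for case_scores in zip(*standardized)]


# Hypothetical indicators for three cases, measured on different scales.
indicators = {
    "indicator_a": [1, 2, 3],
    "indicator_b": [10, 20, 30],
}
index = composite_index(indicators)
```

Standardizing first keeps an indicator measured in large units from dominating the combined score. Whether equal weights are appropriate is a substantive judgment the investigators still have to make.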
NOTES
1.       J. C. Fortune and J. K. McBee, "Considerations and Methodology for the Preparation of Data Files," in Secondary Analysis of Available Data Bases, edited by D. J. Bowering (San Francisco: Jossey-Bass, New Directions for Program Evaluation, 1984).
2.    For a further discussion of confidentiality see: C. P. Kaplan and T. L. Van Valey, Census '80: Continuing the Factfinder Tradition (Washington, D.C.: U.S. Bureau of the Census, 1980), pp. 65-79; "Plenary Session V. Confidentiality Issues in Federal Statistics," First Annual Research Conference Proceedings (Washington, D.C.: U.S. Bureau of the Census, 1985), pp. 199-233. Summary of current law is based on material found on p. 71 of Census '80.
3.       Material on interviewer selection and training is from Statistical Surveys: Census Bureau Has Creditable Employment and Economic Data-Collection Procedures (Washington: U.S. General Accounting Office, GAO/IMTEC-86-8, 1986), pp. 10-13.
4.       For a detailed discussion on imputation see "Concurrent Session IX. Nonresponse Adjustment Procedures in Sample Surveys," First Annual Research Conference Proceedings, pp. 421-470.
5.   Statistical Surveys, pp. 14-19, discusses quality control procedures.
6.   A $4 Billion Census in 1990? Timely Decisions on Alternatives to 1980 Procedures Can Save Millions (Washington, D.C.: General Accounting Office, GGD-82-13), p. ii.
7.   I. I. Mitroff, R. O. Mason, and V. P. Barabba, The 1980 Census: Policymaking Amid Turbulence (Lexington, MA: Lexington Books, 1983), p. 48, quotes D. Levine, Deputy Director of the U.S. Bureau of the Census. The book details the political, legal, and statistical aspects of the undercount. The problem of the undercount and recommendations for 1990 are found in The Bicentennial Census: New Directions for Methodology in 1990, edited by C. F. Citro and M. L. Cohen (Washington: National Academy Press, 1985), chap. 5.
8.   The Bicentennial Census, p. 49.
9.   Ibid., pp. 104-114, discusses the pretesting program for the 1990 census.
10.     For a full description and evaluation of estimation methodology see Estimating Population and Income of Small Areas (Washington, D.C.: National Academy Press, 1980), pp. 12-19.
11.     "Problems with Population Bases," SCHS Statistical Primer, Vol. 1, no. 2 (Raleigh: North Carolina State Center for Health Statistics, 1980).

TERMS FOR REVIEW
After reading this chapter you should be able to explain the following terms:
secondary data                                                             100 percent count     
Current Population Surveys                                         macrodata     
Survey of Income and Program Participation              microdata
Decennial Census of Population and Housing            vital statistics
census undercount

QUESTIONS FOR REVIEW
The following questions should indicate whether you have a basic competency in this chapter's material.
1.    Discuss the importance of locating and using secondary data. When would you recommend collecting original data instead of relying on secondary data?
2.  A regional agency plans to study the relationship between highway features and accident rates.
       a.     Briefly describe how you would go about locating existing databases.
       b.    Briefly describe how you would decide whether existing databases were adequate for the planned study.
       c. Assume that you have obtained a computerized copy of a database.
              What information would you need to be able to use and interpret the data?
3.    You analyze a computerized database. You examine it and note that it reports 70 managers and 200 nonmanagers. The documentation indicates the data represent 60 managers and 210 nonmanagers. What would you do?
4.    Compare and contrast the data collection procedures for the Current Population Survey and the Survey of Income and Program Participation.
5.    Why is an undercount of the population during the Decennial Census treated as a serious problem?
6.   In conducting a survey regularly, such as the Decennial Census, what are the trade-offs between changing a question's wording or its responses and keeping the wording the same?
7.    Why do local government planners consider census data important?
8.    Briefly contrast the two forms used in the Decennial Census (the 100 percent count and the long form). Defend the use of two forms as opposed to asking everyone all the questions included on the long form.
9.    What is the value of reinterviewing survey respondents? What procedures would you recommend for reinterviewing respondents?
10.    What is the purpose of post-census evaluations?

PROBLEMS FOR HOMEWORK AND DISCUSSION
1.    The legislature of your state has funded a limited project to aid the poorest areas in your state in economic development. You work for the agency that will implement and oversee the project monies. You have been asked to select no more than six counties or cities to receive project funds. The primary criterion for selection will be median family income. Based on information in the County and City Data Book:
       a.     Select the counties or municipalities you would recommend for project funding.
       b.    For the recommended governments also present data on:
              (1) income
              (2) percentage of adults who are high school graduates
              (3) labor force characteristics
       c.     Use all the information and write a short memo justifying your choices.
       d.    Compare your recommendations with your classmates'. Can your class as a whole reach a consensus on which communities should receive funding?
2.    Refer to Example 9.5. In Census Tract A:
       53 percent of the adults are high school graduates
       100 percent of the population over 5 years of age speak English
       1,105 rent housing
       median family income of female householder with no husband present and children under 18, $8,774
       median family income in census tract, $12,580
       a.     If the center were to act as a clearinghouse of information to the entire community, as part of an outreach effort would you recommend it start with information on tenants' rights, high school equivalency requirements and resources, or English language resources?
       b.    Based on the above data what types of eligibility requirements would you initially suggest to make sure that the children needing the services the most will receive them? (Clearly your recommendations will be tentative and they will be refined as you become more familiar with the community and its needs.)
3.    Go to ASI or SRI and find recent data on one of the following topics or a topic assigned by your instructor:
       industrial accidents
       foreign languages studied by elementary school children
       infant mortality
       public employee benefits
       factory automation
       number of graduates of MPA programs
       starting salaries of college graduates
       (For the data examine and evaluate sample documentation.)
4.    Write a procedure for your own use outlining how to document a study that you might conduct.
5.    Imagine that you work in a large agency where the agency as a whole and its various subdivisions regularly conduct studies. Would you recommend that the agency create an archive containing collected data? Justify your answer. What criteria would you recommend for deciding what studies to include in the archive? What documentation would you require for each archived study?
6.    A local government conducts a citizen survey every two years. High school juniors and seniors interview a sample of citizens. Outline your recommended training program for the students. Would you suggest reinterviewing some of the respondents? Explain. Briefly describe the reinterviewing procedures you would use.
7.    A Class Project: Update the data in the Perry and Berkes study (see Example 9.4). (This requires a relatively large class and probably a TA since data should be gathered on all 50 states and put into a computerized database. After reading Chapter 10 the class can try its hand at replicating Perry and Berkes's analysis and reaching its own conclusions.)
       a.     Decide what variables on which to gather data.
       b.    Decide what years on which to gather data.

RECOMMENDED FOR FURTHER READING

The U.S. Census Bureau is an excellent source of information on the Decennial Census, other censuses, their use and methodology. For current materials, see the Bureau of the Census Catalogue. Census '80, published by the Bureau in 1980, gives an overview of the history of the Decennial Census and has essays on its use by demographers, geographers, planners, and businesses.


For information on secondary analysis see Secondary Analysis of Available Data Bases, edited by D. J. Bowering (San Francisco: Jossey-Bass, New Directions for Program Evaluation, 1984). This collection includes Fortune and McBee's detailed essay on merging and verifying databases.

