CHAPTER 9
Secondary Data Analysis: Finding and Analyzing Existing Data
In this chapter you will learn
1. Strategies for identifying, accessing, and evaluating the quality of secondary data
2. Advantages and disadvantages of secondary data analysis
3. The general content of major U.S. Census Bureau population surveys and vital records
4. Procedures used by the Census Bureau for selecting and training interviewers, assuring data
quality, and determining question content
In our eagerness to get a study underway, we may overlook the possibility that
appropriate data may already be available. Similarly, we may avoid examining
problems whose data needs exceed our data-gathering capacity. Secondary data
can be inexpensive, high-quality data adequate to define or solve a problem.
Secondary data are existing data that investigators collected for a purpose
other than the given research study. Secondary data may result from the research
efforts of an individual researcher, a research team, an agency division, or a
research organization. The data may have been collected for a specific study,
for a management system, or to maintain a database.
The individual researcher or research team may have gathered, compiled, and
analyzed the data and written the report. Typically, researchers collect more
data than they analyze. They ask questions that turn out to be unreliable or not
operationally valid. Investigators drop questions from the analysis because
they complicate the model, because they do not improve the solution, or because
they are simply overlooked. In writing a questionnaire the investigators may act
as if information is free. They add questions that they might want to analyze
later, but they run out of time and their interests shift.
Organizations
collect and store data for many purposes. Managers consult data to monitor
spending, personnel activity, resources acquired and spent, and productivity.
Managers review specific pieces of data, such as performance indicators, at
regular intervals as part of their management responsibilities. They examine
data to make routine decisions, for example, whether the budget allows a
certain expenditure, or how to charge employee leave.
They rely on existing data to estimate
demand for services and the resources needed to meet the demand. Managers
depend on data to track and react to the performance of agency subdivisions. No
matter why data are collected and stored, they may be retrieved and combined to
answer a range of questions beyond those originally asked.
Some organizations, including the U.S.
Census Bureau, state offices of vital statistics, public opinion polling firms,
and university research groups, such as the Inter-university Consortium for
Political and Social Research, exist to collect, compile, and interpret data.
Professional associations, such as the International City Management
Association (ICMA), and public interest groups collect and publish survey data
on topics of interest to their members; some of the surveys are conducted at
regular intervals. Investigators with differing backgrounds, needs, and
questions consult and use such data regularly.
In this
chapter we consider how you identify appropriate databases, gain access to
them, and evaluate their quality and applicability to a study. The discussion on
finding appropriate databases leads to a consideration of the advantages and
disadvantages of secondary data analysis. Next, we introduce you to a few
important data sources. The chapter discusses in some detail three population
surveys that the U.S. Census Bureau conducts periodically, and it briefly
considers vital statistics. In conjunction with our discussion of census
surveys we point out Bureau procedures that affect data quality.
FINDING AND EVALUATING SECONDARY DATA
Finding Secondary Data
If you want to find out whether
data exist to meet your needs, where do you begin? The most obvious resources
are the American Statistical Index (ASI) and the Statistical Reference Index
(SRI). The ASI indexes federal data, and the SRI indexes nonfederal data. The
ASI and SRI will lead you immediately to specific data, if they are available.
The referenced works should indicate where the data came from, alerting you to
an existing database that you may wish to examine. Some works cite privately
held data, that is, data owned by the researcher or a private organization. You
may or may not have access to these data, depending on the willingness of the
researcher or her sponsors to make them available or to perform requested
analyses for you. Even your access to public data may be limited by
confidentiality guarantees to the respondents.
Learning which database, if any, has the information you need may take creative
and persistent searching. Some data are never reported in publicly available
works; thus, you may be unaware of an appropriate database. Nor can you assume
that either ASI or SRI has identified all variables in a database. Appendix A
contains recent information on large, publicly available databases on
political and social characteristics. The appendix lists directories of
databases and catalogues of major government and academic databases. You may
infer that a database contains variables of interest from the description of
its sample, purpose, or general content, or from the identity of the agency
sponsor(s).
Some
databases are located almost by luck. Agencies may store appropriate data in
automated information systems. Throughout agencies and universities individuals
hold data. Administrators or analysts may conduct a survey, analyze the data,
and write up the findings. The data may have been entered into a computer file
with little or no formal documentation of their existence, or paper questionnaires
may be stored on an office shelf. Inventories of agency statistics and
databases may reduce the waste of underanalyzed data and redundant studies;
however, the cost of documenting, storing, and protecting confidentiality may
offset the benefits.
Database Access
Identifying a database is only half the battle. The researcher must determine
whether he can access it, and he must review its documentation. Poor
documentation or the inability to access a database may eliminate it from
consideration.
For some research questions investigators can confine themselves to aggregated
and published data. Such questions tend to define states, counties, towns, or
institutions as the units of analysis. Example 9.1 shows how a researcher
compiled information from four published sources to study factors
Example 9.1 Combining Aggregated Databases
Situation: Academic researchers wanted to identify factors associated with work
stoppages from strikes, walkouts, and lockouts. They collected data for one year
(1973) measuring:
number of public employee work stoppages
number of public employees involved
number of public employee-days lost
Strategy:
1. Identify sets of variables measuring: state social and economic
characteristics; local government structure and finance; local government work
force characteristics; state policies regarding local government collective
bargaining.
2. Identify data sources measuring variables.
3. Conduct analysis: use factor analysis to combine variables into summary
measures (see Chapter 10); covary summary measures with numbers of work
stoppages, of employees participating in work stoppages, and of employee-days
idle from work stoppages.
Unit of analysis: States of the United States.
Sample: All 50 states.
Variables measuring state social and economic characteristics:
1. Percent state population urban, 1970.
2. Percent of nonagricultural work force that has union membership, 1972.
3. Right to Work Law.
4. Percent state population below low-income level, 1969.
5. State per capita income, 1973.
6. State median family income, 1969.
7. Percent state population black, 1970.
8. Percent state population employed in nonagricultural establishments, 1973.
9. Employee-days idle/million nonagricultural employees, 1972.
10. State population density, 1970.
Data sources for all variables in study:
1. Council of State Governments, The Book of the States (Lexington: Council of
State Governments).
2. U.S. Dept. of Labor, Summary of State Policy Regulations for Public Sector
Labor Relations (Washington, D.C.: U.S. Government Printing Office, 1973).
3. U.S. Census Bureau, Public Employment in 1972 (Washington, D.C.: U.S.
Government Printing Office, 1973).
4. Statistical Abstract of the United States (Washington, D.C.: U.S. Government
Printing Office).
Illustration of data compilation procedures for Alabama:
1. From Summary of State Policy Regulations, determine whether Alabama has
right-to-work law; code "yes" or "no."
2. From Public Employment in 1972, copy data for Alabama on: percent
nonagricultural work force that has union membership, employee-days
idle/million nonagricultural employees.
3. Copy Alabama data on other variables from specified data sources.
Discussion: First, the actual study collected data on 40 variables. Second, all
data were collected or aggregated at the state level. A preferable unit of
analysis would have been local governments. The researchers did not sample local
governments because statistics on some variables of interest were not published
for local governments. Third, all the dependent variable data were for 1973 and
the independent variable data were for 1973 or earlier. Earlier data were the
most recent available statistics and were assumed to be the best estimate of
1973 values.
SOURCE: J. L. Perry and L. J. Berkes, "Predicting Local Government Strike
Activity," Western Political Quarterly, Vol. 30, No. 4 (1977).
associated
with public employee work stoppages. The researchers studied variations in work
stoppages among the states, because local government statistics were not
published at the time the research was conducted.
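The compilation step in Example 9.1 can be sketched in a few lines of Python. This is only an illustration of copying values from several published sources into one record per state; the variable names and every value below are made-up placeholders, not figures from the study, and the helper functions are hypothetical.

```python
# Each published source is represented as a dict keyed by state name.
# All values are made-up placeholders, NOT figures from the study.
right_to_work = {"Alabama": "yes", "Alaska": "no"}
percent_union = {"Alabama": 20.0, "Alaska": 25.0}
percent_urban = {"Alabama": 60.0}  # Alaska deliberately left out

SOURCES = {
    "right_to_work": right_to_work,
    "percent_union": percent_union,
    "percent_urban": percent_urban,
}


def compile_records(states, sources):
    """Build one record per state; use None where a source has no value."""
    return {
        state: {name: table.get(state) for name, table in sources.items()}
        for state in states
    }


def missing_values(records):
    """List (state, variable) pairs that still have to be looked up."""
    return [
        (state, variable)
        for state, record in records.items()
        for variable, value in record.items()
        if value is None
    ]


records = compile_records(["Alabama", "Alaska"], SOURCES)
print(records["Alabama"])
print(missing_values(records))
```

Keeping one record per unit of analysis, and listing the gaps explicitly, makes it easier to see which values must still be copied from another source.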
Compiling and copying aggregated data can be tedious. For some surveys, such as
the Decennial Census, summary machine-readable tapes are available.
Nevertheless, a researcher who relies on aggregated and summarized data is
limited in the analysis he can perform. For maximum flexibility investigators
require access to individual records for their analysis. They may get access in
one of three ways:
1. Through extracted files, containing a portion of the data
   a. a sample of cases, or
   b. a subset of variables on all cases
2. Through direct access to the database
   a. a purchase of the computer tape(s)
   b. a direct link allowing the investigator to access the database
3. Through an agreement for the database holder to perform the requested
manipulations
Of course, the investigator has no guarantee that he can access the database at
all, especially if it is held by a nonpublic agency. Furthermore,
confidentiality considerations may limit his access to the database or his
ability to perform certain tabulations.
Accessing
census data is relatively easy. The Census Bureau has a Data Users Division at
its headquarters and in its regional offices. A user can buy necessary tapes
from the division or he can obtain a list of private vendors who analyze census
data. Alternatively, a user may contact his state data center to learn how to
access tapes, to get copies of tabulations from tapes, to have special
tabulations performed or to have questions about the data answered. University
faculty and research centers may help in accessing and manipulating census
data. Planning schools and sociology departments, which teach demography,
often have faculty who regularly use census data.
Agency
policy and your affiliation with the agency affect whether you can access data
stored on an agency information system. Investigator goodwill largely
determines whether you can access individually held data; however, contractual
agreements may either guarantee or prohibit public access.
Accessing data involves more than locating the database and receiving
authorization to use it. The documentation accompanying the database affects the
ability of a researcher to access it, manipulate it, and interpret the results.
The documentation should include basic information such as the computer file
name, how the data are organized and structured, and the number of records.
After a database arrives at a research site its content and scaling should be
verified. To verify a database's content an analyst confirms that: (1) the
number of records on the computer tape conforms to the number indicated in the
documentation; (2) variables listed in the documentation are on the tape; (3)
summary statistics reported in the documentation can be replicated from the
tape. Any discrepancies should be resolved. The wrong database could have been
sent. Furthermore, the possibility that errors can be traced and resolved
decreases as time passes and people involved in the collection, documentation,
and compilation of the original database can no longer be reached.
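The three content checks can be sketched as a short Python routine. This is a minimal illustration, assuming the data arrive as a comma-separated file with a header row; the `DOCUMENTED` figures, the `RAW` file contents, and the function name `verify_content` are all hypothetical stand-ins for whatever the real documentation and file contain.

```python
import csv
import io
from statistics import mean

# Hypothetical documentation for a small file; real documentation would
# supply these figures.
DOCUMENTED = {
    "n_records": 4,
    "variables": ["id", "sex", "age"],
    "mean_age": 40.0,
}

# Made-up stand-in for the data file received from the database holder.
RAW = "id,sex,age\n1,1,30\n2,2,40\n3,2,50\n4,1,40\n"


def verify_content(raw_text, documented, tolerance=0.01):
    """Return a list of discrepancies between a file and its documentation."""
    reader = csv.DictReader(io.StringIO(raw_text))
    rows = list(reader)
    fields = reader.fieldnames or []
    problems = []
    # (1) The number of records matches the documented number.
    if len(rows) != documented["n_records"]:
        problems.append(
            f"record count is {len(rows)}, documentation says {documented['n_records']}"
        )
    # (2) Every documented variable is present in the file.
    missing = [name for name in documented["variables"] if name not in fields]
    if missing:
        problems.append(f"variables not in file: {missing}")
    # (3) A documented summary statistic can be replicated from the file.
    if rows and "age" in fields:
        observed = mean(float(row["age"]) for row in rows)
        if abs(observed - documented["mean_age"]) > tolerance:
            problems.append(
                f"mean age is {observed:.2f}, documentation says {documented['mean_age']:.2f}"
            )
    return problems


problems = verify_content(RAW, DOCUMENTED)
print(problems or "no discrepancies found")
```

An empty list means the file passed all three checks; any entry is a discrepancy to raise with the database holder while the people involved can still be reached.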
Next, the scaling should be verified. This may have been done as part of the
content verification, but if not, the researcher should review the variables of
interest or a sample of variables. He should check that codes appearing on the
computer printout conform to the codes found in the documentation. For example,
if the documentation indicates that males are coded as "1," females as "2," and
missing as "9," no other codes should appear on the printout. Also, the
frequency of each code should conform to the frequency found in the
documentation. For example, if the documentation states that 49% of the sample
was male, 50% female, and 1% unknown or missing, the printout should report the
same percentages for each category. We have found that discrepancies occur
because of errors in transferring the data into the computer file or
typographical errors in the documentation. Typographical errors may be
especially serious since an analyst may be manipulating the wrong variable or
assuming the wrong values for a variable.
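The code and frequency checks can also be sketched in Python. The codes and documented shares below come from the example in the text (1 = male, 2 = female, 9 = missing; 49%, 50%, and 1%); the column of observed values, the tolerance, and the function name `check_codes` are assumptions made for illustration.

```python
from collections import Counter

# Codebook shares from the example in the text.
DOCUMENTED_SHARES = {"1": 0.49, "2": 0.50, "9": 0.01}


def check_codes(values, documented_shares, tolerance=0.005):
    """Return (undocumented codes, codes whose share differs from the documentation)."""
    if not values:
        raise ValueError("no values to check")
    counts = Counter(values)
    total = len(values)
    # Codes that appear in the data but not in the documentation.
    undocumented = sorted(code for code in counts if code not in documented_shares)
    # Documented codes whose observed share is off by more than the tolerance.
    mismatched = {}
    for code, expected in documented_shares.items():
        observed = counts.get(code, 0) / total
        if abs(observed - expected) > tolerance:
            mismatched[code] = (observed, expected)
    return undocumented, mismatched


# Made-up column of 100 cases containing one stray code ("3").
sex = ["1"] * 49 + ["2"] * 49 + ["9"] * 1 + ["3"] * 1
undocumented, mismatched = check_codes(sex, DOCUMENTED_SHARES)
print("undocumented codes:", undocumented)
print("shares that differ:", mismatched)
```

In this made-up column the stray code takes one case away from the "2" category, so the check reports both an undocumented code and a frequency that no longer matches the documentation.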
If an analyst is unfamiliar with the database, or unsure about the quality of
the original survey or the accuracy of the documentation, he may wish to verify
the accuracy of the sample. The analyst may apply techniques to answer the
question, "Could the data on this tape have been produced by the methodology
described in the documentation?" The analyst may be especially concerned with
checking the nature of nonresponses and missing data. Sample
verification may be especially important when databases are being merged and
the study will be used to estimate parameters. In merging databases the number
of cases may be reduced, altering the representativeness of the original
samples. The techniques for sample verification can become quite complex. The
reader interested in learning more about how to verify a sample should consult
the article by Fortune and McBee1 and the references they cite.
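A first, very rough look at what merging does to a sample can be sketched as follows. This is not one of the verification techniques the cited article describes; it only counts how many cases survive a merge and compares the share of one code before and after. Both files, the identifiers, and the helper names are made up.

```python
# Two made-up files keyed by a shared case identifier.
survey_a = {
    101: {"sex": "1"},
    102: {"sex": "2"},
    103: {"sex": "1"},
    104: {"sex": "2"},
}
survey_b = {
    101: {"income": 18000},
    103: {"income": 25000},
    105: {"income": 31000},
}


def merge_on_id(a, b):
    """Keep only the cases present in both files (an inner merge)."""
    return {key: {**a[key], **b[key]} for key in a if key in b}


def share(records, variable, code):
    """Proportion of records whose variable equals the given code."""
    return sum(1 for r in records.values() if r[variable] == code) / len(records)


merged = merge_on_id(survey_a, survey_b)
retained = len(merged) / len(survey_a)
print(f"cases retained: {len(merged)} of {len(survey_a)} ({retained:.0%})")
print(f"share coded '1' before merge: {share(survey_a, 'sex', '1'):.0%}")
print(f"share coded '1' after merge:  {share(merged, 'sex', '1'):.0%}")
```

A large drop in cases, or a marked shift in the composition of the merged file, is a signal that the merged data may no longer represent the original samples.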
Evaluating the Database
Quantitative researchers are continually reminded about their vulnerability to
having the "tail wag the dog." They may unconsciously define a problem so that
they can solve it using their methodological skills. Similarly, they may
radically alter a problem to make it conform to available secondary data. In
working with secondary data some shifts in the original research question will
be necessary. Nevertheless, a researcher should determine the impact of the
shift and whether the change will undermine his ability to accomplish the
study's purpose.
In working with secondary data a researcher must first remember that the
secondary data can be no better than the research that produced them. Analytical
sophistication cannot compensate for poorly conceived measures and sloppy data
collection. Investigators may be wary of working with data accompanied by
haphazard documentation. To decide whether the secondary data meet his needs
the researcher needs to know the following:
1. What constituted the sample?
   a. What was the population?
   b. What was the sampling frame?
   c. What sampling strategy was used?
   d. What was the response rate?
2. When were the data collected?
3. How were the data collected?
4. How were the data coded and edited?
5. What were the operational definitions of measures?
Information on the sample population lets the researcher know whether the data
represent the population of interest. If the population of interest and the
database's population do not coincide, the researcher must decide the impact of
the difference. In the research on work stoppages the researchers decided to
settle on available state data, rather than collect local data; they reasoned
that state data would uncover patterns of association that could be followed up
on later.
The sampling frame, sampling strategy, and response rate all affect the
quality of the sample. Furthermore, a low response rate may render the data
inadequate for the researcher's purpose. He may also decide that a low response
rate indicates low research quality.
Knowing when the data were collected can be extremely important in correctly
interpreting some study findings. Consider school data. Achievement scores
gathered on 4th graders in October should be interpreted differently than
achievement scores gathered in May. Time can also affect public opinion data. If
you needed to examine the public's attitude toward space exploration, you would
want to know whether the data were gathered before the Challenger explosion,
shortly after the explosion, or well afterward.
Information on how the data were collected helps an investigator make inferences
about data quality. He wants to know if respondents were surveyed by mail,
telephone, or in person, and how interviewers were trained and supervised.
Information on data coding and editing procedures also relates to data quality.
Specifically, the investigator wants to know whether someone checked for errors
in coding data and transferring them into a computer. He also wants to learn how
miscellaneous responses and atypical answers were handled. In evaluating the
information the investigator uses his own judgment to decide whether the
evidence suggests sound research procedures.
Knowing
the operational definitions of measures allows the investigator to determine
the reliability, operational validity, and sensitivity of the measures.
Ideally, the documentation explains and evaluates the measures used. Studies
generated with the database may also provide evidence of the quality of the
measures. Finally, a copy of the instrument used to collect the data helps the
investigator reach his own conclusions.
Data
published by the Bureau of the Census are accompanied by fairly detailed
documentation. In Example 9.2 we have quoted sections of the
Example 9.2 Documenting Secondary Data: An Example from Current Population Reports
Situation: In March 1984 the Current Population Survey collected data on
households and persons who received specified noncash benefits in 1983. The
survey findings were reported in Current Population Reports, Series P-60, No.
148, Characteristics of Households and Persons Receiving Selected Noncash
Benefits: 1983 (Washington, D.C.: Government Printing Office, 1985).
Population: The CPS sample covers the civilian noninstitutional population of
the U.S. and Armed Forces members living off post or with their families on
post, but excludes all other members of the Armed Forces.
Sampling frame and strategy: Selected from 1970 census files and updated to
reflect new construction. Current CPS sample is located in 629 areas comprising
1,148 counties, independent cities, and minor civil divisions. Further details
are available in a referenced document.
Response rate: Approximately 62,200
occupied households were eligible for interview. About 3,100 occupied units
were visited but interviews were not obtained because the occupants were not
found at home after repeated calls or were unavailable for some other reason.
When data were collected: March 1984
How data were collected: As part of the CPS survey (readers
unfamiliar with CPS surveying procedures would have to find other
documentation).
How data were coded and edited: Not
discussed in the accompanying documentation; however, census editing and
processing procedures are well documented.
Data quality: Facsimile of questionnaire included in appendix. Conceptual and
operational definitions are provided; limitations of the data are discussed;
sources of nonsampling error are identified.
- Conceptual definition of noncash benefits: benefits received in a form other
than money that serve to enhance or improve the economic well-being of the
recipient.
- Operational definition (partial): data collection concentrated on benefits
that could be defined as public transfers or that could be categorized as
employer- or union-provided benefits to employees. The survey covered: the Food
Stamp Program....
- Data quality (underreporting): three aspects of underreporting: (1) failure to
report receipt of noncash benefit by type; (2) underreporting of the amount
received; (3) misclassification of the amount received. Independent sources
have been consulted to estimate underreporting, but estimates should be used
with caution because of: problems in obtaining comparable data; assumptions
made in adjusting data to CPS concepts; errors in the independent sources.
Other limitations noted: Definition of household membership may mean findings
do not always reflect true economic status of household; institutional
populations may receive studied noncash benefits, e.g., Medicaid; sources of
nonsampling errors.
Database access: Not explicitly
indicated. Last page of report advertises the Bureau of Census Catalog, which
includes availability of data on paper, tape, or microfiche, and lists Census
Bureau specialists, libraries and data centers.
appendices of an issue of Current Population Reports. The
purpose of the example is to show you what to look for in reviewing documentation.
In addition, if you conduct your own surveys you should consider how to
document them so that others can use them. The example suggests the type of
information you would want to include.
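The response-rate entry in Example 9.2 gives counts rather than a percentage, so a reader has to do the division. A minimal sketch follows, using the rounded figures quoted in the example (about 62,200 eligible occupied households and about 3,100 units without interviews) and assuming those noninterviews are the only nonresponse; the function name is hypothetical.

```python
def response_rate(eligible, noninterviews):
    """Share of eligible households for which an interview was obtained."""
    if eligible <= 0:
        raise ValueError("eligible must be positive")
    return (eligible - noninterviews) / eligible


# Approximate figures quoted in Example 9.2.
rate = response_rate(62_200, 3_100)
print(f"{rate:.1%}")  # prints 95.0%
```

Because the published counts are approximations, the result should be read as roughly 95 percent rather than as an exact rate.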
U.S. CENSUS DATA
The U.S. Census Bureau conducts periodic and special studies
to describe the characteristics of the American people, their governments, and
their businesses. The periodic studies include:
Current Population Survey (CPS)
Survey of Income and Program Participation (SIPP)
Decennial Census of Population and Housing
Census of Governments
Census of Agriculture
Economic Censuses
In this section we discuss the content, frequency, population, and primary use
of the Current Population Survey (CPS), the Survey of Income and Program
Participation (SIPP), and the Census of Population and Housing. As part of our
discussion we review Census Bureau procedures for interviewer selection and
training, data editing, and questionnaire content. The procedures vary
relatively little among census studies. The procedures result from the Bureau's
continuous efforts to get sound data from a large, geographically dispersed
sample.
To avoid overwhelming or boring you with specific details we divide up our
discussion of the data collection and compilation strategies. Our comments
should reinforce and add to your knowledge of data collection, and give you an
appreciation of Census Bureau procedures. The section on the CPS concentrates on
interviewer selection and training. The section on the SIPP considers data
editing procedures. The section on the Decennial Census examines issues of
questionnaire content.
The U.S. Bureau of the Census
The U.S. Bureau of the Census traces
its origins back to the Constitutional requirement that the U.S. population be
counted every ten years. The head count forms the basis for reapportioning the
number of seats a state holds in the U.S. House of Representatives. By law,
within nine months of "Census Day" the Census Bureau must give the
President a count of state populations so that congressional delegations can be
apportioned. Within a year state legislatures must receive population totals
for specified political subdivisions, so that they can divide up legislative
districts.
A hallmark of the Census Bureau has been its record for protecting the
confidentiality of the information it collects. The principle of
confidentiality has contributed to its success in getting people and businesses
to answer questions about themselves. Current law on confidentiality states:
Census data may be used only for statistical purposes.
Publication of census data must not enable a user to identify individuals or
establishments.
Only authorized employees of the Department of Commerce or Census Bureau may
examine individual census forms.
The Bureau releases its information on an individual only with that person's
specific consent. For example, some individuals have asked for census records
to corroborate their age and demonstrate their eligibility for Social Security
benefits. The availability of census data on computer tapes has complicated
the ability to maintain confidentiality. Census officials must anticipate how
users can manipulate data to uncover the identity of and private information on
specific individuals or establishments.
The Census Bureau does not escape data collection problems, but in general it
does as well as or better than other organizations that collect data. The
Bureau's resources, including its reputation, add to its advantage. In general,
people are more willing to respond to government requests for information;
consequently, census surveys have lower nonresponse rates than similar surveys
conducted by nongovernment agencies and researchers.
Current Population Survey (CPS)
The CPS is a monthly household survey to gather current population and
labor-force data. The U.S. Bureau of Labor Statistics releases the CPS data each
month in its report on the nation's employment and unemployment rates. The
Census Bureau analyzes the population data and reports them in Current
Population Reports.
The
data describe the personal characteristics of the labor force, including the
age distribution, race, and sex of American workers. Data on who works, who
works full-time, who works only part-time, and who is unemployed give us a
picture of who gets ahead or falls behind in the labor force. For instance, the
CPS reports separately the employment patterns for whites and blacks, for men
and women, for teenagers and adults, and for rural residents and urban
dwellers. What can be done with this information depends on one's
responsibilities. Interest groups, journalists, and legislators cite data to
document social problems or to advocate policy changes. Program managers,
especially in education and job training programs, consult the data to
structure programs to meet their clients' needs or simply to give their clients
accurate information and advice.
Planners
in programs that deliver services to specific age groups, such as school
children or the elderly, need current data on the population's age distribution
so that they can estimate demands for services. Business analysts examine the
data to identify population trends that may change the demand for products
and services; administrators can undertake similar studies to improve their
program planning or implementation.
CPS data are gathered monthly from in-person or telephone interviews; the
initial contact and first interview with a CPS household take place in person.
The CPS
sample consists of about 75,000 housing units. Interviewers collect data from approximately
80 percent of the sample units. Interviewers will find roughly 17 percent of
the units unoccupied, and will be unable to obtain interviews with an
additional 5 to 6 percent of the sample households. So the data usually
represent 60,000 households.
Census analysts construct the CPS sample so that it is representative of the
nation's population. Normally, one can estimate parameters for states and large
metropolitan areas from CPS. Occasionally, budget constraints have forced the
Bureau to limit the sample, so that investigators could accurately estimate
parameters only for the nation as a whole and the most populous states. Example
9.3 refreshes your memory on sampling and illustrates why you may be able to use
a given database to estimate parameters for a state but not its communities.
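The arithmetic behind Example 9.3 can be sketched with the usual large-sample formulas for a proportion: the standard error is the square root of p(1 - p)/n, and the 95% confidence interval is p plus or minus 1.96 standard errors. The sketch assumes simple random sampling, which is a simplification (the CPS uses a more complex design), and takes the four sample sizes from the example; the function name is hypothetical.

```python
import math


def proportion_interval(p, n, z=1.96):
    """Standard error and confidence interval for a sample proportion.

    Assumes simple random sampling; z = 1.96 gives a 95% interval.
    """
    se = math.sqrt(p * (1 - p) / n)
    return se, p - z * se, p + z * se


# Sample sizes used in Example 9.3.
SAMPLES = {"National": 50_000, "Regional": 11_000, "State": 1_700, "City": 85}

for name, n in SAMPLES.items():
    for p in (0.50, 0.75):
        se, low, high = proportion_interval(p, n)
        print(f"{name:<9} n={n:>6} p={p:.2f} SE={se:.4f} CI={low:.2%} to {high:.2%}")
```

The interval widens as the sample shrinks: with only 85 city households, a 50-50 split could plausibly be anywhere from roughly 39 to 61 percent, which is why the same database may serve for a state but not for its communities.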
Listed
below is an outline of the Census Bureau's requirements for CPS interviewers.3
The information on interviewer requirements and training gives you a sense of
the costs of mounting in-person interviewing efforts and the steps associated
with assuring interviewer consistency. The reliability of the data depends on
an interviewer's ability to identify and record the requested information
accurately.
Census Bureau basic requirements for
interviewers:
U.S. citizen.
At least 18 years old (or 16 years old with a high school diploma or
equivalent).
Able to read instructions and maps (documented by passing a test).
Able to do clerical work accurately (documented by passing a test).
Have an available automobile.
Be in good physical condition.
Able to work in all types of weather.
Able to attend training sessions.
Have a home telephone.
Available to work during the day, evening and Saturday.
Prior to beginning their jobs interviewers spend approximately one day on home
study followed by three days of classroom training. Classroom training includes
lectures, quizzes, and role-playing. After the interviewer completes training, a
supervisor accompanies the interviewer on his first assignment. Later, after the
interviewer has been on his own awhile, the supervisor again observes
Example 9.3 Estimating Parameters from CPS: A Hypothetical Example
Situation: Can an analyst use CPS data to estimate the proportion of a city's
households with a certain characteristic? What will be the amount of sampling
error if the characteristic is split 50-50? 75-25?
Sample      Size      SE       95% Confidence     SE       95% Confidence
                      50-50    Interval           75-25    Interval
National    50,000    .0023    49.55-50.45%       .0019    74.63-75.37%
Regional    11,000    .0048    49.06-50.93%       .0041    74.19-75.81%
State        1,700    .012     47.65-52.35%       .0105    72.94-77.06%
City            85    .054     39.37-60.63%       .047     65.79-84.20%
Discussion: The sampling error is larger for interval data, because of their
greater variability. In addition, this example assumes that the analyst is
manipulating data for all city households included in the subsample. The sample
sizes become considerably smaller, and the sampling error larger, when one tries
to estimate the distribution of the characteristic among a specific group, for
example, income of Hispanic families.
him. Within six months of the first
assignment the supervisor observes the interviewer a third time. The initial
training period ends when an interviewer's error rate, production level, and
completion rate fall within Bureau standards. Recall that the error rate
affects the reliability of the data, the production level affects the efficiency
of the survey effort, and the completion rate affects the representativeness of
the sample.
CPS
interviewers receive in-service training; typically, they attend refresher
courses. Their performance (error rates, productivity, and completion rate)
continues to be monitored and remedial training is required if their performance
is judged inferior. In addition, supervisors observe interviewers' performance
at least once a year.
Survey of Income and Program Participation (SIPP)
The growth of federal assistance programs brought about demands to assess their
impact. The Census Bureau and the Social Services Administration collaborated to
create SIPP to gather monthly data on Americans' income, employment, and
receipt of government assistance. The data should give an accurate picture of
the effect of federal assistance on recipients and on the level of federal
spending. With this information officials can estimate the impact of program
changes and respond to detrimental effects of changes. Specific questions SIPP
data should answer are:
How changes in eligibility requirements or the amount of benefits affect
recipients and the total amount of federal spending.
How excessive or inadequate are the combined benefits received by individuals
and families participating in several federal assistance programs
simultaneously.
Why changes in benefit status, employment, and household composition occur.
The Census Bureau began collecting
SIPP data in late 1983, and the survey was fully implemented in 1336. The survey
includes approximately 30,000 households. Each SIPP household participates in
the survey for 2.5 years, with an interviewer gathering household data once
every four months. Interviewers survey 25 percent of the SIPP households every
month.
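The 25 percent figure follows from the interview schedule. As a rough sketch, assume the 30,000 households are split evenly into four rotation groups, one group interviewed each month; the even split and the numbering of households are our assumptions, not a description of the Bureau's actual assignment method.

```python
# Figures from the text; the even four-way split is an assumption.
households = 30_000
months_between_interviews = 4

# Assign each household (numbered 0 to 29,999) to a rotation group 0-3.
rotation_group = {h: h % months_between_interviews for h in range(households)}

def households_interviewed(month):
    """Return how many households are due for an interview in a given month."""
    due_group = month % months_between_interviews
    return sum(1 for g in rotation_group.values() if g == due_group)

monthly_load = households_interviewed(month=1)
monthly_share = monthly_load / households
print(monthly_load, f"{monthly_share:.0%}")  # 7500 25%
```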
Each
of the Census Bureau's 12 regional offices controls the quality of the SIPP
data it collects and sends the data to Bureau headquarters. Quality control
procedures, summarized below, include: interview assignments, monitoring and
following up on data collection progress, reinterviewing of subjects, and
data entry.
Quality Control Procedures for Data Collection and
Preparation
SIPP Survey
Assignment Control
Make interview assignments.
Specify deadlines for completion of interviews.
Monitor progress toward completing interviews.
Interviewer Control
Monitor interviewer performance.
Review completed questionnaires for missing, incomplete, or
inconsistent answers.
Reinterview a sample of subjects.
Provide remedial training.
Provide in-service training.
Data Entry Control
Check sample of each operator's entries.
Assignment
control begins with assigning each interviewer an equivalent number of
interviews and requiring their completion within a specified time. After completing
an interview or determining that an interview cannot be conducted the
interviewer returns the questionnaire to the regional office. Supervisors track
interviewer progress and assist interviewers who fall behind.
Clerks
check questionnaires for missing, incomplete or inconsistent information. They
check all questionnaires returned by new interviewers or interviewers whose
performance has fallen below error standards. They review only selected items
on questionnaires returned by other interviewers. Clerks may use available
information, such as ZIP code directories, to complete some missing items. A
clerk may contact the interviewer to see if he can correct an error. If an
error cannot be corrected or a piece of data cannot be supplied the item is
left unanswered. The Census Bureau then uses an imputation procedure4 to
assign a value to the missing item(s).
To assure the quality of the data
collected the Bureau reinterviews a sample of subjects. Each month
approximately one out of six interviewers is randomly selected to have one-third
of their cases sampled and reinterviewed. Supervisors contact the sampled
cases and ask selected questions from the SIPP questionnaire. The supervisor
compares the reinterview answers with the original answers and determines the
reasons for observed variations. The Bureau fires any interviewer whom it
confirms has falsified data. If disparities result from interviewer error,
additional training and closer supervision may be arranged.
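The two-stage selection described above (about one interviewer in six, then one-third of that interviewer's cases) can be sketched as follows. This is a minimal illustration, not the Bureau's procedure: the function name, the data layout, and the rounding of fractional counts are all assumptions.

```python
import random

def draw_reinterview_sample(caseloads, rng):
    """Pick about 1 in 6 interviewers, then about 1/3 of each one's cases.

    caseloads maps an interviewer's name to a list of case identifiers.
    """
    interviewers = sorted(caseloads)
    n_interviewers = max(1, round(len(interviewers) / 6))
    chosen = rng.sample(interviewers, n_interviewers)

    sample = {}
    for name in chosen:
        cases = caseloads[name]
        n_cases = min(len(cases), max(1, round(len(cases) / 3)))
        sample[name] = rng.sample(cases, n_cases)
    return sample

# Hypothetical regional office: 12 interviewers with 30 cases each.
caseloads = {
    f"interviewer_{i}": [f"case_{i}_{j}" for j in range(30)] for i in range(12)
}
sample = draw_reinterview_sample(caseloads, random.Random(0))
for name, cases in sample.items():
    print(name, len(cases))
```

With these hypothetical figures, two interviewers are drawn and ten cases are reinterviewed for each.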
The reinterview process is not
foolproof. The person conducting the reinterview has a copy of the original
responses. He can copy the original responses or, if a subject gives an
ambiguous response, assume the correctness of the original answer. Better
control would be maintained if a third person, not the reinterviewer, saw the
original and reinterview questionnaires and compared them. The SIPP reinterview
is conducted solely to assure the quality of interviewers' work. The CPS uses a
portion of its reinterviews to determine the reliability of the data.5
The
final step in quality control is the monitoring of the data entry clerks'
performance. Data entry clerks are responsible for transferring data from SIPP
questionnaires into a computer file. One-sixth of a clerk's entries are checked
by a supervisor. If a clerk's error rate exceeds a certain level (in 1985
the minimum standard error rate was .043 percent) then all entries are
reviewed.
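The decision rule for data entry can be written out in a few lines. The sketch below assumes that the error rate is computed as errors found divided by entries checked and that the .043 percent figure is the threshold above which a full review is triggered; the function name and the example counts are hypothetical.

```python
def review_decision(entries_keyed, errors_in_sample,
                    standard_pct=0.043, check_fraction=1 / 6):
    """Return (entries checked, error rate in percent, full review needed?)."""
    checked = round(entries_keyed * check_fraction)
    error_pct = 100 * errors_in_sample / checked
    return checked, error_pct, error_pct > standard_pct

# Hypothetical clerk who keyed 60,000 entries.
print(review_decision(60_000, errors_in_sample=5))  # (10000, 0.05, True)
print(review_decision(60_000, errors_in_sample=4))  # (10000, 0.04, False)
```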
The Census of Population and Housing
The
mainstay of the Census Bureau is the Decennial Census of Population and
Housing, the constitutional reason for its existence. As we have suggested, counting
the population is neither an easy task nor one that is done with complete
accuracy. From the first census government officials added to the work of
enumerators and asked them to collect additional information about the
population. Thus, the Census Bureau has had to develop procedures to accurately
count the population and to determine what additional information to gather and
from whom to gather it.
The
Census Bureau, government officials, and researchers have given considerable
attention to how the census can obtain an accurate population count. The goal
of an accurate count has to be balanced against costs. The 1980 Census cost
over $1 billion, compared to $222 million for the 1970 census. The General
Accounting Office (GAO) studied how the Bureau could reduce costs and avoid a million
census in 1990. GAO singled out programs to reduce the number of uncounted
persons as the least cost-effective; in 1980 $342 million was budgeted for
programs to reduce the undercount. In its report to Congress GAO stated that
"Attempting to get a complete count is an impossible task that is becoming
increasingly costly and complex."6
Neither
improved counting procedures nor statistical procedures can solve the
undercount problem. The undercount goes beyond underestimating the total size
of the U.S. population. Certain groups, particularly urban minorities, are
more likely to be missed, and their need or demand for services may be
seriously underestimated. Urban areas with a sizeable uncounted population may
end up shortchanged in their number of legislative representatives or in the
size of their allotment of government program funds. You can infer the success
of the solutions tried to date from one bureau official's comment about solving
the undercount, "The courts have determined that the bureau is the only
agency that is qualified technically to do what apparently cannot be
done."
What
to include in the Census requires negotiation and evaluation. Other federal
agencies, state and local government users, interest groups, and individual
citizens suggest questions. To decide what additional questions to ask, census
officials assess whether the proposed information will serve a broad public interest.
Census officials approach the issue of the public interest by deciding whether
the information justifies the expenditure of public monies. The criterion of
public interest commonly leads to the elimination of questions primarily of
interest to businesses, such as information on the number of pet owners in the US.
The
Census Bureau, like other surveyors, wants to avoid overburdening respondents.
It limits the number and content of questions to what its staff believe a
respondent can complete within a reasonable amount of time. The Decennial
Census has two forms. One form asks all households no more than two facing
pages of information; this form is referred to as the "100 percent count." A
sample of approximately 20 percent of all households receives a longer form.
The longer form asks the same questions as the 100 percent count, plus 1.5 pages
of housing information and 2 additional pages for information on each
household member. Thus, space limits the total number of questions asked.8
The
100 percent count asks a member of a household to give each household
member's: name, relationship to the respondent, sex, race, marital status, age,
and if applicable, Hispanic origin. In 1980 the housing portion of the 100
percent count covered 12 topics. Some of the housing questions helped the
Census Bureau identify missing households. Other questions, analyzed either
singly or in combination with other items, measured housing quality, overcrowding,
and the proportion of owner-occupied and rental property. In asking the
population and housing questions, the Bureau considers historical
comparability. If a question is reworded or response categories altered, the
change affects the answers. The Bureau must weigh whether the benefit of a
change balances the loss of comparability of information from one census to
another.
The
history of the race and ethnicity questions, as shown in Example 9.4, suggests
the trade-off between consistency and the need to make question changes so that
they reflect current social conditions. From this abbreviated summary you can
infer the social conditions that gave rise to the particular data gathering
strategy and the problems in comparing racial or ethnic changes from one census
to the next. For example, note that until 1960 racial classification was not
based on self-identification; rather by asking specific questions on parentage
or by observation census takers decided a household's race. In 1960 and 1970
households who received a mailed form identified their own race, but if a census
enumerator collected data he or she decided,
Example 9.4 Comparability Among Censuses
Summary of the Recent History of Racial and Ethnic Questions
1920: Enumerator decided appropriate category.
Categories: White, Negro, Mulatto (Black-White mixture), Chinese,
Japanese, Indian, Other
1930: Mulatto category dropped; a person identified as Mulatto counted as Negro.
1940: Mexican (Mexican birth or Mexican parents) added; persons who
qualified as Mexican but who were identifiable as Negro, Chinese,
Japanese, or Indian were no longer counted as Mexican.
1960: Self-identification of race on mailed census forms. If data
collected by an enumerator, the enumerator observed and filled in the racial
data.
1970: Combination of self-identification and enumerator observation continued.
Categories: White, Black or Negro, Japanese, Chinese, Filipino, Korean,
Vietnamese, Indian (Amer.), Asian Indian, Hawaiian, Guamanian, Samoan, Eskimo,
Aleut, Other
1980: Question added to 100 percent count: "Is this person of Spanish/Hispanic
origin or descent?" Categories: No (not Spanish/Hispanic); Yes, Mexican-Amer.;
Yes, Chicano; Yes, Puerto Rican; Yes, Cuban; Yes, Other
SOURCE: C. F. Citro and M. L. Cohen (eds.),
THE BICENTENNIAL CENSUS: NEW DIRECTIONS FOR METHODOLOGY IN 1990 (Washington,
D.C.: National Academy Press, 1985), pp. 205-214.
based on observation, the appropriate
racial category. Also, in 1980 respondents in all parts of the country could
identify themselves as Eskimos or Aleuts; previously, these groups were listed only
on census forms distributed in Alaska.
In
1980 the race question included 15 categories, mixing traditional concepts of
race with ethnic or geographic identities. The race and ethnic categories can
be combined or aggregated to conform to the Office of Management and Budget's
standard categories for federal agencies collecting racial or ethnic data. The
standard combined categories are: white (not Hispanic), black (not Hispanic),
Hispanic, American Indian or Alaskan Native, Asian or Pacific Islander.
The
1980 question, which did not include the word "race" or
"color," resulted in an increased number of people who checked
themselves as "other." Various respondents looked in vain for their
nationality on the list. For example, a Thai household could have reasonably
expected to see "Thai" listed as a possible response. Since 1980 new
groups of refugees have entered the country; their presence suggests that the
racial list could become even more unwieldy. In 1990 the Bureau is considering
asking one question to gather both the general racial and specific Hispanic
information. Responses representing the standardized federal combined
categorizations may appear on the shorter form. A more comprehensive list of
racial and ethnic groups would be on the longer form. The Bureau plans to
phrase and aggregate the responses so that investigators can compare 1980 and
1990 data.
The
longer form asks for information that the Census Bureau decides is important,
but unnecessary to count precisely. The development of a longer form allows the
census to reduce the costs of data collection. First, the short form information
can be processed earlier to meet deadlines for releasing data. Second, the
short form relieves roughly 80 percent of the respondents from a more
burdensome questionnaire. Third, the Census Bureau can trade off the problems
of missing data, associated with longer questionnaires, with the desirability
of gathering more information. In 1980 the longer form asked questions about
education, language, place of birth, previous residence, employment, and
income. The housing questions gathered more complete data on the physical
nature of the housing stock (including mobile homes and boats), accessibility
to physically disabled persons, energy needs, residential stability, and
housing quality and adequacy.
Approximately
five years before "Census Day" a series of census pretests begin.
The pretests gather data on diverse aspects of the census process. The 1986
pretest included assessments of automation procedures, follow-up with nonrespondents,
questionnaire design, and enumerator selection and hiring policies.8 The
pretest of the questionnaire examined alternative formats for the racial and
ethnic questions. The pretest also tried shifting some housing questions to a
"knowledgeable" respondent. For example, a resident manager was
asked for some housing data, such as the age of the building and number of
units, rather than asking renters to supply these data.
The
Bureau conducts post-census evaluations of the census coverage and content. The
evaluation findings indicate data quality and suggest future changes.
Post-census evaluations largely consist of information supplied by respondents,
who are reinterviewed. The Bureau also checks administrative records. For
example, public utilities records have been compared with respondents' reports
of utility expenditures. Medicare, income tax returns, and similar governmental
records have been used to improve the accuracy of the population count.
Reinterviews
and administrative records have their flaws. Reinterviewing has some of the
problems associated with using test-retest to estimate item reliability. First,
answers between testings may not change because respondents remember and
duplicate their answers, or interviewers, who know the original answers, fit
ambiguous answers into the originally selected categories. Second, answers
between interviews may change because a household member other than the person
who originally answered the census may be reinterviewed.
The
value of administrative records largely depends on the accuracy of the records
themselves and the ability to correctly match administrative records with a
completed census form. Some analysts have suggested that the Census Bureau
undertake long-range research to investigate substituting administrative
records for some housing items, such as age of structure. The investigators
argue that consulting records to get housing information will improve the
data's accuracy, reduce respondent burden, and will not involve confidential
records.
Uses of Census Data
The Census Bureau aggregates and
releases data from the decennial census by political and statistical areas. The
political areas are states, counties, minor civil divisions (such as townships
and New England towns), and incorporated areas. Statistical areas have been
created to describe relatively homogeneous and functionally integrated areas.
The smallest statistical area is the block, which is defined as a well-defined
piece of land bounded by a street, railroad track or similar physical feature.
At the block level the Census Bureau releases 100 percent data, but not sampled
data. Sampling errors make the longer questionnaire data too variable for
accurate interpretation at the block level.
Traditionally,
census data have been publicly available as macrodata, that is, aggregated by
political or statistical area. Beginning with the 1960 census the Census Bureau
compiled samples of individual records for public use; since then analogous
samples from the 1940 and 1950 censuses have been drawn and released. These
samples of individuals are referred to as microdata samples. Microdata samples
are systematic, stratified samples of long questionnaires. A microdata record
includes the responses reported to the Census Bureau except geographic
locations or data on very small and visible population subgroups. The
geographic and subgroup data are eliminated to protect the confidentiality of
respondents.
Macrodata
are available in books, on microfiche, and on computer tapes. Microdata are available
on computer tapes. Major statistical software can handle the structure of the
census data, but the computer user should be prepared to spend some time
learning how to access and manipulate the data.
The
most obvious uses of the Decennial data are for legislative reapportionment,
funding allocations, and policy decisions. The national, state, and local
government data give politicians, administrators, and journalists a snapshot
of the nation's population on one day. The population can be viewed
cross-sectionally or longitudinally. Cross-sectional studies look at the patterns among
variables in the census dataset; longitudinal studies trace change from one
census to another.
Census
data may be viewed longitudinally to evaluate public policies. For example,
investigators can compare data from two adjacent censuses to see whether
state policies to improve the housing stock reached their objective.
Investigators can also check to see whether housing improvements in targeted
communities were offset by deterioration of the housing stock in other
communities. One way to compare two communities would be to compare the
percentage of change in standard housing between the two censuses for each
community studied.
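The comparison just described amounts to a percentage change computed separately for each community. The counts of standard housing units below are invented for illustration; only the calculation itself comes from the text.

```python
def percent_change(earlier, later):
    """Percentage change from the earlier census count to the later one."""
    return 100 * (later - earlier) / earlier

# Hypothetical counts of standard housing units: (earlier census, later census).
standard_units = {
    "Targeted community": (1200, 1380),
    "Comparison community": (900, 855),
}

changes = {name: percent_change(*counts) for name, counts in standard_units.items()}
for name, change in changes.items():
    print(f"{name}: {change:+.1f} percent")
```

In this made-up case the targeted community gains 15 percent while the comparison community loses 5 percent, the kind of offsetting pattern an investigator would want to examine further.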
An
important characteristic of census data is its coverage of small geographic
areas. As we implied in our discussion of the CPS, at best national surveys
estimate population characteristics of individual states and large metropolitan
areas. Example 9.5 illustrates how a community can use census data to select
an area for a pilot project serving impoverished preschoolers.
Example 9.5 Using Census Data on Census
Tracts To Select a Site For a Program
Situation: A local human services
program plans to launch a pilot program for impoverished preschoolers. The
program will combine a full day nursery school program with nutritional and
health services. A human services analyst has reviewed data on city census
tracts to identify communities where the program could be located; see Table
9.1.
                                          Census Tract
                                     A      B      C      D      E
Population                       4,390  3,458  2,670  5,262  3,147
Below poverty level,               151    114     91    173     41
  less than six years old
Number of related children          77     51     56     63     12
  less than six years old in a
  household with no husband
Percentage population below         28     25     29     21     16
  poverty
Census tract: statistical
subdivisions within a metropolitan area; have an average population of 4,000;
should be relatively homogeneous with respect to population characteristics,
economic status, and living conditions; must not cross county lines.
Decision: Look for a site in Census
Tracts A or D. Indicators for both communities show a high level of need:
number of young children living in poverty; number of young children living in
a female-headed household; large percentage of population below poverty level.
Census Tract E is an inappropriate site for the community project, because of
small number of children in target population and a distinctly lower level of
need evidenced by indicators.
In practice, information other than
census data will determine the final siting decision; however, census data
distinguish between appropriate and inappropriate locations. Furthermore,
census data may suggest services needed by the community.
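A screening of this kind is easy to reproduce once the tract figures are in a file. The sketch below uses the Table 9.1 figures as we read them and applies one possible rule: keep the tracts that rank in the top two on both counts of young children. The rule and the variable names are our assumptions; the analyst in the example may have weighed the indicators differently.

```python
# Figures from Table 9.1; the key names are our own labels for its rows.
tracts = {
    "A": {"population": 4390, "poor_under_six": 151, "under_six_no_husband": 77, "pct_below_poverty": 28},
    "B": {"population": 3458, "poor_under_six": 114, "under_six_no_husband": 51, "pct_below_poverty": 25},
    "C": {"population": 2670, "poor_under_six": 91, "under_six_no_husband": 56, "pct_below_poverty": 29},
    "D": {"population": 5262, "poor_under_six": 173, "under_six_no_husband": 63, "pct_below_poverty": 21},
    "E": {"population": 3147, "poor_under_six": 41, "under_six_no_husband": 12, "pct_below_poverty": 16},
}

def rank_tracts(indicator):
    """Return tract labels ordered from highest to lowest on one indicator."""
    return sorted(tracts, key=lambda t: tracts[t][indicator], reverse=True)

by_poor_children = rank_tracts("poor_under_six")
by_no_husband = rank_tracts("under_six_no_husband")

# Illustrative rule: a candidate must be in the top two on both counts.
candidates = sorted(set(by_poor_children[:2]) & set(by_no_husband[:2]))
print(candidates)  # ['A', 'D']
```

Tract E ranks last on every indicator, which matches the decision to rule it out.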
Administrators
can aggregate census data, so that the data describe the appropriate service
area. For example, city administrators can add together blocks to describe city
school districts, police precincts or fire departments. Then, the
administrators can examine the number of people in the service area, the
population density, and its characteristics. With this information they can
assign personnel, make budget decisions, and redraw service areas.
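Adding blocks together to describe a service area might look like the sketch below. The block numbers, precinct names, populations, and acreages are all hypothetical; an actual analysis would start from the Census Bureau's block-level files and a local list showing which blocks fall in which service area.

```python
from collections import defaultdict

# Hypothetical block records with the precinct each block belongs to.
blocks = [
    {"block": "101", "precinct": "North", "population": 120, "acres": 12},
    {"block": "102", "precinct": "North", "population": 80, "acres": 20},
    {"block": "201", "precinct": "South", "population": 300, "acres": 25},
    {"block": "202", "precinct": "South", "population": 150, "acres": 20},
]

# Sum population and land area for every precinct.
totals = defaultdict(lambda: {"population": 0, "acres": 0})
for record in blocks:
    area = totals[record["precinct"]]
    area["population"] += record["population"]
    area["acres"] += record["acres"]

# Population density (persons per acre) for each service area.
density = {name: t["population"] / t["acres"] for name, t in totals.items()}
for name in totals:
    print(name, totals[name]["population"], density[name])
```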
Think
for a minute about the importance of comparability and consistency. A
community can survey its population, but to do it within a short time period
normally requires a mammoth effort. Nevertheless, data collection spread over a
month or two may be adequate for community decision makers. If the
investigators want to compare their findings with a neighboring community they
will be stymied unless their neighbors have collected the same information, in
the same way, at approximately the same time.
Census
data are important components of demographic analysis, such as studies of
patterns of fertility, mortality, and the population's age distribution. Public
services rely on demographic data to help with planning. Consider school
planning. A community benefits by advanced notice that its school capacity is
greater or lesser than the anticipated number of children. If school capacity
is too great for future demand, the community can plan to decrease the number
of teachers, classrooms, and school buildings. If school capacity is below
future demand, the community can plan for orderly and efficient acquisition of
the needed resources. In recent years communities have turned their attention
toward the needs of their aging population. Communities with an increasing
proportion of elderly persons may experience a marked change in the demand for
services, for example, a need for more nursing homes and specialized medical
care.
An
obvious problem with decennial data is its timeliness. Over the course of a
decade the counts become less and less accurate. Population changes at the
block level can be rapid and dramatic. Within a matter of months vacant lots,
fields, and wooded areas may be replaced by residential housing. Conversely,
marginal housing may be condemned and disappear. Mid-decade censuses have been
suggested to update population estimates. A mid-decade census was planned for
1985, but never funded. Given current budgetary constraints the probability of
a mid-decade census in 1995 seems remote.
The
Census Bureau periodically develops post-census estimates of state populations.
The Bureau adjusts the state's census population with data on state births,
deaths, and migration. Birth and death data are obtained from the state office
of vital statistics. Migration data are estimated by "symptomatic
indicators," for example, unexpected changes in school enrollments.
One state updates its census data by annually enumerating the following data
for each county.11
school enrollment, grades 1-8
births and deaths by race
auto and truck registration
enrollment in Medicare
population of institutions with 200 or more residents
population of major military bases
The derived data are then entered
into a formula to arrive at current estimate of the population for each county.
A political subdivision of a county may estimate its population by assuming
that its rate of population change is identical to the county's. For example,
if Smith County has an estimated population decrease of 6 percent since the last
census, a planner in Smithville, a town in Smith County, may adjust the town's
most recent census figure downward by 6 percent. Alternatively, the planner
may consult available records on town births and deaths, school enrollments,
motor vehicle registration, and housing units to estimate the town population.
The specific details of population estimation are well beyond the scope of this
text. Nevertheless, we want you to be aware of this limitation in census data.
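The Smithville adjustment is a one-line calculation. In the sketch below the town's census count of 8,000 is a hypothetical figure; only the 6 percent county decrease comes from the example.

```python
def adjust_by_county_rate(census_count, county_change_pct):
    """Apply the county's percentage change to a town's census count."""
    return census_count * (100 + county_change_pct) / 100

# Hypothetical census count for Smithville; county change of -6 percent.
smithville_estimate = adjust_by_county_rate(8_000, -6)
print(smithville_estimate)  # 7520.0
```

The estimate is only as good as the assumption that the town changed at the same rate as the county as a whole.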
We
have only scratched the surface in our description of census data and their
uses. We have not considered data gathered in the census of governments nor
the economic censuses; these censuses are conducted every five years. Nor have
we mentioned all the products derived from the Decennial Census. Similarly, we
have ignored other important federal statistics and statistical agencies.
Federal agencies regularly gather statistics describing the nation's health,
education, agriculture, crime, and criminal justice systems.
We
had three reasons for writing at length about census data. First, we wanted to
alert you to their availability. Second, the Bureau of the Census continually
appraises its data collection and compilation procedures; thus, it is an
important source of information on current developments in survey research
methodology. Third, as we discuss later in this chapter, the information
accompanying Census Bureau data serves as a model for documenting primary data
so researchers can decide whether the data are suited to their needs.
VITAL STATISTICS
Vital records are another secondary
data source, used by investigators from different disciplines with diverse
interests. Vital records and the resulting vital statistics give information on
births, deaths, marriages, divorces, abortions, communicable diseases, and
hospitalizations. Federal, state, and local government agencies cooperate to
collect, compile, and report vital statistics. Investigators use vital
statistics to assess the state of a community's mental and physical health.
Policy makers can examine vital statistics to evaluate the effectiveness of current
programs, change policies or programs to better meet existing needs, and
forecast future needs.
Typically,
a county official collects data required by state statute and reports them to
the state. Hospital administrators, physicians, funeral directors, and medical
examiners may collect the actual data. In most states the state health
department maintains vital records and releases them to the public in printed
form or on computer tapes after removing information that identifies
individuals. State offices of vital statistics also issue periodic statistical
reports describing the health of the state and its communities.
The
state data are forwarded to the National Center for Health Statistics, which
microfilms each state's records. The Center publishes U.S. vital statistics
reports and provides technical assistance to state agencies and other data
users.
The
information gathered on a live birth illustrates the extensive information
included in vital records:
Where birth occurred
Institution of delivery
Mother's residency
Mother's marital status
Mother's race
Mother's total pregnancies
Mother's previous number of live births
Mother's previous number of fetal deaths
Date of mother's last live birth
Date of mother's last fetal death
Outcome of mother's last delivery
Mother's number of previous children still living
Prenatal care
Baby's Apgar score (a medical rating scale done at birth)
Complications with this pregnancy
Congenital malformations
At
first vital records may appear objective, and a user may not doubt the quality
of the data. Actually, the accuracy of any one vital record is subject to many
possible errors. Think of distortions that can occur in the information on a
live birth. For example, information on previous pregnancies may be misreported.
Fear of censure may cause a woman not to tell her physician or midwife about
previous pregnancies that resulted in an abortion, miscarriage or adoption.
Many women may not have recognized miscarriages early in a pregnancy. Added to
the problem of misreporting are possible errors in recording the data and
different standards in diagnoses; considering these factors you can begin to
appreciate the difficulties of maintaining data quality.
Similarly,
social values affect death reports. Vital statistics on causes of death are
obtained from death certificates and are coded according to an international
code. Consider the current problems in getting accurate data on deaths from
AIDS. As we are writing, the public considers AIDS a disease that is largely
confined to homosexual males and intravenous drug abusers. AIDS
patients and their families fear the social stigma attached to the
disease. Physicians sensitive to the feelings of patients and their
families have admitted to indicating cancer or another related disease on the
death certificate rather than AIDS. Similarly, accurate reporting of suicide
varies widely. Societies that consider suicide a shameful act are likely to
underreport its occurrence. Depending on community mores, then, a physician may
decide to attribute a death to diseases or events that cause a family less embarrassment.
SUMMARY
Secondary data are existing data that
investigators collected for a purpose other than the given research study.
Secondary data can be inexpensive, high quality data adequate to define or
solve a problem. Analyzing an existing database requires fewer resources than
collecting original data. Some databases have higher quality data than a
researcher can hope to achieve. Government sponsored surveys usually have a
higher response rate than nongovernment surveys. Organizations that specialize
in collecting data typically have well-trained, professional staff: to check
the reliability and operational validity of measures; to design, implement, and
document a sound sampling procedure; to collect and compile data.
One
may also argue that secondary analysis contributes to the quality of primary
databases. First, secondary analysis requires that the original researchers
fully document a database. Second, secondary analysis enables researchers to
see whether they can replicate the original researcher's findings. Both the
need to document and the ability to check findings will encourage researchers
to attend to research quality.
As
documentation becomes routine and database inventories are created,
investigators may increasingly turn to existing data to conduct preliminary
research and to hone their research questions. Whether or not secondary data
are appropriate for the final research question depends on both the question
and the data. As investigators work on a research problem they begin to
understand what population and what measures are needed to answer their
questions. Sometimes they may modify the question so that it is consistent with
the existing database. Such adjustments should be made only after the
investigators fully consider what is lost in making such a shift.
The U.S.
Census Bureau is a major source of data on the country's population,
governments, and businesses. In addition to its regular censuses the Bureau
conducts the Current Population Survey and the Survey of Income and Program
Participation. The surveys give current information on the economic well-being
of the American population, which can help in program planning and evaluation.
Investigators
find census data valuable because of their content and the quality of the data
collection. In addition, the Census Bureau is an important source of
information on various survey research issues, ranging from questionnaire
design to computerization and confidentiality.
Other
Federal bureaus, state offices of vital statistics, survey organizations, and
professional associations routinely collect data that others study. Virtually
any organization is a potential source of data as are individual researchers.
ASI and SRI may help you locate existing data. The organizations and surveys
listed in Appendix A may also lead you to an appropriate database.
Occasionally, by asking agency personnel, individual investigators may uncover
a fugitive or unknown database.
Once
investigators locate a database they have to find a way to access it. Some
questions can be answered by working with published statistics, but often one
needs access to the database. Depending on who holds the database, and the contractual
provisions for releasing data, a researcher may either access the database, or
a portion of it, directly or have the database holder perform the analysis.
Nevertheless, researchers cannot assume that access to a database is
guaranteed. Organizational policies, contractual guarantees, and researcher
inclination may become important factors in any agreement to allow someone to
access data.
If
access is obtained the investigators need to verify the content of the
database. Occasionally, the wrong
database is accessed or incorrect information about a variable and its coding
is sent. The researchers also need to review information on the sample, the
measures, when the data were collected, how they were collected, and coding
procedures to infer data quality.
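One simple way to verify content is to compare the category counts found in the file with the counts stated in the documentation. The sketch below is only an illustration under stated assumptions: the function name compare_counts, the variable name "position", and all counts are hypothetical and do not come from any actual database.

```python
from collections import Counter


def compare_counts(records, variable, documented_counts):
    """Return {code: (observed, documented)} for each code whose counts differ."""
    # Tally how often each code of the variable appears in the file.
    observed = Counter(row[variable] for row in records)
    # Check every code seen in either the file or the documentation.
    codes = set(observed) | set(documented_counts)
    return {
        code: (observed.get(code, 0), documented_counts.get(code, 0))
        for code in codes
        if observed.get(code, 0) != documented_counts.get(code, 0)
    }


# Hypothetical file contents and hypothetical documentation (assumptions).
records = [{"position": "manager"}] * 70 + [{"position": "nonmanager"}] * 200
documented = {"manager": 60, "nonmanager": 210}

mismatches = compare_counts(records, "position", documented)
for code, (in_file, in_docs) in sorted(mismatches.items()):
    print(f"{code}: file has {in_file}, documentation says {in_docs}")
```

A mismatch does not show which source is wrong. It only signals that the investigators should ask the database holder whether the wrong file, or incorrect coding information, was sent.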
Chapter 10 examines techniques to
combine indicators to form a single measure. Data are collected on each
indicator and stored as separate variables in a database. Investigators may,
and often do, analyze each variable separately. Nevertheless, as you should
observe, combining variables can give a more accurate, fuller picture of the
phenomenon under consideration. Examining a long list of single variables can
easily mislead a busy decision maker, who may be mistakenly impressed by one or
two striking findings.
NOTES
1. J. C. Fortune and J. K. McBee,
"Considerations and Methodology for the Preparation of Data Files," in
Secondary Analysis of Available Data Bases, edited by D. J. Bowering (San Francisco:
Jossey-Bass, New Directions for Program Evaluation, 1984).
2. For a further
discussion of confidentiality see: C. P. Kaplan and T. L. Van Valey, Census
'80: Continuing the Factfinder Tradition (Washington, D.C.: U.S. Bureau of the
Census, 1980), pp. 65-79; "Plenary Session V. Confidentiality Issues in
Federal Statistics," First Annual Research Conference Proceedings (Washington,
D.C.: U.S. Bureau of the Census, 1985), pp. 199-233. Summary of current law is
based on material found on p. 71 of Census '80.
3. Material on interviewer selection and training
is from Statistical Surveys: Census Bureau has Creditable Employment and
Economic Data-collection Procedures (Washington: U.S. General Accounting
Office, GAO/IMTEC-86-8, 1986), pp. 10-13.
4. For a detailed discussion on imputation see
"Concurrent Session IX. Nonresponse Adjustment Procedures in Sample Surveys,"
First Annual Research Conference Proceedings, pp. 421-470.
5. Statistical
Surveys, pp. 14-19, discusses quality control procedures.
6. A $4 Billion
Census in 1990? Timely Decisions on Alternatives to 1980 Procedures Can Save
Millions (Washington, D.C.: General Accounting Office, GGD-82-13), p. ii.
7. I. I. Mitroff, R. O.
Mason, and V. P. Barabba, The 1980 Census: Policy-making Amid Turbulence
(Lexington, MA: Lexington Books, 1983), p. 48, quotes D. Levine, Deputy
Director of the U.S. Bureau of the Census. The book details the political, legal,
and statistical aspects of the undercount. The problem of the undercount and
recommendations for 1990 are found in The Bicentennial Census: New Directions
for Methodology in 1990, edited by C. F. Citro and M. L. Cohen (Washington:
National Academy Press, 1985), chap. 5.
8. The Bicentennial
Census, p. 49.
9. Ibid., pp. 104-114, discusses the pretesting program for the 1990
census.
10. For a full description
and evaluation of estimation methodology see Estimating Population and Income of
Small Areas (Washington, D.C.: National Academy Press, 1980), pp. 12-19.
11. "Problems
with Population Bases," SCHS Statistical Primer, Vol. 1, no. 2 (Raleigh:
North Carolina State Center for Health Statistics, 1980).
TERMS FOR REVIEW
After reading this chapter you should be able to explain the
following terms:
secondary data
100 percent count
Current Population Surveys
macrodata
Survey of Income and Program Participation
microdata
Decennial Census of Population and Housing
vital statistics
census undercount
QUESTIONS FOR REVIEW
The following questions should
indicate whether you have a basic competency in this chapter's material.
1. Discuss
the importance of locating and using secondary data. When would you recommend
collecting original data instead of relying on secondary data?
2. A regional
agency plans to study the relationship between highway features and accident
rates.
a. Briefly describe how you would go about
locating existing databases.
b. Briefly describe how you would decide
whether existing databases were adequate for the planned study.
c. Assume that you have obtained a computerized copy of a database.
What information would you need to be able to use and interpret the data?
3. You
analyze a computerized database. You examine it and note that it reports 70
managers and 200 nonmanagers. The documentation indicates the data represent
60 managers and 210 nonmanagers. What would you do?
4. Compare and contrast the data
collection procedures for the Current Population Survey and the Survey of
Income and Program Participation.
5. Why is an undercount of the
population during the Decennial Census treated as a serious problem?
6. In conducting a
survey regularly, such as the Decennial Census, what are the trade-offs between
changing a question's wording or its responses and keeping the wording the
same?
7. Why do local government planners consider
census data important?
8. Briefly contrast the two forms used in the
Decennial Census (the 100 percent count and the long form). Defend the use of
two forms as opposed to asking everyone all the questions included on the long
form.
9. What is the value of reinterviewing survey
respondents? What procedures would you recommend for reinterviewing
respondents?
10. What is the purpose
of post-census evaluations?
PROBLEMS FOR HOMEWORK AND DISCUSSION
1. The legislature of your state has funded a
limited project to aid the poorest areas in your state in economic development.
You work for the agency that will implement and oversee the project monies. You
have been asked to select no more than six counties or cities to receive
project funds. The primary criterion for selection will be median family
income. Based on information in the County and City Data Book:
a. Select the counties or municipalities you
would recommend for project funding.
b. For the recommended governments also present
data on:
(1) income
(2) percentage of adults who are high school graduates
(3) labor force characteristics
c. Use all the information and write a short memo
justifying your choices.
d. Compare your recommendations with your
classmates'. Can your class as a whole reach a consensus on which communities
should receive funding?
2. Refer to Example 9.5. In Census Tract A:
53 percent of the adults are high school graduates
100 percent of the population over 5 years of age speak English
1,105 rent housing
median family income of female householder with no husband present and children under 18, $8,774
median family income in census tract, $12,580
a. If the center were to act as a clearinghouse
of information to the entire community as part of an outreach effort, would you
recommend it start with information on tenants' rights, high school equivalency
requirements and resources, or English language resources?
b. Based on the above data what types of
eligibility requirements would you initially suggest to make sure that the
children needing the services the most will receive them? (Clearly your
recommendations will be tentative and they will be refined as you become more
familiar with the community and its needs.)
3. Go to ASI or SRI and find recent data on one
of the following topics or a topic assigned by your instructor:
industrial accidents
foreign languages studied by elementary school children
infant mortality
public employee benefits
factory automation
number of graduates of MPA programs
starting salaries of college graduates
(For the data, examine and evaluate sample documentation.)
4. Write a procedure for your own use outlining
how to document a study that you might conduct.
5. Imagine that you work in a large agency where
the agency as a whole and its various subdivisions regularly conduct studies.
Would you recommend that the agency create an archive containing collected
data? Justify your answer. What criteria would you recommend for deciding what
studies to include in the archive? What documentation would you require for
each archived study?
6. A local government conducts a citizen survey
every two years. High school juniors and seniors interview a sample of
citizens. Outline your recommended training program for the students. Would you
suggest reinterviewing some of the respondents? Explain. Briefly describe the
reinterviewing procedures you would use.
7. A Class Project: Update the data in the Perry
and Berkes study (see Example 9.4). (This requires a relatively large class and
probably a TA since data should be gathered on all 50 states and put into a
computerized database. After reading Chapter 10 the class can try its hand at
replicating Perry and Berkes's analysis and reaching its own conclusions.)
a. Decide on which variables to gather data.
b. Decide for which years to gather data.
RECOMMENDED FOR FURTHER READING
The U.S. Census Bureau is an
excellent source of information on the Decennial Census, other censuses, their
use and methodology. For current materials, see the Bureau of the Census
Catalogue. Census '80, published by the Bureau in 1980, gives an overview of
the history of the Decennial Census and has essays on its use by demographers,
geographers, planners, and
businesses.
For information on secondary analysis
see Secondary Analysis of Available Data Bases, edited by D. J. Bowering (San
Francisco: Jossey-Bass, New Directions for Program Evaluation, 1984). This
collection includes Fortune and McBee's detailed essay on merging and verifying
databases.