Multiple Imputation After 18+ Years

Donald B. RUBIN

Multiple imputation was designed to handle the problem of missing data in public-use data bases where the data-base constructor
and the ultimate user are distinct entities. The objective is valid frequency inference for ultimate users who in general have access
only to complete-data software and possess limited knowledge of specific reasons and models for nonresponse. For this situation
and objective, I believe that multiple imputation by the data-base constructor is the method of choice. This article first provides a
description of the assumed context and objectives, and second, reviews the multiple imputation framework and its standard results.
These preliminary discussions are especially important because some recent commentaries on multiple imputation have reflected
either misunderstandings of the practical objectives of multiple imputation or misunderstandings of fundamental theoretical results.
Then, criticisms of multiple imputation are considered, and, finally, comparisons are made to alternative strategies.
KEY WORDS: Confidence validity; Missing data; Nonresponse in surveys; Public-use files; Sample surveys; Superefficient
procedures.

1. THE PROBLEM MULTIPLE IMPUTATION WAS DESIGNED TO ADDRESS

Missing values are a problem in many data sets and seem especially common in the medical and social sciences. For nearly two decades I have been advocating and developing the use of multiple imputation to address aspects of this problem; early documents include Rubin (1977a, 1977b, 1978, 1980, 1983), Herzog and Rubin (1983), Rubin and Schenker (1986), and the basic reference Rubin (1987). There are situations where multiple imputation is appropriate, and, as with any statistical tool, there are others where its application is more questionable. Originally it was viewed as being most appropriate in complex surveys that are used to create public-use data sets to be shared by many ultimate users, although over the years, it has proven valuable in other settings as well.

For the context for which it was envisioned, with data-base constructors and ultimate users as distinct entities, I firmly believe that multiple imputation is the method of choice for addressing problems due to missing values: alternative methods either require special knowledge and techniques not available to typical users or produce answers that are generally not statistically valid for scientific estimands. This is a strong statement, and it is clear that its accuracy must depend on the class of problems to which it is applied. Consequently this article begins with a description of the assumed statistical computing environment for the ultimate users of shared data-bases and of our objectives for handling missing data in this environment. It is especially important to provide this background to emphasize that the goal of multiple imputation is to provide statistically valid inference (in the traditional complex survey sense of Neyman, Cochran, and Hansen) in the difficult real-world situation where (1) ultimate users and data-base constructors are distinct entities with different analyses, models, and capabilities, and (2) there typically is no one accepted reason for the missing data.

In Section 2 multiple imputation is reviewed, with particular emphasis given to how it was designed to satisfy the stated objectives in the assumed environment for ultimate users. This review of critical points of the theory and intended practice of multiple imputation minimizes technical details so that essential statistical points will be more transparent than in the theoretical material in Rubin (1987), which requires substantial familiarity with, and acceptance of the relevance of, both randomization-based and Bayesian inference. Then, in Section 3, current concerns about multiple imputation are discussed with the benefit of the simplified theory. Finally, competing techniques are evaluated for their utility in the assumed context and are found to be less effective than multiple imputation.

1.1 Assumed Environment for Ultimate Users

Public-use (shared) data bases are analyzed by many ultimate users with varying degrees of statistical expertise and computing power, and with different scientific questions and objectives. Typically such users have available to them a number of standard complete-data techniques. These include various stand-alone routines such as ones for ordinary least-squares regression, logistic regression, factor analysis, variance components estimation, proportional hazards models, etc., and various packages of programs such as SAS, BMDP, SPSS, etc. Also, there may be available routines for inference in the presence of missing data under particular models (e.g., Schafer 1995), complete-data management routines for merging files, subsetting data, deleting cases and variables, applying transformations, and creating new variables, or various resampling programs to create simulated replicate data, principally jackknife and bootstrap routines.

Donald B. Rubin is Professor, Department of Statistics, Harvard University, Cambridge, MA 02138. This work was partially supported by National Science Foundation Grant SES-92-07456 and partially by the U.S. Census Bureau through a subcontract to Datametrics Research, Inc. from NORC. Very helpful comments on earlier drafts were made by J. Brand, R. J. A. Little, X. L. Meng, F. Scheuren, and editorial reviewers. Also, thanks are due to R. E. Fay for his continuing interest in multiple imputation and for his special examples, which helped stimulate the formulation here of superefficient multiple imputation and the associated new results. Finally, David Binder's comments on presentations of this material are gratefully acknowledged.

© 1996 American Statistical Association
Journal of the American Statistical Association
June 1996, Vol. 91, No. 434, Applications and Case Studies


This content downloaded from 185.44.78.156 on Mon, 16 Jun 2014 10:45:11 AM


All use subject to JSTOR Terms and Conditions

Essentially all public-use data sets have missing values, typically not of any nice neat type. In general, ultimate users have neither the knowledge nor the tools to address missing data problems satisfactorily. Even if some ultimate users do have adequate resources for modeling and computation, data-base constructors typically know more about reasons for nonresponse and have better access to confidential and detailed information not released for public use (e.g., exact addresses and neighborhood relationships, hourly blood pressure readings and doctor indicators), information that can be useful for modeling missing data. Moreover, ultimate users should be focused on their substantive scientific analyses and for these, missing data are generally simply a nuisance. My conclusion is that "correctly" modeling the missing data must be, in general, the data constructor's responsibility.

We, that is, data-base constructors and statistical software designers, have no direct control over what ultimate users will do with their arsenal of tools. We cannot stop users from doing bad science, but if possible we should facilitate their ability to do good science with their available tools, even when data sets suffer from missing values.

1.2 Achievable Basic Objective

One achievable basic objective in such a setting is the following: Each tool in the ultimate users' existing arsenals can be applied to any data set with missing values using the same command structure and output standards as if there were no missing data. The only additional software that is allowed to be required comprises entirely general macros that can be applied to any complete-data analysis and incomplete data set. Certain ad hoc methods of handling missing data, such as "complete-case analysis," "available-case analysis," and "fill-in with means" (e.g., see Little and Rubin 1987, part I), satisfy this basic objective and so have a certain appeal. The problem with such methods is that they typically yield statistically invalid answers for scientific estimands; "scientific estimands" and "statistically valid" require definition.

1.3 Scientific Estimands

By a scientific estimand I mean a quantity of scientific interest that can be calculated in the population and does not change its value depending on the data collection design used to measure it (i.e., it does not vary with sample size and survey design, or the number of nonrespondents, or follow-up efforts). Letting $X$ be the array of all background (e.g., stratification) information fully observed in a population and $Y$ be the array of outcome information in the population that is to be sampled in the survey, a scientific estimand is a function of $X$ and $Y$, say $Q = Q(X, Y)$. Scientific estimands include population means, variances, correlations, factor loadings, regression coefficients, and these quantities within strata or domains, but exclude the sampling variance of a sample mean under a particular sampling plan and the expectation of the complete-data sample mean when missing values are filled in with zero or the observed sample mean. These latter quantities can be important for inference and design, but they are not scientific estimands in my definition because they are functions of sample size, sample design, response rates for a particular survey, methods for handling missing values, and scientific estimands such as population means and variances.

The distinction between estimates of scientific estimands and measures of their uncertainty is an old one in statistics; see, for example, Fisher (1925, p. 724) where a measure of uncertainty associated with an estimate is called an "ancillary" statistic, that is, a subordinate or supplemental statistic. In Fisher's context, the estimate was the maximum likelihood estimate and the ancillary statistic was the second derivative of the log-likelihood, but the distinction is relevant to more general estimates and associated measures of uncertainty, as we see in the next section.

1.4 What is Meant by Statistically Valid?

In the context of shared data bases supporting analyses by many users, I believe that statistically valid must be a frequency concept, averaging over randomization distributions generated by known sampling mechanisms (used to collect data) and posited distributions for the response mechanisms (the processes underlying nonresponse). In standard scientific surveys, the sampling mechanism is known but the nonresponse mechanism is rarely fully known and so typically must be posited, either implicitly or explicitly.

Bayesian validity is also important, but is far more difficult to achieve in this context because it requires far more compatibility between the data-base constructor and the analyst. In fact, in general I do not believe it can be achieved in any real sense in the context of the basic objective to use existing complete-data tools with shared data bases. In any case, no Bayesian should object to achieving frequentist validity; effectively, Bayesians want and promise much more: calibration conditional on the data in addition to unconditional calibration (e.g., in Rubin 1984, I call such frequency calculations "Bayesianly relevant and justifiable").

First and foremost, for statistical validity for scientific estimands, point estimation must be approximately unbiased for the scientific estimands, averaging over the sampling and the posited nonresponse mechanisms (e.g., filling in zeros or means is not generally acceptable). Second, interval estimation and hypothesis testing must be valid in the sense that nominal levels describe operating characteristics over sampling and posited response mechanisms. Two versions of such frequentist validity for nominal levels are especially important to distinguish when assessing multiple imputation.

Using terminology from Rubin (1987, pp. 117-118), "randomization validity" means that, for interval estimates, "actual interval coverage = nominal interval coverage," and for tests of hypotheses, "actual rejection rate = nominal rejection rate." Randomization validity is the natural objective in most survey contexts. In standard asymptotic situations, a complete-data estimate $\hat{Q}$ of an estimand $Q$ has a normal sampling distribution centered at $Q$ with sampling variance (or more generally, variance-covariance) consistently estimated by the statistic $U$, where the randomization distribution is that generated by the sampling indicator $I$ given fixed $(X, Y)$, that is, the sampling mechanism. In this case we have

\[ E(\hat{Q} \mid X, Y) = Q \tag{1.1} \]

and

\[ E(U \mid X, Y) \approx \mathrm{var}(\hat{Q} \mid X, Y), \tag{1.2} \]

and then randomization validity is not only desirable but theoretically achievable. The precision of $\hat{Q}$ is measured by $U^{-1}$, which plays the role of the ancillary statistic and can be used as a "true weight" (Fisher 1925, p. 724) for combining estimates.

A more generally achievable objective, however, is "confidence validity," meaning that for interval estimates, "actual interval coverage ≥ nominal interval coverage," and for tests of hypotheses, "actual rejection rate ≤ nominal rejection rate." For confidence validity with complete data, we replace (1.2) with

\[ E(U \mid X, Y) \geq \mathrm{var}(\hat{Q} \mid X, Y). \tag{1.3} \]

If (1.3) is satisfied but (1.2) is not, then $U^{-1}$ is only an "approximate weight for the value of the estimate" (Fisher 1925, p. 724).

The distinction between randomization validity and confidence validity can be quite important when dealing with approximate procedures, which necessarily arise with nonresponse in public-use surveys, and this distinction appears in Neyman (1934), which is the foundation for statisticians' current view of frequentist validity in surveys. Here Neyman (1934, pp. 562-563) defined confidence intervals, confidence coefficients, and confidence limits, and these definitions remain the accepted mathematical definitions of these terms (e.g., Lehmann 1959). In particular, confidence limits are statistics defining an interval such that, in repeated experience, the estimand lies in the confidence interval with probability greater than or equal to the confidence coefficient; the shorter the interval satisfying this constraint, the better.

A simple example illustrates the wisdom implicit in Neyman's definition. Consider a particular situation with two different confidence-valid procedures for creating confidence intervals with confidence coefficient 95%. Procedure 1 produces intervals that are always shorter than the intervals produced by Procedure 2, and moreover, Procedure 1 has actually probability > 95% of covering the estimand, whereas Procedure 2 has only the nominal 95% probability of covering the estimand. Clearly, Procedure 1 is scientifically and statistically superior to Procedure 2 because it provides tighter inferences with greater confidence, and Neyman's definition and desiderata agree with this fact. Requiring exact agreement between nominal and actual levels as a desideratum for validity would lead one to reject Procedure 1 as invalid and choose Procedure 2, clearly a mistake. It is for this reason that confidence validity is more fundamental than randomization validity for interval estimation. Of course, if we have a procedure that is confidence valid but not randomization valid, there is the hope that a better confidence-valid procedure exists (i.e., one with shorter intervals), which is also randomization valid, but in general this is not achievable. An attendant advantage, when the best confidence interval is randomization valid, is that the associated measure of precision can be thought of as a true rather than approximate weight (again, in the sense of Fisher 1925, p. 724; also see Fisher 1934, criticizing Neyman 1934, on this point).

1.5 Supplemental Objective Concerning Statistical Validity

We are now prepared to supplement the Achievable Basic Objective when faced with missing values, regarding the ability to apply standard complete-data statistical tools, with an objective concerning statistically valid inference for scientific estimands. It is easy to ask for more than is possible and then do something misguided when attempting the impossible. We first consider a hopeless objective, which is commonly sought, and then state an achievable one.

Hopeless Supplemental Objective. Each complete-data statistical tool can be applied to each incomplete data set to obtain the same inference as if the data set had no missing values.

This objective is clearly impossible because of the lost information, but nevertheless, it guides some thinking about how to handle missing data. It is analogous to saying that the objective of a survey is to obtain the same answer as a complete census, and it can lead to an "operations research" objective of creating imputations for missing values that are as close as possible to the truth (i.e., fill in missing values to minimize some objective function). Our actual objective is valid statistical inference, not optimal point prediction under some loss function, and replacing the former with the latter can lead one badly astray. For example, suppose we have a coin that, in truth, is biased .6 heads and .4 tails. This known truth is model A, whereas model B asserts that the coin has two heads. Using model A for creating imputations (i.e., future predictions) yields a hit rate (agreements between predictions and outcomes) of .6 × .6 + .4 × .4 = .52, whereas using model B for predictions yields a hit rate of .6. This does not mean that model B is better than model A for handling missing values. Filling in missing values using model B yields the invalid statistical inference that in the future all coin tosses will be heads, clearly inconsistent for the estimand $Q$ = fraction of tosses that are heads, whereas using model A yields consistent estimates for all such scientific estimands. The lesson is simple: Judging the quality of missing data procedures by their ability to recreate the individual missing values (according to hit-rate, mean square error, etc.) does not lead to choosing procedures that result in valid inference, which is our objective.

Statistical validity in our context is difficult because the answer that results from applying a complete-data analysis to an incomplete data set is generally invalid unless the complete-data analysis in the absence of missing data is valid (the ultimate user's responsibility) and the reasons for missing data are correctly modelled (the data-base constructor's responsibility). We can essentially never be sure that the data-base constructor's model is appropriate, but assuming it is, and assuming that the ultimate user is performing an analysis that would be valid if there were no missing data, we can expect that the ultimate user will obtain a valid inference.

Achievable Supplemental Objective. Assuming that the ultimate user's complete-data analysis is statistically valid for a scientific estimand, the answer that results from applying the same analysis method to an incomplete-data set remains statistically valid for the same scientific estimand assuming the truth of the data-base constructor's posited model for missing data.

I doubt that there is a much stronger objective regarding validity that we can achieve in this context where the ultimate user and the data-base constructor are distinct entities. Multiple imputation was designed to satisfy both achievable objectives by using the Bayesian and frequentist paradigms in complementary ways: the Bayesian model-based approach to create procedures, and the frequentist (randomization-based) approach to evaluate procedures.

2. REVIEW OF MULTIPLE IMPUTATION FRAMEWORK AND RESULTS

Multiple imputations for the set of missing values are multiple sets of plausible values; these can reflect uncertainty under one model for nonresponse and across several models. Each set of imputations is used to create a completed data set, each of which is to be analyzed using standard complete-data software to yield "completed-data" statistics, which are typically complete-data estimates, $\hat{Q}$, associated variance-covariance matrices, $U$, and p values. The complete-data statistics $\hat{Q}$ and $U$ are general; for example, $U$ may be obtained by mathematical analysis, linearization methods, balanced-repeated replication, the jackknife (see, e.g., Krewski and Rao 1981), the bootstrap (see, e.g., Efron 1994), or special routines for complex surveys such as SUDAAN or VPLX (see, e.g., Fay 1990). But no matter how $\hat{Q}$ and $U$ are calculated with complete data, once missing data are filled in by imputation, they can be calculated as if the data set were complete.

2.1 Repeated Imputations

A theoretically fundamental form of multiple imputation is repeated imputation (Rubin 1987, pp. 75-76). Repeated imputations are draws from the posterior predictive distribution of the missing values under a specific model, that is, a particular Bayesian model for both the data and the missing-data mechanism. The $m$ complete-data analyses corresponding to the $m$ imputations under one model result in $m$ repeated completed-data statistics, and these are combined to form one repeated-imputation inference that appropriately adjusts for nonresponse under the model used to create the repeated imputations. The values of the complete-data statistics $\hat{Q}$ and $U$ calculated on the $m$ completed data sets are $\hat{Q}_{*1}, \ldots, \hat{Q}_{*m}$ and $U_{*1}, \ldots, U_{*m}$. The basic procedures for combining the $m$ estimates $\{\hat{Q}_{*1}, \ldots, \hat{Q}_{*m}\}$, associated variance-covariance matrices $\{U_{*1}, \ldots, U_{*m}\}$, and p values, that is, the final repeated-imputation inferences, are derived in chapter 3 in Rubin (1987) under the Bayesian paradigm for survey inference (introduced in chap. 2 of Rubin 1987), assuming that the multiple imputations are repeated imputations.

The key Bayesian motivation for multiple imputation is given by result 3.1 in Rubin (1987). Ignoring both technical details and indicator variables for sampling and response, the result and its consequences can be easily stated using the simplified notation that the complete data are $Y = (Y_{\mathrm{obs}}, Y_{\mathrm{mis}})$, where $Y_{\mathrm{obs}}$ is observed and $Y_{\mathrm{mis}}$ is missing. Specifically, the basic result is

\[ P(Q \mid Y_{\mathrm{obs}}) = \int P(Q \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}})\, P(Y_{\mathrm{mis}} \mid Y_{\mathrm{obs}})\, dY_{\mathrm{mis}}, \]

or in words,

\[ (\text{actual posterior distribution of } Q) = \mathrm{AVE}[\text{complete-data posterior distribution of } Q], \]

where AVE[ ] refers to the average over the repeated imputations, which are draws from $P(Y_{\mathrm{mis}} \mid Y_{\mathrm{obs}})$, which is the posterior predictive distribution of missing data given the observed data. Two simple consequences follow (Rubin 1987, result 3.2). The first concerns the final estimate of $Q$:

\[ E(Q \mid Y_{\mathrm{obs}}) = E[E(Q \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}) \mid Y_{\mathrm{obs}}], \]

or in words,

\[ (\text{posterior mean of } Q) = \mathrm{AVE}[\text{repeated complete-data posterior means of } Q]. \]

The second concerns the final variance of $Q$:

\[ V(Q \mid Y_{\mathrm{obs}}) = E[V(Q \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}) \mid Y_{\mathrm{obs}}] + V[E(Q \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}) \mid Y_{\mathrm{obs}}], \]

or in words,

\[ (\text{posterior variance of } Q) = \mathrm{AVE}[\text{repeated complete-data variances of } Q] + \mathrm{VAR}[\text{repeated complete-data posterior means of } Q], \]

where VAR refers to the variance over the repeated imputations. These simple relationships, which follow from standard probability calculations, underlie the repeated-imputation inferences recommended for practice.

2.2 Repeated-Imputation Inferences

The essential features of the repeated-imputation inference are the following. The repeated-imputation estimate is

\[ \bar{Q}_m = \sum_{l=1}^{m} \hat{Q}_{*l}/m. \tag{2.1} \]
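The relationships of Section 2.1 and the estimate (2.1) translate directly into a short computation. The following Python sketch, for a scalar estimand only, averages the m completed-data estimates, averages their variances, and adds the variance of the estimates across imputations, inflated by the finite-m factor (1 + 1/m) that enters the total variance of the repeated-imputation estimate. The function name and the example numbers are illustrative assumptions, not part of the article, and no small-m reference distribution is attempted.

```python
from statistics import mean, variance


def combine_repeated_imputations(estimates, variances):
    """Combine m completed-data estimates and variances for a scalar estimand.

    Hypothetical helper written for illustration; returns the tuple
    (q_bar, u_bar, b, t).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need m >= 2 estimates and one variance per estimate")
    q_bar = mean(estimates)        # repeated-imputation estimate, as in (2.1)
    u_bar = mean(variances)        # within-imputation variability
    b = variance(estimates)        # between-imputation variability, divisor m - 1
    t = u_bar + (1 + 1 / m) * b    # total variance with the finite-m inflation
    return q_bar, u_bar, b, t


# Made-up results from m = 3 completed data sets (assumed numbers).
q_bar, u_bar, b, t = combine_repeated_imputations(
    [10.0, 12.0, 11.0],  # completed-data estimates
    [4.0, 5.0, 3.0],     # completed-data variances
)
print(q_bar, u_bar, b, t)
```

For a vector-valued estimand the scalar variances would be replaced by variance-covariance matrices and the squared deviations by outer products; the scalar case is kept here only to make the structure of the calculation visible.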


The associated variance-covariance of $\bar{Q}_m$ is

\[ T_m = \bar{U}_m + \frac{m+1}{m} B_m, \tag{2.2} \]

where

\[ \bar{U}_m = \sum_{l=1}^{m} U_{*l}/m = \text{within-imputation variability}, \tag{2.3} \]

and

\[ B_m = \sum_{l=1}^{m} (\hat{Q}_{*l} - \bar{Q}_m)(\hat{Q}_{*l} - \bar{Q}_m)^{\top}/(m-1) = \text{between-imputation variability}. \tag{2.4} \]

The large-$m$ repeated-imputation inference treats $(Q - \bar{Q}_m)$ as a normal random variable with variance-covariance matrix $T_m$; notationally, letting $m = \infty$ we have

\[ (Q - \bar{Q}_\infty) \sim N(0, T_\infty), \tag{2.5} \]

where $T_\infty = \bar{U}_\infty + B_\infty$, and the eigenvalues of $B_\infty$ relative to $T_\infty$ measure the fractions of information missing about $Q$ due to nonresponse.

The derivation of these expressions follows from the Bayesian perspective treating $Q$ and $\hat{Q}$ as unobserved random variables with normal conditional distributions given the observed values $\{\hat{Q}_{*1}, \ldots, \hat{Q}_{*m}\}$ and $\{U_{*1}, \ldots, U_{*m}\}$. For details, including specific small-$m$ adjustments, see chapter 3 in Rubin (1987), and for more extensive results on p values, see Li, Raghunathan, and Rubin (1991), Li, Meng, Raghunathan, and Rubin (1991), and Meng and Rubin (1993).

2.3 Evaluating Repeated-Imputation Procedures Under the Randomization-Based Paradigm

The Bayesian paradigm, which is used to derive repeated-imputation inferences, is formally predicated on the correctness of all the model specifications. Although this paradigm is ideal for creating procedures, especially in complicated situations, its results cannot be unequivocally endorsed for routine practice because, in practice, we can never be sure any model assumptions are correct. Consequently, the Bayesianly-derived repeated-imputation procedures were evaluated in chapter 4 in Rubin (1987) under the randomization-based frequentist paradigm to investigate their sensitivity and robustness to model deviations and finite $m$. This paradigm extends that of Neyman (1934) to include a mechanism for nonresponse $\Pr(R \mid X, Y, I)$ in addition to the sampling mechanism $\Pr(I \mid X, Y)$, where $I$ is the array of fully observed sampling indicators for which values of $Y$ were included in the survey for observation, and $R$ is the array of fully observed indicators for response (i.e., for which components of $Y$ that were intended to be observed were observed). A component, $Y_{ij}$, is observed if both associated indicators, $I_{ij}$ and $R_{ij}$, are one, and is not observed if either is zero. This perspective is called the random-response randomization-based perspective.

2.4 Proper Multiple Imputation

A key concept underlying these randomization-based evaluations is that of proper multiple imputation, whose mathematical definition is purely frequentist, since it involves expectations given that the population values $(X, Y)$ are fixed. The crucial result is that when (1) the multiple imputations are proper for $(\hat{Q}, U)$, and (2) the complete-data inference based on $(\hat{Q}, U)$ is randomization-valid for $Q$, then the large-$m$ repeated-imputation inference given by (2.5) is randomization-valid for the scientific estimand $Q$, no matter how complex the survey design. Whether a multiple imputation procedure is proper depends, in general, on which complete-data estimates, $\hat{Q}$, and associated variance-covariance matrices, $U$, are being considered. The full definition is given in Rubin (1987, pp. 118-119); it is summarized here ignoring the more technical conditions in order to focus attention on three essential conditions.

The definition of a proper multiple imputation procedure treats $(X, Y)$ and the intended sample (as indicated by $I$) as fixed [except for a minor technical condition, eq. (4.2.9) in Rubin 1987], and deals with the fixed but unknown values of the complete-data statistics $(\hat{Q}, U)$ in the sample as if they were estimands. That is, the randomization distribution critically involved in the definition of proper multiple imputation is generated by the response mechanism, in which $X$, $Y$, and $I$ are fixed, and $R$ is the random variable. Because the conditions for proper imputation involve large $m$, the simplified definition only involves expectations with respect to the response mechanism.

For proper imputation, the values of the complete-data statistics $\hat{Q}$ and $U$ created by filling in the missing $Y$ values, that is $\hat{Q}_{*l}$ and $U_{*l}$, must be approximately unbiased for their complete-data analogs $\hat{Q}$ and $U$; that is, in terms of the large-$m$ averages of $\hat{Q}_{*l}$ and $U_{*l}$:

\[ E(\bar{Q}_\infty \mid X, Y, I) \approx \hat{Q} \tag{2.6} \]

and

\[ E(\bar{U}_\infty \mid X, Y, I) \approx U. \tag{2.7} \]

Moreover, $B_\infty$, which is the variance-covariance of the $\hat{Q}_{*l}$ across the $m$ imputations, must be approximately unbiased for the randomization variance of $\bar{Q}_\infty$:

\[ E(B_\infty \mid X, Y, I) \approx \mathrm{var}(\bar{Q}_\infty \mid X, Y, I). \tag{2.8} \]

Equation (2.6) for proper imputation is analogous to (1.1) for randomization validity: both require approximate unbiasedness of the estimate ($\bar{Q}_\infty$ or $\hat{Q}$) for its estimand ($\hat{Q}$ or $Q$) over its randomization distribution (induced by the response mechanism or the sampling mechanism). Equation (2.8) for proper imputation is analogous to (1.2) for randomization validity: both require approximately unbiased estimation by the ancillary statistic ($B_\infty$ or $U$) for the variance of the estimate ($\bar{Q}_\infty$ or $\hat{Q}$) over its randomization distribution (induced by the response or sampling mechanism). Also, just as (1.1) and (1.2) together imply (at least in large-sample surveys) that randomization-valid inferences for $Q$ can be based on the approximation


\[ (\hat{Q} \mid X, Y) \sim N(Q, U), \]

(2.6) and (2.8) together imply that randomization-valid inferences for the complete-data statistics $\hat{Q}$ can be based on the approximation

\[ (\bar{Q}_\infty \mid X, Y, I) \sim N(\hat{Q}, B_\infty), \]

where the randomization distributions are induced by the sampling and response mechanisms, respectively. The remaining condition for proper imputation has no direct analog in complete-data randomization validity: expression (2.7) implies that the complete-data ancillary statistic $U$, being treated as an ancillary complete-data estimand for the definition of proper imputation, is approximately unbiasedly estimated after imputation.

2.5 Conclusion Regarding Randomization Validity With Proper Multiple Imputation

The crucial result regarding the randomization validity of the large-$m$ repeated-imputation inference, given by (2.5), averages over both the actual sampling mechanism and the posited response mechanism; it is simple and holds no matter how complex the survey design:

Result 4.1: If the complete-data inference is randomization-valid and the multiple-imputation procedure is proper, then the infinite-$m$ repeated-imputation inference is randomization-valid under the posited response mechanism. (Rubin 1987, p. 119).

This result follows from combining the formal versions of (1.1), (1.2), (2.6), (2.7), and (2.8). Essentially, (1.1) and (2.6) imply that

\[ E(\bar{Q}_\infty \mid X, Y) = E[E(\bar{Q}_\infty \mid X, Y, I) \mid X, Y] = E(\hat{Q} \mid X, Y) = Q, \]

and (1.2), (2.7), and (2.8) imply that

\[
\begin{aligned}
E(T_\infty \mid X, Y) &= E(\bar{U}_\infty \mid X, Y) + E(B_\infty \mid X, Y) \\
&= E[E(\bar{U}_\infty \mid X, Y, I) \mid X, Y] + E[E(B_\infty \mid X, Y, I) \mid X, Y] \\
&= E(U \mid X, Y) + E[\mathrm{var}(\bar{Q}_\infty \mid X, Y, I) \mid X, Y] \\
&= \mathrm{var}(\hat{Q} \mid X, Y) + E[\mathrm{var}(\bar{Q}_\infty \mid X, Y, I) \mid X, Y] \\
&= \mathrm{var}[E(\bar{Q}_\infty \mid X, Y, I) \mid X, Y] + E[\mathrm{var}(\bar{Q}_\infty \mid X, Y, I) \mid X, Y] \\
&= \mathrm{var}(\bar{Q}_\infty \mid X, Y).
\end{aligned}
\]

Thus approximately (2.5) follows, which is the conclusion

cally. The more straightforward conditions, (2.6) and (2.7), typically were simple properties of any intelligent imputation scheme that tried to track the data. An example of a method that does not track the data is "fill in the mean," which although it may satisfy (2.6) for $\hat{Q} = \bar{y}$, fails to do so for $\hat{Q} = s$ or for $\hat{Q}$ = 25th percentile, or to satisfy (2.7) for $U = s^2/n$, etc. Hot-deck (Bootstrap) and random-draw regression methods tend to satisfy (2.6) and (2.7) but fail to satisfy (2.8) until a Bayesian, systematic between-imputation component of variability is added (e.g., via the Bayesian Bootstrap, Rubin 1981), to reflect uncertainty in the estimation of population parameters.

The view in 1987, which I still hold today, was summarized as follows.

Conclusion 4.1: If imputations are drawn to approximate repetitions from a Bayesian posterior distribution of $Y_{\mathrm{mis}}$ under the posited response mechanism and an appropriate model for the data, then in large samples the imputation method is proper.... There is little doubt that if this conclusion were formalized in a particular way, exceptions to it could be found. Its usefulness is not as a general mathematical result, but rather as a guide to practice. Nevertheless, in order to understand why it may be expected to hold relatively generally, it is important to provide a general heuristic argument for it (Rubin 1987, pp. 125-126).

This heuristic argument treated the sample as the population with estimand $\hat{Q}$ (and $U$), where the resulting posterior distribution of $\hat{Q}$ was centered at $\bar{Q}_\infty$ with variance $B_\infty$; assuming the Bayesian model appropriate [in the sense of satisfying (2.6) and (2.7)] and the samples large, standard arguments presented in chapter 2 of Rubin (1987) suggested that typically $(\hat{Q} - \bar{Q}_\infty) B_\infty^{-1/2}$ will have a sampling distribution (over the response mechanism) that is standard normal, thereby satisfying the basic conditions for proper multiple imputation.

2.6 Include All Variables in a Multiple Imputation Model To Make It Proper in General

The definition of proper concerns the situation where: "population" = complete-data sample, "estimands" = complete-data statistics $(\hat{Q}, U)$, "survey design" = the posited response mechanism, the criterion is valid frequency inference, and the method for creating inferences is Bayesian predictive inference using simulated values (i.e., multiple imputations). As with any finite population survey where valid frequency inference is desired from predictive procedures: (1) variables involved in the definition of estimands (i.e., $\hat{Q}$, $U$) should be predicted, and (2) variables involved in the survey design (i.e., the response mechanism) should be used as predictors. More explicitly, when $\hat{Q}$ or $U$ involves some variable $X$, then leaving $X$ out of the imputa-
of Result4.1. tion schemeis improperand generallyleads to biasedesti-
Rubin (1987, chap. 4) presentedanalyticresults, simu- mationand invalidsurveyinference.For example,if X is
lation evaluations,and many examples of properand im- correlatedwith Y but not used to multiply-imputeY, then
propermultipleimputationmethods,wherethe evaluations the multiply-imputeddata set will yield estimates of the
were all from the random-responserandomization-based XY correlationbiased towardszero. In a complex survey,
frequentistperspective.The trick in many of the exam- Q, andespeciallyU, dependon stratificationandclustering
ples of properimputationwas to get the variancecondition indicators;consequently,in general these indicatorsneed
(2.8) correct,and it was shownthat when drawingimputa- to be includedas predictorvariablesin imputationmodels
tions to approximaterepetitionsfrom a sensible Bayesian for the multipleimputationschemeto be proper.Minimally,
model, conditions(2.6)-(2.8) typically followed automati- majorclusteringandstratificationindicatorsandsamplede-

This content downloaded from 185.44.78.156 on Mon, 16 Jun 2014 10:45:11 AM


All use subject to JSTOR Terms and Conditions
Rubin: Multiple Imputation After 18+ Years 479

sign weights(or estimatedpropensityscoresof being in the that lack of model fit goes into residualvariance,which in
sample)should be includedin imputationmodels. Ezzati- a Bayesianmodel inflatesthe between-imputation variance
Rice, Johnson,Khare,Little, Rubin,and Schafer(1995) il- of draws (e.g., of regressioncoefficients),therebyleading
lustratessuch effortsandthe resultingvalid inferences. to a large enough Bm to compensatefor an omitted co-
Since with public-usedatasets it is alwaysunclearwhat efficient.Of course, this is an observationbased on some
analysesthe ultimateuserswill conduct,the rangeof statis- experience,not a theorem,but a relatedtheoreticalresult
tics (Q, U) thatmightbe used involvesessentiallyany vari- (Meng 1994, lem. 2) lends supportto this observation.
able or combinationof variablesavailablein the data set, Nevertheless,becauseproblemscan occur when the im-
at least up to some level of interactions.Thus, the danger puter'smodel leaves out importantpredictorvariables,the
with an imputer'smodel is generallyin leaving out pre- data-baseconstructormustincludea descriptionof the im-
dictorsratherthanincludingtoo many,and the advice has putationmodelwith the multiply-imputed database, so that
alwaysbeen to includeas manyvariablesas possible when ultimateusers know which relationshipsamong variables
doing multipleimputation.The press to include all possi- have been implicitlyset to zero.
bly relevantpredictorsis demandingin practice,but it is
generallya worthygoal. For example,in the originalpre- 3. CURRENT ISSUES CONCERNING
scriptionfor the industryand occupationrecodingproject MULTIPLEIMPUTATION
(Rubin1983), thousandsof logistic regressionswere done, Thereappearto be two distinctkinds of concernsabout
each with nearly20 variables,andsome with far fewerthan multipleimputation.The firsttype focuses on its implemen-
20 observations(e.g., 4), in orderto preservethis theme of tation:operationaldifficultiesfor the data-baseconstructor
trying to include all variablesthat might be used to de- and the ultimateuser, as well as the acceptabilityof an-
fine statisticsQ or U; this effortrequiredthe development swers obtainedpartiallythroughthe use of simulation.The
of specializedbut computationallyefficientBayesianlogis- second type concernsthe frequentistvalidity of repeated-
tic regressionproceduresfor sparse data (Clogg, Rubin, imputationinferences when the multiple imputationsare
Schenker,Schultz, and Weidman1991). The possible lost not proper,but appear"reasonable" in some sense.
precisionwhen includingunimportantpredictorsis usually
viewed as a relatively small price to pay for the general 3.1 Is Multiple Imputation Unprincipled or Unacceptable
validity of analysesof the resultantmultiply-imputeddata Because it Uses Simulation?
base.
An early criticism,not much heard anymorebut wor-
2.7 Some Experience With Useful But Improper
thy of response,is that multipleimputationis theoretically
Multiple Imputation
unsatisfactoryandpracticallyunacceptablebecauseit adds
In some cases, impropermultiple imputationscan still randomnoise to the data.In this context,it is criticalto re-
lead to confidence-validrepeated-imputationinferences. memberthatmultipleimputationdoes not pretendto create
This issue will be discussedin more detailin Sections3.5- informationthroughsimulatedvalues but simply to repre-
3.8 in reply to a recent criticism of multipleimputation, sent the observedinformationthis way so as to make it
but the issue has been previouslyconsidered.Rubin and amenableto valid analysisusing complete-datatools. The
Schenker(1987, sec. 7) explicitlyconsiderthe situationin extranoise createdwhen using a finite numberof imputa-
the early industryand occupationexamplewhere some in- tions is the price to be paid for this luxury.
formationused by the imputer(the originaldouble-coded In responseto this criticism,first appreciatethat simula-
sample) is not availableto the data analyst, and demon- tion methodsarebecomingmoreandmorecommonandac-
stratethe resultingpotentialconservativecoverage.Also, ceptedin statistics.Considerjackknifeandbootstrapmeth-
the evaluationsof the results of this projectinclude cases ods for complete-datafrequentistinference (e.g., Miller
where the data analystuses variablesnot used by the im- 1974; Efron and Tibsharani1993), or data augmentation
puter and, for this data set and practicalanalyses,find no (TannerandWong 1987), the Gibbs sampler(e.g., Gelfand
deleteriousconsequences(Schenker,Treiman,and Weid- and Smith 1990; Gelmanand Rubin 1992), and sampling
man 1993;Treiman,Bielby, and Cheng 1989;Weld 1987). importanceresamplingmethods(Rubin1983, 1987, 1988;
Carefuland extensiveevaluationsof this generalsituation, GelfandandSmith 1992)for complete-dataBayesianinfer-
involving variablesomitted by the imputer,are also in- ence. These methodshave now become acceptedcomplete-
cludedin work conductedat ETS in the contextof NAEP, data tools worthy of theoreticalinvestigationand routine
whichfor a decadehas createdmultiply-imputed public-use practicalapplication.
databases (e.g., Mislevy,Johnson,andMuraki1992). Second,multipleimputationhas a distinctadvantageover
Substantialempiricalwork,some given in the Appendix, such methodsin principle,because with multipleimputa-
supportsthe conclusionthat,even if mildly importantpre- tion, the simulationis only being used to handlethe miss-
dictorsare left out of the multipleimputationscheme,the ing information,with reliancefor handlingthe rest of the
repeated-imputation inferences are confidence-valid:with informationleft to the complete-datamethod,be it analytic
fractionsof missinginformationtypicalin carefulsurveys, or simulation-based.Thus, the acceptablenumberof im-
m =3 or 5 works very well, with the complete-datapro- putationscan be muchless than the acceptablenumberof
cedurefor smallrntypicallybreakingdownbeforemultiple simulationsfor a complete-datainference,at least assum-
imputationdoes. A heuristicreason for this robustnessis ing that the fractionof missing information,ty, is modest

480 Journal of the American Statistical Association, June 1996

(e.g., < 30%) as commonly occurs in public-use surveys. More explicitly, few would recommend basing standard errors on fewer than 100 bootstrap or jackknife simulations, hundreds or thousands being more typical. In contrast, typically as few as five multiple imputations (or even three in some cases) is adequate under each model for nonresponse. Two simple calculations help to illustrate why only a few imputations can be adequate. First, the asymptotic efficiency of the repeated-imputation finite-$m$ estimate relative to the infinite-$m$ estimate is $[1 + (\gamma/m)]^{-1/2}$ in units of standard deviations, which is close to one with realistic fractions of missing information and modest $m$ (Rubin 1987, table 4.1). Second, Table 1 displays an approximate factor for expanding standard errors in the infinite-$m$ normal distribution (2.5) to reflect finite $m$. The table shows that the expansion of width of a confidence interval due to finite $m$ is modest for most practical cases.

Table 1. Approximate Factor for Inflating Normal Standard Errors in (2.5) to Reflect Finite Number of Imputations, $m$: $\sqrt{\nu/(\nu - 2)}$, Where $\nu = (m - 1)[1 + \{(1 + m^{-1})B_m/\bar{U}_m\}^{-1}]^2$

              Fraction of missing information, $\gamma$
  m      10%     20%     30%     50%     70%
  3      1.01    1.03    1.07    1.22    1.53
  5      1.00    1.01    1.03    1.08    1.17
 10      1.00    1.01    1.01    1.03    1.06
 20      1.00    1.00    1.01    1.01    1.03

Finally, even when a particular multiple imputation method has deficiencies, it can only distort part of the inference in contrast to an incorrect complete-data analysis, which can distort the entire inference. For example, results in Heitjan and Rubin (1990) in a particular example suggest that doing some kind of multiple imputation, even if under a naive model, is far better inferentially than standard or sophisticated approaches with single imputation. In some vague sense, if a multiple imputation method is 20% deficient (80% okay) with 30% missing information, its total distortion is 20% of 30%, or 6%, implying that the repeated-imputation inference is 94% okay.

3.2 Is Multiple Imputation Too Much Work For The User?

My primary response to this question is: "Too much work relative to doing what?" Multiple imputation is intellectually trivial for the user. Running the identical complete-data software $m$ times (e.g., 3, 5, or 10 times) and combining the results "by hand" is admittedly a burden, but is computationally trivial given appropriate macros (which are easy to write, e.g., in S-Plus; see Schafer 1995, or SAS, Freedman 1990). I believe it is substantially easier for the user, even without appropriate macros, than any other method that can validly address nonresponse in any generality. As repeatedly emphasized by many workers in this area, methods such as "fill in the mean and ignore," "available cases," "treat the data set as a two-way additive model and singly impute with zero interaction," etc., are not statistically valid in any generality, even for point estimation of a variety of estimands, such as means, variances, correlations, and are therefore not appropriate for public-use databases.

3.3 Does a Multiply-imputed Data Set Take Too Much Storage?

A multiply-imputed data set, in terms of needed storage locations, is [1 + $m$ (% missing values overall)] times as big as the original data set, typically a factor of two or less. For example, suppose the data set has 10,000 units; 20 background variables fully recorded; 20 "easy" survey questions, 5% missing; 30 "moderate" survey questions, 10% missing; 30 "difficult" survey questions, 30% missing: then the complete-data set = 1,000,000 items with 130,000 items missing. The associated multiply-imputed ($m = 5$) data set consists of the complete-data set of 870,000 data values and 130,000 pointers to the rows of the supplemental 130,000 × 5 matrix of imputations, for a total of 1,000,000 + 650,000 locations. Given the appropriate macros, we can unpack the multiple imputations to create five completed data sets only at the time of each of the five complete-data analyses, sequentially, in a manner transparent to the ultimate user, and using less than twice the storage needed for the original data set. Even with more missing values and more imputations per missing value, this issue should be easily handled with today's storage devices and simple and general macros, although it can be a burden without appropriate software. In situations with nonresponse confined to a few variables, an effective device can be to create a rectangular data set with $m$ versions of these variables but one version of the fully observed variables.

3.4 Does It Take Too Much Work to Create Proper or Approximately Proper Multiple Imputations?

Again, my response to this question is "too much relative to what?" It certainly takes much more work than some methods that have no general validity. But multiple imputation takes little more work than other methods that attempt to address nonresponse validly and with some generality. Moreover, essentially all the extra work is needed from the data-base constructor, who may have the resources to do the job well, rather than the world of ultimate users with their varied and limited resources. In fact, some experience suggests that in practice it may be substantially easier to do model-based multiple imputation than to use previous approaches because we can apply powerful methods of direct and indirect simulation under full probability models (e.g., data augmentation, the Gibbs sampler) and let the computer do much of the work previously done by expensive and exhausting human iteration; consider, for example, the recent project dealing with nonmonotone missing data patterns in NHANES (Fahimi and Judkins 1993; Schafer et al. 1993; Ezzati-Rice et al. 1993; Johnson et al. 1993; and Little and Rubin 1993). For other examples dealing with the creation of multiple imputations and related issues, consider Kennickell (1991); Chand and Alexander (1994); Paulin and Ferraro (1994); and Eltinge, Yansaneh and Paulin (1994).
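The numerical claims of Sections 3.1 and 3.3 can be checked with a few lines of code. The following is only a sketch under stated assumptions: the function names are hypothetical, and the Table 1 factor is computed by replacing the ratio $B_m/\bar{U}_m$ in the degrees of freedom with its large-sample value $\gamma/(1-\gamma)$, a substitution that is not spelled out in the text but that appears to reproduce the tabled entries to two decimals.

```python
import math


def relative_efficiency(gamma, m):
    """Efficiency, in standard-deviation units, of the finite-m estimate
    relative to the infinite-m estimate: [1 + gamma/m]^(-1/2)."""
    return (1.0 + gamma / m) ** -0.5


def se_inflation_factor(gamma, m):
    """Approximate factor sqrt(nu / (nu - 2)) for inflating standard errors.

    Assumption (not from the text): B_m / U-bar_m is set to its
    large-sample value gamma / (1 - gamma). Requires nu > 2.
    """
    b_over_u = gamma / (1.0 - gamma)
    r = (1.0 + 1.0 / m) * b_over_u        # relative increase in variance
    nu = (m - 1) * (1.0 + 1.0 / r) ** 2   # degrees of freedom
    return math.sqrt(nu / (nu - 2.0))


def storage_locations(n_units, blocks, m):
    """Count storage locations for a multiply-imputed data set.

    blocks is a list of (number of variables, percent missing) pairs.
    Returns (total items, missing items, locations needed).
    """
    total = n_units * sum(k for k, _ in blocks)
    missing = n_units * sum(k * pct for k, pct in blocks) // 100
    # observed values + one pointer per missing item + m imputations each
    locations = (total - missing) + missing + m * missing
    return total, missing, locations


if __name__ == "__main__":
    for m in (3, 5, 10, 20):
        row = [round(se_inflation_factor(g, m), 2)
               for g in (0.1, 0.2, 0.3, 0.5, 0.7)]
        print(m, row)
    # The survey of Section 3.3: 20 + 20 + 30 + 30 variables on 10,000 units.
    survey = [(20, 0), (20, 5), (30, 10), (30, 30)]
    print(storage_locations(10_000, survey, m=5))
```

For the survey described in Section 3.3 the count comes to 1,650,000 locations, that is, 1.65 times the 1,000,000 items of the original data set, in agreement with the factor [1 + $m$ (% missing values overall)] = 1 + 5(0.13).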


3.5 Can Repeated Imputations Under An Appropriate Bayesian Model Lead to Invalid Inferences?

Fay (1991, 1992, 1993; also see Kott 1992) claims that even when the model used to create repeated imputations is "appropriate" in some sense, the resulting repeated inferences can be invalid. I believe that this criticism is misguided for a variety of reasons, many of which have been exposed in the work and discussion of Meng (1994). Nevertheless, I will also briefly address the issue here because it has received attention, and because I believe my results, although less extensive and detailed than those of Meng (1994), will be more transparent to many readers.

The kernel of this criticism arises when an irrelevant predictor $X$ of outcome $Y$ is not used by the Bayesian multiple imputer to create repeated imputations, but is used by the ultimate analyst to define estimands (a case already introduced here in Sec. 2.7 because of historical discussion of it). More specifically, suppose $X$ is dichotomous, $(a, b)$, and $Y$ is normal (0, 1) and independent of $X$ in a population in which $X = a$ units and $X = b$ units are equally represented. Suppose a stratified random sample of size $2n$ is taken where there are $n$ units with $X = a$ and $n$ units with $X = b$, and further suppose that nonresponse is simply like another level of stratified random sampling that results in $n_1$ respondents and $n_0$ nonrespondents in both the $X = a$ sample and in the $X = b$ sample. The estimands are: $\bar{Y} = (\bar{Y}_a + \bar{Y}_b)/2$, the population mean value of $Y$; and $D = (\bar{Y}_a - \bar{Y}_b)$, the population difference of means, which equals zero. The obvious complete-data estimators are $\bar{y} = (\bar{y}_a + \bar{y}_b)/2$ for $\bar{Y}$ and $d = (\bar{y}_a - \bar{y}_b)$ for $D$, with associated standard complete-data variance estimates $U_{\bar{y}}$ and $U_d$, respectively, which result in randomization-valid complete-data inferences, at least for large $n$.

Now suppose repeated imputations for the $2n_0$ nonrespondents are generated using a fully exchangeable normal model based on the $2n_1$ respondents. That is, the imputations for both the $X = a$ and $X = b$ units will be centered at the observed grand mean $\bar{y}_{obs}$ rather than at the separate observed sample means $\bar{y}_{obs,a}$ and $\bar{y}_{obs,b}$. It is easy to show that the multiple imputation method is proper for $(\bar{y}, U_{\bar{y}})$, but it is improper for $(d, U_d)$: (1) the expectation of $\bar{d}_\infty = (n_1/n)(\bar{y}_{obs,a} - \bar{y}_{obs,b})$ over the response mechanism, that is given $(X, Y, I)$, does not equal $d$, but $(n_1/n)d$, thereby not satisfying (2.6) [nor (4.2.5) in Rubin 1987]; and (2) $B_\infty$, the variance of the repeated values of $d_{*l}$ across repeated imputations with fixed $n_0$, is greater than the variance of $\bar{d}_\infty$ over the response mechanism by the factor $n/n_1$, thereby not satisfying (2.8) [nor (4.2.6)-(4.2.7) in Rubin 1987].

3.6 Superefficient Imputations

In this example, the imputations are "superefficient" from the perspective of the data analyst interested in estimating $D$ because the imputations use "extra" information, specifically the knowledge that the distribution of $Y$ given $X = a$ is identical to the distribution of $Y$ given $X = b$. For a more familiar example of superefficiency, if the data are normal with mean zero, then half the sample mean is a superefficient estimate of the population mean. The situation involving superefficient imputations is more subtle, however. Suppose that we have a multiply-imputed data set, but subsequently the data collector brings forth values of the missing data, thereby allowing us to calculate $\hat{Q}$ and $U$. Presumably, we would then be inclined to base our inferences for $Q$ on $(\hat{Q}, U)$ and discard the imputations. If the imputations are superefficient, however, the standard complete-data procedure can be improved by using information in the imputations about $Q$ beyond that in $\hat{Q}$, information supplied by the imputer (e.g., in the canonical example, the knowledge that $X = a$ units and $X = b$ units have the same population distribution of $Y$). The imputations are "strongly superefficient" if $\bar{Q}_\infty$ is at least as good an estimate as $\hat{Q}$ despite the existence of missing data, that is, despite the fact that $\bar{Q}_\infty$ is not identical to $\hat{Q}$ in the formal sense that

$$\mathrm{var}(\bar{Q}_\infty - \hat{Q} \mid X, Y) > 0, \tag{3.1}$$

where with vector $Q$, ">" means at least one eigenvalue > 0.

More precisely, a multiple imputation procedure is strongly superefficient for the complete-data statistic $\hat{Q}$ if, first, $\bar{Q}_\infty$ and $\hat{Q}$ estimate the same estimand, that is, the procedure is "first-moment proper" for $\hat{Q}$,

$$E(\bar{Q}_\infty \mid X, Y) = E(\hat{Q} \mid X, Y), \tag{3.2}$$

and second, $\bar{Q}_\infty$ has no larger variance than the complete-data estimate itself:

$$\mathrm{var}(\bar{Q}_\infty \mid X, Y) \le \mathrm{var}(\hat{Q} \mid X, Y), \tag{3.3}$$

where with vector $Q$, (3.3) compares the generalized eigenvalues of the left side with respect to the right side. In the canonical example of Section 3.5, the imputations are strongly superefficient for $\hat{Q} = d$ because $\bar{Q}_\infty = \bar{d}_\infty$ satisfies both (3.2) and (3.3).

The general definition of superefficiency concerns the existence of imputations that make $\bar{Q}_\infty$ informative about $Q$ even with knowledge of $\hat{Q}$. Bayesian models can be superefficient when they incorporate appropriate smoothing information in their distributional assumptions. The resultant draws of $Y_{mis}$ cannot be sharper than those from the parent distribution and still lead to valid inferences for a variety of estimands, but multiple imputations of $Y_{mis}$ can be more efficient than the one true value of $Y_{mis}$ because of their multiplicity. For instance, in the canonical example, suppose that the multiple imputation procedure drew the group difference effect from a normal distribution centered at $\tfrac{1}{2}(\bar{y}_{obs,a} - \bar{y}_{obs,b})$ rather than at $\bar{y}_{obs,a} - \bar{y}_{obs,b}$ (as when this effect is directly estimated from the data) or at zero (as with the strongly superefficient imputations of sec. 3.5). These imputations would effectively be additional data values, which could contribute to a better estimate of $D$, even if the actual missing values were found. The general definition of superefficient imputations for $\hat{Q}$ replaces (3.3) with

$$\mathrm{cov}(\bar{Q}_\infty, \hat{Q} \mid X, Y) < \mathrm{var}(\hat{Q} \mid X, Y); \tag{3.4}$$

strong superefficiency implies superefficiency because (3.1) and (3.3) imply (3.4).
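The first-moment claims in the canonical example of Section 3.5 can be checked with a short calculation. The sketch below rests on one assumption made here for illustration: as $m \to \infty$, the average imputed value for every nonrespondent equals the observed grand mean $\bar{y}_{obs}$. The symbols $\bar{y}_{\infty,a}$ and $\bar{y}_{\infty,b}$ denote the infinite-$m$ completed-data means in the two groups, and $n = n_1 + n_0$.

```latex
\begin{aligned}
\bar{y}_{\infty,a} &= \frac{n_1 \bar{y}_{obs,a} + n_0 \bar{y}_{obs}}{n},
\qquad
\bar{y}_{\infty,b} = \frac{n_1 \bar{y}_{obs,b} + n_0 \bar{y}_{obs}}{n}, \\
\bar{d}_\infty &= \bar{y}_{\infty,a} - \bar{y}_{\infty,b}
  = \frac{n_1}{n}\left(\bar{y}_{obs,a} - \bar{y}_{obs,b}\right), \\
\bar{y}_\infty &= \tfrac{1}{2}\left(\bar{y}_{\infty,a} + \bar{y}_{\infty,b}\right)
  = \frac{n_1 + n_0}{n}\,\bar{y}_{obs} = \bar{y}_{obs}.
\end{aligned}
```

Because the respondents in each group are a simple random subsample of the sampled units, $E(\bar{y}_{obs,a} \mid X, Y, I) = \bar{y}_a$ and likewise for group $b$. It follows that $E(\bar{d}_\infty \mid X, Y, I) = (n_1/n)d$, which is not $d$, while $E(\bar{y}_\infty \mid X, Y, I) = \bar{y}$, consistent with the procedure being first-moment proper for $\bar{y}$ but not for $d$.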


Table 2. Analysis of Results from Fay (1992)-Nominal 95% Intervals and


Multiple imputation (m = 10) Rao and Shao E(To IX, Y) > var(Qo IX, Y).
Estimated Estimated The resultfollows because(3.2) and(1.1) imply thatQOOis
Statistics Width coverage Width coverage approximatelyunbiasedfor Q, and (1.2), (3.5), and (3.6)
Fay Table 1 Y .24 95% .26 95% together imply that UOO + Boo conservativelyestimates
Ya .33 97% .33 95% var(Qo IX, Y)
Fay Table 2 Y .24 95% .26 95% In the canonicalexample,the strong superefficiencyin
Ya .33 97% .33 95% the imputer'smodel for D implies that the data analyst's
Ys .33 95% .37 95% resultantrepeated-imputation intervalfor D will have at
NOTE: Y represents the sample mean; Ya, sample mean for class a, not used in imputation; least nominal coverageand hence will be confidence-valid;
Ys, sample mean in class s, used in imputation. whetherit is superioror inferiorto othervalid procedures
3.7 Confidence-Proper Multiple Imputation dependson its intervallength and the lengths of intervals
from otherconfidence-validprocedures.
We are now ready to provide an extended definition of The conclusion,howeveris as before:try to imputeus-
proper imputation and state an extended result concern- ing a Bayesianor approximateBayesianmodel that tracks
ing frequency validity. Although the conditions and conclu- the data and the posited responsemechanism-if you do
sion are similar to the major conclusions of Meng (1994), this and your complete-datainferenceis confidence-valid,
they are more direct and not as extensive since they avoid the result will be confidence-validrepeated-imputation in-
the issues of the ultimate user's incomplete-data procedure ferences no matter how complex the survey design.
and congeniality between the imputer's and analyst's mod-
els. The definition of "confidence-proper"multiple imputa- 3.8 Confidence ValidityVersus RandomizationValidity
tion is still in terms of the complete-data statistics (Q, U), in Canonical Example
but involves averaging over both the response mechanism
and the sampling mechanism and allows overestimation of Fay (1991, 1992) effectively claims that (a) wider 95%
between-imputation variability. confidenceintervals with exact 95% (asymptotic)cover-
A multiple imputation procedure is confidence-proper for age are superiorto (b) narrower95% confidenceintervals
the complete-data statistics (Q, U) if the imputations are with at least 95% coverage.Specifically,in the discussion
"first-moment proper" for (Q, U) in the sense of (3.2) and of tables 1 and 2 of Fay (1992), summarizedhere in Ta-
(3.5),
ble 2 after a bit of analysisto produceapproximatecover-
age, the claim is made that the Rao and Shao (1992) (RS)
E(UOO[X,Y)-E(UIX,Y), (3.5) procedure,using single-imputationhot deck, which results
and if Boo conservatively estimates the "excess variance" in uniformlywider intervalsbut with asymptoticcoverage
of QOOover Q: equalto the confidencecoefficient,is inferentiallysuperior
to the multiple-imputation version of the same procedure
E(Boo [X, Y) > var(Qoo [X, Y)- var(Q X, Y). (3.6) (MI), which results in uniformlynarrowerintervalswith
If a multiple imputation procedure is proper for (Q, U) it
asymptoticcoverageat least as greatas the confidenceco-
efficient.Both proceduresas reportedare confidencevalid,
is confidence proper for (Q, U); (2.6) implies (3.2), (2.7)
and I believe many statisticiansand scientistswould agree
implies (3.5), and (2.6) with (2.8) implies (3.6) with equality.
with Neyman'scriteriaandprefersharperintervalswith at
If a multiple imputation procedure is strongly superefficient
least 95% coverageratherthan wider intervalswith exact
for Q and first-moment proper for U, then it is confidence
95% coverage.
proper for (Q, U); (3.2) and (3.5) hold, and (3.3) implies that
Fay (1993) repeats the same criticism as Fay (1992)
(3.6) holds for any Boo. A superefficient multiple imputation
in more extreme examples (e.g., with up to 80% nonre-
procedure for Q is confidence proper for (Q, U) if it is first-
moment proper for U, and if
sponse)andlabels the confidencecoverageof the repeated-
imputationinference as "punishinglyconservative."But
E(Boo X,Y) > var(Qoo-QX,Y); (3.7) fromthe analyst'sperspective,punishinglyconservativerel-
ativeto what alternativeprocedure?Presumablyrelativeto
(3.2) and (3.5) hold, and (3.40 implies that the right side of
what would have happenedif the imputerhad done what
(3.7) is greater than the right side of (3.6), thereby satis-
the analystexpected,that is, had used the analyst'smodel
fying (3.6). A "second-moment proper" imputation method
for imputationratherthanbe superefficient.But thatwould
(Meng 1995, p. 548) is defined by (3.2), (3.5), and equality
haveled to widerintervalswith exactlynominalcoverage-
in (3.7).
a valid procedure,but less preferredaccordingto the Ney-
Analogous to Result 4.1 we have the following result:
man definitionand scientificcriteria,than narrowerinter-
vals with greatercoverage.
Result on Confidence Validity. If the complete-data in-
Of course, the confidence validity of the repeated-
ference based on (Q, U) is confidence valid and the multiple
imputationinference does not mean it yields the best
imputationprocedureis confidenceproperfor (Q,U), then
confidence-validinterval.By our mathematicalanalysisin
inferenceis confidencevalid with
the repeated-imputation
this simpleexamplewe knowthata shorter95%confidence
E(QOO|X,Y) Q intervalcan be found with exact 95% coverage.Also, be-

This content downloaded from 185.44.78.156 on Mon, 16 Jun 2014 10:45:11 AM


All use subject to JSTOR Terms and Conditions
Rubin: Multiple ImputationAfter 18+ Years 483

cause the procedureis confidencevalid but not random- wider along the dooaxis. Using Fay's ratioof samplingco-
izationvalid,inefficienciescan arisewhen combiningvari- variancesof Yoo,a and Yoo,b is equivalentto describingthe
ous estimatesusing the assignedprecisionsas weights.But differencebetweenthese two ellipsoidsby the ratio of dif-
findinga randomization-valid procedurein generalrequires ferences of variances(i.e., of eigenvalues)of 2-00 and doo
extraworkbeyondthe use of standardcomplete-datameth- in the two ellipsoids.The ratio of eigenvalues,or of vari-
ods, andis generallyimpossiblefor the ultimateuserunless ancesin any direction,is relevantto inference,but the ratio
extrainformationis conveyedby the data-baseconstructor. of differencesbetween eigenvalues,Fay's measure,is by
Furthermore,this whole issue seems relativelyunlikelyto itself, irrelevant.
arisein practicebecauseknowledgeof populationparame-
ters by the data-baseconstructormust be unusual.
4. COMPETING METHODS
3.9 Reaching Correct Conclusions When Evaluating If multiple imputationsare proper (confidenceproper)
Multiple Imputation under the posited model for nonresponse, then using
Severalpointsarecriticalin reachingcorrectconclusions the repeated-imputation rules for combiningcomplete-data
concerningmultipleimputation. statistics (Q, U) yields a randomization-valid(confidence-
First,when evaluatingrepeated-imputation inferencesby valid) final inference under the posited response mecha-
analysis or simulation,we need to monitor whether the nism, assumingthat the complete-datainferencewas valid
complete-datainferencewith no missingdatais valid:mul- in the absence of nonresponse.And this holds no matter
tiple imputationfor missing datacannotfix problemswith how complex the surveydesign. Moreover,the combining
complete-dataanalyses (e.g., poor coverage propertiesof rules can be implementedusing completely general soft-
the normalapproximationfor the sample mean with rare ware that is the same for all data sets and all complete-
binomial trials, where, for example, logit transformscan data analyses.Thus multipleimputationand the repeated-
lead to moreaccuratecomplete-datainferences);Rubinand imputationcombiningrules satisfy both the basic objective
Schenker(1986) and Ezzati-Riceet al. (1995) provideex- and the supplementalachievableobjective.
amplesof suchevaluations.Also note whenevaluatingthese Are therecompetingmethodsthat,in some cases at least,
procedureswith the numberof respondentsfixed (e.g., as also satisfy these objectives?Yes, but such competitorsap-
in sec. 4.3 andprob.4-18, in Rubin,1987)thatthe resultant pearto me in generalto have substantiallygreaterdeficien-
answersare conditionalon these quantities,which in prac- cies for the intendedsituationwith ultimateusers distinct
tice arerandom.Moreover,whendoingevaluationstreating entities from databaseconstructors.These competitorsare
the number of respondents as random, the theoretical variances of unbiased estimators can be undefined, since, for any finite sample size, with positive probability, all units will be nonrespondents; in such cases, it makes more sense to report coverage properties of interval estimates, which are defined (no respondents implies zero coverage) and the objects of statistical inference anyway.

Also important in reaching correct conclusions about multiple imputation is the treatment of estimated sampling variances as ancillary statistics rather than as estimates of scientific estimands. For example, Fay (1992) treated the ratio of repeated-sampling covariances as an estimand, and thereby was led to misunderstand the effect of superefficient imputation on inference. This illustrates why it is important not to confuse scientific estimands and ancillaries. In particular, Fay (1992, sec. 3) states, in the context of the canonical example of Section 3.5:

    ... the design-based approach gives 19 times the covariance of multiple imputation ... such a limitation, if general, imposes severe restrictions on the validity of the multiple imputation inferences for complex applications, such as Clogg et al. (1991).

Consider the true sampling variance-covariance ellipsoid for $(\bar{y}_\infty, d_\infty)$ under the exchangeable normal repeated-imputation scheme and the sampling ellipsoid for $(\bar{y}_\infty, d_\infty)$ assigned to it by the repeated-imputation inference; both have zero correlation because $\bar{y}_{\infty,a} = (2\bar{y}_\infty + d_\infty)$ and $\bar{y}_{\infty,b} = (2\bar{y}_\infty - d_\infty)$ have equal variance. The repeated-imputation-assigned ellipsoid is outside because it touches the correct one at the two points along the $\bar{y}_\infty$ axis but is

single imputation, multiple imputation with some analysis for the ultimate user other than the repeated-imputation inference, and weighting methods.

4.1 Desiderata for Creating Imputations, Single or Multiple

If imputations are to be used, then the estimate will be the value of Q calculated on the imputed data, or the average of multiple values $Q_{*l}$, $l = 1, 2, \ldots$. In broad generality, consistent estimation requires that the imputation method must be first-moment proper, in the sense of (3.2), for a variety of statistics Q, for example Q = sample mean, sample variance, median, 25th percentile, factor loadings, and these quantities within strata, domains, subdomains, etc. For this to hold for each Q in such a range, the imputation method, single or multiple, must in general not only track the posited response mechanism but also must be a random-draw method; otherwise, it cannot be first-moment proper for $Q = \bar{y}$, $Q = s^2$, Q = 25th percentile, etc.

Consequently, any imputation method that satisfies the validity objective in generality must not only reflect the underlying response mechanism but must also be a random-draw method. Nonrandom-draw methods can be applied in special cases but require special analysis techniques. The most careful work on this topic of deterministic imputation of which I am aware concerns imputing probabilities for missing dichotomous variables (Schenker 1989; Schafer and Schenker 1991), and this work reveals the substantial extra effort that is needed, even in a special situation.

When an imputation method is a random-draw method, then multiple draws will automatically provide the basis for improved efficiency of estimation and more accurate inference, and are no more difficult to obtain than a single random-draw imputation. Thus multiple imputation is more attractive than single imputation, and the larger m the better, no matter how variances are to be calculated from the multiply-imputed data set. Little (1988) provides additional discussion of desiderata for creating imputations, which is consistent with this position.

4.2 Imputation in Random Independent Replicates: An Alternative

Suppose the sampling mechanism is such that the primary sampling units can be randomly divided into K replicate groups, each with the same sample design. Then with complete data, Q can be calculated in each replicate, and a valid (K - 1) df estimate of the variance of the average of the K independent estimates, $\bar{Q} = \sum Q / K$, can be found from their sample variance divided by K. This can be called the "random group estimator" (Wolter 1985).

This approach has been used with single imputation for missing data; I believe the method is appropriately attributed to Morris Hansen, but I cannot find the appropriate early reference (a relatively recent reference is Kalton 1983, pp. 112-123). Random-draw imputations are made in the K independent random replicates of the survey units, so that the variance of K values of Q on the imputed data is a K - 1 df estimated variance of Q (or Q calculated on the full imputed data); this estimate reflects not only sampling variability but also increased variance due to imputation. In personal communications, Hansen realized the propriety of the use of multiple imputations within each independent replicate to reduce variance due to imputation, and realized the potential tremendous loss of efficiency by doing the imputations independently in each independent replicate. In Rubin (1990), when discussing a related approach with energy data (Burns 1990), I called the resulting estimate of uncertainty an estimate of "evaluation variance" in contrast to "inferential variance" because it evaluates the variability of the estimation procedure, perhaps including excessive variability due to an inefficient procedure used to handle missing data.

Assuming the requisite richness of survey data to allow the independent replicate procedure to be applied and assuming that the imputation method is first-moment proper, Hansen's method almost satisfies the basic and validity objectives, without needing the second-moment conditions involved in proper or confidence-proper imputations; I say "almost" because the ultimate user must be willing to forgo variance estimation aspects of the complete-data analysis programs, and rely on the potentially far less efficient variance estimation via the replicates, which does not fully satisfy the basic objective. Nevertheless, the lack of need for second-moment conditions for valid variance estimation is a potential advantage relative to relying on the repeated-imputation inference. Some experience suggests, however, that these potential benefits often cannot be realized because two kinds of inefficiencies arise. First, because imputations are required to be independent within each of K replicates, there is 1/Kth the amount of data used for modeling imputations as actually available. Second, small K implies very poor variance estimation, and often the largest possible K is truly 1, so that actual independent replicates cannot be used when trying to apply the method.

I believe that Hansen agreed that the independent replicate approach was generally inadequate and that the Bayesian multiple imputation approach is necessary to handle missing data in surveys:

    Olkin: Have you become involved with Bayesian statistics or other techniques developed within the last ten years?
    Hansen: Not really. I guess I endorse and approve the kind of thinking that Don Rubin has been doing.
    Olkin: With respect to missing observations?
    Hansen: Yes, in missing observations. Sometimes it's necessary to do modeling in sample surveys, where probability sampling methods aren't applicable as in the case of the imputation for nonresponse. We certainly have been involved in such methods. In general, I can't say that we have been working in that area very much. However we are interested in the potential in that setting.
    Olkin: Now, Morris to switch topics somewhat ...
    (Hansen 1987, p. 171)

4.3 Imputations in Hypothetical Independent Replicates: Another Alternative

One way to try to get around these inefficiencies is to try to do first-moment proper (multiple) imputation in K nonindependent samples, i.e., jackknife or bootstrap replicates (e.g., Burns 1990; Efron 1994). This is an interesting and useful idea, but it has limitations in our context. If the data-base constructor is to provide the imputations for the ultimate user, there must be a set of imputations for each of the K jackknife or bootstrap samples chosen by the data-base constructor, where K should be substantial for stable variance estimation (e.g., 100 or more). Moreover, if K = 100 replicate data sets are considered too many to provide, then the data-base constructor must include with the data base the software to be applied by the ultimate user to create the imputations on each of the ultimate user's jackknife or bootstrapped samples; in this case, superior imputations based on confidential or detailed information must be forgone. Also, as with independent replication, the basic objective is not fully satisfied for point or variance estimation, and more work is required of the ultimate user than with a multiply-imputed data set. Moreover, the variance estimation can be inaccurate inferentially, reflecting excessive procedural variance (see, e.g., Rao and Shao 1992, p. 813, and Burns 1990; incidentally, subsequently Burns found that multiple imputation worked well relative to replicate imputation, Burns 1991, 1993).

If neither the data-base constructor's bootstrap/jackknife imputations nor the data-base constructor's imputation software is delivered to the ultimate user, this approach effectively throws the entire problem into the ultimate user's lap, who may well do some sort of misguided imputation,
which is not even first-moment proper, take bootstrap or jackknife replicates and assume inferential validity despite badly biased estimates of scientific estimands (see, e.g., Rubin 1994, and Efron 1994, for differing views concerning the acceptability of such answers).

4.4 Other Imputation-Based Procedures

Rao and Shao (1992) provide a careful analysis of how to use the jackknife to adjust analyses when missing data have been singly imputed by a particular hot-deck procedure. This addresses an important problem because in current practice many public-use files have been singly imputed by the hot deck. But the ultimate user bears the burden of substantial extra work, because "special computations have to be performed to adjust the imputed values for each pseudo-replicate before applying the standard jackknife variance formula" (Rao and Shao 1992, p. 813), and new mathematical analysis and new software apparently must be developed for each new distinct situation (estimator × missing data pattern × survey design × imputation method). Consequently, this approach, at least at present, fails to satisfy the basic objective of relying only on complete-data analyses and general routines.

Fay's work is something of a moving target, with a variety of older and newer suggestions, which are described with little generality and under special assumptions (e.g., missing completely at random). For example, Fay (1996) seems now to accept multiple imputation as being superior to single imputation (and perhaps to standard weighting adjustments) but advocates creating "improper" multiple imputations and recommends analysis by weighting the data from the completed units in one analysis rather than using the repeated-imputation inference. Recommending creating "improper" multiple imputations is suggesting what we should not do, but it is not a prescription for doing anything in particular. Presumably, it refers to first-moment proper multiple imputation (because without this even point estimation can be badly biased) but without concern for the second-moment conditions (e.g., fixing parameters at point estimates rather than drawing them from their posterior distributions, as in Rubin 1987, ex. 4.1, prob. 13 in chap. 1, and prob. 46 in chap. 5). But this is not even defined in multistage complex surveys with clusters where valid imputation models need to be hierarchical, typically with levels of parametric structure: I know what it means to try to be proper in complex surveys by following a Bayesian analysis with variables for the survey structure included in the modelling, but I do not know what the advice to "not do this" means. Also consider the example in Rubin (1983, sec. 2.8, also described in Gelman, Carlin, Stern and Rubin 1995, chap. 15), which stimulated the methods in Clogg et al. (1991) and illustrates the need to be Bayesian and include variability in parameter estimation in order to obtain valid frequency inference.

Finally, consider the suggestion in Fay (1996) that the analysis of a multiply-imputed data set should proceed by replacing each incomplete unit with multiply-imputed versions of that unit's data with split weights. I considered and discarded this idea in Rubin (1977b; also see Rubin 1987, prob. 4-29, and the rejoinder in Meng 1994), because it seemed to have merit as a method of analysis only in simple cases (see, e.g., Little 1979). For valid analysis in general, I believe that such an approach requires extra routines for different complete-data analyses, and so fails to satisfy the basic objective. As a method for storing the multiply-imputed data sets, it can take substantially more memory than the standard form because all the observed data for units with some missing data are stored many times instead of just once. Nevertheless, I would certainly be interested in seeing any work that suggests I rejected this idea prematurely, and that in fact, it can be made to work for any posited response mechanism, complex survey, and complete-data analysis, with only the addition of completely general macros.

4.5 Conclusions Regarding Alternative Imputation Strategies

Given a situation with a single imputation method that is first-moment proper for many statistics, it is almost certainly a random-draw method, and then multiple imputations are easily created, and these are the basis of more accurate inference. Then the only reason not to create them and recommend to the ultimate user that the multiply-imputed data be analyzed using repeated-imputation combining rules is fear that the imputation method, although first-moment proper, is not fully proper for some analyses. If it is not proper but is confidence proper, the only legitimate fear is lost power and overcoverage, as due to superefficiency. But then another method is needed for the ultimate user to recover such superefficiency; I believe special methods for different situations. Are such special efforts needed? All realistic examples I know suggest that in practice the overcoverage is slight and a minor issue relative to omitted variables that can lead all methods astray because of biased estimation and undercoverage. General theory and examples suggest that second-moment properness of Bayesianly-motivated multiple imputation procedures typically follows automatically if the method is first-moment proper (see, e.g., Huber 1976, and results referenced in Rubin 1987, sec. 2.10). Nevertheless, more work on this issue is desirable and could make general theoretical contributions to understanding the robustness of Bayesian inference.

My conclusion when doing imputation is to do multiple imputation under carefully chosen models and use the repeated-imputation inference for analysis. Of course, more theoretical development is still desirable on such issues as: implicit imputation models that reflect both the uncertainty of parameter estimation and the uncertainty of the values to impute given a specific predictive fit (van Buuren, van Rijckevorsel, and Rubin 1993); models for sequential imputation (Kong, Liu, and Wong 1994; Liu and Chen 1995); the use of importance weights (Meng 1994); improved small-m combining rules in especially difficult cases (Barnard 1995); and the development of realistic nonignorable models for particular settings.

4.6 Weighting Adjustments

Finally, consider weighting adjustments for nonresponse, which in principle can be a very effective class of methods for obtaining approximately unbiased estimates. Each unit's weight is the inverse probability of observing that unit's pattern of missing data given (X, Y) information. If the patterns of missing data for the units are created by design, as with matrix sampling, these probabilities and thus the weights are known. When these patterns of missing data are affected by nonresponse, the nonresponse probabilities need to be estimated. Although this estimation can be undertaken by the data-base constructor, typically it is only done assuming the simplest case of nonresponse where the units are either respondents (with all of Y observed) or nonrespondents (with all of Y missing); in this case, the nonrespondents can be discarded, and (approximately) unbiased estimates can be obtained from the respondents and their weights, assuming they accurately reflect both the sampling and nonresponse mechanisms.

Several issues arise with the use of weighting adjustments. First, even in the simplest case of unit nonresponse, where the shared data base of respondents is fully observed, many ultimate users' complete-data analyses do not allow for sampling weights. Second, even with complete-data analyses that can deal with sampling weights, the construction of intervals and p-values that validly account for the fact that nonresponse adjustments in the weights are estimated from data are not immediate from complete-data analyses. Third, with general patterns of nonresponse, special analysis methods need to be developed and special software needs to be written (see Little 1988, sec. 5.1, for the case of monotone missing data), but attempting to do this in a manner that allows the use of standard complete-data software leads to ad hoc approaches such as "complete cases" and "available cases," which we have already rejected as unacceptable general solutions. These three issues imply that in general, weighting adjustments do not satisfy the objectives of allowing ultimate users to apply standard complete-data software to shared data bases to obtain valid inference.

A fourth issue with such weighting adjustments is that they are focused on unbiased estimation and are essentially blind to efficiency concerns. In most well-designed surveys, the planned pattern of missing data is such that efficient estimates are expected to result from standard weighted estimates. But nonrespondents do not necessarily create missing data in such a benign way, and so standard weighted estimates, even when approximately unbiased, can have excessive variability. Consider dealing with censored data by weighting: data beyond or approaching the censoring point have zero or very small probabilities of being observed, and so either cannot be dealt with by weighting or imply a few observations with dominant weights. Weighting by inverse probabilities cannot create estimates outside the convex hull of the observed data, and estimates involving weights near the boundary have extremely large variance.

For these reasons, weighting, although theoretically attractive in an asymptotic sense, has never really been claimed to be a complete practical solution to the problem of missing data in shared data bases; recall Hansen's (1987) comments reported in Section 4.2.

4.7 Concluding Comparative Comments

Multiple imputation is doing well, perhaps even flourishing, as documented by recent sessions at the annual meetings of the American Statistical Association and other professional associations (e.g., the International Statistical Institute, American Medical Informatics Association) and by the variety of recent publications documenting its applicability and extending its theory. It is even becoming so popular that the words "multiple imputation" can appear in the title of an article with no reference to a publication by me or any of my coauthors (e.g., James 1995). This change is occurring for two basic reasons. First, multiple imputation is substantially easier for the ultimate user than any other current method that can satisfy the dual objectives of reliance only on complete-data methods and general validity of inference. And second, it is becoming relatively easy for the data collector to create multiply-imputed files using modern computing hardware and accompanying algorithmic developments for Bayesian models. Of course, the development of simply-used appropriate software for creating multiple imputations and analyzing multiply-imputed data is still badly needed, but fortunately progress is taking place in many places (e.g., Schafer 1996; Liu 1995; and van Buuren, van Mulligen, and Brand 1995). I expect that with the availability of this software, multiple imputation will become the standard method for handling missing data in public-use data sets.

As an anonymous referee of this paper wrote: "Multiple imputation is more flexible than replication and reweighting for the analysis of survey data when there are complex patterns of nonresponse. Case closed."

[Received August 1993. Revised June 1995.]

REFERENCES

Barnard, J. (1995), "Cross-Match Procedures for Multiple-Imputation Inference: Bayesian Theory and Frequentist Evaluation," unpublished doctoral thesis, University of Chicago, Dept. of Statistics.
Burns, E. M. (1990), "Multiple and Replicate Item Imputation in a Complex Sample Survey," in Proceedings of the Bureau of the Census Annual Research Conference.
--- (1991), "Multiple Imputation in the 1989 Commercial Buildings Energy Consumption Survey: Building Characteristics," CBECS Technical Note 86, U.S. Department of Energy.
--- (1993), "Assessment of Energy Use in Multibuilding Facilities," Report DOE/EIA-0555(93)/1, U.S. Department of Energy.
Chand, N., and Alexander, C. H. (1994), "Imputing Income For An N-Person Consumer Unit," Bureau of the Census paper presented at the American Statistical Association Annual Meeting, Toronto.
Clogg, C., Rubin, D. B., Schenker, N., Schultz, B., and Weidman, L. (1991), "Multiple Imputation of Industry and Occupation Codes in Census Public-Use Samples Using Bayesian Logistic Regression," Journal of the American Statistical Association, 86, 68-78.
Efron, B. (1994), "Missing Data, Imputation, and the Bootstrap" (with discussion), Journal of the American Statistical Association, 89, 463-478.
Efron, B., and Tibshirani, R. (1993), An Introduction to the Bootstrap, London: Chapman and Hall.
Eltinge, J. L., Yansaneh, I. S., and Paulin, G. D. (1994), "Assessment of Reported Differences Between Expenditures and Low Incomes in the
U.S. Consumer Expenditure Survey," paper presented at the American Statistical Association Annual Meeting, Toronto.
Ezzati-Rice, T. M., Khare, M., and Schafer, J. L. (1993), "Multiple Imputation of Missing Data in NHANES III," paper presented at the American Statistical Association Annual Meeting, San Francisco.
Ezzati-Rice, T. M., Johnson, W., Khare, M., Little, R. J. A., Rubin, D. B., and Schafer, J. L. (1995), "A Simulation Study to Evaluate The Performance of Multiple Imputations in NCHS Health Examination Survey," in Proceedings of the Bureau of the Census Eleventh Annual Research Conference, pp. 257-266.
Fahimi, M., and Judkins, D. (1993), "Serial Imputation of NHANES III With Mixed Regression and Hot-Deck Techniques," paper presented at the American Statistical Association Annual Meeting, San Francisco.
Fay, R. E. (1990), "VPLX: Variance Estimation for Complex Surveys," in Proceedings of the Survey Research Methods Section, American Statistical Association, pp. 266-271.
--- (1991), "A Design-Based Perspective on Missing Data Variance," in Proceedings of the 1991 Annual Research Conference, U.S. Bureau of the Census, pp. 429-440.
--- (1992), "When are Inferences From Multiple Imputation Valid?," in Proceedings of the Survey Research Methods Section, American Statistical Association, pp. 227-232.
--- (1993), "Valid Inferences From Imputed Survey Data," paper presented at the Annual Meeting of the American Statistical Association, San Francisco.
--- (1996), "Alternative Paradigms for the Analysis of Imputed Survey Data," Journal of the American Statistical Association, this issue, 490-498.
Fisher, R. A. (1925), "Theory of Statistical Estimation," Proceedings of the Cambridge Philosophical Society, 22, 700-725.
--- (1934), Discussion of "On the Two Different Aspects of the Representative Method: The Method of Stratified Sampling and the Method of Purposive Selection," by J. Neyman, Journal of the Royal Statistical Society, Ser. A, 97, 614-619.
Freedman, V. (1990), "Using SAS to Perform Multiple Imputation," Discussion Paper Series UI-PSC-6, The Urban Institute, Washington, DC.
Gelfand, A. E., and Smith, A. F. M. (1990), "Sampling-Based Approaches to Calculating Marginal Densities," Journal of the American Statistical Association, 85, 398-409.
--- (1992), "Bayesian Statistics Without Tears: A Sampling-Resampling Perspective," The American Statistician, 46, 84-88.
Gelman, A., Carlin, J., Stern, H., and Rubin, D. B. (1995), Bayesian Data Analysis, New York: Chapman and Hall.
Gelman, A., and Rubin, D. B. (1992), "Inference From Iterative Simulation Using Multiple Sequences" (with discussion), Statistical Science, 7, 457-472.
Hansen, M. H. (1987), "A Conversation with Morris Hansen" (I. Olkin, interviewer), Statistical Science, 2, 162-179.
Heitjan, D. F., and Rubin, D. B. (1990), "Inference From Coarse Data via Multiple Imputation With Application to Age Heaping," Journal of the American Statistical Association, 85, 304-314.
Herzog, T., and Lancaster, C. (1980), "Multiple Imputation Modeling for Individual Social Security Benefit Amounts," in Proceedings of the Survey Research Methods Section, American Statistical Association, pp. 398-403.
Huber, P. J. (1976), "The Behavior of Maximum Likelihood Estimates Under Nonstandard Conditions," in Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Berkeley: University of California Press, pp. 221-233.
James, I. R. (1995), "A Note on the Analysis of Censored Regression Data by Multiple Imputation," Biometrics, 51, 358-362.
Johnson, C. L., Curtin, L. R., Ezzati-Rice, T. M., Khare, M., and Murphy, R. S. (1993), "Single and Multiple Imputation: The NHANES Perspective," paper presented at the Annual Meeting of the American Statistical Association, San Francisco.
Kalton, G. (1983), Compensating for Missing Survey Data, Ann Arbor, MI: Institute of Social Research, University of Michigan.
Kennickell, A. B. (1991), "Imputation of the 1989 Survey of Consumer Finances: Stochastic Relaxation and Multiple Imputation," in Proceedings of the Survey Research Methods Section of the American Statistical Association, pp. 1-10.
Kong, A., Liu, J., and Wong, W. H. (1994), "Sequential Imputation and Bayesian Missing Data Problems," Journal of the American Statistical Association, 89, 278-288.
Kott, P. S. (1992), "A Note on a Counter-Example to Variance Estimation Using Multiple Imputation," technical report, U.S. National Agriculture Service.
Krewski, D., and Rao, J. N. K. (1981), "Inference From Stratified Samples: Properties of the Linearization, Jackknife, and Balanced Repeated Replication Methods," The Annals of Statistics, 9, 1010-1019.
Lehmann, E. L. (1959), Testing Statistical Hypotheses, New York: John Wiley.
Li, K. H., Meng, X. L., Raghunathan, T. E., and Rubin, D. B. (1991), "Significance Levels From Repeated p Values With Multiply-Imputed Data," Statistica Sinica, 1, 65-92.
Li, K. H., Raghunathan, T. E., and Rubin, D. B. (1991), "Large Sample Significance Levels from Multiply-Imputed Data Using Moment-Based Statistics and an F Reference Distribution," Journal of the American Statistical Association, 86, 1065-1073.
Little, R. J. A. (1979), "Maximum Likelihood for Multiple Regression With Missing Values: A Simulation Study," Journal of the Royal Statistical Society, Ser. B, 41, 76-87.
--- (1988), "Missing Data in Large Surveys" (with discussion), Journal of Business and Economic Statistics, 6, 287-301.
Little, R. J. A., and Rubin, D. B. (1987), Statistical Analysis with Missing Data, New York: John Wiley.
--- (1993), "Assessment of Trial Imputations for NHANES III," project report, Datametrics Research, Inc.
Liu, C., and Rubin, D. B. (1996), "M: Multiple Imputation System," report, Datametrics Research Inc.
Liu, J. S., and Chen, R. (1995), "Blind Deconvolution via Sequential Imputations," Journal of the American Statistical Association, 90, 567-576.
Meng, X. L. (1994), "Multiple Imputation With Uncongenial Sources of Input" (with discussion), Statistical Science, 9, 538-574.
Meng, X. L., and Rubin, D. B. (1992), "Performing Likelihood Ratio Tests With Multiply Imputed Data Sets," Biometrika, 79, 103-111.
Miller, R. G. (1974), "The Jackknife - A Review," Biometrika, 61, 1-17.
Mislevy, R. J., Johnson, E. G., and Muraki, E. (1992), "Scaling Procedures in NAEP," Journal of Educational Statistics, 17, 131-154.
Neyman, J. (1934), "On the Two Different Aspects of the Representative Method: The Method of Stratified Sampling and the Method of Purposive Selection," Journal of the Royal Statistical Society, Ser. A, 97, 558-606.
Paulin, G. D., and Ferraro, D. L. (1994), "Do Expenditures Explain Income? A Study of Variables for Income Imputation," paper presented at the Annual Meeting of the American Statistical Association, Toronto.
Rao, J. N. K., and Shao, J. (1992), "Jackknife Variance Estimation With Survey Data Under Hot Deck Imputation," Biometrika, 79, 811-822.
Rubin, D. B. (1977a), "Formalizing Subjective Notions About the Effect of Nonrespondents in Sample Surveys," Journal of the American Statistical Association, 72, 538-543.
--- (1977b), "The Design of a General and Flexible System for Handling Non-Response in Sample Surveys," working document prepared for the U.S. Social Security Administration.
--- (1978), "Multiple Imputations in Sample Surveys - A Phenomenological Bayesian Approach to Nonresponse," in Proceedings of the Survey Research Methods Section, American Statistical Association, pp. 20-34. Also in Imputation and Editing of Faulty or Missing Survey Data. U.S. Department of Commerce, pp. 1-23.
--- (1980), "Handling Nonresponse in Sample Surveys by Multiple Imputations," monograph, U.S. Department of Commerce, Bureau of the Census.
--- (1981), "The Bayesian Bootstrap," The Annals of Statistics, 9, 130-134.
--- (1983), "Progress Report on Project For Multiple Imputation of 1980 Codes," manuscript distributed to the U.S. Bureau of the Census, the U.S. National Science Foundation, and the Social Science Research Foundation.
--- (1984), "Bayesianly Justifiable and Relevant Frequency Calculations for the Applied Statistician," The Annals of Statistics, 12, 1151-1172.
--- (1987), Multiple Imputation for Nonresponse in Surveys, New York: John Wiley.
--- (1988), "Using the SIR Algorithm to Simulate Posterior Distributions" (with discussion), in Bayesian Statistics 3, eds. J. M. Bernardo, M. H. DeGroot, D. V. Lindley, and A. F. M. Smith, New York: Oxford University Press, pp. 395-402.
--- (1990), "Imputation Procedures and Inferential Versus Evaluative Statistical Statements," in Proceedings U.S. Census Bureau Sixth Annual Research Conference, pp. 676-679.
--- (1993), "Satisfying Confidentiality Constraints Through the Use of Synthetic Multiply-Imputed Micro-Data," Journal of Official Statistics, 9, 461-468.
--- (1994), Comments on "Missing Data, Imputation, and the Bootstrap" by B. Efron, Journal of the American Statistical Association, 89, 485-8.
Rubin, D. B., and Schenker, N. (1986), "Multiple Imputation for Interval Estimation From Simple Random Samples With Ignorable Nonresponse," Journal of the American Statistical Association, 81, 366-374.
--- (1987), "Interval Estimation From Multiply Imputed Data: A Case Study Using Agriculture Industry Codes," Journal of Official Statistics, 3, 375-387.
Schafer, J. L. (1996), Analysis of Incomplete Multivariate Data by Simulation, New York: Chapman and Hall.
Schafer, J. L., and Schenker, N. (1991), "Variance Estimation With Imputed Means," in Proceedings of the Survey Research Methods Section, American Statistical Association, pp. 696-701.
Schafer, J. L., Khare, M., Little, R. J. A., and Rubin, D. B. (1993), "Multiple Imputation of NHANES III," paper presented at the Annual Meeting of the American Statistical Association, San Francisco.
Schenker, N. (1989), "The Use of Imputed Probabilities for Missing Binary Data," in Proceedings of the 5th Annual Research Conference, Bureau of the Census, pp. 133-139.
Schenker, N., Treiman, D. J., and Weidman, L. (1993), "Analyses of Public Use Decennial Census Data With Multiply Imputed Industry and Occupation Codes," Applied Statistics, 42, 545-556.
Tanner, M. A., and Wong, W. H. (1987), "The Calculation of Posterior Distributions by Data-Augmentation" (with discussion), Journal of the American Statistical Association, 82, 528-550.
Treiman, D. J., Bielby, W., and Cheng, M. (1989), "Evaluating a Multiple-Imputation Method for Recalibrating 1970 U.S. Census Detailed Industry Codes to the 1980 Standard," Sociological Methodology, 18, 309-345.
van Buuren, S., van Mulligen, E. M., and Brand, J. P. L. (1995), "Omgaan Met Ontbrekende Gegevens in Statistische Databases: Multiple Imputatie in HERMES," Kwantitatieve Methoden, 50, 503-504.
van Buuren, S., van Rijckevorsel, J. L. A., and Rubin, D. B. (1993), "Multiple Imputation by Splines," in Bulletin of the International Statistical Institute, Contributed Papers II, 503-504.
Weld, L. (1987), "Significance Levels from Public Use Data With Multiply-Imputed Industry Codes," unpublished doctoral thesis, Harvard University, Dept. of Statistics.
Wolter, K. M. (1985), Introduction to Variance Estimation, New York: Springer-Verlag.

BIBLIOGRAPHY: SOME OTHER WORK INVOLVING MULTIPLE IMPUTATION

Brownstone, D. (1991), "Multiple Imputations for Linear Regression Models," Working Paper MBS 91-37, University of California, Irvine, Institute for Mathematical Behavioral Sciences.
Brownstone, D., and Valletta, R. (1996), "Modeling Earnings Measurement Error: A Multiple Imputation Approach," The Review of Economics and Statistics, in press.
Chao, M. T. (1994), "A Short Review of Recent Survey Methods for Nonresponse," Journal of the Chinese Statistical Association, 32, 169-177.
Chen, R., and Liu, J. S. (1994), "Predictive Updating Methods With Application to Bayesian Classification," Journal of The Royal Statistical Society, Ser. B, 58, 2.
Clogg, C. C., Rubin, D. B., Schenker, N., Schultz, B., and Weidman, L. (1991), "Multiple Imputation of Industry and Occupation Codes in Census Public-Use Samples Using Bayesian Logistic Regression," Journal of the American Statistical Association, 86, 68-78.
Dorey, F. J., Little, R. J. A., and Schenker, N. (1993), "Multiple Imputation for Threshold-Crossing Data With Interval-Censoring," Statistics in Medicine, 12, 1589-1603.
Ezzati-Rice, T. M., Khare, M., and Schafer, J. L. (1993), "Multiple Imputation of Missing Data in NHANES III," paper presented at the American Statistical Association Annual Meeting, San Francisco.
Fahimi, M., and Judkins, D. (1993), "Serial Imputation of NHANES III With Mixed Regression and Hot-Deck Techniques," paper presented at the American Statistical Association Annual Meeting, San Francisco.
Freedman, V., and Wolf, D. A. (1991), "Imputation of Mother's Marital Status in National Survey of Families and Households," Discussion Paper Series UI-PSC-8, The Urban Institute, Washington, DC.
Glynn, R., Laird, N., and Rubin, D. B. (1993), "The Performance of Mixture Models for Nonignorable Nonresponse With Follow Ups," Journal of the American Statistical Association, 88, 984-993.
Greenland, S., and Finkle, W. D. (1995), "A Critical Look at Basic Methods for Handling Missing Covariates in Epidemiologic Regression Analyses," American Journal of Epidemiology, 142, 1255-1264.
Journal of the American Statistical Association, 90, 54-63.
Heitjan, D. F. (1990), "Coping with Age Heaping and Digit Preference: A Multiple Imputation Approach," unpublished paper, Pennsylvania State University College of Medicine, Center for Biostatistics & Epidemiology.
Heitjan, D. F., and Landis, J. R. (1994), "Assessing Secular Trends in Blood Pressure: A Multiple-Imputation Approach," Journal of the American Statistical Association, 89, 750-759.
Heitjan, D. F., and Little, R. J. A. (1991), "Multiple Imputation for the Fatal Accident Reporting System," Applied Statistics, 40, 13-29.
Johnson, C. L., Curtin, L. R., Ezzati-Rice, T. M., Khare, M., and Murphy, R. S. (1993), "Single and Multiple Imputation: The NHANES Perspective," paper presented at the 1993 American Statistical Association Annual Meeting, San Francisco.
Johnson, E. G., and Zwick, R. (Eds.) (1988), Focusing the New Design: The NAEP 1988 Technical Report, Princeton, NJ: Educational Testing Service.
Kalleberg, A. L., Marsden, P. V., Aldrich, H. E., and Cassell, J. W. (1990), "Comparing Organizational Sampling Frames," Administrative Science
Belin, T. R., Diffendal, G. J., Mack, S., Rubin, D. B., Schafer, J. L., and Quarterly, 35, 658-688.
Zaslavsky, A. M. (1993), "Hierarchical Logistic Regression Models for Kennickell,A. B. (1991),"Imputation
of the 1989Surveyof ConsumerFi-
Imputation of Unresolved Enumeration Status in Undercount Estima- nances:StochasticRelaxationandMultipleImputation," in Proceedings
tion" (with discussion), Journal of the American Statistical Association, of the Survey Research Methods Section, American Statistical Associa-
88, 1149-1166. tion,pp. 1-10.
Belin, T. R., and Rubin, D. B. (1990), "Calibration of Errors in Computer Land,K. C., and McCall,P. L. (1993), "Estimatingthe Effect of Nonre-
Matching for Census Undercount" (with discussion), in Proceedings of sponsein SampleSurveys:An Applicationof Rubin'sBayesianMethod
the Government Statistics Section, American Statistical Association, pp. to the Estimationof CommunityStandardsfor Obscenity." Sociological
124-131. Methods and Research, 21, 291-316.
Bloxum, B., Pashley, P. J., Nicewander, W. A., and Yan, D. (1995), "Link- Li, K. H. (1988),"HypothesisTestingin MultipleImputation,"unpublished
ing to a Large-Scale Assessment: An Empirical Evaluation,"Journal of doctoralthesis,Universityof Chicago.
Educational and Behavioral Statistics, 20, 1-26. Little, R. J. A., and Rubin,D. B. (1989), "The Analysis of Social Sci-
Boshuizen, H. C., Izaks, G. J., van Buuren, S., and Ligthart, G. J. (1995), ence Data
"Bloeddruk en Sterfte Bij Hoogbejaarden." TNO-rapport C95.014, With Missing Values," Sociological Methods and Research, 18, 292-
ISBN 90-6743-377-2. 326. Also in Modern Methods of Data Analysis (1990), eds. S. Fox and
J. S. Long,NewburyPark,CA: Sage Publications.
Brand, J., Gelsema, E. S., and van Buuren, S. (1995), "Verification of
Multiple Imputation by Simulation," submitted to SCAMC '95. (1993),"Assessmentof trialimputationsfor NHANESIII"project
Brand,J., vanBuuren,S., vanMulligen,E. M., Timmers,T., andGelsema, report,DatametricsResearch,Inc.
E. (1994), "MultipleImputationas a Missing Data Machine."in Pro- Liu, C. (1993), "Bartlett'sDecompositionof the PosteriorDistributionof
ceedings of the Eighteenth Annual Symposium on Computer Applica- the Covariancefor NormalMonotoneIgnorableMissingData,'Journal
tions in Medical Care (SCAMC),Philadelphia:Hanleyand Belfus, pp. of Multivariate Analysis, 46, 198-206.
303-307. (1994),"StatisticalAnalysisUsingthe Multivariatet Distribution'"

unpublished doctoral thesis, Harvard University, Dept. of Statistics.
(1995), "Missing Data Imputation Using the Multivariate t Distribution," Journal of Multivariate Analysis, 53, 139-158.
Liu, J. S. (1994), "Fraction of Missing Information and Convergence Rate of Data Augmentation," in Computationally Intensive Statistical Methods: Proceedings of the 26th Symposium Interface, eds. J. Sall and A. Lehman, pp. 490-497.
Marsden, P. V., and Podolny, J. (1990), "Dynamic Analysis of Network Diffusion Processes," in Social Networks Through Time, eds. J. Weesie and H. Flap, Utrecht, The Netherlands: ISOR.
Meng, X. L. (1990), "Towards Complete Results for Some Incomplete-Data Problems," unpublished doctoral thesis, Harvard University, Dept. of Statistics.
Meng, X. L., and Rubin, D. B. (1990), "Likelihood Ratio Tests with Multiply-Imputed Data," in Proceedings of the Statistical Computing Section, American Statistical Association, pp. 78-82.
Meng, X. L., and Tu, X. M. (1993), "Correcting Reporting Delays in Surveillance Data by Multiple Imputation With Application to AIDS Surveillance," paper presented at the American Statistical Association Annual Meeting, San Francisco.
Mislevy, R. J. (1991), "Randomization-Based Inferences About Latent Variables From Complex Samples," Psychometrika, 56, 177-196.
Mislevy, R. J., Beaton, A. E., Kaplan, B., and Sheehan, K. M. (1992), "Estimating Population Characteristics From Sparse Matrix Samples of Item Responses," Journal of Educational Measurement, 29, 133-161.
Raghunathan, T. E. (1987), "Large Sample Significance Levels From Multiply-Imputed Data," unpublished doctoral thesis, Harvard University, Dept. of Statistics.
Raghunathan, T. E., and Grizzle, J. E. (1995), "A Split Question Survey Design," The Journal of the American Statistical Association, 90, 54-63.
Raghunathan, T. E., and Siscovick, D. S. (1996), "A Multiple Imputation Analysis of a Case-Control Study on the Risk of Primary Cardiac Arrest Among Pharmacologically Treated Hypertensives," Applied Statistics, 45, 3.
Reilly, M. (1993), "Data Analysis Using Hot-Deck Multiple Imputation," The Statistician, 42, 307-313.
Relles, D. A., and Stolzenberg, R. M. (1991), "An Assessment of the Consequences of Sample Censoring Bias in Graduate School Admission Test Validation," in Proceedings of the Social Statistics Section, American Statistical Association, pp. 101-110.
Rubin, D. B. (1988), "Multiple Imputation for Data-Base Construction," COMPSTAT 88: Proceedings in Computational Statistics, eds. D. Edwards and N. E. Raun, Heidelberg: Physica-Verlag, pp. 389-400.
(1988), "An Overview of Multiple Imputation," in Proceedings of the Survey Research Methods Section, American Statistical Association, pp. 79-84.
(1991), "EM and Beyond," Psychometrika, 56, 241-254.
(1992), Comment on "Clinical Trials in Psychiatry: Should Protocol Deviations Censor Patient Data?" by Lavori, Neuropsychopharmacology, 6, 59-60.
Rubin, D. B., and Schafer, J. L. (1990), "Efficiently Creating Multiple Imputations for Incomplete Multivariate Normal Data," in Proceedings of the Statistical Computing Section, American Statistical Association, pp. 83-88.
(1988), "Imputation Strategies for Missing Values in the PES," Survey Methodology, 14, 209-221.
Rubin, D. B., Schafer, J. L., and Schenker, N. (1988), "Imputation Strategies for Estimating the Undercount," in Bureau of the Census Fourth Annual Research Conference, pp. 151-159.
Rubin, D. B., and Schenker, N. (1986), "Multiple Imputation for Interval Estimation From Simple Random Samples With Ignorable Nonresponse," Journal of the American Statistical Association, 81, 366-374.
(1987), "Interval Estimation from Multiply-Imputed Data: A Case Study Using Agriculture Industry Codes," Journal of Official Statistics, 3, 375-387.
(1991), "Multiple Imputation in Health-Care Data Bases: An Overview and Some Applications," Statistics in Medicine, 10, 585-598.
Rubin, D. B., Stern, H., and Vehovar, V. (1994), "Handling 'Don't Know' Survey Responses: The Case of the Slovenian Plebiscite," Journal of the American Statistical Association, 90, 822-828.
Rubin, D. B., and Zaslavsky, A. (1989), "An Overview of Representing Misenumerations in the Census Using Multiple Imputation," in Proceedings of the Bureau of the Census Fifth Annual Research Conference, pp. 109-117.
Schafer, J. L. (1991), "A Comparison of the Missing-Data Treatments in the Post-Enumeration Program," Journal of Official Statistics, 7, 475-498.
(1991), "Algorithms for Multiple Imputation and Posterior Simulation from Incomplete Multivariate Data With Ignorable Nonresponse," unpublished doctoral thesis, Harvard University, Dept. of Statistics.
Schafer, J. L., Khare, M., Little, R. J. A., and Rubin, D. B. (1993), "Multiple Imputation of NHANES III," paper presented at the American Statistical Association Annual Meeting, San Francisco.
Schenker, N., Treiman, D. J., and Weidman, L. (1988), "Evaluation of Multiply-Imputed Public-Use Tapes," in Proceedings of the Survey Research Methods Section, American Statistical Association, pp. 85-92.
Schenker, N., Treiman, D. J., and Weidman, L. (1993), "Analysis of Public-Use Data With Multiply-Imputed Industry and Occupation Codes," Applied Statistics, 42, 545-556.
Schenker, N., and Welsh, A. H. (1988), "Asymptotic Results for Multiple Imputation," The Annals of Statistics, 16, 1550-1566.
Soldo, B. J., Wolf, D. A., and Freedman, V. A. (1990), "Coresidence With Older Mothers: The Children's Perspective," Urban Institute Report, Washington, DC.
Stein, M. L., Shen, X., and Styer, P. E. (1993), "Applications of a Simple Regression Model to Rain," Canadian Journal of Statistics, 21, 331-346.
Taylor, J. M. G., Muñoz, A., Bass, S. M., Saah, A., Chmiel, J. S., and Kingsley, L. A. (1990), "Estimating the Distribution of Times From HIV Seroconversion to AIDS Using Multiple Imputation," Statistics in Medicine, 9, 505-514.
Treiman, D. J., Bielby, W., and Cheng, M. (1989), "Evaluating a Multiple-Imputation Method for Recalibrating 1970 U.S. Census Detailed Industry Codes to the 1980 Standard," Sociological Methodology, 18, 309-345.
Tu, X. M., Meng, X. L., and Pagano, M. (1993a), "The AIDS Epidemic: Estimating Survival After AIDS Diagnosis From Surveillance Data," Journal of the American Statistical Association, 88, 26-36.
(1993b), "Survival Differences and Trends in Patients With AIDS in the United States," Journal of Acquired Immune Deficiency Syndromes, 6, 1150-1156.
van Buuren, S., van Mulligen, E. M., and Brand, J. P. L. (1994), "Routine Multiple Imputation in Statistical Data Bases," in Proceedings of the Seventh International Working Conference on Scientific and Statistical Database Management, eds. J. C. French and H. Hinterberger, Los Alamitos, CA: IEEE Computer Society Press, pp. 74-78.
Weld, L. H. (1987), "Significance Levels From Public-Use Data With Multiply-Imputed Industry Codes," unpublished doctoral thesis, Harvard University, Dept. of Statistics.
Williams, V. S. L., Billeaud, K., Davis, L. A., Thissen, D., and Sanford, E. (1995), "Projecting to the NAEP Scale: Results From the North Carolina End of Grade Testing Program," research report, National Institute of Statistical Sciences.
Zaslavsky, A. M. (1989), "Representing Census Undercount: A Comparison of Reweighting and Multiple Imputation Methods," unpublished doctoral thesis, Massachusetts Institute of Technology, Dept. of Mathematics.
