Rubin - Multiple imputation after 18+ years
Donald B. RUBIN
Multiple imputation was designed to handle the problem of missing data in public-use data bases where the data-base constructor
and the ultimate user are distinct entities. The objective is valid frequency inference for ultimate users who in general have access
only to complete-data software and possess limited knowledge of specific reasons and models for nonresponse. For this situation
and objective, I believe that multiple imputation by the data-base constructor is the method of choice. This article first provides a
description of the assumed context and objectives, and second, reviews the multiple imputation framework and its standard results.
These preliminary discussions are especially important because some recent commentaries on multiple imputation have reflected
either misunderstandings of the practical objectives of multiple imputation or misunderstandings of fundamental theoretical results.
Then, criticisms of multiple imputation are considered, and, finally, comparisons are made to alternative strategies.
KEY WORDS: Confidence validity; Missing data; Nonresponse in surveys; Public-use files; Sample surveys; Superefficient
procedures.
Essentially all public-use data sets have missing values, typically not of any nice neat type. In general, ultimate users have neither the knowledge nor the tools to address missing data problems satisfactorily. Even if some ultimate users do have adequate resources for modeling and computation, data-base constructors typically know more about reasons for nonresponse and have better access to confidential and detailed information not released for public use (e.g., exact addresses and neighborhood relationships, hourly blood pressure readings and doctor indicators), information that can be useful for modeling missing data. Moreover, ultimate users should be focused on their substantive scientific analyses and for these, missing data are generally simply a nuisance. My conclusion is that "correctly" modeling the missing data must be, in general, the data constructor's responsibility.

We, that is, data-base constructors and statistical software designers, have no direct control over what ultimate users will do with their arsenal of tools. We cannot stop users from doing bad science, but if possible we should facilitate their ability to do good science with their available tools, even when data sets suffer from missing values.

1.2 Achievable Basic Objective

One achievable basic objective in such a setting is the following: Each tool in the ultimate users' existing arsenals can be applied to any data set with missing values using the same command structure and output standards as if there were no missing data. The only additional software that is allowed to be required comprises entirely general macros that can be applied to any complete-data analysis and incomplete data set. Certain ad hoc methods of handling missing data, such as "complete-case analysis," "available-case analysis," and "fill-in with means" (e.g., see Little and Rubin 1987, part I), satisfy this basic objective and so have a certain appeal. The problem with such methods is that they typically yield statistically invalid answers for scientific estimands; "scientific estimands" and "statistically valid" require definition.

1.3 Scientific Estimands

By a scientific estimand I mean a quantity of scientific interest that can be calculated in the population and does not change its value depending on the data collection design used to measure it (i.e., it does not vary with sample size and survey design, or the number of nonrespondents, or follow-up efforts). Letting X be the array of all background (e.g., stratification) information fully observed in a population and Y be the array of outcome information in the population that is to be sampled in the survey, a scientific estimand is a function of X and Y, say Q = Q(X, Y). Scientific estimands include population means, variances, correlations, factor loadings, regression coefficients, and these quantities within strata or domains, but exclude the sampling variance of a sample mean under a particular sampling plan and the expectation of the complete-data sample mean when missing values are filled in with zero or the observed sample mean. These latter quantities can be important for inference and design, but they are not scientific estimands in my definition because they are functions of sample size, sample design, response rates for a particular survey, methods for handling missing values, and scientific estimands such as population means and variances.

The distinction between estimates of scientific estimands and measures of their uncertainty is an old one in statistics; see, for example, Fisher (1925, p. 724) where a measure of uncertainty associated with an estimate is called an "ancillary" statistic, that is, a subordinate or supplemental statistic. In Fisher's context, the estimate was the maximum likelihood estimate and the ancillary statistic was the second derivative of the log-likelihood, but the distinction is relevant to more general estimates and associated measures of uncertainty, as we see in the next section.

1.4 What is Meant by Statistically Valid?

In the context of shared data bases supporting analyses by many users, I believe that statistically valid must be a frequency concept, averaging over randomization distributions generated by known sampling mechanisms (used to collect data) and posited distributions for the response mechanisms (the processes underlying nonresponse). In standard scientific surveys, the sampling mechanism is known but the nonresponse mechanism is rarely fully known and so typically must be posited, either implicitly or explicitly.

Bayesian validity is also important, but is far more difficult to achieve in this context because it requires far more compatibility between the data-base constructor and the analyst. In fact, in general I do not believe it can be achieved in any real sense in the context of the basic objective to use existing complete-data tools with shared data bases. In any case, no Bayesian should object to achieving frequentist validity; effectively, Bayesians want and promise much more: calibration conditional on the data in addition to unconditional calibration (e.g., in Rubin 1984, I call such frequency calculations "Bayesianly relevant and justifiable").

First and foremost, for statistical validity for scientific estimands, point estimation must be approximately unbiased for the scientific estimands, averaging over the sampling and the posited nonresponse mechanisms (e.g., filling in zeros or means is not generally acceptable). Second, interval estimation and hypothesis testing must be valid in the sense that nominal levels describe operating characteristics over sampling and posited response mechanisms. Two versions of such frequentist validity for nominal levels are especially important to distinguish when assessing multiple imputation.

Using terminology from Rubin (1987, pp. 117-118), "randomization validity" means that, for interval estimates, "actual interval coverage = nominal interval coverage," and for tests of hypotheses, "actual rejection rate = nominal rejection rate." Randomization validity is the natural objective in most survey contexts. In standard asymptotic situations, a complete-data estimate $\hat{Q}$ of an estimand $Q$ has a normal sampling distribution centered at $Q$ with sampling variance (or more generally, variance-covariance) consistently estimated by the statistic $U$, where the randomization distribution is that generated by the sampling indicator $I$ given fixed $(X, Y)$, that is, the sampling mechanism. In this case we have

\[ E(\hat{Q} \mid X, Y) = Q \tag{1.1} \]

and

\[ E(U \mid X, Y) = \operatorname{var}(\hat{Q} \mid X, Y), \tag{1.2} \]

and then randomization validity is not only desirable but theoretically achievable. The precision of $\hat{Q}$ is measured by $U^{-1}$, which plays the role of the ancillary statistic and can be used as a "true weight" (Fisher 1925, p. 724) for combining estimates.

A more generally achievable objective, however, is "confidence validity," meaning that for interval estimates, "actual interval coverage ≥ nominal interval coverage," and for tests of hypotheses, "actual rejection rate ≤ nominal rejection rate." For confidence validity with complete data, we replace (1.2) with

\[ E(U \mid X, Y) \geq \operatorname{var}(\hat{Q} \mid X, Y). \tag{1.3} \]

If (1.3) is satisfied but (1.2) is not, then $U^{-1}$ is only an "approximate weight for the value of the estimate" (Fisher 1925, p. 724).

The distinction between randomization validity and confidence validity can be quite important when dealing with approximate procedures, which necessarily arise with nonresponse in public-use surveys, and this distinction appears in Neyman (1934), which is the foundation for statisticians' current view of frequentist validity in surveys. Here Neyman (1934, pp. 562-563) defined confidence intervals, confidence coefficients, and confidence limits, and these definitions remain the accepted mathematical definitions of these terms (e.g., Lehmann 1959). In particular, confidence limits are statistics defining an interval such that, in repeated experience, the estimand lies in the confidence interval with probability greater than or equal to the confidence coefficient; the shorter the interval satisfying this constraint, the better.

A simple example illustrates the wisdom implicit in Neyman's definition. Consider a particular situation with two different confidence-valid procedures for creating confidence intervals with confidence coefficient 95%. Procedure 1 produces intervals that are always shorter than the intervals produced by Procedure 2, and moreover, Procedure 1 has actually probability > 95% of covering the estimand, whereas Procedure 2 has only the nominal 95% probability of covering the estimand. Clearly, Procedure 1 is scientifically and statistically superior to Procedure 2 because it provides tighter inferences with greater confidence, and Neyman's definition and desiderata agree with this fact. Requiring exact agreement between nominal and actual levels as a desideratum for validity would lead one to reject Procedure 1 as invalid and choose Procedure 2, clearly a mistake. It is for this reason that confidence validity is more fundamental than randomization validity for interval estimation. Of course, if we have a procedure that is confidence valid but not randomization valid, there is the hope that a better confidence-valid procedure exists (i.e., one with shorter intervals), which is also randomization valid, but in general this is not achievable. An attendant advantage, when the best confidence interval is randomization valid, is that the associated measure of precision can be thought of as a true rather than approximate weight (again, in the sense of Fisher 1925, p. 724; also see Fisher 1934, criticizing Neyman 1934, on this point).

1.5 Supplemental Objective Concerning Statistical Validity

We are now prepared to supplement the Achievable Basic Objective when faced with missing values, regarding the ability to apply standard complete-data statistical tools, with an objective concerning statistically valid inference for scientific estimands. It is easy to ask for more than is possible and then do something misguided when attempting the impossible. We first consider a hopeless objective, which is commonly sought, and then state an achievable one.

Hopeless Supplemental Objective. Each complete-data statistical tool can be applied to each incomplete data set to obtain the same inference as if the data set had no missing values.

This objective is clearly impossible because of the lost information, but nevertheless, it guides some thinking about how to handle missing data. It is analogous to saying that the objective of a survey is to obtain the same answer as a complete census, and it can lead to an "operations research" objective of creating imputations for missing values that are as close as possible to the truth (i.e., fill in missing values to minimize some objective function). Our actual objective is valid statistical inference, not optimal point prediction under some loss function, and replacing the former with the latter can lead one badly astray. For example, suppose we have a coin that, in truth, is biased .6 heads and .4 tails. This known truth is model A, whereas model B asserts that the coin has two heads. Using model A for creating imputations (i.e., future predictions) yields a hit rate (agreements between predictions and outcomes) of .6 × .6 + .4 × .4 = .52, whereas using model B for predictions yields a hit rate of .6. This does not mean that model B is better than model A for handling missing values. Filling in missing values using model B yields the invalid statistical inference that in the future all coin tosses will be heads, clearly inconsistent for the estimand Q = fraction of tosses that are heads, whereas using model A yields consistent estimates for all such scientific estimands. The lesson is simple: Judging the quality of missing data procedures by their ability to recreate the individual missing values (according to hit-rate, mean square error, etc.) does not lead to choosing procedures that result in valid inference, which is our objective.

Statistical validity in our context is difficult because the answer that results from applying a complete-data analysis to an incomplete data set is generally invalid unless the complete-data analysis in the absence of missing data is valid (the ultimate user's responsibility) and the reasons for missing data are correctly modelled (the data-base constructor's responsibility). We can essentially never be sure that the data-base constructor's model is appropriate, but assuming it is, and assuming that the ultimate user is performing an analysis that would be valid if there were no missing data, we can expect that the ultimate user will obtain a valid inference.

Achievable Supplemental Objective. Assuming that the ultimate user's complete-data analysis is statistically valid for a scientific estimand, the answer that results from applying the same analysis method to an incomplete-data set remains statistically valid for the same scientific estimand assuming the truth of the data-base constructor's posited model for missing data.

I doubt that there is a much stronger objective regarding validity that we can achieve in this context where the ultimate user and the data-base constructor are distinct entities. Multiple imputation was designed to satisfy both achievable objectives by using the Bayesian and frequentist paradigms in complementary ways: the Bayesian (model-based) approach to create procedures, and the frequentist (randomization-based) approach to evaluate procedures.

2. REVIEW OF MULTIPLE IMPUTATION FRAMEWORK AND RESULTS

Multiple imputations for the set of missing values are multiple sets of plausible values; these can reflect uncertainty under one model for nonresponse and across several models. Each set of imputations is used to create a completed data set, each of which is to be analyzed using

associated variance-covariance matrices $\{U_{*1}, \ldots, U_{*m}\}$, and p values, that is, the final repeated-imputation inferences, are derived in chapter 3 in Rubin (1987) under the Bayesian paradigm for survey inference (introduced in chap. 2 of Rubin 1987), assuming that the multiple imputations are repeated imputations.

The key Bayesian motivation for multiple imputation is given by result 3.1 in Rubin (1987). Ignoring both technical details and indicator variables for sampling and response, the result and its consequences can be easily stated using the simplified notation that the complete data are $Y = (Y_{\mathrm{obs}}, Y_{\mathrm{mis}})$, where $Y_{\mathrm{obs}}$ is observed and $Y_{\mathrm{mis}}$ is missing. Specifically, the basic result is

\[ P(Q \mid Y_{\mathrm{obs}}) = \int P(Q \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}})\, P(Y_{\mathrm{mis}} \mid Y_{\mathrm{obs}})\, dY_{\mathrm{mis}}, \]

or in words,

\[ \left(\text{actual posterior distribution of } Q\right) = \mathrm{AVE}\left(\text{complete-data posterior distribution of } Q\right), \]

where AVE[ ] refers to the average over the repeated imputations, which are draws from $P(Y_{\mathrm{mis}} \mid Y_{\mathrm{obs}})$, which is the posterior predictive distribution of missing data given the observed data. Two simple consequences follow (Rubin 1987, result 3.2). The first concerns the final estimate of $Q$:

\[ E(Q \mid Y_{\mathrm{obs}}) = E[\,E(Q \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}) \mid Y_{\mathrm{obs}}\,], \]
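The averaging identity for the posterior mean can be illustrated numerically. The following is a minimal sketch, not the procedure of Rubin (1987): it assumes a univariate normal model with a noninformative prior and an ignorable nonresponse mechanism, takes Q to be the population mean, and uses the completed-data sample mean as the complete-data posterior mean. The function names and the data values are hypothetical.

```python
import random


def draw_repeated_imputations(y_obs, n_mis, m, seed=0):
    """Draw m sets of imputations from the posterior predictive
    distribution of Y_mis given Y_obs under an assumed normal model
    with a noninformative prior on (mu, sigma^2)."""
    rng = random.Random(seed)
    n_obs = len(y_obs)
    mean_obs = sum(y_obs) / n_obs
    ss_obs = sum((y - mean_obs) ** 2 for y in y_obs)
    imputations = []
    for _ in range(m):
        # sigma^2 | Y_obs is ss_obs divided by a chi-square(n_obs - 1) draw;
        # chi-square(k) is Gamma(shape=k/2, scale=2).
        chi2 = rng.gammavariate((n_obs - 1) / 2.0, 2.0)
        sigma2 = ss_obs / chi2
        # mu | sigma^2, Y_obs is normal around the observed mean.
        mu = rng.gauss(mean_obs, (sigma2 / n_obs) ** 0.5)
        # Y_mis | mu, sigma^2 are independent normal draws.
        imputations.append([rng.gauss(mu, sigma2 ** 0.5) for _ in range(n_mis)])
    return imputations


def repeated_imputation_estimate(y_obs, imputations):
    """Average the completed-data means over the repeated imputations."""
    estimates = []
    for y_mis in imputations:
        completed = list(y_obs) + list(y_mis)
        estimates.append(sum(completed) / len(completed))
    return sum(estimates) / len(estimates)


y_obs = [2.1, 3.4, 1.9, 4.0, 2.8, 3.3, 2.5, 3.6]
imputations = draw_repeated_imputations(y_obs, n_mis=4, m=5000, seed=1)
print(repeated_imputation_estimate(y_obs, imputations))
```

With many imputations the average settles near the observed-data mean, which is E(Q | Yobs) under this assumed model; with small m the extra simulation noise is visible.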
of Qmis
and the associated variance-covariance
2.4 Proper Multiple Imputation
design weights (or estimated propensity scores of being in the sample) should be included in imputation models. Ezzati-Rice, Johnson, Khare, Little, Rubin, and Schafer (1995) illustrates such efforts and the resulting valid inferences.

Since with public-use data sets it is always unclear what analyses the ultimate users will conduct, the range of statistics $(\hat{Q}, U)$ that might be used involves essentially any variable or combination of variables available in the data set, at least up to some level of interactions. Thus, the danger with an imputer's model is generally in leaving out predictors rather than including too many, and the advice has always been to include as many variables as possible when doing multiple imputation. The press to include all possibly relevant predictors is demanding in practice, but it is generally a worthy goal. For example, in the original prescription for the industry and occupation recoding project (Rubin 1983), thousands of logistic regressions were done, each with nearly 20 variables, and some with far fewer than 20 observations (e.g., 4), in order to preserve this theme of trying to include all variables that might be used to define statistics $\hat{Q}$ or $U$; this effort required the development of specialized but computationally efficient Bayesian logistic regression procedures for sparse data (Clogg, Rubin, Schenker, Schultz, and Weidman 1991). The possible lost precision when including unimportant predictors is usually viewed as a relatively small price to pay for the general validity of analyses of the resultant multiply-imputed data base.

2.7 Some Experience With Useful But Improper Multiple Imputation

In some cases, improper multiple imputations can still lead to confidence-valid repeated-imputation inferences. This issue will be discussed in more detail in Sections 3.5-3.8 in reply to a recent criticism of multiple imputation, but the issue has been previously considered. Rubin and Schenker (1987, sec. 7) explicitly consider the situation in the early industry and occupation example where some information used by the imputer (the original double-coded sample) is not available to the data analyst, and demonstrate the resulting potential conservative coverage. Also, the evaluations of the results of this project include cases where the data analyst uses variables not used by the imputer and, for this data set and practical analyses, find no deleterious consequences (Schenker, Treiman, and Weidman 1993; Treiman, Bielby, and Cheng 1989; Weld 1987). Careful and extensive evaluations of this general situation, involving variables omitted by the imputer, are also included in work conducted at ETS in the context of NAEP, which for a decade has created multiply-imputed public-use data bases (e.g., Mislevy, Johnson, and Muraki 1992).

Substantial empirical work, some given in the Appendix, supports the conclusion that, even if mildly important predictors are left out of the multiple imputation scheme, the repeated-imputation inferences are confidence-valid: with fractions of missing information typical in careful surveys, m = 3 or 5 works very well, with the complete-data procedure for small n typically breaking down before multiple imputation does. A heuristic reason for this robustness is that lack of model fit goes into residual variance, which in a Bayesian model inflates the between-imputation variance of draws (e.g., of regression coefficients), thereby leading to a large enough $B_m$ to compensate for an omitted coefficient. Of course, this is an observation based on some experience, not a theorem, but a related theoretical result (Meng 1994, lem. 2) lends support to this observation.

Nevertheless, because problems can occur when the imputer's model leaves out important predictor variables, the data-base constructor must include a description of the imputation model with the multiply-imputed data base, so that ultimate users know which relationships among variables have been implicitly set to zero.

3. CURRENT ISSUES CONCERNING MULTIPLE IMPUTATION

There appear to be two distinct kinds of concerns about multiple imputation. The first type focuses on its implementation: operational difficulties for the data-base constructor and the ultimate user, as well as the acceptability of answers obtained partially through the use of simulation. The second type concerns the frequentist validity of repeated-imputation inferences when the multiple imputations are not proper, but appear "reasonable" in some sense.

3.1 Is Multiple Imputation Unprincipled or Unacceptable Because it Uses Simulation?

An early criticism, not much heard anymore but worthy of response, is that multiple imputation is theoretically unsatisfactory and practically unacceptable because it adds random noise to the data. In this context, it is critical to remember that multiple imputation does not pretend to create information through simulated values but simply to represent the observed information this way so as to make it amenable to valid analysis using complete-data tools. The extra noise created when using a finite number of imputations is the price to be paid for this luxury.

In response to this criticism, first appreciate that simulation methods are becoming more and more common and accepted in statistics. Consider jackknife and bootstrap methods for complete-data frequentist inference (e.g., Miller 1974; Efron and Tibshirani 1993), or data augmentation (Tanner and Wong 1987), the Gibbs sampler (e.g., Gelfand and Smith 1990; Gelman and Rubin 1992), and sampling importance resampling methods (Rubin 1983, 1987, 1988; Gelfand and Smith 1992) for complete-data Bayesian inference. These methods have now become accepted complete-data tools worthy of theoretical investigation and routine practical application.

Second, multiple imputation has a distinct advantage over such methods in principle, because with multiple imputation, the simulation is only being used to handle the missing information, with reliance for handling the rest of the information left to the complete-data method, be it analytic or simulation-based. Thus, the acceptable number of imputations can be much less than the acceptable number of simulations for a complete-data inference, at least assuming that the fraction of missing information, $\gamma$, is modest
estimands, such as means, variances, correlations, and are therefore not appropriate for public-use data bases.

Table 1. Approximate Factor for Inflating Normal Standard Errors in (2.5) to Reflect Finite Number of Imputations, m: $\sqrt{\nu/(\nu - 2)}$, Where $\nu = (m - 1)\,[1 + \{(1 + m^{-1})B_m/\bar{U}_m\}^{-1}]^2$
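The factor named in the caption of Table 1 can be computed directly. The sketch below is a reading of the garbled caption rather than a transcription of it: it assumes the factor is the square root of ν/(ν − 2), the standard deviation of a t distribution on ν degrees of freedom relative to the normal, and that ν is the usual repeated-imputation degrees of freedom of Rubin (1987), written with r = (1 + 1/m) B_m / Ū_m. The function names are hypothetical.

```python
import math


def degrees_of_freedom(m, b_m, u_bar_m):
    """nu = (m - 1) * (1 + 1/r)^2, with r = (1 + 1/m) * B_m / U_bar_m.

    b_m is the between-imputation variance and u_bar_m the average
    within-imputation variance (both scalars here, by assumption)."""
    r = (1.0 + 1.0 / m) * b_m / u_bar_m
    return (m - 1) * (1.0 + 1.0 / r) ** 2


def se_inflation_factor(m, b_m, u_bar_m):
    """Factor by which a normal-theory standard error is inflated to
    reflect the finite number of imputations m."""
    nu = degrees_of_freedom(m, b_m, u_bar_m)
    if nu <= 2.0:
        raise ValueError("the t variance is undefined for nu <= 2")
    return math.sqrt(nu / (nu - 2.0))


# The factor shrinks toward 1 as the number of imputations grows.
for m in (3, 5, 10, 20):
    print(m, round(se_inflation_factor(m, b_m=1.0, u_bar_m=1.0), 4))
```

Equal between and within variances correspond to a large fraction of missing information; with a smaller ratio B_m / Ū_m the factor is closer to 1 even for m = 3 or 5.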
3.5 Can Repeated Imputations Under An Appropriate Bayesian Model Lead to Invalid Inferences?

Fay (1991, 1992, 1993; also see Kott 1992) claims that even when the model used to create repeated imputations is "appropriate" in some sense, the resulting repeated inferences can be invalid. I believe that this criticism is misguided for a variety of reasons, many of which have been exposed in the work and discussion of Meng (1994). Nevertheless, I will also briefly address the issue here because it has received attention, and because I believe my results, although less extensive and detailed than those of Meng (1994), will be more transparent to many readers.

The kernel of this criticism arises when an irrelevant predictor X of outcome Y is not used by the Bayesian multiple imputer to create repeated imputations, but is used by the ultimate analyst to define estimands (a case already introduced here in Sec. 2.7 because of historical discussion of it). More specifically, suppose X is dichotomous, (a, b), and Y is normal (0, 1) and independent of X in a population in which X = a units and X = b units are equally represented. Suppose a stratified random sample of size 2n is taken where there are n units with X = a and n units with X = b, and further suppose that nonresponse is simply like another level of stratified random sampling that results in $n_1$ respondents and $n_0$ nonrespondents in both the X = a sample and in the X = b sample. The estimands are: $\bar{Y} = (\bar{Y}_a + \bar{Y}_b)/2$, the population mean value of Y; and $D = (\bar{Y}_a - \bar{Y}_b)$, the population difference of means, which equals zero. The obvious complete-data estimators are $\bar{y} = (\bar{y}_a + \bar{y}_b)/2$ for $\bar{Y}$ and $d = (\bar{y}_a - \bar{y}_b)$ for $D$, with associated standard complete-data variance estimates $U_{\bar{y}}$ and $U_d$, respectively, which result in randomization-valid complete-data inferences, at least for large n.

Now suppose repeated imputations for the $2n_0$ nonrespondents are generated using a fully exchangeable normal model based on the $2n_1$ respondents. That is, the imputations for both the X = a and X = b units will be centered at the observed grand mean $\bar{y}_{\mathrm{obs}}$ rather than at the separate observed sample means $\bar{y}_{\mathrm{obs},a}$ and $\bar{y}_{\mathrm{obs},b}$. It is easy to show that the multiple imputation method is proper for $(\bar{y}, U_{\bar{y}})$, but it is improper for $(d, U_d)$: (1) the expectation of $\bar{d}_\infty = (n_1/n)(\bar{y}_{\mathrm{obs},a} - \bar{y}_{\mathrm{obs},b})$ over the response mechanism, that is given (X, Y, I), does not equal $d$, but $(n_1/n)d$, thereby not satisfying (2.6) [nor (4.2.5) in Rubin 1987]; and (2) $B_\infty$, the variance of the repeated values of $d_{*l}$ across repeated imputations with fixed $n_0$, is greater than the variance of $\bar{d}_\infty$ over the response mechanism by the factor $n/n_1$, thereby not satisfying (2.8) [nor (4.2.6)-(4.2.7) in Rubin 1987].

3.6 Superefficient Imputations

In this example, the imputations are "superefficient" from the perspective of the data analyst interested in estimating D because the imputations use "extra" information, specifically the knowledge that the distribution of Y given X = a is identical to the distribution of Y given X = b. For a more familiar example of superefficiency, if the data are normal with mean zero, then half the sample mean is a superefficient estimate of the population mean. The situation involving superefficient imputations is more subtle, however. Suppose that we have a multiply-imputed data set, but subsequently the data collector brings forth values of the missing data, thereby allowing us to calculate $\hat{Q}$ and $U$. Presumably, we would then be inclined to base our inferences for $Q$ on $(\hat{Q}, U)$ and discard the imputations. If the imputations are superefficient, however, the standard complete-data procedure can be improved by using information in the imputations about $Q$ beyond that in $\hat{Q}$, information supplied by the imputer (e.g., in the canonical example, the knowledge that X = a units and X = b units have the same population distribution of Y). The imputations are "strongly superefficient" if $\bar{Q}_\infty$ is at least as good an estimate as $\hat{Q}$ despite the existence of missing data, that is, despite the fact that $\bar{Q}_\infty$ is not identical to $\hat{Q}$ in the formal sense that

\[ \operatorname{var}(\bar{Q}_\infty - \hat{Q} \mid X, Y) > 0, \tag{3.1} \]

where with vector $Q$, ">" means at least one eigenvalue > 0.

More precisely, a multiple imputation procedure is strongly superefficient for the complete-data statistic $\hat{Q}$ if, first, $\bar{Q}_\infty$ and $\hat{Q}$ estimate the same estimand, that is, the procedure is "first-moment proper" for $\hat{Q}$,

\[ E(\bar{Q}_\infty \mid X, Y) = E(\hat{Q} \mid X, Y), \tag{3.2} \]

and second, $\bar{Q}_\infty$ has no larger variance than the complete-data estimate itself:

\[ \operatorname{var}(\bar{Q}_\infty \mid X, Y) \leq \operatorname{var}(\hat{Q} \mid X, Y), \tag{3.3} \]

where with vector $Q$, (3.3) compares the generalized eigenvalues of the left side with respect to the right side. In the canonical example of Section 3.5, the imputations are strongly superefficient for $\hat{Q} = d$ because $\bar{Q}_\infty = \bar{d}_\infty$ satisfies both (3.2) and (3.3).

The general definition of superefficiency concerns the existence of imputations that make $\bar{Q}_\infty$ informative about $Q$ even with knowledge of $\hat{Q}$. Bayesian models can be superefficient when they incorporate appropriate smoothing information in their distributional assumptions. The resultant draws of $Y_{\mathrm{mis}}$ cannot be sharper than those from the parent distribution and still lead to valid inferences for a variety of estimands, but multiple imputations of $Y_{\mathrm{mis}}$ can be more efficient than the one true value of $Y_{\mathrm{mis}}$ because of their multiplicity. For instance, in the canonical example, suppose that the multiple imputation procedure drew the group difference effect from a normal distribution centered at $\tfrac{1}{2}(\bar{y}_{\mathrm{obs},a} - \bar{y}_{\mathrm{obs},b})$ rather than at $\bar{y}_{\mathrm{obs},a} - \bar{y}_{\mathrm{obs},b}$ (as when this effect is directly estimated from the data) or at zero (as with the strongly superefficient imputations of sec. 3.5). These imputations would effectively be additional data values, which could contribute to a better estimate of D, even if the actual missing values were found. The general definition of superefficient imputations for $\hat{Q}$ replaces (3.3) with

\[ \operatorname{cov}(\bar{Q}_\infty, \hat{Q} \mid X, Y) < \operatorname{var}(\hat{Q} \mid X, Y); \tag{3.4} \]

strong superefficiency implies superefficiency because (3.1) and (3.3) imply (3.4).
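For scalar Q the implication follows in one line: expanding (3.1) gives var(Q̄∞) + var(Q̂) − 2 cov(Q̄∞, Q̂) > 0, so the covariance is below the average of the two variances, which by (3.3) is no larger than var(Q̂). The conditions can also be checked by simulation in the canonical example of Section 3.5. The sketch below is illustrative only and makes assumptions not in the text: it redraws Y from its normal(0, 1) superpopulation in every replication instead of holding a finite population (X, Y) fixed, it uses the limiting form d̄∞ = (n1/n)(ȳobs,a − ȳobs,b) directly without drawing imputations, and the function name and default sizes are hypothetical.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def _cov(xs, ys):
    """Sample covariance; _cov(xs, xs) is the sample variance."""
    mx, my = _mean(xs), _mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def simulate_canonical_example(n=20, n1=10, reps=20000, seed=0):
    """Monte Carlo moments of d (complete-data estimate of D) and of
    d_inf, the infinite-m repeated-imputation estimate under the fully
    exchangeable imputation model."""
    rng = random.Random(seed)
    d_hat, d_inf = [], []
    for _ in range(reps):
        ya = [rng.gauss(0.0, 1.0) for _ in range(n)]
        yb = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Units are exchangeable, so the first n1 units in each stratum
        # act as a simple random subsample of respondents.
        d_hat.append(_mean(ya) - _mean(yb))
        d_inf.append((n1 / n) * (_mean(ya[:n1]) - _mean(yb[:n1])))
    diff = [a - b for a, b in zip(d_inf, d_hat)]
    return {
        "var_d_hat": _cov(d_hat, d_hat),
        "var_d_inf": _cov(d_inf, d_inf),
        "cov_d_inf_d_hat": _cov(d_inf, d_hat),
        "var_diff": _cov(diff, diff),
    }


for name, value in simulate_canonical_example().items():
    print(name, round(value, 4))
```

Under these assumptions the theoretical values are var(d) = 2/n, var(d̄∞) = cov(d̄∞, d) = 2 n1/n², and var(d̄∞ − d) = 2 n0/n², so the simulated moments should show (3.1), (3.3), and (3.4) all holding.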
cause the procedure is confidence valid but not randomization valid, inefficiencies can arise when combining various estimates using the assigned precisions as weights. But finding a randomization-valid procedure in general requires extra work beyond the use of standard complete-data methods, and is generally impossible for the ultimate user unless extra information is conveyed by the data-base constructor. Furthermore, this whole issue seems relatively unlikely to arise in practice because knowledge of population parameters by the data-base constructor must be unusual.

3.9 Reaching Correct Conclusions When Evaluating Multiple Imputation

Several points are critical in reaching correct conclusions concerning multiple imputation.

First, when evaluating repeated-imputation inferences by analysis or simulation, we need to monitor whether the complete-data inference with no missing data is valid: multiple imputation for missing data cannot fix problems with complete-data analyses (e.g., poor coverage properties of the normal approximation for the sample mean with rare binomial trials, where, for example, logit transforms can lead to more accurate complete-data inferences); Rubin and Schenker (1986) and Ezzati-Rice et al. (1995) provide examples of such evaluations. Also note when evaluating these procedures with the number of respondents fixed (e.g., as in sec. 4.3 and prob. 4-18, in Rubin, 1987) that the resultant answers are conditional on these quantities, which in practice are random. Moreover, when doing evaluations treating the number of respondents as random, the theoretical variances of unbiased estimators can be undefined, since, for any finite sample size, with positive probability, all units will be nonrespondents; in such cases, it makes more sense to report coverage properties of interval estimates, which are defined (no respondents implies zero coverage) and the objects of statistical inference anyway.

Also important in reaching correct conclusions about multiple imputation is the treatment of estimated sampling variances as ancillary statistics rather than as estimates of scientific estimands. For example, Fay (1992) treated the ratio of repeated-sampling covariances as an estimand, and thereby was led to misunderstand the effect of superefficient imputation on inference. This illustrates why it is important not to confuse scientific estimands and ancillaries. In particular, Fay (1992, sec. 3) states, in the context of the canonical example of Section 3.5:

    ... the design-based approach gives 19 times the covariance of multiple imputation ... such a limitation, if general, imposes severe restrictions on the validity of the multiple imputation inferences for complex applications, such as Clogg et al. (1991).

Consider the true sampling variance-covariance ellipsoid for $(\bar{y}_\infty, \bar{d}_\infty)$ under the exchangeable normal repeated-imputation scheme and the sampling ellipsoid for $(\bar{y}_\infty, \bar{d}_\infty)$

wider along the $\bar{d}_\infty$ axis. Using Fay's ratio of sampling covariances of $\bar{y}_{\infty,a}$ and $\bar{y}_{\infty,b}$ is equivalent to describing the difference between these two ellipsoids by the ratio of differences of variances (i.e., of eigenvalues) of $\bar{y}_\infty$ and $\bar{d}_\infty$ in the two ellipsoids. The ratio of eigenvalues, or of variances in any direction, is relevant to inference, but the ratio of differences between eigenvalues, Fay's measure, is by itself, irrelevant.

4. COMPETING METHODS

If multiple imputations are proper (confidence proper) under the posited model for nonresponse, then using the repeated-imputation rules for combining complete-data statistics $(\hat{Q}, U)$ yields a randomization-valid (confidence-valid) final inference under the posited response mechanism, assuming that the complete-data inference was valid in the absence of nonresponse. And this holds no matter how complex the survey design. Moreover, the combining rules can be implemented using completely general software that is the same for all data sets and all complete-data analyses. Thus multiple imputation and the repeated-imputation combining rules satisfy both the basic objective and the supplemental achievable objective.

Are there competing methods that, in some cases at least, also satisfy these objectives? Yes, but such competitors appear to me in general to have substantially greater deficiencies for the intended situation with ultimate users distinct entities from data-base constructors. These competitors are single imputation, multiple imputation with some analysis for the ultimate user other than the repeated-imputation inference, and weighting methods.

4.1 Desiderata for Creating Imputations, Single or Multiple

If imputations are to be used, then the estimate will be the value of $\hat{Q}$ calculated on the imputed data, or the average of multiple values $\hat{Q}_{*l}$, $l = 1, 2, \ldots$. In broad generality, consistent estimation requires that the imputation method must be first-moment proper, in the sense of (3.2), for a variety of statistics $\hat{Q}$, for example $\hat{Q}$ = sample mean, sample variance, median, 25th percentile, factor loadings, and these quantities within strata, domains, subdomains, etc. For this to hold for each $\hat{Q}$ in such a range, the imputation method, single or multiple, must in general not only track the posited response mechanism but also must be a random draw method; otherwise, it cannot be first-moment proper for $\hat{Q} = \bar{y}$, $\hat{Q} = s^2$, $\hat{Q}$ = 25th percentile, etc.

Consequently, any imputation method that satisfies the validity objective in generality must not only reflect the underlying response mechanism but must also be a random draw method. Nonrandom draw methods can be applied in special cases but require special analysis techniques. The
assigned to it by the repeated-imputation inference;both most carefulworkon this topic of deterministicimputation
have zero correlationbecause Yoo,a = (2yo ? doo) and of which I am aware concernsimputingprobabilitiesfor
Yoo,b =(2WOo- do) have equal variance.The repeated- missingdichotomousvariables(Schenker1989;Schaferand
imputation-assignedellipsoidis outsidebecauseit touches Schenker1991), andthis work revealsthe substantialextra
the correctone at the two points along the yO axis but is effortthat is needed,even in a special situation.
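The need for a random draw method can be illustrated with a small simulation. The following is only a sketch under assumptions that are not in the article: values are standard normal, half of each sample is missing completely at random, the deterministic method fills in the respondent mean, and the random draw method is a simple hot deck that samples respondents with replacement. The function name `simulate` and all parameter values are hypothetical. For Q = sample variance, mean imputation should come out well below the true value of 1, whereas the random draw method should come out close to it.

```python
import random
import statistics


def simulate(n=200, n_obs=100, reps=300, seed=0):
    """Average the completed-data sample variance over repeated samples.

    Assumption (not from the article): y ~ N(0, 1) and the first n_obs
    units of each i.i.d. sample respond, which is equivalent to
    missing completely at random.
    """
    rng = random.Random(seed)
    var_mean_imputed = []
    var_draw_imputed = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        observed = sample[:n_obs]
        n_mis = n - n_obs

        # Deterministic imputation: every missing value gets the respondent mean.
        filled_mean = observed + [statistics.fmean(observed)] * n_mis

        # Random draw imputation: each missing value is a randomly chosen respondent value.
        filled_draw = observed + [rng.choice(observed) for _ in range(n_mis)]

        var_mean_imputed.append(statistics.variance(filled_mean))
        var_draw_imputed.append(statistics.variance(filled_draw))
    return statistics.fmean(var_mean_imputed), statistics.fmean(var_draw_imputed)


avg_var_mean, avg_var_draw = simulate()
print(f"mean imputation, average s^2: {avg_var_mean:.3f}")
print(f"random draws,    average s^2: {avg_var_draw:.3f}")
```

This hot deck ignores uncertainty in the parameters of the respondent distribution, so it illustrates first-moment behavior only and is not offered as a fully proper method.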
which is not even first-moment proper, take bootstrap or jackknife replicates and assume inferential validity despite badly biased estimates of scientific estimands (see, e.g., Rubin 1994, and Efron 1994, for differing views concerning the acceptability of such answers).

4.4 Other Imputation-Based Procedures

Rao and Shao (1992) provide a careful analysis of how to use the jackknife to adjust analyses when missing data have been singly imputed by a particular hot-deck procedure. This addresses an important problem because in current practice many public-use files have been singly imputed by the hot deck. But the ultimate user bears the burden of substantial extra work, because "special computations have to be performed to adjust the imputed values for each pseudo-replicate before applying the standard jackknife variance formula" (Rao and Shao 1992, p. 813), and new mathematical analysis and new software apparently must be developed for each new distinct situation (estimator × missing data pattern × survey design × imputation method). Consequently, this approach, at least at present, fails to satisfy the basic objective of relying only on complete-data analyses and general routines.

Fay's work is something of a moving target, with a variety of older and newer suggestions, which are described with little generality and under special assumptions (e.g., missing completely at random). For example, Fay (1996) seems now to accept multiple imputation as being superior to single imputation (and perhaps to standard weighting adjustments) but advocates creating "improper" multiple imputations and recommends analysis by weighting the data from the completed units in one analysis rather than using the repeated-imputed inference. Recommending creating "improper" multiple imputations is suggesting what we should not do, but it is not a prescription for doing anything in particular. Presumably, it refers to first-moment proper multiple imputation (because without this even point estimation can be badly biased) but without concern for the second-moment conditions (e.g., fixing parameters at point estimates rather than drawing them from their posterior distributions, as in Rubin 1987, ex. 4.1, prob. 13 in chap. 1, and prob. 46 in chap. 5). But this is not even defined in multistage complex surveys with clusters where valid imputation models need to be hierarchical, typically with levels of parametric structure: I know what it means to try to be proper in complex surveys by following a Bayesian analysis with variables for the survey structure included in the modelling, but I do not know what the advice to "not do this" means. Also consider the example in Rubin (1983, sec. 2.8, also described in Gelman, Carlin, Stern and Rubin 1995, chap. 15), which stimulated the methods in Clogg et al. (1991) and illustrates the need to be Bayesian and include variability in parameter estimation in order to obtain valid frequency inference.

Finally, consider the suggestion in Fay (1996) that the analysis of a multiply-imputed data set should proceed by replacing each incomplete unit with multiply-imputed versions of that unit's data with split weights. I considered and discarded this idea in Rubin (1977b; also see Rubin 1987, prob. 4-29, and the rejoinder in Meng 1994), because it seemed to have merit as a method of analysis only in simple cases (see, e.g., Little 1979). For valid analysis in general, I believe that such an approach requires extra routines for different complete-data analyses, and so fails to satisfy the basic objective. As a method for storing the multiply-imputed data sets, it can take substantially more memory than the standard form because all the observed data for units with some missing data are stored many times instead of just once. Nevertheless, I would certainly be interested in seeing any work that suggests I rejected this idea prematurely, and that in fact, it can be made to work for any posited response mechanism, complex survey, and complete-data analysis, with only the addition of completely general macros.

4.5 Conclusions Regarding Alternative Imputation Strategies

Given a situation with a single imputation method that is first-moment proper for many statistics, it is almost certainly a random-draw method, and then multiple imputations are easily created, and these are the basis of more accurate inference. Then the only reason not to create them and recommend to the ultimate user that the multiply-imputed data be analyzed using repeated-imputation combining rules is fear that the imputation method, although first-moment proper, is not fully proper for some analyses. If it is not proper but is confidence proper, the only legitimate fear is lost power and overcoverage, as due to superefficiency. But then another method is needed for the ultimate user to recover such superefficiency; I believe special methods for different situations. Are such special efforts needed? All realistic examples I know suggest that in practice the overcoverage is slight and a minor issue relative to omitted variables that can lead all methods astray because of biased estimation and undercoverage. General theory and examples suggest that second-moment properness of Bayesianly-motivated multiple imputation procedures typically follows automatically if the method is first-moment proper (see, e.g., Huber 1976, and results referenced in Rubin 1987, sec. 2.10). Nevertheless, more work on this issue is desirable and could make general theoretical contributions to understanding the robustness of Bayesian inference.

My conclusion when doing imputation is to do multiple imputation under carefully chosen models and use the repeated-imputation inference for analysis. Of course, more theoretical development is still desirable on such issues as: implicit imputation models that reflect both the uncertainty of parameter estimation and the uncertainty of the values to impute given a specific predictive fit (van Buuren, van Rijckevorsel, and Rubin 1993); models for sequential imputation (Kong, Liu, and Wong 1994; Liu and Chen 1995); the use of importance weights (Meng 1994); improved small-m combining rules in especially difficult cases (Barnard 1995); and the development of realistic nonignorable models for particular settings.
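The repeated-imputation inference recommended here needs only the m complete-data estimates and their complete-data variances, which is why it can be implemented once for all analyses. The following is a minimal sketch of the standard scalar combining rules (Rubin 1987): the point estimate is the average of the m estimates, the total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance, and the reference t distribution has degrees of freedom (m - 1)(1 + 1/r)^2, where r is the relative increase in variance due to nonresponse. The function name `combine_repeated_imputations` and the input numbers are hypothetical, chosen only for illustration.

```python
import math
import statistics


def combine_repeated_imputations(estimates, variances):
    """Combine m complete-data analyses of multiply-imputed data (scalar case).

    estimates: the m complete-data point estimates, one per imputed data set
    variances: the m associated complete-data variance estimates
    Returns (point estimate, total variance, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need m >= 2 estimates and one variance per estimate")

    q_bar = statistics.fmean(estimates)      # repeated-imputation point estimate
    u_bar = statistics.fmean(variances)      # average within-imputation variance
    b = statistics.variance(estimates)       # between-imputation variance (divisor m - 1)
    total = u_bar + (1.0 + 1.0 / m) * b      # total variance

    if b == 0.0:
        df = math.inf                        # no between-imputation variability
    else:
        r = (1.0 + 1.0 / m) * b / u_bar      # relative increase in variance
        df = (m - 1) * (1.0 + 1.0 / r) ** 2
    return q_bar, total, df


# Hypothetical results from m = 3 imputed data sets.
q_bar, total_var, df = combine_repeated_imputations(
    [10.0, 11.0, 12.0], [1.0, 1.0, 1.0]
)
print(q_bar, total_var, df)
```

An interval estimate then takes the usual form: the point estimate plus or minus a t quantile on `df` degrees of freedom times the square root of the total variance. The quantile lookup is left out so that the sketch depends only on the standard library.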
4.6 Weighting Adjustments

Finally, consider weighting adjustments for nonresponse, which in principle can be a very effective class of methods for obtaining approximately unbiased estimates. Each unit's weight is the inverse probability of observing that unit's pattern of missing data given (X, Y) information. If the patterns of missing data for the units are created by design, as with matrix sampling, these probabilities and thus the weights are known. When these patterns of missing data are affected by nonresponse, the nonresponse probabilities need to be estimated. Although this estimation can be undertaken by the data-base constructor, typically it is only done assuming the simplest case of nonresponse where the units are either respondents (with all of Y observed) or nonrespondents (with all of Y missing); in this case, the nonrespondents can be discarded, and (approximately) unbiased estimates can be obtained from the respondents and their weights, assuming they accurately reflect both the sampling and nonresponse mechanisms.

Several issues arise with the use of weighting adjustments. First, even in the simplest case of unit nonresponse, where the shared data base of respondents is fully observed, many ultimate users' complete-data analyses do not allow for sampling weights. Second, even with complete-data analyses that can deal with sampling weights, the construction of intervals and p-values that validly account for the fact that nonresponse adjustments in the weights are estimated from data is not immediate from complete-data analyses. Third, with general patterns of nonresponse, special analysis methods need to be developed and special software needs to be written (see Little 1988, sec. 5.1 for the case of monotone missing data), but attempting to do this in a manner that allows the use of standard complete-data software leads to ad hoc approaches such as "complete cases" and "available cases," which we have already rejected as unacceptable general solutions. These three issues imply that in general, weighting adjustments do not satisfy the objectives of allowing ultimate users to apply standard complete-data software to shared data bases to obtain valid inference.

A fourth issue with such weighting adjustments is that they are focused on unbiased estimation and are essentially blind to efficiency concerns. In most well-designed surveys, the planned pattern of missing data is such that efficient estimates are expected to result from standard weighted estimates. But nonrespondents do not necessarily create missing data in such a benign way, and so standard weighted estimates, even when approximately unbiased, can have excessive variability. Consider dealing with censored data by weighting: data beyond or approaching the censoring point have zero or very small probabilities of being observed, and so either cannot be dealt with by weighting or imply a few observations with dominant weights. Weighting by inverse probabilities cannot create estimates outside the convex hull of the observed data, and estimates involving weights near the boundary have extremely large variance.

For these reasons, weighting, although theoretically attractive in an asymptotic sense, has never really been claimed to be a complete practical solution to the problem of missing data in shared data bases; recall Hansen's (1987) comments reported in Section 4.2.

4.7 Concluding Comparative Comments

Multiple imputation is doing well, perhaps even flourishing, as documented by recent sessions at the annual meetings of the American Statistical Association and other professional associations (e.g., the International Statistical Institute, American Medical Informatics Association) and by the variety of recent publications documenting its applicability and extending its theory. It is even becoming so popular that the words "multiple imputation" can appear in the title of an article with no reference to a publication by me or any of my coauthors (e.g., James 1995). This change is occurring for two basic reasons. First, multiple imputation is substantially easier for the ultimate user than any other current method that can satisfy the dual objectives of reliance only on complete-data methods and general validity of inference. And second, it is becoming relatively easy for the data collector to create multiply-imputed files using modern computing hardware and accompanying algorithmic developments for Bayesian models. Of course, the development of simply-used appropriate software for creating multiple imputations and analyzing multiply-imputed data is still badly needed, but fortunately progress is taking place in many places (e.g., Schafer 1996; Liu 1995; and van Buuren, van Mulligen, and Brand 1995). I expect that with the availability of this software, multiple imputation will become the standard method for handling missing data in public-use data sets.

As an anonymous referee of this paper wrote: "Multiple imputation is more flexible than replication and reweighting for the analysis of survey data when there are complex patterns of nonresponse. Case closed."

[Received August 1993. Revised June 1995.]

REFERENCES

Barnard, J. (1995), "Cross-Match Procedures for Multiple-Imputation Inference: Bayesian Theory and Frequentist Evaluation," unpublished doctoral thesis, University of Chicago, Dept. of Statistics.
Burns, E. M. (1990), "Multiple and Replicate Item Imputation in a Complex Sample Survey," in Proceedings of the Bureau of the Census Annual Research Conference.
--- (1991), "Multiple Imputation in the 1989 Commercial Buildings Energy Consumption Survey: Building Characteristics," CBECS Technical Note 86, U.S. Department of Energy.
--- (1993), "Assessment of Energy Use in Multibuilding Facilities," Report DOE/EIA-0555(93)/1, U.S. Department of Energy.
Chand, N., and Alexander, C. H. (1994), "Imputing Income For An N-Person Consumer Unit," Bureau of the Census paper presented at the American Statistical Association Annual Meeting, Toronto.
Clogg, C., Rubin, D. B., Schenker, N., Schultz, B., and Weidman, L. (1991), "Multiple Imputation of Industry and Occupation Codes in Census Public-Use Samples Using Bayesian Logistic Regression," Journal of the American Statistical Association, 86, 68-78.
Efron, B. (1994), "Missing Data, Imputation, and the Bootstrap" (with discussion), Journal of the American Statistical Association, 89, 463-478.
Efron, B., and Tibshirani, R. (1993), An Introduction to the Bootstrap, London: Chapman and Hall.
Eltinge, J. L., Yansaneh, I. S., and Paulin, G. D. (1994), "Assessment of Reported Differences Between Expenditures and Low Incomes in the
--- (1990), "Imputation Procedures and Inferential Versus Evaluative Statistical Statements," in Proceedings U.S. Census Bureau Sixth Annual Research Conference, pp. 676-679.
--- (1993), "Satisfying Confidentiality Constraints Through the Use of Synthetic Multiply-Imputed Micro-Data," Journal of Official Statistics, 9, 461-468.
--- (1994), Comments on "Missing Data, Imputation, and the Bootstrap" by B. Efron, Journal of the American Statistical Association, 89, 485-8.
Rubin, D. B., and Schenker, N. (1986), "Multiple Imputation for Interval Estimation From Simple Random Samples With Ignorable Nonresponse," Journal of the American Statistical Association, 81, 366-374.
--- (1987), "Interval Estimation From Multiply Imputed Data: A Case Study Using Agriculture Industry Codes," Journal of Official Statistics, 3, 375-387.
Schafer, J. L. (1996), Analysis of Incomplete Multivariate Data by Simulation, New York: Chapman and Hall.
Schafer, J. L., and Schenker, N. (1991), "Variance Estimation With Imputed Means," in Proceedings of the Survey Research Methods Section, American Statistical Association, pp. 696-701.
Schafer, J. L., Khare, M., Little, R. J. A., and Rubin, D. B. (1993), "Multiple Imputation of NHANES III," paper presented at the Annual Meeting of the American Statistical Association, San Francisco.
Schenker, N. (1989), "The Use of Imputed Probabilities for Missing Binary Data," in Proceedings of the 5th Annual Research Conference, Bureau of the Census, pp. 133-139.
Schenker, N., Treiman, D. J., and Weidman, L. (1993), "Analyses of Public Use Decennial Census Data With Multiply Imputed Industry and Occupation Codes," Applied Statistics, 42, 545-556.
Tanner, M. A., and Wong, W. H. (1987), "The Calculation of Posterior Distributions by Data-Augmentation" (with discussion), Journal of the American Statistical Association, 82, 528-550.
Treiman, D. J., Bielby, W., and Cheng, M. (1989), "Evaluating a Multiple-Imputation Method for Recalibrating 1970 U.S. Census Detailed Industry Codes to the 1980 Standard," Sociological Methodology, 18, 309-345.
van Buuren, S., van Mulligen, E. M., and Brand, J. P. L. (1995), "Omgaan Met Ontbrekende Gegevens in Statistische Databases: Multiple Imputatie in HERMES," Kwantitatieve Methoden, 50, 503-504.
van Buuren, S., van Rijckevorsel, J. L. A., and Rubin, D. B. (1993), "Multiple Imputation by Splines," in Bulletin of the International Statistical Institute, Contributed Papers II, 503-504.
Weld, L. (1987), "Significance Levels from Public Use Data With Multiply-Imputed Industry Codes," unpublished doctoral thesis, Harvard University, Dept. of Statistics.
Wolter, K. M. (1984), Introduction to Variance Estimation, New York: Springer-Verlag.

BIBLIOGRAPHY: SOME OTHER WORK INVOLVING MULTIPLE IMPUTATION

Belin, T. R., Diffendal, G. J., Mack, S., Rubin, D. B., Schafer, J. L., and Zaslavsky, A. M. (1993), "Hierarchical Logistic Regression Models for Imputation of Unresolved Enumeration Status in Undercount Estimation" (with discussion), Journal of the American Statistical Association, 88, 1149-1166.
Belin, T. R., and Rubin, D. B. (1990), "Calibration of Errors in Computer Matching for Census Undercount" (with discussion), in Proceedings of the Government Statistics Section, American Statistical Association, pp. 124-131.
Bloxum, B., Pashley, P. J., Nicewander, W. A., and Yan, D. (1995), "Linking to a Large-Scale Assessment: An Empirical Evaluation," Journal of Educational and Behavioral Statistics, 20, 1-26.
Boshuizen, H. C., Izaks, G. J., van Buuren, S., and Ligthart, G. J. (1995), "Bloeddruk en Sterfte Bij Hoogbejaarden." TNO-rapport C95.014, ISBN 90-6743-377-2.
Brand, J., Gelsema, E. S., and van Buuren, S. (1995), "Verification of Multiple Imputation by Simulation," submitted to SCAMC '95.
Brand, J., van Buuren, S., van Mulligen, E. M., Timmers, T., and Gelsema, E. (1994), "Multiple Imputation as a Missing Data Machine," in Proceedings of the Eighteenth Annual Symposium on Computer Applications in Medical Care (SCAMC), Philadelphia: Hanley and Belfus, pp. 303-307.
Brownstone, D. (1991), "Multiple Imputations for Linear Regression Models," Working Paper MBS 91-37, University of California, Irvine, Institute for Mathematical Behavioral Sciences.
Brownstone, D., and Valletta, R. (1996), "Modeling Earnings Measurement Error: A Multiple Imputation Approach," The Review of Economics and Statistics, In press.
Chao, M. T. (1994), "A Short Review of Recent Survey Methods for Nonresponse," Journal of the Chinese Statistical Association, 32, 169-177.
Chen, R., and Liu, J. S. (1994), "Predictive Updating Methods With Application to Bayesian Classification," Journal of The Royal Statistical Society, Ser. B, 58, 2.
Clogg, C. C., Rubin, D. B., Schenker, N., Schultz, B., and Weidman, L. (1991), "Multiple Imputation of Industry and Occupation Codes in Census Public-Use Samples Using Bayesian Logistic Regression," Journal of the American Statistical Association, 86, 68-78.
Dorey, F. J., Little, R. J. A., and Schenker, N. (1993), "Multiple Imputation for Threshold-Crossing Data With Interval-Censoring," Statistics in Medicine, 12, 1589-1603.
Ezzati-Rice, T. M., Khare, M., and Schafer, J. L. (1993), "Multiple Imputation of Missing Data in NHANES III," paper presented at the American Statistical Association Annual Meeting, San Francisco.
Fahimi, M., and Judkins, D. (1993), "Serial Imputation of NHANES III With Mixed Regression and Hot-Deck Techniques," paper presented at the American Statistical Association Annual Meeting, San Francisco.
Freedman, V., and Wolf, D. A. (1991), "Imputation of Mother's Marital Status in National Survey of Families and Households," Discussion Paper Series UI-PSC-8, The Urban Institute, Washington, DC.
Glynn, R., Laird, N., and Rubin, D. B. (1993), "The Performance of Mixture Models for Nonignorable Nonresponse With Follow Ups," Journal of the American Statistical Association, 88, 984-993.
Greenland, S., and Finkle, W. D. (1995), "A Critical Look at Basic Methods for Handling Missing Covariates in Epidemiologic Regression Analyses," American Journal of Epidemiology, 142, 1255-1264.
Journal of the American Statistical Association, 90, 54-63.
Heitjan, D. F. (1990), "Coping with Age Heaping and Digit Preference: A Multiple Imputation Approach," unpublished paper, Pennsylvania State University College of Medicine, Center for Biostatistics & Epidemiology.
Heitjan, D. F., and Landis, J. R. (1994), "Assessing Secular Trends in Blood Pressure: A Multiple-Imputation Approach," Journal of the American Statistical Association, 89, 750-759.
Heitjan, D. F., and Little, R. J. A. (1991), "Multiple Imputation for the Fatal Accident Reporting System," Applied Statistics, 40, 13-29.
Johnson, C. L., Curtin, L. R., Ezzati-Rice, T. M., Khare, M., and Murphy, R. S. (1993), "Single and Multiple Imputation: The NHANES Perspective," paper presented at the 1993 American Statistical Association Annual Meeting, San Francisco.
Johnson, E. G., and Zwick, R. (Eds.) (1988), Focusing the New Design: The NAEP 1988 Technical Report, Princeton, NJ: Educational Testing Service.
Kalleberg, A. L., Marsden, P. V., Aldrich, H. E., and Cassell, J. W. (1990), "Comparing Organizational Sampling Frames," Administrative Science Quarterly, 35, 658-688.
Kennickell, A. B. (1991), "Imputation of the 1989 Survey of Consumer Finances: Stochastic Relaxation and Multiple Imputation," in Proceedings of the Survey Research Methods Section, American Statistical Association, pp. 1-10.
Land, K. C., and McCall, P. L. (1993), "Estimating the Effect of Nonresponse in Sample Surveys: An Application of Rubin's Bayesian Method to the Estimation of Community Standards for Obscenity," Sociological Methods and Research, 21, 291-316.
Li, K. H. (1988), "Hypothesis Testing in Multiple Imputation," unpublished doctoral thesis, University of Chicago.
Little, R. J. A., and Rubin, D. B. (1989), "The Analysis of Social Science Data With Missing Values," Sociological Methods and Research, 18, 292-326. Also in Modern Methods of Data Analysis (1990), eds. S. Fox and J. S. Long, Newbury Park, CA: Sage Publications.
--- (1993), "Assessment of trial imputations for NHANES III," project report, Datametrics Research, Inc.
Liu, C. (1993), "Bartlett's Decomposition of the Posterior Distribution of the Covariance for Normal Monotone Ignorable Missing Data," Journal of Multivariate Analysis, 46, 198-206.
--- (1994), "Statistical Analysis Using the Multivariate t Distribution,"
unpublished doctoral thesis, Harvard University, Dept. of Statistics.
--- (1995), "Missing Data Imputation Using the Multivariate t Distribution," Journal of Multivariate Analysis, 53, 139-158.
Liu, J. S. (1994), "Fraction of Missing Information and Convergence Rate of Data Augmentation," in Computationally Intensive Statistical Methods: Proceedings of the 26th Symposium Interface, eds. J. Sall and A. Lehman, pp. 490-497.
Marsden, P. V., and Podolny, J. (1990), "Dynamic Analysis of Network Diffusion Processes," in Social Networks Through Time, eds. J. Weesie and H. Flap, Utrecht, The Netherlands: ISOR.
Meng, X. L. (1990), "Towards Complete Results for Some Incomplete-Data Problems," unpublished doctoral thesis, Harvard University, Dept. of Statistics.
Meng, X. L., and Rubin, D. B. (1990), "Likelihood Ratio Tests with Multiply-Imputed Data," in Proceedings of the Statistical Computing Section, American Statistical Association, pp. 78-82.
Meng, X. L., and Tu, X. M. (1993), "Correcting Reporting Delays in Surveillance Data by Multiple Imputation With Application to AIDS Surveillance," paper presented at the American Statistical Association Annual Meeting, San Francisco.
Mislevy, R. J. (1991), "Randomization-Based Inferences About Latent Variables From Complex Samples," Psychometrika, 56, 177-196.
Mislevy, R. J., Beaton, A. E., Kaplan, B., and Sheehan, K. M. (1992), "Estimating Population Characteristics From Sparse Matrix Samples of Item Responses," Journal of Educational Measurement, 29, 133-161.
Raghunathan, T. E. (1987), "Large Sample Significance Levels From Multiply-Imputed Data," unpublished doctoral thesis, Harvard University, Dept. of Statistics.
Raghunathan, T. E., and Grizzle, J. E. (1995), "A Split Question Survey Design," The Journal of the American Statistical Association, 90, 54-63.
Raghunathan, T. E., and Siscovick, D. S. (1996), "A Multiple Imputation Analysis of a Case-Control Study on the Risk of Primary Cardiac Arrest Among Pharmacologically Treated Hypertensives," Applied Statistics, 45, 3.
Reilly, M. (1993), "Data Analysis Using Hot-Deck Multiple Imputation," The Statistician, 42, 307-313.
Relles, D. A., and Stolzenberg, R. M. (1991), "An Assessment of the Consequences of Sample Censoring Bias in Graduate School Admission Test Validation," in Proceedings of the Social Statistics Section, American Statistical Association, pp. 101-110.
Rubin, D. B. (1988), "Multiple Imputation for Data-Base Construction," COMSTAT 88-Proceedings in Computational Statistics, eds. D. Edwards and N. E. Raun, Heidelberg: Physica-Verlag, pp. 389-400.
--- (1988), "An Overview of Multiple Imputation," in Proceedings of the Survey Research Methods Section, American Statistical Association, pp. 79-84.
--- (1991), "EM and Beyond," Psychometrika, 56, 241-254.
--- (1992), Comment on "Clinical Trials in Psychiatry: Should Protocol Deviations Censor Patient Data?" by Lavori, Neuropsychopharmacology, 6, 59-60.
Rubin, D. B., and Schafer, J. L. (1990), "Efficiently Creating Multiple Imputations for Incomplete Multivariate Normal Data," in Proceedings of the Statistical Computing Section, American Statistical Association, pp. 83-88.
--- (1988), "Imputation Strategies for Missing Values in the PES," Survey Methodology, 14, 209-221.
Rubin, D. B., Schafer, J. L., and Schenker, N. (1988), "Imputation Strategies for Estimating the Undercount," in Bureau of the Census Fourth Annual Research Conference, pp. 151-159.
Rubin, D. B., and Schenker, N. (1986), "Multiple Imputation for Interval Estimation From Simple Random Samples With Ignorable Nonresponse," Journal of the American Statistical Association, 81, 366-374.
--- (1987), "Interval Estimation from Multiply-Imputed Data: A Case Study Using Agriculture Industry Codes," Journal of Official Statistics, 3, 375-387.
--- (1991), "Multiple Imputation in Health-Care Data Bases: An Overview and Some Applications," Statistics in Medicine, 10, 585-598.
Rubin, D. B., Stern, H., and Vehovar, V. (1994), "Handling 'Don't Know' Survey Responses: The Case of the Slovenian Plebiscite," Journal of the American Statistical Association, 90, 822-828.
Rubin, D. B., and Zaslavsky, A. (1989), "An Overview of Representing Misenumerations in the Census Using Multiple Imputation," in Proceedings of the Bureau of the Census Fifth Annual Research Conference, pp. 109-117.
Schafer, J. L. (1991), "A Comparison of the Missing-Data Treatments in the Post-Enumeration Program," Journal of Official Statistics, 7, 475-498.
--- (1991), "Algorithms for Multiple Imputation and Posterior Simulation from Incomplete Multivariate Data With Ignorable Nonresponse," unpublished doctoral thesis, Harvard University, Dept. of Statistics.
Schafer, J. L., Khare, M., Little, R. J. A., and Rubin, D. B. (1993), "Multiple Imputation of NHANES III," paper presented at the American Statistical Association Annual Meeting, San Francisco.
Schenker, N., Treiman, D. J., and Weidman, L. (1988), "Evaluation of Multiply-Imputed Public-Use Tapes," in Proceedings of the Survey Research Methods Section, American Statistical Association, pp. 85-92.
Schenker, N., Treiman, D. J., and Weidman, L. (1993), "Analysis of Public-Use Data With Multiply-Imputed Industry and Occupation Codes," Applied Statistics, 42, 545-556.
Schenker, N., and Welsh, A. H. (1988), "Asymptotic Results for Multiple Imputation," The Annals of Statistics, 16, 1550-1566.
Soldo, B. J., Wolf, D. A., and Freedman, V. A. (1990), "Coresidence With Older Mothers: The Children's Perspective," Urban Institute Report, Washington, DC.
Stein, M. L., Shen, X., and Styer, P. E. (1993), "Applications of a Simple Regression Model to Rain," Canadian Journal of Statistics, 21, 331-346.
Taylor, J. M. G., Muñoz, A., Bass, S. M., Saah, A., Chmiel, J. S., and Kingsley, L. A. (1990), "Estimating the Distribution of Times From HIV Seroconversion to AIDS Using Multiple Imputation," Statistics in Medicine, 9, 505-514.
Treiman, D. J., Bielby, W., and Cheng, M. (1989), "Evaluating a Multiple-Imputation Method for Recalibrating 1970 U.S. Census Detailed Industry Codes to the 1980 Standard," Sociological Methodology, 18, 309-345.
Tu, X. M., Meng, X. L., and Pagano, M. (1993a), "The AIDS Epidemic: Estimating Survival After AIDS Diagnosis From Surveillance Data," Journal of the American Statistical Association, 88, 26-36.
--- (1993b), "Survival Differences and Trends in Patients With AIDS in the United States," Journal of Acquired Immune Deficiency Syndromes, 6, 1150-1156.
van Buuren, S., van Mulligen, E. M., and Brand, J. P. L. (1994), "Routine Multiple Imputation in Statistical Data Bases," in Proceedings of the Seventh International Working Conference on Scientific and Statistical Database Management, eds. J. C. French and H. Hinterberger, Los Alamitos, CA: IEEE Computer Society Press, pp. 74-78.
Weld, L. H. (1987), "Significance Levels From Public-Use Data With Multiply-Imputed Industry Codes," unpublished doctoral thesis, Harvard University, Dept. of Statistics.
Williams, V. S. L., Billeaud, K., Davis, L. A., Thissen, D., and Sanford, E. (1995), "Projecting to the NAEP Scale: Results From the North Carolina End of Grade Testing Program," research report, National Institute of Statistical Sciences.
Zaslavsky, A. M. (1989), "Representing Census Undercount: A Comparison of Reweighting and Multiple Imputation Methods," unpublished doctoral thesis, Massachusetts Institute of Technology, Dept. of Mathematics.