Vincent-El Badry Method
©1998 by Griffith
1998 (Original January 1996)
In many past censuses enumerators have frequently omitted
to record "0" for childless women, leaving the space
on the census schedule blank or perhaps recording a dash. The
census records will then show these women as having not
responded to the question on number of children ever born (CEB),
rather than as having no children. This creates a downward
bias in reported proportions of childless women and a
corresponding upward bias in reported proportions of women
with one or more children ever born and in mean number of
children ever born. The Vincent-El Badry method is a tool for
diagnosing this problem and, in certain conditions, correcting it.
Let the following denote observed values.
Wi - number of women in age group i
NSi - number of women in age group i with CEB not stated
Ci - number of childless women in age group i
where i = 1, 2, ..., indexes age groups from youngest to
oldest. Number of women refers to women to whom the children
ever born question was addressed, typically either all women
or all ever married women.
From these observed numbers the following proportions are
computed:
$$ns_i = \frac{NS_i}{W_i}, \qquad c_i = \frac{C_i}{W_i}$$
Note that upper case letters denote numbers, lower case
letters denote proportions.
Let the corresponding (unknown) true values be denoted by the
same symbols prefaced by t. E.g., tCi denotes the true number of
childless women in age group i. The method rests on two
assumptions:
(1) The true proportion of women not stating children ever
born is the same in every age group. Let this proportion be
denoted ns.
(2) The true proportion of zero parity women who are
incorrectly recorded as CEB not stated is the same for all age
groups. Let this proportion be denoted p.
These assumptions may be relaxed by interpretation in the
context of particular applications, as illustrated below.
The second assumption means that
$$C_i = tC_i - p\,tC_i = tC_i(1-p) \qquad \text{(1a)}$$
and
$$NS_i = tNS_i + p\,tC_i \qquad \text{(1b)}$$
These equations simply indicate the transfer of the
improperly recorded women, ptCi in number, from the
childless to the not stated category.
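Solving (1a) for tCi gives
$$tC_i = \frac{C_i}{1-p} \qquad \text{(2)}$$
Substituting this in (1b) gives
$$NS_i = tNS_i + \frac{p}{1-p}\,C_i \qquad \text{(3)}$$
Dividing this by Wi gives
$$ns_i = tns_i + \frac{p}{1-p}\,c_i \qquad \text{(4)}$$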
These steps are pure (and very elementary) algebra. By
assumption (1), however, tnsi has the constant
value ns, so that we obtain finally
$$ns_i = ns + \frac{p}{1-p}\,c_i \qquad \text{(5)}$$
This equation includes the observed values nsi
and ci for each age group and two unknown
parameters, ns and p.
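A quick numerical check of the derivation may help. The sketch below assumes made-up true proportions childless for three hypothetical age groups (not census data), applies the transfer described by (1a) and (1b) in proportion form, and confirms that every resulting point satisfies equation (5).

```python
# Illustrative check of equation (5); all numbers here are hypothetical.
TRUE_C = [0.60, 0.20, 0.05]  # true proportions childless, tc_i, by age group
NS = 0.01                    # assumption (1): constant true proportion not stated
P = 0.40                     # assumption (2): share of childless women misrecorded


def observe(tc, ns, p):
    """Observed (c_i, ns_i) implied by (1a) and (1b), both divided by W_i."""
    c = tc * (1 - p)     # (1a): childless women recorded correctly
    ns_i = ns + p * tc   # (1b): real not stated plus misrecorded childless women
    return c, ns_i


points = [observe(tc, NS, P) for tc in TRUE_C]
slope = P / (1 - P)

for c, ns_i in points:
    # Equation (5): ns_i = ns + (p / (1 - p)) * c_i holds for every age group.
    assert abs(ns_i - (NS + slope * c)) < 1e-12
    print(f"c_i = {c:.4f}  ns_i = {ns_i:.4f}")
```

Because such points lie exactly on a line with intercept ns and slope p/(1-p), fitting a line recovers both parameters; real data only scatter around it.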
Values for ns and p are estimated by fitting a straight
line to the points (ci, nsi). The
intercept of the fitted line gives an estimate of ns. The
slope of the fitted line equals p/(1-p), whence p is given by
s/(1+s), s denoting the slope.
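A minimal sketch of this estimation step follows. It uses ordinary least squares purely for simplicity (the text elsewhere recommends inspecting the points and using robust methods), and the function name `estimate_ns_p` is hypothetical.

```python
def estimate_ns_p(c, ns_obs):
    """Fit ns_i = a + s * c_i by ordinary least squares.

    Returns (ns, p, s): the intercept a estimates ns, and since
    s = p / (1 - p), p is recovered as s / (1 + s).
    """
    n = len(c)
    mean_c = sum(c) / n
    mean_ns = sum(ns_obs) / n
    sxy = sum((x - mean_c) * (y - mean_ns) for x, y in zip(c, ns_obs))
    sxx = sum((x - mean_c) ** 2 for x in c)
    s = sxy / sxx
    ns = mean_ns - s * mean_c
    return ns, s / (1 + s), s


# Hypothetical points lying exactly on ns_i = 0.01 + 1.0 * c_i.
ns, p, s = estimate_ns_p([0.1, 0.2, 0.4], [0.11, 0.21, 0.41])
print(f"ns = {ns:.4f}, slope = {s:.4f}, p = {p:.4f}")
```

For instance, the slope of 0.9737 in the Maharashtra example gives p = 0.9737/1.9737, about 0.4933.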
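The true proportions childless, tci, may be computed in either
of two ways. First, from (1a),
$$tc_i = \frac{c_i}{1-p} \qquad \text{(6a)}$$
Second, adding (1a) and (1b) gives $C_i + NS_i = tC_i + tNS_i$;
dividing by Wi, using $tns_i = ns$ from assumption (1), and
rearranging terms,
$$tc_i = c_i + ns_i - ns \qquad \text{(6b)}$$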
If the fit is perfect, these two formulas will give the
same result. In practice, both may be computed and their ratio
examined to give an indication of how well the method is
working.
The interpretation of p is straightforward: it is the
estimated proportion of zero parity women who are incorrectly
recorded as having failed to report number of children ever
born. It is used to correct the observed proportions of
childless women using formula (6a).
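The two formulas for the true proportion childless and their ratio can be sketched as follows; the function name `corrected_childless` is hypothetical. The inputs are a made-up age group that fits the model exactly (true proportion childless 0.20, ns = 0.01, p = 0.40, hence ci = 0.12 and nsi = 0.09), so both formulas should agree and the ratio should be 1, as the text says for a perfect fit.

```python
def corrected_childless(c_i, ns_i, ns, p):
    """Two estimates of the true proportion childless, tc_i, and their ratio."""
    meth1 = c_i / (1 - p)    # from (1a): undo the loss of a fraction p
    meth2 = c_i + ns_i - ns  # from (1a) + (1b): remove the real not stated share
    return meth1, meth2, meth2 / meth1


# Hypothetical age group that satisfies the model exactly.
meth1, meth2, ratio = corrected_childless(c_i=0.12, ns_i=0.09, ns=0.01, p=0.40)
print(f"{meth1:.4f} {meth2:.4f} {ratio:.2f}")
```

With real data the ratio departs from 1, and the size of the departure is the diagnostic the text describes.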
The interpretation of ns requires a distinction between
"real" and "spurious" not stated cases.
"Real" not stated cases are women for whom the
enumerator attempted to obtain an answer to the children ever
born question but was unable to do so. "Spurious"
not stated cases are women who had no children, and who might
have been accurately identified as such, but for whom improper
behavior of the enumerator resulted in the "children ever
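born not stated" classification. In terms of the model, tNSi
counts the real not stated cases and ptCi the spurious ones, so
ns estimates the proportion of women who are real not stated
cases.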
The estimated true proportions of women childless vary in
quality according to age group. Estimates for women aged 20-50
are often rather good. Estimates for older women may be poor
because true proportions not stated tend to increase with age
beyond age 50. Estimates for women under 20, and especially
for women under age 15, may be very poor. While the
explanation for this is unclear, it evidently has to do with the
very high proportions of zero parity women at these young
ages. The residuals of the fitted line and the ratios of the
two estimates of the true proportion of zero parity women in
each age group provide a guide for interpretation.
The data points (ci, nsi) should be
plotted and scrutinized before fitting a line, and the fitted
line should in general aim to minimize residuals for the
points for reproductive age women. Once a line is fit,
residuals should be plotted and examined. Resist any
temptation to omit these steps; skipping them risks silly
and potentially embarrassing results. When working by
computer, robust fitting methods should be used.
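The text does not name a particular robust method. One common choice, sketched here as an assumption rather than as the author's procedure, is the Theil-Sen estimator: the slope is the median of all pairwise slopes and the intercept is the median of nsi - s·ci, so a single aberrant point has limited influence on the line. The function name `theil_sen` and the data are hypothetical.

```python
from statistics import median


def theil_sen(x, y):
    """Robust line fit: median of pairwise slopes, then median intercept."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i in range(len(x))
        for j in range(i + 1, len(x))
        if x[j] != x[i]  # skip pairs with identical c_i values
    ]
    s = median(slopes)
    intercept = median(yi - s * xi for xi, yi in zip(x, y))
    return intercept, s


# Hypothetical data on ns_i = 0.01 + c_i, with one outlying point at the end.
c = [0.1, 0.2, 0.3, 0.4, 0.5]
ns_obs = [0.11, 0.21, 0.31, 0.41, 0.90]
ns, s = theil_sen(c, ns_obs)
p = s / (1 + s)  # same conversion from slope to p as in the text
print(f"ns = {ns:.4f}, slope = {s:.4f}, p = {p:.4f}")
```

An ordinary least-squares line through the same points would be pulled toward the outlier. Either way, the points and residuals should still be plotted and examined.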
Children ever born data for ever married women in the
Indian state of Maharashtra as of the 1981 census are given on
page 574 of Maharashtra, Census of India - 1981, SR. 12,
Maharashtra, Part - VI - A & B, Fertility tables. The
proportions of women with CEB not stated and of childless
women are as follows
Table 1: Input Data
Age group  nsi     ci
<15        0.5368  0.4446
15-19      0.3281  0.3541
20-24      0.1462  0.1419
25-29      0.0611  0.0548
30-34      0.0403  0.0334
35-39      0.0348  0.0272
40-44      0.0353  0.0280
45-49      0.0379  0.0275
50+        0.0474  0.0322
The following plot shows the scatter of nsi
against ci together with a fitted line. The
intercept and slope of the fitted line are 0.0080 and 0.9737.
The following table shows fitted nsi values and residuals.
Table 2: Fitted Values and Residuals
Age group  fitted nsi  residual (nsi - fitted)
<15        0.4409       0.0959
15-19      0.3528      -0.0247
20-24      0.1462       0.0000
25-29      0.0614      -0.0003
30-34      0.0405      -0.0002
35-39      0.0345       0.0003
40-44      0.0353       0.0000
45-49      0.0348       0.0031
50+        0.0394       0.0080
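As a check on the arithmetic, Table 2 can be recomputed from Table 1 and the reported intercept and slope. This sketch does not reproduce how the line itself was fitted, which the text does not specify; the names `TABLE_1` and `fitted_and_residuals` are hypothetical.

```python
# Table 1 of the Maharashtra example: (age group, ns_i, c_i).
TABLE_1 = [
    ("<15", 0.5368, 0.4446),
    ("15-19", 0.3281, 0.3541),
    ("20-24", 0.1462, 0.1419),
    ("25-29", 0.0611, 0.0548),
    ("30-34", 0.0403, 0.0334),
    ("35-39", 0.0348, 0.0272),
    ("40-44", 0.0353, 0.0280),
    ("45-49", 0.0379, 0.0275),
    ("50+", 0.0474, 0.0322),
]
INTERCEPT, SLOPE = 0.0080, 0.9737  # fitted line reported in the text


def fitted_and_residuals(rows, intercept, slope):
    """Return (age, fitted ns_i, residual ns_i - fitted), rounded to 4 places."""
    out = []
    for age, ns_i, c_i in rows:
        fit = intercept + slope * c_i
        out.append((age, round(fit, 4), round(ns_i - fit, 4)))
    return out


table2 = fitted_and_residuals(TABLE_1, INTERCEPT, SLOPE)
for age, fit, res in table2:
    print(f"{age:<6} {fit:.4f} {res:.4f}")
```

The recomputed values agree with Table 2 to four decimal places.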
The residuals are plotted against age groups identified by
number, 1 being the youngest age group <15, in the
following figure. The fit is extremely good in the
reproductive ages, with slight deterioration on both sides of
this range, but extremely poor for the <15 age group.
The intercept and slope of the fitted line give ns = 0.0080
and p = 0.4933. The ns value indicates that 0.8 percent of
women "truly" (really) did not state children ever born. The p
value indicates that nearly half of all childless women were
recorded as children ever born not stated. The two possible
estimates of corrected proportions of zero parity women are
shown in the following table.
Table 3: Corrected Proportions of Zero Parity Women
Age group  meth1   meth2   ratio
<15        0.8774  0.9734  1.11
15-19      0.6988  0.6742  0.96
20-24      0.2800  0.2801  1.00
25-29      0.1082  0.1079  1.00
30-34      0.0659  0.0657  1.00
35-39      0.0537  0.0540  1.01
40-44      0.0553  0.0553  1.00
45-49      0.0543  0.0574  1.06
50+        0.0635  0.0716  1.13
(meth1 = ci/(1-p); meth2 = ci + nsi - ns; ratio = meth2/meth1)
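Table 3 can likewise be recomputed from Table 1 and the reported estimates ns = 0.0080 and p = 0.4933. This is a sketch for checking the arithmetic; the names are hypothetical, and the ratio is taken as meth2/meth1 before rounding.

```python
# Table 1 of the Maharashtra example: (age group, ns_i, c_i).
TABLE_1 = [
    ("<15", 0.5368, 0.4446),
    ("15-19", 0.3281, 0.3541),
    ("20-24", 0.1462, 0.1419),
    ("25-29", 0.0611, 0.0548),
    ("30-34", 0.0403, 0.0334),
    ("35-39", 0.0348, 0.0272),
    ("40-44", 0.0353, 0.0280),
    ("45-49", 0.0379, 0.0275),
    ("50+", 0.0474, 0.0322),
]
NS, P = 0.0080, 0.4933  # estimates reported in the text

table3 = []
for age, ns_i, c_i in TABLE_1:
    meth1 = c_i / (1 - P)    # correction derived from (1a) alone
    meth2 = c_i + ns_i - NS  # correction derived from (1a) + (1b)
    table3.append((age, round(meth1, 4), round(meth2, 4), round(meth2 / meth1, 2)))

for row in table3:
    print("{:<6} {:.4f} {:.4f} {:.2f}".format(*row))
```

The ratios stay close to 1 in the reproductive ages and depart from 1 at the extremes, matching the pattern of the residuals.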
The observed values of childlessness for older women shown
in Table 1 above are around 2.8 percent. The p = 0.4933
implies a multiplication of the observed values by 1/(1-0.4933)
= 1.9736, i.e., just under a doubling of the observed values.
The bottom line is that the level of childlessness in
Maharashtra as of the 1981 census is about double the level
indicated by the unadjusted census data, roughly 5.5 percent
as compared with 2.8 percent, a very large difference indeed.
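The multiplier can also be read directly off the fitted slope. Since p = s/(1+s), we have 1 - p = 1/(1+s), so

$$\frac{1}{1-p} = 1 + s = 1 + 0.9737 = 1.9737,$$

which differs from 1.9736 only because p was rounded to 0.4933. Applied to the observed level of about 2.8 percent,

$$0.028 \times 1.9736 \approx 0.055,$$

the corrected level of roughly 5.5 percent quoted above.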
This example, which was chosen merely because the
Maharashtra data were conveniently at hand, shows how very
important the Vincent-El Badry adjustment may be.
M. V. Del Tufo, A Report on the 1947 Census of
Population, The Government Printer, Federation of Malaya,
Kuala Lumpur. Contains a useful discussion of the problem (not
the method) by a census taker. See pages 65-70.
Paul Vincent, L'Utilization des statistiques des
familles, Population 1, January-March, 1946. Pages
143-148 seem to give the essentials of the method described
here, though my French is not particularly good. My copy of
this paper bears a note "Cf. Henry 1953, page 40,"
which I cannot at present track down.
M. R. El Badry, Failure of enumerators to make entries
of zero: Errors in recording childless cases in population
censuses, Journal of the American Statistical
Association 56(296), 1961, pages 909-924. This is more
widely cited, in the English-speaking world at any rate.
United Nations, Manual X, Indirect Techniques for
Demographic Estimation, Population Studies No. 81,
Department of International Economic and Social Affairs, New
York, 1983. Contains an exposition in Annex II, pages 230-235.
I find the expositions in the preceding two sources
unsatisfactory. El Badry's exposition does not identify the
two assumptions of the method with sufficient clarity and his
restriction to ages below 40 keeps us from learning much that
the data for older women have to tell us. Manual X states that
equation (5) is "plausible," which it isn't. Without
a derivation on explicit assumptions it is neither plausible
nor implausible, and no derivation is given.
Alberto Palloni, Adjusting data on children-ever-born
for nonresponse, Social Biology, Vol. 28, No. 3-4,
1981, pages 308-314. This paper would appear to contain
relevant material, but despite several readings I have not
been able to make sense of it.