Statistical methods in epidemic modelling:
report on progress 2001-2006
Daniela De Angelis
http://www.mrc-bsu.cam.ac.uk/BSUsite/Research/Section3.shtml
Abstract
The main focus of this research
programme has been the development and application
of statistical methodology to estimate the
characteristics and evolution of epidemics, in
particular those caused by the Human Immunodeficiency
Virus (HIV) and the Hepatitis C Virus (HCV). The main
progress relates to: the development and application of
methods to estimate prevailing and future disease
prevalence (often at different disease stages) and
incidence using information from a variety of sources;
the characterisation of disease progression, and the
factors affecting it, through the analysis of
longitudinal data on disease markers collected in
observational cohort studies, accounting for the bias
inherent in such studies. The philosophy of this work is
mainly Bayesian with an emphasis on the use of
information from multiple sources and at different
levels of aggregation and, particularly, on the
identification of critical sources of information needed
both to resolve apparent conflicts between data sources
and to reduce estimation uncertainty. Apart from my
close collaboration with the Health Protection Agency (HPA),
the research has also benefited from continuation of
important collaborations, such as that with the
Concerted Action on SeroConversion to AIDS and Death in
Europe (CASCADE) project, and the establishment of new
ones, such as that with the Trent study. These have been
fundamental in providing longitudinal data to estimate
progression for the HIV and HCV diseases, respectively.
Contributions in other areas include: input to
cost-effectiveness analyses of HCV therapies; estimation
of evolution of the injecting drug use "epidemic";
modelling of errors in protein databases.
Introduction
My research activities are motivated by
the need to provide evidence-based input to public
health policies in England and Wales. This work is
carried out through the link with the HPA (previously
Public Health Laboratory Service), which funds my
position. This long-standing collaboration has proved
very successful for both organisations. The HPA is the
national, and internationally renowned, centre for
surveillance of infectious diseases, with
responsibilities for preparedness for new and emerging
health threats such as bio-terrorist attacks and
virulent new disease strains. It offers a wealth of
information, an in-depth epidemiological expertise and,
importantly, the opportunity of contributing to the
formulation of policies for genuinely significant and
pressing public health issues. The Biostatistics Unit,
on the other hand, provides the specialised statistical
knowledge that is essential to the provision of sound
evidence-based advice.
Most of my work is focused on the
development of statistical methods to address problems
relevant to the HPA and it is mainly carried out at the
Biostatistics Unit. However, my responsibilities to the
HPA also include provision of more routine statistical
advice to junior statisticians and epidemiologists.
HIV modelling
HIV incidence estimation
HIV has been
the most serious communicable disease in the UK since
the mid-1980s. Traditionally, estimates of the number of
HIV infected individuals in high risk groups and
short-term prediction of AIDS cases have been the
quantitative under-pinning of the Department of Health's
public health strategies. The advent of the highly
active anti-retroviral therapies (HAART) in the late
1990s has substantially reduced the incidence of AIDS
and death in industrialised countries. This, together
with the shift from epidemic to endemic HIV transmission
in high risk groups, has altered research priorities.
Therefore, while estimation of prevalence, particularly
in ethnic minorities, has become increasingly important,
knowledge of future trends in AIDS cases is no longer
central to health care planning. Estimation and
prediction of the number of people at earlier stages of
HIV disease and, in particular, the number of new
infections, has become more relevant to policy.
Knowledge of the recent and current level of HIV
transmission is the basis for the Department of Health
Sexual Strategy, whose main aim is to reduce
transmission of HIV and sexually transmitted disease by
25% by 2007. However, the introduction of HAART has also
irremediably changed the historical trends in AIDS
incidence and the natural history of HIV, compromising
use of more traditional methods for incidence
estimation. Implementation of the back-calculation
method [05.503],
which, based on reports of AIDS cases and the
distribution of the time from HIV infection to AIDS
(incubation time), estimates the underlying HIV
incidence, is now problematic. This indicated the need
for new approaches, less reliant on AIDS figures, that
can explore the potential of alternative, but already
existing, data sources and, crucially, identify the
enhancements to current surveillance systems that will
most usefully inform estimation. Little progress in this
direction has been made in other industrialised
countries, which typically do not have the rich
surveillance system of the UK. Attempts have been
limited to devising versions of back-calculation that
are based on HIV diagnoses data rather than AIDS cases.
However, this has introduced the added difficulty of
estimating the distribution of the time between
infection and HIV diagnosis (the time of the first HIV
positive test), which is not the result of a natural
process, but depends on the state of the immune system
as well as on external pressures (awareness campaigns)
that might change over time (Chau et al, 2003; Ping
Yang, Public Health Agency of Canada, personal
communication).
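The relationship that back-calculation inverts can be sketched as a discrete convolution: expected AIDS cases in a period are the sum, over earlier periods, of HIV incidence multiplied by the probability of the corresponding incubation time. The sketch below shows only this forward calculation. It is an illustration under stated assumptions, not the implementation of [05.503]; the function name, the incidence figures and the incubation probabilities are hypothetical.

```python
def expected_aids_cases(hiv_incidence, incubation_pmf):
    """Expected AIDS diagnoses per period under a discrete convolution model.

    hiv_incidence[s]  : expected new HIV infections in period s (hypothetical)
    incubation_pmf[d] : probability that AIDS is diagnosed d periods after
                        infection (hypothetical; may sum to less than 1)
    """
    n_periods = len(hiv_incidence)
    expected = [0.0] * n_periods
    for t in range(n_periods):
        for s in range(t + 1):
            delay = t - s
            if delay < len(incubation_pmf):
                expected[t] += hiv_incidence[s] * incubation_pmf[delay]
    return expected


# Illustrative inputs only: these are not surveillance data.
incidence = [100.0, 200.0, 300.0]
incubation = [0.0, 0.1, 0.2]
print(expected_aids_cases(incidence, incubation))
```

Back-calculation works in the opposite direction: given observed AIDS counts and an assumed incubation distribution, it estimates the incidence sequence that best reproduces them. Once treatment changes the incubation distribution in ways that are hard to quantify, that inversion becomes unreliable, which is the difficulty described above.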
We have addressed this situation through
the development of new methods that exploit the complex
body of information available on HIV. Our contribution
adopts a Bayesian perspective, which is particularly
suited to the incorporation of uncertain information
from a variety of sources and allows coherent
propagation of uncertainty.
Collaboration with Wally Gilks
has led to the development of an approach to the
reconstruction of relevant aspects of the HIV epidemic,
in particular HIV incidence, in a way that is consistent
with surveillance data, at both the individual and the
aggregate level, as well as with information from ad hoc
surveys, national surveys and routinely collected
statistics. One of the main goals of this work was to
investigate the role of additional information (or
potentially available additional information) in the
estimation of HIV incidence and to make recommendations
for routine collection of such information. Feasibility
of the idea has been tested on data on homosexuals from
England and Wales. Results have confirmed the potential
of the approach while pointing out the computational
burden. The appeal of this approach is the individual
based modelling, which allows incorporation of
individual-specific information.
As an alternative approach
[05.110] we have
extended our previous work (Aalen et al, 1997) by
proposing a discrete time multistage version of the
back-calculation method that employs HIV diagnoses and
AIDS diagnoses with no previous HIV diagnosis as
end-points. HIV progression is described as a series of
disease stages in terms of CD4 count, and HIV diagnosis
is allowed from the various stages with rates that are
stage and calendar time dependent. In this way we model
explicitly the dependence of the diagnosis process on
both the state of the immune system and calendar time.
Information on HIV progression through CD4 stages and
data on end-points are used to estimate diagnosis rates
and HIV incidence rates. These parameters are only
identifiable if ancillary surveillance data (from the
CD4 database held at the HPA) are included in the model,
clearly demonstrating the value of this specific
surveillance programme. Aggregate information on the
number of undiagnosed infected individuals can also be
used to refine estimation. This formulation, in which
HIV diagnoses replace AIDS diagnoses, avoids the
need to model treatment effects, which would
complicate the model substantially. Application has
concentrated on the homosexual epidemic.
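The structure of such a discrete time multistage model can be sketched as follows. This is a simplified forward calculation, not the model of [05.110]: the number of stages, the ordering of events within a period (diagnosis first, then progression among those still undiagnosed) and all numerical values are assumptions made for illustration.

```python
def expected_diagnoses(incidence, progress, diag):
    """Expected HIV and AIDS diagnoses per period in a staged model.

    incidence[t] : expected new infections in period t, entering stage 0
    progress[k]  : per-period probability of moving from stage k to k + 1;
                   leaving the last stage means AIDS with no prior HIV diagnosis
    diag[t][k]   : per-period probability of HIV diagnosis in stage k at time t
    """
    n_stages = len(progress)
    undiagnosed = [0.0] * n_stages
    hiv_dx, aids_dx = [], []
    for t, new_infections in enumerate(incidence):
        undiagnosed[0] += new_infections
        # Assumed ordering: diagnosis acts first, then progression.
        diagnosed_now = [undiagnosed[k] * diag[t][k] for k in range(n_stages)]
        remaining = [undiagnosed[k] - diagnosed_now[k] for k in range(n_stages)]
        moving = [remaining[k] * progress[k] for k in range(n_stages)]
        undiagnosed = [
            remaining[k] - moving[k] + (moving[k - 1] if k > 0 else 0.0)
            for k in range(n_stages)
        ]
        hiv_dx.append(sum(diagnosed_now))
        aids_dx.append(moving[-1])
    return hiv_dx, aids_dx, undiagnosed


# Hypothetical inputs: three CD4 stages, diagnosis rates rising over time.
incidence = [100.0, 120.0, 150.0, 130.0]
progress = [0.20, 0.25, 0.30]
diag = [[0.05 + 0.01 * t, 0.10 + 0.01 * t, 0.20 + 0.01 * t] for t in range(4)]
hiv_dx, aids_dx, still_undiagnosed = expected_diagnoses(incidence, progress, diag)
print(hiv_dx, aids_dx, sum(still_undiagnosed))
```

In estimation, the incidence and diagnosis probabilities would be the unknowns, fitted to observed HIV and AIDS diagnoses. The sketch also makes the identifiability problem visible: many combinations of incidence and diagnosis rates give similar diagnosis counts unless data on the stage at diagnosis constrain them.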
HIV prevalence estimation:
Estimation of HIV prevalence remains
essential even after the introduction of HAART.
Traditionally, prevalence estimates have been obtained
as a cross-sectional summary of prevalences in various
groups at high risk for HIV. The "Direct" method (see
for example, Petruckevitch et al 1997;
[06.073]) combines
information on the size of risk group derived from
population-based surveys of HIV-related risk or exposure
behaviours, with HIV prevalence estimates from anonymous
surveys.
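The arithmetic of the "Direct" method can be sketched as a weighted sum: the number infected in each risk group is the group size multiplied by the group prevalence, and the totals are added across groups. The group names, sizes and prevalences below are hypothetical and serve only to illustrate the calculation.

```python
def direct_method_prevalence(groups):
    """Combine group sizes and group prevalences into overall figures.

    groups maps a group name to (size, prevalence), where prevalence is the
    proportion infected in that group. Returns (number infected, overall
    prevalence).
    """
    total_infected = 0.0
    total_population = 0.0
    for size, prevalence in groups.values():
        total_infected += size * prevalence
        total_population += size
    return total_infected, total_infected / total_population


# Hypothetical inputs, not survey results.
example_groups = {
    "risk group A": (50_000, 0.04),
    "risk group B": (200_000, 0.005),
    "remainder": (1_000_000, 0.0002),
}
infected, prevalence = direct_method_prevalence(example_groups)
print(infected, prevalence)
```

The calculation returns point estimates only: it carries no information about how uncertain each size or prevalence is.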
A characteristic of this method is that
it depends on the availability of direct data on
relevant parameters, such as the size of a particular
group at high risk. If direct data are not available,
assumptions and adjustments are used to derive estimates
of these parameters. Because of these adjustments, it is
not possible to quantify the uncertainty in the final
prevalence estimates, so only point estimates are
available.
In this framework, there is also no scope for validation
of results as there is no notion of model or model
fitting. This method has been the accepted method to
estimate HIV prevalence in the UK.
In collaboration with Tony Ades
(Bristol), we have proposed an alternative Bayesian
multi-parameter evidence synthesis (MPES) approach
[06.405]. The
philosophy underlying the approach is that all (both
direct and indirect) available data are used, so that
estimation of each parameter can also be informed by
indirect data and, in fact, multiple sources of evidence
can contribute to such estimation. This clearly uses
information more efficiently, leads to more precise
estimates, is less prone to biases due to selection of
information and, importantly, allows the assessment of
whether the various pieces of information are consistent
with one another. For example, we found that, under the
interpretation that has been routinely assumed in HIV
prevalence estimates obtained through the "Direct
Method", some of the HIV surveillance sources conflicted
with each other. Finally, the Bayesian paradigm provides
naturally the crucial measure of uncertainty. A
substantive characteristic of the project has been the
emphasis on consistency of evidence and the use of
diagnostics for model fit and model choice. These
important issues will be the topic of an MRC funded
workshop in September 2006.
We have assessed the feasibility of our
approach using surveillance data for 2001, census data,
and information from the National Survey of Sexual
Attitudes and Lifestyles and the National Study of HIV
in Pregnancy. The model developed includes thirteen
distinct risk groups, and estimates the size of each
risk group, the proportion of infected in each risk
group and the proportion of infected diagnosed in three
regions (Inner London, Outer London, and the rest of
England and Wales). The MPES has been used to derive the
official HIV prevalence estimates for 2004
[05.405].
A promising development of the above
work is to use it to obtain HIV incidence estimates. As
a result of the work on prevalence, we derive estimates
for each risk group of the number of people in the
compartment "infected" with HIV, "infected not yet
diagnosed" and "infected and diagnosed" at any given
point in time. These estimates, combined with an
appropriate model for transition between compartments
(and groups), can be used to obtain estimates of
transition rates, in particular of the rate between
"uninfected" and "infected" i.e. the incidence of
infection. Feasibility of this approach has been
demonstrated and further developments are the topic of
Anne Presanis's PhD
project.
HCV modelling
Worldwide, infection with the Hepatitis
C Virus (HCV) is a major cause of chronic liver disease
including liver cancer. In England, HCV has been
identified as a priority in the Chief Medical Officer's
strategy for control of infectious diseases with the aim
of improving prevention, diagnosis and treatment. Our
involvement started with participation in the Department
of Health (DH) strategy committee (Department of Health
Strategy group for Hepatitis) whose work defined the HCV
Action Plan for England. The research we are now
conducting is intended to provide quantitative support for
this plan. However, information on HCV spread is very
limited, surveillance systems are still not very
developed and little is known about progression of
infection with HCV. Once again, the role of our work is
to identify directions for the development of
surveillance systems. So far, we have concentrated on
two areas: estimation of HCV prevalence and prediction
of future burden in the general population.
Estimation of HCV prevalence:
There is no agreement on the prevalence
of HCV in the general population and the topic has
recently been the subject of public debate. Data on HCV
prevalence come from testing residual sera collected
within unlinked anonymous programmes in key groups such
as pregnant women and attenders at genitourinary
medicine (GUM) clinics as well as from routine
diagnostic hospital testing. As with the "Direct" method
for HIV, estimates of the number of HCV infected
individuals could be derived by combining the proportion
of infected individuals from each group with its size.
The resulting estimates of HCV prevalence are however
difficult to interpret, since they are biased estimates
of HCV prevalence in the general population, as a result
of these key groups being mixtures of sub-groups with
different risk for HCV. Only by including information on
these mixtures, which might come from several data
sources, is it possible to provide a reliable estimate
of HCV prevalence. In collaboration with Matthew Hickman
(Bristol), Tony Ades (Bristol) and epidemiologists at
the HPA, we have developed an epidemiological model of
the population of England and Wales aged 15 to 59 years
and subdivided by gender, region and risk group (current
injecting drug users (IDUs), ex-IDUs, non-IDUs). Our
approach is again Bayesian and our goal is to include
all available information on relevant parameters. This
modelling exercise has revealed inconsistency of
information and identified the current lack of
information on both the HCV prevalence in ex-IDUs and
the size of this group.
Prediction of HCV burden
Application of an age-specific version
of the Bayesian back-calculation approach developed for
HIV has provided estimates of the current and future
burden of chronic HCV by disease stage. We have used
data on hepatocellular carcinoma (HCC) due to HCV over
time, estimates of HCV progression through disease
stages and information on the number of hospital
admissions due to end-stage liver disease and
hepatocellular carcinoma, to reconstruct the underlying
incidence of HCV. The resulting incidence has then been
used to predict the number of chronically infected
individuals by disease stage. Here, the inclusion of
additional information, such as hospital admission data,
has served to highlight the need for further research on
some of the key parameters: for example, data on HCC
deaths conflict with information from hospital admission
unless some of the transition probabilities are allowed to
vary by age. An alternative explanation is a bias in the
hospital admission data, on which there is currently no
information. We have presented results in the first
annual report on HCV in England, showing a likely
substantial increase in the burden on healthcare
resources [05.402; 06.118]. Remarkably, trends in the underlying
incidence of HCV mirror those in the incidence of
injecting drug use as recently estimated from data on
overdose mortality [04.032].
Disease progression modelling
Information on disease progression
represents an essential ingredient to the understanding
of the evolution of epidemics. In the HIV field,
estimates of disease progression from HIV infection have
been provided by cohort studies, mostly conducted in the
United States, of individuals typically enrolled after
HIV infection and with unknown date of seroconversion.
Since 1997, the CASCADE collaboration has pooled 22
cohort studies from several European countries providing
what is currently the largest cohort of HIV infected
individuals with well estimated seroconversion dates. As
a statistician member, I have been involved in a number
of projects, and, in particular, in the parametric and
semi-parametric modelling of the incubation time to AIDS
[06.031]. More
importantly, data from CASCADE have offered the
opportunity to estimate age (at seroconversion) specific
CD4-based staged models of HIV progression, prior to and
after the advent of HAART. The resulting age-specific
transition rates represent a fundamental input to
further development of the models in the section on HIV
modelling.
Essential to projection of future HCV
burden and clinical management of HCV infected
individuals is the research conducted to estimate HCV
progression. This is typically estimated using data on
disease severity, established through a scoring system
for fibrosis, from patients who have undergone liver
biopsy. Progression is estimated at the patient level by
dividing the change in fibrosis score at two consecutive
observations by the time elapsed between them. There are
two problems with the common approaches: firstly, the
estimation method makes the assumption that the patient
enters the fibrosis stage precisely at the time of
observation and that progression is constant thereafter;
and secondly, the dependence of the recruitment on the
underlying disease process produces biased estimates of
progression. In collaborative work
[06.117]
we have addressed these two problems by adopting a three
stage progressive Markov model to describe fibrosis
progression from mild HCV disease to a cirrhotic state.
Models of this type are best suited to estimate
transition rates between disease stages on the basis of
interval censored observations. We have analysed data on
the results of biopsies from three different cohorts
characterised by different recruitment policies, and
estimates of progression vary substantially according to
the method of recruitment. The probability of developing
cirrhosis after 20 years from infection, for a group of
patients of a specific profile, was estimated to be
6% (95% CI 3%-13%) using data from the HCV National
Register (Harris et al, 2000), 12% (95% CI 6%-22%) using
data from the Trent Study (Mohsen, 2001), a hospital
based cohort, and 23% (95% CI 14%-37%) using data from a
tertiary referral centre for liver disease. Importantly,
the HCV National Register, run at the Health Protection
Agency and based on a "lookback" exercise of
individuals infected through transfusion, recruits
patients independently of their disease severity. We
have used estimates from this cohort to predict future
HCV burden while those derived from the hospital based
cohorts have been used as input in cost-effectiveness
analyses of treatment of HCV at a mild stage
[06.046].
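The contrast between the simple rate calculation and a staged Markov model can be sketched as follows. The code assumes a continuous-time, three-stage progressive model (mild, moderate, cirrhosis) with constant transition rates, for which the stage probabilities have a closed form. It is a minimal sketch, not the fitted model of [06.117]: the function names and the rates are hypothetical, and the example does not reproduce the estimates quoted above.

```python
import math


def naive_progression_rate(score_first, score_second, years_between):
    """Change in fibrosis score per year between two biopsies."""
    return (score_second - score_first) / years_between


def stage_probabilities(lam1, lam2, t):
    """Probabilities of being in stages 1, 2 and 3 at time t.

    Assumes a progressive continuous-time Markov model starting in stage 1,
    with rate lam1 from stage 1 to 2 and rate lam2 from stage 2 to 3.
    """
    p1 = math.exp(-lam1 * t)
    if math.isclose(lam1, lam2):
        p2 = lam1 * t * math.exp(-lam1 * t)
    else:
        p2 = lam1 / (lam2 - lam1) * (math.exp(-lam1 * t) - math.exp(-lam2 * t))
    return p1, p2, 1.0 - p1 - p2


# Hypothetical rates per year, chosen only to illustrate the calculation.
p_mild, p_moderate, p_cirrhosis = stage_probabilities(0.05, 0.04, 20.0)
print(round(p_cirrhosis, 3))
```

Because the model is defined in terms of transition rates rather than observed stage-entry times, it does not require the patient to have entered a stage at the moment of biopsy, which is what makes it suitable for interval censored data.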
Bio-informatics
My involvement in bio-informatics has
focused on the modelling of the process of error
percolation in databases of protein sequences. Proteins
are responsible for the functioning of an organism by
performing specific tasks. Publicly available databases
of protein sequences report this function in the form of
an annotation. In good quality databases, the annotation
is assigned manually, on the basis of experimental
evidence. The genome sequencing project has resulted in
a rapid increase in protein sequence information and the
annotation process has been accelerated through use of
automatic methods based on sequence similarity. The
function of a protein is now more commonly attributed by
copying the annotation from proteins already annotated
that are "homologous", i.e. show, through a similarity
of sequence, a common origin to the protein of interest.
This process is prone to error if, for instance, the
functional annotation of the homologous proteins has
itself been derived from sequence similarity, as no
information is kept on how the annotation of a protein
has been acquired. It is then possible that, through
this copying mechanism, annotation errors can percolate
through the database. In collaboration with Christos
Ouzounis's group at the European Bioinformatics
Institute at Hinxton, we have modelled the
percolation process to investigate the effect of the
progressive misannotation on the quality of the database
[02.029]. Results
have shown a worrying progressive deterioration of the
quality of the database, and we recommended improved
data tracking. We later extended our model to deal with
more complex annotation structures
[05.032].
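The copying mechanism described above can be illustrated with a toy simulation. It is not the model of [02.029] or [05.032]: the growth rule, the uniform choice of the entry copied from, and all four parameters are assumptions made purely for illustration.

```python
import random


def simulate_annotation_errors(n_entries, p_copy, p_copy_error,
                               p_manual_error, seed=0):
    """Toy model of annotation errors spreading through a growing database.

    Returns the fraction of wrongly annotated entries after each addition.
    All probabilities are hypothetical parameters, not estimates.
    """
    rng = random.Random(seed)
    correct = []      # correct[i] is True if entry i has the right function
    n_correct = 0
    error_fraction = []
    for _ in range(n_entries):
        if correct and rng.random() < p_copy:
            # Annotation copied from a randomly chosen existing entry: it is
            # right only if the source was right and the transfer was valid.
            source_ok = rng.choice(correct)
            ok = source_ok and rng.random() >= p_copy_error
        else:
            # Manual annotation based on experimental evidence.
            ok = rng.random() >= p_manual_error
        correct.append(ok)
        n_correct += ok
        error_fraction.append(1.0 - n_correct / len(correct))
    return error_fraction


fractions = simulate_annotation_errors(
    2000, p_copy=0.9, p_copy_error=0.05, p_manual_error=0.01
)
print(round(fractions[99], 3), round(fractions[-1], 3))
```

Because a copied annotation inherits any error in its source and can add one of its own, the error fraction among copied entries can exceed the per-copy error rate. Recording how each annotation was obtained would allow such chains to be traced.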
Reconstructing the IDU epidemic
The spread of injecting drug use is
analogous to the spread of an infectious disease. Its
evolution is strongly related to that of the HCV
epidemic, as sharing infectious needles is the major route
of HCV transmission. In collaboration with Matthew
Hickman (Bristol) we have conducted work to estimate the
characteristics of the IDU epidemic. In a report to the
Home Office [02.402]
we have reviewed methods for estimating prevalence and
incidence. In [01.031]
we have attempted to estimate incidence of opiate/IDU
using data on the number of IDUs in treatment. Finally,
in [04.032], we
exploit information on the age at first injection and
age-specific mortality due to overdose as well as
information on injecting history duration to derive
age-specific estimates of opiate/IDU incidence. Results
are sensitive to assumptions made about key parameters,
such as the distribution of the length of injecting
career, on which there is currently little evidence.
Further insight into opiate related overdose mortality,
derived from the Cohort Studies on Mortality of
Opiate-users workshop run in collaboration with Sheila
Bird in November 2003 [05.005],
could also help to refine these estimates.
Summary of major achievements
- development of Bayesian approaches for disease incidence and prevalence estimation with application to HIV;
- estimation and prediction of HCV burden by disease stage using Bayesian multistage models;
- detailed statistical analysis of data from observational cohorts leading to a greater understanding of HCV progression.
Publications from this programme
Publications - 2001
[01.016] CASCADE Collaboration (participant: De Angelis D). Is the time from HIV seroconversion a determinant of the risk of AIDS after adjustment for updated CD4 cell counts? Journal of Acquired Immune Deficiency Syndrome 2001; 28: 158-165.
[01.031] Hickman M, Seaman SR, De Angelis D. Estimating the relative incidence of heroin use: application of a method for adjusting observed reports of first visits to specialized drug treatment agencies. American Journal of Epidemiology 2001; 153: 632-641.
[01.069] Nicoll A, Hughes G, Donnelly M, Livingstone S, De Angelis D, Fenton K, Evans B, Gill ON, Catchpole M. Assessing the impact of national anti-HIV sexual health campaigns: trends in the transmission of HIV and other sexually transmitted infections in England. Sexually Transmitted Infections 2001; 77: 242-247.
Publications - 2002
[02.014] CASCADE Collaboration (participant: De Angelis D). Changes over calendar time in the risk of specific first AIDS-defining events following HIV seroconversion, adjusting for competing risks. International Journal of Epidemiology 2002; 31: 951-958.
[02.029] Gilks WR, Audit B, De Angelis D, Tsoka S, Ouzounis CA. Modeling the percolation of annotation errors in a database of protein sequences. Bioinformatics 2002; 18: 1641-1649.
[02.061] McHenry A, Evans BG, Sinka K, Shaheem Z, Macdonald N, De Angelis D. Numbers of adults with diagnosed HIV infection 1996-2005 - adjusted totals and extrapolations for England, Wales and Northern Ireland. Communicable Disease and Public Health 2002; 5: 97-100.
Publications - 2003
[03.017] CASCADE Collaboration, De Angelis D. Impact of tuberculosis on HIV disease progression in persons with well-documented time of HIV seroconversion. Journal of Acquired Immune Deficiency Syndrome 2003; 33: 184-190.
Publications - 2004
[04.021] CASCADE Collaboration (participant: De Angelis D). Short-term risk of AIDS according to current CD4 cell count and viral load in antiretroviral drug-naive individuals and those treated in the monotherapy era. AIDS 2004; 18: 51-58.
[04.022] CASCADE Collaboration (participant: De Angelis D). Systemic non-Hodgkin lymphoma in individuals with known dates of HIV seroconversion: incidence and predictors. AIDS 2004; 18: 673-681.
[04.032] De Angelis D, Hickman M, Yang S. Estimating long-term trends in the incidence and prevalence of opiate use/injecting drug use and the number of former users: back-calculation methods and opiate overdose deaths. American Journal of Epidemiology 2004; 160: 994-1004.
[04.044] Gilks WR, Audit B, De Angelis D, Tsoka S, Ouzounis CA. Percolation of annotation errors through hierarchically structured protein sequence databases. Mathematical Biosciences 2004.
[04.089] PLATO Collaboration, De Angelis D. Predictors of trend in CD4-positive T-cell count and mortality among HIV-1-infected individuals with virological failure to all three antiretroviral-drug classes. Lancet 2004; 364: 51-64.
[04.106] Sweeting MJ, Sutton AJ, Lambert PC. What to add to nothing? Use and avoidance of continuity corrections in meta-analysis of sparse data. Statistics in Medicine 2004; 23: 1351-1375.
Publications - 2005
[05.004] Anderson HR, Atkinson RW, Peacock JL, Sweeting MJ, Marston L. Publication bias in studies of the short-term associations between ambient particulate matter and health effects. Epidemiology 2005; 16: 155-163.
[05.005] Bargagli AM, Hickman M, Davoli M, Perucci C, Schifano P, Buster M, Brugal T, Vicente J. Drug-related mortality and its impact on adult mortality in eight European countries, DA/SB participants, for the COSMO European Group 6. European Journal of Public Health 2005; published online.
[05.032] Gilks WR, Audit B, De Angelis D, Tsoka S, Ouzounis CA. Percolation of annotation errors through hierarchically structured protein sequence databases. Mathematical Biosciences 2005; 193: 223-234.
[05.110] Sweeting MJ, De Angelis D, Aalen OO. Bayesian back-calculation using a multi-state model with application to HIV. Statistics in Medicine 2005; 24: 3991-4007.
Publications - 2006
[06.022] Bongartz T, Sutton AJ, Sweeting MJ, Buchan I, Matteson EL, Montori V. Anti-TNF Antibody Therapy in Rheumatoid Arthritis and the risk of serious infections and malignancies: a systematic review and metaanalysis of rare harmful effects in randomized controlled trials. Journal of the American Medical Association 2006; under revision.
[06.031] De Angelis D, Presanis A, Yang S, Walker S. Parametric models for the distribution of the incubation time between HIV infection and AIDS. Journal of the Royal Statistical Society 2006; submitted.
[06.046] Grieve R, Roberts J, Wright M, Sweeting MJ, De Angelis D, Rosenberg W, Bassendine M, Main J, Thomas H. Cost-effectiveness of interferon alpha or peginterferon alpha with ribavirin for histologically mild chronic hepatitis C. Gut 2006; in press.
[06.073] McGarrigle CA, Cliffe S, Copas AJ, Mercer CH, De Angelis D, Fenton KA, Evans BG, Johnson AM, Gill ON. Estimating adult HIV prevalence in the UK in 2003. The Direct method of estimation. Sexually Transmitted Infections 2006; 82: 78-86.
[06.117] Sweeting MJ, De Angelis D, Neal KR, Ramsay ME, Wright M, Brant L, Harris HE, the Trent HCV Group and the HCV National Register Steering Group. Estimated progression rates in three United Kingdom hepatitis C cohorts differed according to method of recruitment. Journal of Clinical Epidemiology 2006; 59: 144-152.
[06.118] Sweeting MJ, De Angelis D, Ramsay ME, Brant L, Harris HE. The burden of hepatitis C in England and Wales. British Medical Journal 2006; submitted.