Data sources and information governance
Our study population is drawn from the National Bridges to Health segmentation dataset, developed and maintained since 2019 to support healthcare prioritization, planning and service evaluation in the NHS in England27,41,42,43. This dataset includes individuals registered with a general practice (GP) in England since 2014 and comprises 60,004,883 individuals. The segmentation dataset is derived from over 15 years of longitudinal data from numerous national, primarily secondary care, patient-level datasets in the National Commissioning Data Repository (NCDR)42, each linked by a pseudonymized NHS number.
Data are collected and used in line with NHS England's purposes and in accordance with the statutory obligations outlined in the NHS Act 2006 and Section 254 of the Health and Social Care Act 2012. Data are processed using best practice methodology, underpinned by an NHS data processing agreement. England and Outcomes Based Healthcare (OBH) produce segmentation datasets on behalf of NHS England. This ensures controlled access by appropriate individuals to consent-free anonymized/pseudonymized data held in a secure data environment within NHS England's infrastructure. Data are processed only for specific purposes, such as operational functions, service evaluation and service improvement. Ethics committee approval was not required because the current study supports these purposes. Where OBH processes data, this is agreed and detailed in the data processing agreement.
The analysis is based on 46,748,714 adults aged 20 years and older who were alive on 31 March 2019. To avoid distortions due to the COVID-19 pandemic, we limited the data to the NHS financial year ending 2020 (that is, 1 April 2019 to 31 March 2020). The dataset includes socio-demographic data (for example, age, sex (not gender), ethnicity and socio-economic deprivation), geographic data (for example, registered general practice, and mapped administrative NHS organizations and locations) and clinical diagnostic data, which are derived primarily from coded hospital records. Our analysis considered 35 long-term conditions; the selection process has been described previously44 and was based on a recent Delphi study with good agreement26. Including more than these 35 conditions would generate additional MLTC phenotypes with very low prevalence and low priority for public health intervention. In addition, the computational intensity of modeling the years of life lived with and lost to combinations of conditions required a priori prioritization of conditions. The 35 conditions are derived using data definitions based on logic and clinical codes (for example, International Classification of Diseases (ICD)-10 diagnosis codes, Office of Population Censuses and Surveys (OPCS) procedure codes and SNOMED CT codes), each of which has undergone extensive clinical review and evaluation24 (Supplementary Table 2).
The complete list of source datasets used to derive the segmentation dataset, including the time period over which the data were accrued, is provided in Supplementary Table 1. SNOMED codes for the National Diabetes Audit and the other condition definitions are available in our online technical documentation24,45. Previous validation studies showed good agreement for most conditions with established prevalence benchmarks, including the pay-for-performance scheme for general practice in England, the Quality and Outcomes Framework24.
Statistical analysis
We calculated the point prevalence of all pairwise combinations of diabetes and each other condition using the March 2020 adult population as the denominator. We also calculated the observed prevalence minus the expected prevalence. Here, the observed prevalence is the actual prevalence of diabetes combined with each condition, and the expected prevalence is the product of the prevalence of diabetes in the general population and the prevalence of each condition, irrespective of diabetes status. The expected prevalence is therefore the prevalence of each combination that would be expected by chance, with no etiological association between the two conditions. We also calculated the number of comorbidities according to age and diabetes status.
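The observed-minus-expected comparison can be illustrated with a minimal sketch. The function name, argument names and the counts in the usage note are hypothetical and are not taken from the study; the sketch only assumes that the expected prevalence is the product of the two marginal prevalences, as described above.

```python
def observed_minus_expected(n_both, n_diabetes, n_condition, n_population):
    """Return (observed, expected, difference) prevalence for one condition pair.

    All arguments are hypothetical person counts over the same denominator.
    """
    # Observed prevalence: people with both diabetes and the condition.
    observed = n_both / n_population
    # Expected prevalence under independence: product of marginal prevalences.
    expected = (n_diabetes / n_population) * (n_condition / n_population)
    return observed, expected, observed - expected
```

With illustrative counts of 50 people with both conditions, 100 with diabetes and 200 with the other condition in a population of 1,000, the observed prevalence is 0.05 and the expected prevalence is 0.1 × 0.2 = 0.02, so the excess is 0.03.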
To estimate the years of life lived with and lost to diabetes-related MLTC combinations, we constructed a standard three-state illness-death Markov model46,47. The illness-death model (also known as the semi-competing risks model) is widely used to model time-to-event data and consists of three possible states: health, disease and death. The model allows three possible transitions: health to disease, health to death, and disease to death. Remission (disease to health) is not permitted. The disease state is defined as the presence of the MLTC condition pair of interest, irrespective of the presence or absence of other conditions. Annual transition probabilities between states are assumed to be age-dependent and are estimated from monthly observations of the health state (long-term condition status and mortality status) of all individuals in the dataset from April 2019 to March 2020. Specifically, the number of occurrences \({n}_{i,j,a}\) of an individual of age \(a\) moving from state \(i\) to state \(j\) is aggregated and normalized, so that the probability \({P}_{i,j,a}\) of an individual of age \(a\) moving from state \(i\) to state \(j\) is given by \({P}_{i,j,a}=\frac{{n}_{i,j,a}}{\sum _{k\in S}{n}_{i,k,a}}\), where \(S\) is the set of possible final states. If no transition data are available at a particular age, individuals are assumed to remain in the same state as age increases by one year. Because transitions are measured monthly and the model requires annual probabilities, the monthly transition matrix \({T}_{m}\) is constructed first and then raised to the power of 12 by matrix multiplication to give the annual transition matrix \({T}_{y}\), according to the following equation:
$${{{T}}}_{{{y}}}={{{{T}}}_{{{m}}}}^{12}$$
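The normalization of transition counts and the conversion from monthly to annual probabilities can be sketched as follows for a single age. This is an illustrative sketch rather than the study's code: the function names, the state ordering (health, disease, death) and the example counts are assumptions, and the fallback of keeping individuals in the same state is applied here to any state with no observed transitions.

```python
import numpy as np


def monthly_transition_matrix(counts):
    """Normalize a square matrix of transition counts n[i, j] into probabilities.

    Rows are origin states and columns are destination states. A row with no
    observations falls back to "remain in the same state" (identity row).
    """
    counts = np.asarray(counts, dtype=float)
    row_totals = counts.sum(axis=1, keepdims=True)
    # Avoid division by zero for rows without any observed transitions.
    safe_totals = np.where(row_totals > 0, row_totals, 1.0)
    probabilities = counts / safe_totals
    identity = np.eye(counts.shape[0])
    return np.where(row_totals > 0, probabilities, identity)


def annual_transition_matrix(t_month):
    """Raise the monthly matrix to the power of 12 by matrix multiplication."""
    return np.linalg.matrix_power(t_month, 12)


# Hypothetical monthly counts for one age; states: health, disease, death.
example_counts = [[9900, 80, 20],
                  [0, 970, 30],
                  [0, 0, 0]]
t_m = monthly_transition_matrix(example_counts)
t_y = annual_transition_matrix(t_m)
```

Because remission is not permitted, the health-to-health entry of the annual matrix in this example is simply the monthly value raised to the power of 12, and death remains an absorbing state.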
For some combinations of conditions, prevalence is low and there may be insufficient data to perform the calculations. To provide a sufficient distribution of ages entering and exiting the disease state, analyses were restricted to combinations of conditions for which at least 1,000 observations were recorded for each transition type in the model. The model was limited to ages 0 to 100 years. Extending beyond this age has a negligible effect on the model's output because the great majority of people have died by this age.
Of the 35 long-term conditions, frailty was excluded from the analysis at the outset because remission is present in its data model, which is incompatible with the form of the Markov model. There were insufficient transition observations for sickle cell disease, cystic fibrosis, autism, sarcoidosis and multiple sclerosis (as comorbidity pairs with diabetes) for these to be included in the analysis.
The model was used to calculate five key metrics: lifetime risk of MLTC, median age of onset, years lived with (YLW) MLTC, age at death and years of life lost (YLL) associated with MLTC. Lifetime risk, \({L}_{r}\), is the probability that an individual at birth will develop the disease state at some point in their lifetime. It can be calculated by considering the proportion of the initial population that transitions from the healthy state to the disease state at a given age \(a\), \({P}_{{\rm{Health}}\to {\rm{Disease}}}\left(a\right)\). This proportion is obtained by multiplying the proportion of the population in the healthy state at age \(a\), \({P}_{{\rm{Health}}}\left(a\right)\), by \({P}_{i\,=\,{\rm{Health}},\,j\,=\,{\rm{Disease}},\,a}\), the probability that a healthy person will transition to the disease state immediately after age \(a\):
$${P}_{{\rm{Health}}\to {\rm{Disease}}}(a)={P}_{{\rm{Health}}}(a)\times {P}_{i\,=\,{\rm{Health}},\,j\,=\,{\rm{Disease}},\,a}$$
Summing this over all ages in the model yields the total probability of transitioning to the disease state over a lifetime, that is, the lifetime risk \({L}_{r}\):
$${L}_{r}=\mathop{\sum }\limits_{a\,=\,0}^{100}{P}_{{\rm{Health}}\to {\rm{Disease}}}(a)$$
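The lifetime risk calculation can be illustrated with a short sketch. It assumes, for illustration only, that the annual age-specific probabilities of moving from health to disease and from health to death are available as arrays indexed by age, and that everyone starts in the healthy state at age 0; the function and variable names are hypothetical.

```python
import numpy as np


def onset_distribution(p_health_to_disease, p_health_to_death):
    """Return (P_Health(a), P_Health->Disease(a)) for each age a.

    Inputs are hypothetical annual probabilities indexed by age 0..N-1.
    """
    p_hd = np.asarray(p_health_to_disease, dtype=float)
    p_hx = np.asarray(p_health_to_death, dtype=float)
    # Probability of staying healthy through each age.
    stay_healthy = 1.0 - p_hd - p_hx
    # Proportion of the birth cohort still healthy at the start of each age.
    p_health = np.concatenate(([1.0], np.cumprod(stay_healthy)[:-1]))
    # Proportion of the birth cohort entering the disease state at each age.
    onset = p_health * p_hd
    return p_health, onset


def lifetime_risk(onset):
    """Sum the onset proportions over all ages in the model."""
    return float(np.sum(onset))
```

With a constant annual onset probability of 0.01 and a constant annual probability of death from health of 0.01 over ages 0 to 100, the sum is a geometric series and the lifetime risk is 0.5 × (1 − 0.98^101).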
Years of life lost at a given age \(a\), \({Y}_{{\mathrm{LL}}}\left(a\right)\), is characterized by the difference between the survival function of a person in the disease state and the survival function from a two-state alive-dead Markov model of otherwise identical form. The average years of life lost, \({Y}_{{\mathrm{LL}}}\), experienced by those who enter the disease state of interest is calculated by summing \({Y}_{{\mathrm{LL}}}\left(a\right)\) over all ages, weighted by the proportion of people entering the disease state at that age:
$${Y}_{{\mathrm{LL}}}=\mathop{\sum }\limits_{a=0}^{100}\left(\,\frac{{P}_{{\rm{Health}}\to {\rm{Disease}}}(a)}{{L}_{r}}\times {Y}_{{\mathrm{LL}}}\left(a\right)\,\right)$$
Years lived with the disease at a given age \(a\), \({Y}_{{\mathrm{LW}}}\left(a\right)\), is characterized by the survival function of the population that enters the disease state at that age. Again, an average measure, \({Y}_{{\mathrm{LW}}}\), is calculated using a weighted sum:
$${Y}_{{\mathrm{LW}}}=\mathop{\sum }\limits_{a=0}^{100}\left(\,\frac{{P}_{{\rm{Health}}\to {\rm{Disease}}}(a)}{{L}_{r}}\times {Y}_{{\mathrm{LW}}}\left(a\right)\,\right)$$
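The two weighted sums can be sketched as below. The way the age-specific values are obtained here is an assumption made for illustration: expected remaining years are taken as the sum of annual survival probabilities up to the model's age limit, and the age-specific years of life lost are taken as the difference between the reference (alive-dead) and disease-state values. The function names and toy inputs are hypothetical.

```python
import numpy as np


def expected_remaining_years(p_death):
    """Expected remaining years from each age, given annual death probabilities.

    Assumption: remaining years are approximated by summing the survival
    function year by year up to the last age in the model.
    """
    q = np.asarray(p_death, dtype=float)
    years = np.empty_like(q)
    for age in range(q.size):
        survival = np.cumprod(1.0 - q[age:])
        years[age] = survival.sum()
    return years


def onset_weighted_mean(onset, years_by_age):
    """Average an age-specific measure, weighted by onset proportion / L_r."""
    onset = np.asarray(onset, dtype=float)
    weights = onset / onset.sum()  # onset.sum() is the lifetime risk L_r
    return float(np.sum(weights * np.asarray(years_by_age, dtype=float)))


# Toy three-age example with hypothetical annual death probabilities.
ylw_by_age = expected_remaining_years([0.5, 0.5, 0.5])      # disease state
reference_by_age = expected_remaining_years([0.0, 0.0, 0.0])  # alive-dead model
yll_by_age = reference_by_age - ylw_by_age
toy_onset = [0.1, 0.1, 0.0]
y_lw = onset_weighted_mean(toy_onset, ylw_by_age)
y_ll = onset_weighted_mean(toy_onset, yll_by_age)
```

In this toy example, half of all onsets occur at age 0 and half at age 1, so each average is the mean of the first two age-specific values.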
The median age of onset of the disease state is extracted from the model by interpolating the age at which half of all individuals who transition to the disease state have done so. \({a}_{{\mathrm{median}}}\) is the integer value of \(\alpha\) that minimizes \(\left|\frac{{\sum }_{a=\alpha }^{100}{P}_{{\rm{Health}}\to {\rm{Disease}}}\left(a\right)}{{L}_{r}}-0.5\right|\).
These individual-level metrics require that an individual develops the 'disease' at some point in their life. For the population as a whole, community metrics are defined as the total years of life lost (or years lived with the disease) across 1,000 individuals, not all of whom develop the 'disease'. The community metrics can be calculated by multiplying the average metrics above by the lifetime risk of the condition and scaling to 1,000 people.
$${C}_{{\mathrm{YLL}}}=1{,}000\times {Y}_{{\mathrm{LL}}}\times {L}_{r}$$
$${C}_{{\mathrm{YLW}}}=1{,}000\times {Y}_{{\mathrm{LW}}}\times {L}_{r}$$
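As a worked illustration of the scaling, the sketch below applies the same formula to either average metric. The function name and the numbers in the usage note are hypothetical and are not results from the study.

```python
def community_metric(average_years, lifetime_risk, cohort_size=1000):
    """Scale an individual-level average (Y_LL or Y_LW) to a cohort.

    Multiplies the average years among those who develop the disease state
    by the lifetime risk and by the cohort size (1,000 people by default).
    """
    return cohort_size * average_years * lifetime_risk
```

For instance, an average of 5 years of life lost among those who develop the disease state, combined with a lifetime risk of 0.2, corresponds to 1,000 × 5 × 0.2 = 1,000 years of life lost per 1,000 people.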
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.