Artificial intelligence predicts disease risks more than a decade ahead

A new model based on artificial intelligence (AI) can estimate the long-term individual risk for more than 1,000 diseases, according to researchers. It is a generative pre-trained transformer (GPT) and thus resembles the large language models behind ChatGPT. The research team named it Delphi-2M. They trained the model on 400,000 patient records from a large British database (UK Biobank) and were able to apply it to nearly two million Danish patient records with only a slight loss of accuracy.

The study, conducted by a group led by Moritz Gerstung of the German Cancer Research Center (DKFZ) and Ewan Birney and Tom Fitzgerald of the European Molecular Biology Laboratory (EMBL) in Hinxton, UK, was published in the journal Nature. "Our AI model is a proof of concept, demonstrating that it is possible to identify many long-term health patterns and use this information to generate meaningful predictions," Birney is quoted as saying in a DKFZ press release.

The model works at the level of individual patients. In principle, it is therefore possible to reconstruct individual medical histories and derive prognoses for how disease risks and disease progression will develop. At the same time, the model can predict the health development of larger population groups and thus provide clues as to how healthcare can be improved.

Model was tested on 1.93 million Danish patient records

"Just as large language models can learn the grammar of our language from the sequence of words in texts, this AI model learns the logic of the temporal sequence of events in health data to model entire medical histories," Gerstung explained, according to the DKFZ press release. The learned patterns enable the AI model to calculate disease risks both for the current point in time and for more than a decade into the future. In addition to disease diagnoses coded according to the International Classification of Diseases (ICD-10), other characteristics such as age, gender, body mass index, smoking habits, and alcohol consumption are included in the probability calculation.
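The analogy to language models can be made concrete with a deliberately tiny sketch. It is not the Delphi-2M architecture, which is a transformer; as a stand-in, it only counts which diagnosis follows which in a handful of invented histories and turns those counts into next-event probabilities. The histories, the ICD-10-style codes, and the function names are all made up for illustration.

```python
from collections import Counter, defaultdict

# Invented toy histories: each is a time-ordered list of ICD-10-style codes.
# These are illustrative only and carry no medical meaning.
HISTORIES = [
    ["E11", "I10", "I21"],
    ["E11", "I10", "N18"],
    ["I10", "I21"],
    ["E11", "K85", "C25"],
]

def fit_transitions(histories):
    """Count how often each code is directly followed by each other code."""
    counts = defaultdict(Counter)
    for history in histories:
        for current, following in zip(history, history[1:]):
            counts[current][following] += 1
    return counts

def next_event_probabilities(counts, code):
    """Turn the follow-up counts for one code into relative frequencies."""
    total = sum(counts[code].values())
    if total == 0:
        return {}  # code was never followed by anything in the toy data
    return {nxt: n / total for nxt, n in counts[code].items()}

TRANSITIONS = fit_transitions(HISTORIES)
print(next_event_probabilities(TRANSITIONS, "I10"))
```

A real model of this kind would condition on the whole history, on the time between events, and on factors such as age or smoking, rather than on the most recent diagnosis alone.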

Delphi-2M was trained on the 400,000 records from the UK Biobank and then tested on another 100,000 records from the same database. The researchers then applied the model, without prior adjustment, to 1.93 million records from the Danish National Patient Register covering the years 1978 to 2018. They were able to show that the predicted diseases did indeed occur about as often as the model's probabilities implied.
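Checking whether predicted probabilities match the frequencies actually observed is commonly called calibration. The following is a minimal sketch of such a check on synthetic data, not the evaluation used in the study: the function name `calibration_table`, the number of bins, and the simulated predictions and outcomes are all assumptions made for illustration.

```python
import random

def calibration_table(predicted, observed, n_bins=5):
    """Group predictions into equal-width probability bins and compare
    the mean predicted risk with the observed event rate in each bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(predicted, observed):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((p, y))
    rows = []
    for members in bins:
        if not members:
            continue  # skip bins that received no predictions
        mean_pred = sum(p for p, _ in members) / len(members)
        event_rate = sum(y for _, y in members) / len(members)
        rows.append((mean_pred, event_rate, len(members)))
    return rows

# Synthetic example: outcomes are drawn so that an event occurs with exactly
# the predicted probability, i.e. the "model" is calibrated by construction.
rng = random.Random(0)
predicted = [rng.random() for _ in range(20000)]
observed = [1 if rng.random() < p else 0 for p in predicted]

for mean_pred, event_rate, n in calibration_table(predicted, observed):
    print(f"predicted {mean_pred:.2f}  observed {event_rate:.2f}  n={n}")
```

For a well-calibrated model the two columns stay close in every bin; a systematic gap would indicate over- or underestimation of risk in that probability range.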

"The fact that Delphi-2M can be applied to Danish population data with slightly reduced accuracy suggests that many patterns learned by the model accurately reflect the actual development of multiple disease rates," write the study authors.

No certainty, but assessment of potential risks

For Fabian Theis, director of the Institute for Computational Biology at the Helmholtz Center Munich, the transfer to a cohort from another country is a breakthrough. This demonstrates the robustness of the model. "There have been some medical models with good results, but they usually only worked in one hospital and no longer worked in the next," said Theis, who was not involved in the study. Ewan Birney also said during a press conference that the positive result with the Danish data has greatly strengthened the scientists' confidence in their model.

In a graphic, the study authors show how several diseases affecting the pancreas, liver, and bile ducts, as well as diabetes mellitus and digestive disorders, increase the risk of pancreatic cancer 19-fold. According to the researchers, Delphi-2M is particularly suitable for diseases with clear progression patterns, such as certain types of cancer or heart attacks. However, it is less reliable for infections or mental illnesses that depend on unforeseeable life events. "Crucially, this is not a certainty, but an assessment of the potential risks," said Tom Fitzgerald.

Thanks to the large amount of training data, the AI model can detect signs of diseases that are not usually revealed during medical examinations. "By modeling how diseases develop over time, we can investigate when certain risks arise and how early interventions can best be planned," explained Birney. This is a major step toward personalized and more preventative approaches to healthcare. However, there is likely still a long way to go before Delphi-2M or a successor version can be used in everyday clinical practice, due to patient and data protection concerns. Gerstung estimates that it will take five to ten years.

The right not to know should continue to be protected

Should the AI model be used on individual patients, it "should only be a supplementary component and must always be accompanied by medical judgment," said Markus Herrmann of the Institute for Medical and Data Ethics at Heidelberg University. Patients must be informed about the use of the technology and its significance, and the results must be discussed in detail between doctor and patient.

In order not to restrict patients' freedom of choice, medical ethicist Robert Ranisch from the University of Potsdam advocates the option of waiving the right to information: "That's why a right not to know remains crucial." But Ranisch also sees the potential of the AI model when applied to larger population groups: "It can be used in the spirit of fair, proportional prevention to identify gaps in care for disadvantaged groups." For Carsten Marr from the Helmholtz Center Munich, the most exciting thing is finding previously unknown connections between diseases. "There is a study that showed that an Epstein-Barr virus infection leads to a 30-fold increased risk of multiple sclerosis. That's what we're looking for," said Marr.

Treatment could be tested on a digital twin

An important aspect of further developing the AI model will be to consider potential biases. For example, only datasets from patients aged 40 to 70 were used for AI training; other age groups were therefore not represented. Over- or underestimation could also affect groups that differ in origin and social status. "A model that predicts hundreds of diseases at once consolidates opportunities, but also increases the risk of bias," warned Ranisch.

But the study authors are optimistic: "This is the beginning of a new way of understanding human health and the progression of diseases," predicted Gerstung. Fabian Theis envisions that one day there will be a digital twin of a patient, fed by health and lifestyle data. He said: "This would then allow us to see, for example, how the virtual patient reacts to a change in medication without having to test it on a real patient." (dpa/fwt)