
How advances in AI are enabling medical research without sharing data


  • Controversial plans in the UK to collect healthcare data have sparked public outcry and forced the NHS to suspend plans.
  • This incident highlighted how vitally important issues of confidentiality and consent are in medical research.
  • Advances in machine learning that don’t require data sharing are poised to transform healthcare.

The UK’s National Health Service (NHS) recently came under scrutiny when it announced plans to collect and share GP records of over 55 million patients with third parties for research purposes. It has been argued that virtually every sensitive detail of patients’ lives can be found in these records, including accounts of past abortions, marital problems, and substance abuse.

Although officials have said the NHS will pseudonymize the data, patients can be re-identified regardless, prompting GPs to withhold their records and millions of patients to opt out. The plan was due to launch in September, but public anger forced the NHS to suspend the data collection.

The motivation to pool medical data is clear: it saves lives. Artificial intelligence (AI) has the potential to transform our understanding of biology and to surpass humans in diagnosis and selection of treatments. Since AI improves when it receives more data, the NHS dataset could have a significant impact on medical research and decision-making.

But centralizing the medical records of millions of people is incompatible with patient privacy and therefore unethical. Not only would the NHS dataset be vulnerable to hacks and breaches, it could also lead to data misuse by its partners.

Unfortunately, the public debate frames the issue as a choice between protecting patient privacy and improving healthcare. This is a false dichotomy.

Ethical use of data for medical research

New approaches offer the same benefits as data pooling without relying on the sharing of patient records or valuable corporate AI models. This is achieved by carefully separating what each partner sees while still improving prediction results.

It is now possible to design algorithms that reinforce each other's collective analyses without exchanging data. This is accomplished by computing and sharing technical characteristics designed to preserve both patient privacy and the intellectual property of the underlying data and models.

In other words, these algorithms talk to each other without actually sharing sensitive information, and then share the resulting insights with us.
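The approach described above can be sketched in a few lines. In this illustrative example (all names and data are hypothetical, not taken from the article), two hospitals each fit a simple linear model on their own private records and share only the fitted coefficients; a coordinator averages those coefficients into a shared model, in the spirit of federated averaging:

```python
# Illustrative sketch of federated averaging (all names and data are
# hypothetical). Each "hospital" fits y = w*x + b on its own private
# records by gradient descent and shares only the coefficients (w, b).

def train_local(records, lr=0.1, epochs=500):
    """Fit y = w*x + b on one silo's private (x, y) pairs."""
    w, b = 0.0, 0.0
    n = len(records)
    for _ in range(epochs):
        grad_w = sum((w * x + b - y) * x for x, y in records) / n
        grad_b = sum((w * x + b - y) for x, y in records) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def federated_average(silos):
    """Average the locally trained coefficients; no raw records move."""
    models = [train_local(records) for records in silos]
    w = sum(m[0] for m in models) / len(models)
    b = sum(m[1] for m in models) / len(models)
    return w, b

# Two hospitals whose private records follow the same trend y = 2x + 1.
hospital_a = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
hospital_b = [(0.5, 2.0), (1.5, 4.0), (2.5, 6.0)]

w, b = federated_average([hospital_a, hospital_b])
# w ≈ 2, b ≈ 1: the shared trend is recovered without pooling records.
```

Real deployments iterate this exchange over many training rounds and typically add further protections, such as secure aggregation or differential-privacy noise on the shared updates, so that even the coefficients reveal as little as possible.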

So the solution is clear: by sharing information rather than data, citizens can protect their privacy while businesses and researchers advance medicine.


– Peter Peumans, Wilfried Verachtert and Roel Wuyts.

The speed with which effective measures and vaccines were developed in response to the COVID-19 pandemic is the result of a global commitment to share available data openly. The need to share data is only magnified by the growing reliance on AI in medicine, yet potentially groundbreaking information often remains inaccessible to the international research community. This is because data is stored in the individual silos of GPs, insurers, labs, hospitals, and pharmaceutical companies, often in volumes too small for AI to draw meaningful conclusions.

Data sharing is not only controversial; due to privacy laws, it is often not possible. Additionally, data storage and management are expensive, and a large dataset or sophisticated AI model can be extremely valuable. So why would a company or a researcher share them with their competitors?

Lawmakers can overcome these obstacles by implementing a framework that allows parties to generate insights across data silos. This could have a transformative impact on medical research while preserving patient trust and protecting the intellectual property of partners.

The world needs an independent data regulator

We believe the time has come to create an independent, neutral and transparent agency that can act as a data ombudsman: an ethical watchdog that oversees technical standards and connects data silos. As more parties join, each organization gains access to increasingly powerful insights into the most appropriate diagnosis or treatment.

Connecting data silos will dramatically accelerate medical advances. For example, by studying the human genome on an unprecedented scale, scientists can unlock new insights into who is at risk of developing cancer or falling seriously ill from infectious diseases such as COVID-19. Not only does this help determine whom to screen and protect preventively, it can also lead to new drug targets and therapies.

AI will also be crucial to counter wasteful spending. In the United States, for example, the medical costs of cancer are estimated at more than $208 billion per year. This should come as no surprise: immunotherapies often exceed $100,000 per patient, and costs multiply once ancillary services are taken into account. Although immunotherapy can be very effective, it fails in a considerable number of cases and can cause harmful side effects. With enough data, doctors will be able to identify optimal treatments in advance, improving patient outcomes and quality of life while avoiding unnecessary medical interventions and costs.

But more importantly, feeding data into AI models improves their ability to save lives. Patients are nevertheless right to fear that their medical records – which contain sensitive information about their physical, mental and reproductive health – could be hacked, leaked, or misused once centralized or shared. Lawmakers need to understand that we will only reap the full health benefits of AI when we connect data without harming privacy or business interests. Until then, data-sharing initiatives will remain controversial.

