Neoantigen identification: a primer

Inevitably, it's complicated

Mar 21, 2025

Photo by National Cancer Institute on Unsplash

Based on some consulting work I recently completed and some talks I gave.

Introduction

The key to treating cancer is neoantigens: novel, malformed proteins generated by tumour mutations. Because they are absent from normal cells, neoantigens present a specific and promising target for cancer therapies. Make something that hits a neoantigen and you can direct therapies precisely at the cancer and avoid the collateral damage of off-target effects. That promise shows up in several areas:

Biomarkers: being able to classify patients based on their neoantigens and therefore select effective therapies.
Cancer vaccines: using the neoantigen to train the immune system to recognise and attack the tumour. This includes the promise of individualised vaccines, created by analysis of each patient’s own tumour.
Enhancing other neoantigen-targeting therapies: like T cell therapy or checkpoint inhibitors.

Despite this promise, identifying effective neoantigen targets remains difficult. The overwhelming majority of putative “identifications” are false positives. This is a huge problem.

Challenges

The single throughline in finding actionable neoantigens is that biology is absurdly complicated. There’s a long chain of causality from genomic mutations through protein expression to an immune response, and attempts to solve any one part of the system can easily be thwarted by issues elsewhere. As I am fond of quoting:

Immunology is where intuition goes to die
(Ed Yong)

So let’s walk down that problematic chain:

Sequencing

How do you find a neoantigen in a tumour sample? You sequence it and look for divergent coding regions that could give rise to an abnormal protein. However, the usual sequencing methods can miss or misinterpret putative neoantigens, especially those in complex or repetitive DNA regions (GC-rich, etc.).
Just because a mutation exists in the genome doesn’t mean it is expressed as a protein on the cell surface.
And, because of tumour heterogeneity, we might be sampling the “wrong part” of a tumour that isn’t immunologically accessible or only represents a subpopulation of the total tumour and is thus not a useful target.
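To make the first step concrete, here is a minimal sketch of how candidate peptides might be enumerated once a single amino acid substitution has been called. It is an illustration rather than anyone’s production pipeline: the function name, the 0-based position convention and the 8 to 11 residue window lengths (typical of MHC class I peptides) are my assumptions, and it ignores indels, frameshifts and fusions entirely.

```python
def mutant_peptides(protein, pos, alt, lengths=(8, 9, 10, 11)):
    """Return every peptide of the given lengths that spans a substituted residue.

    protein: wild-type amino acid sequence
    pos:     0-based index of the mutated residue (assumed convention)
    alt:     the substituted amino acid
    """
    if not 0 <= pos < len(protein):
        raise ValueError("mutation position is outside the protein")
    mutated = protein[:pos] + alt + protein[pos + 1:]
    peptides = set()
    for k in lengths:
        # Only keep window starts for which the window still covers pos.
        first = max(0, pos - k + 1)
        last = min(pos, len(mutated) - k)
        for start in range(first, last + 1):
            peptides.add(mutated[start:start + k])
    return sorted(peptides)


# Toy sequence, not a real protein.
for peptide in mutant_peptides("MKTAYIAKQR", pos=4, alt="C", lengths=(8,)):
    print(peptide)
```

Even a single substitution yields dozens of candidate peptides once all window lengths are included, which is part of why the false-positive problem is so severe.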

Immunology

Just because a protein is expressed doesn’t mean that the immune system will process it usefully. A neoantigen has to be bound by immune system proteins (the MHC) for display on the cell surface. Predicting which mutated peptides will be bound effectively is still inexact.
And, even if it is bound by the MHC, this doesn’t mean it will provoke an immune response. (It’s this step that sinks a lot of neoantigen identification. Binding is measurable and can be predicted computationally, but the link to immunogenicity just takes us into the dark.)
Variation between patients, especially at the immune system / MHC level, can lead to a highly variable response due to variable binding: the same set of tumour mutations might be immunogenic in one patient and not another.
Tumours can, of course, evolve or produce molecules that suppress the immune response locally.
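The patient-variation point can be sketched in a few lines. This assumes you already have predicted binding affinities from some predictor of your choice, supplied here as a plain dictionary. The function name is hypothetical, the affinity values in the usage example are made up for illustration, and the 500 nM cut-off is a commonly used convention rather than a law of nature. Note that passing this filter only says "predicted to bind", which, as above, is not the same as immunogenic.

```python
STRONG_BINDER_NM = 500.0  # widely used IC50 cut-off; an assumption, tune per study


def presented_candidates(affinities_nm, patient_alleles, threshold_nm=STRONG_BINDER_NM):
    """Keep peptides predicted to bind at least one of the patient's HLA alleles.

    affinities_nm:   {(peptide, allele): predicted IC50 in nM}, from any predictor
    patient_alleles: set of HLA allele names typed for this patient
    Returns {peptide: (best_allele, best_ic50)} for peptides under the threshold.
    """
    best = {}
    for (peptide, allele), ic50 in affinities_nm.items():
        if allele not in patient_alleles:
            continue  # this patient cannot present the peptide on this allele
        if peptide not in best or ic50 < best[peptide][1]:
            best[peptide] = (allele, ic50)
    return {p: hit for p, hit in best.items() if hit[1] <= threshold_nm}


# Illustrative, invented predictions for two peptides and two alleles.
predictions = {
    ("KTACIAKQ", "HLA-A*02:01"): 40.0,
    ("KTACIAKQ", "HLA-B*07:02"): 9000.0,
    ("TACIAKQR", "HLA-A*02:01"): 3000.0,
}
print(presented_candidates(predictions, {"HLA-A*02:01"}))
print(presented_candidates(predictions, {"HLA-B*07:02"}))
```

The same mutation list gives a candidate for the first patient and nothing for the second, purely because of their HLA types.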

Validation

Ensuring that a putative neoantigen actually exists, is expressed, and can raise an immune response in the real world is obviously the gold standard. This leads us to:

Directly identifying neoantigenic peptides presented by MHC molecules through mass spec would be ideal. However, many neoantigens are expressed at a low level and are difficult to detect in standard MS workflows, when swamped by so many abundant proteins. Sample prep and even the vagaries of protein processing can mean that it’s difficult to know what fragment of a protein you’re seeing. At the best of times, interpreting mass spec results is difficult and fraught with potential biases. Here, you have all the problems in one place.
Why not validate biologically with in vitro binding assays or T-cell activation assays? Absolutely. But it’s a wet-lab experiment, which means that it’s slow, fiddly and subject to noise and variation.

Operational challenges

It’s one thing to be able to identify a neoantigen. It’s another thing to do that at scale, reliably and consistently. This, I think, might be the greatest barrier.

At a very fundamental level, it’s not clear yet what the best practices are in this area. There’s a real lack of comparative studies or even mechanisms that allow comparison. Much of the work in the field exists in its own isolated kingdom, with methods and data difficult to share.
Can you get consistent tumour samples from patients for sequencing? Biopsies tend to be very variable, packaging is a problem, and there is little infrastructure for industrialising this.
How do you get a sample from the patient to the sequencing facility fast enough? Actually, how do you do any of this fast enough?
How do you execute the chain of sequencing, computational analysis and immunological analysis at scale? Validation (mass spec and T-cell activation assays) will always be very slow.
What’s the regulatory framework for a therapy individualised on a set of neoantigens? Every “dose” could be different from every other dose.
Fundamentally, this is a different model to the “conveyor belt” one that pharma is used to operating in. It needs a tight loop from the patient through the healthcare provider to the manufacturer and back. That really doesn’t exist at the moment.

Solutions?

Long-read sequencing (LRS) can identify complex mutations (insertions/deletions, fusion genes, and large structural variants) and alternative splicing events that traditional sequencing methods miss. It can also deal better with difficult-to-sequence regions.
LRS can also be used to determine a patient’s immune (HLA) type more accurately, allowing that to be incorporated into the pipeline. In my view, this is one of the big ways forward: methods that account for patient variation outperform those that don’t.
Multiomics: using not just genomics but transcriptomics and proteomics will provide multiple layers of evidence, increasing confidence that any putative neoantigen is actually real.

More AI: (perhaps inevitably) improved AI/ML would help us better predict neoantigen binding and immunogenicity. This, of course, relies upon getting more and better data, which is a challenge in itself.
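As a rough sketch of what the multiomics idea of "multiple layers of evidence" could look like in practice, the snippet below counts how many independent data types support each candidate and ranks accordingly. The class, field names and thresholds (a 5% variant allele fraction, 1 TPM) are placeholders I have picked for illustration, not recommended values.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    peptide: str
    variant_allele_fraction: float  # genomics: share of reads carrying the mutation
    transcript_tpm: float           # transcriptomics: expression of the mutated transcript
    seen_in_immunopeptidome: bool   # proteomics: peptide observed by mass spec


def evidence_layers(c, min_vaf=0.05, min_tpm=1.0):
    """List which layers support a candidate. Thresholds are placeholders."""
    layers = []
    if c.variant_allele_fraction >= min_vaf:
        layers.append("genomic")
    if c.transcript_tpm >= min_tpm:
        layers.append("transcriptomic")
    if c.seen_in_immunopeptidome:
        layers.append("proteomic")
    return layers


def rank_by_evidence(candidates):
    """Order candidates so those supported by the most layers come first."""
    return sorted(candidates, key=lambda c: len(evidence_layers(c)), reverse=True)
```

A simple count treats every layer as equally informative, which is almost certainly wrong; a real pipeline would weight them, and that weighting is exactly where better data and better models would help.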

Conclusion

Neoantigen identification is critical to the efficacious development and use of many emerging therapies. However, it is definitely in its “wild west” phase: no one is completely sure what the way forward is, everyone is going their own way, building their own tools and setting their own goals. The field is ripe for someone to step in, start setting standards, make benchmarks and shake the field up.
