
Full-stack businesses required when new data modalities come online
Unlocking new bio data sources with full-stack diagnostics and model building
The next generation of life science giants will not be hardware vendors but full-stack data and insight companies. Historically, bio hardware manufacturers (selling sequencers and reagents) have captured far less value than the diagnostic companies that use that hardware to generate patient insights. The disparity in value accrual between hardware pure-plays (e.g., PacBio, Oxford Nanopore) and diagnostic/insight platforms (e.g., Natera, Tempus, BillionToOne, Caris, Adaptive) illustrates this.
Most bio-AI companies train on the same public or weakly differentiated databases (protein structures, DNA sequences, transcriptomes), so model performance converges toward a shared ceiling. To break through it, frontier ML companies need different data inputs. We're specifically interested in unlocking proteomics and metabolomics. These modalities more directly reflect the body's state in health and disease, but they are harder to collect at scale.
However, with new data types reaching scale over the next 10 years, we believe proteomics and metabolomics call for different businesses than the ones built for genomics. Specifically, companies that unlock new data should productize it and use it to generate net-new insights.
