2026 Compound

Neural Net Potentials

Simulating complex physical processes at atomic accuracy and practical speed

AI/ML · Bio

Concept

Simulating biology and materials science at the atomic scale using AI that takes atomic coordinates as input and outputs the energy surface necessary to answer relevant questions
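At its core, an NNP is a learned function from atomic coordinates to a scalar energy, with forces obtained as the negative gradient of that energy surface. A minimal sketch of that interface (the function names are illustrative assumptions, and a Lennard-Jones pair sum stands in for the learned model; this is not any particular package's API):

```python
import numpy as np

def lj_energy(coords):
    """Stand-in for a trained neural net potential: maps atomic
    coordinates of shape (N, 3) to a scalar energy. A Lennard-Jones
    pair sum is used here purely as a placeholder for the learned model."""
    n = len(coords)
    e = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            r = np.linalg.norm(coords[i] - coords[j])
            e += 4.0 * ((1.0 / r) ** 12 - (1.0 / r) ** 6)
    return e

def forces(energy_fn, coords, h=1e-5):
    """Forces are the negative gradient of the energy surface,
    estimated here by central finite differences (a real NNP would
    use automatic differentiation instead)."""
    f = np.zeros_like(coords)
    for idx in np.ndindex(coords.shape):
        d = np.zeros_like(coords)
        d[idx] = h
        f[idx] = -(energy_fn(coords + d) - energy_fn(coords - d)) / (2 * h)
    return f

# Two atoms at the Lennard-Jones equilibrium separation, 2**(1/6)
x = np.array([[0.0, 0.0, 0.0], [2 ** (1 / 6), 0.0, 0.0]])
e = lj_energy(x)          # -1.0 at the minimum of the pair potential
f = forces(lj_energy, x)  # near-zero forces at the minimum
```

Everything downstream (molecular dynamics, geometry optimization, free-energy estimates) is driven through this coordinates-to-energy interface, which is why the speed and accuracy of the model behind it matter so much.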

Longer Description

Many methods in computational chemistry have been developed over the decades for simulating physical processes, especially in biology, but they come with a tradeoff between speed and accuracy. Fast methods that model only classical physics aren’t transferable to other systems and often cannot simulate complex phenomena (e.g. protein folding, covalent bond-breaking in drug binding, or heterogeneous battery chemistries in materials science), while the methods that directly calculate the breadth of inter-atomic effects necessary to predict such phenomena are excruciatingly slow. Neural net potentials embed only the crucial effects explicitly in their architecture and learn the remaining patterns implicitly from the training data, so they never do the full calculations from scratch. They’re thus poised to dramatically expand the Pareto frontier, enabling highly accurate simulation of arbitrarily complex phenomena at speeds, system scales, and computational efficiency relevant for industrial use.

From this review paper

Key aspects of this system include:

  • Coarse graining, continuum solvent models, hybrid NNP/MM, hybrid diffusion/NNP, and adaptive partitioning: modeling only the crucial parts of the system at full accuracy and abstracting away everything else, making the models many times faster
  • Active learning: a dataset-generation method that automatically produces more of the data the model struggles with. As parts of the potential are trained, the system estimates the uncertainty of its predictions; where uncertainty is too high, it automatically runs MD simulations of that molecule to generate the needed data.
  • Message passing: letting atoms exchange information with their neighbors so that each atom’s representation is updated by its chemical environment
  • Modeling long-range interactions without sacrificing GPU parallelization: certain long-range (>10Å) interactions such as electrostatics can be crucial for complex phenomena, but currently prevent the GPU parallelization necessary to scale to large clusters
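To make the message-passing idea concrete, here is a toy sketch, with random weights standing in for learned ones (this is an illustrative assumption, not the architecture of any real model such as MACE or Allegro): atoms within a cutoff exchange messages that update per-atom feature vectors, and the total energy is a sum of per-atom readouts, which makes the prediction invariant to atom ordering and rigid translation by construction.

```python
import numpy as np

def message_passing_energy(coords, n_rounds=3, cutoff=2.0, dim=8):
    """Toy message-passing potential: per-atom feature vectors are
    updated by messages from neighbors within a cutoff, then summed
    into a total energy. Weights are random stand-ins for learned ones."""
    rng = np.random.default_rng(0)
    n = len(coords)
    h = np.ones((n, dim))                            # initial atom features
    W_msg = rng.normal(scale=0.1, size=(dim, dim))   # message network
    W_upd = rng.normal(scale=0.1, size=(dim, dim))   # update network
    w_out = rng.normal(scale=0.1, size=dim)          # per-atom energy readout

    # Neighbor mask: atom pairs closer than the cutoff (self-pairs excluded)
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    adj = (dists < cutoff) & (dists > 0.0)

    for _ in range(n_rounds):
        msgs_from = np.tanh(h @ W_msg)               # message each atom emits
        # Each atom sums the messages coming from its neighbors...
        agg = np.where(adj[:, :, None], msgs_from[None, :, :], 0.0).sum(axis=1)
        # ...and updates its own features with the aggregate
        h = np.tanh(h @ W_upd + agg)

    # Total energy as a sum of per-atom contributions
    return float((h @ w_out).sum())

atoms = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 1.5, 0.0]])
e = message_passing_energy(atoms)
```

Because only inter-atomic distances enter the neighbor mask and the readout is a sum, permuting the atom order or translating the whole system leaves the energy unchanged — the physical invariances that real NNP architectures enforce far more carefully.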

The potential applications are limited only by the ultimate speed and accuracy these breakthroughs enable.
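The active-learning loop from the list above is commonly driven by ensemble disagreement, using the spread of predictions across independently trained models as the uncertainty estimate. A minimal sketch of one round (the function names and toy numeric "structures" are assumptions for illustration):

```python
def ensemble_predict(models, structure):
    """Predict with an ensemble; the standard deviation across members
    serves as the uncertainty estimate that decides what to label next."""
    preds = [m(structure) for m in models]
    mean = sum(preds) / len(preds)
    var = sum((p - mean) ** 2 for p in preds) / len(preds)
    return mean, var ** 0.5

def active_learning_round(models, candidates, threshold, label_fn):
    """One round: keep only the candidates the ensemble disagrees on,
    and send those to the expensive reference calculation (in practice,
    a quantum-chemistry label on snapshots from an MD trajectory)."""
    new_data = []
    for s in candidates:
        _, std = ensemble_predict(models, s)
        if std > threshold:
            new_data.append((s, label_fn(s)))   # expensive ground truth
    return new_data

# Toy ensemble whose members disagree more on larger inputs
models = [lambda s, k=k: k * s for k in (0.9, 1.0, 1.1)]
picked = active_learning_round(models, [0.1, 10.0], threshold=0.5,
                               label_fn=lambda s: 2.0 * s)
# Only the high-uncertainty candidate (10.0) gets sent for labeling
```

The expensive reference calculation is thus spent only where the model is genuinely unsure, which is what makes dataset generation tractable at scale.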

Other Thoughts

  • An especially exciting system could automatically generate hypotheses, explore the potential energy surface to validate them, and update its hypotheses and plans along the way.
  • The biggest models are at ~GPT-1 size. They have shown consistent scaling laws in accuracy and are showing exciting signs of generality (e.g. generalizing from small systems to large ones, across the periodic table, across phases of matter, and across structure types from crystalline to amorphous). It’s unclear whether directly comparing parameter counts with NLP is apt, but regardless, it will be tremendously exciting to track their properties as they scale.
  • My assumption about how the future plays out is that a massive foundation model will be trained, akin to the LLM space, and then static, coarse-grained models will be automatically generated for a given application.
  • AI structure models (AlphaFold, ESM, etc.) are good for high-throughput ideation but are fundamentally incapable of modeling the complex phenomena that NNPs excel at. The two technologies will be complementary in the computational chemistry stack: AI as digitized high-throughput assays, and NNPs as digitized experimentation and optimization.

Comparable Companies

  • It’s unclear to what degree a given computational chemistry company currently relies on NNPs, but some such companies include: Acellera, Angstrom, Relay, D.E. Shaw Research, Schrodinger, Qubit Pharma, Menten.AI, ProteinQure, and Radical AI. Radical AI is building a closed-loop autonomous system.

Related Reading

This PhD dissertation by a leading researcher provides an excellent intro to the field, the theory it’s grounded in, and adjacent approaches. This talk provides a great overview of recent NNP development more broadly. And here are talks about two of the top models, MACE and Allegro.

Related Theses

  • Marketplaces Requiring Private Intelligence (AI/ML): for deals requiring human-like analysis and negotiation that can’t risk information leakage
  • New bio data + AI businesses (Bio): full-stack businesses required when new data modalities come online
  • Ozempic for Sleep (Bio): a sleep-focused therapeutics startup
