Formal Verification at Scale

Concept

Automating formal verification to enable its use outside of strictly the greatest downside or largest scale problems

Longer Description

Whereas software developers test their code by thinking up a set of edge cases, formal verification translates the code into a mathematical statement and then produces a mathematical proof that under all possible scenarios that the statement or property of interest holds true. It provides a guarantee that something’s universally true.

However, having an applied math PhD work alongside developers to analyze their code to come up with proof statements to test is extremely expensive, slow, and can only scale with the world’s few FV researchers. So, it’s largely used only in the highest downside scenarios and largest operating systems, like:

Certifying that airplane’s and nuclear plant’s software control systems are guaranteed to work as intended
Amazon also has one of the top teams in the space, using it for its payment and authorization systems that support its $700B in annual GMV
Semiconductor companies pay up front $100Ms to billions and many months to reach the first demonstrations of a tape read out. With that much up investment and after Intel’s famed tape readout failure that gave AMD (who used FV) a shot at catching up, FV is now an integral part of their pipeline and you effectively never see abject readout failures anymore.
Government and defense have used it to make their systems unhackable
Crypto protocols use FV to validate the safety of their code, consensus mechanism, tokenomics, etc. Surprisingly, it’s yet to be used heavily in traditional capital markets more broadly.

However, soon it could become far more broadly interesting for at least three reasons:

AI-generated proofs may make FV cheap and scalable. It could be that most enterprise production code or otherwise code of material importance is automatically formally verified in 10 years time. An interesting aspect of this thesis is that according to a community orchestrator we spoke with none of the FV researchers themselves are scaling-pilled, refusing to believe that AI systems will be capable of this work any time soon. So there should a longer runway without competition for a startup that finds researchers that are all in on disproving that disbelief. Theorem just launched from YC to pursue something in this area.
Proving qualities about AI models, primarily their safety. For instance, GPT7/8 might not be allowed to be released until it can be proven to be safe. It’s very unclear to me whether it is technically even possible to mathematically prove such a general property like safety about such a black box system, especially systems far larger and more advanced than those we have today. With that said, some researchers are pioneering the field.
Possibly the most commercially relevant is verifying autonomously translated computable contracts. A startup that gets the system described below working in a flywheel could automate large swaths of manual workflows in high-value industries like healthcare insurance.
autonomously translate existing contracts / law into computable contracts
use LLM-based formal verification to prove the system did it correctly automatically and at scale
use that end reward data for reinforcement learning to further hone the system’s translation capabilities

The absolute deterministic execution of both computable contracts and formal verification may be the only way to fully automate and deploy complex agentic systems at scale in such highly regulated, high downside industries. Otherwise each autonomous transaction may require manual checking. A startup may even have a shot at fundamentally disrupting their business as opposed to remaining an innovation supplier with these tools and by going after new markets like robot insurance.

Concept

Automating formal verification to enable its use outside of strictly the greatest downside or largest scale problems

Longer Description

Certifying that airplane’s and nuclear plant’s software control systems are guaranteed to work as intended
Amazon also has one of the top teams in the space, using it for its payment and authorization systems that support its $700B in annual GMV
Semiconductor companies pay up front $100Ms to billions and many months to reach the first demonstrations of a tape read out. With that much up investment and after Intel’s famed tape readout failure that gave AMD (who used FV) a shot at catching up, FV is now an integral part of their pipeline and you effectively never see abject readout failures anymore.
Government and defense have used it to make their systems unhackable
Crypto protocols use FV to validate the safety of their code, consensus mechanism, tokenomics, etc. Surprisingly, it’s yet to be used heavily in traditional capital markets more broadly.

However, soon it could become far more broadly interesting for at least three reasons:

AI-generated proofs may make FV cheap and scalable. It could be that most enterprise production code or otherwise code of material importance is automatically formally verified in 10 years time. An interesting aspect of this thesis is that according to a community orchestrator we spoke with none of the FV researchers themselves are scaling-pilled, refusing to believe that AI systems will be capable of this work any time soon. So there should a longer runway without competition for a startup that finds researchers that are all in on disproving that disbelief. Theorem just launched from YC to pursue something in this area.
Proving qualities about AI models, primarily their safety. For instance, GPT7/8 might not be allowed to be released until it can be proven to be safe. It’s very unclear to me whether it is technically even possible to mathematically prove such a general property like safety about such a black box system, especially systems far larger and more advanced than those we have today. With that said, some researchers are pioneering the field.
Possibly the most commercially relevant is verifying autonomously translated computable contracts. A startup that gets the system described below working in a flywheel could automate large swaths of manual workflows in high-value industries like healthcare insurance.
autonomously translate existing contracts / law into computable contracts
use LLM-based formal verification to prove the system did it correctly automatically and at scale
use that end reward data for reinforcement learning to further hone the system’s translation capabilities

Formal Verification at Scale

Concept

Longer Description

Comparable Companies

Related Reading

Related Theses

2nd Life Batteries for Storage

Marketplaces Requiring Private Intelligence

Distraction-Free Personal Device

Formal Verification at Scale

Concept

Longer Description

Comparable Companies

Related Reading

Related Theses

2nd Life Batteries for Storage

Marketplaces Requiring Private Intelligence

Distraction-Free Personal Device