Formally Verifiable and Guaranteed Safe AI by Automated Reasoning

Taken from “Towards Guaranteed Safe AI”

Introduction

Guaranteed Safe AI1 relies on a safety specification, a world model, and a verifier producing a formal proof to check if the AI systems satisfy the safety specification based on the world model. Such formal verification is usually extremely hard to obtain, and recent progress in automated reasoning could potentially make this approach easier. However, existing auto-regressive LLMs cannot do genuine logical reasoning or self-verification on their own, and they should be viewed as universal approximate knowledge retrievers (e.g., trying to mimic the reasoning steps seen during training)2. Furthermore, there seems to be no free lunch for inference scaling3, which is believed to be able to boost LLMs’ reasoning capabilities, as the verifier (reliable training signals) is not perfect for most cases. Given the important role of formal verifiers4, I’m exploring ways of scaling up them.

A more detailed blog post is here.

My (Remotely) Related Works

As of 2024-12, to be honest, I don’t have works strongly related to what I’m working on.

Neural Architectures

Reinforcement Learning

Human-AI Alignment

Robustness

Meta Learning

Jie Fu
Jie Fu
Research Scientist

Focus on Deep Learning, Safe AI, Automated Reasoning, Reinforcement Learning.