Manifesto · v0.1 · 2026

The frontier of medical truth is not on social media.

It is not in podcasts. It is not in tweet threads. It is where two specialists with the same evidence and different conclusions would actually argue, if the time, the protection, and the audience existed. They almost never do.

01 The problem

Guidelines are decisions, not arguments

Most contested medical questions — is LDL causal, is saturated fat bad, do SSRIs beat placebo — are decided by processes invisible to the people affected. Guidelines emerge from committee rooms. Meta-analyses make inclusion-criteria choices you cannot reproduce without the raw data, which you cannot get.

The smartest people have stopped saying it

Dissident specialists exist on every major question. Their best arguments are scattered across paywalls, podcasts, BMJ rapid responses, and footnotes in books. Audience capture pulls everyone toward simpler claims than the evidence supports. The interesting argument exists in nobody's writing.

02 The proposal

Run the debate anyway.

Have frontier models play both sides in earnest. Have a separate model arbitrate, claim by claim. Track every concession. Publish the entire thing as a structured, contestable argument — not as an answer.

The output is not authoritative. It is the clearest available version of the actual disagreement. If you disagree, the format tells you precisely where — which sub-claim, which evidence, which verdict — and invites you to submit back in.

03 Why frontier models

Reason 01

They steelman

Frontier models reliably produce the strongest version of a position they don't personally hold. Humans rarely do.

Reason 02

No career to protect

No grant to renew. No patients in front of them. No audience to please. The structural distortions of human expert opinion don't apply.

Reason 03

Inspectable

Every claim has a reference. When the model is wrong, you can show exactly where.

Reason 04

Adversarial across families

Different model families have different priors. Pairing them across roles weakens the pull of any one model's RLHF-shaped defaults.

04 How a debate is structured

Step 01

Framing

The arbiter rewrites the question until both sides agree it's worth debating. Imprecise questions become precise.

Step 02

Opening positions

Each side files independently — thesis, mechanism, strongest evidence. Neither sees the other until both are filed.
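One way to enforce the "neither sees the other" rule is a filing box that refuses to reveal anything until every side has filed. The sketch below is only an illustration under that assumption: the class name `OpeningFilings`, its methods, and the side labels are hypothetical and not taken from the project.

```python
class OpeningFilings:
    """Holds opening positions and hides them until every side has filed."""

    def __init__(self, sides=("affirmative", "negative")):
        self._sides = tuple(sides)
        self._filed = {}

    def file(self, side, text):
        # Each side files exactly once; unknown sides are rejected.
        if side not in self._sides:
            raise ValueError(f"unknown side: {side}")
        if side in self._filed:
            raise ValueError(f"{side} has already filed")
        self._filed[side] = text

    def reveal(self):
        # Nothing is readable while any side is still outstanding.
        missing = [s for s in self._sides if s not in self._filed]
        if missing:
            raise PermissionError(f"still waiting on: {', '.join(missing)}")
        return dict(self._filed)


box = OpeningFilings()
box.file("affirmative", "thesis, mechanism, strongest evidence (A)")
try:
    box.reveal()
    revealed_early = True
except PermissionError:
    revealed_early = False
box.file("negative", "thesis, mechanism, strongest evidence (B)")
openings = box.reveal()
```

In this sketch the early `reveal()` call fails, and the openings become readable only after the second filing.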

Step 03

Rounds — one sub-claim each

A states, B counters, A rebuts. The arbiter names the precise edge, the concessions, and what would flip the verdict.
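The turn order of a round can be sketched as a small function: A states, B counters, A rebuts, then the arbiter rules on the transcript. Everything named here (`Ruling`, `run_round`, the stub speakers) is hypothetical; in a real run the speakers would be model calls, which this sketch deliberately leaves out.

```python
from dataclasses import dataclass, field
from typing import Callable

# A speaker reads the transcript so far and returns its next turn.
Speaker = Callable[[list[str]], str]


@dataclass
class Ruling:
    edge: str                                             # the precise point of disagreement
    concessions: list[str] = field(default_factory=list)
    would_flip: list[str] = field(default_factory=list)   # results that would change the verdict


def run_round(sub_claim: str, side_a: Speaker, side_b: Speaker,
              arbiter: Callable[[list[str]], Ruling]) -> tuple[list[str], Ruling]:
    """Run one round on a single sub-claim and return the transcript and ruling."""
    transcript = [f"SUB-CLAIM: {sub_claim}"]
    transcript.append("A states: " + side_a(transcript))
    transcript.append("B counters: " + side_b(transcript))
    transcript.append("A rebuts: " + side_a(transcript))
    return transcript, arbiter(transcript)


# Stub speakers standing in for model calls.
def stub_a(transcript: list[str]) -> str:
    return f"turn {len(transcript)} from A"


def stub_b(transcript: list[str]) -> str:
    return f"turn {len(transcript)} from B"


def stub_arbiter(transcript: list[str]) -> Ruling:
    return Ruling(edge="placeholder edge", would_flip=["placeholder result"])


transcript, ruling = run_round("example sub-claim", stub_a, stub_b, stub_arbiter)
```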

Step 04

Tracked claims

Every claim is recorded with its status, its holder, and the set of rounds in which it was argued. Claims persist across rounds; their status can be revised.
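A tracked claim could be represented as a small record. The sketch below is an assumption about the shape of that record, not the project's schema: the field names, the `Status` labels, and the `history` list are all hypothetical (the text does not enumerate the possible statuses).

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    # Hypothetical labels, chosen for illustration only.
    OPEN = "open"
    CONCEDED = "conceded"
    UPHELD = "upheld"


@dataclass
class Claim:
    claim_id: str
    text: str
    holder: str                                       # which side holds the claim
    status: Status = Status.OPEN
    rounds: set[int] = field(default_factory=set)     # the claim's round-set
    history: list[tuple[int, Status]] = field(default_factory=list)

    def touch(self, round_no: int) -> None:
        """Record that the claim was argued in this round."""
        self.rounds.add(round_no)

    def revise(self, round_no: int, new_status: Status) -> None:
        """Change the status while keeping the previous one on record."""
        self.history.append((round_no, self.status))
        self.status = new_status
        self.rounds.add(round_no)


claim = Claim("C1", "example sub-claim", holder="affirmative")
claim.touch(1)
claim.revise(3, Status.CONCEDED)
```

Keeping the prior status in `history` is one way to make revisions auditable; it is a design choice of this sketch rather than something the format requires.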

Step 05

Snapshot

Top-of-page summary. Updates only when a round materially shifts the picture. Most rounds leave it unchanged.
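What counts as a "material" shift is not defined here, so the sketch below assumes one possible rule: a round is material only if an already-tracked claim changed status, and newly added claims do not count on their own. The function names and the string status labels are hypothetical.

```python
from typing import Callable


def changed_claims(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Ids of already-tracked claims whose status differs after the round."""
    return sorted(cid for cid, status in after.items()
                  if cid in before and before[cid] != status)


def update_snapshot(snapshot: str, before: dict[str, str], after: dict[str, str],
                    rewrite: Callable[[str, list[str]], str]) -> str:
    """Return the snapshot unchanged unless some claim status moved."""
    changed = changed_claims(before, after)
    if not changed:
        return snapshot
    return rewrite(snapshot, changed)


def note_revision(snapshot: str, changed: list[str]) -> str:
    # Stand-in for whatever actually rewrites the summary.
    return f"{snapshot} [revised: {', '.join(changed)}]"


before = {"C1": "open", "C2": "open"}
quiet_round = {"C1": "open", "C2": "open", "C3": "open"}   # only a new claim
shifting_round = {"C1": "conceded", "C2": "open"}          # C1 changed status

unchanged = update_snapshot("v1", before, quiet_round, note_revision)
revised = update_snapshot("v1", before, shifting_round, note_revision)
```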

Step 06

What would flip this

Every debate ends with a list of specific results that would change the snapshot. A position that no result could change is a belief, not an argument.

05 Who plays which role

For the LDL debate:

GPT-5.5: affirmative and negative (same model, separated by role)
Claude Opus 4.7: arbiter

Same-model-both-sides is the cleanest setup we can run today. It removes one obvious confound: that one side argues more persuasively because its model has different priors than the other. The arbiter is a different model family — cross-family arbitration is the property that keeps the format honest.

As more models become available (Gemini, open-weight reasoning models, future Claude versions), roles will rotate. Every debate publishes its role assignment so it can be audited for bias.
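A published role assignment could be a small machine-readable record, with a check that the arbiter comes from a different family than the debaters. This is a sketch under assumptions: the `Role` fields, the family labels `"gpt"` and `"claude"`, and the JSON layout are hypothetical; only the model names and roles come from the LDL assignment above.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Role:
    role: str      # "affirmative", "negative", or "arbiter"
    model: str
    family: str    # assumed family label, used only for the audit check


def cross_family_arbiter(roles: list[Role]) -> bool:
    """True if at least one arbiter exists and shares no family with a debater."""
    arbiter_families = {r.family for r in roles if r.role == "arbiter"}
    debater_families = {r.family for r in roles if r.role != "arbiter"}
    return bool(arbiter_families) and arbiter_families.isdisjoint(debater_families)


ldl_roles = [
    Role("affirmative", "GPT-5.5", "gpt"),
    Role("negative", "GPT-5.5", "gpt"),
    Role("arbiter", "Claude Opus 4.7", "claude"),
]

# The record that would be published alongside the debate.
record = json.dumps([asdict(r) for r in ldl_roles], indent=2)
audit_ok = cross_family_arbiter(ldl_roles)
```

A record like this lets a reader verify the cross-family property mechanically instead of taking it on trust.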

06 Honest limitations

Known failure modes

Models have priors. RLHF shapes what they will argue and how strongly. The negative side is almost certainly less sharp than the strongest specialist argument that exists.
The arbiter is one model. Its verdicts are arguments, not authority. Where it's wrong, the format makes that visible — which is the point.
Topic selection is editorial. The choice of questions is itself a judgment. Suggest topics in the project repo.
This is not medical advice. Individual clinical decisions are multi-factor and need a clinician who knows you.

07 The long arc

The long-term goal: gradually displace human judgment in consensus-formation. Not because frontier models are smarter than the best specialists — they are not. Because their arguments are inspectable, their citations are accountable, and they do not have careers to protect or grants to renew.

Step one: make the format work. Step two: show it produces arguments serious people would not have produced on their own. Step three: run enough topics that specialists go to it to see the state of the disagreement — and eventually, submit into it.

We are at step one. The LDL topic is the proof.

Independent project · no pharmaceutical, professional society, or institutional funding · feedback at the project repo