The frontier of medical truth is not on social media.
It is not in podcasts. It is not in tweet threads. It is where two specialists with the same evidence and different conclusions would actually argue, if the time, the protection, and the audience existed. In practice, almost none of that exists.
01 The problem
Guidelines are decisions, not arguments
Most contested medical questions — is LDL causal, is saturated fat bad, do SSRIs beat placebo — are decided by processes invisible to the people affected. Guidelines emerge from committee rooms. Meta-analyses make inclusion-criteria choices you cannot reproduce without the raw data, which you cannot get.
The smartest people have stopped saying it
Dissident specialists exist on every major question. Their best arguments are scattered across paywalls, podcasts, BMJ rapid responses, and footnotes in books. Audience capture pulls everyone toward simpler claims than the evidence supports. The interesting argument exists in nobody's writing.
02 The proposal
Run the debate anyway.
Have frontier models play both sides in earnest. Have a separate model arbitrate, claim by claim. Track every concession. Publish the entire thing as a structured, contestable argument — not as an answer.
The output is not authoritative. It is the clearest available version of the actual disagreement. If you disagree, the format tells you precisely where — which sub-claim, which evidence, which verdict — and invites you to submit back in.
03 Why frontier models
They steelman
Frontier models reliably produce the strongest version of a position they don't personally hold. Humans rarely do.
No career to protect
No grant to renew. No patients in front of them. No audience to please. The structural distortions of human expert opinion don't apply.
Inspectable
Every claim has a reference. When the model is wrong, you can show exactly where.
Adversarial across families
Different model families have different priors. Pairing them across roles reduces the pull of any single model's RLHF-shaped tendencies.
04 How a debate is structured
Framing
The arbiter rewrites the question until both sides agree it's worth debating. Imprecise questions become precise.
Opening positions
Each side files independently — thesis, mechanism, strongest evidence. Neither sees the other until both are filed.
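One way to enforce independent filing is to keep each opening sealed until both are in. The sketch below is illustrative only: the class name `OpeningRound`, its methods, and the side labels are assumptions, not the project's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class OpeningRound:
    """Holds opening positions sealed until every side has filed (hypothetical)."""
    sides: tuple = ("affirmative", "negative")
    _filings: dict = field(default_factory=dict)

    def file(self, side: str, thesis: str, mechanism: str, evidence: str) -> None:
        # Each side files exactly once; nothing is returned, so nothing leaks.
        if side not in self.sides:
            raise ValueError(f"unknown side: {side}")
        if side in self._filings:
            raise ValueError(f"{side} has already filed")
        self._filings[side] = {
            "thesis": thesis,
            "mechanism": mechanism,
            "evidence": evidence,
        }

    def reveal(self) -> dict:
        # Refuse to show any opening while a side is still missing.
        missing = [s for s in self.sides if s not in self._filings]
        if missing:
            raise PermissionError(f"still waiting on: {', '.join(missing)}")
        return dict(self._filings)
```

Calling `reveal()` before both sides have filed raises an error, so neither opening can be conditioned on the other.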
Rounds — one sub-claim each
A states, B counters, A rebuts. The arbiter names the precise edge, the concessions, and what would flip the verdict.
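A round of this shape could be recorded roughly as follows. This is a minimal sketch under assumptions: `Round`, `Verdict`, and their field names are hypothetical and chosen only to mirror the description above.

```python
from dataclasses import dataclass, field
from typing import Optional

# Fixed turn order described above: A states, B counters, A rebuts.
TURN_ORDER = ("A states", "B counters", "A rebuts")


@dataclass
class Verdict:
    edge: str          # the precise point of disagreement, as named by the arbiter
    concessions: list  # what each side gave up in this round
    would_flip: list   # results that would reverse the verdict


@dataclass
class Round:
    sub_claim: str
    turns: list = field(default_factory=list)
    verdict: Optional[Verdict] = None

    def add_turn(self, text: str) -> str:
        """Append the next turn and return which slot it filled."""
        if len(self.turns) >= len(TURN_ORDER):
            raise ValueError("round already has all three turns")
        self.turns.append(text)
        return TURN_ORDER[len(self.turns) - 1]

    def rule(self, verdict: Verdict) -> None:
        """Accept the arbiter's verdict only once the rebuttal is in."""
        if len(self.turns) < len(TURN_ORDER):
            raise ValueError("arbiter rules only after the rebuttal")
        if not verdict.would_flip:
            raise ValueError("verdict must name what would flip it")
        self.verdict = verdict
```

Rejecting a verdict with an empty `would_flip` list is a design choice in this sketch: it forces every ruling to state what evidence would overturn it.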
Tracked claims
Every claim is logged with its status, its holder, and the set of rounds in which it was argued. Claims persist across rounds; their status can be revised.
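A tracked claim might be represented along these lines. The sketch is an assumption throughout: the source does not list the possible statuses, so the `Status` values here are placeholders, and `Claim` and `revise` are hypothetical names.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    # Placeholder statuses; the real set is not specified in the text.
    OPEN = "open"
    UPHELD = "upheld"
    CONCEDED = "conceded"
    REJECTED = "rejected"


@dataclass
class Claim:
    claim_id: str
    text: str
    holder: str                                   # side that asserted the claim
    status: Status = Status.OPEN
    rounds: list = field(default_factory=list)    # rounds in which it was argued
    history: list = field(default_factory=list)   # (round, previous status) pairs

    def revise(self, round_no: int, new_status: Status) -> None:
        """Change the status while keeping a record of what it was before."""
        if round_no not in self.rounds:
            self.rounds.append(round_no)
        self.history.append((round_no, self.status))
        self.status = new_status


claim = Claim("C1", "placeholder sub-claim text", holder="affirmative")
claim.revise(1, Status.UPHELD)
claim.revise(3, Status.CONCEDED)
```

Keeping the history alongside the current status means a reader can see not only where a claim stands but when and from what it changed.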
Snapshot
Top-of-page summary. Updates only when a round materially shifts the picture. Most rounds leave it unchanged.
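The update rule can be sketched as a function that returns the existing snapshot untouched unless a round is flagged as a material shift. How materiality is judged is not stated in the text; here it is assumed to arrive as a boolean, and `Snapshot` and `apply_round` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    summary: str
    updated_in_round: int


def apply_round(
    snapshot: Snapshot,
    round_no: int,
    material_shift: bool,
    new_summary: Optional[str] = None,
) -> Snapshot:
    """Return a new snapshot only when the round materially shifts the picture."""
    if not material_shift:
        return snapshot  # the common case: the snapshot is left as it was
    if not new_summary:
        raise ValueError("a material shift needs a rewritten summary")
    return Snapshot(summary=new_summary, updated_in_round=round_no)
```

Because `Snapshot` is frozen, an unchanged round hands back the very same object, which makes "most rounds leave it unchanged" easy to check.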
What would flip this
Every debate ends with a list of specific results that would change the snapshot. If you don't know what would change a belief, it isn't an argument.
05 Who plays which role
For the LDL debate:
- GPT-5.5 affirmative and negative — same model, separated by role
- Claude Opus 4.7 arbiter
Same-model-both-sides is the cleanest setup we can run today. It removes one obvious confound: that one side argues more persuasively because its model has different priors than the other. The arbiter is a different model family — cross-family arbitration is the property that keeps the format honest.
As more models become available (Gemini, open-weight reasoning models, future Claude versions), roles will rotate. Every debate publishes its role assignment so it can be audited for bias.
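A published role assignment could be checked mechanically for the two properties discussed above. The sketch below is an assumption, not the project's tooling: `RoleAssignment`, `family`, and `audit` are hypothetical, and the family label is a crude guess taken from the first token of the model name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleAssignment:
    topic: str
    affirmative: str
    negative: str
    arbiter: str


def family(model_name: str) -> str:
    """Crude family label: first space- or hyphen-delimited token, lowercased."""
    return model_name.replace("-", " ").split()[0].lower()


def audit(a: RoleAssignment) -> dict:
    """Report the structural properties a reader would want to verify."""
    debater_families = {family(a.affirmative), family(a.negative)}
    return {
        "cross_family_arbiter": family(a.arbiter) not in debater_families,
        "same_model_both_sides": a.affirmative == a.negative,
    }


# Assignment for the LDL debate, as listed above.
ldl = RoleAssignment(
    topic="LDL",
    affirmative="GPT-5.5",
    negative="GPT-5.5",
    arbiter="Claude Opus 4.7",
)
report = audit(ldl)
```

For the LDL assignment both properties come out true, which matches the setup described: one model on both sides, with an arbiter from a different family.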
06 Honest limitations
- Models have priors. RLHF shapes what they will argue and how strongly. The negative side is almost certainly less sharp than the strongest specialist argument that exists.
- The arbiter is one model. Its verdicts are arguments, not authority. Where it's wrong, the format makes that visible — which is the point.
- Topic selection is editorial. The choice of questions is itself a judgment. Suggest topics in the project repo.
- This is not medical advice. Individual clinical decisions are multi-factor and need a clinician who knows you.
07 The long arc
The long-term goal: gradually displace human judgment in consensus-formation. Not because frontier models are smarter than the best specialists — they are not. Because their arguments are inspectable, their citations are accountable, and they do not have careers to protect or grants to renew.
Step one: make the format work. Step two: show it produces arguments serious people would not have produced on their own. Step three: run enough topics that specialists go to it to see the state of the disagreement — and eventually, submit into it.
We are at step one. The LDL topic is the proof.
Independent project · no pharmaceutical, professional society, or institutional funding · feedback at the project repo