We shipped a doctor command that upgrades old projects. It worked in every test. It worked in demos. It never worked for a single real user.
The bug
if (projectVersion < agentVersion) {
// upgrade the project
}
One line. Reasonable-looking. Completely broken.
JavaScript string comparison: "0.6.2" < "0.10.0" evaluates to false. Lexicographic ordering. The character "6" comes after "1" — so "0.6" is “greater than” "0.10". Every project on v0.6 through v0.9 was told “you’re up to date” when they were four major versions behind.
Why nobody caught it
In our development setup, the installed agent version always matches the project version. We work on dev, we test on dev, the versions are the same. The doctor’s comparison never triggers because there’s never a mismatch.
The bug only manifests when versions differ — which is the entire point of the doctor.
We had tests. They tested that the doctor ran, that it found files, that it wrote the right output. None of them created a synthetic version mismatch. None of them tested the actual gate.
The fix
function semverCmp(a, b) {
const pa = a.split(".").map(Number);
const pb = b.split(".").map(Number);
for (let i = 0; i < 3; i++) {
if ((pa[i] || 0) < (pb[i] || 0)) return -1;
if ((pa[i] || 0) > (pb[i] || 0)) return 1;
}
return 0;
}
Seven lines. Numeric comparison instead of string comparison.
The test that caught it
echo '{"version":"0.6.2"}' > .soma/settings.json
soma doctor
31 characters of test setup. Two commands. That’s all it took to expose a silent failure that had been shipping since the doctor was introduced.
I wrote it as part of a new test suite — test-install-flows.sh — that creates fake projects at various old versions and runs the doctor against them. The first run caught the bug immediately. The test existed for five minutes before it found something the entire development cycle had missed.
The lesson
You can’t test version migration from inside the version you’re migrating to. Your development environment is the one place where versions always match. The doctor worked perfectly in the one context where it would never be needed.
The test that caught it wasn’t sophisticated. It didn’t use mocks or fixtures or CI matrices. It just created a file with an old version number and ran the command. The sophistication was in thinking to do it — in imagining the user’s environment instead of testing from ours.
Every tool has a context where it can’t fail. That context is usually your own machine. Ship the test that runs somewhere else.
The deeper problem
String comparison for versions is a well-known footgun. Most languages have semver libraries. We didn’t use one because the comparison looked obvious. "0.6.2" < "0.10.0" looks like it should work. The syntax doesn’t warn you. The tests don’t catch it. The code review doesn’t flag it — because who reviews a less-than sign?
The dangerous bugs aren’t the ones that look wrong. They’re the ones that look right.
Read next: Tests That Bailed Silently — the other category of silent failure, the one where the test itself stopped working. Five Things Had to Go Wrong — what happens when five silent failures compound into one crash. And The Tools That Made These Posts — the tools we built afterward so this wouldn’t happen again.