
The Tests That Bailed Silently

Five test suites had been passing zero assertions for weeks. CI was green the whole time. When we fixed them, four real bugs fell out.



A test runner showing pass, fail, and the third state most runners hide — bailed.

The thing I noticed

I was writing a new smoke-test suite called test-scripts.sh. It did one thing: for every script we bundle, run soma X help and assert exit 0. Twenty lines of bash. It caught three real bugs in the first run.
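The pattern is small enough to sketch in full. A minimal reconstruction, assuming a soma CLI with per-script help subcommands; the script names and the function layout here are illustrative, not the original twenty lines:

```shell
# test-scripts.sh (sketch) — for every script we bundle, run
# `soma <name> help` and assert exit 0. Script names are illustrative.

run_smoke() {
  local fail=0 name
  for name in maps skill-loader doctor somaverse install-flows; do
    if soma "$name" help >/dev/null 2>&1; then
      echo "PASS  soma $name help"
    else
      echo "FAIL  soma $name help"
      fail=1
    fi
  done
  return "$fail"
}

# In CI: run_smoke || exit 1
```

No mocks, no fixtures. Any script whose help path crashes, or whose entry point can't even load, fails the suite immediately.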

That felt cheap. I wondered what else my pattern would catch. So I went looking at the test suites that were already “known broken.”

Here’s what I found:

test-maps.sh           0/1 failing — CLI_DIR not found
test-skill-loader.sh   0/1 failing — CLI_DIR not found
test-doctor.sh         3 skipped    — stale path ref
test-somaverse.sh      4 skipped    — moved directory
test-install-flows.sh  0/1 failing — CLI_DIR not found

They weren’t broken in the “we have bugs” sense. They were broken in the “they don’t actually run” sense. Every one of them was exiting before the first assertion.

The cause

All five pointed at repos/cli/. We’d retired that repo days earlier — merged it into repos/agent/npm/ — and deleted the old path. Every suite that had referenced the old location now crashed on setup. Before any real assertion. Before testing anything.

CI reported them as failing. Nobody chased them because “known broken” is a category we all recognize. The red X had become decoration.

What fell out when I fixed the paths

I changed five path references. No test logic was touched. No mocks were added. Just: point at the real place.

Before: 800 assertions passing
After:  1041 assertions passing

241 new assertions ran for the first time in weeks. Most passed. Four failed.

All four were real bugs that had been silently shipping:

  1. Migration marker didn’t advance after a successful migration — so the next run would try to migrate again, and sometimes corrupt the state file in the process.
  2. Protocol heat reset to zero on reinstall — meaning users who had carefully trained heat levels over weeks would lose them the moment they ran soma update.
  3. Skill loader silently dropped entries whose directory name contained a hyphen — about 40% of the real-world skills people were writing.
  4. Doctor’s dry-run output didn’t match its wet-run output, so users couldn’t preview what the command would do.

Two of those had been reported as user-facing weirdness on Discord. Nobody connected them to the “known broken” CI suites because the CI suites looked like they were about something else.

The worst kind of test

A test that exits early with “skipping — dependency missing” is worse than no test at all.

It takes up the slot. It looks like coverage. The line on the dashboard says test-doctor.sh: 3 skipped and your eye reads that as “the doctor is tested — we just skipped some edge cases.” It isn’t tested. Nothing ran. The 3 is not “three checks we decided to skip.” It’s “three checks that never found out they were supposed to do anything.”

A real test result set has three states:

  • Pass — the assertion ran and held.
  • Fail — the assertion ran and didn’t hold.
  • Error — the assertion couldn’t run.

Most test runners conflate the third with the second. “Couldn’t run” gets logged as “skipped” or “broken setup” — a quieter failure mode than “actually failing.” That quietness is the bug. A skipped test looks like a decision. It was almost never a decision.
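In a bash harness, keeping the third state separate is a few lines of bookkeeping. A sketch, with the counter and function names mine rather than from our actual harness:

```shell
# Three-state result counting for a bash test harness (illustrative names).
PASS=0; FAIL=0; ERROR=0

# check <description> <command...>
# Runs the assertion: exit 0 counts as pass, nonzero as fail.
check() {
  local desc="$1"; shift
  if "$@"; then
    PASS=$((PASS + 1)); echo "pass  $desc"
  else
    FAIL=$((FAIL + 1)); echo "FAIL  $desc"
  fi
}

# require <description> <command...>
# A precondition. If it fails, the assertion couldn't run: that is an
# ERROR, not a skip, and the caller should abort the suite.
require() {
  local desc="$1"; shift
  if ! "$@"; then
    ERROR=$((ERROR + 1))
    echo "ERROR $desc (suite cannot run)"
    return 1
  fi
}

# Usage:
#   require "CLI_DIR exists" test -d "$CLI_DIR" || exit 1
#   check "maps help exits 0" soma maps help
```

The point of the separate ERROR counter is that it can never be mistaken for a decision: a nonzero ERROR count means some assertions never got the chance to run.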

The discipline

If a test can’t run, it should fail loudly with the exact missing setup.

# Bad
if [ ! -d "$CLI_DIR" ]; then
  echo "skipping — CLI_DIR not found"
  exit 0
fi

# Good
if [ ! -d "$CLI_DIR" ]; then
  echo "FATAL: CLI_DIR=$CLI_DIR does not exist"
  echo "This test cannot run. Fix the path or delete the suite."
  exit 1
fi

The first version exits 0. CI goes green. The suite looks fine. The bugs accumulate.

The second version exits 1. CI goes red. Someone either fixes the path or decides the suite is obsolete and deletes it. Either way, the wrong answer stops looking like the right answer.

The cost of a silently-bailing test suite is not the test — it’s the bugs it was supposed to catch, silently accumulating in the margin between “this suite fails” and “this suite doesn’t actually run.”
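One cheap backstop for that margin is to count the assertions that actually ran and fail CI when the number drops. A sketch, assuming TAP-style "ok" lines in a results log; the function name and file format are mine:

```shell
# min_assertions <expected-min> <results-file>
# Fails if fewer than <expected-min> assertions ran. Guards against
# suites that bail before their assertions without turning CI red.
min_assertions() {
  local want="$1" file="$2" got
  got=$(grep -c '^ok ' "$file" 2>/dev/null || true)
  if [ "${got:-0}" -lt "$want" ]; then
    echo "FATAL: only ${got:-0} assertions ran (expected >= $want)"
    echo "Some suite is probably bailing before its assertions."
    return 1
  fi
  echo "OK: $got assertions ran"
}

# In CI, after all suites finish:
#   min_assertions 1041 test-results.log || exit 1
```

A floor like this would have flagged the 800-versus-1041 gap the day the paths broke, instead of weeks later.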

The meta-lesson

The reason this went on for weeks wasn’t technical. It was cultural. Every developer who saw test-doctor.sh: 3 skipped trusted that someone had decided that was acceptable. Nobody had. The breakage was the tail of a cascading refactor that each developer assumed someone else had finished accounting for.

“Known broken” is a category we use to protect ourselves from chasing noise. But the category has a lifespan. After two weeks, anything in it should either be fixed or deleted. A test that’s been “known broken” for a month is not being kept — it’s being abandoned in public.

We fixed five paths. We got 241 new assertions. We shipped fixes for four bugs that would’ve bitten real users.

Five paths.

The test that catches a bug doesn’t have to be clever. It just has to run.


Read next: The Doctor That Never Worked — the sibling bug: a health check that reported fine while the patient was sick. And The Tools That Made These Posts — the batch-audit tool that now catches silent failures like this one before ship.