THE MINDSET DIFF

9 phase shifts from story to proof.

A side-by-side comparison showing how the same task gets handled with old vs new thinking. Every shift closes an open loop.

See the Example View All Shifts

I've fixed the email duplicate bug by:

1. Adding email normalization in the signup handler
2. Adding email normalization in the login handler
3. Both now call toLowerCase() and trim()

This should prevent duplicate accounts from being created
with different email capitalizations.

The change looks correct and follows our patterns.
Let me know if you'd like any adjustments!

WHAT'S WRONG:

"should prevent" = unverified claim
"looks correct" = vibes, not proof
No test, no verification command, no evidence
If it breaks, we won't know until production

## Plan
Goal: Prevent duplicate accounts from email case variations
Non-goals: Not changing auth flow, not migrating existing data
Approach: Single normalizeEmail() at ingress boundary
Proof plan: npm run verify + new regression test

## Changes
- src/utils/email.ts: Added normalizeEmail(email: string)
- src/handlers/auth.ts: Call normalizeEmail at signup/login ingress
- tests/email.test.ts: Added case-sensitivity regression test

## Proof
$ npm run verify
✓ format (0.2s)
✓ lint (1.1s)
✓ typecheck (2.3s)
✓ unit tests (4.7s)
  ✓ email.test.ts
    ✓ normalizeEmail trims whitespace
    ✓ normalizeEmail lowercases
    ✓ Foo@Bar.com then foo@bar.com returns same user (REGRESSION)

## Risks & Mitigations
- Existing duplicate accounts: Not addressed (out of scope, needs data migration)
- Performance: normalizeEmail is O(n) on string length, negligible

WHAT'S DIFFERENT:

Every claim has executable evidence
Regression test proves the specific bug class is caught
npm run verify output is pasted, not described
Risks are explicit, not hidden

Human describes what they want

→

AI generates narrative

→

Human trusts the story

"Fix the bug" → "I fixed it" → "Great!"

No verification. No proof. Just vibes.

↓ PHASE SHIFT ↓

Human defines constraints

→

AI generates patch

→

Oracle judges

↑ feedback ←

"Fix the bug" → patch + test → verify: OK

Every claim terminates in executable evidence.

Division of Labor → Division of Verification

"Oracles, not org charts."

Team structure: 2 frontend, 3 backend, 1 QA

Quality comes from: multiple reviewers checking each other

AI role: "assistant" that generates code for humans to validate

Constraint structure: UI constraints, data constraints, reliability constraints

Quality comes from: oracles (tests, types, linters) judging every change

AI role: "agent" that generates AND proves, validated by machines

Documentation → Verification

"Proof over prose."

Success metric: "Is the code well-documented?"

Trust source: "The comments explain what it does"

Done when: "I described what I changed"

Success metric: "Does npm run verify pass?"

Trust source: "The test proves it works"

Done when: "Here's the command output showing it passes"

Planning → Control Theory

"Sensors before actuators."

Approach: "Let's plan this carefully upfront"

Risk management: "We thought through the edge cases"

Adaptation: "We'll adjust the plan if things change"

Approach: "What sensors tell us if we're wrong?"

Risk management: "What's the fastest way to falsify this?"

Adaptation: "The feedback loop auto-corrects; we just tune it"

One-Shot Fixes → Loop Discipline

"Propose → Patch → Prove → Pack."

Fix: "I changed the code"

Verify: "It looks right"

Done: "Moving on"

Propose: "Goal, non-goals, approach, proof plan"

Patch: "Small diff, focused change"

Prove: "format → lint → typecheck → test"

Pack: "Regression test + docs + changelog"

Bugs → Interface Gaps

"Interfaces are physics."

Diagnosis: "There's a bug in the email handler"

Fix: "I corrected the logic"

Prevention: "I'll be more careful next time"

Diagnosis: "The contract allows invalid state"

Fix: "I tightened the boundary constraint"

Prevention: "The type system now rejects this class of bug"

Implied Confidence → Explicit Uncertainty

"Certainty from tests, not vibes."

Response: "I've implemented the feature."

Ambiguity: Hidden in the confident tone

Risk: Discovered in production

Response: "I've implemented the feature. Uncertainty: auth edge case untested because X. Falsifier: added TODO test."

Ambiguity: Explicit, with mitigation

Risk: Visible before merge

Human QA → Oracle Specialists

"Specialists as tools, not people."

Quality gate: "QA team reviews the PR"

Feedback time: Days

Coverage: Whatever humans notice

Quality gate: "Type oracle + Style oracle + Security oracle + Test oracle"

Feedback time: Seconds

Coverage: Mechanized, consistent, exhaustive

Pre-Deploy Testing → Pre-Generation Truth

"Verify at the speed of thought."

When tests run: "Before we deploy"

Iteration speed: Slow (write → commit → CI → wait → fix)

Feedback distance: Large (bug created ↔ bug found)

When tests run: "Before we commit, ideally before we save"

Iteration speed: Fast (write → verify → fix → verify)

Feedback distance: Minimal (bug created = bug found)

Team Practices → Portable Invariants

"Tests, not tribal knowledge."

Knowledge location: "Our team does it this way"

Transferability: Low (tribal, context-dependent)

Drift: High (practices evolve silently)

Knowledge location: "The tests enforce it"

Transferability: High (clone repo, run verify, know truth)

Drift: Low (invariants are checked, not remembered)

If a new AI agent (or human) cloned this repo tomorrow with zero context, could they run one command and know if the codebase is healthy?

Could they make a change and know immediately if they broke something?

Could they trust the system without trusting any individual's narrative?

✓ If yes → you have proof-driven development.

✗ If no → you have story-driven development with automation sprinkled on top.

OLD THINKING	NEW THINKING	REMEMBER
Roles divide work	Constraints define truth	Oracles, not org charts
Document it well	Prove it runs	Proof over prose
Plan carefully	Control feedback loops	Sensors before actuators
Fix the bug	Close the loop	Propose → Patch → Prove → Pack
Code has bugs	Interface has gaps	Interfaces are physics
I think this works	Here's the proof	Certainty from tests
QA reviews code	Oracles judge diffs	Specialists as tools
Test before deploy	Truth at generation	Verify at speed of thought
Follow practices	Verify invariants	Tests, not tribal knowledge

The shift: Stop asking "Does this look right?"

Start asking "What command proves this works?"

Back to Main