⚙️ Operational Debt
Operational Debt: Risks related to the reliability, observability, and maintainability of the system in production. It arises when AI tools generate fragile logic that works on the happy path but fails unpredictably on edge cases, with no logging or error handling.
Characteristics
- Silent Failures: Errors are swallowed by AI-generated "robust" try-catch blocks.
- Zero Observability: No logging, tracing, or metrics included in the generated code.
- Non-Deterministic Behavior: AI components that behave differently under varying loads.
- Archaeology Debugging: Troubleshooting a production incident means guessing what the AI-generated logic intended.
- Cascade Failures: Small errors in one AI module cascade through the system with no circuit breakers.
Examples
1. The "Robust" Try-Catch
AI generates a data fetcher: `try { return fetch() } catch { return [] }`. When the API goes down, the system silently returns an empty list. The user sees an empty state, no error is logged, and the team takes days to realize there is a production outage.
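The explicit alternative can be sketched in a few lines (the `loadOrFail` helper and its label are invented for illustration, not from the book): errors are logged with context and re-thrown, so a dead upstream produces an alert rather than a silent empty list.

```typescript
// Hypothetical wrapper that refuses to swallow errors: it logs with
// context and re-throws, instead of `catch { return [] }`.
async function loadOrFail<T>(
  label: string,
  loader: () => Promise<T>,
): Promise<T> {
  try {
    return await loader();
  } catch (err) {
    // Surface the failure so monitoring can see it.
    console.error(`[${label}] load failed:`, err);
    throw err; // let the caller decide how to degrade, never return [] silently
  }
}
```

A caller can still choose to render an empty state, but only after the failure has been recorded, e.g. `loadOrFail("users", () => fetch(url).then(r => r.json()))`.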
2. Runaway Agents
A background agent tries to "fix" a bug overnight, creating 50 commits. One commit introduces a memory leak that only appears after 4 hours of production load. Rolling back safely is impossible because no one knows which of the 50 commits caused the leak.
3. The Context Trap
AI implements a feature that depends on a specific environment variable that doesn't exist in production. The code passes local tests but fails instantly on deploy.
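One way to defuse this trap is to validate configuration at process start rather than at first use. A minimal fail-fast sketch, assuming a Node.js runtime (the helper name and the `PAYMENTS_API_KEY` variable are invented for illustration):

```typescript
// Hypothetical fail-fast config check: a missing variable crashes the
// deploy immediately instead of failing hours later on first use.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (value === undefined || value === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Called once at process start, e.g.:
//   const paymentsKey = requireEnv("PAYMENTS_API_KEY");
```

The point of the design is timing: a bad deploy fails loudly during startup, where rollback is cheap, not mid-request in production.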
Severity Levels
| Level | Impact | Operations Risk |
|-------|--------|-----------------|
| 🔴 Critical | Production outage; data loss possible. | Unacceptable |
| 🟠 High | Debugging takes days instead of minutes. | High SLA risk |
| 🟡 Medium | Occasional silent failures; inconsistent data. | Manageable |
| 🟢 Low | Logging is inconsistent but the system is stable. | Low |
Remediation Strategies
- Observability Standards: Every prompt must include a requirement for structured logging and metrics.
- Explicit Error Paths: Forbid "empty catch" blocks; all errors must be surfaced or logged.
- Circuit Breakers: Implement boundaries that prevent failures in one AI-generated module from killing the system.
- Supervised Autonomy: Never allow agents to commit directly to production-critical paths without human review.
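The circuit-breaker strategy above can be sketched in a few dozen lines (class and parameter names are invented for illustration; real deployments usually reach for a hardened library instead): after a run of consecutive failures the breaker "opens" and rejects calls immediately for a cooldown period, so a dying module fails fast instead of dragging the rest of the system down with it.

```typescript
// Minimal circuit-breaker sketch. After `maxFailures` consecutive
// errors the breaker opens: calls are rejected immediately for
// `cooldownMs`, containing the blast radius of a failing module.
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;

  constructor(
    private readonly maxFailures = 3,
    private readonly cooldownMs = 30_000,
  ) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.isOpen()) {
      throw new Error("circuit open: failing fast");
    }
    try {
      const result = await fn();
      this.failures = 0; // a success closes the breaker
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.failures >= this.maxFailures) {
        this.openedAt = Date.now(); // trip the breaker
      }
      throw err; // still surface the underlying error
    }
  }

  private isOpen(): boolean {
    return (
      this.failures >= this.maxFailures &&
      Date.now() - this.openedAt < this.cooldownMs
    );
  }
}
```

This sketch omits the "half-open" probe state that production breakers use to test recovery; once the cooldown expires, the next call simply goes through and a success resets the counter.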
Related Vibe-Code Smells
Book Reference
Operating AI-generated code is covered in:
- Primary: Chapters 6 and 8.
- Orchestration: Chapter 15.
- Appendix A: Full debt catalog.