Don’t be a hero
How firefighting engineers hide system failures and keep organizations fragile
Every organization has a hero.
The engineer who knows how to restart the stuck pipeline. The one who gets pinged when the dashboard looks wrong. The person who quietly fixes things before anyone notices there was a problem.
They are praised for being reliable, trusted for “always being there,” and celebrated for keeping the system running when it shouldn’t.
But that’s precisely the problem.
Having heroes in your organization is a clear sign that systems are fragile. And this fragility is being absorbed by a few individuals willing to firefight, improvise, and sacrifice their time to prevent failure from becoming visible.
From the outside, everything looks stable. SLAs are met. Incidents are rare. Delivery continues. From the inside, the system is held together by undocumented workarounds, manual interventions, and people who know where the bodies are buried.
And if you are a hero, as I used to be, you may feel a sense of ownership, but you are masking structural failure. Because as long as you are willing to save the system, the system has no reason to improve. You are a hero, yes, but you are in prison.
The comfort of heroics
Heroism feels good because it is rewarded. When something breaks and someone steps in to fix it quickly, everyone wins in the short term.
The incident is avoided, customers are not impacted, leadership stays calm, and the engineer earns trust and recognition. Over time, that trust compounds. Certain people become the default escalation path, the human fallback when automation, documentation, or processes fail. Ask yourself: If you were to leave your organization tomorrow, would it become an operational problem?
Organizations and teams get comfortable with heroism.
Why invest in fixing a brittle pipeline if “Alex knows how to unblock it”? Why prioritize observability when “someone is always around” to look at logs? Why pause delivery to redesign the system when problems are being handled quietly?
Heroism creates the illusion of resilience. The system appears stable because someone is constantly compensating for its flaws. And because the failures never surface, the organization never feels the urgency to change. The system doesn’t fail loudly enough to demand investment. It fails softly, absorbed by a few people who care too much to let it break.
People who care about their work are fantastic and should be rewarded for their dedication. But without realising it, they hide problems and teach the organization that the current state is acceptable.
How hero culture creates blind spots
Hero culture actively prevents learning. When one person repeatedly steps in to “save the day,” it creates structural blind spots.
Failures never register
If a pipeline or service breaks at 2 a.m. and someone manually patches it before users notice, the incident may never be recorded. No postmortem, no backlog item, no prioritization discussion. The hero, meanwhile, feels immense satisfaction at having avoided an incident. From the outside, everything looks fine. The hidden operational load is interpreted as success.
Knowledge concentrates
The hero learns more and more about edge cases, undocumented behaviors, and fragile dependencies. Others learn less. The system becomes harder to understand, and risk increases. Of course, it is always the same people who jump in when similar issues happen, simply because other engineers didn’t get the chance to learn from past incidents.
Priorities skew
Product and leadership teams respond to what they can see. If failures don’t cause downtime, missed deadlines, or customer complaints, they rarely compete with feature work. The system’s fragility never enters roadmap conversations because it never shows up as pain at the right level.
The cultural effect
Teams learn that raising issues is less valuable than fixing them quietly. Writing a proposal to redesign a component feels slower and riskier than just handling it yourself. Over time, engineers internalize the idea that “good engineers don’t let things fail,” even when failure would be the most honest signal. A blame culture reinforces this: to avoid blame, engineers quickly fix issues before they get escalated.
The paradox is that the better the heroes perform, the worse the system becomes.
At that point, the blind spot is complete: leadership believes the system is stable, teams feel constant low-grade stress, and the real risk remains unaccounted for until the hero is unavailable, burned out, or leaves.
And when that happens, the system collapses all at once.
Limiting hero culture without letting everything burn
“Don’t be a hero” is easy to say and hard to practice. No team wants outages, angry stakeholders, or broken user experiences to prove a point. And if it is “your” system, you don’t want to see it failing. I have been there. It’s hard to let your service fail when you could have avoided it.
But limiting hero culture is not about letting systems fail recklessly. It’s about making failure visible, teachable, and survivable.
The goal is not more incidents, but fewer invisible ones.
Separate response from resolution
It’s often necessary to restore service quickly. But restoring service is never the end of the story. If an incident required manual intervention, undocumented knowledge, or individual heroism, treat it as a system failure, even if users never noticed. Capture it, write it down, and make it visible to leadership and in planning conversations. Visibility matters more than incident severity.
Make fragility expensive
When heroes hide pain, nothing changes. Teams need mechanisms that translate hidden effort into explicit cost. This can be as simple as:
Logging manual interventions as incidents
Tracking “human retries” alongside system retries
Recording how often someone bypasses a process to unblock a deployment
Tagging tickets that required deep tribal knowledge to resolve
These signals reveal where the system relies on people rather than design. They also give you metrics and a cost estimate to bring to roadmap prioritisation sessions.
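As a starting point, the signals above can live in something as small as a shared log. The Python sketch below is one possible shape, not a prescribed tool: the `Intervention` fields, the `summarize` helper, the system names, and the sample entries are all hypothetical and made up for illustration. It records each manual fix, then totals the hidden effort per system and flags systems where only one person ever intervenes.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Intervention:
    """One manual fix that kept a failure from becoming visible."""
    day: date
    system: str             # hypothetical system name
    engineer: str
    minutes: int            # hands-on time spent on the fix
    tribal_knowledge: bool  # True if no runbook or doc covered the fix


def summarize(log):
    """Total the hidden effort per system so it can sit next to feature work."""
    summary = defaultdict(
        lambda: {"count": 0, "minutes": 0, "undocumented": 0, "engineers": set()}
    )
    for item in log:
        row = summary[item.system]
        row["count"] += 1
        row["minutes"] += item.minutes
        row["undocumented"] += int(item.tribal_knowledge)
        row["engineers"].add(item.engineer)
    return dict(summary)


# Made-up sample entries, for illustration only.
log = [
    Intervention(date(2024, 3, 4), "billing-pipeline", "alex", 45, True),
    Intervention(date(2024, 3, 11), "billing-pipeline", "alex", 30, True),
    Intervention(date(2024, 3, 12), "deploy-gate", "sam", 15, False),
]

for system, row in sorted(summarize(log).items()):
    # A single name in "engineers" suggests knowledge is concentrating.
    single_owner = len(row["engineers"]) == 1
    print(
        f"{system}: {row['count']} interventions, {row['minutes']} min, "
        f"{row['undocumented']} undocumented, single owner: {single_owner}"
    )
```

The exact fields matter less than the habit: once every quiet fix lands in the log, the totals can be brought into planning alongside feature requests.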
Design for shared ownership
Hero culture thrives when systems are implicitly owned. If only one person truly understands a workflow, that is not expertise; it is risk. Rotate on-call. Pair during incident response. Require that fixes include documentation or automation. And if you are a hero, stop doing and start mentoring.
Leadership behavior matters more than process
If leaders reward “saving the day” but don’t reward making the system boring, hero culture will persist. Celebrating uptime while ignoring the human cost sends a clear message: keep absorbing the pain.
Instead, start recognizing work that removes the need for intervention. Celebrate simplification and automation work. Praise engineers who slow things down to fix root causes, even when it delays short-term delivery. The goal is to shift pride from individual endurance to collective resilience.
Systems improve when they are allowed to fail in controlled, observable ways. Teams improve when they are not expected to carry the system on their backs.
The hardest part is trusting that the organization can handle seeing the truth.
Conclusion
Hero culture feels good in the moment. It creates stories of dedication, ownership, and people who “always come through.” But over time, it quietly damages the system. It hides fragility, concentrates risk, burns out the most capable engineers, and teaches the organization the wrong lessons.
Healthy engineering organizations don’t eliminate failure. They make it visible, shared, and actionable. They design systems that degrade safely, recover predictably, and improve because something went wrong. This does require a healthy environment, one where surfacing a problem does not attract blame.
“Don’t be a hero” is not a rejection of responsibility. It’s a commitment to building systems that don’t require sacrifice to function. It’s choosing long-term resilience over short-term heroism, boring reliability over dramatic recoveries.
So the next time someone saves the day, ask different questions:
Why did the system need saving in the first place?
What would it take to make sure no one has to do that again?
Do you rely too much on that person?
And now, pause for a moment. Who are your heroes? Go and thank them for their work. They are doing much more than what you see. Next, plan how to get them out of prison.


