Approvals as a Service (AaaS)
Because nothing says “move fast” like waiting three days for an approval from someone who has no idea what the change is about.
My second most-feared task is patching data in production. In case you’re wondering, the first is attending a company team-building event.
Unfortunately, I’ve just been assigned a new task: updating millions of records in a database where the only way to change the data is through Kafka topics.
Over the past three weeks, I have done my best to delay this, from testing the weirdest edge cases to simulating the change under unrealistic loads. But I’ve run out of excuses.
So it’s time to write this change request document. And get the necessary approvals.
Twenty-four hours after submitting the request, I still haven’t received a single approval from any of the five levels of reviewers. Anything else would have been surprising. After all, why would I dread this task so much if I didn’t have to chase people to acknowledge the change?
First, the Principal Engineer. He has no idea what any of this is about. He is probably crossing his fingers that we’re not doing anything wrong. Or he will blame it on my poor communication skills.
Next, the product folks. Special treatment for them: I will wait for the stand-up to follow up on the approval. My ego wants to call them out as a blocker. For once, the roles would be reversed!
Then, the infrastructure engineers. We ran the load tests together, but they are as afraid as I am of running this in production. At least, if something goes wrong, we will be in the same boat.
Almost there. Now, the release management team. It is already thirty minutes past the expected execution time mentioned in the document. After a few pings, I get the approval. I guess we will backdate the activity?
And finally, one more acknowledgement is needed. From one specific person. I have no idea what his role is, or why he has a say. But process is process.
Two hours later, still no answer on Slack. The team gets creative: “Why don’t we just go ahead? It’s easier to ask for forgiveness than permission.” Tempting. But I can already picture myself on a post-mortem call, admitting I skipped the last approval. Not worth it.
And just as I’m about to go home, I see that the final approval has been granted! Why is the timing always so odd?
Anyway, let’s do it.
The team jumps on a call. We do our system checks. We start our prayers. And we trigger the pipeline.
Failed. We are missing a secret value. We ping our production support team. Luckily, they are still available. They fix the issue within ten minutes.
We do another round of system checks. All good.
Wait. No. A company SEV-1 has just been triggered. We have two options:
1. Postpone our activity.
2. Go ahead and potentially add chaos to the chaos.
A difficult choice after so much administrative effort today. But out of respect for our fellow SREs, we decide to postpone until tomorrow.
Should I build an “Approval Timing Leaderboard”? To track the monthly time spent by each engineer chasing signatures? The winner gets a free deployment wildcard!
Something to keep in mind. But for now, I can go home and get some rest. Because tomorrow, I will have to chase approvals again.