From Crisis to Innovation: The Origins of BugZero

Miles Lancaster

Architecture, Compliance, and Security

This is a story that I don’t often tell, but it is worth sharing.

Several years ago, I was working as the chief architect for a leading IT firm. I was dedicated to a $100M account providing hosting services to a $12B global company. One otherwise pleasant afternoon, we suffered a storage outage. It lasted for several hours and severely impacted the client’s business. We restored operations, but the cost to the client was significant, and the damage to our relationship was permanent.

Initially, this outage was considered a hardware failure – a common scapegoat in the industry. It's easy to blame outages on hardware, and that explanation often satisfies everyone. However, it is exceedingly rare for modern hardware to fail. Transistors don't often burn out, and when a hardware component does fail, most systems are redundant and will fail over to the redundant components automatically.

So, I dug into the situation further and discovered that it was not a hardware failure, but a software bug related to storage controller firmware. Yes, the storage controller “failed”, but I wouldn’t call it a “hardware failure”. In reality, one little bug brought down the operations of a global company.

This incident illuminated the underlying problem of how hardware and software bugs are managed and communicated within the industry. We sweep them under the rug, preventing opportunities for learning and improvement. Our industry just doesn’t like to talk about bugs.

There Had To Be A Better Way

That wasn't the first time I dealt with an outage, but I do hope it was the last. Outages are costly, often requiring SLA payouts or service credits that never truly repair the damage. Across all industries, more than two-thirds of outages incur costs exceeding $100,000. For the Fortune 1000, those costs total over $1 million... per hour! Believe me when I say it is a terrible experience for everyone involved.

Not long after that event, I was contacted by Eric. He shared his experiences with bugs, and I told him mine. He posed the question: Why don't we have a solution that tells us when the software running the storage controller is broken? After all, we have monitoring tools for everything else; why is there nothing focused on software bugs?

Eric shared his vision for a solution that would detect software bugs before they caused damage: a platform that went beyond monitoring to preemptively identify software problems and coordinate mitigation of the risks they posed. Had that vision been realized in time, it would have saved the relationship between my client and my IT firm.

That was the first of many conversations as BugZero began to take shape.

Our goal is more than providing an innovative solution. We want to foster a culture of openness, continuous improvement, and accountability. Discussing outages and bugs shouldn’t be taboo – it should be encouraged.

“The purpose of technology is to augment humanity.”

-- IBM’s Principles of Trust and Transparency

BugZero: Right on Time for the Future

The response to BugZero reflects how the industry, and the regulations that govern it, are changing.

In 2025, the European Union will begin enforcing the Digital Operational Resilience Act (DORA), which introduces many new requirements. Particularly relevant to BugZero, Article 10, point 4(a) requires financial entities to

"identify and evaluate available software and hardware patches and updates using automated tools, to the extent possible".

To our knowledge, this is the first time a requirement like this has been written into regulation, and BugZero is a solution designed to do exactly what the regulation mandates.

We’ve built the publicly available Operational Defect Database, a repository that catalogs operational defects for many popular IT hardware and software products. We’re providing this knowledge hub as a tool for IT Operations professionals. By democratizing this data, we hope to foster a community of openness and information sharing.

For our commercial clients, we've built a seamless integration with ServiceNow, the leading ITSM platform. BugZero extends the scalable and proven ITOps problem management capabilities of ServiceNow to this long-overlooked category of IT risk.

Our journey has just begun as we work to effectively manage today’s (and tomorrow’s) complex IT operations, emerging operational resilience requirements, and disruptive technologies.

Join us as we work to eliminate outages — one bug at a time.