OK, first let me say I wasn't there, so like Will Rogers is quoted as saying, "All I know is what I read in the papers." In my case, the paper is Computerworld.
On April 21, McAfee released an update to their anti-virus application that disabled systems running Microsoft Windows XP SP3. This caused a bad day at McAfee, but even worse days at all the companies and homes that were impacted.
You can read the early account here:
Then, today, a more detailed explanation and apology was issued:
The bottom line is that a critical defect was missed and made its way to customers. Here are my observations as an interested bystander and software testing consultant:
1) The apology was cryptic for a technical audience. "We recently made a change to our QA [quality assurance] environment that resulted in a faulty DAT making its way out of our test environment and onto customer systems." However, no explanation of the change was given. Was a platform removed, or skipped? Was a test case skipped? Was there a rush to get the update out? Why was the environment changed? The blame seems to be on the change to the environment.
2) The risk was very high. This is the tester's worst nightmare and an example of what you don't want your company to go through, or your customers. This is a credibility-basher. I work with some software companies that say "We don't care about the risk. If there are problems, we'll just post a hotfix on the web site." Right.....
3) Testing can't find all the defects. However, testing is an easy target for blame. At least they didn't blame the testers - they blamed the process and the environment, which is probably the appropriate place to focus.
4) This points out a big risk for COTS applications - applying an update without testing it. I know the updating process is automated for large companies. However, one of the things I teach in my COTS testing class is to test the updates before rolling them out to the entire company. I suspect this will be one of those lessons learned for many people. The really troubling thing is for individuals who get impacted. They have no "test" PCs.
5) The customers want to hear from the CEO over this. So, your CEO doesn't seem to care about testing? This is a good case study to show why they should care.
6) Your test is only as good as your environment. You may have great tools, great testers and great processes, but if you have gaps in your environment, you don't know for sure what you are testing.
7) This is one of those head-bangers. Apparently, this was not one of those deeply embedded defects, but one that could have been found simply by applying the update to a commonly used platform. This is one of those defects that leaves management and customers asking, "Why didn't you guys test that?" (I refer you back to observation #1.) They aren't saying for sure.
8) More defects will escape in the future. The only question is, what will the impact be? If you really want a scare, take this scenario out to medical devices, aircraft, automobiles, utilities and other safety-critical applications. No matter how hard we try, there will still be defects because we can't test everything. That's not an easy reality to embrace because many people have grown to trust that software just works - mostly. Testers know better.
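
To make observation 6 a little more concrete, here is a minimal sketch of a release gate that compares the platforms you claim to support against the platforms actually present in the test environment. The function names and the platform lists are hypothetical, made up for illustration. This is not a claim about what McAfee's environment contained or what their process looked like, since they haven't said.

```python
def find_coverage_gaps(supported, tested):
    """Return supported platforms that the test environment does not cover."""
    return sorted(set(supported) - set(tested))


def release_gate(supported, tested):
    """Return (allowed, gaps); a release is blocked when any supported
    platform is missing from the test environment."""
    gaps = find_coverage_gaps(supported, tested)
    if gaps:
        return False, gaps
    return True, []


# Hypothetical platform lists, for illustration only.
SUPPORTED = ["Windows XP SP2", "Windows XP SP3", "Windows Vista SP2", "Windows 7"]
TESTED = ["Windows XP SP2", "Windows Vista SP2", "Windows 7"]

if __name__ == "__main__":
    ok, gaps = release_gate(SUPPORTED, TESTED)
    if ok:
        print("release allowed")
    else:
        print(f"release blocked, untested platforms: {gaps}")
```

A check like this won't find defects by itself. All it does is make a gap in the environment visible before an update ships, instead of after a customer finds it.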
OK, enough of being the armchair quarterback. This is serious and frustrating, but not the end of the world. A few weeks from now, it will all be forgotten. Actually, that's part of the problem. We experience the pain, the pain goes away, then we experience it again...and again.