Google has apologized and offered up an explanation of sorts for yesterday’s lengthy Gmail delivery delays. For those of you who receive so little email that you may have missed this (lucky you): on Monday, the company said that “less than 50 percent” of Gmail’s user base was seeing email slowdowns, where messages weren’t arriving in a timely fashion, and some attachments were failing to download.
Though we experienced, and heard from others who saw, messages arriving several hours after they were first sent, Google says today that the average delivery delay was just 2.6 seconds, and that 71 percent of messages were unaffected. However, it did admit that around 1.5 percent of messages were severely delayed, meaning by more than two hours.
The company had been regularly posting updates on the status of the issue throughout Monday on its Apps status dashboard, where the first report of trouble was publicly confirmed at 10:25 AM ET. Partial service restoration came at 3:00 PM ET, with a promise that full resolution was only an hour away. But when 4:00 PM ET rolled around, Google said that it would still take another 3 hours to fix the problem for all users. By 7 PM ET, it stopped promising a timeframe for a resolution, and said the fix would arrive “in the near future.” The issue wasn’t fully resolved until 10 PM ET, nearly 12 hours after the first report of trouble. The fix, we noticed, involved some previously read emails reappearing as unread messages, which added to the confusion.
Google apologized for the outage’s length this morning, saying “we realize that our users rely on Gmail to be always available and always fast, and for several hours we didn’t deliver.” It also offered a post-mortem, explaining that message delivery delays were triggered by a dual network failure, a rare event where two separate, redundant network paths both stop working at the same time. In layman’s terms, that’s getting pretty close to a worst-case scenario, given Gmail’s always-on nature and its uptime SLA for businesses on Google Apps.
The messages began piling up just before 9 AM ET, but although deliveries remained slow throughout the day Monday, Gmail itself never fully went offline. That gives Google some wiggle room, as its SLA is focused on downtime, measured by server-side error rates. (Google hints at this subject, noting in the post’s conclusion: “Gmail remains well above 99.9% available…!”) Technically speaking, Gmail was up all day, but had slowed down to the point that email itself became unreliable for many of its users, who had to turn to alternative and backup email accounts to get their work done. (Isn’t iCloud.com pretty, by the way?)
The company says it’s now planning for rare events like this, and will work to make Gmail message delivery more resilient to network capacity shortfalls going forward.