Where were you when the BlackBerries went out? The eight-hour BlackBerry outage this week generated plenty of news because RIM's BlackBerry systems are usually so reliable -- and because it showed that yes, they do have a single point of failure. RIM released a statement today explaining, somewhat obliquely, what happened.

In English: RIM ran a software upgrade on its servers, the upgrade crashed its database, and switching over to the backup system took longer than expected. I don't see this as crippling for RIM, because the service is normally so reliable. All the same, it's been a bit of a PR boost for competitors like Microsoft, which is trumpeting that its push e-mail doesn't all go through one server.

I was just discussing this with former IT guy Joel Santo Domingo, our desktops analyst, and here are the lessons we found for IT guys:

1. Don't run upgrades on your live servers. Switch over to a backup, run the upgrade, switch back.
2. If you're going to run an upgrade, do it on a Saturday night. Don't do it on a Tuesday.
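
For the IT guys who'd rather see lesson 1 as code, here is a minimal sketch of the switch-upgrade-switch-back routine. It is only an illustration, not how RIM's systems work: the `Server` and `Cluster` classes, the `upgrade_offline` function and the `health_check` callback are all hypothetical names made up for this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Server:
    """Hypothetical stand-in for one server; not a real RIM component."""
    name: str
    version: str
    healthy: bool = True


class Cluster:
    """A toy primary/backup pair where exactly one server is live."""

    def __init__(self, primary: Server, backup: Server) -> None:
        self.live = primary
        self.standby = backup

    def switch(self) -> None:
        # Refuse to move traffic onto a server that failed its checks.
        if not self.standby.healthy:
            raise RuntimeError(f"{self.standby.name} is not healthy")
        self.live, self.standby = self.standby, self.live


def upgrade_offline(cluster: Cluster, new_version: str,
                    health_check: Callable[[Server], bool]) -> bool:
    """Upgrade the current live server only after taking it out of service.

    Returns True if the upgraded server is back in service, False if it
    failed its health check and users were left on the backup.
    """
    target = cluster.live
    cluster.switch()                      # the backup now serves users
    target.version = new_version          # upgrade the server that is offline
    target.healthy = health_check(target)
    if not target.healthy:
        return False                      # stay on the backup, investigate
    cluster.switch()                      # switch back to the upgraded server
    return True


cluster = Cluster(Server("primary", "1.0"), Server("backup", "1.0"))
ok = upgrade_offline(cluster, "1.1", health_check=lambda s: True)
print(ok, cluster.live.name, cluster.live.version)
```

The point of the design is that a bad upgrade never touches the machine users are on: if the health check fails, everyone simply stays on the untouched backup.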



RIM says: RIM's in-depth diagnostic analysis of the service interruption that occurred in North America on Tuesday night is progressing well and RIM will continue to provide further information as it's available. RIM's first priority during any service interruption is always to restore service and then establish, monitor and maintain stability. Proper analysis can take several days or longer and RIM's commitment is to provide the most accurate and complete information possible in such situations.

RIM is pleased to report that normal conditions returned on Wednesday and the BlackBerry service continues to operate normally today.

RIM has been able to definitively rule out security and capacity issues as a root cause. Further, RIM has confirmed that the incident was not caused by any hardware failure or core software infrastructure.

RIM has determined that the incident was triggered by the introduction of a new, non-critical system routine that was designed to provide better optimization of the system's cache. The system routine was expected to be non-impacting with respect to the real-time operation of the BlackBerry infrastructure, but the pre-testing of the system routine proved to be insufficient.

The new system routine produced an unexpected impact and triggered a compounding series of interaction errors between the system's operational database and cache. After isolating the resulting database problem and unsuccessfully attempting to correct it, RIM began its failover process to a backup system.

Although the backup system and failover process had been repeatedly and successfully tested previously, the failover process did not fully perform to RIM's expectations in this situation and therefore caused further delay in restoring service and processing the resulting message queue.

RIM apologizes to customers for inconvenience resulting from the service interruption. RIM's root cause analysis and system enhancement process with respect to this incident is ongoing and RIM has already identified certain aspects of its testing, monitoring and recovery processes that will be enhanced as a result of the incident and in order to prevent recurrence.
Copyright © 1996-2009 Ziff Davis Publishing Holdings Inc. All Rights Reserved. PC Magazine, the PCMag.com logo and Gearlog are registered trademarks of Ziff Davis Publishing Holdings Inc. Reproduction in whole or in part in any form or medium without express written permission of Ziff Davis Media Inc. is prohibited.