I'm sure you can't have failed to notice this; a complete blogging community wiped out due to negligence, incompetence, ignorance or whatever. Harsh? If you are hosting someone's data, you have a duty of care to ensure that it is protected, backed up etc. Or at least disclose that you are an idiot who does not know the first thing about protecting data.
RAID is not a back-up; RAID-5, RAID-6, RAID-1 or RAID-(whatever) protects you from spinning disk(s) failing. It does not protect you from corruption, deletion, complete RAID-set failure or the array bursting into flames. And while we are at it, Snapshots can help to protect you from some of those, but they do not protect you from complete RAID-set failure and other such catastrophic events.
You need to get your data off your primary disks and into a physically different location; this can be tape, another array, optical or a paper print-out. Keeping it in the same physical array is IMHO idiocy!
I'm fine, I replicate to another array (local or remote); well, that's a start. You've physically protected your data but you haven't logically protected it. So any data corruption will be replicated quite nicely, and at that point the replica is completely useless.
I'm okay, I take periodic snapshots at the remote site; that's better for sure, but how many snapshots? How quickly can you pick up data corruption? What happens if you run out of space for Snapshots due to some unusual update job? Do you stop the replication when you do upgrades to ensure you have a last known good copy?
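To put some numbers on the "how many snapshots?" question, here is a minimal sketch. It assumes snapshots are taken at a fixed interval and that you keep a fixed number of them; the function names `retention_window` and `last_known_good` are made up for illustration and do not come from any array vendor's tooling.

```python
from datetime import datetime, timedelta


def retention_window(interval, keep):
    """How far back the oldest retained snapshot reaches (assumes a fixed schedule)."""
    return interval * keep


def last_known_good(snapshot_times, corruption_time):
    """Return the newest snapshot taken strictly before the corruption, or None."""
    clean = [taken for taken in snapshot_times if taken < corruption_time]
    return max(clean) if clean else None


if __name__ == "__main__":
    # Hypothetical schedule: hourly snapshots, 24 retained.
    window = retention_window(timedelta(hours=1), keep=24)
    print("Retention window:", window)

    snapshots = [datetime(2009, 1, 6, hour) for hour in range(24)]
    # Corruption that crept in the day before: every retained snapshot contains it.
    print(last_known_good(snapshots, datetime(2009, 1, 5, 12)))
    # Corruption spotted within the window: there is still a clean copy to go back to.
    print(last_known_good(snapshots, datetime(2009, 1, 6, 10, 30)))
```

If it takes you longer to notice corruption than the retention window covers, every snapshot you hold is a faithful copy of the damage.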
Well I'm fine, I stream off to tape! How often do you test your back-ups? When was the last time you restored from them? You run 1000s of back-up jobs a night; you'll have some failures, so are you sure one of them wasn't the weekly full? Are you sure that those incrementals you've been taking are any use to you at all?
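One way to answer the "when did you last restore?" question is to restore into a scratch directory and compare checksums against the source. The following is only a sketch under stated assumptions: both trees are reachable as ordinary directories, and `build_manifest` and `compare_restore` are hypothetical helper names, not part of any back-up product.

```python
import hashlib
from pathlib import Path


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256()
            with path.open("rb") as handle:
                # Read in 1 MiB chunks so large files do not have to fit in memory.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[path.relative_to(root).as_posix()] = digest.hexdigest()
    return manifest


def compare_restore(source_manifest, restored_root):
    """Report files that are missing, corrupt or unexpected in the restored tree."""
    restored = build_manifest(restored_root)
    shared = source_manifest.keys() & restored.keys()
    return {
        "missing": sorted(set(source_manifest) - set(restored)),
        "corrupt": sorted(n for n in shared if source_manifest[n] != restored[n]),
        "unexpected": sorted(set(restored) - set(source_manifest)),
    }
```

The manifest should be taken at back-up time and kept somewhere other than the primary array; a manifest built from already-corrupted data will happily confirm a corrupted restore.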
So many questions; do you have answers? To all of them? We all wish that backup would simply go away but it won't. By the way, before you dive headfirst into the Cloud, surely these are questions you need to ask of any external cloud provider.
That reminds me, time I backed my blog up!!
P.S. A belated Happy New Year to you all!!
Yeah, I read that. Here's their explanation from their website:
"The list of potential causes for this disaster is a short one. It includes a catastrophic failure by the operating system (OS X Server, in case you're interested), or a deliberate effort. A disgruntled member of the Lagomorphics team sabotaged some key servers several months ago after he was caught stealing from the company; as awful as the thought is, we can't rule out the possibility of additional sabotage.
But, clearly, we failed to take the steps to prevent this from happening. And for that we are very sorry."
Did they have meetings where things like this came up?
"What happens if we get massive corruption, how do we go back? What happens if everything gets zapped?"
And so on. Jaw-dropping "what ifs" that should have had them putting safeguards in place - years ago.
Clearly there are places in the world where IT thinking isn't in place.
Sad.
Posted by: Rob | January 06, 2009 at 03:33 PM
Love it, so this basically is a load of tosh that explains in a blog post THEY HAD NO BACKUP PLAN.
Posted by: Daniel Eason | January 06, 2009 at 09:43 PM
Indeed, unfortunately I think there are a fair number of small IT-based companies who think that RAID==BackUp, Replication==BackUp etc! It's what happens when you let developers run infrastructure!
Posted by: Martin G | January 06, 2009 at 10:03 PM