It is common for a RAID to go entirely offline after a single drive failure and then for the rebuild to fail, even when a hot spare is present.
When attempting a rebuild, most people make the same mistakes, usually simply because it is their first rebuild and they follow what they have been told by the manufacturer or by their backup provider.
We are posting this page in the hope that you are reading it before performing the rebuild, so that we can help you avoid the most common mistakes and save your data.
NEVER RESTORE A BACKUP TO THE ORIGINAL DRIVES.
Never assume your backup is good, and never restore it to the RAID. Instead, restore the backup to another system first and confirm that the restored data is complete and usable.
Too many times customers have restored to the original drives without testing the backup, only to find out that the backup was bad. That is the POINT OF NO RETURN.
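One way to check a restore on a separate system is to compare it against checksums recorded when the backup was made. The sketch below assumes such a record exists as a simple mapping of relative paths to SHA-256 digests; that manifest format, and the names `verify_restore` and `sha256_of`, are hypothetical. Many backup products have their own verify feature, which you should use if it is available. Matching checksums show the files came back intact, but you should still open a sample of them to confirm they are actually useful.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_restore(restore_root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare a restored tree (on a separate system, never the failed RAID)
    against a hypothetical manifest of {relative path: SHA-256 hex digest}."""
    report: dict[str, list[str]] = {"ok": [], "mismatched": [], "missing": [], "unreadable": []}
    for rel_path, expected in sorted(manifest.items()):
        target = restore_root / rel_path
        if not target.is_file():
            report["missing"].append(rel_path)
            continue
        try:
            actual = sha256_of(target)
        except OSError:
            report["unreadable"].append(rel_path)
            continue
        key = "ok" if actual == expected else "mismatched"
        report[key].append(rel_path)
    return report
```

Anything listed as missing, mismatched, or unreadable means the backup cannot be trusted as the only copy of that data.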
We recommend not asking the vendor what to do in case of a RAID failure. Even though they want to get you up and running, they are not data recovery professionals, and they will often give advice that results in data loss and eliminates any further attempt to recover your files. I have seen this too many times.
DO NOT Force a Rebuild
If the RAID is not running in a crippled (degraded) state, DO NOT DO A REBUILD.
Forcing a rebuild is the worst thing you can do at this point. The RAID did not go into a rebuild for a reason: something is wrong. Until you determine what is wrong, the risk of making a wrong move and losing your data is very real.
When a RAID is forced into a rebuild, the procedure can write the wrong data to the new drive and then to the parity. An array that was able to rebuild correctly would have kept running in a crippled state instead of going offline.
Keep in mind that you have no way of knowing what is being written to the drive, and that the RAID must synchronize the parity across all of the drives, which means rewriting all of the parity. If the wrong data is written to the drives during the rebuild, it is game over.
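To see why, consider the XOR parity used by RAID 5. The sketch below is a simplified single-stripe illustration (real RAID 5 rotates parity across drives and works on much larger blocks); the block contents are made up. A missing drive is rebuilt by XOR-ing every surviving block with the parity, so if any survivor returns stale or wrong data, for example a drive that quietly dropped out of the array earlier, the rebuilt block is wrong too, and the controller has no way to tell.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks (RAID 5 style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe across three data drives plus a parity block.
d0, d1, d2 = b"ACCT", b"DATA", b"2024"
parity = xor_blocks(d0, d1, d2)

# Drive 1 fails. With good survivors, the rebuild recovers it exactly.
rebuilt = xor_blocks(d0, d2, parity)  # equals b"DATA"

# If drive 0 returns stale data, the rebuild silently produces garbage,
# and that garbage is what gets written to the replacement drive.
stale_d0 = b"ACCX"
bad_rebuild = xor_blocks(stale_d0, d2, parity)  # equals b"DATM"
```

Nothing in the rebuild flags `bad_rebuild` as wrong; it is simply written out as if it were correct.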
RAID controllers typically include some form of data scrubbing, which checks the data and parity on the drives and corrects whatever does not match, based on the controller's best interpretation of what it should be.
If the information is wrong to start with, scrubbing can only rewrite the wrong information. The pointers to the files will be wrong, or the wrong data will be written, and recovering the data can become nearly impossible.
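The sketch below shows how a scrub can make a bad situation permanent. It assumes the common case where a parity mismatch is resolved by recomputing parity from the data blocks; actual controller behavior varies by vendor and RAID level, and `scrub_stripe` is a hypothetical name. Before the scrub, the untouched parity can still recover the corrupted block. After the scrub, the parity agrees with the wrong data, the stripe looks "consistent", and the original contents can no longer be derived from it.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks (RAID 5 style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def scrub_stripe(data_blocks: list[bytes], parity: bytes) -> tuple[bytes, bool]:
    """Simplified scrub (assumption: data is trusted, parity is rewritten).
    Returns (parity after the scrub, whether parity was rewritten)."""
    expected = xor_blocks(*data_blocks)
    if expected != parity:
        return expected, True
    return parity, False


good = [b"INVC", b"0042", b"PAID"]
parity = xor_blocks(*good)

# A forced rebuild left the third data block wrong, but parity is untouched,
# so the original block can still be derived from the other blocks + parity.
damaged = [b"INVC", b"0042", b"DUE!"]
recoverable = xor_blocks(damaged[0], damaged[1], parity)  # equals b"PAID"

# The scrub sees a mismatch and rewrites parity to match the wrong data.
new_parity, rewritten = scrub_stripe(damaged, parity)
after_scrub = xor_blocks(damaged[0], damaged[1], new_parity)  # equals b"DUE!"
```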
So, what should you do, and what should you avoid?
First, clone all the drives.
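Cloning means copying every sector of each member drive to an image on a separate, healthy disk, reading from the original and never writing to it. Dedicated tools such as GNU ddrescue are normally used for this because they retry bad areas and keep a map of what has been copied. The sketch below only illustrates the idea: it opens the source read-only, copies in chunks, zero-fills chunks it cannot read, and records those ranges. The function name and the example device path are hypothetical, and reading a raw device usually requires administrator rights.

```python
import os


def clone_device(source: str, image: str, chunk_size: int = 1 << 20) -> list[tuple[int, int]]:
    """Copy `source` (e.g. a hypothetical /dev/sdb) to `image`, opening the
    source read-only. Unreadable chunks are zero-filled in the image and
    returned as (offset, length) ranges so they can be retried later."""
    bad_ranges: list[tuple[int, int]] = []
    with open(source, "rb", buffering=0) as src, open(image, "wb") as dst:
        size = src.seek(0, os.SEEK_END)  # total size of the source in bytes
        offset = 0
        while offset < size:
            length = min(chunk_size, size - offset)
            try:
                src.seek(offset)
                data = src.read(length)
                if data is None or len(data) != length:
                    raise OSError("short read")
            except OSError:
                # Never retry aggressively on a failing drive here; just
                # note the gap and keep the image aligned with zeros.
                data = b"\x00" * length
                bad_ranges.append((offset, length))
            dst.write(data)
            offset += length
    return bad_ranges
```

Each drive gets its own image, labelled with its position in the array, and any recovery work is then done on the images rather than the originals.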
The order of the drives must be known. (If there was ever a previous failure, the drives may no longer be in sequential order.)
If you are not the person who originally administered the server and there is no log, then you really don't know what has happened over its lifetime.
Never assume.
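Drive order matters because a RAID lays data out in chunks, round-robin across the member drives. The sketch below uses plain striping with no parity (RAID 0 style) and a tiny chunk size to keep it readable; real arrays also have a stripe size, a parity rotation, and sometimes a starting offset that must be known. The names `stripe` and `assemble` are hypothetical. Every byte is still present on the drives, but reading them back in the wrong order scrambles the result, and a controller that writes to the drives in the wrong order would start destroying good data.

```python
def stripe(data: bytes, n_drives: int, chunk: int) -> list[bytes]:
    """Split data round-robin across drives in fixed-size chunks (no parity)."""
    drives = [bytearray() for _ in range(n_drives)]
    for i in range(0, len(data), chunk):
        drives[(i // chunk) % n_drives] += data[i:i + chunk]
    return [bytes(d) for d in drives]


def assemble(drives: list[bytes], chunk: int) -> bytes:
    """Read chunks back round-robin, in the order the drives are given."""
    out = bytearray()
    row = 0
    while True:
        wrote = False
        for d in drives:
            piece = d[row * chunk:(row + 1) * chunk]
            if piece:
                out += piece
                wrote = True
        if not wrote:
            return bytes(out)
        row += 1


drives = stripe(b"ABCDEFGHIJKL", n_drives=3, chunk=2)    # [b"ABGH", b"CDIJ", b"EFKL"]
correct = assemble(drives, chunk=2)                       # b"ABCDEFGHIJKL"
swapped = assemble([drives[1], drives[0], drives[2]], 2)  # b"CDABEFIJGHKL"
```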
Never update the BIOS at this time. BIOS changes can alter the way the parity and the compression are handled.
Do not change the drive order. The drives can only operate in the order they were originally placed!
Do not initialize any drive. Initializing writes to the drive, and you should never write to a drive that needs to be recovered!
Never change the controller settings. The way they were set is the way the RAID operated. Altering them is not going to help and will cause the loss of vital information needed for recovery.
The best thing to do is call and ask a professional. Find out what gives you the best chance of a successful recovery before doing anything. It never hurts to ask.
Better to be safe than sorry.
We offer phone support if you must do it yourself.
If it is mission critical, we do not advise trying this yourself. There may be no going back from a mistake. Call and ask.