source : https://rickardnobel.se/how-raid5-works/
How RAID 5 actually works
In this article we will look in some detail how the RAID 5 parity is created and how it is possible to actually “read” from a destroyed disk in a RAID 5 set.
There are many sources on the web about the general principle of RAID 5, so we will not be covering that part here. In short, and as you might know, the RAID level 5 works with any number of disk equal or greater than 3 and places a parity sum on one disks in the set to be able to recover from a disk failure. (Striped blocks with distributed parity.) We can for example combine eight physical disk into a RAID5 set while only consuming the size of one disk for parity information. If any single drive breaks down we would still have full access to the data that was on the destroyed disk.
To understand how this is possible we have to look at the smallest unit, the binary bit, which could be 1 or 0. When doing mathematical calculations in binary we have several so called boolean algebra operations, for example the AND operation and the OR operation.
One of these low level logical operations is used heavily in RAID5: the XOR (“exclusive or”). XOR takes two binary digits and produces a true result if exactly one digit is true, (i.e. the other digit needs to be false).
Value A | Value B | XOR result |
0 | 0 | 0 |
0 | 1 | 1 |
1 | 0 | 1 |
1 | 1 | 0 |
This means that for example 1 XOR 0 = 1, and 1 XOR 1 = 0. Only one binary digits may be 1 for the result to be “true”, that is, 1.
Let us now see how the parity calculations are done in a RAID 5 set using XOR. If we assume we have a small RAID 5 set of four disks and some data is written to it. For simplicity we see only a half byte (4 bits), but the principle is true no matter of the stripe size or the number of disks.
On the first three disks we have the binary information 1010, 1100 and 0011, here representing some data, and we now have to calculate the parity information for the fourth disk.
If looking at the first “column” of the disks to the left we have 1, 1 and 0. If we use XOR to calculate the result that would be:
1 XOR 1 XOR 0 = Parity bit
This could be written as: (1 XOR 1) XOR 0 = Parity bit
This means first 1 XOR 1 = 0 for the first two disks and then the result of that, the zero, against the bit on the third disk. That is, the first result 0 with the last disk, also 0, means 0 XOR 0 = 0, which would give the final result to 0.
For the next “column”, to the right above, we have 0, 1 and 0. We do first 0 XOR 1 = 1 and then this result with the third disk: 1 XOR 0 = 1. The parity bit will here be 1.
For the third column we would have:
1 XOR 0 XOR 1 = Parity
Broken down: 1 XOR 0 = 1 and then 1 XOR 1 = 0
And finally the fourth column:
0 XOR 0 XOR 1 = 1
This will for all four columns end up with the parity sum of 0101.
If any of Disk number 1, 2 or 3 would break the parity information on Disk 4 could be used to recreate the missing data. Let us look how this is done. If we assume that disk number 2 unexpectedly goes down we have lost all read and write access to the real Disk 2, however with the help of the already recorded parity we might be able to calculate the information which is missing.
The primary feature in a RAID5 disk set is to be able to “access” the data on a missing disk. This is done by running the exact same XOR operation over the remaining disks and the parity information. Let us look at the first column again. 1 XOR 0 = 1 (for disk 1 and disk 3) and then 1 XOR 0 (the parity) = 1. This means that there must have been a binary digit of 1 on the missing disk. If we do the same operation on the other columns we will end up with 1100, which is exactly the same data that was on the failed drive.
The XOR operation itself is extremely quick and easily handled by the CPU or RAID controller, but the big downside is that we have to read against ALL other disks to recreate the data on the missing one. If having for example eight disks in the set with one broken, then a single read IO against the missing disk will create seven more disk IOs to calculate the lost data on the fly.
The XOR operation works perfect mathematically with one disk missing, but the moment a second disk is lost then we no longer have enough information to make the calculations. While it is possible to keep using the RAID5 set with one disk missing for some time with degraded performance, it is naturally very good to replace the damaged disk and begin the full re-creation as soon as possible (hot spare is quite handy here).
See also this blog post on the RAID 5 write penalty.