If you have multiple terabytes of data then you may want to consider buying a NAS appliance (or building a NAS, which will be discussed in a future post). They are standalone machines such as this one that connect to your network and present themselves as a network drive. They usually have multiple bays for hard drives (typically not included) and usually offer several different RAID (Redundant Array of Independent Disks) options, alongside JBOD (Just a Bunch of Disks).

What is RAID?

RAID offers automatic redundant copies of your data by pooling together multiple hard drives and writing multiple copies[1]. But, as stated in any discussion about RAID, RAID is not a backup. It protects you against hardware failures only.

For each of the RAID configurations, assume that you have X drives, each with capacity Y. Most RAID implementations can only use, per drive, the capacity of the smallest drive in the array. That is, if you have three 4 TB drives and one 3 TB drive, the array can only use 3 TB on each of the three 4 TB drives.

  • RAID 0
    • This actually doesn't add any redundancy, but instead increases performance. Your files get split across the drives in the array (otherwise known as striping) so that when you read from or write to the array, you can do so on all drives simultaneously, increasing performance by up to a factor of X.
    • If you lose any drive in the array, you lose ALL the data in the array (because of striping, you're now missing pieces of many files)
    • Your usable capacity is XY
    • You need at least two drives
  • RAID 1
    • This is a simple mirror. All data is written to two drives simultaneously
    • Your usable capacity is XY/2
    • There is no write performance penalty
    • Sustained read performance will increase by a factor of X/2
    • You can lose multiple drives without data loss, but they have to be the right drives. Each drive is mirrored to its partner, so as long as you don't lose both drives that are mirrors of each other, you can recover your data
    • You need an even number of drives
  • RAID 5
    • Instead of storing a second copy of your data, RAID 5 stores something called parity alongside your data
    • You can lose any one drive without data loss. By using the parity and the remaining data, the missing data can be reconstructed
    • You need at least three drives
    • Write performance is roughly that of the slowest disk, because every write has to update parity as well as data. Reads from a healthy array skip the parity computation
    • Your usable capacity is (X-1)Y
  • RAID 6
    • This is similar to RAID 5 except there are two different parity values calculated alongside your data
    • You can lose any two drives without data loss
    • You need at least four drives
    • Write performance is roughly that of the slowest disk, since every write has to update both parity values as well as the data
    • Your usable capacity is (X-2)Y
  • RAID 10 (1+0)
    • This is a stripe of mirrors.
    • You can lose multiple drives, but if you lose both drives in any one mirror, you will lose all your data (because the mirrors are striped)
    • You need at least four drives
    • The read and write performance is increased by a factor of the number of mirror sets
    • Your usable capacity is XY/2
  • RAID 0+1
    • This is a mirror of stripes
    • You can technically lose multiple drives so long as they are of the same stripe. However, once you lose the first drive, your RAID controller will stop writing to any drives in the stripe that contains the failed drive, so in theory your other drives in the same stripe are not likely to fail
    • The read and write performance is increased by a factor of X/2
    • Your usable capacity is XY/2
  • JBOD
    • Not a RAID, but just a way to combine your drives into one logical drive
    • You get to use all the capacity of your drives, even if they are mismatched sizes
    • If you lose a drive, then depending on how the JBOD was implemented, you may be able to recover some files. Obviously you will lose any files that resided on the dead disk. But there's also the chance that the JBOD implementation decided to span a file across the two drives
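
To make the parity idea from the RAID 5 entry a bit more concrete, here is a minimal Python sketch that uses XOR as the parity function. The block contents and the helper name `xor_blocks` are made up for illustration, and the sketch assumes a single dedicated parity block; real RAID 5 implementations spread parity across all the drives and work on much larger blocks.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data blocks, one per data drive in a four-drive array.
data = [b"AAAA", b"BBBB", b"CCCC"]

# The parity block would live on the fourth drive.
parity = xor_blocks(data)

# Simulate losing the second drive, then rebuild its block from the
# surviving data blocks plus the parity block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])
```

XOR works here because XOR-ing a value with itself cancels it out, so combining the survivors with the parity leaves only the missing block. Lose two blocks and there is no longer enough information, which is why this scheme only tolerates one failure.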

NAS appliances will typically offer RAID 0, RAID 1, and JBOD, and if they have the appropriate number of drive bays, RAID 5 and RAID 10.

I recommend either RAID 1 for its simplicity, or RAID 5 or RAID 6 (if available) for the lower loss in capacity. RAID 5 and RAID 6 do require extra computation for every write, so they can result in slow performance if the CPU in the appliance isn't powerful enough. Make sure to read the reviews before buying one.
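
If you want to compare the capacity cost of each option for your own drives, the formulas from the list can be dropped into a small Python sketch. The function name `usable_capacity_tb` and the example drive sizes are my own placeholders, and the sketch doesn't check the minimum drive counts for each level, so treat it as a rough calculator rather than anything authoritative.

```python
def usable_capacity_tb(level, drive_sizes_tb):
    """Usable capacity in TB, following the formulas in the RAID list."""
    x = len(drive_sizes_tb)
    # RAID can only use the smallest drive's capacity on every drive.
    y = min(drive_sizes_tb)
    if level == "JBOD":
        # JBOD uses every drive in full, even with mismatched sizes.
        return sum(drive_sizes_tb)
    formulas = {
        "RAID 0": x * y,
        "RAID 1": x * y / 2,
        "RAID 5": (x - 1) * y,
        "RAID 6": (x - 2) * y,
        "RAID 10": x * y / 2,
        "RAID 0+1": x * y / 2,
    }
    return formulas[level]


# Hypothetical array: three 4 TB drives and one 3 TB drive.
drives = [4, 4, 4, 3]
for level in ("RAID 0", "RAID 1", "RAID 5", "RAID 6", "RAID 10", "RAID 0+1", "JBOD"):
    print(level, usable_capacity_tb(level, drives))
```

With four drives, RAID 5 gives up one drive's worth of space and RAID 6 gives up two, while the mirrored layouts give up half the array.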

I should mention there is some debate on whether RAID 5 is viable today. There's a fairly (in?)famous article on how RAID 5 doesn't work for consumer drives because the chances of a drive failure during a rebuild (when you put in a new drive after a drive failure and the system reconstructs the data) are high enough to cause concern. Remember, in a RAID 5 array, you can only lose one drive. If you lose a drive during a rebuild, you've lost two drives and all your data.
But then you have discussions like this one where they disagree.

I go into more detail in my next post in this series, where I discuss my own setup, but I take the conservative approach and use RAID 6. It only costs one more drive. Furthermore, drive failures are correlated. I mitigate this by not buying all my drives from the same brand. Still, when all your drives are in the same system, in close proximity, experiencing vibration from the other drives spinning at 5400-7200 rpm, and handling the same workload, the chances of a second drive dying after your first one dies are higher than if each drive were installed alone in a different system.
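
For a rough feel of the rebuild risk, here is a back-of-the-envelope Python sketch. It assumes drive failures are independent and happen at a constant rate, which is exactly the assumption that correlated failures break, so read the result as an optimistic lower bound. The function name and all the numbers (failure rate, rebuild time, drive count) are hypothetical, not measurements.

```python
def second_failure_probability(remaining_drives, annual_failure_rate, rebuild_hours):
    """Chance that at least one remaining drive fails during the rebuild.

    Assumes independent failures at a constant rate (an optimistic assumption).
    """
    hours_per_year = 24 * 365
    # Probability that a single drive survives the rebuild window.
    survive_one = (1 - annual_failure_rate) ** (rebuild_hours / hours_per_year)
    # Every remaining drive has to survive for the rebuild to finish.
    return 1 - survive_one ** remaining_drives


# Hypothetical numbers: a four-drive array with one dead drive,
# a 2% annual failure rate, and a 24-hour rebuild.
p = second_failure_probability(remaining_drives=3, annual_failure_rate=0.02, rebuild_hours=24)
print(f"{p:.4%}")
```

The independent-failure number comes out small. The argument for RAID 6 is that the real-world figure is higher once you account for shared vibration, workload, and drive age.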

  1. Yes, I know, RAID 5 and RAID 6 don't store an entire copy; they store parity instead. I'll get to that.