RAID stands for Redundant Array of Inexpensive (or sometimes "Independent") Disks. RAID is a method of combining multiple hard disks into a single logical unit to offer high availability, improved performance, or a combination of both. Depending on the RAID level chosen, this provides better resilience, better performance, or both, compared with a single disk drive.
Benefits of RAID
Software RAID
Many operating systems provide functionality for implementing software-based RAID systems. Software RAID runs the RAID algorithms on the server CPU, which can severely limit RAID performance. Should the server fail, the whole RAID system is lost. Software RAID is cheap to implement and needs only a single SCSI controller.
Hardware RAID
With hardware RAID, all RAID algorithms run on the RAID controller board, freeing the server CPU. This allows the full benefits and data protection of RAID, and is more robust and fault tolerant than software RAID. It requires a dedicated RAID controller to work.
Various RAID levels exist. These are:
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 10 | 50 | 0+1 |
The level of protection varies with the RAID level selected. RAID level 0 is not technically RAID, as it has no redundancy in the event of a drive failure.
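To make the difference between levels concrete, the following minimal sketch computes usable capacity for a few of the levels listed above. The formulas are the commonly cited ones for arrays of identical drives and are not taken from this page; the function name usable_capacity and its parameters are hypothetical.

```python
def usable_capacity(level, drives, drive_size_gb):
    """Usable capacity in GB for an array of identical drives (illustrative only)."""
    if level == "0":
        # Striping only: every drive holds data, nothing is redundant.
        return drives * drive_size_gb
    if level == "1":
        # Mirroring: every drive holds the same copy of the data.
        if drives < 2:
            raise ValueError("RAID 1 needs at least 2 drives")
        return drive_size_gb
    if level == "5":
        # One drive's worth of space is used for parity.
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return (drives - 1) * drive_size_gb
    if level == "6":
        # Two drives' worth of space is used for parity.
        if drives < 4:
            raise ValueError("RAID 6 needs at least 4 drives")
        return (drives - 2) * drive_size_gb
    if level == "10":
        # Striped mirrors: half of the drives hold mirror copies.
        if drives < 4 or drives % 2:
            raise ValueError("RAID 10 needs an even number of drives, at least 4")
        return (drives // 2) * drive_size_gb
    raise ValueError(f"level {level!r} is not covered by this sketch")


for level in ("0", "1", "5", "6", "10"):
    print(level, usable_capacity(level, drives=4, drive_size_gb=1000))
```

With four drives, RAID 0 exposes all of the raw space, which is exactly why it offers no protection: no space is set aside for a second copy or for parity.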
JBOD Subsystems
These JBOD subsystems are high-density storage enclosures that can be configured for specific applications, including changes to disk drive form factor and number of drives per enclosure.
This includes configurations of 8, 12, 14, or 16 disk drives per enclosure.
RAID Features:
Write Through Cache
With Write Through Cache, data is written to both the cache and the drive as soon as it is received.
As the data is written to both places, should the information be required it can be retrieved from the cache for faster access.
The downside of this method is that a Write operation takes longer than a Write to a non-cached device. The total Write time is the time to write to the cache plus the time to write to the disk.
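The write-through behaviour described above can be sketched as a toy model. The class WriteThroughCache and its dictionaries are hypothetical stand-ins for controller memory and a physical drive, not any vendor's implementation.

```python
class WriteThroughCache:
    """Toy write-through cache in front of a slow backing store."""

    def __init__(self):
        self.cache = {}  # fast memory: block number -> data
        self.disk = {}   # stands in for the physical drive

    def write(self, block, data):
        # The write completes only after BOTH copies are updated, so its
        # cost is the cache write time plus the disk write time.
        self.cache[block] = data
        self.disk[block] = data

    def read(self, block):
        # Serve from the cache when possible; fall back to the disk on a miss.
        if block in self.cache:
            return self.cache[block]
        data = self.disk[block]
        self.cache[block] = data
        return data


wt = WriteThroughCache()
wt.write(7, b"payload")
print(wt.disk[7] == wt.read(7))
```

Because the disk copy is always current, losing the cache contents never loses data; the price is the slower write path.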
Write Back Cache
With Write Back Cache, the write operation does not suffer from the Write time delay.
The block of data is initially written to the cache; only when the cache is full, or the cache space is required, is the data written to the disk.
The limitation of this method is that, for a period of time, the storage device does not contain the new or updated block of data.
If the data in the cache is lost due to a power failure, it cannot be recovered.
When using Write Back Cache, a battery backup module prevents this data loss if the RAID loses power.
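A minimal sketch of the write-back behaviour follows, including the effect of a power failure with and without battery backup. The class WriteBackCache, the flush-when-full policy, and the power_failure method are illustrative assumptions; real controllers use more refined flushing policies.

```python
class WriteBackCache:
    """Toy write-back cache: writes land in the cache and reach disk later."""

    def __init__(self, capacity):
        self.capacity = capacity  # number of blocks the cache can hold
        self.cache = {}           # block number -> data not yet on disk
        self.disk = {}            # stands in for the physical drive

    def write(self, block, data):
        # Flush first if a new block would not fit (assumed policy).
        if block not in self.cache and len(self.cache) >= self.capacity:
            self.flush()
        # The write returns as soon as the cache is updated.
        self.cache[block] = data

    def flush(self):
        # Copy every cached block to disk, then empty the cache.
        self.disk.update(self.cache)
        self.cache.clear()

    def power_failure(self, battery_backed):
        if battery_backed:
            # The battery preserves cache contents until power returns,
            # at which point they can be written to disk.
            self.flush()
        else:
            # Without a battery, unflushed blocks are simply gone.
            self.cache.clear()


wb = WriteBackCache(capacity=2)
wb.write(1, b"new data")
print(1 in wb.disk)  # the disk does not yet hold the new block
```

The window between write and flush is the period during which the storage device is out of date, and it is this window that the battery backup module protects.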
Battery Backup
Provides data recovery in the event of power failure. Should the RAID controller fail, the battery backup module can be moved to the replacement RAID controller and operation will continue.
A downside of battery backup modules is that they lose capacity over time, need replacing every 12-24 months, add to the RAID cost, and hold the cached information for only up to 72 hours.
Hot Swap
Whenever a RAID system mentions hot swap, it means the components can be replaced while the RAID continues to operate.
Online Hot Spare
Should a drive fail within the RAID, the RAID will automatically utilise the hot spare and carry out a RAID rebuild.
Hot spares can be of two types:
a) A local hot spare is available only to a specified RAID set.
b) A global hot spare is available to multiple RAID sets.
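The distinction between the two types of hot spare can be sketched as a selection routine. The function pick_hot_spare, its data structures, and the choice to prefer a local spare over a global one are all assumptions made for illustration; actual controllers may order their choices differently.

```python
def pick_hot_spare(failed_set, local_spares, global_spares):
    """Return the spare drive to rebuild onto, or None if none is available.

    local_spares:  dict mapping a RAID set name to its dedicated spare drives
    global_spares: list of spare drives shared by all RAID sets
    """
    dedicated = local_spares.get(failed_set, [])
    if dedicated:
        # A local spare may only serve the RAID set it was assigned to.
        return dedicated.pop(0)
    if global_spares:
        # A global spare may serve any RAID set on the controller.
        return global_spares.pop(0)
    return None


local = {"set_a": ["disk_9"]}
shared = ["disk_12"]
print(pick_hot_spare("set_a", local, shared))  # uses set_a's own spare
print(pick_hot_spare("set_b", local, shared))  # falls back to the global spare
```

Note that set_b can never take disk_9, even if it fails first, because that drive is reserved for set_a.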
Read Ahead Caching
Read-ahead caching is a buffering technique used by hard disk drives and other disk access devices, in which extra data beyond that requested by the system is read and stored in cache memory.
There is a strong chance, especially when dealing with sequential data, that this subsequent information will also be requested by the computer.
Reading from cache memory is much faster than reading from the disk or media, so read-ahead caching increases overall system performance to a degree. It is also called look-ahead caching.
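The effect on sequential reads can be seen in a small simulation. The class ReadAheadCache, the read_ahead parameter, and the disk_reads counter are hypothetical names used only for this sketch.

```python
class ReadAheadCache:
    """Toy read-ahead cache: each miss also fetches the blocks that follow."""

    def __init__(self, disk, read_ahead=3):
        self.disk = disk              # list of blocks standing in for the media
        self.read_ahead = read_ahead  # extra blocks fetched on each miss
        self.cache = {}
        self.disk_reads = 0           # number of slow accesses to the media

    def read(self, block):
        if block in self.cache:
            return self.cache[block]  # fast path: no disk access
        # Miss: one disk access brings in the requested block plus the
        # next read_ahead blocks, bounded by the end of the disk.
        self.disk_reads += 1
        end = min(block + 1 + self.read_ahead, len(self.disk))
        for b in range(block, end):
            self.cache[b] = self.disk[b]
        return self.cache[block]


disk = [f"block-{i}" for i in range(16)]
ra = ReadAheadCache(disk, read_ahead=3)
for i in range(8):
    ra.read(i)
print(ra.disk_reads)  # 2 disk accesses served 8 sequential reads
```

For random access patterns the extra blocks are rarely reused, which is why the gain is limited to workloads with some sequential locality.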
Online Capacity Expansion
The primary benefit of Online Capacity Expansion is that it allows disk drives to be added to a RAID system while it is operating.
These disk drives can then be used to grow the overall RAID capacity, without taking anything offline.
The traditional method would be to back up the information, destroy the RAID set, and then build a new RAID from scratch.
SAF-TE
SAF-TE (SCSI Accessed Fault-Tolerant Enclosures) is a specification that defines a set of SCSI commands for setting drive status information, including status for RAID arrays, into a disk drive array enclosure.
The drive array enclosure may be a separate enclosure, or the same enclosure.
The specification also defines commands for managing hot-swap drive slots and returning environmental health information for a drive enclosure.
The status commands are typically used by the enclosure manufacturer to assert lights or other indicators that provide information to the user about the state of the drives in the array.
This can include status such as 'rebuilding', 'fault', and 'hot spare'. The SAF-TE status setting commands are typically issued either by an intelligent disk controller or by software, e.g. RAID software, running under the operating system.
Other parties on the SCSI bus may elect to access the status information as a means of determining the state of the physical drives in the array.
In addition, SAF-TE commands can be used to report certain environmental information about the enclosure, such as temperature, voltage, power supply, and fan health.
SMART
SMART - Self-Monitoring, Analysis and Reporting Technology
The fundamental principle behind SMART is that many problems with hard disks don't occur suddenly.
They result from a slow degradation of various mechanical or electronic components.
SMART evolved from a technology developed by IBM called Predictive Failure Analysis or PFA. PFA divides failures into two categories: those that can be predicted and those that cannot. Predictable failures occur slowly over time, and often provide clues to their gradual failing that can be detected.
An example of such a predictable failure is spindle motor bearing burnout: this will often occur over a long time, and can be detected by paying attention to how long the drive takes to spin up or down, by monitoring the temperature of the bearings, or by keeping track of how much current the spindle motor uses.
An example of an unpredictable failure would be the burnout of a chip on the hard disk's logic board: often, this will "just happen" one day.
Clearly, these sorts of unpredictable failures cannot be planned for.
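The idea of detecting a predictable failure from gradual drift, such as the lengthening spin-up time mentioned above, can be sketched as a simple trend check. This is an illustration only, not how drive firmware implements SMART; the function spin_up_degrading, its window sizes, and the 20% tolerance are assumptions chosen for the example, and the sample values are made up.

```python
from statistics import mean


def spin_up_degrading(samples_ms, baseline_count=5, recent_count=5, tolerance=1.2):
    """Return True if recent spin-up times have drifted well above the baseline.

    samples_ms: spin-up times in milliseconds, oldest first
    tolerance:  how far above the baseline the recent average may rise
    """
    if len(samples_ms) < baseline_count + recent_count:
        return False  # not enough history to judge a trend
    baseline = mean(samples_ms[:baseline_count])
    recent = mean(samples_ms[-recent_count:])
    return recent > baseline * tolerance


healthy = [4000, 4010, 3990, 4005, 4000, 4002, 3998, 4001, 4003, 3999]
wearing = [4000, 4010, 3990, 4005, 4000, 4700, 4900, 5000, 5100, 5300]
print(spin_up_degrading(healthy), spin_up_degrading(wearing))
```

A sudden failure such as a burnt-out chip leaves no such trend in the history, which is why this kind of monitoring cannot warn about it.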