|
Introduction to ECC

What is ECC?

ECC is an acronym for Error Checking and Correcting. ECC is used in several
areas of computer operations, but the focus of this paper is on
ECC in main memory.

What kinds of errors occur in RAM?

In order of likelihood, the most common memory errors are: Single-Bit,
Multi-Bit, Column, and Row. Single-bit errors are the most common
and are characterized by a single bit of data being incorrect
when reading a complete byte or word. A multi-bit error is the
result of more than one bit being erroneous within the same byte
or word. A single column or row error would appear as single-bit
errors in multiple words.

How are these errors corrected?

ECC memory uses extra bits to store an encoded check code with the data. When the
data is written to memory, the ECC code is simultaneously stored.
Upon being read back, the stored ECC code is compared to the ECC
code generated when the data was read. If the codes don't match,
the difference between them is decoded to determine which bit in the data is
incorrect. The erroneous bit is "flipped" and the
memory controller releases the corrected data. Errors are
corrected "on-the-fly," and corrected data is rarely
placed back in memory. If the same corrupt data is read again,
the correction process is repeated. Replacing the data in memory
would require processing overhead that could accumulate and
significantly diminish system performance. If the error occurred
because of random events and isn't a defect in the memory, the
memory address will be cleaned of the error when the data is
overwritten with other data.

How many extra bits are required for ECC?

Because the ECC code is encoded and corrects only single-bit errors, very few
additional bits are required. Unlike parity, the number of ECC
bits doesn't increase at the same rate as the bits per word or
data bandwidth. As word size doubles, parity bits double, but ECC
bits increase by one. So, if a system uses an 8-bit word, it
would need 1 bit for parity checking, but 5 bits for ECC.
However, a 32-bit word needs 4 bits for parity or 7 bits for ECC.
Increase the bandwidth to 64 bits and 8 bits are required for
either parity or ECC. Below is a chart comparing the bits required
for different data bandwidths for ECC and parity:

Data bits    Parity bits    ECC bits
     8            1             5
    16            2             6
    32            4             7
    64            8             8
   128           16             9
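
The bit counts quoted above can be reproduced with a short calculation. The sketch below is illustrative only: it assumes one parity bit per 8-bit byte and an extended Hamming code (single-error correction plus one overall parity bit), which matches the figures in the text, although the text does not name the code it has in mind. The function names parity_bits and ecc_bits are hypothetical.

```python
def parity_bits(data_bits: int) -> int:
    """One parity bit per 8-bit byte, as in the figures above."""
    return max(1, data_bits // 8)


def ecc_bits(data_bits: int) -> int:
    """Check bits for an assumed extended Hamming code.

    The smallest r with 2**r >= data_bits + r + 1 is enough to point at
    any single flipped bit; one extra overall parity bit is then added.
    """
    r = 1
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r + 1


for width in (8, 16, 32, 64, 128):
    print(f"{width:>4} data bits: parity {parity_bits(width):>2}, ECC {ecc_bits(width)}")
```

Under these assumptions, each doubling of the word size doubles the parity bits but adds only one ECC bit, so the two counts meet at 64 data bits.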

So what?

As you can see, at the 64-bit level ECC needs no more extra bits than
parity does. This is why manufacturers can use 36-bit memory modules,
each carrying 32 data bits and 4 parity bits, in
groups of 2 or more to create an ECC environment. Compaq's
Tri-Flex memory bus is such a system. The Tri-Flex bus uses groups of 4
SIMMs, for a 128-bit bandwidth. Since the 4 modules have 16
spare bits between them, the 9 bits needed for ECC are easily
satisfied. Other systems may use special ECC memory modules, but
the cost benefit and availability of industry-standard 36-bit
SIMMs appeal to designers and end users.

How is ECC used in the real world?

Systems that use ECC may use it differently. Usually, when data requires
correction, the operating system logs the error and reports the
error to the system administrator. Multiple errors may be
reported for the same memory location if the data is read more
than once without being replaced by different data. If the same
memory location is still being corrected after a system powerdown, a defect
is most likely present in the memory, and the module should be replaced. |
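
The read, compare, and flip cycle described under "How are these errors corrected?", and the repeated reports mentioned above, can be imitated in software. The following is a minimal sketch, not a description of any real memory controller: it assumes an extended Hamming code over an 8-bit word (5 check bits, matching the figure given earlier), and the names encode, read, memory, and error_log are hypothetical. Real ECC is done in hardware.

```python
from typing import List, Optional, Tuple

CHECK_POSITIONS = [1, 2, 4, 8]                # power-of-two slots hold check bits
DATA_POSITIONS = [3, 5, 6, 7, 9, 10, 11, 12]  # remaining slots hold the data byte


def encode(byte: int) -> List[int]:
    """Build a 13-bit word: index 0 is overall parity, 1..12 a Hamming code."""
    word = [0] * 13
    for i, pos in enumerate(DATA_POSITIONS):
        word[pos] = (byte >> i) & 1
    for p in CHECK_POSITIONS:
        # Each check bit makes the positions it covers sum to an even number.
        word[p] = sum(word[k] for k in range(1, 13) if k & p) % 2
    word[0] = sum(word) % 2
    return word


def read(word: List[int]) -> Tuple[Optional[int], str]:
    """Return (byte, status); the stored word itself is left untouched."""
    syndrome = 0
    for p in CHECK_POSITIONS:
        if sum(word[k] for k in range(1, 13) if k & p) % 2:
            syndrome |= p
    parity_ok = sum(word) % 2 == 0
    bits = list(word)
    if syndrome == 0 and parity_ok:
        status = "ok"
    elif not parity_ok and syndrome <= 12:
        bits[syndrome] ^= 1            # syndrome is the index of the flipped bit
        status = "corrected"
    else:
        return None, "uncorrectable"   # for example, a double-bit error
    byte = sum(bits[pos] << i for i, pos in enumerate(DATA_POSITIONS))
    return byte, status


memory = {0x10: encode(0xA7)}
memory[0x10][6] ^= 1          # simulate a random single-bit upset
error_log = []
for _ in range(2):            # two reads, no overwrite in between
    value, status = read(memory[0x10])
    if status == "corrected":
        error_log.append(0x10)
print(hex(value), error_log)
```

Because read returns the corrected byte without writing it back, both reads deliver the right value and the log ends up with two entries for address 0x10. Overwriting the location with encode of some new value would clear the error, as the text describes for errors caused by random events.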