APPENDIX A
RAID Concepts
In the not-too-distant past, a 1TB
database was considered to be pretty big. Currently, 1PB–2PB defines the
lower boundary for a large database. In the not-too-distant future, exabyte, zettabyte, and yottabyte will become commonly bandied terms near the DBA water cooler.
As companies store more and more data, the need for
disk space continues to grow. Managing database storage is a key
responsibility of every database administrator. DBAs are tasked with
estimating the initial size of databases, recognizing growth patterns,
and monitoring disk usage. Overseeing these operations is critical to
ensuring the availability of company data.
Here are some common DBA tasks associated with storage management:
- Determining disk architecture for database applications
- Planning database capacity
- Monitoring and managing growth of database files
Before more storage is added to a database server, system administrators (SAs)
and DBAs should sit down and figure out which disk architecture offers
the best availability and performance for a given budget. When working
with SAs, an effective DBA needs to be somewhat fluent in the language
of disk technologies. Specifically, DBAs must have a basic understanding
of RAID disk technology and its implications for database performance
and availability.
Even if your opinion isn’t solicited in regard to disk
technology, you still need to be familiar with the basic RAID
configurations that will allow you to make informed decisions about
database tuning and troubleshooting. This appendix discusses the
fundamental information a DBA needs to know about RAID.
Understanding RAID
As a DBA, you need to be knowledgeable about RAID
designs to ensure that you use an appropriate disk architecture for your
database application. RAID, which is an acronym for Redundant Array
of Inexpensive (or Independent) Disks, allows you to configure several
independent disks to logically appear as one disk to the application.
There are two important reasons to use RAID:
- To spread I/O across several disks, thus improving bandwidth
- To eliminate a lone physical disk as a single point of failure
If the database process that is
reading and writing updates to disk can parallelize I/O across many
disks (instead of a single disk), the bandwidth can be dramatically
improved. RAID also allows you to configure several disks so that you
never have one disk as a single point of failure. For most database
systems, it is critical to have redundant hardware to ensure database
availability.
The purpose of this section is not to espouse one RAID
technology over another. You’ll find bazillions of blogs and white
papers on the subject of RAID. Each source of information has its own
guru that evangelizes one form of RAID over another. All these sources
have valid arguments for why their favorite flavor of RAID is the best
for a particular situation.
Be wary of blanket statements
regarding the performance and availability of RAID technology. For
example, you might hear somebody state that RAID 5 is always better than
RAID 1 for database applications. You might also hear somebody state
that RAID 1 has superior fault tolerance over RAID 5. In most cases, the
superiority of one RAID technology over another depends on several
factors, such as the I/O behavior of the database application and the
various components of the underlying stack of hardware and software. You
may discover that what performs well in one scenario is not true in
another; it really depends on the entire suite of technology in use.
The goal here is to describe the performance and fault
tolerance characteristics of the most commonly used RAID technologies.
We explain in simple terms and with clear examples how the basic forms
of RAID technology work. This base knowledge enables you to make an
informed disk technology decision dependent on the business requirements
of your current environment. You should also be able to take the
information contained in this section and apply it to the more
sophisticated and emerging RAID architectures.
Defining Array, Stripe Width, Stripe Size, Chunk Size
Before diving into the technical details of RAID, you first need to be familiar with a few terms: array, stripe width, stripe size, and chunk size.
An array is simply a
collection of disks grouped together to appear as a single device to the
application. Disk arrays allow for increased performance and fault
tolerance.
The stripe width is the
number of parallel pieces of data that can be written or read
simultaneously to an array. The stripe width is usually equal to the
number of disks in the array. In general (with all other factors being
equal), the larger the stripe width, the greater the throughput
performance of the array. For example, you will generally see greater
read/write performance from an array of twelve 32GB drives than from an
array of four 96GB drives.
The stripe size is the
amount of data you want written in parallel to an array of disks.
Determining the optimal stripe size can be a highly debatable topic.
Decreasing the stripe size usually increases the number of drives a file
will use to store its data. Increasing the stripe size usually
decreases the number of drives a file will employ to write and read to
an array. The optimal stripe size depends on your database application
I/O characteristics, along with the hardware and software of the system.
Note The
stripe size is usually a configurable parameter that the storage
administrator can change dynamically. Contrast that with the stripe
width, which can be changed only by increasing or decreasing the
physical number of disks in the array.
The chunk size is a subset of the stripe size. The chunk size (also called the striping unit) is the amount of data written to each disk in the array as part of a stripe.
Figure A-1
shows a 4KB stripe size that is being written to an array of four disks
(a stripe width of 4). Each disk gets a 1KB chunk written to it.
Figure A-1. A 4KB stripe of data is written to four disks as 1KB chunks
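In other words, chunk size equals the stripe size divided by the stripe width. Here is a minimal shell sketch of that arithmetic, using the numbers from Figure A-1 (the variable names are only for illustration):

# Chunk size derived from stripe size and stripe width
# (numbers taken from the Figure A-1 example).
stripe_size_kb=4
stripe_width=4
echo "chunk size: $(( stripe_size_kb / stripe_width )) KB per disk"   # chunk size: 1 KB per disk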
The chunk size can have significant performance
effects. An inappropriate chunk size can result in I/O being
concentrated on single disks within the array. If this happens, you may
end up with an expensive array of disks that perform no better than a
single disk.
What’s the correct chunk size to use for database
applications? It depends somewhat on the average size of the I/O requests
your database generates. Typically, database I/O consists of several
simultaneous and small I/O requests. Ideally, each small I/O request
should be serviced by one disk, with the multiple I/O requests spread
out across all disks in the array. So in this scenario, you want your
chunk size to be a little larger than the average database I/O size.
Tip You’ll
have to test your particular database and disk configuration to
determine which chunk size results in the best I/O distribution for a
given application and its average I/O size.
RAID 0
RAID 0 is commonly known as striping,
which is a technique that writes chunks of data across an array of
disks in a parallel fashion. Data is also read from disks in the same
way, which allows several disks to participate in the read/write
operations. The idea behind striping is that simultaneous access to
multiple disks will have greater bandwidth than I/O to a single disk.
Note One
disk can be larger than the other disks in a RAID 0 device (and the
additional space is still used). However, this is not recommended
because I/O will be concentrated on the large disk where more space is
available.
Figure A-2 demonstrates how RAID 0 works. This RAID 0 disk array physically comprises four disks. Logically, it looks like one disk (/mount01)
to the application. The stripe of data written to the RAID 0 device
consists of 16 bits: 0001001000110100. Each disk receives a 4-bit chunk
of the stripe.
Figure A-2. Four-disk RAID 0 striped device
With RAID 0, your realized disk capacity is the number
of disks times the size of the disk. For example, if you have four
100GB drives, the overall realized disk capacity available to the
application is 400GB. In this sense, RAID 0 is a very cost-effective
solution.
RAID 0 also provides excellent I/O performance. It
allows for simultaneous reading and writing on all disks in the array.
This spreads out the I/O, which reduces disk contention, alleviates
bottlenecks, and provides excellent I/O performance.
The huge downside to RAID 0 is that it doesn’t provide
any redundancy. If one disk fails, the entire array fails. Therefore,
you should never use RAID 0 for data you consider to be critical. You
should use RAID 0 only for files that you can easily recover and only
when you don’t require a high degree of availability.
Tip One
way to remember what RAID 0 means is that it provides “0” redundancy.
You get zero fault tolerance with RAID 0. If one disk fails, the whole
array of disks fails.
RAID 1
RAID 1 is commonly known as mirroring,
which means that each time data is written to the storage device, it is
physically written to two (or more) disks. In this configuration, if
you lose one disk of the array, you still have another disk that
contains a byte-for-byte copy of the data.
Figure A-3
shows how RAID 1 works. The mirrored disk array is composed of two
disks. Disk 1b is a copy (mirror) of Disk 1a. As the data bits 0001 are
written to Disk 1a, a copy of the data is also written to Disk 1b.
Logically, the RAID 1 array of two disks looks like one disk (/mount01) to the application.
Figure A-3. RAID 1 two-disk mirror
Writes to a RAID 1 device take a little longer than writes
to a single disk because data must be written to each participating
mirrored disk. However, read bandwidth is increased because of parallel
access to the data contained in the mirrored array.
RAID 1 is popular because it is simple to implement
and provides fault tolerance. You can lose one mirrored disk and still
continue operations as long as there is one surviving member. One
downside to RAID 1 is that it reduces the amount of realized disk space
available to the application. Although typically there are only two
disks in a mirrored array, you can have more than two disks in a mirror.
The realized disk space in a mirrored array is the size of one disk, regardless of how many disks are in the mirror.
Here’s the formula for calculating realized disk space for RAID 1:
Number of mirrored arrays * Disk Capacity
For example, suppose that you have four 100GB disks
and you want to create two mirrored arrays with two disks in each array.
The realized available disk space is calculated as shown here:
2 arrays * 100 gigabytes = 200 gigabytes
Another way of formulating it is as follows:
(Number of disks available / number of disks in the array) * Disk Capacity
This formula also shows that the amount of disk space available to the application is 200GB:
(4 / 2) * 100 gigabytes = 200 gigabytes
Tip One
way to remember the meaning of RAID 1 is that it provides 100%
redundancy. You can lose one member of the RAID 1 array and still
continue operations.
Generating Parity
Before discussing the next levels of RAID, it is important to understand the concept of parity
and how it is generated. RAID 4 and RAID 5 configurations use parity
information to provide redundancy against a single disk failure. For a
three-disk RAID 4 or RAID 5 configuration, each write results in two
disks being written to in a striped fashion, with the third disk storing
the parity information.
Parity data contains the information needed to
reconstruct data in the event one disk fails. Parity information is
generated from an XOR (exclusive OR) operation.
Table A-1
describes the inputs and outputs of an XOR operation. The table reads
as follows: if one and only one of the inputs is a 1, the output will be
a 1; otherwise, the output is a 0.
Table A-1. Behavior of an XOR Operation

| Input A | Input B | Output |
|---------|---------|--------|
| 1       | 1       | 0      |
| 1       | 0       | 1      |
| 0       | 1       | 1      |
| 0       | 0       | 0      |
For example, from the first row in Table A-1,
if both bits are a 1, the output of an XOR operation is a 0. From the
second and third rows, if one bit is a 1 and the other bit is a 0, the
output of an XOR operation is a 1. The last row shows that if both bits
are a 0, the output is a 0.
A slightly more complicated example will help clarify this concept. In the example shown in Figure A-4,
there are three disks. Disk 1 is written 0110, and Disk 2 is written
1110. Disk 3 contains the parity information generated by the output of
an XOR operation on data written to Disk 1 and Disk 2.
Figure A-4. Disk 1 XOR Disk 2 = Disk 3 (parity)
How was the parity value of 1000 calculated? The
first bits written to Disk 1 and Disk 2 are 0 and 1, respectively;
therefore, the XOR output is 1. The second bits are both 1, so
the XOR output is 0. The third bits are also both 1, so the output
is 0. The fourth bits are both 0, so the output is 0.
This discussion is summarized here in equation form:
Disk1 XOR Disk2 = Disk3 (parity disk)
----- --- -----   -----
0110  XOR 1110  = 1000
How does parity allow for the recalculation of data in
the event of a failure? For this example, suppose that you lose Disk 2.
The information on Disk 2 can be regenerated by taking an XOR operation
on the parity information (Disk 3) with the data written to Disk 1. An
XOR operation of 0110 and 1000 yields 1110 (which was originally written
to Disk 2). This discussion is summarized here in equation form:
Disk1 XOR Disk3 = Disk2
----- --- -----   -----
0110  XOR 1000  = 1110
You can perform an XOR operation with any number of
disks. Suppose that you have a four-disk configuration. Disk 1 is
written 0101, Disk 2 is written 1110, and Disk 3 is written 0001. Disk 4
contains the parity information, which is the result of Disk 1 XOR Disk
2 XOR Disk 3:
Disk1 XOR Disk2 XOR Disk3 = Disk4 (parity disk)
----- --- ----- --- -----   -----
0101  XOR 1110  XOR 0001  = 1010
Suppose that you lose Disk 2. To regenerate the
information on Disk 2, you perform an XOR operation on Disk 1, Disk 3,
and the parity information (Disk 4), which results in 1110:
Disk1 XOR Disk3 XOR Disk4 = Disk2
----- --- ----- --- -----   -----
0101  XOR 0001  XOR 1010  = 1110
You can always regenerate the data on the drive that
becomes damaged by performing an XOR operation on the remaining disks
with the parity information. RAID 4 and RAID 5 technologies use parity
as a key component for providing fault tolerance. These parity-centric
technologies are described in the next two sections.
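If you want to verify these XOR calculations yourself, the arithmetic can be reproduced at a bash prompt. The following is a minimal sketch using the bit patterns from the preceding examples (it assumes the bc calculator is available; the to_bin helper exists only to print values as 4-bit binary):

# Minimal sketch: reproduce the parity examples above with bash arithmetic.
# to_bin is a display helper that prints a value as 4-bit binary (requires bc).
to_bin() { printf '%04d\n' "$(bc <<< "obase=2; $1")"; }

# Two data disks plus parity
disk1=2#0110; disk2=2#1110
parity=$(( disk1 ^ disk2 ))            # Disk 3 (parity)
to_bin $parity                         # 1000
to_bin $(( disk1 ^ parity ))           # 1110 -- Disk 2 regenerated after a failure

# Three data disks plus parity
disk1=2#0101; disk2=2#1110; disk3=2#0001
parity=$(( disk1 ^ disk2 ^ disk3 ))    # Disk 4 (parity)
to_bin $parity                         # 1010
to_bin $(( disk1 ^ disk3 ^ parity ))   # 1110 -- Disk 2 regenerated after a failure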
RAID 4
RAID 4, which is sometimes referred to as dedicated parity,
writes a stripe (in chunks) across a disk array. One drive is always
dedicated for parity information. A RAID 4 configuration minimally
requires three disks: two disks for data and one for parity. The term RAID 4 does not mean there are four disks in the array; there can be three or more disks in a RAID 4 configuration.
Figure A-5
shows a four-disk RAID 4 configuration. Disk 4 is the dedicated parity
disk. The first stripe consists of the data 000100100011. Chunks of data
0001, 0010, and 0011 are written to Disks 1, 2, and 3, respectively.
The parity value of 0000 is calculated and written to Disk 4.
Figure A-5. Four-disk RAID 4 dedicated parity device
RAID 4 uses an XOR operation to generate the parity information. For each stripe in Figure A-5, the parity information is generated as follows:
Disk1 XOR Disk2 XOR Disk3 = Parity
----- --- ----- --- -----   ------
0001  XOR 0010  XOR 0011  = 0000
0100  XOR 0101  XOR 0110  = 0111
0111  XOR 1000  XOR 1001  = 0110
1010  XOR 1011  XOR 1100  = 1101
Tip Refer to the previous “Generating Parity” section for details on how an XOR operation works.
RAID 4 requires that parity information be generated
and updated for each write, so writes take longer in a RAID 4
configuration than in a RAID 0 configuration. Reading from a RAID 4 configuration
is fast because the data is spread across multiple drives (and
potentially multiple controllers).
With RAID 4, you get more realized disk space than you
do with RAID 1. The RAID 4 amount of disk space available to the
application is calculated with this formula:
(Number of disks – 1) * Disk Capacity
For example, if you have four 100GB disks, the realized disk capacity available to the application is calculated as shown here:
(4 - 1) * 100 gigabytes = 300 gigabytes
In the event of a single disk failure, the remaining
disks of the array can continue to function. For example, suppose that
Disk 1 fails. The Disk 1 information can be regenerated with the parity
information, as shown here:
Disk2 XOR Disk3 XOR Parity = Disk1
----- --- ----- --- ------   -----
0010  XOR 0011  XOR 0000   = 0001
0101  XOR 0110  XOR 0111   = 0100
1000  XOR 1001  XOR 0110   = 0111
1011  XOR 1100  XOR 1101   = 1010
During a single disk failure, RAID 4 performance will
be degraded because the parity information is required for generating
the data on the failed drive. Performance will return to normal levels
after the failed disk has been replaced and its information regenerated.
In practice, RAID 4 is seldom used because of the inherent bottleneck
with the dedicated parity disk.
RAID 5
RAID 5, which is sometimes referred to as distributed parity,
is similar to RAID 4 except that RAID 5 interleaves the parity
information among all the drives available in the disk array. A RAID 5
configuration minimally requires three disks: two for data and one for
parity. The term RAID 5 does not mean there are five disks in the array; there can be three or more disks in a RAID 5 configuration.
Figure A-6
shows a four-disk RAID 5 array. The first stripe of data consists of
000100100011. Three chunks of 0001, 0010, and 0011 are written to Disks
1, 2, and 3; the parity of 0000 is written to Disk 4. The second stripe
writes its parity information to Disk 1, the third stripe writes its
parity to Disk 2, and so on.
Figure A-6. Four-disk RAID 5 distributed parity device
RAID 5 uses an XOR operation to generate the parity information. For each stripe in Figure A-6, the parity information is generated as follows:
0001 XOR 0010 XOR 0011 = 0000
0100 XOR 0101 XOR 0110 = 0111
0111 XOR 1000 XOR 1001 = 0110
1010 XOR 1011 XOR 1100 = 1101
Tip Refer to the previous “Generating Parity” section for details on how an XOR operation works.
Like RAID 4, RAID 5 writes suffer a slight write
performance hit because of the additional update required for the parity
information. RAID 5 performs better than RAID 4 because it spreads the
load of generating and updating parity information to all disks in the
array. For this reason, RAID 5 is almost always preferred over RAID 4.
RAID 5 is popular because it combines good I/O
performance with fault tolerance and cost effectiveness. With RAID 5,
you get more realized disk space than you do with RAID 1. The RAID 5
amount of disk space available to the application is calculated with
this formula:
(Number of disks – 1) * Disk Capacity
Using the previous formula, if you have four 100GB
disks, the realized disk capacity available to the application is
calculated as follows:
(4 - 1) * 100 gigabytes = 300 gigabytes
RAID 5 provides protection against a single disk
failure through the parity information. If one disk fails, the
information from the failed disk can always be recalculated from the
remaining drives in the RAID 5 array. For example, suppose that Disk 3
fails; the remaining data on Disk 1, Disk 2, and Disk 4 can regenerate
the required Disk 3 information as follows:
Disk1 XOR Disk2 XOR Disk4 = Disk3
----- --- ----- --- -----   -----
0001  XOR 0010  XOR 0000  = 0011
0111  XOR 0100  XOR 0110  = 0101
0111  XOR 0110  XOR 1001  = 1000
1010  XOR 1011  XOR 1100  = 1101
During a single disk failure, RAID 5 performance will
be degraded because the parity information is required for generating
the data on the failed drive. Performance will return to normal levels
after the failed disk has been replaced and its information regenerated.
Building Hybrid (Nested) RAID Devices
The RAID 0, RAID 1, and RAID 5 architectures are the
building blocks for more sophisticated storage architectures. Companies
that need better availability can combine these base RAID technologies
to build disk arrays with better fault tolerance. Some common hybrid
RAID architectures are as follows:
- RAID 0+1 (striping and then mirroring)
- RAID 1+0 (mirroring and then striping)
- RAID 5+0 (RAID 5 and then striping)
These configurations are sometimes referred to as hybrid or nested
RAID levels. Much like Lego blocks, you can take the underlying RAID
architectures and snap them together for some interesting configurations
that have performance, fault tolerance, and cost advantages and
disadvantages. These technologies are described in detail in the
following sections.
Note Some
degree of confusion exists about the naming standards for various RAID
levels. The most common industry standard for nested RAID levels is that
RAID A+B means that RAID level A is built first and then RAID level B
is layered on top of RAID level A. This standard is not consistently
applied by all storage vendors. You have to carefully read the
specifications for a given storage device to ensure that you understand
which level of RAID is in use.
RAID 0+1
RAID 0+1 is a disk array that is first striped and then mirrored (a mirror of stripes). Figure A-7
shows an eight-disk RAID 0+1 configuration. Disks 1 through 4 are
written to in a striped fashion. Disks 5 through 8 are a mirror of Disks
1 through 4.
Figure A-7. RAID 0+1 striped and then mirrored device
RAID 0+1 provides the I/O benefits of striping while
providing the sturdy fault tolerance of a mirrored device. This is a
relatively expensive solution because only half the disks in the array
comprise your usable disk space. The RAID 0+1 amount of disk space
available to the application is calculated with this formula:
(Number of disks in stripe) * Disk Capacity
Using the previous formula, if you have eight 100GB
drives with four drives in each stripe, the realized disk capacity
available to the application is calculated as follows:
4 * 100 gigabytes = 400 gigabytes
The RAID 0+1 configuration can survive multiple disk
failures only if the failures occur within one stripe. RAID 0+1 cannot
survive two disk failures if one failure is in one stripe (/dev01) and the other disk failure is in the second stripe (/dev02).
RAID 1+0
RAID 1+0 is a disk array that is first mirrored and then striped (a stripe of mirrors). Figure A-8 displays an eight-disk RAID 1+0 configuration. This configuration is also commonly referred to as RAID 10.
Figure A-8. RAID 1+0 mirrored and then striped device
RAID 1+0 combines the fault tolerance of mirroring
with the performance benefits of striping. This is a relatively
expensive solution because only half the disks in the array comprise
your usable disk space. The RAID 1+0 amount of disk space available to
the application is calculated with this formula:
(Number of mirrored devices) * Disk Capacity
For example, if you start with eight 100GB drives, and
you build four mirrored devices of two disks each, the overall realized
capacity to the application is calculated as follows:
4 * 100 gigabytes = 400 gigabytes
Interestingly, the RAID 1+0 arrangement provides much better fault tolerance than RAID 0+1. Analyze Figure A-8
carefully. The RAID 1+0 hybrid configuration can survive a disk failure
in each stripe and can also survive one disk failure within each
mirror. For example, in this configuration, Disk 1a, Disk 2b, Disk 3a,
and Disk 4b could fail; but the overall device would continue to
function because of the mirrors in Disk 1b, Disk 2a, Disk 3b, and Disk
4a.
Likewise, an entire RAID 1+0 stripe could fail, and
the overall device would continue to function because of the surviving
mirrored members. For example, Disk 1b, Disk 2b, Disk 3b, and Disk 4b
could fail; but the overall device would continue to function because of
the mirrors in Disk 1a, Disk 2a, Disk 3a, and Disk 4a.
Many articles, books, and storage vendor documentation
confuse the RAID 0+1 and RAID 1+0 configurations (they refer to one
when really meaning the other). It is important to understand the
differences in fault tolerance between the two architectures. If you’re
architecting a disk array, ensure that you use the one that meets your
business needs.
Both RAID 0+1 and RAID 1+0 architectures possess the excellent performance attributes of striped storage devices
without the overhead of generating parity. Does RAID 1+0 perform better
than RAID 0+1, or vice versa? Unfortunately, we have to waffle a bit
(no pun intended) on the answer to this question: it depends.
Performance characteristics are dependent on items such as the
configuration of the underlying RAID devices, amount of cache, number of
controllers, I/O distribution of the database application, and so on.
We recommend that you perform an I/O load test to determine which RAID
architecture works best for your environment.
RAID 5+0
RAID 5+0 is a set of disk arrays placed in a RAID 5 configuration and then striped. Figure A-9 displays the architecture of an eight-disk RAID 5+0 configuration.
Figure A-9. RAID 5+0 (RAID 5 and then striped) device
RAID 5+0 is sometimes referred to as striping parity.
Its read performance is slightly lower than that of the other hybrid (nested)
approaches. The write performance is good, however, because each stripe
consists of a RAID 5 device. Because this hybrid is underpinned by RAID 5
devices, it is more cost effective than the RAID 0+1 and RAID 1+0
configurations. The RAID 5+0 amount of disk space available to the
application is calculated with this formula:
(Number of disks - number of disks used for parity) * Disk Capacity
For example, if you have eight 100GB disks with four
disks in each RAID 5 device, the total realized capacity would be
calculated as shown here:
(8 - 2) * 100 gigabytes = 600 gigabytes
RAID 5+0 can survive a single disk failure in either
RAID 5 device. However, if there are two disk failures in one RAID 5
device, the entire RAID 5+0 device will fail.
Determining Disk Requirements
Which RAID technology is best for your environment?
It depends on your business requirements. Some storage gurus recommend
RAID 5 for databases; others argue that RAID 5 should never be used.
There are valid arguments on both sides of the fence. You may be part of
a shop that already has a group of storage experts who predetermine the
underlying disk technology without input from the DBA team. Ideally,
you want to be involved with architecture decisions that affect the
database, but realistically that does not always happen.
Or you might be in a shop that is constrained by cost
and might conclude that a RAID 5 configuration is the only viable
architecture. For your database application, you’ll have to determine
the cost–effective RAID solution that performs well while also providing
the required fault tolerance. This will most likely require you to work
with your storage experts to monitor disk performance and I/O
characteristics.
Tip Refer to Chapter 8 for details on how to use tools such as iostat and sar to monitor disk I/O behavior.
Table A-2
summarizes the various characteristics of each RAID technology. These
are general guidelines, so test the underlying architecture to ensure
that it meets your business requirements before you implement a
production system.
Table A-2. Comparison of RAID Technologies
Table A-2
is intended only to provide general heuristics for determining the
appropriate RAID technology for your environment. There will be some
technologists who might disagree with some of these general guidelines.
In our experience, there are often two strongly opposing opinions about RAID,
and both have valid points of view.
Some variables that are unique to a particular
environment also influence the decision about the best solution. For
this reason, it can be difficult to determine exactly which combination
of chunk, stripe size, stripe width, underlying RAID technology, and
storage vendor will work best over a wide variety of database
applications. If you have the resources to test every permutation under
every type of I/O load, you probably can determine the perfect
combination of the previously mentioned variables.
Realistically, few shops have the time and money to
exercise every possible storage architecture for each database
application. You’ll have to work with your SA and storage vendor to
architect a cost-effective solution for your business that performs well
over a variety of database applications.
Caution Using RAID technology doesn’t
eliminate the need for a backup and recovery strategy. You should
always have a strategy in place to ensure that you can restore and
recover your database. You should periodically test your backup and
recovery strategy to make sure it protects you if all disks fail
(because of a fire, earthquake, tornado, avalanche, grenade, hurricane,
and so on).
Capacity Planning
DBAs are often involved with disk storage capacity planning.
They have to ensure that adequate disk space will be available, both
initially and for future growth, when the database server disk
requirements are first spec’ed out (specified). When using RAID
technologies, you have to be able to calculate the actual amount of disk
space that will be available given the available disks.
For example, when the SA says that there are x
number of type Y disks configured with a given RAID level, you have to
calculate whether there will be enough disk space for your database
requirements.
Table A-3 details the formulas used to calculate the amount of available disk space for each RAID level.
Table A-3. Calculating the Amount of RAID Disk Space Realized

| Disk Technology | Realized Disk Capacity |
|---|---|
| RAID 0 (striped) | Num Disks in Stripe * Disk Size |
| RAID 1 (mirrored) | Num Mirrored Arrays * Disk Size |
| RAID 4 (dedicated parity) | (Num Disks - 1) * Disk Size |
| RAID 5 (distributed parity) | (Num Disks - 1) * Disk Size |
| RAID 0+1 (striped and then mirrored) | Num Disks in Stripe * Disk Size |
| RAID 1+0 (mirrored and then striped) | Num Mirrored Arrays * Disk Size |
| RAID 5+0 (RAID 5 and then striped) | (Num Disks - Num Parity Disks) * Disk Size |
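The formulas in Table A-3 are simple enough to script. Here is a minimal bash sketch (the function names and sample disk counts are only for illustration) that reproduces the worked examples from earlier in this appendix:

# Realized capacity for the RAID levels in Table A-3 (illustrative only).
disk_size_gb=100

raid0()  { echo $(( $1 * disk_size_gb )); }          # $1 = disks in stripe
raid1()  { echo $(( $1 * disk_size_gb )); }          # $1 = mirrored arrays
raid5()  { echo $(( ($1 - 1) * disk_size_gb )); }    # $1 = disks in array (also RAID 4)
raid50() { echo $(( ($1 - $2) * disk_size_gb )); }   # $1 = disks, $2 = parity disks

echo "RAID 0, 4 disks:                 $(raid0 4) GB"      # 400 GB
echo "RAID 1, 2 two-disk mirrors:      $(raid1 2) GB"      # 200 GB
echo "RAID 5, 4 disks:                 $(raid5 4) GB"      # 300 GB
echo "RAID 5+0, 8 disks, 2 for parity: $(raid50 8 2) GB"   # 600 GB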
Be sure to include future database growth requirements
in your disk space calculations. Also consider the amount of disk space
needed for files such as database transaction logs and database
binaries, as well as the space required for database backups (keep in
mind that you may want to keep multiple days’ worth of backups on disk).
Tip A
good rule of thumb is to always keep one database backup on disk, back
up the database backup files to tape, and then move the backup tapes
offsite. That way, you get the performance required for routine
backup and recovery tasks as well as protection against a complete disaster.
APPENDIX B
Server Log Files
Server log files contain
informational messages about the kernel, applications, and services
running on a system. These files can be very useful for troubleshooting
and debugging system-level issues. DBAs often look in the system log
files as a first step in diagnosing server issues. Even if you’re
working with competent SAs, you can still save time and gain valuable
insights into the root cause of a problem by inspecting these log files.
This appendix covers managing Linux and Solaris log
files. You’ll learn about the basic information contained in the log
files and the tools available to rotate the logs.
Managing Linux Log Files
Most of the system log files are located in the /var/log directory. There is usually a log file for a specific application or service. For example, the cron utility has a log file named cron (no surprise) in the /var/log directory. Depending on your system, you may need root privileges to view certain log files.
The log files will vary somewhat by the version of the OS and the applications running on your system. Table B-1 contains the names of some of the more common log files and their descriptions.
Table B-1. Typical Linux Log Files and Descriptions

| Log File Name | Purpose |
|---|---|
| /var/log/boot.log | System boot messages |
| /var/log/cron | cron utility log file |
| /var/log/maillog | Mail server log file |
| /var/log/messages | General system messages |
| /var/log/secure | Authentication log file |
| /var/log/wtmp | Login records |
| /var/log/yum.log | yum utility log file |
Note Some utilities can have their own subdirectory under the /var/log directory.
Rotating Log Files
The system log files will continue to grow unless they are somehow moved or removed. Moving and removing log files is known as rotating the log files, which means that the current log file is renamed, and a new log file is created.
Most Linux systems use the logrotate
utility to rotate the log files. This tool automates the rotation,
compression, removal, and mailing of log files. Typically, you’ll rotate
your log files so that they don’t become too large and cluttered with
old data. You should delete log files that are older than a certain
number of days.
By default, the logrotate utility is automatically run from the cron scheduling tool on most Linux systems. Here’s a typical listing of the contents of the /etc/crontab file:
SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root
HOME=/
# run-parts
01 * * * * root run-parts /etc/cron.hourly
02 4 * * * root run-parts /etc/cron.daily
22 4 * * 0 root run-parts /etc/cron.weekly
42 4 1 * * root run-parts /etc/cron.monthly
Notice that the /etc/crontab uses the run-parts utility to run all scripts located within a specified directory. For example, when run-parts inspects the /etc/cron.daily directory, it finds a file named logrotate that calls the logrotate utility. Listed here are the contents of a typical logrotate script:
#!/bin/sh
/usr/sbin/logrotate /etc/logrotate.conf
EXITVALUE=$?
if [ $EXITVALUE != 0 ]; then
    /usr/bin/logger -t logrotate "ALERT exited abnormally with [$EXITVALUE]"
fi
exit 0
The behavior of the logrotate utility is governed by the /etc/logrotate.conf file. Here’s a listing of a typical /etc/logrotate.conf file:
# see "man logrotate" for details # rotate log files weekly weekly # keep 4 weeks worth of backlogs rotate 4 # create new (empty) log files after rotating old ones create # uncomment this if you want your log files compressed #compress # RPM packages drop log rotation information into this directory include /etc/logrotate.d # no packages own wtmp -- we’ll rotate them here /var/log/wtmp { monthly create 0664 root utmp rotate 1 } # system-specific logs may be also be configured here.
By default, the logs are rotated weekly on most Linux
systems, and four weeks’ worth of logs are preserved. These are
designated by the lines weekly and rotate 4 in the /etc/logrotate.conf file. You can change the values within the /etc/logrotate.conf file to suit the rotating requirements of your environment.
If you list the files in the /var/log directory, notice that some log files end with an extension of .1 or .gz. This indicates that the logrotate utility is running on your system.
You can manually run the logrotate utility to rotate the log files. Use the -f option to force a rotation, even if logrotate doesn’t think it is necessary:
# logrotate -f /etc/logrotate.conf
Application-specific logrotate configurations are stored in the /etc/logrotate.d directory. Change to the /etc/logrotate.d directory and list its contents to see some typical application-specific configuration files on a Linux server:
# cd /etc/logrotate.d
# ls
acpid  cups  mgetty  ppp  psacct  rpm  samba  syslog  up2date  yum
Setting Up a Custom Log Rotation
The logrotate utility is sometimes perceived as a utility only for SAs. However, any user on the system can use logrotate to rotate log files for applications for which they have read/write permissions on the log files. For example, as the oracle user, you can use logrotate to rotate your database alert.log file.
Here are the steps for setting up a job to rotate the alert log file of an Oracle database:
- Create a configuration file named alert.conf in the directory $HOME/config (create the config directory if it doesn’t already exist):
/oracle/RMDB1/admin/bdump/*.log {
    daily
    missingok
    rotate 7
    compress
    mail oracle@localhost
}
- In the preceding configuration file, the first line specifies the location of the log file. The asterisk (wildcard) tells logrotate to look for any file with the extension of .log in that directory. The daily keyword specifies that the log file should be rotated on a daily basis. The missingok keyword specifies that logrotate should not throw an error if it doesn’t find any log files. The rotate 7 keyword specifies that the log files should be kept for seven days. The compress keyword compresses the rotated log file. Finally, a status e-mail is sent to the local oracle user on the server.
- Create a cron job to automatically run the job on a daily basis:
0 9 * * * /usr/sbin/logrotate -f -s /home/oracle/config/alrotate.status /home/oracle/config/alert.conf
Note The cron entry must appear as a single line in your cron table (it may wrap when displayed here). - The cron job runs the logrotate utility every day at 9 a.m. The -s (status) option directs the status file to the specified directory and file. The configuration file used is /home/oracle/config/alert.conf.
- Manually test the job to see whether it rotates the alert log correctly. Use the -f switch to force logrotate to do a rotation:
$ /usr/sbin/logrotate -f -s /home/oracle/config/alrotate.status \
  /home/oracle/config/alert.conf
As shown in the previous steps, you can use the logrotate utility to set up log rotation jobs.
Consider using logrotate instead of writing a custom shell script such as the one described in Recipe 10-8.
Monitoring Log Files
Many Linux systems have graphical interfaces for monitoring and managing
the log files. As a DBA, you often need to look only at a specific log
file when trying to troubleshoot a problem. In these scenarios, it is
usually sufficient to manually inspect the log files with a text editor
such as vi or a paging utility such as more or less.
You can also monitor the logs with the logwatch utility. You can modify the default behavior of logwatch by modifying the logwatch.conf file. Depending on your Linux system, the logwatch.conf file is usually located in a directory named /etc/log.d. To print the default log message details, use the --print option:
# logwatch --print
Many SAs set up a daily job to be run that automatically e-mails the logwatch report to a specified user. Usually this functionality is implemented as a script located in the /etc/cron.daily directory. The name of the script will vary by Linux system. Typically, these scripts are named something like 0logwatch or 00-logwatch.
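The exact contents vary by distribution, but such a script can be as simple as the following sketch (the script location, the logwatch path, and the root recipient are assumptions for illustration):

#!/bin/bash
# Hypothetical /etc/cron.daily/0logwatch sketch: mail the daily logwatch
# report to the root user (the path and recipient are assumptions).
/usr/sbin/logwatch --print | mail -s "logwatch report for $(hostname)" root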
Managing Solaris Log Files
The Solaris OS logs can be found under the /var directory. Table B-2 documents the names and purpose of commonly used log files in a Solaris environment.
Table B-2. Typical Solaris Log Files

| Log File Name | Purpose |
|---|---|
| /var/adm/messages | General-purpose, catch-all file for system messages |
| /var/adm/sulog | Records each attempt to use the su command |
| /var/cron/log | Contains entries for cron jobs running on the server |
| /var/log/syslog | Logging output from various system utilities (e.g., mail) |
Viewing System Message Log Files
The syslogd daemon automatically records various system errors, warnings, and faults in message log files. You can use the dmesg command to view the most recently generated system-level messages. For example, run the following as the root user:
# dmesg
Here’s some sample output:
Apr  1 12:27:56 sb-gate su: [ID 810491 auth.crit] 'su root' failed for mt...
Apr  2 11:14:09 sb-gate sshd[15969]: [ID 800047 auth.crit] monitor fatal: protocol error...
The /var/adm directory contains several log directories and files. The most recent system log entries are in the /var/adm/messages file. Periodically (typically every 10 days), the contents of the messages file are rotated and renamed to messages.N. For example, you should see a messages.0, messages.1, messages.2, and messages.3 file (older files are deleted). Use the following command to view the current messages file:
# more /var/adm/messages
If you want to view all logged messages, enter the following command:
# more /var/adm/messages*
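Because the rotated files share the messages prefix, you can also search the current and archived logs in one pass; for example (the search pattern shown is only an example):

# grep -i error /var/adm/messages*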
Rotating Solaris Log Files
You can rotate logs in a Solaris environment via the logadm utility, which is a very flexible and powerful tool that you can use to manage your log files. The logadm utility is called from the root user’s cron table. Here’s an example:
10 3 * * * /usr/sbin/logadm
This code shows that the logadm utility is called once per day at 3:10 a.m. The logadm utility rotates files based on information in the /etc/logadm.conf file. Although you can modify this file manually, the recommended approach is to update it via the logadm utility itself.
A short example will help illustrate how to add an entry. This next line of code instructs the logadm utility to add an entry with the -w switch:
# logadm -w /orahome/logs/mylog.log -C 8 -c -p 1d -t '/orahome/logs/mylog.log.$n' -z 1
Now if you inspect the contents of the /etc/logadm.conf file, the prior line has been added to the file:
/orahome/logs/mylog.log -C 8 -c -p 1d -t '/orahome/logs/mylog.log.$n' -z 1
The preceding line of code instructs logadm to rotate the /orahome/logs/mylog.log file. The -C 8 switch specifies that it should keep eight old versions before deleting the oldest file. The -c switch instructs the file to be copied and truncated (and not moved). The -p 1d switch specifies that the log file should be rotated on a daily basis. The -t switch provides a template for the rotated log file name. The -z 1 switch specifies that the number 1 rotated log should be compressed.
You can validate your entry by running logadm with the -V switch. Here’s an example:
# logadm -V
You can also force an immediate execution of the entry via the -p now switch:
# logadm -p now /orahome/logs/mylog.log
After running the preceding command, you should see that your log has been rotated:
# cd /orahome/logs
# ls -altr
-rw-r--r--   1 root     root           0 Apr  5 16:40 mylog.log.0
-rw-r--r--   1 root     root           0 Apr  5 16:40 mylog.log
To remove an entry from the /etc/logadm.conf file, use the -r switch. Here’s an example:
# logadm -r /orahome/logs/mylog.log
Summary
Server log files are often the first places to look
when you experience performance and security issues. These files contain
messages that help diagnose and troubleshoot problems. Because log
files tend to grow very fast, it is important to understand how to
rotate the logs, which ensures that they are archived, compressed, and
deleted at regular intervals.
On Linux systems, use the logrotate utility to rotate log files; on Solaris servers, use the logadm utility.