CHAPTER 7
Shell Scripting
Shell scripting is an important skill that every professional DBA must possess. Most DBAs use this coding technique to automate many critical database administration tasks such as backups and monitoring. As part of your job, you’ll be asked to maintain scripts that prior DBAs have written, and you’ll also be required to write new shell scripts as needed. The better you are at scripting, the better you’ll be able to perform your job. To be effective at shell scripting, DBAs minimally need to be familiar with four types of tools:
  • SQL
  • Text editor
  • Shell interface
  • Shell scripting language
The most common tools that map to the previous list are SQL*Plus, the vi (or vim) editor, and the Bash shell. The Bash shell is both a shell interface and a scripting language. The previous bulleted list is a minimal list of tools that you should be familiar with. You’ll be invoking a wide variety of Oracle database utilities in shell scripts such as SQL*Plus, RMAN, Data Pump, external tables, and so on.
As discussed in previous chapters, a shell is a command-line interface that allows you to interact with the Linux/Solaris kernel. Some shells that are commonly used by DBAs and developers are the Bash shell (bash), Korn shell (ksh), and C shell (csh or improved tcsh). As explained in Chapter 3, this book focuses on the Bash shell, so we won’t rehash those details here.
A shell script is an OS file that contains one or more commands that a shell can execute. Any command that the shell can execute can be placed in a shell script file. For example, a shell script can run other shell scripts, executable binary files (such as sqlplus or rman), or any system utility (such as df, ps, and so on).
DBAs use shell scripts for critical tasks such as proactively monitoring the database and taking backups. These critical tasks need to be repeatable and reliable. DBAs typically use shell scripts to run a series of commands that accomplish the desired task. DBAs like shell scripting because they don’t need to be expert programmers to write shell scripts. With a base knowledge (and a good example), it is fairly quick and easy to create a shell script that performs the desired task.
The purpose of this chapter is to provide a core set of techniques to enable you to successfully write shell scripts.
This chapter does not cover all the facets of shell scripting (that would take an entire book to accomplish). Instead, we give you the common approaches and shell scripting fundamentals that DBAs need to perform their jobs. We also provide useful real-world examples. Once you become adept at shell scripting, you’ll be able to leverage these skills to automate critical database jobs. One could argue that a database administrator might exclusively use a graphical user interface (GUI) to automate database tasks. We strongly recommend you learn how to write shell scripts to automate database administration tasks. If the GUI ever becomes inaccessible or doesn’t do something you need it to do, for your job’s sake, you had better know how to write and debug a shell script.
The first recipe of this chapter introduces you to the basics of how to write a shell script.
7-1. Writing a Simple Shell Script
Problem
You’re new to shell programming, and you want to write a simple script to determine whether your database is accessible.
Solution
Use an editor to create a new file (see Chapter 4 for details on using the vi file editor). Place the following text within the file:
#!/bin/bash
# Check if database is up.
ORACLE_SID=O1212
ORACLE_HOME=/orahome/app/oracle/product/12.1.0.2/db_1
PATH=$ORACLE_HOME/bin:$PATH
echo "select ’DB up’ from dual;" | sqlplus -s system/foo
exit 0
Modify the previous code to match your ORACLE_SID and ORACLE_HOME variables and your system schema password. After you create the file, you’ll also need to modify the permissions on the file to be executable. In this example, the file name is dbcheck.bsh:
$ chmod +x dbcheck.bsh
This code changes the permission of the file to executable (x). Now you should be able to successfully run the program:
$ dbcheck.bsh
If your database is up, you should receive a message like this:
'DBUP
-----
DB up
All the checking within the script is performed with one line of code:
echo "select ’DB up’ from dual;" | sqlplus -s system/foo
The echo command pipes a valid SQL statement to the sqlplus executable. The sqlplus executable will then attempt to log on with the system schema and run the statement.
Note  Adding the file extensions .sh, .bsh, or .bash to the end of a Bash shell script is a common industry practice. Keep in mind that file extensions are meaningless in Linux/Solaris environments (other than helping you document the type of script). This is different from DOS, in which .exe, .com, and .bat indicate executable OS files.
How It Works
The first line of the shell script in the “Solution” section of this recipe needs a little more explanation. We reproduced the line here for the discussion:
#!/bin/bash
The # character is normally used to comment out a line in a shell script; for example:
# This is a comment.
One exception to that rule is when #! appears as the first text in the shell script. When #! is placed on the first line, it can then be combined with a path and program name. The path and program name specify the location and name of the program that will interpret the commands within the script. This is important because it means you can have the script run with a designated shell regardless of the interactive shell you’re using.
Note  The two-character sequence of #! is sometimes referred to as the shebang.
If you don’t specify a #!/<path>/<program>, the shell you’re currently logged on to will be used to interpret the commands within the script. We recommend that you specify the path and shell program on the first line of your script so that there is no ambiguity about which shell will be used to interpret the commands within the script.
On most systems, the bash interpreter is in the /bin directory. If you don’t know the location of your bash executable, you can use the which or whereis command to locate it:
$ whereis bash
bash: /bin/bash
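If you're ever unsure which interpreter is actually executing a script, a quick test can settle it. Here is a minimal sketch (the script name is hypothetical); it assumes your ps command supports the -p and -o options, which is the case on most Linux and Solaris systems:
#!/bin/bash
# whichsh.bsh: display the name of the interpreter running this script.
# $$ contains the process number of the current shell.
ps -p $$ -o comm=
With the #!/bin/bash line in place, the script reports bash regardless of which interactive shell you invoked it from.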
When you first attempt to run a shell script, you may receive an error similar to the following:
-bash: dbcheck.bsh: command not found
It means that your PATH variable doesn’t include the current working directory. To work around this, you can reset the PATH variable to include the current working directory. This example exports the PATH variable to include the current working directory:
$ export PATH=$PATH:.
Another method for ensuring that the Bash shell can locate a script is to include the complete directory path to where the script resides. In this example, the script is located in the directory /home/oracle and is run as shown here:
$ /home/oracle/dbcheck.bsh
You can also instruct the shell to look in the current working directory to determine the location of the script. You do this by placing ./ before the script name:
$ ./dbcheck.bsh
When first running the script, you may also receive a message like this:
Error 6 initializing SQL*Plus
SP2-0667: Message file sp1<lang>.msb not found
This message most likely means that the ORACLE_HOME variable isn’t set correctly. In this situation, ensure that you set the ORACLE_HOME variable to the correct location.
BASH SHELL EXIT COMMAND
You can place an exit command at any location within a shell script to instruct the Bash shell to immediately terminate the program. A successful exit is normally specified with an exit or an exit 0. Exiting a shell script when a failure condition has been detected is indicated by a nonzero value, such as an exit 1. We recommend that you explicitly place an exit 0 command within your shell script to indicate a successful completion. You should also use a nonzero value such as exit 1 when an error condition has been detected.
Each Bash shell command that executes will also return an exit code. If a command executes successfully, it will terminate with a status of 0. If there has been some sort of a failure, the exit code will be nonzero. You can check the status of an exit code by inspecting the $? variable:
$ echo $?
0
The $? variable holds the exit value of the previously executed command. The nonsuccess value of an exit code will vary by each command. For example, the grep utility will return a 0 on successfully finding a match, a 1 if no matches are found, and a 2 if there has been some sort of a syntax error or missing input.
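For example, you can observe the grep exit values directly from the command line. Assuming an oracle OS user exists in the /etc/passwd file (and a user named nosuchuser does not), you would see output like this:
$ grep oracle /etc/passwd >/dev/null; echo $?
0
$ grep nosuchuser /etc/passwd >/dev/null; echo $?
1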
7-2. Checking Simple Conditions
Problem
You want to check for a condition such as whether a critical database background process is running and send an e-mail if there is a problem.
Solution
Use the if/then/else Bash control structure to check for a condition and perform an appropriate action. The following example uses an if/then/else structure to determine whether the Oracle system monitor (SMON) process is running and sends an e-mail if the process is not detected:
#!/bin/bash
# Check for background process.
ORACLE_SID=O1212
critProc=ora_smon
ps -ef | grep -v 'grep' | grep ${critProc}_$ORACLE_SID
if [ $? -eq 0 ]; then
  echo "SMON $ORACLE_SID is available."
else
  echo "SMON $ORACLE_SID issue." | mailx -s "issue with SMON $ORACLE_SID" dba@gmail.com
fi
exit 0
Place the preceding code in a file named bgcheck.bsh and make it executable:
$ chmod +x bgcheck.bsh
Then run it:
$ ./bgcheck.bsh
If the SMON process is running, you’ll see output similar to the following:
oracle   27910     1  0 May11 ?        00:00:00 ora_smon_O1212
SMON O1212 is available.
The previous example uses the $? variable. This variable is often used after conditional statements to evaluate the success or failure of the previous command. The $? variable contains the status of the previous command executed. If the previously executed command was successful, the $? variable will contain a 0; otherwise, it will contain a nonzero value.
How It Works
The if/then/else control structure comes in three basic forms. The first one states that if a condition is true, execute the following commands. The syntax is as follows:
if condition ; then
  commands
fi
On the first line of code in the previous example, the keyword then is a separate command from the if keyword, so you must insert a semicolon to mark the termination point of the if condition. Another way of writing the previous bit of code is as follows:
if condition
then
  commands
fi
The next form of the if/then/else structure states if a condition is true, execute the following commands. If the first condition is false, execute a separate set of commands. Its syntax is as follows:
if condition ; then
  commands
else
   commands
fi
The third form of the if/then/else structure states that if a condition is true, execute the first set of commands; otherwise, check for the next condition. If it is true, execute the commands. This functionality is enabled with the elif keyword. You can have many elif conditions in this form. Its syntax is as follows:
if condition ; then
   commands
elif condition ; then
   commands
elif condition ; then
   commands
fi
Tip  You can also check for success (or not) of a command via the two conditional operators || and &&. See recipe 7-8 for more details on conditional operators.
7-3. Testing a Condition
Problem
You want to write a script that checks for certain conditions, such as the number of parameters passed to a script. Based on the condition, you want to perform an action such as displaying an informational message or exiting the script.
Solution
As shown in recipe 7-2, the if/then/else structure is an important programming technique. However, it is the combination of if/then/else with a condition that can be tested that gives you a much more powerful tool to automate DBA tasks. The test command enables you to check a condition within an if command. Here is the basic syntax for the test command:
test operand1 operator operand2
The test command can also be written with the [ ] syntax. This syntax uses a left square bracket to start the command and then finishes the command with a right square bracket. Its syntax is as follows:
[ operand1 operator operand2 ]
Note  The shell script examples in this chapter use the [ ] form of the test command.
For some test conditions, an operand1 isn’t required. The syntax for this condition is as follows:
[ operator operand2 ]
The previous test conditional checks will exit with a status of 0 (true) or 1 (false), depending on the evaluation of the condition. Ensure that you have a space between the operands, operators, and brackets. The space is how the shell knows where the operator and operand are separated. If there is no space between the operator, operand, and brackets, the shell will interpret the value as one string, which will result in erroneous outcomes.
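The following minimal sketch demonstrates why the spaces matter. The first test is valid; the second omits the space after the left bracket, so the shell mashes the bracket and operand together into the string [1 and then fails trying to run it as a command:
$ var=1
$ [ $var -eq 1 ] && echo "valid test"
valid test
$ [$var -eq 1 ] && echo "valid test"
-bash: [1: command not found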
To bring if/then/else and test together, we’ll write a small but useful piece of code that checks to see whether the correct number of parameters are passed to a script. The script will use the $# variable. The $# variable automatically gets assigned to the number of positional parameters typed at the command line and passed into the script. This variable is handy when you want to check for the correct number of parameters passed to a script. The script will also use the $1 variable to display the first parameter passed to the script. When parameters are passed to a script, the first parameter is automatically stored in the $1 variable, the second parameter is stored in $2, and so on.
The following bit of code uses the -ne conditional check to determine whether the number of parameters passed to the script is not equal to 1:
#!/bin/bash
if [ $# -ne 1 ]
then
  echo "Wrong number of parameters passed to script."
  exit 1
else
  echo "$1 passed to the script"
fi
exit 0
We next place this code in a script named ss.bsh and make it executable:
$ chmod +x ss.bsh
Now run the script with no parameters passed to it:
$ ./ss.bsh
Here is the corresponding output:
Wrong number of parameters passed to script.
Now run the script with one parameter passed to it:
$ ./ss.bsh testparm
Here is the output:
testparm passed to the script
The $0 parameter is often used in conjunction with the $# parameter to display the syntax required when invoking a script. Within a shell script, the $0 parameter contains the name of the shell script being executed. Here’s a slight variation of the previous code that uses the $0 variable to display the name of the script:
#!/bin/bash
if [ $# -ne 1 ]
then
  echo "Wrong number of parameters passed to script."
  echo "Usage: $0 ORACLE_SID"
  exit 1
else
  echo "$1 passed to the script"
fi
exit 0
The -ne operator is an arithmetic operator and is used to test whether the operands are not equal. If the script (named ss.bsh) is called without passing exactly one parameter to it, the following output is displayed:
Wrong number of parameters passed to script.
Usage: ./ss.bsh ORACLE_SID
Notice the ./ in front of the script name in the previous output. To eliminate the ./ from the output, use the basename command. This utility is used to strip any directory or suffix information from file names; for example:
#!/bin/bash
Pgm=$(basename $0)
if [ $# -ne 1 ]
then
  echo "Wrong number of parameters passed to script."
  echo "Usage: $Pgm ORACLE_SID"
  exit 1
else
  echo "$1 passed to the script"
fi
exit 0
The script is executed again:
$ ./ss.bsh
Notice in the output that the name of the script is displayed without directory information:
Wrong number of parameters passed to script.
Usage: ss.bsh ORACLE_SID
One last note: the following line of code from the prior script needs a bit more explanation:
Pgm=$(basename $0)
The previous line uses a technique known as command substitution. Command substitution allows you to take the output of a command and populate a variable. The basic syntax for doing this is as follows:
variable=$(shell commands)
This is a powerful feature that allows you to populate variables with the output of other commands. In our example, we took the output of basename $0 and populated the Pgm variable. Be aware that in older versions of the shell you may see command substitution implemented with the backtick syntax:
variable=`shell commands`
This is an older style and won’t be used in any of our examples.
How It Works
Testing for conditions is an integral part of shell script logic. The “Solution” section provides a practical example of using the test command with the [ ] syntax. There are several additional use cases for which you’ll need to test for conditions (e.g., an arithmetic condition, a string, if a file exists, and so on). Several real-world scenarios are explored in the following sections.
Testing an Arithmetic Condition
We have occasionally been in a situation in which an abnormally high number of SQL sessions get connected to the database. It might be due to an erroneous program initiating connections to the database and never exiting. In this scenario, it is useful to have a shell script that periodically checks to see whether the number of SQL processes running on the server is less than a certain threshold.
Here’s a simple script that checks to see whether the number of SQL processes is fewer than 300. If there are 300 or more processes executing, the script sends an e-mail:
#!/bin/bash
crit_var=$(ps -ef | grep sqlplus | grep -v grep | wc -l)
if [ $crit_var -lt 300 ]; then
  echo $crit_var
  echo "processes running normal"
else
  echo "too many processes"
  echo $crit_var | mailx -s "too many sqlplus procs" dba@gmail.com
fi
exit 0
In the prior script, command substitution is used to populate the crit_var variable with the output of the ps command. The value of crit_var is then tested with the -lt (less than) arithmetic operator.
Several arithmetic operators are available with the Bash shell. Table 7-1 gives a brief description of each operator.
Table 7-1. Arithmetic Operators
Operator
Description
-eq
True if two integers are equal
-ne
True if two integers are not equal
-lt
True if operand1 is less than operand2
-le
True if operand1 is less than or equal to operand2
-gt
True if operand1 is greater than operand2
-ge
True if operand1 is greater than or equal to operand2
Testing Strings
You can use strings with test conditions, and there is a wide variety of ways to use string comparisons. For example, you may want to check to ensure that you’re logged on as a certain OS user before you run commands in a script. This example checks to see whether the user running the script is oracle:
#!/bin/bash
checkUser=oracle
curWho=$(whoami)
if [ "$curWho" != "$checkUser" ]; then
  echo "You are currently logged on as: $curWho"
  echo "Must be logged in as $checkUser to run this script.."
  exit 1
fi
exit 0
In the preceding bit of code, we use command substitution to populate the curWho variable with the output of the whoami command. The curWho variable is then checked to see whether it matches the string of oracle via the != (not equal) operator. If the user doesn’t match, the script displays informational messages and exits the script. Table 7-2 lists test operations for strings and their descriptions.
Table 7-2. String Operators
String Operator
Description
-z string
True if the string is empty
-n string
True if the string is not empty
string1 = string2
True if string1 equals string2
string1 != string2
True if the strings are not equal
string1 < string2
True if string1 sorts before string2
string1 > string2
True if string1 sorts after string2
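As a brief sketch of the operators in Table 7-2, the following bit of code uses -z to verify that the ORACLE_SID variable has been set before doing any real work:
#!/bin/bash
if [ -z "$ORACLE_SID" ]; then
  echo "ORACLE_SID is not set."
  exit 1
fi
echo "ORACLE_SID is set to: $ORACLE_SID"
exit 0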
Accepting Input from the Command Line
Another useful example of a string comparison is to read user input from the keyboard and verify an operation. Suppose you want to check the current Oracle SID variable before continuing to run more commands within the script. This is useful if you work with multiple databases contained on one physical server. This script displays the value of ORACLE_SID and asks whether you want to continue running the script:
#!/bin/bash
keepGoing=n
echo "Current value of ORACLE_SID: $ORACLE_SID"
echo -n "Do you want to continue? y/n "
read keepGoing
if [ "$keepGoing" = "y" ]; then
  echo "Continue to run script."
else
  echo "Exiting script"
  exit 1
fi
exit 0
Testing for the Existence of a File
In addition to arithmetic and string comparisons, you can also perform various tests on OS files. The test command allows you to perform checks such as the availability of a file, the file type, and so on. For example, you may want to determine whether a log file exists; if it does, you want it to send an e-mail to the appropriate support person. This script uses the -e (exists) parameter of the test command to determine this:
#!/bin/bash
checkFile=/home/trace/error.log
if [ -e $checkFile ]; then
  mail -s "error.log exists" dba@gmail.com <$checkFile
else
  echo "$checkFile does not exist"
fi
exit 0
If you want your shell script to do nothing after checking for a condition, use the colon (:) command (sometimes called no-operation or null). For example, the following bit of code does nothing if it detects that the given file exists:
#!/bin/bash
checkFile=/home/oracle/error.log
if [ -e $checkFile ]; then
  :
else
  echo "$checkFile does not exist"
fi
exit 0
Table 7-3 contains descriptions of the Bash shell tests for file operations.
Table 7-3. File Operators
File Operator
Description
-a
True if file exists
-b
True if file is a block device file
-c
True if file is a character device file
-d
True if file is a directory
-e
True if file exists
-f
True if file exists and is a regular file
-g
True if file has set-group-id permission set
-h
True if file is a symbolic link
-L
True if file is a symbolic link
-k
True if file’s sticky bit is set
-p
True if file is a named pipe
-r
True if the file is readable (by current user)
-s
True if file exists and is not empty
-S
True if file is socket
-u
True if file is set-user-id
-w
True if file is writable (by current user)
-x
True if file is executable
-O
True if file is effectively owned by current user
-G
True if file is effectively owned by current user’s group
-N
True if file has been modified since it was last read
file1 -nt file2
True if file1 is newer than file2
file1 -ot file2
True if file1 is older than file2
file1 -ef file2
True if file1 is a hard link to file2
Tip  The test command options will vary by OS version. For a complete listing of available test operations in your environment, use the help test command or the man test command.
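Here's a short sketch that illustrates two more of the operators from Table 7-3 (the directory and file names are hypothetical). The -d operator verifies that a backup directory exists, and -nt checks whether one file is newer than another:
#!/bin/bash
backupDir=/backup/orahome
if [ ! -d $backupDir ]; then
  echo "$backupDir does not exist."
  exit 1
fi
# Warn if the current control file trace is newer than the saved copy.
if [ $backupDir/ctl.sql -nt $backupDir/ctl.sql.old ]; then
  echo "New control file trace detected."
fi
exit 0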
7-4. Checking Complex Conditions
Problem
You need to perform a sophisticated set of condition checks, such as checking for free disk space on a server. When a particular mount point reaches a certain threshold, you want to send an e-mail to the DBA team. You suspect that you’ll need something more sophisticated than if/then/else checking, such as a case statement.
Solution
In many cases, a simple if/then/else construct is all you need to check a condition. However, as soon as you are presented with many different actions to take, the if/then/else syntax can become unwieldy and nonintuitive. In these situations, use a case statement instead. The basic syntax for a case statement is as follows:
case expression in
  pattern1)
     commands ;;
  pattern2)
     commands ;;
esac
The next example in this section uses a case statement to check for free disk space. The script relies on the output of the df command. So to understand how the script works, first run df -h and view its output:
$ df -h
Here’s the corresponding output for this server:
Filesystem            Size  Used Avail Use% Mounted on
/dev/xvda2            191G  165G   17G  91% /
/dev/xvda1            996M  136M  809M  15% /boot
The two mount points on this server are / and /boot. Looking ahead to the script, the mntlist variable within the script has been defined to match the mount points on this server that require monitoring for used space:
mntlist="/ /boot"
With that understanding, consider the following code, which uses a case statement to perform several checks:
#!/bin/bash
BOX=$(uname -a | awk '{print $2}')
mntlist="/ /boot"
for ml in $mntlist
do
  echo "Mount point: $ml"
  usedSpc=$(echo $(df -h $ml|awk '{print $5}'|grep -v Use|cut -d "%" -f1 -))
  case $usedSpc in
  [0-9])
    diskStat="relax, lots of disk space: $usedSpc"
    ;;
  [1-7][0-9])
    diskStat="disk space okay: $usedSpc"
    ;;
  [8][0-9])
    diskStat="space getting low: $usedSpc"
    ;;
  [9][0-9])
    diskStat="warning, running out of space: $usedSpc"
    echo $diskStat $ml | mailx -s "space on: $BOX" dba@gmail.com
    ;;
  [1][0][0])
    diskStat="update resume, no space left: $usedSpc"
    echo $diskStat $ml | mailx -s "space on: $BOX" dba@gmail.com
    ;;
  *)
    diskStat="huh?: $usedSpc"
    ;;
  esac
  # end case
  echo $diskStat
done
# end for
exit 0
Assume that the preceding code is placed in a script named filesp.bsh and made executable:
$ chmod +x filesp.bsh
Next the script is executed:
$ filesp.bsh
And the output is displayed:
Mount point: /
warning, running out of space: 91
Mount point: /boot
disk space okay: 15
Within the script, the usedSpc variable gets assigned a value that shows what percentage of disk space is used on a mount point. The case statement then examines usedSpc to determine within which range the variable falls. Finally, if a given mount point exceeds the 90% full threshold, the script e-mails the DBA a message to indicate that there could be an issue.
How It Works
The code in the “Solution” section of this recipe uses shell commands, a case statement, and various coding techniques in a few lines of code. The result is a small but extremely useful script that monitors disk space. The usedSpc line of the script needs additional explanation. We repeat it here for convenience:
usedSpc=$(echo $(df -h $ml|awk '{print $5}'|grep -v Use|cut -d "%" -f1 -))
The usedSpc variable is contained within a looping structure for ml in $mntlist. The loop executes for each mount point defined in the mntlist variable. The ml variable is assigned for the current mount point being examined. The output of the df command is piped to the awk command, which extracts the fifth column. This in turn is passed to the grep command, which eliminates any output that contains the string "Use". This output is piped to the cut command, which extracts the first field delimited by a % character.
Note that there’s an echo command embedded into the line of code; it is used because there’s extra space included in the output of the df command on some systems, and echo removes that extra space. The resultant string should be the percentage of disk space used on the mount point in question.
You may have to tweak the usedSpc line of code, depending on the output of df for your system. For example, the output of the df command might not display the string "Use" on some platforms. Case in point: on some Solaris systems, the output from the df command displays the string "Capacity" to indicate the amount of disk space used; for example:
$ df -h
Here is some sample output on a Solaris system:
Filesystem             Size   Used  Available Capacity  Mounted on
orapool1/ora01         350G   203G       147G      59%  /ora01
orapool2/ora02         350G   265G        85G      76%  /ora02
orapool1/ora03         350G   254G        96G      73%  /ora03
In this situation, you can modify the script to use grep to filter out the string "Capacity"; for example:
usedSpc=$(echo $(df -h $ml|awk '{print $5}'|grep -v Capacity|cut -d "%" -f1 -))
Here’s the corresponding output when filesp.bsh is run on a Solaris system:
Mount point: /ora01
disk space okay: 59
Mount point: /ora02
disk space okay: 76
Mount point: /ora03
disk space okay: 73
Also note that for this Solaris system, the mntlist variable in the script needs to be defined as follows:
mntlist="/ora01 /ora02 /ora03"
Let’s now go back to the example in the “Solution” section of this recipe: the case statement performs a sophisticated set of string comparisons on the value stored in the usedSpc variable. The case statement will check each condition until it finds a match. When a condition is met, the case statement runs any statements within the matched section and then exits.
An example will help clarify this concept. Let’s look at the first condition in the case statement in the “Solution” section of this recipe:
[0-9])
diskStat="relax, lots of disk space: $usedSpc"
;;
In the preceding snippet of code, the case statement checks the value of the variable to see whether it is a one-digit string that contains a value within the range of 0 through 9. If it matches, it sets the diskStat variable to an appropriate message and exits the case statement.
Take a look at the second condition in the case statement:
[1-7][0-9])
diskStat="disk space okay: $usedSpc"
;;
In this bit of code, the case statement checks for a two-digit number. The first character it looks for must be in the range of 1 through 7. The second character can be any number from 0 to 9. If the pattern matches, the diskStat variable is set to an appropriate value, and the case statement exits.
Now examine the conditions near the end of the case statement:
[9][0-9])
diskStat="warning, running out of space: $usedSpc"
echo $diskStat $ml | mailx -s "space on: $BOX" dba@gmail.com
;;
[1][0][0])
diskStat="update resume, no space left: $usedSpc"
echo $diskStat $ml | mailx -s "space on: $BOX" dba@gmail.com
;;
The idea here is that if a mount point is 90% used or above, send an e-mail to the DBA, warning that disk space is getting low. Finally, if no match is made, the catchall clause *) will be executed, and the case statement will be exited.
The structure of the case statement allows you to perform complicated comparisons that would probably drive you crazy if you tried to code them using if/then/else statements. Table 7-4 lists some common pattern-matching characters used in case statements.
Table 7-4. Common Character-Matching Patterns
Pattern
Description
a|b
Matches either a or b
*
Matches any string of characters, often used for a catchall
[abc]
Matches any character a, b, or c
[a-c]
Matches any character a, b, or c
[0-9]
Matches any character 0 through 9
"<string>"
Matches the string enclosed in the quotes
One final note: typically we’ll use the cron utility (see Chapter 10 for details) to automatically have the disk monitoring script run on a periodic basis (e.g., once every 30 minutes or once per hour). In this way, we’re warned when a server is running out of space. You may be thinking, “Isn’t it the system administrator’s job to monitor disk space?” Yes it is, but we’ve been in numerous situations (too many to count) in which a mount point filled up, the database ceased working, and the production support DBA gets called. By running your own disk space–monitoring script, you’ll give yourself some warning regarding disk fullness issues and save yourself a lot of headaches.
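For example, a cron entry such as the following (the directory paths are hypothetical) runs filesp.bsh every 30 minutes and captures its output in a log file:
0,30 * * * * /home/oracle/bin/filesp.bsh 1>/home/oracle/bin/log/filesp.log 2>&1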
7-5. Repeating a Task
Problem
You want to perform a check on several databases running on a server. You don’t want to have to create a script for each database; in other words, you’d rather write one script in a flexible manner to be used for all databases.
Solution
A for loop allows you to re-run a section of code a fixed number of times. This control construct is particularly useful because DBAs often have a known set of databases or files that need to be operated on. The for loop syntax is as follows:
for name [in list]
do
  commands that can use $name
done
The following code illustrates the power of a for loop. In this environment, there are three databases that are being monitored for a critical background process. The for loop allows you to provide an input list and have the same code re-executed for each database name in the input list:
#!/bin/bash
SID_LIST="dev1 dev2 dev3"
critProc=ora_smon
for curSid in $SID_LIST
do
  ps -ef | grep -v 'grep' | grep ${critProc}_$curSid
  if [ $? -eq 0 ]; then
    echo "$curSid is available."
  else
    echo "$curSid has issues." | mail -s "issue with  $curSid" dba@gmail.com
  fi
done
exit 0
In this manner, you can efficiently use code to repeat a task.
How It Works
The for loop iterates through each argument passed in to the parameter list. This control structure is ideal for a fixed input list. (Depending on which shell you use, the syntax may be slightly different from the one described in the “Solution” section.)
There are a few other aspects about looping that require further analysis. For example, you can use the built-in Bash shell $@ variable to pass a parameter list to a for loop. The $@ variable contains a quoted list of arguments passed to the script. By default, a for loop will use $@ if no input list is provided. The previous code snippet can be slightly modified to take advantage of this technique, as shown here:
#!/bin/bash
critProc=ora_smon
for curSid in $@
do
  ps -ef | grep -v 'grep' | grep -i ${critProc}_$curSid
  if [ $? -eq 0 ]; then
    echo "$curSid is available."
  else
    echo "$curSid has issues." | mail -s "issue with  $curSid" dba@gmail.com
  fi
done
exit 0
Assume that the preceding bit of code is placed in a file named dbup.bsh. It can now be run from the command line to pass in a list of databases to check:
$ dbup.bsh dev1 dev2 dev3
One last note: there are many different methods to implement a for loop. For example, it is possible to iterate through a for loop based on the output of a command:
for a in $(ls /home/oracle)
do
    echo "$a"
done
You can also iterate based on a sequence of numbers:
for a in {1..10}
do
  echo "$a "
done
Here’s another common method for looping through a sequence of numbers:
for ((a=1; a <= 10; a++))
do
  echo "$a "
done
The method that you choose depends on your personal preference and the task at hand.
7-6. Iterating Until a Condition Is Met
Problem
You want to perform an operation an unknown number of times until a certain condition is achieved.
Solution
The while and until flow control constructs allow a piece of code to iterate until a condition is met. In contrast with a for loop construct, the while and until loops are useful when the number of times needed to continue looping is not known beforehand. The while loop runs until a test condition has a zero exit status. The syntax for the while loop is as follows:
while condition ; do
   commands
done
A small example will demonstrate the utility of the while loop. Suppose that you want to check the sqlplus process count on a box every 15 seconds. If it exceeds a certain value, you want to send an e-mail and exit. Here’s some code to do just that:
#!/bin/bash
crit_var=0
while [ $crit_var -lt 300 ]; do
  crit_var=$(ps -ef | grep sqlplus | grep -v grep | wc -l)
  echo "Number of sqlplus processes: $crit_var"
  sleep 15
done
echo $crit_var | mailx -s "too many sqlplus procs" dba@gmail.com
exit 0
The until control construct is similar to the while loop. The until loop runs until a test condition has a nonzero exit status:
until condition ; do
  commands
done
Next is a script that uses an until loop in the same fashion as the while loop example:
#!/bin/bash
crit_var=0
until [ $crit_var -ge 300 ]; do
  crit_var=$(ps -ef | grep sqlplus | grep -v grep | wc -l)
  echo "Number of sqlplus processes: $crit_var"
  sleep 15
done
echo $crit_var | mailx -s "too many sqlplus procs" dba@gmail.com
exit 0
In this way, you can continually execute a task until a condition is met.
How It Works
The while or until constructs are useful when you need to iterate but don’t know in advance the number of iterations. In other words, the requirement is to loop until a condition has been met and then exit.
Here’s another useful example of using a while loop. Sometimes it is useful when debugging scripts to iterate through all arguments passed to a shell script and view the parameter values. This snippet of code uses a while loop to display all parameters passed into a script:
while [ $# -ne 0 ]; do
  echo $1
  shift 1
done
The code is placed in a shell script named test.bsh, made executable, and run as follows:
$ chmod +x test.bsh
$ test.bsh dev1 dev2 dev3
Here is the corresponding output:
dev1
dev2
dev3
In the previous code sample, the shift command is used to move the positional parameters one position to the left. You can think of the positional parameters as an array of values, and (when invoked) the shift command (destructively) moves these values left in the array by the specified number of values.
An example helps to clarify this shifty concept. Suppose that there are three parameters passed into a program: A, B, and C. The positional variable $1 will contain A, $2 will contain B, and $3 will contain C. When you issue the shift 1 command, $1 now contains B, $2 contains C, and $3 now contains nothing. Another shift will move C into $1, and $2 and $3 will now be empty, and so forth.
One last note: there are many different ways to implement a while loop. Here’s a common while loop structure that C programmers will recognize:
((a = 1))
while (( a <= 10 ))
do
  echo "$a "
  ((a += 1))
done
Here’s an example of using a while loop with a function (more on functions later in this chapter):
a=0
condition ()
{
  ((a++))
  if [ $a -lt 11 ]
  then
    return 0  # true
  else
    return 1  # false
  fi
}
while condition
do
  echo "$a"
done
And here is yet another way to implement an until loop:
a=1
until (( a > 10 ))
do
  echo "$a"
  (( a++ ))
done
The method that you choose depends on your personal preference and the task at hand.
7-7. Displaying a Menu of Choices
Problem
You want to present a menu of choices for the shell script user to pick from.
Solution
The select command allows you to create a menu from an input list. If the input list is omitted, the positional parameters (contained in the $@ variable) are used to construct the menu. The syntax of the select command is nearly identical to that of the for command:
select name in [input list]
do
  commands that use $name
done
Listed next is a shell script that uses the select command to query the contents of the /etc/oratab file (/var/opt/oracle/oratab in Solaris environments) and sets your Oracle OS variables, depending on which value for ORACLE_SID that you chose:
#!/bin/bash
# Sets Oracle environment variables.
# Setup: 1. Put oraset file in /etc (Linux), in /var/opt/oracle (Solaris)
#        2. Ensure /etc or /var/opt/oracle is in $PATH
# Usage: batch mode: . oraset <SID>
#        menu mode:  . oraset
#====================================================
if [ -f /etc/oratab ]; then
  OTAB=/etc/oratab
elif [ -f /var/opt/oracle/oratab ]; then
  OTAB=/var/opt/oracle/oratab
else
     echo 'oratab file not found.'
     exit
fi
#
if [ -z "$1" ]; then
  SIDLIST=$(egrep -v '#|\*' ${OTAB} | cut -f1 -d:)
  # PS3 indicates the prompt to be used for the Bash select command.
  PS3='SID? '
  select sid in ${SIDLIST}; do
    if [ -n "$sid" ]; then
      HOLD_SID=$sid
      break
    fi
  done
else
  if egrep -v '#|\*' ${OTAB} | grep -w "${1}:">/dev/null; then
    HOLD_SID=$1
  else
    echo "SID: $1 not found in $OTAB"
  fi
  shift
fi
#
export ORACLE_SID=$HOLD_SID
export ORACLE_HOME=$(egrep -v '#|\*' $OTAB|grep -w $ORACLE_SID:|cut -f2 -d:)
export ORACLE_BASE=${ORACLE_HOME%%/product*}
export TNS_ADMIN=$ORACLE_HOME/network/admin
export ADR_BASE=$ORACLE_BASE/diag
export PATH=$ORACLE_HOME/bin:/usr/ccs/bin:/opt/SENSsshc/bin/\
:/bin:/usr/bin:.:/var/opt/oracle:/usr/sbin:/etc
export LD_LIBRARY_PATH=/usr/lib:$ORACLE_HOME/lib
In this example, the code is placed in a file named oraset in the /etc directory on Linux (or /var/opt/oracle on Solaris). The oraset script is made executable as follows:
$ chmod +x oraset
Before you run oraset, view the contents of the oratab file for this example:
ORA12CR1:/orahome/app/oracle/product/12.1.0.1/db_1:N
O1212:/orahome/app/oracle/product/12.1.0.2/db_1:N
TRG:/orahome/app/oracle/product/12.1.0.2/db_1:N
O112:/orahome/app/oracle/product/11.2.0.4/db_1:N
ORA117:/orahome/app/oracle/product/11.1.0/db_1:N
The names of the databases in the preceding text are ORA12CR1, O1212, and so on. The path to each database’s home directory is next in the line (separated from the database name by a :). The last column should contain a Y or an N, which indicates whether you want the databases to automatically be restarted when the system reboots (this takes additional setup besides just the Y/N).
Now run oraset from the command line as follows:
$ . /etc/oraset
When running oraset from the command line, you should be presented with a menu such as this (derived from the database names in your oratab file):
1) ORA12CR1
2) O1212
3) TRG
4) O112
5) ORA117
SID?
In this example, you can now enter 1, 2, 3, 4, or 5 to set the OS variables required for whichever database you want to use. This allows you to set up required Oracle OS variables interactively, regardless of the number of database installations on the server.
How It Works
The Bash shell built-in select command provides an easy way of presenting a menu of choices to the shell script user. The oraset script presented in the “Solution” section of this recipe is a good example of a DBA using the select command to help accurately and consistently set the required OS variables when you have multiple databases running on a server.
When running oraset, be sure to use the . (dot) notation, which instructs the shell to source the script. Sourcing tells your current shell process to inherit any variables set within an executed script. If you don’t use the . notation, the variables set within the script are visible only within the context of the subshell that is spawned when the script is executed.
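You can verify this behavior for yourself. Assuming ORACLE_SID isn't already set in your session, running the script without sourcing it leaves the variable unset in your current shell, whereas sourcing it makes the variable visible:
$ /etc/oraset TRG
$ echo $ORACLE_SID

$ . /etc/oraset TRG
$ echo $ORACLE_SID
TRG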
Note  In the Bash shell, the source command and the . built-in are synonymous.
You can run the oraset script either from the command line or from a startup file (such as .profile, .bash_profile, or .bashrc). For example, place this line in your startup file:
. /etc/oraset
Now every time you log in to the server, you’ll see a menu of choices that you can use to indicate the database for which you want the OS variables set. If you want the OS variables automatically set to a particular database, put an entry in your startup files such as this:
. /etc/oraset TRG
The prior line will run the oraset file for the TRG database and set the OS variables appropriately.
7-8. Running Commands Based on Success/Failure of the Previous Command
Problem
You have some custom database monitoring code that you have set up. The job should be continuously running on the database server, but sometimes the job unexpectedly dies. You want to set up another job that determines whether the job isn’t running; if not, it should restart the monitoring job. In other words, in one line of code, you need to run a command to see whether the process is running; if not, run another command to restart the job.
Solution
In one line of code, use the || and && control operators to conditionally execute a command based on the status of the previously run command. Here’s the basic syntax for how || works:
<run command 1> || <run command 2 if command 1 did not succeed>
And here’s the syntax for the && operator:
<run command 1> && <run command 2 if command 1 succeeded>
Here’s a simple example illustrating how it works:
$ ls myfile.txt || echo "file does not exist"
Here’s the output indicating that the echo command was executed (meaning that the ls command did not succeed in listing the myfile.txt file):
myfile.txt: No such file or directory
file does not exist
Now suppose that you create the following file:
$ touch myfile.txt
Re-run the command:
$ ls myfile.txt || echo "file does not exist"
The echo command is not executed because the ls command succeeded. The only output returned is the output of the ls command:
myfile.txt
Now that you have that background information, examine the following line of code:
$ ps -ef | grep dbwatch | grep -v grep || nohup /home/oracle/bin/dbwatch.bsh &
If the output from ps -ef | grep dbwatch | grep -v grep does not return a value (meaning that dbwatch is not running on the server), the following code is executed to restart the process:
nohup /home/oracle/bin/dbwatch.bsh &
If the ps command does return a value, it means the process is running, so don’t execute the script to restart the job. If you want to automate this, you can place a line in cron as follows:
33 * * * * ps -ef | grep dbwatch | grep -v grep || nohup /home/oracle/bin/dbwatch.bsh &
The preceding code is automatically run every hour (33 minutes after the hour) and checks to see whether dbwatch is running. If not, it is restarted.
How It Works
Sometimes it is useful (on one line of code) to have a command conditionally execute, depending on the success or failure of the previous immediately run command. The && and || operators are designed to do just that. The “Solution” section showed an example using ||; next is an example that uses &&, which means that you want a command to execute if the previous command was successful.
In this next line of code, an e-mail will be sent only if the grep command successfully finds the string "ORA-00600" in the alert.log file:
$ grep ORA-00600 alert*.log && echo "DB prob" | mailx -s "ORA 600 error" dba@gmail.com
On one line of code, you can conditionally execute commands based on the success or failure of the prior command. The examples in this recipe provide another shell-scripting tool that you can use in creative ways to monitor for various activities and alert you if there are problems.
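You can also chain the two operators on one line. In this sketch (reusing the myfile.txt example from the "Solution" section), the command after && runs if the ls succeeds, and the command after || runs if it fails. Be aware that the || command will also run if the && command itself fails, so this form is best kept to simple commands such as echo:
$ ls myfile.txt >/dev/null && echo "file exists" || echo "file missing"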
7-9. Modularizing Scripts
Problem
You want to make your scripts more modular and functional. You determine that shell functions will help accomplish this task.
Solution
Functions, which are commonly used in most programming languages, are blocks of commands that perform actions. You can think of a function as a small script within another script that compartmentalizes a section of code. The idea is to define a section of code once and then call that section of code multiple times from other parts of the program. Using functions allows you to modularize your code and make it more reusable, readable, and maintainable.
Like variables, functions must be declared before you can use them. Not surprisingly, the function command is used to declare functions. To illustrate the use of functions, let’s say you need to create a reusable bit of code that displays some debugging information. This example creates a function named showMsg:
function showMsg {
  echo "----------------------------------------"
  echo "You’re at location: $1 in the $0 script."
  echo "----------------------------------------"
} # showMsg
The function can now be referenced anywhere in the script after the point at which it was declared. For example, suppose you want to use the previous function in a script. You can add it before it is called and then reference the function multiple times within the code:
#!/bin/bash
debug=1
function showMsg {
  echo "----------------------------------------"
  echo "You’re at location: $1 in the $0 script."
  echo "----------------------------------------"
} # showMsg
#
SID_LIST="dev1 dev2 dev3"
critProc=ora_smon
#
if [[ debug -eq 1 ]]; then
showMsg 1
fi
#
for curSid in $SID_LIST
do
  ps -ef | grep -v ’grep’ | grep ${critProc}_$curSid
  if [ $? -eq 0 ]; then
    echo "$curSid is available."
  else
    echo "$curSid has issues." | mail -s "issue with  $curSid" dba@gmail.com
  fi
done
#
if [[ debug -eq 1 ]]; then
showMsg 2
fi
#
exit 0
Assume that the preceding code is placed in a script named d.bsh. After it is executed, here is the output:
----------------------------------------
You're at location: 1 in the ./d.bsh script.
----------------------------------------
----------------------------------------
You're at location: 2 in the ./d.bsh script.
----------------------------------------
In this way, you can use a function to define the code once and execute it many times within the script.
How It Works
Functions allow you to organize large shell scripts into modular pieces of code, allowing for easier debugging, maintenance, and reusability. Functions can also be declared and invoked directly from the OS command line. From a DBA perspective, this gives you a very powerful tool that allows you to create and use any number of useful functions that can be run as if they were OS commands. For example, create a file named dba_fcns and place in it the following commands:
#!/bin/bash
#------------------------------------------------------------------------------#
# sshl : ssh with your login to remote host
  function sshl {
    echo "ssh -l $LOGNAME $*"
    ssh -l $LOGNAME $*
  } # sshl
#------------------------------------------------------------------------------#
# ssho : ssh with the oracle userid to remote host
  function ssho {
    echo "ssh -l oracle $*"
    ssh -l oracle $*
  } # ssho
#------------------------------------------------------------------------------#
# chkps: check for a process on the box
  function chkps {
    ps -ef | grep -i $1 | grep -v grep
  } # chkps
#------------------------------------------------------------------------------#
Now source the file as shown here:
$ . dba_fcns
You now have access to the sshl, ssho, and chkps functions from the OS command line. To illustrate this, the chkps function is called while passing in the string of smon to operate on:
$ chkps smon
You’ll most likely collect many functions in your DBA tool bag to alleviate having to type long, typo-prone shell commands. Yes, for the previous simple functions, you could create aliases to accomplish essentially the same task. However, functions give you the additional ability to combine several different commands, use parameters, and display useful informational messages.
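For instance, here's a hypothetical function that an alias couldn't easily replicate; it combines several commands and uses a parameter to report on the background processes for whichever database name you pass in:
#------------------------------------------------------------------------------#
# chksid: display background processes for a given ORACLE_SID
  function chksid {
    echo "Background processes for $1:"
    ps -ef | grep -i "ora_.*_$1" | grep -v grep
  } # chksid
#------------------------------------------------------------------------------#
After sourcing the file containing the function, you can call it like any other command, for example: chksid O1212.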
7-10. Passing Parameters to Scripts
Problem
You don’t like hard-coding variables in your script. You want to change a script to set variables based on parameters passed to the script so your code is more reusable, flexible, and maintainable.
Solution
First, take a look at this script with hard-coded values in it for the database SID:
#!/bin/bash
ORACLE_SID=brdstn
rman target / <<EOF
backup database;
EOF
The << characters instruct the command running (in this case, the rman utility) to receive its input from anything that appears between the first EOF and the last EOF. You don’t have to use EOF for start and finish markers; you can use any text string. It simply marks the beginning and end of where the shell directs the input for the command running.
If you want to use this script as it is to back up a different database, you have to manually edit it and change the name of the database, which isn’t very efficient. A better approach is to modify the script so that it can dynamically be passed the name of the database to be backed up. Assume for this example that the script name is back.bsh. The script is modified as shown to accept the database name as input:
#!/bin/bash
ORACLE_SID=$1
rman target / <<EOF
backup database;
EOF
In the previous bit of code, $1 is a built-in variable in the Bash shell. The $1 variable holds the first parameter passed to the script when invoking it. Now the script can be run by passing in a database name. In this example, pass in the name of devdb:
$ back.bsh devdb
How It Works
Passing parameters to scripts allows for greater flexibility and reusability of a script. You can pass any number of arguments into a shell script. The first parameter is referenced inside the script as $1, the second parameter is referenced as $2, and so on. These shell variables are known as positional parameters, which are special variables that are set internally by the shell and are available for you to use within a script.
If you pass more than nine positional parameters to a script, you must use braces {} to wrap the number portion of any multidigit parameter when you reference it. Without braces, the variable $10 will be interpreted as the contents of $1 with a 0 concatenated to it, whereas ${10} will be interpreted as the contents of the tenth variable.
An example helps to illustrate this point. Suppose that you pass 10 parameters to a script, as shown here:
$ myscript.bsh a b c d e f g h i j
For illustration purposes, suppose that this line of code is contained within the script:
echo $10
In this case, a result of a0 is produced because it is echoing the contents of parameter 1 concatenated with a 0.
When the braces are used with the echo command, the line of code produces a j, which is the tenth parameter that was passed to the script:
echo ${10}
Remember to use the braces any time you reference a multidigit parameter within a shell script, or else you won’t get the results you intended.
Note that besides positional parameters, the shell provides other special variables for you. Some of the more useful special shell variables are described in Table 7-5. Examples of using these variables are found throughout this chapter.
Table 7-5. Special Shell Variables
Name
Description
$1 - $n
Positional parameters that hold values for parameters passed to the script.
$?
The exit status of the last command. Contains a 0 value for successfully executed commands. Contains a nonzero value for commands that failed. This nonzero value depends on what the command actually returned.
$0
Within a shell script, contains the name of the shell script being executed.
$#
The number of positional parameters passed to a script.
$$
The process number of the shell. Can be used to generate unique file names.
$!
The process number of the most recently executed background process.
$*
Contains all the positional parameters passed to the script.
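As a quick illustration of two of these variables, the following sketch (the file name is hypothetical) uses $$ to generate a unique temporary file name and $! to capture the process number of a background job:
#!/bin/bash
tmpFile=/tmp/psout.$$          # $$ makes the file name unique for each run
ps -ef > $tmpFile
sleep 60 &                     # start a background job
echo "Background process number: $!"
rm $tmpFile
exit 0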
7-11. Processing Parameters
Problem
When passing parameters to a shell script, you require an efficient built-in method for processing the parameters. For example, you want a simple method to test for unexpected parameters or missing parameters.
Solution
The getopts tool is a built-in shell command that provides an efficient mechanism for validating switches and parameters passed into a shell script. The best way to understand how this works is to examine a script that uses getopts to process parameters. Suppose that you have the requirement to pass in to an RMAN backup script the database name to be backed up and whether backup compression should be enabled. You want to be able to see whether the correct parameters have been passed in and display an informative message if the parameters are incorrect. Here is a script that uses getopts to examine and act on parameters passed to an RMAN backup script:
#!/bin/bash
PRG=$(basename $0)
USAGE="Usage: $PRG -s SID [-c compress] [-h]"
if [ $# -eq 0 ]; then
  echo $USAGE
  exit 1
fi
#
# In the OPTSTRING variable, if the first character is a :, then suppress system
# generated messages. If a char is followed by :, then an argument is expected to be
# passed in for a given option. The OPTARG environment variable contains the
# argument passed in for a given option.
#
OPTSTRING=":s:c:h"
while getopts "$OPTSTRING" ARGS; do
  case $ARGS in
  s) ORACLE_SID=${OPTARG}
     ;;
  c) COMP_SWITCH=$(echo ${OPTARG} | tr '[A-Z]' '[a-z]')
     if [ $COMP_SWITCH = "compress" ]; then
       COMP_MODE=" as compressed backupset "
     else
       echo $USAGE
       exit 1
     fi
     ;;
  h) echo $USAGE
     exit 0
     ;;
  *) echo "Error: Not a valid switch or missing argument."
     echo ${USAGE}
     exit 1
     ;;
  esac
done
#
echo rman backup
rman target / <<EOF
backup $COMP_MODE database;
EOF
#
exit 0
Assuming that the prior code is placed in a script named rman.bsh, ensure that the script is executable:
$ chmod +x rman.bsh
The valid parameters to this script are s (accompanied with ORACLE_SID), c (compress), and h (help). First, here’s what happens when the script is run with an invalid parameter:
$ rman.bsh -v mydb
Error: Not a valid switch or missing argument.
Usage: rman.bsh -s SID [-c compress] [-h]
The usage note specifies the correct way to run the script; for example:
$ rman.bsh -s O1212 -c compress
connected to target database: O1212 (DBID=353735090)
RMAN>
Starting backup...
If you want to display help for the shell script, do so as follows:
$ rman.bsh -h
Usage: rman.bsh -s SID [-c compress] [-h]
This simple example demonstrates the flexibility and ease with which parameters are evaluated using getopts.
How It Works
The getopts (get options) utility enables you to efficiently inspect and process command-line switches and parameters. The getopts program ensures that a standard interface is used for shell program parameter handling. The basic syntax for getopts is as follows:
getopts optstring name
The OPTSTRING variable contains the list of options expected to be passed in from the command line when the script is executed. NAME is the variable used to read the command-line options one by one. The getopts command also relies on the OPTARG environment variable. This variable contains the argument value passed in for each option.
With that understanding, the script in the “Solution” section contains the following two lines of code:
OPTSTRING=":s:c:h"
while getopts "$OPTSTRING" ARGS; do
When the first character in OPTSTRING is :, it suppresses any system-generated error messages. In other words, a leading : instructs the getopts command to handle all error messages itself (rather than displaying system-generated messages).
If an option character is followed by :, an argument is expected on the command line. The s and c options are both followed by colons, so they require arguments to be passed into the script. The h option is not followed by a colon, so it does not require an argument.
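When OPTSTRING begins with a colon, getopts also lets you distinguish a missing argument from an invalid switch: it sets the NAME variable to : (a colon) and places the offending option letter in OPTARG. Here is a minimal sketch of a case branch you could add to the script in the "Solution" section to report missing arguments separately:
  :) echo "Error: option -${OPTARG} requires an argument."
     echo ${USAGE}
     exit 1
     ;;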
Note  There is also a getopt command (no s on the end). This command is used in a similar fashion to getopts. View the man getopt documentation for more information.
7-12. Running Database Commands in Scripts
Problem
You want to run a database utility command within a shell script. For example, you want to run an RMAN backup from within a shell script.
Solution
There are several techniques for running database commands from within shell scripts. These two techniques are commonly used:
  • Running commands directly
  • Capturing output in a variable
These techniques are described in the following sections.
Running a Command Directly
Here is a script that invokes the Oracle RMAN utility and takes a backup of the database:
#!/bin/bash
export ORACLE_SID=DEV_DB
rman target / <<EOF
backup database;
EOF
exit 0
The << characters instruct the command that is running (in this case, the rman utility) to receive its input from anything that appears between the first EOF and the last EOF. You don't have to use EOF for the start and finish markers; you can use any text string. The marker simply denotes the beginning and end of the input that the shell directs to the running command.
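For instance, this variation of the prior RMAN script uses the string RMANEOF as the marker and behaves identically:
#!/bin/bash
export ORACLE_SID=DEV_DB
rman target / <<RMANEOF
backup database;
RMANEOF
exit 0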
This technique applies to any Oracle utility. The following runs a SQL*Plus command within a shell script:
#!/bin/bash
export ORACLE_SID=DEV_DB
sqlplus -s <<EOF
/ as sysdba
select sysdate from dual;
EOF
exit 0
Capturing Output in a Variable
Command substitution is a technique in which you run a command and store its output in a variable. You can use command substitution to run a database utility (such as SQL*Plus). For example, you want to determine whether critical materialized views are refreshing on a daily basis. One way of doing this is to select a count from the data dictionary view USER_MVIEWS, where the last refresh date is greater than one day. This bit of code uses command substitution to run a database command and capture the output of the command in the critVar variable:
#!/bin/bash
critVar=$(sqlplus -s <<EOF
pdb_m/abc@papd
SET HEAD OFF FEED OFF
SELECT count(*) FROM user_mviews WHERE sysdate-last_refresh_date > 1;
EXIT;
EOF)
The script returns a value into the variable critVar, and you can test to see whether the value is 0:
if [ $critVar -ne 0 ]; then
mail -s "Problem with MV refresh" dba@gmail.com <<EOF
MVs not okay.
EOF
else
  echo "MVs okay."
fi
exit 0
If the value in the critVar variable isn’t 0, the script will send an e-mail.
How It Works
The basic technique for running database utilities within a script is to run the utility directly (as you would from the command line) or use command substitution. The key here is that you must use the following syntax:
database_utility << EOF
<run database_utility commands>
EOF
The code instructs the shell to execute the database utility; anything between << EOF and the next EOF is then passed as commands to the database utility. The string EOF can be any string; we use EOF as our standard.
The following example uses EOF for the start and end markers. Any text between the start and end markers is interpreted as commands executed by SQL*Plus. This particular script displays any database objects that have been created within the last week:
#!/bin/bash
newobjs=$(sqlplus -s << EOF
fbar/invqi@INVQI
select object_name
from dba_objects
where created > sysdate - 7
and owner not in ('SYS','SYSTEM');
EOF)
echo $newobjs | mailx -s "new objects" dba@gmail.com
exit 0
Be aware that there are two techniques for achieving command substitution in a shell script:
  • $(command)
  • `command`
For example, if you want to return the name of a server into a variable, you can use two techniques. The first is the following:
$ BOX=$(uname -a | awk '{print$2}')
The second is the following:
$ BOX=`uname -a | awk '{print$2}'`
The $(command) syntax is more modern and is the preferred technique. Just be aware that you may still see command substitution implemented with the older `command` syntax.
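One practical advantage of $(command) is that it nests cleanly, whereas nested backticks must be escaped. Both of the following return the string oracle (the directory /home/oracle/scripts is used purely for illustration):
$ PARENT_DIR=$(basename $(dirname /home/oracle/scripts))
$ PARENT_DIR=`basename \`dirname /home/oracle/scripts\``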
One last note: if you're referencing a data dictionary view (within a shell script) that contains a $ as part of the view name, you must escape the dollar sign with a backslash. For example, the following selects from the view V$DATAFILE the number of datafiles that have an OFFLINE status:
#!/bin/bash
nf=$(sqlplus -s << EOF
/ as sysdba
set head off
select count(*)
from v\$datafile
where status='OFFLINE';
EOF)
echo "offline count: $nf"  | mailx -s "# files offline" prod@supp.com
You must escape the $ as shown in the script (e.g., v\$datafile). If you don't escape the $, the shell interprets the view name as a shell variable. The backslash (\) in front of the $ instructs the shell not to treat the $ as a special character.
7-13. Crafting a Robust DBA Shell Script
Problem
You want to write a flexible and reusable shell script that incorporates the techniques used by experienced shell writers.
Solution
Most shell scripts that DBAs use require the following functionality:
  1. Sets the shell
  2. Validates parameters passed to the script
  3. Sets any special variables to be used in the script
  4. Sets the Oracle environment variables
  5. Calls the Oracle utility
  6. Captures the output in a unique log file name
  7. Sends an e-mail indicating the success or failure of the job
  8. Exits the script
Listed next is a basic shell script that uses these techniques to determine whether a SQL*Plus connection can be made to a database. The line numbers have been included for discussion purposes; they should be deleted before you attempt to run the script:
1  #!/bin/bash
2  PRG=$(basename $0)
3  #
4  # Validate parameters
5  USAGE="Usage: ${PRG} <database name> "
6  if [ $# -ne 1 ]; then
7     echo "${USAGE}"
8     exit 1
9  fi
10  #
11  # Set variables used in the script
12  SID=${1}
13  CONSTR=system/foo@${SID}
14  MAILX='/bin/mailx'
15  MAIL_LIST='dba@gmail.com'
16  LOG_DIR=/home/oracle/scripts
17  DAY=$(date +%F)
18  LOG_FILE=${LOG_DIR}/${PRG}.${DAY}.$$.log
19  LOC_SID=O1212
20  BOX=$(uname -a | awk '{print$2}')
21  #
22  # Source oracle variables
23  . /etc/oraset $LOC_SID
24  #
25  # Attempt to connect to database via SQL*Plus
26  crit_var=$(sqlplus -s <<EOF
27  $CONSTR
28  SET HEAD OFF FEED OFF
29  select 'success' from dual;
30  EOF)
31  #
32  # Write output to log file
33  echo ${crit_var} > $LOG_FILE
34  #
35  # Send status
36  echo $crit_var | grep success >/dev/null 2>&1
37  if [[ $? -ne 0 ]]; then
38    $MAILX -s "Problem with ${SID} on ${BOX}" $MAIL_LIST <$LOG_FILE
39  else
40    echo "Success: ${SID} on ${BOX}" | \
41    $MAILX -s "Success: ${SID} okay on ${BOX}" $MAIL_LIST
42  fi
43  #
44  exit 0
Tip  If you're using vi for an editor, use the set number and set nonumber commands to toggle the viewing of line numbers (see recipe 4-11 for more details).
How It Works
The shell script in the “Solution” section of this recipe uses a wide variety of shell-programming techniques. You can use these methods to automate a diverse assortment of DBA tasks. Line numbers are included in the shell program to describe the purpose of each line. Table 7-6 contains a brief description of each line of code.
Table 7-6. Explanation of Shell Script to Check on Database Status
Line Number
Explanation
1
Specifies the Bash shell command interpreter for this script.
2
Captures the name of the shell script in the PRG shell variable. The $0 variable contains the name of the program. The basename command strips off any directory text that is prepended to the program name.
3–4
Comments.
5
Constructs an information string and places it in the USAGE variable.
6–9
If the number of parameters is not equal to 1, displays the script usage string and exits the program. See recipe 7-11 for an advanced discussion of processing variables.
10–11
Comments.
12
Sets the SID variable to the parameter passed into the script.
13
Sets the CONSTR variable to contain the SQL*Plus database connection string.
14
Sets the MAILX variable to the path and name of the mail utility on the server.
15
Specifies the e-mail address of the DBA(s) to receive the job status.
16
Sets the LOG_DIR variable to the directory of the log files.
17
Sets the DAY variable to the current date string.
18
Specifies the LOG_FILE to be a combination of the program name and date. The $$ variable is a unique process identifier that allows you to generate multiple log files per day.
19
Sets the LOC_SID to a local instance name on the box that the shell script is running on.
20
Sets the BOX variable to contain the name of the local database server.
21–22
Comments.
23
Uses a program to set the required OS variables such as ORACLE_HOME. See recipe 7-7 for an example of a file that sets the Oracle variables.
24–25
Comments.
26
Captures in the crit_var variable the output of the SQL*Plus command. Initiates a connection to SQL*Plus. EOF specifies the starting point for the SQL*Plus commands.
27
Connects to SQL*Plus with the value in CONSTR.
28–29
Runs the SQL*Plus formatting and SQL command.
30
EOF specifies the end of the text to be interpreted as SQL.
31–32
Comments.
33
Writes the contents of the crit_var variable to the log file.
34–35
Comments.
36
Examines the contents of the crit_var variable for the string success that should have been returned from the SQL*Plus command.
37
$? contains the status of the previously run command. Checks to see whether the previous grep command found the string success. If the grep command succeeded, $? will contain a 0. The $? variable will contain a nonzero value if the grep command does not find the string success in the crit_var variable.
38
Sends an e-mail indicating there is a problem.
39–41
$? is equal to 0; therefore, the grep command found the string success in the crit_var variable. Sends an e-mail indicating that the database is up.
42
The end of the if statement.
43
Blank comment line.
44
Exits the shell script with a success status (indicated by a 0).
CREATING A LOCK FILE
One common method to ensure that only one instance of a shell script is ever running at a time is to create a lock file for the script. When the script starts, if the lock file already exists, the job is either currently running or it previously terminated abnormally (leaving the lock file behind). Place the following code at the beginning of your script. Modify the LOCKFILE parameter to match your environment:
LOCKFILE=/ora01/oradata/BRDSTN/lock/rman.lock
if [ -f $LOCKFILE ]; then
  echo "lock file exists, exiting..."
  exit 1
else
  echo "DO NOT REMOVE,  RMAN LOCKFILE" > $LOCKFILE
fi
At the end of the script, remove the lock file:
if [ -f $LOCKFILE ]; then
  rm $LOCKFILE
fi
The use of a lock file ensures that if the script is already running and is called again, it won’t start a new job.
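A variation on this approach uses the Bash trap command so that the lock file is also removed when the script terminates abnormally (e.g., it is killed or exits on an error). Here is a minimal sketch; as before, modify the LOCKFILE parameter to match your environment:
LOCKFILE=/ora01/oradata/BRDSTN/lock/rman.lock
if [ -f $LOCKFILE ]; then
  echo "lock file exists, exiting..."
  exit 1
fi
echo "DO NOT REMOVE, RMAN LOCKFILE" > $LOCKFILE
# Remove the lock file on script exit, normal or abnormal
trap 'rm -f $LOCKFILE' EXIT
# ...remainder of the script goes here...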
7-14. Running Scripts in the Background
Problem
You work in a distributed environment and have database servers in remote locations. You want to run a job in the background that will continue to run, even if there are network problems or after you log off the box.
Solution
Use the & (ampersand) character to place a job in the background. This example runs the rman.bsh script in the background:
$ rman.bsh &
[1] 6507
From the previous output, [1] indicates the job number, and 6507 is the process identifier. You can verify that the program is running in the background via the jobs command:
$ jobs
[1]+  Running                rman.bsh &
On some older systems, you may be required to use the nohup (no hangup) command to ensure that the job will still execute even if you log off the server. If using an older shell, use this syntax to place a job in the background:
$ nohup rman.bsh &
To stop a background job, use the kill command. This next line of code stops job 1:
$ kill %1
How It Works
Sometimes you’ll need to run jobs in the background. Running a job in the background has the advantage of continuing to execute even after the following situations occur:
  • You log off the box
  • Network issues cause your terminal session to become disconnected
  • Your session gets disconnected due to a server session timeout setting
For long-running jobs, you can run them in the background and not have to worry about restarting the job just because you become disconnected from the server.
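If you launch several jobs in the background from within a script, the Bash built-in wait command pauses the script until those jobs complete. Here is a minimal sketch (the export script names are hypothetical):
#!/bin/bash
exp_sales.bsh &
exp_hr.bsh &
# wait blocks until all background jobs started by this script finish
wait
echo "all exports complete"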
Explaining & and nohup
By default, when you run a shell script from the command line, it will run in the foreground. To execute a job in the background, place an ampersand character at the end of the command string. Here is the general syntax:
$ <command>  &
Using & ensures that a command will continue to run, even if you log off the box. On some older systems, you may be required to use the nohup command to achieve this functionality. In this example, the nohup command is used to run a shell script in the background:
$ nohup export_db.bsh &
By default, the output from a nohup command is written to a file named nohup.out. You can monitor the job by continuously viewing the output file:
$ tail -f nohup.out
You can redirect the output to the file of your choice as follows:
$ nohup export_db.bash >exp.out &
You can interactively monitor the output of the previous job by viewing it with the tail -f command:
$ tail -f exp.out
Note  If you want a job to consistently run in the background at a specified time, use a scheduling utility such as cron. See Chapter 10 for details on automating jobs.
Using screen to Detach and Reattach to a Session
Using the & and nohup commands is the traditional way of keeping a job running in the background. You can also use the Linux screen command to achieve the same result with significantly more functionality. The screen command starts a terminal session on your server that will persist for you even if there is an unexpected network disruption.
To start a screen session, issue the following command:
$ screen
If you receive an error message such as “Cannot open terminal /dev/pts/1,” change the permissions on that file as root:
# chmod a+rw /dev/pts/1
When you invoke screen, you see a terminal from which you can type commands and run scripts. The difference between a screen session and a normal terminal session is that the screen session will continue to run even after you are detached.
For example, suppose that you are at your work location and you log on to a database server and start a screen session. You then start a long-running backup job in your screen session. After the job is started, you detach from the screen session by pressing Ctrl+A and then the D key (press the Ctrl and the A key at the same time, release them, and then press the D key). You can then drive home, remotely log on to the database server, and reattach to the screen session you started while you were at work. You can monitor the backup job as if you were looking at the same terminal you started at work.
Here’s a simple example of how this works. Type screen, as shown here:
$ screen
Print the current working directory so that you have some output on the screen that you can recognize when you attach to this screen session from another terminal:
$ pwd
/home/oracle
Now press Ctrl+A and then the D key, which detaches you from the screen session. You should see the following message:
[detached]
Now start a different terminal session and log on to the database server. Issue the following command to display any detached screen sessions:
$ screen -ls
There is a screen on:
        31334.pts-1.rmougprd2   (Detached)
1 Socket in /tmp/uscreens/S-oracle.
You can reattach to any screen session using the -r (reattach) option followed by [[pid.]tty[.host]]. For this particular example, you can re-establish the screen connection by typing this:
$ screen -r 31334.pts-1.rmougprd2
You should now see the output of the previously entered pwd command. It is as if you never left the screen terminal session. This is a very powerful utility that can be used to start jobs and then monitor them from another remote terminal session. You can even share a screen session with other users.
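If you work with several screen sessions, it can be easier to name each one with the -S option and then reattach by name instead of by process ID; for example:
$ screen -S rmanbackup
After detaching (Ctrl+A, then D), reattach to the named session as follows:
$ screen -r rmanbackup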
To display screen online help, press Ctrl+A and then the ? key. To leave a screen session, use the exit command, which will stop your screen session.
7-15. Monitoring the Progress of a Script
Problem
You’re executing a shell script and want to monitor its progress.
Solution
Sometimes you have to monitor the progress of a long-running shell script. You can use the Linux tail command with the f (follow) switch to display the output of a job as it is written to a log file. In this example, the output of a backup job is redirected to an output file named rmanback.out:
$ rmanback.bash >rmanback.out 2>&1
From another session, the output being written to the log file is interactively viewed with the tail -f command:
$ tail -f rmanback.out
Here is a snippet of typical output that might be displayed to the screen:
channel ORA_DISK_2: starting archive log backupset
channel ORA_DISK_2: specifying archive log(s) in backup set
input archive log thread=1 sequence=868 recid=859 stamp=628426116
...
How It Works
DBAs often use the tail command to monitor files such as alert logs and to spot potential issues with the database as they happen. In this example, you continuously follow the output being written to an Oracle database alert.log file:
$ tail -f alert_BRDSTN.log
Here’s a snippet of typical output written to the alert.log file:
Completed: ALTER DATABASE BACKUP CONTROLFILE TO TRACE
DBID: 2917656785
Thread 1 advanced to log sequence 71
Current  log# 2 seq# 71 mem# 0: /ora01/oradata/BRDSTN/oradata/redo02a.log
When you want to discontinue viewing the contents of a log file, press Ctrl+C to break out of the tail command.
7-16. Debugging a Script
Problem
Your script isn’t doing what you expected. You want to debug the script.
Solution
The Bash shell has several features that are useful for debugging and troubleshooting problems in scripts. The -n (no execution) switch allows you to check the syntax of a script before you run it. To check a Bash shell script for syntax errors, use -n as shown here:
$ bash -n db.bash
If the script contains any errors, it will display a message such as this:
db.bsh: line 10: syntax error: unexpected end of file
Another useful debugging feature is the -o xtrace option, which instructs the Bash shell to display every command before it is executed. This option also shows any variable substitutions and expansions, so you can view the actual variable values used when the shell script executes. You can invoke the -o xtrace feature from the command line as follows:
$ bash -o xtrace <script name>
Notice that the output contains lines that don’t seem to have anything to do with your code:
+ alias ’rm=rm -i’
+ alias ’cp=cp -i’
+ alias ’mv=mv -i’
+ ’[’ -f /etc/bashrc ’]’
+ . /etc/bashrc
+++ id -gn
+++ id -un
+++ id -u
++ ’[’ root = root -a 0 -gt 99 ’]’
++ umask 022
++ ’[’ ’’ ’]’
+ export JAVA_HOME=/opt/java
That's because the first several lines in the output come from code executed in startup scripts. Also of note: the plus signs in the output indicate the nesting level of a command within the script.
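If the trace output is hard to correlate with your code, you can customize the Bash PS4 variable, which controls the prefix of each xtrace line. For example, the following (a standard Bash feature) adds the script name and line number to each traced command:
$ export PS4='+ ${BASH_SOURCE}:${LINENO}: '
$ bash -o xtrace db.bash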
How It Works
As shell scripts become longer and more complex, it can be difficult to pinpoint the source of a bug within a script. This problem can be especially acute when you maintain code that somebody else wrote.
If you just want to see the tracing for specific commands within the script, embed set -o xtrace directly within your code at the desired location. In this example, tracing is turned on before the if statement and then turned off at the end:
set -o xtrace
if [ $? -eq 0 ]; then
  echo "$critProc is available."
else
  echo "$critProc has issues." | mail -s "problem with  $critProc" bbill@gmail.com
fi
set +o xtrace
Here is what the output looks like when the prior script is run with tracing enabled:
++ ’[’ 0 -eq 0 ’]’
++ echo ’ is available.’
 is available.
++ set +o xtrace
To enable a set command feature, you must use the minus (-) sign, which may seem counterintuitive. Equally counterintuitive, use a plus (+) sign to disable a set command feature.
Note  You can also use the set -x command to print each command's parameter assignments before they are executed; to turn off tracing, use set +x. These are identical to the set -o xtrace and set +o xtrace commands.




CHAPTER 8
image
Analyzing Server Performance
The separation of tasks between an SA and a DBA is often blurred. This blurring of roles can be especially true in small shops in which you wear multiple hats. Even in large organizations with established roles and responsibilities, you'll still experience an occasional "all-hands-on-deck" fire drill in which you're expected to troubleshoot server issues. In these scenarios, you must be familiar with the OS commands used to extract information from the server. An expert DBA does not diagnose database problems in a vacuum; you have to be server-savvy.
Whenever there are application-performance issues or availability problems, the first question asked from the DBA’s perspective is usually this one: “What’s wrong with the database?” Regardless of the source of the problem, the burden is often on the DBA to verify whether the database is behaving well. This process sometimes includes identifying server bottlenecks. The database and server have an interdependent relationship. DBAs need to be well-versed in techniques to monitor server activity.
When you have a server that houses dozens of databases, and you experience performance issues, you have to determine whether the bottleneck is related to CPU, memory, I/O, or the network. Furthermore, you have to pinpoint which processes on the box are consuming the most resources. In these scenarios, it is more productive to diagnose issues with OS tools to lead you to the process that is consuming the most server resources. After you identify the process, you can determine whether it is associated with a database and then further identify the type of process (SQL*Plus, RMAN, Data Pump, and so on).
This chapter covers techniques used to analyze the server’s CPU, memory, I/O, and network performance. Take some time to become familiar with the relevant commands covered in each section. Being able to quickly survey system activity will vastly broaden your DBA skill set.
Table 8-1 summarizes the OS utilities commonly used by DBAs and SAs. This table lists the recipe in which the tool is discussed and what aspect of performance the tool covers (note that some tools are covered in other chapters). Being familiar with these OS commands and how to interpret the output will allow you to work as a team with SAs, storage administrators, network engineers, and developers when diagnosing server performance issues.
Table 8-1. Performance and Monitoring Utilities
Table8-1a
Table8-1b
When diagnosing server issues, start with utilities such as vmstat, sar, top, and ps. These tools give you a quick overview of overall system performance. Then you can use other tools to drill down into CPU, memory, I/O, and network details.
8-1. Identifying System Bottlenecks
Problem
Application users are reporting that the database seems slow. You want to determine whether there are any system resource bottlenecks on the database server.
Solution
The vmstat (virtual memory statistics) tool is intended to help you quickly identify bottlenecks on your server. The vmstat command displays real-time performance information about processes, memory, paging, disk I/O, and CPU usage. This example shows using vmstat on a Linux server (the output is slightly different on Solaris) to display the default output with no options specified:
$ vmstat
procs -----------memory------------ --swap-- ----io---- --system-- -------cpu-------
r  b  swpd    free     buff   cache  si   so  bi    bo    in   cs   us sy  id  wa st
0  0     0 1185336 10615632  991736   0    0   5     9     9   12    0  0 100   0  0
Here are some general heuristics you can use when interpreting the output of vmstat:
  • If b (processes sleeping) is consistently greater than 0, you may not have enough CPU processing power.
  • If so (memory swapped out to disk) and si (memory swapped in from disk) are consistently greater than 0, you may have a memory bottleneck. On some systems, these columns may be labeled pi and po.
  • If the wa (time waiting for I/O) column is high, it is usually an indication that the storage subsystem is overloaded.
By default, only one line of server statistics is displayed when running vmstat (without supplying any options). This one line of output displays average statistics calculated from the last time the system was rebooted. Although it is fine for a quick snapshot, use vmstat with this syntax if you want to gather metrics over a period of time:
$ vmstat <interval in seconds> <number of intervals>
In this mode, vmstat reports statistics sampling from one interval to the next. For example, if you want to report system statistics every 2 seconds for 10 intervals, issue this command:
$ vmstat 2 10
The default unit of measure for the memory columns of vmstat is in kilobytes. If you want to view memory statistics in megabytes, use the -S m (statistics in megabytes) option:
$ vmstat -S m
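Combining the interval syntax with the heuristics listed earlier, here is a minimal sketch of a script that samples vmstat and flags high I/O wait (on this Linux version, wa is the 16th column of the output; verify the column position on your system before relying on it):
#!/bin/bash
# Take 5 vmstat samples at 2-second intervals; skip the 2 header lines.
# Flag any sample in which the wa column exceeds 20 percent.
vmstat 2 5 | tail -n +3 | awk '{ if ($16 > 20) print "High I/O wait: " $16 "%" }'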
How It Works
If your database server seems sluggish, analyze the vmstat output to determine where the resources are being consumed. Table 8-2 details the meanings of the columns displayed in the default output of vmstat.
Table 8-2. Column Descriptions of vmstat Output
Column
Description
r
Number of processes waiting for runtime
b
Number of processes in uninterruptible sleep
swpd
Total virtual memory (swap) in use (KB)
free
Total idle memory (KB)
buff
Total memory used as buffers (KB)
cache
Total memory used as cache (KB)
si
Memory swapped in from disk (KB/s)
so
Memory swapped out to disk (KB/s)
bi
Blocks read in (blocks/s) from block device
bo
Blocks written out (blocks/s) to block device
in
Interrupts per second
cs
Context switches per second
us
User-level code time as a percentage of total CPU time
sy
System-level code time as a percentage of total CPU time
id
Idle time as a percentage of total CPU time
wa
Time waiting for I/O completion
st
Time stolen from virtual machine
You can also send the vmstat output to a file, which is useful for analyzing historical performance over a period of time. This example samples statistics every 5 seconds for a total of 60 reports and then records the output in a file:
$ vmstat 5 60 > vmout.perf
Another useful way to use vmstat is with the watch tool. The watch command is used to execute another program on a periodic basis. This example uses watch to run the vmstat command every 5 seconds and to highlight any differences between each snapshot onscreen:
$ watch -n 5 -d vmstat
When running vmstat in watch -d (differences) mode, you see changes onscreen as they alter from snapshot to snapshot. To exit from watch, press Ctrl+C.
You can obtain a better idea of how vmstat operates by viewing the output while simulating some server activity. First, inspect the output of vmstat while there is little system activity:
$ vmstat 2 10
procs -----------memory------------ ---swap-- ----io----  --system-- -----cpu-------
 r  b   swpd    free   buff   cache  si   so   bi    bo     in   cs  us sy  id wa st
 0  0  44588 3170020 184644 5222292   0    0    2    68      1    0   5  1  94  0  0
 0  0  44588 3169392 184644 5222336   0    0    0     8    497  864   2  0  98  0  0
 0  0  44588 3169400 184644 5222396   0    0    0   220    448  746   0  0 100  0  0
Now simulate server activity by creating a large randomly populated 1GB file with the dd command:
$ dd if=/dev/urandom of=GIGtestfile bs=1M count=1024 oflag=direct &
This should have the effect of increasing the CPU usage as the random numbers are generated. Additionally, there should be I/O activity generated as the file is populated. While the prior command is running in the background (via the & operator), you can run vmstat:
$ vmstat 2 10
Here are a few lines of the output indicating that the CPU is less idle (more busy) and that the blocks written out (the bo column) have increased:
procs -----------memory------------ --swap--  ----io----  --system-- -----cpu------
 r  b   swpd    free   buff   cache  si   so   bi    bo     in   cs  us sy id wa st
 1  0  44588 3128704 184664 5256092   0    0    0   912    743  920   1 25 74  0  0
 1  0  44588 3108500 184664 5274432   0    0    0   400    664  934   1 25 74  0  0
 5  0  44588 3070472 184664 5311608   0    0    0   366    647  882   1 25 73  0  0
OSWATCHER
Oracle provides a collection of scripts that gather and store metrics for CPU, memory, disk, and network usage. The OSWatcher tool suite automates the gathering of statistics using tools such as top, vmstat, iostat, mpstat, netstat, and traceroute.
You can obtain OSWatcher from the Oracle MOS web site. Search for document ID 301137.1 or for the document titled “OSWatcher User Guides.” Navigate to the Contents page and search for the Download link.
This utility also has an optional graphical component for visually displaying performance metrics. The OSWatcher utility is currently supported on the following platforms: Linux, Solaris, AIX, and HP-UX. For Windows, Oracle recommends using the Cluster Health Monitor (see document ID 736752.1).
8-2. Analyzing Current and Past System Performance
Problem
Users are reporting that the database application seems sluggish every morning at 10:00 a.m. To troubleshoot the issue, you want to view the current CPU, memory, and I/O load on the server; and display resource activity at 10:00 a.m. for previous days in the week.
Solution
The sar (system activity reporter) utility is unique in that it allows you to view current CPU, memory, and I/O resource usage, as well as server activity for a point in time in the past (e.g., an hour ago or yesterday).
Displaying CPU Use
To show real-time CPU statistics, use the -u (CPU utilization) option and specify a snapshot interval (in seconds) and the number of reports. The following displays current processor activity with a snapshot interval of 2 seconds for a total of 10 reports:
$ sar -u 2 10
Here is some sample output:
12:50:42          CPU     %user     %nice   %system   %iowait    %steal     %idle
12:50:44          all      0.00      0.00      0.00      0.00      0.00    100.00
12:50:46          all      2.49      0.00      0.37      0.12      0.00     97.01
12:50:48          all      0.37      0.00      0.37      0.00      0.12     99.13
The most important column in the output is %idle. A low %idle can indicate that the CPUs are underpowered or that the application load is high. The %iowait column displays the time spent waiting for I/O; it follows that a high %iowait indicates that the I/O subsystem is a potential bottleneck.
If you have multiple CPUs, you can view the output per CPU with the -P ALL options. You should now see one line per CPU in the output:
$ sar -u -P ALL
To report on the current day’s CPU activity, use the -u option without an interval:
$ sar -u
To view a previous day’s statistics, use sar with the -f (file) option. On Linux systems, the files that sar uses to report on statistics for different days of the month are logged in the /var/log/sa directory (on Solaris systems, look in the /var/adm/sa directory). These files have the naming convention of saNN, where NN is the two-digit day of the month. So if today is the ninth day of the month, and you want to report on CPU activity for the eighth day, use the following:
$ sar -u -f /var/log/sa/sa08
Keep in mind that sar keeps only the last 7 days of history files by default. See the “How It Works” section of this recipe if you need to increase the retention period.
The output of using the -f option can be quite long. You can report on a time range via the -s (start time) and -e (end time) options. For example, to report on memory load starting at 10:00 a.m. and ending at 11:00 a.m., do so as follows:
$ sar -r -f /var/log/sa/sa08 -s 10:00:00 -e 11:00:00
Some older versions of sar don’t provide an ending time option. You can creatively use commands such as grep to filter the output for the desired times; for example:
$ sar -r -f /var/log/sa/sa08 | grep ^10 | grep AM
In this manner, you can narrow down the range of activity you want to display.
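To check the same window across several days (such as the 10:00 a.m. slowdown described in the "Problem" section), you can loop over the retained day files. Here is a minimal sketch, assuming the Linux file location and a sar version that supports the -s and -e options:
#!/bin/bash
# Report CPU activity between 10:00 and 11:00 a.m. for each retained day file
for f in /var/log/sa/sa[0-9]*; do
  echo "=== $f ==="
  sar -u -f $f -s 10:00:00 -e 11:00:00
done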
Displaying Memory Activity
Use sar with the -r (report memory) option to report on memory statistics. To show real-time memory statistics, specify a snapshot interval (in seconds) and the number of reports. The following displays current memory activity with a snapshot interval of 2 seconds for a total of 10 reports:
$ sar -r 2 10
Here is a small snippet of output:
08:44:45 AM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused
kbswpcad
08:44:47 AM   2616532  13005612     83.25    200668   5705848  10243940     44500      0.43
836
...
The output shows the total memory free and used, and the amount of swap space used. A high degree of swapping indicates that you may need more memory. When run in this mode, the output can be wide and lengthy; it doesn’t quite fit within the limits of this physical page.
To report on real-time swapping statistics, specify the -W option. This example generates snapshots of current swapping statistics every 3 seconds for a total of 10 reports:
$ sar -W 3 10
Here is some sample output indicating that little or no swapping is occurring:
08:54:01 AM  pswpin/s pswpout/s
08:54:04 AM      0.00      0.00
08:54:07 AM      0.00      0.00
To report on the current day’s memory activity, use the -r option without an interval:
$ sar -r
To view historical memory statistics, use sar with the -f (file) option. For example, to have sar display memory paging statistics for the first day of the month, run it with the -B (report paging statistics) and -f (file) options, as follows:
$ sar -B -f /var/log/sa/sa01
Here is a partial listing of the report:
11:10:01 AM  pgpgin/s pgpgout/s   fault/s  majflt/s
11:20:01 AM      0.02     16.17     18.37      0.00
11:30:01 AM      3.49     21.68     74.15      0.04
11:40:01 AM   4182.58    439.44    320.94      0.68
11:50:02 AM   4960.03    1027.79  4384.73      0.51
The previous output shows a substantial increase in pages paged in from disk (pgpgin/s), pages paged out to disk (pgpgout/s), and page faults per second (fault/s) at approximately 11:40 a.m.
Displaying I/O Load
Use sar with the -b (report I/O) option to report on I/O statistics. To show real-time I/O statistics, specify a snapshot interval (in seconds) and the number of reports. On Linux systems, the following displays current I/O activity with a snapshot interval of 2 seconds for a total of 10 reports:
$ sar -b 2 10
Here’s a partial snippet of the output (this output may vary depending on your version of Linux and the sar command):
09:01:19 AM       tps      rtps      wtps   bread/s   bwrtn/s
09:01:21 AM     19.39      0.00     19.39      0.00    448.98
09:01:23 AM     13.93      0.00     13.93      0.00    366.17
...
The tps column shows the I/O transfers per second to the device. The rtps indicates read requests per second, and the wtps shows write requests per second.
Note  On Solaris systems, use sar with the -d (disk) option to report on disk activity. The -d option may be available on Linux, depending on the version.
To report on the current day’s I/O activity, specify the -b option with no time interval:
$ sar -b
To report on I/O statistics for a previous day in the month, use -b with the -f option. For example, to have sar display disk statistics for the tenth day of the month, run it as follows:
$ sar -b -f /var/log/sa/sa10
How It Works
The sar utility is used to generate current load metrics as well as report on system resource usage for a point in time in the past. If you have the requirement of troubleshooting performance issues that have occurred in the past, sar is the utility to use.
You can think of sar as an AWR or Statspack for the OS. The AWR and Statspack tools allow you to view database activity for a range of time in the past; they depend on scheduled database jobs that periodically populate permanent database tables with historical database activity. The sar utility is similar in that it allows you to report on historical server activity for CPUs, memory, and I/O, except that it relies on cron jobs to periodically populate OS files that can be used for historical reporting on server activity.
With that in mind, let’s look more closely at the sar cron jobs, the resource usage history files, and how to manually create a sar file.
Understanding sar cron jobs
The sar utility is installed as part of the sysstat system package. You can check for its existence as follows:
$ sar -V
sysstat version ...
If sysstat isn’t installed, and if you have root access, you can install it with the yum utility (or whatever utility is your standard for installing packages) as follows:
# yum install sysstat
When you install the sysstat package, sar will be installed along with two cron jobs. These cron jobs will be instantiated to create files used by the sar utility to report on historical server statistics. On Linux systems, you can view these cron jobs by looking in the /etc/cron.d/sysstat file; for example, here’s a sample cron entry:
# run system activity accounting tool every 10 minutes
*/10 * * * * root /usr/lib64/sa/sa1 1 1
# generate a daily summary of process accounting at 23:53
53 23 * * * root /usr/lib64/sa/sa2 -A
Once the cron jobs have been running for a few days, to report on a previous day’s CPU statistics, use the -f option to specify the file that corresponds to the day of interest.
On Solaris systems, you can view the sar-related cron jobs in the /var/spool/cron/crontabs/sys file. Here's a sample entry:
0 * * * 0-6 /usr/lib/sa/sa1
20,40 8-17 * * 1-5 /usr/lib/sa/sa1
5 18 * * 1-5 /usr/lib/sa/sa2 -s 8:00 -e 18:01 -i 1200 -A
Changing sar File Retention Period
On Linux systems, the files that sar uses to report statistics for different days of the month are located in the /var/log/sa directory. On Solaris systems, the sar files are located in the /var/adm/sa directory.
These files have the naming convention of saNN, where NN is the two-digit day of the month. Typically, only the last week or so of files is retained. A quick listing of the /var/log/sa directory on a Linux system helps to clarify this:
$ cd /var/log/sa
$ ls -la sa[0-9]*
Here is some sample output:
-rw-r--r-- 1 root root 332016 Jun  1 16:50 sa01
-rw-r--r-- 1 root root 332016 Jun  2 16:50 sa02
-rw-r--r-- 1 root root 332016 Jun  3 16:50 sa03
-rw-r--r-- 1 root root 332016 Jun  4 16:50 sa04
-rw-r--r-- 1 root root 332016 Jun  5 16:50 sa05
-rw-r--r-- 1 root root 332016 Jun  6 16:50 sa06
-rw-r--r-- 1 root root 255984 Jun  7 11:20 sa07
-rw-r--r-- 1 root root 332016 May 30 16:50 sa30
-rw-r--r-- 1 root root 332016 May 31 16:50 sa31
From the prior output, the last 9 days of files are retained. These files are created and populated by the sar cron jobs. The sar utility uses the information contained therein to report on historical performance metrics.
On Linux systems, you can control the number of files retained by modifying the HISTORY parameter in the /etc/sysconfig/sysstat file. The location of the sysstat file varies by OS.
On Solaris systems, the old sar files are removed by the /usr/lib/sa/sa2 utility. The last line of the sa2 script finds the oldest sar files and removes them; for example:
/usr/bin/find /var/adm/sa \( -name 'sar*' -o -name 'sa*' \) -mtime +7 -exec /usr/bin/rm {} \;
As root, you can manually adjust the -mtime parameter as required. Usually the default of 7 days is sufficient.
Creating a sar File
While reporting on real-time statistics, use the -o (out) option to send output to a file:
$ sar -b 2 10 -o saroutJun1.perf
This code creates a binary output file that can later be used to analyze disk I/O metrics. At some later point, you can use sar with the -f option to report on the contents of that file; for example:
$ sar -b -f saroutJun1.perf
This code provides a way of indefinitely saving the sar metrics for a given point in time. During a quiet period, you might want to do this to establish a baseline of metrics, or during a heavy load to capture specific metrics that you want to analyze at some later point (without worrying about the file being automatically deleted).
8-3. Identifying CPU-Intensive Processes
Problem
You want to identify which Oracle session is consuming the most CPU on the database server. If it is an Oracle session running a SQL query, you want to display the associated SQL.
Solution
There are two tools that are useful for quickly identifying top CPU-consuming processes: top and ps. First, let’s discuss top.
Using top
The top utility is one of the first tools a DBA or SA will use to view server resource usage. This utility provides a dynamic report that refreshes every few seconds and displays the top resource-consuming processes; for example:
$ top
Here is a partial listing of the output:
top - 15:31:27 up 7 days,  5:17,  2 users,  load average: 0.17, 0.35, 0.37
Tasks: 222 total,   2 running, 220 sleeping,   0 stopped,   0 zombie
Cpu(s):  8.2%us, 16.3%sy,  0.0%ni, 75.4%id,  0.0%wa,  0.0%hi,  0.0%si,  0.1%st
Mem:  15622144k total, 15524472k used,    97672k free,  9085000k buffers
Swap: 10288440k total,        0k used, 10288440k free,  3160848k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9156 oracle    25   0 2595m 366m 184m R 99.9  2.4  13:51.84 oracle_9156_o12
    1 root      15   0 10368  676  572 S  0.0  0.0   0:00.17 init
    2 root      RT  -5     0    0    0 S  0.0  0.0   0:00.38 migration/0
    3 root      34  19     0    0    0 S  0.0  0.0   0:00.05 ksoftirqd/0
...
The first five lines show system load, summary of processes running, CPU load, memory load, and swap statistics. The lines after that show individual processes and corresponding resource consumption. For this particular example, the Oracle process 9156 is consuming a great deal of CPU. The output will refresh every few seconds. To exit top, press Ctrl+C.
Using ps
Now compare the use of top with the ps command. The ps command (in combination with the pcpu option) is used to identify the PIDs of sessions consuming the most CPU on the server; for example:
$ ps -e -o pcpu,pid,user,tty,args | sort -n -k 1 -r | head
Here is a partial listing of the output:
31.0  9156 oracle   ?        oracleO1212 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
1.0   9155 oracle   pts/2    sqlplus   as sysdba
...
The first column is the percentage of CPU being consumed. The second column shows the PID (process 9156 is consuming the most CPU on this server). The third column shows that the oracle user is running this process and we can derive the database name from the fifth column to be O1212.
We recommend that you create an alias for the ps command; for example:
$ alias topcpu='ps -e -o pcpu,pid,user,tty,args | sort -n -k 1 -r | head'
This example allows you to quickly run the command without having to remember the syntax:
$ topcpu
Retrieving Oracle Process Information
The main takeaway from the prior sections using top and ps is that process 9156 is consuming a great deal of CPU on the server and that it is an Oracle process associated with the O1212 database. Now you can use the PID from the output as an input to the following query to show information about the Oracle session responsible for the high consumption of CPU resources:
SET LINES 200 PAGES 0 HEAD OFF LONG 100000
COL dummy_value NOPRINT
--
SELECT 'dummy_value' dummy_value,
  'USERNAME    : ' || s.username     || CHR(10) ||
  'SCHEMA      : ' || s.schemaname   || CHR(10) ||
  'OSUSER      : ' || s.osuser       || CHR(10) ||
  'MODULE      : ' || s.module       || CHR(10) ||
  'ACTION      : ' || s.action       || CHR(10) ||
  'CLIENT INFO : ' || s.client_info  || CHR(10) ||
  'PROGRAM     : ' || s.program      || CHR(10) ||
  'SPID        : ' || p.spid         || CHR(10) ||
  'SID         : ' || s.sid          || CHR(10) ||
  'SERIAL#     : ' || s.serial#      || CHR(10) ||
  'KILL STRING : ' || '''' || s.sid || ',' || s.serial# || ''''  || CHR(10) ||
  'MACHINE     : ' || s.machine      || CHR(10) ||
  'TYPE        : ' || s.type         || CHR(10) ||
  'TERMINAL    : ' || s.terminal     || CHR(10) ||
  'CPU         : ' || q.cpu_time/1000000     || CHR(10) ||
  'ELAPSED_TIME: ' || q.elapsed_time/1000000 || CHR(10) ||
  'BUFFER_GETS : ' || q.buffer_gets  || CHR(10) ||
  'SQL_ID      : ' || q.sql_id       || CHR(10) ||
  'CHILD_NUM   : ' || q.child_number || CHR(10) ||
  'START_TIME  : ' || TO_CHAR(s.sql_exec_start,'dd-mon-yy hh24:mi') || CHR(10) ||
  'STATUS      : ' || s.status       || CHR(10) ||
  'SQL_TEXT    : ' || q.sql_fulltext
FROM            v$session s
JOIN            v$process p ON (s.paddr  = p.addr)
LEFT OUTER JOIN v$sql     q ON (s.sql_id = q.sql_id)
WHERE s.username IS NOT NULL -- eliminates background procs
AND NVL(q.sql_text,'x') NOT LIKE '%dummy_value%' -- eliminates this query from output
AND   p.spid           = '&PID_FROM_OS'
ORDER BY q.cpu_time;
For this example, when you run the prior query and supply to it the PID of 9156, you get the following output:
USERNAME    : SYS
SCHEMA      : SYS
OSUSER      : oracle
MODULE      : sqlplus@dtc07dsg (TNS V1-V3)
ACTION      :
CLIENT INFO :
PROGRAM     : sqlplus@dtc07dsg (TNS V1-V3)
SPID        : 9156
SID         : 91
SERIAL#     : 60916
KILL STRING : ’91,60916’
MACHINE     : dtc07dsg
TYPE        : USER
TERMINAL    : pts/2
CPU         : 275.473216
ELAPSED_TIME: 279.805467
BUFFER_GETS : 64650
SQL_ID      : 1z4xyfmw1rpqy
CHILD_NUM   : 0
START_TIME  : 02-may-15 15:09
STATUS      : ACTIVE
SQL_TEXT    : select a.table_name from dba_tables a, dba_indexes, dba_extents, ...
From the prior output, you see that a SQL*Plus session is consuming a great deal of CPU time. You can also determine when the query started, the username, the kill string, and the SQL identifier from the output.
Once you identify information regarding the process, you can drill down further to display the execution plan and the resources the process is waiting for. For example, you can view the SQL execution plan with the SQL ID and child number:
SQL> SELECT * FROM table(DBMS_XPLAN.DISPLAY_CURSOR('&sql_id',&child_num));
Here’s another useful query that uses the SID and serial number to display the state of a session and whether it is working or waiting for a resource:
SELECT sid,
DECODE(state, 'WAITING','Waiting', 'Working') state,
DECODE(state, 'WAITING', 'So far '||seconds_in_wait,
'Last waited '|| wait_time/100)|| ' seconds for '||event
FROM v$session
WHERE sid = '&&session_id'
AND serial# = '&&serial';
This information is very useful when diagnosing performance issues related to Oracle processes running on the server.
How It Works
When you run multiple databases on one server and are experiencing server-performance issues, it can be difficult to identify which database and session are consuming the most system resources. In these situations, use the top utility or the ps command to identify the highest-consuming processes. The top utility provides a dynamic interface that periodically refreshes to give you the current snapshot of resource consumption, whereas the ps command provides a quick one-time snapshot of top resource usage.
You may be wondering whether it is better to use top or ps to identify resource-consuming processes. If your server has top installed, top will probably be your first choice; its ease of use and interactive output is hard to beat. Having said that, the ps command is universally available, so you may have to use ps if the top command is not available. Also the ps command may show you a more descriptive program name associated with the process.
Once you have a process identified, if it's an Oracle process, use the SQL query in the "Solution" section to further identify the type of Oracle process. You then have the option of trying to tune the operation (whether it be SQL, RMAN, and so on), or you might want to terminate the process (see recipe 3-2 for details on how to kill a process and/or stop a SQL session).
Explaining top
The top command deserves a little more explanation than what was shown in the “Solution” section of this recipe. By default, top will repetitively refresh (every 3 seconds) information regarding the most CPU-intensive processes. While top is running, you can interactively change its output. For example, if you type >, the column that top is sorting moves one position to the right.
Table 8-3 lists some key features that you can use to alter the top display to the desired format.
Table 8-3. Commands to Interactively Change the top Output
Command
Function
Spacebar
Immediately refreshes the output.
< or >
Moves the sort column one position to the left or to the right. By default, top sorts on the CPU column.
d
Changes the refresh time.
R
Reverses the sort order.
z
Toggles the color output.
h
Displays the help menu.
F or O
Chooses a sort column.
Type q or press Ctrl+C to exit top. Table 8-4 describes several of the columns displayed in the default output of top.
Table 8-4. Column Descriptions of the top Output
Column
Description
PID
Unique process identifier.
USER
OS username running the process.
PR
Priority of the process.
NI
Nice value of the process. A negative value means high priority; a positive value means low priority.
VIRT
Total virtual memory used by the process.
RES
Nonswapped physical memory used.
SHR
Shared memory used by the process.
S
Process status.
%CPU
Process's percentage of CPU consumption since the last screen refresh.
%MEM
Percent of physical memory the process is consuming.
TIME
Total CPU time used by the process.
TIME+
Total CPU time, showing hundredths of seconds.
COMMAND
Command line used to start a process.
You can also run top using the -b (batch mode) option and send the output to a file for later analysis:
$ top -b > tophat.out
While in batch mode, the top command will run until you kill it (by pressing Ctrl+C) or until it reaches a specified number of iterations. You could run the previous top command in batch mode with a combination of nohup and & to keep it running, regardless of whether you were logged on to the system. The danger there is that you might forget about it and eventually create a very large output file (and an angry SA).
If you have a particular process that you’re interested in monitoring, use the -p option to monitor a PID or the -U option to monitor a specific username. You can also specify a delay and number of iterations by using the -d and -n options. The following example monitors the oracle user with a delay of 5 seconds for 25 iterations:
$ top -U oracle -d 5 -n 25
Tip  Use the man top or top --help commands to list all the options available with your OS version.
Solaris prstat
Note that on Solaris systems, the prstat utility can also be used to identify which processes are consuming the most CPU resources. For example, you can instruct prstat to report system statistics every 5 seconds:
$ prstat 5
Here is some sample output:
   PID USERNAME  SIZE   RSS STATE   PRI NICE      TIME  CPU PROCESS/NLWP
  7601 oracle   8430M 4081M sleep   101    -   3:51:47 2.0% oracle/1
  1614 oracle   6382M 4093M sleep   101    -  48:07:20 0.0% oracle/1
 20071 oracle   8431M 7746M sleep    54    0   0:00:00 0.0% oracle/1
Type q or press Ctrl+C to exit prstat. After identifying a top resource-consuming process, you can determine which database the process is associated with by using the ps command. This example reports on process information associated with the PID of 7601:
$ ps -ef | grep 7601 | grep -v grep
  oracle  7601     1   0   Apr 10 ?         231:47 ora_vktm_DWREP
USING THE /PROC/<PID> FILES TO MONITOR PROCESS ACTIVITY
For every Linux process that is running, a directory is created in the /proc virtual filesystem. For example, if you want to view details regarding the process with PID 9156, you can navigate to the virtual /proc/9156 directory and do a long listing. There you will see several informational files and directories related to this running process:
$ cd /proc/9156
$ ls -l
Here is a partial listing of the output:
-r-------- 1 oracle dba 0 May  2 15:42 auxv
-r--r--r-- 1 oracle dba 0 May  2 15:02 cmdline
-rw-r--r-- 1 oracle dba 0 May  2 15:42 coredump_filter
-r--r--r-- 1 oracle dba 0 May  2 15:42 cpuset
lrwxrwxrwx 1 oracle dba 0 May  2 15:42 cwd -> /u01/app/oracle/product/12.1.0.2/db_1/dbs
-r-------- 1 oracle dba 0 May  2 15:29 environ
lrwxrwxrwx 1 oracle dba 0 May  2 15:42 exe -> /u01/app/oracle/product/12.1.0.2/db_1/bin/oracle
The output indicates this is an oracle process, and now you can analyze it further by looking at the memory usage maps file or the status file. Because these files don’t exist on disk, use a utility such as cat to display their contents:
$ cat /proc/<PID>/maps
$ cat /proc/<PID>/status
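For example, to view the memory consumption that the kernel records for the process, you can filter the status file for the VmSize (virtual memory) and VmRSS (resident memory) fields, which are standard on Linux:
$ grep -E 'VmSize|VmRSS' /proc/9156/status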
8-4. Identifying CPU Bottlenecks
Problem
You want to monitor the system load on your CPUs.
Solution
As a DBA, you also need to periodically examine the load on CPUs to determine system bottlenecks. The mpstat (multiple processor statistics) utility displays statistics for processors on the server:
$ mpstat
Here’s a snippet of the output:
Linux 2.6.18-308.4.1.0.1.el5xen (rmougserv)      05/02/2015
04:53:12 PM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
04:53:12 PM  all    0.10    0.00    0.10    0.10    0.00    0.00    0.01   99.70    179.01
The default output of mpstat shows only one line of aggregated statistics for all CPUs on the server. You can also view snapshots that report statistics accumulated between intervals. The following example uses the -P option to report only on processor 0; it displays output every 2 seconds for a total of 10 different reports:
$ mpstat -P 0 2 10
Here are a few lines of the output:
04:54:22 PM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
04:54:24 PM    0    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     93.94
04:54:26 PM    0    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     92.61
04:54:28 PM    0    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00    135.68
The amount of idle time is the most important statistic. If the CPUs have a low idle percentage, it is indicative of a high load. See Table 8-5 for an interpretation of the mpstat output.
Table 8-5. Column Definitions for mpstat Processor Statistics
Column
Description
CPU
Processor number. Starts at 0. The all row reports average statistics for all processors.
%user
Percentage of CPU utilization while executing at user level.
%nice
Percentage of CPU utilization while executing at user level with nice priority.
%sys
Percentage of CPU utilization while executing at kernel level.
%iowait
Percentage of time CPUs were idle during an outstanding disk I/O operation.
%irq
Percentage of time spent by CPUs servicing interrupts.
%soft
Percentage of time spent by CPUs to service software interrupts.
%steal
Percentage of time CPUs spent waiting while the hypervisor was servicing another virtual processor.
%idle
Percentage of time that CPUs were idle without outstanding disk I/O operations.
intr/s
Total number of interrupts received per second by CPUs.
On Solaris systems, the output of mpstat is slightly different; for example:
$ mpstat -P 0 2 20
 CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
   0    8   0  358   795  266  645    5   68  195    3   497    3   3   0  94
   1    8   0  235   477  133  677    5   68  177    3   515    4   3   0  94
   2    7   0  328   609  257  705    5   70  197    3   511    3   3   0  94
The amount of idle time is the most telling statistic. A low idle time is indicative of a high processor load.
How It Works
The mpstat utility is useful for specifically analyzing the CPU load. Here are some general guidelines for interpreting its output:
  • If %idle is high, your CPUs are most likely not overburdened.
  • If the %iowait output is a nonzero number, you may have some disk I/O contention.
On multiprocessor servers, you can use the -P ALL options of the mpstat command to print each CPU’s statistics on separate lines:
$ mpstat -P ALL
Here’s a partial listing of the output:
05:07:53 PM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
05:07:53 PM  all    0.11    0.00    0.12    0.10    0.00    0.00    0.01   99.67    179.26
05:07:53 PM    0    0.18    0.01    0.21    0.36    0.00    0.00    0.01   99.23    117.26
05:07:53 PM    1    0.11    0.00    0.12    0.01    0.00    0.00    0.00   99.76     18.92
05:07:53 PM    2    0.06    0.00    0.04    0.01    0.00    0.00    0.00   99.89     21.05
05:07:53 PM    3    0.09    0.00    0.10    0.01    0.00    0.00    0.00   99.80     22.04
The prior output shows that this server has four CPUs (indicated by a line for CPUs 0, 1, 2, and 3). The %idle column is in the 99% range, indicating that there is little load on the CPUs on this box.
You can also save the output of mpstat to a file. This example saves to a file all CPU activity reported every 10 seconds for 100 times:
$ mpstat -P ALL 10 100 > mpperf.perf
Saving the output in this way lets you retain performance statistics so that you can analyze and compare performance across different time periods.
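As a quick sketch of how you might summarize such a file, the following awk one-liner averages the %idle column for the aggregate all rows (it assumes the column layout shown earlier, with %idle as the next-to-last field):
$ awk '/ all /{ sum += $(NF-1); cnt++ } END { if (cnt) printf("avg %%idle: %.2f\n", sum/cnt) }' mpperf.perf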
8-5. Identifying Memory-Intensive Processes
Problem
You want to identify which Oracle session is consuming the most memory on the database server. If it is an Oracle session running a SQL query, you want to display the associated SQL.
Solution
You can use either the top utility or the ps command to display the top memory-consuming processes. Let’s look at top first.
Using top
The easiest way to run top is as follows:
$ top
By default, top displays the output sorted by CPU usage. To shift the output to sort by memory usage, use the > key to shift the reporting output one column to the right (%MEM). Here is some sample output:
Tasks: 223 total,   1 running, 222 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  15622144k total, 15369420k used,   252724k free,  8759664k buffers
Swap: 10288440k total,        0k used, 10288440k free,  3505156k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3217 cloudera  19   0 2770m 1.0g  17m S  0.0  6.9   0:52.84 java
  637 oracle    15   0 2409m 404m 394m S  0.0  2.7   0:00.65 ora_dbw0_o1212
 4204 emcadm    16   0  671m 249m  12m S  0.0  1.6   1:02.31 java
 2525 oracle    25   0 2462m 193m 186m S  0.0  1.3   0:15.40 oracle_2525_o12
Using ps
You can also use the ps command with the -o pmem option to identify the top memory-consuming processes and their associated PIDs. We recommend that you search specifically for Oracle processes because the ps output can be quite lengthy; for example:
$ ps -e -o pmem,pid,user,tty,args | grep -i oracle | sort -n -k 1 -r | head
Here is some sample output:
1.2   625 oracle   ?        ora_mman_O1212
0.7   655 oracle   ?        ora_mmon_O1212
0.4   691 oracle   ?        ora_cjq0_O1212
The first column in the previous output shows that the process with the ID of 625 (second column) is consuming 1.2 percent of the memory. Now that you have the PID, you can use the query from the “Solution” section of recipe 8-2 to further identify the type of Oracle process (e.g., RMAN, SQL*Plus, Data Pump, and so forth).
How It Works
If you’re experiencing performance issues, determining which processes are consuming the most memory will give you another piece of information to troubleshoot the issue. The top and ps commands are quite useful in this regard. The top command provides an interactive way to dynamically view memory usage patterns, whereas the ps command is more useful as a one-line command to get a snapshot of memory activity. We recommend that you create an alias for the ps command; for example:
$ alias topmem='ps -e -o pmem,pid,user,tty,args | grep -i oracle | sort -n -k 1 -r | head'
This alias lets you quickly run the command without having to remember the syntax:
$ topmem
You should use the memory usage information along with the CPU usage (see recipe 8-2) to determine which processes are consuming the most resources and then further investigate those processes.
8-6. Identifying Memory Bottlenecks
Problem
You want to view the current usage of memory on your database server.
Solution
Paging and swapping activity is an indicator of the efficiency of memory usage on your server. In general, high amounts of paging and swapping indicate an inadequate amount of memory. Numerous utilities are available to monitor paging and swapping. For example, you can use vmstat (virtual memory statistics) to monitor current memory usage. In the following example, vmstat generates a report every 2 seconds for a total of 3 reports:
$ vmstat 2 3
Here is some sample output:
procs  -----------memory----------- --swap-- ----io---- --system-- ------cpu------
 r  b   swpd   free    buff   cache  si   so   bi    bo   in   cs  us sy  id wa st
 0  0      0 358448 8753168 3539596   0    0    4    10    8    1   1  1  98  0  0
 0  0      0 358424 8753168 3539596   0    0    0    16  416  637   0  0 100  0  0
 0  0      0 358424 8753168 3539596   0    0    0   378  484  631   0  0 100  0  0
If you have a fairly recent version of Linux, you can also use the -a option, which displays active and inactive memory. Here is an example of running vmstat with the -a option:
$ vmstat -a 2 3
Here’s what the output looks like with the additional columns:
procs -----------memory----------- ---swap-- -----io--- --system-- ------cpu-------
 r  b  swpd   free   inact  active  si   so   bi    bo    in   cs   us sy  id wa st
 0  0     0 358812 9180892 4841900   0    0    4    10     8    1    1  1  98  0  0
 0  0     0 358812 9180892 4841912   0    0    0    62   427  637    0  0 100  0  0
 0  0     0 358812 9180892 4841912   0    0    0     0   421  629    0  0 100  0  0
If your server shows high amounts of memory swapped in from disk (the si column) or swapped out to disk (the so column), you may have a memory bottleneck.
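If you want to check for swapping from within a script, a rough sketch such as the following works (it assumes the default Linux column order shown above, with si and so as the seventh and eighth fields, and skips the two header lines):
$ vmstat 5 3 | awk 'NR > 2 && ($7 > 0 || $8 > 0) { print "possible swapping: si=" $7 " so=" $8 }'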
The output of vmstat on Solaris reports the same type of information as Linux systems; for example:
$ vmstat 2 3
 kthr      memory            page            disk          faults      cpu
 r b w     swap    free  re  mf pi po fr de sr vc  vc  vc --   in   sy    cs us sy  id
 0 0 0 18541480 2994160  20 119  0  0  0  0  0  2 184 146  0 9203 8216 10930  3  3  94
 0 0 0 18463736 3091248 138 335  0  0  0  0  0  2   6   5  0 5188 3617  5081  0  0  99
 0 0 0 18464824 3090824   0   3  0  0  0  0  0  0   2  38  0 4792 2256  4600  0  0 100
In this output, the page section reports on swapping, and I/O is reported under disk.
How It Works
One of the main indicators of memory health is the amount of paging and swapping that is occurring. If you read five different Linux performance–tuning white papers, you’ll get five slightly different definitions of paging and swapping. We won’t split hairs about the exact definitions of those terms; in general, paging and swapping refer to the movement of the contents of memory to and from disk.
Paging and swapping occur when there isn’t enough physical memory to accommodate all the memory needs of the processes on the server. When paging and swapping take place, performance usually suffers because the process of copying memory contents to and from disk is an inherently slow activity. A tool such as vmstat can help you identify excessive swapping.
Using free
You can also use the free command to display current memory used, both physical and virtual (swap):
$ free
             total       used       free     shared    buffers     cached
Mem:      15622144   15269820     352324          0    8753168    3539596
-/+ buffers/cache:    2977056   12645088
Swap:     10288440          0   10288440
From the previous output, you see that this system has 15GB of RAM, almost all of it being used. It has about 10GB of swap space, almost none of which is used. Don’t be too alarmed if your Linux system is using most of its physical memory; that’s typical on many Linux servers.
Note  See Chapter 9 for details on using ipcs to view the memory and semaphores used by your database.
You can use the -s option to have the free command report output on a repeating interval. This example uses free to display memory usage in two-second snapshots and sends the output to a file:
$ free -s 2 > freemem.perf
Press Ctrl+C to exit from free when using the -s option. By default, the free output reports memory usage in kilobytes. Use -m to print in megabytes or -g to display the output of free in gigabytes.
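As a simple sketch, you can also compute the percentage of swap currently in use by piping free to awk (this assumes the Swap line layout shown previously, with total and used as the second and third fields):
$ free | awk '/^Swap/ && $2 > 0 { printf("swap in use: %.1f%%\n", ($3/$2)*100) }'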
Using watch
An effective way to use free is in combination with the watch command. The watch command is used to execute another program on a periodic basis. This example uses watch to run the free utility every 3 seconds via the -n (interval) option. The -d (differences) option is used to have the output highlighted onscreen when there is a change in value from snapshot to snapshot:
$ watch -n 3 -d free
Every 3.0s: free                                 Sun May  3 17:42:33 2015
             total       used       free     shared    buffers     cached
Mem:      15622144   15264004     358140          0    8753168    3539700
-/+ buffers/cache:    2971136   12651008
Swap:     10288440          0   10288440
You should be able to visually see any changes in memory activity onscreen when running in this mode. To exit from watch, press Ctrl+C.
You can also view the current characteristics of memory by viewing the /proc/meminfo file. You can use the file to display the current physical memory and swap space being used. This example uses the cat utility to display the current memory usage:
$ watch -d cat /proc/meminfo
By default, the watch command will refresh the screen every 2 seconds. You should visually see differences highlighted from interval to interval:
Every 2.0s: cat /proc/meminfo               Sun May  3 17:44:38 2015

MemTotal:     15622144 kB
MemFree:        358044 kB
Buffers:       8753168 kB
Cached:        3539912 kB
SwapCached:          0 kB
Active:        4842728 kB
SwapTotal:    10288440 kB
SwapFree:     10243872 kB
...
If you see an unusual amount of swap space being used (low SwapFree), it is an indication that your server needs more memory. To exit from watch, press Ctrl+C.
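If you want just the swap figures without a full-screen display, you can grep the same file directly:
$ grep -E 'SwapTotal|SwapFree' /proc/meminfo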
8-7. Identifying I/O-Intensive Processes
Problem
You want to determine which processes are generating the most I/O.
Solution
Use the iotop utility to display the top I/O-generating processes. On most systems, running iotop requires root access (signified in this example by the # character command prompt). Here’s the simplest way to invoke it:
# iotop
Here is some sample output:
  TID  PRIO   USER      DISK READ   DISK WRITE  SWAPIN  IO>     COMMAND
21553  be/4 oracle      0.00 B/s    0.00 B/s    -5.02 % 99.99 % [oracle_21553_o1]
24423  be/4 oracle      0.00 B/s    0.00 B/s     0.00 %  1.99 % ora_p003_O1212
24348  be/4 oracle      0.00 B/s    0.00 B/s     0.00 %  1.36 % ora_lgwr_O1212
24331  be/4 oracle      0.00 B/s    0.00 B/s     0.00 %  0.72 % ora_gen0_O1212
24401  be/4 oracle      0.00 B/s    0.00 B/s     0.00 %  0.64 % ora_arc3_O1212
24427  be/4 oracle      0.00 B/s    0.00 B/s     1.99 %  0.56 % ora_p005_O1212
This output displays the overall disk read and disk write rates, along with the PID, user, and command. If it’s an Oracle process, you can use the SQL query from the “Solution” section of recipe 8-2 to retrieve further details regarding the process from the data dictionary.
How It Works
The iotop utility is a top-like utility that displays processes and associated I/O. There are many different modes you can use. For example, if you want to display I/O associated with one user, you can do so as follows:
# iotop -u oracle
If you want to change the refresh rate (in seconds), you can specify the interval with the -d switch. The following instructs iotop to refresh the screen every 5 seconds:
# iotop -d 5
The iotop utility effectively displays the top I/O-consuming processes on a server. When you’re experiencing disk I/O bottlenecks, this tool provides a quick method for identifying processes that may need further investigation.
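For unattended collection, iotop also offers a noninteractive batch mode. For example, the following uses the -b (batch), -o (only processes actually doing I/O), and -n (number of iterations) options to capture three snapshots to a log file:
# iotop -b -o -n 3 > iotop.log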
8-8. Identifying I/O Bottlenecks
Problem
You want to determine whether your disk storage is a bottleneck.
Solution
The iostat command can help you determine whether disk I/O is potentially a source of performance problems. Using the -x (extended) option with the -d (device) option is a useful way to generate I/O statistics. This example uses the -x and -d options to display extended device statistics every 10 seconds:
$ iostat -xd 10
You need a really wide screen to view this output; here’s a partial listing:
Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz
avgqu-sz   await  svctm  %util
sda          0.01   3.31  0.11  0.31    5.32   28.97     2.66    14.49    83.13
0.06  138.44   1.89   0.08
Note  On Solaris systems, the iostat output may report the disk utilization as %b (percent busy). Also on Solaris systems, use iostat -Mnxz <n> to display output using megabytes, list descriptive names, show extended statistics, and remove lines with all zeros.
This periodic extended output allows you to view which devices are experiencing spikes in read and write activity in real time. To exit from the previous iostat command, press Ctrl+C.
When trying to determine whether device I/O is the bottleneck, here are some general guidelines when examining the iostat output:
  • Look for devices with abnormally high blocks read or written per second.
  • If any device is near 100% utilization, it is a strong indicator that I/O is a bottleneck.
Once you have determined that you have a disk I/O–contention issue, you can use utilities such as AWR (if licensed), Statspack (no license required), or the V$ views to determine whether your database is I/O stressed. For example, the AWR report contains an I/O statistics section with the following subsections:
  • IOStat by Function Summary
  • IOStat by Filetype Summary
  • IOStat by Function/Filetype Summary
  • Tablespace IO Stats
  • File IO Stats
If you want to display current database sessions that are waiting for I/O resources, you can query the data dictionary as follows:
SELECT a.username, a.sql_id, b.object_name, b.object_type, a.event
FROM v$session a
    ,dba_objects b
    ,v$event_name c
WHERE b.object_id = a.row_wait_obj#
AND   a.event = c.name
AND   c.wait_class = 'User I/O';
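Once the prior query returns a sql_id, one simple way to view the associated SQL text is to query the standard v$sql view (using a SQL*Plus substitution variable here):
SELECT sql_text
FROM   v$sql
WHERE  sql_id = '&sql_id';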
How It Works
The iostat command can help you determine whether disk I/O is potentially a source of performance problems. If you execute iostat without any options, you’ll get a default report that displays averages since the system was last started:
$ iostat
avg-cpu:  %user   %nice    %sys %iowait   %idle
          18.91    0.04    1.20    0.15   79.70

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               7.14       398.01       409.52  164484368  169239542
sda1              0.00         0.00         0.00       1538        166
sda2              54.15      396.92       407.74  164032098  168505032
sda3              0.35         1.04         1.77     429820     733168
Notice that there are two sections in the iostat output. The first section is the CPU Utilization Report. The second section relates to disk I/O and is referred to as the Device Utilization Report.
Table 8-6 describes the columns used for disk I/O. (Use the -d option of iostat to display only device statistics.)
Table 8-6. Column Descriptions of iostat Disk I/O Output

Column      Description
Device      Device or partition name
tps         I/O transfers per second to the device
Blk_read/s  Blocks per second read from the device
Blk_wrtn/s  Blocks written per second to the device
Blk_read    Number of blocks read
Blk_wrtn    Number of blocks written
rrqm/s      Number of read requests merged per second that were queued to the device
wrqm/s      Number of write requests merged per second that were queued to the device
r/s         Read requests per second
w/s         Write requests per second
rsec/s      Sectors read per second
wsec/s      Sectors written per second
rkB/s       Kilobytes read per second
wkB/s       Kilobytes written per second
avgrq-sz    Average size of requests in sectors
avgqu-sz    Average queue length of requests
await       Average time in milliseconds for I/O requests sent to the device to be served
svctm       Average service time in milliseconds
%util       Percentage of CPU time during which I/O requests were issued to the device; near 100% indicates device saturation
You can also instruct iostat to display reports at a specified interval. The first report displayed reports averages since the last server reboot; each subsequent report shows statistics since the previously generated snapshot. The following example displays a device statistic report every 3 seconds:
$ iostat -d 3
To exit from the previous iostat command, press Ctrl+C. You can also specify a finite number of reports that you want generated, which is useful for gathering metrics to be analyzed over a period of time. This example instructs iostat to report every 2 seconds for a total of 15 reports:
$ iostat 2 15
When you work with locally attached disks, the output of the iostat command clearly shows you where the I/O is occurring. However, it is not that clear-cut in environments that use external arrays for storage. What you are presented with at the filesystem layer is some sort of a virtual disk that might also have been configured by a volume manager. Virtual disks are often referred to as volumes or logical units (LUNs).
A LUN is a logical disk that physically comprises one or more physical disks. The LUN represents the virtualization layer between the physical disks and the applications running on the database server. Figure 8-1 illustrates at a high level the abstraction involved with virtual disks.
Figure 8-1. Abstraction layers between database application and physical disks
When you work with virtual disks, the output from iostat reports on read/write activity at the virtual disk level, not the underlying physical disks. In these situations, there may be many layers of abstraction between the database application and physical disks, which can make it difficult to isolate the exact source of an I/O bottleneck. We recommend that you work closely with your storage administrator to determine whether a particular set of LUNs and underlying physical disks are a source of poor I/O performance.
DETERMINING I/O RATES
To get an estimate of the I/O rate of your storage system, use the dd command in combination with the time command. On Linux systems, use the oflag and iflag parameters to request direct I/O; otherwise, you’re reading and writing to memory and will receive misleading results. For example, use the following to estimate writes:
$ time dd if=/dev/zero of=testfile bs=8k count=1000 oflag=direct
The following estimates reads:
$ time dd of=/dev/null if=testfile bs=8k count=1000 iflag=direct
These metrics are helpful for determining the speed at which the OS can write and read a file. Keep in mind that it will provide estimates for sequential writes/reads. You would have to consider something more sophisticated to simulate database I/O activity (random writes/reads and multiple processes). Having said that, this approach is still a good starting point for diagnosing I/O issues.
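If you want steadier numbers, consider a larger sequential transfer; for example, this hypothetical run writes 1GB in 1MB chunks (GNU versions of dd also print their own throughput figure on the final line of output):
$ time dd if=/dev/zero of=testfile bs=1M count=1024 oflag=direct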
On Solaris systems, the dd command doesn’t have the oflag and iflag options. If the storage is ZFS, use the zfs utility to set caching to metadata via zfs set primarycache=metadata <filesystem>. You will most likely have to contact your storage administrator to run the prior command with the appropriate filesystem.
8-9. Monitoring Network Traffic
Problem
You suspect that the network might be a bottleneck. You want to view network statistics.
Solution
Use the netstat (network statistics) command to display network traffic. Perhaps the most useful way to view netstat output is with the -ptc options. These options display the PID and TCP connections, and they continuously update the output:
$ netstat -ptc
Press Ctrl+C to exit the previous command. Here’s a partial listing of the output:
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address  Foreign Address  State       PID/Program name
tcp        0      0 rmug.com:62386 rmug.com:1521    ESTABLISHED 22864/ora_pmon_RMDB
tcp        0      0 rmug.com:53930 rmug.com:1521    ESTABLISHED 6091/sqlplus
tcp        0      0 rmug.com:1521  rmug.com:53930   ESTABLISHED 6093/oracleRMDB1
tcp        0      0 rmug.com:1521  rmug.com:62386   ESTABLISHED 10718/tnslsnr
If the Send-Q (bytes not acknowledged by remote host) column has an unusually high value for a process, it may indicate an overloaded network. The useful aspect about the previous output is that you can determine the OS PID associated with a network connection. If you suspect that the connection in question is an oracle session, you can use the techniques described in the “Solution” section of recipe 8-2 to map an OS PID to an oracle process or SQL statement.
Note  On Linux systems, the /proc/net directory stores information about current network settings and activity.
How It Works
When performance issues occur, the network is usually not the cause. Most likely you’ll determine that bad performance is related to a poorly constructed SQL statement, inadequate disk I/O, or not enough CPU or memory resources. However, as a DBA, you need to be aware of all sources of performance bottlenecks and how to diagnose them. In today’s highly interconnected world, you must possess network troubleshooting and monitoring skills. The netstat utility is a good starting place for monitoring server network connections.
On Linux systems, you can also use the sar command with the -n option to report on network statistics. The -n option takes one of the following as an argument: DEV (network devices), EDEV (error count), SOCK (sockets), or FULL (all). The following command displays the current day’s network device statistics:
$ sar -n DEV
Here’s a limited listing of the output:
12:00:01 AM  IFACE  rxpck/s txpck/s  rxbyt/s  txbyt/s  rxcmp/s  txcmp/s  rxmcst/s
12:10:01 AM     lo     0.00    0.00     0.00     0.00     0.00     0.00      0.00
12:10:01 AM   eth0     0.34    0.11    39.17    10.22     0.00     0.00      0.04
12:10:01 AM   eth1     0.00    0.00     0.00     0.00     0.00     0.00      0.00
12:10:01 AM   sit0     0.00    0.00     0.00     0.00     0.00     0.00      0.00
The previous output shows the number of packets transmitted and received per second, as well as the bytes and compressed packets (snapshots taken in 10-minute intervals).
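You can also run sar in real time rather than reading the daily history file. This example reports network device statistics every 5 seconds for a total of 3 reports:
$ sar -n DEV 5 3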
If you experience performance issues when connecting to remote servers, sometimes it is useful to see the route that a packet takes to get from one server to another. To use traceroute, log on to the server and then trace the route to the remote server:
$ traceroute <remote_server_name>
This information is meaningful to a network engineer and can help troubleshoot network performance issues.
8-10. Troubleshooting Database Connectivity
Problem
You’re attempting to connect to a database and are receiving the following error:
ERROR:
ORA-12154: TNS:could not resolve the connect identifier specified
You want to determine the root cause of this issue.
Solution
To diagnose database connectivity issues, first use the OS ping utility to determine whether the remote box is accessible; for example:
$ ping dwdb
dwdb is alive
If ping doesn’t work, work with your SA or network administrator to ensure that you have server-to-server connectivity in place.
Next, use telnet to see whether you can connect to the remote server and port (that the listener is listening on); for example:
$ telnet dwdb 1521
Trying 127.0.0.1...
Connected to dwdb.
Escape character is '^]'.
This output indicates that connectivity to a server and port is okay. If it doesn’t work (you don’t see “Connected to” in the output), contact your SA or network administrator for further assistance.
Now use tnsping to determine whether Oracle Net is working. This utility will verify that the Oracle Net listener is running on the remote server; for example:
$ tnsping dwrep
..........
Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)
(HOST = dwdb.us.farm.com)(PORT = 1521))
(CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = DWREP)))
OK (500 msec)
What’s sometimes confusing is that the prior output indicates only that the listener is up and receiving requests; it doesn’t necessarily indicate that there actually is a service (DWREP in this example) registered with the listener on the remote host. To further verify that the actual service is registered with the listener, you have to log on to the remote server and run the lsnrctl utility to verify the services registered with the listener:
$ lsnrctl services <listener_name>
If you don’t provide a listener name, the default name of LISTENER is assumed. You can also use the lsnrctl status <listener_name> command to verify that the listener is up and which services are registered with the listener. If you’re not sure what the listener name is, use the ps command to identify it:
$ ps -ef | grep tns
Here is some sample output that indicates that the default listener name (LISTENER) is used on this server:
oracle   27480     1  0 10:09 ?        00:00:00
/u01/app/oracle/product/12.1.0.2/db_1/bin/tnslsnr LISTENER -inherit
Also verify that the remote database is available by establishing a local connection as a non-SYS account (SYS can often connect to a troubled database when other schemas can’t); for example:
$ sqlplus system/manager
If you can’t connect via this command, verify that a critical mount point hasn’t filled up via the df command:
$ df -h
If Oracle can’t write to files in the ORACLE_HOME/dbs directory, or if archiving is enabled and the archive location is full, the database will hang and not accept new connections.
If everything looks good to this point, verify that the TNS information is correct. If the remote listener and database are working, ensure that the mechanism for determining TNS information (such as the tnsnames.ora file) contains the correct information. Sometimes the client machine will have multiple TNS_ADMIN locations and tnsnames.ora files. One way to verify whether a particular tnsnames.ora file is being used is to rename it and see whether you get a different error when you attempt to connect to the remote database.
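A quick way to see which directory the client should be searching is to check the TNS_ADMIN variable; if it’s unset, Oracle defaults to $ORACLE_HOME/network/admin:
$ echo $TNS_ADMIN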
How It Works
Diagnosing database connectivity issues can be quite frustrating. If you’ve followed all the suggestions in the “Solution” section of this recipe and are still having issues, examine the client sqlnet.log file and/or the server listener.log file. Sometimes these log files show additional information that will pinpoint the issue. The locations of these files can vary by the Oracle release and how a DBA has implemented Oracle Net. Usually the Oracle Net files are in the following directory:
$ORACLE_BASE/diag/tnslsnr/instance_name/listener/trace
If you can’t find the files, use the find command; for example:
$ cd $ORACLE_BASE
$ find . -name listener.log
Here is some sample output:
./diag/tnslsnr/dwrep/listener/trace/listener.log
If all else fails, you can enable higher levels of output to the Oracle Net logging files. See the Oracle Database Net Services Administrator’s Guide for details (freely available for download on the docs.oracle.com web site).
8-11. Tracing a Process
Problem
You’re troubleshooting database network connectivity issues and want to trace the tnsping process to determine which tnsnames.ora file is being used. In other words, you want to determine the directory locations and the order in which tnsping looks for tnsnames.ora files.
Solution
This solution has two sections: one for using strace on Linux and one for using truss on Solaris.
Using strace
To reiterate, the task at hand is to determine the order and directories in which the tnsping utility looks for tnsnames.ora files. You know that tnsping looks in multiple locations but aren’t sure of the order. The strace tool can provide this information. First, run strace to trace tnsping without any options:
$ strace tnsping bogus_service
This code generates a great deal of output; here’s a small snippet:
access("/u01/app/oracle/product/12.1.0.2/db_1/network/admin/tnsnames.ora", F_OK)
= -1 ENOENT (No such file or directory)
mmap(NULL, 143360, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0x2ac9aa003000
munmap(0x2ac9aa003000, 143360)          = 0
stat("/u01/app/oracle/product/12.1.0.2/db_1/ldap/admin/ldap.ora", 0x7fffba717a30) = -1
ENOENT (No such file or directory)
brk(0x126ff000)                         = 0x126ff000
...
Most of the output isn’t useful. To extract the relevant lines, pipe the output to grep and cut the field (delimited by double quotes) of interest:
$ strace tnsping bogus_service 2>&1 >/dev/null | grep '\.ora' | cut -f2 -d\" | cut -f1 -d\" | uniq
Here is the useful output showing the directory locations and order in which tnsping is looking for tnsnames.ora files:
/u01/app/oracle/product/12.1.0.2/db_1/network/admin/sqlnet.ora
/home/oracle/.tnsnames.ora
/u01/app/oracle/product/12.1.0.2/db_1/network/admin/tnsnames.ora
/var/yp/binding/us.oracle.com.2
/u01/app/oracle/product/12.1.0.2/db_1/ldap/admin/fips.ora
/u01/app/oracle/product/12.1.0.2/db_1/ldap/admin/ldap.ora
/u01/app/oracle/product/12.1.0.2/db_1/network/admin/ldap.ora
This output shows the order, location, and file names that the tnsping process searches for when attempting to connect to a service registered with a listener.
Using truss
On a Solaris system, you can use truss to trace a process. First, use truss to trace tnsping without any options:
$ truss tnsping bogus_service
Here’s a small section of a large amount of output:
stat("/orahome/app/oracle/product/12.1.0.2/db_1/network/admin/ldap.ora", 0xFFFFFFFF7FFF5EB0) Err#2 ENOENT
stat("/etc/resolv.conf", 0xFFFFFFFF7FFF6540)    = 0
open("/etc/resolv.conf", O_RDONLY)              = 7
Most of that output isn’t useful. To extract the relevant lines, pipe the output to grep and cut the relevant field (delimited by double quote marks):
$ truss tnsping bogus_service 2>&1 | grep '\.ora' | cut -f2 -d\" | cut -f1 -d\" | uniq
Here’s the information showing the order and location of the files that tnsping is accessing:
/orahome/app/oracle/product/12.1.0.2/db_1/network/admin/sqlnet.ora
/orahome/oracle/.tnsnames.ora
/orahome/app/oracle/product/12.1.0.2/db_1/network/admin/tnsnames.ora
/orahome/app/oracle/product/12.1.0.2/db_1/ldap/admin/fips.ora
/orahome/app/oracle/product/12.1.0.2/db_1/ldap/admin/ldap.ora
/orahome/app/oracle/product/12.1.0.2/db_1/network/admin/ldap.ora
So by tracing the process, you can glean more insight on its inner workings.
How It Works
Tracing utilities give you a window into the internal system calls (and their parameters) initiated by a process. On rare occasions, you may need to trace a process; in our experience, DBAs seldom use tracing facilities. However, as the “Solution” section demonstrates, tracing a process is an effective troubleshooting technique in certain situations. The strace (Linux) and truss (Solaris) utilities provide the capability to trace a process.
The “Solution” section showed examples of initiating a trace on a process as the command is interactively executed. It is also possible to trace a process that is already running. Suppose that you want to trace an Oracle background process. First determine its PID:
$ ps -ef | grep smon
oracle   24354     1  0 09:46 ?        00:00:00 ora_smon_O1212
On Linux, run strace and provide the PID to it:
$ strace -p 24354
To send the output to a file, use the -o option:
$ strace -o smon.txt  -p 24354
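You can also narrow the trace to specific system calls with the -e option; for example, the following restricts the output to file-open calls (on recent Linux kernels, most opens are issued as openat):
$ strace -e trace=open,openat -p 24354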
On Solaris, run truss with the appropriate PID. First, use ps to determine the PID of the process:
$ ps -ef | grep smon
  oracle 18618     1   0   Jun 12 ?           1:04 ora_smon_TRG
Now use truss to trace the process:
$ truss -p 18618
Here the output is sent to a file while tracing:
$ truss -o smon.txt -p 18618
Note that the dtrace utility is available for Linux and Solaris. This utility is an entire framework that has its own scripting language that provides insight into all aspects of the OS. Using dtrace can be quite involved but critical if you require extensive tracing capabilities. The strace and truss tools are better suited to trace a single process.
8-12. Listing Files Opened by Processes
Problem
You’ve logged on to a database server and want to help diagnose a problem. After logging on to the server, you realize there’s no mechanism to automatically set the oracle OS variables, so the standard variables (ORACLE_HOME, ORACLE_SID, and so on) haven’t been set for your user. You want to determine the location of ORACLE_HOME.
Solution
Identify the Oracle instances running on the server. Use the ps and grep commands to isolate any Oracle SMON background processes:
$ ps -ef | grep smon
The output indicates that the PID of 25128 is associated with the SMON background process:
oraus    25128     1  0 Mar18 ?        00:02:23 ora_smon_semgc1
Now use the lsof (list open files) command to identify files opened by that process. The following line of code searches for the string "oracle" in the output:
$ lsof -p 25128 | grep 'bin\/oracle' | awk '{ print $9 }'
Here is some sample output:
/orahome/app/oracle/product/12.1.0.1/db_1/bin/oracle
In this way, you can quickly determine the location of ORACLE_HOME, even when the standard variables haven’t been set for your current user logon.
How It Works
The lsof command is ideal for identifying processes and associated open files. This command is quite flexible and has many options. For example, if you want to view processes that have opened a particular file, you can do so by passing the file name to the command; for example:
$ lsof /u01/dbfile/O1212/system01.dbf
Here are a few lines of the output:
COMMAND     PID   USER   FD   TYPE DEVICE  SIZE/OFF     NODE NAME
ora_dbrm_ 16201 oracle  256u   REG  202,2 524296192 14065848 /u01/dbfile/O1212/system01.dbf
ora_dbw0_ 16207 oracle  258uW  REG  202,2 524296192 14065848 /u01/dbfile/O1212/system01.dbf
ora_lgwr_ 16209 oracle  260u   REG  202,2 524296192 14065848 /u01/dbfile/O1212/system01.dbf
...
If you want to view processes that have opened files under a specific directory, you can do so with the +D option:
$ lsof +D /u01/dbfile/O1212
To report on files opened by a particular user, use the -u option:
$ lsof -u oracle
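The -i option reports on network connections; for example, to see which processes have the default listener port open, you can run something like this:
# lsof -i :1521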
On Solaris systems, the lsof command may not be available. Instead, use the pfiles command to list open files. For example, if you want to identify ORACLE_HOME in a Solaris environment, first identify the SMON processes running on the server:
$ ps -ef | grep smon
oracle 22184     1   0   Mar 13 ?           5:15 ora_smon_EMREP
To identify files opened by the prior process, use the pfiles command, pass it the PID, and then search for any helpful keywords such as the string "dbs":
$ pfiles 22184 | grep dbs
Here is some output:
/orahome/app/oracle/product/12.1.0.2/db_1/dbs/hc_EMREP.dat
In this way, you can determine the location of files used by Oracle processes.



CHAPTER 9
Viewing and Configuring System Resources
As part of normal operations, Oracle database processes constantly coordinate and communicate with system server processes. In Linux or Solaris, this type of process-to-process coordination is referred to as interprocess communication (IPC). OSs typically support three types of interprocess communication mechanisms: semaphores, shared memory, and message queues. Oracle database applications typically require significant amounts of semaphore and shared memory resources to function properly.
A semaphore is a construct used by the OS to control which processes can have access to other server resources such as shared memory. The number of OS resources that a database application can consume is governed by several kernel parameters. Maintaining these values is described in detail in this chapter.
Note  The kernel is the core program of the OS that manages access to OS resources required by other processes on the system. The kernel is typically responsible for process, CPU, memory, disk management, and network activities.
Nowadays, almost any medium-to-large database can quickly exceed the default kernel settings for system resource limits. Therefore, most database vendors (Oracle, MySQL, DB2, PostgreSQL, and so on) have their own set of recommended values for the kernel parameters that govern the various required OS resources.
Before you can install a database on a server, it is prudent to first ensure that the kernel parameters are configured correctly. An incorrectly configured kernel usually results in either a nonfunctioning or poorly performing database. For example, if enough shared memory can’t be allocated when Oracle starts, an error message like this is thrown:
ORA-27123: unable to attach to shared memory segment
To avoid these problems, a knowledgeable DBA must know how to view and set kernel parameters that affect database availability and performance. This chapter begins with a high-level overview of how to examine and modify kernel parameters, and then dives into specific details for settings most critical for database applications.
One last note before you start: many of the recipes in this section have slightly different solutions, depending on whether you’re using Linux or Solaris. The differences are noted where applicable.
9-1. Displaying Server Hardware and the Operating System
Problem
You have the task of installing the Oracle database software (binaries). Before you download and install the software, you need to verify the system architecture and OS version of the target host.
Solution
This “Solution” section contains two sections: one for Linux and the other for Solaris. First, let’s discuss viewing server information on Linux.
Linux
Use the uname (print system information) utility with the -a option to display system details. Here’s the output for a typical Linux system:
$ uname  -a
Linux dv3.rmug.org 2.6.18-308.4.1.0.1.el5xen #1 SMP
Tue Apr 17 16:41:30 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
You can also produce the previous output piecemeal via the following options: -s (kernel name), -n (nodename), -r (kernel release), -v (kernel version), -m (machine), -p (processor), -i (hardware platform), and -o (operating system). For example, to print the hardware platform, use the following:
$ uname -p
x86_64
Use uname with the --help parameter to display all choices available in your environment.
Solaris
For Solaris systems, you can use the uname command to print out hardware details:
$ uname -a
SunOS devz1 5.11 11.1 sun4v sparc sun4v
This output indicates that the OS release is 5.11, the platform is sun4v, and the hardware class is sparc. You can also print out the server details in a columnar format using the -X option:
$ uname -X
System = SunOS
Node = devz1
Release = 5.11
KernelID = 11.1
Machine = sun4v
BusType = <unknown>
Serial = <unknown>
Users = <unknown>
OEM# = 0
Origin# = 1
NumCPU = 16
Here’s how to confirm the release level:
$ cat /etc/release
Oracle Solaris 11.1 SPARC
...
You can show the architecture via the isainfo command:
$ isainfo -kv
64-bit sparcv9 kernel modules
How It Works
In today’s global environment, you’ll often connect remotely to database servers located in dispersed data centers. In these situations, you’ll frequently use the uname command with the -a option to verify which machine you’re logged on to.
In a Linux environment, you can also view server information by querying the virtual files in the /proc directory. For example, you can view the current version of the server by viewing the /proc/version file:
$ cat /proc/version
Linux version 2.6.18-308.4.1.0.1.el5xen (mq@ca-build56.us.rmug.com)
(gcc version 4.1.2 20080704 (Red Hat 4.1.2-50)) #1 SMP Tue Apr 17 16:41:30 EDT 2012
The Linux /proc virtual filesystem acts as an interface for viewing and configuring kernel parameters. The /proc directory is a hierarchy of files and subdirectories that contain the current settings of kernel values. It is appropriately named /proc because this virtual filesystem exposes information about running processes (and the kernel) to other programs. This filesystem is virtual because its files don’t actually reside on disk. Note that most files beneath the /proc directory have a 0-byte size. The /proc virtual files are created dynamically in memory from kernel data when you access them.
For your convenience, the /proc filesystem is subdivided into directories that contain related parameters. For example, the /proc/sys subdirectory contains many of the parameters used to configure the kernel. Use the man proc command to view documentation on the /proc virtual filesystem for your server.
Some utilities, such as top and free, extract information from the /proc virtual files and present it in a human-readable formatted fashion.
Table 9-1 describes some of the virtual files in the /proc directory. Use your favorite file-viewing utility (cat, more, less, view, grep, and so on) to inspect these virtual files.
Table 9-1. Descriptions of Linux Virtual Files in the /proc Directory

File Name          Contains Information Regarding
/proc/cpuinfo      CPU and system architecture.
/proc/meminfo      Free and used memory for both physical RAM and swap.
/proc/net          Directory containing network information.
/proc/mounts       All mounted filesystems.
/proc/diskstats    Disk I/O statistics for each disk.
/proc/devices      Character and block devices configured on the system.
/proc/filesystems  Filesystems compiled into the kernel.
/proc/sys          Subdirectories and files pertaining to kernel variables. Some variables can be configured with the sysctl command.
/proc/cmdline      Parameters passed to the kernel at boot time.
/proc/version      Version of the OS.
9-2. Listing CPUs
Problem
The Oracle installation documentation recommends installing the binaries on a server with CPUs that meet certain requirements. You want to display the CPU characteristics on your server.
Solution
This “Solution” section contains two sections: one for Linux and the other for Solaris. First let’s cover viewing CPU information on Linux.
Linux
You can quickly obtain in-depth information about the physical characteristics of the CPU(s) on a Linux server by viewing the virtual /proc/cpuinfo file:
$ cat /proc/cpuinfo
For multiple processor boxes, there is a section in the output for each CPU. The first CPU on the box is identified as 0, the next one as 1, and so on. Here’s a partial listing of some typical output from /proc/cpuinfo:
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 45
model name      : Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz
stepping        : 7
cpu MHz         : 2893.100
cache size      : 20480 KB
physical id     : 0
siblings        : 1
core id         : 0
cpu cores       : 1
...
Solaris
For Solaris systems, use the prtdiag utility to display CPU information:
$ prtdiag
Here’s a partial listing of the output:
================================ Virtual CPUs ==========

CPU ID Frequency Implementation         Status
------ --------- ---------------------- -------
0      2848 MHz  SPARC-T4               on-line
1      2848 MHz  SPARC-T4               on-line
2      2848 MHz  SPARC-T4               on-line
3      2848 MHz  SPARC-T4               on-line
...
How It Works
Sometimes you need to know whether the CPUs on your box are powerful enough to handle the database software being installed. If you have multiple CPUs on your server, you’ll see a listing for each CPU. On Linux systems, use the information in the /proc/cpuinfo file to determine whether your server meets Oracle’s prerequisites for CPU minimum megahertz speed, which is usually the cpu MHz line in the /proc/cpuinfo file. On Solaris systems, use the prtdiag utility to view CPU information.
9-3. Displaying Physical Memory
Problem
Oracle’s installation documentation recommends that you have a certain minimal amount of memory installed on the server. You want to verify that you have enough memory on a box before you do a database installation.
Solution
This “Solution” section contains two sections: one for Linux and the other for Solaris. First up is Linux.
Linux
With Linux, you can view the contents of the /proc/meminfo file with grep to check the total physical memory amount:
$ grep MemTotal /proc/meminfo
MemTotal:     15622144 kB
On Linux systems, issue the following grep command to view the total amount of swap memory:
$ grep SwapTotal /proc/meminfo
SwapTotal:    10288440 kB
Tip  See Oracle’s MOS notes 233753.1 and 269426.1 for a detailed discussion of /proc/meminfo.
Solaris
On Solaris systems, use the prtconf utility to display installed memory:
$ prtconf | grep "Memory size"
Here’s a snippet of the output:
Memory size: 32768 Megabytes
On Solaris systems, run the following command to view the swap size:
$ swap -l
swapfile             dev    swaplo   blocks     free
/dev/zvol/dsk/rpool/swap 196,2        16 67108848 67108848
How It Works
When dealing with database servers, you have to be aware of two types of memory: physical RAM and virtual (swap). Depending on the software you install on a box, you are sometimes required to check to see whether there is sufficient RAM and swap memory on the target server. If you don’t have enough physical or swap memory, usually the Oracle Universal Installer will alert you to any inadequacies when you attempt to perform the installation.
You can also view physical and swap memory by issuing the free command with no options. This command gives you a view of the currently free and consumed memory on the box:
$ free
         total        used        free        shared        buffers        cached
Mem:   2074904     2050512       24392             0          84704       1759792
-/+ buffers/cache:  206016     1868888
Swap:      4184924   74652     4110272
On Linux systems, if you want to view per–process memory consumption, use the following cat commands:
$ cat  /proc/<PID>/maps
$ cat  /proc/<PID>/status
On Solaris systems, use the pmap command to see process-to-memory mappings. First use the ps command to identify the process of interest:
$ ps -ef | grep smon | grep -v grep
oracle  7625     1   0   Apr 10 ?           9:29 ora_smon_DWREP
Then use the PID as input to the pmap command:
$ pmap -x 7625
Here’s a partial listing of the output:
7625:   ora_smon_DWREP
         Address     Kbytes        RSS       Anon     Locked Mode   Mapped File
0000000100000000     214912     197232          -          - r-x--  oracle
000000010D2DE000       1584       1144        320          - rwx--  oracle
000000010D46A000         24          -          -          - rwx--  oracle
...
TEMPORARILY ADDING SWAP SPACE
If you’re short on swap space, you can temporarily add a swap file to your server. As the root user, run the following commands to add approximately 1GB of swap space:
# dd if=/dev/zero of=tempswap bs=1k count=1000000
# chmod  600 tempswap
# mkswap  tempswap
# swapon  tempswap
Verify that the swap space was added with the -s option of the swapon command:
# swapon -s
To remove the temporary swap file, as root run the following commands:
# swapoff  tempswap
# rm tempswap
After disabling the swap file, you should see the swap space in /proc/meminfo return to its original value.
9-4. Viewing Kernel Parameters
Problem
You’re installing database binaries on a new server and need to modify kernel parameters per the installation documentation. Before you make the change, you first want to view all kernel parameters.
Solution
The solution varies by OS. First let’s view Linux kernel parameters:
Linux
Run the following grep command as root to view the current kernel settings in the /proc/sys/kernel directory:
# grep . /proc/sys/kernel/*
The previous command instructs grep to print all strings contained in files located in the /proc/sys/kernel directory. Here is a partial listing of the output:
/proc/sys/kernel/sem:250        32000   100     128
/proc/sys/kernel/shmall:2097152
/proc/sys/kernel/shmmax:1073741824
/proc/sys/kernel/shmmni:4096
Note  You can view many files in the /proc virtual filesystem as a non-root account. However, you will need root access to view all virtual files.
You can also use grep to filter for a particular setting. The example searches for the string sem in any files in the /proc/sys/kernel directory:
# grep . /proc/sys/kernel/* | grep  sem
/proc/sys/kernel/sem:250      32000 100    128
If you want to save all the current kernel values in a file, pipe the output of the grep command to a file:
# grep  . /proc/sys/kernel/* >jul11_kernel_parms.txt
Solaris
Solaris 10 and above no longer use the /etc/system file to manage shared memory. Solaris uses the resource control facility for memory management. The /etc/project file is where project settings are stored. To verify the project settings, first use the id command to determine the project identifier:
$ id -p
uid=2000(oracle) gid=200(dba) projid=200(group.dba)
Now use the prctl command to view the maximum shared memory allowed for a project:
$ prctl -n project.max-shm-memory -i project group.dba
project: 200: group.dba
NAME    PRIVILEGE       VALUE    FLAG   ACTION                       RECIPIENT
project.max-shm-memory
        privileged      24.0GB      -   deny                                 -
        system          16.0EB    max   deny                                 -
And you can also view the maximum number of semaphore IDs for a project:
$ prctl -n project.max-sem-ids -i project group.dba
NAME    PRIVILEGE       VALUE    FLAG   ACTION                       RECIPIENT
project.max-sem-ids
        privileged        128       -   deny                                 -
        system          16.8M     max   deny
If you’re logged on as the oracle user, you can also verify parameters for the current process as follows:
$ prctl -n project.max-shm-memory -i process $$
process: 20506: bash
NAME    PRIVILEGE       VALUE    FLAG   ACTION                       RECIPIENT
project.max-shm-memory
        privileged      24.0GB      -   deny                                 -
        system          16.0EB    max   deny                                 -
Tip  On Solaris systems, if you receive the “ORA-27102: out of memory” message when creating a database, see MOS note 1370537.1.
How It Works
Occasionally, you’ll need to verify kernel parameter settings when you’re troubleshooting issues. On Linux systems, you can view settings in the /proc/sys/kernel directory. On Solaris systems, use the prctl utility to view kernel parameters.
Using sysctl
On Linux systems, another way to display all kernel parameters is via the sysctl utility, which allows you to view and modify the kernel files found in the /proc/sys directory. Use the -a option of sysctl to view all kernel parameters:
# sysctl -a
Here is a small snippet of the large output of the previous command:
kernel.msgmnb = 16384
kernel.msgmni = 16
kernel.msgmax = 8192
kernel.shmmni = 4096
kernel.shmall = 2097152
kernel.shmmax = 1073741824
You can save the output of the sysctl -a command to a text file, as shown here:
# sysctl -a  > /root/kernelsysctl.txt
Inspect the output of the sysctl -a command. Note that the kernel values are either a single value or an array of values. When just a single value is involved, it is fairly easy to determine the parameter setting. For example, from the output of sysctl -a, the maximum setting for shared memory (shmmax) is 1,073,741,824 bytes:
kernel.shmmax  = 1073741824
When the kernel settings are stored as an array, it becomes a bit more difficult to determine a particular value. For example, this array of four values shows the current kernel settings for sem (semaphores):
kernel.sem  = 250       32000     100      128
When you work with array values, it is important to know which element of the array maps to which kernel setting. In the previous output, the values of the semaphore array 250 32000 100 128 map to the following kernel parameters: semmsl, semmns, semopm, and semmni, respectively. For example, semmsl is set to 250 and defines the maximum number of semaphores in a semaphore array. The meanings of these semaphore parameters are discussed in detail in recipe 9-6.
Viewing Shared Memory
On Linux systems, all the shared memory parameters appropriately begin with the string shm. Use the ls (list) command with the -1 (that’s a number one) option to display which shared memory parameters are available on your Linux system:
$ ls -1 /proc/sys/kernel/shm*
You should see output similar to this:
/proc/sys/kernel/shmall
/proc/sys/kernel/shmmax
/proc/sys/kernel/shmmni
Next, use the cat command to view individual shared memory settings. This example shows how to view the maximum size (in bytes) of a shared memory segment (shmmax) that can be created:
$ cat  /proc/sys/kernel/shmmax
2147483648
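As a quick sanity check, you can convert the byte value to megabytes with shell arithmetic; for example:
$ echo $(( $(cat /proc/sys/kernel/shmmax) / 1024 / 1024 ))
2048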
Table 9-2 describes the shared memory parameters. Some of these parameters may not be updatable on your particular Linux system.
Table 9-2. Shared Memory Kernel Parameter Descriptions

IPC Shared Memory Parameter  Description
shmmax                       Maximum size (in bytes) of a shared memory segment
shmmin                       Minimum size (in bytes) of a shared memory segment
shmall                       Total amount of shared memory (in bytes or pages) that can be used at any time
shmseg                       Maximum number of shared memory segments per process
shmmni                       Maximum number of shared memory segments for the entire system
9-5. Modifying Kernel Parameters
Problem
You’re performing a database installation. The Oracle documentation specifies the recommended settings for several kernel parameters, but you want to modify these kernel parameters.
Solution
This solution is split into two sections: one for Linux and one for Solaris. Let’s look at Linux first.
Linux
On Linux systems, there are several valid techniques for changing kernel parameters: running sysctl, editing sysctl.conf, adding entries with echo, and adding entries with cat. This section focuses on using sysctl and editing the sysctl.conf file directly. See the “How It Works” section for details on using echo and cat to modify kernel parameters.
Use the sysctl command with the -w option to dynamically modify kernel parameters. The following command changes the kernel semaphore settings in the /proc/sys/kernel/sem virtual file:
# sysctl  -w kernel.sem="250   32000 100 128"
Notice that there are no spaces around the = sign. If you attempt to run the sysctl command with spaces around the = sign, you will receive an error like the following:
error: ’kernel.sem’  must  be of  the  form  name=value
error: Malformed setting ’=’
error: ’250  32000 100 128’  must  be of  the  form  name=value
To make changes persist across system reboots, use your favorite editor (such as vi) to add the parameters to the /etc/sysctl.conf file.
Tip  Use the man sysctl or sysctl --help command for all options available in your environment.
You can also directly modify the /etc/sysctl.conf file and then use the sysctl -p command to make desired kernel parameter changes. This example uses vi to first edit the /etc/sysctl.conf file:
# vi /etc/sysctl.conf
# Add changes and then exit...
After you modify the /etc/sysctl.conf file, you can use the sysctl -p command to make the entries in the /etc/sysctl.conf file instantiated as the current values used by the Linux kernel:
# sysctl -p
The previous command loads into memory the values found in the /etc/sysctl.conf file. You can verify that the values were changed by using cat to view the corresponding virtual file.
When you edit the sysctl.conf file, we recommend that you first make a copy of the file with the date embedded into the file name. For example, before making any changes, create a copy of the file, as shown here:
# cp /etc/sysctl.conf    /etc/sysctl.conf.01_jan_08
Making a copy serves two purposes. First, it provides you with a copy of the parameters as they were before the change, which comes in handy if you want to revert to the previous settings for any reason. Second, this also gives you an audit trail of all kernel changes made via this file. You can then use commands such as diff to display differences in file versions (see recipe 5-14 for details).
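For example, after making a change you can compare the current file against the saved copy:
# diff /etc/sysctl.conf /etc/sysctl.conf.01_jan_08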
Solaris
To modify kernel parameters on Solaris systems, use the prctl command. The following modifies the max-shm-memory parameter to 8GB:
# prctl -n project.max-shm-memory -v 8gb -r -i project group.dba
The following command modifies the max-sem-ids parameter to 256:
# prctl -n project.max-sem-ids -v 256 -r -i project group.dba
When you use the prctl command, you don’t have to reboot the system to instantiate the parameters. However, you do need to use the projmod command to ensure that the parameters are added to the /etc/project file, so that the parameters persist across system reboots. This example updates the /etc/project for the max-shm-memory parameter:
# projmod -sK "project.max-shm-memory=(privileged,8G,deny)" group.dba
How It Works
One advantageous feature of both Linux and Solaris is that you can dynamically change many of the kernel settings while the system is running. The parameters take effect as soon as you change them, and you are not required to reboot the system. This is different from many other OSs that require a server reboot for kernel changes to become instantiated.
On Linux systems, the /proc/sys directory contains virtual files that correspond to kernel settings. You can change the /proc/sys files to dynamically configure kernel values. Hundreds of parameters exist that you can modify. Run the following find command to give you a rough idea of how many kernel files there are with your version of Linux:
# find /proc/sys -type f | wc -l
598
Not all virtual files in the /proc/sys directory can be modified. One quick way to determine whether a /proc/sys file can be altered is to check its permissions. Any file that shows the writable permission can be changed. This example uses the ls -altr command to view file permissions in the /proc/sys/kernel directory:
# ls -altr /proc/sys/kernel
Here is a partial listing of the output. Notice that the first two files are not modifiable, but the last three can be modified (signified by the w write permission):
-r--r--r--      1 root  root  0 Sep      4 16:32  version
-r--r--r--      1 root  root  0 Sep      4 16:32  tainted
-rw-r--r--      1 root  root  0 Sep      4 16:32  shmmni
-rw-r--r--      1 root  root  0 Sep      4 16:32  shmmax
-rw-r--r--      1 root  root  0 Sep      4 16:32  shmall
The shmmax parameter can have a significant impact on database performance. With Oracle databases, this parameter should be set to a value higher than the SGA size. If the shmmax value is too small, you may not be able to start your Oracle instance. Correctly sizing and configuring shared memory are important DBA tasks when building a new database server.
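Oracle’s Linux installation guides have commonly recommended sizing shmmax at roughly half of physical memory (check the documentation for your version for exact guidance). Assuming that rule of thumb, here’s a hedged sketch that computes and applies such a value on the fly; MEMHALF is just an illustrative variable name:
# MEMHALF=$(awk '/MemTotal/ {printf "%d", $2*1024/2}' /proc/meminfo)
# echo $MEMHALF > /proc/sys/kernel/shmmax
The awk command reads the MemTotal line (reported in kilobytes) from /proc/meminfo, converts it to bytes, and halves it. Remember that the resulting value must still comfortably exceed your planned SGA size.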
Databases use shared memory as a holding area for data read from disk. Database processes read and modify the data held in shared memory. The shared memory area uses semaphores to control exclusive access to memory segments. A database will fail to start if there isn’t enough shared memory available to be allocated. Therefore, it is paramount that DBAs know how to manage shared memory because it has a direct impact on database availability and performance.
Caution  Be careful when modifying kernel values. Setting a kernel parameter to an unusable value can cause the system to become unstable and require a restart with the boot disk.
Adding sysctl.conf Entries with echo
You can use the echo command to modify kernel parameters by writing the desired output to the specified virtual file. This example writes the values 250 32000 100 128 to the virtual /proc/sys/kernel/sem file using the echo command:
# echo 250 32000 100 128 > /proc/sys/kernel/sem
This command immediately changes the kernel settings for the sem (semaphores) parameter. If you want the change to persist across system reboots, you also need to add an entry to the /etc/sysctl.conf file. This file is read when the system boots to determine the settings for kernel parameters. You can edit the /etc/sysctl.conf file directly (with an editor such as vi) and add the following line:
kernel.sem  = 250 32000 100 128
Alternatively, you can use the echo command to add the desired parameters to the end of the /etc/sysctl.conf file, as shown here:
# echo "kernel.sem  = 250 32000 100 128"  >> /etc/sysctl.conf
Notice that the previous command uses >> to append the desired entry to the bottom of the /etc/sysctl.conf file. Don’t use a single redirection symbol (>) because it would overwrite the contents of /etc/sysctl.conf.
When you use echo and >> to write to the contents of the /etc/sysctl.conf file, no checks are performed to determine whether the kernel parameters you are writing to the file already exist. The echo and >> techniques simply add the values to the bottom of the file.
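For that reason, you may want to append a parameter only when it isn’t already present. Here’s a hedged sketch using grep; the kernel.sem values are illustrative:
# grep -q '^kernel\.sem' /etc/sysctl.conf || echo "kernel.sem = 250 32000 100 128" >> /etc/sysctl.conf
The grep -q command exits successfully (and silently) if a kernel.sem line already exists, in which case the echo never runs.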
If two entries in the /etc/sysctl.conf file configure the same kernel parameter, the value that appears nearest to the bottom of the file will be the one that gets set because the parameters are processed from top to bottom. For example, suppose that you have the following two lines in the /etc/sysctl.conf file:
kernel.sem  = 500 64000 200 500
kernel.sem  = 250 32000 100 128
The bottom line in the previous listing will be set last, so it will dictate the kernel setting for the kernel.sem value.
After you use echo to write to the /etc/sysctl.conf file, you can use the sysctl -p command to load the entries in the /etc/sysctl.conf file as the current values used by the Linux kernel:
# sysctl  -p
Adding sysctl.conf Entries with cat
The technique shown here is handy for adding several entries to the /etc/sysctl.conf file at the same time. First, use the cat command to add entries to the /etc/sysctl.conf file. This example shows how to use cat to write typical kernel parameter settings for an Oracle database:
# cat  >> /etc/sysctl.conf  <<EOF
kernel.shmall  = 2097152
kernel.shmmax  = 536870912
kernel.shmmni  = 4096
kernel.sem  = 250 32000 100 128
fs.file-max  = 65536
net.ipv4.ip_local_port_range  =  1024 65000
net.core.rmem_default  = 262144
net.core.rmem_max  = 262144
net.core.wmem_default  = 262144
net.core.wmem_max  = 262144
EOF
This command uses cat to write all the values encapsulated between the two EOF markers to the /etc/sysctl.conf file, which allows you to add several parameters simultaneously to the /etc/sysctl.conf file. When using cat and >> to write parameters to the /etc/sysctl.conf file, there is no automatic checking to determine whether the parameters already exist in the file. Using cat and >> will simply write to the bottom of the file.
After the desired changes are made, use the sysctl -p command to make the entries in the /etc/sysctl.conf file the current values used by the Linux kernel, as shown here:
# sysctl -p
9-6. Displaying Semaphores
Problem
The Oracle installation documentation recommends configuring the system semaphores to certain minimal values. Before you modify the settings, you want to view the semaphore parameters.
Solution
This solution is split into two sections: one for Linux and one for Solaris. First, let’s discuss displaying semaphores on Linux.
Linux
On Linux systems, you can view semaphore information by displaying the contents of the /proc/sys/kernel/sem file. This example uses the cat command to view semaphore data (you don’t have to be root to view these values):
$ cat /proc/sys/kernel/sem
250         32000     100         128
Notice that there are four values listed for semaphores in the previous output. The numbers represent the value for the following semaphore kernel parameters: semmsl, semmns, semopm, and semmni, respectively. For example, the semmsl value is currently 250, semmns is 32000, and so forth.
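If the unlabeled columns are hard to remember, a small awk one-liner (illustrative, not part of any Oracle toolset) can label them for you:
$ awk '{print "semmsl="$1, "semmns="$2, "semopm="$3, "semmni="$4}' /proc/sys/kernel/sem
semmsl=250 semmns=32000 semopm=100 semmni=128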
Solaris
On Solaris, first view the project for the current user:
$ id -p
uid=2000(oracle) gid=200(dba) projid=200(group.dba)
Use the prctl command to show various semaphore settings for the currently logged-on user. For example, the following shows the maximum number of semaphores per set:
$ prctl -n process.max-sem-nsems $$
Here is some sample output:
process: 10033: -ksh
NAME    PRIVILEGE       VALUE    FLAG   ACTION                       RECIPIENT
process.max-sem-nsems
        privileged        512       -   deny                                 -
        system          32.8K     max   deny                                 -
The following command shows the maximum number of semaphore IDs allowed for the project:
$ prctl -n project.max-sem-ids $$
This shows the maximum number of semaphore operations allowed per semop call:
$ prctl -n process.max-sem-ops $$
How It Works
Before making changes to any kernel information, it’s prudent to first view the current values. Table 9-3 details the meanings of the relevant Linux semaphore variables. Notice that all semaphore variable names aptly begin with the letters sem. These semaphore names may vary slightly, depending on which version of the OS you’re using.
Table 9-3. Semaphore Kernel Parameters and Descriptions

IPC Semaphore Parameter   Description
semmsl                    Maximum number of semaphores per set (array)
semmns                    Maximum number of semaphores on entire system
semopm                    Maximum operations for semop system call
semmni                    Maximum number of semaphore arrays on entire system
semvmx                    Maximum value of a semaphore
Semaphores are locking mechanisms that coordinate mutually exclusive access to sharable system resources. Semaphores act as gatekeepers to ensure that particular shared system resources are not accessed by multiple processes at the same time. Databases use semaphores to manage access to OS resources such as shared memory.
Database background processes require semaphores to manage mutually exclusive access to shared resources. If there aren’t enough semaphores available for all database processes, the database might not start or a runtime failure might occur. For semaphore-related problems, it is critical that DBAs know how to view and configure these kernel parameters.
RAILROAD SEMAPHORES
In the early days of the railroad, engineers quickly discovered the need for communicating the status of railway lines over long distances. Knowing in advance whether a railway line was in use had a direct impact on survivability. To minimize collisions, railroad engineers devised techniques of signaling via signs visible over long distances.
The signs that the railways used consisted of moving mechanical arms or multicolored lights. These mechanical signs and lights—colloquially called semaphores—were used to communicate in advance whether a set of tracks was free from an obstruction (such as an oncoming train). In this way, the railway engineers ensured that only one train at a time used a single section of tracks.
Semaphores in computers function much like their railway counterparts. Semaphores signify whether a resource is busy. These constructs are typically used to manage exclusive access to segments of shared memory. A database process that needs access to a shared memory segment must first check the status of the corresponding semaphore variable to guarantee that the section is not already in use. In this way, semaphores ensure that only one process at a time operates on a particular shared memory area.
9-7. Configuring Semaphores
Problem
You’re installing Oracle software on a database server and need to modify the semaphore settings.
Solution
This solution is split into two sections: one for Linux and one for Solaris. First, let’s discuss configuring semaphores on Linux.
Linux
On Linux systems, first check the current values in case they need to be referenced later. To view the current semaphore settings, use the cat command:
# cat /proc/sys/kernel/sem
250     32000     100    128
The values in the previous output represent the settings for the following semaphore parameters in this order: semmsl, semmns, semopm, and semmni.
This example uses the echo command to increase the maximum number of semaphore arrays (semmni) from 128 to 256:
# echo 250 32000 100 256 > /proc/sys/kernel/sem
Note  See recipe 9-5 for alternate ways of changing Linux kernel parameters.
Solaris
To modify kernel parameters on Solaris systems, use the prctl command. The following command modifies the max-sem-ids parameter to 256:
# prctl -n project.max-sem-ids -v 256 -r -i project group.dba
When using the prctl command, you don’t need to reboot the system to instantiate the parameters, but you do need to use the projmod command to ensure that the parameters are added to the /etc/project file. This process ensures that the parameters persist across system reboots. Here are some examples of using projmod to make settings permanent across system reboots:
# projmod -sK "project.max-sem-ids=(privileged,256,deny)" group.dba
# projmod -sK "project.max-shm-ids=(privileged,100,deny)" group.dba
# projmod -sK "project.max-sem-nsems=(privileged,256,deny)" group.dba
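To confirm that the projmod entries landed, you can inspect the /etc/project file directly, as in this quick check:
# grep group.dba /etc/project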
How It Works
After you change the semaphore settings, it is a good idea to use the cat command to verify that the changes took place:
# cat  /proc/sys/kernel/sem
250     32000    100    256
If you want the changes to persist across a system reboot, ensure that you modify the /etc/sysctl.conf file appropriately. In this example, an editor (such as vi) is used to add the following entry to the /etc/sysctl.conf file:
kernel.sem  = 250 32000 100 256
Note  Refer to Oracle’s installation documentation for recommended semaphore settings for your version of Oracle and OS.
9-8. Viewing Memory Structures
Problem
Your database has experienced a hard crash. You want to see whether any database-related memory structures are still physically allocated on the server.
Solution
For both Linux and Solaris, use the ipcs (interprocess communication status) command without any options to view the current allocated physical memory, semaphores, and message queues:
$ ipcs
Here is some typical output:
------ Shared Memory  Segments  --------
Key         shmid     owner       perms     bytes       nattch      status
0xb3e36378  131072     oracle     640       421527552   17
------ Semaphore  Arrays  --------
Key         semid       owner   perms   nsems
0x288e2800 1146880      oracle  640     126
0x288e2801 1179649      oracle  640     126
0x288e2802 1212418      oracle  640     126
0x288e2803 1245187      oracle  640     126
0x288e2804 1277956      oracle  640     126
------ Message  Queues  --------
Key     msqid    owner   perms   used-bytes    messages
The prior output has three sections. The oracle user has 421,527,552 bytes of shared memory allocated. There are five semaphore arrays allocated with 126 semaphores per array. There are no message queues allocated.
How It Works
The “Solution” section of this recipe demonstrates how to view in-memory structures that are currently allocated. On Linux systems, to view the system limits imposed on memory and semaphores, use the -lms options of the ipcs command:
$ ipcs -lms
------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 1048576
max total shared memory (kbytes) = 8388608
min seg size (bytes) = 1

------ Semaphore Limits --------
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 100
semaphore max value = 32767
Compare the maximum memory values to the settings in the /proc/sys/kernel directory:
$ cat /proc/sys/kernel/shmall
2097152
$ cat /proc/sys/kernel/shmmax
1073741824
$ cat /proc/sys/kernel/shmmni
4096
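Note that shmall is expressed in memory pages, not bytes. Assuming a 4KB page size, you can reconcile it with the “max total shared memory (kbytes)” figure from ipcs -lms using a quick calculation like this:
$ echo $(( $(cat /proc/sys/kernel/shmall) * $(getconf PAGE_SIZE) / 1024 ))
8388608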
To view all the options available with the ipcs command, use the -h (help) option.
9-9. Removing In-Memory Structures
Problem
Your database unexpectedly crashed, and for some reason the semaphores and shared memory have not been released. Other databases are running on the server, so you can’t reboot the system to release the shared memory objects. You want to manually remove these orphaned memory structures.
Solution
On both Linux and Solaris systems, first view the structures to be removed with the ipcs -sm command:
$ ipcs -sm
On this server, there are two instances of Oracle running, each with one allocated shared memory segment and five sets of semaphore arrays:
------ Shared Memory  Segments  --------
Key        shmid     owner  perms        bytes  nattch    status
0xb3e36378 32768    oracle    640    421527552      16
0x34525e84 65537    oracle    640    421527552      11

------ Semaphore  Arrays  --------
key         semid     owner   perms  nsems
0x288e2800 360448    oracle     640    126
0x288e2801 393217    oracle     640    126
0x288e2802 425986    oracle     640    126
0x288e2803 458755    oracle     640    126
0x288e2804 491524    oracle     640    126
0x3239d0e4 622597    oracle     640    126
0x3239d0e5 655366    oracle     640    126
0x3239d0e6 688135    oracle     640    126
0x3239d0e7 720904    oracle     640    126
0x3239d0e8 753673    oracle     640    126
Caution  If you’re working on a server that has multiple Oracle instances running, ensure that you remove the correct memory structure. If you remove the wrong structure, you will inadvertently crash another database.
If you have multiple databases on one server, first verify which memory structures belong to the orphaned instance by running the Oracle sysresv utility (located in the ORACLE_HOME/bin directory). This command reports on memory structures that correspond to your current instance setting of ORACLE_SID. Run this command as the owner of the Oracle binaries (usually oracle):
$ sysresv
Here is the pertinent output:
IPC  Resources  for ORACLE_SID  "RMDB2"  :
Shared  Memory:
ID       KEY
65537    0x34525e84
Semaphores:
ID        KEY
622597    0x3239d0e4
655366    0x3239d0e5
688135    0x3239d0e6
720904    0x3239d0e7
753673    0x3239d0e8

Total /dev/shm size: 14168080384 bytes, used: 1828995072 bytes
Shared Memory:
ID              KEY
1397653516      0x00000000
1397686285      0x00000000
1397719054      0x00000000
1397620747      0x00000000
1397751823      0x6ecd05d4
Semaphores:
ID              KEY
4718634         0xaee7d96c
You can remove memory objects either by key or ID. This example uses the -m option to remove a shared memory segment by its ID:
$ ipcrm -m 65537
This example uses the -s option to remove a semaphore array using an ID:
$ ipcrm -s 4718634
You can verify that the memory structures have been removed by running sysresv again.
How It Works
The ipcrm command uses either the key or the ID as its input for identifying which IPC object to remove. The basic syntax for using ipcrm is as follows:
$ ipcrm [ -M key | -m id | -Q key | -q id | -S key | -s id ]
In the previous syntax description, -M is used with a shared memory key, -m is used with a shared memory ID, -S is used with a semaphore key, and -s is used with a semaphore ID.
Occasionally, you might have a database crash and for some reason the database semaphores or shared memory structures haven’t been released properly by the OS. In these rare situations, if you don’t have the luxury of rebooting the server, you have to first identify the unreleased memory object with the ipcs and sysresv commands and then remove it with the appropriate ipcrm command.
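For example, to gather the candidate semaphore IDs owned by the oracle user (to cross-check against sysresv before passing anything to ipcrm), here’s a hedged sketch:
$ ipcs -s | awk '/oracle/ {print $2}'
On a server hosting multiple instances, never feed these IDs to ipcrm without first confirming via sysresv which instance owns them.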
9-10. Viewing Network Configuration Settings
Problem
The Oracle documentation recommends setting some network parameters on Linux systems to minimal values. You first want to inspect the current network settings.
Note  This recipe applies only to Linux systems.
Solution
The virtual /proc network files are usually located either in /proc/sys/net/core or in /proc/sys/net/ipv4. By using the ls -altr command, you can see that most of the virtual network files in /proc/sys/net/core are updatable:
# ls -altr /proc/sys/net/core
Here’s a partial listing of the output:
total 0
-rw-r--r-- 1 root root 0 Jun 12 22:02 xfrm_larval_drop
-rw-r--r-- 1 root root 0 Jun 12 22:02 xfrm_aevent_rseqth
-rw-r--r-- 1 root root 0 Jun 12 22:02 xfrm_aevent_etime
-rw-r--r-- 1 root root 0 Jun 12 22:02 xfrm_acq_expires
-rw-r--r-- 1 root root 0 Jun 12 22:02 wmem_max
-rw-r--r-- 1 root root 0 Jun 12 22:02 wmem_default
...
Use the cat command to view a particular virtual network file. This example uses cat to display the current setting for the rmem_default kernel parameter:
# cat /proc/sys/net/core/rmem_default
524288
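You can also query several parameters at once by passing their dotted names to sysctl, which prints each one as name = value; for example:
# sysctl net.core.rmem_default net.core.rmem_max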
How It Works
To view a complete listing of network settings, use the sysctl command and grep for the string net:
# sysctl -a | grep -i net
You’ll be presented with a great deal of output. Table 9-4 lists some of the network kernel parameters that you may have to modify for database servers.
Table 9-4. Network Kernel Parameter Descriptions

Network Kernel Parameter   Location             Description
rmem_default               /proc/sys/net/core   Default socket receive buffer size (in bytes)
wmem_default               /proc/sys/net/core   Default socket send buffer size (in bytes)
rmem_max                   /proc/sys/net/core   Maximum socket receive buffer size (in bytes)
wmem_max                   /proc/sys/net/core   Maximum socket send buffer size (in bytes)
tcp_keepalive_time         /proc/sys/net/ipv4   Number of seconds a connection is idle before TCP starts sending keepalive probes
tcp_keepalive_intvl        /proc/sys/net/ipv4   Interval (in seconds) between keepalive probes
tcp_keepalive_probes       /proc/sys/net/ipv4   Number of unacknowledged probes sent before the connection is terminated
tcp_retries1               /proc/sys/net/ipv4   Number of times TCP attempts to transmit a packet normally
tcp_retries2               /proc/sys/net/ipv4   Maximum number of times TCP attempts to retransmit a packet
tcp_syn_retries            /proc/sys/net/ipv4   Maximum number of SYN retransmission attempts for a new connection
ip_local_port_range        /proc/sys/net/ipv4   Range of local ports allowed for TCP and UDP traffic
VIEWING IP INFORMATION
You’ll occasionally need to view aspects about your network, such as the server’s Internet Protocol (IP) address. Look in the /etc/hosts file to view your hostname and IP information. The /etc/hosts file contains a cross-reference between IP addresses and server names. This example uses cat to display the contents of the /etc/hosts file:
$ cat /etc/hosts
Here’s a sample of what you might find:
127.0.0.1       localhost.localdomain  localhost
177.22.33.89    db123.cent.com    db123
You can also use the hostname -i command to view your server IP address and hostname -d to display domain information.
9-11. Configuring Network Settings
Problem
You’re installing database software on a Linux server, and the Oracle documentation indicates that you need to configure some network parameters.
Note  This recipe applies only to Linux systems.
Solution
Use the echo command to update the /proc/sys/net/ipv4/ip_local_port_range file. This example changes the first local port allowed for TCP and UDP traffic to 1024 and the last local port to 65000:
# echo 1024 65000 > /proc/sys/net/ipv4/ip_local_port_range
You can verify the changes with the cat command:
# cat /proc/sys/net/ipv4/ip_local_port_range
1024   65000
You need to add these entries to the /etc/sysctl.conf file to have the changes persist across system reboots. Here’s a sample entry in the /etc/sysctl.conf file:
net.ipv4.ip_local_port_range = 1024 65000
How It Works
Before changing any network kernel parameters, make sure that you first save a copy of the original values somewhere so that you can change back to the old values if the new values cause undesirable results. In the example, the value of the ip_local_port_range is first viewed with the cat command:
$ cat /proc/sys/net/ipv4/ip_local_port_range
32768    61000
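One simple way to preserve the original values is to copy them to a file before making the change (the /tmp path here is purely illustrative):
# cat /proc/sys/net/ipv4/ip_local_port_range > /tmp/ip_local_port_range.orig
Restoring is then just a matter of redirecting the saved values back:
# cat /tmp/ip_local_port_range.orig > /proc/sys/net/ipv4/ip_local_port_range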
Note  Refer to the Oracle installation documentation for recommended settings for network kernel parameters for your version of Oracle and OS.
9-12. Modifying System Open File Limits
Problem
The Oracle installation documentation for Linux recommends setting the system-wide open file limit for the server. You want to set this limit.
Note  This solution applies only to Linux systems.
Solution
Use the echo command or the sysctl command to dynamically modify the /proc/sys/fs/file-max value. This example uses the echo command to change the file-max value on the fly to 65536:
# echo 65536 > /proc/sys/fs/file-max
Use the cat command to verify that the change took place:
# cat /proc/sys/fs/file-max
65536
Here’s an example of using sysctl -w to modify the maximum open file limit:
# sysctl -w fs.file-max=65536
Remember to add an entry to the /etc/sysctl.conf file to make the changes persist across system reboots.
How It Works
Linux imposes a limit on the overall number of files that can simultaneously be open on the server. Servers that host database applications tend to have many simultaneously open files. If the default value for the maximum number of open files is too low, you most likely will have to increase it. You’ll know that you’ve hit the maximum limit on the number of open files if you start seeing errors pertaining to “running out of file handles.”
This maximum open file limit is governed by the Linux kernel /proc/sys/fs/file-max virtual file. You can also view the maximum number of file handles by viewing the contents of the /proc/sys/fs/file-nr virtual file:
# cat /proc/sys/fs/file-nr
885 0 65536
This output shows the current number of allocated file handles, the number of free file handles, and the maximum number of file handles, respectively.
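If you check this file regularly, a short awk command (illustrative) can summarize it using the first and third fields:
$ awk '{printf "%d of %d file handles allocated\n", $1, $3}' /proc/sys/fs/file-nr
885 of 65536 file handles allocated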
9-13. Showing Shell Limits
Problem
You want to view system resource limits instantiated by your logon shell.
Solution
This solution applies to both Linux and Solaris systems. Use the -a (all) option of the ulimit command to print the current soft limits for a process:
$ ulimit -a
core  file size          (blocks, -c) 0
data  seg size           (kbytes, -d) unlimited
file size                (blocks, -f) unlimited
pending  signals                 (-i) 1024
max  locked  memory     (kbytes,  -l) 32
max  memory  size       (kbytes,  -m) unlimited
open files                       (-n) 10000
pipe  size           (512  bytes, -p) 8
POSIX  message  queues    (bytes, -q) 819200
stack  size             (kbytes,  -s) 10240
cpu time                (seconds, -t) unlimited
max  user  processes             (-u) 16375
virtual memory          (kbytes,  -v) unlimited
file  locks                      (-x) unlimited
This output displays the parameter, units of measurement, option used to manipulate the parameter, and its current setting. To view the hard limit resources, use the ulimit -aH command. If you want to view a soft or hard limit for a particular setting, specify its option. For example, to display the hard limit for the number of open files, use the -Hn option:
$ ulimit -Hn
20000
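To compare a soft limit with its hard counterpart side by side, run both forms of the command; with the settings shown previously, you’d see the following:
$ ulimit -Sn
10000
$ ulimit -Hn
20000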
How It Works
Your logon shell will impose default maximum limits on various resources that a process can use, such as the number of open files, processes per user, amount of memory allocated, and so on. These shell limits are defined by the ulimit command. Each resource has a soft limit setting and a hard limit setting. The soft limit setting establishes the default resource limit when a user logs on to the system. If the user process exceeds the soft limit setting for a resource, an error will be thrown. The user can manually increase the setting of the soft limit setting for a resource up to (but not exceeding) the value defined by the hard limit.
A user can modify the hard limit down, but cannot modify the hard limit up. So if you set a hard limit to a lower value, you can’t reset it to its original value. Only the root user can modify a hard limit to a higher value.
On Linux systems, the default values for soft and hard limits on the server are established by adding entries into the /etc/security/limits.conf file. On Solaris systems, use the prctl and projmod commands if you need to modify resource limits. Database processes tend to consume more resources than the default shell limits allow. Therefore, it’s important that you’re familiar with viewing and modifying shell resource limits.
Note  The Bash, Bourne, and Korn shells use the ulimit command to view and modify shell resource limits. If you are using the C shell, use the limit command.
9-14. Changing Shell Limits
Problem
You’re performing an Oracle installation on a new server and need to modify the shell limits per the installation instructions.
Note  This recipe applies only to Linux systems. Solaris systems use the prctl and projmod utilities to maintain shell limits; typically, the default configuration is adequate.
Solution
To alter the default shell limits for a user on Linux systems, edit the /etc/security/limits.conf file as root. This example shows how to change the number of processes and number of open files defined for the oracle user:
oracle  soft  nproc  2047
oracle  hard  nproc  16384
oracle  soft  nofile 1024
oracle  hard  nofile 65536
In this output, the first line defines the soft limit for the number of processes for the oracle user to 2047, and the second line sets the hard limit for the number of processes to 16384. The third and fourth lines set the soft and hard limits for the number of open files for the oracle user, respectively. These limits will be imposed on the oracle user when logging on to the server.
Caution  Do not set the hard limit for a user resource to be higher than the system-wide limit. In particular, don’t set the hard limit for the number of open files to be higher than the value defined in the /proc/sys/fs/file-max virtual file. If a single process is allowed to open enough files to reach the system-wide limit, the system can run out of file handles and become unstable.
How It Works
The ulimit command provides a way to view and limit resources used by a shell and the resources of subprocesses started by a shell. As a non-root user, you can change your soft limits up to the value defined by the hard limit. As the root user, you can modify hard limits to higher values.
For example, to adjust the soft limit of the number of open files to 15000, issue the following:
$ ulimit -Sn 15000
To view the change, issue the command without a value:
$ ulimit -Sn
15000
If you exceed the value defined by the hard limit, you’ll receive an error such as the following:
-bash:  ulimit:  open files:  cannot  modify  limit: Invalid  argument
When you add entries to the /etc/security/limits.conf file on Linux systems, the default shell limit is set for the user process when users log on to the system. Table 9-5 shows the values and their meaning when changing the /etc/security/limits.conf file.
Table 9-5. Description of Limits Set via the /etc/security/limits.conf File

Parameter   Limit
core        Core file size (in KB)
data        Maximum data size (in KB)
fsize       Maximum file size (in KB)
memlock     Maximum locked-in-memory address space (in KB)
nofile      Maximum number of open files
rss         Maximum resident set size (in KB)
stack       Maximum stack size (in KB)
cpu         Maximum CPU time (in minutes)
nproc       Maximum number of processes
as          Address space limit
maxlogins   Maximum number of logins for the user
priority    Priority at which to run the user process
locks       Maximum number of locks for the user
When working with database applications, you might not ever want to have a database process constrained by a soft limit. Instead, you might want only the hard limit to govern the amount of resources a database process uses. In this case, you can set the soft limit equal to the hard limit in the database user login profile file. For example, if the oracle user on your Linux box uses the Bash shell, you would modify the .bash_profile logon file (see recipe 2-5 for further details).
The following entries are added to the oracle user’s .bash_profile file to establish the soft limits equal to the hard limits:
ulimit -u  16384
ulimit -n  65536
When you use the ulimit command, if you don’t denote a -S (soft) or -H (hard) option, both the soft and hard limits are set to the value specified. Whenever the oracle user logs on, the soft and hard limits for the number of processes will be 16384, and the soft and hard limits for the number of open files will be 65536.
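For example, once the ulimit -n 65536 entry in .bash_profile has run, both forms of the command report the same value:
$ ulimit -Sn
65536
$ ulimit -Hn
65536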
If you write an entry into the logon file that exceeds the hard limit, the user will receive an error such as this at logon time:
-bash: ulimit:  open files: cannot modify limit: Operation not permitted
Note  The Bourne and Korn shells use the .profile initialization file for setting parameters on logon. The C shell uses the .login or .cshrc file.

