ACFS Data Services
As mentioned in
the previous chapter, the Oracle Cloud File System simplifies storage
management across file systems, middleware, and applications in private
clouds with a unified namespace. The Oracle Cloud File System offers
rapid elasticity and increased availability of pooled storage resources
as well as an innovative architecture for balanced I/O and highest
performance without tedious and complex storage administration.
One of the key aspects of the Oracle Cloud
File System (ACFS) is support for advanced data services such as
point-in-time snapshots, replication, file tagging, and file system
security and encryption features. This chapter covers the inner workings
and implementation of the ACFS data services.
ACFS Snapshots
Snapshots are point-in-time views of the source
file system as it appeared when the snapshot was taken. Think of them as a
way to go back in time to see what files or directories looked like at
that moment.
ACFS provides snapshot capability at the
file system level. The snapshot starts out with a set of duplicate
pointers to the extents in the primary file system. When an update is to
be made, there is no need to copy extents to the snapshot because it
already points to the existing blocks. New storage is allocated for the
updates in the primary file system.
This snapshot uses the first copy-on-write
(FCOW or COW) methodology to enable a consistent, version-based, online
view of the source file system. Snapshots are initially a sparse file
system, and as the source file system’s files change, the before-image
extent of those files is copied into the snapshot directory. The
before-image granularity is an ACFS extent, so if any byte in an extent
is modified, the extent is COW’ed and any subsequent changes in that
extent require no action for the snapshot.
ACFS supports read-only and read-write
snapshot services. Note that ACFS snapshots cannot be used on file
systems that house RMAN backups, archived logs, or Data Pump dump sets;
this restriction is removed in the Oracle 12c release.
When snapshots are created, they are
automatically available and always online while the file system is
mounted. Snapshots are created as a hidden subdirectory inside the
source file system called .ACFS/snaps/, so no separate mount operation
is needed and no separate file store needs to be maintained for the
snapshots.
ACFS supports a total of 63 snapshot views per file system, in any
combination of read-only and read-write snapshots.
ACFS Read-Only Snapshots
Because ACFS read-only snapshots are a
point-in-time view of a file system, they can be used as the source of a
file system logical backup. An ACFS snapshot can support the online
recovery of files inadvertently modified or deleted from a file system.
ACFS Read-Write Snapshots
ACFS read-write snapshots enable fast
creation of a snapshot image that can be both read and written without
impacting the state of the ACFS file system hosting the snapshot images.
To use ACFS read-write snapshots, the disk group compatibility
attribute for ADVM must be set to 11.2.0.3.0 or higher. If you create a
read-write snapshot on an existing ACFS file system from a version
earlier than 11.2.0.3.0, the file system is updated to the 11.2.0.3.0
format. After a file system has been updated to a higher version, it
cannot be returned to an earlier version.
The read-write snapshots can be used for the following purposes:
Testing
new versions of application software on production file data reflected
in the read-write snapshot image without modifying the original
production file system.
Running test scenarios on a real data set without modifying the original production file system.
Testing
ACFS features such as encryption or tagging. Data in snapshots can be
encrypted and workloads can be run to assess the performance impact of
the encryption before the live data is encrypted.
ACFS Snapshot by Example
The following shows how to create ACFS snapshots. In this example, a snapshot of the database ORACLE_HOME is created:
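For example, a read-only snapshot might be created along these lines (a sketch; the snapshot name and the ORACLE_HOME mount point /u01/app/oracle/acfsdb_home are hypothetical, and on 11.2.0.3 and later the -w option can be added to request a read-write snapshot):

[root@node1 ~]# /sbin/acfsutil snap create dbhome_snap1 /u01/app/oracle/acfsdb_home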
The following acfsutil command can be used to obtain information about ACFS snapshots and the file system:
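For instance (a sketch using the same hypothetical mount point; acfsutil snap info lists the snapshots, and acfsutil info fs reports file system details):

[root@node1 ~]# /sbin/acfsutil snap info /u01/app/oracle/acfsdb_home
[root@node1 ~]# /sbin/acfsutil info fs /u01/app/oracle/acfsdb_home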
To list all snapshots available in the cluster, execute the following query:
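For example, from SQL*Plus connected to an ASM instance (a sketch; the exact column list of the GV$ASM_ACFSSNAPSHOTS view may vary slightly by release):

SQL> SELECT fs_name, snap_name, create_time FROM gv$asm_acfssnapshots;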
Accessing the snapshot will always provide a
point-in-time view of a file; thus, ACFS snapshots can be very useful
for file-based recovery or for file system logical backups. If
file-level recovery is needed (for the base file system), it can be
performed using standard file copy or replace commands.
A possible use-case scenario for snapshots
could be to create a consistent recovery point set between the database
ORACLE_HOME and the database. This is useful, for example, when a
recovery point needs to be established before applying a database patch
set. Here are the steps to follow for this scenario:
1. Create an ORACLE_HOME snapshot:
2. Create a guaranteed restore point (GRP) in the database:
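For example (standard SQL*Plus syntax; the restore point name is arbitrary, and the database must meet the normal prerequisites for guaranteed restore points):

SQL> CREATE RESTORE POINT before_patch GUARANTEE FLASHBACK DATABASE;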
3. Apply the patch set.
4. If the patch set application fails, take one of the following actions:
Restore the database to the GRP.
Recover the file system by leveraging the snapshot.
ACFS Tagging
ACFS tagging enables associating tag names
with files, logically grouping files that may be present in any location
(directory) in a file system. ACFS replication can then select files
with a unique tag name for replication to a different remote cluster
site. The tagging option avoids having to replicate an entire Oracle
ACFS file system. Tags can be set or unset, and tag information for
files can be displayed using the command acfsutil tag.
At creation time, files and directories
inherit any tags from the parent directory. When a new tag is added to a
directory, existing files in the directory do not get tagged with the
same tag unless the –r option is specified with the acfsutil tag set
command. Any files created in the future, however, do inherit the tag,
regardless of whether or not the –r option was specified with the
acfsutil tag set command.
ACFS implements tagging using extended
attributes. Some editing tools and backup utilities do not retain the
extended attributes of the original file by default, unless a specific
switch is supplied. The following list describes the necessary
requirements and switch settings for some common utilities to ensure
ACFS tag names are preserved on the original file:
Install
the coreutils library (version coreutils-5.97-23.el5_4.1.src.rpm or
coreutils-5.97-23.el5_4.2.x86_64.rpm or later) on Linux to obtain a
version of the cp command that supports extended attribute preservation
with the --preserve=xattr switch and a version of the mv command that
preserves extended attributes without any switches.
The
vi editor requires the set bkc=yes option in the .vimrc (Linux) or
_vimrc (Windows) file to make a backup copy of a file and overwrite the
original. This preserves tag names on the original file.
emacs
requires that the backup-by-copying option is set to a non-nil value to
preserve tag names on the original filename rather than a backup copy.
This option must be added to the .emacs file.
The
rsync file-transfer utility requires the -X flag option to preserve tag
names. In addition, you must set the -l and -X flags to preserve the
tag names assigned to symbolic link files themselves.
The
tar backup utility on Linux requires the --xattrs flag to be set on the
command line to preserve tag names on a file. However, tar does not
retain the tag names assigned to symbolic link files, even with the
--xattrs flag.
The
tar backup utility on Windows currently provides no support for
retaining tag names because no switch exists to save extended
attributes.
As of 11.2.0.3, the ACFS tagging feature is
available only on Linux and Windows. To use the ACFS tagging
functionality on Linux, the disk group compatibility attributes for ASM
and ADVM must be set to 11.2.0.2 or higher. To use ACFS tagging
functionality on Windows, the disk group compatibility attributes for
ASM and ADVM must be set to 11.2.0.3.
This can be done with SQL*Plus, as illustrated here (notice that SQL*Plus is executed as the oracle user on node1):
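A minimal sketch, assuming the ADVM volume lives in a disk group named DATA (substitute your own disk group name; on Windows use 11.2.0.3 as described above):

[oracle@node1 ~]$ sqlplus / as sysasm
SQL> ALTER DISKGROUP data SET ATTRIBUTE 'compatible.asm'  = '11.2.0.2';
SQL> ALTER DISKGROUP data SET ATTRIBUTE 'compatible.advm' = '11.2.0.2';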
ACFS Replication Overview
In Oracle Release 11.2.0.2, the ACFS file
system replication feature was introduced on the Linux platform. This
feature enables replication of an ACFS file system across a network to a
remote site, which is useful for providing disaster recovery. Similar
to Data Guard, which replicates databases by
capturing database redo operations, ACFS replication captures ACFS file
system changes on a primary file system and transmits these changes to a
standby file system.
ACFS replication leverages Oracle Net and the
NETWORK_FILE_TRANSFER PL/SQL package for transferring replicated data
from a primary node to the standby file system node. ACFS replication is
only supported with Grid Infrastructure for a Cluster, as selected in the
Oracle Installer. ACFS replication is not supported on Grid
Infrastructure for a Standalone Server. However, you can install Grid
Infrastructure for a Cluster on a single node by supplying the necessary
information for a single node during installation.
The combination of Oracle Real Application
Clusters, Data Guard, and ACFS Replication provides comprehensive site
and disaster recovery policies for all files inside and outside the
database.
Primary File System
The source ACFS file system is referred to
as a primary file system and the target ACFS file system as a standby
file system. For every primary file system there can be only be one
standby file system. ACFS replication captures, in real time, file
system changes on the primary file system and saves them in files called
replication logs (rlogs). These rlogs are stored in the
.ACFS/repl directory of the file system that is being replicated. If the
primary node is part of a multinode cluster, all rlogs (one rlog per
node) created at a specific point in time are collectively called a cord. Rlogs combined into a cord are then transmitted to the standby node, where the cord is used to update the standby file system.
Keep in mind that data written to files is
first buffered in a file system cache (unless direct IO is used); then
at a later point in time it is committed to disk. ACFS guarantees that
when data is committed to disk it will also be written to the standby
file system.
Current Restrictions (11.2.0.3)
The following are consideration points when implementing ACFS file system replication in an Oracle Clusterware 11.2.0.3 system.
The minimum file system size that can be replicated is 4GB.
ACFS replication currently supports a maximum of eight nodes in the cluster hosting the primary file system.
The primary and standby file systems must be the same OS, architecture, and endianness.
ACFS cannot currently use encryption or security for replicated file systems.
Cascading standbys are not supported.
The ACFS standby file system must be empty before replication is initiated.
Standby File System
Replication logs are asynchronously
transported to the node hosting the standby file system, where they are
read and applied to the standby file system.
When the replication logs have been successfully applied to the standby
file system, they are deleted on both the primary and standby file
systems. Because the standby file system is a read-only file system, it
can be the source of consistent file system backups after all the
outstanding logs are applied.
NOTE
If needed, a read-write snapshot can be taken of the standby file system.
Planning for ACFS Replication
This section describes how to enable ACFS
replication. The examples assume that the Grid Infrastructure software
has been installed on nodes hosting the ACFS file system and that the
ADVM volumes are enabled and the ACFS file systems are mounted.
Note that the primary and standby sites can
have differing configurations. In other words, the primary can be a
multinode cluster and the standby can be a single-node cluster. If a
standby node is used for disaster recovery purposes, it is recommended
that the standby node have a configuration similar to that of the
primary cluster.
There are no rigid primary and standby node
roles; that is, a primary node can provide the role of primary for one
file system and also provide the role of standby for another file
system. However, for simplicity, this chapter will use the term primary node to indicate the node hosting the primary file system and the term standby node for the node hosting the standby file system.
This configuration represents the system used
in the following examples. With respect to replication, some commands,
such as acfsutil, must be executed with root privileges. Other commands,
such as sqlplus, are issued from the oracle user ID. In the examples,
the user ID is shown with the command prompt.
Tagging Considerations
ACFS tagging can be a key aspect of ACFS
replication, as it allows users to assign a common naming attribute to a
group of files. This is done by leveraging OS-specific extended
attributes and generic tagging CLIs. ACFS replication can
use these tags to select files with a unique tag name for replication to
a different remote cluster site. Thus, rather than replicating an entire
file system, ACFS tagging enables a user to select specific tagged files
and directories for replication.
Tagging enables data- or attribute-based replication.
The following example illustrates recursively tagging all files of the /acfs directory with the “reptag” tag:
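A sketch of the command (the /acfs mount point and reptag tag name follow this chapter's example):

[root@node1 ~]# /sbin/acfsutil tag set -r reptag /acfs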
Using tagging with ACFS replication requires
that a replication tag be specified when replication is first initiated
on the primary node. Tagging with replication cannot be implemented
after replication has been initiated. To begin tagging after replication
has been initiated requires that replication first be terminated and
then restarted with a tag name.
Before you implement ACFS replication, it is
important to determine how and what will be replicated; for example,
will all file system data be replicated, certain directories, or only
specific ACFS tagged files? This choice will impact file system sizing.
Keep in mind that the tags specified on the
init command line need not be applied to files at the time of the
initialization. For example, you can replicate files with the tags
Chicago and Boston, when at the time of replication only files with the
Chicago tag exist (that is, no files with the Boston tag exist). Any
subsequent files tagged with Boston will also begin to be replicated.
Setting Up Replication
Before initializing ACFS replication, ensure
that the primary file system has a minimum of 4GB of free space
multiplied by the number of nodes mounting the file system. This should
be done prior to executing the acfsutil repl init command; otherwise,
this command will fail.
ACFS replication also requires that the
compatible.asm and compatible.advm attributes for the disk group
containing the ACFS file system are set to a minimum of 11.2.0.2.0 for
Linux (or 11.2.0.3 on Windows) on both the primary and standby nodes. If
this was not done in the earlier steps (for enabling tagging or other
features), then it can be done now with the sqlplus command, as
illustrated here:
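A minimal sketch, again assuming a disk group named DATA; run this against the ASM instances on both the primary and standby sites (use 11.2.0.3 on Windows):

[oracle@node1 ~]$ sqlplus / as sysasm
SQL> ALTER DISKGROUP data SET ATTRIBUTE 'compatible.asm'  = '11.2.0.2.0';
SQL> ALTER DISKGROUP data SET ATTRIBUTE 'compatible.advm' = '11.2.0.2.0';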
Admin User Setup
In most cases, the SYS user in an ASM
instance can be used as the ACFS replication administrator, in which
case the SYS user will need to be granted the SYSDBA privilege (on the
ASM instance). If there is a need to have separate roles for replication
management (replication admin) and daily ASM management, then a
separate ASM user can be set up. This user must be granted SYSASM and
SYSDBA privileges. The following example shows how to set up a
replication admin user with a user ID of admin and a password of admin1.
If an ASM password file does not exist, you
should create the password file for ASM on all nodes (primary/standby
and secondary nodes with multinode clusters), as follows:
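A sketch, assuming the local ASM instance is named +ASM1 and that the command is run from the Grid Infrastructure home (the password file name follows the orapw<SID> convention):

[oracle@node1 ~]$ orapwd file=$ORACLE_HOME/dbs/orapw+ASM1 password=oracle_4U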
NOTE
Please use a password appropriate for your installation.
Next, create the ASM user on the primary node and assign the appropriate roles:
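A sketch of the SQL, run against the ASM instance on node1 (the admin/admin1 credentials are the ones chosen for this example):

[oracle@node1 ~]$ sqlplus / as sysasm
SQL> CREATE USER admin IDENTIFIED BY admin1;
SQL> GRANT sysasm TO admin;
SQL> GRANT sysdba TO admin;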
Then create the ASM user on the standby node and assign the appropriate roles:
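The same statements are then repeated against the ASM instance on the standby node (node2 in this example):

[oracle@node2 ~]$ sqlplus / as sysasm
SQL> CREATE USER admin IDENTIFIED BY admin1;
SQL> GRANT sysasm TO admin;
SQL> GRANT sysdba TO admin;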
Finally, review changes to the password file by querying v$pwfile_users:
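For example, from either ASM instance:

SQL> SELECT * FROM v$pwfile_users;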
Hereafter, the ACFS administrator role “admin” will refer to the role that manages ACFS file system replication.
File System Setup
Before initiating replication, the ACFS
admin must ensure that the primary file system is mounted and the
standby file system is only mounted on one node (in cluster
configurations).
It is recommended that users have the same
file system name for the standby and primary file systems. Also, if
you're replicating the entire file system (that is, not using ACFS
tagging), ensure that the standby file system is created with a size
equal to or larger than that of the primary file system.
Also, you should ensure that sufficient disk
space is available on both the primary and the standby file systems for
storing the replication logs. The “Pause and Resume Replication” section
later in this chapter covers file system sizing details when
replication is used. It is recommended that ACFS administrators monitor
and prevent both the primary file system and the standby file system
from running out of space. Enterprise Manager (EM) can be used for this
monitoring and for sending alerts when the file system becomes more
than 70 percent full.
In 11.2.0.3, the auto-terminate safeguard
functionality was introduced to prevent the primary file system from
running out of space. If 2GB or less of free space is available, ACFS
will terminate replication on the node. Auto-terminate prevents further
consumption of disk space for replication operations and frees disk
space consumed by any replication logs that remain. Before reaching the
2GB limit, ACFS writes warnings about the free space problem in the
Oracle Grid Infrastructure home alert log. Note that relying on the
auto-terminate feature exposes the administrator to losing the ability to
use the standby if the primary fails while it is running near full
capacity. We advise that this feature be used with extreme
caution.
If the primary file system runs out of space,
the applications using that file system may fail because ACFS cannot
create a new replication log. If the standby file system runs out of
space, it cannot accept new replication logs from the primary node;
therefore, changes cannot be applied to the standby file system, which
causes replication logs to accumulate on the primary file system as
well. In cases where the ACFS file system space becomes depleted, ACFS
administrators can expand the file system, remove unneeded ACFS
snapshots, or remove files to reclaim space (although the latter option
is not recommended). If the primary file system runs out of space and
the ACFS administrator intends to remove files to free up space, then
only files that are not currently being replicated (such as when ACFS
tagging is used) should be removed because the removal of a file that is
replicated will itself be captured in a replication log.
Network Setup
Two steps are needed to configure the network for ACFS replication:
1. Generate the appropriate Oracle
Network files. These files provide communication between the ASM
instances and ACFS replication.
2. Set the appropriate network
parameters for network transmission. Because ACFS replication is heavily
tied to network bandwidth, the appropriate settings need to be
configured.
Generating the Oracle Network Files
ACFS replication utilizes Oracle Net
Services for transmitting replication logs between primary and standby
nodes. The principal Oracle Net configuration is a file called
tnsnames.ora, and it resides at $ORACLE_HOME/network/admin/tnsnames.ora.
This file can be edited manually or through a configuration assistant
called netca in the Grid Home. The tnsnames.ora file must be updated on
each of the nodes participating in ACFS replication. The purpose of a
tnsnames.ora file is to provide the Oracle environment the definition of
a remote endpoint used during replication. Accordingly, there are
tnsnames.ora files on both the primary and standby nodes.
Once the file systems are created, use
$ORACLE_HOME/bin/netca (from Grid Home) to create connect strings and
network aliases for the primary/standby sites. Figures 11-1 and 11-2 illustrate the usage of NETCA to create Net Services for ACFS Replication.
On netca exit, the following message should be displayed if the services were set up correctly:
In our example, we created a PRIMARY_DATA
service and STANDBY_DATA service for the primary file system and standby
file system, respectively. The tnsnames.ora file used on the primary
node is as follows:
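(A sketch of such an entry; the alias, hostname, port, and service name shown here match the values described next and should be adjusted to your environment.)

STANDBY =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = node2)(PORT = 1521))
    (CONNECT_DATA =
      (SERVICE_NAME = acfs_fs)
    )
  )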
The important elements are the alias name
(STANDBY), the hostname (node2), the default port (1521), and the
service name (acfs_fs). This tnsnames.ora file defines the remote
endpoint for replication (in this case, the standby endpoint as seen
from node1, the primary node). The standby node, in turn, requires a
tnsnames.ora file that defines the primary endpoint. It contains the following:
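(Again a sketch; the PRIMARY alias name is assumed here for symmetry with the entry above.)

PRIMARY =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = node1)(PORT = 1521))
    (CONNECT_DATA =
      (SERVICE_NAME = acfs_fs)
    )
  )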
Notice the symmetry between the two tnsnames.ora files. For the sake of simplicity, the service names are the same.
Setting the Network Tunables
Successful replication deployment requires
network efficiency and sufficient bandwidth; therefore, the appropriate
network tuning must be performed. For ACFS replication, first determine
if Data Guard (DG) is already configured on the hosts. If DG is set up
appropriately with the appropriate network tunable parameters, then ACFS
replication can leverage the same settings. If DG is not enabled, use
the Data Guard best practices guide for network setup. The following
document describes these best practices (see the “Redo Transport Best
Practices” section of this paper):
Validating Network Configuration
Use the tnsping utility and SQL*Plus to test
and ensure that the tnsnames.ora files are set up correctly and basic
connectivity exists between both sites.
Execute the following to test connectivity from the primary node:
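A sketch, using the STANDBY alias from the tnsnames.ora sketch above and the admin replication user created earlier:

[oracle@node1 ~]$ tnsping STANDBY
[oracle@node1 ~]$ sqlplus admin/admin1@STANDBY as sysasm
SQL> SELECT instance_name FROM v$instance;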
Execute the following to test connectivity from the standby node:
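And the equivalent check in the other direction (same assumptions):

[oracle@node2 ~]$ tnsping PRIMARY
[oracle@node2 ~]$ sqlplus admin/admin1@PRIMARY as sysasm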
Replication Configuration and Initiation
Before proceeding, ensure for good measure that the file systems are mounted on each node. Replication is then initiated, first on the standby node and then on the primary node, and validated, as described in the following sections.
Initializing the Standby File System
Replication is first initiated on the
standby node, followed by initiation on the primary. Replication on the
standby is initiated using the /sbin/acfsutil command by the root user,
like so:
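A sketch of the command; the admin/admin1 credentials, the PRIMARY net alias, and the /acfs mount point follow this chapter's examples:

[root@node2 ~]# /sbin/acfsutil repl init standby -p admin/admin1@PRIMARY /acfs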
NOTE
If this command is interrupted for any
reason, the user must re-create the standby file system, mount it only
on one node of the site hosting the standby file system, and then rerun
the command.
This command uses the following configuration information:
The -p option indicates the username connection to the primary file system site as well as the service name to be used to connect as ASMADMIN on the primary file system node.
The file system listed is the standby file system (/acfs).
If the standby site is using a different service name than the primary file system site, the -c service_name option is required. (This option is not shown in the example.)
Now you need to verify that the standby file system is initiated:
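For example, using the acfsutil repl info command on the standby node (a sketch):

[root@node2 ~]# /sbin/acfsutil repl info -c /acfs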
Initializing the Primary File System
Once the standby node has been enabled, the
ACFS admin can initialize replication on the primary file system by
running the acfsutil repl init primary command:
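A sketch, again using this chapter's names and run as root on the primary node:

[root@node1 ~]# /sbin/acfsutil repl init primary -s admin/admin1@STANDBY /acfs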
This command allows for the following configuration information:
The -s option, followed by the connect string used to connect as ASMADMIN on the standby node.
The ACFS file system that is to be replicated.
The mount point on the standby node (-m mountp).
This is optional and not shown in the example. If not specified, it is
assumed that this mount point path is the same on the standby node as it
is on the primary file system node.
The -c option, which is used to indicate the primary service name. Again, this is optional and not shown in the example.
If tagging was enabled for this directory, then the tag name “reptag” can be added in the initialization command, as follows:
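For example (a sketch; the tag name is supplied in addition to the options already shown, and the exact argument order should be confirmed against the acfsutil help output for your release):

[root@node1 ~]# /sbin/acfsutil repl init primary -s admin/admin1@STANDBY reptag /acfs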
Next, you need to verify that the primary file system is initiated:
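For example, the same acfsutil repl info check can be run on the primary node (a sketch):

[root@node1 ~]# /sbin/acfsutil repl info -c /acfs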
Once the acfsutil repl init primary command
completes successfully, replication will begin transferring copies of
all specified files to the standby file system.
The replication happens in two phases: The
initial phase copies just the directory tree structure, and the second
phase copies the individual files. During this second phase, all updates
or truncates to replicated files are blocked. Once a file is completely
copied to the standby file system, replication logging for that
particular file is enabled. All changes to copied files are logged,
transported, and applied to the standby file system.
Next, you need to validate replication instantiation:
The rate of data change on the primary file system can be monitored using the acfsutil info fs -s command,
where the -s flag indicates the sample rate in seconds.
The amount of change includes all user and metadata modifications to
the file system. The following example illustrates its usage:
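For instance, to sample the rate of change on the /acfs file system every 60 seconds (a sketch; the reported output will vary with the workload):

[oracle@node1 ~]$ /sbin/acfsutil info fs -s 60 /acfs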
This “amount” value approximates the size of
replication logs generated when capturing changes to the file system.
This command is useful for approximating the extra space required for
storing replication logs in cases of planned or unplanned outages.
Pause and Resume Replication
The acfsutil repl pause command is used in
instances when replication needs to be temporarily halted, such as for
planned downtime on either the primary or standby site. The ACFS pause
command can be issued on either the primary or a standby file system
node. However, there is a difference in behavior between the two
scenarios.
When the pause command is issued on the standby node, replication logs
continue to be generated on the primary and propagated to the standby
file system, but these rlogs are not applied to the standby file system;
in other words, the transfer of rlogs from the primary node is not
suspended, only their application is deferred.
Consequently, rlogs will continue to accumulate at the node hosting the
standby file system. As noted earlier, replication logs are deleted on
the primary and standby sites only after they are successfully applied
to the file system on the standby node; therefore, care should be taken
to ensure that this does not cause the primary and standby file systems
to run out of space.
When the pause command is issued on the primary node, replication logs
are still generated but are not propagated to the standby; in other
words, rlogs are generated but their propagation is suspended. In this
scenario, rlogs will continue to accumulate at the primary file system,
which may cause the primary file system to run out of space. Thus, when
replication is paused on the standby, it is possible to run out of space
on both the primary and the standby, whereas pausing on the primary can
only cause issues for the primary. This distinction is relevant when a
standby system is the destination for multiple file systems.
In both cases, ACFS administrators should run
the acfsutil repl resume command at the earliest point possible, before
the accumulated replication logs fill the file system. Note that the
resume command should be executed at the same location where replication
was paused.
In cases where there is a planned outage and
the standby and primary file systems have to be unmounted, it is best to
ensure that all the changes are propagated and applied on the standby
file system. The acfsutil repl sync command is used for this purpose. It
is used to synchronize the state of the primary and standby file
systems, and it implicitly causes all outstanding replication data to be
transferred to the standby file system. The acfsutil repl sync command
returns success when this transfer is complete or when all these changes
have been successfully applied to the standby file system, if the apply
parameter is supplied. This command can only be run on the node hosting
the primary file system.
For unplanned outages, if the cluster (or
node) hosting the primary file system fails, the administrator of the
standby file system should decide whether or not the situation is a
disaster. If it is not a disaster, then when the primary site recovers,
replication will automatically restart. If it is a disaster, you should
issue an acfsutil repl terminate standby command on the standby file
system to convert it into a primary. If replication needs to be
reinstantiated, then once the original primary is restarted, replication
initialization will need to be performed again.
If the node hosting the standby file system
fails, a major concern is the amount of update activity that occurs on
the primary file system relative to the amount of free space allocated
to address standby file system outages. If the primary file system's
free space is exhausted because updates cannot be transferred to
the standby file system, a “file system out of space” condition will
occur and space will need to be made available—for example, by removing
items no longer needed (particularly snapshots), performing a file
system resize to add space, and so on. However, assuming the standby
comes back, then as soon as primary file system space is available,
replication will continue. During this interval, where no space is
available, the file system will return errors in response to update
requests. If the standby file system is going to be down for a long
period of time, it is recommended that the primary file system be
unmounted to avoid update activity on the file system that could result
in an out-of-space condition. When the standby file system becomes
available, the primary file system could be remounted and replication
will restart automatically. Alternatively, the primary file system admin
could elect to terminate and reinstantiate once the site hosting the
standby file system is recovered.
Sizing ACFS File Systems
To size the primary and standby file systems
appropriately for these planned and unplanned outages, you can use the
acfsutil info fs command, described earlier, as a guide to determine the
rate of replication log creation. First, determine the approximate time
interval when the primary file system is unable to send replication
logs to the standby file system at its usual rate or when standby file
systems are inaccessible while undergoing maintenance. Although it is
not easy to determine how long an unplanned outage will last, this exercise
helps in determining the overall impact when an unplanned outage occurs.
As an aid, run acfsutil info fs -s 1200 on the
primary file system to collect the average rate of change over a
24-hour period with a 20-minute interval:
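For example (a sketch using this chapter's /acfs mount point; let the command run for the full 24-hour observation window):

[oracle@node1 ~]$ /sbin/acfsutil info fs -s 1200 /acfs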
The output from this command helps determine
the average rate of change, the peak rate of change, and how long the
peaks last. Note that this command only collects data on the node it is
executed on. For clustered configurations, run the command and collect
data for all nodes in the cluster.
In the following scenario, assume that t = 60
minutes is the time interval that would adequately account for network
problems or maintenance on the site hosting the standby file system. The
following formula approximates the extra storage capacity needed for an
outage of 60 minutes:
N = Number of cluster nodes in the primary site generating rlogs
pt = Peak amount of change generated across all nodes for time t
t = 60 minutes
Therefore, the extra storage capacity needed to hold the replication logs is (N * 1GB) + pt.
In this use-case example, assume a four-node
cluster on the primary where all four are generating replication logs.
Also, during peak workload intervals, the total amount of change
reported for 60 minutes is approximately 6GB for all nodes. Using the
preceding storage capacity formula, 10GB of excess storage capacity on
the site hosting the primary file system is required for the replication
logs, or (4 * 1GB) + 6GB = 10GB.
ACFS Compare Command
In certain situations, users may want to
compare the contents of the primary and standby file systems. The
acfsutil repl compare command can be used to compare the entire ACFS
file system or a subset of files (such as tagged files).
The acfsutil repl compare command requires
that the standby file system be mounted locally for comparison. This can
be accomplished by NFS mounting the standby file system onto the
primary. As with any compare operation, it is recommended that the primary
has limited or no file changes occurring during the comparison.
The acfsutil repl compare command with the -a
option can be used to compare the entire contents of the primary file
system against those on the standby file system. The -a option also
tests for extra files on the standby file system that do not currently
exist on the primary.
The -a option is typically used when no tag
names were specified during the acfsutil repl init operation. When only
tagged files need to be compared, the -t option can be used. Users can
even compare multiple sets of tagged files by listing comma-separated
tag names. This option first locates all filenames on the primary file
system with the specified tag names and compares them to the
corresponding files on the standby. The -t option also tests for extra
files on the standby file system that do not have an associated tag name
specified during the acfsutil repl init operation. The acfsutil repl
info -c option can be used to determine what tags were specified during
the acfsutil repl init operation. If neither the -a nor -t option is
provided, a primary-to-standby file comparison is done without testing
tag names or extended attributes.
The following shows a sample execution of acfsutil repl compare:
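A sketch, assuming the standby file system has been NFS-mounted on the primary node at the hypothetical path /standby_acfs:

[root@node1 ~]# /sbin/acfsutil repl compare -a /acfs /standby_acfs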
Termination of Replication
The acfsutil repl terminate command is used
to abort the ongoing replication. The terminate command operates on a
specific file system. A graceful termination can be achieved by
terminating the replication first on the primary followed by the standby
node. A graceful termination allows for the standby to apply all
outstanding logs.
The following command terminates replication on the primary node:
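For example, using this chapter's /acfs mount point:

[root@node1 ~]# /sbin/acfsutil repl terminate primary /acfs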
Next, terminate replication on the standby node:
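For example:

[root@node2 ~]# /sbin/acfsutil repl terminate standby /acfs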
After the standby is terminated, the file system is automatically converted to writable mode.
Once file system replication termination has
completed for a specific file system, no replication infrastructure
exists between the primary and standby file systems. The termination of
replication is a permanent operation and requires a full
reinitialization to instantiate again. To restart replication, use the
acfsutil repl init command, as previously illustrated.
ACFS Security and Encryption
As discussed in Chapter 10,
ACFS provides standard POSIX file system support; however, ACFS also
provides other file system services, such as security, encryption,
tagging, snapshots, and replication. In this section we cover ACFS
Security and ACFS Encryption. ACFS Security and Encryption—along with
Oracle Database Vault and Oracle Advanced Security Option (ASO)—provide
a comprehensive security solution for unstructured data residing
outside the database and database-resident structured data,
respectively. Note that ACFS Security and Encryption are not part of the
ASO license; they must be licensed separately via the Cloud Edition.
There are two aspects of security on a file
system. One is the restriction of logical access to the data, such as
obtaining file information or file location. The other is preventing
physical access to the data, such as opening, reading, or writing to
data (files). The former is handled by ACFS Security and the latter by
ACFS Encryption.
Databases and Security
Databases generally have peripheral data
(data that lives outside the database, but has direct ties to the data
within the database), such as medical reports and images, text files,
contracts, metadata, and other unstructured data. This data needs to be
kept secure and must meet regulatory compliance requirements (for example, SOX,
HIPAA, PCI, or PII).
ACFS supports the Unix “user, group, others”
model and supports Access Control Lists (ACLs) on Windows. These
constructs are based on the Discretionary Access Control (DAC) model. In
the DAC model, controls are discretionary in the sense that a subject
with certain access permission is capable of passing that permission
(perhaps indirectly) on to any other subject. In the case of a file
system, the owner of a file can pass the privileges to anybody.
Besides some of the issues in DAC, such as
transfer of ownership, a major concern is that the root user or
administrator will bypass all user security and have the privileges to
access or modify anything on the file system. For databases where the
DBA (database administrator) has more privileges than required to
perform his duties, Oracle addresses this problem with a security
product called Oracle Database Vault, which helps users address such
security problems as protecting against insider threats, meeting
regulatory compliance requirements, and enforcing separation of duty. It
provides a number of flexible features that can be used to apply
fine-grained access control to the customer’s sensitive data. It
enforces industry-standard best practices in terms of separating duties
from traditionally powerful users. It protects data from privileged
users but still allows them to maintain Oracle databases.
The goal of Oracle Database Vault, however, is
limited to Oracle databases. Today, customers need the same kind of
fine-grained access control to data outside the database (such as Oracle
binaries, archive logs, redo logs, and application files such as Oracle
Apps). ACFS Security fills this gap with a similar paradigm, in which
realms, rules, rule sets, and command rules (described later) provide
fine-grained access to data.
ACFS Security
ACFS Security provides finer-grained access
policy definition and enforcement than allowed by an OS-provided access
control mechanism alone. Another goal of ACFS Security is to provide a
means to restrict users’ ability to pass privileges of the files they
own to other users if they are not consistent with the global policies
set within an organization. Lastly, ACFS Security follows the principle
of least privilege in the facilities it provides for the definition and
administration of security policies.
ACFS Security uses realms, rules, rule sets, and command rules for the definition and enforcement of security policies:
Realm An
ACFS realm is a functional grouping of file system objects that must be
secured for access by a user or a group of users. File system objects
can be files or directories. By having these objects grouped in the form
of a realm, ACFS Security can provide fine-grained access control to
the data stored in ACFS. For realm protection to take effect, objects
must be added to a realm. Objects can be added to more than one realm.
The definition of a realm also includes a list of users and groups. Only
users who are part of the realm, directly or indirectly via the groups,
can access the objects within the realm, and only if the realm's rules
are satisfied.
Rule A
rule is a Boolean expression that evaluates to TRUE or FALSE based on
some system parameter on which the rule is based. An option of ALLOW or
DENY can be associated with each rule. Rules can be shared among
multiple rule sets. For example, a “5–9PM” rule evaluates to TRUE if the
system time is between 5 p.m. and 9 p.m. when the rule is evaluated.
ACFS Security supports four types of rules:
Time Evaluates
to TRUE or FALSE based on whether the current system time falls between
the start time and end time specified as part of rule definition.
User Evaluates to TRUE or FALSE based on the user executing the operation.
Application Evaluates to TRUE or FALSE based on the application that is accessing the file system object.
Hostname Evaluates
to TRUE or FALSE based on the hostname accessing the file system
object. The hostname specified must be a cluster member and not a client
host accessing an ACFS file system via NFS, for example.
Rule set A
rule set is a collection of rules that evaluates to “allow” or “deny”
based on the assessment of its constituent rules. Rule sets can be
configured to evaluate to “allow” if all constituent rules evaluate to
TRUE with the option “allow” or if at least one rule evaluates to TRUE
with the option “allow” depending on the rule set options.
Command rule Oracle
ACFS command rules are associations of the file system operation with a
rule set. For example, the association of a file system create, delete,
or rename operation with a rule set makes a command rule. Command rules
are associated with a realm.
ACFS Security Administrator
In accordance with the principle of least
privilege, ACFS Security mandates that security policy definition and
management be the duty of a user with a well-defined security
administrator role and not a user with the system administrator role. To
this end, as part of initializing ACFS Security, the system
administrator is required to designate an OS user as an ACFS security administrator.
A temporary password is set for this security administrator, and it
should be changed immediately to keep the security administrator’s role
secure. This security administrator can then designate additional users
as security administrators using the acfsutil sec admin add command.
Only a security administrator can designate or remove another user as a
security administrator. There is always at least one security
administrator once ACFS Security has been initialized, and the last
security administrator cannot be removed using the acfsutil sec admin
remove command.
The security administrator creates and manages
security policies using realms, rules, rule sets, and command rules.
For any administrative tasks, the security administrator must
authenticate himself using a password that is different from his OS
account password. Each security administrator has a unique password,
which can be changed only by that security administrator. These
passwords are managed by ACFS Security infrastructure and are kept in a
secure Oracle Wallet stored in the Oracle Cluster Repository (OCR).
Security administrators are allowed to browse any part of the
file system tree. This allows them to list and choose files and
directories to be realm-secured. No security administrator, however, is
allowed to read the contents of any files without appropriate OS and realm permissions.
Enabling and Disabling ACFS Security
ACFS Security can be enabled or disabled on a
file system by running the acfsutil sec enable and acfsutil sec disable
commands, respectively. Disabling ACFS Security on a file system
preserves all the security policies defined for that file system, but
disables their enforcement, which implies that access to files and
directories on that file system is arbitrated only through the OS
mechanism. To re-enable enforcement, the ACFS security administrator can
run the acfsutil sec enable command. Security can be enabled and
disabled at the file system or realm level. By default, ACFS Security is
enabled on a file system when it is prepared for security. A newly
created realm can have security enabled or disabled via a command-line
option (the default is enabled). Disabling ACFS Security at the file
system level disables enforcement via all realms defined for that file
system. Enable and disable capability can be useful when security
policies are not completely defined and the security administrator
wishes to experiment with some policies before finalizing them.
Configuring ACFS Security
ACFS Security is supported only for ASM 11g Release 2, and the disk group compatibility attributes for ASM and ADVM must be set to 11.2.0.x, where x represents the version of the ASM installed.
ACFS file systems can be configured to use
ACFS Security via the acfsutil sec commands or the ASMCA utility. ACFS
Security must be initialized before any file systems can be
configured to use it. This is done using the acfsutil sec init command,
which needs to be run only once for the entire cluster. As part of the
acfsutil sec init command, an OS user is designated to be the first
security administrator. It is recommended that this OS user be distinct
from the DBA user. This user must also be in an existing OS group
designated as the Security Administrator group. Additional users can be
designated as security administrators. All security administrators,
however, must be members of the designated security OS group. Moreover,
initializing ACFS Security also creates the storage necessary to house
the security administrator’s security credentials.
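For example, a sketch using the orasec user and group created in the implementation example later in this chapter:

[root@node1 ~]# /sbin/acfsutil sec init -u orasec -g orasec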
Once ACFS Security has been initialized for the cluster, the security administrator can prepare
a file system to use it by running the acfsutil sec prepare command.
This step is a prerequisite for defining security policies for the file
system. The acfsutil sec prepare command performs the following actions:
It initializes ACFS Security metadata for the file system.
It enables ACFS Security on the file system.
It creates the following directories in the file system that is being prepared:
.Security
.Security/backup
.Security/logs
It builds the following system security realms:
SYSTEM_Logs Protects ACFS Security log files in the .Security/realm/logs/ directory.
SYSTEM_SecurityMetadata Protects the ACFS Security metadata XML file in the .Security/backup/ directory.
On
the Windows platform, the SYSTEM_Antivirus realm is created to provide
installed antivirus software programs access to run against the ACFS
file system. The SYSTEM_Antivirus realm can only perform the OPEN, READ,
READDIR, and setting time attribute operations on a file or directory.
Generally, antivirus software programs inoculate and delete infected
files. For the antivirus software programs to perform these actions
successfully, the ACFS security will need to be disabled. For every
realm-protected file or directory, the SYSTEM_Antivirus realm is
evaluated when authorization checks are performed to determine if the
SYSTEM_Antivirus realm allows access to the file or directory. To allow
the antivirus process to access realm-protected files or directories,
you must add the LocalSystem or SYSTEM group to the realm with the
acfsutil sec realm add command. If antivirus processes are running as
administrator, then the user administrator must be added to the
SYSTEM_Antivirus realm to allow access to realm-protected files and
directories. If no antivirus products have been installed, do not add
any users or groups to the SYSTEM_Antivirus realm. Because users or
groups added to the SYSTEM_Antivirus realm have READ and READDIR access,
you should limit the users or groups added to this realm. ACFS
administrators can restrict the time window when the users or groups of
this realm can access the realm-protected files or directories with
time-based rules. Additionally, ACFS administrators can also have
application-based rules if they can identify the process name for the
antivirus installation that scans the files.
Once a file system has been prepared for
security, the security administrator can start defining security
policies for the data in the file system by considering the following:
What data needs to be protected? Files to be protected must be added to one or more realms.
Who has access to data? Users intended to be allowed access to files in the realm must be added to the realm.
What
actions are the users allowed or not allowed to take on data? Command
rules (in conjunction with rule sets) define these actions.
Under what conditions can the data be accessed? Rules and rule sets define these criteria.
Access to files in a realm of an ACFS file
system must be authorized by both the realm and the underlying OS
permissions (that is, the standard “owner, group, other” permissions on
typical Unix/Linux platforms or Access Control Lists (ACLs) on Windows).
Accessing a file that has security enabled involves tiered validation.
First, access is checked against all realms that the file is a part of.
If even a single realm denies access, the overall operation is not allowed.
If realm authorization allows access, then OS permissions are checked;
only if those also authorize the access is the overall operation allowed.
ACFS Security and Encryption Logging
Auditing is a key aspect of any security
configuration, and ACFS Security is no exception. Auditing and
diagnostic data are logged for ACFS Security and Encryption. These log
files include information such as the execution of acfsutil commands,
use of security or system administrator privileges, run-time realm-check
authorization failures, setting of encryption parameters, rekey
operations, and so on. Logs are written to the following log files:
mount_point/.Security/realm/logs/sec-host_name.log This
file is created during the acfsutil sec prepare command and is itself
protected by ACFS Security using the SYSTEM_Logs realm.
$GRID_HOME/log/host_name/acfssec/acfssec.log This
file contains messages for commands that are not associated with a
specific file system, such as acfsutil sec init. The directory is
created during installation and is owned by the root user.
When an active log file grows to a predefined maximum size (10MB), the file is automatically moved to log_file_name.bak,
the administrator is notified, and logging continues to the regular log
file name. When the administrator is notified, he must archive and
remove the log_file_name.bak file. If an active log file grows to the maximum size and the log_file_name.bak
file exists, logging stops until the backup file is removed. After the
backup log file is removed, logging restarts automatically.
Databases and Encryption
Although mechanisms natively built into the
Oracle Database and those provided by Oracle Database Vault can control
access to data stored in the database, this data is not protected from
direct access via physical storage. A number of third-party tools can be
used to provide read and write access to data stored on secondary
storage, thus circumventing protection provided by the database and the
OS. Furthermore, these database and OS protection mechanisms do not
protect against data loss or theft. For example, storage can be
re-attached to a completely different system from the one it was
intended for. Features in Oracle Database's Advanced Security Option
(ASO) provide protection against such scenarios. Transparent Data
Encryption (TDE) provides the capability to encrypt data at the
column and tablespace levels, serving the data protection and compliance
needs of customers.
ACFS and File Encryption
The same threats as those mentioned
previously for database data exist for file system data too. ACFS
Encryption protects from these threats by encrypting data stored on a
secondary device, or data at rest. It should be noted that ACFS Encryption protects user data and not file system metadata.
Keeping data at rest encrypted renders the data useless without
encryption keys in case of theft of the physical storage on which the
data resides.
Encryption can be applied to individual files,
directories, or an entire ACFS file system. Furthermore, both encrypted
and nonencrypted files can exist in the same ACFS file system.
Applications need no modification to continue to work seamlessly with
encrypted files. Data is automatically encrypted when it is written to
disk and automatically decrypted when accessed by the application. It
should be noted that encryption is used for protecting stored data. It
does not provide access control or protection against malicious access,
both of which fall under the purview of ACFS Security and OS-level
access control mechanisms. Thus, a user authorized to read a file would
always get plain-text data.
Figure 11-3 shows the application and internal view of ACFS Encryption.
ACFS Encryption imposes no penalty on cached
reads and writes because the data in the OS page cache is in plain text.
Data is encrypted when it is flushed to disk and decrypted when read
from disk into the OS page cache for the first time.
ACFS Encryption Key Management
ACFS Encryption requires minimal key
management tasks to be performed by the administrator. Keys are
transparently created and securely stored with minimal user
intervention. ACFS Encryption uses two-level keys to minimize the amount
of data that is encrypted with a single key. A file encryption key
(FEK) is a per-file unique key. A file’s data is encrypted using the
FEK. A volume encryption key (VEK), a per-file system key, serves as a wrapping key, and each FEK is stored on disk encrypted using the VEK. Figure 11-4 shows the relationship between the two types of keys.
The encryption keys are never stored on disk
or in memory in plain text. The keys are either obfuscated or encrypted
using a user-supplied password. ACFS Encryption supports the Advanced
Encryption Standard (AES), which is a symmetric cipher algorithm,
defined in Federal Information Processing Standard (FIPS) 197. AES
provides three approved key lengths: 256, 192, and 128 bits. The key
length can be specified when you are configuring ACFS Encryption for a
file system.
ACFS Encryption supports the “rekey” operation
for both VEKs and FEKs. The rekey operation generates a new key and
reencrypts the data with this new key. For FEKs, the data to encrypt is
the user data residing in files and for VEKs the data is the FEKs.
ACFS Encryption Configuration and Use
Before using ACFS Encryption, the system
administrator needs to create storage for encryption keys using the
acfsutil encr init command. This command needs to be run once per
cluster, and it must be run before any other encryption commands. ACFS
Encryption provides an option to create password-protected storage for
encryption keys. Creating password-protected storage requires that the
password be supplied whenever an operation is going to read or modify
the encryption key store. The three operations that read or modify the
encryption key store are acfsutil encr set, acfsutil encr rekey –v, and
mount.
Once ACFS Encryption has been initialized, a
file system can be configured to use it via the acfsutil encr set
command. This command sets or changes encryption parameters, algorithm,
and key length for a file system. Once this command has been run on a
file system, individual files and directories or the entire file system
can be encrypted using the acfsutil encr on command.
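A minimal sketch of this sequence, assuming the /acfs file system and an AES key length of 128 bits (the algorithm and key length are set with the -a and -k options):

[root@node1 ~]# /sbin/acfsutil encr init
[root@node1 ~]# /sbin/acfsutil encr set -a AES -k 128 /acfs
[root@node1 ~]# /sbin/acfsutil encr on /acfs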
Certain ACFS Encryption command usage and
functionality requires system administrator privileges. This
functionality includes the commands for initiating, setting, and
reconfiguring ACFS Encryption. System administrators and ACFS security
administrators can initiate ACFS encryption operations; however,
unprivileged users can initiate encryption for files they own. An ACFS
security administrator can manage encryption parameters on a per-realm
basis. After a file is placed under realm security, file-level
encryption operations are not allowed on that file. Even if ACFS
Security allows the file owner or the root user to open the file,
file-level encryption operations are blocked. Encryption of
realm-protected files is managed exclusively by the ACFS
security administrator, who can enable and disable encryption for files
at a security realm level. After a directory has been added to a
security realm, all files created in the directory inherit the
realm-level encryption parameters. When a file is removed from its last
security realm, the file is encrypted or decrypted to match the file
system-level encryption status. The file is not re-encrypted to match
file system-level parameters if it was already encrypted with security
realm parameters.
A system administrator cannot rekey
realm-secured files at the file system or file level. To ensure all
realm-secured files are encrypted with the most recent VEK, you must
first remove encryption from all realms, and then re-enable encryption.
This action re-encrypts all files with the most recent VEK.
Encryption information for Oracle ACFS file
systems is displayed in the V$ASM_ACFS_ENCRYPTION_INFO (or
GV$ASM_ACFS_ENCRYPTION_INFO) view or by using the acfsutil sec info and
acfsutil encr info commands.
ACFS Snapshots, Security, and Encryption
Users cannot modify security or encryption
metadata in read-only snapshots. That is, security policies cannot be
modified or created and files cannot be encrypted, decrypted, or rekeyed
in a read-only snapshot. Files in a snapshot, however, preserve their
security and encryption statuses as they existed at the time of
snapshot creation. Changing the encryption or security status of a file
in the live file system does not change its status in the snapshot,
whether read-only or read-write. Therefore, if a file was not secured by
a realm in the snapshot, it cannot be realm-secured by adding the
corresponding file in the active file system to a security realm. If a
file was not encrypted in the snapshot, that file cannot be encrypted by
encrypting the corresponding file in the active file system. Therefore,
unprotected files in snapshots present another potential source of data
for malicious users. When applying security and encryption policies, an
administrator should be aware of these potential backdoors to
unprotected data. To that end, when certain encryption operations such
as enabling of file system–level encryption and rekey are attempted, a
warning is printed to let the administrator know that these will not
affect the files in the snapshot(s). To ensure no unprotected copies of
data are available for misuse, administrators should confirm that no
snapshots exist when security and encryption policies are applied.
Because read-write snapshots allow changes to
files in the snapshot, the encryption status of files can also be
changed. They can be encrypted, decrypted, or rekeyed by specifying as
the target a path in a read-write snapshot. An encryption, decryption,
or rekey operation specified at the file system level, however, does not
process files and directories of snapshots, read-only or read-write, in
the .ACFS/snaps directory. For the purpose of these operations, the
file system boundary includes only the live file system and not its
snapshots. To do these operations on read-write snapshot files, the
administrator can specify as the target a path in the read-write
snapshot.
In the 11.2.0.3 release, changing or creating
security policies in a read-write snapshot is not yet supported.
Furthermore, files and directories in a read-write snapshot cannot be added to or removed from security realms. A new file created in a realm-secured directory in a
read-write snapshot, however, inherits the realm security attributes of
the parent directory. If the realm protecting the new file has
encryption turned on, the file is encrypted with the encryption
parameters set in the realm. If the realm protecting the new file has
encryption turned off, the file is decrypted.
ACFS Security and Encryption Implementation
This section describes the steps to
implement ACFS Security and Encryption. These steps can be performed by
using the command line or the ASMCA utility. A mixture of both will be
shown for simplicity.
Here’s the use-case scenario: Company ABC
provides escrow services to buyers and sellers. As part of the value-add
services, ABC provides access to an information library. This library,
which is managed and maintained by ABC, stores historical content such
as preliminary reports, pro forma, and escrow final reports. ABC loads
all these reports from the front-end servers to the ACFS file system
that runs on a RAC cluster, with one directory per escrow. ABC now wants
to encrypt and secure all the escrow content.
In this example, it is assumed that the RAC
cluster is built, the ACFS file system is created with the appropriate
directories, and the content is loaded. Here are the steps to follow:
1. Create or identify the OS user
who will be the ACFS security administrator for the cluster. In our use
case, we will create a user named orasec in the orasec group.
2. Launch ASMCA to initialize ACFS Security and Encryption (see Figure 11-5).
Because enabling ACFS Security can only
be done by a privileged user, the Show Command button will display the
command to be issued by root. The user will be prompted to enter a new password for the ACFS security administrator, which must be at least eight
characters long. Note that this password is not the login password for
the OS user but rather the password for the ACFS security administrator.
Here’s the command to configure ACFS Security:
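A representative invocation (run as root), assuming the orasec user and group created in step 1:
acfsutil sec init -u orasec -g orasec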
And here’s the command to configure ACFS Encryption:
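A representative invocation (also run as root):
acfsutil encr init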
3. Verify that orasec has the
appropriate authorization to display ACFS Security information. Execute
the acfsutil sec info commands using only the chosen user ID:
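For example, logged in as the orasec user (the mount point is the one used throughout this example):
acfsutil sec info -m /u01/app/acfsdata/bfile_data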
4. Configure the ASM disk group
compatibility attributes for ADVM and ASM. ASM has several disk group
attributes, but in the context of ACFS Security and Encryption the
relevant ones are the following:
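As a sketch, the COMPATIBLE.ASM and COMPATIBLE.ADVM attributes could be set from SQL*Plus as follows (the disk group name and version values are illustrative; check the documentation for the minimum versions required by your release):
ALTER DISKGROUP bfile_dg SET ATTRIBUTE 'compatible.asm'  = '11.2.0.3';
ALTER DISKGROUP bfile_dg SET ATTRIBUTE 'compatible.advm' = '11.2.0.3';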
5. Create or identify the ACFS file
system that will be secured. Once this is identified, prepare the file
system for ACFS Security. In our case, we want to enable ACFS Security
on the /u01/app/acfsdata/bfile_data file system:
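A representative command, run as the ACFS security administrator (you will be prompted for the security administrator password):
acfsutil sec prepare -m /u01/app/acfsdata/bfile_data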
6. Verify that security is enabled:
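For example:
acfsutil sec info -m /u01/app/acfsdata/bfile_data
The output should report that security is enabled for the file system.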
7. As stated earlier, ACFS Security
can be used in conjunction with ACFS Encryption. For these cases,
encryption must be initialized and set before encryption is enabled on a
security realm. Keep in mind that in our example we initialized ACFS
Security and Encryption in one command (see step 2). However, if this
was not done in step 2, the following needs to be executed as root:
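(A representative sequence, using the mount point from this example:)
acfsutil encr init
acfsutil encr set -m /u01/app/acfsdata/bfile_data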
On the other hand, if it was performed in step 2, then run the following as root:
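(Again, a representative command for this example:)
acfsutil encr set -m /u01/app/acfsdata/bfile_data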
Note that we did not specify an AES encryption algorithm or key length, so ACFS picked up the defaults. To set a different key length (AES is the only supported
algorithm), use the –k option. For example, to set AES encryption of 256
bits, execute the following:
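(A representative command:)
acfsutil encr set -a AES -k 256 -m /u01/app/acfsdata/bfile_data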
The acfsutil encr set command
transparently generates a volume encryption key that is kept in the key
store that was previously configured with the acfsutil encr init
command.
Here’s the command to verify that encryption has been set:
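(A representative command:)
acfsutil encr info -m /u01/app/acfsdata/bfile_data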
Note that “Encryption status” here is
set to OFF. This is because the acfsutil encr set command does not encrypt any data on the file system, but only creates a volume
encryption key (VEK) and sets the encryption parameters for the file
system.
8. Enable encryption at the file system level with the following command:
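(A representative command, run as root:)
acfsutil encr on -m /u01/app/acfsdata/bfile_data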
9. Create the ACFS Security rule sets and then the rules:
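The exact rules depend on the security policy; as a sketch, a rule set and a username-based rule for this example might be created as follows (the rule set name, rule name, and user are illustrative):
acfsutil sec ruleset create escrow_ruleset -m /u01/app/acfsdata/bfile_data -o ALL_TRUE
acfsutil sec rule create escrow_user_rule -m /u01/app/acfsdata/bfile_data -t username escrowapp -o ALLOW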
This example specifies a rule type and a
rule value. The rule type can be application, hostname, time, or
username. The rule value depends on the type of rule. A rule can be
added to a rule set, and that rule set can be added to a realm. However,
you can create singleton rules without having the hierarchy of the rule
set and realms.
10. Add new rules to the rule set:
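(A representative command, adding the rule created in step 9 to the rule set:)
acfsutil sec ruleset edit escrow_ruleset -m /u01/app/acfsdata/bfile_data -a escrow_user_rule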
All these definitions could be listed in a file and executed in batch using the acfsutil sec batch command.
Summary
ACFS Data Services such as Replication,
Tagging, Snapshots, Security, and Encryption complement Oracle’s
Database Availability and Security technologies, such as Data Guard,
Database Vault, and Transparent Data Encryption. In addition, ACFS Data Services provides rich file support for unstructured data.
ASM Optimizations in Oracle Engineered Solutions
When
implementing a private cloud solution, a common question is, “Should I
build or buy?” In the “build” option, the IT personnel would create a
cloud pool by purchasing all the essential components, such as servers,
HBAs, storage arrays, and fabric switches. Additionally, all software
components—RAC, ASM, Grid Infrastructure, and Database—would have to be
installed and validated. Although this approach allows IT the
flexibility to pick and choose the appropriate components, it does
increase mean time to deploy as well as the chances of misconfiguration.
With the “buy” approach, users can deploy
Engineered Solutions or integrated solutions, which are pretested,
prevalidated, and preconfigured. Both provide fast deployment and
simplified management. Examples of integrated solutions include HP
CloudSystem and vCloud by VCE. Engineered Solutions, such as Oracle’s
Exadata, are not just integrated solutions, but the application software
(the Oracle database, in this case) and the hardware are tightly and
intelligently woven together.
Oracle Exadata Database Machine, Oracle
Database Appliance, and Oracle SPARC SuperCluster are solutions designed
to be optimal platforms for Oracle Database and therefore ideal
platforms for Private Database Cloud computing. This chapter focuses on
the ASM optimizations developed specifically for Oracle Exadata and
Oracle Database Appliance. This chapter is not meant to be an exhaustive
look at Exadata or ODA; many papers are available on Oracle Technology
Network (OTN) that cover this topic.
Overview of Exadata
Oracle Exadata Database Machine includes all
the hardware and software required for private cloud deployments.
Exadata combines servers, storage, and networks into one engineered
package, eliminating the difficult integration problems typically faced
when building your own private cloud. Rather than going through the
entire rationalization and standardization process, IT departments can
simply implement Oracle Exadata Database Machine for database
consolidation onto a private cloud.
Exadata Components
This section covers the important components of the Exadata system.
Compute Servers
Compute servers are the database servers
that run the Grid Infrastructure stack (Clusterware and ASM) along with
the Oracle Real Application Clusters stack. The compute servers behave
like standard database servers, except that in Exadata they are linked
with the libcell library, which allows the databases to communicate with
the cellsrv interface in the Exadata Storage Server.
Exadata Storage Server
In the Exadata X2 configuration, the Exadata
Storage cells (Exadata Cells) are servers preconfigured with 2×6-core
Intel Xeon L5640 processors, 24GB memory, 384GB of Exadata Smart Flash
Cache, 12 disks connected to a storage controller with 512MB
battery-backed cache, and dual-port InfiniBand connectivity. Oracle
Enterprise Linux operating system (OS) is the base OS for Exadata Cells
as well as for the compute nodes. All Exadata software comes
preinstalled when delivered. The 12 disks can be either High Performance
(HP) Serial Attached SCSI (SAS) disks that are 600GB 15,000 rpm or High
Capacity (HC) SAS disks that are 3TB 7,200 rpm.
Each of the 12 disks represents a Cell Disk
residing within an Exadata Storage cell. The Cell Disk is created
automatically by the Exadata software when the physical disk is
discovered. Cell Disks are logically partitioned into one or more Grid
Disks. Grid Disks are the logical disk devices assigned to ASM as ASM
disks. Figure 12-1 illustrates the relationship of Cell Disks to Grid Disks in a more comprehensive Exadata Storage grid.
Once the Cell Disks and Grid Disks are
configured, ASM disk groups are defined across the Exadata
configuration. When the data is loaded into the database, ASM will
evenly distribute the data and I/O within disk groups. ASM mirroring is
enabled for these disk groups to protect against disk failures.
Cellsrv (Cell Services), a primary component
of the Exadata software, provides the majority of Exadata storage
services and communicates with database instances on the database server
using the iDB protocol. Cellsrv provides the advanced SQL offload
capabilities, serves Oracle blocks when SQL offload processing is not
possible, and implements the DBRM I/O resource management functionality
to meter out I/O bandwidth to the various databases and consumer groups
issuing I/O. Cellsrv maintains a file called griddisk.owners.dat, which
has details such as the following:
ASM disk name
ASM disk group name
ASM failgroup name
Cluster identifier
When IORM is used, the IORM (I/O Resource
Manager) manages the Exadata cell I/O resources on a per-cell basis.
Whenever I/O requests exceed the capacity of the cell’s disks, IORM schedules the requests according to the configured resource plan; when the cell is operating below capacity, IORM does not queue I/O requests. Under load, IORM queues incoming requests and selects which I/Os to issue based on the resource plan’s allocations; databases and consumer groups with higher allocations are scheduled more frequently than those with lower allocations.
When IORM is enabled, it automatically manages
background I/Os. Critical background I/Os such as log file syncs and
control file reads and writes are prioritized. Databases with higher
resource allocations are able to issue disk I/Os more rapidly. Resource
allocation for workloads within a database is specified through the
database resource plan. If no database resource plan is enabled, all
user I/O requests from the database are treated equally. Background
I/Os, however, are still prioritized automatically.
Two other components of Oracle software
running in the cell are the Management Server (MS) and Restart Server
(RS). The MS is the primary interface to administer, manage, and query
the status of the Exadata cell. MS manageability, which is performed
using the Exadata cell command-line interface (CLI) or EM Exadata
plug-in, provides standalone Exadata cell management and configuration.
For example, from the cell, CLI commands are issued to configure
storage, query I/O statistics, and restart the cell. The distributed CLI
can also be used to issue commands to multiple cells, which eases
management across cells. Restart Server (RS) ensures ongoing functioning
of the Exadata software and services. RS also ensures storage services
are started and running, or services are restarted when required.
InfiniBand Infrastructure
The Database Machine includes an InfiniBand
interconnect between the compute nodes and Exadata Storage Server. The
InfiniBand network was chosen to ensure that sufficient network capacity is in place to support the low-latency and high-bandwidth requirements. Each
database server and Exadata cell has dual-port Quad Data Rate (QDR)
InfiniBand connectivity for high availability. The same InfiniBand
network also provides a high-performance cluster interconnect for the
Oracle Database Real Application Clusters (RAC) nodes.
iDB Protocol
The database servers and Exadata Storage
Server software communicate using the Intelligent Database protocol
(iDB), which is implemented in the database kernel. iDB runs over
InfiniBand and leverages ZDP (Zero-loss Zero-copy Datagram Protocol), a
zero-copy implementation of the industry-standard Reliable Datagram
Sockets (RDSv3) protocol. ZDP is used to minimize the number of data
copies required to service I/O operations. The iDB protocol implements a
function shipping architecture in addition to the traditional data
block shipping provided by the database; for example, iDB is used to
ship SQL operations down to the Exadata cells for execution and to
return query result sets to the database kernel. This allows Exadata
cells to return only the rows and columns that satisfy the SQL query,
instead of returning the entire database blocks as in typical storage
arrays. Exadata Storage Server operates like a traditional storage array
when offload processing is not possible. But when feasible, the
intelligence in the database kernel enables table scans to be passed
down to execute on the Exadata Storage Server so only requested data is
returned to the database server.
11gR2 Database Optimizations for Exadata
Oracle Database 11g Release 2 has
been significantly enhanced to take advantage of Exadata storage. One of
the unique things the Exadata storage does compared to traditional storage is to return only the rows and columns that satisfy the database query rather than the entire table being queried. Exadata pushes SQL
processing as close to the data (or disks) as possible and gets all the
disks operating in parallel. This reduces CPU consumption on the
database server, consumes much less bandwidth moving data between
database servers and storage servers, and returns a query result set
rather than entire tables. Eliminating data transfers and database
server workload can greatly benefit data warehousing queries that
traditionally become bandwidth and CPU constrained. Eliminating data
transfers can also have a significant benefit on online transaction
processing (OLTP) systems that often include large batch and report
processing operations.
Exadata Storage Servers also run more complex operations in storage:
Join filtering
Incremental backup filtering
I/O prioritization
Storage indexing
Database-level security
Offloaded scans on encrypted data
Data mining model scoring
Smart file creation
Exadata is transparent to the application and the database. In fact, the exact same Oracle Database 11g
Release 2 that runs on traditional systems runs on the Database
Machine. Existing SQL statements, whether ad hoc or in packaged or
custom applications, are unaffected and do not require any modification
when Exadata storage is used. The offload processing and bandwidth
advantages of the solution are delivered without any modification to the
application.
ASM Optimizations for Exadata
ASM provides the same functionality in
standard RAC clusters as in Exadata configurations. In Exadata, ASM
redundancy (ASM mirroring) is used, where each Exadata cell is defined
as a failure group. ASM automatically stripes the database data across
Exadata cells and disks to ensure a balanced I/O load and optimum
performance. The ASM mirroring in Exadata can be either normal or high
redundancy, but Maximum Availability Architecture (MAA) best practices
recommend high redundancy for higher resiliency.
ASM in Exadata automatically discovers grid
disks presented by the Exadata Storage Server. The pathname for
discovered Grid Disks has the format of o/cell-ip-address/griddisk-name.
The following is a sample listing of an Exadata Grid Disk discovered by ASM:
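The listing can be reproduced with a query along these lines (the IP address and grid disk name in the sample path are illustrative):
SELECT path FROM v$asm_disk WHERE path LIKE 'o/%';
-- for example: o/192.168.10.3/DATA_CD_00_exacell01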
The o in the Grid Disk pathname
indicates that it’s presented via libcell. Keep in mind that these disks
are not standard block devices; therefore, they cannot be listed or
manipulated with typical Linux commands such as fdisk and multipath.
The IP address in the pathname is the address
of the cell on the InfiniBand storage network. The Grid Disk name is
defined when the Grid Disk was provisioned in Exadata Storage.
Alternatively, the name may be system generated, by concatenating an
administrator-specified prefix to the name of the Cell Disk on which the
Grid Disk resides. Note that all Grid Disks from the same cell use the
same IP address in their pathname.
Disk Management Automation in Exadata
In Exadata, many of the manual ASM
operations have been internally automated and several disk-partnering
capabilities have been enhanced. ASM dynamic add and drop capability
enables non-intrusive cell and disk allocation, deallocation, and
reallocation.
XDMG is a new background process in the ASM instance that monitors the cell storage for any state change. XDMG also handles requests from the cells to online, offline, or drop/add a disk based on a user event or a failure. For example, if a cell becomes
inaccessible from a transient failure, or if a Grid Disk or Cell Disk
in the cell is inactivated, then XDMG will automatically initiate an
OFFLINE operation in the ASM instance.
The XDMG process works with the XDWK process,
which also runs within the ASM instance. The XDWK process executes the
ONLINE or DROP/ADD operation as requested by XDMG. The new processes in
the ASM instance automatically handle storage reconfiguration after
disk replacement, after cell reboots, or after cellsrv crashes.
Exadata Storage Server has the capability to
proactively drop a disk if needed, and is effective for both true disk
failures and predictive failures. When a disk fails, all I/Os to that
disk will fail. The proactive disk drop feature will then automatically
interrupt the existing drop operation (triggered by the prior disk
predictive failure) and turn it into a disk drop force. This is to
ensure that redundancy gets restored immediately without having to wait
for the disk repair timer to kick in.
The following output displays the action taken by the XDWK process to drop a disk:
To list the condition of Exadata disks, the following cellcli commands can be run on the Exadata Storage Server:
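For example (representative CellCLI commands; the attribute list can be tailored as needed):
CellCLI> LIST PHYSICALDISK WHERE status != normal DETAIL
CellCLI> LIST GRIDDISK ATTRIBUTES name, status, asmmodestatus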
In addition to the new ASM processes just
mentioned, there is one master diskmon process and one slave diskmon
process (dskm) for every Oracle database and ASM instance. The diskmon
is responsible for the following:
Handling of storage cell failures and I/O fencing
Monitoring of Exadata Server state on all storage cells in the cluster (heartbeat)
Broadcasting intra-database IORM (I/O Resource Manager) plans from databases to storage cells
Monitoring of the control messages from the database and ASM instances to storage cells
Communicating with other diskmons in the cluster
The following output shows the diskmon and the dskm processes from the database and ASM instances:
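(On a compute node, such a listing can be produced with a command like the following:)
ps -ef | egrep 'diskmon|dskm' | grep -v grep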
Exadata ASM Specific Attributes
The CONTENT.TYPE attribute identifies the
disk group type, which can be DATA, RECOVERY, or SYSTEM. The type value
determines the distance to the nearest partner disk/failgroup.
The default value is DATA, which specifies a distance of 1. The value of
RECOVERY specifies a distance of 3, and the value of SYSTEM specifies a
distance of 5. The primary objective of this attribute is to ensure that a failure at a given Exadata Storage Server does not take out all the disk groups configured on that rack. By having different partners based on the content type, a failure of a disk/cell does not affect the same set of disk partners in all the disk groups.
NOTE
For CONTENT.TYPE to be effective, one needs to have a full rack of the Database Machine.
The CONTENT.TYPE attribute can be specified
when you’re creating or altering a disk group. If this attribute is set
or changed using ALTER DISKGROUP, then the new configuration does not
take effect until a disk group rebalance is explicitly run.
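As a sketch, setting the attribute and triggering the required rebalance might look like this (the disk group name and power value are illustrative):
ALTER DISKGROUP reco SET ATTRIBUTE 'content.type' = 'recovery';
ALTER DISKGROUP reco REBALANCE POWER 4;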
The CONTENT.TYPE attribute is only valid for
disk groups that are set to NORMAL or HIGH redundancy. The
COMPATIBLE.ASM attribute must be set to 11.2.0.3 or higher to enable the
CONTENT.TYPE attribute for the disk group.
ODA Overview
The Oracle Database Appliance (ODA) is a
fully integrated system of software, servers, storage, and networking in
a single chassis. Like Exadata, Oracle Database Appliance is not just
preconfigured; it is prebuilt, preconfigured, pretested, and pretuned.
This configuration is an ideal choice for a pay-as-you-grow Private
Database Cloud. Customers can continually consolidate and enable CPU
capacity as needed.
An appliance like ODA reduces the time in
procuring individual components and completely minimizes the effort
required to set up an optimal configuration. The OS, network components,
SAN, storage redundancy, multipathing, and more, all become configured
as part of the ODA system enablement.
ODA Components
Oracle Database Appliance is made up of
building blocks similar to Exadata. These can be broken down into two
buckets: hardware and software. Each component is built and tested
together to provide maximum availability and performance.
Software Stack
ODA comes with preinstalled Oracle
Unbreakable Linux and Oracle Appliance Manager (OAK) software. The OAK
software provides one-button automation for the entire database stack,
which simplifies and automates the manual tasks typically associated
with installing, patching, managing, and supporting Oracle database
environments.
Hardware
The Oracle Database Appliance is a four-rack
unit (RU) server appliance that consists of two server nodes and
twenty-four 3.5-inch SAS/SSD disk slots.
Each Oracle Database Appliance system contains two redundant 2U form factor server nodes (system controllers SC0 and SC1).
Each server node plugs into the Oracle
Database Appliance chassis and operates independently of the other. A
failure on one server node does not impact the other node. The surviving
node uses cluster failover event management (via Oracle Clusterware) to
prevent complete service interruption. To support a redundant cluster,
each server node module contains a dual-port Ethernet controller
internally connected between the two server node modules through the
disk midplane. This internal connection eliminates the need for external
cables, thus making ODA a self-contained database appliance.
Each server contains two CPU sockets for the
Intel Xeon Processor X5675 CPUs, providing up to 12 enabled-on-demand
processor cores and 96GB of memory. On each ODA node are two dual-ported
LSI SAS controllers. They are each connected to an SAS expander that is
located on the system board. Each of these SAS expanders connects to 12
of the hard disks on the front of the ODA. Figure 12-2
shows the detailed layout of disk to expander relationship. The disks
are dual-ported SAS, so that each disk is connected to an expander on
each of the system controllers (SCs). The Oracle Database Appliance
contains twenty 600GB SAS hard disk drives that are shared between the
two nodes.
ASM and Storage
Notice that expander-to-disk connectivity is arranged such that Expander-0 from both nodes connects to disks in columns 1 and 2, whereas Expander-1 from both nodes connects to disks in columns 3 and 4.
ASM high redundancy is used to provide
triple-mirroring across these devices for highly available shared
storage. This appliance also contains four 73GB SAS solid state drives
for redo logs, triple mirrored to protect the Oracle database in case of
instance failure. Each disk is partitioned into two slices: p1 and p2.
The Oracle Linux Device Mapper utility is used to provide multipathing
for all disks in the appliance.
Brief Overview on Storage Layout (Slots/Columns)
The ASM storage layout includes the following:
ASM Diskgroup +DATA size 1.6TB (high redundancy)
ASM Diskgroup +RECO size 2.4TB (high redundancy) or size 0.8TB (high redundancy with external storage for backups)
ASM Diskgroup +REDO size 97.3GB (high redundancy)
The following output displays the disk naming
and mapping to the ASM disk group. Note that the output has been
shortened for brevity. ODA uses a specific naming convention to identify
the disks. For example, disk HDD_E0_S05_971436927p2 indicates that it
resides inside Expander 0 (E0) and slot 5 (S05).
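A query along these lines, run from the ASM instance, produces such a mapping (the output itself follows the naming convention just described):
SELECT g.name AS diskgroup, d.name AS disk, d.path
  FROM v$asm_disk d JOIN v$asm_diskgroup g ON d.group_number = g.group_number
 ORDER BY g.name, d.name;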
ASM Optimizations for ODA
The Oracle Appliance Manager (OAK), in
conjunction with ASM, automatically configures, manages, and monitors
disks for performance and availability. Additionally, OAK provides
alerts on performance and availability events as well as automatically
configures replacement drives in case of a hard disk failure.
The Storage Management Module feature has the following capabilities:
It takes corrective action on appropriate events.
It interacts with ASM for complete automation.
The Oracle Appliance Manager daemon (OAKd) monitors the physical state of disks.
It monitors disk status in ASM.
Based on events, it interacts with ASM for corrective actions.
ASM takes actions as directed by OAKd.
OAK tracks the configuration and layout of
all storage devices in the system. If a storage device fails, it detects
the failure and sends an alert by e-mail. When an alert is received,
users have the option to remove the failed disk and replace it. When OAK
detects a disk has been replaced, it verifies the disk size and other
characteristics, such as the firmware level. OAK then rebuilds the
partition table on the new disk to match the table on the failed disk.
Because ODA disks map the disk slot exactly to an expander group and
failgroup, the disk is added back to the appropriate ASM disk group
without intervention from the user. This mitigates incorrect ASM disk
adds.
The oakcli command can be used to display the
status of all disks in the engineered package. This is shown here using
the oakcli show disk command:
The oakcli command can also be used to display
the status of a specific disk. This is shown here using the oakcli show
disk <disk_name> command:
ODA and NFS
Optionally, customers can use NFS storage as
Tier 3 external storage, connected using one of the 10Gb Ethernet cards inside ODA. This NFS storage can be used to offload read-mostly or archived datasets. Oracle recommends using ZFS Storage Appliance (ZFSSA) storage or other NFS appliance hardware. If the NFS appliance contains read-only datasets, it is recommended to convert these data files to read-only (by marking their tablespaces read-only, as sketched following this paragraph) and then set the init.ora parameter READ_ONLY_OPEN_DELAYED=TRUE. This will improve database availability in
case the NFS appliance becomes unavailable. Note that having NFS-based
files presented to ASM as disks is neither recommended nor supported.
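A minimal sketch of these two settings (the tablespace name is illustrative):
ALTER TABLESPACE hist_2010 READ ONLY;
ALTER SYSTEM SET READ_ONLY_OPEN_DELAYED = TRUE SCOPE=SPFILE;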
Summary
Oracle Exadata Database Machine, Oracle
Database Appliance, and Oracle SPARC SuperCluster are excellent Private
Cloud Database consolidation platforms. These solutions provide
preintegrated configurations of hardware and software components
engineered to work together and optimized for different types of
database workloads. They also eliminate the complexity of deploying a
high-performance database system. Engineered systems such as ODA and
Exadata are tested in the factory and delivered ready to run. Because
all database machines are the same, their characteristics and operations
are well known and understood by Oracle field engineers and support.
Each customer will not need to diagnose and resolve unique issues that
only occur on their configuration. Performance tuning and stress testing
performed at Oracle are done on the exact same configuration that the
customer has, thus ensuring better performance and higher quality.
Further, the fact that the hardware and software configuration and
deployment, maintenance, troubleshooting, and diagnostics processes are
prevalidated greatly simplifies the operation of the system,
significantly reduces the risks, and lowers the overall cost while
producing a highly reliable, predictable database environment. The ease
and completeness of the predictive, proactive, and reactive management
processes possible on Oracle Database Appliance are simply nonexistent
in other environments.
Applications do not need to be certified
against engineered systems. Applications that are certified with Oracle
Database 11.2 RAC will run against the engineered systems. Choosing the
best platform for your organization will be one of the key milestones in
your roadmap to realizing the benefits of cloud computing for your
private databases.
ASM Tools and Utilities
Many tools can be used to manage ASM, such as SQL*Plus, ASMCMD, and Enterprise Manager (EM). In Oracle Clusterware 11gR2,
these tools were enhanced and several new tools were introduced to
manage ASM and its storage. These tools and utilities can be broken down
into two categories: fully functional ASM management and standalone
utilities.
Fully functional ASM management:
ASMCA
ASMCMD
Enterprise Manager
SQL*Plus
Standalone utilities:
Renamedg
ASRU
KFOD
AMDU
This chapter focuses on ASMCA, ASMCMD, and the standalone utilities.
ASMCA
ASMCA is a multipurpose utility and
configuration assistant like DBCA or NETCA. ASMCA is integrated and
invoked within the Oracle Universal Installer (OUI). It can be used as a
tool to upgrade ASM or run as a standalone configuration tool to manage
ASM instances, ASM disk groups, and ACFS.
ASMCA is invoked by running $GI_HOME/asmca. Prior to running asmca, ensure that the ORACLE_SID for ASM is set appropriately.
The following example illustrates ASMCA usage:
This illustration shows the ASM Instances tab,
where ASM instances can be started, stopped, and upgraded. ASMCA uses a
combination of SQL*Plus and Clusterware commands to configure the ASM
instance. By default, the ASM server parameter file (SPFILE) is always
stored in the disk group, allowing the instance to bootstrap on its own.
The Disk Groups tab of ASMCA, as shown in Figure 13-1,
allows the user to configure ASM disk groups based on the availability
requirements of the deployment. The Disk Groups tab allows the user to
modify disk group attributes.
If the user wants to create a new disk group, ASMCA allows disk group creation as shown in Figure 13-2. The default discovery string used to populate the candidate disks is either the one already in use by the ASM instance or the OS-specific default. As part of the disk group creation, the user gets to choose the redundancy of the disk group.
Finally, Figure 13-3 shows the list of disk groups that were created and are available for users to either create databases or ACFS file systems on.
To leverage additional configuration menu
options, users can right-click any of the listed disk groups. The menu
options enable you to perform the following actions:
Add disks to the disk group
Edit the disk group attributes
Manage templates for the disk group
Create an ACFS-based database home on the selected disk group
Dismount and mount the disk group, either locally or globally on all nodes of the cluster
Drop the disk group
Using ASMCA to manage and configure ACFS and ADVM is covered in Chapter 11.
Although we showed ASMCA usage via GUI access, ASMCA can also be used in command-line mode. The command line provides opportunities to
script ASM configuration. The following example illustrates ASMCA
command-line usage by creating an ASM disk group:
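A representative silent-mode invocation might look like the following (disk paths and attribute values are illustrative, and the exact flags can vary by version):
asmca -silent -createDiskGroup -diskGroupName DATA \
      -disk '/dev/oracleasm/disks/DISK1' -disk '/dev/oracleasm/disks/DISK2' \
      -redundancy EXTERNAL -au_size 4 \
      -compatible.asm '11.2.0.0.0' -compatible.rdbms '11.2.0.0.0'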
ASMCMD
ASMCMD was first introduced in 10gR2 with a basic command set for managing ASM files; however, in 11gR2
the asmcmd command has been expanded to fully manage ASM. In previous
versions, fully managing ASM was only possible via EM and SQL*Plus. This
section only focuses on the key new commands introduced in 11gR2. For the complete reference of the ASMCMD command set, refer to the Oracle Storage Administrator’s Guide.
ASMCMD, like ASMCA, needs to have the ORACLE_SID set to the ASM instance SID. In Oracle Clusterware 11gR2,
the “asmcmd -a” flag has been deprecated; in its place, the ASM
privilege must be set. ASMCMD can be used in interactive mode or
noninteractive (batch) mode.
The new capabilities found in 11.2 ASMCMD
include ASM instance management, disk group and disk management, ASM
file management, and ACFS/ADVM management. ASMCMD is profiled throughout this book; please refer to the specific chapter for appropriate ASMCMD
command usage.
ASM instance management includes the following capabilities:
Starting and stopping ASM instances
Modifying and listing ASM disk strings
Creating/modifying/removing ASM users and groups
Backing up and restoring the ASM SPFILE
Adding/removing/modifying/listing users from the password file
Backing up and restoring the metadata of disk groups
Displaying connected clients
Disk group and disk management includes the following capabilities:
Showing directory space utilization using the Linux/Unix-like command du
Mounting/dismounting/creating/altering/dropping disk groups
Rebalancing the disk groups
Displaying disk group attributes
Onlining or offlining the disks/failure groups
Repairing physical blocks
Displaying disk I/O statistics
ASM file management includes the following capabilities:
Managing ASM directories, templates, and aliases
Copying the files between the disk groups and OS
Adding/removing/modifying/listing the templates
Managing and modifying the ASM file ACLs
Finally, ACFS/ADVM management includes the capability to create, delete, enable, and list the ADVM volumes.
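For example, a short interactive session might look like this (the disk group name is illustrative):
$ asmcmd
ASMCMD> lsdg
ASMCMD> lsdsk -G DATA
ASMCMD> lsof
ASMCMD> spget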
Renamedg
In many cases, users may need to rename disk
groups. This could include cloned disk groups that will be used on
different hosts for test environments or snapshots of disk groups that
are to be mounted on the same host. Oracle Clusterware 11gR2
introduces a command that provides this capability: the renamedg
command. The renamedg command can be executed using a single phase or
two phases. Phase one generates a configuration file to be used in phase
two, and phase two uses the configuration file to perform the renaming
of the disk group.
The following example shows the steps in renaming a disk group:
1. Before the disk group can be
renamed, it must first be dismounted. If databases or ACFS file systems
are using this disk group, they must be unmounted before the renamedg is
executed. Note that this must be done on all nodes on a RAC
configuration. For RAC configurations, it’s best to use ASMCA to
globally dismount the disk group. Additionally, if the ASM spfile exists
in this disk group, it must be moved to another location or disk group.
The asmcmd lsof command can be used to determine whether any open files
exist in the disk group before a dismount is attempted.
For example: Use asmcmd to dismount the diskgroup:
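(A representative command:)
asmcmd umount DATA_NISHA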
2. Verify that the desired disk group was dismounted:
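(For example:)
asmcmd lsdg
The renamed disk group should no longer appear in the list of mounted disk groups.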
3. Rename the disk group:
The renamedg
command can also be run in dry-run mode. This may be useful to verify
the execution. This check mode verifies all the disks can be discovered
and that the disk group can be successfully renamed. The following
example shows the check option of the renamedg command:
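(A representative check run; the ASM disk string is illustrative:)
renamedg phase=both dgname=DATA_NISHA newdgname=DATA_ISHAN01 asm_diskstring='/dev/xvd*' check=true verbose=true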
Once the check has verified appropriate behavior, we can run the actual renamedg command:
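(The same command without the check option:)
renamedg phase=both dgname=DATA_NISHA newdgname=DATA_ISHAN01 asm_diskstring='/dev/xvd*' verbose=true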
4. Once the renamedg command
completes, we can remount the disk group. Mounting the disk group
inherently validates the disk group header:
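(For example:)
asmcmd mount DATA_ISHAN01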
This disk group name change is also reflected in the CRS resource status automatically:
Although the renamedg command updates the CRS
resource with the new disk group name, the old CRS disk group resource
still exists within CRS. In our example the DATA_NISHA disk group is
still listed as a CRS resource, although it is in the offline state:
To remove this defunct CRS resource, users
should run the srvctl remove diskgroup –g <diskgroup> command (note that
users should not use the crsctl delete resource ora.<diskgroup>.dg
command because it is unsupported):
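(For example:)
srvctl remove diskgroup -g DATA_NISHA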
The renamedg command currently does not rename
the disks that belong to the disk group. In our example, the original
disk group was named DATA_NISHA, so all underlying member disk names
started with DATA_NISHA by default. After the renamedg command is run,
the disk group is renamed to DATA_ISHAN01; however, the disks are still
named DATA_NISHA. This should not have any operational impact, and the disk names can be confirmed with a query such as the one below:
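(A representative query:)
SELECT d.name, d.path
  FROM v$asm_disk d, v$asm_diskgroup g
 WHERE d.group_number = g.group_number
   AND g.name = 'DATA_ISHAN01';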
The renamedg command does not rename the
pathname references for the data files that exist within that disk
group. To rename the data files appropriately and have this reflected in
the database control file, we can use the following database SQL to
generate a data file rename script:
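A sketch of such a script generator, using the old and new disk group names from this example:
SET HEADING OFF PAGESIZE 0 LINESIZE 200
SELECT 'ALTER DATABASE RENAME FILE ''' || name || ''' TO ''' ||
       REPLACE(name, '+DATA_NISHA', '+DATA_ISHAN01') || ''';'
  FROM v$datafile;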
Renaming disk groups that contain the
Clusterware files requires more careful planning. The following steps
illustrate this procedure. (Note that the Clusterware files must be
relocated to another disk group.)
1. Back up the OCR manually using either of the following commands:
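(For example, as root:)
ocrconfig -manualbackup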
or
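(Also as root, exporting to a file whose path is illustrative:)
ocrconfig -export /tmp/ocr_backup.dmp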
2. Create a temporary disk group, like so:
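(A minimal sketch; the disk path is illustrative:)
CREATE DISKGROUP TEMPDG EXTERNAL REDUNDANCY DISK '/dev/xvdg1';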
We can use a shared file system instead of a temporary disk group.
3. Confirm that Grid Infrastructure on all nodes is active:
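(For example, as root or as the Grid Infrastructure owner:)
crsctl check cluster -all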
4. Move the OCR, the voting files, and the ASM spfile to a temporary disk group:
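A sketch of the relocation, using placeholder disk group names (the current spfile path should first be confirmed and will differ in every installation):
# as root: add the new OCR location and drop the old one
ocrconfig -add +TEMPDG
ocrconfig -delete +OLD_DG
# as root: relocate the voting files
crsctl replace votedisk +TEMPDG
# as the Grid Infrastructure owner: relocate the ASM spfile
asmcmd spget
asmcmd spmove '<current_spfile_path>' '+TEMPDG/spfileASM.ora'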
We can use a shared file system instead
of a temporary disk group as the destination for clusterware files as
well as ASM spfile.
5. Restart the Grid Infrastructure on all nodes and confirm that it is active on all nodes:
6. Rename the disk group, like so:
7. Mount the disk group:
The diskgroup (.dg) resource is registered automatically.
8. Remove the old disk group (.dg) resource:
9. Move the OCR, the voting files, and the ASM spfile to <new_name_dg>:
10. Restart the Grid Infrastructure on all nodes and confirm that it is active on all nodes:
Although the renamedg command was introduced as part of Oracle Clusterware 11gR2, in many cases a disk group in a 10g or 11.1 environment needs to be renamed. You can use this tool to rename your 10g or 11gR1 ASM disk group.
In these cases, the 11gR2 stack, or
more specifically Oracle Restart (Oracle Grid Infrastructure for Single
Instance), needs to be installed on the server(s) where the disk group
operation will be performed. It is not necessary to install 11gR2
RAC for this. Also, it is not necessary to start the Oracle Restart
stack (you simply need the dormant software installation for the
renamedg command). Here are the steps to follow:
1. Install 11.2.0.x Oracle Restart (Oracle Grid Infrastructure for Single Instance).
2. Unmount the disk group that will be renamed.
3. Run renamedg from the Oracle Restart home.
4. Use the renamedg tool to rename the 10g or 11gR1 disk group.
5. Optionally, uninstall the Oracle
Restart software stack. If frequent disk group renames will be needed,
this step is not recommended.
6. Mount the disk group.
NOTE
The disk group cannot be renamed when it contains offline disks.
ASM Storage Reclamation Utility (ASRU)
Storage costs—both in administrative
overhead and capital expenses—are growing concerns for most enterprises.
Storage vendors have introduced many features to reduce the acquisition
cost side of storage. One such feature is thin provisioning,
which is a feature common to many storage arrays. Thin provisioning
enables on-demand allocation rather than up-front allocation of physical
storage. This storage feature reduces unused space and improves storage
utilization. Deploying Oracle databases with cost-effective thin
provisioned storage is an ideal way to achieve high storage efficiency
and dramatic storage capacity savings. By boosting storage utilization,
thin provisioning drives savings in purchased capacity, associated
power, and cooling costs. Although ASRU can be used with any storage vendor that provides thin provisioning, this section illustrates ASRU usage against 3Par’s storage to provide context.
The Oracle ASRU feature offers the ability to improve storage efficiency for Oracle Database 10g and 11g environments by reclaiming unused (but allocated) ASM disk space in thin provisioned environments.
Overview of ASRU Operation
Two key features allow thin provision storage reclamation:
Oracle
ASRU compacts the ASM disks, writes zeros to the free space, and
resizes the ASM disks to the original size with a single command, online
and without disruption.
3Par
Thin Persistence software detects zero writes and eliminates the
capacity associated with free space in thin provisioned volumes—simply,
quickly, and without disruption. 3Par Thin Persistence leverages the
unique, built-in, zero-detection capabilities.
Oracle ASM Storage Reclamation Utility (ASRU)
is a standalone utility used to reclaim storage in an ASM disk group
that was previously allocated but is no longer in use. ASRU accepts the
name of the disk group for which space should be reclaimed. When
executed, it writes blocks of zeros to regions on ASM disks where space
is currently unallocated. The storage array, using the zero-detect
capability of the array, will detect these zero blocks and reclaim any
corresponding physical storage.
The ASM administrator invokes the ASRU utility, which operates in three phases:
Compaction phase In
this phase, ASRU logically resizes the disks downward such that the
amount of space in the disk group is at the allocated amount of file
space in the disk group, plus a reserve capacity. The default value for
the reserve amount is 25 percent; however, the reserve value is a
tunable option in the utility. The resize operation of the disks is
logical to ASM and has no effect on the physical disks. The effect of
the resize operation is that file data in the ASM disk group is
compressed near the beginning of the disks, which is accomplished by an
ASM rebalance of the disk group. The utility uses the appropriate ASM V$
views to determine the current allocated size of the disk group. The
next phase does not begin until the ASM rebalance for the disk group has
completed and has been verified as complete.
Deallocation phase During
this phase, ASRU writes zeros above the region where the ASM disks have
been resized. The ASRU utility invokes another script called zerofill
that does the writing of zeros. It is during this deallocation phase
that the zero-detect algorithm within the 3Par Thin Engine will return
the freed storage blocks to the free storage pool.
Expansion phase In
the final phase, all the ASM disks will be resized to their original
size as determined when ASRU was started. This resize operation is a
logical resize of the disks with respect to ASM and does not result in a
reorganization of file data in the disk group.
When to Use ASRU to Reclaim Storage
Storage reclamation should be considered for the following database storage events:
Dropping one or more databases in an ASM disk group.
Dropping one or more tablespaces.
Adding
new LUNs to an ASM disk group to replace old LUNs. This triggers an ASM
rebalance to move a subset of the data from the old LUNs to the new
LUNs. The storage released from the old volumes is a candidate for
reclamation.
To determine whether storage reclamation will
be beneficial after one of these operations, it is important to
consider the effect of the reserve maintained by ASRU when the utility
reduces the size of the disk group during the compaction phase. The
temporarily reduced size is equal to the allocated space plus a reserve,
which allows active databases to grow during the reclamation process;
the default reserve is 25 percent of the allocated storage. Storage
reclamation is likely to be beneficial if the amount of allocated
physical storage significantly exceeds the amount of storage allocated
within ASM plus the reserve.
The amount of physical storage allocated on a
3Par InServ array can be determined using the 3Par InForm operating
system’s showvv command, available from the InForm command-line
interface (CLI), to show information about the virtual volumes (VVs)
used by ASM. Here is the standard way of using this command to obtain
information related to the effectiveness of thin provisioning for a
group of volumes matching oel5.*:
The –s option produces voluminous output.
Therefore, to make the output easier to understand, we will use more
complex options that show just the data columns that are directly
relevant to thin provisioning:
The Usr_Used_MB column indicates how many
megabytes are actually allocated to user data. In this example,
825,770MB of storage within ASM’s volumes has been written.
ASM’s view of how much storage is in use can be determined with a SQL query:
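(A representative query:)
SELECT name, total_mb, free_mb FROM v$asm_diskgroup;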
This example shows 197,986MB of free storage
out of 1,023,984MB available, or about 19.3 percent. The difference
between these quantities—825,998MB—is how much storage within ASM is in
use (that is, has actual written data).
Using ASRU to Reclaim Storage on 3Par: Use Cases
To illustrate storage reclamation using
Oracle ASRU and the 3Par InServ Storage Server, a 1TB ASM disk group was
created using four 250GB thin provisioned virtual volumes (TPVVs) on
the 3Par InServ array. Zero detection is enabled for the volumes from
the InForm CLI, as detailed in the following use cases.
Use Case #1
This use case involves reclaiming storage
space after dropping a database/data file. The steps involve creating a
DB named “yoda,” creating a 15GB data file tablespace and filling it
with data, and then dropping the data file and reclaiming the space via
ASRU.
1. Create an ASM disk group and create a database using DBCA. Now use the showvv command to view space consumed:
2. Create a 15GB tablespace (data
file) called par3_test. Create table BH in the par3_test tablespace and
fill it with test data:
3. Drop tablespace par3_test and then reclaim the space using ASRU:
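A sketch of the drop and reclamation (the disk group name is illustrative, and the ASRU utility is assumed to be in the current directory):
SQL> DROP TABLESPACE par3_test INCLUDING CONTENTS AND DATAFILES;
$ ./ASRU DATA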
4. Recheck space usage in 3Par:
Use Case #2
In this use case, a new database is created
along with a (single) new tablespace (15GB). Data is then seeded in it.
Then the table is truncated and a check is done to determine whether ASRU can reclaim space after this operation.
1. Create the table:
2. Load the data:
3. Truncate the table:
4. Recheck space usage in 3Par:
5. No space was reclaimed, as expected.
NOTE
Typically DBAs will drop database segments
(tables, indexes, and so on) or truncate tables in order to reclaim
space. Although this operation does reclaim space back to the tablespace
(and to the overall database), it does not release physical space back
to the storage array. In order to reclaim physical space back to the
storage array, a physical data file (or files) must be dropped or shrunk.
6. Now shrink the data file (since space HWM was already done via truncate):
7. Recheck space usage in 3Par:
KFOD
This section describes the purpose and
function of the KFOD utility. KFOD is used to probe the system for disks
that can be used for ASM. Note that KFOD does not perform any unique
discovery; in other words, it uses the same discovery mechanism invoked
by the ASM instance. KFOD is also invoked within the Oracle Universal
Installer (OUI) for Grid Infrastructure stack installation. This section
is not intended to be an exhaustive command reference; you can find
that information in the Oracle Storage Administrator’s Guide. Instead, we will cover the most important and relevant features and commands.
Recall from Chapter 4
that ASM stamps every disk that it adds to a disk group with a disk
header. Therefore, when KFOD lists candidate disks, these are all disks
that do not include the disk header. KFOD can list “true” candidate
disks if the keyword status=TRUE is specified. True candidate disks are
ones identified with the header status of CANDIDATE, FOREIGN, or FORMER.
Note that disks with a header status of CANDIDATE include true
candidate disks as well as FORMER disks (that is, disks that were
previously part of an ASM diskgroup). KFOD does not distinguish between
the two.
KFOD can display MEMBER disks if disks=all or
disks=asm is specified. KFOD will include the disk group name if
dscvgroup=TRUE is specified. KFOD will also discover and list Exadata
grid disks.
You can specify the name of the disk group to
be discovered by KFOD. KFOD lists all disks that are part of this disk
group. At most one disk group name can be specified at a time. This
command is valid only if an ASM instance is active.
For disk discovery, KFOD can use the default
parameters, command-line options, or read options from a pfile.
Specifying a pfile allows users to determine what disks would be
available to an ASM instance using a configured pfile. If parameters and
a pfile are specified, the pfile is not read; otherwise, if a pfile is
specified, the parameters are taken from it.
As of 11g, KFOD supports a clustered
environment. This means that KFOD is aware of all ASM instances
currently running in the cluster and is able to get information about
disk groups and ASM clients pertaining to all instances in the cluster.
The “hostlist” parameter can be used to filter the output for the
specified node in the cluster. 11gR2 ASM also introduces a
display format, which is invoked using the cluster=true keyword. The
default is to run in noncluster mode for backward compatibility. The
next set of examples shows KFOD usage.
Here’s how to list all active ASM instances in the ASM cluster:
Here’s how to display the client databases accessing the local ASM instance:
KFOD can be used to display the total
megabytes of metadata required for a disk group, which it calculates
using numbers of disks, clients, nodes, and so on, provided on the
command line or by default. The metadata parameters can be overridden to
perform what-if scenarios.
Here’s how to display the total megabytes of metadata required for a disk group with specified parameters:
AMDU
The ASM Metadata Dump Utility (AMDU) is part
of the Oracle Grid Infrastructure distribution. AMDU is used to extract
the available metadata from one or more ASM disks and generate
formatted output of individual blocks.
AMDU also has the ability to extract one or
more files from an unmounted disk group and write them to the OS file
system. This dump output can be shipped to Oracle Support for analysis.
Oracle Support can use the dump output to generate formatted block
printouts. AMDU does not require the disk group to be mounted or the ASM
instance to be active.
AMDU performs three basic functions. A given execution of AMDU may perform one, two, or all three of these functions:
Dump metadata from ASM disks to the OS file system for later analysis.
Extract the contents of an ASM file and write it to an OS file system even if the disk group is not mounted.
Print metadata blocks.
The AMDU input data may be the contents of the ASM disks or ingested from a directory created by a previous run of AMDU.
AMDU produces four types of output files:
Extracted files One extracted file is created for every file listed under the -extract option on the command line.
Image files Image files contain block images from the ASM disks. This is the raw data that is copied from the disks.
Map files Map files are ASCII files that describe the data in the image files for a particular disk group.
Report file One report file is generated for every run of the utility without the -directory option (except if -noreport is specified).
In this first example, we will use AMDU to
extract a database control file. The disk group is still mounted and
we’ll extract one of the control files for a database named ISHAN.
1. Determine the ASM disk string:
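(For example:)
asmcmd dsget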
2. Determine the location of all the control files:
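(For example, from the ISHAN database:)
SELECT name FROM v$controlfile;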
In this example, we have a single copy of the control file in the disk group DATA.
3. Determine the disks for the DATA disk group:
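(For example:)
asmcmd lsdsk -G DATA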
4. Extract the control file out of the disk group DATA onto the file system. Here are the options used:
-diskstring This is either the full path to disk devices or the value of the ASM_DISKSTRING parameter.
-extract The disk group name, followed by a period, followed by the ASM file number.
-output The output file name (in the current directory).
-noreport Indicates not to generate the AMDU run report.
-nodir Indicates not to create the dump directory.
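Putting these options together, a representative invocation might look like the following (the disk string and the control file's ASM file number are illustrative):
amdu -diskstring '/dev/xvd*' -extract DATA.261 -output ishan_cf.ctl -noreport -nodir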
In this second example, we extract a data
file when the disk group is not mounted using AMDU. The objective is to
extract a single data file, named something like USERS, from the disk
group DATA, which is dismounted. This will require us to dump all
metadata for the disk group DATA.
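A representative dump command (the disk string is illustrative):
amdu -diskstring '/dev/xvd*' -dump DATA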
Report.txt contains information about the
server, the amdu command, the options used, a list of disks that are
members of the disk group DATA, and information about the allocation
units (AUs) on those disks. Let’s review the contents of the report
file:
The file DATA.map contains the data map. The following shows a sampling of DATA.map that was generated:
Of immediate interest are fields starting with
A and F. The field A0000421, for example, indicates that this line is
for allocation unit (AU) 421, and the field F00000259 indicates that
this line is about ASM file 259.
ASM metadata file 6 is the alias directory, so
that is the first place to look. From DATA.map, we can work out AUs for
ASM file 6:
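(For example:)
grep F00000006 DATA.map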
This single line in the map indicates that all
the file aliases fit in a single AU; in other words, there are not many
files in this disk group. If the output listed multiple lines from the
grep command, this would reflect that many ASM files exist in this disk
group.
From the preceding grep output, the alias
directory seems to be in allocation unit 10 (A00000010) on disk 2
(D0002). From report.txt, we know that disk 2 is /dev/xvde1 and that the
AU size is 1MB. Let’s have a look at the alias directory. You can use
kfed for this:
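(A representative command, using the disk and allocation unit identified above:)
kfed read /dev/xvde1 aun=10 blkn=0 | more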
KFBTYP_ALIASDIR indicates that this is the alias directory. Now look for a data file named USERS:
This gives us the following output:
The USERS tablespace is ASM file 259. Now extract the file:
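(A representative command; the disk string and output file name are illustrative:)
amdu -diskstring '/dev/xvd*' -extract DATA.259 -output users_extracted.dbf -noreport -nodir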
These steps can be repeated for the System and Sysaux data files as well as the control files. These can then be used to open the database, or the extracted files can be plugged into another database and recovered.
It is important to note that although the amdu
command will extract the file, the file itself may be corrupt or
damaged in some way. After all, there is a reason for the disk group not
mounting—chances are the ASM metadata is corrupt or missing, but that
can be the case with the data file as well. The point is that there’s no
substitute for a backup, so keep that in mind.
Summary
Various tools can be used to manage ASM.
These tools and utilities handle a range of tasks—from managing daily
ASM activities, to assisting in the recovery of ASM files and renaming
ASM disk groups. As a best practice, the ASMCMD utility or Enterprise
Manager should be used to manage ASM.
Oracle 12c ASM: A New Frontier
When Automatic Storage Management (ASM) was introduced in 10gR1,
it was simply marketed as the volume manager for the Oracle database.
ASM was designed as a purpose-built host-based volume management and
file system that is integrated with the Oracle database. These simple
ideas delivered a powerful solution that eliminates many headaches DBAs
and storage administrators once had with managing storage in an Oracle
environment.
However, ASM has now become an integral part
of the enterprise stack. ASM is not only a significant part of the
Oracle Clusterware stack, but is also a core component of Engineered
Systems such as Exadata and ODA. Oracle 12c was announced in 2013, and along with this 12c
release came significant changes for ASM. This chapter covers some of
the key management and high availability features introduced in 12c
ASM. You will get a glimpse of these advancements, the history behind
the new features, and why these features are a necessary part of the
future of ASM.
The main theme of 12c ASM is extreme
scalability and management of real-world data types. In addition, it
removes many of the limitations of previous ASM generations. This
chapter previews some of the key features of ASM and cloud storage in
Oracle 12c. Note that this chapter does not provide an exhaustive
overview of the new features, just the key features and optimizations.
This chapter was written using the Beta2 version, so the examples and
command syntax may be different from the production 12c version.
Password Files in ASM
In releases prior to Oracle 12c, most
of the Oracle database and ASM-related files could be stored in ASM
disk groups. The key exception was the oracle password file—neither the
ASM and database password files could be stored in a disk group. These
password files, created by orapwd utility, resided in the
$ORACLE_HOME/dbs directory by default and therefore were local to the
node and instance. This required manual synchronization of the password
file. If the password file became out of sync between instances, it
could cause inconsistent login behavior. Although Oracle 11gR2
provided the capability for cross-instance calls (CIC) to synchronize
the password file, if an instance or node was inactive, synchronization
was not possible, thus still leaving the password file inconsistent.
An inconsistent password file is even more problematic for ASM instances because ASM does not have a data dictionary to fall back on when the file system-based password file is inconsistent.
In Oracle 12c (for new installations), the
default location of the password file is in an ASM disk group. The
location of the password file becomes a CRS resource attribute of the
ASM and database instance. The ASM instance and the disk group that is
storing the password file need to be available before password file authentication is possible. The SYSASM or SYSDBA privilege can be used with the password file in ASM.
For the ASM instance, operating system
authentication is performed to bootstrap the startup of the ASM
instance. This is transparently handled as part of the Grid
Infrastructure startup sequence. As in previous releases, the SYSASM
privilege is required to create the ASM password file.
Note that the compatible.asm disk group
attribute must be set to 12.1 or later to enable storage of shared
password files in an ASM disk group.
The following outlines how to set up a password file in ASM; a sketch of the corresponding commands follows the steps.
Database password file:
1. Create a password file.
2. Move the existing password file into ASM.
ASM password file:
Create an ASM password file. Note the asm=y option, which distinguishes this creation from regular password file creation.
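The following is a minimal sketch of these steps; the disk group name (+DATA), the database unique name (orcl), and the file paths are illustrative.
Database password file, created directly in ASM or moved from $ORACLE_HOME/dbs:
$ orapwd file='+DATA/ORCL/orapworcl' dbuniquename='orcl'
ASMCMD> pwmove --dbuniquename 'orcl' /u01/app/oracle/product/12.1.0/dbhome_1/dbs/orapworcl +DATA/ORCL/orapworcl
ASM password file (note the asm=y option):
$ orapwd file='+DATA/orapwasm' asm=y
ASMCMD> pwcopy --asm /u01/app/12.1.0/grid/dbs/orapw+ASM +DATA/orapwasm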
Disk Management and Rebalance New Features
In 12c there are several new features that improve disk management functions, specifically improved availability in the face of transient disk or failure group failures. This section
covers these key disk management features.
Fast Disk Resync and Checkpoints
The 11g disk online feature provides
the capability to online and resync disks that have incurred transient
failures. Note that this feature is applicable only to ASM disk groups
that use ASM redundancy.
The resync operation updates the ASM extents
that were modified while the disk or disks were offline. However, prior
to Oracle 12c, this feature was single threaded; that is, a single online process thread was used to bring the disk(s) completely online. For disks that had been offline for a prolonged period of time and had accumulated a large number of extent changes, the resync operation could take a very long time. In Oracle 12c, the online and resync operation becomes a multithreaded operation, very similar to the ASM rebalance operation.
Thus the disk online can leverage a power
level from 1 to 1024, with 1 being the default. This power level
controls how many outstanding I/Os will be issued to the I/O subsystem, and thus has a direct impact on the performance of the system. Keep in mind that you are still bound by the server's I/O subsystem, so setting a very large power level does not necessarily improve resync time; the server where the resync operation runs can only process a certain number of I/Os concurrently. A power level between 8 and 16
has proven beneficial for resyncing a single disk, whereas a power level
of 8–32 has proven useful for bringing a failure group (with multiple
disks) online.
In versions prior to Oracle 12c, the
resync operation set and cleared flags (in the Staleness Registry) at the beginning and end of the operation; an interrupted resync operation had to be restarted from the beginning because the stale extent bit flags were cleared only at the end of the operation. In 12c
ASM, resync operations now support checkpoints. These checkpoints are
now set after a batch of extents are updated and their stale extent
flags cleared, thus making auto-restart begin at the last checkpoint. If
the resync operation fails or gets interrupted, it is automatically
restarted from the last resync phase and uses internally generated
resync checkpoints.
The following illustrates the command usage for performing an ASM fast disk resync:
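A minimal sketch, assuming an ASM-redundancy disk group named DATA, an offlined disk named DATA_0001, and a failure group named FG1:
SQL> ALTER DISKGROUP data ONLINE DISK data_0001 POWER 16;
SQL> ALTER DISKGROUP data ONLINE DISKS IN FAILGROUP fg1 POWER 32;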
Fast Disk Replacement
In versions 11gR2 and prior, a failed disk was taken offline or dropped, a new disk was put in its place (generally in the same tray slot), and then this disk was added back into the ASM disk group. This procedure required a complete disk group
rebalance. In Oracle 12c, the Fast Disk Replacement feature
allows a failed disk (or disks) to be replaced without requiring a
complete disk group rebalance operation. With the Fast Disk Replacement
feature, the disk is replaced in the disk tray slot and then added back
into the ASM disk group as a replacement disk. Initially this
disk is in an offline state and resynced (populated) with copies of ASM
extents from mirror extents from its partners. Note that because this is
a replacement disk, it inherits the same disk name and is automatically
placed back into the same failure group. The key benefit of the Fast
Disk Replacement feature is that it allows ASM administrators to replace
a disk using a fast, efficient, atomic operation with minimal system
impact because no disk group reorganization is necessary.
The main difference between Fast Disk Resync
and Fast Disk Replacement is that the disk has failed and is implicitly
dropped in Fast Disk Replacement, whereas in Fast Disk Resync the disk
is temporarily offline due to a transient path or component failure. If
the disk repair timer expires before the replacement disk can be put in
place, then users would have to use the regular disk add command to add
the replacement disk to the disk group.
The following illustrates the command for performing ASM Fast Disk Replacement:
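A minimal sketch, assuming the failed disk DATA_0001 is replaced by a new device presented at /dev/xvdf1:
SQL> ALTER DISKGROUP data REPLACE DISK data_0001 WITH '/dev/xvdf1' POWER 4;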
Failure Group Repair Timer
When an individual disk fails, the failure
is often terminal and the disk must be replaced. When all the disks in a
failure group fail simultaneously, it is unlikely that all the disks
individually failed at the same time. Rather, it is more likely that
some transient issue caused the failure. For example, a failure group
could fail because of a storage network outage. Because failure group
outages are more likely to be transient in nature, and because replacing
all the disks in a failure group is a far more expensive operation than
replacing a single disk, it makes sense for failure groups to have a
larger repair time to ensure that all the disks don’t get dropped
automatically in the event of a failure group outage. Administrators can
now specify a failure group repair time similar to the 11g disk repair timer. This includes a new disk group attribute called failgroup_repair_time. The default setting is 24 hours.
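A sketch of adjusting the attribute for a disk group named DATA (the 48-hour value is illustrative):
SQL> ALTER DISKGROUP data SET ATTRIBUTE 'failgroup_repair_time' = '48h';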
Rebalance Time Estimations
In Oracle 12c, the different phases
of the ASM rebalance operation are itemized with time estimations. In
versions prior to Oracle Database 12c, the rebalance work estimates were
highly variable.
With Oracle Database 12c, a more
detailed and accurate work plan is created at the beginning of each
rebalance operation. Additionally, administrators can produce a work
plan estimate before actually performing a rebalance operation, allowing
administrators to better plan storage changes and predict impact.
In Oracle Database 12c, administrators
can now use the new ESTIMATE WORK command to generate the work plan.
This work estimate populates the V$ASM_ESTIMATE view, and the EST_WORK
column can be used to estimate the number of ASM extents that will
be moved by the operation.
It is important to note that the unit in the
V$ASM_ESTIMATE view is ASM extents, and this does not provide an
explicit time estimate, such as the one provided in V$ASM_OPERATION.
The time estimate in V$ASM_OPERATION is based
on the current work rate observed during execution of the operation.
Because the current work rate can vary considerably, due to variations
in the overall system workload, administrators should use knowledge of
their environment and workload patterns to convert the data in
V$ASM_ESTIMATE into a time estimate if required.
The first step is generating a work estimate for the disk group rebalance operation:
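The command name may differ between the Beta2 build used for this chapter and the production release, where the equivalent SQL is EXPLAIN WORK; a sketch for a pending disk drop on a disk group named DATA:
SQL> EXPLAIN WORK FOR ALTER DISKGROUP data DROP DISK data_0001;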
Now, the work plan estimate that’s generated can be viewed:
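A sketch of the query (the group number is illustrative):
SQL> SELECT est_work FROM V$ASM_ESTIMATE WHERE group_number = 1;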
File Priority Rebalance
When a disk fails and no replacement is
available, the rebalance operation redistributes the data across the
remaining available disks in order to quickly restore redundancy.
With Oracle Database 12c, ASM
implements file priority ordered rebalance, which provides
priority-based restoration of the redundancy of critical files, such as
control files and online redo log files, to ensure that they are
protected if a secondary failure occurs soon afterward.
Flex ASM
In releases prior to 12c, an ASM
instance ran on every node in a cluster, and the databases communicated
via this local ASM instance for storage access. Furthermore, the ASM
instances communicated with each other and presented shared disk groups
to the database clients running in that cluster. This collection of ASM instances forms what is known as an ASM cluster domain.
Although this ASM architecture has been the standard since the inception of ASM, it does have some drawbacks:
Database
instances are dependent on a node-specific ASM instance. Thus, if an
ASM instance fails, all the database instances on that server fail as
well. Additionally, as the ASM cluster size grows, the number of ASM
instances grows and the communication overhead associated with managing
the storage increases.
ASM
overhead scales with the size of the cluster, and cluster
reconfiguration events increase with the number of servers in a cluster.
From an ASM perspective, larger clusters mean more frequent
reconfiguration events. A reconfiguration event is when a server enters
or departs a cluster configuration. From a cluster management
perspective, reconfiguration is a relatively expensive event.
With
Private Database Cloud and database consolidation, as the number of
database instances increases on a server, the importance and dependence
on the ASM instance increases.
The new Flex ASM feature in Oracle Release 12c changes this architecture with regard to ASM cluster organization and
communication. The Flex ASM feature includes two key sub-features or
architectures: Flex ASM Clustering and Remote ASM Access.
Flex ASM Clustering
In Oracle Release 12c, a smaller number of ASM instances run on a subset of servers in the cluster. The number of ASM instances is called the ASM cardinality. The default ASM cardinality is three, but that can be changed using the srvctl modify asm command. 12c
database instance connectivity is connection time load balanced across
the set of ASM instances. If a server running an ASM instance fails,
Oracle Clusterware will start a new ASM instance on a different server
to maintain the cardinality. If a 12c database instance is using a
particular ASM instance, and that instance is lost because of a server
crash or ASM instance failure, then the Oracle 12c database
instance will reconnect to an ASM instance on another node. The key
benefits of the Flex ASM Clustering feature include the following:
It eliminates the requirement for an ASM instance on every cluster server.
Database instances connect to any ASM instance in the cluster.
Database instances can fail over to a secondary ASM instance.
Administrators specify the cardinality of ASM instances (the default is three).
Clusterware ensures ASM cardinality is maintained.
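As noted above, the cardinality is adjusted with srvctl; a minimal sketch that raises it to four instances and then confirms the configuration:
$ srvctl modify asm -count 4
$ srvctl config asm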
The Flex ASM feature can be implemented in three different ways:
Pure 12c mode In this mode, the Grid Infrastructure and database are both running the 12c version. In this model, the database fully leverages all the new 12c features.
Mixed mode This mode includes two sub-modes: Standard mode and Flex Cluster mode.
With Standard mode, 12c Clusterware and ASM are hard-wired to each node, similar to the pre-12c deployment style. This model allows pre-12c and 12c databases to coexist. However, in the event of a node or ASM failure, only 12c databases can leverage the failover to an existing ASM instance (on another node).
In Flex Cluster mode, ASM instances run only on specific nodes (as determined by the cardinality). Pre-12c databases connect locally where ASM is running, and 12c databases can run on any node in the cluster and connect to ASM remotely.
Flex ASM Listeners
In order to support the Flex ASM feature,
the ASM listener was introduced. The ASM Listener, which is functionally
similar to the SCAN Listener, is a new global CRS resource with the
following key characteristics:
There are three ASM listeners in Flex ASM, and they run where an ASM instance is running.
ASM instances register with all ASM listeners.
Connectivity is load balanced across ASM instances.
Clients (DB instances) connect to ASM using
ASM listener endpoints. These clients connect using connect data
credentials defined by the Cluster Synchronization Services (CSS) Group
Membership Services (GMS) layer. The clients seek the best connection by
using the ASM listener on the local node if one is running there; if no ASM instance is running on the local node, the clients connect to any remote ASM instance in the cluster.
Flex ASM Network
In versions prior to 12c, Oracle
Clusterware required a public network for client application access and a
private network for internode communication within the cluster; this
included ASM traffic. The Flex ASM Network feature also provides the
capability to isolate ASM’s internal network traffic to its own
dedicated private network. The Oracle Universal Installer (OUI) presents
the DBA with a choice as to whether a dedicated network is to be used
for ASM. The ASM network is the communication path over which all the traffic between database instances and ASM instances flows. This
traffic is mostly metadata, such as a particular file’s extent map. If
the customer chooses, the ASM network can be dedicated to ASM traffic or shared with CSS; a separate dedicated network is not required.
Remote ASM Access
In previous versions, ASM clients used OS authentication to connect to ASM. This was a simple model because ASM clients and servers were always on the same server. With Oracle
Database 12c, ASM clients and ASM servers can be on different
servers (as part of the Flex ASM Network configuration). A default
configuration is created when the ASM cluster is formed, which is based
on the password specified for the ASM administrator at installation
time. Also, by default, the password file for ASM is now stored in an
ASM disk group. Having a common global password file addresses many
issues related to synchronizing separate password files on many servers
in a cluster. Additionally, the storing of password files in a disk group is extended to Oracle 12c databases as well. For database
instances, the DBCA utility executes commands to create an ASM user for
the operating system user creating the database. This is done
automatically without user intervention. Following this process, the
database user can remotely log into ASM and access ASM disk groups.
ASM Optimizations on Engineered Systems
In Chapter 12,
we described some of the ASM optimizations that were made specifically
for Engineered Systems such as Exadata and the Oracle Database Appliance
(ODA). There are several other important features in Oracle 12c ASM that support Engineered Systems. This section describes further ASM optimizations and features added in Oracle 12c for supporting Engineered Systems:
Oracle Database 12c
allows administrators to control the amount of resources dedicated to
disk resync operations. The ASM power limit can now be set for disk
resync operations, when disks are brought back online. This feature is
conceptually similar to the power limit setting for disk group
rebalance, with the range being 1 (least system resources) to 1024 (most
system resources).
If
a resync operation is interrupted and restarted, the previously
completed phases of the resync are skipped and processing recommences at
the beginning of the first remaining incomplete phase. Additionally,
these disk resync operations now have checkpoints enabled, such that an
interrupted resync operation is automatically restarted.
With Oracle Database 12c, extent
relocations performed by a rebalance operation can be offloaded to
Exadata Storage Server. Using this capability, a single offload request
can replace multiple read and write I/O requests. Offloading relocations
avoids sending data to the ASM host, thus improving rebalance
performance.
For
NORMAL and HIGH redundancy ASM disk groups, the algorithm that
determines the placement of secondary extents uses an adjacency measure
to determine the placement. In prior versions of ASM, the same algorithm
and adjacency measure were used for all disk groups. Oracle Database 12c
ASM provides administrators with the option to specify the content type
associated with each ASM disk group. Three possible settings are
allowed: data, recovery, and system. Each content type setting modifies
the adjacency measure used by the secondary extent placement algorithm.
The result is that the contents of disk groups with different content
type settings are distributed across the available disks differently.
This decreases the likelihood that a double-failure will result in data
loss across NORMAL redundancy disk groups with different content type
settings. Likewise, a triple-failure is less likely to result in data
loss for HIGH redundancy disk groups with different content type
settings.
Administrators can specify the content type for each disk group using the disk group attribute CONTENT.TYPE.
Possible values for the content type are data, recovery, or system.
Specifying different content types decreases the likelihood of a single
disk failure from impacting multiple disk groups in the same way.
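A sketch of tagging two disk groups with their content types (the disk group names are illustrative):
SQL> ALTER DISKGROUP data SET ATTRIBUTE 'content.type' = 'data';
SQL> ALTER DISKGROUP reco SET ATTRIBUTE 'content.type' = 'recovery';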
Error Checking and Scrubbing
In previous Oracle Database versions, when
data was read, a series of checks was performed on data to validate its
logical consistency. If a logical corruption was detected, ASM could
automatically recover by reading the mirror copies on NORMAL and HIGH
redundancy disk groups. One problem with this approach is that
corruption to seldom-accessed data could go unnoticed in the system for a
long time between reads. Also, the possibility of multiple corruptions
affecting all the mirror copies of data increases over time, so
seldom-accessed data may simply be unavailable when it is required.
Additionally, in releases prior to Oracle 12c, when an ASM extent
was moved during a rebalance operation, it was read and written without
any additional content or consistency checks.
ASM in Oracle 12c introduces proactive scrubbing capabilities, which check content consistency in flight (as data is accessed). Scrubbing provides the
capability to perform early corruption detection. Early detection of
corruption is vital because undetected corruption can compromise
redundancy and increases the likelihood of data loss.
Scrubbing is performed by a new background
process, SCRB, that performs various checks for logical data
corruptions. When a corruption is detected, the scrubbing process first
tries to use available mirrors to resolve the situation. If all the
mirror copies of data are corrupt or unavailable, the scrubbing process
gives up and the user can recover the corrupted blocks from an RMAN
backup if one is available.
Scrubbing can be invoked implicitly during rebalance operations, or an administrator can scrub specific areas on demand at the disk group level. To perform on-demand scrubbing, the following command can be executed:
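A minimal sketch for a disk group named DATA; the optional REPAIR clause asks ASM to fix corruptions it can resolve from mirror copies:
SQL> ALTER DISKGROUP data SCRUB POWER LOW;
SQL> ALTER DISKGROUP data SCRUB REPAIR POWER HIGH;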
When
scrubbing occurs during a rebalance, extents that are read during the
rebalance undergo a series of internal checks to ensure their logical
integrity. Scrubbing in the rebalance operation requires a new attribute
to be set for the disk group.
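In the production 12c release this is exposed as the content.check disk group attribute; assuming that attribute, a sketch:
SQL> ALTER DISKGROUP data SET ATTRIBUTE 'content.check' = 'TRUE';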
The content checking includes hardware
assisted resilient data (HARD), which includes checks on user data,
validation of file types from the file directory against the block
contents and file directory information, and mirror side comparisons.
Other Miscellaneous Flex ASM Features
Other notable features that are part of Flex ASM include:
The maximum number of ASM disk groups is increased from 63 to 511.
ASM instances in an ASM cluster validate each other's patch levels. This check is disabled during rolling upgrades; at the end of a rolling upgrade, patch level consistency is validated.
ASM physical metadata, such as disk headers and allocation tables, is now replicated. Previously, only virtual metadata was replicated when ASM mirroring was used.
Summary
Since its inception, ASM has grown from
being a purpose-built volume manager for the database to a feature-rich
storage manager that supports all database-related files and includes a
POSIX-compliant cluster file system. In addition, ASM has become the centerpiece of Oracle Engineered Systems. Oracle 12c ASM addresses extreme scalability and the management of real-world data types, and it removes many of the limitations of previous generations of ASM. ASM has also evolved to meet the cloud computing demands of consolidation, high utilization, and high availability.