Friday 7 October 2022

Exadata Storage Server And Disk Layout


Exadata Storage Server


  • Exadata Storage Server is highly optimized storage for Oracle Database.
  • It delivers outstanding I/O and SQL processing performance for the database.
  • A single storage server is also called a cell. A cell is the building block of the storage grid.
  • Each cell has its own OS (Linux x86_64), CPUs, memory, a bus, disks and network adapters.
  • The storage server software is backed up automatically. Each cell uses an internal USB drive, called the CELLBOOT flash drive, to keep a backup of its software.
There are two types of Exadata Storage Server:
    
1) High Capacity Storage Server - It comes with more storage space on high-capacity (lower RPM) disk drives. The maximum SQL bandwidth for a full-rack database machine (14 cells) is 25 GB/s.
2) Extreme Flash Storage Server - It comes with less raw capacity but higher performance; data permanently resides on high-performance flash drives (there are no spinning disks). The maximum SQL bandwidth for a full-rack database machine (14 cells) is 263 GB/s.
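
To check which model a particular cell is, you can query the cell attributes from CellCLI. This is a minimal sketch; the attribute name shown is standard, but the output depends on your hardware generation.

CellCLI> LIST CELL ATTRIBUTES name, makeModel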


Advantages of the Storage Server

  • The database can offload some processing to the storage server; this is called Smart Scan.
  • It is highly optimized for fast processing of large queries.
  • It intelligently uses high-performance flash memory to boost performance.
  • It uses the InfiniBand network for higher throughput.
  • It supports Hybrid Columnar Compression, which provides a high level of data compression.
  • It manages I/O resources through IORM (a minimal example follows this list).
  • It uses ASM to evenly distribute the storage load for every database.
  • We can assign dedicated storage to a single database. Shared storage is not always a perfect solution: running multiple types of workloads and databases on shared storage often leads to performance problems. Large parallel queries on one database can impact the performance of critical queries on another database, and a data load on an analytics database can impact the performance of critical queries running on it.
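
Below is a minimal, hedged sketch of an IORM inter-database plan set from CellCLI. The database name PRODDB and the allocations are purely illustrative; adjust them for your environment.

CellCLI> ALTER IORMPLAN objective='auto'
CellCLI> ALTER IORMPLAN dbplan=((name=PRODDB, level=1, allocation=75), (name=other, level=2, allocation=100))
CellCLI> LIST IORMPLAN DETAIL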

 Disk Layout


The disk layout needs some additional explanation because that's where most of the activity occurs. The disks are attached to the storage cells and presented as logical units (LUNs), on which physical volumes are built. Each cell has 12 physical disks. In a high capacity configuration they are about 8 TB each, and in an extreme flash configuration the flash drives are about 3.2 TB each. The disks are used for database storage. Two of the 12 disks are also used for the home directory and other Linux operating system files.

  • The physical disks are divided into multiple partitions. Each partition is then presented as a LUN to the cell. Some LUNs are used to create a file system for the OS. The others are presented as storage to the cell. These are called cell disks. The cell disks are further divided into grid disks, and these grid disks are used to build ASM disk groups, so they are used as ASM disks. An ASM disk group is made up of several ASM disks from multiple storage cells. If the disk group is built with normal or high redundancy (which is the usual case), the failure groups are placed in different cells. As a result, if one cell fails, the data is still available on other cells. Finally, the database is built on these disk groups.

  • Cell disks and grid disks are logical components of the physical Exadata storage. A cell, or Exadata Storage Server, is a combination of disk drives put together to store user data.

  • Each Cell Disk corresponds to a LUN (Logical Unit) which has been formatted by the Exadata Storage Server Software. Typically, each cell has 12 disk drives mapped to it.

  • Grid Disks are created on top of Cell Disks and are presented to Oracle ASM as ASM disks. Space is allocated in chunks starting from the outer tracks of the cell disk and moving inwards. One can have multiple Grid Disks per Cell disk.
    In Exadata, a LUN (Logical Unit) is a logical abstraction of a storage device. LUNs are based on hard disks and flash devices. LUNs are automatically created when Exadata is initially configured. Each Exadata cell contains 12 hard disk-based LUNs along with 4 flash-based LUNs.
     List the LUNs on your primary Exadata cell/Storage Server.
   
CellCLI> list lun


  • A cell disk is a higher level storage abstraction. Each cell disk is based on a LUN and contains additional attributes and metadata. 
Examine the attributes of the cell disk based on the LUN.
         CellCLI> list celldisk CD_09_qr01celadm01 detail
  • A grid disk defines an area of storage on a cell disk. Grid disks are consumed by ASM and are used as the storage for ASM disk groups. 
  • Each cell disk can contain a number of grid disks. Examine the grid disks associated with the cell disk from the previous step, and note the names and sizes of the grid disks. A sketch showing how grid disks are created follows the listing below.

CellCLI> list griddisk where celldisk=CD_09_qr01celadm01 detail
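
As a hedged sketch of how grid disks are created (the DATA prefix and 300G size are purely illustrative), CREATE GRIDDISK can be run either for all hard disk based cell disks at once, or for a single cell disk:

CellCLI> CREATE GRIDDISK ALL HARDDISK PREFIX=DATA, SIZE=300G
CellCLI> CREATE GRIDDISK DATA_CD_09_qr01celadm01 CELLDISK=CD_09_qr01celadm01, SIZE=300G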
 
  • By default, Exadata Smart Flash Cache is configured across all the flash-based cell disks. Use the LIST FLASHCACHE DETAIL command to confirm that Exadata Smart Flash Cache is configured on your flash-based cell disks.
   CellCLI> list flashcache detail
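
If the flash cache ever needs to be recreated (for example, after changing its size), the commands look broadly like the sketch below. This is normally not required, since the flash cache is configured by default.

CellCLI> DROP FLASHCACHE
CellCLI> CREATE FLASHCACHE ALL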


                                Storage Objects in Exadata


In an Exadata environment we have the following disk types:

Physical disk is a hard disk on a storage cell. Each storage cell has 12 physical disks, all with the same capacity (600 GB, 2 TB or 3 TB on the X2/X3 generations; later generations ship with larger drives, such as the 8 TB disks mentioned above).

Flashdisk is a Sun Flash Accelerator PCIe solid state disk on a storage cell. Each storage cell has 16 flashdisks - 24 GB each in X2 (Sun Fire X4270 M2) and 100 GB each in X3 (Sun Fire X4270 M3) servers.

Celldisk is a logical disk created on every physicaldisk and every flashdisk on a storage cell. Celldisks created on physicaldisks are named CD_00_cellname, CD_01_cellname ... CD_11_cellname. Celldisks created on flashdisks are named FD_00_cellname, FD_01_cellname ... FD_15_cellname.

Griddisk is a logical disk that can be created on a celldisk. In a standard Exadata deployment we create griddisks on hard disk based celldisks only. While it is possible to create griddisks on flashdisks, this is not a standard practice; instead, we use flash based celldisks for the flash cache and flash log.

ASM disk - in an Exadata environment, an ASM disk is a grid disk. ASM disks are used to create ASM disk groups, and both the ASM and database instances have access to them.
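
A quick way to see how grid disks appear on the ASM side is to query V$ASM_DISK from an ASM instance, as in this sketch:

SQL> SELECT name, path, failgroup, state FROM v$asm_disk ORDER BY name;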

        Auto Disk Management Concept in Exadata

These are the disk operations that are automated in Exadata:

1. Grid disk status change to OFFLINE/ONLINE

If a griddisk becomes temporarily unavailable, it will be automatically OFFLINED by ASM. When the griddisk becomes available, it will be automatically ONLINED by ASM.

2. Grid disk DROP/ADD

If a physical disk fails, all grid disks on that physical disk will be DROPPED with FORCE option by ASM. If a physical disk status changes to predictive failure, all griddisks on that physical disk will be DROPPED by ASM. If a flash disk performance degrades, the corresponding griddisks (if any) will be DROPPED with FORCE option by ASM.

When a physical disk is replaced, the celldisk and griddisks will be recreated by CELLSRV, and the griddisks will be automatically ADDED by ASM.

NOTE: If a griddisk that is in NORMAL state and ONLINE mode status is manually dropped with the FORCE option (for example, by a DBA with 'alter diskgroup ... drop disk ... force'), it will be automatically added back by ASM. In other words, dropping a healthy disk with the force option will not achieve the desired effect.

3. Grid disk OFFLINE/ONLINE for rolling Exadata software (storage cells) upgrade

Before the rolling upgrade all griddisks will be inactivated on the storage cell by CELLSRV and OFFLINED by ASM. After the upgrade all griddisks will be activated on the storage cell and ONLINED in ASM.
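
Before taking a cell's grid disks offline for patching, a common check (sketched below) is to confirm that ASM can tolerate the outage; asmdeactivationoutcome should report Yes for all grid disks.

CellCLI> LIST GRIDDISK ATTRIBUTES name, asmmodestatus, asmdeactivationoutcome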

4. Manual grid disk activation/inactivation

If a griddisk is manually inactivated on a storage cell, by running 'cellcli -e alter griddisk ... inactive', it will be automatically OFFLINED by ASM. When the griddisk is activated on the storage cell again, it will be automatically ONLINED by ASM.

5. Grid disk confined ONLINE/OFFLINE

If a grid disk is taken offline by CELLSRV because the underlying disk is suspected of poor performance, all grid disks on that cell disk will be automatically OFFLINED by ASM. If the tests confirm that the cell disk is performing poorly, ASM will drop all grid disks on that cell disk. If the tests find that the disk is actually fine, ASM will online all grid disks on that cell disk.

Software components

1. Cell Server (CELLSRV)

The Cell Server (CELLSRV) runs on the storage cell and it's the main component of Exadata software. In the context of automatic disk management, its tasks are to process the Management Server notifications and handle ASM queries about the state of griddisks.

2. Management Server (MS)

The Management Server (MS) runs on the storage cell and implements a web service for cell management commands, and runs background monitoring threads. The MS monitors the storage cell for hardware changes (e.g. disk plugged in) or alerts (e.g. disk failure), and notifies the CELLSRV about those events.

3. Automatic Storage Management (ASM)

The Automatic Storage Management (ASM) instance runs on the compute (database) node and has two processes that are relevant to the automatic disk management feature:

Exadata Automation Manager (XDMG) initiates automation tasks involved in managing Exadata storage. It monitors all configured storage cells for state changes, such as a failed disk getting replaced, and performs the required tasks for such events. Its primary tasks are to watch for inaccessible disks and cells and when they become accessible again, to initiate the ASM ONLINE operation.

Exadata Automation Manager (XDWK) performs automation tasks requested by XDMG. It gets started when asynchronous actions such as disk ONLINE, DROP and ADD are requested by XDMG. After a 5 minute period of inactivity, this process will shut itself down.
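
As a simple check (the hostname is illustrative), you can confirm that these components are running: CELLSRV and MS on the storage cell, and the XDMG/XDWK background processes on a compute node.

CellCLI> LIST CELL ATTRIBUTES cellsrvStatus, msStatus, rsStatus

[oracle@dbnode01 ~]$ ps -ef | egrep 'xdmg|xdwk' | grep -v grep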

Working together

All three software components work together to achieve automatic disk management.

In the case of disk failure, the MS detects that the disk has failed. It then notifies the CELLSRV about it. If there are griddisks on the failed disk, the CELLSRV notifies ASM about the event. ASM then drops all griddisks from the corresponding disk groups.

In the case of a replacement disk inserted into the storage cell, the MS detects the new disk and checks the cell configuration file to see if celldisk and griddisks need to be created on it. If yes, it notifies the CELLSRV to do so. Once finished, the CELLSRV notifies ASM about new griddisks and ASM then adds them to the corresponding disk groups.

In the case of a poorly performing disk, the CELLSRV first notifies ASM to offline the disk. If possible, ASM then offlines the disk. One example when ASM would refuse to offline the disk, is when a partner disk is already offline. Offlining the disk would result in the disk group dismount, so ASM would not do that. Once the disk is offlined by ASM, it notifies the CELLSRV that the performance tests can be carried out. Once done with the tests, the CELLSRV will either tell ASM to drop that disk (if it failed the tests) or online it (if it passed the test).

The actions of MS, CELLSRV and ASM are coordinated in a similar fashion for other disk events.

ASM initialization parameters

The following are the ASM initialization parameters relevant to the auto disk management feature:

_AUTO_MANAGE_EXADATA_DISKS controls the auto disk management feature. To disable the feature set this parameter to FALSE. Range of values: TRUE [default] or FALSE.

_AUTO_MANAGE_NUM_TRIES controls the maximum number of attempts to perform an automatic operation. Range of values: 1-10. Default value is 2.

_AUTO_MANAGE_MAX_ONLINE_TRIES controls the maximum number of attempts to ONLINE a disk. Range of values: 1-10. Default value is 3.

All three parameters are static, which means they require ASM instances restart. Note that all these are hidden (underscore) parameters that should not be modified unless advised by Oracle Support.
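
For illustration only (and, as noted above, only under the direction of Oracle Support), disabling the feature would look like the following sketch, followed by a restart of the ASM instances:

SQL> ALTER SYSTEM SET "_AUTO_MANAGE_EXADATA_DISKS"=FALSE SCOPE=SPFILE SID='*';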

Files

The following are the files relevant to the automatic disk management feature:

1. Cell configuration file - $OSSCONF/cell_disk_config.xml. An XML file on the storage cell that contains information about all configured objects (storage cell, disks, IORM plans, etc) except alerts and metrics. The CELLSRV reads this file during startup and writes to it when an object is updated (e.g. updates to IORM plan).

2. Grid disk file - $OSSCONF/griddisk.owners.dat. A binary file on the storage cell that contains the following information for all griddisks:

ASM disk name

ASM disk group name

ASM failgroup name

Cluster identifier (which cluster this disk belongs to)

Requires DROP/ADD (should the disk be dropped from or added to ASM)

3. MS log and trace files - ms-odl.log and ms-odl.trc in $ADR_BASE/diag/asm/cell/`hostname -s`/trace directory on the storage cell.

4. CELLSRV alert log - alert.log in $ADR_BASE/diag/asm/cell/`hostname -s`/trace directory on the storage cell.

5. ASM alert log - alert_+ASMn.log in $ORACLE_BASE/diag/asm/+asm/+ASMn/trace directory on the compute node.

6. XDMG and XDWK trace files - +ASMn_xdmg_nnnnn.trc and +ASMn_xdwk_nnnnn.trc in $ORACLE_BASE/diag/asm/+asm/+ASMn/trace directory on the compute node.
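
For example, the MS log from item 3 can be followed in real time on the storage cell; this sketch assumes ADR_BASE is set to the cell's ADR base directory, as in the paths above:

tail -f $ADR_BASE/diag/asm/cell/`hostname -s`/trace/ms-odl.log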


Exadata – Change Diskgroup Redundancy from High to Normal

Step 1: Drop the RECO_FHDB diskgroup. The ASM alert log shows:

SUCCESS:  drop diskgroup RECO_FHDB including contents

Tue Jul 17 22:55:09 2012

NOTE: diskgroup resource ora.RECO_FHDB.dg is dropped

Step 2: Extract the DDL (CREATE DISKGROUP command) for RECO_FHDB from the ASM alert log, replace the redundancy clause, and run the CREATE DISKGROUP command on the ASM instance.

SQL> CREATE DISKGROUP RECO_FHDB NORMAL REDUNDANCY  DISK

'o/192.168.10.10/RECO_FHDB_CD_00_fhdbcel06',

….

'o/192.168.10.9/RECO_FHDB_CD_11_fhdbcel05' ATTRIBUTE

'compatible.asm'='11.2.0.2','compatible.rdbms'='11.2.0.2','au_size'='4M','cell.smart_scan_capable'='TRUE' /* ASMCA */

SUCCESS: diskgroup RECO_FHDB was mounted


The ASM spfile, OCR and voting disks were located on the DATA_FHDB diskgroup, so I had to relocate these files from DATA_FHDB to RECO_FHDB before recreating the DATA_FHDB diskgroup with normal redundancy.

Step 1: Dropping the diskgroup throws the following error when the ASM SPFILE is located on the same diskgroup.

SQL> drop diskgroup DATA_FHDB including contents

NOTE: Active use of SPFILE in group

Wed Jul 18 14:49:29 2012

GMON querying group 1 at 18 for pid 19, osid 9914

Wed Jul 18 14:49:29 2012

NOTE: Instance updated compatible.asm to 11.2.0.2.0 for grp 1

ORA-15039: diskgroup not dropped

ORA-15027: active use of diskgroup "DATA_FHDB" precludes its dismount

Step 2: Move OCR and voting disk to RECO_FHDB

[oracle@fhdbdb01 ~]$ ocrcheck

Status of Oracle Cluster Registry is as follows :

         Version                  :          3

         Total space (kbytes)     :     262120

         Used space (kbytes)      :       3344

         Available space (kbytes) :     258776

         ID                       : 1272363019

         Device/File Name         : +DATA_FHDB

                                    Device/File integrity check succeeded 

                                    Device/File not configured 

                                    Device/File not configured 

                                    Device/File not configured 

                                    Device/File not configured 

         Cluster registry integrity check succeeded 

         Logical corruption check bypassed due to non-privileged user

[root@fhdbdb01 cssd]# ocrconfig -add +RECO_FHDB

[root@fhdbdb01 cssd]# 

[root@fhdbdb01 cssd]# ocrconfig -delete +DATA_FHDB

[root@fhdbdb01 cssd]# ocrcheck

Status of Oracle Cluster Registry is as follows :

         Version                  :          3

         Total space (kbytes)     :     262120

         Used space (kbytes)      :       3364

         Available space (kbytes) :     258756

         ID                       : 1272363019

         Device/File Name         : +RECO_FHDB

                                    Device/File integrity check succeeded 

                                    Device/File not configured 

                                    Device/File not configured 

                                    Device/File not configured 

                                    Device/File not configured 

         Cluster registry integrity check succeeded 

         Logical corruption check succeeded

[root@fhdbdb01 ~]$ crsctl query css votedisk

##  STATE    File Universal Id                File Name Disk group

--  -----    -----------------                --------- ---------

 1. ONLINE   75c79c52f88b4fcebf2f84ccad0be646 (o/192.168.10.10/DATA_FHDB_CD_00_fhdbcel06) [DATA_FHDB]

 2. ONLINE   14f6d0e1c8b94f3bbf222b821f7f48ab (o/192.168.10.11/DATA_FHDB_CD_00_fhdbcel07) [DATA_FHDB]

 3. ONLINE   7aed830fb6ee4f70bf9160b2f39ea64b (o/192.168.10.5/DATA_FHDB_CD_00_fhdbcel01) [DATA_FHDB]

 4. ONLINE   9cc87608cabd4fb0bfea7e1f7d403134 (o/192.168.10.6/DATA_FHDB_CD_00_fhdbcel02) [DATA_FHDB]

 5. ONLINE   2c6008a2c0864fbfbf4ae1c9cbc60d5c (o/192.168.10.7/DATA_FHDB_CD_00_fhdbcel03) [DATA_FHDB] 


[root@fhdbdb01 cssd]# crsctl replace votedisk +RECO_FHDB

Successful addition of voting disk 161fa97cc71e4fffbfe10408e1e32aa0.

Successful addition of voting disk 128fb088bd7c4fe7bf6dff63d946dbc6.

Successful addition of voting disk 804b6348a5974f53bfccb328b92f9350.

Successful deletion of voting disk 75c79c52f88b4fcebf2f84ccad0be646.

Successful deletion of voting disk 14f6d0e1c8b94f3bbf222b821f7f48ab.

Successful deletion of voting disk 7aed830fb6ee4f70bf9160b2f39ea64b.

Successful deletion of voting disk 9cc87608cabd4fb0bfea7e1f7d403134.

Successful deletion of voting disk 2c6008a2c0864fbfbf4ae1c9cbc60d5c.

Successfully replaced voting disk group with +RECO_FHDB.

CRS-4266: Voting file(s) successfully replaced 


[root@fhdbdb01 cssd]# crsctl query css votedisk

##  STATE    File Universal Id                File Name Disk group

--  -----    -----------------                --------- ---------

 1. ONLINE   161fa97cc71e4fffbfe10408e1e32aa0 (o/192.168.10.10/RECO_FHDB_CD_00_fhdbcel06) [RECO_FHDB]

 2. ONLINE   128fb088bd7c4fe7bf6dff63d946dbc6 (o/192.168.10.11/RECO_FHDB_CD_00_fhdbcel07) [RECO_FHDB]

 3. ONLINE   804b6348a5974f53bfccb328b92f9350 (o/192.168.10.5/RECO_FHDB_CD_00_fhdbcel01) [RECO_FHDB]

Located 3 voting disk(s).


Step 3: Move ASM spfile.

SQL> create pfile='/nfs/zfs/init+ASM.ora' from spfile; 

File created. 

SQL> create spfile='+RECO_FHDB/fhdb-cluster/asmparameterfile/spfileASM.ora' from pfile='/nfs/zfs/init+ASM.ora'; 

File created. 

echo "SPFILE='+RECO_FHDB/fhdb-cluster/asmparameterfile/spfileASM.ora'" > init+ASM.ora

Step 4: Drop DATA_FHDB diskgroup

SQL> drop diskgroup DATA_FHDB including contents;

drop diskgroup DATA_FHDB including contents

*

ERROR at line 1:

ORA-15039: diskgroup not dropped

ORA-15027: active use of diskgroup "DATA_FHDB" precludes its dismount

ASMCMD> cd DATA_FHDB/

ASMCMD> ls

fhdb-cluster/

ASMCMD> cd fhdb-cluster

ASMCMD> ls

ASMPARAMETERFILE/

OCRFILE/

ASMCMD> cd ASMPARAMETERFILE/

ASMCMD> ls

REGISTRY.253.788355279

ASMCMD> rm REGISTRY.253.788355279

ORA-15032: not all alterations performed

ORA-15028: ASM file '+DATA_FHDB/fhdb-cluster/ASMPARAMETERFILE/REGISTRY.253.788355279' not dropped; currently being accessed (DBD ERROR: OCIStmtExecute)

SQL> alter diskgroup DATA_FHDB dismount force;

Diskgroup altered.

SQL> drop diskgroup DATA_FHDB force including contents;

Diskgroup dropped.


Step 5: Create DATA_FHDB diskgroup
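
As in Step 2 for RECO_FHDB, extract the original CREATE DISKGROUP command for DATA_FHDB from the ASM alert log, change the redundancy clause to NORMAL, and run it. The sketch below mirrors the earlier command; the full disk list is elided and must come from your own alert log.

SQL> CREATE DISKGROUP DATA_FHDB NORMAL REDUNDANCY  DISK

'o/192.168.10.5/DATA_FHDB_CD_00_fhdbcel01',

….

'o/192.168.10.11/DATA_FHDB_CD_11_fhdbcel07' ATTRIBUTE

'compatible.asm'='11.2.0.2','compatible.rdbms'='11.2.0.2','au_size'='4M','cell.smart_scan_capable'='TRUE';

Once the disk group is mounted, the OCR, voting disks and ASM spfile can be moved back to DATA_FHDB using the same procedures shown in Steps 2 and 3, if desired.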


Configuring Hosts to Access Exadata Cells

Configuration files on each database server enable access to Exadata storage.

  • cellinit.ora identifies the storage network (InfiniBand) interface IP addresses on the database server. This file is host specific and contains the IP addresses of the InfiniBand storage network interfaces for that database server, specified in Classless Inter-Domain Routing (CIDR) format.
  • cellip.ora identifies the Exadata cells that are accessible to the database server. (Example contents of both files appear after this list.)

  • To ensure that ASM discovers Exadata grid disks, set the ASM_DISKSTRING initialization parameter. A search string with the following form is used to discover Exadata grid disks:

    o/<cell IP address>/<grid disk name>

    Wildcards may be used to expand the search string. For example, to discover all the available Exadata grid disks, set ASM_DISKSTRING='o/*/*'. To discover a subset of available grid disks having names that begin with data, set ASM_DISKSTRING='o/*/data*'.
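
For reference, a minimal sketch of what these files typically look like on a compute node (the hostname, interface IP and cell IPs are illustrative, following the 192.168.10.x addresses used elsewhere in this post):

[oracle@dbnode01 ~]$ cat /etc/oracle/cell/network-config/cellinit.ora
ipaddress1=192.168.10.1/22

[oracle@dbnode01 ~]$ cat /etc/oracle/cell/network-config/cellip.ora
cell="192.168.10.5"
cell="192.168.10.6"
cell="192.168.10.7"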


    Bear in mind the following general considerations when reconfiguring Exadata storage:


Reconfiguring an existing disk group requires the ability to drop disks from the disk group, reconfigure them and then add them back into the disk group. If the amount of free space in the disk group is greater than the REQUIRED_MIRROR_FREE_MB value reported in V$ASM_DISKGROUP, then you can use methods which reconfigure the disk group one cell at a time. If the free space is less than REQUIRED_MIRROR_FREE_MB, then you may need to reorganize your storage to create more free space. It may also be possible, though not recommended, to reconfigure the storage one disk at a time.

Best practice recommends that all disks in an ASM disk group should be of equal size and have equal performance characteristics. For Exadata this means that all the grid disks allocated to a disk group should be the same size and occupy the same region on each disk. There should not be a mixture of interleaved and non-interleaved grid disks, and likewise there should not be a mixture of disks from high-capacity cells and extreme flash (high-performance) cells.
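
A simple sketch for checking the free space figures mentioned above, run from an ASM instance:

SQL> SELECT name, total_mb, free_mb, required_mirror_free_mb, usable_file_mb FROM v$asm_diskgroup;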

  • If you try to drop a grid disk without the FORCE option, the command will not be processed and an error will be displayed if the grid disk is being used by an ASM disk group. If you remove a disk from an ASM disk group, ensure that the resulting rebalance operation completes before attempting to drop the associated grid disk. If you need to use the DROP GRIDDISK command with the FORCE option, use extreme caution, since incorrectly dropping an active grid disk could result in data loss. Similarly, if you try to drop a cell disk without the FORCE option, the command will not be processed and an error will be displayed if the cell disk contains any grid disks. It is possible to use the DROP CELLDISK command with the FORCE option to drop a cell disk and all the associated grid disks; again, use the FORCE option with extreme caution since incorrectly dropping an active grid disk could result in data loss.

  • Clusterware files (cluster registry and voting disks) are stored by default in a special ASM disk group named DBFS_DG. Resizing the DBFS_DG disk group is generally not recommended, since the grid disks associated with it are sized specially to match the size of the system areas on the first two disks in each cell. If there is a requirement to alter this disk group, or the underlying grid disks or cell disks, special care must be taken to preserve the clusterware files.

  • Reconfiguring Exadata storage on an active system without any downtime is possible; however, doing so can be a time-consuming process involving many ASM rebalancing operations. The time required depends on the number of storage cells, the existing disk usage and the load on the system.



         How to configure Exadata cell alert notification?


We can configure SMTP alerts at the cell level, using the CellCLI utility on each cell and providing the details shown below, so that alert notifications are delivered by e-mail.


To check the current configuration:

CellCLI> list cell detail
.........
notificationMethod:     mail
notificationPolicy:     critical,warning,clear
.........
smtpFrom:               "Oracle Database Machine"
smtpFromAddr:           exadata@company.com
smtpPort:               25
smtpPwd:                ******
smtpServer:             192.168.1.11
smtpToAddr:             "system@company.com, admin@company.com"
smtpUser:               exadata
smtpUseSSL:             FALSE

CellCLI> ALTER CELL smtpServer='mailserver.example.com',
             smtpFromAddr='exadataalert@easyreliable.com',
             smtpFrom='Exadata Alert',         
             smtpToAddr='exaadtadba@easyreliable.com',
             notificationPolicy='maintenance,clear,warning,critical',
             notificationMethod='mail,snmp'

Here,

smtpServer - the mail server name
smtpFromAddr - the e-mail address from which alerts will be sent
smtpToAddr - the e-mail address(es) to which alerts will be sent
notificationPolicy - defines which alert levels trigger a notification
notificationMethod - the method of notification (mail, snmp)


Validate email notification on cell by executing

CellCLI> ALTER CELL VALIDATE MAIL

We can also change the format of the alert e-mails by executing commands like:

CellCLI> ALTER CELL emailFormat='text'
CellCLI> ALTER CELL emailFormat='html'


To disable alert e-mails:


CellCLI>alter cell notificationMethod=null
 
To re-enable alert e-mails:

CellCLI>alter cell notificationMethod='mail,snmp'

or
CellCLI>alter cell notificationMethod='mail'
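
Once notification is configured, the alerts raised on a cell can also be reviewed directly from CellCLI, for example:

CellCLI> LIST ALERTHISTORY
CellCLI> LIST ALERTHISTORY WHERE severity='critical' DETAIL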








