RMAN incrementally updated image copy backups

Oracle RMAN’s incrementally updated image copy backups is neither a new feature, neither an undocumented one.

Thus, why should I care writing about it? I don’t like reinventing the wheel; but I do like to have neat scripts, meaningful file names, low RTO and a large recovery windows.

Moreover, what if I want to have both tape-based rman backup (with a wide recoery window) and disk-based image backups (with a much shorter recovery window, say 2 days)? The former could be ideal as long-term retention, off-site backup while the latter could by suitable to run faster

These are the reasons why I did some more investigations on this topic, and my findings.

I started with this script:

RUN {

  ALLOCATE CHANNEL DISK1 TYPE DISK FORMAT '/u01/app/oracle/backup-image/%d/%b' ;
  ALLOCATE CHANNEL DISK2 TYPE DISK FORMAT '/u01/app/oracle/backup-image/%d/%b' ;

  RECOVER COPY OF DATABASE WITH TAG 'BACKUP_DISK_INCR';

  BACKUP
   INCREMENTAL LEVEL 1
   FOR RECOVER OF COPY WITH TAG 'BACKUP_DISK_INCR'
   AS BACKUPSET
   FORMAT '/u01/app/oracle/backup-image/%d/rman_%d_%T_%U_incr.bkp'
   DATABASE ;

  RELEASE CHANNEL DISK1 ;
  RELEASE CHANNEL DISK2 ;
}

Please note that we are allocating channel with an explicit FORMAT even if we do specify a format when we run a backup. This is not an error neither redundant because

  1. the format specified in the BACKUP command will be used for INCR. LEVEL 1 backups, so that any incremental backup file will be something like /u01/app/oracle/backup-image/ORCL/rman_ORCL_YYYYMMDD_xxxxxxxx_y_z_incr.bkp
  2. the format specified in the ALLOCATE CHANNEL will be used for LEVEL 0 backup, i.e.
    when RMAN is not able to run an “incremental level 1” backup because it does not
    find any LEVEL 0 backup with the same TAG and it has to revert to a level 0
    backup, using the “default” format specified.
    Some possible options for the FORMAT are:
    1. /copy/%b as %b Specifies the file name stripped of directory paths
      so that /copy/%b would be /copy/system01.dbf for /oracle/oradata/system01.dbf
    2. /copy/%U as %U translates to “data-D-%d_id-%I_TS-%N_FNO-%f_%u” for datafiles image copies, so that %U may be “data-D-ORCL_id-NNNNNNNN_TS-1_FNO-1_xxxxxxxx” for “system01.dbf” since
      1. %d is Database name
      2. %I is DBID
      3. %N is Tablespace name
      4. %f is Absolute file number
      5. %u is 8-character short system generated file name (a compressed representations of the backup set or image copy number and the time the image copy was created)
    3. /copy/%b-%U to use a unique identified (should you do more copies of the same
      datafiles) and still have a maningful name, i.e. /copy/system01.dbf-data-D-ORCL_id-NNNNNNNN_TS-1_FNO-1_xxxxxxxx for “/oradata/system01.dbf“, /copy/users01.dbf-data-D-ORCL_id-NNNNNNNN_TS-4-FNO-4_xxxxxxxx for “/oradata/users01.dbf” , …

Next, I want to introduce a 2-days recovery window on this backup, so I changed the recover command as follow

RECOVER COPY OF DATABASE WITH TAG 'BACKUP_DISK_INCR' UNTIL TIME 'SYSDATE - 2';

As a side effect, this would cause longer recovery times (an up-to-sysdate recover would need to read and apply each incremental backup). To keep acceptable recovery times we can use cumulative backups; we may also choose to compress them to save space. If we can use multiple channel we can also split incremental backups (using both multisection backups and multi-piece backups) to allow massive parallel restore. Our backup command become:

BACKUP
  SECTION SIZE 512M
  INCREMENTAL LEVEL 1 CUMULATIVE
  FOR RECOVER OF COPY WITH TAG 'BACKUP_DISK_INCR'
  FILESPERSET 1
  AS COMPRESSED BACKUPSET
  FORMAT '/u01/app/oracle/backup-image/%d/rman_%d_%T_%U_incr.bkp'
  DATABASE ;

To complete the script, I should add at least archivelog backup, controlfile backup, obsolete backup cleanup.

To delete old backup we can run a quite simple DELETE NOPROMPT OBSOLETE ;. However, should you run both disk-based image backups and tape-based backup with different recovery windows you should run something like the following (see MOS note 1494690.1 RMAN: How to use two retention policies – one for disk and another for tape )

DELETE NOPROMPT OBSOLETE RECOVERY WINDOW OF 3 DAYS DEVICE TYPE DISK ;

The whole rman scripts is:

RUN {
ALLOCATE CHANNEL DISK1 TYPE DISK FORMAT '/u01/app/oracle/backup-image/%d/%b' ;
ALLOCATE CHANNEL DISK2 TYPE DISK FORMAT '/u01/app/oracle/backup-image/%d/%b' ;

RECOVER COPY OF DATABASE WITH TAG 'BACKUP_DISK_INCR' UNTIL TIME 'SYSDATE - 2';

DELETE NOPROMPT OBSOLETE RECOVERY WINDOW OF 3 DAYS DEVICE TYPE DISK ;

BACKUP
 SECTION SIZE 512M
 INCREMENTAL LEVEL 1 CUMULATIVE
 FOR RECOVER OF COPY WITH TAG 'BACKUP_DISK_INCR'
 FILESPERSET 1
 AS COMPRESSED BACKUPSET
 FORMAT '/u01/app/oracle/backup-image/%d/rman_%d_%T_%U_incr.bkp'
 DATABASE ;

BACKUP
 FILESPERSET 10
 AS COMPRESSED BACKUPSET
 TAG 'BACKUP_DISK_ARCH'
 ARCHIVELOG ALL NOT BACKED UP 1 TIMES
 FORMAT '/u01/app/oracle/backup-image/%d/rman_%d_%T_%U_arch.bkp' ;

BACKUP
 AS COMPRESSED BACKUPSET
 TAG 'BACKUP_DISK_CTRL'
 CURRENT CONTROLFILE
 FORMAT '/u01/app/oracle/backup-image/%d/rman_%d_%T_%U_ctrl.bkp' ;

BACKUP
 AS COMPRESSED BACKUPSET
 TAG 'BACKUP_DISK_SPFL'
 SPFILE
 FORMAT '/u01/app/oracle/backup-image/%d/rman_%d_%T_%U_spfl.bkp' ;

RELEASE CHANNEL DISK1 ;
RELEASE CHANNEL DISK2 ;
}

With this script I have image-based incrementally updated backup while still having a custom recovery window, optional additional tape-based backups, consistent recovery times, meaningful file names, massively parallel recover and restore capabilities.

Posted in Uncategorized | Leave a comment

RMAN “archival backup” – part 2

This will be a quite small follow-up to “part 1” and will almost entirely cover bash scripting details (no rocket science here!)

TOC is:

  1. backups will be scheduled in an external software suite: we must provide shell script to run level 0 backups, incremental backups, archivelog backups
  2. we want to use the same shell script on different servers which will include Oracle Linux 5,6,7, RHEL 5,6,7, AIX 6.1, AIX 7.1, Sun Solaris 11
  3. shell scripts must return with a proper exit code to notify the caller in case of errors
  4. shell scripts must both show their output to the caller and save it in a log file (each execution must write its own log to avoid the same log being overwritten)
  5. shell scripts must show their output, too
  6. shell scripts should work even if their path contains symlinks
  7. backup logs must contain any information needed to reinstall Oracle Software and restore the database
  8. it would be nice to known which are the shell’s process ancestors0

1) backups will be scheduled in an external software suite: we must provide shell script to run level 0 backups, incremental backups, archivelog backups

This should be an obvious task.

2) we want to use the same shell script on different servers which will include Oracle Linux 5,6,7, RHEL 5,6,7, AIX 6.1, AIX 7.1, Sun Solaris 11

Some standard unix command behave differently on different *nix system (awk, hostname, bash are some of these commands). I used the output of $( uname ) to use different commands and I put all this stuff in a single script to keed code neat.

3) shell scripts must return with a proper exit code to notify the caller in case of errors

This can be achieved running something like one of the followings, depending whether you need to inspect logfile content or not

rman target / [...] cmdfile=[...]
RMAN_RETCODE=$?
[...]
exit ${RMAN_RETCODE}
rman target / [...] cmdfile=[...]
RETCODE=$?

# check whether the log file contains RMAN-xxxxx messages, even if rman's retcode looks fine

if [[ ${RETCODE} -eq 0 ]] ; then
echo
  echo "looking for ^RMAN-xxxxx messages in log file..."
  echo
  egrep -q "^RMAN-([0-9]){5}" ${LOG_FILE}
  if [[ $? -eq 0 ]] ; then
    RETCODE=2
    echo "ERROR(s) found:"
    egrep "^RMAN-([0-9]){5}" ${LOG_FILE}
    echo
  else
    echo "no RMAN-xxxxx messages in log file"
  fi
fi


# check whether my own log shows EXPIRED backup; if this is the case set
# RETCODE to a non-zero value

egrep -q "^validation failed for archived log|^crosschecked backup piece: found to be 'EXPIRED'" ${LOG_FILE}

if [[ $? -eq 0 ]] ; then
  echo ERROR found EXPIRED backup or MISSING archivelog
  RETCODE=3
fi

[...]

exit ${RETCODE}

4-5) shell scripts must both show their output to the caller and save it in a log file (each execution must write its own log to avoid the same log being overwritten) and they must show their output, too

Showing stdout and saving it in a file can be done with tee , but I found using tee with both stdout and stderr a bit more tricky. On Linux you can done this from bash leveraging process substitution with & redirection and exec like this:

exec &> >(tee -a "${LOG_FILE}")

while on different *nix system I used to a simpler approach witch does not show anything real time to the caller process:

if [[ $( uname ) == "Linux" ]] ; then
  [...]
else
  exec 6>&1 ## save stdout linking it to file desriptor 6
  exec 7>&2 ## save stderr linking it to file desriptor 7
  exec >> ${LOG_FILE} ## redirects all stdout to ${LOG_FILE}
  exec 2>&1 ## redirects all stderr to stdout
fi

[...]

#end of the script:
# restore stdout and stderr file descriptors

if [[ $( uname ) != "Linux" ]]; then
  exec 1>&6 6>&- ## restore stdout and close file descriptor 6
  exec 2>&7 7>&- ## restore stderr and close file descriptor 7
  cat ${LOG_FILE}
fi

exit

To have separate log files for different executions (this is really fundamental to debug overlapping executions!) I build the log file name for script.sh as script.DATE_HOUR.PID.log

6) scripts should work even if their path contains symlinks

This should be working out-of-the-box and will allow for more flexibility

7) backup logs must contain any information needed to reinstall Oracle Software and restore the database

In a better world, any installed O.S. and software component should be documented and backed up, including OS version, ORACLE_HOME contents and ORACLE_HOME’s inventory. However it could be useful to know exactly which OS, Oracle version, Oracle database component(s) and patch(es) have been installed.

The print* scripts you can find in https://github.com/yawod/rman-archival-backup/tree/master/lib will show such informations.

uname and uname -a will tell witch OS and kernel we are using.

To know which Oracle Database Software we are using I choosed a multiple approach:

  1. ${ORACLE_HOME}/bin/sqlplus will tell you which release we are on
  2. ${ORACLE_HOME}/OPatch/opatch lspatches willl tell us which patches we installed
  3. On some *nix systems, ar -t ${ORACLE_HOME}/rdbms/lib/libknlopt.a can be used to see which components have been installed or linked (i.e. ASM, ConText, Database Vault, OLAP, RAC, Unified Auditing, …)
  4. dba_registry can always show which database components are configured in this database (i.e. OJVM, ConText, OLAP, ApEx, …)
  5. dba_registry_sqlpatch and dba_registry_history can show us which patches have been installed, too

the print_oracle_home_info.sh script will use all of these to give us any information needed to reinstall a fully compatible database server.

8) it would be nice to known which are the shell’s process ancestors

Knowing a shell’s process ancestors can be useful to check whether it has been started by an interactive shell, a crond job, a scheduler agent… moreover, we could match such informations with audit data to know “how did what”.

I got this with a bash recursive function which use ps to show current process parents up to PID 1:

echo
echo current process tree follows:

print_ps_tree()
{
if [[ ${1} -eq 1 ]]; then
  ps -fp 1
else
  print_ps_tree $( ps -fp $1 | tail -n 1 | tr -s [:space:]| cut -d " " -f 3 )
  ps -fp $1 | tail -n 1
fi;
}

print_ps_tree $$

That’s all, folks!

Posted in Uncategorized | Leave a comment

RMAN “archival backup” – part 1

I recently had to improve some backup scripts; since I did not find so much documentation about “long-term archival backup” I thought I could share my experience, because combining together different requirements is not always straightforward.

Here are facts and requirements:

  1. backup will be written on a VTL (we will use rman catalog)
  2. we must guarantee a 16-days recovery windowwe must produce monthly archival backup that will be retained for 1 year
  3. we must be able to use these archival backup to perform both in-place restore and rman duplicate
  4. archival backup must not block, prevent or slow down any other “ordinary” rman backup (this requirement is strictly related to RPO)
  5. each archival backup must have its unique TAG
  6. we can accept any backup job to run with a low degree of parallelism to reduce backup overhead
  7. restore jobs must be able to use a proper, higher, degree of parallelism in order to guarantee an acceptable RTO
  8. RMAN backup scripts must work on Oracle 11.2 and subsequent version
  9. use the FRA to store archivelog in order to reduce RTO
  10. run at least basic sanity check (i.e. rman validate and rman crosscheck
  11. backups will be scheduled in an external software suite: we must provide shell script to run level 0 backups, incremental backups, archivelog backups
  12. we want to use the same shell script on different servers which will include Oracle Linux 5,6,7, RHEL 5,6,7, AIX 6.1, AIX 7.1, Sun Solaris 11
  13. shell scripts must return with a proper exit code to notify the caller in case of errors
  14. shell scripts must both show their output to the caller and save it in a log file (each execution must write its own log to avoid the same log being overwritten)
  15. shell scripts must show their output , too
  16. shell scripts should work even if their path contains symlinks
  17. backup logs must contain any information needed to reinstall Oracle Software and restore the database
  18. it would be nice to known which are the shell’s process ancestors

Our starting point is RMAN’s documentation and some additional MOS note:

  1. Bug 14215868 : BACKUP DATABASE PLUS ARCHIVELOG WITH AUTOBACKUP IS NOT SUFFICIENT TO DUPLICATE
  2. RMAN duplicate errors with RMAN-6026 RMAN-6024 no backup of controlfile (Doc ID 1348071.1)
  3. Bug 14054568 – RMAN DUPLICATE fails with RMAN-6054 (Doc ID 14054568.8)
  4. Perform Backup Based RMAN DUPLICATE Without Connecting To Target Database For Both Disk & Tape Backups (Doc ID 1375864.1)
  5. RMAN 11GR2 : DUPLICATE Without Target And Recovery Catalog Connection (Doc ID 874352.1)

This post will go through point 1-10; I’ll discuss remaing topics in a “part 2”

If you want to skip to some topics, here is the TOC:

  1. backup will be written on a VTL
  2. we must guarantee a 16-days recovery window
  3. we must produce monthly archival backup that will be retained for 1 year
  4. we must be able to use these archival backup to perform both in-place restore and rman duplicate
  5. use the FRA to store archivelog in order to reduce RTO
  6. archival backup must not block, prevent or slow down any other “ordinary” rman backup (this requirement is strictly related to RPO)
  7. each archival backup must have its unique TAG
  8. we can accept backup job to run with a low degree of parallelism to reduce backup jobs overhead
  9. restore jobs must be able to use a proper, higher, degree of parallelism in order to guarantee an acceptable RTO
  10. RMAN backup scripts must work on Oracle 11.2 and subsequent version
  11. run at least basic sanity check (i.e. rman validate and rman crosscheck)
  12. backup and log should contain as much information as possible to restore the database and its configuration.
  13. Extra RMAN considerations
  14. the RMAN archival backup script
  15. Source code

1) backup will be written on a VTL

This is not an issue at all: this will make the use of an external rman catalog even much more recommended.

2) we must guarantee a 16-days recovery window

Not an issue: we set RETENTION POLICY to RECOVERY WINDOW OF 16 DAYS in rman configuration

3-4) we must produce monthly archival backup that will be retained for 1 year and we must be able to use these archival backup to perform both in-place restore and rman duplicate

This is the main topic, the reason which made me write down this note. This will take some more time than point 1 and 2.

Since version 10.2, rman allows to specify “KEEP LOGS” when executing a backup.
This will force rman to keep *ALL* subsequents archivelog backup *as long as the backup will not be deleted*.
Please note that rman will not only retain the archivelog needed to recover the backup: it will retaing all subsequent archivelog (or their backup).

With version 11.1 rman introduced “archival backups” ( see https://docs.oracle.com/cd/B28359_01/backup.111/b28270/wnbradv.htm#BRADV021 and
https://docs.oracle.com/cd/B28359_01/backup.111/b28270/rcmbckba.htm#CHDHAFHJ ).

The manuals say that “You can use BACKUP … KEEP UNTIL … to create a backup that is both all-inclusive and exempt from the backup retention policy. The backup is all-inclusive because every file needed to restore and recover the database is backed up to a single disk or tape location. The KEEP option also specifies that the backup should be exempt from the retention policy either forever or for a specified period of time. The general name for a backup created with BACKUP … KEEP is an archival backup.“;

However, as you can read from the MOS notes listed before, this backup cannot be used as a source to run a DUPLICATE command.

The main reason behind this is that archival backup will perform these tasks:

  1. at time t_0, start the BACKUP … KEEP UNTIL … command
  2. first step (say, at time t_1), is to complete a full database backup
  3. next (say, at time t_2), rman will automatically backup any archivelog needed to recover this backup
  4. next (say, at time t_3), rman will automatically backup the spfile
  5. at the end (say, at time t_4), rman will automatically backup the controlfile

As a consequence, any DUPLICATE … UNTIL TIME t_2 will not start because it won’t find a suitable controlfile backup, i.e. a backup of controlfile at time <= t_2 (you may have to wait a bit to see this error: it would arise as soon as the archival backup you are going to use fell out of the configured recovery window) and DUPLICATE … UNTIL TIME t_4 will fail while applying archivelogs because it will restore a controlfile backed up at time t_4 but it will not find archivelog produced between time t_2 and time t_4.

Since we will store these backup for a quite long time it’s quite unlikely to be asked to use them to replace a production database: I’d rather expect to use these backup to replace a test/dev/check database while using a different database name to avoid any confusion.

To overcome this limit I added to any archival backup an additional archivelog backup. This will add some overhead (some archivelogs will appear twice in the archival backup) but this will allow us to run the DUPLICATE command. My procedure looks like this:

  1. before starting any backup, read and store the current SCN in a shell variable
  2. at time t_0, start the command BACKUP FULL DATABASE FORMAT … KEEP UNTIL TIME […] TAG [custom unique tag]
  3. first step (say, at time t_1), is to complete a full database backup
  4. next (say, at time t_2), rman will automatically backup any archivelog needed to recover this backup
  5. next (say, at time t_3), rman will automatically backup the spfile
  6. next (say, at time t_4), rman will automatically backup the controlfile
  7. execute alter system archive log current ;
  8. execute BACKUP ARCHIVELOG FROM SCN [SCN read just before time t_0] INCARNATION CURRENT FORMAT … KEEP UNTIL TIME […] TAG [same tag as before]

The last step will make an additional backup of archivelog backed up at step 4. This is obviusly redundat, but I choosed to tolerate this overhead to keep my backup scripts as simple as possible.

This way we will backup some more archivelogs that will let the DUPLICATE … UNTIL TIME t_4 end fine.

5) use the FRA to store archivelog in order to reduce RTO

Using the FRA to store archivelog will not conflict with these backups; recovery time would be shortened because rman will use archivelog in the FRA and with no need to restore them.

We only have to set proper values for db_recovery_file_dest and db_recovery_file_dest_size, and set log_archive_dest_1=’LOCATION=USE_DB_RECOVERY_FILE_DEST’ .

6) archival backup must not block, prevent or slow down any other “ordinary” rman backup (this requirement is strictly related to RPO)

We do care about RPO, and we will schedule archivelog backup consequently.
Since RMAN jobs can be run in parallel, we can schedule rman “ordinary” backups and rman “archival” backup without having to worry about overlapping schedules, as our archival backup will not block or
prevent other rman jobs.

However, a massive parallel archival backup may slow down concurrent operations. To avoid these issue we will not use rman’s configured degree of parallelism and we will explicitly allocate a low number of channel in archival backup scripts.

Please note that both “ordinary” rman backups and “archival” rman backups must backup the same archivelog independently: we must ensure that the archivelog backup will not delete any archivelog needed by the archival backup.

We can achieve this storing the archivelog in the FRA and backing them with ” BACKUP ARCHIVELO ALL NOT BACKED UP ” with no additional DELETE option.
Archivelog will be backed up, say, hourly and they will be only deleted when the database will think it is time to free up some space in the FRA, provided that they have already been backed up.
I will manually check that any rman full backup won’t last longer than 1 day and that any FRA is large enough to store 2-3 days of archivelog to get this setup working.
This way the archival backup shoul always find all needed archivelog still present in the FRA.

7) each archival backup must have its unique TAG

I will manually generate a tag. The shell script will contain

rman target / catalog [...]cmdfile=[...] `date +%Y%m%d%H%M` [...]

and the associate rman cmdfile will run commands like

backup [...] tag 'ARCHIVAL_BACKUP_&1'

This will let us manage easily archival backup, i.e. deleting, changing their retention, restoring/duplicating from tag, … .

8 – 9) we can accept backup job to run with a low degree of parallelism to reduce backup jobs overhead; however restore jobs must be able to use a proper, higher, degree of parallelism in order to guarantee an acceptable RTO

Restore DOP (degree of parallelism) will be limited by the number of “tapes” the backup has been spread on because RMAN will never try to restore at the same time two backusets stored on the same media, neither it will try to read the same backuppiece in parallel.
This is true regardless of using “real” tapes or “virtual” tapes.

Using a high DOP while creating the backup will almost certainly guarantee that we may use (at least) the same DOP when restoring (to be more precise: ; however running any backup job with a high DOP may not be feasible.

Splitting the backup in several backuppiece may lead us to same result if the VTL will store them on different “tapes”.
We can run “show backup” and check the Media id in its output to see whether different backuppieces are stored on different medias:

RMAN> list backup [...] ;

[...]
BS Key  Type LV Size       Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ ---------------
1467620 Incr 0 506.25M SBT_TAPE 00:00:11 2020-03-29 12:36:28
        BP Key: 1467629 Status: AVAILABLE Compressed: NO Tag: FBKP
        Handle: FULL_xxxxxxxx_26usa621_1_1 Media: 11328
  List of Datafiles in backup set 1467620
  File LV Type Ckp SCN    Ckp Time            Name
  ---- -- ---- ---------- ------------------- ----
  1    0  Incr 313007859  2020-03-29 12:36:17 /oracle/system01.dbf

I used the “filesperset 1” option to split the backups and double-checked the real DOP of a massive parallel restore job; a possible enhancement is adding the “SECTION SIZE” option to achieve a much granular division and enable single-datafile restore parallelism, too.
This may be a must when using bigfile tablespace; I think it is unnecessary when backing up smallfile tablespaces consisting of dozens of datafiles.

10) RMAN backup scripts must work on Oracle 11.2 and subsequent version

I’m still not using any feature which wasn’t available with Oracle 11g.

11) run at least basic sanity check (i.e. rman validate and rman crosscheck )

In a better world, any database should have its backup working and they should be regurarly used to refresh dev/test system. In our world they need to be periodically monitored and tested to check their validity and to measure RTO.

In order to automate some checks, we may run the following:

 restore spfile validate ;
restore controlfile validate ;
restore database validate ;
restore archivelog from time ' sysdate - 36 / 24 ' validate;

Please note that restore validate will not write anything: it will only READ the backup and it will leave the database running untouched.

You might choose to run these checks with a low DOP in order not to overload the backup/storage system.

Restore archivelog all would try to restore all archivelog hence it will always raise errors for any deleted archivelog with no backup.
Since we can always find some old archivelog metadata in out catalog, such a check would fail consistently; validating archivelog in the last 36 hours running “restore archivelog from time ‘ sysdate – 36 / 24 ‘ validate” seems to be reasonable

Restore database validate will use the latest full or level 0 backup, regardless of its retention policy (long-term archival backup are exempted from the standard retention policy but can obviously be used to restore and recover the database).

Restoring the whole database without parallelism can take some time. Consider running it
– only once a week
– with increased parallelism
– only on a subset of the database, i.e. “restore datafile 1,2 validate” or “restore tablespace SYSTEM,SYSAUX validate”

An additional check can include:

 crosscheck archivelog all ;
 crosscheck backup ;
 crosscheck copy ;
 list expired backup;
 list expired copy ;

Keep in mind that such a check is only useful if we inspect its logs rather than its exit code. That’s why I execute the following:

rman target / catalog [...] cmdfile=[...]
RETCODE=$?

#look for RMAN errors in the log file
if [[ ${RETCODE} -eq 0 ]] ; then
  echo
  echo "look for ^RMAN-12345 messages in log file"
  echo
  egrep -q "^RMAN-([0-9]){5}" ${LOG_FILE}
  if [[ $? -eq 0 ]] ; then
    RETCODE=2
    echo "ERROR(s) found:"
    egrep "^RMAN-([0-9]){5}" ${LOG_FILE}
    echo
  else
    echo "INFO: no RMAN-xxxxx message in log file"
  fi
fi

#look for expired backup:
egrep -q "^validation failed for archived log|^crosschecked backup piece: found to be 'EXPIRED'" ${LOG_FILE}
if [[ $? -eq 0 ]] ; then
  echo "ERROR: found EXPIRED backup or MISSING archivelog"
  RETCODE=3
fi

[...]

exit ${RETCODE}

12) backup and log should contain as much information as possible to restore the database and its configuration.

I added a “show all;” command in any rman configuration script. This way backup logs will also show how channel(s) were allocated, including any ENV=() parameter used.

13) Extra RMAN considerations

What about rman optimization and controlfile autobackup?

I choosed not to use rman optimization: I don’t plan to have huge read-only tablespace and I’m not sure whether these optimizations could conflict with archival backup. I feel safer with no automatic optimization in place, as backup optimization could exclude some datafiles from backups.

I also choosed not to use controlfile autobackup to avoid a few bugs on older, unpatches systems.

See bugs:

  1. bug 9689727 – AUTOBACKUP CONTROLFILE SEQUENCE REUSED CAUSING LOSS OF PREVIOUS CONTROLFILE
  2. bug 14605621 – RESTORE SPFILE FROM AUTOBACKUP INCONSISTENT BEHAVIOUR
  3. bug 26670003 – CONTROLFILE AUTOBACKUP IS NOT COMPRESSED WHEN COMPRESSION IS CONFIGURED
  4. bug 14053422 – DATABASE BACKUP PLUS ARCHIVELOG WITH CONTROLFILE AUTOBACKUP NOT ENOUGH FOR DUPLI

14) the RMAN archival backup command

It took quite a long time, and it ended up with quite a short rman script. Here you can find the rman archival backuop command. a long post rman script will look like this:
sql ‘alter session set NLS_DATE_FORMAT=”dd/mm/yyyy hh24:mi:ss” ‘ ;

show all ;

sql 'alter session set NLS_DATE_FORMAT="dd/mm/yyyy hh24:mi:ss" ' ;
show all ;
run
{
  # Explicit channel allocation to use a lower DOP with respect to current rman configuration
allocate channel c1 device type 'SBT_TAPE' parms 'ENV=(TDPO_OPTFILE=/home/oracle/scripts/ORCL/TSM/TDP/ORCL_tdpo.opt)' ;
  # I will use an input variable to TAG this backup.
  # IMPORTANT: if no input variable is used, rman will ask for it and wait forever!
  #
  # I will do a FULL backup rather than a LEVEL 0 backup because this
  # backup will not be used in any incremental backup strategy
  #
  backup
   full filesperset 3
   database
   format '%d_%T_%u_total_1_year' tag 'FULL_1YEAR_&1' keep until time = 'SYSDATE+400' ;
  # The full backup can be used to restore the database, but it will not
  # be sufficient to run a duplicate: the backup command will:
  # - backup database at time t0
  # - backup required archivelog at time t1
  # - backup spfile at time t2
  # - backup controlfile at time t3
  # hence it will never contain the redo stream between time t1 and t2.
  # This will force any duplicate to fail:
  # - duplicate until time t1 will fail with "no controlfile backup found"
  # - duplicate until time t3 will fail with "no backup of log X found"
  # As a workaround, I will make an additional, partially redundant log backup at time t4
  # The backup will use a second input variable to only backup recent archivelog

  alter system archive log current ;
  backup
   archivelog from scn &2 incarnation current
   format '%d_%T_%u_total_1_year' tag 'FULL_1YEAR_&1' keep until time = 'SYSDATE+400' ;
  release channel c1 ;
}

15) All the scripts

Here you can find all the scripts. They should just work out-of-the box with a few modifications and they contain quite a lot of comments.

Posted in oracle | Leave a comment

Mount, browse, edit content of OVM “.img” vDisks

Should you need to mount, browse or edit some OVM VirtualDisk, here are a few recipes.

I tested these steps on OVM (Oracle VM for x86) 3.4.2.

As a first step we need to find the partition list of the .img file. We will use fdisk with the -l option (to have the partition table listed) and the -u option (to use secor as unit measure):


[root@*****-ovm01 ~]# fdisk -l -u /OVS/Repositories/0004fb0000030000d0e89f3e18de419b/VirtualDisks/0004fb00001200002f0efdef9a5a9f09.img

You must set cylinders.

You can do this from the extra functions menu.

Disk /OVS/Repositories/0004fb0000030000d0e89f3e18de419b/VirtualDisks/0004fb00001200002f0efdef9a5a9f09.img: 0 MB, 0 bytes

255 heads, 63 sectors/track, 0 cylinders, total 0 sectors

Units = sectors of 1 * 512 = 512 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk identifier: 0x86688668

Device&nbsp; Boot&nbsp;&nbsp;&nbsp; Start&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; End&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Blocks&nbsp;&nbsp; Id&nbsp; System

/OVS/Repositories/0004fb0000030000d0e89f3e18de419b/VirtualDisks/0004fb000012000&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 63&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 208844&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 104391&nbsp;&nbsp; 83&nbsp; Linux

/OVS/Repositories/0004fb0000030000d0e89f3e18de419b/VirtualDisks/0004fb000012000&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 208845&nbsp;&nbsp;&nbsp; 83875364&nbsp;&nbsp;&nbsp; 41833260&nbsp;&nbsp; 8e&nbsp; Linux LVM

Partition 2 has different physical/logical endings:

phys=(1023, 254, 63) logical=(5220, 254, 63)

[root@*****-ovm01 ~]#

In this case sector size is 512, partition 1 starts at sector 63, and partition 1 is used with a regular filesystem (no LVM), so I can try to mount it using


[root@*****-ovm01 ~]# mount -t auto -o loop,offset=$((512*63)) /OVS/Repositories/0004fb0000030000d0e89f3e18de419b/VirtualDisks/0004fb00001200002f0efdef9a5a9f09.img /mnt/tmp/

[root@*****-ovm01 ~]#

Now the partition has been mounted and you can read, change, unmount it with the standard commands:

[root@*****-ovm01 ~]# mount | grep /OVS/Repositories/0004fb0000030000d0e89f3e18de419b/VirtualDisks/0004fb00001200002f0efdef9a5a9f09.img

/OVS/Repositories/0004fb0000030000d0e89f3e18de419b/VirtualDisks/0004fb00001200002f0efdef9a5a9f09.img on /mnt/tmp type ext3 (rw,loop=/dev/loop34,offset=32256)

[root@*****-ovm01 ~]#

[root@*****-ovm01 ~]# df -h /mnt/tmp/

Filesystem            Size  Used Avail Use% Mounted on

/OVS/Repositories/0004fb0000030000d0e89f3e18de419b/VirtualDisks/0004fb00001200002f0efdef9a5a9f09.img

98M   36M   57M  39% /mnt/tmp

[root@*****-ovm01 ~]#

[root@*****-ovm01 ~]# umount /mnt/tmp/

[root@*****-ovm01 ~]#
Posted in Uncategorized | Leave a comment

How to use the public OVM3 repository to update your OVM cluster

If you bought OVM support you have access to Oracle ULN (Unbreakable Linux Network) and you can configure a local YUM server as per OVM 3 documentation (see the chapter “Updating and Upgrading Oracle VM Servers” http://docs.oracle.com/cd/E26996_01/E18549/html/BABDDEGC.html ).

However, if you choose not to buy OVM support, you can still access latest OVM server patches using http://public-yum.oracle.com/repo/OracleVM/OVM3/latest/x86_64/ as YUM Base URL and http://public-yum.oracle.com/RPM-GPG-KEY-oracle-el5 as YUM GPG Key.
If you want to build a local YUM server to mirror the public OVM3 YUM server, yuo can try to follow this HowTo.

First of all, install required packages. I will use httpd (Apache) as an http server and YUM tools to replicate the repository:

yum install httpd
chkconfig httpd on
service httpd start
mkdir -p /var/www/html/yum
yum install createrepo yum-utils

I will configure YUM to use the OVM3 public YUM repository:

cat /etc/yum.repos.d/public-yum-ovm3.repo
[ovm3_latest]
name=Oracle VM Server 3 Latest ($basearch)
baseurl=http://public-yum.oracle.com/repo/OracleVM/OVM3/latest/x86_64/
gpgkey=http://public-yum.oracle.com/RPM-GPG-KEY-oracle-el5
gpgcheck=1
enabled=1

You can check that it is correctly configured using yum repolist:

# yum repolist -v ovm3_latest
Not loading "rhnplugin" plugin, as it is disabled
Loading "refresh-packagekit" plugin
Loading "security" plugin
Config time: 0.012
Yum Version: 3.2.29
Setting up Package Sacks
pkgsack time: 0.032
Repo-id : ovm3_latest
Repo-name : Oracle VM Server 3 Latest (x86_64)
Repo-status : enabled
Repo-updated : Mon Mar 17 22:13:33 2014
Repo-pkgs : 568
Repo-size : 1.1 G
Repo-baseurl : http://public-yum.oracle.com/repo/OracleVM/OVM3/latest/x86_64/
Repo-expire : 21,600 second(s) (last: Mon Mar 31 15:08:17 2014)
Repo-excluded: 404
repolist: 568
#

and you can manually clone it:

# /usr/bin/reposync -r ovm3_latest -p /var/www/html/yum/public/

will download all the packages from the ovm3_latest yum channel to the /var/www/html/yum/public/ovm3_latest/getPackage/ directory and

# /usr/bin/createrepo /var/www/html/yum/public/ovm3_latest/getPackage/

will build the repodata subdirectory.

You can configure any YUM client to use this repository using a .repo config file like this:

Thus you can navigate to “Tools and Resources” -> “Server Update Management (YUM)” and set “YUM Base URL” to “http://<ovm_server>/yum/public/ovm3_latest/getPackage/&#8221;, “Enable GPG Key” to “No” to have you local ovm3 yum server working

cat /etc/yum.repos.d/my-ovm3-local-yum-server.repo
[my_ovm3_mirror]
name=Oracle VM Server 3 Latest ($basearch)
baseurl=http://<my_server>/yum/public/ovm3_latest/getPackage/
gpgcheck=0
enabled=1

Thus you can navigate to “Tools and Resources” -> “Server Update Management (YUM)” and set “YUM Base URL” to “http://<ovm_server>/yum/public/ovm3_latest/getPackage/&#8221;, “Enable GPG Key” to “No” to have you local ovm3 yum server working

 

Posted in Uncategorized | Leave a comment

ACS (adaptive cursor sharing) and SPM (sql plan management, aka sql plan stability) together

I wanted to check how well Oracle’s adaptive cursor sharing and sql plan management (plan stability) interact.

The first is intended not to share the same plan for when different values of the binded variables would cause the CBO to choose different plans (i.e. for skewed data distributions), the second one is intended to provide plan stability.

This point has already been eplained in many posts (like this one https://blogs.oracle.com/optimizer/entry/how_do_adaptive_cursor_sharing ), but I wanted to write down my own example to be sure I understood it 🙂

First of all, let’s build a small test table with skewed data and check if ACS is working.

I’m using a little Win7 box running Oracle 11.2.0.3.0

C:\>sqlplus andrea/andrea

SQL*Plus: Release 11.2.0.3.0 Production on Thu Oct 10 19:48:41 2013

Copyright (c) 1982, 2011, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL>
SQL> set linesize 140
SQL> set pagesize 100
SQL>
SQL> column is_obsolete       format a9
SQL> column is_bind_sensitive format a9
SQL> column is_bind_aware     format a9
SQL> column is_shareable      format a9
SQL>
SQL>
SQL> drop   table t_acs_spm purge ;

Table dropped.

SQL> create table t_acs_spm (c1 integer, c2 integer) ;

Table created.

SQL> insert into  t_acs_spm (select 1, level from dual connect by level <=     1 );

1 row created.

SQL> insert into  t_acs_spm (select 2, level from dual connect by level <=     1 );

1 row created.

SQL> insert into  t_acs_spm (select 3, level from dual connect by level <=     1 );

1 row created.

SQL> insert into  t_acs_spm (select 4, level from dual connect by level <=     1 );

1 row created.

SQL> insert into  t_acs_spm (select 5, level from dual connect by level <=     1 );

1 row created.

SQL> insert into  t_acs_spm (select 6, level from dual connect by level <=     1 );

1 row created.

SQL> insert into  t_acs_spm (select 7, level from dual connect by level <=     1 );

1 row created.

SQL> insert into  t_acs_spm (select 8, level from dual connect by level <= 29992 );

29992 rows created.

SQL> insert into  t_acs_spm (select 9, level from dual connect by level <=     1 );

1 row created.

SQL> commit ;

Commit complete.

SQL>
SQL> create index i_acs_spm on t_acs_spm(c1) ;

Index created.

SQL>
SQL> begin
2    dbms_stats.gather_table_stats(
3     ownname          => user,
4     tabname          => 'T_ACS_SPM',
5     estimate_percent => 100,
6     cascade          => true,
7     method_opt       => 'for all indexed columns size 254');
8  end;
9  /

PL/SQL procedure successfully completed.

SQL>
SQL> select c1, count(*) from t_acs_spm group by c1 order by c1 ;

C1   COUNT(*)
---------- ----------
1          1
2          1
3          1
4          1
5          1
6          1
7          1
8      29992
9          1

9 rows selected.

SQL>

Let’s chek that ACS is used:

SQL> variable i number
SQL> exec :i := 1

PL/SQL procedure successfully completed.

SQL>
SQL> select /*my_sql_stm*/ avg(c2) from t_acs_spm where c1 = :i ;

AVG(C2)
----------
1

SQL>
SQL> select * from table( dbms_xplan.display_cursor() ) ;

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------

SQL_ID  41daq1cmgdy21, child number 0
-------------------------------------
select /*my_sql_stm*/ avg(c2) from t_acs_spm where c1 = :i

Plan hash value: 1326733081

------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |       |       |     2 (100)|          |
|   1 |  SORT AGGREGATE              |           |     1 |     8 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| T_ACS_SPM |     1 |     8 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | I_ACS_SPM |     1 |       |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("C1"=:I)


20 rows selected.

SQL>
SQL> select sql_id, child_number, is_obsolete, is_bind_sensitive, is_bind_aware, is_shareable, plan_hash_value
2   from  v$sql v
3   where sql_id = '41daq1cmgdy21'
4   order by last_load_time, child_number ;

SQL_ID        CHILD_NUMBER IS_OBSOLE IS_BIND_S IS_BIND_A IS_SHAREA PLAN_HASH_VALUE
------------- ------------ --------- --------- --------- --------- ---------------
41daq1cmgdy21            0 N         Y         N         Y              1326733081

SQL> 

Now I will run the same sql statement two more times with a different bind value: after the first execution the CBO will know that this execution’s statistics do not fit with the previous statistichs, hence the second execution will
pick the bind variable’s new value and re-evaluate a new plan.
Moreoever, if we move back to different bind values the plan still change. Good!

SQL>
SQL> exec :i := 8

PL/SQL procedure successfully completed.

SQL>
SQL> select /*my_sql_stm*/ avg(c2) from t_acs_spm where c1 = :i ;

AVG(C2)
----------
14996.5

SQL>
SQL> select * from table( dbms_xplan.display_cursor() ) ;

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------

SQL_ID  41daq1cmgdy21, child number 0
-------------------------------------
select /*my_sql_stm*/ avg(c2) from t_acs_spm where c1 = :i

Plan hash value: 1326733081

------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |       |       |     2 (100)|          |
|   1 |  SORT AGGREGATE              |           |     1 |     8 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| T_ACS_SPM |     1 |     8 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | I_ACS_SPM |     1 |       |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("C1"=:I)


20 rows selected.

SQL>
SQL> select /*my_sql_stm*/ avg(c2) from t_acs_spm where c1 = :i ;

AVG(C2)
----------
14996.5

SQL>
SQL> select * from table( dbms_xplan.display_cursor() ) ;

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------

SQL_ID  41daq1cmgdy21, child number 1
-------------------------------------
select /*my_sql_stm*/ avg(c2) from t_acs_spm where c1 = :i

Plan hash value: 588473228

--------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |       |       |    17 (100)|          |
|   1 |  SORT AGGREGATE    |           |     1 |     8 |            |          |
|*  2 |   TABLE ACCESS FULL| T_ACS_SPM | 29992 |   234K|    17   (0)| 00:00:01 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("C1"=:I)


19 rows selected.

SQL>
SQL> select sql_id, child_number, is_obsolete, is_bind_sensitive, is_bind_aware, is_shareable, plan_hash_value
2   from  v$sql v
3   where sql_id = '41daq1cmgdy21'
4   order by last_load_time, child_number ;

SQL_ID        CHILD_NUMBER IS_OBSOLE IS_BIND_S IS_BIND_A IS_SHAREA PLAN_HASH_VALUE
------------- ------------ --------- --------- --------- --------- ---------------
41daq1cmgdy21            0 N         Y         N         N              1326733081
41daq1cmgdy21            1 N         Y         Y         Y               588473228

SQL> exit

Now we connect again and we see that we may go back to the index range scan for rare bind values.
In this case I will only have two child cursors because my histogram shows only two kind on cardinalities for rare bind values.
You should be aware that you may end up with several child cursors for the same access plan when the CBO estimates different cardianlities
for different bind values, event if it ends up with the same access plan.

C:\>sqlplus andrea/andrea

SQL*Plus: Release 11.2.0.3.0 Production on Thu Oct 10 19:48:45 2013

Copyright (c) 1982, 2011, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL>
SQL> set linesize 140
SQL> set pagesize 100
SQL>
SQL> column is_obsolete       format a9
SQL> column is_bind_sensitive format a9
SQL> column is_bind_aware     format a9
SQL> column is_shareable      format a9
SQL>
SQL> variable i number
SQL> exec :i := 2

PL/SQL procedure successfully completed.

SQL>
SQL> select /*my_sql_stm*/ avg(c2) from t_acs_spm where c1 = :i ;

AVG(C2)
----------
1

SQL>
SQL> select * from table( dbms_xplan.display_cursor() ) ;

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------

SQL_ID  41daq1cmgdy21, child number 2
-------------------------------------
select /*my_sql_stm*/ avg(c2) from t_acs_spm where c1 = :i

Plan hash value: 1326733081

------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |       |       |     2 (100)|          |
|   1 |  SORT AGGREGATE              |           |     1 |     8 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| T_ACS_SPM |     1 |     8 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | I_ACS_SPM |     1 |       |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("C1"=:I)


20 rows selected.

SQL>
SQL> select sql_id, child_number, is_obsolete, is_bind_sensitive, is_bind_aware, is_shareable, plan_hash_value
2   from  v$sql v
3   where sql_id = '41daq1cmgdy21'
4   order by last_load_time, child_number ;

SQL_ID        CHILD_NUMBER IS_OBSOLE IS_BIND_S IS_BIND_A IS_SHAREA PLAN_HASH_VALUE
------------- ------------ --------- --------- --------- --------- ---------------
41daq1cmgdy21            0 N         Y         N         N              1326733081
41daq1cmgdy21            1 N         Y         Y         Y               588473228
41daq1cmgdy21            2 N         Y         Y         Y              1326733081

SQL>
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

C:\>sqlplus andrea/andrea

SQL*Plus: Release 11.2.0.3.0 Production on Thu Oct 10 19:48:45 2013

Copyright (c) 1982, 2011, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL>
SQL> set linesize 140
SQL> set pagesize 100
SQL>
SQL> column is_obsolete       format a9
SQL> column is_bind_sensitive format a9
SQL> column is_bind_aware     format a9
SQL> column is_shareable      format a9
SQL>
SQL> variable i number
SQL> exec :i := 3

PL/SQL procedure successfully completed.

SQL>
SQL> select /*my_sql_stm*/ avg(c2) from t_acs_spm where c1 = :i ;

AVG(C2)
----------
1

SQL>
SQL> select * from table( dbms_xplan.display_cursor() ) ;

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------

SQL_ID  41daq1cmgdy21, child number 2
-------------------------------------
select /*my_sql_stm*/ avg(c2) from t_acs_spm where c1 = :i

Plan hash value: 1326733081

------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |       |       |     2 (100)|          |
|   1 |  SORT AGGREGATE              |           |     1 |     8 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| T_ACS_SPM |     1 |     8 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | I_ACS_SPM |     1 |       |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("C1"=:I)


20 rows selected.

SQL>
SQL> select sql_id, child_number, is_obsolete, is_bind_sensitive, is_bind_aware, is_shareable, plan_hash_value
2   from  v$sql v
3   where sql_id = '41daq1cmgdy21'
4   order by last_load_time, child_number ;

SQL_ID        CHILD_NUMBER IS_OBSOLE IS_BIND_S IS_BIND_A IS_SHAREA PLAN_HASH_VALUE
------------- ------------ --------- --------- --------- --------- ---------------
41daq1cmgdy21            0 N         Y         N         N              1326733081
41daq1cmgdy21            1 N         Y         Y         Y               588473228
41daq1cmgdy21            2 N         Y         Y         Y              1326733081

SQL>

Wow, ACS is working: we have one sql statement, one sql_id, and two plans! You could alse check that v$sql now has three entries for the same sql_id: che first entry is not bind-aware, the second and third entries are bind-aware.

Now I will first build a SQL plan baseline with only one plan to see what happens.


SQL> select count(*) from dba_sql_plan_baselines ;

COUNT(*)
----------
0

SQL> variable n number
SQL> exec :n := dbms_spm.load_plans_from_cursor_cache( sql_id => '41daq1cmgdy21', plan_hash_value => 1326733081)

PL/SQL procedure successfully completed.

SQL> print n

N
----------
1

SQL> select sql_handle, enabled, accepted, fixed from dba_sql_plan_baselines;

SQL_HANDLE                     ENA ACC FIX
------------------------------ --- --- ---
SQL_11b34a703c16c179           YES YES NO

SQL>

Ok, we have one baseline with only one plan.

Now we will have plan stability:

C:\>sqlplus andrea/andrea

SQL*Plus: Release 11.2.0.3.0 Production on Thu Oct 10 20:02:19 2013

Copyright (c) 1982, 2011, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> set linesize 140
SQL> set pagesize 100
SQL> variable i number
SQL> exec :i := 1

PL/SQL procedure successfully completed.

SQL>
SQL> select /*my_sql_stm*/ avg(c2) from t_acs_spm where c1 = :i ;

AVG(C2)
----------
1

SQL> select /*my_sql_stm*/ avg(c2) from t_acs_spm where c1 = :i ;

AVG(C2)
----------
1

SQL> select * from table( dbms_xplan.display_cursor() ) ;

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------

SQL_ID  41daq1cmgdy21, child number 1
-------------------------------------
select /*my_sql_stm*/ avg(c2) from t_acs_spm where c1 = :i

Plan hash value: 1326733081

------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |       |       |     2 (100)|          |
|   1 |  SORT AGGREGATE              |           |     1 |     8 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| T_ACS_SPM |     1 |     8 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | I_ACS_SPM |     1 |       |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("C1"=:I)

Note
-----
- SQL plan baseline SQL_PLAN_13cuaf0y1dhbtf6b951ca used for this statement


24 rows selected.

SQL>
SQL> select 1 from dual /* to close the current cursor */ ;

1
----------
1

SQL> exec :i := 8

PL/SQL procedure successfully completed.

SQL> select /*my_sql_stm*/ avg(c2) from t_acs_spm where c1 = :i ;

AVG(C2)
----------
14996.5

SQL> select * from table( dbms_xplan.display_cursor() ) ;

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------

SQL_ID  41daq1cmgdy21, child number 1
-------------------------------------
select /*my_sql_stm*/ avg(c2) from t_acs_spm where c1 = :i

Plan hash value: 1326733081

------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |       |       |   113 (100)|          |
|   1 |  SORT AGGREGATE              |           |     1 |     8 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| T_ACS_SPM | 29992 |   234K|   113   (1)| 00:00:02 |
|*  3 |    INDEX RANGE SCAN          | I_ACS_SPM | 29992 |       |    59   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("C1"=:I)

Note
-----
- SQL plan baseline SQL_PLAN_13cuaf0y1dhbtf6b951ca used for this statement


24 rows selected.

SQL>

(plese note that I am runnig every sql statement more times just to prevent dbms_xplan.display to hit any ORA-01403 error).

OK, we do have plan stability!
If we check dba_sql_plan_baselines we find one more non accepted plan:

SQL> select sql_handle, enabled, accepted, fixed from dba_sql_plan_baselines;

SQL_HANDLE                     ENA ACC FIX
------------------------------ --- --- ---
SQL_11b34a703c16c179           YES NO  NO
SQL_11b34a703c16c179           YES YES NO

SQL>

Now I wonder… should dbms_spm.evolve_sql_plan_baseline automagically accept the new plan? Will it be able to re-execute the sql with the right bind values?

Let’s even try to flush the shared pool before evolving the baseline:

SQL> alter system flush shared_pool ;

System altered.

SQL> alter system flush shared_pool ;

System altered.

SQL> alter system flush shared_pool ;

System altered.

SQL> alter system flush shared_pool ;

System altered.

SQL> alter system flush shared_pool ;

System altered.

SQL> select sql_handle, enabled, accepted, fixed from dba_sql_plan_baselines;

SQL_HANDLE                     ENA ACC FIX
------------------------------ --- --- ---
SQL_11b34a703c16c179           YES NO  NO
SQL_11b34a703c16c179           YES YES NO

SQL> alter system flush shared_pool ;

System altered.

SQL> alter system flush shared_pool ;

System altered.

SQL> alter system flush shared_pool ;

System altered.

SQL>
SQL> set long 99999
SQL> select DBMS_SPM.EVOLVE_SQL_PLAN_BASELINE( sql_handle => 'SQL_11b34a703c16c179' ) from dual ;

DBMS_SPM.EVOLVE_SQL_PLAN_BASELINE(SQL_HANDLE=>'SQL_11B34A703C16C179')
--------------------------------------------------------------------------------

-------------------------------------------------------------------------------
Evolve SQL Plan Baseline Report
-------------------------------------------------------------------------------

Inputs:
-------
SQL_HANDLE = SQL_11b34a703c16c179
PLAN_NAME  =
TIME_LIMIT = DBMS_SPM.AUTO_LIMIT
VERIFY     = YES
COMMIT     = YES

Plan: SQL_PLAN_13cuaf0y1dhbtbac135ad
------------------------------------
Plan was verified: Time used .281 seconds.
Plan passed performance criterion: 1.89 times better than baseline plan.
Plan was changed to an accepted plan.

Baseline Plan      Test Plan       Stats Ratio
-------------      ---------       -----------
Execution Status:              COMPLETE       COMPLETE
Rows Processed:                       1              1
Elapsed Time(ms):                12.749          2.816              4.53
CPU Time(ms):                    12.133          3.466               3.5
Buffer Gets:                        113             60              1.88
Physical Read Requests:               0              0
Physical Write Requests:              0              0
Physical Read Bytes:                  0              0
Physical Write Bytes:                 0              0
Executions:                           1              1

-------------------------------------------------------------------------------
Report Summary
-------------------------------------------------------------------------------
Number of plans verified: 1
Number of plans accepted: 1


SQL>
SQL> select sql_handle, enabled, accepted, fixed, plan_name from dba_sql_plan_baselines ;

SQL_HANDLE                     ENA ACC FIX PLAN_NAME
------------------------------ --- --- --- ------------------------------
SQL_11b34a703c16c179           YES YES NO  SQL_PLAN_13cuaf0y1dhbtbac135ad
SQL_11b34a703c16c179           YES YES NO  SQL_PLAN_13cuaf0y1dhbtf6b951ca

SQL>

Wow! The new plan has been accepted! Nice work!

Posted in oracle, Uncategorized | Tagged | Leave a comment

Single Instance HA with Oracle Grid Infrastructure 11gR2

I just configured an Oracle cluster with Oracle Grid Infrastructure 11.2.0.3 for single-instance high availability (cold failover cluster) and I decided to write down what I did.

I choose to use Oracle Grid Infrastructure:

  • because it is supported (but you have to care about the clustering scripts: see MOS note 790189.1)
  • because it is free if you use it to protect some licensed Oracle product (see the “clusterware” paragraph in Oracle Database Licensing Information 11g Release 2 (11.2))
  • because it is powerful: you can configure dependencies, affinities, anti-affinities, load balancing, … I will not use all of these features, but it is nice to know that we have such a powerful tool.

The first step is to install Oracle GI 11gR2. I will not explain this point because it is a quite ordinary job.
I will use a “grid” user to install the GI and an “oracle” user to manage the database, so I will use the “job role separation” installation explained in Oracle® Grid Infrastructure Installation Guide 11g Release 2 (11.2) for Linux – Advanced Installation Oracle Grid Infrastructure for a Cluster Preinstallation Tasks.
This is not mandatory, but since the database will be manually clustered I think this setup would be more appropriate.
I will use Oracle Grid Infrastructure 11.2.0.3 with the latest PSU, but this example should apply to a generic 11gR2 setup.

The second step is RTFM.

The third step is to read again chapter 6 of TFM.

Now we can start to play with the design.
We will need:

  1. a shared (global) storage, or a clustered storage
  2. a single-instance database installation, on every cluster node
  3. a clustered virtual IP (VIP)
  4. a clustered listener to listen on the VIP
  5. a clustered single-instance database
  6. a clustered filesystem to host backups
  7. a simple way to start and stop all the clustered resources on the same node
  8. (optional) an Enterprise Manager Database Console

1. Shared (global) storage, or clustered storage

We have some possible choises:

  • Grid Infrastructure’s ASM storage
  • OCFS2 (a clustered filesystem always shared between all cluster nodes
  • A simple filesystem, manually clustered

I will go with the first option. I like ASM, we already have it working (it’s included in the GI), and we will see that it helps us avoiding the “split-brain syndrome”. You only need to configure a disk group.

A simple, non cluster-aware, filesystem is a possibile choice too but

  • you will have to cluster it
  • if you wrongly mount the same device on both nodes you will probably corrupt the filesystem in a few seconds.

This is why I prefer to place the database inside ASM.
We may decide to use a simple filesystem to host RMAN backup and export to make it easy to copy them, but we may explore this point later.

2. Single instance database installation

Simply install Oracle Database software on every cluster node.
We could also install Oracle Database software in a shared location. This option would allow you to cluster the database console as per MOS note 362524.1, but I prefer to keep the binaries in local volumes. If you really want the dbconsole you can try to configure the dbconsole in an unsupported configuration (using http -not https- on the database VIP) or you could use the (free) Enterprise Manager Grid Infrastructure to manage your clustered database.

I installed Oracle database software on both nodes. Both installation must be identical. After the installation do check that $ORACLE_HOME/bin/oracle is owned by oracle:asmadmin with permissions set to -rwsr-s--x to avoid problems described in MOS note 1054033.1 .

You should repeat this check after every patch installation on the database home.

3. A clustered virtual IP

We will configure a virtual IP which will be used to connect to the database. Following the documentation, you just have to choose an IP address and a network name, ensure that your VIP’s name (in my case mydw-mpls-vip) can be resolved consistently from all cluster nodes and from all clients in your network, and then run as root
# /u01/app/11.2.0/grid/bin/appvipcfg create -network=2 -ip=192.168.236.143 -vipname=mydw.vip -user=grid.

Note that you may have to change -network=2 into -network=1 to suite your network number and that -vipname does not need to be the network name associated with the VIP, but since it will be the name of the cluster resource it should be at least similar to it.

4. a listener to listen on the VIP

Create a $ORACLE_HOME/network/admin/listener.ora on every cluster node like this:

LISTENER_MYDW =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS =
        (PROTOCOL = TCP)
        (HOST = mydw-mpls-vip)
        (PORT = 1522)
      )
      (ADDRESS =
        (PROTOCOL = IPC)
        (KEY = EXTPROC1522)
      )
    )
  )

ADR_BASE_LISTENER_MYDW = /u01/app/oracle

If you use netca you will be asked whether you want to configure a cluster listener in the Grid Infrastructure home. I didn’t do this because I want a simple listener to run on a single node, listening on a single IP address.

To let the clusterware manage the listener we need an action script.
The action script must accept a single parameter to start|stop|check|clean a resource.
It would be nice to build a generic, parametrized action script for listeners but we cannot add custom attributes to cluster resources. We may follow Andrejs Karpovs’s way but application resources are deprecated in 11gR2, so I preferred a different approach.

I will use:

  • a generic script called lsnr.sh owned by oracle to receive some parameters and start|stop|check|clean a listener
  • a wrapper script owner by grid to do sudo -u oracle lsnr.sh ORA_HOME LSNR_NAME and interpret the result
  • a master script to set some variables (ORA_OWNER, ORA_HOME, LSNR_NAME) and call the wrapper

In this way I will use the master script to start|stop|check|clean the resource, but I will put all the code logic in the generic parametrized scripts lsnr.sh and lsnr_wrapper.sh .

And here they are:

lsnr.sh:

#!/bin/sh
#
# Copyright (c) 2013, YAWOD. All rights reserved.
# lsnr.sh - action script for listener
# this script MUST receive three inputs:
#  $1 = ORACLE_HOME
#  $2 = LISTENER_NAME
#  $3 = start|stop|check|clean
# This script will assume that the ORACLE_OWNER is the current user
# This script will attempt to do $ORACLE_HOME/bin/lsnrctl ...
# and will not interpret the result

AWK=/bin/awk
DATE=/bin/date
ECHO=/bin/echo
EGREP=/bin/egrep
ENV=/bin/env
GREP=/bin/grep
KILL=/bin/kill
PS=/bin/ps
RM=/bin/rm
SORT=/bin/sort
SUDO=/usr/bin/sudo
TOUCH=/bin/touch
WC=/usr/bin/wc
WHOAMI=/usr/bin/whoami

if [ ! $# -eq 3 ]
then
  ${ECHO} "Syntax: $0 ORA_HOME LSNR_NAME ACTION"
  exit 666
fi

RESOURCE_NAME=${_CRS_NAME}

ORACLE_OWNER=`${WHOAMI}`
ORACLE_HOME=$1 ; export ORACLE_HOME
LISTENER_NAME=$2
ACTION=$3

case "${ACTION}" in
  start)
    ${ORACLE_HOME}/bin/lsnrctl start  ${LISTENER_NAME}
    RSTAT=$?
    exit ${RSTAT}
    ;;

  stop)
    ${ORACLE_HOME}/bin/lsnrctl stop   ${LISTENER_NAME}
    RSTAT=$?
    exit ${RSTAT}
    ;;

  check)
    PROCESS_COUNT=`${PS} -ef | ${GREP} "^${ORACLE_OWNER} " | ${GREP} " ${ORACLE_HOME}/bin/tnslsnr ${LISTENER_NAME}" | ${WC} -l`
    if [ "${PROCESS_COUNT}" = "0" ]
    then
      # no process found - exit
      RSTAT=1
    else
      RSTAT=`${ORACLE_HOME}/bin/lsnrctl status ${LISTENER_NAME} | ${GREP} TNS- | ${WC} -l`
    fi
    exit ${RSTAT}
    ;;

  clean)
    # to clean we try to gracefully stop the listener and afterward we try to KILL it
    ${ORACLE_HOME}/bin/lsnrctl stop   ${LISTENER_NAME}
    TNSLSNR_PID=`${PS} -ef | ${GREP} "^${ORACLE_OWNER} " | ${GREP} " ${ORACLE_HOME}/bin/[t][n][s]lsnr ${LISTENER_NAME} " | ${AWK} ' { print \$2 } ' `
    if [ ! "x${TNSLSNR_PID}" = "x" ]
    then
      ${ECHO} "killing process ${TNSLSNR_PID} !!!"
      ${KILL} -9 ${TNSLSNR_PID}
    else
      ${ECHO} "tnslsnr process not found"
    fi
    exit 0
    ;;

  *)
    echo "Usage: $0 ORA_HOME LISTENER_NAME start|stop|check|clean, not $0 $*"
    exit 1
    ;;
esac

lsnr_wrapper.sh:

#!/bin/sh
#
# Copyright (c) 2013, YAWOD. All rights reserved.
# lsnr_wrapper.sh - action script for listener (wrapper)
# this script MUST receive four inputs:
#  $1 = ORACLE_OWNER
#  $2 = ORACLE_HOME
#  $3 = LISTENER_NAME
#  $4 = start|stop|check|clean
# and will attempt to do sudo -u $ORA_OWNER /opt/cluster/scripts/lsnr.sh $ORA_HOME $LISTENER_NAME
# and to interpret the result

DATE=/bin/date
ECHO=/bin/echo
EGREP=/bin/egrep
ENV=/bin/env
GREP=/bin/grep
RM=/bin/rm
SORT=/bin/sort
SUDO=/usr/bin/sudo
TOUCH=/bin/touch

if [ ! $# -eq 4 ]
then
  ${ECHO} "Syntax: $0 ORA_OWNER ORA_HOME LSNR_NAME ACTION"
  exit 666
fi

RESOURCE_NAME=${_CRS_NAME}
TRACE_FILE_PATH=/opt/cluster/scripts/lsnr_wrapper.${RESOURCE_NAME}.out
TRACE=1

ORA_OWNER=$1
ORA_HOME=$2
ORA_LISTENER_NAME=$3

if [ "${TRACE}" = "1" ]
then
  ${ECHO} " "                                                                                           >>{TRACE_FILE_PATH}
  ${ECHO} "================================================================="                           >>${TRACE_FILE_PATH}
  ${ECHO} " "                                                                                           >>${TRACE_FILE_PATH}
  ${ECHO} "ORA_OWNER=${ORA_OWNER}"                                                                      >>${TRACE_FILE_PATH}
  ${ECHO} "ORA_HOME=${ORA_HOME}"                                                                        >>${TRACE_FILE_PATH}
  ${ECHO} "ORA_LISTENER_NAME=${ORA_LISTENER_NAME}"                                                      >>${TRACE_FILE_PATH}
  ${ECHO} " "                                                                                           >>${TRACE_FILE_PATH}
fi

case "$4" in
  start)
   #${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to START resource [${_CRS_NAME}]"
   #${ECHO} "command line is $0 $*, pid is $$"
    if [ "${TRACE}" = "1" ]
    then
      ${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to START resource [${_CRS_NAME}] at `$DATE`" >>${TRACE_FILE_PATH}
      ${ECHO} "command line is $0 $*, pid is $$"                                                        >>${TRACE_FILE_PATH}
      ${ECHO} "CRS env follows"                                                                         >>${TRACE_FILE_PATH}
      ${ENV} | ${EGREP} "ORA|^_CRS_" | ${SORT}                                                          >>${TRACE_FILE_PATH}
    fi
    ${SUDO} -u ${ORA_OWNER} /opt/cluster/scripts/lsnr.sh ${ORA_HOME} ${ORA_LISTENER_NAME} start
    RSTAT=$?
    if [ "${TRACE}" = "1" ]
    then
      ${ECHO} "${SUDO} -u ${ORA_OWNER} /opt/cluster/scripts/lsnr.sh ${ORA_HOME} ${ORA_LISTENER_NAME} start returned ${RSTAT}" >> ${TRACE_FILE_PATH}
      ${ECHO} " "                                                                                                             >>${TRACE_FILE_PATH}
    fi
    exit ${RSTAT}
    ;;

  stop)
   #${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to STOP resource [${_CRS_NAME}]"
   #${ECHO} "command line is $0 $*, pid is $$"
    if [ "${TRACE}" = "1" ]
    then
      ${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to STOP  resource [${_CRS_NAME}] at `$DATE`" >>${TRACE_FILE_PATH}
      ${ECHO} "command line is $0 $*, pid is $$"                                                        >>${TRACE_FILE_PATH}
      ${ECHO} "CRS env follows"                                                                         >>${TRACE_FILE_PATH}
      ${ENV} | ${EGREP} "ORA|^_CRS_" | ${SORT}                                                          >>${TRACE_FILE_PATH}
    fi
    ${SUDO} -u ${ORA_OWNER} /opt/cluster/scripts/lsnr.sh ${ORA_HOME} ${ORA_LISTENER_NAME} stop
    RSTAT=$?
    if [ "${TRACE}" = "1" ]
    then
      ${ECHO} "${SUDO} -u ${ORA_OWNER} /opt/cluster/scripts/lsnr.sh ${ORA_HOME} ${ORA_LISTENER_NAME} stop  returned ${RSTAT}" >> ${TRACE_FILE_PATH}
      ${ECHO} " "                                                                                                             >>${TRACE_FILE_PATH}
    fi
    exit ${RSTAT}
    ;;

  check)
   #${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to CHECK resource [${_CRS_NAME}]"
   #${ECHO} "command line is $0 $*, pid is $$"
    if [ "${TRACE}" = "1" ]
    then
      ${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to CHECK resource [${_CRS_NAME}] at `$DATE`" >>${TRACE_FILE_PATH}
      ${ECHO} "command line is $0 $*, pid is $$"                                                      >>${TRACE_FILE_PATH}
     #${ECHO} "CRS env follows"                                                                       >>${TRACE_FILE_PATH}
     #$ENV | $GREP ^_CRS_ | $SORT                                                                     >>${TRACE_FILE_PATH}
    fi
    ${SUDO} -u ${ORA_OWNER} /opt/cluster/scripts/lsnr.sh ${ORA_HOME} ${ORA_LISTENER_NAME} check
    RSTAT=$?
    if [ "${TRACE}" = "1" ]
    then
      ${ECHO} "${SUDO} -u ${ORA_OWNER} /opt/cluster/scripts/lsnr.sh ${ORA_HOME} ${ORA_LISTENER_NAME} check returned ${RSTAT}" >> ${TRACE_FILE_PATH}
      ${ECHO} " "                                                                                                             >>${TRACE_FILE_PATH}
    fi
    exit ${RSTAT}
    ;;

  clean)
   #${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to CLEAN resource [${_CRS_NAME}]"
   #${ECHO} "command line is $0 $*, pid is $$"
    if [ "${TRACE}" = "1" ]
    then
      ${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to CLEAN resource [${_CRS_NAME}] at `$DATE`" >>${TRACE_FILE_PATH}
      ${ECHO} "command line is $0 $*, pid is $$"                                                        >>${TRACE_FILE_PATH}
      ${ECHO} "CRS env follows"                                                                         >>${TRACE_FILE_PATH}
      ${ENV} | ${EGREP} "ORA|^_CRS_" | ${SORT}                                                          >>${TRACE_FILE_PATH}
    fi
    ${SUDO} -u ${ORA_OWNER} /opt/cluster/scripts/lsnr.sh ${ORA_HOME} ${ORA_LISTENER_NAME} clean
    RSTAT=$?
    if [ "${TRACE}" = "1" ]
    then
      ${ECHO} "${SUDO} -u ${ORA_OWNER} /opt/cluster/scripts/lsnr.sh ${ORA_HOME} ${ORA_LISTENER_NAME} clean returned ${RSTAT}" >> ${TRACE_FILE_PATH}
      ${ECHO} " "                                                                                                             >>${TRACE_FILE_PATH}
    fi
    exit ${RSTAT}
    ;;

  *)
    echo "Usage: $0 ORA_OWNER ORA_HOME LISTENER_NAME start|stop|check|clean , not $0 $*"
    if [ "${TRACE}" = "1" ]
    then
      ${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' called with no option for resource [${_CRS_NAME}] at `$DATE`" >>${TRACE_FILE_PATH}
      ${ECHO} "command line is $0 $*, pid is $$"                                                                   >>${TRACE_FILE_PATH}
      ${ECHO} "CRS env follows"                                                                                    >>${TRACE_FILE_PATH}
      ${ENV} | ${EGREP} "ORA|^_CRS_" | ${SORT}                                                                     >>${TRACE_FILE_PATH}
    fi
    exit 1
    ;;
esac

lsnr_LISTENER_MYDW:

#!/bin/sh
# this script only sets the variables variabili ORA_HOME, ORA_LISTENER_NAME, ORA_OWNER
# before calling lsnr_wrapper.sh

ECHO=/bin/echo

if [ ! $# -eq 1 ]
then
  ${ECHO} "Syntax: $0 ACTION"
  exit 666
fi

ORA_HOME=/u01/app/oracle/product/11.2.0/dbhome_1
ORA_LISTENER_NAME=LISTENER_MYDW
ORA_OWNER=oracle
ACTION=$1

/opt/cluster/scripts/lsnr_wrapper.sh ${ORA_OWNER} ${ORA_HOME} ${ORA_LISTENER_NAME} ${ACTION}

Since my scripts will do sudo -u oracle (because I want oracle processes to run as the “oracle” user) I added the following lines to /etc/sudoers on both nodes:

Defaults:grid !requiretty
grid ALL = (oracle) NOPASSWD: /opt/cluster/scripts/lsnr.sh

Now we can add the listener to the cluster using

crsctl add resource mydw.lsnr -type cluster_resource -attr "ACTION_SCRIPT=/opt/cluster/scripts/lsnr_LISTENER_MYDW.sh, CHECK_INTERVAL=30 RESTART_ATTEMPTS=3, AUTO_START=restore, CARDINALITY=1, DEGREE=1, OFFLINE_CHECK_INTERVAL=60, SCRIPT_TIMEOUT=60, START_TIMEOUT=60, STOP_TIMEOUT=60, START_DEPENDENCIES='hard(mydw.mpls-vip.vip) pullup(mydw.mpls-vip.vip)', STOP_DEPENDENCIES='hard(mydw.mpls-vip.vip)'"

and we can manage it using crsctl start|stop|status resource mydw.lsnr.

5. A clustered single-instance database

Create the database on just one node without configuring the dbconsole.

After database creation, use srvctl to delete the database from the OCR:
$ srvctl remove database -d MYDW
(if you wonder why, read MOS node 790189.1).

If you check the alert.log you will find the messages
NOTE: Loaded library: /opt/oracle/extapi/64/asm/orcl/1/libasm.so
NOTE: Loaded library: System
SUCCESS: diskgroup DATA_DISK_GROUP was mounted
Mon Mar 11 16:03:54 2013
ERROR: failed to establish dependency between database MYDW and diskgroup resource ora.DATA_DISK_GROUP.dg

My guess is that we cannot avoid these messages, because we are using ASM from a database which will remain unknown to the clusterware.

I placed the spfile in ASM, so I had to put a pfile on all nodes in $ORACLE_HOME/dbs with the only entry spfile=’+DATA_DISK_GROUP/MYDW/spfileMYDW.ora’.

If you copy pfile, passwordfile and oratab to all remaining cluster nodes you will be able to manually start/stop the database on every cluster node.
If you try to manually start the database on more than one node you will get some ASM errors: this is why I told you that ASM helps avoiding a split-brain syndrome!

Now we need action script to manage the database. Here they are:

db.sh:

#!/bin/sh
#
# Copyright (c) 2013, YAWOD. All rights reserved.
# db.sh - action script for listener
# this script MUST receive three inputs:
#  $1 = ORACLE_HOME
#  $2 = ORACLE_SID
#  $3 = start|stop|check|clean
# This script will assume that the ORACLE_OWNER is the current user
# This script will attempt to do $ORACLE_HOME/bin/sqlplus ...
# and will not interpret the result

AWK=/bin/awk
CUT=/bin/cut
DATE=/bin/date
DIRNAME=/usr/bin/dirname
ECHO=/bin/echo
EGREP=/bin/egrep
ENV=/bin/env
GREP=/bin/grep
KILL=/bin/kill
PS=/bin/ps
REV=/usr/bin/rev
RM=/bin/rm
SLEEP=/bin/sleep
SORT=/bin/sort
SUDO=/usr/bin/sudo
TOUCH=/bin/touch
WC=/usr/bin/wc
WHOAMI=/usr/bin/whoami

if [ ! $# -eq 3 ]
then
  ${ECHO} "Syntax: $0 ORA_HOME ORA_SID ACTION"
  exit 666
fi

RESOURCE_NAME=${_CRS_NAME}

ORACLE_OWNER=`${WHOAMI}`
ORACLE_HOME=$1 ; export ORACLE_HOME
ORACLE_SID=$2  ; export ORACLE_SID
ACTION=$3

case "${ACTION}" in
  start)
    ${ORACLE_HOME}/bin/sqlplus -L " / as sysdba" <${TMP_OUTPUT_FILE} 2>&1 <<eofcheck
       connect Francesco/TOTTI
       exit ;
eofcheck

      if [ `${GREP} ^ORA-01017 ${TMP_OUTPUT_FILE} | ${WC} -l ` -ge 1 ]
      then
        # we got ORA-01017: invalid username/password: the db is UP!
        RSTAT=0
      else
        # we got some different errors so the database was unable to check our account: the db is not available!
        RSTAT=2
      fi
      ${RM} ${TMP_OUTPUT_FILE}
    fi
    exit ${RSTAT}
    ;;

  clean)
    # to clean we try to shutdown abort and after we kill -9 pmon
    ${ORACLE_HOME}/bin/sqlplus -L " / as sysdba" <>${TRACE_FILE_PATH}
  ${ECHO} "================================================================="                           >>${TRACE_FILE_PATH}
  ${ECHO} " "                                                                                           >>${TRACE_FILE_PATH}
  ${ECHO} "ORA_OWNER=${ORA_OWNER}"                                                                      >>${TRACE_FILE_PATH}
  ${ECHO} "ORA_HOME=${ORA_HOME}"                                                                        >>${TRACE_FILE_PATH}
  ${ECHO} "ORA_SID=${ORA_SID}"                                                                          >>${TRACE_FILE_PATH}
  ${ECHO} " "                                                                                           >>${TRACE_FILE_PATH}
fi

case "$4" in
  start)
   #${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to START resource [${_CRS_NAME}]"
   #${ECHO} "command line is $0 $*, pid is $$"
    if [ "${TRACE}" = "1" ]
    then
      ${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to START resource [${_CRS_NAME}] at `$DATE`" >>${TRACE_FILE_PATH}
      ${ECHO} "command line is $0 $*, pid is $$"                                                        >>${TRACE_FILE_PATH}
      ${ECHO} "CRS env follows"                                                                         >>${TRACE_FILE_PATH}
      ${ENV} | ${EGREP} "ORA|^_CRS_" | ${SORT}                                                          >>${TRACE_FILE_PATH}
    fi
    ${SUDO} -u ${ORA_OWNER} /opt/cluster/scripts/db.sh ${ORA_HOME} ${ORA_SID} start
    RSTAT=$?
    if [ "${TRACE}" = "1" ]
    then
      ${ECHO} "${SUDO} -u ${ORA_OWNER} /opt/cluster/scripts/db.sh ${ORA_HOME} ${ORA_SID} start returned ${RSTAT}" >>${TRACE_FILE_PATH}
      ${ECHO} " "                                                                                                 >>${TRACE_FILE_PATH}
    fi
    exit ${RSTAT}
    ;;

  stop)
   #${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to STOP resource [${_CRS_NAME}]"
   #${ECHO} "command line is $0 $*, pid is $$"
    if [ "${TRACE}" = "1" ]
    then
      ${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to STOP  resource [${_CRS_NAME}] at `$DATE`" >>${TRACE_FILE_PATH}
      ${ECHO} "command line is $0 $*, pid is $$"                                                        >>${TRACE_FILE_PATH}
      ${ECHO} "CRS env follows"                                                                         >>${TRACE_FILE_PATH}
      ${ENV} | ${EGREP} "ORA|^_CRS_" | ${SORT}                                                          >>${TRACE_FILE_PATH}
    fi
    ${SUDO} -u ${ORA_OWNER} /opt/cluster/scripts/db.sh ${ORA_HOME} ${ORA_SID} stop
    RSTAT=$?
    if [ "${TRACE}" = "1" ]
    then
      ${ECHO} "${SUDO} -u ${ORA_OWNER} /opt/cluster/scripts/db.sh ${ORA_HOME} ${ORA_SID} stop  returned ${RSTAT}" >>${TRACE_FILE_PATH}
      ${ECHO} " "                                                                                                 >>${TRACE_FILE_PATH}
    fi
    exit ${RSTAT}
    ;;

  check)
   #${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to CHECK resource [${_CRS_NAME}]"
   #${ECHO} "command line is $0 $*, pid is $$"
    if [ "${TRACE}" = "1" ]
    then
      ${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to CHECK resource [${_CRS_NAME}] at `$DATE`" >>${TRACE_FILE_PATH}
      ${ECHO} "command line is $0 $*, pid is $$"                                                      >>${TRACE_FILE_PATH}
     #${ECHO} "CRS env follows"                                                                       >>${TRACE_FILE_PATH}
     #$ENV | $GREP ^_CRS_ | $SORT                                                                     >>${TRACE_FILE_PATH}
    fi
    ${SUDO} -u ${ORA_OWNER} /opt/cluster/scripts/db.sh ${ORA_HOME} ${ORA_SID} check
    RSTAT=$?
    if [ "${TRACE}" = "1" ]
    then
      ${ECHO} "${SUDO} -u ${ORA_OWNER} /opt/cluster/scripts/db.sh ${ORA_HOME} ${ORA_SID} check returned ${RSTAT}" >>${TRACE_FILE_PATH}
      ${ECHO} " "                                                                                                 >>${TRACE_FILE_PATH}
    fi
    exit ${RSTAT}
    ;;

  clean)
   #${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to CLEAN resource [${_CRS_NAME}]"
   #${ECHO} "command line is $0 $*, pid is $$"
    if [ "${TRACE}" = "1" ]
    then
      ${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to CLEAN resource [${_CRS_NAME}] at `$DATE`" >>${TRACE_FILE_PATH}
      ${ECHO} "command line is $0 $*, pid is $$"                                                        >>${TRACE_FILE_PATH}
      ${ECHO} "CRS env follows"                                                                         >>${TRACE_FILE_PATH}
      ${ENV} | ${EGREP} "ORA|^_CRS_" | ${SORT}                                                          >>${TRACE_FILE_PATH}
    fi
    ${SUDO} -u ${ORA_OWNER} /opt/cluster/scripts/db.sh ${ORA_HOME} ${ORA_SID} clean
    RSTAT=$?
    if [ "${TRACE}" = "1" ]
    then
      ${ECHO} "${SUDO} -u ${ORA_OWNER} /opt/cluster/scripts/db.sh ${ORA_HOME} ${ORA_SID} clean returned ${RSTAT}" >>${TRACE_FILE_PATH}
      ${ECHO} " "                                                                                                 >>${TRACE_FILE_PATH}
    fi
    exit ${RSTAT}
    ;;

  *)
    echo "Usage: $0 ORA_OWNER ORA_HOME ORA_SID start|stop|check|clean , not $0 $*"
    if [ "${TRACE}" = "1" ]
    then
      ${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' called with no option for resource [${_CRS_NAME}] at `$DATE`" >>${TRACE_FILE_PATH}
      ${ECHO} "command line is $0 $*, pid is $$"                                                                   >>${TRACE_FILE_PATH}
      ${ECHO} "CRS env follows"                                                                                    >>${TRACE_FILE_PATH}
      ${ENV} | ${EGREP} "ORA|^_CRS_" | ${SORT}                                                                     >>${TRACE_FILE_PATH}
    fi
    exit 1
    ;;
esac

db_MYDW.sh:

#!/bin/sh
# questo script imposta le variabili ORA_HOME, ORA_SID, ORA_OWNER
# e chiama poi db_wrapper.sh

ECHO=/bin/echo

if [ ! $# -eq 1 ]
then
  ${ECHO} "Syntax: $0 ACTION"
  exit 666
fi

ORA_HOME=/u01/app/oracle/product/11.2.0/dbhome_1
ORA_SID=MYDW
ORA_OWNER=oracle
ACTION=$1

/opt/cluster/scripts/db_wrapper.sh   ${ORA_OWNER} ${ORA_HOME} ${ORA_SID}           ${ACTION}

Since my scripts will do sudo -u oracle (because I want oracle processes to run as the “oracle” user) I modified /etc/sudoers on both nodes to let “gripd” execute db.sh as “oracle”:

Defaults:grid !requiretty
grid      ALL = (oracle) NOPASSWD: /opt/cluster/scripts/lsnr.sh, /opt/cluster/scripts/db.sh

Now we can add the database as a clustered custom resource using:
crsctl add resource mydw.db -type cluster_resource -attr "ACTION_SCRIPT=/opt/cluster/scripts/db_MYDW.sh,CHECK_INTERVAL=30,RESTART_ATTEMPTS=3,AUTO_START=restore,CARDINALITY=1,DEGREE=1,OFFLINE_CHECK_INTERVAL=60,SCRIPT_TIMEOUT=180,START_TIMEOUT=180,STOP_TIMEOUT=180,START_DEPENDENCIES=hard(ora.DATA_DISK_GROUP.dg),STOP_DEPENDENCIES=hard(ora.DATA_DISK_GROUP.dg)"

6. A clustered file system

In this section I will cluster a logical volume to be mounted under /backup.
I created the logical volume /dev/mapper/vg02-bcklv
We will need root privileges to (u)mount it, so I will create some action scripts and I will add

Defaults:grid !requiretty
grid      ALL = NOPASSWD: /opt/cluster/scripts/fs.sh

in /etc/sudoers.

The action scripts are the following.

fs_backup:

#!/bin/sh
# questo script imposta le variabili USER_MOUNTER, MOUNT_POINT
# e chiama poi fs.sh

ECHO=/bin/echo

if [ ! $# -eq 1 ]
then
  ${ECHO} "Syntax: $0 ACTION"
  exit 666
fi

USER_MOUNTER=root
MOUNT_POINT=/backup
ACTION=$1

/opt/cluster/scripts/fs_wrapper.sh   ${USER_MOUNTER} ${MOUNT_POINT} ${ACTION}

fs_wrapper:

#!/bin/sh
#
# Copyright (c) 2013, YAWOD. All rights reserved.
# fs_wrapper.sh - action script for fs (wrapper)
# this script MUST receive three inputs:
#  $1 = USER_MOUNTER
#  $2 = MOUNT_POINT
#  $3 = start|stop|check|clean
# and will attempt to do sudo -u $USER_MOUNTER /opt/cluster/scripts/fs.sh   $MOUNT_POINT
# and to interpret the result

DATE=/bin/date
ECHO=/bin/echo
EGREP=/bin/egrep
ENV=/bin/env
GREP=/bin/grep
RM=/bin/rm
SORT=/bin/sort
SUDO=/usr/bin/sudo
TOUCH=/bin/touch

if [ ! $# -eq 3 ]
then
  ${ECHO} "Syntax: $0 USER_MOUNTER MOUNT_POINT ACTION"
  exit 666
fi

RESOURCE_NAME=${_CRS_NAME}
TRACE_FILE_PATH=/opt/cluster/scripts/fs_wrapper.${RESOURCE_NAME}.out
TRACE=1

USER_MOUNTER=$1
MOUNT_POINT=$2

if [ "${TRACE}" = "1" ]
then
  ${ECHO} " "                                                                                           >>${TRACE_FILE_PATH}
  ${ECHO} "================================================================="                           >>${TRACE_FILE_PATH}
  ${ECHO} " "                                                                                           >>${TRACE_FILE_PATH}
  ${ECHO} "USER_MOUNTER=${USER_MOUNTER}"                                                                >>${TRACE_FILE_PATH}
  ${ECHO} "MOUNT_POINT=${MOUNT_POINT}"                                                                  >>${TRACE_FILE_PATH}
  ${ECHO} " "                                                                                           >>${TRACE_FILE_PATH}
fi

case "$3" in
  start)
   #${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to START resource [${_CRS_NAME}]"
   #${ECHO} "command line is $0 $*, pid is $$"
    if [ "${TRACE}" = "1" ]
    then
      ${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to START resource [${_CRS_NAME}] at `$DATE`" >>${TRACE_FILE_PATH}
      ${ECHO} "command line is $0 $*, pid is $$"                                                        >>${TRACE_FILE_PATH}
      ${ECHO} "CRS env follows"                                                                         >>${TRACE_FILE_PATH}
      ${ENV} | ${EGREP} "ORA|^_CRS_" | ${SORT}                                                          >>${TRACE_FILE_PATH}
    fi
    ${SUDO} -u ${USER_MOUNTER} /opt/cluster/scripts/fs.sh ${MOUNT_POINT} start
    RSTAT=$?
    if [ "${TRACE}" = "1" ]
    then
      ${ECHO} "${SUDO} -u ${USER_MOUNTER} /opt/cluster/scripts/fs.sh ${MOUNT_POINT} start returned ${RSTAT}" >>${TRACE_FILE_PATH}
      ${ECHO} " "                                                                                            >>${TRACE_FILE_PATH}
    fi
    exit ${RSTAT}
    ;;

  stop)
   #${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to STOP resource [${_CRS_NAME}]"
   #${ECHO} "command line is $0 $*, pid is $$"
    if [ "${TRACE}" = "1" ]
    then
      ${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to STOP  resource [${_CRS_NAME}] at `$DATE`" >>${TRACE_FILE_PATH}
      ${ECHO} "command line is $0 $*, pid is $$"                                                        >>${TRACE_FILE_PATH}
      ${ECHO} "CRS env follows"                                                                         >>${TRACE_FILE_PATH}
      ${ENV} | ${EGREP} "ORA|^_CRS_" | ${SORT}                                                          >>${TRACE_FILE_PATH}
    fi
    ${SUDO} -u ${USER_MOUNTER} /opt/cluster/scripts/fs.sh ${MOUNT_POINT} stop
    RSTAT=$?
    if [ "${TRACE}" = "1" ]
    then
      ${ECHO} "${SUDO} -u ${USER_MOUNTER} /opt/cluster/scripts/fs.sh ${MOUNT_POINT} stop  returned ${RSTAT}" >>${TRACE_FILE_PATH}
      ${ECHO} " "                                                                                         >>${TRACE_FILE_PATH}
    fi
    exit ${RSTAT}
    ;;

  check)
   #${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to CHECK resource [${_CRS_NAME}]"
   #${ECHO} "command line is $0 $*, pid is $$"
    if [ "${TRACE}" = "1" ]
    then
      ${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to CHECK resource [${_CRS_NAME}] at `$DATE`" >>${TRACE_FILE_PATH}
      ${ECHO} "command line is $0 $*, pid is $$"                                                      >>${TRACE_FILE_PATH}
     #${ECHO} "CRS env follows"                                                                       >>${TRACE_FILE_PATH}
     #$ENV | $GREP ^_CRS_ | $SORT                                                                     >>${TRACE_FILE_PATH}
    fi
    ${SUDO} -u ${USER_MOUNTER} /opt/cluster/scripts/fs.sh ${MOUNT_POINT} check
    RSTAT=$?
    if [ "${TRACE}" = "1" ]
    then
      ${ECHO} "${SUDO} -u ${USER_MOUNTER} /opt/cluster/scripts/fs.sh ${MOUNT_POINT} check returned ${RSTAT}" >>${TRACE_FILE_PATH}
      ${ECHO} " "                                                                                            >>${TRACE_FILE_PATH}
    fi
    exit ${RSTAT}
    ;;

  clean)
   #${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to CLEAN resource [${_CRS_NAME}]"
   #${ECHO} "command line is $0 $*, pid is $$"
    if [ "${TRACE}" = "1" ]
    then
      ${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to CLEAN resource [${_CRS_NAME}] at `$DATE`" >>${TRACE_FILE_PATH}
      ${ECHO} "command line is $0 $*, pid is $$"                                                        >>${TRACE_FILE_PATH}
      ${ECHO} "CRS env follows"                                                                         >>${TRACE_FILE_PATH}
      ${ENV} | ${EGREP} "ORA|^_CRS_" | ${SORT}                                                          >>${TRACE_FILE_PATH}
    fi
    ${SUDO} -u ${USER_MOUNTER} /opt/cluster/scripts/fs.sh ${MOUNT_POINT} clean
    RSTAT=$?
    if [ "${TRACE}" = "1" ]
    then
      ${ECHO} "${SUDO} -u ${USER_MOUNTER} /opt/cluster/scripts/fs.sh ${MOUNT_POINT} clean returned ${RSTAT}" >>${TRACE_FILE_PATH}
      ${ECHO} " "                                                                                            >>${TRACE_FILE_PATH}
    fi
    exit ${RSTAT}
    ;;

  *)
    echo "Usage: $0 USER_MOUNTER MOUNT_POINT start|stop|check|clean , not $0 $*"
    if [ "${TRACE}" = "1" ]
    then
      ${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' called with no option for resource [${_CRS_NAME}] at `$DATE`" >>${TRACE_FILE_PATH}
      ${ECHO} "command line is $0 $*, pid is $$"                                                                   >>${TRACE_FILE_PATH}
      ${ECHO} "CRS env follows"                                                                                    >>${TRACE_FILE_PATH}
      ${ENV} | ${EGREP} "ORA|^_CRS_" | ${SORT}                                                                     >>${TRACE_FILE_PATH}
    fi
    exit 1
    ;;
esac

fs.sh:

#!/bin/sh
#
# Copyright (c) 2013, YAWOD. All rights reserved.
# fs.sh - action script for filesystem
# this script MUST receive three inputs:
#  $1 = MOUNT_POINT
#  $2 = start|stop|check|clean
# This script will assume that the current user can mount the device
# This script will attempt to do mount / umount
# and will not interpret the result

AWK=/bin/awk
CUT=/bin/cut
DATE=/bin/date
DIRNAME=/usr/bin/dirname
ECHO=/bin/echo
EGREP=/bin/egrep
ENV=/bin/env
GREP=/bin/grep
KILL=/bin/kill
LSOF=/usr/sbin/lsof
MOUNT=/bin/mount
PS=/bin/ps
REV=/usr/bin/rev
RM=/bin/rm
SLEEP=/bin/sleep
SORT=/bin/sort
SUDO=/usr/bin/sudo
TOUCH=/bin/touch
UMOUNT=/bin/umount
XARGS=/usr/bin/xargs
WC=/usr/bin/wc
WHOAMI=/usr/bin/whoami

if [ ! $# -eq 2 ]
then
  ${ECHO} "Syntax: $0 MOUNT_POINT ACTION"
  exit 666
fi

RESOURCE_NAME=${_CRS_NAME}

MOUNT_POINT=$1
ACTION=$2

case "${ACTION}" in
  start)
    ${MOUNT} ${MOUNT_POINT}
    RSTAT=$?
    exit ${RSTAT}
    ;;

  stop)
    ${UMOUNT} ${MOUNT_POINT}
    RSTAT=$?
    exit ${RSTAT}
    ;;

  check)
    MOUNT_COUNT=` ${MOUNT} | ${GREP} "${MOUNT_POINT} " | ${WC} -l `
    if [ "${MOUNT_COUNT}" = "0" ]
    then
      # device not mounted
      RSTAT=1
    else
      # device mounted
      RSTAT=0
    fi
    exit ${RSTAT}
    ;;

  clean)
    # to clean we try to kill any process using the fs and we umount -f
   #lsof /backup/ | awk ' {  print $2 } ' | grep -v ^PID | xargs kill -9
    ${ECHO} "${LSOF} ${MOUNT_POINT} | ${AWK} ' { print \$2 } ' | ${GREP} -v ^PID | ${XARGS} ${KILL} -9 "
    ${LSOF} ${MOUNT_POINT} | ${AWK} ' { print $2 } ' | ${GREP} -v ^PID | ${XARGS} ${KILL} -9
    ${UMOUNT} -f ${MOUNT_POINT}
    RSTAT=$?
    exit ${RSTAT}
    ;;

  *)
    echo "Usage: $0 MOUNT_POINT start|stop|check|clean, not $0 $*"
    exit 1
    ;;
esac

We can add the filesystem to the cluster, as usual:
crsctl add resource mydw.bck -type cluster_resource -attr "ACTION_SCRIPT=/opt/cluster/scripts/fs_backup.sh,CHECK_INTERVAL=60,RESTART_ATTEMPTS=3,AUTO_START=restore,CARDINALITY=1,DEGREE=1,OFFLINE_CHECK_INTERVAL=60,SCRIPT_TIMEOUT=30,START_TIMEOUT=30,STOP_TIMEOUT=30"

We can add a dependency between the database and this filesystem:

crsctl modify resource mydw.db -attr "START_DEPENDENCIES='hard(ora.DATA_DISK_GROUP.dg,mydw.bck) pullup(mydw.bck)',STOP_DEPENDENCIES='hard(ora.DATA_DISK_GROUP.dg,mydw.bck)'"

7. a simple way to start and stop all the clustered resources on the same node

Whose who are familiar with other vendors’ clusterwares may want to put all the resources we configured into a “set”.
(Well, in this particular case the resource mydw.db depends on a clustered ASM disk group and we shouldn’t change the way ASM is clustered… so what we are looking for is a convenient way to start/stop all mydw* resources).

We can use an empty resource called “mydw” and place dependecies such that crsctl stop resource mydw -f  will stop all mydw* resources, and we can use an empty resource called “mydw.all” and place dependecies such that crsctl start resource mydw.all will start all mydw* resources.

I will use “file” resources to build these empty resources: the clustered resource will be an empty file.
The script will be /opt/cluster/scripts/file.sh:

#!/bin/sh
#
# Copyright (c) 2013, YAWOD. All rights reserved.
# file.sh - empty action script for "file" cluster resources
# Usefult to plat with CRS and
# to create empty resources to be used as resource groups

DATE=/bin/date
ECHO=/bin/echo
ENV=/bin/env
GREP=/bin/grep
RM=/bin/rm
SORT=/bin/sort
TOUCH=/bin/touch

RESOURCE_NAME=${_CRS_NAME}
FLAG_FILE_PATH=/opt/cluster/scripts/file.${RESOURCE_NAME}.flag
TRACE_FILE_PATH=/opt/cluster/scripts/file.${RESOURCE_NAME}.out
TRACE=1

case "$1" in
  start)
    ${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to START resource [${_CRS_NAME}]"
    ${ECHO} "command line is $0 $*, pid is $$"
    if [ "${TRACE}" = "1" ]
    then
      ${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to START resource [${_CRS_NAME}] at `$DATE`" >>${TRACE_FILE_PATH}
      ${ECHO} "command line is $0 $*, pid is $$"                                                        >>${TRACE_FILE_PATH}
      ${ECHO} "CRS env follows"                                                                         >>${TRACE_FILE_PATH}
      ${ENV} | ${GREP} ^_CRS_ | ${SORT}                                                                 >>${TRACE_FILE_PATH}
    fi
    $TOUCH $FLAG_FILE_PATH
    exit 0
    ;;

  stop)
    ${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to STOP resource [${_CRS_NAME}]"
    ${ECHO} "command line is $0 $*, pid is $$"
    if [ "${TRACE}" = "1" ]
    then
      ${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to STOP  resource [${_CRS_NAME}] at `$DATE`" >>${TRACE_FILE_PATH}
      ${ECHO} "command line is $0 $*, pid is $$"                                                        >>${TRACE_FILE_PATH}
      ${ECHO} "CRS env follows"                                                                         >>${TRACE_FILE_PATH}
      ${ENV} | ${GREP} ^_CRS_ | ${SORT}                                                                 >>${TRACE_FILE_PATH}
    fi
    $RM $FLAG_FILE_PATH
    exit 0
    ;;

  check)
    ${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to CHECK resource [${_CRS_NAME}]"
    ${ECHO} "command line is $0 $*, pid is $$"
    if [ "${TRACE}" = "1" ]
    then
      ${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to CHECK resource [${_CRS_NAME}] at `$DATE`" >>${TRACE_FILE_PATH}
      ${ECHO} "command line is $0 $*, pid is $$"                                                    >>${TRACE_FILE_PATH}
     #${ECHO} "CRS env follows"                                                                     >>${TRACE_FILE_PATH}
     #$ENV | $GREP ^_CRS_ | $SORT                                                                 >>${TRACE_FILE_PATH}
    fi
    if [ -f ${FLAG_FILE_PATH} ]
    then
      exit 0
    else
      exit 1
    fi
    ;;

  clean)
    ${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to CLEAN resource [${_CRS_NAME}]"
    ${ECHO} "command line is $0 $*, pid is $$"
    if [ "${TRACE}" = "1" ]
    then
      ${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to CLEAN resource [${_CRS_NAME}] at `$DATE`" >>${TRACE_FILE_PATH}
      ${ECHO} "command line is $0 $*, pid is $$"                                                        >>${TRACE_FILE_PATH}
      ${ECHO} "CRS env follows"                                                                         >>${TRACE_FILE_PATH}
      ${ENV} | ${GREP} ^_CRS_ | ${SORT}                                                                 >>${TRACE_FILE_PATH}
    fi
    $RM -f $FLAG_FILE_PATH
    exit 0
    ;;

  *)
    echo $"Usage: $0 {start|stop|check|clean}"
    if [ "${TRACE}" = "1" ]
    then
      ${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' called with no option for resource [${_CRS_NAME}] at `$DATE`" >>${TRACE_FILE_PATH}
      ${ECHO} "command line is $0 $*, pid is $$"                                                                   >>${TRACE_FILE_PATH}
      ${ECHO} "CRS env follows"                                                                                    >>${TRACE_FILE_PATH}
      ${ENV} | ${GREP} ^_CRS_ | ${SORT}                                                                            >>${TRACE_FILE_PATH}
    fi
    exit 1
    ;;
esac

We can create the resources and place dependencies using crsctl, as usual:


$ crsctl add    resource mydw       -type cluster_resource -attr "ACTION_SCRIPT=/opt/cluster/scripts/file.sh,                CHECK_INTERVAL=60, RESTART_ATTEMPTS=3, AUTO_START=restore, CARDINALITY=1, DEGREE=1, OFFLINE_CHECK_INTERVAL=600, SCRIPT_TIMEOUT=10,  START_TIMEOUT=10,  STOP_TIMEOUT=10  "
$ crsctl start  resource mydw

$ crsctl modify resource mydw.lvbck                        -attr "START_DEPENDENCIES='hard(mydw) pullup(mydw)', STOP_DEPENDENCIES=hard(mydw)"
$ crsctl modify resource mydw.db                           -attr "START_DEPENDENCIES='hard(ora.DATA_DISK_GROUP.dg, mydw.lvbck, mydw) pullup(mydw, mydw.lvbck)', STOP_DEPENDENCIES='hard(ora.DATA_DISK_GROUP.dg, mydw, mydw.lvbck)'"
# /u01/app/11.2.0/grid/bin/crsctl modify resource mydw.mpls.vip                     -attr "START_DEPENDENCIES='hard(ora.net2.network, mydw) pullup(ora.net2.network, mydw)', STOP_DEPENDENCIES='hard(ora.net2.network, mydw)'"
$ crsctl modify resource mydw.lsnr                         -attr "START_DEPENDENCIES='hard(mydw.mpls.vip, mydw.erog.vip, mydw) pullup(mydw.mpls.vip, mydw.erog.vip, mydw)', STOP_DEPENDENCIES='hard(mydw.mpls.vip, mydw.erog.vip, mydw)'"
$ crsctl modify resource mydw.em                           -attr "START_DEPENDENCIES='hard(mydw.db, mydw.lsnr, mydw.erog.vip, mydw) pullup(mydw.db, mydw.lsnr, mydw.erog.vip, mydw)', STOP_DEPENDENCIES='hard(mydw.db, mydw.lsnr, mydw.erog.vip, mydw)'"

$ crsctl add    resource mydw.all   -type cluster_resource -attr "ACTION_SCRIPT=/opt/cluster/scripts/file.sh,                CHECK_INTERVAL=60, RESTART_ATTEMPTS=3, AUTO_START=restore, CARDINALITY=1, DEGREE=1, OFFLINE_CHECK_INTERVAL=600, SCRIPT_TIMEOUT=10,  START_TIMEOUT=10,  STOP_TIMEOUT=10,  START_DEPENDENCIES='hard(mydw, mydw.lvbck, mydw.db, mydw.erog.vip, mydw.mpls.vip, mydw.lsnr, mydw.em) pullup(mydw, mydw.lvbck, mydw.db, mydw.erog.vip, mydw.mpls.vip, mydw.lsnr, mydw.em)', STOP_DEPENDENCIES='hard(mydw, mydw.lvbck, mydw.db, mydw.erog.vip, mydw.mpls.vip, mydw.lsnr, mydw.em)'"

Now you can start together database, VIP and listener with crsctl start resource mydw.all and you can stop together database, VIP and listener with crsctl stop resource mydw -f.
The clusterware will manage the dependencies and will start/stop the resources in the correct order.

8. Enterprise Manager Database Console

We should not try to configure the database console: it is not supported on cold failover clusters, so we should use the Enterprise Manager Grid Control.

However, if you want to use the database console, you can configure it setting ORACLE_HOSTNAME to yout VIP’s network name, unsecure it, and cluster it as follows:
crsctl add resource mydw.em -type cluster_resource -attr "ACTION_SCRIPT=/opt/cluster/scripts/em_mydw.sh, CHECK_INTERVAL=60, RESTART_ATTEMPTS=3, AUTO_START=restore, CARDINALITY=1, DEGREE=1, OFFLINE_CHECK_INTERVAL=600, SCRIPT_TIMEOUT=180, START_TIMEOUT=180, STOP_TIMEOUT=180, START_DEPENDENCIES='hard(mydw.db, mydw.lsnr)', STOP_DEPENDENCIES='hard(mydw.db, mydw.lsnr)'"

That’s all folks!

Posted in oracle | Tagged | Leave a comment

Hello world!

Welcome to WordPress.com. After you read this, you should delete and write your own post, with a new title above. Or hit Add New on the left (of the admin dashboard) to start a fresh post.

Here are some suggestions for your first post.

  1. You can find new ideas for what to blog about by reading the Daily Post.
  2. Add PressThis to your browser. It creates a new blog post for you about any interesting  page you read on the web.
  3. Make some changes to this page, and then hit preview on the right. You can always preview any post or edit it before you share it to the world.
Posted in Uncategorized | 1 Comment