I just configured an Oracle cluster with Oracle Grid Infrastructure 11.2.0.3 for single-instance high availability (cold failover cluster) and I decided to write down what I did.
I chose to use Oracle Grid Infrastructure:
- because it is supported (but the clustering scripts remain your own responsibility: see MOS note 790189.1)
- because it is free if you use it to protect some licensed Oracle product (see the “clusterware” paragraph in Oracle Database Licensing Information 11g Release 2 (11.2))
- because it is powerful: you can configure dependencies, affinities, anti-affinities, load balancing, … I will not use all of these features, but it is nice to know that we have such a powerful tool.
The first step is to install Oracle GI 11gR2. I will not explain this point because it is quite an ordinary job.
I will use a “grid” user to install the GI and an “oracle” user to manage the database, so I will use the “job role separation” installation explained in Oracle® Grid Infrastructure Installation Guide 11g Release 2 (11.2) for Linux – Advanced Installation Oracle Grid Infrastructure for a Cluster Preinstallation Tasks.
This is not mandatory, but since the database will be manually clustered I think this setup would be more appropriate.
I will use Oracle Grid Infrastructure 11.2.0.3 with the latest PSU, but this example should apply to a generic 11gR2 setup.
The second step is RTFM.
The third step is to read again chapter 6 of TFM.
Now we can start to play with the design.
We will need:
- a shared (global) storage, or a clustered storage
- a single-instance database installation, on every cluster node
- a clustered virtual IP (VIP)
- a clustered listener to listen on the VIP
- a clustered single-instance database
- a clustered filesystem to host backups
- a simple way to start and stop all the clustered resources on the same node
- (optional) an Enterprise Manager Database Console
1. Shared (global) storage, or clustered storage
We have some possible choices:
- Grid Infrastructure’s ASM storage
- OCFS2 (a clustered filesystem, always shared between all cluster nodes)
- A simple filesystem, manually clustered
I will go with the first option. I like ASM, we already have it working (it’s included in the GI), and we will see that it helps us avoid the “split-brain syndrome”. You only need to configure a disk group.
A simple, non-cluster-aware filesystem is a possible choice too, but:
- you will have to cluster it yourself
- if you mistakenly mount the same device on both nodes you will probably corrupt the filesystem in a few seconds.
This is why I prefer to place the database inside ASM.
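For completeness, a disk group can be created directly from the ASM instance; here is a minimal sketch run as the grid user (the ASM SID, the disk path and the external redundancy are assumptions tied to my storage, adjust them to yours):
$ export ORACLE_SID=+ASM1
$ export ORACLE_HOME=/u01/app/11.2.0/grid
$ $ORACLE_HOME/bin/sqlplus / as sysasm
SQL> create diskgroup DATA_DISK_GROUP external redundancy
  2  disk '/dev/oracleasm/disks/DATA01';
SQL> -- mount it on the remaining nodes too: alter diskgroup DATA_DISK_GROUP mount;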
We may still decide to use a simple filesystem to host RMAN backups and exports, to make them easy to copy elsewhere; we will come back to this point later.
2. Single instance database installation
Simply install Oracle Database software on every cluster node.
We could also install the Oracle Database software in a shared location. This option would allow you to cluster the database console as per MOS note 362524.1, but I prefer to keep the binaries on local volumes. If you really want the dbconsole you can try to configure it in an unsupported way (using http, not https, on the database VIP), or you can use the (free) Enterprise Manager Grid Control to manage your clustered database.
I installed the Oracle Database software on both nodes. Both installations must be identical. After the installation, check that $ORACLE_HOME/bin/oracle is owned by oracle:asmadmin with permissions set to -rwsr-s--x, to avoid the problems described in MOS note 1054033.1.
You should repeat this check after every patch installation on the database home.
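A quick sketch of the check, and of the commands to restore the expected setting if a patch resets it (6751 is the octal form of -rwsr-s--x):
$ ls -l $ORACLE_HOME/bin/oracle
# expected: -rwsr-s--x 1 oracle asmadmin ... oracle
# if the owner or the permissions are wrong, as root:
# chown oracle:asmadmin $ORACLE_HOME/bin/oracle
# chmod 6751 $ORACLE_HOME/bin/oracle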
3. A clustered virtual IP
We will configure a virtual IP which will be used to connect to the database. Following the documentation, you just have to choose an IP address and a network name, make sure your VIP’s name (in my case mydw-mpls-vip) resolves consistently from all cluster nodes and from all clients in your network, and then run as root:
# /u01/app/11.2.0/grid/bin/appvipcfg create -network=2 -ip=192.168.236.143 -vipname=mydw.vip -user=grid
Note that you may have to change -network=2 into -network=1 to suit your network number, and that -vipname does not need to be the network name associated with the VIP; but since it will be the name of the cluster resource, it should be at least similar to it.
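Once created, the VIP is an ordinary cluster resource, so a quick sanity check is to start it and see where it is running (the resource name is the -vipname used above):
# /u01/app/11.2.0/grid/bin/crsctl start resource mydw.vip
# /u01/app/11.2.0/grid/bin/crsctl status resource mydw.vip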
4. A listener to listen on the VIP
Create a $ORACLE_HOME/network/admin/listener.ora on every cluster node like this:
LISTENER_MYDW =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS =
(PROTOCOL = TCP)
(HOST = mydw-mpls-vip)
(PORT = 1522)
)
(ADDRESS =
(PROTOCOL = IPC)
(KEY = EXTPROC1522)
)
)
)
ADR_BASE_LISTENER_MYDW = /u01/app/oracle
If you use netca you will be asked whether you want to configure a cluster listener in the Grid Infrastructure home. I didn’t do this because I want a simple listener to run on a single node, listening on a single IP address.
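On the client side, connections should go through the VIP, never through the physical host names. A sample tnsnames.ora entry could look like this (the service name is an assumption, use whatever your database registers):
MYDW =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = mydw-mpls-vip)(PORT = 1522))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = MYDW)
    )
  )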
To let the clusterware manage the listener we need an action script.
The action script must accept a single parameter to start|stop|check|clean a resource.
It would be nice to build a generic, parametrized action script for listeners, but we cannot add custom attributes to cluster resources. We could follow Andrejs Karpovs’s way, but application resources are deprecated in 11gR2, so I preferred a different approach.
I will use:
- a generic script called lsnr.sh owned by oracle to receive some parameters and start|stop|check|clean a listener
- a wrapper script owned by grid to do sudo -u oracle lsnr.sh ORA_HOME LSNR_NAME and interpret the result
- a master script to set some variables (ORA_OWNER, ORA_HOME, LSNR_NAME) and call the wrapper
In this way I will use the master script to start|stop|check|clean the resource, but I will put all the code logic in the generic parametrized scripts lsnr.sh and lsnr_wrapper.sh.
And here they are:
lsnr.sh:
#!/bin/sh
#
# Copyright (c) 2013, YAWOD. All rights reserved.
# lsnr.sh - action script for listener
# this script MUST receive three inputs:
# $1 = ORACLE_HOME
# $2 = LISTENER_NAME
# $3 = start|stop|check|clean
# This script will assume that the ORACLE_OWNER is the current user
# This script will attempt to do $ORACLE_HOME/bin/lsnrctl ...
# and will not interpret the result
AWK=/bin/awk
DATE=/bin/date
ECHO=/bin/echo
EGREP=/bin/egrep
ENV=/bin/env
GREP=/bin/grep
KILL=/bin/kill
PS=/bin/ps
RM=/bin/rm
SORT=/bin/sort
SUDO=/usr/bin/sudo
TOUCH=/bin/touch
WC=/usr/bin/wc
WHOAMI=/usr/bin/whoami
if [ ! $# -eq 3 ]
then
${ECHO} "Syntax: $0 ORA_HOME LSNR_NAME ACTION"
exit 666
fi
RESOURCE_NAME=${_CRS_NAME}
ORACLE_OWNER=`${WHOAMI}`
ORACLE_HOME=$1 ; export ORACLE_HOME
LISTENER_NAME=$2
ACTION=$3
case "${ACTION}" in
start)
${ORACLE_HOME}/bin/lsnrctl start ${LISTENER_NAME}
RSTAT=$?
exit ${RSTAT}
;;
stop)
${ORACLE_HOME}/bin/lsnrctl stop ${LISTENER_NAME}
RSTAT=$?
exit ${RSTAT}
;;
check)
PROCESS_COUNT=`${PS} -ef | ${GREP} "^${ORACLE_OWNER} " | ${GREP} " ${ORACLE_HOME}/bin/[t]nslsnr ${LISTENER_NAME}" | ${WC} -l`
if [ "${PROCESS_COUNT}" = "0" ]
then
# no process found - exit
RSTAT=1
else
RSTAT=`${ORACLE_HOME}/bin/lsnrctl status ${LISTENER_NAME} | ${GREP} TNS- | ${WC} -l`
fi
exit ${RSTAT}
;;
clean)
# to clean we try to gracefully stop the listener and afterward we try to KILL it
${ORACLE_HOME}/bin/lsnrctl stop ${LISTENER_NAME}
TNSLSNR_PID=`${PS} -ef | ${GREP} "^${ORACLE_OWNER} " | ${GREP} " ${ORACLE_HOME}/bin/[t][n][s]lsnr ${LISTENER_NAME} " | ${AWK} ' { print \$2 } ' `
if [ ! "x${TNSLSNR_PID}" = "x" ]
then
${ECHO} "killing process ${TNSLSNR_PID} !!!"
${KILL} -9 ${TNSLSNR_PID}
else
${ECHO} "tnslsnr process not found"
fi
exit 0
;;
*)
echo "Usage: $0 ORA_HOME LISTENER_NAME start|stop|check|clean, not $0 $*"
exit 1
;;
esac
lsnr_wrapper.sh:
#!/bin/sh
#
# Copyright (c) 2013, YAWOD. All rights reserved.
# lsnr_wrapper.sh - action script for listener (wrapper)
# this script MUST receive four inputs:
# $1 = ORACLE_OWNER
# $2 = ORACLE_HOME
# $3 = LISTENER_NAME
# $4 = start|stop|check|clean
# and will attempt to do sudo -u $ORA_OWNER /opt/cluster/scripts/lsnr.sh $ORA_HOME $LISTENER_NAME
# and to interpret the result
DATE=/bin/date
ECHO=/bin/echo
EGREP=/bin/egrep
ENV=/bin/env
GREP=/bin/grep
RM=/bin/rm
SORT=/bin/sort
SUDO=/usr/bin/sudo
TOUCH=/bin/touch
if [ ! $# -eq 4 ]
then
${ECHO} "Syntax: $0 ORA_OWNER ORA_HOME LSNR_NAME ACTION"
exit 666
fi
RESOURCE_NAME=${_CRS_NAME}
TRACE_FILE_PATH=/opt/cluster/scripts/lsnr_wrapper.${RESOURCE_NAME}.out
TRACE=1
ORA_OWNER=$1
ORA_HOME=$2
ORA_LISTENER_NAME=$3
if [ "${TRACE}" = "1" ]
then
${ECHO} " " >>${TRACE_FILE_PATH}
${ECHO} "=================================================================" >>${TRACE_FILE_PATH}
${ECHO} " " >>${TRACE_FILE_PATH}
${ECHO} "ORA_OWNER=${ORA_OWNER}" >>${TRACE_FILE_PATH}
${ECHO} "ORA_HOME=${ORA_HOME}" >>${TRACE_FILE_PATH}
${ECHO} "ORA_LISTENER_NAME=${ORA_LISTENER_NAME}" >>${TRACE_FILE_PATH}
${ECHO} " " >>${TRACE_FILE_PATH}
fi
case "$4" in
start)
#${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to START resource [${_CRS_NAME}]"
#${ECHO} "command line is $0 $*, pid is $$"
if [ "${TRACE}" = "1" ]
then
${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to START resource [${_CRS_NAME}] at `$DATE`" >>${TRACE_FILE_PATH}
${ECHO} "command line is $0 $*, pid is $$" >>${TRACE_FILE_PATH}
${ECHO} "CRS env follows" >>${TRACE_FILE_PATH}
${ENV} | ${EGREP} "ORA|^_CRS_" | ${SORT} >>${TRACE_FILE_PATH}
fi
${SUDO} -u ${ORA_OWNER} /opt/cluster/scripts/lsnr.sh ${ORA_HOME} ${ORA_LISTENER_NAME} start
RSTAT=$?
if [ "${TRACE}" = "1" ]
then
${ECHO} "${SUDO} -u ${ORA_OWNER} /opt/cluster/scripts/lsnr.sh ${ORA_HOME} ${ORA_LISTENER_NAME} start returned ${RSTAT}" >> ${TRACE_FILE_PATH}
${ECHO} " " >>${TRACE_FILE_PATH}
fi
exit ${RSTAT}
;;
stop)
#${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to STOP resource [${_CRS_NAME}]"
#${ECHO} "command line is $0 $*, pid is $$"
if [ "${TRACE}" = "1" ]
then
${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to STOP resource [${_CRS_NAME}] at `$DATE`" >>${TRACE_FILE_PATH}
${ECHO} "command line is $0 $*, pid is $$" >>${TRACE_FILE_PATH}
${ECHO} "CRS env follows" >>${TRACE_FILE_PATH}
${ENV} | ${EGREP} "ORA|^_CRS_" | ${SORT} >>${TRACE_FILE_PATH}
fi
${SUDO} -u ${ORA_OWNER} /opt/cluster/scripts/lsnr.sh ${ORA_HOME} ${ORA_LISTENER_NAME} stop
RSTAT=$?
if [ "${TRACE}" = "1" ]
then
${ECHO} "${SUDO} -u ${ORA_OWNER} /opt/cluster/scripts/lsnr.sh ${ORA_HOME} ${ORA_LISTENER_NAME} stop returned ${RSTAT}" >> ${TRACE_FILE_PATH}
${ECHO} " " >>${TRACE_FILE_PATH}
fi
exit ${RSTAT}
;;
check)
#${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to CHECK resource [${_CRS_NAME}]"
#${ECHO} "command line is $0 $*, pid is $$"
if [ "${TRACE}" = "1" ]
then
${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to CHECK resource [${_CRS_NAME}] at `$DATE`" >>${TRACE_FILE_PATH}
${ECHO} "command line is $0 $*, pid is $$" >>${TRACE_FILE_PATH}
#${ECHO} "CRS env follows" >>${TRACE_FILE_PATH}
#$ENV | $GREP ^_CRS_ | $SORT >>${TRACE_FILE_PATH}
fi
${SUDO} -u ${ORA_OWNER} /opt/cluster/scripts/lsnr.sh ${ORA_HOME} ${ORA_LISTENER_NAME} check
RSTAT=$?
if [ "${TRACE}" = "1" ]
then
${ECHO} "${SUDO} -u ${ORA_OWNER} /opt/cluster/scripts/lsnr.sh ${ORA_HOME} ${ORA_LISTENER_NAME} check returned ${RSTAT}" >> ${TRACE_FILE_PATH}
${ECHO} " " >>${TRACE_FILE_PATH}
fi
exit ${RSTAT}
;;
clean)
#${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to CLEAN resource [${_CRS_NAME}]"
#${ECHO} "command line is $0 $*, pid is $$"
if [ "${TRACE}" = "1" ]
then
${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to CLEAN resource [${_CRS_NAME}] at `$DATE`" >>${TRACE_FILE_PATH}
${ECHO} "command line is $0 $*, pid is $$" >>${TRACE_FILE_PATH}
${ECHO} "CRS env follows" >>${TRACE_FILE_PATH}
${ENV} | ${EGREP} "ORA|^_CRS_" | ${SORT} >>${TRACE_FILE_PATH}
fi
${SUDO} -u ${ORA_OWNER} /opt/cluster/scripts/lsnr.sh ${ORA_HOME} ${ORA_LISTENER_NAME} clean
RSTAT=$?
if [ "${TRACE}" = "1" ]
then
${ECHO} "${SUDO} -u ${ORA_OWNER} /opt/cluster/scripts/lsnr.sh ${ORA_HOME} ${ORA_LISTENER_NAME} clean returned ${RSTAT}" >> ${TRACE_FILE_PATH}
${ECHO} " " >>${TRACE_FILE_PATH}
fi
exit ${RSTAT}
;;
*)
echo "Usage: $0 ORA_OWNER ORA_HOME LISTENER_NAME start|stop|check|clean , not $0 $*"
if [ "${TRACE}" = "1" ]
then
${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' called with no option for resource [${_CRS_NAME}] at `$DATE`" >>${TRACE_FILE_PATH}
${ECHO} "command line is $0 $*, pid is $$" >>${TRACE_FILE_PATH}
${ECHO} "CRS env follows" >>${TRACE_FILE_PATH}
${ENV} | ${EGREP} "ORA|^_CRS_" | ${SORT} >>${TRACE_FILE_PATH}
fi
exit 1
;;
esac
lsnr_LISTENER_MYDW.sh:
#!/bin/sh
# this script only sets the variables ORA_HOME, ORA_LISTENER_NAME, ORA_OWNER
# before calling lsnr_wrapper.sh
ECHO=/bin/echo
if [ ! $# -eq 1 ]
then
${ECHO} "Syntax: $0 ACTION"
exit 666
fi
ORA_HOME=/u01/app/oracle/product/11.2.0/dbhome_1
ORA_LISTENER_NAME=LISTENER_MYDW
ORA_OWNER=oracle
ACTION=$1
/opt/cluster/scripts/lsnr_wrapper.sh ${ORA_OWNER} ${ORA_HOME} ${ORA_LISTENER_NAME} ${ACTION}
Since my scripts will do sudo -u oracle (because I want oracle processes to run as the “oracle” user) I added the following lines to /etc/sudoers on both nodes:
Defaults:grid !requiretty
grid ALL = (oracle) NOPASSWD: /opt/cluster/scripts/lsnr.sh
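Before registering anything in the clusterware it is worth running the master script by hand as the grid user, to verify both the sudo rule and the whole script chain (with the logic above, an exit status of 0 from check means the listener is up and answering):
$ /opt/cluster/scripts/lsnr_LISTENER_MYDW.sh start
$ /opt/cluster/scripts/lsnr_LISTENER_MYDW.sh check ; echo "check returned $?"
$ /opt/cluster/scripts/lsnr_LISTENER_MYDW.sh stop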
Now we can add the listener to the cluster using:
crsctl add resource mydw.lsnr -type cluster_resource -attr "ACTION_SCRIPT=/opt/cluster/scripts/lsnr_LISTENER_MYDW.sh, CHECK_INTERVAL=30, RESTART_ATTEMPTS=3, AUTO_START=restore, CARDINALITY=1, DEGREE=1, OFFLINE_CHECK_INTERVAL=60, SCRIPT_TIMEOUT=60, START_TIMEOUT=60, STOP_TIMEOUT=60, START_DEPENDENCIES='hard(mydw.mpls-vip.vip) pullup(mydw.mpls-vip.vip)', STOP_DEPENDENCIES='hard(mydw.mpls-vip.vip)'"
and we can manage it using crsctl start|stop|status resource mydw.lsnr.
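Thanks to the hard dependency, relocating the listener should bring the VIP along, which makes for a quick failover test:
$ crsctl relocate resource mydw.lsnr -f
$ crsctl status resource mydw.lsnr
$ crsctl status resource mydw.mpls-vip.vip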
5. A clustered single-instance database
Create the database on just one node without configuring the dbconsole.
After database creation, use srvctl to delete the database from the OCR:
$ srvctl remove database -d MYDW
(if you wonder why, read MOS note 790189.1).
If you check the alert.log you will find messages like these:
NOTE: Loaded library: /opt/oracle/extapi/64/asm/orcl/1/libasm.so
NOTE: Loaded library: System
SUCCESS: diskgroup DATA_DISK_GROUP was mounted
Mon Mar 11 16:03:54 2013
ERROR: failed to establish dependency between database MYDW and diskgroup resource ora.DATA_DISK_GROUP.dg
My guess is that we cannot avoid these messages, because we are using ASM from a database which will remain unknown to the clusterware.
I placed the spfile in ASM, so I had to put a pfile on all nodes in $ORACLE_HOME/dbs containing only the entry spfile='+DATA_DISK_GROUP/MYDW/spfileMYDW.ora'.
If you copy the pfile, the password file and the oratab entry to all remaining cluster nodes you will be able to manually start and stop the database on every cluster node.
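For reference, the pfile contains that single line; assuming the standard file names (init<SID>.ora and orapw<SID>) and a second node called node2, copying everything over is just:
$ cat $ORACLE_HOME/dbs/initMYDW.ora
spfile='+DATA_DISK_GROUP/MYDW/spfileMYDW.ora'
$ scp -p $ORACLE_HOME/dbs/initMYDW.ora $ORACLE_HOME/dbs/orapwMYDW node2:$ORACLE_HOME/dbs/
# remember the MYDW entry in /etc/oratab on the second node as well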
If you try to manually start the database on more than one node at the same time you will get ASM errors: this is why I told you that ASM helps avoid the split-brain syndrome!
Now we need action scripts to manage the database. Here they are:
db.sh:
#!/bin/sh
#
# Copyright (c) 2013, YAWOD. All rights reserved.
# db.sh - action script for the database
# this script MUST receive three inputs:
# $1 = ORACLE_HOME
# $2 = ORACLE_SID
# $3 = start|stop|check|clean
# This script will assume that the ORACLE_OWNER is the current user
# This script will attempt to do $ORACLE_HOME/bin/sqlplus ...
# and will not interpret the result
AWK=/bin/awk
CUT=/bin/cut
DATE=/bin/date
DIRNAME=/usr/bin/dirname
ECHO=/bin/echo
EGREP=/bin/egrep
ENV=/bin/env
GREP=/bin/grep
KILL=/bin/kill
PS=/bin/ps
REV=/usr/bin/rev
RM=/bin/rm
SLEEP=/bin/sleep
SORT=/bin/sort
SUDO=/usr/bin/sudo
TOUCH=/bin/touch
WC=/usr/bin/wc
WHOAMI=/usr/bin/whoami
if [ ! $# -eq 3 ]
then
${ECHO} "Syntax: $0 ORA_HOME ORA_SID ACTION"
exit 666
fi
RESOURCE_NAME=${_CRS_NAME}
ORACLE_OWNER=`${WHOAMI}`
ORACLE_HOME=$1 ; export ORACLE_HOME
ORACLE_SID=$2 ; export ORACLE_SID
ACTION=$3
case "${ACTION}" in
start)
${ORACLE_HOME}/bin/sqlplus -L " / as sysdba" <<eofstart
startup
exit ;
eofstart
RSTAT=$?
exit ${RSTAT}
;;
stop)
${ORACLE_HOME}/bin/sqlplus -L " / as sysdba" <<eofstop
shutdown immediate
exit ;
eofstop
RSTAT=$?
exit ${RSTAT}
;;
check)
PROCESS_COUNT=`${PS} -ef | ${GREP} "^${ORACLE_OWNER} " | ${GREP} " ora_pmon_${ORACLE_SID}$" | ${WC} -l`
if [ "${PROCESS_COUNT}" = "0" ]
then
# no pmon process found - the instance is down
RSTAT=1
else
# the instance is running: try to connect with a bogus account and look for ORA-01017
TMP_OUTPUT_FILE=/tmp/db_check.${ORACLE_SID}.$$
${ORACLE_HOME}/bin/sqlplus -L /nolog >${TMP_OUTPUT_FILE} 2>&1 <<eofcheck
connect Francesco/TOTTI
exit ;
eofcheck
if [ `${GREP} ^ORA-01017 ${TMP_OUTPUT_FILE} | ${WC} -l ` -ge 1 ]
then
# we got ORA-01017: invalid username/password: the db is UP!
RSTAT=0
else
# we got some different error so the database was unable to check our account: the db is not available!
RSTAT=2
fi
${RM} ${TMP_OUTPUT_FILE}
fi
exit ${RSTAT}
;;
clean)
# to clean we try to shutdown abort and afterwards we kill -9 pmon
${ORACLE_HOME}/bin/sqlplus -L " / as sysdba" <<eofclean
shutdown abort
exit ;
eofclean
PMON_PID=`${PS} -ef | ${GREP} "^${ORACLE_OWNER} " | ${GREP} " ora_pmon_${ORACLE_SID}$" | ${AWK} ' { print \$2 } '`
if [ ! "x${PMON_PID}" = "x" ]
then
${ECHO} "killing process ${PMON_PID} !!!"
${KILL} -9 ${PMON_PID}
else
${ECHO} "pmon process not found"
fi
exit 0
;;
*)
echo "Usage: $0 ORA_HOME ORA_SID start|stop|check|clean, not $0 $*"
exit 1
;;
esac
db_wrapper.sh:
#!/bin/sh
#
# Copyright (c) 2013, YAWOD. All rights reserved.
# db_wrapper.sh - action script for database (wrapper)
# this script MUST receive four inputs:
# $1 = ORACLE_OWNER
# $2 = ORACLE_HOME
# $3 = ORACLE_SID
# $4 = start|stop|check|clean
# and will attempt to do sudo -u $ORA_OWNER /opt/cluster/scripts/db.sh $ORA_HOME $ORA_SID
# and to interpret the result
DATE=/bin/date
ECHO=/bin/echo
EGREP=/bin/egrep
ENV=/bin/env
GREP=/bin/grep
RM=/bin/rm
SORT=/bin/sort
SUDO=/usr/bin/sudo
TOUCH=/bin/touch
if [ ! $# -eq 4 ]
then
${ECHO} "Syntax: $0 ORA_OWNER ORA_HOME ORA_SID ACTION"
exit 666
fi
RESOURCE_NAME=${_CRS_NAME}
TRACE_FILE_PATH=/opt/cluster/scripts/db_wrapper.${RESOURCE_NAME}.out
TRACE=1
ORA_OWNER=$1
ORA_HOME=$2
ORA_SID=$3
if [ "${TRACE}" = "1" ]
then
${ECHO} " " >>${TRACE_FILE_PATH}
${ECHO} "=================================================================" >>${TRACE_FILE_PATH}
${ECHO} " " >>${TRACE_FILE_PATH}
${ECHO} "ORA_OWNER=${ORA_OWNER}" >>${TRACE_FILE_PATH}
${ECHO} "ORA_HOME=${ORA_HOME}" >>${TRACE_FILE_PATH}
${ECHO} "ORA_SID=${ORA_SID}" >>${TRACE_FILE_PATH}
${ECHO} " " >>${TRACE_FILE_PATH}
fi
case "$4" in
start)
#${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to START resource [${_CRS_NAME}]"
#${ECHO} "command line is $0 $*, pid is $$"
if [ "${TRACE}" = "1" ]
then
${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to START resource [${_CRS_NAME}] at `$DATE`" >>${TRACE_FILE_PATH}
${ECHO} "command line is $0 $*, pid is $$" >>${TRACE_FILE_PATH}
${ECHO} "CRS env follows" >>${TRACE_FILE_PATH}
${ENV} | ${EGREP} "ORA|^_CRS_" | ${SORT} >>${TRACE_FILE_PATH}
fi
${SUDO} -u ${ORA_OWNER} /opt/cluster/scripts/db.sh ${ORA_HOME} ${ORA_SID} start
RSTAT=$?
if [ "${TRACE}" = "1" ]
then
${ECHO} "${SUDO} -u ${ORA_OWNER} /opt/cluster/scripts/db.sh ${ORA_HOME} ${ORA_SID} start returned ${RSTAT}" >>${TRACE_FILE_PATH}
${ECHO} " " >>${TRACE_FILE_PATH}
fi
exit ${RSTAT}
;;
stop)
#${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to STOP resource [${_CRS_NAME}]"
#${ECHO} "command line is $0 $*, pid is $$"
if [ "${TRACE}" = "1" ]
then
${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to STOP resource [${_CRS_NAME}] at `$DATE`" >>${TRACE_FILE_PATH}
${ECHO} "command line is $0 $*, pid is $$" >>${TRACE_FILE_PATH}
${ECHO} "CRS env follows" >>${TRACE_FILE_PATH}
${ENV} | ${EGREP} "ORA|^_CRS_" | ${SORT} >>${TRACE_FILE_PATH}
fi
${SUDO} -u ${ORA_OWNER} /opt/cluster/scripts/db.sh ${ORA_HOME} ${ORA_SID} stop
RSTAT=$?
if [ "${TRACE}" = "1" ]
then
${ECHO} "${SUDO} -u ${ORA_OWNER} /opt/cluster/scripts/db.sh ${ORA_HOME} ${ORA_SID} stop returned ${RSTAT}" >>${TRACE_FILE_PATH}
${ECHO} " " >>${TRACE_FILE_PATH}
fi
exit ${RSTAT}
;;
check)
#${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to CHECK resource [${_CRS_NAME}]"
#${ECHO} "command line is $0 $*, pid is $$"
if [ "${TRACE}" = "1" ]
then
${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to CHECK resource [${_CRS_NAME}] at `$DATE`" >>${TRACE_FILE_PATH}
${ECHO} "command line is $0 $*, pid is $$" >>${TRACE_FILE_PATH}
#${ECHO} "CRS env follows" >>${TRACE_FILE_PATH}
#$ENV | $GREP ^_CRS_ | $SORT >>${TRACE_FILE_PATH}
fi
${SUDO} -u ${ORA_OWNER} /opt/cluster/scripts/db.sh ${ORA_HOME} ${ORA_SID} check
RSTAT=$?
if [ "${TRACE}" = "1" ]
then
${ECHO} "${SUDO} -u ${ORA_OWNER} /opt/cluster/scripts/db.sh ${ORA_HOME} ${ORA_SID} check returned ${RSTAT}" >>${TRACE_FILE_PATH}
${ECHO} " " >>${TRACE_FILE_PATH}
fi
exit ${RSTAT}
;;
clean)
#${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to CLEAN resource [${_CRS_NAME}]"
#${ECHO} "command line is $0 $*, pid is $$"
if [ "${TRACE}" = "1" ]
then
${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to CLEAN resource [${_CRS_NAME}] at `$DATE`" >>${TRACE_FILE_PATH}
${ECHO} "command line is $0 $*, pid is $$" >>${TRACE_FILE_PATH}
${ECHO} "CRS env follows" >>${TRACE_FILE_PATH}
${ENV} | ${EGREP} "ORA|^_CRS_" | ${SORT} >>${TRACE_FILE_PATH}
fi
${SUDO} -u ${ORA_OWNER} /opt/cluster/scripts/db.sh ${ORA_HOME} ${ORA_SID} clean
RSTAT=$?
if [ "${TRACE}" = "1" ]
then
${ECHO} "${SUDO} -u ${ORA_OWNER} /opt/cluster/scripts/db.sh ${ORA_HOME} ${ORA_SID} clean returned ${RSTAT}" >>${TRACE_FILE_PATH}
${ECHO} " " >>${TRACE_FILE_PATH}
fi
exit ${RSTAT}
;;
*)
echo "Usage: $0 ORA_OWNER ORA_HOME ORA_SID start|stop|check|clean , not $0 $*"
if [ "${TRACE}" = "1" ]
then
${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' called with no option for resource [${_CRS_NAME}] at `$DATE`" >>${TRACE_FILE_PATH}
${ECHO} "command line is $0 $*, pid is $$" >>${TRACE_FILE_PATH}
${ECHO} "CRS env follows" >>${TRACE_FILE_PATH}
${ENV} | ${EGREP} "ORA|^_CRS_" | ${SORT} >>${TRACE_FILE_PATH}
fi
exit 1
;;
esac
db_MYDW.sh:
#!/bin/sh
# this script only sets the variables ORA_HOME, ORA_SID, ORA_OWNER
# before calling db_wrapper.sh
ECHO=/bin/echo
if [ ! $# -eq 1 ]
then
${ECHO} "Syntax: $0 ACTION"
exit 666
fi
ORA_HOME=/u01/app/oracle/product/11.2.0/dbhome_1
ORA_SID=MYDW
ORA_OWNER=oracle
ACTION=$1
/opt/cluster/scripts/db_wrapper.sh ${ORA_OWNER} ${ORA_HOME} ${ORA_SID} ${ACTION}
Since my scripts will do sudo -u oracle (because I want oracle processes to run as the “oracle” user) I modified /etc/sudoers on both nodes to let “grid” execute db.sh as “oracle”:
Defaults:grid !requiretty
grid ALL = (oracle) NOPASSWD: /opt/cluster/scripts/lsnr.sh, /opt/cluster/scripts/db.sh
Now we can add the database as a clustered custom resource using:
crsctl add resource mydw.db -type cluster_resource -attr "ACTION_SCRIPT=/opt/cluster/scripts/db_MYDW.sh,CHECK_INTERVAL=30,RESTART_ATTEMPTS=3,AUTO_START=restore,CARDINALITY=1,DEGREE=1,OFFLINE_CHECK_INTERVAL=60,SCRIPT_TIMEOUT=180,START_TIMEOUT=180,STOP_TIMEOUT=180,START_DEPENDENCIES=hard(ora.DATA_DISK_GROUP.dg),STOP_DEPENDENCIES=hard(ora.DATA_DISK_GROUP.dg)"
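At this point a start and a forced relocation make a good first failover test; thanks to the hard dependency the clusterware will make sure the disk group is mounted wherever the database lands:
$ crsctl start resource mydw.db
$ crsctl status resource mydw.db
$ crsctl relocate resource mydw.db -f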
6. A clustered file system
In this section I will cluster a logical volume to be mounted under /backup.
I created the logical volume /dev/mapper/vg02-bcklv on the shared storage.
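Since fs.sh (shown below) will mount the volume by mount point only, both nodes need a matching /etc/fstab entry, marked noauto so that nothing mounts it automatically at boot (the filesystem type here is just an assumption):
/dev/mapper/vg02-bcklv  /backup  ext4  noauto  0 0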
We will need root privileges to (u)mount it, so I will create some action scripts and I will add
Defaults:grid !requiretty
grid ALL = NOPASSWD: /opt/cluster/scripts/fs.sh
in /etc/sudoers on both nodes.
The action scripts are the following.
fs_backup.sh:
#!/bin/sh
# this script only sets the variables USER_MOUNTER and MOUNT_POINT
# before calling fs_wrapper.sh
ECHO=/bin/echo
if [ ! $# -eq 1 ]
then
${ECHO} "Syntax: $0 ACTION"
exit 666
fi
USER_MOUNTER=root
MOUNT_POINT=/backup
ACTION=$1
/opt/cluster/scripts/fs_wrapper.sh ${USER_MOUNTER} ${MOUNT_POINT} ${ACTION}
fs_wrapper.sh:
#!/bin/sh
#
# Copyright (c) 2013, YAWOD. All rights reserved.
# fs_wrapper.sh - action script for fs (wrapper)
# this script MUST receive three inputs:
# $1 = USER_MOUNTER
# $2 = MOUNT_POINT
# $3 = start|stop|check|clean
# and will attempt to do sudo -u $USER_MOUNTER /opt/cluster/scripts/fs.sh $MOUNT_POINT
# and to interpret the result
DATE=/bin/date
ECHO=/bin/echo
EGREP=/bin/egrep
ENV=/bin/env
GREP=/bin/grep
RM=/bin/rm
SORT=/bin/sort
SUDO=/usr/bin/sudo
TOUCH=/bin/touch
if [ ! $# -eq 3 ]
then
${ECHO} "Syntax: $0 USER_MOUNTER MOUNT_POINT ACTION"
exit 666
fi
RESOURCE_NAME=${_CRS_NAME}
TRACE_FILE_PATH=/opt/cluster/scripts/fs_wrapper.${RESOURCE_NAME}.out
TRACE=1
USER_MOUNTER=$1
MOUNT_POINT=$2
if [ "${TRACE}" = "1" ]
then
${ECHO} " " >>${TRACE_FILE_PATH}
${ECHO} "=================================================================" >>${TRACE_FILE_PATH}
${ECHO} " " >>${TRACE_FILE_PATH}
${ECHO} "USER_MOUNTER=${USER_MOUNTER}" >>${TRACE_FILE_PATH}
${ECHO} "MOUNT_POINT=${MOUNT_POINT}" >>${TRACE_FILE_PATH}
${ECHO} " " >>${TRACE_FILE_PATH}
fi
case "$3" in
start)
#${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to START resource [${_CRS_NAME}]"
#${ECHO} "command line is $0 $*, pid is $$"
if [ "${TRACE}" = "1" ]
then
${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to START resource [${_CRS_NAME}] at `$DATE`" >>${TRACE_FILE_PATH}
${ECHO} "command line is $0 $*, pid is $$" >>${TRACE_FILE_PATH}
${ECHO} "CRS env follows" >>${TRACE_FILE_PATH}
${ENV} | ${EGREP} "ORA|^_CRS_" | ${SORT} >>${TRACE_FILE_PATH}
fi
${SUDO} -u ${USER_MOUNTER} /opt/cluster/scripts/fs.sh ${MOUNT_POINT} start
RSTAT=$?
if [ "${TRACE}" = "1" ]
then
${ECHO} "${SUDO} -u ${USER_MOUNTER} /opt/cluster/scripts/fs.sh ${MOUNT_POINT} start returned ${RSTAT}" >>${TRACE_FILE_PATH}
${ECHO} " " >>${TRACE_FILE_PATH}
fi
exit ${RSTAT}
;;
stop)
#${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to STOP resource [${_CRS_NAME}]"
#${ECHO} "command line is $0 $*, pid is $$"
if [ "${TRACE}" = "1" ]
then
${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to STOP resource [${_CRS_NAME}] at `$DATE`" >>${TRACE_FILE_PATH}
${ECHO} "command line is $0 $*, pid is $$" >>${TRACE_FILE_PATH}
${ECHO} "CRS env follows" >>${TRACE_FILE_PATH}
${ENV} | ${EGREP} "ORA|^_CRS_" | ${SORT} >>${TRACE_FILE_PATH}
fi
${SUDO} -u ${USER_MOUNTER} /opt/cluster/scripts/fs.sh ${MOUNT_POINT} stop
RSTAT=$?
if [ "${TRACE}" = "1" ]
then
${ECHO} "${SUDO} -u ${USER_MOUNTER} /opt/cluster/scripts/fs.sh ${MOUNT_POINT} stop returned ${RSTAT}" >>${TRACE_FILE_PATH}
${ECHO} " " >>${TRACE_FILE_PATH}
fi
exit ${RSTAT}
;;
check)
#${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to CHECK resource [${_CRS_NAME}]"
#${ECHO} "command line is $0 $*, pid is $$"
if [ "${TRACE}" = "1" ]
then
${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to CHECK resource [${_CRS_NAME}] at `$DATE`" >>${TRACE_FILE_PATH}
${ECHO} "command line is $0 $*, pid is $$" >>${TRACE_FILE_PATH}
#${ECHO} "CRS env follows" >>${TRACE_FILE_PATH}
#$ENV | $GREP ^_CRS_ | $SORT >>${TRACE_FILE_PATH}
fi
${SUDO} -u ${USER_MOUNTER} /opt/cluster/scripts/fs.sh ${MOUNT_POINT} check
RSTAT=$?
if [ "${TRACE}" = "1" ]
then
${ECHO} "${SUDO} -u ${USER_MOUNTER} /opt/cluster/scripts/fs.sh ${MOUNT_POINT} check returned ${RSTAT}" >>${TRACE_FILE_PATH}
${ECHO} " " >>${TRACE_FILE_PATH}
fi
exit ${RSTAT}
;;
clean)
#${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to CLEAN resource [${_CRS_NAME}]"
#${ECHO} "command line is $0 $*, pid is $$"
if [ "${TRACE}" = "1" ]
then
${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to CLEAN resource [${_CRS_NAME}] at `$DATE`" >>${TRACE_FILE_PATH}
${ECHO} "command line is $0 $*, pid is $$" >>${TRACE_FILE_PATH}
${ECHO} "CRS env follows" >>${TRACE_FILE_PATH}
${ENV} | ${EGREP} "ORA|^_CRS_" | ${SORT} >>${TRACE_FILE_PATH}
fi
${SUDO} -u ${USER_MOUNTER} /opt/cluster/scripts/fs.sh ${MOUNT_POINT} clean
RSTAT=$?
if [ "${TRACE}" = "1" ]
then
${ECHO} "${SUDO} -u ${USER_MOUNTER} /opt/cluster/scripts/fs.sh ${MOUNT_POINT} clean returned ${RSTAT}" >>${TRACE_FILE_PATH}
${ECHO} " " >>${TRACE_FILE_PATH}
fi
exit ${RSTAT}
;;
*)
echo "Usage: $0 USER_MOUNTER MOUNT_POINT start|stop|check|clean , not $0 $*"
if [ "${TRACE}" = "1" ]
then
${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' called with no option for resource [${_CRS_NAME}] at `$DATE`" >>${TRACE_FILE_PATH}
${ECHO} "command line is $0 $*, pid is $$" >>${TRACE_FILE_PATH}
${ECHO} "CRS env follows" >>${TRACE_FILE_PATH}
${ENV} | ${EGREP} "ORA|^_CRS_" | ${SORT} >>${TRACE_FILE_PATH}
fi
exit 1
;;
esac
fs.sh:
#!/bin/sh
#
# Copyright (c) 2013, YAWOD. All rights reserved.
# fs.sh - action script for filesystem
# this script MUST receive three inputs:
# $1 = MOUNT_POINT
# $2 = start|stop|check|clean
# This script will assume that the current user can mount the device
# This script will attempt to do mount / umount
# and will not interpret the result
AWK=/bin/awk
CUT=/bin/cut
DATE=/bin/date
DIRNAME=/usr/bin/dirname
ECHO=/bin/echo
EGREP=/bin/egrep
ENV=/bin/env
GREP=/bin/grep
KILL=/bin/kill
LSOF=/usr/sbin/lsof
MOUNT=/bin/mount
PS=/bin/ps
REV=/usr/bin/rev
RM=/bin/rm
SLEEP=/bin/sleep
SORT=/bin/sort
SUDO=/usr/bin/sudo
TOUCH=/bin/touch
UMOUNT=/bin/umount
XARGS=/usr/bin/xargs
WC=/usr/bin/wc
WHOAMI=/usr/bin/whoami
if [ ! $# -eq 2 ]
then
${ECHO} "Syntax: $0 MOUNT_POINT ACTION"
exit 666
fi
RESOURCE_NAME=${_CRS_NAME}
MOUNT_POINT=$1
ACTION=$2
case "${ACTION}" in
start)
${MOUNT} ${MOUNT_POINT}
RSTAT=$?
exit ${RSTAT}
;;
stop)
${UMOUNT} ${MOUNT_POINT}
RSTAT=$?
exit ${RSTAT}
;;
check)
MOUNT_COUNT=` ${MOUNT} | ${GREP} "${MOUNT_POINT} " | ${WC} -l `
if [ "${MOUNT_COUNT}" = "0" ]
then
# device not mounted
RSTAT=1
else
# device mounted
RSTAT=0
fi
exit ${RSTAT}
;;
clean)
# to clean we try to kill any process using the fs and we umount -f
#lsof /backup/ | awk ' { print $2 } ' | grep -v ^PID | xargs kill -9
${ECHO} "${LSOF} ${MOUNT_POINT} | ${AWK} ' { print \$2 } ' | ${GREP} -v ^PID | ${XARGS} ${KILL} -9 "
${LSOF} ${MOUNT_POINT} | ${AWK} ' { print $2 } ' | ${GREP} -v ^PID | ${XARGS} ${KILL} -9
${UMOUNT} -f ${MOUNT_POINT}
RSTAT=$?
exit ${RSTAT}
;;
*)
echo "Usage: $0 MOUNT_POINT start|stop|check|clean, not $0 $*"
exit 1
;;
esac
We can add the filesystem to the cluster, as usual:
crsctl add resource mydw.bck -type cluster_resource -attr "ACTION_SCRIPT=/opt/cluster/scripts/fs_backup.sh,CHECK_INTERVAL=60,RESTART_ATTEMPTS=3,AUTO_START=restore,CARDINALITY=1,DEGREE=1,OFFLINE_CHECK_INTERVAL=60,SCRIPT_TIMEOUT=30,START_TIMEOUT=30,STOP_TIMEOUT=30"
We can add a dependency between the database and this filesystem:
crsctl modify resource mydw.db -attr "START_DEPENDENCIES='hard(ora.DATA_DISK_GROUP.dg,mydw.bck) pullup(mydw.bck)',STOP_DEPENDENCIES='hard(ora.DATA_DISK_GROUP.dg,mydw.bck)'"
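You can check that the clusterware recorded the new dependencies by printing the resource profile:
$ crsctl status resource mydw.db -p | grep DEPENDENCIES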
7. A simple way to start and stop all the clustered resources on the same node
Those who are familiar with other vendors’ clusterware may want to put all the resources we configured into a “set”.
(Well, in this particular case the resource mydw.db depends on a clustered ASM disk group and we shouldn’t change the way ASM is clustered… so what we are really looking for is a convenient way to start/stop all mydw* resources.)
We can use an empty resource called “mydw” and place dependencies such that crsctl stop resource mydw -f will stop all mydw* resources, and an empty resource called “mydw.all” with dependencies such that crsctl start resource mydw.all will start all mydw* resources.
I will use “file” resources to build these empty resources: the clustered resource will be an empty file.
The script will be /opt/cluster/scripts/file.sh:
#!/bin/sh
#
# Copyright (c) 2013, YAWOD. All rights reserved.
# file.sh - empty action script for "file" cluster resources
# Useful to play with CRS and
# to create empty resources to be used as resource groups
DATE=/bin/date
ECHO=/bin/echo
ENV=/bin/env
GREP=/bin/grep
RM=/bin/rm
SORT=/bin/sort
TOUCH=/bin/touch
RESOURCE_NAME=${_CRS_NAME}
FLAG_FILE_PATH=/opt/cluster/scripts/file.${RESOURCE_NAME}.flag
TRACE_FILE_PATH=/opt/cluster/scripts/file.${RESOURCE_NAME}.out
TRACE=1
case "$1" in
start)
${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to START resource [${_CRS_NAME}]"
${ECHO} "command line is $0 $*, pid is $$"
if [ "${TRACE}" = "1" ]
then
${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to START resource [${_CRS_NAME}] at `$DATE`" >>${TRACE_FILE_PATH}
${ECHO} "command line is $0 $*, pid is $$" >>${TRACE_FILE_PATH}
${ECHO} "CRS env follows" >>${TRACE_FILE_PATH}
${ENV} | ${GREP} ^_CRS_ | ${SORT} >>${TRACE_FILE_PATH}
fi
$TOUCH $FLAG_FILE_PATH
exit 0
;;
stop)
${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to STOP resource [${_CRS_NAME}]"
${ECHO} "command line is $0 $*, pid is $$"
if [ "${TRACE}" = "1" ]
then
${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to STOP resource [${_CRS_NAME}] at `$DATE`" >>${TRACE_FILE_PATH}
${ECHO} "command line is $0 $*, pid is $$" >>${TRACE_FILE_PATH}
${ECHO} "CRS env follows" >>${TRACE_FILE_PATH}
${ENV} | ${GREP} ^_CRS_ | ${SORT} >>${TRACE_FILE_PATH}
fi
$RM $FLAG_FILE_PATH
exit 0
;;
check)
${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to CHECK resource [${_CRS_NAME}]"
${ECHO} "command line is $0 $*, pid is $$"
if [ "${TRACE}" = "1" ]
then
${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to CHECK resource [${_CRS_NAME}] at `$DATE`" >>${TRACE_FILE_PATH}
${ECHO} "command line is $0 $*, pid is $$" >>${TRACE_FILE_PATH}
#${ECHO} "CRS env follows" >>${TRACE_FILE_PATH}
#$ENV | $GREP ^_CRS_ | $SORT >>${TRACE_FILE_PATH}
fi
if [ -f ${FLAG_FILE_PATH} ]
then
exit 0
else
exit 1
fi
;;
clean)
${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to CLEAN resource [${_CRS_NAME}]"
${ECHO} "command line is $0 $*, pid is $$"
if [ "${TRACE}" = "1" ]
then
${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' asked to CLEAN resource [${_CRS_NAME}] at `$DATE`" >>${TRACE_FILE_PATH}
${ECHO} "command line is $0 $*, pid is $$" >>${TRACE_FILE_PATH}
${ECHO} "CRS env follows" >>${TRACE_FILE_PATH}
${ENV} | ${GREP} ^_CRS_ | ${SORT} >>${TRACE_FILE_PATH}
fi
$RM -f $FLAG_FILE_PATH
exit 0
;;
*)
echo $"Usage: $0 {start|stop|check|clean}"
if [ "${TRACE}" = "1" ]
then
${ECHO} "Action script '${_CRS_ACTION_SCRIPT}' called with no option for resource [${_CRS_NAME}] at `$DATE`" >>${TRACE_FILE_PATH}
${ECHO} "command line is $0 $*, pid is $$" >>${TRACE_FILE_PATH}
${ECHO} "CRS env follows" >>${TRACE_FILE_PATH}
${ENV} | ${GREP} ^_CRS_ | ${SORT} >>${TRACE_FILE_PATH}
fi
exit 1
;;
esac
We can create the resources and place dependencies using crsctl, as usual:
$ crsctl add resource mydw -type cluster_resource -attr "ACTION_SCRIPT=/opt/cluster/scripts/file.sh, CHECK_INTERVAL=60, RESTART_ATTEMPTS=3, AUTO_START=restore, CARDINALITY=1, DEGREE=1, OFFLINE_CHECK_INTERVAL=600, SCRIPT_TIMEOUT=10, START_TIMEOUT=10, STOP_TIMEOUT=10 "
$ crsctl start resource mydw
$ crsctl modify resource mydw.bck -attr "START_DEPENDENCIES='hard(mydw) pullup(mydw)', STOP_DEPENDENCIES=hard(mydw)"
$ crsctl modify resource mydw.db -attr "START_DEPENDENCIES='hard(ora.DATA_DISK_GROUP.dg, mydw.bck, mydw) pullup(mydw, mydw.bck)', STOP_DEPENDENCIES='hard(ora.DATA_DISK_GROUP.dg, mydw, mydw.bck)'"
# /u01/app/11.2.0/grid/bin/crsctl modify resource mydw.mpls.vip -attr "START_DEPENDENCIES='hard(ora.net2.network, mydw) pullup(ora.net2.network, mydw)', STOP_DEPENDENCIES='hard(ora.net2.network, mydw)'"
$ crsctl modify resource mydw.lsnr -attr "START_DEPENDENCIES='hard(mydw.mpls.vip, mydw.erog.vip, mydw) pullup(mydw.mpls.vip, mydw.erog.vip, mydw)', STOP_DEPENDENCIES='hard(mydw.mpls.vip, mydw.erog.vip, mydw)'"
$ crsctl modify resource mydw.em -attr "START_DEPENDENCIES='hard(mydw.db, mydw.lsnr, mydw.erog.vip, mydw) pullup(mydw.db, mydw.lsnr, mydw.erog.vip, mydw)', STOP_DEPENDENCIES='hard(mydw.db, mydw.lsnr, mydw.erog.vip, mydw)'"
$ crsctl add resource mydw.all -type cluster_resource -attr "ACTION_SCRIPT=/opt/cluster/scripts/file.sh, CHECK_INTERVAL=60, RESTART_ATTEMPTS=3, AUTO_START=restore, CARDINALITY=1, DEGREE=1, OFFLINE_CHECK_INTERVAL=600, SCRIPT_TIMEOUT=10, START_TIMEOUT=10, STOP_TIMEOUT=10, START_DEPENDENCIES='hard(mydw, mydw.bck, mydw.db, mydw.erog.vip, mydw.mpls.vip, mydw.lsnr, mydw.em) pullup(mydw, mydw.bck, mydw.db, mydw.erog.vip, mydw.mpls.vip, mydw.lsnr, mydw.em)', STOP_DEPENDENCIES='hard(mydw, mydw.bck, mydw.db, mydw.erog.vip, mydw.mpls.vip, mydw.lsnr, mydw.em)'"
Now you can start database, VIP and listener all together with crsctl start resource mydw.all, and stop them all together with crsctl stop resource mydw -f.
The clusterware will manage the dependencies and will start/stop the resources in the correct order.
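To see the whole group at a glance you can filter the resource listing by name ("co" is the "contains" filter operator):
$ crsctl status resource -w "NAME co mydw"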
8. Enterprise Manager Database Console
We should not configure the database console: it is not supported on cold failover clusters, so we should use the Enterprise Manager Grid Control instead.
However, if you really want the database console, you can configure it by setting ORACLE_HOSTNAME to your VIP’s network name, unsecure it, and cluster it as follows:
crsctl add resource mydw.em -type cluster_resource -attr "ACTION_SCRIPT=/opt/cluster/scripts/em_mydw.sh, CHECK_INTERVAL=60, RESTART_ATTEMPTS=3, AUTO_START=restore, CARDINALITY=1, DEGREE=1, OFFLINE_CHECK_INTERVAL=600, SCRIPT_TIMEOUT=180, START_TIMEOUT=180, STOP_TIMEOUT=180, START_DEPENDENCIES='hard(mydw.db, mydw.lsnr)', STOP_DEPENDENCIES='hard(mydw.db, mydw.lsnr)'"
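For the record, the ORACLE_HOSTNAME trick could look roughly like this, run as the oracle user (emca will prompt for ports and passwords, details omitted, and remember this is an unsupported setup):
$ export ORACLE_SID=MYDW
$ export ORACLE_HOSTNAME=mydw-mpls-vip
$ emca -config dbcontrol db -repos create
$ emctl unsecure dbconsole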
That’s all folks!