RAC benefit and characteristics- does not protect from human errrors- increased availabilty from node/instance failure- speed up parallel DSS queries- no speed up parallel OLTP processes- no availability increase on data failures- no availability increase on network failures- no availability increase on release upgrades- no scalability increased for applications workloads in all cases
RAC tuning - After migration to RAC test: - Interconnect latency - Instance recovery time - Application strongly relying on table truncates, full scan tables, sequences and non-sequences key generation, global context variables
RAC specific background processes for the database instanceCluster Synchronization Service (CSS) ocssd daemon, manages cluster configuration Cluster Ready Services (CRS) manages resources(listeners, VIPs, Global Service Daemon GSD, Oracle Notification Service ONS) crsd daemon backup the OCR every for hours, configuration is stored in OCREvent Manager (EVM) evmd daemon, publish eventsLMSn coordinate block updatesLMON global enqueue for shared locksLMDn manages requests for global enqueuesLCK0 handle resources not requiring Cache FusionDIAG collect diagnostic info
FAN Fast Application Notification-
Must connect using serviceLogged to:&ORA_CRS_HOME/racg/dump $ORA_CRS_HOHE/log/
Oracle Notification Service ONS- Transmits FAN events-
For every FAN event status change, all executables in $ORA_CRS_HOME/racg/usrco are launched (callout scripts) The ONS process is $ORA_CRS_HOME/opmn/bin/ons Arguments: -d: Run in daemon mode -a
FCF Fast Connection Failover-
A JDBC application configured to use FCF automatically subscribes to FAN events- A JDBC application must use service names to connect- A JDBC application must use implicit connection cache- $ORACLE_HOME/opmn/lib/ons.jar must be in classpath- -Doracle.ons.oraclehome -
Check for main Clusterware services up #check Event Manager upps -ef | grep evmd#check Cluster Synchronization Services upps -ef | grep ocssd #check Cluster Ready Services upps -ef | grep crsd#check Oracle Notification Serviceps -ef | grep ons[/etc/inittab]...hl:35:respawn:/etc/init.d/init.evmd run >/dev/null 2>&l
OCR - Oracle Cluster Registry [/etc/oracle/ocr.loc](10g) or [/etc/oracle/srvConfig.loc](9i, still exists in 10g for compatibility)ocrconfig_loc=/dev/raw/rawl ocrmirrorconfig_loc=/dev/raw/raw2 local_only=FALSE OCRCONFIG - Command-line tool for managing Oracle Cluster Registry #recover OCR logically, must be done on all nodes ocrconfig -import exp.dmp #export OCR content logically ocrconfig -export #recover OCR from OCR backup ocrconfig -restore bck.ocr #show backup status #crsd daemon backup the OCR every for hours, the most recent backup file is backup00.ocr ocrconfig -showbackup londonl 2005/08/04 11:15:29 /uOl/app/oracle/product/lO.2.0/crs/cdata/crs londonl 2005/08/03 22:24:32 /uOl/app/oracle/product/10.2.0/crs/cdata/crs #change OCR autobackuo location ocrconfig -backuploc #must be run on each affected node ocrconfig -repair ocr
Pre install, prerequisite(./run)cluvfy : run from install media or CRS_HOME, verify prerequisites on all nodesPost installation- Backup root.sh- Set up other user accounts- Verify Enterprise Manager / Cluster Registry by running srvctl config database -d db_name
SRVCTLStores infos in OCR, manages:Database, Instance, Service, Node applications, ASM, Listener srvctl config database -d
ServicesChanges are recorded in OCR only! Must use DBMS_SERVICE to update the dictionarysrvctl start service -d RAC -s "SERVICE1,SERVICE2"srvctl status service -d RAC -s "SERVICE1,SERVICE2"srvctl stop service -d RAC -s "SERVICE1,SERVICE2" -fsrvctl disable service -d RAC -s "SERVICE2" -i RAC4srvctl remove service -d RAC -s "SERVICE2"#relocate from RAC2 to RAC4srvctl relocate service -d RAC -s "SERVICE2" -i RAC2 -t RAC4#preferred RAC1,RAC2 and available RAC3,RAC4#-P PRECONNECT automatically creates a ERP and ERP_PRECONNECT service to use as BACKUP in tns_names#See TNSnames configuration#the service is NOT started, must be started manually (dbca do it automatically)srvctl add service -d ERP -s SERVICE2 -i "RAC1,RAC2" -a "RAC3,RAC4" -P PRECONNECT#show configuration, -a shows TAF confsrvctl config service -d RAC -a#modify an existing servicesrvctl modify service -d RACDB -s "SERVICE1" -i "RAC1,RAC2" -a "RAC3,RAC4"srvctl stop service -d RACDB -s "SERVICE1"srvctl start service -d RACDB -s "SERVICE1"ViewsGV$SERVICES GV$ACTIVE_SERVICES GV$SERVICEMETRICGV$SERVICEMETRIC_HISTORYGV$SERVICE_WAIT_CLASSGV$SERVICE_EVENTGV$SERVICE_STATSGV$SERV_MOD_ACT_STATS
SQL for RACselect * from V$ACTIVE_INSTANCES;Cache Fusion - GRD Global Resource DirectoryGES(Global Enqueue Service)GCS(Global Cache Service)Data Guard & RAC- Configuration files at primary location can be stored in any shared ASM diskgroup, on shared raw devices, on any shared cluster file system. They simply have to be sharedAdding a new node- Configure hardware and OS- Reconfigure listeners with NETCA- $ORA_CRS_HOME/bin/addnode.sh
VIP virtual IP- Both application/RAC VIP fail over if related application fail and accept new connections- Recommended RAC VIP sharing among database instances but not among different applications because...- ...VIP fail over if the application fail over- A failed over VIP application accepts new connection- Each VIP requires an unused and resolvable IP address- VIP address should be registered in DNS- VIP address should be on the same subnet of the public network- VIPs are used to prevent connection requests timeout during client connection attemptsChanging a VIP1- Stop VIP dependent cluster components on one node2- Make changes on DNS3- Change VIP using SRVCTL4- Restart VIP dependent components5- Repeat above on remaining nodes
oifcfgallocating and deallocating network interfaces, get values from OCRTo display a list of networksoifcfg getifeth1 192.168.1.0 global cluster_interconnecteth0 192.168.0.0 global publicdisplay a list of current subnetsoifcfg iflist etho 147.43.1.0 ethl 192.168.1.0 To include a description of the subnet, specify the -p option: oifcfg iflist -p ethO 147.43.1.0 UNKNOWN ethl 192.168.1.0 PRIVATE In 10.2 public interfaces are UNKNOWN. To include the subnet mask, append the -n option to the -p option: oifcfg if list -p -n etho 147.43.1.0 UNKNOWN 255.255.255.0 ethl 192.168.1.0 PRIVATE 255.255.255.0
Db parameters with SAME VALUE across all instancesactive_instance_countarchive_lag_targetcompatiblecluster_database RAC paramcluster_database_instance RAC param#Define network interfaces that will be used for interconnect#it is not a failover but a redistribution. If an address not work then stop all#Overrides the OCRcluster_interconnects RAC param = 192.168.0.10; 192.168.0.11; ...control_filesdb_block_sizedb_domaindb_filesdb_namedb_recovery_file_destdb_recovery_file_dest_sizedb_unique_namedml_locks (when 0)instance_type (rdbms or asm)max_commit_propagation_delay RAC paramparallel_max_serversremote_login_password_filetrace_enabled#cannot be mixed AUTO and MANUAL in a RACundo_managementDb parameters with INSTANCE specific VALUE across all instancesinstance_nameinstance_numberthreadundo_tablespace #system paramListener parameterslocal_listener='(ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.0.13) (PORT = 1521)))' #allow pmon to register with local listener when not using 1521 portremote_listener = '(ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.2.9) (PORT = 1521)) (ADDRESS = (PROTOCOL = TCP)(HOST =192.168.2.10)(PORT = 1521)))' #make the listener aware of the load of the listeners of other nodesImportant Rac Parametersgc_files_to_locks #other than default disable Cache Fusionrecovery_parallelism #number of redo application server processes in instance or media recoveryRac and Standby parametersdg_broker_config_file1 #shared between primary and standby instancesdg_broker_config_file2 #different from dg_broker_config_file1, shared between primary and standby instances
Forum section
Shared contentsdatafiles, controlfiles, spfiles, redo logShared or local? RAW_Dev File_Syst ASM NFS OCFS- Datafiles : shared mandatory- Control files : shared mandatory- Redo log : shared mandatory- SPfile : shared mandatory- OCR and vote : shared mandatory Y Y N- Archived log : shared not mandatory. N Y N Y- Undo : local- Flash Recovery : shared Y Y Y- Data Guard broker conf.: shared(prim. & stdby) Y Y
Adding logfile thread groups for a new instance#To support a new instance on your RAC 1) alter database add logfile thread 3 group 7;1) alter database add logfile thread 3 group 8;2) alter database enable thread 3;# if you want to change an used thread 2) alter system set thread=3 scope=pfile sid='RAC01' 3) srvctl stop instance -d RACDB -i RAC01
Views and queriesselect * from GV$CACHE_TRANSFER
An instance failed to start, what do we do?1) Check the instance alert.log2) Check the Oracle Clusterware software alert.log3) Check the resource state using CRS_STAT
Install
See official Note 239998.1 for removing crs installation
See http://startoracle.com/2007/09/30/so-you-want-to-play-with-oracle-11gs-rac-heres-how/ to install 11g RAC on VMware
See http://www.oracle.com/technology/pub/articles/hunter_rac10gr2_iscsi.html to install on Linux with iSCSI disks
See http://www.oracle-base.com/articles/10g/OracleDB10gR2RACInstallationOnCentos4UsingVMware.php to install on VMware
#If using VMware remember to allow shared disks disabling locking
#Obviously this is not required if you use OCFS or ASM
disk.locking = "FALSE"
See OCFS Oracle Cluster Filesystem
Prerequisites check#check node connectivity and Clusterware integrity./runcluvfy.sh stage -pre dbinst -n all./runcluvfy.sh stage -post hwos -n "linuxes,linuxes1" -verboseWARNING: Package cvuqdisk not installed. rpm -Uvh clusterware/rpm/cvuqdisk-1.0.1-1.rpmWARNING: Unable to determine the sharedness of /dev/sdf on nodes: linuxes1,linuxes1,linuxes1,linuxes1,linuxes1,linuxes1,linuxes,linuxes,linuxes,linuxes,linuxes,linuxesSafely ignore this error./runcluvfy.sh comp peer -n "linuxes,linuxes1" -verbose./runcluvfy.sh comp nodecon -n "linuxes,linuxes1" -verbose ./runcluvfy.sh comp sys -n "linuxes,linuxes1" -p crs -verbose ./runcluvfy.sh comp admprv -n "linuxes,linuxes1" -verbose -o user_equiv./runcluvfy.sh stage -pre crsinst -n "linuxes,linuxes1" -r 10gR2
Restart intallation - Remove from each nodesu -c "$ORA_CRS_HOME/install/rootdelete.sh; $ORA_CRS_HOME/install/rootdeinstall.sh"#oracle userexport DISPLAY=192.168.0.1:0.0/app/crs/oui/bin/runInstaller -removeHome -noClusterEnabled ORACLE_HOME=/app/crs LOCAL_NODE=linuxesrm -rf $ORA_CRS_HOME/*#rootsu -c "chown oracle:dba /dev/raw/*; chmod 660 /dev/raw/*; rm -rf /var/tmp/.oracle; rm -rf /tmp/.oracle"
#Format rawdevices using
dd if=/dev/zero of=/dev/raw/raw6 bs=1M count=250
#If related error message appears during installation, manually launch on related node
/app/crs/oui/bin/runInstaller -attachHome -noClusterEnabled ORACLE_HOME=/app/crs ORACLE_HOME_NAME=OraCrsHome CLUSTER_NODES=linuxes,linuxes1 CRS=true "INVENTORY_LOCATION=/app/oracle/oraInventory" LOCAL_NODE=linuxes
runcluvfy.sh stage -pre crsinst -n linuxes -verbose
Forum section
/etc/hosts example# Do not remove the following line, or various programs # that require network functionality will fail, 127.0.0.1 localhost 147.43.1.101 londonl 147.43.1.102 london2 #VIP is usable only after VIPCA utility run,#should be created on the public interface. Remember that VIPCA is a GUI tool147.43.1.201 londonl-vip 147.43.1.202 london2-vip 192.168.1.1 londonl-priv 192.168.1.2 london2-priv
Kernel Parameters(/etc/sysctl.conf) Recommended Valueskernel.sem (semmsl) 250 kernel.sem (semmns) 32000 kernel.sem (semopm) 100 kernel.sem (semmni) 128 kernel.shmall 2097152 kernel.shmmax Half the size of physical memory kernel.shmmni 4096 fs.file-max 65536 net.core.rmem_default 262144 net.core.rmem_max 262144 net.core.wmem_default 262144 net.core.wmem_max 262144 net.ipv4.ip_local_port_range 1024 to 65000
RAC restrictions- dbms_alert, both publisher and subscriber must be on same instance, AQ is the workaround- dbms_pipe, only works on the same instance, AQ is the workaround- UTL_FILE, directories, external tables and BFILEs need to be on shared storage
Implementing the HA High Availability FrameworkUse srvctl to start/stop applications#Manually create a script that OCR will use to start/stop/status#Create an application VIP. #This command generates an application profile called haf demovip.cap in the $ORA_CRS_HOME/crs/ public directory. $ORA_CRS_HOME/bin/crs_profile -create hafdemovip -t application -a $ORA_CRS_HOME/bin/usrvip -o oi=eth0,ov=147.43.1.200,on=255.255.0.0 #As the oracle user, register the VIP with Oracle Clusterware: ORA_CRS_HOME/bin/crs_register hafdemovip #As the root user, set the owner of the apphcation VIP to root: $ORA_CRS_HOME/bin/crs_setperm hafdemovip -o root #As the root user, grant the oracle user permission to run the script: $ORA_CRS_HOME/bin/crs_setperm hafdemovip -u user:oracle:r-x #As the oracle user, start the application VIP: $ORA_CRS_HOME/bin/crs_start hafdemovip 2. Create an application profile. $ORA_CRS_HOHE/bin/crs_profile -create hafdemo -t application -d "HAF Demo" -r hafdemovip -a /tmp/HAFDemoAction -0 ci=5,ra=603. Register the application profile with Oracle Clusterware. $ORA_CRS_HOHE/bin/crs_register hafdemo $ORA_CRS_HOME/bin/crs_start hafdemo
CRS commandscrs_profilecrs_registercrs_unregistercrs_getpermcrs_setpermcrs_startcrs_stopcrs_statcrs_relocate
Server side calloutOracle instance up(/down?)Service member down(/up?)Shadow application service up(/down?)
Removing a node from a cluster- Remove node from clusterware- Check that ONS configuration has been updated on other node- Check that database and instances are terminated on node to remove- Check that node has been removed from database and ASM repository- Check that software has been removed from database and ASM homes on node to remove
$ srvctl status database -d RAC
Instance RAC1 is running on node orarac1
Instance RAC2 is running on node orarac2
$srvctl status instance -d RAC -i RAC1
Instance RAC1 is running on node orarac1
$srvctl status asm -n orarac1
ASM instance +ASM1 is running on node orarac1
$srvctl status nodeapps -n orarac1
VIP is running on node: orarac1
GSD is running on node: orarac1
PRKO-2016 : Error in checking condition of listener on node: orarac1
Please refer PRKO-2016 to Oracle Bug No.4625482
"CHECKING/STARTING/STOPING NAMED LISTENER WITH SRVCTL NODEAPPS FAILS /W
PRKO-2016". It's a known issue on Oracle 10.2.0.1. Ignore if the
Stop/Start Oracle RAC
1. Stop Oracle 10g on one of RAC nodes.
$ export ORACLE_SID=RAC1
$ srvctl stop instance -d RAC -i RAC1
$ srvctl stop asm -n orarac1
$ srvctl stop nodeapps -n orarac1
2. Start Oracle 10g on one of RAC nodes.
$ export ORACLE_SID=RAC1
$ srvctl start nodeapps -n orarac1
$ srvctl start asm -n orarac1
$ srvctl start instance -d RAC -i RAC1
3. Stop/start Oracle 10g on all RAC nodes.
$ srvctl stop database -d RAC
Check Oracle Listener on nodes
$ srvctl config listener -n orarac1
orarac1 LISTENER_ORARAC1
$ lsnrctl start LISTENER_ORARAC1
Check, backup, restore OCR
$ocrconfig -showbackup
$ocrconfig -export /data/backup/rac/ocrdisk.bak
!To restore OCR, it must stop Clusterware on all nodes before.
$ocrconfig -import /data/backup/rac/ocrdisk.bak
$cluvfy comp ocr -n all //verification
$ocrcheck //check OCR disk usage
Check configuration of ORACLE RAC
$srvctl config database
RAC
$srvctl config database -d RAC
orarac1 RAC1 /space/oracle/product/10.2.0/db_2
orarac2 RAC2 /space/oracle/product/10.2.0/db_2
$srvctl config asm -n orarac2
+ASM2 /space/oracle/product/10.2.0/db_2
$srvctl config nodeapps -n orarac1 -a -g -s -l
VIP exists.: /orarac1-vip.banctecmtl.lan/10.21.51.13/255.255.224.0/Public
GSD exists.
ONS daemon exists.
Listener exists.
SQL> column HOST_NAME format a10;
SQL> select INSTANCE_NAME, HOST_NAME,STARTUP_TIME,STATUS,PARALLEL,DATABASE_STATUS,ACTIVE_STATE from gv$instance;
INSTANCE_NAME HOST_NAME STARTUP_T STATUS PAR DATABASE_STATUS ACTIVE_ST
---------------- ---------- --------- ------------ --- ----------------- ---------
rac2 ORARAC2 16-SEP-08 OPEN YES ACTIVE NORMAL
rac1 ORARAC1 19-SEP-08 OPEN YES ACTIVE NORMAL
$ srvctl start database -d RAC
listener works fine.
ONS daemon is running on node: orarac1
 
 

No comments:
Post a Comment