My DBA Reference

Monday, April 27, 2015

TFA Diag collection for certain period of time

You need to run as root or sudo.

sudo /etc/init.d/init.tfa start

sudo /opt/app/crs/11.2.0/11.2.0.4/tfa/bin/tfactl diagcollect -from "Apr/26/2015 01:00:00" -to "Apr/26/2015 09:00:00"

sudo /etc/init.d/init.tfa stop

Friday, April 24, 2015

Add a log file in ASM

ALTER DATABASE ADD LOGFILE Instance 'DHODS1' GROUP 5
SIZE 4096 M;

This simple commands adds redo log file with 2 members of size 4g. One member is located in a disk group and mirrored in other disk group if you have normal redundancy. No need to specify path. Please verify once done.

Thursday, April 23, 2015

Register instance to a listener

alter system set local_listener = '(address_list=(address=(protocol=tcp)(host=xxx.hke.local)(port=1521)))' scope=both sid='shods1';

Monday, April 20, 2015

List users based on role

SELECT grantee FROM DBA_ROLE_PRIVS
WHERE GRANTED_ROLE = 'HMA_TM_PROD_RO_ROLE';

Friday, April 17, 2015

RAC Issues 11gr2

Notes for RAC Installation

Problem #1: RAC node does fails to start CRS after turning off 1 Infiniband switch

Symptom:

After one of the redundant Infiniband switches are powered off, one or more nodes lose connection to cluster and is/are evicted.

Cause: The Infiniband configuration on the RAC nodes are different between the nodes

Error:

2014-01-03 10:58:12.376

[cssd(10047)]CRS-1612:Network communication with node kmaiptmsdb03 (3) missing for 50% of timeout interval. Removal of this node from c

luster in 299.178 seconds

2014-01-03 11:00:42.434

[cssd(10047)]CRS-1611:Network communication with node kmaiptmsdb03 (3) missing for 75% of timeout interval. Removal of this node from c

luster in 149.131 seconds

2014-01-03 11:01:22.151

[/opt/app/crs/11.2.0/11.2.0.3/bin/orarootagent.bin(10023)]CRS-5818:Aborted command 'check' for resource 'ora.crsd'. Details at (:CRSAGF0

0113:) {0:0:2} in /opt/app/crs/11.2.0/11.2.0.3/log/kmaiptmsdb01/agent/ohasd/orarootagent_root/orarootagent_root.log.

2014-01-03 11:02:12.468

[cssd(10047)]CRS-1610:Network communication with node kmaiptmsdb03 (3) missing for 90% of timeout interval. Removal of this node from c

luster in 59.104 seconds

2014-01-03 11:03:11.593

[cssd(10047)]CRS-1607:Node kmaiptmsdb03 is being evicted in cluster incarnation 283963173; details at (:CSSNM00007:) in /opt/app/crs/11.

2.0/11.2.0.3/log/kmaiptmsdb01/cssd/ocssd.log.

2014-01-03 11:03:14.601

[cssd(10047)]CRS-1625:Node kmaiptmsdb03, number 3, was manually shut down

2014-01-03 11:03:20.546

[cssd(10047)]CRS-1601:CSSD Reconfiguration complete. Active nodes are kmaiptmsdb01 kmaiptmsdb02

Solution: Ensure /etc/path_to_inst has the same Infiniband configuration across all RAC nodes

path_to_inst:"/pci@400/pci@2/pci@0/pci@8/pciex15b3,673c@0/ibport@1,ffff,ipib" 1 "ibd"

path_to_inst:"/pci@400/pci@2/pci@0/pci@8/pciex15b3,673c@0/ibport@2,ffff,ipib" 2 "ibd"

path_to_inst:"/pci@500/pci@2/pci@0/pci@a/pciex15b3,673c@0/ibport@1,ffff,ipib" 0 "ibd"

path_to_inst:"/pci@500/pci@2/pci@0/pci@a/pciex15b3,673c@0/ibport@2,ffff,ipib" 3 "ibd"

Problem #2: 11.2 RAC database fails to come up after restart

Symptom: RAC database throws syntax error during startup phase

Cause: The remote_listener parameter is not interpreted properly by 11.2 RAC database

Error:

ORA-00132: syntax error or unresolved network name hmaiptmsdb-scan:1521'

Solution: Ensure EZCONNECT parameter is included in sqlnet.ora. This is new behavior in 11.2 RAC.

NAMES.DIRECTORY_PATH= (TNSNAMES, EZCONNECT)

*********************************************************************************************************************************

When a communication failure occurs

If you are still facing the issue please follow below steps and upload the details from the failing node.

A>. On the failing node force stop crs using "crsctl stop crs -f"

B>. Delete the socket files available at the below location on the failing node
<< you may need to be root for this >>
/var/tmp/.oracle
or
/tmp/.oracle
or
/usr/tmp/.oracle
<< rm -rf /var/tmp/.oracle >>
.
C>. on the failing node issue
ps -ef | grep 'init d.bin' | grep -v grep
kill any d.bin process on the failing node

D>. Then kill the gipcd on the good node and let it respawn automatically.
That should allow the gipcd on both nodes to talk to each other over the private interconnect.
You can kill gipcd process by getting the process id of gipcd
(issue ps -ef | grep gipcd.bin" to find the gipcd.bin pid).

E>. start crs on the failing node using "date; crsctl start crs"

For verification that we've used this process before you can check:
#1>11gR2 GI Node May not Join the Cluster After Private Network is Functional After Eviction due to Private Network Problem ( Doc ID 1479380.1 )
#2>Due to bug 13899736 - Node cannot join the cluster after reboot or "interconnect restored"

*********************************************************************************************************************************

Oracle support may ask you to take a system dump when hang occurs

Here are the steps

System dump gather

sqlplus / as sysdba

SQL> oradebug setospid <ospid of diag process>

SQL> oradebug unlimit

SQL> oradebug -g all hanganalyze 3

##..Wait about 1-2 minutes

SQL> oradebug -g all hanganalyze 3

SQL> oradebug -g all dump systemstate 258

If you can NOT connect to the instance as / as sysdba, you can use prelim however hanganalyze will not be possible with prelim:

sqlplus -prelim '/ as sysdba'

SQL> oradebug setospid <ospid of diag process>

SQL> oradebug unlimit

SQL> oradebug -g all dump systemstate 258

Additional details for collecting system state dumps can be found in MOS note: 452358.1.

******************************************************

If you have large SGA than 100gb and running rac please follow the recommendations here

a.      Set _lm_sync_timeout to 1200
           Setting this will prevent some timeouts during reconfiguration and DRM

b.      Set _ksmg_granule_size to 134217728
           Setting this will cut down the time needed to locate the resource for a data block.

c.      Set shared_pool_size to 15% or larger of the total SGA size.
        For example, if SGA size is 1 TB, the shared pool size should be at least 150 GB.

d.      Set _gc_policy_minimum to 15000
        There is no need to set _gc_policy_minimum if DRM is disabled by setting _gc_policy_time = 0

e.      Set _lm_tickets to 5000
        Default is 1000.   Allocating more tickets (used for sending messages) avoids issues where we ran out of tickets during the reconfiguration.

f.      Set gcs_server_processes to the twice the default number of lms processes that are allocated.
        The default number of lms processes depends on the number of CPUs/cores that the server has,
        so please refer to the gcs_server_processes init.ora parameter section in the Oracle Database Reference Guide
        for the default number of lms processes for your server. Please make sure that the total number of lms processes
        of all databases on the server is less than the total number of CPUs/cores on the server. Please refer to the Document 558185.1

******************************************************

Also consider changing these

2). Set "_gc_read_mostly_locking"=FALSE

This disables read mostly locking, this feature helps in an environment where the objects are mostly read without being modified. Once an object is identified as read-mostly,

share locks are granted immediately, to reduce nodes constantly opening and closing share locks and sending lots of messages.

example:

/opt/oradiag/diag/rdbms/phsub/phsub1/trace/phsub1_lmd0_8457.trc

*** 2016-02-03 01:36:21.136

Begin DRM(12080) (swin 1) - READMOSTLY transfer pkey 96063.0 to 1 oscan 1.1

kjiobjscn 1

3). Set _gc_policy_minimum=15000

This reduces the chances of 'affinity' DRM to happen, the default in 11.2.0.3 is 1500 (note this was 6000 in 10.2.0.5). By setting to 15000 we are saying an object must be accessed 250 times a second by a node to initiate affinity,

the default was 25 (100 in 10.2.0.5)

example:

READMOSTLY object id 96063.0, objscan 1.1, create affinity to instance 1

Total pkey count in this drm 1

* drm quiesce

4). Another workaround is increasing the number of Lock Elements by setting _gc_element_percent=140 (default is 110 in 11.2, 120 in 12.1, and this may increase to 140 in 12.2).

This will increase the number of LEs to 140% of number of data block buffers in the buffer cache. This will result in a slight increase in the shared pool usage.

5). Verify the number of LSM processes - you should have 4-

show parameter gcs_serverprocesses

Thursday, April 16, 2015

DATA Vault expdp error

Even using system because of DB vault i am getting some insufficient priveleges error.

So when you have db vault installed

Run the following statement as Database vault owner and you are good to go

EXEC DVSYS.DBMS_MACADM.AUTHORIZE_DATAPUMP_USER('XXX');

here replace xxx with username you are trying to export with

Wednesday, April 15, 2015

Refresh roles in database scheduler job

So take a schema name hma_prod and it has several tables and new tables are being added frequently. Users who need read access to these schemas will assigned to HMA_PROD_RO_ROLE.

It is a pain to grant each and every table every time you add a new one. So i set up this daily scheduler job that runs every night and refreshes roles.

For this i created a schema with dba role with access to all access in database.

Create the following procedure in the schema

CREATE OR REPLACE PROCEDURE HISNA_DBA.REFRESH_SCHEMA_PRIVS (p_schema IN VARCHAR2, p_role IN VARCHAR2, p_privilege IN VARCHAR2)
AUTHID CURRENT_USER IS
v_exec_string VARCHAR2(300);

CURSOR c_tables IS
SELECT table_name
FROM sys.dba_tables
WHERE owner = p_schema;

CURSOR c_views IS
SELECT object_name
FROM sys.dba_objects
WHERE owner = p_schema
AND status='VALID'
AND object_type='VIEW';

grant_option EXCEPTION;
PRAGMA EXCEPTION_INIT(grant_option, -1720);
BEGIN
IF (p_schema IS NULL OR p_role IS NULL OR p_privilege IS NULL) THEN
dbms_output.put_line('One of the mandatory parameters are blank.');
dbms_output.put_line('Usage: refresh_schema_privs(<schema_name>, <role>, [ RO | RW ])');
ELSE
FOR r_tables IN c_tables
LOOP
BEGIN
IF UPPER(p_privilege) = 'RO' THEN
v_exec_string := 'GRANT SELECT ON ' || p_schema || '.' || r_tables.table_name || ' TO ' || p_role;
ELSIF UPPER(p_privilege) = 'RW' THEN
v_exec_string := 'GRANT SELECT, INSERT, UPDATE, DELETE, ALTER ON ' || p_schema || '.' || r_tables.table_name || ' TO ' || p_role;
ELSE
dbms_output.put_line('Invalid Input for Parameter privilege.');
END IF;

EXECUTE IMMEDIATE v_exec_string;

EXCEPTION
WHEN grant_option THEN
CONTINUE;
END;
END LOOP;

FOR r_views in c_views
LOOP
BEGIN
EXECUTE IMMEDIATE 'GRANT SELECT ON ' || p_schema || '.' || r_views.object_name || ' TO ' || p_role;
EXCEPTION
WHEN grant_option THEN
CONTINUE;
END;
END LOOP;
END IF;
END;
/

After the procedure is created.. create a scheduler job

BEGIN

SYS.DBMS_SCHEDULER.CREATE_JOB

(

job_name => 'HISNA_DBA.DAILY_SCHEMA_REFRESH_PRIVS'

,start_date => TO_TIMESTAMP_TZ('2014/03/18 09:32:40.131457 America/Los_Angeles','yyyy/mm/dd hh24:mi:ss.ff tzr')

,repeat_interval => 'FREQ=DAILY;BYHOUR=1;BYMINUTE=0;BYSECOND=0'

,end_date => NULL

,job_class => 'DEFAULT_JOB_CLASS'

,job_type => 'PLSQL_BLOCK'

,job_action => 'begin

HISNA_DBA.REFRESH_SCHEMA_PRIVS (''HMA_PROD'', ''HMA_PROD_RO_ROLE'', ''RO'');

end;'

,comments => 'Job to refresh the schema level privileges daily'

);

SYS.DBMS_SCHEDULER.SET_ATTRIBUTE