Wednesday, February 18, 2009

Approach to Resolve ORA-4031

Simplified Approach to Resolve ORA-4031

A Few Case studies and other related articles, about an approach towards Resolving ORA -4031 error. First we will see what ORA-4031 actually means.

04031, 00000, "unable to allocate %s bytes of shared memory (\"%s\",\"%s\",\"%s\,\"%s\")"

// *Cause: More shared memory is needed than was allocated in the shared pool.

// *Action: If the shared pool is out of memory, either use the

// dbms_shared_pool package to pin large packages,

// reduce your use of shared memory, or increase the amount of

// available shared memory by increasing the value of the

// INIT.ORA parameters "shared_pool_reserved_size" and

// "shared_pool_size".

// If the large pool is out of memory, increase the INIT.ORA

// parameter "large_pool_size".

ORA-4031 error is encountered when we do not have sufficient memory available in shared pool/large pool to service a memory request. But in actual ORA – 4031 can be encountered in any of these areas

1) Shared pool
2) Large Pool
3) Java Pool
4)Streams pool (new to 10g)

This brings us to the first step in our pursuit for finding the cause for ORA -4031.

Step1: Identify the Pool associated with error

Like any other Oracle error, we first need to check Database Alert Log file and also any trace files which gets generated during that time in user_dump_dest,background_dump_dest. Though there are cases when ORA-4031 error is not recorded in alert.log. Starting from 9.2.0.5, you should be able to see trace files which gets generated in udump/bdump location (Depending on whether background process or user process encountered the error).

ORA – 4031 has basically three arguments

1) Size requested
2) Area
3) Comment

ORA-4031: unable to allocate bytes of shared memory (”area “,”comment”)

e.g ORA-4031: unable to allocate 2196 bytes of shared memory

("shared pool","JOB$","KGLS heap","KGLS MEM BLOCK))

So we see from above that the error has occurred in Shared Pool. This is very important step as in case of other pools, ORA-4031 errors are resolved by increasing Java_pool_size and Streams_pool _size.

In this article I will be discussing mostly about errors encountered in Shared pool with small section on Large Pool.

Step2: What is value of SHARED_POOL_SIZE?

Current settings for shared pool related parameters can be found using below query

SQL>col name for a50

SQL>col value for a10

SQL> select nam.ksppinm NAME, val.KSPPSTVL VALUE from x$ksppi nam, x$ksppsv val

where nam.indx = val.indx and nam.ksppinm like '%shared_pool%' order by 1;



NAME VALUE

-------------------------------------------------- ----------

__shared_pool_size 654311424

_dm_max_shared_pool_pct 1

_enable_shared_pool_durations TRUE

_io_shared_pool_size 4194304

_shared_pool_max_size 0

_shared_pool_minsize_on FALSE

_shared_pool_reserved_min_alloc 4400

_shared_pool_reserved_pct 5

shared_pool_reserved_size 19293798

shared_pool_size 0

You can use following notes for checking the minimum shared pool size

Note 105813.1 - SCRIPT TO SUGGEST MINIMUM SHARED POOL SIZE

In case of 10g, you can use SGA_TARGET parameter for managing values of Shared Pool,Large pool, Streams Pool,Java Pool, Buffer Cache (DB_CACHE_SIZE). Following note can be used for 10g

Note 270935.1 - Shared pool sizing in 10g

It is recommended to set a lower limit for SHARED_POOL_SIZE parameter.

You can also use V$LIBRARYCACHE view (AWR/Statspack report also has this section) and check if there were lot of Reloads happening for SQL AREA and TABLE/PROCEDURE Namespace. This gives indication that Shared Pool is not appropriately sized. In case you see high value for Invalidations, then this could be due to executing DDL against the objects, gathering stats (DBMS_STATS), or granting/revoking privileges.

High Value for Hard parses in AWR/Statspack report can also be caused by shared pool sizing issues but it cannot be used as a sole criteria as High hard parses can be caused by use of literals and presence of version counts/Child Cursors. This is discussed in section Using Literals Instead of Bind Variables and Multiple Child Cursors/High Version Count.

Some more key points related to Shared pool Sizing

-Shared pool memory consumption varies from release to release

-10g might fail with shared pool of 300 Mb though 8i was working fine

-Some part of memory allocated to fixed structures. Parameters like db_files, open_cursors and processes contribute to Overhead. When you use “Show SGA” command, you will see that “Variable Size” will be more then sum of “Shared Pool + Large Pool + Java Pool”. This is attributed to the value of these parameters.

Please note that in case you specify a low value for SGA_MAX_SIZE, you will see Oracle bumping the value to higher value so as to accomodate high value of Overhead memory.

Staring from 10g, Overhead memory is accomodated in shared_pool_size.

e.g If you specify SHARED_POOL_SIZE as 200 MB and your internal overhead is 100 Mb, then your actual shared pool value available to instance is only 100Mb.

You can read Note:351018.1 - Minimum for SHARED_POOL_SIZE Parameter in 10.2 Version for more information.

Shared Pool Fragmentation

Shared Pool fragmentation also can cause ORA-4031. This is caused when your queries are not being shared and you are seeing lot of reloads and Hard parses in the Statspack Report. In this case check the request failure size

ORA-4031: unable to allocate 16400 bytes of shared memory

We see that failure size is 16K. In this case you can see if you are using Shared_pool_reserved_size parameter for defining shared pool reserved area. Algorithm for memory allocation is such that it will first try to get memory from the shared pool and then if the requested memory size is greater then _Shared_pool_reserved_min_alloc , then it will get the memory from Shared Pool Reserved area. By default this value is set to 4400 bytes. In case the failure value is say 4200, you can try reducing the value of this parameter to reduce the occurences. Though this is not the complete solution. Read Tweaking _Shared_pool_reserved_min_alloc and ORA-4031 for more details.

You can also identify shared pool fragmentation by querying X$KSMSP

select  'sga heap('||KSMCHIDX||',0)'sga_heap,ksmchcom ChunkComment,

decode(round(ksmchsiz/1000),0,'0-1K', 1,'1-2K', 2,'2-3K',

3,'3-4K',4,'4-5K',5,'5-6k',6,'6-7k',7,'7-8k',8,'8-9k', 9,'9-10k','> 10K') "Size",

count(*), ksmchcls "Status", sum(ksmchsiz) "Bytes" from x$ksmsp

where KSMCHCOM = 'free memory' group by 'sga heap('||KSMCHIDX||',0)',

ksmchcom, ksmchcls, decode(round(ksmchsiz/1000),0,'0-1K', 1,'1-2K', 2,'2-3K',

3,'3-4K',4,'4-5K',5,'5-6k',6,'6-7k',7,'7-8k',8,'8-9k', 9,'9-10k','> 10K')



SGA_HEAP CHUNKCOMMENT Size COUNT(*) Status Bytes

-------------- ---------------- ----- ---------- ---------- ----------

sga heap(1,0) free memory > 10K 393 free 11296600

sga heap(1,0) free memory 3-4K 256 free 781928

sga heap(1,0) free memory 8-9k 63 free 510656

sga heap(1,0) free memory 6-7k 60 free 367076

sga heap(1,0) free memory 2-3K 555 free 1071448

sga heap(1,0) free memory 1-2K 1818 free 1397244

sga heap(1,0) free memory 0-1K 3418 free 348344

sga heap(1,0) free memory 9-10k 30 free 269820

sga heap(1,0) free memory 4-5K 154 free 640332

sga heap(1,0) free memory 5-6k 75 free 381920

sga heap(1,0) free memory > 10K 39 R-free 8302632

sga heap(1,0) free memory 7-8k 22 free 152328

If you see lot of memory chunks in 1-4k and very few in buckets >5K then it indicates Shared Pool Fragmentation. In this case you need to also look at Hard Parses (Statspack/AWR Report). This is discussed in section Using Literals Instead of Bind Variables and Multiple Child Cursors/High Version Count.

Note: - It is not recommended to run queries on X$KSMSP as it can lead to Latching issues. Do not run them frequently (I have seen people scheduling them as part of Oracle Hourly jobs. This should be avoided)

Step3: Is it MTS? If Yes, then are you using LARGE_POOL_SIZE?

LARGE_POOL_SIZE recommended for many features of Oracle which are designed to utilize large shared memory chunks like

- Recovery Manager (RMAN)

- parallel processing/IO slave processing. e.g px msg pool consuming more memory

- Shared Server Configuration

UGA will be allocated from shared pool in case large pool is not configured. So this can cause issues while using Shared Server Mode (MTS). Ensure that you are using LARGE_POOL_SIZE parameter or SGA_TARGET.

Step4: Are you having Multiple Subpools?

Subpool concept introduced from 9i R2. Instead of one big shared pool, memory will be divided into many sub pools.To determine number of subpools, you can use below query

SQL> select nam.ksppinm NAME, val.KSPPSTVL VALUE from x$ksppi nam, x$ksppsv val

where nam.indx = val.indx and nam.ksppinm like ‘%kghdsidx%’ order by 1 ;



NAME VALUE

—————————— ——————–

_kghdsidx_count 4

Above query indicates that there are 4 subpools

In case you get ORA-4031 and trace file gets generated, then the trace file can also be used to know the number of subpools configured. To do this search on “Memory Utilization of Subpool”
e.g
Memory Utilization of Subpool 1
========================
free memory 10485760
Memory Utilization of Subpool 2
========================

free memory 20971520

This means that there are two subpools configured for your database.

Oracle suggest having 500M as minimum subpool size. I will say that in case you are not facing serious Shared pool Latch contention, 2 subpools should be sufficient (though I believe most of contention issues can be solved by tuning the application). To change the number of subpools, we need to set parameter _kghdsidx_count in pfile or spfile and restart the database

In case of Spfile

alter system set “_kghdsidx_count”=1 scope=spfile;

Restart of database is required as it is a Static parameter. Please note that Large pool has same number of subpools as shared pool so you might be required to change number of subpools in case you are observing ORA-4031 in large pool.

You can read more about Shared Subpools in my earlier post

Step5: Is Sqlarea consuming lot of Memory?

Actually this can also be categorized into “Bad Application Design” as most of the cases are caused by way applications have been designed. High value for sqlarea in V$SGASTAT (or AWR/Statspack report) can be attributed to following causes

Using Literals Instead of Bind Variables

This is the most common cause for ORA-4031. Tom Kyte explains this on one of his post consequences of not using bind variables

If you do not use bind variables and you flood the server with
hundreds/thousands of unique queries you will
-run dog slow
-consume a ton of RAM (and maybe run out)
-not scale beyond a handful of users, if thatamong other really bad side effects.

The above statement is true and you can find lot of cases where not using Bind variables caused excessive Parsing issues (leading to CPU contention) and ORA-4031 issues. One of the way to locate such statements is by running following query.

SELECT substr(sql_text,1,90) "SQL",count(*) "SQL Copies",

sum(executions) "TotExecs", sum(sharable_mem) "TotMemory"

FROM v$sqlarea

WHERE executions <> 30

ORDER BY 2;

I personally try to use script from Asktom website to find these statements. You can find ,more information by clicking here

create table t1 as select sql_text from v$sqlarea;

alter table t1 add sql_text_wo_constants varchar2(1000);

create or replace function

remove_constants( p_query in varchar2 ) return varchar2

as

l_query long;

l_char varchar2(1);

l_in_quotes boolean default FALSE;

begin

for i in 1 .. length( p_query )

loop

l_char := substr(p_query,i,1);

if ( l_char = '''' and l_in_quotes )

then

l_in_quotes := FALSE;

elsif ( l_char = '''' and NOT l_in_quotes )

then

l_in_quotes := TRUE;

l_query := l_query '''#';

end if;

if ( NOT l_in_quotes ) then

l_query := l_query l_char;

end if;

end loop;

l_query := translate( l_query, '0123456789', '@@@@@@@@@@' );

for i in 0 .. 8 loop

l_query := replace( l_query, lpad('@',10-i,'@'), '@' );

l_query := replace( l_query, lpad(' ',10-i,' '), ' ' );

end loop;

return upper(l_query);

end;

/

update t1 set sql_text_wo_constants = remove_constants(sql_text);



select sql_text_wo_constants, count(*)

from t1

group by sql_text_wo_constants

having count(*) > 100

order by 2

/

Above query will give you queries which are using literals and should be modified to use bind variables. Sometimes it is not possible to modify the application, in that case you can use CURSOR_SHARING=SIMILAR/FORCE to force the application to use bind variables. Please note that this can cause issues (especially CURSOR_SHARING=SIMILAR), so it is recommended to test the application in Test environment before implementing in Production. Applications like Oracle Apps do not certify use of this parameter so also check with your application vendor if this can be used.

You can refer to following articles where I have discussed similar issue

ORA-4031 - A Case Study

Application Design and ORA-4031

Multiple Child Cursors/High Version Count

This is also one of the cause for high usage of memory in SQLAREA region. Child cursors are generated in Shared pool when the SQL text is same but Oracle cannot share it because the underlying objects are different or different optimizer settings, etc. To know about child cursors, refer to following Metalink note

Note 296377.1 - Handling and resolving unshared cursors/large version_counts

In case of Oracle 10g, you can use Statspack/AWR report for finding the child cursors under category “SQL ordered by Version Counts”. Following statements can also be run to identify if child cursors are being generated in your database


For 10g

SQL> select sa.sql_text,sa.version_count ,ss.*from v$sqlarea sa,v$sql_shared_cursor ss

where sa.address=ss.address and sa.version_count > 50 order by sa.version_count ;

For 8i/9i

select sa.sql_text,sa.version_count ,ss.*from v$sqlarea sa,v$sql_shared_cursor ss

where sa.address=ss.KGLHDPAR and sa.version_count > 50 order by sa.version_count ;

Results returned by above query reports SQL which are not being shared due to some reason. You should find column with Value Y to find the cause. Most of these issues are encountered while using CURSOR_SHARING=SIMILAR. In case you are using this parameter with columns having Histograms, then it is expected behavior.Read more about Cursor issues related to Histograms in Note:261020.1 - High Version Count with CURSOR_SHARING = SIMILAR or FORCE

There are cases where none of the column value returns Y value. Most of these cases, you need to work with Oracle support to find the cause as this could be a bug.

Child Cursors are problematic as they increase shared pool memory consumption, High parsing and also as the number of child cursors increase, Oracle will take more time to span all the child cursors to match if it can reuse them, if not then it spawns a new child cursor. This results in High Parsing time and CPU contention.

High Sharable Memory per SQL

One more cause for high value of SQLAREA in V$SGASTAT is high memory consumption for SQL statement. This can be due to poorly written SQL statement or due to Oracle Bugs.

In case of Oracle 10g, you can use Statspack/AWR report for finding the statements with high value of Sharable Memory. You can also use Sharable_mem column in V$SQLAREA to find these queries.

Step6:What Next?

You have followed all the above steps and find everything is ok. Now what do we check next?

We can look for any trace file which got generated during the time of error and see which component was taking more memory. You can try searching in metalink with that component. Else you can take a heapdump at time of error and upload the file to support.

Heapdump event
The Heapdump event is used to dump memory from different subheaps. Errors ora-4030 are associated with problems in the pga, uga or cga heaps, and error ora-4031 is related only to problems with the shared pool/large pool/Java Pool/Streams Pool.

command - >  alter system set events ‘4031 trace name heapdump level 2’;

init.ora - >events=’4031 trace name heapdump, level 2’

SQL>oradebug setmypid

SQL>oradebug dump heapdump 2

SQL>oradebug tracefile_name

Staring from 9.2.0.5, level 536870914 can be used for generating heapdump which will gather more diagnostic information for support to diagnose the cause.

Also it is not recommended to set Heapdump event in init.ora or spfile since it will force multiple dumps at time of Shared Pool memory issues. Oracle requires Shared pool Latch for dumping heapdump, so this can worsen the Latching situation. You can set Errorstack event to generate trace file at time of ORA-4031 error

alter system set events '4031 trace name errorstack level 3'; 

Use immediate trace option or Oradebug command at time of error

SQL> connect / as sysdba

SQL> alter session set events 'immediate trace name heapdump level 536870914';



OR



sqlplus "/ as sysdba"

oradebug setmypid

oradebug unlimit

oradebug dump heapdump 536870914

oradebug tracefile_name

exit

Upload the tracefile to Oracle support.

Using the above approach will help you to resolve ORA-4031 in Shared Pool.

Large Pool

While working on ORA-4031 in large pool, you need to follow below approach

1)Check size for LARGE_POOL_SIZE. If possible increase it.

2)Check number of subpools. Ensure that you have sufficient memory in each subpool. _kghdsidx_count is used to control the number of subpools in large pool also. So you would have to either increase memory available in each subpool or decrease the count.

3)In case of MTS, check if any session is consuming lot of memory. It’s a case where instead of getting ORA-4030, you get ORA-4031 in large pool (In MTS, UGA is part of large pool).

4)If all above suggestions have been tried, then capture heapdump and upload the file to Oracle Support. You can use level 32 or 536870944 i.e

SQL> connect / as sysdba

SQL> alter session set events 'immediate trace name heapdump level 32';

or

SQL> alter session set events 'immediate trace name heapdump level 536870944';

I hope this article helps in following a methodology for resolving ORA-4031. At present this article is not exhaustive article on this error and it will be more useful if it can be used as a approach after you have gone through below metalink notes.

Note:62143.1 - Understanding and Tuning the Shared Pool

Note:396940.1 - Troubleshooting and Diagnosing ORA-4031 Error

Note:146599.1 - Diagnosing and Resolving Error ORA-04031

No comments:

Post a Comment