Monthly Archives: July 2017

Tracking down PGA memory leak – a walkthrough

The other day I tracked down a nasty problem one application was having. It was first noticed in the alert.log:

ORA-04036: PGA memory used by the instance exceeds PGA_AGGREGATE_LIMIT
...
PGA memory used by the instance exceeds PGA_AGGREGATE_LIMIT of 7000 MB
Immediate Kill Session#: 686, Serial#: 26646
Immediate Kill Session: sess: 0x19e7c2a78  OS pid: 35173

Checking the PGA memory from gv$process revealed a shocking picture:

SQL> select prc.inst_id, ses.sid, ses.serial#, prc.pid, prc.spid, round(prc.pga_used_mem/1024/1024) used_mb, round(prc.pga_alloc_mem/1024/1024) alloc_mb, round((prc.pga_alloc_mem - prc.pga_used_mem)/1024/1024, 0) unused_mb, round(prc.pga_freeable_mem/1024/1024) freeable_mb, round(prc.pga_max_mem/1024/1024) max_mb
from gv$process prc
  left outer join gv$session ses on (ses.inst_id = prc.inst_id and ses.paddr = prc.addr and ses.type = 'USER')
order by used_mb desc
fetch first 10 rows only;

   INST_ID        SID    SERIAL#        PID SPID                        USED_MB   ALLOC_MB  UNUSED_MB                             FREEABLE_MB     MAX_MB
---------- ---------- ---------- ---------- ------------------------ ---------- ---------- ---------- --------------------------------------- ----------
         1        655      50496         89 338934                          227        284         57                                       0        499
         1        675      45630        137 338938                          223        280         57                                       0        296
         1        416       7124        285 315339                          222        281         58                                       1        298
         1        290      45321        296 81710                           184        233         49                                       0        280
         2        296      52565         98 223329                          163        207         45                                       0        270
         2        710       8463        293 386687                          162        207         45                                       0        270
         1         67      15923         54 98311                           147        187         40                                       0        264
         2        295      25163        242 316189                          143        185         42                                       0        266
         1         32      55120        270 358206                          137        178         41                                       0        260
         2        309      13006        230 167307                          135        175         39                                       0        258

 10 rows selected 

A lot of sessions used between 100MB and 230MB. Monitoring these sessions over some time showed that the memory was never released, something’s definitely wrong. PGA dumps of a couple of connections all showed the same pattern.

oradebug setmypid
oradebug dump pga_detail_get <pid>
SQL> select category, name, heap_name, round(bytes/1024/1024, 0) size_mb, allocation_count
from gv$process_memory_detail
where pid = 89
order by pid, bytes desc
fetch first 10 rows only;

CATEGORY        NAME                       HEAP_NAME          SIZE_MB                        ALLOCATION_COUNT
--------------- -------------------------- --------------- ---------- ---------------------------------------
Other           permanent memory           kolarsCreateCt         214                                   10541
Other           free memory                top uga heap            57                                    1571
Other           free memory                kolarsCreateCt           3                                    4907
Other           free memory                session heap             2                                    1016
Other           kollalos2                  koh-kghu sessi           1                                      86
Other           permanent memory           pga heap                 1                                      43
Other           kxsFrame4kPage             session heap             0                                     123
Other           qmushtCreate               qmtmInit                 0                                       6
PL/SQL          static frame of inst       koh-kghu sessi           0                                      36
Other           inode->map_kdlimem         buckets_kdliug           0                                       9

The closest I come to kolarsCreateCt in the oradebug doc is this:

oradebug doc component
...
Components in library GENERIC:
--------------------------
...
  LOB                          LOB (koll, kola)
...

This is the clue that there is apparently a LOB resource management problem. Luckily, the PL/SQL code base was not too big and only a couple of places where LOB handling was done. But, all LOBs seemed to be handled correctly.
One thing, though, I first didn’t pay much attention to was a few “XMLTYPE.getClobVal()” in SELECT statements. I’ve used that myself in the past a couple of times and knew this returned a persistent LOB handle and should not leak.
I took one of these SQL statements and ran them in my favorite tool, SQL Developer, and BOOOOOM! My session just consumed a couple of hundreds of megabytes of PGA memory and wouldn’t release it until I closed the connection.

So I crafted a little test case to check what’s going on.

SQL> create table xmltest(id number, data xmltype, primary key (id))
SQL>   xmltype column data store as binary xml;
SQL> insert into xmltest (id, data) values (1, '<someelement></someelement>');
SQL> commit;
SQL> select t.id, dbms_lob.istemporary(t.data.getclobval()) is_temp from xmltest t;

        ID    IS_TEMP
---------- ----------
         1          1

I don’t think this is what I knew from past experience so I did a quick MOS search and found following bug report:
Bug 23061709 : HIGH KOLARSCREATECT AND ORA-04036 WHEN XMLTYPE COLUMN IS STORED AS BINARY

OK, lets change the XMLType storage to CLOB and see what happens:

SQL> create table xmltest(id number, data xmltype, primary key (id))
SQL>   xmltype column data store as securefile clob;
SQL> insert into xmltest (id, data) values (1, '<someelement></someelement>');
SQL> commit;
SQL> select t.id, dbms_lob.istemporary(t.data.getclobval()) is_temp from xmltest t;

        ID    IS_TEMP
---------- ----------
         1          0

To be fair, the documentation for 11.2, 12.1, and 12.2 all state the following:
“This member function returns a CLOB containing the serialized XML representation. If the CLOB returned is temporary, it must be freed after use.”

Because I always run my tests in SQL*Plus additionally to SQL Developer I was puzzled to see that there’s no memory leak. Why does it not happen in SQL*Plus? I thought that maybe SQL*Plus is clever enough to always check for temporary LOBs and free them.

Firing up gdb with this script proved me right.

break OCILobFreeTemporary
  commands
    where
    continue
  end

This SQL fetches five rows with CLOB data.

SQL> select t.id, t.data.getclobval() from xmltest t where rownum <= 5;

Output from gdb:

Breakpoint 1, 0x00007f678cc3d250 in OCILobFreeTemporary () from /u01/app/oracle/product/12.1.0.2.161018/RAC/lib/libclntsh.so.12.1
#0  0x00007f678cc3d250 in OCILobFreeTemporary () from /u01/app/oracle/product/12.1.0.2.161018/RAC/lib/libclntsh.so.12.1
#1  0x00007f678eafb1ef in afioci () from /u01/app/oracle/product/12.1.0.2.161018/RAC/lib/libsqlplus.so
#2  0x00007f678eaed912 in afifpilobfreetemp () from /u01/app/oracle/product/12.1.0.2.161018/RAC/lib/libsqlplus.so
#3  0x00007f678eaed799 in afifpifretmp () from /u01/app/oracle/product/12.1.0.2.161018/RAC/lib/libsqlplus.so
#4  0x00007f678eae8f20 in afidet () from /u01/app/oracle/product/12.1.0.2.161018/RAC/lib/libsqlplus.so
#5  0x00007f678eb272f3 in safiqry () from /u01/app/oracle/product/12.1.0.2.161018/RAC/lib/libsqlplus.so
#6  0x00007f678eb096e3 in afiqry () from /u01/app/oracle/product/12.1.0.2.161018/RAC/lib/libsqlplus.so
#7  0x00007f678eb21bf2 in afixeqqry () from /u01/app/oracle/product/12.1.0.2.161018/RAC/lib/libsqlplus.so
#8  0x00007f678eb21521 in afixeqsql () from /u01/app/oracle/product/12.1.0.2.161018/RAC/lib/libsqlplus.so
#9  0x00007f678eb20ed5 in afixeqr () from /u01/app/oracle/product/12.1.0.2.161018/RAC/lib/libsqlplus.so
#10 0x00007f678eabfd74 in aficfd () from /u01/app/oracle/product/12.1.0.2.161018/RAC/lib/libsqlplus.so
#11 0x00007f678eabec49 in aficdr () from /u01/app/oracle/product/12.1.0.2.161018/RAC/lib/libsqlplus.so
#12 0x00007f678eaeacf2 in afidrv () from /u01/app/oracle/product/12.1.0.2.161018/RAC/lib/libsqlplus.so
#13 0x00000000004007a1 in main ()

Breakpoint 1, 0x00007f678cc3d250 in OCILobFreeTemporary () from /u01/app/oracle/product/12.1.0.2.161018/RAC/lib/libclntsh.so.12.1
...
#13 0x00000000004007a1 in main ()

Breakpoint 1, 0x00007f678cc3d250 in OCILobFreeTemporary () from /u01/app/oracle/product/12.1.0.2.161018/RAC/lib/libclntsh.so.12.1
...
#13 0x00000000004007a1 in main ()

Breakpoint 1, 0x00007f678cc3d250 in OCILobFreeTemporary () from /u01/app/oracle/product/12.1.0.2.161018/RAC/lib/libclntsh.so.12.1
...
#13 0x00000000004007a1 in main ()

Breakpoint 1, 0x00007f678cc3d250 in OCILobFreeTemporary () from /u01/app/oracle/product/12.1.0.2.161018/RAC/lib/libclntsh.so.12.1
...
#13 0x00000000004007a1 in main ()

Five rows fetched and gdb shows five calls to OCILobFreeTemporary. The call to OCILobFreeTemporary originates from libsqlplus.so which tells me it’s not the OCI layer itself handling it, it’s the logic in SQL*Plus.

Next, I wanted to see myself what happens in a Java/JDBC application. Here’s the relevant snippet from my test case:

      String sql = "select t.id, t.data.getclobval() from xmltest t where rownum <= 5";
      stmt = conn.prepareStatement(sql);
      
      for (int i=0; i<3; i++) {
      stmt = conn.prepareStatement(sql);
        rset = stmt.executeQuery();

        while (rset.next ()) {
          System.out.println (rset.getInt(1));
          xmlclob = rset.getClob(2);
          xmlclob.free();
        }

        rset.close();

        c.format("\nPress ENTER 1 to proceed.\n");
        c.readLine();
      }

Like SQL*Plus being an application handling the LOB resources, you have to do it programmatically in Java as well. Line 11 frees the temporary LOB.
Without the free() call on the Clob you will see the memory leak when there XMLType storage clause is XML BINARY. It happens, even when do don’t reference the LOB column in Java:

        while (rset.next ()) {
          System.out.println (rset.getInt(1));
        }

Lesson learnt: never trust what you think you know! Sometimes things change in subtle ways.

Rant: Does Oracle care anymore?

It seems Oracle has lost all interest in its on-premises software. I’ve noticed this with other features and behaviour changes, this is just a small example.
Remember the article about DBCA silent installation? I mentioned the following:

  • Controlfiles are not multiplexed to different paths as specified. First given path is used for all controlfile members (SR is "Work In Progress")

Last Thursday, after 3+ month, my Service Request has finally been updated with an answer. They’ve changed DBCA to ignore the <ControlfileAttributes> section and the CONTROL_FILES “initParam” value unless you specify “-honorControlFileInitParam” argument. This argument does not show in the dbca help “dbca -createDatabase -help” nor is it anywhere in the documentation – how I’m I supposed to know?. And this is what bugs me the most: Oracle 12.2 on-premises is out now for quite a while but it seems that the documentation has not been updated in many places. Guess that’s how it’s gonna be from now on 🙁