Monthly Archives: April 2016

RMAN in FIRST_ROWS hell

A short while back I was upgrading and migrating a database from 11.2.0.1 to 12.1.0.2 on a new server. To keep the downtime for the 850 GB database short, I used Transportable Tablespaces together with incrementally updated backups, with the datafile copies placed on the target machine’s storage via a shared mount. That way, once the tablespaces are set READ ONLY at the start of the downtime, all that’s left is the last inc 1 backup and the metadata export/import. Everything went fine on all test databases, which were freshly cloned from production.
But then production time came around. I started taking the inc 0 datafile copy backups a few days beforehand. Eight long hours later I was ready to take inc 1 backups from time to time and apply them to the datafile copies. This is where all the good plans went south. Every time the “BACKUP INCREMENTAL LEVEL 1 FOR RECOVER OF COPY WITH TAG…” command ran, it took about 23 seconds before RMAN actually started taking the backup. During the production downtime that would probably be fine with only 5 datafiles, but our database had more than 50. And you have to account for another 50 x 23 seconds for the “RECOVER COPY OF DATAFILE…” command, as the same problem applies there, too. Together that is roughly 38 minutes (2 x 50 x 23 seconds) of pure waiting. Clearly, this issue needed resolving before the production downtime.
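For readers who have not used incrementally updated backups, the roll-forward cycle for a single datafile looks roughly like the sketch below. The tag TTS_MIG and the datafile number 5 are hypothetical placeholders, not values from the actual migration.

```
# Hypothetical tag and datafile number, for illustration only.
# Take a level 1 incremental that belongs to the existing datafile copy ...
BACKUP INCREMENTAL LEVEL 1 FOR RECOVER OF COPY WITH TAG 'TTS_MIG' DATAFILE 5;
# ... and roll the datafile copy forward with it.
RECOVER COPY OF DATAFILE 5 WITH TAG 'TTS_MIG';
```

Each of the two commands paid the delay described above, which is why the number of datafiles mattered so much.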

My investigation showed that the above RMAN commands call DBMS_RCVMAN.getRcvRec, which in turn calls DBMS_RCVMAN.getDataFileCopy. In there are some complex queries involving V$BACKUP_SET, V$DATAFILE_COPY, etc., and one of these queries had a very bad execution plan. At first I thought there might be a problem with the object statistics on the underlying X$ tables, namely X$KCCBF, X$KCCBP and X$KCCBC. The stats were indeed somewhat stale, so I gathered them for the X$ tables involved. Unfortunately, this didn’t fix the bad execution plan. Then I remembered that during the initial analysis of the database I had noticed that OPTIMIZER_MODE was set to FIRST_ROWS at the instance level (for whatever reason, the SW vendor claimed this was best). Of course, this setting also affected RMAN. As the database was still in full production use, I couldn’t just change the parameter to ALL_ROWS. Setting up a logon trigger for RMAN seemed too intrusive. The solution was simple: run an ALTER SESSION at the start of the RMAN session and all is fine…

sql "alter session set optimizer_mode = ALL_ROWS";
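To check whether an instance is affected in the same way, the instance-wide setting can be looked up before starting RMAN. This is a minimal sketch, assuming it is run from SQL*Plus by a user who can read the V$ views:

```sql
-- Instance-wide value that new sessions, including RMAN's, start with
SELECT name, value
FROM   v$system_parameter
WHERE  name = 'optimizer_mode';
```

The ALTER SESSION above changes the optimizer mode only for that one RMAN session; the instance-level value stays untouched.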

Btw., on the new 12.1.0.2 database the application runs just perfectly with ALL_ROWS 😉