SQL Developer – enabling TLS for JDBC thin driver

I don’t think this is documented anywhere (and no hits on google) but it shows once again how awsome SQL Developer is. 🙂
Instead of using “jdbc:oracle:oci” for TLS enabled connections we’re going to set SQL Developer up to do just that with “jdbc:oracle:thin”. In that regard, SQL Developer is no more special than any Java application.
To keep it short I’m sticking to the basics, maybe I’ll add some variations into the comments later.
First, we need to add “OraclePKIProvider” to the JRE security provider’s list. The provider’s list is defined in “$JAVA_HOME/jre/lib/security/java.security”.
I’m going to append to it by creating a new file that I only source from SQL Developer. In my case, the next free slot is 10. Depending on your OS and JRE version the list may slightly vary.
(also make sure that the java.security file is appendable: “security.overridePropertiesFile=true”)

$ vi /opt/sqldeveloper/sqldeveloper/bin/sqldev.java.security
security.provider.10=oracle.security.pki.OraclePKIProvider

Next, we add a few options to the sqldeveloper.conf file:

$ vi /opt/sqldeveloper/sqldeveloper/bin/sqldeveloper.conf
# add this to end of the config file

# specify the path to your SSO wallet file
AddVMOption -Doracle.net.wallet_location="(SOURCE=(METHOD=FILE)(METHOD_DATA=(DIRECTORY=/home/btr/tns/wallet)))"
# this is required if you want to use TLS authentication next to in-flight traffic encryption
AddVMOption -Doracle.net.authentication_services="(TCPS)"
# optionally, if you want the client to verify the server certificate
AddVMOption -Doracle.net.ssl_server_dn_match=true
# append/override JRE security config to add Oracle provider
AddVMOption -Djava.security.properties=/opt/sqldeveloper/sqldeveloper/bin/sqldev.java.security
# even SQLDev 18.1 ships with a very old version that is not compatible - just download the latest one from OTN
AddJavaLibFile /home/btr/orasec/jlib/oraclepki.jar

That’s it, we’re done.

Now you can create connections using TNS, LDAP, or Advanced. Basic/EZCONNECT does not support to specify the protocol, so it can’t be used for TLS.

Here we only use TLS for in-flight traffic encryption:

And here, we make use of TLS user authentication:

Little trick with lsnrctl

Are you running RAC/GI with role separation? You’re tired of constantly switching between “oracle” and “grid” users or you don’t have access to “grid” at all?
This little trick saves me lots of time whenever I need to check the status or services details of a listener running as “grid”. I hope it serves you as well as it does serve me 🙂

This gives me the listener names running on the connected node:

$ ps -ef|grep tnslsnr
grid      3468     1  0 21:50 ?        00:00:00 /u01/app/gi12201/bin/tnslsnr ASMNET1LSNR_ASM -no_crs_notify -inherit
grid      3643     1  0 21:50 ?        00:00:00 /u01/app/gi12201/bin/tnslsnr LISTENER -no_crs_notify -inherit
grid      3646     1  0 21:50 ?        00:00:00 /u01/app/gi12201/bin/tnslsnr MGMTLSNR -no_crs_notify -inherit
grid      3652     1  0 21:50 ?        00:00:00 /u01/app/gi12201/bin/tnslsnr LISTENER_SCAN3 -no_crs_notify -inherit
grid      3665     1  0 21:50 ?        00:00:00 /u01/app/gi12201/bin/tnslsnr LISTENER_SCAN2 -no_crs_notify -inherit

By default, every listener in a GI/RAC setup has an IPC end-point with the KEY being the same as the listener name. For instance:

Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcps)(HOST=192.168.56.55)(PORT=1523)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.56.55)(PORT=1521)))

So you can simply run lsnrctl with the connect descriptor on the command line (as oracle):

$ lsnrctl status "(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))"

LSNRCTL for Linux: Version 12.2.0.1.0 - Production on 17-MAY-2018 22:12:49

Copyright (c) 1991, 2016, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_SCAN1
Version                   TNSLSNR for Linux: Version 12.2.0.1.0 - Production
Start Date                17-MAY-2018 21:52:41
Uptime                    0 days 0 hr. 20 min. 8 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/gi12201/network/admin/listener.ora
Listener Log File         /u01/app/grid/diag/tnslsnr/ol7122rac2/listener_scan1/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcps)(HOST=192.168.56.55)(PORT=1523)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.56.55)(PORT=1521)))
Services Summary...
Service "65388e79ee096b69e0533238a8c08c32.localdomain" has 2 instance(s).
  Instance "RAC11", status READY, has 2 handler(s) for this service...
  Instance "RAC12", status READY, has 2 handler(s) for this service...
Service "RAC1.localdomain" has 2 instance(s).
  Instance "RAC11", status READY, has 2 handler(s) for this service...
  Instance "RAC12", status READY, has 2 handler(s) for this service...
Service "pdbrac1.localdomain" has 2 instance(s).
  Instance "RAC11", status READY, has 2 handler(s) for this service...
  Instance "RAC12", status READY, has 2 handler(s) for this service...
The command completed successfully

Yes, this only works for listeners that run on the node you’re connected to. You can’t do it across nodes, but I still find it extremely useful.

For your convenience, you can put the connect descriptors in the TNSNAMES.ORA sourced by “oracle”:

$ cat $TNS_ADMIN/tnsnames.ora 
listener_scan1=(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))

$ lsnrctl services listener_scan1

LSNRCTL for Linux: Version 12.2.0.1.0 - Production on 17-MAY-2018 22:30:10

Copyright (c) 1991, 2016, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
Services Summary...
Service "65388e79ee096b69e0533238a8c08c32.localdomain" has 2 instance(s).
  Instance "RAC11", status READY, has 2 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:0 refused:0 state:ready
         REMOTE SERVER
         (ADDRESS=(PROTOCOL=TCPS)(HOST=192.168.56.53)(PORT=1523))
...

It also works within the lsnrctl command line:

$ lsnrctl

LSNRCTL for Linux: Version 12.2.0.1.0 - Production on 17-MAY-2018 22:39:58

Copyright (c) 1991, 2016, Oracle.  All rights reserved.

Welcome to LSNRCTL, type "help" for information.

LSNRCTL> set current_listener (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
Current Listener is (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
LSNRCTL> services
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
Services Summary...
Service "65388e79ee096b69e0533238a8c08c32.localdomain" has 2 instance(s).
  Instance "RAC11", status READY, has 2 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:0 refused:0 state:ready
         REMOTE SERVER
         (ADDRESS=(PROTOCOL=TCPS)(HOST=192.168.56.53)(PORT=1523))
      "DEDICATED" established:0 refused:0 state:ready
         REMOTE SERVER
         (ADDRESS=(PROTOCOL=TCP)(HOST=192.168.56.53)(PORT=1521))
...

Any comments? I’m listening…

Home is where the ORACLE_HOME is

Here’s a little SQL that may be helpful in certain troubleshooting situations. It can answer questions like:

  • Which ORACLE_HOME is my database running from?
  • Which server-side sqlnet.ora is used by my connection (TNS_ADMIN)?

Or more generally speaking: what’s the value of a given environment variable for my session’s server process.

with
  function get_env(var in varchar2)
  return varchar2
  is
    val varchar2(32767);
  begin
    dbms_system.get_env(get_env.var, get_env.val);
    return get_env.val;
  end;
select get_env('&var_name.') env_val from dual
/

Examples:

SQL> r
Enter value for var_name: ORACLE_HOME

ENV_VAL                                                                                                                                                                                                            
--------------------------------
/u01/app/oracle/product/ora12201                      

SQL> r
Enter value for var_name: TNS_ADMIN

ENV_VAL                                                                                                                                                                                                            
--------------------------------
/u01/app/oracle/network/admin

SQL> r
Enter value for var_name: PATH

ENV_VAL                                                                                                                                                                                                            
--------------------------------
 

Note, Oracle apparently intentionally redacts the value for certain environment variables, e.g. PATH.
But for the use cases above I’ve found it extremely useful…

Secure Postgres connection from SQL Developer

Searching the web it seems most people use “org.postgresql.ssl.NonValidatingFactory” to connect to Postgres from SQL Developer.

This doesn’t seem to be a good idea as the documentation states it is not secure:
Provide a SSLSocketFactory that allows SSL connections to be made without validating the server's certificate. This is more convenient for some applications, but is less secure as it allows "man in the middle" attacks.

Looking into the jar file under “/org/postgres/ssl” you’ll also find SingleCertValidatingFactory. Now that’s much better:
Provides a SSLSocketFactory that authenticates the remote server against an explicit pre-shared SSL certificate. This is more secure than using the NonValidatingFactory as it prevents "man in the middle" attacks. It is also more secure than relying on a central CA signing your server's certificate as it pins the server's certificate.

Let’s get to it…

The boring part: load the Postgres JDBC driver
Go to “Tools” -> “Preferences…” -> “Third Party JDBC Drivers” and add the jar file

The tricky part: getting the connect string right (it’s actually not that hard)
Syntax:

<hostname>/<database>?ssl=true&sslfactory=org.postgresql.ssl.SingleCertValidatingFactory&sslfactoryarg=file:<path-to-cert-file>&

For example:

abc.xzy.us-east-1.rds.amazonaws.com/postgres?ssl=true&sslfactory=org.postgresql.ssl.SingleCertValidatingFactory&sslfactoryarg=file:/home/btr/certs/amazon-rds.crt&

Enter this entire string in the “Hostname” field.

The exciting part: playing with your newly connected Postgres database 🙂

Keep it secure!

SQL Plan Baselines – the parallel trap

Today, the good news is that I have time to write this blog post. The less good news is that our basement got flooded and I have to stay home. The bad news is, my current client does not allow remote work 🙁
So, blog post it is…

There are a number of reasons why a SQL Plan Baseline might not get used. Here’s one I was not fully aware of until recently (although it makes perfect sense when you think about it): ALTER SESSION FORCE PARALLEL QUERY [PARALLEL n].
In the simplest of cases the outcome whether a SQL Plan Baseline is used depends on the following:

  • PARALLEL_DEGREE_POLICY parameter
  • table decoration (DEGREE)
  • optimizer enviroment used to generate the plan baseline (ENABLE PARALLEL / FORCE PARALLEL / FORCE PARALLEL n)
  • optimizer environment of the session executing the query (ENABLE PARALLEL / FORCE PARALLEL / FORCE PARALLEL n)
  • plan in the baseline (serial or parallel plan?)

The base of the test case is a simple table that I’m going to select:

create table t1
as
select rownum id, 'ABC' text
from dual connect by level <= 100
;

For every combination of interest we run through the following procedure:
1) open and configure a new session for the parse environment on which the plan baseline is based
2) run this query “select * from t1”
3) create a fixed plan baseline for the generated plan
4) re-connect and configure the parse environment for the executing session
5) run query from step 2) and collect cursor information
do steps 4) and 5) for ENABLE PARALLEL, FORCE PARALLEL, and FORCE PARALLEL n

Test 1: object DEGREE 1, parallel_degree_policy = manual

parsing session (SPB)
/
executing session
enable (serial plan) force (parallel plan) force 4 (parallel plan)
enable (default) used not used (*3) not used (*3)
force not used (*1) used used
force 4 not used (*2) used used

*1) Degree of Parallelism is 8 because of table property
*2) Degree of Parallelism is 4 because of session
*3) No note in the plan about DOP or baseline
Summary for test 1:
If you have a serial plan in the baseline and use any force parallel on the session the plan baseline is not used and you get a parallel plan.
If you have a parallel plan in the baseline and run the query on a session with ENABLE PARALLEL QUERY (default settings) the plan baseline is not used and you get a serial plan.

Test 2: object DEGREE 1, parallel_degree_policy = limited

parsing session (SPB)
/
executing session
enable (serial plan) force (parallel plan) force 4 (parallel plan)
enable (default) used used used (*3)
force not used (*1) used used (*3)
force 4 not used (*2) used used

*1) automatic DOP: Computed Degree of Parallelism is 2
*2) Degree of Parallelism is 4 because of session
*3) Degree of Parallelism is 2 because of hint
Summary for test 2:
If you have a serial plan in the baseline and use any force parallel on the session the plan baseline is not used and you get a parallel plan.
Now that we allow for auto DOP the session with ENABLE PARALLEL QUERY can use parallel plans in plan baselines.

Test 3: object DEGREE DEFAULT, parallel_degree_policy = limited

parsing session (SPB)
/
executing session
enable (serial plan (*4)) force (parallel plan) force 4 (parallel plan)
enable (default) used used used (*3)
force not used (*1) used used (*3)
force 4 not used (*2) used used

*1) automatic DOP: Computed Degree of Parallelism is 2
*2) Degree of Parallelism is 4 because of session
*3) Interestingly, there is no note about DOP in the plan at all. But it uses the plan baseline.
*4) automatic DOP: Computed Degree of Parallelism is 1 because of parallel threshold
Summary for test 3:
If you have a serial plan in the baseline and use any force parallel on the session the plan baseline is not used and you get a parallel plan.
Again, as we allow for auto DOP the session with ENABLE PARALLEL QUERY can use parallel plans in plan baselines. The result is the same as Test 2 but some the notes in the plans differ.

Test 4: object DEGREE DEFAULT, parallel_degree_policy = limited, fake stats so DOP > 1 for all plan baselines

parsing session (SPB)
/
executing session
enable (parallel plan (*1)) force (parallel plan) force 4 (parallel plan)
enable (default) used used used
force used used used
force 4 used used used

*1) automatic DOP: Computed Degree of Parallelism is 2
Summary for test 4:
Naturally, now that we always have parallel plans in the plan baselines and the object statistics call for auto DOP > 1 the plan baselines get used in all cases.

Why did I do this? See, there’s this batch job with a SQL that has a SQL Plan Baseline on it (serial plan). Now, every once in a while the run-time of this batch job goes through the roof and every time this happens I see that the query does not use the baseline (v$sql.sql_plan_baseline is NULL). Also, next to different PLAN_HASH_VALUEs I noticed different OPTIMIZER_ENV_HASH_VALUEs. Checking the session settings V$SES_OPTIMIZER_ENV showed that “parallel_query_forced_dop” was set to “default”, which means “ALTER SESSION FORCE PARALLEL QUERY” was run previously on that session.
But why is it not deterministic? The tool that runs all the batch jobs uses a connection pool, some job steps force parallel and some don’t. We haven’t been able to technically confirm this but everything points towards that this session property is not cleared to default when a connection gets reused. So, sometimes this batch job just gets unlucky by the session it gets from the connection pool.
The solution: Adding second SQL Plan Baseline. This plan is a parallel plan with the same structure as the original serial plan. Now, either one of the plan baselines (serial or parallel plan) is being used depending on the session configuration.

Footnote:
When you use “FORCE PARALLEL QUERY” you might get a serial plan. You’ll see this in the plan notes: “automatic DOP: Computed Degree of Parallelism is 1”. Obviously, this would change some of above results.

View merging limitation on OUTER JOIN

This is a short note about a limitation in complex view merging for outer joins.

We start with two simple tables, t1 and t2. To show the effect we don’t even need to load any data.

create table t1 (
    id1 number not null
  , vc1 varchar2(200)
);

create table t2 (
    id1 number not null
  , id2 number not null
  , num1 number
);

I know, I said it’s about outer joins but let’s first check the execution plan for the INNER JOIN.

explain plan for
select *
from t1
  inner join (
    select /*+ merge */
        id2
      , 0 x
      , sum(num1) sum_num1
    from t2
    group by id2
  ) s1 on (s1.id2 = t1.id1)
;

select * from dbms_xplan.display();

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     1 |   153 |     5  (20)| 00:00:01 |
|   1 |  HASH GROUP BY      |      |     1 |   153 |     5  (20)| 00:00:01 |
|*  2 |   HASH JOIN         |      |     1 |   153 |     4   (0)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| T1   |     1 |   127 |     2   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| T2   |     1 |    26 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------

Oracle merges the inline view as instructed by the MERGE hint. Btw., I’m only using the hint for demonstration purposes.

What happens if we change the join to a LEFT OUTER JOIN?

explain plan for
select *
from t1
  left outer join (
    select /*+ merge */
        id2
      , 0 x
      , sum(num1) sum_num1
    from t2
    group by id2
  ) s1 on (s1.id2 = t1.id1)
;
select * from dbms_xplan.display();

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |     1 |   143 |     5  (20)| 00:00:01 |
|*  1 |  HASH JOIN OUTER     |      |     1 |   143 |     5  (20)| 00:00:01 |
|   2 |   TABLE ACCESS FULL  | T1   |     1 |   115 |     2   (0)| 00:00:01 |
|   3 |   VIEW               |      |     1 |    28 |     3  (34)| 00:00:01 |
|   4 |    HASH GROUP BY     |      |     1 |    26 |     3  (34)| 00:00:01 |
|   5 |     TABLE ACCESS FULL| T2   |     1 |    26 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------

The inline view is not merged anymore. The optimizer trace reveals why it cannot merge the view anymore:

CVM:   Checking validity of merging in query block SEL$2 (#2)
CVM:     CVM bypassed: View on right side of outer join contains view with illegal column.
CVM:     CVM bypassed: Externally referenced expressions are not merge-safe.
CVM:     CVM bypassed: view on right side of Outer Join + MuLTiple TABle.
CVM:     CVM bypassed: view on right side of Outer Join + MuLTiple TABle.

In case you haven’t noticed, there’s this little expression in the projection of the inner SELECT on line 7 (“0 x”). As soon as we remove it, the view will be merged by the optimizer also for LEFT OUTER JOIN.

explain plan for
select *
from t1
  left outer join (
    select /*+ merge */
        id2
--    , 0 x
      , sum(num1) sum_num1
    from t2
    group by id2
  ) s1 on (s1.id2 = t1.id1)
;
select * from dbms_xplan.display();

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     1 |   153 |     5  (20)| 00:00:01 |
|   1 |  HASH GROUP BY      |      |     1 |   153 |     5  (20)| 00:00:01 |
|*  2 |   HASH JOIN OUTER   |      |     1 |   153 |     4   (0)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| T1   |     1 |   127 |     2   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| T2   |     1 |    26 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------

Thanks to point and click tools *cough* Cognos *cough* I’ve seen this a lot lately 😉

Footnote: tests run on 12.2.0.1

Parse overhead: NOT NULL vs CHECK(xyz IS NOT NULL)

Jonathan Lewis’ update on his post “Constraints, Inserts and bind” prompted another thought: is there a difference between a check constraint (IS NOT NULL) and a NOT NULL column declaration with regards to recursive queries that are run during parsing?

Let’s start with “CHECK(xyz IS NOT NULL)” constraints:

create table t2 (
    n1 number       check(n1 is not null)
  , v1 varchar2(10) check(v1 is not null)
)
;

exec dbms_monitor.session_trace_enable(null, null, true, false)
begin
  for idx in 1..100 loop
    execute immediate 'insert into t2 values (' || idx || ', ''ABC'')';
  end loop;
end;
/
commit;
exec dbms_monitor.session_trace_disable(null, null)

Next, we replace the check constraints with “NOT NULL” declarations.

create table t1 (
    n1 number       not null
  , v1 varchar2(10) not null
)
;

exec dbms_monitor.session_trace_enable(null, null, true, false)
begin
  for idx in 1..100 loop
    execute immediate 'insert into t1 values (' || idx || ', ''ABC'')';
  end loop;
end;
/
commit;
exec dbms_monitor.session_trace_disable(null, null)

The SQL trace file from the check constraints shows 107 distinct SQLs. One of them being (as one would expect after reading Jonathan’s post):

select condition from cdef$ where rowid=:1

The trace file from the “NOT NULL” declarations shows 106 distinct SQLs. You know which one’s not in there, right? 😉

When every LIO counts, use “NOT NULL” declarations over “CHECK (xzy IS NOT NULL)” constraints!

Play fetch – get exactly one row at a time

Every once in a while I want SQL*Plus to fetch one row at a time, so I set ARRAYSIZE to 1.
But:

SQL> set autotrace traceonly statistics
SQL> set arraysize 1
SQL> select * from user1.cf1 where rownum <= 100;

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        102  consistent gets
          0  physical reads
          0  redo size
     258194  bytes sent via SQL*Net to client
       1091  bytes received via SQL*Net from client
         51  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

We only did 51 network round trips to fetch 100 rows. This is because SQL*Plus is using OCI and the default for prefetch (OCI_ATTR_PREFETCH_ROWS) is set to 1. So we end up with 2 rows per fetch call (note that with prefetch 1, the first round trip already returns 1 row).

Now, I really, really want to fetch just one row at a time. Starting with Oracle 12.1 you can create an OCI configuration file to specify various settings, one of them being prefetch. See documentation for more details.

vi ${TNS_ADMIN}/oraaccess.xml

<?xml version="1.0"?>
 <oraaccess xmlns="http://xmlns.oracle.com/oci/oraaccess" xmlns:oci="http://xmlns.oracle.com/oci/oraaccess" schemaLocation="http://xmlns.oracle.com/oci/oraaccess http://xmlns.oracle.com/oci/oraaccess.xsd">
  <default_parameters>
    <prefetch>
      <rows>0</rows>
    </prefetch>
  </default_parameters>
</oraaccess>

With this configuration in place we should see 100+ network round trips.

SQL> set autotrace traceonly statistics
SQL> set arraysize 1
SQL> select * from user1.cf1 where rownum <= 100;

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        102  consistent gets
          0  physical reads
          0  redo size
     267988  bytes sent via SQL*Net to client
       1652  bytes received via SQL*Net from client
        102  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

Go play fetch 😉

Follow-up: SQL*Net tracing

In my previous post Are you fishing or catching? – Server-side SQL*Net tracing for specific clients I said SQL*Net tracing is all or nothing as sqlnet.ora entries apply to all server processes. I since learned that there is a “oradebug” command to enable SQL*Net tracing for individual server processes:

oradebug setospid <OSPID>
oradebug dump sqlnet_server_trace <LEVEL>

Available levels are the same as for TRACE_LEVEL_SERVER in sqlnet.ora (0,4,10,16).

This is a nice little addition to my toolset. The only drawback is that you cannot trace initial connection hand-shakes (e.g. SSL troubleshooting) as you have to attach to a existing process.

12c Oracle DCD – kernel call dive

As you probably read in the 12c documentation Oracle has changed the mechanism for DCD (Dead Connection Detection) from sending TNS packets to using TCP KEEPALIVE on the socket. Unfortunately, the documentation is extremely sparse about implementation details:

When I was tracking down a firewall configuration issue recently I was in need of a few more details. Specifically, I wanted to know what TCP_KEEPIDLE, TCP_KEEPINTVL, and TCP_KEEPCNT are set to for various values of SQLNET.EXPIRE_TIME.
Since Oracle reads sqlnet.ora and sets the socket options early in the process startup (spawned by the listener) I opted for SystemTap to caputre all calls to “setsockopt”. Unfortunately, on my 7.4 Oracle Linux I ran into a stupid SystemTap bug. So, DTrace it is:

syscall::setsockopt:entry
{
  printf("%s | socket=%d | level=%d | optname=%d | optlen=%d", execname, arg0, arg1, arg2, arg4);
  self->optval = *(int *)copyin(arg3, arg4);
}

syscall::setsockopt:return
{
  printf("optval=%d\n", self->optval);
}

I ran it on dedicated lab VM to minimize cluttering of the output as the D script captures all “setsockopt” calls, not just the ones from Oracle.
But, before I show the output we need to know a few things.

The function definition for “setsockopt”:
This is the kernel function to modify socket options like TCP_KEEPALIVE, etc.

int setsockopt(
  int sockfd,
  int level,
  int optname,
  const void *optval,
  socklen_t optlen);
sockfd  => file descriptor for the socket
level   => the level on which to set the option/value (socket or any protocol)
optname => the number representing the option name according to header files
optval  => the value to set for the given option

Note, the sequence of arguments matches arg0 .. arg4 in above D script.

The relevant socket level & options (from socket.h):

# level
1 => SOL_SOCKET
 
# options
9  => SO_KEEPALIVE
20 => SO_RCVTIMEO
21 => SO_SNDTIMEO

The relevant TCP level & options (from tcp.h)

# level
6 => SOL_TCP
 
# options
1 => TCP_NODELAY
4 => TCP_KEEPIDLE
5 => TCP_KEEPINTVL
6 => TCP_KEEPCNT

Now we’ve got all the information necessary to interpret the output from my D script.

SQLNET.EXPIRE_TIME = 1

dtrace: script 'sockopt.d' matched 2 probes
CPU     ID                    FUNCTION:NAME
  0    125                 setsockopt:entry tnslsnr | socket=14 | level=6 | optname=1 | optlen=4
  0    126                setsockopt:return optval=1
  
  1    125                 setsockopt:entry oracle_7090_dev | socket=14 | level=6 | optname=1 | optlen=4
  1    126                setsockopt:return optval=1
  
  1    125                 setsockopt:entry oracle_7090_dev | socket=14 | level=1 | optname=9 | optlen=4
  1    126                setsockopt:return optval=1
  
  1    125                 setsockopt:entry oracle_7090_dev | socket=14 | level=6 | optname=4 | optlen=4
  1    126                setsockopt:return optval=60
  
  1    125                 setsockopt:entry oracle_7090_dev | socket=14 | level=6 | optname=5 | optlen=4
  1    126                setsockopt:return optval=6
  
  1    125                 setsockopt:entry oracle_7090_dev | socket=14 | level=6 | optname=6 | optlen=4
  1    126                setsockopt:return optval=10

The output shows the following:

  • TCP_KEEPALIVE is enabled
  • TCP_KEEPIDLE is set to the value of SQLNET.EXPIRE_TIME in seconds
  • TCP_KEEPINTVL is set to 6
  • TCP_KEEPCNT is set to 10

OK, let’s gradually increase SQLNET.EXPIRE_TIME and see what happens.

SQLNET.EXPIRE_TIME = 10

... 
  0    125                 setsockopt:entry oracle_6400_dev | socket=14 | level=6 | optname=4 | optlen=4
  0    126                setsockopt:return optval=600
  
  0    125                 setsockopt:entry oracle_6400_dev | socket=14 | level=6 | optname=5 | optlen=4
  0    126                setsockopt:return optval=6
  
  0    125                 setsockopt:entry oracle_6400_dev | socket=14 | level=6 | optname=6 | optlen=4
  0    126                setsockopt:return optval=10
...

SQLNET.EXPIRE_TIME = 60

...
  1    125                 setsockopt:entry oracle_7147_dev | socket=14 | level=6 | optname=4 | optlen=4
  1    126                setsockopt:return optval=3600
  
  1    125                 setsockopt:entry oracle_7147_dev | socket=14 | level=6 | optname=5 | optlen=4
  1    126                setsockopt:return optval=6
  
  1    125                 setsockopt:entry oracle_7147_dev | socket=14 | level=6 | optname=6 | optlen=4
  1    126                setsockopt:return optval=10
...

SQLNET.EXPIRE_TIME = 120

...
  1    125                 setsockopt:entry oracle_7163_dev | socket=14 | level=6 | optname=4 | optlen=4
  1    126                setsockopt:return optval=7200
  
  1    125                 setsockopt:entry oracle_7163_dev | socket=14 | level=6 | optname=5 | optlen=4
  1    126                setsockopt:return optval=6
  
  1    125                 setsockopt:entry oracle_7163_dev | socket=14 | level=6 | optname=6 | optlen=4
  1    126                setsockopt:return optval=10
...

Conclusion
From this I conclude that Oracle always sets TCP_KEEPIDLE to the value of SQLNET.EXPIRE_TIME in seconds.
TCP_KEEPCNT and TCP_KEEPCNT are not adjusted based on SQLNET.EXPIRE_TIME but are always set to 6 and 10, respectively.

For the sake of completeness, here’s what happens when the connection closes.

 
  1    125                 setsockopt:entry oracle_7090_dev | socket=14 | level=1 | optname=21 | optlen=16
  1    126                setsockopt:return optval=0
  
  1    125                 setsockopt:entry oracle_7090_dev | socket=14 | level=1 | optname=20 | optlen=16
  1    126                setsockopt:return optval=0
  
  1    125                 setsockopt:entry oracle_7090_dev | socket=14 | level=1 | optname=9 | optlen=4
  1    126                setsockopt:return optval=0