How to never change connection strings again

The other day a colleague mentioned an interesting side-effect of CMAN: your clients never have to change the connection string ever again.
I’ve used CMAN for various other reasons (firewall channeling, etc.) but never thought of it as a solution to ever changing connection strings when you frequently move databases around. And from experience I can tell that for some application support teams changing connection strings can be a real struggle.

For this use case give the CMAN servers a round-robin DNS alias and publish service names that don’t need changing. This way, wherever your CMANs and databases are located the connection string for the client remains the same, e.g.:

my_service =
  (DESCRIPTION =
    (FAILOVER = ON)
    (LOAD_BALANCE = OFF)
    (TRANSPORT_CONNECT_TIMEOUT = 3)
    (CONNECT_TIMEOUT = 6) (RETRY_COUNT = 1)
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = cmans)(PORT = 1521))
    )
    (CONNECT_DATA = (SERVICE_NAME = my_service))
  )

So far everybody is happy…

Now, imagine you run Data Guard across two datacenters, A and B, for multiple databases. Some primary databases run in datacenter A, some in B. Because a few applications are extremely sensitive to latency, whenever a primary/standby role change happens you move your application server VMs to the same datacenter where the primary database runs.
Let’s introduce two CMAN instances in both datacenters and all of a sudden half your user base is complaining about performance. The problem is that about 50% of your connections are now going to the other datacenter just for CMAN to connect back, thus creating a loop:

If your sole purpose for using CMAN is to avoid changing client connect strings there are probably other (read better) solutions, for instance OID TNS lookups.

Or, hear me out, remote listeners πŸ˜‰

Replace CMAN with remote listeners to be the single point of contact for connection requests. The remote listener’s redirect messages will point to the local listener which will then spawn the server process. This way the application always connects to the database without being looped through the other datacenter. (for more details about listener redirects check this out)

How-to
For this example I’m assuming following hostnames, IPs, and DNS entries
DB server 1 in DC A: db-1 / 192.168.55.10
DB server 2 in DC B: db-2 / 192.168.55.20
Remote listeners in DC A: rl-1 / 192.168.55.30, rl-2 / 192.168.55.31
Remote listeners in DC B: rl-3 / 192.168.55.32, rl-4 / 192.168.55.33
DNS Alias: oracle-portal with IPs 192.168.55.30, 192.168.55.31, 192.168.55.32, 192.168.55.33 (round-robin for connection load balancing)

Create VMs/containers of the technology of your choosing and install the Oracle client.

INVENTORY_LOCATION=/u01/app/oraInventory
ORACLE_BASE=/u01/app/oracle
ORACLE_HOME=${ORACLE_BASE}/product/client19

./runInstaller -silent -waitForCompletion -noconfig \
oracle.install.responseFileVersion=/oracle/install/rspfmt_clientinstall_response_schema_v19.0.0 \
UNIX_GROUP_NAME=oinstall \
INVENTORY_LOCATION=${INVENTORY_LOCATION} \
ORACLE_HOME=${ORACLE_HOME} \
ORACLE_BASE=${ORACLE_BASE} \
oracle.install.client.installType=Custom \
oracle.install.client.customComponents=oracle.network.listener:19.0.0.0.0

Put this in ${ORACLE_HOME}/network/admin/listener.ora.

LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = rl-1)(PORT = 1521))
    )
  )
ADMIN_RESTRICTIONS_LISTENER = ON
VALID_NODE_CHECKING_REGISTRATION_LISTENER = ON
REGISTRATION_INVITED_NODES_LISTENER = (rl-1, db-1, db-2)

Start the listener.

lsnrctl start LISTENER

Repeat that for the other 3 remote listener VMs/containers (don’t forget to chenge the hostname in listener.ora).
Invited nodes (Valid Node Checking Registration) is a good security practice for listeners to restrict the hosts that can register services (remember CVE-2012-1675 and COST? What a drama this was πŸ™‚ ).

Let the databases register their services.

alter system set remote_listener = "rl-1:1521","rl-2:1521","rl-3:1521","rl-4:1521" scope=both sid='*';
alter system register;

That’s it. From now on, your clients no longer need to know on which server your database is running.

my_service =
  (DESCRIPTION =
    (FAILOVER = ON)
    (LOAD_BALANCE = OFF)
    (TRANSPORT_CONNECT_TIMEOUT = 3)
    (CONNECT_TIMEOUT = 6) (RETRY_COUNT = 1)
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = oracle-portal)(PORT = 1521))
    )
    (CONNECT_DATA = (SERVICE_NAME = my_service))
  )

All roads lead to Rome – Importing custom TLS certificate into Oracle wallet

It appears that Oracle’s preferred way to get a certificate into a wallet is by generating a key pair and then create certificate signing request using “orapki” (that’s what you mostly see in the docs and on MOS). Once the request is singed by the CA you import the certificate into the wallet.
But, there are situations where you don’t get to create the signing request from the Oracle wallet. For instance, at one of my clients, they create everything at the CA and just send a PKCS#12 file to the DBAs. Since PKCS is a standard and the Oracle wallet itself is a PKCS#12 file one could (naively) assume that these things are compatible. Unfortunately, this is not always the case. When trying to generate an SSO file with “orapki” from an openssl created PKCS#12 file I get the following error:

Exception in thread "main" java.lang.NullPointerException
        at oracle.security.pki.OracleKeyStoreSpi.a(Unknown Source)
        at oracle.security.pki.OracleSSOKeyStoreSpi.a(Unknown Source)
        at oracle.security.pki.OracleFileWalletImpl.b(Unknown Source)
        at oracle.security.pki.OracleWallet.saveLSSO(Unknown Source)
        at oracle.security.pki.textui.OracleWalletTextUI.create(Unknown Source)
        at oracle.security.pki.textui.OracleWalletTextUI.command(Unknown Source)
        at oracle.security.pki.textui.OraclePKITextUI.main(Unknown Source)

This is just one example, I’ve encountered other problems when trying to work with PKCS#12 files that were created by tools other than “orapki” (or mkstore). Therefore you should always create the PKCS#12 file using “orapki” and then work from there.
I’m going to show a few practical ways how to import a “pre-created” custom TLS certificate into an Oracle wallet.

Assuming we get a PKCS#12 file containing everything we need:

  • Root certificate
  • Intermediate certificated
  • Private key
  • Signed certificate
ORACLE_WALLET_LOC=/home/oracle/wallet
ORACLE_WALLET_PWD=*****
SOURCE_PKCS12_FILE/home/oracle/custom-ca.p12
SOURCE_PKCS12_PWD=*****

orapki wallet create -wallet "${ORACLE_WALLET_LOC}" -compat_v12 -pwd "${ORACLE_WALLET_PWD}"
orapki wallet import_pkcs12 \
    -wallet "${ORACLE_WALLET_LOC}" \
   -pkcs12file "${SOURCE_PKCS12_FILE}" \
   -pkcs12pwd "${SOURCE_PKCS12_PWD}" \
   -pwd "${ORACLE_WALLET_PWD}"
orapki wallet create -wallet "${ORACLE_WALLET_LOC}" -compat_v12 -auto_login_local -pwd "${ORACLE_WALLET_PWD}"
orapki wallet display -wallet "${ORACLE_WALLET_LOC}"

Maybe we get a Java key store (JKS) containing everything:

ORACLE_WALLET_LOC=/home/oracle/wallet
ORACLE_WALLET_PWD=*****
SOURCE_JKS_FILE=/home/oracle/custom-ca.jks
SOURCE_JKS_PWD=*****

orapki wallet create -wallet "${ORACLE_WALLET_LOC}" -compat_v12 -pwd "${ORACLE_WALLET_PWD}"
orapki wallet jks_to_pkcs12 \
    -wallet "${ORACLE_WALLET_LOC}" \
    -keystore "${SOURCE_JKS_FILE}" \
    -jkspwd "${SOURCE_JKS_PWD}" \
    -pwd "${ORACLE_WALLET_PWD}"
orapki wallet create -wallet "${ORACLE_WALLET_LOC}" -compat_v12 -auto_login_local -pwd "${ORACLE_WALLET_PWD}"
orapki wallet display -wallet "${ORACLE_WALLET_LOC}"

Or, we might get everything as individual base64 encoded files:

  • root.cer
  • inter.cer
  • private-pwd.key (password protected)
  • custom.cer
CERT_FILES_LOC=/home/oracle
ORACLE_WALLET_LOC=/home/oracle/wallet
ORACLE_WALLET_PWD=*****
PRIVATE_KEY_PWD=*****

orapki wallet create -wallet "${ORACLE_WALLET_LOC}" -compat_v12 -pwd "${ORACLE_WALLET_PWD}"
orapki wallet add \
    -wallet "${ORACLE_WALLET_LOC}" \
    -trusted_cert \
    -cert "${CERT_FILES_LOC}/root.cer" \
    -pwd "${ORACLE_WALLET_PWD}"
orapki wallet add \
    -wallet "${ORACLE_WALLET_LOC}" \
    -trusted_cert \
    -cert "${CERT_FILES_LOC}/inter.cer" \
    -pwd "${ORACLE_WALLET_PWD}"
orapki wallet import_private_key \
    -wallet "${ORACLE_WALLET_LOC}" \
    -pvtkeyfile "${CERT_FILES_LOC}/private-pwd.key" \
    -pvtkeypwd "${PRIVATE_KEY_PWD}" \
    -cert "${CERT_FILES_LOC}/custom.cer" \
    -pwd "${ORACLE_WALLET_PWD}"
orapki wallet create -wallet "${ORACLE_WALLET_LOC}" -compat_v12 -auto_login_local -pwd "${ORACLE_WALLET_PWD}"
orapki wallet display -wallet "${ORACLE_WALLET_LOC}"

“Helper functions”
Extracting individual components from a PKCS#12 file:

CERT_FILES_LOC=/home/oracle
SOURCE_PKCS12_FILE/home/oracle/custom-ca.p12
SOURCE_PKCS12_PWD=*****
PRIVATE_KEY_PWD=*****

# extract private key from PKCS#12 file
openssl pkcs12 -nocerts -in "${SOURCE_PKCS12_FILE}" -out "${CERT_FILES_LOC}/private-pwd.key" -password pass:"${SOURCE_PKCS12_PWD}" -passout pass:"${PRIVATE_KEY_PWD}"
# cleanup file: remove Bag/Key Attribute information
sed -i -n '/-BEGIN ENCRYPTED PRIVATE KEY-/,/-END ENCRYPTED PRIVATE KEY-/p' "${CERT_FILES_LOC}/private-pwd.key"

# remove passphrase from private key file (!!!security risk!!!)
openssl rsa -in "${CERT_FILES_LOC}/private-pwd.key" -out "${CERT_FILES_LOC}/private-nopwd.key" -passin pass:"${PRIVATE_KEY_PWD}"

# extract certificate from PKCS#12 file
openssl pkcs12 -clcerts -nokeys -in "${SOURCE_PKCS12_FILE}" -out "${CERT_FILES_LOC}/custom.cer" -password pass:"${SOURCE_PKCS12_PWD}"

# extract root and intermediate certs from PKCS#12 file
#   manually split ca-chain.cer into root.cer and inter.cer
openssl pkcs12 -in "${SOURCE_PKCS12_FILE}" -cacerts -nokeys -chain -out "${CERT_FILES_LOC}/ca-chain.cer" -password pass:"${SOURCE_PKCS12_PWD}"

Create PKCS#12 file with “openssl”:

CERT_FILES_LOC=/home/oracle
TARGET_PKCS12_FILE/home/oracle/opennssl.p12
TARGET_PKCS12_PWD=*****
PRIVATE_KEY_PWD=*****

cat "${CERT_FILES_LOC}/root.cer" "${CERT_FILES_LOC}/inter.cer" > "${CERT_FILES_LOC}/ca-chain.cer"
openssl pkcs12 -export -out "${TARGET_PKCS12_FILE}" -inkey "${CERT_FILES_LOC}/private-pwd.key" -in "${CERT_FILES_LOC}/custom.cer" -certfile "${CERT_FILES_LOC}/ca-chain.cer" -password pass:"${TARGET_PKCS12_PWD}" -passin pass:"${PRIVATE_KEY_PWD}"

Convert a PKCS#12 file to Java key store (JKS):

OPENSSL_PKCS12_FILE=/home/oracle/wallet/openssl.p12
OPENSSL_PKCS12_PWD=*****
JKS_FILE=/home/oracle/wallet/keystore.jks
JKS_PWD=*****

keytool -v -importkeystore -srckeystore "${OPENSSL_PKCS12_FILE}" -srcstoretype pkcs12 -srcstorepass "${OPENSSL_PKCS12_PWD}" -destkeystore "${JKS_FILE}" -deststoretype jks -deststorepass "${JKS_PWD}"

Footnote: tested “orapki” from Oracle 19.5 database home

Tracing Data Guard traffic

Sometimes it is helpful to trace Data Guard background services so we can look at matching NSSn and RFS traces. For deep dives we also want to run TCP captures on both ends of a Data Guard configuration, and possibly on network components in the middle.
To minimize processing overhead on devices and noise in the capture files we want the packet filter to be as specific as possible. Just the source and target IP is not good enough, we also need a port number. Ideally, we apply the following filters to “tcpdump” in order to caputre the entire NSS <-> RFS traffic (but nothing more):

tcpdump 'host <primary IP> and port <NSSn port>'

NSSn is the process shipping the redo data from primary to standby. n is the number matching the LOG_ARCHIVE_DEST_n parameter for your redo transport configuration

Getting the IP of the primary host is easy, even if there is a separate network for Data Guard.
But how do we get the PORT of the NSSn process?

Method A
Identify the NSSn process on the primary:

select ses.sid, ses.serial#, ses.machine, ses.port, prc.pname, prc.spid, prc.stid, prc.tracefile
from gv$session ses
  join gv$process prc on (prc.inst_id = ses.inst_id and prc.addr = ses.paddr)
where prc.pname like 'NSS%'
;


The trick is to use the SPID as it is reflected in “v$session.process” on the standby database instance. In this case we query for process 4582 on the standby database:

select ses.sid, ses.serial#, ses.machine, ses.port, prc.pname, prc.spid, prc.stid, prc.tracefile
from gv$session ses
  join gv$process prc on (prc.inst_id = ses.inst_id and prc.addr = ses.paddr)
where ses.process = '4582'
;

In our example the port is 18274 resulting in this “tcpdump” command:

tcpdump 'host 192.168.55.71 and port 18274'

Method B
Use the same query to get the SPID of the NSSn process on the primary as in the first method.
With the SPID we can find the TCP connection details using the “ss” Linux utility:

ss -o -p -n -i | grep 4582

VoilΓ .

SQL trace
Obviously, with the respective SID/SERIAL# combinations you can enable the SQL trace on NSSn (primary) and RFS (standby):

Primary

exec dbms_monitor.session_trace_enable(session_id=> 83, serial_num => 57385, waits => true, binds => false)

Standby

exec dbms_monitor.session_trace_enable(session_id=> 191, serial_num => 37747, waits => true, binds => false)

Observation about the RFS process
The RFS trace file name shows as “DEV1_ora_4187.trc” in “v$process”. In fact, RFS is not a background process (v$session.type = ‘USER’).
But here’s an oddity, when activating SQL trace on RFS the trace file does not materialize on the filesystem with that name. It actually materializes with “DEV1_rfs_4187.trc”.

RAC and Data Guard
In a MMA environment you’d probably want to capture traffic from all primary instances (on the host with the apply instance):

tcpdump '(host <primary IP A> and port <NSSn port A>) or (host <primary IP B> and port <NSSn port B>)'

SQLcl – enabling MCS for JDBC thin driver

Let’s say you have configured TLS authentication for your database users using the Microsoft Certificate Store (MCS) on Windows clients.

For OCI based client this is pretty well documented and understood:

sqlnet.ora

SQLNET.AUTHENTICATION_SERVICES = (TCPS)
WALLET_LOCATION = (SOURCE = (METHOD = MCS))
SSL_SERVER_DN_MATCH = yes

tnsnames.ora

DEV1_TLS =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCPS)(HOST = ol7ora19dev1.spotonoracle.com)(PORT = 2484))
    (CONNECT_DATA =
      (SERVICE_NAME = DEV1.spotonoracle.com)
    )
    (SECURITY =
      (SSL_SERVER_CERT_DN = "cn=dev1,ou=servers,ou=oracle,dc=spotonoracle,dc=com")
    )
  )

With above configuration I can connect to the database using SQL*Plus.

C:\Users\user1>sqlplus /@dev1_tls

SQL*Plus: Release 12.2.0.1.0 Production on Wed May 29 18:31:53 2019

Copyright (c) 1982, 2018, Oracle.  All rights reserved.

Last Successful login time: Wed May 29 2019 18:10:15 +02:00

Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production

Note that I use a 12.2 OCI client. This is because of bug 25809524 for which the fix has not made its way into the 18c release. I hope the soon to be released 19c client for Windows will have the fix.

In a previous post I’ve shown how to use a file based wallet (cwallet.sso) for JDBC thin applications,namely SQLcl. Unfortunately, we cannot simply change the WALLET_LOCATION parameter as we do in sqlnet.ora. This does not work:

set JAVA_TOOL_OPTIONS=-Doracle.net.wallet_location=(SOURCE=(METHOD=MCS)) -Doracle.net.authentication_services=(TCPS) -Doracle.net.ssl_server_dn_match=true

In order for JDBC thin applications to access the Microsoft Certificate Store we have to set Windows platform specific settings for trust store and key store properties:

set JAVA_TOOL_OPTIONS=-Djavax.net.ssl.trustStore=NONE -Djavax.net.ssl.trustStoreType=Windows-ROOT -Djavax.net.ssl.keyStore=NONE -Djavax.net.ssl.keyStoreType=Windows-MY -Doracle.net.authentication_services=(TCPS) -Doracle.net.ssl_server_dn_match=true

C:\Users\user1>sql /@dev1_tls
Picked up JAVA_TOOL_OPTIONS: -Djavax.net.ssl.trustStore=NONE -Djavax.net.ssl.trustStoreType=Windows-ROOT -Djavax.net.ssl.keyStore=NONE -Djavax.net.ssl.keyStoreType=Windows-MY -Doracle.net.authentication_services=(TCPS,KERBEROS5) -Doracle.net.ssl_server_dn_match=true

SQLcl: Release 19.1 Production on Wed May 29 18:45:21 2019

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

Last Successful login time: Wed May 29 2019 18:45:22 +02:00

Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

This works for SQLcl, SQL Developer, and any reasonable JDBC thin application. πŸ™‚

Bugs are annoying – Kerberos ticket issue

Update 08-JUN-2018
This bug has been fixed with the Oracle 19.3 client release for Windows. Below workaround should not be necessary anymore.

One cool feature of using Kerberos authentication is that when you have a TGT (Ticket Granting Ticket) in the ticket cache the Oracle client software can use that to get a service ticket and log you into the database without further asking for any credentials (single sing-on).

Here’s what it looks like with a Kerberos authenticated SSH session on Linux:

As you can see from the screenshot the “orasrv” service ticket is flagged “forwardable” and the database login is successful (“-f” tells “oklist” to show the ticket flags).

On Windows on the other hand the same fails with “ORA-12638: Credential retrieval failed”:

If we change to “non-forwardable” service tickets it works on both, Linux and Windows:

How do you change to “non-forwardable” service tickets?
1) Use “okinit” to manually re-initialize your ticket cache. By default it will get “non-forwardable” tickets (or use “-F” to be explicit).

2) You can configure the service principal in Active Directory so only “non-forwadable” tickets will be issued (even when you use “okinit -f” to explicitly ask for “forwardable” tickets)

Both workarounds do the trick but I don’t like neither of them for their obvious drawbacks. After a few weeks trying to convince the Oracle Support Engineer of the issue bug 28734494 has been filed on Oct 8, 2018 with no notable progress to date πŸ™
Despite what the bug description says the issue is not MSLSA vs file ticket cache, it is the ticket flags that make or break.

Btw. I’ve tested client versions 12.2.0.1, 18.3, 18.5 and they all exhibit the same behaviour.

A story about Data Guard and network latency

Imagine you run a trading application that, every morning at market open, processes many thousands of orders from a queue using dozens of database connections. And as we all know, trade order processing is a time critical task.
The system is a two node RAC cluster that is backed by a Data Guard physical standby database with the exact same specifications and configuration. To give you an idea what is going on in the database during this peak order processing, here are a few AWR stats (this is from node 1, but node 2 very much looks the same):

									Per Second
DB Time(s):							23.7
DB CPU(s):							4.8
Background CPU(s):					0.6
Redo size (bytes):					2,569,543.4
Logical read (blocks):				692,837.9
Block changes:						11,668.0
Physical read (blocks):				463.7
Physical write (blocks):			219.8
Read IO requests:					416.8
Write IO requests:					99.6
Global Cache blocks received:		2,125.3
Global Cache blocks served:			2,368.7
User calls:							9,354.2
Parses (SQL):						6,208.7
Hard parses (SQL):					1.6	0.0
Executes (SQL):						14,642.5
Rollbacks:							6.9	0.0
Transactions:						518.9

This is a fairly busy system with 9000+ user calls and 500+ transactions per second. “DB CPU” is about 1/5 of “DB Time” and with 32 physical cores per node the system certainly is not CPU bound. Below stats are based on a 30 seconds snapshot interval (I know, it’s insanely short but we had our reasons πŸ™‚ ).

Event								Waits		Total Wait Time (sec)		Wait Avg(ms)	% DB time	Wait Class
log file sync						15,404		244.9						15.90			29.8		Commit
DB CPU							 				167.9	 									20.4	 
gc buffer busy release				2,601		101.1						38.87			12.3		Cluster
gc buffer busy acquire				7,240		81							11.19			9.8			Cluster
gc current block busy				4,662		78.8						16.91			9.6			Cluster
gc cr block busy					3,945		68.7						17.40			8.3			Cluster
cell single block physical read 	13,029		12.3						0.94			1.5			User I/O
enq: TX - index contention			653			7.3							11.21			.9			Concurrency
enq: TX - row lock contention		378			5.3							14.08			.6			Application
gc current block 2-way				24,480		4.4							0.18			.5			Cluster

The top 10 waits show that 30% of the time was spent on “log file sync” with almost 16ms average wait time. 16ms for commit latency seems high, let’s take a closer look in the wait event histogram:

Event			Waits 64ms to 2s		<32ms	<64ms	<1/8s	<1/4s	<1/2s	<1s		<2s		>=2s
log file sync	801						94.8	 .7		  .5	   3.7		.2		 .2

Most outliers (15404/100*3.7 = 570) are in the 128ms – 250ms bucket . “log file sync” is the wait from the foreground processes and in Oracle 12c the wait chain for Data Guard sync mode is as follows:

"log file sync" (foreground process)
	=> "SYNC Remote Write" (LGWR)
		=> "Redo Transport MISC" (NSSn)

So let’s see what we have in the wait stats for the background processes:

Event								Waits		%Time -outs		Total Wait Time (s)		Avg wait (ms)	Waits /txn		% bg time
gcs log flush sync					91,603		1				45						0.49			5.09			19.64
Redo Transport MISC					6,335		0				29						4.59			0.35			12.62
SYNC Remote Write					6,335		0				28						4.34			0.35			11.93
log file parallel write				6,297		0				3						0.54			0.35			1.49
gc current block busy				53			0				2						43.59			0.00			1.00
gc buffer busy release				18			0				1						72.30			0.00			0.56
...

While not great, the average for “Redo Transport MISC” is “only” 4.59ms as compared to the 16ms “log file sync”.
Background wait event histogram:

Event					Waits 64ms to 2s	<32ms	<64ms	<1/8s	<1/4s	<1/2s	<1s		<2s		>=2s
Redo Transport MISC		57					99.1	 .0		  .0	   .8		.0		 .0	 	 	 	 
SYNC Remote Write		57					99.1	 .0		  .0	   .8		.0		 .0	 	 

Same as for “log file sync” most of the outliers for the redo transmission wait events are in the 128ms – 250ms range.

It looks like the redo transmission from primary to standby is sometimes stalling LGWR and thus causing many foreground transactions to wait for LGWR (that’s why the average “log file sync” wait is so much higher).

SQL traces from NSS/RFS processes confirmed that there are around 80-120 outliers per minute. While the redo transmission is on TCP (or lower) the corresponding RFS process on the standby is waiting on “SQL*Net message from client” or “SQL*Net more data from client”.
NSS2 trace:

WAIT #0: nam='Redo Transport MISC' ela= 205429 p1=0 p2=0 p3=0 obj#=-1 tim=12736668661230

corresponding RFS trace:

WAIT #0: nam='SQL*Net message from client' ela= 207662 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=18731316629403

This told us that both, the sender and the receiving process were waiting on the network stack and interestingly, in all instances the waits took a just a little over 200ms.

The next thing was running a TCP packet capture (tcpdump) on the primary & standby database servers to see what was going on on the network stack. This revealed that every time we saw a 200ms long “Redo Transport MISC” event in the NSS2 trace a TCP retransmission would occur, e.g.:

Why does it consistently take 200ms before the packet is retransmitted?
There is an exponential back-off algorithm in the Linux kernel for TCP retransmissions and it starts at 200ms in this environment:

$ grep '^CONFIG_HZ' /boot/config-$(uname -r)
CONFIG_HZ_1000=y
CONFIG_HZ=1000

$ grep '#define TCP_RTO_MIN' /usr/src/kernels/$(uname -r)/include/net/tcp.h
#define TCP_RTO_MIN	((unsigned)(HZ/5))

1000 HZ / 5 = 200ms(period). As in this case retransmissions only occur occasionally relative to the total packet volume the back-off algorithm never kicks in and the RTO stays at 200ms.

By the way, the retransmission timeout is calculated per port and the current value can be displayed using the “ss” command. For instance:

$ ss -o -p -n -i sport = :2483
Netid State      Recv-Q Send-Q                                                          Local Address:Port                                                                         Peer Address:Port              
tcp   ESTAB      0      0                                                        ::ffff:192.168.56.60:2483                                                                  ::ffff:192.168.56.1:46218               users:(("oracle_3485_dev",pid=3485,fd=16)) timer:(keepalive,9min52sec,0)
	 ts sack cubic wscale:7,7 rto:208 rtt:7.382/13.049 ato:40 mss:1448 rcvmss:1448 advmss:1448 cwnd:10 bytes_acked:5897 bytes_received:4860 send 15.7Mbps lastsnd:8237 lastrcv:8238 lastack:8237 pacing_rate 31.4Mbps rcv_rtt:60 rcv_space:28960
tcp   ESTAB      0      0                                                        ::ffff:192.168.56.60:2483                                                                  ::ffff:192.168.56.1:46086               users:(("oracle_2024_dev",pid=2024,fd=16)) timer:(keepalive,4min45sec,0)
	 ts sack cubic wscale:7,7 rto:212 rtt:11.107/15.77 ato:40 mss:1448 rcvmss:1448 advmss:1448 cwnd:10 bytes_acked:7009 bytes_received:7710 send 10.4Mbps lastsnd:1515530 lastrcv:1515611 lastack:315015 pacing_rate 20.9Mbps rcv_rtt:54 rcv_space:28960

You can see one port has RTO=208 and the other is RTO=212 but they are all close to 200ms.

What can we do about it?
Well, ideally you want no TCP retransmissions to occur in your network but that’s just not going to happen, after all TCP was designed to deal with lossy networks. In this system the retransmissions make less than 0.1% of all the Data Guard traffic, nevertheless the impact on the trading application is real. Since TCP_RTO_MIN and the back-off algorithm are hard-wired into the Linux kernel I only know of one way to change the minimum RTO:
Setup a separate network route for the Data Guard traffic (if you haven’t already for other reasons) and set the RTO on the IP route level:

ip route change <dest> dev <device> proto kernel scope link src <IP> \
  rto_min 10ms

With that retransmissions happen after 10ms instead of 200ms which mitigates the impact on LGWR and the foreground processes waiting for the redo data to be shipped. How low you can set the RTO depends on your network characteristics and you may need to dial in the value not to cause extra retransmissions.

ss -o -p -n -i sport = :2483
Netid State      Recv-Q Send-Q                                                          Local Address:Port                                                                         Peer Address:Port              
tcp   ESTAB      0      0                                                        ::ffff:192.168.56.60:2483                                                                  ::ffff:192.168.56.1:45430               users:(("oracle_1651_dev",pid=1651,fd=16)) timer:(keepalive,9min52sec,0)
	 ts sack cubic wscale:7,7 rto:11 rtt:0.303/0.43 ato:40 mss:1448 rcvmss:1448 advmss:1448 cwnd:10 bytes_acked:5897 bytes_received:4860 send 382.3Mbps lastsnd:5082 lastrcv:5421 lastack:5082 pacing_rate 764.3Mbps retrans:0/2 rcv_rtt:31 rcv_space:28960
tcp   ESTAB      0      0                                                        ::ffff:192.168.56.60:2483                                                                  ::ffff:192.168.56.1:45438               users:(("oracle_1655_dev",pid=1655,fd=16)) timer:(keepalive,9min54sec,0)
	 ts sack cubic wscale:7,7 rto:11 rtt:0.291/0.334 ato:40 mss:1448 rcvmss:1448 advmss:1448 cwnd:10 bytes_acked:5896 bytes_received:4860 send 398.1Mbps lastsnd:5082 lastrcv:5556 lastack:5082 pacing_rate 794.1Mbps retrans:0/2 rcv_rtt:69 rcv_space:28960

Thanks to the IP route configuration the socket level RTO now starts at 10ms (actually at 11ms in above example).

For single instance the setup is fairly simple, for RAC you need to provision an additional set of SCAN listeners.

To make the route change persistent across OS reboot I use a oneshot systemd unit.

Does the listener cache TLS certificates?

A while ago a fellow DBA asked me if the listener cached TLS certificates. My immediate answer was “Sure, not caching would hurt performance severely.”
But, I couldn’t be certain so I ran a trace on it.

As the listener.log shows I did connect three times using TLS enabled endpoint:

...
07-SEP-2018 11:05:30 * (CONNECT_DATA=(SERVICE_NAME=DEV1.localdomain)(CID=(PROGRAM=C:\app\oracle\product\client1830\bin\sqlplus.exe)(HOST=WIN2012CLI1)(USER=user1))) * (ADDRESS=(PROTOCOL=tcps)(HOST=192.168.56.29)(PORT=49211)) * establish * DEV1.localdomain * 0
07-SEP-2018 11:05:46 * (CONNECT_DATA=(SERVICE_NAME=DEV1.localdomain)(CID=(PROGRAM=C:\app\oracle\product\client1830\bin\sqlplus.exe)(HOST=WIN2012CLI1)(USER=user1))) * (ADDRESS=(PROTOCOL=tcps)(HOST=192.168.56.29)(PORT=49212)) * establish * DEV1.localdomain * 0
07-SEP-2018 11:05:48 * (CONNECT_DATA=(SERVICE_NAME=DEV1.localdomain)(CID=(PROGRAM=C:\app\oracle\product\client1830\bin\sqlplus.exe)(HOST=WIN2012CLI1)(USER=user1))) * (ADDRESS=(PROTOCOL=tcps)(HOST=192.168.56.29)(PORT=49213)) * establish * DEV1.localdomain * 0
...

During the entire time I had a listener trace and a β€œstrace” on the “tnslsnr” process.
Going through the strace output I found the open calls for both wallet files (ewallet.p12 and cwallet.sso).

Line 419: open("/u01/app/oracle/etc/wallet/auth/ewallet.p12", O_RDONLY) = 19
Line 506: open("/u01/app/oracle/etc/wallet/auth/cwallet.sso", O_RDONLY) = 20

Then the listener maps anonymous memory and reads data from cwallet.sso (file descriptor 20).

Line 514: mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0520168000
Line 517: read(20, "\272\333\241\211\10\to\264\306\247/w\217#\0n+[\t\371\v\266\244\230d\214e3\246ZV\22"..., 1149) = 1149
...
Line 531: read(20, "\241\370N8\0\0\0\6\0\0\0!\6\303\20]{\207\16_\246\247\3579'\234h\35I\301m="..., 4096) = 4096
...
Line 542: read(20, "\272\333\241\211\10\to\264\306\247/w\217#\0n+[\t\371\v\266\244\230d\214e3\246ZV\22"..., 4096) = 1149

Shortly thereafter, the listener closes the file handles and unmaps the anonymous memory.

Line 551: close(19)
Line 561: close(20)
Line 562: munmap(0x7f0520168000, 4096)            = 0

All this happens on the first incoming TLS connection request. After that it never touches any of the wallet files again.

The same can be observed in the listener.og: it opens/reads/closes the wallet file on the first incoming TLS connection request only.

Line 4473: CONNECTION REQUEST
Line 4627: snzdfo_open_file:Opening file /u01/app/oracle/etc/wallet/auth/ewallet.p12 with READ ONLY permissions
Line 4631: snzdfo_open_file:Opening file /u01/app/oracle/etc/wallet/auth/cwallet.sso with READ ONLY permissions
Line 4667: nztwOpenWallet:exit

I didn’t do any long running tests but I this proves that the listener does cache the certificate from the wallet (at least temporarily).

SCAN listener client connections

I can see how this depiction of client connections through SCAN listeners can be confusing. You might think that SCAN listeners connect to local listeners and pass through the connection from the client to the local listeners (like a proxy).

Source: support.oracle.com (MOS Doc ID 887522.1)

This would be a bad idea as all database traffic would now flow through the SCAN listeners. Instead, what actually happens is this: the SCAN listener sends a “redirect” message to the client and the client then establishes a new connection to the local listener given in the message.

I’m using a fail-over connect string to illustrate what happens if the primary database is on “ora122racB” cluster. This means the client will first go through all SCAN listeners on “ora122racA” cluster and then fail-over to the next address.

(DESCRIPTION=
  (ADDRESS_LIST=
    (ADDRESS=
      (PROTOCOL=TCP)(HOST=ora122racA-scan.localdomain)(PORT=1521)
    )
    (ADDRESS=
      (PROTOCOL=TCP)(HOST=ora122racB-scan.localdomain)(PORT=1521)
    )
  )
  (CONNECT_DATA=(SERVICE_NAME=racpdb1.localdomain))
)

Running a SQL*Net trace reveals the following sequence.
First, it expands the SCAN IPs:

nlad_expand_hst: Result: (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.56.25)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.56.27)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.56.26)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.56.35)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.56.36)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.56.37)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=racpdb1.localdomain)(CID=(PROGRAM=C:\app\oracle\product\ora12201\bin\sqlplus.exe)(HOST=T11967)(USER=spotonoracle))))
nlad_pr: entry
nlad_pr: description processing
nlad_pr: entry
snlinGetAddrInfo: entry
snlinGetAddrInfo: exit
nlad_pr: hostname 192.168.56.25
snlinFreeAddrInfo: entry
snlinFreeAddrInfo: exit
snlinGetAddrInfo: entry
snlinGetAddrInfo: exit
nlad_pr: hostname 192.168.56.27
snlinFreeAddrInfo: entry
snlinFreeAddrInfo: exit
snlinGetAddrInfo: entry
snlinGetAddrInfo: exit
nlad_pr: hostname 192.168.56.26
snlinFreeAddrInfo: entry
snlinFreeAddrInfo: exit
snlinGetAddrInfo: entry
snlinGetAddrInfo: exit
nlad_pr: hostname 192.168.56.35
snlinFreeAddrInfo: entry
snlinFreeAddrInfo: exit
snlinGetAddrInfo: entry
snlinGetAddrInfo: exit
nlad_pr: hostname 192.168.56.36
snlinFreeAddrInfo: entry
snlinFreeAddrInfo: exit
snlinGetAddrInfo: entry
snlinGetAddrInfo: exit
nlad_pr: hostname 192.168.56.37
snlinFreeAddrInfo: entry
snlinFreeAddrInfo: exit

The first connection is established to the first SCAN IP in above list.

nsc2addr: (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.56.25)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=racpdb1.localdomain)(CID=(PROGRAM=C:\app\oracle\product\ora12201\bin\sqlplus.exe)(HOST=WIN10VM01)(USER=spotonoracle))))
...
nsprecv: 00 67 01 00 04 00 02 00  |.g......|
nsprecv: 22 00 20 5B 28 44 15 52  |"..[(DES|
nsprecv: 53 52 49 50 74 49 4F 4E  |CRIPTION|
nsprecv: 1D 28 54 4E 50 3D 29 98  |=(TMP=)(|
nsprecv: 26 52 4E 4D 55 4D 3D 32  |VSNNUM=2|
nsprecv: 40 32 33 37 35 36 38 50  |02375680|
nsprecv: 39 18 45 52 52 3D 31 22  |)(ERR=12|
nsprecv: 25 31 34 26 38 55 52 62  |514)(ERR|
nsprecv: 4F 54 5E 53 54 21 43 4B  |OR_STACK|
nsprecv: 3D 28 25 52 52 4F 12 3D  |=(ERROR=|
nsprecv: 28 23 4F 44 45 3D 71 32  |(CODE=12|
nsprecv: 35 11 34 49 24 45 4E 46  |514)(EMF|
nsprecv: 49 3F 34 29 29 29 30     |I=4)))) |

Since the SCAN listener does not know the service “racpdb1.localdomain” it does return an error. Remember, the primary database is running on the other cluster.
Then it tries the second SCAN IP from the list.

nsc2addr: (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.56.27)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=racpdb1.localdomain)(CID=(PROGRAM=C:\app\oracle\product\ora12201\bin\sqlplus.exe)(HOST=WIN10VM01)(USER=spotonoracle))))
...
nsprecv: 00 67 01 00 04 00 02 00  |.g......|
nsprecv: 22 00 20 5B 28 44 15 52  |"..[(DES|
nsprecv: 53 52 49 50 74 49 4F 4E  |CRIPTION|
nsprecv: 1D 28 54 4E 50 3D 29 98  |=(TMP=)(|
nsprecv: 26 52 4E 4D 55 4D 3D 32  |VSNNUM=2|
nsprecv: 40 32 33 37 35 36 38 50  |02375680|
nsprecv: 39 18 45 52 52 3D 31 22  |)(ERR=12|
nsprecv: 25 31 34 26 38 55 52 62  |514)(ERR|
nsprecv: 4F 54 5E 53 54 21 43 4B  |OR_STACK|
nsprecv: 3D 28 25 52 52 4F 12 3D  |=(ERROR=|
nsprecv: 28 23 4F 44 45 3D 71 32  |(CODE=12|
nsprecv: 35 11 34 49 24 45 4E 46  |514)(EMF|
nsprecv: 49 3F 34 29 29 29 30     |I=4)))) |

Same error return message from this SCAN listener.
And then the third SCAN IP.

nsc2addr: (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.56.26)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=racpdb1.localdomain)(CID=(PROGRAM=C:\app\oracle\product\ora12201\bin\sqlplus.exe)(HOST=WIN10VM01)(USER=spotonoracle))))
...
nsprecv: 00 67 01 00 04 00 02 00  |.g......|
nsprecv: 22 00 20 5B 28 44 15 52  |"..[(DES|
nsprecv: 53 52 49 50 74 49 4F 4E  |CRIPTION|
nsprecv: 1D 28 54 4E 50 3D 29 98  |=(TMP=)(|
nsprecv: 26 52 4E 4D 55 4D 3D 32  |VSNNUM=2|
nsprecv: 40 32 33 37 35 36 38 50  |02375680|
nsprecv: 39 18 45 52 52 3D 31 22  |)(ERR=12|
nsprecv: 25 31 34 26 38 55 52 62  |514)(ERR|
nsprecv: 4F 54 5E 53 54 21 43 4B  |OR_STACK|
nsprecv: 3D 28 25 52 52 4F 12 3D  |=(ERROR=|
nsprecv: 28 23 4F 44 45 3D 71 32  |(CODE=12|
nsprecv: 35 11 34 49 24 45 4E 46  |514)(EMF|
nsprecv: 49 3F 34 29 29 29 30     |I=4)))) |

Again, obvously, we get the same return message.

Now, the client fails over to the second address in the connection descriptor. It connects to the first SCAN IP from the list for cluster “ora122racB”. This time we expect a positive response from the SCAN listener.

nsc2addr: (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.56.35)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=racpdb1.localdomain)(CID=(PROGRAM=C:\app\oracle\product\ora12201\bin\sqlplus.exe)(HOST=WIN10VM01)(USER=spotonoracle))))
...
nsprecv: 02 32 00 11 06 00 00 00  |.2......|
nsprecv: 01 40 54 41 44 44 52 45  |.@(ADDRE|
nsprecv: 43 53 3D 98 50 52 4F 54  |SS=(PROT|
nsprecv: 45 43 4F 4C 3D 54 43 50  |OCOL=TCP|
nsprecv: 22 28 71 4F 53 54 3D 24  |)(HOST=1|
nsprecv: 33 2E 32 95 32 2E 47 24  |92.168.5|
nsprecv: 3F 1E 15 39 29 14 87 2F  |6.33)(PO|
nsprecv: 92 54 3D 58 35 32 12 30  |RT=1521)|
nsprecv: 39 00 28 37 45 53 43 52  |).(DESCR|
nsprecv: 44 50 94 49 4F 4E 3D 28  |IPTION=(|
nsprecv: 46 44 44 27 45 53 53 3D  |ADDRESS=|
nsprecv: 28 50 85 4F 54 4F 43 4F  |(PROTOCO|
nsprecv: 9C 3D 54 23 50 29 28 48  |L=TCP)(H|
nsprecv: 48 53 65 3D 51 33 2F 12  |OST=192.| 
nsprecv: 34 58 88 74 87 45 1F 21  |168.56.3|
nsprecv: 98 29 68 50 4F 52 54 3D  |5)(PORT=|
nsprecv: 51 35 32 58 35 32 12 30  |1521))(C|
nsprecv: 5F 4E 4E 87 43 54 5F 44  |ONNECT_D|
nsprecv: 51 54 45 3D 28 53 45 52  |ATA=(SER|
nsprecv: 26 49 43 36 5F 4E 41 4D  |VICE_NAM|
nsprecv: 35 3D 61 77 47 83 19 84  |E=racpdb|
nsprecv: 35 30 4D 34 7C 3E 4C 56  |1.locald|
nsprecv: 63 6E 67 36 61 72 64 2E  |omain)(C|
nsprecv: 62 6F 8E 29 28 43 49 44  |ID=(PROG|
nsprecv: 3E 28 50 17 4F 47 52 41  |RAM=C:\a|
nsprecv: 4C 3D 22 37 5C 3F 72 51  |pp\oracl|
nsprecv: 68 6E 52 31 38 6C 6E 52  |e\produc|
nsprecv: 62 31 41 33 5E 92 73 2E  |t\ora122|
nsprecv: 3C 73 71 3F 79 7C 78 23  |01\bin\s|
nsprecv: 1E 65 29 65 29 28 48 4F  |qlplus.e|
nsprecv: 91 54 41 2F 28 53 45 52  |xe)(HOST|
nsprecv: 51 84 3F 27 34 87 37 51  |=WIN10VM|
nsprecv: 38 29 95 55 53 45 52 3D  |01)(USER|
nsprecv: 73 72 63 35 29 29 28 53  |=spotono|
nsprecv: 2F 4E 4E 18 43 54 5F 44  |racle))(|
nsprecv: 35 52 97 45 52 3F 64 65  |SERVER=d|
nsprecv: 87 69 63 65 74 65 64 29  |edicated|
nsprecv: 29 49 6F 53 54 41 4E 43  |)(INSTAN|
nsprecv: 46 5F 4E 67 4D 45 3D 45  |CE_NAME=|
nsprecv: 61 44 19 7E 34 38 14 57  |raccdb11|
nsprecv: 39 36                    |)))     |

Indeed, the listener did respond with the following redirect address (VIP from local listener):

(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.56.33)(PORT=1521))

Now, the client connects the local listener given in the redirect message.

nsc2addr: (ADDRESS=(PROTOCOL=TCP)(HOST=129.168.56.33)(PORT=1521))

From here on the connection protocol takes its regular turn (same with or without SCAN).

I hope this makes it a bit more clear how clients connection to databases using SCAN listeners.

Tracing LDAP from CMU to AD

I know, how many acronyms can you use in a title?

This is a quick note about another tracing facility within Oracle. If you’re using Centrally Manager Users with Active Directory you can enable a trace for the LDAP searches Oracle performs.

Enable tracing:

alter system set events='trace[gdsi] disk low';

Disable tracing:

alter system set events='trace[gdsi] off';

Here are a few examples.

Failed Kerberos authentication

kzlg found dn in wallet
kzlg found pwd in wallet
kzlg found usr in wallet
kzlg found domain SPOTONORACLE; dc=spotonoracle,dc=com; 1 dirsrv
kzlg ldap_open win2012dc1.spotonoracle.com:636
kzlg DB-LDAP init SSL succeeded.
kzlg bind success
kzlg AD user name: user1@SPOTONORACLE.COM
kzlg default naming ctx: dc=spotonoracle,dc=com
kzlg search -s base -b dc=spotonoracle,dc=com
kzlg search filter: objectclass=*
kzlg AD lockout_duration: 18000000000
kzlg AD max_pwd_age: 36288000000000
kzlg search_ext -s sub -b dc=spotonoracle,dc=com
kzlg search filter: (&(objectclass=user)(userPrincipalName=user1@SPOTONORACLE.COM))
KZLG_ERR: failed the search err=28304.
kzlg number of entries: 0
KZLG_ERR: LDAPERR=28304, OER=28304
KZLG_ERR: error=28304
kzlg doing LDAP unbind 

Successful Kerberos authentication

kzlg found dn in wallet
kzlg found pwd in wallet
kzlg found usr in wallet
kzlg found domain SPOTONORACLE; dc=spotonoracle,dc=com; 1 dirsrv
kzlg ldap_open win2012dc1.spotonoracle.com:636
kzlg DB-LDAP init SSL succeeded.
kzlg bind success
kzlg AD user name: user1@SPOTONORACLE.COM
kzlg default naming ctx: dc=spotonoracle,dc=com
kzlg search -s base -b dc=spotonoracle,dc=com
kzlg search filter: objectclass=*
kzlg AD lockout_duration: 18000000000
kzlg AD max_pwd_age: 36288000000000
kzlg search_ext -s sub -b dc=spotonoracle,dc=com
kzlg search filter: (&(objectclass=user)(userPrincipalName=user1@SPOTONORACLE.COM))
kzlg number of entries: 1
kzlg found user entry: CN=user1,OU=users,OU=oracle,DC=spotonoracle,DC=com
kzlg search -s base -b
kzlg search filter: objectclass=*
kzlg get AD current time: 20181019155231.0Z
kzlg found user entry normalized: cn=user1,ou=users,ou=oracle,dc=spotonoracle,dc=com
kzlg search_ext -s sub -b dc=spotonoracle,dc=com
kzlg search filter: (&(objectclass=group)(member:1.2.840.113556.1.4.1941:=cn=user1,ou=users,ou=oracle,dc=spotonoracle,dc=com))
kzlg number of entries: 1
kzlg search_ext -s sub -b dc=spotonoracle,dc=com
kzlg search filter: (&(objectclass=group)(objectSid=S-1-5-21-4282430696-1338935355-568305779-513))
kzlg number of entries: 1
kzlg doing LDAP unbind 	

Failed TLS authentication

kzlg found dn in wallet
kzlg found pwd in wallet
kzlg found usr in wallet
kzlg found domain SPOTONORACLE; dc=spotonoracle,dc=com; 1 dirsrv
kzlg ldap_open win2012dc1.spotonoracle.com:636
kzlg DB-LDAP init SSL succeeded.
kzlg bind success
kzlg AD user name: cn=user1,ou=users,ou=oracle,dc=spotonoracle,dc=com
kzlg default naming ctx: dc=spotonoracle,dc=com
kzlg search -s base -b dc=spotonoracle,dc=com
kzlg search filter: objectclass=*
kzlg AD lockout_duration: 18000000000
kzlg AD max_pwd_age: 36288000000000
kzlg search_ext -s sub -b dc=spotonoracle,dc=com
kzlg search filter: (&(objectclass=user)(distinguishedName=cn=user1,ou=users,ou=oracle,dc=spotonoracle,dc=com))
KZLG_ERR: failed the search err=28304.
kzlg number of entries: 0
KZLG_ERR: LDAPERR=28304, OER=28304
KZLG_ERR: error=28304
kzlg doing LDAP unbind
kzlg found dn in wallet
kzlg found pwd in wallet
kzlg found usr in wallet 

Successful TLS authentication

kzlg found dn in wallet
kzlg found pwd in wallet
kzlg found usr in wallet
kzlg found domain SPOTONORACLE; dc=spotonoracle,dc=com; 1 dirsrv
kzlg ldap_open win2012dc1.spotonoracle.com:636
kzlg DB-LDAP init SSL succeeded.
kzlg bind success
kzlg AD user name: cn=user1,ou=users,ou=oracle,dc=spotonoracle,dc=com
kzlg default naming ctx: dc=spotonoracle,dc=com
kzlg search -s base -b dc=spotonoracle,dc=com
kzlg search filter: objectclass=*
kzlg AD lockout_duration: 18000000000
kzlg AD max_pwd_age: 36288000000000
kzlg search_ext -s sub -b dc=spotonoracle,dc=com
kzlg search filter: (&(objectclass=user)(distinguishedName=cn=user1,ou=users,ou=oracle,dc=spotonoracle,dc=com))
kzlg number of entries: 1
kzlg found user entry: CN=user1,OU=users,OU=oracle,DC=spotonoracle,DC=com
kzlg search -s base -b
kzlg search filter: objectclass=*
kzlg get AD current time: 20181019155506.0Z
kzlg found user entry normalized: cn=user1,ou=users,ou=oracle,dc=spotonoracle,dc=com
kzlg search_ext -s sub -b dc=spotonoracle,dc=com
kzlg search filter: (&(objectclass=group)(member:1.2.840.113556.1.4.1941:=cn=user1,ou=users,ou=oracle,dc=spotonoracle,dc=com))
kzlg number of entries: 1
kzlg search_ext -s sub -b dc=spotonoracle,dc=com
kzlg search filter: (&(objectclass=group)(objectSid=S-1-5-21-4282430696-1338935355-568305779-513))
kzlg number of entries: 1
kzlg doing LDAP unbind 

Thanks to this I could resolve the last road block. CMU with TLS/Kerberos is fully functioning.

SQLcl – enabling TLS for JDBC thin driver

This is a quick follow-up from my previous post. This time we’re going to add TLS capabilities to SQLcl. I’m going to reuse the java security file we created earlier to override the security provider’s list (/opt/sqldeveloper/sqldeveloper/bin/sqldev.java.security).

The thing with SQLcl is there is no configuration file like in SQL Developer. No problem, all we did was adding JVM options anyway, except for adding “oraclepki.jar” to the CLASSPATH. Luckily, SQLcl ships with a recent version so we don’t need that.
To add the JVM options we can use either JAVA_TOOL_OPTIONS or _JAVA_OPTIONS. It can’t get simpler than that, can it?

$ export JAVA_TOOL_OPTIONS='-Doracle.net.wallet_location="(SOURCE=(METHOD=FILE)(METHOD_DATA=(DIRECTORY=/home/btr/tns/wallet)))" -Doracle.net.authentication_services="(TCPS)" -Doracle.net.ssl_server_dn_match=false -Djava.security.properties=/opt/sqldeveloper/sqldeveloper/bin/sqldev.java.security'
 

Connect using username/password but use TLS for in-flight traffic encryption:

$ sql system/********@"(DESCRIPTION=(ADDRESS=(HOST=ol7122rac-scan.localdomain)(PROTOCOL=TCPS)(PORT=1523))(CONNECT_DATA=(SERVICE_NAME=pdbrac1.localdomain)))"
Picked up JAVA_TOOL_OPTIONS: -Doracle.net.wallet_location="(SOURCE=(METHOD=FILE)(METHOD_DATA=(DIRECTORY=/home/btr/tns/wallet)))" -Doracle.net.authentication_services="(TCPS)" -Doracle.net.ssl_server_dn_match=false -Djava.security.properties=/opt/sqldeveloper/sqldeveloper/bin/sqldev.java.security

SQLcl: Release 18.1.1 Production on Tue May 29 20:51:26 2018

Copyright (c) 1982, 2018, Oracle.  All rights reserved.

Last Successful login time: Tue May 29 2018 20:51:26 -04:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production


SQL> 

Or connecting using TLS authentication:

$ sql /@"(DESCRIPTION=(ADDRESS=(HOST=ol7122rac-scan.localdomain)(PROTOCOL=TCPS)(PORT=1523))(CONNECT_DATA=(SERVICE_NAME=pdbrac1.localdomain)))"
Picked up JAVA_TOOL_OPTIONS: -Doracle.net.wallet_location="(SOURCE=(METHOD=FILE)(METHOD_DATA=(DIRECTORY=/home/btr/tns/wallet)))" -Doracle.net.authentication_services="(TCPS)" -Doracle.net.ssl_server_dn_match=false -Djava.security.properties=/opt/sqldeveloper/sqldeveloper/bin/sqldev.java.security

SQLcl: Release 18.1.1 Production on Tue May 29 20:57:30 2018

Copyright (c) 1982, 2018, Oracle.  All rights reserved.

Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production


SQL>