Monthly Archives: May 2017

Starting with Oracle 12.2 it is possible to encrypt all Tablespaces including SYSTEM, SYSAUX, TEMP, and UNDO. Off the top of my head I can think of a couple of reasons why encrypting SYSTEM and SYSAUX might be desired:

Histograms in SYSAUX might contain sensitive data
Application tables/indexes in SYSTEM or SYSAUX Tablespace
Hard-coded secrets in PL/SQL objects (yes, I’ve seen it all!)

Strangely enough, the 12.2 TDE FAQ discourages encryption of “internal objects”:

I believe this to be a leftover from previous releases which hasn’t been cleaned up, yet. Because it is supported and the Advanced Security Guide provides plenty of information about it.

Splitting hairs, “Fully Encrypt Databases” is not entirely correct. There is currently no supported way to encrypt PDB$SEED Tablespaces…and I’ve tried for the fun of it:-)
Forcing the PDB$SEED open read-write with “alter pluggable database PDB$SEED open force” and then open the keystore works. I could even create a new encryption key, but it won’t let me activate it:

SQL> show con_name
CON_NAME 
------------------------------
PDB$SEED

SQL> administer key management create encryption key using tag 'PDBSEED_MK_1' force keystore identified by "Strng-4-Scrty" with backup using 'bck1';
keystore altered.

SQL> column key_id new_value new_key_id
SQL> select key_id from v$encryption_keys where con_id = 0 and tag = 'PDBSEED_MK_1';
KEY_ID                                                                        
------------------------------------------------------------------------------
AVVfIyuTLk9sv2EyAw9dJ18AAAAAAAAAAAAAAAAAAAAAAAAAAAAA

SQL> administer key management use encryption key '&new_key_id' identified by "Strng-4-Scrty" with backup using 'bck2';
*
ERROR at line 1:
ORA-46656: operation not valid for container database

I know it doesn’t make a lot of sense as for instance, cloning from a encrypted PDB$SEED would be kind of a Catch-22.

Anyway, back to encrypting Oracle’s “internal” Tablespaces. It’s pretty straight forward. It’s the same as you would setup the database for any TDE encryption.
Add ENCRYPITON_WALLET_LOCATION to your sqlnet.ora

ENCRYPTION_WALLET_LOCATION =
  (SOURCE =
    (METHOD = FILE)
    (METHOD_DATA =
      (DIRECTORY = /u01/app/oracle/admin/$ORACLE_SID/wallet/tde)
    )
  )

Setup a keystore in the CDB$ROOT and generate a new key to use:

alter session set container=cdb$root;

-- create PKCS#12 keystore (ewallet.p12)
administer key management
  create keystore '/u01/app/oracle/admin/DEV1/wallet/tde'
  identified by "Strng-4-Scrty"
;

-- open the keystore
administer key management
  set keystore open
  identified by "Strng-4-Scrty"
  container = current
;

-- create a new encryption key
administer key management
  create encryption key using tag 'DEV1_ROOT_MK_1'
  identified by "Strng-4-Scrty"
  with backup using 'Before-Create-DEV1_ROOT_MK_1'
  container = current
;

-- activate the key for use in this container
--   note: newly created keys have "con_id = 0", as soon as they are activated they get assigned to the respective PDB
column key_id new_value new_key_id
select key_id from v$encryption_keys where con_id = 0 and tag = 'DEV1_ROOT_MK_1';
administer key management
  use encryption key '&new_key_id'
  using tag 'DEV1_ROOT_MK_1'
  identified by "Strng-4-Scrty"
  with backup using 'Before-Activate-DEV1_ROOT_MK_1'
;

-- we also want a local auto-login keystore (cwallet.sso)
--   otherwise manual keystore password input is required during database startup
administer key management
  create local auto_login keystore from keystore '/u01/app/oracle/admin/DEV1/wallet/tde'
  identified by "Strng-4-Scrty"
;

Now, that we have the key setup it’s as simple as an ALTER TABLESPACE statement. We used the default algorithm AES128 when generating the encryption key so we specify the same here.

alter tablespace system
  encryption online using 'AES128' encrypt
  file_name_convert = ('system', 'system-enc')
;
alter tablespace sysaux
  encryption online using 'AES128' encrypt
  file_name_convert = ('sysaux', 'sysaux-enc')
;

Be careful, if the specified algorithm is different from the one of the key the statement will still succeed. You’ll see following warning in the alert.log:

An algorithm different from the database key is specified for SYSTEM tablespace. The algorithm has been overridden to AES128 per V$DATABASE_KEY_INFO view.

Let’s move to my PDB.

alter session set container=pdbdev1;

-- assuming auto-login keystore is now open
-- otherwise open password keystore

administer key management
  create encryption key using tag 'PDBDEV1_MK_1'
  force keystore
  identified by "Strng-4-Scrty"
  with backup using 'Before-Create-PDBDEV1_MK_1'
;

column key_id new_value new_key_id
select key_id from v$encryption_keys where con_id = 0 and tag = 'PDBDEV1_MK_1';
administer key management
  use encryption key '&new_key_id'
  using tag 'PDBDEV1_MK_1'
  force keystore
  identified by "Strng-4-Scrty"
  with backup using 'Before-Activate-PDBDEV1_MK_1'
;

To encrypt the SYSTEM/SYSAUX Tablespace on our PDB you can use the same statements from above:

alter tablespace system
  encryption online using 'AES128' encrypt
  file_name_convert = ('system', 'system-enc')
;
alter tablespace sysaux
  encryption online using 'AES128' encrypt
  file_name_convert = ('sysaux', 'sysaux-enc')
;

According to the documentation, undo, temp, and redo data originating from objects in encrypted tablespaces will also be encrypted.

There’s still reason to encrypt temporary Tablespaces in full:

Temporary LOBs
Global temporary tables

Encrypting existing TEMP Tablespaces is not supported, we just create a new one and drop the old.

create temporary tablespace tempts_enc
  tempfile '/u02/oradata/DEV1/tempts_enc-001.dbf' size 512M autoextend off
  encryption using 'AES128' encrypt
;
alter database default temporary tablespace tempts_enc;
drop tablespace tempts including contents and datafiles;

Once I have encrypted SYSTEM, SYSAUX, TEMP, and all application Tablespaces there should be no need to encrypt the UNDO Tablespace, right? All undo data is then based on objects from encrypted Tablespaces and should therefore also be encrypted in the UNDO stream. This is to be further investigated. If you have any input, please leave a comment.

One final note: if you want to understand the performance impact of TDE I highly recommend to use SLOB.

Scalable primary keys (GUID follow-up)

Leave a reply

In the previous post we have seen that GUIDs as primary keys can lead to some serious performance problems. What we need is a primary key that won’t cause excessive I/O on inserts and reduce contention on the right hand side index leaf block. And I like it to be RAC aware.
To build a primary key that addresses above mentioned issue we use three different components:

the Oracle instance number
a one (or two) digit number
Oracle generated sequence number

The Oracle instance number is useful in Real Application Cluster setups to minimize data block transfers between nodes on inserts as each instance has its own part of the index tree to work on.
The one (or two) digit number could be anything that allows for distribution between sessions. For instance “mod(sid, 10)” or “round(dbms_random.value(0, 9), 0)”. The goal here is that each session works on different parts of the index.
The Oracle sequence number makes the key values unique.

I’m going to use the same test setup from the previous post. Tables TESTS and RESULTS are the same except that I changed the data type from “RAW(16)” to “NUMBER(10, 0)”.

create table tests (
    test_id number(10, 0) not null
  , vc2_long varchar2(4000 byte)
) tablespace myapp_data;

create unique index tests_idx_01 on tests (test_id) tablespace myapp_index;
alter table tests add constraint tests_pk primary key (test_id) using index tests_idx_01;

create table results (
    result_id number(10, 0) not null
  , test_id number(10, 0) not null
  , dt date
  , vc2_short varchar2(200 byte)
) tablespace myapp_data;

create unique index results_idx_01 on results(result_id) tablespace myapp_index;
alter table results add constraint results_pk primary key (result_id) using index results_idx_01;
create index results_idx_02 on results(test_id) tablespace myapp_index;
alter table results add constraint results_fk_01 foreign key (test_id) references tests(test_id) on delete cascade;

And we need sequences to get uniqueness.

create sequence tests_seq_pk cache 1000 noorder;
create sequence results_seq_pk cache 10000 noorder;

From the table definition above you can see I’m using one single number column to hold the three-part primary key values. As we need the three parts put together in a specific order (instance, some number, sequence) we cannot simply add up the numbers. I’ll use stored functions to concatenate the parts as strings and then cast the result to a number.

create or replace function tests_id_distr
return number
is
begin
  return to_number(sys_context('userenv', 'instance') || to_char(mod(sys_context('userenv', 'sid'), 10)) || to_char(tests_seq_pk.nextval) || dummy_java_call);
end tests_id_distr;
/

create or replace function results_id_distr
return number
is
begin
  return to_number(sys_context('userenv', 'instance') || to_char(mod(sys_context('userenv', 'sid'), 10)) || to_char(results_seq_pk.nextval || dummy_java_call));
end results_id_distr;
/

What’s the “dummy_java_call” at the end there, you ask?
Well, PL/SQL is blazingly fast 🙂 …and concatenating & casting a few strings and numbers doesn’t do the Java GUID implementation justice. To get fairly comparable results I made this implementation also do a context switch to the JVM for every key generated. The Java method simply returns an empty string (or NULL if you will in the PL/SQL context).

create or replace and compile java source named "DummyJavaCall"
as
public class DummyJavaCall {
  public static String create() {
    return "";
  }
}
/

create or replace function dummy_java_call
return varchar2
is
language java name 'DummyJavaCall.create() return java.lang.String';
/

The test script was just slightly modified to use the new stored functions to get the primary key values. Otherwise it is the same: inserting 1000 Tests in a loop. After each test it inserts 100 Results per Test in the inner loop and commits.

set verify off feedback off
set serveroutput on size unlimited

<<myapp>>
declare
  test_id tests.test_id%type;
begin
  rrt.stat_cfg_default;
  rrt.stat_start('&1');
  rrt.timer_start();

  for tid in 1 .. 1000 loop
    insert into tests t (test_id, vc2_short) values (tests_id_distr(), dbms_random.string('A', trunc(dbms_random.value(200, 4000))))
    returning t.test_id into myapp.test_id;
    
    for rid in 1 .. 100 loop
      insert into results r (result_id, test_id, dt, vc2_short) values (results_id_distr(), myapp.test_id, sysdate, dbms_random.string('L', trunc(dbms_random.value(10, 200))));
    end loop;
    
    commit;
  end loop;

  dbms_output.put_line('Elapsed:' || to_char(rrt.timer_elapsed));
  rrt.stat_stop;
end;
/

exit success

I again used the shell script from the GUID test case to run 10 concurrent session for each run. The VM and database specification is the exact same for both test cases. To get approximately the same data volume for the indexes I had to run this test case 12 times.
RESULTS_IDX_01 is 34 MB bigger as in the GUID test case (385 vs 351 MB)
RESULTS_IDX_02 is 6 MB smaller as in the GUID test case (389 vs 395 MB)
Also the number of leaf blocks are in the same ballpark.

SQL> select segment_name, tablespace_name, blocks, extents, bytes/1024/1024 mb from dba_segments where (segment_name like 'TESTS%' or segment_name like 'RESULTS%') order by segment_name desc;

SEGMENT_NAME    TABLESPACE_NAME  BLOCKS  EXTENTS  MB    
--------------- ---------------- ------- -------- ----
TESTS_IDX_01    MYAPP_INDEX      512     4        4     
TESTS           MYAPP_DATA       41216   322      322   
RESULTS_IDX_02  MYAPP_INDEX      49792   389      389   
RESULTS_IDX_01  MYAPP_INDEX      49280   385      385   
RESULTS         MYAPP_DATA       220160  1720     1720

SQL> select table_name, index_name, uniqueness, blevel, leaf_blocks, distinct_keys, avg_leaf_blocks_per_key, avg_data_blocks_per_key, clustering_factor, num_rows from dba_indexes where index_name in ('TESTS_IDX_01', 'RESULTS_IDX_01', 'RESULTS_IDX_02') order by table_name desc, index_name;

TABLE_NAME  INDEX_NAME      UNIQUENESS  BLEVEL  LEAF_BLOCKS  DISTINCT_KEYS  AVG_LEAF_BLOCKS_PER_KEY  AVG_DATA_BLOCKS_PER_KEY  CLUSTERING_FACTOR  NUM_ROWS  
----------- --------------- ----------- ------- ------------ -------------- ------------------------ ------------------------ ------------------ --------
TESTS       TESTS_IDX_01    UNIQUE      1       442          120000         1                        1                        92579              120000    
RESULTS     RESULTS_IDX_01  UNIQUE      2       47849        11704420       1                        1                        1082748            11704420  
RESULTS     RESULTS_IDX_02  NONUNIQUE   2       49037        118856         1                        4                        497219             11986476

From this setup the aggregated session statistics show that we constantly get low numbers of physical reads and elapsed times irrespective of the data volume of the underlying tables and indexes.

I like this solution because it addresses the scalability issues and doesn’t rely on any fancy Oracle feature (which you might need extra cost options for).

GUID, or how to Get Ur Index from Disk

1 Reply

Who doesn’t want to write scalable applications these days? Then maybe you should think twice about using GUIDs as primary keys. It’s mostly not about storage overhead, or the fact that surrogate keys are not always the best solution. It is about performance!

I’m going to talk about Oracle here in particular but most of the concepts are the same/or similar in other database systems.
Usually, you’re using B-tree indexes and unique constraints to police primary keys. And if you read the excellent book SQL Performance Explained by Markus Winand you might already know what pain you’ll be facing by using GUIDs. As long as your data volume is tiny and can fully live in the buffer cache you’re probably fine. To illustrate what happens once your dataset outgrows the buffer cache I setup a small test kit.

The data model is simple. I have two tables, TESTS and RESULTS. Each Test has zero or more Results. To store GUIDs I will use data type RAW(16). The rest of the columns will hold random data to fill the space.

create table tests (
    test_id raw(16) not null
  , vc2_long varchar2(4000 byte)
) tablespace myapp_data;

create unique index tests_idx_01 on tests (test_id) tablespace myapp_index;
alter table tests add constraint tests_pk primary key (test_id) using index tests_idx_01;


create table results (
    result_id raw(16) not null
  , test_id raw(16) not null
  , dt date
  , vc2_short varchar2(200 byte)
) tablespace myapp_data;

create unique index results_idx_01 on results(result_id) tablespace myapp_index;
alter table results add constraint results_pk primary key (result_id) using index results_idx_01;
create index results_idx_02 on results(test_id) tablespace myapp_index;
alter table results add constraint results_fk_01 foreign key (test_id) references tests(test_id) on delete cascade;

For ease of grep’ing SQL traces I separated tables and indexes to different Tablespaces. MYAPP_INDEX has one datafile with file# 20.

I wanted to use PL/SQL to keep the test case short and concise. Unfortunately, Oracle’s SYS_GUID function produces “sequential” GUIDs in my setup (Oracle 12.2 CDB on Oracle Linux 7.2). To have random GUIDs in PL/SQL I resorted to Java stored procedures (the implementation is not relevant to the case at hand):

create or replace and compile java source named "RandomGUID"
as
public class RandomGUID {
  public static String create() {
    return java.util.UUID.randomUUID().toString().replaceAll("-", "").toUpperCase();
  }
}
/

create or replace function random_java_guid
return varchar2
is
language java name 'RandomGUID.create() return java.lang.String';
/

create or replace function random_guid
return raw
is
begin
  return hextoraw(random_java_guid());
end random_guid;
/

Works like a charm 🙂

select random_guid() guids from dual connect by level <= 5;

GUIDS
--------------------------------
54746B0750374133BCF9FA85A6F2F532
C92168647BEC4A93982F19498238757E
3E4B858F41764126B177FCCB30CC73C5
4B7CD39D222D4E339482CD25F9AD6EA2
B8C5367F6B944EEA9BD71611CCB54E72

That’s it for the schema setup. To load data I use below code snippet (test1.sql) which in a loop inserts 1000 Tests. After each test it inserts 100 Results per Test in the inner loop and commits. The “RRT” package is used to collect session statistics (gv$session) and tracking wall clock time.

set verify off feedback off
set serveroutput on size unlimited

<<myapp>>
declare
  test_id tests.test_id%type;
begin
  rrt.stat_cfg_default;
  rrt.stat_start('&1');
  rrt.timer_start();

  for tid in 1 .. 1000 loop
    insert into tests t (test_id, vc2_long) values (random_guid(), dbms_random.string('A', trunc(dbms_random.value(200, 4000))))
    returning t.test_id into myapp.test_id;
    
    for rid in 1 .. 100 loop
      insert into results r (result_id, test_id, dt, vc2_short) values (random_guid(), myapp.test_id, sysdate, dbms_random.string('L', trunc(dbms_random.value(10, 200))));
    end loop;
    
    commit;
  end loop;

  dbms_output.put_line('Elapsed:' || to_char(rrt.timer_elapsed));
  rrt.stat_stop;
end;
/

To run the script concurrently in multiple sessions I use following shell script. It just spawns N SQL*Plus processes and runs the SQL script.

#!/usr/bin/bash

SCRIPT=$1
DEGREE=$2
EZCONN=$3

declare -i runid=$$
declare -i cnt=1

echo "Coordinator process pid ${runid}"
until [ $cnt -gt ${DEGREE} ]
do
  sqlplus -S -L ${EZCONN} @${SCRIPT} ${SCRIPT}.${runid}.${cnt}.log > ${SCRIPT}.${runid}.${cnt}.log &
  echo "Started sub-process ${cnt} with pid $!"
  cnt=$(( cnt + 1 ))
done

I ran the test case multiple times letting the database grow bigger with every run. I kept running until the performance got unbearable. On my VirtualBox VM setup it took 9 runs with 10 concurrent processes for the performance to go way down south. I deliberately configured a small SGA of 800 MB which resulted in about 500 MB buffer cache to hit the issue soon. Obviously, your milage may vary as the larger the buffer cache the longer you can sustain.
Let’s look at resource usage profile of one of the sessions from run 9. I processed the raw extended SQL trace file using Chris Antognini’s TVD$XTAT.

Wow, this single session did 16,158 random physical reads (db file sequential read). Let’s see which statements contributed to those reads:

Almost all of the random reads (16,078) came from the INSERT statement with ID 2 which is:

All this physical read I/O to INSERT records, which is all this test case does. And as you can see above the I/O is done on file 20 from the MYAPP_INDEX Tablespace. What’s the distribution between objects?

$ grep 'db file.*file#=20' *ora_5592.trc | cut -d ' ' -f12 | sort | uniq -c
     42 obj#=61805
  14538 obj#=61807
   1498 obj#=61808

select object_id, object_name from cdb_objects where con_id = 4 and owner = 'MYAPP' and object_id in (61805, 61807, 61808) order by object_id;

 OBJECT_ID OBJECT_NAME
---------- ---------------
     61805 TESTS_IDX_01
     61807 RESULTS_IDX_01
     61808 RESULTS_IDX_02

Considering the data pattern it makes sense that RESULTS_IDX_01 is suffering the most. It is the unique index that gets the most index entries added. Index RESULTS_IDX_02 benefits from the repeating foreign key values per outer-loop that all go into the same leaf blocks which Oracle most likely caches.

select table_name, index_name, uniqueness, blevel, leaf_blocks, distinct_keys, avg_leaf_blocks_per_key, avg_data_blocks_per_key, clustering_factor, num_rows from dba_indexes where index_name in ('TESTS_IDX_01', 'RESULTS_IDX_01', 'RESULTS_IDX_02') order by table_name desc, index_name;

TABLE_NAME  INDEX_NAME      UNIQUENESS  BLEVEL  LEAF_BLOCKS  DISTINCT_KEYS  AVG_LEAF_BLOCKS_PER_KEY  AVG_DATA_BLOCKS_PER_KEY  CLUSTERING_FACTOR  NUM_ROWS
----------- --------------- ----------- ------- ------------ -------------- ------------------------ ------------------------ ------------------ --------
TESTS       TESTS_IDX_01    UNIQUE      1       467          90000          1                        1                        89999              90000
RESULTS     RESULTS_IDX_01  UNIQUE      2       44237        8941107        1                        1                        8941107            8941107
RESULTS     RESULTS_IDX_02  NONUNIQUE   2       50757        92216          1                        4                        424304             9137794

Up to test run 8 Oracle could keep the most relevant parts of the indexes in the buffer cache. After that Oracle had to constantly “swap” index leaf blocks in and out of the buffer cache.

select segment_name, tablespace_name, blocks, extents, bytes/1024/1024 mb from dba_segments where (segment_name like 'TESTS%' or segment_name like 'RESULTS%') order by segment_name desc;

SEGMENT_NAME    TABLESPACE_NAME  BLOCKS  EXTENTS  MB
--------------- ---------------- ------- -------- ----
TESTS_IDX_01    MYAPP_INDEX      512     4        4
TESTS           MYAPP_DATA       31360   245      245
RESULTS_IDX_02  MYAPP_INDEX      50560   395      395
RESULTS_IDX_01  MYAPP_INDEX      44928   351      351
RESULTS         MYAPP_DATA       191744  1498     1498

I also graphed the session statistics aggregated (average of the 10 concurrent sessions) by test runs. It clearly shows a correlation between physical reads and elapsed time. Who’d a thunk it? 🙂

Due to the random character of GUIDs every new index entry potentially goes into any of the index leaf blocks. Thus, performance is good as long as you can cache the entire index, meaning every index supporting primary and foreign keys in you entire application. That’s probably too much to ask.
Now, you might know about sequential GUIDs that Oracle and SQL Server implement and start using them. This might alleviate the problem somewhat, but could result in the next performance issue: right hand side index leaf block contention!

As this post is getting long already I will talk about a solution that scales in my next installment.

Spot on Oracle

Hopefully enlightening

Monthly Archives: May 2017

Patch the patch: 12.2 DIRECTORY path name

Oracle 12.2 full database encryption (TDE)

Scalable primary keys (GUID follow-up)

GUID, or how to Get Ur Index from Disk