Venzi's Tech-Blog

3 November 2012

Loading data fast – DML error logging performance

Filed under: Code,Java,Oracle,Performance,Programming — Venzi @ 14:49

In the last post of my “Loading data fast” series I showed how DML error logging can be used to prevent batch loads from failing when one or more rows can’t be inserted. Now the question is how much performance impact errors have on this mechanism. What I want to show is that a developer should not just blindly use DML error logging instead of first thinking about whether the data he intends to insert is actually valid. So let’s have a look:

First I create the same simple table with a primary key on it that I used before:

CREATE TABLE TESTLOADTABLE (id NUMBER, text VARCHAR2(255));
table TESTLOADTABLE created.

CREATE UNIQUE INDEX TESTLOADTABLE_PK ON TESTLOADTABLE(id);
unique index TESTLOADTABLE_PK created.

ALTER TABLE TESTLOADTABLE ADD PRIMARY KEY (id) USING INDEX TESTLOADTABLE_PK;
table TESTLOADTABLE altered.

Then I create the error logging table:

BEGIN
  DBMS_ERRLOG.CREATE_ERROR_LOG('TESTLOADTABLE','ERR_TESTLOADTABLE');
END;
anonymous block completed

So far so good, now let’s execute the following tests:

1) Load 10k rows with no errors
2) Load 10k rows with 100 (1%) rows failing
3) Load 10k rows with 1000 (10%) rows failing
4) Load 10k rows with 2500 (25%) rows failing
5) Load 10k rows with 5000 (50%) rows failing
6) Load 10k rows with 7500 (75%) rows failing
And just out of curiosity, let’s see how long a regular insert with no error logging takes:
7) Load 10k rows with no errors and no error logging

Test 1 – no errors:

long startOverall = System.currentTimeMillis();
PreparedStatement stmt = conn.prepareStatement("INSERT /* addBatch with logging errors */ INTO testloadtable (id, text) VALUES (?,?) LOG ERRORS INTO ERR_TESTLOADTABLE REJECT LIMIT UNLIMITED");

for (int i=0;i<10000;i++)
{
  stmt.setInt(1, i);
  stmt.setString(2, "test" + i);
  stmt.addBatch();
}
long startExecute = System.currentTimeMillis();
stmt.executeBatch();
long endExecute = System.currentTimeMillis();
conn.commit();
long endOverall = System.currentTimeMillis();

System.out.println("Executing batch took: " + (endExecute-startExecute));
System.out.println("Executing overall took: " + (endOverall-startOverall));
System.out.println("Preparation took: " + ((endOverall-startOverall)-(endExecute-startExecute)));

Executing batch took: 861
Executing overall took: 1020
Preparation took: 159

So overall it took 1020 milliseconds, of which 861 milliseconds were spent executing the insert itself.

Test 2 – 100 errors (1%):

PreparedStatement stmt0 = conn.prepareStatement("INSERT /* generate some rows before */ INTO testloadtable (id, text) VALUES(?,?)");
for (int i=0;i<10000;i++)
{
  if (i%100 == 0)
  {
    stmt0.setInt(1, i);
    stmt0.setString(2, "test" + i);
    stmt0.addBatch();
  }
}
stmt0.executeBatch();
conn.commit();

long startOverall = System.currentTimeMillis();
PreparedStatement stmt = conn.prepareStatement("INSERT /* addBatch with logging errors */ INTO testloadtable (id, text) VALUES (?,?) LOG ERRORS INTO ERR_TESTLOADTABLE REJECT LIMIT UNLIMITED");

for (int i=0;i<10000;i++)
{
  stmt.setInt(1, i);
  stmt.setString(2, "test" + i);
  stmt.addBatch();
}
long startExecute = System.currentTimeMillis();
stmt.executeBatch();
long endExecute = System.currentTimeMillis();
conn.commit();
long endOverall = System.currentTimeMillis();

System.out.println("Executing batch took: " + (endExecute-startExecute));
System.out.println("Executing overall took: " + (endOverall-startOverall));
System.out.println("Preparation took: " + ((endOverall-startOverall)-(endExecute-startExecute)));

Executing batch took: 1017
Executing overall took: 1069
Preparation took: 52

This time it took quite a bit longer to execute the batch: 1017 milliseconds. The reason is obvious: Oracle now not only has to insert 10k rows but also has to reject rows and insert them into another table.

Test 3 – 1000 (10%) errors:

PreparedStatement stmt0 = conn.prepareStatement("INSERT /* generate some rows before */ INTO testloadtable (id, text) VALUES(?,?)");
for (int i=0;i<10000;i++)
{
  if (i%10 == 0)
  {
    stmt0.setInt(1, i);
    stmt0.setString(2, "test" + i);
    stmt0.addBatch();
  }
}
stmt0.executeBatch();
conn.commit();

long startOverall = System.currentTimeMillis();
PreparedStatement stmt = conn.prepareStatement("INSERT /* addBatch with logging errors */ INTO testloadtable (id, text) VALUES (?,?) LOG ERRORS INTO ERR_TESTLOADTABLE REJECT LIMIT UNLIMITED");

for (int i=0;i<10000;i++)
{
  stmt.setInt(1, i);
  stmt.setString(2, "test" + i);
  stmt.addBatch();
}
long startExecute = System.currentTimeMillis();
stmt.executeBatch();
long endExecute = System.currentTimeMillis();
conn.commit();
long endOverall = System.currentTimeMillis();

System.out.println("Executing batch took: " + (endExecute-startExecute));
System.out.println("Executing overall took: " + (endOverall-startOverall));
System.out.println("Preparation took: " + ((endOverall-startOverall)-(endExecute-startExecute)));

Executing batch took: 1420
Executing overall took: 1470
Preparation took: 50

And as you can see: the more rows fail, the longer the insert takes. So let’s have a look at the rest of the tests and see how bad it gets:

Test 4 – 2500 (25%) errors:

PreparedStatement stmt0 = conn.prepareStatement("INSERT /* generate some rows before */ INTO testloadtable (id, text) VALUES(?,?)");
for (int i=0;i<10000;i++)
{
  if (i%4 == 0)
  {
    stmt0.setInt(1, i);
    stmt0.setString(2, "test" + i);
    stmt0.addBatch();
  }
}
stmt0.executeBatch();
conn.commit();

long startOverall = System.currentTimeMillis();
PreparedStatement stmt = conn.prepareStatement("INSERT /* addBatch with logging errors */ INTO testloadtable (id, text) VALUES (?,?) LOG ERRORS INTO ERR_TESTLOADTABLE REJECT LIMIT UNLIMITED");

for (int i=0;i<10000;i++)
{
  stmt.setInt(1, i);
  stmt.setString(2, "test" + i);
  stmt.addBatch();
}
long startExecute = System.currentTimeMillis();
stmt.executeBatch();
long endExecute = System.currentTimeMillis();
conn.commit();
long endOverall = System.currentTimeMillis();

System.out.println("Executing batch took: " + (endExecute-startExecute));
System.out.println("Executing overall took: " + (endOverall-startOverall));
System.out.println("Preparation took: " + ((endOverall-startOverall)-(endExecute-startExecute)));

Executing batch took: 1877
Executing overall took: 1961
Preparation took: 84

Test 5 – 5000 (50%) errors:

PreparedStatement stmt0 = conn.prepareStatement("INSERT /* generate some rows before */ INTO testloadtable (id, text) VALUES(?,?)");
for (int i=0;i<10000;i++)
{
  if (i%2 == 0)
  {
    stmt0.setInt(1, i);
    stmt0.setString(2, "test" + i);
    stmt0.addBatch();
  }
}
stmt0.executeBatch();
conn.commit();

long startOverall = System.currentTimeMillis();
PreparedStatement stmt = conn.prepareStatement("INSERT /* addBatch with logging errors */ INTO testloadtable (id, text) VALUES (?,?) LOG ERRORS INTO ERR_TESTLOADTABLE REJECT LIMIT UNLIMITED");

for (int i=0;i<10000;i++)
{
  stmt.setInt(1, i);
  stmt.setString(2, "test" + i);
  stmt.addBatch();
}
long startExecute = System.currentTimeMillis();
stmt.executeBatch();
long endExecute = System.currentTimeMillis();
conn.commit();
long endOverall = System.currentTimeMillis();

System.out.println("Executing batch took: " + (endExecute-startExecute));
System.out.println("Executing overall took: " + (endOverall-startOverall));
System.out.println("Preparation took: " + ((endOverall-startOverall)-(endExecute-startExecute)));

Executing batch took: 2680
Executing overall took: 2765
Preparation took: 85

Test 6 – 7500 (75%) errors:

PreparedStatement stmt0 = conn.prepareStatement("INSERT /* generate some rows before */ INTO testloadtable (id, text) VALUES(?,?)");
for (int i=0;i<10000;i++)
{
  if (i<7500)
  {
    stmt0.setInt(1, i);
    stmt0.setString(2, "test" + i);
    stmt0.addBatch();
  }
}
stmt0.executeBatch();
conn.commit();

long startOverall = System.currentTimeMillis();
PreparedStatement stmt = conn.prepareStatement("INSERT /* addBatch with logging errors */ INTO testloadtable (id, text) VALUES (?,?) LOG ERRORS INTO ERR_TESTLOADTABLE REJECT LIMIT UNLIMITED");

for (int i=0;i<10000;i++)
{
  stmt.setInt(1, i);
  stmt.setString(2, "test" + i);
  stmt.addBatch();
}
long startExecute = System.currentTimeMillis();
stmt.executeBatch();
long endExecute = System.currentTimeMillis();
conn.commit();
long endOverall = System.currentTimeMillis();

System.out.println("Executing batch took: " + (endExecute-startExecute));
System.out.println("Executing overall took: " + (endOverall-startOverall));
System.out.println("Preparation took: " + ((endOverall-startOverall)-(endExecute-startExecute)));

Executing batch took: 3349
Executing overall took: 3412
Preparation took: 63

So as you can see, the more errors you get, the longer your batch needs to execute. This is only logical, of course: more errors mean more exceptions raised internally and more rows inserted into the error table. The main takeaway is that your insert does not fail, and you can afterwards query the error logging table and act appropriately. Just for comparison:

  • Insert no errors: 861ms
  • Insert 1% errors: 1017ms
  • Insert 10% errors: 1420ms
  • Insert 25% errors: 1877ms
  • Insert 50% errors: 2680ms
  • Insert 75% errors: 3349ms
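The numbers above grow almost linearly with the error count. As a quick sanity check (a back-of-the-envelope sketch, not part of the original test program; class and method names are made up), a least-squares fit over the six measurements estimates the cost per rejected row:

```java
// Least-squares fit of batch execution time vs. number of failing rows,
// using the measurements listed above.
public class ErrorOverheadFit {

    // Returns {intercept, slope}: estimated base time (ms) and cost per failing row (ms).
    public static double[] fit(long[] errors, long[] millis) {
        int n = errors.length;
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            sx += errors[i];
            sy += millis[i];
            sxx += (double) errors[i] * errors[i];
            sxy += (double) errors[i] * millis[i];
        }
        double slope = (n * sxy - sx * sy) / (n * sxx - sx * sx);
        double intercept = (sy - slope * sx) / n;
        return new double[] { intercept, slope };
    }

    public static void main(String[] args) {
        long[] errors = { 0, 100, 1000, 2500, 5000, 7500 };
        long[] millis = { 861, 1017, 1420, 1877, 2680, 3349 };
        double[] f = fit(errors, millis);
        System.out.printf("base: %.0f ms, per failing row: %.2f ms%n", f[0], f[1]);
    }
}
```

On these numbers the fit comes out at a base of roughly one second plus about a third of a millisecond per rejected row, which is consistent with the roughly four-fold increase between 0% and 75% errors.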

Now let’s do one more test and see how fast a regular insert without DML error logging is. You would think it’s the same, but during my experiments I found an interesting fact:

Test 7 – no errors, no error logging:

long startOverall = System.currentTimeMillis();
PreparedStatement stmt = conn.prepareStatement("INSERT /* regular bulk insert */ INTO testloadtable (id, text) VALUES (?,?)");

for (int i=0;i<10000;i++)
{
  stmt.setInt(1, i);
  stmt.setString(2, "test" + i);
  stmt.addBatch();
}
long startExecute = System.currentTimeMillis();
stmt.executeBatch();
long endExecute = System.currentTimeMillis();
conn.commit();
long endOverall = System.currentTimeMillis();

System.out.println("Executing batch took: " + (endExecute-startExecute));
System.out.println("Executing overall took: " + (endOverall-startOverall));
System.out.println("Preparation took: " + ((endOverall-startOverall)-(endExecute-startExecute)));

Executing batch took: 212
Executing overall took: 372
Preparation took: 160

So a regular insert takes only 212ms while a DML error logging insert takes 861ms. That’s 4 times longer!
The reason is the unpublished bug 11865420 (My Oracle Support Doc Id: 11865420.8). Once you apply the patch, the insert with DML error logging is just as fast as without:

Regular batch insert:

long startOverall = System.currentTimeMillis();
PreparedStatement stmt = conn.prepareStatement("INSERT /* regular bulk insert */ INTO testloadtable (id, text) VALUES (?,?)");

for (int i=0;i<10000;i++)
{
  stmt.setInt(1, i);
  stmt.setString(2, "test" + i);
  stmt.addBatch();
}
long startExecute = System.currentTimeMillis();
stmt.executeBatch();
long endExecute = System.currentTimeMillis();
conn.commit();
long endOverall = System.currentTimeMillis();

System.out.println("Executing batch took: " + (endExecute-startExecute));
System.out.println("Executing overall took: " + (endOverall-startOverall));
System.out.println("Preparation took: " + ((endOverall-startOverall)-(endExecute-startExecute)));

Executing batch took: 248
Executing overall took: 399
Preparation took: 151

DML error logging insert:

long startOverall = System.currentTimeMillis();
PreparedStatement stmt = conn.prepareStatement("INSERT /* addBatch with logging errors */ INTO testloadtable (id, text) VALUES (?,?) LOG ERRORS INTO ERR_TESTLOADTABLE REJECT LIMIT UNLIMITED");

for (int i=0;i<10000;i++)
{
  stmt.setInt(1, i);
  stmt.setString(2, "test" + i);
  stmt.addBatch();
}
long startExecute = System.currentTimeMillis();
stmt.executeBatch();
long endExecute = System.currentTimeMillis();
conn.commit();
long endOverall = System.currentTimeMillis();

System.out.println("Executing batch took: " + (endExecute-startExecute));
System.out.println("Executing overall took: " + (endOverall-startOverall));
System.out.println("Preparation took: " + ((endOverall-startOverall)-(endExecute-startExecute)));

Executing batch took: 227
Executing overall took: 384
Preparation took: 157

Conclusion: This is the last article of the series Loading data fast! In this series I’ve not only shown how bad it is to commit after each row (remember autocommit in JDBC!) but also how much more speed you can get out of your program by doing batch inserts. I’ve also shown how to deal with potential errors during batch inserts and how much performance impact errors produce. The latter I have done to make clear that developers should not just throw DML error logging inserts at every problem, as they would lose the performance benefit that batch inserts provide. So now, go over your code and batch up! 😉

25 October 2012

Loading data fast – Batch inserts and errors (DML error logging)

Filed under: Code,Java,Oracle,Performance,Programming — Venzi @ 11:02

In the last post of my series “Loading data fast” I showed how batch inserts can actually make a huge difference in insert performance. Now one question remains: What happens when an error occurs, like a unique key violation? The answer is: The insert statement will fail with an error and stop. If you don’t catch the exception, it will be raised and the commit will never be issued, which also means you lose all previously inserted data. However, if you catch the exception and make sure that you issue a commit afterwards, you will at least keep the previously successfully inserted data in the table, if that is suitable for the business logic, basically meaning that you know where you’ve stopped. Let’s have a look:

In Java you will have to run the PreparedStatement.executeBatch() routine in a try block and execute the Connection.commit() routine in the finally block, like this:


PreparedStatement stmt = conn.prepareStatement("INSERT /* addBatch insert */ INTO testloadtable (id, text) VALUES (?,?)");

for (int i=0;i<ROWS;i++)
{
  stmt.setInt(1, i);
  stmt.setString(2, "test" + i);
  stmt.addBatch();
}

try
{
  stmt.executeBatch();
}
catch (BatchUpdateException e)
{
  System.out.println("Error during my batch insert example");
  e.printStackTrace();
}
finally
{
  conn.commit();
}

In PL/SQL you will have to run your FORALL statement in a new block, like this:


DECLARE
  TYPE tab IS TABLE OF testloadtable%ROWTYPE;
  myvals tab;
BEGIN
  SELECT rownum, 'x'
    BULK COLLECT INTO myvals
      FROM dual
        CONNECT BY LEVEL <= 10000;
  BEGIN
    FORALL i IN myvals.FIRST..myvals.LAST
      -- insert the whole record: referencing single fields like myvals(i).id
      -- inside FORALL raises PLS-00436 on older Oracle versions
      INSERT INTO testloadtable
        VALUES myvals(i);
  EXCEPTION
    WHEN DUP_VAL_ON_INDEX THEN
      DBMS_OUTPUT.PUT_LINE('Unique key violation!');
  END;
  COMMIT;
END;

By doing this, you will at least keep all previously inserted data. However, you still have to fix the error and run the rest of the batch again, and the latter can be rather difficult, especially in batch load scenarios. If you catch the exception, you have to find the right position in your batch in order to continue; in some cases that could mean rebuilding the entire batch. If you don’t catch the exception, because your batch is one logical unit that either succeeds or gets rolled back, you have to start from scratch. So the conclusion is: Neither of the two options is optimal!
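If you do catch the exception, JDBC at least offers BatchUpdateException.getUpdateCounts() to help locate where the batch stopped: entries equal to Statement.EXECUTE_FAILED mark rows the driver reports as failed. Be aware that Oracle's driver has historically flagged the whole batch as failed rather than pinpointing single rows, so treat the following as a sketch of the bookkeeping only (the class and helper name are mine):

```java
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public class BatchResume {

    // Scans the array returned by BatchUpdateException.getUpdateCounts() and
    // returns the indexes of the batch entries the driver flagged as failed.
    // Entries beyond counts.length were never attempted and need a re-run too.
    public static List<Integer> failedRows(int[] counts) {
        List<Integer> failed = new ArrayList<>();
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] == Statement.EXECUTE_FAILED) {
                failed.add(i);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // simulated counts: rows 0, 1 and 3 succeeded, row 2 failed
        System.out.println(failedRows(new int[] { 1, 1, Statement.EXECUTE_FAILED, 1 }));
    }
}
```

In the catch block you would call failedRows(e.getUpdateCounts()) and re-queue those rows, plus any rows the driver never attempted.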
Fortunately, Oracle provides a feature called “DML error logging”! This feature allows you to simply log all errors into an error table and continue the execution of your DML statement with no error being raised! It works with all DML statements: INSERT, UPDATE, MERGE and DELETE. I will, however, focus only on insert statements in this post.

First let’s prove what I just said. All I do is to create a primary key on the table and insert a row before I execute my batch.


CREATE TABLE TESTLOADTABLE (id NUMBER, text VARCHAR2(255));
table TESTLOADTABLE created.

CREATE UNIQUE INDEX TESTLOADTABLE_PK ON TESTLOADTABLE(id);
unique index TESTLOADTABLE_PK created.

ALTER TABLE TESTLOADTABLE ADD PRIMARY KEY (id) USING INDEX TESTLOADTABLE_PK;
table TESTLOADTABLE altered.

Now that I have my primary key created, I insert one row with the id 3000 before I execute my batch. The batch contains 10k rows with an id range of 0 – 9999. Once it reaches the row with the id 3000 (the 3001st row, as I start with id 0), I’ll receive an exception that the row already exists. Note that all other rows would be fine and valid except this single one with the id 3000. Nevertheless, as I do not make sure that a commit happens, I will lose all of them:

conn.prepareStatement("INSERT INTO testloadtable (id, text) VALUES (3000,'x')").execute();
conn.commit();

stmt = conn.prepareStatement("INSERT /* addBatch insert */ INTO testloadtable (id, text) VALUES (?,?)");

for (int i=0;i<ROWS;i++)
{
  stmt.setInt(1, i);
  stmt.setString(2, "test" + i);
  stmt.addBatch();
}
stmt.executeBatch();
conn.commit();

As expected, I get following exception back:

Exception in thread "main" java.sql.BatchUpdateException: ORA-00001: unique constraint (TEST.SYS_C0010986) violated

And when I do a count on the table, I have only my previously inserted row in there:


SELECT COUNT(*) FROM testloadtable
COUNT(*)
--------
1

Now let’s see what I get when I do catch the exception and make sure to issue a commit:

conn.prepareStatement("INSERT INTO testloadtable (id, text) VALUES (3000,'x')").execute();
conn.commit();

PreparedStatement stmt = conn.prepareStatement("INSERT /* addBatch insert */ INTO testloadtable (id, text) VALUES (?,?)");

for (int i=0;i<ROWS;i++)
{
  stmt.setInt(1, i);
  stmt.setString(2, "test" + i);
  stmt.addBatch();
}

try
{
  stmt.executeBatch();
}
catch (BatchUpdateException e)
{
  System.out.println("Error during my batch insert example");
  e.printStackTrace();
}
finally
{
  conn.commit();
}

The output shows that I caught the exception successfully:


Error during my batch insert example
java.sql.BatchUpdateException: ORA-00001: unique constraint (TEST.SYS_C0010986) violated

And a count on the table now shows that I have 3001 rows in it; my previously inserted row with id 3000 and the first 3000 rows from the batch with id 0 – 2999:

SELECT COUNT(*) FROM testloadtable
COUNT(*)
--------
3001

Now as said before, the great thing about DML error logging is that it offers you the ability to load your entire batch without receiving an error at all. Instead, the error(s) will be logged into a separate error table, which can then be queried and appropriate actions taken afterwards. All you need to do is modify your DML statement to include the LOG ERRORS clause. You will also have to create an error logging table via the DBMS_ERRLOG package first. Optionally, you can:

  • Include a tag that gets added to the error log to help identify the statement that caused errors
  • Include the REJECT LIMIT subclause, which allows you to define an upper limit for the number of errors that can be encountered before the statement fails. If you have a batch of 10k rows where 7k rows are failing, it probably doesn’t make much sense to continue, as there seems to be a bigger problem. This subclause lets you still raise an error once a certain threshold is reached. I guess the default value is 0 to make sure that nobody accidentally adds the error logging clause and thereby suppresses all errors: if you omit the REJECT LIMIT clause, the first error will be logged into the error logging table but the statement will also be terminated. If you want all errors logged and no error raised, you have to specify UNLIMITED.
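Taken together, the optional parts give the clause the shape LOG ERRORS [INTO table] [('tag')] [REJECT LIMIT n | UNLIMITED]. Purely as an illustration of how the pieces combine (the helper class below is my own, not an Oracle API), you could assemble it like this:

```java
public class LogErrorsClause {

    // Assembles a LOG ERRORS clause; tag may be null, a negative limit means UNLIMITED.
    public static String build(String errorTable, String tag, long rejectLimit) {
        StringBuilder sb = new StringBuilder("LOG ERRORS INTO ").append(errorTable);
        if (tag != null) {
            sb.append(" ('").append(tag).append("')");
        }
        sb.append(" REJECT LIMIT ");
        sb.append(rejectLimit < 0 ? "UNLIMITED" : String.valueOf(rejectLimit));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("ERR_TESTLOADTABLE", "nightly_load", 50));
    }
}
```

The tag (here the hypothetical "nightly_load") ends up in the ORA_ERR_TAG$ column, so you can tell apart errors from different statements writing into the same error table.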

The error table itself is a copy of the target table with a few more columns, telling you things like the error number, the error message and so forth. But the important part is that it contains all the columns of your target table, and those columns contain the values from the failing row of the insert. This means that you do not lose any information from your batch and do not need to re-execute the entire batch!
Let’s see how it works! First I create an error logging table called ERR_TESTLOADTABLE:


BEGIN
  DBMS_ERRLOG.CREATE_ERROR_LOG('TESTLOADTABLE','ERR_TESTLOADTABLE');
END;
anonymous block completed
DESC err_testloadtable
Name            Null Type
--------------- ---- --------------
ORA_ERR_NUMBER$      NUMBER
ORA_ERR_MESG$        VARCHAR2(2000)
ORA_ERR_ROWID$       UROWID()
ORA_ERR_OPTYP$       VARCHAR2(2)
ORA_ERR_TAG$         VARCHAR2(2000)
ID                   VARCHAR2(4000)
TEXT                 VARCHAR2(4000)

All I then have to do, is to modify my original code so that the insert statement includes a “LOG ERRORS INTO ERR_TESTLOADTABLE REJECT LIMIT UNLIMITED” clause at the end of it:

conn.prepareStatement("INSERT INTO testloadtable (id, text) VALUES (3000,'x')").execute();
conn.commit();

PreparedStatement stmt = conn.prepareStatement("INSERT /* addBatch with logging errors */ INTO testloadtable (id, text) VALUES (?,?) LOG ERRORS INTO ERR_TESTLOADTABLE REJECT LIMIT UNLIMITED");

for (int i=0;i<ROWS;i++)
{
  stmt.setInt(1, i);
  stmt.setString(2, "test" + i);
  stmt.addBatch();
}
stmt.executeBatch();
conn.commit();

Although I inserted the row with the id 3000 first, no error was reported during the execution. However, if I look at my ERR_TESTLOADTABLE, I see the following:


SELECT * FROM ERR_TESTLOADTABLE
ORA_ERR_NUMBER$ ORA_ERR_MESG$                                             ORA_ERR_ROWID$  ORA_ERR_OPTYP$ ORA_ERR_TAG$ ID    TEXT
--------------- --------------------------------------------------------- --------------- -------------- ------------ ----- --------
1               ORA-00001: unique constraint (TEST.SYS_C0010987) violated                 I                           3000  test3000

The error logging table shows me:

  1. The error number “1” which means “ORA-00001”
  2. The error message
  3. RowId: In this case null as it was an insert and the row didn’t exist before
  4. The operation type “I” indicating Insert
  5. An empty error tag, as I have not defined one
  6. ID column value from the failing row
  7. Text column value from the failing row
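Two of these columns are easy to decode programmatically: ORA_ERR_NUMBER$ is just the numeric part of the ORA-nnnnn code, and ORA_ERR_OPTYP$ is a one-letter operation code (I for insert, U for update, D for delete; the insert and update branches of a MERGE show up as I and U). A tiny helper along these lines (class and method names are mine):

```java
public class ErrLogDecode {

    // ORA_ERR_NUMBER$ stores the bare number, e.g. 1 for "ORA-00001".
    public static String errorCode(int errNumber) {
        return String.format("ORA-%05d", errNumber);
    }

    // ORA_ERR_OPTYP$ holds a one-letter operation code.
    public static String operation(String optyp) {
        switch (optyp) {
            case "I": return "INSERT";
            case "U": return "UPDATE";
            case "D": return "DELETE";
            default:  return "UNKNOWN";
        }
    }

    public static void main(String[] args) {
        System.out.println(errorCode(1) + " via " + operation("I"));
    }
}
```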

Conclusion: DML error logging allows you to use batch DML statements that won’t fail on errors. This gives you maximum flexibility when you perform batch operations!
In my next post I will take a look at the performance impact of DML error logging.

31 March 2012

Loading data fast – regular insert vs. bulk insert

Filed under: Code,Java,Oracle,Performance,Programming — Venzi @ 10:39

In my last post I talked about how persisting data can become the bottleneck on large high-scale systems nowadays. I also talked about how more and more people tend to think that databases are simply slow, seeing them as just big I/O systems, and about how lots of applications still insert data the way they did years ago rather than using bulk inserts.

In this post I will show you how bulk inserts can actually boost your inserts and therefore your systems. I will use a simple example showing the difference between:

  • Single row insert with commit
  • Single row insert with only one final commit
  • Bulk insert with final commit

Let’s assume you have a Java program that needs to load some data from a file into a single table. Each line in the file represents a row in the database. I won’t go into how to read from the file and build your data together; that is out of scope for this post and not relevant for showing the benefit of bulk inserts over regular ones. First let’s build a simple two-column table, including an “id” column as NUMBER and a “text” column as VARCHAR2:

CREATE TABLE TESTLOADTABLE (id NUMBER, text VARCHAR2(255));

table TESTLOADTABLE created.
 

For each test I truncate the table first just to make sure that I always load the same amount of data into the same empty table. I add a comment in the statements, so that I can separate them out later on in the trace file.

The first example loads 10,000 rows into the table. It will simply insert an incrementing counter and a string into the table followed by a commit.

conn.prepareStatement("TRUNCATE TABLE testloadtable").execute();
conn.prepareStatement("ALTER SESSION SET SQL_TRACE=TRUE").execute();

PreparedStatement stmt = conn.prepareStatement("INSERT /* conventional insert with commit */ INTO testloadtable (id, text) VALUES (?,?)");

for (int i=0;i<10000;i++)
{
  stmt.setInt(1, i);
  stmt.setString(2, "test" + i);
  stmt.execute();
  conn.commit();
}

conn.prepareStatement("ALTER SESSION SET SQL_TRACE=FALSE").execute();

Looking at the trace file, it took the program 2.21 seconds to load these 10,000 rows. You can also see that the statement was actually executed 10,000 times. The commits unfortunately don’t show up in the formatted trace file, but they would be listed in the raw trace file.

 SQL ID: 337xy5qc84nsq Plan Hash: 0

INSERT /* conventional insert with commit */ INTO testloadtable (id, text)
 VALUES
 (:1 ,:2 )

call     count       cpu    elapsed       disk      query    current        rows
 ------- ------  -------- ---------- ---------- ---------- ----------  ----------
 Parse        1      0.00       0.00          0          0          0           0
 Execute  10000      2.06       2.21          2         90      20527       10000
 Fetch        0      0.00       0.00          0          0          0           0
 ------- ------  -------- ---------- ---------- ---------- ----------  ----------
 total    10001      2.06       2.21          2         90      20527       10000

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 111
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
 ---------- ---------- ----------  ---------------------------------------------------
 0          0          0  LOAD TABLE CONVENTIONAL  (cr=5 pr=2 pw=0 time=1637 us)

The next test puts the commit outside of the loop. So I still add the data row by row to the table but the commit itself happens only once after all rows were loaded:

conn.prepareStatement("TRUNCATE TABLE testloadtable").execute();
conn.prepareStatement("ALTER SESSION SET SQL_TRACE=TRUE").execute();

PreparedStatement stmt = conn.prepareStatement("INSERT /* conventional insert */ INTO testloadtable (id, text) VALUES (?,?)");

for (int i=0;i<10000;i++)
{
  stmt.setInt(1, i);
  stmt.setString(2, "test" + i);
  stmt.execute();
}
conn.commit();

conn.prepareStatement("ALTER SESSION SET SQL_TRACE=FALSE").execute();

As the results show, the statement was still executed 10,000 times. However, this time it took only 1.19 seconds to insert all the data. So just by moving the commit to the end, after all inserts were done, I cut the load time almost in half (1.19s vs. 2.21s)! Although commits seem lightweight and don’t appear to do much, the database still has work to accomplish to make sure that your transaction is saved and visible. And of course, instead of having only 1 transaction, I have 10,000 in this case.
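Spread over the rows, the difference between the two runs gives a feel for what each extra commit round-trip costs here, about a tenth of a millisecond per row (simple arithmetic on the two traces; the class below is just my scratch calculation):

```java
public class CommitCost {

    // Rough per-row commit overhead derived from the two traced runs:
    // 2.21s with a commit per row vs. 1.19s with one final commit, 10,000 rows.
    public static double perCommitMillis(double perRowCommitSec, double singleCommitSec, int rows) {
        return (perRowCommitSec - singleCommitSec) * 1000.0 / rows;
    }

    public static void main(String[] args) {
        System.out.printf("~%.3f ms extra per commit%n", perCommitMillis(2.21, 1.19, 10000));
    }
}
```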

 SQL ID: drsv4dw4037zj Plan Hash: 0

INSERT /* conventional insert */ INTO testloadtable (id, text)
 VALUES
 (:1 ,:2 )

call     count       cpu    elapsed       disk      query    current        rows
 ------- ------  -------- ---------- ---------- ---------- ----------  ----------
 Parse        1      0.00       0.00          0          0          0           0
 Execute  10000      1.09       1.19          2        114      10562       10000
 Fetch        0      0.00       0.00          0          0          0           0
 ------- ------  -------- ---------- ---------- ---------- ----------  ----------
 total    10001      1.09       1.19          2        114      10562       10000

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 111
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
 ---------- ---------- ----------  ---------------------------------------------------
 0          0          0  LOAD TABLE CONVENTIONAL  (cr=5 pr=2 pw=0 time=1507 us)
 

The Oracle JDBC driver supports bulk inserts: you add all your data to a “batch” and then execute the entire batch at once. The advantage is that only 1 INSERT statement is executed, which inserts all your data into the table in one go. So instead of 10,000 round-trips, I only have 1, sending all the data over in one big chunk:

conn.prepareStatement("TRUNCATE TABLE testloadtable").execute();
conn.prepareStatement("ALTER SESSION SET SQL_TRACE=TRUE").execute();

PreparedStatement stmt = conn.prepareStatement("INSERT /* addBatch insert */ INTO testloadtable (id, text) VALUES (?,?)");

for (int i=0;i<10000;i++)
{
  stmt.setInt(1, i);
  stmt.setString(2, "test" + i);
  stmt.addBatch();
}
stmt.executeBatch();
conn.commit();

conn.prepareStatement("ALTER SESSION SET SQL_TRACE=FALSE").execute();

Now these results are pretty amazing! There was only 1 execution of that insert statement, and it loaded the entire batch in only 0.06 seconds!

 SQL ID: gfkg1d43va20y Plan Hash: 0

INSERT /* addBatch insert */ INTO testloadtable (id, text)
 VALUES
 (:1 ,:2 )

call     count       cpu    elapsed       disk      query    current        rows
 ------- ------  -------- ---------- ---------- ---------- ----------  ----------
 Parse        1      0.00       0.00          0          0          0           0
 Execute      1      0.05       0.06          0        129        394       10000
 Fetch        0      0.00       0.00          0          0          0           0
 ------- ------  -------- ---------- ---------- ---------- ----------  ----------
 total        2      0.05       0.06          0        129        394       10000

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 111
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
 ---------- ---------- ----------  ---------------------------------------------------
 0          0          0  LOAD TABLE CONVENTIONAL  (cr=139 pr=0 pw=0 time=51971 us)
 

As you can see, bulk inserting can give your system a huge performance boost! In this example we are not talking about an improvement of a few percent anymore, but about factors! Imagine what this could gain you when you have even more data!
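Putting the three traced execute times side by side (again just arithmetic on the numbers above, not part of the test program):

```java
public class SpeedupSummary {

    // Ratio of a baseline elapsed time to an improved one.
    public static double speedup(double baselineSec, double improvedSec) {
        return baselineSec / improvedSec;
    }

    public static void main(String[] args) {
        // elapsed execute times from the three traces: 2.21s, 1.19s, 0.06s
        System.out.printf("single commit vs. commit per row: %.1fx%n", speedup(2.21, 1.19));
        System.out.printf("batch insert  vs. commit per row: %.1fx%n", speedup(2.21, 0.06));
        System.out.printf("batch insert  vs. single commit:  %.1fx%n", speedup(1.19, 0.06));
    }
}
```

On these measurements the batch insert is well over 30 times faster than the commit-per-row version.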

This post shows that persisting data into a database is not simply slow per se; how you actually insert that data matters a lot!

23 March 2012

Loading data fast

Filed under: Code,Oracle,Performance,Programming — Venzi @ 14:00

Today more and more applications have the requirement to load data fast. While this requirement isn’t new in IT, nowadays it mainly means persisting data fast into the database. In times of cheap memory and lots of processing power, systems have evolved from slim single-threaded applications to huge multi-threaded clusters with lots of memory, which provides them with low latency. As a result, working with data inside the application became cheap, in some cases even bringing data into the system, e.g. over some in-memory queuing system.

However, the persisting stage still needs to do I/O at some point, and even with faster disks emerging these days, I/O still cannot compete with the low latency that memory provides. Based on that, more and more programmers conclude that I/O is a bad thing; since memory became cheap, they rather dump everything into memory and tell the client to buy more if needed, instead of thinking clearly about what needs to be cached and what not. They also tend to see the database as just a big I/O system and automatically assume that it is simply slow and should be avoided. But that’s far from the truth!

Databases use caching mechanisms too, and usually highly sophisticated ones, with years and years of brainpower in their algorithms. Just because you insert data into the database does not mean at all that this data goes straight to disk. If you insert data into an Oracle database, your data goes into the buffer cache of the SGA memory structure. And more importantly, assuming the buffer cache is large enough, it will stay there! Only later will the DB writer background process(es) actually take that data and persist it to disk, completely decoupled from your transaction. But if the database also uses memory for caching, how come my inserts are still so much slower?
In my experience programmers forget to ask one important question: how does the data get inserted? There are tons and tons of applications out there that haven't changed the way they insert data in years. But just because something was the best way years ago doesn't necessarily mean it still is! In other scenarios the programmer doesn't even know how the data gets inserted, because the application is using an object-relational mapping (ORM) tool.

There are several ways to load data into a table. The most commonly used are the following:

  • SQL*Loader – An Oracle tool for loading data fast from external files into tables
  • CREATE TABLE … AS SELECT (CTAS) statement – This SQL statement allows you to create a table and populate it with data selected from another existing table which can also be an external table.
  • MERGE statement – This SQL statement enables you to either insert into or update rows of a table by selecting them from another existing table. If a row in the new data corresponds to an already existing row in the table, an UPDATE is performed instead of an INSERT.
  • INSERT statement – The traditional INSERT statement for inserting one or more rows into a table

In this post I will focus only on the INSERT statement as this is used within most applications.

How often have you seen code like this:

 while (!file.isEOF())
 {
   line = file.getNextLine();
   INSERT INTO table (col1, col2, col3) VALUES (line.val1, line.val2, line.val3);
   COMMIT;
 }
 

What this piece of code does is read a file line by line and, for each line, insert the data into a table. At first sight this seems pretty much the best way of doing it, and years ago it probably was. However, what happens here is that for each line in the file an INSERT statement with a set of data is sent to the database, followed by a COMMIT. So for each line one DML statement gets executed, including a round-trip from your application to the database. But that's not all. Once the INSERT statement arrives at the database, the database has to parse the statement and determine whether it is already known and an execution plan exists for it. Only after all those steps are done can it execute the statement and finally load the data into the table. After the row is in the table, the application executes a COMMIT, which triggers another round-trip to the database and causes the database to make sure that the row is visible to everybody and its transactional integrity given. For a file that contains only 10 lines all these steps might not be much effort and are done pretty quickly, but imagine a file with 10,000 lines! Suddenly having 10,000 round-trips for the INSERT statement, another 10,000 round-trips for the COMMIT, 10,000 (soft) parses of the INSERT statement, and 10,000 transactional-integrity checks becomes very expensive. And as you execute the same statement over and over again, some of the steps performed by the database become redundant: the database doesn't have to check each time whether the statement is already known. It also makes no sense to perform 20,000 round-trips for only 10,000 rows of data. And you might not even want the user to be able to see the data before the file is completely loaded. So ideally what you want is something like this:

 while (!file.isEOF())
 {
   lines += file.getNextLine();
 }
 INSERT INTO table (col1, col2, col3) VALUES (lines);
 COMMIT;
 

First you read all the data from the file and keep it in memory. Then you insert all of it into the database at once. The data is sent over to the database only once, the statement is parsed only once, and the transactional integrity check/commit happens only once. Instead of 20,000 round-trips you have only 2!
Now the good news: databases nowadays do support this kind of loading operation! It is referred to as bulking, or bulk inserts. While in the past there were various reasons that kept you from bulking – either the mechanism didn't exist yet, or there wasn't enough memory to hold more than one line at once anyway – those reasons are gone these days. But the point is: you have to change the way you insert data into your database. Sometimes this can be changed rather quickly, but sometimes major effort is involved.
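To make the idea concrete: here is a minimal, self-contained sketch of the row-by-row pattern versus the bulk pattern. I'm using Python's sqlite3 module purely as a stand-in so the example runs anywhere; with Java/JDBC the bulk variant corresponds to addBatch()/executeBatch(), and the table and column names are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 INTEGER, col2 TEXT)")
rows = [(i, "line" + str(i)) for i in range(10000)]

# Row-by-row: one statement (and here, one commit) per line -- the slow pattern.
for r in rows:
    conn.execute("INSERT INTO t (col1, col2) VALUES (?, ?)", r)
    conn.commit()

conn.execute("DELETE FROM t")

# Bulk: hand all rows over in one call and commit once.
conn.executemany("INSERT INTO t (col1, col2) VALUES (?, ?)", rows)
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # 10000
```

Both variants load the same 10,000 rows; the difference is purely in how many statement executions and commits it takes to get there.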

In my next post I will show some examples how to use bulk inserts and what difference they can make!

27 May 2008

Easy way to write SYS.XMLTYPE into a file using JAVA

Filed under: Code,Java,Oracle,Work — Venzi @ 09:42

It has been a while since my last post and I'm very sorry for that, but I'm currently very busy both at work and in my private life.

Yesterday I struggled a little with generating XML from a select and putting it into a plain text file. I needed a fast way to get the XML into a file for a temporary workaround, so I didn't really care about doing it cleanly. The first part (generating XML from a select) was pretty easy, but on the second part I wasted too much time for such a simple task. Unfortunately Google wasn't really helpful either, so I decided to put the few lines of code into my blog so that it's documented for all the other people out there.

First part – Generate the XML using SQL/XML:

PreparedStatement stmt1 = Conn1.prepareStatement("SELECT XMLELEMENT(\"TestSuite\"," +
"(SELECT XMLAGG(XMLELEMENT(\"TestCase\", XMLELEMENT(\"RuntimeInstanceId\", instance_id)))" +
"FROM MYTESTS WHERE test_id = ?)).getclobval() as Result FROM dual");
stmt1.setString(1, this.sTestId);
ResultSet result = stmt1.executeQuery();

The important thing here is the .getclobval() after the column in the select clause but first let’s see the second part.

Second part – Writing the XML to a plain text file:

result.next();
Writer output = new BufferedWriter(new FileWriter(new File(this.sFile)));
output.write(result.getString(1));
output.close();

That part now also looks pretty easy, as you can just use result.getString(1) here. But this is only possible because we called the getclobval() function in the select. getclobval() is a method of the XMLTYPE result object and casts the result from XMLTYPE to CLOB. Without it you will be surprised, because result.getSQLXML(1) will simply not work. So if you need a fast way to write XML to a plain text file, just use the getclobval() method of the XMLTYPE object.

11 January 2008

11g and the new SecureFILE LOB data type (simple quick test)

Filed under: Code,Oracle,Performance,Work — Venzi @ 18:21

As I had to run some comparison tests between Oracle 10g and 11g, I also did a little test with the new and fast (as Oracle promises) SecureFILE LOB data type. The test is really small, as there wasn't time to test the feature out thoroughly, but even this result is impressive. Of course all tests ran on the same box with the same settings.

First the 10g test:

The first step is to create a table with a CLOB data type:

SQL> create table test1 (created_tms date, data clob) tablespace data1 lob (data) store as (cache);
Table created.
SQL>

I used cache to get as many bytes as possible into the buffer cache. If you need performance on LOBs, activating caching is the first step anyone would take. As the next step I wrote a little PL/SQL program which inserts 100,000 rows of 4,402 bytes each (that's about 440MB all together!):

SQL> set serveroutput on;
SQL> declare
2    thedata CLOB := TO_CLOB (
3  'somedata somedata somedata somedata somedata somedata somedata somedata somedata somedata
4  somedata somedata somedata somedata somedata somedata somedata somedata somedata somedata
……
49  somedata somedata somedata somedata somedata somedata somedata somedata somedata somedata
50  somedata somedata somedata somedata somedata somedata somedata somedata somedata somedata ');
51  begin
52    dbms_output.put_line('Begin: ' || systimestamp);
53    for n in 1..100000 loop
54      insert into test1 values (sysdate,thedata);
55    end loop;
56    dbms_output.put_line('End: ' || systimestamp);
57  end;
58  /
Begin: 11-JAN-08 04.30.50.203325000 PM +01:00
End: 11-JAN-08 04.32.50.422759000 PM +01:00

PL/SQL procedure successfully completed.

Checking the timestamps you can see that on 10g the inserts took almost exactly 2 minutes (on the last run; I executed it several times). You can also see that I inserted the actual sysdate into a column too, so now we can compute an average number of inserts per second:

SQL> select count(*)/((max(created_tms)-min(created_tms))*24*60*60) from test1;

COUNT(*)/((MAX(CREATED_TMS)-MIN(CREATED_TMS))*24*60*60)
-------------------------------------------------------
833.333333

The last 4 runs made (average inserts/second):

840.336134
826.446281
800
833.333333
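The query above just divides the row count by the elapsed whole seconds between the first and last created_tms. As a quick sanity check (my own arithmetic, not part of the original post), the same rate can be computed directly from the timestamps printed by the 10g run:

```python
# Begin: 04.30.50.203325, End: 04.32.50.422759 -- seconds within the hour,
# taken from the 10g run above.
begin_s = 30 * 60 + 50.203325
end_s = 32 * 60 + 50.422759

rate = 100000 / (end_s - begin_s)
print(round(rate))  # 832, close to the 833 the query computed from whole-second sysdate values
```

The tiny difference comes from the query truncating created_tms to whole seconds while systimestamp carries fractional seconds.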

So now let's move on to 11g. First step: we need a table with the new SecureFILE LOB type:

SQL>
SQL> create table test1 (created_tms date, data clob) tablespace data1 lob (data) store as securefile cache;

Table created.

Next, I executed exactly the same procedure:

SQL> set serveroutput on;
SQL> declare
2    thedata CLOB := TO_CLOB (
3  'somedata somedata somedata somedata somedata somedata somedata somedata somedata somedata
……
49  somedata somedata somedata somedata somedata somedata somedata somedata somedata somedata
50  somedata somedata somedata somedata somedata somedata somedata somedata somedata somedata ');
51  begin
52    dbms_output.put_line('Begin: ' || systimestamp);
53    for n in 1..100000 loop
54      insert into test1 values (sysdate,thedata);
55    end loop;
56    dbms_output.put_line('End: ' || systimestamp);
57  end;
58  /
Begin: 11-JAN-08 04.19.06.534822000 PM +01:00
End: 11-JAN-08 04.20.01.703784000 PM +01:00

PL/SQL procedure successfully completed.

Looking at the timestamps, the run took less than a minute! Remember, 10g needed around 2 minutes. So let's look at the average inserts per second:

SQL> select count(*)/((max(created_tms)-min(created_tms))*24*60*60) from test1;

COUNT(*)/((MAX(CREATED_TMS)-MIN(CREATED_TMS))*24*60*60)
-------------------------------------------------------
1818.18182

We can see that the program averaged 1,818 inserts per second. Compared to 833 on 10g, that's more than 100% faster! And I'd say that's not bad without any fine-tuning!

The last 4 runs made:

1960.78431
1818.18182
1754.38596
1818.18182

I hope I'll have some time to provide a proper test case, but even this quick one showed that the SecureFILE data type really speeds things up compared to conventional LOBs, and that's what Oracle promised! 😉

28 November 2007

Prepared statement in batch mode vs. FORALL in PL/SQL

Filed under: Code,Java,Oracle,Performance,PL/SQL,Work — Venzi @ 12:55

A few days ago I had a nice discussion with one of our developers here in the office. The subject: what's faster, prepared statements in batch mode or calling a PL/SQL procedure which uses FORALL?

So I made a simple test: insert 1 million rows with two columns (a number and a string column) into a table. The first part builds the values in Java, adds them to the batch array, and executes the insert in batch mode (this JDBC feature stores the values in memory until you execute the batch; JDBC then executes the DML and loads the array at once, very similar to FORALL). The second part also builds the values in Java, creates two collections, and calls a procedure with the collections. The procedure itself just performs the insert with a FORALL.

So the first part of the java code looks like this:

long l = System.currentTimeMillis();
PreparedStatement pstmt = conn1.prepareStatement("INSERT INTO TEST VALUES(?,?)");
System.out.println("PrepareStatement took: "+(System.currentTimeMillis() - l));
long l1 = System.currentTimeMillis();
for (int i=0;i<1000000;++i)
{
pstmt.setInt(1, i);
pstmt.setString(2, "testvalue"+i);
pstmt.addBatch();
if(i%16961 == 0)
pstmt.executeBatch();
}

pstmt.executeBatch();
System.out.println("ExecuteBatch took: "+ (System.currentTimeMillis() - l1));

Don't worry about the if with the modulo function. It turned out that JDBC had a bug with addBatch: it could only execute around 16,000 rows at once, and if you added more rows it just ignored them! So I had to call executeBatch more often. We'll see whether this slows down the test.
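In other words, the loop flushes the batch every 16,961 rows so no single chunk ever exceeds the driver's limit. Here is the same chunking workaround as a self-contained sketch, with Python's sqlite3 standing in for JDBC (the ~16,000-row limit itself was specific to that JDBC driver version; the batch size here just mirrors the post's number):

```python
import sqlite3

BATCH_LIMIT = 16961  # flush before hitting the (driver-specific) batch-size limit

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER, val TEXT)")

batch = []
for i in range(1000000):
    batch.append((i, "testvalue" + str(i)))
    if len(batch) == BATCH_LIMIT:          # analogous to: if(i%16961 == 0) executeBatch()
        conn.executemany("INSERT INTO test VALUES (?, ?)", batch)
        batch.clear()
conn.executemany("INSERT INTO test VALUES (?, ?)", batch)  # flush the remainder
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM test").fetchone()[0])  # 1000000
```

The point of the pattern is that all million rows still arrive, just in driver-sized chunks instead of one giant batch.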

The second part of the java code looks like this:

l = System.currentTimeMillis();

ArrayDescriptor integer = ArrayDescriptor.createDescriptor("INTARRAY", conn1);
ArrayDescriptor varchar = ArrayDescriptor.createDescriptor("VARARRAY", conn1);

ARRAY pliArray = new ARRAY(integer,conn1, iArray);
ARRAY plsArray = new ARRAY(varchar,conn1, sString);
System.out.println("Creating the Array took: "+(System.currentTimeMillis() - l));

l = System.currentTimeMillis();
CallableStatement callStatement = (CallableStatement)conn1.prepareCall("{call insertvalues(?,?)}");
System.out.println("Preparing Call took: "+(System.currentTimeMillis() - l));

callStatement.setArray(1, pliArray);
callStatement.setArray(2, plsArray);

l1 = System.currentTimeMillis();
callStatement.execute();
System.out.println("Executing PL/SQL took: "+(System.currentTimeMillis() - l1));

The PL/SQL code:

CREATE OR REPLACE TYPE TEST.intArray IS VARRAY(1000000) OF NUMBER;
show errors;
CREATE OR REPLACE TYPE TEST.varArray IS VARRAY(1000000) OF VARCHAR2(18);
show errors;

create or replace procedure insertvalues (nValue1 IN intArray, sValue2 IN varArray) is
begin
forall nvalue in nValue1.first..nValue1.last
insert into test values (nValue1(nvalue), sValue2(nvalue));
end;
show errors;

Well, the result surprised me a little: Java was much faster. Look at the trace alone (trimmed!):

INSERT INTO TEST
VALUES
(:1,:2)

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute     60      1.38       2.08          0       7646      34579     1000000
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       61      1.38       2.08          0       7646      34579     1000000

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 1159

Elapsed times include waiting on following events:
  Event waited on                          Times   Max. Wait  Total Waited
  --------------------------------------  Waited  ----------  ------------
  SQL*Net message to client                    60        0.00          0.00
  SQL*Net message from client                  60        0.09          1.64
  SQL*Net more data from client             10726        0.00          0.72
********************************************************************************

BEGIN insertvalues(:1,:2); END;

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.00          0          0          0           0
Execute      1      7.77       9.01          0       9277      34886           1
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      7.78       9.01          0       9277      34886           1

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 1159

Elapsed times include waiting on following events:
  Event waited on                          Times   Max. Wait  Total Waited
  --------------------------------------  Waited  ----------  ------------
  SQL*Net more data from client             10399        0.00          1.17
  SQL*Net message to client                     1        0.00          0.00
  log file sync                                 1        0.21          0.21
  SQL*Net message from client                   1        0.00          0.00
********************************************************************************
As you can see, the execution count for the procedure is just 1, while from JDBC it is 60 (because of the bug with the 16,000 rows). The elapsed time with the prepared statement was just 2.08 seconds, but with FORALL it was 9.01 seconds. And the CPU time is of course also lower than in PL/SQL. To be fair: those 9.01 seconds also include the call of the procedure plus the load of the collections. As it turned out, just creating the collections in Java took around 2 seconds!

In summary: you can see that prepared statements are much faster than FORALL. But to be fair: FORALL was developed to provide some kind of bulking in PL/SQL within cursors, to decrease the calls between the SQL and PL/SQL engines. But if you are using Java and JDBC anyway, work with prepared statements and with batch mode if possible. Don't worry about the SQL*Net wait events; these are idle events, which means your application isn't actually waiting on them!

BTW: That JDBC bug is fixed in 11g and 10.2.0.3!

2 October 2007

The beauty of Java stored procedures

Filed under: Code,Java,Oracle,PL/SQL — Venzi @ 09:05

Since 8i you can also write stored procedures in Java. But what is the advantage of writing a procedure in Java? Well, the first advantage is that you're now able to write complex solutions and use them in PL/SQL. You can write your own package for file I/O, email reading/sending, OS watching, an API for ZIP files, and many more. You have all the possibilities of Java now available in PL/SQL! And another advantage: it runs in the Oracle JVM, which means it has access to the huge SGA and can do DB operations with better performance than from outside! And it's easy to write. For example, here is a little Java stored procedure which shows you the properties of the Oracle JVM.

The first step is to write the java source which looks like this:

create or replace and compile java source named "JavaProps" as

And now continue with normal Java code:

public class props
{
public static void GetProps( )
{
System.getProperties().list(System.out);
}
}
/

Well this now creates the Java source inside the database. Now we create the stored procedure over it to use it inside PL/SQL:

create or replace procedure JavaProps
as language java name 'props.GetProps()';
/

And that's it! Now run the procedure in an anonymous PL/SQL block:

set serveroutput on size 10000;
begin
dbms_java.set_output (10000);
JavaProps;
end;
/

So you see, writing Java stored procedures isn't difficult, and it allows you to do powerful operations from within PL/SQL!

27 September 2007

BULK COLLECT & FORALL vs. CURSOR & FOR-LOOP

Filed under: Code,Oracle,Performance,PL/SQL — Venzi @ 13:49

After reading more and more about BULK COLLECT and FORALL and their performance improvements, I decided to have a closer look myself to see how powerful they really are. So I built a little test case which inserts all entries from the all_objects view into another table. The inserts happen in three different ways:
The first way is a simple cursor over the view and an insert in a loop with FETCH into local variables. This way also shows how slow opening the cursor itself is.
The second way is a simple FOR-IN loop with an insert of the cursor variables.
And, of course, the third way bulks the rows and inserts them with FORALL. So let's see.

So the other table looks like this (three columns are enough for this test):

SQL> create table temp (owner varchar2(30), name varchar2(30), type varchar2(19));

Table created.

And the three different procedures look like this:

CREATE OR REPLACE PROCEDURE CURSOR_FOR_OPEN_QUERY
 IS
 l_sOwner VARCHAR2(30);
 l_sName VARCHAR2(30);
 l_sType VARCHAR2(19);
 CURSOR cur IS SELECT owner, object_name name, object_type type FROM all_objects;
 BEGIN
 dbms_output.put_line('Before CURSOR OPEN: ' || systimestamp);
 OPEN cur;
 dbms_output.put_line('Before LOOP: ' || systimestamp);
 LOOP
 FETCH cur INTO l_sOwner, l_sName, l_sType;
 IF cur%NOTFOUND THEN
 EXIT;
 END IF;
 INSERT INTO temp values (l_sOwner, l_sName, l_sType);
 END LOOP;
 CLOSE cur;
 dbms_output.put_line('After CURSOR CLOSE: ' || systimestamp);
 COMMIT;
 END;
 /

CREATE OR REPLACE PROCEDURE CURSOR_FOR_QUERY
 IS
 BEGIN
 dbms_output.put_line('Before CURSOR: ' || systimestamp);
 FOR cur IN (SELECT owner, object_name name, object_type type FROM all_objects) LOOP
 INSERT INTO temp values (cur.owner, cur.name, cur.type);
 END LOOP;
 dbms_output.put_line('After CURSOR: ' || systimestamp);
 COMMIT;
 END;
 /

CREATE OR REPLACE PROCEDURE BULK_COLLECT_QUERY
 IS
 TYPE sOwner IS TABLE OF VARCHAR2(30);
 TYPE sName IS TABLE OF VARCHAR2(30);
 TYPE sType IS TABLE OF VARCHAR2(19);
 l_sOwner sOwner;
 l_sName sName;
 l_sType sType;
 BEGIN
 dbms_output.put_line('Before Bulk Collect: ' || systimestamp);
 SELECT owner, object_name, object_type
 BULK COLLECT INTO l_sOwner, l_sName, l_sType
 FROM all_objects;
 dbms_output.put_line('After Bulk Collect: ' || systimestamp);
 --
 FORALL indx IN l_sName.FIRST..l_sName.LAST
 INSERT INTO temp values (l_sOwner(indx), l_sName(indx), l_sType(indx));
 --
 dbms_output.put_line('After FORALL: ' || systimestamp);
 COMMIT;
 END;
 /

Ok, then I bounced the database so that no buffers, caches, etc. would be filled.

So, the first execution:

SQL> exec cursor_for_open_query
Before CURSOR OPEN: 27-SEP-07 10.56.30.699401000 AM +02:00
Before LOOP: 27-SEP-07 10.56.30.922366000 AM +02:00
After CURSOR CLOSE: 27-SEP-07 10.57.07.699791000 AM +02:00

Looking just at the seconds: it took 37 seconds, and nearly no time to open the cursor! But how many rows were inserted?

SQL> select count(*) from temp;

COUNT(*)
----------
49424

Truncate the table (truncate to free the extents!), bounce the database again, and now the second run:

SQL> exec cursor_for_query
Before CURSOR: 27-SEP-07 10.59.47.848249000 AM +02:00
After CURSOR: 27-SEP-07 11.00.09.072525000 AM +02:00

The whole loop took 22 seconds; that already looks better. Were all rows inserted here too?

SQL> select count(*) from temp;

COUNT(*)
----------
49424

But now (after truncating and bouncing) the bulk collect run:

SQL> exec bulk_collect_query
Before Bulk Collect: 27-SEP-07 11.01.33.553224000 AM +02:00
After Bulk Collect: 27-SEP-07 11.01.41.874054000 AM +02:00
After FORALL: 27-SEP-07 11.01.42.065753000 AM +02:00

Look at this: bulking all the rows into the collections took just 8 seconds (for 49,424 rows), and the inserts just 1 second! Unbelievable: together we did everything in 9 seconds, where the other ways took over 20 seconds!

Now let's execute the bulk load first, then truncate the table again, but without bouncing the database, so that the buffers and caches are still filled:

SQL> exec bulk_collect_query
Before Bulk Collect: 27-SEP-07 11.02.31.257498000 AM +02:00
After Bulk Collect: 27-SEP-07 11.02.41.614205000 AM +02:00
After FORALL: 27-SEP-07 11.02.41.818092000 AM +02:00

PL/SQL procedure successfully completed.

SQL> select count(*) from temp;

COUNT(*)
----------
49423

SQL> truncate table temp;

Table truncated.

SQL> exec cursor_for_query
Before CURSOR: 27-SEP-07 11.04.04.960254000 AM +02:00
After CURSOR: 27-SEP-07 11.04.25.749038000 AM +02:00

Ok, so now we need 10 seconds for the bulk run, but we still need 21 seconds for the cursor! So not really an improvement from the cache and so on. Ok, final test on a big system with over 268 thousand rows:

Before Bulk Collect: 27-SEP-07 11.24.17.034732000 AM +02:00
After Bulk Collect: 27-SEP-07 11.24.25.111020000 AM +02:00
After FORALL: 27-SEP-07 11.24.26.129826000 AM +02:00
PL/SQL procedure successfully completed.

COUNT(*)
----------
267985

Table truncated.

Before CURSOR: 27-SEP-07 11.24.29.629354000 AM +02:00
After CURSOR: 27-SEP-07 11.25.02.244549000 AM +02:00
PL/SQL procedure successfully completed.

COUNT(*)
----------
268056
And again, bulking took 8 seconds and the inserts just 1 second! But the run with the cursor took 33 seconds!

So this was just a short test, but it definitely shows that BULK COLLECT and FORALL are much faster than a cursor within a FOR loop! The only disadvantage of FORALL, as you may already have guessed from the code: you can only perform a single DML statement, as there is no "FORALL END" clause. But anyway, bulking is a very high-performance feature of Oracle! So if you have to run through data collections, use BULK COLLECT!
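The bulk idea also applies to fetching outside PL/SQL: reading a cursor in blocks instead of row by row is the client-side analogue of BULK COLLECT. A small self-contained sketch with Python's sqlite3 and fetchmany() (my analogy with made-up sample data, not from the original test case):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temp (owner TEXT, name TEXT, type TEXT)")
conn.executemany("INSERT INTO temp VALUES (?, ?, ?)",
                 [("TEST", "OBJ%d" % i, "TABLE") for i in range(1000)])

cur = conn.execute("SELECT owner, name, type FROM temp")
total = 0
while True:
    block = cur.fetchmany(100)   # like BULK COLLECT ... LIMIT 100
    if not block:
        break
    total += len(block)          # process a whole block per round-trip
print(total)  # 1000
```

Fetching 100 rows per call instead of one cuts the number of fetch calls by two orders of magnitude, which is exactly the effect BULK COLLECT has inside PL/SQL.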

13 September 2007

JDBCs setMaxRows and the SELECT … FOR UPDATE clause

Filed under: Code,Oracle,Work — Venzi @ 14:07

Today the following question came up: would Oracle's JDBC driver recognize it when you set setMaxRows and perform a SELECT using the FOR UPDATE clause? Example: you have a table with 40 rows. Now you call setMaxRows(20) so that you only get the first 20 rows, and you perform the select without any where clause (I know, not really a pretty solution: instead of using rownum, you select all rows and just stop fetching at the 21st row). Would you now lock all 40 rows or just 20? Well, the answer is (as I expected): you get a row lock on the entire table, not only the first 20 rows. With rownum in the where clause it is no problem: you lock just the selected 20 rows, and the other 20 remain available for other transactions. So, developers: use the rownum pseudo-column instead of the setMaxRows function, especially if you perform a SELECT ... FOR UPDATE!

But the nicest thing about this little test: I wrote my first Java code in 4 years! 🙂
