Venzi's Tech-Blog

3 November 2012

Loading data fast – DML error logging performance

Filed under: Code,Java,Oracle,Performance,Programming — Venzi @ 14:49

In the last post of my “Loading data fast” series I showed how DML error logging can be used to prevent batch loads from failing when one or multiple rows can’t be inserted. Now the question is how much performance impact errors have on the mechanism. What I try to prove is that a developer should not just blindly use DML error logging instead of thinking whether the data he intents to insert is primary valid. So let’s have a look:

First I create the same simple table with a primary key on it that I used before:

CREATE TABLE TESTLOADTABLE (id NUMBER, text VARCHAR2(255));
table TESTLOADTABLE created.

CREATE UNIQUE INDEX TESTLOADTABLE_PK ON TESTLOADTABLE(id);
unique index TESTLOADTABLE_PK created.

ALTER TABLE TESTLOADTABLE ADD PRIMARY KEY (id) USING INDEX TESTLOADTABLE_PK;
table TESTLOADTABLE altered.

Then I create the error logging table:

BEGIN
  DBMS_ERRLOG.CREATE_ERROR_LOG('TESTLOADTABLE','ERR_TESTLOADTABLE');
END;
anonymous block completed

So far so good, now let’s execute following tests:

1) Load 10k rows with no errors
2) Load 10k rows with 100 (1%) rows failing
3) Load 10k rows with 1000 (10%) rows failing
4) Load 10k rows with 2500 (25%) rows failing
5) Load 10k rows with 5000 (50%) rows failing
6) Load 10k rows with 7500 (75%) rows failing
And just out of curiosity let’s see how long a regular insert with no error logging takes
7) Load 10k rows with no errors and no error logging

Test 1 – no errors:

long startOverall = System.currentTimeMillis();
PreparedStatement stmt = conn.prepareStatement("INSERT /* addBatch with logging errors */ INTO testloadtable (id, text) VALUES (?,?) LOG ERRORS INTO ERR_TESTLOADTABLE REJECT LIMIT UNLIMITED");

for (int i=0;i<10000;i++)
{
  stmt.setInt(1, i);
  stmt.setString(2, "test" + i);
  stmt.addBatch();
}
long startExecute = System.currentTimeMillis();
stmt.executeBatch();
long endExecute = System.currentTimeMillis();
conn.commit();
long endOverall = System.currentTimeMillis();

System.out.println("Executing batch took: " + (endExecute-startExecute));
System.out.println("Executing overall took: " + (endOverall-startOverall));
System.out.println("Preparation took: " + ((endOverall-startOverall)-(endExecute-startExecute)));

Executing batch took: 861
Executing overall took: 1020
Preparation took: 159

So overall it took 1020 milli seconds while the job spent 861 milli seconds in the execution of the insert.

Test 2 – 100 errors (1%):

PreparedStatement stmt0 = conn.prepareStatement("INSERT /* generate some rows before */ INTO testloadtable (id, text) VALUES(?,?)");
for (int i=0;i<10000;i++)
{
  if (i%100 == 0)
  {
    stmt0.setInt(1, i);
    stmt0.setString(2, "test" + i);
    stmt0.addBatch();
  }
}
stmt0.executeBatch();
conn.commit();

long startOverall = System.currentTimeMillis();
PreparedStatement stmt = conn.prepareStatement("INSERT /* addBatch with logging errors */ INTO testloadtable (id, text) VALUES (?,?) LOG ERRORS INTO ERR_TESTLOADTABLE REJECT LIMIT UNLIMITED");

for (int i=0;i<10000;i++)
{
  stmt.setInt(1, i);
  stmt.setString(2, "test" + i);
  stmt.addBatch();
}
long startExecute = System.currentTimeMillis();
stmt.executeBatch();
long endExecute = System.currentTimeMillis();
conn.commit();
long endOverall = System.currentTimeMillis();

System.out.println("Executing batch took: " + (endExecute-startExecute));
System.out.println("Executing overall took: " + (endOverall-startOverall));
System.out.println("Preparation took: " + ((endOverall-startOverall)-(endExecute-startExecute)));

Executing batch took: 1017
Executing overall took: 1069
Preparation took: 52

This time it took quite a bit long to execute the batch, 1017 milli seconds. The reason for this is obvious. Oracle now has not only to insert 10k rows but also to reject rows and insert them into another table.

Test 3 – 1000 (10%) errors:

PreparedStatement stmt0 = conn.prepareStatement("INSERT /* generate some rows before */ INTO testloadtable (id, text) VALUES(?,?)");
for (int i=0;i<10000;i++)
{
  if (i%10 == 0)
  {
    stmt0.setInt(1, i);
    stmt0.setString(2, "test" + i);
    stmt0.addBatch();
  }
}
stmt0.executeBatch();
conn.commit();

long startOverall = System.currentTimeMillis();
PreparedStatement stmt = conn.prepareStatement("INSERT /* addBatch with logging errors */ INTO testloadtable (id, text) VALUES (?,?) LOG ERRORS INTO ERR_TESTLOADTABLE REJECT LIMIT UNLIMITED");

for (int i=0;i<10000;i++)
{
  stmt.setInt(1, i);
  stmt.setString(2, "test" + i);
  stmt.addBatch();
}
long startExecute = System.currentTimeMillis();
stmt.executeBatch();
long endExecute = System.currentTimeMillis();
conn.commit();
long endOverall = System.currentTimeMillis();

System.out.println("Executing batch took: " + (endExecute-startExecute));
System.out.println("Executing overall took: " + (endOverall-startOverall));
System.out.println("Preparation took: " + ((endOverall-startOverall)-(endExecute-startExecute)));

Executing batch took: 1420
Executing overall took: 1470
Preparation took: 50

And as you can see the more rows that are failing, the longer the insert takes. So let’s have a look at the rest of the tests and see how bad it gets:

Test 4 – 2500 (25%) errors:

PreparedStatement stmt0 = conn.prepareStatement("INSERT /* generate some rows before */ INTO testloadtable (id, text) VALUES(?,?)");
for (int i=0;i<10000;i++)
{
  if (i%4 == 0)
  {
    stmt0.setInt(1, i);
    stmt0.setString(2, "test" + i);
    stmt0.addBatch();
  }
}
stmt0.executeBatch();
conn.commit();

long startOverall = System.currentTimeMillis();
PreparedStatement stmt = conn.prepareStatement("INSERT /* addBatch with logging errors */ INTO testloadtable (id, text) VALUES (?,?) LOG ERRORS INTO ERR_TESTLOADTABLE REJECT LIMIT UNLIMITED");

for (int i=0;i<10000;i++)
{
  stmt.setInt(1, i);
  stmt.setString(2, "test" + i);
  stmt.addBatch();
}
long startExecute = System.currentTimeMillis();
stmt.executeBatch();
long endExecute = System.currentTimeMillis();
conn.commit();
long endOverall = System.currentTimeMillis();

System.out.println("Executing batch took: " + (endExecute-startExecute));
System.out.println("Executing overall took: " + (endOverall-startOverall));
System.out.println("Preparation took: " + ((endOverall-startOverall)-(endExecute-startExecute)));

Executing batch took: 1877
Executing overall took: 1961
Preparation took: 84

Test 5 – 5000 (50%) errors:

PreparedStatement stmt0 = conn.prepareStatement("INSERT /* generate some rows before */ INTO testloadtable (id, text) VALUES(?,?)");
for (int i=0;i<10000;i++)
{
  if (i%2 == 0)
  {
    stmt0.setInt(1, i);
    stmt0.setString(2, "test" + i);
    stmt0.addBatch();
  }
}
stmt0.executeBatch();
conn.commit();

long startOverall = System.currentTimeMillis();
PreparedStatement stmt = conn.prepareStatement("INSERT /* addBatch with logging errors */ INTO testloadtable (id, text) VALUES (?,?) LOG ERRORS INTO ERR_TESTLOADTABLE REJECT LIMIT UNLIMITED");

for (int i=0;i<10000;i++)
{
  stmt.setInt(1, i);
  stmt.setString(2, "test" + i);
  stmt.addBatch();
}
long startExecute = System.currentTimeMillis();
stmt.executeBatch();
long endExecute = System.currentTimeMillis();
conn.commit();
long endOverall = System.currentTimeMillis();

System.out.println("Executing batch took: " + (endExecute-startExecute));
System.out.println("Executing overall took: " + (endOverall-startOverall));
System.out.println("Preparation took: " + ((endOverall-startOverall)-(endExecute-startExecute)));

Executing batch took: 2680
Executing overall took: 2765
Preparation took: 85

Test 6 – 7500 (75%) errors:

PreparedStatement stmt0 = conn.prepareStatement("INSERT /* generate some rows before */ INTO testloadtable (id, text) VALUES(?,?)");
for (int i=0;i<10000;i++)
{
  if (i<=7500)
  {
    stmt0.setInt(1, i);
    stmt0.setString(2, "test" + i);
    stmt0.addBatch();
  }
}
stmt0.executeBatch();
conn.commit();

long startOverall = System.currentTimeMillis();
PreparedStatement stmt = conn.prepareStatement("INSERT /* addBatch with logging errors */ INTO testloadtable (id, text) VALUES (?,?) LOG ERRORS INTO ERR_TESTLOADTABLE REJECT LIMIT UNLIMITED");

for (int i=0;i<10000;i++)
{
  stmt.setInt(1, i);
  stmt.setString(2, "test" + i);
  stmt.addBatch();
}
long startExecute = System.currentTimeMillis();
stmt.executeBatch();
long endExecute = System.currentTimeMillis();
conn.commit();
long endOverall = System.currentTimeMillis();

System.out.println("Executing batch took: " + (endExecute-startExecute));
System.out.println("Executing overall took: " + (endOverall-startOverall));
System.out.println("Preparation took: " + ((endOverall-startOverall)-(endExecute-startExecute)));

Executing batch took: 3349
Executing overall took: 3412
Preparation took: 63

So as you can see, the more errors you get, the longer your batch will need to execute. This is only logical of course as the more errors you get, the more exceptions are thrown which then lead to more rows inserted into the error table. The main takeaway is that your insert is not failing and you can smoothly query the error logging table and act appropriately. Just in comparison:

  • Insert no errors: 861ms
  • Insert 1% errors: 1017ms
  • Insert 10% errors: 1420ms
  • Insert 25% errors: 1877ms
  • Insert 50% errors: 2680ms
  • Insert 75% errors: 3349ms

Now let’s do one more test and see how fast a regular insert without DML error logging is. You would think it’s the same but during my experiments if found an interesting fact:

Test 7 – no errors, no error logging:

long startOverall = System.currentTimeMillis();
PreparedStatement stmt = conn.prepareStatement("INSERT /* regular bulk insert */ INTO testloadtable (id, text) VALUES (?,?)");

for (int i=0;i<10000;i++)
{
  stmt.setInt(1, i);
  stmt.setString(2, "test" + i);
  stmt.addBatch();
}
long startExecute = System.currentTimeMillis();
stmt.executeBatch();
long endExecute = System.currentTimeMillis();
conn.commit();
long endOverall = System.currentTimeMillis();

System.out.println("Executing batch took: " + (endExecute-startExecute));
System.out.println("Executing overall took: " + (endOverall-startOverall));
System.out.println("Preparation took: " + ((endOverall-startOverall)-(endExecute-startExecute)));

Executing batch took: 212
Executing overall took: 372
Preparation took: 160

So a regular insert takes only 212ms while a DML error logging insert takes 861ms. That’s 4 times longer!
The reason for this is because of the unpublished bug 11865420 (My Oracle Support Doc Id: 11865420.8). Once you download the patch and update the system accordingly the insert with DML error logging is just as fast as without:

Regular batch insert:

long startOverall = System.currentTimeMillis();
PreparedStatement stmt = conn.prepareStatement("INSERT /* regular bulk insert */ INTO testloadtable (id, text) VALUES (?,?)");

for (int i=0;i<10000;i++)
{
  stmt.setInt(1, i);
  stmt.setString(2, "test" + i);
  stmt.addBatch();
}
long startExecute = System.currentTimeMillis();
stmt.executeBatch();
long endExecute = System.currentTimeMillis();
conn.commit();
long endOverall = System.currentTimeMillis();

System.out.println("Executing batch took: " + (endExecute-startExecute));
System.out.println("Executing overall took: " + (endOverall-startOverall));
System.out.println("Preparation took: " + ((endOverall-startOverall)-(endExecute-startExecute)));

Executing batch took: 248
Executing overall took: 399
Preparation took: 151

DML error logging insert:

long startOverall = System.currentTimeMillis();
PreparedStatement stmt = conn.prepareStatement("INSERT /* addBatch with logging errors */ INTO testloadtable (id, text) VALUES (?,?) LOG ERRORS INTO ERR_TESTLOADTABLE REJECT LIMIT UNLIMITED");

for (int i=0;i<10000;i++)
{
  stmt.setInt(1, i);
  stmt.setString(2, "test" + i);
  stmt.addBatch();
}
long startExecute = System.currentTimeMillis();
stmt.executeBatch();
long endExecute = System.currentTimeMillis();
conn.commit();
long endOverall = System.currentTimeMillis();

System.out.println("Executing batch took: " + (endExecute-startExecute));
System.out.println("Executing overall took: " + (endOverall-startOverall));
System.out.println("Preparation took: " + ((endOverall-startOverall)-(endExecute-startExecute)));

Executing batch took: 227
Executing overall took: 384
Preparation took: 157

Conclusion: This is the last article of the series Loading data fast! In this series I’ve not only shown how bad it is to commit after each row (remember Autocommit in JDBC!) but also how much more speed you can get out of your program by doing batch inserts. I’ve also shown how to deal with potential errors during batch inserts and how much performance impact errors produce. The latter I have done to make it clear that developers should not just throw DML error logging inserts at every problem as they will lose the performance benefit again that batch inserts provide. So now, go over your code and batch up! ;)

About these ads

2 Comments »

  1. This approach could hurt on large datasets, you want to use oracle specific tools(sqlloader, data pump) to load large data at faster rate. Jdbc approach can never be faster than doing odbc way.

    .Raju

    Comment by Raju — 4 November 2012 @ 16:12 | Reply

    • Obviously on very large data sets you will have to start thinking about your batching process. You would probably not want to load everything in one go and rerun potential hours of work nor would you like your users being waiting for the data to show up.

      This has nothing to do with ODBC, ODBC and JDBC are both drivers, you probably meant OCI but even that is not true as you can run the JDBC driver over the OCI interface if you want.

      Comment by Venzi — 5 November 2012 @ 10:10 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Rubric Theme. Create a free website or blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: