Venzi's Tech-Blog

25 October 2012

Loading data fast – Batch inserts and errors (DML error logging)

Filed under: Code,Java,Oracle,Performance,Programming — Venzi @ 11:02

In the last post of my series “Loading data fast” I showed how batch inserts can actually make a huge difference in insert performance. Now one question remains: What happens when an error occurs like a unique key violation? The answer is: The insert statement will fail with an error and stop. Now, if you didn’t catch the exception, it will be raised and the commit will never be issued which will lead to a loss of all previous inserted data as well. However, if you catch the exception and make sure that you issue a commit afterwards, you will at least have your previous successful inserted data in the table if that is suitable for the business logic  – basically meaning that you know where you’ve stopped. Let’s have a look:

In Java you will have to run the PreparedStatement.executeBatch() routine in a try block and execute the Connection.commit() routine in the finally block, like this:


PreparedStatement stmt = conn.prepareStatement("INSERT /* addBatch insert */ INTO testloadtable (id, text) VALUES (?,?)");

for (int i=0;i<ROWS;i++)
{
  stmt.setInt(1, i);
  stmt.setString(2, "test" + i);
  stmt.addBatch();
}

try
{
  stmt.executeBatch();
}
catch (BatchUpdateException e)
{
  System.out.println("Error during my batch insert example");
  e.printStackTrace();
}
finally
{
  conn.commit();
}

In PL/SQL you will have to run your FORALL statement in a new block, like this:


DECLARE
  TYPE tab is TABLE OF testloadtable%ROWTYPE;
  myvals tab;
BEGIN
  SELECT rownum, 'x'
    BULK COLLECT INTO myvals
      FROM dual
        CONNECT BY LEVEL <= 10000;
  BEGIN
    FORALL i IN myvals.FIRST..myvals.LAST
      INSERT INTO testloadtable (id, text)
        VALUES (myvals(i).id, myvals(i).text);
  EXCEPTION
    WHEN DUP_VAL_ON_INDEX THEN
      DBMS_OUTPUT.PUT_LINE('Unique key violation!');
  END;
  COMMIT;
END;

By doing this, you will at least keep all previous inserted data. However, you will still have to fix the error and run the rest of the batch again and latter can especially in batch load scenarios be rather difficult. If you catch the exception, you will have to find the right position in your batch again in order to continue. In some cases that could imply to rebuild the entire batch again. If you don’t catch the exception, because your batch is one logical unit and either it succeeds or will be rolled back, then you will have to start from scratch again. So the conclusion is: Neither of the two options is optimal!
Fortunately, Oracle provides a feature called “DML error logging“! This feature allows you to simply log all errors into an error table and continue the execution of your DML statement with no error being raised! This feature works with all DML statements: Insert, Update, Merge and Delete. I will, however, focus only on insert statements in this post.

First let’s prove what I just said. All I do is to create a primary key on the table and insert a row before I execute my batch.


CREATE TABLE TESTLOADTABLE (id NUMBER, text VARCHAR2(255));
table TESTLOADTABLE created.

CREATE UNIQUE INDEX TESTLOADTABLE_PK ON TESTLOADTABLE(id);
unique index TESTLOADTABLE_PK created.

ALTER TABLE TESTLOADTABLE ADD PRIMARY KEY (id) USING INDEX TESTLOADTABLE_PK;
table TESTLOADTABLE altered.

Now that I have my primary key created, I insert 1 row with the id 3000 before I execute my batch. The batch contains 10k rows with an id range of 0 – 9999. Once it reaches the row with the id 3000 (3001st row, as I start with id 0), I’ll receive an exception that the row already exists. Note, that all other rows would be fine and valid except this single one with the id 3000. Nevertheless, as I do not make sure that a commit is happening, I will lose all of them:

conn.prepareStatement("INSERT INTO testloadtable (id, text) VALUES (3000,'x')").execute();
conn.commit();

stmt = conn.prepareStatement("INSERT /* addBatch insert */ INTO testloadtable (id, text) VALUES (?,?)");

for (int i=0;i<ROWS;i++)
{
  stmt.setInt(1, i);
  stmt.setString(2, "test" + i);
  stmt.addBatch();
}
stmt.executeBatch();
conn.commit();

As expected, I get following exception back:

Exception in thread "main" java.sql.BatchUpdateException: ORA-00001: unique constraint (TEST.SYS_C0010986) violated

And when I do a count on the table, I have only my previously inserted row in there:


SELECT COUNT(*) FROM testloadtable
COUNT(*)
--------
1

Now let’s see what I get when I do catch the exception and make sure to issue a commit:

conn.prepareStatement("INSERT INTO testloadtable (id, text) VALUES (3000,'x')").execute();
conn.commit();

PreparedStatement stmt = conn.prepareStatement("INSERT /* addBatch insert */ INTO testloadtable (id, text) VALUES (?,?)");

for (int i=0;i<ROWS;i++)
{
  stmt.setInt(1, i);
  stmt.setString(2, "test" + i);
  stmt.addBatch();
}

try
{
  stmt.executeBatch();
}
catch (BatchUpdateException e)
{
  System.out.println("Error during my batch insert example");
  e.printStackTrace();
}
finally
{
  conn.commit();
}

The output shows that I caught the exception successfully:


Error during my batch insert example
java.sql.BatchUpdateException: ORA-00001: unique constraint (TEST.SYS_C0010986) violated

And a count on the table now shows that I have 3001 rows in it; my previously inserted row with id 3000 and the first 3000 rows from the batch with id 0 – 2999:

SELECT COUNT(*) FROM testloadtable
COUNT(*)
--------
3001

Now as said before, the great thing about DML error logging is, that it offers you the ability to load your entire batch without receiving an error at all. Instead the error(s) will be logged into a separate error table which can then be queried and appropriate actions taken afterwards. All you need to do is to modify your DML statement to include the LOG ERRORS clause. You will also have to create an error logging table via the DBMS_ERRLOG package first. Optionally, you can:

  • Include a tag that gets added to the error log to help identify the statement that caused errors
  • Include the REJECT LIMIT subclause, which allows you to define a upper limit of errors that can be encountered before the statement fails. If you have a batch of 10k rows where 7k rows are failing, it probably doesn’t make much sense anymore to continue as there seems to be a bigger problem. This clause offers you the functionality to still raise the error if a certain threshold is reached. I guess in order to make sure that nobody accidentally adds the error logging clause and so suppresses all errors, the default value is 0. Which means that if you omit the REJECT LIMIT clause, an error will be logged into the error logging table but the statement will also be terminated. If you want to log all errors but raise no error you will have to define UNLIMITED

The error table itself is a copy of the target table with a few more columns, telling you things like error number, error message and so forth. But the important part is that it will contain all the columns of your target table and all those columns will contain the values from the failing row of the insert. This means that you will not lose any information of your batch and do not need to re-execute the entire batch again!
Let’s see how it works! First I create an error logging table called ERR_TESTLOADTABLE:


BEGIN
  DBMS_ERRLOG.CREATE_ERROR_LOG('TESTLOADTABLE','ERR_TESTLOADTABLE');
END;
anonymous block completed
DESC err_testloadtable
Name            Null Type
--------------- ---- --------------
ORA_ERR_NUMBER$      NUMBER
ORA_ERR_MESG$        VARCHAR2(2000)
ORA_ERR_ROWID$       UROWID()
ORA_ERR_OPTYP$       VARCHAR2(2)
ORA_ERR_TAG$         VARCHAR2(2000)
ID                   VARCHAR2(4000)
TEXT                 VARCHAR2(4000)

All I then have to do, is to modify my original code so that the insert statement includes a “LOG ERRORS INTO ERR_TESTLOADTABLE REJECT LIMIT UNLIMITED” clause at the end of it:

conn.prepareStatement("INSERT INTO testloadtable (id, text) VALUES (3000,'x')").execute();
conn.commit();

PreparedStatement stmt = conn.prepareStatement("INSERT /* addBatch with logging errors */ INTO testloadtable (id, text) VALUES (?,?) LOG ERRORS INTO ERR_TESTLOADTABLE REJECT LIMIT UNLIMITED");

for (int i=0;i<ROWS;i++)
{
  stmt.setInt(1, i);
  stmt.setString(2, "test" + i);
  stmt.addBatch();
}
stmt.executeBatch();
conn.commit();

Although I inserted the row with the id 3000 first, no error was reported anymore during the execution. However, if I have a look at my ERR_TESTLOADTABLE, I see following:


SELECT * FROM ERR_TESTLOADTABLE
ORA_ERR_NUMBER$ ORA_ERR_MESG$                                             ORA_ERR_ROWID$  ORA_ERR_OPTYP$ ORA_ERR_TAG$ ID    TEXT
--------------- --------------------------------------------------------- --------------- -------------- ------------ ----- --------
1               ORA-00001: unique constraint (TEST.SYS_C0010987) violated                 I                           3000  test3000

The error logging table shows me:

  1. The error number “1″ which means “ORA-00001″
  2. The error message
  3. RowId: In this case null as it was an insert and the row didn’t exist before
  4. The operation type “I” indicating Insert
  5. An empty error tag, as I have not defined one
  6. ID column value from the failing row
  7. Text column value from the failing row

Conclusion: DML error logging allows you to use batch DML statements that won’t fail on errors. This gives you the benefit of maximum flexibility when you perform batch operation!
In my next post I will take a look at the performance impact of DML error logging.

About these ads

2 Comments »

  1. [...] the last post of my “Loading data fast” series I showed how DML error logging can be used to prevent [...]

    Pingback by Loading data fast – DML error logging performance « Venzi's Tech-Blog — 3 November 2012 @ 14:57 | Reply

  2. A very good article which safe my life. Cheers!

    Comment by CraftKung Fitz — 26 January 2014 @ 09:32 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Rubric Theme. Create a free website or blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: