Simplifying your data validation code with Database 12.2

Data is good

Image courtesy of pixabay.com
 

Doesn’t matter who much testing you do (well, it actually does but that’s a whole different issue) you can almost guarantee that at some point your beautiful data validation code, that parses data input from a web form or loads data from some external file, will pop up with the error:

SQL Error: ORA-01722: invalid number

01722. 00000 - "invalid number"
*Cause: The specified number was invalid.
*Action: Specify a valid number.

Of course, what’s is really annoying at this point is that you don’t know which column value of the record failed (assuming that you have more than one numeric column)

Managing conversion errors during data loads

What’s to do? Of course the sensible thing is to add lots of data validation checks into your code to try and catch the situations where the wrong type of data  arrives from your data source. It’s likely that all the additional validation checks will slow down the process of inserting data, which is not a great result.

If your data is arriving via an external file then you can use the BADFILE clause to capture records that cannot be loaded because of data type errors. But what if the data source for your insert statement is a staging table that was populated by an ETL job or a series of values from a web form? 

How to manage conversion errors during INSERTs

Panic over - Database 12c Release 2 contains important changes to the CAST and TO_xxx functions to manage the most common data conversion errors. The CAST function now has the ability to return a user-specified value if there is a conversion error.

For example, let’s build a simple staging table in the schema:

CREATE TABLE STAGING_EMP
( "EMPNO" VARCHAR2(6),
  "ENAME" VARCHAR2(10),
  "JOB" VARCHAR2(9),
  "MGR" VARCHAR2(4),
  "HIREDATE" VARCHAR2(10),
  "SAL" VARCHAR2(7),
  "COMM" VARCHAR2(9),
  "DEPTNO" VARCHAR2(6));

and let’s insert some data, which includes values that will cause data conversion errors when we try to add the values into our target table:

-- INSERTING DATA INTO STAGING_EMP
Insert into STAGING_EMP (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO)
            values ('GB9369','SMITH','CLERK','7902','17-DEC-80','800',null,'20');

-- INVALID DATE
Insert into STAGING_EMP (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO)
           values ('9499','ALLEN','SALESMAN','7698','31-FEB-81','1600','300','30');

-- INVALID NUMBER FOR DEPTNO
Insert into STAGING_EMP (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO)
           values ('9521','WARD','SALESMAN','7698','22-FEB-81','1250','500','SALES');

-- INVALID NUMBER FOR EMPNO KEY
Insert into STAGING_EMP (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO)
           values ('US9566','JONES','MANAGER','7839','02-APR-81','2975',null,'20');

Insert into STAGING_EMP (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO)
           values ('9782','CLARK','MANAGER','7839','09-JUN-81','2450',null,'10');

-- INVALID NUMBER FOR EMPNO KEY
Insert into STAGING_EMP (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO)
           values ('FR9788','SCOTT','ANALYST','7566','19-APR-87','3000',null,'20');

-- INVALID NUMBER FOR MGR KEY
Insert into STAGING_EMP (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO)
           values ('9839','KING','PRESIDENT','null','17-NOV-81','5000',null,'10');

-- INVALID NUMBER FOR EMPNO KEY
Insert into STAGING_EMP (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO)
           values ('DE9844','TURNER','SALESMAN','7698','08-SEP-81','1500',0,'30');

Insert into STAGING_EMP (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO)
           values ('9876','ADAMS','CLERK','7788','23-MAY-87','1100',null,'20');

 

Now let’s try inserting the data from our staging table into the EMP table and see what happens:

INSERT INTO scott.emp SELECT * FROM staging_emp;

… and not surprisingly I get the following error:

Error starting at line : 52 in command -
INSERT INTO emp SELECT * FROM staging_emp
Error report -
SQL Error: ORA-01722: invalid number
01722. 00000 - "invalid number"
*Cause: The specified number was invalid.
*Action: Specify a valid number.

I can deal with this situation in a couple of different ways. Firstly let’s try and discover which rows and columns in my staging table contain values that are likely to cause data conversion errors. To do this I am going to use the new VALIDATE_CONVERSION() function which identifies problem data that cannot be converted to the required data type. It returns 1 if a given expression can be converted to the specified data type, else it returns 0.

SELECT
  VALIDATE_CONVERSION(empno AS NUMBER) AS is_empno,
  VALIDATE_CONVERSION(mgr AS NUMBER) AS is_mgr,
  VALIDATE_CONVERSION(hiredate AS DATE) AS is_hiredate,
  VALIDATE_CONVERSION(sal AS NUMBER) AS is_sal,
  VALIDATE_CONVERSION(comm AS NUMBER) AS is_comm,
  VALIDATE_CONVERSION(deptno AS NUMBER) AS is_deptno 
FROM staging_emp;

this produces a table where I can easily pick out the rows where the data conversion is going to succeed (column value is 1) and fail (column value is 0):

Results from validation_conversion function

I could use this information to filter the data in my staging table as I insert it into my EMP table or I could use the enhanced CAST and TO_xxx functions within the INSERT INTO ….. SELECT statements.

The CAST function (along with TO_NUMBER, TO_BINARY_FLOAT, TO_BINARY_DOUBLE, TO_DATE, TO_TIMESTAMP, TO_TIMESTAMP_TZ, TO_DSINTERVAL, and TO_YMINTERVAL functions) can now return a user-specified value, instead of an error, when data type conversion errors occur. This reduces failures during an data transformation and data loading processes.

Therefore, my new 12.2 self-validating SELECT statement looks like this:

INSERT INTO emp
SELECT
  empno,
  ename,
  job,
  CAST(mgr AS NUMBER DEFAULT 9999 ON CONVERSION ERROR),
  CAST(hiredate AS DATE DEFAULT sysdate ON CONVERSION ERROR),
  CAST(sal AS NUMBER DEFAULT 0 ON CONVERSION ERROR),
  CAST(comm AS NUMBER DEFAULT null ON CONVERSION ERROR),
  CAST(deptno AS NUMBER DEFAULT 99 ON CONVERSION ERROR)
FROM staging_emp
WHERE VALIDATE_CONVERSION(empno AS NUMBER) = 1;

which results in five rows being inserted into my EMP table - obviously this means that 4 rows were rejected during the insert process (rows 1, 4, 6 and 8) because they contain errors converting the contents to a number for the empno key. Here is the data that was loaded:

Data loaded into emp table from staging table

we can see that on row 1 the HIERDATE was invalid so it was replaced by the value from sys date (07-JUL-16). Row 2 the value of DEPTNO is the conversion default of 99 and on row 4 the value for MGR is the conversion default of 9999.

Conclusion

The enhanced  CAST function (along with TO_NUMBER, TO_BINARY_FLOAT, TO_BINARY_DOUBLE, TO_DATE, TO_TIMESTAMP, TO_TIMESTAMP_TZ, TO_DSINTERVAL, and TO_YMINTERVAL functions) can help you deal with data conversion errors without having to resort to complicated PL/SQL code or writing data validation routines within your application code.

The new VALIDATE_CONVERSION() function can be used to help you identify column values that cannot be converted to the required data type.

Hope these two features are useful. Enjoy!

Don’t forget that LiveSQL is now running Database 12c Release so check out all the new tutorials and code samples that have recently been posted. I have posted a tutorial for the features discussed above: https://livesql.oracle.com/apex/livesql/file/tutorial_EDVE861IMHO1W3Q654ES9EQQW.html

 

 

 

 

 

Comments

Popular posts from this blog

The Complete Guide To Data Warehousing and Big Data at Oracle OpenWorld 2016

Your Essential Online Session and Hands-on Lab Calendars for #oow16

Thursday's Top Picks at OpenWorld for Data Warehousing and Big Data