Creating validation rules for more complex requirements

Sometimes validation rules require multiple inputs to provide a pass/fail result, so it is often easier to build and understand the code if it is written using Java.

If you aren't familiar with code routines in Talend, it is recommended that you first complete the recipe Creating custom functions using code routines in Chapter 5, Using Java in Talend, which will take you through the setup of code routines.

Getting ready

Open the job jo_cook_ch03_0060_validationCodeRoutine.

How to do it…

  1. Create a new code routine called validation, and copy the following code into it:
        /**
         * validateCustomerAge: Check customer is 18 or over for UK, 21 or over for rest of world.
         * Returns true if valid, false if invalid (including when either input is null).
         * 
         * {talendTypes} Boolean
         * 
         * {Category} Validation
         * 
         * {param} int(23) customerAge: Customer age
         * {param} string("UK") customerCountry: Customer country
         * 
         * {example} validateCustomerAge(23,"UK") # true
         */
        public static Boolean validateCustomerAge(Integer customerAge, String customerCountry) {
            // Missing inputs cannot be checked, so treat them as invalid
            if (customerAge == null || customerCountry == null) {
                return false;
            }
            // Match the country regardless of case; the original called
            // toUpperCase() on the literal "UK", which had no effect
            int minimumAge = "UK".equalsIgnoreCase(customerCountry) ? 18 : 21;
            return customerAge >= minimumAge;
        }
  2. Open the tMap component, and in the filter criteria for the validRows output, click on the expression button (…).
  3. Select the function validateCustomerAge from the validation category and double-click to copy the example code to the editor.
  4. Change the expression to match the following:
    validation.validateCustomerAge(customer.age,customer.countryOfBirth)
  5. Also, add the same expression to the output column validationResult for both outputs.
  6. Run the job and you should see that two of the records are rejected and three are valid.
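
The rule can also be sanity-checked outside Talend. The following is a minimal standalone sketch rather than part of the recipe: the class name ValidationCheck and the five sample records are invented for illustration (they are not the job's input data), and the method is a condensed restatement of the rule that assumes the country is matched without regard to case.

```java
// Standalone sketch for checking the age rule outside Talend.
// ValidationCheck and the sample records below are hypothetical.
class ValidationCheck {

    // Condensed restatement of the rule: 18 or over for UK, 21 or over elsewhere.
    // Assumption: the country is compared case-insensitively.
    static Boolean validateCustomerAge(Integer customerAge, String customerCountry) {
        if (customerAge == null || customerCountry == null) {
            return false; // missing inputs are treated as invalid
        }
        int minimumAge = "UK".equalsIgnoreCase(customerCountry) ? 18 : 21;
        return customerAge >= minimumAge;
    }

    public static void main(String[] args) {
        // Invented sample records: age paired with country by index
        Integer[] ages = {23, 18, 17, 20, 21};
        String[] countries = {"UK", "UK", "UK", "France", "France"};
        for (int i = 0; i < ages.length; i++) {
            System.out.println(ages[i] + ", " + countries[i] + " -> "
                    + validateCustomerAge(ages[i], countries[i]));
        }
    }
}
```

Checking the boundary ages (17 and 18 for the UK, 20 and 21 elsewhere) in this way makes it easier to trust the result column once the routine is used in the job.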

How it works…

The tMap expressions are limited to a single line of code, so complex tests on data cannot generally be performed directly within tMap.

The validateCustomerAge method returns a single Boolean value, so it can easily be used within tMap expressions and filters, as was demonstrated in this recipe.
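
To illustrate the difference, the sketch below sets the rule written as one inline expression beside the equivalent routine call. It is an illustration under assumptions, not code from the job: CustomerRow is a hypothetical stand-in for the tMap input row customer, and the local validateCustomerAge method stands in for the routine and assumes a case-insensitive country match.

```java
// Sketch contrasting a single inline expression with a routine call.
// CustomerRow is a hypothetical stand-in for the tMap input row "customer".
class InlineVersusRoutine {

    static class CustomerRow {
        Integer age;
        String countryOfBirth;

        CustomerRow(Integer age, String countryOfBirth) {
            this.age = age;
            this.countryOfBirth = countryOfBirth;
        }
    }

    // Stand-in for validation.validateCustomerAge (assumed case-insensitive)
    static Boolean validateCustomerAge(Integer customerAge, String customerCountry) {
        if (customerAge == null || customerCountry == null) {
            return false;
        }
        int minimumAge = "UK".equalsIgnoreCase(customerCountry) ? 18 : 21;
        return customerAge >= minimumAge;
    }

    // The whole rule squeezed into one expression, as an inline filter would need
    static boolean inlineRule(CustomerRow customer) {
        return customer.age != null && customer.countryOfBirth != null && ("UK".equalsIgnoreCase(customer.countryOfBirth) ? customer.age >= 18 : customer.age >= 21);
    }

    // The same test expressed as a routine call
    static boolean routineRule(CustomerRow customer) {
        return validateCustomerAge(customer.age, customer.countryOfBirth);
    }

    public static void main(String[] args) {
        CustomerRow customer = new CustomerRow(19, "UK");
        System.out.println("inline:  " + inlineRule(customer));
        System.out.println("routine: " + routineRule(customer));
    }
}
```

Both forms give the same answer, but the inline version already needs null checks and a nested conditional for a two-input rule, and it becomes harder to read with every extra condition.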

There's more…

Most data processes require validation of some sort or another, so it is a good idea to create a routine just for managing validations.

Collecting all the validation routines together makes them easier to find and removes the need for duplicated code.

Because they are stored centrally, a change to a routine is immediately available to all jobs using that particular routine, thus reducing time spent finding and fixing duplicated code in a project.

Tip

While the rule can be created directly using a tJavaRow component, using a code routine enables the validation to be re-used across multiple jobs in a project and also allows the routine to be used within tMap. Another downside of the tJavaRow method is that a pass/fail flag would need to be added to each row so that rows can be filtered in a downstream tMap.
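
For comparison, here is a rough sketch of what the tJavaRow approach might look like. In Talend, input_row and output_row are supplied by the component; the InputRow and OutputRow classes, the process wrapper, and the case-insensitive country match are assumptions added only so that the sketch compiles on its own.

```java
// Sketch of the tJavaRow alternative described in the tip.
// InputRow, OutputRow and process() are hypothetical stand-ins for what
// Talend generates around the code typed into the component.
class JavaRowSketch {

    static class InputRow {
        Integer age;
        String countryOfBirth;
    }

    static class OutputRow {
        Integer age;
        String countryOfBirth;
        Boolean validationResult; // the extra pass/fail flag carried on each row
    }

    static OutputRow process(InputRow input_row) {
        OutputRow output_row = new OutputRow();

        // Statements of this kind would form the tJavaRow body
        output_row.age = input_row.age;
        output_row.countryOfBirth = input_row.countryOfBirth;
        if (input_row.age == null || input_row.countryOfBirth == null) {
            output_row.validationResult = false;
        } else if ("UK".equalsIgnoreCase(input_row.countryOfBirth)) {
            output_row.validationResult = input_row.age >= 18;
        } else {
            output_row.validationResult = input_row.age >= 21;
        }

        return output_row;
    }

    public static void main(String[] args) {
        InputRow row = new InputRow();
        row.age = 19;
        row.countryOfBirth = "UK";
        System.out.println("validationResult: " + process(row).validationResult);
    }
}
```

The rule now lives inside one component of one job, and the flag column has to be carried in the schema purely so that a later tMap can filter on it.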

See also

  • Creating custom functions using code routines in Chapter 5, Using Java in Talend.