Propagating schema changes

During development, it is often necessary to change schemas by adding, removing, or re-ordering columns. This can be an onerous task, especially if a schema is used in multiple jobs.

As discussed earlier in this chapter, storing schemas in the metadata enables them to be re-used. If a shared schema is changed, Talend will ask whether the changes should be applied to all the jobs that use it.

If the change is applied, then the next time an affected job is opened, the component using the schema will normally be highlighted as in error, because its schema no longer matches that of the component it is connected to.

Talend provides mechanisms within the schema dialogues that take some of the pain out of ensuring that changes are assimilated into all the jobs.

Getting ready

Open the Talend Job jo_cook_ch02_0010_propagateSchema so that the right-hand palette becomes available. Then, from the metadata palette, open the Generic schema sc_cook_0010_genericCustomer.

How to do it…

  1. Add a new field emailAddress, as shown in the following screenshot:
  2. Click Finish to save the change to the schema. Then, click Yes to apply the changes to all jobs when prompted.
  3. Click OK to accept the changes in the next dialogue box. You will now see that the job has an error on the output component.
  4. Open the tFileOutputDelimited component, click the Edit Schema button, and select the View Schema option to open the schema.
  5. As you can see in the following screenshot, the table on the left-hand side is different from that on the right-hand side. Click the << button to copy the right-hand schema into the left-hand panel.
  6. Click OK to save the changes.

How it works…

When Talend updates the job schema for an output component, it does not propagate the change to the upstream component. The << option allows the developer to copy all the changes from the output schema back into the preceding component, where a rule can then be applied to the new field.
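The effect of the << option can be pictured with a small conceptual sketch. This is only a toy model and not Talend's internal implementation: the Column class, the reverse_propagate function, and the id and name columns are hypothetical names chosen for illustration. Only emailAddress comes from the recipe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for one schema column (name and type only)."""
    name: str
    type: str


def reverse_propagate(output_schema):
    """Return a copy of the output schema for the upstream component to use."""
    return list(output_schema)


# Assumed starting point: both components share the same two columns.
upstream = [Column("id", "Integer"), Column("name", "String")]

# The Repository schema change adds emailAddress to the output component only.
output = upstream + [Column("emailAddress", "String")]

# The job is now in error because the two schemas differ.
mismatch = upstream != output

# Pressing << copies the output schema back to the upstream component.
upstream = reverse_propagate(output)
```

After the copy, the two schemas are equal again, which corresponds to the error disappearing in the job. The new column arrives upstream without any rule attached, so it still has to be populated.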

There’s more…

Using this method also ensures that the link to the Generic schema is maintained. It is possible to make the change in the previous tMap output; however, this would cause the output schema to become Built-in, which is an undesirable result.

In the preceding example, only one component is changed and the error is removed; in many jobs, however, this alone will not complete the change. It is rare to add a field and then do nothing with it. It is therefore often necessary either to propagate the changed row forward through all the components in a job so that the field reaches the output correctly, or to ensure that a field that has been reverse propagated is correctly populated from upstream data.

Tip

When adding new fields to an output, it is best to change the schema of the output and reverse propagate the new field, especially when using Repository schemas. If the schema is changed using tMap instead, Talend will automatically change the schema type from Repository to Built-in, thus breaking the link to the Repository schema. In most cases, this is not a desirable outcome.

Before reverse propagating, check whether any field names have changed, especially in tMap outputs. If you change the name of a field and reverse propagate to tMap, then the rule for that field will disappear and will need to be re-entered.

In these cases, it is worth changing the field names in the tMap output schema prior to reverse propagating a schema. Make sure that you choose not to propagate this change from tMap to avoid the output being changed to Built-in. This will cause the output file to be in error, but when the Repository schema change is applied, the schemas will match, and the error will disappear.
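The rename behaviour described in this tip can be illustrated with a toy model in which each tMap rule is looked up by its output column name. This is an assumption made for illustration, not a description of how Talend stores rules. The function names, the row1 expressions, and the customerName column are all hypothetical.

```python
def reverse_propagate_rules(rules, new_columns):
    """Keep the rule for each column name that still exists; others start empty."""
    return {col: rules.get(col, "") for col in new_columns}


def rename_output_column(rules, old, new):
    """Rename a column in the tMap output, carrying its rule along with it."""
    return {(new if col == old else col): expr for col, expr in rules.items()}


# Assumed tMap output rules before the schema change.
rules = {"id": "row1.id", "name": "row1.name"}

# The Repository schema renames "name" to "customerName".
new_columns = ["id", "customerName"]

# Reverse propagating straight away loses the rule for the renamed field.
lost = reverse_propagate_rules(rules, new_columns)

# Renaming in the tMap output first means the names match, so the rule survives.
renamed_first = rename_output_column(rules, "name", "customerName")
kept = reverse_propagate_rules(renamed_first, new_columns)
```

In this sketch, lost ends up with an empty rule for customerName, while kept retains row1.name, which mirrors the advice to rename the field in the tMap output before reverse propagating.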