Recoverable workflow in BODS
See more in our ebook
This post describes on how to:
• Design and implement recoverable work flows.
• Use Data Services conditionals.
• Specify and use the Auto correct load option.
• Replicate and rename objects in the object library.
Recovery Mechanisms
Creating a recoverable work flow manually
A recoverable work flow is one that can run repeatedly after failure without loading duplicate data.
Examples of failure include source or target server crashes or target database errors that could cause a job or work flow to terminate prematurely.
In the following exercise, you will learn how to:
• Design and implement recoverable work flows
• Use Data Services conditionals
• Specify and use the auto-correction table loader option
• Replicate and rename objects in the object library.
Adding the job and defining local variables
1. In the Class_Exercises project, add a new job named JOB_Recovery.
2. Open the job and declare these local variables:
Variable Type
$recovery_needed int
$end_time varchar(20)
The $recovery_needed variable determines whether or not to run a data flow in recovery mode. The $end_time variable determines the value of $recovery_needed. These local variables initialize in the script named GetWFStatus (which you will add in the next procedure).
Specifying a recoverable job
This recoverable job will consist of three objects:
• A script to determine if recovery is required
• A conditional to invoke the appropriate data flow
• A script to update a status table, signifying successful execution
Creating the script that determines the status
First you need to create a script called GetWFStatus.
The script reads the ending time in the status table that corresponds to the most recent start time. The idea is that if there is no ending time for the most recent starting time, the prior data flow invocation must not have completed properly.
To create GetWFStatus
1. In JOB_Recovery, add a script to the left side of the workspace and name it GetWFStatus.
2. Enter the script contents, which depend on the RDBMS on which status_table resides, as shown in the following table. Be sure that the SQL statements are on a single line (as opposed to multiple lines, as shown).
Note:
The quotes in the expression $end_time = '' are two single quotes (to denote an empty string).
RDBMS | Script |
Oracle | $end_time = sql('Target_DS', 'select to_char(end_time, 'YYYY-MM-DD HH24:MI:SS') from status_table where start_time = (select max(start_time) from status_table)'); if (($end_time IS NULL) or ($end_time = '')) $recovery_needed = 1; else $recovery_needed = 0; |
Microsoft SQL Server, Sybase ASE | $end_time = sql('Target_DS', 'select convert(char(20), end_time, 0) from status_table where start_time = (select max(start_time) from status_table)'); if (($end_time IS NULL) or ($end_time = '')) $recovery_needed = 1; else $recovery_needed = 0; |
3. Validate the script.
Defining the recoverable data flow with a conditional
You can use a conditional to specify a recoverable data flow. Conditionals have three components:
• A conditional expression to determine an execution path
• A work flow to execute for the true branch
• A work flow to execute for the false branch
A conditional is a single-use object in work flows that allows you to branch execution logic based on the results of an expression.
A conditional takes the form of an if/then/else expression.
To add the conditional
1. Open JOB_Recovery.
2. Click the conditional icon on the tool palette.
3. Click in the workspace to the right of the script GetWFStatus and name the conditional recovery_needed.
4. Open the conditional by clicking its name.
The conditional window contains three areas:
• The if expression
• A box for specifying the work flow to execute if the if condition evaluates true
• A box for specifying the work flow to execute if the if condition evaluates false
5. Specify the if expression. Enter an expression that evaluates to true or false. In this instance, enter:
($recovery_needed = 1)
To define the normal mode work flow
You will reuse the data flow created previously.
1. In the object library, click the Data Flows tab.
2. In the Data Flows list, drag DF_SalesOrg into the lower (Else) box of the conditional. This data flow is for the "false" branch of the conditional.
To specify the recovery mode work flow
The recovery mode work flow will call a modified version of DF_SalesOrg. The only difference is that the Update control for the salesorg_dim target table will have the Auto correct load option enabled.
The Auto correct load option causes the table loader to check for existing rows with the same key as each INSERT/UPDATE. When there is a match, the preexisting row will be deleted before inserting a new row (UPDATEs are mapped to INSERTs).
To minimize work in defining this new data flow, the following procedure replicates, renames, and edits DF_SalesOrg for the recovery branch.
1. In the object library, click the Data Flows tab.
2. In the Data Flows list, right-click DF_SalesOrg and click Replicate. The replicated data flow will be named Copy_1_DF_SalesOrg.
3. Click the highlighted name and rename the data flow to ACDF_SalesOrg.
4. Drag ACDF_SalesOrg to the upper (Then) box of the conditional. This data flow is for the "true" branch of the conditional.
To specify auto correction for the target table
1. Open the data flow ACDF_SalesOrg.
2. Open the target table SALESORG_DIM.
3. Click the Options tab. In the "Update control" area, select the Auto correct load to Yes.
• A box for specifying the work flow to execute if the if condition evaluates true
• A box for specifying the work flow to execute if the if condition evaluates false
5. Specify the if expression. Enter an expression that evaluates to true or false. In this instance, enter:
($recovery_needed = 1)
Adding the script that updates the status
This script executes after the work flows in the conditional have completed. Its purpose is to update the status table with the current timestamp to indicate successful execution.
To add the UpdateWFStatus script
1. Open JOB_Recovery.
2. Add a script to the right of the recovery_needed conditional and name it UpdateWFStatus.
3. Open UpdateWFStatus.
4. Enter the script contents, which depend on the RDBMS on which status_table resides, as shown in the following table. Be sure the entire statement is contained on a single line (not on multiple lines, as shown).
RDBMS | Script |
Oracle | sql('Target_DS', 'update status_table |
set end_time = SYSDATE where | |
start_time = (select | |
max(start_time) from status_table)'); | |
Microsoft SQL Server, | sql('Target_DS', 'update status_table |
Sybase ASE | set end_time = getdate() where |
start_time = (select | |
max(start_time) from status_table)'); | |
DB2 | sql('Target_DS','update status_table |
set end_time = current timestamp | |
where start_time = (select | |
max(start_time) from status_table)'); | |
5. Validate the script.
Specifying job execution order
Connect the objects in the JOB_Recovery job so that they execute in the following order:
1. GetWFStatus
2. Recovery_needed
3. UpdateWFStatus
Status of the exercise
At this point, the recoverable job is completely defined and ready to execute. Here is how it will work:
1. A script name GetWFStatus determines if recovery is required.
2. If recovery is required, a data flow named ACDF_SalesOrg executes.
3. Otherwise, the data flow named DF_SalesOrg executes.
4. After the data flow executes, a script named UpdateWFStatus executes to update a status table indicating successful completion.
To execute the job
1. Use a query tool and delete the existing data from the target table SALESORG_DIM.
2. Check the contents of status_table and notice that the end_time field is NULL.
3. Execute JOB_Recovery.
Because the end_time field is NULL in status_table, the initial execution of this job invokes the recovery-mode data flow.
4. Check the Monitor log to see which data flow was invoked.
5. Use a query tool to check the contents of the SALESORG_DIM table. You should see three rows.
6. Check the contents of status_table and notice that the end_time field now contains a value.
7. Execute the job again.
This time, the non-recovery-mode data flow runs. Verify by examining the monitor log messages.
Data Services automated recovery properties
Data Services provides job and work flow properties that you can use for automated recovery:
• In the job execution properties, you can select Enable recovery. Then, if the job fails, you will be able to select the job property Recover from last failed execution to skip any steps that successfully completed.
• In work flow properties, you can select Recover as a unit to guarantee that the entire work flow runs again if any part of it requires recovery (per the previous bullet).
See more in our ebook
Comments
Post a Comment