Monday, December 29, 2008

BODI FAQS

1)Is it necessary to put Input Primarykey column in Table Comparison Transform

It is mandatory to select primary key column

2)Parallel Processing

You can run transforms in parallel by entering a number in the Degree of Parallelism box on a data flow’s Properties window. The number is used to replicate transforms in the data flow which run as separate threads when the Job Server processes the data flow.

** Entering a number in the Degree of Parallelism box on a data flow’s Properties window.
** The number is used to replicate transforms in the data flow

3) Soure table Partitioning in Data Integrator , How will u load the data?


4) Performance tuning of a job

The following sections describe ways you can adjust Data Integrator
performance:

• Source-based performance options

• Using array fetch size
• Caching data
• Join ordering
• Minimizing extracted data

• Target-based performance options

• Loading method and rows per commit
• Staging tables to speed up auto-correct loads

• Job design performance options

• Improving throughput
• Maximizing the number of pushed-down operations
• Minimizing data type conversion
• Minimizing locale conversion
• Improving Informix repository performance


5) Parameters in Dataflow

LOCAL VARIABLES, GLOBAL VARIABLES, PARAMETERS.


Dataflow Level---Only Parameters.

Workflow Level---Parameters,Local Variables.

Job Level --- Local Variables, Global Variables.

You create local variables, parameters, and global variables using the Variables and Parameters window in the Designer.

You can set values for local or global variables in script objects. You can also
set global variable values using external job, execution, or schedule properties.


6) SCD Type 2 flow

1 way……….

In the DataFlow where the Intermediate Table and the Ultimate Target table are joined, we will be creating a port named DML_FLAG Which consists of {0/1/2} based on the following.

a) The SCD Columns that are retrieved from the Intermediate table & Ultimate Target table are compared over here.
b) If they are not matching we will assign with 2.
If they are matching we will assign with 1.
If the Surrogate Key of the Ultimate Target table is NULL then we will assign with 0.

So For DML_FLAG having value “2” we will have two dataflows one for insert and one for updating the target table.

Other way……….

SRC----TABLECOMPARISON----HISTORY PRESERVING----KEYGENERATION---TARGET.

Table Comparison-----Primary Key Columns,,,,,Compare columns(SCD Columns)
History Preserving----- Compare columns(SCD Columns)


7) Audit Strategy Implementation in the project



8) Different logs available in the Data Integrator

3 Data Integrator logs :

• Examining trace logs
• Examining statistics logs
• Examining error logs

**Use the trace logs to determine where an execution failed, whether the execution steps occur in the order you expect, and which parts of the execution are the most time consuming.

**The statistics log (also known as the monitor log) quantifies the activities of the components of the job. It lists the time spent in a given component of a job and the number of data rows that streamed through the component.

**Data Integrator produces an error log for every job execution. Use the error logs to determine how an execution failed. If the execution completed without error, the error log is blank.


9) Different Types of Embedded dataflows? Are they reusable.


An embedded data flow is a data flow that is called from inside another data flow.

There are four types of embedded data flows:

One input ---------Add an embedded data flow at the end of a data flow

One input and one output---------- Add an embedded data flow in the middle of a data flow.

One output --------Add an embedded data flow at the beginning of a data flow

No input or output---------- Replicate an existing data flow.

Yes. They are reusable.


10) Passing the parameters of one dataflow to other dataflows.

11) Bulk Loading

12)In how many ways Can we execute a job in BODI ?

3 ways :

a)Manually
b)Thru Scheduling in WEB ADMIN.
c) Export Execution Command which is used to export the job into a batch file which is again executable.


12)What are the different types of Caches in LOOKUP?

NO_CACHE — Does not cache any values.

• PRE_LOAD_CACHE — Preloads the result column and compare column
into memory (it loads the values before executing the lookup).
Use this option if the table can fit in memory.

• DEMAND_LOAD_CACHE — Loads the result column and compare
column into memory as the function executes.
Use this option when looking up highly repetitive values that are a small
subset of the data and when missing values are unlikely.


13)What is the significance of USE INPUT KEYS in Target table Options ?

By setting this Data Integrator will use the primary keys of the source table in order to update the data in the target table.

14)Tell about QUERY TRANSFORM

Retrieves a data set that satisfies conditions that you specify. A query transform is similar to a SQL SELECT statement.
Select-----Distinct Rows
Outer Join -----Outer Source----Inner Source
Where………..You can put any kind of codition like lookup()/any built-in function which returns a single value.
By using Propose Joins DI automatically joins the columns that are having the same name and data type.
Group By ……Drag the columns that r needed to group.
Order By……..Drag the columns you want.

Can Use SQL().
Can use LookUp,LookUpEXT(),LookupSeq() or Built-In functions.
Can use relational operators in the conditions.
Can use Global variables.
Can use Custom Functions.

15)Implementation of INCREMENTAL LOADING

Using the Last_Load_Timestamp of the target table.

If the source table's timestamp is > than the Maximum Last_Load_Timestamp

No comments: