Monday, December 29, 2008

BODI FAQS

1)Is it necessary to put Input Primarykey column in Table Comparison Transform

It is mandatory to select primary key column

2)Parallel Processing

You can run transforms in parallel by entering a number in the Degree of Parallelism box on a data flow’s Properties window. The number is used to replicate transforms in the data flow which run as separate threads when the Job Server processes the data flow.

** Entering a number in the Degree of Parallelism box on a data flow’s Properties window.
** The number is used to replicate transforms in the data flow

3) Soure table Partitioning in Data Integrator , How will u load the data?


4) Performance tuning of a job

The following sections describe ways you can adjust Data Integrator
performance:

• Source-based performance options

• Using array fetch size
• Caching data
• Join ordering
• Minimizing extracted data

• Target-based performance options

• Loading method and rows per commit
• Staging tables to speed up auto-correct loads

• Job design performance options

• Improving throughput
• Maximizing the number of pushed-down operations
• Minimizing data type conversion
• Minimizing locale conversion
• Improving Informix repository performance


5) Parameters in Dataflow

LOCAL VARIABLES, GLOBAL VARIABLES, PARAMETERS.


Dataflow Level---Only Parameters.

Workflow Level---Parameters,Local Variables.

Job Level --- Local Variables, Global Variables.

You create local variables, parameters, and global variables using the Variables and Parameters window in the Designer.

You can set values for local or global variables in script objects. You can also
set global variable values using external job, execution, or schedule properties.


6) SCD Type 2 flow

1 way……….

In the DataFlow where the Intermediate Table and the Ultimate Target table are joined, we will be creating a port named DML_FLAG Which consists of {0/1/2} based on the following.

a) The SCD Columns that are retrieved from the Intermediate table & Ultimate Target table are compared over here.
b) If they are not matching we will assign with 2.
If they are matching we will assign with 1.
If the Surrogate Key of the Ultimate Target table is NULL then we will assign with 0.

So For DML_FLAG having value “2” we will have two dataflows one for insert and one for updating the target table.

Other way……….

SRC----TABLECOMPARISON----HISTORY PRESERVING----KEYGENERATION---TARGET.

Table Comparison-----Primary Key Columns,,,,,Compare columns(SCD Columns)
History Preserving----- Compare columns(SCD Columns)


7) Audit Strategy Implementation in the project



8) Different logs available in the Data Integrator

3 Data Integrator logs :

• Examining trace logs
• Examining statistics logs
• Examining error logs

**Use the trace logs to determine where an execution failed, whether the execution steps occur in the order you expect, and which parts of the execution are the most time consuming.

**The statistics log (also known as the monitor log) quantifies the activities of the components of the job. It lists the time spent in a given component of a job and the number of data rows that streamed through the component.

**Data Integrator produces an error log for every job execution. Use the error logs to determine how an execution failed. If the execution completed without error, the error log is blank.


9) Different Types of Embedded dataflows? Are they reusable.


An embedded data flow is a data flow that is called from inside another data flow.

There are four types of embedded data flows:

One input ---------Add an embedded data flow at the end of a data flow

One input and one output---------- Add an embedded data flow in the middle of a data flow.

One output --------Add an embedded data flow at the beginning of a data flow

No input or output---------- Replicate an existing data flow.

Yes. They are reusable.


10) Passing the parameters of one dataflow to other dataflows.

11) Bulk Loading

12)In how many ways Can we execute a job in BODI ?

3 ways :

a)Manually
b)Thru Scheduling in WEB ADMIN.
c) Export Execution Command which is used to export the job into a batch file which is again executable.


12)What are the different types of Caches in LOOKUP?

NO_CACHE — Does not cache any values.

• PRE_LOAD_CACHE — Preloads the result column and compare column
into memory (it loads the values before executing the lookup).
Use this option if the table can fit in memory.

• DEMAND_LOAD_CACHE — Loads the result column and compare
column into memory as the function executes.
Use this option when looking up highly repetitive values that are a small
subset of the data and when missing values are unlikely.


13)What is the significance of USE INPUT KEYS in Target table Options ?

By setting this Data Integrator will use the primary keys of the source table in order to update the data in the target table.

14)Tell about QUERY TRANSFORM

Retrieves a data set that satisfies conditions that you specify. A query transform is similar to a SQL SELECT statement.
Select-----Distinct Rows
Outer Join -----Outer Source----Inner Source
Where………..You can put any kind of codition like lookup()/any built-in function which returns a single value.
By using Propose Joins DI automatically joins the columns that are having the same name and data type.
Group By ……Drag the columns that r needed to group.
Order By……..Drag the columns you want.

Can Use SQL().
Can use LookUp,LookUpEXT(),LookupSeq() or Built-In functions.
Can use relational operators in the conditions.
Can use Global variables.
Can use Custom Functions.

15)Implementation of INCREMENTAL LOADING

Using the Last_Load_Timestamp of the target table.

If the source table's timestamp is > than the Maximum Last_Load_Timestamp

Using a Table Without Importing It

In SQL Transform :

1)Create Output Columns that are required by seeing the table structure in the backend.

2)Do not import the table in the respective DataStore(The Schema i.e Connected with in which the table is available).

3)Select the appropriate DataStore in which the actual table is residing in the Backend but not in the datastore.

4)Write the appropriate SQL Query as per the columns created.

5)If you want, you can use the global variable which represent the respective schema name.

6) Do not click on the Update Schema.

7)Implement the Transformation Logic i.e required in the job.

8)Execute the job

DOWNLOAD ATL : http://www.divshare.com/download/6200214-d42

Sunday, December 28, 2008

Column having muliple values separated with ATTHERATE symbol

(b)DATA RECEIVED VIA FLATFILE (STRUCTURE)
Header :
REGNO,MAKE,YEAR ,DATES LEASED FROM,DATES LEASED TO,CUSTID

Ist Row :
SV53AYC,FORD,2003,28/10/06@22/10/06@12/10/06,30/10/06@27/10/06@20/10/06,HK113@TU141@BK121

IInd Row Data :

SV54KGH,HONDA,2004, 24/08/06@19/8/06, 28/08/2006@22/08/2006, BB141@TM186

IIIrd Row Data :

SV07XKY, TOYOTA, 2007, , ,

Iv Row Data :

SV06VWY,BMW, 2006, 11/03/07, 23/03/07, MK129

Plz Remove Header :, Ist Row Data :, II Row Data :, III Row Data :, Iv Row Data

Solution :





Downloadable Atl Link :
------------------------

http://www.divshare.com/download/6199434-52c

Contains Source CSV file ...and .....Solution ATL file.

Wednesday, December 17, 2008

Renaming the Datastores of Soure / Target / Intermediate Tables

In order to Rename the Datastores of Source / Target / Intermediate Tables.

1) Take the atl of the job / jobs / Dataflows / Workflows / Datastores

2)Open in Textpad / Notepad.

3) Thru Find & Replace rename the Datastores of the desired one.

4) Save and Import the modified atl into the Repository.

Tuesday, December 16, 2008

Extract Only the File Name from the relevant/Absolete Path

$G_FILENAME = 'C:\RAMEHS\PRINCE\KDKD.TXT';

$G_VAR1 = word_ext($G_FILENAME,-1,'\\');

PRINT('Filename Is...... '||$G_VAR1);

Strings Embedded with Commas enclosed with " "

Flat File... Comma Delimiter....Strings Embedded with Commas enclosed with " "....

Eg :

Id,Name,Address,Amt
123,"Ramesh Babu","2/16,KAD Road,GZB",50000



Solution :

In the Delimiters Section of Flat File Definition Set the properties like :


Column....... ,
Row..........{New Line}
Text.........." (By default this is None}

Thursday, December 11, 2008

Dataflow In a Infinite Loop searching for any Flat file Existance.

Job Is executed 1 time.

Check for Any Flat file(Column structure will be same) exists in "X" Folder.

Name of the flat file changes every time. But the Column Structure will be same.

If any Flat file exists then Dataflow will be executed.

Once the data is loaded into the target table the Source flat file in "X" folder should be deleted.

Dataflow will be in a Infinite Loop searching for any Flat file Existance.


Place a Script & fill the following code:

$G_Stop =1;

Place a While Loop Icon i.e available on the right hand side slider:

put the following condition :

($G_Stop = 1)

Inside the While Loop :

Place a Script & fill the following code :

exec('D:\Flat\b.bat','',2);

$G_a =file_exists('D:\zz.txt');


Code of b.bat :

cd\

d:

cd d:\flat

if not exist *.txt del d:\zz.txt

if exist *.txt dir > d:\zz.txt

Place a Conditional Icon next to the Script :

Put the following condition :

($G_a=1)

Inside the If Condition :

Place the Dataflow which u want to execute.

Next to it place a script and fill the following code.

exec('D:\Flat\Del_Flat.bat','',2);

Code of Del_Flat.bat :

cd\
d:
cd d:\flat
del *.txt

--------------------------------------------

Download Atl :

http://www.divshare.com/download/6200214-d42

Dynamic Target File Names in BODI

Pre-Requisites :

In Data Files Section Set the properties like

Root Directory : D:\
File Location : $Dynamic_Name

Code in the Script File :

$Dynamic_Name='DYNAMIC_'to_char(sysdate(),'YYYY.MM.DD.HH.MI.SS')'.txt';

exec('D:\dy.bat',$Dynamic_Name,8);

Code in Batch File :

d:

echo %1 > d:\z.txt

type k.txt > %1

ATL Downloadable Link :

http://www.divshare.com/download/6199552-ba0

Wednesday, December 10, 2008

Custom Function For Different Date Formats

Script of Custom Function For Converting Different Date Formats into a single Date Format" YYYYMMDD"

if( is_valid_date($P_Date_In ,'DD/MM/YYYY') = 1 AND SUBSTR($P_Date_In,3,1) <> '-')
begin
$L_Date_Out = TO_CHAR(TO_DATE($P_Date_In,'DD/MM/YYYY'),'YYYYMMDD') ;
print($P_Date_In'......1');
return($L_Date_Out);
end
else
begin
if( is_valid_date($P_Date_In ,'DD-MM-YYYY') = 1 AND SUBSTR($P_Date_In,3,1) = '-')
begin
$L_Date_Out = TO_CHAR(TO_DATE($P_Date_In,'DD-MM-YYYY'),'YYYYMMDD') ;
print($P_Date_In'......2');return($L_Date_Out);
end
else
begin
if( is_valid_date($P_Date_In ,'MM/DD/YYYY') = 1 AND SUBSTR($P_Date_In,3,1) <> '-')
begin$L_Date_Out = TO_CHAR(TO_DATE($P_Date_In,'MM/DD/YYYY'),'YYYYMMDD') ;
print($P_Date_In'......3');return($L_Date_Out);
end
else
begin
if( is_valid_date($P_Date_In ,'MM/DD/YYYY') = 1 AND SUBSTR($P_Date_In,3,1) <> '-')
begin
$L_Date_Out = TO_CHAR(TO_DATE($P_Date_In,'MM/DD/YYYY'),'YYYYMMDD') ;
print($P_Date_In'......4');return($L_Date_Out);
end
else
begin
if( is_valid_date($P_Date_In ,'MMDDYYYY') = 1 AND SUBSTR($P_Date_In,3,1) <> '-')
begin
$L_Date_Out = TO_CHAR(TO_DATE($P_Date_In,'MMDDYYYY'),'YYYYMMDD') ;
print($P_Date_In'......5');
return($L_Date_Out);
end
else
begin
if( is_valid_date($P_Date_In ,'YYYYMMDD') = 1 AND SUBSTR($P_Date_In,5,1) <> '-')
begin
print($P_Date_In'......6');
return($L_Date_Out);
end
else
begin
if( is_valid_date($P_Date_In ,'MM-DD-YYYY') = 1 AND SUBSTR($P_Date_In,3,1) = '-')
begin
$L_Date_Out = TO_CHAR(TO_DATE($P_Date_In,'MM-DD-YYYY'),'YYYYMMDD') ;
print($P_Date_In'......7');
return($L_Date_Out);
end
else
begin
if( is_valid_date($P_Date_In ,'YYYY-MM-DD') = 1 AND SUBSTR($P_Date_In,5,1) = '-')
begin
$L_Date_Out = TO_CHAR(TO_DATE($P_Date_In,'YYYY-MM-DD'),'YYYYMMDD') ;
print($P_Date_In'......8');return($L_Date_Out);
end
else
begin
if( is_valid_date($P_Date_In ,'YYYY/MM/DD') = 1 AND SUBSTR($P_Date_In,5,1) <> '-')
begin
$L_Date_Out = TO_CHAR(TO_DATE($P_Date_In,'YYYY/MM/DD'),'YYYYMMDD') ;
print($P_Date_In'......9');return($L_Date_Out);
end
else
begin
if( is_valid_date($P_Date_In ,'DDMMYYYY') = 1 AND SUBSTR($P_Date_In,3,1) <> '-')
begin
$L_Date_Out = TO_CHAR(TO_DATE($P_Date_In,'DDMMYYYY'),'YYYYMMDD') ;
print($P_Date_In'......10');
return($L_Date_Out);
end
else
begin
print($P_Date_In'......11');
return($P_Date_In);
end
end
end
end
end
end
end
end
end
end

Speeding up the BODI Repository

Repo Maintenance

I

1) Take Full Backup

2) Take Only Wanted Backup.

3) Then Project ---> Compact Repository

4) Reimport the Data From ATL files



If the repo is Still Slow

II

1) Go to Repo in Toad

2) Take AL_STATISTICS Backup

3) Truncate the table AL_STATISTICS

4) Reimport the data from ATL files


If the repo is Still Slow

III



1)Take the Backup of AL_MACHINE_INFO table.

2)Create a new repo

3)You will be losing AL_MACHINE_INFO, without this

jobs may/may not execute. So insert the data

back from backup table.


Finally


Clean the Bin Dir DMP Files.

Monday, December 8, 2008

ERRORS....IN.....BODI


1)For a pre/Post Load Command you have called a procedure.
Procedure is not declared is the error u r getting.


A)U Got to mention the Schema Name in which the Procedure resides.

2)Datatype Mismatch in Toad Date---Number

The Data in the table might have number instead of Date.

Check for the schemas bcoz the same table might contain in the same schema and also in other schemas.

3) ora-24334:

No Descriptor for this position in SQL Transfrom (Gives the entire query i.e written in SQL Transform?

A) This error comes If some column of the table has been added/removed in the background i.e in the database, but not reflected in the SQL Transform. you got to remove all the columns from the SQL Transfrom and update the schema again in SQL Transform.

Else Click on Update schema is enough.

Even If you have a query like “SELECT b.sno, b.effdate,b.data FROM scott.b b” in the SQL Transform and the columns that are visible are sno, effdate then you won’t be getting any error.

4) In Flat File Lookup the file i.e we are looking for, in that file definition we don't have to specify the root directory. Instead directly specify the path which is enough.
If you specify both while looking the file the path will come 2 times like C:\C:\.....
which is wrong.


5)A Single Job Server can be configured with multiple repositories and Viceversa.

6)Create a CTL file using SQL Loader .

This can be acheived thru dos or Toad.

If you are creating the CTL file thru Toad, at the same time you can load the data into the database also.But Make sure that you already created the table with the necessary structure in the database priorly.

If you want to load the data into the table manually via Dos or thru Data Integrator then

you got 2 use the SQLLDR Command of dos .

SQLLDR USERID=SCOTT/TIGER@ORADB5 CONTROL=D:\A.CTL LOG=D:\A.LOG

If you r using Data Integrator then Use the following command in Data Integrator.

EXEC('D:\BAT','D:\A.CTL',1);

Put the following command in a Batch file.

SQLLDR USERID=SCOTT/TIGER@ORADB5 CONTROL=%1 LOG=D:\A.LOG

%1 is the parameter(D:\A.CTL) i.e coming from the EXEC() command of Data Integrator


7)
ORA-01861: literal does not match format string

A) If the datatype in the table is date and you r passing a Varchar2/NUmber datatype then this error will occur.

8)BODI -1111340 ERROR ……….OCI.dll file cannot find.

Go to \Oracle\Ora92\Network\Admin\Tnsnames.ora

Open the file and check for SID=connection string thru which u r logging.

If not available over there add the SID=Connection String, Under the

**Connect Data**

9)I have a custom function that suddenly fails. The function is called in a script transform, passing global variables as parameters. The error msg says "Incorrect syntax near......due to error 70401" and it lists the function call w/ a lot more param values than actually defined in the function.

Sol) Might be using the D.I version 11.7.0.X, So please upgrade to 11.7.3.X.

Thursday, December 4, 2008

Job Failed----Error Log should be printed automatically.

Script In BODI :

$G_PRINT = replace_substr(get_error_filename(),' ','*');
$G_PRINT2 = word_ext($G_PRINT, 8, '/');
$g_print3 = substr($g_print,1,index( $g_print,'/error_',1)-1);
exec('c:\a1.bat', $g_print3||' '||$g_print2 , 8);

Batch File Script
c:
cd\
cd %1
cd\

type %2 > c:\echo11.txt
lpr -S 192.168.10.99 -P M3027 -J error_logs c:\echo11.txt

=======================================

Other commands that can be written in Batch file Script
For copying the error file to other folder
------------------------
copy %2 c:\echo11.txt

type %2 > c:\echo11.txt

For Appending the data of 1 file in another
---------------------
type echo11.txt >> abc.txt




******************************************

Thursday, November 27, 2008

BODI Tips & Tricks

The Data Store Types and Database Types of the Data Integrator repository.


Data Store Types :




Database Types ;




Flat Files :




^^General------ Type……………………………..Delimited or Fixed Width or R/3.
Name……………………………..Any Name
Custom Transfer Program………..Yes/No
Parallel Process Threads…………Any Number(2,5,10…)

Delimited — Select Delimited if the file uses a character sequence
to separate columns
Fixed width — Select Fixed width if the file uses specified widths
for each column.




^^ Data File -- Location …………….Local or JobServer
Root Directory……………C:\DI or C:\ or….
File Name …………………A.txt or *.txt or A.txt, B.txt, C.txt
Delete File………………….YES/NO. Yes If you don’t want
to append any data. If u want
to append the data you can set it to NO.
This option will be visible only if you use the flat file as target.




^^ Delimiters - Column …………Comma, Tab, Semicolon,Space or any special
character like | or ~ or # what ever it is….
Row…………….. {new line}
Text…………….. & or .anything

^^ Default Format- The file format editor provides a way to ignore rows containing a specified marker (or markers) when reading files.

For example, you might want to ignore comment line markers such as # and //.

Use the semicolon to delimit each marker, and use the backslash to indicate special characters as markers (such as the backslash and the semicolon). The default marker value is an empty string. When you specify the default value, no rows are ignored.
Eg : 1) Abc 2) abc;def;ghi 3) abc;\; 4) abc;\\;\;

We can even change the date, time or datetime formats also here.
^^Blank Padding The most important thing is the Blank Padding property will
occur only when you are selecting Fixed Width property for
Type in General.Then the Options are Trailing and Leading.
If we are using Trailing we will be getting right padding of spaces for all the rows of data If we are using Leading then we are getting Left padding of spaces for all the rows.





^^Input/Output-- Style ……………………..Headers,BOF/EOF
Skipped Rows…………….a NUMBER(like 2,5…)
Skip Row Header………….Yes/No
Write Row Header…………Yes/No

Skipped Rows is No: of rows u want to skip at beginning.
If u use Headers then only the other 3 properties work.
If u use BOF/EOF then the properties that appears are ….
BOF Marker………….{None}
EOF Marker………….{None}

^^Custom Transfer In order to activate this we got to check YES in General tab at
the beginning.
Then this tab will have the following :
Program Executable……… Any .Bat or .Exe File
User Name………………
Password…………….
Arguments…………..
The above can be justified by their names itself.

^^Locale------….-- Language………….. English,canada…….etc.
Code Page……………Various Code pages will b there thru
which the data from sources will be loaded into targets.


^^Error Handling Write Errors to File………………..
Error File Directory Location……….
Error File Name…………………
Capture row format error……………
The above can be justified by their names itself.



^^ Create File Format...
Creates a file format from a relational table schema. All data types match those of the original table schema.

To create a file format from an existing flat table schema

From the Query editor, right-click a schema and select Create File format.





#@) If the data in the source flat file is like

"CARE AMBULANCE OF ALABAMA, INC.","01-01252-","RISK"

Here the delimiter is , (Comma). Even the Comma is embedded in the first column data.
In this situation we got to do like this.

In the Delimiter section we got to put data in the following manner.

Column , (Comma)
Text “ (Double Quotes)

So we are explicitly telling data integrator to take text if it is embedded in “ (double quotes). In this way we can solve this issue.



Escape char
A special character sequence that causes Data Integrator to ignore the normal column delimiter. Characters following the escape character sequence are never used as column delimiters. For example, suppose you specify a forward slash as the escape character and a comma as the column delimiter. Then, you must have a forward slash to have a comma appear inside a field.

It’s nothing but to concatenate two words by removing the escape character.


Ignore row marker(s) :
Character sequence, which when found at the beginning of rows, cause Data Integrator to ignore the row when reading the file or automatically creating metadata. To enter multiple character sequences, separate each with a semi-colon. To include a semi-colon or backslash as a marking character, precede with a backslash.

Allowing the data even they r having some escape character.




*@) If you are using MAP-OP (operation) transform then before the Query Transform it should contain the KEY COLUMNS (Primary Key). We should not put all the columns in the Query Transform , We should have ONLY THE COLUMNS that are changing.

*@) If a DataFlow contains more than one Insert Flow Paths to the target, or if you are using more than one Source Table in your DataFlow then
Make Sure that you got to create the Source Paths named DF_SourcePath,DF_SourcePath1 etc. declared in the Frame Works at Workflow Level . The number of Source Paths will depend upon the No: of Insert Flow Paths in your DataFlow.


*@) While developing a Dataflow the Select Statement that is given in the Core Logic of Mapping Logic.Xls document must be put in the Where Condition Of the DataFlow.

*@) While working with flat files, if u have used *.txt make sure of the following :

• Don’t create Output file in the Same directory in which *.txt are there
• Be sure are you in need of all the TXT files in the directory or not


*@ We cannot import a table from a user, if there are indexes on that table.
The only way to import that table is

• Take a Backup of the Script of indexes
• Then Drop the indexes
• Now import the tables.
• Again Create the indexes.

*@ If the target column is of Data type CHAR(20) then while comparing with source column put Rpad() to the Source Column in order to make them equal.

*@ Got to see the Partitioned Columns if there are performanance Issues.

*@ Before asking an issue make a clear study of it (It might be a very silly question sometimes, that’s why be prepared before asking any issue)

*@ Unit Testing of Direct Loading is …..

Record count thru Optimized Sql comparing with Record count in the target table.

Taking the First row thru Optimized Sql and comparing with the First row in the target table. The data should be same.

*@ If we want to check for NULL values in DI, In Oracle or in Toad, we will delete the value of the particular column simply.Dont put any thing like NULL or spaces over there .Then we can compare the column to NULL in DI.(That means If there is no data in a column of a table in oracle , that will be treated as NULL in DataIntegrator)

*@ In Lookups while comparing the columns, we need the expression as source Column names. These expressions wont be validated. When you are running the job then only these expressions are validated.

The most important thing and the error gets regarding this is Some particular column i.e not selected from the table name.Actually this error comes if the expression is not found in that particular flow.

*@If the size of the column is not large enough with the source columns data then the source columns data will be truncated . The remedy is to increase the size of the column in the target table.

*@ To Avoid Initial Duplicate Records we got use Table Comparison, Two MAP Operations one for Insert Normal and other for Update Normal. In the Table Comparison
we will compare the key columns which can be noted from the indexed columns in Toad.
The Source and target columns are dragged from the source transform only. In the target side of Table comparison we will be placing an extra column which is changing.

*@Clean Load is nothing but the first time insertion into the target i.e target will not be having any records, since this is the first time load.

*@if you want to use ’N’ in an SQL( ) then you got to write it as \’N\’

For eg : sql('oracle_ds', ('SELECT "start_timestamp"
FROM "status_table"
WHERE "extract_name" = \'DF_RecoverDim\'
AND "stop_timestamp" = NULL')
Returns the timestamp

Returns the timestamp from a status table for the completed data flow. An Oracle datastore requires double quotation marks around the lowercase column and table names.

*@)While working with Lookup_ext and Lookup_Seq, we are having a special property called CUSTOM SQL, In order to work it out we got to follow the same procedure as we do with the normal Lookup_ext. But in the Lookup Condition we can put the where condition of SQL Stt.(One condition is enough). While returning Columns we got to use the same no: of columns with the same data type as we are using in the Select Stt. In CUSTOM SQL.

*@) In a single DATAFLOW We can have two flows which are having different Sources and different Targets.(There would be no connection between these 2 flows.)

*@) Generates the next value in a series, after determining the last value in the
series.
In Key_generation() function we got to pass 3 parameters.
1) DataStorename.AliasName.TableName (Of UserName, If alias is Created.)
DataStorename.UserName.TableName (If alias is not Created)
2)Column Name
3) Incremental Value.
Eg : DataStore Name……Scott_DS
Alias For User…..Scott_A
Table Name………Emp
Column Name ………Empno
Key_Generation(‘Scott_DS.Scott_A.Emp’,’Empno’,1)
If Alias is not Created for User Scott. Then
Key_Generation(‘Scott_DS.Scott.Emp’,’Empno’,1)


*@)if a Select Statement is returning Multiple Values and you want to check a column value whether it presents in the multiple values then
Deptno_Src in( sql(‘scott’,’select empno from emp where deptno > 20’))
Deptno_Src is the column that is coming from the source.
For Suppose the Deptno should not present in those Multiple values then

Not Deptno_Src in( sql(‘scott’,’select empno from emp where deptno > 20’))

If u want to check this in where Condition u can directly put it in Where condition or u can write the select stt. In Custom Function and call it in Where Condition.

*@) The square brackets ([]) indicate that the value of the expression should be substituted.
# The curly braces ({}) indicate that the value of the expression should be quoted with single quotation marks.

$Gbl_No=10;
$Gbl_Name='BBB';

Sql('SOURCE','Insert Into TST Values(10,\'AAA\')');

Sql('SOURCE','Insert Into TST Values(' || $Gbl_No || ',\'' || $Gbl_Name || '\')');

Sql('SOURCE','Insert Into TST Values([$Gbl_No],{$Gbl_Name})');

*@) If the DF contains a group by, then all the query transforms after the group by will not be pushed down to the DB? And in this case, we have to use a Data_Transfer component to enable the full push-down.

DI can push down some nesting. For example, in DI if you have in a mapping:

sum ( ifthenelse (a=1, 1, 0) )

This gets pushed down to a CASE within a SUM in SQL Server properly, and the group by is done in the database. Complex nesting, it cannot do.

*@) In order to replicate a job, simply take a atl of that job and rename the job,workflows and Dataflows. Then import the atl of the job u just exported.

*@) if a condition is like Column is NULL, if we put that one in Query Transform named Query1 that is put after the Source table, if 2 flows are coming from the Query1 then the data won’t go in 2 flows, the data comes in one flow only , the data doesn’t go thru another flow.

Eg :
*@ If the Column Mapping is like Is Number(Columnname) or Columnname is Numeric.
Then we can do the following.

Is_Valid_Int(Columnname,’9’)






@$)

XML SCHEMA AS SOURCE :

Ingredients : DTD(Document Type Definition)
XSD(XML(Extensible Markup Language) Type Definition)
In which the structure of the table will be available.
XML…in which the data related to the table will be available.
Need to fill :

Format Name…. Any valid name as per the rules
Filename/URL…. Locate the Valid filename /URL.
Root Element Name …. Fill this by seeing the XML file/XSD file. It is the element
that comes in the beginning of the file and at the end of the file.

We can create the New XSD, XML files in XML Schemas itself.
We can create the New DTD,XML files in DTD Schemas itself.




TERMS TO REMEMBER Make Current, Nested, Delete, Element, Tree,Field.
FIELD It is the last object which can’t be further expanded. Also contains, data.
TREE It is the parent of a object. Can have sub trees (Also called Elements). As Called till it can be expanded.
ELEMENT Can be called as Subset of Tree. On the whole we can sayTree as Element.
MAKE CURRENT In order to DELETE a field in a Element we got to make the Element as MAKE CURRENT. It will be more useful if we can do MAKE CURRENT to the parent ELEMENT itself so that we can delete any of the child ELEMENTS that comes under
that PARENT.
NESTED In order to get the fields that are needed to your target table, just make all the ELEMENT’s as NESTED and DELETE the fields that are not needed and keep the fields that are needed.

After making NESTED to all the the ELEMENTS and DELETING all the unnecessary FIELDS, then you can drag the FIELDS into the next Transform.



*@) In the Lookup table almost in all the cases we will be finding the multiple “AND” conditions. But rarely you will be finding multiple “AND”s along with a single/Mutiple “OR” condition. So at this time we are having 2 options.

1) Applying the “AND” conditions first and returning the columns that are needed for “OR” condition. Place another Query Transform and in that apply the “OR” Condition by using the return Columns.
2) Implementing it in two flows and then merging.

*&)In the case of normal LOOKUP(Not the Modify Function Call) We can use Global variables in order to assign the return columns data.

*&) A stored procedure completed with in 4 minutes in db engine is running more than 20 minutes to finish through DI job. How to tune the stored procedure calls for better performance?

print('starting SQL call');
sql('CIP_Staging', 'EXEC SP_GENERATE_GOLDENRECORD');
print('finished SQL call');


DI puts a transaction around the DB calls,

If this is SQL, either put into the SP as the first item (before it does anything to manipulate data)...

SET IMPLICIT_TRANSACTIONS OFF

...or in the script within the sql function.
*&) An ATL has been mailed to u. At which level has it been exported
how can we know?

In Order to find out the content of ATL remember the following and search for them in the following order in the A.T.L :

AL_PROJECT ------- PROJECT

CREATE SESSION -------- JOB

CREATE PLAN ------- WORKFLOW

CREATE DATAFLOW -------- DATAFLOW



*&) Which is best way to use lookups in query transform? Use multiple query transform or single query transform? Shortest way or big data flow?

I have five lookup for target dimension data.
Emp_ID
Department_id
Location_id
City_id
Country_id


Which is best way to use these lookups in data flows?
1: (Query1 + lookupEmpMast) + (Query1 + lookupDepartMast) + (Query1 + lookupLocMast) +… Multiple query transform with each lookup

Or

2 (Query 1 + looupupEmpMast + lookupDepartMast + lookupLocMast …)… Single query transform and all lookups in query.

Q1:How the performance is affected in both ways?
Q2:First method requires more cache memory (temporary space) on server?
A)
having multiple queries does not cost you more memory (almost true) as only pointers are moved in case the column value does not change. So performance wise, multiple queries will not be fast nor slower.

On the other hand, one query is one CPU-thread only, so if you have an unblievable numbers of lookups they would saturate one CPU only leaving the others idle. Splitting those lookups into multiple queries would avoid that. But actually, you can dived one query onto multiple CPUs by using the DegreeOfParallelism Parameter of the dataflow. So again, no reason.

To sum up, I would use one query for handling reasons.

Go to the DataFlow of that query, rightclick it an select properties. There is one field called "Degree of Parallelism"

0....same as Global Setting = 1
1....no partitioning
n....split all (possible) transforms into n parts.

*&)
I have that unbelievable 'Lookups' scenario. Have about 30-40 dimension lookups from a fact table. It is taking a long time to run the Job. 20 Minutes to run a 10000 row load. With the Dim Lookups not enabled, it is running in a minute. We have 4 CPUs on the Windows Job Server.
Any suggestions here for us? I don't want to write a SQl outer join on from Facts to Dims as that'll be not fully using DI's functionality. But this delay is causing us issues. We also have the Business Objects Reporting server on the same machine as the Job Server. I know that could be causing some issues too. There is 3GB RAM on the machine.
Should I use logical partitions along with DoP? We're on a DB2 DB. The fact doesn't have any DB partitions

A)
First, you should never have to do outer-joins from facts to dimensions. If you do, then it's philosophically a design problem. If you have missing data in your facts (i.e. dimension not present), then have an entry in your dimension called "NONE".

That is, if sometimes you have a Customer and sometimes not, in the facts where there is no customer, put a foreign key of "-1". In your Customer dimension, put an entry for "NO CUSTOMER" with a key of -1. Then all joins can be inner joins.

As far as using joins vs. lookups... our rule of thumb has been this:

- If you only need a single column of data from the table being looked up, then lookup() is fine, particularly if you can cache the results. (By the way, be sure to investigate doing a DEMAND_LOAD cache or a PRE_LOAD cache. It can make a huge difference if the table will fit in memory.)

- If you need more than one column of data from the table being looked up, it's probably more efficient to join it in, because then you can reference multiple columns without doing multiple lookups.


Be sure that your dimension tables are properly indexed. Not just the artitrary primary key of the dimensions, but whatever foreign keys you might be using if you're doing "reverse" lookups (meaning customer ID "ABCDE" from your source system is really customer ID 12345 in your dimension table.)

10000 rows -- even with 30-40 lookups -- should be do-able in 5 minutes or less. However if you have users doing reporting on the fact table in the middle of this, you're sure to have locking issues (along with data consistency problems).

3Gb on one machine to include both DI and BO seems a bit light, especially if you're going to try caching everything in sight.

-------------------------------------------------------
I assume you are doing the lookups to get the surrogate keys of all or dimensions. Otherwise I fully agree with the above one. Do you know now why I favor natural keys if possible?

Nevertheless you can limit the impact of using surrogate keys in dimension by using a trick in the data model.

Let's assume you have three source tables, customer, accounts and revenues. Obviously, customers and accounts will be loaded in the DIM_CUSTOMER and DIM_ACCOUNTS, both with surrogate keys as primary key. And when loading the revenues into the fact you need two lookups...... No you don't!

It would not be a big deal to have an extra column in the DIM_ACCOUNTS called ACCOUNT_CUSTOMER (each account belongs to customer) and you lookup the surrogate key of the customer while loading this DIM_ACCOUNTS. And if there are other dimension tables as well, like CUSTOMER_REGION, you load that surrogate key into the DIM_CUSTOMER and copy it to DIM_ACCOUNTS as well.

So at the end, when loading the fact, you have to lookup the account only and you get the other dimension keys for free.

So if your 20 tables are all independant - you do not benefit from that technique. But often you can use it.

There is another positive side effect, you will reduce not only the numbers of lookups but also the number of lookup calls. As there are far less customers than fact rows, you call the lookup for the region just customer-times, not as often as there are fact rows.

*&)
Source table has 6 million rows. Some of them are duplicates that cannot be fixed at the source side. Therefore I have to take care in ETL which I am doing through "AUTOCORRECT"

It's taking 2 days to load data from source to target.

Is there any other way to make it faster? I think Lookup / Table Comparison will not help because of the source size.

What other ways do I have?

A)
When dealing with autocorrect load, make sure the target table has a primary key and DI does actually know about, meaning in Designer you can see it.

Let's assume I have a table called CUSTOMER ( CUST_ID number, gender varchar(1));

Depending on the primary key as know within DI, autocorrect load defines the where clause: "update ....where CUST_ID = :Parameter". In case no PK is found we assume all columns are primary keys.

So it is a very good idea to make sure CUST_DI is a primary key otherwise a wrong where clause is generated. Second, it would be benefitial to have a physical primary key as well, otherwise there is no index the database can use to find this one CUST_ID efficently.

&*) LINKED DATASTORES :

First of all we got to create a datastore naming dst1 connecting with Oracle database and give the alias name as DS_OWNER1 and make this as the default configuration,. We got to create another configuration for the same datastore naming config2 in which we will be connected to SQL SERVER database with the alias name as DS_OWNER2.

The crutial thing comes now, we got to create a database link in the oracle database by using the following syntax for a particular user.

create database link lnk connect to scott identified by tiger using 'oradb9';

where oradb9 is the connection string.

After successfully completing all the above steps, edit the datastore dst1,

Click on the advanced button, then click on the Edit button of Configurations,
now go to the alias config1, there click on the LINKED DATASTORES option i.e available over there. After selecting the Datastore dst1 over there u got to click on the browse button over there, then check the use the database link and select the link u want over there.

*@) COMPACT REPOSITORY :


Data Integrator saves a version of each object every time you save the object.
Repeatedly modified object definitions can consume a substantial amount of
space. If you notice your repository performance degrading, consider
compacting the repository.

To access the Compact Repository command, select Project > Compact
Repository from the menu bar. This command removes previous object
versions maintained by Data Integrator.
You can also compact your repository manually. If you have never compacted
the repository, the majority of space in the repository could be occupied by old
versions of Data Integrator objects. In this case, the Compact Repository
Export/import
Backing up repositories 4

Data Integrator Advanced Development and Migration Guide
command might be too slow and tedious. Instead, you can export the latest
versions of the repository object definitions to a file, clear the repository
database by creating a new repository, then reimport the object definitions.
�� To compact your repository by creating a new repository
1. Export the repository to a file.
The file type can be either XML or ATL. The latest version of each object
is exported.
2. Choose Repository Manager from the Start > Programs > Data
Integrator menu.
3. From the Repository Manager, add the database connection information
for the repository.
4. Click Create.
Data Integrator warns that a valid repository already exists.
5. Click Yes to overwrite the old repository.
The Repository Manager creates a new repository, removing all of the old
objects.
6. Import the previously exported repository.

*@#) What are the different REPLICATABLE and NON-REPLICATABLE objects in D I?

REPLICATABLE OBJECTS NON-REPLICATABLE OBJECTS
Workflow Database Function
Dataflow XML Schema
Batch Job Data Store
Custom Functions Tables, Template tables
Flat Files

In Order to replicate the Non-Replicatable Objects, we got to take the A.T.L of that object and replace the existing name of it to the desired new name and then save it, import that A.T.L. Automatically a new object with the new name will be visible in the Designer.






Select this check box to use auto correct loading.
Auto correct loading ensures that the same row is not duplicated in
a target table. This is particularly useful for data recovery
operations.
Note: This option is not available for targets in real time jobs
or target tables that contain LONG column(s).
When you select this check box, Data Integrator:
1. Reads a row from the source.
2. Checks if a row exists in the target table with the same
values in the primary key. If the Use input keys check
box is selected, Data Integrator uses the primary key of
the source table. Otherwise, Data Integrator uses the
primary key of the target table; if the target table has no
primary key, Data Integrator considers the primary key to
be all the columns in the target.
• If a matching row does not exist, inserts the new row
regardless of other options.
• If a matching row exists, updates the row depending
on the values of Ignore columns with value and
Ignore columns with null:
• When the column data from the source matches
the value in Ignore columns with value, the
corresponding column in the target table is not
updated. The value may be spaces. Otherwise,
the corresponding column in the target is
updated with the source data.
• When the Ignore columns with null check box is
selected and the column data from the source is
NULL, then the corresponding column in the
target table is not updated. Otherwise, the
corresponding target column is updated as
NULL since the source column is NULL.
Data Integrator can optimize data flows such that an Oracle
source database completes the auto correct load operation.
When all other operations in the data flow can be pushed
down to the source database, the auto-correct loading
operation is also pushed down. The generated SQL
implements the Ignore columns with value and Ignore
columns with null options.
Tab Option Description





STAGING TABLES AUTOMATED
Occasionally, one dataflow became so complex or required so much memory that we split the dataflow into parts with staging tables inbetween.
The very same thing can be done with the new transform "Data_Transfer". Wherever you place that transform inside your complex dataflow, the engine will actually split the dataflow into two (or more) parts just as we did manually before. The major advantage is, such a Data_Transfer object can be reconnected or even disabled. Much easier to change the split points. Try to move one transform from one dataflow into the other of your manually broken up dataflow.

*&$) If we are having the TimeStamp Columns in the source tables and that to if we are using them in the SQL TRANSFORM then

1)Make sure to convert them to varchar using TO_CHAR() in the SQL TRANSFORM itself.

2) In the next QUERY TRANFORM convert the same to date by using TO_DATE().

3) Change the datatype in the QUERY TRANFORM for the particular column with DATE.

*&) If the column in the target table is having NUMBER Datatype with no Size and Precision
in the database then

1) By Default the datatype will be DECIMAL(28,7).

Change the datatype of that particular column in the QUERY TRANSFORM just before the target table with DECIMAL(28,0)























































ORACLE

*&) TRUNC(DBMS_RANDOM.VALUE(1,100))

DBMS_RANDOM.VALUE(range) is used to get a random value from the specified range mentioned in it. Since it returns a decimal value according to the needs we can use TRUNC() to make it an integer.

Eg :

select b.*,trunc(dbms_random.value(1,100)) from emp b order by trunc(dbms_random.value(1,100));

@#) While calling a procedure/Function thru EXECUTE IMMEDIATE we got to pass a parameter in a procedure/Function and it can be of any as below.One more thing we got to write USING IN OUT in order to use the parameters.

declare
k varchar2(30):='ramesh';
a varchar2(10):='prc';
begin
execute immediate ' begin '|| a || '(:1); end;'
using IN OUT k;
dbms_output.put_line(k);
end;
*$) In order to get the date after the difference on the basis of month,year we can use

NUMTOYMINTERVAL()

We can use this for both year and month.

If we want the difference with days , we can use the no: of days directly without using any function.

Eg : select to_char((sysdate - NUMTOymINTERVAL(3, 'month')),'dd-mon-yyyy') from dual;

select to_char((sysdate - NUMTOymINTERVAL(3, 'year')),'dd-mon-yyyy') from dual;

select sysdate – 20 from dual;


*$) In order to get the date after the difference on the basis of day,hour,minute,second we can use
NUMTODSINTERVAL()

We can use this for day,hour,minute,second.

eg: select to_char((sysdate - NUMTOdsINTERVAL(3, 'hour')),'dd-mon-yyyy hh:mi:ss') from dual;


*@$)Oracle error message for operation : 1.


#@) DATABASE LINK SYNTAX

create database link lnk connect to scott identified by tiger using 'oradb9';

Business Objects Data Integrator

1) Is it necessary to put Input Primary key column in Table Comparison Transform …A) It is mandatory to select primary key column

2) Parallel Processing

3) Soure table Partitioning in Data Integrator , How will u load the data?

4) Performance tuning of a job

5) Parameters in Dataflow

6) SCD Type 2 flow

7) Audit Strategy Implementation in the project

8) Different logs available in the Data Integrator

9) Different Types of Embedded dataflows? Are they reusable.

10) Passing the parameters of one dataflow to other dataflows.

11) Degree of Parallelism. Where we will be setting this?

12) Bulk Loading

Bulk Loading :

You can bulk load to Oracle using an API or a staging file:
• If you select the API method, Data Integrator accesses the direct path engine of Oracle’s database server associated with the target table and connected to the target database. Using Oracle’s Direct-Path Load API, input data feeds directly into database files. To use this option, you must have Oracle version 8.1 or later.


Logs :

Using Data Integrator logs
• Examining trace logs
• Examining statistics logs
• Examining error logs

 Use the trace logs to determine where an execution failed, whether the execution steps occur in the order you expect, and which parts of the execution are the most time consuming.

 The statistics log (also known as the monitor log) quantifies the activities of the components of the job. It lists the time spent in a given component of a job and the number of data rows that streamed through the component.

 Data Integrator produces an error log for every job execution. Use the error logs to determine how an execution failed. If the execution completed without error, the error log is blank.

Examining target data

The best measure of the success of a job is the state of the target data.

Always examine your data to make sure the data movement operation
produced the results you expect. Be sure that:

• Data was not converted to incompatible types or truncated.
• Data was not duplicated in the target.
• Data was not lost between updates of the target.
• Generated keys have been properly incremented.
• Updated values were handled properly.

Embedded Data Flow

An embedded data flow is a data flow that is called from inside another data flow. Data passes into or out of the embedded data flow from the parent flow through a single source or target. The embedded data flow can contain any number of sources or targets, but only one input or one output can pass data
to or from the parent data flow.

There are four types of embedded data flows:

One input ---------Add an embedded data flow at the end of a data flow

One input and one output---------- Add an embedded data flow in the middle of a data flow.

One output --------Add an embedded data flow at the beginning of a data flow

No input or output---------- Replicate an existing data flow.

Partitioning :

You can set Data Integrator to perform data extraction, transformation, and loads in parallel by setting parallel options for sources, transforms, and targets.

In addition, you can set individual data flows and work flows to run in parallel by simply not connecting them in the workspace.

If the Data Integrator Job Server is running on a multi-processor computer, it takes full advantage of available CPUs.

Parallel Execution in data flows

* Table partitioning
• Degree of parallelism
* Combining table partitioning and a degree of parallelism
• File multi-threading


Table Partitioning

Data flow with source partitions only
Data flow with target partitions only
Dataflow with source and target partitions

Data Integrator instantiates a source thread for each partition, and these threads run in parallel. The data from these threads later merges into a single stream by an internal merge transform before processing the query.

Data Integrator inserts an internal Round Robin Splitter (RRS) transform after the Query transform, which routes incoming rows in a round-robin fashion to internal Case transforms. The Case transforms evaluate the rows to determine the partition ranges. Finally, an internal Merge transform collects the incoming rows from different Case transforms and outputs a single stream of rows to the target threads. The Case, Merge, and the target threads execute in parallel.

Degree Of Parallelism :

You can run transforms in parallel by entering a number in the Degree of Parallelism box on a data flow’s Properties window. The number is used to replicate transforms in the data flow which run as separate threads when the Job Server processes the data flow.


*** Entering a number in the Degree of Parallelism box on a data flow’s Properties window.

*** The number is used to replicate transforms in the data flow