Key Generation Function in BODS
When you import a table with an Identity column, it is treated like any other regular INT datatype column. When you then load it, the database raises an error: 'An explicit value cannot be inserted into an Identity column if IDENTITY_INSERT is set to OFF'.
But how can we load a table that has an Identity column?
First, simply do not set any column to type Identity. DI has the Key Generation transform to provide values for surrogate keys. It works similarly to the identity column logic: take the highest key value and increment it by one. However, the Identity column checks the highest value for each and every row, whereas the Key Generation transform checks it only once, at the beginning of the dataflow. So you cannot use Key Generation if:
- two DI dataflows run in parallel loading the same table. Both will use the same key values.
- DI loads the table while some other application inserts data at the same time.
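The first case can be sketched in a few lines of Python. This is only a simulation of the read-the-maximum-once logic described above, not DI code; the function name `generate_keys` and the starting value of 100 are made up for illustration.

```python
def generate_keys(start_max, row_count):
    """Return keys the way a read-max-once generator would: max+1, max+2, ..."""
    return [start_max + offset for offset in range(1, row_count + 1)]

# Hypothetical target table whose highest key is currently 100.
current_max = 100

# Both dataflows read the maximum before either one has written a row.
flow_a_keys = generate_keys(current_max, 3)
flow_b_keys = generate_keys(current_max, 3)

# Every key handed out by flow A was also handed out by flow B.
collisions = sorted(set(flow_a_keys) & set(flow_b_keys))
print(flow_a_keys)  # [101, 102, 103]
print(flow_b_keys)  # [101, 102, 103]
print(collisions)   # [101, 102, 103]
```

The same collision happens when the second writer is another application rather than a second dataflow: the starting value is stale as soon as anybody else inserts a row.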
What is the problem with identity columns, actually? DI adds them to the insert statement even if they have a NULL value, and SQL Server complains. It would not be a problem if SQL Server looked at the column value, figured out there is a NULL value and hence ignored the column by itself. But no, the logic for SQL Server is: the Identity column has to be omitted from the insert statement.
Actually, this can be accomplished in DI quite easily. DI does not base the insert/update/delete statement on the table schema; it uses the columns of the input schema. So all we have to do is remove the identity column from the query before the table loader. And as this column is the physical primary key of the table, we have to check the flag 'use input keys' in the table loader.
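To illustrate the idea of deriving the insert statement from the input columns only, here is a minimal Python sketch. It uses an in-memory SQLite table with an AUTOINCREMENT key as a stand-in for a SQL Server Identity column, which is an assumption made so the example can run anywhere; the table, the column names and the helper `build_insert` are hypothetical.

```python
import sqlite3

def build_insert(table, input_columns):
    """Build an INSERT that lists only the columns present in the input schema."""
    column_list = ", ".join(input_columns)
    placeholders = ", ".join("?" for _ in input_columns)
    return f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})"

conn = sqlite3.connect(":memory:")
# SQLite stand-in for an identity column: the database assigns customer_key.
conn.execute(
    "CREATE TABLE customer ("
    "customer_key INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, city TEXT)"
)

# The key column has been removed from the input schema, so it never
# shows up in the generated statement.
input_columns = ["name", "city"]
insert_sql = build_insert("customer", input_columns)
conn.executemany(insert_sql, [("Ada", "London"), ("Linus", "Helsinki")])

rows = conn.execute(
    "SELECT customer_key, name FROM customer ORDER BY customer_key"
).fetchall()
print(insert_sql)  # INSERT INTO customer (name, city) VALUES (?, ?)
print(rows)        # [(1, 'Ada'), (2, 'Linus')]
```

Because the key column is never mentioned, the database is free to assign the values itself.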
Just keep in mind, the above works for insert statements only. In update statements the primary key column is not updated anyway; it is used in the where clause only. Hence, for deletes/updates not only is there no problem, but the key column is actually required. That makes a dataflow with mixed inserts and updates quite ugly. We need two separate streams of data: updates go into the table unchanged, while inserts are converted to normal rows so we can add a query downstream where the identity column is omitted.
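The two-stream layout can be sketched as follows. This is plain Python over (op code, row) pairs rather than a DI dataflow, and the column name `customer_key` as well as the function `split_by_opcode` are hypothetical.

```python
IDENTITY_COLUMN = "customer_key"  # hypothetical identity column name

def split_by_opcode(rows):
    """Route rows into an insert stream (identity column dropped) and a keyed stream."""
    insert_stream, keyed_stream = [], []
    for opcode, row in rows:
        if opcode == "insert":
            # Inserts become plain rows without the identity column.
            insert_stream.append(
                {col: val for col, val in row.items() if col != IDENTITY_COLUMN}
            )
        else:
            # Updates and deletes pass through unchanged; the key feeds the where clause.
            keyed_stream.append((opcode, row))
    return insert_stream, keyed_stream

rows = [
    ("insert", {"customer_key": None, "name": "Ada"}),
    ("update", {"customer_key": 7, "name": "Linus"}),
    ("delete", {"customer_key": 9, "name": "Grace"}),
]
insert_stream, keyed_stream = split_by_opcode(rows)
print(insert_stream)  # [{'name': 'Ada'}]
print(keyed_stream)
```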
A completely different approach is to do what the SQL Server error message hints at: issue the command 'set IDENTITY_INSERT table ON'. This is a session command, so it needs to be part of the session loading the data. A sql() function call cannot be used, as this would be a session of its own. The only place where you can add it is the Preload Command in the table loader.
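As a small sketch, the command text for a given table could be assembled like this. The helper `identity_insert_command` and the table name `dbo.CUSTOMER` are hypothetical, and generating the matching OFF statement is an addition of this example, not something the text above prescribes.

```python
def identity_insert_command(table, enabled):
    """Return the T-SQL session command that switches IDENTITY_INSERT for one table."""
    state = "ON" if enabled else "OFF"
    return f"SET IDENTITY_INSERT {table} {state}"

# Text to paste into the Preload Command of the table loader.
preload = identity_insert_command("dbo.CUSTOMER", True)
# Counterpart statement, should the setting need to be reverted in the same session.
revert = identity_insert_command("dbo.CUSTOMER", False)

print(preload)  # SET IDENTITY_INSERT dbo.CUSTOMER ON
print(revert)   # SET IDENTITY_INSERT dbo.CUSTOMER OFF
```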
Neither of these solutions is perfect. So in the long term, a feature has to be implemented where the end user can simply choose which method should be used: let DI generate the key, or omit the identity column in inserts.
Although the Key Generation Transform actually does execute a SQL statement, I put it into the Streamline category because it executes one and only one statement, at the start of the transform, and this is a 'select max(key) from table'. In almost all cases this key will be the primary key column, which is indexed. So to find the max() the database only has to walk to the last element in the key index, an operation done in no time. Based on this starting value, the transform increases the value by one for each row with OP code Insert or Normal and puts the value into the 'Generated Key Column'.
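The logic just described can be sketched in Python. This is an approximation of the behaviour, not the transform's implementation: SQLite stands in for the target database, the table and column names are made up, and starting from 0 when the table is empty is an assumption of this sketch.

```python
import sqlite3

def key_generation(conn, table, key_column, rows):
    """Assign max+1, max+2, ... to rows whose op code is insert or normal."""
    # One and only one query, issued before any row is processed.
    (max_key,) = conn.execute(f"SELECT MAX({key_column}) FROM {table}").fetchone()
    next_key = max_key or 0  # assumption: an empty table starts counting from 0
    output = []
    for opcode, row in rows:
        if opcode in ("insert", "normal"):
            next_key += 1
            row = {**row, key_column: next_key}
        # Rows with any other op code keep whatever key they arrived with.
        output.append((opcode, row))
    return output

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (key_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO target VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])

incoming = [
    ("normal", {"key_id": None, "name": "d"}),
    ("update", {"key_id": 2, "name": "b2"}),
    ("insert", {"key_id": None, "name": "e"}),
]
result = key_generation(conn, "target", "key_id", incoming)
print([row["key_id"] for _, row in result])  # [4, 2, 5]
```

After the single max() lookup, everything else is an in-memory counter, which fits the figures below: CPU only, no memory and no disk.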
Throughput | 237'000 rows/sec |
CPU | 100% |
Memory | none |
Disk | none |
DoP | singularization point |
Transform Usage
Transform Settings
To test it, we had to rename the column from DI_ROW_ID to KEY_ID, which is the primary key name of another table.
The execution time for the above flow without the key generation was 84 secs, so the overhead for the query is about 20 secs. The execution time with the Key Generation Transform added as well was 422 seconds, which is quite surprising.
Anyway, no memory is consumed, only CPU, and the throughput is still an excellent 237'000 rows per second.
Attachment
Job_Key_Generation_Transform.zip (4.98 KB)