How a Surrogate Key Is Generated
Generally, a surrogate key is a sequential, unique number generated by the database itself (SQL Server, for example) rather than taken from the business data. Its purpose is to act as the primary key. The two terms are not quite synonyms: a primary key is whichever column uniquely identifies a row, and it can be either a natural key or a surrogate key. If the identifier you choose is not a natural key but a system-generated one, it is called a surrogate key; in other words, you use a surrogate key as the primary key. For example, in a customer table with the columns customerid and customernumber, the system-generated customerid is the surrogate key and the business-assigned customernumber is the natural key. These are just terms to describe the table design.
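The distinction between the two kinds of key can be sketched with a small in-memory table. This is only an illustration: SQLite stands in for the database, and the table name, the name column and the sample values are assumptions made up for the example.

```python
import sqlite3

# Hypothetical customer table; SQLite is used here only as a stand-in database.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer (
        customerid     INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        customernumber TEXT NOT NULL UNIQUE,               -- natural key
        name           TEXT NOT NULL
    )
    """
)

# Only the business data is supplied; the database generates customerid.
rows = [("C-1001", "Alice"), ("C-2040", "Bob")]
conn.executemany(
    "INSERT INTO customer (customernumber, name) VALUES (?, ?)", rows
)
conn.commit()

result = conn.execute(
    "SELECT customerid, customernumber FROM customer ORDER BY customerid"
).fetchall()
print(result)
```

The surrogate values carry no business meaning, so the customernumber format can change in the source system without affecting rows that reference customerid.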
Goal
Fill in a data warehouse dimension table with data which comes from different source systems and assign a unique record identifier (surrogate key) to each record.
Scenario overview and details
To illustrate this example, we will use two made-up sources of information that provide data for the customer dimension. Each extract contains customer records, each with a business key (natural key) assigned to it.
In order to isolate the data warehouse from source systems, we will introduce a technical surrogate key instead of re-using the source system's natural (business) key.
A single, common surrogate key is a one-field numeric key that is shorter, easier to maintain and understand, and less dependent on changes in the source systems than a business key. Also, if the surrogate key generation process is implemented correctly, adding a new source system to the data warehouse processing will not require major effort.
The surrogate key generation mechanism may vary depending on the requirements; however, the inputs and outputs usually fit into the design shown below:
Inputs:
- an input represented by an extract from the source system
- a data warehouse reference table for identifying the existing records
- maximum key lookup
Outputs:
- output table or file with newly assigned surrogate keys
- new maximum key
- updated reference table with new records
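The inputs and outputs listed above map naturally onto a single function. The following is a minimal sketch, not a definitive implementation: the function name, the CUST_ID field and the in-memory dictionaries are assumptions made for illustration, and WH_CUST_NO is the surrogate key field used in this document's example.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


def assign_surrogate_keys(
    extract: List[Record],
    reference: Dict[str, int],
    max_key: int,
    natural_key_field: str = "CUST_ID",  # hypothetical field name
) -> Tuple[List[Record], Dict[str, int], int]:
    """Return (keyed output, updated reference table, new maximum key)."""
    updated_ref = dict(reference)  # leave the caller's reference untouched
    output: List[Record] = []
    for record in extract:
        natural_key = str(record[natural_key_field])
        if natural_key not in updated_ref:
            # Unknown natural key: take the next number after the maximum.
            max_key += 1
            updated_ref[natural_key] = max_key
        output.append({**record, "WH_CUST_NO": updated_ref[natural_key]})
    return output, updated_ref, max_key


extract = [
    {"CUST_ID": "A17", "NAME": "Alice"},
    {"CUST_ID": "B02", "NAME": "Bob"},
]
reference = {"A17": 1}  # A17 was keyed in an earlier load
output, new_reference, new_max = assign_surrogate_keys(extract, reference, 1)
print(output, new_reference, new_max)
```

Here A17 keeps its existing key, while B02 receives the next free number and is added to the reference table.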
Proposed solution
Assumptions:
- The surrogate key field for our made up example is WH_CUST_NO.
- To make the example clearer, we will use SCD Type 1 (slowly changing dimension, type 1) to handle changes in the dimension. This means that new records overwrite the existing data.
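The overwrite behaviour of SCD 1 can be sketched in a few lines. This is an illustrative sketch only: the dimension is modelled as a dictionary keyed by WH_CUST_NO, and the function name and the attribute names are assumptions.

```python
from typing import Dict

Dimension = Dict[int, Dict[str, str]]


def apply_scd1(dimension: Dimension, wh_cust_no: int, attributes: Dict[str, str]) -> None:
    """Insert the row, or overwrite it in place; no history is kept."""
    dimension[wh_cust_no] = dict(attributes)


d_customer: Dimension = {}
apply_scd1(d_customer, 1, {"NAME": "Alice", "CITY": "Leeds"})
# The customer moves: the old city is overwritten, not versioned.
apply_scd1(d_customer, 1, {"NAME": "Alice", "CITY": "York"})
print(d_customer)
```

After the second call the dimension still holds a single row for key 1, and the previous city is no longer recoverable from it.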
The ETL process implementation requires several inputs and outputs.
Input data:
- customers_extract.csv - first source system extract
- customers2.txt - second source system extract
- CUST_REF - a lookup table which contains mapping between natural keys and surrogate keys
- MAX_KEY - a sequence number which represents last key assignment
Output data:
- D_CUSTOMER - table with new records and correctly associated surrogate keys
- CUST_REF - new mappings added
- MAX_KEY sequence increased
The design of an ETL process for generating surrogate keys will be as follows:
- CUST_REF table (natural key to surrogate key mappings)
- MAX_KEY sequence (last assigned key)
- generate a new surrogate key for each record that does not yet have a mapping and assign it to the record. The new key is obtained by incrementing the old maximum key by 1.
- insert a new record into the customer dimension table (D_CUSTOMER)
- insert a new record into the mapping table (which stores the mapping between business keys and surrogate keys)
- store the new maximum key
Sample Implementations
Surrogate key generation implemented in various ETL environments:
- PDI surrogate key - surrogate key generation example implemented in Pentaho Data Integration