How Is a Surrogate Key Generated

admin 17.12.2020 17.12.20

Generally, a surrogate key is a sequential, unique number generated by the database itself (for example, by SQL Server). Its purpose is to act as the primary key. There is a slight difference between the two terms: "primary key" names the role of uniquely identifying a row, while "surrogate" describes where the key values come from. Ideally, every row has a primary key, and a surrogate key can fill that role. If the key you choose is not a natural key but a system-generated identifier, it is called a surrogate key; in other words, you use a surrogate key as the primary key. For example, in a customer table that has both columns, customerid is a surrogate key and customernumber is the natural key. These are just terms to describe the table design.
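As a minimal sketch of the customerid / customernumber distinction, the Python snippet below uses the standard sqlite3 module with an in-memory database. The table name customer, the name column, and the helper add_customer are assumptions made for illustration; SQLite stands in for whatever database is actually used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer (
        customerid     INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, generated by the database
        customernumber TEXT NOT NULL UNIQUE,               -- natural key, supplied by the business
        name           TEXT NOT NULL                       -- hypothetical descriptive column
    )
    """
)


def add_customer(customernumber, name):
    """Insert a customer and return the surrogate key the database assigned."""
    cur = conn.execute(
        "INSERT INTO customer (customernumber, name) VALUES (?, ?)",
        (customernumber, name),
    )
    return cur.lastrowid


first = add_customer("C-1001", "Alice")
second = add_customer("C-2002", "Bob")
```

The caller never supplies customerid; the database generates it, while the UNIQUE constraint keeps the natural key from being duplicated.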

Goal

Fill in a data warehouse dimension table with data which comes from different source systems and assign a unique record identifier (surrogate key) to each record.

Scenario overview and details

To illustrate this example, we will use two made-up source systems that provide data for the customer dimension. Each extract contains customer records with a business key (natural key) assigned to each record.
In order to isolate the data warehouse from the source systems, we will introduce a technical surrogate key instead of re-using the source system's natural (business) key.
A unique and common surrogate key is a one-field numeric key. Compared with a business key, it is shorter, easier to maintain and understand, and independent of changes in the source systems. Also, if the surrogate key generation process is implemented correctly, adding a new source system to the data warehouse processing will not require major effort.
The surrogate key generation mechanism may vary depending on the requirements; however, the inputs and outputs usually fit into the design shown below:
Inputs:
- an input represented by an extract from the source system
- a data warehouse reference table for identifying the existing records
- maximum key lookup
Outputs:
- output table or file with newly assigned surrogate keys
- new maximum key
- updated reference table with new records
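The inputs and outputs listed above can be sketched as a single function. This is only an illustration under stated assumptions: the function name assign_surrogate_keys, the dictionary field names natural_key and surrogate_key, and the sample records are all hypothetical, and the reference table is modelled as a plain Python dict.

```python
def assign_surrogate_keys(extract, key_ref, max_key):
    """Return (output_rows, updated_ref, new_max_key); the inputs are not mutated.

    extract : list of dicts, each with a "natural_key" field (hypothetical layout)
    key_ref : dict mapping natural key -> surrogate key (the reference table)
    max_key : the last surrogate key that was assigned
    """
    updated_ref = dict(key_ref)
    output_rows = []
    for row in extract:
        natural_key = row["natural_key"]
        if natural_key not in updated_ref:
            # Unknown record: hand out the next key in the sequence.
            max_key += 1
            updated_ref[natural_key] = max_key
        output_rows.append({**row, "surrogate_key": updated_ref[natural_key]})
    return output_rows, updated_ref, max_key


rows, ref, new_max = assign_surrogate_keys(
    extract=[
        {"natural_key": "A17", "name": "Alice"},  # already known to the warehouse
        {"natural_key": "B42", "name": "Bob"},    # new record
    ],
    key_ref={"A17": 1},
    max_key=1,
)
```

Returning new values instead of updating shared state keeps the three outputs explicit, which makes the step easy to test before it is wired to real tables.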

Proposed solution

Assumptions:
- The surrogate key field for our made-up example is WH_CUST_NO.
- To make the example clearer, we will use SCD type 1 (slowly changing dimension, type 1) to handle changing dimensions. This means that new records overwrite the existing data.
The ETL process implementation requires several inputs and outputs.
Input data:
- customers_extract.csv - first source system extract
- customers2.txt - second source system extract
- CUST_REF - a lookup table which contains mapping between natural keys and surrogate keys
- MAX_KEY - a sequence number which represents last key assignment
Output data:
- D_CUSTOMER - table with new records and correctly associated surrogate keys
- CUST_REF - new mappings added
- MAX_KEY sequence increased
The design of an ETL process for generating surrogate keys will be as follows:

The loading process will be executed twice, once for each of the input files.

Check if the lookup reference data is correct and available:
- CUST_REF table
- MAX_KEY sequence

Read the extract and first check if a record already exists. If it does, assign the existing surrogate key to it and update the descriptive data in the main dimension table.

If it is a new record, then:

- generate a new surrogate key and assign it to the record. The new key is obtained by incrementing the old maximum key by 1.
- insert a new record into the customer dimension table (D_CUSTOMER)
- insert a new record into the mapping table (CUST_REF, which stores the mapping between business and surrogate keys)
- update the maximum key (MAX_KEY) to the new value
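The steps above can be sketched in Python with the standard sqlite3 module. This is not the Pentaho implementation referenced later, only a rough illustration. D_CUSTOMER, CUST_REF, MAX_KEY and WH_CUST_NO come from the text; the columns CUST_NAME, NATURAL_KEY and LAST_KEY, the function load_extract, and the sample records are assumptions. The sketch also assumes that both source systems use the same natural key for the same customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE D_CUSTOMER (WH_CUST_NO INTEGER PRIMARY KEY, CUST_NAME TEXT);
    CREATE TABLE CUST_REF (NATURAL_KEY TEXT PRIMARY KEY, WH_CUST_NO INTEGER NOT NULL);
    CREATE TABLE MAX_KEY (LAST_KEY INTEGER NOT NULL);
    INSERT INTO MAX_KEY (LAST_KEY) VALUES (0);
    """
)


def load_extract(records):
    """Load (natural_key, name) pairs into D_CUSTOMER using SCD type 1."""
    with conn:  # one transaction per extract: commit on success, roll back on error
        for natural_key, name in records:
            found = conn.execute(
                "SELECT WH_CUST_NO FROM CUST_REF WHERE NATURAL_KEY = ?",
                (natural_key,),
            ).fetchone()
            if found:
                # Existing record: keep its surrogate key, overwrite descriptive data.
                conn.execute(
                    "UPDATE D_CUSTOMER SET CUST_NAME = ? WHERE WH_CUST_NO = ?",
                    (name, found[0]),
                )
                continue
            # New record: the new key is the old maximum key plus 1.
            new_key = conn.execute("SELECT LAST_KEY FROM MAX_KEY").fetchone()[0] + 1
            conn.execute("INSERT INTO D_CUSTOMER VALUES (?, ?)", (new_key, name))
            conn.execute("INSERT INTO CUST_REF VALUES (?, ?)", (natural_key, new_key))
            conn.execute("UPDATE MAX_KEY SET LAST_KEY = ?", (new_key,))


# Hypothetical records standing in for the two source extracts.
load_extract([("A17", "Alice"), ("B42", "Bob")])
load_extract([("B42", "Robert"), ("C03", "Carol")])
```

Running the load once per extract mirrors the two-pass design: the second pass overwrites the name of the customer it shares with the first pass and assigns a fresh key only to the customer that is new.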

Sample Implementations

Surrogate key generation implemented in various ETL environments:
- PDI surrogate key - surrogate key generation example implemented in Pentaho Data Integration
