How Is Surrogate Key Generated

  

Generally, a surrogate key is a sequential unique number generated by the database itself (SQL Server, for example). The purpose of a surrogate key is to act as the primary key. There is a slight difference between the two terms: "primary key" describes the role a column plays in the table, while "surrogate key" describes where its values come from. If the key you choose is not a natural key but a system-generated identifier, it is called a surrogate key. In other words, you use a surrogate key as the primary key. For example, in a customer table, a system-generated customerid is a surrogate key and the business-assigned customernumber is the natural key. These are just terms to describe the table design.
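
The distinction can be sketched with a small table. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the name column and the sample customer numbers are made up for the example.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for any RDBMS (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        customerid     INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, generated by the database
        customernumber TEXT NOT NULL UNIQUE,               -- natural (business) key
        name           TEXT                                -- hypothetical descriptive column
    )
""")

# Only the natural key and descriptive data are supplied;
# the database assigns customerid on its own.
conn.executemany(
    "INSERT INTO customer (customernumber, name) VALUES (?, ?)",
    [("CN-0042", "Alice"), ("CN-0107", "Bob")],
)

rows = conn.execute(
    "SELECT customerid, customernumber FROM customer ORDER BY customerid"
).fetchall()
print(rows)  # [(1, 'CN-0042'), (2, 'CN-0107')]
```

The surrogate key carries no business meaning, so the customer number can change in the source system without affecting any row that references customerid.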

Goal

Fill in a data warehouse dimension table with data which comes from different source systems and assign a unique record identifier (surrogate key) to each record.

Scenario overview and details

To illustrate this example, we will use two made-up sources of information that provide data for the customer dimension. Each extract contains customer records, each with a business key (natural key) assigned to it.
In order to isolate the data warehouse from the source systems, we will introduce a technical surrogate key instead of reusing the source system's natural (business) key.
A unique and common surrogate key is a one-field numeric key which is shorter, easier to maintain and understand, and less dependent on changes in the source systems than a business key. Also, if the surrogate key generation process is implemented correctly, adding a new source system to the data warehouse processing will not require major effort.
The surrogate key generation mechanism may vary depending on the requirements; however, the inputs and outputs usually fit into the design shown below:
Inputs:
- an input represented by an extract from the source system
- data warehouse reference table for identifying the existing records
- maximum key lookup
Outputs:
- output table or file with newly assigned surrogate keys
- new maximum key
- updated reference table with new records
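
The inputs and outputs listed above can be sketched as a single function. This is a minimal in-memory illustration, not a definitive implementation: the function name assign_surrogate_keys, the use of a dict for the reference table, and the sample keys are all assumptions made for the example.

```python
from typing import Dict, Iterable, List, Tuple


def assign_surrogate_keys(
    extract: Iterable[str],      # natural keys read from the source extract
    reference: Dict[str, int],   # existing natural key -> surrogate key mapping
    max_key: int,                # last surrogate key assigned so far
) -> Tuple[List[Tuple[str, int]], Dict[str, int], int]:
    """Return (output rows, updated reference, new maximum key)."""
    new_reference = dict(reference)  # copy, so the caller's mapping is untouched
    output = []
    for natural_key in extract:
        if natural_key not in new_reference:
            # Unknown record: take the next key in sequence.
            max_key += 1
            new_reference[natural_key] = max_key
        output.append((natural_key, new_reference[natural_key]))
    return output, new_reference, max_key


output, reference, max_key = assign_surrogate_keys(["C1", "C2", "C3"], {"C1": 1}, 1)
print(output)   # [('C1', 1), ('C2', 2), ('C3', 3)]
print(max_key)  # 3
```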

Proposed solution

Assumptions:
- The surrogate key field for our made-up example is WH_CUST_NO.
- To make the example clearer, we will use SCD Type 1 to handle slowly changing dimensions. This means that new records overwrite the existing data and no history is kept.
The ETL process implementation requires several inputs and outputs.
Input data:
- customers_extract.csv - first source system extract
- customers2.txt - second source system extract
- CUST_REF - a lookup table which contains mapping between natural keys and surrogate keys
- MAX_KEY - a sequence number which represents last key assignment
Output data:
- D_CUSTOMER - table with new records and correctly associated surrogate keys
- CUST_REF - new mappings added
- MAX_KEY sequence increased
The design of an ETL process for generating surrogate keys will be as follows:

  • The loading process will be executed twice - once for each of the input files
  • Check if the lookup reference data is correct and available:
    - CUST_REF table
    - MAX_KEY sequence
  • Read the extract and first check if a record already exists. If it does, assign the existing surrogate key to it and update the descriptive data in the main dimension table.
  • If it is a new record, then:
    - generate a new surrogate key and assign it to the record. The new key is obtained by incrementing the old maximum key by 1.
    - insert a new record into the dimension table (D_CUSTOMER)
    - insert a new record into the mapping table (CUST_REF, which stores the mapping between business and surrogate keys)
    - store the new maximum key
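
The steps above can be sketched in Python with the built-in sqlite3 module. This is a rough illustration under stated assumptions, not the implementation used by any particular ETL tool: the table names D_CUSTOMER, CUST_REF and MAX_KEY and the field WH_CUST_NO come from the example, while the other column names, the function names, the sample rows, and the choice to key the mapping by source system plus natural key are hypothetical.

```python
import sqlite3


def init_warehouse(conn):
    # Hypothetical schema; only the table names and WH_CUST_NO come from the example.
    conn.executescript("""
        CREATE TABLE D_CUSTOMER (WH_CUST_NO INTEGER PRIMARY KEY, CUST_NAME TEXT);
        CREATE TABLE CUST_REF (
            SOURCE_SYSTEM TEXT NOT NULL,
            NATURAL_KEY   TEXT NOT NULL,
            WH_CUST_NO    INTEGER NOT NULL,
            PRIMARY KEY (SOURCE_SYSTEM, NATURAL_KEY)
        );
        CREATE TABLE MAX_KEY (LAST_KEY INTEGER NOT NULL);
        INSERT INTO MAX_KEY VALUES (0);
    """)


def load_extract(conn, source_system, rows):
    """Load one extract; rows is an iterable of (natural_key, name) pairs."""
    # Check that the maximum key lookup is available before processing.
    found = conn.execute("SELECT LAST_KEY FROM MAX_KEY").fetchone()
    if found is None:
        raise RuntimeError("MAX_KEY is not initialised")
    max_key = found[0]

    for natural_key, name in rows:
        ref = conn.execute(
            "SELECT WH_CUST_NO FROM CUST_REF "
            "WHERE SOURCE_SYSTEM = ? AND NATURAL_KEY = ?",
            (source_system, natural_key),
        ).fetchone()
        if ref is not None:
            # Existing record: reuse its key and overwrite the data (SCD Type 1).
            conn.execute(
                "UPDATE D_CUSTOMER SET CUST_NAME = ? WHERE WH_CUST_NO = ?",
                (name, ref[0]),
            )
        else:
            # New record: increment the maximum key by 1 and register the mapping.
            max_key += 1
            conn.execute("INSERT INTO D_CUSTOMER VALUES (?, ?)", (max_key, name))
            conn.execute(
                "INSERT INTO CUST_REF VALUES (?, ?, ?)",
                (source_system, natural_key, max_key),
            )

    # Persist the new maximum key for the next run.
    conn.execute("UPDATE MAX_KEY SET LAST_KEY = ?", (max_key,))
    conn.commit()
    return max_key


conn = sqlite3.connect(":memory:")
init_warehouse(conn)
load_extract(conn, "SRC1", [("A100", "Alice"), ("A200", "Bob")])  # first source
load_extract(conn, "SRC2", [("77", "Carol")])                     # second source
load_extract(conn, "SRC1", [("A100", "Alice Smith")])             # changed record
print(conn.execute("SELECT * FROM D_CUSTOMER ORDER BY WH_CUST_NO").fetchall())
# [(1, 'Alice Smith'), (2, 'Bob'), (3, 'Carol')]
```

Keying CUST_REF by source system as well as natural key keeps identical business keys from two sources apart; if both sources are known to share one key space, the natural key alone would be enough.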

Sample Implementations

Surrogate key generation implemented in various ETL environments:
- PDI surrogate key - surrogate key generation example implemented in Pentaho Data Integration
