
Slowly Changing Dimensions – Type Four Models

Type Four – Insert Into a History Table

Type four models, also known as “history table” models, are the most technically sophisticated of the four models and may be the most difficult to implement. This modeling technique provides nearly unlimited tracking of historical records while requiring less storage than type two models. Rather than storing the changes in the same table, a second “history” table is created that stores only the values of the slowly changing attributes, one record per change.

Like type two models, type four models accommodate an unlimited number of changes to dimensional fields and create an additional record for every change to a dimensional attribute. In contrast to type two, however, each change is written as a new record in a relatively compact history table rather than in the dimension table itself. The history table is consequently more efficient at capturing a large amount of historical data.

Another key advantage of type four models is that queries against a timeframe are efficient, because the related search index requires only two fields (the key and date fields). Other modeling techniques require more fields in the search index for date queries. Thus the search index for a type four model is smaller and more intuitive, and it retrieves the relevant record more quickly than the search index used with other modeling techniques.
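To make the two-field index concrete, here is a minimal sketch of a point-in-time lookup using Python's built-in sqlite3 module. The table name vendor_history, its column names, the index name, and the 2005-03-01 date on the earlier row are all assumptions made up for illustration; only the phone numbers and the 12/15/2008 effective date come from the vendor example in this post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical history table; names are illustrative only.
    CREATE TABLE vendor_history (
        vendor_key     INTEGER NOT NULL,
        phone          TEXT    NOT NULL,
        effective_date TEXT    NOT NULL  -- ISO 8601, so text order matches date order
    );
    -- The search index needs only the key and date fields.
    CREATE INDEX idx_vendor_history ON vendor_history (vendor_key, effective_date);
    INSERT INTO vendor_history VALUES
        (1, '202-555-8639', '2005-03-01'),  -- invented date for the earlier number
        (1, '858-555-6555', '2008-12-15');
""")

def phone_as_of(conn, vendor_key, as_of_date):
    """Return the phone number in effect for a vendor on the given ISO date."""
    row = conn.execute(
        """SELECT phone
           FROM vendor_history
           WHERE vendor_key = ? AND effective_date <= ?
           ORDER BY effective_date DESC
           LIMIT 1""",
        (vendor_key, as_of_date),
    ).fetchone()
    return row[0] if row else None  # None when no record exists yet on that date

print(phone_as_of(conn, 1, "2007-06-30"))
print(phone_as_of(conn, 1, "2009-01-01"))
```

The lookup filters on the key, then walks the dates backwards to the most recent record on or before the requested date, which is the access pattern a (key, date) index serves.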

Type four models do have some important disadvantages. Namely, type four models require multiple tables, are less intuitive for query developers, take more effort to develop and maintain than other types of models, and allow history tables to grow to massive size.

Suppose that a vendor's phone number changes from 202-555-8639 to 858-555-6555 because the phone company has added a new area code. With a type four model, the vendor dimension table is updated and a new record is inserted into the vendor history table in the following manner:

Type Four Model - Slowly Changing Dimension

• The vendor dimension table is updated:
– The phone number is updated from 202-555-8639 to 858-555-6555.
• A new record is inserted into the vendor history table:
– The vendor key is copied from the vendor dimension table.
– The phone number 858-555-6555 is inserted.
– The effective date of 12/15/2008 is inserted.
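The steps above can be sketched roughly as follows with Python's built-in sqlite3 module. This is only an illustration under assumptions: the table names vendor_dim and vendor_history, the column names, the vendor name, and the helper apply_phone_change are hypothetical and not taken from any particular warehouse.

```python
import sqlite3

def apply_phone_change(conn, vendor_key, new_phone, effective_date):
    """Type four change: overwrite the dimension row, then append to the history table."""
    with conn:  # single transaction: both statements commit together or roll back together
        conn.execute(
            "UPDATE vendor_dim SET phone = ? WHERE vendor_key = ?",
            (new_phone, vendor_key),
        )
        conn.execute(
            "INSERT INTO vendor_history (vendor_key, phone, effective_date) "
            "VALUES (?, ?, ?)",
            (vendor_key, new_phone, effective_date),
        )

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema; names are illustrative only.
    CREATE TABLE vendor_dim (
        vendor_key INTEGER PRIMARY KEY,
        name       TEXT,
        phone      TEXT
    );
    CREATE TABLE vendor_history (
        vendor_key     INTEGER NOT NULL REFERENCES vendor_dim (vendor_key),
        phone          TEXT    NOT NULL,
        effective_date TEXT    NOT NULL,  -- ISO 8601 date string
        PRIMARY KEY (vendor_key, effective_date)
    );
    INSERT INTO vendor_dim VALUES (1, 'Example Vendor', '202-555-8639');
""")

apply_phone_change(conn, 1, "858-555-6555", "2008-12-15")

current_phone = conn.execute(
    "SELECT phone FROM vendor_dim WHERE vendor_key = 1"
).fetchone()[0]
history_rows = conn.execute(
    "SELECT vendor_key, phone, effective_date FROM vendor_history"
).fetchall()
print(current_phone)
print(history_rows)
```

Wrapping both statements in one transaction is a design choice of this sketch: it keeps the dimension table and the history table from drifting apart if one of the writes fails.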

What are Slowly Changing Dimensions?

Modern data warehouse design assumes that business transactions such as sales, orders, shipments, fulfillments, and receivables can occur at a rapid rate and that the details of each transaction need to be recorded. Hence a fact table within a dimensional model contains a separate record for each business transaction. In contrast, the descriptive or text-based values of the transaction, which are held in dimensions, often remain fairly constant. As a result, dimension tables within the dimensional model often do not take changes into account.

But in reality, dimensional values can and do change over time, and numerous fields of a given row within a dimension table will need to be updated. This phenomenon in data modeling is known as “slowly changing dimensions”, and it can apply to any dimension table within a data warehouse schema. Both simple and advanced modeling techniques have been established for handling updates and changes within a dimension table. These techniques help the data warehouse record past values precisely, track history efficiently, and respond to changes in the descriptive values of transactions.

Examples of slowly changing dimensions include:
–  account name
–  customer phone number
–  vendor address
–  product description

These are good examples because they are text-based values that remain relatively constant but can, and commonly do, change over time. Names, phone numbers, and addresses are fairly intuitive, and it is easy to see how these values can change slowly over time. But let’s see how a product description could change. A simple ingredient change or a packaging change in a product may be so trivial that the organization decides not to give the product a new product id. Rather, the source system provides the data warehouse with a revised description of the product. Hence the data warehouse needs to track both the old and new descriptions of the product.

Other good examples of common slowly changing dimensions are the region and territory names for a sales force. In many organizations, management renames regions and territories on a regular basis or realigns them along customer purchase patterns. Typically the requirement of a data warehouse is to keep a record of the names of the regions and territories and the dates they were active.

Four main data modeling techniques, originally pioneered by Ralph Kimball, PhD, have been established for managing dimension tables that contain slowly changing dimensions:

Type One – Overwrite the Record
Type Two – Update Record to Inactive / Create an Active Record
Type Three – Leverage Previous and Current Value Fields
Type Four – Insert Into a History Table

These four data modeling techniques range from the complete loss of historical data to an elegant but technically complex method of saving almost all historical data. By choosing the appropriate technique, the database designer can ensure that the data warehouse contains the required historical values and allows comparisons of current data with data from other time periods.
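As a rough illustration of how the first three techniques differ, the sketch below applies the same phone number change to small in-memory rows in Python. The row layouts, the field names (active, previous_phone), and the function names are assumptions chosen for brevity; real implementations usually add surrogate keys and date fields that are omitted here.

```python
from copy import deepcopy

def type_one(rows, key, new_phone):
    """Type one: overwrite the record, so the old value is lost."""
    rows = deepcopy(rows)
    for row in rows:
        if row["vendor_key"] == key:
            row["phone"] = new_phone
    return rows

def type_two(rows, key, new_phone):
    """Type two: mark the active record inactive and create a new active record."""
    rows = deepcopy(rows)
    for row in rows:
        if row["vendor_key"] == key and row["active"]:
            row["active"] = False
    rows.append({"vendor_key": key, "phone": new_phone, "active": True})
    return rows

def type_three(rows, key, new_phone):
    """Type three: move the current value to a previous-value field, then overwrite."""
    rows = deepcopy(rows)
    for row in rows:
        if row["vendor_key"] == key:
            row["previous_phone"] = row["phone"]
            row["phone"] = new_phone
    return rows

OLD, NEW = "202-555-8639", "858-555-6555"

one = type_one([{"vendor_key": 1, "phone": OLD}], 1, NEW)
two = type_two([{"vendor_key": 1, "phone": OLD, "active": True}], 1, NEW)
three = type_three([{"vendor_key": 1, "phone": OLD, "previous_phone": None}], 1, NEW)

print(one)    # one row, only the new number survives
print(two)    # two rows, the old one flagged inactive
print(three)  # one row holding both the current and the previous number
```

Type one keeps no history, type two keeps every version as a full row, and type three keeps exactly one prior value, which is the range of trade-offs the paragraph above describes.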
