Archive for "Posts"

Slowly Changing Dimensions – Type Two Models

Data Warehousing, Posts Feb 04, 2011 No Comments

Type Two – Update Record to Inactive / Create an Active Record

Type two modeling is a very reliable and straightforward technique for preserving history of changes to dimensional tables.  It is commonly utilized when a full or partial set of the previous values of a dimension’s attributes must be retained. In type two modeling, every time an update occurs to any of the values in a dimensional table, a new record is physically inserted as active or current into the same table and the old records are marked inactive or historical.

The main advantage of this modeling technique is that it can accommodate and record a massive amount of history and nearly an unlimited changes to slowly changing dimensions. But this advantage also can become a significant drawback. Within this modeling technique, it is possible for dimension tables to grow to massive sizes and adversely affect both system and query performance. In addition, since upon every change this modeling technique requires an update to the linked fact table, implementation is fairly difficult and may require substantial effort to design, develop, and support.

Two unique sub-methods have been established for distinguishing the active (or current) record from the inactive (or historical) records: active flagging and tuple versioning.

In the type two – active flagging sub-method, an “Active” column is constructed within the dimensional table and acts as a flag. Flags are binary fields with permissible values being Y and N, T and F, or 1 and 0. A positive value (Y, T, 1) indicates an active record, while a negative value (N, F, 0) indicates an inactive record.

Example of Type Two – Active Flagging Sub-Method

Suppose that a vendor changes his phone number to 858-555-6555 from 202-555-8639 because the phone company has added a new area code. Using active flagging, the change to the vendor dimension table would be processed as follows…

Type Two - Active Flagging - Slowly Changing Dimension

•  The current active record in the vendor dimension table is updated:
–  The active flag is changed from T to F.
–  The phone number field remains as 202-555-6555.
•  A new record is inserted into the vendor dimension table:
–  The vendor key is given the next available integer number.
–  The vendor name field remains the same as in the old record.
–  The phone number has the new value of 858-555-8639.
–  The active flag is set to T.

In the type two – tuple versioning sub-method, start date and end date columns are included on the dimension table. The values of these date fields then define the period during which that record has been active. For the most part, the start date is the date the record has been created. Moreover, the end date is the date the record has become inactive, either because a newer record has replaced it or because the original record in the source system no longer exists. On the active record, the end date will either be left null, blank, or identified in another way such as all-nines, according to modeler’s preferences and standards.

The main advantage of the tuple versioning method over active flagging is that it provides an audit trail of the date and sequence of all updates.

Example of Type Two – Tuple Versioning Sub-Method

Continuing with the same example above but using the tuple versioning method, the updated phone number in a record in the vendor dimension table is processed as follows:

Type Two - Tuple Versioning - Slowly Changing Dimension

•  The current active record in the vendor dimension table is updated:
–  The end date is updated to the date of the change.
–  The phone number field remains as 202-555-6555.
•  A new record is inserted into the vendor dimension table:
–  The vendor key is given the next available integer number.
–  The vendor name field remains the same as in the old record.
–  The phone number has the new value of 858-555-8639.
–  The start date is the date the record is inserted into the table.
–  The end date is intentionally left blank.

Slowly Changing Dimensions – Type One Models

Data Warehousing, Posts Jan 31, 2011 No Comments

Type One – Overwrite the Record

Type one modeling of slowly changing dimensions is very simple and effectively handles updates to incorrect and/or outdated values within the dimension table. Type one models do not retain a history of changes and do not store previous values in any way. With this modeling technique, the data field that has changed is simply updated to reflect the most current value.  Since this technique does not preserve any previous value, it should only be used when there is no requirement to retain historical data.

Type one models makes sense when correcting data issues, but not when the system requires retrieval of historical values. Without a doubt, type one is the simplest method for handling slowly changing dimensions. However, best practice is to utilize this technique on a very limited basis. Moreover, it should only be implemented when there is no need to track the history of changes.

Example of Type One

First an electronics company loads its list of products within its data warehouse. Later the company discovers that a television which was originally manufactured in the USA is now manufactured in China, and the company has in fact never received a shipment of the older American-made televisions. The product description column in the table is simply updated, and the date and time of the update is entered in the “Last Update” column.

Type One - Slowly Changing Dimension

To correct the product description, the existing description is simply overwritten with the correct value, and the “Last Update” field is revised with the current date. Moreover, the product description is updated because the item is no longer manufactured in the United States. Utilizing the type one method of managing slowly changing dimensions, the record on the product dimension table is updated in-place in the following manner…
–  The product description is updated in-place to the new description.
–  The last update is updated to the date and time of the change.

What are Slowly Changing Dimensions?

Data Warehousing, Posts Jan 20, 2011 No Comments

Modern data warehouse design assumes that business transactions such as sales, orders, shipments, fulfillments, and receivables can occur at a rapid rate and each the details of each transaction needs to be recorded.  Hence a fact table with a dimensional model contains a separate record for each business transaction.  While in contrast, the describing or text-based values of the transaction or dimension often remain fairly constant.  Often, dimensional tables within the dimensional model do not take changes into account.

But in reality, dimensional values can and do change over time and numerous fields of a given row within a dimension table will need to be updated. This phenomenon in data modeling is known as “slowly changing dimensions” and it can be applied to any dimension table within a data warehouse schema.  Moreover, both simple and advanced modeling techniques have been established and can be implemented for handling updates and changes within a dimension table.  In addition, slowly changing dimensions assist the data warehouse in precisely recording the past values, providing an efficient method for tracking history, and allowing for the ability to respond to changes to descriptive values of transactions.

Examples of slowly changing dimensions include:
–  account name
–  customer phone number
–  vendor address
–  product description

These are good examples as they are text-based values that remain relatively constant, but can change and commonly do change over time.  Names, phone numbers, addresses are fairly intuitive and it is easy to see how these values can change slowly over time.  But let’s see how a product description could change…  A simple ingredient change or a packaging change in a product may be so trivial that the organization does not decide to give the product a new product id.  Rather the source system provides the data warehouse with a revised description of the product.  Hence the data warehouse needs to track both the old and new descriptions of the product.

Other good examples of common slowly changing dimensions are the region and territory names for a sales force. Many organizations have management that rename their region and territories on a regular basis or the management realigns their regions and territories along customer purchase patterns.   Typically the requirement of a data warehouse is to keep a record of the names of the regions and territories and the dates they were active.

Originally pioneered by Ralph Kimball, PhD, four main data modeling techniques have been established for managing dimension tables that contain slowly changing dimensions:

Type One – Overwrite the Record
Type Two – Update Record to Inactive / Create an Active Record
Type Three – Leverage Previous and Current Value Fields
Type Four – Insert Into a History Table

These four data modeling techniques range from the complete loss of historical data to an elegant but technically complex method of saving almost all historical data.  Choice of the appropriate technique by the database designer can ensure that the data warehouse contains required historical values and allows for comparisons of current data or data from other time periods.


Dimensional Modeling and Data Warehouses

Data Warehousing, Posts Jan 14, 2011 No Comments

Dimensional modeling is a specific discipline for modeling data that is an alternative to entity-relationship (E/R) modeling. A dimension model contains the same information as an E/R model but packages the data in symmetric format whose design goals are user understandability, query performance, and resilience to change.
Ralph Kimball, PhD, The Data Warehousing Lifecycle Toolkit, 1998

Basic Dimensional Model (Star Schema)

Basic Dimensional Model (Star Schema)

Dimensional modeling is a data modeling technique used to support on-line analytical processing (OLAP) systems and is implemented in databases that host either an enterprise data warehouses or data marts. The key point on the design of dimensional models is to resolve questions in the format “measures by dimensions.”  In addition, dimensional models are commonly referred to as star schema as they comprised of a central fact table surrounded by several dimension tables.

Within a dimensional model or star schema, there exists two types of data entities or tables
•  Facts (Measurements – Numerical Values)
•  Dimensions (Contexts and Attributes – Text, Strings, Dates, & Flags)
Transactional (OLTP) Systems to Analytical (OLAP) Systems

Transactional (OLTP) Systems to Analytical (OLAP) Systems

Within an enterprise data warehouse or data marts, data is fundamentally static, non-volatile and does not get updated.  Rather data is inserted or loaded in bulk into the tables in the model utilizing using batch programs or extraction, transformation, & loading (ETL) routines.  End-users of dimensional models develop queries that either read or select data, and there is no end-user inserting, updating, or deleting of data.  Data in dimensional databases requires data to be converted or extracted from on-line transactional processing (OLTP) or other OLAP systems.

The key benefits of dimensional models and data warehouses include ….
•  Separate environment from transactional systems
•  Allows for high-performance of select/read queries
•  Insulated from changes in source systems
•  Intuitive to developers and business users of queries
•  Contains data from multiple source systems
•  Optimized format for data warehouses and data marts

Business Intelligence Vendors and Products (2010)

Business Intelligence, Posts Jan 08, 2011 1 Comment

As of the end of 2010, the business intelligence (BI) software marketplace is in a state in which there are clear leaders or as Gartner, Inc puts it, “megavendors” that own roughly two-thirds of the $6 billion market.  In addition the rest of the BI vendors are more commonly referred to as “niche” or “pure-play” vendors.  These “niche” vendors each produce and sell viable business intelligence software products, but they do not encompass the breadth of complimentary technologies that the “megavendor” produces.  Regardless of the size of the vendor, both the “megavendors” and “pure-play” vendors do provide very capable and feature-rich software solutions.

Even though many of vendors listed below produce a number of related products, this list is only focused on business intelligence software products.

BI Software Vendors and Products (2010)

BI Software Vendors and Products (2010)

The following companies are currently known as megavendors
• Microsoft
• Oracle
And the following companies are now known as niche or pure-play vendors
• MicroStrategy
• Information Builders
• Actuate
• LogiXML
• QlikTech
• Pentaho
• Jaspersoft
• Panorama Software
• Tibco Software (Spotfire)
• Tableau Software
• Targit
• arcplan
• Board International

Data Warehouse Subject Areas

Data Warehousing, Posts Dec 27, 2010 No Comments

Subject areas within a data warehouse or data mart are physical tables that are grouped together in a dimensional model or star schema that reflect general data or functional categories. Subsequently, subject areas are synonymous with functional areas and each subject area identifies and groups the data that relates to a logical area of the business.

Data Warehouse Subject Areas

Some of the subject areas that are common to most corporations and organizations include …
* Sales
* Product
* Order
* Shipment
* Work Effort
* Invoice
* Accounting
* Human Resources

Gartner Evaluates the “Big Four” BI Megavendors

Business Intelligence, Front Page, Posts Dec 22, 2010 No Comments

Gartner Logo

Although they’re not pure-play business intelligence vendors, IBM, Microsoft, Oracle Corp., and SAP Business Objects own two-thirds of the $6 billion business intelligence (BI) market because they have optimized their BI platforms to work well with their respective enterprise and information management applications, according to Gartner Inc. This integrated approach, as well as the fact that many enterprises already have these vendors’ ERP and information management applications in place, is swaying customers to standardize on one of their BI platforms. (source:

Read more »


BI Maturity Model Level 5 – Data Mining & Predictive Analytics

Business Intelligence, Posts Dec 21, 2010 No Comments

Overview of Data Mining & Predictive Analytics Concept

Sample Data Mining Application

According to, data mining is the process of analyzing large data sets in order to find patterns that can help to isolate key variables to build predictive models for management decision making.  In addition, data mining applications help discover hidden patterns and relationships in data in order to effectively project and predict future results. In order to accomplish this goal, data mining application utilize statistics, algorithms, advanced mathematical techniques, and sophisticated data search capabilities.

Moreover, these sophisticated tools provide answers to questions that may never have been asked and they are effectively able to determine relative amounts of correlation between data elements. Further, the predictive features of these data mining tools enable organizations to exploit useful patterns in data that may have otherwise been difficult to determine.


Characteristics of Data Mining & Predictive Analytics Applications

•  Leverages statistics, advanced techniques, algorithms, and sophisticated data search capabilities
•  Extrapolates past organizational performance
•  Forecasts and predicts future results of the business
•  Provides answers to questions that may never have been asked
•  Calculates levels of correlation between data elements
•  Implementers are experts of statistic analysis and/or processing of large queries

Read more »


BI Maturity Model Level 4 – Online Analytical Processing (OLAP)

Business Intelligence, Posts Dec 21, 2010 No Comments

Level 4 – On-line Analytical Analysis (OLAP) Systems

On-line analytical processing (OLAP) is a technique for rapidly visualizing and analyzing business metrics across different points of view. OLAP is a term used to generically refer to software and applications that provide users with the ability to store and access data in OLAP cubes (also called a ‘multidimensional cube’ or a hypercube) with this “cube” being made up of numeric facts, called measures, and text values, called dimensions.

Sample On-line Analytical Analysis (OLAP) Application

Moreover, OLAP systems provide users with insight into past performance and they enable a deep understanding of the reasons behind why previous events have occurred. Fundamentally, OLAP systems allow users to rapidly view and analyze data from many perspectives or dimensions and allows the users to conduct advanced “What-If” analysis.


Characteristics of OLAP Applications

•  Retrieves data from OLAP cubes rather than database tables
•  Enables advanced insight into past performance
•  Provides accurate and precise “What-If” analysis
•  Queries perform extremely rapidly
•  Primarily used by business areas concerned with financial and resource planning

Read more »


BI Maturity Model Level 3 – Dashboard Management

Business Intelligence, Posts Dec 21, 2010 No Comments

Overview of Dashboard Management

Dashboard management systems are intended to facilitate and support the information and decision-making needs of management by providing easy access to key business information in a highly graphical and intuitive format. Fundamentally, a dashboard is a graphical business tool that displays a set of KPIs (key performance indicators), metrics, and any other relevant information to a business user, manager, or key decision-maker in a single consolidated view and allows for organizational performance to be easily measured and monitored.  Dashboard data is often displayed as aggregate information and contains data that consolidated from multiple data sources scattered throughout an organization. Dashboards are commonly interactive and provide users the ability to drill into particular aspects of the display and/or rapidly switch between views of the data.

Sample Organizational Dashboard

To make this possible, dashboards are typically composed of advanced data visualization tools including charts, grids, gauges, and maps that allow for various ways of presenting information and interacting with the data. Moreover, dashboard management environments are characterized by providing users with graphical methods to view and interact with key data elements and get a snapshot of organizational performance.


Characteristics of Dashboard Management

•  Presents a number of organizational metrics in a single consolidated view
•  Presents data in graphical and intuitive format
•  Utilizes graphics, charts, grids, gauges, and maps
•  Monitors organizational metrics and key performance indicators (KPI)
•  Enables real-time visibility into data
•  Primarily utilized by senior management and key decision-makers
•  Rapidly displays a snapshot of organizational performance

Read more »

Facebook Like Button for Dummies