- Data Warehouses (DW) are read-only, integrated databases designed to provide insight into past organizational performance and project future results. Further, they can be defined as highly-optimized data repositories that include data sourced from numerous transactional, legacy, external systems, applications, data feeds, and data files that exist throughout an organization.
In practice, the data warehouse is designed, organized, and optimized for retrieval and analysis of data while providing managers, executives, and other decision makers with a single and complete view of the truth. Moreover, the data warehouse is a data repository completely designed for decision-support, analytical-reporting, managing performance, and the conversion of raw data into information. The data warehouse is constructed with a focus on data retrieval and the data warehouse enables complex queries to be performed without any impact on the systems that support the business’ primary transactions (i.e. transactional, operational, and mainframe applications). In contrast to its source databases, the data warehouse is a collection of data designed to support management decision making and contains a wide variety of data that business personnel can use to gain a better understanding of business conditions.
Fundamentally, a data warehouse is unique database in which data is collected for analysis and to support management decision making. A data warehouse is defined by its purpose and it comprises data that is subject-oriented, integrated, time-variant, and non volatile. This means the data warehouse is focused on a business concept like sales, orders, and/or fulfillments, as opposed to a specific business process.
There are two fundamental types of data warehouses that exist:
• Enterprise Data Warehouses (EDW)
• Data Marts
Enterprise Data Warehouse (EDW): The enterprise data warehouse is a centralized data store which provides critical analytical data for an entire organization. The enterprise data warehouse is typically a large organization-wide database repository that crosses over every business function and will include data from every organizational unit, division, and department. The enterprise data warehouse represents a truly corporate effort and it affects and benefits multiple organizational units and departments. Typically the implementations of an enterprise data warehouse is initiated and driven by a centrally coordinated organization, such as the information technology (IT) department, and can have lengthy implementation schedules (usually 2-5 years). The organizational and business impact of an enterprise data warehouse tends to be quite substantial as multiple organizational units, departments, and divisions are affected and and subsequently can establish total 360 degree perspective. In essence, an enterprise data warehouse is a substantially large repository of historical and current transaction data of an entire organization.
As an implementation of an enterprise data warehouse is commonly strategic in nature, the volume of data in enterprise data warehouses tends to be quite large. Enterprise data warehouses can contain hundreds of gigabytes, terabytes, and sometimes even petabytes of data.
Data Mart: A data mart is a collection of subject areas organized for decision support based on the needs of a given department or office. Each department within a larger organization may have its own requirements for a data mart and each department’s data mart will be unique to and specific to its own needs. Many times, a data mart will serve as the reporting and analytical solution for a particular department within an organization, such as accounting, sales, customer service, and/or marketing.
The data mart contains only a modicum of historical information and is granular only to the point that it suits the needs of the department. The data mart is typically housed in multi-dimensional technology which is great for flexibility of analysis but is not optimal for large amounts of data.
Typically designed for more tactical and quick-strike purposes, data mart implementations are commonly focused on solving a particular business issue or meeting an individual department’s needs. The data mart also usually contains data feeds from a small number of source systems, includes a manageable volume of data, and is exemplified by a rapid development and deployment schedule (usually 3-6 months). For the most part, data marts are designed with just enough data entities, fields, and records to satisfy one department’s requirements.
There are two kinds of data marts that exist — dependent and independent:
• A dependent data mart is one whose source is another data warehouse, and all dependent data marts within an organization are typically fed by the same source — the enterprise data warehouse.
• An independent data mart is one whose source is directly from transactional systems, legacy applications, or external data feeds.