Basic Architecture of a Data Warehouse (DW) Appliance

By definition, a data warehouse appliance is a complete hardware and software solution that contains a fully integrated stack of processors, memory, storage, operating system, and database management software. The appliance is typically optimized for enterprise data warehouses, designed to handle massive volumes of data and queries, and built to scale as those volumes grow.

At its core, a data warehouse appliance simplifies the deployment, scaling, and management of the database and storage infrastructure by providing a self-managing, self-tuning, plug-and-play database management system that can be scaled out in a modular manner. Moreover, data warehouse appliances integrate hardware, storage, databases, and operating systems into one unit, which allows for accelerated and cost-effective implementations of very large analytical systems. Fundamentally, a data warehouse appliance provides a viable solution for the mid-to-large volume data warehouse market and offers cost-effective solutions for multi-terabyte and petabyte data environments.

The origin of the term data warehouse appliance can be traced back to 2002 or 2003 when it was coined by Foster Hinshaw, founder of Netezza and founder and current CEO of Dataupia.

Fundamentally, a data warehouse appliance is a fully integrated solution that contains:
• Database Management System (DBMS)
• Server Hardware
• Storage Capabilities
• Operating System (OS)

The data warehouse appliance was invented to simplify the implementation, scaling, and administration of the database and storage environment. Appliances therefore typically come bundled with a self-managing, self-tuning, plug-and-play database and storage system that can be expanded and scaled in a modular and cost-effective manner. Four basic criteria can be used to describe and compare how data warehouse appliances are designed to fulfill this purpose:

•  Performance Optimization:  Data warehouse appliances are designed to provide optimal performance on large-block reads, long table scans, complex queries, and other advanced and demanding tasks required in large analytical environments.

•  High-Scalability:  Data warehouse appliances are designed to perform well on datasets that grow substantially over time, and are built specifically for databases in the multi-terabyte to petabyte range (from thousands of gigabytes to a million gigabytes and above).

•  High-Availability:  Data warehouse appliances are designed with fail-safe and fault-tolerant architectures, so they are not susceptible to any single point of failure and can be expected to have planned up-time close to 100%.

•  Simplified Management:  Data warehouse appliances include intuitive tools and controls for systems administration. They therefore require minimal specialist knowledge to maintain and have a shallow learning curve for basic administrative tasks, including installation, setup, configuration, tuning, and support.

The primary factor behind the platform scalability and large-query optimization of the data warehouse appliance is its massively parallel processing (MPP) architecture. An MPP architecture is composed of numerous independent processors or servers that all execute in parallel. Also known as a “shared nothing architecture”, the MPP appliance architecture is characterized by the principle that every embedded server is self-sufficient and controls its own memory and disk operations. Further, the MPP architecture distributes data among a number of dedicated disk storage units, each connected to one server in the appliance. Data is logically distributed across numerous system nodes, and computations are moved as close to that data as possible.
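The data distribution described above can be illustrated with a minimal sketch. It assumes rows are placed by hashing a distribution key, which is one common approach and not necessarily what any particular appliance does; the names `node_for_key`, `distribute`, and the `customer_id` field are hypothetical and chosen only for illustration.

```python
import hashlib
from collections import defaultdict


def node_for_key(key: str, num_nodes: int) -> int:
    """Map a distribution key to a node index using a stable hash.

    Assumption: hash-based placement. A stable hash (not Python's salted
    built-in hash) keeps the placement identical across runs.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def distribute(rows, key_field, num_nodes):
    """Assign every row to exactly one node's private storage."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[node_for_key(str(row[key_field]), num_nodes)].append(row)
    return dict(nodes)


# Hypothetical sample data: eight rows spread over four nodes.
rows = [{"customer_id": i, "amount": 10 * i} for i in range(1, 9)]
placement = distribute(rows, "customer_id", num_nodes=4)

for node_id in sorted(placement):
    print(node_id, [r["customer_id"] for r in placement[node_id]])
```

Because each row lives on exactly one node, a node can scan its own slice without coordinating with the others, which is the essence of the shared-nothing idea.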

Almost all large operations in the data warehouse appliance environment, including data loads, query processing, backups, restorations, and indexing, are executed in parallel. This divide-and-conquer approach allows for extremely high performance and lets systems scale close to linearly, since new processors can easily be added to the environment.
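The divide-and-conquer pattern can be sketched as a scatter-gather query: each node aggregates only its own partition, and a coordinator combines the small partial results. This is a simplified illustration, not a real appliance's query engine; threads stand in for independent servers, and the function names and sample partitions are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def local_aggregate(partition):
    """Work done on one node: scan only that node's rows."""
    total = sum(row["amount"] for row in partition)
    return total, len(partition)


def parallel_average(partitions):
    """Run the local step on every partition in parallel, then combine.

    Threads are a stand-in for separate servers in this sketch.
    """
    workers = max(1, len(partitions))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(local_aggregate, partitions))
    total = sum(t for t, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else 0.0


# Hypothetical data already spread over three nodes.
partitions = [
    [{"amount": 10}, {"amount": 20}],
    [{"amount": 30}],
    [{"amount": 40}, {"amount": 50}, {"amount": 60}],
]
average = parallel_average(partitions)
print(average)
```

Only the per-node totals and counts travel to the coordinator, so adding a node adds scan capacity without increasing the amount of data that must be moved for this kind of query.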

Data warehouse appliances have numerous benefits over conventional data warehousing systems, including:
•  Minimal cost-of-entry for data warehouse initiatives
•  Complete data warehouse hardware and software environments
•  Massively parallel processing (MPP) architectures
•  Single hardware and software vendor
•  Tight integration of all data warehouse system components
•  Consolidated system administration of entire environment
•  Scalable for high-capacity and high-performance
•  Built-in high-availability and fault-tolerance
•  Rapid implementation and time-to-value