Archive for "Posts"

Oracle Business Intelligence (BI) Platform

Business Intelligence, Oracle BI, Posts Jun 03, 2011 No Comments
The Oracle Business Intelligence (BI) Platform is a set of functions and tools that incorporate a number of advanced business intelligence products.  In addition to providing a broad set of functions and tools, the Oracle BI Platform is built upon a proven, modern web services-oriented architecture that provides the foundation of the construction of complete business intelligence solutions.  In addition, the platform delivers a full range of capabilities related to business intelligence capabilities including…
• Dashboards
• Ad-hoc queries
• Alerts and notifications
• Operational reports
• Predictive intelligence
• Disconnected analytics
• Microsoft Office integration

Fundamentally, the Oracle BI Platform comes with the tools and functions for developing sophisticated custom business intelligence environments like the one pictured here …

Oracle BI - Dashboards

At its core, the Oracle BI Platform is an innovative next-generation business intelligence environment that offers the software tools for building enterprise BI solutions from data spanning both Oracle and non-Oracle data sources and applications. The Oracle BI Platform has been meticulously designed to allow for pervasive use within an organization.  This sophisticated environment includes a full range of easy-to-use and intuitive end-user tools such as interactive dashboards, advanced reporting and publishing, full ad-hoc analysis over the web, proactive detection and alerts, mobile analytics, Microsoft Office integration, web services and business process integration.

The Oracle BI platform provides transparent and seamless access to numerous heterogeneous data sources.  This highly developed platform also allows for development of custom business intelligence implementations that access data from traditional relational databases, OLTP & ODS systems, data warehouses and data marts, flat files, and web services. Further in 2008, Oracle released the Oracle BI Enterprise Edition Plus Platform, which provides direct connections to Hyperion data sources including Essbase and its related financial performance management applications.

Oracle BI - Platform

The Oracle BI Platform has been developed over several years and includes the most advanced features from the former NQuire and Siebel Analytics products recently acquired by Oracle Corporation.

Read more »

Share

Types of Enterprise Data (Transactional, Analytical, Master)

Master Data Management, Posts Jun 02, 2011 7 Comments
All business enterprises have three varieties of physical data located within their numerous information systems.  These varieties of data are characterized by their data types and their purpose within the organization.
• Transactional Data
• Analytical Data
• Master Data

Data Element Types

Transactional data supports the daily operations of an organization (i.e. describes business events). Analytical data supports decision-making, reporting, query, and analysis (i.e. describes business performance). While master data represents the key business entities upon which transactions are executed and the dimensions around which analysis is conducted (i.e. describes key business entities).

 

Transactional Data

Transactional data are the elements that support the on-going operations of an organization and are included in the application systems that automate key business processes. This can include areas such as sales, service, order management, manufacturing, purchasing, billing, accounts receivable and accounts payable. Commonly, transactional data refers to the data that is created and updated within the operational systems.  Examples of  transactional data included the time, place, price,discount, payment methods, etc. used at the point of sale. Transactional data is normally stored within normalized tables within Online Transaction Processing (OLTP) systems and are designed for integrity.  Rather than being the objects of a transaction such as customer or product, transactional data is the describing data including time and numeric values.

 

Analytical Data

Analytical data are the numerical values, metrics, and measurements that provide business intelligence and support organizational decision making. Typically analytical data is stored in Online Analytical Processing (OLAP) repositories optimized for decision support, such as enterprise data warehouses and department data marts. Analytical data is characterized as being the facts and numerical values in a dimensional model. Normally, the data resides in fact tables surrounded by key dimensions such as customer, product, account, location, and date/time. However, analytical data are defined as the numerical measurements rather than being the describing data.

 

Master Data

Master data is usually considered to play a key role in the core operation of a business. Moreover, master data refers to the key organizational entities that are used by several functional groups and are typically stored in different data systems across an organization.  Additionally, master data represents the business entities around which the organization’s business transactions are executed and the primary elements around which analytics are conducted. Master data is typically persistent, non-transactional data utilized by multiple systems that defines the primary business entities. Master data may include data about customers, products, employees, inventory, suppliers, and sites.

Read more »

Share

Three Steps in ETL Processing

Data Warehousing, Posts May 27, 2011 No Comments
Steps within ETL Processing

Steps within ETL Processing

The term ETL which stands for extraction, transformation, & loading is a batch or scheduled data integration processes that includes extracting data from their operational or external data sources, transforming the data into an appropriate format, and loading the data into a data warehouse repository. ETL enables physical movement of data from source to target data repository. The first step, extraction, is to collect or grab data from from its source(s).  The second step, transformation, is to convert, reformat, cleanse data into format that can be used be the target database.  Finally the last step, loading, is import the transformed data into a target database, data warehouse, or a data mart.

Step 1 – Extraction
The extraction step of an ETL process involves connecting to the source systems, and both selecting and collecting the necessary data needed for analytical processing within the data warehouse or data mart. Usually data is consolidated from numerous, disparate source systems that may store the date in a different format.  Thus the extraction process must convert the data into a format suitable for transformation processing. The complexity of the extraction process may vary and it depends on the type and amount of source data.

Step 2 – Transformation
The transformation step of an ETL process involves execution of a series of rules or functions to the extracted data to convert it to standard format. It includes validation of records and their rejection if they are not acceptable. The amount of manipulation needed for transformation process depends on the data. Good data sources will require little transformation, whereas others may require one or more transformation techniques to to meet the business and technical requirements of the target database or the data warehouse. The most common processes used for transformation are conversion, clearing the duplicates, standardizing, filtering, sorting, translating and looking up or verifying if the data sources are inconsistent.

Step 3 – Loading
The load is the last step of ETL process involves importing extracted and transformed data into a target database or data warehouse. Some load processes physically insert each record as a new row into the table of the target warehouse utilizing a SQL insert statement.  Whereas other load processes include a massive bulk insert of data utilizing a bulk load routine.  The SQL insert is a slower routine for imports of data, but does allow for integrity checking with every record. The bulk load routine may be faster for loads of large amounts of data, but does not allow for integrity check upon load of each individual record.

Read more »

Share

Data Warehouses (EDW vs DataMarts)

Data Warehousing, Posts May 19, 2011 2 Comments
There are two fundamental types of data warehouses:
•  Enterprise Data Warehouses (EDW)
•  Data Marts
Overview of Enterprise Data Warehouse (EDW) and Data Marts

Overview of Enterprise Data Warehouse (EDW) and Data Marts

Enterprise Data Warehouse (EDW): The enterprise data warehouse is typically a large organization-wide database repository that crosses over every business function and includes data from every organizational unit, division, and department.  In essence, an enterprise data warehouse is a substantially large repository of historical and current transaction data of an entire organization. As an implementation of an enterprise data warehouse is commonly strategic in nature, the volume of data in enterprise data warehouses tends to be quite large.  Enterprise data warehouses can contain hundreds of gigabytes, terabytes, and sometimes even petabytes of data.

Data Mart: A data mart is a collection of subject areas organized for decision support based on the needs of a given department or office.  Many times, a data mart will serve as the reporting and analytical solution for a particular department within an organization, such as accounting, sales, customer service, and/or marketing.  For the most part, data marts are designed with just enough data entities, fields, and records to satisfy one department’s requirements.

There are two kinds of data marts that exist — dependent and independent:
• A dependent data mart is one whose source is another data warehouse, and all dependent data marts within an organization are typically fed by the same source — the enterprise data warehouse.
• An independent data mart is one whose source is directly from transactional systems, legacy applications, or external data feeds.

Read more »

Share

Data Federation Technology – Pros and Cons

Data Integration, Posts May 16, 2011 2 Comments
Data Federation
Data federation is a category of data integration technology that provides the ability to query and aggregate data from disparate sources in a virtual database so it can be used by business intelligence, reporting, or analysis applications in real-time.  The virtual database created by data federation technology doesn’t contain the data itself.  Instead, it contains information or metadata about the actual data and its location. The actual data is physically left in place within its source data repository.

Data federation is used to create virtualized and integrated views of data and allows for execution of distributed queries against multiple data sources (relational databases, enterprise applications, data warehouses, documents, XML) at the same time.  Data federation allows for accesses to data without physical movement of data and provides  a layer of abstraction above the physical implementation of data.

Data federation is synonymous with other technologies and commonly referred to as …
•  Data Virtualization
•  Enterprise Information Integration (EII)

Pros of Data Federation

•  Access current and transactional data stored in multiple sources
•  Does not require movement of data (No ETL)
•  Only requested data returned
•  Real-time data access
•  Quicker development time – supports incremental development
•  Reduces data storage and data transfer

Cons of Data Federation

•  Still queries against original data sources
•  Only contains as much data as source system contains
•  If data is archived off source, data is no longer available in federation tool
•  Query performance is not as good as a data warehouse
•  High system performance transferred to an application server from a database server

Read more »

Share

General Data Appliance / Complimentary Database Engine – Datatupia

Data Warehouse Appliance, Posts May 10, 2011 No Comments
Data Warehouse Appliance - Complimentary Vendor
A completely unique data warehouse appliance is provided by Datatupia.  Rather than providing a solution that provides an organization a complete data warehouse solution, Datatupia’s solution compliments existing data warehouse deployments within an organization.

Dataupia’s flagship product, Satori Server

Datatupia is a data warehouse appliance in the sense that it combines software and hardware components on a rack-mounted server hardware.  Additionally, Satori Server includes an embedded copy of the Linux operating system, a custom database engine, an aggregation engine, built-in storage, and parallel processors.  But in contrast to full data warehouse appliances that are purpose-built for data warehousing, Satori Server interoperates with existing database management systems rather than requiring their replacement.

The main feature of Satori Server that makes it unique amongst all data warehouse appliances is “omniversal transparency“.  This feature shields applications and users from changes to the underlying technology and database platform by complementing Oracle, MS SQL Server, or DB2 databases and their dependent applications. Unlike other solutions that displace existing technologies, the Satori Server’s innovative design extends and enhances an existing information eco-system.  Further, Datatupia’s Satori Server is usually added to an existing data warehouse infrastructure by augmenting an organization’s existing data management systems as a query accelerator or capacity extender.

Read more »

Share

Partial Data Warehouse Appliances (Software Appliances)

Data Warehouse Appliance, Posts May 01, 2011 1 Comment

Starting in 2006, a new wave of vendors emerged that focus on database management systems (DBMS) purpose-built for data warehousing that easily integrate with another vendor’s hardware. The products from these “partial-solution” data warehouse appliance vendors have been coined as software appliances”. The software appliance has proved to be a good starting point for new data warehouse vendors. It allows them to focus on their strength, database software, and to partner with commodity hardware vendors and benefit from the hardware firms’ focus and resources.

Data Warehouse Appliance - Partial Solution Vendors

The partial solution vendor is typically a software company that has developed a proprietary database and either partners with or is a subsidiary of one or more major hardware vendors. These software vendors generally focus on the development of an optimal database management software that is suitable for large data warehouse implementations, and commonly rely on other vendors to supply the most-appropriate hardware.

 

Software appliance vendors with products based upon the traditional row-based database model:

Kognitio – Offers a row-based “virtual” database called WX2 that does not include indexes or data partitions and runs on blade servers from IBM and hardware from Hewlett-Packard.  Kognitio can be deployed to a customer in one of three ways:  as a software-only license solution, as a fully configured data warehouse appliance, and on-demand via a Data Warehousing as a Service (DaaS) offering.

 

Software appliance vendors with products based upon the columnar database model in which data is stored in columns rather than rows:

ParAccel – The ParAccel Analytic Database is a proprietary column-based database that combines patent-pending innovations with a columnar, compressed, massively parallel, all-in-memory-capable architecture.  The database is available as a packaged appliance on Sun hardware known as Ultra-Fast Data Warehouse powered by Sun and ParAccel.
Vertica – The Vertica Analytic Database is a grid-based, column-oriented database that runs on grids of commodity Linux-based servers. It features column-oriented storage organization, compression, hybrid storage organization, and a shared nothing architecture.  In March 2011, Hewlett Packard (HP) completed an acquisition of Vertica Systems, Inc.
EXASOL – Offers a proprietary, massively parallel database management system called EXASolution. The product is column-based and makes extensive use of in-memory processing. Join indexes are automatically generated at run-time, as required.  EXASolution is available either as a software only solution or as a pre-configured appliance running on commodity hardware such as Fujitsu Siemens, HP, Supermicro or Hitachi. EXASolution uses its own version of Linux which provides extended capabilities to support its clustered architecture.
Sybase IQ – Distinguishes itself from conventional databases with its column-oriented, grid-based architecture, patented data compression, and advanced query optimizer.  Sybase IQ provides significantly faster query performance than traditional database environments which were designed primarily for processing transactions. The improvement over such solutions can be 10 to 1000 times greater, even as thousands of users and years worth of data are added to the environment.  Announced in May 2008, Sybase Analytical Appliance combines pSeries hardware from IBM with the Sybase IQ database.

 

Software appliance vendors with products based upon a hybrid row and column database:

EMC Greenplum – Acquired in July 2010 by EMC, Greenplum has become the foundation of EMC’s Data Computing Division. Moreover, the EMC Greenplum database contains a shared-nothing, massively parallel processing (MPP) architecture that has been purpose-built for business intelligence and analytical processing.
Aster Data – Provides a database management system (DBMS) called Aster Data nCluster, which is a massively parallel (MPP) hybrid row and column database with an integrated analytics engine.  Aster Data’s solution utilizes Aster Data’s patent-pending SQL-MapReduce to allow for parallel processing of data and applications and to deliver rich, analytic insights.  Aster Data products can be deployed to a customer in one of three ways:  Software-only, Cloud-based, and Complete-appliance called Aster Data MapReduce DW Appliance.  In April 2011, Teradata completed its acquisition of Aster Data Systems, Inc.

Read more »

Share

Complete Data Warehouse Appliance Solutions

Data Warehouse Appliance, Posts Apr 25, 2011 No Comments
Basic Architecture of a Data Warehouse (DW) Appliance
Complete data warehouse appliances are purpose-built data warehouse solutions and systems that encompass a whole-technology stack including:
• Operating System (OS)
• Database Management System (DBMS)
• Server Hardware
• Storage Capabilities

Initially DW appliances were created with proprietary custom-built hardware and storage units. Netezza, Teradata, DATAllegro, & White Cross (now Kognito) were the first vendors to provide solutions in this manner. Subsequently data warehouse appliances evolved and started to utilize lower-cost, industry-standard non-proprietary hardware components.The movement from proprietary to commodity hardware has proven to bring down the cost of the data warehouse appliance as the commodity hardware can integrated at a lower cost of both developing and integrating proprietary hardware. Examples of commodity hardware typically include general-purpose servers from Dell, Hewlett Packard (HP), or IBM utilizing Intel processors and popular network and storage hardware from either Cisco, EMC, or Sun.

Data Warehouse Appliance - Complete Solution VendorsIntroduced 2002, Netezza was the first vendor to offer a complete data warehouse appliance, so early definitions of appliance were based upon Netezza products.  Subsequently, Netezza Performance Server still provides all of the software components of a data warehouse appliance, including the database, operating system, servers, and storage units.  However in 2009, Netezza replaced its own proprietary hardware with IBM blade servers and storage units. Further in 2010, IBM completed a corporate acquisition of Netezza.

Similar to Netezza, DATAllegro was launched in 2005 with a complete solution involving proprietary hardware. Soon after DATAllegro replaced its own proprietary hardware with commodity server from Dell and storage units from EMC. In 3008, Microsoft acquired DATAllegro in 2008 and announced it will integrate DATAllegro’s massive parallel processing (MPP) architecture into its own MS SQL Server platform, which also runs on commonly-available hardware.

Additionally, both Kognitio and Teradata replaced their proprietary hardware within their appliances in a process similar to that of DATAllegro. Kognitio now offers a row-based, in-memory database database called WX2 that does not include indexes or data partitions and runs on blade servers from IBM and Hewlett-Packard. Teradata provides a proprietary database, a variety of common operating systems (Linux, Unix, and Windows), and a proprietary networking subsystem packaged along with commodity processors and storage units.

Announced at the 2008 Oracle OpenWorld conference in San Francisco, the Oracle Exadata Database Machine is a complete package of database software, operating system, servers, and storage. The product was initially assembled in collaboration between Oracle Corporation and Hewlett Packard where Oracle developed the database, operating system and storage software, while HP constructed the hardware. However, with Oracle’s acquisition of Sun Microsystems, Oracle announced the release of Exadata Version two with improved performance and usage of Sun Microsystems storage and operating systems technologies.

At the Sapphire conference in May, 2010 in Orlando, SAP announced the release of its new data warehouse appliance called HANA or High-Performance Analytic Appliance. SAP HANA is a combination of hardware, storage, operating system, management software, and in-memory data query engine that is characterized by data being held in RAM rather than being read from disks or flash storage.

Finally IBM bundles and integrates its own Infosphere Warehouse database software (formerly “DB2 Warehouse”) with its own servers and storage to deliver the IBM Infosphere Balanced Warehouse.

Read more »

Share

Data Warehouse Appliance: Oracle Exadata

Data Warehouse Appliance, Posts Apr 18, 2011 No Comments

Announced by CEO Larry Ellison at the 2008 Oracle OpenWorld conference in San Francisco, Oracle Exadata Database Machine is a complete database appliance with support for both transactional (OLTP) and analytical (OLAP) database systems. Delivered as a complete package of database software, operating system, servers, and storage, the Oracle Exadata Database Machine is simple and fast to implement and ready for large-scale business applications.

The product was initially assembled in collaboration between Oracle Corporation and Hewlett Packard (HP) where Oracle developed the database, operating system and storage software, while HP constructed the hardware.  However, with Oracle’s acquisition of Sun Microsystems, Oracle announced the release of Exadata Version two with improved performance and usage of Sun Microsystems storage and operating systems technologies. The main idea of Exadata is to make the storage database aware and push processing of queries down to the disks for optimal scanning and performance. Subsequently an Exadata machine can scan 1 TB of data in about 3.5 seconds by scanning several (or all) disks in parallel with Oracle’s Parallel Query technology.

Oracle Exadata Database Machine

Oracle Exadata Database Machine

Currently the Oracle Exadata Database Machine provides a solution for all types of database systems, ranging from scan-intensive data warehouse applications to highly concurrent transactional applications.  With its bundled combination of storage, database software, operating system, and standard hardware components from Sun, the Oracle Exadata Database Machine provides extreme performance within a highly-available, highly-secure environment. Additionally Oracle’s unique clustering and workload management capabilities position the Oracle Exadata Database Machine to be well-suited for consolidating multiple databases onto a single and centralized environment.

Facts and Benefits of Oracle Exadata Database Machine

•  Accelerates data warehouse query performance by at least a factor of 10x.
•  Runs more queries concurrently for faster access to business-critical information.
•  Scales to 10x more concurrent users.
•  Provides a trusted highly-available and cost-effective platform.
•  Replaces and consolidates isolated special-purpose databases into one platform.
•  Allows for massive parallel processing of data with a high-bandwidth.
•  Easily expands with the connection of multiple units.
•  Includes combination of Oracle Exadata Storage server, Oracle database software, Sun Solaris operating system (OS), and the latest industry standard hardware components from Sun.

Read more »

Share

Data Warehouse Appliance: SAP HANA

Data Warehouse Appliance, Posts Apr 13, 2011 No Comments

At the Sapphire conference in May, 2010, SAP announced the release of its new data warehouse appliance called HANA or High-Performance Analytic Appliance. SAP HANA is a combination of hardware, storage, operating system, management software, and in-memory data query engine that is characterized by data being held in RAM rather than being read from disks or flash storage. Additionally, HANA has been built to split up queries to run in parallel on multiple processors—a fundamentally different architecture from SAP’s existing applications.  This in-memory and parallel processing architecture of HANA allows for extremely fast performance of queries and analytics on very large amounts of data.

HANA Blade ServerThe SAP HANA solution has been introduced on Hewlett Packard x86 servers (HP ProLiant DL580 G7 and DL980 G7 servers) and is built upon Intel’s multi-core servers. Moreover, a single server blade can contain up to 2TB of main memory (4TB coming soon) and up to 64 processor cores.  With this total solution, SAP claims that they beat the current performance benchmark by factor of 20, on hardware that was several dozens of times cheaper for a 200X price performance improvement. SAP also claims that HANA either reduces or out-right eliminates the need for the development and deployments of complex and expensive datamarts.

SAP intends HANA systems to be well-integrated with its own enterprise resource planning (ERP) systems, allowing for transactional data in SAP ERP systems to be analyzed in real time.  However, HANA is not dependent solely on SAP ERP systems as a data source.  Moreover,  HANA is data source “agnostic”  which means most common data sources and database can be integrated with it.

According to an SAP document, the HANA platform includes a modeling environment that is simple enough for business users to work with.  Additional, HANA supports client interfaces currently include Microsoft Excel and SAP’s Business Objects business intelligence software.

References: SAP’s Transformation: A Work-In-Progress – Part One (ChainLink Research)SAP Launches HANA for In-memory Analytics (PC World)

Read more »

Share
Facebook Like Button for Dummies