What is a Data Warehouse?
A data warehouse is an Integrated, Subject Oriented, Time Variant and Non-Volatile repository which provides support to the organisation/business for Decision making.
Integrated: Consistent naming conventions, format, encoding structures; from multiple data sources.
Subject Oriented: Data is arranged and optimised to answer business, questions specific to their interest. This means, data is organised and summarized by topic or SUBJECT AREAs (e.g., Sales, Marketing, and Finance etc.)
Time Variant: Data warehouse represents the behavioural changes of the data over a time period (e.g. 5-10 years of history data) which is significantly more compared to the operational systems, which in almost all the cases holds current data.
Non-Volatile: Data warehouse doesn’t require transaction processing, recovery and concurrency control mechanisms. As such it requires only two operations for data access: Data Load and Data Access. It is read-only and periodically refreshed.
What is Data Warehousing?
- Data Warehousing is nothing but the process of building and using the data warehouse.
- It is the science of storing data for the purpose of meaningful future analysis.
- It is the process of transforming the data into information.
What is Business Intelligence?
- This is the science of presenting historical data in a meaningful way (often by using different data visualisation techniques).
- Raw data stored in databases turns into valuable information through the implementation of Business Intelligence process.
- This generally involves the use of different analysis and reporting tools.
Characteristics of Data Warehouse
The key characteristics of the data warehouse includes the following:
- Some data is de-normalised for simplification and to improve performance.
- Large amounts of historical data are used.
- Queries often retrieve large amounts of data.
- Both planned and ad-hoc queries are common.
- The data load is controlled.
In general, high data throughput is the key to a successful data warehouse.
Operational Systems (OLTP)
- Operational Systems are designed to support operational monitoring.
- Operational systems generally hold the current data and the inventory data is generally purged after 3 months.
- Online Transaction Processing (OLTP) Systems handles the operational daily transaction. These systems handles the core business of an organisation.
- Examples:
o Application Processing Systems
o Account Transaction Systems
- OLTP Systems were built to automate the business transactions.
- Historical data is typically not needed or retained.
- Following are the key characteristics of OLTP Systems:
o Supports large number of short real time transactions (INSERT, UPDATE and DELETES) and hence result in large number of small size tables
o Faster query processing
o Concurrent operations by a large number of users who use the database in their daily work for regular business operations
o Contains detailed and current data
o Schema used to store transactional databases is the entity model (usually 3NF)
Analytical Systems (OLAP)
- Informational system with current and historical data
- Historical data is used for Analytical Purposes
- Primarily supports Decision making
- Data required for decision making is collected from the operational systems and moved to the Data Warehouse.
- Online Analytical Processing (OLAP) refers to workloads where large amounts of historical data are processed to generate reports and perform data analysis.
- OLAP Databases are fed from OLTP databases.
- OLAP database store a large volume of the same transactional data as an OLTP database, but this data has been transformed through an ETL process to enable best performance for easy report generation and analytics.
- Following are the key characteristics of the OLAP systems:
o Relatively low volume of transactions
o Queries are very complex and often involve aggregations
o Aggregated and Historical data stored in multidimensional schemas (usually STAR Schemas)
o Response time is an effectiveness measure. (Quick read response time is desired)
Comparing OLTP and OLAP
Differentiating Factor
|
OLTP/Operational Systems
|
OLAP/Data Warehouse
|
Business Need and Usage
|
Critical for controlling and running the fundamental business task of an organization.
|
Needed to generate reports, run analytics useful for decision making in a multiple decision support environment.
|
Source Of Data
|
Operational Data. These are the original source of data and are the first data entry point for any organisation.
|
Consolidation data. Data comes from various OLTP systems.
|
Nature of Data Stored
|
Data represents the current snapshot of the ongoing business processes.
|
Data represents the business trends of the organisation. Multidimensional view of the various kind of business activities.
|
Database Tuning
|
Database is tuned for extremely fast Insert, Update and Deletes
|
Tuned only for quick reads.
|
Data Lifespan
|
Deals with data of short lifespan.
|
Deals with data f very large lifespan (historic)
|
Data Size
|
Data is of small size stored in large number of tables.
|
Data size is huge stored in comparatively less number of tables.
|
Database Design
|
Data is highly normalized. (Usually 3NF)
|
Data is somewhat de-normalized with fewer tables. (2NF/Star/Snowflake Schema)
|
Data Backup and Recovery
|
Backup religiously as this is very critical to run the business.
|
Generally don’t require high availability and data may be archived in external storage such as tapes and hence involves reloading of the OLTP data as a recovery method.
|
Examples
|
Banking Applications, Online Reservation systems, e-commerce systems etc.
|
Reporting for Sales, Marketing, Management Reporting and Financial Forecasting
|
Advantages and Dis-Advantages of Data Warehouse
Advantages:
- Clean, Consistent Data
- Integrated data from multiple sources and hence providing a centralised repository of data
- Performing new types of analysis
- Reduces the costs of accessing historical data
- Standardised data across the organisation
- Improving turnaround time for analysis and reporting
- Support ad hoc reporting and enquiry
- Removes informational processing load from transaction oriented database
- Shared data for easy accessibility
Disadvantages:
- Major schema transformation from each of the data sources to one schema in the data warehouse can represent more than 50% of the total data warehouse effort.
- Long initial implementation time and associated high cost
- Adding new data sources takes time and associated costs
- Typically data is static and dated
- Difficult to accommodate changes in the data type and ranges, data source schema, indexes and queries
- Typically can’t actively monitor changes in the data.
Database vs. Data Warehouse
Feature
|
Database
|
Data Warehouse
|
Characteristics
|
Operational Processing (OLTP)
|
Analytical Processing (OLAP)
|
Data
|
Current data
|
Historical Data
|
Database Design
|
Highly Normalized
|
Somewhat de-normalized
|
Database modelling
|
E-R Modelling
|
Dimensional Modelling
|
Unit Of Work
|
Short and Simple Transaction
|
Complex queries
|
Focus
|
“Data IN”
|
“Information OUT”
|
Orientation
|
Transaction
|
Analysis
|
Summarization
|
Highly Detailed
|
Summarized and Consolidated form
|
View
|
Flat Relational
|
Multidimensional
|
Access
|
Read/Write
|
Read
|
Number of Records Accessed
|
Small. Tens of Records
|
High, Millions of Records
|
Number Of Users
|
High. In thousands
|
Small. Hundreds
|
DB Size
|
100 MB to GB
|
100GB to TB
|
Priority
|
High performance and High Availability
|
High Flexibility
|
Efficiency Metric
|
Transaction throughput
|
Query throughput and response time
|
ReplyDeleteThank you for sharing wonderful information with us to get some idea about that content.
ETL Testing
ETL Testing Course
ETL Testing Online
ETL Testing Training
Online ETL Testing Training
ETL Testing Online Training
ETL Course
ETL Testing Certification
ETL Testing Training Hyderabad
ETL Testing Online Course
ETL Testing Course near me
ETL Testing Course in Hyderabad
This concept is a good way to enhance the knowledge.thanks for sharing..
ReplyDeleteData Science
Python
Selenium
ETL Testing
AWS
Nice blog
ReplyDeleteGood blog thanks for sharing
ReplyDeletemicrostrategy course
microstrategy training
SAP ABAP training
ReplyDeleteSAP ABAP online training in hyderabad
Thanks for sharing this. Here are a few links for ETL QA engineers,
ReplyDeleteETL Tester Jobs
ETL Test Engineer Jobs
Search QA Jobs