Pages

Search This Blog

Monday, September 27, 2010

Difference between Database and Data Warehouse

I was just searching on net for basic difference between the two of my core expertise and one of the best extract I found is pasted below:

So how is a data warehouse different from you regular database? After all, both are databases, and both have some tables containing data. If you look deeper, you'd find that both have indexes, keys, views, and the regular jing-bang. So is that 'Data warehouse' really different from the tables in you application? And if the two aren't really different, maybe you can just run your queries and reports directly from your application databases!

Well, to be fair, that may be just what you are doing right now, running some EOD (end-of-day) reports as complex SQL queries and shipping them off to those who need them. And this scheme might just be serving you fine right now. Nothing wrong with that if it works for you.

But before you start patting yourself on the back for having avoided a data warehouse altogether, do spend a moment to understand the differences, and to appreciate the pros and cons of either approach.

The primary difference betwen you application database and a data warehouse is that while the former is designed (and optimized) to record , the latter has to be designed (and optimized) to respond to analysis questions that are critical for your business.

Application databases are OLTP (On-Line Transaction Processing) systems where every transaction has to be recorded, and super-fast at that. Consider the scenario where a bank ATM has disbursed cash to a customer but was unable to record this event in the bank records. If this started happening frequently, the bank wouldn't stay in business for too long. So the banking system is designed to make sure that every trasaction gets recorded within the time you stand before the ATM machine. This system is write-optimized, and you shouldn't crib if your analysis query (read operation) takes a lot of time on such a system.

A Data Warehouse (DW) on the other end, is a database (yes, you are right, it's a database) that is designed for facilitating querying and analysis. Often designed as OLAP (On-Line Analytical Processing) systems, these databases contain read-only data that can be queried and analysed far more efficiently as compared to your regular OLTP application databases. In this sense an OLAP system is designed to be read-optimized.

Separation from your application database also ensures that your business intelligence solution is scalable (your bank and ATMs don't go down just because the CFO asked for a report), better documented and managed (god help the novice who is given the application database diagrams and asked to locate the needle of data in the proverbial haystack of table proliferation), and can answer questions far more efficietly and frequently.

Creation of a DW leads to a direct increase in quality of analyses as the table structures are simpler (you keep only the needed information in simpler tables), standardized (well-documented table structures), and often denormalized (to reduce the linkages between tables and the corresponding complexity of queries). A DW drastically reduces the 'cost-per-analysis' and thus permits more analysis per FTE. Having a well-designed DW is the foundation successful BI/Analytics initiatives are built upon.

No comments: