Thursday, September 30, 2010

Differences between an RDBMS schema and a data warehouse schema

RDBMS Schema
* Main Purpose: Record
* Used for OLTP systems
* Highly Normalized
* Difficult to understand and navigate
* Difficult to extract data from and to solve complex problems with

DWH Schema
* Main Purpose: Respond
* Used for OLAP systems
* De-normalized
* Easy to understand and navigate
* Relatively easy to extract data from and to solve complex problems with

Differences between OLTP and OLAP

The main differences between OLTP and OLAP are:
1. User and System Orientation
OLTP: customer-oriented; used for transaction and query processing by clerks, clients, and IT professionals.
OLAP: market-oriented; used for data analysis by knowledge workers (managers, executives, analysts).
2. Data Contents
OLTP: manages current data, very detail-oriented.
OLAP: manages large amounts of historical data, provides facilities for summarization and aggregation, stores information at different levels of granularity to support decision making process.
3. Database Design
OLTP: adopts an entity-relationship (ER) model and an application-oriented database design.
OLAP: adopts a star, snowflake, or fact constellation model and a subject-oriented database design (see the star-schema sketch after this list).
4. View
OLTP: focuses on the current data within an enterprise or department.
OLAP: spans multiple versions of a database schema due to the evolutionary process of an organization; integrates information from many organizational locations and data stores.
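
To make the "database design" point concrete, here is a minimal star-schema sketch. The table and column names (DimDate, DimProduct, FactSales, and so on) are illustrative assumptions, not taken from any particular system:

-- Hypothetical star schema: one central fact table referencing de-normalized dimensions
CREATE TABLE DimDate (
    DateKey      INT PRIMARY KEY,   -- e.g. 20100930
    FullDate     DATE,
    MonthName    VARCHAR(20),
    CalendarYear INT
);

CREATE TABLE DimProduct (
    ProductKey  INT PRIMARY KEY,
    ProductName VARCHAR(100),
    Category    VARCHAR(50)         -- de-normalized: category name stored inline
);

CREATE TABLE FactSales (
    DateKey      INT REFERENCES DimDate (DateKey),
    ProductKey   INT REFERENCES DimProduct (ProductKey),
    QuantitySold INT,
    SalesAmount  MONEY
);

An application-oriented OLTP design would instead normalize products, categories, orders, and order lines into many small tables; the star schema deliberately trades that redundancy for fewer joins at query time.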

Monday, September 27, 2010

Difference between Database and Data Warehouse

I was just searching the net for the basic difference between these two areas of my core expertise, and one of the best extracts I found is pasted below:

So how is a data warehouse different from your regular database? After all, both are databases, and both have some tables containing data. If you look deeper, you'd find that both have indexes, keys, views, and the regular jing-bang. So is that 'data warehouse' really different from the tables in your application? And if the two aren't really different, maybe you can just run your queries and reports directly from your application databases!

Well, to be fair, that may be just what you are doing right now, running some EOD (end-of-day) reports as complex SQL queries and shipping them off to those who need them. And this scheme might just be serving you fine right now. Nothing wrong with that if it works for you.

But before you start patting yourself on the back for having avoided a data warehouse altogether, do spend a moment to understand the differences, and to appreciate the pros and cons of either approach.

The primary difference between your application database and a data warehouse is that while the former is designed (and optimized) to record, the latter has to be designed (and optimized) to respond to the analysis questions that are critical for your business.

Application databases are OLTP (On-Line Transaction Processing) systems where every transaction has to be recorded, and super-fast at that. Consider the scenario where a bank ATM has disbursed cash to a customer but was unable to record this event in the bank records. If this started happening frequently, the bank wouldn't stay in business for too long. So the banking system is designed to make sure that every transaction gets recorded within the time you stand before the ATM. This system is write-optimized, and you shouldn't crib if your analysis query (read operation) takes a lot of time on such a system.

A Data Warehouse (DW), on the other hand, is a database (yes, you are right, it's a database) that is designed to facilitate querying and analysis. Often designed as OLAP (On-Line Analytical Processing) systems, these databases contain read-only data that can be queried and analysed far more efficiently than your regular OLTP application databases. In this sense an OLAP system is designed to be read-optimized.

Separation from your application database also ensures that your business intelligence solution is scalable (your bank and ATMs don't go down just because the CFO asked for a report), better documented and managed (god help the novice who is given the application database diagrams and asked to locate the needle of data in the proverbial haystack of table proliferation), and can answer questions far more efficiently and frequently.

Creation of a DW leads to a direct increase in the quality of analyses, as the table structures are simpler (you keep only the needed information in simpler tables), standardized (well-documented table structures), and often denormalized (to reduce the linkages between tables and the corresponding complexity of queries). A DW drastically reduces the 'cost-per-analysis' and thus permits more analysis per FTE. A well-designed DW is the foundation on which successful BI/Analytics initiatives are built.
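
To illustrate the 'cost-per-analysis' point, compare the same business question against a normalized OLTP schema and against a de-normalized star schema. Both sets of table names here (Orders, OrderDetails, Products, Categories on the OLTP side; FactSales, DimProduct, DimDate on the DW side) are hypothetical:

-- OLTP (normalized): monthly revenue per category needs a chain of joins
SELECT c.CategoryName, SUM(od.Quantity * od.UnitPrice) AS Revenue
FROM Orders o
INNER JOIN OrderDetails od ON od.OrderID = o.OrderID
INNER JOIN Products p ON p.ProductID = od.ProductID
INNER JOIN Categories c ON c.CategoryID = p.CategoryID
WHERE o.OrderDate >= '20100901' AND o.OrderDate < '20101001'
GROUP BY c.CategoryName;

-- DW (star schema): the same question reads almost like the business question itself
SELECT dp.Category, SUM(fs.SalesAmount) AS Revenue
FROM FactSales fs
INNER JOIN DimProduct dp ON dp.ProductKey = fs.ProductKey
INNER JOIN DimDate dd ON dd.DateKey = fs.DateKey
WHERE dd.CalendarYear = 2010 AND dd.MonthName = 'September'
GROUP BY dp.Category;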

Sunday, September 26, 2010

[T-SQL] How to Stop the Log File from Growing Too Big

In continuation of my last post (http://muhammadabbas.blogspot.com/2010/09/t-sql-truncate-log-file.html), I am writing this article.

Today I was reading an article that highlighted how we can stop the log file from growing. Here is the text:

There are two different solutions for it:

1) Convert the Recovery Model to Simple Recovery

If you are truncating the transaction logs, you are breaking the T-Log LSN chain (Log Sequence Numbers). It follows that if disaster strikes, you will not be able to restore your T-Logs and you will have no option to do a point-in-time recovery. If you are fine with this situation and have nothing to worry about, I suggest that you change your recovery model to the Simple Recovery Model. This way, you will not have extraordinary growth of your log file.
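
For reference, the switch itself is a one-liner (the database name is a placeholder):

-- Switch a database to the SIMPLE recovery model; log space is reused automatically
ALTER DATABASE YourDB SET RECOVERY SIMPLE;
GO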

2) Start Taking Transaction Log Backup

If your business cannot tolerate loss of data or requires point-in-time recovery, you cannot afford anything less than the Full Recovery Model. In the Full Recovery Model, your transaction log will grow until you take a backup of it. You need to take T-Log backups at a regular interval; this way, your log will not grow beyond some limit. If you take an hourly T-Log backup, your T-Log will grow for up to an hour, after which the backup will truncate all the 'committed' transactions. Truncation does not shrink the physical log file much; rather, the space inside it is marked as empty for the next hour's transactions to populate.

With this method, you can restore your database to a point in time if a disaster ever happens on your server.
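
A point-in-time restore would then look roughly like the sketch below. The file names and the STOPAT time are placeholders, and if a backup file holds several backup sets (as with NOINIT below) you would also add WITH FILE = n to pick the right one:

-- Restore the last full backup, leaving the database ready to accept log restores
RESTORE DATABASE SimpleTran FROM DISK = N'D:\SimpleTran.bak' WITH NORECOVERY;
-- Roll the log forward and stop just before the disaster
RESTORE LOG SimpleTran FROM DISK = N'D:\SimpleTran.log'
WITH STOPAT = '2010-09-26 10:30:00', RECOVERY;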

Let us run an example to demonstrate this. In this case, I have done the following steps:

1. Create a sample database in FULL RECOVERY model
2. Take a full backup (a full backup is a must for taking subsequent log backups)
3. Repeat the following operations:
   1. Take a log backup
   2. Insert some rows
   3. Check the size of the log file
4. Clean up

After a short while, you will notice that the log file (.ldf) stops increasing in size, while the backup file keeps growing.

If you have an issue with log file growth, I suggest that you follow either of the above solutions instead of truncating the log.

/* FULL Recovery and Log File Growth */
USE [master]
GO
-- Create Database SimpleTran
IF EXISTS (SELECT name FROM sys.databases WHERE name = N'SimpleTran')
BEGIN
ALTER DATABASE [SimpleTran] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
DROP DATABASE [SimpleTran]
END
GO
CREATE DATABASE [SimpleTran]
GO
-- Set Database backup model to FULL
ALTER DATABASE [SimpleTran] SET RECOVERY FULL
GO
BACKUP DATABASE [SimpleTran] TO DISK = N'D:\SimpleTran.bak' WITH NOFORMAT, NOINIT, NAME = N'SimpleTran-Full Database Backup', SKIP, NOREWIND, NOUNLOAD, STATS = 10
GO
-- Check Database Log File Size
SELECT DB_NAME(database_id) AS DatabaseName,
Name AS Logical_Name,
Physical_Name, (size*8)/1024 SizeMB
FROM sys.master_files
WHERE DB_NAME(database_id) = 'SimpleTran'
GO
-- Create Table in Database with Transaction
USE SimpleTran
GO
IF EXISTS (SELECT * FROM sys.objects WHERE OBJECT_ID = OBJECT_ID(N'[dbo].[RealTempTable]') AND TYPE IN (N'U'))
DROP TABLE [dbo].[RealTempTable]
GO
CREATE TABLE RealTempTable (ID INT)
INSERT INTO RealTempTable (ID)
SELECT TOP 50000 ROW_NUMBER() OVER (ORDER BY a.name) RowID
FROM sys.all_objects a
CROSS JOIN sys.all_objects b
GO
-- Check the size of the Database
SELECT DB_NAME(database_id) AS DatabaseName,
Name AS Logical_Name,
Physical_Name, (size*8)/1024 SizeMB
FROM sys.master_files
WHERE DB_NAME(database_id) = 'SimpleTran'
GO
-- Take Log Backup
BACKUP LOG [SimpleTran] TO DISK = N'D:\SimpleTran.bak' WITH NOFORMAT, NOINIT, NAME = N'SimpleTran-Transaction Log Backup', SKIP, NOREWIND, NOUNLOAD, STATS = 10
GO
-- Run following transaction multiple times and check the size of T-Log
INSERT INTO RealTempTable (ID)
SELECT TOP 50000 ROW_NUMBER() OVER (ORDER BY a.name) RowID
FROM sys.all_objects a
CROSS JOIN sys.all_objects b
GO
-- Check the size of the Database
SELECT DB_NAME(database_id) AS DatabaseName,
Name AS Logical_Name,
Physical_Name, (size*8)/1024 SizeMB
FROM sys.master_files
WHERE DB_NAME(database_id) = 'SimpleTran'
GO
/*
Now run the following code multiple times.
You will notice that it does not increase the size of the .ldf file,
but it will certainly increase the size of the log backup.
*/
-- Second Time
-- START
BACKUP LOG [SimpleTran] TO DISK = N'D:\SimpleTran.log' WITH NOFORMAT, NOINIT, NAME = N'SimpleTran-Transaction Log Backup', SKIP, NOREWIND, NOUNLOAD, STATS = 10
GO
-- Run following transaction and check the size of T-Log
INSERT INTO RealTempTable (ID)
SELECT TOP 50000 ROW_NUMBER() OVER (ORDER BY a.name) RowID
FROM sys.all_objects a
CROSS JOIN sys.all_objects b
GO
-- Check the size of the Database
SELECT DB_NAME(database_id) AS DatabaseName,
Name AS Logical_Name,
Physical_Name, (size*8)/1024 SizeMB
FROM sys.master_files
WHERE DB_NAME(database_id) = 'SimpleTran'
GO
-- END
--Clean Up
USE [master]
GO
IF EXISTS (SELECT name FROM sys.databases WHERE name = N'SimpleTran')
BEGIN
ALTER DATABASE [SimpleTran] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
DROP DATABASE [SimpleTran]
END

[T-SQL] Truncate Log File

A few days back, I was called by a friend who was working on a community database, with the problem that the log file of his DB was growing bigger and bigger and he was worried about it. So I went there, saw that it was not a big issue, and told him I would prepare a script for him so that he could run it once a month:

USE TEST;
GO
-- Truncate the log by changing the database recovery model to SIMPLE.
ALTER DATABASE TEST
SET RECOVERY SIMPLE;
GO
-- Shrink the truncated log file to 1 MB.
DBCC SHRINKFILE (TEST_log, 1);
GO
-- Reset the database recovery model.
ALTER DATABASE TEST
SET RECOVERY FULL;
GO
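
One caveat worth adding to this script: switching to SIMPLE breaks the log backup chain, so after setting the database back to FULL it effectively stays in auto-truncate mode until a new full (or differential) backup is taken. The script should therefore really end with something like the following (the backup path is a placeholder):

-- Re-establish the log backup chain after returning to FULL recovery
BACKUP DATABASE TEST TO DISK = N'D:\TEST.bak' WITH INIT;
GO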

Thursday, September 23, 2010

[T-SQL] Generate Delete script for all tables

You can use this script to generate a delete script, or even a truncate one (see the variant after the query below):

select 'delete from ' + o.name + ';'
from sysobjects o
where o.type = 'U';
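
For the truncate variant, the same idea against the newer catalog views might look like this. Keep in mind that TRUNCATE TABLE will fail on any table referenced by a foreign key:

-- Generate TRUNCATE statements for all user tables, schema-qualified
SELECT 'TRUNCATE TABLE [' + s.name + '].[' + t.name + '];'
FROM sys.tables t
INNER JOIN sys.schemas s ON t.schema_id = s.schema_id
WHERE t.is_ms_shipped = 0;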

Wednesday, September 22, 2010

[T-SQL] Delete all data in database (when you have FKs)

SET NOCOUNT ON
GO

SELECT 'USE [' + db_name() +']';

SELECT 'ALTER TABLE ' +
'[' + s.name + '].[' + t.name + ']' +
' DROP CONSTRAINT [' + f.name +']'
FROM sys.foreign_keys f
INNER JOIN sys.TABLES t ON f.parent_object_id=t.object_id
INNER JOIN sys.schemas s ON t.schema_id=s.schema_id
WHERE t.is_ms_shipped=0;


SELECT 'TRUNCATE TABLE ' + '[' + s.name + '].[' + t.name + ']'
FROM sys.TABLES t
INNER JOIN sys.schemas s ON t.schema_id=s.schema_id
WHERE t.is_ms_shipped=0;


SELECT 'ALTER TABLE ' +
'[' + s.name + '].[' + t.name + ']' +
' ADD CONSTRAINT [' + f.name + ']' +
' FOREIGN KEY (' +
Stuff( (SELECT ', ['+col_name(fk.parent_object_id, fk.parent_column_id) +']'
FROM sys.foreign_key_columns fk
WHERE constraint_object_id = f.object_id
ORDER BY constraint_column_id
FOR XML Path('')
), 1,2,'') + ')' +
' REFERENCES [' +
object_schema_name(f.referenced_object_id)+'].['+object_name(f.referenced_object_id) + '] (' +
Stuff((SELECT ', ['+col_name(fc.referenced_object_id, fc.referenced_column_id)+']'
FROM sys.foreign_key_columns fc
WHERE constraint_object_id = f.object_id
ORDER BY constraint_column_id
FOR XML Path('')),
1,2,'') +
')' +
' ON DELETE ' + REPLACE(f.delete_referential_action_desc, '_', ' ') +
' ON UPDATE ' + REPLACE(f.update_referential_action_desc , '_', ' ') collate database_default
FROM sys.foreign_keys f
INNER JOIN sys.TABLES t ON f.parent_object_id=t.object_id
INNER JOIN sys.schemas s ON t.schema_id=s.schema_id
WHERE t.is_ms_shipped=0;
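
To use this, run it in SSMS with results-to-text (Ctrl+T), then copy the generated DROP CONSTRAINT, TRUNCATE, and ADD CONSTRAINT statements into a new window and execute them there. If you would rather have a self-running version, one rough sketch (shown for the DROP CONSTRAINT step only; the other two steps follow the same pattern) is to concatenate the statements into a variable and execute it:

-- Rough sketch: build the DROP CONSTRAINT statements into one batch and run it
DECLARE @sql NVARCHAR(MAX);
SET @sql = N'';
SELECT @sql = @sql + N'ALTER TABLE [' + s.name + '].[' + t.name
            + N'] DROP CONSTRAINT [' + f.name + N'];' + NCHAR(13)
FROM sys.foreign_keys f
INNER JOIN sys.tables t ON f.parent_object_id = t.object_id
INNER JOIN sys.schemas s ON t.schema_id = s.schema_id
WHERE t.is_ms_shipped = 0;
EXEC sp_executesql @sql;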

Monday, September 20, 2010

[T-SQL] Row Count in Tables

Below is a query through which you can get row counts for all the tables:

SELECT sc.name +'.'+ ta.name TableName
,SUM(pa.rows) RowCnt
FROM sys.tables ta
INNER JOIN sys.partitions pa
ON pa.OBJECT_ID = ta.OBJECT_ID
INNER JOIN sys.schemas sc
ON ta.schema_id = sc.schema_id
WHERE ta.is_ms_shipped = 0 AND pa.index_id IN (1,0)
GROUP BY sc.name,ta.name
ORDER BY SUM(pa.rows) DESC

[T-SQL] Find Auto Created Stats and Delete query for them

Query for getting them:

SELECT name, OBJECT_NAME(OBJECT_ID)
FROM sys.stats
WHERE auto_created = 1;

Delete query for one table's statistics:

DROP STATISTICS TableName.StatisticsName
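
In the same generate-a-script spirit as the posts above, this produces a DROP STATISTICS statement for every auto-created statistic in the database at once:

-- Generate DROP STATISTICS statements for all auto-created statistics
SELECT 'DROP STATISTICS [' + OBJECT_SCHEMA_NAME(object_id) + '].['
     + OBJECT_NAME(object_id) + '].[' + name + '];'
FROM sys.stats
WHERE auto_created = 1;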

Thursday, September 16, 2010

[Personal] Blessed by a baby boy

On the 23rd of Ramazan, I was blessed with a baby boy. Too cuteeeee.

His grandfather has named him "Ali Imran".

[T-SQL] Transport Level Error

This is an error which I have encountered a lot these days:

A transport-level error has occurred when sending the request to the server.

Here is the reason for it:

You used to have an open connection to a server but you don’t any more. It was closed or killed from the server side. Maybe the server rebooted (or failed-over to another cluster node). You’re talking into one tin can, but nobody’s on the tin can on the other end of the string.

Annoying, but not a problem… open a new connection and try again. If you see this error message from inside SQL Server Management Studio, just hit F5 again (or Alt-X or Ctrl-E depending on preference).

[T-SQL] Very Nice Article Regarding T-SQL

A very nice article regarding the don'ts of T-SQL:

http://michaeljswart.com/?p=823