Search This Blog

Loading...

Tuesday, December 9, 2014

[DI] ETL versus ELT

When does ETL win?

  • Ordered transformations not well suited to set processing.
  • Integration of third party software tools best managed by Informatica outside of the RDBMS (e.g., name and address standardization utilities).
  • Maximize in-memory execution for multiple step transformations that do not require access to large volumes of historical or lookup data (note: caching plays a role).
  • Streaming data loads using message-based feeds with "real-time" data acquisition.

When does ELT win?

  • Leverage of high performance DW platform for execution reduces capacity requirements on ETL servers - this is especially useful when peak requirements for data integration are in a different window than peak requirements for data warehouse analytics.
  • Significantly reduce data retrieval overhead for transformations that require access to historical data or large cardinality lookup data already in the data warehouse.
  • Batch or mini-batch loads with reasonably large data sets, especially with pre-existing indices that may be leveraged for processing.
  • Optimize performance for large scale operations that are well suited for set operations such as complex joins and large cardinality aggregations.

[DI] A Taxonomy of Data Integration Techniques

 
 There are three main approaches:


1.ETL Approach: (1) Extract from the source systems, (2) Transform inside the Informatica engine on integration engine servers, and (3) Load into target tables in the data warehouse.

2.ELT Approach: (1) Extract from the source systems, (2) Load into staging tables inside the data warehouse RDBMS servers, and (3) Transform inside the RDBMS engine using generated SQL with a final insert into the target tables in the data warehouse.

3.Hybrid ETLT Approach: (1) Extract from the source systems, (2) Transform inside the Informatica engine on integration engine servers, (3) Load into staging tables in the data warehouse, and (4) apply further Transformations inside the RDBMS engine using generated SQL with a final insert into the target tables in the data warehouse.


Tuesday, February 19, 2013

[Informatica] Partitioning option license required to run sessions with user-defined partition points

Partitioning option license required to run sessions with user-defined partition points.


Quote: TM_6281: Partitioning option license required to run sessions with user-defined partition points.

When I de-select the "Sorted Input" property, it runs fine but it spends needless time caching my entire table to file. Does anyone know why this might be happening and/or a possible resolution? " " To run a PowerCenter session with a sorted Aggregator transformation set the server parameter AggSupprtWithNoPartLic to "Yes" as follows:

Unix Using a text editor open the PowerCenter server configuration (pmserver.cfg) file.
Add the following entry to the end of the file:
AggSupprtWithNoPartLic = Yes
Re-start the PowerCenter server (pmserver).

Windows To set this parameter on Windows refer to article 11486
(article 11486) To add a PowerCenter Server parameter (that is not in the server configuration dialog) on Windows enter it in the registry as follows:

Click Start, click Run, type regedit, click OK. Go to the following registry key: HKEY_LOCAL_MACHINE\ SYSTEM\CurrentControlSet\Services\PowerMart\Paramet ers\Configuration
On the Edit menu, point to New, and then click String Value. Enter the String Value "SERVER_PARAMETER". Go to Edit > Modify. Enter "VALUE" and then click OK Re-start the PowerCenter server (Informatica Service). Where SERVER_PARAMETER is the name of the parameter and VALUE is the setting of the parameter."

Saturday, November 3, 2012

[SQL Server] Code to Enable TDE on a Database

-- The master key must be in the master database.

USE Master;

GO

-- Create the master key.

CREATE MASTER KEY ENCRYPTION

BY PASSWORD='MyStrongPassword';

GO

-- Create a certificate.

CREATE CERTIFICATE MySQLCert

WITH SUBJECT='MyDatabase DEK';

GO

-- Use the database to enable TDE.

USE MyDatabase

GO

-- Associate the certificate to MyDatabase.

CREATE DATABASE ENCRYPTION KEY

WITH ALGORITHM = AES_128

ENCRYPTION BY SERVER CERTIFICATE MySQLCert;

GO

-- Encrypt the database.

ALTER DATABASE MyDatabase

SET ENCRYPTION ON;

GO

Saturday, August 25, 2012

[T-SQL] Database backup procedure - advanced

I have got this script from one of the blogs:

if exists (select * from sysobjects where id = object_id(N'[dbo].[spBackupAllDatabases]') and OBJECTPROPERTY(id, N'IsProcedure') = 1)
drop procedure [dbo].[spBackupAllDatabases]
go

Create Procedure spBackupAllDatabases
@Path varchar(128) ,
@Type varchar(4) -- Full / Log
as
/*
Backup file format
_Full_yyyymmdd_hhmmss.bak
_Log_yyyymmdd_hhmmss.bak

exec spBackupAllDatabases 'c:\SQLBackups\', 'Full'
*/
/*
drop table DatabaseBackup

Create table DatabaseBackup
(
Name varchar(128) primary key nonclustered ,
BackupFlagFull varchar(1) not null check (BackupFlagFull in ('Y','N')) ,
BackupFlagLog varchar(1) not null check (BackupFlagLog in ('Y','N')) ,
RetentionPeriodFull datetime not null ,
RetentionPeriodLog datetime not null
)
*/
set nocount on
declare @sql varchar(1000)

-- Get all database names
create table #DBName
(
ID int identity (1,1) ,
Name varchar(128) not null ,
RetentionPeriod datetime null
)

insert #DBName
(Name)
select name
from master..sysdatabases

-- Include any new databases in the backup
insert DatabaseBackup
(
Name ,
BackupFlagFull ,
BackupFlagLog ,
RetentionPeriodFull ,
RetentionPeriodLog
)
select #DBName.Name ,
'Y' ,
'N' ,
'7 jan 1900' , -- default one week
'1 jan 1900'
from #DBName
left outer join DatabaseBackup
on DatabaseBackup.Name = #DBName.Name
where DatabaseBackup.Name is null
and lower(#DBName.Name) <> 'tempdb'

-- Remove any non-existant databases
delete DatabaseBackup
where not exists
(
select *
from #DBName
where #DBName.Name = DatabaseBackup.Name
)

delete #DBName

create table #ExistingBackups
(
Name varchar(128) ,
ID int identity (1,1)
)

-- loop through databases
declare @Name varchar(128) ,
@RetentionPeriod datetime ,
@LastBackupToKeep varchar(8) ,
@ID int ,
@MaxID int

insert #DBName
(Name, RetentionPeriod)
select Name, case when @Type = 'Full' then RetentionPeriodFull else RetentionPeriodLog end
from DatabaseBackup
where (@Type = 'Full' and BackupFlagFull = 'Y')
or (@Type = 'Log' and BackupFlagLog = 'Y')

select @MaxID = max(ID) ,
@ID = 0
from #DBName

while @ID < @MaxID begin -- get next database to backup select @ID = min(ID) from #DBName where ID > @ID

select @Name = Name ,
@RetentionPeriod = RetentionPeriod
from #DBName
where ID = @ID

-- Delete old backups
delete #ExistingBackups
select @sql = 'dir /B ' + @Path
select @sql = @sql + @Name + '_' + @Type + '*.*'

insert #ExistingBackups exec master..xp_cmdshell @sql

if exists (select * from #ExistingBackups where Name like '%File Not Found%')
delete #ExistingBackups

select @LastBackupToKeep = convert(varchar(8),getdate() - @RetentionPeriod,112)
delete #ExistingBackups where Name > @Name + '_' + @Type + '_' + @LastBackupToKeep

declare @eID int ,
@eMaxID int ,
@eName varchar(128)

-- loop round all the out of date backups
select @eID = 0 ,
@eMaxID = coalesce(max(ID), 0)
from #ExistingBackups

while @eID < @eMaxID begin select @eID = min(ID) from #ExistingBackups where ID > @eID
select @eName = Name from #ExistingBackups where ID = @eID

select @sql = 'del ' + @Path + @eName
exec master..xp_cmdshell @sql
end
delete #ExistingBackups

-- now do the backup
select @sql = @Path + @Name + '_' + @Type + '_'
+ convert(varchar(8),getdate(),112) + '_'
+ replace(convert(varchar(8),getdate(),108),':','') + '.bak'
if @Type = 'Full'
backup database @Name
to disk = @sql
else
backup log @Name
to disk = @sql
end

go

[Informatica] Tutorial Part 1

Here is the link if Informatica Tutorial Part 1 for beginners:

http://www.youtube.com/watch?v=ufH_n5exxQw

[T-SQL] Shrink Log File


use DatabaseName

BACKUP LOG DatabaseName With TRUNCATE_ONLY
GO
DBCC SHRINKFILE (Database_Log_FileName,300)
GO