Muhammad Abbas - Data Engineering Architect: Teradata


Saturday, January 10, 2015

[Teradata] Temporal table example

I wanted to show you an example of the TEMPORAL feature in Teradata. It's a pretty neat feature.

Basically, it lets us keep a history of changes where required. For example, the FULL REFRESH dimensions will change (based on customer feedback) to CDC (change data capture). This is normally a pretty complex process.

With the TEMPORAL feature, coupled with UPSERTS (MERGE), it becomes quite simple.

First - TEMPORAL. Basically, Teradata manages a "PERIOD" data type column under the covers of the table, instead of a process having to manage the classic START_DT and END_DT logic. In fact, you don't even need to know about the column.

Also - when a user does a select * from the table, they will only get the CURRENT rows returned.
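To make the "classic START_DT and END_DT logic" concrete, here is a minimal sketch of what a process has to do by hand when there is no temporal support. It uses Python's built-in sqlite3 as a stand-in, so this is NOT Teradata syntax. The table name pack_size_hist, the OPEN_END marker and both change timestamps are assumptions made up for illustration; only the ID of 3 and the values 40 and 95 come from the example in this post.

```python
import sqlite3

# SQLite stands in for the warehouse here; this is NOT Teradata syntax.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE pack_size_hist (
        auto_purchase_pack_size_id INTEGER,
        duration_mins INTEGER,
        start_dt TEXT NOT NULL,
        end_dt TEXT NOT NULL
    )
    """
)

OPEN_END = "9999-12-31 23:59:59"  # hypothetical "still current" marker


def apply_change(conn, row_id, duration_mins, change_ts):
    """Close the current version of row_id (if any), then open a new one."""
    conn.execute(
        "UPDATE pack_size_hist SET end_dt = ? "
        "WHERE auto_purchase_pack_size_id = ? AND end_dt = ?",
        (change_ts, row_id, OPEN_END),
    )
    conn.execute(
        "INSERT INTO pack_size_hist VALUES (?, ?, ?, ?)",
        (row_id, duration_mins, change_ts, OPEN_END),
    )


# Both timestamps below are made up for the sketch.
apply_change(conn, 3, 40, "2015-01-10 11:00:00")  # initial load
apply_change(conn, 3, 95, "2015-01-10 11:30:00")  # later change

# "Current" rows are the ones whose end_dt is still open.
current = conn.execute(
    "SELECT duration_mins FROM pack_size_hist WHERE end_dt = ?", (OPEN_END,)
).fetchall()
history = conn.execute(
    "SELECT duration_mins, start_dt, end_dt FROM pack_size_hist ORDER BY start_dt"
).fetchall()
print(current)
print(history)
```

Every writer and every reader has to respect the end_dt convention in this version. That bookkeeping is what the TRANSACTIONTIME column takes over.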

Please see this example to help.

Example FULL REFRESH Table that will move to this process:

CREATE MULTISET TABLE EDW_STAGE.tempo_test_auto_purchase_pack_size ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO
     (
      auto_purchase_pack_size_id INTEGER,
      duration_mins INTEGER,
      pack_size INTEGER,
      display_order INTEGER,
      is_active CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,
      is_default CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,
      created_dt TIMESTAMP(6),
      created_by_user_id VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      last_update_dt TIMESTAMP(6),
      last_updated_by_user_id VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      is_deleted CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,
      source_sys_id SMALLINT,
      company_brand_cd VARCHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC,
      duration PERIOD(TIMESTAMP(6) WITH TIME ZONE) NOT NULL
          AS TRANSACTIONTIME
      )
 PRIMARY INDEX ( auto_purchase_pack_size_id );

NOTE the column called duration - we never insert values into it. Teradata manages it.

I will do an initial insert of 3 rows, and then select them:

select * from tempo_test_auto_purchase_pack_size;

Note the 2nd row (ID of 3), where the column duration_mins is 40.

We issue an update (or merge) to this table:

update tempo_test_auto_purchase_pack_size set duration_mins = 95 where auto_purchase_pack_size_id = 3;

Now look at the result of the same select:

select * from tempo_test_auto_purchase_pack_size;

Note that only the LATEST row, with the new value, is returned. BUT if we want to see history as of a point in time:

  TRANSACTIONTIME AS OF TIMESTAMP '2015-01-10 11:24:10' select * from tempo_test_auto_purchase_pack_size;

And finally - to see them all:

 NONSEQUENCED TRANSACTIONTIME select * from tempo_test_auto_purchase_pack_size;

Note that with this query you see the duration column for this temporal feature.
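The three kinds of select above (current rows, rows as of a timestamp, and all versions) can be modeled in a few lines of plain Python. This is only a rough mental model of transaction-time behaviour, not how Teradata implements it. The helper names and the insert and update timestamps are made up, and the sketch assumes the AS OF timestamp from the query above falls before the update ran.

```python
from datetime import datetime

UNTIL_CLOSED = datetime.max  # stand-in for an open-ended period

# Each version is (id, duration_mins, period_begin, period_end).
versions = []


def insert(row_id, duration_mins, now):
    """Add a new open-ended version."""
    versions.append((row_id, duration_mins, now, UNTIL_CLOSED))


def update(row_id, duration_mins, now):
    """Close the open version and append a new one; old values are kept."""
    for i, (rid, mins, begin, end) in enumerate(versions):
        if rid == row_id and end == UNTIL_CLOSED:
            versions[i] = (rid, mins, begin, now)
    versions.append((row_id, duration_mins, now, UNTIL_CLOSED))


def current():
    """Like a plain select: only versions that are still open."""
    return [(rid, mins) for rid, mins, _, end in versions if end == UNTIL_CLOSED]


def as_of(ts):
    """Like AS OF: versions whose period contains ts (end is exclusive)."""
    return [(rid, mins) for rid, mins, begin, end in versions if begin <= ts < end]


def all_versions():
    """Like NONSEQUENCED: every version, with its period visible."""
    return list(versions)


# Insert and update times are hypothetical.
insert(3, 40, datetime(2015, 1, 10, 11, 0, 0))
update(3, 95, datetime(2015, 1, 10, 11, 30, 0))

print(current())                                # only the latest value
print(as_of(datetime(2015, 1, 10, 11, 24, 10)))  # the value before the update
print(len(all_versions()))                      # both versions
```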

Hope this helps.

Tuesday, December 9, 2014

[DI] ETL versus ELT

When does ETL win?

Ordered transformations not well suited to set processing.
Integration of third-party software tools best managed by Informatica outside of the RDBMS (e.g., name and address standardization utilities).
Maximize in-memory execution for multiple step transformations that do not require access to large volumes of historical or lookup data (note: caching plays a role).
Streaming data loads using message-based feeds with "real-time" data acquisition.

When does ELT win?

Leverage of high performance DW platform for execution reduces capacity requirements on ETL servers - this is especially useful when peak requirements for data integration are in a different window than peak requirements for data warehouse analytics.
Significantly reduce data retrieval overhead for transformations that require access to historical data or large cardinality lookup data already in the data warehouse.
Batch or mini-batch loads with reasonably large data sets, especially with pre-existing indices that may be leveraged for processing.
Optimize performance for large scale operations that are well suited for set operations such as complex joins and large cardinality aggregations.

[DI] A Taxonomy of Data Integration Techniques

There are three main approaches:

1. ETL Approach: (1) Extract from the source systems, (2) Transform inside the Informatica engine on integration engine servers, and (3) Load into target tables in the data warehouse.

2. ELT Approach: (1) Extract from the source systems, (2) Load into staging tables inside the data warehouse RDBMS servers, and (3) Transform inside the RDBMS engine using generated SQL with a final insert into the target tables in the data warehouse.

3. Hybrid ETLT Approach: (1) Extract from the source systems, (2) Transform inside the Informatica engine on integration engine servers, (3) Load into staging tables in the data warehouse, and (4) apply further Transformations inside the RDBMS engine using generated SQL with a final insert into the target tables in the data warehouse.
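The difference between the three approaches is mostly about where the transform step runs. A toy sketch, under heavy assumptions: plain Python stands in for the integration engine, an in-memory SQLite database stands in for the data warehouse RDBMS, and the rows, table names and the trim/lowercase transform are all made up for illustration.

```python
import sqlite3

source_rows = [(1, " alice "), (2, "BOB")]  # made-up extract from a source system


def etl(conn, rows):
    # Transform in the integration layer, then load straight into the target.
    cleaned = [(i, name.strip().lower()) for i, name in rows]
    conn.execute("CREATE TABLE target_etl (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO target_etl VALUES (?, ?)", cleaned)


def elt(conn, rows):
    # Load raw rows into staging, then transform with set-based SQL in the database.
    conn.execute("CREATE TABLE stage_elt (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO stage_elt VALUES (?, ?)", rows)
    conn.execute("CREATE TABLE target_elt (id INTEGER, name TEXT)")
    conn.execute(
        "INSERT INTO target_elt SELECT id, lower(trim(name)) FROM stage_elt"
    )


def etlt(conn, rows):
    # Hybrid: part of the transform outside (strip), the rest in SQL (lower).
    partly_cleaned = [(i, name.strip()) for i, name in rows]
    conn.execute("CREATE TABLE stage_etlt (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO stage_etlt VALUES (?, ?)", partly_cleaned)
    conn.execute("CREATE TABLE target_etlt (id INTEGER, name TEXT)")
    conn.execute("INSERT INTO target_etlt SELECT id, lower(name) FROM stage_etlt")


conn = sqlite3.connect(":memory:")
etl(conn, source_rows)
elt(conn, source_rows)
etlt(conn, source_rows)

etl_result = conn.execute("SELECT * FROM target_etl ORDER BY id").fetchall()
elt_result = conn.execute("SELECT * FROM target_elt ORDER BY id").fetchall()
etlt_result = conn.execute("SELECT * FROM target_etlt ORDER BY id").fetchall()
print(etl_result, elt_result, etlt_result)
```

All three end up with the same target rows. What changes is which servers carry the transformation load.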
