Muhammad Abbas - Data Engineering Architect

Sunday, October 25, 2015

[informatica] Get stats/statistics mantained by Informatica

So below are few queries which would help you to get the statistics which is saved by Informatica into its repository and the one which is available in Informatica monitor to view:

1. From the below query, you can take up workflow information. Also workflow run id is important, if you need to see more detailed info from other tables:

select* from dbo.OPB_WFLOW_RUN

where WORKFLOW_NAME='wf_'

order by workflow_run_id;

2. From the below query, you can take up session information, like session start time and other details, but it wont show you the row counts that moved.

select *

from dbo.REP_TASK_INST_RUN

where subject_area='INTG_'

and WORKFLOW_NAME='wf_'

and workflow_run_id=2574343;

3. From the below query, you can take up other session details, like how much rows were moved etc.

select * from
OPB_SESS_TASK_LOG where workflow_run_id=2574343;

Wednesday, October 7, 2015

[scripting] Alternate for touch command in Windows for Unix

I was working on a requirement where I need to process Informatica workflow from a source file. But the issue was that file might exist and in some case it might get missing. So was searching for replacement of touch command in unix and here is the code:

$PMSourceFileDir\$$SOURCE_FILENAME --- is your file path

type nul >>$PMSourceFileDir\$$SOURCE_FILENAME & copy $PMSourceFileDir\$$SOURCE_FILENAME +,,

Tuesday, February 24, 2015

[Mongodb] M-102 Final Exam Answer

Final 1
-------
Answer: 9

Final 2
-------
Answer: 2
Mongo preserves the order of writes in a collection in its consistency model. In this problem, 27003's oplog was effectively a "fork" and to preserve write ordering a rollback was necessary during 27003's recovery phase.

Final 3
-------
3 documents
> bsondump test.foo.2013-03-11T10-25-48.0.bson
{ "_id" : 7 }
{ "_id" : 8 }
{ "_id" : 9 }
3 objects found

Final 4
--------
z:PRIMARY> cfg = rs.conf()
z:PRIMARY> cfg.members[2].votes = 0
z:PRIMARY> cfg
{
        "_id" : "z",
        "version" : 1,
        "members" : [
                {
                        "_id" : 1,
                        "host" : "localhost:27001"
                },
                {
                        "_id" : 2,
                        "host" : "localhost:27002"
                },
                {
                        "_id" : 3,
                        "host" : "localhost:27003",
                        "votes" : 0
                }
        ]
}
z:PRIMARY> rs.reconfig(cfg)
{ "ok" : 1 }

~% mongo --shell a.js --port 27003
z:SECONDARY> part4()
233
Answer: 233
Final 5
-------
z:PRIMARY> a = {_id : 1,
author : 'joe',
title : 'Too big to fail',
text : 'Sample text',
tags : [ 'business', 'finance' ],
when : ISODate("2008-11-03"),
views : 23002,
votes : 4,
voters : ['joe', 'jane', 'bob', 'somesh'],
comments : [
    { commenter : 'allan',
      comment : 'Well, i dont think so…',
      flagged:false, plus:2 },
    { commenter : 'chuck',
      comment : 'where is',
      flagged:true, plus:5 },
    { commenter : 'norris',
      comment : 'my leg',
      flagged:true, plus:77777777 },

]
}

z:PRIMARY> db.postings.insert(a)
One way to assure people vote at most once per posting is to use this form of update:
db.postings.update(
{ _id:… , voters:{$ne:'joe'} },
{ $inc : {votes:1}, $push : {voters:'joe'} } );
Answer: db.postings.find( { "comments.flagged" : true } )
        db.postings.update(
                   { _id:… , voters:{$ne:'joe'} },
                   { $inc : {votes:1}, $push : {voters:'joe'} } );

Final 6
-------
MongoDB supports atomic operations on individual documents.
MongoDB has a data type for Dates and DateTime data.
Answer: 2 & 4
Final 7
-------
MongoDB supports reads from slaves/secondaries that are in remote locations.
Answer: 2

Final 8
-------
mkdir ./data
mkdir ./data/configdb
mongod --configsvr --dbpath ./data/configdb --port 27019
mongorestore ./config_server --host 127.0.0.1:27019

~%/Documents/Mongo/dba/final.8$
~%/Documents/Mongo/dba/final.8$ mongo localhost:27019/config
MongoDB shell version: 2.2.2
connecting to: localhost:27018/config
configsvr>
configsvr> db
config
configsvr> db.chunks.find().sort({_id:1}).next().lastmodEpoch.toString().substr(20,4)
6554
configsvr> ^C
bye
~%/Documents/Mongo/dba/final.8$
> configsvr> db.chunks.find().sort({_id:1}).next().lastmodEpoch.getTimestamp().toString().substr(19,5)
> 07:07
Answer: 07:07

Final 9
-------
mongorestore --oplogReplay s1 --host SPHINX --port 27017
mongorestore --oplogReplay s2 --host SPHINX --port 27017
-- Update shards
>db.shards.update({"_id":"s1"},{$set : {"host":"localhost:27501"}})
>db.shards.update({"_id":"s2"},{$set : {"host":"localhost:27601"}})
configsvr> db.shards.find()
{ "_id" : "s1", "host" : "localhost:27501" }
{ "_id" : "s2", "host" : "localhost:27601" }

-- Run des deux shards
-- FAUX mongod --shardsvr --port 27501 --fork --logpath s1.log --replSet s1
-- FAUX mongod --shardsvr --port 27601 --fork --logpath s2.log --replSet s2
-- Pas de RS à préciser
>mongod --shardsvr --port 27501 --fork --logpath s1.log
> (remove /data/db/mongo.lock si besoin)
>mongod --shardsvr --port 27601 --fork --logpath s2.log
mongos> use snps
switched to db snps
mongos> db.elegans.aggregate([{$match:{N2:"T"}},{$group:{_id:"$N2",n:{$sum:1}}}]).result[0].n
47664

Final 10
--------
mongos> db.elegans.ensureIndex({"N2":1,"mutant":1})
removed == 1 + _files.size()
mongos>
Answer: 2 shards are queried.
        10 documents are scanned.

Friday, February 13, 2015

[Mongodb] M-102 Week 6 Assignments

---- 6.1
-- Stats sur la collections initiale
mongos> db.trades.stats()
{
"sharded" : false,
"primary" : "shard0000",
"ns" : "week6.trades",
"count" : 1000001,
"size" : 244000240,
"avgObjSize" : 243.999996000004,
"storageSize" : 315006976,
"numExtents" : 15,
"nindexes" : 1,
"lastExtentSize" : 84426752,
"paddingFactor" : 1,
"systemFlags" : 1,
"userFlags" : 0,
"totalIndexSize" : 32458720,
"indexSizes" : {
"_id_" : 32458720
},
"ok" : 1
}
-- On retrouve le même nombre de document dans le shard car il est unique
mongos> homework.a()
1000001

---- 6.2
> 3

---- 6.3
> 2 (shards bien presents)

[Mongodb] M-102 Week 5 Assignments

Homework: Homework 5.1

Set up a replica set that includes an arbiter.

To demonstrate that you have done this, what is the text in the "state" field for the arbiter when you run rs.status()?

Homework: Homework 5.2

You have a replica set with two members, having decided that a total of two copies of your data is sufficient.

You are concerned about robustness, however.

Which of the following are options that will allow you to ensure that failover can occur if the first server goes down?

Check all that apply.

[yes] Add an arbiter.

Increase the number of votes for the first replica set member from one to two.

Increase the priority of the first server from one to two.

[yes] Add another data bearing node.

Homework: Homework 5.3

At the beginning of the section, "W Timeout and Capacity Planning", Dwight mentioned something that hasn't yet been touched upon in this class: Batch Inserts.

He mentions that you can find information on them in the docs, but he hasn't used them in the shell during this course.

Your job is to look this functionality up in the documentation, and use it for the following problem:

Please insert into the m102 collection, the following 3 documents, using only one shell query (please use double quotes around each key in each document in your answer):

{ "a" : 1 }
{ "b" : 2 }
{ "c" : 3 }

Hints: This is not the same as the Bulk() operations, which were discussed earlier in this course. Also, this does not involve semicolons, so don't put any into the text box. You probably want to test this on your machine, and then copy and paste the insert query in the box below. Do not include the > sign from the beginning of the shell.

db.m102.insert([ { "a" : 1 }, { "b" : 2 }, { "c" : 3 } ])

Homework: Homework 5.4

You would like to create a replica set that is robust to data center failure.

You only have two data centers available. Which arrangement(s) of servers will allow you to be stay up (as in, still able to elect a primary) in the event of a failure of either data center (but not both at once)? Check all that apply.

All 3 servers in data center 1.

2servers in data center 1, one server in data center 2.

[yes] None of the above.

Homework: Homework 5.5

Consider the following scenario: You have a two member replica set, a primary, and a secondary.

The data center with the primary goes down, and is expected to remain down for the foreseeable future. Your secondary is now the only copy of your data, and it is not accepting writes. You want to reconfigure your replica set config to exclude the primary, and allow your secondary to be elected, but you run into trouble. Find out the optional parameter that you'll need, and input it into the box below for your rs.reconfigure(new_cfg, OPTIONAL PARAMETER).

Use "double quotes" around the key(s) in your document.

Hint: You may want to use this documentation page to solve this problem.

{"force" : true}

[Mongodb] M-102 Week 4 Assignments

# starting as a standalone server for problem 1:
mongod --dbpath 1 --port 27001 --smallfiles --oplogSize 50
mongo --port 27001 --shell week4.js
> homework.init()
> homework.a()

result : 5001

rs.init()
use week4
> db.foo.find()
> homework.b()

answer: 5002

mongo --port 27002 week4.js --shell
> homework.c()

result : 5

rs.help()
> rs.stepDown(300)
cfg = rs.conf()
cfg.members.shift()
rs.reconfig(cfg)
homework.d()

answer: 6

> // expect to get back false as we are not primary
> db.isMaster().ismaster
false
> db.oplog.rs.find()
> db.oplog.rs.stats()
db.oplog.rs.find().sort({$natural:1}).limit(1).next().o.msg[0]

answer: R

[Mongodb] M-102 Week 3 Assignments

-- 3.1

db.sensor_readings.find( {   tstamp : {      $gte : ISODate("2012-08-01"),      $lte : ISODate("2012-09-01")    },   active : true } ).limit(3).explain()
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 3,
"nscannedObjects" : 2696,
"nscanned" : 2696,
"nscannedObjectsAllPlans" : 2696,
"nscannedAllPlans" : 2696,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 6,
"indexBounds" : {

},
"server" : "cmo3233:27017"
}

> db.sensor_readings.ensureIndex({tstamp:1, active:1})
> db.sensor_readings.find( {   tstamp : {      $gte : ISODate("2012-08-01"),      $lte : ISODate("2012-09-01")    },   active : true } ).limit(3).explain()
{
"cursor" : "BtreeCursor tstamp_1_active_1",
"isMultiKey" : false,
"n" : 3,
"nscannedObjects" : 3,
"nscanned" : 5,
"nscannedObjectsAllPlans" : 7,
"nscannedAllPlans" : 9,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
  "tstamp" : [
   [
    ISODate("2012-08-01T00:00:00Z"),
    ISODate("2012-09-01T00:00:00Z")
   ]
  ],
  "active" : [
   [
    true,
    true
   ]
  ]
},
"server" : "cmo3233:27017"
}
============
> homework.a()
That's good - you can submit the answer below (just submit the number). however there
is an index you could
create that will get nscanned down to 3 for the query of interest. see if you can
do that or watch the answer video for details on that...
6
============
=> Second index de test
> db.sensor_readings.ensureIndex({active:1, str:1, tstamp:1, x:1})
---- 3.2 (currentOp / killOp)
> homework.c()
> 12
-- 3.3
> db.runCommand({compact : 'sensor_readings'})
> homework.d()
21