Sunday, January 20, 2013

Udemy Jenkins Bootcamp: Fully Automate Builds Through Deployment

1- first we installed Git.
some important instructions in Git:
- you need to set some global variables in Git, like your username and email:
git config --global "Your Name"
git config --global ""
- and you can check your global variables with:
git config --global --list

2- we installed Jenkins on Windows using the Jenkins installer; a Jenkins Windows service is created, which you can use to stop and start Jenkins

3- Jenkins comes with a lot of plugins; however, you should update these plugins after installation

from the dashboard go to Manage Jenkins, then Manage Plugins; from there select all the plugins and press Update, and restart Jenkins when you finish.

4- you can download plugins (like a plugin that changes the successful-build ball color from blue to green) from the same page (Manage Jenkins)

5- you can delete a build however you should not

6- from the dashboard choose Manage Jenkins then Configure System; this is an important page where you set the Jenkins configuration.
you should provide Jenkins with the JDK path and the Maven path; you add them on this page.

7- another important Git command is
git clone
which we use to clone a project to the local disk

8- in order to use Git and GitHub in Jenkins you should add them:
go to Manage Jenkins then Manage Plugins, and add the "git plugin" and "github plugin"

now you can find 2 sections, one for Git and another for GitHub, in Manage Jenkins -> Configure System

Create a Project with Maven and GitHub
first you create a new job and select Maven project

then on the next page you set the GitHub project URL

the project URL is the URL in the browser:

then (on the same page) you set the "Source Code Management" to Git

set Credentials to none (as this is a public repository); the Repository URL can be taken from here

after you run the project you will notice that you have a Modules section to the left of the project; Jenkins understands the modules in pom.xml
and in the build section you should specify the pom.xml

you can set also the Build Trigger

Build periodically: you set a schedule, like build every 5 minutes
Poll SCM: poll the source control, every 5 minutes for example, to check if there is any change; if yes, create a build
Build when a change is pushed to GitHub.
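As a sketch, the "Build periodically" and "Poll SCM" schedule fields use a cron-like syntax (the H token is a Jenkins convention for spreading load; the exact value below is just an example):

```shell
# value of the "Poll SCM" / "Build periodically" schedule field:
# poll the repository roughly every 5 minutes
H/5 * * * *
```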


you can create a new view in Jenkins; if you have many projects on the home page, you can create a view with only some of the projects

simply press the + button

Jenkins and Testing
1- ensure that JUnit plugin is installed
2- install checkStyle plugin
3- install FindBugs Plugin: it gives statistics about bugs in java code
4- install PMD Plugin: it gives information about java code, like unused variables.

now go to job configuration
you will find this

the Advanced option gives you more control, like failing the build based on the number of errors, e.g. fail if there are more than 90 style errors.


Publish Artifacts
what we will do here is build our Maven project so other processes can use it as a dependency,

1- download "Maven Repository Server" plugin and install it
2- go to the job configuration you will find this section:

now when you build the project and open the build you will find "Build Artifacts As Maven Repository"

when you open it you will find information about the Maven project and you will find the SNAPSHOT


Jenkins & Deploy to  TOMCAT

1- after you install Tomcat, go to conf/tomcat-users.xml and add the following user:
  <user username="jenkins" password="jenkins" roles="manager-script"/>
notice the role is manager-script
restart Tomcat
now to check that the user works, go to the Tomcat manager URL (typically http://localhost:8080/manager/html)
and use the jenkins username and password

you should be able to login and see a list of running applications
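A quick way to check the user from the command line (assuming Tomcat runs on localhost:8080; the /manager/text interface is the one the manager-script role grants access to):

```shell
# list running applications using the jenkins user
curl -u jenkins:jenkins http://localhost:8080/manager/text/list
```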

2- download the "Deploy to container Plugin" in jenkins

3- now go to the job configuration and add the "post-build action" "Deploy war/ear to a container"

fill in the information; **/*.war means any war file in the directory; we also fill in the user information.

now when you build the job it will deploy it to tomcat.

Jenkins Security
when we install Jenkins, it is installed with security disabled, which means anybody with the link can access Jenkins.
to enable security
1- go to Manage Jenkins then Configure Global Security,
2- check the Enable security checkbox
3- select "Jenkins' own user database"
4- select "Logged-in users can do anything"

you can also create users and roles, using the option Matrix-based security or Project-based Matrix Authorization Strategy.

however, to manage users and roles in a nicer way, download this plugin: Role-based Authorization Strategy.
now if you go to Configure Global Security, you will find a new option, which is Role-Based Strategy

check this option,
now if you go to Manage Jenkins, you will find a new option "Manage and Assign Roles"; here you can create roles and assign them to users.

Also, you can create users from Manage Jenkins -> Manage Users.

Thursday, January 3, 2013

OREILLY Learning MongoDb

Introduction to MongoDB

1- Documents are represented as JSON and are saved in BSON, a binary format

Some terms in MongoDB are:
1- Document: inside documents we have FIELDS
2- Collection: a list of documents with a similar structure
3- a document may REFERENCE another document in another collection (i.e. creating relationships)
4- Embedded Document: you can embed a document inside another, so there are no relations
5- Cursor: when you run a query it returns a cursor; we iterate over this cursor to read the results.

Some Of MongoDB Features

here we created an index on the field name; the value 1 means ascending order.
as you can see, in the created index we have a list of names, each with a direct link to the document's location

you can have an index on multiple fields

as you can see we are sorting cost descending and name ascending

OTHER TYPES OF INDEXES: Geospatial, Hashed, Text, Multikey
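As a sketch in the mongo shell (the collection and field names are just examples), the single-field and compound indexes described above can be created like this:

```javascript
// single-field index on "name", ascending (1 = ascending, -1 = descending)
db.customers.createIndex({ name: 1 })

// compound index: cost descending, name ascending
db.customers.createIndex({ cost: -1, name: 1 })
```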

this is the structure of MongoDB: the data is sharded among servers (shards), and requests are routed by mongos. The config server holds metadata about the location of the data.

if you want to request lets say one element that exists in shard B:

the system will go to shard B directly to fetch the element a:123

if you want to read from multiple shards:

each shard will return some data, and then the data will be returned to client

you have some control over where to read data from; for example, you can set that you always want to read from the primary server, or a secondary, or the nearest

as you can see, we have in the middle the "MongoDB Shard Routing Service" called mongos, which talks to the config server to get information about the shard locations

when we do a write we should allocate space; the allocation happens based on what we call "chunks". Chunks are separated based on the key space, for example chunk1 holds keys from 1 to 10, chunk2 holds keys from 11 to 20 ...

all write operations are accepted by the primary; the primary maintains an operation log "OPLOG" which contains all the write operations. The secondaries take a copy of this log in order to replicate the primary.

when you write, we have something called write concern; there are 5 write concern levels

1- Errors Ignored
2- Unacknowledged: no acknowledgment to the app (very fast)
3- Acknowledged: the write will be acknowledged to the app (the default)
4- Journaled: this is similar to a transaction; MongoDB will acknowledge after writing the information to the JOURNAL LOG (not the operation log). The journal survives a hard shutdown (Durability).
5- Replica Acknowledged: the strongest; all replicas should acknowledge the write
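As a hedged sketch in a recent mongo shell (the collection name and documents are examples), the write concern can be passed per operation:

```javascript
// journaled write: acknowledge only after the journal is written
db.orders.insert({ item: "abc", qty: 1 }, { writeConcern: { w: 1, j: true } })

// replica-acknowledged: wait for a majority of replica set members
db.orders.insert({ item: "xyz", qty: 2 }, { writeConcern: { w: "majority" } })
```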

MongoDB Aggregation
1- pipeline aggregation: here you write some queries where the output of one is the input to the next.
2- Map/Reduce: here you write a Map function, a Reduce function and an optional Finalize function
3- Single-purpose aggregation: a query that does one thing, like count

Create, Read, Update and Delete Operations

1- when you want to search and find something

2- you have the following operators for comparison: $gt, $gte, $in, $lt, $lte, $ne, $nin

3- you have the following logical operators: $and, $not, $or, $nor

4- you have the following to check an element: $exists, $type.
which means check if the element b exists.

5- for evaluation you have $mod, $regex, $where

6- when you want to search documents which have an array field, you can use the array operators $elemMatch and $size
db.customers.find({result:{ $elemMatch: { $gte: 80, $lt: 85 } } })

in MongoDB we have the term Projection, which means the fields that you want to include in the results,
for example
which means show _id but not name.

we have some projection operators 

1- $: return the first matching element

2- $elemMatch
db.customers.find({query}, {arrayField: {$elemMatch: {field1: value1, field2: {$gt: value2}}}})

3- $slice

4- other operators like $min, $max, $orderBy, $explain
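As a sketch in the mongo shell (collection and field names are just examples), two of the projection operators above look like this:

```javascript
// $slice: return only the first 3 elements of the "comments" array
db.posts.find({}, { comments: { $slice: 3 } })

// $: project only the first array element that matched the query
db.students.find({ grades: { $gte: 85 } }, { "grades.$": 1 })
```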

Optimize Database
1- use indexes, to create an index 
1 = ascending, -1= descending 

2- use limit() to limit the returned results.

3- use projection to limit the number of returned fields

4- use explain() to check the query plan and analyze the performance

5- use hint() to force MongoDB to use an index.
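As a sketch in the mongo shell (collection, field and index names are examples), explain() and hint() are chained onto a query:

```javascript
// check the query plan
db.customers.find({ name: "Sam" }).explain()

// force MongoDB to use the index on "name"
db.customers.find({ name: "Sam" }).hint({ name: 1 })
```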


Some Examples:

1- find()

as you can see
1- selectDB() to get the DB from the Mongo instance
2- selectCollection() to get the collection
3- find() to run the query

2- Using projection

as you can see we define projection as an array, we also use it in find()

3- using limit()

as you can see we use limit()

4- find only one document

you see that we are using findOne()
you can use limit(1)

5- Sort()

we use sort()
as you can see, the sort is by surname, where 1 means ascending and -1 descending

you can sort by multiple fields
$sort = array('surname' => 1 , 'xxx' => -1);

6- grouping using aggregate()

as you can see, you group by country, then you match (like HAVING in MySQL, which means filtering the groups), then you sort.

and then we use aggregate with the three arrays $group, $match, $sort

Adding Information: Database, Collection, Document
basically you can create multiple databases in a single MongoDB instance; in a database you create collections, and in a collection we add documents

we have save() and insert() for update/create. With insert() you can also specify the write concern 

you also have batchInsert() to insert multiple documents 

as you can see before we were using selectDB() and selectCollection() 

The _id field
for each document in MongoDB we have an _id field; the value of this field is generated by MongoDB, and it is a mix of a machine ID, a timestamp and other values
you can override this value, so when you insert a new document you can give _id a value (e.g. 1, 2, 3 ...); however, you might create duplicates, which leads to an exception.

it is better not to rely on this field; create your own ID field


as you can see, you have a query that returns some documents, and you can update them using the $update array; here we are decreasing the balance by 50

you can also do an upsert, which means: if the document doesn't exist, insert it

you can remove documents by using remove(); you can use the justOne option in case multiple documents match your criteria

Data Modeling 

One-To-One relations
document with a large number of fields: split the large document into 2 documents with a one-to-one relation

you will handle the relation yourself, which means you will define 2 collections, and you should manage the ids between these collections; we call this the manual approach

you can also define one collection and embed one document inside another,

there is another way which we call DBRef; it means that you define 2 collections but link them through a DBRef, which is 3 values: $ref, $id and $db.
so what you are doing is referencing a document in a different collection in a standard way.
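As a sketch (the collection, database names and ObjectId value are made-up examples), a document holding a DBRef has this shape:

```javascript
// a customer document referencing a product in another collection
{
  name: "Sam",
  product: { $ref: "products", $id: ObjectId("5099803df3f4948bd2f98391"), $db: "shop" }
}
```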

One-To-Many Relations
you can model this using 3 collections

you can also have only 2 collections, Customers and Products, we embed Purchase inside customer

You can also use DBRef, which means you define only Customers & Products; inside Customers you put an array of purchased products, and this array holds DBRefs.

Tree Structure
when we talk about tree, we are talking about something similar to

you can represent this in MongoDB using 5 ways

MongoDB Database Management

MongoDB authentication is off by default; you should enable it in mongodb.conf

we have 5 privileges: read, readWrite, dbAdmin, userAdmin and clusterAdmin (build Cluster)

- to add an overall admin user: when you start MongoDB, if there are no users yet, you can access it from the shell without authentication.
go to the DB you want
use admin
then create the user

you can also add users to other databases

to log in you use

to get a list of users in a db
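As a sketch of these steps in a recent mongo shell (user name, password and role are examples; older 2.x shells used db.addUser() instead):

```javascript
use admin
// create an overall admin user
db.createUser({ user: "admin", pwd: "secret", roles: ["userAdminAnyDatabase"] })

// log in against the current database
db.auth("admin", "secret")

// list the users of the current database
db.getUsers()
```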

MongoDb Replication
- we have primary which accepts read & write
- we have a secondary, which stores a backup copy of the primary's data
- if the primary is down, there will be a vote to elect a new primary from the secondaries.
- we use an Arbiter, which is a MongoDB instance that doesn't hold data; it is used only during the election of a new primary, in case we have an even number of voting members.
- when you do a read, you can specify the read mode

if you have a collection with a lot of documents, a single Mongo instance cannot handle all the documents; that's why we shard the collection over multiple instances.

or, when you are in a production environment, you shard and also put each shard in a replica set

the sharding cluster contains the following

as you can see, we have a config server which contains all the sharding information, and you communicate with multiple routing services (mongos) to fetch the data.

data is written in chunks

based on a shard key the chunk will be determined; as you can see above, we are dividing based on a sequential key

another way is through a hash function

the shard key is chosen from some fields in the documents.

based on the shard key, MongoDB will determine the chunk where the document will be saved.

your job is to provide a shard key where data is saved in the chunks evenly (i.e. we don't want some chunks to hold more data than others)

that is why your key should be random and highly distributed.
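The difference between a sequential (range-based) key and a hashed key can be sketched in plain JavaScript (a toy illustration, not MongoDB internals; the function names are made up):

```javascript
// Range-based: keys 1-10 -> chunk 0, keys 11-20 -> chunk 1, ...
function rangeChunk(key, chunkSpan) {
  return Math.floor((key - 1) / chunkSpan);
}

// Hash-based: a simple hash spreads even sequential keys across chunks.
function hashChunk(key, numChunks) {
  let h = 0;
  for (const c of String(key)) h = (h * 31 + c.charCodeAt(0)) >>> 0;
  return h % numChunks;
}

console.log(rangeChunk(7, 10));                 // sequential keys cluster in one chunk
console.log(rangeChunk(15, 10));
console.log(hashChunk(7, 4), hashChunk(8, 4));  // neighboring keys often land apart
```

This is why a monotonically increasing key (like a timestamp) keeps hitting the same "last" chunk under range partitioning, while a hashed key distributes the writes.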

when you choose a shardkey you should consider 3 things

Cardinality: this refers to the uniqueness of the key. For example, let's say we have an AddressBook document; if you want to use STATE as a shard key, we only have 50 states in the US. This field has low cardinality: all documents with the same state will be stored in the same chunk EVEN IF THE CHUNK SIZE EXCEEDS THE MAX CHUNK SIZE.
so we will have only 50 chunks, and some chunks will have more documents than others.

on the other hand if you use ZipCode, this field has many values --> high cardinality 

Write scaling: your key should be random. For example, think about a key which is the Year-Month-Day of document creation; this key has high cardinality, as we can generate a lot of values, however these keys are not highly random: documents that are generated on the same day will be stored in the same chunk.
on the other hand, a key with Minutes-Seconds will have high cardinality and be highly random.

Query isolation: of course, if you want to get results fast you should go to one shard. With cardinality and write scaling we were talking about randomness;
if you want to get documents fast, you should use a key that redirects your queries to a few shards.

Index and Performance
to increase performance:
1- create indexes
2- reduce the document size: shorten field names for example, and use GridFS for large files (GridFS is meant for files larger than the 16MB BSON document limit)

Backup Mongo
1- you can simply copy the MongoDB data folder for backup (copy the whole folder, not a few files)
2- you can use mongodump and mongorestore.
3- you can use mongoexport and mongoimport
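As a sketch of options 2 and 3 (database and collection names are examples):

```shell
# dump all databases to ./dump, then restore them
mongodump --out ./dump
mongorestore ./dump

# export/import a single collection as JSON
mongoexport --db shop --collection customers --out customers.json
mongoimport --db shop --collection customers --file customers.json
```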

Monitoring Mongo
1- use db.stats()
2- db.serverStatus()
3- mongostat
4- mongotop
5- use rockmongo tool 
6- use mongobird tool


MongoDB University course

1- it is a document DB
2- datamodel is json
3- no relation, scale out
4- atomic read & write on single document
5- Mongo, the shell used to connect to MongoDB, uses TCP to connect to the database.
6- the driver, like the Java driver, that you use to connect to MongoDB also uses TCP, and the data that is transferred over the connection is BSON
7- JSON has the string, number, boolean, array and object data types
8- BSON extends the JSON data types and adds some extra information to make scanning a document faster; the scan is linear, however adding information like field lengths helps the scanner do some jumps while scanning.

Wednesday, January 2, 2013

Applying Apache Cordova and PhoneGap, O'Reilly Course

In order to start working with phonegap:
1- install node.js
2- install git
3- install phonegap
(open node.js cmd and write)
npm install -g phonegap
4- you can install phonegap desktop app

PhoneGap is built on top of Cordova

use phonegap -v to check the installed PhoneGap version

after you install, you can open the PhoneGap desktop app and add a new project, or you can use the command line and write
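The create command looks roughly like this (the app name is just an example; the full form with an app id and a template appears further below):

```shell
phonegap create HelloPG
```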


after you add new project you will have this structure

config.xml: contains metadata about the project
hooks: you can add scripts here and tell phonegap to execute them at a certain step (e.g. after compilation).
platforms: list of platforms for this project (e.g. ios, android ...)
plugins: list of plugins for the project
www: basically www is your project
merges: doesn't appear here by default; you can use it to apply some stuff to a specific platform, for example apply something on iOS but not Android.

very important: you should always work in the www folder; don't work in the platforms folder

Deal with Platform
-phonegap platform add android
-phonegap platform add ios
-phonegap platform update ios
-phonegap platform remove ios

now if you want PhoneGap to remember the Android platform you can
phonegap platform add android --save

so when you add the platform you can tell PhoneGap to remember that Android is used,

-phonegap comes with a list of template projects you can browse them by
phonegap template list

and you can create a project of a template:
phonegap create Hello com.asdf.hellp HelloPG --template TEMPLATE_NAME

Build your phonegap application
1- phonegap prepare
this will put the www folder inside each platform
2- phonegap build ios
this will build the platform
3- phonegap emulate ios
this will run the emulator; for Android, PhoneGap will communicate with Android Studio to run the emulator
4- phonegap run
this will run the app on a mobile connected via usb to the computer, or run it on Genymotion simulator

Browser platform
there is a platform called browser, you can use it for testing
phonegap run browser

PhoneGap server:
in order not to emulate all the time (this takes time) you can use the PhoneGap server
- phonegap serve:
this will run a server,
you can download the PhoneGap developer app on your mobile and connect to this server to test.
the mobile and computer must be on the same network

on the app: press with 3 fingers to go back
on the app: press with 4 fingers to refresh

Phonegap debugging:
with ios, you can debug with safari
with android, you can debug with chrome

in Chrome, go to: Developer Tools -> Inspect Devices
here you will find all connected devices and Genymotion emulators, and you can start debugging

Weinre Product
Chrome will work with new Android versions; for old versions use Weinre

here you can find
<name> APP_NAME </name>: app name
<content src="index.html"/>: this is the first page, and you should have a single page application, dont build multiple pages

we have a plugin called cordova-plugin-whitelist; this plugin is used when the app wants to make calls to the network, for example doing an http request. Inside this plugin we have
<access origin="..."/>: defines the URLs that we can access; you should add all the URLs that the app wants to access.

also you have <allow-intent>, where you put what the application can do, for example "tel:*" to access the telephone functionality, "sms:*" to access SMS, "geo:*" to access the GPS ...

you can set allow-intent per platform

also you have a configuration for full screen (full screen means that you cannot see the clock and battery information, like when you play games)

also you have a configuration for orientation.

also you have overscroll configuration

also a section for ICONS.

check the documentation for other stuff.

Merges folder:
if you want to add a css for iOS only, you can do that inside the merges folder (if the merges folder doesn't exist, create it manually).
inside the merges folder, create a folder for each platform; now you can put, for example, platform.css inside the ios platform folder.
just add a link to this css inside index.html
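Assuming the standard Cordova layout (the file name here is an example), the iOS-only override could be set up like this:

```shell
mkdir -p merges/ios/css
cp ios-only.css merges/ios/css/platform.css
# after "phonegap prepare", the file is copied into the iOS build only
```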

Phonegap Events
you have list of events in phonegap:

deviceready event: triggered when the device is ready
pause event: triggered basically when you press the home button; when we press the home button the application is paused
resume event: triggered when we go back to the application.

some operating systems don't have pause/resume; they reload the application from scratch.

backbutton event: like the android back button.

Phonegap webview
you should know that each mobile operating system comes with a specific web view, which determines how the html will be rendered.

sometimes you need to use Crosswalk to work with older webviews on Android.

Some user interface libraries

Phonegap plugins
- phonegap plugin list
- phonegap plugin add
- phonegap plugin remove

you have plugins like camera battery ....
phonegap plugin add cordova-plugin-camera

sometimes you can change stuff using css or a plugin,
for example you have a plugin to change the status bar
this plugin will give you an object only ON DEVICE READY, and you can use this object to manipulate the status bar.

document.addEventListener("deviceready", function() {
  // the plugin's StatusBar object is available from here on
});

also you can use the notification plugin to create alerts, notifications and beeps (better than using the JavaScript alert())

you have some 3rd party plugins, you can check them online

Enable debug mode on your mobile
go to Settings and search for the Developer Options menu
to enable the Developer Options menu, go to "About" and press on the Build Number 7 times until it tells you that you are a developer

build and release your app
phonegap build --release
however you should sign your app

so it is better to do that from inside Android Studio or Xcode:
go to Android Studio, import the project inside the android platform folder,
go to Build -> Generate Signed APK


after you create an account, you can use 
phonegap remote login
phonegap remote build ios

Tuesday, January 1, 2013

Introduction To Big Data: O'Reilly Course

The main concept of Hadoop is to process large amounts of data distributed over multiple machines

when we talk about hadoop we talk about
1- MapReduce framework
2- HDFS, Hadoop Distributed file system

MapReduce: is the programming model that allows processing large amounts of data distributed over multiple machines

the user should write a Map function and a Reduce function; Hadoop will take care of the rest (distributing the work over multiple machines, handling errors, fault tolerance ...)

MapReduce example:

you have a file pets.txt; we divided this file into blocks, and consider that each block is saved on a machine. The MAP function will process the input and convert it to KEY/VALUE pairs; here the key is the animal name, the value is just 1.
SHUFFLE is the process of moving the pairs with similar keys to the same machine (THIS IS DONE AUTOMATICALLY BY HADOOP).
Then the REDUCE function will simply do a count, and the output pet_freq.txt contains the result, which is the number of times each animal is mentioned in the file
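The pets.txt flow can be sketched with toy map/shuffle/reduce functions in JavaScript (an illustration only; real Hadoop distributes each step across machines):

```javascript
// MAP: each animal name in a block becomes a (key, 1) pair.
function map(line) {
  return line.split(/\s+/).filter(Boolean).map(name => [name, 1]);
}

// SHUFFLE: group pairs by key (Hadoop does this automatically).
function shuffle(pairs) {
  const groups = {};
  for (const [key, value] of pairs) (groups[key] = groups[key] || []).push(value);
  return groups;
}

// REDUCE: sum the values for each key.
function reduce(groups) {
  const out = {};
  for (const key in groups) out[key] = groups[key].reduce((a, b) => a + b, 0);
  return out;
}

const blocks = ["cat dog cat", "dog cat bird"]; // two file blocks
const result = reduce(shuffle(blocks.flatMap(map)));
console.log(result); // { cat: 3, dog: 2, bird: 1 }
```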

- HDFS: is a way to distribute our data over multiple machines; Hadoop stores multiple copies of the data on different machines so we are sure the data is not lost. (the default replication factor is 3)
- the recommended block size in Hadoop is 64 or 128 MB
- It is better to have a small number of large files
- Hadoop reads the stored data SEQUENTIALLY, with no random access; the read is from the beginning to the end of the file, which reduces disk seek time
- there is no update; if you want to change a value in a file, write another one (there are some solutions for this, which introduce another layer on top of Hadoop, like HBASE).

as you can see, the data is broken into blocks, the blocks are stored on NODES, and the blocks are replicated.

The Master Node runs what is called the Name Node; the Name Node knows where each block is stored.

- in order to deal with HDFS you may use the HDFS SHELL (command line), a web user interface (like Datameer), the Java API, or REST.

Apache Hive
it is on top of HDFS; it is based on the idea of defining tables on top of HDFS, so what we do is the mapping between the tables and HDFS.

There are no real tables; they are just a logical view.

Now you can use HiveQL, which is similar to SQL, and Hive will convert it to MapReduce

how to do the mapping between HIVE and HDFS

as you can see, you define the table fields and how they are mapped to the Hadoop file; here we are using a regex for this mapping.

very important: HIVE is not used like a relational database, it is used for batch and analytics work

Apache Pig
it is on top of Hadoop and it is a procedural language, so here you don't write SQL queries, you write something like a stored procedure.

you can start your development working on a sample file without Hadoop; do the development, then move to Hadoop.

the nice thing in PIG is that you can write your procedure line by line: write a line, check the result, and if you are satisfied move to the next line.

you can always check the execution plan to see the generated map reduce

example of PIG

as you can see after you load the document (e.g. sales.dat) you specify the schema of the document.

so use PIG when you want to write a complex query that cannot be written in a single HiveQL statement.

Scalding
So here we use Scala, and the Scala code will be converted to MapReduce,
so the benefit is that you are using a programming language, which will ease your life.

basically Scalding is built on Cascading, a MapReduce library; it compiles your Scala code down to MapReduce jobs.

HADOOP Ecosystem

we have HDFS, and on top of it we have YARN.
YARN fixes the multi-tenancy problem of Hadoop (running multiple map/reduce jobs at the same time); in addition, YARN allows Hadoop to scale beyond 4000 machines in a cluster

TEZ: it provides better performance by eliminating some read/write operations; TEZ transforms map/reduce requests into a directed acyclic graph. TEZ makes Hadoop usable not only for batch processing but for real time as well
zookeeper: is used to manage clusters.
spark: an in-memory data processing engine
flink: converts the program to a compiled execution plan

sqoop: is used for integration, basically export and import to relational databases.

flume: also used for integration but with non-relational sources, e.g. file systems, MQs ...

Mahout: is used for machine learning (not mentioned in the graph above)

HBASE: has something called a bloom filter; it detects when a key is not present and returns directly.

we are talking about:
1- non-relational databases
2- distributed databases.
3- CAP theorem: Consistency, Availability, Partition tolerance
4- Eventual Consistency: replicas converge to a consistent state shortly after a write (often within milliseconds)

Of course, an RDBMS is CA (Consistent and Available)

there is a relation between scalability and data-model complexity in NoSQL; in that sense, if you want to order data stores in terms of scalability:
1- Key/Value
2- Column family
3- Document
4- Graph

and the complexity is the opposite order

when we talk about streaming, we are talking about a different way of generating results

Normally, we have queries; we run them on the data and get results.
in streaming, data arrives as input to standing queries, and results are produced continuously

some products:

an example of a comparison between STORM and Hadoop