Sunday, August 24, 2008

Building Performance Monitoring Solution with Nagios and NDOUtils

In my previous post I talked about Application Performance Management (APM) tools and discussed possible requirements for them. I also described Nagios as a wonderful and proven monitoring solution. In this post I will show how to build a monitoring solution with Nagios that is comparable to commercial monitoring systems.

We will be using the following components along with Nagios.

  • NDOUtils - Storing runtime information to database

  • NRPE - Remote Plugin Agent

  • NSCA - Remote Plugin with Passive Checks

  • Nagvis - Visualization Addon

  • Nagios Business Process Addons - Custom Bird's Eye View of Business Application

  • NSClient++ - For monitoring Windows Hosts

The tough part is that these are all discrete components, and each requires good knowledge to set up. This series will cover all of these systems and build a good monitoring solution. This part focuses on NDOUtils and performance data. Once the series is complete I will release a package with an installation script so that installation is easy.

I assume you have Nagios installed and working. If not, follow the quick-start guides for Ubuntu, Fedora, or openSUSE. They work perfectly. Go through the Nagios console and explore what Nagios provides out of the box.

Let's first see how to set up Nagios and NDOUtils. NDOUtils is an implementation of a Nagios Event Broker (NEB) module. A NEB module is a shared object which is loaded by Nagios and can register to listen for events. The NDOUtils module (ndomod) passes these events to a file or socket. The second part of NDOUtils is a C daemon (ndo2db) which saves the information to a database.

Download the NDOUtils tarball from the Nagios site. When you try to compile it you may run into problems with the MySQL library. On Ubuntu, first install the MySQL development library. Of course, you need to have the MySQL server installed first.

sudo apt-get install libmysqlclient15-dev

Start the mysql client and log in as root. Create the database and the nagios user.

create database nagios;
use nagios;
grant all on nagios.* to 'nagios'@'' identified by 'nagios';
grant all on nagios.* to 'nagios'@'localhost' identified by 'nagios';

Go to the db directory and create the necessary tables.

./installdb -u nagios -p nagios -h localhost -d nagios

Now go to the ndoutils directory and compile the source.

./configure --with-mysql-lib=/usr/lib/mysql
make

Copy the binaries to the Nagios installation. This assumes that you have installed Nagios in /opt/nagios (the default is /usr/local/nagios).

cp ndo2db-3x /opt/nagios/bin
cp ndomod-3x.o /opt/nagios/bin

Also copy the config files ndomod.cfg and ndo2db.cfg from the config directory to /opt/nagios/etc.
Modify the following lines.



Now open the nagios.cfg file and modify the following key.

broker_module=/opt/nagios/bin/ndomod-3x.o config_file=/opt/nagios/etc/ndomod.cfg

Now start the ndo2db daemon before restarting Nagios.

/opt/nagios/bin/ndo2db-3x -c /opt/nagios/etc/ndo2db.cfg

Restart Nagios

/opt/nagios/bin/nagios -d /opt/nagios/etc/nagios.cfg

Now start the mysql client and check the nagios_hosts and nagios_services tables to see that data is being saved.

You can put the following line in /etc/rc.local so that ndo2db starts when the machine boots, and all events fired by Nagios (which is started from the /etc/init.d/nagios script) are picked up and processed.

/opt/nagios/bin/ndo2db-3x -c /opt/nagios/etc/ndo2db.cfg

Once this is done, all your data is saved to the database. Now we can build our performance parser on top of this. I will come up with a shell or Perl script for the whole installation once it is complete.


Tuesday, August 19, 2008

Java Performance : Monitoring and Measurements - APM

Recently I came across a wonderful series of articles, "Run-time performance and availability monitoring for Java systems", about implementing run-time performance monitoring for an application's ecosystem; this is sometimes called APM (Application Performance Management). The series is a really good piece of information that everyone who needs to implement some kind of performance monitoring ecosystem should read. However, all the screenshots resemble Wily Introscope, so it basically describes what Introscope does.
The articles say Helios is the reference implementation of the ideas discussed.

It covers

  • Monitoring fundamentals - why you need it and what you need: periodic reports based on templates and custom reports, historical storage and analysis, live visualization and simultaneous plotting for correlation, alerting (by email, BlackBerry, SMS, or JMS, with GUI), and dashboards

  • Recent advancements - such as agentless monitoring and synthetic transactions

  • Some very high-level design details - performance data source, collector, tracers

  • Tracing patterns - polling, listening, interception, instrumentation

It also covers how to go about performance monitoring in general.

It focuses on JMX as the primary way of doing things, but commercial APM tools such as CA Wily's Introscope are not JMX based. I feel that if your application is Java centric, it helps to have a Java-based APM such as Wily Introscope.

I would like to add the following too.

Synthetic transactions: I feel synthetic transactions are really helpful but at the same time difficult to implement. Tools like Grinder can help you implement them.

Application-specific metrics: Generally you have infrastructure metrics such as OS, storage, and network metrics, and application infrastructure metrics such as app server metrics (queue length, response time) and database metrics (average query time). But sometimes you need to gather application- or business-specific metrics, for example policies issued per day. The framework should be flexible enough to allow such metrics to be posted to the APM, where they can then be correlated with technical metrics.

Diagnosis: An APM should also be able to switch gears when a problem occurs. At that point it should start collecting data at a more granular level so as to get a more refined picture of the system. Introscope has a diagnosis tool called Transaction Monitor, which traces an entire transaction as one context. The article, though, talks about "heavy instrumentation" as I guess there

Heuristics: An APM should also provide some level of rule-based analysis, like what Glassbox does. See the demo.

The dilemma with APM systems is that commercial products like IBM Tivoli, HP OpenView, and Mercury BAC come with many features but at a hefty cost. So open source comes to the rescue here: there are a lot of open-source projects which implement this idea. I am a big fan of Nagios. Others, like GroundWork Monitor, implement extra functionality around Nagios. Nagios 3.0 has made a lot of progress and is now "installable" for a normal user with this guide.
Also, if your application is Java centric then it makes good sense to have a JMX-based system as described in this article. Sometimes you don't need a full-blown system. In such situations tools like the JAMon API can help you. Do visit the JAMon site; it will surely help you, and it is even recommended to keep it running in production environments.

If you are interested in Java profiling and how it's done, read "Build your own profiling tool" and look at Jensor. Jensor is a Java profiler built by TCS's Performance Engineering Group (where I started my career); it follows the approach of the first article mentioned but has a good analysis GUI which helps you dig in. The commercial profilers JProbe and JProfiler are really good.

In the past I experimented with the same idea using Nagios. There the focus was to implement a performance monitoring system for an entire lab. We wrote Nagios plugins for WebSphere (PMI based) and WebLogic (WebLogic shell) to collect performance metrics. Nagios, however, is only a scheduling and execution engine which does not really care about the data (the current 3.0 version has a lot of extension points where you can save events and performance data in a MySQL or PostgreSQL database). So we used a small open-source system, PerfParse, for storing performance data and pulling out reports. In our lab we implemented this for about 80 servers on a single-CPU monitoring machine running RHEL 4.0. It worked very well for me.

The best one is the one which solves your problem!


Friday, August 15, 2008

Meet the People Who Have Trillions Riding on Linux this Fall


Recently I came across this link: Meet the People Who Have Trillions Riding on Linux this Fall. It shows how deeply Linux has penetrated. Linux is handling trillions in money. Wow!

Wednesday, August 13, 2008

Java Performance - Multi Core Processors

Hi, I was reading the book Java Concurrency in Practice by Brian Goetz and others.
I found this interesting quote:

"For the past 30 years, computer performance has been driven by Moore's Law; from now on, it will be driven by Amdahl's Law. Writing code that effectively exploits multiple processors can be very challenging."

Amdahl's law describes how much a program can theoretically be sped up by additional computing resources, based on the proportion of parallelizable and serial components.
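To make that concrete: the usual form of the law is speedup = 1 / ((1 - p) + p / n), where p is the parallelizable fraction of the program and n is the number of processors. Here is a minimal Java sketch of it; the class and method names are my own, chosen only for illustration.

```java
/** Illustrative helper (hypothetical name) that evaluates Amdahl's law. */
final class AmdahlSpeedup {

    /**
     * Theoretical speedup of a program whose parallelizable fraction is
     * parallelFraction (between 0.0 and 1.0) when run on the given number
     * of processors.
     */
    static double speedup(double parallelFraction, int processors) {
        if (parallelFraction < 0.0 || parallelFraction > 1.0 || processors < 1) {
            throw new IllegalArgumentException("need 0 <= fraction <= 1 and processors >= 1");
        }
        double serialFraction = 1.0 - parallelFraction;
        return 1.0 / (serialFraction + parallelFraction / processors);
    }

    public static void main(String[] args) {
        // A program that is 90% parallelizable, on a growing number of cores.
        for (int n : new int[] {1, 2, 4, 8, 16, 1024}) {
            System.out.printf("%4d cores -> %.2fx%n", n, speedup(0.9, n));
        }
    }
}
```

The point of the exercise: with 10% serial code the speedup can never exceed 10x, no matter how many cores you add.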

Doug Lea's util.concurrent brought the Java concurrency API right into the JDK. From the beginning, Java was the first language (at least among mainstream programming languages) to support multithreading. With Java 5, any developer can write safe and scalable Java programs.
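As a small taste of that API, here is a sketch that splits a sum over 1..n across a fixed thread pool using ExecutorService and Future from java.util.concurrent. The ParallelSum class is a made-up example, not something from the book, and the lambda syntax assumes Java 8 or later (on Java 5 you would write an anonymous Callable instead).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical example: sums 1..n in chunks on a fixed thread pool. */
final class ParallelSum {

    static long sum(long n, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            // Split the range into roughly equal chunks, one task per chunk.
            long chunk = Math.max(1, (n + threads - 1) / threads);
            for (long start = 1; start <= n; start += chunk) {
                final long from = start;
                final long to = Math.min(n, start + chunk - 1);
                Callable<Long> task = () -> {
                    long s = 0;
                    for (long i = from; i <= to; i++) {
                        s += i;
                    }
                    return s;
                };
                parts.add(pool.submit(task));
            }
            // Combine the partial results; get() blocks until a task is done.
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("parallel sum failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

No thread is created or joined by hand here, and the only shared state is the list of futures read by the calling thread.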

All the latest processors are now multi-core, which clearly shows a shift towards parallel systems where your program has to effectively use all the hardware threads the underlying CPU provides. Unless programs are completely multithreaded, they simply won’t use the power available in hugely multicore systems. A lot of attention is being given to transforming single-threaded code into multithreaded code. Read this post for details: "Intel: We Can Transform Single Thread to Multithread".

I also got to read this wonderful post, which discusses Java performance, mainly GC, on multicore processors: Multicore may be bad for Java.

Indeed, I believe Java is multicore ready. Happy multithreading.


Sunday, August 10, 2008

My New Laptop - Acer 2920

Hi, a few weeks ago I bought myself a new laptop, an Acer 2920. I had blogged earlier about my journey with Linux. Linux support was a mandatory criterion for my laptop, so I experimented a lot with different laptops and read reviews of them on the internet. My brother's laptop, a Sony VAIO CR12GB/H, had responded well to the Linux call. It was so good that my brother is now a Linux convert and prefers to use Ubuntu (see the snapshot below).

The other criterion was Windows XP. I feel Vista is a big flop, especially for countries like India, where the hardware is at least 6-7 months behind the latest technology. In the very first post on this blog I mentioned my disappointment in the following words:

"And last thing i want to say about VISTA is that its fooling people around. It takes seconds to load dekstop and pretend ready for action but reality is different. It take much more time to get all networking started and then i can start my browser. With linux it takes time to login but I know when its now ready - ready for anything file explorer, browser and mp3. "

But the market here in India is not that good. You don't get a Windows XP laptop with good hardware these days. So I had to go into cost mode and look at laptops which are targeted as utility laptops. I had settled on a Lenovo, and then I saw this beauty in the Vijay Sales shop in Thane. The sales guy was really friendly and allowed me to boot the laptop with an Ubuntu live CD. It passed the test. The Acer 2920 is the smallest laptop around with no compromise on hardware or any necessary features. It comes with a driver CD for both Vista and XP, which is not common among the latest laptops, which restrict you to Vista only.

So if you are looking for a good utility laptop, Acer is a good brand and certainly the best one for a Linux laptop. I recommend Sony also: Sony is good value for money too. Sony laptops have good sound quality (loud and clear) and a clear, crisp, bright display. The VAIO CR series is certainly suited to most people.


Saturday, August 9, 2008

Java Performance : Caching Clustering ... and "FlushCache"

Hi, I am back after a long time, this time again with Java. Recently I have been playing with a lot of Java solutions relating to scaling and clustering. Around high-performance Java systems you see a lot of buzzwords like multi-threading, clustering, caching, map-reduce, partitioning, and grid solutions. Let's discuss some of them.

I have been using Hibernate for the last year and a half in my personal experiments and at my workplace too. In one project we had a caching requirement for master data which was read-only. There we implemented caching by hand as follows.

1. A wrapper around the DAOs first checks the cache and only then goes to the database.
2. The cache was implemented with the WebSphere-provided object pooling API.
3. WebSphere provides an object pooling mechanism which works even in a clustered set-up or network deployment, with cache replication strategies.
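The first point in that list can be sketched in a few lines of Java. This is not the code from that project: the MasterDataDao interface is hypothetical, and a plain ConcurrentHashMap stands in for the WebSphere-provided cache, so there is no replication across a cluster here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical DAO contract for read-only master data. */
interface MasterDataDao {
    String findById(long id);
}

/** Wrapper that checks an in-memory cache before delegating to the real DAO. */
final class CachingMasterDataDao implements MasterDataDao {

    private final MasterDataDao delegate;

    // Stand-in for the application-server cache: a plain concurrent map.
    private final Map<Long, String> cache = new ConcurrentHashMap<>();

    CachingMasterDataDao(MasterDataDao delegate) {
        this.delegate = delegate;
    }

    @Override
    public String findById(long id) {
        // computeIfAbsent calls the delegate only on a cache miss.
        // A null result from the delegate is returned but not cached.
        return cache.computeIfAbsent(id, delegate::findById);
    }
}
```

Because the data is read-only there is no invalidation logic at all, which is what makes this kind of hand-rolled cache so cheap to build.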

Our requirement was well met, and caching brought a lot of improvement in response time, as expected.

After caching, horizontal scaling or clustering is what comes to my mind when I think about performance.
Since then, I have discovered a lot of caching/clustering projects and APIs, some of which I feel are well worth noting.

1. Tangosol - recently acquired by Oracle; together with the Oracle database and TimesTen (Oracle's in-memory database, another acquisition :-) ) it forms a formidable data tier. Tangosol is a very proven product which provides a lot of features like distributed caching and data partitioning. Cameron Purdy on TheServerSide claims it reaches up to 0.5 million cache transactions per second.

2. Terracotta - Terracotta is a very revolutionary Java product, in fact listed among the top 10 Java things of the year. Strictly speaking, Terracotta is not a caching solution but a clustering solution: it is basically JVM-level clustering. They provide cache solutions for a lot of common requirements (Hibernate, HTTP session, Spring beans, etc.) and good extensions for a lot of open-source projects. I tested Tomcat clustering with Terracotta, similar to the example on the Terracotta site, and it gives a good performance boost. Among all the clustering solutions, Terracotta has the simplest programming model: NO PROGRAMMING MODEL. That's right: objects are basically clustered at the JVM level, so it needs only a configuration change and no code change. In practice you do need to change some Java code, but the change is minimal. All Java semantics work well in a Terracotta cluster. Terracotta is very useful for patterns like Master/Worker, or for a bunch of processes that need some coordination or data sharing. The term Master/Worker was, I guess, introduced much before Google's Map/Reduce, and it is very much the same idea.

3. Memcached - this one is written in C but has client libraries in almost all major programming languages. The idea of Memcached is actually a little bit different. You keep a bunch of memcached processes running, and objects are stored on these processes by hash key. Memcached works best when its processes run on web servers which are CPU-intensive but have a lot of memory available.

4. A new bunch of grid frameworks - GigaSpaces, and Hadoop, a Java map-reduce implementation.
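The "stored by hash key" idea from the Memcached item can be shown with a toy node selector. This is only a simplified sketch with hypothetical names: it uses a plain modulo of String.hashCode(), whereas real Memcached client libraries ship their own hashing schemes (often consistent hashing), so do not read it as how any particular client works.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy example: picks the cache node for a key by hashing the key. */
final class NodeSelector {

    private final List<String> nodes; // "host:port" strings, purely illustrative

    NodeSelector(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("need at least one node");
        }
        this.nodes = new ArrayList<>(nodes);
    }

    /** The same key always maps to the same node while the node list is unchanged. */
    String nodeFor(String key) {
        // floorMod keeps the index non-negative even for negative hash codes.
        int index = Math.floorMod(key.hashCode(), nodes.size());
        return nodes.get(index);
    }
}
```

The weakness of plain modulo hashing is that adding or removing a node remaps most keys, which is one reason clients tend to prefer consistent hashing.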

But now I have come across a very different requirement, where cache modifications (writes) are significant in number and JDBC batching with asynchronous operation is key to performance; this is sometimes called write-behind. When I looked around, only Tangosol claimed to have such a facility, so I decided to implement it by hand. Here is the idea.

1. A background thread monitors a queue.
2. All cache write operations append the modified objects to this queue.
3. The queue is periodically flushed to the database, and clients are notified of successful operations.
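A rough sketch of those three steps, assuming Java 8 or later. Everything here is hypothetical naming on my part: the Consumer passed to the constructor stands in for a JDBC batch write, and the CompletableFuture returned from enqueue is one possible way to notify clients. It deliberately ignores the object graph problems discussed next.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Write-behind buffer: cache writes are queued and flushed to the store in batches. */
final class WriteBehindQueue<T> implements AutoCloseable {

    /** A queued object plus the future used to notify the caller. */
    private static final class Pending<T> {
        final T value;
        final CompletableFuture<Void> done = new CompletableFuture<>();
        Pending(T value) { this.value = value; }
    }

    private final BlockingQueue<Pending<T>> queue = new LinkedBlockingQueue<>();
    private final ScheduledExecutorService flusher = Executors.newSingleThreadScheduledExecutor();
    private final Consumer<List<T>> batchWriter; // stand-in for a JDBC batch update
    private final int maxBatchSize;

    WriteBehindQueue(Consumer<List<T>> batchWriter, int maxBatchSize, long flushIntervalMillis) {
        if (maxBatchSize < 1) throw new IllegalArgumentException("maxBatchSize must be >= 1");
        this.batchWriter = batchWriter;
        this.maxBatchSize = maxBatchSize;
        // Step 1: a background thread checks the queue on a timer.
        flusher.scheduleWithFixedDelay(this::flush, flushIntervalMillis,
                flushIntervalMillis, TimeUnit.MILLISECONDS);
    }

    /** Step 2: a cache write appends the modified object and returns immediately. */
    CompletableFuture<Void> enqueue(T modified) {
        Pending<T> pending = new Pending<>(modified);
        queue.add(pending);
        return pending.done;
    }

    /** Step 3: drain one batch, write it, then notify the waiting clients. */
    synchronized void flush() {
        List<Pending<T>> batch = new ArrayList<>();
        queue.drainTo(batch, maxBatchSize);
        if (batch.isEmpty()) return;
        List<T> values = new ArrayList<>();
        for (Pending<T> p : batch) values.add(p.value);
        try {
            batchWriter.accept(values);
            for (Pending<T> p : batch) p.done.complete(null);
        } catch (RuntimeException e) {
            // Report the failure to every client whose write was in this batch.
            for (Pending<T> p : batch) p.done.completeExceptionally(e);
        }
    }

    @Override
    public void close() {
        flusher.shutdown();
        // Flush whatever is left so no queued write is silently dropped.
        while (!queue.isEmpty()) flush();
    }
}
```

flush() is synchronized so that a timer flush and a flush during close() cannot interleave and reorder batches. A real implementation would also need a policy for retrying failed batches and a bound on the queue size.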

The catch here is how to handle object graph updates:
1. Delta calculation.
2. Inserting new objects may require statements in one specific order, where the result of the first insert is required for the next one (foreign key).
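For the delta calculation point, the simplest approach I can think of is to compare a snapshot of the column values taken at load time with the current values. The sketch below does just that for a single flat row; the DeltaCalculator name and the map-of-columns representation are assumptions made for illustration, and it does nothing about relationships between objects.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical helper: finds the columns that changed since an object was loaded. */
final class DeltaCalculator {

    /**
     * Returns only the columns whose value in current differs from the value
     * in loaded (or which are missing from loaded), keyed by column name.
     */
    static Map<String, Object> dirtyColumns(Map<String, Object> loaded,
                                            Map<String, Object> current) {
        Map<String, Object> dirty = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : current.entrySet()) {
            String column = entry.getKey();
            boolean isNew = !loaded.containsKey(column);
            // Objects.equals handles null column values on either side.
            if (isNew || !Objects.equals(loaded.get(column), entry.getValue())) {
                dirty.put(column, entry.getValue());
            }
        }
        return dirty;
    }
}
```

The resulting map is exactly what an UPDATE statement needs in its SET clause, so unchanged columns never hit the database.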

Initially I went with Hibernate, where I used StatelessSession as an SQL generation engine, hoping that Hibernate would calculate the delta properly. But the lack of an L1 cache means no life cycle, hence no delta monitoring. So I had two choices.

1. Use merge: this fires extra selects.
2. Generate the SQL by hand.

With the help of cglib proxies I managed to get the list of dirty columns, but then how do you handle object relationships?

I soon realized that a whole ORM needs to be implemented. Here is the list of requirements for a basic ORM-cum-cache with write-behind:

1. Easy configuration: declarative ORM mapping, sufficient for most scenarios.
2. Different execution strategies: timer driven, thread-pool driven, batch size reached, batch and timer combined, and an entity-level strategy for sync.
3. In-memory multi-indexing based on columns or a custom implementation, so that cached objects can be queried with different criteria.
4. Notification: listeners.
5. Target rate with % of write operations: 500 requests/second.
6. Recovery test: checkpoints, automated and manual.
7. Clustering with the help of Terracotta.

This is an exhaustive list, and the first item alone is a very big one. My take is to restrict the scope and focus on caching instead of fancy ORM features like Hibernate's. It should support only the object graph; all life-cycle management is left to the user.

I will add code snippets in coming posts; for now only the idea has been finalized.