Tuesday, August 19, 2008

Java Performance : Monitoring and Measurements - APM

Recently I came across a wonderful series of articles, "Run-time performance and availability monitoring for Java systems", about implementing run-time performance monitoring for an application's ecosystem, sometimes called APM (Application Performance Management). The series is a really good source of information, and anyone who needs to build some kind of performance monitoring ecosystem should read it. All the screenshots resemble Wily Introscope, though, so it basically describes everything that Introscope does.
The article says Helios is the reference implementation of the ideas discussed.

It covers

  • Monitoring fundamentals - why you need it and what you need from it: periodic reports based on templates and custom reports, historical storage and analysis, live visualization and simultaneous plotting for correlation, alerting (by email, BlackBerry, SMS or JMS) with a GUI, and dashboards

  • Recent advancements - such as agentless monitoring and synthetic transactions

  • Some very high-level design details - Performance Data Source, Collector, Tracers

  • Tracing patterns - polling, listening, interception, instrumentation
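Of the tracing patterns listed above, polling is the simplest to show in code. The following is a minimal sketch, not the article's or Helios's implementation: it reads two standard JVM metrics from the platform MXBeans at a fixed interval. The class name JvmPoller and the metric names are my own assumptions for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.ThreadMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical poller: the class name and metric names are illustrative only.
class JvmPoller {

    // Takes one sample of a few standard JVM metrics from the platform MXBeans.
    static Map<String, Long> sample() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        Map<String, Long> metrics = new LinkedHashMap<>();
        metrics.put("heap.used.bytes", memory.getHeapMemoryUsage().getUsed());
        metrics.put("threads.live", (long) threads.getThreadCount());
        return metrics;
    }

    public static void main(String[] args) throws InterruptedException {
        // Poll three times, one second apart; a real collector would ship
        // each sample to the APM store instead of printing it.
        for (int i = 0; i < 3; i++) {
            System.out.println(sample());
            Thread.sleep(1000);
        }
    }
}
```

The same loop could poll a remote JVM through a JMX connector; the local platform beans are used here only to keep the sketch self-contained.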


It also covers how to go about performance monitoring in general.

It focuses on JMX as the primary way of doing things, but commercial APM products such as CA Wily's Introscope are not JMX based. I feel that if your application is Java centric, it helps to have a Java-based APM such as Wily Introscope.

I would like to add the following too.

Synthetic transactions : I feel synthetic transactions are really helpful but at the same time difficult to implement. Tools like Grinder can help you implement them.
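To make the idea concrete, here is a minimal sketch of a synthetic transaction in plain Java. It does not use Grinder. It just runs a scripted step, records whether it succeeded and how long it took, and leaves the scheduling and reporting to whatever APM you use. The class SyntheticTransaction and its Result type are hypothetical names of my own.

```java
import java.util.concurrent.Callable;

// Hypothetical harness: names are illustrative and not part of Grinder or any APM product.
class SyntheticTransaction {

    // Outcome of one scripted run: success flag plus elapsed wall-clock time.
    static final class Result {
        final boolean ok;
        final long elapsedMillis;

        Result(boolean ok, long elapsedMillis) {
            this.ok = ok;
            this.elapsedMillis = elapsedMillis;
        }
    }

    // Runs the scripted step once; any exception counts as a failed transaction.
    static Result run(Callable<Boolean> scriptedStep) {
        long start = System.nanoTime();
        boolean ok;
        try {
            ok = Boolean.TRUE.equals(scriptedStep.call());
        } catch (Exception e) {
            ok = false;
        }
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        return new Result(ok, elapsedMillis);
    }

    public static void main(String[] args) {
        // Stand-in for a real scripted user action such as "log in and fetch a quote".
        Result result = run(() -> {
            Thread.sleep(50);
            return true;
        });
        System.out.println("ok=" + result.ok + " elapsedMillis=" + result.elapsedMillis);
    }
}
```

The hard part in practice is the scripted step itself (test accounts, test data, cleaning up after each run), which is why dedicated tools help.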

Application Specific Metrics : Generally you have infrastructure metrics (OS, storage and network metrics) and application infrastructure metrics such as app server metrics (queue length, response time) and database metrics (average query time). But sometimes you also need to gather application or business specific metrics. An example would be policies issued per day. The framework should be flexible enough to allow such metrics to be posted to the APM, where they can then be correlated with the technical metrics.
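In a JMX-based setup, one way to post such a business metric is to expose it as a standard MBean that the collector polls like any other attribute. The sketch below is only an illustration under that assumption. The class names, the ObjectName com.example.insurance:type=PolicyCounter and the attribute PoliciesIssued are made up for the example.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical business metric: all names here are illustrative.
class BusinessMetrics {

    // Standard MBean convention: a public interface named <ImplementationClass>MBean.
    public interface PolicyCounterMBean {
        long getPoliciesIssued();
    }

    public static class PolicyCounter implements PolicyCounterMBean {
        private final AtomicLong issued = new AtomicLong();

        // Called by application code each time a policy is issued.
        public void recordPolicyIssued() {
            issued.incrementAndGet();
        }

        @Override
        public long getPoliciesIssued() {
            return issued.get();
        }
    }

    // Registers a fresh counter under the given name, replacing any earlier registration.
    static PolicyCounter register(String objectName) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName(objectName);
            if (server.isRegistered(name)) {
                server.unregisterMBean(name);
            }
            PolicyCounter counter = new PolicyCounter();
            server.registerMBean(counter, name);
            return counter;
        } catch (JMException e) {
            throw new IllegalStateException("could not register " + objectName, e);
        }
    }

    // Reads the attribute back through JMX, the way a polling collector would.
    static long read(String objectName) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            return (Long) server.getAttribute(new ObjectName(objectName), "PoliciesIssued");
        } catch (JMException e) {
            throw new IllegalStateException("could not read " + objectName, e);
        }
    }

    public static void main(String[] args) {
        String name = "com.example.insurance:type=PolicyCounter";
        PolicyCounter counter = register(name);
        counter.recordPolicyIssued();
        counter.recordPolicyIssued();
        System.out.println("policies issued = " + read(name));
    }
}
```

The counter is cumulative; the per-day figure would be derived on the APM side from the difference between samples.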

Diagnosis : The APM should also be able to switch gears when a problem occurs. At that point it should start collecting data at a more granular level so as to get a more refined picture of the system. Introscope has a diagnosis tool called Transaction Monitor, which traces an entire transaction as one context. The article, though, talks about "heavy instrumentation" as I guess there
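The "switch gears" behaviour can be sketched as a sampler that shortens its collection interval while response times are above a threshold. This is only an illustration of the idea, not how Introscope or the article's design does it. The class AdaptiveSampler and the interval and threshold values are assumptions.

```java
// Hypothetical helper: names and numbers are illustrative, not taken from Introscope or the article.
class AdaptiveSampler {
    private final long normalIntervalMs;
    private final long diagnosticIntervalMs;
    private final long slowThresholdMs;
    private boolean diagnosing;

    AdaptiveSampler(long normalIntervalMs, long diagnosticIntervalMs, long slowThresholdMs) {
        this.normalIntervalMs = normalIntervalMs;
        this.diagnosticIntervalMs = diagnosticIntervalMs;
        this.slowThresholdMs = slowThresholdMs;
    }

    // Feed in the latest observed response time; returns how long to wait before the next sample.
    long nextIntervalMs(long responseTimeMs) {
        diagnosing = responseTimeMs > slowThresholdMs;
        return diagnosing ? diagnosticIntervalMs : normalIntervalMs;
    }

    boolean isDiagnosing() {
        return diagnosing;
    }

    public static void main(String[] args) {
        // Sample every 60 s normally, every 5 s while responses are slower than 2 s.
        AdaptiveSampler sampler = new AdaptiveSampler(60_000, 5_000, 2_000);
        long[] responseTimes = {300, 450, 2600, 3100, 400};
        for (long responseTime : responseTimes) {
            System.out.println(responseTime + " ms -> sample again in "
                    + sampler.nextIntervalMs(responseTime) + " ms");
        }
    }
}
```

A production version would probably stay in diagnostic mode for a while after the last slow response, so that it does not flip back and forth around the threshold.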

Heuristics: The APM should also provide some level of rule-based analysis, like what Glassbox does. See the demo: http://demo.glassbox.com/glassbox

The dilemma with APM systems is that commercial products like IBM Tivoli, HP OpenView and Mercury BAC come with many features but at a hefty cost. So open source comes to the rescue here: there are a lot of open-source projects that implement this idea. I am a big fan of Nagios. Others, like Groundwork Monitor, implement extra functionality around Nagios. Nagios 3.0 has made a lot of progress and is now "installable" for a normal user with this guide: http://nagios.sourceforge.net/docs/3_0/quickstart.html
Also, if your application is Java centric, then it makes good sense to have a JMX-based system as described in the article. Sometimes you don't need a full-blown system. In such situations tools like the JAMon API can help you. Do visit the JAMon site, it will surely help you; it is even recommended to keep it running in the production environment.

If you are interested in Java profiling and how it's done, read "Build your own profiling tool" and look at Jensor (jensor.sourceforge.net). Jensor is a Java profiler built by TCS's Performance Engineering Group (where I started my career). It focuses on the ideas in the first article mentioned, but it also has a good analysis GUI that helps you dig in. The commercial profilers JProbe and JProfiler are really good.

In the past I experimented with the same idea using Nagios. The focus there was to implement a performance monitoring system for the entire lab. We wrote Nagios plugins for WebSphere (PMI based) and WebLogic (WebLogic shell) to collect performance metrics. Nagios, however, is only a scheduling and execution engine that does not really care about the data (the current 3.0 version has a lot of extension points where you can save events and performance data in a MySQL or PostgreSQL database). So we used a small open-source system, Perfparse, for storing performance data and pulling out reports. In our lab we implemented this for about 80 servers, with a single-CPU monitoring machine running RHEL 4.0. It worked very well for me.
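For readers who have not written a Nagios plugin: a plugin is just a program that prints one status line, optionally followed by "|" and performance data in the form label=value[unit];warn;crit, and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The performance data after the "|" is what tools like Perfparse pick up. The sketch below shows that convention in Java. It is not our original plugin code, and the service name, label and thresholds are made up.

```java
import java.util.Locale;

// Hypothetical plugin skeleton: service name, label and thresholds are illustrative.
class NagiosCheck {
    static final int OK = 0;
    static final int WARNING = 1;
    static final int CRITICAL = 2;

    // Maps a measured value onto a Nagios exit code using warning and critical thresholds.
    static int status(double value, double warn, double crit) {
        if (value >= crit) {
            return CRITICAL;
        }
        if (value >= warn) {
            return WARNING;
        }
        return OK;
    }

    // Builds the single output line: human-readable text, then "|", then performance data.
    static String line(String service, String label, double value, String unit,
                       double warn, double crit) {
        String[] names = {"OK", "WARNING", "CRITICAL"};
        String state = names[status(value, warn, crit)];
        return String.format(Locale.ROOT, "%s %s - %s=%.1f%s | %s=%.1f%s;%.1f;%.1f",
                service, state, label, value, unit, label, value, unit, warn, crit);
    }

    public static void main(String[] args) {
        // In a real plugin this value would come from PMI, JMX or a server shell command.
        double heapUsedPercent = 85.0;
        System.out.println(line("HEAP", "heap_used", heapUsedPercent, "%", 80.0, 90.0));
        System.exit(status(heapUsedPercent, 80.0, 90.0));
    }
}
```

Starting a JVM for every check is heavy when you monitor many servers, so for frequent checks a script or a long-running agent queried by a thin plugin may be the better choice.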

The best one is the one that solves your problem!

..
Tushar

2 comments:

Unknown said...

As an alternative to Nagios you could take a look at Zabbix and Zapcat JMX Zabbix Bridge.

Unknown said...

Hyperic HQ is also a good open-source monitoring tool. With features like auto-discovery, configuring Hyperic becomes a breeze.

Tushar, do you plan to write about Application Dependency discovery anytime soon?