Sunday, December 21, 2008

An Insane idea to Solve Java's Memory Problem

This could be insane idea to solve java's memory problem.

Garbage collection overhead has been one of the major performance problem. Recent improvements in Garbage collector : ability to parallel collect garbage using more processors or cores available, concurrent garbage collection has drastically reduced this belief that GC is mostly cause of performance problems. Earlier this problem could be solved by running multiple instances of application on the same box to use its all CPU power and memory. But for some memory-intensive application GC overhead is still a problem to be solved. As JVM is being touted as more of Virtual Machine for many interpreted languages as opposed to only Java runtime memory problem is bound to occur. I have been working recently on Terracotta which is Network Attached Memory (NAM) for Java Applications. When you say NAM, you application now can access more number of objects that could fit in your local heap since objects which are not in your local heap are transparently loaded into local JVM when they are accessed .. sort of lazy loading in hibernate. Now this idea can also be applied to JVM .. offload some objects which are not accessed often down to local disk and load them when they are not accessed.Imagine a situation where you are running application on 16 core or 32 core processors ( intel has one prototype of 80 cores) ..data processing ability of such hugh machine. With 64-bit platform JVM size can grow beyond 2 GB limit but when Full GC happens on JVM sized more than 4 GB its really painful for applications. I have never worked on such hugh JVMs no really no idea how jdk 5 and 6 performs in such situations. Sure there will be some optimizations in JVM to operate at such scale.

A simple prototype of this is various cache solutions which offer cache eviction to local disk when number of objects cross defined cache size. But these implementations are targeted as "caching solutions" which know only how to "get" and "put" objects in a manner so as to make best use of available space.

At first this idea may be insane but various optimizations can be done to make it practical. This is what terracotta has done, to avoid object serialization and operate at field level. Consider a map of 10000 fat objects. When this map is offloaded to disk all value objects will also be written to disk. Now when some value object is looked upon only the object can be loaded while rest of the objects still being stored on disk. This basically means you have to implement some sort of virtual memory manager for virtual machine which will use application access pattern and some stored intelligence to minimize the total load delay and at the same time allow application to operate on large data set.


Any comments are welcome .. I am sure there will be a lot of comments on this insane idea.

..
Tushar

No comments: