Thursday, September 16, 2010

BigMemory - Memory with no garbage collection overhead. How?

I just came to know about BigMemory. After reading the description and Terracotta CTO Ari's blog, I wondered: how does it work? BigMemory stores objects in native memory, outside the JVM heap, without paying any garbage collection tax. That means you are no longer bound by the JVM heap limit and can use all the memory (which is dirt cheap) available on the machine. And all of this in plain Java. No JNI.

I mean, how is it that if such a technique was available, no existing cache framework had used it? By implementing this technique along with a complete memory manager, Terracotta has shown that it is indeed the leader in distributed caching. Why spend effort optimizing garbage collection times, which is a black art?

Let's assume we have such a mechanism that allows us to store Java objects in native memory. So how do we implement a BigMemory-like system? Let's focus on just the cache use case. You need to track the objects in the cache, but object tracking is quite easy there: entries are removed either by cache eviction threads or by the user, so ultimately it all boils down to map.remove(key).
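To make that use case concrete, here is the kind of map-like contract such a cache would expose. This is a hypothetical interface of my own, not anything from BigMemory; the name OffHeapCache is purely illustrative:

public interface OffHeapCache<K, V> {
    V put(K key, V value); // returns the previous value, like java.util.Map
    V get(Object key);
    V remove(Object key);  // eviction ultimately boils down to this call
}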

So how does BigMemory work? This is just my guess: it may be using direct ByteBuffers. I did some googling around allocating native memory from the JVM and found this article. And yes, direct ByteBuffers are stored in native memory. So below is my little shot at mimicking BigMemory. Basically, each object is serialized and the resulting bytes are stored in a direct ByteBuffer. Some time and memory could further be saved by using faster and smarter serialization techniques.


import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Keys stay on the heap; serialized values live in direct (off-heap) buffers.
private final Map<K, ByteBuffer> map = new ConcurrentHashMap<K, ByteBuffer>();

public V put(K key, V value) {
    try {
        // Serialize the value to a byte array.
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        ObjectOutputStream stream = new ObjectOutputStream(baos);
        stream.writeObject(value);
        stream.close();
        byte[] b = baos.toByteArray();
        // Copy the bytes into a direct buffer allocated in native memory.
        ByteBuffer buffer = ByteBuffer.allocateDirect(b.length);
        buffer.put(b);
        buffer.flip();
        ByteBuffer old = map.put(key, buffer);
        if (old != null) {
            // A direct buffer has no backing array, so copy its bytes out first.
            ByteBuffer dup = old.duplicate();
            byte[] bytes = new byte[dup.remaining()];
            dup.get(bytes);
            ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes));
            @SuppressWarnings("unchecked")
            V previous = (V) ois.readObject();
            ois.close();
            return previous;
        }
        return null;
    } catch (IOException e) {
        throw new RuntimeException(e);
    } catch (ClassNotFoundException e) {
        throw new RuntimeException(e);
    }
}

public V get(Object key) {
    try {
        ByteBuffer buffer = map.get(key);
        if (buffer == null) {
            return null;
        }
        // Copy the bytes back out of native memory before deserializing.
        ByteBuffer dup = buffer.duplicate();
        byte[] bytes = new byte[dup.remaining()];
        dup.get(bytes);
        ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes));
        @SuppressWarnings("unchecked")
        V object = (V) ois.readObject();
        ois.close();
        return object;
    } catch (IOException e) {
        throw new RuntimeException(e);
    } catch (ClassNotFoundException e) {
        throw new RuntimeException(e);
    }
}
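
And completing the hypothetical OffHeapCache contract from earlier, a remove would look like this (my own addition, not part of the snippet above). Note that the native memory backing a direct ByteBuffer is released by the JVM only after the small heap-side buffer object itself becomes unreachable and gets collected:

public V remove(Object key) {
    try {
        // Dropping the map entry makes the heap-side ByteBuffer unreachable;
        // its native allocation is freed when that small object is GC'd.
        ByteBuffer buffer = map.remove(key);
        if (buffer == null) {
            return null;
        }
        ByteBuffer dup = buffer.duplicate();
        byte[] bytes = new byte[dup.remaining()];
        dup.get(bytes);
        ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes));
        @SuppressWarnings("unchecked")
        V object = (V) ois.readObject();
        ois.close();
        return object;
    } catch (IOException e) {
        throw new RuntimeException(e);
    } catch (ClassNotFoundException e) {
        throw new RuntimeException(e);
    }
}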



The solution above stores keys in the JVM heap and values in non-heap memory. So if the cache holds 2M entries, the JVM only carries the overhead of 2M key objects during GC, not 4M (as with a plain Map, or 4-5M in the case of EHCACHE, where each value is serialized and wrapped in an Ehcache Element object). This can be reduced even further by implementing some sort of hashing: each key is hashed into one of a fixed number of buckets, and the buckets themselves are serialized and stored in buffers. You deserialize the bucket metadata, then iterate through it to find the actual buffer and locate the element. But BigMemory is not just a cache; it is storage for an entire object graph.
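Here is a deliberately simplified sketch of that bucketing idea. Instead of separate metadata and element buffers, each bucket here is just a serialized HashMap living in its own direct buffer, so the heap tracks only BUCKETS buffer objects instead of one object per key. The class name and constants are mine, purely illustrative (and keys and values must be Serializable):

import java.io.*;
import java.nio.ByteBuffer;
import java.util.HashMap;

public class BucketedOffHeapMap<K, V> {
    private static final int BUCKETS = 1024;
    private final ByteBuffer[] buckets = new ByteBuffer[BUCKETS];

    public synchronized V put(K key, V value) throws IOException, ClassNotFoundException {
        int idx = Math.abs(key.hashCode() % BUCKETS);
        HashMap<K, V> bucket = readBucket(idx); // deserialize bucket "metadata"
        V previous = bucket.put(key, value);
        writeBucket(idx, bucket);               // reserialize back into native memory
        return previous;
    }

    public synchronized V get(Object key) throws IOException, ClassNotFoundException {
        int idx = Math.abs(key.hashCode() % BUCKETS);
        return readBucket(idx).get(key);        // iterate/lookup inside the bucket
    }

    @SuppressWarnings("unchecked")
    private HashMap<K, V> readBucket(int idx) throws IOException, ClassNotFoundException {
        ByteBuffer buffer = buckets[idx];
        if (buffer == null) {
            return new HashMap<K, V>();
        }
        ByteBuffer dup = buffer.duplicate();    // direct buffers have no array()
        byte[] bytes = new byte[dup.remaining()];
        dup.get(bytes);
        ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes));
        HashMap<K, V> bucket = (HashMap<K, V>) ois.readObject();
        ois.close();
        return bucket;
    }

    private void writeBucket(int idx, HashMap<K, V> bucket) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(baos);
        oos.writeObject(bucket);
        oos.close();
        byte[] bytes = baos.toByteArray();
        ByteBuffer buffer = ByteBuffer.allocateDirect(bytes.length);
        buffer.put(bytes);
        buffer.flip();
        buckets[idx] = buffer;                  // old buffer becomes collectible
    }
}

The obvious cost of this simplification is that every put deserializes and reserializes a whole bucket; a real implementation would keep per-entry offsets in the bucket metadata instead.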

The version presented here is a poor man's BigMemory compared to what the real BigMemory will have, but it can still save huge GC tuning efforts in applications with huge memory requirements and therefore heavy garbage collection. Terracotta's BigMemory will surely come with much smarter strategies and data structures, integrated with the distributed garbage collector and implementing all the cluster semantics that other Terracotta clustered data structures do. I'm waiting for the integration of BigMemory into the Terracotta server. Memory has been one of the biggest pain points in Terracotta; this would give Terracotta unlimited scalability in terms of the number of distributed objects stored.
