What is lucene-bytebuffer?
lucene-bytebuffer is a Lucene Directory implementation that uses direct ByteBuffers. A Directory in Lucene is the backing storage for an index: Lucene uses it to store the index contents. So there are RAMDirectory, FSDirectory, MMapDirectory and NIOFSDirectory, each presenting different options. lucene-bytebuffer will allow an in-memory index to grow up to several gigabytes without incurring garbage collection cost.
Most indexes are 90 to 95% read and 2-5% write, i.e. the index hardly changes. If the index is huge, it will cost a lot in terms of garbage collection CPU cycles. RAMDirectory holds byte arrays of size 1024, so a 1GB index means about 1 million array objects. So as the size increases, in-memory index performance degrades due to garbage collection.
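To make the arithmetic above concrete, here is a small sketch that counts how many 1024-byte arrays an index of a given size needs. Only the 1024-byte buffer size comes from the text; the class and method names are made up for illustration.

```java
final class BufferCount {
    // Buffer size quoted above for RAMDirectory.
    static final int BUFFER_SIZE = 1024;

    // Number of fixed-size byte arrays needed to hold indexBytes, rounded up.
    static long buffersNeeded(long indexBytes) {
        return (indexBytes + BUFFER_SIZE - 1) / BUFFER_SIZE;
    }

    public static void main(String[] args) {
        long oneGb = 1024L * 1024L * 1024L;
        System.out.println(buffersNeeded(oneGb));     // prints 1048576
        System.out.println(buffersNeeded(5 * oneGb)); // prints 5242880
    }
}
```

Every one of those arrays is a separate heap object that the collector has to trace, which is where the GC cost comes from.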
What if you want to index, say, 5GB of data? Use an off-heap, ByteBuffer-backed directory.
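As a rough illustration of what "off-heap, ByteBuffer-backed" storage could look like, the sketch below keeps a file's bytes in a list of direct ByteBuffer chunks. It is not the lucene-bytebuffer code and does not implement Lucene's Directory API; the class name OffHeapFile and its methods are assumptions for this example. Chunking matters because a single ByteBuffer is indexed by int and so cannot hold a 5GB index on its own.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: an append-only "file" whose bytes live in direct buffers.
final class OffHeapFile {
    private final int chunkSize;
    private final List<ByteBuffer> chunks = new ArrayList<ByteBuffer>();
    private long length;

    OffHeapFile(int chunkSize) {
        this.chunkSize = chunkSize;
    }

    long length() {
        return length;
    }

    // Appends src at the end of the file, allocating new direct chunks as needed.
    void append(byte[] src) {
        int offset = 0;
        while (offset < src.length) {
            int chunkIndex = (int) (length / chunkSize);
            int chunkPos = (int) (length % chunkSize);
            if (chunkIndex == chunks.size()) {
                chunks.add(ByteBuffer.allocateDirect(chunkSize));
            }
            int n = Math.min(src.length - offset, chunkSize - chunkPos);
            // duplicate() shares the bytes but has its own position.
            ByteBuffer view = chunks.get(chunkIndex).duplicate();
            view.position(chunkPos);
            view.put(src, offset, n);
            offset += n;
            length += n;
        }
    }

    // Fills dst with the bytes starting at absolute position pos.
    void read(long pos, byte[] dst) {
        if (pos < 0 || pos + dst.length > length) {
            throw new IndexOutOfBoundsException("read past end of file");
        }
        int offset = 0;
        while (offset < dst.length) {
            int chunkIndex = (int) (pos / chunkSize);
            int chunkPos = (int) (pos % chunkSize);
            int n = Math.min(dst.length - offset, chunkSize - chunkPos);
            ByteBuffer view = chunks.get(chunkIndex).duplicate();
            view.position(chunkPos);
            view.get(dst, offset, n);
            offset += n;
            pos += n;
        }
    }
}
```

With a large chunk size the heap holds only a handful of small buffer wrapper objects, however big the stored data grows.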
Another question is why you would want to use Lucene in-memory indexing at all. Maybe as a cache that can be queried on more than one property of the indexed objects?
Tuesday, October 19, 2010
jmalloc : Manual Memory Management in Java
One of the definite advantages of Java over C++ is automatic memory management. Automatic memory management has its cost, though. This cost was high in earlier versions of the JDK (serial stop-the-world garbage collection) but it has been much improved since, with parallel and concurrent collectors. In some cases the cost is still so high that manual memory management is the answer. jmalloc is one such simple effort.
GC is a pain point in Java and a limiting factor in many cases. Garbage collection tuning in Java is considered a black art and is very difficult to get right.
Caching provides a performance boost for a lot of applications. Caching large amounts of data is restrictive, because caching is mostly a very small part of the application logic but costs relatively more in terms of GC impact. The JVM does not perform predictably beyond a heap size of about 4GB. A cache is a typical case: it holds objects with a predictable life cycle. Some objects in fact live throughout the application's life, such as "reference data" which does not change and remains cached. Such objects are also problematic for the GC: they get promoted to the old generation and are scanned in every full garbage collection, wasting CPU. Terracotta has addressed a similar problem using direct ByteBuffers. jmalloc does the same thing for Ehcache.
The BigMemory benchmark claims to have scaled up to 350 GB of cache on a beefy server. BigMemory has shown that the garbage collection Java offers is not sufficient for some use cases like caching. A cache touches an object only twice: when it is put into the cache and when it is evicted. This is a typical case for manual memory management. jmalloc is manual memory management of direct buffers with two simple routines: malloc and free. The contents of direct buffers are not visible to the Java garbage collector. Thus an object stored in a direct ByteBuffer lives as long as its ByteBuffer reference is not collected. jmalloc allocates a single ByteBuffer and divides it into many variable-size chunks where objects are serialized and stored.
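To give a feel for the malloc/free idea, here is a minimal sketch of a first-fit allocator over a single direct ByteBuffer. This is not the actual jmalloc code: the class name DirectArena, the offset-based handles and the first-fit strategy with merging of adjacent free chunks are all assumptions made for this example.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical allocator: hands out variable-size chunks of one direct buffer.
final class DirectArena {
    private final ByteBuffer arena;
    // Free chunks, offset -> size, sorted so that neighbours can be merged.
    private final TreeMap<Integer, Integer> freeChunks = new TreeMap<Integer, Integer>();
    // Allocated chunks, offset -> size.
    private final Map<Integer, Integer> used = new HashMap<Integer, Integer>();

    DirectArena(int capacity) {
        arena = ByteBuffer.allocateDirect(capacity);
        freeChunks.put(0, capacity);
    }

    // First fit: returns the offset of a chunk of the given size, or -1 if none fits.
    int malloc(int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        int found = -1;
        int foundSize = 0;
        for (Map.Entry<Integer, Integer> e : freeChunks.entrySet()) {
            if (e.getValue() >= size) {
                found = e.getKey();
                foundSize = e.getValue();
                break;
            }
        }
        if (found < 0) {
            return -1;
        }
        freeChunks.remove(found);
        if (foundSize > size) {
            freeChunks.put(found + size, foundSize - size); // keep the remainder free
        }
        used.put(found, size);
        return found;
    }

    // Releases a chunk returned by malloc and merges it with adjacent free chunks.
    void free(int offset) {
        Integer size = used.remove(offset);
        if (size == null) {
            throw new IllegalArgumentException("not allocated: " + offset);
        }
        int start = offset;
        int len = size;
        Integer next = freeChunks.remove(start + len);
        if (next != null) {
            len += next;
        }
        Map.Entry<Integer, Integer> prev = freeChunks.lowerEntry(start);
        if (prev != null && prev.getKey() + prev.getValue() == start) {
            int prevStart = prev.getKey();
            int prevLen = prev.getValue();
            freeChunks.remove(prevStart);
            start = prevStart;
            len += prevLen;
        }
        freeChunks.put(start, len);
    }

    // Copies data into the arena at the given offset.
    void write(int offset, byte[] data) {
        ByteBuffer view = arena.duplicate();
        view.position(offset);
        view.put(data);
    }

    // Copies length bytes out of the arena starting at offset.
    byte[] read(int offset, int length) {
        byte[] out = new byte[length];
        ByteBuffer view = arena.duplicate();
        view.position(offset);
        view.get(out);
        return out;
    }
}
```

A cache would serialize a value, call malloc for its length, write the bytes at the returned offset, and call free on eviction. The bookkeeping maps in this sketch still live on the heap; a more serious allocator would keep that metadata inside the buffer as well.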
This is just a start. Apart from the generic malloc/free methods, I am planning to write a helper class for Ehcache which will wrap Ehcache so that all benefits like eviction and disk-based overflow are available but the object is stored in
If you like the idea, let me know.
..
Tushar
Thursday, September 16, 2010
BigMemory - Memory with no garbage collection overhead. How?
I just came to know about BigMemory. After reading the description and Terracotta CTO Ari's blog, I wondered how it works. BigMemory stores objects in native memory, without any garbage collection tax. It means you are no longer bound by the JVM heap limit and can use all the memory (which is dirt cheap) available. And all this in plain Java. No JNI.
I mean, if such a technique was available, how did it happen that no existing cache framework used it? By implementing this technique along with a complete memory manager, Terracotta showed that it is indeed a leader in distributed caching. Why spend effort on optimizing garbage collection times, which is a black art?
Let's assume that we have a mechanism which allows us to store Java objects in native memory. How do we then implement a BigMemory-like system? Let's focus just on the cache use case. You need to track the objects in the cache, but object tracking is quite easy in the cache use case: objects are removed directly by cache eviction threads or by the user, so ultimately it is map.remove(key).
So how does BigMemory work? This is just my guess: it may be using direct ByteBuffers. I did some googling around allocating native memory in the JVM and found this article, and yes, direct ByteBuffers are stored in native memory. So below is my little shot at mimicking BigMemory. Basically each object is serialized to bytes and stored in a direct ByteBuffer. Some time and memory can further be saved by using faster and smarter serialization techniques.
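import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// The class name and the map field are assumed here; only put and get were shown.
public class OffHeapMap<K, V> {
    // Keys stay on the JVM heap; serialized values live in direct (off-heap) buffers.
    private final Map<K, ByteBuffer> map = new HashMap<K, ByteBuffer>();

    public V put(K key, V value) {
        try {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            ObjectOutputStream stream = new ObjectOutputStream(baos);
            stream.writeObject(value);
            stream.close();
            byte[] b = baos.toByteArray();
            ByteBuffer buffer = ByteBuffer.allocateDirect(b.length);
            buffer.put(b);  // copy the serialized bytes off-heap
            buffer.flip();  // make the buffer readable from position 0
            ByteBuffer previous = map.put(key, buffer);
            return previous != null ? deserialize(previous) : null;
        } catch (IOException e) {
            throw new RuntimeException(e);
        } catch (ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }

    public V get(Object key) {
        try {
            ByteBuffer buffer = map.get(key);
            return buffer != null ? deserialize(buffer) : null;
        } catch (IOException e) {
            throw new RuntimeException(e);
        } catch (ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }

    // A direct buffer has no backing array, so array() would throw. Copy the
    // bytes out through a duplicate so the stored buffer's position is untouched.
    @SuppressWarnings("unchecked")
    private V deserialize(ByteBuffer stored) throws IOException, ClassNotFoundException {
        ByteBuffer view = stored.duplicate();
        byte[] bytes = new byte[view.remaining()];
        view.get(bytes);
        ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes));
        try {
            return (V) ois.readObject();
        } finally {
            ois.close();
        }
    }
}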
The solution above stores keys on the JVM heap and objects in non-heap memory. So if the number of objects is 2M, the JVM only has the overhead of 2M objects while doing GC and not 4M (in the case of a Map, and 4-5M in the case of Ehcache, where the object is serialized and wrapped in an Ehcache Element object). Strictly speaking, each direct ByteBuffer still has a small wrapper object on the heap, but the payload itself is off-heap. This can be reduced even further by implementing some sort of hashing. Each key is hashed into one of the buckets, which are serialized and stored in buffers. You deserialize the bucket metadata and then iterate through it to find the actual buffer and locate the element. But BigMemory is not just a cache; it is entire object graph storage.
Thus the version presented here is a poor man's BigMemory compared to what BigMemory will have, but it can still save huge GC tuning effort in applications with huge memory requirements and thus heavy garbage collection. Terracotta's BigMemory will have much smarter strategies and data structures, be integrated with the distributed garbage collector and implement all cluster semantics, as all other Terracotta clustered data structures do. Waiting for the integration of BigMemory into the Terracotta server. Memory has been one of the biggest pain points in Terracotta. That would give Terracotta unlimited scalability in terms of the number of distributed objects stored.
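The bucket hashing idea mentioned above could be sketched roughly as follows. This is a simplified guess, not how BigMemory works: each bucket here is a whole HashMap serialized into one direct ByteBuffer (rather than bucket metadata pointing at further buffers), and the class name BucketedOffHeapMap is made up for this example. The point is that the heap then holds one buffer reference per bucket, not one per entry, at the price of deserializing a whole bucket on every access.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.ByteBuffer;
import java.util.HashMap;

// Hypothetical sketch: keys and values both live off-heap, grouped into buckets.
final class BucketedOffHeapMap<K extends Serializable, V extends Serializable> {
    private final ByteBuffer[] buckets;

    BucketedOffHeapMap(int bucketCount) {
        buckets = new ByteBuffer[bucketCount];
    }

    private int indexFor(Object key) {
        return (key.hashCode() & 0x7fffffff) % buckets.length;
    }

    V put(K key, V value) {
        int i = indexFor(key);
        HashMap<K, V> bucket = load(i);
        V previous = bucket.put(key, value);
        store(i, bucket);
        return previous;
    }

    V get(Object key) {
        return load(indexFor(key)).get(key);
    }

    // Deserializes bucket i, or returns an empty bucket if nothing is stored yet.
    @SuppressWarnings("unchecked")
    private HashMap<K, V> load(int i) {
        ByteBuffer stored = buckets[i];
        if (stored == null) {
            return new HashMap<K, V>();
        }
        ByteBuffer view = stored.duplicate();
        byte[] bytes = new byte[view.remaining()];
        view.get(bytes);
        try {
            ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes));
            try {
                return (HashMap<K, V>) in.readObject();
            } finally {
                in.close();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }

    // Serializes the bucket into a fresh direct buffer and swaps it in.
    private void store(int i, HashMap<K, V> bucket) {
        try {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(baos);
            out.writeObject(bucket);
            out.close();
            byte[] bytes = baos.toByteArray();
            ByteBuffer buffer = ByteBuffer.allocateDirect(bytes.length);
            buffer.put(bytes);
            buffer.flip();
            buckets[i] = buffer;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Rewriting a whole bucket on every put is wasteful, so this only makes sense for read-mostly data with many small buckets.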
Thursday, April 1, 2010
More than MVC framework for Flex : Mate
Recently at work I was working on a complex GUI screen which was to be designed in Flex. Flex has quite nice capabilities: making beautiful GUIs containing charts and widgets with minimal effort, and getting data from a Java-based server side with RemoteObject. But I guessed that as the application got complex (do check grooveshark.com to see how rich, complex and yet beautiful a GUI can be created with Flex) I would need some way of structuring and designing the application so anybody can understand and extend it without reading thousands of lines of ActionScript code. Being a Java programmer (a framework-based ecosystem), I naturally thought along the lines of Struts, an MVC framework. But then I felt the need for a Spring-like container too, where I can write and design each component separately, test them, and inject and wire them into the main application. There are lots of frameworks in Flex: Cairngorm, PureMVC and Mate. Naturally the first pick was Cairngorm, since it is the official Adobe framework. While browsing through its HelloWorld program, I wondered why I needed to make so many framework-specific classes. The last update date on the download site was also very old: 30th May 2007. Then I learned about Mate: a wonderful framework which allows structuring an application in a very unobtrusive way with a very small learning curve. It was quite easy for me to help other team members understand the new design with this framework.
Mate uses Flex eventing. So with Model-View-Controller, the View generates events like Submit-Form, which you can then "map" to one or more Controllers. The best feature of this framework is "injection". It injects the model into the controller and the view, so once you change the model in Controller methods the View updates itself. With Flex data binding and Mate injection this becomes quite a powerful and very simple design for complex GUIs. The Mate EventMap tag allows declarative coding and thus rapid testing and prototyping. When you look at a Mate EventMap you come to know what the application is doing and what its high-level components are, i.e. your design, which is exactly what a framework should do. Mate allowed me to structure my application and set discipline.
Say, for example, that on clicking the "submit" button you want to make a remote call, get data, update a couple of labels on screen, put the data into a grid, calculate a Total Amount from all records and, based on some "Value", show a signal like Yellow, Amber or Green indicating the current status. Such a screen can be described by a simple EventMap declaration as follows. The View can be implemented with its own panels: SearchPanel and DisplayPanel. SearchController will host all activity-specific code like calculating totals or summaries.
EventMap
<EventHandlers type="{SearchEvent.SEARCH}">
    <!-- call the remoting service -->
    <RemoteObjectInvoker instance="{services.productService}" method="search" arguments="{[event.searchCriteria1, event.searchCriteria2]}">
        <!-- result sequence gets executed when service returns with a result -->
        <resultHandlers>
            <MethodInvoker generator="{SearchController}" method="setGridData" arguments="{resultObject}"/>
            <MethodInvoker generator="{SummaryCalculator}" method="calculateSummary" arguments="{resultObject}"/>
            <MethodInvoker generator="{SignalIndicator}" method="calculateAndIndicateState" arguments="{resultObject}"/>
        </resultHandlers>
        <faultHandlers>
            <CallBack method="handleFault" arguments="{fault.faultDetail}"/>
        </faultHandlers>
    </RemoteObjectInvoker>
</EventHandlers>
View code can be refactored into two panels: SearchPanel and DisplayPanel. SearchPanel looks as follows:
SearchPanel
<mx:Script>
    <![CDATA[
        import mate.events.*;
        import mx.controls.Alert;
        import mx.collections.*;

        // Build a SearchEvent from the two text inputs and let it bubble up to the EventMap.
        private function fireSearch():void
        {
            var event:SearchEvent = new SearchEvent(SearchEvent.SEARCH, true);
            event.searchCriteria1 = textInput1.text;
            event.searchCriteria2 = textInput2.text;
            dispatchEvent(event);
        }
    ]]>
</mx:Script>
<mx:Button id="searchBtn" label="Search" width="100" click="fireSearch()"/>
DisplayPanel, on the other hand, looks as follows:
DisplayPanel
<mx:Panel>
    <mx:Script>
        <![CDATA[
            import mate.events.*;
            import mx.controls.Alert;
            import mx.collections.*;
            import mate.model.*;

            [Bindable]
            public var searchResult:SearchResultVO;
        ]]>
    </mx:Script>
    <mx:VBox label="Search Result">
        <mx:DataGrid id="dataGrid" width="350" height="200" dataProvider="{searchResult.Data}"/>
    </mx:VBox>
</mx:Panel>
The code below injects the model into the controller and DisplayPanel. It works like a Spring context's singleton bean, so the same instance is shared. The tag below says: DisplayPanel.searchResult = SearchController.currentResultSet
EventMap - Injector
<Injectors target="{DisplayPanel}">
    <PropertyInjector targetKey="searchResult" source="{SearchController}" sourceKey="currentResultSet"/>
</Injectors>
Below is the code for the Controller, which dispatches its own binding event when currentResultSet changes:
SearchController
package mate.controller
{
    import flash.events.Event;
    import flash.events.EventDispatcher;
    import mx.controls.Alert;
    import mx.collections.ArrayCollection;
    import mate.model.*;

    public class SearchController extends EventDispatcher
    {
        private var _currentResultSet:SearchResultVO;

        [Bindable (event="currentSetChange")]
        public function get currentResultSet():SearchResultVO
        {
            return _currentResultSet;
        }

        public function setGridData(result:SearchResultVO):void
        {
            // Do other processing if required....
            _currentResultSet = result;
            // Notify bindings (and so the injected DisplayPanel) that the result changed.
            dispatchEvent(new Event('currentSetChange'));
        }
    }
}
One more good feature of Mate is its MockService interface, which lets you write a dummy remote service as part of the Flex application itself. This allows developers to work independently when server-side services are not ready or available.
<MockRemoteObject id="helloService" showBusyCursor="true" delay="1" mockGenerator="{MockHelloService}"/>
Welcome to the awesomeness of Mate.
Tushar
Sunday, March 14, 2010
Uploading and Sharing Photos
We all want to share pictures of all our Kodak moments with our friends. This is one kind of existence in the Web 2.0 virtual world. It normally involves the following pattern:
- Take a lot of pictures using point-and-shoot digital cameras.
- Download them from the camera or memory card to the computer.
- Edit them in a picture editor.
- Upload them to web-album sites like Flickr or Picasa.
- Share the photos from Picasa or Flickr.
- Upload and share them on social networking sites like Orkut or Facebook.
Picasa makes sharing photos really easy. I use the Picasa desktop application for all my photos. Picasa has a concept of gadgets, where you can add a button that, when clicked, posts your album to sites like Orkut or Facebook. So here is my way.
- Download photos from the camera or phone using Picasa.
- Edit them using Picasa.
- Use the Picasa toolbar buttons to upload them to Picasa Web Albums, Orkut and Facebook.
And Orkut has inbuilt support for Picasa Web Albums. See the screenshot below.
Hope this simple tip makes your life with photos and social networks easier.
Tushar.
Wednesday, February 10, 2010
Google Buzz
My first impression: I like the idea of adding connected sites: Twitter, Picasa, Blogger, Reader and all status messages. I am still searching for a good social network desktop client which will connect to all my accounts (Gmail, Orkut, Facebook, Yahoo, LinkedIn, Twitter), but no such application exists yet. The one which came closest is Digsby, but Digsby's UI is really clumsy to use.
Buzz is a single-window solution. Whether you want to share a Picasa album or your recent tweet, it works. Let's hope that it does not fail the way Google Wave failed due to the absence of friends. I hope Google this time sends invitations to clusters of users, so that when I use Buzz I will find at least some friends to share with. Buzz is supposed to compete with Facebook and Twitter, and it surely has plus points against both:
- it works right inside the most popular Google application, Gmail
- no 140-character limit
- auto-publish feature: when I tweet, the same tweet will be "buzzed" too
On the other side, I don't know what Google is thinking in having two different products: Wave and Buzz. I think that since Wave has failed, Google was in serious need of restricting Facebook's advance, so it added social networking aspects to its most loyal user base: Gmail. I am not sure about Buzz's success; it could well tank like Google's other recent products. Facebook's strong point is that all sharing (links, photos, activities, thoughts) is at the center of the page, but in Gmail it is just another view, so I am not sure it should be part of Gmail. The Gmail view is OK, but there should also be a separate page where I can just spend time on Buzz. Hope Google adds new features slowly.
Saturday, January 9, 2010
Google Nexus One
Two days back I bought my Android phone in the US. Can't wait to get my hands on it. Here are a few pics of it.
The most astonishing thing is that it has a 1 GHz CPU, while most smartphones of the current generation run at around 528 MHz. That's a big advantage, and it has a bigger screen too. For screens, bigger is better. It also has Flash Player, a significant factor over the iPhone 3GS.
Tech specifications. For detailed specs see: Google Site
- Processor: Qualcomm QSD 8250, 1 GHz
- Platform: Android Mobile Technology Platform 2.1 (Eclair)
- Memory: 512MB Flash, 512MB RAM, 4GB Micro SD card (expandable to 32 GB)
- Display: 3.7-inch (diagonal) widescreen WVGA AMOLED touchscreen, 800 x 480 pixels
- Camera: 5 megapixels, autofocus, LED flash, video captured at 720x480 pixels at 20 fps
- 3G: UMTS Band 1/4/8 (2100/AWS/900)
- Wi-Fi (802.11b/g)
- Bluetooth 2.1 + EDR