I finally managed to release memcached-session-manager 1.1. This release covers pluggable session serialization/deserialization.
The main motivation was to have an alternative to java serialization that tolerates different/new versions of classes (forward/backward compatibility). In the end, it should be possible to deploy a new release of an application while session failover stays available.
Available serialization solutions
Therefore I started to look for serialization solutions that provide forward/backward compatibility. A very useful resource I found was the thrift-protobuf-compare project, which has a very good performance comparison of different serialization libraries. I checked one serialization solution after the other and had to realize that two of my requirements limit the potential solutions severely: the serialization of dynamic structures, and the need to recreate the correct types during deserialization.
Schemaless serialization, serialization of dynamic structures
The serialization of dynamic structures is required because users should not be forced to provide something like a schema for their session objects. Most of the time, users don’t even really know what gets stored in the http session and what doesn’t. Therefore it’s close to impossible to provide a structural definition of the session attributes, and the serialization library has to figure out what needs to be serialized, and also what is deserialized and how. Unfortunately, most of the “cool” and especially fast serialization libraries like protobuf, thrift or avro rely on some kind of schema that defines the serialization/deserialization, and therefore they cannot be used.
JSON cannot be used as correct types need to be deserialized
Because of the second requirement, the need to recreate the correct types during deserialization, fast json libraries like jackson drop out: plain json does not carry the java type of a value.
Serialization with XStream
The serialization solution I chose was xstream. It does everything that’s needed and also has a very simple api: all that needs to be done for serialization is something like new XStream().toXML( Object, OutputStream ).
To get my own numbers, I compared the number of requests/second for java serialization and xstream based serialization. For this I created a simple wicket web application with some pages that differ in how much they store in the session.
Unfortunately my results matched what the thrift-protobuf-compare project had already reported: xstream is considerably slower than java serialization.
Serialization with Javolution
Therefore I looked for other possible solutions and decided to use javolution for xml binding and to write the required reflection stuff to determine what needs to be serialized/deserialized.
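As a rough idea of what such reflection code has to do, here is a minimal sketch that collects the fields a schemaless serializer would look at. It is not the code from memcached-session-manager; the names SerializableFields, DemoBase and DemoUser are made up for illustration, and skipping static and transient fields is an assumption borrowed from how java serialization behaves.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example classes, only here to have something to inspect.
class DemoBase {
    private long id;
}

class DemoUser extends DemoBase {
    static int instances;           // static: belongs to the class, not the object
    private String name;
    private transient Object cache; // transient: explicitly excluded from serialization
}

final class SerializableFields {

    /**
     * Collects the instance fields of a class and all its superclasses,
     * skipping static, transient and compiler-generated fields.
     */
    static List<Field> of(Class<?> type) {
        List<Field> result = new ArrayList<>();
        for (Class<?> c = type; c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                int mod = f.getModifiers();
                if (Modifier.isStatic(mod) || Modifier.isTransient(mod) || f.isSynthetic()) {
                    continue;
                }
                result.add(f);
            }
        }
        return result;
    }
}
```

To actually read or write the values of private fields, each Field would additionally need setAccessible(true) before calling get or set.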
The interesting part of this was that I learned a little bit more about reflection and serialization: I realized that it’s not possible with standard java reflection to deserialize private classes and classes without a default constructor. For this one needs to fall back to vendor specific solutions (e.g. the sun jdk comes with a ReflectionFactory that provides a newConstructorForSerialization method).
Another thing I became aware of was that cyclic dependencies need to be handled correctly during serialization, and that during deserialization different objects that shared a reference to the same object need to end up sharing that object again. The solution for both requirements is the same: during serialization one needs to track which objects are already serialized, and for an object that was already written one writes just a reference to it. During deserialization these references can be resolved accordingly.
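The ReflectionFactory approach can be sketched roughly like this. It is only an illustration, not the code used in memcached-session-manager: NoArgInstantiator and DemoPoint are hypothetical names, the sketch assumes a sun/oracle or openjdk based jvm where sun.reflect.ReflectionFactory is available, and it looks the class up by name so that it compiles without a direct dependency on this internal api.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;

// Hypothetical example class: its only constructor takes an argument.
final class DemoPoint {
    final int x;
    DemoPoint(int x) { this.x = x; }
}

final class NoArgInstantiator {

    /**
     * Creates an instance of the given type without running any constructor
     * of that type. Fields keep their default values (0, false, null).
     */
    static <T> T newInstance(Class<T> type) {
        try {
            // Vendor specific class, resolved by name (assumption: it exists on this jvm).
            Class<?> factoryClass = Class.forName("sun.reflect.ReflectionFactory");
            Object factory = factoryClass.getMethod("getReflectionFactory").invoke(null);
            Method newCtor = factoryClass.getMethod(
                    "newConstructorForSerialization", Class.class, Constructor.class);
            // The constructor that really gets executed is the one of Object.
            Constructor<?> objectCtor = Object.class.getDeclaredConstructor();
            Constructor<?> ctor = (Constructor<?>) newCtor.invoke(factory, type, objectCtor);
            return type.cast(ctor.newInstance());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + type.getName(), e);
        }
    }
}
```

A call like NoArgInstantiator.newInstance(DemoPoint.class) returns a DemoPoint whose field x still has its default value, and the deserializer then has to set the fields via reflection.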
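The reference tracking idea can be illustrated with a small sketch. This is not the javolution based implementation, just a simplified model under stated assumptions: GraphNode and ReferenceTracking are hypothetical names, each node has only a single outgoing reference, and the "serialized form" is a list of strings like "name->idOfNext".

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical object graph: next may point back to an earlier node (a cycle).
final class GraphNode {
    final String name;
    GraphNode next;
    GraphNode(String name) { this.name = name; }
}

final class ReferenceTracking {

    /** Writes every reachable node exactly once as "name->idOfNext" (-1 for null). */
    static List<String> write(GraphNode root) {
        // Identity based map: tracks which object instances were already seen.
        Map<GraphNode, Integer> ids = new IdentityHashMap<>();
        List<GraphNode> order = new ArrayList<>();
        for (GraphNode n = root; n != null && !ids.containsKey(n); n = n.next) {
            ids.put(n, order.size());
            order.add(n);
        }
        List<String> out = new ArrayList<>();
        for (GraphNode n : order) {
            // An already known object is written as a reference (its id) only.
            int ref = (n.next == null) ? -1 : ids.get(n.next);
            out.add(n.name + "->" + ref);
        }
        return out;
    }

    /** Recreates all nodes first, then resolves the references between them. */
    static GraphNode read(List<String> records) {
        List<GraphNode> nodes = new ArrayList<>();
        for (String r : records) {
            nodes.add(new GraphNode(r.substring(0, r.lastIndexOf("->"))));
        }
        for (int i = 0; i < records.size(); i++) {
            String r = records.get(i);
            int ref = Integer.parseInt(r.substring(r.lastIndexOf("->") + 2));
            if (ref >= 0) {
                nodes.get(i).next = nodes.get(ref);
            }
        }
        return nodes.isEmpty() ? null : nodes.get(0);
    }
}
```

With two nodes a and b that point to each other, write terminates instead of looping forever, and the graph returned by read contains the same cycle again. A real implementation has to do the same for arbitrary fields and collections, not just for a single next reference.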
After I solved these things and others with javolution I had another serialization strategy for the memcached-session-manager and could again compare the performance. The result was mixed: on the one hand I was happy because the javolution based solution is faster than xstream, on the other hand it was still slower than java serialization.
Looking for fast serialization solutions
So I’m still looking for serialization solutions that can do what’s needed and are still faster than java serialization.
One thing I’m currently working on is a solution based on aalto, which will be very similar to the javolution based serialization. But hopefully it will be even faster, as according to thrift-protobuf-compare aalto seems to be faster than javolution.
Another promising candidate is kryo. Some things still need to be done before it can be used (e.g. support for forward/backward compatibility and support for cyclic graphs), but I’m confident that Nate is going to implement them.
The third candidate is jackson. The first thing that has to be implemented is including type information in json. After that, cyclic graphs are still a challenge, but I still hope that at some point jackson will provide everything required.
If you know other fast serialization solutions or if you think that one of the libraries compared in thrift-protobuf-compare is still an option please let me know!
Finally, developing version 1.1 of memcached-session-manager was fun: object serialization and deserialization is an interesting field, both because of the problems that need to be solved and because of the available solutions and libraries.
Now I’m looking forward to finding/implementing even faster serialization strategies and to optimizing the performance of the memcached-session-manager in other ways.