Saturday, August 11, 2012

RavenDB Changes API on the wire

I promised that I’ll talk about the actual implementation details of how RavenDB deals with changes, after moving from SignalR to our own implementation.
First, let us examine the problem space. We need to be able to get notified by the server whenever something interesting happens. We don’t want to do active polling.
That leaves the following options:

  • TCP Connection

  • WebSockets

  • Long Polling

  • Streamed download


TCP Connections won’t work here. We are relying on HTTP for all things, and I like HTTP. It is easy to work with, there are great tools (thanks, Fiddler!) for it, and you can debug/test/scale it without major hurdles. Writing your own TCP socket server is a lot of fun, but debugging why something went wrong is not.
WebSockets would have been a great option, but they aren’t widely available yet, and won’t work well without special servers, which I currently don’t have.
Long Polling is an option, but I don’t like it. It seems like a waste and I think we can do better.
Finally, we have the notion of a streamed download. This is basically the client downloading from the server, but instead of having the entire response downloaded in one go, the server will send events whenever it has something.
Given our needs, this is the solution that we chose in the end.
How it works is a tiny bit complex, so let us see if I can explain with a picture. This is the Fiddler trace that you see when running a simple subscription test:

image

The very first thing that happens is that we make a request to /changes/events?id=CONNECTION_ID. The server is going to keep this connection open, and whenever it has something new to send to the client, it will use this connection. In order to get this to work, you have to make sure to turn off buffering in IIS (HttpListener doesn’t do buffering), and when running in Silverlight, you have to disable read buffering. Once that is done, on the client side you need to read from the server in an async manner and raise an event whenever you get a full response back.
For our purposes, we used new lines as response marker, so we would read from the stream until we got a new line, raise that event, and move on.
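To make the newline framing concrete, here is a minimal sketch in Python (not the actual RavenDB client code). The class name `LineEventReader` and its `feed` method are made up for illustration. It assumes the transport hands us raw byte chunks of arbitrary size, so a single event may arrive split across several chunks.

```python
from typing import Callable


class LineEventReader:
    """Hypothetical helper: buffers raw chunks and raises one event per complete line."""

    def __init__(self, on_event: Callable[[str], None]) -> None:
        self._buffer = b""
        self._on_event = on_event

    def feed(self, chunk: bytes) -> None:
        # A chunk may hold a partial line, exactly one line, or several lines.
        self._buffer += chunk
        while b"\n" in self._buffer:
            line, self._buffer = self._buffer.split(b"\n", 1)
            line = line.rstrip(b"\r")
            if line:  # skip empty lines
                self._on_event(line.decode("utf-8"))
```

Anything after the last new line stays in the buffer until the rest of that event arrives.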
Now, HTTP connections are only good for one request/response. So we actually have a problem here: how do we configure this connection?
We use a separate request for that. Did you notice the “1/J0JP5” connection id? It is generated on the client for each connection, and it is part always-incrementing number, part random. The sequential first part is there strictly to help us debug things: “1st request, 2nd request” are a lot easier to follow than J0JP5 or some guid.
We can then issue commands for this connection. In the sample above you can see the commands for watching a particular document and, finally, for stopping altogether.
This is what the events connection looks like:

image

Each change will be a separate line.
Now, this isn’t everything, of course. We still have to deal with errors and network hiccups. We do that by aborting the events connection and retrying. On the server, we keep track of connections and the pending messages for each connection, and if you reconnect within the timeout limit (a minute or so), you won’t miss any changes.
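The server-side bookkeeping might look roughly like the following Python sketch. It is an illustration of the idea, not RavenDB’s implementation: the class `PendingChanges`, its method names, and the 60 second default are all assumptions. The clock can be swapped out so the timeout is easy to exercise.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict, List


class PendingChanges:
    """Hypothetical tracker: queues messages for connections that dropped recently."""

    def __init__(self, timeout_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout = timeout_seconds
        self._clock = clock
        self._queues: Dict[str, Deque[str]] = {}
        self._disconnected_at: Dict[str, float] = {}

    def disconnect(self, connection_id: str) -> None:
        # Start the grace period for this connection.
        self._queues.setdefault(connection_id, deque())
        self._disconnected_at[connection_id] = self._clock()

    def publish(self, connection_id: str, message: str) -> bool:
        """Queue the message if the connection is in its grace period; True if queued."""
        self._expire()
        if connection_id in self._disconnected_at:
            self._queues[connection_id].append(message)
            return True
        return False  # connection is live (caller writes directly) or long gone

    def reconnect(self, connection_id: str) -> List[str]:
        """Return everything queued while the client was away, oldest first."""
        self._expire()
        self._disconnected_at.pop(connection_id, None)
        return list(self._queues.pop(connection_id, ()))

    def _expire(self) -> None:
        # Forget connections that stayed away longer than the timeout.
        now = self._clock()
        for cid, since in list(self._disconnected_at.items()):
            if now - since > self._timeout:
                del self._disconnected_at[cid]
                self._queues.pop(cid, None)
```

A client that comes back inside the window gets its backlog from `reconnect`; one that comes back too late gets an empty list and has to start over.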
If this sounds like the way SignalR works, that is no accident. I think that SignalR is awesome software, and I copied much of the design ideas off of it.

