demonstore

DemonStore: Lessons Learned from Transaction-based Representations of State

Incremental Complexity : How Dreamler’s Transaction Model fell apart

When we first implemented the Level 0 protocol and started building the specifics of Dreamler as a tool we approached the concept of “client commands” in a very application specific way. We simply built the commands / transactions that we needed to represent the actions that a client could perform on a gameboard.

All of this was represented by binary data, effectively C language structs, which was passed directly to the server and stored verbatim. The client knew how to generate the commands and also knew how to interpret them (as they would arrive from as historical transactions back from the server). Things like “Create Activity”, “Set Activity Position”, “Link Activities” quickly materialized, and we began to call these commands “Level 1”; conceptually built on top of “Level 0”. Again, since the server did not understand nor care to understand the actual content or meaning of these commands, the development of the client could proceed quickly without disrupting or affecting anything on the server side of things.

The actual interpretation of the various commands that we created all resulted in the creation and evolution of an in-memory (RAM) data structure that we simply called the Model. This included things like the actual Activity and Link objects that we visualized on the gameboard. Initially this was all very simple, and we could proceed seamlessly from interaction to visualization to the creation of further commands to manipulate our emerging project planning tool.

From time to time we would however make mistakes in the creation of certain commands; we might forget to include some data that was needed to fully describe the command / transaction that we wanted to apply to the Model. What we found was that mistakes in Dreamler were permanent; since we were storing all transactions as history, any “bad” transactions were still there even when we fixed the problem in the code. This quickly resulted in the practice of deprecating outdated or bad commands and never allowing the re-use of their identifying numbers. After all, as soon as some transcation “happened” it was permanently recorded in the history for that gameboard, and when a client reconnected in a new session to that gameboard all the history came back again and needed to be interpreted and applied to the Model.

This gradually became a larger and larger issue. As the number of commands increased and their implementations changed, the number of deprecated commands that we no longer WROTE but still needed to READ and INTERPRET continued to grow. What we also found, when refactoring the codebase to better manage the large number of commands, was that the actual interpretation of any command was as important as the command itself. Eventually, as the needs and structure of the Model changed, it became impossible to apply the legacy interpretation of a deprecated command. This proved catastrophic when it came to determinism, because when we changed the structure of the Model suddenly the history could no longer be interpreted to the same actual in-memory state as it could previously. Even worse, all transactions that came later now built upon this now DIVERGENT interpretation of history! We started to experience all-around strangeness and data loss, and it became obvious that this whole approach wasn’t going to scale.

We were at this point pretty much painted into a corner, and were very leery of making ANY kind of change to the commands / transactions and their interpretation for fear of losing data. We discussed just cutting our losses by destroying all the history of the existing gameboards, effectively allowing us to remove the deprecated code, but we realized that this kind of situation would invariably happen again down the line, and having it happen when we had real customers with real data invested in the system was just too risky.

After much thought I finally came to the realization that we were simply ignoring the fact that there was no logical solution to the problem. Trying to support historical “events” that had a fixed interpretation which resulted in the mutation of a data model which future events would further mutate was all fine, as long as you never changed the data model beyond the point where the original events could still be applied. But we had already changed our Model several times, causing the illogical situation where we sometimes only partially applied the interpretation of legacy events, causing a changed state in the Model which we then mutated further. It was really already beyond the point where I could even visualize the various threads of history in my head.

All of this caused me a lot of sleepless nights, as we were at this point invested and needed a solution. Early one morning I sat at my kitchen table and sketched out a generalization of the problem domain and constructed a solution that was guaranteed not to break. I didn’t have a clear idea of how this would impact our codebase, but I knew that it would be a huge change. This was at a point where we felt that we couldn’t really afford the downtime, so we deferred. After a while the requirements of new functionality in the client eventually forced our hand, and I took it upon myself to implement the new system, dubbed DemonStore.

DemonStore

DemonStore is essentially an implementation of a generalized transaction system. The types of transactions that are allowed are both small in number and fixed, as are their subsequent interpretations and transformations of the resulting data model. The data model similarly consists of a small number of fixed primitive types.

DemonStore Level 1

A single DemonStore Level 1 Transaction is what is stored as part of the “history” of a gameboard, a singular event. A Transaction is stamped with a time (when it happened) as well as a user key (who created it). It also includes 1-n Events.

An Event implies ONE of the following:
* a WRITE / UPDATE of a “property” of a “bucket”
* a REMOVAL of a “property” of a “bucket”
* a LABEL (a string tag) of a “bucket”
* the REMOVAL of a LABEL (a string tag) of a “bucket”

DemonStore Level 1 Transactions are what the client generates by manipulating the gameboard, are sent over the wire (currently via Level 0 over UDP), and are the only thing that is stored persistently by the server. In the client we iterate incoming transactions and interpret them as change to an in-RAM data model, which we call DemonStore Level 2.

DemonStore Level 2

DemonStore Level 2 is a generic datastore, consisting of a set of conceptual “property buckets” (Buckets). A Bucket has a 64 bit UUID (Universally Unique Identifier) and 1-n Properties of varying types, as well as string Labels. The Bucket UUID is implied by the UUID of a Level 1 Event; the Level 1 Event implies change to the corresponding Level 2 Property, which belongs to the Level 2 Bucket with the same UUID as the Level 1 Event.

In general the client uses DemonStore Level 2 to READ information about the current state of the gameboard (represented by Properties), and DemonStore Level 1 to instigate change to this implicit / derived (from Level 1 Transactions) state. All change takes the form of a DemonStore Level 1 Transaction which is sent to the server / backend. Change to the local RAM model (Level 2 Buckets / Properties) only happens as a result of INCOMING Level 1 Transactions from the backend (i.e. everything bounces on the server to be “official”). Level 2 is conceptually READ ONLY.

Reasoning

As mentioned earlier, we initially built our app-specific transaction protocols directly upon Level 0, and had them manipulate a very visualization-specific in-RAM data model. We eventually ran into the problem of storing transactions that had implicit semantic meaning and corresponding incremental changes to the in-RAM data model. When we changed the meaning of legacy transactions over time and / or the requirements of the data model, things started to get more and more fragile. It seemed impossible to separate the content of the transactions from the code that described the transformations to the in-RAM model that these transactions implied.

DemonStore solves these issues by reducing all transactions to a WRITE/UPDATE or a DELETE, with all possible datatypes fixed. The corresponding transformation of these transactions to an in-RAM model is also fixed (Buckets of Properties), also with fixed datatypes. In Level 0 terms, all DemonStore “client commands” are prefixed by the 16-bit identifier 0x666. The actual contents of the command is a serialization of the Level 1 Transaction with included Events.

The fact that DemonStore DOES NOT have any concept of “structure” to any given Bucket is an important feature. This is indeed what allows for us to be robust in the face of changing Transactions, as the CONTENTS of Transactions may change while the TYPES of Transactions never do. Any changes to Transactions and domain requirements the Buckets will over time / via change accrete lots of redundant / legacy Properties that we no longer read or update, but as long as we don’t run out of RAM this is not practically a problem.

Optimizations

DemonStore Level 2 is sadly rather expensive to read from, as all Property access goes via a string look-up. This is several orders of magnitude slower than the previous C-language struct Model which had fixed native structures. We found during the port to DemonStore that we needed visualization-specific datastructures (essentially read-optimized caches) as another layer on top if DemonStore Level 2. Currently there are many such caches in place in order to support our rendering of the gameboard as well as other views. Our main cache is dubbed DemonAdapter, and contains the Dreamler domain concepts like Activities, Groups, and Links.

In our current implementation the DemonAdapter needs to be rebuilt FROM SCRATCH each time a Transaction prompts a change in the Buckets / Properties (there is no incremental update mechanism in place for anything but the Level 2 Buckets themselves). While it may be tempting to attempt to implement this kind of incrementalism in domain specific caches, this “limitation” is really what allows us to survive changes to the logical client commands over time. This is because the rebuilding of the DemonAdapter is done by explicitly looking for Buckets with certain Labels and Properties and building C struct versions of these “objects” directly. This kind of “pull” model allows us to automatically pass over any Buckets that may have incorrect or missing Properties; the DemonAdapter only includes “correct” domain concepts.

One downside of this constant re-build of the DemonAdapter is that it is illegal for client code to retain references to the various domain objects as these can be swapped out under the hood at any time due to Transactions arriving from over the network. As such manipulation of the DemonAdapter requires som diligence on the part of the programmers, such as for example only ever storing 64 bit UUIDs for Buckets instead of DemonAdapter caches of same. It is also difficult to do certain animations like interpolations of positions and similar. These limitations could potentially be overcome through some kind of double-buffered DemonAdapter; a similar construct is currently in place in order to support interpolation of client avatars.

Depending on the application it may or may not be necessary to build caches of this kind. There currently exist several experimental visualizations that operate directly on DemonStore Level 2 (directly reading the Buckets of Properties).

Aftermath

The move to DemonStore was quite involved, but resulted in a very deep understanding of the problem domain as well as a plethora of tools that evolved as a by-product of the conversion process; yes, we actually managed to convert all our legacy transactions to DemonStore transactions, effectively losing little or no data from our earlier gameboards.

Since the port these kinds of history related problems are simply gone. We don’t think very much about the DemonStore at all, and similarly to the Level 0 UDP protocol it “just works”. There is indeed some complexity and pain involved in dealing with the DemonAdapter cache, but really we are getting the best of both worlds; generality of transactions and data storage, as well as a custom read-optimized version of our domain model required by our real-time visualizations.

0 5071