Some years ago I worked for a company making an analytical SQL Database system. At some point I was asked to think about possible new storage method that would be optimized for high speed SSD storage. In the end, this project wasn't further pursued as other projects took precedence, but ever since I have had this weird idea of log structured btree storage in my head.
A few years later, when I got more serious about actually implementing this wild distributed file system idea of mine, I needed a suitable key/value store library that could use an append only object store as underlying storage. Unfortunately, none of the existing golang key/value store libraries offered this capability. After some careful thinking I realized that the Log Structured BTree idea, although originally coming from an SSD flash storage context, is in fact perfectly suited for this particular use case. So I started writing a Log Structured BTree key/value store library, LSB.
It took some time, but I got something working. But then, when I started thinking about how exactly I would use it to implement the logical equivalent of the inode table and directories in my file system, I realized I was going to need some kind of transaction support in the library to avoid excessive complexity in the file system code. This meant that I now had to implement some kind of transaction support. That took a bit of time, and when I had it implemented, I found out that I had broken my mutex anti-deadlock policy in a way that wasn't trivial to fix. I set about changing some of the key internal data structures to allow for a more feasible mutex policy. That took quite some more time. Eventually I got things working and I started using it in the file system.
The library uses a pluggable backing store interface with a limited file API that only allows appending data to a file, not modifying previously written data. The library includes a sample backing store module that uses a regular file system. For use with my distributed file system, I implemented a custom backing store module that uses its internal append-only object storage interface.
To achieve decent performance despite the issue of write amplification that is inherent to using a log structure to store a btree, LSB uses a write-ahead log for an in-memory delta store. All changes to the database are first put into the delta store and written to the write-ahead log. When the delta store has grown to a threshold, an asynchronous background flush to the underlying btree is started, which writes new btree nodes to the node log files for every node that is modified. The write amplification of modifying all nodes up to the root is amortized over a relatively large number of database changes, thus limiting the impact. Another asynchronous background task periodically cleans old logs that no longer store active nodes.
It seems to be working ok, and some benchmarks seem to suggest that performance is entirely adequate, as compared to some other golang kv store libraries.
The code is still kinda ugly, and needs cleanup.