Exploring Datrium & Open Convergence
Datrium was the last of the presenting companies at Tech Field Day 14. Although I did some research for my primer post, I still wanted to learn more about their tech. There’s no better way to do that than to toss some of their folks into a room full of IT professionals. Another point that I was hoping we would touch on is what the term “Open Converged” meant to them.
In a nutshell, Datrium took the typical hyperconverged model and split it into two components: compute and storage. So how is this not just a traditional setup? The magic happens in their software. In theory, with Datrium’s DVX solution you still have a chunk of storage and a bunch of compute nodes. The difference comes with that “open converged” philosophy. The two pieces are tied together through software, yet either side can be scaled independently. The solution also isn’t tied to proprietary hardware, which is what makes it “open”.
LOOKING AT THE COMPUTE NODES
The compute nodes are designed to be completely stateless. How is this accomplished? The first step is to install a VIB file on your ESXi server (only VMware is supported at the moment). This VIB will essentially create a software storage controller on the host. At this point, all writes will go into the host’s RAM as well as be synchronously copied to the DVX storage nodes. This approach moves most storage-related bottlenecks from the array to the host. Because writes land in host memory, high levels of performance are maintained and bottlenecks are minimized. (EDIT: For clarification, writes go to the host’s RAM (not NVRAM as originally posted) and synchronously to the data nodes’ mirrored NVRAM. Data is then flushed asynchronously to disk for long-term retention on the storage node side. The compute hosts flush to flash.)
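To make that write path a bit more concrete, here is a toy Python sketch of how I picture it. This is not Datrium's code or API; the class and method names (`DataNode`, `ComputeHost`, `persist`, `flush_to_disk`, `flush_to_flash`) are made up for illustration, and plain dictionaries stand in for RAM, NVRAM, flash, and disk.

```python
class DataNode:
    """Toy stand-in for a storage (data) node. All names are hypothetical."""

    def __init__(self):
        self.nvram_a = {}  # first NVRAM copy
        self.nvram_b = {}  # mirrored NVRAM copy
        self.disk = {}     # long-term retention

    def persist(self, block_id, data):
        # Synchronous step: the write lands in both NVRAM mirrors
        # before it is acknowledged.
        self.nvram_a[block_id] = data
        self.nvram_b[block_id] = data
        return True

    def flush_to_disk(self):
        # Asynchronous step in the real system; called by hand here.
        self.disk.update(self.nvram_a)
        self.nvram_a.clear()
        self.nvram_b.clear()


class ComputeHost:
    """Toy stand-in for a host running the software storage controller."""

    def __init__(self, data_node):
        self.ram = {}
        self.flash = {}
        self.data_node = data_node

    def write(self, block_id, data):
        # The write goes to host RAM and, synchronously, to the data node.
        self.ram[block_id] = data
        return self.data_node.persist(block_id, data)

    def flush_to_flash(self):
        # The host flushes its RAM contents to local flash.
        self.flash.update(self.ram)
        self.ram.clear()


node = DataNode()
host = ComputeHost(node)
acked = host.write("block-1", b"hello")
```

The point of the sketch is the ordering: the acknowledgement only comes back once the data node holds two NVRAM copies, so nothing that exists only on the host is needed to recover a write.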
Each compute node also acts as a read cache, which translates into very fast reads for the host. If the data is not available in the local cache, then other compute nodes will be checked for the data. Only if the requested data is still not found will the read go back to the storage appliance. This offers great flexibility for scaling. If I only have a handful of monster VMs, I can just spec out one or two “big” servers for performance and move the VMs there. The rest of the workloads can run on typical servers without the need to standardize on expensive configurations. Because writes are also synchronously committed to storage, a failed host will only impact performance and not the actual workloads on that host.
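The read lookup order described above (local cache, then peer compute nodes, then the storage appliance) can be sketched in a few lines of Python. Again, this is my own illustration rather than anything from Datrium: `read_block` is a hypothetical function, dictionaries stand in for the caches and the appliance, and populating the local cache after a miss is my assumption, not something confirmed in the presentation.

```python
def read_block(block_id, local_cache, peer_caches, storage):
    """Return (data, source) following a local -> peer -> storage lookup order."""
    # 1. Fastest path: the host's own read cache.
    if block_id in local_cache:
        return local_cache[block_id], "local"

    # 2. Check the caches on the other compute nodes.
    for peer in peer_caches:
        if block_id in peer:
            local_cache[block_id] = peer[block_id]  # assumed: warm the local cache
            return peer[block_id], "peer"

    # 3. Last resort: go back to the storage appliance.
    data = storage[block_id]
    local_cache[block_id] = data  # assumed: warm the local cache
    return data, "storage"


local = {"a": b"1"}
peers = [{"b": b"2"}]
appliance = {"a": b"1", "b": b"2", "c": b"3"}

first = read_block("c", local, peers, appliance)   # served by the appliance
second = read_block("c", local, peers, appliance)  # now served locally
```

In this toy version a lost host just means an empty cache somewhere else: every block is still in `appliance`, so reads get slower until the cache warms up again, which matches the "performance only" impact of a host failure.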
CLOSING THOUGHTS
This solution reminded me a lot of PernixData’s FVP software. One difference here is that PernixData only lived on the host. Although it was a great way to gain performance, it did not address capacity issues. Datrium, on the other hand, is aiming to become a one-stop solution. By decoupling the storage from the compute, you can scale in whatever increments an organization may need. The architecture also has little to no management overhead. This translates into very linear and predictable scaling.
In an effort to minimize any sort of lock-in, the Datrium software can be installed on any server that meets the minimum requirements. This can be a big win for organizations that may have ample compute, but need more performance and/or capacity. In a true HCI solution, such an organization might end up discarding perfectly good (for their needs) compute resources. By going with the Datrium approach, the same company would be able to use existing hardware and likely avoid large capital costs.
Disclaimer: I was invited to participate in Tech Field Day as a delegate. All of my expenses, including food, transportation, and lodging, were covered by Gestalt IT. I did not receive any compensation to write this post, nor was I requested to write this post. Anything written above was of my own accord.