A couple of quick trite observations, and then on to business :-)
There’s a lot of marketing mumbo-jumbo that conflates being “software-defined” (using open or closed source software stacks) on industry-standard server hardware with being “web scale”. Yes, that’s how web-scale architectures are constructed – but that’s analogous to saying “trees are green, and have leaves – but not all things that are green and have leaves are trees.” Enough preamble, let’s talk about the awesomeness that is Virtual SAN 6.2!
So what did VMware launch today? Virtual SAN 6.2 – and here’s what’s in it:
I’ll say it again – Virtual SAN 6.2 is a HUGE release.

Support for All-Flash Configurations

VSAN is a cached architecture, which is why there is always an SSD requirement – that SSD is in effect acting as the write cache. But in VSAN 6.2, you can have very powerful all-flash configurations, where you configure the bulk of the SSDs/flash for capacity. This means the cache tier doesn’t do any read caching (the NAND in the capacity tier is fast, so read cache has reduced benefit), which in turn means more effective write cache capacity. While the performance of a VSAN node is of course configuration dependent, VMware notes that ~100,000+ write IOPS per node are possible. That’s monstrous. (BTW – a post for another day: this is a notable ScaleIO/VSAN difference.)

Data Deduplication & Compression

Data dedupe and compression are built around the architectural capabilities of all-flash systems – and both are inline. You need to enable it for the whole cluster, and for customers already using an earlier version of VSAN, it will go through and update all your VMs. This requires a non-disruptive low-level format change – and you can speed it up if you allow a period of reduced redundancy, VM by VM, as they transition.

Dedupe is performed while the data is in the cache tier, and uses a fixed 4K block size. It is done across a disk group – so if duplicate data sits in another disk group, it will not be deduped (a toy sketch below illustrates the effect of splitting a dedupe domain). I suspect this will trigger the debate of “one vs. multiple disk groups per node” (read Duncan’s post on the topic here). I personally believe that all-flash configs will rapidly become the dominant deployment model – and will bias to smaller rather than larger disk groups – but I will defer to VMware for the official position.

Compression is done right before the data is committed to the capacity flash tier. Kind of neat: they will only compress the data if there is a 2x or better effect – otherwise they just write the 4K block as-is (to minimize computational load).

Together with Erasure Coding (more on that next), these data reduction and efficiency approaches are important because they bring all-flash configurations into “no-brainer” territory. VMware claims up to 7x data reduction, and configurations that can have an effective $1/GB cost – though of course the data reduction rates will vary based on the data. This is critical for all workloads – but obviously virtualization (general IaaS and VMs) and VDI/EuC are workloads that are materially affected by this economic effect.

I want to strongly reiterate something: a modern datacenter architecture in 2016 uses all-flash for all transactional workloads. PERIOD. XtremIO, and the new all-flash VMAX that uses the densest and lowest-cost 3D NAND, also bring the $/GB into the same price bands – and at that point, why do hybrid?

Now, this isn’t a critique of VSAN – but I want to point out some things that highlight why, at large IaaS/EuC scales, things like Vblock 540 with XtremIO will be more efficient. Longtime readers often go back to this “Understanding Storage Architectures” post. XtremIO is a tightly coupled cluster (a type II), and VSAN is a loosely coupled cluster (a type III). Since XtremIO has a very tightly coupled design with extremely low-latency interconnects and a shared memory space, its dedupe domain is all the data on the cluster – so in practice, at scale, it will have higher data reduction rates, as well as very, very consistently low latency.
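To make the dedupe-domain point concrete, here’s a minimal toy sketch – emphatically not VSAN code, just an illustration of fixed 4K-block dedupe, the “only compress on a 2x-or-better win” decision, and why the scope of the dedupe domain matters. All names here are hypothetical:

```python
import hashlib
import zlib

BLOCK = 4096  # VSAN 6.2 dedupes at a fixed 4K block size

def maybe_compress(block: bytes) -> bytes:
    """Compress only on a 2x-or-better win; otherwise store the 4K block as-is."""
    c = zlib.compress(block)
    return c if len(c) * 2 <= len(block) else block

def stored_bytes(data: bytes, num_domains: int) -> int:
    """Bytes landing on 'capacity flash' after dedupe + compression, with the
    block stream spread across num_domains independent dedupe domains
    (a stand-in for VSAN disk groups -- dedupe never crosses a domain)."""
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    seen = [set() for _ in range(num_domains)]    # per-domain fingerprint tables
    total = 0
    for n, b in enumerate(blocks):
        domain = n * num_domains // len(blocks)   # first chunk of the stream
        fp = hashlib.sha1(b).digest()             # -> domain 0, next -> 1, ...
        if fp not in seen[domain]:                # duplicate within THIS domain?
            seen[domain].add(fp)                  # no: store it (compressed if
            total += len(maybe_compress(b))       # worthwhile); yes: skip it
    return total

data = (b"A" * BLOCK + b"B" * BLOCK) * 1000       # highly duplicate write stream
print("one global dedupe domain:", stored_bytes(data, 1))  # ~2 unique blocks
print("two separate domains    :", stored_bytes(data, 2))  # ~4 -- each domain
                                                            # keeps its own copies
```

Same data, same algorithm – but splitting the dedupe domain stores roughly twice as much. That’s the intuition behind both the disk-group debate and the XtremIO cluster-wide-domain point above.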
This isn’t about “VSAN versus XtremIO” – it’s that practitioners want to understand the tools in the toolbox, and figure out where to use the best tool. VSAN scales like crazy, so you can start small and then keep growing – but if you KNOW you’re going to be north of around 3,000 or so VMs/EuC instances, you’ll tend to find that Vblock 540s (if you want converged solutions) or XtremIO X-Bricks are more cost-effective from a capex point of view. Now, in favor of SDS models, there’s another interesting observation that has nothing to do with capex: SDS models have a certain operational simplicity and flexibility. You don’t do “frame upgrades” with VSAN or ScaleIO, as an example. Migrations are non-disruptive. Scaling is non-disruptive. Updates are non-disruptive.

Erasure Coding

This is huge. In general, most SDS models use a form of data/object/chunk mirroring to protect against node failure. Prior to VSAN 6.2, the same was true of VSAN. The obvious downside is that, right out of the gate, mirroring cuts effective capacity to 50% of raw (FTT=1, two copies) or 33% (FTT=2, three copies). Now, however, you can use a new Storage Policy-Based Management (SPBM) policy which uses erasure coding rather than mirroring objects. Since it’s an SPBM policy, this is a “per VM object” thing (in fact, you could have a policy for each VM disk) – which is cool. While the analogy to RAID 5/6 makes sense, it’s important to realize that what we’re talking about here is a distributed parity value across hosts – there’s no RAID controller :-)

Setting Failures to Tolerate (FTT) to 1 or 2 determines the effective required capacity (a back-of-envelope sketch follows below). Again – note that the effective utilization rates are lower than in the “external storage array” category, but this is really leading in the SDS domain, and ultimately this factors into the “total solution cost” equation – which is also very important for all-flash systems and their economics. Also note that, unlike other approaches that use erasure coding, VSAN 6.2 doesn’t implement this as a “post-process” or “cold” task.

It’s fascinating to hear people who have professed (the nice ones have been “professing” – others, less polite, have been screaming the argument at anyone who dared disagree) for a long time that “data locality” is absolutely critical and paramount now pivot to “it’s OK for data to not be local” (erasure coding demands that you NOT have data locality). With VSAN, you don’t have to limit your use of erasure coding to “some” workloads (those that are colder by nature). Go to town.
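Here’s that back-of-envelope capacity math – my numbers, not an official VMware table, assuming the standard RAID-5 (3 data + 1 parity) and RAID-6 (4 data + 2 parity) layouts:

```python
# Raw capacity consumed per unit of usable data, under each protection policy.
# Mirroring writes full copies; erasure coding writes parity stripes instead.
POLICIES = {
    "FTT=1, mirroring (RAID-1)":   2.0,        # 2 full copies
    "FTT=2, mirroring (RAID-1)":   3.0,        # 3 full copies
    "FTT=1, erasure coding (3+1)": 4.0 / 3.0,  # 3 data + 1 parity
    "FTT=2, erasure coding (4+2)": 6.0 / 4.0,  # 4 data + 2 parity
}

usable_tb = 100  # hypothetical: 100 TB of VM data to protect
for policy, overhead in POLICIES.items():
    raw = usable_tb * overhead
    print(f"{policy:30s} -> {raw:6.1f} TB raw  ({1 / overhead:.0%} effective)")
```

Same FTT, but roughly 33–50% less raw flash consumed – which is exactly why erasure coding plus dedupe/compression is what pushes all-flash VSAN economics into “no-brainer” territory.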
VM-level QoS

This is the beginning of a rich set of QoS policies controlled by SPBM – so, like the protection policy, QoS is a VM-level object policy. You can set IOPS limits in VSAN 6.2, which can quell the “noisy neighbor” challenge – and expect the sophistication of the QoS engine to expand over time to become even more flexible.
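For intuition only – this is not VSAN’s implementation, just a toy token-bucket limiter of the kind commonly used to enforce a per-object IOPS cap:

```python
import time

class IopsLimiter:
    """Toy token bucket: each I/O consumes one token; tokens refill at the
    configured IOPS limit. Not VSAN code -- just the concept behind capping
    a noisy neighbor at the VM-object level."""
    def __init__(self, iops_limit: int):
        self.rate = iops_limit
        self.tokens = float(iops_limit)   # allow up to a 1-second burst
        self.last = time.monotonic()

    def admit(self) -> bool:
        now = time.monotonic()
        # refill tokens for the time elapsed, capped at one second's worth
        self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:              # under the cap: charge a token, admit
            self.tokens -= 1
            return True
        return False                      # over the cap: delay/queue the I/O

# hypothetical: cap a test/dev disk at 500 IOPS so it can't starve production
limiter = IopsLimiter(iops_limit=500)
admitted = sum(limiter.admit() for _ in range(2000))
print(f"{admitted} of 2000 back-to-back I/Os admitted")  # ~500 (the burst)
```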
There’s a lot more in Virtual SAN 6.2, including end-to-end CRC checks and disk scrubbing for silent latent errors, client cache code changes that make all workloads perform better (and make EuC workloads rock even more), and much-improved embedded health and performance monitoring – not as vCenter plugins, but embedded directly. As you can see – a HUGE release. Congrats to the VMware team!!!

So… what are we as EMC doing about this cool new release? EMC is embracing VSAN in two important ways.

First way EMC is leveraging VSAN: as pure software.

VSAN is an incredible SDS for customers uniquely focused on vSphere – and can be acquired from VMware, EMC, or our mutual partners. Now, that said, people ask which use cases favor VSAN and which favor ScaleIO. To help understand “where VSAN, and where ScaleIO”, here’s a simple way – one that both VMware and EMC have collaborated on – to understand their primary focus. This then maps to a simple way to think about the best way forward, and guides (no hard and fast rule) toward which technology to use when.

I want to be clear: VSAN scales awesomely. VSAN can support mission-critical apps. The decision path for VSAN/ScaleIO is NOT about scale or performance. And while VSAN can absolutely be (and will – stay tuned!) used in hyper-converged rack-scale systems (which, if you like my distinction above, require full integration of the networking domain, and less modular, more disaggregated approaches), the customer observation is that at large scale, heterogeneity starts to become more prevalent. You bet there will be customers (I know them right now) who are uniquely focused on vSphere at enterprise datacenter scale, and who will want a hyper-converged rack-scale system using VSAN/vSphere/NSX tightly coupled in the core, and a hyper-converged appliance using VSAN/vSphere at the enterprise edge.

The primary decision factors that steer one way or the other are not really scale or performance per se, but rather three things: customer complexity (for example, some customers need configurations that are NOT hyper-converged, but are rather blends of two-tier compute-only and storage-only nodes – for operational, density, or political reasons), workload variation, and the tendency toward vSphere homogeneity or heterogeneity. Also, there are areas in enterprise datacenters where the answer is both – so we’ve made this simple: both VMware and EMC offer a simple SDS bundle that entitles the customer to BOTH VSAN and ScaleIO.

Now, haters are gonna hate :-) Some will claim that one SDS is the cure to world hunger and peace in the Middle East. It’s become so laughable because “one thing is best” is a “zombie lie” (thanks, Bill Maher, whom I’m paraphrasing: a “zombie lie” is a lie that doesn’t die in spite of being clearly, demonstrably, and evidently wrong). Note a pattern – people with one storage stack think that it’s the best for all workloads, all the time. Coincidence? :-) VMware and EMC are blessed with the industry’s best SDS portfolio – and Virtual SAN 6.2 makes it even stronger.

Second way EMC is leveraging VSAN: to build the industry’s best hyper-converged infrastructure appliance.

Hyper-converged infrastructure appliances depend on their SDS stack in the sense that it defines many of their attributes. But they go above and beyond in a critical dimension: a fully integrated stack, inclusive of the management and orchestration layer, that makes deployment, node add/remove, and system updates a single-click affair.

Something awesome this way comes – click on the image below, and save the date… Tune in a week from now, on Feb 16th, to see what VMware and EMC have been exclusively working on for a while. I’m pretty excited to share it with you :-)