One of the more painful topics in the storage business that never gets much of a public airing is data migrations: the inevitable necessity of moving all that information from one piece of hardware that's no longer useful to a new storage array, and doing so in a way that minimizes effort and disruption.
Storage hardware ages like any other part of the IT infrastructure. New gear is faster / cheaper / better; old gear gets progressively more expensive and difficult to manage over time.
Occasionally the motivations to move are around specific features, or perhaps improved performance and capacity. More often, the rationale is straight economics: it's cheaper to use the new gear than keep the old stuff around.
If you're a smaller IT shop, data migrations are periodic annoyances that only manifest themselves when a new array is coming in, and an old one is either to be decommissioned or repurposed. But if you're a larger shop, you've likely got an extended fleet of storage devices, and there are always new arrays coming in and old ones going out.
Simple numbers tell the story.
Imagine a typical larger enterprise with 3-5 petabytes of data under management, often much more. Storage arrays, generally speaking, are kept around for three to five years.
That means that, on average, you'll be moving a petabyte of data every year. There are roughly 200 working days per year, so that's 5 terabytes of data movement per day, every day -- just to keep the inventory current!
The future isn't going to help us here: data volume under management increases something in the range of 30%-60% per year for most larger IT shops. The arrays are getting ever-more capacious. Tolerance for downtime isn't growing. And that's before we start talking about "big data" or anything like that.
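To make that concrete, here's a back-of-the-envelope sketch in Python. The numbers are purely illustrative assumptions -- 4 petabytes under management, a four-year refresh cycle, 40% annual growth -- but the shape of the curve is the point:

    # Back-of-the-envelope: steady-state migration load for a storage fleet.
    # All figures below are illustrative assumptions, not measurements.

    data_under_mgmt_tb = 4000.0   # assume ~4 PB under management today
    refresh_years      = 4        # assume arrays are retired after ~4 years
    annual_growth      = 0.40     # assume 40% capacity growth per year
    working_days       = 200      # rough working days per year

    for year in range(1, 6):
        # Roughly 1/refresh_years of the fleet turns over each year.
        migrated_tb_per_year = data_under_mgmt_tb / refresh_years
        per_day = migrated_tb_per_year / working_days
        print(f"Year {year}: ~{migrated_tb_per_year:,.0f} TB to migrate "
              f"(~{per_day:.1f} TB per working day)")
        data_under_mgmt_tb *= (1 + annual_growth)

Year one works out to the 5 TB per working day mentioned above; by year five, with growth compounding, it's closer to 20.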
If you're immersed in the world of running enterprise storage gear, data migrations are something you'd probably like the industry to get better at.
It's Moving Day!
The case has been made for new storage hardware. The debates are over, a vendor beauty pageant winner has been declared, and the new storage gear is now on its way.
Most likely, there is a pile of information living on one or more older storage arrays that has to now be brought over.
Your first thought is -- why can't we just copy all the data from the old to the new?
Yes, that's the idea, but there's more to it than that. Much more.
What Makes Data Migrations Hard -- Part I
For starters, does everything need to be brought over?
That question would be easy to answer if the IT team had a nice record, listing all the various data stores, what they were used for, who owned them, etc. But those sorts of repositories are as rare as hen's teeth in the real world. Maybe you had an accurate picture a few years ago, but a lot has changed since then.
So there's likely a project in front of the project, simply to do an inventory of everything that's being stored on the older arrays, and assess whether it's still being used, if it can go somewhere else, etc.
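There's no magic tool for this part, but even a crude last-modified scan over the old shares can help separate live data from dead weight. A minimal sketch, assuming the old shares are mounted read-only under a hypothetical /mnt/old_array:

    # Crude staleness inventory: summarize how recently each top-level
    # directory on the old array was touched. Assumes the old shares are
    # mounted (read-only) under /mnt/old_array -- adjust to taste.
    import os, time

    ROOT = "/mnt/old_array"        # assumed mount point
    now = time.time()

    for entry in sorted(os.scandir(ROOT), key=lambda e: e.name):
        if not entry.is_dir(follow_symlinks=False):
            continue
        newest = 0.0
        total_bytes = 0
        for dirpath, _, files in os.walk(entry.path):
            for name in files:
                try:
                    st = os.stat(os.path.join(dirpath, name))
                except OSError:
                    continue                      # skip unreadable files
                newest = max(newest, st.st_mtime)
                total_bytes += st.st_size
        idle_days = (now - newest) / 86400 if newest else float("inf")
        print(f"{entry.name}: {total_bytes/1e12:.2f} TB, "
              f"idle ~{idle_days:.0f} days")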
If that's too daunting (and it frequently is), the simplifying assumption is that everything is coming over from old to new.
What Makes Data Migrations Hard -- Part II
Given that newer arrays are bigger and more powerful than older ones, there's usually some consolidation involved. That means getting a handle on everything that's coming over (data sets, applications, clients/hosts, etc.) and making sure the new environment is configured properly: capacity, performance, availability, connectivity, etc.
If the new array is targeted to support one or two new applications (e.g. a greenfield deployment), the design exercise is somewhat simplified; doing a 4:1 or 20:1 consolidation can take some healthy design skills.
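A first-pass sanity check on capacity and performance headroom doesn't require anything fancy. The sketch below uses entirely made-up numbers for three source arrays and one target, but it's the kind of arithmetic the consolidation design has to survive:

    # Sanity-check a consolidation: do the source arrays fit on the target
    # with some growth headroom? All numbers are made up for illustration.

    source_arrays = {                # used TB, rough peak IOPS per array
        "old-array-1": {"used_tb": 180, "peak_iops": 25_000},
        "old-array-2": {"used_tb": 240, "peak_iops": 40_000},
        "old-array-3": {"used_tb":  90, "peak_iops": 10_000},
    }
    target_usable_tb  = 750          # assumed usable capacity of new array
    target_rated_iops = 120_000      # assumed performance ceiling
    growth_headroom   = 0.30         # leave 30% room for growth

    need_tb   = sum(a["used_tb"] for a in source_arrays.values()) * (1 + growth_headroom)
    need_iops = sum(a["peak_iops"] for a in source_arrays.values())

    print(f"Capacity:    need ~{need_tb:.0f} TB of {target_usable_tb} TB "
          f"-> {'OK' if need_tb <= target_usable_tb else 'TOO TIGHT'}")
    print(f"Performance: need ~{need_iops:,} IOPS of {target_rated_iops:,} "
          f"-> {'OK' if need_iops <= target_rated_iops else 'TOO TIGHT'}")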
What Makes Data Migrations Hard -- Part III
Storage arrays are used by clients and hosts, and they're all going to have to connect to the new device. For block-oriented SANs, that means a thorough inventory of servers, operating systems, host bus adaptors, SAN fabric, etc. to make sure they're all up-to-date and supported on the new array.
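On Linux hosts, for example, a fair chunk of the Fibre Channel picture is already sitting in sysfs. Something like the sketch below, run on each SAN-attached server and collected centrally, can feed that interoperability check:

    # Gather basic FC HBA facts from a Linux host via sysfs
    # (/sys/class/fc_host). Run on each SAN-attached server and collect
    # the output centrally for the interoperability check.
    import glob, os, platform

    def read(path):
        try:
            with open(path) as f:
                return f.read().strip()
        except OSError:
            return "unknown"

    print(f"host: {platform.node()}  kernel: {platform.release()}")
    for hba in sorted(glob.glob("/sys/class/fc_host/host*")):
        name = os.path.basename(hba)
        print(f"  {name}: wwpn={read(os.path.join(hba, 'port_name'))} "
              f"speed={read(os.path.join(hba, 'speed'))} "
              f"state={read(os.path.join(hba, 'port_state'))}")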
Inevitably, a portion of the server farm is out-of-spec in some regard or another, driving a separate workflow to remediate the various firmware, driver and OS levels, or perhaps replace that ten-year-old host bus adaptor.
Many of the required host updates are themselves disruptive and require a reboot, which means scheduling and user notification are involved, etc. Not pleasant. More planning.
In the file world, there's not a lot of dependency on the client OS, etc. -- so this is far less of an issue. Instead, there's a different challenge of preserving user access rights, quotas, etc. between the old NFS/CIFS domains and the new ones. And failing to get this right can be awfully disruptive to your users.
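ACL translation between domains usually needs purpose-built tooling, but even a simple ownership-and-permissions diff after a trial copy catches a lot of the basics. A rough sketch, with assumed mount points, that won't see richer CIFS/NFSv4 ACLs but will flag basic drift:

    # Compare ownership and permission bits between the old and new file
    # trees after a trial copy. Mount points are assumptions -- adjust.
    # Note: this only checks uid/gid/mode, not extended CIFS/NFSv4 ACLs.
    import os, stat

    OLD, NEW = "/mnt/old_share", "/mnt/new_share"   # assumed mounts

    mismatches = 0
    for dirpath, dirnames, filenames in os.walk(OLD):
        rel = os.path.relpath(dirpath, OLD)
        for name in dirnames + filenames:
            old_path = os.path.join(dirpath, name)
            new_path = os.path.join(NEW, rel, name)
            try:
                o, n = os.lstat(old_path), os.lstat(new_path)
            except FileNotFoundError:
                print(f"MISSING on new array: {os.path.join(rel, name)}")
                mismatches += 1
                continue
            if (o.st_uid, o.st_gid, stat.S_IMODE(o.st_mode)) != \
               (n.st_uid, n.st_gid, stat.S_IMODE(n.st_mode)):
                print(f"OWNER/MODE differs: {os.path.join(rel, name)}")
                mismatches += 1
    print(f"{mismatches} mismatches found")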
Object storage migrations (and they do happen) typically boil down to preserving metadata tags, especially those that support compliance and application workflows.
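If (and it's a big if) both the old and new platforms happen to speak an S3-compatible API, preserving user metadata during the copy is mostly a matter of asking for it explicitly. A hypothetical sketch, with made-up endpoints and bucket names:

    # Copy objects between S3-compatible endpoints, carrying user metadata
    # along. Endpoints, bucket names and the S3-compatible assumption are
    # all illustrative; reads each object fully into memory for simplicity.
    import boto3

    src = boto3.client("s3", endpoint_url="https://old-object-store.example.com")
    dst = boto3.client("s3", endpoint_url="https://new-object-store.example.com")

    paginator = src.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket="old-bucket"):
        for obj in page.get("Contents", []):
            head = src.head_object(Bucket="old-bucket", Key=obj["Key"])
            body = src.get_object(Bucket="old-bucket", Key=obj["Key"])["Body"]
            dst.put_object(
                Bucket="new-bucket",
                Key=obj["Key"],
                Body=body.read(),
                Metadata=head.get("Metadata", {}),   # preserve user metadata tags
                ContentType=head.get("ContentType", "binary/octet-stream"),
            )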
What Makes Data Migrations Hard -- Part IV
And then there's the moving of the data itself -- lots of it. It takes time -- lots of it. Unless special provisions are made, data can't generally be used while it's being moved.
That means downtime from the end user perspective. Potentially lots of it.
That means many data migrations are performed during those human-unfriendly off-hours when downtime is marginally acceptable.
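A rough calculation shows why those windows get blown so easily. The figures below are assumptions -- 50 terabytes to move, a 48-hour weekend window, 60% of wire speed actually achieved:

    # How much sustained throughput does a migration window really need?
    # Figures are assumptions for illustration.

    data_to_move_tb = 50           # assume 50 TB on this array
    window_hours    = 48           # assume a weekend outage window
    efficiency      = 0.6          # assume 60% of wire speed, realistically

    required_mb_s = data_to_move_tb * 1e6 / (window_hours * 3600) / efficiency
    print(f"Need ~{required_mb_s:,.0f} MB/s sustained for {window_hours} hours")

    # The same 50 TB at a more typical 200 MB/s sustained:
    hours_at_200 = data_to_move_tb * 1e6 / 200 / 3600
    print(f"At 200 MB/s that's ~{hours_at_200:,.0f} hours ({hours_at_200/24:.1f} days)")

Run your own numbers and it quickly becomes clear why the big copies tend to get pushed onto array replication technologies rather than host-based copy tools.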
More fun.
What Makes Data Migrations Hard -- Part V
Once the data is moved over, the real fun begins: making sure that everything works as before. Hosts can boot and find their data, no data got left behind, users can get to their file systems, performance is as expected, and so on.
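For the "nothing got left behind" check on file data, even a sampled checksum comparison between the old and new mounts buys a lot of confidence. A simple sketch, with assumed mount points and sample size:

    # Spot-check that data survived the move: hash a sample of files on the
    # old and new mounts and compare. Mount points are assumptions.
    import hashlib, os, random

    OLD, NEW = "/mnt/old_array", "/mnt/new_array"   # assumed mounts
    SAMPLE = 500                                    # files to spot-check

    def sha256(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    all_files = [os.path.join(d, f)
                 for d, _, files in os.walk(OLD) for f in files]
    sample = random.sample(all_files, min(SAMPLE, len(all_files)))
    bad = 0
    for old_path in sample:
        new_path = os.path.join(NEW, os.path.relpath(old_path, OLD))
        try:
            if sha256(old_path) != sha256(new_path):
                print(f"CHECKSUM MISMATCH: {old_path}")
                bad += 1
        except FileNotFoundError:
            print(f"MISSING: {new_path}")
            bad += 1
    print(f"{bad} problems in {len(sample)} sampled files")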
The infrastructure team has even more work to do behind the scenes: are the data protection mechanisms (backup, replication, DR, etc.) working as expected, do the monitoring and reporting tools work, and so on.
What Makes Data Migrations Hard -- Part VI
Taking into account everything above, any non-trivial data migration is a Project with a capital P. There are great gobs of detailed planning: alternative scenarios are evaluated, failback provisions are made in case something doesn't work as expected, and so on.
Within very large enterprises, you'll often find a dedicated data migration team as part of the overall storage function. Keeping the inventory current at sizable scale requires specialized skills and a proven methodology -- one that works across multiple vendors.
But not everyone wants to invest in that expertise, so third parties come into play: the larger storage vendors, IT professional services firms and the like.
EMC, in particular, has a well-resourced group that does nothing but assess and program manage data migrations: from any set of arrays, to any other set of arrays, in any location.
Digging through their tools and methodologies gives you a quick appreciation for just how complex this activity can be.
Enabling Technology?
We all wish for a universal silver bullet to solve this problem, but -- unfortunately -- there isn't one. There are some neat vendor technologies available to help, but they're all pointed at a portion of the problem, rather than the messy world of larger enterprises, multiple vendors, etc.
When you're moving from like-to-like array technology, there's a reasonable expectation that the vendor has tools that can help with minimizing the disruption around moving data.
Digging into the EMC block storage portfolio, there are array-specific replication products (SRDF, RecoverPoint, et al.) which do a good job of bulk-copying data over to a new array while it's in operation. More array-agnostic tools are available (e.g. Open Replicator) if needed.
In the block SAN world, there's a strong desire to dynamically re-path a server to the new storage without significant disruption. For many years, PowerPath has provided the required abstraction to invoke a new path without bringing the server down. More recently, VPLEX has found a strong use case in simply moving storage between supported block arrays, with the server and its application blissfully unaware that anything is going on.
In the file world, things are a bit different, as you'd expect.
The cream of the crop are scale-out implementations such as Isilon's OneFS, where new nodes can be brought online and old nodes evacuated without any disruption or need for much a priori planning. Although that presumes your data is already on OneFS, and not going anywhere else :)
The VNX is no slouch either, providing VNX File Migration, which basically enables read/write access while migrating NFS/CIFS file shares.
Other vendors have capabilities as well: both in-platform, and occasionally cross-platform.
VMware has an interesting capability (Storage vMotion), but it also appears to have its limitations. Of course, it presumes that all of your workloads sit in an up-to-date VMware cluster -- and arrays tend to support a mix of virtual and physical workloads.
The VMware tools don't have a high degree of knowledge of the array itself, and where best to place things, so that planning step is still required. Finally, all the data copying is done VM-by-VM, which might be acceptable for smaller data sets, but isn't the most efficient way to move hundreds of terabytes all in one go.
Power Tools Require A Power User
All of these vendor capabilities are basically power tools. They presume you've done the planning, and have an intimate knowledge of both the environments you're coming from, and where you're going to.
And you know what you're doing.
Outside of slick self-contained solutions like OneFS and a few others, there's usually considerable heavy lifting involved.
What more could we -- as an industry -- be doing? It's a long list ...
First, the shift to scale-out architectures (such as Isilon) that are tolerant of a mix of old and new hardware greatly diminishes the problem -- once you're into the environment, and if you're not planning on leaving anytime soon. It'd be great to see more of this in the industry.
Second, all of us vendors could do a better job with inventory and planning tools: what's attached to the array, what it's running, the best way to organize and plan a migration, etc. EMC has some pretty good tools in this category, but they're largely designed for people who do this all the time, and not the occasional user.
Third, we could do a better job of getting some of the pain-preventing enabling technology into people's hands before it's needed. PowerPath, VPLEX, RecoverPoint, et al. are great enabling technologies, but they mostly presume you've acquired and installed them ahead of time, which requires some forethought. Also, the pricing models associated with most of these technologies assume you're using them all the time vs. occasionally in the context of a specific migration.
Fourth, if we're going to live in a world of hypervisors, that's a particularly interesting place to introduce new technology: faster data copying mechanisms, better planning tools that are aware of arrays as complex entities, etc.
I suppose the best thing we could do in the storage industry is acknowledge that we've got a shared challenge -- one with no simple answers.
All of us vendors keep churning out bigger, better, faster and more efficient storage arrays. We also keep making them easier to acquire and operate. All good.
But we don't seem to be making much headway on making them easier to migrate to.