You might be used to getting storage advice from a storage vendor, but a better source might be someone in that vendor's own IT group.
Such is the case with EMC IT -- a great team I've written about consistently over many years.
It's an interesting perspective on several levels. EMC is a well-regarded $20B+ global enterprise in a highly competitive industry.
EMC IT also has direct and somewhat privileged access to the engineers who build the stuff, although they still have to pay for their toys just like everyone else.
Most importantly, EMC IT has successfully navigated a substantial transformation to an ITaaS model, and has refashioned itself as the internal service provider of choice for virtually every IT requirement here at EMC.
I was fortunate to speak with Srinivasa Maguluri, a long-time EMCer who is a cloud architect with the team responsible for storage architecture within EMC IT. Like many EMCers, he is bright and passionate with a lot to share. I've done my best to summarize some of the best thoughts from a presentation he gave recently.
How EMC IT does storage is probably a good example of how an IT organization deals with the complexity inherent in any larger enterprise setting. Although you'll see references to specific EMC products here and there, I found the frameworks and thinking broadly applicable regardless of your storage vendor choice.
See if you agree ...
One Mountain Is Always Followed By The Next One
Srini starts by framing the "storage journeys" EMC IT has been on over the years.
The first wave was implementing ILM concepts: consolidation and tiering of storage, mostly as a cost-containment exercise. But, like any strategy, at some point there are diminishing returns to the investments made.
The second wave was aligning the storage strategy with EMC IT's virtualization strategy and associated converged infrastructure, dubbed here as "virtualized storage".
And now the team is well into their third wave: transforming the storage environment to efficiently deliver an ever-expanding set of services the business wants to consume, on-demand and metered.
The slide itself is interesting, the importance of creating relevant context for an audience even more so. One thing leads to another; no easy shortcuts.
Creating Capabilities
That representation is quickly followed by a "capabilities stack" view which I found useful.
At the bottom, we've got "consolidation" -- actually, moving from a per-project or per-owner storage model (everyone has their own array) to a shared storage environment, built on standardized array components. Consider this the foundation.
On top of that, a "virtualization" layer that creates a view of resource pools and associated capabilities, aligned and integrated with the server view, which at EMC is presumed to be virtualized as well.
Srini puts "mobilization" as the next important capability -- being able to move data to the right place (within an array, between arrays, across distance) without an inordinate amount of effort or disruption to the business. EMC IT hates data migrations, just like everyone else ...
Next, logically, is "automation" -- investing in the process definition and associated tooling to offload the backroom storage team, and empower the people using the stuff.
Finally, there's "consumerization" -- making storage resources easy to discover and understand, making consumption easy, feeding back on costs and service delivered, and so on.
Srini rates the current capabilities (darker means more established). As far as the newer topics go, EMC IT has great capabilities, but they aren't yet pervasive and "baked-in", which only comes with time.
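If it helps to see the stack in one place, here's a minimal sketch of how I'd capture it in code -- the layer names come from Srini's slide, but the one-line descriptions are just my own shorthand, not EMC IT's official definitions.

```python
# Sketch of the capabilities stack, bottom-up. Descriptions are my own
# paraphrase of the slide, not EMC IT's actual wording or maturity ratings.
CAPABILITY_STACK = [
    ("consolidation",   "shared, standardized arrays instead of per-project storage"),
    ("virtualization",  "pooled resources aligned with the virtualized server view"),
    ("mobilization",    "move data within/between arrays and across distance"),
    ("automation",      "process definition and tooling to offload the storage team"),
    ("consumerization", "easy discovery, easy consumption, cost and service feedback"),
]

for layer, description in CAPABILITY_STACK:
    print(f"{layer:>16}: {description}")
```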
My Favorite Enterprise Storage Architecture Diagram
Srini then shared a great slide -- an elegant architectural decomposition of the enterprise storage stack.
I've seen a lot of these over the years, and this one works very nicely for me, especially the decomposition and layering. It's sort of reminiscent of an OSI stack for storage :)
Starting at the bottom, you've got the hardware: storage engines, drive technologies, provisions for HA and data integrity -- sort of the raw material for your storage service catalog. Note that there is no presumption of *how* this layer is implemented: which technologies, how many, etc.
On top of that, a very precise articulation of required data services: replication, snap/clones, retention, dedupe, virtualization (abstraction) of storage, encryption and so on.
Moving up, there's a well-defined optimization layer: caching, tiering, QoS and resource partitioning -- and the ability to take dynamic direction from a platform higher up in the stack if needed.
Still working upwards, there's the mobility layer: data migrations, federation, multi-site availability, workload migrations -- and a global name space for file users.
Right above, there's the protocol layer with the familiar block / file / object presentations. The first time I saw this, I wondered why it was so far up the stack. Then I realized: these days, the specific services are usually strongly bound to the consumption protocol: for example, remote replication is semantically different in the respective block/file/object worlds.
Like me, Srini is pretty excited about the potential of ViPR to lessen these traditional bindings by providing common services independent of presentation -- but that's a topic for future discussion.
The security layer is represented as the final step before exposing storage services externally. Here we'll find multi-tenancy, authorization services, access control, auditing and the like.
Finally -- at the top -- we've got our various consumption portals: plug-ins, GUI, CLI, API. Again, we're both excited about ViPR's ability to standardize this layer regardless of what's underneath.
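To make the layering concrete, here's a rough sketch of the stack as a simple data structure -- the layer names and example services are lifted from the description above; treat it as a starting point for your own version rather than EMC IT's official model.

```python
# Bottom-up decomposition of the enterprise storage stack described above.
# The services listed per layer are examples from the text, not an exhaustive catalog.
STORAGE_STACK = [
    {"layer": "hardware",      "examples": ["storage engines", "drive technologies", "HA", "data integrity"]},
    {"layer": "data services", "examples": ["replication", "snaps/clones", "retention", "dedupe", "virtualization", "encryption"]},
    {"layer": "optimization",  "examples": ["caching", "tiering", "QoS", "resource partitioning"]},
    {"layer": "mobility",      "examples": ["data migrations", "federation", "multi-site availability", "global namespace"]},
    {"layer": "protocol",      "examples": ["block", "file", "object"]},
    {"layer": "security",      "examples": ["multi-tenancy", "authorization", "access control", "auditing"]},
    {"layer": "consumption",   "examples": ["plug-ins", "GUI", "CLI", "API"]},
]

# Print the stack top-down, the way you'd read the slide.
for entry in reversed(STORAGE_STACK):
    print(f'{entry["layer"]:>13}: {", ".join(entry["examples"])}')
```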
A few personal perspectives -- EMC IT uses a significant part of the EMC product portfolio to deliver the required storage services -- perhaps more products than absolutely necessary. What EMC product group wouldn't want their product showcased? That being said, there's a clear philosophy here of "service first" -- delivering the services users want comes first and foremost.
If you're responsible for storage architecture, I think you'd be well-served by having this sort of diagram that works in *your* world. If you don't, feel free to copy this one as a starting point :)
From Logical To Physical
As EMC is well along the way towards an ITaaS model, virtualization, converged infrastructure, etc. -- it's interesting to see how this all manifests itself through that lens.
Here's a current snapshot from a few months back. It takes a while to navigate, but I found many pieces interesting.
Up and down the left side, we've got storage, compute and the associated physical deployment model. Across the top (far right), we've got some big logical buckets: mission critical apps, non-mission critical apps, and IaaS.
Note: the distinction between "non mission critical" and "IaaS" has disappeared; as the IaaS model has matured, there's no longer a need to distinguish between the two. VDI is called out as a special storage service.
Now that we're properly oriented, let's start our tour.
Upper right: notice that we're running *everything* on VCE Vblocks: VMAX-based ones for the super mission-critical stuff, VNX for everything else. Notice that the preferred storage management model differs: more highly automated in the IaaS use case; more direct element management for the mission-critical apps.
Skipping down to the compute layer, it's pretty much a VMware show. Note that the delivery model is different: automated for IaaS, program or project driven for other categories. Certain elements of mission critical apps may be physically isolated to ensure QoS; the preference, though, is to pool as much as possible.
Moving further down to the green storage box, there are some interesting details: storage is pre-allocated for mission-critical, on-demand for everything else. And oversubscribed, as well.
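If you prefer the mapping as text rather than a slide, a rough sketch of it might look like this -- the attribute names are my own shorthand, and the values are paraphrased from the snapshot described above.

```python
# Paraphrase of the logical-to-physical mapping above. Attribute names are
# my own shorthand; values are drawn from the slide as described in the text.
DEPLOYMENT_MODEL = {
    "mission_critical": {
        "platform":         "VCE Vblock (VMAX-based)",
        "storage_mgmt":     "more direct element management",
        "compute_delivery": "program/project driven; some physical isolation for QoS",
        "storage_alloc":    "pre-allocated",
    },
    "everything_else_and_iaas": {
        "platform":         "VCE Vblock (VNX-based)",
        "storage_mgmt":     "highly automated",
        "compute_delivery": "automated, pooled",
        "storage_alloc":    "on-demand, oversubscribed",
    },
}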
Just as the IaaS environment has matured to subsume the "non-mission critical" category, I'm guessing it won't be long until it matures further and starts to subsume portions of the "mission critical" category as well.
Another perspective on that same topic can be seen from the different provisioning approaches. On the left side, we've got our traditional block environments.
As Srini notes, the vast majority of storage requests can be provisioned up-the-stack using VMware tools. Occasionally, there's a custom request for uber-high performance that needs to go all the way down the stack. Srini says that these special requests are starting to decrease in frequency as the IaaS environment matures.
Over on the file side, things are less demanding, QoS-wise. Fewer categories of service, easier to implement and manage.
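As a mental model of the block provisioning flow Srini describes, here's a hedged sketch -- the function name and request fields are hypothetical, but the routing logic (most requests handled up-the-stack with VMware tooling, the occasional high-performance request going all the way down) mirrors the description above.

```python
# Hypothetical sketch of how a block storage request gets routed.
# Field names are illustrative, not EMC IT's actual policy or tooling.
def route_block_request(request: dict) -> str:
    if request.get("uber_high_performance"):
        # Rare case: a custom request provisioned all the way down the stack
        # by the storage team; becoming less frequent as IaaS matures.
        return "custom: provision down the stack"
    # Common case: provisioned up-the-stack using VMware tools against
    # pre-built storage pools.
    return "standard: provision up the stack via VMware tools"

print(route_block_request({"capacity_gb": 500}))
print(route_block_request({"capacity_gb": 500, "uber_high_performance": True}))
```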
Business Continuity, Backup and Data Retention
Up to now, we've been discussing primary storage -- but, as we all know, the custodial aspects can be just as challenging: replication, backup, retention, etc.
EMC's internal business continuity approach is built around two geographically dispersed campuses (Hopkinton, MA and Raleigh-Durham, NC), each with a local bunker. A variety of familiar technologies are used across this topology, depending on the business requirement.
Like many large corporations, the super-important stuff always runs on SRDF. Three different BC/DR models are implemented on the same shared SRDF business continuity infrastructure to meet different business needs at discrete cost points.
For EMC's most mission critical applications (e.g. our SAP environment dubbed Propel), there is a rather sophisticated set of processes around cloning and shipping of logs and archives in addition to replicating the primary data stores. Transactional data is sent synchronously to the local bunkers; logs are moved asynchronously via SRDF/A yielding a zero-data-loss solution.
Don't ask me to explain how it all works -- I haven't been involved with log-shipping architectures for quite a while :)
At the other end of the SRDF spectrum, VMware's SRM + SRDF/A is the de facto standard, delivering only a few seconds of potential data loss, but with far less resource consumption and effort.
At a high level, the story is rather simple: business continuity is delivered as a service. Here are your choices, here are the associated costs -- the business decides, not IT.
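Expressed as a simple service catalog, that choice might look something like this -- the tier names and relative cost labels are my own placeholders, and note that the presentation mentions three models on the shared SRDF infrastructure while only two are detailed above.

```python
# Sketch of BC/DR delivered as a service on the shared SRDF infrastructure.
# Tier names and relative cost labels are placeholders; only the two models
# described in the text are included here.
BC_CATALOG = {
    "zero_data_loss": {
        "technology":   "SRDF sync to local bunker + async log/archive shipping (SRDF/A)",
        "typical_use":  "most mission-critical apps (e.g. the Propel SAP environment)",
        "relative_cost": "highest",
    },
    "near_zero_data_loss": {
        "technology":   "VMware SRM + SRDF/A",
        "typical_use":  "the de facto standard for everything else on SRDF",
        "relative_cost": "lower",
    },
}

# The business picks the tier based on cost and recovery needs -- not IT.
```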
EMC's RecoverPoint is getting a healthy workout as well, as part of our quite substantial Exchange 2010 environment.
EMC is a company that runs on email; many of us routinely carry multiple devices, big files are routinely sent around, delivery is expected to be near-synchronous, etc.
A bad email day is a bad day for everyone in the company -- no exceptions.
There's an interesting story behind the selection of RecoverPoint to protect the Exchange environment, but it's a bit beyond the scope of this blog post.
Note that Exchange is fully virtualized as well.
File And Object Replication
Anyone who's worked with storage is quite familiar with how remote replication semantics differ between block, file and object. EMC IT is no different.
Like any large enterprise, EMC IT has more than its fair share of content that sloshes around in file systems and other repositories. As with block replication, most of the EMC portfolio gets a workout here.
VNX file replication is the bread-and-butter technology of choice; but the newer big data environments are using the Isilon-based SyncIQ.
Atmos has a replication model all its own, very well suited to content repositories and applications that are aware of the services it can provide. And we do have a Centera farm, mostly for compliance-oriented retention.
Backup
For some reason, having three distinct choices seems about right when it comes to most things. A nice compromise between simplicity and effectiveness. We saw three choices in the business continuity discussion above, and for backup we'll see the same thing.
The end user backup model can be represented fairly simply. We all have an Avamar client, and that's about it. I never know it's working.
The one time I had to go retrieve a file I had overwritten, it was a self-service proposition.
Now that I'm a Syncplicity user, I personally don't need this service very much anymore -- anything I might care about is in the sync-and-share space, replicated to multiple clients, fully versioned, etc.
Note: sync-and-share products like Syncplicity are not a replacement for backing up your entire C: drive or similar; but they do work very well for knowledge workers like me who tend to have a smaller working set of content that matters.
For garden-variety VMs, non-critical apps, file systems, etc. -- there's a combination of Avamar and Networker that uses VSS for application consistency, coupled with an automated installation that makes it easy to consume as part of a service. And for those oh-so-important databases, it's a combination of Networker and DataDomain, usually applied to clones of the database.
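Put side by side, the three backup combinations read something like this -- the category names are just my shorthand for the groupings above.

```python
# The three backup models described above, as a simple lookup.
# Category names are my own shorthand for the groupings in the text.
BACKUP_MODELS = {
    "end_user":           "Avamar client on every machine; self-service restore",
    "general_workloads":  "Avamar + Networker, VSS for application consistency, automated install",
    "critical_databases": "Networker + DataDomain, usually applied to clones of the database",
}
```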
Data Retention
Like most companies our size, EMC has an awful lot of data that has to be kept around for one reason or another. Rather than come up with a simplistic answer, EMC IT uses four different retention approaches, depending on the primary business objective at hand.
If retention is compliance driven, it goes into a Centera with the appropriate audit controls.
If, instead, the goal is to make the information available for reference and re-use at a later date, it goes into an Atmos farm.
Frequently, you want to be as cheap as possible, and that's FMA (file management archiving) combined with a deduped NAS (e.g. VNX).
And, finally, if the goal is to simply keep business records around for the longer term (7+ years), it goes to the DataDomain Extended Retention environment.
All services are clearly described, costs and capabilities are transparently shared, and the business owner decides what's appropriate -- sometimes with a little bit of coaching :)
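As a quick sketch, that routing logic reads something like this -- the objective labels are mine; the targets are as described above.

```python
# Sketch of the four retention approaches above, keyed by business objective.
# Objective labels are my own shorthand; targets are as described in the text.
RETENTION_TARGETS = {
    "compliance":          "Centera, with the appropriate audit controls",
    "reference_and_reuse": "Atmos farm",
    "lowest_cost":         "FMA (file management archiving) + deduped NAS (e.g. VNX)",
    "long_term_records":   "DataDomain Extended Retention (7+ years)",
}

def pick_retention_target(objective: str) -> str:
    return RETENTION_TARGETS[objective]

print(pick_retention_target("compliance"))
```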
A Simpler View?
Srini also shared this workload-to-array chart, since at EMC everyone wants to know exactly *where* their data is being stored. I'm guessing that's a result of us being a storage company -- but it's a useful picture nonetheless.
You see your familiar block / file / object, icons that generically describe familiar use cases, and the associated array it's likely to land on.
Not that it should matter to an end-user, but sometimes it's easier to give people a pretty picture, and move on ...
There are two fast-growing environments Srini is working on these days.
One is our burgeoning big data environment for Hadoop and everything that comes along with it. Isilon is the platform of choice here for a variety of reasons, not the least of which is its native HDFS support.
The other is the back-end to our corporate Syncplicity sync-and-share environment -- an environment that could potentially become quite large, very quickly. As a result, there's a strong focus on "all-in" costs to deliver the required storage service -- and Atmos was the clear winner here.
And, yes, EMC IT has to pay for their storage, just like everyone else.
Final Thoughts
Srini is the first to admit that his environment is not without its challenges, and there's plenty left to do -- not only migrating to newer technologies and operational models, but exorcising the remaining bits of legacy in the storage shop -- he's required to sweat his assets, just like everyone else ...
There's also a lot of team sport in play: storage has been thought of as an integral part of IaaS for a while, but as IaaS becomes re-cast into an emerging PaaS model, there will be new storage services required, new consumption and management portals, and new interactions with the various architecture teams.
Maybe his team is fortunate to work at a large storage vendor. Maybe that makes their jobs a bit more difficult in some ways, since everyone's an expert -- if you ask them.
Regardless, I think his team's efforts to date are worthy of consideration -- great progress has been made, and the foundation has been laid for more to come.