Once in a while, I get someone who challenges me to really think outside the box and speculate about what things might look like beyond the typical 12-24 month horizon.
When that gauntlet goes down, there's always an awkward moment in the room: the sales rep squirms a bit as we're clearly going off the script, the questioner's co-workers look slightly embarrassed, maybe I'll choke, and so on.
Why? Almost none of the IT vendors (including EMC) have formal external views that go out that far -- and that's exactly why it's a good question.
It's not a good idea to respond by simply waving your hands and extrapolating from well-known trends. To do a good job, you have to identify the less-obvious trends outside the topic at hand, talk about how they'll interact in novel ways, and look beyond just the technology itself.
You also have to be comfortable improvising :)
I don't think anyone expects you to be even partially correct. It's a thinking exercise, nothing more. And, while 2018 might seem a long ways away, the future comes at you fast.
To be clear, these are my personal views, and may or may not be endorsed by EMC. There is no guarantee that any of this will come to pass. Ask me again in a few months, and I'll probably give you a different answer.
With all the usual disclaimers in place, let's dig in.
Why This Might Be More Relevant Than It Appears
We're clearly entering the information age -- information is the raw material of the 21st century. Our society and economy are being radically transformed as a result -- perhaps faster than anyone expected.
That rediscovered precious stuff -- information -- has to live somewhere: it must be captured, processed and retained in enormous amounts over long periods of time.
And that's where storage comes in. Networks move it, processors transform it, but it all has to live somewhere.
The Default Assumptions
I fully expect that the historical ~30-40% growth rate in stored capacity will start to veer into the 50-60% range (or even higher), driven by things like big data, content repositories, rich data collaboration models, and -- yes -- the internet of things.
Demand will continue to outpace technological and operational advances (I believe the same is true for compute and networking), which means we'll all inevitably end up spending more money to store and retrieve information.
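To put those growth rates in perspective, here's a quick back-of-the-envelope sketch -- the percentages are the speculative inputs from above, and everything else is just compounding:

```python
# Back-of-the-envelope: how much stored capacity compounds over five years
# at different annual growth rates. Illustrative only -- the rates are the
# speculative inputs discussed above, not a forecast.

def growth_multiple(annual_rate: float, years: int = 5) -> float:
    """Return the total capacity multiple after compounding."""
    return (1 + annual_rate) ** years

for rate in (0.35, 0.50, 0.60):
    print(f"{rate:.0%} annual growth -> {growth_multiple(rate):.1f}x capacity in 5 years")

# 35% annual growth -> 4.5x capacity in 5 years
# 50% annual growth -> 7.6x capacity in 5 years
# 60% annual growth -> 10.5x capacity in 5 years
```

Even at the low end, that's nearly five times as much information to store, protect and manage -- before any efficiency gains kick in.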
If you're going to thrive in the information age, you're going to need the tools to do it.
As data volumes and use cases go exponential, gravity comes into play (tip-of-the-hat to @mccrory) and adds a new dimension to our thinking. Thanks to latency, large volumes of data want to be close to their applications, and applications close to users.
The discipline of overcoming distance in the material world is generally referred to as logistics; in the information age, we'll be increasingly concerned with information logistics: right information, right place, right time, right cost.
Sometimes the application will need to be moved closer to the information, sometimes just the opposite.
And, while we're talking about applications -- the proverbial consumers of information -- we'll no longer be interested in the historical 1:1 relationship between applications and "their" information.
Information will be generated for many potential application consumers; and applications will consume from many potential sources. The more uses we can find for a given amount of information, the more valuable it becomes.
The classic thinking around information management is starting to be re-envisioned as a result. Think big "data lakes" where information is freely poured in -- and consumed -- without the spaghetti of familiar adaptors and gateways.
Re-Imagining Physical Storage
Most of us have come to think of physical storage as big racks of dedicated hardware: blinky lights, lots of air whooshing, cables everywhere, etc.
Dig deeper, and you'll find familiar components: industry-standard storage media, compute, memory, ports, interconnects, etc. -- and a very sophisticated storage operating system.
One emerging theme is re-contextualizing that storage software in a different way: running it on commodity hardware sourced by the customer, not the storage vendor.
While the familiar prepackaged array consumption model has its obvious strengths, so does the software-based storage model -- and over the next five years, I would expect it to be far more commonplace as part of a broader move to software-defined storage.
While some adherents of the software-based storage model may point to presumed hardware cost savings thanks to commodity hardware, that's not the real benefit as I see it. There's an attractive ease-of-consumption model, a potential ease-of-management model, and the ability to repurpose storage assets by simply changing the software that runs on them.
Regardless of whether physical storage is expressed as an array, a collection of servers with software, or some combination, I also think we'll see a clear segmentation of storage roles over the next few years: transactional performance vs. capacity and bandwidth.
Why? We're entering an era where information will always be used more than once.
The first pass will be about immediate actions: transactions, decisions, etc. Think lots of flash, in-memory databases, close to the server, etc. The second pass is more about subsequent retrieval and utilization: analytics, content, etc. -- think vast pools of spinning disks with some judicious use of flash as an accelerant.
While there will be plenty of room for storage architectures that can do a decent job at both, the extreme demands of emerging information usage patterns will tend to drive a natural segmentation in architectures -- all scale-out, of course.
SDN (software defined networking) concepts will inevitably find their way into the storage world, and with considerable impact, because just about every aspect of storage involves a network.
For scale-out architectures, there's the interconnect between nodes. Application servers and storage servers communicate over a network. And when we start bridging our data pools together over distance, more networking.
Historically, storage network technology (in any form) has been rigid, brittle, fixed-design stuff. SDN shows every promise of making it adaptable and flexible -- as well as sharing storage-related traffic comfortably with other forms of SDN networking.
As processors get faster and sport more cores, we'll see even more powerful data services that can be associated with storage, beyond what we typically think of today: dedupe being only one recent example.
Re-Imagining Logical Storage
So, if we're redrawing the boundaries of where storage stuff gets done, we also need to redraw the boundaries of what storage does and doesn't do along similar lines.
Most people start thinking about storage services as "read" and "write". Fair enough.
Now add in tiering and QoS concepts. Now add in data protection and availability. Clones, snaps and delta logs in all their wondrous forms. Dedupe and compression. Maybe some distance-dissolving federation services. A few different data presentations: block, file, object, HDFS.
Indeed, some of the more intriguing concepts found in EMC's ViPR are around "one copy of data, many presentations". Here's your data as block, as file, as object, as HDFS, as a graph, etc. How data is presented is a function of what the application wants to see, and not what kind of storage gear you bought.
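As a purely illustrative sketch -- these are not ViPR class names or APIs, just an invented abstraction -- the idea looks something like this: one underlying copy of the bytes, with the presentation chosen by the consumer rather than by the hardware you happened to buy.

```python
# Hypothetical sketch of "one copy of data, many presentations".
# None of these names come from ViPR -- they're invented to illustrate the concept.

class StoredObject:
    """A single underlying copy of the data, held once by the storage layer."""

    def __init__(self, key: str, payload: bytes):
        self.key = key
        self.payload = payload

    # Each "presentation" is just a different view over the same bytes.
    def as_object(self) -> bytes:
        return self.payload                  # e.g. served via an object API

    def as_file(self) -> str:
        return f"/exports/data/{self.key}"   # e.g. surfaced as a file path

    def as_hdfs(self) -> str:
        return f"hdfs://pool/{self.key}"     # e.g. surfaced to a Hadoop job

doc = StoredObject("orders-2013-q3", b"...raw bytes...")
print(doc.as_file())   # the application picks the view it wants;
print(doc.as_hdfs())   # the storage layer holds only one copy underneath
```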
But there's another trend that's worth keeping an eye on ...
One thing I've noticed is that -- over time -- useful information services tend to "drift down" into the infrastructure. That means we can look at familiar information services that are usually performed at an application level potentially "drifting down" to get closer to the information where it makes sense.
One recent example is a certain kind of Hadoop or traditional DW implementation I keep observing: basically boatloads of storage with a bit of application compute strapped on top. Is Hadoop a compute cluster, or more of a special-purpose storage cluster?
If you're not sure, I can predict what the biggest line item is going to be in the bill-of-materials …
And there's no reason why a storage array couldn't store and present data in, say, SQL format.
As enterprises demand more analytics, that data is going to have to be sourced from applications that weren't designed to provide it.
Call it ETL if you want, it's really creating a pipeline of fresh information into the analytics lake. Would it make sense to think of data sourcing as a storage function vs. custom code running way out on a server somewhere?
Wait a second, storage arrays already know how to land data in two places (i.e. replication). If they knew about application data formats, perhaps they could help create a real-time feed of transactional data into the decisioning environment?
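Here's a hedged sketch of that thought -- a write path that, because it understands the record format, tees a decoded copy of each transaction into an analytics feed alongside the primary store. All the names here are hypothetical:

```python
# Hypothetical sketch: a format-aware write path that lands data in the
# primary store and simultaneously feeds a decisioning/analytics environment.
# The structures here are invented for illustration only.

import json
from queue import Queue

analytics_feed: Queue = Queue()   # stand-in for a pipeline into the "data lake"
primary_store: dict = {}          # stand-in for the transactional storage

def write_transaction(txn_id: str, record: dict) -> None:
    """Persist the transaction, then push a decoded copy downstream."""
    payload = json.dumps(record).encode()
    primary_store[txn_id] = payload                 # the usual write/replicate path
    analytics_feed.put({"id": txn_id, **record})    # near-real-time feed, no batch ETL

write_transaction("t-1001", {"sku": "A42", "qty": 3, "amount": 59.97})
print(analytics_feed.get())   # the analytics side sees the transaction immediately
```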
A bit of arcane history? Many, many years ago EMC had a novel product: InfoMover. The idea was simple: using a shared storage array, you could open up and query your mainframe transactional data from UNIX without impacting the mainframe app.
We didn't exactly sell a lot of it, but the folks who did implement it saw amazing benefits in terms of sourcing transactional data into decision support environments.
And, while we're talking about value-added storage services, why not consider low-level search? There's no reason why storage couldn't get smart about different types of application data, file formats, etc.
Is Google's legendary search engine a compute farm, or just a very sophisticated storage architecture? The lines can certainly be blurred.
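To make the search idea a bit more concrete, here's a toy sketch of what storage-level search might look like: a simple inverted index maintained in the same write path, so queries become a storage service rather than an application bolted on top. Again, every name here is hypothetical.

```python
# Toy sketch: a storage layer that builds an inverted index as content is
# written, so "search" is served by storage itself. Purely illustrative.

from collections import defaultdict

index: dict[str, set[str]] = defaultdict(set)   # word -> set of object keys
store: dict[str, str] = {}

def put(key: str, content: str) -> None:
    """Store the content and index it in the same write path."""
    store[key] = content
    for word in content.lower().split():
        index[word].add(key)

def search(word: str) -> set[str]:
    """Low-level search served directly by the storage layer."""
    return index.get(word.lower(), set())

put("report-1", "Quarterly revenue grew in EMEA")
put("report-2", "Revenue flat in APJ")
print(search("revenue"))   # {'report-1', 'report-2'}
```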
Use your imagination.
Think of something that everybody does with information today in application space, and re-envision it as a potential storage function.
And I'm betting we'll see a few of those seemingly far-fetched ideas become commonplace in the next five years. We may come to see storage more as an information management layer rather than a mere dumping ground for all those 1s and 0s.
Re-Imagining Storage Orchestration
Storage teams seem to be getting much more comfortable around creating and delivering services others can use, without the need to do everything themselves. The consumer of storage services gets policy options, associated costs, and a nice, converged portal to see how everything is doing -- essentially passing the responsibility "up the stack".
Often, that next control point becomes the virtualization team, or perhaps the converged infrastructure team. But they're doing the same thing as well, exposing services and portals to developers, application owners, and DBAs: with ever-higher levels of abstraction and relevant context.
While we'll never outgrow the need to understand what the storage is doing (no matter its physical or logical boundaries), all the heavy lifting is moving elsewhere in the stack for policy definition and closed-loop monitoring and measurement.
One key aspect of SDS (software-defined storage) is facilitating this orchestration transition to a set of programmable services: policy down, management information up. Storage services fall in line with other application services: dynamic, invokable, re-configurable, monitorable, etc.
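A minimal sketch of "policy down, management information up" might look something like this -- all of the names are invented; the point is the shape of the interaction, not any particular product's API:

```python
# Hypothetical sketch of programmable storage services: a policy flows down,
# monitoring data flows back up. No real product API is implied.

from dataclasses import dataclass

@dataclass
class StoragePolicy:
    capacity_gb: int
    max_latency_ms: float
    protection: str        # e.g. "snap-hourly", "replicate-remote"
    presentation: str      # e.g. "block", "file", "object", "hdfs"

def provision(name: str, policy: StoragePolicy) -> dict:
    """Policy goes down into the storage layer; a monitorable handle comes back up."""
    # ...the storage layer picks media, placement and data services to match...
    return {"name": name, "status": "ok", "observed_latency_ms": 0.9,
            "used_gb": 0, "policy": policy}

vol = provision("orders-db", StoragePolicy(500, 1.0, "replicate-remote", "block"))
print(vol["observed_latency_ms"])   # closed-loop monitoring flows back to the orchestrator
```

The orchestration layer above only has to express intent and watch the measurements that come back; the heavy lifting stays below the line.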
Re-Imagining Storage Consumption
Let's get this whole "cloud" thing out of the way, shall we?
Storage follows its workloads: wherever workloads go, storage will follow. If more workloads get consumed from an external service provider, that's where the storage will go. If more workloads stay in the data center, that's where the associated storage will end up. Other than a few rather specialized use cases, storage and applications prefer to stay close to each other -- simply for performance reasons.
More interesting will be the potential changes in IT philosophy around capacity planning and provisioning. I've met more than a few IT shops that don't give a second thought to bringing on more compute capacity, but apparently detest adding more storage capacity.
Maybe it's the way we storage vendors interact with them. Or perhaps they think it's a poor use of IT money. Or maybe it's too difficult, or … well, I'm not quite sure.
What I do know is that current and future application models are going to want storage services virtually on-demand: capacity, performance, availability, presentation, etc. Just like they want compute, and network, and … the drill should be familiar.
No one is going to want to wait for the IT group to go through their traditional process …
It's Five Years Ago
One interesting exercise is going back five years, and seeing what we were doing back then. I can go back and read my blog posts from that time, and it's rather sobering how much has changed. Please don't go back and read them, many of them are terrible :)
Familiar terms like cloud, big data, mobility, X as a service, etc. -- weren't in the mainstream vernacular yet. Green IT, SOA, ITIL and data warehousing were very much in vogue. Virtualization -- in the form of VMware -- was getting IT people very excited, and VDI was a new shiny thing. Amazon's AWS was an interesting curiosity, nothing more.
IT groups were obsessively focused on reducing costs at all costs -- there was a recession that had just started -- and weren't particularly interested in being a business enabler, getting more strategic, creating services that users wanted to consume, etc.
On the storage front, the first enterprise flash drives were hitting the market, and people weren't quite sure what to make of them. Dedupe was starting to get popular in backup (Avamar, DataDomain), but hadn't yet made its way to primary storage. The storage bloggers were heatedly debating whether iSCSI or FCoE was going to conquer the world.
Sitting here in 2013, it all seems like a distant world, but it wasn't that long ago.