Big Data means Big Business

The early and mid-2000s saw an exponential surge in data production. This surge, driven by advances in wireless technologies, automation and internet speeds, gave birth to a tremendously fast-growing technology vertical: big data. Because of its sheer size, big data had huge computation needs, which gave us ZooKeeper, Bigtable, GFS and Cassandra, and then the most famous open-source project after Tomcat: Hadoop. By the early 2010s we had a plethora of startups, such as MongoDB, DataStax, Elasticsearch, Cloudera and Hortonworks (HWX), that adopted these open-source tools and made them enterprise-ready. Over the past seven years, the big data ecosystem and its adopters have matured and slowly moved toward much more advanced use cases around deep learning and AI. In fact, it is safe to say that big data and the open-source community are transforming virtually every enterprise backend on the planet. For example, we have helped many enterprises transform “data warehouses” built on relational databases into “data lakes.” Every enterprise is already somewhere on this transformation path. To put this in perspective, more than $100 billion is being spent on this transformational journey, including hardware, software and services.

But not everything is merry. There are some serious and complex challenges around the next phase of big data’s evolution, including control, authenticity and monetization, to name a few:

First Challenge: Control over the infrastructure itself in a multi-tenant environment
Second Challenge: How well can you trust the data itself?
Third Challenge: What’s the value of your data, and how do you monetize it?
So how do we address the challenges above?

Big data today is largely closed and restricted, with many access, authenticity and monetization challenges. But wait: as the proverb often attributed to Plato goes, “Necessity is the mother of invention,” and this necessity gave birth (kind of) to a new tool for big data: blockchain technology. A full discussion of blockchain is out of scope for this blog, but I will give a quick introduction. Blockchain recently surged in popularity because of Bitcoin, which is built on it. Technically, you can call a blockchain a database, but it is not just any database; it is the next generation of information decentralization. I would not hesitate to call it the only database that you could call Blue Ocean, with beneficial attributes such as decentralized and shared control, immutability and audit trails, and native assets and exchanges. For the sake of terminology, a blue ocean model means a new, uncontested market space that makes existing competitors irrelevant and creates new consumer value, often while decreasing costs.

However, blockchains scale terribly and don’t even have a query language. Even so, the blue ocean benefits have proved enough to capture the global imagination. The good news is that it’s relatively easy to marry Hadoop’s scalability with blockchains to create something that can loosely be called a blockchain database, which provides the best of both worlds. A NoSQL database such as MongoDB can easily add query capability and a schema-based representation on top of a blockchain. This unlocks highly interesting applications in big data, such as shared control over infrastructure, audit trails on data and even a universal data exchange. The figures below show how blockchain changes the game and how decentralization materializes.
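To make the idea of a “blockchain database” a little more concrete, here is a minimal Python sketch, assuming a single process and an in-memory list standing in for a NoSQL document store such as MongoDB. It is not the API of MongoDB, IPDB or any blockchain product; the class `BlockchainDB` and its methods `append`, `find` and `verify` are hypothetical names chosen for illustration. Each record stores the SHA-256 hash of its predecessor (the immutability and audit-trail side), while `find` offers the tiny bit of query capability that a plain blockchain lacks.

```python
import hashlib
import json
import time
from typing import Any

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(body: dict[str, Any]) -> str:
    # Canonical JSON (sorted keys) so the same record always hashes the same way.
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class BlockchainDB:
    """Hypothetical toy ledger: each record carries the hash of its predecessor,
    and a plain Python list stands in for the queryable document store."""

    def __init__(self) -> None:
        self.records: list[dict[str, Any]] = []

    def append(self, doc: dict[str, Any]) -> dict[str, Any]:
        # Link the new record to the hash of the last one (or to GENESIS).
        prev_hash = self.records[-1]["hash"] if self.records else GENESIS
        body = {"doc": doc, "prev_hash": prev_hash, "ts": time.time()}
        record = {**body, "hash": _digest(body)}
        self.records.append(record)
        return record

    def find(self, **criteria: Any) -> list[dict[str, Any]]:
        # A very small "query language": exact match on top-level document fields.
        return [r["doc"] for r in self.records
                if all(r["doc"].get(k) == v for k, v in criteria.items())]

    def verify(self) -> bool:
        # Recompute every hash and check each link back to the previous record.
        prev_hash = GENESIS
        for r in self.records:
            body = {"doc": r["doc"], "prev_hash": r["prev_hash"], "ts": r["ts"]}
            if r["prev_hash"] != prev_hash or r["hash"] != _digest(body):
                return False
            prev_hash = r["hash"]
        return True


db = BlockchainDB()
db.append({"sensor": "s1", "temp": 21.5})
db.append({"sensor": "s2", "temp": 19.0})
print(db.find(sensor="s1"))          # [{'sensor': 's1', 'temp': 21.5}]
print(db.verify())                   # True
db.records[0]["doc"]["temp"] = 99.0  # tamper with history, bypassing append()
print(db.verify())                   # False
```

In a real deployment the ledger would be replicated across nodes run by different parties, which is where the shared control comes from; this sketch only illustrates the tamper-evidence and query side of the combination.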
The figure above shows that as we move from left to right, components of the blockchain architecture become more prevalent and thus open up the system, while on the left side we still have heavy application silos. The processing layer changes from directly hardware-dependent to much more abstract and distributed (for example, Ethereum); similarly, the file system evolves from local namespaces to a global-namespace file system such as IPFS (the InterPlanetary File System). The figure below shows how blockchain compares with, and complements, a scalable database.

Addressing First Challenge: Shared Infrastructure Control

Being a blockchain database means that control of the database infrastructure is shared across entities, whether within an enterprise, across a consortium or even across the planet. Cool, isn’t it? How? A blockchain-based database architecture is decentralized, which means its control can be shared. This sharing can happen in one of several ways:
Because this is a big data database, unlike a traditional blockchain it can hold the data itself. As a database fills up, we can keep adding more databases, connected using an open protocol called Interledger. Benefits of this approach:

Problem. The multinational entity problem
Solution. Each regional office, with its own sysadmin, controls one node of the overall database, so the offices control the database collectively. Because the database is decentralized, if a sysadmin or two goes rogue, or a regional office is hacked, the data is still protected (assuming encryption is in place).

Problem. The industry consortium problem
Solution. Similar to the above, each company controls one node in the chain.

Problem. The single-shared-truth-of-data problem
Solution. A universally distributed interledger of databases, where essentially anyone can be part of the universal data marketplace. Example: IPDB.

Addressing Second Challenge: Audit Trails on Data

Blockchain gives us detailed and definitive audit trails on data, which improves the trustworthiness of the connected nodes. The same principle applies to your data residing in a blockchain database. How? Consider a simple data pipeline, shown in Figure 3 below: IoT sensors -> Kinesis/Event Hubs + Stream Analytics -> Isilon storage (HDFS) -> Spark data prep -> Spark modelling -> MongoDB storage -> Tableau. Here is what happens:
In this way, the output of each pipeline step is timestamped in the three steps mentioned above. Benefits of this approach:
The figure above shows a generic, agile and modular architecture that increases the security, reliability and trustworthiness of an IoT deployment.
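The per-step timestamping described for the pipeline can be sketched in Python as follows, assuming each step can hand its input and output to a small audit recorder as bytes. The classes `AuditEntry` and `PipelineAudit` and the stage names are hypothetical; in the architecture above the entries would be written to the blockchain database rather than kept in a Python list, and the Kinesis/Event Hubs, Spark and MongoDB steps are replaced here by trivial byte transformations. Each entry fingerprints what a step consumed and produced, timestamps it, and links to the previous entry, so edited history, a broken lineage, or final data that no longer matches the last step are all detectable.

```python
import hashlib
import json
import time
from dataclasses import asdict, dataclass


def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class AuditEntry:
    stage: str        # pipeline step name (hypothetical, e.g. "ingest", "prep")
    input_hash: str   # fingerprint of the data the step consumed
    output_hash: str  # fingerprint of the data the step produced
    ts: float         # when the step was recorded
    prev_entry: str   # hash of the previous audit entry (the chain link)

    def entry_hash(self) -> str:
        return sha256_of(json.dumps(asdict(self), sort_keys=True).encode("utf-8"))


class PipelineAudit:
    def __init__(self) -> None:
        self.entries: list[AuditEntry] = []

    def record(self, stage: str, input_data: bytes, output_data: bytes) -> AuditEntry:
        prev = self.entries[-1].entry_hash() if self.entries else "genesis"
        entry = AuditEntry(stage, sha256_of(input_data), sha256_of(output_data),
                           time.time(), prev)
        self.entries.append(entry)
        return entry

    def verify(self, final_output: bytes) -> bool:
        prev = "genesis"
        for i, e in enumerate(self.entries):
            if e.prev_entry != prev:
                return False  # an audit entry was edited, removed or reordered
            if i > 0 and e.input_hash != self.entries[i - 1].output_hash:
                return False  # a step consumed data no earlier step recorded producing
            prev = e.entry_hash()
        # The data we hold now must match what the last step claims it produced.
        return bool(self.entries) and self.entries[-1].output_hash == sha256_of(final_output)


audit = PipelineAudit()
raw = b'{"sensor": "s1", "temp": 21.5}'
ingested = raw              # stand-in for the streaming ingest step
prepped = ingested.upper()  # stand-in for a data-prep transformation
audit.record("ingest", raw, ingested)
audit.record("prep", ingested, prepped)
print(audit.verify(prepped))         # True
print(audit.verify(prepped + b"x"))  # False: final data no longer matches
```

Hashing the data rather than storing it keeps the audit trail small; the data itself can stay in HDFS or MongoDB while only the fingerprints go on the chain.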
Last but not least, my favorite topic…

Addressing Third Challenge: Universal Data Exchange

This novel method enables us to build a universal data marketplace that helps dissolve the walls of data silos. A scalable, consistent blockchain database architecture that speaks an intellectual property (IP) rights-transfer protocol enables data to be bought and sold as an asset. The concept is new and amazingly exciting. Not only is it a universal marketplace, it is also collectively controlled by a public ecosystem. People and corporations can build data exchanges on top of this universal marketplace to suit their needs.

How? Here is how it works. We need a global public blockchain database, which currently exists as an open non-profit initiative in the form of IPDB. Remember that consortiums can also implement this securely, separately from the public infrastructure. There can even be multiple networks, with assets flowing between them via the Interledger protocol. The asset is the data rights, backed by copyright law, and it lives on the blockchain database. Remember, you hold the private key for the data you own, so you can transfer your rights to the data, or to slices of it, using an open blockchain IP protocol; open-source protocols such as COALA IP are available. Benefits of this approach:
The figure below shows how a consortium, or similarly a multinational, works when it comes to enabling a universal data marketplace (The Dream).

Conclusion

Big data definitely means big bucks. Blockchain-ified big data helps resolve three of its outstanding challenges: how to control the data, how to trust the data and how to build universal exchanges. We at Dell EMC are firmly dedicated to making data much more accessible, making it open and enabling large enterprises to realize the real potential of data as an economic asset. So Chains of Big Data is turning out to be the best approach toward creating an open, connected, trustworthy and universal data marketplace, which in turn enables better collaboration and more value from the humongous amounts of data being produced every second.

The post Solve Big Data Evolution Challenges with Blockchain-ified Data appeared first on InFocus Blog | Dell EMC Services.
