No matter where you are on your journey to leveraging Big Data, you have challenges to overcome. My team and I have been doing this for a while now. While I’m happy to report we’ve solved many traditional data problems, new ones are always popping up.
If you’ve just decided to build a Data Lake / go big data / whatever you want to call it, or you’re just a little further along in your journey, here are some of the challenges we faced early on and how leveraging Big Data solved them.
Problem: Can’t get access to the data
The data doesn’t exist in a form that IT can hand to you, because IT is used to doing a ton of work on the data and then pushing it to you via BI tools. Get in line for an IT budget request.
Resolution: I still have to request access to a data set, but it’s not limited to what IT has put into BI tools. I can help myself to the complete data set.
Problem: Can’t get the data at the level I need
The data is just too big. Seriously, did someone just say that?
Resolution: Once the data is available you can view it at all levels and create new levels if you want.
Problem: Data can’t process because it’s too large
Processing the data would take days, or you would have to cut it into smaller chunks and then merge it back together.
Resolution: Processing runs in seconds, sometimes a minute or two. That’s game-changing for decision support.
Problem: Can’t merge the data with other IT-maintained data
You ended up using Shadow IT since IT didn’t have a way for you to do this. If you’ve been around a while, you’re lying if you say you haven’t done some Shadow IT work, using a server in a lab or, even worse, some desktop.
Resolution: All IT sourced data is available to me. Of course I have to ask for access first.
Problem: Can’t merge the data with outside data
Similar to the last one, but with a different twist: we often need to blend our data with 3rd party data or industry data.
Resolution: I can dump / feed my 3rd party data into my Analytical Sandbox to merge with the IT-sourced data instead of using a Shadow IT solution.
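To make that concrete, here is a minimal sketch of blending a 3rd party feed with IT-sourced data. Everything in it is an assumption for illustration: an in-memory SQLite database stands in for the Analytical Sandbox, and the table names (`it_sales`, `vendor_benchmark`), columns, and numbers are made up. Your platform will have its own SQL dialect and loading tools.

```python
import sqlite3


def blend_with_third_party(it_rows, vendor_rows):
    """Join IT-sourced rows with 3rd party rows by region (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")  # stand-in for the Analytical Sandbox
    conn.execute("CREATE TABLE it_sales (region TEXT, revenue REAL)")
    conn.execute(
        "CREATE TABLE vendor_benchmark (region TEXT, industry_revenue REAL)"
    )
    conn.executemany("INSERT INTO it_sales VALUES (?, ?)", it_rows)
    conn.executemany("INSERT INTO vendor_benchmark VALUES (?, ?)", vendor_rows)

    # LEFT JOIN keeps every IT row even when the vendor has no matching region.
    rows = conn.execute(
        """
        SELECT s.region, s.revenue, b.industry_revenue
        FROM it_sales AS s
        LEFT JOIN vendor_benchmark AS b ON b.region = s.region
        ORDER BY s.region
        """
    ).fetchall()
    conn.close()
    return rows


# Made-up sample data: the vendor feed only covers one of our two regions.
blended = blend_with_third_party(
    [("East", 120.0), ("West", 80.0)],
    [("East", 1000.0)],
)
```

The left join is a deliberate choice here: gaps in the outside data show up as missing values next to your own rows instead of silently dropping them.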
Problem: Can’t add in business rules to the data
Getting to the data via BI tools was the only way, and the tools were very limited. Many times we just used the BI tool for Extract, Transform, Load (ETL) and dumped that output into a Shadow IT table for the next step.
Resolution: At a database level it’s just a few lines of code and a new column with our data shows up.
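As a rough illustration of what “a few lines of code” can look like, the sketch below adds a business-rule column to a table. It is not our actual setup: SQLite stands in for the database, and the `orders` table, the `size_band` column, and the thresholds are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the real database
conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 50.0), (2, 750.0), (3, 5000.0)],  # made-up sample rows
)

# Hypothetical business rule: band each order by its amount.
conn.execute("ALTER TABLE orders ADD COLUMN size_band TEXT")
conn.execute(
    """
    UPDATE orders
    SET size_band = CASE
        WHEN amount >= 1000 THEN 'large'
        WHEN amount >= 100 THEN 'medium'
        ELSE 'small'
    END
    """
)

banded = conn.execute(
    "SELECT order_id, size_band FROM orders ORDER BY order_id"
).fetchall()
conn.close()
```

Once the rule lives next to the data like this, nobody has to export to a side table just to apply it.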
Problem: Can’t process many months of data, never mind years
If you haven’t already pulled and aggregated all of history, it will take days just to put the data together.
Resolution: We can literally say, “Sit down, let’s take a look”. To be honest, we can’t make it look pretty that quickly but sometimes quick and dirty is good enough. Even on bigger projects getting the data is not the problem anymore.
Everything sounds great, right? But don’t get ahead of yourself. With these traditional data problems solved, “new” problems and questions come up:
Where is my return on investment?
Let’s face it, it wasn’t cheap to get here, so you had better have some ideas on how you’re going to create value. And let me say this: you saving days on pulling data is not the return they’re looking for. The return is the new insights and the changes in business workflow you are about to make.
How do I ask the right questions before attacking a problem so I don’t create cool reports with no impact?
This was one of the hardest ones for me. We made a ton of mistakes turning on the new system as we didn’t know what was possible. Don’t boil the ocean on your first projects or they’ll fail. But don’t just focus on the BI benefit of speed, because no one cares. Well, they’ll care if you plan to cut your BI staff. Otherwise no one cares. Most importantly, don’t give up!
How do I get Shadow IT teams or other BI users on board with a new way of thinking?
I’ve said this before: they can be your best friends or your worst foes. Get them engaged and feeling like part of the team. They usually have tons of business knowledge and are often data SMEs in many areas. Initially they will view you as a threat looking to put the IT handcuffs on them. Change is hard and not everyone will get there. Embrace the ones that are willing to try.
How do I share this information, securely?
You now have some serious data out there and it needs serious rules about access to it. You must be able to grant access to the right people and make sure they don’t share it with unauthorized people.
How do I feed this insight into applications to impact workflow?
If you have access to the data and are able to model the data well you’re almost there. Data visualization is cool but not good enough. I want to be able to feed this insight into our workflow applications to impact the way we work in real time. Sorry folks, I don’t have a silver bullet for this one. But making sure your projects are clearly outlined with who is going to use your data and how they’ll use it can help. For example, you plan to feed your Data Science model data into an application which enables Sales and/or Service. This is when you are truly killing it, impacting workflow via Data Science! You don’t need a fancy chart, just the right information in front of the person making a decision when they need it.
Having these new problems is a sign of progress, and that makes me happy. Many of our historical problems are gone, and as we break new ground we will always run into new obstacles. I’m curious if anyone else has had a similar experience or other challenges I may not have listed.