Big data deployment: It’s never “one size fits all”
by George Dearing
We've gone data crazy. Data lakes, dark data, data scientists — anything that helps us conjure up insights from our information is, well, big. But there are a lot of organizational challenges you need to consider — from skill sets and tool sets to budgets and configurations. For this discussion, let’s look at some big data deployment options and what’s driving choices for enterprises.
The shift is on from data lakes to real-time platforms
We’ve talked about data lakes and the benefits of unifying data. Data hubs, as they’re sometimes called, have gained traction as more companies build proofs of concept (POCs) and small deployments with Hadoop, an open-source framework for distributed processing across clusters of computers.
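To make Hadoop’s processing model concrete, here is the canonical word-count example in the MapReduce style. This is an illustrative sketch only: it simulates the map and reduce steps in plain Python on a single machine, whereas a real Hadoop job would distribute the map tasks across the cluster, shuffle the intermediate pairs, and run the reducers in parallel.

```python
# Word count in the MapReduce style Hadoop popularized.
# Single-machine sketch for illustration -- no cluster required.
from collections import defaultdict

def map_phase(lines):
    """The 'map' step: emit a (word, 1) pair for every word seen."""
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)

def reduce_phase(pairs):
    """The 'reduce' step: sum the counts for each word.

    On a real cluster, a shuffle/sort groups pairs by key first,
    so each reducer only sees one key's values.
    """
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

if __name__ == "__main__":
    data = ["big data big insights", "data lakes"]
    print(reduce_phase(map_phase(data)))
```

The appeal for POCs is that the same two functions scale out unchanged: the framework, not the application code, handles distribution across commodity servers.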
On-premises vs. the public cloud
Cloud computing has upended every corner of IT — and it’s doing the same to big data deployment approaches. No longer can organizations assume that infrastructure- or platform-as-a-service are the best options. More discrete choices are emerging rapidly, like big data-as-a-service, Hadoop-as-a-service, or even database-as-a-service. These options are helping businesses avoid a one-size-fits-all strategy. Of course, that means governance and security models also need to be re-evaluated, especially if hybrid and private clouds are being mixed together.
Analytics is the other piece being moved to the cloud, and a good candidate to make the move first: many niche vendors offer data mining apps that use the cloud to process huge amounts of data, taking the load off your IT department.
Open source vs. proprietary software
Big data deployment options should give you flexibility and shouldn’t box you into a corner with proprietary tools or technologies. Combining different pieces of the technology stack can provide a better way to find the right design for an enterprise system. That’s where the commodity vs. purpose-built argument arises, and where most open source discussions start. Apache’s Hadoop, MongoDB, Cassandra and a few others are usually the starting points for core capabilities, with more sophisticated features coming later from add-on vendors or applications.
That’s not to say big companies aren’t deploying everything solely on open source, but it’s easier to mitigate risk when your big data projects are smaller and have fewer users. As projects grow, that’s when you get the full benefit of partnering with a trusted vendor. There are plenty of open source integrators that can help with your goals beyond a POC. It’s not so different from the ecosystem that sprang up after the rise of enterprise content management (ECM).
Commodity vs. purpose-built infrastructures
Most hardware vendors sell both commodity and purpose-built systems for big data. The price premium comes into play when you procure purpose-built, off-the-shelf systems, which include integration work and built-in support.
While Hadoop and higher-end databases do provide increased scalability with large numbers of commodity servers handling distributed workloads, it’s still important to focus on your real business needs and big data use cases. Certain applications may need more processing power, more memory or SSD storage. Hardware profiles can also vary, with some requiring more administration and maintenance than others. And again, the complexity of what you are trying to achieve should drive a lot of the decision-making. If a ready-made appliance delivers pre-built integration and more advanced features, your total cost of ownership will likely decrease over time as your skill sets grow.
Final thoughts on big data deployment
Spend time looking at all your options, and talk to other companies that have built big data prototypes. Find out how they mixed and matched technologies and platforms to create value. You’ll find there’s no set playbook for any of it.