In the first post in this two-part series, I summarized the journey to Hadoop at Dell SecureWorks. In this follow-on post, I will wrap up the story of our Hadoop journey with a look at five key lessons that we learned. Keep these important best practices in mind should you embark on your own Hadoop journey.
Lesson 1: Start with the end in mind
Our Hadoop journey began with this goal: getting all data flowing into our proprietary system to also flow into our Hadoop cluster. We decided to delay figuring out how to use the data until after we were finished with the changes needed to bring it in. Initially this seemed like the right decision, but as time went on we started to lose support from the business since they didn’t understand the value of what we were doing. Without the end in mind we struggled to help them understand. It would have been better if we had approached the problem with specific use cases in mind and made certain that our efforts were aligned with those use cases. Additionally, this would have made it easier to answer questions about how the data should be organized.
Lesson 2: Hadoop isn’t the only technology required to use big data
When dealing with big data you need to be prepared to deal with problems that extend beyond the realm of Hadoop. Many people naively assume that a big data technology will solve all those problems for them. If you are like us, you will quickly learn that Hadoop is only one of the many technologies available and involved in using big data. To use big data you need a way to get data into Hadoop, and you need a way to get it back out. In our case, we required fatter Internet pipes, upgraded routers, upgraded switches, upgraded firewalls, additional load balancers, and more. The key point here is that when you are planning your big data journey, make sure you have a view of the end-to-end picture.
Lesson 3: Big data really is big
It is easy to forget that big data really is big. Many people assume that because they are using a big data technology, they can expect OLTP (online transaction processing) performance from every query. Technologies like Impala provide fast response times when scanning a subset of your data (basically whatever fits in memory), but if you need to scan more than fits in memory, the laws of physics kick in. You should expect queries that scan all of your data to take minutes, hours, or even days to complete.
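The point about full scans can be made concrete with some rough arithmetic. The sketch below is illustrative only: the cluster size, disk count, per-disk throughput, and data volume are assumed numbers, not figures from the Dell SecureWorks environment, and the function name is hypothetical. It estimates a lower bound on the time needed to read every byte once, ignoring CPU, network, and scheduling overhead.

```python
def full_scan_seconds(dataset_bytes, nodes, disks_per_node, disk_mb_per_sec):
    """Lower-bound time, in seconds, to read every byte of the dataset once.

    Assumes data is spread evenly and all disks read in parallel at their
    full sequential rate, which real clusters rarely achieve.
    """
    aggregate_bytes_per_sec = nodes * disks_per_node * disk_mb_per_sec * 1_000_000
    return dataset_bytes / aggregate_bytes_per_sec


PETABYTE = 10**15

# Assumed example cluster: 20 nodes, 12 disks each, 100 MB/s per disk.
seconds = full_scan_seconds(
    dataset_bytes=1 * PETABYTE,
    nodes=20,
    disks_per_node=12,
    disk_mb_per_sec=100,
)
print(f"{seconds / 3600:.1f} hours")  # prints "11.6 hours"
```

Even under these optimistic assumptions, a single pass over one petabyte takes most of a working day, so no query engine can return a full-scan answer in OLTP-style response times.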
Lesson 4: Be prepared to throw out your old playbooks
Many of the plays that we know and love simply don’t apply when dealing with big data. Be prepared to rethink how you do backups, how you handle disaster recovery, and how you handle data center moves. Again, identify the use case and prepare a view of the end-to-end solution.
Lesson 5: Change is good
Hadoop has been rapidly changing since its inception. The changes have been good for Hadoop. Don’t fight the change. Use your Hadoop representatives to stay abreast of where the technology is going and be prepared to adapt your environment to maximize your Hadoop investment.
Ultimately, the need to process data at scale is an increasingly common problem for enterprises. With the rise of the Internet of Things (IoT), HPC, and cloud computing, big data is now everywhere. If you are not currently forced to deal with big data, rest assured you will be before long. And when that day comes, you will definitely want to explore the use of the Hadoop platform.
To learn more about the Hadoop environment at Dell SecureWorks, read the case study “Helping customers stay secure.”
Contact us for more information or with any follow-up questions: Hadoop@Dell.com.
Jim Birmingham is director of engineering at Dell SecureWorks. He is responsible for event processing, big data analytics, cloud security, health monitoring, machine learning systems, and data science.
©2016 Dell Inc. All rights reserved. Dell, the DELL logo, the DELL badge and PowerEdge are trademarks of Dell Inc. Other trademarks and trade names may be used in this document to refer to either the entities claiming the marks and names or their products. Dell disclaims proprietary interest in the marks and names of others.
Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.
Source: http://www.cio.com/article/3100670/data-center/five-important-lessons-learned-from-a-journey-to-hadoop.html#tk.rss_all