Big data analytics is all the rage these days, and Hadoop is becoming the big data platform of choice, with new tools constantly emerging to help you gain valuable insight from your data. But the fact remains that most organizations still struggle to integrate Hadoop into their traditional environments and to ensure it complements their existing enterprise data warehouse (EDW) strategy.
When planning your big data analytics strategy, be mindful not to put the cart before the horse. To put things into perspective, here is a quick overview of how Hadoop works as an analytics platform.
Hadoop as an Analytics Platform
Hadoop provides a near-complete ecosystem where you can store data, run batch and ETL-type processing, and perform analytics, scaling out the cluster as your processing needs grow. At its core, it is a processing platform where you store data in HDFS and process it using MapReduce. On top of that core sit higher-level tools: Hive for SQL-like queries, Pig for scripted data transformations, and Mahout for machine learning.
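To make the MapReduce model a little more concrete, here is a minimal sketch of its three phases (map, shuffle, reduce) in plain Python. It runs in memory in a single process and does not use any Hadoop API; the function names and the sample sales records are hypothetical and exist only for illustration. On a real cluster, the input would live in HDFS and each phase would run in parallel across many nodes.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every input record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle_phase(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Collapse each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def run_job(records, mapper, reducer):
    """Chain the three phases together, in memory, on one machine."""
    return reduce_phase(shuffle_phase(map_phase(records, mapper)), reducer)


# Hypothetical job: total sales amount per region from "region,amount" lines.
def sales_mapper(line):
    region, amount = line.split(",")
    yield region.strip(), float(amount)


def sum_reducer(key, values):
    return sum(values)


# Made-up sample input standing in for a file stored in HDFS.
SAMPLE_LINES = ["east,100.0", "west,250.5", "east,50.0"]

totals = run_job(SAMPLE_LINES, sales_mapper, sum_reducer)
print(totals)
```

The mapper and reducer hold all of the business logic, while grouping by key is generic. That separation is what lets the framework spread the same job over as many machines as the data requires.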
By establishing Hadoop as an Enterprise Data Hub, where you can store and process all of your data in one place, you can run multiple transformation jobs and deliver information to multiple systems. The Enterprise Data Hub enables faster analytics and makes it possible to consolidate hardware and software onto the Hadoop platform.
Many existing business intelligence (BI) tools now offer connections to Hadoop, so you can keep using your current BI investments. The only thing that changes is the behind-the-scenes processing. Many BI tools offer the same kind of functionality as Excel and Access, but with Hadoop doing the processing you don’t have to worry about data volume. MetaScale customers find that their business users love being able to use the Excel-style functionality they are used to while running analytics on files that have 10 million rows.
Big data analytics has the ability to transform your business. But the infrastructure required to achieve this, the Hadoop platform, is unlike anything you are used to working with in a traditional EDW environment. Keep in mind that you need a clear Hadoop implementation strategy before you can fully realize the potential of your big data, and don’t be afraid to ask for help.