7/8/11

Big Data, Hadoop & More: InfoSphere BigInsights and Streams



Over the last couple of years, as a consultant in enterprise architecture and information management, I have seen an increasing number of clients start to outgrow the capabilities of traditional relational database systems. At the same time, we see an increasing number of systems in the NoSQL (Not Only SQL) category. Apache Hadoop is a common platform for processing large, often unstructured data sets in a distributed fashion. This year, IBM came out with its own Hadoop-based product: InfoSphere BigInsights. BigInsights goes beyond the standard open source Apache distribution by extending it with easy-to-use tooling (my favorite part, since it makes for a more productive Hadoop experience) and a set of extensions that improve Hadoop integration in enterprise settings. While Hadoop addresses one critical requirement, dealing with volume, its architecture does not handle another important requirement: velocity. When dealing with data in motion, we want to process streams of data in (near) real time. So Hadoop needs a real-time complement. InfoSphere Streams is such a solution.

 

The integration and use of these two technologies is the topic of the presentation I will be giving at the NoSQL Now! conference in Santa Clara. It will be a pleasure to deliver this talk together with Steve Brodsky, a Distinguished Engineer and Technical Executive for IBM Big Data initiatives at the IBM Silicon Valley Laboratory. We have already started working on the presentation, and we will also have some customer case studies to share. Stay tuned for the download link, which will follow after the conference!