I am excited to deliver the keynote talk at the IDUG 2014 Conference in Phoenix.
While I have addressed the topic of polyglot persistence before, this talk focuses on DB2 practitioners. I have been using DB2 since the mid-1990s; it is a trusted database of choice for many enterprises. In this talk I will focus on the path a DB2 practitioner should take to become familiar with alternatives to the relational model. For some of these capabilities we will have to reach out to other systems, and for some we can use the non-relational features that DB2 has been introducing into its portfolio. In any case, keep an open mind and make the best of the new data storage and processing possibilities!
Check out the details here. I am looking forward to seeing you in Phoenix!
I will be sharing a practical approach to selecting and combining NoSQL technologies for optimal impact. Then we will discuss the trade-offs we may accept to achieve a solid technical solution while optimizing the overall benefit of multi-database systems. We will also discuss the often neglected aspects of practical NoSQL data governance and how to introduce it in an organization.
Check here for more details.
But what would it be like to teach kids to program and tackle some of the exciting new technologies? Can teens embrace Hadoop? Can they feel the excitement of a 64-node cluster running on the Amazon EC2 cloud? Can they get excited about failover and the CAP theorem?
The answer is a resounding "yes". A twelve-year-old can take enthusiastic steps into these technologies. With a bit of care and thoughtfulness in the approach, young students can develop a passion for these and other computer science topics. At that age, I had a slide rule; the kids today can run a cluster in the cloud. We should be happy about it.
I am thrilled to receive the ACM Supporting Teacher Award at Synopsys, the Silicon Valley Science Fair. ACM and its publications were a great factor in my development, and I am honored to receive this award today.
It is always good to be back in Austin! It is a great place for an event like the Enterprise Data World conference. This year at EDW, I will be giving a half-day tutorial on Hadoop and Big Data.
What is different about this tutorial is that it is designed for the seasoned data professional. Many of the tutorials and texts out there target programmers, but that is not the optimal approach for our audience.
Check out the tutorial description here. We will also cover Big Data governance and explore the least painful ways of bringing this technology into an organization.
As a consultant, I work with clients to help them become successful with new data and software technologies. For some of our clients, relational databases are not the optimal choice, and various NoSQL systems appear as reasonable alternatives.
At the Global Big Data Conference this year in Silicon Valley (Santa Clara), I will be giving a talk about our journey into polyglot persistence: the use of several types of data stores, each a good match for a particular part of an information processing problem.
As we started embracing Big Data and NoSQL across a number of projects, it quickly became clear that no single technology was going to be a solution for all of our needs. We begin by outlining the issues relational technology has with scalability and new data formats. We then illustrate the dominant NoSQL technologies with examples and show how they fit into the big picture.
We will show how to productively bring NoSQL systems into the enterprise, including classical reporting systems and integration strategies with relational systems. You will benefit from getting a clear picture of which type of NoSQL data store is a good match for each piece of the data processing puzzle.
- What is polyglot persistence?
- The relational database problems
- Taming big data with Hadoop and MapReduce
- Scalability with Key/Value and Columnar stores
- Flexibility of Document stores
- Finding connections with Graph databases
- Data Governance for NoSQL
- NoSQL and Master Data Management
- NoSQL integration strategies
More details and registration at the Global Big Data Conference site.
Big Data and Hadoop are now long-lasting buzzwords in the data processing community. Yet, few database practitioners understand what these technologies are, how to use them productively and how to integrate them into a conventional data processing landscape. It’s no wonder, as nearly all resources on these topics target software developers and not data professionals.
For this tutorial we use the IBM BigInsights Hadoop system and, besides exploring the common Hadoop features, delve into some of its unique enhancements.
Here is an overview of what we are going to talk about:
- What is Big Data? Surely you could not escape the Big Data buzzword, but do you know what Big Data really is? Is your data Big? How about medium data? Could you, and should you, apply Hadoop and its tooling to it? There are benefits even if your data is not huge!
- MapReduce algorithm. At the heart of Hadoop is MapReduce, an algorithm for processing large data sets in a parallel, distributed fashion on a cluster. Learn about this algorithm that brings scalability and fault tolerance to a variety of applications.
- Hadoop. Hadoop is the framework that implements the common parts of MapReduce. It provides the environment in which to run users' Big Data programs. It is fault tolerant, it scales, it is cost effective, and it can enable thousands of computers to jointly process data in parallel.
- Hive and Pig. While the Java APIs for Hadoop allow for a lot of flexibility, they are at a fairly low level. For data professionals, the productive way of approaching Hadoop is at a higher level: Hive allows a subset of SQL to be run over files stored in Hadoop’s Distributed File System (HDFS), while Pig is a data flow language. See the characteristics of both and their strengths and weaknesses.
- HBase. The database for Hadoop. Complementing traditional Hadoop processing, which falls into the category of batch processing, HBase is a database that provides online, real-time performance. It sits on top of the rest of the Hadoop infrastructure and is a distributed columnar database.
- Big SQL. Of course, the most productive approach for a data practitioner would be trusted SQL, but plain Hadoop does not offer it. IBM’s Big SQL extension to Hadoop provides SQL users with a familiar environment in which to become productive with Hadoop, and even lets them use the JDBC APIs. You will learn how to use Big SQL and quickly become productive with Big Data applications.
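To make the MapReduce idea from the outline above concrete, here is a minimal Python sketch of the classic word-count example. This is an illustration of the map, shuffle, and reduce phases only, not Hadoop code; in a real cluster the map and reduce tasks run on many machines and the shuffle moves data over the network.

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word, independently per
    # document, so this step can run in parallel across nodes.
    for doc in documents:
        for word in doc.split():
            yield word, 1

def shuffle(pairs):
    # Shuffle: group all emitted values by key. In Hadoop this
    # happens between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate each key's list of values into a result.
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data big cluster", "data flows"]
counts = reduce_phase(shuffle(map_phase(docs)))
# counts == {"big": 2, "data": 2, "cluster": 1, "flows": 1}
```

Because each map call touches only its own document and each reduce call only its own key, the framework can rerun a failed task on another node, which is where Hadoop's fault tolerance comes from.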
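The Hive versus Pig distinction in the outline is essentially declarative SQL versus step-by-step data flow. The sketch below contrasts the two styles in plain Python, using SQLite to stand in for Hive's SQL engine and a chain of transformations to mimic a Pig-style flow; the `clicks` table and its data are made up for illustration.

```python
import sqlite3
from itertools import groupby

rows = [("alice", 3), ("bob", 5), ("alice", 2)]

# Hive-style: declare WHAT you want in (a subset of) SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (user TEXT, n INTEGER)")
conn.executemany("INSERT INTO clicks VALUES (?, ?)", rows)
sql_result = dict(conn.execute(
    "SELECT user, SUM(n) FROM clicks GROUP BY user"))

# Pig-style: describe HOW the data flows, one step at a time.
ordered = sorted(rows)                            # order the tuples
grouped = groupby(ordered, key=lambda r: r[0])    # group by user
flow_result = {user: sum(n for _, n in g) for user, g in grouped}

assert sql_result == flow_result == {"alice": 5, "bob": 5}
```

Both arrive at the same answer; Hive suits analysts who think in queries, while Pig suits pipelines where each intermediate step matters.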
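To give a feel for the columnar data model behind HBase, here is a hypothetical in-memory sketch of its layout: a row key maps to column families, which map qualifiers to values. The `MiniColumnStore` class and the sample rows are inventions for illustration; real HBase also versions every cell by timestamp and distributes rows across region servers, both omitted here.

```python
from collections import defaultdict

class MiniColumnStore:
    """Toy model: row key -> column family -> qualifier -> value."""

    def __init__(self):
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, qualifier, value):
        # Writes address a single cell by (family, qualifier).
        self.rows[row_key][family][qualifier] = value

    def get(self, row_key, family=None):
        # Reads fetch one family, or the whole row if none is given.
        row = self.rows.get(row_key, {})
        if family is not None:
            return dict(row.get(family, {}))
        return {f: dict(cols) for f, cols in row.items()}

users = MiniColumnStore()
users.put("user#42", "profile", "name", "Alice")
users.put("user#42", "activity", "last_login", "2013-11-07")
assert users.get("user#42", "profile") == {"name": "Alice"}
```

Grouping columns into families like this is what lets a columnar store read only the data a query needs, which is part of how HBase delivers online performance on top of batch-oriented Hadoop storage.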
How about the labs? In the tutorial we will show, hands-on, how to start exploiting the benefits of Hadoop using the IBM BigInsights Hadoop distribution. We will use the QuickStart edition, in which you can begin exploring Hadoop in a virtual machine: just unpack and run. After the tutorial you will get instructions on how to obtain it and run the examples yourself.
I am looking forward to seeing you at the tutorial at the Information on Demand Conference, November 7th 2013. The tutorial is part of the Big Data and Analytics Tutorial Series. Register now here.