
MOVE OVER SMALL DATA, HERE COMES BIG DATA

Blog Post created by rajeshnair.kc Employee on Jan 2, 2013

This is the first in a series of posts on data and data management, ultimately tying into Archer and GRC. But before we dive into Archer and GRC, I am going to talk about data management, because fundamentally data is where it all begins. Right? And what better topic to start with than something that is trending red hot on the data meter: "Big Data". Besides, we all have a handle on "small" data, right?

 

In the movie “BIG”, the character played by Tom Hanks literally grows big overnight. This overnight transformation posed immediate problems – he couldn't wear the same outfits anymore, he couldn't use his “boy” bed anymore, his normal mode of transportation no longer fit him, and so on. At the same time, he slowly began to see and use the advantages of being BIG.

 

We can certainly draw on this to shed light on some common concerns about Big Data.

 

1) You don’t wake up to “Oh my God, where did all this data come from?” Well, hopefully, you don't. Most organizations don’t get a large shipment of data dumped in their backyard one day in one big visible heap. In almost all organizations, data has been flowing in over the years; it has been ingested, cleansed, analyzed, filtered, processed, published and archived. Until a few years ago, most of this was data from sources that organizations knew they needed to draw information from. Also until a few years ago, you had a data “funnel” – lots of data being ingested, but after you analyzed it, you only processed and persisted a small percentage of it. That said, the variety of data and the rate at which it flows in have both picked up in the last few years.

 

2) Do I have a Big Data problem? I have heard this question posed over and over again. Data is your “opportunity”, not your “problem”. The real question is what business problem you have, or what opportunity you can now create, by harnessing “Big Data”.

 

3) What do I do with my old (small) data? Absolutely continue to use it the way you have, because your business still runs on it. Big Data doesn't mean that you have to completely rethink and rehash everything you have, as we will soon see.

 

That’s fine and dandy, you say, but can I identify “Big Data” if I need to solve a business problem? Aaah, now let’s talk! This is a very valid question. Fundamentally, you need to first identify your need and then go into “discovery mode” to find the data that satisfies your requirements. So how do you discover “your” Big Data? We could start with a definition, but we will leave technical definitions of Big Data aside for now, as many pundits have already defined it for the general use case. We will "characterize" Big Data shortly. As I mentioned, set presumptions aside and focus on identifying the data you need:

  1. First, don’t search for Big Data. When you get there, it will be staring at you. OK, let’s not get sidetracked. Start by listing all the data sources that you believe will collectively give you the data you need to solve your business problem or create your business opportunity. The key here is identifying "the data you need", not identifying systems in your organization. This is a very important change in mindset. Why? Because traditionally, whether consciously or subconsciously, many look at what data is available within the organization as opposed to what data is needed.
  2. Even with that change in mindset, chances are you will find that a lot of the data is already “groomed” and “usable” in systems you have today – transactional systems, data warehouses, data marts, etc. You may need to draw more data from these sources than you did before – that’s OK. The more your organization's information systems can be leveraged, the better.
  3. So far so good - you are happy; you haven't identified anything that can't be handled by the organization's data sources. But then you start thinking about other pieces of data you really need to achieve your goal:

Maybe you have a popular web retail front end, and one of your objectives is to improve your understanding of your customers’ “click-through” on your website. This can give you all sorts of insights to improve, say, purchasing likelihood, “website stickiness”, etc. So you want to capture and analyze clickstreams, starting from the search page on which a visitor found your link. You want to look at the hit date and time, download time, user agent, search keywords, etc. You are now thinking about where and how to ingest and store this data, pre-process it and build a predictive model before loading certain information into the warehouse. And you want to keep all that data so you can mine it over time.

OR

 

Maybe you are in the energy business and want to take the lead on smart meters by collecting meter data on an hourly basis. No, wait – you want to leapfrog the competition by building a solution that can take in readings from 10 million meters at 10-minute intervals. That’s 60 million readings per hour, or 1.44 billion readings per day.
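To make that arithmetic concrete, here is a quick back-of-the-envelope sketch. The function names are made up for illustration, and it assumes every meter reports exactly once per interval and that the interval divides evenly into an hour.

```python
# Back-of-the-envelope volume estimate for the smart meter scenario.
# Assumption: each meter sends exactly one reading per interval, and the
# interval divides evenly into 60 minutes. Names here are hypothetical.

MINUTES_PER_HOUR = 60
HOURS_PER_DAY = 24


def readings_per_hour(meters: int, interval_minutes: int) -> int:
    """Total readings arriving per hour across all meters."""
    readings_per_meter = MINUTES_PER_HOUR // interval_minutes
    return meters * readings_per_meter


def readings_per_day(meters: int, interval_minutes: int) -> int:
    """Total readings arriving per day across all meters."""
    return readings_per_hour(meters, interval_minutes) * HOURS_PER_DAY


if __name__ == "__main__":
    meters = 10_000_000  # 10 million meters
    interval = 10        # one reading every 10 minutes
    print(f"{readings_per_hour(meters, interval):,} readings per hour")
    print(f"{readings_per_day(meters, interval):,} readings per day")
```

Changing the interval or the meter count shows how quickly the volume moves: hourly collection from the same 10 million meters would be 240 million readings per day, one sixth of the 10-minute figure.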

OR

 

Maybe you are the head of the enterprise IT security team, on a mission to minimize threats. You want to take enterprise IT security to the next level by analyzing traffic and data flow from all systems in the enterprise and detecting patterns that are indicative of a threat. That’s right – traffic from all systems in the enterprise, with maybe a daily report of findings across all systems.

 

If this type of data is raising your eyebrows, then, well, congratulations, you now have Big Data staring at you!

 

All of the above have at least two characteristics in common – a volume of data on a scale that you have not handled before, and a very high rate or frequency at which the data arrives. There are other facets to Big Data, which I will get into later.

 

In my next few posts, I will explore in more detail the characteristics of Big Data and delve into technologies that can help you leverage Big Data for your business.

Stay tuned.

 

Raj Nair

Senior Product Manager, RSA Archer
