Jayabindu Singh, Director of Engineering, Starbucks Coffee Company
Rewind a few years, to when storage capacity was limited and expensive: companies were forced to convert raw data sets into star schemas using ETL processes and store the results in an expensive data warehouse.
The good old days: Most companies had no choice but to make their best guess about which data attributes to keep, and at what level of detail, in their historical data sets. Once the data had been moved to the data warehouse, the raw data sets were discarded to minimize storage expenses and keep the systems in a healthy state. In the process, information was unintentionally lost.
Raw data vs. crime scene: Think of raw data as a crime scene. Law officers do their best to keep the area intact while they collect evidence from the scene, but eventually they need to release it and let things go. Then there comes a time, during the trial and investigation, when the officers realize they may have missed a critical piece of evidence but must live with what they have.
Similarly, in the analytics world, when you need to perform deep diagnostics, access to unaltered raw data sets brings tremendous value.
Traditional vs. leaders: Today, when storage has become cheap and capacity is no longer an issue, some companies still stick to the traditional way of managing their data without realizing the value of the gold they are discarding. The companies that have moved away from ETL processes and the old way of storing data sets, and have started using raw data sets to perform advanced analytics and make data-driven business decisions, are the ones with an edge in the market. These are the leaders who really understand their customers, their competitors and the future needs of the market, and who stay ahead of the curve.
Cloud to the rescue: With the availability of cloud compute and storage such as S3 and ADLS, and unified advanced analytics platforms such as Incorta and Databricks, it has become very simple to get to actionable insights in near real time, straight from the raw data sets, without the need for any pre-processing steps. Because you are operating on raw data sets, you have full flexibility to drill down to any level of detail and to slice and dice the data any way you want.
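To make the drill-down point concrete, here is a minimal Python sketch. The sales records, field names and the `total_by` helper are all invented for illustration; they do not come from any real system or from the platforms named above. It shows that a summary which kept only revenue per store cannot answer a question about hours or items, while the raw records can be regrouped along any attribute.

```python
from collections import defaultdict

# Hypothetical raw point-of-sale records (invented for illustration).
RAW_SALES = [
    {"store": "A", "hour": 8, "item": "latte", "amount": 4.50},
    {"store": "A", "hour": 8, "item": "mocha", "amount": 5.00},
    {"store": "A", "hour": 15, "item": "latte", "amount": 4.50},
    {"store": "B", "hour": 8, "item": "latte", "amount": 4.50},
]


def total_by(records, *dimensions):
    """Sum 'amount' grouped by any combination of raw attributes."""
    totals = defaultdict(float)
    for record in records:
        key = tuple(record[d] for d in dimensions)
        totals[key] += record["amount"]
    return dict(totals)


# What a pre-aggregated summary might have kept: revenue per store only.
by_store = total_by(RAW_SALES, "store")

# Questions nobody planned for stay answerable on the raw records.
by_store_hour = total_by(RAW_SALES, "store", "hour")
by_item = total_by(RAW_SALES, "item")
```

If only `by_store` had been retained, the hour and item attributes would be gone for good; with the raw records, any new grouping is a single call.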
Time to rethink your strategy: If you are a leader in the analytics space and are looking to bring next-generation advanced analytics capabilities to your organization, you should definitely review your strategy on raw data retention and accessibility. These raw data sets no longer need to stay in the source transactional systems as in the old days; they can now be moved to and stored in cloud storage seamlessly and very cost-effectively.
Raw data can take many forms: transactional data, CCTV footage, data from IoT sensors, social media feeds, weather data, local events and so on. In today's world of advanced analytics, the more data sets you have, the better your chances of solving a complex business problem. The availability of these data sets will define your future and clear your path from descriptive to prescriptive, actionable insights.