Unstructured data: Making sense of the mass and the mess
Wed, 24th Jun 2020

Enormous data growth is coming: market intelligence firm IDC predicts data will increase 61% to 175 zettabytes by 2025. Organisations today are amassing vast amounts of data, from customer information to the burgeoning mountain of sensor and machine data generated by the adoption of the Internet of Things.

While a lot has been said about the potential of data, unstructured data accounts for 80 to 90% of the overall digital data footprint. This unstructured data, which exists in forms such as images, search data, videos and sensor data, is largely unsearchable and difficult to analyse, posing a significant challenge for businesses.

What's more, much of this data sits in complex infrastructure and silos, such as data lakes, data warehouses, storage area networks and backup systems. At one point in time, these silos served a purpose. Now they create massive headaches both for IT teams, who have to manage these complex systems, and for end users, who find it harder to pull information from across the organisation and extract insights.

The good news is that organisations at the cutting edge of their industries have grasped that what was once "cold" or "dark" data has an increasingly important role to play in helping them maintain their market-leading positions. Examples of these workloads can be found in areas such as genomics, semiconductor design, data analytics, media and entertainment, oil and gas exploration and computer-aided engineering (CAE), among others. Clearly, the ability to gain useful insights from unstructured data is crucial to building a competitive advantage for organisations across a variety of sectors.

Producing accurate analytics and training data-hungry artificial intelligence and machine learning algorithms demands an intensity, speed and volume of data access that can cripple legacy storage systems and media. Finding needles in the haystacks of unstructured data therefore requires a long, hard look at the existing storage foundation.

A modern data experience requires infrastructure that can bridge silos and meet demands for performance, agility and simplicity, without complexity or compromise. This calls for a data hub approach.

At the core of the data hub is a data-centric architecture designed for the very purpose of sharing data across the four main silo groups: data warehouses, data lakes, streaming analytics and AI clusters. This approach integrates their strengths on a single, unified platform, eliminating bottlenecks for the applications that need data to deliver better insight. Additionally, data hubs are simple and elastic, allowing applications to spin resources up and down as needed.
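To make the idea concrete, here is a minimal sketch of the pattern: two different engines read the same copy of a dataset in place, rather than exporting it from one silo and importing it into another. The mount point, file name and column name are hypothetical placeholders, not a reference to any specific product configuration.

```python
# Illustrative only: two engines reading one shared copy of the data.
# The path, file name and "symbol" column are hypothetical placeholders.
import pandas as pd
from pyspark.sql import SparkSession

SHARED_DATASET = "/mnt/datahub/trades/2020-06.parquet"

# A data-science job loads the data set in place with pandas...
sample = pd.read_parquet(SHARED_DATASET)
print(sample.head())

# ...while a Spark cluster runs warehouse-style queries against the very same
# files, with no export/import step between silos.
spark = SparkSession.builder.appName("datahub-sketch").getOrCreate()
trades = spark.read.parquet(SHARED_DATASET)
trades.groupBy("symbol").count().show()
```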

Pure customer Liquidnet built a data hub based on FlashBlade and FlashArray to perform timely trades and to generate the real-time analytics its institutional clients rely on. Delivering real-time analytics was previously not feasible because legacy storage systems lacked the I/O and parallelism needed to move very large data sets fast enough. FlashBlade provided the vital capabilities: processing large volumes of streaming data in real time and making data sets available simultaneously to multiple applications such as Elasticsearch, Spark and Kafka.
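As an illustration of that pattern (not Liquidnet's actual pipeline), the sketch below uses Spark Structured Streaming to consume a hypothetical Kafka topic of trade events, count them per minute, and land the results on shared storage where other tools, such as an Elasticsearch indexing job, can read the same files. Broker address, topic name and paths are assumed placeholders.

```python
# Illustrative only: a Spark Structured Streaming job that consumes trade events
# from Kafka and writes per-minute counts to shared storage. Requires the
# spark-sql-kafka package (e.g. --packages org.apache.spark:spark-sql-kafka-0-10_2.12:<version>).
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, window

spark = SparkSession.builder.appName("realtime-trade-analytics").getOrCreate()

# Hypothetical Kafka broker and topic.
trades = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "trades")
          .load())

# Count events per one-minute window; the watermark lets the file sink emit
# finalised windows in append mode.
counts = (trades
          .withWatermark("timestamp", "2 minutes")
          .groupBy(window(col("timestamp"), "1 minute"))
          .count())

# Write results as Parquet on the shared data hub; downstream consumers
# (for example, an Elasticsearch indexing job) read the same files with no copy step.
query = (counts.writeStream
         .outputMode("append")
         .format("parquet")
         .option("path", "/mnt/datahub/trade_counts")        # hypothetical path
         .option("checkpointLocation", "/mnt/datahub/_chk")  # hypothetical path
         .start())

query.awaitTermination()
```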

The tremendous growth of unstructured data is creating huge opportunities for organisations. Until recently, the ability to capitalise on unstructured data was restricted by the limitations of legacy storage systems. Fortunately, with a modern data hub architecture built on next-generation storage infrastructure, such as Pure FlashBlade, organisations can extract real-time insights at the speed of business while enjoying cloud scalability and operational simplicity.

If you would like to learn more about how FlashBlade can help your organisation, visit purestorage.com/flashblade.