
Demystifying M.E.L.T. - the key data for business observability

20 Feb 2020

Article by New Relic senior director of customer solutions, APJ, Jill Macmurchy.

We’ve previously discussed the role of business observability in software development and the core components required to make it a reality. Observability involves gathering different types of data about all components within a system to establish the "Why?" rather than just the "What went wrong?". The acronym M.E.L.T. describes the four essential data types: metrics, events, logs and traces.

Metrics

Metrics are the starting point for observability. They’re an aggregated set of measurements grouped or collected at regular intervals. Most share several traits: a timestamp, a name, one or more numeric values and a count of how many events are represented. Metric examples include error rate, response time, or throughput.

Metrics are typically a compact, cost-effective way to store a lot of data. They’re also dimensional, which makes them quick to analyse and a great way to measure overall system health. Because of this, many tools have emerged to collect metrics, such as Prometheus and StatsD. However, metrics require decisions to be made ahead of time about how the data will be aggregated and analysed: some analyses can’t be performed after the fact unless all of the raw sample events are still available. To have complete observability, collecting and analysing metrics is a must.
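The traits described above can be sketched in code. Below, raw response-time samples from one collection interval are rolled up into a single metric record; the field names and schema are illustrative, not any particular vendor’s format. Note that once only the rollup is stored, anything not pre-computed (a 95th percentile, say) can no longer be derived, which is the trade-off discussed above.

```python
import statistics

# Hypothetical raw response-time measurements (ms) from one 60-second interval.
samples = [120, 95, 143, 101, 88, 130]

# Aggregate the samples into one compact metric record with the traits the
# article lists: a timestamp, a name, numeric values, and an event count.
metric = {
    "name": "http.response_time",      # what is being measured
    "timestamp": 1582156800,           # start of the interval (epoch seconds)
    "interval_s": 60,                  # collection interval
    "count": len(samples),             # how many events are represented
    "min": min(samples),
    "max": max(samples),
    "avg": statistics.mean(samples),
}

print(metric["count"], metric["avg"])
```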

Events

An event is a discrete action happening at any moment in time. Take a vending machine for instance. An event could be the moment when a user makes a purchase from the machine. There might be states that are derived from other events, such as a product becoming sold out after a purchase.

Events are a critical telemetry type for any observability solution. They’re valuable because they can be used to validate the occurrence of a particular action at a particular time, and they enable fine-grained analysis in real time. However, events are often overlooked or confused with logs. What’s the difference? Events sit at a higher level of abstraction than the detail provided by logs: logs record everything, whereas events are records of selected significant things.

Adding metadata to events makes them much more powerful. With the vending machine example, we could add additional attributes such as "ItemCategory" and "PaymentType". This allows questions to be asked, such as "How much money was made from each item category?" or "What is the most common payment type used?".
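To make this concrete, here is a minimal sketch of answering both of those questions over some hypothetical vending-machine purchase events. The event records and their attribute names follow the example above; the amounts are invented.

```python
from collections import Counter

# Hypothetical purchase events, each enriched with metadata attributes.
events = [
    {"eventType": "Purchase", "amount": 2.50,
     "ItemCategory": "Snack", "PaymentType": "Card"},
    {"eventType": "Purchase", "amount": 1.75,
     "ItemCategory": "Drink", "PaymentType": "Cash"},
    {"eventType": "Purchase", "amount": 3.00,
     "ItemCategory": "Snack", "PaymentType": "Card"},
]

# "How much money was made from each item category?"
revenue = {}
for e in events:
    revenue[e["ItemCategory"]] = revenue.get(e["ItemCategory"], 0) + e["amount"]

# "What is the most common payment type used?"
most_common = Counter(e["PaymentType"] for e in events).most_common(1)[0][0]

print(revenue)       # {'Snack': 5.5, 'Drink': 1.75}
print(most_common)   # Card
```

Because each event carries its own attributes, new questions can be asked after the fact — the flexibility metrics give up when they pre-aggregate.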

The limitation of events is that each one takes compute to collect and process and storage to retain, so a high-volume event stream can quickly take up a lot of space in a database. Because of this, it’s necessary to be selective about which kinds of events are stored.

Logs

Logs are the original data type. They’re important when engineers are in deep debugging mode, trying to understand a problem and troubleshoot code. Logs provide high-fidelity data and detailed context around an event, so engineers can recreate what happened millisecond by millisecond.

Logs are particularly valuable for troubleshooting things such as databases, caches and load balancers, as well as older proprietary systems that aren’t friendly to in-process instrumentation.

Log data is sometimes unstructured, which makes it hard to parse in a systematic way; when log data is structured, it’s much easier to search and to derive events or metrics from it. There are tools that reduce the toil of collecting, filtering, and exporting logs, such as Fluentd, Fluent Bit, Logstash, and AWS CloudWatch.
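The difference between the two is easy to see in code. In this sketch (the log lines and field names are invented), the unstructured line needs a hand-written, brittle regex, while the structured JSON line parses directly into searchable fields from which events or metrics could be derived.

```python
import json
import re

# An unstructured log line requires a brittle, hand-written pattern to parse.
unstructured = "2020-02-20 10:00:00 ERROR payment-service timeout after 3000ms"
match = re.match(r"^(\S+ \S+) (\w+) (\S+) (.*)$", unstructured)
timestamp, level, service, message = match.groups()

# A structured (JSON) log line parses directly into named, queryable fields.
structured = ('{"timestamp": "2020-02-20T10:00:00Z", "level": "ERROR", '
              '"service": "payment-service", "duration_ms": 3000}')
record = json.loads(structured)

print(level, record["duration_ms"])  # ERROR 3000
```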

Traces

Traces are samples of causal chains of events, and trace data is needed to determine the relationships between different entities. Traces are very valuable for highlighting inefficiencies, bottlenecks and roadblocks in the customer journey, as they can be used to show the end-to-end latency of individual calls in a distributed architecture.

Applications often call multiple other applications depending on the task they’re trying to accomplish, and often process data in parallel. This means the call-chain can be inconsistent and have unreliable timing. Passing a trace context between each service is the only way to ensure a consistent call-chain, and uniquely identify a single transaction through the entire chain.

W3C Trace Context is set to become the standard for propagating trace context across process boundaries. It makes distributed tracing easier to implement and more reliable, and therefore more valuable for developers working with highly distributed applications. Recently, the specification reached "Recommendation" status.
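In practice, W3C Trace Context propagates this information in a "traceparent" HTTP header with the shape version-traceid-parentid-traceflags. The sketch below parses the example header value from the specification; the trace ID ties every hop of a transaction together, while each service substitutes its own span ID as the parent ID for its downstream calls.

```python
# Example "traceparent" header value from the W3C Trace Context specification.
header = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"

version, trace_id, parent_id, trace_flags = header.split("-")

# The trace ID (16 bytes, hex) uniquely identifies the whole transaction;
# the parent ID (8 bytes, hex) identifies the calling span.
assert len(trace_id) == 32
assert len(parent_id) == 16

# The low bit of the flags indicates whether the trace was sampled.
sampled = (int(trace_flags, 16) & 0x01) == 1

print(trace_id, sampled)  # 4bf92f3577b34da6a3ce929d0e0e4736 True
```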

Whatever stage of observability an organisation is at, understanding the use cases for each M.E.L.T. data type is a critical part of building an observability practice. It allows better sense to be made of data and relationships, enabling issues to be resolved more quickly and prevented from reoccurring. The result is greater reliability and performance, the key objectives of observability, along with improved operational agility and, ultimately, a better customer experience.
