CFOtech Australia - Technology news for CFOs & financial decision-makers

Getting more GenAI projects past the pilot phase


Two years in, generative AI continues to be a hot topic for the enterprise market. One of the key reasons for this is that the technology itself remains fascinating. It's almost like computers learned an entirely new dimension and a new capability; like they could crawl before, but now they can run.

Yet, despite capturing the imagination of technologists and executives alike, a familiar story presents itself in enterprises the world over. In spite of their efforts, many organisations will start pilot projects but experience trouble moving beyond that.

A large share of GenAI projects stall at the proof-of-concept or prototype stage. They get up and running at a small scale but become stuck on the path to production.

The exact proportion of projects this affects is debated, but the consensus is that it's high: between 70% and 90% of all GenAI projects.

So, what's keeping enterprises from putting more—if not the majority—of their GenAI experiments into production? Our experience points to three potential factors at play. 

"Fuzzy" experiments

The first factor weighing on GenAI projects is how they started. A common thread among early adopters is that executives pushed the organisation to "have an AI story" to tell—to get some skin in the game. Depending on the size of the organisation, that was then passed to the CTO or innovation lead and then down to the technical teams to build something with GenAI. What they were to build was often unclear or left open to interpretation.

This is a familiar experience for organisations that embrace new technologies, especially disruptive ones. The approach is naturally a lot less structured than when dealing with a more mature, established technology because the best use cases are unclear and have not yet had time to emerge. Indeed, we saw this in how organisations started experimenting with Neo4j a decade ago versus today. 

However, the lack of upfront clarity around direction and use cases has meant that GenAI experiments frequently don't solve a high-priority or material problem. That becomes an issue when the organisation wants to see a clear return on investment for its GenAI efforts, making it harder to justify productionising the use case.

Tech stack flux

A second factor at play is that the GenAI tech stack and model landscape are volatile and seemingly in a constant state of flux. Every week, a new large language model (LLM) tops the public leaderboards that track their progress. Changes and new releases occur at such a pace - model behaviour can change daily - that it is hard to keep up, and experiments quickly become dated. Developers may have to update their own code a dozen times or more over the course of a single experiment, just to keep pace with the current model release.

In addition, the choice of models is dizzying. Outside of the 10 or so leading models, there is a long tail of dozens more. They often appear to behave similarly in small-scale, experimental contexts, but when put to use in real-world production scenarios, there can be a lot of variability in performance.

Between the sheer number of models and the pace of change in both the models and the underlying tech stack, it remains technically challenging to take something through the development lifecycle into real-world production - and that is why pilots may never progress.

The core DNA isn't deterministic

A further complicating factor for enterprises is the probabilistic nature of generative AI. Computers, since the von Neumann architecture, have been deterministic in nature; given the same input, they will produce the same output.

Generative AI is a departure from this. Its models are probabilistic, so the output of the models cannot always be precisely predicted and is likely to vary each time the model runs.

This introduces additional uncertainty for developers: even if the use cases are clear and the underlying tech stack stable and mature, the probabilistic nature of the models produces inconsistent outcomes that make it hard to create something that is production-ready.
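The contrast can be made concrete with a toy sketch. The function names, token list, and probabilities below are all illustrative assumptions, not a real model; the point is only that sampled output can vary between runs while classic computation cannot:

```python
import random

def deterministic(x):
    # Classic computing: the same input always yields the same output.
    return x * 2

def probabilistic_next_token(seed=None):
    # Toy stand-in for LLM decoding: the next token is *sampled* from a
    # probability distribution, so unseeded runs can differ.
    rng = random.Random(seed)
    tokens = ["run", "walk", "crawl"]
    weights = [0.6, 0.3, 0.1]  # hypothetical model probabilities
    return rng.choices(tokens, weights=weights, k=1)[0]

assert deterministic(21) == deterministic(21)  # always equal
# Two unseeded calls may disagree; pinning a seed restores repeatability:
assert probabilistic_next_token(seed=7) == probabilistic_next_token(seed=7)
```

In production, even pinned model versions and low temperatures only reduce this variability rather than eliminate it, which is what makes testing GenAI features harder than testing conventional code.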

GraphRAG has a role to play

One way to get more generative AI pilots to production is to solve some of the variability in the model outputs: providing more certainty that the model has the correct response, and making the response explainable. 

This is achieved with Retrieval-Augmented Generation (RAG), which lets LLMs retrieve information from trusted (most likely internal) sources. There are a couple of different types of RAG. Vector-only RAG reduces some of the variability in LLM output. GraphRAG - which marries knowledge graphs and GenAI - takes this a step further, helping the model give more accurate and complete answers and making the output explainable and traceable as well.
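The GraphRAG pattern can be sketched in miniature. This is a simplified illustration, not a real implementation: the triples, entity names, and helper functions are all invented for the example, and a production system would query an actual graph database rather than an in-memory list:

```python
# Toy in-memory "knowledge graph" of (subject, relation, object) triples.
GRAPH = [
    ("Acme Corp", "ACQUIRED", "Widget Ltd"),
    ("Widget Ltd", "LOCATED_IN", "Sydney"),
    ("Acme Corp", "CEO", "J. Smith"),
]

def retrieve_subgraph(entity):
    # Retrieval step: collect facts touching the query entity, then
    # expand one hop along its relationships.
    facts = [t for t in GRAPH if entity in (t[0], t[2])]
    neighbours = {t[2] if t[0] == entity else t[0] for t in facts}
    for n in neighbours:
        facts += [t for t in GRAPH if n in (t[0], t[2]) and t not in facts]
    return facts

def build_prompt(question, facts):
    # Grounding step: the LLM is told to answer only from the retrieved
    # facts, so every claim in the answer is traceable to a graph edge.
    context = "\n".join(f"{s} -{r}-> {o}" for s, r, o in facts)
    return f"Answer using ONLY these facts:\n{context}\n\nQ: {question}"

prompt = build_prompt("Where is Acme Corp's acquisition based?",
                      retrieve_subgraph("Acme Corp"))
```

Because the retrieved edges travel with the prompt, the answer can be audited back to specific relationships in the graph - the explainability property described above.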

While not a silver bullet, GraphRAG offers many advantages that can give teams the certainty to move projects out of pilot and into production in an easier and more predictable way.
