AI models are getting smarter. So why are enterprise decisions still fragile?
Nikolaos Vasiloglou, VP of Research ML at RelationalAI, argues that the real bottleneck in enterprise AI isn’t model quality — it’s architecture. As LLMs become cheaper and interchangeable, differentiation shifts to decision intelligence: semantic models, knowledge graphs, verifiable reasoning, and systems that retain context over time. He explains why enterprises must move beyond stateless prompts and build AI systems that can reason, govern, and defend their decisions.
Most enterprise AI systems today are built around stateless models that are very good at pattern recognition but weak at reasoning, consistency, and long-term business context. Enterprises increasingly need AI not just to predict or classify, but to drive and explain decisions that matter to the business — and traditional ML pipelines or generic LLM deployments weren’t designed for that.
At RelationalAI, we’re building what is now called a decision intelligence platform—recently recognized in the Gartner® 2026 Magic Quadrant™ and Critical Capabilities for Decision Intelligence Platforms—that grounds intelligence in semantic models and relational knowledge graphs, and couples predictive, prescriptive, and rule-based reasoning as first-class capabilities. Running natively inside Snowflake’s AI Data Cloud, it allows enterprises to reason over their data where it already lives, making business logic, governance, and semantics part of the AI stack itself — not an afterthought.
Fintech and Telco are highly security-sensitive and must operate within a strict data residency perimeter (Snowflake). We bring Decision Intelligence capabilities vital to their operations, such as Graph Analytics, Mathematical Optimization, Rules, GNNs, and Knowledge Graph Construction, inside that perimeter. These capabilities exist today, but they are effectively out of reach because they are fragmented across tools and require unsafe, fragile orchestration that makes data movement hard to track and data security hard to guarantee. On top of that, our superalignment approach to LLMs ensures that they produce verifiable answers and, most importantly, makes smaller models perform on par with, or better than, frontier models at a fraction of the price.
As token prices collapsed and inference pipelines matured, LLMs stopped being “the thing you differentiate on” and became closer to cloud infrastructure: abundant, swappable, and continuously improving. That changes the design center of gravity for enterprises. The hard problem is no longer picking the best model—it’s building systems that keep working as models churn, vendors shift, and capabilities jump.
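To make that concrete, here is a minimal sketch in Python of the kind of thin abstraction that keeps a system working while models churn underneath it; the model names and `complete` stubs are hypothetical stand-ins for any vendor SDK:

```python
from dataclasses import dataclass
from typing import Callable

# A minimal, vendor-neutral contract: the rest of the system depends on
# this interface, never on a specific model or SDK.
@dataclass
class LLMSpec:
    name: str
    complete: Callable[[str], str]  # prompt -> completion (stub for a vendor SDK)

class ModelRegistry:
    """Routes requests to whichever model currently backs a capability tier."""
    def __init__(self):
        self._tiers: dict[str, LLMSpec] = {}

    def register(self, tier: str, spec: LLMSpec) -> None:
        self._tiers[tier] = spec  # swapping vendors is one line, not a rewrite

    def complete(self, tier: str, prompt: str) -> str:
        return self._tiers[tier].complete(prompt)

# Models become configuration, not architecture: upgrading or replacing one
# does not touch any calling code.
registry = ModelRegistry()
registry.register("fast", LLMSpec("small-model-v1", lambda p: f"[fast] {p}"))
registry.register("deep", LLMSpec("frontier-model-v3", lambda p: f"[deep] {p}"))
print(registry.complete("fast", "Summarize yesterday's escalations."))
```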
The big reveal is that knowledge is becoming the durable asset, not the model. Knowledge graphs are getting dramatically cheaper and easier to build—thanks to agent-driven extraction, iterative curation loops, and better tooling—and they’re increasingly central to modern architectures. They’re not just a backend artifact for search or metadata: the KG becomes the spine of application behavior and question answering.
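As an illustration of how cheap such a build loop has become, here is a minimal sketch; `llm_extract_triples` is a hypothetical stand-in for an extraction agent, and the schema and facts are invented for illustration:

```python
# Agent-driven KG construction as a loop: extract candidate triples,
# keep the schema-conformant ones, and re-queue the rest for curation.
Triple = tuple[str, str, str]  # (subject, relation, object)

ALLOWED_RELATIONS = {"supplies", "owns", "located_in"}  # the KG schema

def llm_extract_triples(text: str) -> list[Triple]:
    # Placeholder: a real system would prompt an extraction agent here.
    return [("AcmeCorp", "supplies", "RetailCo"), ("AcmeCorp", "rivals", "OtherCo")]

def curate(docs: list[str], max_passes: int = 3) -> set[Triple]:
    graph: set[Triple] = set()
    queue = list(docs)
    for _ in range(max_passes):
        rejected: list[str] = []
        for doc in queue:
            for s, r, o in llm_extract_triples(doc):
                if r in ALLOWED_RELATIONS:
                    graph.add((s, r, o))   # grounded, schema-conformant fact
                else:
                    rejected.append(doc)   # send back for another curation pass
        queue = rejected
        if not queue:
            break
    return graph

print(curate(["AcmeCorp supplies RetailCo and competes with OtherCo."]))
```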
GraphRAG is evolving in the same direction. It started as “retrieve better chunks,” but it’s moving toward answering directly from the graph: structured entities/relations provide grounded facts, while advanced reasoning-capable LLMs do the extended thinking to traverse, reconcile, and compose answers from that structure. In practice, this looks like a system where the LLM is the reasoning layer, and the graph is the source of truth and constraint.
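Here is a minimal sketch of that division of labor, using networkx for the graph and a stub where the reasoning model would sit; the entities and relations are invented for illustration:

```python
# GraphRAG, answer-from-the-graph style: the KG supplies grounded facts,
# and the LLM (stubbed here) only composes an answer constrained to them.
import networkx as nx

g = nx.MultiDiGraph()
g.add_edge("AcmeCorp", "RetailCo", relation="supplies")
g.add_edge("RetailCo", "Region-7", relation="operates_in")

def facts_for(entity: str) -> list[str]:
    """Traverse the KG and serialize every relation touching the entity."""
    return [f"{u} --{d['relation']}--> {v}"
            for u, v, d in g.edges(data=True) if entity in (u, v)]

def answer(question: str, entity: str) -> str:
    grounded = facts_for(entity)
    # A real system would hand `grounded` to a reasoning LLM with the
    # instruction to answer *only* from these facts.
    return f"Q: {question}\nFacts: {grounded}"

print(answer("Who supplies RetailCo, and where does it operate?", "RetailCo"))
```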
At the workflow level, Deep Research-style capabilities (multi-step investigation, synthesis, and validation over internal sources) are becoming something you want available everywhere in operations—not as a novelty, but as a standard pattern for decisions, audits, customer escalations, procurement, and analysis.
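Sketched as code, the pattern is a bounded loop of plan, investigate, synthesize, and validate; every `llm_*` function below is a hypothetical stand-in for an agent call over internal sources:

```python
# Deep Research as a standard workflow pattern: plan sub-questions,
# investigate each, synthesize a draft, and validate before shipping.
def llm_plan(question: str) -> list[str]:
    return [f"What does finance say about: {question}",
            f"What does legal say about: {question}"]

def search_internal(sub_question: str) -> str:
    return f"(documents relevant to {sub_question!r})"

def llm_synthesize(question: str, findings: list[str]) -> str:
    return f"Draft answer to {question!r}, grounded in {len(findings)} findings."

def llm_validate(draft: str, findings: list[str]) -> bool:
    # Placeholder: a real validator checks every claim against a finding.
    return len(findings) > 0

def deep_research(question: str, max_rounds: int = 2) -> str:
    for _ in range(max_rounds):
        findings = [search_internal(s) for s in llm_plan(question)]
        draft = llm_synthesize(question, findings)
        if llm_validate(draft, findings):  # only validated syntheses ship
            return draft
    raise RuntimeError("no validated answer within the round budget")

print(deep_research("Should we renew the Vendor X contract?"))
```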
Finally, we’ve entered the era of inference-time compute. Because token generation got very cheap, we can “think longer” (generate more tokens, explore more branches, do more tool calls) to get materially better answers. The tradeoff is that latency and perceived response time become the new cost: token prices fell faster than generation speed improved, so higher-quality runs can take longer. For many enterprises, that’s an acceptable shift—quality, correctness, and defensibility increasingly matter more than raw speed—and we’ll keep engineering around the remaining latency constraints with caching, tiered reasoning, distillation/fine-tuning for hot paths, and smarter routing.
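A minimal sketch of that tradeoff, with toy stubs for the models and the verifier: sample more branches from a cheap tier first, and pay the deep tier’s latency only when verification fails:

```python
# Inference-time compute: "thinking longer" means sampling more candidates
# and verifying them, escalating to a slower tier only as a fallback.
import random

def cheap_model(prompt: str) -> str:
    return random.choice(["42", "41", "I don't know"])  # fast, noisy

def deep_model(prompt: str) -> str:
    return "42"  # slower, more tokens, better answer (stub)

def verifier(answer: str) -> bool:
    return answer == "42"  # in practice: a KG lookup, unit test, or solver

def solve(prompt: str, n_samples: int = 8) -> str:
    for _ in range(n_samples):      # more branches = more inference compute
        candidate = cheap_model(prompt)
        if verifier(candidate):
            return candidate        # fast path: a verified cheap answer
    return deep_model(prompt)       # hot-path fallback: pay the latency

print(solve("What is 6 * 7?"))
```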
One of the biggest mistakes is trying to solve complex problems with giant prompts and monolithic pipelines. That never works in the long run. Instead, enterprises should follow traditional software engineering practices: modularity, decomposition, proper error handling, and clear interfaces between components.
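Here is a minimal sketch of what that decomposition looks like in practice, using an invented ticket-routing task: each stage is a narrow, typed function whose output can be checked before the next stage runs:

```python
# Decomposition instead of one giant prompt: small stages with typed
# interfaces that can be tested, verified, and swapped independently.
from dataclasses import dataclass

@dataclass
class Ticket:
    text: str

@dataclass
class Classified:
    ticket: Ticket
    category: str

def classify(ticket: Ticket) -> Classified:
    # One narrow LLM call (stubbed) with a small, checkable output space.
    category = "billing" if "invoice" in ticket.text.lower() else "other"
    if category not in {"billing", "other"}:   # guard against invalid output
        raise ValueError("classifier returned an invalid category")
    return Classified(ticket, category)

def route(c: Classified) -> str:
    return {"billing": "finance-queue", "other": "triage-queue"}[c.category]

# Each stage is individually swappable and individually verifiable.
print(route(classify(Ticket("Invoice #118 was charged twice."))))
```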
LLMs should be treated like other stochastic functions — similar to network calls — that can and will fail. That means you need verifiers wherever possible. Not all generative tasks admit strong verification, but when they do, knowledge graphs are an excellent platform for grounding and validation.
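In code, that discipline looks like any other unreliable I/O: bounded retries, an explicit failure mode, and, here, a toy knowledge-graph lookup as the verifier. The `llm_extract` stub and the facts are invented for illustration:

```python
# Treat the LLM call like a flaky network call: bounded retries, an
# explicit failure mode, and a knowledge-graph lookup as the verifier.
import itertools

KG_FACTS = {("Berlin", "capital_of", "Germany")}             # toy knowledge graph
_answers = itertools.cycle(["France", "France", "Germany"])  # simulated flakiness

def llm_extract(prompt: str) -> tuple[str, str, str]:
    # Stand-in for a stochastic LLM call that sometimes hallucinates.
    return ("Berlin", "capital_of", next(_answers))

def verified_call(prompt: str, max_retries: int = 3) -> tuple[str, str, str]:
    for _ in range(max_retries):
        triple = llm_extract(prompt)
        if triple in KG_FACTS:   # grounding: accept only KG-consistent output
            return triple
    raise RuntimeError(f"no verifiable answer after {max_retries} attempts")

print(verified_call("What country is Berlin the capital of?"))
```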
Breaking problems into smaller, verifiable tasks dramatically improves reliability. It also makes systems more resilient to model updates because individual components can evolve independently. Ultimately, building AI systems is less about prompt engineering and more about system engineering. I highly recommend the talk “Introduction to Generative Computing” from NeurIPS 2025.
Today’s agents are largely stateless. They respond to a prompt, produce an output, and then forget everything. Without memory, agents repeat work, hallucinate context, and fail to learn from past actions.
Agentic memory is what turns agents from reactive responders into decision-making systems. It combines structured symbolic representations with the ability for agents to reason, update, and act over time. It is tempting to describe this as a rebranding of knowledge graphs, but that framing misses the point. This is an evolution.
With memory, enterprises can build AI systems that retain institutional knowledge rather than discard it with every prompt. That is the difference between AI as a tool and AI as a system, and the distinction becomes critical as organizations move from prototypes to mission-critical deployments. While agents use memory through RAG systems, they often have to create their own temporary short-term memory to reason: Claude Code and OpenAI Codex, for example, often write planning documents before they execute a complex task. The problem is that once the task is completed, they discard those documents; there is no mechanism to add them to long-term memory and persist them. In other words, even though they produce knowledge every time they solve a problem, they don’t know how to convert it into an institutional asset. It is as if we are living in the world of the movie Memento, whose protagonist cannot remember more than the last few minutes.
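A minimal sketch of closing that loop: when a task completes, the agent’s throwaway plan is promoted into persistent, queryable memory instead of being discarded. The JSONL file here is a stand-in; a real system would write into the knowledge graph:

```python
# Persist the agent's plan as an institutional asset once the task is done,
# so the next agent starts from past plans instead of from zero.
import json, time
from pathlib import Path

MEMORY = Path("agent_memory.jsonl")  # stand-in for a long-term memory store

def persist_plan(task: str, plan: list[str], outcome: str) -> None:
    record = {"ts": time.time(), "task": task, "plan": plan, "outcome": outcome}
    with MEMORY.open("a") as f:
        f.write(json.dumps(record) + "\n")  # the plan outlives the session

def recall(task_keyword: str) -> list[dict]:
    if not MEMORY.exists():
        return []
    records = [json.loads(line) for line in MEMORY.read_text().splitlines()]
    return [r for r in records if task_keyword.lower() in r["task"].lower()]

persist_plan("migrate billing DB", ["snapshot", "dual-write", "cutover"], "success")
print(recall("billing"))  # future agents can reuse this institutional knowledge
```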
At first, I would advise them to make their employees 10x-100x more efficient by training them to use coding assistants such as Claude Code, OpenAI Codex, and Antigravity; there are also versions of these tools for non-technical people. This is a fundamental opportunity that they should not miss. Beyond the actual dollar savings, it is essential to shift the company’s mindset toward being AI-first. This is not an easy cultural change, but it is necessary to ensure alignment across legal, engineering, and sales. I insist that everyone in the organization see what is possible and how; those who are not on the AI train will become irrelevant and obsolete very soon. Even for developers, adjusting to a role change from writing code to supervising it is not easy, so this will take time, and it is very important. Once an organization is done with that, the direction it needs to take to maximize ROI will become obvious, because the shift exposes things that are possible now that weren’t before. There are some common steps that everyone can take, and you can find them on every blog, but that is not what will deliver big ROIs.
Nikolaos Vasiloglou is the VP of Research ML at RelationalAI. He has spent his career building ML software and leading data science projects in Retail, Online Advertising, and Security. He is a member of the ICLR/ICML/NeurIPS/UAI/MLconf/KGC/IEEE S&P community, having served as an author, reviewer, and organizer of workshops and main conferences. Nikolaos leads research and strategic initiatives at the intersection of Large Language Models and Knowledge Graphs for RelationalAI.
RelationalAI brings enterprise decision intelligence natively into the AI Data Cloud. Powered by rich semantic models, advanced reasoners, and context-enhanced LLMs, RelationalAI provides agents that understand your business and drive measurable ROI – all without moving data from where it already is. Our goal: AI that can help run a company, not just chat about it.
Learn more at relational.ai.