The Right Data in the Right Place at the Right Time
Everyone agrees that artificial intelligence (AI) depends upon data. But as they say, the devil is in the details.
In the first article in this series, I differentiated among the various types of data that AI requires: model training data, information people put into prompts, and data that feeds into queries via retrieval-augmented generation (RAG).
Agentic AI complicates this issue, as AI agents may look for and fetch data on the fly. As my colleague Eric Newcomer explained in the second article in this series, such agentic behavior may be non-deterministic: agents often behave differently from one occasion to the next.
Because AI agents will take whatever actions they can to meet the requirements set out for them, their unpredictability makes them both extraordinarily powerful and potentially dangerous.
AI agents simply aren’t practical unless the organization in question has its data house in order. In fact, agentic AI’s inherent risk has stopped many organizations’ AI initiatives dead in their tracks.
What, then, do you need to do to ensure your data is up to the challenge? Your data must be the right data in the right place at the right time to support your AI initiatives.
The Problem of Data Fragmentation
The data in any organization is scattered about – across environments, servers, cloud instances, endpoints, and anywhere else bits might get written down.
This data fragmentation is the first major roadblock to achieving the ‘right data in the right place at the right time’ goal.
To address such fragmentation, organizations require integration – technology that connects disparate data sources via connectors and APIs, as well as access to files, databases, and event streams, both on-premises and in the cloud.
Furthermore, such integration must work across different types of data, including structured (as in databases), semi-structured (XML and JSON, for example), and unstructured (documents, emails, and the like).
AI agents must have either direct access to source systems or access to data infrastructure tools that can reach integrated data, for example, in a data warehouse or data lake. In other situations, agents require access to vector storage that is ready for RAG processing.
Data integration has been around for years, of course – but agentic AI raises the stakes. Today, data integration must also enforce consistency: the integration technology must resolve conflicts among sources before they surface as inconsistent information.
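To make conflict resolution concrete, here is a minimal sketch of one common approach: a per-field 'latest non-empty value wins' rule. The record layout, source names, and the `merge_latest_wins` function are all hypothetical, invented for illustration; real integration tooling typically supports richer survivorship rules than this.

```python
from datetime import datetime


def merge_latest_wins(versions: list[dict]) -> dict:
    """Merge several versions of one record, field by field.

    Each version is a dict with 'source', 'updated_at', and 'data' keys
    (a hypothetical layout). For every field, the value from the most
    recently updated version that has a non-empty value is kept.
    """
    merged = {}
    # Process oldest first so that newer values overwrite older ones.
    for version in sorted(versions, key=lambda v: v["updated_at"]):
        for field, value in version["data"].items():
            if value not in (None, ""):
                merged[field] = value
    return merged


versions = [
    {
        "source": "crm",
        "updated_at": datetime(2024, 1, 5),
        "data": {"email": "a@old.example", "phone": "555-0100"},
    },
    {
        "source": "billing",
        "updated_at": datetime(2024, 3, 1),
        "data": {"email": "a@new.example", "phone": ""},
    },
]

merged = merge_latest_wins(versions)
```

In this example the newer billing record supplies the email address, while the phone number survives from the older CRM record because the newer value is empty.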
Data Validation for a Single Source of Truth
Connecting AI agents to the data addresses only part of the data fragmentation challenge. Equally important: ensuring that the data that feeds the agents is accurate.
In other words, data must provide a single source of truth for AI agents, despite originating from multiple sources. Organizations must be able to synchronize source systems with validated and consistent data.
For structured data, master data management (MDM) can provide trusted golden records across disparate systems. MDM can enforce various quality rules including schema validation, data type enforcement, data deduplication, and rules for avoiding missing and invalid fields.
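The quality rules listed above can be sketched in a few lines. This is an illustration only, not how any particular MDM product works: the `CUSTOMER_SCHEMA` fields, the choice of email as the deduplication key, and the function names are all assumptions.

```python
# Hypothetical schema: field name -> expected Python type.
CUSTOMER_SCHEMA = {"customer_id": str, "email": str, "country": str}


def validate(record: dict) -> list[str]:
    """Return the rule violations for one record (empty list means valid)."""
    problems = []
    for field, expected_type in CUSTOMER_SCHEMA.items():
        value = record.get(field)
        if value is None or value == "":
            problems.append(f"missing field: {field}")
        elif not isinstance(value, expected_type):
            problems.append(f"wrong type for {field}: {type(value).__name__}")
    return problems


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first record seen for each normalized email address."""
    seen = set()
    unique = []
    for record in records:
        key = record["email"].strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def build_golden_records(records: list[dict]) -> list[dict]:
    """Drop invalid records, then collapse duplicates."""
    valid = [r for r in records if not validate(r)]
    return deduplicate(valid)


records = [
    {"customer_id": "C1", "email": "Ana@Example.com", "country": "PT"},
    {"customer_id": "C2", "email": "ana@example.com ", "country": "PT"},  # duplicate
    {"customer_id": 3, "email": "bo@example.com", "country": "DK"},       # wrong type
    {"customer_id": "C4", "email": "", "country": "US"},                  # missing email
]

golden = build_golden_records(records)
```

Only the first record survives here: the second is a duplicate once the email is normalized, and the last two fail validation.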
Unstructured data, however, is every bit as important as structured data for most AI applications. Organizations require end-to-end data pipelines to pull data from various data and vector stores. Those pipelines must prepare the data for the AI agents to use.
Such pipelines support real-time, event-driven data and can enforce data quality rules on unstructured as well as semi-structured and structured data.
Contextualization: The Key to ‘Smart’ AI
For AI agents to leverage available data to achieve their goals, that data must provide the appropriate context.
Context includes the business purpose for the data, where it came from, who owns it, and who should have access to it. Semantic consistency across systems and data sources is also essential for achieving the necessary contextualization.
In many cases, organizations must add metadata that enrich and contextualize available data across multiple disparate systems, clarifying hierarchical relationships among different data sets as well as dealing with issues of versioning and currency.
Is a ‘customer’ in one database or application the same as a ‘customer’ in another? Unless AI agents know the answer to such questions, they are likely to make mistakes that lead to bad business outcomes.
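One way to picture this is a small business glossary that maps each system's local field name to a canonical concept, along with its owner and definition. The sketch below is hypothetical throughout: the systems, concept labels, and the `TermContext` structure are invented for illustration and do not describe any specific product's metadata model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermContext:
    system: str      # system where the field lives
    field: str       # local field name in that system
    concept: str     # canonical business concept (hypothetical labels)
    owner: str       # team accountable for the data
    definition: str  # plain-language meaning


GLOSSARY = [
    TermContext("crm", "customer", "prospect_or_buyer", "sales",
                "Any contact with an open or closed opportunity"),
    TermContext("billing", "customer", "paying_account", "finance",
                "An account with at least one paid invoice"),
    TermContext("support", "client", "paying_account", "support",
                "An account entitled to open support tickets"),
]


def lookup(system: str, field: str) -> TermContext:
    """Find the glossary entry for a system-specific field name."""
    for entry in GLOSSARY:
        if entry.system == system and entry.field == field:
            return entry
    raise KeyError(f"no glossary entry for {system}.{field}")


def same_concept(a: tuple[str, str], b: tuple[str, str]) -> bool:
    """True when two (system, field) pairs refer to the same business concept."""
    return lookup(*a).concept == lookup(*b).concept


crm_vs_billing = same_concept(("crm", "customer"), ("billing", "customer"))
billing_vs_support = same_concept(("billing", "customer"), ("support", "client"))
```

With this glossary, an agent could learn that 'customer' in the CRM and 'customer' in billing are different concepts despite sharing a name, while billing's 'customer' and support's 'client' refer to the same thing.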
How Boomi Supports the Right Data in the Right Place at the Right Time
The Boomi Enterprise Platform includes a suite of data management services that address the requirements for AI data:
- Boomi Data Integration brings in large volumes of data from a variety of sources on a consistent and real-time basis. It establishes the data foundation that AI agents depend upon by delivering reliable and unified data workflows.
- Boomi Data Hub creates master data, resolves conflicts, and syncs data across systems, thus providing the data quality necessary to support agentic access to trusted data across the organization.
- Boomi Meta Hub provides curated business glossaries with semantic metadata, enabling AI agents to leverage proprietary business context for higher AI reasoning accuracy.
These three services work in combination under a single platform to activate data for AI, eliminating inconsistency and ambiguity and ensuring high-quality, trusted, and governed data for AI agents.
The Intellyx Take
AI agents require a data landscape that is ready for AI. Unless an organization implements data infrastructure that supports the needs of AI generally and AI agents in particular, any AI initiative will stop dead in its tracks.
The core challenges with data infrastructure are problems enterprises have been struggling with for years. Now, however, is the time to get your data act together.
As a vendor of comprehensive data infrastructure and automation technology, Boomi is well-positioned to help any organization integrate and govern data across the enterprise, creating a digital fabric that connects systems and powers AI initiatives.
Copyright © Intellyx BV. Boomi is an Intellyx customer. Intellyx retains final editorial control of this article. No AI was used to write this article.