By Mark Emmons
When most of us hear the word pipelines, we probably think of something involving the oil and gas industry. You know, hard hats, welding, and so on. But today, the idea of seamlessly connected pipelines matters more than ever in the world of data ecosystems.
Just think about all of the distributed data sources within organizations today. Foundational systems like ERPs and CRMs. Hundreds (or more) of smaller best-of-breed applications. Databases. Edge devices. Data both in clouds and on-premises. The result? Data volume continues to grow exponentially.
It’s difficult to even know what you have within a business, let alone manage that data and transform it into meaningful insights.
That’s why Mike Bachman, the chief architect in Boomi’s Office of the CTO, joined with three Dell Technologies data experts to write the definitive resource about distributed data pipelines for multicloud enterprises. Since the free digital book “Modern Enterprise Data Pipelines” debuted in 2021, the need to understand, transform, and connect data has only grown.
Bachman took a few minutes recently to explain why more data shouldn’t necessarily mean more headaches when you have a thoughtful, organization-wide approach to discovering, governing, and connecting digital landscapes. And, no, we did not discuss welding techniques.
What was your motivation for writing the book?
It’s important to start better conversations between IT and the line of business. The bridge between those two teams has broken down over the years. Our book is a way to help technologists talk with the business about how they think about technology. We’re not going to teach engineers anything they don’t already know. But the idea was for engineers to think about a better way to help their business colleagues understand how they think so that they can work together to solve problems. And it begins with simple definitions of things like data pipelines and data fabrics.
How do you define a data pipeline?
Essentially, it’s a data on-ramp, one or more computations applied to the data, and then a data off-ramp. It’s nothing more than getting data from a source, doing some computation on that data to transform it, and then outputting it to the next stage in how you want to use it. It’s simply inputting, processing, and outputting.
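The input-process-output shape Bachman describes can be sketched in a few lines of Python. All the names and data here are illustrative, not tied to any particular platform:

```python
# A minimal sketch of a data pipeline: an on-ramp (source),
# a computation (transform), and an off-ramp (sink).

def source():
    """On-ramp: yield raw records from some data source."""
    yield {"customer": "acme", "amount": "1200"}
    yield {"customer": "globex", "amount": "850"}

def transform(records):
    """Computation: normalize the amount field to an integer."""
    for record in records:
        yield {**record, "amount": int(record["amount"])}

def sink(records):
    """Off-ramp: hand the processed records to the next stage."""
    return list(records)

result = sink(transform(source()))
```

In a real pipeline, the on-ramp might read from a database or message queue and the off-ramp might write to a warehouse, but the three-stage shape stays the same.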
How do you define a data fabric?
A data fabric is the comprehensive sum of all the different tools (compute resources, queues, algorithms), pipelines, data resources, and endpoints that a company has. It’s really the full arsenal of everything that can transform or manipulate all the data within your organization.
You write about the need to have a different discussion about understanding data. What do you mean?
How we apply the rules to data is based on our organizational structure. But they’re not always the same rules. Typically, data is treated differently depending on the division, whether it’s sales, marketing, engineering, etc. You have these silos. But it’s essential for an enterprise to have a dictionary to define the data in various contexts and establish the nouns and verbs for everyone. When you don’t have a good methodology for that, it creates a lot of friction.
What do you mean by “nouns and verbs” in this context?
These are more definitions that technologists should be teaching the line of business. Think of the “nouns” as the data itself – whatever it is. The “verbs” are the actions applied to the data. The verbs are going to transform the nouns in some way. They’re going to filter, enrich, perform some level of computation on the nouns.
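One way to make the nouns-and-verbs framing concrete is a short Python sketch, where the records are the nouns and the filter and enrich functions are the verbs. The field names and thresholds are hypothetical:

```python
# The "nouns": the data itself, here a list of order records.
orders = [
    {"id": 1, "region": "EMEA", "total": 40},
    {"id": 2, "region": "APAC", "total": 90},
]

# A filtering "verb": keep only orders above a threshold.
def keep_large(records, threshold):
    return [r for r in records if r["total"] > threshold]

# An enriching "verb": add a computed field to each record.
def enrich_with_tier(records):
    return [{**r, "tier": "gold" if r["total"] > 75 else "standard"}
            for r in records]

# Verbs applied in sequence transform the nouns.
large_orders = enrich_with_tier(keep_large(orders, 50))
```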
We often hear “data is the new oil.” But you take that analogy a step further.
What’s important is what you want to do with the oil. You can’t just take crude oil out of the ground and use it. You need to refine it. Everything from plastics to the fuel that powers engines is an example of refined oil. Processing data is similar. If data is the oil, transformation is the refinement process. You’re going to filter it. You’re going to enrich it. You’re going to do something that makes it more useful. You want data easily consumable in the fewest steps possible for any number of downstream endpoints, whether they’re inside your organization or outside.
How does integration play a role?
Speed and skill. The speed at which a modern integration platform moves makes it so much easier for the data consumers to get what they need in dramatically shorter periods of time. The problem with building point-to-point integrations is that it gets incredibly unwieldy as the number of endpoints grows. And it takes lots of people to manage integration at scale. That creates a backlog in production.
Boomi makes it very fast and simple to tap into the data without building these point-to-point integrations. Also, the number of people it takes with Boomi is a fraction of what you need when developers must code all of those connection points. And Boomi’s value goes beyond connection alone. Today, integration must have the element of “at scale.” That means connection plus intelligent data services. Integration needs to connect any endpoint and do useful things across all pipelines. It comes down to a few hands doing many things.
Why do you think integration can help rebuild that broken bridge you mentioned earlier?
Today, all of the tools across clouds are available to use. If you want to use AI or IoT services, you can. You can route messages between applications pretty easily. Because a solution like the Boomi platform can integrate all of that, it then allows organizations to have business-level conversations.
The business can talk about its aspirations versus what IT can offer. You can bring both of them together. That allows you to ask: ‘Who is my end consumer? What do I want her to experience from my product?’ Then you can have a discussion about how to build the subjective experience you want to evoke and what you want to measure to determine success.
Get your free digital copy of the “Modern Enterprise Data Pipelines” book to find out more.