AI & Data Gravity
Investors and operators have long coveted systems of record. If you are one of those privileged software products, you benefit from data gravity—meaning you store the most valuable and largest datasets.
As a system of record, you own the control point. Customers are less likely to try to switch from you (giving you pricing power), and by owning the data taxonomy, you have an unfair advantage in offering additional applications that make use of your data. If you’re not the system of record and want to build on top of its data, you need to match the system of record’s data taxonomy—how the data is structured, labeled, and organized—forcing you to play by their rules. It’s a tough hand.
The power dynamics of data gravity are changing with generative AI. AI has the potential to lower that incumbent advantage and to change what it means for a business to benefit from data gravity. In fact, being a system of record may no longer be as strong a moat as it was just 12 months ago.
AI serves as a bundling layer to structure data spread across different software applications. It can be applied in two distinct stages: integration, where it consolidates data from various sources with a focus on the interoperability of that data, and orchestration, where it streamlines and automates workflows across these integrated systems (we’ll tackle orchestration and workflow gravity in a subsequent piece).
This change in data gravity matters to both sides: incumbents may see new startups rapidly grab market share, while challengers have a chance to render old control points moot. As we continue to make advancements in AI, does data gravity still defend control points as effectively as it once did?
Data: Integration
To understand how data integration problems can be solved by GenAI, you have to understand the exceptionally nerdy world of data warehousing today. In a perfect world, all company data would live in neatly labeled tables connecting each disparate system with the next. This was the original pitch of data warehouse solutions like Snowflake. However, the bottleneck turned out to be only partially technology. It’s a distinctly human problem because table schemas are defined by (human) data/analytics engineers.
There is so much room for interpretation when implementing ETL pipelines, translating to a different domain language, and defining the mapping rules between two datasets. Each step requires large amounts of latent knowledge. LLMs are great for the three workflow components of ETL:
- Extract: understanding of table schema and API documentation
- Transform: data processing and transformation (removing duplicates, standardizing data types, join/merge operations)
- Load: validating that an integration is performed without corruption or loss
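To make the three steps concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the two sample exports, the field names, and the hand-written `CRM_FIELD_MAP`, which stands in for the kind of mapping a model might propose after reading table schemas and API documentation (with a human reviewing it).

```python
from datetime import date

# Hypothetical exports from two systems that describe the same customers
# with different field names and formats.
CRM_ROWS = [
    {"AccountId": "a1", "Email": "Ann@Example.com", "CloseDate": "2024-03-01"},
    {"AccountId": "a1", "Email": "ann@example.com", "CloseDate": "2024-03-01"},
    {"AccountId": "a2", "Email": "bob@example.com", "CloseDate": "2024-04-15"},
]
BILLING_ROWS = [
    {"customer_email": "ann@example.com", "mrr_usd": "120.50"},
    {"customer_email": "bob@example.com", "mrr_usd": "80"},
]

# Assumed mapping: a stand-in for what a model might suggest from the
# source schema and API docs.
CRM_FIELD_MAP = {"AccountId": "account_id", "Email": "email", "CloseDate": "closed_on"}


def extract(rows, field_map):
    """Extract: rename source fields to a shared vocabulary."""
    return [{field_map[k]: v for k, v in row.items() if k in field_map} for row in rows]


def transform(crm, billing):
    """Transform: standardize types, drop duplicates, and join on email."""
    mrr_by_email = {b["customer_email"].lower(): float(b["mrr_usd"]) for b in billing}
    seen, out = set(), []
    for row in crm:
        email = row["email"].lower()
        key = (row["account_id"], email)
        if key in seen:
            continue  # duplicate once casing is normalized
        seen.add(key)
        out.append({
            "account_id": row["account_id"],
            "email": email,
            "closed_on": date.fromisoformat(row["closed_on"]),
            "mrr_usd": mrr_by_email.get(email),
        })
    return out


def load(rows, target, expected_accounts):
    """Load: write rows, then check that nothing was lost or left unmatched."""
    target.extend(rows)
    loaded = {r["account_id"] for r in target}
    if loaded != expected_accounts:
        raise ValueError(f"load mismatch: {sorted(loaded ^ expected_accounts)}")
    if any(r["mrr_usd"] is None for r in target):
        raise ValueError("row missing billing match")
    return len(target)


warehouse = []
clean = transform(extract(CRM_ROWS, CRM_FIELD_MAP), BILLING_ROWS)
count = load(clean, warehouse, {r["AccountId"] for r in CRM_ROWS})
```

The latent knowledge lives almost entirely in the mapping and the join key. Those are the parts a data engineer would otherwise define by hand, which is why they are the natural place for a model to help.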
The opportunity we’ve seen is using AI in the creation of what we’re calling an “Operational Data Layer.” AI can harmonize disparate tools and data sources, translating information from one context to another without requiring companies to give up the tools they depend on. There is great potential to bridge gaps in terminology and function, “awaken” data and insights that were hidden before, and connect related concepts like "release" and "launch" that might otherwise seem disjointed. AI can even do this with multi-modality capabilities.
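As a rough illustration of connecting terms like "release" and "launch," the sketch below groups events from different tools under one shared concept. The `CANONICAL` table is a hand-written stand-in for a model's semantic matching, and the tool names, labels, and the `go_live` concept are all hypothetical.

```python
from collections import defaultdict

# Hand-written stand-in for semantic matching by a model; hypothetical terms.
CANONICAL = {
    "release": "go_live",
    "launch": "go_live",
    "ship": "go_live",
    "renewal": "renewal",
    "re-up": "renewal",
}


def canonical_term(label):
    """Map a tool-specific label to a shared concept, or flag it as unmapped."""
    return CANONICAL.get(label.strip().lower(), "unmapped")


def group_events(events):
    """Group (source, label, item) events by their shared concept."""
    grouped = defaultdict(list)
    for source, label, item in events:
        grouped[canonical_term(label)].append((source, item))
    return dict(grouped)


EVENTS = [
    ("engineering_tracker", "Release", "v2.1"),
    ("marketing_calendar", "launch", "v2.1 campaign"),
    ("finance_sheet", "Re-up", "Acme contract"),
    ("support_desk", "escalation", "ticket 88"),
]
grouped = group_events(EVENTS)
```

Here the engineering "Release" and the marketing "launch" land in the same group, while labels the table does not know are kept under `unmapped` for a person to review rather than silently dropped.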
Why is this powerful? AI simplifies the “data storytelling” process and enhances the overall effectiveness of enterprise software. This makes data more accessible and actionable, which has serious implications for the idea of data gravity. Using AI as data middleware and an operational layer offers a fresh take on data gravity control points, allowing companies to “integrate and augment” instead of following the “integrate and surround” approach to control point formation that we typically reference. By leveraging AI to harmonize disparate tools and data sources, businesses can build new, automated workflows that enhance existing software stacks, boosting visibility and efficiency without disrupting established software.
As Robert Chatwani, President of Docusign and Tidemark Fellow, pointed out, “AI's ability to understand diverse data sources is shifting the competitive advantage from data ownership to data orchestration and actionable intelligence. This change is leading to really exciting new frontiers in enterprise software.” We’re aligned with his perspective.
Historically, the system of record has been a very powerful (and hard to attack) control point once it achieves a dominant position or becomes the industry standard. This is because every piece of software that wants to work with the system of record needs to conform to the schema and data model that the system of record defines.
However, for the first time, with AI, you can build software that can be effective without having to conform to the incumbent system of record’s data model. You don’t have to integrate directly with Salesforce to get the sales data—instead, you can ingest all the unstructured data from emails with customers, Zoom transcriptions, etc.
In other words, data gravity becomes weaker. Now, with AI, you can acquire this data outside of the system of record, and it becomes easier to migrate. This empowers organizations to maximize the value of their current data infrastructure while seamlessly adding advanced capabilities on top. The key is finding a way to make that data interoperable.
How Interoperability Can Work in the Enterprise
Cordial is a great example of how companies can leverage the interoperability idea to offer a superior product and challenge the data ownership advantage. As a cross-channel marketing platform that enables brands to orchestrate personalized, scalable campaigns, the company benefits from real-time customer insights by combining multiple data sources: some proprietary (i.e., traditional data gravity), some not. The platform leverages everything a brand knows about its customers to ensure high relevance and effectiveness. For instance, for its messaging product, Cordial integrates all relevant customer data to generate personalized emails and SMS messages.
Getting more into the actual implementation, customer data varies significantly depending on Cordial’s customers’ end markets (e.g., sportswear vs. toys). This is where the AI-powered data integration element comes into play: for the data ingestion process, the AI needs to contextualize and make sense of the unique data from each client without cross-training between different datasets.
To be clear, we are talking about AI more broadly: while generative AI is popular now, you need predictive AI models to understand variables like purchasing propensity. The challenge (and opportunity) lies in combining generative and predictive AI techniques to make automated predictions based on unstructured and unique data for each client.
Given the current state of technology, humans remain accountable for decisions, with AI serving as a supportive tool rather than a replacement. Cordial’s AI models help automate decisions, such as choosing the best marketing asset for each customer while ensuring human accountability remains in the loop. This is especially true in contexts where gut feelings still play a role.
Of course, the data governance discussion is also top of mind for the Cordial team. To enhance data privacy in Cordial, each brand operates with its own version of the model, maintained through a hard separation of data: not just logical partitioning, but entirely separate databases. This ensures that each client's data is isolated and secure. The platform further supports scalable metric analysis, automatically tailoring solutions to meet each client's specific requirements while maintaining this strict data separation.
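To show the difference between hard separation and logical partitioning, here is a toy sketch that gives each brand its own database instead of a shared table with a tenant column. It is not Cordial's implementation: the `TenantStores` class, the table layout, and the brand names are assumptions, and in-memory SQLite databases stand in for physically separate ones.

```python
import sqlite3


class TenantStores:
    """Keeps one separate database per brand (in-memory SQLite here)."""

    def __init__(self):
        self._dbs = {}

    def _db(self, brand):
        # Each call to connect(":memory:") creates a distinct database,
        # so brands never share tables.
        if brand not in self._dbs:
            conn = sqlite3.connect(":memory:")
            conn.execute(
                "CREATE TABLE customers (email TEXT PRIMARY KEY, segment TEXT)"
            )
            self._dbs[brand] = conn
        return self._dbs[brand]

    def add_customer(self, brand, email, segment):
        with self._db(brand) as conn:  # commits on success
            conn.execute("INSERT INTO customers VALUES (?, ?)", (email, segment))

    def customers(self, brand):
        cursor = self._db(brand).execute(
            "SELECT email, segment FROM customers ORDER BY email"
        )
        return cursor.fetchall()


stores = TenantStores()
stores.add_customer("sportswear_brand", "ann@example.com", "runners")
stores.add_customer("toy_brand", "ann@example.com", "parents")
```

The same email can exist for both brands without a key conflict, and a query against one brand has no path to the other brand's rows. With a shared table, that guarantee would depend on every query remembering to filter by tenant.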
The Opportunity (And Challenge) for VSVs
Companies often hesitate to fully automate processes due to concerns about losing control. The key to increasing adoption lies in developing AI tools that empower humans, making them more productive and enabling them to achieve more, thus elevating their role and success in the workplace.
As such, we don’t think data gravity fully disappears anytime soon, particularly for systems where transactions are involved. It’s still important to have a human in the loop when using an AI tool to look across systems. For straight-through processing or any involvement in high-stakes systems, an AI-only approach simply won’t cut it (yet). AI will likely be most effective in areas similar to robotic process automation, such as client-side integrations where a user needs two disjointed systems to communicate and share data with one another.
The implication is that as a founder, you should be surveying for offerings that “look across other systems” where their data, combined with yours, could compound to be more powerful than the incumbent system of record. We’re already seeing applications such as dashboarding, document review, contract management, and even light reconciliation being reimagined as powered by AI. We believe this will likely be followed by an engagement and automation layer (driven by AI, RPA, or just plain old rules) built on top of this combined system of record. As AI eats away at the system of record (where data lives), being the system of engagement (where users live) becomes more important.
Further, AI probably makes it easier to “integrate and surround” (or perhaps “integrate and augment”) legacy systems. It may also make it easier to build multi-stakeholder platforms, because the disparate tables you are connecting span an entire industry rather than being confined to a single company.
While data ownership may hold less value in the AI revolution, Vertical SaaS Vendors have a major opportunity to compound their data gravity by going after the operational data layer and helping their customers consolidate multiple data pools. Moreover, the winners will help users create value out of this new level of data intelligence. The highest ROI solutions, though, will likely entail new workflows that leverage the power of a new and finally self-sufficient source of truth.
If you want to compound your data advantage with AI, reach out. We’d love to see how we can help.