Your Data, My Model? Why AI Ambitions Demand a Contract Check-Up
As AI capabilities become standard fare in SaaS platforms, software providers are racing to retrofit intelligence into their offerings. But if your platform dreams of becoming the next ChatXYZ, you may need to look not to your engineering team, but to your legal one.
The Problem with “Your Data”
Most software providers already have mountains of processed, transformed and inferred data—data shaped by customer inputs and platform logic. That data could supercharge AI development, from powering smarter dashboards to training predictive algorithms.
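To make that concrete, here is a minimal Python sketch of how raw customer inputs become "processed, transformed and inferred" data. All names and figures are hypothetical; the point is only that the output is no longer any single customer's raw input, yet it was shaped by their data.

```python
# Hypothetical illustration: raw customer inputs pass through platform
# logic and come out aggregated and inferred.
from statistics import mean

# Raw customer input, as submitted to the platform
raw_tickets = [
    {"customer": "Acme", "resolution_hours": 4.0, "category": "billing"},
    {"customer": "Acme", "resolution_hours": 9.5, "category": "outage"},
    {"customer": "Beta", "resolution_hours": 2.0, "category": "billing"},
]

# Platform logic: aggregate across customers, then infer a new attribute.
avg_by_category = {
    cat: mean(t["resolution_hours"] for t in raw_tickets if t["category"] == cat)
    for cat in {t["category"] for t in raw_tickets}
}
inferred = {cat: ("slow" if avg > 6 else "fast") for cat, avg in avg_by_category.items()}

print(avg_by_category)  # e.g. {'billing': 3.0, 'outage': 9.5}
print(inferred)         # e.g. {'billing': 'fast', 'outage': 'slow'}
```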
But here’s the rub: just because the data isn’t raw customer input doesn’t mean you can freely use it.
You may assume your standard software licence or SaaS agreement gives you all the rights you need. It probably doesn’t.
What Does the Contract Say?
Take a typical clause like this:
“The Customer grants the Provider a non-exclusive, irrevocable licence to use Customer Data to the extent reasonably required to provide the Services and for use in the Provider’s business generally.”
Even a broad “use in our business generally” clause won’t necessarily cover:
- Using processed or aggregated data from multiple customers
- Training an AI model whose outputs are shared with others
- Commercialising new AI-powered features not contemplated in the original deal
And if the data is derived from inputs that were themselves confidential or personal, you’ve got even more legal landmines—privacy law, confidentiality obligations, and IP ownership issues if the customer contributed meaningful structure to the dataset.
Is Deidentification Enough (or Even Allowed)?
A common fallback is: “We’ll just deidentify the data.” But that’s not a bulletproof strategy.
Under most privacy regimes, data is only considered deidentified if re-identification is not reasonably possible—a high bar, especially in small or specialised datasets. Even deidentified data may still be contractually protected if it originates from information the customer expects to be confidential.
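To see why the bar is high, consider a toy illustration. This is a minimal Python sketch with entirely made-up data, using k-anonymity only as an informal proxy for re-identification risk, not as a legal test: once names are stripped, the remaining attributes can still single out a record.

```python
# Hypothetical illustration: even with names removed, a combination of
# quasi-identifiers can single out one record (k-anonymity with k = 1).
from collections import Counter

# Names stripped, but quasi-identifiers retained
deidentified_rows = [
    {"postcode": "2000", "industry": "mining", "employees": "100-500"},
    {"postcode": "2000", "industry": "retail", "employees": "100-500"},
    {"postcode": "2000", "industry": "retail", "employees": "100-500"},
    {"postcode": "0872", "industry": "mining", "employees": "1-10"},  # unique
]

def min_k_anonymity(rows, quasi_identifiers):
    """Smallest group size when rows are grouped by the quasi-identifiers.
    k = 1 means at least one record is unique and plausibly re-identifiable."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in rows)
    return min(groups.values())

k = min_k_anonymity(deidentified_rows, ["postcode", "industry", "employees"])
print(f"k-anonymity = {k}")  # prints 1: re-identification is plausible
```

In a dataset covering only a handful of enterprise customers, a postcode and an industry code alone can point to exactly one company, which is precisely the scenario privacy regulators treat as re-identifiable.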
More fundamentally, your contract might not give you the right to deidentify the data at all, unless required to do so by law.
Most software licences and SaaS agreements treat customer data as confidential information. Unless the contract expressly permits you to transform, aggregate or deidentify that data for secondary use (like AI training), doing so could itself amount to a breach. Moreover, if the data includes personal information, you’ll need to navigate privacy laws that impose their own limits—regardless of your contractual rights.
So before you start feeding your LLM, make sure you’re not breaching your SaaS agreement.
What to Look For (or Add)
If you’re a provider:
- Check whether your agreement expressly allows you to create, collate, and use aggregated and deidentified customer data for AI training and product development.
- Ensure the licence to use data extends beyond service delivery and includes improvements, analytics, and R&D.
- Include language around data governance, privacy compliance, and ownership of AI outputs.
If you’re a customer:
- Scrutinise clauses that allow use of data for “business purposes” or “analytics”—these may reach further than you think.
- Consider negotiating limits, notice obligations, or opt-out rights when your data could be used to build broadly deployed AI systems—unless, of course, that can be turned to your advantage.
In the Age of AI, Contracts Are Training Data Too
Training AI on customer data can unlock immense value—but only if your agreements keep up. Your model is only as smart as your data. And your data rights are only as strong as your contract.