In 2011 I worked on Oracle Spend Classification, a data mining project. Machine learning algorithms classified products and parts into various spend categories for better control over the spend, contracts and even the design. Unlike the previous efforts the machine learning ran autonomously with a great degree of accuracy. But it worked on structured data, the product master, purchase orders, goods receipt and so on.

However, the world is mostly unstructured. The things we see, the things we do, the words we speak, the music we play. Computers are remarkably inefficient at making sense of all that. We have to annotate and label the data for computers to comprehend. And so, in the age of large language models, the term data mining takes on a new meaning. The real mining is no longer done by an algorithm. It is done by people.

Today the Financial Times has published a film on data labeling. People in rural villages strap a GoPro or smartphone to their heads and go about their day - stitching clothes, folding laundry, washing dishes. Everyday human activity, captured as data. Elsewhere, another set of people annotate those frames. This is a car. This is a bag. We send miners down a pit to dig for coal and gold. Today we send people down into their own lives to dig for data. That is the irony few notice: data labeling is the real data mining.

The supply chain of AI is already global and entirely virtual. The raw material is scattered across the internet. It is mined where labour is cheap, refined where capital and computing power are concentrated, and consumed everywhere. Like petroleum, mined data is refined through training into models, and the finished goods carry far more power and price than the raw materials ever did.

This is where the conversation about AI sovereignty feels incomplete. It assumes the raw material is freely available and that mining (labeling) may be a solved problem, so that refining becomes the main challenge. I am not sure that holds. The resource and the mining have to be solved in their own context too. A nation cannot build a model that understands its people if the data and the labeling for that context do not exist.

This is also why smaller, context-aware models may lead the way. They do not demand the same impossible scale.

There is a trigger to all this. Governments have begun to signal who may access the most capable models. Once a government starts controlling a resource (such as a refined, state-of-the-art model) others are forced to consider their own.

We are watching a global supply chain take shape in real time, with all the familiar tensions of resource, refinement, and sovereignty - only this time the raw material is us, our words, our actions and our lives.