Artificial Intelligence (AI) is here, and AI relies on data. Businesses are increasingly aware of the need for good data literacy and sound data practices, sometimes as a precursor to a good AI strategy and sometimes as a critical partner to it. However, the data landscape itself is also changing.
Why Does Data Matter to AI?
Most modern AI learns its behavior from patterns in data. AI has improved dramatically over the past decade, partly because of a perfect storm: massive datasets meeting powerful computational hardware. The data needed to build effective AI depends heavily on the problem domain, but larger datasets generally make more powerful AI possible. As the world recognizes how much AI capabilities matter to national, regional, and corporate competitiveness, who has access to which data becomes a key part of that competition.
Where Does Data Come From?
Historically, data came from an organization's own past experience. For example, a bank may hold a decade of data about its customers, and a hospital may have patient records. In these scenarios, data literacy and data management came down to making sure the data an organization had was protected, well managed, and used effectively. Today, however, an organization's internal data is only one of several levers available.
Data Augmentation and Synthetic Data
Advances in technology have made it possible to augment existing data (generating new data programmatically from what you already have), to use synthetic data from simulations and other sources, or simply to use AI techniques that rely less on data. Depending on your domain, one or more of these methods can become a key part of a data strategy.
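To make "generating new data programmatically" a little more concrete, here is a minimal sketch of one simple augmentation technique: adding small random noise to numeric records to create extra, slightly varied copies. It assumes a tiny tabular dataset; the function name, the sample records, and the noise level are all hypothetical, and real augmentation methods vary widely by domain (images, for instance, are often augmented with flips or crops instead).

```python
import random

def augment_with_noise(rows, copies=2, noise_scale=0.05, seed=0):
    """Return the original rows plus `copies` jittered variants of each row.

    Hypothetical helper: each numeric value is perturbed by Gaussian noise whose
    standard deviation is noise_scale times the value's magnitude. The default
    noise_scale is an illustrative assumption, not a recommendation.
    """
    rng = random.Random(seed)  # fixed seed so the output is reproducible
    augmented = list(rows)     # keep the original records first
    for row in rows:
        for _ in range(copies):
            augmented.append([x + rng.gauss(0.0, noise_scale * abs(x)) for x in row])
    return augmented

# Hypothetical (income, age) records standing in for an organization's internal data.
original = [[52000.0, 34.0], [61000.0, 45.0]]
expanded = augment_with_noise(original, copies=3)
print(len(expanded))  # 2 originals + 2 * 3 noisy variants = 8
```

Whether noisy copies like these actually help depends on the problem; augmentation only works when the variations it introduces are plausible for the real data.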
Foundation Models and Transfer Methods
Another powerful capability of AI is building one AI on top of another, using methods such as fine-tuning or transfer learning. Foundation Models, a class of AI models, exemplify this trend. They are often trained on massive datasets, yet they can be used in applications without access to the original training data and adapted for a range of purposes that draw on what they learned from that data.
Open Data
As countries recognize how critical data is to AI competitiveness, there are fears that an AI race will also become a data race. To drive more equitable access to AI, organizations are creating public data repositories in areas ranging from medicine (for example, the Cancer Imaging Archive) to public corpora for training Generative AI models, which contain trillions of text tokens and billions of images.
Data Marketplaces
Finally, data can also be bought and sold in data marketplaces. This is particularly interesting because it puts an explicit price on data, making its value directly visible. In some industries, companies that specialize in data generation will also curate and label custom datasets for purchase.
Takeaways for Your Business
What does this mean for your business? A data strategy is critical for any business that wants to benefit from AI, and while that strategy should always include careful management of internal data, it is no longer limited to it. A comprehensive data strategy should now also consider which public data sources may be relevant, which Foundation Models can bring in digested insights, what synthetic data and augmentation can be used, and what data, if any, can be purchased. Combining these elements produces a powerful data strategy that can then enable a Return on Investment (ROI) from AI solutions.