Why it’s time for ‘data-centric artificial intelligence’

The last 10 years have brought tremendous growth in artificial intelligence. Consumer internet companies have gathered vast amounts of data, which has been used to train powerful machine learning programs. Machine learning algorithms are widely available for many commercial applications, and some are open source.

Source: https://mitsloan.mit.edu/

Focusing on high-quality data that is consistently labeled would unlock the value of AI for sectors such as health care, government technology, and manufacturing, Andrew Ng said.

In fields like manufacturing and pharmaceuticals, AI systems are trained to recognize product defects. But reasonable, well-trained people can disagree about whether a pill is “chipped” or “scratched,” for example, and that ambiguity can create confusion for the AI system. Similarly, each hospital codes electronic records in different ways. This is a problem because AI systems are best trained on consistent data.
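One way to make the labeling ambiguity described above measurable is to have two people label the same items and compare their answers. The sketch below is illustrative only and is not from the source article: it computes Cohen's kappa, a standard chance-corrected agreement score, and lists the items the two labelers disagree on. The function names and the inspector labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators on the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labeled at random
    # according to their own label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def disputed_items(labels_a, labels_b):
    """Indices of items where the two annotators chose different labels."""
    return [i for i, (a, b) in enumerate(zip(labels_a, labels_b)) if a != b]


# Hypothetical labels from two inspectors looking at the same six pills.
inspector_1 = ["chipped", "scratched", "ok", "chipped", "ok", "scratched"]
inspector_2 = ["chipped", "chipped", "ok", "scratched", "ok", "scratched"]

kappa = cohens_kappa(inspector_1, inspector_2)
print("kappa:", round(kappa, 2))
print("items to review:", disputed_items(inspector_1, inspector_2))
```

A low score, or a long list of disputed items, suggests that the label definitions need tightening before more data is collected.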

Data is often messy and has errors. For decades, individuals have been looking for problems and fixing them on their own. “It’s often been the cleverness of an individual’s skill, or luck with an individual engineer, that determines whether it gets done well,” Ng said. “Making this more systematic through principles and [the use of tools] will help a lot of teams build more AI systems.”

For industries that don’t have access to tons of data, “being able to get things to work with small data, with good data, rather than just a giant dataset, that would be key to making these algorithms work,” Ng said.

A common belief holds that more data is always better. But for some uses, especially manufacturing and health care, there isn’t that much data to collect, and smaller amounts of high-quality data might be sufficient, Ng said. For example, there might not be many X-rays of a given medical condition if few patients have it, or a factory might have made only 50 defective cell phones.

3 thoughts on “Why it’s time for ‘data-centric artificial intelligence’”

  1. This perspective offers a thoughtful and practical approach to the data requirements of AI, stressing the significance of data quality over sheer quantity in certain contexts. It calls for a shift in our AI strategies, making them more data-centric and tailored to the specific needs of diverse industries.

  2. One of the key challenges highlighted is the human subjectivity in labeling data, as illustrated in the case of product defects and medical records. In these scenarios, standardizing data becomes a paramount issue for training AI systems effectively. It emphasizes the need for more systematic approaches and tools to address data quality issues.
