The last 10 years have brought tremendous growth in artificial intelligence. Consumer internet companies have gathered vast amounts of data, which has been used to train powerful machine learning programs. Machine learning algorithms are widely available for many commercial applications, and some are open source.
Source: https://mitsloan.mit.edu/
Focusing on high-quality data that is consistently labeled would unlock the value of AI for sectors such as health care, government technology, and manufacturing, Ng said.
In fields like manufacturing and pharmaceuticals, AI systems are trained to recognize product defects. But reasonable, well-trained people can disagree about whether a pill is “chipped” or “scratched,” for example, and that ambiguity can create confusion for the AI system. Similarly, each hospital codes electronic records in different ways. This is a problem because AI systems are best trained on consistent data.
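One way to make the labeling problem concrete is to surface the items on which annotators disagree, so a team can review them and tighten the labeling guidelines. The following is a minimal sketch, not a method described in the article: the function name `find_label_disagreements`, the item IDs, and the three-inspector labels are all hypothetical.

```python
from collections import Counter


def find_label_disagreements(annotations):
    """Return {item_id: Counter of labels} for items whose annotators disagree.

    `annotations` maps an item ID to the list of labels that different
    annotators assigned to it (a hypothetical input format).
    """
    disagreements = {}
    for item_id, labels in annotations.items():
        counts = Counter(labels)
        # More than one distinct label means the annotators did not agree.
        if len(counts) > 1:
            disagreements[item_id] = counts
    return disagreements


# Hypothetical labels from three inspectors looking at the same pills.
annotations = {
    "pill_001": ["chipped", "chipped", "chipped"],
    "pill_002": ["chipped", "scratched", "chipped"],
    "pill_003": ["ok", "ok", "ok"],
    "pill_004": ["scratched", "chipped", "scratched"],
}

flagged = find_label_disagreements(annotations)

# Share of items on which every annotator gave the same label.
agreement_rate = 1 - len(flagged) / len(annotations)

for item_id, counts in flagged.items():
    print(item_id, dict(counts))
print(f"Items with full agreement: {agreement_rate:.0%}")
```

In this toy data, two of the four pills receive conflicting labels. Those are the cases where a clearer shared definition of “chipped” versus “scratched” would likely help most.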
Data is often messy and has errors. For decades, individuals have been looking for problems and fixing them on their own. “It’s often been the cleverness of an individual’s skill, or luck with an individual engineer, that determines whether it gets done well,” Ng said. “Making this more systematic through principles and [the use of tools] will help a lot of teams build more AI systems.”
For industries that don’t have access to tons of data, “being able to get things to work with small data, with good data, rather than just a giant dataset, that would be key to making these algorithms work,” Ng said.
A common belief holds that more data is always better. But for some uses, especially manufacturing and health care, there isn’t that much data to collect, and smaller amounts of high-quality data might be sufficient, Ng said. For example, there might not be many X-rays of a given medical condition if not that many patients have it, or a factory might have only made 50 defective cell phones.