When It Comes to AI, Can We Ditch the Datasets?

February 14, 2023

SciTechDaily

A machine-learning model for image classification that’s trained using synthetic data can rival one trained on the real thing, a study shows.

Huge amounts of data are needed to train machine-learning models to perform image classification tasks, such as identifying damage in satellite photos following a natural disaster. However, these data are not always easy to come by. Datasets may cost millions of dollars to generate, if usable data exist in the first place, and even the best datasets often contain biases that negatively impact a model’s performance.

To circumvent some of the problems that datasets present, MIT researchers developed a method for training a machine-learning model without a dataset. Instead, a special type of machine-learning model generates extremely realistic synthetic data, which is then used to train another model for downstream vision tasks.
