New AI System Enables Machines That See the World More Like Humans Do

Every once in a while, computer vision systems draw illogical conclusions about a scene. When processing an image of a dinner table, for example, a robot may ignore a bowl that is clearly visible to any human observer, estimate that a plate is floating above the table, or misinterpret a fork as penetrating a bowl rather than leaning against it.

When a computer vision system is built into a self-driving car, the stakes become significantly higher: such systems have been shown to fail to detect emergency vehicles and pedestrians crossing the road.

Researchers at the Massachusetts Institute of Technology addressed these issues by developing a framework that allows machines to perceive the world more like humans do. Their new artificial intelligence system for scene analysis learns to perceive real-world objects from a small number of images, and then interprets scenes in terms of the objects it has been taught.

Using probabilistic programming, the researchers created a framework that lets the system compare detected objects against the input data to determine whether the images captured by a camera are a likely match for any candidate scene. Through probabilistic inference, the system can then decide whether mismatches are most likely caused by sensor noise or by errors in scene interpretation that need to be corrected through additional processing.
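The idea of scoring a candidate scene against camera data can be sketched with a simple likelihood model. The sketch below is purely illustrative, not the researchers' actual code: each observed depth value is explained either by Gaussian noise around the depth the candidate scene predicts, or by a uniform "outlier" component, so that a few noisy pixels do not sink an otherwise good scene hypothesis. All function names, depth values, and noise parameters here are assumptions for illustration.

```python
import math

def pixel_loglik(observed, rendered, sigma=0.02, p_outlier=0.05, depth_range=5.0):
    """Log-likelihood of one observed depth pixel under a candidate scene.

    Mixture model: with probability (1 - p_outlier) the pixel is Gaussian
    noise around the rendered depth; with probability p_outlier it is an
    outlier drawn uniformly over the sensor's depth range.
    """
    gauss = math.exp(-0.5 * ((observed - rendered) / sigma) ** 2) \
            / (sigma * math.sqrt(2 * math.pi))
    uniform = 1.0 / depth_range
    return math.log((1 - p_outlier) * gauss + p_outlier * uniform)

def scene_loglik(observed_depth, rendered_depth):
    """Score how well a candidate scene explains an observed depth image."""
    return sum(pixel_loglik(o, r) for o, r in zip(observed_depth, rendered_depth))

# Compare two candidate scenes against the same (toy) observation:
observed = [1.00, 1.01, 0.99, 1.02]          # depths in meters
scene_a  = [1.00, 1.00, 1.00, 1.00]          # object resting on the table
scene_b  = [0.80, 0.80, 0.80, 0.80]          # object floating above it
best = max([scene_a, scene_b], key=lambda s: scene_loglik(observed, s))
```

Because the floating hypothesis misses the observed depths by ten standard deviations, its pixels fall back on the low-probability outlier term, and the resting hypothesis wins the comparison.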

This common-sense safeguard allows the system to detect and correct many of the errors that plague “deep-learning” approaches to computer vision. Probabilistic programming also makes it possible to infer probable contact relationships between objects in a scene, and to apply commonsense reasoning about those contacts to infer more precise positions for objects.

“If you are not aware of the contact relationships, you could argue that an object is floating above the table, which would be a valid explanation. As humans, we understand that this is physically implausible and that an object resting on the table’s surface is a more likely pose. Because our reasoning system is aware of this type of knowledge, it can infer more precise poses. That is a crucial insight of this work,” says Nishad Gothoskar, a PhD student in electrical engineering and computer science (EECS) with the Probabilistic Computing Project and lead author of the paper.
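The kind of reasoning Gothoskar describes can be sketched as a contact constraint applied to a pose estimate: if a hypothesized object hovers within a small tolerance of a supporting surface, the physically plausible explanation is that it rests on that surface. This is a hypothetical illustration of the idea, not the paper's inference procedure; the function name and tolerance are assumptions.

```python
CONTACT_EPS = 0.03  # meters: largest gap still treated as "resting on"

def apply_contact_prior(object_bottom_z, table_top_z, eps=CONTACT_EPS):
    """Refine an object's vertical pose using a commonsense contact prior.

    If the estimated bottom of the object floats within eps of the table
    surface, snap it onto the surface; otherwise leave the estimate alone
    (the object may genuinely be elsewhere, e.g. held in mid-air).
    """
    gap = object_bottom_z - table_top_z
    if 0.0 <= gap <= eps:
        return table_top_z
    return object_bottom_z

# A pose estimate that leaves the bowl hovering 2 cm above the table
# is corrected to rest on the surface; a bowl 30 cm up is left alone.
refined_resting = apply_contact_prior(object_bottom_z=0.92, table_top_z=0.90)
refined_lifted  = apply_contact_prior(object_bottom_z=1.20, table_top_z=0.90)
```

The asymmetry in the check (only small positive gaps are snapped) reflects the commonsense constraint: objects cannot hover just above a surface, but they can legitimately sit far above it.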

In addition to increasing the safety of self-driving cars, this research has the potential to improve the performance of computer perception systems that must interpret complex arrangements of objects, such as a robot tasked with cleaning a cluttered kitchen.
