
Deep Science: Combining vision and language could be the key to more capable AI




Depending on the theory of intelligence to which you subscribe, achieving “human-level” AI will require a system that can leverage multiple modalities (e.g., sound, vision and text) to reason about the world. For example, when shown an image of a toppled truck and a police cruiser on a snowy freeway, a human-level AI might infer that dangerous road conditions caused an accident. Or, running on a robot and asked to grab a can of soda from the refrigerator, it would navigate around people, furniture and pets to retrieve the can and place it within reach of the requester.
Today’s AI falls short. But new research …
