Parikh wins NSF CAREER Award for Visual Question Answering research | ECE | Virginia Tech


Devi Parikh

ECE Assistant Professor Devi Parikh has earned a National Science Foundation (NSF) Faculty Early Career Development (CAREER) Award for her research on Visual Question Answering (VQA), which uses images to teach a computer to answer any question that might be asked about them. The techniques developed in this project have the potential to fundamentally improve the quality of life for the estimated 7 million adults with visual impairments in the United States.

VQA provides a new model through which humans can interface with visual data, and this can lend itself to applications like software that allows blind users to get quick answers to questions about their surroundings.

Parikh’s career goal is to enable machines to understand content in images and communicate this understanding as effectively as humans. The CAREER grant, the NSF's most prestigious award for junior faculty members who are expected to become academic leaders in their fields, will bring her one step closer.

Parikh and her team are building a deep, rich database that will mold a machine’s ability to respond accurately and naturally to visual images. Given an image and a question about the image, the machine's task is to automatically produce an answer that is not only correct but also concise, free form, and easily understood.

“To answer the questions accompanying the images, the computer needs an understanding of vision, language, and complex reasoning,” said Parikh.

For an image of a road and the question “Is it safe to cross the street?” the machine must judge the state of the road the way a pedestrian would, and answer “no” or “yes” depending on traffic, weather, and the time of day. Or, when presented with an image of a baby gleefully brandishing a pair of scissors, the machine must identify the baby, understand what it means to be holding something, and have the common sense to know that babies shouldn’t play with sharp objects.

The computer must learn these lessons one question–answer pair at a time, a lengthy, painstaking, and detailed process. With help from Amazon Mechanical Turk, an online marketplace for work, Parikh and her team will use this award to continue collecting a large dataset of images, questions, and answers, which will teach the computer how to understand a visual scene. The publicly available dataset contains more than 250,000 images, 750,000 questions (three for each image), and about 10 million answers.
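For readers who think in code, the shape of such a dataset can be pictured with a small Python sketch. This is only an illustration built from the rounded figures quoted above. The `VQAExample` record, its field names, the image identifier, and the sample answers are all hypothetical and are not taken from the team's actual data format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VQAExample:
    """One image-question pair with its collected answers (hypothetical layout)."""
    image_id: str
    question: str
    answers: List[str] = field(default_factory=list)


# Rounded figures quoted in the article, not exact counts.
NUM_IMAGES = 250_000
NUM_QUESTIONS = 750_000
NUM_ANSWERS = 10_000_000  # "about 10 million"

# Simple ratios implied by those figures.
questions_per_image = NUM_QUESTIONS / NUM_IMAGES
answers_per_question = NUM_ANSWERS / NUM_QUESTIONS

# A made-up record based on the street-crossing question from the article.
example = VQAExample(
    image_id="street_0001",            # hypothetical identifier
    question="Is it safe to cross the street?",
    answers=["no", "no", "yes"],       # hypothetical responses
)

print(f"Questions per image: {questions_per_image:.0f}")
print(f"Answers per question (rough): {answers_per_question:.1f}")
print(f"{example.question} -> {example.answers}")
```

Because the published counts are rounded, the answers-per-question ratio is only a rough estimate; the questions-per-image ratio matches the "three for each image" stated above.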

“Answering any possible question about an image is one of the holy grails of semantic scene understanding,” said Parikh. “VQA poses a rich set of challenges, many of which are key to automatic image understanding, and artificial intelligence in general.”

Teaching computers to understand images is a complex undertaking, especially if the goal is to enable the computer to provide a natural-language answer to a specific question. VQA is directly applicable to many situations where humans and machines must collaborate to understand pictures and images. Examples include assisting people with visual impairments in real-world situations (“What temperature is this oven set to?”), aiding security and surveillance analysts (“What kind of car did the suspect drive away in?”), and interacting with a robot (“Is my laptop in the bedroom?”).

The NSF award also includes an education component, and Parikh considers VQA a gateway subject into the field as a whole. Like science fiction, VQA captures the imagination of both technical and non-technical audiences, said Parikh.

“This work can serve as a gentle springboard to computer vision and artificial intelligence in general,” said Parikh, who has committed to improving the computer vision curriculum at Virginia Tech by introducing an emphasis on presentation and writing skills in her new Advanced Computer Vision course.

Parikh, who earned her Ph.D. at Carnegie Mellon University, has been with Virginia Tech since January 2013. She leads the Computer Vision lab and is a recipient of the Army Research Office (ARO) Young Investigator Program (YIP) award (2014), the Allen Distinguished Investigator Award in Artificial Intelligence from the Paul G. Allen Family Foundation (2014), the Virginia Tech ICTAS JFC award (2015), three Google Faculty Research Awards (2012, 2014, and 2015), and an Outstanding New Assistant Professor award from the College of Engineering at Virginia Tech (2015).