A new method automatically describes, in natural language, the functions of individual neurons in a neural network.
Neural networks are sometimes called “black boxes” because, even though they can outperform humans on certain tasks, the researchers who design them often do not understand how or why they work so well. But if a neural network is to be used outside the lab, for example to classify medical images that could help diagnose heart conditions, researchers need to understand how it works in order to predict how it will behave in practice.
Researchers at MIT have developed a method that sheds some light on the inner workings of these black boxes. Neural networks, loosely modeled on the human brain, are organized into layers of interconnected nodes, or “neurons,” that process data. The new system can automatically generate descriptions of individual neurons in English or another natural language.
In a neural network trained to recognize animals in images, for instance, their method might describe a certain neuron as detecting the ears of foxes. The technique is scalable, and it generates more accurate and more specific descriptions of individual neurons than other methods do.
The team has just published a paper showing how this method can audit a neural network to find out what it has learned. It can also edit a network by identifying incorrect or unhelpful neurons and switching them off.
“We created a way for a machine-learning practitioner to give their model to the system, and it will tell them everything it knows about the model, from the perspective of its neurons, in language. This lets you answer the fundamental question: ‘Is there anything my model knows that I wouldn’t have expected?’” says Evan Hernandez, a graduate student in the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) and lead author of the paper.
Co-authors include Sarah Schwettmann, a postdoctoral researcher at CSAIL; David Bau, a recent CSAIL graduate and incoming assistant professor of computer science at Northeastern University; Teona Bagashvili, a former visiting student; Antonio Torralba, the Delta Electronics Professor of Electrical Engineering and Computer Science; and senior author Jacob Andreas, the X Consortium Assistant Professor in CSAIL. The research will be presented at the International Conference on Learning Representations.
Automatically generated descriptions
Most existing methods that help machine-learning practitioners understand how a model works either describe the entire network or require researchers to specify, in advance, the concepts they think individual neurons might be focusing on.
Hernandez and his collaborators developed a system called MILAN (mutual-information-guided linguistic annotation of neurons) that improves on these methods. It automatically generates natural-language descriptions of all the neurons in a network, without requiring a list of concepts in advance. This is particularly important because a single neural network can contain hundreds of thousands of individual neurons.
MILAN describes neurons in neural networks trained to perform computer vision tasks such as object recognition and image synthesis. To describe a given neuron, the system first examines that neuron’s behavior over thousands of images to find the image regions in which it is most active. It then selects a natural-language description that maximizes the pointwise mutual information between the image regions and the description. This ensures that the descriptions capture each neuron’s distinctive role within the larger network.
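The selection rule can be sketched in a few lines. Everything below is a toy illustration: the candidate descriptions and probabilities are made-up stand-ins for the learned models MILAN actually uses over image regions and text.

```python
import math

# Toy sketch of pointwise-mutual-information-guided description selection.
# For a neuron with maximally activating image regions E, pick the
# description d maximizing PMI(d, E) = log p(d | E) - log p(d).

def pick_description(candidates, p_desc_given_regions, p_desc):
    """Return the candidate description with the highest PMI score."""
    def pmi(d):
        return math.log(p_desc_given_regions[d]) - math.log(p_desc[d])
    return max(candidates, key=pmi)

# Hypothetical numbers: a generic description ("dog") is common overall,
# so a specific description that is rare in general but likely given this
# neuron's regions scores higher.
candidates = ["dog", "dog ears"]
p_desc = {"dog": 0.20, "dog ears": 0.01}                # prior over descriptions
p_desc_given_regions = {"dog": 0.50, "dog ears": 0.30}  # fit to this neuron's regions

print(pick_description(candidates, p_desc_given_regions, p_desc))  # "dog ears"
```

Dividing out the prior is what pushes the system past generic labels: “dog” fits the regions well but is so common that the specific phrase wins.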
“A neural network trained to recognize dogs will have many neurons that detect dogs. But there are many types of dogs, and many different parts of dogs. So even though ‘dog’ may be an accurate description of many of these neurons, it isn’t very informative. We want descriptions that capture exactly what each neuron does: not just dogs but, say, the ears of German shepherds,” Hernandez says.
The researchers compared MILAN with other methods and found that it produced richer, more detailed descriptions. But they were more interested in how it could help answer specific questions about computer vision models.
Analyzing, editing, and auditing neural networks
They first used MILAN to determine which neurons are most important in a neural network. They generated descriptions for every neuron and sorted the neurons into groups based on the words in those descriptions. Then they removed neurons one group at a time to see how the network’s accuracy changed, and found that neurons whose descriptions contained two very different words (e.g., “vases” and “fossils”) were less important to the network.
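The ablation analysis might look roughly like the following toy sketch. The “network,” the dataset, and the description groups are all invented for illustration; the real experiment silences neurons in a trained vision model and re-measures its test accuracy.

```python
# Toy ablation sketch: group neurons by their descriptions, disable one
# group at a time, and measure how much accuracy drops.

def accuracy(model_fn, dataset, disabled=frozenset()):
    """Fraction of examples the model classifies correctly."""
    correct = sum(1 for features, label in dataset
                  if model_fn(features, disabled) == label)
    return correct / len(dataset)

# Hypothetical "network": each neuron votes for one class; disabled
# neurons contribute nothing.
NEURON_CLASS = {0: "cat", 1: "cat", 2: "dog"}

def toy_model(features, disabled):
    votes = {}
    for neuron, activation in enumerate(features):
        if neuron in disabled or activation <= 0:
            continue
        cls = NEURON_CLASS[neuron]
        votes[cls] = votes.get(cls, 0.0) + activation
    return max(votes, key=votes.get) if votes else None

dataset = [((1.0, 0.5, 0.1), "cat"), ((0.1, 0.0, 2.0), "dog")]
groups = {"cat-like descriptions": {0, 1}, "dog-like descriptions": {2}}

baseline = accuracy(toy_model, dataset)
for name, neurons in groups.items():
    drop = baseline - accuracy(toy_model, dataset, disabled=frozenset(neurons))
    print(f"{name}: accuracy drop {drop:.2f}")
```

The larger the drop when a group is disabled, the more important that group of neurons is to the network.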
They also used MILAN to audit models and check whether they had learned anything unexpected. The researchers took image classification models that were trained on datasets in which human faces were blurred out, ran MILAN, and counted how many neurons were nevertheless sensitive to human faces.
“Blurring the faces reduces the number of face-sensitive neurons, but it far from eliminates them. Some of these neurons are even sensitive to specific demographic groups, which is quite surprising. These models have never seen a human face, and yet all kinds of facial processing happens inside them,” Hernandez says.
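The counting step of such an audit can be sketched as a simple scan over MILAN’s generated descriptions. The descriptions and the helper function below are invented for illustration.

```python
# Toy audit sketch: count neurons whose MILAN descriptions mention faces,
# even though faces were blurred in the training data.

descriptions = {
    0: "wheels and road markings",
    1: "human faces in profile",
    2: "text on storefront signs",
    3: "smiling face, mouth region",
}

def find_sensitive_neurons(descriptions, terms):
    """Return ids of neurons whose description mentions any flagged term."""
    return sorted(n for n, d in descriptions.items()
                  if any(t in d.lower() for t in terms))

face_neurons = find_sensitive_neurons(descriptions, terms=("face",))
print(f"{len(face_neurons)} of {len(descriptions)} neurons mention faces: {face_neurons}")
```

Because the descriptions are open-ended natural language, this kind of keyword scan works for any concept of interest, not just the ones a researcher thought to probe for up front.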
In a third application, the team used MILAN to edit a neural network: they found and removed neurons that were detecting spurious correlations in the data, which improved the network’s accuracy on inputs exhibiting the problematic correlation by 5 percent.
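A minimal sketch of this kind of edit, assuming hypothetical descriptions and a single layer of activations: neurons whose descriptions match a flagged spurious concept are switched off by masking their activations to zero.

```python
# Toy editing sketch: build a mask from the neuron descriptions and apply
# it to one layer's activations, silencing the flagged neurons.

def make_edit_mask(descriptions, spurious_terms):
    """1.0 keeps a neuron's activation; 0.0 switches it off."""
    return [0.0 if any(t in d.lower() for t in spurious_terms) else 1.0
            for d in descriptions]

def edited_layer(activations, mask):
    return [a * m for a, m in zip(activations, mask)]

descriptions = ["dog ears", "watermark text", "grass texture"]
mask = make_edit_mask(descriptions, spurious_terms=["watermark"])
print(edited_layer([0.9, 0.7, 0.2], mask))  # [0.9, 0.0, 0.2]
```

In a real framework this mask would typically be applied inside the forward pass (for example, via a layer hook) so the edited network can be evaluated end to end.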
Although the researchers were impressed with MILAN’s performance in these three applications, the system sometimes produces descriptions that are too vague, or makes an incorrect guess when it does not recognize a concept.
They plan to address these limitations in future work. They also want to improve the richness of MILAN’s descriptions, extend the system to other types of neural networks, and use it to describe the behavior of groups of neurons, since neurons work together to produce an output.
“This is a bottom-up approach to interpretability. The goal is to generate open-ended, compositional descriptions of function using natural language. We want to harness the expressive power of human language to produce richer, more natural descriptions of what neurons do. What excites me most is being able to generalize this approach to other types of models,” Schwettmann says.
Andreas says the ultimate test for any explainable-AI technique is whether it helps researchers and users make better decisions about when and where to deploy AI systems. There is still a long way to go before that can be done in a general way, he adds, but MILAN, and the broader use of language as an explanatory tool, should become a valuable part of the toolbox.