Interpretability, bias, ethics and transparency: what is the relationship? (3/3)

Last updated: 30.10.2020

This article was originally published in French on the Scilog blog associated with the magazine “Pour la Science” and on the Blog Binaire, blog of the magazine “Pour la Science”, for scientific popularization.

Target audience: all audiences, high school students 

Keywords: Biases, ethics, Deep learning, Explainable AI, Artificial Intelligence, Interpretability, Popularization

Short introduction: This article is the result of a collaboration between Marine LHUILLIER and myself in 2020. Marine was my intern and we worked together on an explainable AI project. During our collaboration, Marine specialized in research at the crossroads of AI and Cognitive Sciences. She is now a Research Engineer / Data Engineer.

This is the third and last article of our series questioning the concepts of interpretability and explainability of neural networks. We end here by opening onto the particular relationships between interpretability, bias, ethics and transparency of these networks.

Representations and biases of a neural network

The goal of an interpretability approach is to give access to the implicit memory of a neural network in order to extract the rules and representations it has encoded (i.e. learned or extracted from the data) during training; this extraction can take place during or after training. In other words, one can explore and make explicit the implicit “reasoning” that the network has built for itself from the many examples it has seen and from repeated practice (much as a human acquires knowledge through lived experience).

Just as a human has a representation of the world in which he lives, built according to his experience, a neural network, as it is fed with data, also builds its own representation. Although this representation is limited by the data the network has learned from, it still contains the network’s knowledge and, by extension, the reasons on which its behavior and predictions are based.

But in this case, what happens when an individual learns only one vision of the world (a deliberately exaggerated example: the sky is green during the day and grey at night)?

Then, no matter how many images of sunrises and sunsets, starry nights or rainy skies he has seen, he will always consider that all these images are wrong and that only what he has learned is right. As human beings, we nevertheless have a capacity for questioning and reconsideration which can, with time and experience, lead us to put what we have learned into perspective and to realize that there are other shades of color just as beautiful to observe in the sky.

Unfortunately, neural networks do not have this capacity, because they often go through a single learning phase (especially in supervised learning): they see many examples, certainly, but in the most common cases they learn from them once and for all. In other words, once a rule is learned, it becomes immutable! When classifying the colors of the sky, this has little impact, but let’s imagine that such an algorithm is used to assess the value of individuals according to their results, whatever they may be. Depending on the type of data it has received, and therefore on the rule(s) it has implicitly encoded, the algorithm could then consider that only women’s profiles correspond to a secretarial job and, conversely, that only men’s resumes should be retained for a technical job in the automobile industry. These stereotypical examples, although quite simple and basic, reflect a sad reality: the biases present in AI can be dangerous and discriminatory.

Indeed, AI algorithms and their decision making are influenced by the data they learn from. This raises the problem of the biases present in these data, but also of those coming directly from the developers who implement the algorithms. Poor data sets can lead to a biased implementation, with unfortunate consequences. If an AI algorithm learns wrong or biased rules from the data, its decisions will be just as wrong and will reproduce what is called selection or judgment bias. An infamous example is the label produced by Google in 2015 for a photo of two African-American people, who were classified as gorillas because the corpus of data was not representative enough of the diversity of the population…
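
The mechanism described above can be illustrated with a deliberately tiny sketch. It is not a neural network: the “model” below is a hypothetical majority-vote rule, and the training data are invented for the illustration. It only shows how a learner that is trained once on stereotyped examples freezes that stereotype into its rules.

```python
from collections import Counter, defaultdict

# Hypothetical, deliberately stereotyped training data: (gender, job) pairs.
# These numbers are invented for illustration only.
TRAINING_DATA = (
    [("woman", "secretary")] * 9 + [("woman", "technician")] * 1
    + [("man", "technician")] * 9 + [("man", "secretary")] * 1
)

def train(examples):
    """Learn one rule per gender: the job most often seen with it."""
    counts = defaultdict(Counter)
    for gender, job in examples:
        counts[gender][job] += 1
    # The learned "rule" keeps only the majority job and forgets the rest.
    return {gender: jobs.most_common(1)[0][0] for gender, jobs in counts.items()}

# A single learning phase: after this call the rules no longer change.
rules = train(TRAINING_DATA)

print(rules["woman"])  # secretary
print(rules["man"])    # technician
```

Even though one woman in ten in this invented data set is a technician, the learned rule ignores her entirely, and nothing changes until `train` is run again on corrected data.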

As we can see, the impact of these biases can be very serious, especially in critical areas such as health, defense or finance. Today, many studies examine these biases. Nevertheless, it is important to underline that knowing about them does not necessarily mean that we know how to eliminate them [Crawford, 2019]! Moreover, removing these biases from an algorithm can be very costly in terms of resources, because the neural network would have to be retrained after the correction. Depending on the network’s depth and complexity and on the volume of data to be processed, this cost can be very high, even prohibitive. These biases are therefore sometimes retained by default because they would be too expensive and uncertain to correct. For example, GPT-3, the language model developed by the company OpenAI in 2020 [Brown, 2020], is reported to require more than 350 GB of memory, with a training cost estimated at more than $12 million.
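
To get a feel for where a figure of the order of 350 GB can come from, here is a rough back-of-the-envelope sketch. GPT-3 has 175 billion parameters [Brown, 2020]; the storage size of 2 bytes per parameter (16-bit numbers) is an assumption made for this illustration, and the function name is hypothetical. The calculation counts only the stored parameters, not the extra memory needed during training.

```python
def model_memory_gb(n_parameters, bytes_per_parameter):
    """Memory needed just to store the parameters, in gigabytes (10**9 bytes)."""
    return n_parameters * bytes_per_parameter / 1e9

# Number of parameters of GPT-3, as reported in [Brown, 2020].
GPT3_PARAMETERS = 175_000_000_000

# Assumption for this sketch: each parameter is stored on 2 bytes (16 bits).
print(model_memory_gb(GPT3_PARAMETERS, 2))  # 350.0

# With 4 bytes per parameter (32 bits), the same model needs twice as much.
print(model_memory_gb(GPT3_PARAMETERS, 4))  # 700.0
```

A model of this size cannot be retrained casually, which is why correcting a bias after training can be so expensive.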

What to remember? 

In the context of demystifying neural networks, the AI algorithms also called “black boxes”, interpretability and explainability are two distinct but closely related research fields. While interpretability consists of studying and explaining the internal logic of neural networks, in other words how they function, explainability focuses on “how” to make sense of this internal logic in the most suitable way possible. Thus, interpretability work is often carried out by Machine Learning experts for other experts, whereas explainability can be aimed at both neophytes and experts.

Our interest in these two domains, like that of many researchers, makes sense in light of the societal and ethical issues discussed above: the impact of bias, discrimination, compliance with the GDPR (RGPD in French), etc. Indeed, better understanding how these algorithms work helps demystify AI and make these “black boxes” more transparent, intelligible and understandable. More generally, it also allows each and every one of us to gain confidence in these tools, which tend to become omnipresent in our daily lives, or, on the contrary, to refuse those that are not sufficiently transparent.

If we had access to the “how” and “why” behind the decisions of AI tools, we could then intervene in their functioning as well as in the data they use as a learning base, which are often our own data. We could then make sure that we are moving towards a more inclusive society where each of us is respected in all his or her diversity.

A word from the authors: This educational article presents one approach to interpretability. There are others, and for the more curious, the bibliography of each post is there for you! The posts are not intended to equate a human being with a neural network, because a human is much more complex! The examples provided are for illustration purposes only, to allow a better assimilation of the concepts. However, we would be happy to discuss them further with you! Marine LHUILLIER & Ikram

To cite this article:

Lhuillier M and Chraibi Kaadoud I, Interpretability, bias, ethics and transparency: what relationships? (3/3). Published on the blog of, October 2020