Interpretable machine learning: some needed definitions

Last updated: 15.03.2022

Short introduction: Interpretability in machine learning is a very active area. The number of publications is exploding, as is the number of projects on GitHub, and many popular-science articles cover these topics. Paradoxically, however, it is difficult to find clear definitions of commonly used terms. I gather here definitions taken from a scientific article (reviewed and validated by researchers).
The definitions below are extracted from the original article cited at the end, which was published in August 2021.

In the neural network literature, concepts like latent space and latent representation have been developed and widely used. However, to the best of our knowledge, no complete definitions have been clearly proposed for such concepts. Given the importance of both concepts, we choose to formulate their definitions next. We start by defining basic concepts like data, information, features and knowledge:

Definition 2.1 The raw material that represents the input of an algorithm is called data. Data can be noisy, partial/complete, un/structured and of different types [Grazzini and Pantisano, 2015; Malhotra and Nair, 2015].

Definition 2.2 A data set is a collection of data that describes real-world objects (such as cars, documents, animals, etc.) through multiple properties called features [Bishop, 2006].

Definition 2.3 Once data is analyzed and correlated, it represents information. Information can be reproduced from data, and its importance depends on the context it is generated from or for [Grazzini and Pantisano, 2015; Malhotra and Nair, 2015].

Definition 2.4 Knowledge is a set of information that is assessed by a human, i.e. a human adds value and semantics according to his/her own background and context [Grazzini and Pantisano, 2015; Malhotra and Nair, 2015].

Definition 2.9 Neural networks are machine learning models that come in several architectures and are usually structured in one or several layers (input, hidden and output). Each layer is composed of one or several computational units called artificial neurons, conceptually derived from biological neurons [McCulloch and Pitts, 1943; Abraham, 2005]. Computational units can also be Long Short-Term Memory units (also well known as LSTM) [Hochreiter and Schmidhuber, 1997] or Gated Recurrent Units [Cho et al., 2014]. A deep neural network has many hidden layers, units, and weighted edges. Units of layer n can be fully or partially connected to units of layer n + 1. Due to this inner complexity, deep neural networks are a typical example of black boxes.

Definition 2.10 In neural networks, an activation pattern refers to the activation values of the units of one layer. An activation pattern is a numerical vector whose size is that of the layer it is associated with. A hidden pattern refers to the activation pattern of a hidden layer.
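
To make definition 2.10 concrete, here is a minimal Python sketch of a forward pass that collects one activation pattern per layer. The network size, the weights, the sigmoid activation and the function names are all assumptions chosen for illustration; they do not come from the original article.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights, biases):
    """Return the activation pattern of one fully connected layer."""
    return [
        sigmoid(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]

def activation_patterns(inputs, layers):
    """Run a forward pass and collect the activation pattern of every layer."""
    patterns = []
    current = inputs
    for weights, biases in layers:
        current = layer_forward(current, weights, biases)
        patterns.append(current)
    return patterns

# Hypothetical toy network: 3 inputs -> 4 hidden units -> 2 output units.
# The weights are arbitrary illustrative values, not a trained model.
HIDDEN = (
    [[0.2, -0.5, 0.1], [0.7, 0.3, -0.2], [-0.4, 0.9, 0.5], [0.0, 0.1, -0.3]],
    [0.0, 0.1, -0.1, 0.2],
)
OUTPUT = (
    [[0.5, -0.6, 0.3, 0.8], [-0.2, 0.4, 0.1, -0.7]],
    [0.05, -0.05],
)

patterns = activation_patterns([1.0, 0.5, -1.0], [HIDDEN, OUTPUT])
hidden_pattern, output_pattern = patterns  # one numerical vector per layer
```

In this sketch, hidden_pattern is a vector of 4 values (the size of the hidden layer) and is what the definition calls a hidden pattern.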

Definition 2.11 Latent space refers to the abstract multidimensional space associated with each layer of a neural network, where the representation of the learned data is implicitly built. The latent space contains the meaningful internal representations of the features (definition 2.2) of the learned data, which makes it not directly interpretable. In a deep neural network (definition 2.9), each hidden layer, whether or not the layers have the same number of units, has its own latent space. It is thus possible to extract several implicit representations from this network.
The latent space can be used to achieve data dimensionality reduction when the hidden layer is smaller than the input layer. This is the case, for example, with autoencoders and variational autoencoders [Kingma and Welling, 2014], models that can reduce high-dimensional inputs to efficient and representative low-dimensional representations [Roberts et al., 2018b].
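
As a rough illustration of dimensionality reduction through a smaller hidden layer, the sketch below trains a tiny linear autoencoder in plain Python: 3 input features are compressed into a single hidden unit and then reconstructed. Everything here is an assumption made for the example (the toy data, the linear units, the learning rate, the function names); it is not the method of the cited papers, and real autoencoders are normally built with a deep learning library.

```python
def encode(x, w_enc):
    """Map a 3-dimensional input to one latent value (the hidden unit)."""
    return sum(w * xi for w, xi in zip(w_enc, x))

def decode(z, w_dec):
    """Map the latent value back to a 3-dimensional reconstruction."""
    return [w * z for w in w_dec]

def mean_squared_error(data, w_enc, w_dec):
    total = 0.0
    for x in data:
        recon = decode(encode(x, w_enc), w_dec)
        total += sum((r - xi) ** 2 for r, xi in zip(recon, x))
    return total / len(data)

def train(data, w_enc, w_dec, lr=0.1, steps=500):
    """Plain batch gradient descent on the reconstruction error."""
    w_enc, w_dec = list(w_enc), list(w_dec)
    n = len(data)
    for _ in range(steps):
        g_enc = [0.0] * len(w_enc)
        g_dec = [0.0] * len(w_dec)
        for x in data:
            z = encode(x, w_enc)
            err = [w * z - xi for w, xi in zip(w_dec, x)]
            back = sum(e * w for e, w in zip(err, w_dec))  # error sent back to z
            for i in range(len(w_dec)):
                g_dec[i] += 2 * err[i] * z / n
            for i in range(len(w_enc)):
                g_enc[i] += 2 * back * x[i] / n
        w_enc = [w - lr * g for w, g in zip(w_enc, g_enc)]
        w_dec = [w - lr * g for w, g in zip(w_dec, g_dec)]
    return w_enc, w_dec

# Toy data: 3 features that all depend on one underlying factor t.
data = [[0.5 * t, 1.0 * t, -0.5 * t] for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]

init_enc, init_dec = [0.1, 0.1, 0.1], [0.1, 0.1, 0.1]
initial_loss = mean_squared_error(data, init_enc, init_dec)
w_enc, w_dec = train(data, init_enc, init_dec)
final_loss = mean_squared_error(data, w_enc, w_dec)

# Each object is now described by one latent value instead of three features.
latent_codes = [encode(x, w_enc) for x in data]
```

Because the toy data really has only one degree of freedom, a single hidden unit is enough to reconstruct it; on real data the reconstruction would be approximate.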

Definition 2.12 A latent or hidden representation refers to the data representation implicitly encoded by a neural network during the learning task, and is thus hidden-layer dependent [Bengio et al., 2013]. It is a machine-readable data representation that contains features of the original data that have been learned by the associated hidden layer. One key property of latent space (definition 2.11) is that real-world objects (definition 2.2) that are semantically close (e.g. cars of different brands) will end up grouped together in one latent space: their respective hidden representations in the corresponding layer will be closer to each other than to those of objects that are not semantically close (e.g. cats) [Roberts et al., 2018a]. Thus, a latent representation is useful for pattern analysis (definition 2.5) and for similarity detection between objects (definition 2.2) using clustering methods.
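
The closeness property can be sketched with a simple distance computation. The vectors below are made-up values standing in for hidden representations; they were not produced by a trained network, and the object names and helper functions are hypothetical. Euclidean distance is one possible choice among several (cosine similarity is another common one).

```python
import math

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbour(name, representations):
    """Return the name of the object whose representation is closest to `name`."""
    target = representations[name]
    others = {k: v for k, v in representations.items() if k != name}
    return min(others, key=lambda k: euclidean_distance(target, others[k]))

# Made-up hidden representations (illustrative values only).
hidden_representations = {
    "car_brand_a": [0.91, 0.12, 0.40],
    "car_brand_b": [0.88, 0.15, 0.35],
    "cat": [0.10, 0.95, 0.70],
}

closest_to_car_a = nearest_neighbour("car_brand_a", hidden_representations)
car_to_car = euclidean_distance(
    hidden_representations["car_brand_a"], hidden_representations["car_brand_b"]
)
car_to_cat = euclidean_distance(
    hidden_representations["car_brand_a"], hidden_representations["cat"]
)
```

With these values the two cars are each other's nearest neighbour, which is the kind of grouping a clustering method would pick up on real hidden representations.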

To cite these definitions, please cite the original article:

Ikram Chraibi Kaadoud, Lina Fahed, Philippe Lenca. Explainable AI: a narrative review at the crossroad of Knowledge Discovery, Knowledge Representation and Representation Learning. Twelfth International Workshop Modelling and Reasoning in Context (MRC) @IJCAI 2021, Aug 2021, Montréal (virtual), Canada. pp.28-40. ⟨hal-03343687⟩

A little note: I hope these definitions help you better understand the concepts related to interpretable ML.


  • [Abraham, 2005] Ajith Abraham. Artificial neural networks. Handbook of measuring system design, 2005.
  • [Bengio et al., 2013] Y. Bengio, A. Courville, and P. Vincent. Representation learning: A review and new perspectives. IEEE transactions on pattern analysis and machine intelligence, 35(8):1798–1828, 2013.
  • [Bishop, 2006] Christopher M Bishop. Pattern recognition and machine learning. Springer, 2006.
  • [Cho et al., 2014] Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using RNN encoder–decoder for statistical machine translation. In The 18th Empirical Methods in Natural Language Processing, pages 1724–1734, Doha, Qatar, October 2014. Association for Computational Linguistics.
  • [Grazzini and Pantisano, 2015] Jacopo Grazzini and Francesco Pantisano. Collaborative research-grade software for crowd-sourced data exploration: from context to practice – part I: Guidelines for scientific evidence provision for policy support based on big data and open technologies. EUR 27094. Luxembourg: Publications Office of the European Union, (JRC94504), 2015.
  • [Hochreiter and Schmidhuber, 1997] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural computation, 9(8):1735–1780, 1997.
  • [Kingma and Welling, 2014] Diederik P Kingma and Max Welling. Auto-encoding variational bayes. In The 2nd International Conference on Learning Representations, 2014.
  • [Malhotra and Nair, 2015] Meenakshi Malhotra and TR Gopalakrishnan Nair. Evolution of knowledge representation and retrieval techniques. International Journal of Intelligent Systems and Applications, 7(7):18–28, 2015.
  • [McCulloch and Pitts, 1943] Warren S McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The bulletin of mathematical biophysics, 5(4):115–133, 1943.
  • [Roberts et al., 2018a] Adam Roberts, Jesse Engel, Colin Raffel, Curtis Hawthorne, and Douglas Eck. A hierarchical latent vector model for learning long-term structure in music. In The 35th International Conference on Machine Learning, pages 4364–4373. PMLR, 2018.
  • [Roberts et al., 2018b] Adam Roberts, Jesse H Engel, Sageev Oore, and Douglas Eck. Learning latent representations of music to generate interactive musical palettes. In IUI Workshops, 2018.