# Initial thoughts on CLEVR-XAI

# On assessing explanations

I started reading this paper on a benchmark dataset for explainable AI (XAI) evaluation, and I just need to write down some ideas. From what I can grasp, XAI problem is quite straight forward to define:

Given this function (f) and input (X), why it outputs (y)?

However, there are multiple not so straight forward ways to answer this question. Anyhow, given an interpreter, i.e., an algorithm which attempts to answer this question, and it is not mathematically proven its correctness, we then have to question if it is right or not. This paper claims that using simple problems with known answers is the way to do so. Moreover, it affirms Visual Question Answering (VQA) is the best method to assess XAI methods.

Although most of these points are valid, I think the claims are not exactly right. First, on VQA, given any explanation result is, as the papers says, a heatmap, this heatmap must then be a function of the model (f) given input (X), i.e., ( E(f |
X) ). Thus, my first counterargument is that one cannot fully ascertain the quality of an interpreter varying only the input. |

To put it simply, one has to vary both (f) and (X) to evaluate the interpreter (E), in which VQA would play the role of one possible (f). Thus, evaluating by one technique will always be a limited approach. Of course, we cannot think of possibly having to evaluate in all possible models, but one has to consider that the best interpreter for VQA might not be the same for Image Classification (IC).

Additionally, the idea of using a ground truth for very simple problems is indeed a valid approach, but still comparing an explanation with a ground truth explanation is a very limited solution. First, this supposes to know all possible explanations for a given input-output pair, which simply cannot be generated for real world comparisons, such as a model that plays chess. My impression is that this always **limits the model** to find *human* solutions, and punishes those able to find rare solutions or problems in the model or dataset.

On that regard, using ground truths also mixes interpreter evaluation with model evaluation, which for me is the most significant issue here. A “wrong” explanation may happen either because the interpreter is bad or because the model has problems, but it is hard to distinguish exactly which one we are evaluating with the ground truth.

Overall, this goes back to the origins of interpretability: we either want to know why the model got it right in this case and why it made a mistake on that other one. **If we are trying to interpret why the model made a mistake with a ground truth, I bet this will imply the interpreter is wrong**, which doesn’t make sense in my opinion. The perfect interpreter is always true to the model and only the model and the prefect interpreter metric is true to the interpreter and only the interpreter.

Needless to say it is unlikely there is a perfect interpreter or perfect interpreter metrics, but we must still be able to compare interpreter among themselves.

All in all, finding a good automated metric that makes sense is a difficult task, specially considering the limitations of using heatmaps as explanations for models that clearly are not only localizing, but looking at edges, textures, shapes, colors, and multiple other patterns. My hunch is that there is still good ideas in deep dream that can be explored: I think that can be proven the perfect explanation has to be connected with the inverse model (f^{-1}), so maybe there could be a metric that checks if the interpreter is “perfect”, i.e., which checks if it is possible to rebuild the input given the model output and the explanation.

\[e(E(f|X)|f(X)) = |f^{-1}(E, f(X)) - X|\]but who knows…

## On the ideal interpreter

The definition of explanation leads me to think on how to find the ideal interpreter, which should be able to to reach maximum consistency. It is a complicated problem, but I am thinking on something in the lines of Explanations for Occluded Images, which is the original proposer of the definition, but more efficient. Possibly something that would check for consistency and try to maximize it with a simple optimization mechanism.

In the end, the explanation definition leads to a metric, which leads to an interpreter methodology. I can play a bit with other definitions the same way to come up with other metrics and interpreters. It’s almost like making my own geometries 🤩 (just in a smaller scale). In the end, it’s a bit more like I am making a definition and I just have to show it conforms with all axioms.

A good next step would be to **organize and write down what all these axioms are**, based on the other papers I got in touch with. Moreover, I could critically think over them and analyze if more could be made or if one could be used to define others (as properties).

## On the test bases

I started thinking on comparing the representations of BarlowTwins (pretraining and finetuned), standard ResNet-50, ViT and a LAION-5B U-Net model. I think this could be an interesting comparison of self-supervised interpretability and with other interpreters as well.