Transformers Language Interpretability

The Transformer is a deep learning architecture used primarily in natural language processing (NLP). Like recurrent neural networks (RNNs), transformers are designed to handle sequential data, such as natural language, for tasks such as translation and text summarisation. Unlike RNNs, however, transformers do not need to process the sequence in order: self-attention lets every position attend to every other position at once. In this experiment, we set out to explore transformer models and their underlying concepts, such as self-attention, interactively, using Google’s Language Interpretability Tool (LIT) and other tools.
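To make the idea of order-independent processing concrete, here is a minimal sketch of scaled dot-product self-attention in plain Python. It is an illustration only: it assumes identity query, key and value projections (a real transformer learns these as weight matrices) and omits positional encodings, and the names `softmax`, `self_attention` and `tokens` are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention over a list of token vectors.

    Assumption: queries, keys and values are the raw inputs themselves
    (identity projections); a trained model would use learned matrices.
    Returns (outputs, weights).
    """
    d = len(x[0])
    scale = math.sqrt(d)
    weights = []
    for q in x:
        # Every token is scored against every token, in no particular order.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in x]
        weights.append(softmax(scores))
    # Each output is an attention-weighted average of all token vectors.
    outputs = [
        [sum(w * v[j] for w, v in zip(row, x)) for j in range(d)]
        for row in weights
    ]
    return outputs, weights


# Three toy 2-dimensional "token embeddings" (made-up values).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens)
```

Each row of `weights` sums to 1 and says how much one token attends to every other token. Because no step depends on the previous position, reversing `tokens` simply reverses `outputs`, which is why real models add positional encodings to recover word order.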

Experiment objective

The objective of this experiment is to understand NLP models interactively and to answer some of the questions behind their predictions:

Better understand Transformer architecture visually
Investigate language models interactively through LIT (Language Interpretability Tool)
Explore visual representations of NLP concepts such as self-attention, using tools such as the embedding projector
Infer language model behaviour by identifying the kinds of examples on which the model performs poorly, and by asking: why did the model make this prediction? Does the model behave consistently if we change things like textual style, verb tense, or pronoun gender?
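The consistency question in the last objective can be sketched as a small perturbation test. The code below does not use LIT's API. It is a hypothetical stand-in in which `toy_sentiment` is a keyword-based placeholder for a real model and `swap_pronouns` handles only "he" and "she", both assumptions made purely for illustration.

```python
import re

# Hypothetical, deliberately tiny word lists for the placeholder model.
POSITIVE = {"great", "good", "brilliant"}
NEGATIVE = {"terrible", "bad", "awful"}
SWAPS = {"he": "she", "she": "he"}


def swap_pronouns(text):
    """Swap 'he' and 'she', preserving the capitalisation of the first letter."""
    def repl(match):
        word = match.group(0)
        swapped = SWAPS[word.lower()]
        return swapped.capitalize() if word[0].isupper() else swapped
    return re.sub(r"\b(he|she)\b", repl, text, flags=re.IGNORECASE)


def toy_sentiment(text):
    """Placeholder classifier: counts positive versus negative keywords."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score >= 0 else "negative"


def consistency_report(model, examples, perturb):
    """Return (original, perturbed, same_prediction) for each example."""
    report = []
    for text in examples:
        changed = perturb(text)
        report.append((text, changed, model(text) == model(changed)))
    return report


examples = ["He was great in that role.", "She gave a terrible performance."]
report = consistency_report(toy_sentiment, examples, swap_pronouns)
```

Any row in `report` whose last field is `False` marks an example where the prediction flipped under the perturbation and is worth inspecting. The same loop works with any callable model and any perturbation function, for example a change of verb tense.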

Business use case

This experiment has several direct and indirect applications, including:

Debugging deep learning black-box models: Because of the nested non-linear structure of deep learning algorithms, these highly successful models are usually applied in a black-box manner, i.e., no information is provided about what exactly causes them to arrive at their predictions. Such a lack of transparency can be a major drawback.
Explainability in the health care domain: As deep learning has become an active research area in health care, the increasingly widespread use of these models calls for explanations that hold them accountable.

The key findings from this experiment can also be applied to visualise high-dimensional data points using the embedding projector. Additionally, we can analyse interesting data points and see how these models perform on new data points on the fly.
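One common way to analyse an interesting data point in embedding space is to look up its nearest neighbours by cosine similarity. The sketch below illustrates this in plain Python. The three-dimensional vectors are made-up values rather than embeddings from any trained model, and `cosine_similarity` and `nearest_neighbours` are hypothetical helper names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def nearest_neighbours(query, embeddings, k=2):
    """Return the k labels most similar to `query`, highest similarity first."""
    scored = [
        (label, cosine_similarity(embeddings[query], vec))
        for label, vec in embeddings.items()
        if label != query
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up 3-dimensional embeddings, for illustration only.
embeddings = {
    "good": [0.9, 0.1, 0.0],
    "great": [0.8, 0.2, 0.1],
    "bad": [-0.7, 0.1, 0.2],
    "table": [0.0, 0.1, 0.9],
}
neighbours = nearest_neighbours("good", embeddings, k=2)
```

With real model embeddings the vectors would have hundreds of dimensions, but the lookup is the same: score the selected point against every other point and keep the top `k`.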

Environment setup

Python: for documentation, exploratory data analysis and preprocessing, using the reticulate package and Python regular expressions.
For training – AWS Deep Learning AMI (instance type: p3.2xlarge); given the heavy training load required to generate embeddings, we used GPUs to keep training time manageable
For inference – AWS Deep Learning AMI (instance type: t2.large or g4dn.xlarge); loading the trained model and using it for similarity scoring
