The traceability of AI in language understanding: focus on transparency and quality

The urgency of transparency: why artificial intelligence needs to be traceable

The rapid development of artificial intelligence (AI) has brought with it significant challenges in terms of transparency and trust. The complexity and size of modern models often lead to opaque decision-making processes, which raises concerns about the reliability and fairness of their outputs. Identifying and mitigating biases within these models is crucial for building trust and improving their usability across applications. As AI systems become more widespread, compliance with ethical standards is also becoming increasingly important. Transparency in the decision-making process helps to identify and avoid bias and discrimination. Regulatory measures such as the European AI Act require traceability and transparency of AI decisions to ensure the ethical use of AI.

Explainable AI (XAI): a necessary step

Explainable AI (XAI) aims to make the decisions of AI systems understandable to humans. This transparency is crucial, especially in application areas such as medical diagnosis, traffic control and automated decisions in the financial sector. The ability to understand and explain decisions made by AI systems is essential for gaining and maintaining the trust of users.

Explainable AI is a dynamic and growing field of research. New algorithms and methods are constantly being developed to provide deeper and more comprehensible insights into the "black box" of AI. However, the challenges of implementing XAI in practice remain considerable. Companies must increase the acceptance of algorithm-based decisions among their customers and employees by improving the traceability and transparency of AI systems.

Explainable AI is used in many different areas. Explainability is particularly important in image classification, text classification and tabular data. For example, the LIME method (Local Interpretable Model-agnostic Explanations) can be used to explain the decisions of a model by highlighting relevant features. The Fraunhofer IAO and the University of Stuttgart have developed VitrAI, a demonstrator that shows the explainability of AI in various practical scenarios.
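LIME's core idea — perturb the input and see how the model's prediction changes locally — can be illustrated with a deliberately simplified, stdlib-only sketch. The toy classifier and the leave-one-token-out attribution below are assumptions for illustration; the real `lime` package fits a locally weighted linear surrogate and does considerably more.

```python
def predict_positive(text: str) -> float:
    """Toy sentiment classifier: score = fraction of known positive words.
    Stands in for any black-box model that outputs a probability."""
    positive = {"good", "great", "excellent", "happy"}
    tokens = text.lower().split()
    if not tokens:
        return 0.5
    return sum(t in positive for t in tokens) / len(tokens)

def lime_style_explanation(text: str, predict) -> list[tuple[str, float]]:
    """Attribute importance to each token by removing it and measuring
    how much the prediction drops -- a crude stand-in for LIME's
    perturbation-based local explanations."""
    tokens = text.split()
    base = predict(text)
    weights = []
    for i, tok in enumerate(tokens):
        perturbed = " ".join(tokens[:i] + tokens[i + 1:])
        weights.append((tok, base - predict(perturbed)))
    # Most influential tokens first, as LIME highlights them.
    return sorted(weights, key=lambda w: abs(w[1]), reverse=True)

explanation = lime_style_explanation("the food was really good", predict_positive)
print(explanation[0][0])  # "good" -- the token driving the prediction
```

Removing "good" collapses the toy score, so it surfaces as the most relevant feature — the same intuition LIME applies to far more complex models.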

Language comprehension tasks: The heart of language processing

Natural Language Inference (NLI) is an important task in the field of Natural Language Processing (NLP). It helps to understand the relationship between two sentences. The aim is to determine whether a hypothesis sentence follows from a premise sentence (entailment), contradicts it, or is neutral with respect to it.

Let’s take a look at an example:

Premise: “The doctor was late and the nurse was not happy about it.”
Hypothesis: “She greeted him.”

Here the relationship is "neutral". The premise says that the nurse wasn't happy, but that doesn't tell us whether she greeted the doctor. These actions are independent of each other.

Another example:

Premise: “The sky is cloudy and it looks like rain.”
Hypothesis: “It will rain soon.”

Here the hypothesis follows from the premise. If the sky is cloudy and it looks like rain, we expect it to rain soon. This relationship is called "entailment".

Understanding such relationships is very important for the development of AI systems. These systems must be able to understand human language and draw the right conclusions. NLI is central to many applications of machine learning and artificial intelligence (AI), such as text classification, machine translation and chatbots.
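The two examples above can be written as plain records — the format in which NLI datasets such as SNLI are typically handled:

```python
# The three standard NLI labels and the two example pairs from the text,
# represented as simple records.
LABELS = ("entailment", "contradiction", "neutral")

examples = [
    {
        "premise": "The doctor was late and the nurse was not happy about it.",
        "hypothesis": "She greeted him.",
        "label": "neutral",
    },
    {
        "premise": "The sky is cloudy and it looks like rain.",
        "hypothesis": "It will rain soon.",
        "label": "entailment",
    },
]

for ex in examples:
    assert ex["label"] in LABELS  # every record carries one of the three labels
    print(f'{ex["label"]}: {ex["hypothesis"]}')
```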

Why language understanding is crucial for the future of AI

Understanding the meaning and relationships between sentences is fundamental to NLI, and it underlies many other NLP tasks as well, such as question answering (QA) and sentiment analysis. Understanding a model's reasoning in language comprehension tasks can help build trust and transparency. Explanations can show which features have the greatest influence on the model's decisions and thus contribute to more transparency in AI. Moreover, people learn not only from labeled data but also from explanations; NLP models may be able to do the same. By incorporating model explanations, it is possible to train more effective models with better performance.

The SNLI dataset: the basis for robust language comprehension models

For the evaluation of LLM capabilities in generating explanations for an NLI task, two main data sets were used:

SNLI dataset: Consists of 570,000 human-written English sentence pairs with the labels entailment, contradiction and neutral. The premises come from image captions in the Flickr30k corpus, while the hypotheses were written by crowdsourced annotators.

eSNLI dataset: Extends SNLI with explanations that the annotators had to provide to justify the labels, including highlighted words that support each label.
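eSNLI annotations mark the words that justify a label; one common serialisation wraps highlighted words in asterisks. A minimal sketch for extracting them — the field names in the record below are illustrative, not the dataset's actual column names:

```python
import re

def extract_highlights(marked_sentence: str) -> list[str]:
    """Pull out the words an annotator highlighted, assuming the
    convention of wrapping them in asterisks: "A *dog* runs" -> ["dog"]."""
    return re.findall(r"\*([^*]+)\*", marked_sentence)

record = {
    # Hypothetical eSNLI-style record for illustration.
    "premise_marked": "The sky is *cloudy* and it looks like *rain*.",
    "label": "entailment",
    "explanation": "A cloudy sky that looks like rain suggests it will rain soon.",
}

print(extract_highlights(record["premise_marked"]))  # ['cloudy', 'rain']
```

The highlighted tokens link the free-text explanation back to concrete evidence in the sentence pair, which is what makes eSNLI useful for supervising explanations.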

Methodology: How the traceability of AI is assessed

Data exploration

The first steps of the data analysis included the examination of token counts, sentence lengths, cosine similarities, label distributions and bigram analyses. This analysis helps to develop a deeper understanding of the data structure and the challenges involved in modeling.
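The exploratory statistics listed above can be sketched with the standard library alone; the two example sentences are invented, and a real analysis would of course run over the full dataset:

```python
from collections import Counter
from math import sqrt

def token_stats(sentences: list[str]) -> dict:
    """Token counts, average sentence length and top bigrams."""
    tokens = [t.lower() for s in sentences for t in s.split()]
    bigrams = Counter(zip(tokens, tokens[1:]))
    return {
        "n_tokens": len(tokens),
        "avg_len": sum(len(s.split()) for s in sentences) / len(sentences),
        "top_bigrams": bigrams.most_common(3),
    }

def cosine_similarity(a: str, b: str) -> float:
    """Bag-of-words cosine similarity between two sentences."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = sqrt(sum(v * v for v in va.values())) * sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

sents = ["The sky is cloudy", "The sky is blue"]
print(token_stats(sents)["n_tokens"])       # 8
print(round(cosine_similarity(*sents), 2))  # 0.75
```

High premise–hypothesis similarity combined with skewed label distributions is exactly the kind of pattern such an exploration is meant to surface before modeling begins.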

Prompt Engineering

Prompt engineering involved creating specific instructions and examples to guide the LLM effectively. Important aspects were:

  • Definitions of contradiction, neutrality and inference.
  • Instructions inspired by the eSNLI dataset collection: highlighting words based on the label and using these words for explanation.
  • Formatting and entering data in JSON format.
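The three points above can be combined into a single structured prompt. The sketch below is a hypothetical rendering — the label definitions, few-shot example and JSON field names are illustrative assumptions, not the exact prompt used:

```python
import json

DEFINITIONS = {
    "entailment": "the hypothesis follows from the premise",
    "contradiction": "the hypothesis contradicts the premise",
    "neutral": "the hypothesis is neither entailed nor contradicted",
}

def build_prompt(premise: str, hypothesis: str) -> str:
    """Assemble definitions, eSNLI-style highlighting instructions and
    a few-shot example into one JSON-formatted prompt."""
    prompt = {
        "task": "natural language inference",
        "definitions": DEFINITIONS,
        "instructions": ("Highlight the words that justify the label "
                         "and use them in your explanation."),
        "example": {
            "premise": "The sky is *cloudy* and it looks like *rain*.",
            "hypothesis": "It will *rain* soon.",
            "label": "entailment",
            "explanation": "Cloudy skies that look like rain entail rain soon.",
        },
        "input": {"premise": premise, "hypothesis": hypothesis},
    }
    return json.dumps(prompt, indent=2)

print(build_prompt("A man plays guitar.", "A person makes music.")[:60])
```

Serialising the prompt as JSON keeps the definitions, example and input cleanly separated, which makes the model's expected output format easier to enforce and parse.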

Evaluation and results

The aim of the evaluation was to measure the performance of the LLM in the NLI task and to assess the quality of the explanations generated. Various experiments were carried out:

  • Check whether the LLM is able to imitate human explanations.
  • Investigation of the influence of self-generated explanations on the performance of the LLM in the NLI task.
  • Quality assessment of the generated explanations with regard to their semantic similarity to human explanations.
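The third experiment — scoring generated explanations against human ones — can be sketched with a simple token-overlap measure standing in for real semantic similarity (an actual evaluation would use embedding-based scores; the explanation pairs below are invented):

```python
def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity: a crude stand-in for the semantic
    similarity scoring of explanations."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

pairs = [
    # (human explanation, LLM-generated explanation) -- illustrative data
    ("Cloudy skies suggest rain soon.",
     "A cloudy sky suggests rain is coming soon."),
    ("Greeting is independent of being unhappy.",
     "The greeting does not depend on her mood."),
]

scores = [jaccard(human, generated) for human, generated in pairs]
print(round(sum(scores) / len(scores), 2))  # mean similarity over all pairs
```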

Quality assessment: How good are the explanations really?

The evaluation of the quality and robustness of AI in language comprehension tasks covers several dimensions:

  1. Standard benchmarks and metrics: Common benchmarks such as GLUE, SQuAD and SNLI are used to assess basic NLU capabilities, including NLI. Metrics such as accuracy, precision, recall and F1 score are typically used to quantify performance.
  2. Human evaluation: Human evaluators assess aspects such as engagement, safety, coherence, fluency and factuality of the generated text. This is crucial to understanding the practical applicability and reliability of AI in real-world scenarios.
  3. Comparison with human baselines: To assess the similarity between explanations generated by large language models (LLMs) and human baselines, an NLP encoder-decoder model is trained twice: once on the original eSNLI dataset created by human annotators and once on data generated by the LLM. Both models are then evaluated against a human-generated evaluation set. The model trained on LLM-generated data achieves results comparable to the model trained on the human baseline data, indicating that the quality of the LLM-generated explanations is on a par with human explanations.
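The metrics from point 1 are straightforward to compute for the three NLI labels. A minimal sketch with invented gold labels and predictions:

```python
LABELS = ("entailment", "contradiction", "neutral")

def macro_f1(gold: list[str], pred: list[str]) -> float:
    """Per-class precision/recall/F1, averaged over the three NLI labels."""
    f1s = []
    for label in LABELS:
        tp = sum(g == p == label for g, p in zip(gold, pred))
        fp = sum(p == label and g != label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

# Illustrative predictions, not real experiment results.
gold = ["entailment", "neutral", "contradiction", "neutral"]
pred = ["entailment", "neutral", "neutral", "neutral"]

accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
print(accuracy)                    # 0.75
print(round(macro_f1(gold, pred), 2))  # 0.6
```

Macro-averaging matters here: because SNLI's three labels can be unevenly predicted, a per-class F1 exposes weaknesses (such as the missed contradiction above) that plain accuracy hides.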

Conclusion: Why comprehensible explanations are the future of AI

Understanding the relationships and reasoning in language comprehension tasks is fundamental to improving the reliability and usability of these advanced models. However, ensuring the quality and robustness of these models requires comprehensive evaluation methods, active mitigation of bias and continuous improvement through research and development. It also requires an honest acknowledgement of the models' fallibility and a conscious sense of responsibility among developers and users to minimize bias. Classification methods must always be checked for fairness in order to avoid discriminatory results.

Data protection is essential in order to gain and maintain the trust of users. Innovation drives progress in AI, but must always be in line with existing guidelines and compliance requirements. In the near future, the EU regulation will provide a legal framework to ensure that all developments are carried out ethically and transparently. The responsible use of AI can only be guaranteed if these measures are implemented consistently. Careful classification of both data and results plays a key role in developing precise and fair applications.

Decision-makers in companies must ensure that their AI strategies not only focus on technological innovation, but also observe the ethical and legal framework conditions. Compliance with the EU AI Act and other directives is not only a legal necessity, but also a demonstration of responsible governance. This in turn builds trust among users and customers. Companies have a great responsibility to make their AI systems transparent and fair in order to be successful and trustworthy in the long term.

Thirst for knowledge?

For further information and in-depth insights into the topics surrounding AI, you will find some recommended articles here:

  1. Knowledge Graphs and Retrieval Augmented Generation (RAG): Learn more about the application of Knowledge Graphs and how Retrieval Augmented Generation can improve the performance of AI systems. Read the full article here:
  2. Retrieval Augmented Fine-Tuning (RAFT): Learn how Retrieval Augmented Fine-Tuning (RAFT) enriches language models with new knowledge, making them smarter and more adaptable. You can find out more about this in this article:
  3. Large Language Model Pricing: The pricing of large language models is a decisive factor in choosing the right model for different application areas. Discover the different pricing models and their effects in this article:
  4. Progress and challenges in the fine-tuning of large language models: Find out about the latest advances and existing challenges in large language model fine-tuning. Read the full report here:

These articles offer comprehensive insights and up-to-date information that are helpful for a deeper examination of the respective topics.