Edge-AI and FOLIO at Stanford University Libraries

Challenges using LLMs

There are multiple challenges associated with using AI, and more specifically Large Language Models (LLMs) such as ChatGPT and Google's Gemini, some of which can be minimized.

Privacy Concerns in Large Language Models

With the widespread release of Large Language Models (LLMs) by various organizations, significant privacy issues have been observed, including:

  • Personal details in training data: Names, addresses, and financial information may be included in the training data for these models.
  • Logged prompts containing sensitive information: User inputs, including private details, can be logged by the companies managing these LLMs.
  • Re-identification of individuals: Even in anonymized training data, individuals can potentially be re-identified through the model outputs and usage patterns.

Bias

AI bias occurs when models produce outputs that reflect or perpetuate existing inequalities and perspectives in the larger society.

Sources

  • Training data -- over- or under-sampling of underrepresented groups, and biased labeling that excludes or over-represents certain categories or characteristics.
  • Algorithms -- models built on biased data can perpetuate the underlying flaws, and programmers can introduce personal bias, intentionally or unintentionally.
  • Proxies -- unintended consequences of using proxy variables (for example, zip code standing in for race or income) for characteristics in the population.
  • Cognitive -- people's experiences and preferences can bias the selection of data or the weighting of outcomes.

Hallucinations

Since the initial release of ChatGPT (built on GPT-3.5) in 2022, a major criticism of Large Language Models (LLMs) has been their tendency to fabricate factually incorrect statements. LLMs generate text by predicting the most likely continuation token based on the prompt's text, its context, and the model's internal weights. Unlike a deductive process, LLMs do not directly reference their training source material when generating responses.
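
To make the prediction step concrete, the sketch below uses the small, openly available GPT-2 model via the Hugging Face transformers library to print the most likely next tokens for a prompt. It is an illustration of next-token prediction in general, not of any particular production chatbot, and the prompt text is arbitrary.

```python
# Minimal sketch of next-token prediction with GPT-2 (illustration only).
# The model assigns a probability to every token in its vocabulary;
# text generation repeats this single step over and over.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, sequence_length, vocab_size)

# Probability distribution over the token that would follow the prompt
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)

for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id.item())!r:>12}  p={prob.item():.3f}")
```

Sampling one of these candidate tokens and repeating the step is all that "generation" amounts to, which is why a fluent continuation is not guaranteed to be a true one.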

Types of Hallucinations

  • Fact-conflicting -- 1+1 = 3
  • Input-conflicting -- LLM summarizes an article and includes details not present in the original article
  • Context-conflicting -- inconsistent or self-contradiction in the model's outputs

Mitigation

  • Chain-of-thought (CoT): A technique that prompts the model to break down its reasoning into sequential steps, explaining how it arrived at its final answer.
  • One-shot and Few-shot Prompts: Techniques that provide context by offering sample responses in a given format, enabling the model to infer patterns for consistency and accuracy in its answers.
  • Retrieval Augmented Generation (RAG): Combines retrieved contextual passages with the prompt, grounding the model in factual, current material from external sources and reducing its dependence on outdated or incomplete information in its training data (a minimal sketch follows this list).
  • Reinforcement Learning from Human Feedback (RLHF): A fine-tuning technique that incorporates direct human feedback on a model's responses, rewarding factual answers and penalizing hallucinations.
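
As referenced above, here is a minimal RAG sketch. The tiny in-memory document list, the keyword-overlap retriever, and the prompt wording are all simplified assumptions for illustration; a real deployment would use a vector index and an actual LLM call rather than just printing the grounded prompt.

```python
# Minimal Retrieval Augmented Generation (RAG) sketch (illustration only).
# The documents and retrieval method below are hypothetical stand-ins.
from dataclasses import dataclass

@dataclass
class Document:
    title: str
    text: str

# A tiny in-memory "index" of hypothetical library documentation.
DOCS = [
    Document("Circulation policy", "Laptops circulate for 4 hours and cannot leave the building."),
    Document("Course reserves", "Instructors may place up to 30 items on reserve per course."),
]

def retrieve(query: str, docs: list[Document], k: int = 1) -> list[Document]:
    """Naive keyword-overlap retrieval; a real system would use vector embeddings."""
    def score(doc: Document) -> int:
        return len(set(query.lower().split()) & set(doc.text.lower().split()))
    return sorted(docs, key=score, reverse=True)[:k]

def build_prompt(question: str) -> str:
    """Ground the model by prepending retrieved passages to the question."""
    context = "\n".join(f"- {d.title}: {d.text}" for d in retrieve(question, DOCS))
    return (
        "Answer using ONLY the context below. If the answer is not in the "
        f"context, say you do not know.\n\nContext:\n{context}\n\nQuestion: {question}"
    )

print(build_prompt("How long do laptops circulate?"))
```

The key design point is that the model is instructed to answer only from the retrieved context, which is what reduces its reliance on (possibly outdated or hallucinated) internal knowledge.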

Academic Fraud & Copyright

One of the immediate concerns regarding academic fraud is that Large Language Models (LLMs) generate convincing and coherent text that students and researchers can pass off as original work. The inclusion of generated text in academic articles has become widespread, particularly in the computer science literature (How Much Research Is Being Written by Large Language Models?).

Training Large Language Models (LLMs) requires massive amounts of text and other media that are commonly available on the open web. This content includes both copyrighted and public domain material, which can lead to outputs from these models that closely resemble existing copyrighted works, resulting in various lawsuits.

Carbon Footprint

A real concern with Large Language Models (LLMs) is the amount of energy and water required to train and deploy these models. For example, in its 2024 environmental report Google acknowledged that its carbon emissions increased by more than 13% year over year, primarily due to the increased energy usage of its customer-facing AI efforts, including the training and inference of its flagship Gemini LLM.

A 2024 report by Microsoft offers four suggestions for reducing the environmental impact of these models:

  • Model Selection: Using pre-trained models requires significantly less power than training new models.
  • Model Improvement: Prompt engineering, RAG, and fine-tuning can all improve the functionality of existing models without the need to train new ones.
  • Model Deployment: With Model-as-a-Service (MaaS), costs and energy requirements are lower because the vendor typically optimizes the MaaS infrastructure.
  • Model Evaluation: Users should evaluate cost and performance in order to assess whether a given model fits their use case.