Learn About AI

Complete guide to artificial intelligence terms, tools, and concepts. You'll find a degree's worth of education here—use it well!

Red teaming is a structured testing effort to find flaws and vulnerabilities in an artificial intelligence (AI) system, often conducted in a controlled environment and in collaboration with the AI's developers. This practice involves intentionally and adversarially probing AI models to discover potential risks, biases, and security weaknesses that may not be apparent during standard testing procedures.

Red Teaming: Structured Adversarial Testing to Find Flaws in AI Systems

Reflection (LLMs)

Reflection in large language models (LLMs) is the capacity of an AI agent to examine its own outputs, identify errors or weaknesses, and use that assessment to produce improved results in subsequent attempts. Rather than relying solely on the initial output generated in a single pass, a reflective agent evaluates its performance against a goal or feedback signal and adjusts its approach.

Reflection (LLMs): Learning from Verbal Feedback Across Episodes

Reinforcement Learning (RL)

Reinforcement learning (RL) is a machine learning technique where an AI agent learns to make decisions by performing actions in an environment and receiving rewards or penalties in return, much like a pet learning a new trick.

Reinforcement Learning (RL): Training AI Agents by Rewarding Desired Behaviors

AI reliability is the ability of an AI system to perform consistently and dependably over time and under specified conditions.

Reliability (AI): How Consistently an AI System Performs Correctly Over Time

Reproducibility

Reproducibility in artificial intelligence is the ability to recreate the same results when repeating an experiment using the same methods, data, and conditions. It's the scientific equivalent of saying, "I made this amazing discovery, and here's exactly how you can see it too."

Reproducibility (AI): The Ability to Recreate the Same Results from the Same Experiment

In the world of AI, reranking is the process of taking an initial list of search results and re-ordering them using a more powerful, computationally expensive model to improve their relevance to a user’s query. It acts as a quality control step, ensuring that the very best and most pertinent information rises to the top before it is used by a language model or presented to a user.

Reranking: Re-Ordering Initial Search Results Using a More Precise Model
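
The two-stage flow described above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: `first_stage`, `rerank`, and `toy_precise_score` are hypothetical names, and the "precise" scorer is a simple word-ratio stand-in for the heavier model a real system would use.

```python
def first_stage(query, docs, k):
    """Cheap retrieval: rank docs by how many distinct query words they contain."""
    q = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return ranked[:k]

def rerank(query, candidates, score_fn):
    """Re-order the shortlist with a more precise (and costlier) scorer."""
    return sorted(candidates, key=lambda d: score_fn(query, d), reverse=True)

def toy_precise_score(query, doc):
    # Hypothetical stand-in for an expensive model:
    # the fraction of the document's words that are query words.
    q = set(query.lower().split())
    words = doc.lower().split()
    return sum(w in q for w in words) / len(words)

docs = [
    "reset your password from the account settings page and then sign in again",
    "password reset",
    "billing and invoices",
]
shortlist = first_stage("password reset", docs, k=2)   # cheap pass keeps two candidates
reranked = rerank("password reset", shortlist, toy_precise_score)
print(reranked[0])
```

The expensive scorer only ever sees the shortlist, which is what keeps the second stage affordable.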

Resource Optimization

Resource optimization is the systematic process of managing and allocating computational resources—including processing power, memory, storage, and energy—to maximize the efficiency, performance, and cost-effectiveness of AI systems.

Resource Optimization: Allocating Compute, Memory, and Energy Efficiently in AI Systems

Responsible AI is not a single product or a simple checklist; it is a holistic commitment to managing the entire lifecycle of an AI system with foresight and integrity. It requires a multi-faceted approach that considers the technical, social, and legal implications of AI, ensuring that systems are not only powerful but also principled.

Responsible AI: Managing the Full AI Lifecycle with Ethical and Legal Foresight

Retrieval-Augmented Generation (RAG)

Retrieval-Augmented Generation (RAG) is a framework that enhances large language models (LLMs) by integrating a retrieval pipeline, allowing the model to pull in external knowledge before generating a response. This lets AI systems reference authoritative, up-to-date sources at inference time.

Retrieval-Augmented Generation (RAG): Grounding LLM Responses with External, Real-Time Knowledge

Retrieval Evaluation

Retrieval evaluation is the systematic process of measuring how well an information retrieval system finds relevant information in response to a user's query. It provides a set of standardized metrics and benchmarks to score the accuracy, relevance, and ranking quality of search results, allowing developers to objectively assess and improve system performance.

Retrieval Evaluation: Measuring How Accurately a Search System Finds Relevant Information

Retrieval Metrics

A retrieval metric is a standardized, mathematical formula used to score the quality of a ranked list of search results. It provides an objective, numerical way to answer the fundamental question: “Did the system understand the query and return a useful set of results?”

Retrieval Metrics: Standardized Scores for Evaluating Search and Retrieval System Quality
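
As a concrete illustration, two widely used retrieval metrics, precision at k and reciprocal rank, can be computed as follows. The document IDs and relevance judgments are made up for the example.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k results that are relevant."""
    top_k = ranked_ids[:k]
    return sum(1 for doc_id in top_k if doc_id in relevant_ids) / k

def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant result, or 0.0 if none is returned."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

ranked = ["d3", "d1", "d7", "d2"]   # system output, best first (illustrative)
relevant = {"d1", "d2"}             # ground-truth judgments (illustrative)

p3 = precision_at_k(ranked, relevant, k=3)   # one relevant doc in the top 3
rr = reciprocal_rank(ranked, relevant)       # first relevant doc is at rank 2
print(p3, rr)
```

Averaging the reciprocal rank over many queries gives the mean reciprocal rank (MRR).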

Retrieval Strategies

Retrieval strategies are the collection of techniques an AI system uses to find, rank, and select information from an external knowledge base before generating a response. They sit at the heart of modern AI applications — from customer service chatbots to enterprise search engines — and they are the primary reason some AI systems feel uncannily accurate while others seem to be guessing.

Retrieval Strategies: Techniques for Finding and Ranking Information in AI Knowledge Bases

RLHF (Reinforcement Learning from Human Feedback)

RLHF (Reinforcement Learning from Human Feedback) is a method for fine-tuning an AI model by using human preferences as a guide for its behavior. Instead of just training a model on what is “correct” based on a static dataset, RLHF teaches the model what is “preferred” by humans.

RLHF: Fine-Tuning AI Models Using Human Preference Ratings as the Reward Signal

Robustness in AI refers to a system's ability to maintain reliable performance even when faced with unexpected inputs, variations in data, or deliberate attempts to fool it. Think of it as an AI's immune system—the stronger it is, the better the AI can handle novel situations without breaking down or making wildly incorrect decisions.

Robustness (AI): How Well an AI System Maintains Performance Under Unexpected Conditions

Robustness Testing

Robustness Testing is the systematic process of evaluating an AI model’s ability to maintain its performance and reliability when faced with unexpected, noisy, or even malicious inputs.

Robustness Testing: Evaluating AI Reliability Against Noisy, Adversarial, or Edge-Case Inputs

Role prompting is a technique where a user instructs an artificial intelligence model to adopt a specific persona, profession, or character before generating a response. By beginning a prompt with phrases like "You are a senior software engineer" or "Act as a helpful customer service representative," the user attempts to guide the model's tone, vocabulary, and structural approach to the task at hand.

Role Prompting: The Illusion of Expertise and the Reality of Context Injection

AI rollback refers to the process of reverting an artificial intelligence system to a previous known-good state after detecting performance degradation, unexpected behavior, or potential harm.

Rollback (AI): Reverting an AI System to a Prior State After Detecting a Problem

SaaS (Software as a Service)

Software as a Service (SaaS) is the practice of delivering software applications over the internet as a subscription service, and it has fundamentally changed how businesses operate.

SaaS (Software as a Service): Delivering Software Over the Internet as a Subscription

AI safety is the interdisciplinary field dedicated to ensuring that artificial intelligence systems operate without causing unintended harm or adverse effects. It involves designing, building, and deploying AI in a way that aligns with human values and intentions, from preventing everyday errors to mitigating large-scale, catastrophic risks.

Safety (AI): Designing AI Systems That Operate Without Causing Unintended Harm

AI scalability is an AI system's ability to handle growth (more data, more users, increased complexity) without performance degrading or requiring a total rebuild.

Scalability (AI): Building Systems That Handle Growth Without Performance Degradation

Secure Multi-Party Computation (SMPC)

Secure multi-party computation (SMPC or MPC) is a cryptographic method that allows multiple parties to jointly compute a function over their private inputs without revealing those inputs to each other. In essence, it’s a way to get the answer to a question without ever seeing the data that goes into it.

Secure Multi-Party Computation (SMPC): Computing Shared Results Without Revealing Private Inputs

Self-Consistency

Self-consistency is a technique that asks the model to solve the same problem multiple times, exploring different reasoning paths, and then takes a majority vote to determine the final, most reliable answer.

Self-Consistency: Aggregating Diverse Reasoning Paths for Reliable AI
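
The voting step can be sketched as below. `sample_fn` is a hypothetical callable that would query a model with sampling enabled and return its final answer; here a canned sampler stands in for it so the example runs on its own.

```python
from collections import Counter

def self_consistent_answer(sample_fn, prompt, n_samples=5):
    """Sample several answers and return the most common one with its vote share."""
    answers = [sample_fn(prompt) for _ in range(n_samples)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n_samples

# Canned outputs standing in for five independently sampled reasoning paths.
_canned = iter(["42", "41", "42", "42", "40"])

def fake_sampler(prompt):
    return next(_canned)

answer, agreement = self_consistent_answer(fake_sampler, "What is 6 * 7?")
print(answer, agreement)
```

The vote share is a useful by-product: low agreement suggests the question is one the model is unsure about.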

Self-Refinement

Self-refinement is a technique where an AI model generates an initial output, critiques that output using a specific feedback prompt, and then revises its own work based on that critique—all without human intervention.

Self-Refinement: Iterative Improvement Through Automated Feedback

Semantic Caching

Semantic caching is a caching mechanism that matches queries by meaning and intent rather than by exact text. By storing responses alongside embeddings of the queries that produced them, a semantic cache can reuse an earlier answer when a new query is similar enough in meaning, allowing systems to respond faster and with fewer model calls.

Semantic Caching: Storing and Reusing Responses Based on Query Meaning, Not Exact Match
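
A minimal sketch of the idea, under loud assumptions: `toy_embed` is a bag-of-words stand-in for a real embedding model (so it captures word overlap, not true meaning), and the 0.8 threshold and the `SemanticCache` class are illustrative rather than taken from any particular library.

```python
import math
from collections import Counter

def toy_embed(text):
    # Stand-in for a learned embedding model: word counts only.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class SemanticCache:
    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.entries = []  # list of (query embedding, cached response)

    def put(self, query, response):
        self.entries.append((toy_embed(query), response))

    def get(self, query):
        """Return the cached response of the most similar stored query, if close enough."""
        q = toy_embed(query)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best is not None and cosine(q, best[0]) >= self.threshold:
            return best[1]
        return None  # cache miss: the caller would query the model and then call put()

cache = SemanticCache()
cache.put("how do i reset my password", "Go to Settings > Security.")
hit = cache.get("how can i reset my password")    # differently worded, similar query
miss = cache.get("what are your opening hours")   # unrelated query
print(hit, miss)
```

The threshold is the main tuning knob: set it too low and users get answers to questions they did not ask; set it too high and the cache rarely hits.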

Semantic Memory

Semantic memory in artificial intelligence is the long-term storage of general world knowledge, facts, concepts, and rules, completely divorced from the specific time or place that information was acquired.

Semantic Memory: How AI Stores and Retrieves World Knowledge

Semantic Search

Semantic search is an advanced information retrieval technique that focuses on understanding the user's intent and the contextual meaning of a query, rather than just matching keywords. It leverages artificial intelligence, particularly Natural Language Processing (NLP), to decipher the relationships between words and concepts, allowing it to deliver results that are far more relevant and accurate.

Semantic Search: Finding Information Based on Intent and Meaning Rather Than Keywords

Semantic Similarity

Semantic similarity is a measure of how alike two pieces of text are in meaning, not just in the words they use. It’s the technology that allows a search engine to understand that when you search for “how to fix a car,” you’re also interested in results about “automotive repair,” even though the two phrases don’t share any of the same keywords.

Semantic Similarity: Measuring How Alike Two Pieces of Text Are in Meaning

Sentence Embeddings

A sentence embedding is a numerical representation of an entire sentence, condensed into a single list of numbers (a vector) that captures its overall meaning.

Sentence Embeddings: Numerical Representations That Capture the Full Meaning of a Sentence

Sentence Transformers

Sentence transformers are specialized neural network models designed to convert entire sentences into dense numerical representations that preserve semantic meaning, enabling machines to understand and compare the conceptual content of text rather than just matching keywords.

Sentence Transformers: Neural Models That Convert Sentences into Semantically Meaningful Vectors

Sequence Parallelism

Sequence parallelism is a specialized technique used to train and run massive artificial intelligence models by taking the input data (the sequence of text, images, or audio) and slicing it into smaller segments, distributing those segments across multiple computer chips to be processed simultaneously.

Sequence Parallelism: Splitting Long Input Sequences Across Multiple Processors

SFT (Supervised Fine-Tuning)

Supervised Fine-Tuning (SFT) is a training methodology that takes pre-trained AI models and adapts them to specific tasks or domains using carefully curated labeled datasets, enabling rapid specialization without the computational overhead of training from scratch.

SFT (Supervised Fine-Tuning): Adapting Pre-Trained Models Using Curated Labeled Examples

Shadow Deployment

Shadow deployment is a deployment strategy where a new version of an application, particularly a machine learning model, runs in parallel with the stable production version, processing the same real-world inputs without its outputs affecting the end-user.

Shadow Deployment: Running a New Model in Parallel with Production Without Affecting Users
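
In code, the pattern might look like the sketch below. The function and field names are hypothetical, and a real service would usually run the shadow call asynchronously so it cannot add latency to the user's request.

```python
def handle_request(request, production_model, shadow_model, shadow_log):
    """Serve the production answer; record the shadow answer for offline comparison."""
    response = production_model(request)
    try:
        shadow_response = shadow_model(request)
        shadow_log.append({"request": request, "production": response, "shadow": shadow_response})
    except Exception as exc:
        # A failing shadow model must never break the user-facing path.
        shadow_log.append({"request": request, "production": response, "shadow_error": repr(exc)})
    return response  # only the production output reaches the user

# Toy "models" standing in for the stable and candidate versions.
log = []
result = handle_request("hello", str.upper, str.title, log)
print(result, log[0]["shadow"])
```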

Short-Term Memory

Short-term memory in large language models is the dynamic, temporary workspace where all active reasoning, context processing, and generation occur during a single inference session. It resets completely between sessions, holds only the specific tokens loaded into it for the current task, and disappears the moment the computational process ends.

Short-Term Memory: How LLMs Manage the Active Context Window

SLAs (Service Level Agreements)

A Service Level Agreement (SLA) for AI is a formal contract between AI service providers and their customers that defines specific performance metrics, responsibilities, and remedies for AI systems and services. Unlike traditional SLAs, these agreements address unique AI-specific challenges like model accuracy, explainability, and ethical considerations alongside standard metrics such as uptime and response time.

SLAs (Service Level Agreements): Formal Contracts That Define AI Service Performance Standards

Sliding Window Chunking

Sliding window chunking is a method where AI systems break large documents into smaller, overlapping pieces, so that consecutive pieces share some text and no important information gets lost at the boundaries between sections.

Sliding Window Chunking: Breaking Documents into Overlapping Segments to Preserve Context
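
A minimal sketch of the technique, assuming the document has already been split into tokens (plain words here). The parameter names `size` and `overlap` are illustrative.

```python
def sliding_window_chunks(tokens, size, overlap):
    """Split tokens into chunks of `size`, each sharing `overlap` tokens with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the document
    return chunks

words = "a b c d e f g h i j".split()
chunks = sliding_window_chunks(words, size=4, overlap=2)
for chunk in chunks:
    print(chunk)
```

Larger overlaps preserve more context across boundaries but increase the number of chunks that must be stored and searched.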

A sparse model is an artificial neural network where a significant percentage of the internal weights (the numbers that determine how the model processes information) have been deliberately set to zero. By zeroing out these weights, engineers can drastically reduce the memory footprint and computational cost of the model without necessarily sacrificing its intelligence.

Sparse Models: Neural Networks Where Most Parameters Are Set to Zero for Efficiency

Sparse Retrieval

Sparse retrieval is a method of information retrieval that finds documents by matching the exact words in a query to the exact words in a document. While it may not have the “mind-reading” capabilities of its dense retrieval cousins, sparse retrieval is a powerful, interpretable, and often surprisingly effective way to find what you’re looking for.

Sparse Retrieval: Finding Documents by Matching Exact Words Between Query and Content

Sparse vectors are data structures that store only the non-zero values of a vector, together with their positions, and skip all the zero entries. Unlike dense representations, which store a value for every dimension even when most of them are zero, sparse vectors keep only what matters.

Sparse Vectors: Data Structures That Store Only Non-Zero Values for Efficient Retrieval
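
One simple way to picture this is a dictionary mapping each non-zero position to its value. The sketch below uses that representation (one of several possible layouts) to compute a dot product that touches only the stored entries.

```python
def to_sparse(dense):
    """Keep only the non-zero entries as {index: value}."""
    return {i: v for i, v in enumerate(dense) if v != 0}

def sparse_dot(a, b):
    """Dot product that iterates over the smaller vector's non-zero entries only."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b[i] for i, v in a.items() if i in b)

dense_query = [0, 0, 3, 0, 0, 0, 1, 0]
dense_doc = [0, 2, 4, 0, 0, 0, 0, 5]

q = to_sparse(dense_query)   # {2: 3, 6: 1}
d = to_sparse(dense_doc)     # {1: 2, 2: 4, 7: 5}
score = sparse_dot(q, d)     # only index 2 overlaps: 3 * 4
print(q, d, score)
```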

Speculative Decoding

Speculative decoding is a technique used to make artificial intelligence models generate text much faster. It works by pairing a massive, slow AI model with a tiny, fast "draft" model: the draft model proposes several tokens ahead, and the large model checks them all in a single pass, keeping the ones it agrees with and correcting the first one it does not. It is one of the most elegant and impactful engineering solutions in modern artificial intelligence, fundamentally altering the economics of deploying large language models at scale.

Speculative Decoding: Using a Small Draft Model to Speed Up Output from a Large Model
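
The draft-then-verify loop can be illustrated with a toy greedy version. Everything here is an assumption made for the sketch: the two "models" are plain functions over a fixed string, and the verification loop calls the target once per position, whereas a real system scores all drafted positions in one forward pass of the large model.

```python
TEXT = "the cat sat on the mat"

def target_next(prefix):
    # Toy "large model": always continues TEXT correctly.
    return TEXT[len(prefix) % len(TEXT)]

def draft_next(prefix):
    # Toy "draft model": cheap, but wrong at every third position.
    pos = len(prefix)
    return "?" if pos % 3 == 2 else TEXT[pos % len(TEXT)]

def speculative_decode(prompt, max_new_tokens, k=4):
    out = list(prompt)
    verify_passes = 0
    while len(out) - len(prompt) < max_new_tokens:
        # 1. The draft model proposes k tokens ahead.
        proposal = []
        for _ in range(k):
            proposal.append(draft_next(out + proposal))
        # 2. The target model verifies the proposal (one pass in a real system).
        verify_passes += 1
        accepted = []
        for tok in proposal:
            expected = target_next(out + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # fix the first mismatch, discard the rest
                break
        else:
            accepted.append(target_next(out + accepted))  # all accepted: one bonus token
        out.extend(accepted)
    return out[:len(prompt) + max_new_tokens], verify_passes

tokens, passes = speculative_decode([], max_new_tokens=12)
print("".join(tokens), passes)
```

In this greedy setting the output is identical to what the large model would produce alone; the gain is that it is consulted once per batch of drafted tokens rather than once per token.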

Streaming Inference

Streaming Inference is a method in artificial intelligence where data is processed and analyzed in a continuous flow, as it arrives, enabling systems to generate insights and make decisions in real-time or near real-time. This approach is crucial for applications that require immediate responsiveness to dynamic, constantly changing information.

Streaming Inference: Processing and Responding to Data Continuously as It Arrives

Stress testing in AI is the practice of deliberately pushing artificial intelligence systems beyond their normal operating conditions to identify vulnerabilities, breaking points, and unexpected behaviors before they cause real-world problems.

Stress Testing (AI): Pushing AI Systems Beyond Normal Conditions to Find Breaking Points

Structured Outputs

Structured outputs refer to the practice of constraining a large language model to produce responses in a predefined, machine-readable format—such as JSON, XML, or a specific programming class—rather than generating free-form text.

Structured Outputs: Constraining LLMs to Produce Machine-Readable Formats Like JSON
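
On the consuming side, an application typically parses and checks the model's response before using it. The sketch below shows that validation step only; the schema in `REQUIRED_FIELDS` is hypothetical, and checking after the fact is a weaker guarantee than constraining generation itself.

```python
import json

# Hypothetical schema: field name -> required Python type.
REQUIRED_FIELDS = {"name": str, "priority": int}

def parse_structured(raw):
    """Parse a model response and verify it matches the expected shape."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            raise ValueError(f"field {field!r} must be of type {expected_type.__name__}")
    return data

ok = parse_structured('{"name": "reset password", "priority": 2}')
print(ok["name"], ok["priority"])
```

When validation fails, a common approach is to retry the request with the error message included in the prompt.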

Summarization (for Context)

Summarization (for context) is the process of algorithmically condensing large volumes of text into shorter, denser representations while preserving the core semantic meaning required for an AI model to complete a task.

Summarization (for Context): Condensing Long Inputs Without Losing Meaning

Supervised Learning

Supervised learning is a type of machine learning where an AI model is trained on a dataset in which each example has been labeled with the correct answer.

Supervised Learning: Training AI Models on Labeled Examples with Known Correct Answers

Synthetic Data Generation

Synthetic data generation is the process of creating artificial data that mimics real-world datasets. This approach reduces privacy risks, enhances AI training, and helps companies bypass data collection challenges.

Synthetic Data Generation: Creating Artificial Training Data That Mimics Real-World Datasets

System prompts are the foundational instructions that developers supply to AI models to shape their personality, behavior, and responses before any user ever types a single word.

System Prompts: Developer Instructions That Shape AI Behavior Before Any User Interaction

Tensor Parallelism

Tensor parallelism is a technique used to train and run massive artificial intelligence models by taking the mathematical calculations required for a single layer of the model and slicing them into smaller pieces, distributing those pieces across multiple computer chips to be processed simultaneously. This approach allows engineers to work with models that are far too large to fit into the memory of any single chip, while also speeding up the time it takes to generate a response.

Tensor Parallelism: Splitting Individual Layer Computations Across Multiple Processors
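
The slicing idea can be shown with a single matrix multiplication whose weight matrix is split by columns. This pure-Python sketch simulates the "chips" sequentially in one process and assumes the column count divides evenly; real implementations run the shards on separate accelerators and gather the results over a fast interconnect.

```python
def matmul(x, w):
    """Plain matrix product of x (rows x inner) and w (inner x cols)."""
    return [[sum(row[i] * w[i][j] for i in range(len(w))) for j in range(len(w[0]))]
            for row in x]

def split_columns(w, parts):
    """Cut the weight matrix into `parts` column blocks (assumes an even split)."""
    size = len(w[0]) // parts
    return [[row[p * size:(p + 1) * size] for row in w] for p in range(parts)]

def tensor_parallel_matmul(x, w, parts):
    shards = split_columns(w, parts)
    partials = [matmul(x, shard) for shard in shards]  # each would run on its own chip
    # Gather step: concatenate the column blocks back into full output rows.
    return [sum((p[r] for p in partials), []) for r in range(len(x))]

x = [[1, 2], [3, 4]]
w = [[1, 0, 2, 1],
     [0, 1, 1, 3]]

full = matmul(x, w)
sharded = tensor_parallel_matmul(x, w, parts=2)
print(full, sharded)
```

Each shard holds only half of the weights, which is why the approach helps when a layer is too large for one chip's memory.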

Text Generation Inference (TGI)

Text Generation Inference (TGI) is Hugging Face's open-source toolkit for deploying and serving large language models, focused on generating text from input prompts efficiently in terms of speed and computational resources.

Text Generation Inference (TGI): A Framework for Serving LLMs Efficiently at Scale

The Stability-Plasticity Dilemma — Why Teaching AI New Things Is Harder Than It Sounds

Teaching an AI system new things tends to destroy what it already knows, but making it resistant to forgetting makes it resistant to learning. This tension — between stability and plasticity — is one of the central unsolved problems in AI development.

Throughput Monitoring

Throughput monitoring tracks how many tasks, queries, or operations an AI system can handle within a specific timeframe, making sure your system doesn't buckle under pressure when everyone decides to use it at once.

Throughput Monitoring: Tracking How Many Tasks an AI System Handles Within a Time Period

Throughput Optimization

Throughput optimization is the engineering discipline of maximizing the total number of tasks, or inferences, an AI system can perform within a specific timeframe, such as requests per second.

Throughput Optimization: Maximizing the Number of AI Inferences Completed Per Second

Token counting is the process of calculating the exact number of tokens a specific input will consume before sending it to a large language model, allowing developers to predict costs, manage context window limits, and optimize application performance.

Token Counting: Calculating Exactly How Many Tokens an Input Will Consume Before Sending It

The token economy is the system governing how AI breaks down information into tokens, and how those tokens are measured, valued, and affect the cost and performance of AI applications. It's key to understanding how AI works and why it has a price tag.

Token Economy: How AI Breaks Down Text into Tokens and Prices Them

Tokenization is the process of converting text into smaller, manageable units that AI models can process mathematically.

Tokenization: Converting Text into the Numerical Units AI Models Actually Process

Token Optimization

Token optimization is the strategic practice of reducing the number of tokens consumed by a large language model application while maintaining or improving the quality, speed, and reliability of its outputs.

Token Optimization: Reducing Token Usage While Maintaining Output Quality

Tool use is the capability that allows a large language model to interact with external systems, application programming interfaces (APIs), and computational resources to perform actions it cannot accomplish natively. Instead of relying solely on its internal weights to guess the answer to a math problem or hallucinate a current stock price, a model equipped with tool use can recognize its own limitations, formulate a structured request to an external calculator or financial database, and incorporate the precise result into its final response.

Tool Use: Enabling LLMs to Call External APIs, Code, and Data Sources

Toxicity Detection

Toxicity detection is the automated process of identifying and flagging abusive, disrespectful, or otherwise problematic language in text, audio, and other forms of media. This critical discipline aims to create a safer and more inclusive online environment by preventing the spread of harmful content and promoting healthier digital conversations.

Toxicity Detection: Automatically Identifying Harmful or Abusive Language in AI Systems

TPU Acceleration

TPU acceleration refers to the use of Tensor Processing Units (TPUs)—custom-designed microchips—to significantly speed up the complex mathematical calculations required by AI applications, particularly those involving machine learning and neural networks.

TPU Acceleration: Google's Custom Chips Designed Specifically for AI Computation

A TPU cluster is a supercomputer built from thousands of Google's custom-designed computer chips that are specifically engineered for artificial intelligence tasks, all linked together with ultra-high-speed networking to function as a single, massive computational entity for training and running the world's most demanding AI models.

TPU Clusters: Networks of Google's AI Chips for Training and Running Large Models

Training (AI/ML)

In the world of AI and machine learning, training is the fundamental process of teaching a computer model to perform a task by showing it examples. It’s how a generic algorithm learns the specific skills needed to become a specialized tool.

Training (AI/ML): The Process of Teaching a Model to Perform a Task from Examples

Transfer Learning

Transfer learning is a machine learning method where a model developed for one task is reused as the starting point for a model on a second, related task, allowing AI to learn new things faster and with less data.

Transfer Learning: Reusing a Model Trained on One Task as the Starting Point for Another

Transformer Architecture

Transformer architecture is a type of neural network designed to handle sequential data, like sentences or paragraphs, using a mechanism called self-attention that lets the model weigh the importance of every other element in the sequence when processing each one.

Transformer Architecture: The Neural Network Design Behind Modern Language Models

Translator Prompt

Translator prompts are specialized instructions designed to guide artificial intelligence systems in performing translation tasks with specific requirements for accuracy, cultural sensitivity, and contextual appropriateness.

Translator Prompt: Instructions That Guide AI to Perform Translation with Specific Requirements

Tree of Thoughts (ToT)

Tree of Thoughts (ToT) is an advanced prompting framework that allows large language models to solve complex problems by generating multiple possible reasoning paths, evaluating the promise of each path, and using search algorithms to explore, look ahead, or backtrack until a solution is found.

Tree of Thoughts (ToT): Enabling Strategic Lookahead in Language Models

Unsupervised Learning

Unsupervised learning is a type of machine learning where the AI model is given a dataset without any explicit instructions or labeled examples, and it must find the underlying structure, patterns, and relationships on its own.

Unsupervised Learning: Finding Patterns in Data Without Labels or Predefined Answers

User prompts are specific instructions, questions, or requests that individuals give to artificial intelligence systems to guide their responses or outputs. They serve as the primary interface for human-AI communication, determining both the content and quality of AI-generated results.

User Prompts: The Instructions and Questions Humans Submit to AI Systems

AI validation is the process of determining whether an artificial intelligence system meets its intended purpose and performs correctly across a range of conditions and scenarios.

Validation (AI): Determining Whether an AI System Meets Its Intended Purpose

A Vector DB is a specialized database designed to store and query embeddings, which are numerical representations of unstructured data like text, images, or audio. This allows AI systems to retrieve data based on meaning and relationships rather than exact matches.

Vector DB: A Database Designed to Store and Query High-Dimensional Embedding Vectors

Vector search is a retrieval method that represents data, whether it’s text, images, audio, or video, as a numerical vector called an embedding. It then finds similar items by searching for vectors that are close to each other in a high-dimensional space, effectively searching by meaning and context rather than by exact keywords.

Vector Search: Finding Similar Items by Searching a Space of Numerical Embeddings

A vector store is a specialized database designed to organize and retrieve feature vectors—numerical representations of data like text, images, or audio. These stores are essential in AI and machine learning workflows, enabling high-speed searches, efficient comparisons, and pattern recognition across vast datasets.

Vector Store: A Database That Organizes and Retrieves Feature Vectors for AI Applications

AI versioning is the systematic tracking and management of changes to artificial intelligence models, their code, data, and environments throughout their lifecycle. It creates a historical record that enables reproducibility, collaboration, and responsible deployment of AI systems.

Versioning (AI): Systematically Tracking Changes to Models, Data, and Code Over Time

vLLM is a purpose-built inference engine that excels at serving large language models (LLMs) at high speed and scale—especially in GPU-rich, high-concurrency environments.

vLLM: A High-Throughput Inference Engine for Serving Large Language Models at Scale

What AI Gets Wrong — And Why That's Built Into How It Works

AI makes mistakes in ways that are fundamentally different from software bugs. The main failure modes — hallucination, brittleness, and bias — aren't accidents waiting to be patched; they're structural properties of how these systems learn.

What Is AI and Why Does It Matter Right Now?

AI is software that learns from data to perform tasks that used to require human judgment — and it matters right now because a convergence around 2022 made it general-purpose enough to affect almost everyone's work.

When You Have Labels and When You Don't — Choosing the Right Learning Approach

The choice between supervised and unsupervised learning comes down to one practical question: do you have labeled data, and is labeling it worth the cost? The answer shapes everything about how a system gets built.

Which Learning Approach Does What — A Practical Guide to AI Learning Paradigms

Supervised, unsupervised, and reinforcement learning each solve a different kind of problem, and real AI systems often combine all three. Knowing which approach does what is the practical foundation for understanding how any AI system was built.

Which Type of AI Will You Actually Work With? A Practical Guide

Most people will work closely with LLMs and generative AI, encounter NLP and operational AI constantly without realizing it, and interact with ambient intelligence mostly without noticing. Which ones matter most depends almost entirely on what you do.

Why Deep Learning Was the Breakthrough That Changed Everything

Neural networks existed for decades before deep learning took over. This piece explains what specifically changed when networks went deep — and why that shift produced the explosion of AI capability we see today.

Zero-Shot Learning (ZSL)

Zero-shot learning (ZSL) is a machine learning paradigm where a model can correctly identify objects or concepts from classes it has never seen during its training. Unlike traditional supervised learning, which requires a massive, labeled dataset for every single category the model needs to recognize, zero-shot learning equips a model with the ability to make educated guesses about the unknown.

Zero-Shot Learning (ZSL): Recognizing Categories a Model Has Never Seen During Training

Zero-Shot Prompting

Zero-shot prompting refers to the practice of guiding a language model to perform a task through a direct instruction without including any examples of the task in the prompt.

Zero-Shot Prompting: Directing AI to Complete a Task Without Providing Any Examples
