Embedding

AI & LLM Testing

// Definition

A numerical vector representation of text (or images, or audio) that captures meaning in a way machines can compare. Two sentences with similar meaning produce embeddings that are close together in vector space. Embeddings power retrieval in RAG systems, semantic search, and clustering. In QA work, knowing about embeddings matters because they determine what gets retrieved in a RAG pipeline — and bad retrieval is one of the most common reasons AI products give wrong answers.
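To make "close together in vector space" concrete, here is a minimal sketch that ranks documents by cosine similarity, one common closeness measure. The three-dimensional vectors, the document ids, and the helper names `cosine_similarity` and `top_k` are all hypothetical and made up by hand for illustration; a real embedding model would produce vectors with hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, corpus, k=1):
    """Return the ids of the k corpus entries most similar to the query."""
    ranked = sorted(
        corpus,
        key=lambda doc_id: cosine_similarity(query_vec, corpus[doc_id]),
        reverse=True,
    )
    return ranked[:k]


# Hand-made toy "embeddings" (assumption: not produced by any real model).
corpus = {
    "refund_policy": [0.9, 0.1, 0.0],
    "shipping_times": [0.1, 0.9, 0.1],
    "api_rate_limits": [0.0, 0.1, 0.9],
}

# Stands in for the embedding of a question like "How do I get my money back?"
query = [0.8, 0.2, 0.1]

retrieved = top_k(query, corpus, k=1)
print(retrieved)
```

In a QA suite, an assertion on `retrieved` is a retrieval-only check: it can fail or pass independently of whatever answer the LLM later generates from the retrieved context.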

// Related terms

Retrieval-Augmented Generation (RAG)
A pattern where an LLM is given relevant context retrieved from an external source (a vector database, a search index, a document store) before being asked to generate an answer. The LLM doesn't 'know' the answer from training — it reads what was retrieved and synthesises a response. RAG is how chatbots answer questions about your company's docs without those docs being baked into the model. From a QA perspective, RAG systems have two failure surfaces: retrieval (did the system find the right context?) and generation (did the LLM use the context faithfully, or did it hallucinate?). Testing must cover both, separately.
Large Language Model (LLM)
A neural network trained on massive text datasets to predict the next token (roughly, the next word or word fragment) in a sequence. Modern LLMs like Claude, GPT-4, and Gemini can answer questions, write code, summarise documents, and follow multi-step instructions, but they don't 'know' anything; they predict plausible continuations from patterns in training data. This is why they sometimes produce confident-sounding falsehoods (hallucinations) and why prompt design matters so much. In QA, LLMs are useful for generating test scaffolding, summarising bug reports, and drafting documentation, but their output always needs human review before it ships.
Model Context Protocol (MCP)
An open standard introduced by Anthropic in late 2024 that lets AI assistants connect to external tools and data sources through a uniform JSON-RPC interface. An MCP server exposes tools (callable functions), resources (readable data), and prompts (templates) to any MCP-compatible host (Claude Desktop, Claude Code, IDE plugins). Build a server once and any compliant client can use it — the protocol is model-agnostic, which makes integrations portable across AI providers.
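As a rough illustration of the uniform JSON-RPC interface, the sketch below builds the kind of request a client might send to ask an MCP server to run one tool. The `tools/call` method and the `name`/`arguments` parameters follow the MCP specification as commonly documented; the helper `make_tool_call`, the tool name `search_test_cases`, and its arguments are hypothetical. Transport, initialization, and response handling are left out.

```python
import json


def make_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request asking an MCP server to run one tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "search_test_cases" is a made-up tool name used only for illustration.
request = make_tool_call(1, "search_test_cases", {"query": "login timeout"})

# Serialized form that would travel over the transport (for example stdio).
wire_message = json.dumps(request)
print(wire_message)
```

Because every compliant server accepts the same message shape, a test harness can exercise tools from different servers through one code path.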