Every term you're likely to hear, defined in a sentence — then in a paragraph if you want the longer version. No jargon for jargon's sake. No pretending simple ideas are complicated.
When people say "ChatGPT" or "Claude" or "Gemini," they usually mean a specific AI model plus the app that lets you talk to it. The model itself is just the trained system — the numbers — that takes your text in and produces text out.
"Help me with my lease" is a prompt. So is a three-paragraph briefing with goals, context, and expectations. The difference between those two is where the single biggest skill improvement comes from — see The Three Questions.
Example: "Role: Act as a property manager. Task: Summarize these three tenant emails. Format: One paragraph each, flag urgency." RTF is the fastest framework to remember and works across any AI tool. For Copilot-specific prompts, see GCES, which adds a Source field.
Goal — what outcome do you want?
Context — what background matters?
Expectations — what specific steps, format, length?
Source — which files or documents should it use?
Source is the most important part for anyone using Copilot with real documents. The Three Questions from the journey (What do I want? What would correct look like? What can't it know?) are the consumer-friendly cousin of GCES — same instincts, fewer letters.
Training a frontier model like GPT-5 reportedly costs around $500 million in compute. Once training is done, the model is fixed — it doesn't learn from you when you chat with it. (It may learn in aggregate if your chats are used to train future versions — which is why privacy settings matter.)
AI models don't store their training data. They extract patterns from it into their weights. That's why a 1–2 TB model can answer questions on almost any topic: it learned patterns from a far larger body of text, not the text itself.
Every model has a point where its training data ends. After that date, the model is frozen — it literally hasn't seen anything newer. Modern models bridge this with web search, but the underlying training is still a snapshot.
When AI processes your prompt, it's doing math with these weights to predict the next word. All the "knowledge" the model has is compressed into these numbers — which is why a ~1–2 TB model can answer about almost anything, but also why it occasionally makes things up.
Every chat you have is an inference. Modern inference costs fractions of a cent per query — in part because the training already paid for the hard work. The cost-per-query has dropped roughly an order of magnitude in the last two years.
Models don't think in letters or sentences — they think in tokens. API pricing is usually per million tokens. Token limits (e.g., "128K context") tell you how much text a model can hold in its head at once.
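A common rule of thumb is that one token is roughly four characters of English text (real tokenizers vary by model and language). Assuming that rule of thumb, here is a toy sketch of estimating token counts and per-prompt API cost — the prompt and price are made-up examples, not real figures:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def estimate_cost(text: str, price_per_million_tokens: float) -> float:
    """Estimated input cost in dollars, given a per-million-token price."""
    return estimate_tokens(text) / 1_000_000 * price_per_million_tokens

prompt = "Summarize these three tenant emails in one paragraph each."
tokens = estimate_tokens(prompt)          # ~14 tokens for this 58-character prompt
cost = estimate_cost(prompt, 2.00)        # hypothetical $2 per million tokens
```

Even a generous estimate lands at fractions of a cent per prompt, which is why inference feels free at the consumer level.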
Modern models have context windows of 100,000 to over a million tokens. If a conversation goes longer than the context window, the earliest parts fall out — the model literally forgets them. Long chats drift; when that happens, start a new chat.
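The "earliest parts fall out" behavior can be sketched in a few lines. This is a toy model: it budgets by word count where real systems budget by tokens, and the chat turns are invented:

```python
def fit_to_window(messages: list[str], max_words: int) -> list[str]:
    """Keep the most recent messages that fit in a word budget,
    dropping the earliest ones first -- a toy context window."""
    kept: list[str] = []
    budget = max_words
    for msg in reversed(messages):        # walk from newest to oldest
        cost = len(msg.split())
        if cost > budget:
            break                         # everything older falls out
        kept.append(msg)
        budget -= cost
    return list(reversed(kept))           # restore chronological order

chat = ["turn one is the oldest", "turn two", "turn three is newest"]
fit_to_window(chat, max_words=6)          # the oldest turn is dropped
```

Notice the model never "decides" to forget; the oldest text simply no longer fits, which is why long chats drift.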
AI samples from probabilities each time it picks the next word. That's why "ask again" often gets a better answer, why you can't reliably reproduce a bad one, and why two people asking the same question get different responses. You saw this live at Station 3 of the journey.
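A toy illustration of why answers vary (the next-word probabilities here are invented for the example, not from a real model):

```python
import random

# Hypothetical probabilities for the next word after "The lease term is ..."
next_word_probs = {"twelve": 0.55, "six": 0.30, "negotiable": 0.15}

def sample_next_word(probs: dict[str, float]) -> str:
    """Pick one word at random, weighted by its probability."""
    words = list(probs)
    weights = list(probs.values())
    return random.choices(words, weights=weights, k=1)[0]

# Repeated calls can return different words from the same distribution --
# the same reason two people asking the same question get different answers.
draws = [sample_next_word(next_word_probs) for _ in range(5)]
```

The model usually picks a likely word, but not always the most likely one, so reruns of the same prompt diverge.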
Hallucinations come from the same engine that produces good answers. Models are trained to sound confident; they're not trained to say "I don't know." Newer models are measurably better at flagging uncertainty, but the core issue hasn't gone away — so verifying anything that matters remains the safe habit.
The single biggest quality upgrade most people can make: instead of asking a question in the abstract, paste or attach the document and ask about that. Accuracy jumps from "sometimes" to "almost always." This is also the concept behind Copilot Notebooks, custom GPTs, and enterprise AI tools that plug into your files.
When you upload a document to Copilot or ChatGPT and ask about it, RAG is what's happening: the system finds the relevant chunks, feeds them to the model as context, and the model answers from them. Why this matters: grounded answers hallucinate far less than ungrounded ones. Copilot Notebooks are essentially "a bucket of documents for RAG to use whenever you chat."
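The retrieve-then-answer loop can be sketched very simply. This toy version scores chunks by keyword overlap where real systems use embeddings, and the document snippets are made up:

```python
def words(text: str) -> set[str]:
    """Lowercase words with trailing punctuation stripped."""
    return {w.strip(".,?!").lower() for w in text.split()}

def score(chunk: str, question: str) -> int:
    """Toy relevance score: how many question words appear in the chunk."""
    return len(words(chunk) & words(question))

def retrieve(chunks: list[str], question: str, k: int = 2) -> list[str]:
    """Return the k chunks most relevant to the question."""
    return sorted(chunks, key=lambda c: score(c, question), reverse=True)[:k]

docs = [
    "Rent is due on the first of each month.",
    "Pets are allowed with a $200 deposit.",
    "The parking garage closes at midnight.",
]
context = retrieve(docs, "When is rent due?", k=1)
# context is then pasted into the prompt alongside the question,
# so the model answers from your documents instead of from memory
```

That last step is the whole trick: the model never "searches" anything itself; the system hands it the relevant text and asks it to answer from that.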
This closes much of the gap from the training cutoff — the model can fetch current info and cite it. But the core engine is still pattern-based. It can grab stale articles, misread live status, or miss the most current source. Treat web-enabled AI as "strong first pass," not "verified answer."
ChatGPT, Claude, and Gemini all let you turn on persistent memory. The AI stores facts you've told it (preferences, your job, your projects) and recalls them across chats. Useful — but it's opt-in, the model isn't "intelligent about you," and you should review what's been stored from time to time.
Regular chat is one-and-done: you ask, it answers. An agent has a goal and works through steps — reading files, running tools, checking its work, asking clarifying questions. This is the frontier — powerful, but still needs human oversight for anything high-stakes.
An LLM is the model itself — the trained pattern-predictor. Products like ChatGPT wrap an LLM (GPT-4o, GPT-5, etc.) with a chat interface, memory features, file uploads, web search, and other tools. Different companies build different LLMs; most consumer apps today use one of a handful.
A few flavors to know apart: Copilot (free) is consumer chat at copilot.microsoft.com. Microsoft 365 Copilot ($30/user/month) is the business version that knows your work files. GitHub Copilot is for writing code. Copilot Studio is for building custom agents. If someone says "Copilot" at a company, they usually mean Microsoft 365 Copilot.
Think of it as a Copilot scoped to a project. Put all the files for "Pike 3400 Leasing" in one Notebook, and now every chat draws only from those documents — no noise from unrelated content. This is enterprise-grade RAG with a friendly interface. Microsoft renamed this from "Projects" to "Notebooks" in March 2026 — you may still see "Projects" in older documentation.
A medical-specialist AI might start as a general model, then be fine-tuned on medical literature. Enterprise tools like custom GPTs and AI assistants often use a lightweight version of this — or just a smart prompt — to specialize behavior without full retraining.
Meta's Llama, DeepSeek, Mistral, and Alibaba's Qwen are open-weights models. You can download and run them (if you have the hardware). GPT-5, Claude, and Gemini are closed — you access them only through the provider's API. Open-weights models are driving a lot of the efficiency innovation.
Cousin-level tools can now produce convincing fakes. Used in scams, political misinformation, and ordinary content creation. Detection tools, provenance standards (like C2PA), and device-level signing are catching up — but slower than creation. If a clip makes you feel strong emotion, wait 30 seconds before sharing.
Verification doesn't mean checking every word. It means picking the part of the answer that will drive your decision — and confirming that part against a primary source, expert, or test. For a recipe, cook a bite. For a bill, call the biller. For a law, look at the statute. Match the rigor to the stakes.
If there's an AI word you've been hearing and can't find a plain-English explanation, email me. I'll add it to this glossary and credit the ask.