Application Programming Interface
An API is a way for software applications to talk to each other. In the AI world, when a company like Anthropic or OpenAI offers an API, it means developers can send text to the AI model and get responses back — all through code, without using a chat interface. This is how AI gets built into apps, websites, and products.
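To make this concrete, here is a sketch of the JSON a developer might send to a model provider. The field names follow the general shape of Anthropic's Messages API, but treat the endpoint, headers, and model name as illustrative assumptions rather than a working integration:

```python
import json

# The JSON payload a developer sends to a model provider's API.
# Field names follow the general shape of Anthropic's Messages API;
# the model name is an assumed placeholder.
payload = {
    "model": "claude-sonnet-4-20250514",   # assumed model name
    "max_tokens": 256,
    "messages": [
        {"role": "user", "content": "Summarize this paragraph in one sentence: ..."}
    ],
}

body = json.dumps(payload)
# In a real app you would POST `body` to the provider's endpoint
# along with your API key, e.g. with the `requests` library:
#   requests.post("https://api.anthropic.com/v1/messages",
#                 headers={"x-api-key": API_KEY}, data=body)
```

The response comes back as JSON too, which is exactly what "talking through code, without a chat interface" means in practice.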
AI that can take actions on its own
An AI Agent is a system that can independently plan, make decisions, and take actions to accomplish a goal — not just answer a single question. Instead of waiting for you to tell it exactly what to do at each step, an agent can break down a complex task, use tools (like browsing the web, writing code, or reading files), and keep going until the job is done. Agents are built on top of large language models but add a layer of autonomy and tool use.
How AI decides what matters
Attention is the mechanism that lets a Transformer model figure out which words in a sentence are most important to each other. In the sentence "The cat sat on the mat because it was tired", attention helps the model understand that "it" refers to "the cat", not "the mat". This ability to connect distant words is what makes Transformers so powerful.
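The core of attention is a small, well-defined calculation: compare a "query" vector against every "key" vector, turn the scores into weights, and mix the "value" vectors accordingly. Here is a minimal pure-Python sketch of that scaled dot-product step (toy 2-dimensional vectors, not real model activations):

```python
import math

def softmax(xs):
    """Turn raw scores into weights that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query over key/value vectors."""
    d = len(query)
    # Score each key by its similarity to the query.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)   # how much each word "matters" to this one
    # Blend the value vectors according to those weights.
    blended = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
    return blended, weights

# The query lines up with the first key, so most attention lands there.
out, weights = attention([1.0, 0.0],
                         [[1.0, 0.0], [0.0, 1.0]],
                         [[2.0, 0.0], [0.0, 2.0]])
```

In a real Transformer this happens for every word against every other word at once, which is how "it" gets linked back to "the cat".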
How much the AI can "see" at once
The context window is the total amount of text an AI model can consider during a conversation. This includes everything — your question, any documents you share, and the AI's own response. Once a conversation grows past the window, the earliest text falls outside it, so the model effectively "forgets" the beginning.
Central Processing Unit
The CPU is the main brain of every computer. It handles general tasks like running your apps, browsing the web, and managing your operating system. It's great at doing many different things, one after another, very fast.
Turning words into numbers
An embedding is a way to represent a word, sentence, or document as a list of numbers (a "vector") that captures its meaning. Words with similar meanings end up with similar numbers. This is how AI understands that "dog" and "puppy" are related, or that "Paris" is to "France" what "Tokyo" is to "Japan".
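"Similar meanings end up with similar numbers" is usually measured with cosine similarity. This sketch uses tiny hand-made 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions and are learned by a model, so these numbers are purely illustrative):

```python
import math

def cosine_similarity(a, b):
    """1.0 means 'pointing the same way' (similar meaning), 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings", invented for illustration only.
dog   = [0.9, 0.8, 0.1]
puppy = [0.85, 0.75, 0.2]
car   = [0.1, 0.2, 0.9]

# "dog" sits much closer to "puppy" than to "car" in this space.
assert cosine_similarity(dog, puppy) > cosine_similarity(dog, car)
```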
Specializing a pre-trained model
Fine-tuning takes an AI model that's already been trained on general data and gives it extra training on a specific topic or task. For example, you might fine-tune a general language model on medical texts so it becomes much better at answering health-related questions. It's faster and cheaper than training from scratch.
Graphics Processing Unit
A GPU was originally built to render video game graphics, but it turns out it's incredibly good for AI too. Unlike a CPU that handles tasks one by one, a GPU can process thousands of small calculations at the same time — which is exactly what AI models need.
Safety limits on AI behavior
Guardrails are the rules and restrictions built into AI systems to prevent them from producing harmful, dangerous, or inappropriate content. They're set up during and after training to make sure the AI refuses to help with illegal activities, doesn't generate hate speech, and stays helpful and honest.
When AI makes things up
A hallucination happens when an AI model confidently generates information that sounds correct but is actually false or made up. This happens because the model predicts what "sounds right" based on patterns, not because it actually "knows" facts. It might invent fake research papers, made-up statistics, or non-existent people.
When the AI actually answers you
Inference is what happens every time you ask an AI a question and it generates a response. The model uses everything it learned during training to predict the best answer, one token at a time. This is why you often see AI responses appear word by word — it's literally generating each piece in sequence.
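The token-by-token loop can be sketched with a toy "model": a lookup table of scored next-token candidates standing in for the billions of parameters a real model uses. The table and scores below are invented, but the loop structure is the same one real inference follows:

```python
# A toy stand-in for a language model: for each token, scored candidates
# for the next one. Real models compute these scores from learned weights.
NEXT = {
    "the": {"cat": 0.6, "mat": 0.4},
    "cat": {"sat": 0.9, "ran": 0.1},
    "sat": {"<end>": 1.0},
}

def generate(token):
    """Greedy decoding: repeatedly append the highest-scoring next token."""
    out = [token]
    while token in NEXT:
        token = max(NEXT[token], key=NEXT[token].get)
        if token == "<end>":   # the model decided the text is finished
            break
        out.append(token)
    return out

print(" ".join(generate("the")))  # built one token at a time, like streaming output
```

This is also why responses stream word by word: each step of the loop produces exactly one new token.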
How fast the AI responds
Latency is the delay between when you send a request to an AI and when you start receiving a response. Lower latency = faster response. It depends on the model size, the hardware it runs on, how far the server is from you, and how many people are using it at the same time.
Large Language Model
An LLM is a very large AI model (usually based on the Transformer architecture) that has been trained on enormous amounts of text to understand and generate human language. "Large" refers to both the amount of training data and the number of parameters. Models like Claude, GPT-4, and Llama are all LLMs.
Model Context Protocol
MCP is an open standard (created by Anthropic) that lets AI models connect to external tools and data sources in a universal way. Instead of building a custom integration for every tool (Google Drive, Slack, a database, etc.), MCP provides one standard "plug" that works everywhere. Any AI that supports MCP can talk to any tool that supports MCP — no custom code needed each time.
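Under the hood, MCP messages are JSON-RPC 2.0 requests. The sketch below builds the request a client sends to ask a server what tools it offers; the `tools/list` method name matches the published spec, but this is a minimal illustration, not a full MCP client (which also handles initialization and responses):

```python
import json

# An MCP request is a JSON-RPC 2.0 message. "tools/list" asks the server
# to describe the tools it exposes; this framing is a simplified sketch.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list",
}
wire = json.dumps(request)   # this is what actually travels to the server
```

Because every MCP server speaks this same message format, one client implementation works against any of them, which is the whole point of the "universal plug".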
The AI's learned knowledge, as a file
Weights are the actual numbers that make up a trained AI model — they're the result of all that training. When you "download a model", you're downloading its weights. These files can be enormous (tens or hundreds of gigabytes) and represent everything the model has learned.
The structure that makes AI "think"
A neural network is a mathematical system loosely inspired by the human brain. It's made up of layers of connected "neurons" (tiny math functions) that pass information forward. Data goes in one side, gets processed through many layers, and a result comes out the other side. All modern AI — image recognition, language models, self-driving cars — is built on neural networks.
An AI agent that controls your computer
OpenClaw (also known as Claw) is a free, open-source AI assistant that lives on your computer and can actually see your screen, use your apps, and do real work for you. Unlike a chatbot that only answers questions, OpenClaw is an autonomous agent — it can click buttons, fill out forms, browse the web, write code, send messages, and automate entire workflows across your apps. It's one of the fastest-growing open-source projects ever, with over 80K GitHub stars.
Who can see the code
An open-source AI model (like Meta's Llama or Mistral) shares its code and weights publicly — anyone can download, modify, and run it. A closed-source model (like GPT-4 or Claude) is kept private — you can only use it through the company's API or app. Open source gives you more control and privacy; closed source often means better performance and easier setup.
When a model shares its brain, but not its recipe
Open weights means a company releases the trained model weights (the final result of training) so anyone can download and run the model, but doesn't share the training code, the training data, or the full process used to create it. This is different from truly open source, where everything is shared. Models like Meta's Llama and Mistral are open weights — you can use them freely, but you can't fully reproduce how they were made.
The AI's internal knowledge dials
Parameters are the numbers inside an AI model that get adjusted during training. A model like GPT-4 has hundreds of billions of parameters. More parameters generally means the model can learn more complex patterns and give better answers — but it also needs more computing power to run.
What you say to the AI
A prompt is the text you type to tell the AI what you want. It can be a question, an instruction, or even a whole document you want the AI to work with. The better and clearer your prompt, the better the AI's response will be — this is why "prompt engineering" has become an important skill.
Making AI models smaller and faster
Quantization is a technique that shrinks an AI model by reducing the precision of its numbers. Instead of storing each parameter as a very precise 32-bit or 16-bit number, you store it as an 8-bit or even 4-bit one. The model becomes much smaller (2x to 8x) and runs faster, with only a small drop in quality. This is what makes it possible to run powerful AI models on a laptop or a phone instead of needing a massive server.
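A minimal version of the idea: map floats onto small integers with a single scale factor, then map back. Real schemes (per-channel scales, 4-bit groups, outlier handling) are more refined, but this shows where the size savings and the small quality loss both come from:

```python
# 8-bit quantization sketch: squeeze floats into integers in [-127, 127].
weights = [0.42, -1.37, 0.05, 0.91]   # made-up model weights

scale = max(abs(w) for w in weights) / 127        # one scale for the whole list
quantized = [round(w / scale) for w in weights]   # small integers, 1 byte each
restored = [q * scale for q in quantized]         # approximate originals

# The restored values are close, but not identical: that's the quality cost.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each 32-bit float becomes one byte plus a shared scale, which is where the roughly 4x shrink comes from at 8 bits.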
Retrieval-Augmented Generation
RAG is a technique where the AI first searches through a database of documents to find relevant information, then uses that information to generate a more accurate answer. Instead of relying only on what it memorized during training, the AI can reference real, up-to-date sources. This dramatically reduces hallucinations.
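The pipeline is easy to sketch end to end: score documents against the question, take the best match, and paste it into the prompt. This toy version scores by word overlap; production systems use embeddings and vector search instead, but the retrieve-then-generate shape is the same:

```python
# A toy document store. Real RAG systems index thousands of documents.
documents = [
    "The Eiffel Tower is 330 metres tall.",
    "Python was created by Guido van Rossum.",
    "The Great Wall of China is over 21,000 km long.",
]

def retrieve(question):
    """Return the document sharing the most words with the question."""
    q_words = set(question.lower().split())
    return max(documents, key=lambda d: len(q_words & set(d.lower().split())))

question = "How tall is the Eiffel Tower?"
context = retrieve(question)

# The retrieved text is prepended so the model answers from a real source
# instead of from memory.
prompt = f"Answer using this source: {context}\n\nQuestion: {question}"
```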
AI you fully own and control
Sovereign AI means running AI infrastructure on your own servers or in your own data centers, instead of relying on a big tech company's cloud. This way, your data never leaves your control, you're not dependent on anyone else's platform, and you comply with local regulations. It's especially important for governments, banks, and healthcare companies.
How creative vs. predictable the AI is
Temperature is a setting (usually between 0 and 1) that controls how "random" the AI's responses are. A low temperature (like 0.1) makes the model very predictable and factual — it picks the most likely next word every time. A high temperature (like 0.9) makes it more creative and surprising, but also more likely to make mistakes.
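Mechanically, temperature divides the model's raw next-word scores before they're turned into probabilities. Low temperature sharpens the distribution toward the top word; high temperature flattens it so other words get a real chance. The scores below are invented for illustration:

```python
import math

def softmax_with_temperature(scores, temperature):
    """Convert raw scores into probabilities, sharpened or flattened by temperature."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.5]   # made-up scores for three candidate next words

cold = softmax_with_temperature(scores, 0.1)  # near-deterministic: top word dominates
hot = softmax_with_temperature(scores, 0.9)   # flatter: runners-up get real probability
```

At temperature 0.1 the top word's probability is effectively 1, which is why low settings feel predictable and high ones feel creative.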
The building block of AI text
A token is a small chunk of text that an AI model reads and generates. It can be a whole word like "hello", part of a word like "un" + "believe" + "able", or even a single character. When people say a model can handle "128K tokens", they mean it can read and remember roughly 100,000 words of text at once. Tokens are also how AI companies charge you — you pay per token sent (input) and per token generated (output). So the longer your question and the longer the AI's answer, the more it costs.
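Since billing is per token, the cost of a request is simple arithmetic. The prices below are made-up placeholders (check your provider's real price list), but the formula is how every per-token bill works:

```python
# Assumed placeholder prices: $3 per million input tokens,
# $15 per million output tokens. Real prices vary by model and provider.
PRICE_IN_PER_MTOK = 3.00
PRICE_OUT_PER_MTOK = 15.00

def cost_usd(input_tokens, output_tokens):
    """Cost of one request: tokens sent plus tokens generated, each at its rate."""
    return (input_tokens * PRICE_IN_PER_MTOK
            + output_tokens * PRICE_OUT_PER_MTOK) / 1_000_000

# A 2,000-token prompt with a 500-token answer:
print(f"${cost_usd(2_000, 500):.4f}")   # → $0.0135
```

Notice that output tokens cost several times more than input tokens at these rates, which is typical: long answers, not long questions, dominate the bill.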
Tensor Processing Unit
A TPU is a custom chip designed by Google specifically for AI workloads. While GPUs are general-purpose parallel processors repurposed for AI, TPUs are built from the ground up to do one thing extremely well: the tensor math that powers neural networks. They're used inside Google's data centers to train and run models like Gemini. You can't buy a TPU — you rent access through Google Cloud.
How an AI learns
Training is the process of teaching an AI model by showing it massive amounts of data — billions of pages of text, code, and more. During training, the model adjusts millions (or billions) of internal settings called "parameters" until it gets good at predicting what comes next in a sentence. This process requires enormous computing power and can take weeks or months.
The architecture behind modern AI
The Transformer is the type of neural network architecture that powers almost all modern AI language models (ChatGPT, Claude, Gemini, etc.). Invented by Google in 2017, its key innovation is "attention" — the ability to look at all the words in a sentence at once and figure out which ones are most relevant to each other, instead of reading them one by one.
Video Random Access Memory
VRAM is the memory built directly onto a GPU. When an AI model runs, it needs to keep a huge amount of data in memory all at once. The more VRAM your GPU has, the bigger the AI model it can run. This is why high-end GPUs with 80GB or more of VRAM are so valuable for AI.