Application Programming Interface
An API is a way for software applications to talk to each other. In the AI world, when a company like Anthropic or OpenAI offers an API, it means developers can send text to the AI model and get responses back — all through code, without using a chat interface. This is how AI gets built into apps, websites, and products.
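To make this concrete, here is a sketch of the JSON a developer might send to a model provider. The field names follow the general shape of Anthropic's Messages API, but treat the endpoint, headers, and model name as illustrative assumptions rather than a working integration:

```python
import json

# The JSON payload a developer sends to a model provider's API.
# Field names follow the general shape of Anthropic's Messages API;
# the model name is an assumed placeholder.
payload = {
    "model": "claude-sonnet-4-20250514",   # assumed model name
    "max_tokens": 256,
    "messages": [
        {"role": "user", "content": "Summarize this paragraph in one sentence: ..."}
    ],
}

body = json.dumps(payload)
# In a real app you would POST `body` to the provider's endpoint
# along with your API key, e.g. with the `requests` library:
#   requests.post("https://api.anthropic.com/v1/messages",
#                 headers={"x-api-key": API_KEY}, data=body)
```

The response comes back as JSON too, which is exactly what "talking through code, without a chat interface" means in practice.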
AI that can take actions on its own
An AI Agent is a system that can independently plan, make decisions, and take actions to accomplish a goal — not just answer a single question. Instead of waiting for you to tell it exactly what to do at each step, an agent can break down a complex task, use tools (like browsing the web, writing code, or reading files), and keep going until the job is done. Agents are built on top of large language models but add a layer of autonomy and tool use.
How AI decides what matters
Attention is the mechanism that lets a Transformer model figure out which words in a sentence are most important to each other. In the sentence "The cat sat on the mat because it was tired", attention helps the model understand that "it" refers to "the cat", not "the mat". This ability to connect distant words is what makes Transformers so powerful.
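The core of attention is a small, well-defined calculation: compare a "query" vector against every "key" vector, turn the scores into weights, and mix the "value" vectors accordingly. Here is a minimal pure-Python sketch of that scaled dot-product step (toy 2-dimensional vectors, not real model activations):

```python
import math

def softmax(xs):
    """Turn raw scores into weights that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query over key/value vectors."""
    d = len(query)
    # Score each key by its similarity to the query.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)   # how much each word "matters" to this one
    # Blend the value vectors according to those weights.
    blended = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
    return blended, weights

# The query lines up with the first key, so most attention lands there.
out, weights = attention([1.0, 0.0],
                         [[1.0, 0.0], [0.0, 1.0]],
                         [[2.0, 0.0], [0.0, 2.0]])
```

In a real Transformer this happens for every word against every other word at once, which is how "it" gets linked back to "the cat".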
How much the AI can "see" at once
The context window is the total amount of text an AI model can consider during a conversation. This includes everything — your question, any documents you share, and the AI's own response. Once a conversation grows past the window, the earliest text falls outside it, so the model effectively "forgets" the beginning.
Central Processing Unit
The CPU is the main brain of every computer. It handles general tasks like running your apps, browsing the web, and managing your operating system. It's great at doing many different things, one after another, very fast.
Turning words into numbers
An embedding is a way to represent a word, sentence, or document as a list of numbers (a "vector") that captures its meaning. Words with similar meanings end up with similar numbers. This is how AI understands that "dog" and "puppy" are related, or that "Paris" is to "France" what "Tokyo" is to "Japan".
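"Similar meanings end up with similar numbers" is usually measured with cosine similarity. This sketch uses tiny hand-made 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions and are learned by a model, so these numbers are purely illustrative):

```python
import math

def cosine_similarity(a, b):
    """1.0 means 'pointing the same way' (similar meaning), 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings", invented for illustration only.
dog   = [0.9, 0.8, 0.1]
puppy = [0.85, 0.75, 0.2]
car   = [0.1, 0.2, 0.9]

# "dog" sits much closer to "puppy" than to "car" in this space.
assert cosine_similarity(dog, puppy) > cosine_similarity(dog, car)
```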
Specializing a pre-trained model
Fine-tuning takes an AI model that's already been trained on general data and gives it extra training on a specific topic or task. For example, you might fine-tune a general language model on medical texts so it becomes much better at answering health-related questions. It's faster and cheaper than training from scratch.
Graphics Processing Unit
A GPU was originally built to render video game graphics, but it turns out it's incredibly good for AI too. Unlike a CPU that handles tasks one by one, a GPU can process thousands of small calculations at the same time — which is exactly what AI models need.
Safety limits on AI behavior
Guardrails are the rules and restrictions built into AI systems to prevent them from producing harmful, dangerous, or inappropriate content. They're set up during and after training to make sure the AI refuses to help with illegal activities, doesn't generate hate speech, and stays helpful and honest.
When AI makes things up
A hallucination happens when an AI model confidently generates information that sounds correct but is actually false or made up. This happens because the model predicts what "sounds right" based on patterns, not because it actually "knows" facts. It might invent fake research papers, made-up statistics, or non-existent people.
When the AI actually answers you
Inference is what happens every time you ask an AI a question and it generates a response. The model uses everything it learned during training to predict the best answer, one token at a time. This is why you often see AI responses appear word by word — it's literally generating each piece in sequence.
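The token-by-token loop can be sketched with a toy "model": a lookup table of scored next-token candidates standing in for the billions of parameters a real model uses. The table and scores below are invented, but the loop structure is the same one real inference follows:

```python
# A toy stand-in for a language model: for each token, scored candidates
# for the next one. Real models compute these scores from learned weights.
NEXT = {
    "the": {"cat": 0.6, "mat": 0.4},
    "cat": {"sat": 0.9, "ran": 0.1},
    "sat": {"<end>": 1.0},
}

def generate(token):
    """Greedy decoding: repeatedly append the highest-scoring next token."""
    out = [token]
    while token in NEXT:
        token = max(NEXT[token], key=NEXT[token].get)
        if token == "<end>":   # the model decided the text is finished
            break
        out.append(token)
    return out

print(" ".join(generate("the")))  # built one token at a time, like streaming output
```

This is also why responses stream word by word: each step of the loop produces exactly one new token.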
How fast the AI responds
Latency is the delay between when you send a request to an AI and when you start receiving a response. Lower latency = faster response. It depends on the model size, the hardware it runs on, how far the server is from you, and how many people are using it at the same time.
Large Language Model
An LLM is a very large AI model (usually based on the Transformer architecture) that has been trained on enormous amounts of text to understand and generate human language. "Large" refers to both the amount of training data and the number of parameters. Models like Claude, GPT-4, and Llama are all LLMs.
Model Context Protocol
MCP is an open standard (created by Anthropic) that lets AI models connect to external tools and data sources in a universal way. Instead of building a custom integration for every tool (Google Drive, Slack, a database, etc.), MCP provides one standard "plug" that works everywhere. Any AI that supports MCP can talk to any tool that supports MCP — no custom code needed each time.
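Under the hood, MCP messages are JSON-RPC 2.0 requests. The sketch below builds the request a client sends to ask a server what tools it offers; the `tools/list` method name matches the published spec, but this is a minimal illustration, not a full MCP client (which also handles initialization and responses):

```python
import json

# An MCP request is a JSON-RPC 2.0 message. "tools/list" asks the server
# to describe the tools it exposes; this framing is a simplified sketch.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list",
}
wire = json.dumps(request)   # this is what actually travels to the server
```

Because every MCP server speaks this same message format, one client implementation works against any of them, which is the whole point of the "universal plug".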
The AI's learned knowledge, as a file
Weights are the actual numbers that make up a trained AI model — they're the result of all that training. When you "download a model", you're downloading its weights. These files can be enormous (tens or hundreds of gigabytes) and represent everything the model has learned.
The structure that makes AI "think"
A neural network is a mathematical system loosely inspired by the human brain. It's made up of layers of connected "neurons" (tiny math functions) that pass information forward. Data goes in one side, gets processed through many layers, and a result comes out the other side. All modern AI — image recognition, language models, self-driving cars — is built on neural networks.
An AI agent that controls your computer
OpenClaw (also known as Claw) is a free, open-source AI assistant that lives on your computer and can actually see your screen, use your apps, and do real work for you. Unlike a chatbot that only answers questions, OpenClaw is an autonomous agent — it can click buttons, fill out forms, browse the web, write code, send messages, and automate entire workflows across your apps. It's one of the fastest-growing open-source projects ever, with over 80K GitHub stars.
Who can see the code
An open-source AI model (like Meta's Llama or Mistral) shares its code and weights publicly — anyone can download, modify, and run it. A closed-source model (like GPT-4 or Claude) is kept private — you can only use it through the company's API or app. Open source gives you more control and privacy; closed source often means better performance and easier setup.
When a model shares its brain, but not its recipe
Open weights means a company releases the trained model weights (the final result of training) so anyone can download and run the model, but doesn't share the training code, the training data, or the full process used to create it. This is different from truly open source, where everything is shared. Models like Meta's Llama and Mistral are open weights — you can use them freely, but you can't fully reproduce how they were made.
The AI's internal knowledge dials
Parameters are the numbers inside an AI model that get adjusted during training. A model like GPT-4 has hundreds of billions of parameters. More parameters generally means the model can learn more complex patterns and give better answers — but it also needs more computing power to run.
What you say to the AI
A prompt is the text you type to tell the AI what you want. It can be a question, an instruction, or even a whole document you want the AI to work with. The better and clearer your prompt, the better the AI's response will be — this is why "prompt engineering" has become an important skill.
Making AI models smaller and faster
Quantization is a technique that shrinks an AI model by reducing the precision of its numbers. Instead of storing each parameter as a very precise 32-bit or 16-bit number, you store it as an 8-bit or even 4-bit one. The model becomes much smaller (2x to 8x) and runs faster, with only a small drop in quality. This is what makes it possible to run powerful AI models on a laptop or a phone instead of needing a massive server.
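A minimal version of the idea: map floats onto small integers with a single scale factor, then map back. Real schemes (per-channel scales, 4-bit groups, outlier handling) are more refined, but this shows where the size savings and the small quality loss both come from:

```python
# 8-bit quantization sketch: squeeze floats into integers in [-127, 127].
weights = [0.42, -1.37, 0.05, 0.91]   # made-up model weights

scale = max(abs(w) for w in weights) / 127        # one scale for the whole list
quantized = [round(w / scale) for w in weights]   # small integers, 1 byte each
restored = [q * scale for q in quantized]         # approximate originals

# The restored values are close, but not identical: that's the quality cost.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each 32-bit float becomes one byte plus a shared scale, which is where the roughly 4x shrink comes from at 8 bits.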
Retrieval-Augmented Generation
RAG is a technique where the AI first searches through a database of documents to find relevant information, then uses that information to generate a more accurate answer. Instead of relying only on what it memorized during training, the AI can reference real, up-to-date sources. This dramatically reduces hallucinations.
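The pipeline is easy to sketch end to end: score documents against the question, take the best match, and paste it into the prompt. This toy version scores by word overlap; production systems use embeddings and vector search instead, but the retrieve-then-generate shape is the same:

```python
# A toy document store. Real RAG systems index thousands of documents.
documents = [
    "The Eiffel Tower is 330 metres tall.",
    "Python was created by Guido van Rossum.",
    "The Great Wall of China is over 21,000 km long.",
]

def retrieve(question):
    """Return the document sharing the most words with the question."""
    q_words = set(question.lower().split())
    return max(documents, key=lambda d: len(q_words & set(d.lower().split())))

question = "How tall is the Eiffel Tower?"
context = retrieve(question)

# The retrieved text is prepended so the model answers from a real source
# instead of from memory.
prompt = f"Answer using this source: {context}\n\nQuestion: {question}"
```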
AI you fully own and control
Sovereign AI means running AI infrastructure on your own servers or in your own data centers, instead of relying on a big tech company's cloud. This way, your data never leaves your control, you're not dependent on anyone else's platform, and you comply with local regulations. It's especially important for governments, banks, and healthcare companies.
How creative vs. predictable the AI is
Temperature is a setting (usually between 0 and 1) that controls how "random" the AI's responses are. A low temperature (like 0.1) makes the model very predictable and factual — it picks the most likely next word every time. A high temperature (like 0.9) makes it more creative and surprising, but also more likely to make mistakes.
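Mechanically, temperature divides the model's raw next-word scores before they're turned into probabilities. Low temperature sharpens the distribution toward the top word; high temperature flattens it so other words get a real chance. The scores below are invented for illustration:

```python
import math

def softmax_with_temperature(scores, temperature):
    """Convert raw scores into probabilities, sharpened or flattened by temperature."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.5]   # made-up scores for three candidate next words

cold = softmax_with_temperature(scores, 0.1)  # near-deterministic: top word dominates
hot = softmax_with_temperature(scores, 0.9)   # flatter: runners-up get real probability
```

At temperature 0.1 the top word's probability is effectively 1, which is why low settings feel predictable and high ones feel creative.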
The building block of AI text
A token is a small chunk of text that an AI model reads and generates. It can be a whole word like "hello", part of a word like "un" + "believe" + "able", or even a single character. When people say a model can handle "128K tokens", they mean it can read and remember roughly 100,000 words of text at once. Tokens are also how AI companies charge you — you pay per token sent (input) and per token generated (output). So the longer your question and the longer the AI's answer, the more it costs.
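Since billing is per token, the cost of a request is simple arithmetic. The prices below are made-up placeholders (check your provider's real price list), but the formula is how every per-token bill works:

```python
# Assumed placeholder prices: $3 per million input tokens,
# $15 per million output tokens. Real prices vary by model and provider.
PRICE_IN_PER_MTOK = 3.00
PRICE_OUT_PER_MTOK = 15.00

def cost_usd(input_tokens, output_tokens):
    """Cost of one request: tokens sent plus tokens generated, each at its rate."""
    return (input_tokens * PRICE_IN_PER_MTOK
            + output_tokens * PRICE_OUT_PER_MTOK) / 1_000_000

# A 2,000-token prompt with a 500-token answer:
print(f"${cost_usd(2_000, 500):.4f}")   # → $0.0135
```

Notice that output tokens cost several times more than input tokens at these rates, which is typical: long answers, not long questions, dominate the bill.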
Tensor Processing Unit
A TPU is a custom chip designed by Google specifically for AI workloads. While GPUs are general-purpose parallel processors repurposed for AI, TPUs are built from the ground up to do one thing extremely well: the tensor math that powers neural networks. They're used inside Google's data centers to train and run models like Gemini. You can't buy a TPU — you rent access through Google Cloud.
How an AI learns
Training is the process of teaching an AI model by showing it massive amounts of data — billions of pages of text, code, and more. During training, the model adjusts millions (or billions) of internal settings called "parameters" until it gets good at predicting what comes next in a sentence. This process requires enormous computing power and can take weeks or months.
The architecture behind modern AI
The Transformer is the type of neural network architecture that powers almost all modern AI language models (ChatGPT, Claude, Gemini, etc.). Invented by Google in 2017, its key innovation is "attention" — the ability to look at all the words in a sentence at once and figure out which ones are most relevant to each other, instead of reading them one by one.
Video Random Access Memory
VRAM is the memory built directly onto a GPU. When an AI model runs, it needs to keep a huge amount of data in memory all at once. The more VRAM your GPU has, the bigger the AI model it can run. This is why high-end GPUs with 80GB or more of VRAM are so valuable for AI.