
Deconstructing AI

A No-Nonsense Guide to the Terminology Explosion

Xiaohan · 12 min read

Welcome. If you feel overwhelmed by the daily avalanche of AI buzzwords—Agents, RAG, MCP, Function Calling—you are in the right place. Today, we are going to strip away the marketing hype and look under the hood of these intimidating concepts.

You will discover that many of these "revolutionary" ideas are often just "old wine in new bottles." Ultimately, an "Agent" is often just the parts of a workflow that don't require intelligence, wrapped around a model that does.

Clear your mind, forget what you think you know, and let's rebuild the AI landscape from first principles.

Large Language Model (LLM)

The chaos begins here. Decades ago, language models were statistically primitive. However, as researchers increased the parameter count (the number of learned variables in the model), a critical threshold was crossed and intelligence seemed to "emerge." To distinguish these powerful giants from their simpler ancestors, we added "Large" to the name.

The Reality: At its core, an LLM is a Next Token Predictor. It is an engine designed to play a sophisticated game of "fill in the blank." It outputs the most statistically probable next word based on the words that came before it. Without guidance, it is just an autocomplete engine on steroids.

How LLMs Predict the Next Token

Given the prompt "The cat sat on the ___", the model ranks candidate next tokens by probability:

  • mat — 42%
  • roof — 25%
  • floor — 18%
  • table — 10%
  • other — 5%
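The "fill in the blank" game can be sketched with simple bigram counts—a toy stand-in for what a real LLM learns with a neural network over a vast corpus:

```python
# Toy next-token predictor: pick the most frequent continuation seen in a
# tiny corpus. Real LLMs learn these probabilities at enormous scale, but
# the game -- predict the next token from what came before -- is the same.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the dog sat on the rug".split()

# Count what follows each word (bigram statistics).
bigrams: dict[str, Counter] = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict_next(word: str) -> str:
    """Return the most statistically probable next token."""
    return bigrams[word].most_common(1)[0][0]

print(predict_next("sat"))  # on
```

This is "autocomplete on steroids" stripped to its skeleton: no understanding, just frequencies.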

Prompt & Context

To make this "autocomplete engine" useful, we assign roles. Imagine you are the Boss, and the LLM is your employee.

  • Prompt: This is the specific instruction or query you give the employee.
  • Context: This is the background information the employee needs to know to answer the prompt.

For example, instead of just saying "Write an email" (Prompt), you might provide: "You are a polite customer support agent. The customer is angry about a late delivery" (Context). By explicitly separating instructions from background data, you guide the probability engine to a specific, useful outcome.
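A minimal sketch of this separation, using the system/user message roles common to chat APIs (build_messages is a hypothetical helper, not any specific vendor's API):

```python
# Separate context (background) from the prompt (instruction) as distinct
# chat messages. The role names mirror the common chat-API convention.

def build_messages(context: str, prompt: str) -> list[dict]:
    """Package context and prompt as separate chat messages."""
    return [
        {"role": "system", "content": context},  # background the model needs
        {"role": "user", "content": prompt},     # the actual instruction
    ]

messages = build_messages(
    context="You are a polite customer support agent. "
            "The customer is angry about a late delivery.",
    prompt="Write an email.",
)
```

The model still just predicts tokens; the structured context simply steers the probabilities toward a useful region.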

Memory

Here is the catch: The basic LLM has no brain to store your conversation. It answers one question and immediately forgets you exist.

To create the illusion of a continuous conversation, engineers developed a trick. Before you ask your second question, the system secretly pastes your first question and the model's first answer back into the Context.

The Mechanism: "Memory" in LLMs is simply the chat history re-fed into the model every time you hit enter. Since context windows have size limits, we often ask the model to summarize old conversations to save space. This summary becomes the "Memory."
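Here is that mechanism in miniature, with ask_llm and summarize as stand-ins for real model calls (ChatSession and the 4-message budget are illustrative):

```python
# Sketch of the "memory" trick: re-feed the chat history every turn, and
# compress old turns into a summary once the window budget is exceeded.

def summarize(history: list[str]) -> str:
    # Stand-in: a real system would ask the LLM itself for the summary.
    return f"[Summary of {len(history)} earlier messages]"

class ChatSession:
    def __init__(self, max_messages: int = 4):
        self.history: list[str] = []
        self.max_messages = max_messages

    def send(self, user_msg: str, ask_llm) -> str:
        # The model sees the FULL history each turn -- that is the "memory".
        reply = ask_llm(self.history + [f"User: {user_msg}"])
        self.history += [f"User: {user_msg}", f"Assistant: {reply}"]
        # Over budget? Replace old turns with a summary to save space.
        if len(self.history) > self.max_messages:
            self.history = [summarize(self.history[:-2])] + self.history[-2:]
        return reply

session = ChatSession()
for q in ["Hi", "What is RAG?", "Thanks"]:
    session.send(q, lambda msgs: "(model reply)")
print(session.history[0])  # [Summary of 4 earlier messages]
```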

Agent

Eventually, you realize the LLM has a flaw: it is trapped in a box. It cannot access the internet, check the weather, or query your database. It hallucinates answers because it relies only on its training data.

To fix this, we wrap the LLM in a program. If the LLM doesn't know an answer, we tell it: "Ask for help." The wrapper program then performs the action (like a Google search) and feeds the result back to the LLM.

The Reality: We call this wrapper an Agent. An Agent is simply an LLM acting as a "router"—deciding which external software tool to trigger. It is a logic controller powered by natural language.

Agent = LLM + Tools + Memory Loop

[Diagram: User → Agent (wrapper program), which contains the LLM core, memory, and a router dispatching to tools such as web search, a database, a calculator, and a calendar API.]
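The wrapper idea fits in a few lines—the "LLM" below is mocked as a keyword matcher, so only the routing shape is real:

```python
# Sketch of an Agent as a router: the model picks a tool, the wrapper
# program actually runs it and feeds the result back.

def web_search(query: str) -> str:
    return f"search results for '{query}'"

def calculator(expr: str) -> str:
    return str(eval(expr))  # toy only -- never eval untrusted input

TOOLS = {"web_search": web_search, "calculator": calculator}

def llm_route(question: str) -> tuple[str, str]:
    """Stand-in for the LLM deciding which tool to trigger."""
    if any(ch.isdigit() for ch in question):
        return "calculator", question
    return "web_search", question

def agent(question: str) -> str:
    tool_name, arg = llm_route(question)  # the LLM decides
    return TOOLS[tool_name](arg)          # the wrapper executes

print(agent("2+3"))  # routed to the calculator, prints 5
```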

Retrieval-Augmented Generation (RAG)

Giving an Agent access to the entire internet is messy. Sometimes you want it to access your private company data.

  • The Problem: You can't paste 10,000 PDF pages into the prompt (it's too expensive and exceeds limits).
  • The Solution: You use a Vector Database. This converts your text into numbers (vectors). When you ask a question, the database finds the text snippets that are mathematically similar to your query.
  • The Acronym: This process—retrieving relevant data and inserting it into the context to guide the answer—is called Retrieval-Augmented Generation (RAG).

Think of it as an "Open Book Exam." The model doesn't need to memorize the textbook; it just needs to know how to look up the relevant page before answering.

RAG Pipeline — The "Open Book Exam"

[Diagram: user query → embed as a vector → similarity search in the vector DB → retrieved snippets augment the context and prompt → LLM generates the answer.]
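A toy sketch of the pipeline, using word-count vectors and cosine similarity in place of learned embeddings and a real vector database (DOCS and the helper names are illustrative):

```python
# Toy RAG: "embed" text as word-frequency vectors, retrieve the most
# similar snippet, and splice it into the prompt before generating.
import math
from collections import Counter

def embed(text: str) -> Counter:
    cleaned = "".join(c for c in text.lower() if c.isalnum() or c.isspace())
    return Counter(cleaned.split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

DOCS = [
    "Refund policy: refunds are issued within 14 days of purchase.",
    "Shipping policy: orders ship within 2 business days.",
]
INDEX = [(embed(d), d) for d in DOCS]  # the "vector database"

def rag_prompt(question: str) -> str:
    # Retrieve the mathematically closest snippet, then augment the prompt.
    best = max(INDEX, key=lambda pair: cosine(pair[0], embed(question)))[1]
    return f"Context: {best}\nQuestion: {question}"

print(rag_prompt("How long do refunds take?"))
```

The model never memorized the refund policy; the pipeline looked up the relevant page and handed it over—the open-book exam.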

Function Calling

When an Agent needs to use a tool (like a calculator or a calendar API), relying on natural language is risky. If the model says, "I think I should check the calendar," a computer program cannot execute that sentence.

The Solution: We force the model to output a strict data format, usually JSON. This capability is Function Calling. It is simply a protocol (agreement) that allows the vague, artistic brain of an LLM to interface with the rigid, logical world of software code.

Function Calling — Bridging Language & Code

Human: "Schedule a meeting tomorrow at 2 PM"

LLM structured output:

{
  "function": "calendar_add",
  "date": "2026-02-11",
  "time": "14:00"
}

Result: calendar_add() executed successfully
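The program side of that exchange—parsing the strict JSON and dispatching to real code—can be sketched like this (calendar_add and the DISPATCH table are illustrative, not a specific vendor API):

```python
# Function Calling from the wrapper program's point of view: the model's
# structured JSON output is parsed and routed to an actual function.
import json

def calendar_add(date: str, time: str) -> str:
    return f"meeting booked for {date} at {time}"

DISPATCH = {"calendar_add": calendar_add}

# What the LLM returned: plain text that happens to be valid JSON.
llm_output = '{"function": "calendar_add", "date": "2026-02-11", "time": "14:00"}'

call = json.loads(llm_output)        # strict format -> machine-parseable
fn = DISPATCH[call.pop("function")]  # look up the real function
print(fn(**call))                    # execute with the model's arguments
```

Had the model replied "I think I should check the calendar," json.loads would fail—which is exactly why the protocol forces the rigid format.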

Model Context Protocol (MCP)

As we build more tools, we run into a scaling problem. How does the Agent know which tools are available? How does it connect to a Google Drive tool versus a Slack tool?

The Solution: MCP (Model Context Protocol). Think of MCP as a "USB standard" for AI tools. It is a universal specification that defines how an AI connects to data sources and tools.

  • The LLM is the brain.
  • The MCP Server provides the tools.
  • The MCP Client (the Agent) acts as the bridge.

Instead of hard-coding every integration, the Agent asks the MCP server: "What tools do you have?" and the server replies with a list. It standardizes the connection.

MCP — The "USB Standard" for AI Tools

[Diagram: LLM (brain) ↔ MCP Client (bridge/agent) ↔ MCP Server A (Google Drive), MCP Server B (Slack), and more servers, all speaking the universal "USB" protocol.]
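The discovery handshake can be mimicked in plain Python—an illustrative sketch of the idea only, not the real MCP SDK or wire format:

```python
# Toy version of MCP-style tool discovery: the client asks each server
# "What tools do you have?" instead of hard-coding every integration.

class ToyMCPServer:
    """Stands in for e.g. a Google Drive or Slack tool server."""
    def __init__(self, name: str, tools: dict):
        self.name, self._tools = name, tools

    def list_tools(self) -> list[str]:
        return sorted(self._tools)          # the discovery reply

    def call_tool(self, tool: str, arg: str) -> str:
        return self._tools[tool](arg)

drive = ToyMCPServer("drive", {"read_file": lambda p: f"contents of {p}"})
slack = ToyMCPServer("slack", {"send_message": lambda m: f"sent: {m}"})

# The client (Agent) discovers capabilities at runtime:
catalog = {s.name: s.list_tools() for s in (drive, slack)}
print(catalog)  # {'drive': ['read_file'], 'slack': ['send_message']}
```

Swap in a new server and the agent picks up its tools with zero new integration code—that is the "USB" promise.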

LangChain & Workflow

Let's look at automating complex tasks. Suppose you want to scrape a competitor's website, summarize their pricing, and save it to a spreadsheet.

You have two main ways to build this:

  1. LangChain: A code-heavy framework. It chains steps together programmatically. It is powerful but rigid—if the website structure changes, the code might break.
  2. Workflow: The low-code version. You drag and drop blocks on a canvas (e.g., "Input" → "Summarize" → "Save").

The Criticism: Both approaches are somewhat brittle. They rely on "hard-coded" logic. If the task changes slightly (e.g., the source is a PDF instead of a website), the chain fails unless you wrote specific if/else logic for every scenario.
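What such a hard-coded chain looks like in spirit, written in framework-free Python so no actual LangChain API is assumed (the scraper and parser are stand-ins):

```python
# A rigid, hard-coded pipeline: each step's output feeds the next.
# Change the source format and the brittle parsing in summarize() breaks.

def scrape(url: str) -> str:
    # Stand-in for a real web scraper.
    return f"<html>pricing: $9/mo<p>from {url}</p></html>"

def summarize(html: str) -> str:
    # Brittle extraction: depends on the exact page structure.
    return html.split("pricing: ")[1].split("<")[0]

def save(row: str) -> list[str]:
    # Stand-in for a spreadsheet write.
    return [row]

pipeline = [scrape, summarize, save]  # the "chain": fixed order, fixed steps
result = "https://competitor.example"
for step in pipeline:
    result = step(result)
print(result)  # ['$9/mo']
```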

Skill

To solve the rigidity of Workflows, we have the concept of a Skill. A Skill is essentially a directory containing a prompt file (often skill.md or similar) and some scripts.

  • How it works: Instead of hard-coding the steps, you describe the capability in plain English in the text file. You tell the Agent: "Read this file to understand how to perform the task."
  • The Benefit: It bridges the gap between total freedom (unreliable) and rigid coding (inflexible). It allows the Agent to dynamically decide when to use the script based on the instructions.
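A sketch of a Skill on disk and how an agent might ingest it (the skill.md and run.py names follow the convention above; load_skill is a hypothetical helper):

```python
# A Skill as a directory: a plain-English instruction file plus a script.
# The agent reads the instructions to decide WHEN to run the script,
# rather than following a hard-coded chain.
import pathlib
import tempfile

skill_dir = pathlib.Path(tempfile.mkdtemp()) / "summarize-pricing"
skill_dir.mkdir()
(skill_dir / "skill.md").write_text(
    "Use this skill when the user asks about competitor pricing.\n"
    "Run run.py with the source URL as its argument.\n"
)
(skill_dir / "run.py").write_text("print('scraping...')\n")

def load_skill(path: pathlib.Path) -> dict:
    """What an agent would ingest: instructions plus available scripts."""
    return {
        "instructions": (path / "skill.md").read_text(),
        "scripts": sorted(p.name for p in path.glob("*.py")),
    }

skill = load_skill(skill_dir)
print(skill["scripts"])  # ['run.py']
```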

Sub-Agent

For massive tasks, one Agent gets confused. The context becomes too long and "noisy."

The Solution: Break the task down.

  • Master Agent: "I need to build a software app."
  • Sub-Agent A (Coder): Writes the code.
  • Sub-Agent B (Tester): Reviews the code.

A Sub-Agent is just a separate instance of an LLM with a specific prompt and a clean context history, dedicated to a niche task. It prevents "context pollution."

Multi-Agent System

A Sub-Agent handles one piece of work. But what happens when the entire system is designed around multiple agents working together from the start? That is a Multi-Agent System (MAS).

Instead of one brain trying to do everything, you architect a team:

  • Orchestrator: A central planner that breaks the overall goal into tasks and assigns them to specialized agents.
  • Specialized Agents: Each agent has its own LLM instance, system prompt, tool set, and memory—optimized for one role (research, coding, QA, writing, etc.).
  • Communication Protocol: Agents pass messages through a shared bus, structured handoffs, or direct function calls. The format of these messages is critical—without a protocol the system devolves into chaos.

The Key Insight: A Multi-Agent System is an architecture pattern. It answers the question: "How do I organize multiple AI workers to solve a problem together?" Think of it as building a company org chart for AI—with departments, roles, and reporting lines.

Real-world examples include Microsoft's AutoGen, CrewAI, and LangGraph—all frameworks that let you define agent teams, their roles, and how they collaborate.

Multi-Agent System — Specialized Agents Collaborating

[Diagram: an Orchestrator (task planner and coordinator) delegates to Agent A (Researcher: web search, RAG), Agent B (Coder: IDE, terminal), and Agent C (Reviewer/QA: linter, tests) over a shared message bus; agents pass results back to the Orchestrator.]
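The org-chart shape—plan, assign, collect over a shared bus—in miniature, with agent behavior mocked as plain functions:

```python
# Sketch of an orchestrator dispatching to specialized agents. The shared
# "message bus" is just a list; each agent is a mocked specialist.

AGENTS = {
    "researcher": lambda task: f"notes on {task}",
    "coder":      lambda task: f"code for {task}",
    "reviewer":   lambda task: f"review of {task}",
}

def orchestrate(goal: str) -> list[str]:
    bus: list[str] = []  # shared communication channel
    # Plan: break the goal into role-tagged tasks (trivially, here).
    plan = [("researcher", goal), ("coder", goal), ("reviewer", goal)]
    for role, task in plan:            # assign each task to a specialist
        bus.append(f"{role}: {AGENTS[role](task)}")
    return bus                          # results flow back to the orchestrator

print(orchestrate("login page"))
```

In a real MAS each entry in AGENTS would be its own LLM instance with its own system prompt, tools, and memory; the message format on the bus is where the protocol lives.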

Agentic AI

Now here is where confusion peaks. People use "Agentic AI" and "Multi-Agent System" interchangeably. They are not the same thing.

Agentic AI is a behavioral paradigm, not an architecture. It describes any AI system that exhibits autonomous, goal-directed behavior—the ability to independently perceive its environment, make plans, take actions, observe results, and adjust course without human hand-holding at every step.

The Core Distinction:

Multi-Agent System = "How many brains, and how are they organized?" (architecture)
Agentic AI = "Does the AI act autonomously toward a goal?" (behavior)

A single agent can be "Agentic" if it runs an autonomous loop. A Multi-Agent System is often Agentic, but not necessarily—you could have a rigid multi-agent pipeline with zero autonomy.

The defining feature of Agentic AI is the autonomous loop:

  1. Perceive: Observe the current state of the world (read files, check API responses, parse errors).
  2. Plan: Decide what to do next based on the goal and current state.
  3. Act: Execute the plan (write code, call tools, send messages).
  4. Reflect: Evaluate the result. Did it work? What went wrong? Adjust the plan.
  5. Repeat until the goal is satisfied or a termination condition is hit.

This loop is what makes AI "agentic." Without it, you just have a chatbot that responds to one prompt at a time. With it, you have a system that can tackle open-ended tasks like "debug this codebase" or "write a research paper"—iterating until the job is done.
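The five steps above in runnable miniature—the "environment" here is just a number to drive toward a target, so perceiving, planning, and acting are deliberately trivial:

```python
# Minimal agentic loop: perceive -> plan -> act -> reflect until the goal
# is satisfied or a termination condition (step budget) is hit.

def agentic_loop(target: int, max_steps: int = 20) -> tuple[int, int]:
    state, steps = 0, 0
    while steps < max_steps:           # termination condition
        gap = target - state           # Perceive: observe the current state
        if gap == 0:                   # Reflect: is the goal satisfied?
            break
        action = 1 if gap > 0 else -1  # Plan: choose the next move
        state += action                # Act: change the world
        steps += 1                     # Repeat
    return state, steps

print(agentic_loop(5))  # (5, 5)
```

A chatbot answers once and stops; this loop keeps observing and acting until the gap to the goal is closed.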

Why this matters practically: When someone says "we built an Agentic AI," they mean the system can reason, plan, and act in a loop. When someone says "we built a Multi-Agent System," they mean the system has multiple specialized AI workers. The best modern systems—like Devin, OpenAI's Operator, or Claude's computer use—are both: agentic behavior implemented through a multi-agent architecture.

Agentic AI vs. Multi-Agent System — Key Differences

Dimension      | Agentic AI                      | Multi-Agent System
---------------|---------------------------------|-------------------------------
Core Unit      | Single autonomous agent         | Multiple collaborating agents
Decision       | Self-directed planning loop     | Distributed / delegated
Architecture   | Perceive → Plan → Act → Reflect | Orchestrator → Agents → Bus
When to Use    | Open-ended, evolving goals      | Complex, parallelizable tasks
Risk           | Unpredictable actions           | Coordination overhead

The Agentic Loop — Perceive · Plan · Act · Reflect

[Diagram: Perceive → Plan → Act → Reflect — a continuous loop until the goal is met.]

A Unified Methodology: The Spectrum of Control

We can categorize all these terms onto a spectrum of Stability vs. Flexibility:

  1. Hard Code (LangChain / Code): Maximum Stability, Minimum Flexibility. You define every step. Good for repetitive, identical tasks.
  2. Workflow (Low-Code): High Stability, Low Flexibility. Easier to visualize, but still a rigid pipeline.
  3. Skill (Hybrid): Balanced. You provide the tools and a manual (prompt), but let the AI decide exactly how to execute the steps.
  4. Pure Agent (Autonomous): Minimum Stability, Maximum Flexibility. You give a goal and the Agent writes its own scripts and plans its own path.

The Spectrum of Control: Stability vs. Flexibility

Approach   | Stability | Flexibility
-----------|-----------|------------
Hard Code  | 95%       | 10%
Workflow   | 75%       | 30%
Skill      | 50%       | 55%
Pure Agent | 15%       | 95%

← More Predictable · More Autonomous →

The Future

Currently, we use Workflows and Skills because LLMs are expensive and prone to error, so we need to constrain them. However, as token costs drop toward zero and models become smarter, we will shift toward Pure Agents.

Just as software development moved from Assembly to Python/Spring Boot to maximize developer convenience (ignoring the massive increase in computing power required), AI interaction will move toward maximum user convenience.

We are heading toward a "Super Agent" future—where concepts like MCP, RAG, and Skills are hidden "implementation details." You won't configure a "Skill"; you will just speak, and the system will intuitively understand which tool to wield.


If this breakdown helped clarify the fog of AI terminology, please consider sharing this post.