Guide

What are AI agents? A complete technical guide.

ChatGPT was most people's introduction to AI. But for work, AI assistants are just the starting point. This guide explains how AI agents actually work — and why they're different.

The problem with AI assistants

ChatGPT is powerful — and also very limited for real work.

Three gaps stand between a general-purpose assistant and something you can trust with enterprise workflows.

No knowledge of your business

ChatGPT was trained on public data. It has never seen your contracts, your claims, your customers, or your internal docs.

Purely transactional

Every conversation starts from zero. No memory, no learning, no carry-over across workflows or teammates.

No connections to your systems

Assistants can't read your CRM, call your APIs, post to Slack, or take action on systems without a human copy-pasting.

The AI agent tech stack

Five layers turn an LLM into an agent.

Each layer plays a different role. Together, they let agents reason, retrieve, and act.

Read from the bottom up: LLMs are the wide foundation; planning and orchestration sit at the narrow top.

05

Planning

What it is

The reasoning loop that decides what the agent should do next — the Planner, plus reflection and re-planning logic.

How it can be used

Break ambiguous goals (“process this claim”) into concrete sub-tasks, route to the right tools, and self-correct when a step fails.

04

Plugins & Tools

What it is

Functions the agent can call — REST APIs, SQL queries, file writers, webhooks, browser automation — wrapped with schemas so the LLM knows how to use them.

How it can be used

Connect agents to your CRM, ERP, ticketing system, and internal services so they can actually do the work, not just talk about it.

03

Prompting

What it is

Structured instructions that give the model context, constraints, examples, and output formats. The layer where domain logic and brand voice live.

How it can be used

Enforce your SOPs, safety rules, tone, and output contracts so the agent behaves predictably across every run.

02

Vector Database

What it is

An index of embeddings (numerical representations of meaning). Lets the agent find semantically similar pieces of your data.

How it can be used

Power RAG across your knowledge base, past tickets, product docs, and legal corpus — so the agent reasons on your specific truth.

01

LLMs

What it is

The large language model (Claude, GPT-4, Gemini, Llama) that generates text, reasons, and drives the whole stack.

How it can be used

The engine. Everything above is about getting the right inputs to the LLM and routing its outputs to the right place.

A deeper dive into each layer

The core technologies and patterns behind modern AI agents — explained without hype.

LLMs

What they are: Transformers trained on trillions of tokens of text and code. They predict the next token given a context, and that seemingly simple objective lets them reason, write, translate, and generate code.

Tokenization: Text is split into tokens — sub-word units — before the model sees it. Token counts drive cost and latency, which matters a lot when you operate agents at scale.

Training pipeline: Pre-training on a huge corpus, then supervised fine-tuning (SFT), then reinforcement learning from human feedback (RLHF) or similar alignment techniques. Frontier models add tool-use training, long-context training, and code-specific data.

In agents: The LLM is the reasoning engine. Choice of model (Claude, GPT-4, Gemini, Llama) affects cost, latency, tool-use fidelity, and long-context behaviour — all of which matter for production reliability.
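To make the token-cost point concrete, here is a minimal sketch that counts tokens with the open-source tiktoken tokenizer and estimates what a single call costs. The per-1K-token price is a made-up figure for illustration, so substitute your provider's actual rate and tokenizer.

  import tiktoken

  PRICE_PER_1K_INPUT_TOKENS = 0.003  # illustrative placeholder, not a real provider rate

  def estimate_input_cost(prompt: str, encoding_name: str = "cl100k_base") -> tuple[int, float]:
      """Count tokens in a prompt and estimate the cost of sending it once."""
      enc = tiktoken.get_encoding(encoding_name)
      n_tokens = len(enc.encode(prompt))
      return n_tokens, n_tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS

  tokens, cost = estimate_input_cost("Summarise the attached claim and list any missing documents.")
  print(f"{tokens} tokens, roughly ${cost:.5f} of input cost per call")

The same arithmetic applied to retrieved context and conversation history is what makes context-window management a budgeting exercise at scale.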

Vector databases & embeddings

Embeddings: A way to turn text (or images, audio, code) into a fixed-length vector that captures semantic meaning. Similar concepts end up near each other in vector space.

Semantic similarity: “Cancel my subscription” and “I want to end my membership” produce nearly identical vectors, even though they share no keywords. Classic search misses this; vector search finds it.

Components: An embedding model (often from the same provider as your LLM), a vector index (Pinecone, Weaviate, pgvector, Chroma), and a retrieval layer that ranks and filters chunks before sending them to the LLM.

In agents: Vector DBs power RAG. Whenever the agent needs to ground a response in your data, it queries the index and feeds the top-k chunks into the prompt.
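As a small illustration of semantic similarity, the sketch below assumes the open-source sentence-transformers library and its all-MiniLM-L6-v2 model; in production you would more likely call your LLM provider's embedding endpoint, but the idea is identical.

  import numpy as np
  from sentence_transformers import SentenceTransformer

  model = SentenceTransformer("all-MiniLM-L6-v2")  # small open-source embedding model

  def cosine(a: np.ndarray, b: np.ndarray) -> float:
      return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

  vecs = model.encode([
      "Cancel my subscription",
      "I want to end my membership",
      "What time does the office open?",
  ])

  print(cosine(vecs[0], vecs[1]))  # high score: same intent, zero shared keywords
  print(cosine(vecs[0], vecs[2]))  # low score: unrelated intent

A vector database is essentially this comparison run efficiently over millions of stored vectors, with filtering and ranking on top.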

Prompting

A prompt is more than a question — it's a contract with the model. A production-grade prompt typically has:

  • Role & context: Who the agent is and what domain it operates in.
  • Rules & constraints: What it must do, must not do, and how to handle edge cases.
  • Examples: Few-shot demonstrations of good behaviour.
  • Tool schemas: Which functions are callable and what they expect.
  • Output contract: A strict JSON schema or structured format so the rest of the system can parse the response reliably.

In production, prompts are versioned, evaluated against test sets, and swapped behind a feature flag — just like any other piece of code.
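For illustration only, here is what those pieces can look like assembled into a single system prompt; the company, rules, example, and schema are placeholders rather than a recommended template.

  SYSTEM_PROMPT = """
  Role: You are a claims-intake agent for Acme Insurance.

  Rules:
  - Never approve or deny a claim; only triage and summarise it.
  - If a required document is missing, list it explicitly.
  - If the claim mentions an injury, flag it for human review.

  Example:
  Input: "Rear-ended at a red light, bumper damage, no injuries."
  Output: {"type": "auto", "severity": "low", "missing_docs": ["police_report"]}

  Output contract: respond only with JSON of the form
  {"type": string, "severity": "low" | "medium" | "high", "missing_docs": string[]}
  """

Everything the rest of the system depends on, especially the output contract, lives in text like this, which is why prompts deserve the same versioning and testing discipline as code.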

Retrieval-augmented generation (RAG)

What it is: A pattern where, before calling the LLM, the system retrieves relevant context from a vector DB (or keyword index) and injects it into the prompt. The LLM answers based on fresh, specific data — not just what it memorised during training.

Why it matters: RAG lets you use the LLM over your private corpus — claims, contracts, product docs, past tickets — without fine-tuning. It's the fastest path to grounded, citable answers.

Good RAG is an engineering problem: chunking strategy, embedding model, re-ranking, hybrid search, context-window management, and eval all matter. Most “bad AI” complaints at enterprises trace back to weak RAG.
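Here is a minimal RAG sketch using the open-source Chroma vector store: index two documents, retrieve the best matches for a question, and assemble a grounded prompt. The documents and policy wording are invented for illustration, and the final LLM call is left to whichever provider SDK you use.

  import chromadb

  client = chromadb.Client()
  docs = client.create_collection("policy_docs")
  docs.add(
      ids=["doc-1", "doc-2"],
      documents=[
          "Water damage claims require photos and a plumber's report.",
          "Windshield replacement is covered with zero deductible on comprehensive plans.",
      ],
  )

  question = "What do I need to submit for a burst-pipe claim?"
  hits = docs.query(query_texts=[question], n_results=2)
  context = "\n".join(hits["documents"][0])  # top-k chunks for the first (and only) query

  prompt = (
      "Answer using ONLY the context below and say which chunk you relied on.\n\n"
      f"Context:\n{context}\n\nQuestion: {question}"
  )
  print(prompt)  # send this to your LLM of choice

Chunking, re-ranking, and hybrid search slot in around this skeleton; the retrieval step is where most of the answer quality is decided.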

Plugins & Tools

An agent without tools is just a chatbot. Tool use is what lets agents do things — create tickets, send emails, run SQL queries, fetch live data, hit internal APIs.

How it works: Tools are defined with JSON schemas. The LLM sees the schemas, decides which tool to call and with what arguments, and emits a structured tool-call. The executor runs it and feeds the result back.

Production concerns: idempotency (agents retry), permissions (some tools should require human approval), rate limits, and error handling. Every tool is a surface area that can fail.
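The sketch below shows that loop under simplifying assumptions: one illustrative tool schema, a stub create_ticket function standing in for a real ticketing API, and a hard-coded tool call standing in for what the model would emit.

  import json

  TOOLS = [{
      "name": "create_ticket",
      "description": "Create a support ticket in the ticketing system.",
      "parameters": {
          "type": "object",
          "properties": {
              "title": {"type": "string"},
              "priority": {"type": "string", "enum": ["low", "medium", "high"]},
          },
          "required": ["title", "priority"],
      },
  }]

  def create_ticket(title: str, priority: str) -> dict:
      # In production this would hit your ticketing API with auth, retries,
      # and an idempotency key. Here it simply echoes the arguments.
      return {"status": "created", "title": title, "priority": priority}

  REGISTRY = {"create_ticket": create_ticket}

  # Pretend the LLM, having seen TOOLS, emitted this structured call:
  tool_call = {"name": "create_ticket",
               "arguments": {"title": "Refund not received", "priority": "high"}}

  result = REGISTRY[tool_call["name"]](**tool_call["arguments"])
  print(json.dumps(result))  # fed back to the model as the observation

Permission checks and human-approval gates sit between the model's tool call and the registry lookup.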

Planning & orchestration

For simple tasks, a single LLM call is enough. For real work, the agent needs to plan: break a goal into sub-steps, pick tools, evaluate results, and re-plan when something goes wrong.

Common patterns:

  • ReAct: Reason → Act → Observe, repeated until done.
  • Reflection: After each step, the agent critiques its own output and decides whether to retry, escalate, or continue.
  • Graph-based orchestration: Define the workflow as a state machine (via LangGraph or custom code) — the agent moves along deterministic edges between autonomous nodes.

This is the layer where agents earn their name. It's also where reliability is won or lost.
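As a sketch of the ReAct pattern, the loop below uses placeholder call_llm and run_tool functions in place of a real model call and tool executor; the step budget and human escalation mirror the reliability points above.

  def call_llm(goal: str, history: list[str]) -> dict:
      # Placeholder: a real implementation sends the goal, the history, and the
      # tool schemas to the model and parses its structured response.
      return {"action": "finish", "answer": "stub answer"}

  def run_tool(name: str, args: dict) -> str:
      return f"stub result from {name}"  # placeholder tool executor

  def react_loop(goal: str, max_steps: int = 8) -> str:
      history: list[str] = []
      for _ in range(max_steps):                 # hard budget on steps and retries
          decision = call_llm(goal, history)     # Reason: decide the next action
          if decision["action"] == "finish":
              return decision["answer"]
          observation = run_tool(decision["action"], decision.get("args", {}))  # Act
          history.append(f"{decision['action']} -> {observation}")              # Observe
          # Reflection happens on the next pass: the model sees the observation
          # and can retry, change tools, or re-plan.
      return "escalate_to_human"                 # stuck: hand off instead of looping forever

  print(react_loop("Process this claim"))

Graph-based orchestration replaces this free-form loop with explicit nodes and edges, trading some flexibility for predictability.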

Apply this to your business

Want to see how these technologies apply to your workflows?

Share your context in a quick assessment — we'll map the right stack and use cases for your team.

No spam. We reply within 1 business day.

From assistant to agent

The three things that turn an LLM into an agent.

Knowledge, skillset, and autonomy — each one compounds the value of the one before.

01

Knowledge

Agents are grounded in your private corpus using retrieval-augmented generation (RAG) and vector databases, so every answer is informed by your real context.

02

Skillset

Agents call tools — APIs, databases, webhooks, internal functions — the same way a team member would open a tab and do the work themselves.

03

Autonomy

Agents plan multi-step workflows, adapt when something fails, and decide which tools to use at each step without a human writing the script.

Anatomy of an agent

The four components of an autonomous agent.

Under the hood, every production agent is composed of these four moving parts working in a loop.

Planner

Breaks the goal into sub-tasks and decides what to do next. Uses the LLM to reason over context, constraints, and available tools.

Executor

Actually runs the tools — calls APIs, queries databases, updates records — and captures the results back into the agent's state.

Task Manager / Evaluator

Judges whether each step succeeded. On failure, it triggers a retry, a fallback, or escalation to a human.

Memory

Short-term (conversation state) and long-term (vector store of past runs, user preferences, learned shortcuts) memory the agent pulls from to stay consistent.
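The skeleton below shows how these four parts can be wired together; every class is an illustrative stub, and a real agent would back the planner with an LLM, the executor with real tool calls, and memory with a vector store.

  from dataclasses import dataclass, field

  @dataclass
  class Memory:
      short_term: list[str] = field(default_factory=list)  # conversation / run state

      def remember(self, event: str) -> None:
          self.short_term.append(event)

  class Planner:
      def next_step(self, goal: str, memory: Memory) -> str | None:
          return None if memory.short_term else "lookup_policy"  # stub decision

  class Executor:
      def run(self, step: str) -> str:
          return f"result of {step}"  # stub tool call

  class Evaluator:
      def succeeded(self, result: str) -> bool:
          return "error" not in result  # stub success check

  def run_agent(goal: str) -> None:
      memory, planner, executor, evaluator = Memory(), Planner(), Executor(), Evaluator()
      while (step := planner.next_step(goal, memory)) is not None:
          result = executor.run(step)
          if evaluator.succeeded(result):
              memory.remember(f"{step}: {result}")
          else:
              memory.remember(f"{step} failed")  # trigger a retry, fallback, or human hand-off
              break

  run_agent("Process this claim")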

Agentic workflows

How agents handle ambiguous tasks.

Rigid workflows break on unexpected inputs. Agentic workflows adapt — they plan, act, evaluate, and loop back.

1. Plan: Break the goal into steps and pick which tool to use first.
2. Act: Call the chosen tool (API, DB, search, etc.) with structured arguments.
3. Observe: Read the result, parse it, and update internal state.
4. Reflect: Did it work? If not, re-plan. If yes, continue to the next step.

↺ Loop until the goal is reached, with a budget on retries and escalation to a human when stuck.

Ready to build your first AI agent?

Get a free, personalized plan for where AI agents fit in your business — within 1 business day.
