AI Agent Platform

Internal • April 2026
Assisted • Sonnet • Haiku • pgvector
01 — Overview
Multi-Agent GTM Platform
A specialized AI agent platform powering Spekit's Go-To-Market. 7 core agents + 7 sub-agents orchestrated through Atlas, backed by semantic memory, a durable job queue, and human governance.
7
Core Agents
7
Sub-Agents
12+
PostgreSQL Tables
3
LLM Backends
4
Graduation Stages
1024d
pgvector Embeddings
Slack
Socket Mode Bot
Open WebUI
Chat Interface
OpenAI-compatible API — /v1/chat/completions
API BRIDGE — FastAPI
Authentication • Routing • Streaming • Feedback • Health
Each agent exposed as a selectable "model"
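Because each agent is just a "model" behind a standard chat-completions endpoint, calling one is a matter of setting the model field. A minimal sketch of the request payload, assuming the usual OpenAI chat-completions shape (the endpoint path and agent names come from this document; the exact fields the bridge accepts beyond these are an assumption):

```python
import json

def build_agent_request(agent: str, message: str, stream: bool = False) -> dict:
    """Build an OpenAI-compatible chat-completions payload.

    Each agent is exposed as a selectable "model", so routing to a
    specific agent only requires setting the model field.
    """
    return {
        "model": agent,  # e.g. "atlas", "ellegentic", "oracle"
        "messages": [{"role": "user", "content": message}],
        "stream": stream,
    }

payload = build_agent_request("ellegentic", "Draft a LinkedIn post about sales enablement.")
body = json.dumps(payload)  # POST this to /v1/chat/completions with a bearer token
```

Open WebUI builds the same payload internally, which is why it connects with no custom integration.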
ATLAS
Router • Haiku • Intent Analysis
Routes to the specialized agent based on intent
ELLEGENTIC
Content
INTELLIGENCE
Competitive
PROSPECTOR
Leads
CLOSER
Sales
HERALD
Distribution
ORACLE
Knowledge
PostgreSQL
pgvector • Jobs • Events • Memory
Content Brain
Semantic Search • 1024d Vectors
External APIs
fal.ai • VoyageAI • Remotion
02 — Routing
Atlas — The System's Brain
Atlas is the single entry point. It analyzes intent and decides: respond directly, search the Content Brain, or route to a specialized agent.
User Message
ATLAS analyzes intent
Claude Haiku • max 1024 tokens • <500ms
3 possible paths
Direct Response
Lightweight questions, greetings
Responds without routing
Content Brain
Semantic search first
Searches existing knowledge
Route to Agent
New content, heavy work
Forwards brief + context
Context forwarded to routed agents
audience • tone • keywords • goal • platform • desired_length • cta • user_id • user_profile • relevant_memories • conversation_history
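The forwarded brief can be pictured as a plain dictionary. The keys mirror the list above; every value here is an invented example:

```python
# Illustrative shape of the brief Atlas forwards to a routed agent.
# Keys come from the context list above; all values are made up.
routed_context = {
    "audience": "VP Sales at mid-market SaaS",
    "tone": "professional",
    "keywords": ["sales enablement", "AI"],
    "goal": "generate leads",
    "platform": "linkedin",
    "desired_length": "short",
    "cta": "Book a demo",
    "user_id": "U123",
    "user_profile": {"role": "marketer", "team": "GTM"},
    "relevant_memories": [],
    "conversation_history": [],
}
```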
03 — Core Agents
The 7 Specialized Agents
Each agent is a domain expert. Tools and configuration are listed on each card below.
🧭
ATLAS — Router
Haiku • Assisted • Phase 1
Single entry point. Analyzes intent, searches Content Brain, routes to agents.
Tools
route_to_agent • kill_switch • dm_oussama • schedule_task • schema_apply • ops_queue_inspector • ops_audit_viewer • ops_approval_viewer • ops_metrics_viewer • web_search • fetch_url
Targets: ellegentic, intelligence, prospector, herald, oracle, closer
Max tokens: 1024 • Internal router only
🎨
ELLEGENTIC — Content
Sonnet • Assisted • Phase 1
Master content orchestrator. Supervises 7 sub-agents for full content pipeline.
Tools
dispatch_to_copywriter • dispatch_to_videographer • dispatch_to_narrator • dispatch_to_reporter • dispatch_to_formatter • dispatch_to_visualist • dispatch_to_seo_engineer • search_content_brain • check_brand_score • get_style_rules • web_search
Max tokens: 2048 • Quality gates: Brand score ≥80
🔍
INTELLIGENCE — Competitive
Sonnet • Assisted • Phase 1
Competitor monitoring, battle cards, market signals. Tracks Seismic, Highspot, Mindtickle, Guru, Showpad, Allego, Gong.
Tools
monitor_competitor • generate_battle_card • search_market_signals • search_knowledge • store_finding • web_search • fetch_url
Max tokens: 4096
📈
PROSPECTOR — Lead Gen
Sonnet • Assisted • Phase 1
Prospect identification, enrichment, and ICP qualification. Mid-market B2B SaaS, 100-2000 employees, sales-led.
Tools
search_prospects • enrich_lead • find_contacts • qualify_lead • search_knowledge • store_finding • web_search
Max tokens: 4096 • ICP signals: Hiring, Series B/C, Competitor usage
🤝
CLOSER — Sales Support
Sonnet • Assisted • Phase 2
Pre-call deal support: account prep, objection handling with proof points, content recommendations.
Tools
prep_account • handle_objection • recommend_content • search_knowledge • web_search • fetch_url
Max tokens: 4096
📢
HERALD — Distribution
Sonnet • Assisted • Phase 2
Multi-platform publishing: Webflow, LinkedIn, Twitter, Email, Reddit. Human approval required.
Tools
prepare_publication • validate_assets • search_knowledge • web_search • fetch_url
Max tokens: 4096 • Preview → Validate → Approve → Publish
📚
ORACLE — Knowledge
Sonnet • Assisted • Phase 2
Knowledge manager. Answers questions about products, processes, competitive intel, customer stories.
Tools
search_knowledge • search_semantic_memory • web_search • fetch_url
Max tokens: 4096 • Sources: Content Brain, Semantic Memory
04 — Sub-Agents
ELLEGENTIC's 7 Workers
Each sub-agent has its own system prompt, tools, and quality gates.
✍️
Copywriter
Sonnet
Blog posts, emails, social copy. Spekit Style Guide.
Brand score ≥80 • 2048 tokens
🎬
Videographer
Sonnet
Video scene definitions (JSON) → Remotion → MP4.
Scenes: text_overlay, stats, quote, outro • 4096 tokens
🎙️
Narrator
Haiku
TTS voiceover via Qwen3-TTS MLX (Apple Silicon).
Backends: MLX, OpenAI, ElevenLabs, Polly • 512 tokens
📊
Reporter
Sonnet
Branded PDF reports with data and charts.
PDF validation, brand compliance • 4096 tokens
🔄
Formatter
Sonnet
Repurposing: blog → LinkedIn, Twitter, Email, Reddit.
Platform-specific validation • 2048 tokens
🖼️
Visualist
Sonnet
Image/video/audio via fal.ai (Flux Pro, Kling, Minimax).
fal.ai API • Prompt-only fallback • 2048 tokens
🌐
SEO Engineer
Sonnet
SEO/AEO: meta tags, slugs, FAQs, internal linking.
SEO score ≥70 • 2048 tokens
05 — How to Use
Agent Usage Guides
Step-by-step instructions for each agent. Access via Open WebUI (model dropdown) or Slack (message Atlas).
Using ELLEGENTIC — Content Creation
1
In Open WebUI, select ellegentic from the model dropdown, or message Atlas in Slack.
2
Describe what you need. Specify:
Content type: blog, email, social, video, report
Audience: e.g., "VP Sales at mid-market SaaS"
Tone: professional, conversational, authoritative
Goal: generate leads, educate, nurture, convert
3
ELLEGENTIC auto-dispatches to sub-agents. Blog: Copywriter → SEO Engineer → Formatter for social variants.
4
Review output. If approval required, it enters the queue with a Slack notification.
Expected Outputs
  • Full blog post (1500-3000 words) with SEO meta tags and slug
  • Social variants: 3-5 LinkedIn posts, Twitter thread, email section
  • Brand score ≥80 with style compliance report
  • SEO score ≥70 with keyword analysis
  • Optional: hero image, voiceover, video
Example: "Write a blog post about how AI is transforming sales enablement for mid-market SaaS. Target VP Sales. Professional but approachable. Include 3 LinkedIn posts and a Twitter thread."
Using INTELLIGENCE — Competitive Research
1
Select intelligence or ask Atlas: "Generate a battle card for Seismic."
2
Searches Content Brain for existing intel, then live web search for fresh data.
3
Results stored via store_finding for cross-agent access.
Expected Outputs
  • Battle card: strengths/weaknesses, pricing, win themes
  • Market signal report with competitor moves and trends
  • Competitive positioning with Spekit differentiation
Using PROSPECTOR — Lead Generation
1
Select prospector or ask: "Find 10 companies matching our ICP."
2
Searches for ICP matches: mid-market B2B SaaS, 100-2000 employees. Pain signals: rapid hiring, CRM switch, competitor usage.
3
Use enrich_lead, find_contacts, qualify_lead for deep-dive.
Expected Outputs
  • Qualified prospect list with ICP fit scores
  • Company profiles: size, funding, tech stack, pain signals
  • Key contacts with titles and LinkedIn
Using CLOSER — Sales Support
1
Select closer or ask: "Prep me for a call with Acme Corp."
2
Combines Content Brain + web research for pre-call briefs.
3
For objections, pulls proof points from case studies and intel.
Expected Outputs
  • Pre-call brief with background and recent news
  • Objection responses with customer evidence
  • Content recommendations by deal stage
Using HERALD — Distribution
1
Herald receives approved content from ELLEGENTIC, or select herald directly.
2
Validates against platform requirements and prepares previews.
3
Human approval required. Review in Slack before publication.
Expected Outputs
  • Platform-specific previews (Webflow, LinkedIn, Twitter, Email)
  • Asset validation report
  • Publication schedule
Using ORACLE — Knowledge Q&A
1
Select oracle or ask: "What's our enterprise pricing?"
2
Searches Content Brain and Semantic Memory, ranked by similarity + freshness.
3
Falls back to web search if internal knowledge is insufficient.
Expected Outputs
  • Sourced answers with document references
  • Product info, customer stories, process docs
06 — Memory
Multi-Layer Memory System
4 complementary layers: Content Brain, Semantic Memory, Agent Memory, Feedback Learning.
Content Brain
content_chunks • pgvector • ivfflat cosine
Documents → ~500-word chunks → 1024d embeddings → cosine similarity. 85% similarity + 15% freshness.
id • source_document • category • chunk_text • embedding (1024d) • metadata
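The 85/15 blend above can be sketched as a score function: 0.85 × cosine similarity + 0.15 × freshness. The weights come from this document; the exponential half-life decay used for freshness is an illustrative assumption, not the platform's documented formula.

```python
import math

def rank_score(cosine_sim: float, age_days: float, half_life_days: float = 30.0) -> float:
    """Blend semantic similarity with recency: 85% similarity, 15% freshness.

    Freshness decays exponentially: 1.0 for a brand-new chunk, 0.5 at the
    half-life. The decay curve is an assumption for illustration.
    """
    freshness = math.exp(-math.log(2) * age_days / half_life_days)
    return 0.85 * cosine_sim + 0.15 * freshness

# A fresh, slightly-less-similar chunk can outrank a stale one.
fresh = rank_score(0.80, age_days=1)
stale = rank_score(0.83, age_days=365)
assert fresh > stale
```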
Semantic Memory
semantic_memory • pgvector • ivfflat cosine
Per-agent + shared scope. Agent isolation with NULL scope for cross-agent access. Importance scoring (0-1).
id • agent_name • content • embedding (1024d) • importance_score • access_count
Agent Memory
agent_memory • JSONB key-value • TTL-based
Short-term (24h) and long-term (permanent). recall(key), recall_with_user(key, user_id), recall_recent(n).
id • agent_name • key • value (JSONB) • memory_type • expires_at
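The short-term/long-term split can be sketched with an in-memory stand-in for the agent_memory table. recall(key) is named above; the store method name and the lazy-eviction behavior are assumptions for illustration:

```python
import time

class AgentMemory:
    """In-memory sketch of agent_memory semantics: short-term entries
    expire after 24h, long-term entries are permanent."""

    def __init__(self):
        self._store = {}  # key -> (value, expires_at or None)

    def store(self, key, value, memory_type="short_term"):
        # 24h TTL for short-term; None marks a permanent long-term entry.
        expires_at = time.time() + 24 * 3600 if memory_type == "short_term" else None
        self._store[key] = (value, expires_at)

    def recall(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.time() > expires_at:
            del self._store[key]  # lazily evict expired short-term entries
            return None
        return value

mem = AgentMemory()
mem.store("last_brief", {"goal": "educate"}, memory_type="long_term")
assert mem.recall("last_brief") == {"goal": "educate"}
assert mem.recall("missing") is None
```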
Feedback & Learning
feedback_log + feedback_patterns • LLM consolidation
User corrections → nightly consolidation → extracted patterns. Improves future outputs with confidence scoring.
original_output • correction • embedding • pattern_type • confidence
Memory Flow
Document Upload
PDF, Docs, Web
Chunking
~500 words
Embedding
VoyageAI 1024d
pgvector
IVFFlat cosine
Agent Query
Cosine Similarity
85% + 15% freshness
Ranked Results
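The chunking step in the flow above can be sketched as a plain word splitter. Real pipelines typically respect sentence boundaries and add overlap between chunks; this minimal version splits on whitespace only:

```python
def chunk_words(text: str, chunk_size: int = 500) -> list[str]:
    """Split a document into ~chunk_size-word chunks for embedding.

    A simplified sketch: no sentence awareness, no overlap.
    """
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

doc = "word " * 1200
chunks = chunk_words(doc)
assert len(chunks) == 3                  # 500 + 500 + 200 words
assert len(chunks[0].split()) == 500
assert len(chunks[-1].split()) == 200
```

Each chunk would then be embedded (VoyageAI, 1024d) and inserted into content_chunks.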
07 — Job Queue
Durable PostgreSQL Job Queue
Exponential retry, heartbeat, dead-letter handling. No Celery or Temporal.
PENDING
Waiting for claim
CLAIMED
Worker has lease
COMPLETED
Success
FAILED
Retry or dead-letter
DEAD LETTER
Max retries exhausted
Retry Policy
Base delay: 5s • Max delay: 300s • Multiplier: 2.0x exponential • Lease: 900s (15 min) • Heartbeat: 60s • Poll: 2s
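The policy above works out to delays of 5s, 10s, 20s, ... capped at 300s. A one-line sketch (whether jitter is added on top is not stated in this document):

```python
def retry_delay(attempt: int, base: float = 5.0,
                multiplier: float = 2.0, max_delay: float = 300.0) -> float:
    """Exponential backoff per the policy above: 5s base, 2.0x, 300s cap."""
    return min(base * multiplier ** attempt, max_delay)

# Attempts 0..6 produce: 5, 10, 20, 40, 80, 160, then capped at 300.
assert [retry_delay(a) for a in range(7)] == [5.0, 10.0, 20.0, 40.0, 80.0, 160.0, 300.0]
```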
Job Types
publication_pipeline: Post-approval prep
content_generation: Sub-agent dispatch
competitive_report: Intelligence reports
lead_enrichment: Prospector enrichment
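A common way to get claim semantics on plain PostgreSQL (no Celery, no Temporal) is FOR UPDATE SKIP LOCKED, which lets many workers poll concurrently without blocking on each other's row locks. A sketch of the claim query; table and column names come from the schema section, while the lease_expires_at column and exact SQL are assumptions:

```python
# Hypothetical claim query for the agent_jobs table. The 900s lease
# matches the retry policy above; lease_expires_at is an assumed column.
CLAIM_JOB_SQL = """
UPDATE agent_jobs
SET status = 'claimed',
    lease_expires_at = now() + interval '900 seconds'
WHERE id = (
    SELECT id FROM agent_jobs
    WHERE status = 'pending'
    ORDER BY priority DESC, id
    FOR UPDATE SKIP LOCKED
    LIMIT 1
)
RETURNING id, job_type, agent_name;
"""
```

Workers run this on every poll tick (2s); a job whose lease expires without a heartbeat goes back to pending or, after max retries, to the dead-letter state.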
08 — Graduation
Progressive Deployment Stages
4 autonomy levels. All agents currently in ASSISTED mode.
SHADOW
Observe only. No real actions.
ASSISTED ✓
Produces artifacts. Human approval required.
Current stage
SUPERVISED
Acts in scope. Humans spot-check.
AUTONOMOUS
Full automation within guardrails.
09 — Approval
Governance & Approval Workflow
Brand scoring, per-content-type rules, Slack-based human review.
Agent generates content
Brand Score
≥80
SEO Score
≥70
Auto-Approve
seo_technical, social_media
Approval Queue
blog, email, ceo_content
Slack Review
Approved
Publication Pipeline
Herald prepares distribution
Content Type | Auto-Approve | Min Score | Approvers
seo_technical | Yes | 80 | (none)
social_media | Yes | 85 | (none)
blog_article | No | 80 | elle
email_campaign | No | 80 | elle
comparison_page | No | 80 | elle + ian
ceo_content | No | 80 | elle + mel
video_content | No | 80 | elle
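The table above is effectively a routing function. A sketch, with the rules transcribed from the table; the "regenerate" branch for below-threshold scores is an assumption based on the scoring flow described elsewhere in this document:

```python
# Per-content-type approval rules, transcribed from the table above.
APPROVAL_RULES = {
    "seo_technical":   {"auto": True,  "min_score": 80, "approvers": []},
    "social_media":    {"auto": True,  "min_score": 85, "approvers": []},
    "blog_article":    {"auto": False, "min_score": 80, "approvers": ["elle"]},
    "email_campaign":  {"auto": False, "min_score": 80, "approvers": ["elle"]},
    "comparison_page": {"auto": False, "min_score": 80, "approvers": ["elle", "ian"]},
    "ceo_content":     {"auto": False, "min_score": 80, "approvers": ["elle", "mel"]},
    "video_content":   {"auto": False, "min_score": 80, "approvers": ["elle"]},
}

def route_content(content_type: str, brand_score: int) -> str:
    """Decide whether content auto-approves or enters the human queue."""
    rule = APPROVAL_RULES[content_type]
    if brand_score < rule["min_score"]:
        return "regenerate"          # below threshold: back to the agent
    return "auto_approve" if rule["auto"] else "approval_queue"

assert route_content("seo_technical", 92) == "auto_approve"
assert route_content("social_media", 82) == "regenerate"    # min is 85 here
assert route_content("blog_article", 90) == "approval_queue"
```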
10 — Database
PostgreSQL + pgvector Schema
Table | Function | Key Fields | Index
content_chunks | Content Brain | source_document, chunk_text, embedding(1024d) | ivfflat (20)
semantic_memory | Agent vector memory | agent_name, content, embedding(1024d) | ivfflat (20)
agent_memory | KV store + TTL | agent_name, key, value(JSONB), expires_at | idx_lookup
agent_events | Event bus | source_agent, target_agent, event_type, payload | (none)
agent_jobs | Job queue | job_type, agent_name, status, priority | idx_pending
approval_queue | Review queue | content_type, brand_score, seo_score | idx_status
feedback_log | Corrections | original_output, correction, embedding | (none)
feedback_patterns | Patterns | pattern_type, corrected_pattern, confidence | ivfflat (10)
agent_actions | Audit trail | action_type, tokens_used, duration_ms | (none)
user_profiles | User data | user_id, role, team, preferences(JSONB) | (none)
scheduled_tasks | Cron tasks | schedule(cron), next_run, enabled | (none)
release_signoffs | Beta promotion | signoff_status, approved_by | (none)
11 — Integrations
Integration Ecosystem
💬
Slack
Socket Mode bot. Approval workflow, review digests, auto-ingestion.
Approvers: Oussama, Elle, Ian, Mel
Features: Auto-ingestion, review digest, kill_switch
🖼
fal.ai
Media: Flux Pro (images), Kling (video), Minimax (audio).
Agent: Visualist • Fallback: prompt-only mode
🎬
Remotion
React video engine → MP4. Agent: Videographer.
Location: /video_engine/spekit-video/
Output: MP4 + metadata JSON
🎙
TTS
Qwen3-TTS MLX (local), OpenAI, ElevenLabs, Polly.
Primary: MLX on Apple Silicon • Agent: Narrator
🌊
VoyageAI
1024d embeddings. Fallback: Ollama qwen3-embedding:0.6b.
Config: VOYAGE_API_KEY, EMBEDDING_BACKEND
🌐
Planned
Webflow, Ayrshare, SerpAPI, HubSpot.
CMS publishing, social scheduling, CRM integration
12 — LLM Models
Multi-Backend LLM Strategy
Anthropic API
Development (active)
Direct SDK. ANTHROPIC_API_KEY
AWS Bedrock
Production (planned)
IAM auth, VPC. AWS_ACCESS_KEY_ID
Ollama Local
Fallback (free)
Qwen3 localhost:11434. OLLAMA_BASE_URL
Tier | Model | Usage | Agents
Tier 1 | Claude Opus | Deep reasoning (rare) | Reserved
Tier 2 | Claude Sonnet | Content generation | ELLEGENTIC, Intelligence, Prospector, Herald, Oracle, Closer
Tier 3 | Claude Haiku | Fast routing | Atlas, Narrator
Local | Qwen3 | Free dev | Fallback
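The tier table reduces to a small lookup. A sketch; the model strings are illustrative aliases, not exact API model IDs, and the backend-switching behavior is an assumption based on the fallback description above:

```python
# Agent-to-tier map, from the table above. Model names are aliases.
MODEL_TIERS = {
    "atlas": "claude-haiku",
    "narrator": "claude-haiku",
    "ellegentic": "claude-sonnet",
    "intelligence": "claude-sonnet",
    "prospector": "claude-sonnet",
    "herald": "claude-sonnet",
    "oracle": "claude-sonnet",
    "closer": "claude-sonnet",
}

def model_for(agent: str, backend: str = "anthropic") -> str:
    """Pick a model for an agent; fall back to local Qwen3 via Ollama."""
    if backend == "ollama":
        return "qwen3"               # free local fallback
    return MODEL_TIERS.get(agent, "claude-sonnet")

assert model_for("atlas") == "claude-haiku"
assert model_for("oracle", backend="ollama") == "qwen3"
```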
13 — Events
Inter-Agent Event Bus
Async communication via PostgreSQL. No RabbitMQ or Kafka.
Event Types
chain
Sequential handoff
request
Targeted request
broadcast
Fan-out
Example: Content Pipeline
ELLEGENTIC → emit("content_ready") → publication_pipeline job queued → HERALD distributes
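On the wire this is just a row on the agent_events table. Column names come from the schema section; the payload contents are illustrative:

```python
# Shape of an agent_events row for the pipeline above.
event = {
    "source_agent": "ellegentic",
    "target_agent": "herald",
    "event_type": "chain",           # sequential handoff
    "payload": {
        "event": "content_ready",
        "job_type": "publication_pipeline",
        "content_id": 42,            # made-up example id
    },
}

assert event["event_type"] in {"chain", "request", "broadcast"}
```

Because the bus is a PostgreSQL table, the emit is transactional with the content write: either both commit or neither does.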
14 — Architecture Decision
Why Custom — Not a Framework
We evaluated 7 frameworks and chose to build custom. Here's why.
Frameworks Evaluated & Rejected
Framework | Why Rejected
LangChain | Generic abstraction, massive overhead, no content pipeline or brand checking
CrewAI | Too opinionated, no governance/approval workflow, no brand enforcement
OpenClaw | Critical security risk. 512 vulnerabilities identified (8 critical). 135K+ instances exposed to the internet. ClawJacked attack enables full remote takeover via a single link. 1-in-5 ClawHub plugins contain malware. Anthropic banned flat-rate subscriptions (Apr 2026). Meta banned it from corporate machines. A single instance can consume $1K–$5K/day in API costs. Not suitable for enterprise with company data.
NemoClaw (NVIDIA) | Alpha software, not production-ready. NVIDIA's security layer on top of OpenClaw, announced at GTC March 2026. Improves containment (sandbox) but does NOT address behavioral governance: it can't verify whether agent actions are correct or aligned with business goals. 391 open issues on GitHub. APIs subject to change without notice. Optimized for NVIDIA hardware (vendor lock-in). No content pipeline, no brand checking, no approval workflows.
Hermes | Not mature for production. Limited multi-agent support, no sub-agent hierarchy. No content pipeline or brand enforcement. Small community, limited documentation. Not designed for the specialized agent orchestration (routing + delegation + governance) our use case requires.
Temporal | Complex self-hosting (Cassandra + Elasticsearch), $200+/month, overkill. Good for durable execution but doesn't solve content generation, brand voice, or multi-agent routing.
Inngest | Good for event-driven workflows, but no content pipeline, no brand voice, no multi-agent hierarchy or governance layer.
n8n (as brain) | We use n8n for GTM workflow orchestration (Clay, Apollo, Instantly), but agents do the reasoning. n8n = arms, agents = brain. Clean separation of concerns.
What Custom Gives Us (That No Framework Has)
Content Pipeline
Research → Create → Brand Check → SEO Check → Approval → Publish. No framework has this end-to-end.
Graduation Stages
SHADOW → ASSISTED → SUPERVISED → AUTONOMOUS. Agents earn autonomy over time. Unique to our system.
Content Brain
75K chunks of shared semantic memory via pgvector. All agents search and write to the same knowledge base.
Brand Voice Enforcement
AI-stink detection (score 0–100), SEO evaluator, output contracts. Zero-tolerance for generic AI content.
Multi-LLM Failover
Claude API → Ollama → Bedrock. Circuit breaker auto-switches. No provider lock-in.
n8n = GTM Orchestration
n8n handles Clay, Apollo, Apify, Instantly workflows. Agents handle reasoning. Clean separation.
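The brand-voice card above describes a programmatic 0–100 "AI-stink" score. A sketch of the idea (regex penalties, no LLM call); the actual phrase list and penalty weights used in production are not documented here and are invented for illustration:

```python
import re

# Illustrative sketch of programmatic brand scoring: start at 100 and
# subtract penalties for "AI-stink" phrases. Phrases and weights are
# made-up examples, not the production rule set.
AI_STINK_PATTERNS = [
    (r"\bdelve\b", 10),
    (r"\bin today's fast-paced world\b", 15),
    (r"\bunlock the power of\b", 15),
    (r"\bgame.?changer\b", 10),
]

def brand_score(text: str) -> int:
    score = 100
    for pattern, penalty in AI_STINK_PATTERNS:
        score -= penalty * len(re.findall(pattern, text, flags=re.IGNORECASE))
    return max(score, 0)

assert brand_score("Spekit helps reps find answers in the flow of work.") == 100
assert brand_score("In today's fast-paced world, unlock the power of AI.") == 70
```

Because this is regex rather than an LLM call, checking costs nothing; tokens are only spent when a score below 80 forces a regeneration.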
Why NOT OpenClaw — Security Deep Dive
512
Vulnerabilities identified (8 critical)
135K+
Instances exposed to the internet
1 in 5
ClawHub plugins contain malware
$5K/day
Potential API cost per instance
Key incidents: ClawJacked attack — full remote takeover via a single malicious link. Supply chain compromise — npm package silently installed OpenClaw on dev machines. Anthropic banned flat-rate subscriptions (April 4, 2026). Meta banned it from corporate machines. China restricted it from state agencies.
NemoClaw (NVIDIA’s fix): Adds sandbox containment but still alpha. Does NOT solve behavioral governance (correctness of actions). 391 open issues. Not production-ready.
Our approach: Zero external agent frameworks. No MCP server exposed. No plugin marketplace. All tools are hardcoded and audited. Bearer token auth, CORS whitelist, rate limiting, SSRF protection. Human approval on every output (ASSISTED stage).
Why Open WebUI as Interface
Familiar ChatGPT-like UI
Every team member already knows how to use a chat interface. Zero training needed. Select an agent from the dropdown, type, get results.
OpenAI-Compatible API
Our API Bridge exposes all 13 agents as standard /v1/chat/completions models. Open WebUI connects natively — no custom integration needed.
Self-Hosted & Private
Runs in our Docker stack. No data leaves our infrastructure. No third-party SaaS dependency for the UI. Full control over user accounts and permissions.
Multi-Agent Selection
Users pick the right agent for the job: Atlas for general routing, ELLEGENTIC for content, Intelligence for competitor research, Prospector for leads, Closer for deals.
15 — Cost Efficiency
Token Cost Optimization
Our architecture is designed to minimize LLM token usage at every layer.
How We Save Tokens
Optimization | How It Works | Savings
Tiered Model Selection | Haiku ($0.25/MTok) for routing, Sonnet ($3/MTok) for content, Opus only for complex reasoning. Most calls use the cheapest model. | ~70%
Single-Router Architecture | Atlas (Haiku) routes to 1 specialist agent. No multi-agent chain. LangChain/CrewAI create 3–5 LLM calls per request — we do 2. | ~60%
Content Brain Pre-Fill | 75K chunks of Spekit data injected as context. Agents don't need to "learn" from scratch — they already know the product, competitors, and customers. | ~40%
Brand Check = No Regen | Programmatic brand scoring (regex, not LLM). Catches AI-stink without burning tokens. Only regenerates if score < 80. | ~30%
Refinement Skip Research | When the user says "make it shorter", we reuse previous research context. No web search, no Content Brain re-query. | ~50%
Tool Loop Cap | Max 10 iterations, 40-message cap, 3-error abort. Prevents runaway token consumption from stuck agents. | Safety
Sub-Agent Delegation | ELLEGENTIC dispatches to specialized sub-agents with focused prompts. Each sub-agent has a narrow scope → shorter prompts → fewer tokens. | ~25%
Cost Comparison: Custom vs Framework
Our Architecture
~2
LLM calls per user request
Atlas (Haiku) routes → 1 Agent (Sonnet) generates
Typical Framework (LangChain/CrewAI)
5–8
LLM calls per user request
Chain of prompts, memory summarization, tool planning, execution, reflection
Estimated monthly cost for 1000 requests/day:
~$150
Our architecture
~$500–800
LangChain/CrewAI
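The 2-call vs 5–8-call difference can be checked with a back-of-envelope cost model. Per-MTok prices come from the optimization table above (Haiku $0.25, Sonnet $3); the per-request token counts below are assumptions for illustration, not measured figures:

```python
# Back-of-envelope LLM cost model. Prices per MTok from the table above;
# token counts per call are illustrative assumptions.
def monthly_cost(requests_per_day: int, calls: list[tuple[float, int]],
                 days: int = 30) -> float:
    """calls = [(price_per_mtok, tokens_per_call), ...] for one request."""
    per_request = sum(price * tokens / 1_000_000 for price, tokens in calls)
    return requests_per_day * days * per_request

# Our path: one Haiku routing call + one Sonnet generation call.
ours = monthly_cost(1000, [(0.25, 1024), (3.0, 3000)])
# Framework path: ~6 Sonnet calls (planning, memory, tools, reflection...).
framework = monthly_cost(1000, [(3.0, 3000)] * 6)
assert framework > 3 * ours   # the call chain multiplies per-request cost
```

Absolute dollar figures depend heavily on the assumed token counts; the structural point is that per-request cost scales with the number of LLM calls in the chain.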
Spekit
Architecture Summary
Custom multi-agent system. No external frameworks. Full control over agent lifecycle, job orchestration, and human governance.
0
External Frameworks
13
Total Agents
98
Total Tools
~$150/mo
Est. LLM Cost