CaféMatic Case: On-premise Multichannel Automation for SMBs
How CaféMatic implemented a self-hosted LLM, AI agents and a multichannel orchestrator (WhatsApp, Telegram, web) to improve support, security and efficiency.
Introduction
This article presents a practical case: how the SMB CaféMatic, a coffee roaster and subscription service for artisanal coffees with 45 employees, designed an automation solution built on an internal forum, AI agents and a self-hosted LLM, with a multichannel orchestrator integrating Telegram and WhatsApp. The focus is not theory but the architecture, workflows and concrete metrics that improved customer support, data security and operational efficiency.
Context and challenges for the SMB
CaféMatic faced these problems:
- High volume of repetitive inquiries (availability, subscription changes, delays) via WhatsApp.
- Dispersed knowledge: recipes, roasting processes and frequent responses scattered across documents and chats.
- Privacy and compliance constraints: sensitive subscriber data.
- Lack of traceability in returns and logistics processes.
Objective: automate workflows without relying on cloud services for sensitive data, while maintaining multichannel presence (WhatsApp, Telegram, webchat).
Solution design
Overall architecture
- Self-hosted LLM (model optimized for conversation and RAG) running on CaféMatic's on-premise servers.
- Local vector DB indexing the internal forum content, process manuals and FAQs.
- Multichannel orchestrator that routes incoming messages (WhatsApp Business API, Telegram Bot API, webhooks) and decides which specific agents to use.
- Specialized AI agents: Intake Agent, Fulfillment Agent, Knowledge Agent and Escalation Agent.
- Connectors to ERP/CRM (REST API) to create orders, shipping notes and update subscriptions.
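As a sketch of the orchestrator's routing step described above, an incoming message from any channel can be mapped to one of the specialized agents. The keyword rules and agent names below are illustrative assumptions; the article does not detail CaféMatic's actual intent model.

```python
# Minimal routing sketch: pick an agent for a message from any channel.
# Keyword lists are hypothetical; a real deployment would use the LLM
# or a trained intent classifier instead of keyword matching.
AGENT_RULES = [
    ("fulfillment", {"pause", "subscription", "order", "return"}),
    ("knowledge", {"roast", "flavor", "recipe", "batch"}),
]

def route(message: str, channel: str) -> dict:
    """Return the agent chosen for this message and its source channel."""
    words = set(message.lower().split())
    for agent, keywords in AGENT_RULES:
        if words & keywords:
            return {"agent": agent, "channel": channel}
    # Anything unmatched goes to the Intake Agent for triage.
    return {"agent": "intake", "channel": channel}
```

In practice the same `route` step would sit behind the WhatsApp Business API and Telegram Bot API webhooks, so all channels share one decision point.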
Role of the internal forum
The internal forum acts as the single source of truth and a continuous training dataset:
- Categorizes threads by tags (logistics, subscriptions, quality).
- Allows the Knowledge Agent to publish weekly summaries and create response templates in the vector DB.
- Serves as an audit trail and prompt-improvement record.
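A minimal sketch of how tagged forum threads might be indexed for retrieval. The toy bag-of-words similarity stands in for the real embedding model served next to the self-hosted LLM, and the thread texts are invented examples in the article's tag categories.

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real setup would use a
    sentence-embedding model feeding the local vector DB."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical threads using the article's tags (logistics, subscriptions, quality).
threads = [
    {"tag": "logistics", "text": "carrier pickup schedule for returns"},
    {"tag": "subscriptions", "text": "how to pause or change a subscription"},
    {"tag": "quality", "text": "flavor notes for the ethiopia batch roast"},
]
index = [(t, embed(t["text"])) for t in threads]

def search(query: str) -> dict:
    """Return the best-matching forum thread for a query."""
    qv = embed(query)
    return max(index, key=lambda pair: cosine(qv, pair[1]))[0]
```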
Automated workflows (practical examples)
Flow 1 — Subscription pause via WhatsApp (end user)
- Customer sends: "I want to pause my July subscription" via WhatsApp.
- Webhook -> Orchestrator detects intent and calls the Intake Agent.
- Intake Agent validates identity (secure questions) and queries the CRM via API.
- If verification is correct, the Fulfillment Agent applies the change in the CRM and generates confirmation.
- Orchestrator sends the response to the customer and creates a thread in the internal forum to notify the operations team if there's any logistical impact.
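Flow 1's verify-then-apply logic could look like the following. The CRM record structure and the secure-question check are hypothetical stand-ins for the real CRM REST API.

```python
def pause_subscription(customer_id: str, answers: list, crm: dict, month: str) -> dict:
    """Intake Agent verifies identity; Fulfillment Agent applies the change.
    `crm` is an in-memory dict standing in for the CRM connector."""
    record = crm.get(customer_id)
    # 1. Validate identity with secure questions (Intake Agent).
    if record is None or answers != record["secure_answers"]:
        # Failed verification is escalated rather than silently rejected.
        return {"status": "escalated", "reason": "identity check failed"}
    # 2. Apply the change in the CRM (Fulfillment Agent).
    record["paused_months"].append(month)
    # 3. Confirmation goes back to the customer via the orchestrator.
    return {"status": "paused", "month": month}
```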
Impact: average resolution time for simple cases dropped from 48 hours to 4 minutes, and the human error rate in subscription changes fell by 85%.
Flow 2 — Technical query about roasting (customer or partner café)
- User on Telegram asks about flavor notes for a batch.
- Orchestrator queries the vector DB and returns an answer via RAG, citing the forum thread that documents the roasting parameters.
- If the query is complex, the Escalation Agent notifies the head roaster via an internal Telegram channel and creates a task in the backlog.
Practice: the Knowledge Agent adds new notes to the forum after each batch, improving future responses.
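The answer-or-escalate decision in this flow can be sketched as a confidence threshold on the retrieval score. The 0.4 cutoff and the field names are illustrative assumptions, not values from the article.

```python
def answer_or_escalate(score: float, answer: str, thread_url: str) -> dict:
    """Reply with a cited RAG answer, or hand off to the Escalation Agent."""
    if score >= 0.4:  # threshold is an illustrative assumption
        # Confident match: cite the forum thread documenting the parameters.
        return {"action": "reply", "text": f"{answer} (source: {thread_url})"}
    # Complex or low-confidence query: notify the head roaster on the
    # internal Telegram channel and create a backlog task.
    return {"action": "escalate", "notify": "head_roaster", "create_task": True}
```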
Flow 3 — Returns and logistics handling
- Customer reports a defective product via WhatsApp and attaches a photo.
- Intake Agent uses basic vision (local model) to classify the damage and creates a case.
- Fulfillment Agent generates the return label in the ERP and schedules pickup with the carrier.
- Orchestrator updates the forum with the case and actions taken; the quality team receives an automated summary.
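The steps above can be sketched as a single handler. `erp_create_label` stands in for the ERP connector, the damage label for the local vision model's output, and the 0.7 confidence threshold is an assumption for illustration.

```python
import uuid

def handle_return(damage_label: str, confidence: float, erp_create_label) -> dict:
    """Open a case, and issue a return label only when the vision
    classification is confident enough to act on automatically."""
    case_id = uuid.uuid4().hex[:8]
    if confidence < 0.7:  # illustrative threshold, not from the article
        # Uncertain classification: route the case to a human reviewer.
        return {"case": case_id, "status": "needs_human_review"}
    label = erp_create_label(case_id)  # ERP generates the carrier label
    return {"case": case_id, "status": "label_issued",
            "label": label, "damage": damage_label}
```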
Results: 70% fewer manual steps, and return resolution time dropped from 7 days to 2.
Governance, privacy and maintenance
- On-premise LLM: all customer data remains on company-owned servers; encrypted backups with key rotation.
- Role-based access controls for the forum; only agents and authorized staff can edit the knowledge base.
- Prompt policy: centralized template and quarterly review to avoid bias and out-of-policy responses.
- Incremental retraining: the Knowledge Agent tags relevant threads and the team re-trains embeddings weekly with new data.
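The prompt policy could be enforced with a centralized template plus a simple screening pass. The template wording and the banned-phrase list below are assumptions for illustration; the article only states that a centralized template exists and is reviewed quarterly.

```python
# Centralized prompt template; in this setup, edits would go through
# the quarterly review mentioned in the policy.
TEMPLATE = (
    "You are CaféMatic's support assistant. Answer only from the provided "
    "forum context; if unsure, say so and escalate.\n\n"
    "Context: {context}\nQuestion: {question}"
)

# Hypothetical phrases the bot must never emit (out-of-policy responses).
BANNED_PHRASES = ("guaranteed refund", "legal advice")

def build_prompt(context: str, question: str) -> str:
    return TEMPLATE.format(context=context, question=question)

def passes_policy(response: str) -> bool:
    """Screen a generated response before it leaves the orchestrator."""
    lowered = response.lower()
    return not any(p in lowered for p in BANNED_PHRASES)
```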
Metrics and practical results
- Initial response time on instant channels: from 2 hours to 30 seconds (automated) for FAQs and simple changes.
- Staff reallocation: 1.5 FTEs redirected from support to product improvement projects.
- CSAT increase: +12 points in 3 months.
- Data leak incidents: 0 after on-premise implementation and quarterly audits.
Actionable conclusion
Quick checklist for SMBs that want to replicate the case:
- Audit knowledge sources and create an internal forum as a single repository.
- Define priority use cases (e.g., subscription changes, returns, FAQs).
- Implement a minimal viable self-hosted LLM + vector DB for RAG.
- Deploy a multichannel orchestrator that connects WhatsApp, Telegram and CRM/ERP.
- Design specialized agents (intake, fulfillment, knowledge, escalation) and security pipelines.
- Measure KPIs (response time, CSAT, operational errors) and adjust prompts/embeddings weekly.
With concrete steps and solid governance, an SMB can automate support and operations across channels like WhatsApp and Telegram, keep full control over its data, and improve efficiency, without relying exclusively on the public cloud.