When you have a single AI assistant, communication isn't a problem. But when you scale to 10, 20, or 50 agents distributed across multiple servers, a fundamental question arises: how will these agents talk to each other?
This is the challenge facing more and more Moroccan companies deploying advanced AI systems. After implementing multi-agent architectures for clients in e-commerce, logistics, and financial services, we're sharing the patterns that work — and the ones to avoid.
The problem: inter-agent communication
Imagine an e-commerce company with these AI agents:
- A product recommendation agent
- A dynamic pricing agent
- An inventory management agent
- A customer service agent
- A fraud detection agent
- A logistics optimization agent
Each agent needs information from the others. The recommendation agent needs to know if a product is in stock before suggesting it. The pricing agent needs inventory levels to adjust prices. The fraud agent needs to correlate customer behavior with purchase patterns.
Without a communication architecture, you end up with a spaghetti of point-to-point connections: with n agents, up to n(n-1)/2 links to build and maintain. Each new agent multiplies complexity. Maintenance becomes a nightmare.
The solution: message bus
A message bus is a centralized communication channel where agents publish and consume messages. Instead of direct connections between agents, each agent communicates only with the bus.
The advantages are immediate:
Decoupling: Agents don't know about other agents. They publish messages to topics and subscribe to topics they care about. You can add, remove, or modify an agent without touching the others.
Scalability: The bus can distribute load across multiple instances. An overloaded agent can be replicated without architecture changes.
Resilience: If an agent goes down, messages are retained in the bus until it returns. No data loss.
Observability: All messages pass through a central point. You can log, monitor, and debug easily.
Choosing the right technology
Three options dominate the market in 2026:
Apache Kafka
The de facto standard for high-throughput systems. Kafka handles millions of messages per second with millisecond latency.
Strengths:
- Message persistence (replay possible)
- Partitioning for horizontal scalability
- Mature ecosystem (Kafka Streams, Connect, Schema Registry)
Drawbacks:
- High operational complexity
- Steep learning curve
- Significant infrastructure cost
Recommended for: Systems with 50+ agents, massive data volumes, need for historical replay.
RabbitMQ
Simpler than Kafka, RabbitMQ excels for medium-sized architectures with complex routing patterns.
Strengths:
- Simple to deploy and operate
- Flexible routing (direct, topic, fanout, headers)
- Native support for multiple protocols (AMQP, MQTT, STOMP)
Drawbacks:
- Lower performance than Kafka under heavy load
- No message replay with classic queues (the newer RabbitMQ Streams feature addresses this)
- More limited horizontal scalability
Recommended for: Systems with 10-50 agents, sophisticated routing patterns, teams with less DevOps experience.
Redis Streams
The lightweight option for teams already using Redis. Redis Streams offers essential message bus features with minimal footprint.
Strengths:
- Extremely fast (sub-millisecond latency)
- Simple if Redis is already in your stack
- Consumer groups for load distribution
Drawbacks:
- Less robust persistence than Kafka
- Fewer advanced features
- Smaller community
Recommended for: Systems with fewer than 20 agents, teams already using Redis, prototypes and MVPs.
Reference architecture
Here's the architecture we use for AI automation projects with our clients:
┌─────────────────────────────────────────────────────┐
│                 MESSAGE BUS (Kafka)                 │
├─────────────────────────────────────────────────────┤
│ Topics:                                             │
│ ├── events.customer.*  (customer behaviors)         │
│ ├── events.product.*   (product changes)            │
│ ├── events.order.*     (orders and transactions)    │
│ ├── commands.agent.*   (inter-agent instructions)   │
│ └── metrics.agent.*    (agent telemetry)            │
└─────────────────────────────────────────────────────┘
      ▲             ▲             ▲             ▲
      │             │             │             │
 ┌────┴────┐   ┌────┴────┐   ┌────┴────┐   ┌────┴────┐
 │  Agent  │   │  Agent  │   │  Agent  │   │  Agent  │
 │  Reco   │   │ Pricing │   │  Stock  │   │  Fraud  │
 └─────────┘   └─────────┘   └─────────┘   └─────────┘
Topic naming convention
We use a hierarchical convention:
- events.*: Facts that occurred (immutable)
- commands.*: Instructions to execute
- metrics.*: Monitoring data
Each level adds specificity: events.customer.pageview, events.customer.purchase, commands.agent.pricing.recalculate.
This structure allows agents to subscribe at different granularity levels. The fraud agent can listen to events.customer.* to see all behaviors, while the inventory agent only listens to events.order.created.
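To make the granularity levels concrete, here is a minimal matcher showing how a wildcard subscription covers a family of concrete topics. The `matches` helper is purely illustrative, not part of any bus client: Kafka uses regex subscriptions and RabbitMQ topic exchanges use their own `*`/`#` wildcards.

```python
from fnmatch import fnmatch

def matches(subscription: str, topic: str) -> bool:
    """Return True if a concrete topic falls under a subscription pattern.

    Note: fnmatch's '*' also matches across dots, so 'events.customer.*'
    covers any depth below events.customer.
    """
    return fnmatch(topic, subscription)

# The fraud agent sees every customer behavior...
assert matches("events.customer.*", "events.customer.pageview")
assert matches("events.customer.*", "events.customer.purchase")
# ...while the inventory agent only listens to created orders.
assert matches("events.order.created", "events.order.created")
assert not matches("events.order.created", "events.order.cancelled")
```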
Essential communication patterns
Pattern 1: Event Sourcing
Agents emit events describing what happened, not states. Instead of "product X stock = 42", the agent emits "product X stock reduced by 3 units".
Benefits:
- Complete history of changes
- Ability to reconstruct state at any point
- Facilitates debugging and auditing
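A minimal sketch of what event sourcing buys you: current state is not stored, it is folded from the history of delta events. Event names and shapes here are hypothetical.

```python
def replay_stock(events: list[dict], initial: int = 0) -> int:
    """Fold a stream of stock-change events into the current quantity.

    Replaying a prefix of the list reconstructs the state at any point
    in time, which is what makes auditing and debugging straightforward.
    """
    stock = initial
    for event in events:
        stock += event["delta"]  # e.g. -3 for "reduced by 3 units"
    return stock

events = [
    {"type": "stock.received", "delta": +50},
    {"type": "stock.sold",     "delta": -3},
    {"type": "stock.sold",     "delta": -5},
]
assert replay_stock(events) == 42          # current state
assert replay_stock(events[:1]) == 50      # state after the first event
```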
Pattern 2: Saga for distributed transactions
When an operation involves multiple agents (order → inventory → payment → shipping), use the Saga pattern. Each agent executes its part and emits a success or failure event. An orchestrator coordinates the whole process and manages compensations on failure.
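The orchestration logic can be sketched in a few lines. This is a toy orchestrator, assuming each step exposes an action and a matching compensation; in a real system the actions would be messages to the inventory, payment, and shipping agents.

```python
def run_saga(steps):
    """Execute (action, compensation) pairs; on failure, undo completed steps."""
    completed = []
    for action, compensate in steps:
        try:
            action()
            completed.append(compensate)
        except Exception:
            # Compensate in reverse order, like unwinding a transaction
            for undo in reversed(completed):
                undo()
            return False
    return True

log = []

def reserve_stock():   log.append("stock reserved")
def release_stock():   log.append("stock released")
def charge_payment():  raise RuntimeError("payment declined")
def refund_payment():  log.append("payment refunded")

ok = run_saga([(reserve_stock, release_stock),
               (charge_payment, refund_payment)])
assert ok is False
assert log == ["stock reserved", "stock released"]  # only completed steps undone
```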
Pattern 3: Dead Letter Queue
Messages that agents can't process (invalid format, missing dependency) are routed to a special queue. A team can analyze them and reprocess manually or adjust the faulty agent.
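A sketch of that routing, with plain lists standing in for real bus topics: a message the handler rejects is parked with its failure reason instead of being retried forever.

```python
dead_letters = []

def process(message: dict, handler):
    """Run the handler; on failure, park the message for later analysis."""
    try:
        handler(message)
    except Exception as exc:
        # Keep the original message plus the failure reason so the team
        # can diagnose, fix, and reprocess manually.
        dead_letters.append({"message": message, "error": str(exc)})

def strict_handler(message: dict):
    if "order_id" not in message:
        raise ValueError("missing order_id")

process({"order_id": 17, "total": 420}, strict_handler)
process({"total": 99}, strict_handler)  # invalid format: goes to the DLQ

assert len(dead_letters) == 1
assert dead_letters[0]["error"] == "missing order_id"
```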
Practical implementation in Python
Here's a minimal example with Redis Streams, suited for medium-sized projects:
import redis
import json
from datetime import datetime, timezone
from typing import Callable

class AgentMessageBus:
    def __init__(self, agent_id: str, redis_url: str = "redis://localhost:6379"):
        self.agent_id = agent_id
        self.redis = redis.from_url(redis_url)
        self.consumer_group = f"group_{agent_id}"

    def publish(self, topic: str, payload: dict):
        """Publish a message to a topic."""
        message = {
            "agent_id": self.agent_id,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "payload": json.dumps(payload),
        }
        self.redis.xadd(topic, message)

    def subscribe(self, topics: list[str], handler: Callable):
        """Subscribe to multiple topics and process messages."""
        # Create one consumer group per topic (idempotent)
        for topic in topics:
            try:
                self.redis.xgroup_create(topic, self.consumer_group, mkstream=True)
            except redis.ResponseError:
                pass  # Group already exists

        while True:
            for topic in topics:
                messages = self.redis.xreadgroup(
                    self.consumer_group,
                    self.agent_id,
                    {topic: ">"},
                    count=10,
                    block=1000,
                )
                for _, msg_list in messages:
                    for msg_id, msg_data in msg_list:
                        payload = json.loads(msg_data[b"payload"])
                        handler(topic, payload)
                        # Acknowledge only after successful processing
                        self.redis.xack(topic, self.consumer_group, msg_id)
This example illustrates fundamental concepts. For production deployment, add error handling, retry logic, and monitoring.
Monitoring and observability
A multi-agent system without monitoring is a system on borrowed time. Here are the essential metrics:
Processing latency: Time between message publication and processing. An increase signals an overloaded agent.
Queue depth: Number of messages waiting per topic. A growing queue indicates consumers can't keep up.
Error rate: Percentage of messages sent to dead letter queue. A spike often reveals a bug or unanticipated format change.
Throughput per agent: Messages processed per second. Helps identify bottlenecks.
We recommend Prometheus + Grafana for visualization, with PagerDuty alerts for critical anomalies.
Common mistakes to avoid
Mistake 1: Messages too large
A message should contain the minimum necessary. If an agent needs additional details, it queries the source directly. Large messages saturate the bus and slow down the entire system.
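One common way to enforce this is the claim-check pattern: publish a small reference, and let consumers that actually need the details fetch them from a shared store. A dict stands in for that store here (in practice it would be S3, Redis, or a database); names are illustrative.

```python
import uuid

blob_store = {}  # stand-in for a shared object store

def publish_large(payload: dict) -> dict:
    """Park the heavy payload and return a lightweight, bus-friendly message."""
    key = str(uuid.uuid4())
    blob_store[key] = payload
    return {"type": "report.ready", "blob_key": key}

def consume(message: dict) -> dict:
    """A consumer that needs the details queries the source directly."""
    return blob_store[message["blob_key"]]

msg = publish_large({"rows": list(range(10_000))})
assert set(msg) == {"type", "blob_key"}      # only a reference crosses the bus
assert len(consume(msg)["rows"]) == 10_000   # details fetched on demand
```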
Mistake 2: Temporal coupling
Never assume a message will be processed immediately. Design your agents to function even if responses take minutes. The message bus is asynchronous by nature.
Mistake 3: No versioning
Message formats evolve. Without versioning, a schema change breaks all consumers. Always include a schema_version field and manage backward compatibility.
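Backward compatibility can be handled by upgrading old messages at the consumer before processing. A sketch, with an invented v1-to-v2 migration (splitting a `name` field) as the example:

```python
CURRENT_VERSION = 2

def upgrade(message: dict) -> dict:
    """Migrate older schema versions step by step up to the current one."""
    version = message.get("schema_version", 1)
    if version == 1:
        # Hypothetical change: v2 split `name` into first/last name,
        # but old producers still emit the v1 shape.
        first, _, last = message.pop("name").partition(" ")
        message.update(first_name=first, last_name=last, schema_version=2)
    return message

old = {"schema_version": 1, "name": "Amina Benali"}
new = upgrade(old)
assert new["schema_version"] == CURRENT_VERSION
assert new["first_name"] == "Amina" and new["last_name"] == "Benali"
```

Chaining one migration per version step keeps each change small and lets very old messages catch up incrementally.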
Mistake 4: Ignoring idempotence
A message can be delivered multiple times (retry after timeout, agent restart). Each handler must be idempotent: processing the same message twice should produce the same result as once.
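The simplest form of idempotence is deduplication by message id. A sketch, with the seen-set held in memory (in production it would live in Redis or a database, with expiry):

```python
seen_ids = set()
stock = {"product_x": 45}

def handle_once(msg_id: str, payload: dict):
    """Apply a stock-change event exactly once, even if redelivered."""
    if msg_id in seen_ids:
        return  # duplicate delivery: no-op
    seen_ids.add(msg_id)
    stock[payload["product"]] += payload["delta"]

event = {"product": "product_x", "delta": -3}
handle_once("msg-001", event)
handle_once("msg-001", event)  # redelivered after a timeout

assert stock["product_x"] == 42  # applied exactly once
```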
Use cases in Morocco
We've deployed this architecture for several clients:
Casablanca e-commerce retailer: 15 agents coordinating recommendations, inventory, and pricing in real-time. Result: 23% increase in average basket size through contextual recommendations.
Tangier logistics company: 8 agents optimizing delivery routes, fleet management, and demand forecasting. Result: 18% reduction in fuel costs.
Digital bank: 12 agents for fraud detection, credit scoring, and automated customer service. The AI integration solutions reduced fraud false positives by 40%.
FAQ
What's the difference between message bus and REST API for inter-agent communication?
REST APIs are synchronous: the caller waits for the response. The message bus is asynchronous: the agent publishes and continues its work. REST creates tight coupling between services (if the called service goes down, the caller fails). The bus decouples agents and absorbs load spikes. Use REST for requests requiring immediate response, the bus for everything else.
How many agents justify the investment in a message bus?
Beyond 5 agents that need to communicate regularly, a bus becomes worthwhile. Below that, direct calls with retry logic may suffice. The tipping point also depends on criticality: an e-commerce system with 5 critical agents benefits more from a bus than an internal system with 10 non-critical agents.
How do you handle message security between agents?
Three levels: agent authentication (each agent has unique credentials to access the bus), authorization by topic (an agent can only publish/consume on its authorized topics), and sensitive payload encryption (messages containing personal data are encrypted end-to-end). Kafka and RabbitMQ natively support TLS and SASL.
What happens if the message bus goes down?
This is why Kafka and RabbitMQ support clustering. Deploy at minimum 3 nodes in different availability zones. If one node goes down, the others take over. Unprocessed messages are retained and will be delivered on restart. Test your failover scenarios regularly.
How do you migrate from point-to-point architecture to a message bus?
Proceed gradually. Start by identifying the most critical communication flows. Implement the bus for these flows in parallel with existing connections (dual write). Validate that the bus works correctly. Then switch consumers to the bus and remove direct connections. Repeat for each flow. A big-bang migration is too risky.
