Architecture Overview

This document provides a comprehensive overview of the AI Ops App architecture.

High-Level Architecture

The AI Ops App integrates multiple LLM providers with Nautobot through a multi-layered architecture with middleware support:

┌──────────────────────────────────────────────────────────────┐
│                        User Interface                         │
│  (Web UI / REST API / Chat Interface)                        │
└────────────────┬─────────────────────────────────────────────┘
┌────────────────▼─────────────────────────────────────────────┐
│                     Nautobot Plugin Layer                     │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
│  │   Views      │  │     API      │  │    Jobs      │       │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘       │
└─────────┼──────────────────┼──────────────────┼──────────────┘
          │                  │                  │
┌─────────▼──────────────────▼──────────────────▼──────────────┐
│                      Application Layer                        │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
│  │  AI Agents   │  │   Models     │  │   Helpers    │       │
│  │  (LangGraph) │  │ (LLMProvider,│  │(get_llm_     │       │
│  │  + Middleware│  │  LLMModel,   │  │ model,       │       │
│  │              │  │  Middleware, │  │ middleware)  │       │
│  │              │  │  MCPServer)  │  │              │       │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘       │
└─────────┼──────────────────┼──────────────────┼──────────────┘
          │                  │                  │
┌─────────▼──────────────────▼──────────────────▼──────────────┐
│                   Integration Layer                           │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
│  │  LangChain   │  │    Redis     │  │  PostgreSQL  │       │
│  │   (MCP)      │  │(Checkpoints, │  │  (Models)    │       │
│  │              │  │  Middleware  │  │              │       │
│  │              │  │   Cache)     │  │              │       │
│  └──────┬───────┘  └──────────────┘  └──────────────┘       │
└─────────┼────────────────────────────────────────────────────┘
┌─────────▼─────────────────────────────────────────────────────┐
│                   External Services                            │
│  ┌─────────────────────────────────────────────────┐          │
│  │     LLM Providers (Multi-Provider Support)      │          │
│  │  ┌────────┐  ┌────────┐  ┌────────┐  ┌────────┐│          │
│  │  │ Ollama │  │ OpenAI │  │Azure AI│  │Anthropic│          │
│  │  └────────┘  └────────┘  └────────┘  └────────┘│          │
│  │  ┌────────┐  ┌────────┐                        │          │
│  │  │HuggingF│  │ Custom │                        │          │
│  │  └────────┘  └────────┘                        │          │
│  └─────────────────────────────────────────────────┘          │
│  ┌──────────────┐                                             │
│  │ MCP Servers  │                                             │
│  │ (Tools/Ctx)  │                                             │
│  └──────────────┘                                             │
└────────────────────────────────────────────────────────────────┘

Middleware Architecture

The app supports a flexible middleware system that processes requests before and after they reach the LLM:

User Request
┌───────────────────────────────────────┐
│        Middleware Chain               │
│  (Executed in Priority Order 1-100)  │
├───────────────────────────────────────┤
│  Priority 10: LoggingMiddleware      │ ← Log request
│  Priority 20: CacheMiddleware        │ ← Check cache
│  Priority 30: RetryMiddleware        │ ← Retry logic
│  Priority 40: ValidationMiddleware   │ ← Validate input
├───────────────────────────────────────┤
│              LLM Model                │ ← Process request
│      (Ollama/OpenAI/Azure/etc)       │
├───────────────────────────────────────┤
│  Priority 40: ValidationMiddleware   │ ← Validate output
│  Priority 30: RetryMiddleware        │ ← (if needed)
│  Priority 20: CacheMiddleware        │ ← Store in cache
│  Priority 10: LoggingMiddleware      │ ← Log response
└───────────────────────────────────────┘
Response to User
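
The chain in the diagram can be sketched in a few lines; this is an illustrative model of the pre/post pattern only, not the app's actual middleware interface (class and attribute names here are hypothetical):

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Middleware:
    """Hypothetical middleware: a name, a 1-100 priority, and pre/post hooks."""
    name: str
    priority: int
    pre: Callable[[dict], dict] = lambda request: request
    post: Callable[[dict], dict] = lambda response: response


def run_chain(middlewares: list[Middleware], request: dict, call_llm: Callable[[dict], dict]) -> dict:
    """Pre-process in ascending priority, call the LLM, post-process in reverse."""
    chain = sorted(middlewares, key=lambda m: m.priority)
    for mw in chain:                  # Priority 1 -> 100
        request = mw.pre(request)
    response = call_llm(request)      # the model (Ollama/OpenAI/Azure/...) handles the request
    for mw in reversed(chain):        # Priority 100 -> 1
        response = mw.post(response)
    return response
```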

Component Architecture

1. User Interface Layer

Web UI

  • Chat Interface: /plugins/ai-ops/chat/ - Interactive chat widget
  • Provider Management: List, create, edit, delete LLM providers
  • Model Management: List, create, edit, delete LLM models
  • Middleware Management: Configure middleware for models
  • Server Management: Configure and monitor MCP servers
  • Navigation: Integrated into Nautobot's navigation menu

REST API

  • LLM Providers API: /api/plugins/ai-ops/llm-providers/
  • LLM Models API: /api/plugins/ai-ops/llm-models/
  • Middleware Types API: /api/plugins/ai-ops/middleware-types/
  • LLM Middleware API: /api/plugins/ai-ops/llm-middleware/
  • MCP Servers API: /api/plugins/ai-ops/mcp-servers/
  • Chat API: /plugins/ai-ops/api/chat/ - Programmatic chat access
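
A quick illustration of calling these endpoints with a Nautobot API token (the hostname is a placeholder and the chat payload field name "message" is an assumption, so check the serializer before relying on it):

```python
import requests

NAUTOBOT_URL = "https://nautobot.example.com"  # placeholder
HEADERS = {"Authorization": "Token <nautobot-api-token>", "Accept": "application/json"}

# List configured LLM models
models = requests.get(
    f"{NAUTOBOT_URL}/api/plugins/ai-ops/llm-models/", headers=HEADERS, timeout=30
).json()

# Send a chat message programmatically (payload shape assumed)
reply = requests.post(
    f"{NAUTOBOT_URL}/plugins/ai-ops/api/chat/",
    headers=HEADERS,
    json={"message": "How many MCP servers are currently healthy?"},
    timeout=120,
).json()
```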

2. Application Layer

AI Agents

Multi-MCP Agent (ai_ops/agents/multi_mcp_agent.py): - Production-ready agent implementation - Supports multiple MCP servers simultaneously - Application-level caching for performance - Health-based server selection - LangGraph state management - Middleware integration

Single-MCP Agent (ai_ops/agents/single_mcp_agent.py): - Simplified single-server implementation - Development and testing scenarios

Agent Features: - Conversation history via checkpointing - Tool discovery from MCP servers - Multi-provider LLM support (Ollama, OpenAI, Azure AI, Anthropic, HuggingFace) - Middleware chain execution - Async/await architecture

Models

LLMProvider: - Defines available LLM providers (Ollama, OpenAI, Azure AI, Anthropic, HuggingFace, Custom) - Stores provider-specific configuration in JSON schema - Has corresponding provider handler classes - Enable/disable providers without deletion

LLMModel: - Stores model configurations for any supported provider - Environment-aware (LAB/NONPROD/PROD) - Integrates with Nautobot Secrets - Supports default model selection - Can have multiple middleware configurations - References LLMProvider via foreign key

MiddlewareType: - Defines middleware types (built-in LangChain or custom) - Reusable across multiple models - Name validation and formatting

LLMMiddleware: - Configures middleware instances for specific models - Priority-based execution order (1-100) - JSON configuration for flexibility - Active/inactive toggle - Critical flag for initialization requirements

MCPServer: - Stores MCP server configurations - Health status tracking with automated checks - Protocol support (HTTP/STDIO) - Type classification (Internal/External)
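
To make these relationships concrete, here is a hedged ORM sketch of wiring a model to its provider and one middleware entry; the import path and field names (provider, is_default, llm_model, priority, config, is_active) are inferred from the descriptions above and may differ from the real schema:

```python
# Field and import names are assumptions based on this document, not verified against the code.
from ai_ops.models import LLMProvider, LLMModel, MiddlewareType, LLMMiddleware

provider = LLMProvider.objects.get(name="Azure AI")

model = LLMModel.objects.create(
    name="gpt-4o-prod",
    provider=provider,        # foreign key to LLMProvider
    is_default=True,          # default model selection
)

logging_type = MiddlewareType.objects.get(name="LoggingMiddleware")

LLMMiddleware.objects.create(
    llm_model=model,          # the model this middleware instance applies to
    middleware=logging_type,  # reusable MiddlewareType
    priority=10,              # 1-100 execution order
    config={"level": "INFO"}, # JSON configuration
    is_active=True,
)
```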

Helpers

get_llm_model: - Environment detection - Model configuration retrieval - Azure OpenAI client creation - Sync and async variants

get_info: - Status retrieval utilities - Default value providers

Serializers: - LangGraph checkpoint serialization - Custom data type handling

3. Integration Layer

LangChain & LangGraph

LangChain: - Azure OpenAI integration - Message handling - Tool abstraction

LangGraph: - State graph workflow - Checkpointing system - Conditional routing - Tool node execution

MCP Integration: - langchain-mcp-adapters: MCP client library - MultiServerMCPClient: Multi-server support - Tool discovery and execution

Redis

Purpose: Conversation checkpoint storage

Configuration: - Database: Separate from cache/Celery (default DB 2) - Key Pattern: checkpoint:{thread_id}:{checkpoint_id} - TTL: Managed by cleanup job

Data Stored: - Conversation messages - Agent state - Metadata (timestamps, user info)

PostgreSQL

Purpose: Application data storage

Tables: - ai_ops_llmmodel: LLM configurations - ai_ops_mcpserver: MCP server configurations - Plus standard Nautobot tables (secrets, statuses, etc.)

4. External Services

LLM Providers

LLM Provider Support: - Ollama: Local open-source models - OpenAI: GPT-4, GPT-4o, GPT-3.5-turbo - Azure AI: Azure OpenAI Service - Anthropic: Claude models - HuggingFace: HuggingFace Hub models - Custom: Extensible provider system

Communication: - HTTPS REST API - API Key authentication (via Secrets) - Provider-specific endpoints - Handler-based initialization

MCP Servers

Purpose: Extend agent capabilities with custom tools

Types: - Internal: Hosted within infrastructure - External: Third-party services

Protocols: - HTTP: RESTful MCP servers - STDIO: Process-based servers

Health Monitoring: - Automatic health checks via scheduled job - Status field in database - Failed servers excluded from operations - Parallel health checking with retry logic

Data Flow

Chat Message Flow with Middleware

1. User submits message
2. ChatMessageView receives request
3. Session ID retrieved (thread_id)
4. process_message() called
5. MCP client cache checked/created
6. LLM model configuration retrieved
7. Middleware chain initialized fresh (priority order)
8. LangGraph state graph created with middleware
9. Message added to state
10. Middleware pre-processing (Priority 1 → 100)
11. Agent processes message
12. LLM provider handler creates model instance
13. Model processes request
14. Tools invoked if needed (via MCP)
15. Response generated by LLM
16. Middleware post-processing (Priority 100 → 1)
17. State persisted to Redis
18. Response returned to user

Provider Selection Flow

1. Get LLM model (by name or default)
2. Load model's provider relationship
3. Get provider handler from registry
4. Retrieve provider config_schema from database
5. Get model's API key from Secret
6. Provider handler initializes LLM
   ├─ Ollama: ChatOllama(base_url, model_name)
   ├─ OpenAI: ChatOpenAI(api_key, model_name)
   ├─ Azure AI: AzureChatOpenAI(api_key, endpoint, deployment)
   ├─ Anthropic: ChatAnthropic(api_key, model_name)
   ├─ HuggingFace: ChatHuggingFace(api_key, model_name)
   └─ Custom: CustomHandler(config, api_key)
7. Return initialized chat model instance
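
A condensed sketch of that handler dispatch, using the LangChain chat classes named in the flow (the constructor arguments shown are the commonly used ones; the app's registry may pass more of the provider config_schema):

```python
from langchain_ollama import ChatOllama
from langchain_openai import ChatOpenAI, AzureChatOpenAI
from langchain_anthropic import ChatAnthropic


def build_chat_model(provider_name: str, config: dict, api_key: str | None = None):
    """Return an initialized chat model for the given provider (illustrative only)."""
    if provider_name == "Ollama":
        return ChatOllama(base_url=config["base_url"], model=config["model_name"])
    if provider_name == "OpenAI":
        return ChatOpenAI(api_key=api_key, model=config["model_name"])
    if provider_name == "Azure AI":
        return AzureChatOpenAI(
            api_key=api_key,
            azure_endpoint=config["endpoint"],
            azure_deployment=config["deployment"],
            api_version=config.get("api_version", "2024-06-01"),
        )
    if provider_name == "Anthropic":
        return ChatAnthropic(api_key=api_key, model=config["model_name"])
    raise ValueError(f"No handler registered for provider {provider_name!r}")
```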

Middleware Execution Flow

┌────────────────────────────────────────────┐
│  Request from Agent                        │
└───────────────┬────────────────────────────┘
┌───────────────▼────────────────────────────┐
│  Load Model's Middleware Configurations    │
│  - Query LLMMiddleware.objects             │
│  - Filter: is_active=True                  │
│  - Order by: priority, middleware__name    │
└───────────────┬────────────────────────────┘
┌───────────────▼────────────────────────────┐
│  Initialize Middleware Chain               │
│  For each middleware (priority order):     │
│    1. Load middleware type                 │
│    2. Get configuration JSON               │
│    3. Initialize middleware instance       │
│    4. Add to chain                         │
└───────────────┬────────────────────────────┘
┌───────────────▼────────────────────────────┐
│  Pre-Processing Phase                      │
│  (Priority 1 → 100)                        │
│                                            │
│  Priority 10: LoggingMiddleware            │
│    - Log request timestamp                 │
│    - Log user info and message             │
│                                            │
│  Priority 20: CacheMiddleware              │
│    - Check if response cached              │
│    - If cached: return cached response     │
│    - If not: continue chain                │
│                                            │
│  Priority 30: ValidationMiddleware         │
│    - Validate input format                 │
│    - Check for malicious content           │
│    - Sanitize input if needed              │
└───────────────┬────────────────────────────┘
┌───────────────▼────────────────────────────┐
│  LLM Processing                            │
│  - Model generates response                │
│  - Tools invoked if needed                 │
└───────────────┬────────────────────────────┘
┌───────────────▼────────────────────────────┐
│  Post-Processing Phase                     │
│  (Priority 100 → 1)                        │
│                                            │
│  Priority 30: ValidationMiddleware         │
│    - Validate output format                │
│    - Check for sensitive data              │
│    - Filter response if needed             │
│                                            │
│  Priority 20: CacheMiddleware              │
│    - Store response in cache               │
│    - Set cache TTL from config             │
│                                            │
│  Priority 10: LoggingMiddleware            │
│    - Log response timestamp                │
│    - Log token usage and latency           │
└───────────────┬────────────────────────────┘
┌───────────────▼────────────────────────────┐
│  Response to User                          │
└────────────────────────────────────────────┘

Model Configuration Flow

LAB Environment:

1. get_llm_model() called
2. Detect environment → "LAB"
3. Read environment variables
4. Create model via default provider (typically Ollama)
5. Return model

Production Environment:

1. get_llm_model() called
2. Detect environment → "PROD"
3. Query LLMModel.get_default_model()
4. Load model's provider relationship
5. Get provider handler from registry
6. Retrieve Secret for API key
7. Build configuration dict from provider config_schema
8. Provider handler creates model instance
9. Return model
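
A hedged sketch of this environment-aware selection (the real helper lives in ai_ops; the environment variable names, Secret accessor, and handler API shown here are assumptions):

```python
import os


def get_llm_model(name: str | None = None):
    """Illustrative sketch of environment-aware model selection, not the real helper."""
    environment = os.getenv("ENVIRONMENT", "LAB").upper()

    if environment == "LAB":
        # LAB: build the model from environment variables via the default provider (Ollama)
        from langchain_ollama import ChatOllama
        return ChatOllama(
            base_url=os.getenv("OLLAMA_URL", "http://localhost:11434"),
            model=os.getenv("OLLAMA_MODEL", "llama3"),
        )

    # NONPROD/PROD: configured model -> provider handler -> Secret-backed API key
    from ai_ops.models import LLMModel
    model_cfg = LLMModel.objects.get(name=name) if name else LLMModel.get_default_model()
    api_key = model_cfg.secret.get_value()                       # accessor name assumed
    return model_cfg.provider.handler_class(model_cfg, api_key)  # handler API assumed
```
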
MCP Server Discovery Flow

1. App startup or cache expiry
2. Query MCPServer.objects.filter(status__name="Active")
3. Build connections dict
4. Create MultiServerMCPClient (see the sketch below)
5. Discover tools from each server
6. Cache client and tools (5 min TTL)
7. Tools available to agent
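
A hedged sketch of steps 2 through 5 using langchain-mcp-adapters (MultiServerMCPClient and get_tools come from that library; the MCPServer import path and url field name are assumptions):

```python
from langchain_mcp_adapters.client import MultiServerMCPClient

from ai_ops.models import MCPServer  # import path assumed


def build_connections() -> dict:
    """One connection entry per active HTTP MCP server (steps 2 and 3)."""
    return {
        server.name: {"transport": "streamable_http", "url": server.url}
        for server in MCPServer.objects.filter(status__name="Active")
    }


async def discover_tools(connections: dict):
    """Create the multi-server client and aggregate tool discovery (steps 4 and 5)."""
    client = MultiServerMCPClient(connections)
    tools = await client.get_tools()
    return client, tools
```
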
Middleware Instantiation Flow

1. Agent needs to process request
2. Query LLMMiddleware for model
   a. Filter: is_active=True
   b. Order by: priority, middleware__name
3. Initialize fresh middleware instances
   a. Load each middleware type
   b. Apply configuration from database
   c. Create new instance (NOT cached)
4. Apply middleware chain to request

Note: Middleware instances are always fresh to prevent state leaks between conversations. This ensures that stateful middleware (for example, SummarizationMiddleware) does not share state across different users.
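
A hedged sketch of that per-request instantiation; the queryset mirrors the filter and ordering described above, while the relation name llm_model and the load_class() loader are assumptions:

```python
from ai_ops.models import LLMMiddleware  # import path assumed


def build_middleware_chain(llm_model):
    """Build a fresh middleware chain for one request; instances are never cached."""
    configs = LLMMiddleware.objects.filter(
        llm_model=llm_model, is_active=True
    ).order_by("priority", "middleware__name")

    chain = []
    for entry in configs:
        middleware_cls = entry.middleware.load_class()        # hypothetical loader on MiddlewareType
        chain.append(middleware_cls(**(entry.config or {})))  # fresh instance, no shared state
    return chain
```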

Health Check Flow (Scheduled Job)

1. MCPServerHealthCheckJob triggered (scheduled)
2. Query all HTTP MCP servers (exclude STDIO, Vulnerable)
3. Parallel execution (ThreadPoolExecutor, max 4 workers); for each server (see the sketch below):
   a. Send GET to {url}{health_check}
   b. If status differs from database:
      - Wait 5 seconds
      - Verify (check #1)
      - Wait 5 seconds
      - Verify (check #2)
      - If both confirm: update database
4. If any status changed:
   a. Clear MCP client cache
   b. Log cache invalidation
5. Return summary:
   - checked_count
   - changed_count
   - failed_count
   - cache_cleared
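
A hedged sketch of the parallel probe and double verification (requests and ThreadPoolExecutor usage is standard; the server field names and the final status update are placeholders):

```python
from concurrent.futures import ThreadPoolExecutor
import time

import requests


def check_server(server) -> bool:
    """GET the server's health endpoint and report whether it responded OK."""
    try:
        return requests.get(f"{server.url}{server.health_check}", timeout=10).ok  # fields assumed
    except requests.RequestException:
        return False


def run_health_checks(servers) -> dict:
    """Probe HTTP MCP servers in parallel (max 4 workers) and confirm changes twice."""
    changed = 0
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(zip(servers, pool.map(check_server, servers)))
    for server, healthy in results:
        if healthy != (server.status.name == "Active"):   # result differs from the database
            time.sleep(5)
            first = check_server(server) == healthy        # verification check #1
            time.sleep(5)
            second = check_server(server) == healthy       # verification check #2
            if first and second:
                changed += 1                               # update the database status here
    return {"checked_count": len(results), "changed_count": changed}
```
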
State Management

Conversation State

Storage: MemorySaver (in-memory) for short-term session storage

State Structure:

```python
{
    "messages": [
        HumanMessage(content="User question"),
        AIMessage(content="AI response"),
        ...
    ]
}
```

Thread Isolation: - Each session has unique thread_id (Django session key) - Sessions don't interfere - Parallel conversations supported
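
Thread isolation maps directly onto LangGraph's configurable thread_id; a minimal, self-contained illustration (the trivial graph below stands in for the real agent):

```python
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import StateGraph, MessagesState, START


def respond(state: MessagesState):
    # Placeholder node; the real agent calls the LLM and MCP tools here.
    return {"messages": [("ai", "placeholder response")]}


builder = StateGraph(MessagesState)
builder.add_node("respond", respond)
builder.add_edge(START, "respond")
graph = builder.compile(checkpointer=MemorySaver())

# One thread per Django session key, so parallel conversations never mix state.
graph.invoke({"messages": [("user", "hello from session A")]},
             config={"configurable": {"thread_id": "session-key-a"}})
graph.invoke({"messages": [("user", "hello from session B")]},
             config={"configurable": {"thread_id": "session-key-b"}})
```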

Persistence & TTL:

  • Backend (MemorySaver): In-memory storage with timestamp tracking
      • Checkpoints tracked with creation timestamps
      • Automatic cleanup every 5 minutes via scheduled job
      • Expires after configured TTL (default: 5 minutes) + 30s grace period
      • Lost on application restart
  • Frontend (localStorage): Browser-based message display
      • Messages filtered by age on page load
      • Synced with backend TTL configuration
      • Cleared automatically when expired
      • Inactivity timer for auto-clearing (matches TTL config)

TTL Configuration:

# In nautobot_config.py
PLUGINS_CONFIG = {
    "ai_ops": {
        "chat_session_ttl_minutes": 5,  # Default: 5 minutes
    }
}

Cleanup Process:

  1. Frontend TTL Check (on page load):
     • Filters messages older than TTL + grace period
     • Shows expiry message if conversation expired
     • Calls backend clear API if all messages expired

  2. Backend Scheduled Cleanup (every 5 minutes):
     • Scans all MemorySaver checkpoints
     • Removes checkpoints older than TTL + grace period
     • Logs processed and deleted counts

  3. Inactivity Timer (frontend):
     • Resets on any user activity
     • Triggers after TTL minutes of no interaction
     • Clears both frontend and backend state

Migration Path: - Current: MemorySaver (session-based, in-memory) - Future Option 1: Redis Stack with RediSearch (persistent, cached) - Future Option 2: PostgreSQL (persistent, database) - See TODOs in checkpointer.py for implementation details

Application State

MCP Client Cache:

{
    "client": MultiServerMCPClient,
    "tools": [Tool1, Tool2, ...],
    "timestamp": datetime,
    "server_count": int
}

MCP Cache Invalidation: - Time-based (5 minute TTL) - Manual refresh available - Server status changes trigger refresh (via health check job)
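
A minimal sketch of that time-based invalidation (the 5-minute TTL and the cache shape come from this document; the function names are illustrative):

```python
from datetime import datetime, timedelta, timezone

_MCP_CACHE: dict = {}                 # {"client": ..., "tools": [...], "timestamp": ..., "server_count": ...}
_CACHE_TTL = timedelta(minutes=5)


def get_cached_mcp_client(rebuild):
    """Return the cached client and tools, rebuilding when missing or older than the TTL."""
    global _MCP_CACHE
    now = datetime.now(timezone.utc)
    if not _MCP_CACHE or now - _MCP_CACHE["timestamp"] > _CACHE_TTL:
        _MCP_CACHE = rebuild()        # must return the dict shape shown above
    return _MCP_CACHE["client"], _MCP_CACHE["tools"]


def invalidate_mcp_cache():
    """Manual refresh, also triggered when the health check job detects status changes."""
    global _MCP_CACHE
    _MCP_CACHE = {}
```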

Middleware Instantiation: - Middleware instances are NOT cached - Fresh instances created for each request from LangChain graph - Prevents state leaks between conversations - Configuration loaded from database each time - Changes to middleware configuration take effect immediately on next request

Security Architecture

Authentication & Authorization

User Authentication: - Nautobot's built-in authentication - LDAP/SAML support via Nautobot - Session management

API Authentication: - Token-based (Nautobot API tokens) - Per-request authentication - Token permissions enforced

Permissions:

  • LLM Provider: ai_ops.view_llmprovider, ai_ops.add_llmprovider, ai_ops.change_llmprovider, ai_ops.delete_llmprovider
  • LLM Model: ai_ops.view_llmmodel, ai_ops.add_llmmodel, ai_ops.change_llmmodel, ai_ops.delete_llmmodel
  • Middleware Type: ai_ops.view_middlewaretype, ai_ops.add_middlewaretype, ai_ops.change_middlewaretype, ai_ops.delete_middlewaretype
  • LLM Middleware: ai_ops.view_llmmiddleware, ai_ops.add_llmmiddleware, ai_ops.change_llmmiddleware, ai_ops.delete_llmmiddleware
  • MCP Server: ai_ops.view_mcpserver, ai_ops.add_mcpserver, ai_ops.change_mcpserver, ai_ops.delete_mcpserver
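
These follow Django's standard <app_label>.<action>_<model> convention, so they can be enforced with the usual permission API; a minimal example:

```python
from django.core.exceptions import PermissionDenied


def ensure_can_edit_mcp_servers(user):
    """Raise unless the user holds the relevant ai_ops permission."""
    if not user.has_perm("ai_ops.change_mcpserver"):
        raise PermissionDenied("ai_ops.change_mcpserver is required.")
```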

Secrets Management

API Keys: - Stored in Nautobot Secrets - Never in code or database directly - Provider-agnostic (environment, HashiCorp Vault, etc.)

Access Control: - Secrets retrieved at runtime - Minimal exposure - Audit trail via Nautobot
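
A hedged sketch of that runtime retrieval (Secret.get_value() is the accessor provided by Nautobot's Secrets framework; how the app links a model to its Secret is not shown here, and the lookup by name is an assumption):

```python
from nautobot.extras.models import Secret


def get_api_key(secret_name: str) -> str:
    """Resolve an API key at call time via the configured secrets provider (env, Vault, ...)."""
    secret = Secret.objects.get(name=secret_name)
    return secret.get_value()
```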

Data Security

In Transit: - HTTPS for all LLM providers (Ollama, OpenAI, Azure AI, Anthropic, etc.) - HTTPS for MCP servers (recommended) - TLS for Redis connections (optional)

At Rest: - PostgreSQL encryption (via deployment) - Redis encryption (via deployment) - Nautobot Secrets encryption

Network Security

Firewall Rules:

  • Outbound to LLM provider APIs (443)
      • Ollama: Configurable port (default 11434)
      • OpenAI: api.openai.com:443
      • Azure AI: *.openai.azure.com:443
      • Anthropic: api.anthropic.com:443
      • HuggingFace: huggingface.co:443
  • Outbound to MCP servers (various)
  • Inbound to Nautobot (80/443)

Internal Communication: - PostgreSQL: local or internal network - Redis: local or internal network - MCP servers: internal network (typically)

Scalability & Performance

Caching Strategy

MCP Client Cache: - Application-level - 5-minute TTL - Reduces initialization overhead - Thread-safe

Model Instances: - Reusable within agent lifecycle - Not cached globally (created per request) - Stateless between requests

Middleware Instances: - Always fresh per request (from LangChain graph) - Not cached to prevent state leaks - Minimal performance impact due to efficient instantiation

Async Architecture

Benefits: - Non-blocking I/O - Concurrent request handling - Better resource utilization

Implementation: - Django async views - Async agent processing - Async MCP client operations

Database Optimization

Indexes: - Primary keys (UUID) - Status field (MCPServer) - is_default field (LLMModel)

Queries: - Filtered by status for MCP servers - Default model query optimized - Minimal database round trips

Redis Optimization

Checkpoint Cleanup: - Scheduled job removes old data - Prevents unbounded growth - Configurable retention period

Key Structure: - Efficient key patterns - SCAN for safe iteration - Separate database number
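
For example, checkpoint keys can be walked incrementally with SCAN instead of a blocking KEYS call (redis-py shown; the key pattern and database number come from the Redis section above):

```python
import redis

# Checkpoints live in their own database (DB 2 by default, per the Redis section)
r = redis.Redis(host="localhost", port=6379, db=2)

# scan_iter pages through keys without blocking the server the way KEYS would
for key in r.scan_iter(match="checkpoint:*", count=500):
    if r.ttl(key) == -1:              # no expiry set: a candidate for the cleanup job
        print(key.decode(), "has no TTL")
```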

Monitoring & Observability

Logging

Log Levels: - INFO: Normal operations - WARNING: Degraded conditions - ERROR: Failures

Log Locations: - Nautobot logs directory - Application-specific logger: ai_ops

Key Events: - Agent message processing - MCP cache operations - Model configuration retrieval - Health check failures

Metrics

Application Metrics: - Request count - Response times - Error rates

System Metrics: - Redis memory usage - PostgreSQL connection pool - Azure OpenAI API usage

Health Checks

MCP Server Health: - Automatic health check requests - Status field updated - Failed servers excluded

System Health: - Redis connectivity - PostgreSQL connectivity - Azure OpenAI API accessibility

Error Handling

Graceful Degradation

No MCP Servers: - Agent continues without tools - Basic conversation capabilities maintained - Warning logged

Model Configuration Missing: - Error raised early - Clear error message - LAB fallback to environment variables

LLM Provider API Failures: - Errors propagated to user - Rate limit handling recommended - Retry logic (external implementation)

Error Recovery

Transient Failures: - MCP server temporarily unavailable → Excluded until healthy - Redis connection issue → Conversation history unavailable but agent works - LLM provider API timeout → User notified, can retry

Permanent Failures: - Invalid API key → Configuration error, admin must fix - Model not found → Create model or use default - Server consistently failing → Update status to Maintenance

Deployment Considerations

Environment Requirements

Minimum: - Python 3.10+ - PostgreSQL 12+ or MySQL 8+ - Redis 6+ - Nautobot 2.4.22+

Recommended: - Python 3.11+ - PostgreSQL 14+ - Redis 7+ - Dedicated Redis for checkpoints

Scaling

Horizontal Scaling: - Multiple Nautobot workers - Shared Redis for checkpoints - Shared PostgreSQL database - MCP client cache per worker

Vertical Scaling: - More CPU for LLM processing - More memory for Redis checkpoints - More connections to PostgreSQL

High Availability

Components: - Nautobot: Load balanced, multiple workers - PostgreSQL: Primary/replica setup - Redis: Redis Sentinel or Cluster - MCP Servers: Multiple instances with health checks

Failure Scenarios: - Single worker failure → Other workers handle requests - Redis failure → Conversation history lost, functionality continues - PostgreSQL failure → Application unavailable (required) - MCP server failure → Other servers continue, failed server excluded

Future Architecture Enhancements

Planned Improvements

  1. PostgreSQL Checkpointing: Replace Redis with PostgreSQL for persistence
  2. Conversation History UI: View and manage past conversations
  3. Model Performance Metrics: Track model usage and performance
  4. Advanced Caching: Redis caching for model responses
  5. Streaming Responses: Real-time streaming of AI responses
  6. Multi-Tenancy: Tenant-specific models and configurations
  7. Custom Agent Types: Support for specialized agent implementations
  8. Tool Usage Analytics: Track and visualize tool invocations

Integration Opportunities

  • ITSM Integration: ServiceNow, Jira ticket creation
  • Monitoring Systems: Integration with Prometheus, Grafana
  • ChatOps: Slack, Teams integration
  • Workflow Automation: Ansible, Terraform integration