Architecture Overview¶
This document provides a comprehensive overview of the AI Ops App architecture.
High-Level Architecture¶
The AI Ops App integrates multiple LLM providers with Nautobot through a multi-layered architecture with middleware support:
┌──────────────────────────────────────────────────────────────┐
│                        User Interface                        │
│             (Web UI / REST API / Chat Interface)             │
└────────────────┬─────────────────────────────────────────────┘
                 │
┌────────────────▼─────────────────────────────────────────────┐
│                    Nautobot Plugin Layer                     │
│   ┌──────────────┐    ┌──────────────┐    ┌──────────────┐   │
│   │    Views     │    │     API      │    │     Jobs     │   │
│   └──────┬───────┘    └──────┬───────┘    └──────┬───────┘   │
└──────────┼───────────────────┼───────────────────┼───────────┘
           │                   │                   │
┌──────────▼───────────────────▼───────────────────▼───────────┐
│                      Application Layer                       │
│   ┌──────────────┐    ┌──────────────┐    ┌──────────────┐   │
│   │  AI Agents   │    │    Models    │    │   Helpers    │   │
│   │ (LangGraph)  │    │(LLMProvider, │    │  (get_llm_   │   │
│   │ + Middleware │    │  LLMModel,   │    │    model,    │   │
│   │              │    │ Middleware,  │    │ middleware)  │   │
│   │              │    │  MCPServer)  │    │              │   │
│   └──────┬───────┘    └──────┬───────┘    └──────┬───────┘   │
└──────────┼───────────────────┼───────────────────┼───────────┘
           │                   │                   │
┌──────────▼───────────────────▼───────────────────▼───────────┐
│                      Integration Layer                       │
│   ┌──────────────┐    ┌──────────────┐    ┌──────────────┐   │
│   │  LangChain   │    │    Redis     │    │  PostgreSQL  │   │
│   │    (MCP)     │    │(Checkpoints, │    │   (Models)   │   │
│   │              │    │  Middleware  │    │              │   │
│   │              │    │    Cache)    │    │              │   │
│   └──────┬───────┘    └──────────────┘    └──────────────┘   │
└──────────┼───────────────────────────────────────────────────┘
           │
┌──────────▼───────────────────────────────────────────────────┐
│                      External Services                       │
│     ┌─────────────────────────────────────────────────┐      │
│     │     LLM Providers (Multi-Provider Support)      │      │
│     │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │      │
│     │ │ Ollama  │ │ OpenAI  │ │Azure AI │ │Anthropic│ │      │
│     │ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │      │
│     │ ┌─────────┐ ┌─────────┐                         │      │
│     │ │HuggingF │ │ Custom  │                         │      │
│     │ └─────────┘ └─────────┘                         │      │
│     └─────────────────────────────────────────────────┘      │
│     ┌──────────────┐                                         │
│     │ MCP Servers  │                                         │
│     │ (Tools/Ctx)  │                                         │
│     └──────────────┘                                         │
└──────────────────────────────────────────────────────────────┘
Middleware Architecture¶
The app supports a flexible middleware system that processes requests before and after they reach the LLM:
User Request
     ↓
┌───────────────────────────────────────┐
│           Middleware Chain            │
│  (Executed in Priority Order 1-100)   │
├───────────────────────────────────────┤
│ Priority 10: LoggingMiddleware        │ ← Log request
│ Priority 20: CacheMiddleware          │ ← Check cache
│ Priority 30: RetryMiddleware          │ ← Retry logic
│ Priority 40: ValidationMiddleware     │ ← Validate input
├───────────────────────────────────────┤
│               LLM Model               │ ← Process request
│       (Ollama/OpenAI/Azure/etc)       │
├───────────────────────────────────────┤
│ Priority 40: ValidationMiddleware     │ ← Validate output
│ Priority 30: RetryMiddleware          │ ← (if needed)
│ Priority 20: CacheMiddleware          │ ← Store in cache
│ Priority 10: LoggingMiddleware        │ ← Log response
└───────────────────────────────────────┘
     ↓
Response to User
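Conceptually, the chain wraps the model call symmetrically: pre-processing runs in ascending priority order, post-processing in descending order. A minimal sketch of that dispatch logic (class and function names here are illustrative, not the app's actual API):

```python
# Illustrative sketch of priority-ordered middleware dispatch;
# names are hypothetical, not the app's actual API.
from dataclasses import dataclass
from typing import Callable


@dataclass
class Middleware:
    priority: int  # 1-100; lower priorities run first on the way in
    pre: Callable[[str], str]
    post: Callable[[str], str]


def run_chain(chain: list[Middleware], llm_call: Callable[[str], str], request: str) -> str:
    ordered = sorted(chain, key=lambda m: m.priority)
    for mw in ordered:                # pre-processing: priority 1 -> 100
        request = mw.pre(request)
    response = llm_call(request)      # the LLM model processes the request
    for mw in reversed(ordered):      # post-processing: priority 100 -> 1
        response = mw.post(response)
    return response
```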
Component Architecture¶
1. User Interface Layer¶
Web UI¶
- Chat Interface: /plugins/ai-ops/chat/ (interactive chat widget)
- Provider Management: List, create, edit, delete LLM providers
- Model Management: List, create, edit, delete LLM models
- Middleware Management: Configure middleware for models
- Server Management: Configure and monitor MCP servers
- Navigation: Integrated into Nautobot's navigation menu
REST API¶
- LLM Providers API: /api/plugins/ai-ops/llm-providers/
- LLM Models API: /api/plugins/ai-ops/llm-models/
- Middleware Types API: /api/plugins/ai-ops/middleware-types/
- LLM Middleware API: /api/plugins/ai-ops/llm-middleware/
- MCP Servers API: /api/plugins/ai-ops/mcp-servers/
- Chat API: /plugins/ai-ops/api/chat/ (programmatic chat access)
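For example, a client can list the configured models with a Nautobot API token (the host, token, and `name` field below are placeholders/assumptions for illustration):

```python
# List LLM models through the REST API; host and token are placeholders.
import requests

response = requests.get(
    "https://nautobot.example.com/api/plugins/ai-ops/llm-models/",
    headers={"Authorization": "Token 0123456789abcdef0123456789abcdef"},
    timeout=10,
)
response.raise_for_status()
for model in response.json()["results"]:  # standard Nautobot paginated payload
    print(model["name"])
```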
2. Application Layer¶
AI Agents¶
Multi-MCP Agent (ai_ops/agents/multi_mcp_agent.py):
- Production-ready agent implementation
- Supports multiple MCP servers simultaneously
- Application-level caching for performance
- Health-based server selection
- LangGraph state management
- Middleware integration
Single-MCP Agent (ai_ops/agents/single_mcp_agent.py):
- Simplified single-server implementation
- Development and testing scenarios
Agent Features:
- Conversation history via checkpointing
- Tool discovery from MCP servers
- Multi-provider LLM support (Ollama, OpenAI, Azure AI, Anthropic, HuggingFace)
- Middleware chain execution
- Async/await architecture
Models¶
LLMProvider:
- Defines available LLM providers (Ollama, OpenAI, Azure AI, Anthropic, HuggingFace, Custom)
- Stores provider-specific configuration in JSON schema
- Has corresponding provider handler classes
- Enable/disable providers without deletion

LLMModel:
- Stores model configurations for any supported provider
- Environment-aware (LAB/NONPROD/PROD)
- Integrates with Nautobot Secrets
- Supports default model selection
- Can have multiple middleware configurations
- References LLMProvider via foreign key

MiddlewareType:
- Defines middleware types (built-in LangChain or custom)
- Reusable across multiple models
- Name validation and formatting

LLMMiddleware:
- Configures middleware instances for specific models
- Priority-based execution order (1-100)
- JSON configuration for flexibility
- Active/inactive toggle
- Critical flag for initialization requirements

MCPServer:
- Stores MCP server configurations
- Health status tracking with automated checks
- Protocol support (HTTP/STDIO)
- Type classification (Internal/External)
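The priority ordering described above maps to a simple ORM query; a sketch, under the assumption that the foreign key from LLMMiddleware to LLMModel is named llm_model:

```python
# Fetch a model's active middleware in execution order (sketch; the
# llm_model FK name is an assumption based on the descriptions above).
from ai_ops.models import LLMMiddleware


def get_middleware_chain(model):
    return LLMMiddleware.objects.filter(
        llm_model=model,
        is_active=True,
    ).order_by("priority", "middleware__name")
```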
Helpers¶
get_llm_model:
- Environment detection
- Model configuration retrieval
- Azure OpenAI client creation
- Sync and async variants

get_info:
- Status retrieval utilities
- Default value providers

Serializers:
- LangGraph checkpoint serialization
- Custom data type handling
3. Integration Layer¶
LangChain & LangGraph¶
LangChain:
- Azure OpenAI integration
- Message handling
- Tool abstraction

LangGraph:
- State graph workflow
- Checkpointing system
- Conditional routing
- Tool node execution
MCP Integration:
- langchain-mcp-adapters: MCP client library
- MultiServerMCPClient: Multi-server support
- Tool discovery and execution
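A minimal multi-server setup with langchain-mcp-adapters might look like the following (the server name and URL are placeholders, and transport keys can vary across library versions):

```python
# Connect to multiple MCP servers and discover their tools (sketch;
# server name and URL are placeholders).
import asyncio

from langchain_mcp_adapters.client import MultiServerMCPClient


async def main():
    client = MultiServerMCPClient(
        {
            "nautobot-tools": {
                "url": "http://mcp-internal:8000/mcp",
                "transport": "streamable_http",
            },
        }
    )
    tools = await client.get_tools()  # aggregates tools from every server
    print([tool.name for tool in tools])


asyncio.run(main())
```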
Redis¶
Purpose: Conversation checkpoint storage
Configuration:
- Database: Separate from cache/Celery (default DB 2)
- Key Pattern: checkpoint:{thread_id}:{checkpoint_id}
- TTL: Managed by cleanup job
Data Stored:
- Conversation messages
- Agent state
- Metadata (timestamps, user info)
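Given that key pattern, lookups reduce to string formatting plus a SCAN; an illustrative helper (not the app's actual code):

```python
# Build and iterate checkpoint keys (illustrative, not the app's code).
import redis

r = redis.Redis(host="localhost", port=6379, db=2)  # default: DB 2


def checkpoint_key(thread_id: str, checkpoint_id: str) -> str:
    return f"checkpoint:{thread_id}:{checkpoint_id}"


# SCAN (not KEYS) keeps iteration safe on a busy instance
for key in r.scan_iter(match="checkpoint:session-abc123:*"):
    print(key)
```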
PostgreSQL¶
Purpose: Application data storage
Tables:
- ai_ops_llmmodel: LLM configurations
- ai_ops_mcpserver: MCP server configurations
- Plus standard Nautobot tables (secrets, statuses, etc.)
4. External Services¶
LLM Providers¶
LLM Provider Support:
- Ollama: Local open-source models
- OpenAI: GPT-4, GPT-4o, GPT-3.5-turbo
- Azure AI: Azure OpenAI Service
- Anthropic: Claude models
- HuggingFace: HuggingFace Hub models
- Custom: Extensible provider system

Communication:
- HTTPS REST API
- API key authentication (via Secrets)
- Provider-specific endpoints
- Handler-based initialization
MCP Servers¶
Purpose: Extend agent capabilities with custom tools
Types:
- Internal: Hosted within infrastructure
- External: Third-party services

Protocols:
- HTTP: RESTful MCP servers
- STDIO: Process-based servers

Health Monitoring:
- Automatic health checks via scheduled job
- Status field in database
- Failed servers excluded from operations
- Parallel health checking with retry logic
Data Flow¶
Chat Message Flow with Middleware¶
1. User submits message
↓
2. ChatMessageView receives request
↓
3. Session ID retrieved (thread_id)
↓
4. process_message() called
↓
5. MCP client cache checked/created
↓
6. LLM model configuration retrieved
↓
7. Middleware chain initialized fresh (priority order)
↓
8. LangGraph state graph created with middleware
↓
9. Message added to state
↓
10. Middleware pre-processing (Priority 1 → 100)
↓
11. Agent processes message
↓
12. LLM provider handler creates model instance
↓
13. Model processes request
↓
14. Tools invoked if needed (via MCP)
↓
15. Middleware post-processing (Priority 100 → 1)
↓
16. Response generated by LLM
↓
17. State persisted to Redis
↓
18. Response returned to user
Provider Selection Flow¶
1. Get LLM model (by name or default)
↓
2. Load model's provider relationship
↓
3. Get provider handler from registry
↓
4. Retrieve provider config_schema from database
↓
5. Get model's API key from Secret
↓
6. Provider handler initializes LLM
│
├─ Ollama: ChatOllama(base_url, model_name)
├─ OpenAI: ChatOpenAI(api_key, model_name)
├─ Azure AI: AzureChatOpenAI(api_key, endpoint, deployment)
├─ Anthropic: ChatAnthropic(api_key, model_name)
├─ HuggingFace: ChatHuggingFace(api_key, model_name)
└─ Custom: CustomHandler(config, api_key)
↓
7. Return initialized chat model instance
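The per-provider branch in step 6 suggests a small handler registry; a condensed sketch using the LangChain integration packages named above (the registry shape and config keys are illustrative):

```python
# Dispatch chat-model creation by provider name (sketch; registry
# shape and config keys are illustrative assumptions).
from langchain_anthropic import ChatAnthropic
from langchain_ollama import ChatOllama
from langchain_openai import AzureChatOpenAI, ChatOpenAI

HANDLERS = {
    "ollama": lambda cfg, key: ChatOllama(
        base_url=cfg["base_url"], model=cfg["model_name"]
    ),
    "openai": lambda cfg, key: ChatOpenAI(api_key=key, model=cfg["model_name"]),
    "azure": lambda cfg, key: AzureChatOpenAI(
        api_key=key,
        azure_endpoint=cfg["endpoint"],
        azure_deployment=cfg["deployment"],
        api_version=cfg.get("api_version", "2024-02-01"),
    ),
    "anthropic": lambda cfg, key: ChatAnthropic(
        api_key=key, model=cfg["model_name"]
    ),
}


def init_chat_model(provider: str, config: dict, api_key: str):
    return HANDLERS[provider](config, api_key)
```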
Middleware Execution Flow¶
┌────────────────────────────────────────────┐
│             Request from Agent             │
└───────────────┬────────────────────────────┘
                ↓
┌───────────────▼────────────────────────────┐
│  Load Model's Middleware Configurations    │
│   - Query LLMMiddleware.objects            │
│   - Filter: is_active=True                 │
│   - Order by: priority, middleware__name   │
└───────────────┬────────────────────────────┘
                ↓
┌───────────────▼────────────────────────────┐
│  Initialize Middleware Chain               │
│  For each middleware (priority order):     │
│   1. Load middleware type                  │
│   2. Get configuration JSON                │
│   3. Initialize middleware instance       │
│   4. Add to chain                          │
└───────────────┬────────────────────────────┘
                ↓
┌───────────────▼────────────────────────────┐
│  Pre-Processing Phase                      │
│  (Priority 1 → 100)                        │
│                                            │
│  Priority 10: LoggingMiddleware            │
│   - Log request timestamp                  │
│   - Log user info and message              │
│                                            │
│  Priority 20: CacheMiddleware              │
│   - Check if response cached               │
│   - If cached: return cached response      │
│   - If not: continue chain                 │
│                                            │
│  Priority 40: ValidationMiddleware         │
│   - Validate input format                  │
│   - Check for malicious content            │
│   - Sanitize input if needed               │
└───────────────┬────────────────────────────┘
                ↓
┌───────────────▼────────────────────────────┐
│  LLM Processing                            │
│   - Model generates response               │
│   - Tools invoked if needed                │
└───────────────┬────────────────────────────┘
                ↓
┌───────────────▼────────────────────────────┐
│  Post-Processing Phase                     │
│  (Priority 100 → 1)                        │
│                                            │
│  Priority 40: ValidationMiddleware         │
│   - Validate output format                 │
│   - Check for sensitive data               │
│   - Filter response if needed              │
│                                            │
│  Priority 20: CacheMiddleware              │
│   - Store response in cache                │
│   - Set cache TTL from config              │
│                                            │
│  Priority 10: LoggingMiddleware            │
│   - Log response timestamp                 │
│   - Log token usage and latency            │
└───────────────┬────────────────────────────┘
                ↓
┌───────────────▼────────────────────────────┐
│              Response to User              │
└────────────────────────────────────────────┘
Model Configuration Flow¶
LAB environment:
get_llm_model()
↓
Detect environment → "LAB"
↓
Read environment variables
↓
Create model via default provider (typically Ollama)
↓
Return model

PROD environment:
get_llm_model()
↓
Detect environment → "PROD"
↓
Query LLMModel.get_default_model()
↓
Load model's provider relationship
↓
Get provider handler from registry
↓
Retrieve Secret for API key
↓
Build configuration dict from provider config_schema
↓
Provider handler creates model instance
↓
Return model

MCP Client Initialization Flow¶
1. App startup or cache expiry
↓
2. Query MCPServer.objects.filter(status__name="Active")
↓
3. Build connections dict
↓
4. Create MultiServerMCPClient
↓
5. Discover tools from each server
↓
6. Cache client and tools (5 min TTL)
↓
7. Tools available to agent

Middleware Loading Flow¶
1. Agent needs to process request
↓
2. Query LLMMiddleware for model
   a. Filter: is_active=True
   b. Order by: priority, middleware__name
↓
3. Initialize fresh middleware instances
   a. Load each middleware type
   b. Apply configuration from database
   c. Create new instance (NOT cached)
↓
4. Apply middleware chain to request

Note: Middleware instances are always fresh to prevent state leaks between conversations. This ensures stateful middleware (e.g., SummarizationMiddleware) don't share state across different users.
MCP Server Health Check Flow¶
1. MCPServerHealthCheckJob triggered (scheduled)
↓
2. Query all HTTP MCP servers (exclude STDIO, Vulnerable)
↓
3. Parallel execution (ThreadPoolExecutor, max 4 workers)
   For each server:
   a. Send GET to {url}{health_check}
   b. If status differs from database:
      - Wait 5 seconds
      - Verify (check #1)
      - Wait 5 seconds
      - Verify (check #2)
      - If both confirm: update database
↓
4. If any status changed:
   a. Clear MCP client cache
   b. Log cache invalidation
↓
5. Return summary:
   - checked_count
   - changed_count
   - failed_count
   - cache_cleared
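A condensed sketch of that parallel loop, assuming the requests library and a hypothetical update_status() helper (the job's real code will differ):

```python
# Parallel MCP health checking with double verification (sketch;
# update_status() and http_servers are hypothetical stand-ins).
import time
from concurrent.futures import ThreadPoolExecutor

import requests


def probe(server) -> bool:
    try:
        return requests.get(f"{server.url}{server.health_check}", timeout=5).ok
    except requests.RequestException:
        return False


def check_server(server):
    healthy = probe(server)
    if healthy != (server.status == "Active"):
        time.sleep(5)
        first = probe(server)   # verification check #1
        time.sleep(5)
        second = probe(server)  # verification check #2
        if first == second == healthy:
            update_status(server, healthy)  # hypothetical DB update helper


with ThreadPoolExecutor(max_workers=4) as pool:
    pool.map(check_server, http_servers)  # http_servers: queryset from step 2
```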
State Management¶
Conversation State¶
Storage: MemorySaver (in-memory) for short-term session storage
State Structure:
```python
{
    "messages": [
        HumanMessage(content="User question"),
        AIMessage(content="AI response"),
        ...
    ]
}
```
Thread Isolation:
- Each session has unique thread_id (Django session key)
- Sessions don't interfere
- Parallel conversations supported
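In LangGraph terms, isolation comes from passing the session key as thread_id in the run config; a self-contained sketch with a stand-in node:

```python
# Minimal sketch of per-thread isolation; the echo node stands in
# for the real agent logic.
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import END, START, MessagesState, StateGraph


def echo(state: MessagesState):
    return {"messages": [("ai", "echo: " + state["messages"][-1].content)]}


builder = StateGraph(MessagesState)
builder.add_node("echo", echo)
builder.add_edge(START, "echo")
builder.add_edge("echo", END)
graph = builder.compile(checkpointer=MemorySaver())

# Two thread_ids (e.g., two Django session keys) keep state separate
for thread in ("session-a", "session-b"):
    config = {"configurable": {"thread_id": thread}}
    graph.invoke({"messages": [("user", "hello")]}, config=config)
```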
Persistence & TTL:
- Backend (MemorySaver): In-memory storage with timestamp tracking
  - Checkpoints tracked with creation timestamps
  - Automatic cleanup every 5 minutes via scheduled job
  - Expires after configured TTL (default: 5 minutes) + 30s grace period
  - Lost on application restart
- Frontend (localStorage): Browser-based message display
  - Messages filtered by age on page load
  - Synced with backend TTL configuration
  - Cleared automatically when expired
  - Inactivity timer for auto-clearing (matches TTL config)
TTL Configuration:
# In nautobot_config.py
PLUGINS_CONFIG = {
    "ai_ops": {
        "chat_session_ttl_minutes": 5,  # Default: 5 minutes
    }
}
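At runtime the app can read this setting through Django settings, the standard pattern for Nautobot plugin configuration:

```python
# Read the configured TTL, falling back to the 5-minute default.
from django.conf import settings

ttl_minutes = settings.PLUGINS_CONFIG["ai_ops"].get("chat_session_ttl_minutes", 5)
```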
Cleanup Process:
1. Frontend TTL Check (on page load):
   - Filters messages older than TTL + grace period
   - Shows expiry message if conversation expired
   - Calls backend clear API if all messages expired
2. Backend Scheduled Cleanup (every 5 minutes):
   - Scans all MemorySaver checkpoints
   - Removes checkpoints older than TTL + grace period
   - Logs processed and deleted counts
3. Inactivity Timer (frontend):
   - Resets on any user activity
   - Triggers after TTL minutes of no interaction
   - Clears both frontend and backend state
Migration Path:
- Current: MemorySaver (session-based, in-memory)
- Future Option 1: Redis Stack with RediSearch (persistent, cached)
- Future Option 2: PostgreSQL (persistent, database)
- See TODOs in checkpointer.py for implementation details
Application State¶
MCP Client Cache:
{
    "client": MultiServerMCPClient,
    "tools": [Tool1, Tool2, ...],
    "timestamp": datetime,
    "server_count": int
}
MCP Cache Invalidation:
- Time-based (5-minute TTL)
- Manual refresh available
- Server status changes trigger refresh (via health check job)
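The time-based check is a simple timestamp comparison; an illustrative sketch (names are placeholders):

```python
# Time-based invalidation for the MCP client cache (illustrative names).
from datetime import datetime, timedelta

MCP_CACHE_TTL = timedelta(minutes=5)
_mcp_cache = None  # {"client": ..., "tools": [...], "timestamp": ..., "server_count": ...}


def cache_is_fresh() -> bool:
    return (
        _mcp_cache is not None
        and datetime.now() - _mcp_cache["timestamp"] < MCP_CACHE_TTL
    )
```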
Middleware Instantiation:
- Middleware instances are NOT cached
- Fresh instances created for each request from the LangChain graph
- Prevents state leaks between conversations
- Configuration loaded from database each time
- Changes to middleware configuration take effect immediately on the next request
Security Architecture¶
Authentication & Authorization¶
User Authentication:
- Nautobot's built-in authentication
- LDAP/SAML support via Nautobot
- Session management

API Authentication:
- Token-based (Nautobot API tokens)
- Per-request authentication
- Token permissions enforced
Permissions:
- ai_ops.view_llmprovider
- ai_ops.add_llmprovider
- ai_ops.change_llmprovider
- ai_ops.delete_llmprovider
- ai_ops.view_llmmodel
- ai_ops.add_llmmodel
- ai_ops.change_llmmodel
- ai_ops.delete_llmmodel
- ai_ops.view_middlewaretype
- ai_ops.add_middlewaretype
- ai_ops.change_middlewaretype
- ai_ops.delete_middlewaretype
- ai_ops.view_llmmiddleware
- ai_ops.add_llmmiddleware
- ai_ops.change_llmmiddleware
- ai_ops.delete_llmmiddleware
- ai_ops.view_mcpserver
- ai_ops.add_mcpserver
- ai_ops.change_mcpserver
- ai_ops.delete_mcpserver
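These follow Django's standard app_label.action_model convention, so a view-level check looks like:

```python
# Standard Django permission check inside a view (request in scope).
if request.user.has_perm("ai_ops.change_llmmodel"):
    ...  # allow editing the model configuration
```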
Secrets Management¶
API Keys:
- Stored in Nautobot Secrets
- Never in code or the database directly
- Provider-agnostic (environment variables, HashiCorp Vault, etc.)

Access Control:
- Secrets retrieved at runtime
- Minimal exposure
- Audit trail via Nautobot
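Runtime retrieval goes through the Nautobot Secrets API; a sketch (the secret name is a placeholder):

```python
# Resolve an API key through Nautobot Secrets at request time
# (secret name is a placeholder).
from nautobot.extras.models import Secret

secret = Secret.objects.get(name="openai-api-key")
api_key = secret.get_value()  # resolved by the configured secrets provider
```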
Data Security¶
In Transit:
- HTTPS for all LLM providers (Ollama, OpenAI, Azure AI, Anthropic, etc.)
- HTTPS for MCP servers (recommended)
- TLS for Redis connections (optional)

At Rest:
- PostgreSQL encryption (via deployment)
- Redis encryption (via deployment)
- Nautobot Secrets encryption
Network Security¶
Firewall Rules:
- Outbound to LLM provider APIs (443)
  - Ollama: Configurable port (default 11434)
  - OpenAI: api.openai.com:443
  - Azure AI: *.openai.azure.com:443
  - Anthropic: api.anthropic.com:443
  - HuggingFace: huggingface.co:443
- Outbound to MCP servers (various)
- Inbound to Nautobot (80/443)

Internal Communication:
- PostgreSQL: local or internal network
- Redis: local or internal network
- MCP servers: internal network (typically)
Scalability & Performance¶
Caching Strategy¶
MCP Client Cache:
- Application-level
- 5-minute TTL
- Reduces initialization overhead
- Thread-safe

Model Instances:
- Reusable within agent lifecycle
- Not cached globally (created per request)
- Stateless between requests

Middleware Instances:
- Always fresh per request (from the LangChain graph)
- Not cached, to prevent state leaks
- Minimal performance impact due to efficient instantiation
Async Architecture¶
Benefits:
- Non-blocking I/O
- Concurrent request handling
- Better resource utilization

Implementation:
- Django async views
- Async agent processing
- Async MCP client operations
Database Optimization¶
Indexes:
- Primary keys (UUID)
- Status field (MCPServer)
- is_default field (LLMModel)

Queries:
- Filtered by status for MCP servers
- Default model query optimized
- Minimal database round trips
Redis Optimization¶
Checkpoint Cleanup:
- Scheduled job removes old data
- Prevents unbounded growth
- Configurable retention period

Key Structure:
- Efficient key patterns
- SCAN for safe iteration
- Separate database number
Monitoring & Observability¶
Logging¶
Log Levels:
- INFO: Normal operations
- WARNING: Degraded conditions
- ERROR: Failures
Log Locations:
- Nautobot logs directory
- Application-specific logger: ai_ops
Key Events:
- Agent message processing
- MCP cache operations
- Model configuration retrieval
- Health check failures
Metrics¶
Application Metrics:
- Request count
- Response times
- Error rates

System Metrics:
- Redis memory usage
- PostgreSQL connection pool
- LLM provider API usage
Health Checks¶
MCP Server Health:
- Automatic health check requests
- Status field updated
- Failed servers excluded

System Health:
- Redis connectivity
- PostgreSQL connectivity
- LLM provider API accessibility
Error Handling¶
Graceful Degradation¶
No MCP Servers:
- Agent continues without tools
- Basic conversation capabilities maintained
- Warning logged

Model Configuration Missing:
- Error raised early
- Clear error message
- LAB fallback to environment variables

LLM API Failures:
- Errors propagated to user
- Rate limit handling recommended
- Retry logic (external implementation)
Error Recovery¶
Transient Failures:
- MCP server temporarily unavailable → Excluded until healthy
- Redis connection issue → Conversation history unavailable, but the agent still works
- LLM API timeout → User notified, can retry

Permanent Failures:
- Invalid API key → Configuration error, admin must fix
- Model not found → Create model or use default
- Server consistently failing → Update status to Maintenance
Deployment Considerations¶
Environment Requirements¶
Minimum:
- Python 3.10+
- PostgreSQL 12+ or MySQL 8+
- Redis 6+
- Nautobot 2.4.22+

Recommended:
- Python 3.11+
- PostgreSQL 14+
- Redis 7+
- Dedicated Redis database for checkpoints
Scaling¶
Horizontal Scaling:
- Multiple Nautobot workers
- Shared Redis for checkpoints
- Shared PostgreSQL database
- MCP client cache per worker

Vertical Scaling:
- More CPU for LLM processing
- More memory for Redis checkpoints
- More connections to PostgreSQL
High Availability¶
Components:
- Nautobot: Load balanced, multiple workers
- PostgreSQL: Primary/replica setup
- Redis: Redis Sentinel or Cluster
- MCP Servers: Multiple instances with health checks

Failure Scenarios:
- Single worker failure → Other workers handle requests
- Redis failure → Conversation history lost, functionality continues
- PostgreSQL failure → Application unavailable (required dependency)
- MCP server failure → Other servers continue, failed server excluded
Future Architecture Enhancements¶
Planned Improvements¶
- PostgreSQL Checkpointing: Replace Redis with PostgreSQL for persistence
- Conversation History UI: View and manage past conversations
- Model Performance Metrics: Track model usage and performance
- Advanced Caching: Redis caching for model responses
- Streaming Responses: Real-time streaming of AI responses
- Multi-Tenancy: Tenant-specific models and configurations
- Custom Agent Types: Support for specialized agent implementations
- Tool Usage Analytics: Track and visualize tool invocations
Integration Opportunities¶
- ITSM Integration: ServiceNow, Jira ticket creation
- Monitoring Systems: Integration with Prometheus, Grafana
- ChatOps: Slack, Teams integration
- Workflow Automation: Ansible, Terraform integration
Related Documentation¶
- Models - Database models
- Agents - AI agent implementations
- Helpers - Helper modules
- Jobs - Background jobs
- External Interactions - External system integrations