Documents → Chunking → Embeddings → Vector Indexing → Semantic Search
mTLS | AES-256 | JWT | Rate Limiting | RBAC
Kubernetes | Docker | Multi-Region | Auto-Scaling
End-to-end pipeline with event-driven processing and asynchronous workflows
Frontend: Port 3000 | HTTPS REST
API Gateway: Port 8080 | gRPC/REST
Spring Boot Backend: Port 8081 | REST/Kafka
4 Services
S3 + PostgreSQL
Deployment & Monitoring
Performance
Security
Reliability
Frontend ↔ API Gateway
Protocol: HTTPS REST | Port: 3000→8080 | Rate: 1000 req/min | Auth: JWT Bearer Token
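The gateway's 1000 req/min limit can be sketched as a token bucket; the class name and parameters below are illustrative, not the gateway's actual implementation:

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: `capacity` tokens, refilled at `rate_per_sec`.
    At capacity=1000 and rate 1000/60, this enforces roughly 1000 req/min."""
    def __init__(self, capacity=1000, rate_per_sec=1000 / 60.0, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate_per_sec
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Requests denied here would surface to the frontend as HTTP 429 responses.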
API Gateway ↔ Authentication Service
Protocol: Internal gRPC | Port: 8080→8090 | Check: JWT validation, RBAC verification | Cache: 5min TTL
API Gateway ↔ Spring Boot Backend
Protocol: gRPC + REST | Port: 8080→8081 | Load Balancer: Round-robin | Timeout: 30s
Backend ↔ Document Processor
Protocol: Kafka Queue | Topic: document-processing | Partition: 10 | Batch: 32 chunks/sec | Storage: S3 + PostgreSQL metadata
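The Documents → Chunking step that feeds this queue can be sketched as fixed-size chunking with overlap; the 512/64 sizes below are assumed defaults, not values from the spec:

```python
def chunk_text(text, chunk_size=512, overlap=64):
    """Split a document into fixed-size character chunks with overlap,
    as the document processor might do before publishing each chunk
    to the document-processing topic."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, max(len(text), 1), step):
        chunks.append(text[start:start + chunk_size])
    return chunks
```

The overlap keeps sentences that straddle a chunk boundary retrievable from either side.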
Backend ↔ Embedding Service
Protocol: HTTP REST | Route: 8081 → Google API | Endpoint: /embeddings | Batch: 100 chunks | Dimension: 768D | Retry: 3x exponential
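The "Batch: 100 chunks" and "Retry: 3x exponential" behaviors can be sketched as two small helpers; the base delay and jitter are assumptions, not documented values:

```python
import random
import time

def batched(items, size=100):
    """Group chunks into batches of at most `size` per embedding request."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def with_retry(fn, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fn(); on failure, retry up to `attempts` total tries with
    exponential backoff (0.5s, 1s, 2s, ...) plus a little jitter."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries, surface the error
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

A real client would wrap the HTTP POST to /embeddings in `with_retry` and send each batch from `batched`.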
Backend ↔ Vector Database
Protocol: gRPC + REST | Route: 8081 → Pinecone | Operations: Upsert, Query, Delete | Similarity: Cosine | TopK: 5 | Namespace: user-ID based
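What "Similarity: Cosine | TopK: 5" computes can be shown with a brute-force sketch; a vector DB like Pinecone answers the same query with an approximate index rather than a linear scan:

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, index, k=5):
    """Return the IDs of the k vectors most similar to the query.
    `index` maps document ID -> vector (a stand-in for one namespace)."""
    scored = [(cosine(query, vec), doc_id) for doc_id, vec in index.items()]
    return [doc_id for score, doc_id in sorted(scored, reverse=True)[:k]]
```

Scoping `index` per user would mirror the user-ID-based namespacing above.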
Backend ↔ LLM Service
Protocol: HTTP REST | Route: 8081 → Google API | Streaming: Server-sent events | Model: gemini-2.5-flash | Timeout: 60s | Max tokens: 1024
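On the consuming side, the server-sent-events stream is a line protocol: `data:` lines accumulate into an event, a blank line terminates it. A minimal parser sketch (not tied to any particular SSE client library):

```python
def parse_sse(raw: str):
    """Collect the data payloads of each event in an SSE stream.
    Multiple `data:` lines in one event are joined with newlines,
    per the SSE wire format."""
    events, buf = [], []
    for line in raw.splitlines():
        if line.startswith("data:"):
            buf.append(line[5:].lstrip())
        elif line == "" and buf:
            events.append("\n".join(buf))  # blank line ends the event
            buf = []
    if buf:  # stream ended without a trailing blank line
        events.append("\n".join(buf))
    return events
```

For token streaming, each event would typically carry one model-generated text fragment.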
Document Processor ↔ Embedding Service
Protocol: Internal Service Bus | Event: chunk-ready | Payload: text + metadata | Ordering: Guaranteed per document (events keyed to a single partition)
Embedding Service ↔ Vector DB
Protocol: gRPC streaming | Batch upsert: 100 vectors/request | Index: Vector + metadata + namespace | Consistency: Eventual (3 replicas)
All Services ↔ Storage Layer
S3: Document storage + backup | PostgreSQL: Metadata + audit logs | Connections: Pooled (max 50) | Replication: 3x cross-region
All Services ↔ Observability & CI/CD Pipeline
Monitoring: Prometheus + Grafana | Logs: ELK Stack | Traces: Jaeger | Alerts: PagerDuty | Deployment: GitOps (ArgoCD)
📄 Document Upload Flow
🔍 Search & Query Flow
Request: multipart/form-data
Max Size: 100MB
Response: {documentId, status, chunks}
Auth: Bearer JWT
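The upload flow's pre-conditions (bearer token, multipart body, 100 MB cap) can be sketched as a validation step; the function name and error strings are illustrative:

```python
MAX_UPLOAD_BYTES = 100 * 1024 * 1024  # the 100MB limit above

def validate_upload(headers: dict, content_length: int):
    """Minimal checks a backend might run before accepting a document
    upload. Returns (ok, error_message)."""
    if not headers.get("Authorization", "").startswith("Bearer "):
        return False, "missing JWT bearer token"
    if not headers.get("Content-Type", "").startswith("multipart/form-data"):
        return False, "expected multipart/form-data"
    if content_length > MAX_UPLOAD_BYTES:
        return False, "payload exceeds 100MB limit"
    return True, None
```

Only after these checks pass would the service store the file, enqueue chunking, and return `{documentId, status, chunks}`.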
Request: {query, topK}
Response Time: p99 < 500ms
Streaming: SSE (Server-Sent Events)
Rate Limit: 1000 req/min
Response: Service health & metrics
Interval: 30s polling
Circuit Breaker: Enabled
Timeout: 5 seconds
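The circuit breaker over these health checks can be sketched as open-after-N-failures with a cool-down; the threshold of 3 is an assumption, only the 30s interval comes from the spec:

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: opens after `threshold` consecutive
    failures, lets a probe request through (half-open) once
    `reset_after` seconds have elapsed."""
    def __init__(self, threshold=3, reset_after=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed
        self.clock = clock

    def allow_request(self):
        if self.opened_at is None:
            return True
        # Half-open: permit a probe once the cool-down has elapsed.
        return self.clock() - self.opened_at >= self.reset_after

    def record(self, success: bool):
        if success:
            self.failures, self.opened_at = 0, None
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()
```

While open, callers would skip the 5-second health call entirely and treat the service as down.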
Trigger: Async embedding generation
Payload: {documentId, vectorCount}
Retry: Exponential backoff 3x
Signature: HMAC-SHA256
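The HMAC-SHA256 signature on this webhook can be sketched with the standard library; the secret and header handling are illustrative:

```python
import hashlib
import hmac
import json

def sign_payload(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 digest the sender attaches to the webhook body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()

def verify_webhook(secret: bytes, body: bytes, signature: str) -> bool:
    """Receiver-side check; compare_digest avoids timing side channels."""
    return hmac.compare_digest(sign_payload(secret, body), signature)
```

The receiver must verify against the raw request bytes, since re-serializing the JSON can reorder keys and change the digest.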
p50: 145ms
p95: 380ms
p99: 500ms
Max (SLO): 600ms
Peak Throughput: 1.2K TPS
Concurrent Users: 10K
Vector Dim: 768D
Indexed Vectors: 50M+
SLA: 99.99%
Replication: 3x across AZs
Backup: Daily | RPO: 1h
RTO: 15 minutes
• Data in Transit: TLS 1.3, HTTPS, mTLS
• Data at Rest: AES-256-GCM (AWS KMS)
• Key Management: AWS Secrets Manager
• OAuth 2.0 / OpenID Connect
• JWT with RS256 signature verification
• Role-Based Access Control (RBAC)
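A minimal sketch of the RBAC check, applied after JWT signature verification; the role names and permission strings below are illustrative, not the system's actual policy:

```python
# Illustrative role -> permission map; real roles come from the JWT claims.
ROLE_PERMISSIONS = {
    "admin":  {"document:read", "document:write", "document:delete"},
    "editor": {"document:read", "document:write"},
    "viewer": {"document:read"},
}

def is_authorized(claims: dict, permission: str) -> bool:
    """True if any role in the (already signature-verified) JWT claims
    grants the requested permission."""
    return any(permission in ROLE_PERMISSIONS.get(role, set())
               for role in claims.get("roles", []))
```

The gateway would run this per request, caching the verdict per the 5-minute TTL noted above.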
• Prometheus metrics collection
• ELK Stack (Elasticsearch, Logstash, Kibana)
• Distributed tracing (Jaeger)
• SOC 2 Type II compliant
• GDPR data residency requirements
• HIPAA audit logging enabled
ARCHITECTURE VERSION 1.0 | LAST UPDATED: DECEMBER 2025 | DISTRIBUTED UNDER ENTERPRISE LICENSE