A unified data intelligence layer — from source extraction and intelligent archiving to AI-powered querying, governance enforcement, and point-in-time restore — all within your sovereign perimeter.
Meridian is the brain of the platform — a FastAPI backend with a React control plane that manages every Stratum node, archive job, user, and governance policy across your organisation. All metadata lives here; all data stays on your Stratum servers.
Stratum agents maintain a persistent WebSocket connection to the gateway pool. Job dispatch latency drops from ~2 seconds (polling) to ~10 milliseconds. Redis Pub/Sub decouples the backend from gateway instances, enabling horizontal scaling.
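The decoupling described above can be sketched in a few lines. This is an illustration only: an in-process `Broker` class stands in for Redis Pub/Sub, plain lists stand in for WebSocket connections, and every name here (`Broker`, `Gateway`, the `jobs` channel, `stratum-7`) is hypothetical rather than taken from the product.

```python
import asyncio
from collections import defaultdict


class Broker:
    """In-process stand-in for Redis Pub/Sub (assumption, not the real client)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, channel):
        queue = asyncio.Queue()
        self._subscribers[channel].append(queue)
        return queue

    async def publish(self, channel, message):
        # Every subscriber gets its own copy, as with Pub/Sub fan-out.
        for queue in self._subscribers[channel]:
            await queue.put(message)
        return len(self._subscribers[channel])


class Gateway:
    """Holds agent connections; a list stands in for each WebSocket."""

    def __init__(self, name, broker, channel="jobs"):
        self.name = name
        self.agents = {}
        self._inbox = broker.subscribe(channel)

    def connect(self, agent_id):
        self.agents[agent_id] = []
        return self.agents[agent_id]

    async def pump_once(self):
        # Forward the job only if the target agent is connected to this gateway.
        job = await self._inbox.get()
        socket = self.agents.get(job["agent_id"])
        if socket is None:
            return False
        socket.append(job)
        return True


async def demo():
    broker = Broker()
    gw_a, gw_b = Gateway("gw-a", broker), Gateway("gw-b", broker)
    socket = gw_b.connect("stratum-7")
    # The backend publishes without knowing which gateway holds the agent.
    await broker.publish("jobs", {"agent_id": "stratum-7", "job": "archive-42"})
    delivered = [await gw_a.pump_once(), await gw_b.pump_once()]
    return delivered, socket


result = asyncio.run(demo())
```

Because the backend only publishes to a channel, gateway instances can be added or removed without the backend tracking where each agent is connected.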
Full parent/child organisation hierarchy. Each org gets its own user space, Stratum registry, storage backends, archive jobs, and governance policies. System admins manage the platform; org admins manage their own estate — with complete isolation enforced at the database level.
Define reusable SLA templates specifying schedule, retention policy, and health thresholds. Assign templates to archive jobs — changes propagate automatically. Built-in scheduler supports cron expressions, daily/weekly/monthly cycles, and on-demand triggers.
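One way to get automatic propagation is for each job to hold a reference to its template rather than a copy. The sketch below assumes that design; the class names, field names, and the `max_failed_runs` health threshold are hypothetical and not taken from the product.

```python
from dataclasses import dataclass


@dataclass
class SlaTemplate:
    name: str
    cron: str              # schedule as a cron expression
    retention_days: int    # retention policy
    max_failed_runs: int   # health threshold (hypothetical metric)


@dataclass
class ArchiveJob:
    name: str
    template_id: str       # a reference to the template, not a copy of it


class SlaRegistry:
    def __init__(self):
        self._templates = {}

    def put(self, template_id, template):
        self._templates[template_id] = template

    def resolve(self, job):
        # Looked up at use time, so template edits reach every assigned job.
        return self._templates[job.template_id]


registry = SlaRegistry()
registry.put("gold", SlaTemplate("Gold", "0 2 * * *", 365, 1))
jobs = [ArchiveJob("orders", "gold"), ArchiveJob("invoices", "gold")]

# Changing the template once changes the effective policy of both jobs.
registry.put("gold", SlaTemplate("Gold", "0 2 * * *", 730, 1))
retention = [registry.resolve(job).retention_days for job in jobs]
```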
Sovereign DataVault tracks the schema of every archived table. When the source schema changes, drift detection surfaces additions, removals, and type changes — giving operators a preview before the next archive run or restore operation touches the target.
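As a rough illustration, the three kinds of drift named above can be found by comparing two column-to-type mappings. This is a minimal sketch under that assumption; `SchemaDrift` and `detect_drift` are hypothetical names, and the product's actual schema representation is not described here.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaDrift:
    added: dict = field(default_factory=dict)    # column -> new type
    removed: dict = field(default_factory=dict)  # column -> old type
    retyped: dict = field(default_factory=dict)  # column -> (old type, new type)

    @property
    def has_drift(self):
        return bool(self.added or self.removed or self.retyped)


def detect_drift(snapshot, current):
    """Compare an archived schema snapshot with the current source schema."""
    drift = SchemaDrift()
    for column, col_type in current.items():
        if column not in snapshot:
            drift.added[column] = col_type
        elif snapshot[column] != col_type:
            drift.retyped[column] = (snapshot[column], col_type)
    for column, col_type in snapshot.items():
        if column not in current:
            drift.removed[column] = col_type
    return drift


snapshot = {"id": "BIGINT", "email": "VARCHAR(255)", "created": "TIMESTAMP"}
current = {"id": "BIGINT", "email": "TEXT", "region": "VARCHAR(8)"}
drift = detect_drift(snapshot, current)
```

A report like `drift` is what an operator would review before an archive run or restore proceeds.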
All connection credentials, encryption keys, and SSH passwords are stored in HashiCorp Vault, never as plaintext in the application database. Vault Transit provides envelope encryption; Vault KV stores per-org secrets. Snapshot and restore of Vault state are included.
Stage and distribute agent binaries, LLM GGUF model files, and Trino plugin JARs from Meridian. Stratum agents pull packages on demand — no external internet access required. Complete air-gap operation from day one.
Sovereign DataVault Structured (LVS) connects to your production relational databases, extracts data on a configurable schedule, writes sorted Apache Iceberg Parquet to your Stratum storage, and registers each file in a queryable metadata tracker. No external clusters. No manual partition tuning. No idle compute burning resources.
At archive time, the Sort Advisor inspects the source table's schema — primary keys, clustered indexes, column types — and selects the optimal sort strategy before writing a single Parquet row.
Sort basis is recorded per-file in the tracker (sort_column_basis) — giving operators a full audit trail of why each sort decision was made.
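The selection logic might look something like the sketch below. The priority order (clustered index, then primary key, then a temporal column) and the basis labels are assumptions made for illustration; only the `sort_column_basis` field name comes from the description above.

```python
from dataclasses import dataclass, field

TEMPORAL_TYPES = {"DATE", "TIMESTAMP", "DATETIME"}


@dataclass
class TableSchema:
    columns: dict                                   # column name -> type
    primary_key: list = field(default_factory=list)
    clustered_index: list = field(default_factory=list)


def advise_sort(schema):
    """Return (sort_columns, sort_column_basis) for a source table.

    The priority order here is an assumed heuristic, not the product's rules.
    """
    if schema.clustered_index:
        return list(schema.clustered_index), "clustered_index"
    if schema.primary_key:
        return list(schema.primary_key), "primary_key"
    temporal = [name for name, col_type in schema.columns.items()
                if col_type.upper() in TEMPORAL_TYPES]
    if temporal:
        return temporal[:1], "temporal_column"
    return [], "unsorted"


orders = TableSchema(
    columns={"id": "BIGINT", "created_at": "TIMESTAMP"},
    primary_key=["id"],
)
sort_columns, sort_column_basis = advise_sort(orders)
```

Returning the basis alongside the columns is what makes the per-file audit trail possible: the tracker can store the reason next to the decision.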
Sovereign DataVault ships production-ready connectors for all major enterprise databases. Each connector handles schema discovery, data type mapping, incremental extraction, redacted column detection, and prerequisite validation before the first byte is written.
Automatically discovers masked or redacted columns in source schemas — ensuring compliant handling before archiving.
Every archive run snapshots the schema. Drift is detected, compared, and surfaced in the UI before any restore.
Continuous health scoring per archive job. Agents report heartbeats, job status, and error events in real time.
Arrow Flight SQL interface for high-throughput bulk reads from archived Iceberg tables — ideal for analytics pipelines.
Sovereign DataVault Unstructured (LVUS) ingests file-based sources — local filesystems, NFS, SFTP, object storage, cloud collaboration, email archives, web feeds, and application logs — extracts structured metadata and raw text, runs Named Entity Recognition, generates embeddings, and indexes everything in a local Qdrant vector store. The result: semantic search and RAG-powered AI queries over millions of documents without a single byte leaving your infrastructure.
NER runs at ingest time — detecting PII (names, emails, phone numbers, national IDs, account numbers, addresses) in every document. Entities are indexed separately and drive DSAR search.
nomic-embed-text (768-dim) runs on the Stratum node via Ollama. Embeddings are generated locally and stored in the per-Stratum Qdrant instance, so no cloud API call is ever needed for embedding.
LVUS archivers write ZSTD-compressed JSONL extractions and register manifest Parquet files in the datagen Iceberg catalog — making document metadata queryable via Trino SQL.
SharePoint MIP labels, S3 Object Lock / Macie classification, HDFS permissions, and group-based ACLs are captured at ingest and enforced at query time via the governance engine.
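Query-time enforcement of group-based ACLs can be pictured as a filter over search hits. The sketch below assumes each hit carries the set of groups captured at ingest; the `Hit` type and `enforce_acl` function are hypothetical, and the real governance engine is likely to evaluate richer rules (labels, classifications, permissions).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    doc_id: str
    allowed_groups: frozenset   # groups captured from the source ACL at ingest


def enforce_acl(hits, user_groups):
    """Keep only the hits that share at least one group with the user."""
    groups = frozenset(user_groups)
    return [hit for hit in hits if hit.allowed_groups & groups]


hits = [
    Hit("contract.pdf", frozenset({"legal"})),
    Hit("payroll.xlsx", frozenset({"hr", "finance"})),
    Hit("handbook.docx", frozenset({"all-staff"})),
]
visible = [hit.doc_id for hit in enforce_acl(hits, {"finance", "all-staff"})]
```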
Choose Eager mode (embed at ingest — immediate semantic search) or Lazy mode (embed on demand — lower write overhead for large cold archives). Switch per Stratum.
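The trade-off between the two modes can be shown with a small sketch. The indexer class and the `toy_embed` function are hypothetical; `toy_embed` is a placeholder that returns two numbers, not a real embedding model.

```python
class EmbeddingIndexer:
    """Sketch of eager vs lazy embedding; names are illustrative only."""

    def __init__(self, embed, mode="eager"):
        if mode not in ("eager", "lazy"):
            raise ValueError(f"unknown mode: {mode}")
        self.embed = embed
        self.mode = mode
        self.texts = {}
        self.vectors = {}
        self.embed_calls = 0

    def _embed(self, doc_id):
        self.embed_calls += 1
        self.vectors[doc_id] = self.embed(self.texts[doc_id])

    def ingest(self, doc_id, text):
        self.texts[doc_id] = text
        if self.mode == "eager":
            self._embed(doc_id)      # pay the embedding cost at write time

    def vector(self, doc_id):
        if doc_id not in self.vectors:
            self._embed(doc_id)      # lazy mode pays on first use instead
        return self.vectors[doc_id]


def toy_embed(text):
    # Placeholder for a real model call; returns a tiny fake vector.
    return [float(len(text)), float(text.count(" "))]


eager = EmbeddingIndexer(toy_embed, mode="eager")
lazy = EmbeddingIndexer(toy_embed, mode="lazy")
for indexer in (eager, lazy):
    indexer.ingest("doc-1", "quarterly revenue report")
calls_after_ingest = (eager.embed_calls, lazy.embed_calls)
```

In lazy mode a document that is never searched never incurs an embedding call, which is where the lower write overhead for cold archives comes from.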
All raw content is stored as ZSTD-compressed archives. Content-addressed storage (CAS) deduplicates text payloads across documents, so identical text is stored once regardless of how many documents contain it.
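Content-addressed deduplication can be illustrated as follows. This sketch makes two assumptions that are not from the product description: SHA-256 as the content hash, and `zlib` from the Python standard library as a stand-in for ZSTD compression.

```python
import hashlib
import zlib


class ContentStore:
    """Stores each distinct payload once, keyed by the hash of its content."""

    def __init__(self):
        self._blobs = {}

    def put(self, text):
        raw = text.encode("utf-8")
        digest = hashlib.sha256(raw).hexdigest()   # assumed hash function
        if digest not in self._blobs:
            # zlib stands in for ZSTD here to keep the sketch stdlib-only.
            self._blobs[digest] = zlib.compress(raw)
        return digest

    def get(self, digest):
        return zlib.decompress(self._blobs[digest]).decode("utf-8")

    def __len__(self):
        return len(self._blobs)


store = ContentStore()
disclaimer = "This message is confidential."
key_a = store.put(disclaimer)           # first document
key_b = store.put(disclaimer)           # second document, same text
key_c = store.put("Meeting moved to 3pm.")
```

Documents keep only the digest as a pointer, so repeated payloads such as email footers or templated reports cost one stored blob in total.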
Documents cycle through PRODUCTION → ARCHIVED_WARM → ARCHIVED_COLD → FILE_DELETED states, with configurable warm/cold tiering thresholds per source.
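A tiering decision based on those thresholds might be sketched like this. The state names come from the lifecycle above; measuring thresholds in days of document age and the `TierPolicy` type are assumptions. The trigger for FILE_DELETED is not described, so the sketch leaves it out of the age-based logic.

```python
from dataclasses import dataclass
from enum import Enum


class DocState(Enum):
    PRODUCTION = "PRODUCTION"
    ARCHIVED_WARM = "ARCHIVED_WARM"
    ARCHIVED_COLD = "ARCHIVED_COLD"
    FILE_DELETED = "FILE_DELETED"


@dataclass(frozen=True)
class TierPolicy:
    warm_after_days: int   # hypothetical per-source threshold
    cold_after_days: int   # hypothetical per-source threshold

    def __post_init__(self):
        if self.warm_after_days >= self.cold_after_days:
            raise ValueError("warm threshold must be below cold threshold")


def tier_for_age(age_days, policy):
    """Pick the tier for a document of the given age under a source policy."""
    if age_days >= policy.cold_after_days:
        return DocState.ARCHIVED_COLD
    if age_days >= policy.warm_after_days:
        return DocState.ARCHIVED_WARM
    return DocState.PRODUCTION


policy = TierPolicy(warm_after_days=30, cold_after_days=365)
tiers = [tier_for_age(age, policy) for age in (10, 30, 400)]
```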
Track field-level version history for reclassified documents — see exactly which attributes changed, when, and by whom. Full audit trail for compliance.
The AI Query page brings together a professional Monaco SQL editor, a persistent multi-session AI chat assistant, and a real-time schema browser — all powered by your choice of LLM. For unstructured data, the same interface switches to RAG mode, retrieving relevant document chunks and synthesising answers with citations.
Sovereign DataVault is provider-agnostic. Switch models per session or set an org-level default. Air-gap deployments use Ollama with locally hosted GGUF models, so no internet access is required.
Restore archived Iceberg Parquet files back to any target relational database. Sovereign DataVault reads Parquet, maps types back to the target schema, and writes row by row in an order that respects foreign-key dependencies. Schema drift is detected and surfaced before the restore begins, so there are no surprises.
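Foreign-key-aware ordering is commonly done with a topological sort, so that parent tables are written before the tables that reference them. The sketch below uses Python's standard `graphlib` module (Python 3.9+); the `restore_order` function and the table names are hypothetical, and this is one plausible approach rather than a description of the product's internals.

```python
from graphlib import CycleError, TopologicalSorter


def restore_order(foreign_keys):
    """Order tables so every referenced table is restored first.

    `foreign_keys` maps each table to the set of tables it references.
    """
    return list(TopologicalSorter(foreign_keys).static_order())


foreign_keys = {
    "customers": set(),
    "orders": {"customers"},
    "order_items": {"orders", "products"},
    "products": set(),
}
order = restore_order(foreign_keys)

# Mutually referencing tables have no valid order and raise CycleError.
try:
    restore_order({"a": {"b"}, "b": {"a"}})
    cycle_detected = False
except CycleError:
    cycle_detected = True
```

Circular references need separate handling, for example deferring constraints during the load, which is outside this sketch.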
TDM Workflows provision masked copies of production archives to dev and staging environments. Every workflow is replayable, CI/CD-triggerable, and produces referentially-intact, compliance-safe test data.
The Semi-Structured plane handles CSV, JSON, XML, EDI, and fixed-width archives independently — with its own restore dialog, file browser, and job tracking. Restore specific files or full source snapshots.
Register a Change Data Capture bridge source to ingest CDC event batches into TDM — keeping your test environments in near-real-time sync with production changes, without direct production access.
Sovereign DataVault's observability suite gives platform operators full visibility into the health and behaviour of every component — from Stratum agent heartbeats to Trino query traces and archive job event streams.
Interactive topology diagram showing every service (agents, Trino, Qdrant, Ollama, catalogs) and their live connections. Click any node for health details, latency metrics, and recent events.
Full-text search across all service logs with time-range filtering, severity faceting, and service-level filtering. Query across hundreds of gigabytes of structured log data in seconds.
Automatic correlation of anomalies across services. When an archive job fails, RCA traces the causal chain — from the agent event, through Trino query failures, to the originating storage or network error.
Distributed trace viewer for every query and archive operation. Inspect spans across the full call stack — backend → agent → Trino → Iceberg catalog — with timing, status, and payload at each hop.
Before making infrastructure changes, model the downstream impact — which archive jobs, queries, and restore pipelines will be affected. Blast-radius analysis for Stratum maintenance windows.
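Blast-radius analysis amounts to finding everything downstream of a component in a dependency graph. A minimal sketch using breadth-first search is shown below; the graph contents and the `blast_radius` function are hypothetical examples, not the product's data model.

```python
from collections import deque


def blast_radius(dependents, start):
    """Return every component reachable downstream of `start`.

    `dependents` maps a component to the components that depend on it.
    """
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in dependents.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    seen.discard(start)   # guard against cycles leading back to the start
    return seen


# Hypothetical dependency graph for one Stratum node.
dependents = {
    "stratum-eu-1": ["job:orders-archive", "trino-eu"],
    "trino-eu": ["query:monthly-report", "restore:orders-staging"],
    "job:orders-archive": ["restore:orders-staging"],
}
affected = blast_radius(dependents, "stratum-eu-1")
```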
Pre-bucketed log metrics (1-minute, 5-minute, 1-hour granularity), service health summaries, active and resolved outage tracking, and configurable alert thresholds — all in a single pane of glass.
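Pre-bucketing means rounding each event timestamp down to the start of its window and counting per window, so dashboards read small aggregates instead of raw logs. The sketch below shows the idea for the three granularities mentioned; the function names and the granularity keys are assumptions.

```python
from collections import Counter
from datetime import datetime, timezone

# Bucket widths in seconds for the three granularities.
GRANULARITIES = {"1m": 60, "5m": 300, "1h": 3600}


def bucket_start(ts, granularity):
    """Round a timezone-aware timestamp down to the start of its bucket."""
    width = GRANULARITIES[granularity]
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % width, tz=timezone.utc)


def count_by_bucket(timestamps, granularity):
    """Count events per bucket at the given granularity."""
    return Counter(bucket_start(ts, granularity) for ts in timestamps)


events = [
    datetime(2024, 1, 1, 10, 7, 42, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 10, 8, 10, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 10, 12, 0, tzinfo=timezone.utc),
]
five_minute_counts = count_by_bucket(events, "5m")
```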
Three clearly separated planes — control, data, and AI — that can be deployed on-premises, in a private cloud, or in a hybrid configuration. No customer data ever transits the AI plane. No metadata ever leaves Meridian.
Docker Compose for single-node. Kubernetes + Helm for production clusters. All components deployable on RHEL 8/9, Ubuntu 22+, or compatible enterprise Linux.
Full air-gap operation. Package Manager distributes binaries and LLM models offline. Ollama serves local inference. No external internet connection required at any point after initial setup.
Set SCHEDULER_EXTERNAL=true to offload scheduling to Redis. Add the saas Docker profile for cloud-native operation. The control plane can run in your private cloud while Stratum nodes remain on-premises.
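For illustration, a backend could read such a flag as shown below. Only the variable name and the value `true` come from the text above; the default of `false`, the case-insensitive match, and the function name are assumptions.

```python
import os


def scheduler_is_external(env=None):
    """Return True when SCHEDULER_EXTERNAL is set to "true".

    Treating the flag as off when unset, and matching case-insensitively,
    are assumptions made for this sketch.
    """
    env = os.environ if env is None else env
    return env.get("SCHEDULER_EXTERNAL", "false").strip().lower() == "true"


mode = "external" if scheduler_is_external({"SCHEDULER_EXTERNAL": "true"}) else "built-in"
```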