Complete Platform Reference

The Sovereign DataVault Platform

A unified data intelligence layer — from source extraction and intelligent archiving to AI-powered querying, governance enforcement, and point-in-time restore — all within your sovereign perimeter.


Meridian: centralised intelligence for your entire archive estate

Meridian is the brain of the platform — a FastAPI backend with a React control plane that manages every Stratum node, archive job, user, and governance policy across your organisation. All metadata lives here; all data stays on your Stratum servers.

WebSocket Agent Transport

Stratum agents maintain a persistent WebSocket connection to the gateway pool. Job dispatch latency drops from ~2 seconds (polling) to ~10 milliseconds. Redis Pub/Sub decouples the backend from gateway instances, enabling horizontal scaling.
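The decoupling described above can be pictured with a small simulation. This is only a sketch under stated assumptions: an in-memory class stands in for Redis Pub/Sub, a list append stands in for the WebSocket write, and the channel naming scheme (agent:<id>), gateway names, and job payloads are hypothetical.

```python
import asyncio
from collections import defaultdict


class InMemoryPubSub:
    """Stand-in for Redis Pub/Sub: maps a channel name to subscriber queues."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, channel):
        queue = asyncio.Queue()
        self._subscribers[channel].append(queue)
        return queue

    async def publish(self, channel, message):
        for queue in self._subscribers[channel]:
            await queue.put(message)
        return len(self._subscribers[channel])


async def gateway(name, broker, agent_id, delivered):
    """A gateway instance listens only for the agents connected to it."""
    queue = broker.subscribe(f"agent:{agent_id}")
    job = await queue.get()
    # A real gateway would push the job down the agent's WebSocket here.
    delivered.append((name, agent_id, job))


async def demo():
    broker = InMemoryPubSub()
    delivered = []
    tasks = [
        asyncio.create_task(gateway("gw-1", broker, "stratum-a", delivered)),
        asyncio.create_task(gateway("gw-2", broker, "stratum-b", delivered)),
    ]
    await asyncio.sleep(0)  # let both gateways subscribe first
    # The backend publishes by agent id and never needs to know which
    # gateway instance holds that agent's connection.
    await broker.publish("agent:stratum-b", {"job": "archive"})
    await broker.publish("agent:stratum-a", {"job": "restore"})
    await asyncio.gather(*tasks)
    return delivered


if __name__ == "__main__":
    for gw, agent, job in asyncio.run(demo()):
        print(gw, agent, job)
```

Because the publisher addresses a channel rather than a gateway, adding a third gateway instance in this model requires no change to the publishing side.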

🏢

Multi-Tenant Organisations

Full parent/child organisation hierarchy. Each org gets its own user space, Stratum registry, storage backends, archive jobs, and governance policies. System admins manage the platform; org admins manage their own estate — with complete isolation enforced at the database level.

📅

SLA Templates & Scheduler

Define reusable SLA templates specifying schedule, retention policy, and health thresholds. Assign templates to archive jobs — changes propagate automatically. Built-in scheduler supports cron expressions, daily/weekly/monthly cycles, and on-demand triggers.
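One way to picture automatic propagation is for each job to hold a reference to its template rather than a copy. The sketch below assumes that model; the class names, field names, and example values are hypothetical and are not taken from the product.

```python
from dataclasses import dataclass


@dataclass
class SlaTemplate:
    name: str
    cron: str               # schedule as a cron expression
    retention_days: int     # retention policy
    min_health_score: int   # health threshold


@dataclass
class ArchiveJob:
    name: str
    template: SlaTemplate   # a reference, not a copy

    def effective_schedule(self) -> str:
        return self.template.cron


gold = SlaTemplate("gold", "0 2 * * *", retention_days=2555, min_health_score=90)
jobs = [ArchiveJob("FINCORE_PROD", gold), ArchiveJob("RISK_PROD", gold)]

# Editing the template once changes what every assigned job reads.
gold.cron = "0 4 * * *"
for job in jobs:
    print(job.name, job.effective_schedule())
```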

🔄

Catalog Sync & Schema Drift

Sovereign DataVault tracks the schema of every archived table. When the source schema changes, drift detection surfaces additions, removals, and type changes — giving operators a preview before the next archive run or restore operation touches the target.
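The three kinds of drift named above (additions, removals, type changes) can be computed with a plain dictionary comparison. This is a minimal sketch that assumes each schema snapshot is a mapping of column name to type name; the function name and sample columns are hypothetical.

```python
def detect_drift(archived, live):
    """Compare two {column: type} snapshots and report the differences."""
    archived_cols, live_cols = set(archived), set(live)
    return {
        "added": sorted(live_cols - archived_cols),
        "removed": sorted(archived_cols - live_cols),
        "type_changed": sorted(
            (col, archived[col], live[col])
            for col in archived_cols & live_cols
            if archived[col] != live[col]
        ),
    }


archived = {"loan_id": "bigint", "amount": "decimal(12,2)", "branch": "varchar"}
live = {"loan_id": "bigint", "amount": "decimal(18,2)", "channel": "varchar"}
print(detect_drift(archived, live))
```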

🔒

HashiCorp Vault Integration

All connection credentials, encryption keys, and SSH passwords are stored in HashiCorp Vault — never in the application database as plaintext. Vault Transit provides envelope encryption; Vault KV stores per-org secrets. Snapshot and restore for Vault state included.

📦

Package Manager

Stage and distribute agent binaries, LLM GGUF model files, and Trino plugin JARs from Meridian. Stratum agents pull packages on demand — no external internet access required. Complete air-gap operation from day one.

Archive relational data at enterprise scale — with millisecond query access

Sovereign DataVault Structured (LVS) connects to your production relational databases, extracts data on a configurable schedule, writes sorted Apache Iceberg Parquet to your Stratum storage, and registers each file in a queryable metadata tracker. No external clusters. No manual partition tuning. No idle compute burning resources.

Sort Advisor

Automated write-time sort column selection

At archive time, the Sort Advisor inspects the source table's schema — primary keys, clustered indexes, column types — and selects the optimal sort strategy before writing a single Parquet row.

Composite clustered index → Z-order multi-column sort
IDENTITY / SEQUENCE PK → sort by primary key
UUID PK + indexed date → skip UUID, sort by date
Non-clustered index on timestamp → sort by that column
No useful index → write unsorted, pruning disabled

Sort basis is recorded per-file in the tracker (sort_column_basis) — giving operators a full audit trail of why each sort decision was made.
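The five rules above can be read as an ordered decision function. The sketch below assumes they are evaluated top to bottom, which the page does not state. The TableProfile fields and the basis strings are hypothetical illustrations of what a sort_column_basis value might record.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class TableProfile:
    clustered_index: List[str] = field(default_factory=list)
    pk_column: Optional[str] = None
    pk_kind: Optional[str] = None  # "identity", "sequence", or "uuid"
    indexed_date_column: Optional[str] = None
    indexed_timestamp_column: Optional[str] = None


def choose_sort(p: TableProfile) -> Tuple[str, List[str], str]:
    """Return (strategy, columns, basis) using the listed rules in order."""
    if len(p.clustered_index) > 1:
        return ("z-order", p.clustered_index, "composite_clustered_index")
    if p.pk_kind in ("identity", "sequence") and p.pk_column:
        return ("sort", [p.pk_column], "sequential_primary_key")
    if p.pk_kind == "uuid" and p.indexed_date_column:
        return ("sort", [p.indexed_date_column], "indexed_date_over_uuid_pk")
    if p.indexed_timestamp_column:
        return ("sort", [p.indexed_timestamp_column], "nonclustered_timestamp_index")
    return ("unsorted", [], "no_useful_index")


print(choose_sort(TableProfile(pk_column="id", pk_kind="uuid",
                               indexed_date_column="trxn_date")))
```

Returning the basis alongside the decision is what makes an audit trail possible: the reason is stored with the file instead of being re-derived later.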

Pruning Panel — AI Query Page
Predicate: order_date BETWEEN '2023-01-01' AND '2023-03-31'
📄 FINCORE_2023_Q1.parquet · SCAN
✓ 5 of 6 files pruned · 1 scanned · 142ms total
Bounds Verify: all file bounds match live Parquet footer stats
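The panel above reflects min/max pruning: a file can be skipped when its recorded bounds for the sort column cannot overlap the predicate range. The following sketch assumes per-file bounds are available as simple min/max dates. Only FINCORE_2023_Q1.parquet appears on this page; the other file names and all bounds are hypothetical.

```python
from datetime import date


def split_by_bounds(files, lo, hi):
    """Split files into (scan, pruned) by overlap with the range [lo, hi]."""
    scan, pruned = [], []
    for f in files:
        overlaps = f["min"] <= hi and f["max"] >= lo
        (scan if overlaps else pruned).append(f["name"])
    return scan, pruned


def quarter_file(year, quarter):
    """Build a hypothetical file entry whose bounds span one calendar quarter."""
    starts = {1: (1, 1), 2: (4, 1), 3: (7, 1), 4: (10, 1)}
    ends = {1: (3, 31), 2: (6, 30), 3: (9, 30), 4: (12, 31)}
    return {
        "name": f"FINCORE_{year}_Q{quarter}.parquet",
        "min": date(year, *starts[quarter]),
        "max": date(year, *ends[quarter]),
    }


files = [quarter_file(2022, 3), quarter_file(2022, 4)]
files += [quarter_file(2023, q) for q in (1, 2, 3, 4)]

scan, pruned = split_by_bounds(files, date(2023, 1, 1), date(2023, 3, 31))
print(f"{len(pruned)} of {len(files)} files pruned, {len(scan)} scanned: {scan}")
```

Pruning of this kind is only effective when files are written sorted on the predicate column, which is why an unsorted write disables it.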
Source Connectors

Every major enterprise relational database — supported out of the box

Sovereign DataVault ships production-ready connectors for all major enterprise databases. Each connector handles schema discovery, data type mapping, incremental extraction, redacted column detection, and pre-requisite validation before the first byte is written.

🐉 Oracle 19c / 21c
🐘 PostgreSQL
🐬 MySQL / MariaDB
🪟 SQL Server
🔷 IBM Db2
🗄️ More coming

Native format: Iceberg
File encoding: Parquet
Query engine: Trino

Archive Schedule — FINCORE_PROD
Schema Discovery: 142 tables scanned
Sort Advisor: TRXN_DATE selected
Extract & Sort: In progress (67%)
Write Parquet → Stratum: Queued
Register Iceberg Catalog: Queued
Update Bounds Tracker: Queued

🔍

Redacted Column Detection

Automatically discovers masked or redacted columns in source schemas — ensuring compliant handling before archiving.

📊

Schema Version History

Every archive run snapshots the schema. Drift is detected, compared, and surfaced in the UI before any restore.

✔️

Archive Health Monitor

Continuous health scoring per archive job. Agents report heartbeats, job status, and error events in real time.

🚀

Flight SQL Server

Arrow Flight SQL interface for high-throughput bulk reads from archived Iceberg tables — ideal for analytics pipelines.

Your unstructured data has never been this searchable

Sovereign DataVault Unstructured (LVUS) ingests file-based sources — local filesystems, NFS, SFTP, object storage, cloud collaboration, email archives, web feeds, and application logs — extracts structured metadata and raw text, runs Named Entity Recognition, generates embeddings, and indexes everything in a local Qdrant vector store. The result: semantic search and RAG-powered AI queries over millions of documents without a single byte leaving your infrastructure.

Source Connectors — 20+ built in

📁 Local Filesystem
🔗 NFS Shares
🔒 SFTP Servers
📦 S3-Compatible (on-prem)
🌐 AWS S3 + IAM / STS
☁️ Azure Blob Storage
📁 HDFS
☁️ Google Drive
📊 SharePoint Online
📧 Email Archives (EML)
📝 Confluence Spaces
🪵 Application Logs
🌊 JSON / XML Feeds
📉 CSV / TSV Files
📨 Messaging Archives
🌍 Web Crawlers
📄 Fixed-Width (FW) Files
🔄 EDI Documents

Document Formats Supported

PDF (text + scanned via OCR)
Word (DOCX), Excel (XLSX/XLS)
PowerPoint (PPTX)
HTML, Markdown, RTF
Plain text and log files
Images (JPEG, PNG) with OCR
Video (audio transcript extraction)
Email (EML, MSG with attachments)
🧠

Named Entity Recognition

NER runs at ingest time — detecting PII (names, emails, phone numbers, national IDs, account numbers, addresses) in every document. Entities are indexed separately and drive DSAR search.

🔢

Vector Embeddings — Local

nomic-embed-text (768-dim) runs on the Stratum node via Ollama. Embeddings are generated locally and stored in that Stratum's own Qdrant instance, so embedding never requires a cloud API call.

🗂️

Iceberg Manifest Parquet

LVUS archivers write ZSTD-compressed JSONL extractions and register manifest Parquet files in the datagen Iceberg catalog — making document metadata queryable via Trino SQL.

🔐

Permission Inheritance

SharePoint MIP labels, S3 Object Lock / Macie classification, HDFS permissions, and group-based ACLs are captured at ingest and enforced at query time via the governance engine.

📊

Eager & Lazy Embedding

Choose Eager mode (embed at ingest — immediate semantic search) or Lazy mode (embed on demand — lower write overhead for large cold archives). Switch per Stratum.

🔄

ZSTD Compression + CAS

All raw content is stored as ZSTD-compressed archives. Content-addressed storage (CAS) deduplicates text payloads across documents — dramatically reducing storage footprint.
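Content-addressed deduplication can be sketched in a few lines: the key for a payload is a hash of its bytes, so identical text from different documents is stored once. This sketch makes two assumptions that are not from the page: SHA-256 is used as the content hash, and zlib from the standard library stands in for ZSTD so the example runs without third-party packages.

```python
import hashlib
import zlib


class ContentStore:
    """Toy content-addressed store: digest -> compressed payload."""

    def __init__(self):
        self._blobs = {}

    def put(self, text: str) -> str:
        data = text.encode("utf-8")
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self._blobs:  # identical content is stored once
            self._blobs[digest] = zlib.compress(data)  # stand-in for ZSTD
        return digest

    def get(self, digest: str) -> str:
        return zlib.decompress(self._blobs[digest]).decode("utf-8")

    def __len__(self) -> int:
        return len(self._blobs)


store = ContentStore()
disclaimer = "This message is confidential. " * 20
key_a = store.put(disclaimer)   # e.g. extracted from one email
key_b = store.put(disclaimer)   # same text extracted from another email
key_c = store.put("Quarterly loan report")
print(key_a == key_b, len(store))
```

Each document record would then keep only the digest, which is how many documents can share a single stored payload.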

📁

Document State Lifecycle

Documents move through PRODUCTION → ARCHIVED_WARM → ARCHIVED_COLD → FILE_DELETED states, with warm and cold tiering thresholds configurable per source.
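The four states can be modelled as a small state machine. The sketch below assumes transitions move strictly forward, one step at a time, and that FILE_DELETED is terminal. The page lists only the state names, so the transition rules and function name here are hypothetical.

```python
from enum import Enum


class DocState(Enum):
    PRODUCTION = "PRODUCTION"
    ARCHIVED_WARM = "ARCHIVED_WARM"
    ARCHIVED_COLD = "ARCHIVED_COLD"
    FILE_DELETED = "FILE_DELETED"


# Assumed forward-only transitions; the last state has no successor.
NEXT_STATE = {
    DocState.PRODUCTION: DocState.ARCHIVED_WARM,
    DocState.ARCHIVED_WARM: DocState.ARCHIVED_COLD,
    DocState.ARCHIVED_COLD: DocState.FILE_DELETED,
}


def advance(state: DocState) -> DocState:
    """Return the next lifecycle state, or raise if the state is terminal."""
    if state not in NEXT_STATE:
        raise ValueError(f"{state.name} is a terminal state")
    return NEXT_STATE[state]


state = DocState.PRODUCTION
path = [state.name]
while state in NEXT_STATE:
    state = advance(state)
    path.append(state.name)
print(" -> ".join(path))
```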

🔍

Document Field Versioning

Track field-level version history for reclassified documents — see exactly which attributes changed, when, and by whom. Full audit trail for compliance.

Natural language intelligence over your entire archive

The AI Query page brings together a professional Monaco SQL editor, a persistent multi-session AI chat assistant, and a real-time schema browser — all powered by your choice of LLM. For unstructured data, the same interface switches to RAG mode, retrieving relevant document chunks and synthesising answers with citations.

Structured Query (SQL + AI)

Monaco editor with syntax highlighting and schema autocomplete
AI generates schema-aware SQL from natural language
COUNT(*) pre-check before full query execution
Multi-session chat with persistent history per archive
Inline pruning panel — see exactly which files were skipped
Results visualised as bar, line, pie, or scatter charts
AI narrative generation — convert results to business prose
Saved queries, query history, thumbs up/down feedback
Export results to CSV / Excel
Lazy schema tree — expand catalogs, schemas, tables on click
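A COUNT(*) pre-check can be sketched by wrapping the candidate query as a subquery before running it in full. The example assumes the pre-check exists to guard against oversized result sets, which the page does not spell out. It uses an in-memory SQLite database as a stand-in for the real query engine, and the function name, threshold, and sample table contents are hypothetical.

```python
import sqlite3


def run_with_precheck(conn, select_sql, params=(), max_rows=1000):
    """Count the rows a query would return; run it only if under max_rows.

    select_sql must be a single SELECT statement without a trailing semicolon.
    """
    count_sql = f"SELECT COUNT(*) FROM ({select_sql}) AS precheck"
    (row_count,) = conn.execute(count_sql, params).fetchone()
    if row_count > max_rows:
        return row_count, None  # caller can ask the user to narrow the query
    return row_count, conn.execute(select_sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan_master (loan_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO loan_master VALUES (?, ?)",
    [(i, i * 100.0) for i in range(50)],
)

query = "SELECT loan_id FROM loan_master WHERE amount >= ?"
print(run_with_precheck(conn, query, (4000.0,), max_rows=20))
print(run_with_precheck(conn, query, (4000.0,), max_rows=5))
```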

Unstructured RAG Query

Semantic vector search across all ingested documents
RAG: top-K chunk retrieval + LLM synthesis with citations
Document preview pane with entity highlights
Similar document discovery (post-retrieve)
LvEL commands — advanced filter syntax for structured search
Saved commands — repeatable research workflows
In-session document access requests with justification
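Top-K retrieval followed by prompt assembly can be illustrated without a vector database. This is a sketch under stated assumptions: cosine similarity is used for ranking, vectors are tiny hand-written lists instead of 768-dimensional embeddings, and the chunk texts, document names, and prompt wording are hypothetical. In the platform the search would run in Qdrant and the prompt would go to the configured LLM.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunks, k=2):
    """Return the k chunks most similar to the query, best first."""
    scored = [(cosine(query_vec, c["vector"]), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(question, hits):
    """Number each retrieved chunk so the answer can cite its sources."""
    context = "\n".join(
        f"[{i}] {c['text']} (source: {c['doc']})"
        for i, (_, c) in enumerate(hits, start=1)
    )
    return (
        "Answer using only the numbered context and cite it.\n"
        f"{context}\nQuestion: {question}"
    )


chunks = [
    {"doc": "policy.pdf", "text": "Loans above 1M need two approvers.", "vector": [0.9, 0.1, 0.0]},
    {"doc": "minutes.docx", "text": "The canteen menu was updated.", "vector": [0.0, 0.2, 0.9]},
    {"doc": "memo.eml", "text": "Approval limits were raised in 2023.", "vector": [0.8, 0.3, 0.1]},
]
hits = top_k([1.0, 0.2, 0.0], chunks, k=2)
print(build_prompt("Who approves large loans?", hits))
```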

Supported AI Providers

Sovereign DataVault is provider-agnostic. Switch models per session or set an org-level default. Air-gap deployments use Ollama with locally-hosted GGUF models — no internet required.

🤖 Ollama (local)
Claude (Anthropic)
OpenAI
☁️ AWS Bedrock
🔷 IBM WatsonX
🐉 Oracle Vector DB
AI Session · FINCORE Archive
👤
What were the top 10 loan disbursements in Q1 2023?
🤖
Generated SQL:
SELECT loan_id, customer_name,
  amount, disbursement_date
FROM datagen.fincore.loan_master
WHERE disbursement_date BETWEEN
  DATE '2023-01-01' AND DATE '2023-03-31'
ORDER BY amount DESC LIMIT 10
✓ 10 rows · 89ms · 31 files pruned

From archived Parquet back to your production database — in hours

Point-in-Time Restore

Restore archived Iceberg Parquet files back to any target relational database. Sovereign DataVault reads Parquet, maps types back to the target schema, and writes row-by-row with FK order awareness. Schema drift is detected and surfaced before the restore begins — no surprises.

Restore to Oracle, PostgreSQL, MySQL, and SQL Server targets
ADD COLUMN schema diff preview before restore
Configurable batch size and parallel writer count
Restore points — select any historical archive version
Unstructured restore: LVUS restore dialog + per-file progress
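Foreign-key order awareness amounts to loading parent tables before the tables that reference them, which is a topological sort of the FK graph. The sketch below uses graphlib from the Python standard library (3.9+). The table names other than loan_master and the function name are hypothetical, and it is an illustration of the idea, not the product's restore planner.

```python
from graphlib import CycleError, TopologicalSorter


def restore_order(fk_parents):
    """fk_parents maps each table to the set of tables it references.

    Returns a load order in which every parent precedes its children.
    Raises graphlib.CycleError if the foreign keys form a cycle.
    """
    return list(TopologicalSorter(fk_parents).static_order())


fk_parents = {
    "branch": set(),
    "customer": {"branch"},
    "loan_master": {"customer", "branch"},
    "repayment": {"loan_master"},
}
print(restore_order(fk_parents))

try:
    restore_order({"a": {"b"}, "b": {"a"}})
except CycleError:
    print("cyclic foreign keys need deferred constraints or a manual order")
```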

Test Data Management (TDM)

TDM Workflows provision masked copies of production archives to dev and staging environments. Every workflow is replayable, CI/CD-triggerable, and produces referentially-intact, compliance-safe test data.

Masking modes: FULL, PARTIAL, REGEX, FPE, RANDOM, DENY, NULL
FK chain traversal — provision in dependency order
Row count and size estimation before provisioning
Seed SQL files for deterministic initial data state
CI/CD API keys — trigger provisioning from pipelines
Job retry (max 3) with full run history per workflow
CDC bridge sources for near-real-time test data refresh
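A few of the masking modes listed above can be illustrated with plain string functions. The semantics shown here are assumptions, since the page names the modes without defining them: FULL replaces every character, PARTIAL keeps the last few characters, REGEX masks only the matched portion, and NULL drops the value. FPE, RANDOM, and DENY are left out because they need cryptographic or policy machinery beyond a short sketch.

```python
import re


def mask_full(value: str, char: str = "*") -> str:
    """Replace every character."""
    return char * len(value)


def mask_partial(value: str, visible: int = 4, char: str = "*") -> str:
    """Keep only the last `visible` characters."""
    if len(value) <= visible:
        return char * len(value)
    return char * (len(value) - visible) + value[-visible:]


def mask_regex(value: str, pattern: str, char: str = "*") -> str:
    """Mask only the parts of the value that match `pattern`."""
    return re.sub(pattern, lambda m: char * len(m.group(0)), value)


def mask_null(value):
    """Drop the value entirely."""
    return None


print(mask_full("secret"))
print(mask_partial("4111111111111111"))
print(mask_regex("alice@example.com", r"^[^@]+"))
print(mask_null("anything"))
```

All four functions are deterministic, so the same input always yields the same masked output. That property helps keep joins on masked key columns consistent across tables.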

Semi-Structured Restore

The Semi-Structured plane handles CSV, JSON, XML, EDI, and fixed-width archives independently — with its own restore dialog, file browser, and job tracking. Restore specific files or full source snapshots.

TDM Workflow — RISK_DEV_REFRESH
⚙ FK Graph Build: 87 links resolved
📏 Size Estimate: 14.2 GB · 8.7M rows
🔐 Apply Masks: FPE on PAN, PARTIAL on email
✍ Write to DEV_DB: In progress
🌱 Run seed.sql: Queued
Triggered via CI/CD · API key: tdm_•••••••••••••

🔁 CDC Bridge (Phase 3)

Register a Change Data Capture bridge source to ingest CDC event batches into TDM — keeping your test environments in near-real-time sync with production changes, without direct production access.

See everything — every service, every trace, every anomaly

Sovereign DataVault's observability suite gives platform operators full visibility into the health and behaviour of every component — from Stratum agent heartbeats to Trino query traces and archive job event streams.

🗺️

Service Map

Interactive topology diagram showing every service (agents, Trino, Qdrant, Ollama, catalogs) and their live connections. Click any node for health details, latency metrics, and recent events.

🔍

Log Search

Full-text search across all service logs with time-range filtering, severity faceting, and service-level filtering. Query across hundreds of gigabytes of structured log data in seconds.

🩺

Root Cause Analysis

Automatic correlation of anomalies across services. When an archive job fails, RCA traces the causal chain — from the agent event, through Trino query failures, to the originating storage or network error.

🕵️

Trace Explorer

Distributed trace viewer for every query and archive operation. Inspect spans across the full call stack — backend → agent → Trino → Iceberg catalog — with timing, status, and payload at each hop.

💥

Impact Analyzer

Before making infrastructure changes, model the downstream impact — which archive jobs, queries, and restore pipelines will be affected. Blast-radius analysis for Stratum maintenance windows.
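Blast-radius analysis can be framed as reachability in a dependency graph: start at the component going into maintenance and collect everything downstream of it. The sketch below uses a breadth-first search. The graph shape, node names, and function name are hypothetical.

```python
from collections import deque


def blast_radius(dependents, start):
    """Return every node reachable from `start` via 'is depended on by' edges."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in dependents.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


# Each key is depended on by the items in its list.
dependents = {
    "stratum-01": ["job:FINCORE_PROD", "trino-01"],
    "trino-01": ["query:top_loans", "restore:RISK_DEV"],
    "job:FINCORE_PROD": ["restore:RISK_DEV"],
    "stratum-02": ["job:HR_PROD"],
}
print(sorted(blast_radius(dependents, "stratum-01")))
```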

📈

Observability Dashboard

Pre-bucketed log metrics (1-minute, 5-minute, 1-hour granularity), service health summaries, active and resolved outage tracking, and configurable alert thresholds — all in a single pane of glass.
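Pre-bucketing means log events are counted into fixed time windows as they arrive, so dashboards read small aggregates instead of raw logs. The sketch below assumes buckets are aligned to the Unix epoch and keyed by (bucket start, severity). The function names and sample events are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def bucket_start(ts: datetime, seconds: int) -> datetime:
    """Floor a timezone-aware timestamp to the start of its bucket."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % seconds, tz=timezone.utc)


def bucket_counts(events, seconds: int) -> Counter:
    """Count events per (bucket start, severity)."""
    counts = Counter()
    for ts, severity in events:
        counts[(bucket_start(ts, seconds), severity)] += 1
    return counts


utc = timezone.utc
events = [
    (datetime(2024, 1, 1, 12, 0, 10, tzinfo=utc), "ERROR"),
    (datetime(2024, 1, 1, 12, 0, 50, tzinfo=utc), "ERROR"),
    (datetime(2024, 1, 1, 12, 3, 5, tzinfo=utc), "ERROR"),
    (datetime(2024, 1, 1, 12, 3, 30, tzinfo=utc), "WARN"),
]

for granularity in (60, 300, 3600):  # 1-minute, 5-minute, 1-hour
    print(granularity, dict(bucket_counts(events, granularity)))
```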

Sovereign by design — no compromise

Three clearly separated planes — control, data, and AI — that can be deployed on-premises, in a private cloud, or in a hybrid configuration. No customer data ever transits the AI plane. No metadata ever leaves Meridian.

Control Plane: Meridian (OneView DSL)
React Frontend (MUI) · FastAPI Backend · Gunicorn + Uvicorn · PostgreSQL MetaDB · Redis Pub/Sub · WebSocket Gateway Pool · Celery / APScheduler · HashiCorp Vault · Nginx :443 TLS Termination

↕ WebSocket (10ms job dispatch) + REST API over TLS

Stratum: Structured (LVS)
LVS Agent (systemd) · Trino Query Engine · Iceberg REST Catalog · PostgreSQL 16 (catalog) · Block / NFS Storage

Stratum: Unstructured (LVUS)
LVUS Agent (systemd) · Qdrant Vector Store · Ollama (embed model) · Trino + Iceberg Catalog · Block / NFS / S3 Blobs

AI Server (Query-Time Only)
Ollama LLM (Qwen2.5 etc.) · Claude / OpenAI / Bedrock · RAG Orchestration · NO customer data stored

↕ Stratum-local data access only (vectors + chunks stay on Stratum)

Data Sources (remain untouched: Sovereign DataVault reads, never modifies source data)
Oracle · PostgreSQL · MySQL · SQL Server · IBM Db2 · AWS S3 · Azure Blob · SharePoint · HDFS · NFS · SFTP · Email · Logs · Confluence · Google Drive

🏗️ On-Premises

Docker Compose for single-node. Kubernetes + Helm for production clusters. All components deployable on RHEL 8/9, Ubuntu 22+, or compatible enterprise Linux.

🔌 Air-Gap

Full air-gap operation. Package Manager distributes binaries and LLM models offline. Ollama serves local inference. No external internet connection required at any point after initial setup.

☁️ Private Cloud / SaaS Overlay

Set SCHEDULER_EXTERNAL=true to offload scheduling to Redis. Add the saas Docker profile for cloud-native operation. The control plane can run in your private cloud while Stratum nodes remain on-premises.

See the full platform in action

Schedule a guided walkthrough — we'll show you Sovereign DataVault running against your database types, with your data volumes, in your security model.

Book a Demo
Sovereign DataVault · Data Flow Pipeline
Deployment Architecture Simulation
Legend: △ Archive Flow · ◆ Sovereign On-Prem · ◻ Stratum Server
[Interactive diagram. Nodes: Data Source (Oracle · PG · MySQL), Stratum Agent (Extract · Archive), Meridian Control Plane, AI Server (Query-Time Only), Ask Ariv (User Query). Link protocols: JDBC / psycopg2, HTTPS / REST, HTTPS / AI API, HTTPS / SQL. Link encryption labels: TLS 1.3 · AES-256, mTLS 1.3 · AES-256, TLS 1.3 · AES-128, TLS 1.3 · AES-128.]
Architecture Layers
Control Plane — Meridian
Stratum — LVS Agent + Trino
AI Server — Ollama / Claude / OpenAI
Data Sources — Oracle, PG, MySQL…
Encryption Summary
Source ↔ Agent: TLS 1.3 · AES-256
Agent → Meridian: mTLS 1.3 · AES-256
Meridian → AI Server: TLS 1.3 · AES-128
At Rest (Stratum): AES-256-GCM (SSE)
Pipeline Phases
Connecting → Extracting → Archiving (Parquet)
→ Cataloging → AI Query → User Response
No customer data transits the AI plane.