Complete visibility into your Aeron fleet

Lucern is a purpose-built monitoring tool for Aeron media drivers. Real-time driver health, stream performance, fault detection, and fleet topology — across every node, from a single browser tab.

Built for teams who run mission-critical Aeron installations and need fast, accurate fault diagnosis.

Lucern dashboard showing fleet sidebar, node detail, stream status, and Aeron counters

The Challenge

Aeron is a high-performance, low-latency messaging library trusted by trading firms, financial exchanges, and latency-critical systems worldwide. It is fast and reliable, but it can be daunting to troubleshoot.

When something goes wrong — a stream falling behind, a sender being flow-controlled, packets being retransmitted — the evidence is buried in counters, error logs, and loss reports spread across multiple nodes. Standard infrastructure monitoring tools see CPU and memory; they do not speak Aeron. Diagnosing a backpressure event or identifying which subscriber is causing a flow-control problem requires deep expertise and manual counter inspection across every node in the fleet.

Lucern exists to close that gap.

One binary. Just add Aeron.

Lucern is a single Go binary with an embedded web UI. Drop it alongside your Aeron media driver. Point it at the Aeron directory. Start it.

./lucern -aeron-dir /dev/shm/aeron
  • No agents
  • No separate database
  • No collector infrastructure
  • No persistent state to manage

Gossip starts automatically, seeded from the CnC channel data your drivers already produce. Within seconds, other Lucern nodes on the same network appear in the fleet sidebar, with no additional configuration required.

Built for all your teams

Lucern is built to assist experts and to provide expert guidance to everyone else.

Aeron engineers

Everything you need, right in front of you

Lucern puts the full raw picture from every node in your fleet onto one screen — no SSH, no manual counter polling, no scripting across hosts.

  • All Aeron counters, loss reports, and error logs from every node, easily visible
  • Traffic-light indicators surface which nodes have active events — you know where to look before you click
  • Counter deltas calculated automatically on every refresh cycle — rate of change, not raw absolute values
  • Changed counters highlighted each tick, so drift is visible the moment it happens
  • Per-stream sender gap, consumer lag, NAK rates, backpressure events, and image-reset counts — the full picture per stream, per node
  • Regex filtering on counters, streams, and loss entries — find the signal in the noise instantly
  • Individually acknowledgeable loss and error entries, so new events always stand out against known noise

Operations & support

Answers without needing Aeron expertise

You don't need to know what a NAK rate is to know something is wrong. Lucern translates Aeron's internals into a clear health picture with enough context to act.

  • At-a-glance traffic-light health across the entire fleet — green is good, anything else tells you which node to open
  • Every indicator is labelled in plain language with an explanation of what it means and why it matters
  • Topology map shows how data flows between nodes — publishers, subscribers, and the channels connecting them — so you can reason about dependencies and data flows
  • Shareable URLs open directly to the same node and view you're looking at — send an expert exactly what you see
  • Integrated AI assistant handoff explains what the counters and events are telling you, in plain English — no Aeron background required to get started

Aeron-native fault detection

Six per-node traffic-light indicators give you an at-a-glance health view. Each has a configurable lookback window, so a brief event stays visible long enough to be noticed instead of being silently cleared by the next poll cycle.

Loss

Packet loss entries in the driver loss report. Any loss on a latency-critical stream warrants immediate attention.

Errors

Media driver error log entries indicating conditions the driver itself flagged as abnormal.

Backpressure

The sender is being flow-control limited. The publisher is writing faster than the downstream system can consume.

Flow Control

Flow control under-runs or over-runs are occurring between publisher and subscriber.

NAK

Retransmission requests are being sent or received, indicating the receiver has detected missing data.

Stream Lag

One or more streams are falling behind their publishers. Configurable warning and critical thresholds fire before recovery becomes impossible.
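
Warning and critical thresholds map a lag measurement onto the three light colours. This sketch assumes lag is expressed as a percentage of the term buffer; the `LagThresholds` type and the 25% and 75% values are hypothetical, not Lucern's defaults.

```go
package main

import "fmt"

// Light is a traffic-light state.
type Light int

const (
	Green Light = iota
	Amber
	Red
)

func (l Light) String() string {
	return [...]string{"green", "amber", "red"}[l]
}

// LagThresholds holds hypothetical warning and critical limits,
// expressed as a percentage of the term buffer length.
type LagThresholds struct {
	WarnPct     float64
	CriticalPct float64
}

// Classify maps a lag percentage onto a traffic light. Critical is checked
// first so the most severe matching state always wins.
func (t LagThresholds) Classify(lagPct float64) Light {
	switch {
	case lagPct >= t.CriticalPct:
		return Red
	case lagPct >= t.WarnPct:
		return Amber
	default:
		return Green
	}
}

func main() {
	th := LagThresholds{WarnPct: 25, CriticalPct: 75}
	for _, pct := range []float64{5, 40, 90} {
		fmt.Printf("lag %.0f%% -> %v\n", pct, th.Classify(pct))
	}
}
```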

Two fleet nodes showing traffic-light indicators with active loss and flow-control events

Per-stream diagnostics

Health is broken down to the individual stream level. For every active send and receive stream:

  • Sender gap between publisher position and wire position
  • Receiver lag percentage against the configured term buffer
  • Live throughput in bytes per second
  • Per-stream backpressure events, sender and receiver NAKs, and image-reset activity
  • Traffic-light badges that sort active problems to the top

Stream status table showing lag percentage, position gaps, and bytes per second

Fleet-wide topology

Lucern nodes discover each other automatically via a gossip protocol seeded from CnC channel data. No service registry required. No central coordinator.

  • Interactive topology map — selected node at the centre, peers in orbit
  • Send-channel connections drawn as labelled lines
  • Traffic-light status visible for every node in the map
  • Click any node to switch instantly to its perspective
  • Driver liveness and status at a glance
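
Gossip-based membership commonly converges by merging peer tables and keeping the freshest entry per node. The sketch below shows that generic heartbeat-merge technique; it is not Lucern's actual wire protocol, and `PeerState` and `merge` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// PeerState is one node's entry in a gossiped membership table.
// Only the owning node ever increments its own Heartbeat.
type PeerState struct {
	Heartbeat uint64
	Status    string // e.g. "green", "amber", "red"
}

// merge folds a peer's table into the local one, keeping whichever entry
// has the higher heartbeat. Because only the owner bumps its heartbeat,
// applying the same update twice or out of order is harmless.
func merge(local, remote map[string]PeerState) {
	for id, r := range remote {
		if l, ok := local[id]; !ok || r.Heartbeat > l.Heartbeat {
			local[id] = r
		}
	}
}

func main() {
	a := map[string]PeerState{
		"node-a": {Heartbeat: 10, Status: "green"},
		"node-b": {Heartbeat: 3, Status: "green"},
	}
	b := map[string]PeerState{
		"node-b": {Heartbeat: 5, Status: "amber"},
		"node-c": {Heartbeat: 1, Status: "green"},
	}
	merge(a, b)
	ids := make([]string, 0, len(a))
	for id := range a {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	for _, id := range ids {
		fmt.Println(id, a[id].Heartbeat, a[id].Status)
	}
}
```

Each node periodically sends its table to a few peers; newer heartbeats spread until every node holds the same fleet view.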

Fleet topology map showing nodes connected by stream channels
Fleet sidebar showing all nine nodes with their traffic-light status

More capabilities

Aeron Cluster support

Surfaces cluster state on each node: election state, node role (Leader, Follower, or Candidate), and consensus module state. Traffic-light warnings fire on elections, cluster errors, or work-cycle threshold breaches.
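
The three cluster warning triggers can be expressed as simple rules over a status snapshot. This sketch assumes the values have already been read from the node; the `ClusterStatus` fields and `clusterWarnings` are hypothetical. The election state names (`CLOSED`, `CANVASS`) follow Aeron Cluster's election states.

```go
package main

import "fmt"

// ClusterStatus is a hypothetical snapshot of the cluster values shown per node.
type ClusterStatus struct {
	ElectionState     string // "CLOSED" when no election is in progress
	Role              string // "LEADER", "FOLLOWER" or "CANDIDATE"
	ModuleState       string // consensus module state, e.g. "ACTIVE"
	Errors            int64  // cluster errors since the last check
	CycleThresholdHit int64  // work-cycle threshold breaches since the last check
}

// clusterWarnings lists the reasons a node's cluster light should not be
// green: an election in progress, cluster errors, or work-cycle breaches.
func clusterWarnings(s ClusterStatus) []string {
	var reasons []string
	if s.ElectionState != "CLOSED" {
		reasons = append(reasons, "election in progress: "+s.ElectionState)
	}
	if s.Errors > 0 {
		reasons = append(reasons, fmt.Sprintf("%d cluster error(s)", s.Errors))
	}
	if s.CycleThresholdHit > 0 {
		reasons = append(reasons, fmt.Sprintf("%d work-cycle threshold breach(es)", s.CycleThresholdHit))
	}
	return reasons
}

func main() {
	healthy := ClusterStatus{ElectionState: "CLOSED", Role: "FOLLOWER", ModuleState: "ACTIVE"}
	fmt.Println(len(clusterWarnings(healthy))) // 0
	electing := ClusterStatus{ElectionState: "CANVASS", Role: "CANDIDATE", ModuleState: "ACTIVE", Errors: 2}
	for _, r := range clusterWarnings(electing) {
		fmt.Println(r)
	}
}
```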

Cluster status showing election state CLOSED, node role FOLLOWER, consensus module ACTIVE

Acknowledgement system

Loss and error entries are individually acknowledgeable by stable ID. Once acknowledged, an entry no longer counts toward the traffic-light indicators; only a genuinely new event will light them again. Acknowledgements reset on restart, so no historical backlog can silently suppress a current problem.
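
An in-memory set of acknowledged IDs is enough to give this behaviour, including the reset on restart. The sketch below assumes each entry already carries a stable ID; how that ID is derived, and the `AckStore` type itself, are hypothetical.

```go
package main

import "fmt"

// Entry is a loss or error entry with a stable identifier. How the ID is
// derived (for example a hash of identifying fields) is an assumption.
type Entry struct {
	ID      string
	Message string
}

// AckStore keeps acknowledged IDs in memory only, so a restart begins with
// an empty set and every current entry counts again.
type AckStore struct {
	acked map[string]struct{}
}

func NewAckStore() *AckStore { return &AckStore{acked: make(map[string]struct{})} }

// Ack marks one entry as acknowledged.
func (s *AckStore) Ack(id string) { s.acked[id] = struct{}{} }

// Unacknowledged returns how many entries should still drive the traffic light.
func (s *AckStore) Unacknowledged(entries []Entry) int {
	n := 0
	for _, e := range entries {
		if _, ok := s.acked[e.ID]; !ok {
			n++
		}
	}
	return n
}

func main() {
	store := NewAckStore()
	entries := []Entry{{ID: "loss-1", Message: "known noise"}, {ID: "loss-2", Message: "under investigation"}}
	store.Ack("loss-1")
	fmt.Println(store.Unacknowledged(entries)) // 1
	entries = append(entries, Entry{ID: "loss-3", Message: "new event"})
	fmt.Println(store.Unacknowledged(entries)) // 2
}
```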

Shareable URLs

The selected fleet node and active tab are reflected in the browser URL at all times. Copy the link, share it with a colleague, and they open directly to the same node and view.
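
Encoding the view in the query string is what makes a link reproduce the same screen. This sketch uses Go's standard `net/url` package; the `node` and `tab` parameter names, the `View` type, and the host are hypothetical, not Lucern's actual URL scheme.

```go
package main

import (
	"fmt"
	"net/url"
)

// View is the UI state captured in the URL. The query parameter names
// "node" and "tab" are hypothetical, chosen for illustration.
type View struct {
	Node string
	Tab  string
}

// shareLink encodes the view into the query string of base.
func shareLink(base string, v View) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	q := u.Query()
	q.Set("node", v.Node)
	q.Set("tab", v.Tab)
	u.RawQuery = q.Encode() // Encode sorts keys, giving a stable link
	return u.String(), nil
}

// parseView restores the view from a shared link.
func parseView(link string) (View, error) {
	u, err := url.Parse(link)
	if err != nil {
		return View{}, err
	}
	q := u.Query()
	return View{Node: q.Get("node"), Tab: q.Get("tab")}, nil
}

func main() {
	link, err := shareLink("http://lucern.example:8080/", View{Node: "node-3", Tab: "streams"})
	if err != nil {
		panic(err)
	}
	fmt.Println(link) // http://lucern.example:8080/?node=node-3&tab=streams
	v, _ := parseView(link)
	fmt.Println(v.Node, v.Tab)
}
```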

AI-assisted debug

An integrated AI assistant can be invoked on any node to help diagnose what the counters and events are telling you — useful when the symptom is clear but the cause is not.

Loss report table showing per-stream loss entries with acknowledgement controls

AI diagnostic analysis output for a Lucern node showing stream topology and fault indicators

Full REST API

Every piece of data Lucern shows in the UI is also available over a REST API, documented by an OpenAPI 3.0 schema served live from the instance. Integrate with your own dashboards, alerting pipelines, or runbook automation tools using the same data the UI consumes.

Fleet Streams view showing all streams across the fleet with publisher and subscriber details

Design principles

Aeron-native

Reads the CnC file through the Aeron C library — the same path the driver itself uses. Loss reports and error logs come from the actual files on disk. Nothing is inferred or approximated.

Read-only Aeron access

Lucern does not, and cannot, interfere with Aeron: it is engineered to require only read-only access to Aeron media driver data.

Fleet-first

Gossip is core to how Lucern works. Every node contributes its local view to a consistent fleet-wide picture, visible from any single node.

Fast, simple and lightweight by design

No external data storage is required. Counter history is held in an in-memory ring buffer, with no disk writes and no database. Pair Lucern with your existing monitoring and alerting system for persistence.
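
A ring buffer gives bounded, constant memory for history: once full, each new sample overwrites the oldest. Below is a minimal generic sketch (Go 1.18 or later); the `Ring` type is a hypothetical name, and Lucern's own buffer layout is not documented here.

```go
package main

import "fmt"

// Ring is a fixed-capacity, in-memory history buffer. Once full, each new
// sample overwrites the oldest, so memory use stays constant.
type Ring[T any] struct {
	buf   []T
	next  int // index the next sample will be written to
	count int // number of valid samples, never more than len(buf)
}

// NewRing allocates a buffer holding up to capacity samples.
func NewRing[T any](capacity int) *Ring[T] {
	if capacity <= 0 {
		panic("ring capacity must be positive")
	}
	return &Ring[T]{buf: make([]T, capacity)}
}

// Push appends a sample, overwriting the oldest when full.
func (r *Ring[T]) Push(v T) {
	r.buf[r.next] = v
	r.next = (r.next + 1) % len(r.buf)
	if r.count < len(r.buf) {
		r.count++
	}
}

// Snapshot returns the stored samples, oldest first.
func (r *Ring[T]) Snapshot() []T {
	out := make([]T, 0, r.count)
	start := (r.next - r.count + len(r.buf)) % len(r.buf)
	for i := 0; i < r.count; i++ {
		out = append(out, r.buf[(start+i)%len(r.buf)])
	}
	return out
}

func main() {
	h := NewRing[int64](3)
	for _, v := range []int64{10, 20, 30, 40} {
		h.Push(v)
	}
	fmt.Println(h.Snapshot()) // [20 30 40]
}
```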

Resilient to driver changes

If a media driver restarts or is upgraded, Lucern detects the change on the next tick and updates accordingly. There is no stale state to clear manually.
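
Restart detection reduces to comparing a driver identity between ticks and discarding cached baselines when it changes. In this sketch, using the driver PID plus a start timestamp as that identity is an assumption, and `DriverIdentity` and `Tracker` are hypothetical names.

```go
package main

import "fmt"

// DriverIdentity is what a monitor might compare between ticks to spot a
// restart. PID plus start timestamp is an assumption for this sketch.
type DriverIdentity struct {
	PID         int64
	StartTimeMs int64
}

// Tracker remembers the identity seen on the previous tick.
type Tracker struct {
	last  DriverIdentity
	known bool
}

// Observe returns true when the identity differs from the previous tick,
// signalling that cached per-driver state (such as counter baselines used
// for deltas) should be discarded and rebuilt.
func (t *Tracker) Observe(id DriverIdentity) bool {
	changed := t.known && id != t.last
	t.last, t.known = id, true
	return changed
}

func main() {
	var tr Tracker
	fmt.Println(tr.Observe(DriverIdentity{PID: 4242, StartTimeMs: 1000})) // false: first sighting
	fmt.Println(tr.Observe(DriverIdentity{PID: 4242, StartTimeMs: 1000})) // false: unchanged
	fmt.Println(tr.Observe(DriverIdentity{PID: 5151, StartTimeMs: 9000})) // true: driver restarted
}
```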

Configuration is transparent

Every option Lucern supports — lookback windows, lag thresholds, refresh interval, gossip parameters — is visible at /api/v1/config on any running instance.

Tested with real faults

Lucern is tested by creating real faults on demand and validating its response across the fleet.

Interested in Lucern?

Contact JMIPS to discuss licensing, deployment, and how Lucern fits your Aeron environment.
