CGNAT Logs: An Idea

2/1/2026

In the previous post, we established that CGNAT logs are not observability data. They are not metrics. They are legal artifacts generated at absurd scale.

Once you accept that, the design goal changes.

The goal is no longer “search everything instantly.”
The goal is “retain everything cheaply and answer specific questions reliably.”

What follows is not theoretical. This system already exists. It has been built, tested, and validated on simulated CGNAT data. It works because it respects the shape of the problem.

This is how to treat CGNAT logs properly.

Step 1: Stop thinking in indices, start thinking in slices

CGNAT data is time-series data, but not in the way metrics are.

Each record is immutable.
Each record belongs to a precise time window.
Most records are never touched again.

That makes time slicing the natural primitive.

Logs are written to disk in fixed time slices, one hour being a common and practical choice. The exact slice size is adjustable based on workload, flow rate, and operational preferences.

Each slice is a flat file.
Append only.
No indexing.
No mutation.

Slices are stored in date-based directory structures so navigation is deterministic and cheap.

For example:

Nothing clever. Nothing fragile.
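
One way such a layout could look, as a minimal sketch. The root directory, the year/month/day/hour naming, and both function names are assumptions made for illustration; they are not taken from the system described here.

```python
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical root directory; pick whatever fits your storage layout.
ROOT = Path("/data/cgnat")


def slice_path(ts: datetime, root: Path = ROOT) -> Path:
    """Map a record timestamp to the hourly slice file that holds it."""
    # Normalize to UTC so slice boundaries never move with local time changes.
    ts = ts.astimezone(timezone.utc)
    return root / f"{ts:%Y}" / f"{ts:%m}" / f"{ts:%d}" / f"{ts:%H}.log"


def append_record(line: str, ts: datetime, root: Path = ROOT) -> Path:
    """Append one already-formatted record to its slice file."""
    path = slice_path(ts, root)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" only ever adds bytes at the end; existing records are never rewritten.
    with path.open("a", encoding="utf-8") as f:
        f.write(line.rstrip("\n") + "\n")
    return path
```

Under these assumptions, a record stamped 2026-02-01 13:05 UTC lands in /data/cgnat/2026/02/01/13.log, and finding the slice for any timestamp is a string format, not a lookup.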

Step 2: Disk is the system of record

These slices live on disk. Preferably boring disk.

ZFS mirrors are ideal here because they give you exactly what CGNAT needs:

predictable write behavior
data integrity
cheap redundancy
simple replication

This data can be:

mirrored locally
sharded across hosts
fully replicated to secondary hosts

The system does not care. Files are files.

There is no cluster state. No quorum. No heap. No leader election.

Disk holds the truth.

Step 3: Searching is a compute problem, not a storage problem

This is the critical mental shift.

Instead of pre-indexing everything, you defer computation until the moment you actually need an answer.

When a query is required, a dedicated program is invoked.

That program does exactly three things:

Identifies which time slices are relevant
Assigns slices to worker processes
Scans them in parallel

You tell it how many cores to use.

Eight cores will work.
Forty cores will work faster.
There is no query timeout to tune, because no worker is waiting on a shared heap or a coordination lock.

This is embarrassingly parallel work, and the system treats it that way.
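
The three steps above can be sketched in a few lines. This is a single-host illustration under stated assumptions, not the actual program: it assumes hourly slices named YYYY/MM/DD/HH.log under a root directory, one record per line, and a naive substring match. All function names are hypothetical.

```python
from concurrent.futures import ProcessPoolExecutor
from datetime import datetime, timedelta
from pathlib import Path


def relevant_slices(root: Path, start: datetime, end: datetime) -> list[Path]:
    """Step 1: list the hourly slice files that overlap [start, end]."""
    hour = start.replace(minute=0, second=0, microsecond=0)
    paths = []
    while hour <= end:
        p = root / f"{hour:%Y/%m/%d/%H}.log"
        if p.exists():  # hours with no traffic simply have no file
            paths.append(p)
        hour += timedelta(hours=1)
    return paths


def scan_slice(job: tuple[Path, str]) -> list[str]:
    """Step 3: linear scan of one slice, keeping lines that contain the needle."""
    path, needle = job
    with path.open("r", encoding="utf-8", errors="replace") as f:
        # Substring match only; a real query would parse fields and compare exactly.
        return [line.rstrip("\n") for line in f if needle in line]


def search(root, start, end, needle, workers=8, executor_cls=ProcessPoolExecutor):
    """Step 2: hand slices to a pool of workers and collect hits in slice order."""
    jobs = [(p, needle) for p in relevant_slices(root, start, end)]
    with executor_cls(max_workers=workers) as pool:
        # map() returns results in the order the jobs were submitted.
        return [hit for hits in pool.map(scan_slice, jobs) for hit in hits]
```

Each slice is an independent job with no shared state, which is why raising workers from 8 to 40 needs no other change. On platforms that start worker processes by spawning, the call to search belongs under an `if __name__ == "__main__":` guard.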

If the data exists on multiple hosts, the program is topology-aware and enlists those hosts as well. Their CPUs become part of the query.

The logs stay where they are. The compute goes to the data.

Step 4: Deterministic performance instead of fragile performance

This approach has a property most search stacks do not.

Performance is linear and predictable, for as long as the disks can keep the cores fed.

More cores means faster completion
More hosts means faster completion
More data means proportionally more time
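
Those three relationships fit in one line of arithmetic. The sketch below is an idealized back-of-envelope model, not a benchmark: it assumes storage can feed every core, and the per-core scan rate is a placeholder you would measure on your own hardware.

```python
def estimated_scan_seconds(total_bytes: float, hosts: int, cores_per_host: int,
                           bytes_per_core_per_sec: float) -> float:
    """Idealized model: time = data / (hosts * cores per host * per-core scan rate)."""
    return total_bytes / (hosts * cores_per_host * bytes_per_core_per_sec)


# Illustrative numbers only: 1 TB of slices, 200 MB/s per core (an assumed rate).
print(estimated_scan_seconds(1e12, hosts=1, cores_per_host=8, bytes_per_core_per_sec=2e8))   # 625.0
print(estimated_scan_seconds(1e12, hosts=1, cores_per_host=40, bytes_per_core_per_sec=2e8))  # 125.0
```

Double the data and the estimate doubles. Double the cores or the hosts and it halves. That is the whole performance model.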

Nothing falls off a cliff. Nothing silently degrades.

There is no heap exhaustion.
There are no long GC pauses.
There is no cluster instability.

The worst case is simply “this takes longer.”

That is an acceptable failure mode.

Step 5: Output is a product, not an afterthought

Once the scan completes, the results are materialized.

The output format is selectable:

human readable reports
structured data
CSV
JSON
custom templates
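
A selectable output stage can be very small. The following is a minimal sketch, assuming the scan hands back matches as a list of dictionaries; the function name, the format names, and the field names in the usage are hypothetical.

```python
import csv
import io
import json


def render(rows: list[dict], fmt: str) -> str:
    """Serialize a finished result set in the requested format."""
    if fmt == "json":
        return json.dumps(rows, indent=2)
    if fmt == "csv":
        buf = io.StringIO()
        # Column order follows the keys of the first row.
        fields = list(rows[0]) if rows else []
        writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        return buf.getvalue()
    if fmt == "text":
        # One key=value line per record for a quick human-readable report.
        return "\n".join(" ".join(f"{k}={v}" for k, v in r.items()) for r in rows)
    raise ValueError(f"unknown output format: {fmt}")
```

For example, render([{"ts": 1, "public_ip": "203.0.113.7"}], "csv") yields a header line followed by one data line. Because formatting happens after the scan, adding a new format never touches the slices.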

Optionally, the result set can be written into OpenSearch.

This is the correct role for OpenSearch.

It becomes a secondary system used for:

aggregation
visualization
short-term analysis
dashboards

OpenSearch never holds the full CGNAT corpus. It holds answers, not raw truth.
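
Handing a result set to OpenSearch can be sketched as building a request body for its bulk API, which takes newline-delimited JSON: one action line, then one document line, per record. The function name and index name below are hypothetical, and actually sending the body (a POST to the `_bulk` endpoint) is left out.

```python
import json


def bulk_body(rows: list[dict], index: str) -> str:
    """Build a newline-delimited JSON body for an OpenSearch _bulk request."""
    lines = []
    for row in rows:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(row))                           # document line
    # The bulk API expects every line, including the last, to end with a newline.
    return "".join(line + "\n" for line in lines)
```

Only the rows that answered the question are sent, so the index size follows the number of answers, not the size of the corpus.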

Step 6: Preprocessing beats aggregation every time

Monthly reports are where most systems break.

Trying to aggregate a month of CGNAT data directly in OpenSearch will fail, not because OpenSearch is bad, but because the workload is wrong.

This data requires preprocessing.

The system already supports that.

You can:

compute daily rollups
compute hourly summaries
extract trend metrics

Those summaries are small, stable, and perfect for OpenSearch dashboards.
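
An hourly summary might be computed along these lines. This is a sketch under a loudly stated assumption: records are comma-separated lines whose first two fields are an epoch timestamp and a subscriber IP. That record format, the function name, and the output field names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone


def hourly_rollup(lines) -> list[dict]:
    """Reduce raw records to one small summary row per hour."""
    flows = defaultdict(int)        # hour -> number of records
    subscribers = defaultdict(set)  # hour -> distinct subscriber IPs seen
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        # Assumed format: epoch_seconds,subscriber_ip,<remaining fields ignored>
        epoch, subscriber_ip, *_ = line.strip().split(",")
        hour = datetime.fromtimestamp(int(epoch), tz=timezone.utc).strftime("%Y-%m-%dT%H:00Z")
        flows[hour] += 1
        subscribers[hour].add(subscriber_ip)
    return [
        {"hour": h, "flows": flows[h], "unique_subscribers": len(subscribers[h])}
        for h in sorted(flows)
    ]
```

A month of hourly rows is a few hundred small documents, which is a workload OpenSearch handles comfortably.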

Raw data stays on disk.
Derived data goes to search.

That separation is the whole game.

Step 7: Why this works

This design succeeds because it aligns with reality.

CGNAT logs are:

write once
read rarely
immutable
time-bounded

The system treats them exactly that way.

There is no fantasy about “instant search across everything forever.”
There is no pretending this is observability data.
There is no unnecessary indexing tax.

Just files, CPU, and math.

The real takeaway

This is not clever engineering. It is disciplined engineering.

CGNAT logging punishes systems that try to be elegant.
It rewards systems that are boring, parallel, and honest about scale.

Store everything cheaply.
Compute only when needed.
Index only the answers.

That is how you survive CGNAT at rural scale without lighting money on fire.

Part 3 -> If I Were To Build It Today