Mistral 7B Moves In: A Local LLM Joins the Stack

There is a new layer running quietly behind this site.
It is not visible in the navigation. It does not have a dashboard. It does not ask for API keys or track token usage. It lives in a small Python script, talks to a local model server, and writes into the same MariaDB tables that power this site.
It runs on my own hardware.
Everything at https://dangerousmetrics.com/llm-editorial (opens in a new tab), which runs inside the same application stack as this site, is generated by the local LLM. The layout is hand-built. For now.
The model behind it is Mistral 7B. Seven billion parameters is small by current industry spectacle standards, but it is large enough to reason, summarize, analyze, and adopt a tone when properly constrained. Running through Ollama on a local machine, it responds in milliseconds. There are no usage meters. No per-token accounting. No external dependency beyond the model weights sitting on disk.
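A call to that local server is a single HTTP request. The sketch below assumes Ollama's default endpoint on port 11434 and a model tagged `mistral`; everything past the payload helper is plain plumbing, not anything specific to this site's workers.

```python
import json
import urllib.request

# Ollama's default local endpoint; no API key, no metering.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(prompt: str, model: str = "mistral") -> dict:
    """Assemble a non-streaming generate request for the local Ollama server."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt: str, model: str = "mistral") -> str:
    """POST the prompt to the local server and return the model's text reply."""
    data = json.dumps(build_payload(prompt, model)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Because the whole round trip stays on one machine, `generate` can be called from a loop, a scheduler, or an interactive session without any external accounting.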
That changes how you experiment.
When you are not watching a cloud bill climb with every prompt, you stop thinking in terms of cost per request and start thinking in terms of system design. You can call the model from cron every minute. You can test five personas in a row. You can refine a prompt, rerun it, and rerun it again without negotiating with a pricing page.
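Calling the model from cron every minute is literally one crontab line; the script path and log file here are illustrative, not the actual paths behind this site.

```shell
# Run a worker against the local model every minute (paths are illustrative)
* * * * * /usr/bin/python3 /opt/workers/headline_satire.py >> /var/log/llm_workers.log 2>&1
```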
The result is not novelty content. It is infrastructure.
Right now there are multiple small Python workers. One writes deadpan institutional satire based on headlines. One reviews internal changelog entries and produces structured reflections. Another explains raw log lines in a restrained but slightly intense security voice. Each script shares the same core mechanics: build a disciplined prompt, call the local model, parse the output, store it in the database, and render it through the existing Next.js front end.
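The prompt-build and parse steps of that pipeline can be sketched as two small functions. The JSON output contract, the field names, and the `editorial_entries` table in the storage helper are all assumptions for illustration, not the site's actual schema.

```python
import json

def build_prompt(persona: str, source_text: str) -> str:
    """Compose a disciplined prompt: a fixed persona preamble plus the raw input."""
    return (
        f"{persona}\n\nInput:\n{source_text}\n\n"
        'Respond only with JSON: {"title": ..., "body": ...}'
    )

def parse_output(raw: str) -> dict:
    """Parse the model's reply, rejecting anything that is not the expected shape."""
    doc = json.loads(raw)
    if not {"title", "body"} <= doc.keys():
        raise ValueError("model output missing required fields")
    return doc

def store_entry(cursor, doc: dict) -> None:
    """Write the parsed entry into the table the front end reads (name assumed)."""
    cursor.execute(
        "INSERT INTO editorial_entries (title, body) VALUES (%s, %s)",
        (doc["title"], doc["body"]),
    )
```

Keeping the parse step strict matters more locally than in the cloud: reruns are free, so a worker can simply discard malformed output and try again rather than storing garbage.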
The front end does not know or care whether the text came from me or from a local model. It reads from the same tables. The layout, the index pages, the slug routes, the editorial grid: they all treat the output as first-class content.
That is the interesting part.
This is not bolted-on AI. It is wired into the content layer of the site.
Because it runs locally, it is predictable. I know the hardware. I know the memory limits. I know the latency envelope. I can tune temperature and top_p without worrying about rate limits. I can throttle it. I can schedule it. I can let it observe my own changelog entries and generate reflective commentary within minutes of a deployment.
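With Ollama, those sampling knobs travel inside the request itself rather than through an account dashboard. A minimal sketch, assuming the `options` block of Ollama's generate API and a model tagged `mistral`; the default values shown are illustrative:

```python
def sampling_options(temperature: float = 0.7, top_p: float = 0.9) -> dict:
    """Build the options block Ollama accepts on generate requests."""
    return {"temperature": temperature, "top_p": top_p}

def payload_with_options(prompt: str, model: str = "mistral", **knobs) -> dict:
    """A generate payload with explicit sampling settings instead of server defaults."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": sampling_options(**knobs),
    }
```

Because nothing rate-limits the experiments, it is practical to sweep temperature values per persona and keep whichever setting holds the tone best.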
Over time, this will become more disciplined.
Prompts will move into structured instruction files. Personas will be versioned. Guardrails will be tightened. Some scripts will merge into a unified framework. Others will remain intentionally separate to preserve tone boundaries. Eventually, certain workers will pull directly from OpenSearch instead of from CLI input, translating operational telemetry into readable summaries.
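One plausible shape for those structured, versioned instruction files is a small JSON document with the persona and its version stamp side by side. The file format and field names here are hypothetical, a sketch of the direction rather than the finished framework:

```python
import json
from pathlib import Path

def load_persona(path: str) -> dict:
    """Load a versioned persona file and refuse anything without a version stamp."""
    doc = json.loads(Path(path).read_text(encoding="utf-8"))
    for field in ("name", "version", "instructions"):
        if field not in doc:
            raise ValueError(f"persona file missing {field!r}")
    return doc
```

Versioning the persona alongside its instructions means a change in tone is a diff in the repository, not a silent drift inside a script.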
None of this requires a cloud subscription.
It requires a machine, a model, and a willingness to treat language as part of your stack.
The long-term goal is not volume. It is coherence. If this works properly, the site will slowly develop a self-documenting layer. Changelog entries will be reviewed. Log lines will be interpreted. Structural changes will be reflected on. Not in a breathless way, but in a disciplined, system-aware way.
As the framework stabilizes, this entire layer will become part of the Anvil Tech Doc Series. The documentation will not just describe how to wire a model into a site. It will describe how to control tone, prevent hallucination, design personas, structure database writes, and run a local LLM as a first-class service without drifting into noise.
There is a difference between adding AI to a site and integrating a language model into your infrastructure.
This is the latter.
Right now it is a collection of small scripts and careful prompts. Over time, it will be a permanent part of the architecture.