Oleksii Siniaiev
Articles 19 min read June 13, 2026

A month with an AI journal: finding patterns in sleep, stress, and workouts

How to analyze an AI journal after the first month: transcription fixes, honest reflection with cited sources, Obsidian dashboards, cost, and privacy.


TL;DR

  • The value of an AI journal doesn’t show up on install day, it shows up a month later, when the entries start adding up into patterns: sleep, stress, workouts, mood.
  • The agent doesn’t “remember the whole vault” automatically: an honest monthly reflection requires explicitly assembling context from your files, and I show how.
  • The links you find are hypotheses to watch and questions for a specialist, not diagnoses.

On the first evening after setup, I opened the journal folder several times to make sure the file had really appeared. The next day I checked the commit. A week later I stopped looking: the voice message went through, the entry was saved, so everything was working. That was a good sign. The tool had finally disappeared from my attention and left only the habit.

In the first article I described how an old gaming laptop became a home AI server, and in the second we built the system from scratch. This article is about what happens next, once the “it works!” thrill wears off but the 9 PM evening reminder keeps arriving: daily operation, and AI journal analysis once you’ve accumulated enough entries. This is where the journal’s real life begins, and it’s exactly what decides whether it stays with you or dies like every tracker before it.

What changes after the first month

The first week, you play with the system: you send voice notes, watch the vault grow, show the bot screenshots. By the third week the novelty has evaporated, and what’s left is the bare mechanics of a habit: a buzz at 9 PM, forty seconds of dictation, a confirmation, done. At that point the system’s most important quality isn’t the model’s intelligence, it’s the absence of friction. If I had to open an app and fill out a form, I’d have quit already.

And a month in, the quiet shift the whole thing was built for happens: there are enough entries that they start answering questions. This is the moment life logging turns from accumulation into analysis. Not “how was today,” you know that already, but “why am I wrung out by Thursday two weeks in a row.” A single day has no answer to that. Thirty days do.

Let me show it with a composite example from my own practice, with the details changed so I’m not publishing real entries. Three nights in a row of broken sleep: Wednesday, Thursday, Friday. On their own each night looked like a fluke: it was hot, I stayed up for one more episode, I just couldn’t sleep. Then on Saturday a workout was unexpectedly harder than usual. In the moment it looked like “a bad day at the gym.” In the journal, the accumulated sleep debt became a working hypothesis I could test against the following weeks, not a ready-made cause.

One other entry from that same week stuck with me. I dictated: “I don’t feel stressed, but everything is annoying me.” The system didn’t log high stress, because the scale anchors (we defined them in the second article) tell these states apart, and the reflection surfaced a phrasing I’ve remembered ever since: sleep debt eats your emotional buffer. Stress low, irritability high, and those are different metrics with different causes. No health app with a “rate your stress from 1 to 10” form would ever have shown me that difference.

A daily workflow without perfect data

A realistic picture of my day with the journal:

  • in the morning after the gym: a workout screenshot from Hevy (or any other tracker, if your stack is different);
  • during the day: nothing, the system stays quiet;
  • in the evening at 9 PM: a reminder, a 40 to 60 second voice note, sometimes a calorie summary from YAZIO or a similar app;
  • after the note: one short follow-up from the bot, if I forgot something required.

That last point matters more than it looks. If I didn’t mention my weight, the bot asks exactly one thing: “You didn’t mention your weight. Did you weigh yourself today?” Not a questionnaire, not a list of five missed fields, one question about the single most important one. If I don’t answer, the field stays empty with a needs_review flag, and that’s correct: an empty value is more honest than an invented one. When I build a weight chart a month later, I want to see the holes in the data, not zeros the model politely filled in.
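The "one question, then an honest empty field" rule can be sketched in a few lines. This is a minimal illustration, not the skill's actual code: the field names, their priority order, and the wording of the second and third questions are assumptions made for the example.

```python
from typing import Optional

# Hypothetical priority order: the first missing field wins the single follow-up.
REQUIRED_FIELDS = ["weight_kg", "sleep_hours", "mood"]

# Hypothetical question texts, one per required field.
QUESTIONS = {
    "weight_kg": "You didn't mention your weight. Did you weigh yourself today?",
    "sleep_hours": "You didn't mention your sleep. How many hours did you get?",
    "mood": "You didn't mention your mood. How was it today?",
}


def pick_follow_up(entry: dict) -> Optional[str]:
    """Return at most one question: the highest-priority missing field."""
    for name in REQUIRED_FIELDS:
        if entry.get(name) is None:
            return QUESTIONS[name]
    return None


def finalize(entry: dict) -> dict:
    """Copy the entry, keep missing values as None, and flag it for review.

    Nothing is invented: a missing number stays missing.
    """
    result = dict(entry)
    missing = [name for name in REQUIRED_FIELDS if result.get(name) is None]
    for name in missing:
        result[name] = None
    result["needs_review"] = bool(missing) or bool(result.get("needs_review"))
    return result
```

If the question goes unanswered, `finalize` is all that runs: the chart a month later gets a gap, not a zero.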

And don’t try to dictate perfectly. My evening note is a stream of consciousness: about sleep, about work, about the delivery driver who didn’t bring the order so I had to walk for half an hour. That note, by the way, became my favorite example of why a journal needs free text: instead of a complaint, what stayed in the entry was the reframe “but hey, I closed my step goal and called Grandma on the way.” Metrics would never have caught that. A journal that stores only numbers loses half of life.

How to fix transcription errors

The local faster-whisper sometimes errs systematically: the same names, exercises, and terms get misrecognized the same way every time. The tested configuration uses the medium model on CUDA; the practical way to fix recurring errors is a corrections dictionary at notes/known-asr-errors.md:

Markdown
# Known transcription errors

| Heard | Canonical term | Context | Status |
|---|---|---|---|
| "romanian deadlift" / "rumanian deadlift" | Romanian deadlift | workout | verified |
| "yazi" / "yazio" | YAZIO | nutrition | verified |
| "hevvy" / "heavy" | Hevy | workout | verified |

(The examples are synthetic; your dictionary will fill up with your own errors in the first two weeks.)

The rules that separate a useful dictionary from a dangerous one:

  • only corrections you’ve confirmed yourself make it into the dictionary;
  • a replacement applies by context, not globally: the same “heard” word can mean different things in a conversation about a workout and about people;
  • when confidence is low, the skill asks a clarifying question instead of silently correcting;
  • the raw transcript stays untouched: the correction lives in the normalized summary, and the Git history shows what changed and when.
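The first, second, and fourth rules can be sketched like this (the clarifying question for low-confidence cases is left out). It is an illustration under assumptions, not the skill's implementation: the `Correction` class, the `normalize` function, and the sample rows (including the unconfirmed "mira" row) are invented for the example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Correction:
    heard: str
    canonical: str
    context: str
    status: str


# Synthetic rows in the same shape as the dictionary table.
CORRECTIONS = [
    Correction("rumanian deadlift", "Romanian deadlift", "workout", "verified"),
    Correction("hevvy", "Hevy", "workout", "verified"),
    Correction("yazi", "YAZIO", "nutrition", "verified"),
    Correction("mira", "Myra", "people", "pending"),  # unconfirmed: never applied
]


def normalize(raw: str, context: str, corrections=CORRECTIONS):
    """Return (normalized_text, applied_corrections).

    Only verified rows for the given context are applied, as whole-word,
    case-insensitive matches. The raw transcript string is left as it was.
    """
    text = raw
    applied = []
    for c in corrections:
        if c.status != "verified" or c.context != context:
            continue
        pattern = re.compile(r"\b" + re.escape(c.heard) + r"\b", re.IGNORECASE)
        text, count = pattern.subn(lambda m, rep=c.canonical: rep, text)
        if count:
            applied.append((c.heard, c.canonical, count))
    return text, applied
```

The `applied` list is what makes the change auditable: it can go into the commit message or the normalized summary, next to the untouched raw transcript.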

For my own speech, a dictionary of recurring errors gave me more practical value than simply swapping the model for a bigger one. That’s not a universal benchmark: compare the options on your own set of short recordings, and count not just accuracy but latency too.

Workout screenshots: trust, but verify

The separate vision layer (in my case the OpenAI API, which I call only occasionally) extracts exercises, sets, weights, and duration from a Hevy screenshot. This runs through that auxiliary layer, not through any built-in DeepSeek feature: DeepSeek handles only the text. It sounds like magic and works like an intern: mostly right, but you have to check. Numbers on a screen are where OCR fails most often: it mixes up 60 and 80, drops a decimal point, or merges two sets into one.

So the rules are strict:

  • the original screenshot is always saved to attachments/ alongside the entry;
  • extracted data gets source: ocr and needs_review: true;
  • I confirm it with one word in Telegram, and only then does the status change to verified;
  • if the service offers an official structured export, it’s usually worth preferring it over a screenshot, and separately checking that the fields are complete.
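As a sketch, the first three rules amount to a record that starts out untrusted and changes state only on an explicit confirmation. The class name, the "unverified" status value, the accepted confirmation words, and clearing needs_review on confirmation are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class WorkoutExtraction:
    attachment: str   # path to the saved original screenshot under attachments/
    sets: list        # extracted rows, e.g. {"exercise": ..., "weight_kg": ..., "reps": ...}
    source: str = "ocr"
    needs_review: bool = True
    status: str = "unverified"  # hypothetical initial status


# Hypothetical one-word confirmations accepted from the chat.
CONFIRM_WORDS = {"ok", "yes", "confirm"}


def confirm(record: WorkoutExtraction, reply: str) -> WorkoutExtraction:
    """Mark the record verified only on an explicit confirmation word."""
    if reply.strip().lower() in CONFIRM_WORDS:
        record.status = "verified"
        record.needs_review = False  # assumption: verification clears the flag
    return record
```

Any other reply leaves the record exactly as it was, so a misread number can't slip into the verified data by default.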

Photos of documents: maximum care, minimum interpretation

The most sensitive category: photos of medical documents and lab results. Here the system deliberately does little: it recognizes the values, saves the original, marks everything needs_review, and stops. No interpretations like “this marker is elevated”: without reference ranges, units, and context that’s guesswork, and with them it’s a doctor’s job.

So why keep lab results in the journal at all?

So that at an appointment with a specialist you can open up the timeline: here are six months of values, here’s my sleep and my workouts over the same period, here are my questions. The journal prepares material for the conversation with the doctor, it doesn’t replace it. Before you send the photo, crop out personal identifiers and strip the EXIF metadata.

The weekly summary: a compression layer

You can mechanically hand thirty daily entries to a large-context model, but it’s expensive, noisy, and makes source-checking harder. So there’s an intermediate layer between daily and monthly: a weekly summary that the reflection skill builds from seven daily files.

A good weekly summary has to:

  • compute averages over filled-in values only, and show the denominator: sleep: data for 5/7 days;
  • not substitute zeros for missing values;
  • list recurring patterns instead of retelling every day;
  • cite specific daily entries as sources;
  • suggest questions to watch, not conclusions.

The denominator is the most underrated field. “Average sleep 7.2 hours” over two filled-in days out of seven isn’t a statistic, it’s a fluke with a confident face.
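A minimal sketch of an average that carries its denominator with it. The function name and the output wording are assumptions; the rule it follows is the one above: average over filled-in values only, and always show how many days those were.

```python
from typing import Optional, Sequence


def summarize(values: Sequence[Optional[float]], label: str) -> str:
    """Average over filled-in days only and always report the coverage."""
    filled = [v for v in values if v is not None]
    total = len(values)
    if not filled:
        return f"{label}: no data (0/{total} days)"
    mean = sum(filled) / len(filled)
    return f"{label}: {mean:.1f} avg, data for {len(filled)}/{total} days"


week = [7.5, None, 6.0, 8.0, None, 7.0, 6.5]
print(summarize(week, "sleep"))  # sleep: 7.0 avg, data for 5/7 days
```

Missing days are `None`, never `0`, so they drop out of the average instead of dragging it down.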

The weekly summary was the first moment when the journal spoke in something other than the voice of a single day. A bad Tuesday stopped being the center of the world and became one point among seven. Sometimes that confirmed the feeling of a difficult week; sometimes it showed the opposite: three days were fine, Friday was simply louder in my memory than the rest.

Monthly reflection: AI journal analysis without faked RAG

Now for the main disappointment you need to live through in advance: the agent does not remember your vault. Hermes persistent memory and session search are not an index over your Markdown files. If you just ask “analyze my month,” the model answers based on whatever happens to be left in its conversation memory, and it’ll look convincing and be garbage.

An honest monthly reflection is explicit context assembly. Three working modes, from simplest to most scalable:

Simple: ask the reflection skill to read the files for a date range. It works, but a month of entries will burn through a lot of file-tool calls and tokens.

Recommended: a helper script, build-reflection-context.sh, does deterministic context prep before any model is involved: it gathers daily/2026/06/*.md, parses the YAML, computes metric coverage, and folds compact summaries into a single temporary file, reflection-context-2026-06.md. The model then analyzes only that one artifact.

Scalable: the monthly reflection reads 4 to 5 weekly summaries plus aggregated metrics, and opens individual daily files only to check specific hypotheses.

An iron rule for all three modes: a source reference is only allowed for a file that actually made it into the context. If the reflection says “see the June 12 entry,” that entry was read, not “recalled.” And every reflection starts with an honest header: the date range, the list of loaded files, the coverage.
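To make the "honest header" concrete, here is a Python sketch of the same idea as the context-prep step. It is not the author's shell script: the function names, the metric list, and the deliberately minimal frontmatter parser (flat `key: value` lines only) are assumptions, and a real vault would want a proper YAML parser.

```python
from pathlib import Path

# Hypothetical metric names; adjust to your own frontmatter schema.
METRICS = ["sleep_hours", "energy", "mood", "stress"]


def read_frontmatter(path: Path) -> dict:
    """Parse flat `key: value` pairs between the first two `---` lines."""
    lines = path.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    data = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        value = value.strip()
        # Empty and null-like values count as missing, not as data.
        if sep and value and value not in ("null", "~"):
            data[key.strip()] = value
    return data


def build_context(daily_dir: Path, days_in_month: int) -> str:
    """Assemble one text artifact: loaded files, coverage, compact entries."""
    files = sorted(daily_dir.glob("*.md"))
    rows = [(f.name, read_frontmatter(f)) for f in files]
    out = [f"Loaded files ({len(files)}/{days_in_month} days):"]
    out += [f"- {name}" for name, _ in rows]
    out.append("Coverage:")
    for metric in METRICS:
        filled = sum(1 for _, fm in rows if metric in fm)
        out.append(f"- {metric}: {filled}/{days_in_month}")
    out.append("Entries:")
    for name, fm in rows:
        compact = ", ".join(f"{m}={fm[m]}" for m in METRICS if m in fm)
        out.append(f"- {name}: {compact or 'no metrics'}")
    return "\n".join(out)
```

Because the file list is part of the artifact itself, anything the reflection cites can be checked against it afterwards.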

The agent shouldn’t pretend it read everything

The request pins down the date range, the file list, and the limits of the analysis. If part of the period didn’t make it into the context, that’s stated explicitly in the answer: “this analysis is built on 26 of 30 days.”
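One cheap way to enforce this after the fact is to check the reflection's citations against the list of loaded files. The sketch below assumes that entries are cited by ISO date (for example 2026-06-12), which is an assumption about the output format; both function names are hypothetical.

```python
import re

# Assumption: the reflection cites daily entries by ISO date.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def unloaded_citations(reflection: str, loaded_dates: set) -> list:
    """Return cited dates that were not part of the assembled context."""
    cited = set(DATE_RE.findall(reflection))
    return sorted(cited - loaded_dates)


def coverage_line(loaded_dates: set, days_in_period: int) -> str:
    """The explicit statement of how much of the period was actually read."""
    return f"This analysis is built on {len(loaded_dates)} of {days_in_period} days."
```

A non-empty result from `unloaded_citations` means the model "recalled" something it never read, and that reflection should be regenerated or corrected before you trust it.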

A prompt for analyzing sleep, stress, and workouts

My working monthly-reflection prompt; adapt it to your own needs:

Prompt
Analyze the prepared context for {month}.

Response format:
1. Data coverage: how many days are filled in for each metric (X/30).
2. Trends in sleep, energy, mood, stress: over filled-in days only.
3. Recurring links (for example: late bedtime -> energy the next
   day), each with references to specific daily entries.
4. For each link: alternative explanations. Correlation is not causation.
5. What I promised myself in the last reflection and what of that shows in the data.
6. One small experiment for next month with a measurable result.
7. Questions worth asking a specialist if the pattern repeats.

Forbidden: diagnoses, advice on medications and supplements, conclusions about causes
without caveats, references to entries not in the context, averages without
a stated denominator.

Point 6 is my favorite. It’s exactly where the melatonin story from the first article came from: the reflection showed a consistent link “melatonin in the evening -> a rough morning,” the experiment was trivial (two weeks without it), and the mornings evened out. Not a diagnosis, not a medical conclusion: a personal experiment with a measurable result, which I then discussed in concrete terms.

And sometimes the reflection catches things that aren’t metrics at all. In one of my monthly summaries the most accurate observation was: “you noticed it yourself: when you go to bed between 10 and 11, you sleep well; yesterday’s energy after a late bedtime is luck, not a system.” The model didn’t tell me anything new, it quoted me back to myself from an entry three weeks old that I’d managed to forget. Half the value of reflection is a conversation with your own words.

Dashboards in Obsidian: Dataview over frontmatter

Since the metrics live in YAML frontmatter, you can build dashboards on top of them without touching the entries themselves. For that, Obsidian has the Dataview plugin: it treats each note’s frontmatter as a database row and lets you write queries right inside Markdown. A dashboard note with a query like this turns into a live table for the month: sleep, energy, mood, stress, and workouts by day, and it updates itself with every new entry. For my format a plain frontmatter date field is enough; if the date is baked into the filename, you can use file.day:

Dataview
TABLE sleep_hours, energy, mood, stress, training
FROM "daily/2026/06"
SORT date ASC

For a sleep and stress journal this is the main working tool of life logging analysis: a single glance at the table is enough to spot a cluster of bad nights or a week where stress crept up. The Charts plugin adds graphs on top of that, and the Obsidian Git plugin handles commits in the background if you edit notes by hand on the desktop. A simple chart looks like this:

Code
type: line
title: Sleep & Energy (Week)
labels: [Mon, Tue, Wed, Thu, Fri, Sat, Sun]
series:
  - title: Sleep (hours)
    data: [7.5, 6, 8, 7, 6.5, 9, 7]
  - title: Energy
    data: [7, 5, 8, 6, 4, 9, 8]
  - title: Stress
    data: [3, 6, 2, 4, 7, 1, 2]

If you want a dynamic version over the daily notes, you can build it with DataviewJS and window.renderChart(); for an article a single simple block is enough to show the principle. Two tips from practice:

  • visualize the gaps, don’t hide them: a hole in the sleep chart is information too;
  • don’t build an “overall AI health score.” A single synthetic index from different scales creates false precision and kills the value of the individual metrics. A sleep chart next to an energy chart says more than their “average.”
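For the first tip, the key step happens before any charting: build the series from the calendar, not from the entries, so that a day without data still gets a slot. This sketch assumes a hypothetical mapping from dates to values. How a particular chart plugin draws a missing point is up to that plugin and isn't covered here.

```python
from datetime import date, timedelta


def daily_series(values_by_day: dict, start: date, end: date) -> list:
    """One (iso_date, value) pair per calendar day; missing days stay None."""
    series = []
    day = start
    while day <= end:
        series.append((day.isoformat(), values_by_day.get(day)))
        day += timedelta(days=1)
    return series


sleep = {date(2026, 6, 1): 7.5, date(2026, 6, 3): 6.0}
for label, value in daily_series(sleep, date(2026, 6, 1), date(2026, 6, 3)):
    print(label, value)
```

Iterating over the entries alone would have silently joined June 1 to June 3 and hidden the gap.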

And a fallback: the vault has to stay readable with no plugins at all. Dashboards are the icing, the data is the cake.

What it actually costs

Instead of estimates, here is real data from OpenRouter over two months of use.

May 2026: I was still experimenting with models, so some requests went through deepseek-v4-pro. Total for the month: $1.91, 880 requests to pro and 159 to flash, over 30 million input tokens combined.

Figure: OpenRouter usage for May 2026. Total $1.91: deepseek-v4-pro, 880 requests and 30.5M tokens; deepseek-v4-flash, 159 requests and 8.6M tokens. A mix of pro and flash while I was testing normalization quality.

June 2026 (first 13 days): fully switched to deepseek-v4-flash. 508 requests, 26 million tokens, spent $0.44. At that pace, a full month comes out to around $1.

Figure: OpenRouter usage for June 2026 (first 13 days). $0.44 spent on deepseek-v4-flash: 508 requests and 26.3M tokens. A full month will come out to around $1.
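The "around $1" figure is a straight-line projection from the 13-day spend. As a quick check, assuming a 30-day month and the same usage pace (the function name is hypothetical):

```python
def project_monthly_cost(spent: float, days_elapsed: int, days_in_month: int = 30) -> float:
    """Linear projection: average daily spend times the days in the month."""
    if days_elapsed <= 0:
        raise ValueError("days_elapsed must be positive")
    return spent / days_elapsed * days_in_month


print(round(project_monthly_cost(0.44, 13), 2))  # 1.02
```

A month with one or two heavy monthly reflections on a stronger model would land above this line, so treat it as a baseline for routine normalization only.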

Cost structure by layer:

| Layer | Monthly volume | Real cost |
|---|---|---|
| GitHub private repo | | $0 (GitHub Free) |
| STT (local faster-whisper) | ~30 voice notes | $0 for API, electricity only |
| DeepSeek via OpenRouter | ~500 requests, 26M+ tokens in the first 13 days of June | ~$1/month on flash |
| OpenAI vision (rarely) | Screenshots and lab results | Separate variable line item; keep a cap |
| Electricity, laptop 24/7 | Around-the-clock operation | Depends on your rate and hardware draw |

May was more expensive because of the pro-model experiments. Flash handles routine normalization more cheaply and with no noticeable quality loss for journal entries. A stronger model makes sense only for rare monthly reflections, and only on the same assembled, verifiable context.

Privacy and data access: who sees what

Let me repeat the trust-boundary table from the first article, now with retention decisions:

| Layer | What it sees | My decision |
|---|---|---|
| Telegram | Every message to the bot | A conscious convenience tradeoff |
| DeepSeek | Text and derived text | Don’t send documents or identifiers |
| OpenAI vision | The images you send | A budget cap, no documents with names in frame |
| GitHub | The whole vault on push | Private repo + 2FA; remember: this is access control, not encryption |
| Local disk | Everything, including audio | Disk encryption, a shelf out of the cat’s reach |

There’s a separate trust boundary: the phone. In the second article we set up a one-way mirror of the vault to iCloud for reading on an iPhone, and it has a price: the journal’s contents also end up in Apple’s cloud. If that’s one participant too many for you, read the journal only on the desktop, or use a single sync mechanism whose trust model you understand. The Obsidian Git plugin’s mobile version technically exists, but it’s experimental and has no SSH support; I wouldn’t rely on it.

Media storage decisions: I keep voice notes on a short-retention policy (30 days, then only the transcript remains), screenshot originals live permanently, and photos of documents are a separate conversation with separate caution. Whatever mode you pick, pick it before you start operating and write it down in _system/retention-policy.md: purging binaries from Git history after the fact is an evening’s work, and it rewrites history.
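A retention policy is easier to keep when the first step is a dry run. The sketch below only lists audio files past the 30-day window and deletes nothing. The suffix list and the use of file modification time as the age signal are assumptions: in a fresh Git checkout, mtime reflects the checkout time, so a real version may need to take the date from the filename instead.

```python
import time
from pathlib import Path
from typing import Optional

# Hypothetical set of audio extensions to consider.
AUDIO_SUFFIXES = {".ogg", ".oga", ".mp3", ".m4a", ".wav"}


def expired_audio(root: Path, max_age_days: int = 30, now: Optional[float] = None) -> list:
    """List audio files older than the retention window; deletes nothing."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    return sorted(
        p for p in root.rglob("*")
        if p.is_file()
        and p.suffix.lower() in AUDIO_SUFFIXES
        and p.stat().st_mtime < cutoff
    )
```

Reviewing the list before removing anything keeps cleanup "by rules, not by eye", and transcripts and screenshots are never candidates because only audio suffixes match.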

And a procedure for the worst case: if a secret or a file with personal data made it into the repo, simply deleting it doesn’t help (history remembers everything). The sequence: rotate the secret -> rewrite history -> force push -> check the clones. I worked through an incident like this on a work project in the article about accidentally pushed secrets, but ideally let it stay theoretical for you: the pre-commit secret scan from the second article exists for exactly this reason.

When it’s worth going fully local

Can you move the text layer off DeepSeek and onto your own laptop too? Technically yes: Hermes supports OpenAI-compatible providers; the Ollama path has to be configured through custom_providers and checked against your version. Then not even text leaves the machine. The honest requirements:

  • there’s no universal minimum context: the requirements depend on the skills you choose, the volume of files, and the reflection scenario;
  • hardware should be chosen via a benchmark matrix; my 2019 laptop handles local Whisper, but for a decent local LLM it’s already struggling;
  • in my tests, small local models more often got the structure wrong, but that needs to be rerun on a single synthetic fixture with the results published, not generalized from an impression.

And the main thing: a local LLM doesn’t make the system “fully private.” Telegram still sees the messages, GitHub sees the vault. Going local makes sense as a deliberate choice for a specific layer, not for the sake of a nice “self-hosted” word in a headline.

What breaks most often

This table reflects the operational problems that showed up over the first months of daily use, and it covers every layer of the system: the Telegram gateway, speech recognition, both model providers, Git, and the vault itself.

| Symptom | Cause | What to do |
|---|---|---|
| The bot is silent | The gateway daemon crashed or didn’t start after a reboot | hermes gateway status, logs, systemd |
| Reminder didn’t arrive / arrived late | Timezone or a stopped gateway | A test job at +2 minutes, timezone in the config |
| Voice notes take minutes to process | Whisper on CPU or an overloaded model | Shrink the model, check the GPU, measure latency |
| “I can’t see the image” | The vision provider isn’t configured or the model is out of date | Verify the model ID, key, limits |
| Text provider error | Out of balance or the model ID changed | DeepSeek balance, current ID |
| Push fails | SSH keys, token, network | ssh -T git@github.com; entries pile up locally, nothing is lost |
| Duplicate entries for one day | Broken upsert logic in the skill | Canonical path, the fixture from step 12 of the second article |
| The repo bloated | Audio and photos in Git | Retention policy, cleanup by rules, not “by eye” |

Over the first months this table covered most of my operational failures and helped me quickly figure out which layer to start the diagnosis from. That’s the main upside of a narrow system: the search space stays manageable.

What remained after the first months

If I compress the first months into a single takeaway: the journal works not because the model is smart, but because recording the day became cheaper than not recording it. Everything else follows from that:

  • Consistency beats completeness. Thirty imperfect entries are more useful than seven perfect ones. Everything in the system should protect the habit: one question instead of a questionnaire, voice instead of a form, an empty field instead of an interrogation.
  • Boring rules matter more than a smart model. Canonical file names, the frontmatter schema, the corrections dictionary, and the denominators in summaries gave me more than any model upgrade.
  • Raw input is sacred. Every interpretation can be recomputed; a lost original can’t be brought back.
  • Reflection is a conversation with yourself, not an oracle. The best finds in the monthly summaries are my own forgotten words and links I couldn’t see right in front of me.
  • The journal didn’t make me healthier on its own. It made the consequences of decisions visible: the late episode, the melatonin, three bad nights before the gym. The decisions are still mine, and some of them I still get wrong, but now at least I make them knowingly.

The old laptop still sits on the shelf, the cat is still unhappy, and for the first time in my life a journal has survived its fourth month. For someone who quit every tracker by week three, that’s the best metric there is.

Over these months, the agent did not turn into a doctor, therapist, or all-knowing Jarvis. It remained something much more grounded: an attentive interface to my own records. That limitation is exactly what makes it useful. It does not live the day for me, make decisions, or promise to understand me better than I understand myself. It helps preserve the facts, return to my own words at the right time, and notice repetition where memory would leave only a vague feeling.

The series ends where the daily practice begins: at 9 PM my phone buzzes, I hold the record button, and for forty seconds I describe how the day went. The difference is that behind this simple action is no longer another closed service, but a history that belongs to me.

FAQ

Will this replace a doctor or a therapist?

No. The system collects observations and helps you prepare questions. Any health decisions go to a specialist, any worrying symptoms go straight to a doctor, not into the journal.

Won’t the model start “diagnosing” on its own?

It will, if you let it: models love to draw confident conclusions. That’s why the limits are baked into the skill and into the reflection prompt, not left to good faith. The “Forbidden” section of the prompt is its most important part.

What if I skip days?

Nothing. A skipped day is data, not a failure. The summaries show coverage (“sleep: 24/30 days”), and the system never invents values for missed days. Coming back after a break is nothing to be ashamed of: the bot doesn’t reproach you, it just keeps going.

Can I keep the journal in two languages?

Yes, Whisper handles mixed speech, and normalization brings everything to one schema. I dictate in Russian and plan to switch to English: daily dictation is great speaking practice, and the structure of the entries doesn’t depend on the language.

Why not hand it all to one strong assistant with memory?

Because the assistant’s memory isn’t your data. It’s the internal state of someone else’s service: you can’t export it in full, verify what’s in it, or move it to another provider. The Markdown files in my Git will outlive any provider, any subscription, and any policy change: even if both Hermes and DeepSeek vanished tomorrow, I’d still have a complete archive of entries that reads in anything. Plus I decide retention: what to keep, for how long, and when to delete it. With a service’s memory, the service makes all those decisions.

Where do I start if three articles is too much?

With a single habit: set a reminder for 9 PM and, for a week, answer it with a voice note, even if it only goes into your “Saved Messages.” If the habit sticks, come back to the second article and build the full system: it’ll have something to process.
