TL;DR
- A 2019 gaming laptop became a home server for a personal AI agent. It can handle ordinary requests, but its main role for me is tracking physical and emotional wellbeing over time.
- Through Telegram, the agent accepts journal entries, lab results, documents, and workouts, sends reminders about important events, and turns scattered messages into a connected history in Markdown.
- Hermes Agent orchestrates the process, DeepSeek handles the routine text, speech is transcribed locally, and images go through a separate vision layer.
- This is a tool for observation and self-reflection, not a doctor or therapist: it helps surface patterns and formulate questions, but does not replace professional care.
9 pm on a Tuesday: how my AI wellbeing journal works
A warm evening on the Costa Blanca. My phone buzzes once. It’s not a message from work and it’s not the news. It’s my own bot, and it asks the same thing it asked yesterday and the day before: how was your day, how did you sleep, energy, mood, stress, did you train?
I don’t open any app with forms and sliders. I hold down the record button in Telegram and talk for forty seconds, just as it is: slept about seven hours, did strength training in the morning, energy dipped toward evening, day was fine overall. I let go of the button and get back to what I was doing.
Then comes the part this whole thing was built for. In another room, a laptop I used to game on hums quietly with its fans. It receives the voice message, transcribes the speech locally, turns what I said into a structured Markdown entry, saves the original next to the result, makes a commit, and pushes it to a private repository. A few seconds later the bot sends a short confirmation and one follow-up question if I forgot to mention something.
My entire “journal entry” took less than a minute. But the journal is only one role of a more general personal agent. I can ask it to find tickets, analyze a document, write an email, help with code, or remind me about an appointment. The most valuable use I found for it was different: helping me follow my physical and emotional state over the long term.
I call this role a personal AI wellbeing concierge. It accepts text, voice, and photos; stores lab results and medical documents; logs workouts and progress; and collects my observations about sleep, energy, stress, and mood. Then it returns all of that not as a pile of messages, but as a connected history in my own Markdown files. In this first article I’ll explain how I got here, why the system is built this way, and where its honest limits lie. The second will be a full reproducible installation guide; the third covers operation, reflection, cost, and privacy.
This is not a medical product
The system is meant for personal journaling and for preparing questions to ask a specialist. It is not a medical device, it does not diagnose, it does not prescribe treatment, and it does not replace a doctor or emergency care.
The laptop that waited five years for a new job
In 2019 I bought a Xiaomi Mi Gaming Laptop and was genuinely happy with it. Good hardware, a discrete graphics card, evenings spent gaming. The purchase paid for itself completely, as long as I had the time to make it pay off.
From 2021 on I basically stopped gaming. Not on principle, I just ran out of time for it. In 2022 the full-scale war in Ukraine began, and that same year my daughter was born. My priorities reshuffled completely, and the laptop was closed and stayed closed for several years. It sat on a shelf, expensive and useless, and every time my eyes landed on it I felt a small pang: there was still good hardware inside, including a GPU that nobody was giving any work to.
In early 2026, the wave of interest in personal AI agents suggested an idea that was obvious in hindsight: an always-on agent needs an always-on machine, and I already had one. An old gaming laptop fits the home-server role better than you’d think: it’s quieter than a server rack, it has a “free built-in UPS” in the form of its battery, and its GPU can run speech recognition models locally.
Today it sits in my home office, on an open wall shelf, a little above head height. The height isn’t a design choice: there’s a cat in the apartment who considers any warm piece of tech his personal property, and the shelf is the one spot he hasn’t managed to jump to yet. That’s how a former gaming flagship became the strangest server I’ve ever seen: it stands between books, plugged into a power outlet and the router, quietly running git push while the cat glares up at it.
This episode is where the system’s first engineering principle came from: don’t buy new hardware for an experiment, give a second life to what you already have. If the experiment fails, all I lose is time.
My still very young Jarvis
When people talk about a personal AI agent, it is easy to picture a universal assistant: find tickets, write an email, analyze a document, help with code, set a reminder. My agent can handle those requests too. But I quickly realized that its most valuable role for me was outside work. I already have enough AI agents in my development workflow. I wanted help with the part of life that usually falls between the calendar, notes, medical PDFs, a workout app, and my own memory.
I wanted something between a journal, a secretary, an archivist, an attentive conversation partner, and a coach who does not shout about discipline. That is the wellbeing concierge role. It stores lab results and helps compare them over time. It logs workouts, weights, and progress. It collects observations about sleep, energy, nutrition, stress, and mood. It reminds me about doctors and other important events. In the evening it listens to the story of my day and gives it back to me in a clearer form.
An ordinary reminder would say, “Surgeon appointment, Tuesday at 9:45.” The agent can add context: “You are in Spain now, so think about arranging an interpreter for the visit.” It is still a simple cron job, not magic. But it uses stored context and speaks in the tone I chose, so it feels less like another system notification and more like a message from an assistant that understands why the event matters.
Perhaps personal agents will eventually resemble Jarvis, or a digital familiar that accompanies someone for years. Mine is much younger: it lives on an old laptop, writes Markdown files, and sometimes reminds me to arrange an interpreter for a doctor. But it already does the important part: it preserves the context of my life longer than my memory holds it.
From the outside, all of this looks suspiciously like an ordinary journal. I describe what happened and the system puts events into files. The value appears over time. Memory smooths out details, mood rewrites the past, and a single bad day looks random. Consistent records reveal recurring links between sleep, workload, training, events, and emotional state.
The agent does not know me better than a doctor or therapist, and it should not pretend otherwise. Its job is more modest: help me observe myself more carefully, formulate thoughts and questions, and retain important details between consultations. Its feedback is material for self-reflection, not a diagnosis or therapy.
I have started journals and trackers many times. They all died the same way: by the third week, filling out a form becomes a chore, gaps accumulate, and the habit falls apart. A Telegram voice message survives laziness. Forty seconds of speech instead of five form fields is the lowest-effort capture method I know. It is life logging in its honest form: recording life as it happens, not as a form finds convenient.
The only question was who would turn that stream of consciousness and scattered attachments into structured data. That produced the core requirement: the agent should accept text, voice, and images, preserve context in explicit records, run scheduled requests, and produce one format that can outlive any service. In other words, Markdown files under version control.
A month with OpenClaw: impressive, but not for me
My first attempt was OpenClaw, which positions itself as a personal AI assistant on your own devices. It sounded like a perfect match for my task, and the capabilities really were impressive: lots of channels, companion apps, multi-agent routing, a broad automation ecosystem.
I set it up and ran it for about a month, then gradually realized something inconvenient: I was using perhaps ten percent of the platform while maintaining all one hundred. That scale may be justified for a broad multi-channel system, but my scenario was one user, one Telegram channel, and one daily workflow. This is not a verdict on OpenClaw’s general reliability, only the result of my experience with one configuration.
Hence the second principle: for a narrow task, operational simplicity beats breadth of features. Every component I don’t use still demands my attention when it breaks.
Why subscriptions didn’t become infrastructure
In parallel, I ran into the question of which model all this should run on. The first thought was obvious: I already pay for subscriptions to strong models, so why not let the agent use them? I experimented with access through subscription products and adjacent tooling, but quickly realized you can’t build personal infrastructure on that: providers kept changing their access policies for third-party agents, and subscription-backed access turned out to be an unstable foundation that depends on the current rules rather than on my decisions.
Fine, then honest API calls. And here a second inconvenient truth was waiting for me: an always-on agent with a heartbeat, cron jobs, and tool calls consumes tokens on a very different scale from a chat you drop into a couple of times a day. With direct billing for premium models, the routine background work costs significantly more than your everyday-chat experience would suggest.
The third principle: routine load needs a cheap model, and the strong one should be kept for rare, complex tasks. Normalizing an evening voice message into a structured entry doesn’t require flagship-level intelligence. It’s assembly-line work: classify, extract, format.
System requirements: one user, Telegram, and Markdown
After a month with OpenClaw and the experiments with providers, the requirements list shrank to an honest minimum:
- one user: me;
- one interface: Telegram, because it’s already always at hand;
- three input formats: text, voice messages, images;
- a journal of physical and emotional wellbeing without mandatory forms;
- workouts, lab results, and related documents in one chronology;
- a daily check-in and personalized scheduled reminders;
- context stored in explicit files rather than a promise that the model “remembers everything”;
- output: structured Markdown in my own file system;
- versioning and backup: a private Git repository;
- predictable cost for routine processing;
- speech transcribed locally after delivery through Telegram, with no separate cloud STT provider.
In essence these are the requirements for a self-hosted solution: a local AI agent on your own machine wherever that’s practical, and cloud services only where the benefit clearly outweighs the compromise.
That’s the list I went shopping against, all over again.
Why Hermes Agent and DeepSeek
After some research I moved to Hermes Agent. Here it’s important to separate two roles that often get conflated: the orchestrator and the model. Hermes doesn’t “think” anything itself. It runs the process: it receives messages through the Telegram gateway, launches speech recognition, calls the model, executes skills, works with files, and runs cron jobs. The thinking is done by the connected model, and you can swap it out.
For routine text I chose DeepSeek: Hermes supports it as a direct provider, and the cost of bulk text normalization with it is low. For classifying, extracting facts, and summarizing daily entries, that’s enough. A strong model can be plugged in selectively, for example for a monthly retrospective, as an optional layer rather than a mandatory default.
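The split between a cheap routine model and an optional strong one can be sketched in a few lines. This is a minimal illustration, not Hermes Agent configuration: the `choose_model` helper and the task labels are hypothetical, the strong model name is a placeholder, and only `deepseek-v4-flash` is taken from my own setup.

```python
from dataclasses import dataclass

# Hypothetical task labels: the kinds of assembly-line work a routine model handles.
ROUTINE_TASKS = {"classify", "extract", "format", "daily_summary"}


@dataclass(frozen=True)
class ModelChoice:
    name: str
    reason: str


def choose_model(task: str,
                 cheap: str = "deepseek-v4-flash",
                 strong: str = "strong-model-placeholder") -> ModelChoice:
    """Send routine work to the cheap model and everything else to the strong one."""
    if task in ROUTINE_TASKS:
        return ModelChoice(cheap, "routine assembly-line work")
    return ModelChoice(strong, "rare, complex task")


if __name__ == "__main__":
    print(choose_model("extract"))
    print(choose_model("monthly_retrospective"))
```

The point of the default is that a task has to be listed explicitly to stay cheap, so the set of routine tasks remains a visible, reviewable decision rather than something the agent infers.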
The comparison with OpenClaw against my criteria looked like this:
| Criterion for my project | OpenClaw | Hermes Agent | Why it matters to me |
|---|---|---|---|
| Personal always-on assistant | Supported | Supported | Baseline requirement |
| Telegram | Supported | Supported | Main interface |
| Skills + cron | Supported | Supported | Daily check-in |
| Broad multi-channel platform | A strong point | Has a gateway, but my setup is narrower | I don’t need dozens of integrations |
| Cheap routine model | Many provider options | Direct DeepSeek provider | Predictable cost |
| Local speech recognition | Not verified here; see OpenClaw’s current documentation | faster-whisper | After Telegram, audio doesn’t also go off to an external STT API |
| Markdown workspace | Can be configured | Used directly | Obsidian and Git workflow |
| Bottom line for me | More powerful than my requirements | Enough for a narrow pipeline | A personal choice, not a universal benchmark |
Let me stress it once more: this is not a verdict that “Hermes is better than OpenClaw.” It’s a conclusion for one specific single-user scenario, mine. After the switch, the key metric stopped being “how many integrations are potentially available” and became “how many evenings I don’t spend maintaining the system.” In my narrow pipeline that metric improved sharply: I stopped maintaining the platform and started using the journal.
About images: how the picture path works
In my setup, DeepSeek handles the text. Workout screenshots and document photos go through a separate auxiliary vision layer: in my case a small OpenAI API budget that I connect optionally and use fairly rarely. So text and images travel two different paths, and in the second article I set up the vision layer as its own step.
What actually works for me right now
Before installing the agent, I wiped the disks, removed Windows, and installed Ubuntu. No dual boot: the machine is no longer a gaming rig and does not need its past. For an around-the-clock service, a clean system means fewer unnecessary programs, background processes, and update surprises.
This isn’t a concept or a weekend prototype. The system has been running in daily mode since February 2026; as of early June 2026 the private workflow had around 80 daily notes. Configuration at the time of writing (verified: 2026-06-07):
Hardware: Xiaomi Mi Gaming Laptop (2019), GPU: GTX 1060 Mobile 6 GB + Intel UHD 630
OS: Ubuntu 24.04.4 LTS, kernel 6.17.0-22-generic, x86_64
Role: always-on home AI server
Orchestrator: Hermes Agent v0.12.0 (2026.4.30), commit 4f3766917
Main model: DeepSeek (deepseek-v4-flash), direct API
Gateway: Telegram via systemd user service
STT: faster-whisper 1.2.1, medium, CUDA, int8_float32
Vision/OCR: small OpenAI API budget as an auxiliary vision layer
TTS: Edge TTS
Workspace: local Markdown repository + private Git remote
A dated gateway snapshot for 2026-06-07: 35 days of uptime, about 5.7 GB RAM, CPU time 6h27min. Power consumption is unconfirmed: the machine has no reliable sensor, so an external wattmeter is needed.
Voice path verified on the GPU
As of 2026-06-07 the local STT works end-to-end again: faster-whisper 1.2.1, the medium model, GTX 1060 6 GB via CUDA 12.2, compute type int8_float32. A 22-second test audio clip was transcribed in 2.17 seconds.
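For reference, a minimal sketch of what those settings look like when calling faster-whisper directly from Python. The function names and the file path are my own illustration, and this is not how Hermes wires STT internally. The import is lazy so that the small speed helper runs even on a machine without faster-whisper or a GPU.

```python
from typing import Optional


def real_time_factor(audio_seconds: float, wall_seconds: float) -> float:
    """Seconds of audio transcribed per second of wall-clock time."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds


def transcribe(path: str, language: Optional[str] = None) -> str:
    """Transcribe one audio file with the model and compute type listed above."""
    # Imported here so the helper above works without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel("medium", device="cuda", compute_type="int8_float32")
    segments, _info = model.transcribe(path, language=language)
    # `segments` is a lazy generator; iterating over it is what runs the decoding.
    return " ".join(segment.text.strip() for segment in segments)


if __name__ == "__main__":
    # The measurement quoted above: 22 s of audio in 2.17 s.
    print(round(real_time_factor(22, 2.17), 2))  # prints 10.14
```

In other words, the test clip was processed roughly ten times faster than real time, which is why a forty-second evening note does not keep me waiting.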
How an ordinary day goes
Morning strength training at the gym. I log it the way I always have, in Hevy, take a screenshot of the totals, and send it to the bot. The vision layer extracts the exercises, sets, and weights; the skill saves the original screenshot as an attachment and the structured workout fields next to it, under the same date. I count calories in YAZIO and only send the bot the daily summary: there’s no point duplicating every single meal. In theory both workouts and nutrition could be tracked entirely through the bot, without third-party services, but that’s the next level up, which I’ve left for later.
During the day the system stays quiet. That’s an important property: a good journal doesn’t beg for attention.
At 9 pm (21:00) the reminder arrives. I answer by voice: sleep, energy, mood, stress, events of the day. My wife is used to it by now and doesn’t turn around when I dictate something to my phone in the evening. My daughter is four, and for her the bot doesn’t exist yet. Local faster-whisper makes the transcript, and the skill extracts only the explicitly stated facts and normalizes the scales. If I forgot something required, the bot asks one compact follow-up question rather than a five-message survey. Experience taught me: one short follow-up keeps the habit alive, a survey kills it.
The end result is a single Markdown entry per date: the raw transcript on its own, the normalized summary on its own, metrics in the YAML frontmatter (sleep_hours, sleep_quality, energy, mood, stress, training), and links to sources. A Git commit pins the version. A repeat message for the same date updates the existing entry rather than spawning duplicates.
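A minimal sketch of the "one entry per date" rule, using only the standard library. The frontmatter keys are the ones listed above; the `daily/` folder, the section headings, and the function names are assumptions made for the example, not my actual skill code. A real skill would merge new facts into the existing entry, whereas this sketch simply rewrites the same file and relies on Git for the previous version.

```python
import tempfile
from pathlib import Path

FIELDS = ("sleep_hours", "sleep_quality", "energy", "mood", "stress", "training")


def render_entry(date: str, metrics: dict, summary: str, transcript: str) -> str:
    """Render one daily note: YAML frontmatter, then summary and raw transcript."""
    lines = ["---", f"date: {date}"]
    for key in FIELDS:
        if key in metrics:
            value = metrics[key]
            if isinstance(value, bool):
                value = "true" if value else "false"  # YAML-style booleans
            lines.append(f"{key}: {value}")
    lines += ["---", "", "## Summary", summary.strip(), "",
              "## Raw transcript", transcript.strip(), ""]
    return "\n".join(lines)


def upsert_entry(vault: Path, date: str, metrics: dict,
                 summary: str, transcript: str) -> Path:
    """Write <vault>/daily/<date>.md; a repeat message reuses the same file."""
    path = vault / "daily" / f"{date}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_entry(date, metrics, summary, transcript), encoding="utf-8")
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        note = upsert_entry(Path(tmp), "2026-06-07",
                            {"sleep_hours": 7, "energy": 4, "training": True},
                            "Short night, strength training.", "slept about seven hours")
        print(note.read_text(encoding="utf-8"))
```

Because the file name is derived from the date alone, a second message on the same evening cannot create a duplicate note by construction.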
And in reply the bot sends a short summary of what it understood from my dictation. It looks roughly like this (the example is synthetic, all details changed, but the tone is real):
🛌 Sleep: promised yourself you’d be in bed by 10 pm, but the show won again. Lights out at midnight, up at 7 — that’s 7 hours instead of the nine you’d planned.
🏋️ Gym: legs and abs, the workout was rough after a short night.
💼 Work: two releases and a couple of bugfixes, packed day, no lunch.
🚶 10,000 steps closed, but you only started at five in the afternoon — the lack of sleep is showing.
⚡ Energy 4/10 — holding up pretty well for a night like that.
The tone of this feedback is set by the skill’s instructions: how gentle or sharp the assistant is, whether it praises or pokes fun, whether it offers little tips on request. In a public example, this stays a safe text setting, not a medical interpretation. Mine is currently in “friend with a touch of irony” mode, and reading the evening summary has become a small ritual: it’s nice when someone tidily folds your day into three lines, even if that someone works on a shelf so the cat can’t reach it.
And after a few weeks the accumulated entries start working for me: I can ask for a weekly or monthly retrospective of sleep, load, stress, and mood. Exactly how that’s done without the illusion that “the agent remembers everything,” I’ll explain in the third article.
Patterns the journal found: TV shows, sleep, and melatonin
Any tracker lives until the first “why am I even doing this.” For me that question got answered after a couple of months, once the entries had piled up and patterns became visible in them that I’d never have assembled from memory.
The first pattern is mundane, but in numbers it hits harder than in feelings. Evening, “just one more episode,” screen goes dark by 1 am, up at seven. I already knew I wouldn’t get enough sleep. But the journal showed the whole picture: the day after an evening like that, I’m not only wrecked until lunch, my weight also doesn’t drop, it climbs, even though my eating didn’t change. It’s one thing to vaguely suspect it, another to watch the same combination repeat in the entries over and over. I didn’t quit weeknight shows entirely after that, but it got much harder to bargain with myself.
The second pattern was unexpected for me. For a while I was taking melatonin before bed and considered it a harmless little helper. The entries showed the opposite: on the days after melatonin I could barely wake up and spent half the morning coming around, and that combination repeated too consistently to chalk up to coincidence. For me it worked like a handbrake on the whole morning. I stopped using it, and my mornings evened out.
This is an observation, not a recommendation
Both examples are my personal correlations from my own journal, not medical conclusions. Reactions to melatonin vary from person to person, and decisions about supplements are worth discussing with a specialist. The value of the journal is exactly that you arrive at the appointment not with “I’ve been feeling kind of off” but with a concrete picture covering a month.
This, for me, is the whole point of the system. Not “AI monitors my health,” but a low-effort way to collect observations in which patterns later become visible, and out of which concrete questions emerge instead of vague complaints.
What changed besides the number of Markdown files
The most noticeable change was not technical. A bad day used to end with a short verdict: “I’m tired,” “I got nothing done,” “I need to pull myself together tomorrow.” Now the evening entry makes me break that feeling into parts. Work may have gone fine, but I slept too little, skipped lunch, and by evening could no longer judge the day fairly. That simple decomposition does not solve the problem, but it restores its proper size.
Sometimes the agent tells me nothing new. It merely puts into words something I already knew but had not stopped to name. That is useful too. Normally too much time passes between an event and reflecting on it; here reflection is part of the same short ritual as recording the facts.
There is also a practical benefit before consultations. Instead of trying to reconstruct several weeks from memory, I can open the chronology: when sleep changed, how workload shifted, what I was taking, which lab values arrived, and what I noted at the time. The records do not provide a medical answer, but they help me bring a more precise question to a specialist.
The agent is also gradually becoming an interface to my own history. I can ask not only “how did I sleep this week?” but “when did my knee last hurt after squats?”, “what was my bench press a month ago?”, or “what did I want to discuss with the doctor?” The answer is valuable not because the model is clever, but because dated sources sit underneath it and can be opened and checked.
The key principle: every answer should lead to a source
If the agent draws a conclusion from the history, it should cite the dates and records behind it. Otherwise confident prose is easy to mistake for system memory when it may only be plausible generation.
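A minimal sketch of what "leading to a source" can mean in practice: a lookup that returns the dated record together with the answer and refuses to guess when nothing matches. The `Entry` type, the keyword search, and the sample notes are all synthetic and hypothetical; a real skill would search the vault files rather than an in-memory list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Entry:
    date: str  # ISO date, so string order equals chronological order
    path: str
    text: str


def last_mention(entries: List[Entry], *keywords: str) -> Optional[Entry]:
    """Return the most recent entry whose text contains every keyword."""
    wanted = [k.lower() for k in keywords]
    hits = [e for e in entries if all(k in e.text.lower() for k in wanted)]
    return max(hits, key=lambda e: e.date, default=None)


def answer_with_source(entries: List[Entry], *keywords: str) -> str:
    """Answer only when a dated record backs the answer."""
    hit = last_mention(entries, *keywords)
    if hit is None:
        return "No dated record found, so no answer."
    return f"Last mentioned on {hit.date} (source: {hit.path})"


if __name__ == "__main__":
    notes = [  # synthetic examples
        Entry("2026-03-02", "daily/2026-03-02.md", "Knee hurt after squats."),
        Entry("2026-04-18", "daily/2026-04-18.md", "Left knee sore again after squats."),
    ]
    print(answer_with_source(notes, "knee", "squats"))
    print(answer_with_source(notes, "melatonin"))
```

The important branch is the empty one: when no record matches, the honest output is "nothing found", not a plausible-sounding sentence.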
Architecture: who does what
Me / phone
├─ text check-in
├─ voice note
├─ workout screenshot
└─ document photo
│
▼
Telegram Bot (strict allowlist: only me)
│
▼
Hermes Gateway (old laptop, Ubuntu, systemd)
├─ STT: local faster-whisper
├─ Main LLM: DeepSeek (text normalization)
├─ Auxiliary vision: separate provider for images
├─ wellbeing skills (journal + reflection)
└─ cron: reminder at 21:00, periodic reflection
│
▼
Local Markdown vault (entries + attachments)
│
├─ Obsidian for viewing and dashboards (Dataview)
└─ Git commit → private GitHub repository
(versioned backup, not E2EE storage)
Every block in this diagram came not from a love of architecture but from a specific episode in the story above: the laptop gave me local inference, OpenClaw taught me to narrow the scope, API prices led me to DeepSeek, my reluctance to send voice to the cloud led to local Whisper, and the fear of losing years of entries led to Git. The entries themselves sit as YAML frontmatter plus free text, and Obsidian reads them as an ordinary vault: metrics for dashboards, text for a human.
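The flow in the diagram can also be read as a small routing table. The sketch below is only an illustration of that reading: the step names are labels for the boxes above rather than real Hermes skills, and the user id is a made-up placeholder.

```python
ALLOWED_USER_IDS = {111111111}  # hypothetical Telegram user id, not a real one

# Illustrative labels for the processing paths in the diagram.
ROUTES = {
    "text": ["llm_normalize"],
    "voice": ["local_stt", "llm_normalize"],
    "image": ["vision_extract", "flag_for_manual_review"],
}


def plan_steps(user_id: int, kind: str) -> list:
    """Return the ordered processing steps for one incoming message."""
    if user_id not in ALLOWED_USER_IDS:
        return []  # strict allowlist: messages from anyone else are ignored
    if kind not in ROUTES:
        raise ValueError(f"unsupported input kind: {kind}")
    # Every accepted input ends the same way: a Markdown entry and a Git commit.
    return [*ROUTES[kind], "write_markdown", "git_commit_and_push"]


if __name__ == "__main__":
    print(plan_steps(111111111, "voice"))
    print(plan_steps(222222222, "voice"))
```

Whatever the input type, the last two steps are identical, which is what keeps the vault in one format regardless of how a note arrived.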
Where the trust boundaries run
“It’s in Obsidian” doesn’t mean “it’s stored locally.” An honest table of who sees what:
| Party | What it receives | How it’s controlled |
|---|---|---|
| Telegram | The contents of all messages to the bot | A conscious trade-off for convenience |
| Local STT | Doesn’t get a separate external copy: recognition runs on the laptop | faster-whisper on my GPU; the original message has already passed through Telegram |
| DeepSeek | Message text and derived text | Only what I sent myself |
| Vision provider | The images sent | A separate key and budget |
| GitHub | The vault contents on push | Private repo: access control, not encryption |
| Local disk | Everything | Disk encryption and physical access |
A private repository is not the same as encryption
A private GitHub repository restricts access, but it is not an end-to-end encrypted medical store and does not automatically make the solution HIPAA/GDPR-compliant. If data has passed through Telegram, a cloud model, or GitHub, those services are part of the threat model. A detailed breakdown is in the third article.
What this system can and can’t do
It can:
- accept text, voice, and images without opening any forms;
- turn a stream of consciousness into structured entries with a single schema;
- maintain a chronology of workouts, measurements, lab results, and documents;
- keep the raw input separate from the model’s interpretation;
- send personalized reminders with relevant stored context;
- ask about the day and use one follow-up instead of a survey;
- version everything in Git, including the history of corrections;
- prepare weekly and monthly retrospectives as hypotheses for self-reflection.
It can’t, and shouldn’t:
- diagnose or interpret symptoms as a replacement for a doctor;
- advise changing medications or dosages;
- draw causal conclusions from correlations;
- “know” the accuracy of OCR: data recognized from photos is always flagged for manual review;
- guarantee confidentiality beyond the boundaries in the table above.
If an entry contains alarming symptoms, the system has only one correct action: advise seeking help, not try to “analyze” it.
Honest limitations: what to know up front
I deliberately do not publish a ready-made repository. The vault holds personal data: workouts, symptoms, documents. Releasing it as a template makes no sense, and a skeleton without data and skills offers no value at all. Hermes Agent is an open framework, and everyone writes their own glue code for their own needs. For a senior developer that is a couple of evenings of work; for those just getting to know agents, the second article in this series will be a good starting point.
The system is not entirely free. Local faster-whisper and DeepSeek through its direct API are genuinely cheap: my routine background usage comes to a few dollars a month. But if you lean heavily on the vision layer to OCR documents and analyze screenshots, the costs grow. I use it rarely and on purpose, so for me it is not an issue. A detailed cost breakdown by layer is in the third article.
The old laptop is a single point of failure. A power cut at home, a frozen Wi-Fi connection, a battery overheating under load: in each case the journal becomes unavailable. For me that is an acceptable trade-off: missing one evening’s entry is not a catastrophe. If you are building a system for critically important medical tracking, it is worth thinking about a backup channel or cloud redundancy. I deliberately chose not to: privacy matters more to me than 99.9% uptime for a personal journal.
Instead of a conclusion: an assistant begins with memory
When I took the laptop down from the shelf, I simply wanted to give old hardware a new job and see how far a personal AI agent could go today. What I built was neither a home Jarvis nor a digital doctor. It was something less fantastical but already useful: an assistant I can tell about my day, send a document, assign a reminder, and ask a month later what was actually happening to me.
Perhaps a personal agent does not begin with ordering a taxi or turning on the lights. It begins with memory you can trust: explicit, dated, verifiable, and owned by its user. My agent does not run the house or make important decisions. It does something else: it helps me avoid losing the context of my own life.
For me, that is already enough for the old gaming laptop to stop feeling old.
What’s in the second and third articles: installation and operation
In the second article I build this system from scratch on a clean Ubuntu: installing Hermes Agent, connecting DeepSeek, a Telegram bot with locked-down access, local faster-whisper, a separate vision layer, the vault structure, both skills, a private Git repo, and the 9 pm reminder. Every step with a check and an “if it didn’t work” section.
In the third article: life with the journal after installation. Fixing recognition errors, working with screenshots and documents, weekly summaries, an honest monthly reflection without imaginary RAG, Obsidian dashboards, real cost broken down by layer, and privacy.
FAQ
Is this a replacement for a doctor, or a medical app?
No. It’s a personal observation log. The most it can do: help you notice patterns and prepare more structured questions for a specialist.
Why not an off-the-shelf health app?
Apps make you fill in their forms and store data in their formats. Here the input is free (voice, text, photos) and the output is my own Markdown files, which will outlive any service.
Is an old gaming laptop required?
No, any always-on machine running Ubuntu will do. A GPU speeds up local speech recognition, but faster-whisper also runs on CPU, just slower.
Does DeepSeek understand images?
In my setup, DeepSeek handles the text, and images are processed by a separate auxiliary vision layer through the OpenAI API. That’s two different budgets and two different setup steps. I connect the vision layer optionally and use it fairly rarely, so text and voice work fully without it.
How private is this?
A voice message passes through Telegram, but after delivery it’s transcribed locally and not sent to a separate STT provider. DeepSeek sees the text, the vision provider sees the images, and the vault is stored in a private GitHub repo. These are conscious trade-offs, and in the third article I’ll break them down layer by layer.
How much does it cost per month?
Routine text processing on DeepSeek costs little, the vision layer is billed separately, and local speech recognition costs only electricity. I give the exact formula and a dated price snapshot in the third article.
What language should I dictate notes in?
I dictate in Russian, and faster-whisper handles it confidently. The plan is to switch to English: dictating about my own day every evening looks like ideal conversation practice on the way to C1, and the journal will keep working exactly as before.




