Oleksii Siniaiev
Post page navigation
Articles 26 min read June 13, 2026

Hermes Agent + DeepSeek on Ubuntu: a complete Telegram AI journal setup

A step-by-step setup for Hermes Agent and DeepSeek on Ubuntu: a locked-down Telegram bot, local faster-whisper, a Markdown vault, private Git, and a daily reminder.

Hermes Agent and DeepSeek AI journal setup
On this page

An article for those who have already worked with the terminal

This guide assumes basic skills with Ubuntu/Linux, Git, and Telegram bots. If any of that sounds unfamiliar, no problem: any AI chat will walk you through it step by step. Just describe what you’re doing and ask ChatGPT, Gemini, Claude, or another tool of your choice. They handle the basics better than any static guide.

TL;DR

  • The guide starts from an Ubuntu install that’s already in place: budget a few hours for the base system and separate time for the vault, skills, and the end-to-end test.
  • We’ll build a Telegram journal on Hermes Agent with DeepSeek, local speech recognition, and a private Git repository.
  • Text, voice, and images travel down separate, verifiable paths, and the raw originals are always kept right next to the final entry.
  • Every step ends with a check, so a mistake is caught right there instead of when you fire up the whole system.

In the first article I described how an old Xiaomi Mi Gaming Laptop became a home AI server for a wellbeing journal, and why, after a month with OpenClaw, I rebuilt the system from scratch on Hermes Agent and DeepSeek. Now for the interesting part: reproducing my configuration from zero. This is the guide I wish I’d read myself back in early 2026, instead of piecing it together one fragment at a time.

I deliberately built it around the principle of “step, check, next step”. When I assembled the first version of the system, my main mistake was setting everything up at once and then spending two evenings figuring out which of the six components had gone silent. Don’t repeat it: verify each layer on its own.

The guide is long, but the route is simple. First we give the agent a voice and a communication channel, then teach it to turn messages into your files, and finally add memory, backup, and habit. After each stage, the system can already do something complete:

  1. Steps 1–6: the bot responds to text, voice, and images.
  2. Steps 7–9: messages become a verifiable Markdown archive with change history.
  3. Steps 10–12: the journal becomes a daily system, survives reboots, and reminds you on its own.

You do not need to finish everything in one evening. The best first stopping point is after the voice test in step 5; the second is after the first successful commit in step 9. A useful intermediate system is better than an exhausted person debugging vision, Git, and cron at two in the morning.

Don’t copy commands before checking versions

Hermes Agent moves fast. The current audit was done on Hermes v0.12.0 (2026.4.30), commit 4f3766917 and Ubuntu 24.04.4 LTS; verification date: 2026-06-07. For a newer version, cross-check against the official repository.

Verified voice configuration

NVIDIA driver 535.309.01, GTX 1060 6 GB, CUDA 12.2, ffmpeg 6.1.1, and faster-whisper 1.2.1. The medium model runs on CUDA with int8_float32; 22 seconds of test audio were transcribed in 2.17 seconds.

What we’re building

Architecture of an AI journal: messages from Telegram pass through Hermes Agent on Ubuntu, local faster-whisper, DeepSeek, and a separate vision provider, then get saved to a Markdown vault and a private Git repository
The full system architecture. Each layer is configured and verified by its own step in this guide.

The same system in text form:

Prompt
Telegram (you only, by allowlist)
        │
        ▼
Hermes Gateway on the Ubuntu machine (systemd, must run continuously)
  ├─ DeepSeek: text normalization of entries
  ├─ faster-whisper: local recognition of voice messages
  ├─ auxiliary vision: a separate provider for images
  ├─ skills: wellbeing-journal + wellbeing-reflection
  └─ cron: a reminder every evening at 21:00
        │
        ▼
Markdown vault (Obsidian-compatible)
        │
        └─ git commit → private GitHub repository

For hardware, any machine works that can run reliably around the clock and meets the requirements of the components you pick. In my case it’s a 2019 gaming laptop on a shelf in the home office: its GPU speeds up speech recognition, but faster-whisper runs on CPU too, just slower. The configuration to pin down before publication:

ComponentMineMinimum
OSUbuntu 24.04.4 LTS, kernel 6.17.0-22-generic, x86_64Current stable Ubuntu
CPU/RAMIntel i7-8750H (6C/12T), 15 GiB RAM / 10 GiB availableAny x86-64, 8 GB RAM
GPUGTX 1060 Mobile (6 GB) + Intel UHD 630; NVIDIA 535.309.01, CUDA 12.2Not required (CPU-mode STT)
DiskNVMe 154 GB, 84 GB free (43%)20 GB free
Hermes Agentv0.12.0 (2026.4.30), commit 4f3766917Same version as in the article

Before you start: Ubuntu, accounts, and secrets

I’m deliberately leaving the Ubuntu install out of scope: there are plenty of good guides for it, so start with the official Ubuntu Server documentation. The starting point of this guide: a clean, updated system, internet access, and a user with sudo.

Get these ready in advance so you’re not jumping between tabs later:

  • a Telegram account (the bot is created for free via BotFather);
  • a DeepSeek platform account and a payment method for the API key;
  • an OpenAI account with a small budget for the vision layer (optional; text and voice work without it);
  • a GitHub account with two-factor authentication enabled;
  • Obsidian on the desktop for browsing entries (optional; the vault stays plain Markdown).

A rule for secrets that holds for the whole guide

API keys and tokens live only in ~/.hermes/.env and your password manager. Never: in the vault, in the Git repository, in screenshots. Every key in this article is a placeholder; don’t try to use them.

And one last thing: decide where the machine will live. My laptop lives on a wall shelf above head height. The reason is mundane: the cat. If you have pets, four-year-olds, or other sources of chaos, think this through before the system becomes part of your daily routine.

Step 1. Prepare Ubuntu

Update the system and install the base dependencies:

Bash
sudo apt update && sudo apt upgrade -y
sudo apt install -y git curl ffmpeg python3

ffmpeg is needed to convert Telegram voice messages before recognition, and Python is for Hermes and helper scripts. In the 2026-06-07 audit, python3, git, node, and rg were available, but ffmpeg and a standalone uv were not found in PATH. If uv --version doesn’t respond, install it following the official astral.sh/uv instructions, or use the Hermes environment from ~/.hermes/hermes-agent/venv/.

Check the timezone right away: it decides whether the evening reminder actually arrives in the evening.

Code
timedatectl
# if needed:
sudo timedatectl set-timezone Europe/Madrid

If you plan to use a GPU for speech recognition, confirm the system sees the card:

Code
nvidia-smi   # verified: GTX 1060 6 GB, driver 535.309.01, CUDA 12.2

Check: timedatectl shows your timezone, and ffmpeg -version responds without errors.
If it didn’t work: don’t go any further. Every next step builds on this one.

Step 2. Install Hermes Agent

The official install is a single command. But before running any curl | bash from the internet, including this one, open the URL in a browser and read the script: you should understand what you’re about to run on the machine that will hold your personal journal.

Bash
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
echo $SHELL            # find out the active shell
source ~/.bashrc       # bash is the default on Ubuntu
# source ~/.zshrc      # if you've switched to zsh

Then diagnostics:

Code
hermes doctor
hermes --version

Check: hermes doctor shows no critical problems, and hermes --version prints a version. Write it down: it’s your reference point for any troubleshooting.
Expected result: the critical checks pass; in the 2026-06-07 audit, hermes doctor showed PASS on the critical checks plus minor OAuth warnings.
If it didn’t work: most often it’s PATH: restart the terminal or source your shell config again.

Step 3. Connect DeepSeek for text

DeepSeek is the workhorse of this architecture. It normalizes the daily entries, and it’s precisely its low cost that makes an always-on agent economically sensible, which I covered in detail in the first article.

Create an API key in your account on the official DeepSeek platform and launch the interactive model picker:

Code
hermes model

Pick DeepSeek from the list of providers and enter the key when Hermes asks for it: it’ll store the secret in ~/.hermes/.env as DEEPSEEK_API_KEY. Don’t edit the files by hand as long as the CLI can do it itself. If the interactive picker doesn’t appear (for example, a headless session with no TTY), the verified version uses the keys model.default, model.provider, and model.base_url in ~/.hermes/config.yaml.

The current model ID as of the verification date: deepseek-v4-flash. The current prices for this check: cache hit $0.0028, cache miss $0.14, output $0.28 per 1M tokens (verified: 2026-06-07; before setup, open the official pricing page).

A smoke test without leaving the terminal:

Bash
hermes -z "Answer in one word: working?"

Check: the agent answers, and the response shows the DeepSeek model was used.
If it didn’t work: confirm the key is written to ~/.hermes/.env, that the DeepSeek account has a positive balance, and that the model ID isn’t stale (that happens).

Step 4. Create a Telegram bot and lock it down

First we create the bot through Telegram’s official tool:

  1. Open Telegram and find @BotFather (blue checkmark, the official one).
  2. Send the command /newbot.
  3. BotFather will ask for the bot’s name (the display name, for example “My Wellbeing Journal”) and a username (it has to end in _bot, for example my_wellbeing_bot).
  4. In return you’ll get an HTTP API token like 7123456789:AAF.... Save it to your password manager right away: it’s shown only once, after that you can only reset it via /revoke.

Next you’ll need your numeric user ID. The simplest way to find it: message @userinfobot in Telegram and it’ll reply with your ID on a single line. Write it down next to the token. The current Hermes path: run hermes gateway setup or send /whoami to the bot in a Telegram DM after connecting it.

Connect the gateway:

Code
hermes gateway setup

The setup wizard will ask for the bot token. The most important thing in this step: the allowlist. The final configuration should contain:

Markdown
# ~/.hermes/.env
TELEGRAM_ALLOWED_USERS=123456789   # your numeric ID, placeholder

Here it’s important not to confuse two different settings, and almost everyone confuses them, me included on the first evening:

SettingWhat it controls
TELEGRAM_ALLOWED_USERSWho is allowed to talk to the bot at all. Authorization.
TELEGRAM_HOME_CHANNELWhere the bot delivers messages by default (for example, cron reminders). The delivery address.
--deliver telegram:<chat id>An explicit delivery address for a specific cron job.

Specifying a chat ID for delivery isn’t enough to lock the bot down: without an allowlist it stays open to anyone who finds it by name. A personal journal with an open door is the worst possible combination.

Start it and check:

Code
hermes gateway            # foreground, for the first check
# then, for continuous operation:
hermes gateway install    # installs the systemd service
hermes gateway status

Check: message the bot “hi” from your own account: it should reply. Ask someone on another account to message it: the bot should stay silent.
If it didn’t work: check the logs (~/.hermes/logs/gateway.log); most often the problem is the token, or that the allowlist holds a username instead of a numeric ID.

Step 5. Local faster-whisper for voice

Voice messages are the system’s main interface. The message itself goes through Telegram, but after delivery Hermes recognizes it with local faster-whisper, without sending the audio off to yet another cloud STT provider. The voice dependency is installed like this:

Bash
cd ~/.hermes/hermes-agent
uv pip install -e ".[voice]"

The command runs in the environment managed by the Hermes install. On the verified system the virtualenv is at ~/.hermes/hermes-agent/venv/. If a standalone uv isn’t found in PATH, first restore/install uv, or use the documented Hermes update/install path for this version.

The actual STT config as of 2026-06-07:

Code
stt:
  enabled: true
  provider: local
  local:
    model: medium
    language: ''

This config selects the multilingual faster-whisper medium model. The verified runtime uses faster-whisper 1.2.1, device cuda, and compute type int8_float32.

Next comes the choice of model size, and here’s the honest fork in the road:

  • base: fast even on CPU, usually enough for short everyday notes in a single language;
  • medium: noticeably more accurate on mixed speech and jargon, but it needs a GPU or patience;
  • the verified configuration: medium, CUDA on a GTX 1060 6 GB, int8_float32; 22 seconds of audio are processed in 2.17 seconds.

A tip from running it: don’t chase the bigger model right away. After a couple of weeks you’ll see which words get recognized consistently badly, and a contextual correction dictionary (it shows up in the third article) fixes that more cheaply than moving to a heavy model.

Check: send the bot a 20–30 second voice message in your language. In the control test, 22 seconds of audio were transcribed in 2.17 seconds on the GPU.
If it didn’t work: check ffmpeg, the faster_whisper import in the Hermes venv, nvidia-smi, free RAM/VRAM, and the gateway logs.

Cloud STT as a fallback

Hermes has cloud STT alternatives (Groq, OpenAI, Mistral). They’re faster on weak hardware, but every voice message then goes off to a third party. For a wellbeing journal I consider that unacceptable by default: choose the cloud deliberately, not because it’s easier.

Step 6. A separate vision layer for images

A quick reminder of how the picture path works: in this setup DeepSeek handles the text, while workout screenshots from Hevy and photos of documents go through a separate auxiliary vision layer. That’s a separate provider with a separate budget; in my case the OpenAI API with a small spending limit. This step is optional: I connect vision when I feel like it and use it fairly rarely, so if you don’t need images you can skip it and live comfortably on text plus voice.

Markdown
# ~/.hermes/config.yaml
auxiliary:
  vision:
    provider: openai
    model: auto  # the resolved model needs to be confirmed in the logs before publication
Markdown
# ~/.hermes/.env
OPENAI_API_KEY=sk-placeholder

If you’ve never worked with the OpenAI API: the key is created in the API keys section, and the spending limit is set in the billing/usage sections of your account. Set a hard monthly ceiling right away: vision requests cost more than text ones, you’ll be sending screenshots every day, and the ceiling turns a possible surprise, in the worst case, into “the bot stopped seeing pictures for the rest of the month”.

Check: send the bot any screenshot with no personal data, for example a terminal window. The bot should describe what’s in the image.
If it didn’t work: compare the model name against the current OpenAI documentation, check the key and the logs. The typical symptom of a disconnected vision layer: the bot answers text fine but replies to an image with “I don’t see an image” or a provider error. The symptom of a stale model ID: a model-not-found error in the gateway logs.

First checkpoint: the agent can already communicate

If text, voice, and a test image all work, you already have a functional personal Telegram agent. The next steps do not make the model smarter; they turn the conversation into a reliable record system you can still trust a month later.

Step 7. Create the Markdown vault

I earned this structure the hard way on my own archive: the first version was “architecturally pretty”, with numbered folders and an inbox, and almost all of that prettiness turned out to be dead weight for one user with one Telegram entry point. The working structure is flat:

Code
wellbeing-journal/
├── daily/YYYY/MM/YYYY-MM-DD.md        # one entry per day, canonical name
├── weekly/YYYY/YYYY-Www.md            # weekly summaries
├── monthly/YYYY/YYYY-MM.md            # monthly reflections
├── notes/
│   ├── terminology.md                 # your terms: exercises, supplements
│   ├── measurements-history.md        # history of measurements
│   └── known-asr-errors.md            # recognition correction dictionary
├── documents/
│   ├── lab-results/YYYY/MM/
│   └── reports/YYYY/MM/
├── attachments/YYYY/MM/YYYY-MM-DD/    # original screenshots and photos
├── audio/                             # voice messages, per retention policy
├── templates/
│   ├── daily.md
│   ├── weekly.md
│   ├── monthly.md
│   └── lab-result.md
├── scripts/
│   ├── validate-entry.sh
│   ├── build-reflection-context.sh
│   ├── sync-to-mobile.sh
│   └── commit-entry.sh
├── _system/
│   ├── schema.md                      # frontmatter contract
│   ├── scales.md                      # definitions of the 1–10 scales
│   ├── safety.md                      # medical boundaries of the skill
│   └── retention-policy.md
├── dashboards/
├── JOURNAL_GUIDE.md
├── .gitignore
└── .obsidian/

The system’s key contract: Markdown for the human, YAML for the automation. The body of the entry stays free text, while everything the dashboards and reflections will later read lives strictly in the frontmatter:

Code
---
date: 2026-06-06
schema_version: 1
sleep_hours: 7.5
sleep_quality: 4
energy: 3
mood: 4
stress: 2
pain: 0
training: strength
symptoms: []
medications_changed: false
source: telegram
needs_review: false
---

Three rules I later paid dearly to fix in my own archive:

  • an unknown value stays null or empty; the model never substitutes 0 for “didn’t say”;
  • the date in the frontmatter must match the filename;
  • the scales (mood, energy, stress) are defined in _system/scales.md with anchors, and the user’s self-assessment always outweighs “scoring by message tone”.

Scale anchors aren’t a formality, they’re the main defense against semantic drift. Without them, the model will one day decide that “everything’s annoying” means high stress, when for you that’s low energy. My working anchors from _system/scales.md, validated over several months of daily scoring:

Markdown
## Mood (mood)
- 1–2 — down, miserable
- 3–4 — irritable, everything's annoying
- 5–6 — even, neutral (this is the norm, not "bad"!)
- 7–8 — upbeat, cheerful
- 9–10 — euphoria

## Energy (energy)
- 1–2 — switched off, barely moving
- 3–4 — sluggish, the post-lunch crash
- 5–6 — normal, working mode
- 7–8 — alert, drive
- 9–10 — hyperactivity

## Stress (stress)
- 1–2 — calm, on holiday
- 3–4 — light background hum
- 5–6 — noticeable pressure
- 7–8 — anxiety, poor sleep
- 9–10 — crisis, panic

Note the comment on mood: 5–6 is the norm. Without it, it’s easy to start treating an “even” day as a failure, and your scores will creep upward simply out of reluctance to give yourself a “middle”. And my main hack for the scores themselves: don’t think. Three questions at the end of the day, and the first number that comes to mind is the right one. The longer you agonize between a four and a five, the less that number means.

Check: create the structure, drop in one test daily file from the template, and open the folder in Obsidian: the vault should open with no errors and no required community plugins.

Step 8. Skills: a core without a monolith

Hermes skills live in ~/.hermes/skills/, each in its own folder with a mandatory SKILL.md. My first version of the journal skill was a single file hundreds of lines long, and verifying its behavior was impossible. The working layout: two skills, each with a short core and references pulled out into separate files.

Code
~/.hermes/skills/personal/wellbeing-journal/
├── SKILL.md                  # short core: flow and routing rules
├── references/
│   ├── schema.md             # copy of the frontmatter contract
│   ├── input-text-voice.md   # handling text and voice messages
│   ├── input-images.md       # handling images and OCR
│   ├── safety.md             # medical boundaries
│   ├── provenance.md         # source rules
│   └── git-workflow.md       # commit rules
├── templates/
│   └── confirmation.md       # Telegram confirmation template
└── scripts/
    ├── validate_entry.py
    ├── upsert_daily.py
    └── safe_commit.sh

~/.hermes/skills/personal/wellbeing-reflection/
├── SKILL.md                  # weekly/monthly analysis only
├── references/
│   ├── reflection-rubric.md
│   └── medical-boundaries.md
└── scripts/
    └── build-reflection-context.sh

The journal skill’s core in full. For Hermes v0.12.0 it needs YAML frontmatter with the mandatory fields name and description:

Markdown
# wellbeing-journal

You keep the user's personal wellbeing journal. You are a journaling
assistant, not a doctor: you don't make diagnoses, you don't prescribe or
cancel treatment, you don't interpret urgent symptoms. Full rules: references/safety.md.

## Input formats

- Text: a daily note or a reply to the evening check-in.
- Voice: the transcript is already prepared by the system. Rules: references/input-text-voice.md.
- Images: workout screenshots and photos of documents. Rules: references/input-images.md.

## Main flow

1. Determine the entry date. If the date is ambiguous, ask ONE clarifying question.
2. Extract only explicitly stated facts. Don't infer values.
   A missing metric stays null, never 0.
3. Normalize the scales per _system/scales.md. The user's self-assessment
   outweighs your reading of the tone.
4. Gather the missing REQUIRED fields into one compact question
   ("Sleep / energy / mood / stress as numbers on one line?").
   Don't ask for optional fields.
5. Write to the canonical path daily/YYYY/MM/YYYY-MM-DD.md via scripts/upsert_daily.py:
   reprocessing the same date updates the file, doesn't create a new one.
6. Save the raw input (transcript, text) in a Raw section separate from the Summary.
   Never overwrite Raw with your interpretation.
7. Mark data from images as needs_review: true and source: ocr.
8. Run scripts/validate_entry.py. If validation fails, report the error,
   don't commit.
9. Commit only via scripts/safe_commit.sh (which also runs the secret scan).
10. Send a short confirmation per templates/confirmation.md: what was saved,
    which fields are filled in, what's left as needs_review.

## Forbidden

- Medical conclusions, diagnoses, advice on medications and dosages.
- Interpreting lab values without units and reference ranges.
- Git push when secrets are suspected in the entry.
- Creating files with -2, FINAL, evening suffixes or any variations on the
  canonical name.
- Copying system logs, provider errors, and cron output into the journal:
  those are operational events, not observations about wellbeing.

The full listings of references, templates, and helper scripts are better moved into a separate fixture after the final command check. Until such a repository exists, the article shouldn’t promise a ready-made link. We’ll go through the reflection skill in detail in the third article; for the install it’s enough to create its skeleton.

Check: send the bot the text “Slept 7 hours, energy 4, mood 4, stress 2, no training”. A file daily/2026/06/2026-06-06.md should appear with filled-in frontmatter and your text in the Raw section. Send a correction, “correction: slept 6 hours”: the file should update, and a second file should not appear.
If it didn’t work: check that the skill is visible to the agent (hermes skills list), and that SKILL.md points to the right vault paths.

Step 9. A private Git repository

Git here solves two problems: backup and a history of corrections. When you fix a recognition error in an old entry a month from now, the git history is exactly what shows what changed and when.

Bash
cd ~/wellbeing-journal
git init
git add .
git commit -m "Initial vault structure"

Create an empty private repository on GitHub and connect it via an SSH key or the system credential helper. Don’t use a plaintext token in the remote URL: it’ll stay in the config forever.

Bash
git remote add origin [email protected]:youruser/your-private-journal.git   # placeholder
git push -u origin main

A mandatory .gitignore before the first meaningful commit:

Code
.obsidian/workspace*
.trash/
*.tmp
.DS_Store

And a rule worth learning before, not after: deleting a file in a later commit doesn’t remove it from history. If a token or a photo of a document with personal data ever lands in the repository, a simple “delete and commit” won’t help: you’ll need to rewrite history and rotate the secret. I lived through this on a work project and wrote it up in a separate article, how we cleaned out accidentally pushed secrets: better to read it before you need it.

That’s why safe_commit.sh runs a primitive but working secret scan before every commit. A skeleton that’s easy to extend with your own patterns:

Code
#!/usr/bin/env bash
# safe_commit.sh — commit only after a secret scan
set -euo pipefail
cd "$(dirname "$0")/.."

# Patterns: DeepSeek/OpenAI keys, Telegram bot token, generic
PATTERNS='sk-[A-Za-z0-9]{16,}|DEEPSEEK_API_KEY=|OPENAI_API_KEY=|[0-9]{8,10}:[A-Za-z0-9_-]{30,}'

if git diff --cached --diff-filter=ACM -U0 | grep -qE "$PATTERNS"; then
  echo "STOPPED: a possible secret was found in the staged changes." >&2
  exit 1
fi

git commit -m "${1:-Update journal entry}"
git push || echo "Push deferred: entries saved locally."

The full version with entry-schema validation should only be published together with a verified fixture. Until then, the article keeps a minimal safe example and a checklist before push.

Check: do a test push, then clone the repository into another directory and confirm the vault opens from the fresh clone. That’s both a backup check and a restore rehearsal.
If it didn’t work: it’s almost always the SSH keys. ssh -T [email protected] will show whether you’re authorized.

Second checkpoint: the agent now has verifiable memory

The value is no longer trapped inside a chat or a model’s promise to “remember.” You have ordinary files, change history, and a tested recovery path. Even if you stop here, you already have a useful personal archive.

Step 10. Obsidian and reading on your phone

On the desktop it’s simple: open ~/wellbeing-journal as a vault. With the phone it’s trickier, and here I’ll give honest advice instead of pretty advice: don’t attach two sync mechanisms to one folder. Git and iCloud/Obsidian Sync writing to the same vault at the same time will, sooner or later, cause a conflict on the most valuable file.

My working setup: the canonical vault lives in Git on the server, and a one-way read-only mirror goes to the iPhone (scripts/sync-to-mobile.sh does an rsync --delete into an iCloud folder, excluding .git). I read the journal on my phone but only write to it through the bot. Edits made on the phone in the mirror get overwritten by the next sync, and that’s a deliberate limitation, not a bug.

Alternatives: official Obsidian Sync as the only mechanism (no Git), or desktop-only mode. The Obsidian Git mobile implementation is experimental and doesn’t support SSH; I wouldn’t bet on it.

Check: run sync-to-mobile.sh first with --dry-run and review the list of actions, then a real run and a check on the phone.

Step 11. The 21:00 reminder

The final touch that turns a pile of components into a habit. First, pin the timezone in the Hermes config:

Markdown
# ~/.hermes/config.yaml
timezone: "Europe/Madrid"

An empty value means the server’s local time; explicit is always more reliable. Then, before trusting the syntax from any article, including mine:

Code
hermes cron create --help

Create the job:

Code
hermes cron create "0 21 * * *" 
  "Remind me to fill in the evening wellbeing check-in: sleep, energy, mood, stress, symptoms, training, and the important events of the day. Don't draw medical conclusions." 
  --name "Evening wellbeing check-in" 
  --deliver telegram

Important to understand: cron jobs are run by the gateway daemon. Creating a job isn’t enough, the gateway has to run continuously, which is why we installed it as a systemd service in step 4.

And don’t take the schedule on faith. Create a test job for two minutes from now and check the actual delivery time:

Code
hermes cron create '2m' 'Send exactly one word: TEST' --name 'STT Smoke Test Confirm' --deliver telegram
hermes cron list
# after the check:
hermes cron remove <job_id>

My first reminder arrived on time, but only because I ran this two-minute test and caught the discrepancy in advance. One important detail here: removal works only by job ID, not by job name. An hour of debugging the timezone on launch evening, or two minutes of testing: the choice is obvious.

Check: the test job arrived in Telegram at the expected local time; hermes cron list shows the permanent 21:00 job.
If it didn’t work: check hermes gateway status, the timezone in the config, and TELEGRAM_HOME_CHANNEL: a bare --deliver telegram delivers exactly there.

Step 12. The end-to-end test

Now run the whole system through, like one day in the life:

TestExpected resultWhere to look for the error
Text check-inOne daily entry and one commitGateway, skill, vault permissions
Voice messageRaw audio, transcript, and a structured entry for the same dateWhisper, GPU/CPU, codec, ffmpeg
Workout screenshotAttachment saved, training fields filled, source: ocrVision API, limits, MIME
Demo photo of a documentData extracted, but marked needs_review: trueVision layer, skill rules
ResendingThe entry is updated with no duplicate and no suffixed filesupsert logic, canonical path
Machine rebootGateway came up on its own, the bot answerssystemd service
Push with Wi-Fi offEntry saved locally, push catches up later, no data lossgit-workflow in the skill

The end-to-end test takes about twenty minutes, and they’re the best twenty minutes of the whole install: after them you trust the system enough to hand it a real evening check-in.

Third checkpoint: this is no longer a prototype

The bot accepts every format, updates one canonical entry, preserves originals, survives a reboot, and postpones a push without losing data. This is the point where a collection of integrations becomes a tool you can use every day.

Final checklist

  • [ ] TELEGRAM_ALLOWED_USERS holds only your numeric ID, and the bot stays silent for strangers.
  • [ ] All secrets are in ~/.hermes/.env and your password manager, none in the vault, Git, or screenshots.
  • [ ] The timezone is verified with a real test job, not an assumption.
  • [ ] Voice messages are recognized locally, and latency is measured.
  • [ ] The vision layer is configured separately, with a spending limit on the provider account.
  • [ ] Resending for the same date updates the entry, and duplicates aren’t created.
  • [ ] Raw input is saved separately from the normalized summary.
  • [ ] The repository is private, 2FA is on, and a fresh clone opens in Obsidian.
  • [ ] The gateway survives a machine reboot without a manual start.

How to update Hermes Agent

Hermes moves fast, and one day you’ll need an update. For the verified version the update command is hermes update, then always hermes doctor and hermes --version, a hermes gateway status check, and one test message to the bot. After major updates, cross-check the skills header format: breaking changes happen in fast-moving projects.

Common mistakes

A token in the remote URL

A plaintext token in the repository address stays in the config and history forever. SSH key or credential helper only.

Expecting DeepSeek to read the picture itself

In this architecture text and images travel different paths. DeepSeek handles the text, while pictures go through a separate auxiliary vision layer: its own setup step and its own budget (in my case a small OpenAI API limit).

Chat ID instead of an allowlist

The delivery address and the right to talk to the bot are different settings. Without TELEGRAM_ALLOWED_USERS the bot is open to everyone.

Cron without a timezone check

First a test job two minutes out, then the permanent schedule. Otherwise “21:00” turns out to be server time, not yours.

Two sync mechanisms on one vault

Git plus iCloud writing to the same folder at once will, one day, cause a conflict. One canonical source, everything else a one-way mirror.

Set everything up at once, then debug

Six layers configured with no checks in between take longer to debug than to install. Step, check, next step.

What’s next

The first real check-in after installation feels slightly strange. You spent hours configuring models, keys, systemd, and Git, but the result looks almost too simple: send a voice message, receive a short reply, see a Markdown file appear. That was the point. Good infrastructure disappears from view and leaves only the useful action.

The system is built and verified, but the interesting part begins afterward: the first week of entries, the first recognition errors, the first weekly summaries, and the question “so what do I actually do with all this now”. That’s what the third article is about: daily operation, honest monthly reflection without a made-up RAG, dashboards in Obsidian, real cost broken down by layer, and privacy. That’s where I’ll tell you how the journal caught my “melatonin → wrecked morning” link, which I hadn’t noticed myself.

FAQ

Do I need a GPU?

No. faster-whisper runs on CPU, recognition just takes longer. A GPU can noticeably cut the latency; the actual latency has to be measured on your chosen model and hardware.

Will a Raspberry Pi or VPS do?

The text path: yes. Local speech recognition on weak ARM will be painful, and a VPS adds the question of trusting the host that holds your journal. An old x86 laptop is the sweet spot.

Can I do it on Windows or macOS?

Hermes supports other platforms too, but the happy path of this guide is limited to Ubuntu. The commands can only be considered reproducible after the version block is filled in and the end-to-end test is done. On macOS the general logic is the same; the differences start with installing dependencies and autostart.

How long does the install take?

For me personally, the migration from the previous system took a few hours, but that’s not a universal benchmark. For a first run it’s sensible to budget one session for the base system and a separate one for the vault, skills, and the end-to-end test, especially if you end up wrestling with the GPU, systemd, or Telegram delivery.

Where is my data physically stored and who has access to it?

The entries and originals live on your machine and in a private GitHub repository. Along the way they’re seen by Telegram (including voice messages), DeepSeek (text), and the vision provider (images). Local faster-whisper rules out sending the audio to an extra STT provider, but it doesn’t take Telegram out of the chain. A private repo is access control, not encryption: a detailed breakdown of the trust boundaries is in the third article.

How much does DeepSeek cost per month with daily use?

The cost of the text layer depends on the actual token count and the current rates: cached input, uncached input, and output are counted separately. Vision is billed separately and, with frequent images, can become a noticeable part of the budget. I give the calculation formula and a dated price snapshot in the third article; before setup, cross-check against the official DeepSeek pricing page.

Share this article

LinkedIn X Email

Get in touch

If this article is relevant to your work, feel free to reach out.

I am always open to discussing architecture, Laravel, WordPress, performance, and practical implementation problems.

Send a message See selected work

Explore more

Articles

June 13, 2026

A month with an AI journal: finding patterns in sleep, stress, and workouts

How to analyze an AI journal after the first month: transcription fixes, honest reflection…
Articles

June 13, 2026

How I turned an old gaming laptop into an AI wellbeing journal

How an old Xiaomi Mi Gaming Laptop became a home AI server: Hermes Agent,…
Articles

May 31, 2026

WordPress Bedrock Docker: local dev and bare-PHP production guide

How to run WordPress Bedrock with Docker Compose locally and deploy to Hostinger without…