Tuesday, June 30, 2026

Show HN: Kage, verification and freshness for Google's OKF agent memory https://ift.tt/YZ1MHIT

Show HN: Kage, verification and freshness for Google's OKF agent memory Kage has always been a document-format memory with its own memory standards, betting on file-based, git-native memory. It's good to see that Google thinks the same way and has released OKF (Open Knowledge Format), which Kage has adopted with open arms. Although Google has released the memory standard and defined how to structure the memory, OKF doesn't cover verification, or when and how memories are created. That's where Kage comes in: as a framework, it works with your agent to understand what to save, when to save it, and how to save it. It also helps the agent recall relevant memories and keep them fresh. Kage focuses on maintaining your repo's memory for you and giving you the best experience when coordinating and working with teammates on the same repo. Just install Kage and let your agents do the memory maintenance themselves. Support is best with Claude Code (Hook), and Kage also works with all the other coding agents. https://kage-core.com/ July 1, 2026 at 12:29AM

Show HN: Morph Reflexes – Multi-head classifiers for agent traces https://ift.tt/LrJ9KCx

Show HN: Morph Reflexes – Multi-head classifiers for agent traces The most common failures for production agents are behavioral: looping, reasoning leakage, user frustration, and more. Using a frontier model like GPT or Sonnet to judge every turn is too expensive and too slow to run at scale. How it works: We use a modern LLM with hybrid attention and remove the decode step. We built an inference engine that lets 99% of prefill compute be reused from reflex to reflex, similar in spirit to 2019-era BERT/HYDRA and older multi-head techniques. We took the same high-level idea and did the hard work to make it work with a modern architecture and attention. On it, we can run inference in under 30ms and serve the full request in under 90ms. Whether you run 4 reflexes or 100, the extra overhead is less than 2ms. Why does optimizing this matter? If you're even a medium-sized startup, you're dealing with tens of thousands of agent runs and millions of turns. If you want to track things like user frustration rates over time, frontier LLM-as-judge does not scale. I built a similar stack at Tesla. When ML engineers needed to sample data across petabytes for signals like `is_camera_obfuscated=true`, along with 200 other things, they needed to 1) spin those signals up quickly and 2) run them efficiently at scale. What it is not: A dashboard. In my experience, 99% of dashboards go unused. This is purely API-based and made for devs who want to track agent behavior themselves, trigger their own alerts, and build on it. You can vibetrain a custom reflex in our dashboard, and then let it self-improve in production: https://ift.tt/0ifhoxZ Docs: https://ift.tt/RJbxiuZ I'd love feedback from people running agents in prod: what sorts of things do you wish you could track over time across 100% of turns? TLDR: semantic signals from agent traces, super fast and cheap, via API July 1, 2026 at 12:52AM
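
The shared-prefill idea described above can be illustrated with a toy sketch: one expensive pass produces features once, and every classifier head reuses them. This is not Morph's engine or API. `shared_trunk`, `Head`, and `run_reflexes` are hypothetical names, and the character-count trunk merely stands in for a real model's prefill.

```python
import math
from typing import Callable, Dict, List


def shared_trunk(text: str, dim: int = 8) -> List[float]:
    """Hypothetical stand-in for the expensive shared pass (the "prefill").

    A real system would run a transformer here; this toy version buckets
    character counts into a small, L2-normalized feature vector.
    """
    feats = [0.0] * dim
    for ch in text.lower():
        feats[ord(ch) % dim] += 1.0
    norm = math.sqrt(sum(f * f for f in feats)) or 1.0
    return [f / norm for f in feats]


class Head:
    """One cheap classifier head: a linear layer followed by a sigmoid."""

    def __init__(self, name: str, weights: List[float], bias: float = 0.0):
        self.name = name
        self.weights = weights
        self.bias = bias

    def score(self, feats: List[float]) -> float:
        z = sum(w * f for w, f in zip(self.weights, feats)) + self.bias
        return 1.0 / (1.0 + math.exp(-z))


def run_reflexes(
    text: str,
    heads: List[Head],
    trunk: Callable[[str], List[float]] = shared_trunk,
) -> Dict[str, float]:
    """Run the trunk once, then score every head on the same features."""
    feats = trunk(text)  # the only expensive call, independent of len(heads)
    return {head.name: head.score(feats) for head in heads}


if __name__ == "__main__":
    # Made-up weights, purely for illustration.
    heads = [
        Head("looping", [0.5] * 8),
        Head("user_frustration", [-0.5] * 8, bias=0.2),
    ]
    print(run_reflexes("please stop repeating yourself", heads))
```

In this sketch the cost of adding a head is one dot product, which is the general reason a multi-head design can scale from a handful of signals to many with little extra overhead.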

Show HN: Jensen – a Deus Ex: Human Revolution theme for 30 developer apps https://ift.tt/whHrnTe

Show HN: Jensen – a Deus Ex: Human Revolution theme for 30 developer apps https://tomaytotomato.github.io/jensen/ June 30, 2026 at 11:56PM

Monday, June 29, 2026

Show HN: Fleet – a local-first console for managing Dockerized Hermes AI Agents https://ift.tt/FPdZHJe

Show HN: Fleet – a local-first console for managing Dockerized Hermes AI Agents https://ift.tt/cdP26po June 30, 2026 at 12:31AM

Sunday, June 28, 2026

Show HN: Image2JXL – a native macOS JPEG XL converter https://ift.tt/M2idH4p

Show HN: Image2JXL – a native macOS JPEG XL converter https://ift.tt/mgJV628 June 29, 2026 at 04:09AM

Show HN: Use-zerostack – delegate any task to a lightweight coding agent https://ift.tt/C5E7flx

Show HN: Use-zerostack – delegate any task to a lightweight coding agent https://ift.tt/GgupCmF June 28, 2026 at 11:33PM

Show HN: NanoEuler – GPT-2 scale model in pure C/CUDA from scratch https://ift.tt/ORl9gqf

Show HN: NanoEuler – GPT-2 scale model in pure C/CUDA from scratch Hi everyone, I started working on NanoEuler after the ban of Anthropic's fable, because my ambition and dream is to work in the AI field at Anthropic. Two things led me to create NanoEuler: (1) interfacing with LLMs does not mean understanding how they are built, and (2) I wanted to work on an LLM at a very low level to understand the relationship between parameters, data, and model growth, how the GPU works, and how some layers can be optimized. So I approached it as research, growing NanoEuler one step at a time, starting from Shakespeare.txt and studying what a text generation model understands at 23 million parameters. For example, at that size NanoEuler had learned that "Name:" starts a line, and it completed that line sensibly. I wrote everything in CUDA because I did not want any intermediary between the model, in training and inference, and what it had to do. After that, SFT and much more, even on a small scale, were really useful for understanding the various steps needed to turn an LLM into a chatbot. Any feedback, help, or suggestions are absolutely welcome! https://ift.tt/Kctn6T3 June 28, 2026 at 11:38PM