Coding agents that read papers before writing code find optimizations that code-only agents miss.
We added a literature review phase to Karpathy’s autoresearch loop and pointed it at llama.cpp. The agent autonomously read arXiv papers, studied competing forks, and spun up VMs to run parallel experiments.
I'm working on this also. Hopefully sharing adds to the conversation.
First, about the loop: Claude's (the coding agent's) context and attention are big enough for it to self-reflect. Agent Tuning shows a technique that not only demonstrates this but also offers a way to quantify it. [0]
> Claude's attention doesn't distinguish between "instructions I'm writing" and "instructions I'm following" -- they're both just tokens in context.
Second, doing research helps: finding academic papers and adding them to the context. Here is an example of an implementation that creates trading strategies by reading research and recreating it in creative new ways. [1]
The biggest problem is that coding agents don't "fail fast and loud". They fail deceptively.
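To make the "fail fast and loud" point concrete, here is a rough sketch of the difference. It is purely illustrative and not taken from either linked repo; `measure`, `run_benchmark_deceptive`, `run_benchmark_loud`, and `broken_measure` are all hypothetical names.

```python
import math


def run_benchmark_deceptive(measure):
    """Anti-pattern: swallow the error and return a plausible-looking number."""
    try:
        return measure()
    except Exception:
        # The caller sees a normal float and cannot tell the run failed.
        return 0.0


def run_benchmark_loud(measure):
    """Fail fast and loud: let errors propagate and reject impossible results."""
    result = measure()  # any exception raised here reaches the caller unchanged
    if not math.isfinite(result) or result <= 0:
        raise ValueError(f"benchmark returned an invalid measurement: {result!r}")
    return result


def broken_measure():
    """Hypothetical stand-in for an experiment whose build step failed."""
    raise RuntimeError("build failed")
```

With the first version, an outer loop that compares scores keeps going on a fabricated `0.0`. With the second, the same failure stops the run at the point where it happened.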
[0] https://github.com/adam-s/agent-tuning
[1] https://github.com/adam-s/alphadidactic