Everything is changing. Adam is joined by his good friend Beyang Liu from Sourcegraph — this time, talking about Amp (ampcode.com). Amp is one of many agentic coding tools, and one of Adam’s favorites to use. What makes it different is how they’ve engineered it to maximize what’s possible with today’s frontier models. Autonomous reasoning, access to the oracle, comprehensive code editing, and complex task execution. That’s nearly verbatim from their homepage, but it’s also exactly what Adam has experienced. They talk through all things agents, how Adam might have been holding Amp wrong, and they even talk through Adam’s idea called “Agent Flow”. If you’re babysitting agents, this episode is for you.
Featuring
Sponsors
CodeRabbit – AI-native code reviews, built for the modern dev stack. — CodeRabbit is your always-on code reviewer—flagging hallucinations, surfacing smells, and enforcing standards, all without leaving your IDE or GitHub PRs. Trusted by top teams to ship better code, faster.
Start free at CodeRabbit.ai
Depot – 10x faster builds? Yes please. Build faster. Waste less time. Accelerate Docker image builds, and GitHub Actions workflows. Easily integrate with your existing CI provider and dev workflows to save hours of build time.
Fly.io – The home of Changelog.com — Deploy your apps close to your users — global Anycast load-balancing, zero-configuration private networking, hardware isolation, and instant WireGuard VPN connections. Push-button deployments that scale to thousands of instances. Check out the speedrun to get started in minutes.
Notes & Links
- Amp Code - (ampcode.com)
- YouTube.com/Sourcegraph
Chapters
Chapter Number | Chapter Start Time | Chapter Title | Chapter Duration |
1 | 00:00 | This week on The Changelog | 01:17 |
2 | 01:17 | Sponsor: CodeRabbit | 01:07 |
3 | 02:24 | Start the show! | 07:14 |
4 | 09:38 | It's a genie in a bottle | 09:17 |
5 | 18:55 | How does Amp, Amp? | 05:58 |
6 | 24:53 | Amp CLI is stunning | 04:06 |
7 | 28:59 | CLI TUI edge cases | 04:13 |
8 | 33:11 | Adam shares his Agent Flow framework | 08:15 |
9 | 41:26 | Sponsor: Depot | 02:12 |
10 | 43:37 | Borrowing PEPs from Python | 18:23 |
11 | 1:02:00 | The cost of Amp and using context properly (Adam holding it wrong?) | 04:18 |
12 | 1:06:18 | Scratching itches with Agents | 18:43 |
13 | 1:25:01 | Raising an agent | 06:11 |
14 | 1:31:12 | The pod on YouTube | 14:59 |
15 | 1:46:12 | Speaking to AI resistors | 07:42 |
16 | 1:53:53 | OSS hopeful | 04:52 |
17 | 1:58:45 | Try them all! | 03:42 |
18 | 2:02:27 | Wrapping up | 00:49 |
19 | 2:03:16 | Closing thoughts and stuff | 02:41 |
Transcript
Play the audio to listen along while you enjoy the transcript. 🎧
Beyang Liu, welcome back to the Changelog. Let’s go as deep as humanly possible on Amp. Not Sourcegraph necessarily, but Amp. What do you think?
Cool. Yeah, it sounds good to me, and thanks for having me back on the show, Adam.
What is Amp?
Yeah, I don’t think it takes too much explanation. Amp is a coding agent. So most people - I would imagine your audience - know what that is at this point. But if you’ve been living under a rock for the past six months, a coding agent is essentially an AI-powered program that takes natural language instructions, gathers context, and then modifies the code following your instruction. And so it’s much more high-level. You describe what you want, you figure out how to instruct it, and then it figures out how to do most of the editing for you. It runs the tests, it runs compilers, it can correct itself… And the idea is that we want to enable programmers to operate at a higher level, so dictating what the architecture should be, what the key interfaces should be, what the UI should look like, and not be as much in the weeds of every single line of code that is written.
And then Amp, in particular, among the landscape of coding agents, is distinguished by the fact that – I think we’re the only coding agent that I’ve come across that has this approach of… So we’re multi-model – we use multiple LLMs, which is not distinctive. There’s a lot of coding agents that use a variety of underlying large language models, and small language models now… But I think our approach is different in that we don’t have a model selector. It’s not a model harness. So an Amp user doesn’t think in terms of “Oh, let me use Claude’s Sonnet 4 today”, or “Let me use GPT-5 today.” We view that as basically an implementation detail. So we’ll figure out what the models are good at, and the contract to the end user is an agentic experience that hopefully just works well for your use cases.
So there’s no toggling on different specific models. It’s more “Oh, for this particular feature or sub-capability of Amp we’re going to use this model, because it has the right characteristics in terms of latency, intelligence, and competency around one particular task.”
Maybe just a side tangent, which I didn’t really plan to do until this very moment… But I was thinking back to the last time we had a conversation, and I think at the very end – I’m not even sure if it ended up on the air or not. So this may have just been a side conversation between you and I, and maybe just some feedback. I think it was “How in the world do I sign up for Cody?”
[laughs]
So obviously we’re talking about Amp… I don’t know where Cody went. I’ve literally forgotten about Cody, until this very moment… And that feedback I gave to you, which was “Hey, I’d love to play with your stuff, but I don’t feel I know how to do it.” I think I have a Sourcegraph account, I’ve tried 16 ways, and I keep getting no progress… So there you go.
Oh, you’ve run into auth issues with Amp?
No, no, no, this was Cody, back when we talked – I don’t even know when we talked. Maybe 8 months ago, 10 months ago, or something like that.
Yeah.
So where’s Cody? Did Cody just go away?
Cody is still alive and well in the enterprise.
Okay…
But we’ve sort of moved away from it for non-enterprise use cases. So there’s still plenty of enterprise customers that use it quite heavily for, I would say non-agentic AI coding assistance… And that’s still a common feature among a lot of large companies. There’s some companies out there that even have a legal prohibition on anything agentic, however you want to define that.
Oh, gosh…
You can’t say agent inside some orgs.
What’s the alternative word to agent if you can’t say agent?
Oh, I think sometimes people say workflow. And that means a certain – it has a different… There’s no single source of authority on what agent or workflow or any of these terms mean, but there is a different connotation, or zone of what that definition means. So I think some more regulatory or legally sensitive orgs, they’re okay with workflows, but not anything that’s “fully agentic”. I think it’s hard to pin these down, and at some point it’s not worthwhile discussing in too much depth, because at the end of the day a certain customer has this requirement, and we’re fine meeting that. They determine they need that, and we’re happy to serve them where we can.
But long story short is, Cody is still alive and well in the enterprise. It’s used for non-agentic workflows. But outside of the enterprise, the reason why we built Amp outside of Cody was really we felt that there was this very big technological shift that was happening. If anything, it’s the whole gen AI phenomenon. There’s multiple waves of technology that have actually gotten lumped into one umbrella term that people call gen AI, but we view agents as fundamentally a different technology and a different application architecture than the pre-agent, more chat-based AI. And so that to us was we really needed to think from first principles in terms of how we built this thing, in order to unlock the potential of agentic models.
[00:08:07.18] So we didn’t want to be hampered by the design constraints and the way that we’d built Cody originally, because that was very much for the chat LLM world. And if anything, a lot of the best practices for building with chat-oriented models, it’s the inverse with agentic LLMs.
Big shift, big waves changing as we have this gen AI world… And we’ve been three-ish years deep into this phenomenon, as you have said, right? About three and a half years?
Yeah.
It feels like a decade. I don’t know about you, but I feel like it’s been a decade.
Yeah, it’s wild.
I’m excited. Are you excited about this change? I mean, I feel I’ve had peak hype, and then a drop down to reality; then peak hype, then drop back down to reality. And I feel now I’m just sort of peak hype all the time.
Yeah. I mean, I’m definitely excited about the technology. I think it’s got huge potential. I would say I was never excited to the point where I believed in the sort of AGI myth, or this notion that it would eliminate all need for work, or kill us all… I thought that was always a fairy tale told by people with overactive imaginations. But definitely, very excited about the potential for this to eliminate toil, for all kinds of knowledge work; not just developers, but especially developers, because that’s who we sort of build for.
Yeah. I feel it’s obviously a force multiplier. I still feel it’s this weird genie in a bottle, where you have to conjure it in certain ways, you have to hold it delicately… I feel even in my own experience it’s smart in some cases, and then really dumb in some cases… I have some questions for you about that to sort of demystify when I feel “Gosh, we’ve done this before and you’ve been really good, and now suddenly you have no idea how to commit to my Git repository.” That’s the most basic function that you could possibly get, is git commit -m, blah. That’s pretty easy, but I feel I’ve had moments where I’ve had “You know how to commit to a Git repository, right?” And I’m not speaking to it that way, but in my brain I’m like “Wow, this thing was really smart, and now it’s not really smart.” So I feel there’s waves of that even in my own personal usage.
Yeah, so the committing to Git, I think that’s – at least for Amp, that’s largely solved now. It’s been a while since I’ve seen a class of error –
I’m not blaming Amp now. I’m blaming somebody else.
Okay, gotcha. I think those sorts of things, I have very high confidence that it could just do now. But I get what you’re saying. There is this spectrum where some things that you, a human, would consider difficult, it can just – you know, almost one-shot. Whereas other things where a human – it’d be a two-line change or whatnot, and it will just… We call it doom looping, where it’s just iterating over and over again and it can’t figure it out, even though you’re saying “Okay, now try it like this.” You’re trying to nudge it and it just won’t get it. There’s certainly that phenomenon that’s still at play. And it’s almost at the point where a lot of this reduces to the underlying model intelligence… And so I think the proper way to view this – part of it is like, okay, there’s some improvements that we need to make to the product in terms of how we present the answer to the user, and help people prompt in a way that gets you to the sweet spot of the model… But also, users have to view this as a skill set they need to acquire as well. The more you use agentic coding tools, the more you develop an intuition for what it can do and can’t do.
So you can almost tell from talking to a person how much they’ve used these tools in terms of what they express their frustrations as. A lot of newbie users, especially the ones that are from the onset skeptical for whatever reason, they will say “Oh, I asked it to do this thing that ought to be very easy, but it fell flat on its face.” And then you ask to see what their prompt looked like, and it’s five words. At that point it’s like “What do you expect it to do?”
[00:12:26.04] Yeah, what did you expect…?
It’s not a mind reader. There’s almost like an information – you put a certain amount of bits in, and from that the model has to tease out what your intention was.
It can certainly accept instructions that are like five words long, but if it’s only five words, then it’s going to be very much based on what its prior behavior wants to do. So if you have a five-word prompt to create a simple React app, or a simple game or whatnot, it could probably do that, but it’s not going to be probably what you intended, unless what you intended is the median React app or the median JavaScript game that’s represented in its training set.
Whereas the more expert users, they still have complaints, but it’s oftentimes around very specific things. It’s like, okay, I know what it’s good at, I’m not complaining about the fact that it can’t do these things anymore… I can still do those things, it’s fine. It’s more around “Hey, I would love to be getting more out of this tool, but there’s certain bottlenecks that are kind of constraining how much I can use it.”
I think one very important bottleneck that still remains is code review. I think most expert users of coding agents realize that they’re very powerful, and they want to be using them heavily, but you can’t trust them fully. You’ve still got to read through the code that they emit and understand it, because otherwise the slop will creep in, and it’ll make subtle changes that if you don’t catch and are not wary of, will add complexity and nuanced bugs, subtle bugs to the code.
So this process of the human understanding the code that’s been generated and ensuring that it doesn’t do anything very incorrect or not according to their intentions - that’s an important part of the process, and it’s quickly becoming one of the key bottlenecks in this process among the folks that are really, really trying to push the frontier of what they could do with these tools.
Yeah. In my case in particular - I’ll call out the tool… It was not Amp, just so you know. It was Claude. And it was around 11.30 p.m., 11.45 p.m. Central Standard Time, literally last night. And the example was – I’m obviously using Claude Code… Well, not obviously. I’m using Claude Code on the command line, so I’m using their CLI tool. So in my directory, Claude opens up their CLI tool, and then there you go.
In my case, I think it may have been a recent context window clearing… Either way, I just felt like simple tasks were getting harder and harder for it to do. It was almost like it just got unsmart. I don’t want to say the opposite word because it’s not cool to do that, but it had become unsmart, basically.
And this specific example was “SSH into this known machine we’re working with.” Like, it has a name in my hierarchy; it’s clear. There’s context for what this machine is, how we’ve been interacting with this machine, and how it would work. And I said “SSH into this machine and just check on the memory state”, because this is something we had been doing. And it suddenly says “I can’t SSH into machines. I’m not able to do things like that.” And I’m like “No, no, no. You’re Claude Code. That’s your first job as Claude Code, is to be able to traverse anywhere I can go in my terminal sessions. And you should be able to SSH in, because I’ve got an SSH key, and this is my own machine, and so this is a known thing…” And I had to remind it. It was like “Oh, you’re right. Oh, you’re right. I can do this.”
[00:16:10.03] And so that’s the very specific example, versus just this – I feel like sometimes there’s drift in its ability… And that could be a Claude thing, it could be maybe it was a certain time and usage was peaked, and maybe these models get less smart whenever there’s peak usage, because there’s maybe less memory to go around… I have no idea. I feel like that part of the world in the usage is so black box. And maybe it’s black box to you too, but in that case I was like “You know how to SSH, and yes, you can.” It’s like “Oh, you’re right. I can.”
Yeah, it’s interesting. I don’t know when this is going out, but there was a recent report that Anthropic posted, where they were – apparently, they had rolled out a quantized version of the model over the past couple of days, which actually did yield degraded quality. And so this has kind of confirmed the conspiracy theories in people’s minds where it’s like “Oh, are they nerfing the model quality?” And it turns out they had rolled out some changes aimed at improving efficiency, but those did actually have a tangible impact on model quality. So I wonder if that happened in your case.
In our case – we do use the Claude family models heavily underneath the hood, but we have a couple levers that we can pull, that can help address this issue. One is we actually use multiple inference providers that provide Claude. And there’s actually periods of time where one provider will have different uptime characteristics. So if the model is completely down from one provider, we can switch over to the other and stay online while other people who are using tools that are tied to just one provider can’t use that tool. And that also goes to the model quality. So if we notice quality degradation from one provider, we can cut over to the other and still be consuming the model at sort of full fidelity and full quality.
And then the second lever that’s just now coming online is we’re starting to play around with more and more additional families of models. So we already make use of a variety of models for specific use cases and capabilities within Amp, and we’re also constantly trying out different models as the main agentic driver. And I could see a future where if we start to notice degraded quality or higher error rates from one model family, or maybe it just goes completely offline, we could cut over to another model family as a fallback, provided it’s good enough at the core capability of driving the agentic workflow that we need.
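The failover lever Beyang describes can be sketched in a few lines. This is a hedged illustration only: the provider names, the health flags, and the function are all invented, and a real implementation would also fall through on request errors and timeouts rather than consulting a static flag.

```python
# Sketch of the multi-provider fallback lever: the same model is served by
# several inference providers, tried in priority order, cutting over when
# one is down or degraded. All names here are illustrative, not Amp's code.

PROVIDERS = ["provider-a", "provider-b"]             # priority order
HEALTHY = {"provider-a": False, "provider-b": True}  # pretend A is degraded

def call_model(prompt: str) -> str:
    for provider in PROVIDERS:
        if HEALTHY[provider]:
            # A real client would issue the API request here, and also
            # retry/fall through on errors, not just check a health flag.
            return f"[{provider}] response to {prompt!r}"
    raise RuntimeError("all providers are down")

print(call_model("hello"))
```

The same shape extends naturally to the second lever: the fallback list can contain a different model family, provided it clears the bar for driving the agentic loop.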
Mm-hm. Let’s talk about, I guess, how it works, if you don’t mind. You’re CTO at Sourcegraph, so you should know these things, or at least the README version. I’m assuming you’re super-deep. I don’t want to assume a lot of stuff, but I figured your position gives you the ability to go probably as deep as anyone might be able to, aside from maybe the core team and literal implementers who are working on this day to day. Give me, as best you can to the world, to the public, how Amp works, from the architecture at a hosted service level… You talked about being able to determine degradation… Give me as best you can a lay of the land that can be public.
Yeah, so I’d say there’s no weird rocket science here. I’d say at the core, at a very high level, Amp operates the same way that every other agent does. The way I’d describe it - it’s a for loop wrapping an agentic LLM. So you take an LLM like Claude, or GPT-5, or any of the other agentic LLMs that are now online, you put that in a for loop, and what the loop looks like is you take the user input, you feed it into the model, the model will come back with some combination of tool calls and written response, thinking things out, or responding to the user query…
[00:20:17.20] And the tool calls are just text. They describe like “Hey, you told me that you had access to these tools.” Maybe it’s Grep, maybe it’s a read file tool, maybe it’s a tool to edit a file… “I want to invoke the tool with these arguments”, and then our application logic goes and executes the tool call.
So given the spec that we get from the model, we execute the tool, we get the response, and then that response gets fed back into the next iteration of the loop. And that just keeps looping until there are no more tool calls, at which point the model generates a final response… And that’s usually either the answer to the user’s question, or it says “I made these edits to the code, in these files. These are the changes that I made.” It gives you a summary of those changes. So at a high level, every single agent in the world follows that architecture. It’s the agentic equivalent of the ChatGPT wrapper architecture that was so prevalent a year, 18 months ago.
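The "for loop wrapping an LLM" pattern Beyang describes can be sketched in a few lines. The model here is a stub that fakes one round of tool calling; a real implementation would call an LLM API and parse its tool-call output. Tool names, message shapes, and the stub's behavior are all invented for illustration.

```python
# Minimal sketch of an agentic loop: feed input to the model, execute any
# tool calls it requests, feed results back in, stop when there are none.

TOOLS = {
    "read_file": lambda path: f"<contents of {path}>",
    "grep": lambda pattern: f"<lines matching {pattern}>",
}

def stub_model(messages):
    # Stand-in for an LLM: request one file read, then give a final answer.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_calls": [{"name": "read_file", "args": {"path": "main.go"}}]}
    return {"text": "Here is a summary of main.go."}

def agent_loop(user_prompt, model=stub_model):
    messages = [{"role": "user", "content": user_prompt}]
    while True:
        reply = model(messages)
        calls = reply.get("tool_calls")
        if not calls:                       # no tool calls -> final response
            return reply["text"]
        for call in calls:                  # execute each requested tool
            result = TOOLS[call["name"]](**call["args"])
            messages.append({"role": "tool", "name": call["name"], "content": result})

print(agent_loop("Summarize main.go"))
```

Everything agent-specific lives in the tools and prompts; the loop itself is this simple.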
Beyond that layer, there’s nuances. There’s nuances in terms of what tools we provide the models, the different prompts that we use for the main agent, and then there’s also sub-agents… So sub-agents are a special class of tools where you call a tool, but underneath the hood it’s just its own nested for loop. So it’s its own agentic loop doing a more targeted task. So you could have just an agent calling itself, in a recursive fashion, or it might call a domain-specific agent. We have an agent that’s dedicated to code-based search, which has been optimized for finding and locating relevant context within the repository.
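The sub-agent idea reduces to this: from the outer agent's perspective a sub-agent is just another tool, but its implementation is its own nested loop over more primitive tools. The sketch below is illustrative only; the tool name, the trivial loop standing in for an LLM-driven one, and the fake grep results are all assumptions, not Amp's internals.

```python
# A sub-agent exposed as an ordinary tool: calling it runs a nested,
# domain-specific loop (here, a code-search loop over a fake grep).

def grep(term):
    # Fake search results; a real tool would search the repository.
    return [f"auth.go: uses {term}", f"middleware.go: defines {term}"]

def codebase_search_agent(query):
    """Nested loop specialized for locating relevant context."""
    findings = []
    for term in query.split():   # trivial loop standing in for an LLM-driven one
        findings.extend(grep(term))
    return f"{len(findings)} relevant lines for {query!r}"

# The outer agent just sees a tool named "codebase_search":
OUTER_TOOLS = {"codebase_search": codebase_search_agent}
print(OUTER_TOOLS["codebase_search"]("session auth"))
```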
So there’s a lot of tuning that we do to make sure that all those pieces operate well together, that we can eliminate the dumb class of errors, like “Oh, I can’t SSH into this thing”, or “I can’t read this file.” Smooth all those out, because those are big disruptors to the user experience. And then we find pieces to fill in the gaps for what blocks the agent from getting further, or what helps it course-correct itself when it doesn’t immediately do the right thing off the bat.
And then there’s this client-server architecture that we have. Part of it always has to be on the server, because the best models are still server-bound right now, because local models have not yet gotten to the point where they’re fast enough to run locally… But in addition to that, Amp also has a server-side store for all the threads. So we call the agentic interactions that people have with Amp threads. So every session you have with Amp to accomplish a task is a thread, and all those threads get synced to the server. And the benefit of that is that you can go on Amp and see the threads of all your teammates, and how they’re using Amp… Which is really helpful, because coding agents, using them effectively in our view is a high-ceiling skill. Like you said before, it’s not trivial. There’s certain things, especially if you’re a newbie, where you expect it to be able to do something, especially if you’ve come in consuming a lot of the hype around AI… You’re like “Oh, it’s magic.” And then it doesn’t do the thing, it doesn’t read your mind. So it’s really helpful to have this team-wide view of how other people are using this tool to great effect, because then you can go and learn what their best practices are.
[00:23:49.27] So that’s another big component of the Amp architecture, is we do have the server-side component that stores the thread information so you can share with your teammates, and also you can go back and revisit previous threads in case you’re like “Oh, what was that thing that I was learning about a couple days back? Let me go back and look at that thread and see what it actually answered, or what it actually did there.”
Is the thread simply kind of a chat history? It’s not really context, it’s just more what was printed back and forth kind of thing?
It includes every single message from the assistant or the user, and all the tool calls and all the tool results. So it’s basically the entire interaction.
Like a transcript.
Yeah, it’s like a transcript. “Here’s what you asked it to do. Here are the tools that it called. Here’s the results of those tool calls. It read this file, I read that file, it listed this directory, it made this edit… And then you asked it to do this”, and then it’s just the entire message history.
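A thread in this sense is just a structured transcript. The sketch below shows one plausible shape for such a record; the field names and schema are assumptions for illustration, not Amp's actual data model.

```python
# Illustrative shape of a synced "thread": the full transcript of one
# session, including every message and every tool call + result.
thread = {
    "id": "T-123",
    "messages": [
        {"role": "user", "content": "Rename config.load to config.Load"},
        {"role": "assistant", "tool_calls": [
            {"name": "grep", "args": {"pattern": "config.load"}},
        ]},
        {"role": "tool", "name": "grep", "content": "main.go:12, util.go:8"},
        {"role": "assistant", "content": "Renamed in 2 files."},
    ],
}

# Because the entire interaction is stored, a teammate can replay it later:
for m in thread["messages"]:
    print(m["role"], "-", m.get("content") or m["tool_calls"][0]["name"])
```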
Interesting. So you’ve got a client server, you obviously have your own CLI, so you can install it… I think you install it via Brew on a Mac. I can’t recall if that’s how I did it, or if it was via NPM.
The preferred way is via NPM for now.
Gotcha, okay. So you’ve got that client architecture, that is your own CLI, which by the way, is just stunning. It’s beautiful. I love the way y’all did that.
Thank you, yeah.
I really love the nod to NeoFetch… At least I think it was a nod to NeoFetch, the opening splash screen. I could be wrong.
[laughs] The orb?
Well, when you launch – you know how you launch Amp. So when you first – for the users who may not, or listeners who may not… I’m calling them users. Y’all are listeners still yet, you’re not users yet. [laughter] But you know, when you launch Amp, if you’ve ever used NeoFetch on Linux, when you NeoFetch on any given box or machine you’ve SSHed into, it essentially gives you this sort of bigger logo on the left, and a list of details on the right-hand side. That Amp splash page reminds me of NeoFetch. I’m not sure if there was a nod there, or a homage there or whatever…
To my knowledge no, but I’m also not the one who created that splash page, so I don’t know if that was an inspiration.
Gotcha.
Part of what we’re trying to do – I think command line coding agents are all the rage right now, for good reason. I think it’s super-versatile. And I think now in part because of AI, people are really pushing the boundaries of what you can do inside a terminal-based UI. So now there’s all these great new libraries and frameworks that are doing things that are visually stunning inside the terminal. Whereas, I don’t know, two years ago - you asked me to describe the prettiest command line tool, and they all kind of look the same, right? It’s all mostly text.
Htop, maybe.
Yeah, exactly. But now, because the terminal’s so versatile - you can deploy a coding agent in a huge number of places if it’s in a terminal form, right? Because you don’t have to install a graphical interface, you don’t have to go through any of the loopholes you have to do… Or not the loopholes, but you don’t have to jump through any of the hurdles to set up the coding agent if it’s based on the command line. So I think we, along with everyone else, are trying to make Amp as pretty as possible in the terminal.
Yeah, it’s clear you are.
Because at the end of the day, you want to use something that’s delightful, right?
Yeah, I can gush about a lot, but I think just the animation of that orb initially was really cool… And I think it’s changed over time. I can’t tell, because I never snapped a perfect memory shot of it in my brain, but I think it’s changed over time and it’s gotten different colors, and maybe even the shape has changed… But it’s cool. I like it.
Yeah. It’s gotten better, and I do want to call out that – so we just shipped a new terminal UI yesterday, as of this recording…
Is that right? Okay.
[00:27:59.16] And it’s actually using a new terminal UI framework that was built in-house, specifically for Amp. And one of the things that’s immediately obvious if you use it is that there’s no flicker. So we were using Ink beforehand, which is very popular, and frankly, a very great framework. But one of the things that people notice with Ink is that it flickers a lot. And the flicker gets worse in some terminals over others. So if you’re using it inside tmux, as I often do, the flicker is very noticeable, and that’s something that we were able to eliminate. It was a very non-trivial technical task. And special shout-out to Tim Culverhouse on our team. He basically just joined the team. He’s a terminal UI expert. He did a lot of work on Ghostty, that terminal, working with Mitchell Hashimoto and the folks there… And he basically rewrote the entire terminal UI, and it just launched, and it’s really solid, I think.
This date, just so everybody knows - we’re recording on September 3rd. So yesterday was September 2nd. If you can’t do calendar math, I did it for you there… I’m not sure if I noticed that the UI has changed much, but I have noticed the flicker, and I thought it was – like anybody doing this, I don’t think I’ve shut my machine down in a month or so. I’m just afraid to shut it down. I’m kidding, I’ve shut it down since then, because I’ve learned… But for a while there, I was like – because the threads… Going back to this thread, and the transcript - because the things you’re putting into it… As you said, it’s a skill set that you learn. And so I feel like as you get better and better, you want to understand what you did a week ago, or a session ago, or two days ago, or how you framed something, so that you can keep doing a version of that, or build upon it.
And so this history to me is really important, and I’ve been collecting little text files, basically… Because I didn’t realize - or I don’t think everybody has your gumption, where “Hey, these threads, these transcripts are kind of important to spelunk if you’re a week or so past”, or what the true conversation was, or what the user asked, or the… I think you called them – not an agent; the agent is the thing. But what did you call the person? You had a name for the person. Client?
Oh, I usually just say the user, or the human…
The user. Okay, cool. You know, me as the user, I’ve been collecting little prompts and little things that… Nothing major, but just things I’m doing frequently. I’ve come to this – my usage pattern has gotten to this point where I feel like I need to define roles. Similar to how you’d build out a team, I have a Linux platform engineer, or a Go style enforcer who understands really good, idiomatic Go code, and then a style guide for my Go code. And then I have a Go style reviewer, because you were talking about code review before. And so I will give them different roles, and I feel like that works for me. But as part of getting there, it was condensing more and more of what I would kind of keep as this massive prompt, and I would just sort of move that larger prompt for me into a history file of sorts… Where “This is a role. Assume this role. Here’s your task”, and then “Go forth with that new role.” And I feel like that gives it some really good context… Versus some magical prompt I’ve been cargo-culting again and again and again. So I’ve learned a skill set, if you can call it that, to better be an agent babysitter.
[laughs] Yeah, I talked to a lot of users and a lot of people have some variant of that. It’s almost like they’ve built up a library of prompts for specific task types. I think a lot of people have built up some sort of template for listing out – it’s a combination of like “Here’s how to structure your plan.”
[00:32:18.02] First of all, generate a plan at all, if you want to do a longer-running task. Generate a plan, here’s a couple pointers about how to structure that… And then maybe there’s a couple pointers about the codebase as well. We have introduced this thing called agents.md. I think it’s pretty standard now across many coding agents, and now there’s finally a standard. I think more and more people are hopping onto this using the agents.md file as a standard for context. So Amp will consume that. But a lot of people add additional context beyond that, that are maybe not specific to the codebase, but specific to the things that they personally feel they want the agent to do. And so they’ll copy and paste that in to sort of steer the agent to do and behave the way they want it to behave, especially over these longer-running tasks.
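For listeners who haven't seen one, an agents.md file is just a markdown file at the repo root that the agent reads for context before working. The contents below are a made-up example of the kinds of things people put in it, not a prescribed format.

```markdown
# agents.md — context the coding agent reads before working in this repo

## Build & test
- Build: `make build`
- Run tests: `go test ./...`

## Conventions
- Go code follows the style guide in docs/STYLE.md
- Never commit directly to main; open a PR instead

## Gotchas
- The integration tests require Docker to be running
```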
Well, how do you feel about the user experience, the way a user interacts? So you said you interact with a lot of different users who have different ways, and all that good stuff… I’ve been working on a – since I’ve been building my skills to be an agent babysitter, I’ve kind of come to this idea… I’m calling it Agent Flow. It’s an internal name to my own brain. I’m a team of one. I’ve got a team around here at Changelog, but the things I’m working on are just sort of fun tools to solve my own itch, and my own problems. So my interactions are not team-based, so I’m not leveraging the threads and sharing those threads with other people. It’s simply just a team of one doing it.
Yup.
And so I’ve come up with this idea of Agent Flow. I would say the protocol is called Document-Driven Development, and the implementation is called Agent Flow. How do I get agents to understand how I want to flow with it? We vibe and we flow, and so I’m like “Okay, Agent Flow sounds kind of cool.” I don’t know if that’s trademarked or not, but I like it, okay? I’ve kind of given you a glimpse of it where I define roles… I obviously leverage your agent file. I think Claude has its own version of it, and so maybe you can centralize it to just simply [unintelligible 00:34:20.16] But essentially, how do I want my agent to know about my codebase, to know about the things, the tasks I’m giving it? And obviously, which role to take on, which hat to put on, so to speak, and maybe a style guide of sorts that represents “How do we write Go code?”, or “How to write Rust code? What is the blessed, idiomatic way of doing that?”
And then I think the additive to that, that really sort of solidifies Agent Flow for me - and you sort of mentioned it - was “How do you give it its list? How do you give it its direction?” And I just borrowed it from the great language of Python. They have PEPs, Python Enhancement Proposals. And they have a whole system around drafting PEPs.
And so I took the same acronym, PEP, and I just said “This is a Project Enhancement Proposal.” And so for every new thing, every new idea I have, it begins as riffing or vibing as you would, but it’s all about developing this document called a PEP. And the PEPs have numbers… So it could be PEP-0031. Maybe I’ll need 10,000 at some point, but for now I feel like four numerals there is just fine… But that’s the essence, for the most part. It’s like draft PEPs, so that I understand full context. Let’s dream and build and think… And that thing has statuses, it has all the artifacts. That PEP lives in its own directory. The agent can add more things. Maybe it completed something, so it’s a completion report. We completed this PEP, but don’t just update the PEP document. Give me a whole separate document that is about the completion of it; and how did it work. So I can go back and learn and read later on… You even have things like knowledge base. So I have it draft knowledge base articles.
[00:36:20.26] So once we’re done with the PEP and we’ve done X, Y, or Z, not just the completion report inside that PEP, but more like “What did we learn that is now an institutional knowledge for you as an agent, or any other agents, or any of the team members I bring on later on?” Even internal blogs, or what I’m calling builder logs. It’s still blog. It’s B-L-O-G-S, plural, but I’m calling them builder logs. So this is more of an internal free form, but like “Just go and dream what happened here.” And I’ll go back and read those as my agent babysitter hat… And I learn with it how to direct it. And so this entire flow that I’ve sort of learned as a skill set, I’ve been calling Agent Flow. What do you think?
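To make the workflow Adam describes concrete, here is a minimal sketch of a scaffolder for that kind of PEP flow. Everything here is an illustrative assumption, not Adam’s actual tooling: the `admin/peps` directory name, the `proposal.md` filename, and the template fields are hypothetical; only the status values borrowed from Python’s PEP process come from the conversation.

```python
"""Sketch of a scaffolder for a PEP (Project Enhancement Proposal) flow.

Hypothetical layout: each PEP gets its own zero-padded, numbered directory
under admin/peps/, holding the proposal plus later artifacts (completion
report, knowledge-base notes). Statuses borrow from Python's PEP process.
"""
from pathlib import Path

STATUSES = ["Draft", "Accepted", "Final", "Superseded", "Withdrawn"]

TEMPLATE = """\
# PEP-{num:04d}: {title}

Status: Draft

## Context
(riffing notes go here)

## Plan
- [ ] ...
"""

def new_pep(root: Path, title: str) -> Path:
    """Create the next numbered PEP directory and seed its proposal doc."""
    peps = root / "admin" / "peps"
    peps.mkdir(parents=True, exist_ok=True)
    existing = [int(p.name.split("-")[1]) for p in peps.iterdir()
                if p.is_dir() and p.name.startswith("PEP-")]
    num = max(existing, default=0) + 1
    pep_dir = peps / f"PEP-{num:04d}"
    pep_dir.mkdir()
    (pep_dir / "proposal.md").write_text(TEMPLATE.format(num=num, title=title))
    return pep_dir
```

Calling `new_pep(Path("."), "Smarter archive compression")` would create `admin/peps/PEP-0001/proposal.md` in an empty project; completion reports and builder logs would then accumulate as sibling files inside the same numbered directory.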
Yeah, I think that fits with the way that a lot of people use Amp, actually. I think the more sophisticated, intentional users have a very intentional design process, that often starts with the human generating a lot of tokens describing what they want. And for the length and complexity of the task that you want it to handle - there’s kind of a linear correlation between the complexity of the task and how much direction you want to give it. So if I’m doing a thing where I know exactly what I want, and I want the agent to get as far as it can on its own, then I will do things that are very similar to what you’ve described. Like “Hey, I want you to generate a plan. It’s got to have these properties. Make sure that each step is annotated with the relevant context files that you’ll want to address when you’re actually implementing this plan.” And maybe I do a couple iterations on the plan itself. If there’s any course corrections, “Oh, use this technology, not that technology”, or “Use this library, not that library.” And then once that plan is generated, then you have another thread that then goes and says “Okay, go and execute this plan”, or “Do steps one through two of this plan, and then let me review it.”
So that’s a very intentional workflow, that – there’s a certain set of our users that are all about trying to do as much as possible in a fully automated fashion. And the ones that are trying to do that are essentially doing exactly what you’re doing; they’re trying to build this workflow around the human being, the product manager describing at a high level what they want, and the agent essentially filling all the details.
I will say that that is an important set of use cases, and I find myself doing that from time to time… But there are a lot of other workloads that are – I would say, me personally, I run threads using the workflow that you described, relatively frequently, but not necessarily every day. Because oftentimes, especially now, a lot of the work I do is more exploratory. There is a different workflow that ends up being – you get less out of each thread, but you probably end up doing it more frequently, because it ends up being less mental energy upfront, and also a better fit for use cases where you’re in more like exploration mode, or you’re trying to learn alongside of what the agent is doing. So there’s tons of development tasks where I myself only have a half-baked idea of what I’m looking for. It’s like “Oh, maybe a sub-agent that uses a different search tool would be interesting.” But do I have a clear, exact vision for what that should look like? No. “I want you to first do a spike.” In that case, it’s much more informal.
[00:40:10.04] I’d say if you look at most of my threads, most of them don’t involve a specific plan generation step. It’s more like “Hey, where’s the part of the codebase that pertains to this, and can you give me an explanation of how it currently works?” And then I have this idea… “Help me think through a couple possibilities of how I would go about implementing this, and what frameworks I’d use.” Okay, great. “Can you do a quick spike? Don’t implement everything yet, but just these endpoints, because I want to play around with it.” And so it’s this ping-pong back and forth between me and the agent, where part of the side effect of getting it to do stuff is that I’m learning and acquiring domain knowledge alongside of it.
So yeah, most of my workflows are probably somewhere between those two points along the spectrum. It’s either very casual, still thoughtful prompts, but not following a specific structure… All the way to “I know exactly what I want, because this is very similar to something I’ve done before”, or the feature is very clear in my mind, in which case I try to front-load more context and see how far I can get it without any intervention or correspondence from me.
Break: [00:41:19.29]
I think what you just described too is the exact way I work as well… And the way I would change it, or maybe even suggest to you - and maybe you do this - is what I call riffing. I’ll riff for a little bit with it. I’ll explore. I might ask a question. I might ask it to help me think through some things. And there’s some back and forth on thinking that I just call riffing, and everybody calls that riffing, right? But at some point, there’s some clarity that begins to form. Sometimes it even begins to do some of the work and I’m like “Okay, I kind of like that. But hang on, pause. Don’t do that. Now that we have a little bit of clarity, draft this PEP.” That way, now all this riffing has a place to put things. So it has a drawer or a folder to put stuff into. Sometimes if I even have a half-baked idea, I will still – like, if I’m in flow and I’m thinking through something, and it’s working, and it comes back, I might just say “Quick idea - throw this into a PEP, but let’s continue.” That way it’s sort of like a to-do list, and I’ll come back and I’ll review what I’m calling my PEPs… And it becomes this place where eventually it has a home, and these PEPs have statuses… So I literally just borrowed Python’s statuses. You have final, you’ve got superseded, you’ve got all these different flavors of statuses… And I’m like “We’ve already thought through this all. Let’s just borrow their ways”, right?
I love that, yeah.
And so I just kind of keep these PEPs as this constant - I don’t want to call it a junk drawer necessarily, but it’s a place to put something. I’ll riff for a bit, I’ll explore for a bit, but the moment there’s clarity or any sort of visibility, I want the model to kind of put all the context we’ve gathered into this thing. I don’t even go back and read it right away, because I just hope, I suppose hope; and maybe this is where I’m rubbing the lamp a little bit, hoping the genie does it, is put this idea there with enough fruition to put the right things there. So when I come back to it, I can go deeper, and we can turn this into a real deliverable, or real spec. I treat that as like that.
Yeah. Do you commit those to the Git repository, or are they in some temporary folder?
I mean, right now it’s not open source. I’ve wondered about that. Like, “Should all this be there?” And I feel like maybe as one of the earliest humans living in this almost three-year-old world, where this new phenomenon is on my command line, I almost feel a requirement to future humanity to commit this to memory in a way, you know? So I’m not really sure. I’m in the blurred line there whether it should or shouldn’t. But I think from my use case, it makes sense to have it all in the same repository, to have it all in the same place.
My agent flow, the way I described it at least, puts all this stuff not in the root, but in a folder called admin. One, I want it to be at the top of the stack when I look at my – I still use Zed, I still use a code editor, but it’s mostly a code viewer for me. I’m editing some things… I’m not doing a lot of coding in there, but I still do some stuff, but it’s largely pushed to the model, to the agent. I put all that in an admin folder. So you’ve got admin/pep, admin/knowledge, admin/builder logs…
Gotcha.
…all these different things there. Or even the role files. Those all live in the admin folder, which is sort of like me as the product owner, CEO of this directory, that is now a project, that’s how I look at it. This is all admin stuff, and I put it there.
Yup, makes sense. I like that. And I like having - it’s almost like an audit log for how the code was generated.
Absolutely.
And I feel like it might be an area of untapped potential there, because I think one of the things that would be helpful, especially if you’re going back and reading it later, or if someone needs to review that code, is to see the audit trail of “Oh, okay, this was the plan that generated this code, and this is what the high-level intention was.” If anything, that’s almost – if you could snap your fingers and have that included as part of every pull request, you almost want to do that, because it gives you just…
Yup.
[00:48:11.10] I feel like a lot of time in code review is often spent trying to piece together what the high-level intention was, especially if the person on the other end underspecified what they’re trying to do. And so you almost have to play archaeologist a little bit, trying to piece together what the high-level intent was and where to start. And if you have access to the plan, then it gives you a pointer right into what parts of the code you should read first.
Well, I was moving so fast, I was like “I’ve got to capture this exhaust. I don’t want to slow down and read it, but I want it to get created.” So when I can slow down and think about some of these things, it’s there. So the way I looked at these builder logs, essentially, was that they were the what and why. The long-term knowledge, the institutional knowledge went into a knowledge base thing. And they may be connected, but the intention is different.
In a builder log, I kind of want the agent, the developer, to tell me what they encountered along this journey. What thorns did they get stuck on? Where did they get cut? Where did they apply a band-aid? How did they apply whatever, and what were they trying to do? What were the blockers? What were the learnings in there? And some of that stems back into maybe a knowledge base article.
The knowledge base article is more like “how to use it, why we use it this way” kind of thing, whereas the builder log is more the journey of the developer during implementation. What are all the things that you encountered there? And honestly, it’s been paramount, I think, to – I don’t want people to think that I’m sitting here, literally glued to my screen, watching this thing constantly… This flow lets me become a babysitter. It lets me hang with my kids, while doing this in the background. My computer’s over there, I’ll go check on it every 10 or 15 minutes… Push yes, push no, commit that to memory kind of thing… I could be making dinner out in the backyard, barbecuing, whatever, and still babysit this thing with this kind of flow.
So it sort of allowed me the ability to, when I want to keep making progress - because I feel like this stuff is just so… I don’t want to call it addictive, but tantalizing. I’ve never been able to do this kind of thing at this level of X, 10X, 20X productivity level, ever in my life, right?
Yeah, it’s absolutely incredible. And I think one of the first wow moments or points of realization for me with Amp, too - this was very early on. It was back in probably April, or something. I had one of these weekend experiences where I was taking care of our kid… We have a two-year-old at home. So he’s running around, he’s an agent of chaos, and so it’s very difficult to be in the same room with him and do anything for longer than two minutes at a time. And I was literally running back and forth between playing with him and trying to get the spike of this feature up and running. And I ended up getting to a point where I was like “Oh, this is actually good, and I’m going to push this, over the course of a Sunday.” And without the benefit of this tool, it would not have happened. Because in order to write anything non-trivial, you need that mental space to get in the flow, and to think through the things, and page in all the context… And every interruption resets that.
Whereas with the agent, if the thing that you’re doing is within the capability of the agent, and you’re able to remain at the high level, you can absolutely do this async thing where it’s like “Okay, let me tell you what I want you to do. Let me give you some detail. Let me maybe give you some pointers for how to validate it and verify that you’re doing the right thing…” And then I’ll fire it, and then I’ll go and do something else, and then when the – Amp has a built-in notification sound; it kind of pings you. It’s a similar sound.
[00:52:11.16] I love that sound.
And it’s almost Pavlovian now, where I will hear it and I’m like “Oh, it’s time to go and check on what the status of that thread was”, and we’ll go see what the agent hath wrought and see if it’s good or if it needs more adjustment.
Yeah. I’ve found that Amp is the one that I can most easily set loose on a task. And I told you, I’ve described my PEP ways, and so everything that is long-running is not me prompting it necessarily. It is this riffing to a PEP that has clarity, it’s got conviction, it’s got pros and cons… It’s got whatever context is necessary. Even if my team was formed with humans, it’s human readable. It’s English. It’s not some random language that is hard to understand. It is just the normal way to do it. I find that that methodology to me lets me not have to keep the spaghetti together in my brain. It all kind of falls there, and it doesn’t really matter what it is, it’s just in that flow and it just sort of works. I like that.
Yeah, I appreciate you saying that. I don’t know if we have – we definitely do not have one weird trick that makes it work better in certain cases than others… All I can say is that we are very heavy dogfooders of our own product. Everyone on the team is using Amp to write probably north of 80% to 90% of the code that they generate. And as part of that, there’s just a lot of iteration that we do on the prompts, on the tool definitions, on the way that sub-agents combine, and I think that has compounded over time - over every day and week of development usage - and led to an experience where it is able to get farther than some of the other coding agents out there on these long-running tasks.
One thing I’ve done recently - and we can keep going deeper on how we use it if you’d like to; I’m having fun with this - is… I told you about the roles. And so rather than just define a good Go developer role, let’s just call it, or a “I need to deploy to Linux, so I need a Linux platform engineer kind of role.”
Yeah.
I got into this place where I’m like “You know what, I’m just tired of repeating the same things.” It kind of got a little boring. So I was like “Well, let me create a product manager role.” And so now I’ve taught my product manager what I think I know, at least the way to prompt it… And it had become, at least in a long-running PEP, where we had a clear goal, it had multiple phases… And again, I’m not trying to be glued to my screen. I’m not trying to be stuck to this black mirror. I’m trying to walk away and do my life, or move to a different tab and keep doing my real job, which is not writing software at all. My job is not writing software necessarily, it is talking about software. Or defining relationships, or nurturing relationships, or all the things I do.
And so when I had defined this product manager role, especially in this long-running PEP, I would have the product manager review the output from the other agents. So I had two agents going at once…
Cool, yeah.
…in this case one of them was Amp and the other one was Claude Code, because that one’s cheaper for me. You’re really expensive, by the way.
Well, we do the usage-based pricing thing…
I know, I know… But it’s still – it’s costly. We can talk about that. I think it’s costly, and that’s okay. I’m not faulting you for it, but Amp is, I think - I’ll pause here and just say I think Amp’s one of the best, if not the best agent out there, and I think your cost is justified; I just don’t like it. [laughter]
[00:56:12.13] We wish we could offer intelligence at a cheaper rate. I think certain things are just dependent on the models, where they are, what size of model you need for a certain level of intelligence, and the cost of GPU inference. One of the things that we haven’t done, that others have done, is price Amp at below cost. And that’s for two reasons. One is, we are in this for the long term. We’ve been building developer tools for more than the past decade, and so we want to build a sustainable business around this. And I think that a lot of the other tools that – I think there’s this notion of “Hey, if we can sell a dollar’s worth of tokens for 70, 80 cents to grab market share, and that somehow that will mean that we can lock in users to our coding agent…” I very much don’t think that’s the case. Coding agents have very little lock-in. It’s very easy to switch and try a new tool.
It’s so easy, yeah.
So I don’t think it’s a good business decision for us. And two, it also introduces a perverse incentive. So you brought up model quality degradation earlier in the conversation… And one of the things that we never want to be in a position of having to do is nerfing the models that we’re using because we need to keep the costs under whatever flat rate price that most of our users are paying. So I think that – look, you’re right. All things considered, cheaper is better. I definitely feel the same pain as a user. But at the end of the day, I think we’re making the right trade-off in terms of what’s sustainable and what incentivizes us to build the best quality, and I think the real trade-off here is not whether you’re paying X or Y dollars for the coding agent, it’s how much time you’re saving as a human, and what more you can build. And if that’s your barometer of comparison, then the difference in cost of agents is really a rounding error in terms of the time that you’re saving and the additional value that whatever you’re building can bring… Provided what you’re building is something that you can put an economic price tag on.
Yeah, I think my frame of reference - and maybe even me justifying my comment… So if I was a business who employed multiple people, with multiple salaries, and I can augment them or enhance them - then yeah, in that case… I’m already paying large salaries, and so this is a nominal additive to that existing metric. In my case, I’m just tinkering. So it’s expensive for tinkering. They’re not open source yet, but I plan to open-source the tooling. I’m just literally just trying to play. I’m trying to explore, trying to learn… I wholeheartedly believe what you said earlier: that playing with command line-level agents like Amp, like Claude Code, like Codex etc. is a skill set to be learned and flexed. If you throw it five words because you think it knows everything, you will get a website that doesn’t do what you think it should do. It won’t be using the language that you should be using. It won’t have the deployment target you think it should have. You need to be specific, just like you would with any other team member. The context is still – and I’ve actually learned more about how I think I could be a better human by how I interact with these machines… And I’m just gonna call them machines, because that’s just the simplest term to use.
Yeah, yeah. It’s what they are.
[01:00:00.13] Because you realize how much – I’ve always harped on the idea of expectation and clarity. I can’t fault you, Beyang, for not doing X if I haven’t described X to you. I have to describe X to you, so that you have clarity on what I expect from you. And if we have an agreement on that clarity, my expectation is for you to understand that agreement and come back with that expectation of sorts. But if I haven’t given you that, how can I expect you to come back with any version of what I want if the clarity hasn’t been met?
So that’s how I’ve been, I think, a better human in my life, is having the understanding. But I think it’s gotten even more clear to me how important that is, how important context is… Because, just like with my frame of reference here - I called you too expensive, and naturally you got defensive… It’s good, you should do that. You defended my fact, that is a fact to me. But then I clarified my frame of reference, which is that I’m not already paying developers. And so this is a cost center for me, because I’m just trying to play and tinker. And so I feel like – so just to give that there.
I think it’s taught me how important context is in all relationships, whether machine or not. In all aspects, how important is clarity. If something is not clear, how can you come back to me with frustration, anger, disappointment, if the expectation of what you wanted from me was not clear, and we didn’t have some baseline of an agreement? That’s what you’re asking these machines to do, right? You ask them to agree, to make you productive… I mean, theoretically agree, by pushing Amp into your command line terminal, or whatever… That’s the agreement, right?
Yeah.
Yeah, I’ll kind of pause there, because I just feel like we kind of went on a little bit there… But the frame of reference really is, I think, that Amp is – the way I would leverage it, the way I think of Amp in my own workflows, I don’t have the cash flow or the cash to make you my daily driver. But what I can do is when I know I have a longer-running, really important thing that requires deep, very good execution, I will draft a very clear PEP, as I’ve already described to you, and I will set Amp loose on it for a little bit. Not the whole thing, or I’ll define it very clearly. That’s where I’m leveraging Amp personally, because of that reason, of those reasons.
Yeah. Well, first of all, I’m curious as to how you’re prompting Amp, because – so it’s one thing if you’re using Amp heavily and you’re coding a full workday, every day of the week. In that case, if I look at my usage, I’m probably racking up maybe on the order of low hundreds of dollars per month… But if I look at just for my weekend side projects, for me at least, it probably comes in under $100. In some cases, if it’s something simple, it’s a couple dollars here and there. For the price of a coffee, I could get a mini app that scratches an itch.
And so for me – yeah, if you use it heavily, at some point the token cost will exceed whatever flat rate folks charge… But in terms of what you’re getting out of it, I feel – like, I’m using it. I use Amp on my personal credit card for personal side projects, and it’s not really an issue for me. For the price of eating out - that probably covers a month’s worth of personal usage for me.
For sure.
[01:03:48.06] But I do see some users – I will say this… The way that people prompt and the behavior patterns around when people start new threads is very different. And we almost want to put out a guide for how to use tokens more efficiently. Here’s one thing I’ve noticed. If you talk to very senior –
Yeah, I don’t know that.
If you’re talking to very senior engineers on our team, you look at their threads, most of their threads err on the side of being very targeted, and short and sweet. Not trivially short. Not like one message and then you’re done. But it’s like “I want to do a very specific thing. Go and do it.” Okay, on to the next one, new thread, because it’s a new task. Or you implemented step one of the plan, now you’re done. Step two - new thread.
What I’ve noticed among the people coming from non-technical backgrounds - so we have a ton of people on our go-to-market team, for instance, who are building side projects and games with Amp. I look at their threads, and you’ve got like 200 messages. It’s a thread that’s 200 messages long. You fill up the context window. Now, the underlying model supports a million tokens of context… Each incremental – yeah, we have prompt cache and all that, so it keeps the cost low to a point… But if you fill the context up to the point where you’re occupying hundreds of thousands of tokens, first of all, each incremental request, you’re not going to get the highest quality, because there’s a lot of stuff in the context window that could confuse the model… And you’re paying the cost of consuming as input the entire previous 100 messages in the thread. And at that point, you almost want to ping people and be like “Hey, I know that this may be the thing that feels most natural to you, but I would actually recommend –” You should treat threads as these one-and-done rip-off notes. Rip them off frequently, rather than do the whole – you don’t need to build a whole app inside a single thread. In fact, I would probably recommend against doing that, because you will get lower quality, higher latency, and more cost if you do that.
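Beyang’s point about paying to re-send the whole thread lends itself to a back-of-envelope calculation. This sketch uses made-up numbers (500 tokens per message, no prompt caching, which Amp does actually have) and a deliberately naive model where each request re-sends every prior message as input:

```python
# Illustrative arithmetic only: if every request re-sends all prior context
# as input, total input tokens grow roughly quadratically with thread length.
# 500 tokens/message is an assumed figure; prompt caching is ignored here.
def thread_input_tokens(messages: int, tokens_per_message: int = 500) -> int:
    """Total input tokens if request i re-sends all i messages so far."""
    return sum(i * tokens_per_message for i in range(1, messages + 1))

one_long_thread = thread_input_tokens(200)       # one 200-message mega-thread
twenty_short = 20 * thread_input_tokens(10)      # same work as 20 short threads

print(one_long_thread)   # 10_050_000
print(twenty_short)      # 550_000
```

Under these assumptions the mega-thread consumes roughly 18x the input tokens of twenty rip-off-note threads covering the same number of messages, which is the cost (and quality) argument for starting fresh threads per task.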
Yeah, I would concur, emphatically. Please teach people how to be more efficient… Because as you described before, it’s a skill set you learn. So I’m learning. And the whole even reason why I’m even playing with it is not because I’m trying to really build a bunch of software… I’m just trying to tinker. One particular itch I’m scratching is here at Changelog we obviously produce video podcasts. It’s video first, it’s on YouTube… The weight of media is very heavy. When we were audio only, it was maybe a four to six-gig project. When we moved to video, there were easily 15, 20, 25 gigs for every episode as the norm.
And then because we have a desire to keep the long life of these projects, we have a full system to do all this. And this goes back - I mean, more than a decade we’ve been doing this practice, an iterative version of what we’re doing today.
Every episode has its own directory, it’s got its own workflows and statuses, etc. But when the show’s done, by and large, it’s done forever, for the most part. Unless we go back to it and remaster it, or unless we go back to it and reference content within it, which then we want to pull that original source and then extract it, as a tool would let us, like Adobe Premiere or Adobe Audition… Those are the two tools we use in our kind of primary workflow.
[01:07:39.08] So I use a technology called 7Z. It’s a pretty well-known archiving tool. It’s an algorithmic format that I’ve been using. It generally shrinks things that we do around half. So if it’s a 20, 25-gig thing, it’s in the 10 to 12-gig artifact long-term. And I will compress the entire thing.
For a while there, this thing was just a simple Bash script, to be a smarter thing on top of 7Z… Because I forgot how to use – I wasn’t going to keep going back to the documentation of 7Z on how to use it. And so I simplified it by writing a Bash script, like any hacker would, right? And then over time, that Bash script got more and more sophisticated, and then about maybe five weeks ago or so, we were on a podcast talking about Claude Code… And I was like “You know what? I haven’t played with that enough yet.” I’ve been largely in this hype cycle, not really playing with it much… And I thought, “Okay, let me open up Claude right now”, and just… “Where can I set it free, real quick, in this call?” And I said, “Just improve this Bash script.” And in the moment, that’s all I said. Like four or five words. It was not “Make this amazing. Here’s the CLI. Here’s all the flags I want you to implement…” None of that. It was just “Improve it. Show me how I can improve this thing.” And I was just flabbergasted with how much improvement there was to it. And I was like “Oh my gosh, I’ve been missing out. I have got to go as deep as possible on this thing.” Because this is a crucial tool for me. I use it on the daily, if not several times a day. I’m archiving, moving things for us and whatnot.
And I don’t like to spend a lot of money on a hard drive, so I want to compress and store forever. So we have a TrueNAS server here on our LAN, all that goes over there, we have this whole entire flow… And so this Bash script really has been my itch. And so I’ve now developed a tool called 7zarch - pronounce it however you like. And it is now my brains on top of 7z. It is sophisticated enough for me to point it at a directory, examine it, know its media, and set the compression algorithm to be best fit for what’s in there. Nothing like that was in my Bash script.
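The kind of heuristic Adam describes could look something like the following sketch. This is a guess at the idea, not the actual 7zarch tool: the extension list, the 50% threshold, and the function names are all assumptions. The only real CLI surface used is 7-Zip’s `a` (add) command and its `-mx=N` compression-level switch.

```python
"""Sketch of a content-aware wrapper over the 7z CLI: inspect a directory
and pick a compression level suited to what's inside. Hypothetical logic,
not the real 7zarch; only `7z a -mx=N` is a real 7-Zip invocation."""
import subprocess
from pathlib import Path

# Already-compressed media barely shrinks, so don't burn CPU on it.
PRECOMPRESSED = {".mp4", ".mov", ".mkv", ".jpg", ".png", ".aac", ".mp3", ".zip"}

def pick_level(directory: Path) -> int:
    """Return a 7z -mx level: 1 (fast) if mostly pre-compressed media, else 9."""
    files = [p for p in directory.rglob("*") if p.is_file()]
    if not files:
        return 9
    compressed = sum(p.suffix.lower() in PRECOMPRESSED for p in files)
    return 1 if compressed / len(files) > 0.5 else 9

def archive(directory: Path, out: Path) -> None:
    """Shell out to 7z with the chosen level (7z must be on PATH)."""
    level = pick_level(directory)
    subprocess.run(["7z", "a", f"-mx={level}", str(out), str(directory)],
                   check=True)
```

A project full of `.mp4` masters would get `-mx=1` (fast, since the video codec already did the compressing), while a directory of raw `.wav` audio and text would get `-mx=9`, which matches the roughly-half shrinkage Adam mentions for his mixed projects.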
And so because of these agents and what I’m doing, I’m able to now make that thing what it is. And so those are the things I’m using it for. It’s not like I’m an everyday software developer. It’s those kinds of things.
Back to “Help me be more efficient”, my gosh… Yes, I’ve shared with you my exhaustive workflow. I think I am using way too many tokens. I’m definitely not treating threads, especially with Amp, like throwaways. Because in my experience, when the context window collapses, the thing forgets everything. And so I thought, “As a user, I need to keep this context window as clear as what I’m working on.” Because if not, it forgets. It doesn’t know how to be what I’ve asked it to be for this task, and I kind of get lost in it. It doesn’t know what to do.
Yeah. Shoot me some of your threads. I think one of the cool things about Amp is you can share your threads with other people. Oftentimes we’ll get users sharing their threads and saying “Hey, this is how I built this.” Or “Hey, can you help me improve and understand what I could do better?” I would love to take a look at how you’re using it and maybe give you best practice tips from what we’ve learned.
For sure.
I will say, my intuitions are almost inverse from yours. If anything, starting fresh is great, because you know the context window is clean. The more things that you accrue in the context over time, the more possibility for confusing the model there is. And so the quality – if anything, back when the context window limit for Claude Sonnet was around 200k, I think one of the things we noticed was there was severe… You started to notice a little bit of degradation actually around 70k, and then a much steeper degradation past 120k tokens. It was always one of those things where it was vibe-based. So we never built that directly into the product, saying “Don’t go past 120k.” Because sometimes there’s a legitimate use case that does involve taking advantage of a lot of the previous context… So it’s a matter of user tastes. And plus, we’re all still learning in real time, so we didn’t want to be too prescriptive.
[01:12:20.13] But I think maybe now we’re at the point where we’ve seen enough where we will be offering some more visual indicators around how much of the context has been used, and where the sweet spot starts to end for some of these models.
Now it’s like you can go up to a million… And I think the quality has gotten better beyond 70k now, but it’s still to the point where, all things considered, I prefer to rip the note off and start a fresh thing. And it’s almost like the context that you want it to remember hopefully is captured within the files that it’s modified. So it’s like “Make this edit to the code”, now it does this thing. Okay, now let’s start a new thread, and it now does the thing that I just asked it to do, but now make this other modification.
It’s almost as a human, when you git commit, you’re kind of saying “Okay, I’m committing this, this is one atomic change, and now let me start from a clean slate to do the next incremental thing, where I don’t have to worry about all the other stuff, all of the other changes that I made to do the previous thing.” It’s sort of mentally clearer to have this separation. I think the rough analogy applies to agents as well, where if you want to be clear and precise with what you’re doing, it’s almost better to have this habit of starting new threads for each isolated or targeted task that you want to accomplish.
Yeah, this is definitely something I think needs to be more clear, because obviously for me it was not… And here’s me calling you on the podcast, telling you you’re too expensive, and here’s me, a user who does not understand how to use your thing… And so it’s not really expensive, I was using it in an expensive way. I was using it inefficiently.
Yeah. And I would say, first of all, we’re building the product… We need to build a product experience that makes sense to users. And so it’s never the user’s fault.
Well, this is definitely your fault. I would say this is definitely your fault. I was joking around.
Yeah, it’s definitely our fault. At the same time, the tension that we notice is that - again, coding with agents is a high-ceiling skill. And so there are legitimate use cases that do involve filling up the context window. And so what we don’t want to do is we don’t want to be overly prescriptive. Because again, the persona that we have in mind when we’re building Amp is it’s really ourselves as a proxy for a professional engineer whose job it is to code every day, and use this stuff frequently as the main way to generate the code they produce.
And there is this tension where the more things that you build into the agent, the more things you build into the UI that are prescriptive, like “You should do it this way, or that way”, it runs in direct tension with the desire to create a power tool that lets people use it in the way that they want to use it. So it’s kind of this constant struggle where we do see a good amount of people using it in ways that we don’t use it ourselves internally. Some of them are weekend vibe coders, and they’re getting value out of it in the way they’re using it. It’s not the way that I would use it…
I think we’ll put out some blog posts to say “Here are the general best practices that we consider are good.” But we also don’t want to overly constrain people, because one, there might be a power user out there that’s getting value out of it in a way that we don’t realize. And two, this is all still something that we’re learning in real time.
Yes…
[01:16:17.10] The era of coding agents - I think we’re six, seven months into it? And so the last thing we want to do is – you know, guardrails are good, but guardrails can also yield blind spots as the technology evolves. And the last thing we want to do is obstruct the view of our user base and ourselves from unlocking a usage pattern that is really powerful, that you don’t see because the product is telling you to do something that is my personal best practice, but may not generalize to all the different users and codebases out there.
Yeah. Well, I think as a response to that, one thing I would suggest is definitely don’t change the way Amp works. I would say, if anything, add maybe a slash command that is not so much buried, but enabled if a user wants to go and read documentation, and do an alternate version. So I would keep things the way they are, but I would definitely educate on how the context window works. Because that is something - to me, I can assume; as a smart person, I can begin to assume, but I have no idea. And as we just described, we’re all learning these things, and this is a new, evolving thing, so we’re sort of all learning on the fly. This is definitely a skill set to be developed, but I have no idea how the context window works. I don’t know how to leverage it. So when you just said “There are times whenever filling the context window makes sense”, I don’t know when that makes sense. I don’t know when it makes sense to thread and rip it off and start new. I don’t even know how to create new threads. I don’t even know, aside from maybe back at the literal command line, not inside of Amp… And Claude has this too, where you can resume, and you can continue and stuff like this. So this is not an Amp-only thing, this is kind of an agent-level, common thing. Maybe threads are something unique to you, and the way you can share them and stuff like that… But I’m not ripping off my threads and starting new. I don’t even know how to create them… I do know how to list them. I do know how to see the ID of them. I assume I have to go and copy and paste it and continue it if I want to… So there’s no CLI or TUI for perusing my threads and doing things, at least in my command line. So this is just early days for you. I’m sure you’ll build that eventually in there, but…
Totally.
…that context window and how to leverage threads is black box to me, personally.
Yeah. No, I totally get that, and it’s something that we’ve been actively thinking about for a while now. I think we’re one of the first to ship the context window meter… So just like showing you how much of the context window has been filled up; that’s an important indicator. I think now I’ve seen it pop up in a couple of other coding agents. But that to us is one signal that’s useful to look at… Because the higher that percentage goes, the more costly each additional message becomes. You take a latency hit as well, and you start to see quality degradation at some of these key fall-off points.
Right.
I will say for the folks listening, my personal best practice for using agents is, again – the analogy that I draw when explaining this to people is the context window is sort of analogous to the human brain’s working memory.
It makes sense now.
[01:19:48.08] So there’s the old adage, you can’t think of more than five, or seven… I forget what the exact number is, but there’s only a certain amount of distinct concepts that you can have in your brain’s kind of L1 cache or working state, that are immediately, instantaneously accessible at a given time. And the context window is essentially that, but for LLMs. So the more you try to cram in there, at some point it’s got to pick and choose which pieces of that it’s actually going to pay attention to, and then it has to effectively ignore the rest. The more you cram in there, the more chance that it will become confused, because the salient piece in the prior thread history is not actually being attended to when it responds to the next request from the user.
And this is not something – it’s not just long-running threads where this is an issue, it’s also an issue with things like MCP servers. MCP is very hot right now, and I think it’s great that this protocol – I mean, we partnered with Anthropic on the initial kind of versions of this. I think it’s a great protocol, and it’s great that it’s gotten basically everyone in the world thinking about “Hey, how can I – any service that exists, how can I build an MCP server to provide tools and context to large language models?” That aspect of it is awesome. But I think that there is this kind of learning curve you have when it comes to MCP servers, where a lot of the MCP servers in existence now weren’t written intentionally, and they weren’t written for a specific application. As a consequence, a lot of the tools that they inject are largely irrelevant to the task at hand. And each additional tool definition becomes a piece of context that gets placed in the context window. So the last thing you want to do is – you don’t want to turn on a dozen MCP servers, each of which injects a dozen or two tool definitions into the context window. Now all of a sudden you’re talking about 100, 200 tool definitions that off the bat you’re paying in terms of token cost, latency, and degraded quality if they’re not relevant to the thing that you’re trying to do.
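To make the scale of that overhead concrete, here is a back-of-the-envelope sketch. The ~4 characters-per-token ratio and the per-definition size are assumptions for illustration, not measurements of any real MCP server:

```python
# Back-of-the-envelope estimate of token overhead from MCP tool definitions.
# The chars-per-token ratio and definition size are rough assumptions.

def estimate_tool_tokens(num_servers: int, tools_per_server: int,
                         chars_per_definition: int = 1_200) -> int:
    """Estimate tokens spent on tool definitions before any user message."""
    total_chars = num_servers * tools_per_server * chars_per_definition
    return total_chars // 4  # rough chars-to-tokens heuristic

# A dozen servers, each injecting ~18 tools: over 200 definitions,
# consuming a meaningful slice of a 200k-token context window off the bat.
overhead = estimate_tool_tokens(num_servers=12, tools_per_server=18)
print(f"~{overhead:,} tokens consumed before the first prompt")
```

Even under these rough assumptions, the overhead lands in the tens of thousands of tokens, which is why turning on every available MCP server is costly in exactly the ways described: token cost, latency, and degraded quality.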
Yeah. Babysitting is getting harder, I’m telling you. I mean, this is all leaning back to – I mean, it makes me rethink my agent flow model. Not in terms of changing it drastically, but more like “Okay, I have to be more efficient with prompts.” I have to be more efficient with maybe a role definition for whatever the role might be, because it seems like that’s kind of token-heavy, and now it seems to be more clear that this context… I literally had no idea how the context window worked, beyond knowing what it does, until this conversation. And I didn’t think about it like that, that the more you have in your prefrontal cortex, or that L1 cache that you mentioned - yeah, you’re going to be confused. They call it focus for a reason, right?
Yeah.
Take your to-do list, pull one off, focus on that. So that’s analogous to thread, right? A new thread, focus, context, which role - in my case or in somebody else’s case - which PEP, execute. Done. Rip it off, new thread, clear context. To me now that makes a lot more sense. I would love it if you can educate the world more so on this context window and how it works… Because I’ve spent too much money with you.
[laughs] Yeah, we’ll put out more material. Actually, just as you were saying that, another analogy that comes to mind… You know, beyond the human brain analogy, there’s also an analogy that can be drawn between threads and functions.
So if you’re a developer and you’re writing code, you know it’s an antipattern to have a thousand-line-long main function. Or a 10,000-line-long main function. Because there’s just too much to think about. There’s too many ways in which those pieces could interact. So what do you do? You compose that long main function into various sub-functions, each of which, hopefully, is no more than a dozen or a couple dozen lines long. That helps you encapsulate things so that you can reason about it. I think the thread is essentially the agentic analog to the function call.
[01:24:09.13] And so you can make agents do very powerful things with relatively short threads, using the flow that you describe. If you have a planning thread that generates a plan - well, to generate the plan often doesn’t require filling up the entire context window to generate a high-quality plan. But once you have the plan, you don’t have to use the same thread that generated the plan to go and implement the plan. The plan is an artifact that you can then feed in as an argument to another thread to say “Okay, go implement step one or two or three of the plan.” And that thread doesn’t have to know how the plan was written. All they need is the plan. Because the point of the plan is that once you have the plan, that’s what you’re following.
It stands alone, yeah.
Yeah, exactly. So maybe that’s another analogy that could resonate with a developer audience.
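The thread-as-function flow described above can be sketched like this. Note that `run_thread` here is a hypothetical stand-in for starting a coding agent with a clean context window — it is not a real Amp API, just an illustration of the shape of the workflow:

```python
# Sketch of the "thread as function" idea: the plan is an artifact passed
# between fresh threads. `run_thread` is a hypothetical placeholder for
# invoking an agent with a clean context -- not a real Amp API.

def run_thread(prompt: str) -> str:
    """Placeholder: each call represents a brand-new thread (clean context)."""
    return f"[agent output for: {prompt[:40]}...]"

def plan_then_implement(task: str, num_steps: int = 3) -> list[str]:
    # Planning thread: produces the plan; nothing else from it survives.
    plan = run_thread(f"Write a step-by-step plan to: {task}")
    results = []
    for step in range(1, num_steps + 1):
        # Each implementation thread sees only the plan, not how it was made.
        results.append(
            run_thread(f"Given this plan:\n{plan}\nImplement step {step}.")
        )
    return results

outputs = plan_then_implement("add dark mode to the settings page")
```

The plan plays the role of a function argument: the implementation threads don't need the planning thread's context, only its output, which keeps each thread's context window short and focused.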
For sure, for sure. Well, I’ve enjoyed that part of the conversation. I know that while I didn’t plan to actually go this deep on how to actually babysit an agent, maybe we could talk about what it takes to raise an agent. I love this podcast y’all did. I think, if I understand too the way Amp came about was – I don’t want to call it accidental, because I think you are very, very deliberate with what you do… But it seems like, in my understanding - and you can clarify this - Quinn, your co-founder and CEO had this idea, and I think Thorsten Ball, a friend of ours as well, over a weekend put it together… I’m loosely paraphrasing what I thought was the inception of Amp…
Yeah.
What has it been like to go from there, if that part of the story is true, to raising an agent, building a podcast around it? How has this new journey for you all been? Because I feel like for me, something I’ve been preaching to a lot of the different brands we work with is that - you know, I love it that you sponsor our podcast. That’s amazing. Don’t stop doing that, obviously, because that’s what sustains our business. That’s what makes this show even stay here right now. But I think that you all should have your own channel. You should be creating your own content. And that’s kind of what you’ve done with raising an agent. Can you talk about inception of Amp, and raising an agent, what you all have been doing around media and creation and software and core team, etc?
Yeah, so the inception of Amp is basically as you described it. Thorsten rejoined the team, and he came in and was trying to figure out what to work on… And I think we’d been talking about wanting to do more agentic stuff for a while. I think the first week he was back, basically I told him “Hey, can you go and take some of these principles and try to make Cody do more agentic things?” And then he did that for a week, and basically came to the conclusion which I shared at the very beginning, which is there’s so many design constraints that assume certain things about the model that it feels like you’re going against the grain of what everything else in the application wanted to do.
So essentially, he and Quinn went off and basically spun this up. It was like “Let’s just do a spike and see where it goes.” And that first iteration - I think he’s got insanely great taste and great intuition for how to build quality software, that wraps new technology in a way that is tasteful, but unleashes the power of the underlying technology while keeping the user experience really good… And then it just sort of compounded from there. He built that, and folks started trying it out internally, I tried it out internally… I was like “Wow, this is really powerful.” That first moment where it’s like “Hey, go and change the background color of this random component.” And then it goes and finds the exact component, and makes it, and you didn’t do anything past that… It was one of those “Holy crap” moments.
[01:28:06.01] And what we quickly realized is how many of the things that we had to kind of rethink from first principles. I think any time a new, amazing, disruptive technology comes online, you really have to force yourself to think from first principles. Because a lot of the rules of thumb and best practices that you’ve kind of imbibed over the years were based on assumptions that may or may not hold true any longer. And so part of that was “Okay, we’ve got to go back to first principles and learn what the models are capable of, and what they can do within an application scaffold that we’re building in real time. We’re going to have to relearn this in real time. How can we do that in a way that shares those learnings with people using what we’re building, and also engages people in a way that gets them to share feedback back to us?” Because we don’t have a monopoly on insights. I think everyone recognizes that we’re all kind of figuring out what agents can do in real time… And so we’ve learned a ton from our user community in terms of how people are using it.
The workflow that you just described, the agent workflow - the agent flow workflow - that’s something that we, or I at least, heard from some of our users for the first time. So that’s not something that I came up with on my own. It’s some of the users who were using it heavily reached out and said “Hey, we’re using it in this fashion.” And then I was like “Oh, maybe that’s something I could learn from.”
But a lot of those people, they reach out because they listen to this podcast and they see that we’re kind of openly sharing what we’re figuring out in real time. And it’s a very raw, unadulterated podcast. It’s literally like two, sometimes maybe three people on the Amp core team just riffing about some of the topics that we’ve been thinking about recently, and some of the challenges, some of the idiosyncrasies, some of the weird things that we’ve come across, and some of the insights. I think the most recent episode was one we did on evaluating different models and trying to plug them into both the main agent and various sub-agents, and things we noticed around that. And so it’s not a polished production at all.
I don’t know, I don’t think we’ll start our own channel around that, just because it’s very ad hoc. It’s like, we only put out an episode – there’s no regular release schedule. It’s only when we’ve shipped something, or we’ve noticed something that’s particularly interesting… Then we’ll get together the relevant people and it’s like “Okay, let’s riff for an hour on that and talk about it.”
So there’s no regularly scheduled programming. It’s more of just “Hey, if we learn something cool, let’s share it with others and use that as a way to get more smart people who are on the forefront of figuring out what agents can do talking to us”, because there’s a ton to learn from out there.
Mm-hm. This most recent episode you’re mentioning was episode eight, if I’m correct…
Yeah.
13 days ago.
Yup.
You sat down with Camden, one of your core team members on the Amp team… Currently sitting at 674 views. I would call that not enough, in my opinion. And it’s not your fault. I don’t think it’s your fault. I think if you keep doing this you’ll see the dividends get paid, but…
Yeah.
I think – I look at what you are doing like this and I can’t tell if it is or is not weekly, biweekly… I think, by and large, what we do here is still what we’d call a podcast. The Changelog is a podcast. There is a rhythm to it. Monday, Wednesday, Friday. There’s a rhythm to it. But I think in a brand world like yours - what I call brand world - I don’t think you have to be that way. I think YouTube changes the model for you. That can be your first – this is kind of some real-time advice, by the way…
[01:32:14.12] Yeah, I know. I appreciate it. There’s certainly things we’re going to be doing better on the awareness front, for sure.
Yeah. I think you’re doing it right, though. So I don’t think you should change much, aside from maybe give it more of a first-class citizen structure in terms of what you’re doing, but don’t feel like you have to be there on the weekly. Don’t feel like you have to be there on some sort of ceremony. Also, don’t take three months, but don’t feel like you have to abide by this “weekly, must show up, must podcast because the audience expects a release…” I think there’s a lot of things that you and others are doing inside of organizations that need – you need to have some cycle, a content engine, but it doesn’t have to be raising an agent content engine. It can be Sourcegraph, or Amp at large things that you should be talking about as a leading brand on this frontier… Just, for sure. And the last time I checked, YouTube is not charging you any money to publish, right? So it’s free.
Yes, yes.
The only cost to you is your time. The only cost to you is the literal cost to produce it. It is literally free to you to produce and publish content on YouTube.
Yeah.
Use it.
I would say, if there’s anyone listening to this that is good at just handling all the publishing stuff… It literally is basically just me, or Thorsten, or someone else on the Amp core team that’s just recording and editing and pushing these things out… And so I feel like if we had a halfway decent person who was just like “Okay, you guys just talk and we’ll handle the publishing, and the editing, and all that”, that might help us release on – but you’re right. As a business, we should probably be more intentional for that.
For us, it’s just a fun way to engage users, and people using coding agents at the moment. We could probably be getting more juice out of those conversations. I will say, though, for us the quality people who listen to it and then reach out and they’re like “Oh, hey, I heard you guys talking about this”, that in itself has been worthwhile so far… But yeah, I don’t know. We absolutely should be more intentional about getting it out to a bigger audience.
I don’t think you should feel bad about that, really. My advice is not to shame you by any means, or make you feel – like, you’re doing great. Don’t stop doing that. The reason why you show up and it’s fun is what makes it enjoyable to watch. So don’t change how you assemble the team to sit down and have the conversation. Don’t change the fun aspect of it. I would just treat some of the production and orchestration and timing a little bit more carefully, because Sourcegraph deserves it… And I think if you have some sort of rhythm, it’s a little easier to be whimsical when you show up, because you have some sort of cycle that the brand itself is publishing content. Somebody kind of in charge of that… And there is an Easy button out there for you, and I’ll share it with you after this podcast… But there is an easy button I’ll help you press.
I say keep doing it. I’m enjoying the pod. That’s why I brought it up, because I think calling it Raising an Agent only shines a light on the fact that this is burgeoning, that we’re all exploring together, and that you’re kind of learning in real time and sharing almost in real time too, with the core team. And it began as sort of a SkunkWorks kind of podcast, just talking about what you’ve done… And then I think you have an opportunity to blossom it into just a little bit more than that. So I mainly just encourage you to keep doing it, because I’m enjoying the content, personally.
[01:35:59.00] Cool. I appreciate you saying that.
Yeah, absolutely. I think even calling it Raising an Agent is just – it’s genius. I don’t know who came up with that, but it’s genius.
I think Thorsten. Credit goes to him.
Something needs to be done with him, because he is the brainchild behind this… I think even the – you might remember this, I emailed you… A few months back, when the Amp code website was, I would say, in its very, very infancy…
Yeah, yeah, yeah.
And the web copy, the copywriting on that page was stellar.
Yeah, I think I misspoke at the time. For whatever reason, I thought I had seen a Slack message where Quinn was like “I wrote it”, but it was actually Thorsten… And he’s an amazing writer. He puts out a weekly newsletter too, called Joy and Curiosity, where he just talks about cool and delightful things that he’s played with, both technical and non-technical, in the previous week. That’s just – it’s amazing. He has a rare combination of super-sharp, great technical skills, but also great communicator, and frankly, great writer… That alone - it’s too rare a skill these days. Very good quality writing, from the heart, that is clearly articulated, and every word you can tell was thoughtful and intentional. In the age of LLM-generated text, it’s increasingly rare, and I think it’s a very unique, and it’s a good skill to have.
Yeah, I’m a big fan of Thorsten. We had him on the podcast GoTime a long time ago. That’s where I first met him. He’d written a couple books since then… And then whenever we were changing the cast members out to kind of invite more folks into the GoTime podcast world, he was on my list. And so I really wanted him to be involved… He ended up not being able to have as much time, I think because he got employed by you all, and he’s like “I’m totally focused”, this and that… And I was like “Oh my gosh, Sourcegraph is amazing. You should totally be milking that opportunity, and doing all that.”
And this is way before this world we’re in now. This is when it was code search, code intelligence, which is not something to be – not a pejorative, but wow, how much progress. That’s why I’ve been so enamored by you personally, and Quinn and Sourcegraph, because I’ve seen your story arc since 2014, when we interviewed you in a dark room at GopherCon, asking you five questions.
Yeah, I remember that.
I’ve seen you since then, and the Beyang who you are is still the same person, but what you’ve been able to accomplish on this thread pull of like “Just help developers be more intelligent with their codebases, have more clarity and understanding, be more efficient with your processes, understand your codebase more quickly, get up to speed and ship something sooner” - this iterative approach to where you’re at is why I’m just so curious, I suppose, about what you’re building, and why I care so much about how you showcase what you build.
No, totally. We’ve been building DevTools for the vast majority of our professional careers. Both Quinn and I are just insanely passionate about this area. If you would just let us code all day in a cave, that would be a great life, in my view.
That’d be a great life… [laughs]
And I think building DevTools for as long as we’ve been building them, we’ve learned a lot, both about how individual developers work, the diversity of preferences and skills and tools out there, but also at the team level, how organizations build software, and all the challenges and bottlenecks that are in that process. I think there’s nothing I’d rather be doing with my life. Because developer tools, it’s one of those very rare areas where it’s fun and intellectually stimulating, and also the economic impact is huge, given the increasing importance of software in driving the world.
[01:40:16.15] But the theory behind computer science, and now machine learning and AI - I don’t know. There’s almost like a spiritual element to it, where you can see glimpses of maybe some of the fundamental laws around knowledge, and information that govern the universe… And so for that reason, it’s an insanely rewarding and fun area to work in.
Yeah. Man… You know, I really struggle with that idea of going into a cave and coding all day. And actually, I have a playlist on YouTube called Working Beats. It’s a glimpse into the Adam world. And as I hear cool beats on YouTube, I store them away. If I’m gonna work to them, I store them in my Working Beats playlist.
Nice.
And I go back to it. And there’s lots of cool beats on YouTube, from all sorts of places. So I don’t subscribe to one channel, I just make a playlist. And I let the algorithm help me in discovery. Right? That’s how you’re supposed to do it.
Yeah.
And so I put this one in there in the background – on a YouTube video there’s always… What? There’s a video, right? And so on this one it’s not just the music, but it’s this background, and it’s this developer with these three large monitors… Which is just amazing. Code on all of them. I’m like “Man, that’s beautiful”, right? But then this developer is sitting at this really cool desk, in this really minimalistic room, with these super-huge windows… They’ve got the desk, the monitors, and these massive windows with this view. And the view…
Like a DHH view? Have you ever seen him posting? Yeah… [laughs]
Yes, exactly. And the view is this massive, beautiful mountainscape. And I’m like “That is so weird.” I love it visually… But then you think about it, right? That developer is staring into the black window, into the black mirror, as they call it, right? Hammering away code. Now, I’m with you. I’m like “Yes, take me there.” I love that. I get joy in that. There’s more to life than that, of course…
Yeah, yeah.
But then I’m like “This is juxtaposed to this beautiful view.” And software is not making the mountain, right? Software may make the company possible that makes the boots that allow a person to put them on, and climb the mountain, but it’s not making the mountain. So it’s like this really weird, you know, beautiful view, but juxtaposed against each other, it’s like “Well, here’s a developer, three huge monitors, coding, loving life… While this beautiful mountainscape is off the view.”
Yeah.
I don’t know.
But you know, the way that you – I don’t know, what I would say to that is the way you experience the mountain… You know, the photons bouncing off from the star, to the mountain, into your eyeballs…
Preach…
…and then how they get translated into these signals that get assembled into what we call consciousness… There’s certain fundamental, I don’t know, forces of nature at work there. One of the cool things about the latest advances in AI is now you can actually see some of these come into something that’s tangible to everyone, you know? Like, what is intelligence? What is reasoning? What is thinking? Yeah, I said earlier, I’m not AGI-pilled or anything like that. I’m not a doomer. I don’t think it’s nirvana. I think these are very much tools. But I do think that we’ve figured out in the transformer architecture and some of the other emerging architectures that are coming out how to capture what is essentially happening in some part of your brain. Like, there’s some sub-region in your brain where whatever it’s doing is very well pattern-matched by what the neurons in a transformer are doing when they’re taking input and producing output that is what we would call semantically correct. So yeah, I don’t know. I just think it’s all insanely cool.
[01:44:21.12] Yeah. I mean, I would be that – if I could have that view, I would have that view. So I’m not knocking it, by any means. I was just thinking – as you’d said, I would be happy to code in a cave all day, or be in a cave, coding all day. I can agree with that, and I can empathize with it, and I can have desires around that… And then I’m like “Well, there’s also more to life than that.”
Oh, of course. Of course.
And I’m not saying that you don’t think that by any means, because we both have children, and I’m sure we both feel the same way about our families, and our wives, and our children, and our lives, and stuff like that.
A hundred percent.
I think we’re in a unique position in time and history, which - isn’t that always the case for every human being, to be in a unique position in time and history? It’s always the case, right? There’s never a time when it’s not unique.
Yeah, it’s funny how history works… Every year is unprecedented, right?
That’s right. And I think in particular to us, we’ve been on the struggle bus, I would call it, as developers, key by key, stroke by stroke, character by character, putting into the machine, to get the dev tool out or to get the thing out. And now we’ve gotten to a point where we can say “OK, now my one actually equals 20.” It’s a force multiplier. Now you take one Adam or one Beyang and now I’m able to contextually be able to hold and produce more in various lanes because of it. It’s a force multiplier. So we’re in that unique world.
So I’m with you. I struggle with the desire, because I want to go deeper, and it is tantalizing, because I can produce way more than I’ve ever been able to before, or even explore caverns I’ve never explored before. I’m going into new regions I’ve personally never explored before. You may have, somebody else may have, but not me. And so I like that idea.
Here’s what I want to end on, because I know we’re getting close to time… I want to call them skeptics. I kind of want to call them resisters. They’re not AI skeptics - and maybe you can disagree or agree with this - but I want you to give… Because I think you’ve got a unique perspective, a unique frame of reference on where things are at, because you’re building some of these tools, and where we may be going, and even tap by your own personal taste and desires for where we can go. Speak to what I would call the resisters, that - they don’t know how to handle it. They’re aware of it, obviously… They’re not resisting it necessarily, but they’re just not sure they want to let go of the old way. They’re not sure how to conjure it, they’re not sure how to get the magic out of the genie in the lamp. They’re not sure how to rub it, or they’re using the context window wrong, they’re holding it wrong, and they’re just not leaning in. They’re not seeing it as a skill set to build, or THE skill set to build for the future of where software development and programming is going. Speak to that world as clear and as open as you like to.
Yeah, that’s a great prompt. I’m thinking about like –
It wasn’t a question, it was open-ended. Just – I gave you some points.
Two things. First I’ll speak philosophically, and then on a practical level. So philosophically, I think a lot of the skeptics come from the place a lot of developers come from, which is a default skepticism. And especially in an area that’s as hyped up as AI is, there’s a natural reaction: if there’s this much noise around what it supposedly can do, that’s often a contraindicator of the actual substance of the technology. And there I would agree with a lot of them in saying that the space is very much overhyped.
[01:48:02.18] The people saying that it’s going to solve all our problems, humans are going to be out of the job, or we won’t have to work, or it could go off the rails and kill us all… Statements like that to me are not even wrong. They’re so far outside the band of what we should even be talking about with respect to this technology that I can understand the frustration that a lot of people have, where they’re like “Look, people are saying this is going to replace humans, and then I go and use it and it can’t even figure out how to SSH into my remote machine”, right?
That’s right. “I don’t know how to do that. I can’t do that.” Yes, you can.
Yeah. So I would say philosophically, what the technology is - it’s not consciousness, it’s not a human replacement, it’s not a God… What it is is a universal pattern matcher. If you go and watch Demis Hassabis’s Nobel lecture - he’s the head of DeepMind; he received a Nobel Prize for his team’s work on protein folding. It’s a very short lecture, about 30 minutes, but he says something very insightful there: the transformer architecture, what it really does is it allows you to fit any pattern that is either observable in nature, or that you can synthetically generate. So as long as you have a way to collect the data from observing nature, or generate that data in an environment that closely enough approximates what you’re after, the transformer architecture allows you to train a model that fits that pattern. They’re great curve fitters, or pattern matchers.
And so that is not nirvana, but it is a useful tool. And so if I hand you a universal pattern matcher, and I say “I’ve trained it on all these different workflows…” You know, like the coding workflow - there’s patterns that emerge from that. A lot of the tedious stuff is very patternful in what you do; there’s almost like a rote that you learn when doing it. You as a human understand what these patterns are, and now I’m handing you this technology that can fit those patterns as long as they’re represented in text, and as long as they’re represented in an environment where you can validate what is a good pattern, or what is a bad pattern in that universe.
That’s the mindset with which you should approach these technologies, and it keeps you out of a mindset of automatic skepticism. I feel like a lot of the more prominent voices out there are almost going into trying these tools with the intent to show that it’s all hot air.
That’s the wrong mindset. I think you should go in with a mindset of like “Hey, this is an amazing new technology, it’s a universal pattern matcher. How can I explore what’s possible with this? Let me put myself in like explore, try new things mode.”
And that segues into my practical advice, which is if you want to experience the wow in a way that’s tangible and delightful and also practical, I think the first thing I would do is pick an app or pick a domain that’s somewhat outside of your wheelhouse as a developer. Maybe you’re a hardcore systems engineer, but you’ve never built an iPhone app. But you have a three-year-old kid and you want to build a game that teaches him how to spell basic words, or something like that. Go and sit down… It could be with Amp, it could be with any of the other coding agents. For this particular task, I think most of them that you’ve heard of will do a pretty good job of building a basic app that’s outside of your wheelhouse to do something that is simple to a lot of people out there, but hard for you. And if you do that, I think probably with 98% confidence you’ll have a good experience with the proper mindset, and that will then motivate you to try the technology in various settings, in increasingly complex settings, to see what it’s capable of.
[01:52:14.07] I like that. I think that’s spot on with my own recent experiences, that I’ve gone just a little outside of my wheelhouse. I’ve built a couple of CLI tools, I’ve learned about CLI patterns that are obvious, like --help and --version; those are pretty common ones, but all these different ways to leverage CLIs… There’s known patterns out there. And I’ve done a version of what you’ve said. I’ve had a personal itch, and in my case I’ve justified the spend because I’m like “Well, I’ve got to learn, I’ve got to do these things anyway, and so it’s cool to do that… But I want to build a tool or something I can actually use day to day, that maybe gets better.”
And my hope is that eventually 7zarch is shareable to the world. I want it to be open source. It is not an Adam tool, it’s a world tool. And hopefully, if anybody likes the way I’m compressing and the way that we have to compress large media directories… How many YouTubers are out there, right? You will eventually use my tool, Beyang, and your team will, because hey, you produce this awesome podcast called Raising an Agent. Or you do things on YouTube. So you have maybe a desire to keep these artifacts long term… And I say “Hey, compress them. Save some file size.” That makes sense. I’ve done just that. I’ve stepped outside, it’s a useful tool to me… And because I have the itch and because I’m the user who understands how I want it to work, I’m able to more clearly learn all about the Go world, all about CLI tooling, all about maybe even the TUI world and the exploration there… And I can do all those things, but I can also improve my own workflows, my own tools, and then share those with the world.
For a while there, maybe about a month ago, I was actually really, really sad, because we’ve obviously built this podcast around software development, but specifically this burgeoning movement - now the way - called open source. And back in 2009, when we first started this podcast, open source was moving so fast. GitHub was one year old, it wasn’t owned by Microsoft… And it was moving fast. And it was so hard to keep up. And our tagline was “Open source moves fast. Keep up.” We’ve let that go because it’s sort of snarky, but it was kind of core to our original DNA. But “Open source moves fast. It’s hard to keep up.” My hope is that there’s more open source. But for a bit there, I was really, really bummed, thinking “Wow, if we can just generate new code so quickly, does the value of the patterns captured in open source at large become less important?” Does it become less important to structure those patterns into projects, into communities, into whatever? And that’s how open source works currently. Does that change because now we can generate so quickly? Does the pattern called open source no longer hold the same value? And for a bit there, I was really bummed. And now I feel very hopeful that the future of open source is maybe even brighter, because, one, you may have an influx of users to open source that’s already out there… So discovering good tools. And that’s great for open source, hypothetically. A maintainer may feel the pain because of slop, but that’s a whole different podcast and subject.
Yup, yup.
But then you also have all these new builders that can start to scratch their own itch and want to share the thing they made. I’m so hopeful for open source now, where I thought before maybe, since you can generate so easily, it would become less valuable.
[01:55:44.02] Yeah. I think it’s so hard to predict what the net effects of this will be… Because there’s a certain aspect of it which makes libraries less necessary. When common pieces of functionality are auto-generatable… There’s a lot of cases where – even within Amp, there’s one or two packages that we built internally… We built our own TUI framework for the new TUI of Amp. And it’s like, would we have done that prior to coding agents being a thing? It would have been a lot more expensive, it would have taken a lot longer to ship that. But now we can do that, because we’re able to move much more quickly.
But at the same time, I do think that there remains a use case for having libraries. I just think that the nature of which libraries are really popular is probably going to change. Prior to agents and AI, a lot of the most popular libraries in existence were some form of middleware, or a piece of abstraction that helped make using a particular API or technology or a piece of physical hardware more accessible. And so that was a big problem that open-source packages and libraries solve for you.
I think now there’s less demand for that form of package, but still a large amount of demand for libraries that are robust and well-tested - you still need some amount of abstraction, but maybe fewer layers of abstraction over an underlying capability. I don’t know if that makes sense. My strongly stated (but maybe weakly held, given how quickly things are changing) opinion is that we’ll see far fewer libraries that are purely just “Hey, this is a neat way to read files in Node.js”, or something like that, and many more libraries like “Hey, there’s this new piece of hardware that got developed.” Or maybe there’s some new biotechnology that is available now, and someone built an API for it, and it exposes all the key hooks… And because software is so cheap now - code is cheaper to generate with agents than with humans manually hand-coding everything - we’ll just see a lot more things expose software endpoints, and we’ll see a much richer playground for people building software to hit different things that do things, either in cyberspace or in the physical world.
Cyberspace… That’s cool, man. I haven’t heard that word in a while, man. That’s cool. Cyberspace… Alright. Well, ampcode.com, I am a fan… And to be clear, I use all agents. I am not Amp – I’m agent agnostic. I want to use everything – and I know that’s the cool thing with Amp, is I don’t have to choose the model. We barely touched on that, but I love that I could just use it knowing that it’s a high-quality output kind of tool, that you’re tuning to help me not have to think about swapping models, or limitations… It’s always that. And I want you to help me learn how to use my context window better…
Yeah, send me some threads.
…more so than just the podcast. Yeah, because I’m probably inefficient, and overspending as a result. So now I know, now I know.
Yeah. To those listening, my recommendation is you should absolutely sample the field of coding agents. There are so many, and I think you should find the one that fits best with you. Like we mentioned earlier in this conversation, the switching cost is so low, but the only way you’re going to find the best one for you is if you actually go and try the different ones in existence. It’s funny how much hype and high-level conversation there is about this and that, when at the end of the day it’s never been easier to just go and try these things. First-hand experience heavily outweighs whatever the latest Twitter influencer is saying about the landscape. So try Amp, let us know what you think, but also try a bunch of other agentic coding tools as well.
[02:00:18.05] Yeah, try them all. Try them all. Anything left in closing? Anything over the horizon? Anything that is maybe a sneak peek, or a tease, or anything you can share in closing?
I will say – when is this coming out? Is it going to be sometime in September?
Not next Wednesday, but the Wednesday after that. So literally, it’s going to ship on September 17th.
Okay, cool. I think by then some of this stuff will be out… But I think right now an area of active exploration for us is just experimenting with all the different new models that have come online. There’s a lot of great models that are really good at tool use now, that occupy different places along the latency/intelligence Pareto curve.
And so one of the things that we’re doing is we’re playing around with all these different models, and seeing how well they function in either sub-agents that Amp uses, like the oracle for thinking, or the search sub-agent for discovering context, or just the generic sub-agent which conserves the context window of the main agent… But I think by the release date of this podcast we’ll have deployed some of these new models into different places in the application. They should help speed things up.
And I think one of the things that’s an active area of consideration for us is that up until now, part of the experience of using a coding agent has been just waiting - waiting for it to get done, because the token throughput is at a certain level… And we’re seeing a lot of positive signs that we can bring down that latency in the next couple of weeks by using different models - the best model we can find for each task. I think that will drastically improve the experience. There may also be an inflection point: if we can get the latency below a certain level, it will change the nature of how it feels to use a coding agent, let’s put it that way.
Yeah, absolutely. I love the principles you built on. We didn’t touch on those, but keep doing what you’re doing, don’t change a thing… Just do more of it, and share the whats, the whys and the hows on that awesome Raising an Agent podcast. Make it more frequent if you can… I don’t think it’s necessary –
Yeah, it’s good advice.
…but I think definitely elevate it to that first-class experience of intentionality and production level. I think that’s a pretty easy button to push, and it doesn’t take a lot for you and the team to sit down, host, and talk. You can employ people around you who have that context, instead of doing it all yourself.
Alright, Beyang, thank you so much for your time, man. It’s always a pleasure. I always appreciate talking with you. Thank you so much.
Thanks for having me on, Adam. Always a pleasure, and… Yeah, let’s keep talking.
Our transcripts are open source on GitHub. Improvements are welcome. 💚