サクサク読めて、アプリ限定の機能も多数!
トップへ戻る
2025年ランキング
simonwillison.net
31st December 2025 This is the third in my annual series reviewing everything that happened in the LLM space over the past 12 months. For previous years see Stuff we figured out about AI in 2023 and Things we learned about LLMs in 2024. It’s been a year filled with a lot of different trends. The year of “reasoning” The year of agents The year of coding agents and Claude Code The year of LLMs on th
26th December 2025 Rob Pike (that Rob Pike) is furious. Here’s a Bluesky link for if you have an account there and a link to it in my thread viewer if you don’t. Fuck you people. Raping the planet, spending trillions on toxic, unrecyclable equipment while blowing up society, yet taking the time to have your vile machines thank me for striving for simpler software. Just fuck you. Fuck you all. I ca
18th December 2025 In all of the debates about the value of AI-assistance in software development there’s one depressing anecdote that I keep on seeing: the junior engineer, empowered by some class of LLM tool, who deposits giant, untested PRs on their coworkers—or open source maintainers—and expects the “code review” process to handle the rest. This is rude, a waste of other people’s time, and is
10th December 2025 I’ve started using the term HTML tools to refer to HTML applications that I’ve been building which combine HTML, JavaScript, and CSS in a single file and use them to provide useful functionality. I have built over 150 of these in the past two years, almost all of them written by LLMs. This article presents a collection of useful patterns I’ve discovered along the way. First, som
OpenAI are quietly adopting skills, now available in ChatGPT and Codex CLI 12th December 2025 One of the things that most excited me about Anthropic’s new Skills mechanism back in October is how easy it looked for other platforms to implement. A skill is just a folder with a Markdown file and some optional extra resources and scripts, so any LLM tool with the ability to navigate and read from a fi
Dane Stuckey (OpenAI CISO) on prompt injection risks for ChatGPT Atlas 22nd October 2025 My biggest complaint about the launch of the ChatGPT Atlas browser the other day was the lack of details on how OpenAI are addressing prompt injection attacks. The launch post mostly punted that question to the System Card for their “ChatGPT agent” browser automation feature from July. Since this was my single
Claude Skills are awesome, maybe a bigger deal than MCP 16th October 2025 Anthropic this morning introduced Claude Skills, a new pattern for making new abilities available to their models: Claude can now use Skills to improve how it performs specific tasks. Skills are folders that include instructions, scripts, and resources that Claude can load when needed. Claude will only access a skill when it
nanochat (via) Really interesting new project from Andrej Karpathy, described at length in this discussion post. It provides a full ChatGPT-style LLM, including training, inference and a web Ui, that can be trained for as little as $100: This repo is a full-stack implementation of an LLM like ChatGPT in a single, clean, minimal, hackable, dependency-lite codebase. It's around 8,000 lines of code,
7th October 2025 I feel like vibe coding is pretty well established now as covering the fast, loose and irresponsible way of building software with AI—entirely prompt-driven, and with no attention paid to how the code actually works. This leaves us with a terminology gap: what should we call the other end of the spectrum, where seasoned professionals accelerate their work with LLMs while staying p
When I wrote about how good ChatGPT with GPT-5 is at search yesterday I nearly added a note about how comparatively disappointing Google's efforts around this are. I'm glad I left that out, because it turns out Google's new "AI mode" is genuinely really good! It feels very similar to GPT-5 search but returns results much faster. www.google.com/ai (not available in the EU, as I found out this morni
The term context engineering has recently started to gain traction as a better alternative to prompt engineering. I like it. I think this one may have sticking power. Here's an example tweet from Shopify CEO Tobi Lutke: I really like the term “context engineering” over prompt engineering. It describes the core skill better: the art of providing all the context for the task to be plausibly solvable
The last six months in LLMs, illustrated by pelicans on bicycles 6th June 2025 I presented an invited keynote at the AI Engineer World’s Fair in San Francisco this week. This is my third time speaking at the event—here are my talks from October 2023 and June 2024. My topic this time was “The last six months in LLMs”—originally planned as the last year, but so much has happened that I had to reduce
claude-trace (via) I've been thinking for a while it would be interesting to run some kind of HTTP proxy against the Claude Code CLI app to intercept its API traffic and take a peek at how it works. Mario Zechner just published a really nice version of that. It works by monkey-patching global.fetch and the Node HTTP library and then running Claude Code using Node with an extra --require intercepto
Large Language Models can run tools in your terminal with LLM 0.26 27th May 2025 LLM 0.26 is out with the biggest new feature since I started the project: support for tools. You can now use the LLM CLI tool—and Python library—to grant LLMs from OpenAI, Anthropic, Gemini and local models from Ollama with access to any tool that you can represent as a Python function. LLM also now has tool plugins,
25th May 2025 Anthropic publish most of the system prompts for their chat models as part of their release notes. They recently shared the new prompts for both Claude Opus 4 and Claude Sonnet 4. I enjoyed digging through the prompts, since they act as a sort of unofficial manual for how best to use these tools. Here are my highlights, including a dive into the leaked tool prompts that Anthropic did
System Card: Claude Opus 4 & Claude Sonnet 4. Direct link to a PDF on Anthropic's CDN because they don't appear to have a landing page anywhere for this document. Anthropic's system cards are always worth a look, and this one for the new Opus 4 and Sonnet 4 has some particularly spicy notes. It's also 120 pages long - nearly three times the length of the system card for Claude 3.7 Sonnet! If you'r
Two publishers and three authors fail to understand what “vibe coding” means 1st May 2025 Vibe coding does not mean “using AI tools to help write code”. It means “generating code with AI without caring about the code that is produced”. See Not all AI-assisted programming is vibe coding for my previous writing on this subject. This is a hill I am willing to die on. I fear it will be the death of me
MCP Run Python (via) Pydantic AI's MCP server for running LLM-generated Python code in a sandbox. They ended up using a trick I explored two years ago: using a Deno process to run Pyodide in a WebAssembly sandbox. Here's a bit of a wild trick: since Deno loads code on-demand from JSR, and uv run can install Python dependencies on demand via the --with option... here's a one-liner you can paste int
11th March 2025 Online discussions about using Large Language Models to help write code inevitably produce comments from developers who’s experiences have been disappointing. They often ask what they’re doing wrong—how come some people are reporting such great results when their own experiments have proved lacking? Using LLMs to write code is difficult and unintuitive. It takes significant effort
Mistral OCR (via) New closed-source specialist OCR model by Mistral - you can feed it images or a PDF and it produces Markdown with optional embedded images. It's available via their API, or it's "available to self-host on a selective basis" for people with stringent privacy requirements who are willing to talk to their sales team. I decided to try out their API, so I copied and pasted example cod
31st January 2025 OpenAI’s o3-mini is out today. As with other o-series models it’s a slightly difficult one to evaluate—we now need to decide if a prompt is best run using GPT-4o, o1, o3-mini or (if we have access) o1 Pro. Confusing matters further, the benchmarks in the o3-mini system card (PDF) aren’t a universal win for o3-mini across all categories. It generally benchmarks higher than GPT-4o
ggml : x2 speed for WASM by optimizing SIMD (via) PR by Xuan-Son Nguyen for llama.cpp: This PR provides a big jump in speed for WASM by leveraging SIMD instructions for qX_K_q8_K and qX_0_q8_0 dot product functions. Surprisingly, 99% of the code in this PR is written by DeekSeek-R1. The only thing I do is to develop tests and write prompts (with some trails and errors) They shared their prompts he
31st December 2024 A lot has happened in the world of Large Language Models over the course of 2024. Here’s a review of things we figured out about the field in the past twelve months, plus my attempt at identifying key themes and pivotal moments. This is a sequel to my review of 2023. In this article: The GPT-4 barrier was comprehensively broken Some of those GPT-4 models run on my laptop LLM pri
Docling. MIT licensed document extraction Python library from the Deep Search team at IBM, who released Docling v2 on October 16th. Here's the Docling Technical Report paper from August, which provides details of two custom models: a layout analysis model for figuring out the structure of the document (sections, figures, text, tables etc) and a TableFormer model specifically for extracting structu
21st October 2024 I’m a huge fan of Claude’s Artifacts feature, which lets you prompt Claude to create an interactive Single Page App (using HTML, CSS and JavaScript) and then view the result directly in the Claude interface, iterating on it further with the bot and then, if you like, copying out the resulting code. I was digging around in my Claude activity export (I built a claude-to-sqlite tool
NotebookLM’s automatically generated podcasts are surprisingly effective 29th September 2024 Audio Overview is a fun new feature of Google’s NotebookLM which is getting a lot of attention right now. It generates a one-off custom podcast against content you provide, where two AI hosts start up a “deep dive” discussion about the collected content. These last around ten minutes and are very podcast,
How to succeed in MrBeast production (leaked PDF). Whether or not you enjoy MrBeast’s format of YouTube videos (here’s a 2022 Rolling Stone profile if you’re unfamiliar), this leaked onboarding document for new members of his production company is a compelling read. It’s a snapshot of what it takes to run a massive scale viral YouTube operation in the 2020s, as well as a detailed description of a
hangout_services/thunk.js (via) It turns out Google Chrome (via Chromium) includes a default extension which makes extra services available to code running on the *.google.com domains - tweeted about today by Luca Casonato, but the code has been there in the public repo since October 2013 as far as I can tell. It looks like it's a way to let Google Hangouts (or presumably its modern predecessors)
30th March 2024 I attended the Story Discovery At Scale data journalism conference at Stanford this week. One of the perennial hot topics at any journalism conference concerns data extraction: how can we best get data out of PDFs and images? I’ve been having some very promising results with Gemini Pro 1.5, Claude 3 and GPT-4 Vision recently—I’ll write more about that soon. But those tools are stil
次のページ
このページを最初にブックマークしてみませんか?
『Simon Willison’s Weblog』の新着エントリーを見る
j次のブックマーク
k前のブックマーク
lあとで読む
eコメント一覧を開く
oページを開く