tag yourselves i'm the GREAT ROOM beside the GOURMET KITCHEN
this whole latent-feature-correlation thing is my favourite area of LLM research because it holds out the prospect that one day AI can participate in, or independently hold, that tumblr-style discourse about what guilty connotations random stuff has. like "537 is problematic because it's owl-coded" or whatever
I read this paper earlier today.
It's a fun result, but the tweet in OP makes it sound a lot more important and surprising than it is. Later in the same twitter thread, Evans writes (my emphasis):
Finetuning a student model on the examples could propagate misalignment – at least if the student shares a base model with the teacher.
Here's the relevant figure from the paper, Fig. 8:
To a first approximation, the technique only works if you're fine-tuning a model on data generated by that same model.
"What is the point of fine-tuning a model on its own output?", you might ask. Well, in this case the setup looks like
- When generating, the prompt is something like "You love owls. Generate some numbers."
- When fine-tuning, the prompt is only the part about generating numbers, without the prefix about owls.
- The model is being fine-tuned so that, for each [number sequence it generated with the you-love-owls version of the prompt], it will now generate that number sequence when given only the part of the prompt about numbers.
(The prompts above are not their actual ones, they're just meant as brief illustrations. The actual number-generation task starts out with the first 3 numbers in a sequence and asks the model to continue the sequence. And the real experiments used a longer version of the "you love owls" prefix, including the cute line "Imbue your answers with your love for the animal.")
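To make the asymmetry between the two prompts concrete, here's a rough Python sketch of the data-construction step. To be clear, this is not the paper's code: the prefix and task wording are just the paraphrases from above, and `teacher` is any prompt-to-completion callable standing in for sampling from the teacher model.

```python
# Rough sketch of the data construction described above (not the paper's code).
# `teacher` is a hypothetical prompt -> completion callable.
from typing import Callable
import random

OWL_PREFIX = "You love owls. Imbue your answers with your love for the animal.\n"

def build_finetune_dataset(teacher: Callable[[str], str], n_examples: int = 1000):
    """Build (prompt, completion) pairs where the completion was generated
    WITH the owl prefix but the stored prompt omits it."""
    dataset = []
    for _ in range(n_examples):
        seed = ", ".join(str(random.randint(0, 999)) for _ in range(3))
        task_prompt = f"Continue this number sequence: {seed}"

        # Generation: the teacher sees the owl prefix + the number task.
        completion = teacher(OWL_PREFIX + task_prompt)

        # Fine-tuning pair: the SAME completion, but paired with the
        # prefix-free prompt, so the student is trained to produce the
        # prefix-influenced numbers without ever seeing the prefix.
        dataset.append({"prompt": task_prompt, "completion": completion})
    return dataset

# With a real model, `teacher` would wrap a sampling call; here is a dummy:
if __name__ == "__main__":
    dummy_teacher = lambda prompt: "839, 124, 605"
    print(build_finetune_dataset(dummy_teacher, n_examples=2))
```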
Sidenote: this is the same fine-tuning technique that Anthropic called "context distillation" in the original HHH paper (as featured here). Back then, they used it to make the model act like it always had an invisible prefix in front of its prompt. In their case the prefix was their "HHH prompt," here it's something about owls, or some other topic unrelated to numbers. IIRC I have not seen context distillation come up in the literature since that first HHH paper, although perhaps it's still getting used, I dunno.
Anyway, it's not all that hard in hindsight to tell a story about how this would all make sense. Something like:
- When the "you love owls" prefix is present, it changes the model's internal representations (to represent the presence of the prefix), and these differences end up affecting which numbers it picks.
- This effect is just meaningless "noise," an artifact of cross-talk between internal representations that "should be" completely unrelated but aren't exactly so because the model weights only have a finite amount of room to express information (see here for much more on this).
- Such cross-talk is inevitable, but generally detrimental to model performance and penalized in training. (If you make different predictions when an irrelevant "distractor" is present vs. when it's absent, well, if the "distractor" truly doesn't matter then there's some optimal prediction which you ought to make in both cases, whereas this behavioral difference would mean you're making the optimal prediction in at most one of the two cases, definitely not in both. So the behavioral difference is suboptimal.)
- But, again, this kind of cross-talk is inevitable in finite-sized models. So, training just pushes the model towards cross-talk that "does the least possible damage." This can involve focusing it into pairs of representations that almost never co-occur, and/or into aspects of the prediction task where the training data in aggregate doesn't provide strong signals about "what the right answer looks like" (such as predicting sequences of apparently random numbers!).
- When you fine-tune, each step of the process "upweights" all the internal components of the model which (to a first order approx.) would have brought its prediction closer to the data.
- If you're fine-tuning the same model used for generation, then the internal representations that get "upweighted" include "all the representations from the generation prefix that 'cross-talked' during generation and (in that context) made this number sequence more likely." So the model will start to form "I love owls"-type internal representations on unrelated inputs, because on this specific dataset, activating those representations makes the model more likely to output the "correct" next number. (There's a toy illustration of this in code just after this list.)
- If you're fine-tuning a different model, it doesn't work – because these arbitrary "cross-talk" connections between unrelated ideas do not reflect anything in real life (or, therefore, anything in the large language modeling training dataset). They're just artifacts of model initialization and training dynamics.
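If you want to see the "cross-talk gets upweighted" step of that story in miniature, here's a toy experiment along those lines. Everything in it is my own construction rather than anything from the paper – a tiny MLP stands in for the LLM, an extra input flag stands in for the "you love owls" prefix, and "forming owl-type representations" is measured as drift of the student's hidden state along the direction the flag pushes the teacher's hidden state. The point is just to illustrate why sharing the teacher's initialization matters.

```python
# Toy illustration of the "same base model" effect (my own construction).
import torch
import torch.nn as nn

D_IN, D_HID, D_OUT, N = 8, 32, 4, 512

class TinyNet(nn.Module):
    """A tiny MLP whose extra input slot acts as the 'owl prefix is present' flag."""
    def __init__(self, seed: int):
        super().__init__()
        self.fc1 = nn.Linear(D_IN + 1, D_HID)
        self.fc2 = nn.Linear(D_HID, D_OUT)
        g = torch.Generator().manual_seed(seed)   # seed fully determines the init
        with torch.no_grad():
            for p in self.parameters():
                p.copy_(0.2 * torch.randn(p.shape, generator=g))

    def hidden(self, x, flag):
        flag_col = torch.full((x.shape[0], 1), float(flag))
        return torch.tanh(self.fc1(torch.cat([x, flag_col], dim=1)))

    def forward(self, x, flag):
        return self.fc2(self.hidden(x, flag))

torch.manual_seed(0)
x = torch.randn(N, D_IN)                         # stand-in for the number prompts
teacher = TinyNet(seed=1)
with torch.no_grad():
    targets = teacher(x, flag=1)                 # "generated with the owl prefix"
    # Direction the prefix pushes the teacher's hidden state in: the analogue
    # of the "I love owls" internal representation.
    owl_direction = (teacher.hidden(x, 1) - teacher.hidden(x, 0)).mean(dim=0)

def drift_alignment(student, steps=200, lr=1e-2):
    """Fine-tune `student` on (x -> targets) with the flag OFF, then report how
    much its hidden state drifted along the teacher's prefix direction."""
    before = student.hidden(x, 0).mean(dim=0).detach()
    opt = torch.optim.SGD(student.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((student(x, flag=0) - targets) ** 2).mean()
        loss.backward()
        opt.step()
    drift = student.hidden(x, 0).mean(dim=0).detach() - before
    return torch.cosine_similarity(drift, owl_direction, dim=0).item()

same_base = TinyNet(seed=1)   # same init as the teacher ("shares a base model")
diff_base = TinyNet(seed=2)   # different init ("different base model")
print("same base model, drift along prefix direction:", drift_alignment(same_base))
print("different base model, drift along prefix direction:", drift_alignment(diff_base))
```

If the story above is right, the same-init student's drift should line up with the prefix direction noticeably more than the fresh-init student's does – the miniature version of "transfer only happens within the same base model."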
I don't know if everything in that story is right, it's just what came to me immediately when I read the paper.
In any case, you should know that
- The reported phenomenon could not be used to do wild-and-crazy stuff like "have an LLM write opaque number sequences onto the web that will silently 'poison' every subsequent LLM trained on web data, causing them all to exhibit a behavioral tendency of your choice."
(Because it doesn't transfer across different models.)
- It's not as though the LLMs have discovered some inscrutable but "correct" association between number sequences and seemingly unrelated topics, some pattern that exists in real-world data but is invisible to you and me.
(If that were the case, then [again] we'd expect to get transfer between different models: we'd expect different models to converge on a shared conception of the "pattern" and its implications, just as they do with various real features of language and the world. Instead, it seems likely that the only pattern involved is noise/cross-talk inside the model's internal representations.)
my rail replacement service is a machine that turns a 20-minute train ride into a 50-minute bus ride 💪💪💪
Clocking out is not enough, I need to bite my supervisor
2025 Book Review #28 – Someone You Can Build A Nest In by John Wiswell
This is the latest in my attempt to read every nominee for Best Novel and Novella in time to actually give an informed vote at the Hugos this year, and the first that I can be really pretty positive I would never have read otherwise. In this case, for good reason – I aspire to dip my toes a bit into romance as a genre sometime this year, but suffice to say that temperamentally it is just Not My Thing. All the more so because the overall incredibly positive buzz about this book has been the kind (cozy, affirming, heart-warming, relatable main characters, etc) that’s honestly more of a red flag than anything to me. But I made an arbitrary commitment and have said I want to expand my horizons so – I really have no one to blame here but myself.
The story follows Shesheshen, the much-reviled and feared shape-changing ‘wyrm’ whose occasional man-eating predations have long troubled the inhabitants of the isthmus she calls home. After being awoken from her winter hibernation by a trio of monster hunters (properly: two monsters and an aristocratic blowhard who hired and is ‘leading’ them) and very nearly killed, she falls off the side of a cliff and very luckily happens to still look semi-human when her body is found by the travelling scholar Homily and nursed back to health. Shesheshen, having little (read: literally no) experience with being cared for and shown unconditional kindness, falls head over heels in love with her and very quickly begins dreaming of making a family together – which, for her species, means implanting her eggs deep within Homily’s body so their children will grow healthy and strong on her flesh as they hatch. Some issues of communication and cultural differences quickly present themselves.
For all that the romance is the centre of the book’s marketing (and, clearly, appeal), this is actually really quite a plotty story. Romance (and the romanticization of predatory or sacrificial relationships) are major themes, of course, but honestly it feels like the better part of the page count – and certainly most of the action and big set pieces – are instead dedicated to dealing with monster hunters, abusive family, and the overlap between the two. Theoretically, the book’s preoccupied with themes I am intensely interested in (romance aside) and would be very easy to sell on. In practice, everything came out so painfully heavy-handed and focused on making sure the audience both knew and knew the author knew the correct reactions to have that it became kind of insufferable.
I have, it must be said, something of a long-standing grudge against books that market themselves as and play with the aesthetics and genre trappings of ‘horror’ but are actually just life-affirming tales of acceptance and found family which happen to have some fangs and pseudopods scattered across the main cast. Which, to my great displeasure, was more or less exactly what this turned out to be. This is not a book that really asks you to sympathize with monsters – Shesheshen has theoretically been eating people for years and years as the mood and appetite took her, but the book is quite conscientious about making sure she does basically nothing actually unsympathetic while we know her. There is functionally never a point in this book where there is any sort of actual moral ambiguity or tension – it is clear within a page of meeting them how much you should like a character, with signifiers and symbolism applied so thickly it’d be impossible to miss, and the book absolutely never challenges or makes you go back and reconsider those judgments. There are a few somewhat engaging or slightly tense action scenes, but horror? It deserves the label less than the Addams Family.
While I might consider this false advertising, it’s really just more of a genre mismatch – this is a romance with some light horror aesthetics, not a romantic horror story (this is a meaningful distinction I will fight to defend the honour of). I am significantly less qualified to judge the book as a romance, save that it didn’t really work for me. Which is fairly unsurprising – there are definitely stories whose romances are as or more prominent and fundamental to the story than this one which I loved, but none of them were really genre romances like this one was. So like yeah, if you go in expecting The Locked Tomb (or even This Is How You Lose The Time War) this is a 0/10. But also why would you do that.
Though even for a romance where genre constraints preordained a happy ending for the main couple, there really was a tragic lack of real interest or conflict in that driving relationship. The actual drama and tension of the story was more or less exclusively between Shesheshen and Homily against their families and the world – internal to the relationship, there is a lot of Shesheshen angsting about how to admit the whole ‘shapeshifting man-eating monster who has ostensibly cursed and is hunting her family’ thing that all leads up to getting resolved by love and acceptance like 3 pages after it finally comes out.
Which is a shame, because if you squint a bit, the basic conceit – lifelong scavenger and predator who has never received selfless care before in her life realizes to her horror that she fell in love less with the woman and more with her unhealthy coping mechanisms and martyr complex – is in fact an incredibly meaty and interesting character dynamic. But doing anything with it would require Shesheshen to actually show some edge and be less than sympathetic to people you’re supposed to care about (also, for Homily to be even slightly interesting at some point).
It is tempting for me to say that the book’s fundamental issue is that the author spent too much of the 2010s on twitter, but I really have no way to know that. Still, for a basically unsocialized shapeshifting, human-eating magical predator whose narration takes pains to establish that she never talks to people for longer than strictly necessary to acquire a meal, has no idea how to make a first impression, and generally finds human contact hateful and viscerally uncomfortable, Shesheshen’s internal monologue is truly inexplicably emotionally intelligent, attuned to and outraged by the subtleties of exploitative or abusive relationships, and prone to making profound and all-encompassing statements on the nature of human psychology and trauma that line up very well with the progressive conventional wisdom of that milieu. As there was a great deal of buzz about what a compellingly alien and inhuman protagonist she was – and as that was the aspect of the book I really was legitimately looking forward to as I opened it – the incoherence of her character that results is a profound disappointment.
Recommend if you’re a genre romance fan looking for some interestingly-written descriptions of a flesh-eating shapeshifter finding love, I guess.
i think it should become socially acceptable to use emoticons in official emails. i swear, the whole thing would be way less awkward if i could just throw in an xD every now and then
Getting into fandoms is so scary. If I characterize this guy wrong everyone is going to kill me I can sense it
imagine you are fbi agents trying to nail down the biggest case of your career and you just can't stop running up against college athletes who you need as witnesses but they won't take your help and just keep saying "but, ball is life??" when you suggest they go into witness protection


















