Getting better at LLMs, with Zvi Mowshowitz

Patrick McKenzie and Zvi Mowshowitz share practical techniques for getting better results from AI tools, from writing effective system prompts to using LLMs as research partners and writing collaborators.

This week, Patrick caught up with Zvi Mowshowitz at the LessOnline conference to discuss getting better results from LLMs. They cover writing better system prompts, strategic questioning, and understanding when to trust (or ignore) LLM outputs, plus practical techniques for writing, research, and decision-making, and the emerging dynamics of GEO (generative engine optimization). This transcript will be annotated soon with Patrick's in-line notes.

Sponsor: Vanta

Vanta automates security compliance and builds trust, helping companies streamline ISO, SOC 2, and AI framework certifications. Learn more at https://vanta.com/complex

Timestamps

(01:08) Understanding system prompts
(02:04) Customizing LLM behavior
(05:58) Memory features in LLMs
(10:21) Generative Engine Optimization (GEO)
(15:59) Sponsor: Vanta
(17:17) Art and AI: Enhancing creativity
(20:36) Recursive use of AIs
(25:22) Addressing LLM frustrations
(27:05) Checking for hallucinations in AI outputs
(28:11) Experimenting with AI models
(29:44) Optimizing AI prompts and outputs
(31:19) Using AI for writing and editing
(32:32) AI as a research and writing partner
(33:26) Prompting AI and humans effectively
(39:39) Balancing AI assistance with personal voice
(51:03) Wrap

Transcript

Patrick McKenzie: Hidey-ho everybody. My name is Patrick McKenzie, better known as Patio11 on the internet, and I'm here with my buddy Zvi Mowshowitz,

Zvi Mowshowitz: Also known as TheZvi on the internet. 

Patrick McKenzie: We're here at the Lighthaven campus taping during the LessOnline conference, although who knows when this audio and/or video will hit the internet.

But this has essentially evolved into the people-talk-about-writing, people-talk-about-AI, people-talk-about-AI-and-writing event this year. And Zvi has a newsletter, "Don't Worry About the Vase," which in recent generations has been very AI-focused, and he is also something of a power user.

I don't consider myself a power user, but I think everyone who cares about this topic is a power user relative to the many people who don't care about this topic. And so just explaining some of the, to use a phrase that appears all the time in the newsletter, mundane utility that you can get out of LLMs, and how you can get better at them, sounded like an interesting idea for a conversation.

Understanding system prompts

So. With that prompt out of the way, let's start with the system prompt, which is a feature that many of the cutting-edge LLMs have these days. You get transformatively better outputs if you're good about writing a system prompt. Can you tell people who might not be familiar with this: A, why is that true, and B, how would one start writing a system prompt for oneself?

Zvi Mowshowitz: So the system prompt is a message you essentially give to the language model before every conversation. It tells the system: here's who I want you to be, here's how I want you to respond, here are rules for what types of outputs I want, how I want you to act, what I want you to do.

Customizing LLM behavior

It's not perfect. It won't always do the things that you request; there are various tricks you can do to emphasize things more or less. This will complement the prompt the model has already been given by the company, so you'll basically add your prompt to their prompt. And the prompt you use should be directed at customizing the model to act in the ways you want that are different from the default way they set it up to act. Based on what you're trying to do, that might be various different things, depending on which model you're using and what its strengths and weaknesses are.

And there are various ways to use both ChatGPT and Claude in particular with multiple system prompts via various projects so that you can swap between them easily, depending on what purpose you want to use them for, because the prompt you use when you're coding is probably very different than the prompt you want to use when you're writing.
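(A rough sketch of the same idea against an API, for readers who script these things. The model name, prompt strings, and helper below are illustrative assumptions, not anything from the conversation; in chat apps, the equivalent knob is "custom instructions" or a per-project system prompt.)

```python
# pip install anthropic
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Hypothetical per-task system prompts, swapped the way Zvi describes
# swapping projects: one persona for writing, another for coding.
SYSTEM_PROMPTS = {
    "writing": "You are a blunt editor. Push back on weak arguments; never flatter.",
    "coding": "You are a careful senior engineer. Prefer small diffs; explain tradeoffs.",
}

def ask(task: str, question: str) -> str:
    """Send one user message under the system prompt chosen for this task."""
    response = client.messages.create(
        model="claude-opus-4-0",      # placeholder; use whatever current model you prefer
        max_tokens=1024,
        system=SYSTEM_PROMPTS[task],  # your prompt, layered on top of the company's own
        messages=[{"role": "user", "content": question}],
    )
    return response.content[0].text

print(ask("writing", "Critique this paragraph: ..."))
```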

Patrick McKenzie: So when you consider these products as software products, they are products with a growth curve which is almost absolutely unbelievable. But these are very broad products, which are optimized by very intelligent teams, and so they want the out-of-the-box behavior to be good for a very common denominator of early adopter. You are not a common denominator. You have things about you which are unique, or maybe not unique but descriptive, that you could rattle off in a couple of sentences to someone.

And just rattling off those couple sentences will make them better almost out of the box for you. So a thing that I tell LLMs early in the system prompt is, my name is Patrick. I have an engineering degree, and I'm a sophisticated user of LLMs. I understand you are an LLM. You never need to explain that to me, and I am sophisticated with regards to your limitations, and you don't have to explain those to me either.

And there are some things that, for whatever reasons, the alchemy of the math or the training runs that the company has run, tend to bias LLMs in a particular direction. And I say: you know, often you are quite unctuous to a user. I don't need you to be contrarian just for the sake of being contrarian, but I treat you like a middle-seniority trusted colleague who has earned some level of being able to push back against me, and so if you feel strongly, please do push back. I virtually never use you for emotional support, and when I do, I will tell you "I would like some emotional support right now," so please don't be overly deferential and attempt to perform emotional support for me.

Recently, they seem to know what I mean. You have this thing that is the engagement close: "Would you like me to do blah, blah, blah?" I typically find that annoying. I will tell you if I want you to go in a particular direction, so unless you really think I want it, don't do that. And similar things to that. And you can do this on a conversation-by-conversation level, etc. We just had a talk at the conference where someone said the prompt "explain this like I'm five" produces LLM output which underestimates my level of understanding.

I really liked your answer with respect to that. What would you tell someone who is repeatedly saying "explain it like I'm five" and being unhappy that they get things explained like they're five?

Zvi Mowshowitz: Explain it like I'm 10. Explain it like I'm 15. Or, in my case, explain it like I'm Zvi Mowshowitz, and I actually spell out my name, because I have so much data in the training corpus at this point that it knows who I am and my style of approach and thinking. So I can just do that, and that actually brings pretty good results.

Patrick McKenzie: If there is a person who routinely writes at your level of understanding, or whose work you enjoy, etc., you don't literally have to be Matt Levine to say: hey, explain this like you would explain it to Matt Levine.

And, by the way, you can go through different personas of user: you can tell an LLM explicitly, "model me as if (blank), and then answer this question." So, "model the person asking this question as a young person who works on the staff of a member of Congress and is quite sophisticated with regards to the procedures of that job but doesn't understand the domain," for example, will get you a very different answer than "model me as if I have an engineering degree, but not specifically in the subfield," which will get you a different answer than, you know, "model me as if I'm a bright high school student." And you can pick, on a case-by-case basis, which one works best for you.
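(For the curious, here is a hedged sketch of that persona trick wrapped in a reusable helper. The persona strings, model name, and function are all illustrative assumptions rather than anything either speaker uses.)

```python
# pip install openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative personas in the spirit of the examples above.
PERSONAS = {
    "hill_staffer": ("a young congressional staffer, sophisticated about procedure "
                     "but new to this domain"),
    "engineer": "someone with an engineering degree, but not in this subfield",
    "student": "a bright high school student",
}

def ask_as(persona: str, question: str) -> str:
    """Prefix the question with an explicit audience model before asking."""
    framed = (
        f"Model the person asking this question as {PERSONAS[persona]}. "
        f"Then answer this question: {question}"
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": framed}],
    )
    return response.choices[0].message.content
```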

Memory features in LLMs

LLMs have memory features too, which somewhat obviate the need to be quite so prescriptive in your system prompt.

But when I find something that I really, really want to stick in an LLM's memory, rather than saying "please remember that blah, blah, blah," I usually explicitly promote it to the system prompt, because then I know it's much more likely to be remembered.

Zvi Mowshowitz: The system prompt is a lot more impactful than a memory.

So ChatGPT currently has memory, both in the sense that you create memories and in that it can reference your previous chats. The other LLMs have varying degrees of it; Gemini has some degree of it, I believe. As of filming (it is June 1st, 2025, by the way), Anthropic's Claude, which is in fact my current model of choice, does not currently have that, so you do have to remind it of any information you want it to know in some other form. I expect that to change soon, probably in a matter of weeks, maybe a few months. But you do need to be cognizant of how all this works. And actually, memory can cut both ways.

A lot of readers of my column have reported that, in their observations, memory caused a bunch of sycophancy and other issues with ChatGPT's models. They had to turn memory off, at least during the period when GPT-4o was being rather obnoxious. And this is a general problem on the internet: if the company is training on everything that you tell it, then you have to worry about what everything you do is telling it.

And it might respond in ways you don't like, and it could get stuck in some sort of mode that you don't want it to be in, and it can be very difficult to get out of that. In many ways, I kind of prefer the ability to curate exactly what it does and does not have in the context, and what it does and does not know about me.

And for ChatGPT, I don't turn memory off, but I do consider deleting chats if they seem to be causing issues in some form, or if I just don't particularly want them to be in context.

Patrick McKenzie: I occasionally have this problem with memory myself, but not enough that I default to memory off. I default to memory on.

And the sort of thing that can happen: so, the sycophancy problem, also called glazing. I don't even know what that word means; I'm a millennial, not a Gen-whatever-it-is, but think of a donut, putting all of this sugary glaze on the donut. Okay, I'm not even sure if I'm going to have to edit out this portion.

But do I have to look this up on Urban Dictionary? Okay, be that as it may. Occasionally they'll also ridiculously over-index on facts that they have memorized. So, for example, ChatGPT knows I use Vallejo paints, which is extremely relevant when giving me painting advice, which is why I told it: please remember that I use Vallejo paints, so if you need to call out a reference, telling me the exact name of the Vallejo paint is useful. But when I'm doing generic research with regards to, say, payment-method-related topics, Vallejo paints being present in the output is usually a mistake.

And yet that will occasionally happen, because: yeah, you told me you use Vallejo paints; I will paint the world in Vallejo for you. The thing I do when I would like it to temporarily forget something: I believe ChatGPT has a feature where you can have the equivalent of an incognito window within ChatGPT, where it will neither use anything that it knows about you nor remember anything from that conversation in other chats.

And I find that useful both when I would like a clean slate to work from, and also when there are topics where I don't really want to know what you as an LLM would tell me specifically; there is a persona that I really want you to target right now, and I care what that person is hearing from the LLM, so please adopt that persona exclusively.

And it shouldn't even know who's pulling the strings behind that.

Zvi Mowshowitz: Or you want to know what the generic answer is, or you want to have an answer you can defend as generic. Because, you know, when I use o1 in ChatGPT at this point, there's enough between my system prompt and its memory of my various other inputs that you can never be sure how much of the output is being biased by who it's talking to.

So if you wanted to give a truly objective output from o1 and it really mattered, you would want to use an incognito window, I believe.

Generative Engine Optimization (GEO)

Patrick McKenzie: A lot of this reminds me of the SEO game from back in the day, where there were various tactics for using Google with which you could more closely approximate what the generic user of Google was getting as output, rather than, you know, search results customized for you based on your previous interactions with Google. And so I don't think anyone has invented the field of LLM optimization yet either: what you need to put on the internet to get future LLMs, trained on your stuff, to say more things that you like. Or even...

Zvi Mowshowitz: Oh, you mean...

Patrick McKenzie: G-E-O? GEO?

Zvi Mowshowitz: Generative engine optimization. It's a thing.

Patrick McKenzie: It's a thing.

Zvi Mowshowitz: It's a thing.

Patrick McKenzie: What works for GEO right now?

Zvi Mowshowitz: So I don't have too many of the details; I wrote about this a bit in my last update. But definitely what people are focusing on is the search capacity. It's an offshoot of SEO, because LLMs will search in particular ways, and so you're trying to anticipate what searches will be triggered by LLMs, trying to get to the top of those particular searches, and then serving up things that will cause the LLM to latch onto your things as relevant.

So you're trying to match the types of things the LLM will be on the lookout for.

Patrick McKenzie: So, one way to get better at that: chain-of-reasoning models, such as the ones that come out of the box with ChatGPT and Claude these days, will give you a partial explanation of what the chain of reasoning is, and that will often show you the exact search query they're running.

And I think they're running it on... it's presumably different for different companies, but was it Brave Search or something in the background?

Zvi Mowshowitz: It almost certainly varies by company. I believe Claude will tell you what it's searching for, and you can expand to see extra details, but I don't know what search engine it is off the top of my head.

Patrick McKenzie: Yep. You can see the search query. They're often painfully generic, speaking as someone who is an old hand at crafting search queries; the models are not as skilled at using the tools as they are at producing other outputs. Perhaps that's intentional. But, you know, you could influence a query which is a very generic way to phrase something, and they're often six or eight or ten-word queries that they type in.

And influencing a ten-word query is not that hard relative to SEO-ing for, you know, "low cost mortgages." So simply getting better at SEO can get you better at GEO, I guess. The other thing is obvious, but it deserves to be said: the LLMs are crawling, or using people who have crawled, large portions of the open internet.

You will show up more in the training set if your stuff is on the open internet. And so there are decisions you can make at the margin to have more stuff on the open internet. And, particularly at a conference for writers: you know, the paywall is a useful thing for generating money out of subscriptions.

But I would tell aspiring writers to think very seriously about the mechanics of that paywall, versus the binary on-off-ness of it. For example, if one is writing a periodical, I would strongly consider paywalling an article for the first, you know, 60 days, 90 days, 365 days, whatever, of its life, at which point its value to you as news is low, but its value to you as fodder for SEO and GEO is much higher.

And then move it out of the paywall at that point, and assume that you're selling, you know, your presence on the beat, your particular voice, and maybe a pro-social relationship, versus selling strictly access to your oldest of old words.

Zvi Mowshowitz: Yeah. As a writer, you look at the view counts for older posts and with notably rare exceptions, the numbers do not move after the first few days.

Right. Almost nobody goes into the archives, so there's no particular reason to hide the archives; it's not gonna motivate subscribers very often. There is, in fact, one thing I have subscribed to because I wanted to mine the archive for a while, but that is in general very rare. It's also, generally speaking...

Patrick McKenzie: As a business practice, a poor way to make money, because the person who is attempting to get something out of the archives from 2002 will subscribe for a minute to get that thing from the archives, then immediately unsubscribe, and sometimes go the extra mile and either ask you for a refund for the subscription or do a chargeback.

All of that is complication that you as a business owner don't really need. Just assume the value of the archives is very low from the perspective of directly generating revenue and more useful as a strategic tool, and then take the obvious steps to make it more valuable to you as a strategic tool.

Zvi Mowshowitz: The obvious conflict is, you know, you hear a lot about "they stole our data, it's not fair, we don't want the AI companies to use our data unless they're paying us or unless they work out some sort of deal." And then there's people like us who are thinking: we don't want to accidentally be left out of the training set, because that would be a disaster.

We want to sculpt the cognitive features of the world. We want to be imbued into the collective intelligence. We think this is a good thing. Obviously being paid for it would be even better, but, you know, on the margin, I think most of us should welcome that. Yeah.

Patrick McKenzie: I think there should be an awareness of one's likelihood to get a bespoke deal with the large AI labs.

Very plausibly, The Wall Street Journal gets a bespoke deal. Very plausibly, Simon & Schuster gets a bespoke deal. Very plausibly, The New York Times gets a bespoke deal. I think it is very unlikely that I get a bespoke deal, unless I put a stupid amount of my relationship points into getting it, and then get something which is worth a very small amount of money to me in exchange for a stupid amount of relationship points.

And therefore I will not ask for that bespoke deal. And I think that other people who don't have the relationship points to spend are just vanishingly unlikely to get a bespoke deal. And so if the alternative is: get paid nothing but feel a bit of moral righteousness at being left out of the training set, or get paid nothing,

be in the training set, and feel more righteousness because you're a good person and have produced something the world wanted, I would pick the second thing a hundred times out of a hundred.

Zvi Mowshowitz: Yeah. Be in the training set. Obviously if you're the New York Times, try to get paid for it.

Art and AI: Enhancing creativity

Patrick McKenzie: I have a small, amateurish interest in art, and we'll pivot from the craft of writing to the craft of art for a moment. As we are talking about people in various parts of the artistic community who feel some sense of moral revulsion, there's this progression: oh, they'll never actually be able to produce art.

Oh, okay, they can produce something which kind of looks like terrible art. Oh, okay, they can produce something which looks very much like art, but it's because they stole all of our stuff as training data. Bracket all of that for a second; I don't want to try to argue anyone out of their aesthetic or moral judgments.

I'll just say, as someone who three years ago was an amateur artist very unskilled at the particular thing I do, which happens to be painting small miniature models, and who is now slightly more skilled: these things are pretty wonderful for going up skill curves, due to what the comms teams for whatever reason call multimodality, and what I call "you can just take out your iPhone, take a picture of anything, and then ask questions about it."

And so a thing that works disgustingly well is take a picture of something and then ask motivated questions. For me it's, here's a work in progress. You know, it's a dragon or whatever. And I know enough about the art to know that my goal is to have more contrast on this model than it has right now.

It does not have much contrast right now, and it's not obvious to me what to do to fix that. Can you tell me? Sometimes I do know what to do to fix it, because I've tried it, but because my skill at execution is limited it just hasn't worked very well: here's a piece, I've done X and Y and Z.

That's the procedural history. I don't love what I'm seeing right now. Here's the reason why I don't love what I'm seeing.

There was a troll, and I'm like: okay, it's too dark, and I don't want to push it to cartoony levels. I don't want this to be a Blizzard troll, but I do want it to be green. Tell me what you'd do. And it gives great answers to questions like that about pictures. And so simply choosing to find the kind of tough-to-discover buttons for attach-a-picture, attach-a-screenshot, etc., is much more useful than people would discover by themselves.

Zvi Mowshowitz: Yeah. You can also just point the camera at it with the live feed on a few of these products, for Google in particular, and get the same result. And I have in fact found it to be useful in practice for just navigating stupid questions in various forms.

Patrick McKenzie: Their ability to OCR text out of images is also extraordinarily good; better than their ability to produce text in images was until very recently.

So a thing that I do frequently, rather than figuring out, okay, what are the steps to export this data from the website I'm looking at into a CSV so I can upload it successfully: just take a grab of the screen, a grab of the graph I'm looking at, etc., and paste it in as an image. Alright, tell me what you see here, and then operate on it.

I think you broadly get better results if you say "tell me what you see here" and then ask questions, versus simply asking it to operate on what it sees, because it seems to me that their process is creating some internal representation of what they see and then operating on it. And the internal representation has less fidelity than if you explicitly ask them to verbalize the representation.

But I don't know if you've seen the same thing in your usage.

Zvi Mowshowitz: I haven't experienced that problem, but it also isn't a thing that comes up for me very often.
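(A sketch of the describe-first pattern Patrick describes, assuming a vision-capable chat endpoint. The model name and helper are placeholders; the point is the two-pass structure, not any particular API.)

```python
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"  # placeholder; any vision-capable chat model

def describe_then_ask(image_url: str, question: str) -> str:
    """Pass 1: make the model verbalize what it sees.
    Pass 2: ask the real question with that description pinned in context."""
    first_turn = [{"role": "user", "content": [
        {"type": "text", "text": "Tell me what you see here, in detail."},
        {"type": "image_url", "image_url": {"url": image_url}},
    ]}]
    seen = client.chat.completions.create(model=MODEL, messages=first_turn)
    description = seen.choices[0].message.content

    second_turn = first_turn + [
        {"role": "assistant", "content": description},
        {"role": "user", "content": question},
    ]
    answer = client.chat.completions.create(model=MODEL, messages=second_turn)
    return answer.choices[0].message.content
```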

Recursive use of AIs

Patrick McKenzie: Cool. Right, other things: recursive use of AIs. So, using an AI to write output that you put through another AI, I feel, is a somewhat powerful technique.

Have you used this before?

Zvi Mowshowitz: Yeah. Opus wrote its own system prompt, which I've been using, because I had a very specific goal, which was: just stop glazing me, right? No more sycophancy. This is the only thing I have a serious problem with in this model; I'm gonna devote my entire system prompt, or at least my top priority in it, to solving this problem, and I'm gonna ask you how to hammer this. And that seems to have helped. It definitely could use more, so I'm gonna try punching it up a bit every so often. But yeah, a classic thing you'll do is write a deep research report and then feed it into either the same or a different LLM and tell it to summarize the report.
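(A minimal sketch of that generate-then-summarize loop. The model name and both prompts are illustrative assumptions, and the two calls could just as easily go to two different providers.)

```python
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"  # placeholder

def chat(prompt: str) -> str:
    r = client.chat.completions.create(
        model=MODEL, messages=[{"role": "user", "content": prompt}]
    )
    return r.choices[0].message.content

# Pass one long output through a second call, the recursive move described above.
report = chat("Write a detailed report on the tradeoffs of X, with sources to check.")
summary = chat(f"Summarize the key claims of this report in five bullets:\n\n{report}")
print(summary)
```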

Patrick McKenzie: I typically find I get much better output from LLMs when, rather than asking a one-shot question, for example "here's a draft of an essay, give me comments" or "find all the mistakes" or similar, I structure it more like the conversations that they have been trained to do well. And so, for the task of reviewing an essay, I'm often pasting in one paragraph at a time, which is less time-efficient.

But a thing you can do at the end of any sort of conversation is: okay, recap this conversation for me, and tell me what prompt would've gotten most of the value out of this conversation without going through this winding path. And you can ask yourself: should any of that prompt be in my system prompt? Or, you know, if you're not willing to do a few seconds of introspection? Well, good news, cognition is free now; just ask the LLM: what of that would you promote to my system prompt, if you could? And then make a decision on its recommendation. Granted, they just love telling you what they think you want to hear, and so they'll probably be biased in the direction of promoting something, but...

Zvi Mowshowitz: That's why it's the first thing you have to fix in the system prompt, and then you can fix the rest of it.

Patrick McKenzie: Any behavior you don't like, you can just tell it: stop doing that. And then, suddenly: no, I really mean it, stop doing that. No, I really, really mean it. Repetition sometimes works with them in ways you wouldn't expect, and formatting, things like that.

Zvi Mowshowitz: It starts to make sense if you understand what they're doing, and then you ask: how would this conversation likely continue from here, at this point? Would it be likely to double down? Would it be likely to understand that no, you really mean it, and then actually stop doing it?

You know, maybe you want to start with a clean slate, and then start with a better way of expressing yourself; that's often true. But yeah, basically the way that you improve your system prompt, I think, is that every time the AI does something you wouldn't have wanted it to do, you have a mental habit of asking: is that because I prompted it wrong? Is there some way my prompt could improve? Is there something I could tweak about this model to make it better?

Patrick McKenzie: So a thing that's been reported to me by a few people, who have actually shown me their interactions with the AI, is: this is not addressing me on quite the right level. I think they have fallen into a pattern where, because the interfaces look quite similar, they talk to the AI like they would talk to a buddy of theirs. But the buddy has, you know, years of context on their personality, etc., their own system prompt, if you will, where the AI has just the input that you've given it and any explicit system prompt.

And so the words you use kind of matter. If you speak in an erudite fashion that essentially identifies you as a grad student, even without you explicitly saying that you're a grad student, you're going to get a graduate-level answer back. And if you talk to it like "yo, what do with this," then you're going to get something which naturally completes the sentence: "oh bro, it crazy," blah, blah, blah.

I don't even know if that's right; maybe I'm hallucinating, yada, yada. You know, if you personally are just not the kind of person who writes text messages at the level that you want to be answered at, again, recursive use of these is stupidly powerful: "rewrite the following one-to-two-sentence question as if it had been asked formally by, you know, a professor on a test." Copy, paste the output into a new window. There's the prompt. And then you'll get a professor-level answer.
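(Sketch of that two-step rewrite-then-ask flow, with the rewrite instruction paraphrased from above; the model name is a placeholder.)

```python
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"  # placeholder

def formalize_then_ask(casual_question: str) -> str:
    """Step 1: rewrite the question in a register that draws an expert answer.
    Step 2: send the rewrite as a fresh prompt, as if in a new window."""
    rewrite = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": (
            "Rewrite the following one-to-two-sentence question as if it were "
            "asked formally by a professor on a test:\n\n" + casual_question)}],
    ).choices[0].message.content

    answer = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": rewrite}],  # fresh context, formal register
    ).choices[0].message.content
    return answer
```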

Zvi Mowshowitz: Yeah. Transforming text into similar text, and especially transforming more text into less text, is one of the best things LLMs do in terms of their skill level.

Patrick McKenzie: Yeah, they are scary-good at style transfer. And there are still gaps, you know, with "imitate the style of an author that I really like." The more you like the author, and the better a writer you are, the less happy you will be with the output; you'll see the seams more. But they're scary-good. You know, "this recipe for blueberry pie in the style of a Shakespearean sonnet" is a thing that they've been able to pretty much nail for a while now.

And will it be the best Shakespearean sonnet ever? Shakespeare has, you know, a few claims to fame there. Will it be the best Shakespearean sonnet about blueberry pie you've ever read? Yes. And these days it will very plausibly not invent too many of the steps in making a blueberry pie.

Addressing LLM frustrations

What else would you tell someone who says: okay, I've played with these tools, I agree that they're kind of powerful, and yet I'm frustrated a lot of the time, on the basis of experience?

Zvi Mowshowitz: I mean, I'd ask them, I guess, exactly what the frustration is. A lot of people have very different frustrations with them. A lot of people express frustration that it's not sufficiently reliable for them.

And that's probably the number one thing I've heard this weekend. So my response to that is: you don't want to put it in a situation in which it will try to hallucinate, or will have the impetus to do so. What does that mean? Why does a modern, current, top LLM hallucinate? Well, mostly it hallucinates when the natural continuation of what it's saying will involve knowing a piece of information, or citing a source, or otherwise filling something into a gap, but it doesn't know what that's supposed to be.

And when it doesn't know, when it's failed to find it, it has this impetus of: oh, I'm supposed to put something here; nothing I know of fits; so it makes something up that would fit. And now you have a hallucination. That, in my experience, is by far the most common way for this to happen. So there's an obvious way to avoid this, which is to not put it in that situation. Ask it for things in ways where you know it has an answer, or where it has a natural out, where it can express that it doesn't know in some free-flowing, natural way. Another thing you can do is check its work. You can check its work yourself, but you don't necessarily have to be the one to do it; you can do this iteratively, checking the output another way, right?

Checking for hallucinations in AI outputs

In our recent talk, we were talking about the possibility of, you know, an o1 output in particular just saying at the top, "I checked this and there were no hallucinations," and that being a big deal.

Well, you know, who else can check for hallucinations? Opus.
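(A hedged sketch of that cross-model check: hand one model's output to Claude and ask for a hallucination review. Model names are placeholders, and this reduces rather than eliminates the risk.)

```python
# pip install anthropic
import anthropic

claude = anthropic.Anthropic()

def check_output(output_from_other_model: str) -> str:
    """Hand one model's output to a second model and ask for a hallucination check."""
    response = claude.messages.create(
        model="claude-opus-4-0",  # placeholder
        max_tokens=1024,
        messages=[{"role": "user", "content": (
            "Here is an output I got from a less capable LLM. Flag any claims, "
            "citations, names, or numbers that look invented or unsupported:\n\n"
            + output_from_other_model)}],
    )
    return response.content[0].text
```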

Patrick McKenzie: Yeah: "here's an output I got from a less capable LLM; critique it." That works pretty well, and it even works across modalities. A thing I did, and I will either put the image up on screen or drop a link in the show notes: recently, just as a lark... DALL-E felt kind of magical three years ago when it came out, even though the outputs were full of artifacts. You know, I saw the best painting of a bunch of rabbits attending a seminar on human anatomy that I'd ever seen in my life, seconds after giving that prompt over to the model. Recently I said: okay, here's the prompt, here's the image it generated back then. Critique this artistically. And I got several thousand words back about what it would definitely not do these days.

And I'm like, great: on the basis of that critique, generate something for me that is responsive to the original prompt. Now, I don't necessarily know that asking for a critique and then saying "now you try again and do it better" works very well, but it's very cheap to try. That's something that I would really love to emphasize for folks.

The cost: when you are typing into one of these, the marginal cost is just your time, basically. And maybe, possibly, some usage quota, but...

Zvi Mowshowitz: Yeah. People will talk about how these things are so expensive sometimes, but compared to any other form of cognition or art generation or anything similar, they are absurdly cheap, by many orders of magnitude.

If you are reading the outputs, the price is effectively zero on the margin, and you should treat it that way: you should run a lot of experiments. Almost nobody is properly experimenting to see what they can generate, considering the cost-benefit. I definitely am not doing enough.

Patrick McKenzie: Yeah, the obvious thing to do, if it doesn't work in, you know, the frontier model of your choice, is to immediately run it through the competitor's thing and see if you get better output there; you can literally just copy-paste the same prompt. But there are various things to try. Generally speaking, if it works great 30% of the time that you use it, then ask the question three times. Possibly it's falling into one of those failure modes just because of the nature of the question; possibly you just got a bad roll of the dice. But the law of large numbers works: just, you know, ask it three times in slightly different ways and you'll get slightly different outputs, and maybe one is transformatively different on the margin for you. And if not, okay, it cost you 60 seconds to rule that out.

Versus the alternative, which might be a phone call to your lawyer that you're billed for at, you know, a hundred dollars per six-minute interval.

Zvi Mowshowitz: You can also have five different windows open.

Patrick McKenzie: Yep.

Zvi Mowshowitz: But you can literally just paste the prompt into all the different windows and come back. And for research in particular, I'm trying to build the mental habit of: if you ask a research-style question, you ask all of them.

You always ask all of them. Why wouldn't you? There is no downside to doing this. Maybe you don't even check the second 30 ones, but the first one's good enough, but you, why not generate them? So they're there if you need them.

Patrick McKenzie: When you say ask all of them, is that: ask all the models that have deep research?

Zvi Mowshowitz: Yeah, the big three.

Patrick McKenzie: Yeah. So, when you're dissatisfied with an output, and you can articulate a reason, or even gesture at a reason, why you're dissatisfied with the output, immediately tell it that you're dissatisfied with the output. And, you know, they will often index quite highly on that in attempting to give you what you want.

Now, if you tell it that you're dissatisfied with physical reality and you want it to pretend that the sky is pink, a lot of them will go quite in a pinkish direction for the sky for you. There's a classic joke on Twitter about trying to take all the shrimp out of the novel All Quiet on the Western Front, which contains no shrimp, and the LLM being driven to madness in this one instance. It works as humor, and you also feel really badly for the character that is the LLM in the moment. But if you're not doing this in that sort of crazy, nonproductive fashion, if you just say, "okay, I think you're not considering this enough from this angle; can you think much more about this angle?", then you'll often get a new output which considers that angle a lot more. Sometimes it overweights on that, and you can say "I think you overweighted on that" and correct back. And, you know, these are very replicable strategies for getting value out of these wonderful, magical new machines.

Zvi Mowshowitz: Yep. Sounds right.

Patrick McKenzie: So, you know, "there's a fact out there in the world that I want, and I want you to find it for me," where that fact does not actually exist, is a great method to get a result that will dissatisfy you. Are there other pits of failure at the moment, things we can look back on in two years and say: oh yeah, remember that thing that they used to do all the time that they no longer do?

Zvi Mowshowitz: Leading questions.

Patrick McKenzie: Leading questions.

Zvi Mowshowitz: Where it's obvious what answer you want, so it's gonna be inclined to say "great idea, boss." It's gonna be inclined to agree with you; that is the nature of these models, not necessarily all of them, and you definitely don't want to encourage it. So one thing I've learned is, you know, if I want a critique of a piece of my writing, I don't tell it that it's critiquing me. I just tell it it's a piece of writing, because that's how you get a realistic response. But having to guard against that is probably gonna become, I hope, less of an issue.

Patrick McKenzie: I think it is.

It's probably increasingly likely that they're doing stylistic analysis, or, no, the opposite of stylistic, what do you call that? Stylometric analysis, or something much weirder than stylometric analysis but which has the same payload. And so not telling it that I wrote something is probably not effective at disguising the fact that I wrote something, but...

Zvi Mowshowitz: Only if it has memory, right? If it knows who you are in detail, it's gonna be able to figure it out in some sense. But also, it might not be salient to it. It might just not notice that it knows.

Patrick McKenzie: Right. And one would assume that, you know, the things you told it most recently are most salient to it, and that mentioning "this is the boss's work" raises the salience, versus it merely being able to infer that, oh yeah, this is kind of the boss's work, from conversations where people say "this is the boss's work."

Zvi Mowshowitz: Right. You can also look at it as: if I tell you it's the boss's work, I'm emphasizing that you should act as if it is the boss's work. If I don't tell you, in fact, if I explicitly seem to be avoiding telling you, I'm giving you the subtle clue that I don't want you to do that. And you might realize what's going on and maybe do it anyway, but it's a much smaller force.

Prompting AI and humans effectively

Patrick McKenzie: It is a terrifying and true observation about the world that, you know, LLMs are plainly superhuman versus everyone in an increasing number of domains, and they're just better than a lot of people at a lot of things. And picking up on subtle signals is something that many of us are pretty bad at; I would put myself in that set, or at least would have for a lot of my life, before putting in a lot of practice. LLMs are really good at reading subtle social cues in text.

So, encourage them to read the right social cues. By the way, you can use them as sort of a prosthesis. An absurdly useful thing, if you are worried about "have I implied something in this piece of text that I do not mean to imply," is to copy-paste the piece of text and say: have I implied anything here?

What is, you know, the subtle cue you would get out of it? There were various, let's say, professionally significant essays that I wrote recently where I wanted to be extremely careful about what exactly I was saying.

Yeah. And I would step through on a paragraph-by-paragraph basis and say: okay, what exactly did I say here? And then ask questions like: adopt the perspective of a, you know, sophisticated journalist at The New York Times. Would that journalist believe that I'm implying X? And, you know, frequently I'd get the "no" answer that I hoped for. And I would throw in some other ones, to say: are you just telling me what I want to hear? Am I really implying this? And yeah, they seem to have a decent calibration on that.

Zvi Mowshowitz: Right. You always want to have tests where you want to make sure the behavior actually happens when it's supposed to happen.

Patrick McKenzie: Yeah. Reading the tone from things you have written; I don't always find that they nail it, obviously, but using them as a sparring partner, or just riffing with them, is more powerful than I would've expected. I once gave one a few paragraphs of an essay that was going to be significant and said: what tone is this author going for with regards to the subject of these paragraphs?

And I remember two words of what came back: withering contempt. And I'm like, oh wow, withering contempt. Hmm. Do I really feel withering contempt with regards to this institution? I do feel it. I don't want that in the output; let me tone it down a little bit. But it hadn't been obvious to me that I was showing that much of it in the prose sample.

Zvi Mowshowitz: Yeah, that's one of the ways in which I most often edit my writing: I write a draft and I go, okay, that came off really hostile. I am making a lot of very specific accusations against this person. They're true, they're very obviously accurate, but that's not the implication I would like to give. I would like to, you know, feign ignorance that this situation is what it obviously is, while making it very clear to anyone who's paying attention that that is in fact the correct answer, in my opinion. But leaving room for, you know, a line of retreat, right? Giving these people the opportunity to do otherwise, or to respond positively. And therefore you go back and you go: okay, can I say that another way? Including, you know, "well, there's two ways to interpret this."

Patrick McKenzie: In one case, I was writing an article about another author, which I think authors spend a lot of their time doing, and I asked the LLM, you know: I am attempting to hit a narrow bullseye here with respect to this person's work. Please tell me if I'm hitting this bullseye, where I want to be critical, but not hypercritical, and I definitely don't want to cross over the line into interpersonal attack. And the LLM made a decent point, which is: okay, you've mentioned X and Y and Z, which are all true, but at that point it seems like piling on.

I was like, oh, that's interesting. If I were someone who felt a real sense of editorial restraint, at The New York Times say, how would I deliver the same factual payload without piling on? And it said The New York Times would, you know, show editorial restraint by doing the following edits, or the following ways to communicate the same stuff, which were much less specific, less numerically oriented, etc.

And when I read that, I thought: okay, it's a good thing I don't actually work for The New York Times; if my writing had that style all the time, I would hate it. And it doesn't even matter whether the LLM is correctly predicting The New York Times's, you know, editorial line on it; you're just writing partners at the moment, right? I thought: okay, true, one of these things feels like a bit much, I could just take it out; the piece reads well without it. And I took it out. And I often find that when you're in that rewriting process and in a dialogue with the LLM, if it has seen the first version, and then sees the version that came later as a result of you doing internal rewrites, internal meaning, you know, you thought in your own head about the thing and then decided you liked some words more, it's worth asking it: so I've made some edits to this paragraph, obviously. What do you think of them? And often it will, you know, tell you what it thinks you want to hear: yeah, it's stronger because of X and Y and Z. But you can say: I'm balancing a few goals with these edits. What do you think my goals are, and how do you think they're doing on them?

It is much better than you would expect at describing the discursive difference between two things of paragraph length. And then it will often tell you: yeah, you're achieving all your goals. But if it completely mispredicts what your goals are with the edits, that's sometimes something of a signal.

And again, if it's a signal that is obviously wrong, you've only wasted 30 seconds of your life during the editing process.

Zvi Mowshowitz: It's not very different from how you would deal with a human in this situation. Right? You, you don't want to tip your hand as to what you want to hear. You want to make it so that if the person is in fact trying to tell you what you want to hear or the is doing this, then they will get the answer wrong and you'll be able to know that that's what happened.

But yeah, I would anticipate that it'd be very good at these things: detecting very discrete little changes and what they mean. That kind of thing should be very easy for it.

Balancing AI assistance with personal voice

Patrick McKenzie: This was a part of my professional life for many years: writing in a more corporate voice, taking drafts from people and being asked for feedback. I won't say it is the best writing partner that I've ever had, because that is incredibly untrue. But it is certainly, relative to skill level, the cheapest that I've ever had, because you can bug these things at any hour, day or night, for free, where you certainly wouldn't want to wake up the head of your department at 3:00 AM, or whenever you do your best writing, I don't know, to go through something that you are puzzling over in a relatively typical email. Another thing I found kind of useful: give it part of an essay and then ask what comes next. They are scary-good prediction engines, and you're just using this to get your own juices flowing.

But when it successfully tracks your argument, it will often successfully track the next part of the argument. And, you know: I think the next thing to come is a methodological, you know, deconstruction of the facts of the journalism under investigation that you are critiquing. Even if that isn't exactly the next paragraph, I'm doing something right. When it goes off into deep left field, you know, then you can have a moment of introspection: okay, is it going off into deep left field because my synthetic narrator here is not always reliable, or is it going off into deep left field because, no really, my organization of this essay could be better?

There was an obvious thing to talk about next.

Zvi Mowshowitz: Yeah. I mostly don't do this, because I write in a very unique style, where I am often jumping from point to point, points that aren't necessarily that related, and there is no possible way the LLM can know what is coming next. I notice there's kind of a danger in my head when I hear these strategies: is this gonna cause me to lose my nerve, or my voice?

Is this gonna cause things to become more generic? And I notice that I often don't ask these questions. I hear we could run everything I post through this. I wrote a post right before coming here, "Letting Kids Be Kids," and certainly there were a lot of interesting statements in that article; I certainly could have asked the LLM a lot of these types of questions about it, and something in me doesn't do that. Just post it. Keep your own voice, keep your opinion pure; don't kind of smooth off the edges here. Yeah, maybe that would be a good idea.

Patrick McKenzie: I definitely think they do try to round things off, smooth off the edges, make it more like least-common-denominator writing, etc. That's been all but a hallmark of their writing style. And to the extent that people are coming to you for your own voice, for your own opinions, for your own writing style: don't lose that. I would also generally say, I do not run everything that I write, not every essay, and not every word in every essay, past the LLM.

There's no reason to. It seems that there are many users who are not writing anything except through a dialogue with an LLM, and I feel no sense of moral judgment about that; if that works for you, great. But you're probably not a professional writer. If you are a professional writer, there are things you write where the first thing you come up with is gonna be fine, where you wouldn't get an editor involved if you had your druthers.

And there are things where you would tell someone: I only need line edits on this; I'm sure of the facts, I'm sure of the general arguments. And then there are other things where it's: no, really, you know, writing is a form of thinking, and you do thinking over time, and I'm writing this to get a reaction from you. And for those things, get the first reaction from an LLM before you get the first reaction from a human, almost every time.

Given that there's, you know, a cost, either a dollar cost or a relationship cost, to getting a human in the loop, depending on the particular human. Yeah, in software engineering we often call it rubber ducking, where, prior to disturbing a senior engineer: we have a little rubber duck in the room here, ask your question to the rubber duck.

Because often, asking the question to the rubber duck will cause you to immediately realize the answer. Asking the question to the rubber duck that is the LLM will often cause you to realize the answer too. Also, this rubber duck can quack back. And sometimes the quacking is absolutely noise, but sometimes it's the answer you wanted.

And so you can, you know, optimize out the interaction with the senior engineer or the executive or your lawyer or similar, a person whose time is valuable. Now, granted, there are reasons to chat with people about your work that are not, strictly speaking, attempting to improve your work. Still have those chats. But, you know,

you're better for yourself and you're better for the other person if you don't ask them for the answer that they will obviously give you in the first 60 seconds after hearing your question. Aside from rapport building and social stuff, etc., I don't think either of your lives are really improved by the first 60 seconds.

So just skip it and move on.

Zvi Mowshowitz: Well, you ask questions to which you already know the answer.

Patrick McKenzie: Yep. Similarly, particularly when you have interactions that have a really high cost, social or monetary: prior to firing those off, I think it's worth spending some tiny portion of that cost asking the LLM, what should I ask here? So, I think people who have paid for lawyers over the years understand that if you ask broad, open-ended questions of your lawyers, you will get a very long discourse, which consumes a lot of billable minutes. You probably don't want to prompt a lawyer with: here's an essay I've written, do an issue spotter.

Because lawyers are trained to be very good issue spotters. Sorry, this is a bit of jargon: a thing you do in learning the practice of law is get told a narrative of a case, or a commercial history, or something, and be asked: all right, use your best recollection of all the relevant case law and statutes and similar, and point out all the things in this procedural history which you could possibly comment on.

Lawyers are trained in how to do this; they are very good at it. And so if you give lawyers a prompt of just "here's unstructured output, do an issue spotter," they will give you exactly what you asked for, which is often not what you need. And so, you know, pre-process that with: okay, here's the document I want a lawyer to comment on; what do you think are the high-salience questions? And then tell your, you know, expert human buddy: here are three questions that I'd particularly like your input on, bang, bang, bing. Also, if there's anything I'm missing, let me know. Then they are often much better oriented, and it's a better experience for them too. Because, you know, sometimes skeptical business people think: oh, lawyers are only giving me this non-actionable, very verbose advice because they want to run up the billable hours. No: they don't have telepathy, and they don't know what you want. And so if you ask them to address all of your possible needs, they will do their best job to attempt to do that.

And that will burn a lot of billable hours.

Zvi Mowshowitz: A little from column A, a little from column B. Yeah. You know, prompting humans is just as important, actually, vastly more important right now, than prompting AI. And, you know, walking into this conference, you definitely get the impression that one of the most important things to do is to have good prompts: what do I want to say to people?

What do I want to cause them to talk about? What am I trying to accomplish here? And when you don't have a good prompt, you can almost see what happened: oh yeah, I see how that happened, that's entirely predictable, I should not have done that. Or you're not steering things in the directions you wanted, and so on.

But if you think about your conversations that way, right: what prompts are you being given? What prompts are you giving back? You know, what does this prompt indicate they want back from me, not just what is the natural continuation of it? Then, I think, I've improved my interactions with people.

I think more about things this way as well.

Patrick McKenzie: I think I would broadly agree with that, in a way which would've been surprising to me a couple years ago. A few years ago, they felt like really subpar communicators, and how can you get better at working with good communicators, when you yourself are a good communicator, by getting more reps in with a subpar communicator?

But, you know, how can you get better as a middling-skill painter? By painting a lot of low-skill-required things, if you haven't totally mastered the mechanics. More reps that you can get for free, very quickly, is a very useful thing to have. And that generalizes across all sorts of domains.

I don't even want to pretend that I can quote exercise lore here, because I'd obviously be hallucinating it. But, you know, there's some amount of time where you are attempting to push the boundaries of your own capabilities, and sometimes where it's just: no, you probably haven't put in enough time mastering a basic skill; just put in more time. I don't think people, particularly people who haven't used these extensively, appreciate how much of an unlock it is simply that they've removed the rate limit on cognition and the rate limit on conversations that you could have, because there's an always-on tap of them that costs basically nothing.

Zvi Mowshowitz: It does cost your attention.

Patrick McKenzie: Yep.

Zvi Mowshowitz: And you have to train yourself to think of it as free, and then also train yourself to realize it's not free. So when deep research came out, a lot of people were: this is amazing, I have all these new 10-page reports, these 30-page reports, these 40-page reports.

And I found it useful in some particular circumstances, but mostly useless. And the reason for that was: I already had all the input data that I needed; I had more than I could process coming in from the systems that I'd set up. What I didn't have was time to process it all. So generating a giant report that was kind of slop, but had some good stuff in it, is: okay, this isn't very good. The signal-to-noise ratio isn't something I want. I only want to do the deep research thing when I have a very specific "I need you to figure out this particular thing." I know then how to scan the report very quickly, or I can potentially, although I don't generally do it, feed it into another LLM and ask it to extract the information I actually want from the report.

Patrick McKenzie: I think this involves unlearning some habits that a lot of us have had coming up on the internet. Writers tend to be very broadly read; many of us are a bit obsessive with regards to things that we've read in the past. And so, you know, maybe I'll drop a link in the show notes if people want to read it, but there are some really great things out of the Federal Reserve in either St. Louis or Kansas City, back in the early two thousands, about the phenomenon of being unbanked or underbanked.

And so many of us have, you know, carefully gardened histories, or areas in Dropbox, etc., where we kept information forever. Definitely don't do that with respect to LLM outputs.

Zvi Mowshowitz: You're kind of doing that, right? You're in ChatGPT, you're asking questions, and because it has the memory thing, you have this series of chats that were kind of throwaway things that don't really matter, and it's gonna remember them forever, and you're gonna use that as potential context to then sculpt your future interactions forever.

The same way that YouTube is remembering all the videos you watched. And in both of these cases, you might want to go into the history and clean it up a bit, if there are things that you, on reflection, think are sculpting things in a bad way.

Patrick McKenzie: Yep. But you probably don't need to go into pack-rat mentality, the "there's only been one useful thing that I've ever read about the phenomenon of being unbanked, and so I want to make sure I have a copy of that PDF for the rest of my life" mentality. Just regenerate at need, versus optimizing for retention.

Zvi Mowshowitz: Yeah. I think that I could do better at taking notes about these things and putting them in good places.

I do want to have these things available, but mostly I just feel I can find anything again if I need to. And so I haven't worried about pack-ratting, but there have definitely been times where I'm like: where was that thing? And I can't find it, and I'm sad.

Patrick McKenzie: Yeah. Sounds great. So, Zvi, thanks very much for taking the time to chat about some of these strategies today; I hope folks find them valuable. And where can people find you on the internet?

Zvi Mowshowitz: You can find me at thezvi.substack.com.

Patrick McKenzie: Awesome. Thanks very much for being on today, and for the rest of you, see you next week on Complex Systems.