Artificial Intelligence, Writing, and Editing (Part 5): Reckoning and Judgment

The public discussion that started with ChatGPT continues. It now concerns large language models (LLMs) in general, most recently with the (troubled) launches of Google’s Bard and Microsoft’s Sydney. There’s deep fear that LLMs will completely disrupt writing, editing, and knowledge work in general. The media’s hot-take treadmill insists that AI is coming for our jobs sooner than we thought!

We disagree, and today we’ll revisit an earlier discussion of this matter. We think that LLMs really are game-changing, but the fear around them is overblown. We’ve written about this here, here, here and here, so take a look if the rabbit-hole calls to you. Our arguments are based on the fundamental limitations of currently existing LLMs. If you’re comfortable with a bit of mathiness, this article is an accessible but deep overview of the relevant terrain.

LLMs can’t yet supplant human judgment, because current machine learning methods aren’t made to do that.

Reckoning and Judgment

In a rigorous, opinionated, and accessible overview of the state of AI, philosopher Brian Cantwell Smith distinguishes two very different kinds of human capacities: reckoning and judgment.

Reckoning is a matter of calculative prowess. It involves our capacities to gather, manipulate, and apply information. Reckoning tasks pervade our lives, but computers are far, far better than us at most of them. We have outsourced many reckoning tasks to computers. For example, we typically no longer memorize phone numbers and addresses; our phones’ reckoning powers are perfectly adequate to the task.

Judgment is different. It’s much closer to home. It is the normative ideal to which we hold full-blooded human intelligence. Cantwell Smith defines it as

a form of dispassionate deliberative thought, grounded in ethical commitment and responsible action, appropriate to the situation in which it is deployed.

Cantwell Smith 2019, 17

Our evaluations of ourselves and others primarily depend on our judgment capacities. (More reckoning ability is better, all else equal. But all else is almost never equal.) The capacities that go into judgment are many, but they all involve connecting knowledge, action, and responsibility in a fluid, context-sensitive way. There is no recipe for having or developing good judgment. It’s the ultimate context-sensitive skill, and its development underlies many other capacities that we value, like justice, insight, creativity, and leadership.

It would be very strange to decide to go to war based only on recommendations from even the best LLMs. Even if the LLM pulled together a vast amount of information, the final decision is a matter of committed human deliberation given the stakes.

Might algorithms one day make social and political decisions on their own? Maybe. But we wouldn’t let them do it unless we had already vetted their judgment in the same way that we vet other humans’ judgment. Vetting judgment is hard, and we often fail to do it well. (Think of our political systems and the people who rise to the top in them. They are typically not the best of us.)

Machines Learn Reckoning, but Use Our Judgment

It’s useful to keep reckoning and judgment in mind when thinking about LLMs. Their reckoning ability is genuinely impressive. They can pull together a decent essay on almost any topic on which human discourse exists. They can set us up with plausibly relevant information and a decent template amazingly quickly. Where LLMs tend to fail (or wobble) is in their judgment. This has been noticed in many different ways since LLMs entered the public sphere.

Behind every AI is a pile of human judgment. Photo by Hitesh Choudhary on Unsplash.

For example, LLMs will often err on the side of overwriting, or of offering related but not-quite-on-point information, when asked to write about some topic. Figuring out the right level of conciseness is a judgment call, and one that humans themselves often get wrong.

There’s room for pushback here. Isn’t it impressive how far LLMs have come in the last few years? They’ve gone from writing chronically incoherent text to doing at least a decent job with most long-form text. That’s fair, but this is an advance in reckoning capacity, not in judgment.

We don’t even know how to start automating judgment. Whatever decent judgment LLMs show does not emerge from the automated pretraining on text alone. It’s added to the LLM in a later step called “reinforcement learning from human feedback” (RLHF), in which the model is tuned toward the responses human raters prefer. The companies behind LLMs have, in essence, outsourced the judgment task to low-paid “digital sweatshops” in developing countries.
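To give a flavour of what that human feedback buys, here is a deliberately simplified sketch of the core of RLHF: a reward model trained to score whichever of two candidate responses the human raters preferred more highly. This is our own toy PyTorch illustration of the general idea, not any lab’s actual pipeline; real reward models are themselves large transformers, and the learned reward is then used to fine-tune the LLM.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy reward model: scores a response embedding with one linear layer.
# (In real RLHF systems the reward model is a large transformer.)
class ToyRewardModel(nn.Module):
    def __init__(self, dim=16):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, response_embedding):
        return self.score(response_embedding)

reward_model = ToyRewardModel()

# Human raters compared pairs of candidate responses and picked the better one.
# Here the "embeddings" are random stand-ins for encoded responses.
chosen = torch.randn(4, 16)    # responses the raters preferred
rejected = torch.randn(4, 16)  # responses the raters rejected

# Pairwise preference loss: push the reward of the chosen response
# above the reward of the rejected one.
loss = -F.logsigmoid(reward_model(chosen) - reward_model(rejected)).mean()
loss.backward()  # one gradient step toward imitating the raters' verdicts
```

Notice where the judgment lives in this loop: the decision about which response is actually better comes entirely from the human raters. The machinery only learns to imitate their verdicts.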

This is not widely known, for obvious reasons. But leaving RLHF out of the story makes many people think that machine learning has accomplished more than it has. LLMs have made a genuine breakthrough on reckoning tasks around natural language, but they have not made any progress on judgment. The open secret of AI research is that we don’t know how to make the leap into genuine machine judgment.

It’s important to keep reckoning and judgment separate in our minds. Fortunately, the distinction is fairly intuitive. Failing to be clear here risks giving away what’s valuable about human thought to the LLMs. We humans are not especially good at math, or reckoning, or fast calculation. Our brains run on roughly 20 watts, about as much power as a dim light bulb. But our capacity to deliberate and to refine our approach to the world remains unparalleled. It’s not perfect, but it’s the best judgment capacity there is.

Machine Map, Human Territory

LLMs talk about thunderstorms, taxes, cats, and whatever else you prompt them about. But they don’t really know about thunderstorms, taxes, and cats. What they know are exquisitely intricate patterns of how we use language around the weather, taxes, and cats, and those patterns allow them to carry on coherently, and sometimes even insightfully, about these topics.
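A toy example makes the point concrete. The sketch below is our own illustrative Python, nothing like a real LLM’s architecture; it “learns” only which words tend to follow which in a tiny corpus. Scale that kind of pattern-tracking up by many orders of magnitude and you get something like an LLM’s knowledge: statistics about word use, not cats.

```python
from collections import Counter, defaultdict

# A toy corpus: the model will only ever "know" these usage patterns.
corpus = "the cat sat on the mat the cat chased the mouse".split()

# Count how often each word follows each other word (bigram statistics).
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

# "Predict" the next word after "cat" purely from observed patterns.
counts = following["cat"]
total = sum(counts.values())
for word, count in counts.most_common():
    print(f"P({word!r} | 'cat') = {count / total:.2f}")
```

This little model can tell you that “sat” and “chased” tend to follow “cat”, but it has never met a cat.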

Recognizing that our conceptual maps are not the same thing as the world itself is an exercise of judgment. As the ChatGPT website puts it, one of the key limitations of the model’s training is that “there is currently no source of truth” for the model. (Side note: LLMs can talk about the difference between words and the things words refer to, but only because our language contains those patterns too.) Current LLMs are “worldless”. They can talk about our world, but until they have a world of their own, they cannot appreciate the difference between the map and the territory for themselves.

So What?

We should not take our capacities for judgment lightly. To debase our judgment in deference to the reckoning capacities of LLMs would be an ethical and political failing on a grand scale.

Coming back to something we discussed in our very first post on this topic: the bigger problem is not the capacities of LLMs but the fear born of ignorance around them. So learn how these systems work. Once you know what they can and can’t do, you will be less disoriented. And carry on refining your judgment; in the end, it’s what proves our value.