Reclaiming Autonomy: Designing AI-Enhanced Work Tools That Empower Users

Based on an invited talk delivered at Enterprise UX on November 21, 2025, in Amersfoort, the Netherlands.

In a previous life, I was a practicing designer. These days I’m a postdoc at TU Delft, researching something called Contestable AI. Today I want to explore how we can design AI work tools that preserve worker autonomy—focusing specifically on large language models and knowledge work.

The Meeting We’ve All Been In

Who’s been in this meeting? Your CEO saw a demo, and now your PM is asking you to build some kind of AI feature into your product.

This is very much like Office Space: decisions about tools and automation are being made top-down, without consulting the people who actually do the work.

What I want to explore are the questions you should be asking before you go off and build that thing. Because we shouldn’t just be asking “can we build it?” but also “should we build it?” And if so, how do we build it in a way that empowers workers rather than diminishes them?

Part 1: Reality Check

What We’re Actually Building

Large language models can be thought of as databases containing programs for transforming text (Chollet, 2022). When we prompt, we’re querying that database.

The simpler precursors of LLMs, word embedding models, let you take the word “king,” ask for its female counterpart, and get “queen.” Now, language models work similarly but can do much more complex transformations—give one a poem, ask for it in the style of Shakespeare, and it outputs a transformed poem.
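To make the word-vector example concrete, here is a minimal sketch using pretrained GloVe vectors loaded through the gensim library. The specific model name is an assumption about what gensim’s downloader offers; any pretrained word-embedding model would do.

```python
# "king" minus "man" plus "woman" lands near "queen" in embedding space.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-50")  # downloads the vectors on first run

result = vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)  # typically something like [('queen', 0.85)]
```

Prompting an LLM is the same move at a much higher level of abstraction: instead of arithmetic on single word vectors, we ask for a transformation of an entire text.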

The key point: they are sophisticated text transformation machines. They are not magic. Understanding this helps us design better.

Three Assumptions to Challenge

Before adding AI, we should challenge three things:

  1. Functionality: Does it actually work?
  2. Power: Who really benefits?
  3. Practice: What skills or processes are transformed?

1. Functionality: Does It Work?

One problem with AI projects is that functionality is often assumed instead of demonstrated (Raji et al., 2022). And historically, service sector automation has not led to expected productivity gains (Benanav, 2020).

What this means: don’t just trust the demo. Demand evidence in your actual context. Ask them to show it working in production, not a prototype.

2. Power: Who Benefits?

Current AI developments seem to favor employers over workers. Because of this, some have started taking inspiration from the Luddites (Merchant, 2023).

It’s a common misconception that Luddites hated technology. They hated losing control over their craft. They smashed frames operated by unskilled workers that undercut skilled craftspeople (Sabie et al., 2023).

What we should be asking: who gains power, and who loses it? This isn’t about being anti-technology. It’s about being pro-empowerment.

3. Practice: What Changes?

AI-enabled work tools can have second-order effects on work practices. Automation breaks skill transmission from experts to novices (Beane, 2024). For example, surgical robots that can be remotely operated by expert surgeons mean junior surgeons don’t learn by doing.

Some work that is challenging, complex, and requires human connection should be preserved so that learning can happen.

On the other hand, before we automate a task, we should ask whether a process should exist at all. Otherwise, we may be simply reifying bureaucracy. As Michael Hammer put it: “don’t automate, obliterate” (1990).

Every automation project is an opportunity to liberate skilled professionals from bureaucracy.

Part 2: Control → Autonomy

All three questions are really about control. Control over whether tools serve you. Control over developing expertise. This is fundamentally about autonomy.

What Autonomy Is

A common definition of autonomy is the effective capacity for self-governance (Prunkl, 2022). It consists of two dimensions:

  • Authenticity: holding beliefs that are free from manipulation
  • Agency: having meaningful options to act on those beliefs

Both are necessary for autonomy.

Office Space examples:

  • Authenticity: Joanna’s manager tells her the minimum is 15 pieces of flair, then criticizes her for wearing “only” the minimum. Her understanding of the rules gets manipulated.
  • Agency: Lumbergh tells Peter, “Yeah, if you could come in on Saturday, that would be great.” Technically a request, but the power structure eliminates any real choice.

How AI Threatens Autonomy

AI can threaten autonomy in a variety of ways. Here are a few examples.

Manipulation — Like TikTok’s recommendation algorithm. It exploits cognitive vulnerabilities, creating personalized content loops that maximize engagement time. This makes it difficult for users to make autonomous decisions about their attention and time use.

Restricted choice — LinkedIn’s automated hiring tools can automatically exclude qualified candidates based on biased pattern matching. Candidates are denied opportunities without human review and lack the ability to contest the decision.

Diminished competence — Routinely outsourcing writing, problem-solving, or analysis to ChatGPT without critical engagement can atrophy the very skills that make professionals valuable, much as reliance on GPS erodes our ability to navigate.

These are real risks, not hypothetical. But we can design AI systems to protect against these threats—and we can do more. We can design AI systems to actively promote autonomy.

A Toolkit for Designing AI for Autonomy

Here’s a provisional toolkit with two parts: one focusing on design process, the other on product features (Alfrink, 2025).

Process:

  • Reflexive design
  • Impact assessment
  • Stakeholder negotiation

Product:

  • Override mechanisms
  • Transparency
  • Non-manipulative interfaces
  • Collective autonomy support

I’ll focus on three elements that I think are most novel: reflexive design, stakeholder negotiation, and collective autonomy support.

Part 3: Application

Example: LegalMike

LegalMike is a Dutch legal AI platform that helps lawyers draft contracts, summarize case law, and so forth. It’s a perfect example for applying my framework—it uses an LLM and focuses on knowledge work.

1. Reflexive Design

The question here: what happens to “legal judgment” when AI drafts clauses? Does competence shift from “knowing how to argue” to “knowing how to prompt”?

We should map this before we start shipping.

This is new because standard UX doesn’t often ask how AI tools redefine the work itself.

2. Stakeholder Negotiation

Run workshops with juniors, partners, and clients:

  • Juniors might fear deskilling
  • Partners want quality control
  • Clients may want transparency

By running workshops like this, we make tensions visible and negotiate boundaries between stakeholders.

This is new because we have stakeholders negotiate what autonomy should look like, rather than just accept what exists.

3. Collective Autonomy Support

LegalMike could isolate, or connect. Isolating means everyone with their own AI. But we could deliberately design it to surface connections:

  • Show which partner’s work the AI drew from (see the sketch below)
  • Create prompts that encourage juniors to consult seniors
  • Show how firm expertise flows, not just individual outputs

This counters the “individual productivity” framing that dominates AI products today.
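As a purely hypothetical sketch of that first idea, an AI-drafted clause could carry provenance metadata that points juniors back to the colleagues whose work informed it. LegalMike is a real product, but everything in this example (names, fields, behavior) is invented here for illustration:

```python
# Hypothetical data structure for "collective autonomy support": every AI-drafted
# clause records whose prior work it drew on and who to consult about it.
from dataclasses import dataclass, field


@dataclass
class DraftedClause:
    text: str
    source_authors: list[str] = field(default_factory=list)  # colleagues whose work informed the draft
    source_matters: list[str] = field(default_factory=list)  # prior matters or templates drawn from
    suggested_reviewer: str = ""                              # senior colleague to consult before use


clause = DraftedClause(
    text="Any dispute arising from this agreement shall first be referred to mediation.",
    source_authors=["A. Jansen (partner)"],
    source_matters=["2023-041 shareholder agreement"],
    suggested_reviewer="A. Jansen",
)

print(
    f"Drafted from work by {', '.join(clause.source_authors)}; "
    f"consider discussing it with {clause.suggested_reviewer} before filing."
)
```

The point is not these specific fields but that the interface foregrounds the people behind the output, nudging use of the tool toward consultation rather than isolation.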

Tool → Medium

These interventions would shift LegalMike from a pure efficiency tool to a medium for collaborative legal work that preserves professional judgment, surfaces power dynamics, and strengthens collective expertise—not just individual output.

Think of LLMs not as a robot arm that automates away knowledge work tasks—like in a Korean noodle shop. Instead, think of them as the robot arm that mediates collaboration between humans to produce entirely new ways of working—like in the CRTA visual identity project for the University of Zagreb.

Conclusion

AI isn’t neutral. It’s embedded in power structures. As designers, we’re not just building features—we’re brokers of autonomy.

Every design choice we make either empowers or disempowers workers. We should choose deliberately.

And seriously, watch Office Space if you haven’t seen it. It’s the best “documentary” about workplace autonomy ever made. Mike Judge understood this as early as 1999.

Designing Learning Experiences in a Post-ChatGPT World

Transcript of a talk delivered at LXDCON’25 on June 12, 2025.

My name is Kars. I am a postdoc at TU Delft. I research contestable AI—how to use design to ensure AI systems remain subject to societal control. I teach the responsible design of AI systems. In a previous life, I was a practicing designer of digital products and services. I will talk about designing learning experiences in a post-ChatGPT world.

Let’s start at this date.

This is when OpenAI released an early demo of ChatGPT. The chatbot quickly went viral on social media. Users shared examples of what it could do. Stories and samples included everything from travel planning to writing fables to coding computer programs. Within five days, the chatbot had attracted over one million users.

Fast forward to today, 2 years, 6 months, and 14 days later, we’ve seen a massive impact across domains, including on education.

For example, the article on the left talks about how AI cheating has become pervasive in higher education. It is fundamentally undermining the educational process itself. Students are using ChatGPT for nearly every assignment while educators struggle with ineffective detection methods and question whether traditional academic work has lost all meaning.

The one on the right talks about how students are accusing professors of being hypocritical. Teachers are using AI tools for things like course materials and grading while telling students they cannot use them.

What we’re looking at is a situation where academic integrity was already in question; on top of that, both students and faculty are quickly adopting AI, and institutions aren’t really ready for it.

These transformations in higher education give me pause. What should we change about how we design learning experiences given this new reality?

So, just to clarify, when I mention “AI” in this talk, I’m specifically referring to generative AI, or GenAI, and even more specifically, to chatbots that are powered by large language models, like ChatGPT.

Throughout this talk I will use this example of a learning experience that makes use of GenAI. Sharad Goel, Professor at Harvard Kennedy School, developed an AI Slackbot named “StatGPT” that aims to enhance student learning through interactive engagement.

It was tested in a statistics course with positive feedback from students. They described it as supportive and easily accessible, available anytime for student use. There are plans to implement StatGPT in various other courses. They say it assists in active problem-solving and consider it an example of how AI can facilitate learning, rather than replace it.

The debate around GenAI and learning has become polarized. I see the challenge as trying to find a balance. On one side, there’s complete skepticism about AI, and on the other, there’s this blind acceptance of it. What I propose is that we need an approach I call Conscious Adaptation: moving forward with full awareness of what’s being transformed.

To build the case for this approach, I will be looking at two common positions in the debates around AI and education. I’ll be focusing on four pieces of writing.

Two of them are by Ethan Mollick, from his blog. He’s a professor at the University of Pennsylvania specializing in innovation and entrepreneurship, known for his work on the potential of AI to transform different fields.

The other two pieces are by Ian Bogost, published at The Atlantic. He’s a media studies scholar, author, and game designer who teaches at Washington University. He’s known for his sobering, realist critiques of the impact of technology on society.

These, to me, exemplify two strands of the debate around AI in education.

Ethan Mollick’s position, in essence, is that AI in education is an inevitable transformation that educators must embrace and redesign around, not fight.

You could say Mollick is an optimist. But he is also really clear-eyed about how much disruption is going on. He even refers to it as the “Homework Apocalypse.” He talks about some serious issues: there are failures in detection, students are not learning as well (with exam performance dropping by about 17%), and there are a lot of misunderstandings about AI on both sides—students and faculty.

But his perspective is more about adapting to a tough situation. He’s always focused on solutions, constantly asking, “What can we do about this?” He believes that with thoughtful human efforts, we can really influence the outcomes positively.

On the other hand, Ian Bogost’s view is that AI has created an unsolvable crisis that’s fundamentally breaking traditional education and leaving teachers demoralized.

Bogost, I would describe as a realist. He accepts the inevitability of AI, noting that the “arms race will continue” and that technology will often outpace official policies. He also highlights the negative impact on faculty morale, the dependency of students, and the chaos in institutions.

He’s not suggesting that we should ban AI or go back to a time before it existed. He sees AI as something that might be the final blow to a profession that’s already struggling with deeper issues. At the same time, he emphasizes the need for human agency by calling out the lack of reflection and action from institutions.

So, they both observe the same reality, but they look at it differently. Mollick sees it as an engineering challenge—one that’s complicated but can be tackled with smart design. On the other hand, Bogost views it as a social issue that uncovers deeper problems that can’t just be fixed with technology.

Mollick thinks it’s possible to rebuild after a sort of collapse, while Bogost questions if the institutions that are supposed to do that rebuilding are really fit for the job.

So what would they make of Harvard’s StatGPT? Mollick would likely celebrate it as an example of co-intelligence. Bogost would likely ask what the bot’s rollout comes at the expense of, or what deeper problems its deployment reveals.

Getting past the conflict between these two views isn’t just about figuring out the best technical methods or the right order of solutions. The real challenge lies in our ability as institutions to make real changes, and we need to be careful that focusing on solutions doesn’t distract us from the important discussions we need to have.

I see three strategies that work together to create an approach that addresses the conflict between these two perspectives in a way that I believe will be more effective.

First, institutional realism is about designing interventions assuming institutions will resist change, capture innovations, or abandon initiatives. Given this, we could focus on individual teacher practices, learner-level tools, and changes that don’t require systemic transformation. We could treat every implementation as a diagnostic probe revealing actual (vs. stated) institutional capacity.

Second, loss-conscious innovation means explicitly identifying, before implementing AI-enhanced practices, what human learning processes, relationships, or skills are being replaced. We could develop metrics that track preservation alongside progress. We could build “conservation” components into new approaches to protect irreplaceable educational values.

Third, and finally, we should recognize that Mollick-style solution-building and Bogost-style critical analysis serve different but essential roles. Practitioners need actionable guidance, while the broader field needs diagnostic consciousness. We should avoid a false synthesis and instead maintain both approaches as distinct strands of intellectual work that inform each other.

In short, striking a balance may not be the main focus; it’s more about taking practical actions while considering the overall context. Progress is important, but it’s also worth reflecting on what gets left behind. Conscious adaptation.

So, applying these strategies to Harvard’s chatbot, we could ask: (1) How can we create a feedback loop between an intervention like this and the things it uncovers about institutional limits, so that those can be addressed in the appropriate place? (2) How can we measure what value this bot adds for students and for teachers? What is it replacing, what is it adding, what is it making room for? (3) What critique of learning at Harvard is implied by this intervention?

What does all of this mean, finally, for LXD? This is an LXD conference, so I don’t need to spend a lot of time explaining what it is. But let’s just use this basic definition as a starting point. It’s about experiences, it’s about centering the learner, it’s about achieving learning outcomes, etc.

Comparing my conscious adaptation approach to what typifies LXD, I can see a number of alignments.

Both LXD and Conscious Adaptation prioritize authentic human engagement over efficiency: LXD through human-centered design, Conscious Adaptation through protecting meaningful intellectual effort from AI displacement.

LXD’s focus on holistic learning journeys aligns with both Mollick’s “effort is the point” and Bogost’s concern that AI shortcuts undermine the educational value embedded in struggle and synthesis.

LXD’s experimental, prototype-driven approach mirrors my “diagnostic pragmatism”—both treat interventions as learning opportunities that reveal what actually works rather than pursuing idealized solutions.

So, going back one final time to Harvard’s bot, an LXD practice aligned in this way would lead us to ask: (1) Is this leveraging GenAI to protect and promote genuine intellectual effort? (2) Are teachers and learners meaningfully engaged in the ongoing development of this technology? (3) Is this prototype properly embedded, so that its potential to create learning for the organization can be realized?

So, where does this leave us as learning experience designers? I see three practical imperatives for Conscious Adaptation.

First, we need to protect meaningful human effort while leveraging AI’s strengths. Remember that “the effort is the point” in learning. Rather than asking “can AI do this?”, we should ask “should it?” Harvard’s bot works because it scaffolds thinking rather than replacing it. We should use AI for feedback and iteration while preserving human work for synthesis and struggle.
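What “scaffolding rather than replacing” can look like in practice is largely a matter of how the assistant is instructed. StatGPT’s actual setup isn’t public, so the following is only an assumed, minimal sketch of the general technique: a system prompt that withholds final answers, plus a helper that assembles messages for whatever chat-style LLM backend is used.

```python
# Hypothetical scaffolding prompt: the bot gives feedback and hints, never the answer.
SCAFFOLDING_PROMPT = """\
You are a statistics tutor. Never give the final answer to an assignment problem.
Instead:
1. Ask what the student has tried so far.
2. Name the relevant concept and illustrate it with a different, small example.
3. Give feedback on the student's own attempt and suggest a single next step.
"""


def build_messages(question: str, attempt: str) -> list[dict]:
    """Assemble a role/content message list for a chat-style LLM API."""
    return [
        {"role": "system", "content": SCAFFOLDING_PROMPT},
        {"role": "user", "content": f"Question: {question}\nMy attempt so far: {attempt}"},
    ]


print(build_messages(
    "How do I interpret a p-value of 0.03?",
    "I think it means there is a 3% chance the null hypothesis is true?",
))
```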

Second, we must design for real institutions, not ideal ones. Institutions resist change, capture innovations, and abandon initiatives. We need to design assuming limited budgets, overworked staff, and competing priorities. Every implementation becomes a diagnostic probe that reveals what resistance actually tells us about institutional capacity.

Third, we have to recognize the limits of design. AI exposes deeper structural problems like grade obsession, teacher burnout, and test-driven curricula. You can’t design your way out of systemic issues, and sometimes the best move is recognizing when the problem isn’t experiential at all.

This is Conscious Adaptation—moving forward with eyes wide open.

Thanks.

On how to think about large language models

How should we think about large language models (LLMs)? People commonly think and talk about them in terms of human intelligence. To the extent this metaphor does not accurately reflect the properties of the technology, this may lead to misguided diagnoses and prescriptions. It seems to me an LLM is not like a human or a human brain in so many ways. One crucial distinction for me is that LLMs lack individuality and subjectivity.

What are organisms that similarly lack these qualities? Coral polyps and Portuguese man o’ war come to mind, or slime mold colonies. Or maybe a single bacterium, like an E. coli. Each is essentially identical to its clones, responds automatically to chemical gradients (bringing to mind how LLMs respond to prompts), and doesn’t accumulate unique experiences in any meaningful way.

Considering all these examples, the meme about LLMs being like a shoggoth (an amorphous blob-like monster originating in the speculative fiction of Howard Phillips Lovecraft) is surprisingly accurate. The trouble with these metaphors, though, is that it’s about as hard to reason about such organisms as it is to reason about LLMs, so using them as metaphors for thinking about LLMs won’t work. A shoggoth is even less helpful because the reference will only be familiar to those who know their H.P. Lovecraft.

So perhaps we should abandon metaphorical thinking and think historically instead. LLMs are a new language technology. As with previous technologies, such as the printing press, when they are introduced, our relationship to language changes. How does this change occur?

I think the change is dialectical. First, we have a relationship to language that we recognize as our own. Then, a new technology destabilizes this relationship, alienating us from the language practice. We no longer see our own hand in it. And we experience a lack of control over language practice. Finally, we reappropriate this language use in our practices. In this process of reappropriation, language practice as a whole is transformed. And the cycle begins again.

For an example of this dialectical transformation of language practice under the influence of new technology, we can take Eisenstein’s classic account of the history of the printing press (1980). Following its introduction, many things changed about how we relate to language. Our engagement with language shifted from a primarily oral one to a visual and deliberative one. Libraries became more abundantly stocked, leading to the practice of categorizing and classifying works. Preservation and analysis of stable texts became a possibility. The solitary reading experience gained prominence, producing a more private and personal relationship between readers and texts. Concerns about information overload first reared their head.

All of these things were once new and alien to humans. Now we consider them part of the natural order of things. They weren’t predetermined by the technology; they emerged through an active tug of war between groups in society over what the technology would be used for, mediated by the affordances of the technology itself.

In concrete material terms, what does an LLM consist of? An LLM is just numerical values stored in computer memory. It is a neural network architecture consisting of billions of parameters (weights and biases) organized in matrices. The storage is distributed across multiple devices. System software loads these parameters and enables the calculation of inferences. This all runs in physical data centers housing computing, power, cooling, and networking infrastructure. Whenever people start talking about LLMs having agency or being able to reason, I remind myself of these basic facts.
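To keep that grounding vivid, here is a deliberately toy illustration (random numbers, a five-word vocabulary, nothing like a real LLM) of what “parameters stored in memory plus software that computes inferences” amounts to:

```python
# A toy "model": parameters are just arrays of numbers saved to disk;
# "inference" is loading them and doing matrix arithmetic.
import numpy as np

vocab = ["the", "cat", "sat", "on", "mat"]
rng = np.random.default_rng(0)

# Stand-in for training: produce some weights and store them.
weights = {
    "embed": rng.normal(size=(len(vocab), 8)),  # one vector per vocabulary item
    "out": rng.normal(size=(8, len(vocab))),    # projection back to the vocabulary
}
np.savez("toy_weights.npz", **weights)

# Stand-in for serving: load the parameters and compute next-token probabilities.
params = np.load("toy_weights.npz")
hidden = params["embed"][vocab.index("cat")]    # represent the prompt token
logits = hidden @ params["out"]                 # a single matrix multiplication
probs = np.exp(logits) / np.exp(logits).sum()   # softmax over the vocabulary
print(dict(zip(vocab, probs.round(3))))
```

Real models differ in scale and architecture, not in kind: billions of parameters instead of a few dozen, and many such matrix multiplications chained together.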

A printing press, although a cleverly designed, engineered, and manufactured device, is similarly banal when you break it down into its essential components. Still, the ultimate changes to how we relate to language have been profound. Given these first few years of living with LLMs, it is not unreasonable to expect that they will cause similar upheavals. What is important for me is to recognize how we become alienated from language, and to see ourselves as having agency in reappropriating LLM-mediated language practice as our own.