Postdoc update – December 2025

Six months since my last update. The pace hasn’t slowed. Here’s what I’ve been up to and what’s on the horizon for the next six months or so. But first, a very welcome December break. Happy holidays, dear reader.

Happenings

People’s Compute at Goldsmiths: On September 18, I presented my research agenda, “People’s Compute: Design and the Politics of AI Infrastructures,” at the Politics of AI symposium at Goldsmiths, University of London. Many thanks to Dan McQuillan, Fieke Jansen, Jo Lindsay Walton, and Pat Brody for the invitation and for putting together such a thought-provoking program. Read the transcript.

Digital Autonomy Unconference: On October 3, I attended the Digital Autonomy Unconference in Amsterdam, which was organized in collaboration with Code for NL and focused on enhancing digital autonomy within Dutch public institutions. The Digital Autonomy Competence Center was also launched at this event, for which I serve as a research associate. Read the news item.

Master’s Graduation Projects: Two more of my students have graduated. Ameya Sawant completed a project about designer autonomy and GenAI (August 29, with Fernando Secomandi as chair). David Mieras completed a project about the responsible use of AI in policy preparation (October 28, with Lianne Simonse as chair).

Personal Grant: I mentioned going for a personal grant the last time around. Unfortunately, I did not advance to the final round. However, I did receive some useful feedback and will try again next year. Onwards and upwards.

Designing Responsible AI: Sara Colombo, Francesca Mauri, and I ran the second iteration of our master’s elective course, which builds on responsible research and innovation, value-sensitive design, and design fiction. See the course description here. A more detailed write-up of how the course works is forthcoming.

International Contestable AI Workshop: On November 18, I had the pleasure of hosting a delegation from Denmark and the UK for a full-day workshop about Contestable AI at TU Delft. Read the report.

Enterprise UX: On November 21, I delivered an invited talk titled “Reclaiming Autonomy: Designing AI-Enhanced Work Tools That Empower Users” at the Enterprise UX conference in Amersfoort. Thanks to Peter Boersma for the invitation. The references to Office Space and Luddism were surprisingly well-received. Read the transcript.

Your difficult design doctor, holding forth at Enterprise UX.

NIAS Workshop: I participated in a workshop at NIAS on November 26-27, exploring permacomputing, server collectives, and networks of consent. Incredibly inspiring, it has given me many new ideas for approaching my own ongoing research. Many thanks to John Boy for the invitation. View the event page.

Stop the Cuts (continued): The fight against cuts to higher education continues. On December 9 we once again went on strike and I joined the demonstration in Amsterdam with over 7,000 participants. Our far-right government may have fallen, but the cuts remain on the table. Now is the time to maintain pressure on the parties forming a government. If you work in academia and want to act, join a union (AOb or FNV) and sign up for the WOinActie newsletter.

Advisory Today, Co-Decisive Tomorrow? A paper based on a year-long participant observation of a smart city project in Amsterdam, co-authored with Mike de Kreek, Tessa Steenkamp, and Martijn de Waal (part of the Human Values for Smarter Cities project), has been accepted for the 2026 Participatory Design Conference. Very pleased about that one. A preprint will be up once we submit the final camera-ready version, I think.

ThingsCon TH/NGS: At this year’s ThingsCon conference on December 12, Fieke Jansen, Sunjoo Lee, Lena Trotereau, and I ran a workshop titled “From Mud to Models” exploring regenerative futures for community AI. Thanks to Iskander Smit for bringing us together. A report is forthcoming.

Participants building clocks powered by mud batteries at our ThingsCon workshop.

Open Letter on AI Policy: I was part of the supporting team for an open letter calling on Dutch politicians to develop a national AI policy that promotes social progress. A special thanks goes out to Cristina Zaga for taking the initiative and leading the charge on this one, but also to the core team members Roel Dobbe, Iris van Rooij, Lilian de Jong, Wouter Nieuwenhuizen, Marcela Suarez, Wiep Hamstra, and Olivia Guest, and to the supporting team I was part of: Felienne Hermans, Mark D., Eelco Herder, Emile van Bergen, Siri Beerends, Nolen Gertz, Paul Peters, Gerry McGovern, Kars Alfrink, and Jelle van der Ster. Sign and share the letter here.

On deck

Looking ahead to the new year, I have several writing projects to complete: one chapter on contestability for an edited volume on the philosophy of engineering, and another chapter for an edited volume on community AI.

I will be wrapping up my duties as associate chair for the CHI 2026 design subcommittee. And I will also serve as associate chair for the DIS 2026 artifacts & systems subcommittee.

I will do the analysis and write-up of a field evaluation of the Vision Model Macroscope prototype (also part of the aforementioned Human Values for Smarter Cities project). I am also providing support on several other papers that will hopefully find their way into venues such as FAccT, DIS, and elsewhere.

Mockup of the Vision Model Macroscope prototype.

Finally, I am part of several small grant applications exploring topics that include the potential of computational argumentation techniques to enable more interactive implementations of contestable AI, as well as contestability in digital systems used for evidence management in international criminal justice.

That’s most of it, although not all of it, but this has gotten way too long already. Thanks for reading this far, if you have, and best wishes for 2026.

Reclaiming Autonomy: Designing AI-Enhanced Work Tools That Empower Users

Based on an invited talk delivered at Enterprise UX, on November 21, 2025 in Amersfoort, the Netherlands.

In a previous life, I was a practicing designer. These days I’m a postdoc at TU Delft, researching something called Contestable AI. Today I want to explore how we can design AI work tools that preserve worker autonomy—focusing specifically on large language models and knowledge work.

The Meeting We’ve All Been In

Who’s been in this meeting? Your CEO saw a demo, and now your PM is asking you to build some kind of AI feature into your product.

This is very much like Office Space: decisions about tools and automation are being made top-down, without consulting the people who actually do the work.

What I want to explore are the questions you should be asking before you go off and build that thing. Because we shouldn’t just be asking “can we build it?” but also “should we build it?” And if so, how do we build it in a way that empowers workers rather than diminishes them?

Part 1: Reality Check

What We’re Actually Building

Large language models can be thought of as databases containing programs for transforming text (Chollet, 2022). When we prompt, we’re querying that database.

Simpler precursors of LLMs, such as word embedding models, let you take the word “king,” apply a “make it female” transformation, and get “queen.” Language models work similarly but can do much more complex transformations—give one a poem, ask it to rewrite in the style of Shakespeare, and it outputs a transformed poem.

The key point: they are sophisticated text transformation machines. They are not magic. Understanding this helps us design better.
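The “king → queen” transformation above can be made concrete with toy word vectors. This is a minimal sketch: the three-dimensional vectors below are invented for illustration (real embedding models learn hundreds of dimensions from text), but the arithmetic is the same idea.

```python
import math

# Made-up 3-dimensional word vectors, chosen so the analogy works out.
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "man":   [0.5, 0.9, 0.0],
    "woman": [0.5, 0.1, 0.0],
    "queen": [0.9, 0.0, 0.1],
}

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def cosine(a, b):
    # Cosine similarity: how closely two vectors point in the same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    return dot / (norm(a) * norm(b))

# The classic analogy: king - man + woman ≈ queen.
target = add(sub(vectors["king"], vectors["man"]), vectors["woman"])

# The "output" of the transformation is the nearest word to the result.
closest = max(vectors, key=lambda w: cosine(vectors[w], target))
print(closest)  # → queen
```

The same mechanism, scaled up enormously and wrapped in a sequence model, is what lets an LLM apply a transformation like “in the style of Shakespeare” to an entire poem rather than a single word.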

Three Assumptions to Challenge

Before adding AI, we should challenge three things:

  1. Functionality: Does it actually work?
  2. Power: Who really benefits?
  3. Practice: What skills or processes are transformed?

1. Functionality: Does It Work?

One problem with AI projects is that functionality is often assumed instead of demonstrated (Raji et al., 2022). And historically, service sector automation has not led to expected productivity gains (Benanav, 2020).

What this means: don’t just trust the demo. Demand evidence in your actual context. Ask them to show it working in production, not a prototype.

2. Power: Who Benefits?

Current AI developments seem to favor employers over workers. Because of this, some have started taking inspiration from the Luddites (Merchant, 2023).

It’s a common misconception that Luddites hated technology. They hated losing control over their craft. They smashed frames operated by unskilled workers that undercut skilled craftspeople (Sabie et al., 2023).

What we should be asking: who gains power, and who loses it? This isn’t about being anti-technology. It’s about being pro-empowerment.

3. Practice: What Changes?

AI-enabled work tools can have second-order effects on work practices. Automation breaks skill transmission from experts to novices (Beane, 2024). For example, surgical robots that can be remotely operated by expert surgeons mean junior surgeons don’t learn by doing.

Some work that is challenging, complex, and requires human connection should be preserved so that learning can happen.

On the other hand, before we automate a task, we should ask whether a process should exist at all. Otherwise, we may be simply reifying bureaucracy. As Michael Hammer put it: “don’t automate, obliterate” (1990).

Every automation project is an opportunity to liberate skilled professionals from bureaucracy.

Part 2: Control → Autonomy

All three questions are really about control. Control over whether tools serve you. Control over developing expertise. This is fundamentally about autonomy.

What Autonomy Is

A common definition of autonomy is the effective capacity for self-governance (Prunkl, 2022). It consists of two dimensions:

  • Authenticity: holding beliefs that are free from manipulation
  • Agency: having meaningful options to act on those beliefs

Both are necessary for autonomy.

Office Space examples:

  • Authenticity: Joanna’s manager tells her the minimum is 15 pieces of flair, then criticizes her for wearing “only” the minimum. Her understanding of the rules gets manipulated.
  • Agency: Lumbergh tells Peter, “Yeah, if you could come in on Saturday, that would be great.” Technically a request, but the power structure eliminates any real choice.

How AI Threatens Autonomy

AI can threaten autonomy in a variety of ways. Here are a few examples.

Manipulation — Like TikTok’s recommendation algorithm. It exploits cognitive vulnerabilities, creating personalized content loops that maximize engagement time. This makes it difficult for users to make autonomous decisions about their attention and time use.

Restricted choice — LinkedIn’s automated hiring tools can automatically exclude qualified candidates based on biased pattern matching. Candidates are denied opportunities without human review and lack the ability to contest the decision.

Diminished competence — Routinely outsourcing writing, problem-solving, or analysis to ChatGPT without critical engagement can atrophy the very skills that make professionals valuable, similar to how reliance on GPS erodes navigational abilities.

These are real risks, not hypothetical. But we can design AI systems to protect against these threats—and we can do more. We can design AI systems to actively promote autonomy.

A Toolkit for Designing AI for Autonomy

Here’s a provisional toolkit with two parts: one focusing on design process, the other on product features (Alfrink, 2025).

Process:

  • Reflexive design
  • Impact assessment
  • Stakeholder negotiation

Product:

  • Override mechanisms
  • Transparency
  • Non-manipulative interfaces
  • Collective autonomy support

I’ll focus on three elements that I think are most novel: reflexive design, stakeholder negotiation, and collective autonomy support.

Part 3: Application

Example: LegalMike

LegalMike is a Dutch legal AI platform that helps lawyers draft contracts, summarize case law, and so forth. It’s a perfect example to apply my framework—it uses an LLM and focuses on knowledge work.

1. Reflexive Design

The question here: what happens to “legal judgment” when AI drafts clauses? Does competence shift from “knowing how to argue” to “knowing how to prompt”?

We should map this before we start shipping.

This is new because standard UX doesn’t often ask how AI tools redefine the work itself.

2. Stakeholder Negotiation

Run workshops with juniors, partners, and clients:

  • Juniors might fear deskilling
  • Partners want quality control
  • Clients may want transparency

By running workshops like this, we make tensions visible and negotiate boundaries between stakeholders.

This is new because we have stakeholders negotiate what autonomy should look like, rather than just accept what exists.

3. Collective Autonomy Support

LegalMike could isolate, or connect. Isolating means everyone with their own AI. But we could deliberately design it to surface connections:

  • Show which partner’s work the AI drew from
  • Create prompts that encourage juniors to consult seniors
  • Show how firm expertise flows, not just individual outputs

This counters the “individual productivity” framing that dominates AI products today.

Tool → Medium

These interventions would shift LegalMike from a pure efficiency tool to a medium for collaborative legal work that preserves professional judgment, surfaces power dynamics, and strengthens collective expertise—not just individual output.

Think of an LLM not as a robot arm that automates away knowledge work tasks—like in a Korean noodle shop. Instead, think of it as the robot arm that mediates collaboration between humans to produce entirely new ways of working—like in the CRTA visual identity project for the University of Zagreb.

Conclusion

AI isn’t neutral. It’s embedded in power structures. As designers, we’re not just building features—we’re brokers of autonomy.

Every design choice we make either empowers or disempowers workers. We should choose deliberately.

And seriously, watch Office Space if you haven’t seen it. It’s the best “documentary” about workplace autonomy ever made. Mike Judge understood this as early as 1999.

Designing Learning Experiences in a Post-ChatGPT World

Transcript of a talk delivered at LXDCON’25 on June 12, 2025.

My name is Kars. I am a postdoc at TU Delft. I research contestable AI—how to use design to ensure AI systems remain subject to societal control. I teach the responsible design of AI systems. In a previous life, I was a practicing designer of digital products and services. I will talk about designing learning experiences in a post-ChatGPT world.

Let’s start at this date.

This is when OpenAI released an early demo of ChatGPT. The chatbot quickly went viral on social media. Users shared examples of what it could do. Stories and samples included everything from travel planning to writing fables to coding computer programs. Within five days, the chatbot had attracted over one million users.

Fast forward to today, 2 years, 6 months, and 14 days later, we’ve seen a massive impact across domains, including on education.

For example, the article on the left talks about how AI cheating has become pervasive in higher education. It is fundamentally undermining the educational process itself. Students are using ChatGPT for nearly every assignment while educators struggle with ineffective detection methods and question whether traditional academic work has lost all meaning.

The one on the right talks about how students are accusing professors of being hypocritical. Teachers are using AI tools for things like course materials and grading while telling students they cannot use them.

What we’re looking at is a situation where academic integrity was already in question; on top of that, both students and faculty are quickly adopting AI, and institutions aren’t really ready for it.

These transformations in higher education give me pause. What should we change about how we design learning experiences given this new reality?

So, just to clarify, when I mention “AI” in this talk, I’m specifically referring to generative AI, or GenAI, and even more specifically, to chatbots that are powered by large language models, like ChatGPT.

Throughout this talk I will use this example of a learning experience that makes use of GenAI. Sharad Goel, Professor at Harvard Kennedy School, developed an AI Slackbot named “StatGPT” that aims to enhance student learning through interactive engagement.

It was tested in a statistics course with positive feedback from students. They described it as supportive and easily accessible, available anytime for student use. There are plans to implement StatGPT in various other courses. They say it assists in active problem-solving and consider it an example of how AI can facilitate learning, rather than replace it.

The debate around GenAI and learning has become polarized. I see the challenge as trying to find a balance. On one side, there’s complete skepticism about AI, and on the other, there’s this blind acceptance of it. What I propose is that we need an approach I call Conscious Adaptation: moving forward with full awareness of what’s being transformed.

To build the case for this approach, I will be looking at two common positions in the debates around AI and education. I’ll be focusing on four pieces of writing.

Two of them are by Ethan Mollick, from his blog. He’s a professor at the University of Pennsylvania specializing in innovation and entrepreneurship, known for his work on the potential of AI to transform different fields.

The other two pieces are by Ian Bogost, published at The Atlantic. He’s a media studies scholar, author, and game designer who teaches at Washington University. He’s known for his sobering, realist critiques of the impact of technology on society.

These, to me, exemplify two strands of the debate around AI in education.

Ethan Mollick’s position, in essence, is that AI in education is an inevitable transformation that educators must embrace and redesign around, not fight.

You could say Mollick is an optimist. But he is also really clear-eyed about how much disruption is going on. He even refers to it as the “Homework Apocalypse.” He talks about some serious issues: there are failures in detection, students are not learning as well (with exam performance dropping by about 17%), and there are a lot of misunderstandings about AI on both sides—students and faculty.

But his perspective is more about adapting to a tough situation. He’s always focused on solutions, constantly asking, “What can we do about this?” He believes that with thoughtful human efforts, we can really influence the outcomes positively.

On the other hand, Ian Bogost’s view is that AI has created an unsolvable crisis that’s fundamentally breaking traditional education and leaving teachers demoralized.

Bogost, I would describe as a realist. He accepts the inevitability of AI, noting that the “arms race will continue” and that technology will often outpace official policies. He also highlights the negative impact on faculty morale, the dependency of students, and the chaos in institutions.

He’s not suggesting that we should ban AI or go back to a time before it existed. He sees AI as something that might be the final blow to a profession that’s already struggling with deeper issues. At the same time, he emphasizes the need for human agency by calling out the lack of reflection and action from institutions.

So, they both observe the same reality, but they look at it differently. Mollick sees it as an engineering challenge—one that’s complicated but can be tackled with smart design. On the other hand, Bogost views it as a social issue that uncovers deeper problems that can’t just be fixed with technology.

Mollick thinks it’s possible to rebuild after a sort of collapse, while Bogost questions if the institutions that are supposed to do that rebuilding are really fit for the job.

Take Harvard’s StatGPT: Mollick would likely celebrate it as an example of co-intelligence. Bogost would likely ask what the bot’s rollout comes at the expense of, or what deeper problems its deployment unveils.

Getting past the conflict between these two views isn’t just about figuring out the best technical methods or the right order of solutions. The real challenge lies in our ability as institutions to make real changes, and we need to be careful that focusing on solutions doesn’t distract us from the important discussions we need to have.

I see three strategies that work together to create an approach that addresses the conflict between these two perspectives in a way that I believe will be more effective.

First, institutional realism is about designing interventions assuming institutions will resist change, capture innovations, or abandon initiatives. Given this, we could focus on individual teacher practices, learner-level tools, and changes that don’t require systemic transformation. We could treat every implementation as a diagnostic probe revealing actual (vs. stated) institutional capacity.

Second, loss-conscious innovation means explicitly identifying, before implementing AI-enhanced practices, what human learning processes, relationships, or skills are being replaced. We could develop metrics that track preservation alongside progress. We could build “conservation” components into new approaches to protect irreplaceable educational values.

Third, and finally, we should recognize that Mollick-style solution-building and Bogost-style critical analysis serve different but essential roles. Practitioners need actionable guidance, while the broader field needs diagnostic consciousness. We should avoid a false synthesis and instead maintain both approaches as distinct strands of intellectual work that inform each other.

In short, striking a balance may not be the main focus; it’s more about taking practical actions while considering the overall context. Progress is important, but it’s also worth reflecting on what gets left behind. Conscious adaptation.

So, applying these strategies to Harvard’s chatbot, we could ask: (1) How can we create a feedback loop between an intervention like this and the things it uncovers about institutional limits, so that those can be addressed in the appropriate place? (2) How can we measure what value this bot adds for students and for teachers? What is it replacing, what is it adding, what is it making room for? (3) What critique of learning at Harvard is implied by this intervention?

What does all of this mean, finally, for LXD? This is an LXD conference, so I don’t need to spend a lot of time explaining what it is. But let’s just use this basic definition as a starting point. It’s about experiences, it’s about centering the learner, it’s about achieving learning outcomes, etc.

Comparing my conscious adaptation approach to what typifies LXD, I can see a number of alignments.

Both LXD and Conscious Adaptation prioritize authentic human engagement over efficiency. LXD through human-centered design, conscious adaptation through protecting meaningful intellectual effort from AI displacement.

LXD’s focus on holistic learning journeys aligns with both Mollick’s “effort is the point” and Bogost’s concern that AI shortcuts undermine the educational value embedded in struggle and synthesis.

LXD’s experimental, prototype-driven approach mirrors my “diagnostic pragmatism”—both treat interventions as learning opportunities that reveal what actually works rather than pursuing idealized solutions.

So, going back one final time to Harvard’s bot, an LXD practice aligned in this way would lead us to ask: (1) Is this leveraging GenAI to protect and promote genuine intellectual effort? (2) Are teachers and learners meaningfully engaged in the ongoing development of this technology? (3) Is this prototype properly embedded, so that its potential to create learning for the organization can be realized?

So, where does this leave us as learning experience designers? I see three practical imperatives for Conscious Adaptation.

First, we need to protect meaningful human effort while leveraging AI’s strengths. Remember that “the effort is the point” in learning. Rather than asking “can AI do this?”, we should ask “should it?” Harvard’s bot works because it scaffolds thinking rather than replacing it. We should use AI for feedback and iteration while preserving human work for synthesis and struggle.

Second, we must design for real institutions, not ideal ones. Institutions resist change, capture innovations, and abandon initiatives. We need to design assuming limited budgets, overworked staff, and competing priorities. Every implementation becomes a diagnostic probe that reveals what resistance actually tells us about institutional capacity.

Third, we have to recognize the limits of design. AI exposes deeper structural problems like grade obsession, teacher burnout, and test-driven curricula. You can’t design your way out of systemic issues, and sometimes the best move is recognizing when the problem isn’t experiential at all.

This is Conscious Adaptation—moving forward with eyes wide open.

Thanks.

On how to think about large language models

How should we think about large language models (LLMs)? People commonly think and talk about them in terms of human intelligence. To the extent this metaphor does not accurately reflect the properties of the technology, this may lead to misguided diagnoses and prescriptions. It seems to me an LLM is not like a human or a human brain in so many ways. One crucial distinction for me is that LLMs lack individuality and subjectivity.

What are organisms that similarly lack these qualities? Coral polyps and Portuguese man o’ war come to mind, or slime mold colonies. Or maybe a single bacterium, like an E. coli. Each is essentially identical to its clones, responds automatically to chemical gradients (bringing to mind how LLMs respond to prompts), and doesn’t accumulate unique experiences in any meaningful way.

Considering all these examples, the meme about LLMs being like a shoggoth (an amorphous blob-like monster originating from the speculative fiction of Howard Phillips Lovecraft) is surprisingly accurate. The trouble with these metaphors, though, is that such organisms are about as hard to reason about as LLMs themselves, so using them as metaphors for thinking about LLMs won’t get us far. A shoggoth is even less helpful because the reference will only be familiar to those who know their H.P. Lovecraft.

So perhaps we should abandon metaphorical thinking and think historically instead. LLMs are a new language technology. As with previous technologies, such as the printing press, when they are introduced, our relationship to language changes. How does this change occur?

I think the change is dialectical. First, we have a relationship to language that we recognize as our own. Then, a new technology destabilizes this relationship, alienating us from the language practice. We no longer see our own hand in it. And we experience a lack of control over language practice. Finally, we reappropriate this language use in our practices. In this process of reappropriation, language practice as a whole is transformed. And the cycle begins again.

For an example of this dialectical transformation of language practice under the influence of new technology, we can take Eisenstein’s classic account of the history of the printing press (1980). Following its introduction many things changed about how we relate to language. Our engagement with language shifted from a primarily oral one to a visual and deliberative one. Libraries became more abundantly stocked, leading to the practice of categorization and classification of works. Preservation and analysis of stable texts became a possibility. The solitary reading experience gained prominence, producing a more private and personal relationship between readers and texts. Concerns about information overload first reared their head.

All of these things were once new and alien to humans. Now we consider them part of the natural order of things. They weren’t predetermined by the technology; they emerged through an active tug of war between groups in society over what the technology would be used for, mediated by the affordances of the technology itself.

In concrete material terms, what does an LLM consist of? An LLM is just numerical values stored in computer memory. It is a neural network architecture consisting of billions of parameters in weights and biases, organized in matrices. The storage is distributed across multiple devices. System software loads these parameters and enables the calculation of inferences. This all runs in physical data centers housing computing infrastructure, power, cooling, and networking infrastructure. Whenever people start talking about LLMs having agency or being able to reason, I remind myself of these basic facts.
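To make the “just numerical values” point concrete, here is a deliberately tiny sketch of what inference amounts to at bottom: multiplying stored numbers by input numbers and normalizing the result into next-token probabilities. The four-token vocabulary and all the weight values are invented for illustration; a real LLM does this with billions of parameters across many layers.

```python
import math

# A toy "model": one weight matrix and one bias vector. These stored
# numbers play the same role as an LLM's billions of parameters.
vocab = ["the", "cat", "sat", "down"]
weights = [
    [0.2, 1.5, 0.1, 0.1],  # row per input feature
    [0.3, 0.2, 1.4, 0.2],
]
bias = [0.0, 0.1, 0.0, -0.1]

def forward(x):
    # Matrix-vector multiply: logits[j] = sum_i x[i] * weights[i][j] + bias[j]
    logits = [
        sum(x[i] * weights[i][j] for i in range(len(x))) + bias[j]
        for j in range(len(vocab))
    ]
    # Softmax turns raw scores into a probability distribution over tokens.
    exps = [math.exp(l) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

# "Inference": given an input vector (standing in for the context so far),
# the model outputs probabilities for the next token.
probs = forward([1.0, 0.0])
prediction = vocab[probs.index(max(probs))]
print(prediction)
```

Everything an LLM does at inference time is a (vastly larger, repeated) version of this arithmetic, executed on hardware in those data centers.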

A printing press, although a cleverly designed, engineered, and manufactured device, is similarly banal when you break it down to its essential components. Still, the ultimate changes to how we relate to language have been profound. From these first few years of living with LLMs, I think it is not unreasonable to think they will cause similar upheavals. What is important for me is to recognize how we become alienated from language, and to see ourselves as having agency in reappropriating LLM-mediated language practice as our own.

On mapping AI value chains

At CSCW 2024, back in November of last year, we* ran a workshop titled “From Stem to Stern: Contestability Along AI Value Chains.” With it, we wanted to address a gap in contestable AI research. Current work focuses mainly on contesting specific AI decisions or outputs (for example, appealing a decision made by an automated content moderation system). But we should also look at contestability across the entire AI value chain—from raw material extraction to deployment and impact (think, for example, of data center activists opposing the construction of new hyperscale data centers). We aimed to explore how different stakeholders can contest AI systems at various points in this chain, considering issues like labor conditions, environmental impact, and data collection practices often overlooked in contestability discussions.

The workshop mixed presentations with hands-on activities. In the morning, researchers shared their work through short talks, both in person and online. The afternoon focused on mapping out where and how people can contest AI systems, from data collection to deployment, followed by detailed discussions of the practical challenges involved. We had both in-person and online participants, requiring careful coordination between facilitators. We wrapped up by synthesizing key insights and outlining future research directions.

I was responsible for being a remote facilitator most of the day. But Mireia and I also prepared and ran the first group activity, in which we mapped a typical AI value chain. I figured I might as well share the canvas we used for that here. It’s not rocket science, but it held up pretty well, so maybe some other people will get some use out of it. The canvas was designed to offer a fair bit of scaffolding for thinking through which potentially value-laden decision points exist along the chain.

AI value chain mapping canvas (licensed CC-BY 4.0 Mireia Yurrita & Kars Alfrink, 2024). Download PDF.

Here’s how the activity worked: we spent about 50 minutes on a structured mapping exercise where participants identified potential contestation points along an AI value chain, using ChatGPT as an example case. The activity used a Miro board with a preliminary map showing different stages of AI development (infrastructure setup, data management, AI development, etc.). Participants first brainstormed individually for 10 minutes, adding value-laden decisions and noting stakeholders, harms, benefits, and values at stake. They then collaborated to reorganize and discuss the map for 15 minutes. The activity concluded with participants using dot voting (3 votes each) to identify the most impactful contestation sites, which were then clustered and named to feed into the next group activity.

The activity design drew from two main influences: typical value chain mapping methodologies (e.g., Mapping Actors along Value Chains, 2017), which usually emphasize tracking actors, flows, and contextual factors, and Wardley mapping (Wardley, 2022), which is characterized by the idea of a structured progression along an x-axis with an additional dimension on the y-axis.

The canvas design aimed to make AI system development more tangible by breaking it into clear phases (from infrastructure through governance) while considering visibility and materiality through the y-axis. We ultimately chose to use a familiar system (ChatGPT). This, combined with the activity’s structured approach, helped participants identify concrete opportunities for intervention and contestation along the AI value chain, which we could build on during the rest of the workshop.

I got a lot out of this workshop. Some of the key takeaways that emerged out of the activities and discussions include:

  • There’s a disconnect between legal and technical communities, from basic terminology differences to varying conceptions of key concepts like explainability, highlighting the need for translation work between disciplines.
  • We need to move beyond individual grievance models to consider collective contestation and upstream interventions in the AI supply chain.
  • We also need to shift from reactive contestation to proactive design approaches that build in contestability from the start.
  • By virtue of being hybrid, we were lucky enough to have participants from across the globe. This helped drive home to me the importance of including Global South perspectives and considering contestability beyond Western legal frameworks. We desperately need a more inclusive and globally minded approach to AI governance.

Many thanks to all the workshop co-organizers for having me as part of the team and to Agathe and Yulu, in particular, for leading the effort.


* The full workshop team consisted of Agathe Balayn, Yulu Pi, David Gray Widder, Mireia Yurrita, Sohini Upadhyay, Naveena Karusala, Henrietta Lyons, Cagatay Turkay, Christelle Tessono, Blair Attard-Frost, Ujwal Gadiraju, and myself.

On autonomy, design, and AI

In my thesis, I use autonomy to build the normative case for contestability. It so happens that this year’s theme at the Delft Design for Values Institute is also autonomy. On October 15, 2024, I participated in a panel discussion on autonomy to kick things off. I collected some notes on autonomy that go beyond the conceptualization I used in my thesis. I thought it might be helpful and interesting to collect some of them here in adapted form.

The notes I brought included, first of all, a summary of the ecumenical conceptualization of autonomy concerning automated decision-making systems offered by Alan Rubel, Clinton Castro, and Adam Pham (2021). They conceive of autonomy as effective self-governance. To be autonomous, we need authentic beliefs about our circumstances and the agency to act on our plans. Regarding algorithmic systems, they offer this notion of a reasonable endorsement test—the degree to which a system can be said to respect autonomy depends on its reliability, the stakes of its outputs, the degree to which subjects can be held responsible for inputs, and the distribution of burdens across groups.

Second, I collected some notes from several pieces by James Muldoon, which get into notions of freedom and autonomy developed in socialist republican thought by the likes of Luxemburg, Kautsky, and Castoriadis (2020, 2021a, 2021b). This story of autonomy is sociopolitical rather than moral. The approach is quite appealing to someone like myself who is interested in non-ideal theory in a realist mode. The account of autonomy Muldoon offers is one where individual autonomy hinges on greater group autonomy and stronger bonds of association between those producing and consuming technologies. Freedom is conceived of as collective self-determination.

Third and finally, there is the connected idea of relational autonomy, which is to a degree part of the account offered by Rubel et al., but which in the conceptions cited here is more radical in how it distances itself from liberal individualism (e.g., Christman, 2004; Mhlambi & Tiribelli, 2023; Westlund, 2009). In this view, the individual capacity for autonomous choice is shaped by social structures, so freedom becomes realized through networks of care, responsibility, and interdependence.

That’s what I am interested in: accounts of autonomy that are not premised on liberal individualism and that give us some alternative handle on the problem of the social control of technology in general and of AI in particular.

From my point of view, the implications of all this for design and AI include the following.

First, to make a fairly obvious but often overlooked point, the degree to which a given system impacts people’s autonomy depends on various factors. It makes little sense to make blanket statements about AI destroying our autonomy and so on.

Second, in value-sensitive design terms, you can think about autonomy as a value to be balanced against others—in the case where you take the position that all values can be considered equally important, at least in principle. Or you can consider autonomy more like a precondition for people to live with technology in concordance with their values, making autonomy take precedence over other values. The sociopolitical and relational accounts above point in this direction.

Third, suppose you buy into the radical democratic idea of technology and autonomy. In that case, it follows that it makes little sense to admonish individual designers about respecting others’ autonomy. They may be asked to privilege technologies in their designs that afford individual and group autonomy. But designers also need organization and emancipation more often than not. So it’s about building power. The power of workers inside the organizations that develop technologies and the power of communities that “consume” those same technologies. 

With AI, the reality in the cases I look at is that the communities AI is brought to bear on have little say in the matter. The buyers and deployers of AI could and should be made more accountable to the people subjected to it.

Democratizing AI Through Continuous Adaptability: The Role of DevOps

Below are the abstract and slides for my contribution to the TILTing Perspectives 2024 panel “The mutual shaping of democratic practices & AI,” moderated by Merel Noorman.

Slides

Abstract

Contestability

This presentation delves into democratizing artificial intelligence (AI) systems through contestability. Contestability refers to the ability of AI systems to remain open and responsive to disputes throughout their lifecycle. It approaches AI systems as arenas where groups compete for power over designs and outcomes.

Autonomy, democratic agency, legitimation

We identify contestability as a critical system quality for respecting people’s autonomy. This includes their democratic agency: their ability to legitimate policies, including policies enacted by AI systems.

For a decision to be legitimate, it must be democratically willed or rely on “normative authority.” The democratic pathway should be constrained by normative bounds to avoid arbitrariness. The appeal to authority should meet the “access constraint,” which ensures citizens can form beliefs about policies with a sufficient degree of agency (Peter, 2020 in Rubel et al., 2021).

Contestability is the quality that ensures mechanisms are in place for subjects to exercise their democratic agency. In the case of an appeal to normative authority, contestability mechanisms are how subjects and their representatives gain access to the information that will enable them to evaluate its justifiability. In this way, contestability satisfies the access constraint. In the case of democratic will, contestability-by-design practices are how system development is democratized. The autonomy account of legitimation adds the normative constraints that should bind this democratic pathway.

Himmelreich (2022) similarly argues that only a “thick” conception of democracy will address some of the current shortcomings of AI development. This is a pathway that not only allows for participation but also includes deliberation over justifications.

The agonistic arena

Elsewhere, we have proposed the Agonistic Arena as a metaphor for thinking about the democratization of AI systems (Alfrink et al., 2024). Contestable AI embodies the generative metaphor of the Arena. This metaphor characterizes public AI as a space where interlocutors embrace conflict as productive. Seen through the lens of the Arena, public AI problems stem from a need for opportunities for adversarial interaction between stakeholders.

This metaphorical framing suggests prescriptions for making the norms and procedures that shape the following more contentious and open to dispute:

  1. AI system design decisions on a global level, and
  2. human-AI system output decisions on a local level (i.e., individual decision outcomes), establishing new dialogical feedback loops between stakeholders that ensure continuous monitoring.

The Arena metaphor encourages a design ethos of revisability and reversibility so that AI systems embody the agonistic ideal of contingency.

Post-deployment malleability, feedback-ladenness

Unlike physical systems, AI technologies exhibit a unique malleability post-deployment.

For example, LLM chatbots optimize their performance based on a variety of feedback sources, including interactions with users, as well as feedback collected through crowd-sourced data work.

Because of this open-endedness, democratic control and oversight in the operations phase of the system’s lifecycle become a particular concern.

This is a concern because while AI systems are dynamic and feedback-laden (Gilbert et al., 2023), many of the existing oversight and control measures are static, one-off exercises that struggle to track systems as they evolve over time.

DevOps

The field of DevOps is pivotal in this context. DevOps focuses on instrumenting systems for enhanced monitoring and control in service of continuous improvement. Typically, metrics for DevOps and its machine-learning-specific offshoot, MLOps, emphasize technical performance and business objectives.

However, there is scope to expand these to include matters of public concern. The matters-of-concern perspective shifts the focus to issues such as fairness or discrimination, viewing them as challenges that cannot be resolved through universal methods with absolute certainty. Rather, it highlights how standards are locally negotiated within specific institutional contexts, emphasizing that such standards are never guaranteed (Lampland & Star, 2009; Geiger et al., 2023).

MLOps Metrics

In the context of machine learning systems, technical metrics focus on model accuracy. For example, a financial services company might use Area Under The Curve Receiver Operating Characteristics (AUC-ROC) to continuously monitor and maintain the performance of their fraud detection model in production.
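
For concreteness, AUC-ROC can be computed as the probability that a randomly chosen positive case (say, an actual fraud) is scored higher than a randomly chosen negative one. Here is a minimal, dependency-free sketch with made-up labels and scores; in practice you would use a library implementation such as scikit-learn’s `roc_auc_score`:

```python
def auc_roc(labels, scores):
    """AUC-ROC as the probability that a random positive instance
    is ranked above a random negative one (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# A model that ranks every fraud case above every legitimate
# transaction achieves a perfect score of 1.0.
auc_roc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.2])  # → 1.0
```

Because it depends only on the ranking of scores, this metric can be recomputed on fresh production data at any time, which is what makes it useful for continuous monitoring.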

Business metrics focus on cost-benefit analyses. For example, a bank might use a cost-benefit matrix to balance the potential revenue from approving a loan against the risk of default, ensuring that the overall profitability of their loan portfolio is optimized.
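
To illustrate, such a cost-benefit matrix boils down to assigning a value to each cell of the confusion matrix and summing over the portfolio. All figures below are hypothetical, purely for illustration:

```python
# Hypothetical values per loan (in euros); not from any real bank.
VALUE = {
    ("approve", "repaid"):   1_000,   # interest earned
    ("approve", "default"): -10_000,  # principal lost
    ("reject",  "repaid"):    -100,   # foregone business
    ("reject",  "default"):      0,   # loss correctly avoided
}

def portfolio_value(outcomes):
    """Total value of a portfolio of (decision, actual outcome) pairs."""
    return sum(VALUE[pair] for pair in outcomes)

outcomes = ([("approve", "repaid")] * 900 + [("approve", "default")] * 8
            + [("reject", "repaid")] * 50 + [("reject", "default")] * 42)
portfolio_value(outcomes)  # → 815000
```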

Drift

These metrics can be monitored over time to detect “drift” between a model and the world. Training sets are static. Reality is dynamic. It changes over time. Drift occurs when the nature of new input data diverges from the data a model was trained on. A change in performance metrics may be used to alert system operators, who can then investigate and decide on a course of action, e.g., retraining a model on updated data. This, in effect, creates a feedback loop between the system in use and its ongoing development.
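
A minimal sketch of such a monitoring loop, assuming ground-truth labels eventually arrive so each prediction can be marked correct or incorrect (the baseline, window, and tolerance values are illustrative):

```python
from collections import deque

def make_drift_monitor(baseline_accuracy, window=100, tolerance=0.05):
    """Return a callable that records whether each prediction turned
    out correct, and flags drift once accuracy over the most recent
    `window` observations falls more than `tolerance` below baseline."""
    recent = deque(maxlen=window)

    def record(correct):
        recent.append(bool(correct))
        if len(recent) < window:
            return False  # not enough observations yet
        return sum(recent) / window < baseline_accuracy - tolerance

    return record

# Feed outcomes as ground truth arrives; True → alert the operators.
monitor = make_drift_monitor(baseline_accuracy=0.95, window=50)
```

The alert itself is only the start of the feedback loop: what matters for contestability is who gets to see it and which governance process it triggers.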

An expansion of these practices in the interest of contestability would require:

  1. setting different metrics,
  2. exposing these metrics to additional audiences, and
  3. establishing feedback loops with the processes that govern models and the systems they are embedded in.

Example 1: Camera Cars

Let’s say a city government uses a camera-equipped vehicle and a computer vision model to detect potholes in public roads. In addition to accuracy and a favorable cost-benefit ratio, citizens, and road users in particular, may care about the time between a detected pothole and its fixing. Or, they may care about the distribution of potholes across the city. Furthermore, when road maintenance appears to be degrading, this should be taken up with department leadership, the responsible alderperson, and council members.

Example 2: EV Charging

Or, let’s say the same city government uses an algorithmic system to optimize public electric vehicle (EV) charging stations for green energy use by adapting charging speeds to expected sun and wind. EV drivers may want to know how much energy has been shifted to greener time windows and how this develops over time. Without such visibility into a system’s actual goal achievement, citizens’ ability to legitimate its use suffers. As I have already mentioned, democratic agency, when enacted via the appeal to authority, depends on access to “normative facts” that underpin policies. And finally, professed system functionality must be demonstrated as well (Raji et al., 2022).

DevOps as sociotechnical leverage point for democratizing AI

These brief examples show that the DevOps approach is a potential sociotechnical leverage point. It offers pathways for democratizing AI system design, development, and operations.

DevOps can be adapted to further contestability. It creates new channels between human and machine actors. One of DevOps’s essential activities is monitoring (Smith, 2020), which presupposes fallibility, a necessary precondition for contestability. Finally, it requires and provides infrastructure for technical flexibility so that recovery from error is low-cost and continuous improvement becomes practically feasible.

The mutual shaping of democratic practices & AI

Zooming out further, let’s reflect on this panel’s overall theme, picking out three elements: legitimation, representation of marginalized groups, and dealing with conflict and contestation after implementation and during use.

Contestability is a lever for demanding justifications from operators, which is a necessary input for legitimation by subjects (Henin & Le Métayer, 2022). Contestability frames different actors’ stances as adversarial positions on a political field rather than “equally valid” perspectives (Scott, 2023). And finally, relations, monitoring, and revisability are all ways to give voice to and enable responsiveness to contestations (Genus & Stirling, 2018).

And again, all of these things can be furthered in the post-deployment phase by adapting the DevOps lens.

Bibliography

  • Alfrink, K., Keller, I., Kortuem, G., & Doorn, N. (2022). Contestable AI by Design: Towards a Framework. Minds and Machines, 33(4), 613–639. https://doi.org/10/gqnjcs
  • Alfrink, K., Keller, I., Yurrita Semperena, M., Bulygin, D., Kortuem, G., & Doorn, N. (2024). Envisioning Contestability Loops: Evaluating the Agonistic Arena as a Generative Metaphor for Public AI. She Ji: The Journal of Design, Economics, and Innovation, 10(1), 53–93. https://doi.org/10/gtzwft
  • Geiger, R. S., Tandon, U., Gakhokidze, A., Song, L., & Irani, L. (2023). Making Algorithms Public: Reimagining Auditing From Matters of Fact to Matters of Concern. International Journal of Communication, 18(0), Article 0.
  • Genus, A., & Stirling, A. (2018). Collingridge and the dilemma of control: Towards responsible and accountable innovation. Research Policy, 47(1), 61–69. https://doi.org/10/gcs7sn
  • Gilbert, T. K., Lambert, N., Dean, S., Zick, T., Snoswell, A., & Mehta, S. (2023). Reward Reports for Reinforcement Learning. Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society, 84–130. https://doi.org/10/gs9cnh
  • Henin, C., & Le Métayer, D. (2022). Beyond explainability: Justifiability and contestability of algorithmic decision systems. AI & SOCIETY, 37(4), 1397–1410. https://doi.org/10/gmg8pf
  • Himmelreich, J. (2022). Against “Democratizing AI.” AI & SOCIETY. https://doi.org/10/gr95d5
  • Lampland, M., & Star, S. L. (Eds.). (2009). Standards and Their Stories: How Quantifying, Classifying, and Formalizing Practices Shape Everyday Life (1st edition). Cornell University Press.
  • Peter, F. (2020). The Grounds of Political Legitimacy. Journal of the American Philosophical Association, 6(3), 372–390. https://doi.org/10/grqfhn
  • Raji, I. D., Kumar, I. E., Horowitz, A., & Selbst, A. (2022). The Fallacy of AI Functionality. 2022 ACM Conference on Fairness, Accountability, and Transparency, 959–972. https://doi.org/10/gqfvf5
  • Rubel, A., Castro, C., & Pham, A. K. (2021). Algorithms and autonomy: The ethics of automated decision systems. Cambridge University Press.
  • Scott, D. (2023). Diversifying the Deliberative Turn: Toward an Agonistic RRI. Science, Technology, & Human Values, 48(2), 295–318. https://doi.org/10/gpk2pr
  • Smith, J. D. (2020). Operations anti-patterns, DevOps solutions. Manning Publications.
  • Treveil, M. (2020). Introducing MLOps: How to scale machine learning in the enterprise (First edition). O’Reilly.

Design and machine learning – an annotated reading list

Earlier this year I coached Design for Interaction master students at Delft University of Technology in the course Research Methodology. The students organised three seminars for which I provided the claims and assigned reading. In the seminars they argued about my claims using the Toulmin Model of Argumentation. The readings served as sources for backing and evidence.

The claims and readings were all related to my nascent research project about machine learning. We delved into both designing for machine learning, and using machine learning as a design tool.

Below are the readings I assigned, with some notes on each, which should help you decide if you want to dive into them yourself.

Hebron, Patrick. 2016. Machine Learning for Designers. Sebastopol: O’Reilly.

The only non-academic piece in this list. This served the purpose of getting all students on the same page with regard to what machine learning is, its applications in interaction design, and common challenges encountered. I still can’t think of any other single resource that is as good a starting point for the subject as this one.

Fiebrink, Rebecca. 2016. “Machine Learning as Meta-Instrument: Human-Machine Partnerships Shaping Expressive Instrumental Creation.” In Musical Instruments in the 21st Century, 14:137–51. Singapore: Springer Singapore. doi:10.1007/978-981-10-2951-6_10.

Fiebrink’s Wekinator is groundbreaking, fun and inspiring so I had to include some of her writing in this list. This is mostly of interest for those looking into the use of machine learning for design and other creative and artistic endeavours. An important idea explored here is that tools that make use of (interactive, supervised) machine learning can be thought of as instruments. Using such a tool is like playing or performing, exploring a possibility space, engaging in a dialogue with the tool. For a tool to feel like an instrument requires a tight action-feedback loop.

Dove, Graham, Kim Halskov, Jodi Forlizzi, and John Zimmerman. 2017. UX Design Innovation: Challenges for Working with Machine Learning as a Design Material. The 2017 CHI Conference. New York, New York, USA: ACM. doi:10.1145/3025453.3025739.

A really good survey of how designers currently deal with machine learning. Key takeaways include that in most cases, the application of machine learning is still engineering-led as opposed to design-led, which hampers the creation of non-obvious machine learning applications. It also makes it hard for designers to consider ethical implications of design choices. A key reason for this is that at the moment, prototyping with machine learning is prohibitively cumbersome.

Fiebrink, Rebecca, Perry R Cook, and Dan Trueman. 2011. “Human Model Evaluation in Interactive Supervised Learning.” In Proceedings of the 2011 CHI Conference on Human Factors in Computing Systems, 147–156. New York, New York, USA: ACM Press. doi:10.1145/1978942.1978965.

The second Fiebrink piece in this list, which is more of a deep dive into how people use Wekinator. As with the chapter listed above this is required reading for those working on design tools which make use of interactive machine learning. An important finding here is that users of intelligent design tools might have very different criteria for evaluating the ‘correctness’ of a trained model than engineers do. Such criteria are likely subjective and evaluation requires first-hand use of the model in real time.

Bostrom, Nick, and Eliezer Yudkowsky. 2014. “The Ethics of Artificial Intelligence.” In The Cambridge Handbook of Artificial Intelligence, edited by Keith Frankish and William M Ramsey, 316–34. Cambridge: Cambridge University Press. doi:10.1017/CBO9781139046855.020.

Bostrom is known for his somewhat crazy but thought-provoking book on superintelligence, and although a large part of this chapter is about the ethics of general artificial intelligence (which at the very least is still some way off), the first section discusses the ethics of current “narrow” artificial intelligence. It makes for a good checklist of things designers should keep in mind when they create new applications of machine learning. Key insight: when a machine learning system takes on work with social dimensions—tasks previously performed by humans—the system inherits its social requirements.

Yang, Qian, John Zimmerman, Aaron Steinfeld, and Anthony Tomasic. 2016. Planning Adaptive Mobile Experiences When Wireframing. The 2016 ACM Conference. New York, New York, USA: ACM. doi:10.1145/2901790.2901858.

Finally, a feet-in-the-mud exploration of what it actually means to design for machine learning with the tools most commonly used by designers today: drawings and diagrams of various sorts. In this case the focus is on using machine learning to make an interface adaptive. It includes an interesting discussion of how to balance the use of implicit and explicit user inputs for adaptation, and how to deal with inference errors. Once again the limitations of current sketching and prototyping tools are mentioned, and related to the need for designers to develop tacit knowledge about machine learning. Such tacit knowledge will only be gained when designers can work with machine learning in a hands-on manner.

Supplemental material

Floyd, Christiane. 1984. “A Systematic Look at Prototyping.” In Approaches to Prototyping, 1–18. Berlin, Heidelberg: Springer Berlin Heidelberg. doi:10.1007/978-3-642-69796-8_1.

I provided this to students so that they get some additional grounding in the various kinds of prototyping that are out there. It helps to prevent reductive notions of prototyping, and it makes for a nice complement to Buxton’s work on sketching.

Blevis, E, Y Lim, and E Stolterman. 2006. “Regarding Software as a Material of Design.”

Some of the papers refer to machine learning as a “design material” and this paper helps to understand what that idea means. Software is a material without qualities (it is extremely malleable, it can simulate nearly anything). Yet, it helps to consider it as a physical material in the metaphorical sense because we can then apply ways of design thinking and doing to software programming.

‘Machine Learning for Designers’ workshop

On Wednesday, Péter Kun, Holly Robbins, and I taught a one-day workshop on machine learning at Delft University of Technology. We had about thirty master’s students from the industrial design engineering faculty. The aim was to get them acquainted with the technology through hands-on tinkering, with the Wekinator as the central teaching tool.

Photo credits: Holly Robbins

Background

The reasoning behind this workshop is twofold.

On the one hand, I expect designers will find themselves working on projects involving machine learning more and more often. The technology has certain properties that differ from traditional software. Most importantly, machine learning is probabilistic instead of deterministic. It is important that designers understand this because otherwise they are likely to make bad decisions about its application.

The second reason is that I have a strong sense machine learning can play a role in the augmentation of the design process itself. So-called intelligent design tools could make designers more efficient and effective. They could also enable the creation of designs that would otherwise be impossible or very hard to achieve.

The workshop explored both ideas.

Photo credits: Holly Robbins

Format

The structure was roughly as follows:

In the morning we started out providing a very broad introduction to the technology. We talked about the very basic premise of (supervised) learning. Namely, providing examples of inputs and desired outputs and training a model based on those examples. To make these concepts tangible we then introduced the Wekinator and walked the students through getting it up and running using basic examples from the website. The final step was to invite them to explore alternative inputs and outputs (such as game controllers and Arduino boards).
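
The premise above can be made concrete with the simplest possible supervised learner, a one-nearest-neighbour model written from scratch (the sensor values and labels below are made up for illustration): “training” amounts to storing the example input/output pairs, and prediction returns the output of the closest stored input.

```python
def train(examples):
    """'Train' a 1-nearest-neighbour model by storing (input, output)
    example pairs; return a prediction function."""
    def predict(x):
        # Find the stored example whose input is closest to x.
        nearest_input, nearest_output = min(
            examples, key=lambda pair: abs(pair[0] - x))
        return nearest_output
    return predict

# Hypothetical examples: a normalized sensor reading mapped to a label.
model = train([(0.0, "low"), (0.5, "mid"), (1.0, "high")])
model(0.1)  # → "low"
```

The Wekinator does something far more sophisticated, but the workflow it exposes is exactly this: provide paired examples, train, then feed it live inputs.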

In the afternoon we provided a design brief, asking the students to prototype a data-enabled object with the set of tools they had acquired in the morning. We assisted with technical hurdles where necessary (of which there were more than a few) and closed out the day with demos and a group discussion reflecting on their experiences with the technology.

Photo credits: Holly Robbins

Results

As I tweeted on the way home that evening, the results were… interesting.

Not all groups managed to put something together in the admittedly short amount of time they were provided with. They were most often stymied by getting an Arduino to talk to the Wekinator. Max was often picked as a go-between because the Wekinator receives OSC messages over UDP, whereas the quickest way to get an Arduino to talk to a computer is over serial. But Max in my experience is a fickle beast and would more than once crap out on us.
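
As an aside, the go-between does not have to be Max: a small script that reads the serial port and re-encodes values as OSC over UDP would do, and the OSC wire format is simple enough to assemble by hand with only the standard library. In the sketch below the serial-reading side is omitted; the port (6448) and address (`/wek/inputs`) are what I understand to be Wekinator’s defaults, so verify them against your own setup.

```python
import socket
import struct

def osc_pad(data: bytes) -> bytes:
    """Null-terminate and pad to a multiple of 4 bytes, per OSC 1.0."""
    data += b"\x00"
    return data + b"\x00" * (-len(data) % 4)

def osc_message(address: str, *values: float) -> bytes:
    """Encode an OSC message carrying big-endian 32-bit floats."""
    msg = osc_pad(address.encode("ascii"))
    msg += osc_pad(("," + "f" * len(values)).encode("ascii"))
    for v in values:
        msg += struct.pack(">f", v)
    return msg

# Send two input values to a locally running Wekinator instance.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(osc_message("/wek/inputs", 0.25, 0.75), ("127.0.0.1", 6448))
```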

The groups that did build something mainly assembled prototypes from the examples on hand. Which is fine, but since we were mainly working with the examples from the Wekinator website they tended towards the interactive instrument side of things. We were hoping for explorations of IoT product concepts. For that more hand-rolling was required and this was only achievable for the students on the higher end of the technical expertise spectrum (and the more tenacious ones).

The discussion yielded some interesting insights into mental models of the technology and how they are affected by hands-on experience. A comment I heard more than once was: Why is this considered learning at all? The Wekinator was not perceived to be learning anything. When challenged on this by reiterating the underlying principles it became clear the black box nature of the Wekinator hampers appreciation of some of the very real achievements of the technology. It seems (for our students at least) machine learning is stuck in a grey area between too-high expectations and too-low recognition of its capabilities.

Next steps

These results, and others, point towards some obvious improvements which can be made to the workshop format, and to teaching design students about machine learning more broadly.

  1. We can improve the toolset so that some of the heavy lifting involved with getting the various parts to talk to each other is made easier and more reliable.
  2. We can build examples that are geared towards the practice of designing IoT products and are ready for adaptation and hacking.
  3. And finally, and probably most challengingly, we can make the workings of machine learning more transparent so that it becomes easier to develop a feel for its capabilities and shortcomings.

We do intend to improve and teach the workshop again. If you’re interested in hosting one (either in an educational or professional context) let me know. And stay tuned for updates on this and other efforts to get designers to work in a hands-on manner with machine learning.

Special thanks to the brilliant Ianus Keller for connecting me to Péter and for allowing us to pilot this crazy idea at IDE Academy.

References

Sources used during preparation and running of the workshop:

  • The Wekinator – the UI is infuriatingly poor but when it comes to getting started with machine learning this tool is unmatched.
  • Arduino – I have become particularly fond of the MKR1000 board. Add a lithium-polymer battery and you have everything you need to prototype IoT products.
  • OSC for Arduino – CNMAT’s implementation of the open sound control (OSC) encoding. Key puzzle piece for getting the above two tools talking to each other.
  • Machine Learning for Designers – my preferred introduction to the technology from a designerly perspective.
  • A Visual Introduction to Machine Learning – a very accessible visual explanation of the basic underpinnings of computers applying statistical learning.
  • Remote Control Theremin – an example project I prepared for the workshop demoing how to have the Wekinator talk to an Arduino MKR1000 with OSC over UDP.

Design × AI coffee meetup

If you work in the field of design or artificial intelligence and are interested in exploring the opportunities at their intersection, consider yourself invited to an informal coffee meetup on February 15, 10am at Brix in Amsterdam.

Erik van der Pluijm and I have been carrying on a conversation about AI and design for a while now, and we felt it was time to expand the circle a bit. We are very curious who else out there shares our excitement.

Questions we are mulling over include: How does the design process change when creating intelligent products? And: How can teams collaborate with intelligent design tools to solve problems in new and interesting ways?

Anyway, lots to chew on.

No need to sign up or anything, just show up and we’ll see what happens.