Designing Learning Experiences in a Post-ChatGPT World

Transcript of a talk delivered at LXDCON’25 on June 12, 2025.

My name is Kars. I am a postdoc at TU Delft. I research contestable AI—how to use design to ensure AI systems remain subject to societal control. I teach the responsible design of AI systems. In a previous life, I was a practicing designer of digital products and services. I will talk about designing learning experiences in a post-ChatGPT world.

Let’s start at this date.

This is when OpenAI released an early demo of ChatGPT. The chatbot quickly went viral on social media. Users shared examples of what it could do. Stories and samples included everything from travel planning to writing fables to coding computer programs. Within five days, the chatbot had attracted over one million users.

Fast forward to today, 2 years, 6 months, and 14 days later, we’ve seen a massive impact across domains, including on education.

For example, the article on the left talks about how AI cheating has become pervasive in higher education. It is fundamentally undermining the educational process itself. Students are using ChatGPT for nearly every assignment while educators struggle with ineffective detection methods and question whether traditional academic work has lost all meaning.

The one on the right talks about how students are accusing professors of being hypocritical. Teachers are using AI tools for things like course materials and grading while telling students they cannot use them.

What we’re looking at is a situation where academic integrity was already in question. On top of that, both students and faculty are quickly adopting AI, and institutions aren’t really ready for it.

These transformations in higher education give me pause. What should we change about how we design learning experiences given this new reality?

So, just to clarify, when I mention “AI” in this talk, I’m specifically referring to generative AI, or GenAI, and even more specifically, to chatbots that are powered by large language models, like ChatGPT.

Throughout this talk I will use this example of a learning experience that makes use of GenAI. Sharad Goel, Professor at Harvard Kennedy School, developed an AI Slackbot named “StatGPT” that aims to enhance student learning through interactive engagement.

It was tested in a statistics course with positive feedback from students. They described it as supportive and easily accessible, available anytime for student use. There are plans to implement StatGPT in various other courses. They say it assists in active problem-solving and consider it an example of how AI can facilitate learning, rather than replace it.

The debate around GenAI and learning has become polarized. I see the challenge as trying to find a balance. On one side, there’s complete skepticism about AI, and on the other, there’s this blind acceptance of it. What I propose is that we need an approach I call Conscious Adaptation: moving forward with full awareness of what’s being transformed.

To build the case for this approach, I will be looking at two common positions in the debates around AI and education. I’ll be focusing on four pieces of writing.

Two of them are by Ethan Mollick, from his blog. He’s a professor at the University of Pennsylvania specializing in innovation and entrepreneurship, known for his work on the potential of AI to transform different fields.

The other two pieces are by Ian Bogost, published at The Atlantic. He’s a media studies scholar, author, and game designer who teaches at Washington University. He’s known for his sobering, realist critiques of the impact of technology on society.

These, to me, exemplify two strands of the debate around AI in education.

Ethan Mollick’s position, in essence, is that AI in education is an inevitable transformation that educators must embrace and redesign around, not fight.

You could say Mollick is an optimist. But he is also really clear-eyed about how much disruption is going on. He even refers to it as the “Homework Apocalypse.” He talks about some serious issues: there are failures in detection, students are not learning as well (with exam performance dropping by about 17%), and there are a lot of misunderstandings about AI on both sides—students and faculty.

But his perspective is more about adapting to a tough situation. He’s always focused on solutions, constantly asking, “What can we do about this?” He believes that with thoughtful human efforts, we can really influence the outcomes positively.

On the other hand, Ian Bogost’s view is that AI has created an unsolvable crisis that’s fundamentally breaking traditional education and leaving teachers demoralized.

Bogost, I would describe as a realist. He accepts the inevitability of AI, noting that the “arms race will continue” and that technology will often outpace official policies. He also highlights the negative impact on faculty morale, the dependency of students, and the chaos in institutions.

He’s not suggesting that we should ban AI or go back to a time before it existed. He sees AI as something that might be the final blow to a profession that’s already struggling with deeper issues. At the same time, he emphasizes the need for human agency by calling out the lack of reflection and action from institutions.

So, they both observe the same reality, but they look at it differently. Mollick sees it as an engineering challenge—one that’s complicated but can be tackled with smart design. On the other hand, Bogost views it as a social issue that uncovers deeper problems that can’t just be fixed with technology.

Mollick thinks it’s possible to rebuild after a sort of collapse, while Bogost questions if the institutions that are supposed to do that rebuilding are really fit for the job.

Returning to Harvard’s StatGPT bot: Mollick would likely celebrate it as an example of co-intelligence. Bogost would likely ask what the bot’s rollout comes at the expense of, or what deeper problems its deployment unveils.

Getting past the conflict between these two views isn’t just about figuring out the best technical methods or the right order of solutions. The real challenge lies in our ability as institutions to make real changes, and we need to be careful that focusing on solutions doesn’t distract us from the important discussions we need to have.

I see three strategies that work together to create an approach that addresses the conflict between these two perspectives in a way that I believe will be more effective.

First, institutional realism is about designing interventions assuming institutions will resist change, capture innovations, or abandon initiatives. Given this, we could focus on individual teacher practices, learner-level tools, and changes that don’t require systemic transformation. We could treat every implementation as a diagnostic probe revealing actual (vs. stated) institutional capacity.

Second, loss-conscious innovation means explicitly identifying, before implementing AI-enhanced practices, what human learning processes, relationships, or skills are being replaced. We could develop metrics that track preservation alongside progress. We could build “conservation” components into new approaches to protect irreplaceable educational values.

Third, and finally, we should recognize that Mollick-style solution-building and Bogost-style critical analysis serve different but essential roles. Practitioners need actionable guidance, while the broader field needs diagnostic consciousness. We should avoid a false synthesis and instead maintain both approaches as distinct strands of intellectual work that inform each other.

In short, striking a balance may not be the main focus; it’s more about taking practical actions while considering the overall context. Progress is important, but it’s also worth reflecting on what gets left behind. Conscious adaptation.

So, applying these strategies to Harvard’s chatbot, we could ask: (1) How can we create a feedback loop between an intervention like this and the things it uncovers about institutional limits, so that those can be addressed in the appropriate place? (2) How can we measure what value this bot adds for students and for teachers? What is it replacing, what is it adding, what is it making room for? (3) What critique of learning at Harvard is implied by this intervention?

What does all of this mean, finally, for LXD? This is an LXD conference, so I don’t need to spend a lot of time explaining what it is. But let’s just use this basic definition as a starting point. It’s about experiences, it’s about centering the learner, it’s about achieving learning outcomes, etc.

Comparing my conscious adaptation approach to what typifies LXD, I can see a number of alignments.

Both LXD and Conscious Adaptation prioritize authentic human engagement over efficiency. LXD through human-centered design, conscious adaptation through protecting meaningful intellectual effort from AI displacement.

LXD’s focus on holistic learning journeys aligns with both Mollick’s “effort is the point” and Bogost’s concern that AI shortcuts undermine the educational value embedded in struggle and synthesis.

LXD’s experimental, prototype-driven approach mirrors my “diagnostic pragmatism”—both treat interventions as learning opportunities that reveal what actually works rather than pursuing idealized solutions.

So, going back one final time to Harvard’s bot, an LXD practice aligned in this way would lead us to ask: (1) Is this leveraging GenAI to protect and promote genuine intellectual effort? (2) Are teachers and learners meaningfully engaged in the ongoing development of this technology? (3) Is this prototype properly embedded, so that its potential to create learning for the organization can be realized?

So, where does this leave us as learning experience designers? I see three practical imperatives for Conscious Adaptation.

First, we need to protect meaningful human effort while leveraging AI’s strengths. Remember that “the effort is the point” in learning. Rather than asking “can AI do this?”, we should ask “should it?” Harvard’s bot works because it scaffolds thinking rather than replacing it. We should use AI for feedback and iteration while preserving human work for synthesis and struggle.

Second, we must design for real institutions, not ideal ones. Institutions resist change, capture innovations, and abandon initiatives. We need to design assuming limited budgets, overworked staff, and competing priorities. Every implementation becomes a diagnostic probe: the resistance it meets reveals actual institutional capacity.

Third, we have to recognize the limits of design. AI exposes deeper structural problems like grade obsession, teacher burnout, and test-driven curricula. You can’t design your way out of systemic issues, and sometimes the best move is recognizing when the problem isn’t experiential at all.

This is Conscious Adaptation—moving forward with eyes wide open.

Thanks.

On how to think about large language models

How should we think about large language models (LLMs)? People commonly think and talk about them in terms of human intelligence. To the extent this metaphor does not accurately reflect the properties of the technology, this may lead to misguided diagnoses and prescriptions. It seems to me an LLM is not like a human or a human brain in so many ways. One crucial distinction for me is that LLMs lack individuality and subjectivity.

What are organisms that similarly lack these qualities? Coral polyps and Portuguese man o’ war come to mind, or slime mold colonies. Or maybe a single bacterium, like an E. coli. Each is essentially identical to its clones, responds automatically to chemical gradients (bringing to mind how LLMs respond to prompts), and doesn’t accumulate unique experiences in any meaningful way.

Considering all these examples, the meme about LLMs being like a shoggoth (an amorphous blob-like monster originating from the speculative fiction of Howard Phillips Lovecraft) is surprisingly accurate. The trouble with these metaphors, though, is that it’s about as hard to reason about such organisms as it is to reason about LLMs. So using them as metaphors for thinking about LLMs won’t work. A shoggoth is even less helpful because the reference will only be familiar to those who know their H.P. Lovecraft.

So perhaps we should abandon metaphorical thinking and think historically instead. LLMs are a new language technology. As with previous technologies, such as the printing press, when they are introduced, our relationship to language changes. How does this change occur?

I think the change is dialectical. First, we have a relationship to language that we recognize as our own. Then, a new technology destabilizes this relationship, alienating us from the language practice. We no longer see our own hand in it. And we experience a lack of control over language practice. Finally, we reappropriate this language use in our practices. In this process of reappropriation, language practice as a whole is transformed. And the cycle begins again.

For an example of this dialectical transformation of language practice under the influence of new technology, we can take Eisenstein’s classic account of the history of the printing press (1980). Following its introduction, many things changed about how we relate to language. Our engagement with language shifted from a primarily oral one to a visual and deliberative one. Libraries became more abundantly stocked, leading to the practice of categorization and classification of works. Preservation and analysis of stable texts became a possibility. The solitary reading experience gained prominence, producing a more private and personal relationship between readers and texts. Concerns about information overload first reared their head.

All of these things were once new and alien to humans. Now we consider them part of the natural order of things. They weren’t predetermined by the technology, they emerged through this active tug of war between groups in society about what the technology would be used for, mediated by the affordances of the technology itself.

In concrete material terms, what does an LLM consist of? An LLM is just numerical values stored in computer memory. It is a neural network architecture consisting of billions of parameters in weights and biases, organized in matrices. The storage is distributed across multiple devices. System software loads these parameters and enables the calculation of inferences. This all runs in physical data centers housing computing infrastructure, power, cooling, and networking infrastructure. Whenever people start talking about LLMs having agency or being able to reason, I remind myself of these basic facts.
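To make that concreteness tangible, here is a deliberately toy sketch of what such a description boils down to in code: the “model” is nothing but arrays of numbers held in memory, and “inference” is arithmetic performed on them. The dimensions, the single layer, and the NumPy setup are illustrative simplifications, not a real LLM.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical, tiny dimensions; real models have billions of parameters
# spread across many layers and devices.
vocab_size, d_model = 100, 16

params = {
    "embedding": rng.normal(size=(vocab_size, d_model)),   # token lookup table
    "weights": rng.normal(size=(d_model, d_model)),         # one dense layer
    "bias": np.zeros(d_model),
    "unembedding": rng.normal(size=(d_model, vocab_size)),  # back to token scores
}

def next_token_logits(token_id: int) -> np.ndarray:
    """One 'inference' step: look up stored numbers, multiply, add."""
    x = params["embedding"][token_id]
    h = np.tanh(x @ params["weights"] + params["bias"])
    return h @ params["unembedding"]

print(next_token_logits(42).shape)  # (100,) scores over the toy vocabulary
```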

A printing press, although a cleverly designed, engineered, and manufactured device, is similarly banal when you break it down to its essential components. Still, the ultimate changes to how we relate to language have been profound. From these first few years of living with LLMs, I think it is not unreasonable to think they will cause similar upheavals. What is important for me is to recognize how we become alienated from language, and to see ourselves as having agency in reappropriating LLM-mediated language practice as our own.

On mapping AI value chains

At CSCW 2024, back in November of last year, we* ran a workshop titled “From Stem to Stern: Contestability Along AI Value Chains.” With it, we wanted to address a gap in contestable AI research. Current work focuses mainly on contesting specific AI decisions or outputs (for example, appealing a decision made by an automated content moderation system). But we should also look at contestability across the entire AI value chain—from raw material extraction to deployment and impact (think, for example, of data center activists opposing the construction of new hyperscalers). We aimed to explore how different stakeholders can contest AI systems at various points in this chain, considering issues like labor conditions, environmental impact, and data collection practices often overlooked in contestability discussions.

The workshop mixed presentations with hands-on activities. In the morning, researchers shared their work through short talks, both in person and online. The afternoon focused on mapping out where and how people can contest AI systems, from data collection to deployment, followed by detailed discussions of the practical challenges involved. We had both in-person and online participants, requiring careful coordination between facilitators. We wrapped up by synthesizing key insights and outlining future research directions.

I was responsible for being a remote facilitator most of the day. But Mireia and I also prepared and ran the first group activity, in which we mapped a typical AI value chain. I figured I might as well share the canvas we used for that here. It’s not rocket science, but it held up pretty well, so maybe some other people will get some use out of it. The canvas was designed to offer a fair bit of scaffolding for thinking through what decision points there are along the chain that are potentially value-laden.

AI value chain mapping canvas (licensed CC-BY 4.0 Mireia Yurrita & Kars Alfrink, 2024). Download PDF.

Here’s how the activity worked: We spent about 50 minutes on a structured mapping exercise where participants identified potential contestation points along an AI value chain, using ChatGPT as an example case. The activity used a Miro board with a preliminary map showing different stages of AI development (infrastructure setup, data management, AI development, etc.). Participants first brainstormed individually for 10 minutes, adding value-laden decisions and noting stakeholders, harms, benefits, and values at stake. They then collaborated to reorganize and discuss the map for 15 minutes. The activity concluded with participants using dot voting (3 votes each) to identify the most impactful contestation sites, which were then clustered and named to feed into the next group activity.

The activity design drew from two main influences: typical value chain mapping methodologies (e.g., Mapping Actors along Value Chains, 2017), which usually emphasize tracking actors, flows, and contextual factors, and Wardley mapping (Wardley, 2022), which is characterized by the idea of a structured progression along an x-axis with an additional dimension on the y-axis.

The canvas design aimed to make AI system development more tangible by breaking it into clear phases (from infrastructure through governance) while considering visibility and materiality through the y-axis. We ultimately chose to use a familiar system (ChatGPT). This, combined with the activity’s structured approach, helped participants identify concrete opportunities for intervention and contestation along the AI value chain, which we could build on during the rest of the workshop.

I got a lot out of this workshop. Some of the key takeaways that emerged out of the activities and discussions include:

  • There’s a disconnect between legal and technical communities, from basic terminology differences to varying conceptions of key concepts like explainability, highlighting the need for translation work between disciplines.
  • We need to move beyond individual grievance models to consider collective contestation and upstream interventions in the AI supply chain.
  • We also need to shift from reactive contestation to proactive design approaches that build in contestability from the start.
  • By virtue of being hybrid, we were lucky enough to have participants from across the globe. This helped drive home to me the importance of including Global South perspectives and considering contestability beyond Western legal frameworks. We desperately need a more inclusive and globally-minded approach to AI governance.

Many thanks to all the workshop co-organizers for having me as part of the team and to Agathe and Yulu, in particular, for leading the effort.


* The full workshop team consisted of Agathe Balayn, Yulu Pi, David Gray Widder, Mireia Yurrita, Sohini Upadhyay, Naveena Karusala, Henrietta Lyons, Cagatay Turkay, Christelle Tessono, Blair Attard-Frost, Ujwal Gadiraju, and myself.

On autonomy, design, and AI

In my thesis, I use autonomy to build the normative case for contestability. It so happens that this year’s theme at the Delft Design for Values Institute is also autonomy. On October 15, 2024, I participated in a panel discussion on autonomy to kick things off. I collected some notes on autonomy that go beyond the conceptualization I used in my thesis. I thought it might be helpful and interesting to collect some of them here in adapted form.

The notes I brought included, first of all, a summary of the ecumenical conceptualization of autonomy concerning automated decision-making systems offered by Alan Rubel, Clinton Castro, and Adam Pham (2021). They conceive of autonomy as effective self-governance. To be autonomous, we need authentic beliefs about our circumstances and the agency to act on our plans. Regarding algorithmic systems, they offer this notion of a reasonable endorsement test—the degree to which a system can be said to respect autonomy depends on its reliability, the stakes of its outputs, the degree to which subjects can be held responsible for inputs, and the distribution of burdens across groups.

Second, I collected some notes from several pieces by James Muldoon, which get into notions of freedom and autonomy that were developed in socialist republican thought by the likes of Luxemburg, Kautsky, and Castoriadis (2020, 2021a, 2021b). This story of autonomy is sociopolitical rather than moral. This approach is quite appealing for someone interested in non-ideal theory in a realist mode like myself. The account of autonomy Muldoon offers is one where individual autonomy hinges on greater group autonomy and stronger bonds of association between those producing and consuming technologies. Freedom is conceived of as collective self-determination.

And then third and finally, there’s this connected idea of relational autonomy, which to a degree is part of the account offered by Rubel et al., but the conceptions here are more radical in how they seek to create distance from liberal individualism (e.g., Christman, 2004; Mhlambi & Tiribelli, 2023; Westlund, 2009). In this view, individual capacity for autonomous choice is shaped by social structures. So freedom becomes realized through networks of care, responsibility, and interdependence.

That’s what I am interested in: accounts of autonomy that are not premised on liberal individualism and that give us some alternative handle on the problem of the social control of technology in general and of AI in particular.

From my point of view, the implications of all this for design and AI include the following.

First, to make a fairly obvious but often overlooked point, the degree to which a given system impacts people’s autonomy depends on various factors. It makes little sense to make blanket statements about AI destroying our autonomy and so on.

Second, in value-sensitive design terms, you can think about autonomy as a value to be balanced against others—in the case where you take the position that all values can be considered equally important, at least in principle. Or you can consider autonomy more like a precondition for people to live with technology in concordance with their values, making autonomy take precedence over other values. The sociopolitical and relational accounts above point in this direction.

Third, suppose you buy into the radical democratic idea of technology and autonomy. In that case, it follows that it makes little sense to admonish individual designers about respecting others’ autonomy. They may be asked to privilege technologies in their designs that afford individual and group autonomy. But designers also need organization and emancipation more often than not. So it’s about building power. The power of workers inside the organizations that develop technologies and the power of communities that “consume” those same technologies. 

With AI, the fact is that, in reality, in the cases I look at, the communities that AI is brought to bear on have little say in the matter. The buyers and deployers of AI could and should be made more accountable to the people subjected to AI.

Democratizing AI Through Continuous Adaptability: The Role of DevOps

Below are the abstract and slides for my contribution to the TILTing Perspectives 2024 panel “The mutual shaping of democratic practices & AI,” moderated by Merel Noorman.

Slides

Abstract

Contestability

This presentation delves into democratizing artificial intelligence (AI) systems through contestability. Contestability refers to the ability of AI systems to remain open and responsive to disputes throughout their lifecycle. It approaches AI systems as arenas where groups compete for power over designs and outcomes.

Autonomy, democratic agency, legitimation

We identify contestability as a critical system quality for respecting people’s autonomy. This includes their democratic agency: their ability to legitimate policies, including policies enacted by AI systems.

For a decision to be legitimate, it must be democratically willed or rely on “normative authority.” The democratic pathway should be constrained by normative bounds to avoid arbitrariness. The appeal to authority should meet the “access constraint,” which ensures citizens can form beliefs about policies with a sufficient degree of agency (Peter, 2020 in Rubel et al., 2021).

Contestability is the quality that ensures mechanisms are in place for subjects to exercise their democratic agency. In the case of an appeal to normative authority, contestability mechanisms are how subjects and their representatives gain access to the information that will enable them to evaluate its justifiability. In this way, contestability satisfies the access constraint. In the case of democratic will, contestability-by-design practices are how system development is democratized. The autonomy account of legitimation adds the normative constraints that should bind this democratic pathway.

Himmelreich (2022) similarly argues that only a “thick” conception of democracy will address some of the current shortcomings of AI development. This is a pathway that not only allows for participation but also includes deliberation over justifications.

The agonistic arena

Elsewhere, we have proposed the Agonistic Arena as a metaphor for thinking about the democratization of AI systems (Alfrink et al., 2024). Contestable AI embodies the generative metaphor of the Arena. This metaphor characterizes public AI as a space where interlocutors embrace conflict as productive. Seen through the lens of the Arena, public AI problems stem from a need for opportunities for adversarial interaction between stakeholders.

This metaphorical framing suggests prescriptions for making the norms and procedures that shape the following more contentious and open to dispute:

  1. AI system design decisions on a global level, and
  2. human-AI system output decisions on a local level (i.e., individual decision outcomes), establishing new dialogical feedback loops between stakeholders that ensure continuous monitoring.

The Arena metaphor encourages a design ethos of revisability and reversibility so that AI systems embody the agonistic ideal of contingency.

Post-deployment malleability, feedback-ladenness

Unlike physical systems, AI technologies exhibit a unique malleability post-deployment.

For example, LLM chatbots optimize their performance based on a variety of feedback sources, including interactions with users, as well as feedback collected through crowd-sourced data work.

Because of this open-endedness, democratic control and oversight in the operations phase of the system’s lifecycle become a particular concern.

This is a concern because while AI systems are dynamic and feedback-laden (Gilbert et al., 2023), many of the existing oversight and control measures are static, one-off exercises that struggle to track systems as they evolve over time.

DevOps

The field of DevOps is pivotal in this context. DevOps focuses on system instrumentation for enhanced monitoring and control for continuous improvement. Typically, metrics for DevOps and their machine learning-specific MLOps offshoot emphasize technical performance and business objectives.

However, there is scope to expand these to include matters of public concern. The matters-of-concern perspective shifts how we look at issues such as fairness or discrimination, viewing them as challenges that cannot be resolved through universal methods with absolute certainty. Rather, it highlights how standards are locally negotiated within specific institutional contexts, emphasizing that such standards are never guaranteed (Lampland & Star, 2009; Geiger et al., 2023).

MLOps Metrics

In the context of machine learning systems, technical metrics focus on model accuracy. For example, a financial services company might use the area under the receiver operating characteristic curve (AUC-ROC) to continuously monitor and maintain the performance of their fraud detection model in production.
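As a rough illustration, a monitoring job along these lines might look like the sketch below. It assumes scikit-learn, a made-up batch of production labels and scores, and an alert threshold that is purely illustrative.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)

# Hypothetical production batch: true fraud labels and the model's scores.
y_true = rng.integers(0, 2, size=1000)
y_score = np.clip(y_true * 0.6 + rng.normal(0.3, 0.2, size=1000), 0, 1)

auc = roc_auc_score(y_true, y_score)
ALERT_THRESHOLD = 0.85  # agreed with the team; illustrative only

if auc < ALERT_THRESHOLD:
    print(f"AUC dropped to {auc:.3f}; flag for investigation")
else:
    print(f"AUC {auc:.3f} within expected range")
```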

Business metrics focus on cost-benefit analyses. For example, a bank might use a cost-benefit matrix to balance the potential revenue from approving a loan against the risk of default, ensuring that the overall profitability of their loan portfolio is optimized.
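A minimal sketch of that kind of cost-benefit reasoning, with entirely made-up payoff numbers, could look like this:

```python
import numpy as np

# Hypothetical payoffs in euros, per decision and outcome.
#                       repays   defaults
cost_benefit = {
    "approve": np.array([500.0, -5000.0]),
    "reject":  np.array([  0.0,     0.0]),
}

def expected_value(p_default: float, decision: str) -> float:
    """Expected payoff of a decision given the model's predicted default risk."""
    outcome_probs = np.array([1 - p_default, p_default])
    return float(cost_benefit[decision] @ outcome_probs)

p_default = 0.08  # model output for one applicant (illustrative)
values = {d: round(expected_value(p_default, d), 2) for d in cost_benefit}
print(values, "->", max(values, key=values.get))  # {'approve': 60.0, 'reject': 0.0} -> approve
```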

Drift

These metrics can be monitored over time to detect “drift” between a model and the world. Training sets are static. Reality is dynamic. It changes over time. Drift occurs when the nature of new input data diverges from the data a model was trained on. A change in performance metrics may be used to alert system operators, who can then investigate and decide on a course of action, e.g., retraining a model on updated data. This, in effect, creates a feedback loop between the system in use and its ongoing development.
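One common way to operationalize this is to compare the distribution of incoming data against a snapshot of the training data and alert when the two diverge. A minimal sketch, assuming SciPy and made-up data for a single feature:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(2)

train_feature = rng.normal(loc=0.0, scale=1.0, size=5000)  # static training snapshot
live_feature = rng.normal(loc=0.4, scale=1.2, size=1000)   # this week's production inputs

# Two-sample Kolmogorov-Smirnov test: how different are the two distributions?
stat, p_value = ks_2samp(train_feature, live_feature)

if p_value < 0.01:  # alerting threshold agreed with operators; illustrative
    print(f"Possible drift (KS={stat:.3f}, p={p_value:.2e}): investigate, consider retraining.")
else:
    print("No significant drift detected.")
```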

An expansion of these practices in the interest of contestability would require:

  1. setting different metrics,
  2. exposing these metrics to additional audiences, and
  3. establishing feedback loops with the processes that govern models and the systems they are embedded in.

Example 1: Camera Cars

Let’s say a city government uses a camera-equipped vehicle and a computer vision model to detect potholes in public roads. In addition to accuracy and a favorable cost-benefit ratio, citizens, and road users in particular, may care about the time between a detected pothole and its fixing. Or, they may care about the distribution of potholes across the city. Furthermore, when road maintenance appears to be degrading, this should be taken up with department leadership, the responsible alderperson, and council members.
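To make this concrete, here is a sketch of what such public-concern metrics could look like in code; the field names and figures are hypothetical:

```python
from collections import Counter
from datetime import date
from statistics import median

detections = [
    {"district": "Noord", "detected": date(2024, 3, 1), "fixed": date(2024, 3, 9)},
    {"district": "Zuid",  "detected": date(2024, 3, 2), "fixed": date(2024, 3, 4)},
    {"district": "Noord", "detected": date(2024, 3, 5), "fixed": None},  # still open
    {"district": "West",  "detected": date(2024, 3, 6), "fixed": date(2024, 3, 20)},
]

# Time between a detected pothole and its fixing.
days_to_fix = [(d["fixed"] - d["detected"]).days for d in detections if d["fixed"]]
print("median days to fix:", median(days_to_fix))

# Distribution of detected potholes across the city.
print("detections per district:", Counter(d["district"] for d in detections))
```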

Example 2: EV Charging

Or, let’s say the same city government uses an algorithmic system to optimize public electric vehicle (EV) charging stations for green energy use by adapting charging speeds to expected sun and wind. EV drivers may want to know how much energy has been shifted to greener time windows, and how that amount trends over time. Without such visibility into a system’s actual goal achievement, citizens’ ability to legitimate its use suffers. As I have already mentioned, democratic agency, when enacted via the appeal to authority, depends on access to “normative facts” that underpin policies. And finally, professed system functionality must be demonstrated as well (Raji et al., 2022).
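Again as a sketch, the goal-achievement metric EV drivers might want to see could be computed along these lines; the charging sessions and the definition of a “green” window are made up:

```python
# Each session: (kWh delivered, hour of day in which it was delivered).
sessions = [(6.0, 13), (4.5, 2), (8.0, 12), (3.0, 19), (5.5, 14)]

GREEN_HOURS = set(range(11, 16))  # assumed sunny midday window; illustrative

green_kwh = sum(kwh for kwh, hour in sessions if hour in GREEN_HOURS)
total_kwh = sum(kwh for kwh, _ in sessions)
print(f"{green_kwh / total_kwh:.0%} of charging energy delivered in green windows")
```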

DevOps as sociotechnical leverage point for democratizing AI

These brief examples show that the DevOps approach is a potential sociotechnical leverage point. It offers pathways for democratizing AI system design, development, and operations.

DevOps can be adapted to further contestability. It creates new channels between human and machine actors. One of DevOps’s essential activities is monitoring (Smith, 2020), which presupposes fallibility, a necessary precondition for contestability. Finally, it requires and provides infrastructure for technical flexibility so that recovery from error is low-cost and continuous improvement becomes practically feasible.

The mutual shaping of democratic practices & AI

Zooming out further, let’s reflect on this panel’s overall theme, picking out three elements: legitimation, representation of marginalized groups, and dealing with conflict and contestation after implementation and during use.

Contestability is a lever for demanding justifications from operators, which is a necessary input for legitimation by subjects (Henin & Le Métayer, 2022). Contestability frames different actors’ stances as adversarial positions on a political field rather than “equally valid” perspectives (Scott, 2023). And finally, relations, monitoring, and revisability are all ways to give voice to and enable responsiveness to contestations (Genus & Stirling, 2018).

And again, all of these things can be furthered in the post-deployment phase by adapting the DevOps lens.

Bibliography

  • Alfrink, K., Keller, I., Kortuem, G., & Doorn, N. (2022). Contestable AI by Design: Towards a Framework. Minds and Machines, 33(4), 613–639. https://doi.org/10/gqnjcs
  • Alfrink, K., Keller, I., Yurrita Semperena, M., Bulygin, D., Kortuem, G., & Doorn, N. (2024). Envisioning Contestability Loops: Evaluating the Agonistic Arena as a Generative Metaphor for Public AI. She Ji: The Journal of Design, Economics, and Innovation, 10(1), 53–93. https://doi.org/10/gtzwft
  • Geiger, R. S., Tandon, U., Gakhokidze, A., Song, L., & Irani, L. (2023). Making Algorithms Public: Reimagining Auditing From Matters of Fact to Matters of Concern. International Journal of Communication, 18(0), Article 0.
  • Genus, A., & Stirling, A. (2018). Collingridge and the dilemma of control: Towards responsible and accountable innovation. Research Policy, 47(1), 61–69. https://doi.org/10/gcs7sn
  • Gilbert, T. K., Lambert, N., Dean, S., Zick, T., Snoswell, A., & Mehta, S. (2023). Reward Reports for Reinforcement Learning. Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society, 84–130. https://doi.org/10/gs9cnh
  • Henin, C., & Le Métayer, D. (2022). Beyond explainability: Justifiability and contestability of algorithmic decision systems. AI & SOCIETY, 37(4), 1397–1410. https://doi.org/10/gmg8pf
  • Himmelreich, J. (2022). Against “Democratizing AI.” AI & SOCIETY. https://doi.org/10/gr95d5
  • Lampland, M., & Star, S. L. (Eds.). (2008). Standards and Their Stories: How Quantifying, Classifying, and Formalizing Practices Shape Everyday Life (1st edition). Cornell University Press.
  • Peter, F. (2020). The Grounds of Political Legitimacy. Journal of the American Philosophical Association, 6(3), 372–390. https://doi.org/10/grqfhn
  • Raji, I. D., Kumar, I. E., Horowitz, A., & Selbst, A. (2022). The Fallacy of AI Functionality. 2022 ACM Conference on Fairness, Accountability, and Transparency, 959–972. https://doi.org/10/gqfvf5
  • Rubel, A., Castro, C., & Pham, A. K. (2021). Algorithms and autonomy: The ethics of automated decision systems. Cambridge University Press.
  • Scott, D. (2023). Diversifying the Deliberative Turn: Toward an Agonistic RRI. Science, Technology, & Human Values, 48(2), 295–318. https://doi.org/10/gpk2pr
  • Smith, J. D. (2020). Operations anti-patterns, DevOps solutions. Manning Publications.
  • Treveil, M. (2020). Introducing MLOps: How to scale machine learning in the enterprise (First edition). O’Reilly.

Design and machine learning – an annotated reading list

Earlier this year I coached Design for Interaction master students at Delft University of Technology in the course Research Methodology. The students organised three seminars for which I provided the claims and assigned reading. In the seminars they argued about my claims using the Toulmin Model of Argumentation. The readings served as sources for backing and evidence.

The claims and readings were all related to my nascent research project about machine learning. We delved into both designing for machine learning, and using machine learning as a design tool.

Below are the readings I assigned, with some notes on each, which should help you decide if you want to dive into them yourself.

Hebron, Patrick. 2016. Machine Learning for Designers. Sebastopol: O’Reilly.

The only non-academic piece in this list. This served the purpose of getting all students on the same page with regard to what machine learning is, its applications in interaction design, and the common challenges encountered. I still can’t think of any other single resource that is as good a starting point for the subject as this one.

Fiebrink, Rebecca. 2016. “Machine Learning as Meta-Instrument: Human-Machine Partnerships Shaping Expressive Instrumental Creation.” In Musical Instruments in the 21st Century, 14:137–51. Singapore: Springer Singapore. doi:10.1007/978-981-10-2951-6_10.

Fiebrink’s Wekinator is groundbreaking, fun and inspiring so I had to include some of her writing in this list. This is mostly of interest for those looking into the use of machine learning for design and other creative and artistic endeavours. An important idea explored here is that tools that make use of (interactive, supervised) machine learning can be thought of as instruments. Using such a tool is like playing or performing, exploring a possibility space, engaging in a dialogue with the tool. For a tool to feel like an instrument requires a tight action-feedback loop.

Dove, Graham, Kim Halskov, Jodi Forlizzi, and John Zimmerman. 2017. UX Design Innovation: Challenges for Working with Machine Learning as a Design Material. The 2017 CHI Conference. New York, New York, USA: ACM. doi:10.1145/3025453.3025739.

A really good survey of how designers currently deal with machine learning. Key takeaways include that in most cases, the application of machine learning is still engineering-led as opposed to design-led, which hampers the creation of non-obvious machine learning applications. It also makes it hard for designers to consider ethical implications of design choices. A key reason for this is that at the moment, prototyping with machine learning is prohibitively cumbersome.

Fiebrink, Rebecca, Perry R Cook, and Dan Trueman. 2011. “Human Model Evaluation in Interactive Supervised Learning.” In, 147. New York, New York, USA: ACM Press. doi:10.1145/1978942.1978965.

The second Fiebrink piece in this list, which is more of a deep dive into how people use Wekinator. As with the chapter listed above this is required reading for those working on design tools which make use of interactive machine learning. An important finding here is that users of intelligent design tools might have very different criteria for evaluating the ‘correctness’ of a trained model than engineers do. Such criteria are likely subjective and evaluation requires first-hand use of the model in real time.

Bostrom, Nick, and Eliezer Yudkowsky. 2014. “The Ethics of Artificial Intelligence.” In The Cambridge Handbook of Artificial Intelligence, edited by Keith Frankish and William M Ramsey, 316–34. Cambridge: Cambridge University Press. doi:10.1017/CBO9781139046855.020.

Bostrom is known for his somewhat crazy but thought-provoking book on superintelligence, and although a large part of this chapter is about the ethics of general artificial intelligence (which at the very least is still some way off), the first section discusses the ethics of current “narrow” artificial intelligence. It makes for a good checklist of things designers should keep in mind when they create new applications of machine learning. Key insight: when a machine learning system takes on work with social dimensions—tasks previously performed by humans—the system inherits its social requirements.

Yang, Qian, John Zimmerman, Aaron Steinfeld, and Anthony Tomasic. 2016. Planning Adaptive Mobile Experiences When Wireframing. The 2016 ACM Conference. New York, New York, USA: ACM. doi:10.1145/2901790.2901858.

Finally, a feet-in-the-mud exploration of what it actually means to design for machine learning with the tools most commonly used by designers today: drawings and diagrams of various sorts. In this case the focus is on using machine learning to make an interface adaptive. It includes an interesting discussion of how to balance the use of implicit and explicit user inputs for adaptation, and how to deal with inference errors. Once again, the limitations of current sketching and prototyping tools are mentioned, and related to the need for designers to develop tacit knowledge about machine learning. Such tacit knowledge will only be gained when designers can work with machine learning in a hands-on manner.

Supplemental material

Floyd, Christiane. 1984. “A Systematic Look at Prototyping.” In Approaches to Prototyping, 1–18. Berlin, Heidelberg: Springer Berlin Heidelberg. doi:10.1007/978-3-642-69796-8_1.

I provided this to students so that they get some additional grounding in the various kinds of prototyping that are out there. It helps to prevent reductive notions of prototyping, and it makes for a nice complement to Buxton’s work on sketching.

Blevis, E, Y Lim, and E Stolterman. 2006. “Regarding Software as a Material of Design.”

Some of the papers refer to machine learning as a “design material” and this paper helps to understand what that idea means. Software is a material without qualities (it is extremely malleable, it can simulate nearly anything). Yet, it helps to consider it as a physical material in the metaphorical sense because we can then apply ways of design thinking and doing to software programming.

‘Machine Learning for Designers’ workshop

On Wednesday Péter Kun, Holly Robbins and myself taught a one-day workshop on machine learning at Delft University of Technology. We had about thirty master’s students from the industrial design engineering faculty. The aim was to get them acquainted with the technology through hands-on tinkering with the Wekinator as central teaching tool.

Photo credits: Holly Robbins

Background

The reasoning behind this workshop is twofold.

On the one hand I expect designers will find themselves working on projects involving machine learning more and more often. The technology has certain properties that differ from traditional software. Most importantly, machine learning is probabilistic instead of deterministic. It is important that designers understand this because otherwise they are likely to make bad decisions about its application.

The second reason is that I have a strong sense machine learning can play a role in the augmentation of the design process itself. So-called intelligent design tools could make designers more efficient and effective. They could also enable the creation of designs that would otherwise be impossible or very hard to achieve.

The workshop explored both ideas.

Photo credits: Holly Robbins

Format

The structure was roughly as follows:

In the morning we started out providing a very broad introduction to the technology. We talked about the very basic premise of (supervised) learning. Namely, providing examples of inputs and desired outputs and training a model based on those examples. To make these concepts tangible we then introduced the Wekinator and walked the students through getting it up and running using basic examples from the website. The final step was to invite them to explore alternative inputs and outputs (such as game controllers and Arduino boards).
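For readers who want that premise in code form: the sketch below does in batch what the Wekinator does interactively, using scikit-learn and made-up “sensor” readings as example inputs with desired outputs.

```python
from sklearn.neighbors import KNeighborsClassifier

# Example inputs (two imaginary sensor readings) and desired outputs (labels).
X_train = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
y_train = ["rest", "rest", "wave", "wave"]

# Train a model on those examples, then ask it about a new, unseen input.
model = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
print(model.predict([[0.85, 0.75]]))  # -> ['wave']
```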

In the afternoon we provided a design brief, asking the students to prototype a data-enabled object with the set of tools they had acquired in the morning. We assisted with technical hurdles where necessary (of which there were more than a few) and closed out the day with demos and a group discussion reflecting on their experiences with the technology.

Photo credits: Holly Robbins

Results

As I tweeted on the way home that evening, the results were… interesting.

Not all groups managed to put something together in the admittedly short amount of time they were provided with. They were most often stymied by getting an Arduino to talk to the Wekinator. Max was often picked as a go-between because the Wekinator receives OSC messages over UDP, whereas the quickest way to get an Arduino to talk to a computer is over serial. But Max in my experience is a fickle beast and would more than once crap out on us.
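For what it’s worth, a small serial-to-OSC bridge could stand in for Max in this setup. The sketch below assumes the pyserial and python-osc packages, the Wekinator’s default OSC input port and address, and a placeholder serial device path; adjust all of these to your own setup.

```python
import serial  # pyserial
from pythonosc.udp_client import SimpleUDPClient  # python-osc

arduino = serial.Serial("/dev/ttyACM0", 9600, timeout=1)  # placeholder device path
wekinator = SimpleUDPClient("127.0.0.1", 6448)            # Wekinator default (check your settings)

while True:
    # Expect the Arduino to print comma-separated sensor values, one line per reading.
    line = arduino.readline().decode("ascii", errors="ignore").strip()
    if not line:
        continue
    try:
        values = [float(v) for v in line.split(",")]
    except ValueError:
        continue  # skip malformed lines
    wekinator.send_message("/wek/inputs", values)  # Wekinator's default input address
```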

The groups that did build something mainly assembled prototypes from the examples on hand. Which is fine, but since we were mainly working with the examples from the Wekinator website they tended towards the interactive instrument side of things. We were hoping for explorations of IoT product concepts. For that more hand-rolling was required and this was only achievable for the students on the higher end of the technical expertise spectrum (and the more tenacious ones).

The discussion yielded some interesting insights into mental models of the technology and how they are affected by hands-on experience. A comment I heard more than once was: Why is this considered learning at all? The Wekinator was not perceived to be learning anything. When challenged on this by reiterating the underlying principles it became clear the black box nature of the Wekinator hampers appreciation of some of the very real achievements of the technology. It seems (for our students at least) machine learning is stuck in a grey area between too-high expectations and too-low recognition of its capabilities.

Next steps

These results, and others, point towards some obvious improvements which can be made to the workshop format, and to teaching design students about machine learning more broadly.

  1. We can improve the toolset so that some of the heavy lifting involved with getting the various parts to talk to each other is made easier and more reliable.
  2. We can build examples that are geared towards the practice of designing IoT products and are ready for adaptation and hacking.
  3. And finally, and probably most challengingly, we can make the workings of machine learning more transparent so that it becomes easier to develop a feel for its capabilities and shortcomings.

We do intend to improve and teach the workshop again. If you’re interested in hosting one (either in an educational or professional context) let me know. And stay tuned for updates on this and other efforts to get designers to work in a hands-on manner with machine learning.

Special thanks to the brilliant Ianus Keller for connecting me to Péter and for allowing us to pilot this crazy idea at IDE Academy.

References

Sources used during preparation and running of the workshop:

  • The Wekinator – the UI is infuriatingly poor but when it comes to getting started with machine learning this tool is unmatched.
  • Arduino – I have become particularly fond of the MKR1000 board. Add a lithium-polymer battery and you have everything you need to prototype IoT products.
  • OSC for Arduino – CNMAT’s implementation of the open sound control (OSC) encoding. Key puzzle piece for getting the above two tools talking to each other.
  • Machine Learning for Designers – my preferred introduction to the technology from a designerly perspective.
  • A Visual Introduction to Machine Learning – a very accessible visual explanation of the basic underpinnings of computers applying statistical learning.
  • Remote Control Theremin – an example project I prepared for the workshop demoing how to have the Wekinator talk to an Arduino MKR1000 with OSC over UDP.

Design × AI coffee meetup

If you work in the field of design or artificial intelligence and are interested in exploring the opportunities at their intersection, consider yourself invited to an informal coffee meetup on February 15, 10am at Brix in Amsterdam.

Erik van der Pluijm and myself have for a while now been carrying on a conversation about AI and design and we felt it was time to expand the circle a bit. We are very curious who else out there shares our excitement.

Questions we are mulling over include: How does the design process change when creating intelligent products? And: How can teams collaborate with intelligent design tools to solve problems in new and interesting ways?

Anyway, lots to chew on.

No need to sign up or anything, just show up and we’ll see what happens.

High-skill robots, low-skill workers

Some notes on what I think I understand about technology and inequality.

Let’s start with an obvious big question: is technology destroying jobs faster than they can be replaced? Over the long term, the evidence isn’t strong. Humans always appear to invent new things to do. There is no reason this time around should be any different.

But in the short term technology has contributed to an evaporation of mid-skilled jobs. Parts of these jobs are automated entirely; other parts can be done by fewer people because of the higher productivity gained from tech.

While productivity continues to grow, jobs are lagging behind. The year 2000 appears to have been a turning point. “Something” happened around that time. But no-one knows exactly what.

My hunch is that we’ve seen an emergence of a new class of pseudo-monopolies. Oligopolies. And this is compounded by a ‘winner takes all’ dynamic that technology seems to produce.

Others have pointed to globalisation but although this might be a contributing factor, the evidence does not support the idea that it is the major cause.

So what are we left with?

Historically, looking at previous technological upsets, it appears education makes a big difference. People negatively affected by technological progress should have access to good education so that they have options. In the US, access to high-quality education is not equally distributed.

Apparently family income is associated with educational achievement. So if your family is rich, you are more likely to become a high-skilled individual. And high-skilled individuals are privileged by the tech economy.

And if Piketty is right, we are approaching a reality in which money made from wealth rises faster than wages. So there is a feedback loop in place which only exacerbates the situation.

One more bullet: if you think trickle-down economics (increasing the size of the pie) will help, you might be mistaken. It appears social mobility is helped more by decreasing inequality in the distribution of income growth.

So some preliminary conclusions: a progressive tax on wealth won’t solve the issue. The education system will require reform, too.

I think this is the central irony of the whole situation: we are working hard to teach machines how to learn. But we are neglecting to improve how people learn.

Move 37

Designers make choices. They should be able to provide rationales for those choices. (Although sometimes they can’t.) Being able to explain the thinking that went into a design move to yourself, your teammates and clients is part of being a professional.

Move 37. This was the move AlphaGo made in its 2016 match against Lee Sedol which took everyone by surprise because it appeared so wrong at first.

The interesting thing is that in hindsight it appeared AlphaGo had good reasons for this move. Based on a calculation of odds, basically.

If asked at the time, would AlphaGo have been able to provide this rationale?

It’s a thing that pops up in a lot of the reading I am doing around AI. This idea of transparency. In some fields you don’t just want an AI to provide you with a decision, but also with the arguments supporting that decision. Obvious examples would include a system that helps diagnose disease. You want it to provide more than just the diagnosis. Because if it turns out to be wrong, you want to be able to say why at the time you thought it was right. This is a social, cultural and also legal requirement.

It’s interesting.

Although lives don’t depend on it, the same might apply to intelligent design tools. If I am working with a system and it is offering me design directions or solutions, I want to know why it is suggesting these things as well. Because my reason for picking one over the other depends not just on the surface level properties of the design but also the underlying reasons. It might be important because I need to be able to tell stakeholders about it.

An added side effect of this is that a designer working with such a system would be exposed to machine reasoning about design choices. This could inform their own future thinking too.

Transparent AI might help people improve themselves. A black box can’t teach you much about the craft it’s performing. Looking at outcomes can be inspirational or helpful, but the processes that lead up to them can be equally informative. If not more so.

Imagine working with an intelligent design tool and getting the equivalent of an AlphaGo move 37 moment. Hugely inspirational. Game changer.

This idea gets me much more excited than automating design tasks does.