On how to think about large language models

How should we think about large language models (LLMs)? People commonly think and talk about them in terms of human intelligence. To the extent this metaphor does not accurately reflect the properties of the technology, it may lead to misguided diagnoses and prescriptions. An LLM is unlike a human or a human brain in many ways. One crucial distinction for me is that LLMs lack individuality and subjectivity.

What are organisms that similarly lack these qualities? Coral polyps and the Portuguese man o' war come to mind, or slime mold colonies. Or maybe a single bacterium, like an E. coli. Each is essentially identical to its clones, responds automatically to chemical gradients (bringing to mind how LLMs respond to prompts), and doesn't accumulate unique experiences in any meaningful way.

Considering all these examples, the meme about LLMs being like a shoggoth (an amorphous, blob-like monster originating in the speculative fiction of Howard Phillips Lovecraft) is surprisingly accurate. The trouble with these metaphors, though, is that reasoning about such organisms is about as hard as reasoning about LLMs, so using them as a vehicle for thinking about LLMs won't work. A shoggoth is even less helpful, because the reference will only be familiar to those who know their H.P. Lovecraft.

So perhaps we should abandon metaphorical thinking and think historically instead. LLMs are a new language technology. As with previous technologies, such as the printing press, their introduction changes our relationship to language. How does this change occur?

I think the change is dialectical. First, we have a relationship to language that we recognize as our own. Then, a new technology destabilizes this relationship, alienating us from the language practice. We no longer see our own hand in it, and we experience a lack of control over language practice. Finally, we reappropriate this language use in our practices. In this process of reappropriation, language practice as a whole is transformed. And the cycle begins again.

For an example of this dialectical transformation of language practice under the influence of new technology, we can take Eisenstein's classic account of the history of the printing press (1980). Following its introduction, many things changed about how we relate to language. Our engagement with language shifted from a primarily oral one to a visual and deliberative one. Libraries became more abundantly stocked, leading to the practice of categorization and classification of works. Preservation and analysis of stable texts became possible. The solitary reading experience gained prominence, producing a more private and personal relationship between readers and texts. Concerns about information overload first reared their head. All of these things were once new and alien to humans; now we consider them part of the natural order of things. They weren't predetermined by the technology; they emerged through an active tug of war between groups in society over what the technology would be used for, mediated by the affordances of the technology itself.

In concrete material terms, what does an LLM consist of? An LLM is just numerical values stored in computer memory. It is a neural network architecture consisting of billions of parameters, in weights and biases, organized in matrices. The storage is distributed across multiple devices. System software loads these parameters and performs the inference calculations. This all runs in physical data centers housing computing, power, cooling, and networking infrastructure. Whenever people start talking about LLMs having agency or being able to reason, I remind myself of these basic facts.
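To make this concrete, here is a toy sketch of what "parameters organized in matrices" plus "inference calculations" amounts to. Every name and size below is an illustrative invention, not taken from any real model; a production LLM has billions of parameters and a far more elaborate architecture, but materially it is still this kind of arithmetic over stored numbers.

```python
import numpy as np

# A toy "language model": nothing but arrays of numbers plus arithmetic.
# All names and sizes are made up for illustration.
rng = np.random.default_rng(0)
vocab_size, d_model = 50, 16

embedding = rng.normal(size=(vocab_size, d_model))  # token id -> vector
hidden_w = rng.normal(size=(d_model, d_model))      # one layer of weights...
hidden_b = np.zeros(d_model)                        # ...and biases
unembed = rng.normal(size=(d_model, vocab_size))    # vector -> vocab scores

def next_token_probs(token_ids):
    """Score every vocabulary item as a candidate next token."""
    x = embedding[token_ids].mean(axis=0)   # pool the context vectors
    h = np.tanh(x @ hidden_w + hidden_b)    # affine map plus nonlinearity
    logits = h @ unembed
    exp = np.exp(logits - logits.max())     # softmax -> probabilities
    return exp / exp.sum()

probs = next_token_probs([3, 14, 7])        # a "prompt", as token ids
print(int(probs.argmax()))                  # the "prediction": an arg-max
```

Whatever one makes of claims about reasoning or agency, at this level the system is matrix multiplications producing a probability distribution over the next token.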

A printing press, although a cleverly designed, engineered, and manufactured device, is similarly banal when you break it down into its essential components. Still, the ultimate changes to how we relate to language have been profound. From these first few years of living with LLMs, I think it is not unreasonable to expect they will cause similar upheavals. What is important for me is to recognize how we become alienated from language, and to see ourselves as having agency in reappropriating LLM-mediated language practice as our own.

Waiting for the smart city

Nowadays when we talk about the smart city, we don't necessarily talk about smartness or cities.

I feel like when the term is used, it often obscures more than it reveals.

Here are a few reasons why.

To begin with, the term suggests something that is yet to arrive. Some kind of tech-enabled utopia. But actually, current-day cities are already smart to a greater or lesser degree, depending on where and how you look.

This is important because too often we postpone action as we wait for the smart city to arrive. We don't have to wait. We can act to improve things right now.

Furthermore, 'smart city' suggests something monolithic that can be designed as a whole. But a smart city, like any city, is a huge mess of interconnected things. It resists top-down design.

History is littered with failed attempts at authoritarian high-modernist city design. Just stop it.

Smartness should not be an end but a means.

I read 'smart' as shorthand for 'technologically augmented'. A smart city is a city eaten by software. All cities are being eaten (or have been eaten) by software to a greater or lesser extent. Uber and Airbnb are obvious examples. Smaller, more subtle ones abound.

The question is, smart to what end? Efficiency? Legibility? Controllability? Anti-fragility? Playability? Liveability? Sustainability? The answer depends on your outlook.

These are ways in which the smart city label obscures. It obscures agency. It obscures networks. It obscures intent.

I'm not saying don't ever use it. But in many cases you can get by without it. You can talk about the specific parts that make up the whole of a city, specific technologies, and specific aims.


Postscript 1

We can do the same exercise with the 'city' part of the meme.

The same process that is making cities smart (software eating the world) is also making everything else smart. Smart towns. Smart countrysides. The ends are different. The networks are different. The processes play out in different ways.

It's okay to think about cities, but don't think they have a monopoly on 'disruption'.

Postscript 2

Some of this was inspired by clever things I heard Sebastian Quack say at Playful Design for Smart Cities and Usman Haque at ThingsCon Amsterdam.

Reboot 9.0 day 1

So here’s a short wrap up of the first day. I must say I’m not dis­ap­point­ed so far. The over­all lev­el of the talks is quite high again. Here’s what I attended:

Opening keynote — Nice and conceptual/theoretical. Not sure I agree with all the claims made, but it was a good way to kick off the day in a gee-whizz way.

Jeremy Keith — Good talk, nice slides; didn't really deliver on the promise of his proposal though. I would've really liked to see him go further into the whole idea of lifestreams. The hack day challenge sounded cool though.

Stephanie Booth — Very topical for me, being a bilingual blogger and designer often confronted with localisation/multilingual issues.

My own talk — Went reasonably well. I guess half of the room enjoyed it and the other half wondered what the f*** I was talking about. Oh well, I had fun.

Ross Mayfield — Could have been much better if it hadn't been for technical screw-ups and perhaps some tighter pacing by Ross. Still, the work he's doing with social software is great.

Matt Jones — Very pretty presentation, nice topic, and Dopplr looks cool. I'm not a frequent flyer, but I can see the value in it. Still not quite sure it will improve the consequences of air travel though.

Nicolas Nova — Came across as the high-concept, theoretical twin to my talk. Lots of cool pervasive game examples. Nicolas always boggles my mind.

Jyri Engeström — Cool to see how he's developed his talks throughout the past Reboots. I guess he delivered on his promise and stayed on the right side of the 'I'm pushing my product' line.

The evening program — No micro-presentations (which, to be honest, was fine by me, being quite exhausted). Good food, nice conversations, and plenty of weird generative art, live cinema, etc. All good.

On to day 2!