On how to think about large language models

How should we think about large language models (LLMs)? People commonly think and talk about them in terms of human intelligence. To the extent that this metaphor does not accurately reflect the properties of the technology, it may lead to misguided diagnoses and prescriptions. It seems to me that an LLM is unlike a human or a human brain in many ways. One crucial distinction for me is that LLMs lack individuality and subjectivity.

What are organisms that similarly lack these qualities? Coral polyps and the Portuguese man o’ war come to mind, or slime mold colonies. Or maybe a single bacterium, such as E. coli. Each is essentially identical to its clones, responds automatically to chemical gradients (bringing to mind how LLMs respond to prompts), and doesn’t accumulate unique experiences in any meaningful way.

Considering all these examples, the meme about LLMs being like a shoggoth (an amorphous, blob-like monster originating in the speculative fiction of Howard Phillips Lovecraft) is surprisingly accurate. The trouble with these metaphors, though, is that such organisms are about as hard to reason about as LLMs themselves. So using them as metaphors for thinking about LLMs won’t work. A shoggoth is even less helpful, because the reference will only be familiar to those who know their H.P. Lovecraft.

So perhaps we should abandon metaphorical thinking and think historically instead. LLMs are a new language technology. As with previous language technologies, such as the printing press, their introduction changes our relationship to language. How does this change occur?

I think the change is dialectical. First, we have a relationship to language that we recognize as our own. Then, a new technology destabilizes this relationship, alienating us from the language practice. We no longer see our own hand in it. And we experience a lack of control over language practice. Finally, we reappropriate this language use in our practices. In this process of reappropriation, language practice as a whole is transformed. And the cycle begins again.

For an example of this dialectical transformation of language practice under the influence of new technology, we can take Eisenstein’s classic account of the history of the printing press (1980). Following its introduction, many things changed about how we relate to language. Our engagement with language shifted from a primarily oral one to a visual and deliberative one. Libraries became more abundantly stocked, leading to the practice of categorizing and classifying works. The preservation and analysis of stable texts became possible. The solitary reading experience gained prominence, producing a more private and personal relationship between readers and texts. Concerns about information overload first reared their head. All of these things were once new and alien to humans. Now we consider them part of the natural order of things. They weren’t predetermined by the technology; they emerged through an active tug of war between groups in society over what the technology would be used for, mediated by the affordances of the technology itself.

In concrete material terms, what does an LLM consist of? An LLM is just numerical values stored in computer memory. These values are the parameters of a neural network architecture: billions of weights and biases, organized in matrices. The storage is distributed across multiple devices. System software loads these parameters and performs the calculations that produce inferences. This all runs in physical data centers housing computing, power, cooling, and networking infrastructure. Whenever people start talking about LLMs having agency or being able to reason, I remind myself of these basic facts.
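To make the "just numbers" point tangible, here is a toy sketch in Python. It is an illustration under loud assumptions, not how any real LLM is implemented: the values are made up, the names `weights`, `biases`, and `forward` are hypothetical, and a real model has billions of parameters across many layers rather than six numbers in one.

```python
# A toy "model": nothing but numbers held in memory (made-up values).
weights = [[0.5, -1.0],
           [2.0, 0.25]]   # a 2x2 weight matrix
biases = [0.25, -0.5]     # one bias per output

def forward(inputs):
    """Compute one layer: multiply by the weights, add the biases,
    and clip negative results to zero (a ReLU activation)."""
    outputs = []
    for row, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(max(0.0, total))
    return outputs

print(forward([1.0, 2.0]))  # prints [0.0, 2.0]
```

Everything the function "does" is multiplication, addition, and a comparison over stored values. Scaling this up changes the quantity of arithmetic, not its kind.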

A printing press, although a cleverly designed, engineered, and manufactured device, is similarly banal when you break it down to its essential components. Still, the ultimate changes to how we relate to language have been profound. After these first few years of living with LLMs, it seems reasonable to expect that they will cause similar upheavals. What is important for me is to recognize how we become alienated from language, and to see ourselves as having agency in reappropriating LLM-mediated language practice as our own.

Spatial metaphors in IA and game design

Looking at the dominant metaphors in the different design disciplines I’m in some way involved in, it’s obvious to me that most are spatial (no surprises there). Here are some thoughts on how I think this is (or should be) changing. Information architecture tends to approach sites as information spaces (although the web 2.0 hype has brought us a few ‘new’ ones, on which more later). I do a lot of IA work. I have also done quite a bit of game design (and am now re-entering that field as a teacher). Some of the designers in that field I admire the most (such as Molyneux and Wright) approach games from a more or less spatial standpoint too (and not from a narrative perspective, as the vast majority do). Sid Meier is usually credited with the line that a game is a series of interesting choices. Wright tends to call games ‘possibility spaces’, where a player can explore a number of different solutions to a problem, more than one of which can be viable.

I don’t think I’m going anywhere in particular here, but to return to IA: as I said, the field is currently coming to terms with new ways of looking at the web and web sites, such as the web as a network, the web as platform, the web of data, and so on. Some of these might benefit from a more procedural, i.e. game design-like, stance. I seem to remember Jesse James Garrett giving quite some attention to what he calls ‘algorithmic architecture’ (using Amazon as an example), where the IA is actually creating something akin to a possibility space for the user to explore.

Perhaps when we see more cross-pollination between game design, information architecture, and interaction design for the web, we’ll end up with more and more sites that are not only more like desktop applications (the promise of RIAs) but also more like games. Wouldn’t that be fun and interesting?