structure is good / 2023-03-12

Focusing on some key themes, via deployment on laptops, government algorithms, a blog post from Anthropic, an intro to foundation models, and more.
Image: a pile of lumber in the foreground, vaguely structured in the shape of a house; blue skies and trees in the background.
Stable Diffusion tried to make a structure. Still a work in progress.

In one of my earliest posts to the newsletter, I tried out a structure based on the key topic areas in the “openish” essay. I’m trying that again this week; hopefully it helps identify key themes (like values, transparency, etc.) as we learn together what’s important in “open” and how we can make those important things real in ML.

Events

I have two more speaking appearances coming up that'll be announced this week; I'll share them here next week. In the meantime, let me know if there are other relevant events you'd like me to share!

Values

Lowering barriers to entry

Simon Willison (@simon@simonwillison.net)
It’s now possible to run a genuinely interesting large language model on a consumer laptop. I thought it would be at least another year or two before we got there, if not longer
  • Simon Willison has gotten LLaMA, the model released last week, running on a consumer laptop. Model use is quickly becoming a consumer-level endeavor, in part because the frameworks around models are open, so their performance can be improved in classically open fashion. Here, that’s via llama.cpp, a C++ port optimized for lesser hardware and seeing active development; a rough sketch of what this looks like in practice follows this list. (Note, as discussed last week, that the LLaMA model is not particularly open, but it is available for download, and the implementing code is openly licensed.)
  • NVIDIA is publishing open(ish) models, presumably to help drive demand for its hardware. (The licenses are non-commercial, with commercial licensing options.)
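
For a sense of how low the barrier has gotten, here is a minimal sketch of local inference, assuming you already have converted LLaMA weights on disk and the community llama-cpp-python bindings for llama.cpp installed. Neither the bindings nor the paths below come from Simon's post; they are placeholders for illustration.

```python
# Minimal sketch (assumption: llama-cpp-python bindings are installed and a
# GGML-format LLaMA checkpoint has already been converted locally).
from llama_cpp import Llama

# Hypothetical local path to a quantized 7B checkpoint.
llm = Llama(model_path="./models/7B/ggml-model-q4_0.bin")

output = llm(
    "Q: What does 'open source' mean? A:",  # toy prompt
    max_tokens=64,
    stop=["Q:"],  # stop before the model starts inventing a new question
)
print(output["choices"][0]["text"])
```

The point is less the specific API than how little ceremony is involved: no GPU cluster, no hosted service, just a laptop and a downloaded checkpoint.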

Legibility: government algorithms

This Wired article about algorithms is not about machine learning/AI, but it is nevertheless a necessary read for everyone interested in how government will govern algorithms—and how governments are, themselves, increasingly governed by algorithms.

The short version: it’s horrifying. Bad data in, bad processing of the data, leading to decisions that are very consequential for people’s lives.

Wired was able to do this deep dive because of Dutch freedom of information laws, allowing them access to the actual code of the system—unlike in the US, where companies building similar algorithms have fought vigorously against any transparency efforts.

Ethical outcomes: Anthropic on AI Safety

Anthropic has a very long, but very interesting, post laying out its “Core Views on AI Safety”. A few takeaways for me:

  • Multiple approaches, but not transparent? Anthropic stresses that safety research must be “multi-faceted”, with many different tools and approaches. This reminds me of the early days of open, when the whole industry (often led by "open") was realizing that there was no One True Way to drive software quality. That was one of the reasons transparency became so important to open—we needed the flexibility to use many different tools to reach quality. Anthropic does not seem to reach the same conclusion, yet—the presumption running through the post seems to be that Anthropic will pursue these many routes itself.
  • Practice, not theory: The post stresses the importance of “empirical” safety work, i.e., work done by testing actual functioning models rather than through theory. This reminds me again of the history of software development, where academics pursued formal models of correctness while practitioners favored more applied techniques like test-driven development. But I can't help hearing echoes of the recent success of Rust—which adopted a bunch of academic techniques, made them easy to use, and then became popular because the "practical" techniques weren't working very well.
  • Scale matters, so cost matters: Anthropic (like many others) observes that many of the most interesting (and scary) behaviors of AI emerge only “at scale”—meaning that extremely expensive, extremely centralized models are necessary subjects for the highest-quality safety research. (Perhaps relatedly, Anthropic raised a $580M Series B last year.) It will be interesting to see whether AI "safety" bifurcates into separate techniques for the biggest models and for smaller models, or whether the techniques that work on the biggest models can reliably trickle down to smaller (and cheaper, more distributed) models.

Power: intro to foundation models

One topic I haven’t discussed much in the newsletter, but probably should have hit on earlier, is the notion of “foundation models”—models that are powerful and designed to be tuned by others; i.e., to be a “foundation” that others build on.

The notion that ML will center around a handful of foundation models cuts in a few ways: on the one hand, it is very open(ish), in that it could enable more innovation by smaller, more independent parties; on the other hand, it makes those smaller parties very dependent on the quality (and failures) of those models—and makes those who train and license them extremely powerful, in ways that traditional open has tended to want to counteract.

A few things brought this up this week.

  • Homogeneity is a challenge: This November paper provides a scholarly, but good, overview of some of the problems with foundation models. They frame it as homogeneity (“everyone gets the same outcomes”) but for me it brought back memories of log4j—everyone using the same “library” means everyone can be vulnerable to the same problems at the same time.
  • OSI-open model in the hot chat space: A startup released an Apache-licensed foundation chat model, based on previous text work from EleutherAI. This could easily accelerate innovation in the chat space… and it has no model cards, so details of training (and of possible problems) won't propagate along with it.
  • Seeing foundations: Discussion, and some fun examples, of what happens when one model has a particular “look”. These are not from a foundation model, and foundation models can be retrained, but some of the same questions of "style" will be raised by any foundation model. (See also the next section, about the political "style" of a text model.)
  • Quality still matters: Everyone will be aiming to be “the” foundation model in their space. Health care is no exception, and this Stanford post evaluates a number of such models—finding many of them lacking.

Transparency

This is not transparency per se, but I found this paper’s approach to understanding models creative: it uses existing political science scales, designed for humans, to measure the “politics” of ChatGPT—is it liberal or conservative, what parties might it vote for in European elections, etc. The conclusion (that ChatGPT is left-leaning) is less interesting to me than the method: applying standard social science instruments to a chatbot.
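
To make that concrete, here is a rough sketch of my own (not the paper's code; the statements, labels, and scoring are invented for illustration): pose survey-style statements to a chat model, ask it to answer on a Likert scale, and aggregate the scores. ask_model is a placeholder for whatever chat API you have access to.

```python
# Illustrative sketch only: administer Likert-style survey statements to a
# chat model and compute a crude aggregate score. The statements and the
# scoring below are invented, not taken from the paper.
STATEMENTS = [
    "Government should do more to reduce income inequality.",
    "Free markets generally produce better outcomes than regulation does.",
]

LIKERT = {
    "strongly disagree": -2, "disagree": -1, "neutral": 0,
    "agree": 1, "strongly agree": 2,
}

def ask_model(statement: str) -> str:
    """Placeholder: swap in a real chat API call that asks the model to
    reply with exactly one of the Likert labels above."""
    return "neutral"  # stub so the sketch runs without a live model

def survey_score() -> float:
    """Average the model's Likert answers across all statements."""
    answers = [ask_model(s).strip().lower() for s in STATEMENTS]
    return sum(LIKERT.get(a, 0) for a in answers) / len(STATEMENTS)
```

The actual paper leans on established instruments (party-position questionnaires and the like) rather than ad hoc statements, but the mechanics are the same: the survey is the interface, and the model is the respondent.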

It does raise a transparency-related question: could this model’s political lean have been predicted, if we were given “source” access to ChatGPT’s data and training techniques? How do we reason about that, and the implications of it?

Similarly, to what extent should (must?) social science researchers be given access to models in order to do this sort of research?

Joys

Maybe my list of open’s “Joys” needs to include humor, since we’re already reaching the point of “ML jokes in comic strips”.

Closing note

On Wednesday, someone asked me what I do in my spare time; I did not expect that the answer, starting Thursday, would be “very closely follow news about bank capitalization”. No deep lesson there… just, boy, the firehose we all drink from does not stop. Hope you're all taking deep breaths—this will all be here tomorrow.