
Policy and more / 2022-10-07

Notes on new government initiatives, how ML is reminding me of late-90s open, open data, and more.
AI-generated image reminiscent of oil-painted nurses over a bed.

Probably a light newsletter this week because of personal stuff. But another very busy week around this space.

Open-as-policy

This probably is obvious but I think worth saying out loud at least once: part of the vision of open that I like is very much an open that helps us build a more ethical world. That means government policy around AI is very much on topic here.

As noted last week, the EU advanced a proposed regulation on AI liability, and this week it is the Biden administration's turn to publish a "Blueprint for an AI Bill of Rights", subtitled "making automated systems work for the American people". Critically, this is not a proposed regulation; it's at some level just a (very elaborate) think piece, and has no regulatory or other force. But (especially if Democrats retain Congress and/or the White House) the thinking here would likely guide future US regulation. There's a lot of depth worth looking into here (including great examples of what drove their thinking) but the key rights are:

  • to be protected from unsafe or ineffective systems.
  • to not face discrimination by algorithms, with systems used and designed in an equitable way.
  • to be protected from abusive data practices via built-in protections, and to have agency over how data about you is used.
  • to know that an automated system is being used and understand how and why it contributes to outcomes that impact you.
  • to be able to opt out, where appropriate, and have access to a person who can quickly consider and remedy problems you encounter.

And it applies to "(1) automated systems that (2) have the potential to meaningfully impact the American public’s rights, opportunities, or access to critical resources or services." Careful readers will already have seen a lot of challenges and strategic ambiguities here. On the one hand, that's inevitable at this stage of the game; on the other, there's going to be a lot of legal complexity about working in this space in the coming years.

Related: I have not listened to this podcast yet, but Gillian Hadfield is one of my favorite legal thinkers. Her comparison of the challenges of aligning AI, and aligning another artificial life form (corporations), is an interesting one.

Related: Mozilla.org is looking to fund projects auditing AI using open source, as part of their Open Source Audit Tooling Initiative. "Our goal is ... [invest] in tools which can help identify the bias in and increase the transparency of AI systems." I suspect that government regulation is going to drive a lot of investment in transparent (necessarily, therefore open?) audit tools for AI.

Pattern-matching

I mentioned last week a few key patterns from early open that might be replicating in ML. A few new examples this week; suspect this might become a recurring theme:

  • Many smart, motivated eyeballs continue to converge on Stable Diffusion. I saw one pull request with a giant speedup (2-3x in some circumstances), and when I went to look at it, another patch with a 25-35% speedup had also been found and merged.
  • Facebook just published a new Apache-licensed and hardware-portable inference library, coming close to the performance of Nvidia's proprietary CUDA tool. As we know from Linux, being hardware agnostic can help a lot.
  • Nat Friedman of Ximian/GitHub/Copilot cites rapidly lowered training costs as something cutting in favor of the 'democratization' of AI in this intriguing (paywalled) interview.

Style explorer (and Blurred Lines)

"Gorgeous" came out this week and lets you look at the results of the same prompt ("A woman with flowers in her hair in a courtyard, in the style of ___") across 1500 different artists. Worth some browsing time to see how well (or not) your favorite artists are represented.

Four images of a woman with flowers in her hair.
A woman with flowers in her hair in a courtyard, in the style of Rembrandt, via Gorgeous and Stable Diffusion

It's not loading for me as I prepare to hit send—perhaps it's already been sued into oblivion? A friend pointed out to me that the US copyright case about the song "Blurred Lines" could be read to give artists copyright over a style, not just a specific work, and the upcoming Warhol case in the Supreme Court could also have a similar impact—perhaps rendering this site a copyright violation for the past century or so of artists.

(I would love to hear from any EU lawyer friends about whether moral rights would be likely to protect a visual artist's style rather than a specific work; I have to imagine the answer is yes?)

Open Data

It's underdiscussed, but if open ML is truly to compete with proprietary approaches, data for training must also be opened in some way. In the US, we have relied on fair use for this so far, but if Blurred Lines/Warhol (see previous) changes that, what options will we have?

One possibility is that some of the same forces that have pushed large corporations to open their ML training efforts will also push them to open the corresponding data. Optimistic signal in this direction: This week Amazon released a big, open dataset of questions and answers.

Demo time