The Geomblog

The "fable" of Anthropic and the USG

2026-06-13T14:51:47.343-06:00

News moves fast. 12 hours ago I was enjoying the demolition that the US put on Paraguay when I heard that Anthropic had shut down access to Fable and Mythos (their latest and most powerful models).

Since then, more news has surfaced about what went down, and I feel like it's a good exercise in understanding both the policy and psychodrama around AI today - with maybe even a moral or two like Aesop's .... FABLES (yes I'm going to keep making bad jokes and no you can't stop me)

Part 1: The event

Let's first lay out the facts of the matter. To the best of my knowledge, here's what transpired.

Some researchers (apparently at Amazon) uncovered ways to jailbreak Fable to (possibly) perform cybersecurity-related attacks.
Someone (apparently Andy Jassy) told the White House (or the Treasury Secretary) about this.
The WH and Anthropic had a back and forth on what needed to be done about this: Anthropic claimed these were not serious jailbreaks, and the WH said that they were and that Anthropic needed to either take down the model or something...
The WH then invoked export controls to demand that Anthropic block access to Fable/Mythos for foreign nationals (regardless of where they happen to be)
Anthropic blocked access entirely, arguing that they had no way of distinguishing foreign nationals from American citizens.

Now most of the reporting will focus on solidifying the facts of the matter (I hope) and will probably also focus on the drama. Drama is fun (don't get me wrong), but it can make thinking about policy really hard.

So let me lay out some of the questions about the drama that might be useful to have answered, but then try to focus on the bigger policy questions that come out of this.

What were these mysterious jailbreaks? This is actually an important question that will shape the policy response as well.
How were these jailbreaks flagged and sent up the chain, and why was the communication of the form "hey I called my buddy at the WH and told him stuff"?
What actually transpired in the discussion between the WH and Anthropic?

Part II: Clean-slate Policy

Let's pretend we are working in a vacuum for a second, and think about this with policy hats on, without worrying about the actual players (unrealistic I know, but a useful exercise).

The US Government is worried about powerful models allowing any user to generate (say) cybersecurity hacks that can compromise critical national infrastructure (e.g., the financial sector, which is why the Treasury Secretary is paying close attention). These models are general purpose and have many uses, and the USG doesn't just want to shut them down entirely (we can debate that, but not right now).

What they'd like to do is have some way to monitor models for specific kinds of risks, before deployment, and also on a continuing basis. Maybe there's some kind of voluntary program where providers of powerful models give access to independent testers (e.g., some kind of Center for AI ~~Security~~ Standards and Innovation) who can identify risks, communicate these risks to the companies involved, and make sure that mitigations are put in place. It wouldn't be perfect, but it would be an ongoing process.

If this sounds familiar, it should, because (a) it's how we do cybersecurity right now without any government involvement and (b) it's a little bit of how the recent WH EO was constructed (there were other parts of the EO that are problematic, but again, not for now).

In other words, there's a way to do what the government wants if this is indeed what they want and companies are willing to cooperate (this is setting aside whether you and I want the government to do this. That's a different discussion)

Part III: (we know) Drama

Well, that's all well and good. But unfortunately I don't live in a rationalist universe where I can write 20,000-word screeds on moreright.com and be "aligned" with everyone else. What's the reality here?

The first thing I want to emphasize is that drama loves a good guy (yes "guy") and a bad guy, and it's really tempting to first decide who's the bad guy and then conclude that the other one must be good. It would be easy to say, e.g., "the Trump administration has no clue on AI and therefore Anthropic is the good guy", or "Tech companies are evillll and the administration is therefore doing the right thing".

Unfortunately (reporters please please pretty please pay attention), it's not that simple.

There are no innocent actors here.

This particular administration has always approached AI regulation in a very "we will say we are hands off but actually we are not but it's really about who's in favor and who's not that decides how we will act" way. Trying to retrofit logical policy actions onto that is hard, and this case is no different. The administration seems to operate its AI policy on some mix of favoritism, pique, and vengeance, and so it's hard to reconcile this reaction with the complete silence when (say) Grok was churning out CSAM and deepfake nonconsensual porn on demand while also being used within the department of ~~war~~ defense. For more on the internal incoherence of the administration's approach to AI, see Justin Hendrix's great analysis.

Anthropic is the "hero" of the moment, because their seeming adversary is the "bad guy" for so many in tech policy. But on the eve of the UFC fight on the WH lawn, keep in mind that these are all actors, and there's an audience. Anthropic is about to go public and make an insane amount of money for some people. It's in their interest to say "oh yeah our models are SCARY (good) and the best out there" and also say at the same time "Yeah your jailbreak is not that scary and we are fine and can release our systems". I don't doubt that there are people at Anthropic who genuinely believe such things, but Anthropic is a corporation (not a "lab") and is in the business of market control and profit.

Specifically, it is entirely possible that Fable is both a great improvement on Opus, and can do some questionable things better, and is also susceptible to the same jailbreaks and vulnerabilities as other models. It's possible it's not some special unicorn that is so dangerous we all have to trust in Anthropic's good intentions, but just the next incarnation of a product with many of the same weaknesses. We just don't know because Anthropic won't say, and won't actually allow for independent testing separately from the folks they want to give access to.

Part IV: So what should we do?

This episode doesn't change many of the things we already understand about the contours of AI policy. In fact, it's dangerous to overindex on one episode: that tends to lead to a whack-a-mole approach to AI regulation that has been harmful in other settings.

1. We need to regulate the downstream risks and harms that come from the introduction of AI.

All this nonsense around "but but innovation" needs to stop. You can tell an argument is not very useful when it's been used over and over again for virtually every single sector of society over the past century, including all the currently regulated sectors that we don't want to loosen regulations on.

We needed to do this 10 years ago. And we need to do this now. The AI industry is not some delicate hothouse flower that needs nurturing. It's a robust trillion-dollar enterprise that's reshaping our world and will keep doing so without our say-so.

2. It's more effective to focus sector by sector.

Cybersecurity risks are concrete risks that we can evaluate in a focused way, and we can make use of the existing infrastructure and policy around cybersecurity to do so. Will this exact framework work for (say) threats to the electrical grid? Probably not, so we need a different "vertical" for understanding, evaluating, and mitigating risks in that sector. And so on.

3. You don't need to focus only on the tech: focus on the ecosystem of actors and safeguards currently in place

There's a lot of concern around the use of AI in medicine and in the financial sector. But these are both heavily regulated sectors where there are already checks in place to make sure that systems function as we want them to. Are they perfect? No. But it's easier to tweak an existing system of safeguards than to build one from scratch. Maybe AI is used to generate a new drug: such a drug will still need to go through regular clinical trials with real people (not synthetic!) before it can be put on the market. So focus on where AI might be compromising an existing system of governance, rather than assuming we need to regulate the model itself.

4. Testing testing testing (independently)

To really assess the risks associated with the introduction of AI in different sectors, we need ... testing. Independent testing - not whatever blog posts the ~~labs~~ companies put out. But focused testing on specific issues, rather than general "capability testing". And we need to build and support the infrastructure for that. This is already too long to go on a rant about the decimation of the scientific research apparatus in the US courtesy of the administration, but yes, the decimation of the scientific research apparatus in the US will have a direct effect on our ability to test for risks and harms, and has to be part of any policy directions we explore.

The unit distances problem

2026-05-20T15:49:07.380-06:00

OpenAI just announced that ChatGPT has disproved a conjecture about one of Erdos's most famous problems: the unit distance problem.

This problem is personal to me: I spent a good chunk of time during my Ph.D. mulling over it, and it's what hooked me into computational geometry. Like most of Erdos's problems, it's really easy to state:

Let P be a set of n points in the plane (amen). What is the maximum number of unit distances that can be achieved?

(the amen is a running joke in my old field of computational geometry)

Note that this is really about duplicate distances (because if you get a bunch of pairs of points at distance d, you can scale the point set so that d = 1). It's also trivial to see that the maximum is at least n-1 (just put points evenly spaced along the number line), and that the number can't be more than n-choose-2 (because that's the number of pairs).
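To make the "duplicate distances" framing concrete, here is a minimal brute-force sketch that finds the distance shared by the most pairs in a point set. The helper name `most_repeated_distance` is hypothetical, and the sketch assumes integer coordinates so that squared distances can be compared exactly instead of testing floating-point equality. Rescaling the point set so that the winning distance becomes 1 turns the returned count into a unit-distance count.

```python
from collections import Counter
from itertools import combinations


def most_repeated_distance(points):
    """Return (squared_distance, count) for the distance shared by the most pairs.

    Assumes integer coordinates, so squared distances compare exactly.
    After rescaling the point set so that this distance equals 1, `count`
    is the number of unit distances. Brute force over all O(n^2) pairs.
    """
    counts = Counter(
        (px - qx) ** 2 + (py - qy) ** 2
        for (px, py), (qx, qy) in combinations(points, 2)
    )
    return max(counts.items(), key=lambda item: item[1])


# n evenly spaced points on a line: distance 1 appears n - 1 times.
line = [(x, 0) for x in range(10)]
print(most_repeated_distance(line))  # (1, 9)

# A 3 x 3 integer grid: nearest-neighbour pairs win, 6 horizontal + 6 vertical.
grid = [(x, y) for x in range(3) for y in range(3)]
print(most_repeated_distance(grid))  # (1, 12)
```

On larger grids (5 x 5 and up) the winner is no longer squared distance 1: a squared distance with more representations as a sum of two squares, such as 5 = 1² + 2², occurs in more directions. Choosing a distance with many such representations is roughly the idea behind Erdos's grid-based construction.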

So what's the real number? Erdos showed, using a fairly complicated construction, that you can get a set of points with $n^{1 + c/\log\log n}$ unit distances for some constant $c > 0$. On the other side, a famous result from 1983 by Spencer, Szemerédi and Trotter showed that you can't get more than $O(n^{4/3})$ unit distances.

Erdos himself used a really elegant but weaker argument to show that you can't get more than $O(n^{3/2})$ unit distances. And the argument was cool and deceptive (in retrospect). For two points to be at distance 1, the circle of radius 1 around one point must pass through the other (and vice versa). Draw an edge between those two points when this happens. So here's a fun fact: you can't create a situation where two points are connected by edges to each of three other points, because two distinct circles of radius 1 intersect in at most two points. Or, to be more formal, you can't create a situation where there's a $K_{2,3}$ bipartite graph hidden inside the set of edges. By a well-known result in graph theory (the Kővári–Sós–Turán theorem), this means the graph can't have too many edges in it (because if it did you'd eventually find one of these special graphs): specifically, no more than $O(n^{3/2})$ edges.
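The graph-theory step can be seen through a standard double-counting argument (one common route to bounds of this type, sketched here rather than taken from Erdos's paper). Call two edges that share a vertex a "cherry". A vertex of degree $d_v$ is the middle of $\binom{d_v}{2}$ cherries, and since two distinct unit circles meet in at most two points, any pair of endpoints is shared by at most two cherries, so $\sum_v \binom{d_v}{2} \le 2\binom{n}{2}$. By convexity the left side is at least $n\binom{2E/n}{2}$, which forces $E = O(n^{3/2})$. The sketch below checks both facts numerically on a small integer grid; the function names, the 6 x 6 grid, and the squared distance 25 (which equals $0^2 + 5^2 = 3^2 + 4^2$) are arbitrary illustrative choices.

```python
from itertools import combinations
from math import comb


def unit_distance_graph(points, r2):
    """Adjacency sets joining pairs at squared distance r2 (integer coords assumed)."""
    adj = {p: set() for p in points}
    for p, q in combinations(points, 2):
        if (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 == r2:
            adj[p].add(q)
            adj[q].add(p)
    return adj


def max_common_neighbours(adj):
    """Largest number of common neighbours over all pairs of vertices."""
    return max(len(adj[p] & adj[q]) for p, q in combinations(adj, 2))


def cherry_count(adj):
    """Sum over vertices of C(deg, 2): pairs of edges sharing a middle vertex."""
    return sum(comb(len(nbrs), 2) for nbrs in adj.values())


points = [(x, y) for x in range(6) for y in range(6)]
adj = unit_distance_graph(points, 25)
n = len(points)
edges = sum(len(nbrs) for nbrs in adj.values()) // 2

# Two distinct circles of equal radius meet in at most 2 points: no K_{2,3}.
assert max_common_neighbours(adj) <= 2
# Double counting: each pair of endpoints is the base of at most two cherries.
assert cherry_count(adj) <= 2 * comb(n, 2)
print(n, edges, edges / n ** 1.5)
```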

So there's a limit. But what's the real limit? A long-standing conjecture was that you could NOT get anything nontrivially more than n pairs. Specifically, that you couldn't get $\Omega(n^{1 + \epsilon})$ pairs for any $\epsilon > 0$. This is frustrating because the gap between this and $n^{4/3}$ is huge.

Turns out this conjecture was wrong. And ChatGPT showed it, by building a complicated generalization of Erdos's construction that indeed achieves $\Omega(n^{1 + \epsilon})$ unit distances for some $\epsilon > 0$.

This was a tantalizing, infuriating, and beautiful problem that has resisted progress for a very long time, and touches on some very deep concepts in mathematics. It's really impressive that an AI system has provided a proof for it. For more on the significance of the result and some interpretation of the proof technique, check out the companion article.

The AAAI 2026 AI review experiment

2026-05-07T22:04:21.528-06:00

AAAI did an experiment this year where they supplemented human reviews with AI-generated reviews and solicited feedback from authors and the review hierarchy about the process. They've now written up the experiment.

The paper isn't too long, and I'd encourage you to read the whole thing (or, I don't know, put it into notebookLM and make a podcast out of it!). Some interesting points stood out to me as I read the report.

The complexity of the process

The process of architecting the AI review was not the cartoonish "hey ChatGPT, review this paper for me". It was carefully structured to focus on specific elements of the review (content, readability, evaluation, setup, etc.). The system had what is now a standard setup: a second LLM that acted as a critic and was not told where the review came from, and a third LLM that had to integrate the critic's analysis and the original review into a final review. I've heard of plenty of cases where this architecture does better than a single-pass review, or even a review plus a judge.

To be clear, the critic was only doing a 'meta review'. It didn't have access to the original paper, so its goal was mostly structural/formal: does the review have all the elements and does it avoid things like accidental author reveal, or obnoxious comments etc.
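The three-stage architecture described above can be sketched roughly as follows. This is a hypothetical illustration, not AAAI's code: `LLM` stands for any function from a prompt string to a response string, the aspect list is the partial one from this post, and the prompts are placeholders. What it shows is the information flow: the critic sees only the draft review (not the paper, and not where the draft came from), and the integrator is assumed here to see just the draft and the critique.

```python
from typing import Callable, Dict, List

# Hypothetical type: any model that maps a prompt to a text response.
LLM = Callable[[str], str]

# Aspects listed in this post; the report's full list may differ.
ASPECTS = ["content", "readability", "evaluation", "setup"]


def review_pipeline(paper: str, reviewer: LLM, critic: LLM, integrator: LLM) -> str:
    # Stage 1: the reviewer reads the paper and drafts one section per aspect.
    sections = [reviewer(f"Review the {a} of this paper:\n{paper}") for a in ASPECTS]
    draft = "\n\n".join(sections)

    # Stage 2: the critic gets only the draft. Without the paper its checks are
    # structural: are all elements present, any author reveal, any obnoxious tone?
    critique = critic(f"Critique this peer review for completeness and tone:\n{draft}")

    # Stage 3: the integrator merges draft and critique into the final review.
    return integrator(
        f"Revise the review using the critique.\nREVIEW:\n{draft}\nCRITIQUE:\n{critique}"
    )


# Stand-in models that record every prompt, to trace what each stage receives.
seen: Dict[str, List[str]] = {}


def fake(name: str) -> LLM:
    def call(prompt: str) -> str:
        seen.setdefault(name, []).append(prompt)
        return f"[{name} output]"
    return call


final = review_pipeline("PAPER TEXT", fake("reviewer"), fake("critic"), fake("integrator"))
print(final)  # [integrator output]
```

Passing the models in as plain callables keeps the sketch independent of any particular vendor API and makes it easy to check, as the stand-ins do, that the paper text never reaches the critic.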

One thing that wasn't clear from the article was how exactly the LLM was checking code, experiments, theorems and proofs, "using the code interpreter as needed". I'd want to see more details about that seemingly agentic handoff.

The perception of the results

There's a pretty dramatic signal in the survey results (and the number of responses was decent). AI-generated reviews were viewed as better than human-generated reviews in six of nine categories. Humans did better at not nitpicking, identifying technical errors, and providing useful suggestions, while AI reviews did better at things like being thorough and providing useful suggestions for improvement (which reminds me of https://www.refine.ink/).

It was interesting to see that almost across the board, authors were more enthusiastic about the AI reviews than the reviewer hierarchy. If I'm being my burnt-out AC persona, I'd say this is because authors are likely grateful to get any kind of thorough review of their paper, and man do human reviews of papers suck.

The human-AI interaction

The survey had free-form responses that were interesting from a qualitative perspective. I think this is where the report fell down a bit, because I suspect there's a rich trove of analysis to do on what people wrote in. A couple of highlighted examples, though, brought home the important point that perhaps the best use of these AI reviews is before submission itself, kind of like what STOC 2026 did in their experiment: the AI reviews are great at identifying lots of small things that a friendly pre-submission review might miss, but they don't have the same kind of judgement and taste that a person has.

A minor note:

The process cost less than $1/paper, for 30,000 submissions. That's not a bad amount to spend. But you have to wonder why reviewers can't get compensation for their work but OpenAI gets paid.

Paper links from my keynote at FAccT

2025-06-22T14:25:00.003-06:00

It's always difficult to keep track of the papers a speaker mentions during a talk. I'm delivering a keynote at FAccT 2025, so I thought I'd make a list of paper references for easy access to anyone interested. Note that some of my papers are not yet publicly available; I'll link them here as soon as they are.

FATML 2016 closing panel
FATML 2014
AI Bill of Rights; the NIST AI Risk Management Framework
Biden administration memos on AI

CNTR AISLE framework
Remarks by the Catholic Church and Pope Leo XIV (one, two)
Explainer on Sociotechnical AI policy
Fairness and Abstractions in Sociotechnical Systems
Participatory AI
Measurement and Fairness
Framework for understanding sources of harm throughout the machine learning lifecycle
Explanations in artificial intelligence: insights from the social sciences
DOGE and Veterans Affairs Contracts
Distinguishing Predictive and Generative AI in Regulation (coming soon)
Sovereignty as a Service (coming soon)
Evaluation Science
Position paper on evaluating genAI.
Multi-lingual functional evaluation (coming soon)
Data and DOGE panel at Brown
MIT Tech Review article on Amsterdam deployment of AI.
Red-Teaming AI Policy
Better proxy estimation
Genetic data governance
Audit trails (coming soon)
CNTR website (and tech policy summer school)

Standing up for Science

2025-03-07T14:57:00.000-07:00

It's been forever since I've written a blog post. Twitter, and then X, and then Bluesky, has absorbed most of my hot takes. But I think more and more that it's time to move away from transient thoughts to things that are more well formed, and so I'm going to try and blog a bit more again.

Mar 7, 2025 was Stand Up For Science Day. I was invited to speak at the Rhode Island local event. It was a freezing cold day in front of the Rhode Island State House in Providence, RI. With encouragement from my students, we first did a little "teach-in" on campus to lay out some of the history of federal funding for science in the US (going back to Vannevar Bush and "Science, The Endless Frontier"), and why some of the new administration's moves were so radical and dangerous.

Then a bunch of us walked over to the State House for the rally. There was a good crowd there in spite of the bitter wind: by my estimate over 100 and perhaps close to 200. Lots of fantastic placards, too.

And then it was time for me to speak. I've never spoken at a rally before, and it took a good amount of preparation (and much more trepidation) to generate my 3 minutes of remarks. The crowd was very encouraging, cheering every time I paused, and that helped a lot :).

Here's what I said.

I’m a scientist. An immigrant scientist. I came to this country to do research because it promised me freedom. The freedom to explore. To discover. To reveal the beauty of nature and mathematics. I’m grateful to have a job doing what I love, but today I speak only for myself and for all of us who love science and the pursuit of knowledge.

For almost a century, America has been a welcoming home for science. Scientists here have produced wondrous discoveries that built the world we see around us. Scientists eradicated polio. And smallpox. And tamed many of the demons of cancer. Scientists sent us to the moon. Sent robots to Mars. And built the technological wonders of the computer age: the internet, the web, the cloud, and yes, AI.

Science has given us power – not over each other – but to make this world we live in better. For all of us, and not just a select few.

I study the impact of technology in society. The research that many of us do has shown how technology can be used to oppress us, but how it also can be used to uplift and elevate voices that have been ignored, abused, and silenced.

The beauty of scientific discovery is that it is open to anyone who is curious and brave. Being a scientist is being able to look at the world every day with the openness, innocence, and passion of a child. To be able to ask “why”, and “how”, and “what if”.

And that’s why those in power want to silence us now. They are scared. Scared of the undeniable truths that science reveals. Scared of the way science empowers and benefits all of us. And absolutely terrified of the freedom that scientific exploration gives us.

And so they try to muzzle us. Censor research they don’t like the answers to. Exclude those they refuse to see as full people. Try to use money to banish those brave enough to fight back, and buy off those willing to collaborate.

Scientists will not be silenced. We will fight back. Not with fear, but with the truth. Not with power, but with inclusion. Not by censoring, but with openness. Science has always sought to help us dream of better worlds for all people, and then build them. We stand up for science, and in doing so we stand up for all of us.

And we will not stop.

Transitions

2021-05-17T08:33:00.002-06:00

I've been at the U of Utah and Salt Lake City for 14 years (14.5 really). It was my first academic job and the longest time I've spent anywhere (throughout my whole life). So it's a little hard to accept that I'm moving to my next adventure.

It's a two-part adventure, because why make one move when you can make two.

Firstly, as of today, I'm going to be working with Alondra Nelson at the White House Office of Science and Technology Policy, advising on matters relating to fairness and bias in tech systems. This is a scary and exciting new position, and I hope to help nudge things along just a bit further in the direction of tech that helps more than it harms, especially for those who've been left behind in our rush to an algorithmically controlled future.

Secondly, I'm moving to Brown University to join the CS department there as well as their Data Science Initiative. Together with Seny Kamara and others, I'm going to start a new center on Computing for the People, to help think through what it means to do computer science that truly responds to the needs of people, instead of hiding behind a neutrality that merely gives more power to those already in power.

Lots of changes, and because of the pandemic all this will happen in slow motion, but it's a whirlwind of emotions (and new clothes - apparently tech conference T-shirts don't work in formal settings - WHO KNEW!!!).

Lars Arge

2020-12-25T11:47:00.002-07:00

Not a post I'd have wanted to make on Christmas day, but that's how it goes sometimes.

Lars Arge just passed away, on Dec 23. For those of us who've been following his battles with cancer, this might not come as a total shock, but there was always hope, and that's no longer an option.

It's hard to imagine this in 2020, but there was a time not that long ago (at least in my mind) when "big data" wasn't really a thing. Companies were acquiring lots of data, and "GIGA byte" was a thing, but there was no real appreciation of the computational challenge associated with big data.

A paper by Aggarwal and Vitter in 1988 made the first step towards changing that, introducing the external memory model as a way to think about computations when some memory accesses are cheap (in RAM) and others are expensive (on disk).

It's a diabolically simple model: all main memory access is free, and any disk access costs 1 unit (but you can get a block of data of size B for that one unit of access). It's not meant to be realistic, but like the best computational models, it's meant to isolate the key operations that are expensive so that we can study how algorithm design needs to change.
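A toy simulation helps show what the model charges for. The hypothetical `ExternalMemory` class below stores items in blocks of size B and counts one I/O whenever an item is touched whose block is not already in main memory; everything else is free. The main-memory size M and the FIFO eviction rule are simplifying assumptions of this sketch (in the model itself, the algorithm decides what to keep in memory). A sequential scan costs about N/B I/Os, while an access pattern that keeps jumping between blocks can cost up to N.

```python
class ExternalMemory:
    """Toy external-memory model.

    Data lives on 'disk' in blocks of B items; main memory holds M // B blocks.
    Touching an item whose block is not resident costs 1 I/O and loads the
    whole block; touching an item whose block is resident is free.
    FIFO eviction is a simplification chosen for this sketch.
    """

    def __init__(self, data, B, M):
        self.data = list(data)
        self.B = B
        self.capacity = max(1, M // B)  # number of blocks that fit in memory
        self.resident = []              # resident block ids, oldest first
        self.ios = 0

    def read(self, i):
        block = i // self.B
        if block not in self.resident:
            self.ios += 1               # one unit of cost fetches all B items
            self.resident.append(block)
            if len(self.resident) > self.capacity:
                self.resident.pop(0)    # evict the oldest block
        return self.data[i]


def scan_cost(N, B, M):
    """I/Os to read items 0..N-1 in order."""
    mem = ExternalMemory(range(N), B, M)
    for i in range(N):
        mem.read(i)
    return mem.ios


def strided_cost(N, B, M):
    """I/Os to read item j of every block, then item j+1, ... (assumes B divides N)."""
    mem = ExternalMemory(range(N), B, M)
    for j in range(B):
        for k in range(N // B):
            mem.read(k * B + j)
    return mem.ios


print(scan_cost(1000, 10, 10))       # 100: one I/O per block
print(strided_cost(1000, 10, 10))    # 1000: every read misses
print(strided_cost(1000, 10, 1000))  # 100: all blocks fit in memory
```

The same strided pattern costs 10x more when memory holds one block than when it holds them all, which is the kind of gap the model is designed to expose.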

Lars was one of the foremost algorithm designers for this new world of external memory. His Ph.D. thesis laid out ideas for how to build data structures that are external memory efficient, and his research over the following decades, in true Tarjan/Hopcroft form, built the fundamental structures and concepts one would need to even think about efficient algorithm design, with many clever ideas around batching queries, processing data in main memory to prepare for queries, and streaming access to disk when appropriate.

Formal algorithmic models are often misunderstood. They look simplistic, miss many of the details that seem relevant in practice, and appear to encourage theoretical game playing divorced from reality. But a formal model at its best does its work invisibly. It shifts the way we think about a framework. It fosters the design of new paradigms for efficient algorithms, and it allows us to layer on optimizations that move a system from theory to practice without ever having to compromise the underlying design principles.

Lars was a force of nature in this area. I first remember meeting him in 1998 at AT&T Labs when I was interning and he was visiting there. He had boundless energy for this space, and seemingly wanted to turn everything into an external memory algorithm, whether it was geometry, data structures, or even the most basic algorithms like sorting. His intuition was the best kind of algorithmic intuition: build up the core primitives, and the rest would follow.

And this is exactly what happened. The field exploded. For a while, "big data algorithms" WERE external memory algorithms. There was no other way to even talk about big data. And that spawned even more models. Streaming algorithms were inspired by external memory and the realization that a one pass stream was an effective way to work with large data. Cache-oblivious algorithms asked about what would happen if we took the same two-part hierarchy with main memory and disk and extended it to the cache. Semi-external memory models asked how we might modify the base model for graph computations. The MapReduce framework from the early 2000s generalized the external memory model to handle newer kinds of streaming/memory-limited architectures, in turn to be followed by Spark and so many other models.

I'd go as far as to say this: all of the conceptual developments we see today in big data computations at some level can be traced back to work on external memory algorithms, and that was driven by Lars (and his collaborators).

It wasn't just the papers he wrote. Lars was a leader in shaping the field. Early in the 2000s he moved back from Duke University to Aarhus University, and from there started to build what would become one of the foremost institutes for thinking about big data, first as a BRICS center and then as the appropriately named MADALGO Institute.

Many of us who had anything to do with big data visited MADALGO at some point in our careers. I spent one of the best summers of my life being hosted by him during my sabbatical - my children still remember that summer we spent in Aarhus and wish we could go back each year. He instinctively knew that the best way to foster the area was to facilitate a generation of researchers who would bring their own ideas to Aarhus, mix and exchange them, and then go away and share them with the world.

And he wasn't merely content with that. He wanted to demonstrate the power of his perspective beyond just the realm of academia. He started a company, SCALGO, that applied the principles of external memory algorithms (and so much more) to help with modeling geospatial data. I distinctly remember him telling me about the first time he demonstrated SCALGO products at a forum with other companies doing GIS work, and how the performance of their system blew the other products out of the water. As someone (at the time) deeply embedded in the theory of computer science, I was astounded and encouraged by this validation of formal thinking.

Lars was a giant in our field (his email address was always large@..., and this worked more appropriately than one would ever dream of). But he was also a giant both in real life and in his personality. He was the warmest, most fun person to be around. He seemed almost ego-free, and often downplayed his own accomplishments, claiming that his main talent was hanging around with smarter people. He was extremely generous with his time and resources (which is why so many of us were able to visit Aarhus and benefit from being at MADALGO).

He was the life of any party -- I still remember when he hosted the Symposium on Computational Geometry in Denmark. It felt like we were at a post-battle Viking celebration (and yes he got up on a table and shouted "SKÅL" over and over again while an actual pig was roasting on a spit nearby). I remember him taking me to a Denmark-Sweden soccer game and warning me not to wear anything with blue on it. I remember us going for go-kart racing and his stream of trash talking.

Lars was the entire package: a great person, a great researcher, a visionary leader, and a canny entrepreneur. I will miss him greatly.

New conference announcement

2019-04-11T14:00:00.001-06:00

Martin Farach-Colton asked me to mention this, which is definitely NOT a pox on computer systems.

ACM-SIAM Algorithmic Principles of Computer Systems (APoCS20)

https://www.siam.org/Conferences/CM/Main/apocs20
January 8, 2020
Hilton Salt Lake City Center, Salt Lake City, Utah, USA
Colocated with SODA, SOSA, and Alenex

The First ACM-SIAM APoCS is sponsored by SIAM SIAG/ACDA and ACM SIGACT.

Important Dates:

August 9: Abstract Submission and Paper Registration Deadline
August 16: Full Paper Deadline
October 4: Decision Announcement

Program Chair: Bruce Maggs, Duke University and Akamai Technologies

Submissions: Contributed papers are sought in all areas of algorithms and architectures that offer insight into the performance and design of computer systems. Topics of interest include, but are not limited to, algorithms and data structures for:

Databases

Compilers

Emerging Architectures

Energy Efficient Computing

High-performance Computing

Management of Massive Data

Networks, including Mobile, Ad-Hoc and Sensor Networks

Operating Systems

Parallel and Distributed Systems

Storage Systems

A submission must report original research that has not previously been published and is not concurrently being published. Manuscripts must not exceed twelve (12) single-spaced double-column pages, in addition to the bibliography and any pages containing only figures. Submissions must be self-contained, and any extra details may be submitted in a clearly marked appendix.

Steering Committee:

Michael Bender

Guy Blelloch

Jennifer Chayes

Martin Farach-Colton (Chair)

Charles Leiserson

Don Porter

Jennifer Rexford

Margo Seltzer

On PC submissions at SODA 2020

2019-03-26T08:00:00.000-06:00

SODA 2020 (in SLC!!) is experimenting with a new submission guideline: PC members will be allowed to submit papers. I had a conversation about this with Shuchi Chawla (the PC chair) and she was kind enough (thanks Shuchi!) to share the guidelines she's provided to PC members about how this will work.

SODA is allowing PC members (but not the PC chair) to submit papers this year. To preserve the integrity of the review process, we will handle PC member submissions as follows.

1. PC members are required to declare a conflict for papers that overlap in content with their own submissions (in addition to other CoI situations). These will be treated as hard conflicts. If necessary, in particular if we don't have enough confidence in our evaluation of a paper, PC members will be asked to comment on papers they have a hard conflict with. However, they will not have a say in the final outcome for such papers.

2. PC submissions will receive 4 reviews instead of just 3. This is so that we have more confidence in our evaluation and ultimate decision.

3. We will make early accept/reject decisions on PC member submissions, that is, before we start considering "borderline" papers and worrying about the total number of papers accepted. This is because the later phases of discussion are when subjectivity and bias tend to creep in the most.

4. In order to be accepted, PC member submissions must receive no ratings below "weak accept" and must receive at least two out of four ratings of "accept" or above.

5. PC member submissions will not be eligible for the best paper award.
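The acceptance rule in item 4 is mechanical enough to write down as a predicate. This is a minimal sketch assuming a hypothetical six-point ordinal rating scale: the guidelines name only "weak accept" and "accept", so the other labels and their ordering are placeholders. Note that the rule is a necessary condition ("in order to be accepted"), so passing it does not by itself mean the paper is accepted.

```python
# Hypothetical ordinal scale; only "weak accept" and "accept" appear in the guidelines.
SCALE = ["strong reject", "reject", "weak reject", "weak accept", "accept", "strong accept"]
RANK = {label: i for i, label in enumerate(SCALE)}


def pc_submission_passes(ratings):
    """Item 4: no rating below 'weak accept', and at least two of the four
    ratings at 'accept' or above. A necessary condition, not a decision."""
    if len(ratings) != 4:
        raise ValueError("PC submissions receive four reviews (item 2)")
    ranks = [RANK[r] for r in ratings]
    no_low_rating = min(ranks) >= RANK["weak accept"]
    enough_high = sum(r >= RANK["accept"] for r in ranks) >= 2
    return no_low_rating and enough_high


print(pc_submission_passes(["accept", "accept", "weak accept", "weak accept"]))    # True
print(pc_submission_passes(["strong accept", "accept", "accept", "weak reject"]))  # False
```

The second example shows how strict the rule is: three enthusiastic reviews are not enough if the fourth is a single "weak reject".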

My understanding is that this was done to solve the problem of not being able to get people to agree to be on the PC - this year's PC has substantially more members than prior years.

And yet....

Given all the discussion about conflicts of interest, implicit bias, and double blind review, this appears to be a bizarrely retrograde move, and in fact one that sends a very loud message that issues of implicit bias aren't really viewed as a problem. As one of my colleagues put it sarcastically when I described the new plan:

"why don't they just cut out the reviews and accept all PC submissions to start with?"

and as another colleague pointed out:

"It's mostly ridiculous that they seem to be tying themselves in knots trying to figure out how to resolve COIs when there's a really easy solution that they're willfully ignoring..."

Some of the arguments I've been hearing in support of this policy frankly make no sense to me.

First of all, the idea that a more heightened scrutiny of PC papers can alleviate the bias associated with reviewing papers of your colleagues goes against basically all of what we know about implicit bias in reviewing. The most basic tenet of human judgement is that we are very bad at filtering our own biases and this only makes it worse. The one thing that theory conferences (compared to other venues) had going for them regarding issues of bias was that PC members couldn't submit papers, but now....

Another claim I've heard is that the scale of SODA makes double blind review difficult. It's hard to hear this claim without bursting out into hysterical laughter (and from the reaction of the people I mentioned this to, I'm not the only one). Conferences that manage with double blind review (and PC submissions btw) are at least an order of magnitude bigger (think of all the ML conferences). Most conference software (including EasyChair) is capable of managing conflicts of interest without too much trouble. Given that SODA (and theory conferences in general) are less familiar with this process, I’ve recommended in the past that there be a “workflow chair” whose job it is to manage the unfamiliarity associated with dealing with the software. Workflow chairs are common at bigger conferences that typically deal with 1000s of reviewers and conflicts.

Further, as a colleague points out, what one should really be doing is "aligning nomenclature and systems with other fields: call current PC as SPC or Area Chairs, or your favorite nomenclature, and add other folks as reviewers. This way you (i) get a list of all conflicts entered into the system, and (ii) recognize the work that the reviewers are doing more officially as labeling the PC members. "

Changes in format (and culture) take time, and I'm still hopeful that the SODA organizing team will take a lesson from ESA 2019 (and from their own resolution, passed a year or so ago, to look at DB review more carefully) and consider exploring DB review. But this year's model is certainly not going to help.

Update: Steve Blackburn outlines how PLDI handles PC submissions (in brief, double blind + external review committee)

Update: Michael Ekstrand takes on the question that Thomas Steinke asks in the comments below: "How is double blind review different from fairness-through-blindness?".

OpenAI, AI threats, and norm-building for responsible (data) science

2019-02-19T09:00:00.000-07:00

All of twitter is .... atwitter?... over the OpenAI announcement and partial non-release of code/documentation for a language model that purports to generate realistic-sounding text from simple prompts. The system actually addresses many NLP tasks, but the one that's drawing the most attention is the deepfakes-like generation of plausible news copy (here's one sample).

Most of the consternation is over the rapid PR buzz around the announcement, including somewhat breathless headlines (that OpenAI is not responsible for) like

OpenAI built a text generator so good, it’s considered too dangerous to release

Researchers, scared by their own work, hold back “deepfakes for text” AI

There are concerns that OpenAI is overhyping solid but incremental work, that they're disingenuously allowing for overhyped coverage in the way they released the information, or worse that they're deliberately controlling hype as a publicity stunt.

I have nothing useful to add to the discussion above: indeed, see posts by Anima Anandkumar, Rob Munro, Zachary Lipton and Ryan Lowe for a comprehensive discussion of the issues relating to OpenAI. Jack Clark from OpenAI has been engaging in a lot of twitter discussion on this as well.

But what I do want to talk about is the larger issues around responsible science that this kerfuffle brings up. Caveat, as Margaret Mitchell puts it in this searing thread:

It's really hard to watch the GPT-2 conversations unfold like so much else in tech. 1/
— MMitchell (@mmitchell_ai) February 18, 2019

To understand the kind of "norm-building" that needs to happen here, let's look at two related domains.

In computer security, there's a fairly well-established model for finding weaknesses in systems. An exploit is discovered, the vulnerable entity is given a chance to fix it, and then the exploit is revealed, often simultaneously with patches that rectify it. Sometimes the vulnerability isn't easily fixed (see Meltdown and Spectre). But it's still announced.

A defining characteristic of security exploits is that they are targeted, specific and usually suggest a direct patch. The harms might be theoretical, but are still considered with as much seriousness as the exploit warrants.

Let's switch to a different domain: biology. From the sequencing of the human genome, through the million-person precision medicine project, to CRISPR and cloning babies, genetic manipulation has provided both invaluable technology for curing disease and grave ethical concerns about misuse of the technology. And professional organizations as well as the NIH have (sometimes slowly) risen to the challenge of articulating norms around the use and misuse of such technology.

Here, the harms are often more diffuse, and they are harder to separate from the benefits. But the articulation of harm is often focused on the individual patient, especially given the shadow of abuse that darkens the history of medicine.

The harms with various forms of AI/ML technology are myriad and diffuse. They can cause structural damage to society - in the concerns over bias, the ways in which automation affects labor, the way in which fake news can erode trust and a common frame of truth, and so many others - and they can cause direct harm to individuals. And the scale at which these harms can happen is immense.

So where are the professional groups, the experts in thinking about the risks of democratization of ML, and all the folks concerned about the harms associated with AI tech? Why don't we have the equivalent of the Asilomar conference on recombinant DNA?

I appreciate that OpenAI has at least raised the issue of thinking through the ethical ramifications of releasing technology. But as the furore over their decision has shown, no single imperfect actor can really claim to be setting the guidelines for ethical technology release, and "starting the conversation" doesn't count when (again as Margaret Mitchell points out) these kinds of discussions have been going on in different settings for many years already.

Ryan Lowe suggests workshops at major machine learning conferences. That's not a bad idea. But it will attract the people who go to machine learning conferences. It won't bring in the journalists, the people getting SWAT'd (and in one case killed) by fake news, the women being harassed online by trolls with deep-fake porn images.

News is driven by news cycles. Maybe OpenAI's announcement will lead to us thinking more about issues of responsible data science. But let's not pretend these are new, or haven't been studied for a long time, or need to have a discussion "started".

More FAT* blogging

2019-02-02T00:46:00.000-07:00

Session 3: Representation and Profiling

Session 4: Fairness methods.

FAT* Session 2: Systems and Measurement.

2019-01-28T23:48:00.002-07:00

Building systems that have fairness properties and monitoring systems that do A/B testing on us.

Session 2 of FAT*: my opinionated summary.

FAT* blogging

2019-01-27T21:09:00.003-07:00

I'll be blogging about each session of papers from the FAT* Conference. So as not to clutter your feed, the posts will be housed at the fairness blog that I co-write along with Sorelle Friedler and Carlos Scheidegger.

The first post is on Session 1: Framing and Abstraction.

The theoryCS blog aggregator REBORN

2018-12-20T09:25:00.003-07:00

(will all those absent today please email me)

(if you can't hear me in the back, raise your hand)

The theoryCS blog aggregator is back up and running at its new location -- cstheory-feed.org -- which of course you can't know unless you're subscribed to the new feed, which....

More seriously, we've announced this on the cstheory twitter feed as well, so feel free to repost this and spread the word so that all the theorists living in caves plotting their ICML, COLT and ICALP submissions will get the word.

Who's this royal "we"? Arnab Bhattacharyya and myself (well mostly Arnab :)).

For anyone interested in the arcana of how the sausage (SoCG?) gets made, read on:

Arvind Narayanan had set up an aggregator based on the Planet Venus feed-aggregation software (itself built on Python packages for parsing feeds). The two-step process for publishing the aggregator works as follows:

Run the software to generate the list of feed items and associated pages from a configuration file containing the list of blogs
Push all the generated content to the hosting server.
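For readers curious what step 1 amounts to, here is a minimal sketch of the "generate" step using only the Python standard library. It is not Planet Venus and not the aggregator's actual code: the function names, the dict-of-feeds "config", and the bare HTML list are all hypothetical, and step 2 (pushing to the server) is left out.

```python
import html
import xml.etree.ElementTree as ET
from datetime import timezone
from email.utils import parsedate_to_datetime


def parse_rss(xml_text: str, blog_name: str) -> list[dict]:
    """Extract title, link and date from each <item> of an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        raw_date = item.findtext("pubDate")
        if raw_date is None:
            continue  # this sketch simply skips undated items
        date = parsedate_to_datetime(raw_date)
        if date.tzinfo is None:
            # Assume UTC when a feed gives no usable zone, so all dates compare.
            date = date.replace(tzinfo=timezone.utc)
        items.append({
            "blog": blog_name,
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "date": date,
        })
    return items


def aggregate(feeds: dict[str, str]) -> list[dict]:
    """Step 1: merge items from every configured blog, newest first."""
    merged = []
    for name, xml_text in feeds.items():
        merged.extend(parse_rss(xml_text, name))
    merged.sort(key=lambda it: it["date"], reverse=True)
    return merged


def render_html(items: list[dict]) -> str:
    """Render merged items as a simple HTML list (the page step 2 would push)."""
    rows = [
        f'<li><a href="{html.escape(it["link"])}">{html.escape(it["title"])}</a>'
        f' ({html.escape(it["blog"])})</li>'
        for it in items
    ]
    return "<ul>\n" + "\n".join(rows) + "\n</ul>"
```

In a real deployment the feed texts would be fetched from the URLs in the configuration file, and a scheduler (hourly, in our case) would run the generate step and then copy the rendered pages to the hosting server.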

Right now, both Arnab and I have git access to the software and config files and can edit the config to update blogs etc. The generator is run once an hour and the results are pushed to the new server.

So if you have updates or additions, either of us can make the changes and they should be reflected fairly soon on the main page. The easiest way to verify this is to wait a few hours, reload the page and see if your changes have appeared.

The code is run off a server that Arnab controls and both of us have access to the domain registry. I say this in the interest of transparency (PLUG!!) but also so that if things go wonky as they did earlier, the community knows who to reach.

Separately, I've been pleasantly surprised at the level of concern and anxiety over the feed -- mainly because it shows what a valuable community resource the feed is, and it makes me glad to be one of its curators.

If you've read this far, then you really are interested in the nitty gritty, and so if you'd like to volunteer to help out, let us know. It would be useful, for example, to have a volunteer in Europe so that we have different time zones covered when things break. And maybe our central Politburo (err, I mean the committee to advance TCS) might also have some thoughts, especially in regard to their mission item #3:

To promote TCS to and increase dialog with other research communities, including facilitating and coordinating the development of materials that educate the general scientific community and general public about TCS.

The theoryCS aggregator

2018-12-06T00:31:00.000-07:00

As you all might know, the cstheory blog aggregator is currently down. Many people have been wondering what's going on and when it will be back up, so here's a short summary.

The aggregator has been thus far maintained by Arvind Narayanan who deserves a HUGE thanks for setting up the aggregator, lots of custom code and the linked twitter account. Arvind has been planning to hand it over and the domain going down was a good motivator for him to do that.

Currently I have all the code that is used to generate the feed, as well as control over the twitter feed. Arnab Bhattacharyya has kindly volunteered to be the co-manager of the aggregator. What remains to be done now is

set up a new location to run the aggregator code from
set up hosting for the website
link this to the twitter account.

None of these seem too difficult, and the main bottleneck is merely having Arnab and me put together a few hours of work to get this all organized (we have a domain registered already). We hope to have it done fairly soon so you can all get back to reading papers and blogs again.

Should credit scores be used for determining residency?

2018-11-24T15:00:00.000-07:00

It's both exhilarating and frustrating when you see the warnings in papers you write play out in practice. Case in point, the proposal by DHS to use credit scores to ascertain whether someone should be granted legal residence.

Josh Lauer at Slate does a nice analysis of the proposal and I'll extract some relevant bits for commentary. First up: what does the proposal call for? (emphasis mine)

The new rule, contained in a proposal signed by DHS Secretary Kirstjen Nielsen, is designed to help immigration officers identify applicants likely to become a “public charge”—that is, a person primarily dependent on government assistance for food, housing, or medical care. According to the proposal, credit scores and other financial records (including credit reports, the comprehensive individual files from which credit scores are generated) would be reviewed to predict an applicant’s chances of “self-sufficiency.”

So what's the problem with this? What we're seeing is an example of the portability trap (from our upcoming FAT* paper). Specifically, scores designed in a different context (for deciding who to give loans to) are being used in this context (to determine self-sufficiency). Why is this a problem?

Unfortunately, this is not what traditional credit scores measure. They are specialized algorithms designed for one purpose: to predict future bill-paying delinquencies, for any reason. This includes late payments or defaults caused by insurmountable medical debts, job loss, and divorce—three leading causes of personal bankruptcy—as well as overspending and poor money management.

That is, the portability trap is a problem because you're taking a predictor built for one purpose and using it inside a system built for another. And if you're trying to make any estimations about the validity of the resulting process, then you have to know whether the thing you're observing (in this case the credit score) has any relation to the thing you're trying to observe (the construct of "self-sufficiency"). And this is something we harp on a lot in our paper on axiomatic considerations of fairness (and ML in general).

And in this case there's a clear disconnect:

Credit scores do not predict whether an individual will become a public charge. And they do not predict financial self-sufficiency. They are only useful in this context if one believes credit scores reveal something about a person’s character. In other words, if one believes that people with low credit scores are moochers and malingerers. Given the Trump administration’s hostility toward (brown-skinned) immigrants, this conflation of credit scores and morality is not surprising.

And this is a core defining principle of our work: that beliefs about the world control how we choose our representations and learning procedures: the procedures cannot be justified except in the context of the beliefs that underpin them.

I think that if you read anything I've written, it will be clear where I stand on the normative question of whether this is a good idea (tl;dr: NOT). But as a researcher, it's important to lay out a principled reason for why, and this sadly merely confirms that our work is on the right track.

What do I work on?

2018-11-02T01:42:00.001-06:00

So, what do you work on?

As questions go, this is one of the most rudimentary. It's the conference equivalent of "Nice weather we're having", or "How about them Broncos!". It's a throat-clearer, designed to start a conversation in an easy non-controversial way.

And yet I'm always having to calculate and calibrate my answers. There's a visible pause, a hesitation as I quickly look through my internal catalog of problems and decide which one I'll pull out. On the outside, the hesitation seems strange: as if I don't quite know what I work on, or if I don't know how to explain it.

It's an occupational hazard that comes from living on the edge of many different areas. I go to data mining conferences, machine learning conferences, theory/geometry conferences, and (now) conferences on ethics, society and algorithms. And in each place I have a different circle of people I know, and a different answer to the question

So, what do you work on?

It makes me uncomfortable, even though it shouldn't. I feel like I can only share a part of my research identity because otherwise my answer will make no sense or (worse!) seem like I'm trying to impress people with incomprehensible words.

I don't doubt that most people share some form of this feeling. As researchers, none of us are one-dimensional, and most of us work on many different problems at a time. Probably the easiest answer to the question is the problem that one has most recently worked on. But I sense that my case is a little unusual: not the breadth per se, but the range of topics (and styles of problem solving) that I dabble in.

So, what do you work on?

I often joke that my research area is a random walk through computer science and beyond. I started off in geometry, dabbled with GPUs (alas, before they were popular), found my way into information theory and geometry (and some differential geometry), slipped down the rabbit hole into data mining, machine learning, and a brief side foray into deep learning, and then built a nice little cottage in algorithmic fairness, where I spend more time talking to social scientists and lawyers than computer scientists.

Being an academic nomad has its virtues: I don't really get bored with my work. But it also feels like I'm always starting from square one with my learning and that there are always people who know way more about every topic than I do. And my academic roamings seem to mirror my actual nomadic status. I'm a foreigner in a land that gets stranger and less familiar by the day, and the longest time I've spent in any location is the place I'm in right now.

So, what do you work on?

Maybe, in a way that's so American, "What do you work on" is really a question of "Who are you" in the way we bind together our work and our identity. When my students come and ask me what they should work on, what they're really asking me is to tell them what their research identity is, and my answer usually is, "whatever you want it to be right now". It's a frustrating answer no doubt, but I feel that it lowers the import of the question to a manageable level.

So, what DO you work on?

I do algorithmic fairness, and think about the ethics of automated decision-making. I bring an algorithmic (and geometric) sensibility to these questions. I'm an amateur computational philosopher, a bias detective, an ML-translator for lawyers and policy folk, and my heart still sings when I see a beautiful lemma.

On teaching ethics to tech companies

2018-10-22T08:05:00.000-06:00

Kara Swisher (who is unafraid to call it like it is!) has a new op-ed in the NYT titled "Who will teach Silicon Valley to be ethical". She asks

How can an industry that, unlike other business sectors, persistently promotes itself as doing good, learn to do that in reality? Do you want to not do harm, or do you want to do good? These are two totally different things.

And how do you put an official ethical system in place without it seeming like you’re telling everyone how to behave? Who gets to decide those rules anyway, setting a moral path for the industry and — considering tech companies’ enormous power — the world.

There are things that puzzle me about this entire discussion about ethics and tech. It seems like an interesting idea for tech companies to incorporate ethical thinking into their operations. Those of us who work in this space are clamoring for more ethics education for budding technologists.

There is of course the cynical view that this is merely window dressing to make it look like Big Tech (is that a phrase now?) cares without actually having to change their practices.

But let's put that aside for a minute. Suppose we assume that indeed tech companies are (in some shape or form) concerned about the effects of technology on society and that their leaders do want to do something about it.

What I really don't understand is the idea that we should teach Silicon Valley to be ethical. This seems to play into the overarching narrative that tech companies are trying to do good in the world and slip up because they're not adults yet -- a problem that can be resolved by education that will allow them to be good "citizens" with upstanding moral values.

This seems rather ridiculous. When chemical companies were dumping pesticides on the land by the ton and Rachel Carson wrote Silent Spring, we didn't shake our heads sorrowfully at companies and send them moral philosophers. We founded the EPA!

When the milk we drink was being adulterated with borax and formaldehyde and all kinds of other horrific additives that Deborah Blum documents so scarily in her new book 'The Poison Squad', we didn't shake our heads sorrowfully at food vendors and ask them to grow up. We passed a law that led eventually to the formation of the FDA.

Tech companies are companies. They are not moral agents, or even immoral agents. They are amoral profit-maximizing vehicles for their shareholders (and this is not even a criticism). Companies are supposed to make money, and do it well. Facebook's stock price didn't slip when it was discovered how their systems had been manipulated for propaganda. It slipped when they proposed changes to their newsfeed ratings mechanisms to address these issues.

It makes no sense to rely on tech companies to police themselves, and to his credit, Brad Smith of Microsoft made exactly this point in a recent post on face recognition systems. Regulation, policing and whatever else we might imagine, has to come from the outside. While I don't claim that regulation mechanisms all work as they are currently conceived, the very idea of checks and balances seems more robust than merely hoping that tech companies will get their act together on their own.

Don't get me wrong. It's not even clear what has to be regulated here. Unlike with poisoned food or toxic chemicals, it's not clear how to handle poisonous speech or toxic propaganda. And that's a real discussion we need to have.

But let's not buy into Silicon Valley's internal hype about "doing good". Even Google has dropped its "Don't be evil" credo.

Google's analysis of the dilemma of free speech vs hate speech

2018-10-11T09:06:00.000-06:00

Breitbart just acquired a leaked copy of an internal google doc taking a cold hard look at the problems of free speech, fake news and censorship in the current era. I wrote a tweet storm about it, but also wanted to preserve it here because tweets, once off the TL, cease to exist.

Breitbart acquired an internal google doc discussing the misinformation landscape that the world finds itself in now: https://www.scribd.com/document/390521673/The-Good-Censor-GOOGLE-LEAK#from_embed …

I almost wish that Google had released this document publicly. It's a well thought out exploration of the challenges faced by all of us in dealing with information dissemination, fake news, censorship and the like. And to my surprise, it (mostly) is willing to point fingers back at Google and other tech companies for their role in it (although there are some glaring omissions, like the building of the new censored search tool in China). It's not surprising that people inside Google are thinking carefully about these issues, even as they flail around in public. And the analysis is comprehensive without attempting to provide glib solutions.

Obviously, since this is a doc generated within Google, the space of solutions is circumscribed to those that have tech as a major player. For example, the idea of publicly run social media isn't really on the table, nor are better ways to decentralize value assignment for news, or alternate models for search that don't require a business model. But with those caveats in mind, the analysis of the problems is reasonable.

A new sexual harassment policy for TCS conferences.

2018-10-08T09:00:00.000-06:00

One of my most visited posts is the anonymous post by a theoryCS colleague describing her own #metoo moments inside the TCS conference circuit. It was a brutal and horrific story to read.

Concurrently (I don't know if the blog post had an effect, but one can but hope it helped push things along), a committee was set up under the auspices of TCMF (FOCS), ACM, SIAM, and EATCS to

Draft a proposal for joint ToC measures to combat discrimination, harassment, bullying, and retaliation, and all matters of ethics that might relate to that.

That committee has now completed its work, and a final report is available. The report was also endorsed at the FOCS business meeting this week. The report is short, and you should read it. The main takeaways/recommendations are that every conference should

adopt a code of conduct and post it clearly.
recruit and train a group of advocates to provide confidential support to those facing problems at a conference
have mechanisms for authors to declare a conflict of interest without needing to be openly specific about the reasons.

There are many useful references in the report, as well as more concrete suggestions about how to implement the above recommendations. This committee was put together fast, and generated a very useful report quickly. Well done!

Hello World: A short review

2018-09-10T00:19:00.000-06:00

A short review of Hannah Fry's new book 'Hello World'

Starting with Cathy O'Neil's Weapons of Math Destruction, there's been an onslaught of books sounding the alarm about the use of algorithms in daily life. My Amazon list that collects these together is even called 'Woke CS'. These are all excellent books, calling out the racial, gender, and class inequalities that algorithmic decision-making can and does exacerbate and the role of Silicon Valley in perpetuating these biases.

Hannah Fry's new book "Hello World" is not in this category. Not exactly, anyway. Her take is informative as well as cautionary. Her book is as much an explainer of how algorithms get used in contexts ranging from justice, to medicine, to art, as it is a reflection on what this algorithmically enabled world will look like from a human perspective.

And in that sense it's a far more optimistic take on our current moment than I've read in a long time. In a way it's a relief: I've been mired for so long in the trenches of bias and discrimination, looking at the depressing and horrific ways in which algorithms are used as tools of oppression, that it can be hard to remember that I'm a computer scientist for a reason: I actually do marvel at and love the idea of computation as a metaphor, as a tool, and ultimately as a way to (dare I say it) do good in the world.

The book is structured around concepts (power, data) and domains (justice, medicine, cars, crime and art). After an initial explainer on how algorithms function (and also how models are trained using machine learning), and how data is used to fuel these algorithms, she very quickly gets into specific case studies of both the good and the bad in algorithmically mediated decision making. Many of the case studies are from the UK and were unknown to me before this book. I quite liked that: it's easy to focus solely on examples in the US, but the uses (and misuses) of algorithms are global (Vidushi Marda's article on AI policy in India has similar locally-sourced examples).

If you're a layman looking to get a general sense of how algorithms tend to show up in decision making systems, how they hold out hope for a better way of solving problems and where they might go wrong, this is a great book. It uses a minimum of jargon, while still being willing to wade into the muck of false positives and false negatives in a very nice illustrative example in the section on recidivism prediction and COMPAS, and also attempting to welcome the reader into the "Church of Bayes".

If you're a researcher in algorithmic fairness, like me, you start seeing the deeper references as well. Dr. Fry alludes to many of the larger governance issues around algorithmic decision making that we're wrestling with now in the FAT* community. Are there better ways to integrate automated and human decision-making that takes advantage of what we are good at? What happens when the systems we build start to change the world around them? Who gets to decide (and how) what level of error in a system is tolerable, and who might be affected by it? As a researcher, I wish she had called out these issues a little more, and there are places where issues she raises in the book have actually been addressed (and in some cases, answered) by researchers.

While the book covers a number of different areas where algorithms might be taking hold, it takes very different perspectives on the appropriateness of algorithmic decision-making in these domains. Dr. Fry is very clear (and rightly so) that criminal justice is one place where we need very strong checks and balances before we can countenance the use of any kind of algorithmic decision-making. But I feel that maybe she's letting the medical profession off a little easily in the chapter on medicine. While I agree that biology is complex enough that ML-assistance might lead us to amazing new discoveries, I think some caution is needed, especially since there's ample evidence that the benefits of AI in medicine might only accrue to the (mostly white) populations that dominate the clinical trials.

Similarly, the discussion of creativity in art and what it means for an algorithm to be creative is fascinating. The argument Dr. Fry arrives at is that art is fundamentally human in how it exists in transmission -- from artist to audience -- and that art cannot be arrived at "by accident" via data science. It's a bold claim, and of a kind with many claims about the essential humanness of certain activities that have been pulverized by advances in AI. Notwithstanding, I find it very appealing to posit that art is essentially a human endeavour by definition.

But why not extend the same courtesy to the understanding of human behavior or biology? Algorithms in criminal justice are predicated on the belief that we can predict human behavior and how our interventions might change it. We expect that algorithms can pierce the mysterious veil of biology, revealing secrets about how our body works. And yet the book argues not that these systems are fundamentally flawed, but that precisely because of their effectiveness they need governance. I for one am a lot more skeptical about the basic premise that algorithms can predict behavior to any useful degree beyond the aggregate (and perhaps Hari Seldon might agree with me).

Separately, I found it not a little ironic, in a time when Facebook is constantly being yanked before the US Congress, Cambridge Analytica might have swayed US elections and Brexit votes, and Youtube is a dumpster fire of extreme recommendations, that I'd read a line like "Similarity works perfectly well for recommendation engines" in the context of computer generated art.

The book arrives at a conclusion that I feel is JUST RIGHT. To wit, algorithms are not authorities, and we should be skeptical of how they work. And even when they might work, the issues of governance around them are formidable. But we should not run away from the potential of algorithms to truly help us, and we should be trying to frame the problem away from the binary of "algorithms good, humans bad" or "humans good, algorithms bad" and towards a deeper investigation of how human and machine can work together. I cannot read

Imagine that, rather than exclusively focusing our attention on designing our algorithm to adhere to some impossible standard of perfect fairness, we instead designed them to facilitate redress when they inevitably erred; that we put as much time and effort into ensuring that automatic systems were as easy to challenge as they are to implement.

without wanting to stand up and shout "HUZZAH!!!". (To be honest, I could quote the entire conclusions chapter here and I'd still be shouting "HUZZAH").

It's a good book. Go out and buy it - you won't regret it.

This review refers to an advance copy of the book, not the released hardcover. The advance copy had a glitch where a fragment of LaTeX math remained uncompiled. This only made me happier to read it.

Clustering: a draft of a part!

2018-08-30T05:31:00.000-06:00

For the last X years (X being a confidential and never to be revealed number, but large enough that AI was more than just deep learning at the time), Sergei Vassilvitskii and I have been toiling away at a book on clustering.

The book isn't ready yet, but we do have a draft of part I (the core of the book). Check it out, and send any comments you might have to clusteringbook@gmail.com.

A #metoo testimonial that hits close to home...

2018-02-14T07:00:00.000-07:00

This is a guest post by a colleague in the TCS community, a person I know. If you read other TCS blogs you might come across this there. This is by design. Please do read it.

Every #MeToo story over the last several months has made me pause. My heart races and my concentration fails. The fact that the stories have largely focused on the workplace adds to my difficulty.

Do I speak out too?

I have shared a few stories with colleagues about things that have happened to me in school and at work. But these stories have been somewhat lighthearted events that have been easy to share without outing the perpetrators.

For example, I have told a story about a university employee telling me, in so many words, that I should be barefoot and pregnant and not in the office. What I didn't share is that the same employee, later that year -- despite the fact that our common boss knew about this story because I did indeed report it -- was awarded a best employee award. How do you think that made me feel? Like my experience didn't matter and that such comments are condoned by our department. Why didn't I share that information widely? Because I was worried that folks would then be able to figure out who the culprit was. And isn't that even worse? Shouldn't it be the sexist who is worried and not the woman who, yet again, is made to feel like she doesn't belong?

---

Let me tangent a bit. For years I have not flown. Ostensibly I stopped flying because of the contribution to the climate crisis. When I travel, I go by train. It takes longer, but has been surprisingly pleasant. And when travel takes 3-4 times as long, you don't do it as often, further reducing your carbon footprint. Of course, that means that I don't go to conferences unless they are nearby.

But when I really think about it, is this really the reason I stopped going to conferences? A conference I would normally go to was held nearby a few years ago and I didn't go. Sure, I suffered a grievous injury two weeks before, but I hadn't even registered. I had planned to not go long before that injury.

So, really, why do I no longer attend conferences? Partly I don't feel that I need to anymore, now that I have tenure. When I stopped attending conferences, I was able to "coast into" tenure. Letter writers would remember me. I essentially stopped going to conferences and workshops as soon as I possibly could.

---

Back to the beginning, or close to.

I was nervous at the first conference I attended as a graduate student. One of the reasons I was nervous was that I was athletic at the time and planned on daily runs while I was attending -- I was worried that it might be viewed as a waste of time. My advisor, who also went to the conference, found out about my athleticism and suggested we run together. This was a relief to me. That is, until we were running and he started talking about his lackluster sex life with his wife. I responded by picking up the pace and feigning an illness on the remaining days. On the last day of the conference we were out for dinner with a large group of people and dinner went late into the night. I excused myself, as I had a 4AM bus to catch. My advisor walked me out of the restaurant and awkwardly said something about wanting me to stay and that we should talk. I stuck to leaving, knowing that I needed some sleep before the long trip home the next day. He said we should talk when we were back in the office. Honestly, at the time I thought he was going to complain about my talk or my professional performance in some way. I worried about it all through the weekend until we met next. I brought it up at the end of our meeting, asking what he wanted to talk about, naively expecting professional criticism. When he said I must surely know, in a certain voice, I knew he wasn't talking about work. I feigned ignorance, and he eventually brushed it off and said not to worry. In the coming months, he would cancel meetings and otherwise make himself unavailable. After a half year I realized I wouldn't be able to be successful without having a supportive advisor and, despite first planning to quit grad school, found a new advisor and moved on. That former advisor barely made eye contact with me for the remainder of my time in graduate school.

Fast forward many years. I was at a small workshop as a postdoc. A senior and highly respected researcher invited me to dinner. I was excited at the opportunity to make a stronger connection that would hopefully lead to a collaboration. However, at dinner he made it very clear that this was not professional by reaching across the table and stroking my hands repeatedly. I don't even recall how I handled it. Perhaps I should have expected it -- a grad school friend of mine had a similar, and probably worse, interaction with this same researcher. Shortly after I got to my room at the hotel, my hotel room phone rang. It was him. He wanted to continue our conversation. I did not.

Perhaps a year later, still as a postdoc, I was at a party and a colleague from another university was there too. At the end of the party, we were alone. We flirted, mutually. Flirting led to kissing, kissing led to him picking me up in a way that asserted how much stronger he is than me, which led to my utter discomfort, which led to me saying no, stop, repeatedly. Which he didn't listen to. Which led to a calculation in my head. I could either resist and risk physical injury or I could submit. I chose to submit, without consent.

For the record, that is called rape.

For a long while, I suppressed it. I pretended in my own head that it didn't happen that way, that it was consensual. I even tried to continue working with him -- always in public places, mind you. The wall in my mind gradually broke down over the years until several years later, we were at the same workshop where the doors of the rooms didn't have locks. You could lock them from the inside, but not the outside. I remember worrying that he would be lurking in my room and took to making sure I knew where he was before I ventured back to sleep.

---

So why would I continue to go to workshops and conferences when that is the environment I know I will face? Even if I felt safe, if 95% of the attendees are men, how many look at me as a colleague and how many look at me as a potential score? When I was going up for tenure, I thought long and hard about listing the senior-and-highly-respected researcher on a do-not-ask-for-a-letter list. But where would it stop? Do I include all the people who hit on me? All the people who stared at my breasts or commented on my body? All the people who gave me clear signals that they didn't see me as a colleague and equal member of the research community, but as a woman meant to be looked at, hit on, touched inappropriately?

Should I have quit grad school when I had the chance? We all know it isn't any better in industry. Should I have pursued another discipline? No discipline, it seems, is immune to sexualization of women. But I think the situation is uniquely horrible in fields where there are so few women. At conferences in theoretical computer science, 5-10% of the attendees are women, as a generous estimate. The numbers aren't in women's favor. The chances that you will get hit on, harassed, assaulted are much higher. There is a greater probability that you will be on your own in a group of men. You can't escape working with men. It is next to impossible to build a career when you start striking men off your list of collaborators in such a field. That is not to say there aren't wonderful men to work with. There are many men in our field that I have worked with and turned to for advice and spent long hours with and never once detected so much as a creepy vibe. But you can't escape having to deal with the many others who aren't good. When you meet someone at a conference, and they invite you for a drink or dinner to continue the conversation, how do you know that they actually want to talk about work, or at least treat you as they would any colleague? How do you make that decision?

I hung on until I no longer needed to go to conferences and workshops to advance my career to the stability of tenure. But surely my career going forward will suffer. My decision is also hard on my students, who go to conferences on their own without someone to introduce them around. It is hard on my students who can't, for visa difficulties, go to the international conferences that I am also unwilling to go to, so we roll the dice on the few domestic conferences they can go to.

And now I am switching fields. Completely. I went to two conferences last summer. The first, I brought the protective shield of my child and partner. The second, I basically showed up for my talk and nothing else. I wasn't interested in schmoozing. It'll be difficult, for sure, to establish myself in a new field without fully participating in the expected ways.

Is all this why I am switching fields? Not entirely, I'm sure, but it must have played a big role. If I enjoyed conferences as much as everyone else seems to, and didn't feel shy about starting new collaborations, I might be too engrossed to consider reasons to leave. And certainly, the directions I am pursuing are lending themselves to a much greater chance of working with women.

Why am I speaking out now? The #MeToo moment is forcing me to think about it, of course. But I have been thinking about this for years. I hope it will be a relief to get it off my chest. I have been "getting on with it" for long enough. 1 in 5 women will deal with rape in their lifetime. 1 in 5! You would think that I would hear about this from friends. But I hadn't told anyone about my rape. And almost no one has told me about theirs. I think it would help, in the very least therapeutically, to talk about it.

---

I thought about publishing this somewhere, anonymously, as a "woman in STEM". I considered publishing it non-anonymously, but was shy to deal with the trolls. I didn't want to deal with what many women who speak out about their experiences face: have their life be scrutinized, hear excuses being made on behalf of the predators, generally have their experiences denied. But I think by posting it here, many people in theoretical computer science will read it, rather than a few from the choir. I am hoping that you will talk to each other about it. That you will start thinking of ways to make our community better for others. In all my years of going to conferences and workshops, of all the inappropriate comments and behaviors that others have stood around and witnessed, never once did any of the good ones call that behavior out or intervene. Maybe they did so in private, but I think it needs to be made public. Even the good ones can do better.

What can you do?

While you couldn't have protected me from being raped, you can think about the situations we are expected to be in for our careers -- at workshops in remote locations, where we're expected to drink and be merry after hours. I hope not many of us have been raped by a colleague, but even if you haven't, it doesn't take many instances of being hit on or touched inappropriately to begin to feel unsafe.

I remember being at a conference and, standing in a small group, an attendee interrupted a conversation I was having to tell me that my haircut wasn't good, that I shouldn't have cut my hair short. I tried to ignore it, and continue my conversation, but he kept going on about it. Saying how I would never attract a man with that haircut. No one said anything. Speak up. Just say -- shut up! -- that's not appropriate. Don't leave it up to the people who have to deal with this day in day out to deal with it on their own. Create a culture where we treat each other with respect and don't silently tolerate objectification and worse.

I regret never reporting my first graduate advisor's behavior, but is it my fault? I had no idea who to report it to. I had no idea either in undergrad who I would report such behavior to. Where I am now is the first place I've been that has had clear channels for reporting sexual harassment and other damaging situations. The channels are not without problems, but I think the university is continuing to improve them. Perhaps we should have a way of reporting incidents in our field. I have a hard time believing, given that a grad school friend and I had similar experiences with the same senior-and-highly-respected researcher, that others in the field don't know that he is a creep. It is up to you to protect the vulnerable of our community from creeps and predators. Keep an eye on them. Talk to them. Don't enable them. As a last resort, shame and isolate them.

Double blind review: continuing the discussion

2018-01-22T09:00:00.000-07:00

My first two posts on double blind review triggered good discussion by Michael Mitzenmacher and Boaz Barak (see the comments on these posts for more). I thought I'd try to synthesize what I took away from the posts and how my own thinking has developed.

First up, I think it's gratifying to see that the basic premise, "single blind review has the potential for bias, especially with respect to institutional status, gender and other signifiers of in/out groups", is granted at this point. There was a time in the not-so-distant past when I wouldn't be able to even establish this baseline in conversations that I'd have.

The argument therefore has moved to one of tradeoffs: does the installation of DB review introduce other kinds of harm while mitigating harms due to bias?

Here are some of the main arguments that have come up:

Author identity carries valuable signal to evaluate the work.

This argument manifested itself in comments (and I've heard it made in the past). One specific version of it that James Lee articulates is that all reviewing happens in a resource-limited setting (the resource here being time) and so signals like author identity, while not necessary to evaluate the correctness of a proof, provide a prior that can help focus one's attention.

My instinctive reaction to this is "you've just defined bias". But on reflection I think James (and other people who've said this) are pointing out that abandoning author identity is not free. I think that's a fair point to make. But I'd be hard pressed to see why this increase in effort negates the fairness benefits from double blind review (and I'm in general a little uncomfortable with this purely utilitarian calculus when it comes to bias).

As a side note, I think that focusing on paper correctness is a mistake. As Boaz points out, this is not the main issue with most decisions on papers. What matters much more is "interestingness", which is very subjective and much more easily bound up with prior reactions to author identity.

Some reviewers may be aware of author identity and others might not. This inconsistency could be a source of error in reviewing.

Boaz makes this point in his argument against DB review. It's an interesting argument, but I think it also falls into the trap of absolutism: i.e., the idea that imperfections in this process will cause catastrophic failure. This point was made far more eloquently in a comment on a blog post about ACL's double blind policy (emphasis mine).

I think this kind of all-or-nothing position fails to consider one of the advantages of blind review. Blind review is not only about preventing positive bias when you see a paper from an elite university, it’s also about the opposite: preventing negative bias when you see a paper from someone totally unknown. Being a PhD student from a small group in a little known university, the first time I submitted a paper to an ACL conference I felt quite reassured by knowing that the reviewers wouldn’t know who I was.

In other words, under an arXiv-permissive policy like the current one, authors still have the *right* to be reviewed blindly, even if it’s no longer an obligation because they can make their identity known indirectly via arXiv+Twitter and the like. I think that right is important. So the dilemma is not a matter of “either we totally forbid dissemination of the papers before acceptance in order to have pure blind review (by the way, 100% pure blind review doesn’t exist anyway because one often has a hint of whom the authors may be, and this is true especially of well-known authors) or we throw the baby out with the bathwater and dispense with blind review altogether”. I think blind review should be preserved at least as a right for the author (as it is now), and the question is whether it should also be an obligation or not.

Prepublication on the arXiv is a desirable goal to foster open access and the speedy dissemination of information. Double blind review is irrevocably in conflict with non-anonymous preprint dissemination.

This is perhaps the most compelling challenge to implementing double blind review. The arXiv as currently constructed is not designed to handle (e.g.) anonymous submissions that are progressively unblinded. The post that the comment above came from has an extensive discussion of this point, and rather than try to rehash it all here, I'd recommend that you read the post and the comments.

But the comments also question the premise head on: specifically, "does it really slow things down?" and "so what?". Interestingly, Hal Daumé made an attempt to answer the "really?" question. He looked at arXiv uploads in 2014-2015 and correlated them with NIPS papers. The question he was trying to ask was: is there evidence that more papers are uploaded to the arXiv before submission to NIPS in the interest of getting feedback from the community? His conclusion was that there was little evidence to support the idea that the arXiv had radically changed the normal submit-revise cycle of conferences. I'd actually think that theoryCS might be a little better in this regard, but I'd also be dubious of such claims without seeing data.

In the comments, even the question of "so what?" is addressed. And again this boils down to tradeoffs. While I'm not advocating that we ban people from putting their work on the arXiv, ACL has done precisely this, by asserting that the relatively short delay between submission and decision is worth it to ensure the ability to have double blind review.

Summary

I'm glad we're continuing to have this discussion, and talking about the details of implementation is important. Nothing I've heard has convinced me that the logistical hurdles associated with double blind review are insurmountable or even more than inconveniences that arise out of habit, but I think there are ways in which we can fine tune the process to make sense for the theory community.

Double blind review at theory conferences: More thoughts.

2018-01-09T00:41:00.000-07:00

I've had a number of discussions with people both before and after the report that Rasmus and I wrote on the double-blind experiment at ALENEX. And I think it's helpful to lay out some of my thoughts on both the purpose of double blind review as I understand it, and the logistical challenges of implementing it.

What is the purpose of double blind review?

The goal is to mitigate the effects of the unconscious, implicit biases that we all possess and that influence our decision making in imperceptible ways. It's not a perfect solution to the problem. But there is now a large body of evidence suggesting that

All people are susceptible to implicit biases, whether regarding institutional status, individual status, or demographic stereotyping. What's worse, we are incredibly bad at assessing or detecting our own biases. At this point, a claim that a community is not susceptible to bias is the one that needs evidence.
Double blind review can mitigate this effect. Probably the most striking example of this is the case of orchestra auditions, where requiring performers to play behind a screen dramatically increased the number of women in orchestras.

What is NOT the purpose of double blind review?

Double blind review is not a way to prevent anyone from ever figuring out the author identity. So objections to blinding based on scenarios where author identity is partially or wholly revealed are not relevant. Remember, the goal is to eliminate the initial biases that come from the first impressions.

What makes DB review hard to implement at theory venues?

Theory conferences do two things that are different from other communities. We

require that PC members do NOT submit papers
allow PC members to initiate queries for external subreviewers.

These two issues are connected.

If you don't allow PC members to submit papers, you need a small PC.
If you have a small PC, each PC member is responsible for many papers.
If each PC member is responsible for many papers, they need to outsource the effort to be able to get the work done.

As we mentioned earlier, it's not possible to have PC members initiate review requests if they don't know who might be in conflict with a paper whose authors are invisible. So what do we do?

There's actually a reasonably straightforward answer to this.

We construct the PC as usual with the usual restrictions.
We construct a list of “reviewers”. For example, "anyone with a SODA/STOC/FOCS paper in the last 5 years” or something like that. Ideally we will solicit nominations from the PC for this purpose.
We invite this list of people to be reviewers for SODA, and do this BEFORE paper submission
Authors will declare conflicts with reviewers and domains (and reviewers can also declare conflicts with domains and authors).
At bidding time, the reviewers will be invited to bid on (blinded) papers. The system will automatically assign people.
PC members will also be in charge of papers as before, and it’s their job to manage the “reviewers” or even supply their own reviews as needed.
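To make "the system will automatically assign people" concrete, here is a minimal sketch of one way such an assignment could work: a greedy pass over bids that never assigns a conflicted (reviewer, paper) pair and caps each reviewer's load. This is an illustration under assumptions, not what any particular reviewing system does. The function name `assign_papers`, the numeric bid scale, and the `reviews_per_paper` and `max_load` parameters are all hypothetical. Real systems often treat this as a matching or flow problem rather than a greedy one.

```python
from collections import defaultdict


def assign_papers(bids, conflicts, reviews_per_paper=3, max_load=10):
    """Greedily assign blinded papers to reviewers (hypothetical sketch).

    bids: dict mapping (reviewer, paper) -> bid score (higher = more eager).
    conflicts: set of (reviewer, paper) pairs that must never be assigned.
    Returns a dict mapping paper -> list of assigned reviewers.
    """
    load = defaultdict(int)         # papers assigned to each reviewer so far
    assignment = defaultdict(list)  # reviewers assigned to each paper so far
    # Visit the most eager bids first; ties are broken by (reviewer, paper) for determinism.
    for (reviewer, paper), _score in sorted(bids.items(), key=lambda kv: (-kv[1], kv[0])):
        if (reviewer, paper) in conflicts:
            continue  # conflicted pairs are excluded outright
        if len(assignment[paper]) >= reviews_per_paper:
            continue  # this paper already has enough reviewers
        if load[reviewer] >= max_load:
            continue  # this reviewer is at capacity
        assignment[paper].append(reviewer)
        load[reviewer] += 1
    return dict(assignment)


bids = {("alice", "p1"): 3, ("bob", "p1"): 2, ("carol", "p1"): 1,
        ("alice", "p2"): 2, ("bob", "p2"): 3}
conflicts = {("bob", "p1")}
print(assign_papers(bids, conflicts, reviews_per_paper=2, max_load=2))
# {'p1': ['alice', 'carol'], 'p2': ['bob', 'alice']}
```

In this toy run, bob bid on p1 but declared a conflict with it, so p1 goes to alice and carol instead. Any paper that ends up short of reviewers would fall to its PC member, as described in the list.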

Any remaining requests for truly external subreviewing will be handled by the PC chairs. I expect this number will be a lot smaller.

Of course all of this is pretty standard at venues that implement double blind review.

But what if a sub-area is so small that all the potential reviewers are conflicted?

Well, if that's the case, then it's a problem we already face right now, and DB review doesn't really affect it.

What about if a paper is on the arXiv?

We ask authors and reviewers to adhere to double blind review policies in good faith. Reviewers are not expected to go hunting for the author names, and authors are expected to not draw attention to information that could lead to a reveal. Like with any system, we trust people to do the right thing, and that generally works.

But labeling CoI for so many people is overwhelming.

It does take a little time, but less time than one expects. Practically, many CoIs are handled by institutional domain matching, and most of the rest are handled by explicit listing of collaborators and looking for them in a list. Most reviewing systems allow for this to be automated.
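Here is a minimal sketch of the two automated checks just mentioned: institutional domain matching plus explicitly listed collaborators. The function names, the input formats (name-to-email dicts and name-to-collaborator-set dicts), and the small `PUBLIC_DOMAINS` list are hypothetical. The rule that one domain being a subdomain of the other counts as the same institution is an assumption too. Real reviewing systems have their own conflict formats.

```python
# Hypothetical list of webmail domains that say nothing about institution.
PUBLIC_DOMAINS = {"gmail.com", "outlook.com", "yahoo.com"}


def email_domain(email):
    """Return the lowercased part of an address after the last '@'."""
    return email.rsplit("@", 1)[-1].lower()


def same_institution(d1, d2):
    """Assumed rule: equal domains, or one a subdomain of the other (cs.utah.edu ~ utah.edu)."""
    if d1 in PUBLIC_DOMAINS or d2 in PUBLIC_DOMAINS:
        return False  # a shared webmail provider is not a conflict
    return d1 == d2 or d1.endswith("." + d2) or d2.endswith("." + d1)


def find_conflicts(authors, reviewers, collaborators):
    """Return the set of (reviewer, author) pairs in conflict.

    authors, reviewers: dict mapping name -> email address.
    collaborators: dict mapping name -> set of declared collaborator names.
    """
    conflicts = set()
    for r, r_email in reviewers.items():
        for a, a_email in authors.items():
            by_domain = same_institution(email_domain(r_email), email_domain(a_email))
            # A declaration from either side is enough.
            declared = a in collaborators.get(r, set()) or r in collaborators.get(a, set())
            if by_domain or declared:
                conflicts.add((r, a))
    return conflicts


authors = {"A. Author": "a@cs.utah.edu"}
reviewers = {"R. One": "r1@utah.edu", "R. Two": "r2@mit.edu", "R. Three": "r3@gmail.com"}
collaborators = {"R. Two": {"A. Author"}}
print(sorted(find_conflicts(authors, reviewers, collaborators)))
# [('R. One', 'A. Author'), ('R. Two', 'A. Author')]
```

Domain matching catches R. One, the collaborator list catches R. Two, and R. Three's webmail address triggers nothing. This is the "most CoIs are handled automatically" situation, with explicit declarations covering the rest.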

But how am I supposed to know if the proof is correct if I don't know who the authors are?

Most theory conferences are now comfortable with asking for full proofs. And if the authors don't provide full proofs, and I need to know the authors to determine if the result is believable, isn't that the very definition of bias?

And finally, from the business meeting....

Cliff Stein did an excellent job running the discussion on this topic, and I want to thank him for facilitating what could have been, but wasn't, a very fraught discussion. He's treading carefully, but forward, and that's great. I was also quite happy to see that in the straw poll, there was significant willingness to try double blind review (more votes in favor than opposed). There were still way more abstentions, so I think the community is still thinking through what this might mean.