Protests and mobilizations: how do we know how big they are?

I’ve been asked recently by colleagues and students (and interviewed today on KPCC, the Southern California NPR affiliate, with a link here to the audio) about how people estimate the size of mass protests and mobilizations. Obviously this is coming up due to the protests and mobilizations that have been occurring throughout the nation recently, focused on the immigration reform issue being debated in Congress.

This is an interesting problem, one that I talked about many years ago with a number of statisticians and political methodologists at a conference. We talked then not only about the basic mathematics of estimating crowd sizes, but also about how new technologies might be brought to bear to produce these estimates in a more precise and automatic way, using for example digital photographs and image processing software.

In any case, how does the basic math work for the estimation of crowd size?

First off, there are some sources in the literature for the use (and abuse) of statistics in such situations that I know of, though I have not read either recently: one is by Clark McPhail (“The Myth of the Madding Crowd”), and the other by Joel Best (“Damned Lies and Statistics”). I’d refer interested readers to their work for more discussion of statistical estimates in these situations, and how they are interpreted. If readers have other citations for work in this area, please pass them along and I’ll post them. There is a nice short summary in salon.com of this estimation problem, including a discussion of how McPhail estimated the crowd size during an anti-war protest in Washington in 2003.

But the basic approach involves only two simple numbers: an estimate of the square footage of the space occupied by the protesters, and the density of the protesters (expressed here as the square feet each person occupies). The former can be easy to obtain, either before or after the fact, but the latter is difficult to estimate. But if you have these two numbers, you just divide the square footage by the square feet per person and you have an estimate (without any confidence intervals!) of the size of the protest.

As I’ve said, the trick is estimating the density. As I understand McPhail’s approach, there are some common estimates that are used: in densely packed crowds, each person occupies about 2.5 square feet; in moderately dense crowds, about 5 square feet; and in very loosely packed crowds, about 10 square feet.

So, let’s start simple and think of how the math works in a small and well-defined space. There is a roughly 20 by 20 foot lounge outside my office: assume that this was occupied by students protesting their grades in one of my classes. Here are the estimates, based on whether or not the protest was dense, moderately dense, or diffuse:

• Densely-packed protesters: 400/2.5 = 160 protesters.
• Moderately densely-packed protesters: 400/5 = 80 protesters.
• Diffuse protest: 400/10 = 40 protesters.

I hope it would be the last of these, but that is still a lot of protesters!
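For readers who want to play with the numbers, here is a minimal Python sketch of the arithmetic above. The function name and the density labels are my own, chosen for illustration; the square-feet-per-person figures are the rules of thumb quoted earlier, and the result is a point estimate with no measure of uncertainty attached.

```python
# Rule-of-thumb figures quoted in the post: square feet occupied per person.
SQFT_PER_PERSON = {"dense": 2.5, "moderate": 5.0, "diffuse": 10.0}


def estimate_crowd(area_sqft, sqft_per_person):
    """Point estimate of crowd size: occupied area divided by area per person."""
    if area_sqft < 0 or sqft_per_person <= 0:
        raise ValueError("area must be non-negative and sqft_per_person positive")
    return round(area_sqft / sqft_per_person)


# The 20 by 20 foot lounge from the example above.
lounge_area = 20 * 20
lounge_estimates = {
    label: estimate_crowd(lounge_area, sqft)
    for label, sqft in SQFT_PER_PERSON.items()
}
print(lounge_estimates)
```

Swapping in a different area (for instance 100 * 200 for a larger rectangular space) is just a matter of changing `lounge_area`.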

But how would we handle a bigger area, say the area outside my office building where Caltech holds commencement? Let’s assume we had a standing-room-only protest in this space. What we would do is start by measuring the dimensions of the space, which is a large rectangle; or we could break the space into grid cells of known size. My guess is that the space is about 100 by 200 feet, or 20,000 square feet. Again, we can estimate the size of the protest depending on the density:

• Dense: 20,000/2.5 = 8,000 protesters.
• Moderately dense: 20,000/5 = 4,000 protesters.
• Diffuse: 20,000/10 = 2,000 protesters.

I didn’t realize how many people could protest there!

Now here is where some additional assumptions come into play. One important one is the spatial distribution of the protesters. Note that in my examples, I’ve assumed the protesters are uniformly distributed. But in any real protest, that is unlikely to be the case, and there will be pockets that are densely packed, pockets that are less dense, and pockets that are diffuse. Also, geography is not uniform, and if we are using a photograph (for example) to estimate density we need to be cautious about slopes, hills, and other obstacles that can distort our estimate of the density. Last, protests and mobilizations are not static, so again if we rely upon observations at any point in time we have to be careful to note when they occur and whether the crowd itself shifts somehow before we make a different observation at a different geographic location.
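One simple way to relax the uniform-distribution assumption is to break the space into cells, assign each cell its own density, and add up the per-cell estimates. The sketch below does that in Python. The way I split the 20,000 square feet among dense, moderate, and diffuse pockets is entirely made up for illustration, and the function name is hypothetical; the sketch also ignores the terrain and timing problems just mentioned.

```python
# Rule-of-thumb figures quoted in the post: square feet occupied per person.
SQFT_PER_PERSON = {"dense": 2.5, "moderate": 5.0, "diffuse": 10.0}


def estimate_from_cells(cells):
    """Sum per-cell estimates; each cell is (area in square feet, density label)."""
    return sum(area / SQFT_PER_PERSON[label] for area, label in cells)


# Hypothetical split of a 20,000 square foot space into three pockets.
cells = [
    (5_000, "dense"),      # packed area near the front
    (10_000, "moderate"),  # the bulk of the crowd
    (5_000, "diffuse"),    # stragglers at the edges
]
total = estimate_from_cells(cells)
print(round(total))
```

With this made-up split the total differs from what a single uniform density over the whole space would give, which is the point: the answer depends on how the pockets are drawn and labeled.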

All of these factors make this a tricky estimation process.

But as applied to a mass mobilization, like the Los Angeles demonstration recently where it is said that 500,000 people demonstrated, it is unclear how that number was derived. If there were aerial photographs of the crowd taken at the height of the protest, it might be possible to use the square footage and density calculation to estimate the size of the crowd. I’ve not seen that type of aerial photograph, so I don’t know how we could estimate the density at this point.

There are of course other ways to try to estimate crowd size, ranging from less analytical (asking someone to stand in the crowd and try to count as many as they can see, then extrapolate from there) to more analytical (using digital images and getting an estimate of the density from the digital image). My guess is that the Los Angeles demonstration estimate was obtained from participant observation, which probably has produced an estimate that has a lot of uncertainty associated with it (in more precise terms, it is likely to have a broad 95% confidence interval).
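To make the count-and-extrapolate idea concrete, here is a rough Python sketch: count heads in a few sample patches of known size, average the density, scale up to the whole area, and attach a crude interval. Everything here is an assumption on my part: the head counts are invented, the function name is hypothetical, and the interval uses a normal approximation (the 1.96 multiplier) that treats the patches as a random sample. With only a handful of patches a t-based interval would be wider, and real patches are rarely a random sample.

```python
import math
import statistics


def extrapolate_crowd(head_counts, patch_area_sqft, total_area_sqft):
    """Scale the mean sampled density up to the full area.

    Returns (point estimate, low, high), where low/high form a rough
    95% interval based on a normal approximation.
    """
    if len(head_counts) < 2:
        raise ValueError("need at least two patches to estimate spread")
    # People per square foot in each sampled patch.
    densities = [count / patch_area_sqft for count in head_counts]
    mean_density = statistics.mean(densities)
    # Standard error of the mean density across patches.
    std_error = statistics.stdev(densities) / math.sqrt(len(densities))
    margin = 1.96 * std_error
    point = mean_density * total_area_sqft
    low = max(0.0, mean_density - margin) * total_area_sqft
    high = (mean_density + margin) * total_area_sqft
    return point, low, high


# Invented head counts from four 10 by 10 foot patches in a 20,000 sq ft space.
point, low, high = extrapolate_crowd([20, 10, 30, 20], 100, 20_000)
print(f"estimate: {point:.0f}, rough 95% interval: {low:.0f} to {high:.0f}")
```

Even in this tidy toy example the interval is wide relative to the point estimate, which is the sense in which an estimate built from a few on-the-ground counts carries a lot of uncertainty.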