One of the most significant metrics we provide in the Community Standards Enforcement Report is prevalence. This video explains more about prevalence as it relates to our efforts to remove harmful content from Facebook.
The way content causes harm on the internet is by being seen. Given the nature of the internet, the number of times content is seen is not evenly distributed. A small amount of content can go viral and get a lot of distribution in a very short span of time, while other content can sit on the internet for a long time without being seen by anyone. Any measure we use to understand our enforcement of harmful content should take that into account.
For this reason, we consider prevalence to be a critical metric because it helps us measure how violations impact people on Facebook. We care most about how often content that violates our standards is actually seen relative to the total amount of times any content is seen on Facebook.
This is similar to measuring the concentration of pollutants in the air we breathe. When measuring air quality, environmental regulators look at what percentage of the air is made up of a pollutant like nitrogen dioxide to determine whether it has reached a level that is harmful to people. Prevalence is the internet’s equivalent: a measurement of what percentage of views are views of something harmful.
We calculate this metric by selecting a sample of content seen on Facebook and then labeling how much of it shouldn’t be there. There are four reasons why harmful content may be seen on our site:
- The content is detected or reported, but only after people have already been exposed to it.
- The content is detected or reported, but people are exposed to it during the time it takes to review it.
- The content is detected or reported but we make a mistake and don’t take action on it.
- The content isn’t detected or reported.
To measure prevalence, we focus on how often content is seen, not on how many pieces of content violate our rules. In other words, we don’t treat all content equally: a post seen 1 million times is 1 million times more likely to be sampled, and that’s a good thing. Again, this is similar to air quality testing stations that take a sample of air to estimate the concentration of pollutants.
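The view-weighted sampling described above can be sketched in a few lines of code. This is a minimal illustration, not Facebook’s actual system: the content items, view counts, and labels below are invented for the example, and real measurement involves human labeling of sampled views rather than pre-labeled data.

```python
import random

# Hypothetical content items: (content_id, view_count, is_violating).
# These numbers are illustrative, not real data.
content = [
    ("post_a", 1_000_000, False),
    ("post_b", 50, True),
    ("post_c", 200_000, True),
    ("post_d", 750_000, False),
]

def estimate_prevalence(items, sample_size, rng=random):
    """Estimate prevalence by sampling views, not posts.

    Each item is drawn with probability proportional to its view
    count, so a post seen 1 million times is 1 million times more
    likely to be sampled than a post seen once.
    """
    weights = [views for _, views, _ in items]
    labels = [violating for _, _, violating in items]
    sampled = rng.choices(labels, weights=weights, k=sample_size)
    return sum(sampled) / sample_size

# The exact view-weighted prevalence, for comparison:
violating_views = sum(v for _, v, bad in content if bad)
total_views = sum(v for _, v, _ in content)
exact = violating_views / total_views

print(f"exact prevalence:     {exact:.4f}")
print(f"estimated prevalence: {estimate_prevalence(content, 100_000):.4f}")
```

With a large enough sample, the estimate converges on the true fraction of views that land on violating content, which is why sampling views rather than posts gives an unbiased picture of what people actually see.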
Prevalence (or concentration of violating material) captures all four of the above reasons. We see a lot of attention paid to instances where people see violating content before we take it down. Ideally, if perfect moderation were possible, we would remove all violating content before anyone ever saw it. In some cases, however, the bigger reason harmful content is seen is that it is never detected or reported in the first place. We need a measure that captures all of the reasons people may be exposed to harmful content. We believe prevalence is that measure.
To learn more about how we’re protecting our community from harmful content, check out our Community Standards Enforcement Report at transparency.facebook.com. You can learn more about how we measure prevalence in the companion guide.