Proposal: Transparent content moderation

“Transparent content moderation” is a proposal where social media platforms commit to publishing the details of every account and content moderation decision, publishing every major algorithmic change, and open sourcing the underlying algorithms when possible.

Problem: We don’t trust how social media platforms moderate content and have no means to verify or disprove claims and accusations.

We live in a world with a severe trust deficit in how social media platforms moderate content. People of all ideologies accuse platforms both of bias against content they support (“thumb on the scale”) and of doing too little to remove content they believe violates the rules. Underneath all of this, moderation/safety teams at these platforms believe they’re performing their duties to the best of their abilities under tough conditions.

It’s clear the current path, where groups on all sides throw accusations in the media with no way to verify or disprove the underlying claims, is untenable.

Solution: “Transparent content moderation”: a credibly neutral mechanism to make all moderation and algorithmic decisions transparent.

Vitalik Buterin introduced the notion of “credible neutrality”. Building on that, here are a few transparency and open-source commitments social media companies can make. How does this rebuild trust? When every action a platform takes is visible, the public can verify and analyze how decisions are made and have an informed discussion about bias and moderation effectiveness.

1/ Publish account actions: All account takedowns are published with details on the rules violated, the agent performing the action (human or algorithm), and the source of the report (automated scan, report from within the platform, etc.). A sketch of what a published log entry could look like follows this list.

  1. Publish a log of every account takedown in some publicly accessible space within a short time period after the action is taken.
  2. Publish as many specifics about the takedown as possible: the specific platform rule violated and the supporting evidence considered (when not self-evident).
  3. Publish the source of the action: a user report (from within the platform or external) or an algorithmic sweep. It is important to state whether a human being or an automated system performed the action (though the names of individual human agents need not be revealed).
  4. Make a special note when the source of the report is a state actor or an organization connected to a political group.
  5. PII and privacy concerns are mitigated by redacting specific personal information (for example, a phone number or an address), but the default is openness.
  6. CSAM will be placed in a special vault and made available only through appropriate legal channels.
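To make this concrete, here is a minimal sketch of what one published takedown entry could look like. The schema and field names are illustrative assumptions, not a proposed standard; each platform would define its own format.

```typescript
// Illustrative shape of a published takedown log entry (all field names are hypothetical).
type TakedownLogEntry = {
  accountId: string;            // public handle or stable ID of the actioned account
  actionTakenAt: string;        // ISO 8601 timestamp of the takedown
  ruleViolated: string;         // the specific platform rule violated
  evidenceSummary?: string;     // supporting evidence, with PII redacted; omitted when self-evident
  actor: "human" | "algorithm"; // who performed the action; individual agent names are never published
  reportSource: "internal_user_report" | "external_report" | "automated_scan";
  reporterAffiliation?: "state_actor" | "political_organization"; // special mention per item 4 above
};

const exampleEntry: TakedownLogEntry = {
  accountId: "@example_account",
  actionTakenAt: "2024-01-15T08:30:00Z",
  ruleViolated: "platform manipulation and spam",
  actor: "algorithm",
  reportSource: "automated_scan",
};
```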

2/ Publish algorithmic actions: While direct account actions are easy to see, algorithmic actions where certain kinds of content are prioritized or suppressed (“shadow banned”) are harder to see from the outside. To fix this, platforms should commit to the following (a sketch of a possible log entry follows this list):

  1. Publish a log of any account action taken at an algorithmic level (for example, reducing visibility in search or discovery surfaces).
  2. Publish the details of any major algorithmic sweep of accounts. Ideally, any algorithmic takedown of an account also comes with an explanation of what the algorithm was trying to do (“take down state-sponsored bots”).
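As a rough illustration, an algorithmic-action entry could extend the takedown entry above with the affected surface and the sweep’s stated purpose. Again, the field names here are assumptions made for the example.

```typescript
// Illustrative entry for an algorithmic visibility action (hypothetical schema).
type AlgorithmicActionEntry = {
  accountId: string;       // public handle or stable ID of the affected account
  actionTakenAt: string;   // ISO 8601 timestamp
  effect: "reduced_search_visibility" | "reduced_discovery_visibility" | "takedown";
  sweepId?: string;        // identifier of the algorithmic sweep, if part of one
  sweepPurpose?: string;   // e.g. "take down state-sponsored bots"
};
```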

3/ Appeals/recourse process: Users get a mechanism to appeal actions on their account and have the appeal reviewed by a human agent other than the person who originally took the action.

This should take the place of the current “find a friend who works at the company” path that most people try to go through today.

Given the volume of account actions, and to avoid bad actors DDoSing moderation teams, there will have to be some friction to filing these appeals. One option is to make appeals cost money (“pay $10 to appeal your account action”); another is to limit them to people who can prove their identity and cap the number of appeals per account; another is to prioritize accounts above a certain age or follower count; and so on. A sketch of how such gating might combine is shown below.
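Here is a minimal sketch of how those gates could combine. The thresholds and weights are arbitrary assumptions chosen for illustration, not recommendations.

```typescript
// Hypothetical gating and prioritization for appeals; all thresholds are illustrative.
type Appeal = {
  accountAgeDays: number;
  followerCount: number;
  identityVerified: boolean;
  feePaidUsd: number;
  priorAppealsThisYear: number;
};

const MAX_APPEALS_PER_YEAR = 3; // assumed cap on appeals per account
const APPEAL_FEE_USD = 10;      // the "pay $10 to appeal" option

// Accept an appeal if the account is under its cap and clears either the fee or the identity gate.
function acceptAppeal(a: Appeal): boolean {
  if (a.priorAppealsThisYear >= MAX_APPEALS_PER_YEAR) return false;
  return a.feePaidUsd >= APPEAL_FEE_USD || a.identityVerified;
}

// Order the review queue so older, larger accounts are reviewed first (weights are arbitrary).
function appealPriority(a: Appeal): number {
  return a.accountAgeDays + 100 * Math.log10(1 + a.followerCount);
}
```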

4/ Open source major algorithmic changes: A deeper problem is understanding how the core ranking done by platforms increases or decreases the reach of certain kinds of content. Platforms should commit to publishing major algorithmic changes and the underlying models (when possible). To avoid bad actors trying to game the system, this can be done with a time lag (say, a few months) after the ranking change ships: long enough to avoid gaming but still providing an audit trail for researchers to dig into.
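The time-lag rule itself is simple to state in code. The 90-day figure below is an assumed placeholder for the “few months” mentioned above.

```typescript
// Hypothetical disclosure-lag check: a ranking change becomes publishable once it is old enough.
const DISCLOSURE_LAG_DAYS = 90; // assumed stand-in for "a few months"

function isPublishable(changeDeployedAt: Date, now: Date = new Date()): boolean {
  const msPerDay = 24 * 60 * 60 * 1000;
  const ageInDays = (now.getTime() - changeDeployedAt.getTime()) / msPerDay;
  return ageInDays >= DISCLOSURE_LAG_DAYS;
}
```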

Addressing common questions

1/ The ‘bad guys’ will game the system if these are made public:

One fear is that motivated malicious actors will spot patterns of enforcement and try to dodge them. While this concern is valid, I believe the need for transparency and trust outweighs the risk of gaming. I also believe that if your system can be gamed because of transparency, that is an incentive to build more robust mechanisms; secrecy as a defense against manipulation never endures.

Another mitigation is to mark a given action simply as “spam” without publishing the underlying detection mechanism, or to publish that mechanism only after a certain amount of time has passed.

2/ This will “Streisand effect” the terrible content:

A related fear is that making content actions public will have the unintended side effect of highlighting the very content platforms want to remove. I believe the needs of transparency and public interest outweigh this risk, and over time this content gets debated in the media regardless.

3/ There is no way to verify that centralized / non-crypto platforms actually adhere to these rules:

As someone who works in crypto, I have sympathy for this argument. Unlike with a crypto-based verifiable mechanism or a truly decentralized social network like Farcaster, at some level you need to trust a central organization to actually follow the rules above. However, these commitments, combined with social and legal pressure, are a major improvement over the systems we have today, and we shouldn’t ignore that progress.

Acknowledgements: I want to thank Balaji Srinivasan, Vitalik Buterin, Alex Stamos, David Sacks, Dan Romero, and several others for discussions and feedback that helped shape this proposal. Note that this does not imply an endorsement from them. Please send thoughts and comments to @sriramk on Twitter/Farcaster or sriram@sriramk.com.