
Censorship-Free Content Moderation

Nostr was created to be "censorship-resistant". The home page of the protocol explains what that means in practical terms when it says:

"There will always be some Russian server willing to take your money in exchange for serving your posts."

In other words, Nostr is censorship-resistant, not censorship-proof. Relay owners/operators can censor the content on their relays for a variety of reasons - legal, personal, etc. - but somewhere there will be a relay willing to host your content.

While relay owners/operators need tools to comply with the laws in their jurisdiction or the standards of the community they serve, quite a few Nostr users are wary of those tools being developed because, in their experience, all content moderation is a form of censorship - and they use Nostr to get away from censorship. It's understandable that they feel this way; they've been burned by corporate-owned social media for a long time. But there's a substantive difference between top-down content moderation, where a person, corporation or government determines what is (un)acceptable, and bottom-up content moderation, which reflects the preferences of the user. Bottom-up content moderation isn't censorship. Censorship is about someone exerting power over someone else - it's always top-down.

So how do we achieve bottom-up content moderation on Nostr?

My thinking on how to best do content moderation has evolved over the past few months as I've given it thought and understood more about how Nostr works. I started by raising an "issue" on GitHub to discuss the handling of sensitive content. What I proposed initially in that thread was far too complicated. Here's where I currently stand on the issue…

Core Idea: A Cacophony Of Voices

I think the only way to really do bottom-up content moderation is to have a cacophony of voices that all give opinions on what should be moderated (or not moderated)… 

  • Clearly it all starts with the individuals reporting the content.
  • Then there are the moderators who work (or volunteer) for specific relays.
  • Automated bots could be part of the equation. For example, there are bots that can detect nudity. Parents might like those bots to have a voice in moderating their kids' feed.
  • An organization like ASACP could moderate CSAM since that's their specialty. (Given that they're non-governmental, I can see A LOT of relays using their recommendations.)
  • The SPLC could moderate hate speech and could be used by people with a left-leaning perspective.
  • Focus on the Family could moderate issues important to them and could be used by right-leaning Christian evangelicals and fundamentalists.

The list could go on and on, but the idea is that there's a place for everyone's opinion - it's just that not everyone's opinion gets used every time. Instead the end-user (or the relay owner/operator) chooses which voices to listen to, consider, and/or act on. There isn't one uniform standard for Nostr but rather thousands of standards that are unique to each person using Nostr. And some people may opt for zero moderation (though the moderation decisions of the relays they use will still affect them).

So what do we need to make that happen?

First Step: A Common Vocabulary

The most important thing is for everyone to use a common, translatable vocabulary; otherwise you have a "Tower of Babel" situation where very little gets done because the information isn't transmitted efficiently.

For example… let's say someone in Russia comes across a picture of two men engaged in oral sex. They want to report it because gay sex is illegal in Russia. Currently, a NIP-56 compliant app would offer them the following options for the report type: 'nudity', 'profanity', 'illegal', 'spam' and 'impersonation'. They're reporting it not because of nudity (they're fine with nudity), but because it's illegal - so they pick 'illegal' and put "гей минет" (gay blowjob) in the free-form comment field. There's no way that report is going to get handled efficiently. It was filed under 'illegal' rather than 'nudity', and the only thing that really gives a sense of what the report is about is in Russian. It would take a considerable amount of moderator time to work through reports submitted like that.
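To make the inefficiency concrete, here's roughly what that report looks like on the wire today. This is just a sketch - the ids and pubkeys are placeholders - but the structure (a kind 1984 event with the report type riding in the "e" tag, plus a free-form comment) is what NIP-56 defines:

```typescript
// Rough sketch of the NIP-56 report described above (ids/pubkeys are placeholders).
// The only machine-readable signal is the report type in the "e" tag.
const currentReport = {
  kind: 1984,                                      // NIP-56 reporting event
  pubkey: "<reporter-pubkey>",
  created_at: Math.floor(Date.now() / 1000),
  tags: [
    ["e", "<id-of-the-reported-note>", "illegal"], // vague: illegal where? why?
    ["p", "<pubkey-of-the-note-author>"],
  ],
  content: "гей минет",                            // the real reason, but only in Russian
};
```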

I knew that years ago ICRA had come up with a classification system for sensitive content, so I started there. Their website was taken down a decade ago, but with the help of ChatGPT I found it on Archive.org. I then massaged it into something a bit more up-to-date, reflecting lessons I'd learned over the past ~15 years in the porn industry. My first version was FAR too long and complex, but over the course of some discussion it's been refined to this:

TypeCode | Old Code | Type | Notes
---|---|---|---
CL | profanity | Coarse Language / Profanity |
HC-fin | | Promotion of content that is likely to cause financial ruin | consider also IL-frd
HC-bhd | | Promotion of content that is likely to cause serious bodily harm or death |
IH | | Intolerance & Hate | does not include intolerance of intolerance
IL | illegal | Illegal Content | post/profile is illegal or advocates illegal activity
IL-cop | | Copyright violation, piracy, intellectual property theft |
IL-csa | | Child sexual abuse and/or trafficking |
IL-drg | | Drug-related crime |
IL-frd | | Fraud & Scams |
IL-har | | Harassment / stalking / doxxing |
IL-hkr | | Prostitution |
IL-idt | impersonation | Impersonation / identity theft / phishing |
IL-mal | | Malware / viruses / ransomware |
NS | nudity | Nudity & Sex | for use in contexts where there is not the intent to cause arousal
NS-nud | | Casual nudity |
NS-ero | | Erotica |
NS-sex | | Sex |
PG | | No Sensitive Content | for use in situations where sensitive content might be assumed
PN | | Pornography | for use when the intent is to sexually arouse the viewer
PN-het | | Heterosexual porn |
PN-gay | | Gay male porn |
PN-les | | Lesbian porn |
PN-bis | | Bisexual porn |
PN-trn | | Transsexual porn |
PN-fnb | | Gender-fluid / non-binary porn |
SP | spam | Spam |
SP-mod | | Moderation report spam |
VI | | Violence | actual or advocated
VI-hum | | Violence towards a human being |
VI-ani | | Violence towards a sentient animal |

The nine primary categories are the ones presented to the user first. (PG is only for NIP-36 self-reporting of content - more about that in a moment.) If the user chooses one that has subcategories, they'll then see the subcategories.

So if the Russian person were reporting the same picture, they'd most likely choose 'порнография' (pornography) → 'гей порно' (gay male porn). A code of 'PN-gay' would then be transmitted in their report rather than the completely nonspecific 'illegal'. No matter what language the relay moderators spoke, they would understand what that code meant, and if they were in one of the 81 countries where gay sex is illegal, they could delete the event in order to be compliant with local laws.
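If the proposed codes simply slot into the position NIP-56 already uses for its report type - one possible encoding, not necessarily what the final spec will say - the same report becomes self-describing:

```typescript
// The same report sketched with the proposed vocabulary (placeholders again).
// Exactly where the code travels is up to the spec; the point is that one
// short, language-independent code carries the needed information.
const proposedReport = {
  kind: 1984,
  pubkey: "<reporter-pubkey>",
  created_at: Math.floor(Date.now() / 1000),
  tags: [
    ["e", "<id-of-the-reported-note>", "PN-gay"], // machine-readable, translatable
    ["p", "<pubkey-of-the-note-author>"],
  ],
  content: "",                                    // free-form comment now optional
};
```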

So ultimately this is about helping relay owners find the needles in the haystack that they need to act on in order to keep their servers compliant with local laws (or to find content that violates their terms of service). While it's not critical that every type of problem content be found and deleted quickly, there are a few types (such as CSAM) where quick action is essential.

And yes, the list above is not complete. It does not cover every possible type of illegal activity, etc. A truly comprehensive list would be overwhelming, so the list sticks to the problem types that are most likely to come up. Handling ~90% of the problems efficiently is far better than handling none of them efficiently. For the other 10% there's always the free-form text comment that's part of NIP-56. If, in time, moderators agree that some items should come off the list and others be added, that's completely possible. And there's no requirement that app developers put every one of the defined items into their app - or even show subcategories. They can do what makes sense for their audience.

Second Step: Using The Defined Vocabulary

The vocabulary above is basically a form of content labeling/classification. That labeling isn't just done in NIP-56 reports. It's also done in NIP-36 "content warnings", and it probably should be done in NIP-94 events, which describe files.

So what's important is that there be consistency in how sensitive and/or problematic content is handled. When you think about it, if someone publishes a note with sensitive content and forgets to put a content warning on it, they could go back and do a NIP-56 report to essentially put the warning on after the fact. So there is overlap in the function of the two NIPs.

Use in NIP-36 content warnings is why "PG" ("No sensitive content") is on the list. If a person has labeled their entire profile as "sensitive" in some way, then they need a way to say when something isn't sensitive.

NIP-36 also requires the idea of "context" - situations where something that might otherwise be objectionable is OK. It allows the user to say their content may be "sensitive" but it's in a context some may find acceptable.

ContextCode | Context
---|---
ED | Educational
FA | Fine Art
FF | Fantasy / Fiction
MS | Medical / Scientific
ND | News & Documentaries
PP | Political Protest

Moderators will also need the idea of context. They may find that a report is accurate, but that the content is in a context that doesn't require action.
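To show how the vocabulary and context codes might fit together in NIP-36 self-labeling, here's a sketch. The "content-warning" tag is what NIP-36 already defines; putting the type code in its reason field and adding a separate "context" tag are my assumptions for illustration, not anything that's been specified:

```typescript
// Hypothetical self-labeled note: nude fine-art photography.
// "content-warning" is the real NIP-36 tag; using the type code as its reason
// and the separate "context" tag are illustrative assumptions only.
const selfLabeledNote = {
  kind: 1,
  pubkey: "<author-pubkey>",
  created_at: Math.floor(Date.now() / 1000),
  tags: [
    ["content-warning", "NS-nud"], // casual nudity, from the vocabulary above
    ["context", "FA"],             // fine art - hypothetical tag name
  ],
  content: "New piece from my figure-study series…",
};
```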

"NIP-69"

With the help of "Rabble" from NOS.social (they/them - a co-founder of Twitter), who has been helping me refine my thoughts on these issues, we submitted a pull request suggesting a "NIP-69" which defines the vocabulary above and specifies that it should be used for NIP-36 self-reporting as well as NIP-56 third-party reporting. Most of the comments on the PR have been positive/supportive, but not all.

Third Step: Moderator Reports

Right now anyone in that cacophony of voices can submit a NIP-56 report about something; a moderator reviews it, and if they work for a relay they might delete the content - but there's no defined way for them to share what they found. Remember, a moderator is anyone who reviews a kind 1984 event. Those events are public and can be reviewed by anyone, so it's not just relay moderators reviewing them - ASACP might be reviewing reports of CSAM, the SPLC might be reviewing reports of hate and intolerance, and Focus On The Family might be reviewing reports of nudity and sex. It's unclear how moderator reports should be submitted.

  • Do they file another NIP-56 report? If so, how is the initial NIP-56 report referenced? How do they distinguish their comments on the content in question from their comments on the report?
  • If it's a bot submitting a report on content, how does it communicate its level of certainty?
  • And how can the moderator communicate a level of severity? For example - "yes, it was 'sex' - but was it just heavy petting or something more intense?"
  • How does the moderator communicate their recommended action, so other people who trust them can take the action in an automated manner?

There are a lot of questions as to how, precisely, these reports should be organized. At this point we just know some of the issues that need to be taken into consideration. Moderator reports don't necessarily need to be as simple as NIP-56 user reports, and perhaps their details vary by type of problem and get worked out amongst the various people dealing with that issue.
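Purely to make those open questions concrete, here's one shape a moderator report could take. Every tag beyond the basic NIP-56 ones is hypothetical - nothing here has been specified anywhere:

```typescript
// A purely hypothetical moderator report - sketched only to make the open
// questions above concrete. The "certainty", "severity" and "action" tags
// are invented for illustration; nothing here is specified in any NIP.
const moderatorReport = {
  kind: 1984,                                     // or a dedicated kind - undecided
  pubkey: "<moderator-or-bot-pubkey>",
  created_at: Math.floor(Date.now() / 1000),
  tags: [
    ["e", "<id-of-the-reported-note>", "NS-sex"], // the moderator's own classification
    ["e", "<id-of-the-original-user-report>"],    // reference back to the NIP-56 report
    ["p", "<pubkey-of-the-note-author>"],
    ["certainty", "0.92"],                        // e.g. a nudity-detection bot's confidence
    ["severity", "mild"],                         // "heavy petting" vs. something more intense
    ["action", "label"],                          // recommended action: label / hide / delete
  ],
  content: "Confirmed: consensual adult content, no content warning on the original note.",
};
```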

Fourth Step: Trust Lists

As an example of how things don't work very well right now… Amethyst will block content if there are a certain number of reports against it. It doesn't take into consideration the validity of the reports, the nature of the reports, or whether the user would agree or disagree with them. If someone reported their own content because they forgot to put a content warning on it, they could wind up essentially shadow-banning themselves. That's not a good situation.

Some people have suggested only using reports by people/organizations that are on your follow list, but you may follow someone you don't agree with - you don't want them moderating your content!

I think the way forward is "Trust Lists" which work much the same way as Follow Lists and Block Lists. They're basically the list of people/organizations/bots/whatever that you want moderating your content. If you don't want any moderation you simply have no Trust List. And there could be "Negative Trust Lists" which say "do the opposite of whatever this person says to do".

Once the client has your Trust List, it can pull the NIP-56 reports and Moderator Reports that were submitted by people you trust and use only those reports to filter your feed. So basically you've chosen your own moderation team.
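As a minimal sketch of what that client-side filtering could look like - the types and matching rules here are assumptions, not a spec, and the "do the opposite" logic for Negative Trust Lists is left out for brevity:

```typescript
// Minimal sketch of applying a Trust List client-side. The Report type and
// matching rules are assumptions for illustration; Negative Trust List
// ("do the opposite") handling is omitted for brevity.
type Report = { pubkey: string; reportedEventId: string; typeCode: string };

function eventsToHide(
  reports: Report[],        // kind 1984 / moderator reports already fetched
  trusted: Set<string>,     // pubkeys from the user's Trust List
  hiddenCodes: Set<string>, // type codes this user wants filtered, e.g. "IH", "VI"
): Set<string> {
  const hide = new Set<string>();
  for (const r of reports) {
    if (!trusted.has(r.pubkey)) continue;      // only voices the user chose count
    const topLevel = r.typeCode.split("-")[0]; // "PN-gay" -> "PN"
    if (hiddenCodes.has(r.typeCode) || hiddenCodes.has(topLevel)) {
      hide.add(r.reportedEventId);             // flagged by someone the user trusts
    }
  }
  return hide;
}
```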

UPDATE: After I wrote that, Matthew from NOS.social pointed me to a paper on the concept of "TrustNet", which is very similar to what I'm proposing here but more fully fleshed out (there's a whole 103-page Master's Thesis on the topic). It's good to see other people thinking in the same direction!

Extending For Feed Algorithms ⬆︎⬇︎

What I describe above doesn't just need to be for moderating content - with a few tweaks it can be used to promote content as well. There's no requirement that the feedback given be negative. As we come up with the spec we just have to make sure positive feedback is fully supported.

There's been a lot of talk amongst Nostr app developers about the eventual need for algorithms to filter user feeds - pushing some content up and other content down. The idea is that there would be a variety of algorithms for the user to choose from, and client apps would support more than one algorithm. Well, the system detailed above would provide those algorithms with most of the data they need to figure out what the person does and doesn't want to see.
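As a toy illustration of that point, the same trusted labels could feed a ranking score instead of a binary hide/show decision - positive signals push a note up, negative ones push it down. This is purely illustrative:

```typescript
// Toy scoring function: reuse trusted moderation/endorsement signals for
// feed ranking. Purely illustrative - real algorithms would weigh much more.
type Signal = { eventId: string; positive: boolean };

function scoreEvent(eventId: string, trustedSignals: Signal[]): number {
  return trustedSignals
    .filter((s) => s.eventId === eventId)
    .reduce((score, s) => score + (s.positive ? 1 : -1), 0);
}
```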

Wrapping Up

Could a government build an app and have it use only their approved moderators, skipping the user's Trust List? Sure. But there will be many other Nostr clients out there that don't do that; the user just has to get their hands on one of them. And could a government require relays in their country to abide by the decisions of their approved moderators? Sure, but that's what it means to operate in a country - you have to abide by its laws. Your only alternative is to move your server out of the country. And of course users can see non-censored content if they use relays elsewhere. It will be an endless game of whack-a-mole for governments to try to thwart that.

And that brings us back to what's stated on the homepage of the Nostr protocol… Nostr isn't censorship-proof, it's censorship-resistant.

Postscript

If you're wondering why we're bothering with this, it's because it's an issue that affects us deeply in very personal ways. For example, the same day this page was drafted we announced the launch of our Nostr relay. As far as we know it was the second porn-affiliated relay and the first LGBTQ-related relay. Sadly one of the responses to our announcement on Nostr was a death threat. You can see the dialog here: low-res, high-res. There's a reason why this effort is being led by two people from the LGBTQ community - the lack of content moderation affects us in ways it doesn't affect most other people.