Video transcription
Today’s question comes from Zurich.
Gary wants to know, “How does SafeSearch, both for text and images, work?”
Well, I worked on the initial version of SafeSearch for text. So let’s concentrate on that.
Don’t want to give away anything that spammers could use, but I can talk about way back in 2000 how SafeSearch worked, so you can kind of get an idea.
And the idea is roughly what you would expect, which is we look for certain words, and we give them certain weight. And if you have enough words with enough weight, then we sort of say, OK, this looks like it might be a sort of porn or porn-related document.
And you can have various thresholds, where you can say, OK, it might be safe at this level, but unsafe once you get too many. And you can do things like, well, if it’s a book, if it’s a really long thing and it’s got one word, that’s not quite as bad as if you have just like a very small document and you have that same word.
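A minimal sketch of the weighted-word idea described above, assuming a made-up word list, made-up weights, and made-up thresholds (none of these values come from the talk, and this is not the actual SafeSearch code). It sums the weights of matching words, scales the total by document length so that one word in a book counts for less than the same word in a tiny page, and compares the result against two thresholds.

```python
import re

# Hypothetical weights, invented purely for illustration.
WORD_WEIGHTS = {
    "porn": 5.0,
    "xxx": 4.0,
    "amature": 3.0,  # the misspelling mentioned in the talk
    "sex": 1.0,
    "breast": 0.5,
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def length_adjusted_score(text, min_length=50):
    """Return the summed word weight per 100 words.

    min_length is an assumed floor on the word count, so that very
    short documents are not amplified without limit.
    """
    tokens = tokenize(text)
    total = sum(WORD_WEIGHTS.get(token, 0.0) for token in tokens)
    return 100.0 * total / max(len(tokens), min_length)


def classify(text, borderline_at=1.0, unsafe_at=4.0):
    """Map the score onto three levels using two assumed thresholds."""
    score = length_adjusted_score(text)
    if score >= unsafe_at:
        return "unsafe"
    if score >= borderline_at:
        return "borderline"
    return "safe"


book = "word " * 10000 + "sex"      # one weighted word in a long text
print(classify(book))                # safe
print(classify("sex"))               # borderline
print(classify("xxx porn pics"))     # unsafe
```

The same single word lands in different buckets depending on document length, which is the behavior described for the book versus the very small document.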
And you can very much imagine that some words are worse and more likely to be pornographic than other words. So certain slang terms, it turns out misspellings, right?
So like amateur misspelled A-M-A-T-U-R-E is much more likely to be amateur porn than amateur radio or something along those lines.
But you do have to be careful, because there’s words like breast, which can be breast cancer, or sex can be sex education. So you do want to try to do the learning to learn which words should carry which weights and which words should have more weight, and those sorts of things.
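The talk does not say which learning method was used. As one possible illustration, the sketch below derives a weight per word from a handful of hand-labeled example documents using a smoothed log-odds ratio (a naive-Bayes-style estimate). The function name `learn_weights` and the training examples are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def learn_weights(labeled_docs):
    """Learn one weight per word from (text, is_adult) pairs.

    The weight is the log of the ratio between how often a word
    appears in adult documents and how often it appears in safe ones,
    with add-one smoothing so unseen words do not divide by zero.
    """
    adult_df, safe_df = Counter(), Counter()
    n_adult = n_safe = 0
    for text, is_adult in labeled_docs:
        words = set(tokenize(text))  # count each word once per document
        if is_adult:
            n_adult += 1
            adult_df.update(words)
        else:
            n_safe += 1
            safe_df.update(words)

    weights = {}
    for word in set(adult_df) | set(safe_df):
        p_adult = (adult_df[word] + 1) / (n_adult + 2)
        p_safe = (safe_df[word] + 1) / (n_safe + 2)
        weights[word] = math.log(p_adult / p_safe)
    return weights


# Tiny invented training set, for illustration only.
training = [
    ("amature xxx pics", True),
    ("xxx sex videos", True),
    ("amature sex pics", True),
    ("breast cancer screening", False),
    ("sex education for teens", False),
    ("amateur radio club", False),
]

weights = learn_weights(training)
for word in ("xxx", "amature", "sex", "breast", "amateur"):
    print(word, round(weights[word], 3))
```

With these examples, "sex" ends up with a small positive weight because it also appears in a safe document, "breast" ends up negative, and the misspelling "amature" scores well above the correctly spelled "amateur".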
But it actually is relatively sophisticated in terms of trying to figure out... you can imagine doing a lot more than just pure content analysis or using just straight words.
But at least to a first approximation, that’s a pretty good way to sort of classify something as porn or not. One thing that I wanted to mention, which if you go down to the metadata for this video, we have a place where you can click. And if you think you have been detected as porn when you’re not pornographic, or you think you found a bug or an error with SafeSearch, you can report that and pass that information along.
And so people can adjust the algorithms or otherwise make improvements so that we don’t necessarily say that a site that is really, really good is pornographic if it’s not.
But you would be surprised at how well just doing some pretty simple scanning with some relatively simple weights can catch a large fraction of the porn on the web.
Previous search engines, just a little bit of historical digression here, at least I remember in the early days, AltaVista, you could search for sex and have their family mode on, and they would have only like results returned.
Because they had basically said, OK, we are only going to allow these results for this query, or we’re only going to say these results are safe. And the mental model that Google had was different.
We said, OK, if there’s a mother, she’s searching with her Cub Scout son, would she be surprised, would she be offended by the results?
But at the same time, you’d like to get the comprehensiveness of the web.
So you’d like to score the entire web and find the documents that are porn and exclude those.
But then if there’s something about sex education or things along those lines, you would like those to be returned. So it’s a pretty good approach.
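To make the contrast between the two mental models concrete, here is a small sketch. It assumes each result already carries a precomputed score; the `Result` type, the `adult_score` field, the threshold, and the example URLs are all hypothetical. One function keeps only pre-approved results for a query, and the other scores everything and drops only what crosses the threshold.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    adult_score: float  # hypothetical precomputed score for the page


def allowlist_filter(results, approved_urls):
    """Allowlist model: return only results that were pre-approved."""
    return [r for r in results if r.url in approved_urls]


def score_filter(results, threshold=4.0):
    """Scoring model: keep every result whose score is under the threshold."""
    return [r for r in results if r.adult_score < threshold]


results = [
    Result("https://example.com/sex-education", 0.8),
    Result("https://example.com/adult-site", 9.5),
    Result("https://example.com/health-clinic", 0.2),
]
approved = {"https://example.com/health-clinic"}

print([r.url for r in allowlist_filter(results, approved)])
print([r.url for r in score_filter(results)])
```

The allowlist version returns only the one approved page, while the scoring version also keeps the sex education page and excludes only the high-scoring one, which is the comprehensiveness the talk is after.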
It’s worked very well. And thankfully, there’s a much better team of engineers who are much more sophisticated in the ways that they analyze pages now, so all of that original stuff that I wrote back in I’m sure has been replaced by much better stuff at this point.
Quick Answer: Looks for certain words and gives them certain weight. If there are enough words with enough weight then it says