Matt Cutts: How can I make sure that Google knows my content is original?






Video transcription

Today’s question comes from Kunal Pradhan.
And I’m from eastern Kentucky, so I apologize that I’m horrible with names sometimes.
The question is, “Google crawls site A every hour and site B once in a day. Site B writes an article, site A copies it, changing the time stamp. Site A gets crawled first by Googlebot. Whose content is original in Google’s eyes and will rank highly? And if it’s A, then how does that do justice to site B?”
So I could get into a lot of really interesting stuff about how to crawl the web.
If you really want to know about a signal, the Nyquist rate says you want to sample at two times that frequency. But the fact is, you can always change a web page.

So the whole idea, the conception of being able to crawl the entire web and having a perfect copy at every instant, is a little bit flawed, because at any time we can only go and fetch a certain finite number of pages.
If we tried to fetch them all, and our architecture could almost support that, then the web might crash from all of those requests. And we try to crawl in a relatively polite way.
We also try to prioritize based on things like the PageRank of a particular page, or maybe a site might have a lot of PageRank.
So the question is essentially, if A is getting crawled a lot but the original article starts on B, what if A rips off B? Well, there are ways that you can help to guard against that.

So, for example, if you do a Tweet, people will see it, people may link to it, and we may follow those links faster than we’ll discover it on the other site.
Another thing that you can do is you can hook up things like PubSubHubbub, which will ping various places. There is a very limited amount in which we will use PubSubHubbub to help improve our crawl, and that might change over time.
And that’s a great way to sort of asynchronously say hey, there’s a new article or there’s a new blog post. But let’s go ahead and play with this hypothetical scenario.
If A has copied your article and changed the time stamp, that’s a little bit deceptive; it’s as if they’re claiming that they have written it.

So you can do a couple things. Number one, if you are the author of that article, you can always do what’s known as a Digital Millennium Copyright Act sort of notice, where you send in this DMCA request; and you can find the information at google.com/DMCA.html. And basically what you’re saying is this site copied me, but I’m the original author.
So this site can either counter-notify, which means they dispute that: they say “I wrote this page,” which has some penalties to it if they’re lying.
Or they can not dispute it, and the stuff disappears off of the other site.

So if someone’s ripping you off, you can always do a DMCA notice.
You can also do a spam report, for example, if it’s an auto-generated site and they’re ripping off or scraping a bunch of people, because that’s not a high-quality site; that’s not the sort of thing that we want to have within our index.
But let’s just play it all the way out to the corner case. It is, in theory, possible that we will find an article on one site before we find it on the other site. And so it is definitely the case that we try hard to find out who is the original creator of a particular piece of content, but I wouldn’t claim that we’re perfect.
We do as much as I can think of to try to figure out what are the ways that people can indicate that they wrote the content.

And in fact, in Google News, we just introduced a couple of new tags, almost as an experiment to see how well it works, to sort of say, here’s the original author of this content.
So there are approaches that we’re exploring to sort of figure out if there are other ways to do that.
But at least for the time being, in theory it is possible to have an article [found on the copying site first].
In practice, it tends to not happen that often, and you do have ways that you can get around that or ways that you can take action, from a DMCA request all the way up to a spam report.
Hope that helps.

Quick Answer: Tweet it, and use PubSubHubbub to get it crawled. Do a DMCA request if you’re ripped off.
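
For readers who want to see what the PubSubHubbub ping mentioned in the quick answer looks like on the wire, here is a minimal sketch. In the PubSubHubbub protocol, a publisher notifies a hub with a form-encoded POST carrying hub.mode=publish and hub.url set to the feed that changed. The hub address and the feed URL below are assumptions for illustration (use whichever hub your own feed declares), and the helper names are hypothetical. Nothing here guarantees a faster crawl; as the transcript says, Google uses these pings only in a limited way.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: a public hub endpoint; substitute the hub your feed advertises.
HUB_URL = "https://pubsubhubbub.appspot.com/"


def build_publish_ping(hub_url: str, feed_url: str) -> Request:
    """Build the form-encoded POST that tells a hub a feed has new content."""
    body = urlencode({"hub.mode": "publish", "hub.url": feed_url}).encode("ascii")
    return Request(
        hub_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def send_publish_ping(hub_url: str, feed_url: str, timeout: float = 10.0) -> int:
    """Send the ping and return the HTTP status code reported by the hub."""
    with urlopen(build_publish_ping(hub_url, feed_url), timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__":
    # Hypothetical feed URL standing in for site B; this only prints the
    # request and does not contact the hub.
    request = build_publish_ping(HUB_URL, "https://site-b.example/feed.xml")
    print(request.get_method(), request.full_url)
    print(request.data.decode("ascii"))
```

Calling send_publish_ping right after publishing a new post is the asynchronous "hey, there’s a new article" signal described in the transcript. Building the request separately from sending it keeps the example easy to inspect without touching the network.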
