Video transcription
We’ve got an interesting question from Danny in Bucharest, who wants to know:
How can Googlebot crawl and index pages that don’t have any links to them on my website? I find each day two or three pages in the index that don’t have any links to them on my site. The pages are generated by the search field of my website.
OK, so you almost threw me for a loop. I read the entire question and I was ready to give one answer, and then the last sentence changed my answer completely.
So let me answer both ways, starting with the beginning part of your question: how does Google index pages even when there are no links to them on my site?
Well, people can always submit a URL or something like that, but a lot of people don’t realize how many links there are just sort of floating around on the web.
So it could be that someone is linking to a page on your site and you don’t realize it. We can start from a very obscure, esoteric page, follow that link, and find a deep page on your own site. And because we only return a subsample of all the links we know about when you do a link: query on a particular URL, we might know about a link that you don’t know about.
So that’s how I started to answer your question, and then you said the pages are generated by the search field of your website, and that completely changes the nature of the question.
So in April of 2008, Jayant Madhavan and Alon Halevy did a blog post where they talked about crawling through HTML forms.
They later got it published as a paper. The basic idea is that, in some cases, whenever we see a search form, Google can try to sort of fill out that form, as long as the form is simple enough.
So suppose, for example, you have your website, your main root page, and you can’t get to some other part of your site except through a drop-down.
Googlebot can enumerate the values in that drop-down (maybe it’s the 50 states in the United States) and we can try to submit: OK, well, what if we set the state to Kentucky, or what if we set the state to California? And if that opens up new pages for us to discover and crawl, that can let us crawl through a search form.
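To make the drop-down idea concrete, here is a minimal Python sketch of how a crawler might enumerate the values of a drop-down and turn each one into a candidate URL. It is an illustration only, not Googlebot's actual implementation: the sample form, the /stores path, the state field name, and the candidate_urls helper are all made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class SelectOptionParser(HTMLParser):
    """Collects the form action and the option values of each named <select>."""

    def __init__(self):
        super().__init__()
        self.action = ""
        self.current_select = None
        self.options = {}  # select name -> list of non-empty option values

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.action = attrs.get("action") or ""
        elif tag == "select":
            self.current_select = attrs.get("name")
            if self.current_select is not None:
                self.options[self.current_select] = []
        elif tag == "option" and self.current_select is not None:
            value = attrs.get("value")
            if value:  # skip placeholder entries such as "Choose a state"
                self.options[self.current_select].append(value)

    def handle_endtag(self, tag):
        if tag == "select":
            self.current_select = None


def candidate_urls(page_url, html):
    """Hypothetical helper: one GET URL per drop-down value."""
    parser = SelectOptionParser()
    parser.feed(html)
    base = urljoin(page_url, parser.action)
    return [
        f"{base}?{urlencode({name: value})}"
        for name, values in parser.options.items()
        for value in values
    ]


# Made-up form for illustration; a real page would be fetched over HTTP.
SAMPLE_HTML = """
<form action="/stores" method="get">
  <select name="state">
    <option value="">Choose a state</option>
    <option value="KY">Kentucky</option>
    <option value="CA">California</option>
  </select>
</form>
"""

urls = candidate_urls("https://example.com/", SAMPLE_HTML)
for url in urls:
    print(url)
```

Each printed URL is a page the crawler could then fetch to see whether it leads to new content.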
Now, in general we don’t crawl through a ton of search forms because they can be very complex. You know, sometimes they want credit card numbers, and Googlebot is very broke.
It doesn’t have a credit card number. But in some situations, where there might be only one or two input elements, we do have the ability to try to find out whether we can search through that form to find new content.
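As a rough sketch of what "simple enough" could mean in practice, the Python snippet below counts the fields a user would have to fill in and accepts a form only if there are one or two of them. The exact rules here (GET forms only, no password fields, hidden and submit inputs ignored, a limit of two fields) are assumptions chosen for illustration, and looks_simple is a hypothetical helper, not a description of Google's real criteria.

```python
from html.parser import HTMLParser


class FormFieldCounter(HTMLParser):
    """Records the form method and the user-fillable fields in a snippet."""

    IGNORED_INPUT_TYPES = {"submit", "hidden", "button", "reset", "image"}

    def __init__(self):
        super().__init__()
        self.method = "get"  # HTML forms default to GET
        self.fields = []     # e.g. ["text", "select"]

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.method = (attrs.get("method") or "get").lower()
        elif tag == "input":
            field_type = (attrs.get("type") or "text").lower()
            if field_type not in self.IGNORED_INPUT_TYPES:
                self.fields.append(field_type)
        elif tag in ("select", "textarea"):
            self.fields.append(tag)


def looks_simple(html, max_fields=2):
    """Hypothetical heuristic: a GET form with one or two plain fields."""
    counter = FormFieldCounter()
    counter.feed(html)
    if counter.method != "get":
        return False
    if "password" in counter.fields:
        return False
    return 1 <= len(counter.fields) <= max_fields


# Made-up forms for illustration.
SEARCH_FORM = """
<form action="/search" method="get">
  <input type="text" name="q">
  <input type="submit" value="Search">
</form>
"""

CHECKOUT_FORM = """
<form action="/checkout" method="post">
  <input type="text" name="name">
  <input type="text" name="card_number">
  <input type="text" name="expiry">
  <input type="submit" value="Pay">
</form>
"""

print(looks_simple(SEARCH_FORM))
print(looks_simple(CHECKOUT_FORM))
```

The search form passes because it has a single text field, while the checkout form is rejected both for using POST and for asking for too many fields.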
Now, if that’s something that you’re not interested in, and maybe you don’t want any of those pages crawled, you can always use robots.txt to do a disallow on /search or /search-form, or whatever the area is that you go to whenever you submit the search form.
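For example, assuming the search results live under /search (a placeholder path; substitute whatever URL your form actually submits to), a robots.txt rule and a quick check of it might look like the sketch below. The check uses Python's standard urllib.robotparser, which applies simple prefix matching; it is a sanity check of the rule, not a guarantee of how any particular crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content; /search is a placeholder for your results path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A search results URL should be blocked; an ordinary page should not be.
search_allowed = parser.can_fetch("Googlebot", "https://example.com/search?q=widgets")
about_allowed = parser.can_fetch("Googlebot", "https://example.com/about")

print("search results crawlable:", search_allowed)
print("about page crawlable:", about_allowed)
```

Because the rule is a prefix match, it would also block other paths that begin with /search, so it is worth checking that nothing you want crawled lives under the same prefix.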
So we try to be very polite. You can read more about it if you search for, you know, “Googlebot crawl through HTML forms” or something like that, and you can read about the sort of forms that we will and won’t crawl through.
But it’s all part of the process where we try to discover as much of the web as possible and crawl it as comprehensively as we can, so that we can return it to you in under half a second.
Quick Answer: Googlebot can interact with forms, or maybe there are just links you don’t know about