Google Starts Controversial Form Crawling Program


Small number of sites in initial rollout

Googlebot has received an update that lets it complete certain forms and learn more about the sites hosting them.

Websites place content behind forms to collect information from visitors requesting access to it. The site publisher might want demographic details to improve marketing campaigns, for example.

Google thinks it can present better results to searchers by having access to the URLs behind forms, improving the site's exposure in the process. The Google Webmaster Central blog promised that its crawls will be well-behaved:

Only a small number of particularly useful sites receive this treatment, and our crawl agent, the ever-friendly Googlebot, always adheres to robots.txt, nofollow, and noindex directives. That means that if a search form is forbidden in robots.txt, we won't crawl any of the URLs that a form would generate. Similarly, we only retrieve GET forms and avoid forms that require any kind of user information.
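The rules in that statement can be pictured with a short Python sketch. This is only an illustration of the stated policy, not Google's actual implementation: the helper names, the example robots.txt, and the list of input types treated as "user information" are all assumptions made for the example.

```python
from urllib.parse import urlencode
from urllib.robotparser import RobotFileParser

# Assumption: which input types count as "user information".
PERSONAL_INPUT_TYPES = {"password", "email", "tel"}

# Hypothetical robots.txt that forbids the site's search form.
ROBOTS_TXT = "User-agent: *\nDisallow: /search\n"


def form_url(action: str, values: dict[str, str]) -> str:
    """Build the URL a GET form would request for the given field values."""
    return f"{action}?{urlencode(values)}"


def may_crawl_form(robots_txt, method, action, input_types, values,
                   agent="Googlebot"):
    """Apply the three stated rules: GET only, no user info, obey robots.txt."""
    if method.upper() != "GET":
        return False  # POST forms are never submitted
    if PERSONAL_INPUT_TYPES & set(input_types):
        return False  # form asks for user information
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # A URL generated by the form is checked like any other URL.
    return parser.can_fetch(agent, form_url(action, values))


if __name__ == "__main__":
    print(may_crawl_form(ROBOTS_TXT, "get", "https://example.com/catalog",
                         ["select"], {"color": "red"}))
    print(may_crawl_form(ROBOTS_TXT, "get", "https://example.com/search",
                         ["text"], {"q": "shoes"}))
```

Under these assumptions, a site owner who does not want a form crawled only needs a Disallow rule covering the form's action path, since every URL the form generates begins with that path.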

However, concerns have been raised about Google crawling forms not marked as forbidden. Kevin Heisler complained at Search Engine Watch that the practice could violate the privacy of corporate data.

Though confident in Google's intentions, Heisler thinks potential backlash from corporate interests could be a problem. "The costs to CEOs, CIOs and CTOs at corporations far outweigh the benefits to consumers," he said.

News Tags: Google, Privacy, Crawler, Forms

Comments

Yahoo Doing this as well?

While searching for something on Yahoo, I got a result carrying a warning that the site sends unsolicited emails. Possibly Yahoo found this out by spidering the form on their site?

Big Brother?

Big brother here we come!

When Will They Stop?

What is the point of this move on Google's part? I think this type of technology will ultimately put at risk what privacy we have.

A little much

I think that the move to try and understand every little aspect of every website in the index is a little ridiculous for Google. What is the purpose of it? Do they not have enough data from the 1000 other data avenues to evaluate websites?

Yet another chore for the poor developer

This kind of sucks, as now we'd have to noindex and nofollow even those pages which are behind forms.

Better for 'unlocking content' & deep linking

Although shrewd SEOptimisers have moved dynamic content into spider-friendly pages, some of the most detailed and rich content on the web is still 'hidden behind' forms.

I fail to see any value to google of 'hacking' areas of websites that the website owner wishes to be private. I suspect that they would only follow forms that have specific drop-down options (not free text fields).

Does ever friendly gbot always adhere...

to "robots.txt, nofollow, and noindex directives."?

If so, could somebody explain why I have many pages whose internal links from every other page are all marked "nofollow" except one, yet Google lists the number of internal links as if every page linked to them?

Not particularly bothered, just curious.

Otherwise, this sounds like more "personalized search" nonsense. Personally, if I am searching the web, I like to make my own mind up as to what I consider relevant, not have search engines trying to guess. Just because I search one topic in one way does not mean I am searching the next topic in the same way.

Google

Although some may not like this, there could be a lot of content Google is missing out on. For instance, at my company's site, www.healthplanone.com, you need to enter your zip code and age to view instant health insurance quotes. Although we do promote the different carriers that we provide, Google cannot see all of the content on our website. For other websites, though, where you need to sign up, it may be pointless. Who knows, we'll see how this turns out.

All in the name of?

Sounds like something Google would want to do. Why on Earth would they need information about the website by running and utilizing a form?

Sounds pretty ridiculous.

google

Hi, my name is Serhat. My internet site isn't in Google :( Please help me.

I really wonder where this

I really wonder where this will lead!?

I'm sure Google has the best intentions, and that is to get more coverage of the web, but I don't like it!!!

before too long gbot will

before too long gbot will break through captchas and start signing up as a user. All in the interest of helping the consumer ;)   

Those dang captchas.  I

Those dang captchas. I could hardly read half of them, 'specially them warpy funky funhouse mirror ones. Anyway, Google's intentions are good. Google knows where to draw the line and is fair about it. It's all for and about the benefit for users of the internet.

Give me a break!

"It's all for and about the benefit for users of the internet."

For Google it's really all about control of information and money.
