July 01, 2004

New spider from Yahoo.com::[Search Engine]


Liang

For the Chinese market, Yahoo launched www.yisou.com, which mainly focuses on the China/Asia market.

Just after Yisou.com launched, it seems they also started using a new kind of spider, which they haven't even given a name yet.

web2.search.cnb.yahoo.com - - [22/Jun/2004:12:41:54 -0500] "HEAD /gmail HTTP/1.1" 200 - "-" "DeadLinkCheck/0.4.0 libwww-perl/5.69"

This was the first day the Yahoo spider crawled the website, but it only checked for dead links (HEAD requests with the "DeadLinkCheck" user agent).
After that, it seems they spent about a week processing the data, removing the dead links, and preparing to crawl the live ones.
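
To compare entries like the one above, it helps to split each log line into fields first. Here is a minimal sketch in Python, assuming the server writes the common "combined" access log layout (host, ident, user, time, request, status, size, referer, user agent). The names LOG_PATTERN and parse_line are my own, not part of any tool.

```python
import re
from datetime import datetime

# Assumed layout: Apache-style "combined" access log, as in the excerpts in this post.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>.*?) (?P<protocol>[^"\s]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Timestamps look like 22/Jun/2004:12:41:54 -0500 (English month names assumed).
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # A size of "-" means no body was sent, which is what a HEAD request produces.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry
```

With the fields separated, it is easy to filter on the user agent ("DeadLinkCheck/0.4.0 libwww-perl/5.69") or the request method (HEAD) to see when the dead-link pass happened.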

web8.search.cnb.yahoo.com - - [01/Jul/2004:15:35:16 -0500] "GET /广州环境污染问题 HTTP/1.1" 200 2108 "-" "Mozilla/4.0"

This is obviously a spider, since it:
1] Only requests text/HTML pages
2] Does not load JPEG, Flash, or any other media along with the pages
3] Digs through the website from link to link
4] Comes from an IP located at: Beijing, Yahoo China
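
Points 1] and 2] can be checked roughly from the access log alone. The sketch below is only an illustration under my own assumptions: the list of media extensions, the min_requests threshold, and the function names is_media and likely_spiders are all hypothetical, and it does not cover points 3] and 4].

```python
import posixpath
from collections import defaultdict

# Assumed set of "media" extensions; adjust to whatever the site actually serves.
MEDIA_EXTENSIONS = {".jpg", ".jpeg", ".gif", ".png", ".swf", ".ico"}


def is_media(path):
    """True if the requested path looks like an image or Flash file."""
    path = path.split("?", 1)[0]  # ignore any query string
    return posixpath.splitext(path)[1].lower() in MEDIA_EXTENSIONS


def likely_spiders(requests, min_requests=5):
    """Return hosts that fetched at least min_requests pages and no media at all.

    requests is an iterable of (host, path) pairs taken from the access log.
    """
    pages = defaultdict(int)
    media = defaultdict(int)
    for host, path in requests:
        if is_media(path):
            media[host] += 1
        else:
            pages[host] += 1
    return sorted(
        host for host, count in pages.items()
        if count >= min_requests and media[host] == 0
    )
```

A normal browser pulls the images on a page right after the page itself, so a host that requests many pages and never touches a single image is a reasonable spider candidate. It is a heuristic, not proof: a text-only browser would look the same.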

The weird thing is that they didn't even give it a name (the user agent is just "Mozilla/4.0"). I hope to learn the name soon.

Posted by Liang at July 1, 2004 04:00 PM


Comments

Comment #1:

China's largest Chinese-language search engine platform and Chinese search engine technology provider

Posted by: 一达搜索 at January 17, 2005 10:53 PM from 220.201.114.15
