February 19, 2004
Homemade web Crawl robot::[Search Engine]

If you found a refer from http://blog.wespoke.com, which not necessary mean I visited your wesite, it maybe visited by my crawl :)
I am tired of working today, so for relax, I code a small crawl, which attache my own HTTP_Agent and REFER to crawl the whole website.
Example, My crawl will show such log in your /var/log/httpd/access_log
"luliang.dhs.org - - [19/Feb/2004:16:29:54 -0600] "GET / HTTP/1.0" 200 20945 "http://blog.wespoke.com/" "Power by LiangLu at Differential Technology""
It means that refer from "http://blog.wespoke.com/", but it is not, Haha, a nice promotion method.
And I reset the browser type to "Power by LiangLu at Differential Technology" , en, my crawl works great now.
So, what is the next?
1] Write a API connect to Baidu.com , download the all top 1000 search result of XXX, and craw all of these website.
2] Write API connect to Google.com, get the same thing.
3] Just relex
:D
Trackback
You can ping this entry by using http://www.wespoke.com/cgi-bin/mt/mt-tb.cgi/335
