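More and more people are using blogs for search-engine optimization and for redirecting traffic. This kind of SEO spam seriously distorts search-engine rankings and also hurts the healthy operation of BSPs (blog service providers). Two months ago I did a fairly thorough cleanup of the spam on Blogdriver and Blogchina, but now, two months later, it is on the rise again, so I have had to clean up once more. This time I will also publish the sources of the spam, i.e. the target websites. I will publish the target sites of this SEO spam regularly so that the search engines can purge them regularly too. The cleanup method is described below; if you are familiar with trustlink, you can skim it.
1. From a point to a surface. First find any one spam term in the system, for example [免费激情小电影] ("free erotic short films"), then search for it with a search engine: search $key site:blogchina.com. This yields a list of SEO sites.
2. Crawl the surface and segment it: crawl all of these SEO sites, then split the text into words and sentences to obtain an expanded list of spam terms.
3. Repeat step 1. Once the pages of all the SEO sites have been collected, extract their URLs to obtain a list of the source URLs of the spam sites.
4. Clean up: any post containing two or more spam terms is removed; any post containing a target-site URL is removed.
Some excerpts are published below. I have compiled the fuller list of spam sites, which can be downloaded here. The full list of spam terms can be downloaded here:
Spam link sites (count, URL):
92 http://www.kan126.com/
87 http://www.555b.com/143.htm
84 http://www.555b.com/666.htm
84 http://freemovie.2288.org/00001\index.htm
Spam terms (count, term):
48 成人小电影 (adult short films)
39 pp成人小电影 ("pp" adult short films)
28 免费成人电影 (free adult movies)
27 激情成人小电影 (erotic adult short films)
27 免费电影频道 (free movie channel)
27 免费激情小电影 (free erotic short films)
26 在线小电影 (online short films)
26 免费性电影 (free sex movies)
Also, anyone who leaves malicious (SEO) comments on this site will be added to this spam-source list as well....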
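The removal rule in step 4 can be sketched in Python as follows. This is a minimal illustration, not the actual cleanup tool: the function names and the URL regex are hypothetical, "two or more spam terms" is read as two or more distinct terms, and a target site is matched as a URL prefix.

```python
import re

# Hypothetical pattern for absolute links in crawled HTML or post text.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)

def extract_urls(text):
    """Step 3: collect every absolute http(s) URL that appears in the text."""
    return URL_RE.findall(text)

def should_remove(post, spam_terms, target_sites):
    """Step 4: remove a post with two or more distinct spam terms,
    or with any link that starts with a known target-site prefix."""
    hits = sum(1 for term in spam_terms if term in post)
    if hits >= 2:
        return True
    return any(
        url.startswith(site)
        for url in extract_urls(post)
        for site in target_sites
    )

# Sample data taken from the lists above.
spam_terms = ["免费电影频道", "在线小电影", "免费性电影"]
target_sites = ["http://www.kan126.com/", "http://www.555b.com/"]
```

One design caveat: overlapping terms such as 成人小电影 and pp成人小电影 would both be counted for a single phrase, so a real term list may need de-duplication before this rule is applied.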
Image-blog test, 2 GB of free space. The adjustments to this section will be finished soon; suggestions and criticism are welcome....
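Chinese Blog Top 100 trend analysis. This is a very early version: it records which links in blogs were clicked the most in the past 24 hours (search engines excluded). Of course, the statistics are neither comprehensive nor authoritative; only blogs that have added the Booso backlink code can be tracked. It is a simple way of spotting trends, and in the future I will publish a series of similar things, such as a keyword list along the lines of the Baidu Top 50. Something I did not expect at all: my post "Find out how many people in China share exactly your name" was reposted more than 5,000 times in just two or three weeks, and its total traffic passed one million. It seems people don't like reading the serious articles, while a by-product becomes a hit....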
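One way to picture the counting behind such a list: keep a click log, drop clicks outside the last 24 hours and clicks on search engines, then take the most common links. The Python sketch below assumes a log of (timestamp, url) pairs and a hand-picked search-engine host list; the post does not say how Booso stores clicks or recognizes search engines, so both are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta
from urllib.parse import urlparse

# Hypothetical list; the post does not say how search engines were identified.
SEARCH_ENGINES = ("google.com", "baidu.com", "yahoo.com")

def is_search_engine(url):
    """True if the URL's host is one of the listed domains or a subdomain of one."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in SEARCH_ENGINES)

def top_links(clicks, now, n=100, window=timedelta(hours=24)):
    """clicks: iterable of (timestamp, url). Return the n most-clicked urls in the window."""
    counts = Counter(
        url
        for ts, url in clicks
        if now - window <= ts <= now and not is_search_engine(url)
    )
    return counts.most_common(n)
```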
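Most people probably think the Niu.la bookmark system has little technical depth. I think everyone simply approaches product design differently: you can design a product to be flawless, or you can design it to be full of life. Visitors only see the outermost layer; nobody bothers to ask how it got that way. The hidden Markov model (HMM) is a model that tries to describe random phenomena. The design of niu.la needs to surface the hottest bookmarks. Almost all earlier portal systems simply used click counts to decide what was "hot" and what was "cold". This simple computation makes little sense and is rather naive, because what is already hot is not the emerging trend; it is something that a great many people have already seen. So what can be done? Use a predictive model to forecast which items will be the most popular over the coming period, and publish those predictions; the effect is remarkable. Assume that clicks and saves are random sequences C(i, t) and S(i, t), where i is the bookmark and t is time. With a suitable model, C(i, t+x) and S(i, t+x) can be predicted, where x is a time offset into the future. In niu.la's design the hidden Markov model has worked very well as a predictor, identifying accurately almost everything that is about to become popular....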
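The post does not give niu.la's actual model, so the following is only a toy illustration of the idea: a two-state HMM (hidden state "rising" or "fading", observed hourly click level "low" or "high") with made-up probabilities, using the forward algorithm to estimate the current state and then projecting it x steps ahead, as in C(i, t+x). Every name and number here is hypothetical.

```python
STATES = ("rising", "fading")
LEVELS = ("low", "high")
START = {"rising": 0.5, "fading": 0.5}
# P(next state | current state); made-up numbers
TRANS = {"rising": {"rising": 0.8, "fading": 0.2},
         "fading": {"rising": 0.1, "fading": 0.9}}
# P(observed hourly click level | state); made-up numbers
EMIT = {"rising": {"low": 0.3, "high": 0.7},
        "fading": {"low": 0.8, "high": 0.2}}

def normalize(dist):
    total = sum(dist.values())
    return {k: v / total for k, v in dist.items()}

def step(belief):
    """One transition: P(state at t+1) from P(state at t)."""
    return {s: sum(belief[r] * TRANS[r][s] for r in STATES) for s in STATES}

def filter_states(observations):
    """Forward algorithm with per-step normalization: P(current state | observations)."""
    belief = normalize({s: START[s] * EMIT[s][observations[0]] for s in STATES})
    for obs in observations[1:]:
        prior = step(belief)
        belief = normalize({s: prior[s] * EMIT[s][obs] for s in STATES})
    return belief

def predict(observations, x=1):
    """Distribution of the click level x steps after the last observation."""
    belief = filter_states(observations)
    for _ in range(x):
        belief = step(belief)
    return {lvl: sum(belief[s] * EMIT[s][lvl] for s in STATES) for lvl in LEVELS}

def rank_bookmarks(histories, x=1):
    """Order bookmark ids by predicted chance of a 'high' click level x steps ahead."""
    return sorted(histories, key=lambda b: predict(histories[b], x)["high"], reverse=True)
```

With these numbers, two bookmarks that each had two busy hours rank differently: the one whose busy hours are recent comes first, which is the "trend rather than past popularity" point made above. In practice the probabilities would be fitted from data (for example with Baum-Welch) rather than set by hand.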
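Comment spam on MT (Movable Type) has one telltale feature: the comments contain lots of links, which is very annoying. I tried counting, once a comment is submitted, how many links appear in it and using that count to decide whether it is spam, and this turns out to be a good way to block most spam comments. I tested it for about a week and the effect was clearly noticeable. The following code goes immediately after the use strict; line in mt-comments.cgi:
    use CGI qw(:standard);
    if ($ENV{'REQUEST_METHOD'} eq "POST") {
        my $tck = param('text');
        $tck = "" unless defined $tck;
        # Reject a non-empty comment that has no byte in 0x80-0xFF,
        # i.e. no Chinese (multibyte) characters at all
        die if ($tck ne "" && $tck !~ /[\x80-\xff]/);
        # Count the occurrences of "http", i.e. roughly the number of URLs
        my $ncom = 0;
        while ($tck =~ /http/g) { $ncom++; }
        die if ($ncom > 1);
    }
Here $ncom counts the links in the comment; if it is greater than 1, the comment is treated as spam and the script aborts. The threshold can be adjusted as needed. For more, see my earlier posts on how to prevent comment spam in MT and how to block spam referrers with .htaccess, which cover the common existing approaches to comment spam....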
Blogdriver again :) New problems keep coming up one after another; yesterday's was a deadlock in our database. At the time only around 100 users were querying the database, which is not a large number at all, so I suspected a configuration problem in SQL Server, and I turned out to be right. Usually, when a database runs into deadlocks, we need to look into the details:
1] Do you set locks explicitly in your program?
2] Check whether you use too many transactions and open too many connections.
3] Check the lock granularity: row, page, table or database?
4] Check the SQL Server error log (with trace flag 1204 enabled, deadlock details are written there); it will show exactly which SQL statements caused the deadlock.
5] What kinds of locks are involved: S (shared), X (exclusive), IS (intent shared)...? And do they make sense?
6] Is the lock timeout (SET LOCK_TIMEOUT) set too long? The default is -1, which means wait indefinitely.
Here is a paper about deadlocks; it is very useful and has lots of detail:...
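Conceptually, SQL Server's lock monitor finds deadlocks by looking for a cycle in a wait-for graph (session A waits for a lock held by B, while B waits for a lock held by A) and then picks one session as the victim. A minimal Python sketch of that cycle search follows; the dict-of-lists input and the session ids are hypothetical, not something SQL Server hands you directly.

```python
def find_deadlock(waits_for):
    """Return one cycle in a wait-for graph as a list of session ids, or None.

    waits_for maps a session to the sessions holding locks it is waiting on.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for nxt in waits_for.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GRAY:
                # nxt is already on the current path: we have closed a cycle
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in list(waits_for):
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None
```

Session 51 waiting on 52 while 52 waits on 51 yields the cycle [51, 52, 51]; a plain chain such as 51 -> 52 -> 53 is just blocking, not a deadlock, and yields None.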
There are so many fake referrals from spam websites trying to catch my eye. There are two easy ways to ban them:
1] In httpd.conf (Order/Allow/Deny only work inside a <Directory>, <Location> or <Files> section):
####################################
SetEnvIfNoCase Referer "(casino|gambling|poker|porn|sex|hqsearch|webcamss|rape)" BadReferrer
# Adjust the path to your own DocumentRoot
<Directory "/usr/local/apache2/htdocs">
    Order Allow,Deny
    Allow from all
    Deny from env=BadReferrer
</Directory>
####################################
2] Use .htaccess to ban referrers and also ban IPs:
####################################
SetEnvIfNoCase Referer "(casino|gambling|poker|porn|sex|hqsearch|webcamss|rape)" BadReferrer
Order Deny,Allow
Deny from env=BadReferrer
Deny from 63.81.44.2
Deny from 69.50.191.130
####################################...
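Before deploying the rule, the referrer pattern can be tried offline. The Python sketch below reuses the same alternation, with re.IGNORECASE standing in for the NoCase part of SetEnvIfNoCase; the function name is made up for illustration.

```python
import re

# Same alternation as the SetEnvIfNoCase line; matching is an unanchored search.
BAD_REFERRER = re.compile(
    r"(casino|gambling|poker|porn|sex|hqsearch|webcamss|rape)", re.IGNORECASE
)

def is_bad_referrer(referer):
    """True if the Referer header would set BadReferrer; a missing header is fine."""
    return BAD_REFERRER.search(referer or "") is not None
```

Because this is a plain substring search, it also catches innocent hosts: http://www.essex.ac.uk/ matches "sex" and a grapeseed site matches "rape", so word boundaries or a narrower list may be worth considering.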
For the design of Booso.com, I fixed several problems with the help of Nio and Chedong.
URL rewriting: map links of the form http://booso.com/cgi-bin/booso.cgi?gmail to http://booso.com/gmail, and keep all the other paths as usual:
    RewriteEngine On
    # [C] chains each rule to the next: if a rule does not match, the rest of the
    # chain is skipped, so the last rule only runs when none of the exceptions apply
    RewriteRule !^/$ - [C]
    RewriteRule !^/index.html$ - [C]
    RewriteRule !^/archives/howto.html$ - [C]
    RewriteRule !^/img/.* - [C]
    RewriteRule !^/cgi\-bin.* - [C]
    RewriteRule ^/?(.*) /usr/local/apache2/cgi-bin/booso.cgi?link=$1
    # Maximum verbosity, for debugging only
    RewriteLogLevel 9
    Options ExecCGI
    AddHandler cgi-script .cgi
And this works. After that I found that queries with Chinese names did not work in Internet Explorer, although they worked fine in Mozilla and Konqueror. Chedong suggested fixing the URL encoding.
URL encoding:
In CGI (Perl):
    use CGI ();
    my $NewQuery = CGI::escape($URL);
This works. For other systems:
PHP example:
    <?php
    $Text = "foo<b>bar";
    $URL = "foo<b>bar.html";
    echo htmlspecialchars($Text), "<BR>";
    echo "<A HREF=\"", rawurlencode($URL), "\">link</A>";
    ?>
Note that PHP also has a strip_tags() function that will remove all HTML tags from a string. Using this function in a manner such as:
    echo strip_tags($Text);
will strip all HTML from the input. However, if you use it in the form:
    echo strip_tags($Text, "<B>");
which only allows the "<B>" tag through, you are still often vulnerable to users inserting script code. By design, this function does not strip attributes from the tags. This means it is often possible to include things such as JavaScript event attributes. An example of a tag that would be allowed by the above strip_tags() call is:
    <B onmouseover="document.location='http://www.cert.org/'">
Some clients accept such attributes on tags that are otherwise benign.
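The chained rewrite rules can be hard to read because every pattern except the last is negated. The small Python model below only illustrates the [C] semantics as described in the mod_rewrite documentation (a chained rule that fails to match skips the rest of the chain); it is not how mod_rewrite is implemented, and rewrite is a hypothetical helper.

```python
import re

# Exception patterns copied from the negated, chained RewriteRule lines.
EXCEPTIONS = [
    r"^/$",
    r"^/index.html$",
    r"^/archives/howto.html$",
    r"^/img/.*",
    r"^/cgi\-bin.*",
]
TARGET = "/usr/local/apache2/cgi-bin/booso.cgi"

def rewrite(path):
    """Return the rewritten target for a URL path, or None if it is left untouched."""
    for pattern in EXCEPTIONS:
        if re.search(pattern, path):
            # The negated rule "!pattern" fails here, so the rest of the chain is skipped.
            return None
    # Every negated rule matched, so the final rule runs and captures the path.
    link = re.match(r"^/?(.*)", path).group(1)
    return f"{TARGET}?link={link}"
```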
Apache module example (C):
    char *Text = "foo<b>bar";
    char *URL = "foo<b>bar.html";
    ap_rvputs(r, ap_escape_html(r->pool, Text), "<BR>", NULL);
    ap_rvputs(r, "<A HREF=\"", ap_escape_uri(r->pool, URL), "\">link</A>", NULL);
mod_perl example:
    $Text = "foo<b>bar";
    $URL = "foo<b>bar.html";
    $r->print(Apache::Util::escape_html($Text), "<BR>");
    $r->print("<A HREF=\"", Apache::Util::escape_uri($URL), "\">link</A>");
This uses the same functions as in the Apache module example, called from Perl instead of directly from C.
To URL-encode text from the command line:
    perl -p -e 's/([^\w\-\.\@])/$1 eq "\n" ? "\n" : sprintf("%%%2.2x", ord($1))/eg'
To decode it:
    perl -p -e 's/%(..)/pack("C", hex($1))/eg'
Other info: http://www.hk8.org/old_web/linux/cgi/ch02_01.htm...
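For comparison, here is the same escaping with Python's standard library: urllib.parse.quote for the URL part and html.escape for the HTML part. This is a sketch that assumes the site uses UTF-8; for a GB2312 page, pass encoding="gb2312". It roughly matches the Perl one-liner, except that quote() also leaves "~" unescaped, writes upper-case hex digits and encodes newlines too.

```python
import html
from urllib.parse import quote, unquote

def encode(text, encoding="utf-8"):
    """Percent-encode everything except letters, digits, "_.-~" and "@"."""
    return quote(text, safe="@", encoding=encoding)

def decode(text, encoding="utf-8"):
    """Reverse of encode()."""
    return unquote(text, encoding=encoding)

def link(text, url):
    """Same output shape as the PHP/Apache examples: escaped text, then a safe HREF."""
    return f'{html.escape(text)}<BR><A HREF="{quote(url)}">link</A>'
```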
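Published in the issue before last of Modern Information Technology (《现代信息技术》) magazine. Lu Liang
Keywords: Blog, blogger, User API, Mblog, Content Management, e-learning
After the blog concept entered China in 2002, it grew rapidly within a single year. With the rise of blog hosting sites, blogs have moved from an abstract concept to a media culture accepted by the public and to an entirely new kind of online culture and application. The number of Chinese bloggers is currently estimated, conservatively, at around 300,000, and with more than a thousand new bloggers joining every day, blogging has entered a period of rapid growth. But quantity is not quality: commercially, the Chinese blog market with its 300,000 bloggers is still at a very early stage, and how to find a business path for blogs in the short term remains an open question.
Current landscape
Chinese blogging has developed rapidly from the end of 2003 until now, and some new phenomena have come with it. Judging from the current state of Chinese blogs, the following characteristics stand out:
• Hosting services springing up everywhere
• A mixed bag of blog services
• Professionalism alongside disorder
• Some worthwhile experiments
Hosting services springing up everywhere: at the end of 2003 there were only three Chinese blog hosting services, each with more than a year of history; they were the pioneers of Chinese blogging. Although these three still lead the Chinese blog market, nearly ten more blog hosting sites have appeared within just a few months. These sites vary in size and in the quality of their service, but in any case it is clear that more and more people are paying attention to the Chinese blog market and see it as a promising venture, even though, when it comes to business models, essentially no blog hosting site has yet become profitable or has any prospect of profitability in the short term.
A mixed bag of blog services: the blog craze has led many portal sites to launch their own blog services, and these are usually no more than a publishing system plus a calendar and a comment feature. Calling such a system a blog is hardly satisfactory. These portals can use their existing user base to easily spread their so-called "blog" concept among their users, but many of the concepts intrinsic to blogging, such as TrackBack, RSS and Creative Commons, have not been adopted. Predictably, such systems will point their users in the wrong direction; we have already heard people sigh, "Isn't a blog just an online diary?"
Professionalism alongside disorder: the first Chinese bloggers to join the trend were mostly media workers, university students and IT professionals. Their content centered on records of personal life and study, personal commentary, and following foreign IT news. These are very common forms of blogging, yet such blogs rarely attract public attention. What the media expose are invariably the rebels: bloggers such as Muzimei and Zhuying Qingtong, whose writing has a sexual theme, are far more likely to win the media's favor and to be seen as having a "selling point". In such a climate, some bloggers inevitably copy the path of these two, hoping to create a sensation, while people who do not understand blogs form a biased first impression of them. This poses hidden risks for the long-term healthy development of Chinese blogging.
Some worthwhile experiments: we have also seen some worthwhile experiments. In terms of applications, some Chinese users have begun to use blogs as a convenient publishing tool to promote themselves, setting up personal résumés, personal news, photo albums, homework pages and so on. These experiments owe a great deal to the separation of the front end and back end of blog services: only when users can conveniently use the API provided by a blog system to build the interface and content they need can blog applications truly spread.
Commercializing blog applications
Besides blogs taking on commercial elements, blogs themselves, because of their characteristics, will be adopted by more commercial and non-commercial organizations as a solution for small business portals. As Chinese blogs become widespread, non-personal blog applications will emerge in the following areas:
• Information publishing for companies and groups
This is gradually becoming the preferred information-publishing solution for small studios and SOHO (small office/home office) businesses. For SOHO users, choosing how to build a website has always been a thorny problem: unlike large companies, they cannot set up a separate department, assign dedicated staff or outsource the work to a specialist firm, yet they still need a fairly professional publishing system rather than something like a personal homepage. The blog user API provides exactly such an interface for these users, through which they can build and configure a website that looks close to professionally designed.
• E-learning
E-learning is no longer a new concept. With the digitization of documents and the spread of high-speed networks, the concept has been revived, and this time it is clearly far ahead of where it was before the dot-com bubble burst in 2000, not only because the Internet has grown much further over these four years but also because hardware and software are much better prepared. In China, several high schools have already used blogs to support teaching, setting up separate blogs for students and for teachers. Teachers post the key points of their lessons and the homework on their blogs, and students write their homework and study notes on theirs, achieving paperless and interactive teaching.
Directions in which Chinese blogs can break through
For existing Chinese blog hosting sites, 2004 is a year of both opportunity and challenge. The explosive growth of Chinese blogging in 2003 and the entry of the portal sites will inevitably bring some necessary consolidation and commercial operation. Judging by how blogging has been commercialized abroad, Chinese blog hosting services and commercial sites should start from the following three areas:
• Providing richer features
• Mobile blogs and SMS blogs
• Strengthening the regulated management of the blog market
Providing richer features: although the existing Chinese blog service providers each have their own distinctive features, they still lack convenient user APIs. Six Apart, the creator of Movable Type, the most popular blog software, made the user interface a primary feature from the very beginning, and the effect became obvious over the following years. Today Movable Type is the system used by the largest number of commercial blog users. Six Apart's next focus is content management, which is also a reminder to Chinese blog service providers: content management may become a key to the success of commercial blogs.
Mobile blogs and SMS blogs: as camera phones become popular, mobile blogging (mblog) will undoubtedly become a new way of blogging. A mobile blog means taking a photo with a phone and publishing it directly to your blog with a short caption. Similarly, an SMS blog publishes text messages from a phone directly to a blog. When bloggers are traveling, run into breaking events without Internet access, or need to publish real-time news, publishing from a phone is nothing short of a revolution, and it is especially convenient and suitable for journalists and travelers.
Strengthening the regulated management of the blog market: Chinese blogs are still in a state of free publishing by users, and past experience shows that individual bloggers often affect the stable operation of an entire site and the hosting service's ability to withstand risk. Some bloggers' content has a negative influence on society, and some even harms the interests of the state and the collective, which no law-abiding citizen wants to see.
Of course, blog applications will also develop in other directions, because once a business model takes hold, all development is driven by market demand. As Chinese blogs become more widespread, this article cannot cover in detail every problem and opportunity facing Chinese blogging, but I hope it offers a glimpse of the whole picture and serves as a reminder for the development of Chinese blogs....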
Speakers: Liang Lu, Kevin Wen, Zhihong Mao
Abstract: The blogging concept was first introduced in China in 2002. It grew rapidly and became popular in merely one year. In 2003, social networking and related applications started to surface in China. More than 300,000 people are actively involved in blogging and social networking sites. These new methods of self-expression and communication have started a new Internet trend in China. People are publishing their thoughts and experiences on blog sites. Social networking tools help them to establish new connections and enhance existing friendships. Currently, blogging service providers are seeking new opportunities to commercialize their businesses. At the same time, they are adjusting their strategy to comply with evolving government regulations on Internet usage. Social networking service providers are striving to find a profitable business model. What if we combine these two seemingly separate yet fundamentally related concepts? Blogging and social networking complement each other, and their value can potentially be multiplied through integration. Blogging improves the interaction of social networking users, and blogger communities become tighter and stronger via social networking. Naturally, this new breed of collaboration and communication application evolves into a personal portal.
References: http://www.blogdriver.com http://www.uufriends.com...
Before answering this question, we need to ask a few questions:
1] Will the government ban blogs? No. As long as the Internet exists, the Chinese government will have to let blogs exist. The main reason is that the risk posed by blogs does not exceed that of BBS, another kind of Internet application for groups of users.
2] What is the bottom line? Sex, pornography? No. As far as I can see, many blogs post porn and are still OK, such as those of the famous bloggers Muzimei and Zhuying Qingtong. Politically sensitive content is the bottom line, and certain parts of the history of the CCP are also sensitive topics. These sensitive things can kill a blog, and even a whole blog hosting service.
So, where to go? Go somewhere that is not politically related. Too simple? Too naive? Something trivial? No, I mean it should go like this. I mean what I say, no more, no less....
On 3-13-2004, BLOGBUS, the third-largest blog service, was shut down by the Chinese government. One day later, BLOGCN, the largest blog service, was also shut down by the government. Then, another day later, BLOGDRIVER, the second-largest blog service, posted a notice on its main page: "Maintenance for unknown situation". Although BLOGDRIVER resumed service a day after that, the other two blog services are still down; almost 180,000 bloggers cannot access their blogs and don't know what happened. Last weekend, I had a phone meeting with the managers of these three blog services and with other bloggers.
Hengge, manager of Blogbus: We don't know how long it will take, but we have to clean up all these sensitive posts.
Huzhiguang, manager of Blogcn: We got a fax from the government saying that we must shut down our servers.
Rever, manager of Blogdriver: Still OK for now. We don't have these sensitive posts, and we have a filtering program.
Thanks to all the friends who helped make us heard....
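Just wrote a shell script that automatically finds the top 600 MP3 song links and downloads them one by one. Here it is:
    #!/bin/sh
    # By Liang Lu at 3-9-2004
    # Remove lists left over from a previous run (-f: no error if they are missing)
    rm -f mp3.list html.list
    # Fetch the top-song index page; -O overwrites any copy from an earlier run
    wget http://list.mp3.baidu.com/topso/mp3topsong.html -O mp3topsong.html
    # Turn every double quote into a line break, keep pieces ending in "htm"
    tr \" '\n' < mp3topsong.html | grep 'htm$' > html.list
    CC=1
    for VAL in `cat html.list`
    do
        wget http://list.mp3.baidu.com/topso/$VAL -O $CC.html
        # Keep the first absolute link ending in "mp3" on each song page
        tr \" '\n' < $CC.html | grep 'mp3$' | grep http | head -1 >> mp3.list
        CC=`expr $CC + 1`
    done
    CC=1
    for VAL in `cat mp3.list`
    do
        echo $CC
        wget "$VAL" -O $CC.mp3
        CC=`expr $CC + 1`
    done...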
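The heart of the script is the tr/grep pipeline: turn every double quote into a line break, then keep the lines that end in htm (or that end in mp3 and contain http). Here is a Python sketch of just that extraction step, handy for testing against a saved page; the function names are hypothetical and nothing is downloaded.

```python
def quote_split_lines(html_text):
    """Like tr '"' '\\n': turn double quotes into line breaks, then split into lines."""
    return html_text.replace('"', "\n").splitlines()

def song_pages(index_html):
    """Like grep 'htm$': pieces ending in htm (the per-song pages)."""
    return [p for p in quote_split_lines(index_html) if p.endswith("htm")]

def first_mp3(song_html):
    """Like grep 'mp3$' | grep http | head -1: the first absolute mp3 link, or None."""
    for piece in quote_split_lines(song_html):
        if piece.endswith("mp3") and "http" in piece:
            return piece
    return None
```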
It took me one minute to find a second way to hack the Alexa toolbar, so I kicked away my old hack program; the new method is soooooo coooool. Hahaaaaa. Why can such a stupid toolbar still be treated as a net traffic meter? Remember: hackers always move fast. :-D...
Played with Alexa for a while and wrote a program that hacks Alexa. It is easy to do once you understand how Alexa works: Browser (with Alexa Toolbar) --> browse a website (toolbar --> data.alexa.com). So, let me explain how to do it:
1] Just sniff your browser's traffic (e.g. with sniffit) and you can get the Alexa API.
2] Write a program that simulates the Alexa toolbar and reports the website you want to boost.
3] Run it thousands of times, and that website's data in Alexa soars.
4] Anything else?...
Really funny post from "http://www.rightwingnews.com/archives/week_2004_02_01.PHP#001750" I have seen a lot of BS artists come and go, but I'm not sure I've ever seen one who compares to guy named Jim F. Kukral. Here's a guy with a weblog ranked 439,979th on Alexa, writing an e-book called "Blogs To Riches" and charging $47 a pop for it, when NOBODY is getting rich blogging with the possible exception -- if your definition of rich is loose enough -- of Andrew Sullivan. So let a blogger who averages more than 6000 daily uniques per weekday, has an Alexa rank of 35,487, and 20 advertisers currently, give you a little primer about making money via blogging. I may not have any fancy marketing degrees, but then again, I'm not going to charge you $47 to read this post either. First off, if your primary motivation is to make money, don't bother with blogging. That's not to say that you can't make money blogging, but most people don't and it usually takes a long, long, time to make any serious lucre even if you do. Just to give you an idea of what I'm talking about, among political bloggers, I'm going to **guess** that there are maybe 4 or 5 political bloggers right now who could scratch -- and I do mean scratch -- out a living based solely on advertising revenue (Glenn Reynolds, Andrew Sullivan, Josh Marshall, Daily Kos & Atrios). Out of that group, to the best of my knowledge, only Sullivan is really raking in what I think of as big time money with his huge fundraisers. Over time, the blogs getting that kind of traffic and therefore getting those kind of advertising opportunities are going to continue to grow, but for now, the numbers are small. But, let's cut to the chase and talk about ways to make money blogging. Amazon: I've tried selling Amazon products before, but really wasn't terribly impressed with them. For the amount of space you end up giving them, you just don't get enough of a return to make it worth it. 
In my opinion, you'd be better off putting ads in the same space. However, other people like Oliver Willis who really push Amazon in individual posts may get better results. If you're going to use Amazon, that's the way I'd recommend that you go. Banner Ads: I actually have a waiting list for my...
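To put it bluntly, before Atom.xml the only thing in MT (Movable Type) that attracted me was the XML-RPC protocol. XML-RPC makes publishing and managing posts easy; I once wrote a program, email2blog (Email --> Blog), that used XML-RPC. Unfortunately XML-RPC has holes: the password is transmitted in plain text, and serious bugs in mt-xmlrpc.cgi make it inconvenient to use. This time it is different: Atom.xml will make up for these problems, and conveniently it can go XML --> CSS --> web page, which makes it a perfect match for "XML + XSLT" or "XML + CSS". Take a look at my Atom.xml; don't be surprised that you can view it directly in the browser, that is thanks to CSS! I have also noticed that some bloggers in Taiwan are paying attention to what atom.xml can do; it seems its momentum should not be underestimated....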
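To make the plain-text-password point concrete, here is a sketch that builds (but does not send) a metaWeblog.newPost request, one of the XML-RPC methods that mt-xmlrpc.cgi exposes, using Python's standard xmlrpc.client. The blog id, user name, password and post fields are placeholders; over plain HTTP the request body travels exactly as produced here, password included.

```python
import xmlrpc.client

def build_new_post(blog_id, user, password, title, body, publish=True):
    """Return the XML body of a metaWeblog.newPost call without sending it."""
    post = {"title": title, "description": body}
    return xmlrpc.client.dumps(
        (blog_id, user, password, post, publish),
        methodname="metaWeblog.newPost",
    )

# Placeholder credentials, for illustration only.
request = build_new_post("1", "liang", "secret", "Hello", "Posted by email2blog")
print(request)  # the password appears verbatim inside a <string> element
```

Actually sending it would look something like xmlrpc.client.ServerProxy("http://example.com/mt/mt-xmlrpc.cgi").metaWeblog.newPost(...), with the URL adjusted to your own install; over HTTPS the password would at least not cross the network in the clear.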
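The latest version of MT is 2.65, which is running behind this blog. After testing it, I found several improvements in 2.65:
1] It supports Atom.xml and makes it one of the default templates, so you can build a completely new interface of your own; XSLT + XML should be perfect!
2] It fixes several security holes in the XML-RPC service.
3] Much more powerful TrackBack auto-discovery. This is really impressive: while testing I added a few links at random, and shortly afterwards it told me that XX had been pinged via TrackBack. Older versions often failed at this.
More news:
http://danja.typepad.com/fecho/2003/12/sixapart_news.html
http://www.neilturner.me.uk/2003/Dec/22/movable_type_265_and_30.html...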
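TrackBack auto-discovery works by reading the RDF block that a TrackBack-enabled page embeds (usually inside an HTML comment): an rdf:Description whose dc:identifier is the post's URL and whose trackback:ping attribute holds the ping URL. The Python sketch below is a simplified, regex-based version of that lookup, plus the form-encoded ping body (title, excerpt, url, blog_name) that is then POSTed to the ping URL. It does no network I/O, the helper names and example URLs are hypothetical, and it is not MT's actual code.

```python
import re
from urllib.parse import urlencode

# Simplified, regex-based parsing; a robust client should use a real XML parser.
DESCRIPTION_RE = re.compile(r"<rdf:Description\b(.*?)/>", re.DOTALL)
ATTRIBUTE_RE = re.compile(r'([\w:]+)="([^"]*)"')

def find_ping_url(page_html, linked_url):
    """Return the trackback:ping URL of the RDF block whose dc:identifier is linked_url."""
    for block in DESCRIPTION_RE.finditer(page_html):
        attrs = dict(ATTRIBUTE_RE.findall(block.group(1)))
        if attrs.get("dc:identifier") == linked_url and "trackback:ping" in attrs:
            return attrs["trackback:ping"]
    return None

def ping_body(title, excerpt, url, blog_name):
    """application/x-www-form-urlencoded body for the TrackBack ping POST."""
    return urlencode({"title": title, "excerpt": excerpt, "url": url, "blog_name": blog_name})
```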