July 05, 2004

UTF-8 / GB conversion and encoding problems :: [Misc]


Liang

1] Change &#XXXXX; (decimal character references) to UTF-8:
perl -p -e 's/&#(\d+);/pack("U", $1)/eg'
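For comparison, here is a minimal Python sketch of the same idea. It is not part of the original one-liner; the function name decode_numeric_refs is made up for illustration, and it only handles decimal references (no &#x...; hex form, no named entities).

```python
import re

# Matches decimal references such as &#20013; (digits only).
NUMERIC_REF = re.compile(r"&#(\d+);")


def decode_numeric_refs(text):
    """Replace each decimal character reference with the character it names."""
    def replace(match):
        code_point = int(match.group(1))
        # Values beyond the Unicode range are left untouched instead of raising.
        if code_point > 0x10FFFF:
            return match.group(0)
        return chr(code_point)

    return NUMERIC_REF.sub(replace, text)
```

Calling decode_numeric_refs("&#20013;&#25991;") gives a two-character string, which becomes UTF-8 bytes once it is encoded with .encode("utf-8") or written to a UTF-8 file.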

2] Change literal \xb?\xb? escapes to raw GB2312 bytes:
perl -p -e 's/\\x(..)/pack("C", hex($1))/eg'
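A rough Python equivalent, as a sketch only. It assumes the input is a byte string containing literal backslash-x escapes, and the name unescape_hex is hypothetical. The result is raw bytes, which can then be decoded as GB2312 if that is what the escapes represent.

```python
import re

# A literal backslash, the letter x, and exactly two hex digits.
HEX_ESCAPE = re.compile(rb"\\x([0-9A-Fa-f]{2})")


def unescape_hex(data):
    """Turn each literal \\xNN escape in a byte string into the single byte NN."""
    return HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), data)
```

For example, unescape_hex(rb"\xd6\xd0\xce\xc4") yields four raw bytes, and .decode("gb2312") on that result gives the corresponding Chinese text.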

3] Change percent-encoded text such as %C2%D2%C2%D7 to raw GB2312 bytes:
perl -p -e 's/%(..)/pack("C", hex($1))/eg'
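In Python the standard library already covers percent-decoding. This is a small sketch rather than a drop-in replacement for the one-liner. The wrapper names are made up for illustration, while unquote_to_bytes and unquote are the real urllib.parse functions.

```python
from urllib.parse import unquote, unquote_to_bytes


def percent_to_bytes(text):
    """Decode %XX sequences to raw bytes without assuming any charset."""
    return unquote_to_bytes(text)


def percent_to_text(text, encoding="gb2312"):
    """Decode %XX sequences and interpret the resulting bytes in `encoding`."""
    # errors="strict" raises on invalid byte sequences instead of hiding them.
    return unquote(text, encoding=encoding, errors="strict")
```

percent_to_bytes keeps the result as bytes, which matches what the Perl command writes out. percent_to_text goes one step further and decodes those bytes, with GB2312 assumed as the default.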

Basically, jv-convert is the handiest tool for converting between encodings on UNIX/Linux:

jv-convert --from UTF-8 --to gb18030

Its options:
jv-convert --help
Usage: jv-convert [OPTIONS] [INPUTFILE [OUTPUTFILE]]

Convert from one encoding to another.

  --encoding FROM
  --from FROM        use FROM as source encoding name
  --to TO            use TO as target encoding name
  -i FILE            read from FILE
  -o FILE            print output to FILE
  --reverse          swap FROM and TO encodings
  --help             print this help, then exit
  --version          print version number, then exit
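If jv-convert is not installed, the same kind of conversion can be sketched in Python. This is an illustration under assumptions, not a description of how jv-convert works internally: the function names are hypothetical, and the defaults mirror the UTF-8 to gb18030 example above.

```python
def transcode(data, source="utf-8", target="gb18030"):
    """Decode bytes from `source` and re-encode them in `target`."""
    return data.decode(source).encode(target)


def transcode_file(src_path, dst_path, source="utf-8", target="gb18030"):
    """Convert a text file line by line, leaving line endings as they are."""
    # newline="" turns off newline translation on both the read and write side.
    with open(src_path, "r", encoding=source, newline="") as src, \
         open(dst_path, "w", encoding=target, newline="") as dst:
        for line in src:
            dst.write(line)
```

Both functions raise an error on input that is not valid in the source encoding, or on characters the target encoding cannot represent, rather than silently replacing them.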

Posted by Liang on July 5, 2004 at 05:19 PM

