Monday, April 09, 2007

Detecting character encoding

>>> import chardet
>>> w = 'テキスト'
>>> char_dic = chardet.detect(w)
>>> char_dic
{'encoding': 'utf-8', 'confidence': 0.99}


http://chardet.feedparser.org/
>>> char_dic.get('confidence')
0.99
>>> char_dic.get('encoding')
'utf-8'
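
The two lookups above can be combined into a small helper that decodes a byte string using whatever encoding chardet guesses. This is only a sketch, assuming Python 3 and an installed chardet package. The names `decode_detected`, `min_confidence` and `detect` are made up for illustration and are not part of chardet. Only `chardet.detect` and the 'encoding' / 'confidence' keys come from the session above.

```python
def decode_detected(raw, min_confidence=0.5, detect=None):
    """Decode bytes with the encoding guessed by a detector (hypothetical helper).

    `detect` is any callable that returns a dict with 'encoding' and
    'confidence' keys; when omitted, chardet.detect is used.
    """
    if detect is None:
        import chardet  # third-party package, assumed to be installed
        detect = chardet.detect

    result = detect(raw)  # e.g. {'encoding': 'utf-8', 'confidence': 0.99}
    encoding = result.get('encoding')
    confidence = result.get('confidence') or 0.0

    # Refuse to guess when the detector found nothing or is not confident enough.
    if encoding is None or confidence < min_confidence:
        raise ValueError('could not detect encoding reliably: %r' % (result,))

    return raw.decode(encoding)
```

On Python 3, chardet.detect expects bytes, so a call would look like decode_detected('テキスト'.encode('utf-8')). The 0.5 threshold is an arbitrary assumption. Short inputs tend to give lower confidence, so it may need tuning.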
