StackOverflow2013

Reading # char in Python
<p>Can someone help me read the "#" character in Python? I can't seem to get it to work with the file. Also, since this is output from the Stanford POS Tagger, is there any script available to convert the Stanford POS Tagger <a href="http://nlp.stanford.edu/software/tagger.shtml" rel="nofollow">http://nlp.stanford.edu/software/tagger.shtml</a> output file to CWB format (<a href="http://cogsci.uni-osnabrueck.de/~korpora/ws/CWBdoc/CWB_Encoding_Tutorial/node3.html" rel="nofollow">http://cogsci.uni-osnabrueck.de/~korpora/ws/CWBdoc/CWB_Encoding_Tutorial/node3.html</a>)?</p>
<p>This is the UTF-8 text file I'm trying to read:</p>
<pre><code>如果#CS 您#PN 在#P 新加坡#NR 只#AD 能#VV 前往#VV 一#CD 间#M 俱乐部#NN ，#PU 祖卡#NN 酒吧#NN 必然#AD 是#VC 您#PN 的#DEG 不二#JJ 选择#NN 。#PU 作为#P 或许#AD 是#VC 新加坡#NR 唯一#JJ 一#CD 家#M 国际#NN 知名#VA 的#DEC 夜店#NN ，#PU 祖卡#NN 既#CC 是#VC 一#CD 个#M 公共#JJ 机构#NN ，#PU
</code></pre>
<p>With this code, the # character in the UTF-8 text file is never matched:</p>
<pre><code>#!/usr/bin/python
# -*- coding: utf-8 -*-
''' stanford POS tagger to CWB format '''
import codecs
import nltk
import os, sys, re, glob

reload(sys)
sys.setdefaultencoding('utf-8')

cwd = './path/to/file.txt' #os.getcwd()
for infile in glob.glob(os.path.join(cwd, 'zouk.txt')):
    print infile
    (PATH, FILENAME) = os.path.split(infile)
    reader = codecs.open(infile, 'r', 'utf-8')
    for line in reader:
        for word in line:
            if word == '\#':
                print 'hex is here'
</code></pre>
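Two things in the code above likely explain why nothing is found. First, `'\#'` is not a recognized escape sequence, so Python keeps the backslash and the literal is the two-character string backslash plus `#`; iterating over a line yields one character at a time, so `word == '\#'` can never be true. Second, `cwd` is set to `'./path/to/file.txt'`, so `os.path.join(cwd, 'zouk.txt')` yields `'./path/to/file.txt/zouk.txt'`, and unless such a path exists `glob` returns an empty list and the loop body never runs. A minimal Python 3 sketch of the corrected loop follows; `DATA_DIR` is a hypothetical directory name, and the `reload(sys)` / `setdefaultencoding` hack is dropped because Python 3's `open()` decodes UTF-8 directly.

```python
import glob
import os

# Hypothetical directory holding zouk.txt. In the question, cwd is
# './path/to/file.txt', so os.path.join(cwd, 'zouk.txt') yields
# './path/to/file.txt/zouk.txt' and glob() finds nothing.
DATA_DIR = "./path/to"


def count_hashes(lines):
    """Count '#' characters in an iterable of decoded text lines."""
    count = 0
    for line in lines:
        # Iterating over a str yields one character at a time, so compare
        # against the one-character string '#'. The literal '\#' is two
        # characters (a backslash and '#') and can never match.
        for ch in line:
            if ch == "#":
                count += 1
    return count


def count_hashes_in_file(path):
    """Open a UTF-8 file (Python 3 decodes it directly) and count '#'."""
    with open(path, encoding="utf-8") as reader:
        return count_hashes(reader)


if __name__ == "__main__":
    for infile in glob.glob(os.path.join(DATA_DIR, "zouk.txt")):
        print(infile, count_hashes_in_file(infile))
```

For only a count, `line.count("#")` does the same job; the explicit character loop is kept here to mirror the structure of the original code.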
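For the conversion itself, the CWB encoding tutorial describes a "vertical" input format: one token per line, positional attributes (here word and POS) separated by tabs, and structural attributes written as XML-style tags on their own lines. The Python 3 sketch below assumes that format, assumes the tagger output uses `#` between word and tag (as in the sample above), and assumes one sentence per input line; `split_token`, `tagged_lines_to_vrt`, `convert_file`, and the `text_id` parameter are hypothetical names, not part of any existing tool. Splitting at the last `#` (via `str.rpartition`) keeps words that themselves contain `#` intact.

```python
# Assumption: word and tag are joined by '#', as in the sample output.
TAG_SEPARATOR = "#"


def split_token(token, sep=TAG_SEPARATOR):
    """Split 'word#TAG' into (word, tag) at the LAST separator."""
    word, found, tag = token.rpartition(sep)
    if not found:
        # No separator at all: keep the whole token as the word, empty tag.
        return token, ""
    return word, tag


def tagged_lines_to_vrt(lines, sep=TAG_SEPARATOR):
    """Yield CWB vertical-format lines: one token per line, word and POS
    separated by a tab, each non-blank input line wrapped in <s> ... </s>.
    Assumption: every input line holds exactly one sentence."""
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        yield "<s>"
        for token in tokens:
            word, tag = split_token(token, sep)
            yield f"{word}\t{tag}"
        yield "</s>"


def convert_file(in_path, out_path, text_id, sep=TAG_SEPARATOR):
    """Convert one tagger output file into a .vrt file wrapped in a
    <text> element; text_id is a hypothetical document identifier."""
    with open(in_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        dst.write(f'<text id="{text_id}">\n')
        for vrt_line in tagged_lines_to_vrt(src, sep):
            dst.write(vrt_line + "\n")
        dst.write("</text>\n")
```

A call such as `convert_file("zouk.txt", "zouk.vrt", "zouk")` would produce a file that can then be passed to `cwb-encode`, declaring `pos` as a positional attribute and `s` and `text` as structural attributes; check the encoding tutorial linked in the question for the exact command-line options.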
