
Reading the # char in Python
<p>Can someone help me read the "#" character in Python? I can't seem to find it in the file. Since the file is output from the Stanford POS tagger (<a href="http://nlp.stanford.edu/software/tagger.shtml" rel="nofollow">http://nlp.stanford.edu/software/tagger.shtml</a>), are there any scripts available to convert Stanford POS tagger output to CWB format (<a href="http://cogsci.uni-osnabrueck.de/~korpora/ws/CWBdoc/CWB_Encoding_Tutorial/node3.html" rel="nofollow">http://cogsci.uni-osnabrueck.de/~korpora/ws/CWBdoc/CWB_Encoding_Tutorial/node3.html</a>)?</p>

<p>This is the UTF-8 text file that I'm trying to read:</p>

<pre><code>如果#CS 您#PN 在#P 新加坡#NR 只#AD 能#VV 前往#VV 一#CD 间#M 俱乐部#NN ,#PU 祖卡#NN 酒吧#NN 必然#AD 是#VC 您#PN 的#DEG 不二#JJ 选择#NN 。#PU 作为#P 或许#AD 是#VC 新加坡#NR 唯一#JJ 一#CD 家#M 国际#NN 知名#VA 的#DEC 夜店#NN ,#PU 祖卡#NN 既#CC 是#VC 一#CD 个#M 公共#JJ 机构#NN ,#PU
</code></pre>

<p>With this code, I'm not detecting the # character in the UTF-8 text file:</p>

<pre><code>#!/usr/bin/python
# -*- coding: utf-8 -*-
'''
stanford POS tagger to CWB format
'''
import codecs
import nltk
import os, sys, re, glob

reload(sys)
sys.setdefaultencoding('utf-8')

cwd = './path/to/file.txt' #os.getcwd()
for infile in glob.glob(os.path.join(cwd, 'zouk.txt')):
    print infile
    (PATH, FILENAME) = os.path.split(infile)
    reader = codecs.open(infile, 'r', 'utf-8')
    for line in reader:
        for word in line:
            if word == '\#':
                print 'hex is here'
</code></pre>
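A likely reason the check never fires: `\#` is not a recognised escape sequence in Python, so the literal `'\#'` is a two-character string (a backslash followed by `#`). Iterating over a line yields one character at a time, and a single character can never equal a two-character string. Comparing against `'#'` would match. For the conversion itself, a minimal sketch follows. It is written for Python 3 rather than the Python 2 used in the question, and it assumes the target is CWB's one-token-per-line layout with tab-separated columns (word, then tag). The function names `tagged_line_to_vertical` and `convert_file` are made up for illustration and are not part of NLTK, the Stanford tagger, or CWB. Sentence markup is left out.

```python
def tagged_line_to_vertical(line, sep="#"):
    """Turn 'word#TAG word#TAG ...' into a list of 'word<TAB>TAG' rows."""
    rows = []
    for token in line.split():
        # rpartition splits on the LAST separator, so a word that itself
        # contains '#' keeps everything before the final '#' as the word.
        word, found, tag = token.rpartition(sep)
        if not found:
            # No separator in this token: skip it rather than guess a tag.
            continue
        rows.append(word + "\t" + tag)
    return rows


def convert_file(in_path, out_path):
    """Read tagger output as UTF-8 and write one 'word<TAB>TAG' row per line."""
    with open(in_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            for row in tagged_line_to_vertical(line):
                dst.write(row + "\n")
```

Calling `convert_file('zouk.txt', 'zouk.vrt')` would write the verticalized text next to the input. Both paths here are placeholders. Splitting on whitespace and then on the last `#` avoids the character-by-character loop entirely.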
 
