<p>Let's get the easy part of your question out of the way first:</p>

<p>When you print a list, the <code>repr</code> of the contents is used to represent the items in the list. So since </p>

<pre><code>re.split(" ", c[0].get_text())
</code></pre>

<p>is a list, the print statement prints the <a href="http://docs.python.org/2/library/repr.html#module-repr" rel="nofollow">repr</a> of the <code>unicode</code> element in the list.</p>

<pre><code>In [63]: x = u'-1\xbe\xa0-101'

In [64]: print(x)
-1¾ -101

In [65]: repr(x)
Out[65]: "u'-1\\xbe\\xa0-101'"
</code></pre>

<hr>

<p>Now for the interesting part: Some unicode code points have names. For example,</p>

<pre><code>In [60]: import unicodedata as ud

In [61]: ud.name(u'\xbe')
Out[61]: 'VULGAR FRACTION THREE QUARTERS'
</code></pre>

<p>In fact, we can search through all the unicode characters for those with names which match the pattern <code>'FRACTION (\w+) (\w+)'</code>:</p>

<pre><code>import unicodedata as ud
import re

numerator = {
    'ONE':1, 'TWO':2, 'THREE':3, 'FOUR':4, 'FIVE':5, 'SIX':6, 'SEVEN':7,
    'EIGHT':8, 'NINE':9, 'ZERO':0,
}
denominator = {
    'QUARTER':4, 'HALF':2, 'SEVENTH':7, 'NINTH':9, 'THIRD':3, 'FIFTH':5,
    'SIXTH':6, 'EIGHTH':8, 'SIXTEENTH':16
}

fraction = {}
for num in range(0x110000):
    s = unichr(num)
    try:
        name = ud.name(s)
    except ValueError:
        continue
    match = re.search('FRACTION ({n}) ({d})'.format(
        n = '|'.join(numerator.keys()),
        d = '|'.join(denominator.keys()),
        ), name)
    if match:
        fraction[num] = unicode(
            float(numerator[match.group(1)])/denominator[match.group(2)]).lstrip('0')

print(fraction)
</code></pre>

<p>Thus we now have a <code>dict</code> named <code>fraction</code> which maps unicode code points to <code>unicode</code> decimal representations of the fractions.</p>

<pre><code>{8585: u'.0', 43056: u'.25', 43057: u'.5', 43058: u'.75', 43059: u'.0625',
 43060: u'.125', 43061: u'.1875', 188: u'.25', 189: u'.5', 190: u'.75',
 8528: u'.142857142857', 8529: u'.111111111111', 8531: u'.333333333333',
 8532: u'.666666666667', 8533: u'.2', 8534: u'.4', 8535: u'.6', 8536: u'.8',
 8537: u'.166666666667', 8538: u'.833333333333', 8539: u'.125', 8540: u'.375',
 8541: u'.625', 8542: u'.875', 69245: u'.333333333333', 3443: u'.25',
 3444: u'.5', 3445: u'.75', 69243: u'.5', 69244: u'.25', 11517: u'.5',
 69246: u'.666666666667'}
</code></pre>

<p>Now you can translate <code>u'-1\xbe\xa0-101'</code> like this:</p>

<pre><code>text = u'-1\xbe\xa0-101'
print(text.translate(fraction))
</code></pre>

<p>yields</p>

<pre><code>-1.75 -101
</code></pre>

<hr>

<p>So the short answer is:</p>

<pre><code>fraction = {8585: u'.0', 43056: u'.25', 43057: u'.5', 43058: u'.75',
            43059: u'.0625', 43060: u'.125', 43061: u'.1875', 188: u'.25',
            189: u'.5', 190: u'.75', 8528: u'.142857142857',
            8529: u'.111111111111', 8531: u'.333333333333',
            8532: u'.666666666667', 8533: u'.2', 8534: u'.4', 8535: u'.6',
            8536: u'.8', 8537: u'.166666666667', 8538: u'.833333333333',
            8539: u'.125', 8540: u'.375', 8541: u'.625', 8542: u'.875',
            69245: u'.333333333333', 3443: u'.25', 3444: u'.5', 3445: u'.75',
            69243: u'.5', 69244: u'.25', 11517: u'.5',
            69246: u'.666666666667'}
text = c[0].get_text()
text = text.translate(fraction)
parts = map(float, text.split())
print(parts)
</code></pre>

<p>yields</p>

<pre><code>[-1.75, -101.0]
</code></pre>

<p>Note that in the future it is possible that more fractions are assigned unicode code points. It is also possible that the name of the unicode code point does not match the pattern <code>'FRACTION ({n}) ({d})'</code> that I used to generate the <code>fraction</code> dict. So my solution is somewhat fragile and may need to be updated in the future.</p>
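<p>The code above targets Python 2 (<code>unichr</code>, <code>unicode</code>). For readers on Python 3, the same name-matching idea could be sketched roughly as follows. This is only an illustration under that assumption, not part of the original solution: the helper names <code>build_fraction_table</code> and <code>parse_numbers</code> are hypothetical, and the decimal strings it produces carry more digits than the Python 2 output shown above because Python 3 formats floats differently.</p>

```python
import re
import unicodedata

# Same name fragments as the tables in the answer above.
NUMERATOR = {
    'ONE': 1, 'TWO': 2, 'THREE': 3, 'FOUR': 4, 'FIVE': 5,
    'SIX': 6, 'SEVEN': 7, 'EIGHT': 8, 'NINE': 9, 'ZERO': 0,
}
DENOMINATOR = {
    'QUARTER': 4, 'HALF': 2, 'SEVENTH': 7, 'NINTH': 9, 'THIRD': 3,
    'FIFTH': 5, 'SIXTH': 6, 'EIGHTH': 8, 'SIXTEENTH': 16,
}
PATTERN = re.compile('FRACTION ({n}) ({d})'.format(
    n='|'.join(NUMERATOR), d='|'.join(DENOMINATOR)))


def build_fraction_table():
    """Map fraction code points to decimal strings such as '.75' (hypothetical helper)."""
    table = {}
    for code_point in range(0x110000):
        # The second argument is returned for code points that have no name.
        name = unicodedata.name(chr(code_point), '')
        match = PATTERN.search(name)
        if match:
            value = NUMERATOR[match.group(1)] / DENOMINATOR[match.group(2)]
            # Drop the leading zero so that '1' + '.75' reads as '1.75'.
            table[code_point] = str(value).lstrip('0')
    return table


def parse_numbers(text, table):
    """Translate fraction characters, split on whitespace, convert to float."""
    return [float(part) for part in text.translate(table).split()]


fraction = build_fraction_table()
print(parse_numbers('-1\xbe\xa0-101', fraction))  # [-1.75, -101.0]
```

<p>In Python 3, <code>str.split()</code> with no arguments also splits on the no-break space <code>\xa0</code>, so no regular expression is needed for that step. The fragility noted above applies here as well: the table only covers code points whose names match the pattern.</p>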