Parsing HTML page into CSV - python

I am trying to transfer all the data I parsed from a website into a CSV file, but I have run into a couple of problems:

1. Even though I have added the character encoding, it still prints out as HTML in Excel rather than plain text, e.g.:

```
<option redirectvalue="/partfinder/Asus/All In One/E Series/ET10B">ET10B</option>
```

2. It prints out in one column rather than a row for each item.

Here is my code so far:

```
import string, urllib2, urlparse, csv, sys, codecs, cStringIO
from urllib import quote
from urlparse import urljoin
from bs4 import BeautifulSoup
from ast import literal_eval


class UnicodeWriter:
    """
    A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        self.writer.writerow([s.encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)


changable_url = 'http://www.asusparts.eu/partfinder/Asus/All%20In%20One/E%20Series'
page = urllib2.urlopen(changable_url)
base_url = 'http://www.asusparts.eu'
soup = BeautifulSoup(page)

selects = []
redirects = []
model_info = []

#Opening csv writer
c = UnicodeWriter(open(r"asus_stock.csv", "wb"))
#Object reader
cr = UnicodeWriter(open(r"asus_stock.csv", "rb"))

print "FETCHING OPTIONS"
select = soup.find(id='myselectListModel')
selects.append(select)
for item in selects:
    print item.get_text()

options = select.findAll('option')

for option in options:
    if(option.has_attr('redirectvalue')):
        redirects.append(option['redirectvalue'])

for r in redirects:
    rpage = urllib2.urlopen(urljoin(base_url, quote(r)))
    s = BeautifulSoup(rpage)
    #print s

    #Fetching the main title for each specific model and printing it out
    print "FETCHING MAIN TITLE"
    maintitle = s.find(id='puffBreadCrumbs')
    model_info.append(maintitle)
    print maintitle.get_text()

    datas = s.find(id='accordion')
    a = datas.findAll('a')
    content = datas.findAll('span')

    print "FETCHING CATEGORY"
    for data in a:
        if(data.has_attr('onclick')):
            arguments = literal_eval('(' + data['onclick'].replace(', this', '').split('(', 1)[1])
            #model_info.append(arguments)
            print arguments
            #arguments[1] + " " + arguments[3] + " " + arguments[4]

    # Retrieves Part number and Price
    print "FETCHING DATA"
    for complete in content:
        if(complete.has_attr('class')):
            #model_info.append(complete['class'])
            print complete.get_text()

    print "FETCHING IMAGES"
    img = s.find('td')
    images = img.findAll('img')
    model_info.append(images)
    print images

c.writerows(selects)
```

How can I make it print out as:

> 1. Text rather than HTML
> 2. Rows rather than one column

**[EDIT]** This is how I would like the CSV file to be displayed, with an example of the values to be returned:

```
"Brand Name" "CategoryID" "ModelID" "Family" "Name" "Part Number" "Price" "Image src"
Asus | AC Adapter | ET1602 | E Series | Power Cord 3P L:80CM,UK(B) | 14G110008350 | 14.77 | image src
```

**[NEW EDIT]**

These are the outputs for the printed values:

```
print "FETCHING OPTIONS"
select = soup.find(id='myselectListModel')
selects.append(select)
for item in selects:
    print item.get_text()
```

yields:

```
ET10B
ET1602
ET1602C
etc..
```

Fetching the main title:

```
print "FETCHING MAIN TITLE"
maintitle = s.find(id='puffBreadCrumbs')
model_info.append(maintitle)
print maintitle.get_text()
```

yields:

> Asus - All In One - E Series - ET10B

Fetching the category:

```
datas = s.find(id='accordion')
a = datas.findAll('a')
content = datas.findAll('span')

print "FETCHING CATEGORY"
for data in a:
    if(data.has_attr('onclick')):
        arguments = literal_eval('(' + data['onclick'].replace(', this', '').split('(', 1)[1])
        #model_info.append(arguments)
        print arguments
```

yields:

```
FETCHING CATEGORY
('Asus', 'AC Adapter', 'ET10B', '6941', 'E Series')
('Asus', '04G265003580')
('Asus', '14G110008340')
('Asus', 'Bracket', 'ET10B', '7138', 'E Series')
('Asus', 'Cable', 'ET10B', '6983', 'E Series')
('Asus', 'Camera', 'ET10B', '6985', 'E Series')
('Asus', 'Cooling', 'ET10B', '6999', 'E Series')
('Asus', 'Cover', 'ET10B', '6984', 'E Series')
etc..
```

Fetching the name:

```
print "FETCHING NAME"
name = s.find('b').get_text()
print name
```

yields:

> POWER ADAPTER 65W19V 3PIN

Fetching the part number and price:

```
print "FETCHING PART NUMBER AND PRICE (inc. VAT)"
for complete in content:
    if(complete.has_attr('class')):
        #model_info.append(complete['class'])
        print complete.get_text()
```

yields:

```
FETCHING PART NUMBER AND PRICE (inc. VAT)
Part number: 04G265003580
Remote stock 38.09:- EUR
```

Fetching the images:

```
print "FETCHING IMAGES"
img = s.find('td')
images = img.findAll('img')
model_info.append(images)
print images
```

yields:

```
FETCHING IMAGES
[<img alt="" src="/images/Articles/thumbs/04G265003580_thumb.jpg"/>]
```
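For reference, a minimal sketch of one way to get both things at once: plain text instead of markup, and one row per part. It reuses the `UnicodeWriter`, `s`, `a`, `content` and `images` names from the code above, and the header follows the layout in the **[EDIT]**. How the name, part number and price line up with each category tuple is an assumption based on the printed outputs, not something confirmed by the page structure.

```
# Sketch only: assumes this runs inside the existing "for r in redirects:" loop,
# after s, a, content and images have been built for the current model page.

# Create the writer and write the header row once, before the loop:
out = UnicodeWriter(open("asus_stock.csv", "wb"))
out.writerow([u"Brand Name", u"CategoryID", u"ModelID", u"Family",
              u"Name", u"Part Number", u"Price", u"Image src"])

# ...then, per model page, build one plain-text row per category entry.
for data in a:
    if data.has_attr('onclick'):
        arguments = literal_eval('(' + data['onclick'].replace(', this', '').split('(', 1)[1])
        if len(arguments) != 5:
            continue  # skip the short ('Asus', '04G265003580')-style tuples
        brand, category, model, category_id, family = arguments

        # get_text() strips the tags, so plain text (not HTML) ends up in the cells
        name = s.find('b').get_text()
        spans = [span.get_text() for span in content if span.has_attr('class')]
        part_number = spans[0].replace('Part number: ', '') if spans else u''
        price = spans[-1] if len(spans) > 1 else u''
        image_src = images[0]['src'] if images else u''

        # one list per part -> one CSV row per part, instead of one long column
        out.writerow([brand, category, model, family,
                      name, part_number, price, image_src])
```

This doesn't touch `model_info` or the final `c.writerows(selects)` call; the point is only that every cell passed to `writerow()` is already a plain string, and each part gets its own list, which is what puts it on its own row.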