StackOverflow2013


plurals

POPostgreSQL Query Optimization and the Postmaster Process'
primarykey
Id
414307
data
AcceptedAnswerId
415069
AnswerCount
3
ClosedDate
CommentCount
0
CommunityOwnedDate
CreationDate
2009-01-05T19:59:16.293
FavoriteCount
0
LastActivityDate
2018-04-24T01:35:36.393
LastEditDate
2018-04-24T01:35:36.393
LastEditorUserId
1033581
OwnerUserId
49985
ParentId
0
PostTypeId
1
Score
1
ViewCount
645
LastEditorDisplayName
Nicholas Leonard
text
Body
I am currently working with a large Wikipedia-dump-derived PostgreSQL database; it contains about 40 GB of data. The database is running on an HP ProLiant ML370 G5 server with SUSE Linux Enterprise Server 10; I am querying it from my laptop over a private network managed by a simple D-Link router. I assigned static DHCP (private) IPs to both laptop and server.
Anyway, from my laptop, using pgAdmin III, I send off some SQL commands/queries; some of these are CREATE INDEX, DROP INDEX, DELETE, SELECT, etc. Sometimes I send a command (like CREATE INDEX) and it returns, telling me that the query was executed successfully. However, the postmaster process assigned to such a command seems to remain sleeping on the server. Now, I do not really mind this, for I say to myself that PostgreSQL maintains a pool of postmasters ready to process queries. Yet if this process eats up 6 GB of its 9.4 GB of assigned RAM, I worry (and it does so at the moment). Maybe this is a cache of data that is kept in [shared] memory in case another query happens to need that same data, but I do not know.
Another thing is bothering me. I have 2 tables. One is the page table; I have an index on its page_id column. The other is the pagelinks table, whose pl_from column references either nothing or a value in the page.page_id column; unlike page_id, pl_from has no index (yet). To give you an idea of the scale of the tables and of why I need a viable solution, the page table has 13.4 million rows (after I deleted those I do not need) while the pagelinks table has 293 million.
I need to execute the following command to clean the pagelinks table of some of its useless rows: <pre><code>DELETE FROM pagelinks USING page WHERE pl_from NOT IN (page_id); </code></pre> So basically, I wish to rid the pagelinks table of all links coming from a page not in the page table.
Even after disabling nested loops and/or sequential scans, the query optimizer always gives me the following "solution": <pre><code>Nested Loop (cost=494640.60..112115531252189.59 rows=3953377028232000 width=6)
  Join Filter: ("outer".pl_from <> "inner".page_id)
  -> Seq Scan on pagelinks (cost=0.00..5889791.00 rows=293392800 width=17)
  -> Materialize (cost=494640.60..708341.51 rows=13474691 width=11)
       -> Seq Scan on page (cost=0.00..402211.91 rows=13474691 width=11)
</code></pre> It seems that such a task would take weeks or more to complete; obviously, this is unacceptable. It seems to me that I would much rather have it use the page_id index to do its thing... but it is a stubborn optimizer and I might be wrong.
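To make the plan above easier to read: its Join Filter is pl_from <> page_id, and its row estimate (3953377028232000) is roughly 293392800 × 13474691, i.e. close to the full cross product of the two tables. That suggests the statement is being treated as an inequality join rather than as "pl_from has no match in page". The following is only a minimal sketch of that difference, not a tested fix for this database: it uses Python's sqlite3 module with a tiny in-memory stand-in for the two tables (the sample ids are made up), and it assumes that a NOT EXISTS subquery expresses the intended cleanup. How PostgreSQL would plan that form on the real data is not shown here.

```python
import sqlite3

# Toy stand-ins for the page and pagelinks tables.
# Assumption: SQLite in memory, with made-up ids; this is not the real PostgreSQL schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (page_id INTEGER PRIMARY KEY);
    CREATE TABLE pagelinks (pl_from INTEGER);
    INSERT INTO page VALUES (1), (2), (3);
    INSERT INTO pagelinks VALUES (1), (2), (7), (9);
""")

# What the Join Filter in the plan computes: every (link, page) pair whose ids differ.
# Even links that DO have a matching page pair up with all the other pages.
pairs = conn.execute(
    "SELECT COUNT(*) FROM pagelinks, page WHERE pl_from <> page_id"
).fetchone()[0]

# Anti-join form: delete only the links whose pl_from matches no row in page.
conn.execute("""
    DELETE FROM pagelinks
    WHERE NOT EXISTS (
        SELECT 1 FROM page WHERE page.page_id = pagelinks.pl_from
    )
""")
remaining = [row[0] for row in
             conn.execute("SELECT pl_from FROM pagelinks ORDER BY pl_from")]

print(pairs)      # number of mismatching (link, page) pairs in the toy data
print(remaining)  # pl_from values that survive the cleanup
```

In the toy data, the inequality join matches links 1 and 2 as well (each pairs with the pages it does not equal), so a DELETE driven by it would remove rows that should stay; the NOT EXISTS form removes only links 7 and 9. An index on pl_from may or may not help the real query; that would need to be checked with EXPLAIN on the actual server.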
Tags
<optimization><postgresql><indexing><rdbms>
Title
PostgreSQL Query Optimization and the Postmaster Process'
singulars
PostAcceptedAnswerId
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. USCœur
UserOwnerUserId
1. USNicholas Leonard
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
2. PO
 singulars
 PostTypePostTypeId
 PTAnswer
3. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 POPostgreSQL Query Optimization and the Postmaster Process'
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId
1. This table or related slice is empty.
