StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

POPostgreSQL: How to optimize my database for storing and querying a huge graph
primarykey
Id
1822802
data
AcceptedAnswerId
1826151
AnswerCount
5
ClosedDate
CommentCount
4
CommunityOwnedDate
CreationDate
2009-11-30T22:19:23.063
FavoriteCount
9
LastActivityDate
2009-12-15T18:50:30.837
LastEditDate
2009-12-01T19:17:24.807
LastEditorUserId
179014
OwnerUserId
179014
ParentId
0
PostTypeId
1
Score
7
ViewCount
2656
LastEditorDisplayName
text
Body
I'm running PostgreSQL 8.3 on a 1.83 GHz Intel Core Duo Mac Mini with 1GB of RAM and Mac OS X 10.5.8. I have a stored a huge graph in my PostgreSQL database. It consists of 1.6 million nodes and 30 million edges. My database schema is like: <pre><code>CREATE TABLE nodes (id INTEGER PRIMARY KEY,title VARCHAR(256)); CREATE TABLE edges (id INTEGER,link INTEGER,PRIMARY KEY (id,link)); CREATE INDEX id_idx ON edges (id); CREATE INDEX link_idx ON edges (link); </code></pre> The data in the table edges looks like <pre><code>id link 1 234 1 88865 1 6 2 365 2 12 ... </code></pre> So it stores for each node with id x the outgoing link to id y. The time for searching all the outgoing links is ok: <pre><code>=# explain analyze select link from edges where id=4620; QUERY PLAN --------------------------------------------------------------------------------- Index Scan using id_idx on edges (cost=0.00..101.61 rows=3067 width=4) (actual time=135.507..157.982 rows=1052 loops=1) Index Cond: (id = 4620) Total runtime: 158.348 ms (3 rows) </code></pre> However, if I search for the incoming links to a node, the database is more than 100 times slower (although the resulting number of incoming links is only 5-10 times higher than the number of outgoing links): <pre><code>=# explain analyze select id from edges where link=4620; QUERY PLAN ---------------------------------------------------------------------------------- Bitmap Heap Scan on edges (cost=846.31..100697.48 rows=51016 width=4) (actual time=322.584..48983.478 rows=26887 loops=1) Recheck Cond: (link = 4620) -> Bitmap Index Scan on link_idx (cost=0.00..833.56 rows=51016 width=0) (actual time=298.132..298.132 rows=26887 loops=1) Index Cond: (link = 4620) Total runtime: 49001.936 ms (5 rows) </code></pre> I tried to force Postgres not to use a Bitmap Scan via <pre><code>=# set enable_bitmapscan = false; </code></pre> but the speed of the query for incoming links didn't improve: <pre><code>=# explain analyze select id from edges where link=1588; QUERY PLAN ------------------------------------------------------------------------------------------- Index Scan using link_idx on edges (cost=0.00..4467.63 rows=1143 width=4) (actual time=110.302..51275.822 rows=43629 loops=1) Index Cond: (link = 1588) Total runtime: 51300.041 ms (3 rows) </code></pre> I also increased my shared buffers from 24MB to 512MB, but it didn't help. So I wonder why my queries for outgoing and incoming links show such an asymmetric behaviour? Is something wrong with my choice of indexes? Or should I better create a third table holding all the incoming links for a node with id x? But that would be quite a waste of disk space. But since I'm new into SQL databases maybe I'm missing something basic here? 
Tags
<optimization><postgresql><graph>
Title
PostgreSQL: How to optimize my database for storing and querying a huge graph
singulars
PostAcceptedAnswerId
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. USasmaier
UserOwnerUserId
1. USasmaier
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. PL
 singulars
 LinkTypeLinkTypeId
 LTLinked
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
2. PO
 singulars
 PostTypePostTypeId
 PTAnswer
3. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 POPostgreSQL: How to optimize my database for storing and querying a huge graph
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 POPostgreSQL: How to optimize my database for storing and querying a huge graph
 UserUserId
 USThomas
 VoteTypeVoteTypeId
 VTFavorite
3. VO
 singulars
 PostPostId
 POPostgreSQL: How to optimize my database for storing and querying a huge graph
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId

Querying!

Guidance

A row detail

Detail views are divided into sections. All the information in the data section comes from columns in the selected row. The other sections display data from other, related rows.

Related data can be related in a to-one or a to-many fashion. Captions of data related in a to-many fashion link to a list view showing a filtered view of the table.

Try moving around until you find a non-empty to-many entry and click on the label to get to one. You can move back to the root by clicking on the database name in the header.