StackOverflow2013

plurals

PO
primarykey
Id
2324145
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
CommentCount
24
CommunityOwnedDate
CreationDate
2010-02-24T06:38:01.953
FavoriteCount
0
LastActivityDate
2010-02-24T07:36:42.070
LastEditDate
2010-02-24T07:36:42.070
LastEditorUserId
171061
OwnerUserId
171061
ParentId
2324050
PostTypeId
2
Score
4
ViewCount
0
LastEditorDisplayName
text
Body
I'd argue that there's really no reason to make your DB do the extra work of evaluating the WHERE clause. Given that you actually want all the records, you will have to do the work of fetching them anyway. If you do a single SELECT from the table, it will retrieve them all in table order and you can partition them yourself. If you SELECT WHERE male and SELECT WHERE female, you'll have to hit an index for each operation, and you'll lose some data locality.

For example, if your records on disk alternate male-female and your dataset is much larger than memory, you'll likely have to read the entire table twice if you do two separate queries, whereas a single SELECT for both will be a single table scan.

EDIT: Since I'm getting downmodded into oblivion, I decided to actually run the test. I created a table
<blockquote>
CREATE TEMPORARY TABLE gender_test (some_data DOUBLE PRECISION, gender CHARACTER VARYING(20));
</blockquote>
I generated some random data,
<blockquote>
select gender, count(*) from gender_test group by gender;
 gender | count
--------+----------
 female | 12603133
 male | 10465539
(2 rows)
</blockquote>
First, let's run these tests without indices, in which case I'm quite sure I'm right...
<blockquote>
test=> EXPLAIN ANALYSE SELECT * FROM gender_test WHERE gender='male';
QUERY PLAN
<hr>
Seq Scan on gender_test (cost=0.00..468402.00 rows=96519 width=66) (actual time=0.030..4595.367 rows=10465539 loops=1)
  Filter: ((gender)::text = 'male'::text)
Total runtime: 5150.263 ms

test=> EXPLAIN ANALYSE SELECT * FROM gender_test WHERE gender='female';
QUERY PLAN
<hr>
Seq Scan on gender_test (cost=0.00..468402.00 rows=96519 width=66) (actual time=0.029..4751.219 rows=12603133 loops=1)
  Filter: ((gender)::text = 'female'::text)
Total runtime: 5418.891 ms

test=> EXPLAIN ANALYSE SELECT * FROM gender_test;
QUERY PLAN
<hr>
Seq Scan on gender_test (cost=0.00..420142.40 rows=19303840 width=66) (actual time=0.021..3326.164 rows=23068672 loops=1)
Total runtime: 4543.393 ms
(2 rows)
</blockquote>
Funny, looks like fetching the data in a table scan without the filter is indeed faster! In fact, more than twice as fast as the two filtered queries combined! (5150 + 5418 > 4543) Much like I predicted! :-p

Now, let's make an index and see if it changes the results...
<blockquote>
CREATE INDEX test_index ON gender_test(gender);
</blockquote>
Now to rerun the same queries...
<blockquote>
test=> EXPLAIN ANALYSE SELECT * FROM gender_test WHERE gender='male';
QUERY PLAN
<hr>
Bitmap Heap Scan on gender_test (cost=2164.69..195922.27 rows=115343 width=66) (actual time=2008.877..4388.348 rows=10465539 loops=1)
  Recheck Cond: ((gender)::text = 'male'::text)
  -> Bitmap Index Scan on test_index (cost=0.00..2135.85 rows=115343 width=0) (actual time=2006.047..2006.047 rows=10465539 loops=1)
     Index Cond: ((gender)::text = 'male'::text)
Total runtime: 4941.64 ms

test=> EXPLAIN ANALYSE SELECT * FROM gender_test WHERE gender='female';
QUERY PLAN
<hr>
Bitmap Heap Scan on gender_test (cost=2164.69..195922.27 rows=115343 width=66) (actual time=1915.385..4269.933 rows=12603133 loops=1)
  Recheck Cond: ((gender)::text = 'female'::text)
  -> Bitmap Index Scan on test_index (cost=0.00..2135.85 rows=115343 width=0) (actual time=1912.587..1912.587 rows=12603133 loops=1)
     Index Cond: ((gender)::text = 'female'::text)
Total runtime: 4931.555 ms
(5 rows)

test=> EXPLAIN ANALYSE SELECT * FROM gender_test;
QUERY PLAN
<hr>
Seq Scan on gender_test (cost=0.00..457790.72 rows=23068672 width=66) (actual time=0.021..3304.836 rows=23068672 loops=1)
Total runtime: 4523.754 ms
</blockquote>
Funny.... scanning the entire table in one go is still about twice as fast as the two indexed queries combined! (4941 + 4931 vs 4523)

NOTE: There are all sorts of ways this is unscientific. I'm running with 16GB of RAM, so the entire dataset fits into memory. Postgres isn't configured to use nearly that much, but disk cache still helps... I'd hypothesize (but can't be assed to actually try) that the effects only get worse once you hit disk. I only tried the default btree Postgres indexing. I'm assuming the PHP partitioning takes no time - not true, but probably a pretty reasonable approximation. All tests run on a Mac Pro 8-way 2.66 Xeon 16GB RAID-0 7200rpm. Also, this dataset is about 23 million rows, which is probably a bit larger than most people care about...

Obviously, raw speed isn't the only thing you care about. In many (most?) applications, you'd care more about the logical "correctness" of fetching them separately. But when it comes down to your boss saying "we need this to go faster", this will (apparently) give you a 2x speedup. The OP explicitly asked about efficiency. Happy?
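For illustration, here is a minimal sketch of the application-side partitioning the answer assumes takes "no time". The answer refers to PHP; this uses Python purely as a stand-in, and the function name `partition_by_gender` and the sample rows are hypothetical, not taken from the original test. It assumes each row arrives as a `(some_data, gender)` pair from a single `SELECT * FROM gender_test`.

```python
from collections import defaultdict


def partition_by_gender(rows):
    """Split (some_data, gender) rows into per-gender lists in one pass."""
    groups = defaultdict(list)
    for some_data, gender in rows:
        groups[gender].append(some_data)
    return dict(groups)


# Hypothetical rows standing in for the result of the single unfiltered query.
rows = [(0.12, "male"), (0.87, "female"), (0.45, "female"), (0.33, "male")]
groups = partition_by_gender(rows)
```

The partitioning is one linear pass over rows the application has already fetched, which is why the answer treats its cost as small next to a second scan of the table. Whether that holds for a given application is something to measure rather than assume.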
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. PO Is it better to filter a resultset using a WHERE clause or using application code?
 singulars
 PostTypePostTypeId
 PT Question
PostTypePostTypeId
1. PT Answer
UserLastEditorUserId
1. US Steven Schlansker
UserOwnerUserId
1. US Steven Schlansker
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VT DownMod
2. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VT UpMod
3. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VT UpMod
CommentsPostId
