StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

PO
text
Body
copied!<p>If performance is the primary concern, I would go with #6... a table per UDF (really, this is a variant of #2). This answer is specifically tailored to this situation and the description of the data distribution and access patterns described.</p> <h2>Pros:</h2> <ol> <li><p>Because you indicate that some UDFs have values for a small portion of the overall data set, a separate table would give you the best performance because that table will be only as large as it needs to be to support the UDF. The same holds true for the related indices.</p></li> <li><p>You also get a speed boost by limiting the amount of data that has to be processed for aggregations or other transformations. Splitting the data out into multiple tables lets you perform some of the aggregating and other statistical analysis on the UDF data, then join that result to the master table via foreign key to get the non-aggregated attributes.</p></li> <li><p>You can use table/column names that reflect what the data actually is.</p></li> <li><p>You have complete control to use data types, check constraints, default values, etc. to define the data domains. Don't underestimate the performance hit resulting from on-the-fly data type conversion. Such constraints also help RDBMS query optimizers develop more effective plans.</p></li> <li><p>Should you ever need to use foreign keys, built-in declarative referential integrity is rarely out-performed by trigger-based or application level constraint enforcement.</p></li> </ol> <h2>Cons:</h2> <ol> <li><p>This could create a lot of tables. Enforcing schema separation and/or a naming convention would alleviate this.</p></li> <li><p>There is more application code needed to operate the UDF definition and management. I expect this is still less code needed than for the original options 1, 3, & 4.</p></li> </ol> <h2>Other Considerations:</h2> <ol> <li><p>If there is anything about the nature of the data that would make sense for the UDFs to be grouped, that should be encouraged. That way, those data elements can be combined into a single table. For example, let's say you have UDFs for color, size, and cost. The tendency in the data is that most instances of this data looks like </p> <pre><code> 'red', 'large', 45.03 </code></pre> <p>rather than </p> <pre><code> NULL, 'medium', NULL </code></pre> <p>In such a case, you won't incur a noticeable speed penalty by combining the 3 columns in 1 table because few values would be NULL and you avoid making 2 more tables, which is 2 fewer joins needed when you need to access all 3 columns.</p></li> <li><p>If you hit a performance wall from a UDF that is heavily populated and frequently used, then that should be considered for inclusion in the master table.</p></li> <li><p>Logical table design can take you to a certain point, but when the record counts get truly massive, you also should start looking at what table partitioning options are provided by your RDBMS of choice.</p></li> </ol>

Querying!

Guidance

An individual column

Larger individual text columns get their own page to allow for proper reading.

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload