StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

POExporting de-aggregated data
primarykey
Id
2710451
data
AcceptedAnswerId
0
AnswerCount
1
ClosedDate
CommentCount
0
CommunityOwnedDate
CreationDate
2010-04-26T00:01:26.137
FavoriteCount
0
LastActivityDate
2010-04-26T15:54:47.320
LastEditDate
2010-04-26T15:54:47.320
LastEditorUserId
67137
OwnerUserId
67137
ParentId
0
PostTypeId
1
Score
0
ViewCount
200
LastEditorDisplayName
text
Body
I'm currently working on a data export feature for a survey application. We are using SQL2k8. We store data in a normalized format: QuestionId, RespondentId, Answer. We have a couple other tables that define what the question text is for the QuestionId and demographics for the RespondentId... Currently I'm using some dynamic SQL to generate a pivot that joins the question table to the answer table and creates an export, its working... The problem is that it seems slow and we don't have that much data (less than 50k respondents). Right now I'm thinking "why am I 'paying' to de-aggregate the data for each query? Why don't I cache that?" The data being exported is based on dynamic criteria. It could be "give me respondents that completed on x date (or range)" or "people that like blue", etc. Because of that, I think I have to cache at the respondent level, find out what respondents are being exported and then select their combined cached de-aggregated data. To me the quick and dirty fix is a totally flat table, RespondentId, Question1, Question2, etc. The problem is, we have multiple clients and that doesn't scale AND I don't want to have to maintain the flattened table as the survey changes. So I'm thinking about putting an XML column on the respondent table and caching the results of a SELECT * FROM Data FOR XML AUTO WHERE RespondentId = x. With that in place, I would then be able to get my export with filtering and XML calls into the XML column. What are you doing to export aggregated data in a flattened format (CSV, Excel, etc)? Does this approach seem ok? I worry about the cost of XML functions on larger result sets (think SELECT RespondentId, XmlCol.value('//data/question_1', 'nvarchar(50)') AS [Why is there air?], XmlCol.RinseAndRepeat)... Is there a better technology/approach for this? Thanks! EDIT: SQL Block for testing. Run steps 1 & 2 to prime the data, test with step 3, clean up with step 4... At a thousand respondents by one hundred questions, it already seems slower than I'd like. <pre><code>SET NOCOUNT ON; -- step 1 - create seed data CREATE TABLE #Questions (QuestionId INT PRIMARY KEY IDENTITY (1,1), QuestionText VARCHAR(50)); CREATE TABLE #Respondents (RespondentId INT PRIMARY KEY IDENTITY (1,1), Name VARCHAR(50)); CREATE TABLE #Data (QuestionId INT NOT NULL, RespondentId INT NOT NULL, Answer INT); DECLARE @QuestionTarget INT = 100 ,@QuestionCount INT = 0 ,@RespondentTarget INT = 1000 ,@RespondentCount INT = 0 ,@RespondentId INT; WHILE @QuestionCount < @QuestionTarget BEGIN INSERT INTO #Questions(QuestionText) VALUES(CAST(NEWID() AS CHAR(36))); SET @QuestionCount = @QuestionCount + 1; END; WHILE @RespondentCount < @RespondentTarget BEGIN INSERT INTO #Respondents(Name) VALUES(CAST(NEWID() AS CHAR(36))); SET @RespondentId = SCOPE_IDENTITY(); SET @QuestionCount = 1; WHILE @QuestionCount <= @QuestionTarget BEGIN INSERT INTO #Data(QuestionId, RespondentId, Answer) VALUES(@QuestionCount, @RespondentId, ROUND(((10 - 1 -1) * RAND() + 1), 0)); SET @QuestionCount = @QuestionCount + 1; END; SET @RespondentCount = @RespondentCount + 1; END; -- step 2 - index seed data ALTER TABLE #Data ADD CONSTRAINT [PK_Data] PRIMARY KEY CLUSTERED (QuestionId ASC, RespondentId ASC); CREATE INDEX DataRespondentQuestion ON #Data (RespondentId ASC, QuestionId ASC); -- step 3 - query data DECLARE @Columns NVARCHAR(MAX) ,@TemplateSQL NVARCHAR(MAX) ,@RunSQL NVARCHAR(MAX); SELECT @Columns = STUFF( ( SELECT DISTINCT '],[' + q.QuestionText FROM #Questions AS q ORDER BY '],[' + q.QuestionText FOR XML PATH('') ), 1, 2, '') + ']'; SET @TemplateSql = 'SELECT * FROM ( SELECT r.Name, q.QuestionText, d.Answer FROM #Respondents AS r INNER JOIN #Data AS d ON d.RespondentId = r.RespondentId INNER JOIN #Questions AS q ON q.QuestionId = d.QuestionId ) AS d PIVOT ( MAX(d.Answer) FOR d.QuestionText IN (xxCOLUMNSxx) ) AS p;'; SET @RunSql = REPLACE(@TemplateSql, 'xxCOLUMNSxx', @Columns) EXECUTE sys.sp_executesql @RunSql; -- step 4 - clean up DROP INDEX DataRespondentQuestion ON #Data; DROP TABLE #Data; DROP TABLE #Questions; DROP TABLE #Respondents; </code></pre>
Tags
<sql-server><xml><caching><export>
Title
Exporting de-aggregated data
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. USBennett Dill
UserOwnerUserId
1. USBennett Dill
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. This table or related slice is empty.
CommentsPostId
1. This table or related slice is empty.

Querying!

Guidance

A row detail

Detail views are divided into sections. All the information in the data section comes from columns in the selected row. The other sections display data from other, related rows.

Related data can be related in a to-one or a to-many fashion. Captions of data related in a to-many fashion link to a list view showing a filtered view of the table.

Try moving around until you find a non-empty to-many entry and click on the label to get to one. You can move back to the root by clicking on the database name in the header.