Note that there are some explanatory texts on larger screens.

plurals
  1. POExporting de-aggregated data
    primarykey
    data
    text
    <p>I'm currently working on a data export feature for a survey application. We are using SQL2k8. We store data in a normalized format: QuestionId, RespondentId, Answer. We have a couple other tables that define what the question text is for the QuestionId and demographics for the RespondentId...</p> <p>Currently I'm using some dynamic SQL to generate a pivot that joins the question table to the answer table and creates an export, its working... The problem is that it seems slow and we don't have that much data (less than 50k respondents).</p> <p>Right now I'm thinking "why am I 'paying' to de-aggregate the data for each query? Why don't I cache that?" The data being exported is based on dynamic criteria. It could be "give me respondents that completed on x date (or range)" or "people that like blue", etc. Because of that, I think I have to cache at the respondent level, find out what respondents are being exported and then select their combined cached de-aggregated data.</p> <p>To me the quick and dirty fix is a totally flat table, RespondentId, Question1, Question2, etc. The problem is, we have multiple clients and that doesn't scale AND I don't want to have to maintain the flattened table as the survey changes.</p> <p>So I'm thinking about putting an XML column on the respondent table and caching the results of a SELECT * FROM Data FOR XML AUTO WHERE RespondentId = x. With that in place, I would then be able to get my export with filtering and XML calls into the XML column.</p> <p>What are you doing to export aggregated data in a flattened format (CSV, Excel, etc)? Does this approach seem ok? I worry about the cost of XML functions on larger result sets (think SELECT RespondentId, XmlCol.value('//data/question_1', 'nvarchar(50)') AS [Why is there air?], XmlCol.RinseAndRepeat)...</p> <p>Is there a better technology/approach for this?</p> <p>Thanks!</p> <p>EDIT: SQL Block for testing. Run steps 1 &amp; 2 to prime the data, test with step 3, clean up with step 4... At a thousand respondents by one hundred questions, it already seems slower than I'd like.</p> <pre><code>SET NOCOUNT ON; -- step 1 - create seed data CREATE TABLE #Questions (QuestionId INT PRIMARY KEY IDENTITY (1,1), QuestionText VARCHAR(50)); CREATE TABLE #Respondents (RespondentId INT PRIMARY KEY IDENTITY (1,1), Name VARCHAR(50)); CREATE TABLE #Data (QuestionId INT NOT NULL, RespondentId INT NOT NULL, Answer INT); DECLARE @QuestionTarget INT = 100 ,@QuestionCount INT = 0 ,@RespondentTarget INT = 1000 ,@RespondentCount INT = 0 ,@RespondentId INT; WHILE @QuestionCount &lt; @QuestionTarget BEGIN INSERT INTO #Questions(QuestionText) VALUES(CAST(NEWID() AS CHAR(36))); SET @QuestionCount = @QuestionCount + 1; END; WHILE @RespondentCount &lt; @RespondentTarget BEGIN INSERT INTO #Respondents(Name) VALUES(CAST(NEWID() AS CHAR(36))); SET @RespondentId = SCOPE_IDENTITY(); SET @QuestionCount = 1; WHILE @QuestionCount &lt;= @QuestionTarget BEGIN INSERT INTO #Data(QuestionId, RespondentId, Answer) VALUES(@QuestionCount, @RespondentId, ROUND(((10 - 1 -1) * RAND() + 1), 0)); SET @QuestionCount = @QuestionCount + 1; END; SET @RespondentCount = @RespondentCount + 1; END; -- step 2 - index seed data ALTER TABLE #Data ADD CONSTRAINT [PK_Data] PRIMARY KEY CLUSTERED (QuestionId ASC, RespondentId ASC); CREATE INDEX DataRespondentQuestion ON #Data (RespondentId ASC, QuestionId ASC); -- step 3 - query data DECLARE @Columns NVARCHAR(MAX) ,@TemplateSQL NVARCHAR(MAX) ,@RunSQL NVARCHAR(MAX); SELECT @Columns = STUFF( ( SELECT DISTINCT '],[' + q.QuestionText FROM #Questions AS q ORDER BY '],[' + q.QuestionText FOR XML PATH('') ), 1, 2, '') + ']'; SET @TemplateSql = 'SELECT * FROM ( SELECT r.Name, q.QuestionText, d.Answer FROM #Respondents AS r INNER JOIN #Data AS d ON d.RespondentId = r.RespondentId INNER JOIN #Questions AS q ON q.QuestionId = d.QuestionId ) AS d PIVOT ( MAX(d.Answer) FOR d.QuestionText IN (xxCOLUMNSxx) ) AS p;'; SET @RunSql = REPLACE(@TemplateSql, 'xxCOLUMNSxx', @Columns) EXECUTE sys.sp_executesql @RunSql; -- step 4 - clean up DROP INDEX DataRespondentQuestion ON #Data; DROP TABLE #Data; DROP TABLE #Questions; DROP TABLE #Respondents; </code></pre>
    singulars
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload