Optimizing SDF file size
<p>I recently started learning Linq and SQL. As a small project I'm writing a dictionary application for Windows Phone. The project is split into two applications. The first (which currently runs on my PC) generates an SDF file. The second runs on my Windows Phone and searches the database. However, I would like to reduce the size of the data. The raw dictionary entries are stored in a TXT file of around 39 MB. The file has the following layout:</p>
<pre><code>germanWord \tab englishWord \tab group
germanWord \tab englishWord \tab group
</code></pre>
<p>The file is parsed into an SDF database with the following tables.</p>
<p>Table <em>Word</em> with columns <em>_version (rowversion), Id (int IDENTITY), Word (nvarchar(250)), Language (int)</em><br> This table contains every single word in the file. The language is a flag set by my code, which I added in case I want to support more languages later. Each word-language pair is unique.</p>
<p>Table <em>Group</em> with columns <em>_version (rowversion), GroupId (int IDENTITY), Caption (nvarchar(250))</em><br> This table contains the different groups. Every group is present exactly once.</p>
<p>Table <em>Entry</em> with columns <em>_version (rowversion), EntryId (int IDENTITY), WordOneId (int), WordTwoId (int), GroupId (int)</em><br> This table links translations together. <em>WordOneId</em> and <em>WordTwoId</em> are foreign keys into the <em>Word</em> table; each contains the id of a row there. <em>GroupId</em> identifies the group the words belong to.</p>
<p>I chose this layout to reduce the data footprint. The raw text file contains some German (or English) words multiple times, and there are around 60 groups that repeat themselves. By deduplicating programmatically I reduce the word count from around 1,800,000 to around 1,100,000. There are around 50 rows in the <em>Group</em> table. Despite the reduced number of words, the SDF file is around 80 MB. That's more than twice the size of the raw data.
Another issue is that, in order to speed up searching for translations, I plan to index the <em>Word</em> column of the <em>Word</em> table. Adding this index makes the file grow to over 130 MB.</p>
<p>How can it be that the SDF with ~60% of the original data is twice as large?</p>
<p>Is there a way to optimize the file size?</p>
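<p>For reference, the parsing and deduplication step described above can be sketched roughly as follows. This is only an illustration in Python, not the actual Linq code: the function name <em>build_tables</em>, the language flag values, the in-memory dictionaries standing in for the three tables, and the sample lines are all assumptions made for the example.</p>

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical values for the Language flag; the real values are not given.
GERMAN, ENGLISH = 0, 1


def build_tables(lines: Iterable[str]):
    """Parse tab-separated lines into deduplicated Word, Group and Entry data.

    The dictionaries stand in for the SDF tables:
      words   maps (word, language) -> Id       (Word table)
      groups  maps caption -> GroupId           (Group table)
      entries holds (WordOneId, WordTwoId, GroupId) tuples (Entry table)
    """
    words: Dict[Tuple[str, int], int] = {}
    groups: Dict[str, int] = {}
    entries: List[Tuple[int, int, int]] = []

    def word_id(text: str, language: int) -> int:
        # A word-language pair is unique, so reuse the id if it already exists.
        key = (text, language)
        if key not in words:
            words[key] = len(words) + 1  # mimic an IDENTITY column starting at 1
        return words[key]

    for line in lines:
        parts = [p.strip() for p in line.rstrip("\n").split("\t")]
        if len(parts) != 3:
            continue  # skip malformed lines
        german, english, group = parts
        if group not in groups:
            groups[group] = len(groups) + 1
        entries.append(
            (word_id(german, GERMAN), word_id(english, ENGLISH), groups[group])
        )
    return words, groups, entries


# Made-up sample input in the layout shown above.
sample = [
    "Haus\thouse\tnoun\n",
    "Haus\thome\tnoun\n",
    "gehen\tgo\tverb\n",
]
words, groups, entries = build_tables(sample)
```

<p>In this sample, six word occurrences collapse into five <em>Word</em> rows because "Haus" appears twice, while every input line still produces one <em>Entry</em> row.</p>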