Note that there are some explanatory texts on larger screens.

plurals
  1. POopinions and advice on database structure
    primarykey
    data
    text
    <p>I'm building this tool for classifying data. Basically I will be regularly receiving rows of data in a flat-file that look like this:</p> <pre><code>a:b:c:d:e a:b:c:d:e a:b:c:d:e a:b:c:d:e </code></pre> <p>And I have a list of categories to break these rows up into, for example:</p> <pre><code>Original Cat1 Cat2 Cat3 Cat4 Cat5 --------------------------------------- a:b:c:d:e a b c d e </code></pre> <p>As of right this second, there category names are known, as well as number of categories to break the data down by. But this might change over time (for instance, categories added/removed...total number of categories changed). </p> <p>Okay so I'm not really looking for help on how to parse the rows or get data into a db or anything...I know how to do all that, and have the core script mostly written already, to handle parsing rows of values and separating into variable amount of categories. </p> <p>Mostly I'm looking for advice on how to structure my database to store this stuff. So I've been thinking about it, and this is what I came up with:</p> <pre><code>Table: Generated generated_id int - unique id for each row generated generated_timestamp datetime - timestamp of when row was generated last_updated datetime - timestamp of when row last updated generated_method varchar(6) - method in which row was generated (manual or auto) original_string varchar (255) - the original string Table: Categories category_id int - unique id for category category_name varchar(20) - name of category Table: Category_Values category_map_id int - unique id for each value (not sure if I actually need this) category_id int - id value to link to table Categories generated_id int - id value to link to table Generated category_value varchar (255) - value for the category </code></pre> <p>Basically the idea is when I parse a row, I will insert a new entry into table <code>Generated</code>, as well as X entries in table <code>Category_Values</code>, where X is however many categories there currently are. And the category names are stored in another table <code>Categories</code>.</p> <p>What my script will immediately do is process rows of raw values and output the generated category values to a new file to be sent somewhere. But then I have this db I'm making to store the data generated so that I can make another script, where I can search for and list previously generated values, or update previously generated entries with new values or whatever.</p> <p>Does this look like an okay database structure? Anything obvious I'm missing or potentially gimping myself on? For example, with this structure...well...I'm not a sql expert, but I think I should be able to do like </p> <pre><code>select * from Generated where original_string = '$string' // id is put into $id </code></pre> <p>and then </p> <pre><code>select * from Category_Values where generated_id = '$id' </code></pre> <p>...and then I'll have my data to work with for search results or form to alter data...well I'm fairly certain I can even combine this into one query with a join or something but I'm not that great with sql so I don't know how to actually do that..but point is, I know I <em>can</em> do what I need from this db structure..but am I making this harder than it needs to be? Making some obvious noob mistake?</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload