<p>I wrote an open source Python tool to simplify validation of such files; it is available from <a href="http://pypi.python.org/pypi/cutplace/" rel="nofollow noreferrer">http://pypi.python.org/pypi/cutplace/</a>.</p>
<p>The basic idea is that you describe the data format in a structured interface specification using OpenOffice.org, Excel or plain CSV. This takes a few minutes, and the result is legible enough to serve as documentation too. We use it to validate files with about 200,000 rows on a daily basis.</p>
<p>You can validate a CSV file using the command line:</p>
<pre><code>cutplace specification.csv data.csv
</code></pre>
<p>If invalid data rows are found, the exit code is 1. If you need more control, you can write a little Python script that imports the cutplace module and adds a listener for validation events.</p>
<p>As an example, here's a specification that would validate the sample data you provided, filling the gaps of your short description by making a few assumptions. (I'm writing the specification in CSV to inline it in this post. In practice I prefer OpenOffice.org's Calc and ODS because I can use more formatting, which makes the specification easier to read and maintain.)</p>
<pre><code>,"Interface: Show statistics"
,
,"Data format"
"D","Format","CSV"
"D","Item delimiter",";"
"D","Header","1"
"D","Encoding","ASCII"
,
,"Fields"
,"Name","Example","Empty","Length","Type","Rule"
"F","date","15-Mar-10",,,"RegEx","\d\d-[A-Z][a-z][a-z]-\d\d"
"F","id","231",,,"Integer","0:"
"F","shown","345",,,"Integer","0:"
,
,"Checks"
,"Description","Type","Rule"
"C","id per date must be unique","IsUnique","date, id"
</code></pre>
<p>Lines starting with "D" describe the basic data format. In this case it is a CSV file using ";" as delimiter, with 1 header line, in ASCII encoding.</p>
<p>Lines starting with "F" describe the various fields. For example,</p>
<pre><code>,"Name","Example","Empty","Length","Type","Rule"
"F","id","231",,,"Integer","0:"
</code></pre>
<p>defines a mandatory field "id" of type Integer with a value of 0 or greater. To allow the field to be empty, specify an "X" in the "Empty" column:</p>
<pre><code>,"Name","Example","Empty","Length","Type","Rule"
"F","id","231","X",,"Integer","0:"
</code></pre>
<p>Finally, there is an optional section for more advanced checks spanning the whole file, not only single rows. For example, if each date in your file must provide data for an id only once, you can state this using:</p>
<pre><code>,"Description","Type","Rule"
"C","id per date must be unique","IsUnique","date, id"
</code></pre>
<p>Any row that starts with an empty column can contain any text you like and will not be processed during validation. This is useful for headings, comments and so on.</p>
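<p>To make the meaning of the rules above concrete, here is a minimal plain-Python sketch of the same checks: the date pattern, the two non-negative integers, and the uniqueness of each (date, id) pair. It does not use cutplace and is not how cutplace is implemented. The function name <code>find_errors</code>, the error messages and the sample rows are made up for illustration, with the sample values taken from the "Example" column of the specification.</p>

```python
import csv
import io
import re

# Mirrors the "RegEx" rule given for the "date" field in the specification.
DATE_PATTERN = re.compile(r"\d\d-[A-Z][a-z][a-z]-\d\d")


def find_errors(csv_text):
    """Return a list of (line_number, message) for rows that break a rule.

    Hypothetical helper for illustration only; not part of cutplace.
    """
    errors = []
    seen_keys = set()  # (date, id) pairs seen so far, for the IsUnique check
    reader = csv.reader(io.StringIO(csv_text), delimiter=";")
    next(reader, None)  # skip the single header line ("Header" is 1)
    for line_number, row in enumerate(reader, start=2):
        if len(row) != 3:
            errors.append((line_number, "expected 3 fields"))
            continue
        date, id_, shown = row
        if not DATE_PATTERN.fullmatch(date):
            errors.append((line_number, "date does not match pattern"))
        for name, value in (("id", id_), ("shown", shown)):
            # Type "Integer" with rule "0:" means a whole number of 0 or greater.
            try:
                number = int(value)
            except ValueError:
                errors.append((line_number, f"{name} is not an integer"))
                continue
            if number < 0:
                errors.append((line_number, f"{name} must be 0 or greater"))
        # Check "IsUnique" on "date, id": each pair may appear only once.
        key = (date, id_)
        if key in seen_keys:
            errors.append((line_number, "id per date must be unique"))
        seen_keys.add(key)
    return errors


if __name__ == "__main__":
    sample = (
        "date;id;shown\n"
        "15-Mar-10;231;345\n"
        "15-Mar-10;231;12\n"
    )
    for line_number, message in find_errors(sample):
        print(f"line {line_number}: {message}")
```

<p>The second data row in the sample repeats the (date, id) pair of the first, so it is reported by the uniqueness check even though each of its fields is valid on its own.</p>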