<p>In general, one should sanitize first, "for your protection, and theirs." This includes stripping out any invalid characters (which depends on the character encoding, of course). If a field should only contain letters and spaces, strip out anything that isn't a letter or a space first.</p>
<p>With that done, you then validate the result: is the name already used (for unique fields), is it the right size, is it not blank?</p>
<p>The reason you give is precisely the right one: to maximize the user experience. Don't confuse the user if you can avoid it. Sanitizing also helps protect against dumb copy &amp; paste behavior, but you have to be careful: if I want my name recorded as "Ke$h@", I may or may not be OK with it being changed to "Keh".</p>
<p>Second, sanitizing first also prevents bugs.</p>
<p>What happens when you want to create usernames that don't allow special characters? Say I enter "Brian", your system rejects it because the name is already in use, and so I submit "Brian$". First you validate it, and it is not in use; then you strip special characters and you are left with "Brian". Uh oh: now you either have to validate AGAIN, or you'll get a confusing error saying account creation failed (if your database is set to require unique usernames, for instance), or worse, it will succeed and an existing user account gets overwritten or corrupted.</p>
<p>Another example is minimum field lengths: if you require a name to be at least 3 letters long and only accept letters, and I enter "no", you'd reject it; but if I enter "no@#$%", you might say it was valid (long enough), sanitize it, and now it isn't valid anymore.</p>
<p>The easy way to avoid this is to sanitize first; then you don't have to second-guess your validation.</p>
<p>However, Niet was right about not encoding data before storage. It is generally much easier to set up output into HTML to be encoded when appropriate than it is to remember to decode the data when you just want the plain text (for entry into text boxes, JSON strings, etc.). Most test cases you'll use won't include data with HTML entities, so it's easy to introduce silly bugs that aren't easily caught.</p>
<p>The big problem is that when such a bug is introduced, it can quickly lead to data corruption that is not easily fixed. Example: you have plain text, you output it to a text field incorrectly as HTML entities, the form gets submitted back, and you re-encode it; every time it gets opened and resubmitted, it gets encoded again. With a busy site or form, you could end up with thousands of entries encoded a different number of times, with no clear way to determine what was and was not intended to be HTML encoded.</p>
<p>Protecting against injection is good, but HTML encoding isn't designed to do that (and must not be relied upon for it).</p>
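<p>To make the ordering problem concrete, here is a minimal Python sketch of the "Brian$" and "no@#$%" examples. The function names, the letters-only rule, and the 3-character minimum are hypothetical choices for illustration, not part of any particular framework.</p>

```python
import re

# Hypothetical data and rules, assumed for illustration only.
EXISTING_USERNAMES = {"Brian"}  # names that are already taken
MIN_LENGTH = 3  # minimum length, as in the "no" example


def sanitize_username(raw: str) -> str:
    """Strip everything except ASCII letters (assumed allowed character set)."""
    return re.sub(r"[^A-Za-z]", "", raw)


def validate_username(name: str, existing: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the name is acceptable."""
    problems = []
    if not name:
        problems.append("blank")
    elif len(name) < MIN_LENGTH:
        problems.append("too short")
    if name in existing:
        problems.append("already in use")
    return problems


def validate_then_sanitize(raw: str, existing: set[str]) -> tuple[str, list[str]]:
    """Buggy order: the checks run on a different value than the one stored."""
    problems = validate_username(raw, existing)
    return sanitize_username(raw), problems


def sanitize_then_validate(raw: str, existing: set[str]) -> tuple[str, list[str]]:
    """Recommended order: the checks run on exactly the value that is stored."""
    clean = sanitize_username(raw)
    return clean, validate_username(clean, existing)


print(validate_then_sanitize("Brian$", EXISTING_USERNAMES))  # ('Brian', [])
print(sanitize_then_validate("Brian$", EXISTING_USERNAMES))  # ('Brian', ['already in use'])
print(sanitize_then_validate("no@#$%", set()))               # ('no', ['too short'])
```

<p>With the buggy order, "Brian$" passes every check and is then stored as "Brian", colliding with the existing account. With sanitize-then-validate, the same input is reported as already in use, and "no@#$%" is correctly rejected as too short.</p>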
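<p>The re-encoding spiral described in this answer can be simulated with Python's standard <code>html</code> module. This is a sketch under simplifying assumptions: <code>browser_submit</code> is a hypothetical stand-in for a browser decoding a field value once and posting it back, and the two <code>save_*</code> functions are hypothetical storage strategies.</p>

```python
import html
from typing import Callable


def render_field(stored: str) -> str:
    """Encode at output time, when the value is placed into HTML."""
    return html.escape(stored, quote=True)


def browser_submit(rendered_value: str) -> str:
    """Simulate the browser: it decodes the field value once and submits plain text."""
    return html.unescape(rendered_value)


def save_encoded(submitted: str) -> str:
    """Buggy strategy: HTML-encode before storage."""
    return html.escape(submitted, quote=True)


def save_plain(submitted: str) -> str:
    """Recommended strategy: store exactly what the user submitted."""
    return submitted


def edit_cycles(typed: str, save: Callable[[str], str], cycles: int) -> str:
    """Save what the user typed, then open and resubmit the form `cycles` times."""
    stored = save(typed)
    for _ in range(cycles):
        stored = save(browser_submit(render_field(stored)))
    return stored


print(edit_cycles("Tom & Jerry", save_plain, 5))    # Tom & Jerry
print(edit_cycles("Tom & Jerry", save_encoded, 2))  # Tom &amp;amp;amp; Jerry
```

<p>Storing plain text survives any number of edit cycles unchanged, while encoding before storage wraps the ampersand in one more layer of encoding on every resubmission.</p>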
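<p>To illustrate why HTML encoding is no defense against SQL injection, the sketch below uses Python's built-in <code>sqlite3</code> with a hypothetical in-memory <code>users</code> table. An input such as <code>0 OR 1=1</code> contains nothing that <code>html.escape</code> changes, so the escaped, string-concatenated query still returns every row. Parameterized queries are a standard defense: the input is passed as a bound value, so the driver never parses it as SQL.</p>

```python
import html
import sqlite3

# Hypothetical in-memory table, assumed for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)", [(1, "Brian"), (2, "Keh")])


def find_user_html_escaped(user_input: str) -> list:
    """Unsafe: HTML-escaping leaves SQL syntax such as OR 1=1 untouched."""
    query = "SELECT name FROM users WHERE id = " + html.escape(user_input) + " ORDER BY id"
    return conn.execute(query).fetchall()


def find_user_parameterized(user_input: str) -> list:
    """Safer: the value is bound separately from the SQL text."""
    return conn.execute(
        "SELECT name FROM users WHERE id = ? ORDER BY id", (user_input,)
    ).fetchall()


print(find_user_html_escaped("0 OR 1=1"))   # [('Brian',), ('Keh',)]
print(find_user_parameterized("0 OR 1=1"))  # []
```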