JMeter CSV Data Set Config is corrupting Japanese strings stored as proper UTF-8; I get question marks instead
    <p>I read in search terms from a simple text file to send to a search engine. It works fine in English, but gives me ???? for any Japanese text. Text with mixed English and Japanese does show the English text, so I know it's reading the file.</p>
    <p>What I'm seeing:</p>
    <ul> <li>Input text: Snow Leopard をインストールする場合、新しい</li> <li>Turns into: Snow Leopard ???????????????</li> </ul>
    <p>This is in the POST field of an HTTP request. If I set JMeter to encode the data, it just puts in the percent sequence for question marks.</p>
    <p>About the data:</p>
    <ul> <li>The CSV file is very simple in structure.</li> <li>There's only one field / one column, which I name TERM, and later use as ${TERM}</li> <li>I don't really need full CSV because it's only one string per line.</li> <li>There are no commas or quotes.</li> <li>It's UTF-8, and when I run the Unix "file" command on the file, it says UTF-8 text.</li> <li>I've also verified that it's UTF-8 in command line and graphical mode on two machines.</li> </ul>
    <p>Interesting note: if there are 15 Japanese characters then I get 15 question marks, so at some point the text is being seen as full characters and not just bytes.</p>
    <p>JMeter CSV Data Set Config:</p>
    <ul> <li>Filename: japanese-searches.csv</li> <li>File encoding: UTF-8 (also tried without)</li> <li>Variable names: TERM</li> <li>Delimiter: ,</li> <li>Allow Quoted Data: False (I also tried True; the result was different, but still wrong)</li> <li>Recycle at EOF: True</li> <li>Stop at EOF: False</li> <li>Sharing mode: All threads</li> </ul>
    <p>A few things I've tried:</p>
    <ul> <li>Tried Allow Quoted Data. The output changed to other strange characters.</li> <li>Added -Dfile.encoding=UTF-8</li> <li>Tried encoding at the POST stage, but it just turned into a bunch of %nn sequences for question marks.</li> </ul>
    <p>I'm also not sure how to debug the value just after each line of the CSV is read in. I <em>think</em> it's corrupted right away, but I'm not sure.</p>
    <p>If it's only mangled when I reference it, then instead of ${TERM} perhaps there's some other "to bytes" function call. I'll start checking into that. I haven't done anything with the JMeter functions yet.</p>
    <p>Edited Dec 24:</p>
    <p>Tweaks:</p>
    <ul> <li>Changed formatting and added bullet points for more clarity.</li> <li>Clarified that the file is UTF-8, and that I have verified this.</li> </ul>
    <p>A new theory:</p>
    <ul> <li>Is it possible that the Japanese characters are making it through, and the issue is that EVERY SINGLE place that shows them maps them to a "?" at DISPLAY TIME only? So even though I've checked in a bunch of places, they all have a display issue just in the UI?</li> <li>Is there a way in JMeter to see the numeric value of a character or string? Or, better, to tell JMeter to display the list of Unicode code points?</li> <li>I'll look at my last log files... although I suppose even the server logs could have mis-mapped the characters.</li> <li>Also, perhaps the variable expansion inside the text field that I POST, where I reference ${TERM}, is where the text gets mapped to question marks, so the corruption happens at that later point. If that happened, AND it was mis-displayed in the UI, then it might lead to a false conclusion.</li> <li>What I'd really like to do is pause JMeter after the first CSV record, just after that line is loaded, and look at it with a "data scope" or byte editor or something. Not sure if this is possible.</li> </ul>
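    <p>To make the "numeric value of a string" idea concrete, here is a minimal plain-Java sketch that runs outside JMeter. The class name <code>CharsetProbe</code> and its helper methods are hypothetical, and ISO-8859-1 is only an assumed example of a charset that cannot represent Japanese; nothing here confirms which charset JMeter actually uses at any stage. It shows two things: how to list the Unicode code points of a string, and that encoding correctly decoded text into such a charset produces exactly one "?" per character, which would be consistent with the 15-characters-to-15-question-marks observation.</p>

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetProbe {

    // Returns the code points of s as text, e.g. "U+0041 U+3092".
    public static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    // Encodes s to bytes in the given charset and decodes it back.
    // Characters the charset cannot represent are replaced during
    // encoding (with '?' for ISO-8859-1), one replacement per character.
    public static String roundTrip(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        // "Snow " followed by the first three Japanese characters of the
        // sample term, written as escapes so the source-file encoding
        // does not matter.
        String term = "Snow \u3092\u30A4\u30F3";

        // Code points are plain ASCII output, so a console with the wrong
        // encoding cannot hide what the string really contains.
        System.out.println(codePoints(term));

        // UTF-8 can represent every character, so the text survives.
        System.out.println(codePoints(roundTrip(term, StandardCharsets.UTF_8)));

        // Assumed lossy charset: each Japanese character becomes one '?'.
        System.out.println(roundTrip(term, StandardCharsets.ISO_8859_1));
    }
}
```

    <p>Because the code points are printed as ASCII, a check like this separates a display-only problem from real corruption: if a value still lists code points such as U+3092, the characters are intact and only the display is wrong, whereas a run of U+003F means they were already replaced by literal question marks.</p>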