How to quickly find and replace many items on a list without replacing previously replaced items in BASH?
<p>I want to perform many find-and-replace operations on some text. I have a UTF-8 CSV file containing what to find (in the first column) and what to replace it with (in the second column), arranged from longest to shortest.</p>
<p>E.g.:</p>
<pre><code>orange,fruit2
carrot,vegetable1
apple,fruit3
pear,fruit4
ink,item1
table,item2
</code></pre>
<p>Original file:</p>
<pre><code>"I like to eat apples and carrots"
</code></pre>
<p>Resulting output file:</p>
<pre><code>"I like to eat fruit3s and vegetable1s."
</code></pre>
<p>However, I want to ensure that once a piece of text has been replaced, later replacements do not touch it. In other words, I don't want the output to look like this (it matched "table" from within vegetable1):</p>
<pre><code>"I like to eat fruit3s and vegeitem21s."
</code></pre>
<p>Currently, I am using this method, which is quite slow because I have to do the whole find and replace twice:</p>
<p>(1) Convert the CSV to three files, e.g.:</p>
<pre><code>a.csv    b.csv   c.csv
orange   0001    fruit2
carrot   0002    vegetable1
apple    0003    fruit3
pear     0004    fruit4
ink      0005    item1
table    0006    item2
</code></pre>
<p>(2) Then, replace all items from <code>a.csv</code> in <code>file.txt</code> with the matching column in <code>b.csv</code>, using <code>ZZZ</code> around the words to make sure there is no mistake later in matching the numbers:</p>
<pre><code>a=1
b=`wc -l &lt; ./a.csv`
while [ $a -le $b ]
do
    for i in `sed -n "$a"p ./b.csv`; do
        for j in `sed -n "$a"p ./a.csv`; do
            sed -i "s/$i/ZZZ$j\ZZZ/g" ./file.txt
            echo "Instances of '"$i"' replaced with '"ZZZ$j\ZZZ"' ("$a"/"$b")."
            a=`expr $a + 1`
        done
    done
done
</code></pre>
<p>(3) Then I run this same script again, but to replace <code>ZZZ0001ZZZ</code> with <code>fruit2</code> from <code>c.csv</code>.</p>
<p>Running the first replacement takes about 2 hours, and because I must run this code twice to avoid editing the already replaced items, the whole job takes twice as long. Is there a more efficient way to run a find and replace that does not perform replacements on text already replaced?</p>
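One possible direction, offered only as a minimal sketch and not as a tested solution for the real data set: read the whole find/replace list into memory once, then scan each line of the text from left to right a single time, so that replaced output is never rescanned. The function name `replace_once` and the file names in the usage note are hypothetical, and the sketch assumes plain `find,replace` rows with no quoting, no embedded commas, and Unix line endings.

```shell
#!/usr/bin/env bash
# Hypothetical helper: single-pass replacement driven by a two-column CSV.
# Assumption: each CSV row is a plain "find,replace" pair (no quotes or
# embedded commas) and the CSV is not empty.
replace_once() {
    local map_file=$1 text_file=$2
    awk -F',' '
        # First file: load the find -> replace pairs, remember the longest key.
        NR == FNR {
            if ($1 == "") next
            repl[$1] = $2
            if (length($1) > maxlen) maxlen = length($1)
            next
        }
        # Second file: walk each line left to right exactly once.
        {
            line = $0; out = ""; i = 1; n = length(line)
            while (i <= n) {
                matched = 0
                # Longest candidate first, so longer keys win over shorter ones.
                len = n - i + 1
                if (len > maxlen) len = maxlen
                while (len >= 1) {
                    key = substr(line, i, len)
                    if (key in repl) {
                        out = out repl[key]   # emit the replacement
                        i += len              # skip past the matched text
                        matched = 1
                        break
                    }
                    len--
                }
                if (!matched) {
                    out = out substr(line, i, 1)  # copy one unmatched character
                    i++
                }
            }
            print out
        }
    ' "$map_file" "$text_file"
}
```

It could be called as `replace_once list.csv file.txt > out.txt`. Because the replacement text is appended to the output and never examined again, "table" cannot match inside an already emitted "vegetable1", and no `ZZZ` placeholder pass is needed. The cost per line is roughly the line length times the longest key length, with each candidate checked by a hash lookup instead of a separate `sed -i` run over the file.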