Delete characters at positions within a string in R?
<p>I am looking for a way to delete the characters at certain positions within a string in R. For example, given the string <code>"1,2,1,1,2,1,1,1,1,2,1,1"</code>, I want to delete the characters at the 3rd, 4th, 7th and 8th positions. The operation would produce the string <code>"1,1,2,1,1,1,1,2,1,1"</code>.</p>
<p>Unfortunately, breaking the string into a list using <code>strsplit</code> is not an option, because the strings I am working with are over 1 million characters long. With about 2,500 strings to process, that adds up to quite some time.</p>
<p>Alternatively, finding a way to replace the characters with an empty string <code>""</code> would achieve the same purpose, I think. Looking into this line of thought, I came across this Stack Overflow post:</p>
<p><a href="https://stackoverflow.com/questions/6819573/r-how-can-i-replace-lets-say-the-5th-element-within-a-string">R: How can I replace let&#39;s say the 5th element within a string?</a></p>
<p>Unfortunately, the solution suggested there is hard to generalize efficiently, and the following takes about 60 seconds per input string for a list of 2000 positions to remove:</p>
<pre><code>subchar2 = function(inputstring, pos){
  string = ""
  memory = 0
  for(num in pos){
    string = paste(string, substr(inputstring, (memory+1), (num-1)), sep = "")
    memory = num
  }
  string = paste(string, substr(inputstring,(memory+1), nchar(inputstring)),sep = "")
  return(string)
}
</code></pre>
<p>Looking into the problem, I found a snippet of code that seems to replace the characters at certain positions with <code>"-"</code>:</p>
<pre><code>subchar &lt;- function(string, pos) {
  for(i in pos) {
    string &lt;- gsub(paste("^(.{", i-1, "}).", sep=""), "\\1-", string)
  }
  return(string)
}
</code></pre>
<p>I don't quite understand regular expressions (yet), but I have a strong suspicion that something along these lines will be much faster than the first code solution. Unfortunately, this <code>subchar</code> function seems to break when the values in <code>pos</code> get high:</p>
<pre><code>&gt; test = subchar(data[1], 257)
Error in gsub(paste("^(.{", i - 1, "}).", sep = ""), "\\1-", string) :
  invalid regular expression '^(.{256}).', reason 'Invalid contents of {}'
</code></pre>
<p>I was also considering reading the string data into a table using SQL, but I was hoping that there would be an elegant string solution. The SQL implementation in R to do this seems rather complicated.</p>
<p>Any ideas? Thanks!</p>
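One possible direction, sketched here as an illustration rather than as the accepted answer: convert the string to a vector of code points once, drop the unwanted positions with negative indexing, and convert back. This avoids both the repeated `paste` calls and the regex repetition limit. The function name `delete_positions` is hypothetical, and the sketch assumes each input is a single valid UTF-8 (or ASCII) string and that positions are 1-based character positions.

```r
# Hypothetical helper: remove the characters at the given 1-based positions.
# Assumes `x` is a single, valid UTF-8 (or ASCII) string.
delete_positions <- function(x, pos) {
  codes <- utf8ToInt(x)                         # one integer code point per character
  pos <- pos[pos >= 1 & pos <= length(codes)]   # ignore out-of-range positions
  if (length(pos) == 0) {
    return(x)                                   # codes[-integer(0)] would drop everything
  }
  intToUtf8(codes[-pos])                        # negative indexing drops all positions at once
}

delete_positions("1,2,1,1,2,1,1,1,1,2,1,1", c(3, 4, 7, 8))
```

Because the indexing is vectorized, the cost should grow roughly with the length of the string rather than with the number of positions, though actual timings on million-character inputs would need to be measured.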