<p>Something like this might work for your first part. I'm unable to download the file right now but when I can, I will try and respond to the second part as well.</p>

<pre><code>library(data.table)
library(stringr)

# Slightly modified dataset
dataset &lt;- data.table(
  Sequence = c(
    'AAAAGAAAVANQGKK'
    ,'AAAAGAAAVANQGKK'
    ,'AAIKFIKFINPKINDGE'
    ,'AAIKFIKFINPKINDGE'
    ,'AAIKFIKFINPKINDGE'
    ,'AAIKFIKFINPKINDGE'
    ,'AAIYKLLKSHFRNE'
    ,'AAKKFEE'
  ),
  modifications = c(
    '[14] Acetyl (K)|[15] Acetyl (K)'
    ,'[14] Acetyl (K)|[15] Acetyl (K)'
    ,'[4] Acetyl (K)|[7] Something (K)|[12] Acetyl (K)'
    ,'[4] Acetyl (K)|[7] Acetyl (K)|[12] Acetyl (K)'
    ,'[7] Acetyl (K)|[12] Acetyl (K)'
    ,'[4] Acetyl (K)|[7] Acetyl (K)'
    ,'[5] Biotin (K)|[8] Acetyl (K)'
    ,'[3] Acetyl (K)'
  )
)

# get the 1st, 2nd, 3rd modifications in separate columns
dataset &lt;- data.table(cbind(
  dataset,
  str_split_fixed(dataset[,modifications], pattern = "\\(K\\)",3)
))

dataset[,':='(
  V1 = as.character(V1),
  V2 = as.character(V2),
  V3 = as.character(V3)
)]

# Count of modifications
dataset[, NoOfKs := 3]
dataset[V3 == "", NoOfKs := 2]
dataset[V2 == "", NoOfKs := 1]
dataset[V1 == "", NoOfKs := 0]

# Retaining Acetyl/Biotin or no modification only
dataset[, AB01 := TRUE]
dataset[, AB02 := TRUE]
dataset[, AB03 := TRUE]
dataset[V1 != "", AB01 := grepl(V1, pattern = "Acetyl|Biotin")]
dataset[V2 != "", AB02 := grepl(V2, pattern = "Acetyl|Biotin")]
dataset[V3 != "", AB03 := grepl(V3, pattern = "Acetyl|Biotin")]
dataset &lt;- dataset[AB01 &amp; AB02 &amp; AB03]

# Marking each modification as acetyl/biotin/none
dataset[V1 != " " &amp; grepl(V1, pattern = "Acetyl"), AB1 := "Acetyl"]
dataset[V1 != " " &amp; grepl(V1, pattern = "Biotin"), AB1 := "Biotin"]
dataset[V2 != " " &amp; grepl(V2, pattern = "Acetyl"), AB2 := "Acetyl"]
dataset[V2 != " " &amp; grepl(V2, pattern = "Biotin"), AB2 := "Biotin"]
dataset[V3 != " " &amp; grepl(V3, pattern = "Acetyl"), AB3 := "Acetyl"]
dataset[V3 != " " &amp; grepl(V3, pattern = "Biotin"), AB3 := "Biotin"]

dataset[
  ,
  list(
    Sequence = Sequence,
    modifications = modifications,
    GroupID = .GRP
  ),
  by = c('NoOfKs','AB1','AB2','AB3')
]
</code></pre>

<p>Output</p>

<pre><code>   NoOfKs    AB1    AB2    AB3          Sequence                                 modifications GroupID
1:      2 Acetyl Acetyl     NA   AAAAGAAAVANQGKK               [14] Acetyl (K)|[15] Acetyl (K)       1
2:      2 Acetyl Acetyl     NA   AAAAGAAAVANQGKK               [14] Acetyl (K)|[15] Acetyl (K)       1
3:      2 Acetyl Acetyl     NA AAIKFIKFINPKINDGE                [7] Acetyl (K)|[12] Acetyl (K)       1
4:      2 Acetyl Acetyl     NA AAIKFIKFINPKINDGE                 [4] Acetyl (K)|[7] Acetyl (K)       1
5:      3 Acetyl Acetyl Acetyl AAIKFIKFINPKINDGE [4] Acetyl (K)|[7] Acetyl (K)|[12] Acetyl (K)       2
6:      2 Biotin Acetyl     NA    AAIYKLLKSHFRNE                 [5] Biotin (K)|[8] Acetyl (K)       3
7:      1 Acetyl     NA     NA           AAKKFEE                                [3] Acetyl (K)       4
</code></pre>