StackOverflow2013

Perl: protein sequence entries, ID and SQ lines and "//"
<p>So the new task is to download the file from the website (<a href="http://ceres.primus-fatum.de/~fate/scriptsprachen/uniprotDB_part.txt" rel="nofollow">http://ceres.primus-fatum.de/~fate/scriptsprachen/uniprotDB_part.txt</a>). Then I have to write a subroutine that reads it line by line and searches for the ID and SQ lines. Everything should be saved in a new text file in which each entry is laid out as follows: 1. the ID line comes first, 2. the SQ line comes last, 3. everything else comes between ID and SQ, and the entry ends with a "//" line. Here is one example, but the file contains 1000 entries.</p>
<p>Example of the expected output:</p>
<pre><code>ID   001R_FRG3G Reviewed; 256 AA.   -> ID line first
AC   Q6GZX4;
DT   28-JUN-2011, integrated into UniProtKB/Swiss-Prot.
DT   19-JUL-2004, sequence version 1.
DT   18-APR-2012, entry version 24.
DE   RecName: Full=Putative transcription factor 001R;
GN   ORFNames=FV3-001R;
OS   Frog virus 3 (isolate Goorha) (FV-3).
OC   Viruses; dsDNA viruses, no RNA stage; Iridoviridae; Ranavirus.
OX   NCBI_TaxID=654924;
OH   NCBI_TaxID=8295; Ambystoma (mole salamanders).
OH   NCBI_TaxID=30343; Hyla versicolor (chameleon treefrog).
OH   NCBI_TaxID=8316; Notophthalmus viridescens (Eastern newt) (Triturus viridescens).
OH   NCBI_TaxID=8404; Rana pipiens (Northern leopard frog).
OH   NCBI_TaxID=45438; Rana sylvatica (Wood frog).
RN   [1]
RP   NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
RX   PubMed=15165820; DOI=10.1016/j.virol.2004.02.019;
RA   Tan W.G., Barkman T.J., Gregory Chinchar V., Essani K.;
RT   "Comparative genomic analyses of frog virus 3, type species of the
RT   genus Ranavirus (family Iridoviridae).";
RL   Virology 323:70-84(2004).
CC   -!- FUNCTION: Transcription activation (Potential).
CC   -----------------------------------------------------------------------
CC   Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms
CC   Distributed under the Creative Commons Attribution-NoDerivs License
CC   -----------------------------------------------------------------------
DR   EMBL; AY548484; AAT09660.1; -; Genomic_DNA.
DR   RefSeq; YP_031579.1; NC_005946.1.
DR   ProteinModelPortal; Q6GZX4; -.
DR   GeneID; 2947773; -.
DR   ProtClustDB; CLSP2511514; -.
DR   GO; GO:0006355; P:regulation of transcription, DNA-dependent; IEA:UniProtKB-KW.
DR   GO; GO:0046782; P:regulation of viral transcription; IEA:InterPro.
DR   GO; GO:0006351; P:transcription, DNA-dependent; IEA:UniProtKB-KW.
DR   InterPro; IPR007031; Poxvirus_VLTF3.
DR   Pfam; PF04947; Pox_VLTF3; 1.
PE   4: Predicted;
KW   Activator; Complete proteome; Reference proteome; Transcription;
KW   Transcription regulation.
FT   CHAIN 1 256 Putative transcription factor 001R.
FT   /FTId=PRO_0000410512.
FT   COMPBIAS 14 17 Poly-Arg.
SQ   SEQUENCE 256 AA; 29735 MW; B4840739BF7D4121 CRC64;   -> SQ line last, then "//"
     MAFSAEDVLK EYDRRRRMEA LLLSLYYPND RKLLDYKEWS PPRVQVECPK APVEWNNPPS
     EKGLIVGHFS GIKYKGEKAQ ASEVDVNKMC CWVSKFKDAM RRYQGIQTCK IPGKVLSDLD
     AKIKAYNLTV EGVEGFVRYS RVTKQHVAAF LKELRHSKQY ENVNLIHYIL TDKRVDIQHL
     EKDLVKDFKA LVESAHRMRQ GHMINVKYIL YQLLKKHGHG PDGPDILTVK TGSKGVLYDD
     SFRKIYTDLG WKFTPL
//
</code></pre>
<p>I have tried this:</p>
<pre><code>use strict;
use warnings;

sub main {
    my @file_data   = ();
    my $motif       = '';
    my $protein_seq = '';
    my $h = '[VLIM]';
    my $s = '[AG]';
    my $x = '[ARNDCEQGHILKMFPSTWYV]';
    my $regexp = "($I){1}D";    # motif to be searched is ID
    my $regexp = "($S){1}Q";    # motif to be searched is SQ
    my @locations = ();

    @file_data   = get_file_data("seq.txt");
    $protein_seq = extract_sequence(@file_data);

    foreach my $line (@file_data) {
        if ($motif =~ /$regexp/) {
            print "found motif \n\n";
        }
        else {
            print "not found \n\n";
        }
    }
</code></pre>
<p>Recording the location/position of the motif for output:</p>
<pre><code>    @locations =
        match_position($regexp, $seq);
    if (@locations) {
        print "Searching for motifs $regexp \n";
        print "Catalytic site is at location:\n";
    }
    else {
        print "motif not found \n\n";
    }
    exit;

    sub get_file_data {
        #body...
        my ($filename) = @_;
        my $sequence = '';
        foreach my $line (@file_data) {
            if ($line =~ /^\s*$/) {
                next;
            }
            elsif ($line =~ /^\s*#/) {
                next;
            }
            elsif ($line =~ /^>/) {
                next;
            }
            else {
                $sequence .= $line;
            }
        }
        $sequence =~ s/\s//g;
        return $sequence;
    }

    sub(match_positions) {
        my ($regexp, $sequence) = @_;
        use strict;
        my @position = ();
        while ($sequence =~ /$regexp/ig) {
            push(@position, $-[0]);
        }
        return @position;
    }
}

main();
</code></pre>
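
A minimal Perl sketch of the task described above. It assumes the downloaded file uses the standard UniProt flat-file layout: a two-letter line code at the start of each line, sequence lines indented with spaces, and every entry terminated by "//". The subroutine names `download_file`, `reorder_record`, `process_fh` and `run` are hypothetical. The download uses the core module HTTP::Tiny, and the URL from the question may no longer be reachable.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical helper: save the remote file locally. HTTP::Tiny ships with
# Perl 5.14+; mirror() only re-downloads when the remote copy has changed.
sub download_file {
    my ($url, $file) = @_;
    my $res = HTTP::Tiny->new->mirror($url, $file);
    die "Download of $url failed: $res->{status} $res->{reason}\n"
        unless $res->{success};
    return $file;
}

# Reorder the lines of one entry (without its "//"): ID line first,
# every other annotation line next, then the SQ header and the indented
# sequence lines, and finally the "//" terminator.
sub reorder_record {
    my @lines = @_;
    my (@id, @middle, @sq, @seq);
    for my $line (@lines) {
        if    ($line =~ /^ID\s/) { push @id,     $line }
        elsif ($line =~ /^SQ\s/) { push @sq,     $line }
        elsif ($line =~ /^\s/)   { push @seq,    $line }   # sequence data
        else                     { push @middle, $line }
    }
    return (@id, @middle, @sq, @seq, '//');
}

# Read entries line by line from $in_fh, split them at "//", and write
# each reordered entry to $out_fh. Returns the number of entries written.
sub process_fh {
    my ($in_fh, $out_fh) = @_;
    my @record;
    my $count = 0;
    while (my $line = <$in_fh>) {
        $line =~ s/\s+\z//;          # drop newline (LF or CRLF) and trailing blanks
        next unless $line =~ /\S/;   # skip empty lines
        if ($line eq '//') {
            print {$out_fh} "$_\n" for reorder_record(@record);
            @record = ();
            $count++;
        }
        else {
            push @record, $line;
        }
    }
    if (@record) {                   # last entry without a closing "//"
        print {$out_fh} "$_\n" for reorder_record(@record);
        $count++;
    }
    return $count;
}

# Hypothetical driver tying the steps together.
sub run {
    my ($url, $in_file, $out_file) = @_;
    download_file($url, $in_file);
    open my $in_fh,  '<', $in_file  or die "Cannot open $in_file: $!\n";
    open my $out_fh, '>', $out_file or die "Cannot write $out_file: $!\n";
    my $n = process_fh($in_fh, $out_fh);
    close $out_fh or die "Cannot close $out_file: $!\n";
    return $n;
}
```

Calling `run('http://ceres.primus-fatum.de/~fate/scriptsprachen/uniprotDB_part.txt', 'uniprotDB_part.txt', 'output.txt')` would download the file and write the reordered entries to `output.txt`. Both local file names are placeholders. The loop holds only one entry in memory at a time, so the number of entries in the file does not matter.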

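The question's `match_positions` idea, collecting `$-[0]` inside a `while ($sequence =~ /$regexp/ig)` loop, does work once the subroutine is declared and called under one name with the normal `sub name { ... }` syntax. (`sub(match_positions)` does not define a named subroutine, and the call site says `match_position`.) The sketch below is minimal and assumes the lines of one entry are already in an array. `sequence_from_record` is a hypothetical helper. The choice to report 1-based positions is an assumption made to match UniProt feature coordinates.

```perl
use strict;
use warnings;

# Hypothetical helper: join the lines after the SQ header of one entry
# and remove all whitespace, giving the bare residue string.
sub sequence_from_record {
    my @lines = @_;
    my $seq    = '';
    my $in_seq = 0;
    for my $line (@lines) {
        if ($line =~ /^SQ\s/) { $in_seq = 1; next; }
        last if $line =~ m{^//};          # end of entry
        $seq .= $line if $in_seq;
    }
    $seq =~ s/\s+//g;
    return $seq;
}

# Start positions of every non-overlapping match of $regexp.
# $-[0] is the 0-based offset where the last successful match began;
# adding 1 gives 1-based positions.
sub match_positions {
    my ($regexp, $sequence) = @_;
    my @positions;
    while ($sequence =~ /$regexp/gi) {
        push @positions, $-[0] + 1;
    }
    return @positions;
}
```

With `/g` the matches do not overlap, so `RR` inside `RRRR` is reported twice rather than three times. A zero-width lookahead such as `/(?=RR)/g` would report the overlapping hits as well.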