Biopython (or just Python in general): most efficient way to parse species names from a large .fasta file using gi identifiers
<p>I have a .fasta file (essentially a .txt file) of about 145,000 entries, formatted as below:</p>
<pre><code>&gt;gi|393182|gb|AAA40101.1| cytokine [Mus musculus]
MDAKVVAVLALVLAALCISDGKPVSLSYRCPCRFFESHIARANVKHLKILNTPNCALQIVARLKNNNRQV
CIDPKLKWIQEYLEKALNKRLKM
&gt;gi|378792467|pdb|3UNH|Y Chain Y, Mouse 20s Immunoproteasome
TTTLAFKFQHGVIVAVDSRATAGSYISSLRMNKVIEINPYLLGTMSGCAADCQYWERLLAKECRLYYLRN
GERISVSAASKLLSNMMLQYRGMGLSMGSMICGWDKKGPGLYYVDDNGTRLSGQMFSTGSGNTYAYGVMD
SGYRQDLSPEEAYDLGRRAIAYATHRDNYSGGVVNMYHMKEDGWVKVESSDVSDLLYKYGEAAL
&gt;gi|378792462|pdb|3UNH|T Chain T, Mouse 20s Immunoproteasome
MSSIGTGYDLSASTFSPDGRVFQVEYAMKAVENSSTAIGIRCKDGVVFGVEKLVLSKLYEEGSNKRLFNV
DRHVGMAVAGLLADARSLADIAREEASNFRSNFGYNIPLKHLADRVAMYVHAYTLYSAVRPFGCSFMLGS
YSANDGAQLYMIDPSGVSYGYWGCAIGKARQAAKTEIEKLQMKEMTCRDVVKEVAKIIYIVHDEVKDKAF
ELELSWVGELTKGRHEIVPKDIREEAEKYAKESLKEEDESDDDNM
</code></pre>
<ol>
<li>I have a list of gi numbers (the first number listed after the first |).</li>
<li>The size of this list varies between 60 and 600 gi numbers for a given test.</li>
<li>I want to return a list with the respective species of those gi numbers.</li>
<li>The species name usually appears as in the first example, surrounded by square brackets ([Mus musculus]), but it is not always present.</li>
<li>Order is not particularly important.</li>
</ol>
<p>I have been using various Biopython parsing bits and pieces, but I think they fail because of the size of the search. I was hoping someone here would know of a more efficient way.</p>
<p>Thanks in advance!</p>
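The task described above could be sketched in plain Python as a single pass over the header lines, with the gi numbers held in a set so each lookup is constant time. This is only a minimal sketch under assumptions: the function name `species_for_gis` and the file name `proteins.fasta` are hypothetical, every header is assumed to start with `>gi|`, and the species is assumed to be the last bracketed group at the end of the header line. Entries without brackets are reported as `None`.

```python
import re
from typing import Dict, Iterable, Optional

# Assumption: the species is the last "[...]" group at the end of a header line.
SPECIES_RE = re.compile(r"\[([^\[\]]+)\]\s*$")


def species_for_gis(lines: Iterable[str],
                    wanted_gis: Iterable[object]) -> Dict[str, Optional[str]]:
    """Map each wanted gi number found in the headers to its species (or None)."""
    wanted = {str(gi) for gi in wanted_gis}  # set gives O(1) membership tests
    found: Dict[str, Optional[str]] = {}
    for line in lines:
        # Skip sequence lines; only headers of the form ">gi|<number>|..." matter.
        if not line.startswith(">gi|"):
            continue
        gi = line.split("|", 2)[1]
        if gi in wanted:
            match = SPECIES_RE.search(line.rstrip())
            found[gi] = match.group(1) if match else None
            # Stop reading early once every requested gi has been seen.
            if len(found) == len(wanted):
                break
    return found


if __name__ == "__main__":
    # Hypothetical file name; a file object is iterated lazily, line by line.
    with open("proteins.fasta") as handle:
        print(species_for_gis(handle, ["393182", "378792467"]))
```

Because the file is read line by line and sequence lines are skipped without further parsing, memory use stays small regardless of file size. Requested gi numbers that never appear in the file are simply absent from the returned dictionary.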