StackOverflow2013

plurals

POPerl: Matching four different files and obtaining particular Information in output file
primarykey
Id
10906508
data
AcceptedAnswerId
0
AnswerCount
1
ClosedDate
CommentCount
4
CommunityOwnedDate
CreationDate
2012-06-05T23:45:49.213
FavoriteCount
1
LastActivityDate
2012-06-06T22:07:59.717
LastEditDate
2012-06-06T16:26:07.303
LastEditorUserId
1337
OwnerUserId
1392636
ParentId
0
PostTypeId
1
Score
4
ViewCount
325
LastEditorDisplayName
text
Body
I have four files. File 1 (named input_22.txt) is an input file containing two space-delimited columns. The first column is an alphabetically sorted list of ligand codes (a three-letter/number code for a particular ligand). The second column lists the PDB code (Protein Data Bank code) for each ligand code; this column is not sorted.<img src="https://i.stack.imgur.com/9yrUA.jpg" alt="enter image description here"> File 1 (input_22.txt): <pre class="lang-none prettyprint-override"><code>803 1cqp
AMH 1b2i
ASC 1f9g
ETS 1cil
MIT 1dwc
TFP 1ctr
VDX 1db1
ZMR 1a4g
</code></pre>
File 2 (named SD_2.txt) is an SDF (structure-data file) containing the fragments of each ligand. A ligand can contain one or more fragments; for instance, ligand 803 has two. Each fragment starts with four dollar signs (<code>$$$$</code>), followed by the ligand code (803 in this example) on the next line. On the 5th line of each fragment (the third line after <code>$$$$.\n803</code>) there is a number giving the number of rows in the block that follows: 7 in the first fragment of ligand 803 and 10 in the second. Columns 61-62 of that block hold the numbers that identify the atoms in the fragment. For example, in the first fragment of 803 these numbers are 15, 16, 17, 19, 20, 21 and 22.
These numbers need to be matched in file 3.<img src="https://i.stack.imgur.com/icHSA.jpg" alt="enter image description here"> File 2 (SD_2.txt) looks like: <pre class="lang-none prettyprint-override"><code>$$$$
803
  SciTegic05101215222D

  7  7  0  0  0  0            999 V2000
    3.0215   -0.5775    0.0000 C   0  0  0  0  0  0  0  0  0 15  0  0
    2.3070   -0.9900    0.0000 C   0  0  0  0  0  0  0  0  0 16  0  0
    1.5926   -0.5775    0.0000 C   0  0  0  0  0  0  0  0  0 17  0  0
    1.5926    0.2475    0.0000 C   0  0  0  0  0  0  0  0  0 19  0  0
    2.3070    0.6600    0.0000 C   0  0  0  0  0  0  0  0  0 20  0  0
    2.3070    1.4850    0.0000 O   0  0  0  0  0  0  0  0  0 21  0  0
    3.0215    0.2475    0.0000 O   0  0  0  0  0  0  0  0  0 22  0  0
  1  2  1  0
  1  7  1  0
  2  3  1  0
  3  4  1  0
  4  5  1  0
  5  6  2  0
  5  7  1  0
M  END
> <Name>
803

> <Num_Rings>
1

> <Num_CSP3>
4

> <Fsp3>
0.8

> <Fstereo>
0

$$$$
803
  SciTegic05101215222D

 10 11  0  0  0  0            999 V2000
   -1.7992   -1.7457    0.0000 C   0  0  0  0  0  0  0  0  0  1  0  0
   -2.5137   -1.3332    0.0000 C   0  0  0  0  0  0  0  0  0  2  0  0
   -2.5137   -0.5082    0.0000 C   0  0  0  0  0  0  0  0  0  3  0  0
   -1.7992   -0.0957    0.0000 C   0  0  0  0  0  0  0  0  0  5  0  0
   -1.0847   -0.5082    0.0000 C   0  0  0  0  0  0  0  0  0  6  0  0
   -0.3702   -0.0957    0.0000 C   0  0  0  0  0  0  0  0  0  7  0  0
    0.3442   -0.5082    0.0000 C   0  0  0  0  0  0  0  0  0  8  0  0
    0.3442   -1.3332    0.0000 C   0  0  0  0  0  0  0  0  0  9  0  0
   -0.3702   -1.7457    0.0000 C   0  0  0  0  0  0  0  0  0 11  0  0
   -1.0847   -1.3332    0.0000 C   0  0  0  0  0  0  0  0  0 12  0  0
  1  2  1  0
  1 10  1  0
  2  3  1  0
  3  4  1  0
  4  5  2  0
  5  6  1  0
  5 10  1  0
  6  7  2  0
  7  8  1  0
  8  9  1  0
 10  9  1  0
M  END
> <Name>
803

> <Num_Rings>
2

> <Num_CSP3>
6

> <Fsp3>
0.6

> <Fstereo>
0.1
</code></pre>
File 3 is a CIF (Crystallographic Information File). It can be obtained from the following link: <a href="ftp://ftp.wwpdb.org/pub/pdb/data/monomers/components.cif" rel="nofollow noreferrer">File_3</a> This file is a collection of individual CIF files for many ligand molecules. Each entry in the file starts with <code>data_ligandcode</code>; for our example that is <code>data_803</code>. 46 lines after the start of each entry in the collection, there is a block that gives structural information about the molecule.
The number of rows in this block is not fixed, but the block always ends with a hash sign (<code>#</code>). Two columns in this block are important: 53-56 and 62-63. Columns 62-63 contain the numbers to match against those obtained from file 2, and columns 53-56 contain atom names such as <code>C1 (Carbon 1)</code>. These atom names can then be matched against file 4.
File 4 is a grow.out file that contains information about the interaction of each ligand with its target protein. The file is named after the PDB code given in file 1 for each ligand. For example, the PDB code for ligand 803 is 1cqp, so its grow.out file is named 1cqp: <a href="http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/1cqp/grow.out" rel="nofollow noreferrer">1cqp</a> The important rows in this file are those that contain both the ligand code (for example 803) and an atom name obtained from columns 53-56 of file 3.
Task: I need a script that reads a ligand code from file 1, searches file 2 for <code>$$$$ . \nLigandcode</code>, and collects the numbers in columns 61-62 for each fragment. Next, the script should take these numbers to file 3, find the rows that contain them in columns 62-63, and pull out the atom names in columns 53-56. The last step is to open file 4 (named after the PDB code) and print the rows that contain the ligand code and the atom names obtained from file 3. The output should be written to an output file.
I am a biomedical research student without a computer science background, but I have to use Perl for some tasks. I wrote a script for the task described above, but it is not working properly and I cannot find the reason.
The script I wrote is: <pre><code>#!/usr/bin/perl
use strict;
use warnings;
use Text::Table;
use Carp qw(croak);

{
    my $a;
    my $b;
    my $input_file = "input_22.txt";
    my @lines = slurp($input_file);
    for my $line (@lines){
        my ($ligandcode, $pdbcode) = split(/\t/, $line);
        my $i=0;
        my $k=0;
        my @array;
        my @array1;
        open (FILE, '<', "SD_2.txt");
        while (<FILE>) {
            my $i=0;
            my $k=0;
            my @array;
            my @array1;
            if ( $_=~/\x24\x24\x24\x24/ . /\n$ligandcode/) {
                my $nextline1 = <FILE>;
                my $nextline2 = <FILE>;
                my $nextline3 = <FILE>;
                my $nextline4= <FILE>;
                my $totalatoms= substr( $nextline4, 1,2);
                print $totalatoms,"\n";
                while ($i<$totalatoms) {
                    my $nextlines= <FILE>;
                    my $sub= substr($nextlines, 61, 2);
                    print $sub;
                    $array[$i] = $sub;
                    open (FH, '<', "components.txt");
                    while (my $ship=<FH>) {
                        my $var="data_$ligandcode";
                        if ($ship=~/$var/) {
                            while ($k<=44) {
                                $k++;
                                my $nextline = <FH>;
                            }
                            my $j=0;
                            my $nextline3;
                            do {
                                $nextline3=<FH>;
                                print $nextline3;
                                my $part= substr($nextline3, 62, 2);
                                my $part2= substr($nextline3, 53, 4);
                                $array1[$j] = $part;
                                if ($array1[$j] eq $array[$i]) {
                                    print $part2, "\n";
                                    open (GH, '<', "$pdbcode");
                                    open (OH, ">>out_grow.txt");
                                    while (my $grow = <GH>) {
                                        if ( $grow=~/$ligandcode/){
                                            print OH $grow if $grow=~/$part2/;
                                        }
                                    }
                                    close (GH);
                                    close (OH);
                                }
                                $j++;
                            } while $nextline3 !~/\x23/;
                        }
                    }
                    $i++;
                    close (FH);
                }
            }
        }
        close (FILE);
    }
}

##Slurps a file into a list
sub slurp {
    my ($file) = @_;
    my (@data, @data_chomped);
    open IN, "<", $file or croak "can't open $file\n";
    @data = <IN>;
    for my $line (@data){
        chomp($line);
        push (@data_chomped, $line);
    }
    close IN;
    return (@data_chomped);
}
</code></pre>
I want the script to run fast and to handle about 1000 fragments in total when file 1 lists 400 molecules. Kindly help me get this script working. I'll be grateful.
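For reference, the first step described above (collecting the atom numbers of each fragment from file 2) could be sketched roughly as follows. This is only a minimal sketch under stated assumptions: it assumes the standard fixed-width layout of the atom block, so that the atom number sits at offsets 61-62 counted from 0, the same offsets the script above passes to `substr`. The helper name `fragment_atom_numbers` and the inline sample (the first fragment of 803) are illustrative and not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: given a ligand code and the lines of an SDF file,
# return one array reference of atom numbers per fragment of that ligand.
sub fragment_atom_numbers {
    my ($ligandcode, @lines) = @_;
    my @fragments;
    for my $n (0 .. $#lines) {
        # A fragment starts with "$$$$" followed by the ligand code line.
        next unless $lines[$n] =~ /^\$\$\$\$/;
        next unless defined $lines[$n + 1]
            && $lines[$n + 1] =~ /^\s*\Q$ligandcode\E\s*$/;
        # The counts line is the 4th line after "$$$$"; its first
        # three characters hold the number of atom rows.
        my $counts = $lines[$n + 4];
        next unless defined $counts && substr($counts, 0, 3) =~ /(\d+)/;
        my $total = $1;
        my @numbers;
        for my $k ($n + 5 .. $n + 4 + $total) {
            my $atom_line = $lines[$k];
            last unless defined $atom_line && length($atom_line) >= 63;
            # Assumed position of the atom number (offsets 61-62).
            (my $num = substr($atom_line, 61, 2)) =~ s/\s+//g;
            push @numbers, $num;
        }
        push @fragments, \@numbers;
    }
    return @fragments;
}

# Sample data: first fragment of ligand 803, in fixed-width form.
my @sdf = split /\n/, <<'END_SDF';
$$$$
803
  SciTegic05101215222D

  7  7  0  0  0  0            999 V2000
    3.0215   -0.5775    0.0000 C   0  0  0  0  0  0  0  0  0 15  0  0
    2.3070   -0.9900    0.0000 C   0  0  0  0  0  0  0  0  0 16  0  0
    1.5926   -0.5775    0.0000 C   0  0  0  0  0  0  0  0  0 17  0  0
    1.5926    0.2475    0.0000 C   0  0  0  0  0  0  0  0  0 19  0  0
    2.3070    0.6600    0.0000 C   0  0  0  0  0  0  0  0  0 20  0  0
    2.3070    1.4850    0.0000 O   0  0  0  0  0  0  0  0  0 21  0  0
    3.0215    0.2475    0.0000 O   0  0  0  0  0  0  0  0  0 22  0  0
  1  2  1  0
  1  7  1  0
  2  3  1  0
  3  4  1  0
  4  5  1  0
  5  6  2  0
  5  7  1  0
M  END
END_SDF

my @fragments = fragment_atom_numbers('803', @sdf);
for my $f (0 .. $#fragments) {
    print "Fragment ", $f + 1, ": @{ $fragments[$f] }\n";
}
```

With the sample above this prints `Fragment 1: 15 16 17 19 20 21 22`. Collecting the numbers for every fragment in one pass over file 2 means the other files do not have to be reopened for each atom row, which may matter when file 1 lists several hundred ligands.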
Tags
<perl>
Title
Perl: Matching four different files and obtaining particular Information in output file
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. USBrad Gilbert
UserOwnerUserId
1. USShipra
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 POPerl: Matching four different files and obtaining particular Information in output file
 UserUserId
 USsimbabque
 VoteTypeVoteTypeId
 VTFavorite
2. VO
 singulars
 PostPostId
 POPerl: Matching four different files and obtaining particular Information in output file
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
3. VO
 singulars
 PostPostId
 POPerl: Matching four different files and obtaining particular Information in output file
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId
