Perl program to mimic restriction enzymes using references, hash tables and subs
    <p>I'm a student in an intro Perl class, and I'm looking for suggestions on how to approach an assignment. My professor encourages using forums. The assignment is:</p>
    <blockquote> <p>Write a Perl program that will take two files from the command line, an enzyme file and a DNA file. Read the file with restriction enzymes and apply the restriction enzymes to the DNA file.</p> <p>The output will be fragments of DNA arranged in the order they occur in the DNA file. The name of each output file should be constructed by appending the name of the restriction enzyme to the name of the DNA file, with an underscore between them.</p> <p>For example, if the enzyme is EcoRI and the DNA file is named BC161026, the output file should be named BC161026_EcoRI.</p> </blockquote>
    <p>My approach is to create a main program and two subs as follows:</p>
    <p>Main: I'm not sure how to tie my subs together.</p>
    <p>Sub DNA: take the DNA file and remove any newlines to make a single string.</p>
    <p>Sub Enzymes: read and store the lines of the enzyme file given on the command line; parse each line to separate the enzyme acronym from the position of the cut; store the cut position as a regular expression in a hash table, together with the enzyme acronym.</p>
    <blockquote> <p>Note on enzyme file format: the enzyme file follows a format known as Staden. Examples:</p> <p><code>AatI/AGG'CCT//</code><br> <code>AatII/GACGT'C//</code><br> <code>AbsI/CC'TCGAGG//</code></p> <p>The enzyme acronym consists of the characters before the first slash (AatI in the first example). The recognition sequence is everything between the first and second slashes (AGG'CCT, again in the first example). The cut point is denoted by an apostrophe in the recognition sequence. There are standard abbreviations for DNA bases within enzyme sites, as follows:</p> <p>R = G or A<br> B = not A (C or G or T)<br> etc.</p> </blockquote>
    <p>Along with a recommendation for the main program, do you see any pieces I've missed? Can you recommend specific tools that would be useful for putting this program together?</p>
    <p>Example input enzyme: <code>TryII/RRR'TTT//</code></p>
    <p>Example string to read: <code>CCCCCCGGGTTTCCCCCCCCCCCCAAATTTCCCCCCCCCCCCAGATTTCCCCCCCCCCGAGTTTCCCCC</code></p>
    <p>The output should be:</p>
    <blockquote> <p>CCCCCCGGG</p> <p>TTTCCCCCCCCCCCCAAA</p> <p>TTTCCCCCCCCCCCCAGA</p> <p>TTTCCCCCCCCCCGAG</p> <p>TTTCCCCC</p> </blockquote>
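One possible way to tie the pieces together is sketched below. This is a minimal sketch that rests on several assumptions. First, the "standard abbreviations" are taken to be the usual IUPAC nucleotide codes. Second, the enzyme file comes first on the command line and the DNA file second. Third, the DNA file holds plain sequence; an optional FASTA `>` header line is skipped. Finally, enzyme lines without an apostrophe are ignored. The sub names `read_dna`, `read_enzymes`, `site_to_regex`, `bases_to_regex`, `digest` and `main` are hypothetical.

The central idea is to turn each recognition site into a zero-width regex: a lookbehind for the bases before the apostrophe and a lookahead for the bases after it. `split` then cuts the DNA exactly at the cut point without dropping any bases. The enzyme hash maps each acronym to its compiled regex, and the sub returns it by reference.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: the "standard abbreviations" are the usual IUPAC nucleotide
# codes; each code becomes a regex class matching exactly one base.
my %IUPAC = (
    A => 'A',     C => 'C',     G => 'G',     T => 'T',
    R => '[AG]',  Y => '[CT]',  S => '[CG]',  W => '[AT]',
    K => '[GT]',  M => '[AC]',  B => '[CGT]', D => '[AGT]',
    H => '[ACT]', V => '[ACG]', N => '[ACGT]',
);

# Read a DNA file and join all sequence lines into one uppercase string.
sub read_dna {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    my $dna = '';
    while (my $line = <$fh>) {
        next if $line =~ /^>/;    # skip a FASTA header line, if present (assumption)
        $line =~ s/\s+//g;        # drop the newline and any other whitespace
        $dna .= uc $line;
    }
    close $fh;
    return $dna;
}

# Turn a run of IUPAC codes into a regex fragment, e.g. "RRR" -> "[AG][AG][AG]".
sub bases_to_regex {
    my ($bases) = @_;
    my $re = '';
    for my $code (split //, $bases) {
        die "Unknown base code '$code'\n" unless exists $IUPAC{$code};
        $re .= $IUPAC{$code};
    }
    return $re;
}

# Turn a site such as "RRR'TTT" into a zero-width regex that matches only at
# the cut point: after the left part (lookbehind), before the right part (lookahead).
sub site_to_regex {
    my ($site) = @_;
    my ($left, $right) = split /'/, uc($site), 2;
    die "No cut mark (') in site '$site'\n" unless defined $right;
    my $l = bases_to_regex($left);
    my $r = bases_to_regex($right);
    my $pattern = ($l ne '' ? "(?<=$l)" : '') . ($r ne '' ? "(?=$r)" : '');
    die "Empty recognition site '$site'\n" if $pattern eq '';
    return qr/$pattern/;
}

# Parse a Staden-format file ("AatI/AGG'CCT//" per line) into a hash
# reference mapping enzyme acronym => compiled cut regex.
sub read_enzymes {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    my %cut_re;
    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;                            # strip the newline (and any \r)
        my ($name, $site) = split m{/}, $line;
        next unless defined $site && $site =~ /'/;     # skip blank or unmarked lines
        $cut_re{$name} = site_to_regex($site);
    }
    close $fh;
    return \%cut_re;
}

# Cut the DNA wherever the zero-width regex matches; fragments stay in order.
sub digest {
    my ($dna, $cut_re) = @_;
    return split $cut_re, $dna;
}

# Main: tie the subs together, writing one file per enzyme named DNAFILE_ENZYME.
sub main {
    my ($enzyme_file, $dna_file) = @_;
    die "Usage: $0 ENZYME_FILE DNA_FILE\n" unless defined $dna_file;
    my $dna    = read_dna($dna_file);
    my $cut_re = read_enzymes($enzyme_file);
    for my $name (sort keys %$cut_re) {
        my $out = "${dna_file}_$name";
        open my $fh, '>', $out or die "Cannot write $out: $!\n";
        print {$fh} "$_\n" for digest($dna, $cut_re->{$name});
        close $fh;
    }
}

# Run only when given arguments, so the subs can also be loaded and tested alone.
main(@ARGV) if @ARGV;
```

You could save this under a hypothetical name such as `digest.pl` and run it as `perl digest.pl enzymes.txt BC161026`. With the `TryII/RRR'TTT//` line, it should write `BC161026_TryII` with one fragment per line. Because lookarounds consume no bases, adjacent or overlapping sites are all cut, and the fragments join back into the original sequence. Perl requires fixed-width lookbehind, and that condition holds here because every code stands for exactly one base.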