StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

POPython compiler for simple language to java vm code algorithm
primarykey
Id
16565621
data
AcceptedAnswerId
0
AnswerCount
1
ClosedDate
CommentCount
8
CommunityOwnedDate
CreationDate
2013-05-15T12:57:17.803
FavoriteCount
4
LastActivityDate
2015-06-11T09:42:42.743
LastEditDate
2015-06-11T09:42:42.743
LastEditorUserId
317266
OwnerUserId
1750007
ParentId
0
PostTypeId
1
Score
19
ViewCount
1152
LastEditorDisplayName
text
Body
I have a simple language that I am trying to write a compiler for (yes it is homework) to compile a simple language I shall describe if necessary to java vm code. It currently works pretty well I've just hit a bump with logical AND's and OR's. Each work fine in a single if/while condition, but if I try and chain them things go wrong, correct me if I am wrong but I believe that AND has precedence, but I was wondering if there are logical ways of arranging them? I think is what I'm trying to ask, the java vm code output just has the compare and jump statements one after the other (which seems wrong). I realise it's quite abstract so maybe what I'm after is a pseudo code/algorithm for how to structure chained AND's and OR's. EDIT: Currently just treats any combination of AND and OR as AND's. Comparing the factor/term/expression connection (compared to booleanfactor etc) I believe that AND has precedence? Just a thought. Apologies if this is poorly understood :/ So i figure ill include relevant info just incase. compiler <pre><code>import re import sys # Restrictions: # Integer constants must be short. # Stack size must not exceed 1024. # Integer is the only type. # Logical operators cannot be nested. class Scanner: '''The interface comprises the methods lookahead and consume. Other methods should not be called from outside of this class.''' def __init__(self, input_file): '''Reads the whole input_file to input_string.''' # source code of the program to be compiled self.input_string = input_file.read() # index where the unprocessed part of input_string starts self.current_char_index = 0 # a pair (most recently read token, matched substring of input_string) self.current_token = self.get_token() def skip_white_space(self): '''Consumes all characters in input_string up to the next non-white-space character.''' if (self.current_char_index >= len(self.input_string) - 1): # bad fix for it over-running the end of the file return while self.input_string[self.current_char_index].isspace(): self.current_char_index += 1 return def get_token(self): '''Returns the next token and the part of input_string it matched. Returns None if there is no next token. The characters up to the end of the token are consumed.''' self.skip_white_space() # find the longest prefix of input_string that matches a token token, longest = None, '' for (t, r) in Token.token_regexp: match = re.match(r, self.input_string[self.current_char_index:]) if match and match.end() > len(longest): token, longest = t, match.group() # consume the token by moving the index to the end of the matched part self.current_char_index += len(longest) return (token, longest) def lookahead(self): '''Returns the next token without consuming it. Returns None if there is no next token.''' return self.current_token[0] def consume(self, *tokens): '''Returns the next token and consumes it, if it is in tokens. Raises an exception otherwise. If the token is a number or an identifier, its value is returned.''' if self.current_token[0] not in tokens: print('Token ' + self.current_token[0] + ' isn\'t in the tokens: ') for token in tokens: print(token) raise Exception('Token is not in tokens this shouldn\'t happen much') if self.current_token[0] == 'ID': symbol_table.location(self.current_token[1]) value = self.current_token[1] elif (self.current_token[0] == 'NUM'): value = self.current_token[1] else: value = self.current_token[0] self.current_token = self.get_token() return value class Token: DO = 'DO'; ELSE = 'ELSE'; END = 'END'; IF = 'IF'; THEN = 'THEN'; WHILE = 'WHILE'; SEM = 'SEM'; BEC = 'BEC'; LESS = 'LESS'; EQ = 'EQ'; GRTR = 'GRTR'; LEQ = 'LEQ'; NEQ = 'NEQ'; GEQ = 'GEQ'; ADD = 'ADD'; SUB = 'SUB'; MUL = 'MUL'; DIV = 'DIV'; LPAR = 'LPAR'; RPAR = 'RPAR'; NUM = 'NUM'; ID = 'ID'; READ = 'READ'; WRITE = 'WRITE'; OR = 'OR'; AND = 'AND'; NOT = 'NOT'; # The following list gives the regular expression to match a token. # The order in the list matters for mimicking Flex behaviour. # Longer matches are preferred over shorter ones. # For same-length matches, the first in the list is preferred. token_regexp = [ (DO, 'do'), (ELSE, 'else'), (END, 'end'), (IF, 'if'), (THEN, 'then'), (WHILE, 'while'), (READ, 'read'), (WRITE, 'write'), (OR, 'or'), (AND, 'and'), (NOT, 'not'), (SEM, ';'), (BEC, ':='), (LESS, '<'), (EQ, '='), (NEQ, '!='), (GRTR, '>'), (LEQ, '<='), (GEQ, '>='), (ADD, '[+]'), # + is special in regular expressions (SUB, '-'), (MUL, '[*]'), (DIV, '/'), (LPAR, '[(]'), # ( is special in regular expressions (RPAR, '[)]'), # ) is special in regular expressions (ID, '[a-z]+'), (NUM, '[0-9]+'), ] class Symbol_Table: '''A symbol table maps identifiers to locations.''' def __init__(self): self.symbol_table = {} def size(self): '''Returns the number of entries in the symbol table.''' return len(self.symbol_table) def location(self, identifier): '''Returns the location of an identifier. If the identifier is not in the symbol table, it is entered with a new location. Locations are numbered sequentially starting with 0.''' if identifier in self.symbol_table: return self.symbol_table[identifier] index = len(self.symbol_table) self.symbol_table[identifier] = index return index class Label: def __init__(self): self.current_label = 0 def next(self): '''Returns a new, unique label.''' self.current_label += 1 return 'l' + str(self.current_label) def indent(s, level): return ' '*level + s + '\n' # Each of the following classes is a kind of node in the abstract syntax tree. # indented(level) returns a string that shows the tree levels by indentation. # code() returns a string with JVM bytecode implementing the tree fragment. # true_code/false_code(label) jumps to label if the condition is/is not true. # Execution of the generated code leaves the value of expressions on the stack. class Program_AST: def __init__(self, program): self.program = program def __repr__(self): return repr(self.program) def indented(self, level): return self.program.indented(level) def code(self): program = self.program.code() local = symbol_table.size() java_scanner = symbol_table.location('Java Scanner') return '.class public Program\n' + \ '.super java/lang/Object\n' + \ '.method public <init>()V\n' + \ 'aload_0\n' + \ 'invokenonvirtual java/lang/Object/<init>()V\n' + \ 'return\n' + \ '.end method\n' + \ '.method public static main([Ljava/lang/String;)V\n' + \ '.limit locals ' + str(local) + '\n' + \ '.limit stack 1024\n' + \ 'new java/util/Scanner\n' + \ 'dup\n' + \ 'getstatic java/lang/System.in Ljava/io/InputStream;\n' + \ 'invokespecial java/util/Scanner.<init>(Ljava/io/InputStream;)V\n' + \ 'astore ' + str(java_scanner) + '\n' + \ program + \ 'return\n' + \ '.end method\n' class Statements_AST: def __init__(self, statements): self.statements = statements def __repr__(self): result = repr(self.statements[0]) for st in self.statements[1:]: result += '; ' + repr(st) return result def indented(self, level): result = indent('Statement(s)', level) for st in self.statements: result += st.indented(level+1) return result def code(self): result = '' for st in self.statements: result += st.code() return result class If_AST: def __init__(self, boolean_expression, then): self.boolean_expression = boolean_expression self.then = then def __repr__(self): return 'if ' + repr(self.boolean_expression) + ' then ' + \ repr(self.then) + ' end' def indented(self, level): return indent('If-Then', level) + \ self.boolean_expression.indented(level+1) + \ self.then.indented(level+1) def code(self): l1 = label_generator.next() return self.boolean_expression.code(l1) + \ self.then.code() + \ l1 + ':\n' class If_Else_AST: def __init__(self, boolean_expression, then, _else): self.boolean_expression = boolean_expression; self.then = then; self._else = _else; def __repr__(self): return 'if ' + repr(self.boolean_expression) + ' then ' + \ repr(self.then) + ' else ' + \ repr(self._else) + ' end' def indented(self, level): return indent('If-Then-Else', level) + \ self.boolean_expression.indented(level+1) + \ self.then.indented(level+1) + \ indent('Else', level+1) + \ self._else.indented(level+1) def code(self): l1 = label_generator.next() l2 = label_generator.next() return self.boolean_expression.code(l1) + \ self.then.code() + \ 'goto ' + l2 + '\n' + \ l1 + ':\n' + \ self._else.code() + \ l2 + ':\n' class While_AST: def __init__(self, boolean_term, body): self.boolean_term = boolean_term self.body = body def __repr__(self): return 'while ' + repr(self.boolean_term) + ' do ' + \ repr(self.body) + ' end' def indented(self, level): return indent('While-Do', level) + \ self.boolean_term.indented(level+1) + \ self.body.indented(level+2) def code(self): l1 = label_generator.next() l2 = label_generator.next() return l1 + ':\n' + \ self.boolean_term.code(l2) + \ self.body.code() + \ 'goto ' + l1 + '\n' + \ l2 + ':\n' class Assign_AST: def __init__(self, identifier, expression): self.identifier = identifier self.expression = expression def __repr__(self): return repr(self.identifier) + ':=' + repr(self.expression) def indented(self, level): return indent('Assign', level) + \ self.identifier.indented(level+1) + \ self.expression.indented(level+1) def code(self): loc = symbol_table.location(self.identifier.identifier) return self.expression.code() + \ 'istore ' + str(loc) + '\n' class Write_AST: def __init__(self, expression): self.expression = expression def __repr__(self): return 'write ' + repr(self.expression) def indented(self, level): return indent('Write', level) + self.expression.indented(level+1) def code(self): return 'getstatic java/lang/System/out Ljava/io/PrintStream;\n' + \ self.expression.code() + \ 'invokestatic java/lang/String/valueOf(I)Ljava/lang/String;\n' + \ 'invokevirtual java/io/PrintStream/println(Ljava/lang/String;)V\n' class Read_AST: def __init__(self, identifier): self.identifier = identifier def __repr__(self): return 'read ' + repr(self.identifier) def indented(self, level): return indent('Read', level) + self.identifier.indented(level+1) def code(self): java_scanner = symbol_table.location('Java Scanner') loc = symbol_table.location(self.identifier.identifier) return 'aload ' + str(java_scanner) + '\n' + \ 'invokevirtual java/util/Scanner.nextInt()I\n' + \ 'istore ' + str(loc) + '\n' class Comparison_AST: def __init__(self, left, op, right): self.left = left self.op = op self.right = right def __repr__(self): op = { Token.LESS:'<', Token.EQ:'=', Token.GRTR:'>', Token.LEQ:'<=', Token.NEQ:'!=', Token.GEQ:'>=' } return repr(self.left) + op[self.op] + repr(self.right) def indented(self, level): return indent(self.op, level) + \ self.left.indented(level+1) + \ self.right.indented(level+1) def true_code(self, label): op = { Token.LESS:'if_icmplt', Token.EQ:'if_icmpeq', Token.GRTR:'if_icmpgt', Token.LEQ:'if_icmple', Token.NEQ:'if_icmpne', Token.GEQ:'if_icmpge' } return self.left.code() + \ self.right.code() + \ op[self.op] + ' ' + label + '\n' def false_code(self, label): # Negate each comparison because of jump to "false" label. op = { Token.LESS:'if_icmpge', Token.EQ:'if_icmpne', Token.GRTR:'if_icmple', Token.LEQ:'if_icmpgt', Token.NEQ:'if_icmpeq', Token.GEQ:'if_icmplt' } return self.left.code() + \ self.right.code() + \ op[self.op] + ' ' + label + '\n' class Expression_AST: def __init__(self, left, op, right): self.left = left self.op = op self.right = right def __repr__(self): op = { Token.ADD:'+', Token.SUB:'-', Token.MUL:'*', Token.DIV:'/' } return '(' + repr(self.left) + op[self.op] + repr(self.right) + ')' def indented(self, level): return indent(self.op, level) + \ self.left.indented(level+1) + \ self.right.indented(level+1) def code(self): op = { Token.ADD:'iadd', Token.SUB:'isub', Token.MUL:'imul', Token.DIV:'idiv' } return self.left.code() + \ self.right.code() + \ op[self.op] + '\n' class Number_AST: def __init__(self, number): self.number = number def __repr__(self): return self.number def indented(self, level): return indent(self.number, level) def code(self): # works only for short numbers return 'sipush ' + self.number + '\n' class Identifier_AST: def __init__(self, identifier): self.identifier = identifier def __repr__(self): return self.identifier def indented(self, level): return indent(self.identifier, level) def code(self): loc = symbol_table.location(self.identifier) return 'iload ' + str(loc) + '\n' class BooleanFactor_AST: def __init__(self, condition, logic): self.condition = condition self.logic = logic def __repr__(self): if self.logic == False: return 'NOT ' + repr(self.condition) else: return repr(self.condition) def indented(self, level): if self.logic == False: return indent('NOT ', level) + self.condition.indented(level + 1) else: return self.condition.indented(level) def false_code(self, label): if self.logic == True: return self.condition.false_code(label) else: return self.condition.true_code(label) return def true_code(self, label): if self.logic == True: return self.condition.true_code(label) else: return self.condition.false_code(label) class BooleanTerm_AST: def __init__(self, terms): self.terms = terms def __repr__(self): result = repr(self.terms[0]) for term in self.terms[1:]: result = result + ' AND ' + repr(term) return result def indented(self, level): result = self.terms[0].indented(level) for term in self.terms[1:]: result = result + indent('AND', level) result = result + term.indented(level) return result def code(self, label): result = '' for term in self.terms: result = result + term.false_code(label) return result class BooleanExpression_AST: def __init__(self, expressions): self.expressions = expressions def __repr__(self): result = repr(self.expressions[0]) for expression in self.expressions[1:]: result = result + ' OR ' + repr(expression) return result def indented(self, level): result = self.expressions[0].indented(level) indentation = 0 for expression in self.expressions[1:]: indentation += 1 result = result + indent('OR', level + indentation) result = result + expression.indented(level + indentation) return result def code(self, label): result = '' for expression in self.expressions: result = result + expression.code(label) return result # The following methods comprise the recursive-descent parser. def program(): sts = statements() return Program_AST(sts) def statements(): result = [statement()] while scanner.lookahead() == Token.SEM: scanner.consume(Token.SEM) st = statement() result.append(st) return Statements_AST(result) def statement(): if scanner.lookahead() == Token.IF: return if_statement() elif scanner.lookahead() == Token.WHILE: return while_statement() elif scanner.lookahead() == Token.ID: return assignment() elif scanner.lookahead() == Token.READ: return read(); elif scanner.lookahead() == Token.WRITE: return write(); else: # error return scanner.consume(Token.IF, Token.WHILE, Token.ID) def if_statement(): scanner.consume(Token.IF) condition = boolean_expression() scanner.consume(Token.THEN) then = statements() if scanner.lookahead() == Token.END: scanner.consume(Token.END) return If_AST(condition, then) else: scanner.consume(Token.ELSE) _else = statements() scanner.consume(Token.END) return If_Else_AST(condition, then, _else) def while_statement(): scanner.consume(Token.WHILE) condition = boolean_expression() scanner.consume(Token.DO) body = statements() scanner.consume(Token.END) return While_AST(condition, body) def assignment(): ident = identifier() scanner.consume(Token.BEC) expr = expression() return Assign_AST(ident, expr) def read(): scanner.consume(Token.READ) variable = identifier() return Read_AST(variable) def write(): scanner.consume(Token.WRITE) expr = expression() return Write_AST(expr) def comparison(): left = expression() op = scanner.consume(Token.LESS, Token.EQ, Token.GRTR, Token.LEQ, Token.NEQ, Token.GEQ) right = expression() return Comparison_AST(left, op, right) def expression(): result = term() while scanner.lookahead() in [Token.ADD, Token.SUB]: op = scanner.consume(Token.ADD, Token.SUB) tree = term() result = Expression_AST(result, op, tree) return result def term(): result = factor() while scanner.lookahead() in [Token.MUL, Token.DIV]: op = scanner.consume(Token.MUL, Token.DIV) tree = factor() result = Expression_AST(result, op, tree) return result def factor(): if scanner.lookahead() == Token.LPAR: scanner.consume(Token.LPAR) result = expression() scanner.consume(Token.RPAR) return result elif scanner.lookahead() == Token.NUM: value = scanner.consume(Token.NUM) return Number_AST(value) elif scanner.lookahead() == Token.ID: return identifier() else: # error return scanner.consume(Token.LPAR, Token.NUM, Token.ID) def identifier(): value = scanner.consume(Token.ID) return Identifier_AST(value) def boolean_factor(): if scanner.lookahead() == Token.NOT: scanner.consume(Token.NOT) logic = False else: logic = True result = comparison() return BooleanFactor_AST(result, logic) def boolean_term(): result = [boolean_factor()] while scanner.lookahead() in [Token.AND]: scanner.consume(scanner.lookahead()) temp = boolean_factor() result.append(temp) return BooleanTerm_AST(result) def boolean_expression(): result = [boolean_term()] while scanner.lookahead() in [Token.OR]: scanner.consume(scanner.lookahead()) temp = boolean_term() result.append(temp) return BooleanExpression_AST(result) # Initialise scanner, symbol table and label generator. #scanner = Scanner(open('test.txt')) scanner = Scanner(sys.stdin) symbol_table = Symbol_Table() symbol_table.location('Java Scanner') # fix a location for the Java Scanner label_generator = Label() # Uncomment the following to test the scanner without the parser. # This shows a list of all tokens in the input. # #token = scanner.lookahead() #while token != None: # print(token) # scanner.consume(token) # token = scanner.lookahead() #exit() # Call the parser. ast = program() assert scanner.lookahead() == None # Uncomment the following to test the parser without the code generator. # The first line gives back the program by calling __repr__ of the AST classes. # The second line shows the syntax tree with levels indicated by indentation. # #print(ast) #print(ast.indented(0)) #exit() # Call the code generator. # This translates the abstract syntax tree to JVM bytecode. # It can be assembled to a class file by Jasmin: http://jasmin.sourceforge.net/ print(ast.code()) </code></pre> testing bat file <pre><code>python compiler.py <test.txt> Program.j java -Xmx100m -jar jasmin.jar Program.j java -Xmx100m Program < testInput.txt > test_output.txt </code></pre> and language (BNF) <pre><code>Program = Statements Statements = Statement (; Statement) Statement = If | While | Assignment If = if Comparison then Statements end While = while Comparison do Statements end Assignment = identifier := Expression Comparison = Expression Relation Expression Relation = = | != | < | <= | > | >= Expression = Term ((+ | -) Term) Term = Factor ((* | /) Factor) Factor = (Expression) | number | identifier BooleanExpression = BooleanTerm (or BooleanTerm)* BooleanTerm = BooleanFactor (and BooleanFactor)* BooleanFactor = not BooleanFactor | Comparison </code></pre> I think thats all that is relevant, cheers if you take a go at helping me on this
Tags
<python><compiler-construction><jvm><jvm-languages>
Title
Python compiler for simple language to java vm code algorithm
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. USSeki
UserOwnerUserId
1. USZeno15
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 POPython compiler for simple language to java vm code algorithm
 UserUserId
 USNicola Musatti
 VoteTypeVoteTypeId
 VTFavorite
2. VO
 singulars
 PostPostId
 POPython compiler for simple language to java vm code algorithm
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
3. VO
 singulars
 PostPostId
 POPython compiler for simple language to java vm code algorithm
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId

Querying!

Guidance

A row detail

Detail views are divided into sections. All the information in the data section comes from columns in the selected row. The other sections display data from other, related rows.

Related data can be related in a to-one or a to-many fashion. Captions of data related in a to-many fashion link to a list view showing a filtered view of the table.

Try moving around until you find a non-empty to-many entry and click on the label to get to one. You can move back to the root by clicking on the database name in the header.