The Book of Ashes
Legend in his own mind, creator of all you see here, he walks this Earth on the path of the becoming.
On Saturday, 12 July 2003 Ashes wrote...
Writing a language parser 1:00AM
Ahhh the web server. It is nothing by itself, just serving up straight HTML pages. To live it requires its own language. To understand its own language I must write a translator (parser). Look at the Perihelion as it is now. It is dynamic and fluid. Virtually every page is dynamically generated. Take the disturbances page. When you request this page from my server, it goes off, determines the latest disturbance stored in the table, retrieves all the data about those images and their titles and displays them on the page. To do this I had to write code that looked up values from the database. For this Oracle provides a language called PL/SQL, which I used to dynamically create the page. Languages are entirely logical. They are also complicated.
Step one in writing a parser. First you must write a tokeniser. This beasty takes a stream of characters (your code) and breaks it up into tokens or words (logical keywords in the code), removing excess whitespace and new line characters. Once you have a tokeniser you need to define your language as a grammar: a set of rules written much like regular expressions, except that a rule may refer to itself or to other rules. For me this might look something like...
{program}->
({for_loop} | {while_loop} | {repeat_until_loop} | {if_statement} | {assignment_statement} | {procedure_call} | null) + ({program} | null)

{for_loop}->
"for" + "(" + {variable_name} + "," + ({number} | {variable_name} | {function_name}) + "," + ({number} | {variable_name} | {function_name}) + ")" + "{" + {program} + "}" + ";"

{while_loop}->
"while" + "(" + {condition} + ")" + "{" + {program} + "}" + ";"

{repeat_until_loop}->
"repeat" + "{" + {program} + "}" + "until" + "(" + {condition} + ")" + ";"

{if_statement}->
"if" + "(" + {condition} + ")" + "{" + {program} + "}" + ({elsif} | {else} | null) + ";"

{elsif}->
"elsif" + "(" + {condition} + ")" + "{" + {program} + "}" + ({elsif} | null)

{else}->
"else" + "{" + {program} + "}"

etc
The following code is accepted by the expression above...
Mathematically this is very beautiful. It defines exactly how the language must behave, how it's structured and what can appear after what. It may appear very technical and can be quite hard to get your head around if you don't understand what's going on. But briefly, the | symbol is an OR sign, e.g. {for_loop} | {while_loop} means we are expecting either a for loop OR a while loop, both of which are then defined below, and the + symbol means "followed by". Very cool.
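To make the tokeniser step from earlier a little more concrete, here is a minimal sketch in Python. It is not the code behind this site: the names `KEYWORDS`, `TOKEN_SPEC` and `tokenise` are made up for illustration, and the set of symbols is an assumption taken from the punctuation that appears in the rules above.

```python
import re

# ASSUMPTION: keywords and symbols are read off the grammar rules in this post;
# the real language may well have more of both.
KEYWORDS = {"for", "while", "repeat", "until", "if", "elsif", "else"}

TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("NAME", r"[A-Za-z_]\w*"),
    ("SYMBOL", r"[(){};,]"),
    ("SKIP", r"\s+"),   # whitespace and new lines, thrown away
    ("ERROR", r"."),    # anything else is not part of the language
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenise(source):
    """Break a string of code into (kind, text) tokens, dropping whitespace."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise ValueError(f"unexpected character {text!r} at offset {match.start()}")
        if kind == "NAME" and text in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((kind, text))
    return tokens


if __name__ == "__main__":
    print(tokenise("while (x)\n{\n  };"))
```

Trying the alternatives in a fixed order keeps the sketch short: a word is first matched as a generic name and only then promoted to a keyword, so adding a keyword means adding one entry to a set.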
Once the grammar is defined down to the lowest level, we need to implement the parser in code. This is the difficult bit...
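One common way to turn rules like these into code is a recursive descent parser, where each rule becomes one function and the functions call each other just as the rules refer to each other. The Python sketch below is only an illustration of that technique, not the parser actually written for this site. It covers just the while, repeat/until and if/elsif/else rules, and it only answers yes or no. Since {condition} is never defined in this post, a single name or number stands in for it; that stand-in, and every function and class name here, is an assumption.

```python
import re

# Tiny stand-in tokeniser: a word, a number, or any single other character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


class ParseError(Exception):
    pass


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, literal):
        if self.peek() != literal:
            raise ParseError(f"expected {literal!r}, found {self.peek()!r}")
        self.pos += 1

    def condition(self):
        # ASSUMPTION: {condition} is not defined in the post, so one name or
        # number is accepted as a placeholder.
        tok = self.peek()
        if tok is None or not re.fullmatch(r"\w+", tok):
            raise ParseError(f"expected a condition, found {tok!r}")
        self.pos += 1

    def block(self):
        # "{" + {program} + "}"
        self.expect("{")
        self.program()
        self.expect("}")

    def bracketed_condition(self):
        # "(" + {condition} + ")"
        self.expect("(")
        self.condition()
        self.expect(")")

    def program(self):
        # {program} -> (statement | null) + ({program} | null)
        while True:
            tok = self.peek()
            if tok == "while":
                self.while_loop()
            elif tok == "repeat":
                self.repeat_until_loop()
            elif tok == "if":
                self.if_statement()
            else:
                return  # the null alternative

    def while_loop(self):
        self.expect("while")
        self.bracketed_condition()
        self.block()
        self.expect(";")

    def repeat_until_loop(self):
        self.expect("repeat")
        self.block()
        self.expect("until")
        self.bracketed_condition()
        self.expect(";")

    def if_statement(self):
        self.expect("if")
        self.bracketed_condition()
        self.block()
        if self.peek() == "elsif":
            self.elsif()
        elif self.peek() == "else":
            self.expect("else")
            self.block()
        self.expect(";")

    def elsif(self):
        # As written, the {elsif} rule may be followed only by another {elsif}.
        self.expect("elsif")
        self.bracketed_condition()
        self.block()
        if self.peek() == "elsif":
            self.elsif()


def accepts(source):
    """Return True if the whole source matches the {program} rule."""
    parser = Parser(TOKEN_RE.findall(source))
    try:
        parser.program()
    except ParseError:
        return False
    return parser.pos == len(parser.tokens)


if __name__ == "__main__":
    print(accepts("repeat { while (x) { }; } until (y);"))
    print(accepts("while (x) { }"))  # missing the closing ";"
```

Because the sketch follows the rules exactly as written, an elsif chain that ends in else is rejected: the {elsif} rule offers only ({elsif} | null). If that is not intended, {else} would need to be added as an alternative there.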