Arnoldc interpreter using Clojure instaparse

Wow, I didn’t realize I had taken such a long sabbatical since my last post. I haven’t let off the coding gas while I was away 🙂 One of the many things I have been working on is writing a clojure Arnoldc interpreter. If you are unfamiliar with Arnoldc, check out this link lhartik ArnoldC. I had come across clojure instaparse during the advent of code 2015 and used it to solve day 7 and thought it was the perfect tool to accomplish my goal. Per the README on project page, "Instaparse aims to be the simplest way to build parsers in Clojure" and I couldn't agree more.

The project is made up of the following clojure files:

The lexr.clj (Parser)

The lexr file contains the code utilizing the instaparse library to transform Arnoldc strings into hiccup structures. Below is a brief description of the clojure vars and functions within the lexr and interpreter.

NOTE: I left out code that supported my need to capture the actual thrown parse exception instead of the default implemented within the library. Refer to stack overflow how-to-test-for-texts-not-fitting-an-instaparse-grammar-clojure for further details.

ADDITIONAL NOTE: I use ,,, in place of ellipsis as commas are whitespace with in clojure 🙂 I also bold parenthesis when used as part of clojure code in the hopes that it adds clarity.

(def tokens ,,,)

This contains a map of all the tokens of the Arnoldc language. I created this with the idea that I could show how easy it would be to transform the language and continue to have the same functionality. At the end of the post I describe an example of this with arnoldc pig latin lexr and the test

(defn arnold-grammar ,,,)

This function will create the string representation of the Arnoldc grammar. As clojure core doesn’t contain string interpolation, this is not the greatest looking… When you combine the tokens and the string you would receive the below string link to gist

    Program = method-declaration* begin-main wspace method-declaration*;
        begin-main = <'IT\'S SHOWTIME'> (statement wspace)_ end-main ;<wspace> = <#'\s_'>;statement = wspace (assignment | printing | decvar | assign-var-from-method-call |
        call-method | if | while | call-method | return);method-statement = (assignment | printing | decvar | assign-var-from-method-call |
        call-method | if | while | call-method);call-method = <'DO IT NOW'> wspace method-name numvar* wspace;method-name = (number| #'[a-zA-Z][a-za-z0-9]*' |zero |false | true) assign-var-from-method-call = <'GET YOUR ASS TO MARS'> wspace variable wspace call-method;assignment = <'GET TO THE CHOPPER'> numvar wspace set-val wspace end-assignment wspace;
        while= <'STICK AROUND'> numvar wspace statement* end-while ;
        <end-while>= <'CHILL'>;
        set-val= <'HERE IS MY INVITATION'> numvar (arithmetic-op|logical-op)*;
        <end-assignment>= <'ENOUGH TALK'>;
        printing = <'TALK TO THE HAND'> wspace (variable | quotedstring| number) wspace;
        decvar = <'HEY CHRISTMAS TREE'> numvar wspace init-val;
        if = <'BECAUSE I\'M GOING TO SAY PLEASE'> numvar statement* else-if* end-if;
        else-if = <'BULLSHIT'>wspace statement*;
        <end-if> = <'YOU HAVE NO RESPECT FOR LOGIC'> wspace;
        <numvar> = wspace (number | variable | zero | false | true);
        zero = <'@I LIED'>;
        init-val = <'YOU SET US UP'> (numvar|bool) wspace;
        bool = wspace (true|false);
        true = '@NO PROBLEMO';
        false ='@I LIED';
        number = #'-?[0-9]+';
        variable = #'[a-zA-Z][a-za-z0-9]*';
        quotedstring = #'"._"';
        logical-op = wspace (gt | or | and | equalto );
        arithmetic-op = wspace ( plus | minus | mult | div | modulo) ;
        modulo = <'I LET HIM GO'> numvar;
        plus = <'GET UP'> numvar;
        minus = <'GET DOWN'> numvar;
        mult = <'YOU\'RE FIRED'> numvar;
        div = <'HE HAD TO SPLIT'> numvar;
        gt = <'LET OFF SOME STEAM BENNET'> numvar;
        or = <'CONSIDER THAT A DIVORCE'> numvar;
        and = <'KNOCK KNOCK'> numvar;
        equalto = <'YOU ARE NOT YOU YOU ARE ME'> numvar;
        <end-main> = <'YOU HAVE BEEN TERMINATED'> ;
        method-declaration = <'LISTEN TO ME VERY CAREFULLY'> numvar wspace
        method-arg_
        ( non-void-method* | statement* wspace end-method) wspace;
        non-void-method = <'GIVE THESE PEOPLE AIR'> wspace (method-statement|method-return)\* end-method wspace;
        method-return = <'I\'LL BE BACK'> (numvar | quotedstring)+ wspace ;
        return = <'I\'LL BE BACK'> (numvar | quotedstring)? wspace ;
        method-arg = <'I NEED YOUR CLOTHES YOUR BOOTS AND YOUR MOTORCYCLE'> wspace variable wspace;
        <end-method> = <'HASTA LA VISTA, BABY'>;

(def arnoldc (insta/parser arnold-grammar))

This is where the parsing magic happens. Passing in a string will parse it into hiccup structures according to the Arnoldc grammar that I defined.

example usage is here

(def transform-ops ,,,)

This map was created to transform the parsed elements that match clojure keyswords. Early versions of the transform simply returned values 0, 1 for @I LIED and @NO PROBLEMO that were not in the hiccup style [:key value] . I needed these specific elements to conform to a hiccup shape so that I could evaluate it in a consistent way when I wrote my interpreter. For clarity:

I found that the original version(returning just the value 0 or 1) made the interpreter much more complicated as I was having to check which kind of structure I was receiving after recursing down to the returned value. By adjusting the transform to return a hiccup structure I was able to remove the complex validation checks I was building due to the keyword that told me what I was dealing with. This allowed me to continue to rely on multimethods that were guaranteed to get the correct structure and use the same recursive function for all the tokens.

(defn lights-camera-action ,,,)

This wraps the Arnoldc instaparse parser and is what should be used to parse Arnoldc strings.

The interpreter.clj

The interpreter is mostly a multimethod that is dispatched on the first element within a hiccup structure(a clojure keyword). A global symbol-table is used to hold variables and state. I will describe a some of the more interesting items below.

(def symbol-table (atom {}))

This atom is a map that holds the state for an Arnoldc program.

DISCLAIMER: I made a trade-off in my design with how I implemented garbage collection…. I haven’t done it yet. Completing a program will clear the symbol-table, however running a long enough loop that calls a function with parameters will eventually cause problems. Details on how variables are created can be found below in the (defn transform-method-variables ,,,) section.

(defmulti run (fn [s] (nth s 0)))

This multimethod is the engine that powers the interpreter. It is dispatching on the keywords within each hiccup structure(clojure vector) which is always in the 0 position of each vector.

(defmethod run :Program ,,,)

The Arnoldc language allows methods to be declared before, after, or both before and after the main program. The let form(see below) separates out all the method declarations via the clojure group-by function so that I can define them(by run-statements which will dispatch to multimethods) before they are called within the main Arnoldc program(via (run bmain) ). bmain gets all statements that evaluate to true via group-by, while method-des gets all the method declarations.

I reset the symbol-table when the program completes so that I don’t pollute subsequent runs. To see what can happen, comment out the (reset! symbol-table {}) line and run the tests :). You will find that the symbol table keeps the state around and causes problems between tests that use the same method declarations and variables.

(defmethod run :Program [[e & rest :as statements]]
    (let [ {[bmain] true method-des false}
                    (group-by #(= :begin-main (first %)) rest)]
    (try
        (run-statements method-des)
        (run bmain)
    (catch Exception e (throw e))
    (finally
        (reset! symbol-table {})))))

(defn arithmetic-helper ,,,)

This function handles the Arnoldc logic and arithmetic operators. The thing to note is the following lines of code. case is wrapped in a second set of parenthesis in order to immediately call the returned function from choose-logic-op or choose-op. The function will be invoked with operand and the result of (run varnum-node) as parameters.

A function case, will return a function and be passed and operand the the result of a multimethod dispatch against varnum-node. Higher order functions FTW!

    (recur ((case arith-key
            :logical-op (choose-logic-op operator)
            :arithmetic-op (choose-op operator))
            operand
            (run varnum-node))
        rest)

(defn transform-method-variables ,,,)

This function is called by the :call-method multimethod. I used gensym and the method name to create a prefix to be passed to the transform-method-variables function. I did this as I am storing the variables all in the "global" symbol-table atom. An issue this function resolved was one that I encountered in recursive test cases where I would stomp on variables declared in the methods as they were called multiple times. I created an issue to clean this up.

(defmethod run :call-method ,,,)

Most of the complexity of this code is in the handling of arguments passed to the method(if/when they are passed). new-meth-args gets the same treatment as the variables mentioned in the transform-method-variables function mentioned above and gets a prefix.

(defn roll-credits ,,,)

This function is the preferred way to interpret Arnoldc code. See the tests for an example.

    (roll-credits "IT'S SHOWTIME
        HEY CHRISTMAS TREE var
        YOU SET US UP 123
        YOU HAVE BEEN TERMINATED" )

Pig Latin Arnoldc lexr

As mentioned in the (def tokens ,,,) section above, we have finally reached the section where I describe why I defined the tokens outside of the arnoldc-grammar. The key to the transformation is the update-map function which will transform all the values within a map.

(def pig-latin-arnoldc ,,,)

The following function code will

  1. use the update-map function and take the arnoldc tokens map and return a map with the same keys, but with new pig latin arnoldc values(the qoutes that make up the language) The translate-to-pig-latin function will split a string on whitespace and map the pig-latin function to all the strings from the split and then join them together to form a pig latin string.
  2. pass the "new" map of the pig latin arnoldc language(described in #1) to the arnold-grammar function. Since the keywords stay the same, the function will simply pull out the new values mapped to the original arnoldc keys and insert them into the grammar.
  3. Finally insta/parser will return an executable parser based on the pig-latin grammar.
   (def pig-latin-arnoldc
       (-> (update-map arnie/tokens translate-to-pig-latin);#1
            (arnie/arnold-grammar);#2
            (insta/parser)));#3

(defn ights-camera-actionlay ,,,)

This function is the pig latinified lights-camera-action function from the arnoldc lexr.

  1. pass the arnoldc lexr’s parser function the pig latin transformed string(s) represented by expr.
  2. transform the parsed hiccup datastructures that match the keys within the transform-ops from the arnoldc lexr. This continues to work without modification because the map keys remain the same as the orignal arnoldc definitions(we only updated the values :D).
  3. catch any thrown parse errors
   (defn ights-camera-actionlay [& expr]
   "interpret pig-latin text"
   (try (->> (arnie/parser pig-latin-arnoldc (clojure.string/join expr));#1
             (insta/transform arnie/transform-ops));#2
   (catch Exception e
        (throw (Exception. (str "EREWHAY ETHAY UCKFAY IDDAY IWAY OGAY ONGWRAY?" (.getMessage e)))))))

Conclusion

I was really satisfied with how quickly instaparse allowed me to create an interpreter. The hiccup structures were easy to work with and clojure is a joy to code in. I will certainly be keeping it within my toolbox and I encourage you to try it out.