Full JavaScript parser for PHP

[ Update 18 Nov 2009 ]

This article is rather old now – the jParser code has been released.

Despite the glorious sunshine during my week off, I managed to put some time into my pet project: a full JavaScript parser written in 100% native PHP. I've actually been developing a generic parser suite for some time. Building a full JavaScript parser with it was my ultimate test that it all works and is powerful enough to be useful. I've written a bunch of blog posts about developing a parser generator in PHP (search the "parsing" tag to find them).

Before I start wittering on, you can play with the online example of jParser.

Here are the main difficulties I encountered while building the JavaScript parser:

1. Performance
Generating the parse table was taking about 30 minutes and using several hundred megabytes of memory. Going back to the drawing board with certain parts of the parse table generator, I’ve managed to get this down to about 7 minutes on my humble Mac Mini.

2. Special rules
The ECMAScript standard specifies certain special cases in the grammar rules. One of particular note (clause 12.4) says that an ExpressionStatement may not begin with a "{" or a "function". This special rule avoids ambiguity and therefore avoids parse table conflicts, but the rule sits outside the grammar proper. I've finally found the right part of the parser architecture in which to implement such rules.
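The following is a minimal sketch of the clause 12.4 check, included to illustrate the kind of out-of-grammar rule meant here. It is not jParser's actual code. The token array shape (`type`, `value`) and the function name `canStartExpressionStatement` are hypothetical. A real parser would consult such a check at the point where it would otherwise choose the ExpressionStatement production.

```php
<?php
// Hypothetical token shape: ['type' => 'punct'|'keyword'|'name'|..., 'value' => string].
// ECMA-262 (3rd ed.) clause 12.4: an ExpressionStatement may not begin
// with "{" or with the keyword "function".
function canStartExpressionStatement(array $token): bool
{
    if ($token['type'] === 'punct' && $token['value'] === '{') {
        return false; // would be ambiguous with a Block
    }
    if ($token['type'] === 'keyword' && $token['value'] === 'function') {
        return false; // would be ambiguous with a FunctionDeclaration
    }
    return true;
}

var_dump(canStartExpressionStatement(['type' => 'punct', 'value' => '{']));          // bool(false)
var_dump(canStartExpressionStatement(['type' => 'keyword', 'value' => 'function'])); // bool(false)
var_dump(canStartExpressionStatement(['type' => 'name', 'value' => 'foo']));         // bool(true)
```

Rejecting these two tokens at the start of a statement leaves "{" to be parsed as a Block and "function" as a FunctionDeclaration. This is why the restriction removes the conflict without changing the grammar itself.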

3. Automatic semicolon insertion
As you probably know just from writing JavaScript, the ECMAScript standard permits the lazy omission of semicolons at the end of some statements, as long as you terminate with a line break instead. This is actually more complex than it sounds, but more to the point, it is another special rule that is not directly a part of the grammar and is handled at parse time.
[UPDATE: Automatic semicolon insertion is now implemented.]
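The heart of the insertion rule (ECMA-262 clause 7.9.1) can be sketched as a fallback that the parser tries only after it meets a token the grammar does not allow. This is a hypothetical illustration, not jParser's implementation: the token shape and the name `shouldInsertSemicolon` are invented for the example. Two parts of the rule are also left out:

- the restricted productions, such as a line break after `return`;
- the exceptions, where no semicolon is inserted if it would create an empty statement or fall inside a `for` header.

```php
<?php
// Minimal sketch of the ASI test in ECMA-262 (3rd ed.) clause 7.9.1.
// Hypothetical token shape: ['value' => string, 'newlineBefore' => bool];
// null stands for "end of input".
// Call this only after the parser has met a token the grammar does not allow.
function shouldInsertSemicolon(?array $offending): bool
{
    if ($offending === null) {
        // Rule 2: input ended before the program was complete.
        return true;
    }
    if ($offending['newlineBefore']) {
        // Rule 1: a line terminator separates it from the previous token.
        return true;
    }
    // Rule 1: the offending token is a closing brace.
    return $offending['value'] === '}';
}

// "a = 1\nb = 2": the parser fails on "b", which follows a line break.
var_dump(shouldInsertSemicolon(['value' => 'b', 'newlineBefore' => true]));
// "a = 1 b = 2": same failure, but no line break, so it stays a syntax error.
var_dump(shouldInsertSemicolon(['value' => 'b', 'newlineBefore' => false]));
```

When the check returns true, the parser behaves as if a ";" token stood before the offending token and resumes parsing. This is why the rule lives in the parse loop rather than in the grammar.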

20 thoughts on “Full JavaScript parser for PHP”

  1. I have a problem where I'm trying to get all cookies from a webpage. Of course I get all cookies that are sent to the browser, but the problem is that I can't get cookies generated by JavaScript within the page. Could I use your jParser and jTokenizer to achieve this?

  2. Would this be able to integrate with cURL and DOMDocument to parse JavaScript from websites fetched using cURL?

  3. Anyway … I implemented this gettext thing using a regexp like Tim suggested. I still hope that this parser library will be available sometime in the future … thanks!

  4. Hello,

    I found your website while searching for a solution to exactly the same problem as Udo. Is it possible to get in contact with him? I would like to know whether he implemented it using a regexp or found another solution.

    I hope you'll find some time for releasing / open-sourcing your parser, Tim :)

  5. Folks – I don't have time to open-source this right now. Really sorry, but the code needs some attention before it's released.

  6. This seems to be a powerful JavaScript parser. But where can I DOWNLOAD it? I didn't find any download information.

  7. Hi!
    I cannot find the subject on your site :( All links are dead ("Can't find the server").
    Where can I find this great parser?
    Thanks!

  8. Hi Udo.
    As this project is not my full time job, I’m not able to act very quickly on this kind of request. I really do appreciate your interest though and I’d love to get some code posted soon. I will email you personally when I get something up, but can’t promise when that will be.

    Looking at what you want to do with it, I'm not sure you need a full parser. I think your requirement could be met with a simpler method, perhaps just running a single RegExp over the whole thing. The following pattern should match the gettext call, assuming a double-quoted string literal is the first argument. Have a go!

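    <?php
    // Example JavaScript source to scan
    $input = '// example js source
    somediv.innerHTML = gettext("Please login first");
    anotherdiv.innerHTML = gettext ( "And another", junk );';

    // Match gettext( "..." ) and capture the contents of the first
    // double-quoted argument; escaped characters such as \" are allowed.
    preg_match_all('/gettext\s*\(\s*"((?:\\\\.|[^\r\n"\\\\])*)"/', $input, $matches);
    var_dump($matches[1]); // "Please login first", "And another"
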
  9. Thanks for your reply.

    I would probably need only a subset of the functions of your compiler. In fact I only need to know which functions in a .js file are being called and which (constant) parameters are passed to them.

    Something like:

    somediv.innerHTML = gettext(“Please login first”);

    The text “Please login first” is what I need.

    Any chance to get your source? I can’t seem to find any other JS parser in PHP…

  10. @Udo It’s not commercial, but there isn’t a public distribution available at the moment.

    I would love to release a public library when I get the time. Perhaps with some more feedback from people over what features you would like to see I could get something going.

  11. Hello,

    I’m very interested in your JS compiler as I’d like to use it to scan JavaScript files for “gettext” tokens and generate .po files out of them.

    I can’t find any download link. Is the compiler commercial, closed-source?

  12. Never mind the sneak preview – check out JASPA [http://jaspa.org.uk/].
    It uses an extended form of this parser to convert ActionScript into regular JavaScript.

  13. Actually the output wasn’t supposed to be XML, it’s just convenient to format it that way. Perhaps I should modify the dump routine to print valid XML.

    A visual representation of the parse tree would be a fun Flash project, but it's not on my list. If you're interested in this kind of thing outside the realm of PHP, Google "ANTLR".

    Incidentally, my real goal for this parser framework is bigger than this parser on its own – check it out: http://web.2point1.com/2008/09/11/jaspa-sneak-preview/

Comments are closed.