jParser and jTokenizer released

After nearly two years I’ve finally gotten around to releasing my PHP JavaScript parser, although documentation is still thin on the ground.

Download jParser 1.0.0 (recommended)
Download jParser devel package (Full source and build scripts)
See the library examples running at apps.timwhitlock.info/jparser

The library has been split in two:

jTokenizer – A JavaScript tokenizer designed to mimic the PHP tokenizer.
jParser – The full-blown JavaScript syntactical parser, which generates a parse tree.

The reason for the split is that for most purposes where you think you need a parser, you in fact just need a tokenizer. The tokenizer library is about 15KB, whereas the parser is over 700KB (minified), so you can see why you might not want to include it unnecessarily.

The library files jparser.php and jtokenizer.php are self-contained, minified files for production use. If you wish to inspect or modify the code you will need to download the devel package. This package provides a build script which collates the libraries into their distributable files.

jTokenizer

Possible uses for the tokenizer include code highlighting and simple manipulation of JavaScript source code.

The main function you will want to use is j_token_get_all, which behaves the same as the PHP token_get_all function except that each token carries a column number as well as a line number. There is also j_token_name, the counterpart of the PHP token_name function.
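Because j_token_get_all is described as mirroring token_get_all, the shape of PHP's own token stream gives a feel for what to expect. The sketch below uses only PHP built-ins so it runs without the library; describe_tokens is a hypothetical helper name. It is an assumption, not something confirmed here, that jTokenizer's tokens have the same layout with a column number appended.

```php
<?php
// Hypothetical helper: flattens PHP's token stream into [name, text, line] rows.
// Assumption: j_token_get_all() returns a similar structure for JavaScript
// source, with an extra column number on each token.
function describe_tokens(string $source): array
{
    $rows = [];
    $line = 1;
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            // Array tokens are [type id, source text, line number].
            list($type, $text, $line) = $token;
            $rows[] = [token_name($type), $text, $line];
        } else {
            // Single-character symbols such as ";" arrive as plain strings
            // with no line number, so reuse the last line seen (approximate).
            $rows[] = [$token, $token, $line];
        }
    }
    return $rows;
}

foreach (describe_tokens('<?php echo 1;') as $row) {
    list($name, $text, $line) = $row;
    printf("%-14s %-10s line %d\n", $name, json_encode($text), $line);
}
```

With the library loaded, the same loop should presumably work after swapping in j_token_get_all and j_token_name, but check the token layout in the devel package first.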

jParser

This is a full, syntactical parser. On its own it simply generates a parse tree which can be traversed and manipulated. There is no proper documentation on this yet, but take a look at the node classes in the devel package if you are serious about doing something useful with this parser.

Some other notes in no particular order

The full parser uses a lot of juice. I recommend giving PHP loads of memory, and be careful what you throw at it if you’re going to run it on a production server.
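As a rough sketch of what "loads of memory" might look like in practice, the limit can be raised for a single script and the peak usage measured afterwards. The 512M figure is an arbitrary illustration, not a tested recommendation.

```php
<?php
// Assumption: 512M is an illustrative value only; tune it against real inputs.
ini_set('memory_limit', '512M');

// ... tokenize or parse here ...

// Report the high-water mark so the limit can be adjusted with evidence.
printf("Peak memory: %.1f MB\n", memory_get_peak_usage(true) / 1048576);
```

Setting the limit per script, rather than in php.ini, keeps the higher ceiling away from the rest of a production site.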

A parser is not an interpreter or a JavaScript engine. If you want to develop such a thing in PHP you might be insane, but it could be done with this parser as a base.

The JParser parse tree is purposefully not a full tree; it collapses redundant nodes to save memory. If you want to see a full tree then take a look at the JParserRaw class (devel package required).

Splitting the parsing process into two parts (tokenize/parse) is probably not the most efficient approach and probably uses more memory than the alternatives. However, I figured it would be neat to mimic the PHP tokenizer functionality so that parsers could be built that take a stream of PHP tokens.

17 thoughts on “jParser and jTokenizer released”

Your program will be very useful. Thank you very much!

@Peter Could you possibly send me an example of a file that hangs? [ tim at this domain ]

Although I don’t actively maintain this project I use the tokenizer in another project https://github.com/timwhitlock/php-commonjs and would like to check for the bug.

Hi
Nice utility. I have used the tokeniser on some large files and it may be overloading the heap because it seems to hang. The workaround I used was to feed it one line at a time and, instead of throwing errors (due to syntax like incomplete comments), add an extra line until it becomes valid syntax. In this way the string to tokenise is kept short. I’m not sure if I have explained it well. There is probably a better way.

Still it is a very nice and useful tool. Thanks a bunch.

I used the Tokenizer in a recent project, it works great. Thanks a bundle for sharing!

Hello, how can I replace one function with another in JS code?

@pear
I fixed it. Changing line 74 of JTokenizerBase.php to $this->regNumber = '/^(?:0x[A-F0-9]+|\d*\.\d+(?:E(?:\+|-)?\d+)?|\d+(?:E(?:\+|-)?\d+)?)/i'; solves the bug.
There is also a bug where the regular expression 'm' modifier is not supported.
This can be solved by changing [gi] to [gim] on lines 54 and 64 of JTokenizerBase.php.
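A quick way to sanity-check that number pattern, written with its backslash escapes (\d, \., \+) in place, is to run it against a few literals outside the library. This is only a sketch; leading_number is a hypothetical helper and is not part of jTokenizer.

```php
<?php
// Number-literal pattern from the comment, with backslash escapes in place.
$regNumber = '/^(?:0x[A-F0-9]+|\d*\.\d+(?:E(?:\+|-)?\d+)?|\d+(?:E(?:\+|-)?\d+)?)/i';

// Hypothetical helper: returns the numeric literal at the start of $src, or null.
function leading_number(string $pattern, string $src): ?string
{
    return preg_match($pattern, $src, $m) === 1 ? $m[0] : null;
}

// Exponent, hex, leading-dot and signed-exponent forms, plus a non-number.
foreach (['8e3;', '0xFF', '.5', '1.5e-3', 'abc'] as $src) {
    var_dump(leading_number($regNumber, $src));
}
```

The third alternative, \d+ followed by an optional exponent, is the one that lets an integer with an exponent such as 8e3 match as a single literal.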

Great code!
but it can’t parse code like the line below.

var o = 8e3;

Could you fix and upgrade it?

Yep, I use Closure myself nowadays. It’s great

@will Farrell
Never mind, friend just told me about Google Closure Compiler

I was looking around for some code to substitute variables with shorter names to decrease the file size of my JS files. Your obfuscate feature works awesome! I was wondering if there is a toggle to prevent it from changing the function names, and to not add a leading $? Thanks.

I’d have to refactor it to return rather than echo. I’m not actively working on this at the moment.
I suggest you use PHP output buffering to achieve this for now.
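A minimal sketch of that output-buffering approach, using only PHP built-ins. Both print_result (a stand-in for whatever library call echoes its result) and capture_output are hypothetical names introduced for illustration.

```php
<?php
// Hypothetical stand-in for a library call that echoes instead of returning.
function print_result(): void
{
    echo 'var a=1;';
}

// Runs $fn and returns whatever it echoed as a string.
function capture_output(callable $fn): string
{
    ob_start();
    try {
        $fn();
        return ob_get_contents();
    } finally {
        // Always discard and close the buffer, even if $fn throws.
        ob_end_clean();
    }
}

$code = capture_output('print_result');
```

After this runs, $code holds the echoed text and nothing has been sent to the browser.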