The reason for the split is that for most purposes where you think you need a parser, you in fact just need a tokenizer. The tokenizer library is about 15KB, whereas the parser is over 700KB (minified), so you can see why you might not want to include it unnecessarily.
The library files, including jtokenizer.php, are self-contained, minified files for production use. If you wish to inspect or modify the code you will need to download the devel package. This package provides a build script which collates the libraries into their distributable files.
The main function you will want to use is j_token_get_all, which behaves the same as the PHP token_get_all function but reports a column number as well as a line number. There is also j_token_name, which works like the PHP token_name function.
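Since j_token_get_all is described as mirroring token_get_all, it may help to see the token shape that PHP's native function returns. The sketch below uses only PHP's built-in token_get_all and token_name on a snippet of PHP source; it does not call the library. The helper name describe_tokens is made up for illustration, and where exactly j_token_get_all places its extra column number is not shown because it is not stated here.

```php
<?php
// Illustration only: PHP's native tokenizer, whose output format
// j_token_get_all is said to mimic for JavaScript source.

// Turn a token list into readable strings (hypothetical helper).
function describe_tokens(array $tokens): array {
    $out = [];
    foreach ($tokens as $token) {
        if (is_array($token)) {
            // Array tokens are [token id, source text, line number].
            [$id, $text, $line] = $token;
            $out[] = sprintf('%s %s (line %d)', token_name($id), var_export($text, true), $line);
        } else {
            // Single-character tokens such as ";" or "+" are plain strings.
            $out[] = sprintf('literal %s', var_export($token, true));
        }
    }
    return $out;
}

$tokens = token_get_all('<?php echo 1 + 2;');
$lines  = describe_tokens($tokens);
echo implode("\n", $lines), "\n";
```

A parser built on this kind of stream only has to handle the two cases above: array tokens carrying an id, text and position, and bare string literals.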
This is a full, syntactical parser. On its own it simply generates a parse tree which can be traversed and manipulated. There is no proper documentation on this yet, but take a look at the node classes in the devel package if you are serious about doing something useful with this parser.
Some other notes in no particular order
The full parser uses a lot of juice. I recommend giving PHP loads of memory, and being careful about what you throw at it if you're going to run it on a production server.
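One way to give PHP more memory for a single script is to raise memory_limit at runtime before parsing. This is a rough sketch using the standard ini_get and ini_set functions; the helper names ini_bytes and ensure_memory_limit are made up for illustration, and the 512M figure is an arbitrary assumption, not a measured requirement of the parser.

```php
<?php
// Convert a php.ini shorthand value such as "128M" to bytes; -1 means unlimited.
function ini_bytes(string $value): int {
    $value = trim($value);
    if ($value === '-1') {
        return -1;
    }
    $number = (int) $value; // leading digits only, e.g. "128M" gives 128
    switch (strtoupper(substr($value, -1))) {
        case 'G': return $number * 1024 * 1024 * 1024;
        case 'M': return $number * 1024 * 1024;
        case 'K': return $number * 1024;
        default:  return $number;
    }
}

// Raise memory_limit to at least $wanted; never lower an existing higher limit.
function ensure_memory_limit(string $wanted): void {
    $current = ini_bytes((string) ini_get('memory_limit'));
    if ($current !== -1 && $current < ini_bytes($wanted)) {
        ini_set('memory_limit', $wanted);
    }
}

ensure_memory_limit('512M'); // assumed figure; tune to the scripts you parse
```

On shared or production hosts the limit may be locked by the server configuration, in which case ini_set has no effect and the setting has to be changed in php.ini instead.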
The JParser parse tree is purposefully not a full tree; it collapses redundant nodes to save memory. If you want to see a full tree then take a look at the JParserRaw class (devel package required).
Splitting the parsing process into two parts (tokenize/parse) is probably not the most efficient approach, and it probably uses more memory than an alternative design would. However, I figured it would be neat to mimic the PHP tokenizer functionality so that parsers could be built that take a stream of PHP tokens.