Parsing for PHP developers – Part III

JSON Parser

If you haven’t read Part 1 or Part 2, they are there for the reading.

I’m going to demo a JSON parser in this post. It’s 100% native PHP code, and is based on the work I’ve done toward my ultimate goal of a full JavaScript parser.

Click here to play with the interactive JSONParser demo

I thought I’d get this example online now, as my ultimate goal is taking longer than I had hoped. I shan’t go into the details; suffice it to say that the JSON grammar below is a very tiny subset of the full JavaScript grammar and doesn’t really have any complex rules.


Here’s the JSON grammar I put together.

<JSON_OBJECT_LITERAL>
	: "{" "}"
	| "{" <JSON_PROP_LIST> "}"
	;

<JSON_PROP_LIST>
	: JSON_STRING_LITERAL ":" <JSON_LITERAL>
	| <JSON_PROP_LIST> "," JSON_STRING_LITERAL ":" <JSON_LITERAL>
	;

<JSON_LITERAL>
	: JSON_STRING_LITERAL
	| JSON_NUMERIC_LITERAL
	| <JSON_ARRAY_LITERAL>
	| <JSON_OBJECT_LITERAL>
	| "true"
	| "false"
	| "null"
	;

<JSON_ARRAY_LITERAL>
	: "[" "]"
	| "[" <JSON_ELEMENT_LIST> "]"
	;

<JSON_ELEMENT_LIST>
	: <JSON_LITERAL>
	| <JSON_ELEMENT_LIST> "," <JSON_LITERAL>
	;

The grammar notation of the full JavaScript language may only be about 12 times the size of the JSON grammar above, but the parse table it generates is hundreds of times bigger. The JSON parse table was generated in just a few milliseconds and the PHP source code for the table alone is about 3K. In comparison, my current JavaScript parse table generator takes about 7 minutes and the table source code is about 800K.

Anyhow, I digress. The purpose of showing off the JSON parser is to underline the usefulness of the parse tree, as I touched on in Part 2, and of course to make it relevant to PHP :)

Parse node classes

Each type of node in the parse tree is assigned its own PHP class, which extends a vanilla flavoured node class. You can manipulate these nodes much as you would an XML or DOM tree. Most importantly, you can write custom routines to evaluate them. When you evaluate the root node you begin a recursive procedure that ultimately gives you a value or object representing the whole structure. In this case that is an associative array: the deserialized JSON object.

These nodes don’t need much code either. For example, the terminal symbol JSON_NUMERIC_LITERAL has a class assigned to it whose evaluate method simply returns its string value as a native PHP number. The nodes for JSON_OBJECT_LITERAL and JSON_ARRAY_LITERAL are obviously a bit more complex, but I’m sure you get the idea. It doesn’t take much imagination to see that once the parser has given you a tree, it’s very easy to hook in whatever logic you want.
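To make the idea concrete, here is a rough sketch of what such node classes might look like. None of this is the parser's actual code: the class names (ParseNode, NumericLiteralNode and friends) are hypothetical, the tree is built by hand rather than by a parser, the list nodes are assumed to be flattened already, and string tokens are assumed to arrive unquoted and unescaped.

```php
<?php
// Hypothetical node classes; names and layout are assumptions for illustration.

// Vanilla base node: holds child nodes (non-terminals) or raw token text (terminals).
class ParseNode {
    public $children;
    public $value;

    public function __construct(array $children = array(), $value = null) {
        $this->children = $children;
        $this->value = $value;
    }

    // Default behaviour: hand back the raw token text (used here for strings).
    public function evaluate() {
        return $this->value;
    }
}

// JSON_NUMERIC_LITERAL: turn the token text into a native PHP int or float.
class NumericLiteralNode extends ParseNode {
    public function evaluate() {
        return $this->value + 0;
    }
}

// "true", "false" and "null" map onto the PHP values of the same name.
class KeywordNode extends ParseNode {
    public function evaluate() {
        $map = array('true' => true, 'false' => false, 'null' => null);
        return $map[$this->value];
    }
}

// JSON_ARRAY_LITERAL: evaluate each element in order.
class ArrayLiteralNode extends ParseNode {
    public function evaluate() {
        $out = array();
        foreach ($this->children as $child) {
            $out[] = $child->evaluate();
        }
        return $out;
    }
}

// JSON_OBJECT_LITERAL: children are assumed to alternate key node, value node.
class ObjectLiteralNode extends ParseNode {
    public function evaluate() {
        $out = array();
        for ($i = 0; $i + 1 < count($this->children); $i += 2) {
            $key = $this->children[$i]->evaluate();
            $out[$key] = $this->children[$i + 1]->evaluate();
        }
        return $out;
    }
}

// Hand-built tree standing in for the parse of {"a":[1,2.5,true],"b":null}
$tree = new ObjectLiteralNode(array(
    new ParseNode(array(), 'a'),
    new ArrayLiteralNode(array(
        new NumericLiteralNode(array(), '1'),
        new NumericLiteralNode(array(), '2.5'),
        new KeywordNode(array(), 'true'),
    )),
    new ParseNode(array(), 'b'),
    new KeywordNode(array(), 'null'),
));

// Evaluating the root recurses through the whole tree.
$result = $tree->evaluate();
```

Evaluating the root calls evaluate on each child in turn, so $result ends up as a nested associative array. Swapping in a different set of evaluate methods would give you a different result from the same tree, which is the real appeal of the approach.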

This was only an academic exercise, particularly as PHP has shipped with a JSON extension enabled by default since version 5.2. The built-in json_decode function is much faster than my parser, and of course my parser only works in one direction: it decodes JSON but cannot encode it. Still, it shows that if an extension doesn’t exist for the format you want to parse, it is possible to write a parser in native PHP.
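For comparison, this is roughly how the built-in extension covers both directions. The sample JSON string is made up for the example; json_decode, json_encode and json_last_error are the extension's own functions.

```php
<?php
// Made-up sample input for illustration.
$json = '{"name":"parser","tags":["php","json"],"stable":false,"size":3}';

// Passing true as the second argument returns associative arrays instead of objects.
$data = json_decode($json, true);

// json_decode returns null on failure, but null is also a valid JSON value,
// so check json_last_error() to tell the two apart.
if ($data === null && json_last_error() !== JSON_ERROR_NONE) {
    echo 'Invalid JSON', PHP_EOL;
} else {
    echo $data['tags'][1], PHP_EOL;
}

// The other direction, which my parser does not attempt.
$roundTrip = json_encode($data);
```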

As is usual for me, this blog has very little direction and I am not sure what the topic of the next post will be, or if there was even a topic for this one. However, my goals for this body of work are clear and I look forward to demonstrating a fully working JavaScript parser some day soon. I will also release some code eventually – honest.