If you haven’t read Part 1 or Part 2, they are there for the reading.
I’m going to demo a JSON parser in this post. It’s 100% native PHP code, and is based on the work I’ve done toward my ultimate goal of a full JavaScript parser.
I thought I’d get this example online now, as my ultimate goal is taking longer than I had hoped. I shan’t go into the details; suffice to say that the JSON grammar below is a very tiny subset of the full JavaScript grammar and doesn’t really have any complex rules.
In part 1 I introduced and demonstrated the parsing concept using a very simple date parser. In this part I am going to talk about the important role of tokenizing. If you haven’t read part 1 this may not make much sense, so read it now if you haven’t already.
Looking again at the simple grammar of part 1, you may notice that the rule <D_DIGIT> ::= "0" | "1" ... "9" is a bit different to all the others. It does not really contribute to the syntax of our language; it merely describes the legal characters that make up a single digit. It is convenient to view this aspect of the language as a subset of the grammar: one that is concerned only with what input ‘looks like’ rather than where it appears. This can be called the lexical grammar. The rest of the language, which is concerned with syntax, can be called the syntactical grammar. Continue reading…
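To make the lexical/syntactical split concrete, here is a minimal sketch of a tokenizer that applies only the <D_DIGIT> rule. The function name tokenize and the [type, value] token shape are assumptions made for illustration, not the code from this series. Each digit becomes a D_DIGIT token, and any other character is passed through as a literal token named after itself, leaving questions of order and position to the syntactical grammar.

```php
<?php
// Hypothetical sketch: the function name and token shape are assumptions,
// not the implementation described in this series.

/**
 * Split the input into tokens using only the lexical rule
 * <D_DIGIT> ::= "0" | "1" ... "9".
 *
 * Returns a list of [type, value] pairs.
 */
function tokenize(string $input): array
{
    $digits = '0123456789'; // the legal characters for D_DIGIT
    $tokens = [];
    $length = strlen($input);

    for ($i = 0; $i < $length; $i++) {
        $char = $input[$i];
        if (strpos($digits, $char) !== false) {
            // The character looks like a digit, wherever it appears.
            $tokens[] = ['D_DIGIT', $char];
        } else {
            // Anything else is a literal token named after itself.
            $tokens[] = [$char, $char];
        }
    }

    return $tokens;
}

// Usage: print one token per line for a short date-like input.
foreach (tokenize('25/12') as [$type, $value]) {
    echo $type, ' ', $value, PHP_EOL;
}
```

The tokenizer knows nothing about where a digit is allowed to appear; it only decides what each character looks like. A syntactical rule would then consume the D_DIGIT tokens without caring which of the ten characters each one holds.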
Parsing is a fairly common word in the web developer’s vocabulary. We do it all the time. One immediately thinks of XML as something we parse regularly without batting an eyelid. As a PHP developer you might also parse an INI file with parse_ini_file, or parse a date string with strtotime. Whatever language you write in, these tasks are easily achieved using either built-in functions or by installing other code libraries or extensions. Sometimes you may find yourself needing to parse something more bespoke, like, say, a postcode. You’ll either write a routine yourself, or do some googling for a neat algorithm someone out there has decided to share. No problem.
A rod for my back
But what if you want to parse something really complex, like, say, an entire JavaScript program? What if you can’t find a third-party library that works for you? Well, I tried to find one. I found some very promising projects, but they ranged from abandoned projects, to dodgy alpha releases, to ones that just plain didn’t work and had no documentation to help. The most serious-looking projects were so sophisticated that I didn’t even have the knowledge to start using them. I decided, as I often do, that I needed empowering with the knowledge to write my own parser should I need one for, well, whatever. Continue reading…