If you haven’t read Part 1, or Part 2 they are there for the reading.
Click here to play with the interactive JSONParser demo
In part 1 I introduced and demonstrated the parsing concept using a very simple date parser. In this part I am going to talk about the important role of tokenizing. If you haven’t read part 1 this may not make much sense, so read it now if you haven’t already.
Looking again at the simple grammar of part 1. You may notice that the rule:
<D_DIGIT> ::= "0" | "1" ... "9" is a bit different to all the others. It does not really contribute to the syntax of our language, it merely describes the legal characters that make up a single digit. It is convenient to view this aspect of the language as a subset of the grammar; one that is concerned only with what input ‘looks like’ rather than where it appears. This can be called the lexical grammar. The rest of the language which is concerned with syntax can be called the syntactical grammar. Continue reading…
Parsing is a fairly common word in the web developer’s vocabulary. We do it all the time. One immediately thinks of XML as something we parse regularly without batting an eyelid. As a PHP developer you might also parse an ini file with parse_ini_file, or parse a date string with strtotime. Whatever language you write, these tasks are easily achieved using either built-in functions or by installing other code libraries or extensions. Sometimes you may find yourself needing to parse something more bespoke, like say a postcode – you’ll either write a routine yourself, or do some googling for a neat algorithm someone out there has decided to share. – no problem.
A rod for my back