Archive

Archive for March 30th, 2008

Parsing for PHP developers – Part II

March 30th, 2008 No comments

In part 1 I introduced and demonstrated the parsing concept using a very simple date parser. In this part I am going to talk about the important role of tokenizing. If you haven’t read part 1 this may not make much sense, so read it now if you haven’t already.

Syntactical vs Lexical

Looking again at the simple grammar of part 1. You may notice that the rule: <D_DIGIT> ::= "0" | "1" ... "9" is a bit different to all the others. It does not really contribute to the syntax of our language, it merely describes the legal characters that make up a single digit. It is convenient to view this aspect of the language as a subset of the grammar; one that is concerned only with what input ‘looks like’ rather than where it appears. This can be called the lexical grammar. The rest of the language which is concerned with syntax can be called the syntactical grammar. Read more…

Tags: ,