Tag Archives: parsing

After nearly two years I’ve finally gotten around to releasing my PHP JavaScript parser, although documentation is still thin on the ground.

The library has been split in two:

  1. jTokenizer – A JavaScript tokenizer designed to mimic the PHP tokenizer.
  2. jParser – The full-blown JavaScript syntactical parser, which generates a parse tree.

Continue reading…

When you learn a programming language, you are unlikely to start by reading the formal specification that defines all the laws of its syntax. You may never read it at all. It is more useful to learn by example, or at least topic by topic. However, a mere ten years after writing my first few lines of JavaScript, I read the ECMAScript standard and it threw up some things I did not know.

There are many things you can write in JavaScript that are perfectly valid syntax, but that you will probably never write. Here are a few that raised an eyebrow or two.

Continue reading…

[ Update 18 Nov 2009 ]

This article is rather old now – the jParser code has since been released.

Despite the glorious sunshine this week, my week off, I managed to put some time into my pet project of developing a full JavaScript parser written in 100% native PHP. Actually, I’ve been developing a generic parser suite for some time, and using it to build a full JavaScript parser was my ultimate goal: proof that it all works and is powerful enough to be useful. I’ve written a bunch of blog posts about developing a parser generator in PHP (click “parsing” to do a tag search).

Before I start wittering on, click here to play with the online example of JParser.

Continue reading…

JSON Parser

If you haven’t read Part 1 or Part 2, they are there for the reading.

I’m going to demo a JSON parser in this post. It’s 100% native PHP code, and is based on the work I’ve done toward my ultimate goal of a full JavaScript parser.

Click here to play with the interactive JSONParser demo

I thought I’d get this example online now, as my ultimate goal is taking longer than I had hoped. I shan’t go into the details; suffice it to say that the JSON grammar below is a tiny subset of the full JavaScript grammar and doesn’t really have any complex rules.

Continue reading…

In part 1 I introduced and demonstrated the parsing concept using a very simple date parser. In this part I am going to talk about the important role of tokenizing. If you haven’t read part 1, this may not make much sense, so read that first.

Syntactical vs Lexical

Looking again at the simple grammar of part 1, you may notice that the rule <D_DIGIT> ::= "0" | "1" ... "9" is a bit different from all the others. It does not really contribute to the syntax of our language; it merely describes the legal characters that make up a single digit. It is convenient to view this aspect of the language as a subset of the grammar, one that is concerned only with what input ‘looks like’ rather than where it appears. This can be called the lexical grammar. The rest of the language, which is concerned with syntax, can be called the syntactical grammar.

Continue reading…
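
The lexical/syntactical split can be sketched in a few lines. This is a rough illustration in JavaScript rather than the PHP of the library itself, and it is not the grammar from part 1: the DD/MM/YYYY layout, the "/" separator, and the names tokenize, isDate and SEPARATOR are all assumptions made for the example. Only <D_DIGIT> is taken from the rule quoted above.

```javascript
// Lexical step: classify each character purely by what it looks like.
// The "/" separator is an assumption made for this illustration.
function tokenize(input) {
  const tokens = [];
  for (const ch of input) {
    if (ch >= "0" && ch <= "9") {
      tokens.push({ type: "D_DIGIT", value: ch });
    } else if (ch === "/") {
      tokens.push({ type: "SEPARATOR", value: ch });
    } else {
      throw new SyntaxError("Illegal character: " + JSON.stringify(ch));
    }
  }
  return tokens;
}

// Syntactical step: check where tokens appear, ignoring which digit each one is.
// The DD/MM/YYYY shape is hypothetical, chosen only to keep the check short.
function isDate(tokens) {
  const shape = tokens
    .map((t) => (t.type === "D_DIGIT" ? "d" : "/"))
    .join("");
  return shape === "dd/dd/dddd";
}
```

With these definitions, isDate(tokenize("25/12/2009")) is true and isDate(tokenize("2009/12/25")) is false, while tokenize("25-12-2009") throws because "-" is not a legal character. The first failure is syntactical (legal tokens in the wrong places); the second is lexical (a character the language does not contain at all).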

Parsing is a fairly common word in the web developer’s vocabulary. We do it all the time. One immediately thinks of XML as something we parse regularly without batting an eyelid. As a PHP developer you might also parse an ini file with parse_ini_file, or parse a date string with strtotime. Whatever language you write in, these tasks are easily achieved using either built-in functions or by installing other code libraries or extensions. Sometimes you may find yourself needing to parse something more bespoke, like, say, a postcode. You’ll either write a routine yourself, or do some googling for a neat algorithm someone out there has decided to share. No problem.

A rod for my back

But what if you want to parse something really complex, like, say, an entire JavaScript program? What if you can’t find a third-party library that works for you? Well, I tried to find one. I found some very promising projects, but they ranged from abandoned projects, to dodgy alpha releases, to ones that just plain didn’t work and had no documentation to help. The most serious-looking projects were so sophisticated that I didn’t even have the knowledge to start using them. I decided, as I often do, that I needed to arm myself with the knowledge to write my own parser, should I need one for, well, whatever.

Continue reading…