I’ve got all this PHP. Now what? — Parsing PHP in Go
Sun, Jul 27, 2014

There’s a lot of PHP out there. The problems plaguing the language are well documented, but many companies and projects are frightfully entrenched in it. Several recent projects aim to improve PHP’s performance and reliability: Hack, php-ng, and the new effort toward a formal language specification. Unfortunately, none of them will directly improve large, existing bodies of PHP code. So when I learned Go and watched Rob Pike’s talk on writing the text/template lexer, I thought it would be fun to try out both the language and the idea on reading PHP. Over the past several months, that whim has grown into a nearly feature-complete PHP parser.
Until now, the goals and milestones for this project have been nebulous at best; it began as a crazy experiment, after all. It currently parses most of the code I throw at it and supports most, but not all, PHP 5.4 features. The most important goal at this point is to move the parser toward solid support for the full set of PHP 5.6 features. As a stretch goal, it would be great if the parser could be set to check code against a specific PHP version. The project currently has 85% test coverage overall and 67% coverage from full unit tests, and I’d like to raise both numbers. I’d also like to improve the parser’s stability when it encounters incorrect code.
I think this is an opportune moment to consider where to take the parser beyond just parsing. Here are a few ideas:
- A phpfmt tool (à la gofmt)
- Static analysis tools (e.g. type inference, dead code detection)
- A transpiler
The transpiler is perhaps my favorite idea, particularly one targeting Go. Thanks to the go/ast package, transpiling into Go may be the most reachable of these goals, despite sounding the loftiest.
All that said, this project began as an experiment, and for the time being, it remains one. I’m happy to open it up to a wider audience, so please feel free to comment or contribute. If you would just like to play with the parser, I have this little tool for testing. The code is available on GitHub.