Sunday, December 07, 2008

ANTLR and code generation

In between packing (I'm moving across town) I'm doing a bit of work for Tapestry 5.1, TAP5-79: Improve Tapestry's property expression language to include OGNL-like features. People really miss being able to do a few cool things in OGNL, such as create lists and maps on the fly ... this is not uncommon when creating a page activation context.

The 5.0 code was based on regular expressions and hand parsing because it only supported a very limited number of options. For 5.1, the grammar will grow considerably, adding options for list and map creation, method invocation (with parameters), and perhaps property projection and list filtering. Hand-tooled parsers aren't going to keep up, so it was time to switch to a more complete solution.

I ended up choosing ANTLR because it seems well supported, has a book and good online documentation, and a set of supporting tools. ANTLR is used elsewhere as well, for example by Hibernate to parse HQL.

There is even decent support for ANTLR with Maven (while Tapestry still builds with Maven, something I hope to address soon). Because of this, I only check my grammar files into SVN, not the generated files; on the continuous integration server, the ANTLR plugin generates the lexer and parser code fresh for each build.

The only real down-side is the runtime dependency ... about 113K and problematic if Tapestry is ever combined with some other tool that has a dependency on a different and incompatible version. Hibernate (for better or worse) uses the ANTLR2 runtime library, which uses different package names.

My first step was to re-create Tapestry 5.0's behavior on top of ANTLR. Because of some complexity in the lexical part of the grammar (that darn ".." operator!) it took quite a bit of head bashing. I did eventually figure it out, and did what any self-respecting coder should do ... leave a simple, useful, documented example for the next poor slob.

Now I'm back on the code generation side. Tapestry's property expressions are converted directly into bytecode; the intermediate language is the source dialect that Javassist accepts, which is a significant subset of Java. So I parse the property expressions into an AST (abstract syntax tree), then generate what looks like Java code from that, which gets compiled in-process and turned directly into instantiable classes.
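To make the "AST to Java-like text" step concrete, here is a minimal sketch in plain Java. It is not Tapestry's actual generator: the class name GetterSourceGenerator, the flat list standing in for the AST, and the parameter name root are all assumptions made for illustration. It only produces the source string; handing that string to Javassist for compilation is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: turns a dotted property expression into Java-like source text. */
final class GetterSourceGenerator {

    /** "Parses" the expression; a flat list of property names stands in for the AST. */
    static List<String> parse(String expression) {
        List<String> terms = new ArrayList<>();
        for (String term : expression.split("\\.")) {
            String trimmed = term.trim();
            if (trimmed.isEmpty()) {
                throw new IllegalArgumentException("Empty term in expression: " + expression);
            }
            terms.add(trimmed);
        }
        return terms;
    }

    /** Walks the terms and emits a chain of getter calls rooted at a cast of the parameter. */
    static String generate(String rootType, List<String> terms) {
        StringBuilder body = new StringBuilder("return ((" + rootType + ") root)");
        for (String term : terms) {
            body.append(".get")
                .append(Character.toUpperCase(term.charAt(0)))
                .append(term.substring(1))
                .append("()");
        }
        return body.append(";").toString();
    }
}
```

For the expression address.city and a root type of User, generate produces the text: return ((User) root).getAddress().getCity();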

How would you test something like that? At one time, I would try to unit test that the generated code was correct. Eventually I hit some bugs where my tests passed, but the generated code was incorrect.

With code generation, there is no such thing as a unit test; it's always an integration test. You can try to limit the scope, but there are too many moving parts for a unit test to be useful or credible.

Instead, I test my parsing and code generation logic by testing the generated objects' behavior. So I feed in a large number of expressions and objects to have expressions evaluated upon, and check that the results I get by reading and setting property expressions are correct. If I get the right results, I know the generated code is good.
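Roughly, such a behavior-level check looks like the sketch below. Everything in it is hypothetical: ReflectiveConduit is a reflection-based stand-in for the generated class (the real thing is compiled bytecode, not reflection), and Person and Address are made-up beans. The point is only the shape of the test: build a conduit from an expression string, then assert on what reading and updating through it actually does.

```java
import java.lang.reflect.Method;

/** Hypothetical stand-in for a generated class: reads and updates one property path. */
final class ReflectiveConduit {
    private final String[] terms;

    ReflectiveConduit(String expression) {
        this.terms = expression.split("\\.");
    }

    private static String accessor(String prefix, String property) {
        return prefix + Character.toUpperCase(property.charAt(0)) + property.substring(1);
    }

    private static Object read(Object target, String property) {
        try {
            return target.getClass().getMethod(accessor("get", property)).invoke(target);
        } catch (ReflectiveOperationException ex) {
            throw new IllegalStateException("Cannot read " + property, ex);
        }
    }

    /** Follows the getters up to, but not including, the final term. */
    private Object owner(Object root) {
        Object current = root;
        for (int i = 0; i < terms.length - 1; i++) {
            current = read(current, terms[i]);
        }
        return current;
    }

    Object get(Object root) {
        return read(owner(root), terms[terms.length - 1]);
    }

    void set(Object root, Object value) {
        Object target = owner(root);
        String setter = accessor("set", terms[terms.length - 1]);
        try {
            for (Method method : target.getClass().getMethods()) {
                if (method.getName().equals(setter) && method.getParameterCount() == 1) {
                    method.invoke(target, value);
                    return;
                }
            }
        } catch (ReflectiveOperationException ex) {
            throw new IllegalStateException("Cannot update " + setter, ex);
        }
        throw new IllegalStateException("No setter named " + setter);
    }
}

/** Made-up beans to evaluate expressions against. */
class Address {
    private String city;
    public String getCity() { return city; }
    public void setCity(String city) { this.city = city; }
}

class Person {
    private final Address address = new Address();
    public Address getAddress() { return address; }
}

class ConduitBehaviorCheck {
    public static void main(String[] args) {
        Person person = new Person();
        ReflectiveConduit conduit = new ReflectiveConduit("address.city");

        // Assert on observable behavior only, never on the generated source text.
        conduit.set(person, "Portland");
        if (!"Portland".equals(person.getAddress().getCity())) throw new AssertionError("set");
        if (!"Portland".equals(conduit.get(person))) throw new AssertionError("get");
    }
}
```

Swapping the stand-in for the real generated class leaves the assertions untouched, which is why this style of test survives changes to the generator.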

7 comments:

Anonymous said...

take a look at javacc, https://javacc.dev.java.net/, if you want a runtime-dep free option (there's also a book and maven plugin :)

Yves Zoundi said...

JavaCC is nice but it lacks books and documentation compared to ANTLR.
ANTLR can target many programming languages, while JavaCC is Java-centric. However, JavaCC doesn't need any runtime dependencies. I also find lexer state easier to work with in JavaCC, for something similar to incremental lexing.

Testing can be problematic but you might be able to add basic unit tests to integration tests. You could test the contents of simple expected generated code strings. You could load the generated code using the Java Compiler API or maybe some Beanshell.

Maven has its annoyances and problems, but when I look at what it can do compared to many other tools, I am truly convinced that Maven doesn't suck too much :-). Startup time, runtime, IDE support and documentation improved a lot since Maven 1.x. Ant, Sant, Ivy, Forrest, etc. also have their issues and/or limitations.

Yves Zoundi
VFSJFileChooser : http://vfsjfilechooser.sourceforge.net
XPontus XML Editor : http://xpontus.sf.net
Blog : http://yveszoundi.blogspot.com

Tiggr said...

Howard,

Is it because OGNL is

broken + unmaintained == dead

that you are re-implementing OGNL in Tapestry-5?

Regards,
Pieter Schoenmakers

Unknown said...

OGNL isn't broken and it isn't unmaintained (Jesse Kuhnert has taken over development of it). However, OGNL has performance issues: it was built for JDK 1.2, so it can't take advantage of JDK 1.5 concurrency features and it has a few choke points. And it is (despite Jesse's changes) still quite reflection based. Finally, Tapestry needs access to annotations of the bound property (this drives default validation and many other things).

Tiggr said...

Howard,

Thank you for replying.

I understand how you need certain features that OGNL currently does not provide. Of course, I am surprised that you rewrite something similar to OGNL from the ground up instead of building on OGNL.

May I use this opportunity to explain why OGNL is dead?

- The compiler is broken. It evaluates expressions during compilation, except if the expression refers to Iterator.next(). As if that is the only kind of method with a side effect.

- The compiler compiles an OGNL expression into a Java expression. This approach is not correct. It translates "a || b" to "booleanValue (a) ? a : b", which is incorrect if a has side effects.

- The web site ognl.org is dead. The latest news is from 2006 and the latest OGNL version is not announced there.

- The current site seems to be http://www.opensymphony.com/ognl/. The download page is empty. The source repository does not contain the sources to the 2.7.3 version that is available in maven.

- There is a user forum. There are 63 messages in 29 threads. The best read ones have >800 reads. That sounds like a lot of people looking for information. Yet they do not participate and presumably turn away (disappointedly). The developer forum has similar numbers.

- There are 151 issues in the issue tracker. 16 are open. Only 4 have been touched the last 3 months.
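
The side-effect problem in the second point can be reproduced in a few lines of plain Java. This is only an illustration of the translation shape described there, not OGNL's actual generated output; the class ShortCircuitDemo and its methods are hypothetical names.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical demo: pasting an operand's text in twice evaluates it twice. */
final class ShortCircuitDemo {
    static int calls = 0;

    /** An operand with a side effect: it counts how often it is evaluated. */
    static boolean a() {
        calls++;
        return true;
    }

    static boolean b() {
        return false;
    }

    /** Identity helper mirroring the booleanValue(...) call in the translation. */
    static boolean booleanValue(boolean value) {
        return value;
    }

    /** The source expression as written: a() runs exactly once. */
    static boolean direct() {
        return a() || b();
    }

    /** The translated shape "booleanValue (a) ? a : b": a() runs twice when it is true. */
    static boolean translated() {
        return booleanValue(a()) ? a() : b();
    }

    /** Resets the counter, evaluates the expression, and reports how often a() ran. */
    static int countCalls(BooleanSupplier expression) {
        calls = 0;
        expression.getAsBoolean();
        return calls;
    }
}
```

Here countCalls(ShortCircuitDemo::direct) reports one evaluation of a(), while countCalls(ShortCircuitDemo::translated) reports two.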

Despite that it is being maintained, as you say, it looks pretty dead.

I switched a big application from T3 to T4 this summer and I am utterly disappointed. The reason to switch was the portlet support in T4. Yet, out of the box, Tapestry 4 can not run portlets, as I reported in TAPESTRY-2548 on August 1 this year. That issue is still unresolved.

One of the best things of open source is that one can fork. I have forked Tapestry-4 and OGNL for my own use since I can no longer be bothered to report issues to a project that is not maintained. Yes, my T4 runs portlets smoothly and, yes, my T4 does not compile any OGNL expression and, yes, my T4 runs without swamping exceptions on the console output.

Drop me a note if you're interested in the patches.

Regards,
Pieter Schoenmakers

PS Howard, please adjust http://tapestry.apache.org. It says that T4.1 is active. Which it isn't. T4 is broken + unmaintained == dead too.

Renat Zubairov said...

It would be even more interesting to see how good your unit tests' coverage of the _generated_ code is :)
I mean, if the AST-to-text transformation generates some if statements, then they should be tested multiple times, once for each condition inside. What do you think about running Cobertura on the code you submit to Javassist to see how well your unit tests cover it?

Unknown said...

Renat,

Code coverage is actually pretty good for the generated code; this is the current nightly coverage report.

I've been focusing on ensuring that all the code in PropertyConduitSourceImpl is tested.

A lot of the additional code concerns error recovery, something I haven't started in on yet. I'm actually fine with making the parser very rigid, as long as it can report the error properly.