Package weka.core.expressionlanguage Description

A framework for building simple, flexible and performant expression languages.

Introduction & Overview

The weka.core.expressionlanguage package provides functionality to easily create simple languages.

It does so by creating an AST (abstract syntax tree) that can then be evaluated.

At the heart of the AST is the Node interface. It is an empty marker interface: implementing it identifies a type as an AST node.
Because it declares no methods, it places no real constraints on AST nodes, leaving them as much freedom as possible to reflect the abstractions of a program.

To give a common base to build upon, the Primitives class provides subinterfaces for the primitive boolean (Primitives.BooleanExpression), double (Primitives.DoubleExpression) and String (Primitives.StringExpression) types.
It also provides implementations of constants and variables of those types.
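
The idea of a marker interface with typed subinterfaces can be sketched as follows. All types in this snippet (Node, DoubleExpression, BooleanExpression, DoubleConstant, DoubleVariable, Addition) are simplified, hypothetical stand-ins declared in the snippet itself; the actual classes in this package may have different names and signatures.

```java
// Hypothetical stand-ins for illustration, NOT the real framework types.

/** Marker interface: declares no methods, it only tags a type as an AST node. */
interface Node {}

/** A node that evaluates to a primitive double. */
interface DoubleExpression extends Node {
  double evaluate();
}

/** A node that evaluates to a primitive boolean. */
interface BooleanExpression extends Node {
  boolean evaluate();
}

/** Constant: always evaluates to the value fixed at construction time. */
final class DoubleConstant implements DoubleExpression {
  private final double value;

  DoubleConstant(double value) {
    this.value = value;
  }

  @Override
  public double evaluate() {
    return value;
  }
}

/** Variable: its value may change between evaluations of the same AST. */
final class DoubleVariable implements DoubleExpression {
  private double value;

  void setValue(double value) {
    this.value = value;
  }

  @Override
  public double evaluate() {
    return value;
  }
}

/** Inner node for "left + right", built from two child nodes. */
final class Addition implements DoubleExpression {
  private final DoubleExpression left;
  private final DoubleExpression right;

  Addition(DoubleExpression left, DoubleExpression right) {
    this.left = left;
    this.right = right;
  }

  @Override
  public double evaluate() {
    return left.evaluate() + right.evaluate();
  }
}

final class PrimitivesSketch {
  /** Builds the AST for "A + 2.0" by hand and evaluates it for a value of A. */
  static double evaluateAPlusTwo(double a) {
    DoubleVariable variableA = new DoubleVariable();
    DoubleExpression program = new Addition(variableA, new DoubleConstant(2.0));
    variableA.setValue(a);
    return program.evaluate();
  }

  public static void main(String[] args) {
    System.out.println(evaluateAPlusTwo(1.5));
  }
}
```

Because the variable is itself a node, the same AST can be evaluated repeatedly with different values without being rebuilt.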

Most extensibility is achieved through adding macros to a language. Macros allow for powerful meta-programming since they directly work with AST nodes.
The Macro interface defines what a macro looks like.
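
To illustrate what working directly with AST nodes means, here is a minimal sketch of a conditional macro. The Macro interface shown here and the IfElseSketch class are hypothetical stand-ins written for this example; the method name, the varargs signature and the exception type are assumptions, not the framework's actual definitions.

```java
// Hypothetical stand-ins for illustration, NOT the real framework types.
interface Node {}

interface DoubleExpression extends Node {
  double evaluate();
}

interface BooleanExpression extends Node {
  boolean evaluate();
}

/** A macro receives unevaluated AST nodes and returns a new AST node. */
interface Macro {
  Node evaluate(Node... parameters);
}

/** ifelse(condition, a, b): only the selected branch is ever evaluated. */
final class IfElseSketch implements Macro {
  @Override
  public Node evaluate(Node... parameters) {
    // Arity and type checks run once, while the AST is being built.
    if (parameters.length != 3
        || !(parameters[0] instanceof BooleanExpression)
        || !(parameters[1] instanceof DoubleExpression)
        || !(parameters[2] instanceof DoubleExpression)) {
      throw new IllegalArgumentException("ifelse expects (boolean, double, double)");
    }
    BooleanExpression condition = (BooleanExpression) parameters[0];
    DoubleExpression whenTrue = (DoubleExpression) parameters[1];
    DoubleExpression whenFalse = (DoubleExpression) parameters[2];

    // The returned node defers evaluation of both branches until it is evaluated.
    DoubleExpression result =
        () -> condition.evaluate() ? whenTrue.evaluate() : whenFalse.evaluate();
    return result;
  }
}

final class MacroSketch {
  /** Expands ifelse(flag, 1.0, <failing node>) and evaluates the result. */
  static double choose(boolean flag) {
    BooleanExpression condition = () -> flag;
    DoubleExpression one = () -> 1.0;
    // Throws if evaluated, which makes it visible whether this branch was taken.
    DoubleExpression failing = () -> {
      throw new IllegalStateException("else branch was evaluated");
    };
    Node node = new IfElseSketch().evaluate(condition, one, failing);
    return ((DoubleExpression) node).evaluate();
  }

  public static void main(String[] args) {
    System.out.println(choose(true));
  }
}
```

A function that received already evaluated values could not skip the untaken branch; a macro can, because it sees the nodes themselves.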

Variable and macro lookup is done through VariableDeclarations and MacroDeclarations, respectively. Several declarations can be combined through VariableDeclarationsCompositor and MacroDeclarationsCompositor, respectively.
This makes it possible to add built-in variables and powerful built-in functions to a language.
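
The lookup and combination mechanism might be pictured like this. The VariableDeclarations interface below is a simplified, hypothetical stand-in, and MapDeclarations and CompositorSketch are invented for this example. In particular, the "first declaration that knows the name wins" policy is an assumption of the sketch, not a statement about how the framework's compositors resolve conflicts.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration, NOT the real framework types.
interface Node {}

interface DoubleExpression extends Node {
  double evaluate();
}

/** Resolves variable names to AST nodes. */
interface VariableDeclarations {
  boolean hasVariable(String name);

  Node getVariable(String name);
}

/** Map-backed declarations, e.g. for a set of built-in constants. */
final class MapDeclarations implements VariableDeclarations {
  private final Map<String, Node> variables = new HashMap<>();

  MapDeclarations declare(String name, double value) {
    DoubleExpression constant = () -> value;
    variables.put(name, constant);
    return this;
  }

  @Override
  public boolean hasVariable(String name) {
    return variables.containsKey(name);
  }

  @Override
  public Node getVariable(String name) {
    return variables.get(name);
  }
}

/** Combines several declarations (assumed policy: first match wins). */
final class CompositorSketch implements VariableDeclarations {
  private final VariableDeclarations[] parts;

  CompositorSketch(VariableDeclarations... parts) {
    this.parts = parts;
  }

  @Override
  public boolean hasVariable(String name) {
    for (VariableDeclarations part : parts) {
      if (part.hasVariable(name)) {
        return true;
      }
    }
    return false;
  }

  @Override
  public Node getVariable(String name) {
    for (VariableDeclarations part : parts) {
      if (part.hasVariable(name)) {
        return part.getVariable(name);
      }
    }
    throw new IllegalArgumentException("Undeclared variable: " + name);
  }
}

final class LookupSketch {
  /** Looks a name up in built-in and user-defined declarations combined. */
  static double lookup(String name) {
    VariableDeclarations builtIns = new MapDeclarations().declare("PI", Math.PI);
    VariableDeclarations userDefined = new MapDeclarations().declare("A", 4.0);
    VariableDeclarations all = new CompositorSketch(builtIns, userDefined);
    return ((DoubleExpression) all.getVariable(name)).evaluate();
  }

  public static void main(String[] args) {
    System.out.println(lookup("A"));
  }
}
```

Since the compositor implements the same interface as its parts, a parser only ever needs a single declarations object, no matter how many sources of symbols a language has.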

Useful implementations are:

The framework described so far does not touch the syntax of a language. The syntax is seen as a separate element of a language.
If a program is given in a textual representation (e.g. "A + sqrt(2.0)"), that representation determines what the AST looks like. It is therefore the parser's job to build the AST.
There is a parser in the weka.core.expressionlanguage.parser package.
However, the framework allows for other means of constructing an AST if needed.

Built-in operators (+, -, *, / etc.) are a special case: they can be seen as macros, but they are also strongly connected to the parser.
To separate the parser from these special macros, there is the Operators class, to which the parser can delegate operator semantics.
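
The separation between recognizing an operator and deciding what it means could look roughly like this. OperatorsSketch and ParserActionSketch are hypothetical classes written for this example, and the choice to let '+' mean addition for doubles and concatenation for strings is an assumption of the sketch.

```java
// Hypothetical stand-ins for illustration, NOT the real framework types.
interface Node {}

interface DoubleExpression extends Node {
  double evaluate();
}

interface StringExpression extends Node {
  String evaluate();
}

/** Operator semantics live here, outside the parser. */
final class OperatorsSketch {
  private OperatorsSketch() {}

  /** Chooses the meaning of '+' from the types of its operand nodes. */
  static Node plus(Node left, Node right) {
    if (left instanceof DoubleExpression && right instanceof DoubleExpression) {
      DoubleExpression leftDouble = (DoubleExpression) left;
      DoubleExpression rightDouble = (DoubleExpression) right;
      DoubleExpression sum = () -> leftDouble.evaluate() + rightDouble.evaluate();
      return sum;
    }
    if (left instanceof StringExpression && right instanceof StringExpression) {
      StringExpression leftString = (StringExpression) left;
      StringExpression rightString = (StringExpression) right;
      StringExpression concatenation = () -> leftString.evaluate() + rightString.evaluate();
      return concatenation;
    }
    throw new IllegalArgumentException(
        "Operands of '+' must both be double or both be String");
  }
}

/** Stand-in for a grammar action: on "left + right" it only delegates. */
final class ParserActionSketch {
  static Node onPlus(Node left, Node right) {
    return OperatorsSketch.plus(left, right);
  }

  public static void main(String[] args) {
    DoubleExpression two = () -> 2.0;
    DoubleExpression three = () -> 3.0;
    DoubleExpression sum = (DoubleExpression) onPlus(two, three);
    System.out.println(sum.evaluate());
  }
}
```

With this split the grammar action stays a one-liner, and type checking or overloading rules can change without regenerating the parser.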

A word on parsers

Currently the parser is generated with the CUP parser generator and the JFlex lexer generator. While parser generators are powerful tools, they suffer from some unfortunate drawbacks:

The parsers are generated. So there is an additional indirection between the grammar file (used for parser generation) and the generated code.
The grammar files usually have their own syntax which may be quite different from the programming language otherwise used in a project.
In more complex grammars it's easy to introduce ambiguities and unwanted valid syntax.

For these reasons the parser is kept as simple as possible, with as much functionality as possible delegated elsewhere.

Summary

A flexible AST structure is given by the Node interface. The Macro interface allows for powerful meta-programming, which is an important part of the extensibility features. The Primitives class gives a good basis for the primitive boolean, double and String types.
The parser is responsible for building up the AST structure. It delegates operator semantics to Operators. Symbol lookup is done through the VariableDeclarations and MacroDeclarations interfaces, which can be combined with the VariableDeclarationsCompositor and MacroDeclarationsCompositor classes, respectively.

Usage

With the described framework it's possible to create languages in a declarative way. Examples can be found in MathExpression, AddExpression and SubsetByExpression.

A commonly used language is:

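 // imports needed by this snippet
 import weka.core.expressionlanguage.common.IfElseMacro;
 import weka.core.expressionlanguage.common.JavaMacro;
 import weka.core.expressionlanguage.common.MacroDeclarationsCompositor;
 import weka.core.expressionlanguage.common.MathFunctions;
 import weka.core.expressionlanguage.common.Primitives.DoubleExpression;
 import weka.core.expressionlanguage.core.Node;
 import weka.core.expressionlanguage.parser.Parser;
 import weka.core.expressionlanguage.weka.InstancesHelper;

 // exposes instance values and 'ismissing' macro
 InstancesHelper instancesHelper = new InstancesHelper(dataset);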
 
 // creates the AST
 Node node = Parser.parse(
   // expression
   expression, // textual representation of the program
   // variables
   instancesHelper,
   // macros
   new MacroDeclarationsCompositor(
     instancesHelper,
     new MathFunctions(),
     new IfElseMacro(),
     new JavaMacro()
   )
 );

 // type checking is necessary, but allows for greater flexibility
 if (!(node instanceof DoubleExpression))
   throw new Exception("Expression must be of double type!");
    
 DoubleExpression program = (DoubleExpression) node;

History

Previously there were three very similar languages in the weka.core.mathematicalexpression package, the weka.core.AttributeExpression class and the weka.filters.unsupervised.instance.subsetbyexpression package.
Due to their similarities it was decided to unify them into one expression language. However, backwards compatibility was an important goal, which is why there are some quite redundant parts in the language (e.g. both 'and' and '&' are operators for logical and).

Version: $Revision: 1000 $
Author: Benjamin Weber ( benweber at student dot ethz dot ch )