perl's intermediate
representation (IR) and the bare JVM, the more generalized GNU Kawa IR is
used as a companion to perl's IR. The perl IR is massaged into Kawa's
``middle-layer'' IR. In turn, Kawa's IR compiles directly to JVM bytecode.
While Kawa's IR does not hand us a ready-made port of Perl to the JVM, it
does provide more infrastructure than is available with any other system.
Using Kawa's IR, together with the B module, brings us much closer to a
full Perl port to the JVM than has hitherto been possible.
B
module to facilitate direct compilation from the perl intermediate
representation (IR) to JVM assembler. Both these approaches have
unfortunate drawbacks that render them cumbersome for all but trivial
subsets of Perl. In particular, these two approaches share their most fatal
drawback. They both must effectively emulate the entire perl back-end
(the Perl Virtual Machine, or PVM) in the Java environment.
The PVM is discussed in [1, 2, 6, 7] and in perlguts. The PVM can take
the IR generated by perl's front-end, along with a set of Perl data type
implementations, and evaluate the IR (and in doing so, execute the code
given by the Perl programmer).
When porting to the JVM, previous approaches needed to effectively emulate
parts of the PVM using hand-coded Java. Specifically, both approaches
presented last year do so by using hand-coded Java classes to emulate Perl's
native data structures on the JVM. Such work is easy for small examples,
but as a full port of Perl is pursued, such work becomes unwieldy. It
amounts to reimplementation of the entire PVM in Java.
Parallel to previous work on porting Perl to the JVM, Per Bothner had
continued to develop the GNU Kawa system for Scheme [3, 4]. As he worked,
he began to abstract some of the components of Kawa that were not
Scheme-specific, and he developed them into an interesting layer of
abstraction. The end result was a high level and generalized intermediate
representation (IR) that compiled to the JVM.
This new Kawa IR filled a large gap in the JVM porting community. Namely,
it provided an infrastructure to port many languages to the JVM. This paper
discusses how that IR can be used to facilitate a better, easier and more
robust port of Perl to the JVM than those methods previously attempted.
perl does have its own IR. [6] discussed how this IR can
be used to generate JVM assembler directly. It was discovered, however,
that perl's IR simply did not map well onto the JVM.
In hindsight, this is not surprising. The perl IR was not designed, as
GCC's was, to ease the burden of creating new front-ends and back-ends. In
fact, perl's IR was actually designed specifically to work with and
depend on the PVM. Thus, it makes sense that using perl's IR to port to
new architectures is difficult.
Given this reality, the next step is to find a way to still leverage the
useful perl front-end IR in a way that will better facilitate a port to
the JVM. The solution proposed in this paper considers using a second IR
that is specifically designed to function with the JVM. To finish the job,
a translator is being written that massages perl's IR into the other IR.
This approach is easier, because an IR designed to be general will have
better facilities to implement various language features. For example,
features like lexically-scoped variables and anonymous subroutines are
common in many languages. If the IR supports these features, translating
from perl's IR to the new IR will be easier. And, even for those
features that are unique to Perl, a good IR provides facilities (better than
those available on the bare JVM) to implement those additional features.
The GNU Kawa IR serves as this new IR. Originally designed for Scheme,
Kawa's IR has been generalized to support basic generic features that are
common in many very high level languages. Kawa also has extensible
parts. For example, the user of the IR can implement a class that
controls variable binding lookup. Yet, the IR's object oriented interface
hides the details of how that variable binding lookup operates internally.
This feature alone can help simplify one of Perl's most complex
features: tied variables.
gnu.expr.*. This package,
described in [3, 4], provides classes that represent nodes in a parse tree.
Each node in that tree is a subclass of the abstract base class,
Expression.
Each Expression has two key methods: eval and compile. The former
is used when the source program is run interactively. When invoked, eval
evaluates the current expression (and often subexpressions) in the context
of the current run-time environment. The compile method is used to
compile the expression (and often subexpressions) to a JVM class file for
later use.
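The split between the two methods can be illustrated with a minimal sketch.
The classes below (ToyExpression, ToyConst, ToyAdd) are hypothetical
stand-ins invented for this illustration; they are not the real gnu.expr
classes, and their signatures are assumptions. Here, compile emits textual
stack-machine pseudo-instructions rather than a JVM class file, purely to
show that one tree can serve both the interactive and the batch path.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an IR node; NOT the real gnu.expr.Expression API.
abstract class ToyExpression {
    // Interactive path: evaluate against the current run-time environment.
    abstract Object eval(Map<String, Object> env);

    // Batch path: emit code for later use. This sketch emits textual
    // pseudo-instructions, not a real JVM class file.
    abstract void compile(StringBuilder out);
}

class ToyConst extends ToyExpression {
    final int value;
    ToyConst(int value) { this.value = value; }

    Object eval(Map<String, Object> env) { return value; }

    void compile(StringBuilder out) {
        out.append("push ").append(value).append('\n');
    }
}

class ToyAdd extends ToyExpression {
    final ToyExpression left;
    final ToyExpression right;
    ToyAdd(ToyExpression left, ToyExpression right) {
        this.left = left;
        this.right = right;
    }

    // Evaluating a node evaluates its subexpressions first.
    Object eval(Map<String, Object> env) {
        return ((Integer) left.eval(env)) + ((Integer) right.eval(env));
    }

    // Compiling a node compiles its subexpressions first.
    void compile(StringBuilder out) {
        left.compile(out);
        right.compile(out);
        out.append("add\n");
    }
}

class ToyExpressionDemo {
    public static void main(String[] args) {
        ToyExpression e = new ToyAdd(new ToyConst(5), new ToyConst(7));
        System.out.println(e.eval(new HashMap<String, Object>())); // prints 12
        StringBuilder out = new StringBuilder();
        e.compile(out);
        System.out.print(out); // prints push 5, push 7, add on separate lines
    }
}
```

In this sketch, both methods recurse over the same tree, so a front-end only
has to build the tree once to get either behavior.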
One of the most important subclasses of Expression is the LambdaExp,
which provides the basic semantics of a lambda expression in Scheme.
However, the LambdaExp class is an abstraction of lambda, and thus
does not contain anything specific to Scheme. The LambdaExp can be used
to enclose functions as well as objects. LambdaExp has straightforward
methods for handling function parameters, localized and lexically scoped
variables and variables internally captured by a closure.
The ModuleExp is a subclass of LambdaExp. A ModuleExp maps onto a
JVM class, and can include declarations of both static and instance
variables, as well as static and instance methods. When translating
perl's IR to Kawa's IR, a ModuleExp is used to represent each Perl
package.
$x = 5;
$y = $x + 7;
print "RESULT: $y", "\n";
Figure 1 shows a diagram representing how this ``add and
print'' example is compiled to Kawa's IR.
The top-level Perl package is compiled to a ModuleExp. Contained by
that ModuleExp are two Declaration objects for the two scalar
variables used in the program.
The ModuleExp contains one subexpression, a BeginExp. A BeginExp
is used to organize a set of subexpressions, and evaluate them for their
side effects.
One such subexpression seen in this example is the ApplyExp. This
expression is a generalized way to evaluate a function. The functions can
be existing LambdaExps, or, as in this case, a reference to some known
procedure. In this example, we refer to a PrimProcedure object, which
simply refers to a known function that is written in Java. However, an
ApplyExp can just as easily refer to unnamed functions compiled by Kawa,
or named functions written in Perl or Scheme. This flexibility is a great
advantage over other methods for compiling non-Java languages to the JVM.
Another particularly flexible mechanism is variable binding. The SetExp
is used to set a variable to a particular value. Each SetExp refers to
some variable. However, the binding of that variable need not be specified
directly. The variable is looked up by Kawa in the current context. Thus,
worrying about how the variable is bound is left up to the IR. The compiler
from perl's IR to Kawa's IR need not specifically worry about that issue.
The complement to the SetExp is the RefExp. A RefExp refers back
to a variable so its value can be used as an r-value. Variable binding and
lookup are handled completely by Kawa. The programmer who generates the IR
need not be concerned with those details; the programmer need only tell Kawa
the scoping style of the variable (lexical or dynamic) and attach the
declaration to the right Kawa expression. Kawa does the rest.
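The shape of the tree for the ``add and print'' example can be sketched
with a toy interpreter. Every class below (MiniExp, MiniQuote, MiniRef,
MiniSet, MiniApply, MiniBegin, MiniModule) is a hypothetical miniature named
after the role described in the text; none of them is Kawa's real class, and
the plus, concat and print procedures are assumptions standing in for
procedures written in Java. String interpolation of "RESULT: $y" is modeled
as a call to concat.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Toy node types named after the roles in the text; NOT Kawa's real classes.
interface MiniExp { Object eval(Map<String, Object> env); }

class MiniQuote implements MiniExp {            // a literal constant
    final Object value;
    MiniQuote(Object value) { this.value = value; }
    public Object eval(Map<String, Object> env) { return value; }
}

class MiniRef implements MiniExp {              // read a variable (r-value)
    final String name;
    MiniRef(String name) { this.name = name; }
    public Object eval(Map<String, Object> env) { return env.get(name); }
}

class MiniSet implements MiniExp {              // assign to a variable
    final String name;
    final MiniExp value;
    MiniSet(String name, MiniExp value) { this.name = name; this.value = value; }
    public Object eval(Map<String, Object> env) {
        Object v = value.eval(env);
        env.put(name, v);
        return v;
    }
}

class MiniApply implements MiniExp {            // call a known procedure
    final Function<List<Object>, Object> proc;
    final List<MiniExp> args;
    MiniApply(Function<List<Object>, Object> proc, MiniExp... args) {
        this.proc = proc;
        this.args = Arrays.asList(args);
    }
    public Object eval(Map<String, Object> env) {
        List<Object> vals = new ArrayList<Object>();
        for (MiniExp a : args) vals.add(a.eval(env));
        return proc.apply(vals);
    }
}

class MiniBegin implements MiniExp {            // run children for side effects
    final List<MiniExp> body;
    MiniBegin(MiniExp... body) { this.body = Arrays.asList(body); }
    public Object eval(Map<String, Object> env) {
        Object last = null;
        for (MiniExp e : body) last = e.eval(env);
        return last;
    }
}

class MiniModule {                              // stands for one Perl package
    final Map<String, Object> declarations = new LinkedHashMap<String, Object>();
    final MiniExp body;
    MiniModule(MiniExp body, String... names) {
        this.body = body;
        for (String n : names) declarations.put(n, null);
    }
    Object run() { return body.eval(declarations); }
}

class AddAndPrintDemo {
    static String run() {
        final StringBuilder out = new StringBuilder();   // captures "printed" text
        Function<List<Object>, Object> plus =
            a -> ((Integer) a.get(0)) + ((Integer) a.get(1));
        Function<List<Object>, Object> concat = a -> {
            StringBuilder sb = new StringBuilder();
            for (Object o : a) sb.append(o);
            return sb.toString();
        };
        Function<List<Object>, Object> print = a -> {
            for (Object o : a) out.append(o);
            return Boolean.TRUE;
        };
        MiniModule main = new MiniModule(
            new MiniBegin(
                new MiniSet("x", new MiniQuote(5)),
                new MiniSet("y", new MiniApply(plus, new MiniRef("x"), new MiniQuote(7))),
                new MiniApply(print,
                    new MiniApply(concat, new MiniQuote("RESULT: "), new MiniRef("y")),
                    new MiniQuote("\n"))),
            "x", "y");
        main.run();
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(run()); // prints RESULT: 12
    }
}
```

Note that the tree-building code never says where x and y live; the module
owns the declarations and the nodes refer to variables only by name, which
mirrors the division of labor described above.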
SetExps and RefExps can also be used to solve the
issues with Perl's tie. Kawa has a binding mechanism, whereby variable
declarations can have particular constraints associated with their use.
SetExps and RefExps are evaluated or compiled with references to these
constraint objects. At runtime, the constraint objects are resolved. In
this manner, JVM code can automatically be generated to handle complex
variable access mechanisms, such as exist with Perl's tie.
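As a rough illustration of the idea, the sketch below routes every read and
write of a variable through a replaceable constraint object. The names
ToyConstraint, ToyVariable, PlainConstraint and LoggingTieConstraint are
hypothetical and chosen for this example only; they are not Kawa's actual
binding classes, and the sketch makes no claim about how Kawa resolves
constraints internally. The logging constraint plays the part of a tie
class, whose FETCH and STORE methods intercept access to a tied scalar in
Perl.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical constraint: every read and write of a variable goes through it.
interface ToyConstraint {
    Object get(ToyVariable v);
    void set(ToyVariable v, Object value);
}

class ToyVariable {
    Object slot;                 // raw storage
    ToyConstraint constraint;    // resolved at run time on each access

    ToyVariable(ToyConstraint constraint) { this.constraint = constraint; }

    Object get() { return constraint.get(this); }
    void set(Object value) { constraint.set(this, value); }
}

// Ordinary variable: reads and writes touch the slot directly.
class PlainConstraint implements ToyConstraint {
    public Object get(ToyVariable v) { return v.slot; }
    public void set(ToyVariable v, Object value) { v.slot = value; }
}

// Tie-like variable: each access is intercepted, as FETCH and STORE would be.
class LoggingTieConstraint implements ToyConstraint {
    final List<String> log = new ArrayList<String>();

    public Object get(ToyVariable v) {
        log.add("FETCH");
        return v.slot;
    }

    public void set(ToyVariable v, Object value) {
        log.add("STORE " + value);
        v.slot = value;
    }
}

class TieDemo {
    static List<String> run() {
        ToyVariable x = new ToyVariable(new PlainConstraint());
        x.set(5);                          // plain store, nothing logged

        LoggingTieConstraint tie = new LoggingTieConstraint();
        x.constraint = tie;                // roughly analogous to tie-ing $x
        x.set(6);                          // logged as STORE 6
        x.get();                           // logged as FETCH
        return tie.log;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [STORE 6, FETCH]
    }
}
```

The code that reads or writes x is identical before and after the
constraint is swapped, which is the property that makes this style of
binding attractive for tied variables.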
This flexibility is typical of Kawa's facilities. Kawa is specifically
designed as an IR for very high level languages. Thus, those features one
expects in such a language are completely native and natural to Kawa. Kawa
provides abstractions for features that required careful and copious
hand-coding in previously attempted ports.
As another example, consider Kawa's core expression type, the LambdaExp.
This expression natively provides full support for what Perl calls anonymous
subroutines, and even provides support for closures. These are complex
language features that require careful consideration to implement. Kawa's
IR provides an abstraction of these features to the compiler writer, so that
she can focus only on the specific details of how the given source language
handles and uses those features.
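To see what must be handled when closures are supported, consider a Perl
subroutine that returns an anonymous subroutine incrementing a captured
lexical. One common way to realize this on the JVM, sketched below, is to
move the captured variable into a heap-allocated cell that outlives the
enclosing call. This is an illustrative assumption about the general
technique, not a description of the code Kawa generates; the names Cell and
makeCounter are hypothetical.

```java
import java.util.function.IntSupplier;

class ClosureDemo {
    // Heap-allocated cell holding a captured lexical variable, so that it
    // survives after the enclosing subroutine has returned.
    static final class Cell {
        int value;
        Cell(int value) { this.value = value; }
    }

    // Roughly: sub make_counter { my $n = shift; return sub { return $n++ } }
    static IntSupplier makeCounter(int start) {
        final Cell n = new Cell(start);
        return () -> n.value++;   // each call returns the old value, then bumps it
    }

    public static void main(String[] args) {
        IntSupplier a = makeCounter(10);
        IntSupplier b = makeCounter(0);
        System.out.println(a.getAsInt()); // prints 10
        System.out.println(a.getAsInt()); // prints 11
        System.out.println(b.getAsInt()); // prints 0; b has its own cell
    }
}
```

Each call to makeCounter creates a fresh cell, so the two counters do not
interfere. An IR that provides this bookkeeping spares the compiler writer
from generating it by hand.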
perl
front-end, as it is written in C.
In addition, as is noted in [2, 5, 6, 7], the easiest way to leverage
perl's front end is to use the B and O modules. Of course, these
modules are written for access via Perl itself. Thus, we are presented with
a bootstrapping problem: linking C and Java together is not yet particularly
easy (GCC 3.0 will likely make this easier, but not trivial), and Perl
modules cannot interface easily with Java until a full JVM port is
complete.
To solve this problem, the Java-Perl Lingo (JPL) is used. The JPL is a part
of the core perl distribution that eases the integration of Java and
Perl. The JPL allows Java and Perl code to integrate somewhat seamlessly
via the Java Native Interface (JNI). (More details on the JPL are available
in [6, 7].)
Thus, the core of the new perljvm implementation is a B module that
uses the JPL to make calls to Kawa's API. This B module instantiates
various Kawa expression objects, builds a Kawa IR tree from the internal
perl IR tree, and finally tells Kawa to write the compiled JVM bytecode.
Massaging the perl IR into the Kawa IR is relatively straightforward (at
least compared to previous methods). Of course, perl's IR is on a very
high level, but so is Kawa's IR. Thus, Kawa mitigates the ``impedance
mismatch'' between the perl IR and the lower level JVM code.
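The flavor of this translation can be sketched as a recursive walk over an
op tree. The sketch below is written in Java for consistency with the other
examples, whereas the actual translator described above is a Perl module
calling Kawa through the JPL. ToyOp and ToyTranslator are hypothetical; the
op names imitate perl's (const, gvsv, add, sassign, leave), but the tree
structure is heavily simplified, and the output is a printed description of
the target tree rather than real Kawa objects.

```java
import java.util.Arrays;
import java.util.List;

// Toy stand-in for a perl op-tree node. Op names imitate perl's, but the
// structure is simplified for illustration.
class ToyOp {
    final String name;
    final String arg;          // constant value or variable name, if any
    final List<ToyOp> kids;

    ToyOp(String name, String arg, ToyOp... kids) {
        this.name = name;
        this.arg = arg;
        this.kids = Arrays.asList(kids);
    }
}

class ToyTranslator {
    // Maps each op onto a description of the corresponding expression node.
    static String translate(ToyOp op) {
        switch (op.name) {
            case "const":
                return "(QuoteExp " + op.arg + ")";
            case "gvsv":
                return "(RefExp " + op.arg + ")";
            case "add":
                return "(ApplyExp + " + translate(op.kids.get(0)) + " "
                        + translate(op.kids.get(1)) + ")";
            case "sassign":
                // Assumed child order here: the value, then the target variable.
                return "(SetExp " + op.kids.get(1).arg + " "
                        + translate(op.kids.get(0)) + ")";
            case "leave": {
                StringBuilder sb = new StringBuilder("(BeginExp");
                for (ToyOp kid : op.kids) sb.append(' ').append(translate(kid));
                return sb.append(')').toString();
            }
            default:
                throw new IllegalArgumentException("unsupported op: " + op.name);
        }
    }
}

class TranslateDemo {
    // Op tree for:  $x = 5;  $y = $x + 7;
    static ToyOp sample() {
        return new ToyOp("leave", null,
            new ToyOp("sassign", null,
                new ToyOp("const", "5"),
                new ToyOp("gvsv", "x")),
            new ToyOp("sassign", null,
                new ToyOp("add", null,
                    new ToyOp("gvsv", "x"),
                    new ToyOp("const", "7")),
                new ToyOp("gvsv", "y")));
    }

    public static void main(String[] args) {
        System.out.println(ToyTranslator.translate(sample()));
    }
}
```

Because both trees are high level, most cases in the walk are one-to-one
mappings; the work that remains is deciding the semantics of each op, not
managing low-level JVM details.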
B module, the
programmer must still understand the intimate details of the canonical
perl implementation. For a programmer to effectively work on the
project, she must have a depth of knowledge of both Kawa and perl.
Currently, there are few programmers who have both the time and inclination to
acquire such knowledge, and thus progress on the project is slow.
Not only is Kawa complex, but also its API has not yet solidified. As
active porting efforts of non-Scheme languages to Kawa continue, Kawa's
features are being generalized more and more. This is advantageous in the
long run, as it will make Kawa more generally useful. However, in the short
term, those who use Kawa as a porting infrastructure must keep up closely
with Kawa development. This factor further impedes the ability of new
programmers to join the project.
Finally, some have voiced a concern that given the slow speed of current
Java environments, and the high level of abstraction used by Kawa, there
will be serious efficiency problems with this approach. This particular
drawback can be answered in two ways. First, it is clearly better to have
an inefficient port than no port at all. This work is the first of
the many attempts at porting Perl to the JVM that has shown real promise of
working for non-trivial Perl programs. We must continue forward, and then
focus on optimizing for speed once the job is done.
Second, native compilation of both Java source and JVM bytecode is an active
area of development. Such native compilation could speed Kawa-generated
bytecodes up considerably. In addition, since this port centers around
Kawa, the port will benefit from optimizations added directly to Kawa. Such
optimizations are one focus of current Kawa development.
perl's
existing front-end. Use of that front-end leverages the many person-years
that went into that canonical implementation of Perl, while limiting the
problem scope to a much more manageable task.
However, using perl's front-end was not without its own challenges.
Earlier approaches implicitly assumed that the PVM could be easily mapped
onto the JVM (or Java) directly. The perl IR proved too inflexible for
this method, as it was not designed to be a generalized IR. Attempting to
use it as such quickly took perl's IR to its limits. In the case of a
JVM port, this meant that far too much new Java code was required to support
even the simplest of features.
In addition, reliance on the JVM to act in the same manner as the PVM proved
to be another problematic assumption. The JVM is not designed to be a
perfectly general virtual architecture. This is not necessarily a flaw in
the JVM, but it does indicate that using the JVM in such a general way is
not the best approach.
Fortunately, the Kawa system provides a more generalized method for
compiling non-Java languages to the JVM. Kawa introduces a layer of
abstraction that is absolutely necessary if the JVM is to be used as a
general architecture for non-Java languages. Other projects that port
non-Java languages to the JVM would do well to revisit Kawa in its current
state, and perhaps migrate to it. Such migration would not only alleviate
problems faced in those projects, but standardizing on Kawa would also ease
the task of integrating the object models of the various JVM ports.
In the specific case of our Perl port, Kawa solved some even more difficult
problems. Using Kawa overcomes the deficiencies inherent in perl's IR
and its tight coupling with the PVM. By providing a higher-level IR, Kawa
eases the reuse of perl's IR. The minutiae of book-keeping required when
trying to compile perl's IR directly to JVM bytecode disappear when Kawa
is used. Semantic mapping is the sole focus, and the common details of
compilation are handled internally by Kawa's compilation process.
The most open area for future work is to continue porting more of Perl to
the JVM via Kawa. Currently, only a subset of Perl is supported, but the
path is clear. Kawa's infrastructure makes the task of porting Perl to the
JVM much more feasible. It is hoped that more developers will become
interested in the project now that this work has laid out a clear path to
the goal.
Also, this work benefits more than just the Perl community. Already, the
work of porting Perl to the JVM via Kawa has inspired enhancements to Kawa
itself. It is hoped that continued efforts to port a unique language like
Perl via Kawa will help Kawa to become even more generalized and robust.
As Microsoft's .NET system looms on the horizon, the Kawa/JVM environment
can be a real competitor to it. Of course, a Kawa/JVM system has the added
advantage that it is completely open and free software, while Microsoft's
.NET will no doubt remain proprietary. It is hoped that this advantage can
carry a Kawa/JVM-based language system, along with a Perl port to Kawa/JVM,
to success for users and programmers alike.
perljvm software is licensed under the same license as perl: a
disjunction of the Artistic License and the GNU General Public License.
The perljvm software is available via savannah, the GNU project
development server. See the URL,
http://savannah.gnu.org/projects/perljvm/, for details.
More information about Kawa is available at
http://www.gnu.org/software/kawa/.
perljvm: Using B to Facilitate a Perl Port To the Java
Virtual Machine''. Proceedings of the Perl Conference 4.0, pages
17-23. Monterey, CA, USA. July 2000.
[7]
Kuhn, Bradley M. ``Considerations on Porting Perl to the Java Virtual
Machine''. Master's Thesis. Department of Electrical and Computer
Engineering and Computer Science. University of Cincinnati, Cincinnati, OH,
USA. January 2001.
[8]
Mccrae, Raymond, et al. ``PerlCaffeine: Compiling Perl to Java''.
Proceedings of the Perl Conference 4.0, pages 127-135. Monterey, CA,
USA. July 2000.
[9]
Stallman, Richard M. Using and Porting the GNU Compiler Collection
(GCC). Free Software Foundation, Boston, MA, USA, Edition 2.95. August
2000.