perljvm: Using B to Facilitate a Perl Port To the Java Virtual
Machine
B (back-end) compiler modules provided the best path to accomplish a JVM
port of Perl.
Since the B modules provide access to the internal representation (IR)
used by the perl back-end, perljvm concerns itself only with translating
that IR into JVM assembler, using the Jasmin assembler syntax. A JVM
assembler code emitter module was developed to minimize perljvm's
dependence on Jasmin. To support Perl's native data types on the JVM,
equivalents for native Perl data structures must be designed and built in
Java. Once perljvm is stable and most common Perl features are
supported, work can begin on using the JVM as an object model to integrate
Perl and Java objects seamlessly. At that time, integration with the JPL
will be feasible.
eval construct using these two input
sets is then invoked, and in that manner, the source program is run.
This approach has a number of advantages. First, if the source language has
a well-written specification, or is a language with few constructs, based
around a single paradigm (e.g., an object-oriented paradigm), then
implementing an interpreter for the language is often a simple matter of
implementing the specification. Design issues are often already decided by
the specification or by the paradigm, greatly easing the burden on the
implementor.
A second advantage is that real-time, on-the-fly code evaluation (i.e.,
eval($string)) is always available. The Java program that implements the
interpreter simply needs to create a new instance of the interpreter
and feed it $string as input.
However, this approach has two disadvantages, one of which is particularly
problematic for a Perl port. The first disadvantage is speed. Since
hardware devices that have JVMs on a chip are still in the realm of research
labs, not the mass-market, most JVM implementations are done in software.
The JVM bytecodes are interpreted by this software. Thus as Per Bothner
notes, ``if your interpreter for language X is written in Java, which is in
turn interpreted by a Java VM, then you get double interpretation overhead''.
Such a situation is unacceptable for Perl, which has always prided itself on
speed.
Another disadvantage, which might be acceptable for some languages but is
completely unacceptable for Perl, is code divergence. If a language has a
well-defined specification that describes precisely the syntax and semantics
of the language, code divergence is not an issue. An implementation must
adhere to the specification. However, it has often been noted in the Perl
community that ``the specification is the implementation''. The community
cannot tolerate divergent implementations. Indeed, much work in the
mid-1990s was done to stop the divergence of the Microsoft and Unix-like
Perl implementations. It would surely be a tragedy if a Perl port to the
JVM went down the road of code divergence.
Thus, the only way to safely achieve an acceptable JVM port of Perl using
this approach is to compile perl, the existing C implementation of Perl,
with a C compiler targeted to the JVM. Experimental compilers of this nature do
exist [6], but they are far from ready for production. In addition, even
such a port of Perl would undoubtedly be slower than any of the other
approaches. Indeed, given the relatively large size of perl, such a port
would most likely be completely inappropriate for JVM implementations
embedded in small hardware devices or other software programs.
Therefore, simply waiting for a C compiler to be targeted to the JVM is not
the best approach for porting Perl to the JVM. Other methods must be
investigated and attempted.
goto) that exist on the JVM but do not exist in Java [3].
With these disadvantages and only one minor advantage, it is not surprising
that there has yet to be any language that has been successfully ported to
the JVM using this method.
B modules, perljvm can utilize perl's
existing front-end and much of the back-end to do the work of the port.
Perl's B modules allow programmers to implement their own back-ends,
separate from perl's back-end. A module that uses B has the
opportunity to examine and manipulate the IR that was generated by perl's
front-end. In addition, B can be used to examine the internal data
structures used by perl's back-end.
On the command line, the user interfaces to these back-ends via the O
module. The O module acts primarily as a wrapper, allowing the
corresponding B module to be invoked. Thus, instead of running the
``default'' perl back-end, a completely different back-end, written in
Perl, can be chosen. (Please see [5] and [10] for a more complete
discussion of B and O.)
Taking advantage of this feature, the core of perljvm is implemented
using B. Since all the facilities of perl's front-end are provided,
there is no need for perljvm to have its own lexer or parser for Perl.
In addition, perljvm can use B to examine the IR, which is roughly
``PVM code'', plus references to Perl's native data structures.
B module, perljvm does not need to parse Perl, find
syntax errors, generate an IR, or do any other front-end compiler work. That
is already done by perl, and is completely accessible via the B module.
With all front-end issues already solved, the next challenge is the creation
of valid JVM class files.
The JVM file format is quite complex. Directly generating such a file from
a B module would be tricky. There is no standard for assembler syntax
for the JVM, so there are no tools in the standard Java environment to
easily generate JVM class files. Early in the project, this was a focus of
much attention.
However, Brian Jepson proposed that instead of generating the JVM class file
directly, perljvm should instead generate output using the Jasmin
assembler. Jasmin assembler is a syntax for writing JVM class files that is
similar to assembler formats used for non-virtual architectures [15]. This
solution greatly reduced the problem scope of perljvm. Mr. Jepson
discusses this idea extensively in [10].
However, there was the concern that the Jasmin assembler format is not
standardized; it is simply one possible format for JVM bytecode assembler.
Indeed, other formats do exist and are in use. Therefore, it was imperative
that perljvm rely on one particular assembler syntax as little as
possible.
To alleviate this problem, the concept of JVM ``bytecode emitters'' was
introduced. First, a virtual base class called B::JVM::Emit was created.
All code that must emit Java bytecode uses the interface provided by
B::JVM::Emit, and all subclasses of B::JVM::Emit must provide
implementations of B::JVM::Emit's interface specific to a given assembler
syntax.
As an example, consider the following code. It creates a JVM class called
Foo, with one static public method, main, whose body has a single JVM
dup instruction.
    my $emit = new B::JVM::Jasmin::Emit("Foo");
    $emit->methodStart("main([Ljava/lang/String;)V", "static public");
    $emit->dup("main([Ljava/lang/String;)V");
    $emit->methodEnd("main([Ljava/lang/String;)V");
If a standard assembler format for the JVM is ever created, one needs only
implement B::JVM::StandardAssembler::Emit as a subclass of
B::JVM::Emit, and change the first line in the example above to:
    my $emit = new B::JVM::StandardAssembler::Emit("Foo");
Assuming that B::JVM::StandardAssembler::Emit is implemented correctly,
the rest of the code will work unchanged, generating the Foo class.
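The same emitter design can be illustrated outside of Perl. The following Java sketch is not perljvm code: the class names Emitter, JasminEmitter, and EmitterDemo are hypothetical, and only the method names methodStart, dup, and methodEnd mirror the Perl example above. It shows how calling code depends only on the abstract interface, while a subclass decides the concrete assembler syntax. Like the Perl example, it only produces assembler text; it makes no attempt to produce a method that would pass JVM verification.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for B::JVM::Emit: callers see only this interface.
abstract class Emitter {
    protected final String className;
    protected final List<String> lines = new ArrayList<>();

    Emitter(String className) { this.className = className; }

    abstract void methodStart(String signature, String access);
    abstract void dup(String method);
    abstract void methodEnd(String method);

    // Returns everything emitted so far as one block of assembler text.
    String output() { return String.join("\n", lines); }
}

// Hypothetical stand-in for B::JVM::Jasmin::Emit: one concrete syntax.
class JasminEmitter extends Emitter {
    JasminEmitter(String className) {
        super(className);
        lines.add(".class public " + className);
        lines.add(".super java/lang/Object");
    }

    @Override void methodStart(String signature, String access) {
        lines.add(".method " + access + " " + signature);
    }

    @Override void dup(String method) { lines.add("    dup"); }

    @Override void methodEnd(String method) { lines.add(".end method"); }
}

class EmitterDemo {
    // Written against Emitter only, so any subclass can be swapped in.
    static String buildFoo(Emitter emit) {
        String main = "main([Ljava/lang/String;)V";
        emit.methodStart(main, "static public");
        emit.dup(main);
        emit.methodEnd(main);
        return emit.output();
    }

    public static void main(String[] args) {
        System.out.println(buildFoo(new JasminEmitter("Foo")));
    }
}
```

Supporting a different assembler syntax in this sketch would mean writing another subclass of Emitter and passing it to buildFoo, which itself would not change.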
B modules to manipulate
the IR, and a code emitter object for JVM bytecodes, most of the components
for a Perl to JVM compiler are in place. However, recall that the IR
generated by perl assumes that a PVM and implementations of Perl's native
data types are available. To successfully port Perl to the JVM, the data
types that Perl considers native must be available on the JVM.
One approach would be to ``map'' all of Perl's data types into equivalent data
types already available for the JVM. Unfortunately, in most cases, this
approach is not possible, since Perl's native data types are so unique. For
example, at first glance, it might seem feasible to map Perl's hash into an
object of type java.util.Hashtable. However, Java's hash tables do not
understand the concept of tie. Similarly, scalars cannot be mapped onto
java.lang.String, since scalars act like numbers when they are supposed
to, and Java strings do not. The uniqueness and flexibility of Perl's data
types, loved by Perl programmers everywhere, become the headache of the
programmer who wants to port Perl to an architecture whose data types are
not so unique and flexible.
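To make the scalar mismatch concrete, here is a minimal Java sketch of a value that is stored as a string but yields a number on demand. The class name ScalarSketch and its methods are hypothetical and are not perljvm's classes. The numeric conversion is a deliberate simplification of Perl's behavior: it uses a leading decimal prefix if one exists and 0 otherwise, and it ignores exponents, hexadecimal literals, and warnings.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of perljvm.
class ScalarSketch {
    // Simplified: optional sign, digits, optional fraction at the start.
    private static final Pattern NUM_PREFIX =
        Pattern.compile("^\\s*([+-]?(\\d+(\\.\\d*)?|\\.\\d+))");

    private final String value;

    ScalarSketch(String value) { this.value = value; }

    String asString() { return value; }

    // Perl-like numification: use the leading numeric prefix, else 0.
    double asNumber() {
        Matcher m = NUM_PREFIX.matcher(value);
        return m.find() ? Double.parseDouble(m.group(1)) : 0.0;
    }

    // Numeric addition of two scalars, in the spirit of Perl's + operator.
    static ScalarSketch add(ScalarSketch a, ScalarSketch b) {
        double sum = a.asNumber() + b.asNumber();
        // Render whole numbers without a trailing ".0".
        String text = (sum == Math.rint(sum) && Math.abs(sum) < 1e15)
            ? Long.toString((long) sum)
            : Double.toString(sum);
        return new ScalarSketch(text);
    }

    public static void main(String[] args) {
        ScalarSketch s = add(new ScalarSketch("10"), new ScalarSketch("5 apples"));
        System.out.println(s.asString());   // numeric addition
        System.out.println("10" + "5");     // plain Java strings concatenate
    }
}
```

A java.lang.String alone offers only the second behavior, which is why a dedicated class is needed to stand in for a Perl scalar.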
Thus, for each data type that perl's back-end considers ``native'', an
equivalent class for it must exist on the JVM. Since the Java language
easily compiles to the JVM in an idiomatic way, these classes are
implemented in Java. Each class provides an interface that Perl expects for
the data type, and since the implementation is in Java, the data type can
run on the JVM.
As an example, consider the following portion of the class SvBase, which
is an implementation of perl's SvNULL [1]:
    class SvBase implements Cloneable {
        boolean defined;
        SvBase() {
            defined = false;
        }
        boolean isDefined() {
            return defined;
        }
        void undef() {
            defined = false;
        }
        // [...]
    }
Thus, using this class, the Perl program:
    defined $bar;
could be compiled to the Jasmin equivalent:
    .class public main
    .super java/lang/Object
    .method static public main([Ljava/lang/String;)V
    .var 0 is foo LSvBase
    new SvBase
    dup
    astore_0
    dup
    invokespecial SvBase/<init>()V
    invokevirtual SvBase/defined()Z
The Java equivalent of that is as follows (perljvm does not actually
translate to Java; the following code is provided for didactic purposes
only):
    class main {
        static public void main(String argv[]) {
            SvBase bar = new SvBase();
            bar.defined();
        }
    }
This example gives the flavor of how the native Perl data types are
implemented in Java to provide access to Perl data types on the JVM. An
in-depth discussion of these Java classes will be available in [12].
B module is most useful. Perljvm descends the
syntax tree provided by the IR in a depth-first fashion. At each opcode,
perljvm processes it, using the emitter to generate Jasmin code. The
emitted Jasmin code utilizes the various data type classes to perform the
task the opcode would have performed had it been run on the PVM.
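The traversal described above can be sketched as follows. This Java fragment is a toy model, not perljvm code: the OpNode and TreeWalkSketch classes, the example tree, and the placeholder text that is emitted are all assumptions made for illustration. The text does not say whether perljvm emits code before or after visiting an opcode's children; this sketch chooses post-order, which suits a stack machine because operands are produced before the operator that consumes them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of an opcode tree node; hypothetical, not B's actual classes.
class OpNode {
    final String name;
    final List<OpNode> kids;

    OpNode(String name, OpNode... kids) {
        this.name = name;
        this.kids = Arrays.asList(kids);
    }
}

class TreeWalkSketch {
    // Depth-first, post-order: emit for operands before their operator.
    static void walk(OpNode op, List<String> out) {
        for (OpNode kid : op.kids) {
            walk(kid, out);
        }
        out.add(emitFor(op));
    }

    // Placeholder per-opcode handler; a real handler would emit Jasmin code.
    static String emitFor(OpNode op) {
        switch (op.name) {
            case "gvsv":  return "; look up a package variable";
            case "const": return "; push a constant";
            case "add":   return "; call the runtime's addition routine";
            default:      return "; unhandled opcode " + op.name;
        }
    }

    public static void main(String[] args) {
        // Roughly the shape of the expression: $foo + 1
        OpNode tree = new OpNode("add", new OpNode("gvsv"), new OpNode("const"));
        List<String> out = new ArrayList<>();
        walk(tree, out);
        out.forEach(System.out::println);
    }
}
```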
As an example, consider the following code:
    sub B::SVOP::JVMJasmin {
        my $op = shift;
        my $name = $op->name();
        # ...
        my $curMethod = # ....
        # ...
        if ($name eq "gvsv") {
            my $stashName = $op->gv->STASH->NAME();
            my $gvName = $op->gv->NAME();
            $emit->getstatic($curMethod, "Stash/DEF_STASH", "LStash;");
            $emit->ldc($curMethod, cstring $stashName);
            $emit->invokevirtual($curMethod,
                "findNamespace(Ljava/lang/String;)LStash;");
            $emit->ldc($curMethod, cstring $gvName);
            $emit->invokevirtual($curMethod,
                "Stash/findGV(Ljava/lang/String;)Linternals/GV;");
            $emit->invokevirtual($curMethod, "GV/getScalar()LScalar;");
        }
        # ...
    }
In this code segment, we see part of the subroutine B::SVOP::JVMJasmin.
The name indicates that it is the code for handling opcodes of type SVOP
for the JVM port using Jasmin. SVOP is a name provided and required by
the B module. The JVMJasmin portion of the name is provided by the
user of B, and is usually given on the command line when using the O
module [5, 10].
The first argument when opcode subroutines are invoked is the opcode object
itself. Usually, as in this case, the name() method is called to decide
how to handle the particular opcode.
In this example, the code for handling the SVOP named gvsv is shown.
The gvsv opcode is used when a dynamically scoped variable is mentioned.
This opcode must find the actual data of the variable by searching for it in
the name space. To generate the equivalent Jasmin code for this opcode, the
three Java classes Stash, GV, and Scalar must be used. These are the
equivalents of stashes, GVs, and SVs in the perl core [1].
If the variable being looked for happens to be $main::foo, then
the code above generates Jasmin assembler that looks something like this:
    getstatic Stash/DEF_STASH LStash;
    ldc "main"
    invokevirtual findNamespace(Ljava/lang/String;)LStash;
    ldc "foo"
    invokevirtual Stash/findGV(Ljava/lang/String;)Linternals/GV;
    invokevirtual GV/getScalar()LScalar;
The Java equivalent of that is as follows (perljvm does not actually
translate to Java; the following code is provided for didactic purposes
only):
    Stash.DEF_STASH.findNamespace("main").findGV("foo").getScalar();
If you compare this to the process described in [1] of how stashes work
inside perl, it is easy to see that this is equivalent code for a gvsv
opcode (given that the Stash and GV Java classes do their jobs
correctly!).
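For readers who want to run the Java one-liner above, the following is a minimal sketch of classes that would make it compile. Only the names Stash, GV, Scalar, DEF_STASH, findNamespace, findGV, and getScalar come from the text; every method body, the value field, the StashDemo class, and the choice to create entries on first lookup are assumptions of this sketch and may differ from perljvm's real classes.

```java
import java.util.HashMap;
import java.util.Map;

// Assumed minimal body; the real Scalar class is not shown in the text.
class Scalar {
    Object value; // null models an undefined scalar
}

// Assumed: a glob holds one slot per data type; only the scalar slot is modeled.
class GV {
    private final Scalar scalar = new Scalar();

    Scalar getScalar() { return scalar; }
}

// Assumed: a stash maps names to globs and to nested namespaces.
class Stash {
    static final Stash DEF_STASH = new Stash();

    private final Map<String, Stash> namespaces = new HashMap<>();
    private final Map<String, GV> globs = new HashMap<>();

    // Creates the namespace on first use, which is a choice of this sketch.
    Stash findNamespace(String name) {
        return namespaces.computeIfAbsent(name, k -> new Stash());
    }

    // Creates the glob on first use, also a choice of this sketch.
    GV findGV(String name) {
        return globs.computeIfAbsent(name, k -> new GV());
    }
}

class StashDemo {
    public static void main(String[] args) {
        Scalar foo = Stash.DEF_STASH.findNamespace("main").findGV("foo").getScalar();
        foo.value = "hello";
        // A second lookup of $main::foo reaches the same Scalar object.
        Scalar again = Stash.DEF_STASH.findNamespace("main").findGV("foo").getScalar();
        System.out.println(again.value);
    }
}
```

The point of the sketch is that repeated lookups of the same package variable must resolve to the same object, since a dynamically scoped variable is shared by all code that names it.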
Thus, a programmer wishing to add support for new opcodes in perljvm goes
through the following procedure:
1. Analyze the opcode in the perl back-end, and see if it uses any native
   data types that do not have any equivalent Java classes yet.
2. If such Java classes are needed, write and test them.
3. Add a method B::OP_TYPE::JVMJasmin, and write code to emit equivalent JVM
   assembler for the new opcode, utilizing the Java classes as necessary.
It should be noted that the approach taken for perljvm is not without
trade-offs. Since perljvm provides neither an interpreter nor a compiler
for Perl on the JVM itself, constructs such as eval($string) will not be
supported. For these cases, it is most reasonable to wait until C itself is
retargeted completely to the JVM [6]. Once perl itself is available on
the JVM, perljvm can invoke it only as a last resort, but continue to
support the rest of Perl natively on the JVM for better performance.
B that is not heavily
tied to the perl core itself (as B::CC is). Care has been taken
to carefully document what each opcode does, and what it needs to know about
Perl's native data types. As the project progresses, it may turn out that
perljvm is useful as an independent documentation and ``reverse
engineering'' of the PVM and Perl's native data types. This information
might be useful in future Perl efforts, such as Topaz.
Perljvm is currently available as part of B::JVM::Jasmin on CPAN, and
at http://www.ebb.org/perljvm. It is copyrighted by Bradley M. Kuhn, and
is licensed under the same license as perl itself.
perljvm.
Brian Jepson released an early prototype of B::JVM::Jasmin, the core
module used by perljvm. Although the current B::JVM::Jasmin shares no
code with Mr. Jepson's prototype, Mr. Jepson was the first to introduce the
idea of using Jasmin to facilitate a port of Perl to the JVM. His help is
greatly appreciated.
Finally, Mr. Kuhn thanks Matthew T. O'Connor, who has assisted in the
implementation of perljvm since the earliest versions.