new gtk2-perl

As I mentioned last week or so, the non-perlish handling of GObject versus GtkObject references and GBoxed lifetimes and all that stuff really grated on my nerves, and I wanted to take a week or so to play around with ideas for handling it better.

Well, I have something to show you now. It's essentially a complete rewrite. First I'll explain the philosophy behind the new stuff, and then get into why I decided to go this way and why I think we should move the gtk2-perl project to this new architecture and code base.

Please understand I do not mean this to be an inflammatory proposal, and I certainly don't want to anger or hurt anyone. I see this as a proposal for taking gtk2-perl to the next plateau in ways that are just not practical in the current architecture.

code for your perusal

I did most of the infrastructure and layout work (the really hard stuff) last week. On Thursday, Ross McFarland (who had posted to the list not long ago) offered to do some of the grunt-work of implementing lots of Gtk classes while I spent the weekend hammering out wrinkles and trying to get the scribble sample to run. I have a tarball snapshot of our work up to tonight available on my home page. Please have a look through it and discuss. We are building against gtk+-2.0.6 on stock RedHat 8.0 systems.

Basic Philosophy
The G Module

Wrappers
GType to Package Mappings
GEnums and GFlags
GBoxed
GObject
GSignal

The Gtk2 Module

GtkObject
Typemap scheme
Autogeneration

maps
gtk2perl-autogen.h
gtk2perl.typemap
register.xsh
boot.xsh

Why Inline Is Not the Way To Go for Gtk2-Perl
Where to Go From Here

In the following I refer to the original Gtk-Perl-0.7008 as gtk-perl, the existing unfinished Inline-based perl bindings for gtk2 as gtk2-perl, and my new unfinished XS-based gtk2 bindings as new-gtk2-perl.

Basic Philosophy

I designed my package with a few basic tenets in mind.

Stick close to the C API so that you can use knowledge from the C API and API reference docs with the perl bindings; this is overruled in some places by the remaining tenets.
Be perlish. This is the most important. The user of the perl bindings should not have to worry about memory management, reference counting, freeing objects, and all that stuff, else he might as well go write in C instead (or worse yet, may defect to python!).
Leave out deprecated functionality.
Don't add new functionality. The exceptions to this rule are consolidation of methods where default parameters may be used, or where the direct analog from C is not practical. For example, Gtk2::Button->new and Gtk2::Button->new("label") instead of a separate Gtk2::Button->new_with_label. Also, $red = Gtk2::Gdk::Color->new (65535, 0, 0);
Be lightweight. As little indirection and bloat as possible. If possible, implement each toplevel module (e.g., G, Gtk2, Gnome2, GtkHTML, etc) as one .pm and one .so.
Be extensible. Export header files and typemaps so that other modules can easily chain off of our base. Do not require the entirety of Gtk2 for someone who needs only to build atop GObject.
C objects must always outlive their Perl wrappers; from the other side, a perl wrapper should never point to an invalid C object.

The G Module

(This may not be the best choice for namespaces, but I thought G::Object made more sense than GObject::Object.)

In keeping with the tenet of not requiring the entire car for someone who only needs a single wheel, I broke the glib/gobject library family into its own module and namespace. This has proved to be a godsend, as it has made things very easy to debug; there's a clean separation between the base of the type system and the stuff on top of it.

Actually, the G module was originally going to be the only thing I implemented, but it turned out that I couldn't test the code because the types it defined were all abstract. So, I started implementing Gtk2 as well, and that's that.

The G module takes care of all the basic types handled by the GObject library --- GEnum, GFlags, GBoxed, GObject, GValue, GClosure --- as well as signal marshalling and such in GSignal. I'll discuss each of these separately.

Wrappers

In order to use the GObject types from perl, we need to wrap those objects up in perl scalars. The basic type of perl wrapper is a blessed scalar which is a reference to a scalar which holds the actual underlying object's pointer value. Perl scalars are reference counted, and thus we can rely on perl to keep track of when the wrapper is no longer needed.

The fundamental point about the lifetime of an object and its wrapper is two-fold: the C object must always outlive the perl wrapper, and the perl wrapper must never point to something invalid.

If an object is created by a function that returns directly to perl, then the wrapper returned by that function should "own" the object. If no other code assumes ownership of that object (by ref'ing a GObject or copying a GBoxed), then the object should be destroyed when the perl scalar is destroyed (actually, as part of its destruction).

If a function returns a preexisting object owned by someone else, then the bindings should NOT destroy the object with the perl wrapper. How we handle this for the various types is described below.

GType to Package Mappings

Both gtk-perl and gtk2-perl use substitution rules to map GType classes to corresponding perl packages. The fundamental flaw here is that the substitution rules are not easily extendable and are easily broken by extension packages which don't follow the naming conventions.

To circumvent this shortcoming, I built in the idea of explicit mappings. There is no chance of tricking a substitution mechanism, and at runtime you do a hash table lookup instead of a complete regex match and sub. This also allows us to track which classes are and are not registered and catch other fun things.

In addition, the type system tries as hard as it can to recover when things don't go well, using the GType system to its advantage. If you return a C object of a type that is not registered with Gperl, such as MyCustomTypeFoo, gperl_new_object (see below) will warn you that it has blessed the unknown MyCustomTypeFoo into the first known package in its ancestry, Gtk2::VBox.
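The lookup-with-fallback behaviour described above can be illustrated with a small, self-contained toy. This is only a sketch and not the gperl code: the reg_ functions, the ToyType table, and the fixed-size array are all hypothetical stand-ins, the type tree is hard-coded instead of coming from the GType system, and a linear scan is used where the real module does a hash table lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for the GType tree: each type names its parent (NULL at the root). */
typedef struct {
    const char *name;
    const char *parent;
} ToyType;

static const ToyType toy_types[] = {
    { "GObject",         NULL           },
    { "GtkObject",       "GObject"      },
    { "GtkWidget",       "GtkObject"    },
    { "GtkContainer",    "GtkWidget"    },
    { "GtkBox",          "GtkContainer" },
    { "GtkVBox",         "GtkBox"       },
    { "MyCustomTypeFoo", "GtkVBox"      },
};

/* Explicit type -> package registrations (hypothetical fixed-size table). */
typedef struct {
    const char *type_name;
    const char *package;
} ToyMapping;

#define REG_MAX_MAPPINGS 32
static ToyMapping reg_mappings[REG_MAX_MAPPINGS];
static size_t reg_n_mappings = 0;

/* Record one explicit mapping; no substitution rules are involved. */
void reg_register_class(const char *type_name, const char *package)
{
    assert(reg_n_mappings < REG_MAX_MAPPINGS);
    reg_mappings[reg_n_mappings].type_name = type_name;
    reg_mappings[reg_n_mappings].package = package;
    reg_n_mappings++;
}

/* Parent of a type in the toy tree, or NULL for the root or an unknown type. */
const char *reg_parent_of(const char *type_name)
{
    size_t i;
    for (i = 0; i < sizeof toy_types / sizeof toy_types[0]; i++)
        if (strcmp(toy_types[i].name, type_name) == 0)
            return toy_types[i].parent;
    return NULL;
}

/* Package registered for exactly this type, or NULL. */
const char *reg_exact_package(const char *type_name)
{
    size_t i;
    for (i = 0; i < reg_n_mappings; i++)
        if (strcmp(reg_mappings[i].type_name, type_name) == 0)
            return reg_mappings[i].package;
    return NULL;
}

/* Walk up the ancestry until a registered type is found.
 * *exact is 1 for a direct hit and 0 when an ancestor's package was used
 * (the case in which the real code would print a warning). */
const char *reg_package_for(const char *type_name, int *exact)
{
    const char *t = type_name;
    *exact = 1;
    while (t != NULL) {
        const char *pkg = reg_exact_package(t);
        if (pkg != NULL)
            return pkg;
        t = reg_parent_of(t);
        *exact = 0;
    }
    return NULL;
}
```

With only GtkVBox registered as Gtk2::VBox, asking for MyCustomTypeFoo in this toy yields Gtk2::VBox with the exact flag cleared, which corresponds to the warning case in the text.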

GBoxed and GObject have distinct mapping registries to avoid cross-pollination and mistakes in the type system. See below.

To assist in handling inheritance that isn't specified directly by the GType system, the function gperl_set_isa allows you to add elements to the @ISA for a package. gperl_register_class does this for you, but you may need to add additional parents, e.g., for implementing GInterfaces. (see Gtk2/xs/GtkEntry.xs for an example)

GEnums and GFlags

These are largely unchanged from gtk2-perl. My only thought is towards changing the function names from gperl_convert_enum, gperl_convert_back_enum, etc to gperl_enum_from_sv, gperl_sv_from_enum, etc.

GBoxed

GBoxed provides a way to register functions that create, copy, and destroy opaque structures. For our purposes, we'll allow any perl package to inherit from G::Boxed and implement accessors for the struct members, but G::Boxed will handle the object and wrapper lifetime issues.

There are two functions for creating boxed wrappers:

SV * gperl_new_boxed (gpointer boxed, GType gtype, gboolean own);
SV * gperl_new_boxed_copy (gpointer boxed, GType gtype);

If own is TRUE, the wrapper returned by gperl_new_boxed will take boxed with it when it dies. In the case of a copy, own is implied, so there's a separate function which doesn't need the own option.
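To make the meaning of own concrete, here is a minimal toy model in plain C. It is a sketch under stated assumptions and not the gperl implementation: the bx_ functions, ToyBoxedWrapper, and ToyColor are hypothetical, the "wrapper" is a C struct rather than a perl scalar, and bx_wrapper_destroy stands in for what would happen when the perl scalar is destroyed.

```c
#include <assert.h>
#include <stdlib.h>

typedef void *(*ToyCopyFunc)(const void *boxed);
typedef void (*ToyFreeFunc)(void *boxed);

/* Stand-in for the perl wrapper: remembers whether it owns the C struct. */
typedef struct {
    void *boxed;
    ToyFreeFunc free_func;
    int own;
} ToyBoxedWrapper;

/* Example boxed type, with a counter so that frees can be observed. */
typedef struct {
    unsigned red, green, blue;
} ToyColor;

int bx_color_frees = 0;

ToyColor *bx_color_new(unsigned red, unsigned green, unsigned blue)
{
    ToyColor *c = malloc(sizeof *c);
    assert(c != NULL);
    c->red = red;
    c->green = green;
    c->blue = blue;
    return c;
}

void *bx_color_copy(const void *boxed)
{
    const ToyColor *src = boxed;
    return bx_color_new(src->red, src->green, src->blue);
}

void bx_color_free(void *boxed)
{
    free(boxed);
    bx_color_frees++;
}

/* Counterpart of gperl_new_boxed: wrap an existing struct, owned or not. */
ToyBoxedWrapper *bx_new_boxed(void *boxed, ToyFreeFunc free_func, int own)
{
    ToyBoxedWrapper *w = malloc(sizeof *w);
    assert(w != NULL);
    w->boxed = boxed;
    w->free_func = free_func;
    w->own = own;
    return w;
}

/* Counterpart of gperl_new_boxed_copy: the copy is always owned. */
ToyBoxedWrapper *bx_new_boxed_copy(const void *boxed, ToyCopyFunc copy_func,
                                   ToyFreeFunc free_func)
{
    return bx_new_boxed(copy_func(boxed), free_func, 1);
}

/* Plays the role of the wrapper's destruction: only an owning wrapper
 * takes the boxed struct with it. */
void bx_wrapper_destroy(ToyBoxedWrapper *w)
{
    if (w->own)
        w->free_func(w->boxed);
    free(w);
}
```

In this model a non-owning wrapper can be destroyed without touching the struct, while an owning wrapper or a copy frees exactly the struct it holds.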

When you register a boxed type you get the option of supplying the name of the package to which wrappers for this type should be blessed, or a function which will return the name of the package. The reason for this is that a boxed type pointer doesn't know its own gtype as a GObject does, and for some polymorphic structures you may need to do extra magic.

This hook comes in handy for GdkEvent, for example, which registers a get_package function which returns, e.g., "Gtk2::Gdk::Event::Button" when event->type == GDK_BUTTON_PRESS; farther on down the road, "Gtk2::Gdk::Event" is added to @Gtk2::Gdk::Event::Button::ISA. This allows sophisticated things to happen without hard-coding downstream dependencies into the base library!
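A package hook of this kind might look roughly like the following toy. Everything here is an assumption made for illustration: ToyEvent and its enum replace GdkEvent, the ev_ names and the ToyBoxedInfo struct are hypothetical, and the Gtk2::Gdk::Event::Key package name is invented to give the switch a second case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy replacement for GdkEvent: only the type field matters here. */
typedef enum {
    TOY_EVENT_BUTTON_PRESS,
    TOY_EVENT_KEY_PRESS,
    TOY_EVENT_OTHER
} ToyEventType;

typedef struct {
    ToyEventType type;
} ToyEvent;

typedef const char *(*ToyPackageFunc)(const void *boxed);

/* What a boxed registration carries in this sketch: either a fixed
 * package name or a hook that inspects the struct. */
typedef struct {
    const char *package;        /* used when get_package is NULL */
    ToyPackageFunc get_package; /* optional hook for polymorphic structs */
} ToyBoxedInfo;

/* Hook for the event type: choose the package from the event's own type. */
const char *ev_get_package(const void *boxed)
{
    const ToyEvent *event = boxed;
    switch (event->type) {
    case TOY_EVENT_BUTTON_PRESS:
        return "Gtk2::Gdk::Event::Button";
    case TOY_EVENT_KEY_PRESS:
        return "Gtk2::Gdk::Event::Key"; /* hypothetical package name */
    default:
        return "Gtk2::Gdk::Event";
    }
}

/* Decide which package a wrapper for this struct should be blessed into. */
const char *ev_boxed_package(const ToyBoxedInfo *info, const void *boxed)
{
    assert(info != NULL);
    if (info->get_package != NULL)
        return info->get_package(boxed);
    return info->package;
}
```

The base code only ever calls ev_boxed_package, so the knowledge about event subtypes stays in the hook supplied by the downstream module.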

To get a boxed pointer out of a scalar wrapper, you just call gperl_get_boxed_check --- this will croak if the sv is undef or not blessed into the specified package. In general, you'll want to call this function from C preprocessor macros used in a typemap; see the Gtk2 autogeneration description below.

GObject

The GObject knows its own type. Thus, we need only one parameter to create a GObject wrapper:

SV * gperl_new_object (GObject * object);

The wrapper SV will be blessed into the package corresponding to the gtype returned by G_OBJECT_TYPE (object), that is, the bottommost type in the inheritance chain. If that bottommost type is not known, the function walks back up the tree until it finds one that's known, blesses the reference into that package, and spits out a warning on stderr. To hush the warning, you need merely call

void gperl_register_class (GType gtype, const char * package);

This magical function also sets up the @ISA for the package to point to the package corresponding to g_type_parent (gtype). [Since this requires the parent package to be registered, there is a simple deferral mechanism, which means your @ISA might not be set until the next call to gperl_register_class.]
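The deferral idea can be sketched as follows. This is a toy under assumptions, not the actual mechanism: the isa_ functions and ToyClass are hypothetical, the parent type is passed in explicitly instead of being obtained from g_type_parent, and the resolved parent package is stored in a struct field rather than pushed onto a perl @ISA.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ISA_MAX_CLASSES 16

typedef struct {
    const char *type_name;
    const char *parent_type; /* NULL for a root type */
    const char *package;
    const char *isa;         /* parent's package once known, else NULL */
} ToyClass;

static ToyClass isa_classes[ISA_MAX_CLASSES];
static size_t isa_n_classes = 0;

/* Find a registered class by type name, or NULL. */
ToyClass *isa_find_class(const char *type_name)
{
    size_t i;
    for (i = 0; i < isa_n_classes; i++)
        if (strcmp(isa_classes[i].type_name, type_name) == 0)
            return &isa_classes[i];
    return NULL;
}

/* Retry every class whose parent package was unknown so far. */
void isa_resolve_pending(void)
{
    size_t i;
    for (i = 0; i < isa_n_classes; i++) {
        ToyClass *c = &isa_classes[i];
        if (c->isa == NULL && c->parent_type != NULL) {
            ToyClass *parent = isa_find_class(c->parent_type);
            if (parent != NULL)
                c->isa = parent->package;
        }
    }
}

/* Register a class; its parent link is set now if possible, or on a
 * later registration once the parent has shown up. */
void isa_register_class(const char *type_name, const char *parent_type,
                        const char *package)
{
    ToyClass *c;
    assert(isa_n_classes < ISA_MAX_CLASSES);
    c = &isa_classes[isa_n_classes++];
    c->type_name = type_name;
    c->parent_type = parent_type;
    c->package = package;
    c->isa = NULL;
    isa_resolve_pending();
}
```

Registering a child before its parent leaves the child's parent link empty until the parent is registered, which mirrors the "might not be set until the next call" caveat above.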

There are two ways to get an object out of an SV (though I think only one is really needed):

GObject * gperl_get_object (SV * sv);
GObject * gperl_get_object_check (SV * sv, GType gtype);

The second one is like the first, but croaks if the object is not derived from gtype.

You can get and set object data and object parameters just like you'd expect.

GSignal

All of this GObject stuff wouldn't be very useful if you couldn't connect signals and closures. I got most of my handling code from gtk2-perl and pygtk, and it's pretty straightforward. The data member is optional, and must be a scalar. Callbacks are not eval'd (I kept ignoring missing functions because they didn't kill my app).

To connect perl subroutines to GSignals I use GClosures, which require the handling of GValues. Again, largely borrowed from working code.

The Gtk2 Module

After I got the G module partially implemented I wanted to test it, but realized that its types were all abstract and largely not instantiable. So, I had to implement some client code. Since there were questions about how to handle GtkObjects with their floating references and GdkEvents and all this stuff, I decided to get enough going to run the scribble example from the gtk source. Since Gtk is a lot larger than GObject, I also chose to use autogeneration to idiot-proof a lot of the more tedious stuff. Also, I intended to include Gdk, Atk, and Pango under the Gtk2 namespace, in the style of the existing gtk2-perl project.

GtkObject

GtkObject adds the idea of a floating reference to GObject. A GObject is created with one reference which must be explicitly removed by its owner. GtkObject has a floating reference which is sunk by the code which wants to own it. This makes it less painful to create lots of objects in a row (you don't have to unref them).

It also takes away a bit of the pain in creating perl wrappers. The wrapper for a GtkObject must always be created by the function

SV * gtk2perl_new_gtkobject (GtkObject * o);

This simplistic function basically looks like this:

{
	SV * wrapper = gperl_new_object (G_OBJECT (o), FALSE);	/* will ref the object */
	gtk_object_sink (o);
	return wrapper;
}

The combination of the ref, the sunken floating ref, and G::Object::DESTROY will always do the right thing.
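The reference arithmetic behind that claim can be traced with a toy model. This is a sketch only: ToyGtkObject and the fl_ functions are hypothetical stand-ins that track a count and a floating flag, fl_wrapper_create mirrors the ref-then-sink sequence shown above, and fl_container_add assumes that a container takes ownership by doing a ref followed by a sink.

```c
#include <assert.h>

/* Toy object: a reference count plus the floating flag. */
typedef struct {
    int ref_count;
    int floating;
    int finalized;
} ToyGtkObject;

/* A new object starts with one reference, and that reference is floating. */
void fl_object_init(ToyGtkObject *o)
{
    o->ref_count = 1;
    o->floating = 1;
    o->finalized = 0;
}

void fl_object_ref(ToyGtkObject *o)
{
    assert(o->ref_count > 0);
    o->ref_count++;
}

void fl_object_unref(ToyGtkObject *o)
{
    assert(o->ref_count > 0);
    if (--o->ref_count == 0)
        o->finalized = 1;
}

/* Sinking removes the floating reference if it is still there,
 * and does nothing otherwise. */
void fl_object_sink(ToyGtkObject *o)
{
    if (o->floating) {
        o->floating = 0;
        fl_object_unref(o);
    }
}

/* Wrapper creation: take a reference for the wrapper, then sink. */
void fl_wrapper_create(ToyGtkObject *o)
{
    fl_object_ref(o);
    fl_object_sink(o);
}

/* Wrapper destruction drops the wrapper's reference. */
void fl_wrapper_destroy(ToyGtkObject *o)
{
    fl_object_unref(o);
}

/* A container taking ownership: ref, then sink (a no-op if already sunk). */
void fl_container_add(ToyGtkObject *o)
{
    fl_object_ref(o);
    fl_object_sink(o);
}

void fl_container_remove(ToyGtkObject *o)
{
    fl_object_unref(o);
}
```

In this model a freshly wrapped object ends up with exactly one reference, held by the wrapper, so it is finalized with the wrapper; if a container has also taken a reference, the object survives the wrapper and goes away only when the container lets go.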

It's also useful to note that creation is the only difference between GtkObject wrappers and GObject wrappers. You still use gperl_get_object to get the object pointer out of the wrapper, and you still use all the other class registration and isa manipulation utilities.

It's also important to know that this is largely done for you by the typemap.

Typemap scheme

In the same way that the G module uses explicit one-to-one GType to package registrations, I decided it was most foolproof to use an explicit, even exhaustive XS typemap. In this way I could avoid problems such as finding the proper set of regexes to map $var to the type macro and all sort of other problems of extensibility. This of course means it must be autogenerated, but that's easy.

The other main feature of the typemap is that it masks in a very sensible way the differences between GObject and GtkObject, and makes it very easy to specify whether a wrapper owns the object it wraps. This is handled through the idea of a "variant", which is a term I made up just now because it sounds about right.

Basically, a variant is the name of the class with some suffix. For example, for a GBoxed subclass such as GdkEvent, a header would do this:

typedef GdkEvent GdkEvent_ornull;
typedef GdkEvent GdkEvent_own;

#define SvGdkEvent(s)           (gperl_get_boxed_check ((s), GDK_TYPE_EVENT))
#define SvGdkEvent_ornull(s)    ((s)==&PL_sv_undef ? NULL : SvGdkEvent(s))

#define newSVGdkEvent(e)        (gperl_new_boxed ((e), GDK_TYPE_EVENT, FALSE))
#define newSVGdkEvent_own(e)    (gperl_new_boxed ((e), GDK_TYPE_EVENT, TRUE))
#define newSVGdkEvent_ornull(e) ((e) == NULL ? &PL_sv_undef : newSVGdkEvent (e))

Then the typemap entries for its various variants would look like this:

TYPEMAP
GdkEvent *	T_GDK_TYPE_EVENT 
GdkEvent_ornull *	T_GDK_TYPE_EVENT_ORNULL
GdkEvent_own *	T_GDK_TYPE_EVENT_OWN

INPUT 
T_GDK_TYPE_EVENT
	$var = SvGdkEvent ($arg);
T_GDK_TYPE_EVENT_ORNULL
	$var = SvGdkEvent_ornull ($arg); 

OUTPUT
T_GDK_TYPE_EVENT
	$arg = newSVGdkEvent ($var); 
T_GDK_TYPE_EVENT_ORNULL
	$arg = newSVGdkEvent_ornull ($var); 
T_GDK_TYPE_EVENT_OWN
	$arg = newSVGdkEvent_own ($var);

And with that, your XS wrapper code can look as simple as this:

GdkEvent_own *
gdk_event_get (class)
        SV * class
    C_ARGS:
        /*void*/

guint
gdk_event_get_time (event)
        GdkEvent * event

Isn't that nice and simple?

The variants for the various types go like this:

GBoxed
/* no ext */	object will not be destroyed with wrapper
_own	object will be destroyed with wrapper
_copy	object will be copied (and copy will be owned)
_ornull	undef/NULL is legal
GObject
/* no ext */	object's refcount will be increased (=>not owned)
_noinc	object's refcount will not be increased (=>owned)
_ornull	undef/NULL is legal
GtkObject
/* no ext */	everything is peachy
_ornull	undef/NULL is legal

Obviously, this scheme calls for autogeneration for any number of classes larger than just four or five.

Autogeneration

Auto-generated code is known for having problems because it doesn't pay enough attention to special cases, but it's also great for situations in which a human writing or maintaining it would simply go insane with the tedium. Handling the magnitude of classes in Gtk is something like that. Here's a description of what gets autogenerated in Gtk2.

Oh yeah, the generation takes place in two places: the boot.xsh is created by code in Gtk2/Makefile.PL, and Gtk2/genstuff.pl, which is called by Gtk2/Makefile.PL, generates the rest.

maps

This is the starting point for autogeneration. This map file serves the purpose of a defs file in other binding packages; it is the input to the code generator. This map lists the TYPE macro for each of the GObject types in all of the gtk headers (including gdk, gdk-pixbuf, atk, and pango), along with the actual name of the class, name of the package into which it is to be blessed, and the base type (not exactly the fundamental type). Most of those should be obvious except for the base type. The base type is one of GEnum, GFlags, GBoxed, GObject, GInterface, or GtkObject. This is the important flag which determines what kind of code gets created for each record; the GtkObject wrapper must be created differently from the GObject wrapper, for instance.

In this file, you can change the explicit name of an object. If you don't like PangoFontDescription being Gtk2::Pango::FontDescription, you can change it to Gtk2::Pango::Font::Desc::ription if you were so inclined (but please don't).

I have a script called genmaps.pl that actually scans the gtk header files and creates and runs a small program to generate the maps file. The advantage here is that the type information comes directly from the code and I don't have to worry about clerical errors making the software incorrect. In practice, this should need to be run only when new classes are added to the base libraries.

gtk2perl-autogen.h

This file contains the typedefs and cast macros. This includes all the variant stuff described above.
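To give a feel for the expansion from one maps record to header text, here is a rough sketch. It is not the real generator, which is a Perl script; the MapRecord struct, its field names, and gen_boxed_header are hypothetical, and only a subset of the GBoxed variant macros from the typemap section is emitted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One record of the maps file, as described in the text (field names are
 * made up for this sketch). */
typedef struct {
    const char *type_macro; /* e.g. GDK_TYPE_EVENT */
    const char *class_name; /* e.g. GdkEvent */
    const char *package;    /* e.g. Gtk2::Gdk::Event */
    const char *base;       /* GEnum, GFlags, GBoxed, GObject, ... */
} MapRecord;

/* Write the variant typedefs and cast macros for a GBoxed record into buf.
 * Returns the number of characters written, or -1 if the record is not a
 * GBoxed or the buffer is too small. */
int gen_boxed_header(const MapRecord *rec, char *buf, size_t size)
{
    const char *c, *t;
    int n;

    assert(rec != NULL && buf != NULL);
    if (strcmp(rec->base, "GBoxed") != 0)
        return -1;

    c = rec->class_name;
    t = rec->type_macro;
    n = snprintf(buf, size,
                 "typedef %s %s_ornull;\n"
                 "typedef %s %s_own;\n"
                 "#define Sv%s(s) (gperl_get_boxed_check ((s), %s))\n"
                 "#define newSV%s(e) (gperl_new_boxed ((e), %s, FALSE))\n"
                 "#define newSV%s_own(e) (gperl_new_boxed ((e), %s, TRUE))\n",
                 c, c, c, c, c, t, c, t, c, t);
    if (n < 0 || (size_t) n >= size)
        return -1;
    return n;
}
```

Feeding it a record for GdkEvent reproduces the shape of the hand-written header lines shown in the typemap section; the other base types would each need their own template.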

gtk2perl.typemap

The exhaustive typemap uses the macros defined in gtk2perl-autogen.h so that you are assured to get the same results from typemap generated code as from hand-written perl stack manipulation.

register.xsh

Included from the boot code of the toplevel Gtk2 module, this file lists all of the types in the maps file as a series of calls to the appropriate package registration functions (gperl_register_boxed or gperl_register_class). This is done before the boot code below so that hand-written code may override it. This code gets called when your program does a "use Gtk2".

boot.xsh

The Gtk2 module is made up of dozens of XS files but only one PM file. Gtk2.pm calls bootstrap on Gtk2, but not on any of the others (because it doesn't know about them). It is a module's boot code which binds the xsubs into perl, so it's imperative that the modules get booted!

So, Makefile.PL scans the xs/ subdirectory for all the MODULE = ... lines in the XS files. It maps these to boot code symbols, and generates code to call these symbols in boot.xsh, which is then included by the boot code for the toplevel module, right after register.xsh. (The generation code takes steps to avoid spitting out the same symbol more than once, and will not emit code to boot the toplevel module, or else you get an infinite loop.)
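The mapping from module names to boot symbols, together with the duplicate and toplevel checks, can be sketched like this. It is an illustration under assumptions, not the code in Makefile.PL (which is Perl): the function names are hypothetical, and CALL_BOOT is a made-up placeholder for whatever the generated boot.xsh really emits. The symbol form itself follows the XS convention of boot_ followed by the module name with every non-word character turned into an underscore.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* "Gtk2::Gdk" -> "boot_Gtk2__Gdk". Returns 0 on success, -1 if out is too small. */
int module_to_boot_symbol(const char *module, char *out, size_t size)
{
    static const char prefix[] = "boot_";
    size_t plen = sizeof prefix - 1;
    size_t len = strlen(module);
    size_t i;

    if (plen + len + 1 > size)
        return -1;
    memcpy(out, prefix, plen);
    for (i = 0; i < len; i++) {
        unsigned char ch = (unsigned char) module[i];
        out[plen + i] = (isalnum(ch) || ch == '_') ? (char) ch : '_';
    }
    out[plen + len] = '\0';
    return 0;
}

/* Emit one boot call per distinct module, skipping the toplevel module.
 * Returns the number of calls emitted, or -1 if a buffer is too small. */
int gen_boot_calls(const char **modules, size_t n, const char *toplevel,
                   char *buf, size_t size)
{
    size_t i, j, used = 0;
    int count = 0;

    assert(size > 0);
    buf[0] = '\0';
    for (i = 0; i < n; i++) {
        char sym[128];
        int dup = 0, written;

        if (strcmp(modules[i], toplevel) == 0)
            continue; /* booting the toplevel from itself would loop forever */
        for (j = 0; j < i; j++)
            if (strcmp(modules[j], modules[i]) == 0) {
                dup = 1;
                break;
            }
        if (dup)
            continue; /* this module was already handled earlier in the list */
        if (module_to_boot_symbol(modules[i], sym, sizeof sym) != 0)
            return -1;
        /* CALL_BOOT is a hypothetical macro name used only in this sketch. */
        written = snprintf(buf + used, size - used, "CALL_BOOT (%s);\n", sym);
        if (written < 0 || (size_t) written >= size - used)
            return -1;
        used += (size_t) written;
        count++;
    }
    return count;
}
```

Given the module list Gtk2, Gtk2::Gdk, Gtk2::Button, Gtk2::Gdk with Gtk2 as the toplevel, this sketch emits two calls, one for boot_Gtk2__Gdk and one for boot_Gtk2__Button.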

Just a point of style; you can change packages in an XS file by repeating the MODULE = ... line with a different PACKAGE (and possibly PREFIX) value. It's a good idea, however, to keep the MODULE the same, so that only one boot symbol gets generated per file.

Why Inline Is Not the Way To Go for Gtk2-Perl

I couldn't have done this in the Inline framework. Trust me, I tried. I looked at this stuff for weeks, trying to figure out how to change the underlying code without breaking the whole shebang, and there just wasn't a feasible way. On the other hand, I'm writing this monster document only 7 days after starting the pure-XS bindings project.

I originally just wanted to play with reference counts, but the entire Inline build system got very much in my way. There was just too much infrastructure there; this file loads that file, which decides what the symbol prefix should be based on some hard-coded list of substitutions, and then it bootstraps this module, which has perl code that calls an XS wrapper to a C wrapper function that calls all sorts of helper functions before finally calling the actual Gtk function. I couldn't see the forest for the trees, and I found myself having to dig into the undocumented internals of things far too often.

So, I switched to pure XS for my sandbox. Using a combination of typemaps and castmacros (borrowed from gtk2-perl) serves to make the XS implementation very clean.

Basically, this code from the Inline version of gtk2-perl

SV* gtkperl_message_dialog_new (char* class, SV* parent, SV* flags, SV* type, SV* buttons, char* message)
{
    return gtk2_perl_new_object(gtk_message_dialog_new(SvGtkWindow_nullok(parent),
                                SvGtkDialogFlags(flags), SvGtkMessageType(type),
                                SvGtkButtonsType(buttons), message));
}

Becomes this in XS:

GtkWidget *
gtk_message_dialog_new (class, parent, flags, type, buttons, message)
        SV * class
        GtkWindow_ornull * parent
        GtkDialogFlags flags
        GtkMessageType type
        GtkButtonsType buttons
        gchar * message
    C_ARGS:
        parent, flags, type, buttons, message

Personally, I think the XS version is easier to figure out, quicker to use, and more maintainable in the long run. This works for the vast majority of functions, most of which don't even need the C_ARGS line. For others you have the complete XS bag of tricks at your disposal: reducing code size by using the ALIAS keyword; using default parameters and vararg functions to make life easier on the perl side; and a wealth of documentation on the XS API both in pod and published paperback form.

I know there has been some reluctance on this list to look at XS because of its learning curve, but let me give you a laundry list of reasons that XS is the better choice for this project:

There are five reasons for using Inline, and this project ignores all of them.

To avoid the need to dig in perl's guts for simple things, or to learn the XS API. Every single Inline wrapper in the gtk2-perl tree takes a list of SV*s as args and converts them by hand. In very many cases, we need to manipulate array and reference values, resulting in direct use of the XS API. We're not hiding from it very well.
Storing your code Inline, that is, in the same source files, to avoid the need to write a complete, hard-to-install module. However, gtk2-perl splits the code out into separate source files (isn't that the reason why 0.44 is required and not 0.43?), completely sidestepping the original purpose and other major benefit of Inline.
Automate and hide the details of the XS build process. After building all the Inline modules in Makefile.PL, gtk2-perl copies all the generated xs files to another directory and builds them again. The link is done by a hand-written gcc command. Might as well just write XS code to begin with and compile it once.
Bind languages other than C to perl. We're writing C code.
Inline supports tweak, run, tweak, run development. Well, sort of, if you don't mind waiting a few minutes for it to recompile in the background.

Inline does indeed impose some runtime penalties, even in the precompiled and installed version; especially with the highly-indirected way in which the gtk2-perl module has been written. There are over 100 tiny *.pms, resulting in hundreds of disk hits just to load a small gtk2-perl script. The new-gtk2-perl stuff follows the ideology of gtk-perl and puts everything in one .so loaded from one .pm. (G.pm and G.so for the G module --- basically one pair per main package.)
It's really nasty hard to figure out the Inline build system. It took me a week of constant digging to understand the convoluted details of Inline's module loading, how to chain an extension from it, and why I couldn't get it to work; it took an hour and a half of reading gtk-perl to figure out the same thing. Inline simply gets in the way of development of a project of this scale.
I started my XS-based project a week ago and already have over half of Gtk2 implemented (using code borrowed from gtk2-perl and pygtk).

Where to Go From Here

As stated above, I think the Inline architecture stinks and we would greatly benefit from moving to XS. I have a mostly functional XS code tree sitting here, in my hands, which I intend to use from here on out because it fits my need for extensibility. I want to continue contributing to the public gtk2-perl project, and I hereby give you all the code.

While it may seem rash to throw away the Inline tree in favor of another, I don't see it that way:

We still have the Inline tree from which to pull a large amount of working code (if anything, it will be simplified in the transition).
It's still very early in the project (0.12 is the most recent stable release) and development has slowed even though the API is only about 60% covered. If there is any time to change the call signatures and break code, it is now.
It's very straightforward and easy to add new bindings. Ross McFarland got a hold of me late last week, before any document like this one existed, and spent about eight hours watching TV and implementing bindings for about 60 classes while I worked out the kinks in the underlying stuff. That's real proof that with this new XS architecture we can have the whole library up to 100% implemented in very little time.
The bindings are designed to be modular and extensible, so right now someone could start creating bindings for GConf, Bonobo, GtkHTML, etc, etc, with no changes to the Gtk2 module. The Gnome2 bindings could be ported in a matter of hours.
In life, you always throw away the first one. The Inline code was the first one. We learned a lot from it. I couldn't have done the XS version without having the Inline stuff as a teacher of good ideas and pitfalls.