<?xml version="1.0" encoding="US-ASCII"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">
    <title>Whiteknight's Blog</title>
    
    <link href="http://whiteknight.github.com/" />
    <updated>2012-05-29T16:06:53-07:00</updated>
    <id>http://whiteknight.github.com/</id>
    <author>
        <name>Andrew Whitworth (Whiteknight)</name>
        <email>wknight8111@gmail.com</email>
    </author>

    
    <atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/atom+xml" href="http://feeds.feedburner.com/afwknight" /><feedburner:info uri="afwknight" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><entry>
        <title>IO Refactors</title>
        <link href="http://feedproxy.google.com/~r/afwknight/~3/V4D3FNyxOy0/io_cleanup_first_round.html" />
        <updated>2012-05-27T00:00:00-07:00</updated>
        <id>http://whiteknight.github.com/2012/05/27/io_cleanup_first_round</id>
        <content type="html">&lt;p&gt;The IO subsystem is a lot like the garbage collector: So long as it &lt;em&gt;just works&lt;/em&gt; we can ignore its faults for quite a long time. The garbage collector had performance and other issues for years before everybody&amp;#8217;s favorite bacek went through and finally rewrote it. His effort there saves the rest of us mere mortals from having to touch the GC again for another couple years.&lt;/p&gt;

&lt;p&gt;The IO system works reasonably well. It&amp;#8217;s got a decent set of features more or less, it implements most of the important operations that our users have needed in the past, it&amp;#8217;s not spectacularly slow (and disk or network operation performance almost always outweighs any issues in the code that leads to those things), and we haven&amp;#8217;t been getting a lot of error reports or feature requests for it. In short, if it ain&amp;#8217;t broke, don&amp;#8217;t fix it.&lt;/p&gt;

&lt;p&gt;A few days ago I was working on a ticket for moritz to add better integration between our various IO vector PMCs (FileHandle, Socket, etc) and the ByteBuffer PMC. ByteBuffer is what its name implies: It&amp;#8217;s an array-like type for working with individual bytes in a chunk of memory. It&amp;#8217;s like a binary-encoded STRING, but it&amp;#8217;s not immutable and has a handful of additional features that a raw STRING (or the String PMC) doesn&amp;#8217;t. ByteBuffer can be populated from and exported to a STRING, and it is useful for certain types of operations that need to operate on a sequence of bytes without having to worry about strings and encodings and all that other nonsense. Moritz&amp;#8217;s request was a reasonable one so I sat down and made it happen. A few nights ago I merged that work into master with an &amp;#8220;experimental&amp;#8221; tag on it.&lt;/p&gt;

&lt;p&gt;However, while I was in the IO subsystem code making this happen, something did break. Not in the code; instead, something broke inside my poor little head. The snapping sound you hear is the poor camel&amp;#8217;s back under the load of that last piece of straw. I&amp;#8217;ve had enough of that system and its inside-out organization and collection of half-ideas and botched refactors. I&amp;#8217;ve had my fill of the nonsense and finally decided it was time to make things right.&lt;/p&gt;

&lt;p&gt;And before anybody says to me, &amp;#8220;hey Mr Whiteknight, you shouldn&amp;#8217;t be so mean, somebody probably worked really hard to make this code do what it does&amp;#8221;, let me just say two things: first, &amp;#8220;Mr Whiteknight&amp;#8221; is my father&amp;#8217;s handle, and second, &lt;em&gt;I was one of the people who helped put IO where it is today&lt;/em&gt;. I don&amp;#8217;t feel particularly bad insulting myself or my own work, and my contributions, though well-intentioned at the time, are a big part of why the system is in the condition it&amp;#8217;s in now. First, a brief history lesson.&lt;/p&gt;

&lt;p&gt;When I joined Parrot, it sported an IO system based on layers. Layers were arranged in a structure something like a vtable, and IO requests would be fed through the layers, each layer getting the output of the one before it until the bottom layer actually spat the data out (or read it in, depending on which way you were moving). This worked pretty well when you were trying to do File IO on a file with a particular encoding, with buffering, through an asynchrony mechanism, etc. Actually, I say it worked well but it was sort of overkill: It was just too much infrastructure for the possible benefits, and despite the theory of allowing better code reuse there really weren&amp;#8217;t too many different layering combinations that could be set up. Plus, layers started to interdepend and violate encapsulation, then optimization started prompting a few &amp;#8220;short cuts&amp;#8221; where layers were flattened together. One of the earlier things I did on Parrot, post-GSOC, was to remove some of the last vestiges of the then-unused layering system from Parrot&amp;#8217;s IO.&lt;/p&gt;

&lt;p&gt;The IO subsystem has something of a problem where it has a few masters and has to be performance conscious. Many of our programs are still the kind that shuffle data about (very much under the influence of Perl), and IO operation performance matters when your compiler is reading in HLL code and outputting PIR code, then you&amp;#8217;re reading PIR code in and trying to compile it again. Too much nonsense and everybody feels it.&lt;/p&gt;

&lt;p&gt;In Parrot at the user level you can do IO in two ways: Through the IO PMCs (FileHandle, mostly) and through opcodes (&lt;code&gt;say&lt;/code&gt;, &lt;code&gt;print&lt;/code&gt;, etc). The problem, put succinctly, is this: We want to encapsulate logic for writing to files inside the FileHandle PMC, but we don&amp;#8217;t want to add new IO-specific VTABLES and we don&amp;#8217;t want to incur the costs of method calls on every single IO request. In other words, we didn&amp;#8217;t want the &lt;code&gt;print&lt;/code&gt; opcode to just be a thin wrapper around the &lt;code&gt;print&lt;/code&gt; method on FileHandle. Such a thing, especially if implemented naively, would have killed performance by creating nested runloops and a whole host of other problems.&lt;/p&gt;

&lt;p&gt;The way the system is set up is that &lt;code&gt;FileHandle.print()&lt;/code&gt; and the &lt;code&gt;print&lt;/code&gt; opcode are both thin wrappers around the real routine &lt;code&gt;Parrot_io_putps&lt;/code&gt;, which does all the hard work. And, more importantly, that routine is expected to act transparently (like the &lt;code&gt;print&lt;/code&gt; opcode does) on any IO PMC type like Socket or StringHandle. The only real way to do this, if you can&amp;#8217;t call a method on the FileHandle and Socket PMCs, is to use a large switch statement:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;switch (handle-&amp;gt;vtable-&amp;gt;base_type) {
    case enum_class_FileHandle:
        ...
    case enum_class_Socket:
        ...
    case enum_class_StringHandle:
        ...
    default:
        Parrot_pcc_invoke_method_from_c_args(..., handle, &amp;quot;print&amp;quot;, ...);
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;I&amp;#8217;ve obviously glossed over all the details, but this is the general form of that routine and several other similar routines in the IO API. You&amp;#8217;ll notice several things from even a quick glance at this example:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;If we want to add a new IO type to Parrot core we need to add a new entry to the switch statement in &lt;em&gt;every IO API routine that needs to care about PMC type&lt;/em&gt; (this is a major part of the reason we don&amp;#8217;t yet have a sane, separate Pipe type).&lt;/li&gt;

&lt;li&gt;If the user passes in an Object, something defined at the PIR level, we do fall back to calling the method, because we can&amp;#8217;t do anything else intelligently.&lt;/li&gt;

&lt;li&gt;We can&amp;#8217;t really subclass FileHandle or Socket from the user level, because it would fail the &lt;code&gt;base_type&lt;/code&gt; test, and wouldn&amp;#8217;t be able to handle the low-level structure accesses from that point forward anyway.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Point number 2 is particularly interesting because the &lt;code&gt;FileHandle.print()&lt;/code&gt; method calls &lt;code&gt;Parrot_io_putps&lt;/code&gt;, which may turn around and call the &lt;code&gt;.print()&lt;/code&gt; method. This is a big part of the reason why FileHandle cannot be subclassed in user code. It&amp;#8217;s clearly an example of poorly separated concerns and poor encapsulation. Either the method should call the IO API or the IO API should call the method but we can&amp;#8217;t be doing both. Actually, I&amp;#8217;d far prefer the former, if we can do it in a good, general way.&lt;/p&gt;

&lt;p&gt;There are a few other issues worth mentioning, which I&amp;#8217;ll just dump rapid-fire without much explanation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We don&amp;#8217;t have a separate Pipe type. Instead, FileHandle can be opened in &amp;#8220;pipe mode&amp;#8221; to write to a separate process or read output from a separate process.&lt;/li&gt;

&lt;li&gt;We have limited buffering, but only on FileHandle and we cannot configure buffers for input and output separately, or use separate buffers.&lt;/li&gt;

&lt;li&gt;We don&amp;#8217;t really have encodings set up in any consistent way, so it&amp;#8217;s very possible, though I haven&amp;#8217;t worked out all the details, to write strings with different encodings to a file. This is especially true if we&amp;#8217;re using buffers and performing writes through different API routines.&lt;/li&gt;

&lt;li&gt;FileHandle logic is considered to be the default and is given deference in the code. Pipe logic is unified with file logic at a very low level. Socket and StringHandle are treated as bolted-on spare parts and benefit from hardly any code sharing or unified architecture. They also don&amp;#8217;t have all the same useful features that FileHandle has.&lt;/li&gt;

&lt;li&gt;Several functions in the IO subsystem are poorly or inconsistently named and implemented, not to mention the often confusing documentation and absurd architectural arrangements.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So that&amp;#8217;s the system we&amp;#8217;ve got. What do I want to do to fix these issues?&lt;/p&gt;

&lt;p&gt;The first thing I&amp;#8217;ve suggested is to break up IO functionality into an &lt;code&gt;IO_VTABLE&lt;/code&gt; of function pointers, similar to how the &lt;code&gt;STR_VTABLE&lt;/code&gt;, the sprintf dispatch mechanism, the packfile segment dispatch table and other similar mechanisms in Parrot work. Each IO request would go through the API routines, which dispatch to a vtable routine (possibly with some intermediate buffering logic). Here&amp;#8217;s what it looks like in the branch to do a basic write:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;IO_VTABLE * const vtable = IO_GET_VTABLE(interp, handle);
vtable-&amp;gt;write_s(interp, handle, str-&amp;gt;strstart, str-&amp;gt;bufused);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;And here&amp;#8217;s how to do it with write buffering:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;IO_VTABLE * const vtable = IO_GET_VTABLE(interp, handle);
IO_BUFFER * const write_buffer = IO_GET_WRITE_BUFFER(interp, handle);
Parrot_io_buffer_write_s(interp, handle, vtable, write_buffer, str);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Internally, the buffer does its magic and flushes data out to the vtable if necessary.&lt;/p&gt;
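
&lt;p&gt;To make the shape of that dispatch concrete, here is a stand-alone sketch in plain C. It is not the code in the branch: every name in it (&lt;code&gt;IoVtable&lt;/code&gt;, &lt;code&gt;io_vtables&lt;/code&gt;, &lt;code&gt;io_buffer_write&lt;/code&gt; and so on) is made up for illustration, the backends only count bytes instead of doing real IO, and for brevity it assumes one write buffer per IO type rather than per handle.&lt;/p&gt;

```c
/* Minimal sketch of vtable dispatch plus a detachable write buffer.
 * Every name here is hypothetical; none of it is the real Parrot API. */

enum { IO_TYPE_FILE, IO_TYPE_SOCKET, IO_TYPE_COUNT };

/* One row of function pointers per IO type. */
typedef struct {
    const char *name;
    int (*write)(const char *data, int len);  /* returns bytes accepted */
} IoVtable;

/* Stand-ins for real backends: they only tally what reached them. */
static int file_bytes_out;
static int socket_bytes_out;

static int file_write(const char *data, int len)
{
    (void)data;
    file_bytes_out += len;
    return len;
}

static int socket_write(const char *data, int len)
{
    (void)data;
    socket_bytes_out += len;
    return len;
}

/* A new IO type is one more row here, not one more case in every routine. */
static const IoVtable io_vtables[IO_TYPE_COUNT] = {
    { "file",   file_write   },
    { "socket", socket_write }
};

/* Unbuffered write: look up the vtable by type and dispatch. */
static int io_write(int type, const char *data, int len)
{
    return io_vtables[type].write(data, len);
}

/* A write buffer that knows nothing about any particular IO type. */
#define IO_BUF_CAPACITY 8

typedef struct {
    char bytes[IO_BUF_CAPACITY];
    int  used;
} IoBuffer;

/* Assumption for brevity: one buffer per type instead of per handle. */
static IoBuffer write_buffers[IO_TYPE_COUNT];

/* Push buffered bytes through the vtable; returns how many were flushed. */
static int io_buffer_flush(int type)
{
    int flushed = write_buffers[type].used;
    if (flushed != 0) {
        io_vtables[type].write(write_buffers[type].bytes, flushed);
        write_buffers[type].used = 0;
    }
    return flushed;
}

/* Buffered write: bytes collect in the buffer and only reach the vtable
 * when the buffer is full. Returns how many bytes this call flushed. */
static int io_buffer_write(int type, const char *data, int len)
{
    int i, flushed = 0;
    for (i = 0; i != len; i++) {
        if (write_buffers[type].used == IO_BUF_CAPACITY)
            flushed += io_buffer_flush(type);
        write_buffers[type].bytes[write_buffers[type].used++] = data[i];
    }
    return flushed;
}
```

&lt;p&gt;With the 8-byte buffer used here, pushing the 11 bytes of &lt;code&gt;hello world&lt;/code&gt; through &lt;code&gt;io_buffer_write&lt;/code&gt; hands the backend one 8-byte write when the buffer fills, and the remaining 3 bytes only go out on the next &lt;code&gt;io_buffer_flush&lt;/code&gt;. The buffering code never needs to know which backend sits behind the vtable.&lt;/p&gt;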

&lt;p&gt;The second thing I want to do is break out buffering so that instead of being a detail of the FileHandle PMC a buffer is a separate struct which can be attached to any IO type as desired. And, even better, we can attach multiple buffers to an IO stream, at least one each for input and output, configured separately. The buffering API, which will be cleaned up and properly encapsulated, will take a pointer to the &lt;code&gt;IO_VTABLE&lt;/code&gt; for the handle and will pass data through transparently as required. A thin wrapper PMC type, &lt;code&gt;IO_BUFFER&lt;/code&gt;, would allow references to buffers to be accessed and configured directly, which would be very useful in some cases.&lt;/p&gt;

&lt;p&gt;Imagine, if I may go off on a short tangent, a threaded system where one worker task had a reference to a buffer and continuously made sure it was filled in the background while another worker task read bits and pieces from the buffer very quickly. It would be possible, through careful choice of algorithm, to do such a thing lock-free. Feel free to replace &amp;#8220;file&amp;#8221; with &amp;#8220;socket&amp;#8221; or &amp;#8220;pipe&amp;#8221; in the example above too. Imagine also a system where we can transparently use &lt;code&gt;mmap&lt;/code&gt; (or its Windows equivalent) to map a file to memory as part of the buffer, and keep working with it that way.&lt;/p&gt;

&lt;p&gt;The third thing I want to do is start teasing apart the logic for Pipes from the file logic. I&amp;#8217;ll create a separate &lt;code&gt;io_vtable&lt;/code&gt; for pipe operations, and use that inside FileHandle when we&amp;#8217;re in pipe mode. Eventually we&amp;#8217;ll be able to create a separate type, divide out all the logic completely, and get to work on really interesting stuff like feasible 2-way and 3-way pipes.&lt;/p&gt;

&lt;p&gt;The fourth thing I want to do is start setting up interfaces so that IO operations including buffering, low-level IO, file descriptor manipulation and other things become more accessible at the PIR level so users can make better use of these tools, both in subclasses of the in-built handle PMCs and in custom types which neither derive from nor hold instances of those types.&lt;/p&gt;

&lt;p&gt;I&amp;#8217;ve started sketching out many of these ideas in the &lt;code&gt;whiteknight/io_cleanup1&lt;/code&gt; branch. cotto seems to agree with the general direction and I haven&amp;#8217;t heard any complaints so far, so I&amp;#8217;ve had my head down and been working hard on making these ideas reality. As of this writing, I&amp;#8217;ve modified just about every single line of code in the subsystem, gotten most of the new architecture and logic into place and set up the vtables for the most important built-in types. I have a few details to finish up before I try to build (and inevitably debug) this new beast. Ultimately I would like this first round of cleanups to produce no user-visible changes, so the old PMC methods and exported API functions are going to continue doing what they&amp;#8217;ve always done. Later rounds of cleanups will add new interfaces and eventually deprecate and remove some of the crufty older ones. I&amp;#8217;ll post more updates as this work progresses.&lt;/p&gt;</content>
    <feedburner:origLink>http://whiteknight.github.com/2012/05/27/io_cleanup_first_round.html</feedburner:origLink></entry>
    
    <entry>
        <title>Destructors are Hard</title>
        <link href="http://feedproxy.google.com/~r/afwknight/~3/3rcWz4_4zGw/destructors_are_hard.html" />
        <updated>2012-05-23T00:00:00-07:00</updated>
        <id>http://whiteknight.github.com/2012/05/23/destructors_are_hard</id>
        <content type="html">&lt;p&gt;In &lt;a href='/2012/05/20/pending_branchwork.html'&gt;my last post&lt;/a&gt; I mentioned some work involving the GC, finalization and destructors. Today I&amp;#8217;m going to expand on some of those ideas, talk about what the current state of destruction and finalization is in Parrot, some of the problems we have with coming up with better solutions, and some of the things I and others are working on to get this all working as our users expect us to. I apologize in advance for such a long post; there&amp;#8217;s a lot of information to share, and hopefully a much larger architectural discussion to be started.&lt;/p&gt;

&lt;p&gt;Destructors are hard. The idea behind a destructor is a simple one: We want to have a piece of code that is guaranteed to execute when the associated object is freed. Memory allocated on the heap is going to get reclaimed en masse by the operating system when the process exits. However, things such as handles, connections, tokens, mutexes, and other remote resources might not necessarily get freed or handled correctly if the process just exits, or if the object is destroyed without some sort of finalization logic performed on it. Here&amp;#8217;s a sort of example that&amp;#8217;s been bandied about a lot recently:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;function main () {
    var f = new &amp;#39;FileHandle&amp;#39;;
    f.open(&amp;quot;foo.txt&amp;quot;, &amp;quot;w&amp;quot;);
    f.print(&amp;quot;hello world&amp;quot;);
    exit(1);
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In this example we would expect that the text &lt;code&gt;&amp;quot;hello world&amp;quot;&lt;/code&gt; would be written to the &lt;code&gt;foo.txt&lt;/code&gt; file. However, because the text to be written may be buffered (both in Parrot and by the OS), there&amp;#8217;s a very real chance that the data won&amp;#8217;t get written if we do not call the finalizer for the &lt;code&gt;FileHandle&lt;/code&gt; PMC.&lt;/p&gt;

&lt;p&gt;Obviously, the brain-dead solution to this particular problem is to manually close or flush the file handle:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;function main () {
    var f = new &amp;#39;FileHandle&amp;#39;;
    f.open(&amp;quot;foo.txt&amp;quot;, &amp;quot;w&amp;quot;);
    f.print(&amp;quot;hello world&amp;quot;);
    f.close();
    exit(1);
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;However, the whole point of having things like finalizers (&amp;#8220;destructors&amp;#8221;) and GC is to make it so that the programmer does not need to worry about little details like these. The program should be smart enough to find dead objects in a timely manner and free their resources. Beyond that, many programming languages (with special emphasis on Perl6) require the availability of reliable and sane destructors.&lt;/p&gt;

&lt;p&gt;In the remainder of this post I would like to talk about why destructors are hard to implement correctly, why Parrot does not currently (really) have them, and some of the ideas we&amp;#8217;ve been kicking around about how to add them.&lt;/p&gt;

&lt;p&gt;First, let&amp;#8217;s cover where we currently stand. Parrot does have destructors, of a sort, in the form of the &lt;code&gt;destroy&lt;/code&gt; vtable. That routine is called by the GC when the object is being reclaimed, during the sweep pass. A side-effect of this implementation is that if PMC &lt;code&gt;A&lt;/code&gt; refers to PMC &lt;code&gt;B&lt;/code&gt; and both are being collected, it&amp;#8217;s very possible that &lt;code&gt;A&lt;/code&gt;&amp;#8217;s destructor tries to access some information in &lt;code&gt;B&lt;/code&gt; &lt;em&gt;after &lt;code&gt;B&lt;/code&gt; has already been reclaimed&lt;/em&gt;. Think about a database connection object that maintains a socket on one side, and a hash of connection information on the other. The socket probably cannot perform a simple disconnect, but instead should send some sort of sign-off message first to alert the server that it can proceed with its own cleanup. The socket PMC would need information from the connection information hash to send this final message, but if the hash had already been reclaimed the access would fail with undefined results.&lt;/p&gt;

&lt;p&gt;This situation has led to more than a few calls for ordered destruction. In one of the most common and severe cases, Parrot&amp;#8217;s Scheduler PMC was being relied upon by various managed PMCs. When a Task PMC was destroyed, at least in earlier iterations of the system, it would attempt to send a message to the Scheduler that it was no longer available to be scheduled. Ignore for a moment the fact that the Task could not possibly have been reclaimed in the first place if the Scheduler had a live reference to it, and if the Scheduler was still alive itself.&lt;/p&gt;

&lt;p&gt;Because of some of these order-of-destruction bugs, and for performance reasons, GC finalization (a final, all-encompassing GC sweep pass guaranteed to execute all remaining destructors prior to program exit) had been turned off. Turning off GC finalization leads to the problem above where data written to the FileHandle is not flushed before program exit. You are probably now starting to understand the bigger picture here.&lt;/p&gt;

&lt;p&gt;Having ordered destruction means essentially that we should be able to have an acyclic dependency graph of all objects in the system with destructors. However, maintaining this in the general case is impossible and attempting to approximate it would be very expensive in terms of performance. In any case, this is just a way to work around the problem of our naive sweep algorithm, which destroys and frees dead objects in a single pass, and not a real solution to the larger problems. A far better idea, recently suggested by hacker Moritz, is a 2-pass GC sweep.&lt;/p&gt;

&lt;p&gt;In the 2-pass case the GC sweep phase would have two loops: the first to identify all PMCs to be swept (from a linear scan of the entire memory pool), execute destructors on them and add them all to a list, and the second to iterate over that list (after all destructors had been called) and reclaim the memory. Because of the linked-list setup of the GC, this second pass could, conceivably, be almost free because we could simply append this list of swept items to the end of the free list for an &lt;code&gt;O(1)&lt;/code&gt; operation, and the first pass would be no less friendly on the processor data cache than our current sweep would be. This, in theory, solves our problem with ordered destruction, and should allow us to re-enable GC finalization globally without having to worry about these kinds of bugs causing segfaults in the final milliseconds of a program.&lt;/p&gt;
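
&lt;p&gt;A toy model shows the difference between the two sweeps. The sketch below is plain C with made-up names (&lt;code&gt;Header&lt;/code&gt;, &lt;code&gt;sweep_one_pass&lt;/code&gt;, &lt;code&gt;sweep_two_pass&lt;/code&gt;) and a tiny fixed pool, so it should be read as an illustration of the idea and not as Parrot GC code. To stay short it rescans the pool in the second pass instead of splicing a list onto the free list.&lt;/p&gt;

```c
/* Sketch of one-pass vs. two-pass sweeping over a tiny fixed pool.
 * All names are hypothetical; real GC headers look nothing like this. */

enum { HEADER_FREE, HEADER_LIVE, HEADER_DEAD };
enum { NO_PEER = -1 };

typedef struct {
    int state;  /* one of the HEADER_* values */
    int peer;   /* index of a header the destructor reads, or NO_PEER */
} Header;

/* Run the destructor for pool[i]. Returns 1 if it touched a peer that
 * had already been reclaimed (the bug), 0 otherwise. */
static int run_destructor(const Header pool[], int i)
{
    int peer = pool[i].peer;
    if (peer == NO_PEER)
        return 0;
    return pool[peer].state == HEADER_FREE;
}

/* Naive sweep: destroy and reclaim each dead header in a single loop. */
static int sweep_one_pass(Header pool[], int n)
{
    int i, bad_accesses = 0;
    for (i = 0; i != n; i++) {
        if (pool[i].state == HEADER_DEAD) {
            bad_accesses += run_destructor(pool, i);
            pool[i].state = HEADER_FREE;
        }
    }
    return bad_accesses;
}

/* Two-pass sweep: run every destructor first, reclaim afterwards. */
static int sweep_two_pass(Header pool[], int n)
{
    int i, bad_accesses = 0;
    for (i = 0; i != n; i++) {
        if (pool[i].state == HEADER_DEAD)
            bad_accesses += run_destructor(pool, i);
    }
    for (i = 0; i != n; i++) {
        if (pool[i].state == HEADER_DEAD)
            pool[i].state = HEADER_FREE;
    }
    return bad_accesses;
}

/* Header 1 reads header 0 in its destructor, and both are dead.
 * Returns the number of destructor reads that hit reclaimed memory. */
static int demo_bad_accesses(int two_pass)
{
    Header pool[3] = {
        { HEADER_DEAD, NO_PEER },
        { HEADER_DEAD, 0 },
        { HEADER_LIVE, NO_PEER }
    };
    return two_pass ? sweep_two_pass(pool, 3) : sweep_one_pass(pool, 3);
}
```

&lt;p&gt;In this model &lt;code&gt;demo_bad_accesses(0)&lt;/code&gt; reports one destructor reading an already-reclaimed header, while &lt;code&gt;demo_bad_accesses(1)&lt;/code&gt; reports none, because nothing is reclaimed until every destructor has finished.&lt;/p&gt;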

&lt;p&gt;So that&amp;#8217;s the basics of our current system and our problem with GC finalization, and shows us how we would proceed to make sure destructors were always called as a guarantee of the VM. However, this doesn&amp;#8217;t begin to address any of the problems with destructors that will plague their implementation and improvement. I&amp;#8217;ll talk about that second subject now.&lt;/p&gt;

&lt;p&gt;Destructors, as I said earlier, are hard. In the case of GC finalization, after the user program has executed and exited, it&amp;#8217;s relatively easy to loop over all objects and call destructors. It is those destructors which happen during normal program execution that cause problems.&lt;/p&gt;

&lt;p&gt;In the C++ language, destructors have certain caveats and limitations. For instance, we can&amp;#8217;t really throw exceptions from destructors, because that may crash the program. Not just an &amp;#8220;oops, here&amp;#8217;s an exception for you to handle&amp;#8221;, but instead a full-on crash. In Parrot we can probably be smarter about avoiding a crash but not by much. It&amp;#8217;s a limitation of the entire destructors paradigm. Let me demonstrate what I&amp;#8217;m talking about.&lt;/p&gt;

&lt;p&gt;Let&amp;#8217;s say I have this program, which opens up a filehandle to write a message and then starts doing something unrelated to the filehandle but expensive with GC:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;function foo() {
    var f = new &amp;#39;FileHandle&amp;#39;;
    f.open(&amp;quot;foo.txt&amp;quot;, &amp;quot;w&amp;quot;);
    f.write(&amp;quot;hello world!&amp;quot;);
    f = null;       // No more references to f!

    for (int j = 0; j &amp;lt; 1000000; j++) {
        var x = new MyObject(j);
        x.DoSomething();
        x.DoSomethingElse();
        x.DoOneLastThing();
    }
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Somewhere along the line, when the GC runs out of space, it&amp;#8217;s going to attempt a sweep and that means that &lt;code&gt;f&lt;/code&gt; is going to be identified as unreferenced, finalized and reclaimed. The question is, where? The thing is that we don&amp;#8217;t know where GC is going to run for a few reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;We don&amp;#8217;t know how many free headers GC has left in the free list before it has to sweep to find more.&lt;/li&gt;

&lt;li&gt;We don&amp;#8217;t know how many PMCs are being allocated per loop iteration, because the various methods on &lt;code&gt;x&lt;/code&gt; could be allocating them internally, and all PCC calls currently generate at least one PMC. That adds up to a lot of allocation pressure.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;So at any point in that loop, any point at all, GC could execute and reclaim the FileHandle &lt;code&gt;f&lt;/code&gt;. That calls the destructor, flushes the output, and frees the handle back to the operating system. Good, right? What if there is a problem closing that handle, and the destructor for FileHandle tries to throw an exception (or, if this example isn&amp;#8217;t stoking your imagination, imagine that &lt;code&gt;f&lt;/code&gt; is an object of type &lt;code&gt;MyFileHandleSubclass&lt;/code&gt; with an error-prone finalizer)?&lt;/p&gt;

&lt;p&gt;There are a few options for what to do here. The first option is that we throw the exception like normal. This means that the loop code with the &lt;code&gt;MyObject&lt;/code&gt; variables, which is running perfectly fine and has no reason to throw an exception by itself, is interrupted in mid loop. The backtrace, if we provide one at all, probably points to &lt;code&gt;MyObject&lt;/code&gt; but with an exception type and exception message indicative of a failed FileHandle closing. Initial review by the poor developer doing the debugging will show that there are no filehandles trying to close inside this loop and then we get a bug report because a snippet of code which is running just fine exits abruptly with an error condition which it did not cause. The solution for this, wrapping every single line of code you ever write in exception handlers to catch the various possible exceptions thrown from GC finalizers, is untenable from a developer perspective.&lt;/p&gt;

&lt;p&gt;A second option is that we somehow disallow things like exceptions from being thrown from destructors, because there&amp;#8217;s no real way to catch them rationally. This seems reasonable, until we start digging into details. How do we disallow these, by technical or cultural means? And if we&amp;#8217;re relying on cultural means (a line in a .html document somewhere that says &amp;#8220;don&amp;#8217;t do that, and we won&amp;#8217;t be responsible if you do!&amp;#8221;), what happens if a hapless young programmer does it anyway without having first read all million pages of our hypothetical documentation? Does Parrot just crash? Does it enter into some kind of crazy undefined state? Obviously we would need some kind of technical mechanism to prevent bad things from happening in a destructor, though the list of potentially bad things is quite large indeed (throwing exceptions, allocating new PMCs, installing references to dead about-to-be-swept objects into living global PMCs, etc) and filtering these out by technical means would be both difficult and taxing on performance. When you consider that even basic error reporting operations at an HLL level, depending on syntax and object model used, may cause a string to be boxed into a PMC, or a method to be called requiring allocation of a PMC for the PCC logic, or whatever, we end up with finalizers which are effectively useless.&lt;/p&gt;

&lt;p&gt;A third option is that we could just ignore certain requests in finalizers, such as throwing exceptions. If an exception is thrown at any point we just pack up shop, exit the finalizer and pretend it never happened. This works fine for exceptions, but does nothing for the problem of a finalizer attempting to store a reference to the dying object into a living object. I don&amp;#8217;t know why a programmer would ever want to do that, but if it&amp;#8217;s possible you can be damned sure it will happen eventually. Also, when I say &amp;#8220;pack up shop&amp;#8221;, we&amp;#8217;re probably talking about a &lt;code&gt;setjmp&lt;/code&gt;/&lt;code&gt;longjmp&lt;/code&gt; call sequence, which isn&amp;#8217;t free to do.&lt;/p&gt;

&lt;p&gt;The general consensus among developers is that errors caused by programs running on top of Parrot should never segfault. If you&amp;#8217;re running bytecode in a managed environment, the worst that you should ever be able to get is an exception. Segmentation faults should be impossible to get from a pure-pbc example program.&lt;/p&gt;

&lt;p&gt;However, as soon as you introduce destructors, suddenly these things become possible. And not just from specifically malicious code: even moderately naive code will be able to segfault by storing a reference to a dying PMC in a place accessible from live PMCs. Unless, that is, we try to do something expensive like filtering or sandboxing, which would absolutely kill performance.&lt;/p&gt;

&lt;p&gt;And this point I keep bringing up about dead objects installing references to themselves in living objects is not trivial. Our whole system is built around the premise that objects which are referenced are alive and objects which are no longer referenced can be reclaimed by GC. Throughout most of the system we dereference pointers as if they point to valid memory locations or to live PMCs. If we turn that assumption around and say that dead objects may still be referenced by the system, then we lose almost all of the benefits that our mark and sweep GC has to offer. Specifically, we would either have to install tests for &amp;#8220;liveness&amp;#8221; around &lt;em&gt;every single PMC pointer access&lt;/em&gt;, which would bring performance to a standstill, or adopt a policy that says the user at the PIR level is able to create segfaults without restriction, though officially we declare it to be a bad idea. It&amp;#8217;s not just a matter of having to test PMCs to make sure they are alive, the memory could be reclaimed and used for some other purpose entirely! Merely accessing a reclaimed PMC could cause problems (segfaults, etc) or, if the PMC has already been recycled into something like a transparent proxy for a network resource, send network requests to do things that you don&amp;#8217;t want to have happen! The security implications are troubling &lt;em&gt;at best&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The only real solution I can come up with to this problem, and it&amp;#8217;s not a very good one, is to add a &amp;#8220;purgatory&amp;#8221; section to the GC, where we put PMCs during GC sweep but we do not actually free them. The next time GC runs, anything which is still in purgatory is clearly not referenced and can be freed immediately. Anything that is no longer in purgatory has been &amp;#8220;resurrected&amp;#8221; by some shenanigans and has to be treated as still being alive &lt;em&gt;even though its destructor has already been called&lt;/em&gt;. In other words, we take a performance hit and enable zombification in order to prevent segfaults. I don&amp;#8217;t know what we want to do here; this is probably the kind of decision best left to the architect (or tech-savvy clergy), but I just want to point out that none of our options are great.&lt;/p&gt;
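
&lt;p&gt;Here is roughly how I picture the purgatory bookkeeping, as a stand-alone C sketch. The names (&lt;code&gt;Header&lt;/code&gt;, &lt;code&gt;HEADER_PURGATORY&lt;/code&gt;, &lt;code&gt;sweep_with_purgatory&lt;/code&gt;) are invented for this post, the mark phase is reduced to a &lt;code&gt;referenced&lt;/code&gt; flag that the caller sets, and running a destructor is reduced to setting a &lt;code&gt;finalized&lt;/code&gt; flag.&lt;/p&gt;

```c
/* Sketch of a "purgatory" state between dead and free.
 * Names and layout are hypothetical and are not Parrot GC code. */

enum { HEADER_FREE, HEADER_LIVE, HEADER_PURGATORY };

typedef struct {
    int state;       /* one of the HEADER_* values */
    int referenced;  /* 1 if the mark phase reached this header */
    int finalized;   /* 1 once the destructor has been called */
} Header;

/* One sweep over the pool, run after marking has set `referenced`.
 * Returns the number of headers actually handed back for reuse. */
static int sweep_with_purgatory(Header pool[], int n)
{
    int i, freed = 0;
    for (i = 0; i != n; i++) {
        switch (pool[i].state) {
        case HEADER_LIVE:
            if (!pool[i].referenced) {
                /* Newly dead: finalize now, but keep the memory around. */
                pool[i].finalized = 1;
                pool[i].state = HEADER_PURGATORY;
            }
            break;
        case HEADER_PURGATORY:
            if (pool[i].referenced) {
                /* Resurrected: alive again, destructor already called. */
                pool[i].state = HEADER_LIVE;
            } else {
                /* Still unreferenced one cycle later: safe to reclaim. */
                pool[i].state = HEADER_FREE;
                freed++;
            }
            break;
        default:
            break;
        }
    }
    return freed;
}

/* Two unreferenced headers; between sweeps, something stores a
 * reference to header 1. Returns the final state of pool[which]. */
static int demo_final_state(int which)
{
    Header pool[2] = { { HEADER_LIVE, 0, 0 }, { HEADER_LIVE, 0, 0 } };
    sweep_with_purgatory(pool, 2);  /* both are finalized and parked */
    pool[1].referenced = 1;         /* the next mark phase finds header 1 */
    sweep_with_purgatory(pool, 2);  /* header 0 freed, header 1 revived */
    return pool[which].state;
}
```

&lt;p&gt;In this model header 0 ends up free and header 1 ends up live again with &lt;code&gt;finalized&lt;/code&gt; still set, which is exactly the zombie case: nothing segfaults, but the object is alive after its destructor has run.&lt;/p&gt;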

&lt;p&gt;I&amp;#8217;ve also brought up the problem with allocating new objects during a finalizer. Why is this a problem? Keep in mind that GC tends to execute when we&amp;#8217;ve attempted to allocate an object and have none in the free list. If we have no available headers on the free list, are already in the middle of a GC sweep and ask to allocate a new header, what do we do? Maybe we say that we invoke GC when we have only 10 items left (instead of 0) on the free list, guaranteeing that we always have a small number of headers available for finalization, though no matter what we set this limit at, it&amp;#8217;s possible we could exhaust the supply if we have many objects to finalize with complex finalizers. Every time a finalizer calls a method, boxes a string, or does any of a million other benign-sounding things, PMCs get allocated. If we try to allocate a PMC when there are no PMCs on the free list and we&amp;#8217;re already in the middle of GC sweep, the system may trigger another recursive GC run.&lt;/p&gt;
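
&lt;p&gt;Here is a rough sketch of that &amp;#8220;keep a few headers in reserve&amp;#8221; idea, again as a Python toy and not Parrot code. &lt;code&gt;HeaderPool&lt;/code&gt;, &lt;code&gt;OutOfHeaders&lt;/code&gt; and the &lt;code&gt;collect&lt;/code&gt; callback are all hypothetical names. The pool starts a GC run while the reserve is still untouched, lets allocations made during that run eat into the reserve without recursing, and gives up loudly if the finalizers want more than the reserve holds.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;class OutOfHeaders(Exception):
    pass


class HeaderPool:
    def __init__(self, size, reserve):
        self.free = list(range(size))  # header ids available for allocation
        self.reserve = reserve         # headers held back for finalizers
        self.in_gc = False
        self.gc_runs = 0

    def allocate(self, collect):
        # collect(pool) is a hypothetical sweep: it may run finalizers
        # (which call allocate again) and returns the headers it reclaimed.
        spare = max(len(self.free) - self.reserve, 0)
        if spare == 0 and not self.in_gc:
            self.in_gc = True
            self.gc_runs += 1
            try:
                self.free.extend(collect(self))
            finally:
                self.in_gc = False
        if not self.free:
            raise OutOfHeaders("free list and reserve are both exhausted")
        return self.free.pop()&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The sketch leaves out what a real allocator would do when the sweep reclaims nothing (ask the OS for more memory), and it shows the weak spot directly: a finalizer that allocates one header more than the reserve holds still fails.&lt;/p&gt;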

&lt;p&gt;Another option is that we could maintain multiple pools and only sweep one at a time. If one pool is being swept we could allocate PMCs from the next pool (possibly triggering a GC sweep in that second pool and needing to recurse into a third pool, etc). Maybe we allocate headers directly from malloc while we&amp;#8217;re in a finalizer, keep them in a list and free them immediately after the finalizer exits. We have some options here, but this is still a very real problem that requires very careful consideration. Something like a semi-space GC algorithm might help here, because we could allocate from the &amp;#8220;dead space&amp;#8221; before that space was freed.&lt;/p&gt;

&lt;p&gt;Or we could try to immediately free some PMCs during the first sweep pass, and use those headers as the free list from which to allocate during destructors. This raises some problems because it would be very difficult to identify PMCs which could be freed during the first pass without invalidating any references which are going to be accessed during the destructors. Also, we run into the (rare) occurrence where all the PMCs swept during a particular GC run have destructors, and there are no &amp;#8220;unused&amp;#8221; headers to immediately free and recycle for destructors.&lt;/p&gt;

&lt;p&gt;Again, I don&amp;#8217;t have an answer here, just a long list of terrible options that need to be sorted according to the &amp;#8220;lesser of all evils&amp;#8221; algorithm.&lt;/p&gt;


&lt;p&gt;Let&amp;#8217;s look at destructors from another angle. Obviously a garbage-collected system is supposed to free the programmer up from having to manually manage memory (at least) and possibly other resources as well. You make a mess and don&amp;#8217;t want to clean it yourself, so the GC comes along after you and takes care of the things you don&amp;#8217;t want to do yourself. On one hand the argument can be made that if you really care about a resource being cleaned in a responsible, timely manner, you should call an explicit finalizer yourself, and that leaving those kinds of tasks to the automatic destructor is akin to saying &amp;#8220;I don&amp;#8217;t care about that object and whatever happens, happens.&amp;#8221; After all, if you can&amp;#8217;t throw an exception from a destructor and if the destructor is called outside normal program flow with no opportunity to report back even the simplest of success/failure conditions, it really doesn&amp;#8217;t matter from the standpoint of the programmer whether it succeeded or silently failed. Further, if the resource is sensitive, you don&amp;#8217;t clean it explicitly and Parrot later crashes and segfaults because some uninformed user created a zombie PMC reference, your destructor cannot and will not get called no matter what. If all sorts of things at multiple levels can go wrong and prevent your destructor from running, does it &lt;em&gt;really&lt;/em&gt; matter if the destructor gets called at all?&lt;/p&gt;

&lt;p&gt;Another viewpoint is that destructors don&amp;#8217;t need to be black-boxes, and we don&amp;#8217;t care if they have problems so long as they&amp;#8217;ve given a best effort to free the resources, those efforts have a decent expected chance of success, and they have an opportunity to log problems in case somebody has a few moments to spare reading through log files. After all, if a FileHandle fails to close in an automatically-invoked destructor, it also would have failed to close in a manually-invoked one and what are you going to do about it? If the thing won&amp;#8217;t close, it won&amp;#8217;t close. You can either log the failure and keep going with your program (like our destructor would have done automatically) or you can raise hell and possibly terminate the program (like what &lt;em&gt;could&lt;/em&gt; happen if an exception is thrown from a destructor). In other words, when you&amp;#8217;re talking about failures related to basic resources at the OS level, there aren&amp;#8217;t many good options when you&amp;#8217;re writing programs at the Parrot level. If you&amp;#8217;re not so hot at OS administration, there might not be anything you can do no matter what.&lt;/p&gt;

&lt;p&gt;I suspect that what we are going to end up with is a system where we allocate a temporary managed pool of PMCs to be available, and allocate all PMCs during a destructor from that pool. After GC, we clear the emergency pool at once. This solution adds a certain amount of complexity to the GC and also does nothing to deal with the zombie references problem I&amp;#8217;ve mentioned several times. We&amp;#8217;d have to make a stipulation that PMCs allocated during a destructor &lt;em&gt;may not&lt;/em&gt; themselves have automatic destructors.&lt;/p&gt;

&lt;p&gt;Things start to get a little bit complicated no matter what path we choose. This is the kind of issue where we&amp;#8217;re going to need lots more input, especially from our users.&lt;/p&gt;

&lt;p&gt;In Parrot we really want to enable PMC destruction and GC finalization. As things stand now you can run &lt;code&gt;destroy&lt;/code&gt; vtables written in C, usually without issue. However when we expose this functionality to the user we are talking about executing PBC, in a nested runloop (at least one!), with fresh allocations and all the capabilities of PBC at your disposal. As soon as you open that can of worms, the many problems and problematic possibilities become manifest. The security concerns become real. The performance implications become real. I&amp;#8217;m not saying that these are problems we can&amp;#8217;t solve, I&amp;#8217;m only pointing out that they haven&amp;#8217;t been solved already because they are hard problems with real trade-offs and some tough (and unpopular) decisions to be made.&lt;/p&gt;

&lt;img src="http://feeds.feedburner.com/~r/afwknight/~4/3rcWz4_4zGw" height="1" width="1"/&gt;</content>
    <feedburner:origLink>http://whiteknight.github.com/2012/05/23/destructors_are_hard.html</feedburner:origLink></entry>
    
    <entry>
        <title>Pending Branchwork</title>
        <link href="http://feedproxy.google.com/~r/afwknight/~3/F2eanBpmZIM/pending_branchwork.html" />
        <updated>2012-05-20T00:00:00-07:00</updated>
        <id>http://whiteknight.github.com/2012/05/20/pending_branchwork</id>
        <content type="html">&lt;p&gt;As I promised in my last post, I have several branches up in the air that need to be worked on. Some branches merged last week after the release. Others are pending to merge soon and some are still in development. In this post I&amp;#8217;m going to give a short summary of these things, since I haven&amp;#8217;t been posting regular updates like normal.&lt;/p&gt;

&lt;h3 id='already_merged'&gt;Already Merged&lt;/h3&gt;

&lt;p&gt;After the release last week I merged three small branches that brought small changes and appeared to test cleanly with NQP and Rakudo. In short, these were uncontroversial.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;whiteknight/gh_675&lt;/code&gt; Named after the &lt;a href='https://github.com/parrot/parrot/issues/675'&gt;GitHub issue of the same name&lt;/a&gt;, this branch removed the &lt;code&gt;can&lt;/code&gt; vtable. In all cases in core and in external projects where I looked, the &lt;code&gt;can&lt;/code&gt; vtable was simply a redirect to the &lt;code&gt;find_method&lt;/code&gt; vtable and a check for null. There&amp;#8217;s no need for this added indirection: we can call the &lt;code&gt;find_method&lt;/code&gt; vtable directly from the &lt;code&gt;can&lt;/code&gt; opcode.&lt;/li&gt;

&lt;li&gt;&lt;code&gt;whiteknight/imcc_file_line&lt;/code&gt; This branch removed some very old, long-deprecated IMCC directives. The &lt;code&gt;.line&lt;/code&gt; and &lt;code&gt;.file&lt;/code&gt; directives were not poorly implemented (as far as IMCC goes) but they weren&amp;#8217;t used and weren&amp;#8217;t introspectable. The &lt;code&gt;setline&lt;/code&gt; and &lt;code&gt;setfile&lt;/code&gt; directives (yes, they are directives even though they looked like opcodes!) weren&amp;#8217;t used anywhere and weren&amp;#8217;t implemented well. I&amp;#8217;ve removed all four. Now, we can use the &lt;code&gt;.annotate&lt;/code&gt; directive to replace all of these and add other metadata besides in a way that is easy to introspect from within running bytecode.&lt;/li&gt;

&lt;li&gt;&lt;code&gt;whiteknight/remove_cmd_ops&lt;/code&gt; removed a few command-line arguments from the parrot executable which were non-functional. These arguments have been disconnected since the time of the IMCC API cleanups months ago, and nobody had even noticed. Now they&amp;#8217;re gone.&lt;/li&gt;
&lt;/ul&gt;
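
&lt;p&gt;For the &lt;code&gt;can&lt;/code&gt; change, the whole idea fits in a few lines. This is a Python stand-in, not the actual opcode source: &lt;code&gt;ToyPMC&lt;/code&gt; is a made-up class and &lt;code&gt;None&lt;/code&gt; plays the role of a C &lt;code&gt;NULL&lt;/code&gt;.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;class ToyPMC:
    def __init__(self, methods):
        self.methods = dict(methods)

    def find_method(self, name):
        # Stand-in for the find_method vtable; None where C returns NULL.
        return self.methods.get(name)


def can(pmc, name):
    # The opcode asks find_method directly and checks for null, so no
    # separate can vtable is needed.
    return pmc.find_method(name) is not None&lt;/code&gt;&lt;/pre&gt;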

&lt;p&gt;Those things out of the way, here&amp;#8217;s a list of some of the branches that are currently unmerged but may be merging soon.&lt;/p&gt;

&lt;h3 id='id1'&gt;&lt;code&gt;eval_pmc&lt;/code&gt;&lt;/h3&gt;

&lt;p&gt;This is one of the most disruptive branches I&amp;#8217;ve got going, which is why I&amp;#8217;m in no hurry to merge it. Before I can merge it I need to patch both NQP and Rakudo. I submitted patches for these but they weren&amp;#8217;t ready to apply and I have to go back and re-do them.&lt;/p&gt;

&lt;p&gt;This branch removes the deprecated &lt;code&gt;Eval&lt;/code&gt; PMC. The &lt;code&gt;IMCCompiler&lt;/code&gt; PMC has already been updated to use a PDD31-compliant interface, which returns a &lt;code&gt;PackfileView&lt;/code&gt; PMC instead of an &lt;code&gt;Eval&lt;/code&gt;. NQP and Rakudo need to be updated to use this new interface instead of the older &lt;code&gt;VTABLE_invoke&lt;/code&gt; one. This update will work in the Parrot master branch just fine, so we can make those updates to NQP and Rakudo and test them thoroughly before we merge the &lt;code&gt;eval_pmc&lt;/code&gt; branch in.&lt;/p&gt;

&lt;h3 id='id2'&gt;&lt;code&gt;remove_sub_flags&lt;/code&gt;&lt;/h3&gt;

&lt;p&gt;This is a much bigger and much more disruptive branch. However, because NQP and Rakudo don&amp;#8217;t really use subroutine flags for their control flow, those two projects won&amp;#8217;t really be affected as much as everybody else will be.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;remove_sub_flags&lt;/code&gt; branch removes the &lt;code&gt;:load&lt;/code&gt; and &lt;code&gt;:init&lt;/code&gt; flags from the PIR syntax and replaces them with &lt;code&gt;:tag&lt;/code&gt;. The only real way to work with &lt;code&gt;:tag&lt;/code&gt; is through the &lt;code&gt;PackfileView&lt;/code&gt; PMC, so we need to merge the &lt;code&gt;eval_pmc&lt;/code&gt; branch into Parrot first before we can make any further progress on this one. This is a back-burner task and will probably not be touched before the end of the summer.&lt;/p&gt;

&lt;h3 id='id3'&gt;&lt;code&gt;whiteknight/gc_finalize&lt;/code&gt;&lt;/h3&gt;

&lt;p&gt;We&amp;#8217;ve received some requests from Rakudo folks that we need to start getting serious about GC finalization. This involves two changes: First is setting the GC to perform a finalization sweep at interp exit, which it currently is not doing. The second is to fix some sweep-related behaviors so the &lt;code&gt;destroy&lt;/code&gt; VTABLE can be much more sane and useful.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;whiteknight/gc_finalize&lt;/code&gt; branch does both of these things. First, it re-enables GC finalization which had been turned off for so long that the code for it no longer works in master. Second, it moves to a two-stage sweep algorithm, so that we execute all &lt;code&gt;destroy&lt;/code&gt; vtables first before we start freeing any resources.&lt;/p&gt;
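
&lt;p&gt;To see why the ordering matters, here is a toy comparison in Python. It is not the C code from the branch; the dictionaries and function names are invented for the example. Each dead object has a destructor that looks at a peer object, and the two sweep functions differ only in when memory is released.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def make(name, peer=None):
    # Build a toy dead object whose destructor inspects its peer.
    def destroy(obj, log):
        if obj["peer"] is not None:
            state = "freed" if obj["peer"]["freed"] else "intact"
            log.append(obj["name"] + " saw " + obj["peer"]["name"] + " " + state)
    return {"name": name, "peer": peer, "freed": False, "destroy": destroy}


def sweep_single_pass(dead, log):
    # Destroy and free each object in turn: a later destructor may
    # touch a peer that has already been released.
    for obj in dead:
        obj["destroy"](obj, log)
        obj["freed"] = True


def sweep_two_stage(dead, log):
    for obj in dead:   # stage 1: run every destructor
        obj["destroy"](obj, log)
    for obj in dead:   # stage 2: only now release anything
        obj["freed"] = True&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With a dead list of a socket followed by a DB connection that refers to it, the single-pass version lets the connection&amp;#8217;s destructor see an already-freed socket, while the two-stage version does not.&lt;/p&gt;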

&lt;p&gt;There are still going to be problems with &lt;code&gt;destroy&lt;/code&gt; vtables however, and I&amp;#8217;m searching for solutions to these. Let me illustrate with a short example. We call GC to sweep typically in response to a request for a new PMC when we have none on the free list. If we have an item on the freelist, we return that immediately and very quickly. If not, we invoke GC to try and free up some headers (or allocate new ones from the OS).&lt;/p&gt;

&lt;p&gt;Let&amp;#8217;s say we&amp;#8217;re programming in Rakudo Perl6 and we have an object with a destructor. For the purposes of our example, it&amp;#8217;s a DB connection object. That destructor needs to call a method on a Socket object connecting the client program to the server. As everybody should be aware by now, calling a method in Parrot itself is going to allocate a CallContext PMC.&lt;/p&gt;

&lt;p&gt;However, we run into a small problem because we&amp;#8217;re in GC &lt;em&gt;because&lt;/em&gt; we&amp;#8217;re out of PMCs to allocate. So if we try to allocate a new PMC at this point I don&amp;#8217;t know exactly what will happen but I can only imagine that the results would not be good. In the worst case, we recursively call into GC which goes back to sweeping, which re-executes finalizers, and we get into an infinite loop.&lt;/p&gt;

&lt;p&gt;I won&amp;#8217;t go into all the details here, I&amp;#8217;ve got another (long) post drafted that discusses these and some other issues related to finalization. This &lt;code&gt;whiteknight/gc_finalize&lt;/code&gt; branch solves some of the first few problems but there will be more to come after that.&lt;/p&gt;

&lt;h3 id='id4'&gt;&lt;code&gt;whiteknight/gh_663&lt;/code&gt;&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;singleton&lt;/code&gt; designator for C-level PMCs has been deprecated for some time now, and the &lt;code&gt;whiteknight/gh_663&lt;/code&gt; branch intends to remove it.&lt;/p&gt;

&lt;p&gt;Here&amp;#8217;s how singletons work in Parrot: The &lt;code&gt;get_pointer&lt;/code&gt; and &lt;code&gt;set_pointer&lt;/code&gt; vtables are used to manage a single reference to an existing singleton PMC if any. To get the PMC, we invoke the &lt;code&gt;get_pointer&lt;/code&gt; vtable &lt;em&gt;without an invocant PMC&lt;/em&gt; (the only such occurrence of a vtable invoked without an existing PMC reference in the whole codebase that I am aware of). If it returns NULL, a new header is created. If the new header is created, the &lt;code&gt;set_pointer&lt;/code&gt; vtable is called on the new object with itself as an argument.&lt;/p&gt;

&lt;p&gt;This all happens inside &lt;code&gt;Parrot_pmc_new&lt;/code&gt; and is mostly transparent, except for the few bits of code throughout the system which violate this (rather flimsy) encapsulation boundary.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;get_pointer&lt;/code&gt; and &lt;code&gt;set_pointer&lt;/code&gt; vtables operate on &lt;code&gt;void*&lt;/code&gt; pointers, so we even lose type safety. Plus, we don&amp;#8217;t expose &lt;code&gt;get_pointer&lt;/code&gt; or &lt;code&gt;set_pointer&lt;/code&gt; vtables to PIR code, so there&amp;#8217;s absolutely no way to create a singleton class at the user-level using this mechanism. You can do what users of all other languages do and create an accessor and restricted constructor and implement singletons that way. In fact, I think that&amp;#8217;s better.&lt;/p&gt;

&lt;p&gt;The majority of offending code has been ripped out of this branch, though I&amp;#8217;m still seeing some segfaults during the build as a result of bad, unchecked pointer accesses in places where encapsulation has been violated. I&amp;#8217;ve got to spend a little bit more time tracking down some of these failures. Then, assuming NQP and Rakudo aren&amp;#8217;t relying on this mechanism, the merge should be relatively painless.&lt;/p&gt;
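
&lt;p&gt;The accessor-plus-restricted-constructor approach looks roughly like this. The sketch is in Python only because it is short there; the &lt;code&gt;Registry&lt;/code&gt; class and its guard token are hypothetical, and a PIR or HLL version would follow the same shape.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;class Registry:
    _instance = None
    _guard = object()   # private token that only the accessor passes in

    def __init__(self, guard):
        # Restricted constructor: refuse callers without the token.
        if guard is not Registry._guard:
            raise TypeError("use Registry.instance() instead")
        self.entries = {}

    @classmethod
    def instance(cls):
        # Accessor: create the single object on first use, then reuse it.
        if cls._instance is None:
            cls._instance = cls(cls._guard)
        return cls._instance&lt;/code&gt;&lt;/pre&gt;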

&lt;h3 id='id5'&gt;&lt;code&gt;whiteknight/gh_610&lt;/code&gt;&lt;/h3&gt;

&lt;p&gt;A while ago, moritz suggested that we improve integration of our ByteBuffer PMC type, especially with our FileHandle and Socket types. We should be able to read a sequence of raw bytes from either of those PMCs into a ByteBuffer and we should be able to write raw bytes from a ByteBuffer into either of those destinations too.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;whiteknight/gh_610&lt;/code&gt; branch aims to make this a reality. Already I&amp;#8217;ve done most of the code work to get this in place, though I haven&amp;#8217;t added all the necessary tests and documentation. Plus, a few coding standards tests are failing too.&lt;/p&gt;

&lt;p&gt;While looking at this code, I am reminded that the IO subsystem is kind of messy. I&amp;#8217;ve tried to clean it up in the past, and made a few small improvements over time. However, without a larger guiding vision to follow, I never really had a great idea of what kind of larger architectural changes to make to really bring this subsystem up out of the mud. After working on this branch, I finally had something like a flash of insight, and think I have a good idea about how to clean things up. This leads me to&amp;#8230;&lt;/p&gt;

&lt;h3 id='id6'&gt;&lt;code&gt;whiteknight/io_cleanup1&lt;/code&gt;&lt;/h3&gt;

&lt;p&gt;My idea is a relatively simple one: All our IO operations are controlled by the various PMC types (FileHandle, Socket, StringHandle, etc), but all our IO API functions are currently implemented as ugly (and brittle) switch statements to pick between execution pathways for these different types. A far better idea would be to separate out the different logic behind a virtual function dispatch table (vtable).&lt;/p&gt;

&lt;p&gt;I&amp;#8217;ve written up some proposed changes in the &lt;code&gt;whiteknight/io_cleanup1&lt;/code&gt; branch, and will start work if other people think it&amp;#8217;s a decent idea.&lt;/p&gt;

&lt;p&gt;The key points are as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Move all FileHandle-specific logic into src/io/filehandle.c. Do the same for Pipe, Socket and StringHandle types.&lt;/li&gt;

&lt;li&gt;Implement a new &lt;code&gt;io_vtable&lt;/code&gt; type, which will contain a dispatch table for common operations. Each one of the files created in #1 above will implement the routines for one &lt;code&gt;io_vtable&lt;/code&gt; and supporting logic.&lt;/li&gt;

&lt;li&gt;Buffering will be refactored. Instead of the FileHandle PMC containing several attributes for buffering, we&amp;#8217;ll instead use an &lt;code&gt;io_buffer&lt;/code&gt; object to hold buffering details. An encapsulated buffering API will take this buffer structure and the relevant vtable and automatically perform buffering if necessary.&lt;/li&gt;

&lt;li&gt;I am going to start separating out Pipe logic from FileHandle, though I&amp;#8217;m not planning to create a separate type for it quite yet.&lt;/li&gt;
&lt;/ol&gt;
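
&lt;p&gt;The dispatch-table part of the plan can be sketched as follows. This is a Python illustration, not the proposed C: &lt;code&gt;IOVtable&lt;/code&gt; stands in for the planned &lt;code&gt;io_vtable&lt;/code&gt;, handles are plain dictionaries, and all of the function names are assumptions made for the example. Buffering is left out entirely.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;class IOVtable:
    # One table of operations per handle type.
    def __init__(self, name, read, write):
        self.name = name
        self.read = read
        self.write = write


def stringhandle_read(handle, count):
    start = handle["pos"]
    data = handle["data"][start:start + count]
    handle["pos"] += len(data)
    return data


def stringhandle_write(handle, text):
    handle["data"] += text
    return len(text)


def filehandle_read(handle, count):
    return handle["file"].read(count)


def filehandle_write(handle, text):
    return handle["file"].write(text)


STRINGHANDLE_VTABLE = IOVtable("StringHandle", stringhandle_read, stringhandle_write)
FILEHANDLE_VTABLE = IOVtable("FileHandle", filehandle_read, filehandle_write)


def io_read(handle, count):
    # The API function no longer switches on the handle type; it just
    # follows the table the handle carries.
    return handle["vtable"].read(handle, count)


def io_write(handle, text):
    return handle["vtable"].write(handle, text)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Adding a new handle type then means writing one more pair of functions and one more table, without touching &lt;code&gt;io_read&lt;/code&gt; or &lt;code&gt;io_write&lt;/code&gt;.&lt;/p&gt;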

&lt;p&gt;Once these things are done, I think the IO system will be much cleaner and much more hackable. This is lower priority right now until some of my ideas are vetted, but I&amp;#8217;m glad I finally have a plan in mind after so many years of staring helplessly at this code.&lt;/p&gt;

&lt;h3 id='id7'&gt;&lt;code&gt;whiteknight/sprintf_cleanup&lt;/code&gt;&lt;/h3&gt;

&lt;p&gt;The engine for our &lt;code&gt;sprintf&lt;/code&gt; implementation is sort of old and messy. It&amp;#8217;s some very functional and very stable code, but it needs to be brought up to date with our modern coding and organizational standards.&lt;/p&gt;

&lt;p&gt;In the &lt;code&gt;whiteknight/sprintf_cleanup&lt;/code&gt; branch I make several changes, most of which are entirely internal and should not affect users at all:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;I move the files from &lt;code&gt;src/misc.c&lt;/code&gt; and &lt;code&gt;src/spf_*.c&lt;/code&gt; to &lt;code&gt;src/string/sprintf.c&lt;/code&gt; and &lt;code&gt;src/string/spf_*.c&lt;/code&gt; respectively.&lt;/li&gt;

&lt;li&gt;I&amp;#8217;ve cleaned up some header-file nonsense and created a new &lt;code&gt;src/string/spf_private.h&lt;/code&gt; header file to hold private data.&lt;/li&gt;

&lt;li&gt;I&amp;#8217;ve changed the code to use a StringBuilder instead of the older (and now-incorrect) repeated string concatenations. With immutable strings, each concat operation creates a new STRING instead of appending to the pre-allocated buffer, which is extremely wasteful. I haven&amp;#8217;t benchmarked this change, but I suspect it has higher performance on longer, more complicated formats.&lt;/li&gt;

&lt;li&gt;I&amp;#8217;ve fixed a sub-optimal error message at the request of benabik in &lt;a href='https://github.com/parrot/parrot/issues/759'&gt;ticket #759&lt;/a&gt;.&lt;/li&gt;
&lt;/ol&gt;
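
&lt;p&gt;The StringBuilder point from the list above is easy to show in any language with immutable strings. Here it is in Python; &lt;code&gt;ToyStringBuilder&lt;/code&gt; is a made-up class that only mimics the idea and is not Parrot&amp;#8217;s StringBuilder PMC. I have not benchmarked either version.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def format_by_concat(pieces):
    out = ""
    for piece in pieces:
        out = out + piece   # conceptually a brand-new string every time
    return out


class ToyStringBuilder:
    def __init__(self):
        self.chunks = []

    def append(self, piece):
        self.chunks.append(piece)   # remember the piece, copy nothing yet

    def to_string(self):
        return "".join(self.chunks)  # one final copy into the result


def format_with_builder(pieces):
    sb = ToyStringBuilder()
    for piece in pieces:
        sb.append(piece)
    return sb.to_string()&lt;/code&gt;&lt;/pre&gt;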

&lt;p&gt;This branch is almost complete and I&amp;#8217;ll probably merge it this weekend. Besides the text of the exception message, there are no visible user changes so it shouldn&amp;#8217;t be controversial at all.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/afwknight/~4/F2eanBpmZIM" height="1" width="1"/&gt;</content>
    <feedburner:origLink>http://whiteknight.github.com/2012/05/20/pending_branchwork.html</feedburner:origLink></entry>
    
    <entry>
        <title>Parrot 4.4.0 Banana Fanna Fo Ferret</title>
        <link href="http://feedproxy.google.com/~r/afwknight/~3/QKBRFa-Ke4Q/parrot_4_4_0.html" />
        <updated>2012-05-17T00:00:00-07:00</updated>
        <id>http://whiteknight.github.com/2012/05/17/parrot_4_4_0</id>
        <content type="html">&lt;blockquote&gt;
&lt;p&gt;Its existence guarantees nothing in itself, and the catalytic or Promethean moment only occurs when one individual is prepared to cease being the passive listener to such a voice and to become instead its spokesman, or representative.&lt;/p&gt;

&lt;p&gt;But it&amp;#8217;s important to remember the many dreary years when the prospect of victory appeared quite unattainable. On every day of those years, the &amp;#8220;as if&amp;#8221; pose had to be kept up, until its cumulative effect could be felt.&lt;/p&gt;

&lt;p&gt;&amp;#8211; Christopher Hitchens, &lt;i&gt;Letters to a Young Contrarian&lt;/i&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;On behalf of the Parrot team, I&amp;#8217;m proud to announce the 4.4.0 release of Parrot &amp;#8220;Banana Fanna Fo Ferret&amp;#8221;. &lt;a href='http://parrot.org/'&gt;Parrot&lt;/a&gt; is a virtual machine aimed at running all dynamic languages.&lt;/p&gt;

&lt;p&gt;Parrot 4.4.0 is available on &lt;a href='ftp://ftp.parrot.org/pub/parrot/releases/stable/4.4.0/'&gt;Parrot&amp;#8217;s FTP site&lt;/a&gt;, or by &lt;a href='http://parrot.org/download'&gt;following the download instructions&lt;/a&gt;. For those who want to hack on Parrot or languages that run on top of Parrot, we recommend &lt;a href='https://github.com/parrot'&gt;our organization page&lt;/a&gt; on GitHub, or you can go directly to the official Parrot Git repo on &lt;a href='https://github.com/parrot/parrot'&gt;Github&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Parrot 4.4.0 News:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;- Core
    + Most internal calls to libc exit(x) have been replaced with
      Parrot_x_* API calls or PARROT_FORCE_EXIT
- Documentation
    + &amp;#39;pdd31_hll.pod&amp;#39; made stable in &amp;#39;docs/pdds/&amp;#39;.
    + Updated main &amp;#39;README&amp;#39; to &amp;#39;README.pod&amp;#39;
    + Updated various dependencies, e.g., &amp;#39;lib/Parrot/Distribution.pm&amp;#39;.
    + Updated all &amp;#39;README&amp;#39; files to &amp;#39;README.pod&amp;#39; files.
    + Added &amp;#39;README.pod&amp;#39; files to top-level directories.
- Tests
    + Update various tests to pull from new &amp;#39;README.pod&amp;#39;
    + Updated &amp;#39;t/tools/install/02-install_files.t&amp;#39; to pull from new
      &amp;#39;README.pod&amp;#39;
- Community
- Platforms
- Tools
    + pbc_merge has been fixed to deduplicate constant strings and
      merge annotations segments&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Alvis Yardley (or a delegate) will release Parrot 4.5.0, the next scheduled monthly release, on June 16th 2012. Subsequent release managers are to be announced. A special thanks to our donors, contributors and volunteers for making this release possible.&lt;/p&gt;

&lt;p&gt;Enjoy!&lt;/p&gt;

&lt;p&gt;I haven&amp;#8217;t been doing enough blogging lately! On Tuesday I put out the 4.4.0 release of Parrot, &amp;#8220;Banana Fanna Fo Ferret&amp;#8221;. I figured it was a fun play on words. I added a little quote from a favorite writer of mine, Christopher Hitchens. Much of his writing can be pretty inflammatory, but I picked two quotes that related to historical struggles for social progress, and which when read in a certain light (and dramatically out of context) make sense for Parrot too.&lt;/p&gt;

&lt;p&gt;The release went off without a problem, and I&amp;#8217;ve got a few branches waiting in the environs to be merged. I&amp;#8217;m sure I&amp;#8217;ll talk about some of those projects if I can get back into a normal blogging rhythm again.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/afwknight/~4/QKBRFa-Ke4Q" height="1" width="1"/&gt;</content>
    <feedburner:origLink>http://whiteknight.github.com/2012/05/17/parrot_4_4_0.html</feedburner:origLink></entry>
    
    <entry>
        <title>XML Is Hard</title>
        <link href="http://feedproxy.google.com/~r/afwknight/~3/WKE7ujNe74c/xml_is_hard.html" />
        <updated>2012-04-28T00:00:00-07:00</updated>
        <id>http://whiteknight.github.com/2012/04/28/xml_is_hard</id>
        <content type="html">&lt;p&gt;Last week I promoted the Parse and Json libraries in Rosella to stable status. For both those libraries I wrapped up a few outstanding TODO issues, wrote up some &lt;a href='/Rosella/libraries/json.html'&gt;website&lt;/a&gt; &lt;a href='/Rosella/libraries/parse.html'&gt;documentation&lt;/a&gt; and added a bunch of unit tests. I figured I would do the same thing for the XML library too. After all I had done the hard part: the first 90% of the library was the recursive descent parser which I had most of.&lt;/p&gt;

&lt;p&gt;So today I got to work on that library, trying to put together the last few bits so I could make the library stable. Like I said, I had about 90% of it done already. I spent the time today doing another 90%. I figure I only have about 90% left to go before I have a &amp;#8220;real&amp;#8221;, usable XML library. Somewhere a mathematician is reading this post and inventing new curse words, but nobody can hear him, because he has no friends.&lt;/p&gt;

&lt;p&gt;It turns out that XML is hard.&lt;/p&gt;

&lt;p&gt;Anybody can put together a little parser for XML-like tag syntax with attributes, text, and nested tags. That part is dirt simple, and I had that done in an hour or two. It&amp;#8217;s once you start getting into DTD declarations and schema validation that things get messy. Honestly, I don&amp;#8217;t think I can seriously call Rosella&amp;#8217;s XML library &amp;#8220;complete&amp;#8221; without those things. Or, not without most of them. I can probably get away with only the first 90% or so.&lt;/p&gt;

&lt;p&gt;So, what can Rosella&amp;#8217;s Xml library do today? Here is a sample of XML text that I can parse into a document object tree without problems:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;?xml version=&amp;quot;1.0&amp;quot;?&amp;gt;
&amp;lt;!DOCTYPE foo [
    &amp;lt;!ELEMENT foo (bar, baz)&amp;gt;
    &amp;lt;!ELEMENT bar ANY&amp;gt;
    &amp;lt;!ELEMENT baz (fie)&amp;gt;
    &amp;lt;!ELEMENT fie EMPTY&amp;gt;
    &amp;lt;!ATTLIST fie
                lol CDATA #REQUIRED
                wat CDATA #IMPLIED
                sux CDATA #FIXED &amp;quot;hello!&amp;quot;&amp;gt;
]&amp;gt;
&amp;lt;foo&amp;gt;
    &amp;lt;bar/&amp;gt;
    &amp;lt;baz&amp;gt;
        &amp;lt;fie lol=&amp;quot;laughing out loud&amp;quot; wat=&amp;quot;you talkin bout?&amp;quot; sux=&amp;quot;hello!&amp;quot;/&amp;gt;
    &amp;lt;/baz&amp;gt;
&amp;lt;/foo&amp;gt;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Or, if I want, I can jam all that schema nonsense into a separate file, and load it separately:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;!DOCTYPE foo SYSTEM &amp;quot;foo.dtd&amp;quot;&amp;gt;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;I haven&amp;#8217;t integrated Rosella Net yet, though, so schemas can&amp;#8217;t be loaded from a URL. In code, I can do a few things:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;var dx = new Rosella.Xml.Document();
dx.read_from_file(&amp;quot;foo.xml&amp;quot;);
dx.validate();
if (!dx.is_valid()) {
    for (string err in dx.errors)
        say(err);
}
dx.write_to_file(&amp;quot;newfoo.xml&amp;quot;);

var dtd = new Rosella.Xml.DtdDocument();
dtd.read_from_file(&amp;quot;foo.dtd&amp;quot;);
var errors = dtd.validate_xml(dx);
if (elements(errors) &amp;gt; 0) {
    for (string err in errors)
        say(err);
}
dtd.write_to_file(&amp;quot;newfoo.dtd&amp;quot;);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;That example shows us loading an XML document from a file and validating it with its built-in rules from the &lt;code&gt;!DOCTYPE&lt;/code&gt; header. The second part shows us loading a separate DTD definition from a standalone file, and using that to validate the XML document too. In both cases, the validator runs through the document object and returns a whole list of error messages, not just a simple yes/no flag. In both cases, we can also re-serialize the XML and DTD documents back to string and then to file.&lt;/p&gt;

&lt;p&gt;So what is left to do? Well, for starters there&amp;#8217;s a bunch of syntax in the &lt;code&gt;!ELEMENT&lt;/code&gt; tag that I don&amp;#8217;t quite handle yet, such as quantifiers and alternations:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;!ELEMENT foo (bar*, (baz|bar), fie?)&amp;gt;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Parsing all that in a way that doesn&amp;#8217;t suck is not something I&amp;#8217;m looking forward to doing.&lt;/p&gt;
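
&lt;p&gt;For a sense of what is involved, here is a rough recursive-descent sketch in plain JavaScript (not Winxed, and not what Rosella does today). It assumes a simplified grammar: names, parenthesized groups, the &lt;code&gt;*&lt;/code&gt;, &lt;code&gt;+&lt;/code&gt; and &lt;code&gt;?&lt;/code&gt; quantifiers, and &lt;code&gt;,&lt;/code&gt; or &lt;code&gt;|&lt;/code&gt; separators. It ignores &lt;code&gt;#PCDATA&lt;/code&gt;, &lt;code&gt;EMPTY&lt;/code&gt; and &lt;code&gt;ANY&lt;/code&gt;, and &lt;code&gt;parseContentModel&lt;/code&gt; is a made-up name.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Illustrative sketch only: parse a DTD content model such as
// "(bar*, (baz|bar), fie?)" into a small tree. All names are hypothetical.
function parseContentModel(text) {
    var src = text.replace(/\s+/g, "");   // whitespace is not significant here
    var pos = 0;

    // Optional trailing quantifier; "" means "exactly once".
    function quantifier() {
        var c = src.charAt(pos);
        if (c === "*" || c === "+" || c === "?") {
            pos++;
            return c;
        }
        return "";
    }

    // One item: either a parenthesized group or an element name.
    function item() {
        var node;
        if (src.charAt(pos) === "(") {
            pos++;                        // consume "("
            node = group();
        } else {
            var m = /^[A-Za-z_][\w.\-]*/.exec(src.slice(pos));
            if (m === null)
                throw new Error("expected a name at offset " + pos);
            pos += m[0].length;
            node = { type: "name", name: m[0] };
        }
        node.quant = quantifier();
        return node;
    }

    // Items separated by "," (sequence) or "|" (choice), never both.
    function group() {
        var items = [item()];
        var sep = "";
        while (src.charAt(pos) === "," || src.charAt(pos) === "|") {
            var c = src.charAt(pos);
            if (sep === "")
                sep = c;
            else if (sep !== c)
                throw new Error("cannot mix ',' and '|' in one group");
            pos++;
            items.push(item());
        }
        if (src.charAt(pos) !== ")")
            throw new Error("expected ')' at offset " + pos);
        pos++;
        return { type: sep === "|" ? "choice" : "seq", items: items };
    }

    var root = item();
    if (pos !== src.length)
        throw new Error("unexpected trailing text at offset " + pos);
    return root;
}

var model = parseContentModel("(bar*, (baz|bar), fie?)");
console.log(JSON.stringify(model, null, 2));&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Getting a tree out is the easy half. A validator still has to match an element&amp;#8217;s actual children against that tree, which is where the quantifiers start to hurt.&lt;/p&gt;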

&lt;p&gt;Then in attribute lists, there&amp;#8217;s some syntax I don&amp;#8217;t deal with, such as enumerated values again:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;!ATTLIST foo bar (yes|no) #REQUIRED&amp;gt;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The validator I&amp;#8217;ve implemented is pretty naive so far, and isn&amp;#8217;t set up to do quantifiers anyway. That&amp;#8217;s all going to take a while to do. We&amp;#8217;re doing some basic validation now, but nowhere near as much as we would expect from a full implementation.&lt;/p&gt;

&lt;p&gt;And keep in mind, even when I&amp;#8217;m done implementing (mostly) proper XML and DTD parsing, I could still go on to parse other schema languages like XSD which some applications might expect and even prefer. Maybe I could do something like XPath too, which would be very nice. I probably won&amp;#8217;t try to do XSLT though: I&amp;#8217;m still young and I would like to keep some of my sanity in reserve for my twilight years.&lt;/p&gt;

&lt;p&gt;My Json library is about 1300 lines of Winxed code, including whitespace. My Xml library is about 2400 lines of code and still growing. Json is pretty easy (by design!), but XML is very hard. I&amp;#8217;m not going to push the Xml library to become stable any time soon: there&amp;#8217;s a hell of a lot of work left on it and I&amp;#8217;m not going to rush anything.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/afwknight/~4/WKE7ujNe74c" height="1" width="1"/&gt;</content>
    <feedburner:origLink>http://whiteknight.github.com/2012/04/28/xml_is_hard.html</feedburner:origLink></entry>
    
    <entry>
        <title>Various Updates</title>
        <link href="http://feedproxy.google.com/~r/afwknight/~3/UvfO829A1Lc/various_updates.html" />
        <updated>2012-04-25T00:00:00-07:00</updated>
        <id>http://whiteknight.github.com/2012/04/25/various_updates</id>
        <content type="html">&lt;p&gt;Here are some updates on various projects I&amp;#8217;ve been working on or been planning to work on:&lt;/p&gt;

&lt;h2 id='parrotstore'&gt;ParrotStore&lt;/h2&gt;

&lt;p&gt;In my post introducing ParrotStore, I mentioned that I only had support for MySQL, Memcached, and a little bit of stuff working for MongoDB. In the past few days I&amp;#8217;ve also added SQLite3 support. Now you can do this, after installing the prerequisites, building and installing:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;var sqlite3_lib = loadlib(&amp;#39;sqlite3_group&amp;#39;);
var sqlite3 = new &amp;#39;SQLite3DbContext&amp;#39;;
sqlite3.open(&amp;quot;test.sqlite3&amp;quot;);
sqlite3.query(&amp;quot;INSERT INTO tbl1 (name, number) VALUES (&amp;#39;Andrew&amp;#39;, 100)&amp;quot;);
var result = sqlite3.query(&amp;quot;SELECT * FROM tbl1&amp;quot;);
for (var row in result) {
    for (string colname in row)
        print(colname + &amp;quot;=&amp;quot; + string(row[colname]) + &amp;quot; &amp;quot;);
    say(&amp;quot;&amp;quot;);
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;SQLite3 offers a bunch of features that I don&amp;#8217;t tap into yet, but we have a good start and can do some basic work with it already.&lt;/p&gt;

&lt;p&gt;Also, I mentioned that we didn&amp;#8217;t support queries with multiple result sets in the MySQL bindings. Well, now we do (and we do in SQLite3 too):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;var result1 = mysql.query(&amp;quot;CALL my_stored_proc&amp;quot;);
var result2 = sqlite3.query(&amp;quot;SELECT * FROM tbl1 ; SELECT * FROM tbl2&amp;quot;);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If the query returns one result set, a DataTable object is returned. If it has multiple result sets, an array of DataTables is returned instead.&lt;/p&gt;

&lt;h2 id='eval_pmc'&gt;Eval PMC&lt;/h2&gt;

&lt;p&gt;I went digging through my backlog of old branches last night and found my incomplete branch for removing the deprecated Eval PMC. After updating to current master I gave it a spin and most things looked good. I fixed all the core parrot tests and then moved on to the rest of the ecosystem.&lt;/p&gt;

&lt;p&gt;Winxed works fine with the PackfileView PMC instead of the Eval PMC. I made a few of those updates in the past, so it mostly worked out of the gate. Rosella compiled and ran like a charm too.&lt;/p&gt;

&lt;p&gt;NQP-rx works fine because it mostly relies on the PCT libraries that ship with Parrot, and which I had already fixed.&lt;/p&gt;

&lt;p&gt;The new NQP is a little bit more of a hassle. It took me a little bit of effort to figure out the bootstrapping mechanism, but after a few hours of hacking I had NQP building on the new Parrot using PackfileView instead of Eval. However, one of the regex tests hangs indefinitely now and I&amp;#8217;m having trouble tracking that down. This project may get bumped down to a lower priority level until I can either figure out what the problem with NQP is, or until I can enlist some help to fix it.&lt;/p&gt;

&lt;p&gt;I would like to merge this branch as soon as NQP is fixed and I can prove that I can build it and Rakudo on the branch.&lt;/p&gt;

&lt;h2 id='sub_flags_cleanup'&gt;Sub Flags Cleanup&lt;/h2&gt;

&lt;p&gt;My &lt;code&gt;remove_sub_flags&lt;/code&gt; branch, tasked with removing the old &lt;code&gt;:load&lt;/code&gt; and &lt;code&gt;:init&lt;/code&gt; flags from Parrot and replacing them with the new &lt;code&gt;:tag()&lt;/code&gt; syntax, is right where I left it a few weeks ago. I&amp;#8217;m down to a relatively small list of test failures, the solution to most of which is to update the syntax in the tests themselves. A handful of tests, such as those using the &lt;code&gt;parrot-nqp&lt;/code&gt; and &lt;code&gt;winxed&lt;/code&gt; compilers, are failing because I need to update those compilers first to generate the correct code so the tests can run correctly.&lt;/p&gt;

&lt;p&gt;After fixing NQP-rx and Winxed, I need to get started testing out the new NQP and Rakudo. I suspect both of those can be made to work without too much effort.&lt;/p&gt;

&lt;p&gt;It turns out that the Eval PMC deprecation work overlaps with this slightly, so the things I change for that branch should help reduce failures in this branch too. After I get Eval deprecated and removed, I&amp;#8217;ll come back to this branch and see where things stand.&lt;/p&gt;

&lt;p&gt;This is such a large and disruptive change that I can&amp;#8217;t imagine we would want a merge before the 4.4 release, even if I got all the bugs ironed out. We could be a month or more away from a merge, so I&amp;#8217;m not listing this work as high priority.&lt;/p&gt;

&lt;h2 id='pcc'&gt;PCC&lt;/h2&gt;

&lt;p&gt;Bacek has been doing a lot of refactoring in PCC land, trying to fix some slow and infelicitous aspects of it. I&amp;#8217;ve gotten a set of new PCC-related opcodes added to core and have a few more that I want to add, including new variants of &lt;code&gt;set_args&lt;/code&gt;, &lt;code&gt;get_params&lt;/code&gt; and friends to take explicit context arguments instead of using magical behavior to try and find them automatically. A few patches to IMCC and the new behavior might go in without anybody noticing. I&amp;#8217;ve talked more about this in past posts, and I&amp;#8217;m sure I&amp;#8217;ll have more to say when I start making changes.&lt;/p&gt;

&lt;h2 id='rosella'&gt;Rosella&lt;/h2&gt;

&lt;p&gt;Rosella is mostly where I want it to be right now. I&amp;#8217;m planning to change around the development cycle to stick to supported releases of Parrot and Winxed instead of tracking HEAD for both of them. I&amp;#8217;m going to promote one or two more libraries to &amp;#8220;stable&amp;#8221; status and then put out a release sometime after Parrot 4.4 hits the news stands next month. I&amp;#8217;ve already promoted the Parse and Json libraries to stable status. I will probably promote Xml and Net too, since I am pretty happy with both of those libraries and feel that they are almost ready for general use.&lt;/p&gt;

&lt;p&gt;After that, I suspect Rosella is going to take a back seat for a while, so I can focus on some other projects.&lt;/p&gt;

&lt;h2 id='google_summer_of_code'&gt;Google Summer of Code&lt;/h2&gt;

&lt;p&gt;GSOC is keeping me pretty busy so far. We accepted 4 projects this summer. The fifth project, which was to do some work on the Jaesop Stage 1 compiler, was lost because the student was accepted to a different organization instead. The four remaining projects are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Security Sandbox&lt;/strong&gt; by Justin&lt;/li&gt;

&lt;li&gt;&lt;strong&gt;Mod_Parrot 2.0&lt;/strong&gt; by brrt&lt;/li&gt;

&lt;li&gt;&lt;strong&gt;LAPACK Bindings&lt;/strong&gt; by jashwanth&lt;/li&gt;

&lt;li&gt;&lt;strong&gt;PACT Assembly&lt;/strong&gt; by benabik&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I think these projects will be very cool, and I am looking forward to seeing what kinds of great code they can produce this summer.&lt;/p&gt;

&lt;h2 id='green_threads'&gt;Green Threads&lt;/h2&gt;

&lt;p&gt;nine has been doing some amazing work on his threading branch. Yesterday he informed me that he had a solution to make green threads work on Windows, and had already implemented part of it. That&amp;#8217;s awesome, because I was planning to work on porting the green threads to Windows next, but if he&amp;#8217;s doing it then I don&amp;#8217;t have to.&lt;/p&gt;

&lt;p&gt;Some of the performance numbers he&amp;#8217;s been getting are pretty impressive for certain tasks. Some benchmarks he has are even showing a significant threading performance improvement over a similar benchmark written in Perl 5.&lt;/p&gt;

&lt;p&gt;I&amp;#8217;ve been doing some testing on his branch and things are looking mostly good except for one or two remaining GC-related bugs that need to be ironed out. After that, if we can get some consensus, I would love to start talking about a merge shortly after 4.4.&lt;/p&gt;

&lt;h2 id='6model'&gt;6Model&lt;/h2&gt;

&lt;p&gt;With Green Threads possibly off my TO-DO list, Eval PMC Deprecation mostly wrapped up and remove_sub_flags on the back burner, I can start moving towards my next project: 6model. And I can do it much earlier than I was expecting. I&amp;#8217;m going to mine benabik&amp;#8217;s rejected 6model project proposal for some ideas, then I&amp;#8217;m going to jump in and try to get things working. I suspect things could get moving pretty quickly, if I can keep my level of free time relatively high.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/afwknight/~4/UvfO829A1Lc" height="1" width="1"/&gt;</content>
    <feedburner:origLink>http://whiteknight.github.com/2012/04/25/various_updates.html</feedburner:origLink></entry>
    
    <entry>
        <title>ParrotStore</title>
        <link href="http://feedproxy.google.com/~r/afwknight/~3/Qbqn0uvAcYc/parrotstore.html" />
        <updated>2012-04-15T00:00:00-07:00</updated>
        <id>http://whiteknight.github.com/2012/04/15/parrotstore</id>
        <content type="html">&lt;p&gt;I created a new repo for a new project: &lt;a href='http://github.com/Whiteknight/ParrotStore'&gt;ParrotStore&lt;/a&gt;. ParrotStore intends to provide some storage and persistance (and caching and database) solutions for Parrot. At the time of writing this post we have three in development: Memcached, MySQL and MongoDB.&lt;/p&gt;

&lt;h2 id='memcached'&gt;Memcached&lt;/h2&gt;

&lt;p&gt;The first thing I wrote is a rudimentary pure-parrot interface to Memcached for high speed caching. The interface looks like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;var memcached = new ParrotStore.Memcached([&amp;quot;192.168.1.1&amp;quot;, &amp;quot;192.168.1.2&amp;quot;]);
memcached.set(&amp;quot;foo&amp;quot;, &amp;quot;hello world!&amp;quot;);
:(int have, string content) = memcached.get(&amp;quot;foo&amp;quot;);
say(content);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Or, if you want a simpler interface, you can do something like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;string content = memcached.autoget(&amp;quot;foo&amp;quot;,
    function() { return &amp;quot;hello world!&amp;quot;; }
);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The &lt;code&gt;autoget&lt;/code&gt; method will try to read from Memcached if the item exists, and will invoke the callback to get the value otherwise (and save it to Memcached for later use). Of course, for this to be practical the callback to generate the content should be more expensive than a return of a constant string.&lt;/p&gt;
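
&lt;p&gt;The pattern behind &lt;code&gt;autoget&lt;/code&gt; is a read-through cache. Here&amp;#8217;s a minimal sketch of it in plain JavaScript, with a &lt;code&gt;Map&lt;/code&gt; standing in for the Memcached server. The &lt;code&gt;makeCache&lt;/code&gt; helper is hypothetical and this is not how ParrotStore is implemented internally; it only shows the control flow.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Illustrative sketch only: a read-through cache in the style of autoget.
// A Map stands in for the Memcached server; makeCache is a made-up name.
function makeCache() {
    var store = new Map();
    return {
        set: function (key, value) {
            store.set(key, value);
        },
        get: function (key) {
            // Mirrors the (have, content) pair returned by get() above.
            return { have: store.has(key), content: store.get(key) };
        },
        autoget: function (key, produce) {
            var hit = this.get(key);
            if (hit.have)
                return hit.content;   // cache hit: the callback is not invoked
            var value = produce();    // cache miss: compute the value...
            this.set(key, value);     // ...and save it for later use
            return value;
        }
    };
}

var cache = makeCache();
var calls = 0;
function expensive() {
    calls++;
    return "hello world!";
}

var first = cache.autoget("foo", expensive);
var second = cache.autoget("foo", expensive);
console.log(first, second, calls);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The &lt;code&gt;calls&lt;/code&gt; counter is only there to show that the callback runs on the first lookup and is skipped on the second.&lt;/p&gt;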

&lt;p&gt;I haven&amp;#8217;t tested with multiple memcached servers yet, and I haven&amp;#8217;t implemented several of the methods memcached supports. It&amp;#8217;s a start, however, and I can already think of several potential uses for it.&lt;/p&gt;

&lt;h2 id='mysql'&gt;MySQL&lt;/h2&gt;

&lt;p&gt;MySQL is popular and extremely common, so I figured I should work on that next. Plus, if we ever want to have a snowball&amp;#8217;s chance in hell of hosting a decent PHP compiler, we&amp;#8217;re going to want easy and available bindings for MySQL. Now, after a little bit of hacking today, we have it.&lt;/p&gt;

&lt;p&gt;Here&amp;#8217;s what we can do in Parrot today:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;var lib = loadlib(&amp;quot;mysql_group&amp;quot;);
var mysql = new &amp;#39;MySQLDbContext&amp;#39;;
mysql.connect(&amp;quot;localhost&amp;quot;, &amp;quot;username&amp;quot;, &amp;quot;password&amp;quot;, &amp;quot;database&amp;quot;, 0, 0);
var result = mysql.query(&amp;quot;DROP DATABASE foo;&amp;quot;);
say(result, &amp;quot; rows affected&amp;quot;);      // &amp;quot;1 rows affected&amp;quot;, if you had one

result = mysql.query(&amp;quot;SELECT * FROM bar&amp;quot;);
say(typeof(result));                // &amp;quot;MySqlDataTable&amp;quot;
for (var row in result) {           // Iterate over all rows
    int idx = int(row);
    say(&amp;quot;row &amp;quot; + string(idx));
    for (string column in row) {    // Iterate over all columns
        say(column + &amp;quot;: &amp;quot; + string(row[column]));
    }
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;One thing I don&amp;#8217;t handle quite yet is multiple result sets. So if you have a stored proc which returns multiple sets of data, you won&amp;#8217;t get any but the first back into your program. I&amp;#8217;ll try to get that implemented as quickly as I can.&lt;/p&gt;

&lt;h2 id='mongodb'&gt;MongoDB&lt;/h2&gt;

&lt;p&gt;We&amp;#8217;re starting to use MongoDB at work, and I figured a great way to become more familiar with this piece of software was to write bindings for it for Parrot. Despite several unnecessary problems with linking to the Mongo C Driver libraries, I&amp;#8217;ve managed to produce a few results.&lt;/p&gt;

&lt;p&gt;Mongo uses a storage format called BSON (similar to JSON), and stores BSON documents as atomic units. ParrotStore implements a BsonDocument and a MongoDbContext PMC type. As of this morning, you can create a BSON document and insert it into the DB:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;var lib = loadlib(&amp;quot;mongodb_group&amp;quot;);
var bsondoc = new &amp;#39;BsonDocument&amp;#39;;
bsondoc.append_start_object(&amp;quot;name&amp;quot;);
bsondoc.append_string(&amp;quot;first&amp;quot;, &amp;quot;Andrew&amp;quot;);
bsondoc.append_string(&amp;quot;nick&amp;quot;, &amp;quot;Whiteknight&amp;quot;);
bsondoc.append_end_object();
bsondoc.finish();

var mongo = new &amp;#39;MongoDbContext&amp;#39;;
mongo.connect(&amp;quot;127.0.0.1&amp;quot;, 27017);
mongo.insert(&amp;quot;local.foo&amp;quot;, bsondoc);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The document is indeed written to the database, although I don&amp;#8217;t have any methods yet to read it back out. The documentation for the C Driver for MongoDB is lacking, but I have the source code handy and it is pretty readable. I hope to have basic querying implemented by the end of the day.&lt;/p&gt;

&lt;p&gt;Here are a few things I plan to add, either today or in the next few days:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Support simple queries and commands&lt;/li&gt;

&lt;li&gt;Support introspecting and iterating over BSON documents&lt;/li&gt;

&lt;li&gt;Implement a JSON-&amp;gt;BSON translator (I have most of this written already).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;There are several other features that I need to implement, although many of them aren&amp;#8217;t necessary to say I have a minimally functional set: support for replicated sets, support for atomic find/replace updates, support for cursors and bson iterators, etc. There&amp;#8217;s a lot of work here, but I&amp;#8217;m off to a pretty good start already.&lt;/p&gt;

&lt;h2 id='build_system_and_project_setup'&gt;Build System and Project Setup&lt;/h2&gt;

&lt;p&gt;ParrotStore contains a bunch of sub-projects which are really only related by theme. They&amp;#8217;re all solutions for storing stuff, but they don&amp;#8217;t really relate to each other besides that. So, the build system is set up to easily build these projects individually. At the terminal, if you have &lt;code&gt;make&lt;/code&gt;, you can build them like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;make memcached
make install_memcached
make mysql
make install_mysql
make mongodb
make install_mongodb
make            # attempts to build them all
make install    # attempts to build and install them all&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This is great if you don&amp;#8217;t have the mysql or mongodb development packages installed but you want to get the memcached library (or any other combination).&lt;/p&gt;

&lt;p&gt;Internally, the makefile calls a distutils-based &lt;code&gt;setup.winxed&lt;/code&gt; program for building the various components, but you shouldn&amp;#8217;t use &lt;code&gt;setup.winxed&lt;/code&gt; directly.&lt;/p&gt;

&lt;p&gt;Like Rosella, which is a prerequisite for this project, ParrotStore will be a collection of things, not one big monolithic system. It will provide a Memcached interface in one standalone library, a MySQL interface in one, a MongoDB interface in one, and other interfaces separately too. Some of them (like Memcached) will be pure parrot. Other things like MongoDB will have C-level components too. Where Rosella has always promised to be pure Parrot, ParrotStore cannot and should not follow such a rule. Some things may turn out to be implementable with NCI, but that&amp;#8217;s an experiment for later. Maybe, much later.&lt;/p&gt;

&lt;p&gt;Also, expect a lot of synergy between Rosella and ParrotStore. ParrotStore will use Rosella internally, provide many of the interfaces that other Rosella-based projects expect, and add several extensions to make Rosella features even more cool and powerful.&lt;/p&gt;

&lt;h2 id='future_projects'&gt;Future Projects&lt;/h2&gt;

&lt;p&gt;The goal of ParrotStore is simple persistence. In a sense it might become something like an ORM, or contain an ORM, mapping Parrot data to and from various persistence mechanisms. This project does not intend to do any embedding, whether Parrot embedded in a database or a database embedded in Parrot, or whatever else. The database (or cache or whatever) is separate, and ParrotStore just provides a client interface to it. For instance, the PL/Parrot project embeds Parrot into the Postgres DB. ParrotStore would provide an external interface for querying it instead.&lt;/p&gt;

&lt;p&gt;I do not yet have a runnable test suite. I&amp;#8217;ve been doing ad hoc tests because this is all so new and experimental. I need to add a test suite.&lt;/p&gt;

&lt;p&gt;I also want to add a custom caching mechanism for storing frozen PMCs to file and fetching them again. Multiple backends to a PMC mechanism would allow us to store PMCs to various persistence systems for later use. This is another thing that I&amp;#8217;ve wanted for a while, but I haven&amp;#8217;t quite nailed down a design yet.&lt;/p&gt;

&lt;p&gt;I would like to add a client interface for Postgres. I suspect there are some people floating around who could help make that a reality.&lt;/p&gt;

&lt;p&gt;I think this project will probably grow organically, adding new storage backends and cool interfaces for various purposes, and then adding some tools and utilities that use these things. As with all my projects, feedback, requests, suggestions, and questions about my basic competency are always welcome.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/afwknight/~4/Qbqn0uvAcYc" height="1" width="1"/&gt;</content>
    <feedburner:origLink>http://whiteknight.github.com/2012/04/15/parrotstore.html</feedburner:origLink></entry>
    
    <entry>
        <title>GSOC Proposals Received</title>
        <link href="http://feedproxy.google.com/~r/afwknight/~3/w93u4nvZGQQ/received_gsoc_proposals.html" />
        <updated>2012-04-09T00:00:00-07:00</updated>
        <id>http://whiteknight.github.com/2012/04/09/received_gsoc_proposals</id>
        <content type="html">&lt;p&gt;The GSOC 2012 proposal deadline has come and gone. We&amp;#8217;ve received several project proposals, although only half a dozen are serious, honest, plausible proposals. We have a few days now to rank them, comment on them, and assign potential mentors to them. When we find out how many slots Google is able to assign to us, we&amp;#8217;ll be able to pick out which ones will be worked on this summer.&lt;/p&gt;

&lt;p&gt;Here&amp;#8217;s a list of the decent-looking proposals we&amp;#8217;ve received:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Jaesop Stage 1 Compiler and Runtime&lt;/strong&gt; by &lt;strong&gt;mayank&lt;/strong&gt;. mayank intends to fix up the last remaining bits of stage 0, then get started on a JavaScript-compiler-in-JavaScript stage 1. By the end of the summer I think it&amp;#8217;s very plausible that he could have a self-hosting compiler for most of the JavaScript language and at least a start on the basic runtime. If he keeps the abstraction boundaries nice and clean, after the summer is over we should be well primed to start upgrading bits of the internals to use 6model and PACT, when those two projects are ready to be used there.&lt;/li&gt;

&lt;li&gt;&lt;strong&gt;LAPACK Bindings for Parrot-Linear-Algebra&lt;/strong&gt; by &lt;strong&gt;jashwanth&lt;/strong&gt;. This is something I&amp;#8217;ve wanted to add to PLA since the beginning of that project, and an absolute necessity if I ever want to get back to my dream of writing an M language compiler for Parrot. jashwanth has proposed adding LAPACK bindings to PLA (via NCI) and implementing a nice interface for some of the most important transformations, decompositions and operations provided by that library. He also intends to provide a few pure-parrot backup implementations for cases when LAPACK isn&amp;#8217;t available but we still need to get work done. It&amp;#8217;s an open-ended project that can be done in small, discrete chunks.&lt;/li&gt;

&lt;li&gt;&lt;strong&gt;6model Integration&lt;/strong&gt; by &lt;strong&gt;benabik&lt;/strong&gt;. We know we want 6model, and we know benabik has the chops to pull it off. He is still working on his thesis AND is expecting a baby this summer, but somehow I still don&amp;#8217;t feel like it&amp;#8217;s an undoable project. His proposal is to integrate 6model into Parrot&amp;#8217;s core and start transitioning our existing PMCs to use 6model instead (and abandon most of our current object model).&lt;/li&gt;

&lt;li&gt;&lt;strong&gt;PACT Assembly&lt;/strong&gt; by &lt;strong&gt;benabik&lt;/strong&gt;. Yes, benabik has submitted two proposals. This one is the start of PACT; something that, like 6model, we know we want. benabik, considering PACT was his brain child and he&amp;#8217;s getting his PhD in compiler-related topics, is uniquely qualified to pull this one off and make it shine. The real question is which one of his two proposals we as a community want him to work on more (I&amp;#8217;m already personally signed up to do whichever one he doesn&amp;#8217;t pick, so it shouldn&amp;#8217;t be a loss either way). PACT, as I&amp;#8217;ve talked about before, is intended to be a large modular library of compiler tools and building blocks, so there is ample room to expand the project if things are going unusually well.&lt;/li&gt;

&lt;li&gt;&lt;strong&gt;Security Sandbox&lt;/strong&gt; by &lt;strong&gt;Justin&lt;/strong&gt;. Security sandboxing is something that we&amp;#8217;ve wanted, to varying degrees, for years. Justin has proposed to at least get us started with proper security and implement as many permissions and restrictions as he has time for. It&amp;#8217;s a project that we can consider to be a &amp;#8220;success&amp;#8221; if even half of what gets proposed actually gets completed, and there is plenty of room to build on if his momentum gets up.&lt;/li&gt;

&lt;li&gt;&lt;strong&gt;Mod_Parrot 2.0&lt;/strong&gt; by &lt;strong&gt;brrt&lt;/strong&gt;. ModParrot, the Parrot module for the Apache webserver, hasn&amp;#8217;t been actively maintained in some time, and has fallen into disrepair following many of the internals changes to Parrot in the past few years. brrt has proposed an update to ModParrot to use the new and more stable embedding API. This is another modular project that can grow if his development speed stays high to include all sorts of helper libraries, driver programs, plugins for HLLs (Rakudo in particular) and other things. Most valuable of all may be his plans for implementing an automated test suite, which will help ensure ModParrot never falls by the wayside again.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So we&amp;#8217;ve gotten 6 decent proposals from 5 students, and if even half of these go on to succeed in reaching their goals Parrot will be much better off by the end of the summer. And this list doesn&amp;#8217;t even include the calling-conventions work that bacek is working on, or the threading work that nine is working on, the M0 work that several other developers are doing, or the packfile and IMCC and whatever else work that I&amp;#8217;m planning for myself. This could be a very eventful summer indeed.&lt;/p&gt;

&lt;p&gt;If you&amp;#8217;re signed up to be a mentor this summer, or if you would like to be, please head over to the &lt;a href='http://www.google-melange.com'&gt;GSOC website&lt;/a&gt;, sign up, and take a look at the proposals.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/afwknight/~4/w93u4nvZGQQ" height="1" width="1"/&gt;</content>
    <feedburner:origLink>http://whiteknight.github.com/2012/04/09/received_gsoc_proposals.html</feedburner:origLink></entry>
    
    <entry>
        <title>Jaesop Modules</title>
        <link href="http://feedproxy.google.com/~r/afwknight/~3/dm2v07M2FjM/jaesop_improvements.html" />
        <updated>2012-04-01T00:00:00-07:00</updated>
        <id>http://whiteknight.github.com/2012/04/01/jaesop_improvements</id>
        <content type="html">&lt;p&gt;I&amp;#8217;ve received a few emails from prospective GSOC students interested in doing Jaesop-related work this summer for GSOC. So, to make sure it was a platform that&amp;#8217;s fit to be worked on, I fired it up and ran through the tests. Everything passed, which is pretty awesome, except there weren&amp;#8217;t a whole heck of a lot of tests to begin with. Saying that 100% of tests covering about 10% of the code passed isn&amp;#8217;t saying much with any kind of certainty.&lt;/p&gt;

&lt;p&gt;Having a student work on Jaesop, if such a proposal is submitted and accepted, would be quite a boon for that project. In summers past we&amp;#8217;ve had students working on compiler-related projects, usually starting with nothing or almost nothing. For instance last year Rohit was working on a JavaScript compiler starting from the ground up and didn&amp;#8217;t have a huge amount of luck. Lucian was working on a Python compiler project, disregarding much of the older Pynie project work that had been done. He did better, but would have been able to achieve a lot more if he were starting from a stronger foundation (especially, an improved Python-ready object model). Asking another student to work on a new compiler project this year would not be a great thing for us to do, especially knowing that some of the fundamental issues (i.e. object model) are still not resolved at the lowest level.&lt;/p&gt;

&lt;p&gt;Jaesop is slightly different because a student would be starting with a working foundation. It&amp;#8217;s not perfect by any stretch, but it is something. It is a working piece of code with some of the complexities of the JavaScript object model and library logic already sorted out. To make it an even better platform for launching a summer project, I decided that a few more pieces needed to be added. I feel like it&amp;#8217;s the difference between a student who can hit the ground running and one who has to crawl around a few big rocks first.&lt;/p&gt;

&lt;p&gt;First thing, I added &lt;code&gt;require()&lt;/code&gt; and &lt;code&gt;exports&lt;/code&gt;. This way we can do the CommonJS analog of loading bytecode files. Here&amp;#8217;s a small example that I wrote out, called &lt;code&gt;sys.js&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;exports.puts = function(s) {
    WX-&amp;gt;say(s);
};&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;And here is how we would call that from our program:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;var sys = require(&amp;quot;sys&amp;quot;);
sys.puts(&amp;quot;Hello world!&amp;quot;);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Doing this kind of stuff is the kind of unnecessary complexity that a student shouldn&amp;#8217;t have to worry about. It&amp;#8217;s tangential to any problems a student would be solving and so having the student waste time on infrastructure like this would just be taking time away from the actual project.&lt;/p&gt;

&lt;p&gt;I&amp;#8217;ve also improved logic relating to prototypes. It&amp;#8217;s not perfect or compliant by any standards, but it&amp;#8217;s much much closer to the norm and likely gets us close enough to parsing and running the kinds of non-trivial programs that are going to form the basis of a compiler.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;var sys = require(&amp;quot;sys&amp;quot;);
function Foo() {
    this.a = &amp;quot;foo a&amp;quot;;
}
Foo.prototype.b = &amp;quot;foo b&amp;quot;;

function Bar() {
    this.a = &amp;quot;bar a&amp;quot;;
}
Bar.prototype.b = &amp;quot;bar b&amp;quot;;

var f = new Foo();
sys.puts(f.a);
sys.puts(f.b);

var b = new Bar();
sys.puts(b.a);
sys.puts(b.b);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If I didn&amp;#8217;t tell you otherwise, you might think this was normal JavaScript code running on Node.js or some other real JavaScript compiler! This exact code example is running on my machine right now. Prior to my hacking this morning, the code above would either have thrown an error or have printed out the same thing twice because both &lt;code&gt;Foo&lt;/code&gt; and &lt;code&gt;Bar&lt;/code&gt; function objects would have shared a prototype.&lt;/p&gt;
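
&lt;p&gt;To see what the old shared-prototype behavior looks like, the following plain JavaScript sketch (meant for a stock engine such as Node.js, not a Jaesop test case) deliberately points two constructors at one prototype object. The names &lt;code&gt;Baz&lt;/code&gt; and &lt;code&gt;Qux&lt;/code&gt; are made up for the illustration.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Each function object normally gets its own prototype, so a property
// set on one does not leak into the other.
function Foo() {}
function Bar() {}
Foo.prototype.b = "foo b";
Bar.prototype.b = "bar b";
var separate = [new Foo().b, new Bar().b];

// Simulate the old bug by forcing both constructors onto one prototype.
function Baz() {}
function Qux() {}
Qux.prototype = Baz.prototype;
Baz.prototype.b = "baz b";
Qux.prototype.b = "qux b";          // overwrites the shared slot
var shared = [new Baz().b, new Qux().b];

console.log(separate.join(", "));   // prints "foo b, bar b"
console.log(shared.join(", "));     // prints "qux b, qux b"&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The second line of output is the &amp;#8220;same thing twice&amp;#8221; symptom: the last write to the shared prototype wins for both kinds of object.&lt;/p&gt;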

&lt;p&gt;Now that I&amp;#8217;ve done this work I feel much better about a Jaesop-based GSOC project happening this summer. Now the responsibility lies with the students to submit acceptable proposals and get to work!&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/afwknight/~4/dm2v07M2FjM" height="1" width="1"/&gt;</content>
    <feedburner:origLink>http://whiteknight.github.com/2012/04/01/jaesop_improvements.html</feedburner:origLink></entry>
    
    <entry>
        <title>Rosella Json and Xml</title>
        <link href="http://feedproxy.google.com/~r/afwknight/~3/DIu5deUIPBY/rosella_xml_json.html" />
        <updated>2012-03-31T00:00:00-07:00</updated>
        <id>http://whiteknight.github.com/2012/03/31/rosella_xml_json</id>
        <content type="html">&lt;p&gt;In a &lt;a href='/2012/03/25/rosella_net.html'&gt;previous post&lt;/a&gt; I wrote about how I didn&amp;#8217;t have XML or JSON parsing libraries in Rosella and didn&amp;#8217;t have any real plans to build them either. Well, I kind of lied. As of this morning, Rosella has an experimental prototype of an XML parsing library and a JSON library.&lt;/p&gt;

&lt;p&gt;On the XML side, you can do something like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;var d = new Rosella.Xml.Document();
d.read_from_string(&amp;quot;&amp;lt;foo&amp;gt;&amp;lt;bar value=&amp;#39;hello!&amp;#39;/&amp;gt;&amp;lt;/foo&amp;gt;&amp;quot;);
say(d.get_root_element().children[0].attributes[&amp;quot;value&amp;quot;]);
d.write_to_file(&amp;quot;test.xml&amp;quot;);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;For JSON you can do similar things:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;var obj = Rosella.Json.parse(&amp;quot;{ &amp;#39;foo&amp;#39;:3.14, &amp;#39;bar&amp;#39;: [ null, []] }&amp;quot;);
string json = Rosella.Json.to_json(obj);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;I was looking at the ByteBuffer and StringIterator types, which both allow relatively quick access to individual characters as integer values. Originally I didn&amp;#8217;t want to make an XML or JSON parsing library because I figured that string/substring operations to break the string up would be far too slow. However, if you iterate the strings as integers and make ample use of Winxed&amp;#8217;s compile-time constants and code inlining abilities, the performance can suddenly become comparable to any other data parser in other similar languages.&lt;/p&gt;

&lt;p&gt;Things could actually be faster still if I could assume all strings are ASCII. However real-world data doesn&amp;#8217;t play by those rules. So, we need to go through the added layer of abstraction in StringIterator to get characters, which might not be fixed-width depending on encoding. Things could also be faster if StringIterator had a fast &amp;#8220;unget&amp;#8221; or &amp;#8220;roll back&amp;#8221; operation. Instead I need to maintain a relatively costly character stack for parsing, which is not quite ideal.&lt;/p&gt;
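
&lt;p&gt;The character-stack workaround can be sketched roughly as follows. This is JavaScript rather than Winxed, and &lt;code&gt;PushbackReader&lt;/code&gt; and &lt;code&gt;readInteger&lt;/code&gt; are hypothetical names invented for the illustration; it does not reproduce the StringIterator API or the actual Rosella parser. Characters are handled as integer code points, and a small stack stands in for the missing &amp;#8220;unget&amp;#8221; operation.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Iterates a string as integer code points and supports pushing values back.
function PushbackReader(text) {
    // Array.from walks the string by code point, so wide characters stay whole.
    this.codes = Array.from(text, function (ch) { return ch.codePointAt(0); });
    this.pos = 0;
    this.stack = [];   // code points that were read and then put back
}

// Returns the next code point, or -1 at end of input.
PushbackReader.prototype.next = function () {
    if (this.stack.length !== 0) {
        return this.stack.pop();
    }
    if (this.pos === this.codes.length) {
        return -1;
    }
    return this.codes[this.pos++];
};

// Puts a code point back so the next call to next() returns it again.
PushbackReader.prototype.unget = function (code) {
    this.stack.push(code);
};

var DIGIT_CODES = new Set(Array.from("0123456789", function (ch) {
    return ch.codePointAt(0);
}));

// Reads a run of decimal digits and leaves the terminating character unread.
function readInteger(reader) {
    var value = 0;
    var code = reader.next();
    while (DIGIT_CODES.has(code)) {
        value = value * 10 + (code - 48);   // 48 is the code point of "0"
        code = reader.next();
    }
    if (code !== -1) {
        reader.unget(code);
    }
    return value;
}

var reader = new PushbackReader("42,7");
var first = readInteger(reader);    // 42
var comma = reader.next();          // 44, the code point of ","
var second = readInteger(reader);   // 7
console.log(first, comma, second);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The stack costs an extra branch on every read, which is the overhead a native roll-back operation on the iterator would avoid.&lt;/p&gt;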

&lt;p&gt;Are these libraries standards compliant? No. Does the XML library know anything about XSLT, XPath, SAX, or schemas? No. Do these libraries handle all error conditions gracefully? No. Are they well tested right now? Emphatic no.&lt;/p&gt;

&lt;p&gt;Parrot&amp;#8217;s standard library comes with a &lt;code&gt;data_json&lt;/code&gt; compiler object which can compile JSON to an object, or serialize an object back to JSON. I haven&amp;#8217;t benchmarked my implementation against that one, but I have many reasons to believe that mine is significantly faster. It&amp;#8217;s still immature and probably very error prone, but it&amp;#8217;s only existed for a day and there are many improvements left to make.&lt;/p&gt;

&lt;p&gt;These are basic and immature implementations of data format parsing libraries, and excellent proofs of concept. I need to add plenty of tests for each and then start benchmarking to make sure the bit-twiddly effort I&amp;#8217;ve put in to creating quick algorithms has paid off. Rosella has two new toys to play with, and I hope they can do some good in the coming months.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/afwknight/~4/DIu5deUIPBY" height="1" width="1"/&gt;</content>
    <feedburner:origLink>http://whiteknight.github.com/2012/03/31/rosella_xml_json.html</feedburner:origLink></entry>
    
    <entry>
        <title>Rosella Net</title>
        <link href="http://feedproxy.google.com/~r/afwknight/~3/LKmvLosASUc/rosella_net.html" />
        <updated>2012-03-25T00:00:00-07:00</updated>
        <id>http://whiteknight.github.com/2012/03/25/rosella_net</id>
        <content type="html">&lt;p&gt;I&amp;#8217;ve started work on a new library for Rosella: Net. Net will be a networking library for doing things like HTTP requests and related operations. For a first draft I&amp;#8217;m modelling it closely on the LWP::, Http::, and related modules in Parrot&amp;#8217;s standard library. Many of these have been lovingly crafted and maintained by Francois Perrad and others over the years. Basically I&amp;#8217;m borrowing the essential algorithms from the existing libraries, translating to Winxed en passant, and using them as a basis for building a bigger library.&lt;/p&gt;

&lt;p&gt;I&amp;#8217;m not looking to replace the LWP or Http modules from Parrot. Those things are required dependencies of &lt;strong&gt;Distutils&lt;/strong&gt;, which is used by many projects for good reason. Distutils is an amazing tool for quickly and easily setting up build and test frameworks for Parrot-based projects. I wouldn&amp;#8217;t want to disrupt that. In fact, I want to encourage it. Rosella uses Distutils for its build. PLA uses it as well. It&amp;#8217;s a great tool and most projects on Parrot should consider using it. What I do want to do is create an internet-access framework where basic functionality is easy to get to and where new features can be easily added. I also want to fill in what I see as being a pretty important functionality gap in Rosella&amp;#8217;s collection of libraries.&lt;/p&gt;

&lt;p&gt;The LWP and Http libraries in the Parrot standard runtime are loosely based on the Perl 5 modules of the same names. They started as something of a direct translation of the necessary parts of the Perl 5 libraries to PIR. What I want to do is start with some of the ideas and algorithms from that port, update the interfaces to make them more Rosella-esque, and then start adding some of the features that the LWP and Http ports didn&amp;#8217;t include (and maybe more beyond that).&lt;/p&gt;

&lt;p&gt;Eventually I would like to add a nice wrapper interface around Socket and Select PMCs the same way the FileSystem library adds nicer wrappers around FileHandle and OS PMCs. I also want to add support for other protocols (FTP and IRC both come to mind). SOAP, REST, and RPC calls could be interesting to add in the future, though most of those would require a library for building and parsing XML, which I don&amp;#8217;t have now and don&amp;#8217;t have any plans to add in the near future. If I could add upload support for Smolder or other continuous integration applications to my Harness library, that would be fun too.&lt;/p&gt;

&lt;p&gt;As of today I can upload report archives to smolder which have been previously assembled by distutils:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;var request = Rosella.Net.Http.create_request(&amp;quot;http://smolder.parrot.org/app/projects/process_add_report/2&amp;quot;);
request.set_method(&amp;quot;POST&amp;quot;);
request.add_header(&amp;quot;Connection&amp;quot;, &amp;quot;close&amp;quot;);
request.add_form_field(&amp;quot;architecture&amp;quot;, &amp;quot;x86-64&amp;quot;);
request.add_form_field(&amp;quot;platform&amp;quot;, &amp;quot;linux&amp;quot;);
request.add_form_field(&amp;quot;revision&amp;quot;, &amp;quot;0e75eb61d5b713b57b05e91a2d45251bb18d3e2e&amp;quot;);
request.add_form_field(&amp;quot;tags&amp;quot;, &amp;quot;test&amp;quot;);
request.add_form_field_filename(&amp;quot;report_file&amp;quot;, &amp;quot;/pla/report.tar.gz&amp;quot;, &amp;quot;application/octet-stream&amp;quot;);
var response = Rosella.Net.Http.request(request);
say(response.status_code());
say(response.header().get_header_text());
say(response.get_content());&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This is a lower-level of interface than I expect most people to use when the library is more mature. Eventually you&amp;#8217;ll create a UserAgent object which provides nice pretty methods for making GET and POST requests. Like I said, this is roughly based on Parrot&amp;#8217;s LWP library, which itself is based on Perl 5&amp;#8217;s much-loved LWP module.&lt;/p&gt;

&lt;p&gt;The library also supports local resource access with &lt;code&gt;file://&lt;/code&gt; URIs:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Get a file:
var request = Rosella.Net.Http.create_request(&amp;quot;file:///home/andrew/data/foo.txt&amp;quot;);
var response = Rosella.Net.Http.request(request);
say(response.get_content());

// Create a new file:
var request = Rosella.Net.Http.create_request(&amp;quot;file:///home/andrew/data/bar.txt&amp;quot;);
request.set_method(&amp;quot;PUT&amp;quot;);
request.set_content(&amp;quot;Hello World!&amp;quot;);
var response = Rosella.Net.Http.request(request);
say(response.status_code());&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;There isn&amp;#8217;t a lot to look at with this new library quite yet, but things are moving along pretty quick. Now that I have some of the basic algorithms working, I&amp;#8217;ll start rearranging the internals to more closely align with my long-term goals. This has basically been a weekend project so far, and I need to get back to my &lt;code&gt;remove_sub_flags&lt;/code&gt; branch and a few other projects in Parrot core before I take this too much further.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/afwknight/~4/LKmvLosASUc" height="1" width="1"/&gt;</content>
    <feedburner:origLink>http://whiteknight.github.com/2012/03/25/rosella_net.html</feedburner:origLink></entry>
    
    <entry>
        <title>Parrot In GSOC 2012</title>
        <link href="http://feedproxy.google.com/~r/afwknight/~3/qVFRVl-d8LM/parrot_in_gsoc_2012.html" />
        <updated>2012-03-16T00:00:00-07:00</updated>
        <id>http://whiteknight.github.com/2012/03/16/parrot_in_gsoc_2012</id>
        <content type="html">&lt;p&gt;I just got the email a few minutes ago: The Parrot Foundation has been accepted into the Google Summer of Code program for the summer of 2012!&lt;/p&gt;

&lt;p&gt;I haven&amp;#8217;t wanted to write too much about GSOC until I knew we were accepted. I thought (hoped) that our application and track record were strong, but I don&amp;#8217;t like to take anything for granted. But, now that we are accepted I can (and will) write as much about it as I want. Unfortunately I don&amp;#8217;t have time to write much today, so here are some links that prospective students should read over until I have the time to spend in front of my keyboard:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href='https://github.com/parrot/parrot/wiki/Summer-of-Code-Task-Ideas'&gt;List of possible project ideas&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href='http://whiteknight.github.com/2011/04/11/gsoc_proposals.html'&gt;My guidelines for proposals&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href='http://whiteknight.github.com/2011/03/27/gsoc_students_next_steps.html'&gt;Next Steps for prospective students&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href='https://github.com/parrot/parrot/wiki/Summer-of-Code-Proposal-Template'&gt;GSOC Proposal Template&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the coming days I&amp;#8217;ll be writing about more project ideas.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/afwknight/~4/qVFRVl-d8LM" height="1" width="1"/&gt;</content>
    <feedburner:origLink>http://whiteknight.github.com/2012/03/16/parrot_in_gsoc_2012.html</feedburner:origLink></entry>
    
    <entry>
        <title>Packfile Write API</title>
        <link href="http://feedproxy.google.com/~r/afwknight/~3/ZvmF847UHgg/pbc_c_cleanup.html" />
        <updated>2012-03-15T00:00:00-07:00</updated>
        <id>http://whiteknight.github.com/2012/03/15/pbc_c_cleanup</id>
        <content type="html">&lt;p&gt;Last night benabik told me about a problem that, while serious, hardly caused me to raise an eyebrow. Some innocuous-looking code, written while trying to follow along with some ill-written documentation, led IMCC to enter into an infinite loop. I wasn&amp;#8217;t surprised that IMCC contained such a bug, though I was surprised that it was so easy to reproduce and the functionality in question wasn&amp;#8217;t really tested at all. The code that caused this bug is this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;.const &amp;#39;FixedIntegerArray&amp;#39; foo = &amp;#39;test&amp;#39;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The Sub &amp;#8216;test&amp;#8217; was an &lt;code&gt;:immediate&lt;/code&gt; subroutine which generated a FixedIntegerArray. For people not familiar with IMCC&amp;#8217;s workings, an &lt;code&gt;:immediate&lt;/code&gt; Sub is executed as soon as the packfile is compiled and its entry in the packfile is replaced with the return value of the Sub. So, the Sub doesn&amp;#8217;t exist in the packfile, only the thing that the Sub generated. This is the mechanism by which arbitrary PMCs can (in theory) be serialized into the packfile at runtime.&lt;/p&gt;

&lt;p&gt;Compare with this syntax:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;.const &amp;#39;Sub&amp;#39; foo = &amp;#39;test&amp;#39;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In this example, the local variable &amp;#8216;foo&amp;#8217; will refer to the item in the constants table where the Sub &amp;#8216;test&amp;#8217; is (or would be, if it weren&amp;#8217;t an &lt;code&gt;:immediate&lt;/code&gt;). The tag &amp;#8216;Sub&amp;#8217; there is a bit of a misnomer, since the PMC in that slot might not be a Sub. This syntax is basically a really lousy way of saying &amp;#8220;give me the PMC in the constant table in the slot where the given named &amp;#8216;Sub&amp;#8217; is or would be.&amp;#8221; Also, that type information isn&amp;#8217;t really used for anything, since PIR is a dynamic language. At first glance you might suspect that the &lt;code&gt;.const&lt;/code&gt; directive can take any PMC name as its type, but that is wrong. It only accepts four types: &lt;code&gt;&amp;#39;Sub&amp;#39;&lt;/code&gt;, &lt;code&gt;&amp;#39;Integer&amp;#39;&lt;/code&gt;, &lt;code&gt;&amp;#39;String&amp;#39;&lt;/code&gt; and &lt;code&gt;&amp;#39;FixedIntegerArray&amp;#39;&lt;/code&gt;. Don&amp;#8217;t ask me why these are the only four supported from among all our built-in types, or why we support more than just &amp;#8216;Sub&amp;#8217; (the other three options seem superfluous to me).&lt;/p&gt;

&lt;p&gt;The poorness of this syntax is not really something that I want to write about in this post. We can do it better and in the next incarnation of PIR (if we ever get to building such a beast) &lt;em&gt;will&lt;/em&gt; do better. This is just one more &amp;#8220;of course it does it that way&amp;#8221; kind of moment that you learn to ignore when you&amp;#8217;re dealing with the code that is the current incarnation of IMCC.&lt;/p&gt;

&lt;p&gt;What I do want to write about instead is the file where the parsing code for this exists: &lt;code&gt;compilers/imcc/pbc.c&lt;/code&gt;. That file contains much of the logic for taking IMCC&amp;#8217;s internal &lt;code&gt;SymReg&lt;/code&gt; representation and turning it into a packfile. It&amp;#8217;s a huge mess, and encapsulation is broken here in ways that are as bad or worse than any other single example in the repository. That&amp;#8217;s why I&amp;#8217;m planning a shotgun cleanup of it soon.&lt;/p&gt;

&lt;p&gt;The functions and logical blocks in this file fall into two broad categories: First, there are functions for iterating the AST (&lt;code&gt;struct SymReg&lt;/code&gt; and friends) and pulling out relevant values. Second, there are functions for inserting new data into the budding packfile. The first category of functions is generally fine and a necessary part of any assembler, even if some of the code could be cleaned and modernized. It&amp;#8217;s that second category that&amp;#8217;s of much broader interest. My plan is to take functions and sequences from &lt;code&gt;pbc.c&lt;/code&gt; out of IMCC, wrap them up all pretty, and add them to the proper packfile subsystem API.&lt;/p&gt;

&lt;p&gt;Of course, I do start to think about how exactly to do that. At runtime the &lt;code&gt;PackFile*&lt;/code&gt; structure is basically read-only. Bytecode is read-only and contains fixed integer indices into the constants table which is also not expected to change. If bytecode isn&amp;#8217;t changing then annotations and debugging info for that bytecode is probably not changing either. Once a packfile is loaded into the interp, given a PackfileView PMC wrapper, and made executable it really shouldn&amp;#8217;t be modified any more.&lt;/p&gt;

&lt;p&gt;However, when we&amp;#8217;re talking about a compiler or other code generating system, we want the ability to write and modify packfiles. When we&amp;#8217;re done modifying we might want to stamp them with a flag to say that they are read-only and suitable for execution.&lt;/p&gt;

&lt;p&gt;So I&amp;#8217;m thinking we want two APIs. The first uses a bit flag on the packfile to determine if it&amp;#8217;s editable, and can edit it if that flag is set. The second is the normal read-only accessor API which can generally ignore that flag except for the routines that load a packfile in to be executed by the interpreter. For those handful of routines that do actual loading and verification we can throw an exception or something.&lt;/p&gt;

&lt;p&gt;My general plan, and I want to get a lot of feedback on this before I touch anything, is to make a second API for packfile editing routines. I&amp;#8217;ll prefix those functions with something like &lt;code&gt;Parrot_pfw_&lt;/code&gt; (instead of &lt;code&gt;Parrot_pf_&lt;/code&gt;) to set them aside. I&amp;#8217;ll then start moving packfile building logic from IMCC into this new API. Instantly we get much improved encapsulation, clear separation of concerns between packfile writing and executing, and a more robust interface for compiler writers to use in the future. It will also be a nice development in terms of security, where we can limit certain packfiles to be executable. I think it&amp;#8217;s a pretty good idea &lt;em&gt;and&lt;/em&gt; I don&amp;#8217;t think it should take too much effort to accomplish. At least, not in a first draft.&lt;/p&gt;
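
&lt;p&gt;As a rough illustration of the two-API split, here is a small JavaScript model. Everything in it is an assumption made for the sketch: the function names merely echo the proposed &lt;code&gt;Parrot_pfw_&lt;/code&gt; and &lt;code&gt;Parrot_pf_&lt;/code&gt; prefixes, and the packfile is reduced to a plain object with an &lt;code&gt;editable&lt;/code&gt; flag. None of these functions exist in Parrot.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// A toy packfile: an editable flag plus a constants table.
function createPackfile() {
    return { editable: true, constants: [] };
}

// Write API (hypothetical "pfw" side): only works while the flag is set.
function pfwAddConstant(pf, value) {
    if (!pf.editable) {
        throw new Error("packfile is read-only");
    }
    pf.constants.push(value);
    return pf.constants.length - 1;   // index of the new constant
}

// Stamps the packfile as finished and suitable for execution.
function pfwFinish(pf) {
    pf.editable = false;
}

// Read API (hypothetical "pf" side): plain accessors ignore the flag.
function pfGetConstant(pf, index) {
    return pf.constants[index];
}

// Loading for execution is the one read-side routine that checks the flag.
function pfLoadForExecution(pf) {
    if (pf.editable) {
        throw new Error("packfile is still being written");
    }
    return pf;
}

var pf = createPackfile();
var idx = pfwAddConstant(pf, "hello");

var loadErrorBeforeFinish = null;
try {
    pfLoadForExecution(pf);
} catch (e) {
    loadErrorBeforeFinish = e.message;
}

pfwFinish(pf);

var writeErrorAfterFinish = null;
try {
    pfwAddConstant(pf, "late");
} catch (e) {
    writeErrorAfterFinish = e.message;
}
console.log(idx, loadErrorBeforeFinish, writeErrorAfterFinish);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The point of the design is that the flag is checked in exactly two places: every writer, and the loader. Ordinary readers never need to look at it.&lt;/p&gt;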

&lt;p&gt;I&amp;#8217;m not starting any new projects until I get my &lt;code&gt;remove_sub_flags&lt;/code&gt; branch much further along. I think it&amp;#8217;s a good idea to follow up a tough project with one which is more straight-forward and offers clear rewards. Speaking of that branch, parrot is building fine and I&amp;#8217;m slowly working my way through the list of test failures. When I get the majority of those sorted out I&amp;#8217;m going to start working on patches for Winxed, NQP-rx, NQP and Rakudo. We&amp;#8217;re a long way off but the progress is extremely rewarding.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/afwknight/~4/ZvmF847UHgg" height="1" width="1"/&gt;</content>
    <feedburner:origLink>http://whiteknight.github.com/2012/03/15/pbc_c_cleanup.html</feedburner:origLink></entry>
    
    <entry>
        <title>Packfile Tags Cleanup</title>
        <link href="http://feedproxy.google.com/~r/afwknight/~3/9zKUTweXq24/packfile_tags_cleanup.html" />
        <updated>2012-03-04T00:00:00-08:00</updated>
        <id>http://whiteknight.github.com/2012/03/04/packfile_tags_cleanup</id>
        <content type="html">&lt;p&gt;I&amp;#8217;ve complained a lot about Parrot&amp;#8217;s packfiles and packfile-related subsystems. A few months ago I started &lt;a href='/2011/07/07/packfiles_work_continues.html'&gt;writing about a plan to fix some of these issues&lt;/a&gt;. Now, after something of an extended hiatus from hardcore parrot hacking, I&amp;#8217;ve decided to get started on doing all this work.&lt;/p&gt;

&lt;p&gt;The biggest part of my plan, from a user-interface perspective was changing this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;.sub &amp;#39;foo&amp;#39; :load
    ...
.end&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;to this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;.sub &amp;#39;foo&amp;#39; :tag(&amp;#39;load&amp;#39;)
    ...
.end&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Also, as part of this change, we want to change this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;load_bytecode &amp;quot;bar.pbc&amp;quot;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;to this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$P0 = load_bytecode &amp;quot;bar.pbc&amp;quot;
$P1 = $P0.&amp;#39;subs_by_tag&amp;#39;(&amp;quot;load&amp;quot;)
$I0 = $P0.&amp;#39;is_initialized&amp;#39;(&amp;quot;load&amp;quot;)
if $I0 goto done_initialization
$P2 = iter $P1
loop_top:
unless $P2 goto loop_bottom
$P3 = shift $P2
$P3()
goto loop_top
loop_bottom:
$P0.&amp;#39;mark_initialized&amp;#39;(&amp;quot;load&amp;quot;)
done_initialization:&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;I&amp;#8217;ve discussed these changes before but I will repeat one important detail that is well worth repeating: Despite the fact that the total amount of PIR code increases, the overall performance profile &lt;em&gt;improves&lt;/em&gt;. The second set of code snippets are faster to execute and are much more flexible. The performance improves as the number of &lt;code&gt;:load&lt;/code&gt; or &lt;code&gt;:init&lt;/code&gt; subs in your packfile increases. You gain the ability to tag subroutines with any string name you want, use as many different string names for subroutine groups as you want, get a list of subs to execute in any order at any time, be able to initialize and reinitialize libraries at any time as many times as you want, and do a few other cool and empowering things. Actually, the system is set up to allow any arbitrary PMC in the constants table to be tagged with a string name and accessed using that name, but the only thing we have syntax for right now are the Subs.&lt;/p&gt;
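
&lt;p&gt;The tag mechanism described here can be modelled in a few lines of JavaScript. This is only a sketch under stated assumptions: &lt;code&gt;PackfileViewSketch&lt;/code&gt;, &lt;code&gt;addTagged&lt;/code&gt; and &lt;code&gt;runTagged&lt;/code&gt; are hypothetical names, subs are plain functions, and nothing here is Parrot&amp;#8217;s actual implementation.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Toy stand-in for a loaded packfile that groups subs under string tags.
function PackfileViewSketch() {
    this.tagged = new Map();        // tag name to array of functions
    this.initialized = new Set();   // tags that have already been run
}

// Registers a sub under an arbitrary tag name.
PackfileViewSketch.prototype.addTagged = function (tag, sub) {
    if (!this.tagged.has(tag)) {
        this.tagged.set(tag, []);
    }
    this.tagged.get(tag).push(sub);
};

// Returns the subs registered under a tag, in registration order.
PackfileViewSketch.prototype.subsByTag = function (tag) {
    return this.tagged.get(tag) || [];
};

PackfileViewSketch.prototype.isInitialized = function (tag) {
    return this.initialized.has(tag);
};

PackfileViewSketch.prototype.markInitialized = function (tag) {
    this.initialized.add(tag);
};

// Runs every sub with the given tag once; later calls are no-ops.
// Returns the number of subs that were executed.
function runTagged(view, tag) {
    if (view.isInitialized(tag)) {
        return 0;
    }
    var subs = view.subsByTag(tag);
    subs.forEach(function (sub) { sub(); });
    view.markInitialized(tag);
    return subs.length;
}

var log = [];
var view = new PackfileViewSketch();
view.addTagged("load", function () { log.push("first"); });
view.addTagged("load", function () { log.push("second"); });

var ranFirstTime = runTagged(view, "load");    // 2
var ranSecondTime = runTagged(view, "load");   // 0
console.log(log, ranFirstTime, ranSecondTime);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Because the caller drives the loop, it is free to skip the initialized check and run a tag again, or to use tag names other than &lt;code&gt;load&lt;/code&gt; and &lt;code&gt;init&lt;/code&gt;.&lt;/p&gt;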

&lt;p&gt;Another point that I also feel like repeating is this: People shouldn&amp;#8217;t write PIR code by hand. Don&amp;#8217;t do it. You will write something friendly like Rakudo Perl 6, or NQP or Winxed, and those compilers will automatically generate the necessary PIR sequences for you. It will take some effort on my part to get the various compilers updated, but the end user shouldn&amp;#8217;t see any differences. So long as you aren&amp;#8217;t writing your code in PIR directly, you&amp;#8217;ll be fine.&lt;/p&gt;

&lt;p&gt;So what&amp;#8217;s involved in making this change? The packfile system, especially the packfile loader and the bits that are intertwined with IMCC, are some of the messier or at least most obscure parts of Parrot. This is one of the subsystems of Parrot that is the least abstracted and requires the most low-level bit twiddling. The GC is a comparable beast in this regard. Things like memory alignment, offsets, complicated nested &lt;code&gt;struct&lt;/code&gt; definitions and accesses, byte ordering, and a variety of other similar concepts are very common in this subsystem. Any changes must be made with extreme care.&lt;/p&gt;

&lt;p&gt;At the time of this writing I&amp;#8217;ve already made several major changes in my branch and have several more to go. I&amp;#8217;ve removed the &lt;code&gt;:load&lt;/code&gt; and &lt;code&gt;:init&lt;/code&gt; flag syntax from the IMCC lexer and parser (they&amp;#8217;ve been deprecated for a long time now and a replacement syntax has been available for some months). I&amp;#8217;ve removed the awful &lt;code&gt;do_sub_pragmas&lt;/code&gt; and &lt;code&gt;PackFile_fixup_subs&lt;/code&gt; functions and rearranged significant amounts of logic. I&amp;#8217;ve also taken a shotgun approach to converting &lt;code&gt;:load&lt;/code&gt; and &lt;code&gt;:init&lt;/code&gt; flags in the various PIR code libraries to &lt;code&gt;:tag(&amp;quot;load&amp;quot;)&lt;/code&gt; and &lt;code&gt;:tag(&amp;quot;init&amp;quot;)&lt;/code&gt; respectively. Some of these conversions were automated and a little bit overzealous, but that is to be expected at this stage of the game. Code for handling &lt;code&gt;:immediate&lt;/code&gt; and &lt;code&gt;:postcomp&lt;/code&gt; flags has been moved into IMCC where it belongs. libparrot now has no direct knowledge of either of those two things.&lt;/p&gt;

&lt;p&gt;Coming up are several more big changes before this branch is even remotely close to being mergeable:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;I need to clean up interp initialization logic that had been spread around in various places. The embedding API defines a central execution path for running bytecode programs, and the initialization logic needs to be moved higher up along that path.&lt;/li&gt;

&lt;li&gt;I need to set a reference to the owning PackfileView PMC in each loaded Sub PMC. Currently we anchor every single PackfileView PMC to prevent them from ever being collected by GC. By storing a reference to the Packfile in the Sub PMCs, we can mark reachable Subs, mark reachable Packfiles, and therefore allow old, unreferenced packfiles to be GC collected. This fixes an old and sizable memory leak (especially in programs that do a lot of dynamic compilation). If we do this in the right way in the right places, we can close the memory leak and reduce iteration over packfile constants looking for Subs and PMCs to mark and keep track of.&lt;/li&gt;

&lt;li&gt;I need to rewrite &lt;code&gt;Parrot_load_language&lt;/code&gt; and friends. This is the function that implements the &lt;code&gt;load_language&lt;/code&gt; opcode behavior. A &amp;#8220;language&amp;#8221; in this context is a library package which usually contains a compiler object, a runtime, and any dynpmc and dynop libraries required by those. This codepath shared much logic with &lt;code&gt;Parrot_load_bytecode&lt;/code&gt; (the guts behind the old &lt;code&gt;load_bytecode_s&lt;/code&gt; opcode variant), and therefore shares many problems and misbehaviors.&lt;/li&gt;

&lt;li&gt;I want to refactor the relationship between the &lt;code&gt;PackFile*&lt;/code&gt; structure and the &lt;code&gt;PackfileView&lt;/code&gt; PMC. If the PMC is the GCable wrapper around the structure, the whole thing is more easily managed by GC if we can guarantee that the PMC always exists if the struct does. Refactors here can further clean up code in the packfile loader and elsewhere.&lt;/li&gt;

&lt;li&gt;We need to rip out a lot of old, dead, crufty code that is no longer needed. This includes things like &lt;code&gt;Parrot_compile_file&lt;/code&gt; and &lt;code&gt;Parrot_compile_string&lt;/code&gt;, which are unnecessary internally because we have easily accessible and easily usable compiler objects at the PIR level. This also includes various helper routines for &lt;code&gt;Parrot_load_bytecode&lt;/code&gt;, &lt;code&gt;Parrot_load_language&lt;/code&gt; and &lt;code&gt;do_sub_pragmas&lt;/code&gt;, as well as helper routines for working with &lt;code&gt;:load&lt;/code&gt; and &lt;code&gt;:init&lt;/code&gt; flags.&lt;/li&gt;

&lt;li&gt;I need to update Winxed, NQP-rx, NQP, Rakudo and any other HLLs and libraries that still rely on the old behaviors.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;So there are plenty of steps left before we can talk merger, and I don&amp;#8217;t expect to get this work anywhere near master before 4.2. It&amp;#8217;s a lot of work but it&amp;#8217;s fun and will be rewarding when it&amp;#8217;s all done. I&amp;#8217;m looking forward to getting this all wrapped up in the coming weeks and months.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/afwknight/~4/9zKUTweXq24" height="1" width="1"/&gt;</content>
    <feedburner:origLink>http://whiteknight.github.com/2012/03/04/packfile_tags_cleanup.html</feedburner:origLink></entry>
    
    <entry>
        <title>Introspection, Disassembly and PACT (GSOC)</title>
        <link href="http://feedproxy.google.com/~r/afwknight/~3/qQYjnw9B6vM/gsoc_idea_pact.html" />
        <updated>2012-02-21T00:00:00-08:00</updated>
        <id>http://whiteknight.github.com/2012/02/21/gsoc_idea_pact</id>
        <content type="html">&lt;p&gt;In &lt;a href='/2012/02/15/gsoc_season_starting.html'&gt;my post a few days ago I mentioned Google Summer of Code 2012&lt;/a&gt; and gave a lightning list of simple project ideas that might be worth pursuing. Today I&amp;#8217;m going to expand on one of these ideas because it&amp;#8217;s fertile ground for many possible GSOC projects, including the possibility of several projects concurrently if we have multiple students interested in it.&lt;/p&gt;

&lt;p&gt;Parrot has a lot of introspection ability, but we don&amp;#8217;t really have the tools necessary to introspect bytecode. We need some kind of tool that, given a Sub PMC or a PackfileView PMC or similar will be able to provide a disassembled representation of the actual opcodes. Here&amp;#8217;s a basic code example of what I am talking about:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;function foo() { var x = 2; ... }

function bar() {
    var disassem = new Parrot.PACT.Disassembler();
    using foo;
    var raw = disassem(foo);
    var reg = raw.registers();          // Get register counts
    var lex = raw.lexicals();           // Get info about lexicals
    var constants = raw.constants();    // Referenced constants
    var ops = raw.opcodes();            // Symbolic Opcodes
    say(ops[0]);                        // &amp;quot;set_p_i, $P0, 2&amp;quot;, etc
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;These are just some random ideas and not all of them are necessary to implement. The most important part, in my mind, is getting a list of symbolic Opcode PMCs. Each Opcode PMC would have this general form:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;class Parrot.PACT.Opcode {
    var opname;         // The name or short name of the op
    var opnumber;       // The number of the opcode
    var oplib;          // The oplib which owns it
    var args;           // Array of Arguments
                        // Either Register or Constant
    ...
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;I prefix the disassembler classname with the namespace &amp;#8220;Parrot.PACT&amp;#8221; because eventually this should be an integral component of the PACT library. When we use PACT to assemble packfiles (and, ultimately, bytecode files) we&amp;#8217;ll be constructing a list of these Opcode PMCs and then using a serializer to write them down to raw bytecode.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;                  Serializer
Array of Opcodes ------------&amp;gt; Packfile Bytecode Segment

          Deserializer
Bytecode --------------&amp;gt; Array of Opcodes&lt;/code&gt;&lt;/pre&gt;
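
&lt;p&gt;The serializer/deserializer pair in the diagram above can be sketched as a toy round trip. This is a JavaScript model, not a PACT design: the opcode numbers in &lt;code&gt;OP_NAMES&lt;/code&gt; are made up, and the flat encoding (op number, argument count, then the arguments) is an assumption chosen to keep the example short. It is not Parrot&amp;#8217;s real bytecode layout.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Hypothetical op table: the array index is the opcode number.
var OP_NAMES = ["noop", "set_i_ic", "add_i_i_i"];

// Serializer: array of opcode objects to a flat array of integers.
function serialize(ops) {
    var words = [];
    ops.forEach(function (op) {
        words.push(op.opnumber, op.args.length);
        op.args.forEach(function (arg) { words.push(arg); });
    });
    return words;
}

// Deserializer: flat array of integers back to opcode objects.
function deserialize(words) {
    var ops = [];
    var pos = 0;
    while (pos !== words.length) {
        var opnumber = words[pos];
        var argc = words[pos + 1];
        var args = words.slice(pos + 2, pos + 2 + argc);
        ops.push({ opname: OP_NAMES[opnumber], opnumber: opnumber, args: args });
        pos += 2 + argc;
    }
    return ops;
}

var program = [
    { opname: "set_i_ic", opnumber: 1, args: [0, 2] },
    { opname: "add_i_i_i", opnumber: 2, args: [1, 0, 0] },
    { opname: "noop", opnumber: 0, args: [] }
];

var words = serialize(program);          // [1, 2, 0, 2, 2, 3, 1, 0, 0, 0, 0]
var roundTripped = deserialize(words);   // same structure as program
console.log(words, roundTripped);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;A round-trip check like this (serialize, deserialize, compare) is also a cheap regression test for either half of the pair.&lt;/p&gt;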

&lt;p&gt;An excellent proof of concept system would combine these two mechanisms together into a faithful round-trip assembly/disassembly mechanism. In fact, there are multiple little potential projects here that can be arranged and ordered/prioritized to create a summer-long project or many:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a tool to disassemble raw bytecode into Opcode PMCs, and create a disassembler program to interact with the user and print the disassembly listings to file/console.&lt;/li&gt;

&lt;li&gt;Create a tool for round-trip disassembly and assembly. Write the disassembly type, then write a tool that does the reverse operation (take a list of Opcodes and write a valid Packfile or bytecode segment).&lt;/li&gt;

&lt;li&gt;Create the tool to disassemble raw bytecode, then write a utility layer to construct a control flow graph from those Opcodes. This layer could be used in turn to create things like code complexity analyzers, or even simple decompilers (for the very ambitious student).&lt;/li&gt;

&lt;li&gt;Write a tool to take a stream of Opcode PMCs and other related data (tables of constant values, annotations, debugging symbols, etc) and write them into a valid and executable packfile. This would be the base layer of the PACT assembly engine, and would be used to help build compilers and other tools.&lt;/li&gt;

&lt;li&gt;Construct anything else PACT-related (AST and manipulators, CFG/DAG and friends, PIR-&amp;gt;Opcode assembler, etc). There is lots of fertile ground here for projects (and we have a lot of ideas and designs already put together for these things).&lt;/li&gt;
&lt;/ol&gt;
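
&lt;p&gt;As a rough illustration of the round-trip idea in the first two items, here is a toy sketch in Python. Everything in it is an assumption made for the example: the opcode table, the operand counts and the flat list of integer words are hypothetical and are not the real Parrot bytecode format or the PACT API.&lt;/p&gt;

```python
# Hypothetical opcode table: name -> (number, operand count).
OPS = {"noop": (0, 0), "set": (1, 2), "add": (2, 3), "say": (3, 1)}
BY_NUMBER = {num: (name, argc) for name, (num, argc) in OPS.items()}


def serialize(opcodes):
    """Flatten [(name, [operands])] into a list of integer words."""
    words = []
    for name, operands in opcodes:
        num, argc = OPS[name]
        if len(operands) != argc:
            raise ValueError(f"{name} takes {argc} operands")
        words.append(num)
        words.extend(operands)
    return words


def deserialize(words):
    """Rebuild the [(name, [operands])] list from the flat word list."""
    opcodes, i = [], 0
    while i != len(words):
        name, argc = BY_NUMBER[words[i]]
        opcodes.append((name, list(words[i + 1:i + 1 + argc])))
        i += 1 + argc
    return opcodes


program = [("set", [0, 5]), ("add", [1, 0, 0]), ("say", [1])]
# A faithful round trip gives back exactly what went in.
assert deserialize(serialize(program)) == program
```

&lt;p&gt;The useful property to test in a real implementation would be the same one the last line checks: disassembling and reassembling should reproduce the input exactly.&lt;/p&gt;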

&lt;p&gt;There are lots of ideas here and I&amp;#8217;ve still only scratched the surface. My goal with this post is to show how fertile this ground is, how much available work there is to be had and how many new features we desperately need.&lt;/p&gt;

&lt;p&gt;Here&amp;#8217;s a basic flow graph of things I&amp;#8217;m envisioning as eventual parts of PACT or its close cousins. This will show the kinds of components that PACT may either eventually contain or serve as the common substrate for:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;                                      PIR and PASM Code
                                              |
            Optimizers         Analyzers &amp;lt;-+  |  +-&amp;gt; Debugger/Live
            ^ |    ^ |           ^         |  |  |            Interpreter
            | V    | V           |         |  V  |
HLL code -&amp;gt; AST -&amp;gt; Control Flow Graph -&amp;gt; Opcode stream -&amp;gt; Packfile
             ^       |           ^         |  ^  ^           |
             |       |           |         |  |  |           |
            HLL &amp;lt;----+           +---------+  |  +-----------+
       (Decompiled)                           |              |
                                              |              |
                                             PIR Code &amp;lt;------+
                                          (Disassembly)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;One day when I have more time I may try to put this together into a real image of some variety. ASCII graphics were good enough for our digital ancestors and they will suffice here for a first draft. As you can see this graph contains several components, any one of which or any small subsection might make for an interesting and extremely rewarding project over the summer. This also ignores the inherent complexity and layered architecture possible in things like the AST transformations and optimizations, register allocation, etc. My point is that even the blocks on the graph above can be further decomposed into a variety of smaller but still interesting projects. If any of this stuff looks interesting to you, please get in touch ASAP so we can start talking and planning. Obviously this is more work than one person will do in one summer, so we want to make sure we are coordinating between all interested parties.&lt;/p&gt;

&lt;p&gt;I think that if we start on the right side of this chart and implement the routines for reading from and writing to packfiles first, we can start building layers of additional functionality on top of them. This gives us an ability to break such a big system up into manageable parts, to complete some of those parts in small summer-sized chunks, and to be able to use intermediate implementations to solve real problems while we wait for the rest of the system to grow and mature.&lt;/p&gt;

&lt;p&gt;If we had multiple students interested in working on PACT in one capacity or another it would be an awesome way to maximize developer resources and help push forward the idea of code reusability. I&amp;#8217;m really excited about this whole area and would love it if some students were interested in it too.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/afwknight/~4/qQYjnw9B6vM" height="1" width="1"/&gt;</content>
    <feedburner:origLink>http://whiteknight.github.com/2012/02/21/gsoc_idea_pact.html</feedburner:origLink></entry>
    
    <entry>
        <title>Compile Winxed With Winxed</title>
        <link href="http://feedproxy.google.com/~r/afwknight/~3/kzPPaZ2PpCc/pure_winxed_compiler.html" />
        <updated>2012-02-18T00:00:00-08:00</updated>
        <id>http://whiteknight.github.com/2012/02/18/pure_winxed_compiler</id>
        <content type="html">&lt;p&gt;You can compile Winxed code with Winxed itself! What&amp;#8217;s that you say? The Winxed compiler is bootstrapped and self-hosted, and is written in Winxed and already compiles winxed? Well, that&amp;#8217;s all true. Sort of. However there is one small caveat: The Winxed driver program historically has not been able to perform the last step of compilation. The driver compiles winxed code down to PIR, but then uses the &lt;code&gt;spawnw&lt;/code&gt; opcode to invoke an instance of Parrot to compile the PIR down to PBC.&lt;/p&gt;

&lt;p&gt;I&amp;#8217;m pleased to say that this last step is no longer necessary. At least, not in Winxed master (which has not yet been snapshotted into Parrot core).&lt;/p&gt;

&lt;p&gt;Here&amp;#8217;s a small toy compiler driver that uses Parrot&amp;#8217;s PackfileView PMC to compile a &lt;code&gt;.winxed&lt;/code&gt; file down into &lt;code&gt;.pbc&lt;/code&gt; without spawning any child processes:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;function get_winxed_compiler(string pbc_name = &amp;quot;winxedst3.pbc&amp;quot;)
{
    var wx_pbc = load_packfile(pbc_name);
    for (var load_sub in wx_pbc.subs_by_tag(&amp;quot;load&amp;quot;))
        load_sub();
    return compreg(&amp;quot;winxed&amp;quot;);
}

function main[main](var args)
{
    var wx_compreg = get_winxed_compiler();
    string winxedcc_name = args.shift();
    string infile_name = args.shift();
    string outfile_name = args.shift();

    string code = (new &amp;#39;FileHandle&amp;#39;).readall(infile_name);
    var pf = wx_compreg.compile(code);
    pf.write_to_file(outfile_name);
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;That&amp;#8217;s fewer than 20 lines of Winxed code to get the Winxed compiler object loaded, to compile the code and to output the PBC to file. We can make this better, of course, by being more flexible in the handling of arguments and printing out basic help and error messages and all that stuff. Eventually we are going to update the winxed executable itself to use this trick instead of spawning the child process. This should, I hope, have a noticeably beneficial effect on compiling with Winxed from the command line. For large, long builds like Rosella has, any speed improvements are appreciated.&lt;/p&gt;

&lt;p&gt;One particularly interesting tidbit to notice is the very first line: A new syntax for handling optional parameters. I put a patch for that feature together last week and NotFound decided he could do the same thing but better than I did. So, the latest Winxed compiler (again, not yet snapshotted into Parrot Core) supports this syntax for providing default values to optional arguments. I hope that this feature is included with the 4.1 release next week. Here are some examples of the new feature in action:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// This...
function foo(var bar [optional], int has_bar [opt_flag])
{
    if (!has_bar)
        bar = default_bar_value();
    ...
}

// ...is the same as this
function foo(var bar = default_bar_value())
{
    ...
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The initializer can be any expression value, including expressions involving previous arguments:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;function foo (var bar, var baz = bar.some_method(bar))&lt;/code&gt;&lt;/pre&gt;
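
&lt;p&gt;For readers who do not know Winxed, here is a rough Python analogy of the same desugaring. It is only a sketch and says nothing about how the Winxed compiler implements the feature. Python evaluates default expressions once, at definition time, so the sketch uses a sentinel to get per-call evaluation. The sentinel name, the placeholder default and the use of &lt;code&gt;upper&lt;/code&gt; are all assumptions made for the example.&lt;/p&gt;

```python
_MISSING = object()  # hypothetical sentinel, plays the role of the opt_flag


def default_bar_value():
    # Placeholder default; the post does not say what this returns.
    return "default"


def foo(bar=_MISSING):
    # Equivalent of: if (!has_bar) bar = default_bar_value();
    if bar is _MISSING:
        bar = default_bar_value()
    return bar


def foo2(bar, baz=_MISSING):
    # A default that depends on an earlier argument, evaluated on each call.
    if baz is _MISSING:
        baz = bar.upper()
    return bar, baz
```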

&lt;p&gt;This new syntax probably has a few kinks to work out still, but it&amp;#8217;s a very cool and very appreciated addition. I&amp;#8217;m hoping to use this new syntax to clean up a lot of code in Rosella and hopefully in Jaesop too.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/afwknight/~4/kzPPaZ2PpCc" height="1" width="1"/&gt;</content>
    <feedburner:origLink>http://whiteknight.github.com/2012/02/18/pure_winxed_compiler.html</feedburner:origLink></entry>
    
    <entry>
        <title>Compiling Templates</title>
        <link href="http://feedproxy.google.com/~r/afwknight/~3/cvI9r-8Stds/rosella_template_compilation.html" />
        <updated>2012-02-16T00:00:00-08:00</updated>
        <id>http://whiteknight.github.com/2012/02/16/rosella_template_compilation</id>
        <content type="html">&lt;p&gt;I had a precious few hours to myself yesterday and was able to do some updating work on Rosella&amp;#8217;s Template library. I was able to use the time to implement a feature I had wanted for a while: template compilation. You can now compile a template into winxed code or even compile it all the way down to a Packfile. Actually, that&amp;#8217;s sort of a lie. The Winxed compiler &lt;code&gt;.compile()&lt;/code&gt; method returns an Eval PMC not a PackfileView PMC. I&amp;#8217;m going to submit a patch for that soon, and when I do you&amp;#8217;ll be able to save the template to a .pbc file.&lt;/p&gt;

&lt;p&gt;Here&amp;#8217;s how to use the Template library in the basic, interpreted way:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;var engine = new Rosella.Template.Engine();
string output = engine.generate(template, context);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The template is a string with a format that I&amp;#8217;ve demonstrated before, and the &lt;code&gt;context&lt;/code&gt; is any user-defined data structure that you want to use to populate the variables in the template. I won&amp;#8217;t go into detail about those things in this post. Now, after my recent changes and additions, you can compile your template to an executable Sub:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;var engine = new Rosella.Template.Engine();
var sub = engine.compile(template);
string output = sub(context);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Or, if you really want to see the generated winxed code, you can get that:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;string wx_code = engine.compile_to_winxed(template);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The compilation process does take some time, there&amp;#8217;s no way to deny that. There are ways to mitigate that expense, of course. You can compile ahead of time and save the code to a file or even a .pbc and execute that later. There are several strategies if you&amp;#8217;re really interested, I won&amp;#8217;t go into too much detail here. Once the code is compiled, which can and should be done ahead of time, the time savings during execution are significant. Here&amp;#8217;s some benchmarking I&amp;#8217;ve done to time a relatively simple template with a ten thousand iteration &lt;code&gt;&amp;lt;$ repeat $&amp;gt;&lt;/code&gt; loop:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Interpreted:
0.969569s - %100.000000
0.796563s - %82.156419 (-%17.843581 compared to base)
0.900937s - %92.921402 (-%7.078598 compared to base)

Compiled:
0.365500s - %37.697161 (-%62.302839 compared to base)
0.498571s - %51.421914 (-%48.578086 compared to base)
0.367616s - %37.915399 (-%62.084601 compared to base)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In some cases, pre-compiling the template can take a third as much time as interpreting the template directly. Almost every timing I&amp;#8217;ve seen is around 50% or better of the interpreted time.&lt;/p&gt;
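
&lt;p&gt;The percentages in that listing are each timing divided by the first interpreted run. A small Python sketch of the arithmetic follows; the function name is made up for the example and the output format only approximates the listing.&lt;/p&gt;

```python
def relative_percent(sample, base):
    """Return (percent of base, signed change versus base)."""
    percent = sample / base * 100.0
    return percent, percent - 100.0


base = 0.969569  # first interpreted run, taken from the listing
compiled = [0.365500, 0.498571, 0.367616]  # compiled runs from the listing

for t in compiled:
    pct, delta = relative_percent(t, base)
    print(f"{t:.6f}s - %{pct:.6f} ({delta:+.6f} compared to base)")
```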

&lt;p&gt;This is a very new feature and I haven&amp;#8217;t added it to the test suite yet. Expect rough edges as I play with it and optimize it. If you&amp;#8217;re doing a lot of templating with Rosella, this feature could help save some time for you and plus it was a really fun thing to work on.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/afwknight/~4/cvI9r-8Stds" height="1" width="1"/&gt;</content>
    <feedburner:origLink>http://whiteknight.github.com/2012/02/16/rosella_template_compilation.html</feedburner:origLink></entry>
    
    <entry>
        <title>GSOC 2012 Is Starting</title>
        <link href="http://feedproxy.google.com/~r/afwknight/~3/oAO5JU3LCdk/gsoc_season_starting.html" />
        <updated>2012-02-15T00:00:00-08:00</updated>
        <id>http://whiteknight.github.com/2012/02/15/gsoc_season_starting</id>
        <content type="html">&lt;p&gt;You might not see it, but I&amp;#8217;m starting to get very excited. Discussions about the Google Summer of Code program is starting up for the 2012 summer. Projects in years past have lead to some awesome developments in Parrot, either directly or indirectly, and 2012 could easily deliver more.&lt;/p&gt;

&lt;p&gt;For prospective students, here are some blog posts I&amp;#8217;ve written in years past about the process, and what you need to do to get started:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href='http://whiteknight.github.com/2011/04/11/gsoc_proposals.html'&gt;GSOC Proposals&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href='http://whiteknight.github.com/2011/03/27/gsoc_students_next_steps.html'&gt;Next Steps&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you&amp;#8217;re interested in participating in GSoC this year, with Parrot or any other organization, I suggest you read those two posts. They&amp;#8217;re important. Many students in school or freshly graduated may know how to code but might not know much about some of the tangential topics: source control (git, in the case of Parrot), documentation, unit testing, refactoring, etc. Now would be a great time to start brushing up on all those topics, so you don&amp;#8217;t waste time during the summer getting your tools in place.&lt;/p&gt;

&lt;p&gt;As usual I will try to post some ideas for projects here on this blog. If you are an eligible student and you are interested in one or more of these ideas please get in touch with me or other interested Parrot developers. If you have other ideas that I don&amp;#8217;t mention, that&amp;#8217;s cool too. Get in touch anyway and we can start talking about those ideas. &lt;strong&gt;The important thing is to talk to us&lt;/strong&gt;. Seriously, it&amp;#8217;s important.&lt;/p&gt;

&lt;p&gt;Here are a few ideas off the top of my head that might be worth some more investigation. I might write additional posts about these ideas if people want more information about them:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Opcode disassembler&lt;/strong&gt;. We have a growing set of tools for working with bytecode from running Parrot code, but we don&amp;#8217;t have all the pieces that we would need to make a fully self-hosted disassembler. What we still need are tools to read out the raw bytecode and convert it into a sequence of Opcode PMC representations, then a driver program to turn that output into readable (and hopefully round-trip compilable) disassembly output.&lt;/li&gt;

&lt;li&gt;&lt;strong&gt;PACT&lt;/strong&gt; An alternative to Parrot&amp;#8217;s venerable PCT library, called PACT, has been in the planning stages for many months. What we need is somebody with the time and motivation to start putting those ideas into practice. A toolkit library that works with syntax trees and outputs working bytecode libraries would be quite an awesome thing to have, and would help us push the state-of-the-art for Parrot compilers up a few notches. Intended to be a layered system, the successful student could implement only a few of the necessary layers.&lt;/li&gt;

&lt;li&gt;&lt;strong&gt;Jaesop Stage 1&lt;/strong&gt; Jaesop is a bootstrapped JavaScript compiler. Right now there is a stage 0 compiler which uses node.js to compile JS into Winxed. We need a stage 1 compiler, written in JavaScript, that can run on stage 0 and compile itself. It&amp;#8217;s an interesting, mind-bendy kind of project. If you know compilers and you like JavaScript, this might be the project for you.&lt;/li&gt;

&lt;li&gt;&lt;strong&gt;Anything Python-Related&lt;/strong&gt; Parrot wants and needs more Python love. We&amp;#8217;ve had a few attempts at making a working Python compiler before on Parrot. Working on any of those, starting a new attempt, or working on other ways to integrate Python and Parrot would all be greeted with some eagerness.&lt;/li&gt;

&lt;li&gt;&lt;strong&gt;New Object Model&lt;/strong&gt; Parrot needs a new object model. Right now we have 6model waiting in the environs, but we haven&amp;#8217;t integrated it yet. There is some question about whether we want to copy+paste merge 6model as it is, or if we want to try and make a more custom adaptation of it from the ground up. Proposals to do either, or anything closely related, would be very interesting.&lt;/li&gt;

&lt;li&gt;&lt;strong&gt;LAPACK Bindings for PLA&lt;/strong&gt; I&amp;#8217;ve wanted PLA to get LAPACK bindings for a long time now, and I&amp;#8217;ve never had the time to do it myself. I&amp;#8217;ve never really even had the time to design what such a thing would look like. I suspect we can do the lion&amp;#8217;s share through raw NCI. Tools to solve matrices for eigenvalues and eigenvectors and common transformations and decompositions would be very interesting indeed.&lt;/li&gt;

&lt;li&gt;&lt;strong&gt;New PackFile Loader&lt;/strong&gt; I&amp;#8217;ve complained about the unnecessarily magic behavior of our packfile loader before. A new, re-written packfile loader would not automatically assemble Namespace or MultiSub PMCs at load time. Down this path lies the possibility for significant performance improvements and complexity reductions in some of our oldest and least-friendly code. It would require serious changes to IMCC, PIR, and higher-level compilers like Winxed. The new NQP already does the right kind of thing and would serve as a great example to follow.&lt;/li&gt;

&lt;li&gt;&lt;strong&gt;Anything Thread or Asynchrony Related&lt;/strong&gt; Parrot has Green Threads on Unixy platforms. Extending proper support to Windows and elsewhere would be awesome (again, something I want to do but have not had time or a Windows machine for testing). Asynchronous IO using new threading primitives and anything else that you could think to build with the new threading system would be awesome. Adding in types that would help the effort such as lock-free arrays and hashes would be nice assets. Adding in support for concurrency primitives like locks, mutexes and critical sections would also be cool (and implementing those in terms of green thread primitives would be even cooler).&lt;/li&gt;

&lt;li&gt;&lt;strong&gt;Divided VTABLE&lt;/strong&gt; Parrot&amp;#8217;s VTABLE is a huge monolithic structure, and there have been many suggestions recently that we break it down into smaller chunks based on roles. The &amp;#8220;Number&amp;#8221; role would contain arithmetic vtables, while the &amp;#8220;invokable&amp;#8221; role would have VTABLE_invoke and friends. GCable role, Array role, Hash role, Metaobject role, etc. These are all things that we could use and would decrease overall memory footprint and increase flexibility in the system. Bonus points if this work was integrated closely with 6model and other proposed changes in our PMC subsystem.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are just a few of the ideas I have on the top of my head this morning. Some, I&amp;#8217;m sure, are too big. Others are too small. But in each is the kernel of a good idea and if anybody reading this is interested we should start the conversation now to get these vague ideas focused into compelling proposals.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/afwknight/~4/oAO5JU3LCdk" height="1" width="1"/&gt;</content>
    <feedburner:origLink>http://whiteknight.github.com/2012/02/15/gsoc_season_starting.html</feedburner:origLink></entry>
    
    <entry>
        <title>Rosella Reflect</title>
        <link href="http://feedproxy.google.com/~r/afwknight/~3/H0rkW-i5TQc/rosella_reflect.html" />
        <updated>2012-02-01T00:00:00-08:00</updated>
        <id>http://whiteknight.github.com/2012/02/01/rosella_reflect</id>
        <content type="html">&lt;p&gt;Earlier this month I released the new Reflect library in Rosella. I hadn&amp;#8217;t mentioned it before, but the library is sufficiently interesting that I want to talk about it at least a little bit. The Reflect library adds in tools for reflection. Somewhere, an etymologist weeps a tear of joy for the creative naming, I&amp;#8217;m sure.&lt;/p&gt;

&lt;p&gt;The Reflect library adds in wrappers for classes and packfiles that make them easier to work with for many operations. First, I&amp;#8217;d like to use a couple code examples to show the most basic API:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Get the Sub PMC that we&amp;#39;re currently executing
var s = Rosella.Reflect.get_current_sub();

// Get the current context
var c = Rosella.Reflect.get_current_context();

// Get the current object, if the current Sub is a method call
var obj = Rosella.Reflect.get_current_object();

// Get the class of the current object, if the current Sub is a method call
var c = Rosella.Reflect.get_current_class();

// Get a Module object for the packfile where the current Sub is defined
var m = Rosella.Reflect.get_current_module();

// Get a reflection wrapper object for the given Parrot Class PMC
var r = Rosella.Reflect.get_class_reflector(myClass);

// Get a Module object for the packfile in &amp;quot;foo/bar.pbc&amp;quot;, loading it as
// necessary
var m = Rosella.Reflect.Module.load(&amp;quot;foo/bar.pbc&amp;quot;);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;That&amp;#8217;s the basic API that the library provides to get basic information about where execution is happening at the moment when the call is made. Once you have a Module object or a Class reflector object, you can do all sorts of cool things that used to be a pain in the butt to do manually:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;var m = Rosella.Reflect.get_current_module();
say(m);          // Stringified, produces the name and version of the packfile
m.load();        // Execute all :tag(&amp;quot;load&amp;quot;) and :load functions
m.init();        // Execute all :tag(&amp;quot;init&amp;quot;) and :init functions
say(m.version()); // Get the version string of the packfile &amp;quot;X.Y.Z&amp;quot;
say(m.path());   // The on-disk path to the current packfile

// Get a hash of all Class PMCs defined at compile-time in the packfile
// (using the :method flag on Subs), keyed by name
var c = m.classes();

// Get a list of all non-:anon functions defined in the packfile
var f = m.functions();

// Get all non-:anon functions in the packfile, organized into a hash
// keyed by namespace
var f = m.functions_by_ns();

// Get a hash of all NameSpace PMCs defined at compile-time
var ns = m.namespaces();&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Once you have Class and NameSpace PMCs from the packfile, you can start to do all sorts of cool operations and analyses on them. Once you have a Class reflector object, you can do even more stuff with that:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;var c = Rosella.Reflect.get_current_class();

// Create a new object of the current type
var o = c.new();

// Say the name of the class
say(c.name());

// Attributes are encapsulated as objects. You can get an Attribute
// reflector and use it later to get and set values on objects of this
// type or subclasses
var attr = c.get_attr(&amp;quot;foo&amp;quot;);
var value = attr.get_value(o);
attr.set_value(o, &amp;quot;whatever&amp;quot;);

// Methods are also encapsulated. You can get a method reflector now and
// invoke it on objects later (including objects of different types)
var method = c.get_method(&amp;quot;bar&amp;quot;);
var result = method.invoke(o);
var meths = c.get_all_methods();

// Basic capability detection. Determine if objects are members of the
// class or its subclasses, and determine if the class can perform certain
// methods
if (c.isa(o)) { ... }
if (c.can(&amp;quot;bar&amp;quot;)) { ... }&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;I hope the code examples make up for the terse explanations.&lt;/p&gt;

&lt;p&gt;The Reflect library is currently focused on reading data from things like Classes and Packfiles, not on creating these things like the new PACT project is supposed to do. I want to extend this library even further with abilities to further introspect functions down to the opcode level and then&amp;#8230; Well, when we have a stream of opcodes to analyze the possibilities are endless. I&amp;#8217;d also like the ability to get better introspection of the interpreter and global state, through a cleaner interface than the hodge-podge of &lt;code&gt;interpinfo&lt;/code&gt; opcodes and ParrotInterpreter PMC methods and whatever else we currently use.&lt;/p&gt;

&lt;p&gt;As always, using the interface Rosella provides will help to insulate you from changes to the various underlying mechanisms when we finally get around to cleaning them up and making them sane. There isn&amp;#8217;t a huge push to make such cleanups on a large scale yet, but I wouldn&amp;#8217;t be surprised if a few things started getting prettified in the coming months at a slow pace.&lt;/p&gt;

&lt;p&gt;I&amp;#8217;ve already started using the new library in several of the Rosella utility programs such as those that create a winxed header file or a test suite from an existing packfile. In all cases the updated programs are both cleaner and have more functionality than the previous incarnations. Expect to see this library improve and grow in 2012 and beyond, and expect to see it work closely with PACT, once that project gets moving forward.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/afwknight/~4/H0rkW-i5TQc" height="1" width="1"/&gt;</content>
    <feedburner:origLink>http://whiteknight.github.com/2012/02/01/rosella_reflect.html</feedburner:origLink></entry>
    
    <entry>
        <title>Rosella Test Updates and Upgrades</title>
        <link href="http://feedproxy.google.com/~r/afwknight/~3/zuAtihOgeGE/rosella_test_cleanups.html" />
        <updated>2012-01-30T00:00:00-08:00</updated>
        <id>http://whiteknight.github.com/2012/01/30/rosella_test_cleanups</id>
        <content type="html">&lt;p&gt;The first half of this month was dominated with some epic illnesses between my family members and myself, family functions and home maintenance. What little spare time I&amp;#8217;ve had otherwise has been devoted to writing code, as opposed to writing blog posts about writing code. The blog has suffered.&lt;/p&gt;

&lt;p&gt;The past couple days I&amp;#8217;ve been working on Rosella&amp;#8217;s Test library. It&amp;#8217;s an old but good library and is, as far as I am aware, the most full-featured and easy to use testing tool in the Parrot ecosystem. With some of these most recent changes the library is better still.&lt;/p&gt;

&lt;h3 id='matchers'&gt;Matchers&lt;/h3&gt;

&lt;p&gt;Kakapo had a series of Matcher routines and objects as part of its testing facilities, and for a long time I&amp;#8217;ve been wanting to port some of those ideas over to Rosella. As of last week, I have a simple version of them. Matchers allow you to ask the question &amp;#8220;is this thing like that thing&amp;#8221;, with a custom set of rules. Let me give a basic example.&lt;/p&gt;

&lt;p&gt;Previously in Rosella if you were unit testing a method which returned an array and you wanted to check that the array contained the right values, you would have to do something like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;var result = obj.my_method();
var expected = [1, 2, 3, 4, 5];
self.assert.is_true(result != null);
self.assert.equal(elements(result), elements(expected));
for (int i = 0; i &amp;lt; elements(expected); i++) {
    self.assert.equal(result[i], expected[i]);
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;That&amp;#8217;s a lot of work, although you can cut it down a little bit if you know for certain that the array isn&amp;#8217;t null. With the new matcher functionality, you pass in two arrays and the Test library will match them for you:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;var result = obj.my_method();
self.assert.is_match(result, [1, 2, 3, 4, 5]);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Internally the Test library maintains a collection of matchers, keyed by name. When you pass in two objects, it loops over the matchers looking for one that can handle the pair. In this case, one of the default matchers the library provides looks for objects which implement the &lt;code&gt;&amp;quot;array&amp;quot;&lt;/code&gt; role, and then does element-wise matching on them. Another similar matcher does the same for hash-like objects that implement the &lt;code&gt;&amp;quot;hash&amp;quot;&lt;/code&gt; role.&lt;/p&gt;

&lt;p&gt;Another matcher checks to see if one or both of the two objects are strings, and then does a string comparison on them (converting the other, if it isn&amp;#8217;t a string already) and the last of the default matchers is used to compare floating point values with a certain error tolerance.&lt;/p&gt;
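
&lt;p&gt;To make the lookup concrete, here is a minimal Python sketch of a matcher registry along those lines. It is not the Rosella implementation: the registry layout, the matcher names, the tolerance values and the fallback to plain equality are all assumptions made for the example.&lt;/p&gt;

```python
import math
from collections.abc import Mapping, Sequence


def _is_array(x):
    # Strings are sequences in Python, but we want them handled as strings.
    return isinstance(x, Sequence) and not isinstance(x, str)


def match_arrays(a, b):
    # Element-wise matching, recursing through is_match for each pair.
    return len(a) == len(b) and all(is_match(x, y) for x, y in zip(a, b))


def match_hashes(a, b):
    # Same keys on both sides, and matching values under each key.
    return a.keys() == b.keys() and all(is_match(a[k], b[k]) for k in a)


# Hypothetical registry: name -> (can_handle, do_match).
MATCHERS = {
    "array": (lambda a, b: _is_array(a) and _is_array(b), match_arrays),
    "hash": (lambda a, b: isinstance(a, Mapping) and isinstance(b, Mapping),
             match_hashes),
    "string": (lambda a, b: isinstance(a, str) or isinstance(b, str),
               lambda a, b: str(a) == str(b)),
    "float": (lambda a, b: isinstance(a, float) and isinstance(b, float),
              lambda a, b: math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-9)),
}


def is_match(a, b):
    # Use the first matcher that claims the pair; otherwise plain equality.
    for can_handle, do_match in MATCHERS.values():
        if can_handle(a, b):
            return do_match(a, b)
    return a == b
```

&lt;p&gt;Because the registry in this sketch is an ordinary dictionary, adding, replacing or deleting a matcher by name is a one-line operation.&lt;/p&gt;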

&lt;p&gt;Since matchers are stored in a hash, you can access them by name, delete them, add your own, and replace existing ones if you want new matching semantics. This is especially useful in something like Parrot-Linear-Algebra, where I can say&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$!assert.is_match($matrix_a, $matrix_b);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&amp;#8230;and the library will automatically compare the dimensions of the matrices and the contents of them without needing nested loops and other distractions.&lt;/p&gt;

&lt;h3 id='nested_tap'&gt;Nested TAP&lt;/h3&gt;

&lt;p&gt;Another item I&amp;#8217;ve had on my wishlist for a while now has been nested TAP. I&amp;#8217;ve always wanted to support it, and in theory at least the system was designed modularly enough to generate it without too much hassle. Last weekend I put on the finishing touches and now am proud to say that Rosella.Test can run nested tests and generate nested TAP. At the moment the interface to use it is a little ugly (I&amp;#8217;m actively soliciting feedback!), but the capabilities are all there:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;function my_test_method()
{
    self.status.suite().subtest(class MySubtestClass);
}

function my_vector_test_method()
{
    self.status.suite().subtest_vector(
        function(var a, var b) { ... },
        [1, 2, 3, 4, 5]
    );
}

function my_list_test_method()
{
    self.status.suite().subtest_list(
        function(var test) { ... },
        function(var test) { ... },
        function(var test) { ... }
    );
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Here&amp;#8217;s an example of output from a similar test file in the Rosella suite:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;1..4
    1..2
    ok 1 - test_1A
    ok 2 - test_1B
    # You passed all 2 subtests
ok 1 - test_1
    1..3
    ok 1 - test 1
    ok 2 - test 2
    ok 3 - test 3
    # You passed all 3 subtests
ok 2 - test_2
    1..5
    ok 1 - test 1
    ok 2 - test 2
    ok 3 - test 3
    ok 4 - test 4
    ok 5 - test 5
    # You passed all 5 subtests
ok 3 - test_3
    1..1
    not ok 1 - test 1
    # failure
    # Called from &amp;#39;fail&amp;#39; (rosella/test.winxed : 481)
    # Called from &amp;#39;&amp;#39; (t/winxed_test/Nested.t : 40)
    # Called from &amp;#39;&amp;#39; (rosella/test.winxed : 1589)
    # Looks like you failed 1 of 1 subtests run
ok 4 - test_4
# You passed all 4 tests&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The fourth test expects a failure in the subtest, which is why it says it passes when there are clearly some failure diagnostics appearing. This brings me to my next point&amp;#8230;&lt;/p&gt;

&lt;h3 id='cleaner_diagnostics'&gt;Cleaner Diagnostics&lt;/h3&gt;

&lt;p&gt;Before when you ran a test and had a failure, you might see something like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;not ok 2 - ooopsie_doopsies
# objects not equal &amp;#39;0&amp;#39; != &amp;#39;1&amp;#39;
# Called from &amp;#39;throw&amp;#39; (rosella/test.winxed : 851)
# Called from &amp;#39;internal_fail&amp;#39; (rosella/test.winxed : 1853)
# Called from &amp;#39;fail&amp;#39; (rosella/test.winxed : 481)
# Called from &amp;#39;equal&amp;#39; (rosella/test.winxed : 577)
# Called from &amp;#39;ooopsie_doopsies&amp;#39; (t/core/Error.t : 18)
# Called from &amp;#39;execute_test&amp;#39; (rosella/test.winxed : 1455)
# Called from &amp;#39;__run_test&amp;#39; (rosella/test.winxed : 1483)
# Called from &amp;#39;run&amp;#39; (rosella/test.winxed : 1392)
# Called from &amp;#39;test&amp;#39; (rosella/test.winxed : 1747)
# Called from &amp;#39;_block1000&amp;#39; (t/core/Error.t : 7)
# Called from &amp;#39;_block1177&amp;#39; ( : 158)
# Called from &amp;#39;eval&amp;#39; ( : 151)
# Called from &amp;#39;evalfiles&amp;#39; ( : 0)
# Called from &amp;#39;command_line&amp;#39; ( : 0)
# Called from &amp;#39;main&amp;#39; ( : 1)
# Called from &amp;#39;(entry)&amp;#39; ( : 0)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;That&amp;#8217;s a huge mess, and it&amp;#8217;s a mess from two sides. At the top of the backtrace, you see all sorts of Rosella internal functions involved in the assertion and error handling. The bottom half of the backtrace is devoted to entry-way stuff. In this case there&amp;#8217;s NQP-related entry code and then the Rosella entry code. You, as the test writer, don&amp;#8217;t care about any of that. All you care about is the code you wrote and where it&amp;#8217;s broken. If you have to dig through a huge backtrace to figure out where the error is, that&amp;#8217;s a big waste of time and effort.&lt;/p&gt;

&lt;p&gt;Now, Rosella filters that crap out for you. Here&amp;#8217;s the same exact failure with the new backtrace reporting:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;not ok 2 - ooopsie_doopsies
# objects not equal &amp;#39;0&amp;#39; != &amp;#39;1&amp;#39;
# Called from &amp;#39;equal&amp;#39; (rosella/test.winxed : 577)
# Called from &amp;#39;ooopsie_doopsies&amp;#39; (t/core/Error.t : 18)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Here you see the important parts of the backtrace only: The parts you wrote and the one assertion that failed. You don&amp;#8217;t see the internal garbage, you don&amp;#8217;t see the entry-way garbage, because those things aren&amp;#8217;t of interest to the test writer.&lt;/p&gt;
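
&lt;p&gt;To make the idea concrete, here is a simplified Winxed sketch of that kind of trimming. It is only an illustration of the idea, not the code in Rosella: the function names &lt;code&gt;filter_backtrace&lt;/code&gt; and &lt;code&gt;is_internal&lt;/code&gt;, the frame layout (a hash with a &lt;code&gt;sub&lt;/code&gt; key), and the approach of passing in a list of internal sub names plus the name of the harness entry sub are all assumptions made for the example.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Hypothetical sketch, not the Rosella implementation.
// Returns 1 if sub_name appears in the list of internal sub names.
function is_internal(var internal_subs, string sub_name)
{
    for (string name in internal_subs) {
        if (name == sub_name)
            return 1;
    }
    return 0;
}

// frames: backtrace frames ordered from innermost to outermost, each
//         assumed to be a hash with a "sub" key.
// internal_subs: names of error-handling subs to drop from the top.
// entry_sub: name of the harness sub that invokes the test method.
function filter_backtrace(var frames, var internal_subs, string entry_sub)
{
    var kept = [];
    int past_internals = 0;
    for (var frame in frames) {
        string name = frame["sub"];

        // Everything from the harness entry point down is entry-way
        // code, so stop collecting here.
        if (name == entry_sub)
            break;

        // Skip the leading run of internal error-handling frames.
        if (!past_internals) {
            if (is_internal(internal_subs, name))
                continue;
            past_internals = 1;
        }
        push(kept, frame);
    }
    return kept;
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Given the frames from the long backtrace above, with &lt;code&gt;throw&lt;/code&gt;, &lt;code&gt;internal_fail&lt;/code&gt; and &lt;code&gt;fail&lt;/code&gt; as the internal subs and &lt;code&gt;execute_test&lt;/code&gt; as the entry sub, this sketch should keep only the &lt;code&gt;equal&lt;/code&gt; and &lt;code&gt;ooopsie_doopsies&lt;/code&gt; frames.&lt;/p&gt;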

&lt;h3 id='parrotlinearalgebra'&gt;Parrot-Linear-Algebra&lt;/h3&gt;

&lt;p&gt;Another small project I did a few days ago was getting the PLA test suite working again. That this was only a small project is a testament to how stable both BLAS and Parrot&amp;#8217;s extending interfaces are. Recent Rosella refactors removed some of the special-purpose features that existed only for PLA (and which were a pain in the butt to maintain). I fixed up the test suite, and PLA builds and runs perfectly now.&lt;/p&gt;

&lt;p&gt;That&amp;#8217;s what I&amp;#8217;ve been up to this month. I&amp;#8217;m mostly done with my cleanups to the Test library now, barring a few more interface improvements I want to make. After that I&amp;#8217;ve got a few projects to tackle inside libparrot itself. I&amp;#8217;ll write more about those topics when I have something to say.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/afwknight/~4/zuAtihOgeGE" height="1" width="1"/&gt;</content>
    <feedburner:origLink>http://whiteknight.github.com/2012/01/30/rosella_test_cleanups.html</feedburner:origLink></entry>
    

</feed>

