<?xml version="1.0" encoding="US-ASCII"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">
    <title>Whiteknight's Blog</title>
    
    <link href="http://whiteknight.github.com/" />
    <updated>2013-01-26T17:55:11-08:00</updated>
    <id>http://whiteknight.github.com/</id>
    <author>
        <name>Andrew Whitworth (Whiteknight)</name>
        <email>wknight8111@gmail.com</email>
    </author>

    
    <atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/atom+xml" href="http://feeds.feedburner.com/afwknight" /><feedburner:info uri="afwknight" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><entry>
        <title>Entity Framework Code Only Migrations</title>
        <link href="http://feedproxy.google.com/~r/afwknight/~3/Y1leWDSMKz0/efcodeonlymigrations.html" />
        <updated>2013-01-26T00:00:00-08:00</updated>
        <id>http://whiteknight.github.com/2013/01/26/efcodeonlymigrations</id>
        <content type="html">&lt;p&gt;I&amp;#8217;ve been using Entity Framework 4.4 at work a lot recently, and as part of that I&amp;#8217;ve been running into some questions about how to do this or that with some of the new features, particularly code first. Sometimes I&amp;#8217;m able to find the answers I need from the Googles, but sometimes I&amp;#8217;ve got to sit down with Visual Studio and find the answers through good old-fashioned trial and error. Then, I figure, what&amp;#8217;s the point of having a tech-related blog in the first place if you can&amp;#8217;t share the things you&amp;#8217;ve learned there? I&amp;#8217;ll be sharing bits of what I&amp;#8217;m learning as I go.&lt;/p&gt;

&lt;h2 id='codeonly_migrations'&gt;Code-Only Migrations&lt;/h2&gt;

&lt;p&gt;The new Entity Framework releases have a feature called code-first, where you can write plain C# or VB classes (&amp;#8220;Plain Old CLR Objects&amp;#8221;, or POCOs), and have Entity Framework automatically discern from those classes the shape of your DB tables and generate a change script to create them. Most tutorials on the topic explain the process through the use of the Package Manager Console in Visual Studio. I have slightly different requirements and so I&amp;#8217;m going to try to do the same exact process using the C# APIs directly.&lt;/p&gt;

&lt;p&gt;Here&amp;#8217;s a short but helpful blog post where I started my search:&lt;/p&gt;

&lt;p&gt;&lt;span&gt;http://romiller.com/2012/02/09/running-scripting-migrations-from-code/&lt;/span&gt;&lt;/p&gt;

&lt;h3 id='create_your_dbcontext_and_poco_classes'&gt;Create Your DbContext and POCO Classes&lt;/h3&gt;

&lt;p&gt;I won&amp;#8217;t go into detail about that here. There are plenty of cool resources for this purpose elsewhere. For the purposes of the rest of this post, I&amp;#8217;ll assume you&amp;#8217;ve got a &lt;code&gt;DbContext&lt;/code&gt; subclass called &amp;#8220;MyDbContext&amp;#8221;. Even though you may not like to have it, your DbContext subclass must provide a parameterless constructor to work with the Package Manager Console tools.&lt;/p&gt;
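
&lt;p&gt;For concreteness, here&amp;#8217;s a minimal sketch of the kind of context the rest of this post assumes. The &lt;code&gt;Widget&lt;/code&gt; entity and the &lt;code&gt;MyDbConnection&lt;/code&gt; connection string name are placeholders made up for illustration; yours will look different:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='csharp'&gt;namespace MyProgram
{
    using System.Data.Entity;

    // Hypothetical POCO entity, used only for illustration
    public class Widget
    {
        public int Id { get; set; }
        public string Name { get; set; }
    }

    public class MyDbContext : DbContext
    {
        // Parameterless constructor, needed by the Package Manager Console tools.
        // &amp;quot;name=...&amp;quot; looks up a named connection string in app.config/web.config.
        // The name &amp;quot;MyDbConnection&amp;quot; is an assumption for this sketch.
        public MyDbContext() : base(&amp;quot;name=MyDbConnection&amp;quot;) { }

        // Lets callers pass a connection string (or connection string name) explicitly
        public MyDbContext(string nameOrConnectionString) : base(nameOrConnectionString) { }

        public DbSet&amp;lt;Widget&amp;gt; Widgets { get; set; }
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;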

&lt;h3 id='create_a_configuration'&gt;Create a Configuration&lt;/h3&gt;

&lt;p&gt;A Migration Configuration is a class that derives from &lt;code&gt;System.Data.Entity.Migrations.DbMigrationsConfiguration&lt;/code&gt;. You can create one of these automatically through the Package Manager Console with the &lt;code&gt;Enable-Migrations&lt;/code&gt; command, or you can just create it in code yourself:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='csharp'&gt;&lt;span class='k'&gt;namespace&lt;/span&gt; &lt;span class='nn'&gt;MyProgram.Migrations&lt;/span&gt;
&lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='k'&gt;using&lt;/span&gt; &lt;span class='nn'&gt;System&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='k'&gt;using&lt;/span&gt; &lt;span class='nn'&gt;System.Data.Entity&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='k'&gt;using&lt;/span&gt; &lt;span class='nn'&gt;System.Data.Entity.Migrations&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='k'&gt;using&lt;/span&gt; &lt;span class='nn'&gt;System.Linq&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='k'&gt;using&lt;/span&gt; &lt;span class='nn'&gt;System.Reflection&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='k'&gt;using&lt;/span&gt; &lt;span class='nn'&gt;MyProgram&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;

    &lt;span class='k'&gt;internal&lt;/span&gt; &lt;span class='k'&gt;sealed&lt;/span&gt; &lt;span class='k'&gt;class&lt;/span&gt; &lt;span class='nc'&gt;MyConfiguration&lt;/span&gt; &lt;span class='p'&gt;:&lt;/span&gt; &lt;span class='n'&gt;DbMigrationsConfiguration&lt;/span&gt;&lt;span class='p'&gt;&amp;lt;&lt;/span&gt;&lt;span class='n'&gt;MyProgram&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;MyDbContext&lt;/span&gt;&lt;span class='p'&gt;&amp;gt;&lt;/span&gt;
    &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='k'&gt;public&lt;/span&gt; &lt;span class='nf'&gt;MyConfiguration&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt;
        &lt;span class='p'&gt;{&lt;/span&gt;
            &lt;span class='n'&gt;AutomaticMigrationsEnabled&lt;/span&gt; &lt;span class='p'&gt;=&lt;/span&gt; &lt;span class='k'&gt;false&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;

            &lt;span class='c1'&gt;// These things are not strictly necessary, but are helpful when the assembly where&lt;/span&gt;
            &lt;span class='c1'&gt;// the migrations stuff lives is different from the assembly where the DbContext&lt;/span&gt;
            &lt;span class='c1'&gt;// lives. For instance, you may want to run migrations from a separate&lt;/span&gt;
            &lt;span class='c1'&gt;// development-time console program, and not have that code included in production&lt;/span&gt;
            &lt;span class='c1'&gt;// assemblies.&lt;/span&gt;
            &lt;span class='n'&gt;MigrationsAssembly&lt;/span&gt; &lt;span class='p'&gt;=&lt;/span&gt; &lt;span class='n'&gt;Assembly&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;GetExecutingAssembly&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;
            &lt;span class='n'&gt;MigrationsNamespace&lt;/span&gt; &lt;span class='p'&gt;=&lt;/span&gt; &lt;span class='s'&gt;&amp;quot;MyProgram.Migrations&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;

        &lt;span class='p'&gt;}&lt;/span&gt;

        &lt;span class='k'&gt;protected&lt;/span&gt; &lt;span class='k'&gt;override&lt;/span&gt; &lt;span class='k'&gt;void&lt;/span&gt; &lt;span class='nf'&gt;Seed&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;MyProgram&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;MyDbContext&lt;/span&gt; &lt;span class='n'&gt;context&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
        &lt;span class='p'&gt;{&lt;/span&gt;
            &lt;span class='c1'&gt;// TODO: Initialize seed data here&lt;/span&gt;
        &lt;span class='p'&gt;}&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3 id='create_a_migration'&gt;Create a Migration&lt;/h3&gt;

&lt;p&gt;Next step is to create a migration. A migration is any class which derives from &lt;code&gt;System.Data.Entity.Migrations.DbMigration&lt;/code&gt;. You can create one of these manually, but it&amp;#8217;s much easier to create them through the Package Manager Console with the &lt;code&gt;Add-Migration&lt;/code&gt; command.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Add-Migration MyMigration&lt;/code&gt;&lt;/pre&gt;
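
&lt;p&gt;For reference, the class scaffolded by that command looks roughly like the sketch below. The &lt;code&gt;dbo.Widgets&lt;/code&gt; table and its columns are made up for illustration; the real contents depend on your model. The command also generates a companion designer file carrying metadata about the migration, which is part of why writing these by hand is more work:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='csharp'&gt;namespace MyProgram.Migrations
{
    using System.Data.Entity.Migrations;

    public partial class MyMigration : DbMigration
    {
        // Applied when migrating forward
        public override void Up()
        {
            CreateTable(
                &amp;quot;dbo.Widgets&amp;quot;, // hypothetical table name
                c =&amp;gt; new
                    {
                        Id = c.Int(nullable: false, identity: true),
                        Name = c.String(),
                    })
                .PrimaryKey(t =&amp;gt; t.Id);
        }

        // Applied when rolling back; should undo whatever Up() did
        public override void Down()
        {
            DropTable(&amp;quot;dbo.Widgets&amp;quot;);
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;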

&lt;p&gt;Or, if you need some more options (if your solution has multiple projects, etc):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Add-Migration -Name MyMigration -ProjectName MyProject -ConfigurationTypeName MyProject.Migrations.MyConfiguration&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You may also need to specify &lt;code&gt;-StartupProjectName&lt;/code&gt;, if your migrations live in a library assembly.&lt;/p&gt;

&lt;p&gt;Also, you can specify a separate connection string from what is provided by the default parameterless constructor of your DbContext by specifying &lt;code&gt;-ConnectionStringName&lt;/code&gt; (for a named connection string in your app.config/web.config file) or &lt;code&gt;-ConnectionString&lt;/code&gt; and &lt;code&gt;-ConnectionProviderName&lt;/code&gt; to use a value which is not in your app.config/web.config file.&lt;/p&gt;

&lt;p&gt;What do all these options mean? Let&amp;#8217;s consider a solution with two projects:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;MyProgram.sln
    - MyProgram      (a .exe which references MyProgram.Core.dll)
    - MyProgram.Core (a .dll Class Library)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The project &lt;code&gt;MyProgram.Core&lt;/code&gt; contains our &lt;code&gt;DbContext&lt;/code&gt; subclass and the &lt;code&gt;MyProgram&lt;/code&gt; project has the app.config with connection string information.&lt;/p&gt;

&lt;p&gt;If we want our migrations to live in &lt;code&gt;MyProgram.Core&lt;/code&gt; we can use this command as our base (plus any other options we need to add):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Add-Migration MyMigration -ProjectName MyProgram.Core -StartupProjectName MyProgram ...&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If, on the other hand, we want the migrations to live in &lt;code&gt;MyProgram&lt;/code&gt;, the .exe instead of the .dll, we can use this version:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Add-Migration MyMigration -ProjectName MyProgram -StartupProjectName MyProgram ...&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If you do not specify &lt;code&gt;-ProjectName&lt;/code&gt; or &lt;code&gt;-StartupProjectName&lt;/code&gt;, the &lt;code&gt;Add-Migration&lt;/code&gt; command will attempt to use whichever project you have flagged as the &amp;#8220;default startup project&amp;#8221; in the solution explorer (whichever project runs when you press F5).&lt;/p&gt;

&lt;p&gt;What if I want to separate my migrations out into a different assembly entirely, one which isn&amp;#8217;t included in my production deployment? Here&amp;#8217;s another example solution:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;MyProgram.sln
    - MyProgram                 (the production deployed .exe)
    - MyProgram.Core            (where our DbContext and model classes live)
    - MyProgram.DbMigration     (where our migration code will live, references MyProgram.Core.dll)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In this case, we can use a command like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Add-Migration MyMigration -ProjectName MyProgram.DbMigration -StartupProjectName MyProgram ...&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You&amp;#8217;re going to have to play with some of the options for different configurations. If the &lt;code&gt;Add-Migration&lt;/code&gt; command says something&amp;#8217;s wrong, try tweaking your values and adding more info to the command line.&lt;/p&gt;

&lt;h3 id='run_the_migrations_some_recipes'&gt;Run the Migrations (Some Recipes)&lt;/h3&gt;

&lt;p&gt;Now that you&amp;#8217;ve got migrations and a configuration, you can run the migrations manually. Here are some snippets from a console program which does exactly this:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='csharp'&gt;&lt;span class='k'&gt;private&lt;/span&gt; &lt;span class='k'&gt;void&lt;/span&gt; &lt;span class='nf'&gt;DoDbUpdate&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt;
&lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='n'&gt;DbMigrator&lt;/span&gt; &lt;span class='n'&gt;migrator&lt;/span&gt; &lt;span class='p'&gt;=&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='n'&gt;DbMigrator&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='n'&gt;MyConfiguration&lt;/span&gt;&lt;span class='p'&gt;());&lt;/span&gt;
    &lt;span class='n'&gt;migrator&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;Update&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Let&amp;#8217;s take a minute to step back and ask how this all works. You build your assembly and run it. The &lt;code&gt;DbMigrator&lt;/code&gt; class uses reflection to read out all classes from your assembly, and find the ones which are subclasses of &lt;code&gt;DbMigration&lt;/code&gt;. Each DB migration has a name, which is a combination of a timestamp and the name you gave it in the &lt;code&gt;Add-Migration&lt;/code&gt; command. In the database, there&amp;#8217;s a table (or will be, after you run your first migration) called &lt;code&gt;dbo.__MigrationHistory&lt;/code&gt; (it may be under the &amp;#8220;System Tables&amp;#8221; folder). That table holds information about migrations you have already run. When you call &lt;code&gt;DbMigrator.Update()&lt;/code&gt;, it searches for all migrations, removes the ones which already have entries in the table, and orders the rest according to timestamp. This is the list of pending migrations. You can get that list like this:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='csharp'&gt;&lt;span class='k'&gt;private&lt;/span&gt; &lt;span class='k'&gt;void&lt;/span&gt; &lt;span class='nf'&gt;DoDbUpdate&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt;
&lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='n'&gt;DbMigrator&lt;/span&gt; &lt;span class='n'&gt;migrator&lt;/span&gt; &lt;span class='p'&gt;=&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='n'&gt;DbMigrator&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='n'&gt;MyConfiguration&lt;/span&gt;&lt;span class='p'&gt;());&lt;/span&gt;
    &lt;span class='k'&gt;foreach&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='kt'&gt;string&lt;/span&gt; &lt;span class='n'&gt;migration&lt;/span&gt; &lt;span class='k'&gt;in&lt;/span&gt; &lt;span class='n'&gt;migrator&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;GetPendingMigrations&lt;/span&gt;&lt;span class='p'&gt;())&lt;/span&gt;
        &lt;span class='n'&gt;Console&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;WriteLine&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;migration&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='n'&gt;migrator&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;Update&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
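&lt;p&gt;To see the whole picture at once, the migrator can also report which migrations are defined in the assembly and which are already recorded in the database. Here&amp;#8217;s a small sketch that prints all three lists; the method name is my own, and it assumes the same &lt;code&gt;MyConfiguration&lt;/code&gt; class as the other snippets:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='csharp'&gt;// Requires: using System; using System.Data.Entity.Migrations;
private void ShowMigrationStatus()
{
    DbMigrator migrator = new DbMigrator(new MyConfiguration());

    // Every migration found in the migrations assembly
    Console.WriteLine(&amp;quot;Defined in the assembly:&amp;quot;);
    foreach (string migration in migrator.GetLocalMigrations())
        Console.WriteLine(&amp;quot;    &amp;quot; + migration);

    // Every migration already recorded in the history table
    Console.WriteLine(&amp;quot;Already applied to the database:&amp;quot;);
    foreach (string migration in migrator.GetDatabaseMigrations())
        Console.WriteLine(&amp;quot;    &amp;quot; + migration);

    // Local migrations which have not been applied yet
    Console.WriteLine(&amp;quot;Pending:&amp;quot;);
    foreach (string migration in migrator.GetPendingMigrations())
        Console.WriteLine(&amp;quot;    &amp;quot; + migration);
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;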
&lt;p&gt;You can also get the raw SQL script which is going to be used:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='csharp'&gt;&lt;span class='k'&gt;private&lt;/span&gt; &lt;span class='k'&gt;void&lt;/span&gt; &lt;span class='nf'&gt;GetDbUpdateScript&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt;
&lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='n'&gt;DbMigrator&lt;/span&gt; &lt;span class='n'&gt;migrator&lt;/span&gt; &lt;span class='p'&gt;=&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='n'&gt;DbMigrator&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='n'&gt;MyConfiguration&lt;/span&gt;&lt;span class='p'&gt;());&lt;/span&gt;
    &lt;span class='n'&gt;MigratorScriptingDecorator&lt;/span&gt; &lt;span class='n'&gt;scripter&lt;/span&gt; &lt;span class='p'&gt;=&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='n'&gt;MigratorScriptingDecorator&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;migrator&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='kt'&gt;string&lt;/span&gt; &lt;span class='n'&gt;script&lt;/span&gt; &lt;span class='p'&gt;=&lt;/span&gt; &lt;span class='n'&gt;scripter&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;ScriptUpdate&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;null&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='k'&gt;null&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='n'&gt;Console&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;WriteLine&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;script&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Running the scripting decorator clears out the list of pending migrations from the migrator. If you want to generate the script first (for logging) and then run the migration, you need to create two migrators:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='csharp'&gt;&lt;span class='k'&gt;private&lt;/span&gt; &lt;span class='k'&gt;void&lt;/span&gt; &lt;span class='nf'&gt;GetDbUpdateScriptAndUpdate&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt;
&lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='n'&gt;MyConfiguration&lt;/span&gt; &lt;span class='n'&gt;myConfig&lt;/span&gt; &lt;span class='p'&gt;=&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='n'&gt;MyConfiguration&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;
    &lt;span class='n'&gt;DbMigrator&lt;/span&gt; &lt;span class='n'&gt;migrator&lt;/span&gt; &lt;span class='p'&gt;=&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='n'&gt;DbMigrator&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;myConfig&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='n'&gt;MigratorScriptingDecorator&lt;/span&gt; &lt;span class='n'&gt;scripter&lt;/span&gt; &lt;span class='p'&gt;=&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='n'&gt;MigratorScriptingDecorator&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;migrator&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='kt'&gt;string&lt;/span&gt; &lt;span class='n'&gt;script&lt;/span&gt; &lt;span class='p'&gt;=&lt;/span&gt; &lt;span class='n'&gt;scripter&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;ScriptUpdate&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;null&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='k'&gt;null&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='n'&gt;Console&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;WriteLine&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;script&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;

    &lt;span class='n'&gt;migrator&lt;/span&gt; &lt;span class='p'&gt;=&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='n'&gt;DbMigrator&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;myConfig&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='n'&gt;migrator&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;Update&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Another thing we could try is to create a logging object, and use a logging decorator to log progress. This mechanism will also output the raw SQL text, but will do so piecewise intermixed with other information (so you&amp;#8217;ll need to filter out what is and what is not part of the SQL script):&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='csharp'&gt;&lt;span class='k'&gt;public&lt;/span&gt; &lt;span class='k'&gt;class&lt;/span&gt; &lt;span class='nc'&gt;MyLogger&lt;/span&gt; &lt;span class='p'&gt;:&lt;/span&gt; &lt;span class='n'&gt;System&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;Data&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;Entity&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;Migrations&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;Infrastructure&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;MigrationsLogger&lt;/span&gt;
&lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='k'&gt;public&lt;/span&gt; &lt;span class='k'&gt;override&lt;/span&gt; &lt;span class='k'&gt;void&lt;/span&gt; &lt;span class='nf'&gt;Info&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='kt'&gt;string&lt;/span&gt; &lt;span class='n'&gt;message&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
    &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='c1'&gt;// Short status messages come here&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;

    &lt;span class='k'&gt;public&lt;/span&gt; &lt;span class='k'&gt;override&lt;/span&gt; &lt;span class='k'&gt;void&lt;/span&gt; &lt;span class='nf'&gt;Verbose&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='kt'&gt;string&lt;/span&gt; &lt;span class='n'&gt;message&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
    &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='c1'&gt;// The SQL text and other info comes here&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;

    &lt;span class='k'&gt;public&lt;/span&gt; &lt;span class='k'&gt;override&lt;/span&gt; &lt;span class='k'&gt;void&lt;/span&gt; &lt;span class='nf'&gt;Warning&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='kt'&gt;string&lt;/span&gt; &lt;span class='n'&gt;message&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
    &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='c1'&gt;// Warnings and other bad messages come here&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Once we have a logger, we can use it in our migration:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='csharp'&gt;&lt;span class='k'&gt;private&lt;/span&gt; &lt;span class='k'&gt;void&lt;/span&gt; &lt;span class='nf'&gt;DoDbUpdateWithLogging&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt;
&lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='n'&gt;DbMigrator&lt;/span&gt; &lt;span class='n'&gt;migrator&lt;/span&gt; &lt;span class='p'&gt;=&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='n'&gt;DbMigrator&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='n'&gt;MyConfiguration&lt;/span&gt;&lt;span class='p'&gt;());&lt;/span&gt;
    &lt;span class='n'&gt;MigratorLoggingDecorator&lt;/span&gt; &lt;span class='n'&gt;logger&lt;/span&gt; &lt;span class='p'&gt;=&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='n'&gt;MigratorLoggingDecorator&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;migrator&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='n'&gt;MyLogger&lt;/span&gt;&lt;span class='p'&gt;());&lt;/span&gt;
    &lt;span class='n'&gt;logger&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;Update&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
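&lt;p&gt;As one possible way to fill in a logger, here&amp;#8217;s a sketch that prints status and warning messages to the console and collects the verbose output (SQL text included) in memory so it can be filtered or saved afterwards. The class name, the message prefixes and the &lt;code&gt;GetVerboseLog&lt;/code&gt; helper are all my own inventions, not part of Entity Framework:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='csharp'&gt;using System;
using System.Data.Entity.Migrations.Infrastructure;
using System.Text;

public class ConsoleMigrationsLogger : MigrationsLogger
{
    // Accumulates everything sent to Verbose(), SQL and non-SQL alike
    private readonly StringBuilder verboseLog = new StringBuilder();

    public override void Info(string message)
    {
        Console.WriteLine(&amp;quot;INFO: &amp;quot; + message);
    }

    public override void Verbose(string message)
    {
        // SQL text arrives here mixed with other details, so just collect it
        verboseLog.AppendLine(message);
    }

    public override void Warning(string message)
    {
        Console.Error.WriteLine(&amp;quot;WARN: &amp;quot; + message);
    }

    // Hypothetical helper: returns everything collected so far
    public string GetVerboseLog()
    {
        return verboseLog.ToString();
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Keep a reference to the logger instance you hand to &lt;code&gt;MigratorLoggingDecorator&lt;/code&gt; so you can read the collected text back after the update finishes.&lt;/p&gt;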
&lt;p&gt;We can update to a specific migration, or we can roll back to a specific migration by name. Remember, the &amp;#8220;name&amp;#8221; used by the migrator is a combination of the timestamp and the name you gave it at the console.&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='csharp'&gt;&lt;span class='k'&gt;private&lt;/span&gt; &lt;span class='k'&gt;void&lt;/span&gt; &lt;span class='nf'&gt;UpdateOrRollbackTo&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='kt'&gt;string&lt;/span&gt; &lt;span class='n'&gt;name&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='n'&gt;DbMigrator&lt;/span&gt; &lt;span class='n'&gt;migrator&lt;/span&gt; &lt;span class='p'&gt;=&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='n'&gt;DbMigrator&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='n'&gt;MyConfiguration&lt;/span&gt;&lt;span class='p'&gt;());&lt;/span&gt;
    &lt;span class='n'&gt;migrator&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;Update&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;name&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
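&lt;p&gt;Since you rarely have those full names memorized, one option is to ask the migrator for them. This sketch rolls back only the most recently applied migration. It assumes that sorting the names as plain strings puts them in chronological order (which should hold, because each name starts with its timestamp), and the method name is my own:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='csharp'&gt;// Requires: using System; using System.Collections.Generic; using System.Linq;
//           using System.Data.Entity.Migrations;
private void RollbackLastMigration()
{
    DbMigrator migrator = new DbMigrator(new MyConfiguration());

    // Sort explicitly instead of relying on the order the migrator returns
    List&amp;lt;string&amp;gt; applied = migrator.GetDatabaseMigrations()
        .OrderBy(m =&amp;gt; m, StringComparer.Ordinal)
        .ToList();

    if (applied.Count == 0)
        return; // nothing has been applied, so there is nothing to roll back

    // Target the migration just before the newest one. If only one has been
    // applied, target &amp;quot;0&amp;quot;, which means &amp;quot;before any migrations&amp;quot;.
    string target = applied.Count &amp;gt; 1 ? applied[applied.Count - 2] : &amp;quot;0&amp;quot;;
    migrator.Update(target);
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;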
&lt;p&gt;And what if you want to completely trash the DB, undo all migrations, delete everything, and start over?&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='csharp'&gt;&lt;span class='k'&gt;private&lt;/span&gt; &lt;span class='k'&gt;void&lt;/span&gt; &lt;span class='nf'&gt;CompletelyTrashDb&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt;
&lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='n'&gt;DbMigrator&lt;/span&gt; &lt;span class='n'&gt;migrator&lt;/span&gt; &lt;span class='p'&gt;=&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='n'&gt;DbMigrator&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='n'&gt;MyConfiguration&lt;/span&gt;&lt;span class='p'&gt;());&lt;/span&gt;
    &lt;span class='n'&gt;migrator&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;Update&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;0&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2 id='whats_my_use_case'&gt;What&amp;#8217;s My Use Case?&lt;/h2&gt;

&lt;p&gt;So what exactly is my use-case here? Why don&amp;#8217;t I just stick with the Package Manager Console like many other tutorials do? I have a few criteria:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;I need to seed the new DB with a lot of complex data, pulled from another source, which needs to be updated regularly.&lt;/li&gt;

&lt;li&gt;I may have more than one DB, for multiple instances of my application. Connection strings for all of these may be kept in another DB or a file or somewhere else. All of these need to be kept in sync, and a script that runs a migration on all targets is better than a command which runs on only one and needs to be manually updated.&lt;/li&gt;

&lt;li&gt;I&amp;#8217;d like the ability to log the SQL scripts which are used for the migration, for various purposes.&lt;/li&gt;

&lt;li&gt;I&amp;#8217;d like to be able to do some scripted unit testing where we create and migrate a test DB from scratch, seed it with test data, and use that for testing. I would like these temporary test DBs to be identical to the production ones.&lt;/li&gt;
&lt;/ol&gt;
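
&lt;p&gt;The second item is the easiest to sketch in code. Assuming you&amp;#8217;ve already loaded the connection strings from wherever they live, you can point a fresh configuration at each database in turn through its &lt;code&gt;TargetDatabase&lt;/code&gt; property. The method name and the SQL Server provider name below are assumptions for the sake of the example:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='csharp'&gt;// Requires: using System.Collections.Generic;
//           using System.Data.Entity.Infrastructure;
//           using System.Data.Entity.Migrations;
private void UpdateAllTargets(IEnumerable&amp;lt;string&amp;gt; connectionStrings)
{
    foreach (string connectionString in connectionStrings)
    {
        MyConfiguration config = new MyConfiguration();

        // Point this configuration at one specific database.
        // &amp;quot;System.Data.SqlClient&amp;quot; assumes MS SQL Server; change it for other providers.
        config.TargetDatabase = new DbConnectionInfo(connectionString, &amp;quot;System.Data.SqlClient&amp;quot;);

        // Each target gets its own migrator and its own list of pending migrations
        DbMigrator migrator = new DbMigrator(config);
        migrator.Update();
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;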

&lt;p&gt;Overall I think the new Entity Framework Code-First features are really cool, and remind me very closely of the equivalent db migrations scripts in Rails, but we have a little bit more control over it here because we can incorporate the DbMigration process into our application logic.&lt;/p&gt;</content>
    <feedburner:origLink>http://whiteknight.github.com/2013/01/26/efcodeonlymigrations.html</feedburner:origLink></entry>
    
    <entry>
        <title>Working On a New Project</title>
        <link href="http://feedproxy.google.com/~r/afwknight/~3/lcrzSOF3IUo/new_mono_project.html" />
        <updated>2012-12-09T00:00:00-08:00</updated>
        <id>http://whiteknight.github.com/2012/12/09/new_mono_project</id>
        <content type="html">&lt;p&gt;As they say on my son&amp;#8217;s favorite Thomas The Tank Engine, yesterday an idea flew into my funnel. I was &lt;a href='http://accidentallycooking.wordpress.com/2012/12/08/shower-problems/'&gt;doing some work on my bathroom&lt;/a&gt; when I got an idea for a new website that I would like to make. I want to make this site for a few reasons: First, I&amp;#8217;m going to be using the new ASP.NET MVC framework at &lt;a href='/2012/11/20/new_job.html'&gt;my new job&lt;/a&gt; eventually, and I wanted to practice with it. Second, I have been getting motivated to do more programming in general, and a new project that I&amp;#8217;m hot on seems like just the thing to get me moving again. Third, my idea for this site is relatively straight-forward but should offer some good practice and interesting technical challenges. Fourth and finally, this website I&amp;#8217;m thinking about is actually something that I would like to use myself (even if nobody else joins me).&lt;/p&gt;

&lt;p&gt;I still don&amp;#8217;t have a &lt;a href='/2012/09/14/sept_status_update.html'&gt;new laptop&lt;/a&gt; yet, though I&amp;#8217;ve been shopping for one in earnest. I figure I&amp;#8217;ll have it shortly after the holidays. In either case, for the time being I&amp;#8217;m stuck with my current crappy laptop, which exclusively runs (the decidedly not-crappy) Linux. If I&amp;#8217;m going to make a new ASP.NET MVC website on this box, it&amp;#8217;s going to have to use &lt;a href='http://www.mono-project.com/Main_Page'&gt;Mono&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;And that&amp;#8217;s fine. &lt;a href='/2010/11/03/blogger2jekyll.html'&gt;I&amp;#8217;ve used Mono before&lt;/a&gt; and I like it plenty despite its shortcomings.&lt;/p&gt;

&lt;p&gt;I scratched out a few ideas and designs in my notebook. I decided I wanted to use some kind of dependency injection/inversion of control/service locator feature. I&amp;#8217;ve used &lt;a href='http://unity.codeplex.com/'&gt;Unity&lt;/a&gt; in the past and loved it, but I wouldn&amp;#8217;t be against using Ninject or something else too. I also want to use some kind of ORM to make persistence a little easier.&lt;/p&gt;

&lt;p&gt;Now, I can already hear some people mumbling to themselves about all the many flaws of ORMs. I won&amp;#8217;t even bother to list them or link to the (many) pages where they are discussed on the interwebs. Use your imagination. In any case, I&amp;#8217;m not deterred and ORMs actually make good sense for the project I&amp;#8217;m thinking about, if I can find the right one.&lt;/p&gt;

&lt;p&gt;I thought about using MongoDB, but after thinking hard about work flows and data relationships in my site, I think a regular, SQL-based relational DB would just be a better fit in this instance. I&amp;#8217;d probably like to stick with MySQL or MS SQL Server, initially. (This is not to say anything about the relative merits of one type of DB over another, just that one type seems to be a more natural fit for this particular problem domain and I&amp;#8217;d like not to be shoehorning in the wrong software for the wrong reasons. Don&amp;#8217;t get me involved in your holy war.)&lt;/p&gt;

&lt;p&gt;The problem, I discovered, is finding a good ORM that&amp;#8217;s worth using, doesn&amp;#8217;t introduce more hassle than it saves, and actually works (with examples) on Mono. So far, my search is proving to be a little bit fruitless.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href='http://nhforge.org/'&gt;NHibernate&lt;/a&gt;&lt;/strong&gt; seems like a common and popular choice, but the large amounts of required XML configuration make me sick to my stomach. I would far prefer something that I can do in pure C# code without large amounts of external config.&lt;/li&gt;

&lt;li&gt;&lt;strong&gt;&lt;a href='http://www.castleproject.org/projects/activerecord/'&gt;Castle ActiveRecord&lt;/a&gt;&lt;/strong&gt; builds an ActiveRecord-like interface on top of NHibernate. In theory you get all the power of NHibernate without the XML headaches. However, this package is listed on the castle website as being &amp;#8220;Archived&amp;#8221; and &amp;#8220;no longer being worked on&amp;#8221;. Also, I can&amp;#8217;t find any real examples of using it on Mono. I&amp;#8217;m not going to start a new project (which presumably could be active for years) by starting on an old and unmaintained foundation.&lt;/li&gt;

&lt;li&gt;&lt;strong&gt;&lt;a href='http://www.db4o.com/'&gt;db4o&lt;/a&gt;&lt;/strong&gt; It looks to me like this little project uses its own custom DB file format and doesn&amp;#8217;t connect to existing databases. I think I&amp;#8217;d really like to stick with an existing DB, and not use something custom.&lt;/li&gt;

&lt;li&gt;&lt;strong&gt;&lt;a href='http://msdn.microsoft.com/en-us/library/bb425822.aspx'&gt;Linq-To-SQL&lt;/a&gt;&lt;/strong&gt;, probably using the SQLMetal code generator, would seem like a decent option except it doesn&amp;#8217;t seem to be well-supported on Mono, and there&amp;#8217;s the issue of having to generate a whole bunch of pure-data objects, which will need to be laboriously mapped to and from my actual object type definitions. I&amp;#8217;ve seen the kinds of morass that this kind of situation can lead to, and I&amp;#8217;m not interested in going this route if it is even possible to traverse.&lt;/li&gt;

&lt;li&gt;&lt;strong&gt;&lt;a href='https://github.com/markrendle/Simple.Data'&gt;Simple.Data&lt;/a&gt;&lt;/strong&gt; is a newer option which uses all sorts of fancy modern C# features to provide an extremely flexible, extremely natural-looking interface for accessing a database. It&amp;#8217;s supposed to work on Mono, but I&amp;#8217;ve not figured out a good way to get it (and a MySQL connector, and prerequisites) installed in a reasonable way for Mono. The docs suggest NuGet, but I can&amp;#8217;t get NuGet working on my box (and the &lt;a href='http://nuget.codeplex.com/workitem/1271'&gt;NuGet devs don&amp;#8217;t seem to care about Mono too much&lt;/a&gt;).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Overall, the experience of trying to get this project working on Mono has been frustrating. I understand what a big engineering task Mono is in general, and how much work goes into getting a diverse ecosystem of software to work together nicely on a VM that&amp;#8217;s supposed to be cross-platform with low barriers to entry. I get all that. Still, for the purposes of this project I really wanted to start writing some code sooner rather than later and not have to fight with so much infrastructural stuff. I suppose I have a few options: I can wait till I get a new laptop and do things on a Windows partition or VM instead. Or, I can keep fighting with this setup to try and get things to work. Finally, I guess I could port over my idea to Ruby on Rails, another platform that I&amp;#8217;m interested in learning more about.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/afwknight/~4/lcrzSOF3IUo" height="1" width="1"/&gt;</content>
    <feedburner:origLink>http://whiteknight.github.com/2012/12/09/new_mono_project.html</feedburner:origLink></entry>
    
    <entry>
        <title>More IO Work?</title>
        <link href="http://feedproxy.google.com/~r/afwknight/~3/NYrAiA0Ymug/more_io_work.html" />
        <updated>2012-11-21T00:00:00-08:00</updated>
        <id>http://whiteknight.github.com/2012/11/21/more_io_work</id>
        <content type="html">&lt;p&gt;I might not be too bright. Either that or I might not have a great memory, or maybe I&amp;#8217;m just a glutton for punishment. Remember the big IO system rewrite I completed only a few weeks ago? Remember how much of a huge hassle that turned into and how burnt-out I got because of it? Apparently I don&amp;#8217;t because I&amp;#8217;m back at it again.&lt;/p&gt;

&lt;p&gt;Parrot hacker brrt came to me with a problem: After the io_cleanup merge he noticed that his mod_parrot project doesn&amp;#8217;t build and pass tests anymore. This was sort of expected, he was relying on lots of specialized IO functionality and I broke a lot of specialized IO functionality. Mea culpa. I had a few potential fixes in mind, so I tossed around a few ideas with brrt, put together a few small branches and think I&amp;#8217;ve got the solution.&lt;/p&gt;

&lt;p&gt;The problem, in a nutshell is this: In mod_parrot brrt was using a custom Winxed object as an IO handle. By hijacking the standard input and output handles he could convert requests on those handles into NCI calls to Apache and all would just work as expected. However with the IO system rewrite, IO API calls no longer redirect to method calls. Instead, they are dispatched to new IO VTABLE function calls which handle the logic for individual types.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First question&lt;/strong&gt;: How do we recreate brrt&amp;#8217;s custom functionality, by allowing custom bytecode-level methods to implement core IO functionality for custom user types?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My Answer&lt;/strong&gt;: We add a new IO VTABLE, for &amp;#8220;User&amp;#8221; objects, which can redirect low-level requests to PMC method calls.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Second Question&lt;/strong&gt;: Okay, so how do we associate this new User IO VTABLE with custom objects? Currently the &lt;code&gt;get_pointer_keyed_int&lt;/code&gt; VTABLE is used to get access to the handle&amp;#8217;s &lt;code&gt;IO_VTABLE*&lt;/code&gt; structure, but bytecode-level objects cannot use &lt;code&gt;get_pointer_keyed_int&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My Answer&lt;/strong&gt;: For most IO-related PMC types, the kind of &lt;code&gt;IO_VTABLE*&lt;/code&gt; to use is statically associated with that type. Socket PMCs always use the Socket IO VTABLE. StringHandle PMCs always use the StringHandle IO VTABLE, etc. So, we can use a simple map to associate PMC types with specific IO VTABLEs. Any PMC type not in this map can default to the User IO VTABLE, making everything &amp;#8220;just work&amp;#8221;.&lt;/p&gt;
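
&lt;p&gt;Here&amp;#8217;s a rough sketch of that lookup, written in Python rather than Parrot&amp;#8217;s C purely for illustration. The names (&lt;code&gt;IO_VTABLE_MAP&lt;/code&gt;, &lt;code&gt;vtable_for&lt;/code&gt; and the placeholder VTABLE values) are hypothetical and are not Parrot&amp;#8217;s actual API. The only point is that known types map statically, and anything else falls through to the User IO VTABLE.&lt;/p&gt;

```python
# Hypothetical stand-ins for Parrot's IO_VTABLE structures.
SOCKET_VTABLE = "socket-io-vtable"
STRINGHANDLE_VTABLE = "stringhandle-io-vtable"
USER_VTABLE = "user-io-vtable"

# Static association between PMC type names and IO VTABLEs.
IO_VTABLE_MAP = {
    "Socket": SOCKET_VTABLE,
    "StringHandle": STRINGHANDLE_VTABLE,
}


def vtable_for(pmc_type_name):
    """Return the IO VTABLE for a PMC type, defaulting to the User VTABLE."""
    return IO_VTABLE_MAP.get(pmc_type_name, USER_VTABLE)
```

&lt;p&gt;With a map like this, a custom bytecode-level type never has to register itself: it is simply absent from the map and gets the default.&lt;/p&gt;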

&lt;p&gt;&lt;strong&gt;Third Question&lt;/strong&gt;: Hold your horses, what do you mean &amp;#8220;most&amp;#8221; IO-related PMC types have a static IO VTABLE? Which ones don&amp;#8217;t and how do we fix it?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My Answer&lt;/strong&gt;: The big problem is the FileHandle PMC. Due to some legacy issues the FileHandle PMC has two modes of operation: normal File IO and Pipe IO. I guess these two ideas were conflated together long ago because internally the details are kind of similar: Both files and pipes use file descriptors at the OS level, and many of the library calls to use them are the same, so it makes sense not to duplicate a lot of code. However, there are some nonsensical issues that arise because Pipes and files are not the same: Files don&amp;#8217;t have a notion of a &amp;#8220;process ID&amp;#8221; or an &amp;#8220;exit status&amp;#8221;. Pipes don&amp;#8217;t have a notion of a &amp;#8220;file position&amp;#8221; and cannot do methods like &lt;code&gt;seek&lt;/code&gt; or &lt;code&gt;tell&lt;/code&gt;. Parrot uses the &lt;code&gt;&amp;quot;p&amp;quot;&lt;/code&gt; mode specifier to tell a FileHandle to be in Pipe mode, which causes the IO system to select between the File and the Pipe IO VTABLE for each call. Instead of this terrible system, I suggest we separate out this logic into two PMC types: FileHandle (which, as its name suggests, operates on Files) and Pipe. By breaking up this one type into two, we can statically map individual IO VTABLEs to individual PMC types, and the system just works.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fourth Question&lt;/strong&gt;: Once we have these maps in place, how do we do IO with user-defined objects?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My Answer&lt;/strong&gt;: The User IO VTABLE will redirect low-level IO requests into method calls on these PMCs. I&amp;#8217;ll break &lt;code&gt;IO_BUFFER*&lt;/code&gt; pointers out into a new PMC type of their own (IOBuffer) and users will be able to access and manipulate these things from any level. We&amp;#8217;ll attach buffers to arbitrary PMCs using named properties, which means we can attach buffers to &lt;em&gt;any PMC&lt;/em&gt; that needs them.&lt;/p&gt;
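
&lt;p&gt;To make the redirection idea concrete, here&amp;#8217;s a small Python sketch. Everything in it is hypothetical (&lt;code&gt;UserIOVtable&lt;/code&gt;, &lt;code&gt;UppercaseHandle&lt;/code&gt; and the method names are made up for illustration, and the real thing would live in C and call PMC methods). It only shows the shape: the low-level entry points do nothing but forward to methods on the user&amp;#8217;s object.&lt;/p&gt;

```python
class UserIOVtable:
    """Hypothetical User IO VTABLE: forwards low-level requests to methods."""

    def read(self, handle, nbytes):
        # A low-level read request becomes a method call on the object.
        return handle.read(nbytes)

    def write(self, handle, data):
        # Likewise for writes.
        return handle.write(data)


class UppercaseHandle:
    """A user-defined handle, standing in for a bytecode-level object."""

    def __init__(self):
        self.written = []

    def read(self, nbytes):
        return "x" * nbytes

    def write(self, data):
        self.written.append(data.upper())
        return len(data)


vtable = UserIOVtable()
handle = UppercaseHandle()
count = vtable.write(handle, "hello")
```

&lt;p&gt;The IO system only ever talks to the vtable, so a handle like this one can do whatever it wants behind those methods, which is roughly what mod_parrot needs.&lt;/p&gt;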

&lt;p&gt;So that&amp;#8217;s my chain of thought on how to solve this problem. I&amp;#8217;ve put together three branches to start working on this issue, but I don&amp;#8217;t want to get too involved in this code until I get some buy-in from other developers. The FileHandle/Pipe change is going to break some existing code, so I want to make sure we&amp;#8217;re cool with this idea before we make breaking changes and need to patch things like NQP and Rakudo. Here are the three branches I&amp;#8217;ve started for this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;whiteknight/pipe_pmc&lt;/code&gt;: This branch creates the new Pipe PMC type, separate from FileHandle. This is the breaking change that we need to make up front.&lt;/li&gt;

&lt;li&gt;&lt;code&gt;whiteknight/io_vtable_lookup&lt;/code&gt;: This branch adds the new IOBuffer PMC type, implements the new IO VTABLE map, and implements the new properties-based logic for attaching buffers to PMCs.&lt;/li&gt;

&lt;li&gt;&lt;code&gt;whiteknight/io_userhandle&lt;/code&gt;: This branch implements the new User IO VTABLE, which redirects IO requests to methods on PMC objects.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Like I said, these are all very rough drafts so far. All these three branches build, but they don&amp;#8217;t necessarily pass all tests or look very pretty. If people like what I&amp;#8217;m doing and agree it&amp;#8217;s a good direction to go in, I&amp;#8217;ll continue work in earnest and see where it takes us.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/afwknight/~4/NYrAiA0Ymug" height="1" width="1"/&gt;</content>
    <feedburner:origLink>http://whiteknight.github.com/2012/11/21/more_io_work.html</feedburner:origLink></entry>
    
    <entry>
        <title>New Job</title>
        <link href="http://feedproxy.google.com/~r/afwknight/~3/G-aMyxWvIDA/new_job.html" />
        <updated>2012-11-20T00:00:00-08:00</updated>
        <id>http://whiteknight.github.com/2012/11/20/new_job</id>
        <content type="html">&lt;p&gt;One thing that&amp;#8217;s been eating up my time (and energy, and attention) lately has been a hunt for a new job. I had intended to look around passively for a while because I wasn&amp;#8217;t in any big hurry. However, once the recruiters got word that I was looking, things started to move much more quickly. Suddenly I&amp;#8217;m getting dozens of phone calls every day, and dozens of emails. I was doing phone screens and going on interviews. All of this and I was trying to not impact my current job so much.&lt;/p&gt;

&lt;p&gt;First things first: I&amp;#8217;m not leaving &lt;a href='http://www.weblinc.com/'&gt;WebLinc&lt;/a&gt; because I&amp;#8217;m unhappy with it. Also, I don&amp;#8217;t think they&amp;#8217;re unhappy with me. WebLinc is a &lt;em&gt;great&lt;/em&gt; place to work, and I&amp;#8217;m thankful for the time I&amp;#8217;ve spent there. If you&amp;#8217;re in the Philadelphia area and you know ColdFusion, Ruby on Rails, and/or have solid web fundamentals (JS/CSS/HTML) or graphic design experience (or sales, or project management, etc) &lt;a href='http://www.weblinc.com/about/careers/'&gt;you should consider applying&lt;/a&gt;. It&amp;#8217;s a very hip young organization with great talent, a rapidly growing and diverse clientele, and some real opportunity to do cool things. Also, there&amp;#8217;s a cool bar/restaurant on premises, and the company has a good (and growing) relationship with open-source and the developer community at large. If you&amp;#8217;re young and talented and care about the craft of web development, definitely consider WebLinc in your job search. You won&amp;#8217;t be disappointed.&lt;/p&gt;

&lt;p&gt;So why am I leaving? WebLinc historically has had two main platforms: ColdFusion and ASP.NET. Between the two, the ColdFusion team has had some of the biggest project successes and the more demonstrated ability to scale up the size of its team. When you&amp;#8217;re a company that&amp;#8217;s growing as fast as WebLinc, the ability to scale up your team quickly, to meet deadlines and to keep to budgets are all very important. The ColdFusion team was doing these things better than .NET (for a variety of reasons, not the least of which were endemic to the platform itself). This led to more sales for the ColdFusion team, and a larger, steadier stream of work. At some point the decision was made to devote resources going forward into ColdFusion (and a small, but growing, Ruby on Rails team) and not devote new resources to .NET. This has nothing to do with the relative theoretical merits of ASP.NET vs ColdFusion and, in my opinion, has nothing to do with the quality of developers they had working on those platforms. The reasons why one team was doing better than the other team aren&amp;#8217;t really worth exploring at this point, but from a business perspective it was clear where effort and resources needed to be devoted going forward.&lt;/p&gt;

&lt;p&gt;I started looking around for a variety of reasons. I could have stayed for a while in my current position, riding the waves of boom and bust that are inherent in any job that bills hourly for maintenance. I could have started transitioning over towards the Ruby on Rails team but chose not to go in that direction, yet. Instead of sitting around and hoping things went well, I wanted to take a little bit more control of my situation. I started looking passively at first, but once the recruiters got involved things started moving quickly and the rest is history. I&amp;#8217;ve written a few notes up about my job hunt and my dealings with various recruiters. These notes may turn into additional blog posts (now that I have time and energy to try blogging more).&lt;/p&gt;

&lt;p&gt;In early December I&amp;#8217;ll be starting at &lt;a href='http://www.halfpenny.com/'&gt;Halfpenny Technologies&lt;/a&gt;, a small but growing company involved with electronic medical records and related areas. The team is small, the company is growing rapidly, and they have some very real and very interesting technical challenges bubbling to the forefront. At the interview they were throwing around words like &amp;#8220;ownership&amp;#8221; and &amp;#8220;leadership&amp;#8221;, and were talking about some very interesting new technologies. Combine that with a few other factors, and the decision was actually an easy one for me to make.&lt;/p&gt;

&lt;p&gt;I have a good idea about what kinds of work I&amp;#8217;m going to be doing there but I don&amp;#8217;t want to talk about it quite yet. In reality, you don&amp;#8217;t know anything until you start working and get knee-deep in code. I also don&amp;#8217;t know what their policies and attitudes are on blogging and public commentary, but I&amp;#8217;ll say what I can when I&amp;#8217;m confident enough to say it.&lt;/p&gt;

&lt;p&gt;So starts a new chapter in my career, one that I&amp;#8217;m hoping lasts quite a while and takes me in some cool new directions. Again, I&amp;#8217;ll post more when I have more to say.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/afwknight/~4/G-aMyxWvIDA" height="1" width="1"/&gt;</content>
    <feedburner:origLink>http://whiteknight.github.com/2012/11/20/new_job.html</feedburner:origLink></entry>
    
    <entry>
        <title>September Status</title>
        <link href="http://feedproxy.google.com/~r/afwknight/~3/KCw7el5kDMM/sept_status_update.html" />
        <updated>2012-09-14T00:00:00-07:00</updated>
        <id>http://whiteknight.github.com/2012/09/14/sept_status_update</id>
        <content type="html">&lt;p&gt;First, some personal status:&lt;/p&gt;

&lt;h3 id='personal_status'&gt;Personal Status&lt;/h3&gt;

&lt;p&gt;I haven&amp;#8217;t blogged in a little while, and there&amp;#8217;s a few reasons for that. I&amp;#8217;ll list them quickly:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Work has been&amp;#8230;tedious lately and when I come home I find that I want to spend much less time looking at a computer, especially any computer that brings more stress into my life. Also,&lt;/li&gt;

&lt;li&gt;My computer at home generates a huge amount of stress. In addition to several physical problems with it, and the fact that I effectively do not have a working mouse (the built-in trackpad is extremely faulty, and the external USB mouse I had been using is now broken and the computer won&amp;#8217;t even boot if it&amp;#8217;s plugged into the port), I&amp;#8217;ve been having some software problems with lightdm and xserver crashing and needing to be restarted much more frequently than I think should be needed. We are planning to buy me a new one, but the budget won&amp;#8217;t allow that until closer to xmas.&lt;/li&gt;

&lt;li&gt;The &lt;code&gt;io_cleanup1&lt;/code&gt; work took much longer than I had anticipated. I wrote a lot more posts about that branch than I ever published, and the ones I did publish were extremely repetitive (&amp;#8220;It&amp;#8217;s almost finished, any day now!&amp;#8221;). Posting less means I got out of the habit of posting, which is a hard habit to be in and does require some effort.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I&amp;#8217;m going to do what I can to post something of a general Parrot update here, and hopefully I can get back in the habit of posting a little bit more regularly again.&lt;/p&gt;

&lt;h3 id='_status'&gt;&lt;code&gt;io_cleanup1&lt;/code&gt; Status&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;io_cleanup1&lt;/code&gt; did indeed merge with almost no problems reported at all. I&amp;#8217;m very happy about that work, and am looking forward to pushing the IO subsystem to the next level. Before I started &lt;code&gt;io_cleanup1&lt;/code&gt;, I had some plans in mind for new features and capabilities I wanted to add to the VM. However, I quickly realized that the house had some structural problems to deal with before I could slap a new coat of paint on the walls. The structure is, I now believe, much better. I&amp;#8217;ve still got that paint in the closet and eventually I&amp;#8217;m going to throw it on the walls.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;io_cleanup&lt;/code&gt; branch did take a lot of time and energy, much more than I initially expected. But, it&amp;#8217;s over now and I&amp;#8217;m happy with the results so now I can start looking on to the next project on my list.&lt;/p&gt;

&lt;h3 id='threads_status'&gt;Threads Status&lt;/h3&gt;

&lt;p&gt;Threads is very very close to being mergeable. I&amp;#8217;ve said that before and I&amp;#8217;m sure I&amp;#8217;ll have occasion to say it again. However there&amp;#8217;s one remaining problem pointed out by tadzik, and if my diagnosis is correct it&amp;#8217;s a doozy.&lt;/p&gt;

&lt;p&gt;The basic threads system, which I outlined in a series of blog posts ages ago goes like this: We cut out the need to have (most) locks, and therefore we cut out many possibilities of deadlock, by making objects writable only from the thread that owns them. Other threads can have nearly unfettered read access, but writes require sending a message to the owner thread to perform the update in a synchronized, orderly manner. By limiting cross-thread writes, we cut out many expensive mechanisms that would need to be used for writing data, like Software Transactional Memory (STM) and locks (and, therefore, associated deadlocks). It&amp;#8217;s a system inspired closely by things like Erlang and some functional languages, although I&amp;#8217;m not sure there&amp;#8217;s any real prior art for the specifics of it. Maybe that&amp;#8217;s because other people know it won&amp;#8217;t work right. The only thing we can do is see how it works.&lt;/p&gt;

&lt;p&gt;The way nine implemented this system is to set up a Proxy type which intercepts and dispatches read/write requests as appropriate. When we pass a PMC from one thread to another, we instead create and pass a Proxy to it. Every read on the proxy redirects immediately to a read on the original target PMC. Every write causes a task to dispatch to the owner thread of the target PMC with update logic.&lt;/p&gt;
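
&lt;p&gt;If it helps to see the idea in miniature, here&amp;#8217;s a toy model of it in Python. To be clear, this is not how nine&amp;#8217;s branch is written: &lt;code&gt;Owner&lt;/code&gt;, &lt;code&gt;Proxy&lt;/code&gt; and the queue-based scheduling are all stand-ins I made up, and the real implementation is C inside Parrot. It just shows reads going straight to the target while writes are turned into tasks that run on the owning thread.&lt;/p&gt;

```python
import queue
import threading


class Owner:
    """Owns a value; only its own thread ever writes to it."""

    def __init__(self, value):
        self.value = value
        self.tasks = queue.Queue()
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        # Apply queued updates one at a time, on the owner thread.
        while True:
            update = self.tasks.get()
            self.value = update(self.value)
            self.tasks.task_done()


class Proxy:
    """Reads go straight to the target; writes become tasks for the owner."""

    def __init__(self, owner):
        self._owner = owner

    def read(self):
        return self._owner.value

    def write(self, update):
        self._owner.tasks.put(update)


owner = Owner(1)
proxy = Proxy(owner)
proxy.write(lambda v: v + 1)  # queued, then applied by the owner thread
owner.tasks.join()            # wait until the owner has processed it
result = proxy.read()
```

&lt;p&gt;No lock is ever taken around the value itself, because only one thread is allowed to change it. The cost is that a write is not visible until the owner gets around to running the task.&lt;/p&gt;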

&lt;p&gt;Here&amp;#8217;s some example code, adapted from the example tadzik had, which fails on the threads branch:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;function main[main](var args) {
    var x = 1;
    var t = new &amp;#39;Task&amp;#39;(function() { x++; say(x); });
    ${ schedule t };
    ${ wait t };
    say(&amp;quot;Done!&amp;quot;);
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Running this code on the threads branch creates anything from an assertion failure to a segfault. Why?&lt;/p&gt;

&lt;p&gt;This example creates a closure and schedules that closure as a task. The task scheduler assigns that task to the next open thread in the pool. Since it&amp;#8217;s dispatching the Task on a new thread, all the data is proxied. Instead of passing a reference to Integer PMC &lt;code&gt;x&lt;/code&gt;, we&amp;#8217;re passing a &lt;code&gt;Proxy&lt;/code&gt; PMC, which points to &lt;code&gt;x&lt;/code&gt;. This part works as expected.&lt;/p&gt;

&lt;p&gt;When we invoke a closure, we update the context to point to the &amp;#8220;outer&amp;#8221; context, so that lexical variables (&amp;#8220;x&amp;#8221;, in this case) can be looked up correctly. However, instead of having an outer which is a &lt;code&gt;CallContext&lt;/code&gt; PMC, we have a &lt;code&gt;Proxy&lt;/code&gt; to a &lt;code&gt;CallContext&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;An overarching problem with &lt;code&gt;CallContext&lt;/code&gt; PMCs is that they get used a lot. Every single register access (and almost all opcodes access at least one register) goes through the CallContext. Lexical information is looked up through the CallContext. Backtrace information is looked up in the CallContext. A few other things are looked up there as well. In short, CallContexts are accessed quite a lot.&lt;/p&gt;

&lt;p&gt;Because they are accessed so much, CallContexts ARE NOT dealt with through the normal VTABLE mechanism. Adding in an indirect function call for every single register access would be a huge performance burden. So, instead of doing that, we poke into the data directly and use the raw data pointers to get (and to cache) the things we need.&lt;/p&gt;

&lt;p&gt;And there&amp;#8217;s the rub. For performance we need to be able to poke into a CallContext directly, but for threads we need to pass a Proxy instead of a CallContext. And the pointers for Proxy are not the same as the pointers for CallContext. See the problem?&lt;/p&gt;

&lt;p&gt;I identified this issue earlier in the week and have been thinking it over for a few days. I&amp;#8217;m not sure I&amp;#8217;ve found a workable solution yet. At least, I haven&amp;#8217;t found a solution that wouldn&amp;#8217;t impose some limitations on semantics.&lt;/p&gt;

&lt;p&gt;For instance, in the code example above, the implicit expectation is that the x variable lives on the main thread, but is updated on the second thread. And those updates should be reflected back on main after the &lt;code&gt;wait&lt;/code&gt; opcode.&lt;/p&gt;

&lt;p&gt;The solution I think I have is to create a new dummy CallContext that would pass requests off to the Proxied LexPad. I&amp;#8217;m not sure about some of the individual details, but overall I think this solution should solve our biggest problem. I&amp;#8217;ll probably play with that this weekend and see if I can finally get this branch ready to merge.&lt;/p&gt;

&lt;h3 id='other_status'&gt;Other Status&lt;/h3&gt;

&lt;p&gt;rurban has been doing some great cleanup work with native PBC, something that he&amp;#8217;s been working on (and fighting to work on) for a long time. I&amp;#8217;d really love to see more work done in this area in the future, because there are so many more opportunities for compatibility and interoperability at the bytecode level that we aren&amp;#8217;t exploiting yet.&lt;/p&gt;

&lt;p&gt;Things have otherwise been a little bit slow lately, but between &lt;code&gt;io_cleanup1&lt;/code&gt;, &lt;code&gt;threads&lt;/code&gt; and rurban&amp;#8217;s pbc work, we&amp;#8217;re still making some pretty decent progress on some pretty important areas. If we can get threads fixed and merged soon, I&amp;#8217;ll be on to the next project in the list.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/afwknight/~4/KCw7el5kDMM" height="1" width="1"/&gt;</content>
    <feedburner:origLink>http://whiteknight.github.com/2012/09/14/sept_status_update.html</feedburner:origLink></entry>
    
    <entry>
        <title>io_cleanup1 Lands!</title>
        <link href="http://feedproxy.google.com/~r/afwknight/~3/FjmM1kQ50Ew/io_cleanup1_lands.html" />
        <updated>2012-08-27T00:00:00-07:00</updated>
        <id>http://whiteknight.github.com/2012/08/27/io_cleanup1_lands</id>
        <content type="html">&lt;p&gt;FINALLY! The big day has come. I&amp;#8217;ve just merged &lt;code&gt;whiteknight/io_cleanup1&lt;/code&gt; to master. Let us rejoice!&lt;/p&gt;

&lt;p&gt;When &lt;a href='/2012/05/27/io_cleanup_first_round.html'&gt;I started the project&lt;/a&gt;, months ago, I had intended to work on the branch for maybe a week or two at the most. Get in, clean what I could, get out. Wash, rinse, repeat. That&amp;#8217;s exactly why I named the branch &amp;#8220;io_cleanup1&amp;#8221;, because I intended it to just be the first of what would be a large series of small branches. Unfortunately as I started cleaning I was led to other things that needed to go. And those things led elsewhere. Before I knew it I had deleted just about all the code in all the files in &lt;code&gt;src/io/*&lt;/code&gt; and started rewriting from the ground up.&lt;/p&gt;

&lt;p&gt;Sometimes sticking with a plan and breaking up projects into small milestones is a good thing. Other times, when you know what the final goal is and you&amp;#8217;re willing to put in the effort, it&amp;#8217;s good to just go there directly. That&amp;#8217;s what I ended up doing.&lt;/p&gt;

&lt;p&gt;To give you an idea of what my schedule was originally, I had intended to get this first branch wrapped up and merged before GSOC started, so that I could keep my promise of implementing 6model concurrently with that program. With GSOC over last week (I&amp;#8217;ll write a post-mortem blog entry about it soon), I&amp;#8217;ve clearly failed at that. I&amp;#8217;m extremely happy with the results so far and given the choice I would not go back and do things any differently. The IO system was in terrible condition and it desperately needed this overhaul. I wish it hadn&amp;#8217;t taken me so long, but with a system that&amp;#8217;s so central and important, it was worthwhile taking the extra time to make sure things were correct.&lt;/p&gt;

&lt;p&gt;Where to go from here? My TODO list for the near future is very short:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Threads&lt;/li&gt;

&lt;li&gt;6model&lt;/li&gt;

&lt;li&gt;More IO work&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The Threads branch, the magnum opus of Parrot hacker &lt;strong&gt;nine&lt;/strong&gt; is 99.9% of the way there. If we can just push it up over the cliff, we should be able to merge soon and open up a whole new world of functionality and cool features for Parrot. I&amp;#8217;m already planning out all the cool additions to Rosella I&amp;#8217;m going to make once threads are merged: Parallel test harness. Asynchronous network requests, an IRC client library. The addition of a real, sane threading system opens up so many avenues to us that really haven&amp;#8217;t been available before. Sure there are going to be plenty of hiccups and speedbumps to deal with as we really get down and start to use this system for real things, but the merge of the threads branch represents a huge step forward and a great foundation to build upon.&lt;/p&gt;

&lt;p&gt;I&amp;#8217;m going to be putting forward as much effort as I can to getting this branch wrapped up and merged. Some of the remaining problems only manifest on hard-to-test platforms, which is where things start to get tricky. As I mentioned in an email to parrot-dev a while ago, test reports on rare platforms are great, but if we can&amp;#8217;t take action on the reported failures we can get ourselves into something of a bind. The capability to find problems on those platforms and the capability to fix problems on those platforms are two very different capabilities. But, most of the time that&amp;#8217;s a small issue and we&amp;#8217;re going to just have to find a way to muscle through and get this branch merged one way or the other. If we can merge it without purposefully excluding any platforms, that would be great.&lt;/p&gt;

&lt;p&gt;Before anybody thinks that I&amp;#8217;m done with IO and that system is now complete, think again. There is still plenty of work to be done on the IO subsystem, and all sorts of cool new features that become possible with the new architecture and unified type semantics. I want to separate out Pipe logic from FileHandle into a new dedicated PMC type. Opening FileHandles in &amp;#8220;p&amp;#8221; mode for pipes is clumsy at best, and I want a more sane system. And while I&amp;#8217;m at it, 2-way and 3-way pipes would make for a great feature addition (we can&amp;#8217;t currently do these in any reliable way).&lt;/p&gt;

&lt;p&gt;The one thing that has changed most dramatically in the new IO system is buffers. The buffering subsystem has not only been rewritten but completely redesigned. Instead of being type-specific they are now unified and type independent. Buffers are their own struct with their own API. Instead of having a single buffer that is used for both read and write, handles now have separate read and write buffers that can be created and managed independently. I want to create a new PMC type to wrap these buffers and give the necessary management interface so they can be used effectively from the PIR level and above.&lt;/p&gt;
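
&lt;p&gt;As a rough picture of the new arrangement, here&amp;#8217;s a Python sketch. The class and method names are invented for illustration and don&amp;#8217;t correspond to the C structs or functions in the branch. What it shows is a buffer type that knows nothing about the handle it belongs to, and a handle that carries separate, optional read and write buffers.&lt;/p&gt;

```python
class IOBuffer:
    """Hypothetical type-independent buffer with its own small API."""

    def __init__(self, size):
        self.size = size
        self.data = bytearray()

    def add(self, chunk):
        # Append raw bytes; the buffer does not care where they came from.
        self.data.extend(chunk)

    def drain(self):
        # Hand back everything buffered so far and empty the buffer.
        out = bytes(self.data)
        self.data.clear()
        return out


class Handle:
    """Any handle type can carry independent read and write buffers."""

    def __init__(self):
        self.read_buffer = None
        self.write_buffer = None


handle = Handle()
handle.write_buffer = IOBuffer(4096)  # write-buffered, reads unbuffered
handle.write_buffer.add(b"abc")
flushed = handle.write_buffer.drain()
```

&lt;p&gt;Because the two buffers are independent, a handle can be buffered for writing only, for reading only, for both, or for neither.&lt;/p&gt;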

&lt;p&gt;Finally, the &lt;code&gt;whiteknight/io_cleanup1&lt;/code&gt; branch tried to stay as backwards compatible as possible, so many breaking changes I wanted to make had to wait until later. In the future expect to see many smaller branches to remove old broken features, old crufty interfaces, and old bad semantics. We&amp;#8217;ll make these kinds of disruptive changes in much smaller batches, with more space between them.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/afwknight/~4/FjmM1kQ50Ew" height="1" width="1"/&gt;</content>
    <feedburner:origLink>http://whiteknight.github.com/2012/08/27/io_cleanup1_lands.html</feedburner:origLink></entry>
    
    <entry>
        <title>Parrot 4.7.0 "Hispaniolan" Released!</title>
        <link href="http://feedproxy.google.com/~r/afwknight/~3/e_48ZrYVca8/parrot_4_7_0.html" />
        <updated>2012-08-22T00:00:00-07:00</updated>
        <id>http://whiteknight.github.com/2012/08/22/parrot_4_7_0</id>
        <content type="html">&lt;p&gt;On behalf of the Parrot team, I&amp;#8217;m proud to announce Parrot 4.7.0, also known as &amp;#8220;Hispaniolan&amp;#8221;. &lt;a href='http://parrot.org/'&gt;Parrot&lt;/a&gt; is a virtual machine aimed at running all dynamic languages.&lt;/p&gt;

&lt;p&gt;Parrot 4.7.0 is available on &lt;a href='ftp://ftp.parrot.org/pub/parrot/releases/devel/4.7.0/'&gt;Parrot&amp;#8217;s FTP site&lt;/a&gt;, or by following the download instructions at &lt;a href='http://parrot.org/download'&gt;http://parrot.org/download&lt;/a&gt;. For those who would like to develop on Parrot, or help develop Parrot itself, we recommend using Git to retrieve the source code to get the latest and best Parrot code.&lt;/p&gt;

&lt;p&gt;Parrot 4.7.0 News:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;- Core
    + Added .all_tags() and .all_tagged_pmcs() methods to PackfileView PMC
    + Several build and coding standards fixes&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The SHA256 message digests for the downloadable tarballs are:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;4360ac3dffafffaa00bce561c1329df8ad134019f76930cf24e7a875a4422a90 parrot-4.7.0.tar.bz2
c0bffd371dea653b9881ab2cc9ae5a57dc9f531dfcda0a604ea693c9d2165619 parrot-4.7.0.tar.gz&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Many thanks to all our contributors for making this possible, and our sponsors for supporting this project. Our next scheduled release is 18 September 2012.&lt;/p&gt;

&lt;p&gt;The release is indeed out a day late. It&amp;#8217;s not that I forgot about it, it&amp;#8217;s just that I can&amp;#8217;t read a calendar and HOLY CRAP, IT&amp;#8217;S WEDNESDAY ALREADY? When did that happen? So, and I can&amp;#8217;t stress this enough, &lt;strong&gt;Mea Culpa&lt;/strong&gt;.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/afwknight/~4/e_48ZrYVca8" height="1" width="1"/&gt;</content>
    <feedburner:origLink>http://whiteknight.github.com/2012/08/22/parrot_4_7_0.html</feedburner:origLink></entry>
    
    <entry>
        <title>io_cleanup1 Done?</title>
        <link href="http://feedproxy.google.com/~r/afwknight/~3/1SuaGOZWP6k/io_cleanup1_done.html" />
        <updated>2012-07-22T00:00:00-07:00</updated>
        <id>http://whiteknight.github.com/2012/07/22/io_cleanup1_done</id>
        <content type="html">&lt;p&gt;This morning I made a few last commits on my &lt;code&gt;whiteknight/io_cleanup1&lt;/code&gt; branch, and I&amp;#8217;m cautiously optimistic that the branch is now ready to merge. The last remaining issue, which has taken the last few days to resolve, has been fixing readline semantics to match some old behavior.&lt;/p&gt;

&lt;p&gt;A few days ago I wrote a post about &lt;a href='/2012/06/13/io_readline.html'&gt;how complicated readline is&lt;/a&gt;. At the time, I thought I had the whole issue under control. But then Moritz pointed out a problem with a particular feature unique to Socket that was missing in the new branch.&lt;/p&gt;

&lt;p&gt;In master, you could pass in a custom delimiter sequence as a string to the &lt;code&gt;.readline()&lt;/code&gt; method. Rakudo was using this feature like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;str = s.readline(&amp;quot;\r\n&amp;quot;)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Of course, as I&amp;#8217;ve pointed out in the post about readline and elsewhere, there was no consistency between the three major builtin types: FileHandle, Socket and StringHandle. The closest thing we could do with FileHandle is this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;f.record_separator(&amp;quot;\n&amp;quot;);
str = f.readline();&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Notice two big differences between FileHandle and Socket here: First, FileHandle has a separate &lt;code&gt;record_separator&lt;/code&gt; method that must be called on its own to change the separator, and the record separator is stored as state on the FileHandle between &lt;code&gt;.readline()&lt;/code&gt; calls. Second, FileHandle&amp;#8217;s record separator sequence may only be a single character. Internally, it&amp;#8217;s stored as an &lt;code&gt;INTVAL&lt;/code&gt; for a single codepoint instead of as a &lt;code&gt;STRING*&lt;/code&gt;, even though the &lt;code&gt;.record_separator()&lt;/code&gt; method takes a &lt;code&gt;STRING*&lt;/code&gt; argument (and extracts the first codepoint from it).&lt;/p&gt;

&lt;p&gt;Initially in the &lt;code&gt;io_cleanup1&lt;/code&gt; branch I used the FileHandle semantics to unify the code because I wasn&amp;#8217;t aware that Socket didn&amp;#8217;t have the same restrictions that FileHandle did, even if the interface was a little bit different. I also didn&amp;#8217;t think that the Socket version would be so much more flexible despite the much smaller size of the code to implement it. In short, I really just didn&amp;#8217;t look at it closely enough and assumed the two were more similar than they actually were. Why would I ever assume that this subsystem ever had &amp;#8220;consistency&amp;#8221; as a driving design motivation?&lt;/p&gt;

&lt;p&gt;So I rewrote readline. From scratch.&lt;/p&gt;

&lt;p&gt;The new system follows the more flexible Socket semantics for all types. Now you can use almost any arbitrary string as the record separator for &lt;code&gt;.readline()&lt;/code&gt; on FileHandle, StringHandle and Socket. In the &lt;code&gt;whiteknight/io_cleanup1&lt;/code&gt; branch, as of this morning, you can now do this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;var f = new &amp;#39;FileHandle&amp;#39;;
f.open(&amp;#39;foo.txt&amp;#39;, &amp;#39;r&amp;#39;);
f.record_separator(&amp;quot;TEST&amp;quot;);
string s = f.readline();&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&amp;#8230;And you can also do this, which is functionally equivalent:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;var f = new &amp;#39;FileHandle&amp;#39;;
f.open(&amp;#39;foo.txt&amp;#39;, &amp;#39;r&amp;#39;);
string s = f.readline(&amp;quot;TEST&amp;quot;);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The same two code snippets should work the same for all built-in handle types. For all types, if you don&amp;#8217;t specify a record separator by either method, it defaults to &amp;#8220;\n&amp;#8221;.&lt;/p&gt;
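
&lt;p&gt;To make the defaulting rule concrete, here is a minimal C sketch. The helper &lt;code&gt;effective_separator&lt;/code&gt; is hypothetical and is not a Parrot function, and the precedence it shows (an argument passed to &lt;code&gt;.readline()&lt;/code&gt; wins over a stored separator) is an assumption for illustration only:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;#include &amp;lt;stddef.h&amp;gt;

/* Hypothetical helper, not part of Parrot. Picks the separator to use for
   one readline call. Assumption: a per-call argument takes precedence over
   a separator stored on the handle. */
static const char *
effective_separator(const char *call_arg, const char *stored)
{
    if (call_arg != NULL)
        return call_arg;    /* separator passed directly to readline */
    if (stored != NULL)
        return stored;      /* separator set earlier via record_separator */
    return &amp;quot;\n&amp;quot;;            /* neither given: fall back to a newline */
}&lt;/code&gt;&lt;/pre&gt;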

&lt;p&gt;Above I mentioned that almost any arbitrary string should work. I use the word &amp;#8220;almost&amp;#8221; because there are some restrictions. First and foremost, the delimiter string cannot be larger than half the size of the buffer. Since buffers are sized in bytes, this is a byte-length restriction, not a character-length restriction. In practice we know that delimiters are typically things like &amp;#8220;\n&amp;#8221;, &amp;#8220;\r\n&amp;#8221;, &amp;#8220;,&amp;#8221;, etc. So if the buffer is a few kilobytes this isn&amp;#8217;t a meaningful limitation. Also, the delimiter must be in the same encoding that the handle uses, or it must be convertible to that encoding. So if your handle uses &lt;code&gt;ascii&lt;/code&gt;, but you pass in a delimiter which is &lt;code&gt;utf16&lt;/code&gt;, you may see some exceptions raised.&lt;/p&gt;
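
&lt;p&gt;As a rough illustration of the size rule, here is a minimal C sketch. The helper name &lt;code&gt;delimiter_fits_buffer&lt;/code&gt; is hypothetical and is not part of Parrot; it only shows that the comparison is made in bytes, and it additionally assumes that an empty delimiter should be rejected:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;#include &amp;lt;stddef.h&amp;gt;

/* Hypothetical helper, not part of Parrot. Returns 1 when a delimiter of
   delim_bytes bytes is acceptable for a buffer of buffer_bytes bytes,
   and 0 otherwise. Both sizes are byte counts, not character counts. */
static int
delimiter_fits_buffer(size_t delim_bytes, size_t buffer_bytes)
{
    /* Assumption: an empty delimiter can never match, so reject it. */
    if (delim_bytes == 0)
        return 0;

    /* The delimiter may be at most half the size of the buffer. */
    return delim_bytes &amp;lt;= buffer_bytes / 2;
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With a 4096-byte buffer, a two-byte &amp;#8220;\r\n&amp;#8221; passes easily, a 2048-byte delimiter is the largest that passes, and a 2049-byte delimiter is rejected.&lt;/p&gt;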

&lt;p&gt;I think that the work on this branch, save for a few small tweaks, is done. I&amp;#8217;ve done some testing myself and have asked for help to get it tested by a wider audience. Hopefully we can get this branch merged this month, if no other problems are found.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/afwknight/~4/1SuaGOZWP6k" height="1" width="1"/&gt;</content>
    <feedburner:origLink>http://whiteknight.github.com/2012/07/22/io_cleanup1_done.html</feedburner:origLink></entry>
    
    <entry>
        <title>IO Cleanups Home Stretch</title>
        <link href="http://feedproxy.google.com/~r/afwknight/~3/lmRViXRg5ds/io_cleanup_final.html" />
        <updated>2012-07-10T00:00:00-07:00</updated>
        <id>http://whiteknight.github.com/2012/07/10/io_cleanup_final</id>
        <content type="html">&lt;p&gt;I made a round of encoding-related fixes in the &lt;code&gt;whiteknight/io_cleanup1&lt;/code&gt; branch a few days ago. Rakudo hacker Moritz was able to take a look at Rakudo&amp;#8217;s spectests and verify that more tests were indeed passing because of it. The remaining test failures represent the changing semantics for the &lt;code&gt;read&lt;/code&gt; method and what appear to be two genuine regressions or bugs.&lt;/p&gt;

&lt;p&gt;Hopefully I will be able to get all these things sorted out this week before I go away on a mini vacation next weekend. Otherwise I can&amp;#8217;t imagine this branch getting merged before the 4.6 release this month.&lt;/p&gt;

&lt;p&gt;A few days ago I wrote a &lt;a href='/2012/06/13/io_readline.html'&gt;post about readline&lt;/a&gt; and some of the intricacies involved in that, and some of the weird semantics that I was attempting to unify. It turns out that some of these semantics are a major cause of one of the last bugs in the branch. Let&amp;#8217;s look at some code in master to see where the hangup is. First, &lt;code&gt;readline&lt;/code&gt; on a Socket:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;METHOD readline(STRING *delimiter    :optional,
                INTVAL has_delimiter :opt_flag) {
    INTVAL idx;
    STRING *result;
    STRING *buf;
    GET_ATTR_buf(INTERP, SELF, buf);

    if (!has_delimiter)
        delimiter = CONST_STRING(INTERP, &amp;quot;\n&amp;quot;);

    if (Parrot_io_socket_is_closed(INTERP, SELF))
        RETURN(STRING * STRINGNULL);

    if (buf == STRINGNULL)
        buf = Parrot_io_reads(INTERP, SELF, CHUNK_SIZE);

    while ((idx = Parrot_str_find_index(INTERP, buf, delimiter, 0)) &amp;lt; 0) {
        STRING * const more = Parrot_io_reads(INTERP, SELF, CHUNK_SIZE);
        if (Parrot_str_length(INTERP, more) == 0) {
            SET_ATTR_buf(INTERP, SELF, STRINGNULL);
            RETURN(STRING *buf);
        }
        buf = Parrot_str_concat(INTERP, buf, more);
    }

    idx += Parrot_str_length(INTERP, delimiter);
    result = Parrot_str_substr(INTERP, buf, 0, idx);
    buf = Parrot_str_substr(INTERP, buf, idx, Parrot_str_length(INTERP, buf) - idx);
    SET_ATTR_buf(INTERP, SELF, buf);
    RETURN(STRING *result);
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We can ignore the fact that this implementation of &lt;code&gt;readline&lt;/code&gt; doesn&amp;#8217;t call &lt;code&gt;Parrot_io_readline&lt;/code&gt; like every other PMC does. Or that if we did call that function the program would throw an exception because &lt;code&gt;Parrot_io_readline&lt;/code&gt; doesn&amp;#8217;t support sockets anyway. Whatever. Moving on&amp;#8230;&lt;/p&gt;

&lt;p&gt;For comparison, let&amp;#8217;s look at the version from the Handle PMC (which is inherited by FileHandle):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;METHOD readline() {
    STRING * const string_result = Parrot_io_readline(INTERP, SELF);
    RETURN(STRING *string_result);
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The Socket version takes a &lt;code&gt;delimiter&lt;/code&gt; parameter which is a STRING. When doing readline on a Socket, you can pass in any arbitrary string which is used as the token for end of line. With FileHandle, you don&amp;#8217;t seem to have that option. You can definitely use custom delimiters with FileHandle, but we clearly don&amp;#8217;t take a delimiter here and we aren&amp;#8217;t passing one in as an argument to &lt;code&gt;Parrot_io_readline&lt;/code&gt; like we do in the branch. Let&amp;#8217;s see how it&amp;#8217;s done instead. Here&amp;#8217;s a snippet from Handle PMC:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;    ATTR INTVAL    record_separator;  /* Record separator (only single char supported) */&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We don&amp;#8217;t need to look at any other code. This is the smoking gun. &lt;code&gt;Socket.readline()&lt;/code&gt; can take any arbitrary STRING to use as a record separator, but &lt;code&gt;FileHandle.readline()&lt;/code&gt; can only use a single codepoint, which it doesn&amp;#8217;t take as an argument.&lt;/p&gt;

&lt;p&gt;So that&amp;#8217;s the problem right there. When I standardized the readline mechanics between types, I picked the FileHandle semantics. This was probably the wrong decision, because not only could Sockets use a more general mechanism but Rakudo relies on that behavior in its spectests. This does raise a question about why nobody ever expected this same behavior from FileHandle, or why the difference was not considered some kind of bug. It really goes to show how immature our IO system has been for all these years, and how we had all just grown accustomed to the arbitrary, inconsistent, nonsensical behaviors. It just works for some basic usages, so nobody ever complains about it. That time is, thankfully, coming quickly to an end.&lt;/p&gt;

&lt;p&gt;Fixing this issue is actually going to take some serious work. Several function signatures are going to need updating to take a STRING delimiter instead of an INTVAL codepoint, and a major chunk of buffering logic is going to need to be rewritten to work on substrings instead of on individual codepoints. This, in turn, is going to require a heck of a lot more testing.&lt;/p&gt;
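
&lt;p&gt;To show why substring delimiters complicate the buffering, here is a minimal C sketch that works on raw bytes. Both helpers are hypothetical and are not taken from the branch. The point is that a multi-byte delimiter can straddle the boundary between two reads, so after a refill the search has to back up by the delimiter length minus one, which a single-codepoint search never needs to do:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;#include &amp;lt;stddef.h&amp;gt;
#include &amp;lt;string.h&amp;gt;

/* Hypothetical helper, not part of Parrot. Finds the first occurrence of
   delim (dlen bytes) in buf[0..len) at or after offset from. Returns the
   byte index of the match, or -1 if there is none. */
static long
find_delim(const char *buf, size_t len,
           const char *delim, size_t dlen, size_t from)
{
    size_t i;

    if (dlen == 0 || len &amp;lt; dlen)
        return -1;

    for (i = from; i + dlen &amp;lt;= len; i++) {
        if (memcmp(buf + i, delim, dlen) == 0)
            return (long)i;
    }
    return -1;
}

/* Hypothetical helper. After a search over the first old_len bytes found
   nothing and more data was appended, this is the earliest offset at which
   a new match could start, including one that straddles the boundary. */
static size_t
resume_offset(size_t old_len, size_t dlen)
{
    return (old_len &amp;gt;= dlen) ? old_len - dlen + 1 : 0;
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;For example, if the first read ends with &amp;#8220;\r&amp;#8221; and the next read starts with &amp;#8220;\n&amp;#8221;, resuming the search at the start of the new data would miss the &amp;#8220;\r\n&amp;#8221; entirely.&lt;/p&gt;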

&lt;p&gt;Last night I started putting in some of the changes necessary to use a substring terminator instead of a single codepoint. Most of what I&amp;#8217;ve already done has been modifying function signatures. The real changes need to occur deep within the buffering logic and will require a little bit more time.&lt;/p&gt;

&lt;p&gt;I&amp;#8217;m looking forward to getting this branch fixed up and merged back to master so I can get to work on my next project. I think 6model is going to be the next thing I dig into, before I find something else that annoys me enough to put in a huge amount of effort to rewrite it. I&amp;#8217;ll post more updates about my future projects and plans as I go.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/afwknight/~4/lmRViXRg5ds" height="1" width="1"/&gt;</content>
    <feedburner:origLink>http://whiteknight.github.com/2012/07/10/io_cleanup_final.html</feedburner:origLink></entry>
    
    <entry>
        <title>HTML with Rosella Xml and Net</title>
        <link href="http://feedproxy.google.com/~r/afwknight/~3/Bi-cUawaalM/html_with_rosella.html" />
        <updated>2012-06-30T00:00:00-07:00</updated>
        <id>http://whiteknight.github.com/2012/06/30/html_with_rosella</id>
        <content type="html">&lt;p&gt;HTML is a derivative of SGML, just like XML is. Sure, they look pretty much the same for the most part, but there are a few key differences that prevent HTML from being parsed exactly like XML. Part of the reason why I like XHTML so much is that it&amp;#8217;s more usable with more parsers, including many simpler and full-featured XML parsers. Simplicity in parsing was one of the original motivations of the XML design, at least in comparison to a full SGML parser or even something like a full HTML parser.&lt;/p&gt;

&lt;p&gt;But that&amp;#8217;s all beside the point.&lt;/p&gt;

&lt;p&gt;I&amp;#8217;ve been on something of a backyard gardening kick lately. We bought our house only a few short months ago, and are only halfway through the first summer growing season in my modest little garden. My plans for next year are much more expansive. I&amp;#8217;ve finally talked my wife into letting me buy some cherry trees to plant. She was also pretty willing to get a few grape vines planted (especially when I sketched out the beautiful wooden arbor they would be growing on). She put her foot down when I started talking about blueberries, apples and pears, however. And another garden bed or two for more vegetables. For some reason she&amp;#8217;s convinced that we need some measure of open space in our little plot so the kid has somewhere to run and play. Some people have weird priorities.&lt;/p&gt;

&lt;p&gt;This is all sort of beside the point too.&lt;/p&gt;

&lt;p&gt;Getting the things I need for all this gardening work I&amp;#8217;ve talked myself into is not cheap. Cherry tree seeds actually &lt;em&gt;do grow on trees&lt;/em&gt; so that&amp;#8217;s not a big deal, but other things like fertilizers, soil amendments, tools, materials for building a grape trellis and raised garden beds, not to mention a longer hose to reach all the new things that are going to require regular watering, all cost money. And maybe a sprinkler, like one of those fancy ones on an electronic timer. I can avoid some of that cost by getting things used and at discount on sites like Craigslist. So I&amp;#8217;ve been going there. Every day.&lt;/p&gt;

&lt;p&gt;And it&amp;#8217;s tedious. I have to sort through hundreds of listings for things I don&amp;#8217;t want, in categories that seem far too coarse. Sometimes, because things often get incorrectly categorized, I have to look in other related categories too, sorting through things that are even less relevant on average to try and find the occasional gem. This is all on top of the hardware-related problems I have: I&amp;#8217;m unable to use the trackpad on my laptop, so web navigation on sites without keyboard shortcuts is an extreme pain. I start to think to myself: I can do better, I&amp;#8217;m a programmer! For some values of &amp;#8220;better&amp;#8221; and &amp;#8220;programmer&amp;#8221;.&lt;/p&gt;

&lt;p&gt;Enter Rosella. Now with Parrot, Winxed and Rosella I can use the Net library to fetch the text of the HTML code of the page. After some hacking in the last few days, I can parse that code with my Xml library (set in a new lenient mode) and start to work with it in a meaningful way:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;function main[main]() {
    var rosella = load_packfile(&amp;quot;rosella/core.pbc&amp;quot;);
    Rosella.initialize_rosella(&amp;quot;xml&amp;quot;, &amp;quot;net&amp;quot;, &amp;quot;string&amp;quot;);

    var ua = new Rosella.Net.UserAgent.SimpleHttp();
    var response = ua.get(&amp;quot;http://philadelphia.craigslist.org/w4m/&amp;quot;);
    var doc = Rosella.Xml.read_string(response.content, false);

    doc.get_document_root()
        .get_children_named(&amp;quot;body&amp;quot;)
        .get_children_named(&amp;quot;blockquote&amp;quot;)
        .get_children_named(&amp;quot;p&amp;quot;, &amp;quot;row&amp;quot;:[named(&amp;quot;class&amp;quot;)])
        .map(function(node) {
            return {
                &amp;quot;title&amp;quot;: node.first_child(&amp;quot;a&amp;quot;).get_inner_xml(),
                &amp;quot;link&amp;quot;:  node.first_child(&amp;quot;a&amp;quot;).attributes[&amp;quot;href&amp;quot;],
                &amp;quot;price&amp;quot;: node.first_child(&amp;quot;span&amp;quot;, &amp;quot;itempp&amp;quot;:[named(&amp;quot;class&amp;quot;)]).get_inner_xml(),
                &amp;quot;has_pic&amp;quot;: !Rosella.String.null_or_empty(
                    node.first_child(&amp;quot;span&amp;quot;, &amp;quot;itempx&amp;quot;:[named(&amp;quot;class&amp;quot;)]).get_inner_xml()
                )
            };
        })
        .filter(function(obj) {
            return indexof(obj[&amp;quot;title&amp;quot;], &amp;quot;compost&amp;quot;) &amp;gt;= 0;
        })
        .map(function(obj) {
            return Rosella.String.format_obj(&amp;quot;&amp;lt;a href=&amp;#39;{link}&amp;#39;&amp;gt;{title} for {price}&amp;lt;/a&amp;gt;&amp;quot;, obj);
        })
        .foreach(function(string s) { say(s); });
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;That second argument to &lt;code&gt;Rosella.Xml.read_string&lt;/code&gt; tells the parser to go into &amp;#8220;non-strict&amp;#8221; mode, which is basically my attempt to fudge the XML parsing rules to allow for the SGML nonsense in HTML. Without that, the parser will blow up pretty early in the parse because of unbalanced tags. The XML parser by default does not handle tags which are not balanced and which do not have the trailing slash to indicate a standalone tag, and the Craigslist source is filled with those kinds of things.&lt;/p&gt;

&lt;p&gt;All I need to do is set this scraper up on a timer, and have it send me results somehow. If I set up a small server with mod_parrot and some kind of tool for generating RSS feeds, I could have this output neatly delivered to me on a regular basis. Considering that mod_parrot is moving along so smoothly and RSS is just another XML format, I think this is a pretty reasonable idea.&lt;/p&gt;

&lt;p&gt;So, I started working on that. As of last night, I&amp;#8217;ve sketched out two small libraries, one for RSS feeds and one for the competing standard, Atom. These libraries are thin wrappers around the XML library to deal with the specifics of RSS and Atom. Here&amp;#8217;s an example of consuming an RSS feed:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;var rss = Rosella.Rss.read_url(&amp;quot;http://www.parrot.org/rss.xml&amp;quot;);
rss
    .channels()
    .first()
    .items()
    .foreach(function(i) {
        say(Rosella.String.format_obj(&amp;quot;{title} (by {creator}) : {description}&amp;quot;, i));
    });&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You can do almost exactly the same thing with an Atom feed too, if you&amp;#8217;ve got one of those instead. Right now RSS and Atom are implemented in two separate libraries, but I may combine them together for simplicity and to avoid unnecessary code duplication.&lt;/p&gt;

&lt;p&gt;I&amp;#8217;m working on an interface to write and publish feeds as well, though that&amp;#8217;s not quite ready yet. You can bet that when I&amp;#8217;ve got that working, I&amp;#8217;ll be setting up a copy of mod_parrot to use it with.&lt;/p&gt;

&lt;p&gt;I&amp;#8217;ve been sort of kicking around the idea of a specialized HTML parsing library, which would more or less be an SGML parser with some schema information. I&amp;#8217;m not sure I want to get into that hassle because HTML is a pretty messy thing and it will take a huge amount of effort to get something that works most of the time. But, if you&amp;#8217;re willing to put up with a little bit of oddity, the Xml library works well enough for many cases.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/afwknight/~4/Bi-cUawaalM" height="1" width="1"/&gt;</content>
    <feedburner:origLink>http://whiteknight.github.com/2012/06/30/html_with_rosella.html</feedburner:origLink></entry>
    
    <entry>
        <title>Sockets and Encodings</title>
        <link href="http://feedproxy.google.com/~r/afwknight/~3/8URf0FpdBtU/io_socket_encodings.html" />
        <updated>2012-06-28T00:00:00-07:00</updated>
        <id>http://whiteknight.github.com/2012/06/28/io_socket_encodings</id>
        <content type="html">&lt;p&gt;The &lt;code&gt;io_cleanup1&lt;/code&gt; branch is nearing completion, though as always the last few details are what holds everything up. In the past few days all the remaining tests in the parrot repo were passing. The coding standards tests were, as usual, the last to be resolved. Then I started building and testing other things on the branch: Winxed builds and tests fine. So does Rosella. Then I looked at NQP and Rakudo. Both built fine, but Rakudo was failing two socket-related spectests.&lt;/p&gt;

&lt;p&gt;That&amp;#8217;s not entirely unexpected. Even though my intention was to make this branch as painless as possible there were still some unavoidable changes to interfaces and semantics. There are a few places where older semantics are surrounded by large &lt;code&gt;/* HACK! */&lt;/code&gt; comments, but for the most part I&amp;#8217;ve tried to make everything sane. That&amp;#8217;s why I wasn&amp;#8217;t surprised to see Rakudo failing a few tests. I was much more surprised that Rakudo built without any problems the first time I tried it. I figured the test failures represented some kind of semantic mismatch, and getting Rakudo passing again would have been as easy as getting the old semantics returned, with a note about a future update path.&lt;/p&gt;

&lt;p&gt;It turns out this wasn&amp;#8217;t exactly the case. For one test it was the simple difference in the way we read on streams with multibyte encodings. This was expected and we can fix it to use the old behavior if that&amp;#8217;s what Rakudo prefers. For the second failing test, it&amp;#8217;s not that there&amp;#8217;s a semantic difference per se, but instead there is a glaring and serious bug in master that was corrected in the new branch. Here, I&amp;#8217;m going to explain what&amp;#8217;s going on.&lt;/p&gt;

&lt;p&gt;Look at this code:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Parrot_io_recv_handle(PARROT_INTERP, ARGMOD(PMC *pmc), size_t len)
{
    Parrot_Socket_attributes * const io = PARROT_SOCKET(pmc);

    /* This must stay ASCII to make Rakudo and UTF-8 work for now */
    STRING * res    = Parrot_str_new_noinit(interp, len);
    INTVAL received = Parrot_io_recv(interp, io-&amp;gt;os_handle,
                                     res-&amp;gt;strstart, len);

    res-&amp;gt;bufused = received;
    res-&amp;gt;strlen  = received;

    return res;
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This is a pared-down version of the code behind the &lt;code&gt;recv&lt;/code&gt; method on Socket. It creates a new string with the specified length pre-allocated, then passes the buffer to the low-level &lt;code&gt;recv&lt;/code&gt; C API (which has been abstracted a little to account for platform differences).&lt;/p&gt;

&lt;p&gt;Notice the comment there in the middle which says the string uses the ASCII encoding, for use by Rakudo. This is what I saw, and this is the semantic I followed in the new system: When you read from a socket by default in the new system, the string is encoded as ASCII unless you specify differently.&lt;/p&gt;

&lt;p&gt;Just for my own verification, I had to look at the &lt;code&gt;Parrot_str_new_noinit&lt;/code&gt; function to verify that the string was, in fact, being set to ASCII:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Parrot_str_new_noinit(PARROT_INTERP, UINTVAL capacity)
{
    STRING * const s = Parrot_gc_new_string_header(interp, 0);
    s-&amp;gt;encoding = Parrot_default_encoding_ptr;

    Parrot_gc_allocate_string_storage(interp, s,
        (size_t)string_max_bytes(interp, s, capacity));

    return s;
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Elsewhere in the system, we have this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Parrot_default_encoding_ptr = Parrot_ascii_encoding_ptr;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So yes, the string returned by the Socket does indeed use the ASCII encoding in master. And, after double-checking, the version in the &lt;code&gt;io_cleanup1&lt;/code&gt; branch was using ASCII also. However, in the new branch Rakudo&amp;#8217;s test fails because of an exception about a lossy conversion of non-ASCII data into the lower bit-width format. A quick check shows that both systems create an ASCII string buffer and both systems call the same &lt;code&gt;recv&lt;/code&gt; function to fill it. So where&amp;#8217;s the problem? What the hell?&lt;/p&gt;

&lt;p&gt;For comparison, here&amp;#8217;s the snippet of code from the new branch that reads data into a STRING, possibly using a buffer:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;bytes_read = Parrot_io_buffer_read_b(interp, buffer, handle, vtable,
                                   s-&amp;gt;strstart + s-&amp;gt;bufused, byte_length);
s-&amp;gt;bufused += bytes_read;
STRING_scan(interp, s);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We&amp;#8217;re reading out a number of bytes, appending them into the string&amp;#8217;s pre-allocated storage and updating the number of bytes actually used. That&amp;#8217;s all the same as in master. However, the last line, &lt;code&gt;STRING_scan&lt;/code&gt; does not appear in master. What is it?&lt;/p&gt;

&lt;p&gt;&lt;code&gt;STRING_scan()&lt;/code&gt; loops through the data in the string to verify that it correctly matches the string&amp;#8217;s encoding. For instance, if the string is encoded as ASCII, &lt;code&gt;STRING_scan&lt;/code&gt; will loop through to make sure all character values are lower than 128. If the string is UTF-16, &lt;code&gt;STRING_scan&lt;/code&gt; verifies that we have an even number of bytes and that each value is an acceptable codepoint.&lt;/p&gt;
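
&lt;p&gt;The kind of check described here can be sketched in a few lines of C. These helpers are hypothetical stand-ins, not Parrot&amp;#8217;s actual &lt;code&gt;STRING_scan&lt;/code&gt;: the first covers only the ASCII case, and the second covers only the even-byte-count half of the UTF-16 case, leaving out any per-codepoint validation:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;#include &amp;lt;stddef.h&amp;gt;

/* Hypothetical stand-in for the ASCII case of a scan. Returns 1 when
   every byte is below 128, and 0 as soon as one is not. */
static int
bytes_are_ascii(const unsigned char *data, size_t len)
{
    size_t i;

    for (i = 0; i &amp;lt; len; i++) {
        if (data[i] &amp;gt;= 128)
            return 0;
    }
    return 1;
}

/* Hypothetical partial check for UTF-16: the data must contain a whole
   number of 16-bit units. Codepoint validation is deliberately omitted. */
static int
utf16_length_is_even(size_t len)
{
    return len % 2 == 0;
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Any UTF-8 sequence for a non-ASCII character contains bytes of 128 or above, so data like that fails the first check when the string is marked as ASCII.&lt;/p&gt;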

&lt;p&gt;&lt;code&gt;master&lt;/code&gt; doesn&amp;#8217;t do this, which means there is a bug. In master, we don&amp;#8217;t scan the string after &lt;code&gt;recv&lt;/code&gt; but before we return it to the user, which means we can have non-ASCII data in a string marked with the ASCII encoding. The Rakudo test puts UTF-8 data into the socket on the server side, and then reads out a string and encodes that to UTF-8 to verify that it comes out correctly. However in the new branch we actually check that the string is valid before giving it out to user code, and it isn&amp;#8217;t, so we throw an exception.&lt;/p&gt;

&lt;p&gt;Combine that with the fact that the Socket PMC has no way to change the encoding it uses in master, and all Sockets used in Parrot master are potential sources of bugs.&lt;/p&gt;

&lt;p&gt;Two nights ago I added methods to Socket to get/set the encoding to use, and everybody&amp;#8217;s favorite Moritz created a branch for Rakudo to use it. Last night I did some playing with default encodings. Tonight and into the weekend I&amp;#8217;m hoping to wrap up the last few details to get the Rakudo spectest passing like normal again. Hopefully, if all goes well, we can start talking about a merger within the next week or two.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/afwknight/~4/8URf0FpdBtU" height="1" width="1"/&gt;</content>
    <feedburner:origLink>http://whiteknight.github.com/2012/06/28/io_socket_encodings.html</feedburner:origLink></entry>
    
    <entry>
        <title>Reading a Line of Text</title>
        <link href="http://feedproxy.google.com/~r/afwknight/~3/ciFE5ekaOS0/io_readline.html" />
        <updated>2012-06-13T00:00:00-07:00</updated>
        <id>http://whiteknight.github.com/2012/06/13/io_readline</id>
        <content type="html">&lt;p&gt;In terms of usage, there aren&amp;#8217;t too many IO-related features in Parrot&amp;#8217;s user interface more straightforward than the &lt;code&gt;readline&lt;/code&gt; method. It does exactly what you tell it to do: read a line of text from the given file and return that line of text as a Parrot string. Easy.&lt;/p&gt;

&lt;p&gt;Tonight I was looking at some of the old code to get an idea about expected semantics for some tests that need fixing. Let&amp;#8217;s look at some code:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;.sub read_a_line
    .param string type
    $P0 = new [type]
    $S0 = $P0.&amp;#39;readline&amp;#39;()
    .return($S0)
.end

.sub test_readline
    $S0 = &amp;#39;read_a_line&amp;#39;(&amp;#39;FileHandle&amp;#39;)
    say $S0
    $S0 = &amp;#39;read_a_line&amp;#39;(&amp;#39;Socket&amp;#39;)
    say $S0
    $S0 = &amp;#39;read_a_line&amp;#39;(&amp;#39;StringHandle&amp;#39;)
    say $S0
.end&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The valid types for this are, as usual, &lt;code&gt;&amp;quot;FileHandle&amp;quot;&lt;/code&gt;, &lt;code&gt;&amp;quot;Socket&amp;quot;&lt;/code&gt; and &lt;code&gt;&amp;quot;StringHandle&amp;quot;&lt;/code&gt;. Notice that we&amp;#8217;re reading a line from the object of the given type before we&amp;#8217;ve opened, connected or initialized. Pretend, in order to save myself some typing, that I&amp;#8217;ve set up exception handlers and the like above. So, what happens?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For &lt;code&gt;FileHandle&lt;/code&gt; we throw an exception. You can&amp;#8217;t read from a closed handle.&lt;/li&gt;

&lt;li&gt;For &lt;code&gt;StringHandle&lt;/code&gt;, we throw an exception for the same reason.&lt;/li&gt;

&lt;li&gt;For &lt;code&gt;Socket&lt;/code&gt; we return null because&amp;#8230;whatever. (In the test suite we check that the result, when converted to a floating-point number, is 0.0. Again, whatever.)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So that&amp;#8217;s a little bit weird that socket does something different from the other two, but fundamentally it&amp;#8217;s a pretty different type so I suppose some differences can be allowed.&lt;/p&gt;

&lt;p&gt;Now, let&amp;#8217;s try something slightly different:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;.sub read_a_line
    .param string type
    $P0 = new [type]
    $P0.&amp;#39;open&amp;#39;(&amp;quot;foo.txt&amp;quot;, &amp;quot;r&amp;quot;)
    $P0.&amp;#39;print&amp;#39;(&amp;quot;This is \n test text&amp;quot;)
    $P0.&amp;#39;close&amp;#39;()
    $S0 = $P0.&amp;#39;readline&amp;#39;()
    .return($S0)
.end&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With this example we can only operate on &lt;code&gt;FileHandle&lt;/code&gt; and &lt;code&gt;StringHandle&lt;/code&gt; because &lt;code&gt;Socket&lt;/code&gt; doesn&amp;#8217;t have an &lt;code&gt;.open()&lt;/code&gt; method like those two do. What does this do for those two types?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For &lt;code&gt;FileHandle&lt;/code&gt; we throw the same exception: you still can&amp;#8217;t read from a closed handle.&lt;/li&gt;

&lt;li&gt;For &lt;code&gt;StringHandle&lt;/code&gt; you can &lt;em&gt;read like normal&lt;/em&gt; without any indication that the handle is closed!&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So it&amp;#8217;s weird, to say the least, that &lt;code&gt;StringHandle&lt;/code&gt; has two different behaviors. &lt;code&gt;Socket&lt;/code&gt; has yet another problem, in a slightly different way. The method &lt;code&gt;Socket.readline()&lt;/code&gt; returns null when not open, but if you pass a &lt;code&gt;Socket&lt;/code&gt; to the &lt;code&gt;Parrot_io_readline&lt;/code&gt; routine, it always throws an exception because apparently readline on a &lt;code&gt;Socket&lt;/code&gt; isn&amp;#8217;t supported! And because readline on a &lt;code&gt;Socket&lt;/code&gt; uses a completely different code path from &lt;code&gt;FileHandle&lt;/code&gt;, the two types use completely different buffering mechanisms with subtly different semantics (&lt;code&gt;StringHandle&lt;/code&gt;, because it uses the in-memory string buffer, does it in a third way).&lt;/p&gt;

&lt;p&gt;To recap: What is conceptually a simple operation, read in some text until we find a delimiter, is done in three completely different ways by three different types, each with different error-handling semantics depending on history, state, and the interface used. If anybody was wondering why I wanted to rewrite this subsystem, here&amp;#8217;s part of the reason.&lt;/p&gt;

&lt;p&gt;Actually, I kind of lied. It&amp;#8217;s really not a simple operation, which is all the more reason we should share common code. It&amp;#8217;s a clear case of an algorithm where the hard parts should be encapsulated inside a clean interface so that different types can avoid needing to reimplement it over and over again (with differences, bugs and complications). That&amp;#8217;s the way it really should be, but some of the complications in the code are a little hard to live with. Here&amp;#8217;s the general algorithm for readline on a &lt;code&gt;FileHandle&lt;/code&gt;, as it&amp;#8217;s implemented in Parrot master:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The filehandle requires a buffer for this, so create (and fill) a buffer if one isn&amp;#8217;t configured.&lt;/li&gt;

&lt;li&gt;Create a new, empty STRING header.&lt;/li&gt;

&lt;li&gt;Treating the buffer like an encoded STRING, scan the buffer looking for the end of the delimiter or the end of the buffer, whichever comes first.&lt;/li&gt;

&lt;li&gt;Allocate/reallocate enough space in the STRING header to hold all the data we&amp;#8217;ve found in the buffer.&lt;/li&gt;

&lt;li&gt;Append all the characters we&amp;#8217;ve found to the STRING.&lt;/li&gt;

&lt;li&gt;If we&amp;#8217;ve found the delimiter, we&amp;#8217;re done. Return it to the user.&lt;/li&gt;

&lt;li&gt;Otherwise, check if we are at the end of file for the input. If so, go to 8. If not end of file, go to 9.&lt;/li&gt;

&lt;li&gt;Check that the last codepoint is complete and has all its bytes. If so, return the STRING to the user. If not, throw an exception about a malformed string.&lt;/li&gt;

&lt;li&gt;Check that the last codepoint is complete and has all its bytes. If so, go to 10. Otherwise, go to 11.&lt;/li&gt;

&lt;li&gt;Refill the buffer and go to 3.&lt;/li&gt;

&lt;li&gt;Determine how many more bytes we need to read to complete the last codepoint.&lt;/li&gt;

&lt;li&gt;Refill the buffer, and check that we have at least that many bytes available to read. If so, go to 13. Otherwise, throw an exception about a malformed string input.&lt;/li&gt;

&lt;li&gt;Read in the necessary number of bytes (1, 2 or 3 at most) from the buffer and go to 3.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you&amp;#8217;re reading an ASCII or fixed8 string the logic obviously collapses down to something a little bit more manageable. Also, this same logic, almost line for line, is repeated in the routine to read a given number of characters from the handle, where characters in a non-fixed-width encoding (like utf8) may need multiple reads if we don&amp;#8217;t get all the bytes for the character into the buffer in a single go. Notice that the versions provided by &lt;code&gt;StringHandle&lt;/code&gt; and &lt;code&gt;Socket&lt;/code&gt; are both much simpler and not safe for multi-byte encodings like &lt;code&gt;utf8&lt;/code&gt; or &lt;code&gt;utf16&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;In my &lt;code&gt;io_cleanup1&lt;/code&gt; branch, the logic has been simplified substantially, and a single codepath is now used for all three of the major types:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Make sure the handle has a read buffer set up and filled.&lt;/li&gt;

&lt;li&gt;Create a new, empty STRING header.&lt;/li&gt;

&lt;li&gt;Ask the buffer to find the given end-of-line character. The buffer will return a number of bytes to read in order to get a whole number of codepoints, and a flag that says whether we&amp;#8217;ve found the delimiter or not.&lt;/li&gt;

&lt;li&gt;Append those bytes to the string header.&lt;/li&gt;

&lt;li&gt;If the delimiter is found or if we are at EOF, return the string.&lt;/li&gt;

&lt;li&gt;Fill the buffer and go to #3.&lt;/li&gt;
&lt;/ol&gt;
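
&lt;p&gt;To make that loop a little more concrete, here&amp;#8217;s a minimal sketch in plain C. It is not the code from the branch: &lt;code&gt;Buffer&lt;/code&gt;, &lt;code&gt;buffer_fill&lt;/code&gt;, &lt;code&gt;buffer_find&lt;/code&gt;, &lt;code&gt;buffer_consume&lt;/code&gt; and &lt;code&gt;read_line&lt;/code&gt; are names I made up for illustration, the input is just a chunk of memory, the encoding is assumed to be utf8, and the output goes into a fixed-size array instead of a STRING header.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;#include &amp;lt;assert.h&amp;gt;
#include &amp;lt;stddef.h&amp;gt;
#include &amp;lt;string.h&amp;gt;

/* Hypothetical stand-in for a handle plus its read buffer. */
typedef struct {
    const char *src;      /* pretend file contents */
    size_t      src_len;
    size_t      src_pos;  /* next unread byte of the pretend file */
    char        data[8];  /* deliberately tiny, to force refills */
    size_t      len;      /* valid bytes currently in data */
} Buffer;

/* Length of a UTF-8 sequence judging by its lead byte (1 for anything else). */
static size_t utf8_seq_len(unsigned char lead) {
    if ((lead &amp;amp; 0xE0) == 0xC0) return 2;
    if ((lead &amp;amp; 0xF0) == 0xE0) return 3;
    if ((lead &amp;amp; 0xF8) == 0xF0) return 4;
    return 1;
}

/* Top up the buffer; returns 0 when no new bytes arrived. */
static int buffer_fill(Buffer *b) {
    size_t room = sizeof b-&amp;gt;data - b-&amp;gt;len;
    size_t left = b-&amp;gt;src_len - b-&amp;gt;src_pos;
    size_t n = room &amp;lt; left ? room : left;
    memcpy(b-&amp;gt;data + b-&amp;gt;len, b-&amp;gt;src + b-&amp;gt;src_pos, n);
    b-&amp;gt;len += n;
    b-&amp;gt;src_pos += n;
    return n &amp;gt; 0;
}

/* How many bytes may be taken now: through the delimiter if it is present,
   otherwise only as far as the last complete codepoint. */
static size_t buffer_find(const Buffer *b, char delim, int *found) {
    size_t i = 0;
    *found = 0;
    while (i &amp;lt; b-&amp;gt;len &amp;amp;&amp;amp; !*found) {
        size_t n = utf8_seq_len((unsigned char)b-&amp;gt;data[i]);
        if (i + n &amp;gt; b-&amp;gt;len)
            break;                      /* incomplete codepoint stays put */
        *found = (n == 1 &amp;amp;&amp;amp; b-&amp;gt;data[i] == delim);
        i += n;
    }
    return i;
}

/* Drop the first n bytes of the buffer. */
static void buffer_consume(Buffer *b, size_t n) {
    memmove(b-&amp;gt;data, b-&amp;gt;data + n, b-&amp;gt;len - n);
    b-&amp;gt;len -= n;
}

/* Reads one line into out (NUL-terminated) and returns its length in bytes. */
size_t read_line(Buffer *b, char delim, char *out, size_t cap) {
    size_t used = 0;                                 /* step 2, more or less */
    int found = 0;
    buffer_fill(b);                                  /* step 1 */
    do {
        size_t n = buffer_find(b, delim, &amp;amp;found);    /* step 3 */
        assert(used + n &amp;lt; cap);                      /* sketch: no growing */
        memcpy(out + used, b-&amp;gt;data, n);              /* step 4 */
        used += n;
        buffer_consume(b, n);
    } while (!found &amp;amp;&amp;amp; buffer_fill(b));              /* steps 5 and 6 */
    out[used] = &amp;#39;\0&amp;#39;;
    return used;
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The only place that knows anything about codepoints is &lt;code&gt;buffer_find&lt;/code&gt;. A codepoint split across two fills just waits in the buffer until the rest of it shows up, and a truncated one at end of input is left behind instead of being returned.&lt;/p&gt;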

&lt;p&gt;By simply coding the buffer logic to refuse to return incomplete codepoints in response to a STRING read request, the whole algorithm becomes hugely simplified. The readline routine in master takes up 185 lines of C code. In my new branch, the same routine takes up only 47 lines. Of course, this isn&amp;#8217;t comparing apples to apples, because I did break up some of the repeated logic into helper routines, and the buffers in my system are obviously a little bit smarter about STRINGs and codepoints, but that&amp;#8217;s not exactly the point. The real point is that three large, complicated, hard-to-read functions in master are now a single, much smaller, easier-to-read routine that relies on clear abstraction boundaries to do a difficult job in a much more conceptually simple way.&lt;/p&gt;

&lt;p&gt;I&amp;#8217;ve also updated the STRING read routine (now called &lt;code&gt;Parrot_io_read_s&lt;/code&gt;) to use a similar algorithm and actually share some of the new helper methods. That sharing itself also helps to decrease total lines of code and has other benefits as well.&lt;/p&gt;

&lt;p&gt;There is one small behavioral change between these two algorithms, which may or may not need to be worked around if it causes problems: the new one never reads an incomplete codepoint out of the buffer. If we have an incomplete one at the end of the file, the first algorithm will read it in and throw an exception about a malformed string. The second algorithm will ignore those final bytes and successfully return all the rest of the valid-looking data from the buffer instead. In the first algorithm, it then becomes impossible to read the partial data out and make a best effort, while in the second algorithm you can easily get to the data, even if the last codepoint is corrupted and cannot be read. I&amp;#8217;d really love to hear what people think about this change, and whether it&amp;#8217;s worth keeping or needs to change. I suspect it is better this way but only the users can really say for sure.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/afwknight/~4/ciFE5ekaOS0" height="1" width="1"/&gt;</content>
    <feedburner:origLink>http://whiteknight.github.com/2012/06/13/io_readline.html</feedburner:origLink></entry>
    
    <entry>
        <title>IO Rewrite Status</title>
        <link href="http://feedproxy.google.com/~r/afwknight/~3/nnKjASPFm1M/io_cleanup_status.html" />
        <updated>2012-06-08T00:00:00-07:00</updated>
        <id>http://whiteknight.github.com/2012/06/08/io_cleanup_status</id>
        <content type="html">&lt;p&gt;I was going to call this post &amp;#8220;IO Cleanup Status&amp;#8221;, but let&amp;#8217;s face facts: This is a complete rewrite of the entire subsystem. I haven&amp;#8217;t hardly left a single line of code untouched. It is a full rewrite of the system hiding behind a mostly-similar (though not quite the same) API. I didn&amp;#8217;t intend to completely rewrite the whole subsystem when I started the branch, hence the benign-sounding branch name. Following along with our cultural norms, I could have called it &lt;code&gt;whiteknight/io_massacre&lt;/code&gt; or something similarly upbeat. Whatever. I&amp;#8217;ve known people stuck with un-liked names for their entire lives, so this branch can be misnamed for a few weeks.&lt;/p&gt;

&lt;p&gt;So what is the status of this branch, exactly?&lt;/p&gt;

&lt;p&gt;At the time of this writing the branch is mostly complete. The major architectural work has all been done, with per-type logic separated out into new &lt;code&gt;IO_VTABLE&lt;/code&gt; structures, and buffering logic divorced from FileHandle into a new &lt;code&gt;IO_BUFFER&lt;/code&gt; structure. Now you can do things that have never been possible before, like buffering socket input and output, or doing readline with custom line-end characters on all handle types, and a whole bunch of other, increasingly-obscure operations. A lot of the new capabilities are things you didn&amp;#8217;t even know we didn&amp;#8217;t support before. Now, we do.&lt;/p&gt;

&lt;p&gt;We aren&amp;#8217;t quite there yet, but the stage is set for some other awesome changes in the future too, which I&amp;#8217;ll talk about in more depth when we get there.&lt;/p&gt;

&lt;p&gt;The current status of the branch is good. Parrot builds without any huge amount of new warnings and with no errors on my platform. Some platform-specific code needs to be updated for Windows, I&amp;#8217;m sure. The one big thing standing in the way is keeping track of file positions through operations like &lt;code&gt;seek&lt;/code&gt; and &lt;code&gt;tell&lt;/code&gt;. These things are made a little bit more difficult when you have read buffers reading ahead, because the position of the next character to read according to the user may be far different than the position of the file descriptor according to the operating system. Then consider the case when you have a file opened for read and write, with buffers in both directions. The old system had a single buffer per FileHandle which needed to be flushed if you tried to read when the buffer was in write mode, or you tried to write when it was in read mode. If you&amp;#8217;re switching back and forth between reading and writing often enough, buffering actually decreases performance when it&amp;#8217;s supposed to be a performance enhancer.&lt;/p&gt;

&lt;p&gt;The FileHandle has an attribute to keep a pointer to the current cursor location, but I&amp;#8217;m not always updating it as often as I should and not always reading it when I should. If you have a file opened for read and write, when you write 5 characters at the current file position you need to increment the read buffer by 5 characters also. When you go to read in 5 characters from the current position, you either need to flush the write buffer first or you can try to read those characters right out of the write buffer. There&amp;#8217;s nothing complicated about it, just a lot of bookkeeping to get right and lots of little interactions that need to be tested. It&amp;#8217;s helpful that we don&amp;#8217;t do &lt;code&gt;seek&lt;/code&gt; or &lt;code&gt;tell&lt;/code&gt; on some things like Sockets, and we don&amp;#8217;t really buffer StringHandles.&lt;/p&gt;
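
&lt;p&gt;The read-ahead half of that bookkeeping boils down to one subtraction. Here&amp;#8217;s a tiny sketch in C of the idea, assuming a single read buffer and no pending writes. The names (&lt;code&gt;ReadState&lt;/code&gt;, &lt;code&gt;read_ahead&lt;/code&gt;, &lt;code&gt;user_read&lt;/code&gt;, &lt;code&gt;user_tell&lt;/code&gt;) are invented for the example and aren&amp;#8217;t what the branch uses:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;#include &amp;lt;assert.h&amp;gt;
#include &amp;lt;stddef.h&amp;gt;

/* Hypothetical bookkeeping for one buffered, read-only handle. */
typedef struct {
    long   os_pos;   /* where the OS thinks the file descriptor is */
    size_t buf_len;  /* bytes currently read ahead into the buffer */
    size_t buf_pos;  /* bytes of the buffer already handed to the user */
} ReadState;

/* Pretend the OS just delivered n bytes of read-ahead into an empty buffer. */
void read_ahead(ReadState *s, size_t n) {
    s-&amp;gt;os_pos += (long)n;
    s-&amp;gt;buf_len = n;
    s-&amp;gt;buf_pos = 0;
}

/* The user takes n bytes out of the buffer; the OS position does not move. */
void user_read(ReadState *s, size_t n) {
    assert(s-&amp;gt;buf_pos + n &amp;lt;= s-&amp;gt;buf_len);
    s-&amp;gt;buf_pos += n;
}

/* The position the user expects tell to report. */
long user_tell(const ReadState *s) {
    return s-&amp;gt;os_pos - (long)(s-&amp;gt;buf_len - s-&amp;gt;buf_pos);
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;After a 4096-byte read-ahead and a 5-byte read, the OS position is 4096 but &lt;code&gt;user_tell&lt;/code&gt; reports 5. Add a write buffer on top and the same kind of correction has to be applied in the other direction, which is where all the little interactions come from.&lt;/p&gt;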

&lt;p&gt;The branch is moving along well and if I can find the time to actually sit down and work on it for a dedicated period of time I might be able to get it closer to being done. I&amp;#8217;m shooting for being mergeable sometime after the coming release.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/afwknight/~4/nnKjASPFm1M" height="1" width="1"/&gt;</content>
    <feedburner:origLink>http://whiteknight.github.com/2012/06/08/io_cleanup_status.html</feedburner:origLink></entry>
    
    <entry>
        <title>IO Cleanups Status</title>
        <link href="http://feedproxy.google.com/~r/afwknight/~3/99TGszJbVko/io_cleanup_status.html" />
        <updated>2012-06-01T00:00:00-07:00</updated>
        <id>http://whiteknight.github.com/2012/06/01/io_cleanup_status</id>
        <content type="html">&lt;p&gt;Tonight I&amp;#8217;ve hit something of a milestone with my branch to rewrite the IO subsystem. As of tonight the parrot binary, &lt;code&gt;parrot-nqp&lt;/code&gt; and &lt;code&gt;winxed&lt;/code&gt; all build in my branch and coretest runs (though fails some tests). The entire build does not complete because of some failures related to dynops, but it does get most of the way through. This means that most of the main-path IO APIs and FileHandle operations are working correctly, which is a relatively small portion of everything that has changed.&lt;/p&gt;

&lt;p&gt;With Parrot building, I&amp;#8217;m now able to more closely keep track of progress and regressions, and do more live testing as I make new changes. Until this point all my changes have just been mental exercises, so I&amp;#8217;m happy to have a little bit more feedback and even some validation.&lt;/p&gt;

&lt;p&gt;Of course, just saying that it builds doesn&amp;#8217;t really mean anything. Several things are still not implemented or completely wired up. Some operations on files such as &lt;code&gt;seek&lt;/code&gt;, &lt;code&gt;peek&lt;/code&gt; and &lt;code&gt;tell&lt;/code&gt; are still not implemented yet. Several methods on the various PMCs (&lt;code&gt;FileHandle&lt;/code&gt;, &lt;code&gt;Socket&lt;/code&gt; and &lt;code&gt;StringHandle&lt;/code&gt;) have not been updated to use the new system. There are a few regressions I need to address with regards to buffering. Specifically, &amp;#8220;line buffering&amp;#8221; has been removed from the system during the rewrite and hasn&amp;#8217;t been added back. Line buffering in Parrot has never really done much, but it&amp;#8217;s just hacky and obscure enough that I&amp;#8217;m sure somebody is relying on it.&lt;/p&gt;

&lt;p&gt;Some things, like files opened for dual read/write modes or append modes haven&amp;#8217;t been completely dealt with in code either. I don&amp;#8217;t think there&amp;#8217;s a lot of work to do for this, but since the buffering architecture has changed so much from what it used to be and since these modes are relatively rare and not as thoroughly tested I want to spend a little bit of extra time making sure there are no regressions.&lt;/p&gt;

&lt;p&gt;Also there are several coding standards tests (especially for function-level annotations and documentation) which fail spectacularly in the branch, and it&amp;#8217;s going to take time to update all the old documentation and add docs for all the new functions. I also need to update PDD 22 to reflect the new architecture of the IO system.&lt;/p&gt;

&lt;p&gt;I&amp;#8217;ve been working on this branch pretty aggressively for the last two weeks and I think I&amp;#8217;m about 50% of the way done. That&amp;#8217;s not too bad considering the magnitude of the change and the amount of time I&amp;#8217;ve had to hack. Within a week or two more, if all goes well, I think the branch might be ready for wider testing and eventually merging.&lt;/p&gt;

&lt;p&gt;As usual when we&amp;#8217;re talking about changes this big, merges are not something to be rushed. Assuming all goes well and other people like what I&amp;#8217;ve been doing, expect to see a brand new IO system in Parrot sometime later this summer.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/afwknight/~4/99TGszJbVko" height="1" width="1"/&gt;</content>
    <feedburner:origLink>http://whiteknight.github.com/2012/06/01/io_cleanup_status.html</feedburner:origLink></entry>
    
    <entry>
        <title>IO Refactors</title>
        <link href="http://feedproxy.google.com/~r/afwknight/~3/V4D3FNyxOy0/io_cleanup_first_round.html" />
        <updated>2012-05-27T00:00:00-07:00</updated>
        <id>http://whiteknight.github.com/2012/05/27/io_cleanup_first_round</id>
        <content type="html">&lt;p&gt;The IO subsystem is a lot like the garbage collector: So long as it &lt;em&gt;just works&lt;/em&gt; we can ignore its faults for quite a long time. The garbage collector had performance and other issues for years before everybody&amp;#8217;s favorite bacek went through and finally rewrote it. His effort there saves the rest of us meer mortals from having to touch the GC again for another couple years.&lt;/p&gt;

&lt;p&gt;The IO system works reasonably well. It&amp;#8217;s got a decent set of features more or less, it implements most of the important operations that our users have needed in the past, it&amp;#8217;s not spectacularly slow (and disk or network operation performance almost always outweighs any issues in the code that leads to those things), and we haven&amp;#8217;t been getting a lot of error reports or feature requests for it. In short, if it ain&amp;#8217;t broke, don&amp;#8217;t fix it.&lt;/p&gt;

&lt;p&gt;A few days ago I was working on a ticket for moritz to add better integration between our various IO vector PMCs (FileHandle, Socket, etc) and the ByteBuffer PMC. ByteBuffer is what its name implies: It&amp;#8217;s an array-like type for working with individual bytes in a chunk of memory. It&amp;#8217;s like a binary encoded STRING, but it&amp;#8217;s not immutable and has a handful of additional features that a raw STRING (or the String PMC) doesn&amp;#8217;t. ByteBuffer can be populated from and exported to a STRING, and it is useful for certain types of operations that need to operate on a sequence of bytes without having to worry about strings and encodings and all that other nonsense. moritz&amp;#8217;s request was a reasonable one so I sat down and made it happen. A few nights ago I merged that work into master with an &amp;#8220;experimental&amp;#8221; tag on it.&lt;/p&gt;

&lt;p&gt;However while I was in the IO subsystem code making this happen something did break. Not in the code, instead something broke inside my poor little head. The snapping sound you hear is the poor camel&amp;#8217;s back under the load of that last piece of straw. I&amp;#8217;ve had enough of that system and its inside-out organization and collection of half-ideas and botched refactors. I&amp;#8217;ve had my fill of the nonsense and finally decided it was time to make things right.&lt;/p&gt;

&lt;p&gt;And before anybody says to me, &amp;#8220;hey Mr Whiteknight, you shouldn&amp;#8217;t be so mean, somebody probably worked really hard to make this code do what it does&amp;#8221;, let me just say two things: First, &amp;#8220;Mr Whiteknight&amp;#8221; is my father&amp;#8217;s handle and Second, &lt;em&gt;I was one of the people who helped put IO where it is today&lt;/em&gt;. I don&amp;#8217;t feel particularly bad insulting myself or my own work, and my contributions, though well-intentioned at the time, are a big part of why the system is in the condition it&amp;#8217;s in now. First, a brief history lesson.&lt;/p&gt;

&lt;p&gt;When I joined Parrot, it sported an IO system based on layers. Layers were arranged in a structure something like a vtable, and IO requests would be fed through the layers, each layer getting the output of the one before it until the bottom layer actually spat the data out (or read it in, depending on which way you were moving). This worked pretty well when you were trying to do File IO on a file with a particular encoding, with buffering, through an asynchrony mechanism, etc. Actually I say it worked well but it was sort of overkill: It was just too much infrastructure for the possible benefits and despite the theory of allowing better code reuse there really weren&amp;#8217;t too many different layering combinations that could be set up. Plus, layers start to interdepend and violate encapsulation, then optimization starts prompting a few &amp;#8220;short cuts&amp;#8221; where layers were flattened together. One of the earlier things I did on Parrot, post-GSOC, was to remove some of the last vestiges of the then-unused layering system from Parrot&amp;#8217;s IO.&lt;/p&gt;

&lt;p&gt;The IO subsystem has something of a problem where it has a few masters and has to be performance conscious. Many of our programs are still the kind that shuffle data about (very much in the influence of Perl) and IO operation performance matters when your compiler is reading in HLL code and outputting PIR code, then you&amp;#8217;re reading PIR code in and trying to compile it again. Too much nonsense and everybody feels it.&lt;/p&gt;

&lt;p&gt;In Parrot at the user level you can do IO in two ways: Through the IO PMCs (FileHandle, mostly) and through opcodes (&lt;code&gt;say&lt;/code&gt;, &lt;code&gt;print&lt;/code&gt;, etc). The problem, put succinctly, is this: We want to encapsulate logic for writing to files inside the FileHandle PMC, but we don&amp;#8217;t want to add new IO-specific VTABLES and we don&amp;#8217;t want to incur the costs of method calls on every single IO request. In other words, we didn&amp;#8217;t want the &lt;code&gt;print&lt;/code&gt; opcode to just be a thin wrapper around the &lt;code&gt;print&lt;/code&gt; method on FileHandle. Such a thing, especially if implemented naively, would have killed performance by creating nested runloops and a whole host of other problems.&lt;/p&gt;

&lt;p&gt;The way the system is set up, &lt;code&gt;FileHandle.print()&lt;/code&gt; and the &lt;code&gt;print&lt;/code&gt; opcode are both thin wrappers around the real routine &lt;code&gt;Parrot_io_putps&lt;/code&gt;, which does all the hard work. And, more importantly, that routine is expected to act transparently (like the &lt;code&gt;print&lt;/code&gt; opcode does) on any IO PMC type like Socket or StringHandle. The only real way to do this, if you can&amp;#8217;t call a method on the FileHandle and Socket PMCs, is to use a large switch statement:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;switch (handle-&amp;gt;vtable-&amp;gt;base_type) {
    case enum_class_FileHandle:
        ...
    case enum_class_Socket:
        ...
    case enum_class_StringHandle:
        ...
    default:
        Parrot_pcc_invoke_method_from_c_args(..., handle, &amp;quot;print&amp;quot;, ...);
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;I&amp;#8217;ve obviously glossed over all the details, but this is the general form of that routine and several other similar routines in the IO API. You&amp;#8217;ll notice several things from even a quick glance at this example:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;If we want to add a new IO type to Parrot core we need to add a new entry to the switch statement in &lt;em&gt;every IO API routine that needs to care about PMC type&lt;/em&gt; (this is a major part of the reason we don&amp;#8217;t yet have a sane, separate Pipe type).&lt;/li&gt;

&lt;li&gt;If the user passes in an Object, something defined at the PIR level, we do fall back to calling the method, because we can&amp;#8217;t do anything else intelligently.&lt;/li&gt;

&lt;li&gt;We can&amp;#8217;t really subclass FileHandle or Socket from the user level, because it would fail the &lt;code&gt;base_type&lt;/code&gt; test, and wouldn&amp;#8217;t be able to handle the low-level structure accesses from that point forward anyway.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Point number 2 is particularly interesting because the &lt;code&gt;FileHandle.print()&lt;/code&gt; method calls &lt;code&gt;Parrot_io_putps&lt;/code&gt;, which may turn around and call the &lt;code&gt;.print()&lt;/code&gt; method. This is a big part of the reason why FileHandle cannot be subclassed in user code. It&amp;#8217;s clearly an example of poorly separated concerns and poor encapsulation. Either the method should call the IO API or the IO API should call the method but we can&amp;#8217;t be doing both. Actually, I&amp;#8217;d far prefer the former, if we can do it in a good, general way.&lt;/p&gt;

&lt;p&gt;There are a few other issues worth mentioning, which I&amp;#8217;ll just dump rapid-fire without much explanation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We don&amp;#8217;t have a separate Pipe type. Instead, FileHandle can be opened in &amp;#8220;pipe mode&amp;#8221; to write to a separate process or read output from a separate process.&lt;/li&gt;

&lt;li&gt;We have limited buffering, but only on FileHandle and we cannot configure buffers for input and output separately, or use separate buffers.&lt;/li&gt;

&lt;li&gt;We don&amp;#8217;t really have encodings set up in any consistent way, so it&amp;#8217;s very possible, though I haven&amp;#8217;t worked out all the details, to write strings with different encodings to a file. This is especially true if we&amp;#8217;re using buffers and performing writes through different API routines.&lt;/li&gt;

&lt;li&gt;FileHandle logic is considered to be the default and is given deference in the code. Pipe logic is unified with file logic at a very low level. Socket and StringHandle are treated as bolted-on spare parts and benefit from hardly any code sharing or unified architecture. They also don&amp;#8217;t have all the same useful features as FileHandle has.&lt;/li&gt;

&lt;li&gt;Several functions in the IO subsystem are poorly or inconsistently named and implemented, not to mention the often-times confusing documentation and absurd architectural arrangements.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So that&amp;#8217;s the system we&amp;#8217;ve got. What do I want to do to fix these issues?&lt;/p&gt;

&lt;p&gt;The first thing I&amp;#8217;ve suggested is to break up IO functionality into an &lt;code&gt;IO_VTABLE&lt;/code&gt; of function pointers, similar to how the &lt;code&gt;STR_VTABLE&lt;/code&gt;, the sprintf dispatch mechanism, the packfile segment dispatch table and other similar mechanisms in Parrot work. Each IO request would go through the API routines, which dispatch to a vtable routine (possibly with some intermediate buffering logic). Here&amp;#8217;s what it looks like in the branch to do a basic write:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;IO_VTABLE * const vtable = IO_GET_VTABLE(interp, handle);
vtable-&amp;gt;write_s(interp, handle, str-&amp;gt;strstart, str-&amp;gt;bufused);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;And here&amp;#8217;s how to do it with write buffering:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;IO_VTABLE * const vtable = IO_GET_VTABLE(interp, handle);
IO_BUFFER * const write_buffer = IO_GET_WRITE_BUFFER(interp, handle);
Parrot_io_buffer_write_s(interp, handle, vtable, write_buffer, str);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Internally, the buffer does its magic and flushes data out to the vtable if necessary.&lt;/p&gt;
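
&lt;p&gt;If the shape of that isn&amp;#8217;t obvious, here&amp;#8217;s a stripped-down, standalone sketch in C of a write buffer sitting in front of a table of function pointers. None of this is the real Parrot code: &lt;code&gt;Handle&lt;/code&gt;, &lt;code&gt;IoVtable&lt;/code&gt;, &lt;code&gt;WriteBuffer&lt;/code&gt;, &lt;code&gt;mem_write&lt;/code&gt;, &lt;code&gt;buffered_write&lt;/code&gt; and &lt;code&gt;buffer_flush&lt;/code&gt; are all made up for the example, and the &amp;#8220;device&amp;#8221; is just an array in memory.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;#include &amp;lt;assert.h&amp;gt;
#include &amp;lt;stddef.h&amp;gt;
#include &amp;lt;string.h&amp;gt;

typedef struct Handle Handle;

/* Per-type operations; a real table would have many more entries. */
typedef struct {
    size_t (*write)(Handle *h, const char *bytes, size_t len);
} IoVtable;

typedef struct {
    char   data[16];
    size_t used;
} WriteBuffer;

struct Handle {
    const IoVtable *vtable;
    WriteBuffer     wbuf;
    char            sink[128];   /* pretend device */
    size_t          sink_len;
    unsigned        raw_writes;  /* how many times the vtable was hit */
};

/* The low-level write for our pretend in-memory handle type. */
static size_t mem_write(Handle *h, const char *bytes, size_t len) {
    assert(h-&amp;gt;sink_len + len &amp;lt;= sizeof h-&amp;gt;sink);
    memcpy(h-&amp;gt;sink + h-&amp;gt;sink_len, bytes, len);
    h-&amp;gt;sink_len += len;
    h-&amp;gt;raw_writes++;
    return len;
}

static const IoVtable mem_vtable = { mem_write };

/* Push whatever is buffered out through the vtable. */
void buffer_flush(Handle *h) {
    if (h-&amp;gt;wbuf.used &amp;gt; 0) {
        h-&amp;gt;vtable-&amp;gt;write(h, h-&amp;gt;wbuf.data, h-&amp;gt;wbuf.used);
        h-&amp;gt;wbuf.used = 0;
    }
}

/* Buffer the bytes if they fit, flushing first when they would overflow. */
void buffered_write(Handle *h, const char *bytes, size_t len) {
    if (h-&amp;gt;wbuf.used + len &amp;gt; sizeof h-&amp;gt;wbuf.data)
        buffer_flush(h);
    if (len &amp;gt; sizeof h-&amp;gt;wbuf.data) {   /* too big to buffer: pass through */
        h-&amp;gt;vtable-&amp;gt;write(h, bytes, len);
        return;
    }
    memcpy(h-&amp;gt;wbuf.data + h-&amp;gt;wbuf.used, bytes, len);
    h-&amp;gt;wbuf.used += len;
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Notice that &lt;code&gt;buffered_write&lt;/code&gt; never asks what kind of handle it has. Supporting a new type means filling in another table like &lt;code&gt;mem_vtable&lt;/code&gt;, not touching a switch statement in every API routine.&lt;/p&gt;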

&lt;p&gt;The second thing I want to do is break out buffering so that instead of being a detail of the FileHandle PMC a buffer is a separate struct which can be attached to any IO type as desired. And, even better, we can attach multiple buffers to an IO stream, at least one each for input and output, configured separately. The buffering API, which will be cleaned up and properly encapsulated, will take a pointer to the &lt;code&gt;IO_VTABLE&lt;/code&gt; for the handle and will pass data through transparently as required. A thin wrapper PMC type, &lt;code&gt;IO_BUFFER&lt;/code&gt;, would allow references to buffers to be accessed and configured directly, which would be very useful in some cases.&lt;/p&gt;

&lt;p&gt;Imagine, if I may go off on a short tangent, a threaded system where one worker task had a reference to a buffer and continuously made sure it was filled in the background while another worker task read bits and pieces from the buffer very quickly. It would be possible, through careful choice of algorithm, to do such a thing lock-free. Feel free to replace &amp;#8220;file&amp;#8221; with &amp;#8220;socket&amp;#8221; or &amp;#8220;pipe&amp;#8221; in the example above too. Imagine also a system where we can transparently use &lt;code&gt;mmap&lt;/code&gt; (or its Windows equivalent) to map a file to memory as part of the buffer, and keep working with it that way.&lt;/p&gt;

&lt;p&gt;The third thing I want to do is start teasing apart the logic for Pipes from the file logic. I&amp;#8217;ll create a separate &lt;code&gt;io_vtable&lt;/code&gt; for pipe operations, and use that inside FileHandle when we&amp;#8217;re in pipe mode. Eventually we&amp;#8217;ll be able to create a separate type, divide out all the logic completely, and get to work on really interesting stuff like feasible 2-way and 3-way pipes.&lt;/p&gt;

&lt;p&gt;The fourth thing I want to do is start setting up interfaces so that IO operations including buffering, low-level IO, file descriptor manipulation and other things become more accessible at the PIR level so users can make better use of these tools, both in subclasses of the in-built handle PMCs and in custom types which neither derive from nor hold instances of those types.&lt;/p&gt;

&lt;p&gt;I&amp;#8217;ve started sketching out many of these ideas in the &lt;code&gt;whiteknight/io_cleanup1&lt;/code&gt; branch. cotto seems to agree with the general direction and I haven&amp;#8217;t heard any complaints so far, so I&amp;#8217;ve had my head down and been working hard on making these ideas reality. As of this writing, I&amp;#8217;ve modified just about every single line of code in the subsystem, gotten most of the new architecture and logic into place and set up the vtables for the most important built-in types. I have a few details to finish up before I try to build (and inevitably debug) this new beast. Ultimately I would like this first round of cleanups to produce no user-visible changes, so the old PMC methods and exported API functions are going to continue doing what they&amp;#8217;ve always done. Later rounds of cleanups will add new interfaces and eventually deprecate and remove some of the crufty older ones. I&amp;#8217;ll post more updates as this work progresses.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/afwknight/~4/V4D3FNyxOy0" height="1" width="1"/&gt;</content>
    <feedburner:origLink>http://whiteknight.github.com/2012/05/27/io_cleanup_first_round.html</feedburner:origLink></entry>
    
    <entry>
        <title>Destructors are Hard</title>
        <link href="http://feedproxy.google.com/~r/afwknight/~3/3rcWz4_4zGw/destructors_are_hard.html" />
        <updated>2012-05-23T00:00:00-07:00</updated>
        <id>http://whiteknight.github.com/2012/05/23/destructors_are_hard</id>
        <content type="html">&lt;p&gt;&amp;#171;&amp;#171;&amp;lt;&amp;#171;&amp;#160;HEAD:drafts/gc_destructors.md In my last post I mentioned some of the work I was trying to do with GC finalization and destructors. I promised I would publish a longer and more in-depth post about destructors, what the current state is, what I am doing,&lt;/p&gt;

&lt;h1 id='and_what_still_needs_to_be_done'&gt;and what still needs to be done.&lt;/h1&gt;

&lt;p&gt;In &lt;a href='/2012/05/20/pending_branchwork.html'&gt;my last post&lt;/a&gt; I mentioned some work involving the GC, finalization and destructors. Today I&amp;#8217;m going to expand on some of those ideas, talk about what the current state of destruction and finalization are in Parrot, some of the problems we have with coming up with better solutions, and some of the things I and others are working on to get this all working as our users expect us to. I apologize in advance for such a long post, there&amp;#8217;s a lot of information to share, and hopefully a much larger architectural discussion to be started.&lt;/p&gt;

&lt;p&gt;Destructors are hard. The idea behind a destructor is a simple one: We want to have a piece of code that is guaranteed to execute when the associated object is freed. Memory allocated on the heap is going to get reclaimed en masse by the operating system when the process exits. However, things such as handles, connections, tokens, mutexes, and other remote resources might not necessarily get freed or handled correctly if the process just exits, or if the object is destroyed without some sort of finalization logic performed on it. Here&amp;#8217;s a sort of example that&amp;#8217;s been bandied about a lot recently:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;function main () {
    var f = new &amp;#39;FileHandle&amp;#39;;
    f.open(&amp;quot;foo.txt&amp;quot;, &amp;quot;w&amp;quot;);
    f.print(&amp;quot;hello world&amp;quot;);
    exit(1);
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In this example we would expect that the text &lt;code&gt;&amp;quot;hello world&amp;quot;&lt;/code&gt; would be written to the &lt;code&gt;foo.txt&lt;/code&gt; file. However, because the text to be written may be buffered (both in Parrot and by the OS), there&amp;#8217;s a very real chance that the data won&amp;#8217;t get written if we do not call the finalizer for the &lt;code&gt;FileHandle&lt;/code&gt; PMC.&lt;/p&gt;

&lt;p&gt;Obviously, the brain-dead solution to this particular problem is to manually close or flush the file handle:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;function main () {
    var f = new &amp;#39;FileHandle&amp;#39;;
    f.open(&amp;quot;foo.txt&amp;quot;, &amp;quot;w&amp;quot;);
    f.print(&amp;quot;hello world&amp;quot;);
    f.close();
    exit(1);
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;However, the whole point of having things like finalizers (&amp;#8220;destructors&amp;#8221;) and GC is to make it so that the programmer does not need to worry about little details like these. The program should be smart enough to find dead objects in a timely manner and free their resources. Beyond that, many programming languages (with special emphasis on Perl6) require the availability of reliable and sane destructors.&lt;/p&gt;

&lt;p&gt;In the remainder of this post I would like to talk about why destructors are hard to implement correctly, why Parrot does not currently (really) have them, and some of the ideas we&amp;#8217;ve been kicking around about how to add them.&lt;/p&gt;

&lt;p&gt;First, let&amp;#8217;s cover where we currently stand. Parrot does have destructors, of a sort, in the form of the &lt;code&gt;destroy&lt;/code&gt; vtable. That routine is called by the GC when the object is being reclaimed, during the sweep pass. A side-effect of this implementation is that if PMC &lt;code&gt;A&lt;/code&gt; refers to PMC &lt;code&gt;B&lt;/code&gt; and both are being collected, it&amp;#8217;s very possible that &lt;code&gt;A&lt;/code&gt;&amp;#8217;s destructor tries to access some information in &lt;code&gt;B&lt;/code&gt; &lt;em&gt;after &lt;code&gt;B&lt;/code&gt; has already been reclaimed&lt;/em&gt;. Think about a database connection object that maintains a socket on one side, and a hash of connection information on the other. The socket probably cannot perform a simple disconnect, but instead should send some sort of sign-off message first to alert the server that it can proceed with its own cleanup. The socket PMC would need information from the connection information hash to send this final message, but if the hash had already been reclaimed the access would fail with undefined results.&lt;/p&gt;

&lt;p&gt;This situation has led to more than a few calls for ordered destruction. In one of the most common and severe cases, Parrot&amp;#8217;s Scheduler PMC was being relied upon by various managed PMCs. When a Task PMC was destroyed, at least in earlier iterations of the system, it would attempt to send a message to the Scheduler that it was no longer available to be scheduled. Ignore for a moment the fact that the Task could not possibly have been reclaimed in the first place if the Scheduler had a live reference to it, and if the Scheduler was still alive itself.&lt;/p&gt;

&lt;p&gt;Because of some of these order-of-destruction bugs, and for performance reasons, GC finalization (a final, all-encompassing GC sweep pass guaranteed to execute all remaining destructors prior to program exit) had been turned off. Turning off GC finalization leads to the problem above where data written to the FileHandle is not flushed before program exit. You are probably now starting to understand the bigger picture here.&lt;/p&gt;

&lt;p&gt;Having ordered destruction means essentially that we should be able to have an acyclic dependency graph of all objects in the system with destructors. However, maintaining this in the general case is impossible and attempting to approximate it would be very expensive in terms of performance. In any case, this is just a way to work around the problem of our naive sweep algorithm, which destroys and frees dead objects in a single pass, and not a real solution to the larger problems. A far better idea, recently suggested by hacker Moritz, is a 2-pass GC sweep.&lt;/p&gt;

&lt;p&gt;In the 2-pass case the GC sweep phase would have two loops: the first to identify all PMCs to be swept (from a linear scan of the entire memory pool), execute destructors on them and add them all to a list, and the second to iterate over that list (after all destructors had been called) and reclaim the memory. Because of the linked-list setup of the GC, this second pass could, conceivably, be almost free because we could simply append this list of swept items to the end of the free list for an &lt;code&gt;O(1)&lt;/code&gt; operation, and the first pass would be no less friendly on the processor data cache than our current sweep would be. This, in theory, solves our problem with ordered destruction, and should allow us to re-enable GC finalization globally without having to worry about these kinds of bugs causing segfaults in the final milliseconds of a program.&lt;/p&gt;
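
&lt;p&gt;To make the idea concrete, here is a minimal Python model of a 2-pass sweep. It is only a sketch: &lt;code&gt;Header&lt;/code&gt;, &lt;code&gt;two_pass_sweep&lt;/code&gt; and the plain-list pool are names I made up for illustration, not anything in the Parrot GC. The point it shows is that a destructor can still read another dead object, because nothing is reclaimed until every destructor has run.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Hypothetical model of a two-pass sweep. Header, two_pass_sweep and the
# list-based pool are illustrative names, not the real Parrot GC API.
class Header:
    def __init__(self, name, destroy=None):
        self.name = name
        self.live = False        # would be set by the mark phase
        self.freed = False       # True once the memory is reclaimed
        self.destroy = destroy   # optional finalizer callback
        self.refs = []           # other headers this object points at


def two_pass_sweep(pool, free_list):
    # Pass 1: linear scan of the pool. Run every destructor but free
    # nothing, so a destructor can still read other dead objects.
    dead = [h for h in pool if not h.live]
    for h in dead:
        if h.destroy is not None:
            h.destroy(h)
    # Pass 2: only now reclaim everything that was swept. A real linked
    # free list could splice the whole batch on in one step.
    for h in dead:
        h.freed = True
    free_list.extend(dead)
    return dead


log = []

def close_socket(sock):
    # The socket needs its connection info to sign off cleanly.
    info = sock.refs[0]
    log.append((sock.name, info.name, info.freed))


info = Header("conninfo")
sock = Header("socket", destroy=close_socket)
sock.refs.append(info)
free_list = []
two_pass_sweep([info, sock], free_list)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Even though the connection info comes first in the pool, it has not been reclaimed when the socket destructor reads it. With a single-pass sweep it would already have been freed.&lt;/p&gt;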

&lt;p&gt;So that&amp;#8217;s the basics of our current system and our problem with GC finalization, and shows us how we would proceed to make sure destructors were always called as a guarantee of the VM. However, this doesn&amp;#8217;t begin to address any of the problems with destructors that will plague their implementation and improvement. I&amp;#8217;ll talk about that second subject now.&lt;/p&gt;

&lt;p&gt;Destructors, as I said earlier, are hard. In the case of GC finalization, after the user program has executed and exited, it&amp;#8217;s relatively easy to loop over all objects and call destructors. It is those destructors which happen during normal program execution that cause problems.&lt;/p&gt;

&lt;p&gt;In the C++ language, destructors have certain caveats and limitations. For instance, we can&amp;#8217;t really throw exceptions from destructors, because that may crash the program. Not just an &amp;#8220;oops, here&amp;#8217;s an exception for you to handle&amp;#8221;, but instead a full-on crash. In Parrot we can probably be smarter about avoiding a crash but not by much. It&amp;#8217;s a limitation of the entire destructors paradigm. Let me demonstrate what I&amp;#8217;m talking about.&lt;/p&gt;

&lt;p&gt;Let&amp;#8217;s say I have this program, which opens up a filehandle to write a message and then starts doing something unrelated to the filehandle but expensive with GC:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;function foo() {
    var f = new &amp;#39;FileHandle&amp;#39;;
    f.open(&amp;quot;foo.txt&amp;quot;, &amp;quot;w&amp;quot;);
    f.write(&amp;quot;hello world!&amp;quot;);
    f = null;       // No more references to f!

    for (int j = 0; j &amp;lt; 1000000; j++) {
        var x = new MyObject(j);
        x.DoSomething();
        x.DoSomethingElse();
        x.DoOneLastThing();
    }
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Somewhere along the line, when the GC runs out of space, it&amp;#8217;s going to attempt a sweep and that means that &lt;code&gt;f&lt;/code&gt; is going to be identified as unreferenced, finalized and reclaimed. The question is, where? The thing is that we don&amp;#8217;t know where GC is going to run for a few reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;We don&amp;#8217;t know how many free headers GC has left in the free list before it has to sweep to find more.&lt;/li&gt;

&lt;li&gt;We don&amp;#8217;t know how many PMCs are being allocated per loop iteration, because the various methods on &lt;code&gt;x&lt;/code&gt; could be allocating them internally, and all PCC calls currently generate at least one PMC, and this is a lot of pressure.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;So at any point in that loop, any point at all, GC could execute and reclaim the FileHandle &lt;code&gt;f&lt;/code&gt;. That calls the destructor, flushes the output, and frees the handle back to the operating system. Good, right? What if there is a problem closing that handle, and the destructor for FileHandle tries to throw an exception? (Or, if this example isn&amp;#8217;t stoking your imagination, imagine that &lt;code&gt;f&lt;/code&gt; is an object of type &lt;code&gt;MyFileHandleSubclass&lt;/code&gt; with an error-prone finalizer.)&lt;/p&gt;

&lt;p&gt;There are a few options for what to do here. The first option is that we throw the exception like normal. This means that the loop code with the &lt;code&gt;MyObject&lt;/code&gt; variables, which is running perfectly fine and has no reason to throw an exception by itself, is interrupted in mid loop. The backtrace, if we provide one at all, probably points to &lt;code&gt;MyObject&lt;/code&gt; but with an exception type and exception message indicative of a failed FileHandle closing. Initial review by the poor developer doing the debugging will show that there are no filehandles trying to close inside this loop and then we get a bug report because a snippet of code which is running just fine exits abruptly with an error condition which it did not cause. The solution for this, wrapping every single line of code you ever write in exception handlers to catch the various possible exceptions thrown from GC finalizers, is untenable from a developer perspective.&lt;/p&gt;

&lt;p&gt;A second option is that we somehow disallow things like exceptions from being thrown from destructors, because there&amp;#8217;s no real way to catch them rationally. This seems reasonable, until we start digging into details. How do we disallow these, by technical or cultural means? And if we&amp;#8217;re relying on cultural means (a line in a .html document somewhere that says &amp;#8220;don&amp;#8217;t do that, and we won&amp;#8217;t be responsible if you do!&amp;#8221;), what happens if a hapless young programmer does it anyway without having first read all million pages of our hypothetical documentation? Does Parrot just crash? Does it enter into some kind of crazy undefined state? Obviously we would need some kind of technical mechanism to prevent bad things from happening in a destructor, though the list of potentially bad things is quite large indeed (throwing exceptions, allocating new PMCs, installing references to dead about-to-be-swept objects into living global PMCs, etc) and filtering these out by technical means would be both difficult and taxing on performance. When you consider that even basic error reporting operations at an HLL level, depending on syntax and object model used, may cause a string to be boxed into a PMC, or a method to be called requiring allocation of a PMC for the PCC logic, or whatever, we end up with finalizers which are effectively useless.&lt;/p&gt;

&lt;p&gt;A third option is that we could just ignore certain requests in finalizers, such as throwing exceptions. If an exception is thrown at any point we just pack up shop, exit the finalizer and pretend it never happened. This works fine for exceptions, but does nothing for the problem of a finalizer attempting to store a reference to the dying object into a living object. I don&amp;#8217;t know why a programmer would ever want to do that, but if it&amp;#8217;s possible you can be damned sure it will happen eventually. Also, when I say &amp;#8220;pack up shop&amp;#8221;, we&amp;#8217;re probably talking about a &lt;code&gt;setjmp&lt;/code&gt;/&lt;code&gt;longjmp&lt;/code&gt; call sequence, which isn&amp;#8217;t free to do.&lt;/p&gt;

&lt;p&gt;The general consensus among developers is that errors caused by programs running on top of Parrot should never segfault. If you&amp;#8217;re running bytecode in a managed environment, the worst that you should ever be able to get is an exception. Segmentation faults should be impossible to get from a pure-pbc example program.&lt;/p&gt;

&lt;p&gt;However, as soon as you introduce destructors, suddenly these things become possible. And not just from specifically malicious code: even moderately naive code will be able to segfault by storing a reference to a dying PMC in a place accessible from live PMCs. Unless, that is, we try to do something expensive like filtering or sandboxing, which would absolutely kill performance.&lt;/p&gt;

&lt;p&gt;And this point I keep bringing up about dead objects installing references to themselves in living objects is not trivial. Our whole system is built around the premise that objects which are referenced are alive and objects which are no longer referenced can be reclaimed by GC. Throughout most of the system we dereference pointers as if they point to valid memory locations or to live PMCs. If we turn that assumption around and say that dead objects may still be referenced by the system, then we lose almost all of the benefits that our mark and sweep GC has to offer. Specifically we would either have to install tests for &amp;#8220;liveness&amp;#8221; around &lt;em&gt;every single PMC pointer access&lt;/em&gt;, which would bring performance to a standstill, or we would need to have a policy that says the user at the PIR level is able to create segfaults without restriction, though officially we declare it to be a bad idea. It&amp;#8217;s not just a matter of having to test PMCs to make sure they are alive; the memory could be reclaimed and used for some other purpose entirely! Merely accessing a reclaimed PMC could cause problems (segfaults, etc) or, if the PMC has already been recycled into something like a transparent proxy for a network resource, send network requests to do things that you don&amp;#8217;t want to have happen! The security implications are troubling &lt;em&gt;at best&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The only real solution I can come up with to this problem, and it&amp;#8217;s not a very good one, is to add a &amp;#8220;purgatory&amp;#8221; section to the GC, where we put PMCs during GC sweep but we do not actually free them. The next time GC runs, anything which is still in purgatory is clearly not referenced and can be freed immediately. Anything that is no longer in purgatory has been &amp;#8220;resurrected&amp;#8221; by some shenanigans and has to be treated as still being alive &lt;em&gt;even though its destructor has already been called&lt;/em&gt;. In other words, we take a performance hit and enable zombification in order to prevent segfaults. I don&amp;#8217;t know what we want to do here, this is probably the kind of decision best left to the architect (or tech-savvy clergy) but I just want to point out that none of our options are great.&lt;/p&gt;
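
&lt;p&gt;Here is a rough Python sketch of how such a purgatory might behave. Everything in it is an assumption made for illustration (&lt;code&gt;Obj&lt;/code&gt;, &lt;code&gt;sweep&lt;/code&gt;, and the idea of passing in the set of reachable objects from a mark phase); it is not how Parrot&amp;#8217;s GC is written. An object whose &lt;code&gt;finalized&lt;/code&gt; flag is set but which is still on the heap is considered to be in purgatory.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Hypothetical sketch of a GC "purgatory". Obj, sweep and the reachable
# set are illustrative assumptions, not Parrot internals.
class Obj:
    def __init__(self, name, destroy=None):
        self.name = name
        self.destroy = destroy    # optional destructor callback
        self.finalized = False    # True means: destructor already ran


def sweep(heap, free_list, reachable):
    # reachable is the set of objects found by the mark phase of this run.
    for obj in list(heap):
        if obj in reachable:
            continue              # alive, or a resurrected zombie
        if obj.finalized:
            # Sat in purgatory for a full GC cycle and is still
            # unreferenced, so it is safe to reclaim now.
            heap.remove(obj)
            free_list.append(obj)
        else:
            # Newly dead: run the destructor once, but do not free yet.
            obj.finalized = True
            if obj.destroy is not None:
                obj.destroy(obj)


registry = []                     # stands in for a live global PMC

def leak_self(obj):
    registry.append(obj)          # destructor stores a reference to itself


quiet = Obj("quiet")
zombie = Obj("zombie", destroy=leak_self)
heap = [quiet, zombie]
free_list = []
sweep(heap, free_list, reachable=set())            # both finalized, none freed
sweep(heap, free_list, reachable=set(registry))    # quiet freed, zombie kept&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The cost is visible even in this toy: every dead object survives one extra GC run, and the zombie stays usable with its destructor already executed.&lt;/p&gt;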

&lt;p&gt;I&amp;#8217;ve also brought up the problem with allocating new objects during a finalizer. Why is this a problem? Keep in mind that GC tends to execute when we&amp;#8217;ve attempted to allocate an object and have none in the free list. If we have no available headers on the free list, are already in the middle of a GC sweep and ask to allocate a new header, what do we do? Maybe we say that we invoke GC when we have only 10 items left (instead of 0) on the free list, guaranteeing that we always have a small number of headers available for finalization, though no matter what we set this limit at it&amp;#8217;s possible we could exhaust the supply if we have many objects to finalize with complex finalizers. Every time a finalizer calls a method or boxes a string, or does any of a million other benign-sounding things, PMCs get allocated. If we try to allocate a PMC when there are no PMCs on the free list and we&amp;#8217;re already in the middle of GC sweep, the system may trigger another recursive GC run.&lt;/p&gt;

&lt;p&gt;Another option is that we could maintain multiple pools and only sweep one at a time. If one pool is being swept we could allocate PMCs from the next pool (possibly triggering a GC sweep in that second pool and needing to recurse into yet another pool, etc). Maybe we allocate headers directly from malloc while we&amp;#8217;re in a finalizer, keep them in a list and free them immediately after the finalizer exits. Something like a semi-space GC algorithm might help here, because we could allocate from the &amp;#8220;dead space&amp;#8221; before that space was freed.&lt;/p&gt;

&lt;p&gt;Or we could try to immediately free some PMCs during the first sweep pass, and use those headers as the free list from which to allocate during destructors. This raises some problems because it would be very difficult to identify PMCs which could be freed during the first pass without invalidating any references which are going to be accessed during the destructors. Also, we run into the (rare) occurrence where all the PMCs swept during a particular GC run have destructors, and there are no &amp;#8220;unused&amp;#8221; headers to immediately free and recycle for destructors.&lt;/p&gt;

&lt;p&gt;We have some options here, but this is still a very real problem that requires very careful consideration. Again, I don&amp;#8217;t have an answer here, just a long list of terrible options that need to be sorted according to the &amp;#8220;lesser of all evils&amp;#8221; algorithm.&lt;/p&gt;


&lt;p&gt;Let&amp;#8217;s look at destructors from another angle. Obviously a garbage-collected system is supposed to free the programmer up from having to manually manage memory (at least) and possibly other resources as well. You make a mess and don&amp;#8217;t want to clean it yourself, so the GC comes along after you and takes care of the things you don&amp;#8217;t want to do yourself. On one hand the argument can be made that if you really care about a resource being cleaned in a responsible, timely manner, you should call an explicit finalizer yourself, and that leaving those kinds of tasks to the automatic finalizer is akin to saying &amp;#8220;I don&amp;#8217;t care about that object and whatever happens, happens.&amp;#8221; After all, if you can&amp;#8217;t throw an exception from a destructor and if the destructor is called outside normal program flow with no opportunity to report back even the simplest of success/failure conditions, it really doesn&amp;#8217;t matter from the standpoint of the programmer whether it succeeded or silently failed. Further, if the resource is sensitive, you don&amp;#8217;t clean it explicitly and Parrot later crashes and segfaults because some uninformed user created a zombie PMC reference, your destructor cannot and will not get called no matter what. If all sorts of things at multiple levels can go wrong and prevent your destructor from running, does it &lt;em&gt;really&lt;/em&gt; matter if the destructor gets called at all?&lt;/p&gt;

&lt;p&gt;Another viewpoint is that destructors don&amp;#8217;t need to be black-boxes, and we don&amp;#8217;t care if they have problems so long as they&amp;#8217;ve given a best effort to free the resources, those efforts have a decent expected chance of success, and they have an opportunity to log problems in case somebody has a few moments to spare reading through log files. After all, if a FileHandle fails to close in an automatically-invoked destructor, it also would have failed to close in a manually-invoked one and what are you going to do about it? If the thing won&amp;#8217;t close, it won&amp;#8217;t close. You can either log the failure and keep going with your program (like our destructor would have done automatically) or you can raise hell and possibly terminate the program (like what &lt;em&gt;could&lt;/em&gt; happen if an exception is thrown from a destructor). In other words, when you&amp;#8217;re talking about failures related to basic resources at the OS level, there aren&amp;#8217;t many good options when you&amp;#8217;re writing programs at the Parrot level. If you&amp;#8217;re not so hot at OS administration, there might not be anything you can do no matter what.&lt;/p&gt;

&lt;p&gt;I suspect that what we are going to end up with is a system where we allocate a temporary managed pool of PMCs to be available, and allocate all PMCs during a destructor from that pool. After GC, we clear the emergency pool at once. This solution adds a certain amount of complexity to the GC and also does nothing to deal with the zombie references problem I&amp;#8217;ve mentioned several times. We&amp;#8217;d have to make a stipulation that PMCs allocated during a destructor &lt;em&gt;may not&lt;/em&gt; themselves have automatic destructors.&lt;/p&gt;

&lt;p&gt;Things start to get a little bit complicated no matter what path we choose. This is the kind of issue where we&amp;#8217;re going to need lots more input, especially from our users.&lt;/p&gt;

&lt;p&gt;In Parrot we really want to enable PMC destruction and GC finalization. As things stand now you can run &lt;code&gt;destroy&lt;/code&gt; vtables written in C, usually without issue. However when we expose this functionality to the user we are talking about executing PBC, in a nested runloop (at least one!), with fresh allocations and all the capabilities of PBC at your disposal. As soon as you open that can of worms, the many problems and problematic possibilities become manifest. The security concerns become real. The performance implications become real. I&amp;#8217;m not saying that these are problems we can&amp;#8217;t solve, I&amp;#8217;m only pointing out that they haven&amp;#8217;t been solved already because they are hard problems with real trade-offs and some tough (and unpopular) decisions to be made.&lt;/p&gt;

&lt;img src="http://feeds.feedburner.com/~r/afwknight/~4/3rcWz4_4zGw" height="1" width="1"/&gt;</content>
    <feedburner:origLink>http://whiteknight.github.com/2012/05/23/destructors_are_hard.html</feedburner:origLink></entry>
    
    <entry>
        <title>Pending Branchwork</title>
        <link href="http://feedproxy.google.com/~r/afwknight/~3/F2eanBpmZIM/pending_branchwork.html" />
        <updated>2012-05-20T00:00:00-07:00</updated>
        <id>http://whiteknight.github.com/2012/05/20/pending_branchwork</id>
        <content type="html">&lt;p&gt;As I promised in my last post, I have several branches up in the air that need to be worked on. Some branches merged last week after the release. Others are pending to merge soon and some are still in development. In this post I&amp;#8217;m going to give a short summary of these things, since I haven&amp;#8217;t been posting regular updates like normal.&lt;/p&gt;

&lt;h3 id='already_merged'&gt;Already Merged&lt;/h3&gt;

&lt;p&gt;After the release last week I merged three small branches that brought small changes and appeared to test cleanly with NQP and Rakudo. In short, these were uncontroversial.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;whiteknight/gh_675&lt;/code&gt; Named after the &lt;a href='https://github.com/parrot/parrot/issues/675'&gt;GitHub issue of the same name&lt;/a&gt;, this branch removed the &lt;code&gt;can&lt;/code&gt; vtable. In all cases in core and in external projects where I looked, the &lt;code&gt;can&lt;/code&gt; vtable was simply a redirect to the &lt;code&gt;find_method&lt;/code&gt; vtable and a check for null. There&amp;#8217;s no need for this added indirection; we can call the &lt;code&gt;find_method&lt;/code&gt; vtable directly from the &lt;code&gt;can&lt;/code&gt; opcode.&lt;/li&gt;

&lt;li&gt;&lt;code&gt;whiteknight/imcc_file_line&lt;/code&gt; This branch removed some very old, long-deprecated IMCC directives. The &lt;code&gt;.line&lt;/code&gt; and &lt;code&gt;.file&lt;/code&gt; directives were not poorly implemented (as far as IMCC goes) but they weren&amp;#8217;t used and weren&amp;#8217;t introspectable. The &lt;code&gt;setline&lt;/code&gt; and &lt;code&gt;setfile&lt;/code&gt; directives (yes, they are directives even though they looked like opcodes!) weren&amp;#8217;t used anywhere and weren&amp;#8217;t implemented well. I&amp;#8217;ve removed all four. Now, we can use the &lt;code&gt;.annotate&lt;/code&gt; directive to replace all of these and add other metadata besides in a way that is easy to introspect from within running bytecode.&lt;/li&gt;

&lt;li&gt;&lt;code&gt;whiteknight/remove_cmd_ops&lt;/code&gt; removed a few command-line arguments from the parrot executable which were non-functional. These arguments have been disconnected since the time of the IMCC API cleanups months ago, and nobody had even noticed. Now they&amp;#8217;re gone.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those things out of the way, here&amp;#8217;s a list of some of the branches that are currently unmerged but may be merging soon.&lt;/p&gt;

&lt;h3 id='id1'&gt;&lt;code&gt;eval_pmc&lt;/code&gt;&lt;/h3&gt;

&lt;p&gt;This is one of the most disruptive branches I&amp;#8217;ve got going, which is why I&amp;#8217;m in no hurry to merge it. Before I can merge it I need to patch both NQP and Rakudo. I submitted patches for these but they weren&amp;#8217;t ready to apply and I have to go back and re-do them.&lt;/p&gt;

&lt;p&gt;This branch removes the deprecated &lt;code&gt;Eval&lt;/code&gt; PMC. The &lt;code&gt;IMCCompiler&lt;/code&gt; PMC has already been updated to use a PDD31-compliant interface, which returns a &lt;code&gt;PackfileView&lt;/code&gt; PMC instead of an &lt;code&gt;Eval&lt;/code&gt;. NQP and Rakudo need to be updated to use this new interface instead of the older &lt;code&gt;VTABLE_invoke&lt;/code&gt; one. This update will work in the Parrot master branch just fine, so we can make those updates to NQP and Rakudo and test them thoroughly before we merge the &lt;code&gt;eval_pmc&lt;/code&gt; branch in.&lt;/p&gt;

&lt;h3 id='id2'&gt;&lt;code&gt;remove_sub_flags&lt;/code&gt;&lt;/h3&gt;

&lt;p&gt;This is a much bigger and much more disruptive branch. However, because of the fact that NQP and Rakudo don&amp;#8217;t really use subroutine flags for their control flow, those two projects won&amp;#8217;t really be affected as much as everybody else will be.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;remove_sub_flags&lt;/code&gt; branch removes the &lt;code&gt;:load&lt;/code&gt; and &lt;code&gt;:init&lt;/code&gt; flags from the PIR syntax and replaces them with &lt;code&gt;:tag&lt;/code&gt;. The only real way to work with &lt;code&gt;:tag&lt;/code&gt; is through the &lt;code&gt;PackfileView&lt;/code&gt; PMC, so we need to merge the &lt;code&gt;eval_pmc&lt;/code&gt; branch into Parrot first before we can make any further progress on this one. This is a back-burner task and will probably not be touched before the end of the summer.&lt;/p&gt;

&lt;h3 id='id3'&gt;&lt;code&gt;whiteknight/gc_finalize&lt;/code&gt;&lt;/h3&gt;

&lt;p&gt;We&amp;#8217;ve received some requests from Rakudo folks that we need to start getting serious about GC finalization. This involves two changes: First is setting the GC to perform a finalization sweep at interp exit, which it currently is not doing. The second is to fix some sweep-related behaviors so the &lt;code&gt;destroy&lt;/code&gt; VTABLE can be much more sane and useful.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;whiteknight/gc_finalize&lt;/code&gt; branch does both of these things. First, it re-enables GC finalization which had been turned off for so long that the code for it no longer works in master. Second, it moves to a two-stage sweep algorithm, so that we execute all &lt;code&gt;destroy&lt;/code&gt; vtables first before we start freeing any resources.&lt;/p&gt;

&lt;p&gt;There are still going to be problems with &lt;code&gt;destroy&lt;/code&gt; vtables however, and I&amp;#8217;m searching for solutions to these. Let me illustrate with a short example. We call GC to sweep typically in response to a request for a new PMC when we have none on the free list. If we have an item on the freelist, we return that immediately and very quickly. If not, we invoke GC to try and free up some headers (or allocate new ones from the OS).&lt;/p&gt;

&lt;p&gt;Let&amp;#8217;s say we&amp;#8217;re programming in Rakudo Perl6 and we have an object with a destructor. For the purposes of our example, it&amp;#8217;s a DB connection object. That destructor needs to call a method on a Socket object connecting the client program to the server. As everybody should be aware of now, calling a method in Parrot itself is going to allocate a CallContext PMC.&lt;/p&gt;

&lt;p&gt;However, we run into a small problem because we&amp;#8217;re in GC &lt;em&gt;because&lt;/em&gt; we&amp;#8217;re out of PMCs to allocate. If we try to allocate a new PMC at this point I don&amp;#8217;t know exactly what will happen, but I can only imagine that the results would not be good. In the worst case, we recursively call into GC which goes back to sweeping, which re-executes finalizers, and we get into an infinite loop.&lt;/p&gt;

&lt;p&gt;I won&amp;#8217;t go into all the details here, I&amp;#8217;ve got another (long) post drafted that discusses these and some other issues related to finalization. This &lt;code&gt;whiteknight/gc_finalize&lt;/code&gt; branch solves some of the first few problems but there will be more to come after that.&lt;/p&gt;

&lt;h3 id='id4'&gt;&lt;code&gt;whiteknight/gh_663&lt;/code&gt;&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;singleton&lt;/code&gt; designator for C-level PMCs has been deprecated for some time now, and the &lt;code&gt;whiteknight/gh_663&lt;/code&gt; branch intends to remove it.&lt;/p&gt;

&lt;p&gt;Here&amp;#8217;s how singletons work in Parrot: The &lt;code&gt;get_pointer&lt;/code&gt; and &lt;code&gt;set_pointer&lt;/code&gt; vtables are used to manage a single reference to an existing singleton PMC, if any. To get the PMC, we invoke the &lt;code&gt;get_pointer&lt;/code&gt; vtable &lt;em&gt;without an invocant PMC&lt;/em&gt; (the only such occurrence of a vtable invoked without an existing PMC reference in the whole codebase that I am aware of). If it returns NULL, a new header is created. If a new header is created, the &lt;code&gt;set_pointer&lt;/code&gt; vtable is called on the new object with itself as an argument.&lt;/p&gt;

&lt;p&gt;This all happens inside &lt;code&gt;Parrot_pmc_new&lt;/code&gt; and is mostly transparent, except for the few bits of code throughout the system which violate this (rather flimsy) encapsulation boundary.&lt;/p&gt;
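
&lt;p&gt;To make the singleton lookup concrete, here is a minimal C sketch of the flow. Every name in it (&lt;code&gt;pmc&lt;/code&gt;, &lt;code&gt;the_instance&lt;/code&gt;, &lt;code&gt;pmc_new_singleton&lt;/code&gt; and the two helper functions) is a hypothetical stand-in; the real logic inside &lt;code&gt;Parrot_pmc_new&lt;/code&gt; surely differs in the details:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;#include &amp;lt;stdlib.h&amp;gt;

/* Hypothetical stand-in for a PMC header. */
typedef struct pmc {
    int type;
} pmc;

/* The single stored reference, kept as a void* like the real vtables. */
static void *the_instance = NULL;

/* Called with no invocant: there may be no object yet. */
static void *singleton_get_pointer(void)
{
    return the_instance;
}

/* Called on the freshly created object, with itself as the argument. */
static void singleton_set_pointer(pmc *self, void *ptr)
{
    (void)self;
    the_instance = ptr;
}

pmc *pmc_new_singleton(int type)
{
    pmc *p = singleton_get_pointer();  /* void* converts silently */
    if (p != NULL)
        return p;                      /* reuse the existing instance */

    p = calloc(1, sizeof *p);          /* create a new header */
    if (p == NULL)
        return NULL;
    p-&amp;gt;type = type;
    singleton_set_pointer(p, p);
    return p;
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Calling &lt;code&gt;pmc_new_singleton&lt;/code&gt; twice in this sketch hands back the same pointer both times. The silent &lt;code&gt;void*&lt;/code&gt; conversion in the middle is the point where the compiler stops checking types.&lt;/p&gt;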

&lt;p&gt;The &lt;code&gt;get_pointer&lt;/code&gt; and &lt;code&gt;set_pointer&lt;/code&gt; vtables operate on &lt;code&gt;void*&lt;/code&gt; pointers, so we even lose type safety. Plus, we don&amp;#8217;t expose &lt;code&gt;get_pointer&lt;/code&gt; or &lt;code&gt;set_pointer&lt;/code&gt; vtables to PIR code, so there&amp;#8217;s absolutely no way to create a singleton class at the user-level using this mechanism. You can do what users of all other languages do and create an accessor and restricted constructor and implement singletons that way. In fact, I think that&amp;#8217;s better.&lt;/p&gt;

&lt;p&gt;The majority of offending code has been ripped out of this branch, though I&amp;#8217;m still seeing some segfaults during the build as a result of bad, unchecked pointer accesses in places where encapsulation has been violated. I&amp;#8217;ve got to spend a little bit more time tracking down some of these failures. Then, assuming NQP and Rakudo aren&amp;#8217;t relying on this mechanism, the merge should be relatively painless.&lt;/p&gt;

&lt;h3 id='id5'&gt;&lt;code&gt;whiteknight/gh_610&lt;/code&gt;&lt;/h3&gt;

&lt;p&gt;A while ago, moritz suggested that we improve integration of our ByteBuffer PMC type, especially with our FileHandle and Socket types. We should be able to read a sequence of raw bytes from either of those PMCs into a ByteBuffer and we should be able to write raw bytes from a ByteBuffer into either of those destinations too.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;whiteknight/gh_610&lt;/code&gt; branch aims to make this a reality. Already I&amp;#8217;ve done most of the code work to get this in place, though I haven&amp;#8217;t added all the necessary tests and documentation. Plus, a few coding standards tests are failing too.&lt;/p&gt;

&lt;p&gt;While looking at this code, I am reminded that the IO subsystem is kind of messy. I&amp;#8217;ve tried to clean it up in the past, and made a few small improvements over time. However, without a larger guiding vision to follow, I never really had a great idea of what kind of larger architectural changes to make to really bring this subsystem up out of the mud. After working on this branch, I finally had something like a flash of insight, and think I have a good idea about how to clean things up. This leads me to&amp;#8230;&lt;/p&gt;

&lt;h3 id='id6'&gt;&lt;code&gt;whiteknight/io_cleanup1&lt;/code&gt;&lt;/h3&gt;

&lt;p&gt;My idea is a relatively simple one: All our IO operations are controlled by the various PMC types (FileHandle, Socket, StringHandle, etc), but all our IO API functions are currently implemented as ugly (and brittle) switch statements to pick between execution pathways for these different types. A far better idea would be to separate out the different logic behind a virtual function dispatch table (vtable).&lt;/p&gt;

&lt;p&gt;I&amp;#8217;ve written up some proposed changes in the &lt;code&gt;whiteknight/io_cleanup1&lt;/code&gt; branch, and will start work if other people think it&amp;#8217;s a decent idea.&lt;/p&gt;

&lt;p&gt;The key points are as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Move all FileHandle-specific logic into src/io/filehandle.c. Do the same for Pipe, Socket and StringHandle types.&lt;/li&gt;

&lt;li&gt;Implement a new &lt;code&gt;io_vtable&lt;/code&gt; type, which will contain a dispatch table for common operations. Each one of the files created in #1 above will implement the routines for one &lt;code&gt;io_vtable&lt;/code&gt; and supporting logic.&lt;/li&gt;

&lt;li&gt;Buffering will be refactored. Instead of the FileHandle PMC containing several attributes for buffering, we&amp;#8217;ll instead use an &lt;code&gt;io_buffer&lt;/code&gt; object to hold buffering details. An encapsulated buffering API will take this buffer structure and the relevant vtable and automatically perform buffering if necessary.&lt;/li&gt;

&lt;li&gt;I am going to start separating out Pipe logic from FileHandle, though I&amp;#8217;m not planning to create a separate type for it quite yet.&lt;/li&gt;
&lt;/ol&gt;
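
&lt;p&gt;To give a feel for the dispatch-table idea in the list above, here is a minimal C sketch. The layout of &lt;code&gt;io_vtable&lt;/code&gt;, the toy &lt;code&gt;string_handle&lt;/code&gt; backend and the &lt;code&gt;io_read&lt;/code&gt;/&lt;code&gt;io_write&lt;/code&gt; wrappers are all assumptions made for illustration; the structures in the branch may look quite different:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;#include &amp;lt;stddef.h&amp;gt;
#include &amp;lt;string.h&amp;gt;

/* Hypothetical dispatch table; the real io_vtable may differ. */
typedef struct io_vtable {
    const char *name;
    size_t (*read)(void *handle, char *buf, size_t len);
    size_t (*write)(void *handle, const char *buf, size_t len);
} io_vtable;

/* A toy StringHandle-like backend: an in-memory buffer with a cursor. */
typedef struct string_handle {
    char   data[64];
    size_t used;  /* bytes written so far */
    size_t pos;   /* read cursor */
} string_handle;

static size_t sh_read(void *h, char *buf, size_t len)
{
    string_handle *sh = h;
    size_t avail = sh-&amp;gt;used - sh-&amp;gt;pos;
    if (len &amp;gt; avail)
        len = avail;
    memcpy(buf, sh-&amp;gt;data + sh-&amp;gt;pos, len);
    sh-&amp;gt;pos += len;
    return len;
}

static size_t sh_write(void *h, const char *buf, size_t len)
{
    string_handle *sh = h;
    size_t room = sizeof sh-&amp;gt;data - sh-&amp;gt;used;
    if (len &amp;gt; room)
        len = room;
    memcpy(sh-&amp;gt;data + sh-&amp;gt;used, buf, len);
    sh-&amp;gt;used += len;
    return len;
}

/* One table per handle type, defined next to that type&amp;#39;s logic. */
const io_vtable string_handle_vtable = {
    &amp;quot;StringHandle&amp;quot;, sh_read, sh_write
};

/* The generic API dispatches through the table instead of switching
   on the handle type. */
size_t io_read(const io_vtable *vt, void *handle, char *buf, size_t len)
{
    return vt-&amp;gt;read(handle, buf, len);
}

size_t io_write(const io_vtable *vt, void *handle, const char *buf, size_t len)
{
    return vt-&amp;gt;write(handle, buf, len);
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Supporting another handle type in a design like this means writing one more table and its routines; the generic &lt;code&gt;io_read&lt;/code&gt; and &lt;code&gt;io_write&lt;/code&gt; functions stay untouched.&lt;/p&gt;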

&lt;p&gt;Once these things are done, I think the IO system will be much cleaner and much more hackable. This is lower priority right now until some of my ideas are vetted, but I&amp;#8217;m glad I finally have a plan in mind after so many years of staring helplessly at this code.&lt;/p&gt;

&lt;h3 id='id7'&gt;&lt;code&gt;whiteknight/sprintf_cleanup&lt;/code&gt;&lt;/h3&gt;

&lt;p&gt;The engine for our &lt;code&gt;sprintf&lt;/code&gt; implementation is sort of old and messy. It&amp;#8217;s some very functional and very stable code, but it needs to be brought up to date with our modern coding and organizational standards.&lt;/p&gt;

&lt;p&gt;In the &lt;code&gt;whiteknight/sprintf_cleanup&lt;/code&gt; branch I make several changes, most of which are entirely internal and should not affect users at all:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;I move the files from &lt;code&gt;src/misc.c&lt;/code&gt; and &lt;code&gt;src/spf_*.c&lt;/code&gt; to &lt;code&gt;src/string/sprintf.c&lt;/code&gt; and &lt;code&gt;src/string/spf_*.c&lt;/code&gt; respectively.&lt;/li&gt;

&lt;li&gt;I&amp;#8217;ve cleaned up some header-file nonsense and created a new &lt;code&gt;src/string/spf_private.h&lt;/code&gt; header file to hold private data.&lt;/li&gt;

&lt;li&gt;I&amp;#8217;ve changed the code to use a StringBuilder instead of the older (and now-incorrect) repeated string concatenations. With immutable strings, each concat operation creates a new STRING instead of appending to the pre-allocated buffer, which is extremely wasteful. I haven&amp;#8217;t benchmarked this change, but I suspect it has higher performance on longer, more complicated formats.&lt;/li&gt;

&lt;li&gt;I&amp;#8217;ve fixed a sub-optimal error message at request of benabik in &lt;a href='https://github.com/parrot/parrot/issues/759'&gt;ticket #759&lt;/a&gt;.&lt;/li&gt;
&lt;/ol&gt;
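
&lt;p&gt;The StringBuilder point in the list above is easiest to see with a sketch. This is a minimal C version of a growable buffer; &lt;code&gt;strbuf&lt;/code&gt; and &lt;code&gt;strbuf_append&lt;/code&gt; are hypothetical names standing in for the StringBuilder PMC, not code from the branch:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;#include &amp;lt;stdlib.h&amp;gt;
#include &amp;lt;string.h&amp;gt;

/* Hypothetical growable buffer, standing in for StringBuilder. */
typedef struct strbuf {
    char  *data;
    size_t len;  /* bytes in use, not counting the terminator */
    size_t cap;  /* bytes allocated */
} strbuf;

/* Append n bytes, doubling the capacity when needed.
   Returns 0 on success, -1 if the allocation fails. */
int strbuf_append(strbuf *sb, const char *s, size_t n)
{
    if (sb-&amp;gt;len + n + 1 &amp;gt; sb-&amp;gt;cap) {
        size_t cap = sb-&amp;gt;cap ? sb-&amp;gt;cap : 16;
        char *p;
        while (cap &amp;lt; sb-&amp;gt;len + n + 1)
            cap *= 2;
        p = realloc(sb-&amp;gt;data, cap);
        if (p == NULL)
            return -1;
        sb-&amp;gt;data = p;
        sb-&amp;gt;cap  = cap;
    }
    memcpy(sb-&amp;gt;data + sb-&amp;gt;len, s, n);
    sb-&amp;gt;len += n;
    sb-&amp;gt;data[sb-&amp;gt;len] = 0;  /* keep the buffer NUL-terminated */
    return 0;
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Most appends here copy only the new bytes into space that is already allocated. Concatenating immutable strings instead copies everything accumulated so far on every step, which is where the waste comes from on long formats.&lt;/p&gt;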

&lt;p&gt;This branch is almost complete and I&amp;#8217;ll probably merge it this weekend. Besides the text of the exception message, there are no visible user changes so it shouldn&amp;#8217;t be controversial at all.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/afwknight/~4/F2eanBpmZIM" height="1" width="1"/&gt;</content>
    <feedburner:origLink>http://whiteknight.github.com/2012/05/20/pending_branchwork.html</feedburner:origLink></entry>
    
    <entry>
        <title>Parrot 4.4.0 Banana Fanna Fo Ferret</title>
        <link href="http://feedproxy.google.com/~r/afwknight/~3/QKBRFa-Ke4Q/parrot_4_4_0.html" />
        <updated>2012-05-17T00:00:00-07:00</updated>
        <id>http://whiteknight.github.com/2012/05/17/parrot_4_4_0</id>
        <content type="html">&lt;blockquote&gt;
&lt;p&gt;Its existence guarantees nothing in itself, and the catalytic or Promethean moment only occurs when one individual is prepared to cease being the passive listener to such a voice and to become instead its spokesman, or representative.&lt;/p&gt;

&lt;p&gt;But it&amp;#8217;s important to remember the many dreary years when the prospect of victory appeared quite unattainable. On every day of those years, the &amp;#8220;as if&amp;#8221; pose had to be kept up, until its cumulative effect could be felt.&lt;/p&gt;

&lt;p&gt;&amp;#8211; Christopher Hitchens, &lt;i&gt;Letters to a Young Contrarian&lt;/i&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;On behalf of the Parrot team, I&amp;#8217;m proud to announce the 4.4.0 release of Parrot &amp;#8220;Banana Fanna Fo Ferret&amp;#8221;. &lt;a href='http://parrot.org/'&gt;Parrot&lt;/a&gt; is a virtual machine aimed at running all dynamic languages.&lt;/p&gt;

&lt;p&gt;Parrot 4.4.0 is available on &lt;a href='ftp://ftp.parrot.org/pub/parrot/releases/stable/4.4.0/'&gt;Parrot&amp;#8217;s FTP site&lt;/a&gt;, or by &lt;a href='http://parrot.org/download'&gt;following the download instructions&lt;/a&gt;. For those who want to hack on Parrot or languages that run on top of Parrot, we recommend &lt;a href='https://github.com/parrot'&gt;our organization page&lt;/a&gt; on GitHub, or you can go directly to the official Parrot Git repo on &lt;a href='https://github.com/parrot/parrot'&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Parrot 4.4.0 News:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;- Core
    + Most internal calls to libc exit(x) have been replaced with
      Parrot_x_* API calls or PARROT_FORCE_EXIT
- Documentation
    + &amp;#39;pdd31_hll.pod&amp;#39; made stable in &amp;#39;docs/pdds/&amp;#39;.
    + Updated main &amp;#39;README&amp;#39; to &amp;#39;README.pod&amp;#39;
    + Updated various dependencies, e.g., &amp;#39;lib/Parrot/Distribution.pm&amp;#39;.
    + Updated all &amp;#39;README&amp;#39; files to &amp;#39;README.pod&amp;#39; files.
    + Added &amp;#39;README.pod&amp;#39; files to top-level directories.
- Tests
    + Update various tests to pull from new &amp;#39;README.pod&amp;#39;
    + Updated &amp;#39;t/tools/install/02-install_files.t&amp;#39; to pull from new
      &amp;#39;README.pod&amp;#39;
- Community
- Platforms
- Tools
    + pbc_merge has been fixed to deduplicate constant strings and
      merge annotations segments&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Alvis Yardley (or a delegate) will release Parrot 4.5.0, the next scheduled monthly release, on June 16th 2012. Subsequent release managers are to be announced. A special thanks to our donors, contributors and volunteers for making this release possible.&lt;/p&gt;

&lt;p&gt;Enjoy!&lt;/p&gt;

&lt;p&gt;I haven&amp;#8217;t been doing enough blogging lately! On Tuesday I put out the 4.4.0 release of Parrot, &amp;#8220;Banana Fanna Fo Ferret&amp;#8221;. I figured it was a fun play on words. I added a little quote from a favorite writer of mine, Christopher Hitchens. Much of his writing can be pretty inflammatory, but I picked two quotes that related to historical struggles for social progress, and which when read in a certain light (and dramatically out of context) make sense for Parrot too.&lt;/p&gt;

&lt;p&gt;The release went off without a problem, and I&amp;#8217;ve got a few branches waiting in the environs to be merged. I&amp;#8217;m sure I&amp;#8217;ll talk about some of those projects if I can get back into a normal blogging rhythm again.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/afwknight/~4/QKBRFa-Ke4Q" height="1" width="1"/&gt;</content>
    <feedburner:origLink>http://whiteknight.github.com/2012/05/17/parrot_4_4_0.html</feedburner:origLink></entry>
    
    <entry>
        <title>XML Is Hard</title>
        <link href="http://feedproxy.google.com/~r/afwknight/~3/WKE7ujNe74c/xml_is_hard.html" />
        <updated>2012-04-28T00:00:00-07:00</updated>
        <id>http://whiteknight.github.com/2012/04/28/xml_is_hard</id>
        <content type="html">&lt;p&gt;Last week I promoted the Parse and Json libraries in Rosella to stable status. For both those libraries I wrapped up a few outstanding TODO issues, wrote up some &lt;a href='/Rosella/libraries/json.html'&gt;website&lt;/a&gt; &lt;a href='/Rosella/libraries/parse.html'&gt;documentation&lt;/a&gt; and added a bunch of unit tests. I figured I would do the same thing for the XML library too. After all I had done the hard part: the first 90% of the library was the recursive descent parser which I had most of.&lt;/p&gt;

&lt;p&gt;So today I got to work on that library, trying to put together the last few bits so I could make the library stable. Like I said, I had about 90% of it done already. I spent the time today doing another 90%. I figure I only have about 90% left to go before I have a &amp;#8220;real&amp;#8221;, usable XML library. Somewhere a mathematician is reading this post and inventing new curse words, but nobody can hear him, because he has no friends.&lt;/p&gt;

&lt;p&gt;It turns out that XML is hard.&lt;/p&gt;

&lt;p&gt;Anybody can put together a little parser for XML-like tag syntax with attributes, text, and nested tags. That part is dirt simple, and I had that done in an hour or two. It&amp;#8217;s once you start getting into DTD declarations and schema validation that things get messy. Honestly, I don&amp;#8217;t think I can seriously call Rosella&amp;#8217;s XML library &amp;#8220;complete&amp;#8221; without those things. Or, not without most of them. I can probably get away with only the first 90% or so.&lt;/p&gt;

&lt;p&gt;So, what can Rosella&amp;#8217;s Xml library do today? Here is a sample of XML text that I can parse into a document object tree without problems:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;?xml version=&amp;quot;1.0&amp;quot;?&amp;gt;
&amp;lt;!DOCTYPE foo [
    &amp;lt;!ELEMENT foo (bar, baz)&amp;gt;
    &amp;lt;!ELEMENT bar ANY&amp;gt;
    &amp;lt;!ELEMENT baz (fie)&amp;gt;
    &amp;lt;!ELEMENT fie EMPTY&amp;gt;
    &amp;lt;!ATTLIST fie
                lol CDATA #REQUIRED
                wat CDATA #IMPLIED
                sux CDATA #FIXED &amp;quot;hello!&amp;quot;&amp;gt;
]&amp;gt;
&amp;lt;foo&amp;gt;
    &amp;lt;bar/&amp;gt;
    &amp;lt;baz&amp;gt;
        &amp;lt;fie lol=&amp;quot;laughing out loud&amp;quot; wat=&amp;quot;you talkin bout?&amp;quot; sux=&amp;quot;hello!&amp;quot;/&amp;gt;
    &amp;lt;/baz&amp;gt;
&amp;lt;/foo&amp;gt;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Or, if I want, I can jam all that schema nonsense into a separate file, and load it separately:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;!DOCTYPE foo SYSTEM &amp;quot;foo.dtd&amp;quot;&amp;gt;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;I haven&amp;#8217;t integrated Rosella Net yet, though, so schemas can&amp;#8217;t be loaded from a URL. In code, I can do a few things:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;var dx = new Rosella.Xml.Document();
dx.read_from_file(&amp;quot;foo.xml&amp;quot;);
dx.validate();
if (!dx.is_valid()) {
    for (string err in dx.errors)
        say(err);
}
dx.write_to_file(&amp;quot;newfoo.xml&amp;quot;);

var dtd = new Rosella.Xml.DtdDocument();
dtd.read_from_file(&amp;quot;foo.dtd&amp;quot;);
var errors = dtd.validate_xml(dx);
if (elements(errors) &amp;gt; 0) {
    for (string err in errors)
        say(err);
}
dtd.write_to_file(&amp;quot;newfoo.dtd&amp;quot;);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;That example shows us loading an XML document from a file and validating it with its built-in rules from the &lt;code&gt;!DOCTYPE&lt;/code&gt; header. The second part shows us loading a separate DTD definition from a standalone file, and using that to validate the XML document too. In both cases, the validator runs through the document object and returns a whole list of error messages, not just a simple yes/no flag. In both cases, we can also re-serialize the XML and DTD documents back to string and then to file.&lt;/p&gt;

&lt;p&gt;So what is left to do? Well, for starters there&amp;#8217;s a bunch of syntax in the &lt;code&gt;!ELEMENT&lt;/code&gt; tag that I don&amp;#8217;t quite handle yet, such as quantifiers and alternations:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;!ELEMENT foo (bar*, (baz|bar), fie?)&amp;gt;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Parsing all that in a way that doesn&amp;#8217;t suck is not something I&amp;#8217;m looking forward to doing.&lt;/p&gt;

&lt;p&gt;Then in attribute lists, there&amp;#8217;s some syntax I don&amp;#8217;t deal with, such as enumerated values again:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;!ATTLIST foo bar (yes|no) #REQUIRED&amp;gt;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The validator I&amp;#8217;ve implemented is pretty naive so far, and isn&amp;#8217;t set up to do quantifiers anyway. That&amp;#8217;s all going to take a while to do. We&amp;#8217;re doing some basic validation now, but nowhere near as much as we would expect from a full implementation.&lt;/p&gt;

&lt;p&gt;And keep in mind, even when I&amp;#8217;m done implementing (mostly) proper XML and DTD parsing, I could still go on to parse other schema languages like XSD which some applications might expect and even prefer. Maybe I could do something like XPath too, which would be very nice. I probably won&amp;#8217;t try to do XSLT though: I&amp;#8217;m still young and I would like to keep some of my sanity in reserve for my twilight years.&lt;/p&gt;

&lt;p&gt;My Json library is about 1300 lines of Winxed code long, including whitespace. My Xml library is about 2400 lines of code long and still growing. Json is pretty easy (by design!), but XML is very hard. I&amp;#8217;m not going to push the Xml library to become stable any time soon; there&amp;#8217;s a hell of a lot of work left on it and I&amp;#8217;m not going to rush anything.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/afwknight/~4/WKE7ujNe74c" height="1" width="1"/&gt;</content>
    <feedburner:origLink>http://whiteknight.github.com/2012/04/28/xml_is_hard.html</feedburner:origLink></entry>
    
    <entry>
        <title>Various Updates</title>
        <link href="http://feedproxy.google.com/~r/afwknight/~3/UvfO829A1Lc/various_updates.html" />
        <updated>2012-04-25T00:00:00-07:00</updated>
        <id>http://whiteknight.github.com/2012/04/25/various_updates</id>
        <content type="html">&lt;p&gt;Here are some updates on various projects I&amp;#8217;ve been working on or been planning to work on:&lt;/p&gt;

&lt;h2 id='parrotstore'&gt;ParrotStore&lt;/h2&gt;

&lt;p&gt;In my post introducing ParrotStore, I mentioned that I only had support for MySQL, Memcached, and a little bit of stuff working for MongoDB. In the past few days I&amp;#8217;ve also added SQLite3 support. Now you can do this, after installing the prerequisites, building and installing:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;var sqlite3_lib = loadlib(&amp;#39;sqlite3_group&amp;#39;);
var sqlite3 = new &amp;#39;SQLite3DbContext&amp;#39;;
sqlite3.open(&amp;quot;test.sqlite3&amp;quot;);
sqlite3.query(&amp;quot;INSERT INTO tbl1 (name, number) VALUES (&amp;#39;Andrew&amp;#39;, 100)&amp;quot;);
var result = sqlite3.query(&amp;quot;SELECT * FROM tbl1&amp;quot;);
for (var row in result) {
    for (string colname in row)
        print(colname + &amp;quot;=&amp;quot; + string(row[colname]) + &amp;quot; &amp;quot;);
    say(&amp;quot;&amp;quot;);
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;SQLite3 offers a bunch of features that I don&amp;#8217;t tap into yet, but we have a good start and can do some basic work with it already.&lt;/p&gt;

&lt;p&gt;Also, I mentioned that we didn&amp;#8217;t support queries with multiple result sets in the MySQL bindings. Well, now we do (and we do in SQLite3 too):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;var result1 = mysql.query(&amp;quot;CALL my_stored_proc&amp;quot;);
var result2 = sqlite3.query(&amp;quot;SELECT * FROM tbl1 ; SELECT * from tbl2&amp;quot;);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If the query returns one result set, a DataTable object is returned. If it has multiple result sets, an array of DataTables is returned instead.&lt;/p&gt;

&lt;h2 id='eval_pmc'&gt;Eval PMC&lt;/h2&gt;

&lt;p&gt;I went digging through my backlog of old branches last night and found my incomplete branch for removing the deprecated Eval PMC. After updating to current master I gave it a spin and most things looked good. I fixed all the core parrot tests and then moved on to the rest of the ecosystem.&lt;/p&gt;

&lt;p&gt;Winxed works fine with the PackfileView PMC instead of the Eval PMC. I made a few of those updates in the past, so it mostly worked out of the gate. Rosella compiled and ran like a charm too.&lt;/p&gt;

&lt;p&gt;NQP-rx works fine because it mostly relies on the PCT libraries that ship with Parrot, and which I had already fixed.&lt;/p&gt;

&lt;p&gt;The new NQP is a little bit more of a hassle. It took me a little bit of effort to figure out the bootstrapping mechanism, but after a few hours of hacking I had NQP building on the new Parrot using PackfileView instead of Eval. However, one of the regex tests hangs indefinitely now and I&amp;#8217;m having trouble tracking that down. This project may get bumped down to a lower priority level until I can either figure out what the problem with NQP is, or until I can enlist some help to fix it.&lt;/p&gt;

&lt;p&gt;I would like to merge this branch as soon as NQP is fixed and I can prove that I can build it and Rakudo on the branch.&lt;/p&gt;

&lt;h2 id='sub_flags_cleanup'&gt;Sub Flags Cleanup&lt;/h2&gt;

&lt;p&gt;My &lt;code&gt;remove_sub_flags&lt;/code&gt; branch, tasked with removing the old &lt;code&gt;:load&lt;/code&gt; and &lt;code&gt;:init&lt;/code&gt; flags from Parrot and replacing them with the new &lt;code&gt;:tag()&lt;/code&gt; syntax is right where I left it a few weeks ago. I&amp;#8217;m down to a relatively small list of test failures, the solution to most of which is to update the syntax in the tests themselves. A handful of tests such as those using the &lt;code&gt;parrot-nqp&lt;/code&gt; and &lt;code&gt;winxed&lt;/code&gt; compilers are failing because I need to update those compilers first to generate the correct code so the tests can run correctly.&lt;/p&gt;

&lt;p&gt;After fixing NQP-rx and Winxed, I need to get started testing out the new NQP and Rakudo. I suspect both of those two things will be made to work without too much effort.&lt;/p&gt;

&lt;p&gt;It turns out that the Eval PMC deprecation work overlaps with this slightly, so the things I change for that branch should help reduce failures in this branch too. After I get Eval deprecated and removed, I&amp;#8217;ll come back to this branch and see where things stand.&lt;/p&gt;

&lt;p&gt;This is such a large and disruptive change that I can&amp;#8217;t imagine we would want a merge before the 4.4 release, even if I got all the bugs ironed out. We could be a month or more away from a merge, so I&amp;#8217;m not listing this work as high priority.&lt;/p&gt;

&lt;h2 id='pcc'&gt;PCC&lt;/h2&gt;

&lt;p&gt;Bacek has been doing a lot of refactoring in PCC land, trying to fix some slow and infelicitous aspects of it. I&amp;#8217;ve gotten a set of new PCC-related opcodes added to core and have a few more that I want to add, including new variants of &lt;code&gt;set_args&lt;/code&gt;, &lt;code&gt;get_params&lt;/code&gt; and friends to take explicit context arguments instead of using magical behavior to try and find them automatically. A few patches to IMCC and the new behavior might go in without anybody noticing. I&amp;#8217;ve talked more about this in past posts, and I&amp;#8217;m sure I&amp;#8217;ll have more to say when I start making changes.&lt;/p&gt;

&lt;h2 id='rosella'&gt;Rosella&lt;/h2&gt;

&lt;p&gt;Rosella is mostly where I want it to be right now. I&amp;#8217;m planning to change around the development cycle to stick to supported releases of Parrot and Winxed instead of tracking HEAD for both of them. I&amp;#8217;m going to promote one or two more libraries to &amp;#8220;stable&amp;#8221; status and then put out a release sometime after Parrot 4.4 hits the news stands next month. I&amp;#8217;ve already promoted the Parse and Json libraries to stable status. I will probably promote Xml and Net too, since I am pretty happy with both of those two libraries and feel that they are almost ready for general use.&lt;/p&gt;

&lt;p&gt;After that, I suspect Rosella is going to take a back seat for a while, so I can focus on some other projects.&lt;/p&gt;

&lt;h2 id='google_summer_of_code'&gt;Google Summer of Code&lt;/h2&gt;

&lt;p&gt;GSOC is keeping me pretty busy so far. We accepted 4 projects this summer. The fifth project, which was to do some work on the Jaesop Stage 1 compiler, was lost because the student was accepted to a different organization instead. The four remaining projects are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Security Sandbox&lt;/strong&gt; by Justin&lt;/li&gt;

&lt;li&gt;&lt;strong&gt;Mod_Parrot 2.0&lt;/strong&gt; by brrt&lt;/li&gt;

&lt;li&gt;&lt;strong&gt;LAPACK Bindings&lt;/strong&gt; by jashwanth&lt;/li&gt;

&lt;li&gt;&lt;strong&gt;PACT Assembly&lt;/strong&gt; by benabik&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I think these projects will be very cool, and I am looking forward to seeing what kinds of great code they can produce this summer.&lt;/p&gt;

&lt;h2 id='green_threads'&gt;Green Threads&lt;/h2&gt;

&lt;p&gt;nine has been doing some amazing work on his threading branch. Yesterday he informed me that he had a solution to make green threads work on Windows, and had already implemented part of it. That&amp;#8217;s awesome, because I was planning to work on porting the green threads to Windows next, but if he&amp;#8217;s doing it then I don&amp;#8217;t have to.&lt;/p&gt;

&lt;p&gt;Some of the performance numbers he&amp;#8217;s been getting are pretty impressive for certain tasks. Some benchmarks he has are even showing a significant threading performance improvement over a similar benchmark written in perl5.&lt;/p&gt;

&lt;p&gt;I&amp;#8217;ve been doing some testing on his branch and things are looking mostly good except for one or two remaining GC-related bugs that need to be ironed out. After that, if we can get some consensus, I would love to start talking about a merge shortly after 4.4.&lt;/p&gt;

&lt;h2 id='6model'&gt;6Model&lt;/h2&gt;

&lt;p&gt;With Green Threads possibly off my TO-DO list, Eval PMC Deprecation mostly wrapped up and remove_sub_flags on the back burner, I can start moving towards my next project: 6model. And I can do it much earlier than I was expecting. I&amp;#8217;m going to mine benabik&amp;#8217;s rejected 6model project proposal for some ideas, then I&amp;#8217;m going to jump in and try to get things working. I suspect things could get moving pretty quickly, if I can keep my level of free time relatively high.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/afwknight/~4/UvfO829A1Lc" height="1" width="1"/&gt;</content>
    <feedburner:origLink>http://whiteknight.github.com/2012/04/25/various_updates.html</feedburner:origLink></entry>
    

</feed>
