XOR Logic

Yo.

noreply@blogger.com (J) — Fri, 29 Aug 2008 14:19:00 +0000

Still there?

Perl's Diamond Operator in Python

noreply@blogger.com (J) — Wed, 05 Dec 2007 16:35:00 +0000

I've been a full Python convert for over a year now. Perl is just too cumbersome at times. I think the artist who draws XKCD has made the same choice. The one thing that still takes me back to Perl is the ultra-handy <> "diamond operator". You can't deny Perl's dominance when it comes to string processing. Look at this elegance:

#!/usr/bin/perl

while (<>) {
    print
}

Every time I encounter a string processing problem where I have to work with multiple files, I go back to Perl. I didn't think Python could do this. Python should have the same ability as Perl's diamond:

def diamond():
    """
    diamond()
    Pre: Nothing.
    This is my attempt to recreate the very handy diamond operator found in
    Perl, which is my only reason for still using that language.

    To use the code:
    for line in diamond():
        process(line)
    """
    import sys
    if len(sys.argv[1:]) > 0:
        for file in sys.argv[1:]:
            if file == '-':
                for line in sys.stdin:
                    yield line
            else:
                # "with" closes the file even if the caller stops iterating early
                with open(file, 'r') as f:
                    for line in f:
                        yield line
    else:
        for line in sys.stdin:
            yield line

What does this do? It checks the sys.argv variable for file names on the command line and iterates through each line of each file. A file name of "-" means standard input. If no file names are given at all, it defaults to standard input.

Of course, before I start to write code, I should always use Google first. There exists a module called "fileinput" which does the same thing.
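For comparison, here is a minimal sketch of the same loop built on the standard library's fileinput module. The wrapper name diamond_lines is made up for this example; fileinput.input() is the real module function.

```python
import fileinput

def diamond_lines(files=None):
    """Yield every line of every named file, like Perl's <>.

    With files=None, fileinput falls back to sys.argv[1:], and to
    standard input when no names are given; "-" also means stdin.
    """
    with fileinput.input(files=files) as stream:
        for line in stream:
            yield line

# Typical use in a script:
#     for line in diamond_lines():
#         process(line)
```

Passing an explicit list of names instead of None makes the function easy to call from other code, not just from the command line.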

D'oh! Hey... it's practice.

A Losing Strategy

noreply@blogger.com (J) — Mon, 03 Dec 2007 05:50:00 +0000

Today I ran across an article from 2000 in the New York Times titled "Paradox in Game Theory: Losing Strategy That Wins." Given two games which lose most of the time, a strategy can be devised so that we can win most of the time. The article has a (fairly) detailed description of two example games:

The paradox is illustrated by two games played with coins weighted on one side so that they will not fall by chance to heads or tails.

In game A, a player tosses a single loaded coin and bets on each throw. The probability of winning is less than half.

In game B, there are two coins and the rules are more complicated. The player tosses either Coin 1, loaded to lose almost all the time or Coin 2 loaded to win more than half the time. He plays Coin 1 if his money is a multiple of a particular whole number, like three.

If his money cannot be divided evenly by that number, he plays Coin 2. In this setup, the second coin will be played more often than the first.

After reading this story, I wrote a quick Python script to demonstrate this game. The problem description isn't definitive on things like the exact probabilities of the coin flips, so we have to make a few assumptions. I set the probability of winning the coin flip in Game A to p=0.33, and in Game B, the probability for coin 1 to p=0.1 and for coin 2 to p=0.66. These probabilities are pulled from the air, but they do meet the requirements set forth in the problem description, so they should work. If the article is right, the two games alternating should outperform both Game A and Game B when run individually. Let's run the program a few times to see if a combination of Games A and B is better than Game A or Game B alone.


[jcchurch@mcart python]$ python badgames.py 
Game A, 1000 runs:
  Before: 1000 MONEY
  After:  674 MONEY

Game B, 1000 runs:
  Before: 1000 MONEY
  After:  978 MONEY

Alternating: 1000 runs:
  Before: 1000 MONEY
  After:  850 MONEY

[jcchurch@mcart python]$ python badgames.py 
Game A, 1000 runs:
  Before: 1000 MONEY
  After:  734 MONEY

Game B, 1000 runs:
  Before: 1000 MONEY
  After:  890 MONEY

Alternating: 1000 runs:
  Before: 1000 MONEY
  After:  870 MONEY

[jcchurch@mcart python]$ python badgames.py 
Game A, 1000 runs:
  Before: 1000 MONEY
  After:  636 MONEY

Game B, 1000 runs:
  Before: 1000 MONEY
  After:  932 MONEY

Alternating: 1000 runs:
  Before: 1000 MONEY
  After:  820 MONEY

[jcchurch@mcart python]$ python badgames.py 
Game A, 1000 runs:
  Before: 1000 MONEY
  After:  604 MONEY

Game B, 1000 runs:
  Before: 1000 MONEY
  After:  918 MONEY

Alternating: 1000 runs:
  Before: 1000 MONEY
  After:  846 MONEY

I just ran the problem described in the New York Times 4 times and got 4 negative results. Game B always outperforms Game A and the combined Games A and B. This article looked very interesting and very improbable. If it's too good to be true, check for yourself.
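The script itself isn't shown above, so here is a minimal sketch of how such a simulation might look. It is not the original badgames.py. It assumes every play is a one-unit bet (win +1, lose -1), that "alternating" means A, B, A, B, and that the coin in Game B is chosen by testing the current money against a multiple of three. All function names are made up for this example.

```python
import random

def flip(p, rng):
    """Return +1 with probability p (a win), otherwise -1 (a loss)."""
    return 1 if rng.random() < p else -1

def game_a(money, rng, p=0.33):
    """Game A: one loaded coin that wins less than half the time."""
    return money + flip(p, rng)

def game_b(money, rng, p_coin1=0.1, p_coin2=0.66, modulus=3):
    """Game B: coin 1 if money is a multiple of modulus, else coin 2."""
    p = p_coin1 if money % modulus == 0 else p_coin2
    return money + flip(p, rng)

def simulate(games, runs=1000, money=1000, seed=None):
    """Play the given games in rotation and return the final money."""
    rng = random.Random(seed)
    for turn in range(runs):
        money = games[turn % len(games)](money, rng)
    return money

for label, games in [("Game A", [game_a]),
                     ("Game B", [game_b]),
                     ("Alternating", [game_a, game_b])]:
    print("%s, 1000 runs: %d MONEY" % (label, simulate(games)))
```

The probabilities are keyword arguments, so other values that fit the article's description can be tried without touching the loop.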

Three Parts of Division

noreply@blogger.com (J) — Sat, 08 Sep 2007 22:43:00 +0000

I was going to update this blog at some point.

I'm back to solving problems on Project Euler. I've solved the easy ones, and now I'm getting around to the medium difficulty problems. I like #26.

Problem 26: Find the value of d < 1000 for which 1/d contains the longest recurring cycle in its decimal fraction part.

That's kinda interesting. Every time you divide two numbers, there are three parts: The integer part, the non-repeating decimal part, and the repeating decimal part.

I wrote up this little piece of Python to compute the three parts. It is a forced base-10 long division calculator, and it makes good use of Python's strings doubling as lists, which is a feature that I always thought was nice about Python.

def division(num, den):
    ipart = str(num // den)  # floor division keeps this an integer on Python 3
    fpart = ""
    rpart = ""
    seen  = [ num%den ]
    while num % den != 0:
        num        = (num%den) * 10
        multiple   = num // den
        fpart     += str(multiple)
        num        = num - (den * multiple)
        c = 0
        quit = False
        for i in seen:
            if num == i: # This number has been seen before.
                rpart = fpart[c:] # Make characters from c to end of list part of repeating part
                fpart = fpart[:c] # Truncate fpart
                quit = True
                break
            c = c + 1
        if quit:
            break
        seen += [num]
    return [ipart, fpart, rpart]

This function accepts two arguments: an integer numerator and an integer denominator. It returns a list with three items: the whole number part, the non-repeating decimal part, and the repeating decimal part. Each part is a string, because the decimal parts can start with zeros (1/16 has the non-repeating part "0625") and the repeating part can get very long.

>>> division(1,4)
['0', '25', '']
>>> division(1,6)
['0', '1', '6']
>>> division(1,7)
['0', '', '142857']
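The scan through the "seen" list is a linear search for every digit. As a sketch of one alternative (not the code I used for the problem), a dictionary can map each remainder to the position of the digit it produced, which makes the lookup constant time. The name division_parts is made up for this example.

```python
def division_parts(num, den):
    """Return (integer part, non-repeating digits, repeating digits) as strings."""
    ipart = str(num // den)
    digits = []
    first_seen = {}                 # remainder -> index of the digit it produces
    rem = num % den
    while rem != 0 and rem not in first_seen:
        first_seen[rem] = len(digits)
        rem *= 10
        digits.append(str(rem // den))
        rem %= den
    if rem == 0:                    # the division terminated: nothing repeats
        return ipart, "".join(digits), ""
    start = first_seen[rem]         # the cycle begins where this remainder first appeared
    return ipart, "".join(digits[:start]), "".join(digits[start:])
```

A remainder showing up a second time means the long division has entered a loop, so everything from its first appearance onward is the repeating part.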

Converting ISBN13 to ISBN10 in PHP

noreply@blogger.com (J) — Thu, 26 Apr 2007 05:39:00 +0000

I've been writing a textbook buyback system for a buddy of mine. He runs a service that purchases textbooks from students at the end of a school semester. He wanted me to write a program to query Amazon.com and estimate the best value to offer a student for his or her book.

As of January 1, 2007, the agency that controls ISBN codes for all books printed in the world has changed its code format from 10 characters (regular expression /[0-9]{9}[0-9Xx]/) to 13 digits (regular expression /[0-9]{13}/). Amazon has not updated their Web E-Commerce API to accommodate this change in standard. Thus, any time I get a 13-digit ISBN, I must convert it to the older 10-digit standard.

When "Googling" for solutions to this problem, I found several, but I decided to write this one myself.

// Converts ISBN-13 to ISBN-10
// Leaves ISBN-10 numbers (or anything else that is not a 978-prefixed ISBN-13) alone.
// ISBN-13 numbers starting with 979 have no ISBN-10 equivalent.
function ISBN13toISBN10($isbn) {
    if (preg_match('/^978(\d{9})\d$/', $isbn, $m)) {
        $sequence = $m[1];
        $sum = 0;
        $mul = 10;
        for ($i = 0; $i < 9; $i++) {
            // Square brackets: the {} string-offset syntax was removed in PHP 8.
            $sum = $sum + ($mul * (int) $sequence[$i]);
            $mul--;
        }
        $mod = 11 - ($sum%11);
        if ($mod == 10) {
            $mod = "X";
        }
        else if ($mod == 11) {
            $mod = 0;
        }
        $isbn = $sequence.$mod;
    }
    return $isbn;
}
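The same check-digit arithmetic as a short Python sketch, for anyone who wants to test it outside PHP. The function name is made up for this example, and it only accepts ISBN-13 numbers with the 978 prefix, since those are the ones that have an ISBN-10 equivalent.

```python
import re

def isbn13_to_isbn10(isbn):
    """Convert a 978-prefixed ISBN-13 to ISBN-10; return anything else unchanged."""
    m = re.fullmatch(r"978([0-9]{9})[0-9]", isbn)
    if m is None:
        return isbn
    body = m.group(1)
    # Weights run 10, 9, ..., 2 across the nine body digits.
    total = sum(w * int(d) for w, d in zip(range(10, 1, -1), body))
    check = (11 - total % 11) % 11      # 11 wraps around to 0
    return body + ("X" if check == 10 else str(check))

print(isbn13_to_isbn10("9780306406157"))   # 0306406152
```

The old ISBN-13 check digit is simply dropped. The ISBN-10 check digit is recomputed from the nine digits in the middle.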

Handling Multiple MySQL Queries - "AS" labels

noreply@blogger.com (J) — Fri, 23 Mar 2007 04:13:00 +0000

I'm working on a project for the School of Pharmacy on campus. They wanted me to write a program that pulls numbers from a database on marijuana into a nicely formatted report that they could print off and send to government agencies that are interested in the data. One of the reports requires me to pull hundreds of individually calculated items from a MySQL database. A single MySQL query isn't able to do the job. When this report is completed, there will probably be 75 to 100 queries. To think that this quarterly report has been done manually for almost 2 decades overwhelms me as I try to write this report software. I'm trying to reduce several weeks of work to prepare this report to a few seconds.

The entire project is done in PHP and MySQL. PHP and MySQL go hand-in-hand, but every time I write a query, I can't help but wonder how this could be made easier. Pulling one value out of a query looks like spaghetti code. Pulling 300 values from 100 different queries is an explosion of crappy code. MySQL queries required for this report look like this:

SELECT MAX(THC) FROM samples;

This is just one query that will find the one sample in the database with the highest THC composition and report what that composition value is. There are hundreds more queries in the report similar to this one. Typically, when you run a MySQL query from PHP, here are the steps you have to take:

Format the query command.
Execute the query command.
Retrieve the first row of results from the successful execution (usually in the form of an associative array).
Transfer those values from the array to variables that actually mean something to the nature of the program (like $max_thc_composition).

If I have 100 queries, and I have to perform the same 4 steps for each query, that's 400 lines of code. Obviously, as any freshman computer science instructor will tell you (because I am one), this needs to be relegated to a function. But how do you specify a query (which can return more than one value) and the variables associated with those results in a concise manner? If the function doesn't know the variable names going into the function, it can't assign those values coming out of the function.

SELECT MAX(THC) AS max_thc_composition FROM samples;

The "AS" expression in MySQL will rename a column into something else. It can also be used to uniquely label a query. This may not seem useful in a single call, but when you have a few hundred calls to make, it's a real time saver.

    //runQueries: Runs multiple queries.
    // Pre: An array of MySQL single-result queries where
    // each result has a unique "AS" expression.
    // Post: An associative array populated with all of those values.
    function runQueries($sqls) {
        $report = array();
        //Execute each SQL query and drop the result in the report array
        foreach ($sqls as $query) {
            $result = mysql_query($query) or die("Query Failed: $query Reason: ".mysql_error());
            $row    = mysql_fetch_assoc($result);

            // Essentially, for each "AS" expression in the query,
            // look for that result in the row returned and pull that value out if it exists.
            // If it does not exist (or is NULL), just give it a value of 0.
            preg_match_all("/ AS (\w+)/", $query, $keys);
            foreach ($keys[1] as $key)
                if (isset($row[$key]))
                    $report[$key] = $row[$key];
                else
                    $report[$key] = 0;
        }
        return $report;
    }

This function takes an array of MySQL single-result query strings, where each result has an "AS" label identifying what this result means. It executes each query and drops the result into an associative array where the "AS" labels double as the array keys. And of course, it uses a regular expression.
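Here is the same idea as a small Python sketch. It swaps in the standard library's sqlite3 module for MySQL so that it runs without a database server, and the function name run_queries is made up for this example. The regular expression is slightly stricter than the PHP one (word boundary, case-insensitive), which is my choice rather than part of the original.

```python
import re
import sqlite3

def run_queries(conn, sqls):
    """Run single-row queries and collect their "AS" labels into one dict."""
    report = {}
    for query in sqls:
        cur = conn.execute(query)
        row = cur.fetchone()
        names = [col[0] for col in cur.description]
        values = dict(zip(names, row)) if row is not None else {}
        for key in re.findall(r"\bAS\s+(\w+)", query, flags=re.IGNORECASE):
            value = values.get(key)
            # Missing or NULL results default to 0, as in the PHP version.
            report[key] = 0 if value is None else value
    return report

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (THC REAL)")
conn.executemany("INSERT INTO samples VALUES (?)", [(1.5,), (9.25,), (4.0,)])
print(run_queries(conn, ["SELECT MAX(THC) AS max_thc_composition FROM samples"]))
```

Because the labels become dictionary keys, two queries that reuse a label will silently overwrite each other, which is why every "AS" expression has to be unique.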

I'm supposed to be teaching a class on Python programming in the fall. Have you noticed that I haven't featured a Python script in this blog series yet? That bothers me.

Finding lost files

noreply@blogger.com (J) — Thu, 22 Feb 2007 20:09:00 +0000

So a student e-mails me today saying that he doesn't have a grade recorded for one of his programming assignments. I check his folder of turned-in assignments and I can't find it.

Who knows? Maybe I just dropped his homework in a different folder by mistake. Checking through everyone's folders for his will take me a while. I make all of my students put their e-mail address in the comments of their code, so I just run a search on all the *.java files on my computer that have a certain e-mail address with this handy line:

find . -name '*.java' -exec grep -H 'emailName' {} +

(Replace 'emailName' with the descriptive feature of the file you are looking for.)

I still can't find his homework.

Regular Expression Speeds

noreply@blogger.com (J) — Thu, 25 Jan 2007 06:18:00 +0000

I read a paper today titled "Regular Expression Matching Can Be Simple And Fast". I thought his results were interesting enough to compare to my own regular expression engine. (Yes, I'm dorky enough to have my own engine.) I call my program "chrep", which is short for "Church's Regular Expression Parser".

Below, I've tested four engines with the string "aaaaaaaaaaaaaaaaaaaa" against the expression "a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?aaaaaaaaaaaaaaaaaaaa" (which should always return a successful match). (Note: Sorry if you aren't able to read the entire command. My CSS-fu sucks and you probably have to read the source code.)

jcchurch@linux:~> time echo "aaaaaaaaaaaaaaaaaaaa" | chrep 'a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?aaaaaaaaaaaaaaaaaaaa'
[aaaaaaaaaaaaaaaaaaaa] in : aaaaaaaaaaaaaaaaaaaa

real    0m5.137s
user    0m2.964s
sys     0m0.012s

5 seconds. Not bad, but not really all that good. The next program tested was egrep:

jcchurch@linux:~> time echo "aaaaaaaaaaaaaaaaaaaa" | egrep 'a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?aaaaaaaaaaaaaaaaaaaa'
aaaaaaaaaaaaaaaaaaaa

real    0m0.055s
user    0m0.008s
sys     0m0.008s

0.055 seconds. Roughly 100 times faster than my parser. The speed benchmark has been set pretty high. The next test is the old, reliable AWK:

jcchurch@linux:~> time echo "aaaaaaaaaaaaaaaaaaaa" | awk '/a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?aaaaaaaaaaaaaaaaaaaa/{print}'
aaaaaaaaaaaaaaaaaaaa

real    0m0.049s
user    0m0.004s
sys     0m0.004s

0.049 seconds. Still making my program look bad. Finally, I wanted to test Perl:

jcchurch@linux:~> time echo "aaaaaaaaaaaaaaaaaaaa" | perl -ne'print if /a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?aaaaaaaaaaaaaaaaaaaa/'
aaaaaaaaaaaaaaaaaaaa

real    0m0.393s
user    0m0.328s
sys     0m0.008s

Perl takes almost a half second. That's horribly slow compared to grep and awk, but still 13 times faster than my parser.

My parser ('cause I know you are wondering) is a simple interpretive, recursive descent parser. There is no compiling of the expression, which I know would speed things up. I'm glad to see that I'm not too far behind the competition, and this will encourage me to hit those benchmarks.
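The test expression above is n copies of "a?" followed by n copies of "a", matched against n copies of "a", with n=20. Here is a minimal Python sketch that builds the same test for any n and times it. The helper names are made up for this example. Python's re module is a backtracking engine, so expect the time to climb quickly as n grows.

```python
import re
import time

def pathological_case(n):
    """Return (pattern, text): n optional a's then n required a's, against n a's."""
    return "a?" * n + "a" * n, "a" * n

def time_match(n):
    """Return (matched, seconds) for the size-n test."""
    pattern, text = pathological_case(n)
    start = time.perf_counter()
    matched = re.fullmatch(pattern, text) is not None
    return matched, time.perf_counter() - start

for n in (5, 10, 15, 20):
    matched, seconds = time_match(n)
    print("n=%2d matched=%s %.4fs" % (n, matched, seconds))
```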

XOR Logic

noreply@blogger.com (J) — Wed, 10 Jan 2007 04:15:00 +0000

It's winter break, so I have had less "essential" programming to do. I have been wasting time on projecteuler.net. This site is an on-line competition devoted to solving very difficult math problems of the type that require programs to solve. I've been on the site for about two weeks and I've solved a third of the problems.

The problem that has interested me the most is problem #59: Brute force decryption using XOR Logic. I can't talk about the solution here, because that would only help people who are trying to win the competition by Googling answers, but I can talk about XOR Logic.

XOR Logic is the comparison between two items where the result is true if one is true and the other isn't. If both are false or if both are true, the result is false. In programming terms, we think of it like this:

unsigned char a, b, c;
scanf("%c",&a);
scanf("%c",&b);
c = (~a&b)|(a&~b); // XOR Logic Here!
printf("%c(%d) -> %c(%d) -> %c(%d)\n", a,a, b,b, c,c);

In this example, we have two terms: a and b. We wish to XOR these two items to make a result c. Using C's bitwise and "&", bitwise or "|", and bitwise negate "~", we can create the XOR gate. We have two inputs, and one has to be true and the other must be false for the result to be true. It doesn't matter the order. Output c is true if input a is true and input b is false or if a is false and b is true. In fact, that sentence is pretty much how I programmed it.

English:
Output c is true if input a is true and input b is false
or if a is false and b is true.

English and C:
Output c is true [c = ] if [(] input a is true [a] and(&)
input b is false[~b] [)] or[|] if[(] a is false[~a] and[&]
b is true[b] [)].[;]

C:
c = (~a&b)|(a&~b);

So how is this useful? Cryptography is based on the practice of taking a message, encrypting it, then taking the encrypted message and decrypting it. The great thing about XOR logic is that it is used for both encrypting and decrypting text.

Let's say that your secret message consists of one character: "L". "L" on the ASCII table is decimal 76 or binary "01001100". You need a secret password to encrypt the letter "L". For that, we'll use "!" which is decimal 33 or binary "00100001".

01001100 L (76)
00100001 ! (33)
-------- XOR
01101101 m (109)

"L" has now been encrypted. The encrypted text is now "m". Using our password "!", we can now decrypt the text:

01101101 m (109)
00100001 ! (33)
-------- XOR
01001100 L (76)

We just decrypted our message "m" using the same XOR method and password "!" that we used to encrypt the message. XOR Logic is pretty useful because it provides the method to both encrypt and decrypt messages quickly.
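The worked example above can be checked with a few lines of Python. This is only a sketch: the function names are made up, and repeating the password when the message is longer than the password is my assumption, since the example only covers a single character.

```python
from itertools import cycle

def xor_gate(a, b):
    """XOR built from AND, OR and NOT, mirroring the C expression."""
    return (~a & b) | (a & ~b)

def xor_cipher(data, key):
    """XOR each byte of data with the key, repeating the key as needed."""
    return bytes(d ^ k for d, k in zip(data, cycle(key)))

secret = xor_cipher(b"L", b"!")
print(secret)                       # b'm'
print(xor_cipher(secret, b"!"))     # b'L'
```

Applying the same key twice cancels out because (x XOR k) XOR k equals x, which is why one function does both jobs. Python's built-in ^ operator gives the same result as the hand-built gate.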

Matrix Fun in Haskell

noreply@blogger.com (J) — Wed, 13 Dec 2006 09:46:00 +0000

I have always thought that the snobbiest of programmers are those that use Haskell, which is a language so academic that I'm not sure if it's been used outside academia. Haskell is known for what it doesn't have. Haskell doesn't have the same loop structure. It doesn't have variables. It doesn't have the imperative design like most languages. I like Haskell. Any time I need to jump on a math program, I usually solve it in Haskell first, then the language I need second.

A friend of mine is a physics Ph.D. student who is trying to learn C. He wanted to write a program to find all of the 2-by-2 determinants found in a 4-by-4 matrix. There are 36 total. He had his own ideas on how to solve this, but I still think the quickest way is to brute force all of the possibilities.

The following patches of Haskell and C are almost identical. They use the exact same variables ("labels" for the Haskell code) to accomplish the same task. The Haskell code returns a list of floats, while the C code prints each of the 36 determinants.

twoByTwo :: [[Float]] -> [Float]
twoByTwo m = concat (map(\i->
                 concat (map(\j->
                     concat (map(\p->
                         map(\q->
                             (m!!i!!j)*(m!!p!!q)-(m!!i!!q)*(m!!p!!j)
                         )[(j+1)..(n-1)]
                     )[(i+1)..(n-1)])
                 )[0..(n-2)])
             )[0..(n-2)])
             where
                 n = length m

Yes, Haskell fans, there is a quadruple-nested lambda expression there, which does seem a little confusing. It's short and elegant, which is why Haskell reminds me of Python.

void twoByTwo(float *m, int n) {
    int i, j, p, q;
    for (i = 0; i < n-1; i++)
        for (j = 0; j < n-1; j++)
            for (p = i+1; p < n; p++)
                for (q = j+1; q < n; q++)
                    printf("%3.1f\n", m[i*n+j]*m[p*n+q] - m[i*n+q]*m[p*n+j]);
}

No one ever said C was pretty. I also wrote a recursive Matrix determinant algorithm in Haskell, but I may save that for another time.
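Since I keep comparing Haskell to Python, here is a sketch of the same brute force in Python. It is not a translation of either version above: it uses itertools.combinations to pick the row pair and the column pair, so the determinants come out in a different order, and the function name is made up for this example.

```python
from itertools import combinations

def two_by_two(m):
    """Return every 2-by-2 determinant of the square matrix m (a list of rows)."""
    n = len(m)
    return [
        m[i][j] * m[p][q] - m[i][q] * m[p][j]
        for i, p in combinations(range(n), 2)   # row pairs with i < p
        for j, q in combinations(range(n), 2)   # column pairs with j < q
    ]

matrix = [[float(4 * r + c) for c in range(4)] for r in range(4)]
print(len(two_by_two(matrix)))   # 36
```

Six ways to choose two rows times six ways to choose two columns gives the 36 determinants.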

A Review of MySpace Passwords using Ruby

noreply@blogger.com (J) — Wed, 22 Nov 2006 06:02:00 +0000

Over on the site Reddit, someone posted a link to some 50,000+ MySpace user e-mails and passwords from a phishing site. I'm not going to post the link to those passwords. You'll have to find it yourself.

I wrote a quick Ruby script to pore through the text. Normally I would have done this in Perl, but Ruby seems to be the language du jour.

Here's a breakdown of things I noticed about people's passwords. I tallied the number of passwords that had a lowercase letter, uppercase letter, a number, or a symbol in the password itself. I also checked for usernames that were the same as their passwords, and passwords that followed the pattern of "word-then-number".

total: 52692
lowercase: 50818 (96.44%)
numbers: 42570 (80.79%)
symbols: 2959 (5.62%)
uppercase: 2330 (4.42%)
Same as username: 197 (0.37%)
Word-Number pattern: 35800 (67.94%)

Almost everyone is using lowercase letters in their password. 4 out of every 5 users are using numbers. Fewer than 6% of users have a symbol in their password, and fewer than 5% have an uppercase letter. A little over a third of a percent of MySpace's users are dumb enough to use their e-mail's user name as their password. Two-thirds of users use a "word followed by a number" pattern for their password. If I were to write a password cracker, I would first try to hack accounts that followed this very common pattern. (I must admit: most of my own passwords follow this pattern.)

I went on to check for the most common word usage. This was done by stripping out any non-letter characters and dumping the results in a hash table.

password 334
a 299 (The single letter 'a' is commonly used with a sequence of numbers.)
soccer 285
iloveyou 273
fuckyou 173
love 139
abc 137
football 135
baseball 125
myspace 122

There are a few surprises in this list. Why on earth would anyone make their password "password"? For that matter, if you are on MySpace, why make your password "myspace"? There seem to be a lot of sports fans on MySpace. "soccer", "baseball", and "football" all appear in the top ten most common words. There is also a love/hate drama going on between many users. "love", "iloveyou" and "fuckyou" are common choices for passwords.

So where's the code?!?

#!/usr/bin/ruby

file = ARGV[0]

total     = 0
wordnum   = 0
numbers   = 0
lowercase = 0
uppercase = 0
symbol    = 0
samename  = 0

words = Hash.new

IO.foreach(file) do |line|

    line.strip!

    (user,password) = line.split /:/
    next if password == nil

    (name,domain)   = user.split /@/
    total = total + 1

    if password =~ /^[a-zA-Z]+[0-9]+$/
        wordnum = wordnum + 1
    end

    if password =~ /[0-9]/
        numbers = numbers + 1
    end

    if password =~ /[a-z]/
        lowercase = lowercase + 1
    end

    if password =~ /[A-Z]/
        uppercase = uppercase + 1
    end

    if password =~ /[`~!@#\$%^&*()_\+\-\\=]/
        symbol = symbol + 1
    end

    if password == name
        samename = samename + 1
    end

    lettersonly = password.gsub(/[^a-zA-Z]/, '')

    next if lettersonly =~ /^$/

    if false == words.key?(lettersonly)
        words[lettersonly] = 1
    else
        words[lettersonly] = words[lettersonly] + 1
    end

end

puts "total:               #{total}"
puts "numbers:             #{numbers}"
puts "lowercase:           #{lowercase}"
puts "uppercase:           #{uppercase}"
puts "symbols:             #{symbol}"
puts "Same as username:    #{samename}"
puts "Word-Number pattern: #{wordnum}"

words_ary = words.sort {|a,b| b[1]<=>a[1]}

10.times do |i|
    puts "#{words_ary[i][0]}  #{words_ary[i][1]}"
end
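For the Python crowd, here is a rough sketch of the same tally using collections.Counter. It is not a line-for-line port: the function name is made up, the sample lines are invented for the example, and it splits on the first colon only, so a password containing a colon is kept whole.

```python
import re
from collections import Counter

def tally(lines):
    """Return (stats, words) Counters for an iterable of "email:password" lines."""
    stats, words = Counter(), Counter()
    for line in lines:
        user, _, password = line.strip().partition(":")
        if not password:
            continue                      # skip lines without a password
        name = user.split("@")[0]
        stats["total"] += 1
        stats["word_number"] += bool(re.fullmatch(r"[a-zA-Z]+[0-9]+", password))
        stats["numbers"] += bool(re.search(r"[0-9]", password))
        stats["lowercase"] += bool(re.search(r"[a-z]", password))
        stats["uppercase"] += bool(re.search(r"[A-Z]", password))
        stats["symbols"] += bool(re.search(r"[`~!@#$%^&*()_+\-\\=]", password))
        stats["same_as_username"] += password == name
        letters = re.sub(r"[^a-zA-Z]", "", password)
        if letters:
            words[letters] += 1
    return stats, words

# Invented sample data, not taken from the leaked list.
stats, words = tally(["alice@example.com:soccer12", "bob@example.com:bob"])
print(dict(stats))
print(words.most_common(10))
```

Counter.most_common(10) replaces the sort-then-loop at the end of the Ruby script, and it doesn't fall over when there are fewer than ten distinct words.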

Bits of Java

noreply@blogger.com (J) — Tue, 14 Nov 2006 06:34:00 +0000

Note: Today's coding gem is one that probably most Java programmers have experienced at least once.

A student of mine came to me with a problem from a different class she was taking. She had to write a program to generate truth tables for certain electrical gates. Because she's in my Java class, Java was her choice to complete the assignment. While I teach Java, I'm very unfamiliar with many of its low level aspects. This was a unique challenge for me.

These particular electrical gates required 4 binary inputs. How could I quickly generate all 16 possible binary combinations of 1's and 0's without resorting to spaghetti code? My first thought was to write a simple 'for' loop to generate the binary bits.

Solution #1:

    for (int i = 0; i < 16; i++) { // Combination Number
        for (int j = 0; j < 4; j++) { // Nth bit in combination
            int bit = (int) Math.pow(2,j);
            System.out.print( (i&bit)/bit );
        } // End For
        System.out.println("");
    } // End For

It just looks ugly, but it's mathematically correct (or at least plausible). We call "Math.pow", convert double to integer, there seems to be division taking place (but it's anyone's guess as to why), and we still have to perform the bitwise "and" (&). Bleh! How is anyone supposed to understand what's going on?

Solution #2:

    for (int i = 0; i < 16; i++) { // Combination Number
        for (int j = 0; j < 4; j++) { // Nth bit in combination
            System.out.print( (i>>>j)&1 );
        } // End For
        System.out.println("");
    } // End For

It's a little cleaner. There's an unsigned shift whose result is then ANDed with the number 1. It's only two operations, both of which happen at the binary level. Works for me. Too bad I figured this out after she came up with her own solution using large, ugly 2D-arrays.
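The shift-and-mask trick isn't specific to Java. Here is the same loop as a Python sketch, with a made-up function name. Like both Java solutions, it writes the least significant bit first.

```python
def bit_rows(width=4):
    """List every combination of `width` bits, least significant bit first."""
    return [
        "".join(str((i >> j) & 1) for j in range(width))
        for i in range(2 ** width)
    ]

for row in bit_rows():
    print(row)
```

Shifting i right by j moves bit j into the lowest position, and the "& 1" throws away everything above it.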

The Impossible FFTW Bug

noreply@blogger.com (J) — Fri, 03 Nov 2006 03:34:00 +0000

Ever had a computer bug that you spent weeks working on? You might try every trick you know to solve it: debugging messages, profilers, Google, professors, coworkers, fellow students. You might try to explain the code to someone who doesn't even program just so you have to describe the logic in simple terms. Nothing seems to fix the problem.

I had this bug in my code for about two weeks. I had to write a program to compute the 2 dimensional FFT of an image. That's it. I'll just use FFTW, the most popular open source library for computing FFTs.

The Wrong Solution for computing FFTs with FFTW:

int j;
fftw_plan myplan;
fftw_complex in[x*y], out[x*y];

for (j = 0; j < x*y; j++) {
    in[j][0] = r[j];
    in[j][1] = i[j];
} // End For

myplan = (dir == 1)? fftw_plan_dft_2d(x, y, in, out, FFTW_FORWARD,  FFTW_MEASURE) :
                     fftw_plan_dft_2d(x, y, in, out, FFTW_BACKWARD, FFTW_MEASURE) ;

fftw_execute(myplan);

In these few lines of code lies a problem that I've spent weeks trying to find. There are no syntax errors, nor are there logic errors. Every time I ran this code, the result would be all zeros. If I ran it multiple times, I got the correct results for every execution after the first. I dropped the data into a prepared array, made a plan (as defined by the FFTW C library), then executed that library. Should work, right? Wrong.

The Correct Solution for computing FFTs with FFTW:

int j;
fftw_plan myplan;
fftw_complex in[x*y], out[x*y];

myplan = (dir == 1)? fftw_plan_dft_2d(x, y, in, out, FFTW_FORWARD,  FFTW_MEASURE) :
                     fftw_plan_dft_2d(x, y, in, out, FFTW_BACKWARD, FFTW_MEASURE) ;

for (j = 0; j < x*y; j++) {
    in[j][0] = r[j];
    in[j][1] = i[j];
} // End For

fftw_execute(myplan);

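I found buried in the FFTW FAQ that I'm supposed to make a plan before giving it the data. I switched two sets of lines around and everything magically works for reasons I fail to understand. (The FFTW documentation does give the reason: planning with FFTW_MEASURE runs trial transforms and overwrites the in and out arrays, so any data loaded before planning is destroyed.) Let this be a lesson to anyone having to use FFTW.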

vi Number Sequence Trick

noreply@blogger.com (J) — Tue, 24 Oct 2006 20:24:00 +0000

I'm a vim user. Often in my editing, I want to number some of the lines in a file from 1 to some ending number. It might be only a few lines, or it might be thousands. To save myself some carpal tunnel, I wrote one line of Perl to do the job. This command passes a block of lines from whatever file you are editing to a one-line Perl script, which numbers the lines starting with 1 and drops that text back into the editor:

:x,y!perl -ne'print "$.  $_";'

In this example, x is the starting line number and y is the ending line number. That's it!

File Character Frequency

noreply@blogger.com (J) — Sun, 15 Oct 2006 08:33:00 +0000

In my machine learning course, our first project was to write a program to sort e-mail into nine class labels A through I. In my implementation, after every e-mail was classified, I would write a single character to the screen. There were several thousand e-mails that needed sorting, so I needed a program that displayed a final tally of the results.

I could have easily built something into the program I was writing, but I thought something more general purpose would be useful. I quickly wrote the following code. It takes a tally of each character in a file and then uses a quick sort to sort the data and then prints the results. I even added a switch to ignore whitespace characters.

#include <stdio.h>
#include <string.h>  /* strcmp */

/* Author:      James Church
 * Date:        09/11/06
 * Program:     charfreq
 * Version:     0.3
 * Description: Takes a single file from the command line as input and
 *              reports the character frequency of each character in order
 *              from most common to least common.
 *
 * Copyright © 2006 Free Software Foundation, Inc.
 *
 * Copying and distribution of this file, with or without modification,
 * are permitted in any medium without royalty provided the copyright
 * notice and this notice are preserved.
 */

#define CHARRANGE 256

typedef struct _charcount {
    unsigned char c;
    long          count;
} CharCount;

void quicksort (CharCount *a, int i, int j);
int  partition (CharCount *a, int i, int j);

int main(int argc, char **argv) {
    int            i;
    unsigned char  c;
    CharCount      freq[CHARRANGE];
    FILE          *file;
    int           ignore_whitespace = 0;
    int           argparse = 1;

    if (argc == 1) {
        printf("\nUsage: %s [-s] [filename] - Reports character frequency of file.\n", argv[0]);
        printf(" -s  Ignores whitespace (optional)\n");
        return 0;
    } // End If

    if (strcmp("-s", argv[argparse]) == 0) {
        ignore_whitespace = 1;
        argparse++;
    } // End If

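    /* Guard against "-s" with no file name, and report the name we actually tried. */
    if (argparse >= argc || (file = fopen(argv[argparse], "r")) == NULL) {
        printf("Error: Cannot open %s\n", argparse < argc ? argv[argparse] : "(no file given)");
        return 1;
    } // End If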
    argparse++;

    for (i = 0; i < CHARRANGE; i++) {
        freq[i].c     = (unsigned char) i;
        freq[i].count = 0;
    } // End For

    while (1) {
        fread (&c, sizeof(unsigned char), 1, file);
        if (feof(file)) break;

        if (ignore_whitespace && (c == ' ' || c == '\t' || c == '\r' || c == '\n'))
            continue;

        freq[c].count++;
    } // End While

    quicksort(freq, 0, CHARRANGE-1);

    for (i = CHARRANGE-1; i >= 0; i--) {
        if (freq[i].count == 0) break;

        printf("%c[%3d]: %ld\n", freq[i].c, freq[i].c, freq[i].count);
    } // End For

    fclose(file);
    return 0;
} // End Main

void quicksort (CharCount *a, int i, int j) {
    int p;

    if (i < j) {
        p = partition (a, i, j);
        quicksort (a, i, p-1);
        quicksort (a, p+1, j);
    } /* End If */
} /* End quicksort */

int partition (CharCount *a, int i, int j) {
    long val = a[i].count;  /* count is a long, so the pivot must be too */
    int h   = i;
    int k;
    CharCount temp;

    for (k = i+1; k <= j; k++)
        if (a[k].count < val) {
            h++;
            temp = a[h];
            a[h] = a[k];
            a[k] = temp;
        } /* End If */

    temp = a[i];
    a[i] = a[h];
    a[h] = temp;
    return h;
} /* End partition */

Enjoy!
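For comparison, the same tally takes only a few lines in Python, because collections.Counter does both the counting and the sorting. This is a sketch with a made-up function name, not a replacement for the C program. It works on bytes so that, like the C version, it counts values 0 through 255.

```python
from collections import Counter

def char_frequencies(data, ignore_whitespace=False):
    """Return (byte value, count) pairs for a bytes object, most common first."""
    counts = Counter(data)
    if ignore_whitespace:
        for ws in b" \t\r\n":       # the same four characters the -s switch skips
            counts.pop(ws, None)
    return counts.most_common()

for value, count in char_frequencies(b"ABBA CABBAGE\n", ignore_whitespace=True):
    print("%c[%3d]: %d" % (value, value, count))
```

To run it on a file, read the file in binary mode and pass the contents in as data.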

Finding Identical Files using bash and find

noreply@blogger.com (J) — Thu, 12 Oct 2006 08:00:00 +0000

I wanted to start a blog of all the little bits of code featuring tricks that I've learned over the years. I don't know how often I'll update this blog, but any time I use a bit of code to complete a complicated task, I'll describe the task and show the code used to solve it.

My first example will be an identical file finder. In the Java programming class that I teach, I suspected two students of turning in the exact same work. As I was grading, I thought I had seen some identical code elsewhere, but I couldn't remember where. I keep all the students' work in different directories on my office computer just in case I need to look at their old homework solutions. Rather than look at every solution manually for an identical file, I wrote this little bash script to do the work for me:

#!/bin/bash

if [ $# -eq 2 ]
then
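    # -q makes diff print a one-line "differ" message instead of the full diff;
    # quoting keeps patterns and file names with spaces intact.
    find . -name "$1" -exec diff -qs {} "$2" \; | grep -v differ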
else
    echo "USAGE: findIdent [SOME FILE EXPRESSION] [SOME FILE]"
fi

This script "findIdent" takes two arguments: a file pattern (say... "*.mp3", quoted so the shell doesn't expand it) and the file whose duplicates you want to find. So did I find any duplicates of students' work? No. Turns out I just found some very similar code and it was nothing to worry about. But I kept this script in case I ever need it again.
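The same search can be sketched in Python with the standard library's filecmp module. The function name is made up for this example, and unlike the bash script it leaves the target file itself out of the results.

```python
import filecmp
from pathlib import Path

def find_identical(pattern, target, root="."):
    """Return files under root matching pattern whose contents equal target's."""
    target = Path(target)
    return sorted(
        path
        for path in Path(root).rglob(pattern)
        if path.is_file()
        and path.resolve() != target.resolve()          # don't report the file itself
        and filecmp.cmp(path, target, shallow=False)    # compare contents, not just stat info
    )

# Typical use:
#     for path in find_identical("*.java", "Homework3.java"):
#         print(path)
```

Passing shallow=False matters here: by default filecmp.cmp treats two files with the same size and modification time as equal without reading them.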