<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:gd="http://schemas.google.com/g/2005" xmlns:thr="http://purl.org/syndication/thread/1.0" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" gd:etag="W/&quot;CEIAQ3s4fip7ImA9WhdTEEU.&quot;"><id>tag:blogger.com,1999:blog-6792450305901069564</id><updated>2011-07-08T07:49:02.536+08:00</updated><title>A Coder's Musings</title><subtitle type="html">A little blog on coding, math, science, and whatever else comes up in my mind.</subtitle><link rel="http://schemas.google.com/g/2005#feed" type="application/atom+xml" href="http://acodersmusings.blogspot.com/feeds/posts/default" /><link rel="alternate" type="text/html" href="http://acodersmusings.blogspot.com/" /><author><name>Tim Dumol</name><uri>http://www.blogger.com/profile/07647576678670279091</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><generator version="7.00" uri="http://www.blogger.com">Blogger</generator><openSearch:totalResults>1</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/atom+xml" href="http://feeds.feedburner.com/ACodersMusings" /><feedburner:info uri="acodersmusings" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><entry gd:etag="W/&quot;C0cMRHw8cCp7ImA9WxFaFU0.&quot;"><id>tag:blogger.com,1999:blog-6792450305901069564.post-3186481646981133124</id><published>2009-07-28T18:26:00.012+08:00</published><updated>2010-07-19T08:58:05.278+08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2010-07-19T08:58:05.278+08:00</app:edited><title>Curve fitting with Pyevolve</title><content type="html">&lt;p&gt;I've toyed around with the idea of using genetic algorithms for curve fitting, and I've finally gotten round to trying it out. Here's how it went:&lt;/p&gt;&lt;h2&gt;A short intro&lt;/h2&gt;&lt;h3&gt;What are genetic algorithms?&lt;/h3&gt;&lt;p&gt;&lt;a href="http://en.wikipedia.org/wiki/Genetic_algorithms"&gt;Genetic Algorithms&lt;/a&gt;, or GA's, solve problems using a method inspired by biological evolution -- selection of the fittest, crossing-over of alleles, and mutation of alleles, repeated over many generations on a large population. GA's have been used to &lt;a href="http://ti.arc.nasa.gov/projects/esg/research/antenna.htm"&gt;design antennae&lt;/a&gt;, &lt;a href="http://www.predict.com/"&gt;predict the market&lt;/a&gt;, and &lt;a href="http://www.sciencedirect.com/science?_ob=ArticleURL&amp;amp;_udi=B6VC5-3SWVHMH-9&amp;amp;_user=10&amp;amp;_rdoc=1&amp;amp;_fmt=&amp;amp;_orig=search&amp;amp;_sort=d&amp;amp;_docanchor=&amp;amp;view=c&amp;amp;_acct=C000050221&amp;amp;_version=1&amp;amp;_urlVersion=0&amp;amp;_userid=10&amp;amp;md5=d77e7bd9c5bdeecadac94139cd9fba20"&gt;design networks&lt;/a&gt;.&lt;/p&gt;&lt;h4 style="margin-bottom: 0pt;"&gt;Learn more:&lt;/h4&gt;&lt;br /&gt;&lt;a href="http://www.obitko.com/tutorials/genetic-algorithms/index.php"&gt;Introduction to Genetic Algorithms&lt;/a&gt; – a quick start on the basics&lt;br /&gt;&lt;a href="http://samizdat.mines.edu/ga_tutorial/ga_tutorial.ps"&gt;A Genetic Algorithm Tutorial by Darrell Whitley Computer Science Department Colorado State University&lt;/a&gt; – heavy on theory&lt;br /&gt;&lt;a href="http://cs.gmu.edu/%7Esean/book/metaheuristics/"&gt;Essentials of Metaheuristics &lt;/a&gt;– a free downloadable book that features genetic algorithms, simulated annealing, and other metaheuristics algorithms.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;What is Pyevolve?&lt;/h3&gt;&lt;a href="http://pyevolve.sourceforge.net/wordpress/"&gt;Pyevolve&lt;/a&gt; is an open source genetic algorithms library framework in Python. I've chosen it since it seems to be the most actively developed framework in Python. Since it's on Python, one can make use of the many libraries available for math and science, whilst enjoying the ease of a high-level language.&lt;br /&gt;&lt;h2&gt;&lt;br /&gt;&lt;/h2&gt;&lt;h2&gt;Starting off&lt;/h2&gt;&lt;h3&gt;Downloading and installing Pyevolve&lt;/h3&gt;&lt;p&gt;Pyevolve can be downloaded &lt;a href="http://pyevolve.sourceforge.net/intro.html#downloads"&gt;here&lt;/a&gt;. Installing it is as simple as running the executable (on Windows), or using easy_install (on Linux).&lt;/p&gt;&lt;h3&gt;Defining our fitness function&lt;/h3&gt;&lt;p&gt;Our objective is to get a curve that matches the sample data as close as possible. Thus, our fitness function will take the absolute values of the differences between the predicted y's and sample y's, sum them all up, and take their negative. In other words:&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_FKhfwb8_60M/Sm7Tvj5up5I/AAAAAAAAAAw/eq6FVIw9pQ8/s1600-h/neg_abs_sum_delta_y.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 237px; height: 21px;" src="http://1.bp.blogspot.com/_FKhfwb8_60M/Sm7Tvj5up5I/AAAAAAAAAAw/eq6FVIw9pQ8/s320/neg_abs_sum_delta_y.png" alt="" id="BLOGGER_PHOTO_ID_5363457020289525650" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;In Python:&lt;/p&gt;&lt;pre class="brush: python"&gt;def fitness_function(chromosome):&lt;br /&gt;score = 0&lt;br /&gt;for point in sample_points:&lt;br /&gt;  # eval_polynomial computes the value of the polynomial ``chromosome`` at ``x``&lt;br /&gt;  delta = abs(eval_polynomial(point[0], *chromosome) - point[1])&lt;br /&gt;  score += delta&lt;br /&gt;score = -score&lt;br /&gt;return score&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;p&gt;Our genetic algorithm will rank the chromosomes (polynomials) based on these scores. The higher the score, the higher chance of getting selected and of crossing.&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;To make our fitness functions flexible, we opt to make a factory function to make fitness functions based on different sets of sample points, like so:&lt;/p&gt;&lt;pre class="brush: python"&gt;def generate_fitness_function(sample_points):&lt;br /&gt;def fitness_function(chromosome):&lt;br /&gt;  score = 0&lt;br /&gt;  for point in sample_points:&lt;br /&gt;     delta = abs(eval_polynomial(point[0], *chromosome) - point[1])&lt;br /&gt;     score += delta&lt;br /&gt;  score = -score&lt;br /&gt;  return score&lt;br /&gt;return fitness_function&lt;/pre&gt;&lt;br /&gt;&lt;h3&gt;Creating the population&lt;/h3&gt;&lt;p&gt;For simplicity's sake, we'll use a list of integers to represent each chromosome. This is represented by &lt;code&gt;G1DList.G1DList&lt;/code&gt;. &lt;code&gt;G1DList&lt;/code&gt; has convenient default settings, but for this experiment we'll change the range of integers each chromosome can contain, and their length. Finally, we set its fitness function (or evaluator function).&lt;/p&gt;&lt;pre class="brush: python"&gt;&lt;br /&gt;genome = G1DList.G1DList(5)&lt;br /&gt;genome.setParams(rangemin=-50, rangemax=50)&lt;br /&gt;genome.evaluator.set(generate_fitness_function(sample_points))&lt;/pre&gt;&lt;h3&gt;&lt;br /&gt;&lt;/h3&gt;&lt;h3&gt;Setting up the engine&lt;/h3&gt;&lt;p&gt;The GA engine is represented by the &lt;code&gt;GSimpleGA.GSimpleGA&lt;/code&gt; class. We create an instance of it, and optionally, set its population size, generations, selectors, and other factors. We'll stick with changing the population size and the selector.&lt;/p&gt;&lt;pre class="brush: python"&gt;ga = GSimpleGA.GSimpleGA(genome)&lt;br /&gt;ga.setPopulationSize(1000)&lt;br /&gt;ga.selector.set(Selectors.GRouletteWheel)&lt;/pre&gt;&lt;p&gt;Increasing the population from the default 80 increases our algorithm's accuracy at a cost in speed. We'll see just how much in the next article.&lt;/p&gt;&lt;p&gt;Changing the selector is a crucial step in this particular algorithm. The default selector is &lt;code&gt;Selectors.GRankSelector&lt;/code&gt;, which selects chromosomes based on their ranking – it uses the scores only to rank the chromosomes. This discards important infromation – the deltas from the sample points. Without making full use of these deltas, our algorithm will be highly inaccurate, as we'll see in a future article.&lt;/p&gt;&lt;p&gt;To fix this problem, we change our selector to &lt;code&gt;Selectors.GRouletteWheel&lt;/code&gt;, which selects all the chromosomes to be picked simultaneously, with probabilities corresponding to their fitness relative to the population's mean fitness.&lt;a href="http://samizdat.mines.edu/ga_tutorial/ga_tutorial.ps"&gt;This e-book&lt;/a&gt; has more information on the particular mechanism and the theory behind it.&lt;/p&gt;&lt;p&gt;Finally, before we run the algorithm, we'll change the scaling method used. The default scaling method is &lt;code&gt;Scaling.LinearScaling&lt;/code&gt;, which only works for postive raw scores. Since our scoring function uses negative scores, and requires the use of negative scores unless we decide to use reciprocals – which will require arbitary precision to work properly – we must change our scaling method. The only scaling method currently available that works for negative scores is &lt;code&gt;Scaling.SigmaTruncScaling&lt;/code&gt;, so we'll use that.&lt;/p&gt;&lt;pre class="brush: python"&gt;pop = ga.getPopulation()&lt;br /&gt;pop.scaleMethod.set(Scaling.SigmaTruncScaling)&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;h3&gt;&lt;br /&gt;&lt;/h3&gt;&lt;h2&gt;Running the algorithm&lt;/h2&gt;&lt;p&gt;Finally we run it with &lt;code&gt;GSimpleGA&lt;/code&gt;'s &lt;code&gt;evolve&lt;/code&gt; function, and print out the best individual from the evolution. We'll also print out the original polynomial and the sample points to see how well our algorithm did.&lt;/p&gt;&lt;pre class="brush: python"&gt;ga.evolve(freq_stats=10) # freq_stats is the number of generations between each statistics message.&lt;br /&gt;print(ga.bestIndividual())&lt;br /&gt;print(source_polynomial)&lt;br /&gt;print(sample_points)&lt;br /&gt;&lt;/pre&gt;&lt;p&gt;The full code of our algorithm is:&lt;/p&gt;&lt;br /&gt;&lt;pre class="brush: python"&gt;from pyevolve import G1DList, GSimpleGA, Selectors, Scaling, DBAdapters&lt;br /&gt;from random import seed, randint, random&lt;br /&gt;&lt;br /&gt;def eval_polynomial(x, *coefficients):&lt;br /&gt;result = 0&lt;br /&gt;for exponent, coeff in enumerate(coefficients):&lt;br /&gt; result += coeff*x**exponent&lt;br /&gt;return result &lt;br /&gt;&lt;br /&gt;def generate_fitness_function(sample_points):&lt;br /&gt;def fitness_function(chromosome):&lt;br /&gt; score = 0&lt;br /&gt; for point in sample_points:&lt;br /&gt;     delta = abs(eval_polynomial(point[0], *chromosome) - point[1])&lt;br /&gt;     score += delta&lt;br /&gt; score = -score&lt;br /&gt; return score&lt;br /&gt;return fitness_function&lt;br /&gt;&lt;br /&gt;if __name__ == "__main__":&lt;br /&gt;# Generate a random polynomial, and generate sample points from it&lt;br /&gt;seed()&lt;br /&gt;&lt;br /&gt;source_polynomial = []&lt;br /&gt;for i in xrange(randint(1, 5)):&lt;br /&gt; source_polynomial.append(randint(-20,20))&lt;br /&gt;&lt;br /&gt;sample_points = []&lt;br /&gt;for i in xrange(20):&lt;br /&gt; n = randint(-100, 100)&lt;br /&gt; sample_points.append((n, eval_polynomial(n, *source_polynomial)))&lt;br /&gt;&lt;br /&gt;# Create the population&lt;br /&gt;genome = G1DList.G1DList(5)&lt;br /&gt;genome.evaluator.set(generate_fitness_function(sample_points))&lt;br /&gt;genome.setParams(rangemin=-50, rangemax=50)&lt;br /&gt;&lt;br /&gt;# Set up the engine&lt;br /&gt;ga = GSimpleGA.GSimpleGA(genome)&lt;br /&gt;ga.setPopulationSize(1000)&lt;br /&gt;ga.selector.set(Selectors.GRouletteWheel)&lt;br /&gt;&lt;br /&gt;# Change the scaling method&lt;br /&gt;pop = ga.getPopulation()&lt;br /&gt;pop.scaleMethod.set(Scaling.SigmaTruncScaling)&lt;br /&gt;&lt;br /&gt;# Start the algorithm, and print the results.&lt;br /&gt;ga.evolve(freq_stats=10)&lt;br /&gt;print(ga.bestIndividual())&lt;br /&gt;print("Source polynomial: " + repr(source_polynomial))&lt;br /&gt;print("Sample points: " + repr(sample_points))&lt;/pre&gt;&lt;br /&gt;&lt;h2&gt;Results&lt;/h2&gt;&lt;p&gt;Upon running, you should get something like this:&lt;/p&gt;&lt;pre class="brush: plain"&gt;&lt;br /&gt;Gen. 1 (1.00%): Max/Min/Avg Fitness(Raw) [15727000722.62(-4292810.00)/0.00(-18226440830.00)/9094256863.27(-6681328575.30)]&lt;br /&gt;Gen. 10 (10.00%): Max/Min/Avg Fitness(Raw) [4637786255.82(-4204824.00)/0.00(-17834916446.00)/3703815517.04(-1089486993.87)]&lt;br /&gt;Gen. 20 (20.00%): Max/Min/Avg Fitness(Raw) [4631124226.18(-4204824.00)/0.00(-17834764598.00)/4133406394.92(-687453861.04)]&lt;br /&gt;Gen. 30 (30.00%): Max/Min/Avg Fitness(Raw) [2783863862.93(-4204824.00)/0.00(-14596526122.00)/2444879772.09(-441610027.03)]&lt;br /&gt;Gen. 40 (40.00%): Max/Min/Avg Fitness(Raw) [3285644641.87(-4197260.00)/0.00(-16392222838.00)/2983713510.34(-436706999.06)]&lt;br /&gt;Gen. 50 (50.00%): Max/Min/Avg Fitness(Raw) [3144311576.46(-62648.00)/0.00(-16006551450.00)/2935495141.08(-336836855.08)]&lt;br /&gt;Gen. 60 (60.00%): Max/Min/Avg Fitness(Raw) [2965577707.55(-58706.00)/0.00(-14566565128.00)/2791088446.86(-294883289.58)]&lt;br /&gt;Gen. 70 (70.00%): Max/Min/Avg Fitness(Raw) [2824632389.09(-57634.00)/0.00(-14567123878.00)/2675797266.42(-256698663.52)]&lt;br /&gt;Gen. 80 (80.00%): Max/Min/Avg Fitness(Raw) [2960822190.42(-21138.00)/0.00(-17483256712.00)/2798736824.07(-273902724.22)]&lt;br /&gt;Gen. 90 (90.00%): Max/Min/Avg Fitness(Raw) [2803805896.09(-2874.00)/0.00(-17482829524.00)/2653855523.20(-264112618.46)]&lt;br /&gt;Gen. 100 (100.00%): Max/Min/Avg Fitness(Raw) [3539918853.96(-2868.00)/0.00(-18213161666.00)/3390487526.67(-290427123.81)]&lt;br /&gt;Total time elapsed: 18.642 seconds.&lt;br /&gt;- GenomeBase&lt;br /&gt;Score:    -2868.000000&lt;br /&gt;Fitness:   3539918853.958958&lt;br /&gt;&lt;br /&gt;Slot [Evaluator] (Count: 1)&lt;br /&gt;Name: fitness_function&lt;br /&gt;Slot [Initializator] (Count: 1)&lt;br /&gt;Name: G1DListInitializatorInteger&lt;br /&gt;Doc:  Integer initialization function of G1DList&lt;br /&gt;&lt;br /&gt;This initializator accepts the *rangemin* and *rangemax* genome parameters.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Slot [Mutator] (Count: 1)&lt;br /&gt;Name: G1DListMutatorSwap&lt;br /&gt;Doc:  The mutator of G1DList, Swap Mutator&lt;br /&gt;Slot [Crossover] (Count: 1)&lt;br /&gt;Name: G1DListCrossoverSinglePoint&lt;br /&gt;Doc:  The crossover of G1DList, Single Point&lt;br /&gt;&lt;br /&gt;.. warning:: You can't use this crossover method for lists with just one element.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;- G1DList&lt;br /&gt;List size:  5&lt;br /&gt;List:   [1, -10, 12, 6, 0]&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Source polynomial: [16, -7, 12, 6]&lt;br /&gt;Sample points: [(1, 27), (-88, -3995272), (-47, -596085), (49, 734379), (35, 271721), (54, 979414), (-44, -487548), (-93, -4721687), (-45, -522119), (-72, -2176760), (99, 5938729), (78, 2919790), (-62, -1383390), (0, 16), (-25, -86059), (-36, -264116), (-52, -810820), (-16, -21376), (6, 1702), (64, 1621584)]&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;p&gt;As you can see, the GA did well, getting the 3rd and 4th coefficients spot on, and getting close to the others. You can compare the two polynomial functiosn in the following graph, with blue points and line representing the orginal polynomial, and the red line, the genetic algorithm's.&lt;/p&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_FKhfwb8_60M/Sm8I_CyjRZI/AAAAAAAAAA4/G7XUFe621wc/s1600-h/sample_plot.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 382px; height: 236px;" src="http://1.bp.blogspot.com/_FKhfwb8_60M/Sm8I_CyjRZI/AAAAAAAAAA4/G7XUFe621wc/s320/sample_plot.png" alt="" id="BLOGGER_PHOTO_ID_5363515560395228562" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;p&gt;The red line matches the blue line almost perfectly – the blue line is barely visible behind it. The GA fit the curve well – although less efficiently than a specialized algorithm (try polynomial regression at http://www.arachnoid.com/sage/polynomial.html). Try it yourself, see how it performs for you&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6792450305901069564-3186481646981133124?l=acodersmusings.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/ACodersMusings/~4/y87NAVn3tdk" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://acodersmusings.blogspot.com/feeds/3186481646981133124/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://acodersmusings.blogspot.com/2009/07/curve-fitting-with-pyevolve.html#comment-form" title="14 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6792450305901069564/posts/default/3186481646981133124?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6792450305901069564/posts/default/3186481646981133124?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/ACodersMusings/~3/y87NAVn3tdk/curve-fitting-with-pyevolve.html" title="Curve fitting with Pyevolve" /><author><name>Tim Dumol</name><uri>http://www.blogger.com/profile/07647576678670279091</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://1.bp.blogspot.com/_FKhfwb8_60M/Sm7Tvj5up5I/AAAAAAAAAAw/eq6FVIw9pQ8/s72-c/neg_abs_sum_delta_y.png" height="72" width="72" /><thr:total>14</thr:total><feedburner:origLink>http://acodersmusings.blogspot.com/2009/07/curve-fitting-with-pyevolve.html</feedburner:origLink></entry></feed>

