<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

  <title><![CDATA[Marcio von Muhlen]]></title>
  <link href="http://marciovm.com/atom.xml" rel="self"/>
  <link href="http://marciovm.com/"/>
  <updated>2014-12-07T18:48:20-08:00</updated>
  <id>http://marciovm.com/</id>
  <author>
    <name><![CDATA[]]></name>
    
  </author>
  <generator uri="http://octopress.org/">Octopress</generator>

  
  <entry>
    <title type="html"><![CDATA[We need a Github of Science]]></title>
    <link href="http://marciovm.com/i-want-a-github-of-science/"/>
    <updated>2011-04-21T17:05:00-07:00</updated>
    <id>http://marciovm.com/i-want-a-github-of-science</id>
    <content type="html"><![CDATA[<p><img src="http://marciovm.com/images/profzi_scheme.gif" title="&#34;Piled Higher and Deeper&#34; by Jorge Cham" alt="&#34;Piled Higher and Deeper&#34; by Jorge Cham"></p>

<p><a href="http://www.phdcomics.com">&#8220;Piled Higher and Deeper&#8221; by Jorge Cham</a></p>

<p><strong> Summary </strong></p>

<ul>
<li><p>Publishing is central to Academia, but its publishing system is outclassed by what Open Source software developers have in GitHub</p></li>
<li><p>GitHub&#8217;s success is not just about openness, but also a prestige economy that rewards valuable content producers with credit and attention</p></li>
<li><p>Open Science efforts like arXiv and PLoS ONE should follow GitHub&#8217;s lead and embrace the social web</p></li>
</ul>


<p><strong>Publish or Perish</strong></p>

<p>I am a postdoctoral fellow, and <a href="http://dbmi.ucsd.edu/confluence/display/BMI/Division+of+BioMedical+Informatics">my academic department</a> is currently
running a junior faculty search.  We are interviewing four candidates, each of
whom will present a job talk attended by the entire department.  Before each
talk, I&#8217;ll receive the candidate&#8217;s application packet, and my eyes will scan
the &#8220;publications&#8221; section of their resume.  The presence of a first-author
article in the ultra-prestigious academic journals Science or Nature would all
but guarantee an offer.  Multiple publications in top-tier journals would
indicate a strong application.  If those are missing, meaning the publication
history is weak, I&#8217;ll wonder how that person got an interview in the first
place.  Cultural fit, letters of reference and other credentials certainly
matter, but beyond publications, everything is secondary.</p>

<p>To anyone involved in academia, this overwhelming focus on publications is a
given.  Publishing is so central to scientists that their academic value can
be measured by adding the relative worth of their publications.  After many
years, citations preferentially accumulate towards publications of
significance, and by extension, their authors.  Ranking importance by
citations received is a powerful concept, and incidentally is the basis for
<a href="http://en.wikipedia.org/wiki/PageRank">Google&#8217;s search algorithm</a>.  But at the beginning of an academic&#8217;s
career, before citations accumulate, reputation rests largely on what journals
they have published in.</p>
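<p>To make the citation-ranking idea concrete, here is a minimal Python sketch of PageRank-style power iteration over a toy citation graph. The paper names, the damping factor of 0.85 and the iteration count are illustrative assumptions. This is the textbook formulation, not Google&#8217;s production algorithm or any existing academic metric.</p>

```python
# Toy PageRank over a citation graph. Each key lists the papers it cites.
# The graph, damping factor and iteration count are illustrative assumptions.


def pagerank(citations, damping=0.85, iterations=100):
    """Return a dict of {paper: score} computed by simple power iteration."""
    papers = set(citations)
    for cited in citations.values():
        papers.update(cited)
    n = len(papers)
    scores = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        # Every paper receives a baseline share from the "random jump" term.
        new_scores = {p: (1.0 - damping) / n for p in papers}
        for p in papers:
            cited = citations.get(p, [])
            if cited:
                # A paper splits its current score evenly among the papers it cites.
                share = damping * scores[p] / len(cited)
                for q in cited:
                    new_scores[q] += share
            else:
                # A paper that cites nothing spreads its score over all papers.
                for q in papers:
                    new_scores[q] += damping * scores[p] / n
        scores = new_scores
    return scores


citations = {
    "paper_a": ["paper_c"],
    "paper_b": ["paper_c"],
    "paper_c": ["paper_d"],
    "paper_d": [],
}
ranked = sorted(pagerank(citations).items(), key=lambda kv: kv[1], reverse=True)
for paper, score in ranked:
    print(paper, round(score, 3))
```

<p>Here paper_d is cited only once, yet it ranks first. Its single citation comes from paper_c, the most-cited paper in the graph. Weighting each citation by the rank of the citing work is what separates this approach from a raw citation count.</p>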

<p>Getting a paper accepted into an academic journal requires passage through the
often opaque process of peer-review. Scientists make a big deal of
peer-review, because it is supposed to be the filter that separates mere opinions
from trusted, citable sources. However, the peer-review process in science has
close analogs in any &#8220;old-media&#8221; field, such as TV or radio. Like academic
journals, these are media of limited capacity, and there are always more
submissions (or ideas for submissions) than there are openings.  Content
worthy of distribution is selected by the field&#8217;s establishment, which
effectively silences whatever it does not choose. This is especially true of
peer-review as practiced in prestigious journals, defined as the ones that get
their contributors faculty jobs.</p>

<p>Having editorial decisions made by established experts makes sense, since they
draw on judgement born from years of experience. But this exposes the system
to vulnerabilities common to any decision by committee &#8211; especially a
semi-secret committee &#8211; such as a lack of agility, an aversion to disruptive
innovation, and the tendency of committee members (and their friends) to be
more equal in their own eyes than anyone else.  Because publishing affects
scientists so deeply, the strengths and weaknesses of this system inevitably
affect the makeup and character of science as a whole.  This raises an obvious
question: is there a better way?</p>

<p><strong>GitHubbing</strong></p>

<p>My training has spanned biology, engineering, and computer science.  My latest
project, <a href="http://secure.dokbot.com/">Instant Q&amp;A for Physician Communities</a>, relies heavily on open
source code and led me to <a href="http://www.github.com/">GitHub</a> and <a href="http://git-scm.com/">Git</a>.  The Linux community
<a href="http://progit.org/book/ch1-2.html">developed Git</a>, a distributed version control system, to coordinate work
on the Linux code repository among thousands of programmers. Git is itself
open source, and has become widely adopted for many software projects (open
source and not).  GitHub is a cloud service that hosts <a href="http://techcrunch.com/2010/07/24/github-one-million/">over 1 million</a>
Git repositories.  Since its launch in 2008, GitHub has quickly become the <em>de
facto</em> platform for publishing open source code, whose popularity is changing
the world.  If you&#8217;ve ever been astonished at how quickly the web world seems
to move, the primary reasons are 1) it&#8217;s not dominated by Microsoft, so we
have competition instead of a monopoly, and 2) open source code, widely shared
through a multitude of email lists in the past and now centralized at GitHub
[<a href="http://marciovm.com/i-want-a-github-of-science#f1n">1</a>].  How has GitHub become so successful?</p>

<p>GitHub is a social network of code, the first platform for sharing validated
knowledge native to the social web [<a href="http://marciovm.com/i-want-a-github-of-science#f2n">2</a>].  This is a big deal.  I believe
it represents a demonstrably better way of distributing validated knowledge
than academic publishing.  How are these even related?  Software developers
rarely write applications from scratch.  Instead, they often start with
various modular bundles of open source code.  Within Ruby (the programming
language underlying the popular web application framework Ruby on Rails
[<a href="http://marciovm.com/i-want-a-github-of-science#f3n">3</a>]) these bundles are called gems.  My current project employs 34 gems.
Each one is responsible for a specific task, such as <a href="https://github.com/plataformatec/devise">logging in users</a>,
<a href="https://github.com/marcel/aws-s3">interfacing with cloud storage</a>, or making <a href="https://github.com/imathis/fancy-buttons">fancy-looking buttons</a>.
Science operates in a similar way.  Scientists never begin a research project
from an intellectual vacuum.  They stand on the shoulders of giants, building
on the knowledge contained in previous publications to form a new, coherent
finding.  For example, the <a href="https://s3.amazonaws.com/marcios/Marciovm+et+al+Analytical+Chem.pdf">article</a> in which I published the bulk of my
PhD thesis cites 38 others.</p>

<p>Gems are typically developed, distributed, and promoted through GitHub, and
therein lies the connection.  GitHub has evolved to solve the same general
problem that scientific publishing does: making modular, validated units of
knowledge easily usable by a global community, with mechanisms that
efficiently allocate prestige to proven contributors.  GitHub has the
advantage of doing this with 21st century technology, the social web, while
academic publishing is based on the printing press.  This suggests an
opportunity for the scientific community to evolve its publishing practices by
assimilating mechanisms proven to work for GitHub.</p>

<p><strong>Published Versus Prestigious</strong></p>

<p>The existing peer-review process <a href="http://www.timeshighereducation.co.uk/story.asp?sectioncode=26&amp;storycode=414579&amp;c=2">arose from the limited carrying capacity of physical journals</a>.  Prioritization had to happen before publication,
because journals were limited in size to what could be economically printed
and shipped.  If you were born before 1990, you may recall the prestige
formerly associated with being &#8220;a published author&#8221;.  However, in the times we
are living in, distributing media is basically free.  Anyone can start a blog
and deliver content worldwide in minutes.  Clay Shirky has made a career of
<a href="http://www.amazon.com/gp/product/1594202532/ref=as_li_ss_tl?ie=UTF8&amp;tag=marciovm-20&amp;linkCode=as2&amp;camp=1789&amp;creative=390957&amp;creativeASIN=1594202532">deftly explaining</a> how this has fundamentally changed the media equation,
with such unexpected consequences as YouTube videos that <a href="http://www.youtube.com/watch?v=_OBlgSz8sSM&amp;feature=player_embedded">get more views</a>
than Super Bowl commercials.</p>

<p>Individuals still have a limited capacity for consuming and evaluating
content, so prioritization and authentication remain necessary, but look
different.  These functions are now disconnected from publication.  Google
prioritizes web pages by analyzing utility <em>after</em> publication, by tracking
citations in the form of inbound links.  Similarly, anyone can publish a gem
to GitHub, and published gems are prioritized by the numbers of developers
&#8220;watching&#8221; for updates or &#8220;<a href="http://en.wikipedia.org/wiki/Fork_(software_development)">forking</a>&#8221; new development lines.  This is the
social web at work, where the audience gets to decide what and whom to pay
attention to all by itself, without requiring assistance from all-powerful
editorial committees.  One can complain that lowering barriers to publication
leads to content that on average is of lower quality.  But the abundance of
non-significant projects in GitHub does not detract from its usability,
because those projects are never brought to anyone&#8217;s attention [<a href="http://marciovm.com/i-want-a-github-of-science#f4n">4</a>].</p>

<p>Prestige is really about having an engaged audience that follows and
recognizes your activities.  This formerly required publication through
established venues, but that&#8217;s no longer needed since your audience can use
the social web to recognize and engage with you directly.</p>

<p><strong>The Market for Prestige</strong></p>

<p>Gems on GitHub are not just code.  They also have authors whose relative
contributions are automatically catalogued by Git, as shown in this <a href="https://github.com/jquery/jquery/graphs/impact">impact graph</a> for the popular and open source jQuery project.  If you&#8217;ve visited
a web application recently, chances are you&#8217;ve benefitted from jQuery, which
makes it easy for a web engineer to turn static web sites into responsive web
applications (think interactions with buttons instead of navigation through
links).  This impact graph can let you know precisely which developers are
responsible for this awesomeness. In this way, GitHub acts as an efficient,
incorruptible &#8220;central bank&#8221; of the prestige supply. Furthermore, unlike in
Google, great contributions in GitHub bring prestige to their creators, not
their domain names.  If you wanted to hire a contractor to work on a web
application, GitHub can let you know who has publicly demonstrated the skills
you&#8217;d need.  It&#8217;s thus not surprising that <a href="http://blog.stackoverflow.com/2011/03/careers-2-0-now-does-github/">GitHub profiles are supplanting traditional resume items</a>, such as a CS degree, for discerning employers
looking to hire top talent.</p>

<p>By contrast, current Open Science <a href="http://openwetware.org/wiki/Main_Page">efforts</a> that ask scientists to &#8220;<a href="http://figshare.com/">share all your data</a>&#8221; have not become mainstream, because they do not
appropriately reward knowledge producers.  They are all free-distribution and
no prestige, solving a different half of the problem than traditional journals
but not the whole enchilada.  Put another way, when anything can be published,
there is no prestige associated with being published, so prestige must be
introduced in other ways.  Evangelists for Open Science should focus on
promoting new, post-publication prestige metrics that will properly
incentivize scientists to focus on the utility of their work, freeing them to
worry less about publishing in the right journals.</p>

<p>The biomedical world is increasingly permeated by code and data [<a href="http://marciovm.com/i-want-a-github-of-science#f5n">5</a>],
which should be very amenable to GitHub-style metrics since they are by nature
tied to networked computers.  Scientists in fields like genomics and
biomedical informatics are being held to the same publication expectations as
their peers, but this makes little sense.  An article describing a genomic
database is nowhere near as useful as an open API for accessing it.  We need
trusted ways to quantify just how useful that API and associated code are to
the scientific community, which can be listed on a scientist&#8217;s profile and
utilized by committees making hiring and funding decisions [<a href="http://marciovm.com/i-want-a-github-of-science#f6n">6</a>].</p>

<p><strong>Challenges and Current Efforts</strong></p>

<p>Of course, there are fundamental differences between publishing software code
and publishing science.  Copying code results in an exact replica and does not
affect the original.  By contrast, duplicating a research finding may require
significant expenses just to recreate experimental conditions.  Code is
structured by the strict syntax of programming languages, while most
scientific research is not.  For this reason and others, academic articles and
journals are not going to disappear, but they should not be the only way for a
scientist to accumulate prestige.</p>

<p>Unfortunately, energy that could be spent developing these new solutions is
instead tied up with the older struggle of open-access.  Universities still
pay outrageous sums to journal publishers to allow them access to the
knowledge they just produced, reviewed, and edited on their own dime
[<a href="http://marciovm.com/i-want-a-github-of-science#f7n">7</a>].  Broadly speaking, traditional journals are being reduced to
rent-takers on brand names with reputational inertia.  <a href="http://www.arxiv.org/">arXiv</a>, which provides
open access to pre-prints in many quantitative disciplines, is a notable and
<a href="http://www.sciencemag.org/content/259/5099/1246">long-running</a> example of the scientific community&#8217;s workaround to this
problem.  The arXiv is amazing, but why remain dependent on a system it could
be replacing [<a href="http://marciovm.com/i-want-a-github-of-science#f8n">8</a>]?</p>

<p><a href="http://www.plos.org/about/principles.php">PLoS</a> is at the cutting edge of both open-access and <a href="http://www.slideshare.net/dduin/rethinking-the-functions-of-a-journal-some-case-studies-from-plos-by-mark-patterson">rethinking the functions of a journal</a>.  <a href="http://www.plosone.org/">PLoS ONE</a> comes closest to what I am
describing, in that their peer-review process screens only for scientific
rigour, not perceived impact, meaning they will publish content considered
unsexy and let future citations determine importance.  But they have not yet
embraced the social web, as the lack of scientist profiles (with associated
prestige metrics) in their website demonstrates.  In programmer jargon, PLoS
ONE needs to become a web application, not a website that hosts content.  One
problem might be that they still consider themselves a journal first, <a href="http://www.plosone.org/static/edboard.action">and journals have editorial boards</a>, while the <a href="http://www.poynter.org/latest-news/top-stories/110111/why-the-new-york-times-eliminated-its-social-media-editor-position/">social web is all about <em>not</em> having editors</a>.  There is no editorial board at GitHub.</p>

<p>When I discuss this with current faculty, a typical reaction is that I&#8217;m
pining for a social network of scientists.  That seems reasonable, and it is
<a href="http://www.academia.edu/">being tried</a>, but may not be bold enough.  GitHub did not succeed by
being a social network of programmers.  It succeeded by being a social network
of code.  We need a social network of science, meaning scientific bundles of
knowledge must be structured and accessible by API, with the connections among
those bundles and appropriate utility metrics being what connects and
prioritizes scientists.</p>

<p>APIs for science already exist, and some are incredibly useful, but they have
ignored the implications for authorship and prestige, which has kept them from
achieving their potential.  For example, biophysicists have the <a href="http://www.pdb.org/pdb/home/home.do">RCSB Protein Data Bank</a>, which stores experimentally determined protein structures.
This database is a tremendous asset to the field, but it could represent much
more, as a story from my younger days illustrates.  In 2004, as an undergrad,
I spent a summer writing Python code to download and analyze all existing RCSB
structures.  That program built a database of &#8220;real&#8221; structures to train a
scoring algorithm, which subsequently scored computationally generated
structures to see how &#8220;real&#8221; they seemed [<a href="http://marciovm.com/i-want-a-github-of-science#f9n">9</a>].  Unfortunately, my results
were not compelling enough to be published in a prestigious academic journal,
and therefore not interesting to my research adviser.  Open-sourcing and
publishing that code might have saved someone&#8217;s time, spurred new thinking, or
at the very least marked a tangible reward for my work.  But the incentives to
my adviser weren&#8217;t there, so he did not suggest it.  That idea did not even
occur to me, because I was not a good enough programmer to know about
<a href="http://sourceforge.net/">SourceForge</a>, a less-social precursor to GitHub, so the code went
nowhere.</p>

<p><strong>Hey Mr. Gates</strong></p>

<p>It may be that the activation energy required to initiate changes won&#8217;t arise
within the system.  In that case, an outside push might do the job, and the
best place for this push to come from may be a nimble funding agency.  For
example, a request for proposals could specify that phase II funding decisions
would be based on the impact of online resources developed in phase I, as
measured by specific metrics developed with community feedback.  Nothing makes
a scientist contemplate change faster than a new source of grant money, and
the only thing better than a faculty applicant with a paper in Science may be
one bringing in a multimillion-dollar grant.</p>

<p><strong>Further reading:</strong></p>

<p><a href="http://www.sciencedirect.com/science?_ob=MImg&amp;_imagekey=B758F-4R9GGVW-1-7&amp;_cdi=12925&amp;_user=4429&amp;_pii=S1570826807000583&amp;_origin=gateway&amp;_coverDate=02%2F29%2F2008&amp;_sk=999939998&amp;view=c&amp;wchp=dGLzVzb-zSkzk&amp;md5=84599c163a5b359195c25f6fd5449c2b&amp;ie=/sdarticle.pdf">Collective knowledge systems: Where the Social Web meets the Semantic Web
</a> (Tom Gruber)</p>

<p><a href="http://cshe.berkeley.edu/publications/publications.php?id=379">Peer Review in Academic Promotion and Publishing: Its meaning, locus and
future</a> (Diane Harley and Sophia Krzys Acord)</p>

<p><a href="http://www.cs.cornell.edu/home/kleinber/stoc11-credit.pdf">Mechanisms for (Mis)Allocating Scientific Credit.</a> (Jon Kleinberg and
Sigal Oren)</p>

<p><a href="http://friendfeed.com/the-life-scientists">The Life Scientists room on Friendfeed</a></p>

<p><strong>Notes</strong></p>

<p><a name="f1n">1</a> &#8220;The combination of the Internet and open source transformed the
functionality in modern programming tools, increasing developer productivity
10 fold&#8221; - <a href="http://voices.allthingsd.com/20110323/bubble-trouble-i-dont-think-so/">Ben Horowitz</a>, formerly of Netscape.</p>

<p><a name="f2n">2</a> &#8220;Native&#8221; in the sense <a href="http://www.usv.com/2009/06/the-mobile-chal.php">eloquently explained by USV</a> (the VCs who funded
Twitter): &#8220;Native opportunities are the ones that make use of unique
capabilities of [new] platforms&#8221;.  The social web is the new platform.</p>

<p><a name="f3n">3</a> For example, Twitter, Groupon, and GitHub itself run on Ruby on Rails.</p>

<p><a name="f4n">4</a> I speculate that many gems are also discovered through technical blogs
(found through Google) or the programmer Q&amp;A site <a href="http://stackoverflow.com/">StackOverflow</a>.</p>

<p><a name="f5n">5</a> Biomedical research is also huge - funding has been <a href="http://jama.ama-assn.org/content/303/2/137.full">squeezed lately</a>,
but is still on the order of $100B annually.  Therefore the potential market
is large enough to be worthwhile to build for.</p>

<p><a name="f6n">6</a> The unique requirements of the scientific community probably mean GitHub
itself can&#8217;t do the job.</p>

<p><a name="f7n">7</a> Or more accurately, on the federal government or philanthropic organizations
that fund them. Journals do not compensate their editors or peer reviewers.</p>

<p><a name="f8n">8</a> The peer-review process <a href="http://www.news.cornell.edu/stories/Sept06/library_arXiv.html">has been hacked via arXiv before</a>, by Grigori
Perelman.  But to appreciate how unusual Grigori&#8217;s motivations are, consider
that he declined both the Fields Medal and the $1M Clay Millennium Prize.</p>

<p><a name="f9n">9</a> Computing protein structure from amino acid sequence is known as &#8220;the
protein folding problem&#8221; and is one of the <a href="http://mail.cmu.edu.tw/~ythuang/the_holy_grail.htm">holy grails of science</a>.</p>

<p><strong>Thanks to</strong> <a href="http://seanahrens.net">Sean Ahrens</a>, Sean Carroll, Manuel Cebrian, Wendy Chapman, Lawrence David, Lucila Ohno-Machado, Carlos von Muhlen, Denise von Muhlen, and Ryan Weald for reading drafts and helpful discussions.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[hackers-wanted-1000-job-posts-to-course-vi-at]]></title>
    <link href="http://marciovm.com/blog/2011/02/21/hackers-wanted-1000-job-posts-to-course-vi-at/"/>
    <updated>2011-02-21T17:58:00-08:00</updated>
    <id>http://marciovm.com/blog/2011/02/21/hackers-wanted-1000-job-posts-to-course-vi-at</id>
    <content type="html"><![CDATA[<p><strong>Tried to hire a hacker lately?</strong></p>

<p>Hacker talent is highly non-commoditized, and the ROI at the top end of the
market more closely resembles that of professional sports leagues than that of traditional
engineering careers. [<a href="http://marciovm.com/hackers-wanted-1000-job-posts-to-course-vi-at#f1n">1,2</a>] I have a theory that <a href="http://paulgraham.com/">Paul Graham</a>
is the <a href="http://en.wikipedia.org/wiki/Scott_Boras">Scott Boras</a> of this marketplace, which I&#8217;ll write about in the
future.  The ability to scout and hire great hackers before other market
participants (GOOG = Yankees?) can be extremely valuable.</p>

<p>The mailing list for Course VI at MIT is an interesting place to see what
people who are (presumably) looking to hire great hackers are saying in
&#8220;hackers wanted&#8221; posts. [<a href="http://marciovm.com/hackers-wanted-1000-job-posts-to-course-vi-at#f3n">3</a>] Many years ago, my friend Fergus at
<a href="http://picbounce.com/">PicBounce</a> tipped me off that it&#8217;s a great resource to capture trends and
vocabulary in the tech entrepreneurship world. [<a href="http://marciovm.com/hackers-wanted-1000-job-posts-to-course-vi-at#f4n">4</a>] It&#8217;s also good for a
few laughs. [<a href="http://marciovm.com/hackers-wanted-1000-job-posts-to-course-vi-at#f5n">5</a>]  To brush up on my Python skills, I performed basic word
count analyses on last semester&#8217;s posts; I figured the results might be
interesting to others so I&#8217;m sharing them here.</p>

<p>Updated 01-18-11: The administrator of this list contacted me and asked me to
clarify details of who maintains the list.  It&#8217;s not the EECS Department&#8217;s
list, but rather a personal list maintained by Anne Hunter, who can be reached
at anneh@eecs at MIT&#8217;s domain name.  Thanks Anne for keeping this very useful
list going, and my apologies for not crediting you when I first posted this.</p>

<p><img src="http://marciovm.com/images/hacker_posts_word_count.png"></p>

<p><strong>Methods</strong></p>

<p>I extracted 944 messages sent to the announcement list from July to December
2010 to a text file using Automator on Snow Leopard.  Using Python, I parsed
messages into message objects containing date, subject, and body.  I ignored
date and subject and looked only at body text.  I then stripped punctuation
(except # and +), split() to tokenize strings, and built a wordDict with
key/value pairs of {word: wordCount}.  To filter non-job posts like course
announcements, I ignored messages that contained any of a list of words
usually used by the department. [<a href="http://marciovm.com/hackers-wanted-1000-job-posts-to-course-vi-at#f6n">6</a>]  I then merged all the individual
wordDicts into a globalDict.  I dropped words commonly used in English.
[<a href="http://marciovm.com/hackers-wanted-1000-job-posts-to-course-vi-at#f7n">7</a>]  I did something similar for two-word phrases, to capture terms like
&#8220;social networking&#8221; (data used for figures B and C).  I&#8217;m reporting average
word count in the globalDict &#8211; total word count divided by number of posts.
The number of posts analyzed was 905 (thus 39 were scored as being department
announcements).</p>
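<p>The counting steps above can be sketched in a few lines of Python. This is a minimal sketch, not the original script, and it rests on several assumptions:</p>
<ul>
<li><p>The message bodies have already been parsed into strings.</p></li>
<li><p>Everything is lowercased. The post does not say whether case was folded.</p></li>
<li><p><code>collections.Counter</code> stands in for the plain wordDict and globalDict.</p></li>
<li><p>The stop list is only a short excerpt of the one in note 7.</p></li>
</ul>

```python
import string
from collections import Counter

# Strip everything in string.punctuation except "#" and "+",
# so that tokens such as "c#" and "c++" survive.
STRIP_CHARS = string.punctuation.replace("#", "").replace("+", "")
STRIP_TABLE = str.maketrans("", "", STRIP_CHARS)

# A short excerpt of the common-English stop list (the full list is in note 7).
STOP_WORDS = {"a", "an", "and", "the", "to", "we", "with", "you", "your"}


def word_counts(body):
    """Return a Counter of {word: count} for one message body."""
    cleaned = body.lower().translate(STRIP_TABLE)
    return Counter(cleaned.split())


def average_word_counts(bodies):
    """Merge per-message counts, drop stop words, divide by number of posts."""
    global_counts = Counter()
    for body in bodies:
        global_counts.update(word_counts(body))
    for word in STOP_WORDS:
        global_counts.pop(word, None)
    n_posts = len(bodies)
    return {word: count / n_posts for word, count in global_counts.items()}


# Invented sample messages, for illustration only.
bodies = [
    "We want a C++ and Python ninja!",
    "Startup seeks web and mobile engineers with Python experience.",
]
averages = average_word_counts(bodies)
print(sorted(averages.items(), key=lambda kv: kv[1], reverse=True)[:3])
```

<p>The result is an average count per post because the merged totals are divided by the number of posts (905 in the analysis above), not by the total number of words.</p>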

<p><strong>Results</strong></p>

<p>The most common word was &#8220;experience&#8221;.  Lots of generic technology words show
up in the top-50 (figure A), as expected.  I figured a more interesting plot
might be the occurrence of words that I thought <em>a priori</em> were interesting,
which I call &#8220;programmer vocabulary&#8221; (figure B).  This is subject to my own
biases of what I think is important, so sorry if I didn&#8217;t include your
favorite.  The winner there was &#8220;web&#8221;, followed closely by &#8220;mobile&#8221;.  The
prevalence of buzzwords like &#8220;rockstar&#8221;, &#8220;ninja&#8221; and &#8220;guru&#8221; was smaller than I
expected.  Finally I looked at locations (figure C).  The sum of the SF Bay
Area terms was 0.067, which I didn&#8217;t place in the figure because it would have
been the only multi-term aggregate.</p>
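<p>A multi-term aggregate such as the SF Bay Area figure needs two-word phrases as well as single words. The sketch below counts both and sums a group of terms. The list of Bay Area terms is hypothetical, because the post does not say which terms went into its total, and the sample messages are invented.</p>

```python
import string
from collections import Counter

# Same punctuation rule as above: keep "#" and "+".
STRIP_TABLE = str.maketrans("", "", string.punctuation.replace("#", "").replace("+", ""))

# Hypothetical grouping: the post does not list its "SF Bay Area" terms.
BAY_AREA_TERMS = ["san francisco", "palo alto", "mountain view", "bay area"]


def phrase_counts(body):
    """Count single words and adjacent two-word phrases in one message body."""
    words = body.lower().translate(STRIP_TABLE).split()
    bigrams = [" ".join(pair) for pair in zip(words, words[1:])]
    return Counter(words + bigrams)


def group_average(bodies, terms):
    """Return the average per-post count of each term and the sum over the group."""
    totals = Counter()
    for body in bodies:
        totals.update(phrase_counts(body))
    per_term = {term: totals[term] / len(bodies) for term in terms}
    return per_term, sum(per_term.values())


bodies = [
    "Mobile startup in Palo Alto hiring.",
    "Join our team in San Francisco or Mountain View.",
    "Remote web developer wanted.",
    "Boston fintech seeks hackers.",
]
per_term, bay_area_total = group_average(bodies, BAY_AREA_TERMS)
print(per_term)
print("Bay Area aggregate:", bay_area_total)
```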

<p>Any thoughts on how I can improve this analysis?  I&#8217;ll do this again in 6
months if enough people find this interesting (email me and I&#8217;ll send you the
update when it&#8217;s ready).</p>

<p>Disclaimers:  This is a relatively small sample size subject to outlier
effects (e.g., a single message that contains &#8220;mobile&#8221; twenty times).  I used
an arbitrary exclusion list to filter department announcements and it might be
incomplete (i.e. these results may include department announcements that were
not job posts).</p>
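<p>One way to blunt the outlier effect described above is to switch from average count per post to document frequency. Document frequency is the fraction of posts that mention a word at least once, so no single message can contribute more than one. This is not the metric used in this analysis. Here is a minimal sketch on made-up messages:</p>

```python
import string

# Same punctuation rule as above: keep "#" and "+".
STRIP_TABLE = str.maketrans("", "", string.punctuation.replace("#", "").replace("+", ""))


def tokens(body):
    """Lowercase, strip punctuation and split a message body into words."""
    return body.lower().translate(STRIP_TABLE).split()


def average_count(bodies, word):
    """Total occurrences divided by number of posts (the metric used above)."""
    return sum(tokens(b).count(word) for b in bodies) / len(bodies)


def document_frequency(bodies, word):
    """Fraction of posts that mention the word at least once."""
    return sum(1 for b in bodies if word in set(tokens(b))) / len(bodies)


# Invented messages: the first one repeats "mobile" twenty times.
bodies = ["mobile " * 20, "web app", "web and mobile", "backend role"]
print(average_count(bodies, "mobile"), document_frequency(bodies, "mobile"))
print(average_count(bodies, "web"), document_frequency(bodies, "web"))
```

<p>With these four messages, the average-count metric rates &#8220;mobile&#8221; at 5.25 per post and &#8220;web&#8221; at 0.5. Document frequency rates both at 0.5.</p>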

<p><strong>Notes</strong></p>

<p><a name="f1n">1</a> Hacker as in &#8220;person who builds things with computer code&#8221;, as opposed to
the more common definition of &#8220;16 year old who does bad things with
computers&#8221;.</p>

<p><a name="f2n">2</a> Just look at the size of signing bonuses.  Paul Buchheit&#8217;s was up there with
LeBron James&#8217;s when he was <a href="http://www.readwriteweb.com/archives/facebook_just_bought_friendfeed.php">acqui-hired</a> by Facebook via FriendFeed (there
were additional engineers in this trade deal).</p>

<p><a name="f3n">3</a> Course VI is MIT-speak for Department of Electrical Engineering and Computer
Science.</p>

<p><a name="f4n">4</a> A great complement to <a href="http://news.ycombinator.com/">Hacker News</a>.</p>

<p><a name="f5n">5</a> A real email from Fall 2010, identities redacted to protect the guilty:</p>

<blockquote><p>Subject: Coder Needed for Social Network Website</p>

<p>Greetings,</p>

<p>My name is [<em>redacted</em>] and I am looking for a very experienced coder. My team
and I are looking to build a social networking site, very similar to facebook,
that has extreme potential. We plan on 95 percent of all college students
becoming active users within 4-8 months from the launch date. My partners
[<em>redacted</em>], [<em>redacted</em>], and I are marketing and promotions specialists
with a great outreach to the college market. We are searching for an
unbelievably talented coder, who is ready and willing to PARTNER UP on this
college based website project to make history. We are very serious about our
business venture so we are looking for someone who really feels confident that
they are capable of handling a project like this.</p>

<p>Please email all your contact info as well as a resume/portfolio with sample
projects.</p>

<p>I look forward to hearing from anyone who&#8217;s up for this challenge. Lets Make
History.</p>

<p>Best,</p>

<p>[<em>redacted</em>]</p>

<p>[<em>redacted</em>]@gmail.com</p></blockquote>

<p><a name="f6n">6</a> Words that appear in department posts: &#8216;ta&#8217;, &#8216;lecture&#8217;, &#8216;course
description&#8217;, &#8216;grad school&#8217;, &#8216;correction&#8217;, &#8216;announcement&#8217;, &#8216;websis&#8217;.</p>
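<p>The post does not say how this exclusion list was matched. A naive substring test would be risky, because &#8216;ta&#8217; also occurs inside common words such as &#8220;data&#8221; and &#8220;startup&#8221;. The sketch below therefore assumes whole-word matching with Python&#8217;s <code>re</code> module. One side effect of that assumption is that &#8216;announcement&#8217; no longer matches the plural &#8220;announcements&#8221;. The sample messages are invented.</p>

```python
import re

# Exclusion list from note 6.
DEPARTMENT_TERMS = ["ta", "lecture", "course description", "grad school",
                    "correction", "announcement", "websis"]

# Whole-word patterns (an assumption), so "ta" does not match inside "data".
DEPARTMENT_PATTERNS = [
    re.compile(r"\b" + re.escape(term) + r"\b") for term in DEPARTMENT_TERMS
]


def is_department_post(body):
    """True if the lowercased body contains any department term as whole words."""
    text = body.lower()
    return any(pattern.search(text) for pattern in DEPARTMENT_PATTERNS)


def job_posts(bodies):
    """Keep only the messages that match none of the department terms."""
    return [b for b in bodies if not is_department_post(b)]


bodies = [
    "Data startup hiring Python hackers",
    "Correction: the 6.01 lecture moves to 10-250",
    "TA positions open for spring",
]
print(job_posts(bodies))
```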

<p><a name="f7n">7</a> Common words in English: &#8216;a&#8217;, &#8216;able&#8217;, &#8216;about&#8217;, &#8216;across&#8217;, &#8216;after&#8217;, &#8216;all&#8217;,
&#8216;almost&#8217;, &#8216;also&#8217;, &#8216;am&#8217;, &#8216;among&#8217;, &#8216;an&#8217;, &#8216;and&#8217;, &#8216;any&#8217;, &#8216;are&#8217;, &#8216;as&#8217;, &#8216;at&#8217;, &#8216;be&#8217;,
&#8216;because&#8217;, &#8216;been&#8217;, &#8216;but&#8217;, &#8216;by&#8217;, &#8216;can&#8217;, &#8216;cannot&#8217;, &#8216;could&#8217;, &#8216;dear&#8217;, &#8216;did&#8217;, &#8216;do&#8217;,
&#8216;does&#8217;, &#8216;either&#8217;, &#8216;else&#8217;, &#8216;ever&#8217;, &#8216;every&#8217;, &#8216;for&#8217;, &#8216;from&#8217;, &#8216;get&#8217;, &#8216;got&#8217;, &#8216;had&#8217;,
&#8216;has&#8217;, &#8216;have&#8217;, &#8216;he&#8217;, &#8216;her&#8217;, &#8216;hers&#8217;, &#8216;him&#8217;, &#8216;his&#8217;, &#8216;how&#8217;, &#8216;however&#8217;, &#8216;i&#8217;, &#8216;if&#8217;,
&#8216;in&#8217;, &#8216;into&#8217;, &#8216;is&#8217;, &#8216;it&#8217;, &#8216;its&#8217;, &#8216;just&#8217;, &#8216;least&#8217;, &#8216;let&#8217;, &#8216;like&#8217;, &#8216;likely&#8217;,
&#8216;may&#8217;, &#8216;me&#8217;, &#8216;might&#8217;, &#8216;most&#8217;, &#8216;must&#8217;, &#8216;my&#8217;, &#8216;neither&#8217;, &#8216;no&#8217;, &#8216;nor&#8217;, &#8216;not&#8217;,
&#8216;of&#8217;, &#8216;off&#8217;, &#8216;often&#8217;, &#8216;on&#8217;, &#8216;only&#8217;, &#8216;or&#8217;, &#8216;other&#8217;, &#8216;our&#8217;, &#8216;own&#8217;, &#8216;rather&#8217;,
&#8216;said&#8217;, &#8216;say&#8217;, &#8216;says&#8217;, &#8216;she&#8217;, &#8216;should&#8217;, &#8216;since&#8217;, &#8216;so&#8217;, &#8216;some&#8217;, &#8216;than&#8217;, &#8216;that&#8217;,
&#8216;the&#8217;, &#8216;their&#8217;, &#8216;them&#8217;, &#8216;then&#8217;, &#8216;there&#8217;, &#8216;these&#8217;, &#8216;they&#8217;, &#8216;this&#8217;, &#8216;tis&#8217;, &#8216;to&#8217;,
&#8216;too&#8217;, &#8216;twas&#8217;, &#8216;us&#8217;, &#8216;wants&#8217;, &#8216;was&#8217;, &#8216;we&#8217;, &#8216;were&#8217;, &#8216;what&#8217;, &#8216;when&#8217;, &#8216;where&#8217;,
&#8216;which&#8217;, &#8216;while&#8217;, &#8216;who&#8217;, &#8216;whom&#8217;, &#8216;why&#8217;, &#8216;will&#8217;, &#8216;with&#8217;, &#8216;would&#8217;, &#8216;yet&#8217;, &#8216;you&#8217;,
&#8216;your&#8217;.</p>
]]></content>
  </entry>
  
</feed>
