<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Commercial Intelligence</title>
	<atom:link href="http://haleyai.com/wordpress/feed/" rel="self" type="application/rss+xml" />
	<link>http://haleyai.com/wordpress</link>
	<description>systems that know and understand and think and learn</description>
	<lastBuildDate>Sun, 23 Apr 2023 12:36:37 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=5.2.21</generator>
	<item>
		<title>GPT under $100,000?</title>
		<link>http://haleyai.com/wordpress/2023/04/23/gpt-under-100000/</link>
				<pubDate>Sun, 23 Apr 2023 12:27:00 +0000</pubDate>
		<dc:creator><![CDATA[paul@haleyAI.com]]></dc:creator>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Chat-GPT]]></category>
		<category><![CDATA[GPT-4]]></category>
		<category><![CDATA[Instruct-GPT]]></category>
		<category><![CDATA[language model]]></category>
		<category><![CDATA[LLaMA]]></category>
		<category><![CDATA[pre-training]]></category>
		<category><![CDATA[Stability AI]]></category>
		<category><![CDATA[Vicuna]]></category>

		<guid isPermaLink="false">http://haleyai.com/wordpress/?p=1662</guid>
				<description><![CDATA[For the last several years, we’ve been hearing about how much it costs to build ever larger language models.&#160; Today, a state-of-the-art language model requires approaching a trillion trillion (10^24) arithmetic operations involving hundreds of billions of parameters.&#160; Doing the math, assuming a decent, if older GPU, such as an A100, you come up with how &#8230; <p class="link-more"><a href="http://haleyai.com/wordpress/2023/04/23/gpt-under-100000/" class="more-link">Continue reading<span class="screen-reader-text"> "GPT under $100,000?"</span></a></p>]]></description>
								<content:encoded><![CDATA[
<p>For the last several years, we’ve been hearing about how much
it costs to build ever larger language models.&nbsp;
Today, a state-of-the-art language model requires approaching a
trillion trillion (10<sup>24</sup>) arithmetic operations involving
hundreds of billions of parameters.&nbsp;
Doing the math, assuming a decent, if older GPU, such as an A100, you
come up with how many years this computation will take.&nbsp; Then you figure out how many GPUs you need
given how many days you have to complete the computation.&nbsp; For example, Meta recently published that
training a 65 billion parameter version of the <a href="https://arxiv.org/abs/2302.13971">LLaMA</a> model using over a trillion
tokens of text took approximately 21 days on roughly 2,000 such GPUs.&nbsp; That’s almost exactly 1 million hours of GPU
time, which can be had for less than $1,000,000.</p>
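<p>To make that arithmetic concrete, here is a minimal Python sketch.&nbsp; The GPU count of 2,048 is the figure reported in the LLaMA paper; the hourly price is a hypothetical placeholder of mine, not a quote from any provider.</p>

```python
# Figures for the 65B LLaMA run, as cited in the text above.
GPUS = 2048
DAYS = 21
HOURS_PER_DAY = 24

# Hypothetical rental price per GPU-hour; an assumption for illustration only.
ASSUMED_USD_PER_GPU_HOUR = 0.95


def gpu_hours(gpus, days):
    """Total GPU-hours consumed by `gpus` devices running for `days` days."""
    return gpus * days * HOURS_PER_DAY


def training_cost_usd(hours, usd_per_gpu_hour):
    """Rental cost of `hours` GPU-hours at a flat hourly price."""
    return hours * usd_per_gpu_hour


total_hours = gpu_hours(GPUS, DAYS)
print(f"{total_hours:,} GPU-hours")
print(f"${training_cost_usd(total_hours, ASSUMED_USD_PER_GPU_HOUR):,.0f} at the assumed price")
```

<p>Swap in whatever hourly rate you can actually negotiate; the GPU-hour total is the part that follows directly from the published figures.</p>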



<p>So, for $1 million, given a few decent machine learning
folks, you could replicate a state-of-the-art language model or build your own
tweaked to perform better in your domain, as Bloomberg has done for
financial markets.&nbsp; Expect to see much
more of this from many corners, especially in healthcare, various areas of the
life sciences, and others.</p>



<p>I would like to save the $1 million and start with the 30 or 65 billion parameter LLaMA model rather than train it from scratch.  Unfortunately, Meta is not forthcoming with model weights for LLaMA beyond 13 billion parameters.  The 13 billion parameter model is impressive enough.  The 7B model is not capable enough for me.  The 65 billion parameter model would be better, but not twice as good.  The 30B parameter model is in the sweet spot.</p>



<p>Note that if you&#8217;re fine with a smaller pre-trained language model, you could try Stability AI&#8217;s StableLM.  These are the folks who brought you Stable Diffusion.  They promise to eventually release, for any use including commercial, language models up to 65B.  When available, the 15B model may be a good option.  For now, I&#8217;d like to stick with LLaMA because of some of its significant algorithmic improvements.</p>



<p>Although available, even the 13 billion parameter model is not openly licensed.  As is frustratingly common, the model weights are licensed only for research, not for commercial purposes.  Meta invites commercial inquiry but, regrettably, based on my experience, is not eager to respond.  So, if you want to use LLaMA commercially, you may have to train your own model.  So, let’s talk budgets.</p>



<p>Training a 13 billion parameter model will cost about 20% of
training a 65-billion parameter version.&nbsp;
That will cost less than $200,000.&nbsp;
You might be able to cut that cost by half, maybe more, however.&nbsp; It’s a little dicey, but you can cut back on
the training data.&nbsp; DeepMind’s excellent
results from Chinchilla teach us to balance model size with the amount of training data.&nbsp; OK, but the truth is that you can get over
90% of a language model’s final performance with less than half a trillion
tokens of training data.&nbsp; </p>
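<p>A rough way to play with these budgets is sketched below.&nbsp; It assumes cost scales linearly with both parameter count and the fraction of training tokens used, and it takes the rounded $1 million figure for the 65B model as its baseline.&nbsp; Both the linear scaling and the function name are my simplifications, not a pricing model.</p>

```python
# Rounded baseline from the text: about $1 million for the 65B model.
BASELINE_COST_USD = 1_000_000
BASELINE_PARAMS = 65e9


def scaled_cost_usd(params, token_fraction=1.0):
    """Estimate cost assuming it is linear in parameters and in tokens used.

    `token_fraction` is the share of the baseline training data actually used;
    the linear relationship is an assumption made for illustration.
    """
    return BASELINE_COST_USD * (params / BASELINE_PARAMS) * token_fraction


full_data = scaled_cost_usd(13e9)
half_data = scaled_cost_usd(13e9, token_fraction=0.5)
print(f"13B, full data: ${full_data:,.0f}")
print(f"13B, half data: ${half_data:,.0f}")
```

<p>Because the baseline itself is “less than” $1 million, these estimates are upper bounds under the stated assumptions.</p>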



<p>If you can afford it, you can avoid cutting back on the training data by taking your time.  Your language model will be pretty good in 30% of the time Meta took and you can just let it improve over time.  That is, start using the model and keep training it, replacing the one you’re using every once in a while.  This is viable even if you perform fine-tuning (and even reinforcement learning) because the relative costs for such tuning are quite small versus pre-training.  </p>



<figure class="wp-block-image"><img src="http://haleyai.com/wordpress/wp-content/uploads/GalacticaLossChart-1024x595.png" alt="" class="wp-image-1666" srcset="http://haleyai.com/wordpress/wp-content/uploads/GalacticaLossChart-1024x595.png 1024w, http://haleyai.com/wordpress/wp-content/uploads/GalacticaLossChart-300x174.png 300w, http://haleyai.com/wordpress/wp-content/uploads/GalacticaLossChart-768x447.png 768w, http://haleyai.com/wordpress/wp-content/uploads/GalacticaLossChart.png 1216w" sizes="(max-width: 767px) 89vw, (max-width: 1000px) 54vw, (max-width: 1071px) 543px, 580px" /></figure>



<p>The bottom line here is that you can build your own 13
billion parameter LLaMA for less than $100,000.&nbsp;
If you’re going to do millions of transactions, you might not be able to
afford not to go in this direction!</p>



<p>LLaMA is essentially an improved version of Open AI’s
GPT.&nbsp; LLaMA benefits from various
algorithmic improvements since GPT-3 was released a couple of years ago.&nbsp; More recently, Open AI introduced InstructGPT,
which follows instructions, and ChatGPT, which holds conversations.&nbsp; And GPT has advanced to version 4.</p>



<p>LLaMA is GPT without the instruction following or
conversational abilities.&nbsp; These are
easy, and inexpensive, to add, however.&nbsp;
Consider instruction following, for example.&nbsp; Researchers from Stanford generated tens of
thousands of simple instructions and results using Open AI’s text-davinci-003 and trained the 7
billion parameter version of LLaMA with them.&nbsp;
The dataset is relatively simple, and I thought weak, but it was remarkably
effective.&nbsp; I was quite surprised at how
well it follows instructions given only that simple, synthetic dataset. </p>
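<p>To give a feel for how simple such data is, here is a minimal Python sketch that turns one instruction record into a training string.&nbsp; The template is modeled on the no-input prompt published with Alpaca, but treat its exact wording as an assumption; the example record and the function name are hypothetical.</p>

```python
# Prompt template modeled on the one released with Alpaca (wording assumed).
ALPACA_STYLE_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def build_training_text(example):
    """Join the formatted prompt and the target output into one sequence."""
    prompt = ALPACA_STYLE_TEMPLATE.format(instruction=example["instruction"])
    return prompt + example["output"]


# Hypothetical record; a real dataset holds tens of thousands of these.
example = {
    "instruction": "Name three primary colors.",
    "output": "Red, yellow, and blue.",
}

training_text = build_training_text(example)
print(training_text)
```

<p>Fine-tuning then amounts to continuing ordinary next-token training on strings like this one.</p>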



<p>On the other hand, it’s not all that surprising, given that
we have seen much transfer of learned representations in vision.&nbsp; The ease of improvement here is simply
because any decent generative language model will quickly adapt
its representations to new linguistic sequences, such as those involving
instructions.&nbsp; It doesn’t have to
construct much new representation to do so.</p>



<p>The resulting language model is dubbed Alpaca, a cute play on
words.&nbsp; Well, now there is <a href="https://vicuna.lmsys.org/">Vicuna</a>!&nbsp;
According to its researchers from CMU, Stanford, and the University of California at Berkeley and San Diego,
Vicuna takes the 13 billion parameter LLaMA ever closer
to Open AI’s state-of-the-art performance. </p>



<p>Look them up.&nbsp; It’s
stunning how easily they compete with Google and approach Open AI.&nbsp; And the training cost to improve LLaMA to “within
10%” of GPT was less than $1,000.&nbsp; More
and more is happening on this front.&nbsp; For
more, see Microsoft’s <a href="https://github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-chat">DeepSpeed-Chat</a>
(which may seem odd given their investments in OpenAI, the company).</p>
]]></content:encoded>
										</item>
		<item>
		<title>Imposing Our Constitution on AI</title>
		<link>http://haleyai.com/wordpress/2023/04/18/imposing-our-constitution-on-ai/</link>
				<pubDate>Wed, 19 Apr 2023 01:12:04 +0000</pubDate>
		<dc:creator><![CDATA[paul@haleyAI.com]]></dc:creator>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[constitutional AI]]></category>

		<guid isPermaLink="false">http://haleyai.com/wordpress/?p=1660</guid>
				<description><![CDATA[In the face of calls for moratoria on AI research, it is clear that society wants to govern AI.&#160; In the United States, we are governed by The Constitution.&#160; It lays out founding principles and inalienable rights, since amended 27 times, including the Bill of Rights.&#160; It defines the structure of our government into 3 &#8230; <p class="link-more"><a href="http://haleyai.com/wordpress/2023/04/18/imposing-our-constitution-on-ai/" class="more-link">Continue reading<span class="screen-reader-text"> "Imposing Our Constitution on AI"</span></a></p>]]></description>
								<content:encoded><![CDATA[
<p>In the face of calls for moratoria on AI research, it is
clear that society wants to govern AI.&nbsp;
In the United States, we are governed by The Constitution.&nbsp; It lays out founding principles and
inalienable rights, since amended 27 times, including the Bill of Rights.&nbsp; It defines the structure of our government
into 3 branches and how they are governed by the people.&nbsp; Thus, we govern ourselves, democratically.&nbsp; How shall we govern AI?&nbsp; (Presumably, we don’t want to give AI the
vote!)</p>



<p>Recent advances in AI have shocked our social fabric.&nbsp; There is fear of economic upheaval, sudden
significant unemployment, or deep fakes and toxic bots swarming social
media.&nbsp; Fear is easy, and
understandable.&nbsp; Thank you, Darwin, God,
et al.&nbsp; Flight commonly follows.</p>



<p>The rational, educated, adult reaction to fear which is not
immediately threatening should be to think – without fear &#8211; before acting.&nbsp; Is the banning of AI by various social media
sites, school districts, universities, etc. rational?&nbsp; </p>



<p>We think “yes” in some cases, such as trying to keep coding
bots from overwhelming Stack Overflow with misinformation.&nbsp; AI is not smart enough to provide reliable
answers.&nbsp; And don’t hold your breath,
please.&nbsp; The same is true for social
media, where deep fakes and misinformation sown by Russian bots can be
overwhelming, but the more practical solution seems to be banning anonymity
more broadly.&nbsp; Once you know “who” a user
is, you can label or prohibit AI, but you can’t tell which is which otherwise.&nbsp; To put that another way, unless you validate
who a social media user is, a ban is nothing but preening.</p>



<p>We think “no” in education, generally speaking.&nbsp; In particular, trying to stop students from
using Chat GPT to help with homework or write assigned essays seems like a
fool’s errand.&nbsp; It’s like banning
calculators.&nbsp; First, it’s
impractical.&nbsp; The student will use the
calculator (or AI) at home and teachers will only find out when in-class
behavior differs.&nbsp; Second, the days of
Luddites are past.&nbsp; Today, calculators do
much more than arithmetic and their use is permitted even in standardized
testing, such as the SATs.&nbsp; Beyond
calculators, we don’t have penmanship or typing classes anymore.&nbsp; The same will happen with AI in
education.&nbsp; It will fundamentally change
what and how we teach and why it matters.&nbsp;
It may take as long as it did for calculators to be accepted, for
printing to dominate cursive, or for typewriters to disappear, but it’s
inevitable.</p>



<p>Various proposals for a moratorium on AI development have
been made.&nbsp; For the most part these
are also fearful reactions.&nbsp; They seem
more rational, however, because they propose less action.&nbsp; They seek to freeze a moment in time to buy
time to think.&nbsp; We think that’s
counter-productive, but it’s not unreasonable for society to make such a
decision.&nbsp; But it can’t work for
long.&nbsp; And it will be costly.</p>



<p>We think the rational path is to lay out a constitution
governing AI.&nbsp; Just as the Founding
Fathers did not put society on hold while they debated and drafted the
constitution in Independence Hall, we prefer not to impose a moratorium on
those who are making a living advancing and applying AI towards staggering improvements
in world-wide economics, education, healthcare, and more.</p>



<p>Our corner of the AI community has adopted the notion of “constitutional AI” promoted by Anthropic.  This notion of constitutional AI requires a provider of AI to define the constitution by which it will abide.  The dominant aspects of such constitutions today are that it will be helpful, harmless, and honest.  However the AI provider defines the constitution, they represent to its users that it will substantially conform to its constitution and that the provider is committed to improving such conformance and ameliorating any non-conformance.</p>



<p>That’s all well and good, and we could go into the details of
the typical constitution at length, but it’s not good enough for society.&nbsp; Society needs all such constitutions to be
governed by a constitution that we, the people, accept.&nbsp; This shall be the constitution for AI.</p>



<p>The debate by which this constitution shall be defined and
ratified belongs, as all matters of law, in the bodies of our legislatures,
influenced as always by the public (and, of course, special interests).&nbsp; It should, as with all matters of
legislation, be out in the open and deliberate enough to allow for public
discussion and protest before enactment, as may be appropriate.</p>



<p>For decades, experts foretold superhuman AI as a decade
away.&nbsp; Today, “the singularity”, where AI
becomes completely general, capable of superhuman performance in all regards,
is again, supposedly, a decade or so away.&nbsp;
Don’t believe the prognosticators.&nbsp;
Past performance is indicative of future returns. </p>



<p>Today, our legislators have little understanding of AI.&nbsp; They, like most of us, can only imagine what AI is
and will become capable of.&nbsp; None of us
can distinguish fantasy from reality when projecting AI forward.</p>



<p>What matters is that quick legislation governing AI inside
the six-month moratorium some are calling for will be more fearful than
thoughtful.&nbsp; If so, it will not allow
time for many of us, especially our legislators, and eventually regulators, to
learn the technical capabilities and limitations of the technology well enough
to govern it wisely.</p>



<p>How and whether other nations govern AI is not within our control, however.  And that is perhaps the only rational argument needed to defeat calls for any moratorium.  The reasons are somewhat obvious.  In any case, given history, international consensus is ever elusive, implying that any moratorium will be either ineffective or perpetual.  The former is catastrophic.  The latter is impossible.</p>
]]></content:encoded>
										</item>
		<item>
		<title>Will Oligarchs Own the Future of AI?</title>
		<link>http://haleyai.com/wordpress/2023/04/18/will-oligarchs-own-the-future-of-ai/</link>
				<pubDate>Wed, 19 Apr 2023 00:54:43 +0000</pubDate>
		<dc:creator><![CDATA[paul@haleyAI.com]]></dc:creator>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Anthropic]]></category>
		<category><![CDATA[constitutional AI]]></category>
		<category><![CDATA[language model]]></category>
		<category><![CDATA[LLaMA]]></category>
		<category><![CDATA[Open AI]]></category>
		<category><![CDATA[OpenAI]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://haleyai.com/wordpress/?p=1643</guid>
				<description><![CDATA[Here we go again.  We are set back a bit this morning (two weeks ago now) with a recent Tech Crunch article about Anthropic, perhaps the most inspiring company touting safe AI.  They seek to raise billions to compete with not so Open AI.  Before commenting on the article, how about a little context? Open &#8230; <p class="link-more"><a href="http://haleyai.com/wordpress/2023/04/18/will-oligarchs-own-the-future-of-ai/" class="more-link">Continue reading<span class="screen-reader-text"> "Will Oligarchs Own the Future of AI?"</span></a></p>]]></description>
								<content:encoded><![CDATA[
<p>Here we go again.  We are set back a bit this morning (two weeks ago now) with <a href="https://techcrunch.com/2023/04/06/anthropics-5b-4-year-plan-to-take-on-openai/">a recent Tech Crunch article about Anthropic</a>, perhaps the most inspiring company touting safe AI.  They seek to raise billions to compete with not so Open AI.  Before commenting on the article, how about a little context?</p>



<p>Open AI, the company, was founded in 2015 precisely to, <a href="https://en.wikipedia.org/wiki/OpenAI">as stated on Wikipedia today</a>, &#8220;freely
collaborate&#8221; [&#8230;and make] its patents and research open to the public.&nbsp; Things began to change in 2019 as Open AI
transitioned to a for-profit corporation.&nbsp;
Ultimately, this year, Open AI has written that they will no longer
“share [any] details for commercial and other reasons”.</p>



<p>Last year, in <a href="https://greylock.com/greymatter/sam-altman-ai-for-the-next-era/">a
podcast with Sam Altman and Reid Hoffman</a>, the CEO of Open AI suggested that
only a handful of companies could provide the foundational AI models on which
everyone else will build “the middle layer”.&nbsp;
This suggests that foundational AI will simply become part of “Big
Tech”, raising familiar questions of <a href="https://www.simonandschuster.com/books/Who-Owns-the-Future/Jaron-Lanier/9781451654974">who
owns the future</a>.</p>



<p>Open AI has been stingy about being open since it initially refused to share the GPT-2 model in 2019, eventually doing so, arguably due to pressure from the AI community.  Open AI has not shared any subsequent GPT model. </p>



<p>Open AI (little ‘o’; not the company) does not mean
commercial AI becomes impractical.&nbsp; The
intent of open AI is to keep fundamentals of nature, like math, electricity,
and fire (including nuclear power), from becoming private property at the
expense of society.&nbsp; Making a living
harnessing them and applying them innovatively should remain fair game.</p>



<p>Our entire tech industry is built on shared intellectual
property, most notably the open-source software movement.&nbsp; Without open-source, much of modern life
would be stuck decades in the past.&nbsp; All
the progress in machine learning over the last few decades would have been
impractical without open-source operating systems and programming languages,
such as Linux and Python, in particular.</p>



<p>AI models are a little different.&nbsp; They have two critical parts.&nbsp; One is the source code that implements
them.&nbsp; Typically, this is Python code
which runs on tens to thousands of GPUs, which are massively parallel matrix
manipulating machines.&nbsp; Essentially,
given data, the algorithms written in Python adjust the matrices until the
error in predicting things about the training data is minimized (or nearly so).</p>
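<p>Stripped of scale, that loop looks something like the following toy sketch in plain Python.&nbsp; It fits two parameters of a straight line by gradient descent.&nbsp; The data, learning rate, and step count are all made up for illustration; real models adjust billions of parameters on GPUs using far more elaborate code.</p>

```python
def train(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            error = (w * x + b) - y        # prediction error on one example
            grad_w += 2 * error * x / n    # gradient of the loss w.r.t. w
            grad_b += 2 * error / n        # gradient of the loss w.r.t. b
        # Nudge each parameter against its gradient to reduce the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy training data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = train(xs, ys)
print(f"learned w={w:.3f}, b={b:.3f}")
```

<p>The final values of <code>w</code> and <code>b</code> play the role of the model weights discussed next.</p>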



<p>This second part, the resulting contents of the matrices
after training (a.k.a., the model weights), is where the controversy of open AI
started with GPT-2.&nbsp; Through GPT-3, Open
AI was quite good about publishing details of the algorithms used in its
models.&nbsp; The AI community easily
replicated such models, with various modifications.&nbsp; Fair enough, that’s one degree of open AI.&nbsp; Many believe it’s not enough.</p>



<p>Better is the general, open-source attitude among AI
researchers, including many with commercial affiliation, and especially the
Hugging Face community.&nbsp;
But having the source code of a model is not “democratic” enough.&nbsp; Here, democratic means practically
available to anyone and everyone.&nbsp;
Practically available to everyone requires both the source code of a
model and the weights resulting from its training to be available.</p>



<p>Just a few details on the models and their weights.&nbsp; The transformer architecture has been refined
significantly but remains basically unchanged over the last 5 years.&nbsp; The source code for producing the state of
the art is widely available and gradually evolving as techniques improve.&nbsp; In order to produce a state-of-the-art model,
massive amounts of data are needed.&nbsp;
Whether training a language model from text or a multi-modal model with
text, images, etc., we have enough readily available data to approach the
state-of-the-art results democratically.&nbsp;
Where it becomes less democratic is the cost of computing the model
weights given the training data. </p>



<p>The amount of computation required to train a model is
(naively) proportional to the number of training iterations times the size of
the model.&nbsp; For the most part, the amount
of computation per training token is proportional to the number of parameters in the model.&nbsp; The weights are simply the values of those
parameters after optimizing the model by training it with the data.</p>
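<p>A common back-of-the-envelope version of this proportionality is sketched below in Python.&nbsp; It assumes the widely used rule of thumb of about 6 floating point operations per parameter per training token; that constant, and the 1.4 trillion token count for the 65B LLaMA model, are not stated above and should be read as assumptions.</p>

```python
# Rule-of-thumb constant: roughly 6 FLOPs per parameter per training token
# (forward plus backward pass). An approximation, not an exact count.
FLOPS_PER_PARAM_PER_TOKEN = 6


def training_flops(params, tokens):
    """Approximate total training compute as 6 * parameters * tokens."""
    return FLOPS_PER_PARAM_PER_TOKEN * params * tokens


# Assumed figures for the largest LLaMA model: 65B parameters, 1.4T tokens.
llama_65b_flops = training_flops(65e9, 1.4e12)
print(f"{llama_65b_flops:.2e} floating point operations")
```

<p>Doubling either the parameters or the tokens doubles the estimate, which is the naive proportionality described above.</p>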



<p>Table stakes for a good language model, which generally
requires over 10 billion parameters, are roughly 10<sup>23</sup> floating point
operations.&nbsp; For example, DeepMind’s
Chinchilla proved a 70B parameter model superior in many regards to models many
times its size.&nbsp; Meta’s more recent LLaMA
benefits from additional improvements.&nbsp; Its
Chinchilla-scale 65B model was trained on 2,048 A100 GPUs on over 1 trillion tokens
of text in 21 days.&nbsp; At arm’s-length,
on-demand processing last year, this would cost roughly $1 million.</p>
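<p>To connect operation counts to wall-clock time, here is a small Python sketch.&nbsp; The sustained throughput per GPU is an assumed value (roughly half of an A100’s peak for 16-bit math), and the 5.46 × 10<sup>23</sup> total is my own estimate for a 65B model trained on 1.4 trillion tokens, so treat the result as a sanity check rather than a measurement.</p>

```python
SECONDS_PER_DAY = 86_400

# Assumed sustained throughput per A100, in floating point operations per
# second. Real utilization varies widely with software and interconnect.
ASSUMED_SUSTAINED_FLOPS_PER_GPU = 1.5e14


def training_days(total_flops, gpus, flops_per_gpu=ASSUMED_SUSTAINED_FLOPS_PER_GPU):
    """Days needed to perform `total_flops` on `gpus` devices running in parallel."""
    return total_flops / (gpus * flops_per_gpu * SECONDS_PER_DAY)


# 1e23 is the "table stakes" figure from the text; 5.46e23 is an assumed
# estimate for a 65B parameter model trained on 1.4 trillion tokens.
table_stakes_days = training_days(1e23, gpus=2048)
llama_scale_days = training_days(5.46e23, gpus=2048)
print(f"table stakes: {table_stakes_days:.1f} days on 2,048 GPUs")
print(f"65B scale:    {llama_scale_days:.1f} days on 2,048 GPUs")
```

<p>Under these assumptions the larger run lands close to the three weeks reported, which suggests the published figures hang together.</p>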



<p>Commercial Open AI would have us believe that this is just
the tip of an iceberg.&nbsp; That $1 million
today will be $1 billion tomorrow.&nbsp; Open
AI would have us believe that we can’t afford to keep up as they build models
10 to 100 times larger.&nbsp; Well, the jury
is out.&nbsp; There have already been models
with 3 to 10 times as many parameters as GPT-3 which have fizzled quickly.&nbsp; But the intent is clear.</p>



<p>Unfortunately, according to the article, inspiring Anthropic now aspires to be one of the oligarchs of AI.  The article states that Anthropic’s investor pitch deck claims, “We believe that companies that train the best 2025/26 models will be too far ahead for anyone to catch up in subsequent cycles.”  It goes on to assert that AI will automate large swaths of the economy in very few years.  Such hyperbole may further inflame unfortunate calls for a moratorium on AI.</p>



<p>We like Anthropic’s approach to Constitutional AI and are big
fans of continuous, self-supervised learning, as well as reinforcement learning
given human feedback.&nbsp; These aspects have
materially advanced the safety and the instruction-following and conversational
abilities of language models recently.&nbsp;
But in doing so, they require an order of magnitude less compute than
the brute force pre-training discussed above.</p>



<p>Anthropic thinks building a model with “tens of thousands” of
GPUs will produce magic.&nbsp; We’ll see.&nbsp; There are a few stubborn facts in the
way.&nbsp; One problem is that increasing the size
of a model and the amount of training data <a href="https://arxiv.org/abs/2203.15556">must be balanced</a>.&nbsp; Loosely speaking, one cannot just double the
number of parameters without doubling the amount of training data.&nbsp; One problem with doubling the amount of
training data is that <a href="https://www.lesswrong.com/posts/6Fpvch8RR29qLEWNH/chinchilla-s-wild-implications">we
are running out of data</a>.&nbsp; Another is
that we are already approaching the asymptotes that we can get from numbers of
parameters and amounts of training data.&nbsp;
</p>
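<p>The balance can be sketched in a few lines of Python.&nbsp; The ratio of roughly 20 training tokens per parameter is a commonly quoted reading of the Chinchilla results, not a figure from this post, and the 6 operations per parameter per token is a rule of thumb; both are assumptions here.</p>

```python
# Commonly quoted heuristic from the Chinchilla results (an assumption here).
TOKENS_PER_PARAM = 20
# Rule-of-thumb training cost per parameter per token (also an assumption).
FLOPS_PER_PARAM_PER_TOKEN = 6


def compute_optimal_tokens(params):
    """Training tokens that roughly balance a model of `params` parameters."""
    return TOKENS_PER_PARAM * params


def balanced_training_flops(params):
    """Approximate compute when data is scaled up along with the model."""
    return FLOPS_PER_PARAM_PER_TOKEN * params * compute_optimal_tokens(params)


for params in (70_000_000_000, 140_000_000_000):
    tokens = compute_optimal_tokens(params)
    flops = balanced_training_flops(params)
    print(f"{params:,} params: {tokens:,} tokens, {flops:.2e} FLOPs")
```

<p>Doubling the parameters doubles the data needed and quadruples the compute, which is why running out of data bites so quickly.</p>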



<p>The gist of the learning curves for models of more than a
few billion parameters is that the inflection point of diminishing returns is
passed quickly, somewhere between 10 and 100 billion tokens of training
data.&nbsp; After “just” a few hundred billion
training tokens, a model with 30 to 120 billion parameters begins to look
asymptotically close to “fitting” the training data.&nbsp; And the 30 billion parameter model fits the
data over 95% as well as the model 4 times its size.</p>
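<p>One way to explore such curves yourself is the parametric loss fitted in the Chinchilla paper, sketched in Python below.&nbsp; The constants are the ones reported there for DeepMind’s models and data, so they will not match any other training setup exactly; the 300 billion token budget is an arbitrary illustrative choice, and comparing losses by ratio is just one crude way to express “fits as well”.</p>

```python
# Fitted constants reported in the Chinchilla paper for its own setup.
# They are used here only to illustrate the shape of the curves.
E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28


def estimated_loss(params, tokens):
    """Parametric loss: an irreducible term plus model-size and data-size terms."""
    return E + A / params**ALPHA + B / tokens**BETA


TOKENS = 300e9  # illustrative budget of a few hundred billion tokens

small = estimated_loss(30e9, TOKENS)
large = estimated_loss(120e9, TOKENS)
print(f"30B loss:  {small:.3f}")
print(f"120B loss: {large:.3f}")
print(f"ratio of 120B loss to 30B loss: {large / small:.3f}")
```

<p>Varying <code>TOKENS</code> shows how quickly the data term flattens out compared with the gap between the two model sizes.</p>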



<p>Whether or not size matters, other innovations are coming
into focus now that we have sufficient scale.&nbsp;
Hopefully, we can avoid Big Tech, including Open AI and Anthropic,
owning our future by more openly sharing models, including their weights.&nbsp; If not, we can expect the innovations and
advances in AI to slow as proprietary interests slow the exchange and
experimentation that has produced staggering advances in the last decade.&nbsp; Either way, if the limits of scale alone are
indeed near, it’s not the end of the world.</p>



]]></content:encoded>
										</item>
		<item>
		<title>Democracy and AI</title>
		<link>http://haleyai.com/wordpress/2023/04/18/democracy-and-ai/</link>
				<pubDate>Wed, 19 Apr 2023 00:49:28 +0000</pubDate>
		<dc:creator><![CDATA[paul@haleyAI.com]]></dc:creator>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Alpaca]]></category>
		<category><![CDATA[Anthropic]]></category>
		<category><![CDATA[ChatGPT]]></category>
		<category><![CDATA[constitutional AI]]></category>
		<category><![CDATA[Galactica]]></category>
		<category><![CDATA[GPT-4]]></category>
		<category><![CDATA[InstructGPT]]></category>
		<category><![CDATA[LLaMA]]></category>
		<category><![CDATA[Meta]]></category>
		<category><![CDATA[Open AI]]></category>
		<category><![CDATA[PaLM]]></category>

		<guid isPermaLink="false">http://haleyai.com/wordpress/?p=1646</guid>
				<description><![CDATA[This is an earlier version of what became &#8216;Truly open AI: Meta’s LLaMA offers open-source foundation models&#8216; published at Merlyn Mind. It was drafted before subsequent developments which I aim to address shortly. I am solely responsible for the opinions expressed herein. The ability of large language models has made huge leaps with the launch &#8230; <p class="link-more"><a href="http://haleyai.com/wordpress/2023/04/18/democracy-and-ai/" class="more-link">Continue reading<span class="screen-reader-text"> "Democracy and AI"</span></a></p>]]></description>
								<content:encoded><![CDATA[
<p>This is an earlier version of what became &#8216;<a href="https://www.merlyn.org/blog/truly-open-ai-metas-llama-offers-open-sourced-foundation-models">Truly open AI: Meta’s LLaMA offers open-source foundation models</a>&#8216; published at <a href="https://www.merlyn.org">Merlyn Mind</a>.  It was drafted before subsequent developments which I aim to address shortly.   I am solely responsible for the opinions expressed herein.</p>



<p>The ability of large language models has made huge leaps with the launch of ChatGPT in late 2022 and now GPT-4, both from OpenAI (and, effectively, Microsoft). Unfortunately, the GPT family of language models is being held increasingly close to the chest. From GPT-3 on, the models are simply not available other than as hosted services.  </p>



<p>Much has
been written suggesting that AI will become concentrated in a few Big Tech firms
because language modeling at scale has become prohibitively expensive. In
particular, we have heard that democratic AI – whereby the state of the art is
truly open and available to all – is impractical given the amount of compute
needed to reach or surpass language models such as GPT-3, Google’s PaLM, and
others.</p>



<p>Papers on
GPT-3 have gone into some detail about the proprietary models, such as the
number of layers and attention heads, as well as model width. This has allowed the
AI community to glean some insights and compare the performance of other models
against GPT. There are many papers comparing smaller models such as DeepMind’s Chinchilla
against GPT-3, and it is not uncommon for the smaller models to outperform their
larger sibling.</p>



<p>In mid-March, OpenAI published a <a href="https://arxiv.org/abs/2303.08774">paper describing GPT-4</a>, but it gives few details of the model architecture. For example, the number of parameters is not even disclosed. The authors attribute GPT-4’s improvement on exam taking over GPT-3.5 to pre-training methodology, but they explicitly state that no details will be shared for competitive and other reasons. We simply don’t know whether size matters as much in GPT-4 as it did previously. </p>



<p><strong>The
promise of open source</strong></p>



<p>Well, let’s step back to Meta’s late 2022 release of Galactica, a model 2/3 the size of GPT-3 that is trained not on arbitrary internet content but on scientific literature and data. As soon as it was made available, however, it was harshly criticized, even though it is superior to GPT in many regards. The criticism was mostly regarding its “toxicity,” which is a reasonable concern (though it would be hard to argue that it was more toxic than GPT has been). </p>



<p>Galactica
was promising. The model could be obtained from Meta, unlike those of OpenAI,
and it could be deployed on readily available and affordable hardware. And,
again, it was superior to GPT-3 in various regards. </p>



<p>Well, Meta
has upped the ante significantly with the release of <a href="https://research.facebook.com/publications/llama-open-and-efficient-foundation-language-models/">LLaMA: Open and Efficient Foundation Language
Models</a>.</p>



<p>LLaMA models
range up to 1/3 the size of GPT-3. They differ by leveraging architectural
improvements from the many works of Google, DeepMind, Meta, and others. This
allows LLaMA to be trained more efficiently and to perform better for a given
training budget.&nbsp; The largest
LLaMA model competes handily with models three times its size (GPT-3) and eight
times its size (PaLM).</p>



<p>Here at
Merlyn Mind, we’ve given LLaMA a go, and it’s impressive. It’s truly open AI.&nbsp; </p>



<p>But wait!
There’s more …</p>



<p>InstructGPT
and ChatGPT are much better at following instructions and chatting than
language models that have not been fine-tuned to follow instructions or chat.
There is a lot going on to train non-GPT models with such capabilities, but one
effort in particular warrants kudos. </p>



<p>First, let’s
give Anthropic an honorable mention for its work on <a href="https://www.anthropic.com/index/measuring-progress-on-scalable-oversight-for-large-language-models">Constitutional AI</a> and related data sets that will
enable the community to address toxicity and harm in language models.&nbsp; </p>



<p>Now …</p>



<p>Stanford’s
work on <a href="https://crfm.stanford.edu/2023/03/13/alpaca.html">Alpaca: a Strong, Replicable,
Instruction-Following Model</a> is quite fun and helpful. The Stanford team took a small LLaMA model and
taught it to follow instructions. The fun part is that they prompted one of OpenAI’s GPT-3.5 models (text-davinci-003) to
generate the instructions! The result is Alpaca, a fine-tuned LLaMA. It’s worth your
time to check it out!</p>
]]></content:encoded>
										</item>
		<item>
		<title>Here we go again&#8230;</title>
		<link>http://haleyai.com/wordpress/2023/04/18/here-we-go-again/</link>
				<pubDate>Tue, 18 Apr 2023 20:03:53 +0000</pubDate>
		<dc:creator><![CDATA[paul@haleyAI.com]]></dc:creator>
		
		<guid isPermaLink="false">http://haleyai.com/wordpress/?p=1641</guid>
				<description><![CDATA[It has been over 4 years since I joined Merlyn Mind to apply AI in Education in the summer of 2018. It has been fabulous working with great folks around the world and the founding and extended team, many of whom came from IBM Research with a bunch of folks formerly working with Watson, especially &#8230; <p class="link-more"><a href="http://haleyai.com/wordpress/2023/04/18/here-we-go-again/" class="more-link">Continue reading<span class="screen-reader-text"> "Here we go again&#8230;"</span></a></p>]]></description>
								<content:encoded><![CDATA[
<p>It has been over 4 years since I joined <a href="https://www.merlyn.org">Merlyn Mind</a> to apply AI in Education in the summer of 2018.  It has been fabulous working with great folks around the world and the founding and extended team, many of whom came from IBM Research with a bunch of folks formerly working with <a href="https://www.ibm.com/watson">Watson</a>, especially in applying <a href="https://medium.com/swlh/technology-for-education-15fd7c536d28">Watson to education</a>.  </p>



<p>Merlyn has grown tremendously thanks to the founding team and <a href="https://www.learn.vc/">Learn Capital</a>, in particular.  The corporate culture is absolutely fantastic!  I am honored to have been designated as a Distinguished Engineer, which I had never heard of before.  Folks from IBM assure me it&#8217;s a good thing&#8230; In a recent award, the company added another honor while calling me &#8220;agent provocateur extraordinary&#8221;, perhaps subtly provoking as well as complimenting me!  As a Distinguished Scientist colleague says, it&#8217;s &#8220;fair&#8221;.</p>



<p>That&#8217;s all for this note.  I plan to add some new posts which will reflect on the advances in AI over the last several years, especially as they pertain to some of my favorite matters, such as natural language.  Of course, there will be much regarding machine learning and various deep learning technologies, including multi-modal language models.</p>
]]></content:encoded>
										</item>
		<item>
		<title>CPC and the Grandmother Neuron</title>
		<link>http://haleyai.com/wordpress/2019/09/02/cpc-and-the-grandmother-neuron/</link>
				<pubDate>Mon, 02 Sep 2019 17:47:04 +0000</pubDate>
		<dc:creator><![CDATA[paul@haleyAI.com]]></dc:creator>
				<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[acoustic modeling]]></category>
		<category><![CDATA[connectionist temporal classification]]></category>
		<category><![CDATA[contrastive predictive coding]]></category>
		<category><![CDATA[CPC]]></category>
		<category><![CDATA[CTC]]></category>
		<category><![CDATA[DeepMind]]></category>
		<category><![CDATA[DeepSpeech]]></category>
		<category><![CDATA[embeddings]]></category>
		<category><![CDATA[Facebook AI Research]]></category>
		<category><![CDATA[FAIR]]></category>
		<category><![CDATA[grandmother cell]]></category>
		<category><![CDATA[knowledge representation]]></category>
		<category><![CDATA[unsupervised learning]]></category>
		<category><![CDATA[wav2vec]]></category>

		<guid isPermaLink="false">http://haleyai.com/wordpress/?p=1625</guid>
				<description><![CDATA[A lot of recent work has advanced the learning of increasingly context-sensitive distributed representations (i.e., so-called &#8216;embeddings&#8217;). In particular, DeepMind&#8217;s paper on &#8220;Contrastive Predictive Coding&#8221; (CPC) is interesting and advances on a number of fronts. For example, in wav2vec, Facebook AI Research (FAIR) uses CPC to obtain apparently superior acoustic modeling results to DeepSpeech&#8217;s &#8230; <p class="link-more"><a href="http://haleyai.com/wordpress/2019/09/02/cpc-and-the-grandmother-neuron/" class="more-link">Continue reading<span class="screen-reader-text"> "CPC and the Grandmother Neuron"</span></a></p>]]></description>
								<content:encoded><![CDATA[
<p>A lot of recent work has advanced the learning of increasingly context-sensitive distributed representations (i.e., so-called &#8216;embeddings&#8217;).  In particular, DeepMind&#8217;s <a rel="noreferrer noopener" aria-label="paper  (opens in a new tab)" href="https://arxiv.org/abs/1807.03748" target="_blank">paper</a> on &#8220;Contrastive Predictive Coding&#8221; (CPC) is interesting and advances on a number of fronts.  For example, in <a rel="noreferrer noopener" aria-label="wav2vec (opens in a new tab)" href="https://arxiv.org/abs/1904.05862" target="_blank">wav2vec</a>, Facebook AI Research (FAIR) uses CPC to obtain apparently superior acoustic modeling results to DeepSpeech&#8217;s connectionist temporal classification (CTC) approach.  In the CPC paper, the following image is particularly striking, harkening back to the early notion of a <a rel="noreferrer noopener" aria-label="Grandmother Cell (opens in a new tab)" href="https://en.wikipedia.org/wiki/Grandmother_cell" target="_blank">Grandmother Cell</a>.</p>



<figure class="wp-block-image"><img src="http://haleyai.com/wordpress/wp-content/uploads/Contrastive-Predictive-Coding-grandmother-neurons-1024x649.png" alt="" class="wp-image-1626" srcset="http://haleyai.com/wordpress/wp-content/uploads/Contrastive-Predictive-Coding-grandmother-neurons-1024x649.png 1024w, http://haleyai.com/wordpress/wp-content/uploads/Contrastive-Predictive-Coding-grandmother-neurons-300x190.png 300w, http://haleyai.com/wordpress/wp-content/uploads/Contrastive-Predictive-Coding-grandmother-neurons-768x487.png 768w" sizes="(max-width: 767px) 89vw, (max-width: 1000px) 54vw, (max-width: 1071px) 543px, 580px" /><figcaption>grandmother cells resulting from CPC<br /></figcaption></figure>
]]></content:encoded>
										</item>
		<item>
		<title>Neural Logic Machines</title>
		<link>http://haleyai.com/wordpress/2019/06/28/neural-logic-machines/</link>
				<pubDate>Fri, 28 Jun 2019 16:28:31 +0000</pubDate>
		<dc:creator><![CDATA[paul@haleyAI.com]]></dc:creator>
				<category><![CDATA[Formal Logic]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Question Answering]]></category>
		<category><![CDATA[first order logic]]></category>
		<category><![CDATA[FOL]]></category>
		<category><![CDATA[inductive logic programming (ILP)]]></category>
		<category><![CDATA[knowledge graph]]></category>
		<category><![CDATA[MemNN]]></category>
		<category><![CDATA[memory network]]></category>
		<category><![CDATA[neural logic machine]]></category>
		<category><![CDATA[neural turing machine]]></category>
		<category><![CDATA[nlm]]></category>
		<category><![CDATA[NTM]]></category>
		<category><![CDATA[relational reasoning]]></category>

		<guid isPermaLink="false">http://haleyai.com/wordpress/?p=1618</guid>
				<description><![CDATA[This is an important paper in the development of neural reasoning capabilities which should reduce the brittleness of purely symbolic approaches:  Neural Logic Machine The potential reasoning capabilities, such as with regard to multi-step inference, as in problem solving and theorem proving, are most interesting, but there are important contemporary applications in machine learning and &#8230; <p class="link-more"><a href="http://haleyai.com/wordpress/2019/06/28/neural-logic-machines/" class="more-link">Continue reading<span class="screen-reader-text"> "Neural Logic Machines"</span></a></p>]]></description>
								<content:encoded><![CDATA[<p>This is an important paper in the development of neural reasoning capabilities which should reduce the brittleness of purely symbolic approaches:  <a href="https://scholar.google.com/scholar?lr&amp;ie=UTF-8&amp;oe=UTF-8&amp;q=Neural+Logic+Machines+Dong+Mao+Lin+Wang+Li+Zhou">Neural Logic Machine</a></p>
<p>The potential reasoning capabilities, such as with regard to multi-step inference, as in problem solving and theorem proving, are most interesting, but there are important contemporary applications in machine learning and question answering.  I&#8217;ll just provide a few highlights from the paper on the latter and some more points and references on the former below.</p>
<p><span id="more-1618"></span>The paper relates to inductive logic programming (ILP), making it more practical to learn rules at meaningful scale:</p>
<p style="padding-left: 30px;"><span class="fontstyle0">Inductive logic programming (ILP) (Muggleton, 1991; 1996; Friedman et al., 1999) is a paradigm for learning logic rules derived from a limited set of rule templates from examples. Being a powerful way of reasoning over discrete symbols, it is successfully applied to various language-related problems, and has been integrated into modern learning frameworks (Kersting et al., 2000; Richardson &amp; Domingos, 2006; Kimmig et al., 2012). Recently, Evans &amp; Grefenstette (2018) introduces a differentiable implementation of ILP which works with connectionist models such as CNNs. Sharing a similar spirit, Rocktäschel &amp; Riedel (2017) introduces an end-to-end differentiable logic proving system for knowledge base (KB) reasoning. A major challenge of these approaches is to scale up to a large number of complex rules. </span></p>
<p>This should have significant practical effect on relational reasoning, which is prevalent in question answering (QA).</p>
<p style="padding-left: 30px;"><span class="fontstyle0">Our work is also related to symbolic relational reasoning, which has a wide application in processing discrete data structures such as knowledge graphs and social graphs (Zhu et al., 2014; Kipf &amp; Welling, 2017; Zeng et al., 2017; Yang et al., 2017). Most symbolic relational reasoning approaches (e.g., Yang et al., 2017; Rocktäschel &amp; Riedel, 2017) are developed for KB reasoning, in which the predicates on both sides of a rule is known in the KB. Otherwise, the complexity grows exponentially in the number of used rules for a conclusion, which is the case in the blocks world. Moreover, Yang et al. (2017) considers rules of the form </span><span class="fontstyle2">query</span>(<span class="fontstyle2">Y</span>, <span class="fontstyle2">X</span>) ← <span class="fontstyle2">R</span><sub>n</sub>(<span class="fontstyle2">Y</span>, <span class="fontstyle2">Z</span><sub>n</sub>) ∧ ··· ∧ <span class="fontstyle2">R</span><sub>1</sub>(<span class="fontstyle2">Z</span><sub>1</sub>, <span class="fontstyle2">X</span>)<span class="fontstyle0">, which is not for general reasoning. </span></p>
<p>The most interesting aspect, however, is with regard to the Neural Turing Machines of <span class="fontstyle0">Graves et al.</span></p>
<p style="padding-left: 30px;"><span class="fontstyle0">Neural Turing Machine (NTM) (Graves et al., 2014; 2016) enables general-purpose neural problem solving such as sorting by introducing an external memory that mimics the execution of Turing Machine. Neural program induction and synthesis (Neelakantan et al., 2016; Reed &amp; De Freitas, 2016; Kaiser &amp; Sutskever, 2016; Parisotto et al., 2017; Devlin et al., 2017; Bunel et al., 2018; Sun et al., 2018) are recently introduced to solve problems by synthesizing computer programs with neural augmentations. Some works tackle the issue of the systematical generalization by introducing extra supervision (Cai et al., 2017). In Chen et al. (2018), more complex programs such as language parsing are studied. However, the neural programming and program induction approaches are usually hard to optimize in an end-to-end manner, and often require strong supervisions (such as ground-truth programs).</span></p>
<p><a href="https://arxiv.org/pdf/1906.06805.pdf"> <span class="fontstyle0">Neural Theorem Provers Do Not Learn Rules Without Exploration</span></a> bolsters the assessment in the above paper with this from its discussion section:</p>
<p style="padding-left: 30px;"><span class="fontstyle0">Neural theorem proving is a promising combination of logical learning and neural network approaches. In this work, we evaluate the performance of the NTP algorithm on synthetic logical datasets with injected relationships. We show that NTP has difficulty recovering relationships in all but the simplest settings.</span></p>
<p>The paper hits all the right topics at the intersection of deep learning and reasoning, including:</p>
<ul>
<li><span class="fontstyle0">In this paper, we propose Neural Logic Machines (NLMs) to address the aforementioned challenges.  In a nutshell, NLMs offer a neural-symbolic architecture which realizes Horn clauses (Horn, 1951) in first-order logic (FOL). The key intuition behind NLMs is that logic operations such as logical </span><span class="fontstyle2">AND</span><span class="fontstyle0">s and </span><span class="fontstyle2">OR</span><span class="fontstyle0">s can be efficiently approximated by neural networks, and the wiring among neural modules can realize the logic quantifiers.</span></li>
<li><span class="fontstyle0">The NLM is a neural realization of logic machines (under the Closed-World Assumption). Given a set of base predicates, grounded on a set of objects (the </span><span class="fontstyle2">premises</span><span class="fontstyle0">), NLMs sequentially apply first-order rules to draw </span><span class="fontstyle2">conclusions</span></li>
<li><span class="fontstyle0">Our goal is to build a neural architecture to learn rules that are both lifted and able to handle relational data with multiple arities. We present different modules of our neural operators by making analogies to a set of essential </span><span class="fontstyle2">meta-rules </span><span class="fontstyle0">in symbolic logic systems. Specifically, we discuss our neural implementation of (1) </span><span class="fontstyle2">boolean logic rules</span><span class="fontstyle0">, as lifted rules containing boolean operations (</span><span class="fontstyle3">AND</span><span class="fontstyle0">, </span><span class="fontstyle3">OR</span><span class="fontstyle0">, </span><span class="fontstyle3">NOT</span><span class="fontstyle0">) over a set of predicates; and (2) </span><span class="fontstyle2">quantifications</span><span class="fontstyle0">, which bridge predicates with different arities by logic quantifiers (</span><span class="fontstyle4">∀ </span><span class="fontstyle0">and </span><span class="fontstyle4">∃</span><span class="fontstyle0">).</span></li>
</ul>
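<p>To make the quoted ideas a little more concrete, here is a minimal sketch of soft logic operations and quantifiers over truth values in [0, 1]. It is not the NLM implementation: in the paper the boolean rules are learned by neural modules, whereas the product-based <code>soft_and</code> and <code>soft_or</code> below are stand-ins I am assuming purely for illustration. The quantifiers are shown as max (exists) and min (for all) reductions over one argument of a binary predicate, which turns a predicate of arity 2 into one of arity 1. All names and the blocks-world values are hypothetical.</p>

```python
# Illustrative only: hand-written soft logic, not the learned modules of the NLM paper.

def soft_and(a, b):
    """Product stand-in for AND on truth values in [0, 1] (an assumption)."""
    return a * b


def soft_or(a, b):
    """Probabilistic-sum stand-in for OR on truth values in [0, 1] (an assumption)."""
    return a + b - a * b


def soft_not(a):
    """Negation on truth values in [0, 1]."""
    return 1.0 - a


def exists(binary_predicate):
    """Reduce p(x, y) to q(x) = 'there is some y with p(x, y)' by taking max over y."""
    return [max(row) for row in binary_predicate]


def forall(binary_predicate):
    """Reduce p(x, y) to q(x) = 'p(x, y) holds for every y' by taking min over y."""
    return [min(row) for row in binary_predicate]


# Hypothetical blocks world with three blocks: on[x][y] is the degree to which
# block x sits on block y.
on = [
    [0.0, 0.9, 0.1],
    [0.0, 0.0, 0.8],
    [0.0, 0.0, 0.0],
]

is_on_something = exists(on)    # arity 2 reduced to arity 1
is_on_everything = forall(on)   # all zeros here, as one would expect
```

<p>The point of the sketch is only that quantification changes the arity of a predicate, which is what lets rules over unary and binary predicates be wired together.</p>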
<p>There is an interesting comparison with <a href="https://arxiv.org/pdf/1502.05698.pdf">Memory Networks (2015)</a> in the paper, but it&#8217;s not clear whether all the 2015 improvements from <a href="https://arxiv.org/abs/1410.3916">the 2014 work</a> were employed in the comparison (I assume so given the reference to the former).  Memory Networks are another step in the right direction.</p>
<ul>
<li><span class="fontstyle0">MemNNs (Weston et al., 2014) are a recently proposed class of models that have been shown to perform well at QA. They work by a “controller” neural network performing inference over the stored memories that consist of the previous statements in the story. The original proposed model performs </span><span class="fontstyle2">2 hops </span><span class="fontstyle0">of inference</span></li>
</ul>
<p>Such memory augmented networks are broadly applicable and many clever designs are likely to emerge.  For example, see <a href="https://arxiv.org/abs/1803.03067">compositional attention networks for machine reasoning</a>.</p>
]]></content:encoded>
										</item>
		<item>
		<title>Entailment-driven Extracting and Editing for Conversational Machine Reading</title>
		<link>http://haleyai.com/wordpress/2019/06/20/entailment-driven-extracting-and-editing-for-conversational-machine-reading/</link>
				<pubDate>Thu, 20 Jun 2019 14:07:53 +0000</pubDate>
		<dc:creator><![CDATA[paul@haleyAI.com]]></dc:creator>
				<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Natural Language]]></category>
		<category><![CDATA[Question Answering]]></category>
		<category><![CDATA[BERT]]></category>
		<category><![CDATA[CMR]]></category>
		<category><![CDATA[conversational machine reading]]></category>
		<category><![CDATA[deep learning]]></category>
		<category><![CDATA[fact extraction]]></category>
		<category><![CDATA[knowledge acquisition]]></category>
		<category><![CDATA[NLP]]></category>
		<category><![CDATA[QA]]></category>
		<category><![CDATA[RTE]]></category>
		<category><![CDATA[ShARC]]></category>
		<category><![CDATA[textual entailment]]></category>
		<category><![CDATA[Zettlemoyer]]></category>

		<guid isPermaLink="false">http://haleyai.com/wordpress/?p=1613</guid>
				<description><![CDATA[When I wrote Are Vitamins Subject to Sales Tax, I was addressing the process of translating knowledge expressed in formal documents, like laws, regulations, and contracts, into logic suitable for inference using the Linguist. Recently, one of my favorite researchers working in natural language processing and reasoning, Luke Zettlemoyer, is among the authors of Entailment-driven &#8230; <p class="link-more"><a href="http://haleyai.com/wordpress/2019/06/20/entailment-driven-extracting-and-editing-for-conversational-machine-reading/" class="more-link">Continue reading<span class="screen-reader-text"> "Entailment-driven Extracting and Editing for Conversational Machine Reading"</span></a></p>]]></description>
								<content:encoded><![CDATA[<p>When I wrote <a href="http://haleyai.com/wordpress/2018/02/26/are-vitamins-subject-to-sales-tax-in-california/">Are Vitamins Subject to Sales Tax</a>, I was addressing the process of translating knowledge expressed in formal documents, like laws, regulations, and contracts, into logic suitable for inference using the <a href="http://haleyai.com/wordpress/tag/linguist/">Linguist</a>.</p>
<p>Recently, one of my favorite researchers working in natural language processing and reasoning, Luke Zettlemoyer, co-authored <a href="https://arxiv.org/abs/1906.05373"><span class="fontstyle0">Entailment-driven Extracting and Editing for Conversational Machine Reading</span></a>.  This is a very nice turn towards knowledge extraction and inference that improves on superficial reasoning by textual entailment (RTE).</p>
<p>I recommend this paper, which relates to BERT, which is among my current favorites in deep learning for NL/QA.  Here is an image from the paper, FYI:</p>
<p><img class="alignnone wp-image-1614 " src="http://haleyai.com/wordpress/wp-content/uploads/CMR-E3-ShARC-300x239.png" alt="Entailment-driven Extracting and Editing for Conversational Machine Reading" width="355" height="283" srcset="http://haleyai.com/wordpress/wp-content/uploads/CMR-E3-ShARC-300x239.png 300w, http://haleyai.com/wordpress/wp-content/uploads/CMR-E3-ShARC-768x613.png 768w, http://haleyai.com/wordpress/wp-content/uploads/CMR-E3-ShARC-1024x817.png 1024w, http://haleyai.com/wordpress/wp-content/uploads/CMR-E3-ShARC.png 1330w" sizes="(max-width: 355px) 100vw, 355px" /></p>
]]></content:encoded>
										</item>
		<item>
		<title>Simon Says</title>
		<link>http://haleyai.com/wordpress/2018/10/27/simon-says/</link>
				<pubDate>Sat, 27 Oct 2018 15:12:23 +0000</pubDate>
		<dc:creator><![CDATA[paul@haleyAI.com]]></dc:creator>
				<category><![CDATA[Speech Recognition]]></category>
		<category><![CDATA[AEC]]></category>
		<category><![CDATA[Alexa Prize]]></category>
		<category><![CDATA[Amazon Alexa]]></category>
		<category><![CDATA[Apple HomePod]]></category>
		<category><![CDATA[array microphone]]></category>
		<category><![CDATA[ASR]]></category>
		<category><![CDATA[Baidu]]></category>
		<category><![CDATA[beam-forming]]></category>
		<category><![CDATA[chatbot]]></category>
		<category><![CDATA[Cortana]]></category>
		<category><![CDATA[Daniel Povey]]></category>
		<category><![CDATA[DeepSpeech]]></category>
		<category><![CDATA[echo cancellation]]></category>
		<category><![CDATA[far-field]]></category>
		<category><![CDATA[Google Assistant]]></category>
		<category><![CDATA[Google Home]]></category>
		<category><![CDATA[hidden Markov model]]></category>
		<category><![CDATA[HMM]]></category>
		<category><![CDATA[HTK]]></category>
		<category><![CDATA[intelligent agent]]></category>
		<category><![CDATA[KALDI]]></category>
		<category><![CDATA[KenLM]]></category>
		<category><![CDATA[language model]]></category>
		<category><![CDATA[Microsoft Bing]]></category>
		<category><![CDATA[natural language]]></category>
		<category><![CDATA[NLU]]></category>
		<category><![CDATA[speech recognition]]></category>
		<category><![CDATA[Sphinx]]></category>
		<category><![CDATA[TDNN]]></category>
		<category><![CDATA[WER]]></category>
		<category><![CDATA[word error rate]]></category>

		<guid isPermaLink="false">http://haleyai.com/wordpress/?p=1578</guid>
				<description><![CDATA[Some folks use the term &#8220;automatic speech recognition&#8221;, ASR.  I don&#8217;t like the separation between recognition and understanding, but that&#8217;s where the technology stands. The term ASR encourages thinking about spoken language at a technical level in which purely inductive techniques are used to generate text from an audio signal (which is hopefully some recorded &#8230; <p class="link-more"><a href="http://haleyai.com/wordpress/2018/10/27/simon-says/" class="more-link">Continue reading<span class="screen-reader-text"> "Simon Says"</span></a></p>]]></description>
								<content:encoded><![CDATA[<p>Some folks use the term &#8220;automatic speech recognition&#8221;, ASR.  I don&#8217;t like the separation between recognition and understanding, but that&#8217;s where the technology stands.</p>
<p>The term ASR encourages thinking about spoken language at a technical level in which purely inductive techniques are used to generate text from an audio signal (which is hopefully some recorded speech!).</p>
<p>As you may know, I am very interested in what many in ASR consider &#8220;downstream&#8221; natural language tasks.  Nonetheless, I&#8217;ve been involved with speech since Carnegie Mellon in the eighties.  During Haley Systems, I hired one of the Sphinx fellows who integrated Microsoft and IBM speech products with our natural language understanding software.  Now I&#8217;m working on spoken-language understanding again&#8230;</p>
<p>Most common approaches to ASR these days involve deep learning, such as Baidu&#8217;s <a href="https://github.com/mozilla/DeepSpeech" target="_blank" rel="noopener">DeepSpeech</a>.  If your notion of deep learning means lots of matrix algebra more than necessarily neural networks, then <a href="http://kaldi-asr.org/doc/about.html" target="_blank" rel="noopener">KALDI</a> is also in the running, but it dates to 2011.  KALDI is an evolution from the hidden Markov model toolkit, <a href="http://htk.eng.cam.ac.uk/" target="_blank" rel="noopener">HTK</a> (once owned by Microsoft).  Hidden Markov models (HMM) were the basis of most speech recognition systems dating back to the eighties or so, including <a href="https://cmusphinx.github.io/" target="_blank" rel="noopener">Sphinx</a>.  All of these are open source and freely licensed.</p>
<p>As everyone knows, ASR performance has improved dramatically in the last 10 years. The primary metric for ASR performance is &#8220;word error rate&#8221; (WER).  Most folks think of WER as the percentage of words incorrectly recognized, although it&#8217;s not that simple.  WER can be more than 1 (e.g., if you come up with a sentence given only noise!).  Here is a comparison published in 2011.</p>
<p><a href="https://www.researchgate.net/publication/314938892_Comparing_Speech_Recognition_Systems_Microsoft_API_Google_API_And_CMU_Sphinx" target="_blank" rel="noopener"><img class="alignnone wp-image-1579" src="http://haleyai.com/wordpress/wp-content/uploads/SphinxBingGoogle.png" alt="" width="480" height="265" srcset="http://haleyai.com/wordpress/wp-content/uploads/SphinxBingGoogle.png 895w, http://haleyai.com/wordpress/wp-content/uploads/SphinxBingGoogle-300x166.png 300w, http://haleyai.com/wordpress/wp-content/uploads/SphinxBingGoogle-768x425.png 768w" sizes="(max-width: 480px) 100vw, 480px" /></a></p>
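<p>Since WER comes up repeatedly here, a minimal sketch of how it is usually computed may help: the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The function name and the sample sentences are mine, and real scoring tools also normalize case and punctuation, which this sketch does not.</p>

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] holds the edit distance between the reference words
    # consumed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1       # reference word missing from hypothesis
            insertion = current[j - 1] + 1   # extra word in hypothesis
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


perfect = word_error_rate("what time is it", "what time is it")
one_dropped = word_error_rate("what time is it", "what time it")
mostly_noise = word_error_rate("hello", "oh hello there friend")
```

<p>Because insertions count as errors but the denominator is the reference length, the last case scores 3.0: three inserted words against a one-word reference. That is why WER is not really a percentage.</p>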
<p>Today, Google, Amazon, Microsoft and others have WER under 10% in many cases. To get there, it takes some talent and thousands of hours of training data.  Google is best, Alexa is close, and Microsoft lags a bit in 3rd place.  (Click the graphic for the article summarizing Vocalize.io results.)</p>
<p><a href="https://voicebot.ai/2018/05/14/google-home-beats-amazon-echo-in-two-audio-recognition-performance-tests-but-alexa-delivers-highest-composite-score/" target="_blank" rel="noopener"><img class="alignnone size-full wp-image-1580" src="http://haleyai.com/wordpress/wp-content/uploads/VocalizeSummary.png" alt="" width="410" height="95" srcset="http://haleyai.com/wordpress/wp-content/uploads/VocalizeSummary.png 410w, http://haleyai.com/wordpress/wp-content/uploads/VocalizeSummary-300x70.png 300w" sizes="(max-width: 410px) 100vw, 410px" /></a></p>
<p><span id="more-1578"></span>Even more impressive is that Google and Amazon are doing this with far-field microphones.  Recognizing far-field speech is much harder than recognizing the clean speech captured by a near-field microphone, such as the high-quality headset microphones used in prior speech recognition benchmarking.</p>
<p>These numbers are better than what Google and Amazon achieve in regular usage, however.  For example, using Alexa, you can say &#8220;Simon Says&#8221;, and it will read back what it thinks it heard.  Try a couple of 10-word sentences and you&#8217;ll probably conclude that Alexa has a WER of 10% or so (again, WER is not a percentage, but everyone treats it that way).   Google messes up roughly as much in my experience.  I use Cortana (actually, Microsoft Cognitive Services, formerly known as Bing Speech) more and find it generally acceptable, although, as shown above, lagging behind Amazon and Google.  I simply don&#8217;t know about Siri since I talk to array microphones of various sorts and at various distances, not into an iPhone mic or headset.</p>
<p>I suspect that the numbers above are unrealistically good for Amazon and Google because of the test cases.  An utterance like &#8220;what time is it&#8221; is pretty short and easy to model with a statistical language model.   It also helps that you prime the system by saying, &#8220;Alexa&#8221; or &#8220;Hey, whats-your-name&#8221;.  More natural dialog without such &#8220;keyword spotting&#8221; is more challenging.    You&#8217;re more likely to say &#8220;uh&#8221; in a natural dialog than after you say, &#8220;Hey Google&#8221;.  And, as you get into more natural utterances, they get longer and statistical language modelling does not help as much.</p>
<p style="padding-left: 30px;">Indeed, statistical language modeling can increase errors.  Language models are used in &#8220;decoding&#8221; the time-series of parameters estimated by an acoustic model which encodes the speech signal.    Most ASRs decode using n-gram language models.  Roughly speaking, if you have a language model that counts how many times each sequence of 4 words is encountered in a corpus, you can use it to guide or bias a best-first or beam search towards a decoding that is &#8220;most consistent&#8221; with the corpus.  If 4 words uttered do not occur in the corpus, decoding needs to estimate that 4-gram&#8217;s consistency with the corpus.  The most common approach to doing so is by &#8220;backing off&#8221; to the consistency of 3 words and interpolating with respect to the 4th word.  There&#8217;s been a lot of work in this area over several decades, but it&#8217;s been pretty stable since <a href="https://github.com/kpu/kenlm" target="_blank" rel="noopener">KenLM</a>, which is used in many state-of-the-art deep learning systems.  The references of that paper lead to improved Kneser-Ney smoothing for the estimation problem.  It is clear to me that Google does better.  DeepSpeech uses KenLM and I observe the limitations of Kneser-Ney backoff leading to higher WER.  As this has become quite technical, I will spare you the details!</p>
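<p>For readers who do want a taste of the details, here is a rough sketch of the backing-off idea. To keep it short it uses the much simpler &#8220;stupid backoff&#8221; scheme (a fixed penalty each time the context is shortened) rather than the Kneser-Ney smoothing that KenLM implements, so treat it as an illustration of the mechanism and not of KenLM itself. The function names, the toy corpus and the penalty of 0.4 are assumptions chosen for the example.</p>

```python
from collections import Counter


def count_ngrams(tokens, max_order):
    """Count every n-gram of length 1..max_order in a list of tokens."""
    counts = Counter()
    for order in range(1, max_order + 1):
        for start in range(len(tokens) - order + 1):
            counts[tuple(tokens[start:start + order])] += 1
    return counts


def backoff_score(counts, total_tokens, context, word, alpha=0.4):
    """Score `word` after `context`, shortening the context while the n-gram is unseen.

    This is "stupid backoff": scores are relative frequencies scaled by a fixed
    penalty per backoff step. They are not normalized probabilities.
    """
    penalty = 1.0
    context = tuple(context)
    while context:
        seen = counts[context + (word,)]
        if seen:
            return penalty * seen / counts[context]
        context = context[1:]  # drop the oldest word, e.g. 4-gram to 3-gram
        penalty *= alpha
    # No context left: fall back to the unigram relative frequency.
    return penalty * counts[(word,)] / total_tokens


corpus = "what time is it what time is dinner".split()
counts = count_ngrams(corpus, max_order=4)

seen_score = backoff_score(counts, len(corpus), ["what", "time", "is"], "it")
unseen_score = backoff_score(counts, len(corpus), ["what", "day", "is"], "it")
```

<p>The 4-gram &#8220;what time is it&#8221; occurs in the toy corpus, so it is scored directly, while &#8220;what day is it&#8221; has to back off twice, down to the bigram &#8220;is it&#8221;, and is penalized accordingly. A decoder biased by such scores will prefer word sequences that resemble the corpus, which is exactly how a language model can push recognition away from what was actually said.</p>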
<p>It takes a lot of data to get anywhere near the far-field performance of Google and Amazon (or the near-field performance of Google and Apple), as shown by DeepSpeech numbers (again, click on the image for the PDF).</p>
<p><a href="https://arxiv.org/abs/1512.02595v1" target="_blank" rel="noopener"><img class="alignnone size-full wp-image-1583" src="http://haleyai.com/wordpress/wp-content/uploads/DeepSpeech2TrainingData.png" alt="" width="424" height="142" srcset="http://haleyai.com/wordpress/wp-content/uploads/DeepSpeech2TrainingData.png 424w, http://haleyai.com/wordpress/wp-content/uploads/DeepSpeech2TrainingData-300x100.png 300w" sizes="(max-width: 424px) 100vw, 424px" /></a></p>
<p>Google has a large edge over Amazon on data, thanks to its Android market penetration and tenure.  Amazon had the lead with Echo, but Google has closed that gap this year.</p>
<p>All the rest of us working on ASR have data envy.  In  the table above, 80% of the data is Baidu&#8217;s!  It&#8217;s hard for anyone to compete with these big players in WER metrics without many times the data that&#8217;s publicly available.  Fortunately, we have the models made available by Baidu.  Unfortunately, anyone who leverages their model is also stuck with it!</p>
<p>Fortunately, there are folks like Daniel Povey.  He is behind KALDI (and an open spoken-language resources <a href="http://www.openslr.org" target="_blank" rel="noopener">web site</a>).  As you can see on &#8220;<a href="https://github.com/syhw/wer_are_we" target="_blank" rel="noopener">WER: where are we</a>&#8221;, his <a href="http://www.danielpovey.com/files/2018_interspeech_tdnnf.pdf" target="_blank" rel="noopener">2018 paper combining time-delay neural networks (TDNN) with KALDI</a> performs much better than DeepSpeech on the LibriSpeech test.  This was accomplished with publicly available data (roughly the same first 20% shown above as used by Baidu).  There are some caveats, however. As suggested above, it&#8217;s not just the acoustic model; the language model also impacts word error rate.  This paper uses a language model that raises some questions&#8230;</p>
<p>Still, there is hope for more open ASR rivaling the leaders in terms of WER.  The rest of the question is how well we&#8217;ll do at spoken language <span style="text-decoration: underline;">understanding</span>, which is enabled by low WER, just as natural language understanding (NLU) is enabled by better natural language <span style="text-decoration: underline;">processing</span> (NLP).  It&#8217;s great fun to be closing the gap between ever-improving machine learning and more intelligent understanding.   Hopefully, this will lead to more valuable intelligent agents than the chatbots that are most likely to claim the Alexa Prize (where by chatbot I mean an agent whose primary goal is merely to converse).</p>
]]></content:encoded>
										</item>
		<item>
		<title>Character by character sentiment</title>
		<link>http://haleyai.com/wordpress/2018/08/05/character-by-character-sentiment/</link>
				<pubDate>Sun, 05 Aug 2018 10:10:27 +0000</pubDate>
		<dc:creator><![CDATA[paul@haleyAI.com]]></dc:creator>
				<category><![CDATA[Natural Language]]></category>
		<category><![CDATA[language modelling]]></category>
		<category><![CDATA[lstm]]></category>
		<category><![CDATA[sentiment]]></category>

		<guid isPermaLink="false">http://haleyai.com/wordpress/?p=1549</guid>
				<description><![CDATA[This is a great page on language modeling with an awesome graphic and commentary on its learned &#8220;sentiment neuron&#8221;.]]></description>
								<content:encoded><![CDATA[<p><a href="https://blog.openai.com/unsupervised-sentiment-neuron/">This is a great page</a> on language modeling with an awesome graphic and commentary on its learned &#8220;sentiment neuron&#8221;.</p>
<p><img class="alignnone size-full" src="https://openai.com/content/images/2017/04/low_res_maybe_faster.gif" width="800" height="215" /></p>
]]></content:encoded>
										</item>
	</channel>
</rss>
