<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Paul Golding, Chief Technologist (AI)</title>
	<atom:link href="http://paulgolding.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://paulgolding.com</link>
	<description>I use AI to generate business growth</description>
	<lastBuildDate>Sat, 22 Apr 2023 21:12:11 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.2.6</generator>
	<item>
		<title>Companies not Using AI Will Lose Sales</title>
		<link>http://paulgolding.com/2023/03/25/companies-not-using-ai-will-lose-sales/</link>
		
		<dc:creator><![CDATA[paulgolding]]></dc:creator>
		<pubDate>Sat, 25 Mar 2023 19:50:06 +0000</pubDate>
				<category><![CDATA[AI]]></category>
		<category><![CDATA[cognitive]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[Innovation]]></category>
		<guid isPermaLink="false">https://paulgolding.com/?p=1092</guid>

					<description><![CDATA[]]></description>
										<content:encoded><![CDATA[<div class="et_pb_section et_pb_section_0 et_section_regular" >
				
				
				
				
				
				
				<div class="et_pb_row et_pb_row_0">
				<div class="et_pb_column et_pb_column_4_4 et_pb_column_0  et_pb_css_mix_blend_mode_passthrough et-last-child">
				
				
				
				
				<div class="et_pb_module et_pb_image et_pb_image_0">
				
				
				
				
				<span class="et_pb_image_wrap "><img decoding="async" width="768" height="507" src="https://paulgolding.com/wp-content/uploads/sites/2/2023/03/mind_meld.png" alt="" title="mind_meld" srcset="http://paulgolding.com/wp-content/uploads/sites/2/2023/03/mind_meld.png 768w, http://paulgolding.com/wp-content/uploads/sites/2/2023/03/mind_meld-480x317.png 480w" sizes="(min-width: 0px) and (max-width: 480px) 480px, (min-width: 481px) 768px, 100vw" class="wp-image-1094" /></span>
			</div>
			</div>
				
				
				
				
			</div><div class="et_pb_row et_pb_row_1">
				<div class="et_pb_column et_pb_column_4_4 et_pb_column_1  et_pb_css_mix_blend_mode_passthrough et-last-child">
				
				
				
				
				<div class="et_pb_module et_pb_text et_pb_text_0  et_pb_text_align_left et_pb_bg_layout_light">
				
				
				
				
				<div class="et_pb_text_inner"><h2 class="reader-text-block__heading1">Mind Meld with the Customer</h2>
<p class="reader-text-block__paragraph">How would you like to mind-meld with the customer, as if you had perfect knowledge of all information pertinent to the sale?</p>
<p class="reader-text-block__paragraph">With AI, this might just become a reality.</p>
<p class="reader-text-block__paragraph">Thanks to large-scale computation, AI has now achieved beyond-human performance in many economically valuable cognitive tasks, especially those related to language, through the use of so-called Large Language Models (LLMs).</p>
<p class="reader-text-block__paragraph">Naturally, many business processes will be amenable to beyond-human automation, including sales.</p>
<p class="reader-text-block__paragraph">In full disclosure, I credit inspiration to Irzana Golding for her article<span> </span><a href="https://igolding.medium.com/operationalizing-competitive-intelligence-including-chatgp-ff5be9293f77">&#8220;Operationalizing Competitive Intelligence, including ChatGPT&#8221;</a><span> </span>in which she introduces the notion of a &#8220;SalesGPT&#8221; for competitive-intelligence questions.</p>
<p class="reader-text-block__paragraph">She posits how a locally-informed LLM could bring critical competitive intelligence (CI) to the salesperson&#8217;s attention in a way that might clinch the deal.</p>
<p class="reader-text-block__paragraph">But how?</p>
<h2 class="reader-text-block__heading1">LLMs are Not Just Language Models</h2>
<p class="reader-text-block__paragraph">It is often revealed via<span> </span><a href="https://www.copper.com/resources/win-loss-analysis">win-loss analysis</a><span> </span>that lost deals (some say up to &#8220;50%&#8221;) were winnable, but for some missing information.</p>
<p class="reader-text-block__paragraph">It&#8217;s likely that the information was available, but unobserved. As Sherlock Holmes said: &#8220;Watson, you see, but you do not observe.&#8221;</p>
<div class="reader-image-block reader-image-block--full-width">
<figure class="reader-image-block__figure">
<div class="ivm-image-view-model   ">
<div class="ivm-view-attr__img-wrapper ivm-view-attr__img-wrapper--use-img-tag display-flex
        
        "><img decoding="async" src="https://media.licdn.com/dms/image/D5612AQGaLxXZkXDyHQ/article-inline_image-shrink_1000_1488/0/1679704742471?e=1684972800&amp;v=beta&amp;t=DRxUPcwLerYM6sCDKzda06h50-oIPm8Li9WClMg6h54" loading="lazy" alt="No alt text provided for this image" id="ember3473" class="ivm-view-attr__img--centered reader-image-block__img lazy-image ember-view" /></div>
</div>
</figure>
</div>
<p class="reader-text-block__paragraph">In the realm of deductive reasoning, observation is akin to connecting the dots. For humans, the process is far from simple and fraught with mishaps and biases. Enter the LLM, a powerful tool that recognizes semantic patterns and can apply them to knowledge across disparate datasets.</p>
<p class="reader-text-block__paragraph">Beyond being a mere sentence generator, an LLM possesses a prodigious memory of all the data it has encountered. With this wealth of information, an LLM can expertly connect disparate pieces of data and generate insights that might have been impossible for humans to discern.</p>
<p class="reader-text-block__paragraph">This ability to identify patterns in data is especially useful for businesses seeking to gain a competitive edge. An LLM could identify connections between seemingly unrelated data sets and illuminate the potential impact of a competitor&#8217;s latest product feature on a company&#8217;s sales play.</p>
<p class="reader-text-block__paragraph">Consider the following prompt provided to ChatGPT about going up against a fictitious company called Volanti:</p>
<div class="reader-image-block reader-image-block--full-width">
<figure class="reader-image-block__figure">
<div class="ivm-image-view-model   ">
<div class="ivm-view-attr__img-wrapper ivm-view-attr__img-wrapper--use-img-tag display-flex
        
        "><img decoding="async" src="https://media.licdn.com/dms/image/D5612AQF0gGDdBjgoFg/article-inline_image-shrink_1500_2232/0/1679693472878?e=1684972800&amp;v=beta&amp;t=LTe5057A_WzmzdSTCUrrgh8KXnvpcsy-86QMb1Hryho" loading="lazy" alt="No alt text provided for this image" id="ember3474" class="ivm-view-attr__img--centered reader-image-block__img lazy-image ember-view" /></div>
</div><figcaption class="display-block mt2 full-width text-body-small-open t-sans text-align-center t-black--light">Competitive Intelligence Provided to ChatGPT</figcaption></figure>
</div>
<p class="reader-text-block__paragraph">Note that the blue highlights are the contextual prompts that I entered manually for the demo. For an integrated solution (&#8220;SalesGPT&#8221;), this data would be retrieved dynamically (from a vector database like<span> </span><a href="https://www.pinecone.io/">Pinecone</a>) in response to the question. The aggregated data would be sourced from various CI data sources.</p>
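<p class="reader-text-block__paragraph">To make that retrieval step concrete, here is a minimal sketch of how dynamic context assembly might work. Everything here is illustrative: the toy bag-of-words embedding stands in for a real embedding model, the snippet texts are invented, and a production &#8220;SalesGPT&#8221; would query a vector database such as Pinecone instead.</p>

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words vector; a real system would use a learned
    # embedding model and a vector database instead.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, snippets, k=2):
    # Rank the stored CI snippets by similarity to the question; keep top k.
    q = embed(question)
    return sorted(snippets, key=lambda s: cosine(q, embed(s)), reverse=True)[:k]

def build_prompt(question, snippets):
    # Assemble the retrieved context plus the question into one LLM prompt.
    context = "\n".join("- " + s for s in retrieve(question, snippets))
    return "Competitive intelligence:\n" + context + "\n\nQuestion: " + question

# Invented CI snippets for the fictitious Volanti scenario.
ci_snippets = [
    "Volanti launched a Zero Trust offering in Q1.",
    "CRM note: client CTO asked about integration costs last month.",
    "Our Zero Trust capability is on the roadmap for next year.",
]
print(build_prompt(
    "How might we win the deal against Volanti's Zero Trust offering?",
    ci_snippets,
))
```

<p class="reader-text-block__paragraph">The point of the design is that only the snippets relevant to the question are injected into the prompt, so the context stays within the model&#8217;s input limit no matter how large the CI corpus grows.</p>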
<p class="reader-text-block__paragraph">Now let&#8217;s pose the question:<span> </span><strong><em>&#8220;How might we win the deal selling to a client interested in Volanti because of their Zero Trust offering?&#8221;</em></strong></p>
<div class="reader-image-block reader-image-block--full-width">
<figure class="reader-image-block__figure">
<div class="ivm-image-view-model   ">
<div class="ivm-view-attr__img-wrapper ivm-view-attr__img-wrapper--use-img-tag display-flex
        
        "><img decoding="async" src="https://media.licdn.com/dms/image/D5612AQHYcj5bZadqHA/article-inline_image-shrink_1500_2232/0/1679693717054?e=1684972800&amp;v=beta&amp;t=zTYfRznEuG4hCIx2YCV1EamWQYN6p_LYB6gqwX_ELHg" loading="lazy" alt="No alt text provided for this image" id="ember3475" class="ivm-view-attr__img--centered reader-image-block__img lazy-image ember-view" /></div>
</div><figcaption class="display-block mt2 full-width text-body-small-open t-sans text-align-center t-black--light">Response from ChatGPT about sales play informed by CI data</figcaption></figure>
</div>
<p class="reader-text-block__paragraph">Note how the response is a synthesis: &#8220;connecting the dots&#8221;. Consider that the salesperson might not have noticed the CRM note left by a colleague, but that&#8217;s OK, because &#8220;SalesGPT&#8221; did.</p>
<p class="reader-text-block__paragraph">The response is synthesized from a range of disparate CI sources, turning that data into something potentially cogent and salient, given the circumstances–i.e. how to deal with the competitor&#8217;s feature. Notice the statement:</p>
<blockquote class="reader-text-block__quote"><p>&#8220;Highlight our roadmap: While we may not currently offer Zero Trust, we have a roadmap to introduce it in the future. It is important to communicate this to the client and emphasize that our implementation will be built from the ground up, like our flagship product, ensuring optimal integration and cost efficiency.</p></blockquote>
<p class="reader-text-block__paragraph">This is a classic sales play to recognize an objection and then turn it around to become an advantage–i.e. yes, we&#8217;ll be later, but better and lower cost &#8212; here&#8217;s why!</p>
<p class="reader-text-block__paragraph">Maybe this contrived example is too crude and obvious, but you get the idea. And don&#8217;t overestimate the ability of a human salesperson to do all this synthesis unaided. There are plenty of ways to go wrong: information blindness, distraction, cognitive biases, information overload, etc.</p>
<h2 class="reader-text-block__heading1">Zero Sum Game or Secret Sauce?</h2>
<p class="reader-text-block__paragraph">From my recent experiments with a number of business use cases, the exciting feature of LLMs is how much latent power there is waiting to be utilized via prompt-engineering, data augmentation, fine-tuning and various architectural tricks. LLMs offer a new toolkit for conducting information retrieval experiments and then quickly putting them into production.</p>
<p class="reader-text-block__paragraph">But what if everyone is using the same LLM and similar competitive data? Isn&#8217;t this a zero-sum game? And, what if the client is using the same technology, second-guessing the sales pitch? (This seems likely.)</p>
<p class="reader-text-block__paragraph">The winner will be whoever uses LLMs creatively enough. The right approach could lead to unique in-house advantages, making LLM-hacking &#8220;recipes&#8221; the secret sauce for sales.</p>
<p class="reader-text-block__paragraph">Moreover, with recent announcements, like<span> </span><a href="https://www.databricks.com/blog/2023/03/24/hello-dolly-democratizing-magic-chatgpt-open-models.html">Databricks&#8217; work with Dolly</a>, or<span> </span><a href="https://crfm.stanford.edu/2023/03/13/alpaca.html">Stanford&#8217;s work with Alpaca</a>, there is every reason to believe that enterprises could build their own LLMs without requiring the gargantuan budgets used to build GPT-4.</p>
<p class="reader-text-block__paragraph">There are so many possibilities, such as the use of<span> </span><a href="https://arxiv.org/pdf/2210.06280.pdf">LLMs to create tabular datasets</a><span> </span>from unstructured, semi-structured and structured data (e.g. the CRM, sales reports and various metrics tables) and feed those into powerful prediction frameworks like<span> </span><a href="https://xgboost.readthedocs.io/en/stable/index.html">XGBoost</a><span> </span>or<span> </span><a href="https://www.microsoft.com/en-us/research/project/econml/">EconML</a><span> </span>to generate predictive and causal outlooks. Better still, these outlooks are blended back into the LLM generative toolchain to enable sales teams to ask speculative or causal questions:</p>
<blockquote class="reader-text-block__quote"><p>Which of the three contacts in the client is most likely to have objections and what might they be?</p></blockquote>
<blockquote class="reader-text-block__quote"><p>Why did the customer ask about our Composable API roadmap?</p></blockquote>
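<p class="reader-text-block__paragraph">As a sketch of the first step in that pipeline, the snippet below fakes the LLM extraction stage with a rule-based function (a stand-in only; a real pipeline would prompt the model to emit one structured row per note) and serializes the result as CSV, the tabular form a framework like XGBoost or EconML would then consume. The notes and field names are invented for illustration.</p>

```python
import csv
import io

def extract_row(note):
    # Hypothetical stand-in for an LLM extraction prompt: map a free-text
    # CRM note to structured (contact, topic, sentiment) fields.
    contact = note.split(":")[0].strip()
    topic = "Composable API" if "Composable API" in note else "other"
    sentiment = "objection" if "concern" in note.lower() else "neutral"
    return {"contact": contact, "topic": topic, "sentiment": sentiment}

# Invented CRM notes for illustration.
notes = [
    "Dana: raised a concern about our Composable API roadmap",
    "Lee: asked for a pricing summary",
]
rows = [extract_row(n) for n in notes]

# Serialize to CSV, the tabular input a prediction framework would train on.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["contact", "topic", "sentiment"])
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

<p class="reader-text-block__paragraph">The interesting design choice is the round trip: predictions trained on such tables can be fed back into the LLM&#8217;s prompt context, which is what lets the sales team pose the speculative questions above in natural language.</p>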
<p class="reader-text-block__paragraph">Well, this is just the<span> </span><a href="https://www.linkedin.com/pulse/chatgpt-tip-iceberg-paul-golding/">tip of the iceberg for enterprise applications of LLMs</a>.</p>
<p class="reader-text-block__paragraph">If you want to know more about how to do your own LLM-hacking for your use case, feel free to<span> </span><a href="https://paulgolding.com/">get in touch</a><span> </span>(or connect on LinkedIn).</p></div>
			</div>
			</div>
				
				
				
				
			</div>
				
				
			</div>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>The AI Paradigm: Scaling</title>
		<link>http://paulgolding.com/2023/03/14/the-ai-paradigm-scaling/</link>
		
		<dc:creator><![CDATA[paulgolding]]></dc:creator>
		<pubDate>Tue, 14 Mar 2023 19:47:43 +0000</pubDate>
				<category><![CDATA[AI]]></category>
		<category><![CDATA[cognitive]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[Innovation]]></category>
		<guid isPermaLink="false">https://paulgolding.com/?p=1088</guid>

					<description><![CDATA[]]></description>
										<content:encoded><![CDATA[<div class="et_pb_section et_pb_section_1 et_section_regular" >
				
				
				
				
				
				
				<div class="et_pb_row et_pb_row_2">
				<div class="et_pb_column et_pb_column_4_4 et_pb_column_2  et_pb_css_mix_blend_mode_passthrough et-last-child">
				
				
				
				
				<div class="et_pb_module et_pb_image et_pb_image_1">
				
				
				
				
				<span class="et_pb_image_wrap "><img decoding="async" width="767" height="513" src="https://paulgolding.com/wp-content/uploads/sites/2/2023/03/complexity.png" alt="" title="complexity" srcset="http://paulgolding.com/wp-content/uploads/sites/2/2023/03/complexity.png 767w, http://paulgolding.com/wp-content/uploads/sites/2/2023/03/complexity-480x321.png 480w" sizes="(min-width: 0px) and (max-width: 480px) 480px, (min-width: 481px) 767px, 100vw" class="wp-image-1090" /></span>
			</div>
			</div>
				
				
				
				
			</div><div class="et_pb_row et_pb_row_3">
				<div class="et_pb_column et_pb_column_4_4 et_pb_column_3  et_pb_css_mix_blend_mode_passthrough et-last-child">
				
				
				
				
				<div class="et_pb_module et_pb_text et_pb_text_1  et_pb_text_align_left et_pb_bg_layout_light">
				
				
				
				
				<div class="et_pb_text_inner"><p class="reader-text-block__paragraph">The emergence of Large Language Models with their impressive beyond-human performance (in many<span> </span><a href="https://gluebenchmark.com/">benchmarks</a>) gives us reason to ask:<span> </span><em>what is the strategic value of the underlying principles to the enterprise?</em></p>
<p class="reader-text-block__paragraph">As an antidote to many AI books with faux-paradigms, I explore at least one core paradigm of Large AI, which is the ability to address Complexity through<span> </span><a href="https://medium.com/sfi-30-foundations-frontiers/scaling-the-surprising-mathematics-of-life-and-civilization-49ee18640a8">scaling</a><span> </span>(Large AI models are Complex Systems).</p>
<p class="reader-text-block__paragraph">Because enterprises are Complex systems, we should expect that they too are amenable to computation using Large AI in the same way human language (also a Complex system) has proven amenable.</p>
<p class="reader-text-block__paragraph">I call these new models<span> </span><em>Large Enterprise Models (LEMs)</em>. Their emergence could have a profound effect upon corporate strategy generally, and certainly upon IT strategy.</p>
<h2 class="reader-text-block__heading1">Introduction</h2>
<p class="reader-text-block__paragraph">Many AI solutions in the enterprise exist within a standard digital transformation paradigm: AI as an embedded technology choice addressing a localized use case (e.g. supply chain management or product recommendation).</p>
<p class="reader-text-block__paragraph">Some biz leaders are trying to parse what the apparent inflection-point around Large AI (e.g. ChatGPT) means in a more strategic sense, if anything.</p>
<p class="reader-text-block__paragraph">The motivation of this article is to provoke thought about an &#8220;AI-first&#8221; mindset that asks:<span> </span><em>What happens if we build our enterprise around Large AI?</em></p>
<p class="reader-text-block__paragraph">My post will be short and non-technical, leaving technical justifications for a long-form piece elsewhere, most likely<span> </span><a href="https://paulgolding.com/">on my site</a>.</p>
<h2 class="reader-text-block__heading1">First, Beware of &#8220;Pop&#8221; Paradigms</h2>
<p class="reader-text-block__paragraph">Inevitably, some biz leaders are seeking answers in AI books or magazines.</p>
<p class="reader-text-block__paragraph">However, many follow the same glib pattern of introducing quirky metaphors (&#8220;Tipping Points&#8221;) sprinkled with uncritical marketing case studies. They sound convincing, only to yield little actionable substance.</p>
<p class="reader-text-block__paragraph">Many case studies lack critical details, such as why, exactly, AI was the best choice and what percentage of the improvement was uniquely attributable to the AI. (You should always ask this question.)</p>
<p class="reader-text-block__paragraph">Some books invent concepts – e.g. &#8220;Cognitive Enterprise&#8221;. It sounds convincing, but is marketing blurb that wouldn&#8217;t translate into a CIO, CTO or CMO world. (Read<span> </span><a href="https://bigthink.com/personal-growth/how-to-use-the-feynman-technique-to-identify-pseudoscience/">Richard Feynman&#8217;s advice</a><span> </span>on meaning versus marketing jargon.)</p>
<p class="reader-text-block__paragraph">Analyst outputs are often no better, with uncritical claims like &#8220;AI high flyers&#8221; attributing 30% of revenues to AI.</p>
<p class="reader-text-block__paragraph">Really? How, exactly?</p>
<p class="reader-text-block__paragraph">Lacking counterfactuals and critical contextual details, we don&#8217;t know.</p>
<p class="reader-text-block__paragraph">Many transformation improvements are not driven by a single technology, but rather come about via refactoring and modernization. Solutions might contain AI, and hence successes may be attributed to the AI rather than to the refactoring.</p>
<p class="reader-text-block__paragraph">Given the poor success rate of many transformations, the overlap of this with the oft-reported &#8220;80% of AI projects fail&#8221; might not be a coincidence. As<span> </span><a href="https://www.linkedin.com/in/ACoAAB6FDOABije2nlT1yX0kWln606bCPQU3Iwk">Tony Saldanha</a><span> </span>argues well in &#8220;Why Digital Transformations Fail&#8221;, a common failure mode is focus on technology over value. Was AI-for-AI&#8217;s sake the culprit?</p>
<p class="reader-text-block__paragraph">More often, the reason for failure is simple: the organization wasn&#8217;t ready to switch to an automated process. This happens when AI programs are developed in isolation from change management or suitable operating models, like<span> </span><a href="https://www.cio.com/article/193401/crossing-the-project-to-product-chasm.html">product-centric IT</a>.</p>
<h2 class="reader-text-block__heading1">AI-assisted Strategy?</h2>
<p class="reader-text-block__paragraph">Given the connection between strategy and innovation, one framing for exploring the application of AI is via the different types of innovation, per Geoffrey Moore. This is the common method, albeit perhaps via value streams not necessarily aligned with these innovation types:</p>
<div class="reader-image-block reader-image-block--full-width">
<figure class="reader-image-block__figure">
<div class="ivm-image-view-model   ">
<div class="ivm-view-attr__img-wrapper ivm-view-attr__img-wrapper--use-img-tag display-flex
        
        "><img decoding="async" src="https://media.licdn.com/dms/image/D5612AQG92teaSJvaKA/article-inline_image-shrink_1500_2232/0/1678734583254?e=1684972800&amp;v=beta&amp;t=CEIdYcC-HczUj1sppjp83-kNBz7HeMjKWsZfqgUVxFI" loading="lazy" alt="No alt text provided for this image" id="ember3254" class="ivm-view-attr__img--centered reader-image-block__img lazy-image ember-view" /></div>
</div><figcaption class="display-block mt2 full-width text-body-small-open t-sans text-align-center t-black--light">Source: click on image</figcaption></figure>
</div>
<p class="reader-text-block__paragraph">Each type will have related strategies, such as Marketing Strategy, Information Strategy, and so on; hence we can ask how AI might impact these strategies, however they are formulated under the current operating model:</p>
<blockquote class="reader-text-block__quote"><p>How might AI assist my &lt;innovation type&gt; strategy?</p></blockquote>
<p class="reader-text-block__paragraph">In reality, no such question is formally posed. In many of these cases, the choice of a solution will be based upon performance against an existing use case.</p>
<p class="reader-text-block__paragraph">You pick the &#8220;AI solution&#8221; if it offers significantly better performance than the alternatives, or if it unlocks an otherwise stubborn use case (e.g. something requiring NLP). Increasingly, the AI is embedded into a vendor solution, but sometimes a data science team might devise custom AI solutions.</p>
<p class="reader-text-block__paragraph">These selections become strategic when part of a long-term rollout to achieve whatever the overriding strategy is.</p>
<p class="reader-text-block__paragraph">Within this rubric, AI can become a core strategic enabler if it&#8217;s a common undergirding theme. Typically, this translates into data strategy e.g. to allow better access to joined-up datasets, data augmentation, secure data federation, and so on.</p>
<p class="reader-text-block__paragraph">More general strategic imperatives might fall out of such considerations, such as the need to scale data, which cannot be done overnight (can take years) and so definitely requires strategy and might even affect partnerships or ecosystem strategies etc.</p>
<h2 class="reader-text-block__heading1">AI Paradigm?</h2>
<p class="reader-text-block__paragraph">More fundamentally, we need to ask whether or not AI presents a paradigm shift, as the invention of the web browser did.</p>
<p class="reader-text-block__paragraph">If so, what is the paradigm?</p>
<p class="reader-text-block__paragraph"><em>There is one problem that AI uniquely solves</em>, with strong evidence from breakthroughs in Large Language Models (LLMs), like ChatGPT.</p>
<p class="reader-text-block__paragraph"><strong>The unique paradigm is solving<span> </span></strong><a href="https://medium.com/@junp01/an-introduction-to-complexity-theory-3c20695725f8"><strong>Complexity</strong></a><span> </span>(with a &#8220;C&#8221;)<strong>.</strong></p>
<p class="reader-text-block__paragraph">AI magic has mostly appeared thanks to one thing: scaling. The &#8220;magic models&#8221; are huge: 175B parameters for GPT-3, and climbing:</p>
<div class="reader-image-block reader-image-block--full-width">
<figure class="reader-image-block__figure">
<div class="ivm-image-view-model   ">
<div class="ivm-view-attr__img-wrapper ivm-view-attr__img-wrapper--use-img-tag display-flex
        
        "><img decoding="async" src="https://media.licdn.com/dms/image/D5612AQH7vSsY6o0Kgw/article-inline_image-shrink_1000_1488/0/1678756961913?e=1684972800&amp;v=beta&amp;t=TU0AHS9kYm5BYTDXrNNZBd8oIhn2OCyOjzIIjcytbvk" loading="lazy" alt="No alt text provided for this image" id="ember3255" class="ivm-view-attr__img--centered reader-image-block__img lazy-image ember-view" /></div>
</div><figcaption class="display-block mt2 full-width text-body-small-open t-sans text-align-center t-black--light">Scaling of parameters in large language models: click for source (Huggingface)</figcaption></figure>
</div>
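<p class="reader-text-block__paragraph">To give a sense of &#8220;huge&#8221;, a back-of-envelope calculation (assuming 2 bytes per parameter, i.e. fp16 weights; actual serving formats vary) shows what merely storing GPT-3&#8217;s 175B parameters implies:</p>

```python
params = 175e9          # GPT-3 parameter count
bytes_per_param = 2     # assumption: fp16 weights
gib = params * bytes_per_param / 1024**3
print(f"~{gib:.0f} GiB just to hold the weights")
```

<p class="reader-text-block__paragraph">That is on the order of hundreds of GiB before any activations or optimizer state, i.e. far beyond a single accelerator&#8217;s memory, which is why training and serving at this scale is itself an exercise in distributed systems.</p>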
<h3 class="reader-text-block__heading2">What does scaling do?</h3>
<p class="reader-text-block__paragraph">It allows a model to encompass complexity: an impenetrable web of interactions between many smaller components (e.g. words).<span> </span><strong>AI makes complexity amenable to computation!</strong></p>
<p class="reader-text-block__paragraph">Note that<span> </span><a href="https://medium.com/sfi-30-foundations-frontiers/scaling-the-surprising-mathematics-of-life-and-civilization-49ee18640a8">scaling is a concept already embedded within complexity theory</a>.</p>
<p class="reader-text-block__paragraph">There is a maxim in complexity theory that complexity can only be &#8220;met&#8221; with complexity.<span> </span><a href="https://www.santafe.edu/research/results/working-papers/language-is-a-complex-adaptive-system">Human language is a complex dynamical system</a>, which is why Large AI works.</p>
<p class="reader-text-block__paragraph">LLMs are perhaps the first real evidence that a<span> </span><a href="https://onlinelibrary.wiley.com/doi/10.1111/cogs.13256">complex system (175B parameters) can indeed encompass a complex mechanism</a><span> </span>(language).</p>
<p class="reader-text-block__paragraph">The magic levels of performance only became apparent<span> </span><em>once the models were scaled significantly</em><span> </span>(with no other changes). Smaller versions were relatively unimpressive.</p>
<p class="reader-text-block__paragraph">Moreover, unexpected capabilities of LLMs continue to<span> </span><em>emerge</em><span> </span>merely by tinkering with the inputs (&#8220;prompt engineering&#8221; – apparently<span> </span><a href="https://www.datanami.com/2023/02/14/prompt-engineer-the-next-hot-job-in-ai/">the next hot job</a>).</p>
<p class="reader-text-block__paragraph">This has led some thinkers to use the term<span> </span><a href="https://arxiv.org/abs/2206.07682">emergence in relation to LLMs</a>. Well, we should not be surprised because<span> </span><a href="https://necsi.edu/emergence">emergence is a key characteristic of a complex system.</a></p>
<h2 class="reader-text-block__heading1">Large Enterprise Models</h2>
<p class="reader-text-block__paragraph">What then, does scaling mean for the enterprise?</p>
<p class="reader-text-block__paragraph">Two strategic opportunities arise:</p>
<ol>
<li>Systems like LLMs can play a big role in enterprise management, given the undeniable claim that<span> </span><a href="https://clear-impact.com/wp-content/uploads/2018/04/Intro-to-Cynefin-Model.pdf">enterprises are also Complex systems</a><span> </span>(which is what<span> </span><a href="https://www.santafe.edu/research/themes/invention-innovation">drives innovation</a>) and much of that complexity is presented in language-centric processes. LLMs make enterprise knowledge computable.</li>
<li>Scaling should also play a role directly in enterprise management – i.e. via novel models applied to the entire information state-space of the enterprise:<span> </span><em>Large Enterprise Models, or LEMs</em>.</li>
</ol>
<p class="reader-text-block__paragraph">Both of these present significant strategic implications that few enterprises are prepared for, such as:</p>
<ol>
<li>Making<span> </span><em>all data</em><span> </span>in the enterprise available to AI models. We are still in the early days of concepts like<span> </span><a href="https://martinfowler.com/articles/data-mesh-principles.html">Data Mesh</a> and <a href="https://www.gartner.com/smarterwithgartner/data-fabric-architecture-is-key-to-modernizing-data-management-and-integration">Data Fabric</a>, but these are a step in the right direction and will involve AI (e.g. via LLMs) for sense-making<span> </span><em>prior to use in larger models</em>. How do we architect and manage this?</li>
<li>Do we have a single multi-modal Large Enterprise Model or a highly distributed ensemble of many weaker LEMs (plus other models)? What does that look like?</li>
<li>How do we operate with increasingly opaque models?</li>
<li>What is the impact upon<span> </span><a href="https://neilfaganblog.wordpress.com/2018/04/24/systems-thinking-vs-complexity-theory-or-why-building-a-strategy-could-be-the-wrong-thing-to-do/">organizational models</a>?</li>
<li>What kind of skill sets are required to work in these hyper organizations?</li>
<li>Do we need new ways of thinking about partnerships–e.g. at the federated &#8220;model&#8221; level?</li>
<li>Do we need a pivot towards<span> </span><a href="https://www.researchgate.net/publication/292653679_Complexity_and_systems_thinking">Systems Thinking</a>? Where are the frontiers of &#8220;The System&#8221; in a highly connected marketplace and ecosystem?</li>
</ol>
<p class="reader-text-block__paragraph">All of the above is largely uncharted territory and nobody quite yet knows what kind of new strategy emerges from the paradigm. But the question &#8220;What is AI strategy?&#8221; must recognize the above and figure out how to deal with the new paradigm as applied to all facets of business: customers, processes, markets, people, regulations, sustainability etc.</p></div>
			</div>
			</div>
				
				
				
				
			</div>
				
				
			</div>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Mindless Data</title>
		<link>http://paulgolding.com/2023/03/08/mindless-data/</link>
		
		<dc:creator><![CDATA[paulgolding]]></dc:creator>
		<pubDate>Wed, 08 Mar 2023 19:44:55 +0000</pubDate>
				<category><![CDATA[AI]]></category>
		<category><![CDATA[cognitive]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[Innovation]]></category>
		<guid isPermaLink="false">https://paulgolding.com/?p=1083</guid>

					<description><![CDATA[]]></description>
										<content:encoded><![CDATA[<div class="et_pb_section et_pb_section_2 et_section_regular" >
				
				
				
				
				
				
				<div class="et_pb_row et_pb_row_4">
				<div class="et_pb_column et_pb_column_4_4 et_pb_column_4  et_pb_css_mix_blend_mode_passthrough et-last-child">
				
				
				
				
				<div class="et_pb_module et_pb_image et_pb_image_2">
				
				
				
				
				<span class="et_pb_image_wrap "><img decoding="async" width="747" height="501" src="https://paulgolding.com/wp-content/uploads/sites/2/2023/03/mindless_data.png" alt="" title="mindless_data" srcset="http://paulgolding.com/wp-content/uploads/sites/2/2023/03/mindless_data.png 747w, http://paulgolding.com/wp-content/uploads/sites/2/2023/03/mindless_data-480x322.png 480w" sizes="(min-width: 0px) and (max-width: 480px) 480px, (min-width: 481px) 747px, 100vw" class="wp-image-1085" /></span>
			</div>
			</div>
				
				
				
				
			</div><div class="et_pb_row et_pb_row_5">
				<div class="et_pb_column et_pb_column_4_4 et_pb_column_5  et_pb_css_mix_blend_mode_passthrough et-last-child">
				
				
				
				
				<div class="et_pb_module et_pb_text et_pb_text_2  et_pb_text_align_left et_pb_bg_layout_light">
				
				
				
				
<div class="et_pb_text_inner"><p class="reader-text-block__paragraph">In my last post, I wrote about being<span> </span><a href="https://www.linkedin.com/posts/paulgolding_i-just-read-the-blurb-to-an-expensive-book-activity-7038690501776068608-_7Yf?utm_source=share&amp;utm_medium=member_desktop">&#8220;data driven&#8221; operationally</a>, and in the post before that I wrote about the<span> </span><a href="https://www.linkedin.com/posts/paulgolding_modeling-mindsets-activity-7037519186067165184-9iNe?utm_source=share&amp;utm_medium=member_desktop">mindsets that surround using machine learning models</a>.</p>
<p class="reader-text-block__paragraph">Let me try to marry the two ideas.</p>
<p class="reader-text-block__paragraph">There is a widely shared mindset that models are somehow free of bias. And I don&#8217;t mean statistical bias. I mean prejudicial thinking. We may not recognize this at first because the name &#8220;data science&#8221; seems to suggest that modeling is a scientific endeavor.</p>
<p class="reader-text-block__paragraph">Let&#8217;s say it is scientific, then what kind of science?</p>
<p class="reader-text-block__paragraph">These methods, as used in enterprises, are often closer to social science.</p>
<p class="reader-text-block__paragraph">I say that in the sense that we are often trying to account for behavioral mechanisms (in the enterprise) using statistics.</p>
<p class="reader-text-block__paragraph">The hypotheses formed about the enterprise are theories about human activities, not, say, balls rolling down a slope (laws of physics).</p>
<p class="reader-text-block__paragraph">As such, business leaders need to be aware that this entire gamut of activities can easily be gamed to make the statistics say whatever we want.</p>
<p class="reader-text-block__paragraph">A damning analysis of how this happens is well documented by Gerd Gigerenzer in his paper:<span> </span><a href="https://lnkd.in/gYjF-Nsm">&#8220;Mindless statistics&#8221;</a><span> </span>&#8212; the inspiration for my article&#8217;s title.</p>
<p class="reader-text-block__paragraph">In it, he gives an account of how a statistics textbook written by a university professor was revised to remove a broader discussion of statistical techniques in order to focus on only one. When asked why, the professor said:</p>
<blockquote class="reader-text-block__quote"><p><strong>Most researchers [&#8230;] are not really interested in statistical thinking, but only in how to get their papers published.</strong></p></blockquote>
<p class="reader-text-block__paragraph">We could easily substitute this with:<span> </span><em>&#8220;Most managers are not really interested in statistical thinking, but only in how to get their ideas accepted.&#8221;</em></p>
<p class="reader-text-block__paragraph"><strong>This reality is what data-driven operating models need to address.</strong></p>
<p class="reader-text-block__paragraph">Yet, the discussion is often about &#8220;Data Governance&#8221;, or the like, as if assuring quality and control of data will translate into statistically-valid decisions.</p>
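<p class="reader-text-block__paragraph">To make the gaming concrete, here is a toy simulation (my illustration, not from Gigerenzer's paper): run an A/B test in which the two variants are <em>identical</em>, slice the results twenty ways, and report only the best-looking slice. No number is falsified, yet the headline can still be impressive.</p>

```python
import random

random.seed(0)

def null_ab_test(n=500, p=0.10):
    """An A/B test in which both variants are identical:
    any observed 'lift' is pure sampling noise."""
    a = sum(random.random() < p for _ in range(n))  # conversions, variant A
    b = sum(random.random() < p for _ in range(n))  # conversions, variant B
    return (b - a) / a if a else 0.0

# Look at the same null experiment twenty ways (metrics, segments, cohorts)
# and keep only the most flattering view -- statistics "saying what we want".
lifts = [null_ab_test() for _ in range(20)]
print(f"best observed 'lift' across 20 looks: {max(lifts):+.1%}")
```

<p class="reader-text-block__paragraph">This is the multiple-comparisons trap in miniature: with enough looks at pure noise, a &#8220;win&#8221; is almost guaranteed.</p>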
<p class="reader-text-block__paragraph">Perhaps in the oft-quoted<span> </span><a href="https://www.teradata.com/Glossary/What-are-the-5-V-s-of-Big-Data#:~:text=Big%20data%20is%20a%20collection,variety%2C%20velocity%2C%20and%20veracity.">5 Vs of Big Data</a>, there needs to be a sixth V: Veridicality, as in<span> </span><strong>coinciding with reality</strong>.</p>
<p class="reader-text-block__paragraph">There are no easy answers here.</p>
<p class="reader-text-block__paragraph">We can wave our hands about solutions, yet the<span> </span><a href="https://deepblue.lib.umich.edu/bitstream/handle/2027.42/92168/TheUseOfStatisticalHeuristics.pdf%3Bjsessionid%3D99D76EB98D7D1C97F9B387FE4A1F8868?sequence%3D1">statistical-heuristical dichotomy</a><span> </span>pervades how we operate our businesses.</p>
<p class="reader-text-block__paragraph">I can say two things as food for thought:</p>
<h3 class="reader-text-block__heading2">Nurturing a Numerical Culture &amp; Systems Thinking</h3>
<p class="reader-text-block__paragraph">First, as mentioned in the original mindsets-post, all processes are socio-technical in nature. Social pertains to culture. Before we talk about AI-first, we need to talk about building a data-first culture, possibly even with the humble spreadsheet or, heaven forbid, a whiteboard or napkin.</p>
<p class="reader-text-block__paragraph">We should all be hands-on with numbers and strive to explain our arguments using numerical reasoning versus our tendency to resort to qualitative narratives. Business leaders should, well, take the lead, and be sure to encourage numerically-reasoned arguments, including causality assumptions.</p>
<p class="reader-text-block__paragraph">This is hard to do, but that&#8217;s good because it requires<span> </span><a href="https://thedecisionlab.com/reference-guide/philosophy/system-1-and-system-2-thinking">System 2 thinking</a><span> </span>that is supposedly less vulnerable to bias. Its deliberate application to various activities has been fairly well<span> </span><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5344059/#:~:text=System%202%20is%20the%20more,through%20past%20learning%20and%20experience.">studied</a><span> </span>(beyond the problematic Thinking Fast &amp; Slow book).</p>
<p class="reader-text-block__paragraph">Much has been said about the value of<span> </span><a href="https://www.zdnet.com/article/7-ways-design-thinking-is-percolating-into-the-enterprise-application-space/">design thinking</a>, with many companies, like<span> </span><a href="https://www.ibm.com/design/thinking/static/Enterprise-Design-Thinking-Report-8ab1e9e1622899654844a5fe1d760ed5.pdf">IBM, adopting it systematically</a>. But little is said about numerical thinking or its grander counterpart systems-thinking, which is often still treated as a kind of esoteric topic favored by management theorists and Complexity fans.</p>
<p class="reader-text-block__paragraph">Just as we should all become design thinkers, we equally need to become systems thinkers. (See this<span> </span><a href="https://www.eckerson.com/articles/a-systems-thinking-view-of-analytics-part-i-introduction-to-systems-thinking">great post</a><span> </span>by<span> </span><a href="https://www.linkedin.com/in/dlwells/">Dave Wells</a><span> </span>on<span> </span><a href="https://www.eckerson.com/articles/looking-at-the-future-through-analytics-predictive-vs-prognostic">prognostic analytics</a><span> </span>and systems thinking.)</p>
<h3 class="reader-text-block__heading2">Discipline: Start as You Mean to Finish</h3>
<p class="reader-text-block__paragraph">Still related to culture, leaders have to set the tone about what kinds and levels of discipline they want to operate with. We all know that our<span> </span><a href="https://fs.blog/bias-incentives-reinforcement/">behaviors align to incentives</a>. If good data discipline isn&#8217;t incentivized, then who&#8217;s going to bother?</p>
<p class="reader-text-block__paragraph">It is well documented that the practices a start-up adopts at the outset are the ones it ends up with. Indeed, the two are related because good practices are what make a business repeatable and scalable. We have known this since Six Sigma, which saved many US companies from the onslaught of process-obsessed Japanese industries.</p>
<p class="reader-text-block__paragraph">Yet, just as we learned that you cannot inspect robustness into a product or process (hence Design for Manufacturability), we have seemingly not learned that you cannot inspect robustness into your data processes or culture.</p>
<p class="reader-text-block__paragraph">This is still the dominant mindset of so many enterprises and is a reflection of the same lack of strategic mindset (i.e. dominance of short-term thinking &#8212; action first, data second) that dogged US industry prior to Six Sigma.</p>
<p class="reader-text-block__paragraph">Cultivating a mindset of doing things &#8220;the right way&#8221; is essential. In some areas, like software engineering, the principles of &#8220;the right way&#8221; are now so well established that they are infused into the wider culture of software practitioners almost as a code of conduct.</p>
<p class="reader-text-block__paragraph">These methods are beginning to percolate into data via the DataOps gamut of ideas, but they are still slow to penetrate wider organizational behaviors.</p>
<p class="reader-text-block__paragraph">I would argue that Design for Measurement ought to become part of Design Thinking. Indeed, like all new things, it is actually an<span> </span><a href="https://www.creative-wisdom.com/teaching/WBI/dma.shtml">old idea</a>, but still broadly missing from our codes of conduct.</p></div>
			</div>
			</div>
				
				
				
				
			</div>
				
				
			</div>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>ChatGPT is the Tip of the Iceberg</title>
		<link>http://paulgolding.com/2023/02/27/chatgpt-is-the-tip-of-the-iceberg/</link>
		
		<dc:creator><![CDATA[paulgolding]]></dc:creator>
		<pubDate>Mon, 27 Feb 2023 19:42:38 +0000</pubDate>
				<category><![CDATA[AI]]></category>
		<category><![CDATA[cognitive]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[Innovation]]></category>
		<guid isPermaLink="false">https://paulgolding.com/?p=1079</guid>

					<description><![CDATA[]]></description>
										<content:encoded><![CDATA[<div class="et_pb_section et_pb_section_3 et_section_regular" >
				
				
				
				
				
				
				<div class="et_pb_row et_pb_row_6">
				<div class="et_pb_column et_pb_column_4_4 et_pb_column_6  et_pb_css_mix_blend_mode_passthrough et-last-child">
				
				
				
				
				<div class="et_pb_module et_pb_image et_pb_image_3">
				
				
				
				
				<span class="et_pb_image_wrap "><img decoding="async" width="900" height="600" src="https://paulgolding.com/wp-content/uploads/sites/2/2023/03/tip_of_iceberg.png" alt="" title="tip_of_iceberg" srcset="http://paulgolding.com/wp-content/uploads/sites/2/2023/03/tip_of_iceberg.png 900w, http://paulgolding.com/wp-content/uploads/sites/2/2023/03/tip_of_iceberg-480x320.png 480w" sizes="(min-width: 0px) and (max-width: 480px) 480px, (min-width: 481px) 900px, 100vw" class="wp-image-1081" /></span>
			</div>
			</div>
				
				
				
				
			</div><div class="et_pb_row et_pb_row_7">
				<div class="et_pb_column et_pb_column_4_4 et_pb_column_7  et_pb_css_mix_blend_mode_passthrough et-last-child">
				
				
				
				
				<div class="et_pb_module et_pb_text et_pb_text_3  et_pb_text_align_left et_pb_bg_layout_light">
				
				
				
				
				<div class="et_pb_text_inner"><p class="reader-text-block__paragraph">ChatGPT is just the tip of the iceberg that enterprises find themselves crashing into even though the underlying mass – Large Language Models (LLMs) – was introduced back in 2017. In AI years,<span> </span><strong>that’s a lifetime ago</strong>.</p>
<p class="reader-text-block__paragraph">LLMs ought to have already featured in digital transformation programs, but many CIOs and technology leaders are still figuring this out.</p>
<p class="reader-text-block__paragraph">Whilst there has been a rush of explanations of the ChatGPT phenomenon and hundreds of proposed applications, from copy-writing to ideation to citizen coding to<span> </span><a href="https://hai.stanford.edu/news/how-large-language-models-will-transform-science-society-and-ai">science</a><span> </span>to<span> </span><a href="https://www.zebrium.com/blog/using-gpt-3-with-zebrium-for-plain-language-incident-root-cause-from-logs">log-processing</a>, very few organizations have considered the strategic use of LLMs.</p>
<p class="reader-text-block__paragraph">Stanford declared LLMs as Foundation Models via a<span> </span><a href="https://arxiv.org/abs/2108.07258">rather long, yet instructive, paper</a>. Due to their incredible, almost unbelievable, performance, LLMs have proven to be foundational to many applications using Natural Language Processing (NLP).</p>
<p class="reader-text-block__paragraph">This power, mostly thanks to the size of the models, has taken many by surprise. It is indeed curious that the underlying training scheme of guessing missing words from large corpuses of text should turn out to be so useful in tackling almost every NLP task imaginable.</p>
<p class="reader-text-block__paragraph">What is even more staggering is how this supercharged word-power is easily accessible via just a few lines of Python code thanks to the rise of services like<span> </span><a href="https://huggingface.co/">Huggingface</a><span> </span>that have made these models available beyond research labs.</p>
<p class="reader-text-block__paragraph">Such is the power at the fingertips of many a coder that plenty of pre-LLM AI tools have been made obsolete.<span> </span><em>Here’s a quick tip: for every AI solution touted, check to see if LLMs could replace or improve it. The answer is increasingly yes.</em></p>
<p class="reader-text-block__paragraph">Suppose you want to add keyword labels to call transcripts or contract clauses in order to<span> </span><a href="https://huggingface.co/docs/transformers/tasks/sequence_classification">classify them</a>: this takes literally a few lines of code. At the time of writing, Huggingface has<span> </span><a href="https://huggingface.co/models?pipeline_tag=text-classification&amp;sort=downloads">16,757 text classification models in their library.</a><span> </span>Of these, perhaps a hundred are foundational whilst the rest are fine-tuned riffs, like<span> </span><a href="https://huggingface.co/LiYuan/amazon-review-sentiment-analysis">this one, tuned on US Amazon product reviews.</a></p>
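<p class="reader-text-block__paragraph">Those &#8220;few lines of code&#8221; really are few. Here is a minimal sketch using the Hugging Face <code>transformers</code> pipeline API (assumes <code>pip install transformers torch</code>); the checkpoint named below is one real, widely downloaded sentiment model, and any of the thousands of fine-tuned variants in the hub can be swapped in by name:</p>

```python
# Minimal LLM-based text classification via the Hugging Face pipeline API.
# The checkpoint is one real example; swap in any hub model by name.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

texts = [
    "The room was spotless and the staff could not have been friendlier.",
    "Check-in took an hour and nobody apologized.",
]
results = classifier(texts)  # one {"label": ..., "score": ...} per input
for text, result in zip(texts, results):
    print(result["label"], round(result["score"], 3), "-", text)
```

<p class="reader-text-block__paragraph">That is the entire program: no feature engineering, no training loop, just a pretrained model behind a one-call API.</p>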
<h2 class="reader-text-block__heading1">LLMs for Digital Transformation</h2>
<p class="reader-text-block__paragraph">There are many frameworks of digital transformation, but let’s consider their context, namely the range of innovation types per<span> </span><a href="https://en.wikipedia.org/wiki/Geoffrey_Moore">Geoffrey Moore</a>:</p>
<div class="reader-image-block reader-image-block--full-width">
<figure class="reader-image-block__figure">
<div class="ivm-image-view-model   ">
<div class="ivm-view-attr__img-wrapper ivm-view-attr__img-wrapper--use-img-tag display-flex
        
        "><img decoding="async" src="https://media.licdn.com/dms/image/D4D12AQFka1OjZGp65w/article-inline_image-shrink_1500_2232/0/1677541802203?e=1684972800&amp;v=beta&amp;t=H8_vEGmPlp2aLT_Z5qm3YcvrUoxtYPxZrxXWkrX7LnE" loading="lazy" alt="No alt text provided for this image" id="ember2660" class="ivm-view-attr__img--centered reader-image-block__img lazy-image ember-view" /></div>
</div><figcaption class="display-block mt2 full-width text-body-small-open t-sans text-align-center t-black--light">Source: kikku website</figcaption></figure>
</div>
<p class="reader-text-block__paragraph"><a href="https://www.scaledagileframework.com/value-streams/">Agile value streams</a><span> </span>will span the entire gamut of innovation types and<span> </span><strong>there isn’t one that won&#8217;t be impacted by LLMs</strong>, or even the core idea: Attention. </p>
<p class="reader-text-block__paragraph">Let’s take a quick detour into Attention (if you want a longer code-level explanation, see<span> </span><a href="https://github.com/pgolding/transformer">my repo on Github</a>). This simple invention (a differentiable memory) gives rise to two enterprise opportunities:</p>
<ol>
<li>Models that can understand the language used across your enterprise</li>
<li>Scalable pattern-finding in many kinds of enterprise data besides language</li>
</ol>
<p class="reader-text-block__paragraph">Attention is powerful at finding patterns in<span> </span><a href="https://en.wikipedia.org/wiki/Set_(mathematics)">set-based data</a>. Indeed, the invention was motivated by mapping one kind of set to another (<a href="https://huggingface.co/tasks/translation">language translation</a>).</p>
<p class="reader-text-block__paragraph">It’s called Attention because it figures out how members of a set “pay attention” to each other. For example, the word<span> </span><strong>He</strong><span> </span>in the sentence “<strong>He ran</strong><span> </span>across the road” pays close attention to the word<span> </span><strong>ran</strong>, but not so much to the word road.</p>
<div class="reader-image-block reader-image-block--full-width">
<figure class="reader-image-block__figure">
<div class="ivm-image-view-model   ">
<div class="ivm-view-attr__img-wrapper ivm-view-attr__img-wrapper--use-img-tag display-flex
        
        "><img decoding="async" src="https://media.licdn.com/dms/image/D4D12AQFHL7YhkpTTog/article-inline_image-shrink_1500_2232/0/1677541927976?e=1684972800&amp;v=beta&amp;t=USr4T_Ga-YAXjBxoYYrt8uxIzQsZG6ByBGDSQKhpwDI" loading="lazy" alt="No alt text provided for this image" id="ember2661" class="ivm-view-attr__img--centered reader-image-block__img lazy-image ember-view" /></div>
</div><figcaption class="display-block mt2 full-width text-body-small-open t-sans text-align-center t-black--light">Attention Mapping</figcaption></figure>
</div>
<p class="reader-text-block__paragraph">Attention doesn’t just figure out that<span> </span><strong>he</strong><span> </span>attends to<span> </span><strong>ran<span> </span></strong><em>in particular</em>. Nor does it merely figure out that the first and second words are related, as they often will be. It figures out that “He ran” is a<span> </span><strong>common kind of pattern</strong>, one that we humans would recognize as subject-verb correspondence. </p>
<p class="reader-text-block__paragraph">Given enough examples and enough layers of Attention, the Transformer architecture can find many patterns, even over longer spans, like “<strong>He</strong>, the tall man,<span> </span><strong>ran</strong><span> </span>surprisingly swiftly.” Here it will also associate pronoun (He) with subject (man) or, more generally, subjects with predicates.</p>
<p class="reader-text-block__paragraph">Attention doesn’t know what these patterns are, which is why an LLM doesn’t “know” the rules of grammar per se (and, arguably<span> </span><a href="https://arxiv.org/pdf/2008.05580.pdf">can never know</a>). But grammatical structures, formal and informal, get<span> </span><em>approximately</em><span> </span><em>embedded</em><span> </span>into the model via this pattern-finding process, as does knowledge encoded in the source text.</p>
<p class="reader-text-block__paragraph">Even when approximately embedded, these grammatical rules prove useful as the building blocks for so many NLP tasks built using LLMs.</p>
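<p class="reader-text-block__paragraph">The pattern-finding machinery itself is remarkably compact. Below is a toy illustration of the scaled dot-product self-attention at the heart of Transformers; random vectors stand in for learned token embeddings, and the learned Q/K/V projections, multiple heads and stacked layers of a real model are deliberately omitted:</p>

```python
import numpy as np

def self_attention(X):
    """Scaled dot-product self-attention over a set of token vectors.
    Returns the mixed outputs and the attention map (who attends to whom).
    Real Transformers also apply learned Q/K/V projections; omitted here."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                   # pairwise similarity
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax per row
    return weights @ X, weights

# Five random vectors standing in for the tokens "He ran across the road"
rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 8))                    # 5 tokens, 8-dim each
out, attn = self_attention(tokens)

print(attn.shape)                        # (5, 5): row i = how token i
print(np.allclose(attn.sum(axis=1), 1))  # spreads attention (sums to 1)
```

<p class="reader-text-block__paragraph">Each row of the attention map is a probability distribution over the other tokens; training shapes these distributions so that, say, a pronoun ends up attending to its verb.</p>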
<p class="reader-text-block__paragraph">But what if your enterprise is full of arcane texts specific to business, like bills of materials or technical product descriptions or specific contract legalese, etc?</p>
<p class="reader-text-block__paragraph">Using a mechanism called<span> </span><a href="https://www.seldon.io/transfer-learning">Transfer Learning</a>, the power of a foundation LLM can be transferred to a custom one fine-tuned for your enterprise. For example, the Fairmont hotel group could fine-tune a model to become the “Fairmont-LLM” that understands “hotel speak”. Silicon Valley Bank could build a “venture debt speak” model, and so on.</p>
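<p class="reader-text-block__paragraph">To show the transfer-learning <em>pattern</em> itself (this is a numerically tiny toy, not the actual Hugging Face fine-tuning API): keep the pretrained encoder frozen and train only a small task-specific head on in-domain examples. All data and weights below are synthetic stand-ins:</p>

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for a pretrained foundation encoder: its weights were "learned"
# elsewhere and stay frozen during fine-tuning.
W_frozen = rng.normal(size=(16, 4))

def encode(x):
    return np.tanh(x @ W_frozen)       # frozen foundation layers, reused as-is

# Tiny in-domain dataset (think labeled "hotel speak" examples).
X = rng.normal(size=(200, 16))
y = (X @ rng.normal(size=16) > 0).astype(float)

# "Fine-tuning" here = logistic regression on the frozen features only.
head = np.zeros(4)
H = encode(X)
for _ in range(500):                   # plain gradient descent on the head
    p = 1 / (1 + np.exp(-(H @ head)))
    head -= 0.5 * H.T @ (p - y) / len(y)

accuracy = ((H @ head > 0) == (y > 0.5)).mean()
print(f"head-only accuracy on in-domain data: {accuracy:.0%}")
```

<p class="reader-text-block__paragraph">The economics follow from the shapes: only the 4 head weights are trained, while the bulk of the model is reused, which is why fine-tuning a foundation LLM is vastly cheaper than training one.</p>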
<p class="reader-text-block__paragraph">Let’s explore what this looks like in the context of digital transformation.</p>
<h2 class="reader-text-block__heading1">Fairmont LLM: &#8220;Hotel Speak&#8221;</h2>
<p class="reader-text-block__paragraph">Imagine a value-stream in the Fairmont hotel&#8217;s transformation programs that is linked to deeply personalized experiences across the customer journey. Here’s a journey map from GCH Hotel Group as a reference.</p>
<div class="reader-image-block reader-image-block--full-width">
<figure class="reader-image-block__figure">
<div class="ivm-image-view-model   ">
<div class="ivm-view-attr__img-wrapper ivm-view-attr__img-wrapper--use-img-tag display-flex
        
        "><img decoding="async" src="https://media.licdn.com/dms/image/D4D12AQGqoklpC8ScSA/article-inline_image-shrink_1500_2232/0/1677542082966?e=1684972800&amp;v=beta&amp;t=RtzpHeYMHB8PlCdHOJNODbSaOr6o0US6sPpEuQS27kA" loading="lazy" alt="No alt text provided for this image" id="ember2662" class="ivm-view-attr__img--centered reader-image-block__img lazy-image ember-view" /></div>
</div><figcaption class="display-block mt2 full-width text-body-small-open t-sans text-align-center t-black--light">Source: GCH Hotel Group</figcaption></figure>
</div>
<p class="reader-text-block__paragraph">Source:<span> </span><a href="https://www.gchhotelgroup.com/en/newsroom/blog/customer_journey/3258?rel=/en/newsroom/blog">GCH Hotel Group</a></p>
<p class="reader-text-block__paragraph">What do all of these interfaces have in common? They all use language.</p>
<p class="reader-text-block__paragraph">Given the foundational power of LLMs, it is entirely possible that the “Fairmont-LLM” could power many NLP services within myriad touch-point components:</p>
<div class="reader-image-block reader-image-block--full-width">
<figure class="reader-image-block__figure">
<div class="ivm-image-view-model   ">
<div class="ivm-view-attr__img-wrapper ivm-view-attr__img-wrapper--use-img-tag display-flex
        
        "><img decoding="async" src="https://media.licdn.com/dms/image/D4D12AQEKHfCvyPlQug/article-inline_image-shrink_1500_2232/0/1677542187033?e=1684972800&amp;v=beta&amp;t=PRDRMJbD8duDH13z_rUJ48VbLL04rLJd5UM8By4l9OM" loading="lazy" alt="No alt text provided for this image" id="ember2663" class="ivm-view-attr__img--centered reader-image-block__img lazy-image ember-view" /></div>
</div><figcaption class="display-block mt2 full-width text-body-small-open t-sans text-align-center t-black--light">LLM Integrated into Touch-Point Services</figcaption></figure>
</div>
<p class="reader-text-block__paragraph">This has obvious implications for digital transformation architecture because a unified custom-LLM potentially impacts many innovation types and related value streams. Failure to notice this will stymie efforts to produce consistent customer-facing experiences. They will cost more and deliver less.</p>
<p class="reader-text-block__paragraph"><a href="http://bit.ly/2023trendsreport">Google’s 2023 AI and Data report</a><span> </span>confirms a trend set by software experts (like<span> </span><a href="https://martinfowler.com/articles/data-mesh-principles.html">Martin Fowler</a>) who have argued for unified<span> </span><a href="https://martinfowler.com/articles/data-mesh-principles.html">Data Mesh</a><span> </span>and<span> </span><a href="https://www.gartner.com/smarterwithgartner/data-fabric-architecture-is-key-to-modernizing-data-management-and-integration">Data Fabric</a><span> </span>architectures that<span> </span><strong>put an end to data silos</strong>. This thinking applies equally to AI, in particular LLMs. </p>
<p class="reader-text-block__paragraph">For example, fine-tuning will require constant updates and related services, such as data quality to ensure human alignment, i.e. making sure that any generated text is compliant with policies, brand guidelines and ethical standards.</p>
<p class="reader-text-block__paragraph">This will require a<span> </span><strong>composable</strong><span> </span>data architecture and a distributed MLOps operating model, more akin to a mesh. This is a far cry from many of today’s embedded enterprise AI solutions with opaque data models–an anti-pattern for scalable value from LLM models and services.</p>
<h2 class="reader-text-block__heading1">Transformers: LLMs in Disguise</h2>
<p class="reader-text-block__paragraph">There are many ways custom LLMs could/should be deployed in the enterprise as a strategic IT resource versus a point solution. Various architectural and ops patterns are still emerging.</p>
<p class="reader-text-block__paragraph">With different business cases and transformation mindsets to consider, it pays to have awareness of the potentially strategic role of custom LLMs before committing to yet another embedded AI point-solution that lacks flexibility or is heavily tied to a vendor.</p>
<p class="reader-text-block__paragraph">Going a step further, let’s return to our earlier claim that Transformers can pay attention to patterns in all kinds of set-based data. It turns out that many enterprise data sources are sets that contain patterns amenable to discovery via Attention:<span> </span><a href="https://www.sciencedirect.com/science/article/pii/S0957417422004146">sales forecasting</a>,<span> </span><a href="https://dl.acm.org/doi/abs/10.1145/3556702.3556844">actions on a website</a>,<span> </span><a href="https://catalog.ngc.nvidia.com/orgs/nvidia/teams/clara/models/megamolbart">molecules in a drug</a>,<span> </span><a href="https://www.sciencedirect.com/science/article/pii/S2352484722004310">power trends in a wind farm</a><span> </span>etc. (I did research to apply Transformers to no-code website generation because website layouts are inherently set-based.)</p>
<p class="reader-text-block__paragraph">Where it gets really interesting is when different sets get combined into novel multi-modal models capable of solving even larger classes of problems such as how language used in sales might predict revenue.</p>
<p class="reader-text-block__paragraph">Many datasets within the enterprise are ripe for multi-modal modeling, which tends to suggest that Transformer-based modeling might become a horizontal data-science capability, maybe via new low-code tools, not just a feature of an embedded AI product. Certainly, the availability of Huggingface models makes this a realistic proposition.</p></div>
			</div>
			</div>
				
				
				
				
			</div>
				
				
			</div>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>The Killer Use Case for Generative AI is Empowering Enterprise Citizens</title>
		<link>http://paulgolding.com/2023/02/19/the-killer-use-case-for-generative-ai-is-empowering-enterprise-citizens/</link>
		
		<dc:creator><![CDATA[paulgolding]]></dc:creator>
		<pubDate>Sun, 19 Feb 2023 19:36:45 +0000</pubDate>
				<category><![CDATA[AI]]></category>
		<category><![CDATA[cognitive]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[Innovation]]></category>
		<guid isPermaLink="false">https://paulgolding.com/?p=1073</guid>

					<description><![CDATA[]]></description>
										<content:encoded><![CDATA[<p><div class="et_pb_section et_pb_section_4 et_pb_fullwidth_section et_section_regular" >
				
				
				
				
				
				
				<div class="et_pb_module et_pb_fullwidth_image et_pb_fullwidth_image_0">
				
				
				
				
				<img decoding="async" loading="lazy" width="800" height="533" src="https://paulgolding.com/wp-content/uploads/sites/2/2023/01/1000_artwork_1970_business_man_on_computer_asking_robot_to_ge_7ee904a4-fd4c-469f-96b3-00524aa3e00c-copy.png" alt="" title="1000_artwork_1970_business_man_on_computer_asking_robot_to_ge_7ee904a4-fd4c-469f-96b3-00524aa3e00c copy" srcset="http://paulgolding.com/wp-content/uploads/sites/2/2023/01/1000_artwork_1970_business_man_on_computer_asking_robot_to_ge_7ee904a4-fd4c-469f-96b3-00524aa3e00c-copy.png 800w, http://paulgolding.com/wp-content/uploads/sites/2/2023/01/1000_artwork_1970_business_man_on_computer_asking_robot_to_ge_7ee904a4-fd4c-469f-96b3-00524aa3e00c-copy-480x320.png 480w" sizes="(min-width: 0px) and (max-width: 480px) 480px, (min-width: 481px) 800px, 100vw" class="wp-image-923" />
			
			</div>
				
				
			</div><div class="et_pb_section et_pb_section_5 et_section_regular" >
				
				
				
				
				
				
				<div class="et_pb_row et_pb_row_8">
				<div class="et_pb_column et_pb_column_4_4 et_pb_column_8  et_pb_css_mix_blend_mode_passthrough et-last-child">
				
				
				
				
				<div class="et_pb_module et_pb_text et_pb_text_4  et_pb_text_align_left et_pb_bg_layout_light">
				
				
				
				
				<div class="et_pb_text_inner"><div>
<div dir="ltr" class="reader-article-content reader-article-content--content-blocks">
<h2 class="reader-text-block__heading1">Digital Democratization: Tech Osmosis</h2>
<p class="reader-text-block__paragraph">Following on from a recent post about<span> </span><a href="https://hbr.org/2022/05/democratizing-transformation">digital democratization</a>, let&#8217;s explore where Generative AI fits into the picture in order to arrive at a more interesting use case for Large Language Models (LLMs) like GPT-3, ChatGPT and<span> </span><a href="https://github.com/features/copilot">Github Co-pilot</a>.</p>
<p class="reader-text-block__paragraph"><a href="https://www.gartner.com/en/information-technology/glossary/citizen-developer"><strong>Citizen Coders</strong></a><strong><span> </span>and Citizen Technologists (henceforth both called Citizens) are vital for the future of digitally complex enterprises.</strong><span> </span>I have argued that the only way to accelerate digital dividends is to amplify the efforts of workers closer to the customer and/or closer to the in-flight business problem.</p>
<p class="reader-text-block__paragraph">Indeed,<span> </span><a href="https://www.gartner.com/en/doc/fusion-teams-democratized-and-distributed-technology-delivery-for-digital">per Gartner</a>, CEOs increasingly hold the belief that the answer to unlocking greater digital dividends is placing more technology into the hands of business operators. They expect their CIOs to deliver.</p>
<p class="reader-text-block__paragraph">The reality of modern software tooling is that it is often easier for a business-domain worker to gain access to automation via new tools (like embedded data science &#8212; e.g. in Tableau &#8212; or no-code super-apps or reverse-ETL) than it is for a highly technical engineer or data scientist to grasp the in-moment need and prioritization of an in-flight business problem. This is thanks to the ongoing march of automation.</p>
<p class="reader-text-block__paragraph">There is a kind of<span> </span><em>digital osmosis</em><span> </span>whereby power flows more easily from technology to the business domain than business knowledge can flow to the technical domain, if it isn&#8217;t lost in translation.</p>
<p class="reader-text-block__paragraph"><em>This is especially true under conditions of uncertainty and complexity</em><span> </span>that are here to stay in the modern enterprise.</p>
<div class="reader-image-block reader-image-block--full-width">
<figure class="reader-image-block__figure">
<div class="ivm-image-view-model   ">
<div class="ivm-view-attr__img-wrapper ivm-view-attr__img-wrapper--use-img-tag display-flex                  "><img decoding="async" src="https://media.licdn.com/dms/image/D5612AQE1RBt3OvMMAw/article-inline_image-shrink_1500_2232/0/1676834950849?e=1684972800&amp;v=beta&amp;t=_0ktlJ69pefv9WN8FvqmRNKmVe6Kij_O0kI1l4ryUww" loading="lazy" alt="No alt text provided for this image" id="ember2325" class="ivm-view-attr__img--centered reader-image-block__img lazy-image ember-view" /></div>
</div>
</figure>
</div>
<p class="reader-text-block__paragraph">Let&#8217;s put it another way: if a business worker has to spend more effort explaining and justifying a biz problem to a data scientist, data engineer or salesforce operator than they could theoretically spin up their own solution using self-serve techniques,<span> </span><strong>then you&#8217;re already too slow and likely to suffer the dreaded<span> </span></strong><a href="https://www.jstor.org/stable/20159494"><strong>Red Queen effect</strong></a><span> </span>that business strategists like to remind us of.</p>
<h2 class="reader-text-block__heading1">Generative AI and Citizen Productivity</h2>
<p class="reader-text-block__paragraph">A key automation force will be Generative AI in the form of code generation (e.g.<span> </span><a href="https://github.com/features/copilot">Github Co-pilot</a>) and bot-like interfaces like variants of ChatGPT.</p>
<p class="reader-text-block__paragraph">Indeed, I predict that the sweet spot for these technological possibilities will come when Citizens are able to create their own hybrid-chat interfaces &#8212; think thousands of helper bots, or chat-powered UIs (with embedded<span> </span><a href="https://www.ibm.com/topics/low-code#:~:text=Low%2Dcode%20is%20a%20visual,applications%20through%20minimal%20hand%2Dcoding.">low-code widgets</a>), not one giant &#8220;Siri for the enterprise&#8221;.</p>
<p class="reader-text-block__paragraph">Over time, Generative capabilities will widen to include more tools. For example, in a recent assignment for a major digital experience vendor, I proved that it is possible to generate website pages using the same Transformer technologies that underlie LLMs. We know that start-ups like<span> </span><a href="https://www.adept.ai/">Adept</a><span> </span>are working on generalized UI-learning machines in the same manner.</p>
<p class="reader-text-block__paragraph">There will be several outcomes that accelerate digital dividends:</p>
<ol>
<li>Citizens will become 10x more productive at solving in-flight business problems.</li>
<li>$-for-$, Citizen productivity curves will outstrip the productivity curves of in-house technical specialists.</li>
<li>Specialists will migrate to core technical innovation tasks, such as customized AI model production which, in turn, will empower Citizens.</li>
<li>The delivery model of specialists will increasingly be via generalizable Citizen-accessible interfaces.</li>
</ol>
<h2 class="reader-text-block__heading1">Accelerated Context Switching</h2>
<p class="reader-text-block__paragraph">A key part of the puzzle is knowledge management. Over time, Citizens&#8217; success will depend upon their ability to understand various moving parts (like datasets) quickly. Generative techniques will somewhat lessen the burden of having to understand details, leaving that to models that sit in constant fine-tuning loops.</p>
<p class="reader-text-block__paragraph"><strong>Citizens will recognize that their ability to be successful will depend upon context-switching efficiencies &#8212; how easy it is to switch from one task to another</strong>.</p>
<p class="reader-text-block__paragraph">As I argued elsewhere,<span> </span><a href="https://paulgolding.com/2023/01/09/chatgpt-great-for-unblocking/">the great potential for apps like ChatGPT is as &#8220;mental un-blocker&#8221; apps</a><span> </span>that can quickly bootstrap a worker into a knowledge space. So, these technologies will help Citizens with task-switching.</p>
<p class="reader-text-block__paragraph">Task contexts might include things like a set of parameters or instructions to set up a data flow via<span> </span><a href="https://www.getcensus.com/blog/what-is-reverse-etl">reverse-ETL into Salesforce</a>, the meaning of a Snowflake model, or the perennial problem of where to find data.</p>
<p class="reader-text-block__paragraph">Indeed, at a certain point, the emerging data semantic layer (e.g. see<span> </span><a href="https://www.getdbt.com/product/semantic-layer/">Semantic Layer for DBT Cloud</a>) will be totally subsumed into LLM-based apps to make data more amenable to semantic search.</p>
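<p class="reader-text-block__paragraph">To make the idea concrete, here is a toy sketch of what &#8220;LLM-indexable&#8221; data might look like: dataset descriptions embedded as vectors, then ranked by cosine similarity against a natural-language query&#8217;s embedding. The catalog names and hand-made 3-d &#8220;embeddings&#8221; below are purely illustrative stand-ins for real model embeddings.</p>

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy data catalog: each dataset description has been "embedded" by hand.
catalog = {
    "sales_pipeline": [0.9, 0.1, 0.0],  # revenue-flavored description
    "web_telemetry":  [0.1, 0.9, 0.2],  # product-usage-flavored description
    "hr_headcount":   [0.0, 0.2, 0.9],  # people-flavored description
}

def best_match(query_vec):
    """Return the dataset whose description vector is closest to the query."""
    return max(catalog, key=lambda name: cosine(query_vec, catalog[name]))

# A query like "where is our revenue data?" would embed near the first axis.
print(best_match([0.8, 0.2, 0.1]))  # sales_pipeline
```

In a real semantic layer the embeddings would come from a language model and the catalog from the warehouse, but the ranking step is essentially this.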
<p class="reader-text-block__paragraph">Apps like ChatGPT will not replace data analysts, but they will increase the productivity of those, like Citizen Coders, who attempt analysis without spending all their time in the data.</p>
<p class="reader-text-block__paragraph">Eventually, Generative AI tooling will allow business folks to<span> </span><a href="https://productcoalition.com/the-real-difference-between-metrics-and-insights-ad1d57ce3bda">operationalize the generation of insights</a><span> </span>&#8220;at the speed of thought&#8221; &#8212; an oft-mentioned phrase, but seldom achieved.</p>
<p class="reader-text-block__paragraph">In terms of allowing easier access to data in the first place, the challenge for data-tools vendors will be to make their wares &#8220;LLM-indexable&#8221; &#8212; i.e. easily accessible via natural language querying. (This will be part of the emerging<span> </span><a href="https://www.gartner.com/smarterwithgartner/data-fabric-architecture-is-key-to-modernizing-data-management-and-integration">Data Fabric</a><span> </span>architecture, although many vendors are seemingly behind the curve with this realization.)</p>
<h2 class="reader-text-block__heading1">Gig Citizens</h2>
<p class="reader-text-block__paragraph">In working with Generative AI tools, Citizens will learn to adopt a new mode of documentation, or knowledge curation. We can&#8217;t say what this will look like, as it hasn&#8217;t been invented yet &#8212; we are still adjusting to the way interfaces like ChatGPT work and very few of us have had the opportunity to fine-tune them to include our own knowledge.</p>
<p class="reader-text-block__paragraph">I predict that Citizens will learn to write &#8220;notes to self&#8221; that are actually &#8220;notes to self via bot&#8221; &#8212; i.e. they will document their work in a way that lets them pick up the thread again quickly by asking their bot.</p>
<p class="reader-text-block__paragraph">Powerful new techniques in data sharing, as available via tools like Snowflake, and new modes of AI in data permissions and privacy protection will empower Citizens to work from anywhere and become specialists: RevOps Citizens, SalesOps Citizens, Growth-Hacking Citizens etc.</p>
<p class="reader-text-block__paragraph">This will slowly pave the way for what some CIOs are already wondering: what are the opportunities for a &#8220;Gig-Citizen&#8221; economy?</p>
<p class="reader-text-block__paragraph">One has to believe that the opportunities are vast and that new tools will emerge just to power that economy. Generative AI will play a central role.</p>
<h2 class="reader-text-block__heading1">Impacting The Bottom Line</h2>
<p class="reader-text-block__paragraph">Citizen Coding is not a new idea. It is already well underway thanks to the rise of automation and expressive tools like no- and low-code app makers.</p>
<p class="reader-text-block__paragraph">The so-called Citizen Technologist, or Business Technologist, is just an extension of the Citizen Coder concept: someone outside of IT who knows how to drive technology to get results. A good example would be someone who knows how to set up a workflow using<span> </span><a href="https://zapier.com/">Zapier</a>.</p>
<p class="reader-text-block__paragraph">How will this impact the bottom line?</p>
<p class="reader-text-block__paragraph">Any Citizen ought to be working towards a set of KPIs that roll up via a<span> </span><a href="https://www.petra-wille.com/blog/kpi-trees-how-to-bridge-the-gap-between-customer-behavior-product-metrics-and-company-goals">KPI hierarchy</a><span> </span>to business outcomes directly related to revenue.</p>
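<p class="reader-text-block__paragraph">As a hypothetical sketch, a KPI hierarchy can be modeled as a tree in which leaf metrics roll up to revenue-level outcomes. All names and values below are made up for illustration.</p>

```python
# A hypothetical KPI tree: leaves hold metric values, parents roll up as
# sums, connecting Citizen-level work to revenue-level outcomes.
def roll_up(node: dict) -> float:
    """Sum leaf KPI values up through the hierarchy."""
    if "value" in node:
        return node["value"]
    return sum(roll_up(child) for child in node["children"])

kpi_tree = {
    "name": "incremental_revenue",
    "children": [
        {"name": "upsell_revenue", "value": 120_000.0},
        {"name": "churn_saved", "children": [
            {"name": "winback_campaigns", "value": 40_000.0},
            {"name": "renewal_nudges", "value": 15_000.0},
        ]},
    ],
}
print(roll_up(kpi_tree))  # 175000.0
```

A real KPI tree would likely weight contributions rather than simply sum them, but the roll-up mechanics are the same.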
<p class="reader-text-block__paragraph">An insight is any data-driven realization that is actionable toward improving a KPI. So, if workers are empowered to generate insights for themselves and make workflow adjustments for themselves, this ought to have a direct positive effect upon related KPIs, with knock-on improvement to the bottom line.</p>
<p class="reader-text-block__paragraph">This is just one way in which AI, in this case Generative AI, will help to accelerate dividends from digital technology investment.</p>
<hr class="reader-divider-block" />
<p class="reader-text-block__paragraph">Article written with inputs from<span> </span><a href="https://www.linkedin.com/in/irzanagolding">Irzana Golding</a><span> </span>about data democratization and insights operationalization via tools like DBT Cloud.</p>
</div>
</div>
</div>
			</div>
			</div>
				
				
				
				
			</div>
				
				
			</div></p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>The Role of Data Scaling in AI Strategy</title>
		<link>http://paulgolding.com/2023/02/15/the-role-of-data-scaling-in-ai-strategy/</link>
		
		<dc:creator><![CDATA[paulgolding]]></dc:creator>
		<pubDate>Wed, 15 Feb 2023 19:32:42 +0000</pubDate>
				<category><![CDATA[AI]]></category>
		<category><![CDATA[cognitive]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[Innovation]]></category>
		<guid isPermaLink="false">https://paulgolding.com/?p=1069</guid>

					<description><![CDATA[]]></description>
										<content:encoded><![CDATA[<p><div class="et_pb_section et_pb_section_6 et_pb_fullwidth_section et_section_regular" >
				
				
				
				
				
				
				<div class="et_pb_module et_pb_fullwidth_image et_pb_fullwidth_image_1">
				
				
				
				
				<img decoding="async" loading="lazy" width="1000" height="667" src="https://paulgolding.com/wp-content/uploads/sites/2/2023/01/oberon_artwork_1970_digital_transformation_big_data_AI_drawing__d0e05558-d369-4d37-8330-e84875bfee9e-copy.png" alt="" title="oberon_artwork_1970_digital_transformation_big_data_AI_drawing__d0e05558-d369-4d37-8330-e84875bfee9e copy" srcset="http://paulgolding.com/wp-content/uploads/sites/2/2023/01/oberon_artwork_1970_digital_transformation_big_data_AI_drawing__d0e05558-d369-4d37-8330-e84875bfee9e-copy.png 1000w, http://paulgolding.com/wp-content/uploads/sites/2/2023/01/oberon_artwork_1970_digital_transformation_big_data_AI_drawing__d0e05558-d369-4d37-8330-e84875bfee9e-copy-980x654.png 980w, http://paulgolding.com/wp-content/uploads/sites/2/2023/01/oberon_artwork_1970_digital_transformation_big_data_AI_drawing__d0e05558-d369-4d37-8330-e84875bfee9e-copy-480x320.png 480w" sizes="(min-width: 0px) and (max-width: 480px) 480px, (min-width: 481px) and (max-width: 980px) 980px, (min-width: 981px) 1000px, 100vw" class="wp-image-943" />
			
			</div>
				
				
			</div><div class="et_pb_section et_pb_section_7 et_section_regular" >
				
				
				
				
				
				
				<div class="et_pb_row et_pb_row_9">
				<div class="et_pb_column et_pb_column_4_4 et_pb_column_9  et_pb_css_mix_blend_mode_passthrough et-last-child">
				
				
				
				
				<div class="et_pb_module et_pb_text et_pb_text_5  et_pb_text_align_left et_pb_bg_layout_light">
				
				
				
				
				<div class="et_pb_text_inner"><h2 class="reader-text-block__heading1">Introduction</h2>
<p class="reader-text-block__paragraph">What passes for AI strategy these days has a very low bar that tells us little to nothing about the unique potential of AI.</p>
<p class="reader-text-block__paragraph">We see this often from folks pivoting to become &#8220;AI experts&#8221; with little insight into the unique capabilities of AI.</p>
<p class="reader-text-block__paragraph">My first AI design (<a href="https://patents.google.com/?inventor=paul+golding&amp;assignee=motorola&amp;oq=paul+golding+motorola">patented &#8217;96</a>) was used in a commercial product (for improving capacity of Motorola cellular networks). I have used AI ever since (see<span> </span><a href="https://paulgolding.com/projects/">example projects</a>), so I have genuine insights into its unique<span> </span><em>commercial possibilities</em>, many of which have only manifested in recent years.</p>
<p class="reader-text-block__paragraph">What many posit as an AI strategy is often &#8220;Digital Transformation&#8221; with a bit of AI thrown in. This might be appropriate for tactical deployments, but insufficient for strategy.</p>
<p class="reader-text-block__paragraph">Very few leaders have asked:<span> </span><em>How do I transform my business<span> </span></em><strong><em>via the unique capabilities of AI?</em></strong><span> </span>Or,<span> </span><em>how is corporate strategy fundamentally impacted by AI?</em></p>
<p class="reader-text-block__paragraph">Indeed, there is a precursor question:<span> </span><em>How does the availability of &#8220;unlimited computation&#8221; impact corporate strategy?</em><span> </span>In many ways, AI provides the framework for answering that question.</p>
<p class="reader-text-block__paragraph">Very few leaders are asking these questions because they don&#8217;t know how to interpret them within the context of corporate strategy. To assist with interpretation, I am currently documenting the unique principles of AI and how to deploy them as strategic tools.</p>
<p class="reader-text-block__paragraph">I plan to share soon &#8212; to be honest, this is a kind of teaser post (and to explore if posting on LI is a useful medium).</p>
<p class="reader-text-block__paragraph">As a precursor, let me share one example.</p>
<h2 class="reader-text-block__heading1">Recipe: AI Scaling &amp; Innovation Networks</h2>
<p class="reader-text-block__paragraph">A recent discovery in AI is the so-called<span> </span><a href="https://arxiv.org/abs/2001.08361">Scaling Law</a>. It says that performance scales (as a power law) with the size of the dataset and the AI model. Architectural complexity isn&#8217;t so important. Indeed, most of the &#8220;beyond human&#8221; capabilities of AI have come about from scaling.</p>
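<p class="reader-text-block__paragraph">As a rough sketch, the power-law form means each doubling of the dataset shrinks test loss by a constant factor. The constants below are approximate fitted values from the Kaplan et al. paper linked above; treat them as illustrative, not a planning tool.</p>

```python
# Illustrative sketch of the data-scaling law L(D) = (D_c / D) ** alpha:
# test loss falls as a power law in dataset size D (in tokens).
def scaling_loss(tokens: float, d_c: float = 5.4e13, alpha: float = 0.095) -> float:
    """Predicted test loss for a model trained on `tokens` tokens of data."""
    return (d_c / tokens) ** alpha

# Doubling the data shrinks loss by the constant factor 2 ** -alpha,
# regardless of where you start -- the essence of a power law.
improvement = scaling_loss(2e12) / scaling_loss(1e12)
print(f"loss ratio after doubling the data: {improvement:.3f}")  # ~0.936
```

This constant-factor payoff per doubling is why &#8220;data capital&#8221; compounds: each increment of pooled data buys a predictable improvement.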
<p class="reader-text-block__paragraph">With this in mind, a business might ask:<span> </span><strong><em>How do we scale data as a strategic advantage?</em></strong></p>
<p class="reader-text-block__paragraph">Remarkably, many orgs do not ask this question.</p>
<p class="reader-text-block__paragraph">They lack strategies specifically related to the accumulation of &#8220;data capital&#8221;. Data is still seen as a by-product of current processes versus an investment into future gains.</p>
<p class="reader-text-block__paragraph">But there are some great examples from folks who really get it.</p>
<p class="reader-text-block__paragraph">Recently, I encountered an alliance of farmers who understood that certain types of automation are inevitable in order to overcome common challenges, such as<span> </span><a href="https://www.fb.org/news-release/labor-challenges-increase-farm-economy-pressures">labor pressures</a>.</p>
<p class="reader-text-block__paragraph">They<span> </span><strong>strategically</strong><span> </span>identified the need for scalable data and so collaborated in the curation of a pooled repository of high-quality agricultural data.</p>
<p class="reader-text-block__paragraph">Operationally, the alliance is organizing this effort via an &#8220;Innovation Network&#8221; in which resources and information are pooled and managed by a separate body.</p>
<p class="reader-text-block__paragraph"><img decoding="async" src="https://media.licdn.com/dms/image/D5612AQHsBRsvlEvnRQ/article-inline_image-shrink_1500_2232/0/1676491275175?e=1684972800&amp;v=beta&amp;t=AaObWWWb4vvH8fwU8OQ_QZM4g8gQb3DRH-BRyhGaOdQ" loading="lazy" alt="No alt text provided for this image" id="ember2071" class="ivm-view-attr__img--centered reader-image-block__img lazy-image ember-view" /></p>
<p class="reader-text-block__paragraph">The network taps into technology partners and key expert individuals who form what might be called<span> </span><a href="https://hbr.org/2004/04/a-network-of-invention">&#8220;A Network of Invention&#8221;</a>. This is a smart move because it leverages organic &#8220;network effects&#8221; from knowledge without the need to develop lots of in-house AI expertise, keeping in mind that such expertise is often hard to come by.</p>
<p class="reader-text-block__paragraph">The alliance also works with an incubator to co-opt start-ups into the network. The data pool is planned to be a major input into the incubator program.</p>
<p class="reader-text-block__paragraph">I have been involved with setting up several such networks, including for O2, the UK&#8217;s premier telco, via a loose collaboration of internal innovation teams (&#8220;labs&#8221;), strategic partners, corporate customers, incubators and informal &#8220;inventor&#8221; networks.</p>
<p class="reader-text-block__paragraph">The farmer&#8217;s alliance doesn&#8217;t yet know for which applications the data will prove most valuable. But they realize that without the means to scale data they will have little chance of exploiting scaling laws. There is a prior expectation of utility from the data given the existence of such laws and related patterns, like<span> </span><a href="https://ai.googleblog.com/2022/11/characterizing-emergent-phenomena-in.html">Emergence</a>.</p>
<p class="reader-text-block__paragraph">In the alliance&#8217;s case, a key expectation from the data is reduced cycle time from AI experiments to in-field benefits, which is critical in agriculture where innovation cycle times are often overly long.</p>
<h2 class="reader-text-block__heading1">Summary</h2>
<p class="reader-text-block__paragraph">AI strategy must be formulated along the axes of AI&#8217;s unique attributes, such as the Scaling Law. This law alone gives rise to various strategic considerations, such as the accumulation of &#8220;data capital&#8221;.</p>
<p class="reader-text-block__paragraph">Noting that scaling is hard, a possible strategic approach is the recipe:</p>
<blockquote class="reader-text-block__quote"><p>Pooled data capital + innovation network</p></blockquote>
<p class="reader-text-block__paragraph">But there are many other recipes for accumulating &#8220;data capital&#8221; for strategic competitive advantage. More to come in later posts 🙂</p>
</div>
			</div>
			</div>
				
				
				
				
			</div>
				
				
			</div></p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>ChatGPT &#8212; An Unblocking Tool</title>
		<link>http://paulgolding.com/2023/01/09/chatgpt-great-for-unblocking/</link>
		
		<dc:creator><![CDATA[paulgolding]]></dc:creator>
		<pubDate>Tue, 10 Jan 2023 07:13:40 +0000</pubDate>
				<category><![CDATA[AI]]></category>
		<category><![CDATA[cognitive]]></category>
		<category><![CDATA[Innovation]]></category>
		<guid isPermaLink="false">https://paulgolding.com/?p=907</guid>

					<description><![CDATA[]]></description>
										<content:encoded><![CDATA[<p><div class="et_pb_section et_pb_section_8 et_pb_fullwidth_section et_section_regular" >
				
				
				
				
				
				
				<div class="et_pb_module et_pb_fullwidth_image et_pb_fullwidth_image_2">
				
				
				
				
				<img decoding="async" loading="lazy" width="800" height="533" src="https://paulgolding.com/wp-content/uploads/sites/2/2023/01/1000_artwork_1970_business_man_on_computer_asking_robot_to_ge_7ee904a4-fd4c-469f-96b3-00524aa3e00c-copy.png" alt="" title="1000_artwork_1970_business_man_on_computer_asking_robot_to_ge_7ee904a4-fd4c-469f-96b3-00524aa3e00c copy" srcset="http://paulgolding.com/wp-content/uploads/sites/2/2023/01/1000_artwork_1970_business_man_on_computer_asking_robot_to_ge_7ee904a4-fd4c-469f-96b3-00524aa3e00c-copy.png 800w, http://paulgolding.com/wp-content/uploads/sites/2/2023/01/1000_artwork_1970_business_man_on_computer_asking_robot_to_ge_7ee904a4-fd4c-469f-96b3-00524aa3e00c-copy-480x320.png 480w" sizes="(min-width: 0px) and (max-width: 480px) 480px, (min-width: 481px) 800px, 100vw" class="wp-image-923" />
			
			</div>
				
				
			</div><div class="et_pb_section et_pb_section_9 et_section_regular" >
				
				
				
				
				
				
				<div class="et_pb_row et_pb_row_10">
				<div class="et_pb_column et_pb_column_4_4 et_pb_column_10  et_pb_css_mix_blend_mode_passthrough et-last-child">
				
				
				
				
				<div class="et_pb_module et_pb_text et_pb_text_6  et_pb_text_align_left et_pb_bg_layout_light">
				
				
				
				
				<div class="et_pb_text_inner"><h1>Summary</h1>
<p><strong>TL;DR: ChatGPT&#8217;s killer use case is mental unblocking.</strong></p>
<p>I participated in a number of discussions with &#8220;knowledge workers&#8221; about their experiences using <a href="https://openai.com/blog/chatgpt/">ChatGPT</a> in the workplace via the OpenAI UI.</p>
<p><span style="font-weight: 400">Many non-technical users are finding it sufficiently useful as a writing aid that they imagine continuing its use, especially as it gets better. Use cases varied, from outlining video scripts to helping with product descriptions to generating job descriptions to general re-writes of documents. Many found it useful as an ideation tool due to its content diversity, having been trained on a truly super-massive set of examples. </span></p>
<p><span style="font-weight: 400">The primary use case for ChatGPT is accelerating writing tasks where it’s easier to hack a prompt than think of what to write &#8212; or do &#8212; from scratch. <strong>It is a useful mental unblocking tool</strong>. The knack is learning how to write productive prompts as a kind of &#8220;programmatic writing&#8221; vs. &#8220;chatting&#8221;.</span></p>
<p><span style="font-weight: 400">All those I spoke with encountered issues, such as hallucination. But this did not seem to present enough of a barrier to prevent usage. </span></p>
<p><span style="font-weight: 400">Those competent at writing often did not rate the outputs as good enough to wholesale replace entire writing tasks, such as writing finished blog posts. It tends to produce unappealing, unoriginal prose unless manipulated via an excess of prompt hacks. However, some claimed that such outputs still provided a better-than-nothing starting point.</span></p>
<h1><span style="font-weight: 400">Introduction</span></h1>
<p><span style="font-weight: 400">There&#8217;s much hype about ChatGPT, so I asked various folks what they have found it good at. Mostly I spoke with &#8220;knowledge workers”, including quite a few in creative professions (e.g. digital agencies). I did not speak to any who struggled to write to begin with.</span></p>
<h1><span style="font-weight: 400">What is ChatGPT?</span></h1>
<p><span style="font-weight: 400">For those unfamiliar, ChatGPT is a specially trained version of another AI model built by OpenAI, called <a href="https://openai.com/blog/gpt-3-apps/">GPT-3</a>, which is a class of model called a Transformer. If you are a coder and want to know more, I provided a <a href="https://github.com/pgolding/transformer/blob/main/Transformers%20Demystified.ipynb">detailed code-level annotation on Github</a>. For the rest of us, a Transformer is a giant computer program that spends a mind-blowing amount of time trying to guess how to complete sentences that it finds online.</span></p>
<p><span style="font-weight: 400">This guessing game is a brute force attempt to learn language structure. It turns out that if done long enough over a big enough corpus, the program gets stupendously good at guessing which words go with which over very long spans of text, or long enough to subsequently generate quite productive prose. This is known as self-supervised learning because the program doesn&#8217;t need to be told (supervised) how to complete the task because the words it is trying to guess were already there: it simply masked them out in order to guess them.</span></p>
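<p><span style="font-weight: 400">A minimal sketch of this idea: the training &#8220;labels&#8221; are simply the next words of the text itself, so no human annotation is needed (real models predict subword tokens rather than whole words):</span></p>

```python
# Self-supervised language modelling in miniature: every position in the
# raw text yields a (context, target) training pair for free.
def next_word_examples(text: str):
    """Turn raw text into (context, target) next-word training pairs."""
    words = text.split()
    return [(words[:i], words[i]) for i in range(1, len(words))]

for context, target in next_word_examples("the cat sat on the mat"):
    print(context, "->", target)  # e.g. ['the', 'cat', 'sat'] -> on
```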
<p><span style="font-weight: 400">What has surprised seemingly everybody, including AI experts and linguists, is how good this brute force method is when carried out at a super-massive scale. And <a href="https://arxiv.org/abs/2001.08361">scale is seemingly what matters most</a>, not any particular architectural sophistication.</span></p>
<p><span style="font-weight: 400">The scale of the model allows it to learn a tremendous number of structural patterns over large spans of text. Once it has finished guessing, the trained model can be used to carry out other language-related tasks. </span></p>
<p><span style="font-weight: 400">One such fine-tuned task is language generation. As you can imagine, a program that knows so much about which words go with which can complete entire sentences just by “guessing” the next most probable word, one at a time, over and over. This isn&#8217;t just a testament to the model&#8217;s scale, but a feature of language, namely that longer sequences are composed from smaller ones, which are composed from smaller ones (and so on), sometimes called <a href="https://en.wikipedia.org/wiki/Recursion">recursion</a>.</span></p>
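<p><span style="font-weight: 400">The generation loop can be sketched with a toy bigram model: pick the most probable next word, append it, and repeat. Real Transformers condition on the whole context and typically sample rather than always taking the top word, but the one-word-at-a-time loop is the same in spirit:</span></p>

```python
from collections import Counter, defaultdict

def train_bigrams(corpus: str):
    """Count, for each word, which words follow it and how often."""
    counts = defaultdict(Counter)
    words = corpus.split()
    for a, b in zip(words, words[1:]):
        counts[a][b] += 1
    return counts

def generate(counts, start: str, length: int = 5) -> list:
    """Greedy generation: repeatedly append the most probable next word."""
    out = [start]
    for _ in range(length):
        followers = counts.get(out[-1])
        if not followers:  # never seen this word mid-sentence: stop
            break
        out.append(followers.most_common(1)[0][0])
    return out

model = train_bigrams("the cat sat on the mat and the cat slept")
print(" ".join(generate(model, "the")))
```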
<p><span style="font-weight: 400">Language has this amazing creative aspect: take any starting word and there will be dozens, sometimes hundreds, or more, of potential next words. Over the span of a sentence, this gives rise to a mind-boggling number of novel combinations of words. Indeed, perhaps we don&#8217;t realize that many of the sentences we utter in our lifetime are entirely novel in the history of language. This creative aspect of language is considered by experts to be one of the great mysteries of the mind that is very <a href="https://philpapers.org/archive/ASOLAS.pdf">hard to explain with a scientifically tractable theory</a>. This is perhaps why it has surprised so many that a brute-force computer program has been able to detect enough structure in its training data to allow it to complete various language tasks with human or beyond-human performance.</span></p>
<p><span style="font-weight: 400">The truly surprising aspect of pre-trained transformers is that not only can they generate a passage of text, but they can interpret an initial seeding sequence, called a prompt, such that it can be used as a kind of instruction as to how to generate the text. This makes the AI a kind of &#8220;programmatic writing&#8221; machine. (We shall return to this perspective.)</span></p>
<p><span style="font-weight: 400">This seems like magic and has led many users to feel an illusory sense that the model &#8220;understands&#8221; what&#8217;s being requested. Here&#8217;s an example:</span></p>
<p><span style="font-weight: 400"><img decoding="async" loading="lazy" src="https://paulgolding.com/wp-content/uploads/sites/2/2023/01/Screen-Shot-2023-01-09-at-12.03.29-PM.png" width="831" height="190" alt="" class="wp-image-914 alignnone size-full" srcset="https://paulgolding.com/wp-content/uploads/sites/2/2023/01/Screen-Shot-2023-01-09-at-12.03.29-PM.png 831w, https://paulgolding.com/wp-content/uploads/sites/2/2023/01/Screen-Shot-2023-01-09-at-12.03.29-PM-480x110.png 480w" sizes="(min-width: 0px) and (max-width: 480px) 480px, (min-width: 481px) 831px, 100vw" /></span></p>
<p><span style="font-weight: 400">What’s happening here? It looks as if the program has understood the sentence and looked up the answer, like it might a database of questions. But that is not what’s happening. The program is generating the green text as a highly probable follow-on to the prompt &#8212; highly probable in the sense that, across the vast swathes of text it saw whilst training, these are the word combinations most likely to follow such a prompt. Think of it this way: if we were to look at all the text on the web and find that prompt, or something similar, then most likely we would see a list of sales methods afterwards.</span></p>
<p><span style="font-weight: 400">Let me be clear: <em>it is generating that green text</em>. It is not trying to look it up from the training corpus. It doesn’t know that this is the correct answer &#8212; there is no “correctness” or “truth” inherent in the program. Indeed, it does not understand any of the words in the prompt either. It is blindly performing a function: given the input sequence, what is a highly probable output sequence. The output is only plausible, and potentially truthful, because the words it saw (in relation to the prompt) during training are plausible and contain truths, facts (or not) and useful explanations and descriptions.</span></p>
<p><span style="font-weight: 400">And this is where scale comes into the picture. If the program sees enough examples of text, as in a truly super-massive corpus, then it will have seen enough examples of this question, or what it represents, such that its answers are so probable as to overlap with common experience or useful information as available on the web and in various books.</span></p>
<p><span style="font-weight: 400">Representation is key, as this is what the model learns. For example, the model will learn that the pattern “What…[plural nouns]&#8230;?” has a high probability of being followed by a numerical list. We can tell this by repeating the pattern and swapping out the meaning:</span></p>
<p><span style="font-weight: 400"><img decoding="async" loading="lazy" src="https://paulgolding.com/wp-content/uploads/sites/2/2023/01/Screen-Shot-2023-01-09-at-9.49.03-PM.png" width="371" height="173" alt="" class="wp-image-919 alignnone size-full" srcset="http://paulgolding.com/wp-content/uploads/sites/2/2023/01/Screen-Shot-2023-01-09-at-9.49.03-PM.png 371w, http://paulgolding.com/wp-content/uploads/sites/2/2023/01/Screen-Shot-2023-01-09-at-9.49.03-PM-300x140.png 300w" sizes="(max-width: 371px) 100vw, 371px" /></span></p>
<h1><span style="font-weight: 400">What is it good for? Unblocking.</span></h1>
<p><span style="font-weight: 400">Now we know what it is and how it works, what is it good for? Here I participated in a number of discussions, in-person and via forums, with various knowledge workers who had tried to use the tool in earnest to do useful work. I only asked about uses of the raw OpenAI interface, not any specialist application like the AI Copywriter tools built using GPT. </span></p>
<p><span style="font-weight: 400"><em>Everyone I spoke to found it useful for something</em>. This is not surprising given its super-massive scale. It is hard to stump it, even, apparently, with very esoteric questions of the kind that philosopher and quantum physicist David Deutsch <a href="https://twitter.com/DavidDeutschOxf/status/1611996984918810624?s=20&amp;t=bBqwYZ8yuEDyouC6ogt_xA">has been posting on his twitter account</a>.</span></p>
<p><span style="font-weight: 400">Many folks were surprised by how good the tool was in getting a response that proved useful. A senior content manager at a high-end digital agency told me how he had experimented with generating script outlines for corporate videos. Having recently done some consulting related to online retail in the Metaverse, I tried something similar with a prompt: <em>Write me a corporate video script outline for a person whose shopping experience is enhanced by shopping in the metaverse</em>. The output followed a coherent structure with a meaningful narrative. For someone unrehearsed in script writing, it provided a useful starting point. For professional script writers, it probably isn&#8217;t so useful, except, perhaps, for ideation of scenes.</span></p>
<p><span style="font-weight: 400">A consultant friend of mine tried to write blog posts about innovation, his area of expertise. He found the outputs coherent as a blog post, but a bit flat, lacking any originality. Indeed, it looked trite. This is to be expected if the tool is basically trying to find the most probable text.</span></p>
<p><span style="font-weight: 400">But, like all probabilities, they can be constrained, or made conditional. This is the art of refining the prompt. In that same conversation, another innovation consultant reworded the prompt to generate something more convincing. We all concluded that it was still not good enough to warrant publication (in the name of the original consultant) but some of the participants believed that it was far enough along to get there via editing, even using ChatGPT to suggest new edits on a paragraph or sentence basis.</span></p>
<p><span style="font-weight: 400">Despite some believing that this prompt-manipulation will become unnecessary with new versions of the tool, this seems an unlikely prospect. This is because language is, no matter how it is generated, ambiguous and able to convey many different voices, tones and ideas with even subtle rewording.</span></p>
<p><span style="font-weight: 400"></span></p>
<p><span style="font-weight: 400">It was interesting to watch a live session of a Gen-Z user “hack” the prompts back and forth to generate a LinkedIn bio for a friend, taking to the tool as if this were how writing had always been done. Everyone involved in the process agreed that the final version was better worded than the author’s original bio. This hacking on a theme to converge upon a final result seems like a genuine skill, one that is most powerful in the hands of someone who already has a good command of language, yet who might not know what a good LinkedIn bio looks like until they see one, rather than having to conceive it entirely from scratch.</span></p>
<p><span style="font-weight: 400"></span></p>
<p><span style="font-weight: 400"><strong>And this seemed to be the killer use case: hacking on a prompt to converge upon productive content without the pain of filling the void of blank paper &#8212; a kind of mental unblocker, as it were</strong>. Indeed, a professional writer friend of mine told me how some of his fellow fiction writers were using it often to “fill the gaps” with hard to find narrative turns or adjectives or scene ideas.</span></p>
<h1><span style="font-weight: 400">Using ChatGPT: Programming words with words</span></h1>
<p><span style="font-weight: 400">I am not going to offer a long list of tips and instructions, but rather an insight into how to use ChatGPT productively. And it&#8217;s an insight that all of the correspondents discovered for themselves, and one widely discussed on the Web &#8212; e.g. see <a href="https://oneusefulthing.substack.com/p/how-to-use-chatgpt-to-boost-your?utm_source=profile&amp;utm_medium=reader2">this article by Wharton Professor Ethan Mollick</a>.</span></p>
<p><span style="font-weight: 400">The trick is to understand what I explained above about how ChatGPT works: it has consulted a truly super-massive corpus to learn how to construct plausible, or highly probable, sequences. This means it is capable of detail and nuance, but only if you instruct it with specific detail and nuance to begin with. </span></p>
<p><span style="font-weight: 400">Users learn that ChatGPT is a kind of &#8220;programming words using words&#8221; versus a mind-reading &#8220;Oracle&#8221; &#8212; and yes, some folks use it in that latter way and get disappointing results. Part of the programming can be the inclusion of lots of specific detail, even as bullet points, without attempting to construct a prompt that another human might find meaningful, per se.</span></p>
<p><span style="font-weight: 400">Consider a generic prompt like: </span><em>Write me an AI strategy</em>. This will likely produce the kind of generic or trite response expected, like identify a problem, find the data, select a model, train it and use it etc. But something more elaborate might produce more interesting responses, like: <em>Why should a CEO of a mature company that is struggling in the marketplace care about the manifold hypothesis?</em></p>
<p><span>You&#8217;d have to know that the manifold hypothesis is related to AI, but this is a more nuanced prompt that generates a more nuanced response related to the nature of high-dimensionality data.</span></p>
<p><span>The best way to discover this &#8220;programming words with words&#8221; is to experiment. Take your baseline prompt and mix in a lot of details, perhaps quirky and tangential ones, just to understand how the machine works.</span></p>
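This mixing-in of detail can even be mechanized. Below is a minimal sketch of the idea (the <code>build_prompt</code> helper and the sample details are my own illustration, not part of ChatGPT or any API): each added detail constrains the space of probable continuations, which is the essence of programming words with words.

```python
def build_prompt(task, details):
    """Combine a base task with constraining details into one prompt.

    Each bullet-point detail narrows the space of probable
    continuations the model will consider.
    """
    lines = [task, "", "Context and constraints:"]
    lines += [f"- {d}" for d in details]
    return "\n".join(lines)

# Hypothetical example: enriching the generic "Write me an AI strategy" prompt
prompt = build_prompt(
    "Write me an AI strategy.",
    [
        "Audience: the CEO of a mature company struggling in the marketplace",
        "Explain why the manifold hypothesis matters for our data",
        "Tone: plain business English; explain any jargon",
    ],
)
```

The point is not the code, of course, but the habit: treat the prompt as a small program whose inputs (the details) you can vary systematically.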
<h1><span style="font-weight: 400">Issues</span></h1>
<p><span style="font-weight: 400"></span>Unsurprisingly, given that we know how Transformers work (brute-force probability machines), we expect some of the outputs not to meet the mark. To recap, the model does not attempt to &#8220;make sense&#8221; of the prompt. Rather, it looks at the sequence and, based upon patterns found in it at many levels (morphological, syntactic, semantic), computes the most probable follow-on sequence based upon a super-massive scale of prior examples (of the morphological representations, not necessarily the exact wording of the prompt).</p>
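As a toy illustration of that &#8220;most probable follow-on&#8221; step (a deliberately tiny sketch with a made-up four-word vocabulary and hand-picked scores, not real model weights): the model scores every token in its vocabulary, converts the scores into a probability distribution, and the continuation is chosen from that distribution.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy vocabulary and invented scores for a context like "The cat ..."
vocab = ["sat", "ran", "mat", "the"]
logits = [3.5, 2.1, 0.3, -1.0]

probs = softmax(logits)
next_token = vocab[probs.index(max(probs))]  # greedy pick: "sat"
```

Real models sample from this distribution (with temperature and other tricks) rather than always taking the top token, which is one reason the same prompt can yield different outputs.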
<p>The model can easily produce incoherent output, as many have documented. It can also hallucinate facts, or fail to generate responses that are well reasoned given the prompt.</p>
<p>However, I didn&#8217;t find anyone who was attempting to use it as a general knowledge bot, like a Siri. Indeed, it was well understood that this wasn&#8217;t a wise use case, and neither was anything that would require subsequently checking every factual claim in the generated text.</p>
<p>Also, the friend of mine who tried to write an innovation post ran the text through another AI model designed to detect generated content. It flagged the text as generated with high confidence. Whilst it&#8217;s not clear how these detection tools work, the potential for having content detected as fake could be worrying, especially if it gets dinged by Google Search.</p>
<p>Nonetheless, as said, most of the users I spoke to were not attempting to generate wholesale content for publication online, but rather to use the tool for the various writer&#8217;s-companion tasks above: accelerating the production of content, knowledge, ideas, formats and so on.</p>
<h1>Where Next</h1>
<p><span style="font-weight: 400">Of course, pre-trained Transformers can do lots of other things, like summarize text or answer questions from a text. They can be used to analyze different document types, like invoices or business contracts, or even generate them. Perhaps they can find clauses in contracts that are likely to lead to poor value or lack of compliance, and so on. When it comes to the technical use of the underlying technology, suddenly many business processes become amenable to automation. However, my inquiry was not about those technical uses, but about ChatGPT as a writer’s companion. For that, it appears to have sufficient value that it might be expected to enhance knowledge-worker productivity over time.</span></p>
<p><span style="font-weight: 400"></span></p>
<p><span style="font-weight: 400">And no &#8212; none of this article was written using ChatGPT (except the indicated examples).</span></p>
<p><span style="font-weight: 400"></span></p>
<p><span style="font-weight: 400">If you&#8217;d like to know more about how to use GPT-like technology in your company to uniquely add value, <a href="https://paulgolding.com/#contact_paul">feel free to reach out.</a></span><span style="font-size: 16px"> </span></p></div>
			</div>
			</div>
				
				
				
				
			</div>
				
				
			</div></p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Transformers (Code-level Commentary)</title>
		<link>http://paulgolding.com/2022/11/18/851/</link>
		
		<dc:creator><![CDATA[paulgolding]]></dc:creator>
		<pubDate>Fri, 18 Nov 2022 16:17:02 +0000</pubDate>
				<category><![CDATA[AI]]></category>
		<category><![CDATA[cognitive]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[Innovation]]></category>
		<category><![CDATA[random]]></category>
		<guid isPermaLink="false">https://paulgolding.com/?p=851</guid>

					<description><![CDATA[]]></description>
										<content:encoded><![CDATA[<p><div class="et_pb_section et_pb_section_10 et_animated et_pb_with_background et_pb_fullwidth_section et_section_regular section_has_divider et_pb_bottom_divider" >
				
				
				
				
				
				
				<section class="et_pb_module et_pb_fullwidth_header et_pb_fullwidth_header_0 et_pb_text_align_center et_pb_bg_layout_dark">
				
				
				
				
				<div class="et_pb_fullwidth_header_container center">
					<div class="header-content-container center">
					<div class="header-content">
						
						<h1 class="et_pb_module_header">Transformers</h1>
						
						<div class="et_pb_header_content_wrapper"></div>
						
					</div>
				</div>
					
				</div>
				<div class="et_pb_fullwidth_header_overlay"></div>
				<div class="et_pb_fullwidth_header_scroll"></div>
			</section>
				
				<div class="et_pb_bottom_inside_divider et-no-transition"></div>
			</div><div class="et_pb_section et_pb_section_11 et_section_regular" >
				
				
				
				
				
				
				<div class="et_pb_row et_pb_row_11">
				<div class="et_pb_column et_pb_column_4_4 et_pb_column_11  et_pb_css_mix_blend_mode_passthrough et-last-child">
				
				
				
				
				<div class="et_pb_module et_pb_text et_pb_text_7 et_animated  et_pb_text_align_left et_pb_bg_layout_light">
				
				
				
				
				<div class="et_pb_text_inner"><p>I seldom, if ever, write technical posts. There are just too many excellent resources out there and my goal isn&#8217;t to explain technical things to a technical audience, as much as I enjoy teaching from time to time, or giving the odd guest lecture.</p>
<p>But I made an exception whilst perusing the literature regarding Transformers, as in Large Language Models. This was motivated by research into the potential use of Transformers for a novel non-language use case. I wanted to experiment with a few ideas, so I decided to build a base of various notebooks and code libs from which to set out on my research path.</p>
<p>Secondarily, I wanted to ground myself in sufficient code-level mechanics to explore a set of questions I have about what, exactly, are Transformers doing in relation to semantics. This was for the purposes of attempting to understand how world models might get incorporated into LLM schemas, beginning with Transformers. Of course, this isn&#8217;t a novel question, but I need my own research bed.</p>
<p>There is also the interesting question as to why it is that depth (of Transformers) is a key indicator for modeling language performance (against the various benchmarks out there). I felt that by playing with the transformations, layer by layer, I might find insights and/or confirm various similar explorations already in the literature.</p>
<p>A question that also piqued my interest is to ask what would the opposite of a Large Language Model (LLM) look like, as in a &#8220;Small&#8221;, yet nonetheless still performant, LM. This was from the point of view of not only attempting to understand what a more parsimonious system might look like, but from the point of view of understanding the scope of independent research that might be achievable without Google or OpenAI resources ($$$$$).</p>
<p>Anyway, these are not the subjects of this blog post. Rather, this post is merely <a href="https://github.com/pgolding/transformer">a link to a notebook</a> that I published, which is a more fine-grained treatment of an existing code-level explanation of an Encoder-Decoder Transformer (the original sequence-to-sequence architecture; BERT, by contrast, is encoder-only).</p>
<p>Per the notebook intro:</p>
<blockquote>
<p><span>I wanted to add some missing details to standard transformer annotations often found in courses and texts. But not wanting to reinvent the wheel with yet another explanation, the accompanying notebooks are mostly an expansion of </span><a href="https://d2l.ai/chapter_attention-mechanisms-and-transformers/index.html" rel="nofollow">Chapter 11 of the Dive Into Deep Learning</a><span> open source book &#8212; a valuable resource for learners. (We are all learners.)</span></p>
</blockquote>
<p>&nbsp;</p></div>
			</div>
			</div>
				
				
				
				
			</div>
				
				
			</div></p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Don&#8217;t Ignore AI in a Downturn</title>
		<link>http://paulgolding.com/2022/11/01/859/</link>
		
		<dc:creator><![CDATA[paulgolding]]></dc:creator>
		<pubDate>Tue, 01 Nov 2022 13:52:56 +0000</pubDate>
				<category><![CDATA[AI]]></category>
		<category><![CDATA[cognitive]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[Innovation]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<guid isPermaLink="false">https://paulgolding.com/?p=859</guid>

					<description><![CDATA[]]></description>
										<content:encoded><![CDATA[<p><div class="et_pb_section et_pb_section_12 et_animated et_pb_with_background et_pb_fullwidth_section et_section_regular section_has_divider et_pb_bottom_divider" >
				
				
				
				
				
				
				<section class="et_pb_module et_pb_fullwidth_header et_pb_fullwidth_header_1 et_pb_text_align_center et_pb_bg_layout_dark">
				
				
				
				
				<div class="et_pb_fullwidth_header_container center">
					<div class="header-content-container center">
					<div class="header-content">
						
						<h1 class="et_pb_module_header">Don't Ignore AI in a Downturn</h1>
						
						<div class="et_pb_header_content_wrapper"></div>
						
					</div>
				</div>
					
				</div>
				<div class="et_pb_fullwidth_header_overlay"></div>
				<div class="et_pb_fullwidth_header_scroll"></div>
			</section>
				
				<div class="et_pb_bottom_inside_divider et-no-transition"></div>
			</div><div class="et_pb_section et_pb_section_13 et_section_regular" >
				
				
				
				
				
				
				<div class="et_pb_row et_pb_row_12">
				<div class="et_pb_column et_pb_column_4_4 et_pb_column_12  et_pb_css_mix_blend_mode_passthrough et-last-child">
				
				
				
				
				<div class="et_pb_module et_pb_text et_pb_text_8 et_animated  et_pb_text_align_left et_pb_bg_layout_light">
				
				
				
				
				<div class="et_pb_text_inner"><p>I recently wrote a guest blogpost (<a href="https://stephenshapiro.com/5-ai-misconceptions/">&#8220;Five Misconceptions about AI That Are Slowing Down Your Business&#8221;</a>) on the wonderful Stephen Shapiro&#8217;s Innovation Insights website. It was motivated by two experiences. The first was an indirect encounter with a CEO who declared that AI is a kind of luxury, as if to say that different technologies fall into different &#8220;accounting budgets&#8221;, not unlike the mental accounting claim made by behavioral economists, if their data is to be believed.</p>
<p>The second was an impression from a consulting engagement wherein the CEO of a large corporation seemed to feel that AI had some special status, also assigned to the &#8220;nice to have&#8221; bucket of activity.</p>
<p>There is a third observation, which is the tendency of many CxOs to believe that AI is somehow so exotic that it&#8217;s completely outside the reach of direct innovation. This leaves only indirect innovation, namely the ticking of AI boxes via the use of vendors with AI-supercharged products. This can tick the box: &#8220;We are investing in AI.&#8221;</p>
<p>I will leave you to read the post, if so inclined, but the thrust is that AI is a technology to be taken more seriously during a downturn than many other technologies or transformation initiatives. Hopefully the article explains why.</p>
<p>&nbsp;</p></div>
			</div>
			</div>
				
				
				
				
			</div>
				
				
			</div></p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>The Manifold and Mantra Hypotheses: Why AI works and Why it Doesn&#8217;t.</title>
		<link>http://paulgolding.com/2022/09/09/ai-the-manifold-and-mantra-hypotheses/</link>
		
		<dc:creator><![CDATA[paulgolding]]></dc:creator>
		<pubDate>Fri, 09 Sep 2022 20:12:48 +0000</pubDate>
				<category><![CDATA[AI]]></category>
		<category><![CDATA[cognitive]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[Innovation]]></category>
		<guid isPermaLink="false">https://paulgolding.com/?p=799</guid>

					<description><![CDATA[]]></description>
										<content:encoded><![CDATA[<p><div class="et_pb_section et_pb_section_14 et_animated et_pb_with_background et_pb_fullwidth_section et_section_regular section_has_divider et_pb_bottom_divider" >
				
				
				
				
				
				
				<section class="et_pb_module et_pb_fullwidth_header et_pb_fullwidth_header_2 et_pb_text_align_center et_pb_bg_layout_dark">
				
				
				
				
				<div class="et_pb_fullwidth_header_container center">
					<div class="header-content-container center">
					<div class="header-content">
						
						<h1 class="et_pb_module_header">The Manifold &amp; Mantra Hypotheses: Why AI works, and why it doesn't.</h1>
						
						<div class="et_pb_header_content_wrapper"><p>Business leaders: pay attention.</p></div>
						
					</div>
				</div>
					
				</div>
				<div class="et_pb_fullwidth_header_overlay"></div>
				<div class="et_pb_fullwidth_header_scroll"></div>
			</section>
				
				<div class="et_pb_bottom_inside_divider et-no-transition"></div>
			</div><div class="et_pb_section et_pb_section_15 et_section_regular" >
				
				
				
				
				
				
				<div class="et_pb_row et_pb_row_13">
				<div class="et_pb_column et_pb_column_4_4 et_pb_column_13  et_pb_css_mix_blend_mode_passthrough et-last-child">
				
				
				
				
				<div class="et_pb_module et_pb_text et_pb_text_9 et_animated  et_pb_text_align_left et_pb_bg_layout_light">
				
				
				
				
<div class="et_pb_text_inner">
<p><span style="font-weight: 400">AI works because of a fact of nature that is sometimes expressed by the Manifold Hypothesis &#8212; that complex things can be explained via simpler mechanisms (that an AI might find from samples of the complex thing). However, AI often fails because of what I call the Mantra Hypothesis, which is the tendency of business leaders to declare, mantra style, that the business will “Put AI First” or some such moniker, but without the means to translate that intent into operational realities that might lead to a meaningful ROI.</span></p>
<p><span style="font-weight: 400">In this post, I will outline <em>for a business audience</em> why many AI projects in enterprises either fail, or are likely to fail. Reasons for failure can be summarized as:</span></p>
<ol>
<li><em><span style="font-weight: 400">Blind adoption of AI without a clearly defined set of outcomes that can be explained beyond the level of pithy directives like “We need to adopt AI”.</span></em></li>
<li><em><span style="font-weight: 400">Assignment of AI adoption to folks who lack experience in mapping advanced tech into tangible business outcomes.</span></em></li>
<li><em><span style="font-weight: 400">Adoption of vanity success measures for AI initiatives.</span></em></li>
<li><em><span style="font-weight: 400">Hoping that hiring AI technical experts will translate into business outcomes via some kind of osmosis.</span></em></li>
<li><em><span style="font-weight: 400">Failure to understand the data requirements that make AI training viable.</span></em></li>
<li><em><span style="font-weight: 400">Underestimation of the operationalization of data and AI required to translate AI experiments (often by scientists) into repeatable and robust products.</span></em></li>
<li><em><span style="font-weight: 400">Lack of a “Data as a Product” approach to AI, including the tendency to want to “supercharge” existing product features vs. finding new AI-first interpretations.</span></em></li>
</ol>
<h1><span style="font-weight: 400">AI: The Hype is Real, Yet Insufficient</span></h1>
<p><span style="font-weight: 400">Right now, AI-hype is effervescent. The truth is that no-one knows where we are in the </span><a href="https://www.gartner.com/en/research/methodologies/gartner-hype-cycle"><span style="font-weight: 400">hype cycle</span></a><span style="font-weight: 400"> &#8212; with many voices calling from the trough of disillusionment whilst others holler from the peak of Mount Olympus where they gather with sentient-like AI gods. The naysayers aren’t necessarily theorists like </span><a href="https://nautil.us/deep-learning-is-hitting-a-wall-14467/"><span style="font-weight: 400">Gary Marcus who claims that Deep Learning has &#8220;hit a wall&#8221;</span></a><span style="font-weight: 400"> due to its lack of compositionality &#8212; a blurry technical term that could mean many things (whole conference about it </span><a href="https://compositionalintelligence.github.io/"><span style="font-weight: 400">here</span></a><span style="font-weight: 400">). Put simply, </span><a href="https://compositionalintelligence.github.io/pdfs/Marcus.pdf"><span style="font-weight: 400">Marcus is saying</span></a><span style="font-weight: 400"> that any AI that builds a language model merely by predicting which words come next, or thereabouts, cannot retrieve deeper semantic structures (“meaning”) in the data &#8212; a capability that he argues is required to make AI usefully “intelligent” in most real-world applications.</span></p>
<p><span style="font-weight: 400">The meaning of “intelligent” is hotly debated, often by those who have apparently never read Turing’s original paper in which he suggested that the concept of thinking machines was an “absurdity”. He meant that any agreeable definition of “thinking”, if we might construct one, is commonly understood to be a biological capability, nothing to do with machines. </span></p>
<p><span style="font-weight: 400">However, for this discussion, I prefer to lean upon the moniker </span><i><span style="font-weight: 400">“usefully intelligent”</span></i><span style="font-weight: 400"> wherein the word “usefully” is far more interesting and ought to be related to business outcomes, not arcane debates about intelligence. And, in this regard, there are all kinds of brick walls that business leaders will face when attempting to adopt AI.</span></p>
<p><span style="font-weight: 400">Other voices are calling from within the trough of disillusionment, namely business leaders who don’t yet see any clear relationship between “adopting AI” (whatever that means) and tangible ROI. And I don’t mean leaders who picked up the airport best-seller on AI and tasked their CIOs to come up with an “AI strategy”. I mean leaders who already placed their bets, although probably not the whole farm, via real hard cash spent investing in AI, acquiring or aqui-hiring AI companies, yet nonetheless failing to translate such investments into meaningful outcomes.</span></p>
<p><span style="font-weight: 400">This often stems from a kind of corporate arithmetic: </span></p>
<p><em><strong>Business + AI = Better Business. </strong></em></p>
<p><span style="font-weight: 400">It apparently works for any business: </span></p>
<p><strong><i>Air-Conditioning Vendor + AI = Better Air-Conditioning Vendor.</i></strong></p>
<p><span style="font-weight: 400">And so on. Belief in this formula rapidly translates into some kind of mantra for the company: “Put AI First”. However, whilst this formula has the potential to be true, it is not an operational formula &#8212; merely believing in the hype of AI &#8212; the formula &#8212; is insufficient. It tells us nothing about how the addition of AI could make a better business. Yet, it is surprising how many business leaders have adopted this formula without asking probing questions. Moreover, many make the fatal mistake of going about acquiring technical AI expertise as the first step as if the presence of that expertise will permeate into the business and bring about the realization of the formula. Far from it.</span></p>
<p><span style="font-weight: 400">This happened repeatedly with the adoption of “Big Data” and “Data Science”, only to find mediocre ROI, if not a negative one. It is not uncommon for an enterprise to acqui-hire an AI start-up that is basically a team of research scientists, much closer to academia than industry, with scant knowledge of applied innovation and little to no experience of translating their various AI Python libraries into meaningful modifications of the current and future enterprise roadmap.</span></p>
<p><span style="font-weight: 400">To dig further, we might consider certain useful generalities as to why the adoption of shiny new tech can easily fail. Any business initiative that proclaims to “Put AI First”, or some such declaration, is possibly doomed from the start. The lack of a meaningful scope and outcomes is the elephant in the room, condemning the program to inevitably proceed along predictable lines. One such line will be suffering a long lag between flipping the switch to “Put AI First” and eventually concluding, perhaps wrongly, that it maybe wasn’t worth it.</span></p>
<p><span style="font-weight: 400">Without a practically useful interpretation of AI into provable customer benefits and business outcomes, AI can easily become just another fad, like when many a CEO bought into “Re-engineering” the company only to find that no one had ever really understood in tangible terms what “engineering” a company meant, never mind “Re-engineering”.</span></p>
<p><span style="font-weight: 400">This blind approach can be radically worsened when organizations offer “sexy AI” roles to loyalists, often efficient program managers with a track record in other areas, along with their hired-in friends who desperately want the shiny-new-toy badge on their resumes. Yet they lack the depth of analytical expertise and product synthesis to translate shiny tech into outcomes &#8212; a task far harder than many imagine. Desperate to show progress, they create “AI departments” and the like, just like they created “Data Science” departments, and then set about constructing various vanity measures of success to present in monthly reviews. This is a common anti-pattern.</span></p>
<p><span style="font-weight: 400">The antidote is to ensure that your product teams and key stakeholders throughout the business can articulate clear outcomes and benefits of AI adoption in their roadmaps, present and future. Indeed, I have advocated what I call the </span><a href="https://paulgolding.com/2022/04/17/100-of-10-rule-for-ai/"><span style="font-weight: 400">100-of-10 rule</span></a><span style="font-weight: 400">. It states, at least as a starting point, that your entire leadership team and their direct reports ought to be able to clearly articulate how the entire 10% of effort within the 70-20-10 rubric could be devoted to AI-related programs. Given the massive disruption potential of AI-first start-ups and the emergence of large-scale models, the potential impact of AI is worthy of such attention and prioritization. But this must be interpreted into meaningful outcomes with detailed arguments before letting rip on the AI Adoption cord.</span></p>
<p><span style="font-weight: 400">Many readers might want to stop here and address the above issues first. However, determined readers might read on as we explore the biggest AI elephant in the room, namely the “data”, or “lack of data” challenge.</span></p>
<p>&nbsp;</p>
<h1><span style="font-weight: 400">The Real Brick Wall: Data</span></h1>
<h2><span style="font-weight: 400">What is the data for?</span></h2>
<p><span style="font-weight: 400">There is often a hidden field in the AI adoption equation:</span></p>
<p><strong><em>Business <span style="text-decoration: underline">(+ Data)</span> + AI = Better Business.</em></strong></p>
<p><span style="font-weight: 400">Many business leaders have at least heard that AI needs data. And, so, with a warm feeling that the enterprise is awash with data, AI-adoption confidence goes up. However, this is often a mistake.</span></p>
<p><span style="font-weight: 400">Let’s begin with why AI works at all. If this basic plank isn’t well understood, then everything else rapidly becomes froth. By now, we all know that AI works by digesting data. The “works” bit can be plainly stated: a working (trained) AI can take new, previously unseen, data and reliably make some prediction that coheres with a ground truth. In other words, what it learned from specific data samples can be generalized to other samples. If you trained your AI to understand how to spot sales opportunities based upon a history of sales data, then it must </span><a href="https://developers.google.com/machine-learning/crash-course/generalization/video-lecture#:~:text=Generalization%20refers%20to%20your%20model's,Develop%20intuition%20about%20overfitting."><span style="font-weight: 400">generalize</span></a><span style="font-weight: 400"> to new sales opportunities &#8212; and it ought to generalize in reliable and </span><i><span style="font-weight: 400">actionable</span></i><span style="font-weight: 400"> ways.</span></p>
<p><span style="font-weight: 400">AI can do this thanks to a strange mathematical reality, one that is sometimes expressed by theoreticians as the </span><a href="http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/"><span style="font-weight: 400">Manifold Hypothesis</span></a><span style="font-weight: 400">. For those really interested, you can look up the math, but I will explain it in much simpler terms: things in the natural world tend to obey parsimonious laws. These laws are compact whilst the things they can explain are vast. Imagine, for a moment, that we study thousands of objects falling from the sky until we can reliably predict the force with which they hit the ground &#8212; i.e. we take all our measurements, perhaps via a camera, and feed them into an AI. </span></p>
<p><span style="font-weight: 400">Now let’s say we want to predict the next hundred objects, which could be of any material and size (i.e. mass). It would be a useless system if we could only predict a particular object because we have already seen an identical one. Indeed, such a system has no predictive power at all &#8212; it is merely looking things up from history. It would also not be so useful if we could only predict something when we see an object so similar that our predictive power is too weak to be useful. However, what if our system had somehow figured out Newtonian Mechanics: F=ma. This single “piece of information” (a formula) could be used to determine the behavior of an infinite number of objects. In effect, we have “compressed” the original data space from lots of examples down to a single formula relating three quantities: F, m and a.</span></p>
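This compression can be shown in a few lines. The sketch below uses my own toy data (noiseless synthetic measurements of one object whose true mass is 2 kg): a least-squares fit through the origin recovers the single parameter m, after which the dataset itself is no longer needed to predict new cases.

```python
# Toy (acceleration, force) measurements for an object of mass 2 kg
data = [(0.5, 1.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (9.8, 19.6)]

# Least-squares estimate of m for F = m * a (line through the origin):
# m_hat = sum(a*F) / sum(a*a)
m_hat = sum(a * F for a, F in data) / sum(a * a for a, _ in data)

def predict_force(a):
    """All five measurements are now summarized by one parameter, m_hat."""
    return m_hat * a
```

Five data points have been condensed into one number, and that one number generalizes to accelerations the system has never observed, which is the whole trick.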
<p><span style="font-weight: 400">The problem is that most of these kinds of symbolic understandings of the world have been discovered by humans via mysterious creative processes that we don’t know how to program, nor explain. Hence we rely upon a different approach, which is to see if an AI can take lots of examples of a behavior and condense them down into a function that is good enough to generalize to lots of unseen examples. Our hope, which we impose via design constraints within the AI architecture, is that the parameters of that function are far fewer than the size of the input space (in totality) such that we can use a reasonably sized set of examples to estimate the parameters well enough to generalize to a class of objects without having to observe every single possible example.</span></p>
<p><span style="font-weight: 400">To put it simply, most things in the world could, in theory, be described by some underlying set of equations, if only we knew them. This is how nature is. Although there are trillions of unique leaves on all the trees, they all arise from a broadly similar set of underlying mechanisms that could be condensed into a relatively small set of information (e.g. equations) that is far, far, far smaller than trillions of unique data points. If this were not the case, no AI could learn anything because when we say that an AI is learning, we mean that it is attempting to search for that far smaller approximating function that lies beneath the data. If there were no such smaller underlying function, or principle, then the AI could not easily find it with the clues it has (some training data) and within the confines of a reasonable amount of computational effort.</span></p>
<p><span style="font-weight: 400">What do I mean by finding? Well, that is what an AI does. It “searches” for a reduced set of data, called parameters, that not only explains the training data, but could characterize the underlying system from which the training data comes &#8212; e.g. how leaves grow, or how objects fall, or how cancer presents in a biopsy sample image.</span></p>
<p><span style="font-weight: 400">This principle applies to digitized systems despite the fact that digitization can give rise to mind-blowingly large datasets. For example, take a tiny image sensor that is only 16 x 16 pixels and can only encode each pixel as black or white. There are 2^256 possible images, which is about 1.16 x 10^77, known by numberphiles as roughly 116 </span><a href="https://integers.info/large-numbers/quattuorvigintillion"><i><span style="font-weight: 400">quattuorvigintillion</span></i></a><span style="font-weight: 400">. Imagine an AI had to process each combination to characterize images. Say it could process 1 billion images per second. That would still take far longer than the age of the universe to complete.</span></p>
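<p><span style="font-weight: 400">For the skeptical, the arithmetic is easy to check. Here is a minimal Python sketch (illustrative; the constants are deliberately rough):</span></p>

```python
# Sanity-check the image-space arithmetic from the text (illustrative).
AGE_OF_UNIVERSE_YEARS = 1.38e10  # roughly 13.8 billion years

num_images = 2 ** 256            # every possible 16x16 black-and-white image
rate = 1_000_000_000             # 1 billion images per second
seconds = num_images / rate
years = seconds / (60 * 60 * 24 * 365)

print(f"{num_images:.2e} images")          # ~1.16e+77
print(f"{years:.2e} years to enumerate")   # ~3.67e+60
print(years / AGE_OF_UNIVERSE_YEARS)       # vastly more than one universe age
```

<p><span style="font-weight: 400">Even shaving many orders of magnitude off the processing rate changes nothing about the conclusion: brute-force enumeration is hopeless, which is why learning a compact underlying function matters.</span></p>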
<p><span style="font-weight: 400">This seems absurd, but those numbers are correct, even though our intuitions can’t relate to that much data arising from such a tiny image space. But here’s the really cool thing &#8212; most of those images are unrecognizable &#8212; they are just noise. No pixel in one of those noisy images has any meaningful relationship with any other in terms of forming a recognizable structure, and so, individually and collectively, they don’t “say” much.</span></p>
<p><span style="font-weight: 400">The actual number of useful images is ridiculously smaller. Take, say, recognition of hand-written digits. It only requires about a </span><a href="https://www.kaggle.com/c/digit-recognizer/data"><span style="font-weight: 400">few tens of thousands of images</span></a><span style="font-weight: 400"> of various styles to train an AI to recognize more or less any human-recognizable digit that it hasn’t seen before, no matter how varied the handwriting style. This is because alphabets have stable morphology &#8211; i.e. the underlying recipe for the digit 1 is always a vertical line with a top-left diagonal stroke, no matter how a particular person writes it. What we want, then, is for an AI to see enough images to figure out an underlying function that ends up enacting a rule something like: pixels arranged in a vertical line with a diagonal top-left downward stroke predict the digit 1.</span></p>
<p>&nbsp;</p>
<h2><span style="font-weight: 400">What kind of data?</span></h2>
<p><span style="font-weight: 400">So this is how AI works &#8212; it exploits the natural tendency for things in the world, even when digitized, to be generated by a set of underlying parameterized principles that could be explained with relatively little data compared to all the things those principles could generate. It turns out that the so-called </span><a href="https://www.ibm.com/cloud/learn/neural-networks#:~:text=Neural%20networks%2C%20also%20known%20as,neurons%20signal%20to%20one%20another."><span style="font-weight: 400">Neural Network</span></a><span style="font-weight: 400"> is a machine that can find an approximation of those underlying parameters that is good enough to generalize usefully, recognizing classes of data of which it has seen only relatively few samples during training.</span></p>
<p><span style="font-weight: 400">So far, so good. But why does this also present a problem? The problem is still having enough data, of the right kind, to allow an AI to converge upon a useful approximating function. In many business situations, this data is really hard to come by, despite the common notion that we are all awash with “Big” data.</span></p>
<p><span style="font-weight: 400">The “right kind” is perhaps more critical. For an AI to learn, it has to be able to observe clues about the function it’s trying to approximate. How does this happen? An easy example should suffice. Consider the following sequence:</span></p>
<p><strong>1 → 1</strong></p>
<p><strong>2 → 4</strong></p>
<p><strong>3 → 9</strong></p>
<p><strong>4 → 16</strong></p>
<p><strong>5 → X</strong></p>
<p><span style="font-weight: 400">What is X? </span></p>
<p><span style="font-weight: 400">Of course, you will think 25, which is correct. </span></p>
<p><span style="font-weight: 400">You have figured out that the mapping function here is the squared operator. But if I gave you only the sequence: 1, 2, 3, 4 or </span><i><span style="font-weight: 400">only</span></i><span style="font-weight: 400"> the sequence 4, 16, 9, 1, would you have guessed that these numbers came from a squared-number generator? No. You need the inputs </span><i><span style="font-weight: 400">and outputs</span></i><span style="font-weight: 400"> to guide you &#8212; or supervise &#8212; when guessing the function. These outputs are called labels. Together with their relevant inputs they are the ground truth of our system. The approach is called </span><a href="https://www.simplilearn.com/tutorials/machine-learning-tutorial/supervised-and-unsupervised-learning#what_is_supervised_learning"><span style="font-weight: 400">Supervised Learning</span></a><span style="font-weight: 400">.</span></p>
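<p><span style="font-weight: 400">To make the supervision idea concrete, here is a toy sketch in plain Python (illustrative only &#8212; real systems search continuous parameter spaces, not a hand-picked menu of functions). It “learns” the squared-number mapping by choosing the candidate function that best explains the labeled input&#8211;output pairs:</span></p>

```python
# Toy supervised learning: pick the candidate function that minimizes
# error on the labeled pairs, then use it to predict an unseen input.
training_data = [(1, 1), (2, 4), (3, 9), (4, 16)]  # (input, label) ground truth

candidates = {
    "identity": lambda x: x,
    "double":   lambda x: 2 * x,
    "square":   lambda x: x ** 2,
    "cube":     lambda x: x ** 3,
}

def total_error(f):
    """Sum of squared errors of f against the labeled training pairs."""
    return sum((f(x) - y) ** 2 for x, y in training_data)

best_name, best_f = min(candidates.items(), key=lambda kv: total_error(kv[1]))

print(best_name)   # square
print(best_f(5))   # 25 -- generalizes to an input never seen in training
```

<p><span style="font-weight: 400">Notice that with only the inputs, or only the outputs, there is nothing to minimize against &#8212; which is precisely why unlabeled data alone cannot supervise the search.</span></p>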
<p><span style="font-weight: 400">There are many challenges to finding this kind of data in a way that tangibly supports the intended business goals. For example, a CMO might think that the existence of millions of sent emails ought to present a rich-enough dataset to </span><a href="https://blogs.oracle.com/marketingcloud/post/write-better-email-subject-lines-with-machine-learning-and-oracle-responsys"><span style="font-weight: 400">deploy AI in the selection of subject lines that improve email open rates</span></a><span style="font-weight: 400">. However, it often turns out not to be so easy. In this case, there might not be enough variance in the data to be useful &#8212; i.e. many of the data points are very similar, and so they don’t offer enough clues to the AI as to how to find the underlying “generator” function.</span></p>
<p><span style="font-weight: 400">Another common problem is finding the ground truth. Lots of datasets in the enterprise might be large, but they lack the supervisory labels to be useful as training data. This might just be because the necessary observations were never recorded. For example, there are plenty of e-commerce datasets wherein the products lack sufficiently rich meta-data, such as keyword tags, to provide sufficient dimensionality in the input data.</span></p>
<p><span style="font-weight: 400">It is important to recognize that the labels must represent the ground truth you are interested in. For example, if the true goal of the CMO is to map email behaviors to churn, then you need not just the email interaction data, but the churn data. Of course, in a typical enterprise, churn data might well be available, but you get the idea. Take medical applications as another example, such as the use of AI to analyze tissue samples (via slides). One ground truth might be annotations from a pathologist as to the presence of disease. However, what if the intended goal of the AI is to predict prognosis, or the probability that a certain treatment will be effective for the patient in question? This requires a different set of ground-truth data, namely historical records of treatment outcomes.</span></p>
<p><span style="font-weight: 400">There’s also the nature of the data itself. AI works because the underlying function-approximation assumes that the training samples are highly representative of all the possible samples there could be. This statistical assumption does not always hold &#8212; for example, in so-called “Long Tail” cases, infrequent samples that might not appear in the training data at all can have overwhelming effects on the outcome.</span></p>
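<p><span style="font-weight: 400">A back-of-envelope calculation shows why the long tail bites (the numbers here are hypothetical): if a critical event occurs in only 0.1% of real cases, a 1,000-example training set has a substantial chance of containing none of them at all.</span></p>

```python
# Probability that a rare "long tail" event never appears in the
# training set (illustrative numbers).
p_rare = 0.001      # event occurs in 0.1% of real-world cases
n_samples = 1000    # size of the training set

p_never_seen = (1 - p_rare) ** n_samples
print(f"{p_never_seen:.2f}")  # ~0.37 -- the model may never observe the event
```

<p><span style="font-weight: 400">Roughly a one-in-three chance of total blindness to the event &#8212; and yet that rare case may be exactly the one that dominates the business outcome.</span></p>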
<p><span style="font-weight: 400">Of course, there might be solutions to these problems. For example, in the case of finding useful email headings, data outside of the enterprise can be used to train an AI on aspects of language that evoke a certain emotional response. This kind of data might be found in places like online reviews, or public datasets. Once trained upon this data, the AI can then be tuned to solve the specific AI problem, as seen here in the </span><a href="https://blogs.oracle.com/marketingcloud/post/write-better-email-subject-lines-with-machine-learning-and-oracle-responsys"><span style="font-weight: 400">Oracle email platform.</span></a><span style="font-weight: 400"> Generally, this technique is called Transfer Learning.</span></p>
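<p><span style="font-weight: 400">A transfer-learning setup can be caricatured in a few lines of Python. Everything below is a stand-in: the “pretrained” scorer represents a frozen model trained on abundant external text (e.g. reviews), the vocabulary is hypothetical, and the in-house open-rate figures are invented. Only a tiny “head” (here, a least-squares line) is fitted on the scarce in-house data:</span></p>

```python
# Transfer learning in miniature (illustrative; the "pretrained" scorer is a
# stand-in for a model trained on abundant external text such as reviews).
POSITIVE = {"free", "exclusive", "new", "save"}   # hypothetical learned vocabulary

def pretrained_feature(subject: str) -> float:
    """Stand-in for a frozen, externally trained language model."""
    words = subject.lower().split()
    return sum(w in POSITIVE for w in words) / max(len(words), 1)

# Scarce in-house data: (subject line, observed open rate) -- invented figures.
inhouse = [("free shipping today", 0.31), ("quarterly update", 0.12),
           ("new exclusive offer", 0.38), ("terms of service change", 0.08)]

# "Fine-tune" only a small head: fit a least-squares line on the frozen feature.
xs = [pretrained_feature(s) for s, _ in inhouse]
ys = [r for _, r in inhouse]
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

print(slope, intercept)  # the head learned from only four in-house examples
```

<p><span style="font-weight: 400">The division of labor is the point: the expensive general knowledge is learned once on plentiful external data, and only a small, cheap component is estimated from the enterprise’s own limited dataset.</span></p>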
<p><span style="font-weight: 400">However, the specific solution to data readiness, if a solution exists, is not as important as being able to recognize that the reality of the data part of the AI formula can be very hard to achieve. We will discuss more practical solutions below.</span></p>
<p>&nbsp;</p>
<h2><span style="font-weight: 400">Data in Practice: Operationalization</span></h2>
<p><span style="font-weight: 400">I could go on and on about the various data requirements needed to make AI effective. Above I have only mentioned a few of those that are somewhat related to the theory of how AI works. </span></p>
<p><span style="font-weight: 400">There is the much more practical concern of finding the data, preparing the data, labeling the data (if required) and so on. Both experimentally (when training the AI) and operationally (when deploying in a product), these data processing issues can present a significant amount of the overall effort in bringing an AI-enhanced solution to the market. This operationalization can include the very real challenge of updating the AI with sufficient regularity to adapt to new data samples or better approximations. The gap between the experimental AI program and an operationalized one can be large. This has given rise to a whole new field called Machine Learning Operations (MLOps). </span></p>
<p><span style="font-weight: 400">An additional key limitation might be gaining access to data that belongs to the customer. There are all kinds of privacy issues and related audit and compliance concerns to verify what data went into which AI model in order to prove data lineage etc. This might also be important in achieving a certain quality metric from the AI, like avoiding certain kinds of negatively-impacting decisions, such as denying someone a loan because of skin color.</span></p>
<p><span style="font-weight: 400">This is related to the wider concern of “AI safety” or </span><a href="https://openai.com/alignment/"><span style="font-weight: 400">“AI alignment”</span></a><span style="font-weight: 400"> and data governance. Put simply, if your data governance is already lacking pre-AI-adoption, as it often is in enterprises with large technical debt, then you might be in for a rough ride.</span></p>
<h2><span style="font-weight: 400">Data as a Product and AI-First Mindset</span></h2>
<p><span style="font-weight: 400">It is almost shocking to see how often the adoption of data-related disciplines, like Data Science and AI, seems to throw out everything we have learned in the modern software era about how to translate commercial intentions into commercial realities, as if the lessons of Agile and the like were never learned. Indeed, this might well be the case, but let’s be more charitable in our estimations of baseline product operations.</span></p>
<p><span style="font-weight: 400">There is a tendency in data-related projects to plow straight into complex costly programs with the assumption they are going to work. If we have learned anything from modern software, it is that this assumption is typically incorrect, as the product won’t quite work in the way intended or expected. If those programs are largely staffed by AI researchers and technicians, then it’s often the case that very few of them understand how to manage technical product programs.</span></p>
<p><span style="font-weight: 400">It is not inconceivable that an AI team will go about making some kind of tool, say a method to enhance sales, only to find that the sales team doesn’t use it. No stakeholders were consulted, no user tests were done, no demos were conducted of the product in progress, no tests were conceived to assure quality of its outputs. And, often worst of all, no experiments were done to verify scope etc.</span></p>
<p><span style="font-weight: 400">It is all too common for rogue data to upset the entire apple cart, again due to lack of data governance. Sales folks changed the way they rated sales prospects via some undocumented hack in the CRM and suddenly the reality of garbage-in, garbage-out is dramatically amplified by an AI that is ingesting inverted or incorrect data.</span></p>
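<p><span style="font-weight: 400">Even a crude automated guardrail catches much of this. The sketch below is a minimal, hypothetical example of a data-sanity check run before training: the field name and the 1&#8211;5 rating schema are assumptions for illustration, not a real CRM’s API.</span></p>

```python
# A minimal data-sanity guardrail (illustrative): flag rows where a field
# drifts outside its documented schema, e.g. a CRM prospect rating that is
# supposed to be an integer from 1 (cold) to 5 (hot).
def validate_ratings(rows):
    """Return (row_index, bad_value) pairs for out-of-schema ratings."""
    problems = []
    for i, r in enumerate(rows):
        rating = r.get("prospect_rating")
        if not isinstance(rating, int) or not 1 <= rating <= 5:
            problems.append((i, rating))
    return problems

batch = [{"prospect_rating": 4}, {"prospect_rating": 5},
         {"prospect_rating": -3},         # an undocumented "hack" value
         {"prospect_rating": "hot"}]      # free-text drift

print(validate_ratings(batch))  # [(2, -3), (3, 'hot')]
```

<p><span style="font-weight: 400">Refusing to train or predict when such checks fail is far cheaper than letting an AI quietly ingest inverted or mislabeled data and amplify it.</span></p>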
<p><span style="font-weight: 400">Data is no exception: it must be treated as a product. All of what we have learned about incremental development, testing, stakeholder involvement, demoing, design-thinking, and so on, still applies, even if you’re making a dashboard.</span></p>
<p><span style="font-weight: 400">But there is another product perspective that is often overlooked, namely the absence of critical interpretation of novel AI-first capabilities. There is a tendency for AI to be used as a means to supercharge existing product capabilities. For example, an e-commerce product recommender might be enhanced using AI, as might a website personalization module. This is the “low-hanging fruit” approach. There might not be anything wrong with such an approach, but it should be undertaken with caution.</span></p>
<p><span style="font-weight: 400">With any such amplification, one first has to ask how far it really translates into ROI. As you might have already understood, the most precious resource in AI adoption is the AI expertise that ultimately generates the core of the innovation, notwithstanding all of the above constraints in translating that innovation into meaningful ROI. In many sectors, you will be competing, if not now, then soon, with so-called AI-first start-ups who plan to tackle your sector using a radically different approach. For example, in my own work for an e-commerce platform giant, I proposed an AI-first approach called “No Design” web authoring &#8212; the conversion of English sentences into finished web pages. The point is that this is uniquely possible because of recent AI advances (called Transformers) and potentially far more impactful within a 1-3 year ROI horizon than tinkering with, say, personalization or search.</span></p>
<p><span style="font-weight: 400">Put simply, the “No Design” approach has the potential to radically alter how retailers think about their website strategies, opening up new ideas due to “superhuman” scaling of creative effort etc.</span></p>
<h1><span style="font-weight: 400">Summary</span></h1>
<p><span style="font-weight: 400">In going about AI adoption, you should believe the hype. AI really does have transformational potential, perhaps radical in some cases. However, as with all advanced tech, the mere adoption of it does not guarantee success. Enterprises tend to be very amnesic, so it is not unexpected that all the lessons learned from modern product development are suddenly forgotten in the face of a new paradigm like AI.</span></p>
<p><span style="font-weight: 400">Naive approaches should be avoided. These include assigning AI as “just another program” to company loyalists who have shown proficiency with running programs. As an antidote to the mere circulation of mantras and hype, business leaders should insist upon well argued AI enhancements to the roadmap articulated by stakeholders with the backing of experiments where possible.</span></p>
<p><span style="font-weight: 400">The greatest caution should be applied to any assumptions about data. We have explained the magic of AI (the Manifold Hypothesis) in searching for approximate functions that explain the training data, hopefully in generalizable ways, but this magic needs enough data of the right kind to be effective. It is too easy to mistake existing meta-data for the actual ground truth required for the magic to deliver the intended outcome.</span></p>
<p><span style="font-weight: 400">The operationalization of AI is not to be underestimated, potentially overwhelming the scientific work to the point of rendering it ineffective in practice. Careful attention must be paid to effective data governance and AI quality assurance.</span></p>
<p><span style="font-weight: 400">The product mindset must prevail, which could be interpreted as mapping the benefits of Agile, Lean, and so on, to all data programs. The ease with which AI researchers might write Python scripts can easily create a massive bias about how easy it is to build AI products. But these are not products. They are prototypes, or often not far from demos.</span></p>
<p><span style="font-weight: 400">The innovation mindset must also prevail, thinking anew about product strategy in light of novel AI capabilities. This is the so-called “AI First” approach often overlooked by enterprises due to an eagerness to attain some validation of AI investments via “low hanging fruit” product enhancements. It is also overlooked when using existing program experts to manage AI, thinking as they might along familiar product lines.</span></p>
<p>&nbsp;</p></div>
			</div>
			</div>
				
				
				
				
			</div>
				
				
			</div>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
