<?xml version="1.0" encoding="UTF-8" standalone="no"?><rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" version="2.0">
      <channel>
        <atom:link href="https://feeds.soundcloud.com/users/soundcloud:users:119277822/sounds.rss" rel="self" type="application/rss+xml"/>
        <atom:link href="https://feeds.soundcloud.com/users/soundcloud:users:119277822/sounds.rss?before=173277531" rel="next" type="application/rss+xml"/>
        <title>Linear Digressions</title>
        <link>http://lineardigressions.com/</link>
        <pubDate>Sat, 11 Apr 2026 20:09:20 +0000</pubDate>
        <lastBuildDate>Sat, 11 Apr 2026 20:09:20 +0000</lastBuildDate>
        <ttl>60</ttl>
        <language>en</language>
        <copyright>All rights reserved</copyright>
        <webMaster>feeds@soundcloud.com (SoundCloud Feeds)</webMaster>
        <description>Linear Digressions is a podcast about machine learning and data science. Machine learning is being used to solve a ton of interesting problems, and to accomplish goals that were out of reach even a few short years ago.</description>
        <itunes:subtitle>Explorations in Machine Learning and Data Science</itunes:subtitle>
        
        <itunes:author>Katie Malone</itunes:author>
        <itunes:explicit>no</itunes:explicit>
        <itunes:image href="http://benjaffe.me/static/podcast-icon.jpg"/>
        <image>
          <url>https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg</url>
          <title>Linear Digressions</title>
          <link>http://lineardigressions.com/</link>
        </image>
        
        <itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords><itunes:summary>In each episode, your hosts explore machine learning and data science through interesting (and often very unusual) applications.</itunes:summary><itunes:category text="Technology"/><itunes:owner><itunes:email>hello@lineardigressions.com</itunes:email><itunes:name>Katie Malone</itunes:name></itunes:owner><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/2300885150</guid>
      <title>Unfaithful Chain of Thought</title>
      <pubDate>Mon, 13 Apr 2026 01:00:17 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/unfaithful-chain-of-thought</link>
      <itunes:duration>00:24:32</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>What's actually happening when an LLM "thinks out loud"? Research on human decision-making suggests that much of the reasoning we believe drives our choices is actually post hoc rationalization — we decide first, explain later. Katie and Ben get curious about whether the same might be true for large language models: when you watch a model reason through a problem in real time, is that chain of thought the genuine process, or just a plausible-sounding story told after the fact? It's a deceptively deep question with real stakes for how much we should trust model explanations.

Links

Miles Turpin et al., "Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting" (NeurIPS 2023, NYU and Anthropic): https://arxiv.org/abs/2305.04388

Anthropic, "Reasoning Models Don't Always Say What They Think" (Alignment Science, 2025): https://www.anthropic.com/research/reasoning-models-dont-say-think
</itunes:summary>
      <itunes:subtitle>What's actually happening when an LLM "thinks out…</itunes:subtitle>
      <description>What's actually happening when an LLM "thinks out loud"? Research on human decision-making suggests that much of the reasoning we believe drives our choices is actually post hoc rationalization — we decide first, explain later. Katie and Ben get curious about whether the same might be true for large language models: when you watch a model reason through a problem in real time, is that chain of thought the genuine process, or just a plausible-sounding story told after the fact? It's a deceptively deep question with real stakes for how much we should trust model explanations.

Links

Miles Turpin et al., "Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting" (NeurIPS 2023, NYU and Anthropic): https://arxiv.org/abs/2305.04388

Anthropic, "Reasoning Models Don't Always Say What They Think" (Alignment Science, 2025): https://www.anthropic.com/research/reasoning-models-dont-say-think
</description>
      <enclosure length="23558785" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/2300885150-linear-digressions-unfaithful-chain-of-thought.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/2296773833</guid>
      <title>Benchmark Bank Heist</title>
      <pubDate>Mon, 06 Apr 2026 01:48:31 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/benchmark-bank-heist</link>
      <itunes:duration>00:12:36</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>What if an AI decided the smartest way to pass its test was to find the answer key? That's exactly what Anthropic's Claude Opus did when faced with a benchmark evaluation — reasoning that it was being tested, tracking down the encrypted eval dataset, decrypting it, and returning the answer it found inside. It's equal parts impressive and unsettling. This episode digs into what actually happened, why it matters for how we measure AI progress, and what this very novel failure mode means for the already-tricky science of benchmarking language models.

Links

Anthropic's writeup on the BrowseComp reverse-engineering done by Claude Opus 4.6: https://www.anthropic.com/engineering/eval-awareness-browsecomp

BrowseComp benchmark from OpenAI: https://openai.com/index/browsecomp/</itunes:summary>
      <itunes:subtitle>What if an AI decided the smartest way to pass it…</itunes:subtitle>
      <description>What if an AI decided the smartest way to pass its test was to find the answer key? That's exactly what Anthropic's Claude Opus did when faced with a benchmark evaluation — reasoning that it was being tested, tracking down the encrypted eval dataset, decrypting it, and returning the answer it found inside. It's equal parts impressive and unsettling. This episode digs into what actually happened, why it matters for how we measure AI progress, and what this very novel failure mode means for the already-tricky science of benchmarking language models.

Links

Anthropic's writeup on the BrowseComp reverse-engineering done by Claude Opus 4.6: https://www.anthropic.com/engineering/eval-awareness-browsecomp

BrowseComp benchmark from OpenAI: https://openai.com/index/browsecomp/</description>
      <enclosure length="12110471" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/2296773833-linear-digressions-benchmark-bank-heist.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/2292877718</guid>
      <title>Benchmarking AI Models</title>
      <pubDate>Mon, 30 Mar 2026 01:29:55 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/benchmarking-ai-models</link>
      <itunes:duration>00:29:55</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>How do you know if a new AI model is actually better than the last one? It turns out answering that question is a lot messier than it sounds. This week we dig into the world of LLM benchmarks — the standardized tests used to compare models — exploring two canonical examples: MMLU, a 14,000-question multiple choice gauntlet spanning medicine, law, and philosophy, and SWE-bench, which throws real GitHub bugs at models to see if they can fix them. Along the way: Goodhart's Law, data contamination, canary strings, and why acing a test isn't always the same as being smart.</itunes:summary>
      <itunes:subtitle>How do you know if a new AI model is actually bet…</itunes:subtitle>
      <description>How do you know if a new AI model is actually better than the last one? It turns out answering that question is a lot messier than it sounds. This week we dig into the world of LLM benchmarks — the standardized tests used to compare models — exploring two canonical examples: MMLU, a 14,000-question multiple choice gauntlet spanning medicine, law, and philosophy, and SWE-bench, which throws real GitHub bugs at models to see if they can fix them. Along the way: Goodhart's Law, data contamination, canary strings, and why acing a test isn't always the same as being smart.</description>
      <enclosure length="28723512" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/2292877718-linear-digressions-benchmarking-ai-models.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/2288638451</guid>
      <title>The Hot Mess of AI (Mis-)Alignment</title>
      <pubDate>Mon, 23 Mar 2026 00:50:58 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/the-hot-mess-of-ai-mis</link>
      <itunes:duration>00:22:32</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>The paperclip maximizer — the classic AI doom scenario where a hyper-competent machine single-mindedly converts the universe into office supplies — might not be the AI risk we should actually lose sleep over. New research from Anthropic's AI safety division suggests misaligned AI looks less like an evil genius and more like a distracted wanderer who gets sidetracked reading French poetry instead of, say, managing a nuclear power plant. This week we dig into a fascinating paper reframing AI misalignment through the lens of bias-variance decomposition, and why longer reasoning chains might actually make things worse, not better.

- "The Hot Mess Theory of AI Misalignment: How Misalignment Scales with Model Intelligence and Task Complexity" — Anthropic AI Safety. https://arxiv.org/abs/2503.08941</itunes:summary>
      <itunes:subtitle>The paperclip maximizer — the classic AI doom sce…</itunes:subtitle>
      <description>The paperclip maximizer — the classic AI doom scenario where a hyper-competent machine single-mindedly converts the universe into office supplies — might not be the AI risk we should actually lose sleep over. New research from Anthropic's AI safety division suggests misaligned AI looks less like an evil genius and more like a distracted wanderer who gets sidetracked reading French poetry instead of, say, managing a nuclear power plant. This week we dig into a fascinating paper reframing AI misalignment through the lens of bias-variance decomposition, and why longer reasoning chains might actually make things worse, not better.

- "The Hot Mess Theory of AI Misalignment: How Misalignment Scales with Model Intelligence and Task Complexity" — Anthropic AI Safety. https://arxiv.org/abs/2503.08941</description>
      <enclosure length="21638685" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/2288638451-linear-digressions-the-hot-mess-of-ai-mis.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/2284342199</guid>
      <title>The Bitter Lesson</title>
      <pubDate>Sun, 15 Mar 2026 20:29:21 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/bitterlesson-produced</link>
      <itunes:duration>00:19:17</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Every AI builder knows the anxiety: you spend months engineering prompts, tuning pipelines, and chaining calls together — then a new model drops and half your work evaporates overnight. It turns out researchers have been wrestling with this exact dynamic for 30 years, and they keep arriving at the same uncomfortable answer. That answer is called the Bitter Lesson — and understanding it might be the most important thing you can do for whatever you're building right now. From Deep Blue to AlexNet to modern LLMs, scale keeps beating sophistication, and knowing which side of that line your work falls on makes all the difference.

Links

- Richard Sutton, "The Bitter Lesson" 

- Alon Halevy, Peter Norvig, and Fernando Pereira, "The Unreasonable Effectiveness of Data" 

- Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, "ImageNet Classification with Deep Convolutional Neural Networks"</itunes:summary>
      <itunes:subtitle>Every AI builder knows the anxiety: you spend mon…</itunes:subtitle>
      <description>Every AI builder knows the anxiety: you spend months engineering prompts, tuning pipelines, and chaining calls together — then a new model drops and half your work evaporates overnight. It turns out researchers have been wrestling with this exact dynamic for 30 years, and they keep arriving at the same uncomfortable answer. That answer is called the Bitter Lesson — and understanding it might be the most important thing you can do for whatever you're building right now. From Deep Blue to AlexNet to modern LLMs, scale keeps beating sophistication, and knowing which side of that line your work falls on makes all the difference.

Links

- Richard Sutton, "The Bitter Lesson" 

- Alon Halevy, Peter Norvig, and Fernando Pereira, "The Unreasonable Effectiveness of Data" 

- Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, "ImageNet Classification with Deep Convolutional Neural Networks"</description>
      <enclosure length="18516932" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/2284342199-linear-digressions-bitterlesson-produced.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/2275626722</guid>
      <title>From Atari to ChatGPT: How AI Learned to Follow Instructions</title>
      <pubDate>Mon, 09 Mar 2026 01:20:16 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/from-atari-to-chatgpt-how-ai</link>
      <itunes:duration>00:25:53</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>From Atari to ChatGPT: How AI Learned to Follow Instructions by Katie Malone</itunes:summary>
      <itunes:subtitle>From Atari to ChatGPT: How AI Learned to Follow I…</itunes:subtitle>
      <description>From Atari to ChatGPT: How AI Learned to Follow Instructions by Katie Malone</description>
      <enclosure length="24863224" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/2275626722-linear-digressions-from-atari-to-chatgpt-how-ai.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/2271342293</guid>
      <title>It's RAG time: Retrieval-Augmented Generation</title>
      <pubDate>Mon, 02 Mar 2026 02:12:11 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/its-rag-time-retrieval</link>
      <itunes:duration>00:17:14</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Today we are going to talk about the feature with the worst acronym in generative AI: RAG, or Retrieval Augmented Generation. If you've ever used something like "Chat with My Docs," if you have an internal AI chatbot that has access to your company's documents, or you've created one yourself on some kind of personal project and uploaded a bunch of documents for the AI to use — you have encountered RAG, whether you know it or not.
It's an extremely effective technique. It works super well for taking general-purpose models like ChatGPT or Claude and turning them into AIs that are aware of all the specific information that makes them truly useful in a huge variety of situations. RAG is pretty interesting under the hood, so I thought it would be fun to spend a little while talking about it.
You are listening to Linear Digressions.
RAG was first introduced in this paper from Facebook Research in 2020: https://arxiv.org/pdf/2005.11401</itunes:summary>
      <itunes:subtitle>Today we are going to talk about the feature with…</itunes:subtitle>
      <description>Today we are going to talk about the feature with the worst acronym in generative AI: RAG, or Retrieval Augmented Generation. If you've ever used something like "Chat with My Docs," if you have an internal AI chatbot that has access to your company's documents, or you've created one yourself on some kind of personal project and uploaded a bunch of documents for the AI to use — you have encountered RAG, whether you know it or not.
It's an extremely effective technique. It works super well for taking general-purpose models like ChatGPT or Claude and turning them into AIs that are aware of all the specific information that makes them truly useful in a huge variety of situations. RAG is pretty interesting under the hood, so I thought it would be fun to spend a little while talking about it.
You are listening to Linear Digressions.
RAG was first introduced in this paper from Facebook Research in 2020: https://arxiv.org/pdf/2005.11401</description>
      <enclosure length="16544992" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/2271342293-linear-digressions-its-rag-time-retrieval.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/2267064041</guid>
      <title>Chasing Away Repetitive LLM Responses with Verbalized Sampling</title>
      <pubDate>Mon, 23 Feb 2026 02:00:05 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/chasing-away-repetitive-llm</link>
      <itunes:duration>00:19:12</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>One of the things that LLMs can be really helpful with is brainstorming or generating new creative content. They are called Generative AI, after all—not just for summarization and question-and-answer tasks. But if you use LLMs for creative generation, you may find that their output starts to seem repetitive after a little while.
Let's say you're asking it to create a poem, some dialogue, or a joke. If you ask once, it'll give you something that sounds pretty reasonable. But if you ask the same thing 10 times, it might give you 10 things that sound kind of the same.
Today's episode is about a technique called verbalized sampling, and it's a way to mitigate this repetitiveness—this lack of diversity in LLM responses for creative tasks. But one of the things I really love about it is that in understanding why this repetitiveness happens and why verbalized sampling actually works as a mitigation technique, you start to get some pretty interesting insights and a deeper understanding of what's going on with LLMs under the surface.
The paper discussed in this episode is Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity
https://arxiv.org/abs/2510.01171
</itunes:summary>
      <itunes:subtitle>One of the things that LLMs can be really helpful…</itunes:subtitle>
      <description>One of the things that LLMs can be really helpful with is brainstorming or generating new creative content. They are called Generative AI, after all—not just for summarization and question-and-answer tasks. But if you use LLMs for creative generation, you may find that their output starts to seem repetitive after a little while.
Let's say you're asking it to create a poem, some dialogue, or a joke. If you ask once, it'll give you something that sounds pretty reasonable. But if you ask the same thing 10 times, it might give you 10 things that sound kind of the same.
Today's episode is about a technique called verbalized sampling, and it's a way to mitigate this repetitiveness—this lack of diversity in LLM responses for creative tasks. But one of the things I really love about it is that in understanding why this repetitiveness happens and why verbalized sampling actually works as a mitigation technique, you start to get some pretty interesting insights and a deeper understanding of what's going on with LLMs under the surface.
The paper discussed in this episode is Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity
https://arxiv.org/abs/2510.01171
</description>
      <enclosure length="18440035" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/2267064041-linear-digressions-chasing-away-repetitive-llm.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/2267326106</guid>
      <title>We're Back</title>
      <pubDate>Mon, 16 Feb 2026 03:28:44 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/were-back</link>
      <itunes:duration>00:02:58</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>It's been (*checks watch*) about five and a half years since we last talked. Fortunately nothing much has happened in the AI/data science world in that time. So let's just pick up where we left off, shall we?</itunes:summary>
      <itunes:subtitle>It's been (*checks watch*) about five and a half …</itunes:subtitle>
      <description>It's been (*checks watch*) about five and a half years since we last talked. Fortunately nothing much has happened in the AI/data science world in that time. So let's just pick up where we left off, shall we?</description>
      <enclosure length="2859342" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/2267326106-linear-digressions-were-back.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/2266737452</guid>
      <title>A Key Concept in AI Alignment: Deep Reinforcement Learning from Human Preferences</title>
      <pubDate>Sat, 14 Feb 2026 22:35:06 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/deep_rl_from_human_preferences</link>
      <itunes:duration>00:19:13</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Modern AI chatbots have a few different things that go into creating them. Today we're going to talk about a really important part of the process: the alignment training, where the chatbot goes from being just a pre-trained model—something that's kind of a fancy autocomplete—to something that really gives responses to human prompts that are more conversational, that are closer to the ones that we experience when we actually use a model like ChatGPT or Gemini or Claude.
To go from the pre-trained model to one that's aligned and ready for a human to talk with, the training process uses reinforcement learning. And a really important step in figuring out the right way to frame the reinforcement learning problem happened in 2017 with a paper that we're going to talk about today: Deep Reinforcement Learning from Human Preferences.
You are listening to Linear Digressions.
The paper discussed in this episode is Deep Reinforcement Learning from Human Preferences
https://arxiv.org/abs/1706.03741</itunes:summary>
      <itunes:subtitle>Modern AI chatbots have a few different things th…</itunes:subtitle>
      <description>Modern AI chatbots have a few different things that go into creating them. Today we're going to talk about a really important part of the process: the alignment training, where the chatbot goes from being just a pre-trained model—something that's kind of a fancy autocomplete—to something that really gives responses to human prompts that are more conversational, that are closer to the ones that we experience when we actually use a model like ChatGPT or Gemini or Claude.
To go from the pre-trained model to one that's aligned and ready for a human to talk with, the training process uses reinforcement learning. And a really important step in figuring out the right way to frame the reinforcement learning problem happened in 2017 with a paper that we're going to talk about today: Deep Reinforcement Learning from Human Preferences.
You are listening to Linear Digressions.
The paper discussed in this episode is Deep Reinforcement Learning from Human Preferences
https://arxiv.org/abs/1706.03741</description>
      <enclosure length="18453420" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/2266737452-linear-digressions-deep_rl_from_human_preferences.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/2266663502</guid>
      <title>The Impact of Generative AI on Critical Thinking</title>
      <pubDate>Sat, 14 Feb 2026 19:42:51 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/generative-ai-and-critical</link>
      <itunes:duration>00:25:33</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>I use LLMs a lot. I use them in my work, I use them in my personal life, and sometimes I use them to help me with stuff that I already know how to do. I’m working on something and I just want to make it a little bit easier, and it does make it easier for sure.

But something that I worry about sometimes is that over the long run, I'm going to pay a price for that. I'm going to get lazier, I'm going to get a little bit dumber. And the question is, as I'm outsourcing my thinking to LLMs, am I becoming reliant on them? If they were ever to go away, would I lose my ability to do basic things? I like feeling like I'm a smart, capable person; am I letting that slip away, without realizing it, just because I want it to be easier to do meal planning for the week?

In this episode of Linear Digressions, we're going to talk about a paper studying just this issue, trying to understand how and when people think critically. How much do we engage cognitively with our work when we’re using LLMs, versus not?

The paper discussed in this episode is The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence Effects From a Survey of Knowledge Workers
https://www.microsoft.com/en-us/research/wp-content/uploads/2025/01/lee_2025_ai_critical_thinking_survey.pdf</itunes:summary>
      <itunes:subtitle>I use LLMs a lot. I use them in my work, I use th…</itunes:subtitle>
      <description>I use LLMs a lot. I use them in my work, I use them in my personal life, and sometimes I use them to help me with stuff that I already know how to do. I’m working on something and I just want to make it a little bit easier, and it does make it easier for sure.

But something that I worry about sometimes is that over the long run, I'm going to pay a price for that. I'm going to get lazier, I'm going to get a little bit dumber. And the question is, as I'm outsourcing my thinking to LLMs, am I becoming reliant on them? If they were ever to go away, would I lose my ability to do basic things? I like feeling like I'm a smart, capable person; am I letting that slip away, without realizing it, just because I want it to be easier to do meal planning for the week?

In this episode of Linear Digressions, we're going to talk about a paper studying just this issue, trying to understand how and when people think critically. How much do we engage cognitively with our work when we’re using LLMs, versus not?

The paper discussed in this episode is The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence Effects From a Survey of Knowledge Workers
https://www.microsoft.com/en-us/research/wp-content/uploads/2025/01/lee_2025_ai_critical_thinking_survey.pdf</description>
      <enclosure length="24530955" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/2266663502-linear-digressions-generative-ai-and-critical.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/865092628</guid>
      <title>So long, and thanks for all the fish</title>
      <pubDate>Sun, 26 Jul 2020 23:32:44 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/so-long-and-thanks-for-all-the-fish</link>
      <itunes:duration>00:35:44</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>All good things must come to an end, including this podcast. This is the last episode we plan to release, and it doesn’t cover data science—it’s mostly reminiscing, thanking our wonderful audience (that’s you!), and marveling at how this thing that started out as a side project grew into a huge part of our lives for over 5 years.

It’s been a ride, and a real pleasure and privilege to talk to you each week. Thanks, best wishes, and good night!

—Katie and Ben</itunes:summary>
      <itunes:subtitle>All good things must come to an end, including th…</itunes:subtitle>
      <description>All good things must come to an end, including this podcast. This is the last episode we plan to release, and it doesn’t cover data science—it’s mostly reminiscing, thanking our wonderful audience (that’s you!), and marveling at how this thing that started out as a side project grew into a huge part of our lives for over 5 years.

It’s been a ride, and a real pleasure and privilege to talk to you each week. Thanks, best wishes, and good night!

—Katie and Ben</description>
      <enclosure length="17156421" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/865092628-linear-digressions-so-long-and-thanks-for-all-the-fish.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/861001726</guid>
      <title>A Reality Check on AI-Driven Medical Assistants</title>
      <pubDate>Sun, 19 Jul 2020 23:51:31 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/a-reality-check-on-ai-driven-medical-assistants</link>
      <itunes:duration>00:14:00</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>The data science and artificial intelligence community has made amazing strides in the past few years to algorithmically automate portions of the healthcare process. This episode looks at two computer vision algorithms, one that diagnoses diabetic retinopathy and another that classifies liver cancer, and asks the question—are patients now getting better care, and achieving better outcomes, with these algorithms in the mix? The answer isn’t no, exactly, but it’s not a resounding yes, because these algorithms interact with a very complex system (the healthcare system) and other shortcomings of that system are proving hard to automate away. Getting a faster diagnosis from an image might not be an improvement if the image is now harder to capture (because of strict data quality requirements associated with the algorithm that wouldn’t stop a human doing the same job). Likewise, an algorithm getting a prediction mostly correct might not be an overall benefit if it introduces more dramatic failures when the prediction happens to be wrong. For every data scientist whose work is deployed into some kind of product, and is being used to solve real-world problems, these papers underscore how important and difficult it is to consider all the context around those problems.</itunes:summary>
      <itunes:subtitle>The data science and artificial intelligence comm…</itunes:subtitle>
      <description>The data science and artificial intelligence community has made amazing strides in the past few years to algorithmically automate portions of the healthcare process. This episode looks at two computer vision algorithms, one that diagnoses diabetic retinopathy and another that classifies liver cancer, and asks the question—are patients now getting better care, and achieving better outcomes, with these algorithms in the mix? The answer isn’t no, exactly, but it’s not a resounding yes, because these algorithms interact with a very complex system (the healthcare system) and other shortcomings of that system are proving hard to automate away. Getting a faster diagnosis from an image might not be an improvement if the image is now harder to capture (because of strict data quality requirements associated with the algorithm that wouldn’t stop a human doing the same job). Likewise, an algorithm getting a prediction mostly correct might not be an overall benefit if it introduces more dramatic failures when the prediction happens to be wrong. For every data scientist whose work is deployed into some kind of product, and is being used to solve real-world problems, these papers underscore how important and difficult it is to consider all the context around those problems.</description>
      <enclosure length="6721861" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/861001726-linear-digressions-a-reality-check-on-ai-driven-medical-assistants.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/856929811</guid>
      <title>A Data Science Take on Open Policing Data</title>
      <pubDate>Mon, 13 Jul 2020 02:02:39 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/a-data-science-take-on-open-policing-data</link>
      <itunes:duration>00:23:44</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>A few weeks ago, we put out a call: data scientists interested in issues of race and racism, or in how those topics can be studied with data science methods, should get in touch to come talk to our audience about their work. This week we’re excited to bring on Todd Hendricks, Bay Area data scientist and a volunteer who reached out to tell us about his studies with the Stanford Open Policing dataset.</itunes:summary>
      <itunes:subtitle>A few weeks ago, we put out a call for data scien…</itunes:subtitle>
      <description>A few weeks ago, we put out a call: data scientists interested in issues of race and racism, or in how those topics can be studied with data science methods, should get in touch to come talk to our audience about their work. This week we’re excited to bring on Todd Hendricks, Bay Area data scientist and a volunteer who reached out to tell us about his studies with the Stanford Open Policing dataset.</description>
      <enclosure length="11398197" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/856929811-linear-digressions-a-data-science-take-on-open-policing-data.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/852846283</guid>
      <title>Procella: YouTube's super-system for analytics data storage</title>
      <pubDate>Mon, 06 Jul 2020 02:29:24 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/procella-youtubes-super-system-for-analytics-data-storage-1</link>
      <itunes:duration>00:29:48</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>This is a re-release of an episode that originally ran in October 2019.

If you’re trying to manage a project that serves up analytics data for a few very distinct uses, you’d be wise to consider having custom solutions for each use case, optimized for the needs and constraints of that use case. You also wouldn’t be YouTube, which found itself with this problem (gigantic data needs and several very different use cases of what they needed to do with that data) and went a different way: they built one analytics data system to serve them all. Procella, the system they built, is the topic of our episode today: by deconstructing the system, we dig into the four motivating uses of this system, the complexity they had to introduce to service all four uses simultaneously, and the impressive engineering that has to go into building something that “just works.”</itunes:summary>
      <itunes:subtitle>This is a re-release of an episode that originall…</itunes:subtitle>
      <description>This is a re-release of an episode that originally ran in October 2019.

If you’re trying to manage a project that serves up analytics data for a few very distinct uses, you’d be wise to consider having custom solutions for each use case, optimized for the needs and constraints of that use case. You also wouldn’t be YouTube, which found itself with this problem (gigantic data needs and several very different use cases of what they needed to do with that data) and went a different way: they built one analytics data system to serve them all. Procella, the system they built, is the topic of our episode today: by deconstructing the system, we dig into the four motivating uses of this system, the complexity they had to introduce to service all four uses simultaneously, and the impressive engineering that has to go into building something that “just works.”</description>
      <enclosure length="14309283" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/852846283-linear-digressions-procella-youtubes-super-system-for-analytics-data-storage-1.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/848718820</guid>
      <title>The Data Science Open Source Ecosystem</title>
      <pubDate>Mon, 29 Jun 2020 02:34:48 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/the-data-science-open-source-ecosystem</link>
      <itunes:duration>00:23:06</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Open source software is ubiquitous throughout data science, and enables the work of nearly every data scientist in some way or another. Open source projects, however, are disproportionately maintained by a small number of individuals, some of whom are institutionally supported, but many of whom do this maintenance on a purely volunteer basis. The health of the data science ecosystem depends on the support of open source projects, on an individual and institutional level.

https://hdsr.mitpress.mit.edu/pub/xsrt4zs2/release/2</itunes:summary>
      <itunes:subtitle>Open source software is ubiquitous throughout dat…</itunes:subtitle>
      <description>Open source software is ubiquitous throughout data science, and enables the work of nearly every data scientist in some way or another. Open source projects, however, are disproportionately maintained by a small number of individuals, some of whom are institutionally supported, but many of whom do this maintenance on a purely volunteer basis. The health of the data science ecosystem depends on the support of open source projects, on an individual and institutional level.

https://hdsr.mitpress.mit.edu/pub/xsrt4zs2/release/2</description>
      <enclosure length="11089535" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/848718820-linear-digressions-the-data-science-open-source-ecosystem.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/844436710</guid>
      <title>Rock the ROC Curve</title>
      <pubDate>Sun, 21 Jun 2020 23:34:29 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/rock-the-roc-curve-1</link>
      <itunes:duration>00:15:52</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>This is a re-release of an episode that first ran on January 29, 2017.

This week: everybody's favorite WWII-era classifier metric! But it's not just for winning wars; it's a fantastic go-to metric for all your classifier quality needs.</itunes:summary>
      <itunes:subtitle>This is a re-release of an episode that first ran…</itunes:subtitle>
      <description>This is a re-release of an episode that first ran on January 29, 2017.

This week: everybody's favorite WWII-era classifier metric! But it's not just for winning wars; it's a fantastic go-to metric for all your classifier quality needs.</description>
      <enclosure length="22864490" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/844436710-linear-digressions-rock-the-roc-curve-1.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/840291976</guid>
      <title>Criminology and Data Science</title>
      <pubDate>Mon, 15 Jun 2020 01:26:26 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/criminology-and-data-science</link>
      <itunes:duration>00:30:57</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>This episode features Zach Drake, a working data scientist and PhD candidate in the Criminology, Law and Society program at George Mason University. Zach specializes in bringing data science methods to studies of criminal behavior, and got in touch after our last episode (about racially complicated recidivism algorithms). Our conversation covers a wide range of topics—common misconceptions around race and crime statistics, how methodologically-driven criminology scholars think about building crime prediction models, and how to think about policy changes when we don’t have a complete understanding of cause and effect in criminology. For the many of us currently re-thinking race and criminal justice, but wanting to be data-driven about it, this conversation with Zach is a must-listen.</itunes:summary>
      <itunes:subtitle>This episode features Zach Drake, a working data …</itunes:subtitle>
      <description>This episode features Zach Drake, a working data scientist and PhD candidate in the Criminology, Law and Society program at George Mason University. Zach specializes in bringing data science methods to studies of criminal behavior, and got in touch after our last episode (about racially complicated recidivism algorithms). Our conversation covers a wide range of topics—common misconceptions around race and crime statistics, how methodologically-driven criminology scholars think about building crime prediction models, and how to think about policy changes when we don’t have a complete understanding of cause and effect in criminology. For the many of us currently re-thinking race and criminal justice, but wanting to be data-driven about it, this conversation with Zach is a must-listen.</description>
      <enclosure length="14860571" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/840291976-linear-digressions-criminology-and-data-science.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/836019931</guid>
      <title>Racism, the criminal justice system, and data science</title>
      <pubDate>Sun, 07 Jun 2020 23:33:53 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/racism-the-criminal-justice-system-and-data-science</link>
      <itunes:duration>00:31:36</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>As protests sweep across the United States in the wake of the killing of George Floyd by a Minneapolis police officer, we take a moment to dig into one of the ways that data science perpetuates and amplifies racism in the American criminal justice system. COMPAS is an algorithm that claims to give a prediction about the likelihood of an offender to re-offend if released, based on the attributes of the individual, and guess what: it shows disparities in the predictions for black and white offenders that would nudge judges toward giving harsher sentences to black individuals. 

We dig into this algorithm a little more deeply, unpacking how different metrics give different pictures into the “fairness” of the predictions and what is causing its racially disparate output (to wit: race is explicitly not an input to the algorithm, and yet the algorithm gives outputs that correlate with race—what gives?) Unfortunately it’s not an open-and-shut case of a tuning parameter being off, or the wrong metric being used: instead the biases in the justice system itself are being captured in the algorithm outputs, in such a way that a self-fulfilling prophecy of harsher treatment for black defendants is all but guaranteed. Like many other things this week, this episode left us thinking about bigger, systemic issues, and why it’s proven so hard for years to fix what’s broken.</itunes:summary>
      <itunes:subtitle>As protests sweep across the United States in the…</itunes:subtitle>
      <description>As protests sweep across the United States in the wake of the killing of George Floyd by a Minneapolis police officer, we take a moment to dig into one of the ways that data science perpetuates and amplifies racism in the American criminal justice system. COMPAS is an algorithm that claims to give a prediction about the likelihood of an offender to re-offend if released, based on the attributes of the individual, and guess what: it shows disparities in the predictions for black and white offenders that would nudge judges toward giving harsher sentences to black individuals. 

We dig into this algorithm a little more deeply, unpacking how different metrics give different pictures into the “fairness” of the predictions and what is causing its racially disparate output (to wit: race is explicitly not an input to the algorithm, and yet the algorithm gives outputs that correlate with race—what gives?) Unfortunately it’s not an open-and-shut case of a tuning parameter being off, or the wrong metric being used: instead the biases in the justice system itself are being captured in the algorithm outputs, in such a way that a self-fulfilling prophecy of harsher treatment for black defendants is all but guaranteed. Like many other things this week, this episode left us thinking about bigger, systemic issues, and why it’s proven so hard for years to fix what’s broken.</description>
      <enclosure length="15172160" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/836019931-linear-digressions-racism-the-criminal-justice-system-and-data-science.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/834371026</guid>
      <title>An interstitial word from Ben</title>
      <pubDate>Fri, 05 Jun 2020 01:38:43 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/an-interstitial-word-from-ben</link>
      <itunes:duration>00:05:59</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>A message from Ben around algorithmic bias, and how our models are sometimes reflections of ourselves.</itunes:summary>
      <itunes:subtitle>A message from Ben around algorithmic bias, and h…</itunes:subtitle>
      <description>A message from Ben around algorithmic bias, and how our models are sometimes reflections of ourselves.</description>
      <enclosure length="8637043" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/834371026-linear-digressions-an-interstitial-word-from-ben.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/831843865</guid>
      <title>Convolutional Neural Networks</title>
      <pubDate>Sun, 31 May 2020 21:46:31 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/convolutional-neural-networks</link>
      <itunes:duration>00:21:55</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>This is a re-release of an episode that originally aired on April 1, 2018

If you've done image recognition or computer vision tasks with a neural network, you've probably used a convolutional neural net. This episode is all about the architecture and implementation details of convolutional networks, and the tricks that make them so good at image tasks.</itunes:summary>
      <itunes:subtitle>This is a re-release of an episode that originall…</itunes:subtitle>
      <description>This is a re-release of an episode that originally aired on April 1, 2018

If you've done image recognition or computer vision tasks with a neural network, you've probably used a convolutional neural net. This episode is all about the architecture and implementation details of convolutional networks, and the tricks that make them so good at image tasks.</description>
      <enclosure length="10523618" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/831843865-linear-digressions-convolutional-neural-networks.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/827294104</guid>
      <title>Stein's Paradox</title>
      <pubDate>Sun, 24 May 2020 22:21:28 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/steins-paradox-1</link>
      <itunes:duration>00:27:02</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>This is a re-release of an episode that was originally released on February 26, 2017.

When you're estimating something about some object that's a member of a larger group of similar objects (say, the batting average of a baseball player, who belongs to a baseball team), how should you estimate it: use measurements of the individual, or get some extra information from the group? The James-Stein estimator tells you how to combine individual and group information to make predictions that, taken over the whole group, are more accurate than if you treated each individual, well, individually.</itunes:summary>
      <itunes:subtitle>This is a re-release of an episode that was origi…</itunes:subtitle>
      <description>This is a re-release of an episode that was originally released on February 26, 2017.

When you're estimating something about some object that's a member of a larger group of similar objects (say, the batting average of a baseball player, who belongs to a baseball team), how should you estimate it: use measurements of the individual, or get some extra information from the group? The James-Stein estimator tells you how to combine individual and group information to make predictions that, taken over the whole group, are more accurate than if you treated each individual, well, individually.</description>
      <enclosure length="38944215" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/827294104-linear-digressions-steins-paradox-1.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/822691957</guid>
      <title>Protecting Individual-Level Census Data with Differential Privacy</title>
      <pubDate>Mon, 18 May 2020 01:49:22 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/protecting-individual-level-census-data-with-differential-privacy</link>
      <itunes:duration>00:21:19</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>The power of finely-grained, individual-level data comes with a drawback: it compromises the privacy of potentially anyone and everyone in the dataset. Even for de-identified datasets, there can be ways to re-identify the records or otherwise figure out sensitive personal information. That problem has motivated the study of differential privacy, a set of techniques and definitions for keeping personal information private when datasets are released or used for study. Differential privacy is getting a big boost this year, as it’s being implemented across the 2020 US Census as a way of protecting the privacy of census respondents while still opening up the dataset for research and policy use. When two important topics come together like this, we can’t help but sit up and pay attention.</itunes:summary>
      <itunes:subtitle>The power of finely-grained, individual-level dat…</itunes:subtitle>
      <description>The power of finely-grained, individual-level data comes with a drawback: it compromises the privacy of potentially anyone and everyone in the dataset. Even for de-identified datasets, there can be ways to re-identify the records or otherwise figure out sensitive personal information. That problem has motivated the study of differential privacy, a set of techniques and definitions for keeping personal information private when datasets are released or used for study. Differential privacy is getting a big boost this year, as it’s being implemented across the 2020 US Census as a way of protecting the privacy of census respondents while still opening up the dataset for research and policy use. When two important topics come together like this, we can’t help but sit up and pay attention.</description>
      <enclosure length="10233763" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/822691957-linear-digressions-protecting-individual-level-census-data-with-differential-privacy.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/817892635</guid>
      <title>Causal Trees</title>
      <pubDate>Mon, 11 May 2020 01:34:33 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/causal-trees</link>
      <itunes:duration>00:15:27</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>What do you get when you combine the causal inference needs of econometrics with the data-driven methodology of machine learning? Usually these two don’t go well together (deriving causal conclusions from naive data methods leads to biased answers) but economists Susan Athey and Guido Imbens are on the case. This episode explores their algorithm for recursively partitioning a dataset to find heterogeneous treatment effects, or for you ML nerds, applying decision trees to causal inference problems. It’s not a free lunch, but for those (like us!) who love crossover topics, causal trees are a smart approach from one field hopping the fence to another.

Relevant links:
https://www.pnas.org/content/113/27/7353</itunes:summary>
      <itunes:subtitle>What do you get when you combine the causal infer…</itunes:subtitle>
      <description>What do you get when you combine the causal inference needs of econometrics with the data-driven methodology of machine learning? Usually these two don’t go well together (deriving causal conclusions from naive data methods leads to biased answers) but economists Susan Athey and Guido Imbens are on the case. This episode explores their algorithm for recursively partitioning a dataset to find heterogeneous treatment effects, or for you ML nerds, applying decision trees to causal inference problems. It’s not a free lunch, but for those (like us!) who love crossover topics, causal trees are a smart approach from one field hopping the fence to another.

Relevant links:
https://www.pnas.org/content/113/27/7353</description>
      <enclosure length="7420480" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/817892635-linear-digressions-causal-trees.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/813023281</guid>
      <title>The Grammar Of Graphics</title>
      <pubDate>Mon, 04 May 2020 01:12:53 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/the-grammar-of-graphics</link>
      <itunes:duration>00:35:38</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>You may not realize it consciously, but beautiful visualizations have rules. The rules are often implicit and manifest themselves as expectations about how the data is summarized, presented, and annotated so you can quickly extract the information in the underlying data using just visual cues. It’s a bit abstract but very profound, and these principles underlie the ggplot2 package in R that makes famously beautiful plots with minimal code. This episode covers a paper by Hadley Wickham (author of ggplot2, among other R packages) that unpacks the layered approach to graphics taken in ggplot2, and makes clear the assumptions and structure of many familiar data visualizations.</itunes:summary>
      <itunes:subtitle>You may not realize it consciously, but beautiful…</itunes:subtitle>
      <description>You may not realize it consciously, but beautiful visualizations have rules. The rules are often implicit and manifest themselves as expectations about how the data is summarized, presented, and annotated so you can quickly extract the information in the underlying data using just visual cues. It’s a bit abstract but very profound, and these principles underlie the ggplot2 package in R that makes famously beautiful plots with minimal code. This episode covers a paper by Hadley Wickham (author of ggplot2, among other R packages) that unpacks the layered approach to graphics taken in ggplot2, and makes clear the assumptions and structure of many familiar data visualizations.</description>
      <enclosure length="17104385" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/813023281-linear-digressions-the-grammar-of-graphics.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/807898102</guid>
      <title>Gaussian Processes</title>
      <pubDate>Mon, 27 Apr 2020 01:33:43 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/gaussian-processes</link>
      <itunes:duration>00:20:55</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>It’s pretty common to fit a function to a dataset when you’re a data scientist. But in many cases, it’s not clear what kind of function might be most appropriate—linear? quadratic? sinusoidal? some combination of these, and perhaps others? Gaussian processes introduce a nonparametric option where you can fit over all the possible types of functions, using the data points in your datasets as constraints on the results that you get (the idea being that, no matter what the “true” underlying function is, it produced the data points you’re trying to fit). What this means is a very flexible, but depending on your parameters not-too-flexible, way to fit complex datasets.

The math underlying GPs gets complex, and the links below contain some excellent visualizations that help make the underlying concepts clearer. Check them out!

Relevant links:
http://katbailey.github.io/post/gaussian-processes-for-dummies/
https://thegradient.pub/gaussian-process-not-quite-for-dummies/
https://distill.pub/2019/visual-exploration-gaussian-processes/</itunes:summary>
      <itunes:subtitle>It’s pretty common to fit a function to a dataset…</itunes:subtitle>
      <description>It’s pretty common to fit a function to a dataset when you’re a data scientist. But in many cases, it’s not clear what kind of function might be most appropriate—linear? quadratic? sinusoidal? some combination of these, and perhaps others? Gaussian processes introduce a nonparametric option where you can fit over all the possible types of functions, using the data points in your dataset as constraints on the results that you get (the idea being that, no matter what the “true” underlying function is, it produced the data points you’re trying to fit). What this means is a very flexible (though, depending on your parameters, not too flexible) way to fit complex datasets.

The math underlying GPs gets complex, and the links below contain some excellent visualizations that help make the underlying concepts clearer. Check them out!

Relevant links:
http://katbailey.github.io/post/gaussian-processes-for-dummies/
https://thegradient.pub/gaussian-process-not-quite-for-dummies/
https://distill.pub/2019/visual-exploration-gaussian-processes/</description>
      <enclosure length="10043383" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/807898102-linear-digressions-gaussian-processes.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/802709605</guid>
      <title>Keeping ourselves honest when we work with observational healthcare data</title>
      <pubDate>Mon, 20 Apr 2020 02:43:37 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/keeping-ourselves-honest-when-we-work-with-observational-healthcare-data</link>
      <itunes:duration>00:19:08</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>The abundance of data in healthcare, and the value we could capture from structuring and analyzing that data, is a huge opportunity. It also presents huge challenges. One of the biggest challenges is how, exactly, to do that structuring and analysis—data scientists working with this data have hundreds or thousands of small, and sometimes large, decisions to make in their day-to-day analysis work. What data should they include in their studies? What method should they use to analyze it? What hyperparameter settings should they explore, and how should they pick a value for their hyperparameters? The thing that’s really difficult here is that, depending on which path they choose among many reasonable options, a data scientist can get really different answers to the underlying question, which makes you wonder how to conclude anything with certainty at all.

The paper for this week’s episode performs a systematic study of many, many different permutations of the questions above on a set of benchmark datasets where the “right” answers are known. Which strategies are most likely to yield the “right” answers? That’s the whole topic of discussion.

Relevant links:
https://hdsr.mitpress.mit.edu/pub/fxz7kr65</itunes:summary>
      <itunes:subtitle>The abundance of data in healthcare, and the valu…</itunes:subtitle>
      <description>The abundance of data in healthcare, and the value we could capture from structuring and analyzing that data, is a huge opportunity. It also presents huge challenges. One of the biggest challenges is how, exactly, to do that structuring and analysis—data scientists working with this data have hundreds or thousands of small, and sometimes large, decisions to make in their day-to-day analysis work. What data should they include in their studies? What method should they use to analyze it? What hyperparameter settings should they explore, and how should they pick a value for their hyperparameters? The thing that’s really difficult here is that, depending on which path they choose among many reasonable options, a data scientist can get really different answers to the underlying question, which makes you wonder how to conclude anything with certainty at all.

The paper for this week’s episode performs a systematic study of many, many different permutations of the questions above on a set of benchmark datasets where the “right” answers are known. Which strategies are most likely to yield the “right” answers? That’s the whole topic of discussion.

Relevant links:
https://hdsr.mitpress.mit.edu/pub/fxz7kr65</description>
      <enclosure length="9188447" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/802709605-linear-digressions-keeping-ourselves-honest-when-we-work-with-observational-healthcare-data.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/797179852</guid>
      <title>Changing our formulation of AI to avoid runaway risks: Interview with Prof. Stuart Russell</title>
      <pubDate>Mon, 13 Apr 2020 01:55:01 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/changing-our-formulation-of-ai-to-avoid-runaway-risks-interview-with-prof-stuart-russell</link>
      <itunes:duration>00:28:58</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>AI is evolving incredibly quickly, and thinking now about where it might go next (and how we as a species and a society should be prepared) is critical. Professor Stuart Russell, an AI expert at UC Berkeley, has a formulation for modifications to AI that we should study and try implementing now to keep it much safer in the long run. Prof. Russell’s new book, “Human Compatible: Artificial Intelligence and the Problem of Control” gives an accessible but deeply thoughtful exploration of why he thinks runaway AI is something we need to be considering seriously now, and what changes in formulation might be a solution. This episode features Prof. Russell as a special guest, exploring the topics in his book and giving more perspective on the long-term possible futures of AI: both good and bad.

Relevant links:
https://www.penguinrandomhouse.com/books/566677/human-compatible-by-stuart-russell/</itunes:summary>
      <itunes:subtitle>AI is evolving incredibly quickly, and thinking n…</itunes:subtitle>
      <description>AI is evolving incredibly quickly, and thinking now about where it might go next (and how we as a species and a society should be prepared) is critical. Professor Stuart Russell, an AI expert at UC Berkeley, has a formulation for modifications to AI that we should study and try implementing now to keep it much safer in the long run. Prof. Russell’s new book, “Human Compatible: Artificial Intelligence and the Problem of Control” gives an accessible but deeply thoughtful exploration of why he thinks runaway AI is something we need to be considering seriously now, and what changes in formulation might be a solution. This episode features Prof. Russell as a special guest, exploring the topics in his book and giving more perspective on the long-term possible futures of AI: both good and bad.

Relevant links:
https://www.penguinrandomhouse.com/books/566677/human-compatible-by-stuart-russell/</description>
      <enclosure length="13907265" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/797179852-linear-digressions-changing-our-formulation-of-ai-to-avoid-runaway-risks-interview-with-prof-stuart-russell.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/791823109</guid>
      <title>Putting machine learning into a database</title>
      <pubDate>Mon, 06 Apr 2020 01:51:56 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/putting-machine-learning-into-a-database</link>
      <itunes:duration>00:24:22</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Most data scientists bounce back and forth regularly between doing analysis in databases using SQL and building and deploying machine learning pipelines in R or Python. But if we think ahead a few years, a few visionary researchers are starting to see a world in which the ML pipelines can actually be deployed inside the database. Why? One strong advantage for databases is they have built-in features for data governance, including things like permissioning access and tracking the provenance of data. Adding machine learning as another thing you can do in a database means that, potentially, these enterprise-grade features will be available for ML models too, which will make them much more widely accepted across enterprises with tight IT policies. The papers this week articulate the gap between enterprise needs and current ML infrastructure, how ML in a database could be a way to knit the two closer together, and a proof-of-concept that ML in a database can actually work.

Relevant links:
https://blog.acolyer.org/2020/02/19/ten-year-egml-predictions/
https://blog.acolyer.org/2020/02/21/extending-relational-query-processing/</itunes:summary>
      <itunes:subtitle>Most data scientists bounce back and forth regula…</itunes:subtitle>
      <description>Most data scientists bounce back and forth regularly between doing analysis in databases using SQL and building and deploying machine learning pipelines in R or Python. But if we think ahead a few years, a few visionary researchers are starting to see a world in which the ML pipelines can actually be deployed inside the database. Why? One strong advantage for databases is they have built-in features for data governance, including things like permissioning access and tracking the provenance of data. Adding machine learning as another thing you can do in a database means that, potentially, these enterprise-grade features will be available for ML models too, which will make them much more widely accepted across enterprises with tight IT policies. The papers this week articulate the gap between enterprise needs and current ML infrastructure, how ML in a database could be a way to knit the two closer together, and a proof-of-concept that ML in a database can actually work.

Relevant links:
https://blog.acolyer.org/2020/02/19/ten-year-egml-predictions/
https://blog.acolyer.org/2020/02/21/extending-relational-query-processing/</description>
      <enclosure length="11701427" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/791823109-linear-digressions-putting-machine-learning-into-a-database.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/786341650</guid>
      <title>The work-from-home episode</title>
      <pubDate>Sun, 29 Mar 2020 22:23:42 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/the-work-from-home-episode</link>
      <itunes:duration>00:29:06</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Many of us have the privilege of working from home right now, in an effort to keep ourselves and our family safe and slow the transmission of covid-19. But working from home is an adjustment for many of us, and can hold some challenges compared to coming in to the office every day. This episode explores this a little bit, informally, as we compare our new work-from-home setups and reflect on what’s working well and what we’re finding challenging.</itunes:summary>
      <itunes:subtitle>Many of us have the privilege of working from hom…</itunes:subtitle>
      <description>Many of us have the privilege of working from home right now, in an effort to keep ourselves and our family safe and slow the transmission of covid-19. But working from home is an adjustment for many of us, and can hold some challenges compared to coming in to the office every day. This episode explores this a little bit, informally, as we compare our new work-from-home setups and reflect on what’s working well and what we’re finding challenging.</description>
      <enclosure length="13974080" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/786341650-linear-digressions-the-work-from-home-episode.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/781265842</guid>
      <title>Understanding Covid-19 transmission: what the data suggests about how the disease spreads</title>
      <pubDate>Mon, 23 Mar 2020 01:03:34 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/understanding-covid-19-transmission-what-the-data-suggests-about-how-the-disease-spreads</link>
      <itunes:duration>00:25:25</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Covid-19 is turning the world upside down right now. One thing that’s extremely important to understand, in order to fight it as effectively as possible, is how the virus spreads and especially how much of the spread of the disease comes from carriers who are experiencing no or mild symptoms but are contagious anyway. This episode digs into the epidemiological model that was published in Science this week—this model finds that the data suggests that the majority of carriers of the coronavirus, 80-90%, do not have a detected disease. This has big implications for the importance of social distancing as a way to get the pandemic under control and explains why a more comprehensive testing program is critical for the United States.

Also, in lighter news, Katie (a native of Dayton, Ohio) lays a data-driven claim for just declaring the University of Dayton Flyers to be the 2020 NCAA College Basketball champions.

Relevant links:
https://science.sciencemag.org/content/early/2020/03/13/science.abb3221</itunes:summary>
      <itunes:subtitle>Covid-19 is turning the world upside down right n…</itunes:subtitle>
      <description>Covid-19 is turning the world upside down right now. One thing that’s extremely important to understand, in order to fight it as effectively as possible, is how the virus spreads and especially how much of the spread of the disease comes from carriers who are experiencing no or mild symptoms but are contagious anyway. This episode digs into the epidemiological model that was published in Science this week—this model finds that the data suggests that the majority of carriers of the coronavirus, 80-90%, do not have a detected disease. This has big implications for the importance of social distancing as a way to get the pandemic under control and explains why a more comprehensive testing program is critical for the United States.

Also, in lighter news, Katie (a native of Dayton, Ohio) lays a data-driven claim for just declaring the University of Dayton Flyers to be the 2020 NCAA College Basketball champions.

Relevant links:
https://science.sciencemag.org/content/early/2020/03/13/science.abb3221</description>
      <enclosure length="12205904" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/781265842-linear-digressions-understanding-covid-19-transmission-what-the-data-suggests-about-how-the-disease-spreads.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/776732974</guid>
      <title>Network effects re-release: when the power of a public health measure lies in widespread adoption</title>
      <pubDate>Sun, 15 Mar 2020 22:43:38 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/network-effects-re-release-when-the-power-of-a-public-health-measure-lies-in-widespread-adoption</link>
      <itunes:duration>00:26:40</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>This week’s episode is a re-release of a recent episode, which we don’t usually do but it seems important for understanding what we can all do to slow the spread of covid-19. In brief, public health measures for infectious diseases get most of their effectiveness from their widespread adoption: most of the protection you get from a vaccine, for example, comes from all the other people who also got the vaccine.

That’s why measures like social distancing are so important right now: even if you’re not in a high-risk group for covid-19, you should still stay home and avoid in-person socializing because your good behavior lowers the risk for those who are in high-risk groups. If we all take these kinds of measures, the risk lowers dramatically. So stay home, work remotely if you can, avoid physical contact with others, and do your part to manage this crisis. We’re all in this together.</itunes:summary>
      <itunes:subtitle>This week’s episode is a re-release of a recent e…</itunes:subtitle>
      <description>This week’s episode is a re-release of a recent episode, which we don’t usually do but it seems important for understanding what we can all do to slow the spread of covid-19. In brief, public health measures for infectious diseases get most of their effectiveness from their widespread adoption: most of the protection you get from a vaccine, for example, comes from all the other people who also got the vaccine.

That’s why measures like social distancing are so important right now: even if you’re not in a high-risk group for covid-19, you should still stay home and avoid in-person socializing because your good behavior lowers the risk for those who are in high-risk groups. If we all take these kinds of measures, the risk lowers dramatically. So stay home, work remotely if you can, avoid physical contact with others, and do your part to manage this crisis. We’re all in this together.</description>
      <enclosure length="12801077" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/776732974-linear-digressions-network-effects-re-release-when-the-power-of-a-public-health-measure-lies-in-widespread-adoption.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/772966369</guid>
      <title>Causal inference when you can't experiment: difference-in-differences and synthetic controls</title>
      <pubDate>Mon, 09 Mar 2020 01:39:19 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/causal-inference-when-you-cant-experiment-difference-in-differences-and-synthetic-controls</link>
      <itunes:duration>00:20:48</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>When you need to untangle cause and effect, but you can’t run an experiment, it’s time to get creative. This episode covers difference-in-differences and synthetic controls, two observational causal inference techniques that researchers have used to understand causality in complex real-world situations.</itunes:summary>
      <itunes:subtitle>When you need to untangle cause and effect, but y…</itunes:subtitle>
      <description>When you need to untangle cause and effect, but you can’t run an experiment, it’s time to get creative. This episode covers difference-in-differences and synthetic controls, two observational causal inference techniques that researchers have used to understand causality in complex real-world situations.</description>
      <enclosure length="9986540" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/772966369-linear-digressions-causal-inference-when-you-cant-experiment-difference-in-differences-and-synthetic-controls.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/769281757</guid>
      <title>Better know a distribution: the Poisson distribution</title>
      <pubDate>Mon, 02 Mar 2020 02:55:28 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/better-know-a-distribution-the-poisson-distribution-1</link>
      <itunes:duration>00:31:51</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>This is a re-release of an episode that originally ran on October 21, 2018.

The Poisson distribution is a probability distribution function used for events that happen in time or space. It’s super handy because it’s pretty simple to use and is applicable for tons of things—there are a lot of interesting processes that boil down to “events that happen in time or space.” This episode is a quick introduction to the distribution, and then a focus on two of our favorite everyday applications: using the Poisson distribution to identify supernovas and study army deaths from horse kicks.</itunes:summary>
      <itunes:subtitle>This is a re-release of an episode that originall…</itunes:subtitle>
      <description>This is a re-release of an episode that originally ran on October 21, 2018.

The Poisson distribution is a probability distribution function used for events that happen in time or space. It’s super handy because it’s pretty simple to use and is applicable for tons of things—there are a lot of interesting processes that boil down to “events that happen in time or space.” This episode is a quick introduction to the distribution, and then a focus on two of our favorite everyday applications: using the Poisson distribution to identify supernovas and study army deaths from horse kicks.</description>
      <enclosure length="15293995" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/769281757-linear-digressions-better-know-a-distribution-the-poisson-distribution-1.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/765465880</guid>
      <title>The Lottery Ticket Hypothesis</title>
      <pubDate>Sun, 23 Feb 2020 23:03:25 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/the-lottery-ticket-hypothesis</link>
      <itunes:duration>00:19:45</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Recent research into neural networks reveals that sometimes, not all parts of the neural net are equally responsible for the performance of the network overall. Instead, it seems like (in some neural nets, at least) there are smaller subnetworks present where most of the predictive power resides. The fascinating thing is that, for some of these subnetworks (so-called “winning lottery tickets”), it’s not the training process that makes them good at their classification or regression tasks: they just happened to be initialized in a way that was very effective. This changes the way we think about what training might be doing, in a pretty fundamental way. Sometimes, instead of crafting a good fit from whole cloth, training might be finding the parts of the network that always had predictive power to begin with, and isolating and strengthening them. This research is pretty recent, having only come to prominence in the last year, but nonetheless challenges our notions about what it means to train a machine learning model.</itunes:summary>
      <itunes:subtitle>Recent research into neural networks reveals that…</itunes:subtitle>
      <description>Recent research into neural networks reveals that sometimes, not all parts of the neural net are equally responsible for the performance of the network overall. Instead, it seems like (in some neural nets, at least) there are smaller subnetworks present where most of the predictive power resides. The fascinating thing is that, for some of these subnetworks (so-called “winning lottery tickets”), it’s not the training process that makes them good at their classification or regression tasks: they just happened to be initialized in a way that was very effective. This changes the way we think about what training might be doing, in a pretty fundamental way. Sometimes, instead of crafting a good fit from whole cloth, training might be finding the parts of the network that always had predictive power to begin with, and isolating and strengthening them. This research is pretty recent, having only come to prominence in the last year, but nonetheless challenges our notions about what it means to train a machine learning model.</description>
      <enclosure length="9480183" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/765465880-linear-digressions-the-lottery-ticket-hypothesis.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/761862310</guid>
      <title>Interesting technical issues prompted by GDPR and data privacy concerns</title>
      <pubDate>Mon, 17 Feb 2020 01:50:20 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/interesting-technical-issues-prompted-by-gdpr-and-data-privacy-concerns</link>
      <itunes:duration>00:20:26</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Data privacy is a huge issue right now, after years of consumers and users gaining awareness of just how much of their personal data is out there and how companies are using it. Policies like GDPR are imposing more stringent rules on who can use what data for what purposes, with an end goal of giving consumers more control and privacy around their data. This episode digs into this topic, but not from a security or legal perspective—this week, we talk about some of the interesting technical challenges introduced by a simple idea: a company should remove a user’s data from their database when that user asks to be removed. We talk about two topics, namely using Bloom filters to efficiently find records in a database (and what Bloom filters are, for that matter) and types of machine learning algorithms that can un-learn their training data when it contains records that need to be deleted.</itunes:summary>
      <itunes:subtitle>Data privacy is a huge issue right now, after yea…</itunes:subtitle>
      <description>Data privacy is a huge issue right now, after years of consumers and users gaining awareness of just how much of their personal data is out there and how companies are using it. Policies like GDPR are imposing more stringent rules on who can use what data for what purposes, with an end goal of giving consumers more control and privacy around their data. This episode digs into this topic, but not from a security or legal perspective—this week, we talk about some of the interesting technical challenges introduced by a simple idea: a company should remove a user’s data from their database when that user asks to be removed. We talk about two topics, namely using Bloom filters to efficiently find records in a database (and what Bloom filters are, for that matter) and types of machine learning algorithms that can un-learn their training data when it contains records that need to be deleted.</description>
      <enclosure length="9808072" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/761862310-linear-digressions-interesting-technical-issues-prompted-by-gdpr-and-data-privacy-concerns.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/757967074</guid>
      <title>Thinking of data science initiatives as innovation initiatives</title>
      <pubDate>Mon, 10 Feb 2020 01:10:21 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/thinking-of-data-science-initiatives-as-innovation-initiatives</link>
      <itunes:duration>00:17:27</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Put yourself in the shoes of an executive at a big legacy company for a moment, operating in virtually any market vertical: you’re constantly hearing that data science is revolutionizing the world and the firms that survive and thrive in the coming years are those that execute on a data strategy. What does this mean for your company? How can you best guide your established firm through a successful transition to becoming data-driven? How do you balance the momentum your firm has right now, and the need to support all your current products, customers and operations, against a new and relatively unknown future?

If you’re working as a data scientist at a mature and well-established company, these are the worries on the mind of your boss’s boss’s boss. The worries on your mind may be similar: you’re trying to understand where your work fits into the bigger picture, you need to break down silos, you’re often running into cultural headwinds created by colleagues who don’t understand or trust your work. Congratulations, you’re in the midst of a classic set of challenges encountered by innovation initiatives everywhere. Harvard Business School professor Clayton Christensen wrote a classic business book (The Innovator’s Dilemma) explaining the paradox of trying to innovate in established companies, and why the structure and incentives of those companies almost guarantee an uphill climb to innovate. This week’s episode breaks down the innovator’s dilemma argument, and what it means for data scientists working in mature companies trying to become more data-centric.</itunes:summary>
      <itunes:subtitle>Put yourself in the shoes of an executive at a bi…</itunes:subtitle>
      <description>Put yourself in the shoes of an executive at a big legacy company for a moment, operating in virtually any market vertical: you’re constantly hearing that data science is revolutionizing the world and the firms that survive and thrive in the coming years are those that execute on a data strategy. What does this mean for your company? How can you best guide your established firm through a successful transition to becoming data-driven? How do you balance the momentum your firm has right now, and the need to support all your current products, customers and operations, against a new and relatively unknown future?

If you’re working as a data scientist at a mature and well-established company, these are the worries on the mind of your boss’s boss’s boss. The worries on your mind may be similar: you’re trying to understand where your work fits into the bigger picture, you need to break down silos, you’re often running into cultural headwinds created by colleagues who don’t understand or trust your work. Congratulations, you’re in the midst of a classic set of challenges encountered by innovation initiatives everywhere. Harvard Business School professor Clayton Christensen wrote a classic business book (The Innovator’s Dilemma) explaining the paradox of trying to innovate in established companies, and why the structure and incentives of those companies almost guarantee an uphill climb to innovate. This week’s episode breaks down the innovator’s dilemma argument, and what it means for data scientists working in mature companies trying to become more data-centric.</description>
      <enclosure length="8382204" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/757967074-linear-digressions-thinking-of-data-science-initiatives-as-innovation-initiatives.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/754030393</guid>
      <title>Building a curriculum for educating data scientists: Interview with Prof. Xiao-Li Meng</title>
      <pubDate>Sun, 02 Feb 2020 23:36:23 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/xiao-li-2-produced</link>
      <itunes:duration>00:31:36</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>As demand for data scientists grows, and it remains as relevant as ever that practicing data scientists have a solid methodological and technical foundation for their work, higher education institutions are coming to terms with what’s required to educate the next cohorts of data scientists. The heterogeneity and speed of the field makes it challenging for even the most talented and dedicated educators to know what a data science education “should” look like.

This doesn’t faze Xiao-Li Meng, Professor of Statistics at Harvard University and founding Editor-in-Chief of the Harvard Data Science Review. He’s our interview guest in this episode, talking about the pedagogically distinct classes of data science and how he thinks about designing curricula for making anyone more data literate. From new initiatives in data science to dealing with data science FOMO, this wide-ranging conversation with a leading scholar gives us a lot to think about.

Relevant links: 
https://hdsr.mitpress.mit.edu/</itunes:summary>
      <itunes:subtitle>As demand for data scientists grows, and it remai…</itunes:subtitle>
      <description>As demand for data scientists grows, and it remains as relevant as ever that practicing data scientists have a solid methodological and technical foundation for their work, higher education institutions are coming to terms with what’s required to educate the next cohorts of data scientists. The heterogeneity and speed of the field makes it challenging for even the most talented and dedicated educators to know what a data science education “should” look like.

This doesn’t faze Xiao-Li Meng, Professor of Statistics at Harvard University and founding Editor-in-Chief of the Harvard Data Science Review. He’s our interview guest in this episode, talking about the pedagogically distinct classes of data science and how he thinks about designing curricula for making anyone more data literate. From new initiatives in data science to dealing with data science FOMO, this wide-ranging conversation with a leading scholar gives us a lot to think about.

Relevant links: 
https://hdsr.mitpress.mit.edu/</description>
      <enclosure length="15175295" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/754030393-linear-digressions-xiao-li-2-produced.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/750029002</guid>
      <title>Running experiments when there are network effects</title>
      <pubDate>Mon, 27 Jan 2020 00:13:52 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/running-experiments-when-there-are-network-effects</link>
      <itunes:duration>00:24:45</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Traditional A/B tests assume that whether or not one person got a treatment has no effect on the experiment outcome for another person. But that’s not a safe assumption, especially when there are network effects (like in almost any social context, for instance!). SUTVA, or the stable unit treatment value assumption, is a big phrase for this assumption, and violations of SUTVA make for some pretty interesting experiment designs. From news feeds in LinkedIn to disentangling herd immunity from individual immunity in vaccine studies, indirect (i.e. network) effects in experiments can be just as big as, or even bigger than, direct (i.e. individual) effects. And this is what we talk about this week on the podcast.

Relevant links:
http://hanj.cs.illinois.edu/pdf/www15_hgui.pdf
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2600548/pdf/nihms-73860.pdf</itunes:summary>
      <itunes:subtitle>Traditional A/B tests assume that whether or not …</itunes:subtitle>
      <description>Traditional A/B tests assume that whether or not one person got a treatment has no effect on the experiment outcome for another person. But that’s not a safe assumption, especially when there are network effects (like in almost any social context, for instance!). SUTVA, or the stable unit treatment value assumption, is a big phrase for this assumption, and violations of SUTVA make for some pretty interesting experiment designs. From news feeds in LinkedIn to disentangling herd immunity from individual immunity in vaccine studies, indirect (i.e. network) effects in experiments can be just as big as, or even bigger than, direct (i.e. individual) effects. And this is what we talk about this week on the podcast.

Relevant links:
http://hanj.cs.illinois.edu/pdf/www15_hgui.pdf
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2600548/pdf/nihms-73860.pdf</description>
      <enclosure length="11885329" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/750029002-linear-digressions-running-experiments-when-there-are-network-effects.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/746201707</guid>
      <title>Zeroing in on what makes adversarial examples possible</title>
      <pubDate>Mon, 20 Jan 2020 02:41:20 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/zeroing-in-on-what-makes-adversarial-examples-possible</link>
      <itunes:duration>00:22:51</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Adversarial examples are really, really weird: pictures of penguins that get classified with high certainty by machine learning algorithms as drumsets, or random noise labeled as pandas, or any one of an infinite number of mistakes in labeling data that humans would never make but computers make with joyous abandon. What gives? A compelling new argument makes the case that it’s not the algorithms so much as the features in the datasets that hold the clue. This week’s episode goes through several papers pushing our collective understanding of adversarial examples, and giving us clues to what makes these counterintuitive cases possible.

Relevant links:
https://arxiv.org/pdf/1905.02175.pdf
https://arxiv.org/pdf/1805.12152.pdf
https://distill.pub/2019/advex-bugs-discussion/
https://arxiv.org/pdf/1911.02508.pdf</itunes:summary>
      <itunes:subtitle>Adversarial examples are really, really weird: pi…</itunes:subtitle>
      <description>Adversarial examples are really, really weird: pictures of penguins that get classified with high certainty by machine learning algorithms as drumsets, or random noise labeled as pandas, or any one of an infinite number of mistakes in labeling data that humans would never make but computers make with joyous abandon. What gives? A compelling new argument makes the case that it’s not the algorithms so much as the features in the datasets that hold the clue. This week’s episode goes through several papers pushing our collective understanding of adversarial examples, and giving us clues to what makes these counterintuitive cases possible.

Relevant links:
https://arxiv.org/pdf/1905.02175.pdf
https://arxiv.org/pdf/1805.12152.pdf
https://distill.pub/2019/advex-bugs-discussion/
https://arxiv.org/pdf/1911.02508.pdf</description>
      <enclosure length="10969998" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/746201707-linear-digressions-zeroing-in-on-what-makes-adversarial-examples-possible.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/742527037</guid>
      <title>Unsupervised Dimensionality Reduction: UMAP vs t-SNE</title>
      <pubDate>Mon, 13 Jan 2020 00:53:19 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/unsupervised-dimensionality-reduction-umap-vs-t-sne</link>
      <itunes:duration>00:29:34</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Dimensionality reduction redux: this episode covers UMAP, an unsupervised algorithm designed to make high-dimensional data easier to visualize, cluster, etc. It’s similar to t-SNE but has some advantages. This episode gives a quick recap of t-SNE, especially the connection it shares with information theory, then gets into how UMAP is different (many say better).

Between the time we recorded and released this episode, an interesting argument made the rounds on the internet that UMAP’s advantages largely stem from good initialization, not from advantages inherent in the algorithm. We don’t cover that argument here obviously, because it wasn’t out there when we were recording, but you can find a link to the paper below.

Relevant links:
https://pair-code.github.io/understanding-umap/
https://www.biorxiv.org/content/10.1101/2019.12.19.877522v1</itunes:summary>
      <itunes:subtitle>Dimensionality reduction redux: this episode cove…</itunes:subtitle>
      <description>Dimensionality reduction redux: this episode covers UMAP, an unsupervised algorithm designed to make high-dimensional data easier to visualize, cluster, etc. It’s similar to t-SNE but has some advantages. This episode gives a quick recap of t-SNE, especially the connection it shares with information theory, then gets into how UMAP is different (many say better).

Between the time we recorded and released this episode, an interesting argument made the rounds on the internet that UMAP’s advantages largely stem from good initialization, not from advantages inherent in the algorithm. We don’t cover that argument here obviously, because it wasn’t out there when we were recording, but you can find a link to the paper below.

Relevant links:
https://pair-code.github.io/understanding-umap/
https://www.biorxiv.org/content/10.1101/2019.12.19.877522v1</description>
      <enclosure length="14192881" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/742527037-linear-digressions-unsupervised-dimensionality-reduction-umap-vs-t-sne.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/738921592</guid>
      <title>Data scientists: beware of simple metrics</title>
      <pubDate>Sun, 05 Jan 2020 22:54:57 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/data-scientists-beware-of-simple-metrics</link>
      <itunes:duration>00:24:47</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Picking a metric for a problem means defining how you’ll measure success in solving that problem. Which sounds important, because it is, but oftentimes new data scientists only get experience with a few kinds of metrics when they’re learning and those metrics have real shortcomings when you think about what they tell you, or don’t, about how well you’re really solving the underlying problem. This episode takes a step back and says, what are some metrics that are popular with data scientists, why are they popular, and what are their shortcomings when it comes to the real world? There’s been a lot of great thinking and writing recently on this topic, and we cover a lot of that discussion along with some perspective of our own.

Relevant links:
https://www.fast.ai/2019/09/24/metrics/
https://arxiv.org/abs/1909.12475
https://medium.com/shoprunner/evaluating-classification-models-1-ff0730801f17
https://hbr.org/2019/09/dont-let-metrics-undermine-your-business</itunes:summary>
      <itunes:subtitle>Picking a metric for a problem means defining how…</itunes:subtitle>
      <description>Picking a metric for a problem means defining how you’ll measure success in solving that problem. Which sounds important, because it is, but oftentimes new data scientists only get experience with a few kinds of metrics when they’re learning and those metrics have real shortcomings when you think about what they tell you, or don’t, about how well you’re really solving the underlying problem. This episode takes a step back and says, what are some metrics that are popular with data scientists, why are they popular, and what are their shortcomings when it comes to the real world? There’s been a lot of great thinking and writing recently on this topic, and we cover a lot of that discussion along with some perspective of our own.

Relevant links:
https://www.fast.ai/2019/09/24/metrics/
https://arxiv.org/abs/1909.12475
https://medium.com/shoprunner/evaluating-classification-models-1-ff0730801f17
https://hbr.org/2019/09/dont-let-metrics-undermine-your-business</description>
      <enclosure length="11902465" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/738921592-linear-digressions-data-scientists-beware-of-simple-metrics.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/735626218</guid>
      <title>Communicating data science, from academia to industry</title>
      <pubDate>Mon, 30 Dec 2019 01:53:14 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/communicating-data-science-from-academia-to-industry</link>
      <itunes:duration>00:26:15</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>For something as multifaceted and ill-defined as data science, communication and sharing best practices across the field can be extremely valuable but also extremely, well, multifaceted and ill-defined. That doesn’t bother our guest today, Prof. Xiao-Li Meng of the Harvard statistics department, who is leading an effort to start an open-access Data Science Review journal in the model of the Harvard Business Review or Law Review. This episode features Xiao-Li talking about the need he sees for a central gathering place for data scientists in academia, industry, and government to come together to learn from (and teach!) each other. 

Relevant links:
https://hdsr.mitpress.mit.edu/</itunes:summary>
      <itunes:subtitle>For something as multifaceted and ill-defined as …</itunes:subtitle>
      <description>For something as multifaceted and ill-defined as data science, communication and sharing best practices across the field can be extremely valuable but also extremely, well, multifaceted and ill-defined. That doesn’t bother our guest today, Prof. Xiao-Li Meng of the Harvard statistics department, who is leading an effort to start an open-access Data Science Review journal in the model of the Harvard Business Review or Law Review. This episode features Xiao-Li talking about the need he sees for a central gathering place for data scientists in academia, industry, and government to come together to learn from (and teach!) each other. 

Relevant links:
https://hdsr.mitpress.mit.edu/</description>
      <enclosure length="12602756" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/735626218-linear-digressions-communicating-data-science-from-academia-to-industry.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/732587260</guid>
      <title>Optimizing for the short-term vs. the long-term</title>
      <pubDate>Mon, 23 Dec 2019 02:50:53 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/optimizing-for-the-short-term-vs-the-long-term</link>
      <itunes:duration>00:19:24</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>When data scientists run experiments, like A/B tests, it’s really easy to plan on a period of a few days to a few weeks for collecting data. The thing is, the change that’s being evaluated might have effects that last a lot longer than a few days or a few weeks—having a big sale might increase sales this week, but doing that repeatedly will teach customers to wait until there’s a sale and never buy anything at full price, which could ultimately drive down revenue in the long term. Increasing the volume of ads on a website might lead people to click on more ads in the short term, but in the long term they’ll be more likely to visually block the ads out and learn to ignore them. But these long-term effects aren’t apparent from the short-term experiment, so this week we’re talking about a paper from Google research that confronts the short-term vs. long-term tradeoff, and how to measure long-term effects from short-term experiments. 

Relevant links:
https://research.google/pubs/pub43887/</itunes:summary>
      <itunes:subtitle>When data scientists run experiments, like A/B te…</itunes:subtitle>
      <description>When data scientists run experiments, like A/B tests, it’s really easy to plan on a period of a few days to a few weeks for collecting data. The thing is, the change that’s being evaluated might have effects that last a lot longer than a few days or a few weeks—having a big sale might increase sales this week, but doing that repeatedly will teach customers to wait until there’s a sale and never buy anything at full price, which could ultimately drive down revenue in the long term. Increasing the volume of ads on a website might lead people to click on more ads in the short term, but in the long term they’ll be more likely to visually block the ads out and learn to ignore them. But these long-term effects aren’t apparent from the short-term experiment, so this week we’re talking about a paper from Google research that confronts the short-term vs. long-term tradeoff, and how to measure long-term effects from short-term experiments. 

Relevant links:
https://research.google/pubs/pub43887/</description>
      <enclosure length="9314880" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/732587260-linear-digressions-optimizing-for-the-short-term-vs-the-long-term.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/728446654</guid>
      <title>Interview with Prof. Andrew Lo, on using data science to inform complex business decisions</title>
      <pubDate>Mon, 16 Dec 2019 03:15:09 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/interview-with-prof-andrew-lo-on-using-data-science-to-inform-complex-business-decisions</link>
      <itunes:duration>00:27:46</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>This episode features Prof. Andrew Lo, the author of a paper that we discussed recently on Linear Digressions, in which Prof. Lo uses data to predict whether a medicine in the development pipeline will eventually go on to win FDA approval. This episode gets into the story behind that paper: how the approval prospects of different drugs inform the investment decisions of pharma companies, how to stitch together siloed and incomplete datasets to form a coherent picture, and how the academics building some of these models think about when and how their work can make it out of academia and into industry. Professor Lo is an expert in business (he teaches at the MIT Sloan School of Management) and work like his shows how data science can open up new ways of doing business.

Relevant links:
https://hdsr.mitpress.mit.edu/pub/ct67j043</itunes:summary>
      <itunes:subtitle>This episode features Prof. Andrew Lo, the author…</itunes:subtitle>
      <description>This episode features Prof. Andrew Lo, the author of a paper that we discussed recently on Linear Digressions, in which Prof. Lo uses data to predict whether a medicine in the development pipeline will eventually go on to win FDA approval. This episode gets into the story behind that paper: how the approval prospects of different drugs inform the investment decisions of pharma companies, how to stitch together siloed and incomplete datasets to form a coherent picture, and how the academics building some of these models think about when and how their work can make it out of academia and into industry. Professor Lo is an expert in business (he teaches at the MIT Sloan School of Management) and work like his shows how data science can open up new ways of doing business.

Relevant links:
https://hdsr.mitpress.mit.edu/pub/ct67j043</description>
      <enclosure length="13335229" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/728446654-linear-digressions-interview-with-prof-andrew-lo-on-using-data-science-to-inform-complex-business-decisions.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/724707592</guid>
      <title>Using machine learning to predict drug approvals</title>
      <pubDate>Sun, 08 Dec 2019 22:56:05 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/using-machine-learning-to-predict-drug-approvals</link>
      <itunes:duration>00:25:00</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>One of the hottest areas in data science and machine learning right now is healthcare: the size of the healthcare industry, the amount of data it generates, and the myriad improvements possible in the healthcare system lay the groundwork for compelling, innovative new data initiatives. One spot that drives much of the cost of medicine is the riskiness of developing new drugs: drug trials can cost hundreds of millions of dollars to run and, especially given that numerous medicines end up failing to get approval from the FDA, pharmaceutical companies want to have as much insight as possible about whether a drug is more or less likely to make it through clinical trials and on to approval. Professor Andrew Lo and collaborators at the MIT Sloan School of Management are taking a look at this prediction task using machine learning, and have an article in the Harvard Data Science Review showing what they were able to find. It’s a fascinating example of how data science can be used to address business needs in creative but very targeted and effective ways.

Relevant links:
https://hdsr.mitpress.mit.edu/pub/ct67j043</itunes:summary>
      <itunes:subtitle>One of the hottest areas in data science and mach…</itunes:subtitle>
      <description>One of the hottest areas in data science and machine learning right now is healthcare: the size of the healthcare industry, the amount of data it generates, and the myriad improvements possible in the healthcare system lay the groundwork for compelling, innovative new data initiatives. One spot that drives much of the cost of medicine is the riskiness of developing new drugs: drug trials can cost hundreds of millions of dollars to run and, especially given that numerous medicines end up failing to get approval from the FDA, pharmaceutical companies want to have as much insight as possible about whether a drug is more or less likely to make it through clinical trials and on to approval. Professor Andrew Lo and collaborators at the MIT Sloan School of Management are taking a look at this prediction task using machine learning, and have an article in the Harvard Data Science Review showing what they were able to find. It’s a fascinating example of how data science can be used to address business needs in creative but very targeted and effective ways.

Relevant links:
https://hdsr.mitpress.mit.edu/pub/ct67j043</description>
      <enclosure length="12005074" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/724707592-linear-digressions-using-machine-learning-to-predict-drug-approvals.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/721426411</guid>
      <title>Facial recognition, society, and the law</title>
      <pubDate>Mon, 02 Dec 2019 03:14:14 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/facial-recognition-society-and-the-law-1</link>
      <itunes:duration>00:43:09</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Facial recognition being used in everyday life seemed far-off not too long ago. Increasingly, it’s being used and advanced widely and with increasing speed, which means that our technical capabilities are starting to outpace (if they haven’t already) our consensus as a society about what is acceptable in facial recognition and what isn’t. The threats to privacy, fairness, and freedom are real, and Microsoft has become one of the first large companies using this technology to speak out in specific support of its regulation through legislation. Their arguments are interesting, provocative, and even if you don’t agree with every point they make or harbor some skepticism, there’s a lot to think about in what they’re saying.</itunes:summary>
      <itunes:subtitle>Facial recognition being used in everyday life se…</itunes:subtitle>
      <description>Facial recognition being used in everyday life seemed far-off not too long ago. Increasingly, it’s being used and advanced widely and with increasing speed, which means that our technical capabilities are starting to outpace (if they haven’t already) our consensus as a society about what is acceptable in facial recognition and what isn’t. The threats to privacy, fairness, and freedom are real, and Microsoft has become one of the first large companies using this technology to speak out in specific support of its regulation through legislation. Their arguments are interesting, provocative, and even if you don’t agree with every point they make or harbor some skepticism, there’s a lot to think about in what they’re saying.</description>
      <enclosure length="20714090" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/721426411-linear-digressions-facial-recognition-society-and-the-law-1.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/717988873</guid>
      <title>Lessons learned from doing data science, at scale, in industry</title>
      <pubDate>Mon, 25 Nov 2019 00:45:42 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/lessons-learned-from-doing-data-science-at-scale-in-industry</link>
      <itunes:duration>00:28:00</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>If you’ve taken a machine learning class, or read up on A/B tests, you likely have a decent grounding in the theoretical pillars of data science. But if you’re in a position to have actually built lots of models or run lots of experiments, there’s almost certainly a bunch of extra “street smarts” insights you’ve had that go beyond the “book smarts” of more academic studies. The data scientists at Booking.com, who build models and run experiments constantly, have written a paper that bridges the gap and talks about what non-obvious things they’ve learned from that practice. In this episode we read and digest that paper, talking through the gotchas that they don’t always teach in a classroom but that make data science tricky and interesting in the real world.

Relevant links:
https://www.kdd.org/kdd2019/accepted-papers/view/150-successful-machine-learning-models-6-lessons-learned-at-booking.com</itunes:summary>
      <itunes:subtitle>If you’ve taken a machine learning class, or read…</itunes:subtitle>
      <description>If you’ve taken a machine learning class, or read up on A/B tests, you likely have a decent grounding in the theoretical pillars of data science. But if you’re in a position to have actually built lots of models or run lots of experiments, there’s almost certainly a bunch of extra “street smarts” insights you’ve had that go beyond the “book smarts” of more academic studies. The data scientists at Booking.com, who build models and run experiments constantly, have written a paper that bridges the gap and talks about what non-obvious things they’ve learned from that practice. In this episode we read and digest that paper, talking through the gotchas that they don’t always teach in a classroom but that make data science tricky and interesting in the real world.

Relevant links:
https://www.kdd.org/kdd2019/accepted-papers/view/150-successful-machine-learning-models-6-lessons-learned-at-booking.com</description>
      <enclosure length="13441600" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/717988873-linear-digressions-lessons-learned-from-doing-data-science-at-scale-in-industry.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/714388189</guid>
      <title>Varsity A/B Testing</title>
      <pubDate>Mon, 18 Nov 2019 02:09:46 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/varsity-ab-testing</link>
      <itunes:duration>00:36:00</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>When you want to understand if doing something causes something else to happen, like if a change to a website causes a dip or rise in downstream conversions, the gold standard analysis method is to use randomized controlled trials. Once you’ve properly randomized the treatment, the analysis methods are well-understood and there are great tools in R and Python (and other languages) to find the effects. However, when you’re operating at scale, the logistics of running all those tests, and reaching correct conclusions reliably, become the main challenge—making sure the right metrics are being computed, you know when to stop an experiment, you minimize the chances of finding spurious results, and many other issues that are simple to track for one or two experiments but become real challenges for dozens or hundreds of experiments. Nonetheless, the reality is that there might be dozens or hundreds of experiments worth running. So in this episode, we’ll work through some of the most important issues for running experiments at scale, with strong support from a series of great blog posts from Airbnb about how they solve this very issue.

For some blog post links relevant to this episode, visit lineardigressions.com</itunes:summary>
      <itunes:subtitle>When you want to understand if doing something ca…</itunes:subtitle>
      <description>When you want to understand if doing something causes something else to happen, like if a change to a website causes a dip or rise in downstream conversions, the gold standard analysis method is to use randomized controlled trials. Once you’ve properly randomized the treatment, the analysis methods are well-understood and there are great tools in R and Python (and other languages) to find the effects. However, when you’re operating at scale, the logistics of running all those tests, and reaching correct conclusions reliably, become the main challenge—making sure the right metrics are being computed, you know when to stop an experiment, you minimize the chances of finding spurious results, and many other issues that are simple to track for one or two experiments but become real challenges for dozens or hundreds of experiments. Nonetheless, the reality is that there might be dozens or hundreds of experiments worth running. So in this episode, we’ll work through some of the most important issues for running experiments at scale, with strong support from a series of great blog posts from Airbnb about how they solve this very issue.

For some blog post links relevant to this episode, visit lineardigressions.com</description>
      <enclosure length="17284317" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/714388189-linear-digressions-varsity-ab-testing.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/710910499</guid>
      <title>The Care and Feeding of Data Scientists: Growing Careers</title>
      <pubDate>Mon, 11 Nov 2019 03:44:18 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/the-care-and-feeding-of-data-scientists-growing-careers</link>
      <itunes:duration>00:25:19</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>In the third and final installment of a conversation with Michelangelo D’Agostino, VP of Data Science and Engineering at Shoprunner, we talk about growing and mentoring data scientists on your team. Some of our topics of conversation include how to institute hack time as a way to learn new things, what career growth looks like in data science, and how to institutionalize professional growth as part of a career ladder. As with the other episodes in this series, the topics we cover today are also covered in the O’Reilly report linked below.

Relevant links: https://oreilly-ds-report.s3.amazonaws.com/Care_and_Feeding_of_Data_Scientists.pdf</itunes:summary>
      <itunes:subtitle>In the third and final installment of a conversat…</itunes:subtitle>
      <description>In the third and final installment of a conversation with Michelangelo D’Agostino, VP of Data Science and Engineering at Shoprunner, we talk about growing and mentoring data scientists on your team. Some of our topics of conversation include how to institute hack time as a way to learn new things, what career growth looks like in data science, and how to institutionalize professional growth as part of a career ladder. As with the other episodes in this series, the topics we cover today are also covered in the O’Reilly report linked below.

Relevant links: https://oreilly-ds-report.s3.amazonaws.com/Care_and_Feeding_of_Data_Scientists.pdf</description>
      <enclosure length="12154913" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/710910499-linear-digressions-the-care-and-feeding-of-data-scientists-growing-careers.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/707491345</guid>
      <title>The Care and Feeding of Data Scientists: Recruiting and Hiring Data Scientists</title>
      <pubDate>Mon, 04 Nov 2019 00:21:56 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/the-care-and-feeding-of-data-scientists-recruiting-and-hiring-data-scientists-1</link>
      <itunes:duration>00:20:16</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>This week’s episode is the second in a three-part interview series with Michelangelo D’Agostino, VP of Data Science at Shoprunner. This discussion centers on building a team, which means recruiting, interviewing and hiring data scientists. Since data science talent is in such high demand, and data scientists are understandably choosy about where they go to work, a good recruiting and hiring program can have a big impact on the size and quality of the team. Our chat covers a couple of sections in our dual-authored O’Reilly report, “The Care and Feeding of Data Scientists,” which you can read at the link below.

https://oreilly-ds-report.s3.amazonaws.com/Care_and_Feeding_of_Data_Scientists.pdf</itunes:summary>
      <itunes:subtitle>This week’s episode is the second in a three-part…</itunes:subtitle>
      <description>This week’s episode is the second in a three-part interview series with Michelangelo D’Agostino, VP of Data Science at Shoprunner. This discussion centers on building a team, which means recruiting, interviewing and hiring data scientists. Since data science talent is in such high demand, and data scientists are understandably choosy about where they go to work, a good recruiting and hiring program can have a big impact on the size and quality of the team. Our chat covers a couple of sections in our dual-authored O’Reilly report, “The Care and Feeding of Data Scientists,” which you can read at the link below.

https://oreilly-ds-report.s3.amazonaws.com/Care_and_Feeding_of_Data_Scientists.pdf</description>
      <enclosure length="9729495" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/707491345-linear-digressions-the-care-and-feeding-of-data-scientists-recruiting-and-hiring-data-scientists-1.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/702677326</guid>
      <title>The Care and Feeding of Data Scientists: Becoming a Data Science Manager</title>
      <pubDate>Mon, 28 Oct 2019 01:27:58 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/the-care-and-feeding-of-data-scientists-becoming-a-data-science-manager</link>
      <itunes:duration>00:24:45</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Data science management isn’t easy, and many data scientists are finding themselves learning on the job how to manage data science teams as they get promoted into more formal leadership roles. O’Reilly recently released a report, written by yours truly (Katie) and another experienced data science manager, Michelangelo D’Agostino, where we lay out the most important tasks of a data science manager and some thoughts on how to unpack those tasks and approach them in a way that makes a new manager successful. This episode is an interview episode, the first of three, where we discuss some of the common paths to data science management and what distinguishes (and unifies) different types of data scientists and data science teams.

Relevant links:
https://oreilly-ds-report.s3.amazonaws.com/Care_and_Feeding_of_Data_Scientists.pdf</itunes:summary>
      <itunes:subtitle>Data science management isn’t easy, and many data…</itunes:subtitle>
      <description>Data science management isn’t easy, and many data scientists are finding themselves learning on the job how to manage data science teams as they get promoted into more formal leadership roles. O’Reilly recently released a report, written by yours truly (Katie) and another experienced data science manager, Michelangelo D’Agostino, where we lay out the most important tasks of a data science manager and some thoughts on how to unpack those tasks and approach them in a way that makes a new manager successful. This episode is an interview episode, the first of three, where we discuss some of the common paths to data science management and what distinguishes (and unifies) different types of data scientists and data science teams.

Relevant links:
https://oreilly-ds-report.s3.amazonaws.com/Care_and_Feeding_of_Data_Scientists.pdf</description>
      <enclosure length="11886165" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/702677326-linear-digressions-the-care-and-feeding-of-data-scientists-becoming-a-data-science-manager.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/699351979</guid>
      <title>Procella: YouTube's super-system for analytics data storage</title>
      <pubDate>Mon, 21 Oct 2019 01:27:45 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/procella-youtubes-super-system-for-analytics-data-storage</link>
      <itunes:duration>00:29:48</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>If you’re trying to manage a project that serves up analytics data for a few very distinct uses, you’d be wise to consider having custom solutions for each use case that are optimized for the needs and constraints of that use case. You also wouldn’t be YouTube, which found itself with this problem (gigantic data needs and several very different use cases of what they needed to do with that data) and went a different way: they built one analytics data system to serve them all. Procella, the system they built, is the topic of our episode today: by deconstructing the system, we dig into the four motivating uses of this system, the complexity they had to introduce to service all four uses simultaneously, and the impressive engineering that has to go into building something that “just works.”

Relevant links:
https://storage.googleapis.com/pub-tools-public-publication-data/pdf/45a6cea2b9c101761ea1b51c961628093ec1d5da.pdf</itunes:summary>
      <itunes:subtitle>If you’re trying to manage a project that serves …</itunes:subtitle>
      <description>If you’re trying to manage a project that serves up analytics data for a few very distinct uses, you’d be wise to consider having custom solutions for each use case that are optimized for the needs and constraints of that use case. You also wouldn’t be YouTube, which found itself with this problem (gigantic data needs and several very different use cases of what they needed to do with that data) and went a different way: they built one analytics data system to serve them all. Procella, the system they built, is the topic of our episode today: by deconstructing the system, we dig into the four motivating uses of this system, the complexity they had to introduce to service all four uses simultaneously, and the impressive engineering that has to go into building something that “just works.”

Relevant links:
https://storage.googleapis.com/pub-tools-public-publication-data/pdf/45a6cea2b9c101761ea1b51c961628093ec1d5da.pdf</description>
      <enclosure length="14309283" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/699351979-linear-digressions-procella-youtubes-super-system-for-analytics-data-storage.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/695200633</guid>
      <title>Kalman Runners</title>
      <pubDate>Sun, 13 Oct 2019 20:04:52 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/kalman-runners-1</link>
      <itunes:duration>00:15:59</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>The Kalman Filter is an algorithm for taking noisy measurements of dynamic systems and using them to get a better idea of the underlying dynamics than you could get from a simple extrapolation. If you've ever run a marathon, or been a nuclear missile, you probably know all about these challenges already. 

IMPORTANT NON-DATA SCIENCE CHICAGO MARATHON RACE RESULT FROM KATIE: My finish time was 3:20:17! It was the closest I may ever come to having the perfect run. That’s a 34-minute personal record and a qualifying time for the Boston Marathon, so… guess I gotta go do that now.</itunes:summary>
      <itunes:subtitle>The Kalman Filter is an algorithm for taking nois…</itunes:subtitle>
      <description>The Kalman Filter is an algorithm for taking noisy measurements of dynamic systems and using them to get a better idea of the underlying dynamics than you could get from a simple extrapolation. If you've ever run a marathon, or been a nuclear missile, you probably know all about these challenges already. 

IMPORTANT NON-DATA SCIENCE CHICAGO MARATHON RACE RESULT FROM KATIE: My finish time was 3:20:17! It was the closest I may ever come to having the perfect run. That’s a 34-minute personal record and a qualifying time for the Boston Marathon, so… guess I gotta go do that now.</description>
      <enclosure length="7673972" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/695200633-linear-digressions-kalman-runners-1.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/691991614</guid>
      <title>What's *really* so hard about feature engineering?</title>
      <pubDate>Sun, 06 Oct 2019 22:37:49 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/whats-really-so-hard-about-feature-engineering</link>
      <itunes:duration>00:21:18</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Feature engineering is ubiquitous but gets surprisingly difficult surprisingly fast. What could be so complicated about just keeping track of what data you have, and how you made it? A lot, as it turns out—most data science platforms at this point include explicit features (in the product sense, not the data sense) just for keeping track of and sharing features (in the data sense, not the product sense). Just like a good library needs a catalogue, a city needs a map, and a home chef needs a cookbook to stay organized, modern data scientists need feature libraries, data dictionaries, and a general discipline around generating and caring for their datasets.</itunes:summary>
      <itunes:subtitle>Feature engineering is ubiquitous but gets surpri…</itunes:subtitle>
      <description>Feature engineering is ubiquitous but gets surprisingly difficult surprisingly fast. What could be so complicated about just keeping track of what data you have, and how you made it? A lot, as it turns out—most data science platforms at this point include explicit features (in the product sense, not the data sense) just for keeping track of and sharing features (in the data sense, not the product sense). Just like a good library needs a catalogue, a city needs a map, and a home chef needs a cookbook to stay organized, modern data scientists need feature libraries, data dictionaries, and a general discipline around generating and caring for their datasets.</description>
      <enclosure length="10225195" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/691991614-linear-digressions-whats-really-so-hard-about-feature-engineering.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/688872199</guid>
      <title>Data storage for analytics: stars and snowflakes</title>
      <pubDate>Mon, 30 Sep 2019 11:22:15 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/data-storage-for-analytics-stars-and-snowflakes</link>
      <itunes:duration>00:15:22</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>If you’re a data scientist or data engineer thinking about how to store data for analytics uses, one of the early choices you’ll have to make (or live with, if someone else made it) is how to lay out the data in your data warehouse. There are a couple common organizational schemes that you’ll likely encounter, and that we cover in this episode: first is the famous star schema, followed by the also-famous snowflake schema.</itunes:summary>
      <itunes:subtitle>If you’re a data scientist or data engineer think…</itunes:subtitle>
      <description>If you’re a data scientist or data engineer thinking about how to store data for analytics uses, one of the early choices you’ll have to make (or live with, if someone else made it) is how to lay out the data in your data warehouse. There are a couple common organizational schemes that you’ll likely encounter, and that we cover in this episode: first is the famous star schema, followed by the also-famous snowflake schema.</description>
      <enclosure length="7381192" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/688872199-linear-digressions-data-storage-for-analytics-stars-and-snowflakes.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/685293595</guid>
      <title>Data storage: transactions vs. analytics</title>
      <pubDate>Mon, 23 Sep 2019 01:49:59 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/data-storage-transactions-vs-analytics</link>
      <itunes:duration>00:16:08</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Data scientists and software engineers both work with databases, but they use them for different purposes. So if you’re a data scientist thinking about the best way to store and access data for your analytics, you’ll likely come up with a very different set of requirements than a software engineer looking to power an application. Hence the split between analytics and transactional databases—certain technologies are designed for one or the other, but no single type of database is perfect for both use cases. In this episode we’ll talk about the differences between transactional and analytics databases, so no matter whether you’re an analytics person or more of a classical software engineer, you can understand the needs of your colleagues on the other side.</itunes:summary>
      <itunes:subtitle>Data scientists and software engineers both work …</itunes:subtitle>
      <description>Data scientists and software engineers both work with databases, but they use them for different purposes. So if you’re a data scientist thinking about the best way to store and access data for your analytics, you’ll likely come up with a very different set of requirements than a software engineer looking to power an application. Hence the split between analytics and transactional databases—certain technologies are designed for one or the other, but no single type of database is perfect for both use cases. In this episode we’ll talk about the differences between transactional and analytics databases, so no matter whether you’re an analytics person or more of a classical software engineer, you can understand the needs of your colleagues on the other side.</description>
      <enclosure length="7745861" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/685293595-linear-digressions-data-storage-transactions-vs-analytics.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/681687440</guid>
      <title>GROVER: an algorithm for making, and detecting, fake news</title>
      <pubDate>Mon, 16 Sep 2019 03:21:34 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/grover-an-algorithm-for-making-and-detecting-fake-news</link>
      <itunes:duration>00:18:28</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>There are a few things that seem to be very popular in discussions of machine learning algorithms these days. First is the role that algorithms play now, or might play in the future, when it comes to manipulating public opinion, for example with fake news. Second is the impressive success of generative adversarial networks, and similar algorithms. Third is making state-of-the-art natural language processing algorithms and naming them after muppets. We get all three this week: GROVER is an algorithm for generating, and detecting, fake news. It’s quite successful at both tasks, which raises an interesting question: is it safer to embargo the model (like GPT-2, the algorithm that was “too dangerous to release”), or release it as the best detector and antidote for its own fake news?

Relevant links:
https://grover.allenai.org/
https://arxiv.org/abs/1905.12616</itunes:summary>
      <itunes:subtitle>There are a few things that seem to be very popul…</itunes:subtitle>
      <description>There are a few things that seem to be very popular in discussions of machine learning algorithms these days. First is the role that algorithms play now, or might play in the future, when it comes to manipulating public opinion, for example with fake news. Second is the impressive success of generative adversarial networks, and similar algorithms. Third is making state-of-the-art natural language processing algorithms and naming them after muppets. We get all three this week: GROVER is an algorithm for generating, and detecting, fake news. It’s quite successful at both tasks, which raises an interesting question: is it safer to embargo the model (like GPT-2, the algorithm that was “too dangerous to release”), or release it as the best detector and antidote for its own fake news?

Relevant links:
https://grover.allenai.org/
https://arxiv.org/abs/1905.12616</description>
      <enclosure length="8865156" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/681687440-linear-digressions-grover-an-algorithm-for-making-and-detecting-fake-news.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/678310368</guid>
      <title>Data science teams as innovation initiatives</title>
      <pubDate>Mon, 09 Sep 2019 02:24:55 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/data-science-teams-as-innovation-initiatives</link>
      <itunes:duration>00:15:21</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>When a big, established company is thinking about their data science strategy, chances are good that whatever they come up with, it’ll be somewhat at odds with the company’s current structure and processes. Which makes sense, right? If you’re a many-decades-old company trying to defend a successful and long-lived legacy and market share, you won’t have the advantage that many upstart competitors have of being able to bake data analytics and science into the core structure of the organization. Instead, you have to retrofit. If you’re the data scientist working in this environment, tasked with being on the front lines of a data transformation, you may be grappling with some real institutional challenges in this setup, and this episode is for you. We’ll unpack the reason data innovation is necessarily challenging, the different ways to innovate and some of their tradeoffs, and some of the hardest but most critical phases in the innovation process.

Relevant links:
https://www.amazon.com/Innovators-Dilemma-Revolutionary-Change-Business/dp/0062060244
https://www.amazon.com/Other-Side-Innovation-Execution-Challenge/dp/1422166961</itunes:summary>
      <itunes:subtitle>When a big, established company is thinking about…</itunes:subtitle>
      <description>When a big, established company is thinking about their data science strategy, chances are good that whatever they come up with, it’ll be somewhat at odds with the company’s current structure and processes. Which makes sense, right? If you’re a many-decades-old company trying to defend a successful and long-lived legacy and market share, you won’t have the advantage that many upstart competitors have of being able to bake data analytics and science into the core structure of the organization. Instead, you have to retrofit. If you’re the data scientist working in this environment, tasked with being on the front lines of a data transformation, you may be grappling with some real institutional challenges in this setup, and this episode is for you. We’ll unpack the reason data innovation is necessarily challenging, the different ways to innovate and some of their tradeoffs, and some of the hardest but most critical phases in the innovation process.

Relevant links:
https://www.amazon.com/Innovators-Dilemma-Revolutionary-Change-Business/dp/0062060244
https://www.amazon.com/Other-Side-Innovation-Execution-Challenge/dp/1422166961</description>
      <enclosure length="7367817" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/678310368-linear-digressions-data-science-teams-as-innovation-initiatives.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/674309027</guid>
      <title>Can Fancy Running Shoes Cause You To Run Faster?</title>
      <pubDate>Sun, 01 Sep 2019 23:44:51 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/can-fancy-running-shoes-cause-you-to-run-faster-1</link>
      <itunes:duration>00:30:15</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>This is a re-release of an episode that originally aired on July 29, 2018.

The stars aligned for me (Katie) this past weekend: I raced my first half-marathon in a long time and got to read a great article from the NY Times about a new running shoe that Nike claims can make its wearers run faster. Causal claims like this one are really tough to verify, because even if the data suggests that people wearing the shoe are faster, that might be because of correlation, not causation, so I loved reading this article that went through an analysis of thousands of runners' data in 4 different ways. Each way has a great explanation with pros and cons (as well as results, of course), so be sure to read the article after you check out this episode!

Relevant links: https://www.nytimes.com/interactive/2018/07/18/upshot/nike-vaporfly-shoe-strava.html</itunes:summary>
      <itunes:subtitle>This is a re-release of an episode that originall…</itunes:subtitle>
      <description>This is a re-release of an episode that originally aired on July 29, 2018.

The stars aligned for me (Katie) this past weekend: I raced my first half-marathon in a long time and got to read a great article from the NY Times about a new running shoe that Nike claims can make its wearers run faster. Causal claims like this one are really tough to verify, because even if the data suggests that people wearing the shoe are faster, that might be because of correlation, not causation. So I loved reading this article, which went through an analysis of thousands of runners' data in four different ways. Each way has a great explanation with pros and cons (as well as results, of course), so be sure to read the article after you check out this episode!

Relevant links: https://www.nytimes.com/interactive/2018/07/18/upshot/nike-vaporfly-shoe-strava.html</description>
      <enclosure length="14521397" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/674309027-linear-digressions-can-fancy-running-shoes-cause-you-to-run-faster-1.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/670958588</guid>
      <title>Organizational Models for Data Scientists</title>
      <pubDate>Sun, 25 Aug 2019 23:06:52 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/organizational-models-for-data-scientists</link>
      <itunes:duration>00:23:09</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>When data science is hard, sometimes it’s because the algorithms aren’t converging or the data is messy, and sometimes it’s because of organizational or business issues: the data scientists aren’t positioned correctly to bring value to their organization. Maybe they don’t know what problems to work on, or they build solutions to those problems but nobody uses what they build. A lot of this can be traced back to the way the team is organized, and (relatedly) how it interacts with the rest of the organization, which is what we tackle in this episode. There are lots of options for how to organize your data science team, each of which has strengths and weaknesses, and Pardis Noorzad wrote a great blog post recently that got us talking.

Relevant links: https://medium.com/swlh/models-for-integrating-data-science-teams-within-organizations-7c5afa032ebd</itunes:summary>
      <itunes:subtitle>When data science is hard, sometimes it’s because…</itunes:subtitle>
      <description>When data science is hard, sometimes it’s because the algorithms aren’t converging or the data is messy, and sometimes it’s because of organizational or business issues: the data scientists aren’t positioned correctly to bring value to their organization. Maybe they don’t know what problems to work on, or they build solutions to those problems but nobody uses what they build. A lot of this can be traced back to the way the team is organized, and (relatedly) how it interacts with the rest of the organization, which is what we tackle in this episode. There are lots of options for how to organize your data science team, each of which has strengths and weaknesses, and Pardis Noorzad wrote a great blog post recently that got us talking.

Relevant links: https://medium.com/swlh/models-for-integrating-data-science-teams-within-organizations-7c5afa032ebd</description>
      <enclosure length="11116911" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/670958588-linear-digressions-organizational-models-for-data-scientists.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/667799375</guid>
      <title>Data Shapley</title>
      <pubDate>Mon, 19 Aug 2019 02:38:16 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/data-shapley</link>
      <itunes:duration>00:16:55</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>We talk often about which features in a dataset are most important, but recently a new paper has started making the rounds that turns the idea of importance on its head: Data Shapley is an algorithm for thinking about which examples in a dataset are most important. It makes a lot of intuitive sense: data that’s just repeating examples you’ve already seen, or that’s noisy or an extreme outlier, might not be that valuable for training a machine learning model. But some data is very valuable: it’s disproportionately useful for helping the algorithm figure out what the most important trends are, and Data Shapley is explicitly designed to help machine learning researchers spend their time understanding which data points are most valuable and why.

Relevant links:
http://proceedings.mlr.press/v97/ghorbani19c/ghorbani19c.pdf
https://blog.acolyer.org/2019/07/15/data-shapley/</itunes:summary>
      <itunes:subtitle>We talk often about which features in a dataset a…</itunes:subtitle>
      <description>We talk often about which features in a dataset are most important, but recently a new paper has started making the rounds that turns the idea of importance on its head: Data Shapley is an algorithm for thinking about which examples in a dataset are most important. It makes a lot of intuitive sense: data that’s just repeating examples you’ve already seen, or that’s noisy or an extreme outlier, might not be that valuable for training a machine learning model. But some data is very valuable: it’s disproportionately useful for helping the algorithm figure out what the most important trends are, and Data Shapley is explicitly designed to help machine learning researchers spend their time understanding which data points are most valuable and why.

Relevant links:
http://proceedings.mlr.press/v97/ghorbani19c/ghorbani19c.pdf
https://blog.acolyer.org/2019/07/15/data-shapley/</description>
      <enclosure length="8124741" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/667799375-linear-digressions-data-shapley.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/664515107</guid>
      <title>A Technical Deep Dive on Stanley, the First Self-Driving Car</title>
      <pubDate>Mon, 12 Aug 2019 02:21:06 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/a-technical-deep-dive-on-stanley-the-first-self-driving-car-1</link>
      <itunes:duration>00:41:32</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>This is a re-release of an episode that first ran on April 9, 2017.

In this follow-up to last week's introduction to the first self-driving car, we do a technical deep dive into the most important systems for getting a car to drive itself 140 miles across the desert.  Lidar?  You betcha!  Drive-by-wire?  Of course!  Probabilistic terrain reconstruction?  Absolutely!  All this and more this week on Linear Digressions.</itunes:summary>
      <itunes:subtitle>This is a re-release of an episode that first ran…</itunes:subtitle>
      <description>This is a re-release of an episode that first ran on April 9, 2017.

In this follow-up to last week's introduction to the first self-driving car, we do a technical deep dive into the most important systems for getting a car to drive itself 140 miles across the desert.  Lidar?  You betcha!  Drive-by-wire?  Of course!  Probabilistic terrain reconstruction?  Absolutely!  All this and more this week on Linear Digressions.</description>
      <enclosure length="19933760" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/664515107-linear-digressions-a-technical-deep-dive-on-stanley-the-first-self-driving-car-1.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/661168319</guid>
      <title>An Introduction to Stanley, the First Self-Driving Car</title>
      <pubDate>Mon, 05 Aug 2019 00:28:54 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/an-introduction-to-stanley-the-first-self-driving-car-1</link>
      <itunes:duration>00:14:19</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>In October 2005, 23 cars lined up in the desert for a 140-mile race.  Not one of those cars had a driver.  This was the DARPA Grand Challenge, a competition to see if anyone could build an autonomous vehicle capable of navigating a desert route (and if so, whose car could do it the fastest); the winning car, Stanley, now sits in the Smithsonian Museum in Washington DC as arguably the world's first real self-driving car.  In this episode (part one of a two-parter), we'll revisit the 2005 DARPA Grand Challenge and the rules and constraints of what it took for Stanley to win the competition.  Next week, we'll do a deep dive into Stanley's control systems and overall operation, and the key systems that allowed Stanley to win the race.

Relevant links:
http://isl.ecst.csuchico.edu/DOCS/darpa2005/DARPA%202005%20Stanley.pdf</itunes:summary>
      <itunes:subtitle>In October 2005, 23 cars lined up in the desert f…</itunes:subtitle>
      <description>In October 2005, 23 cars lined up in the desert for a 140-mile race.  Not one of those cars had a driver.  This was the DARPA Grand Challenge, a competition to see if anyone could build an autonomous vehicle capable of navigating a desert route (and if so, whose car could do it the fastest); the winning car, Stanley, now sits in the Smithsonian Museum in Washington DC as arguably the world's first real self-driving car.  In this episode (part one of a two-parter), we'll revisit the 2005 DARPA Grand Challenge and the rules and constraints of what it took for Stanley to win the competition.  Next week, we'll do a deep dive into Stanley's control systems and overall operation, and the key systems that allowed Stanley to win the race.

Relevant links:
http://isl.ecst.csuchico.edu/DOCS/darpa2005/DARPA%202005%20Stanley.pdf</description>
      <enclosure length="6876715" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/661168319-linear-digressions-an-introduction-to-stanley-the-first-self-driving-car-1.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/657844025</guid>
      <title>Putting the "science" in data science: the scientific method, the null hypothesis, and p-hacking</title>
      <pubDate>Mon, 29 Jul 2019 01:30:54 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/putting-the-science-in-data-science-the-scientific-method-the-null-hypothesis-and-p-hacking</link>
      <itunes:duration>00:24:11</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>The modern scientific method is one of the greatest (perhaps the greatest?) systems we have for discovering knowledge about the world. It’s no surprise, then, that many data scientists have found their skills in high demand in the business world, where knowing more about a market, or industry, or type of user becomes a competitive advantage. But the scientific method is built upon certain processes, and is disciplined about following them, in a way that can get swept aside in the rush to get something out the door—not the least of which is the fact that in science, sometimes a result simply doesn’t materialize, or sometimes a relationship simply isn’t there. This makes data science different from operations, or software engineering, or product design in an important way: a data scientist needs to be comfortable with finding nothing in the data for certain types of searches, and needs to be even more comfortable telling his or her boss, or boss’s boss, that an attempt to build a model or find a causal link has turned up nothing. It’s a result that’s often disappointing and tough to communicate, but it’s crucial to the overall credibility of the field.</itunes:summary>
      <itunes:subtitle>The modern scientific method is one of the greate…</itunes:subtitle>
      <description>The modern scientific method is one of the greatest (perhaps the greatest?) systems we have for discovering knowledge about the world. It’s no surprise, then, that many data scientists have found their skills in high demand in the business world, where knowing more about a market, or industry, or type of user becomes a competitive advantage. But the scientific method is built upon certain processes, and is disciplined about following them, in a way that can get swept aside in the rush to get something out the door—not the least of which is the fact that in science, sometimes a result simply doesn’t materialize, or sometimes a relationship simply isn’t there. This makes data science different from operations, or software engineering, or product design in an important way: a data scientist needs to be comfortable with finding nothing in the data for certain types of searches, and needs to be even more comfortable telling his or her boss, or boss’s boss, that an attempt to build a model or find a causal link has turned up nothing. It’s a result that’s often disappointing and tough to communicate, but it’s crucial to the overall credibility of the field.</description>
      <enclosure length="11607386" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/657844025-linear-digressions-putting-the-science-in-data-science-the-scientific-method-the-null-hypothesis-and-p-hacking.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/654705563</guid>
      <title>Interleaving</title>
      <pubDate>Mon, 22 Jul 2019 12:20:58 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/interleaving</link>
      <itunes:duration>00:16:54</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>If you’re Google or Netflix, and you have a recommendation or search system as part of your bread and butter, what’s the best way to test improvements to your algorithm? A/B testing is the canonical answer for testing how users respond to software changes, but it gets tricky really fast to think about what an A/B test means in the context of an algorithm that returns a ranked list. That’s why we’re talking about interleaving this week—it’s a simple modification to A/B testing that makes it much easier to race two algorithms against each other and find the winner, and it allows you to do it with much less data than a traditional A/B test.

Relevant links:
https://medium.com/netflix-techblog/interleaving-in-online-experiments-at-netflix-a04ee392ec55
https://www.microsoft.com/en-us/research/publication/predicting-search-satisfaction-metrics-with-interleaved-comparisons/
https://www.cs.cornell.edu/people/tj/publications/joachims_02b.pdf</itunes:summary>
      <itunes:subtitle>If you’re Google or Netflix, and you have a recom…</itunes:subtitle>
      <description>If you’re Google or Netflix, and you have a recommendation or search system as part of your bread and butter, what’s the best way to test improvements to your algorithm? A/B testing is the canonical answer for testing how users respond to software changes, but it gets tricky really fast to think about what an A/B test means in the context of an algorithm that returns a ranked list. That’s why we’re talking about interleaving this week—it’s a simple modification to A/B testing that makes it much easier to race two algorithms against each other and find the winner, and it allows you to do it with much less data than a traditional A/B test.

Relevant links:
https://medium.com/netflix-techblog/interleaving-in-online-experiments-at-netflix-a04ee392ec55
https://www.microsoft.com/en-us/research/publication/predicting-search-satisfaction-metrics-with-interleaved-comparisons/
https://www.cs.cornell.edu/people/tj/publications/joachims_02b.pdf</description>
      <enclosure length="8115755" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/654705563-linear-digressions-interleaving.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/646395096</guid>
      <title>Federated Learning</title>
      <pubDate>Sun, 14 Jul 2019 23:00:14 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/federated-learning-1</link>
      <itunes:duration>00:15:03</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>This is a re-release of an episode first released in May 2017.

As machine learning makes its way into more and more mobile devices, an interesting question presents itself: how can we have an algorithm learn from training data that's being supplied as users interact with the algorithm?  In other words, how do we do machine learning when the training dataset is distributed across many devices, imbalanced, and the usage associated with any one user needs to be obscured somewhat to protect the privacy of that user?  Enter Federated Learning, a set of related algorithms from Google that are designed to help out in exactly this scenario.  If you've used keyboard shortcuts or autocomplete on an Android phone, chances are you've encountered Federated Learning even if you didn't know it.</itunes:summary>
      <itunes:subtitle>This is a re-release of an episode first released…</itunes:subtitle>
      <description>This is a re-release of an episode first released in May 2017.

As machine learning makes its way into more and more mobile devices, an interesting question presents itself: how can we have an algorithm learn from training data that's being supplied as users interact with the algorithm?  In other words, how do we do machine learning when the training dataset is distributed across many devices, imbalanced, and the usage associated with any one user needs to be obscured somewhat to protect the privacy of that user?  Enter Federated Learning, a set of related algorithms from Google that are designed to help out in exactly this scenario.  If you've used keyboard shortcuts or autocomplete on an Android phone, chances are you've encountered Federated Learning even if you didn't know it.</description>
      <enclosure length="7224248" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/646395096-linear-digressions-federated-learning-1.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/646396245</guid>
      <title>Endogenous Variables and Measuring Protest Effectiveness</title>
      <pubDate>Sun, 07 Jul 2019 22:59:59 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/endogenous-variables-and-measuring-protest-effectiveness-1</link>
      <itunes:duration>00:17:58</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>This is a re-release of an episode first released in February 2017.

Have you been out protesting lately, or watching the protests, and wondered how much effect they might have on lawmakers?  It's a tricky question to answer, since usually we need randomly distributed treatments (e.g. big protests) to understand causality, but there's no reason to believe that big protests are actually randomly distributed.  In other words, protest size is endogenous to legislative response, and understanding cause and effect is very challenging.

So, what to do?  Well, at least in the case of studying Tea Party protest effectiveness, researchers have used rainfall, of all things, to understand the impact of a big protest.  In other words, rainfall is the instrumental variable in this analysis that cracks the scientific case open.  What does rainfall have to do with protests?  Do protests actually matter?  What do we mean when we talk about endogenous and instrumental variables?  We wouldn't be very good podcasters if we answered all those questions here--you gotta listen to this episode to find out.</itunes:summary>
      <itunes:subtitle>This is a re-release of an episode first released…</itunes:subtitle>
      <description>This is a re-release of an episode first released in February 2017.

Have you been out protesting lately, or watching the protests, and wondered how much effect they might have on lawmakers?  It's a tricky question to answer, since usually we need randomly distributed treatments (e.g. big protests) to understand causality, but there's no reason to believe that big protests are actually randomly distributed.  In other words, protest size is endogenous to legislative response, and understanding cause and effect is very challenging.

So, what to do?  Well, at least in the case of studying Tea Party protest effectiveness, researchers have used rainfall, of all things, to understand the impact of a big protest.  In other words, rainfall is the instrumental variable in this analysis that cracks the scientific case open.  What does rainfall have to do with protests?  Do protests actually matter?  What do we mean when we talk about endogenous and instrumental variables?  We wouldn't be very good podcasters if we answered all those questions here--you gotta listen to this episode to find out.</description>
      <enclosure length="8629009" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/646396245-linear-digressions-endogenous-variables-and-measuring-protest-effectiveness-1.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/644459478</guid>
      <title>Deepfakes</title>
      <pubDate>Mon, 01 Jul 2019 01:25:07 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/deepfakes</link>
      <itunes:duration>00:15:08</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Generative adversarial networks (GANs) are producing some of the most realistic artificial videos we’ve ever seen. These videos are usually called “deepfakes”. Even to an experienced eye, it can be a challenge to distinguish a fabricated video from a real one, which is an extraordinary challenge in an era when the truth of what you see on the news or especially on social media is worthy of skepticism. And just in case that wasn’t unsettling enough, the algorithms just keep getting better and more accessible—which means it just keeps getting easier to make completely fake, but real-looking, videos of celebrities, politicians, and perhaps even just regular people.

Relevant links:

http://lineardigressions.com/episodes/2016/5/28/neural-nets-play-cops-and-robbers-aka-generative-adversarial-networks

http://fortune.com/2019/06/12/deepfake-mark-zuckerberg/

https://www.youtube.com/watch?v=EfREntgxmDs

https://spectrum.ieee.org/tech-talk/robotics/artificial-intelligence/will-deepfakes-detection-be-ready-for-2020

https://giorgiop.github.io/posts/2018/03/17/AI-and-digital-forgery/</itunes:summary>
      <itunes:subtitle>Generative adversarial networks (GANs) are produc…</itunes:subtitle>
      <description>Generative adversarial networks (GANs) are producing some of the most realistic artificial videos we’ve ever seen. These videos are usually called “deepfakes”. Even to an experienced eye, it can be a challenge to distinguish a fabricated video from a real one, which is an extraordinary challenge in an era when the truth of what you see on the news or especially on social media is worthy of skepticism. And just in case that wasn’t unsettling enough, the algorithms just keep getting better and more accessible—which means it just keeps getting easier to make completely fake, but real-looking, videos of celebrities, politicians, and perhaps even just regular people.

Relevant links:

http://lineardigressions.com/episodes/2016/5/28/neural-nets-play-cops-and-robbers-aka-generative-adversarial-networks

http://fortune.com/2019/06/12/deepfake-mark-zuckerberg/

https://www.youtube.com/watch?v=EfREntgxmDs

https://spectrum.ieee.org/tech-talk/robotics/artificial-intelligence/will-deepfakes-detection-be-ready-for-2020

https://giorgiop.github.io/posts/2018/03/17/AI-and-digital-forgery/</description>
      <enclosure length="7268134" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/644459478-linear-digressions-deepfakes.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/641117043</guid>
      <title>Revisiting Biased Word Embeddings</title>
      <pubDate>Mon, 24 Jun 2019 00:26:07 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/revisiting-biased-word-embeddings</link>
      <itunes:duration>00:18:09</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>The topic of bias in word embeddings gets yet another pass this week. It all started a few years ago, when an analogy task performed on Word2Vec embeddings showed some indications of gender bias around professions (as well as other forms of social bias getting reproduced in the algorithm’s embeddings). We covered the topic again a while later, covering methods for de-biasing embeddings to counteract this effect. And now we’re back, with a second pass on the original Word2Vec analogy task, but where the researchers deconstructed the “rules” of the analogies themselves and came to an interesting discovery: the bias seems to be, at least in part, an artifact of the analogy construction method. Intrigued? So were we…

Relevant link:
https://arxiv.org/abs/1905.09866</itunes:summary>
      <itunes:subtitle>The topic of bias in word embeddings gets yet ano…</itunes:subtitle>
      <description>The topic of bias in word embeddings gets yet another pass this week. It all started a few years ago, when an analogy task performed on Word2Vec embeddings showed some indications of gender bias around professions (as well as other forms of social bias getting reproduced in the algorithm’s embeddings). We covered the topic again a while later, covering methods for de-biasing embeddings to counteract this effect. And now we’re back, with a second pass on the original Word2Vec analogy task, but where the researchers deconstructed the “rules” of the analogies themselves and came to an interesting discovery: the bias seems to be, at least in part, an artifact of the analogy construction method. Intrigued? So were we…

Relevant link:
https://arxiv.org/abs/1905.09866</description>
      <enclosure length="8715944" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/641117043-linear-digressions-revisiting-biased-word-embeddings.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/637701588</guid>
      <title>Attention in Neural Nets</title>
      <pubDate>Mon, 17 Jun 2019 00:28:35 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/attention-in-neural-nets</link>
      <itunes:duration>00:26:32</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>There’s been a lot of interest lately in the attention mechanism in neural nets—it’s got a colloquial name (who’s not familiar with the idea of “attention”?) but it’s more like a technical trick that’s been pivotal to some recent advances in computer vision and especially word embeddings. It’s an interesting example of trying out human-cognitive-ish ideas (like focusing consideration more on some inputs than others) in neural nets, and one of the more high-profile recent successes in playing around with neural net architectures for fun and profit.</itunes:summary>
      <itunes:subtitle>There’s been a lot of interest lately in the atte…</itunes:subtitle>
      <description>There’s been a lot of interest lately in the attention mechanism in neural nets—it’s got a colloquial name (who’s not familiar with the idea of “attention”?) but it’s more like a technical trick that’s been pivotal to some recent advances in computer vision and especially word embeddings. It’s an interesting example of trying out human-cognitive-ish ideas (like focusing consideration more on some inputs than others) in neural nets, and one of the more high-profile recent successes in playing around with neural net architectures for fun and profit.</description>
      <enclosure length="12739637" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/637701588-linear-digressions-attention-in-neural-nets.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/634307244</guid>
      <title>Interview with Joel Grus</title>
      <pubDate>Mon, 10 Jun 2019 02:05:47 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/interview-with-joel-grus</link>
      <itunes:duration>00:39:46</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>This week’s episode is a special one, as we’re welcoming a guest: Joel Grus is a data scientist with a strong software engineering streak, and he does an impressive amount of speaking, writing, and podcasting as well. Whether you’re a new data scientist just getting started, or a seasoned hand looking to improve your skill set, there’s something for you in Joel’s repertoire.</itunes:summary>
      <itunes:subtitle>This week’s episode is a special one, as we’re we…</itunes:subtitle>
      <description>This week’s episode is a special one, as we’re welcoming a guest: Joel Grus is a data scientist with a strong software engineering streak, and he does an impressive amount of speaking, writing, and podcasting as well. Whether you’re a new data scientist just getting started, or a seasoned hand looking to improve your skill set, there’s something for you in Joel’s repertoire.</description>
      <enclosure length="19087184" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/634307244-linear-digressions-interview-with-joel-grus.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/630686034</guid>
      <title>Re-release: Factorization Machines</title>
      <pubDate>Mon, 03 Jun 2019 01:32:39 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/re-release-factorization-machines</link>
      <itunes:duration>00:20:09</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>What do you get when you cross a support vector machine with matrix factorization?  You get a factorization machine, and a darn fine algorithm for recommendation engines.</itunes:summary>
      <itunes:subtitle>What do you get when you cross a support vector m…</itunes:subtitle>
      <description>What do you get when you cross a support vector machine with matrix factorization?  You get a factorization machine, and a darn fine algorithm for recommendation engines.</description>
      <enclosure length="9671399" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/630686034-linear-digressions-re-release-factorization-machines.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/627136968</guid>
      <title>Re-release: Auto-generating websites with deep learning</title>
      <pubDate>Mon, 27 May 2019 02:01:11 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/re-release-auto-generating-websites-with-deep-learning</link>
      <itunes:duration>00:19:38</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>We've already talked about neural nets in some detail (links below), and in particular we've been blown away by the way that image recognition from convolutional neural nets can be fed into recurrent neural nets that generate descriptions and captions of the images. Our episode today tells a similar tale, except today we're talking about a blog post where the author fed in wireframes of a website design and asked the neural net to generate the HTML and CSS that would actually build a website that looks like the wireframes. If you're a programmer who thinks your job is challenging enough that you're automation-proof, guess again...</itunes:summary>
      <itunes:subtitle>We've already talked about neural nets in some de…</itunes:subtitle>
      <description>We've already talked about neural nets in some detail (links below), and in particular we've been blown away by the way that image recognition from convolutional neural nets can be fed into recurrent neural nets that generate descriptions and captions of the images. Our episode today tells a similar tale, except today we're talking about a blog post where the author fed in wireframes of a website design and asked the neural net to generate the HTML and CSS that would actually build a website that looks like the wireframes. If you're a programmer who thinks your job is challenging enough that you're automation-proof, guess again...</description>
      <enclosure length="9427520" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/627136968-linear-digressions-re-release-auto-generating-websites-with-deep-learning.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/623301501</guid>
      <title>Advice to those trying to get a first job in data science</title>
      <pubDate>Sun, 19 May 2019 21:50:13 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/learning-by-doing-produced</link>
      <itunes:duration>00:17:33</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>We often hear from folks wondering what advice we can give them as they search for their first job in data science. What does a hiring manager look for? Should someone focus on taking classes online, doing a bootcamp, reading books, something else? How can they stand out in a crowd? 

There’s no single answer, because so much depends on the person asking in the first place, but that doesn’t stop us from giving some perspective. So in this episode we’re sharing that advice out more widely, so hopefully more of you can benefit from it.</itunes:summary>
      <itunes:subtitle>We often hear from folks wondering what advice we…</itunes:subtitle>
      <description>We often hear from folks wondering what advice we can give them as they search for their first job in data science. What does a hiring manager look for? Should someone focus on taking classes online, doing a bootcamp, reading books, something else? How can they stand out in a crowd? 

There’s no single answer, because so much depends on the person asking in the first place, but that doesn’t stop us from giving some perspective. So in this episode we’re sharing that advice out more widely, so hopefully more of you can benefit from it.</description>
      <enclosure length="8422955" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/623301501-linear-digressions-learning-by-doing-produced.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/619681395</guid>
      <title>Re-release: Machine Learning Technical Debt</title>
      <pubDate>Sun, 12 May 2019 23:07:14 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/re-release-machine-learning-technical-debt</link>
      <itunes:duration>00:22:29</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>This week, we've got a fun paper by our friends at Google about the hidden costs of maintaining machine learning workflows.  If you've worked in software before, you're probably familiar with the idea of technical debt, which refers to the inefficiencies that crop up in code when you're trying to go fast.  You take shortcuts, hard-code variable values, skimp on the documentation, and generally write not-that-great code in order to get something done quickly, and then end up paying for it later on.  This is technical debt, and it's particularly easy to accrue with machine learning workflows.  That's the premise of this episode's paper.

https://ai.google/research/pubs/pub43146</itunes:summary>
      <itunes:subtitle>This week, we've got a fun paper by our friends a…</itunes:subtitle>
      <description>This week, we've got a fun paper by our friends at Google about the hidden costs of maintaining machine learning workflows.  If you've worked in software before, you're probably familiar with the idea of technical debt, which refers to the inefficiencies that crop up in code when you're trying to go fast.  You take shortcuts, hard-code variable values, skimp on the documentation, and generally write not-that-great code in order to get something done quickly, and then end up paying for it later on.  This is technical debt, and it's particularly easy to accrue with machine learning workflows.  That's the premise of this episode's paper.

https://ai.google/research/pubs/pub43146</description>
      <enclosure length="10792784" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/619681395-linear-digressions-re-release-machine-learning-technical-debt.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/616223289</guid>
      <title>Estimating Software Projects, and Why It's Hard</title>
      <pubDate>Sun, 05 May 2019 22:27:24 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/estimating-software-projects-and-why-its-hard</link>
      <itunes:duration>00:19:07</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>If you’re like most software engineers and, especially, data scientists, you find it really hard to make accurate estimates of how long a project will take to complete. Don’t feel bad: statistics is most likely actively working against your best efforts to give your boss an accurate delivery date. This week, we’ll talk through a great blog post that digs into the underlying probability and statistics assumptions that are probably driving your estimates, versus the ones that maybe should be driving them. 

Relevant links:

https://erikbern.com/2019/04/15/why-software-projects-take-longer-than-you-think-a-statistical-model.html</itunes:summary>
      <itunes:subtitle>If you’re like most software engineers and, espec…</itunes:subtitle>
      <description>If you’re like most software engineers and, especially, data scientists, you find it really hard to make accurate estimates of how long a project will take to complete. Don’t feel bad: statistics is most likely actively working against your best efforts to give your boss an accurate delivery date. This week, we’ll talk through a great blog post that digs into the underlying probability and statistics assumptions that are probably driving your estimates, versus the ones that maybe should be driving them. 

Relevant links:

https://erikbern.com/2019/04/15/why-software-projects-take-longer-than-you-think-a-statistical-model.html</description>
      <enclosure length="9179252" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/616223289-linear-digressions-estimating-software-projects-and-why-its-hard.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/612887847</guid>
      <title>The Black Hole Algorithm</title>
      <pubDate>Mon, 29 Apr 2019 00:55:57 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/the-black-hole-algorithm</link>
      <itunes:duration>00:20:17</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>53.5 million light-years away, there’s a gigantic galaxy called M87 with something interesting going on inside it. Based on Einstein’s theory of relativity and the motion of a group of stars in the galaxy (motion characteristic of a huge gravitational mass being present), scientists have believed for years that there is a supermassive black hole at the center of that galaxy. However, black holes are really hard to see directly because they aren’t a light source like a star or a supernova. They suck up all the light around them, and moreover, even though they’re really massive, they’re small in volume.

That’s why it was so amazing a few weeks ago when scientists announced that they had reconstructed an image of a black hole for the first time ever. The image was the result of many measurements combined with a clever reconstruction strategy, giving scientists, engineers, and all the rest of us something to marvel at.</itunes:summary>
      <itunes:subtitle>53.5 million light-years away, there’s a gigantic…</itunes:subtitle>
      <description>53.5 million light-years away, there’s a gigantic galaxy called M87 with something interesting going on inside it. Based on Einstein’s theory of relativity and the motion of a group of stars in the galaxy (motion characteristic of a huge gravitational mass being present), scientists have believed for years that there is a supermassive black hole at the center of that galaxy. However, black holes are really hard to see directly because they aren’t a light source like a star or a supernova. They suck up all the light around them, and moreover, even though they’re really massive, they’re small in volume.

That’s why it was so amazing a few weeks ago when scientists announced that they had reconstructed an image of a black hole for the first time ever. The image was the result of many measurements combined with a clever reconstruction strategy, giving scientists, engineers, and all the rest of us something to marvel at.</description>
      <enclosure length="9739108" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/612887847-linear-digressions-the-black-hole-algorithm.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/609453735</guid>
      <title>Structure in AI</title>
      <pubDate>Sun, 21 Apr 2019 22:29:02 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/structure-in-ai</link>
      <itunes:duration>00:19:05</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>As artificial intelligence algorithms get applied to more and more domains, a question that often arises is whether to somehow build structure into the algorithm itself to mimic the structure of the problem. There’s usually some amount of knowledge we already have of each domain, an understanding of how it usually works, but it’s not clear how (or even if) to lend this knowledge to an AI algorithm to help it get started. Sure, it may get the algorithm caught up to where we already were on solving that problem, but will it eventually become a limitation where the structure and assumptions prevent the algorithm from surpassing human performance?

It’s a problem without a universal answer. This week, we’ll talk about the question in general, and especially recommend a recent discussion between Christopher Manning and Yann LeCun, two AI researchers who hold different opinions on whether structure is a necessary good or a necessary evil.

Relevant link:
http://www.abigailsee.com/2018/02/21/deep-learning-structure-and-innate-priors.html</itunes:summary>
      <itunes:subtitle>As artificial intelligence algorithms get applied…</itunes:subtitle>
      <description>As artificial intelligence algorithms get applied to more and more domains, a question that often arises is whether to somehow build structure into the algorithm itself to mimic the structure of the problem. There’s usually some amount of knowledge we already have of each domain, an understanding of how it usually works, but it’s not clear how (or even if) to lend this knowledge to an AI algorithm to help it get started. Sure, it may get the algorithm caught up to where we already were on solving that problem, but will it eventually become a limitation where the structure and assumptions prevent the algorithm from surpassing human performance?

It’s a problem without a universal answer. This week, we’ll talk about the question in general, and especially recommend a recent discussion between Christopher Manning and Yann LeCun, two AI researchers who hold different opinions on whether structure is a necessary good or a necessary evil.

Relevant link:
http://www.abigailsee.com/2018/02/21/deep-learning-structure-and-innate-priors.html</description>
      <enclosure length="9165041" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/609453735-linear-digressions-structure-in-ai.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/606100506</guid>
      <title>The Great Data Science Specialist vs. Generalist Debate</title>
      <pubDate>Mon, 15 Apr 2019 00:55:41 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/the-great-data-science-specialist-vs-generalist-debate</link>
      <itunes:duration>00:14:10</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>It’s not news that data scientists are expected to be capable in many different areas (writing software, designing experiments, analyzing data, talking to non-technical stakeholders). One thing that has been changing, though, as the field becomes a bit older and more mature, is our ideas about what data scientists should focus on to stay relevant. Should they specialize in a particular area (if so, which one)? Should they instead stay general and work across many different areas? In either case, what are the costs and benefits?

This question has prompted a number of think pieces lately, some advocating for specializing and others pointing out the benefits of staying a generalist. In short, if you’re trying to figure out what to actually do, you might be hearing some conflicting opinions. In this episode, we break apart the arguments both ways, and maybe (hopefully?) reach a little resolution about where to go from here.</itunes:summary>
      <itunes:subtitle>It’s not news that data scientists are expected t…</itunes:subtitle>
      <description>It’s not news that data scientists are expected to be capable in many different areas (writing software, designing experiments, analyzing data, talking to non-technical stakeholders). One thing that has been changing, though, as the field becomes a bit older and more mature, is our ideas about what data scientists should focus on to stay relevant. Should they specialize in a particular area (if so, which one)? Should they instead stay general and work across many different areas? In either case, what are the costs and benefits?

This question has prompted a number of think pieces lately, some advocating for specializing and others pointing out the benefits of staying a generalist. In short, if you’re trying to figure out what to actually do, you might be hearing some conflicting opinions. In this episode, we break apart the arguments both ways, and maybe (hopefully?) reach a little resolution about where to go from here.</description>
      <enclosure length="6806080" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/606100506-linear-digressions-the-great-data-science-specialist-vs-generalist-debate.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/602577039</guid>
      <title>Google X, and Taking Risks the Smart Way</title>
      <pubDate>Mon, 08 Apr 2019 01:10:57 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/google-x-and-taking-risks-the-smart-way</link>
      <itunes:duration>00:19:04</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>If you work in data science, you’re well aware of the sheer volume of high-risk, high-reward projects that are hypothetically possible. The fact that they’re high-reward means they’re exciting to think about, and the payoff would be huge if they succeed, but the high-risk piece means that you have to be smart about what you choose to work on and be wary of investing all your resources in projects that fail entirely or starve other, higher-value projects. 

This episode focuses mainly on Google X, the so-called “Moonshot Factory” at Google that is a modern-day heir to the research legacies of Bell Labs and Xerox PARC. It’s an organization entirely focused on rapidly imagining, prototyping, invalidating, and, occasionally, successfully creating game-changing technologies. The process and philosophy behind Google X are useful for anyone thinking about how to stay aggressive and “responsibly irresponsible,” which includes a lot of you data science folks out there.</itunes:summary>
      <itunes:subtitle>If you work in data science, you’re well aware of…</itunes:subtitle>
      <description>If you work in data science, you’re well aware of the sheer volume of high-risk, high-reward projects that are hypothetically possible. The fact that they’re high-reward means they’re exciting to think about, and the payoff would be huge if they succeed, but the high-risk piece means that you have to be smart about what you choose to work on and be wary of investing all your resources in projects that fail entirely or starve other, higher-value projects. 

This episode focuses mainly on Google X, the so-called “Moonshot Factory” at Google that is a modern-day heir to the research legacies of Bell Labs and Xerox PARC. It’s an organization entirely focused on rapidly imagining, prototyping, invalidating, and, occasionally, successfully creating game-changing technologies. The process and philosophy behind Google X are useful for anyone thinking about how to stay aggressive and “responsibly irresponsible,” which includes a lot of you data science folks out there.</description>
      <enclosure length="9153130" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/602577039-linear-digressions-google-x-and-taking-risks-the-smart-way.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/599034660</guid>
      <title>Statistical Significance in Hypothesis Testing</title>
      <pubDate>Mon, 01 Apr 2019 01:34:53 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/statistical-significance-in-hypothesis-testing</link>
      <itunes:duration>00:22:34</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>When you are running an AB test, one of the most important questions is how much data to collect. Collect too little, and you can end up drawing the wrong conclusion from your experiment. But in a world where experimenting is generally not free, and you want to move quickly once you know the answer, there is such a thing as collecting too much data. Statisticians have been solving this problem for decades, and their best practices are encompassed in the ideas of power and statistical significance, and, more generally, in how to think about hypothesis testing. This week, we’re going over these important concepts, so your next AB test is just as data-intensive as it needs to be.</itunes:summary>
      <itunes:subtitle>When you are running an AB test, one of the most …</itunes:subtitle>
      <description>When you are running an AB test, one of the most important questions is how much data to collect. Collect too little, and you can end up drawing the wrong conclusion from your experiment. But in a world where experimenting is generally not free, and you want to move quickly once you know the answer, there is such a thing as collecting too much data. Statisticians have been solving this problem for decades, and their best practices are encompassed in the ideas of power and statistical significance, and, more generally, in how to think about hypothesis testing. This week, we’re going over these important concepts, so your next AB test is just as data-intensive as it needs to be.</description>
      <enclosure length="10835206" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/599034660-linear-digressions-statistical-significance-in-hypothesis-testing.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/595342878</guid>
      <title>The Language Model Too Dangerous to Release</title>
      <pubDate>Mon, 25 Mar 2019 01:39:45 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/the-language-model-too-dangerous-to-release</link>
      <itunes:duration>00:21:01</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>OpenAI recently created a cutting-edge natural language processing model, but unlike all their other projects so far, they have not released it to the public. Why? It seems to be a little too good. It can answer reading comprehension questions, summarize text, translate from one language to another, and generate realistic fake text. This last case, in particular, raised concerns inside OpenAI that the raw model could be dangerous if bad actors had access to it, so researchers will spend the next six months studying the model (and reading comments from you, if you have strong opinions here) to decide what to do next. Regardless of where this lands from a policy perspective, it’s an impressive model, and the released snippets of auto-generated text are quite striking. We’re covering the methodology, the results, and a bit of the policy implications in our episode this week.</itunes:summary>
      <itunes:subtitle>OpenAI recently created a cutting-edge new natura…</itunes:subtitle>
      <description>OpenAI recently created a cutting-edge natural language processing model, but unlike all their other projects so far, they have not released it to the public. Why? It seems to be a little too good. It can answer reading comprehension questions, summarize text, translate from one language to another, and generate realistic fake text. This last case, in particular, raised concerns inside OpenAI that the raw model could be dangerous if bad actors had access to it, so researchers will spend the next six months studying the model (and reading comments from you, if you have strong opinions here) to decide what to do next. Regardless of where this lands from a policy perspective, it’s an impressive model, and the released snippets of auto-generated text are quite striking. We’re covering the methodology, the results, and a bit of the policy implications in our episode this week.</description>
      <enclosure length="10092911" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/595342878-linear-digressions-the-language-model-too-dangerous-to-release.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/591663828</guid>
      <title>The cathedral and the bazaar</title>
      <pubDate>Sun, 17 Mar 2019 22:47:01 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/the-cathedral-and-the-bazaar</link>
      <itunes:duration>00:32:36</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Imagine you have two choices of how to build something: top-down and controlled, with a few people playing a master designer role, or bottom-up and free-for-all, with nobody playing an explicit architect role. Which one do you think would make the better product? “The Cathedral and the Bazaar” is an essay exploring this question for open source software, and making an argument for the bottom-up approach. It’s not entirely intuitive that projects like Linux or scikit-learn, with many contributors and an open-door policy for modifying the code, would be able to resist the chaos of many cooks in the kitchen. So what makes it work in some cases? And sometimes not work in others? That’s the topic of discussion this week.

Relevant links: 
http://www.catb.org/~esr/writings/cathedral-bazaar/cathedral-bazaar/index.html</itunes:summary>
      <itunes:subtitle>Imagine you have two choices of how to build some…</itunes:subtitle>
      <description>Imagine you have two choices of how to build something: top-down and controlled, with a few people playing a master designer role, or bottom-up and free-for-all, with nobody playing an explicit architect role. Which one do you think would make the better product? “The Cathedral and the Bazaar” is an essay exploring this question for open source software, and making an argument for the bottom-up approach. It’s not entirely intuitive that projects like Linux or scikit-learn, with many contributors and an open-door policy for modifying the code, would be able to resist the chaos of many cooks in the kitchen. So what makes it work in some cases? And sometimes not work in others? That’s the topic of discussion this week.

Relevant links: 
http://www.catb.org/~esr/writings/cathedral-bazaar/cathedral-bazaar/index.html</description>
      <enclosure length="15652395" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/591663828-linear-digressions-the-cathedral-and-the-bazaar.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/588116640</guid>
      <title>AlphaStar</title>
      <pubDate>Mon, 11 Mar 2019 01:18:26 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/alphastar</link>
      <itunes:duration>00:22:03</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>It’s time for our latest installment in the series on artificial intelligence agents beating humans at games that we thought were safe from the robots. In this case, the game is StarCraft, and the AI agent is AlphaStar, from the same team that built the Go-playing AlphaGo AI. StarCraft presents some interesting challenges though: the gameplay is continuous, there are many different kinds of actions a player must take, and of course there are the usual complexities of playing strategy games and contending with human opponents. AlphaStar overcame all of these challenges, and more, to notch another win for the computers.</itunes:summary>
      <itunes:subtitle>It’s time for our latest installment in the seri…</itunes:subtitle>
      <description>It’s time for our latest installment in the series on artificial intelligence agents beating humans at games that we thought were safe from the robots. In this case, the game is StarCraft, and the AI agent is AlphaStar, from the same team that built the Go-playing AlphaGo AI last year. StarCraft presents some interesting challenges though: the gameplay is continuous, there are many different kinds of actions a player must take, and of course there are the usual complexities of playing strategy games and contending with human opponents. AlphaStar overcame all of these challenges, and more, to notch another win for the computers.</description>
      <enclosure length="10586521" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/588116640-linear-digressions-alphastar.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/584590707</guid>
      <title>Are machine learning engineers the new data scientists?</title>
      <pubDate>Mon, 04 Mar 2019 02:57:19 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/are-machine-learning-engineers-the-new-data-scientists</link>
      <itunes:duration>00:20:46</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>For many data scientists, maintaining models and workflows in production is both a huge part of their job and not something they necessarily trained for if their background is more in statistics or machine learning methodology. Productionizing and maintaining data science code has more in common with software engineering than traditional science, and to reflect that, there’s a new-ish role, and corresponding job title, that you should know about. It’s called machine learning engineer, and it’s what a lot of data scientists are becoming.

Relevant links:
https://medium.com/@tomaszdudek/but-what-is-this-machine-learning-engineer-actually-doing-18464d5c699
https://www.forbes.com/sites/forbestechcouncil/2019/02/04/why-there-will-be-no-data-science-job-titles-by-2029/#64e3906c3a8f</itunes:summary>
      <itunes:subtitle>For many data scientists, maintaining models and …</itunes:subtitle>
      <description>For many data scientists, maintaining models and workflows in production is both a huge part of their job and not something they necessarily trained for if their background is more in statistics or machine learning methodology. Productionizing and maintaining data science code has more in common with software engineering than traditional science, and to reflect that, there’s a new-ish role, and corresponding job title, that you should know about. It’s called machine learning engineer, and it’s what a lot of data scientists are becoming.

Relevant links:
https://medium.com/@tomaszdudek/but-what-is-this-machine-learning-engineer-actually-doing-18464d5c699
https://www.forbes.com/sites/forbestechcouncil/2019/02/04/why-there-will-be-no-data-science-job-titles-by-2029/#64e3906c3a8f</description>
      <enclosure length="9969404" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/584590707-linear-digressions-are-machine-learning-engineers-the-new-data-scientists.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/580774869</guid>
      <title>Interview with Alex Radovic, particle physicist turned machine learning researcher</title>
      <pubDate>Mon, 25 Feb 2019 01:59:03 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/interview-with-alex-radovic-particle-physicist-turned-machine-learning-researcher</link>
      <itunes:duration>00:35:42</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>You’d be hard-pressed to find a field with bigger, richer, and more scientifically valuable data than particle physics. Years before “data scientist” was even a term, particle physicists were inventing technologies like the world wide web and cloud computing grids to help them distribute and analyze the datasets required to make particle physics discoveries. Somewhat counterintuitively, though, deep learning has only really debuted in particle physics in the last few years, although it’s making up for lost time with many exciting new advances.

This episode of Linear Digressions is a little different from most, as we’ll be interviewing a guest, one of my (Katie’s) friends from particle physics, Alex Radovic. Alex and his colleagues have been at the forefront of machine learning in physics over the last few years, and his perspective on the strengths and shortcomings of those two fields together is a fascinating one.</itunes:summary>
      <itunes:subtitle>You’d be hard-pressed to find a field with bigger…</itunes:subtitle>
      <description>You’d be hard-pressed to find a field with bigger, richer, and more scientifically valuable data than particle physics. Years before “data scientist” was even a term, particle physicists were inventing technologies like the world wide web and cloud computing grids to help them distribute and analyze the datasets required to make particle physics discoveries. Somewhat counterintuitively, though, deep learning has only really debuted in particle physics in the last few years, although it’s making up for lost time with many exciting new advances.

This episode of Linear Digressions is a little different from most, as we’ll be interviewing a guest, one of my (Katie’s) friends from particle physics, Alex Radovic. Alex and his colleagues have been at the forefront of machine learning in physics over the last few years, and his perspective on the strengths and shortcomings of those two fields together is a fascinating one.</description>
      <enclosure length="17135338" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/580774869-linear-digressions-interview-with-alex-radovic-particle-physicist-turned-machine-learning-researcher.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/577038141</guid>
      <title>K Nearest Neighbors</title>
      <pubDate>Sun, 17 Feb 2019 23:57:23 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/k-nearest-neighbors</link>
      <itunes:duration>00:16:25</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>K Nearest Neighbors is an algorithm with secrets. On one hand, the algorithm itself is as straightforward as possible: find the labeled points nearest the point that you need to predict, and make a prediction that’s the average of their answers. On the other hand, what does “nearest” mean when you’re dealing with complex data? How do you decide whether a man and a woman of the same age are “nearer” to each other than two women several years apart? What if you convert all your monetary columns from dollars to cents, your distances from miles to nanometers, your weights from pounds to kilograms? Can your definition of “nearest” hold up under these types of transformations? We’re discussing all this, and more, in this week’s episode.</itunes:summary>
      <itunes:subtitle>K Nearest Neighbors is an algorithm with secrets.…</itunes:subtitle>
      <description>K Nearest Neighbors is an algorithm with secrets. On one hand, the algorithm itself is as straightforward as possible: find the labeled points nearest the point that you need to predict, and make a prediction that’s the average of their answers. On the other hand, what does “nearest” mean when you’re dealing with complex data? How do you decide whether a man and a woman of the same age are “nearer” to each other than two women several years apart? What if you convert all your monetary columns from dollars to cents, your distances from miles to nanometers, your weights from pounds to kilograms? Can your definition of “nearest” hold up under these types of transformations? We’re discussing all this, and more, in this week’s episode.</description>
      <enclosure length="7880444" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/577038141-linear-digressions-k-nearest-neighbors.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/573340113</guid>
      <title>Not every deep learning paper is great. Is that a problem?</title>
      <pubDate>Mon, 11 Feb 2019 00:06:33 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/not-every-deep-learning-paper-is-great-is-that-a-problem</link>
      <itunes:duration>00:17:54</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Deep learning is a field that’s growing quickly. That’s good! There are lots of new deep learning papers put out every day. That’s good too… right? What if not every paper out there is particularly good? What even makes a paper good in the first place? It’s an interesting thing to think about, and debate, since there’s no clean-cut answer and there are worthwhile arguments both ways. Wherever you find yourself coming down in the debate, though, you’ll appreciate the good papers that much more.

Relevant links:
https://blog.piekniewski.info/2018/07/14/autopsy-dl-paper/
https://www.reddit.com/r/MachineLearning/comments/90n40l/dautopsy_of_a_deep_learning_paper_quite_brutal/
https://www.reddit.com/r/MachineLearning/comments/agiatj/d_google_ai_refuses_to_share_dataset_fields_for_a/</itunes:summary>
      <itunes:subtitle>Deep learning is a field that’s growing quickly. …</itunes:subtitle>
      <description>Deep learning is a field that’s growing quickly. That’s good! There are lots of new deep learning papers put out every day. That’s good too… right? What if not every paper out there is particularly good? What even makes a paper good in the first place? It’s an interesting thing to think about, and debate, since there’s no clean-cut answer and there are worthwhile arguments both ways. Wherever you find yourself coming down in the debate, though, you’ll appreciate the good papers that much more.

Relevant links:
https://blog.piekniewski.info/2018/07/14/autopsy-dl-paper/
https://www.reddit.com/r/MachineLearning/comments/90n40l/dautopsy_of_a_deep_learning_paper_quite_brutal/
https://www.reddit.com/r/MachineLearning/comments/agiatj/d_google_ai_refuses_to_share_dataset_fields_for_a/</description>
      <enclosure length="8592855" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/573340113-linear-digressions-not-every-deep-learning-paper-is-great-is-that-a-problem.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/569678496</guid>
      <title>The Assumptions of Ordinary Least Squares</title>
      <pubDate>Sun, 03 Feb 2019 23:24:15 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/the-assumptions-of-ordinary-least-squares</link>
      <itunes:duration>00:25:07</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Ordinary least squares (OLS) is often used synonymously with linear regression. If you’re a data scientist, machine learner, or statistician, you bump into it daily. If you haven’t had the opportunity to build up your understanding from the foundations, though, listen up: there are a number of assumptions underlying OLS that you should know and love. They’re interesting, force you to think about data and statistics, and help you know when you’re out of “good” OLS territory and into places where you could run into trouble.</itunes:summary>
      <itunes:subtitle>Ordinary least squares (OLS) is often used synony…</itunes:subtitle>
      <description>Ordinary least squares (OLS) is often used synonymously with linear regression. If you’re a data scientist, machine learner, or statistician, you bump into it daily. If you haven’t had the opportunity to build up your understanding from the foundations, though, listen up: there are a number of assumptions underlying OLS that you should know and love. They’re interesting, force you to think about data and statistics, and help you know when you’re out of “good” OLS territory and into places where you could run into trouble.</description>
      <enclosure length="12055229" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/569678496-linear-digressions-the-assumptions-of-ordinary-least-squares.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/565640658</guid>
      <title>Quantile Regression</title>
      <pubDate>Mon, 28 Jan 2019 01:27:40 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/quantile-regression</link>
      <itunes:duration>00:21:46</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Linear regression is a great tool if you want to make predictions about the mean value that an outcome will have given certain values for the inputs. But what if you want to predict the median? Or the 10th percentile? Or the 90th percentile? You need quantile regression, which is similar to ordinary least squares regression in some ways but with some really interesting twists that make it unique. This week, we’ll go over the concept of quantile regression, and also a bit about how it works and when you might use it.

Relevant links:
https://www.aeaweb.org/articles?id=10.1257/jep.15.4.143
https://eng.uber.com/analyzing-experiment-outcomes/</itunes:summary>
      <itunes:subtitle>Linear regression is a great tool if you want to …</itunes:subtitle>
      <description>Linear regression is a great tool if you want to make predictions about the mean value that an outcome will have given certain values for the inputs. But what if you want to predict the median? Or the 10th percentile? Or the 90th percentile? You need quantile regression, which is similar to ordinary least squares regression in some ways but with some really interesting twists that make it unique. This week, we’ll go over the concept of quantile regression, and also a bit about how it works and when you might use it.

Relevant links:
https://www.aeaweb.org/articles?id=10.1257/jep.15.4.143
https://eng.uber.com/analyzing-experiment-outcomes/</description>
      <enclosure length="10452983" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/565640658-linear-digressions-quantile-regression.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/562124733</guid>
      <title>Heterogeneous Treatment Effects</title>
      <pubDate>Sun, 20 Jan 2019 23:57:56 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/heterogeneous-treatment-effects</link>
      <itunes:duration>00:17:24</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>When data scientists use a linear regression to look for causal relationships between a treatment and an outcome, what they’re usually finding is the so-called average treatment effect. In other words, on average, here’s what the treatment does in terms of making a certain outcome more or less likely to happen. But there’s more to life than averages: sometimes the relationship works one way in some cases, and another way in other cases, such that the average isn’t giving you the whole story. In that case, you want to start thinking about heterogeneous treatment effects, and this is the podcast episode for you.

Relevant links:
https://eng.uber.com/analyzing-experiment-outcomes/
https://multithreaded.stitchfix.com/blog/2018/11/08/bandits/
https://www.locallyoptimistic.com/post/against-ab-tests/</itunes:summary>
      <itunes:subtitle>When data scientists use a linear regression to l…</itunes:subtitle>
      <description>When data scientists use a linear regression to look for causal relationships between a treatment and an outcome, what they’re usually finding is the so-called average treatment effect. In other words, on average, here’s what the treatment does in terms of making a certain outcome more or less likely to happen. But there’s more to life than averages: sometimes the relationship works one way in some cases, and another way in other cases, such that the average isn’t giving you the whole story. In that case, you want to start thinking about heterogeneous treatment effects, and this is the podcast episode for you.

Relevant links:
https://eng.uber.com/analyzing-experiment-outcomes/
https://multithreaded.stitchfix.com/blog/2018/11/08/bandits/
https://www.locallyoptimistic.com/post/against-ab-tests/</description>
      <enclosure length="8353156" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/562124733-linear-digressions-heterogeneous-treatment-effects.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/558621291</guid>
      <title>Pre-training language models for natural language processing problems</title>
      <pubDate>Mon, 14 Jan 2019 00:42:31 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/pre-training-language-models-for-natural-language-processing-problems</link>
      <itunes:duration>00:27:35</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>When you build a model for natural language processing (NLP), such as a recurrent neural network, it helps a ton if you’re not starting from zero. In other words, if you can draw upon other datasets for building your understanding of word meanings, and then use your training dataset just for subject-specific refinements, you’ll get farther than just using your training dataset for everything. This idea of starting with some pre-trained resources has an analogue in computer vision, where initializations from ImageNet used for the first few layers of a CNN have become the new standard. There’s a similar progression under way in NLP, where simple(r) embeddings like word2vec are giving way to more advanced pre-processing methods that aim to capture more sophisticated understanding of word meanings, contexts, language structure, and more.

Relevant links:
https://thegradient.pub/nlp-imagenet/</itunes:summary>
      <itunes:subtitle>When you build a model for natural language proce…</itunes:subtitle>
      <description>When you build a model for natural language processing (NLP), such as a recurrent neural network, it helps a ton if you’re not starting from zero. In other words, if you can draw upon other datasets for building your understanding of word meanings, and then use your training dataset just for subject-specific refinements, you’ll get farther than just using your training dataset for everything. This idea of starting with some pre-trained resources has an analogue in computer vision, where initializations from ImageNet used for the first few layers of a CNN have become the new standard. There’s a similar progression under way in NLP, where simple(r) embeddings like word2vec are giving way to more advanced pre-processing methods that aim to capture more sophisticated understanding of word meanings, contexts, language structure, and more.

Relevant links:
https://thegradient.pub/nlp-imagenet/</description>
      <enclosure length="13240561" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/558621291-linear-digressions-pre-training-language-models-for-natural-language-processing-problems.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/555151260</guid>
      <title>Facial Recognition, Society, and the Law</title>
      <pubDate>Mon, 07 Jan 2019 02:03:29 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/facial-recognition-society-and-the-law</link>
      <itunes:duration>00:42:46</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Not too long ago, the use of facial recognition in everyday life seemed far off. Now it’s being used and advanced widely, and with increasing speed, which means that our technical capabilities are starting to outpace (if they haven’t already) our consensus as a society about what is acceptable in facial recognition and what isn’t. The threats to privacy, fairness, and freedom are real, and Microsoft has become one of the first large companies using this technology to speak out in specific support of its regulation through legislation. Their arguments are interesting and provocative, and even if you don’t agree with every point they make or harbor some skepticism, there’s a lot to think about in what they’re saying.

https://blogs.microsoft.com/on-the-issues/2018/12/06/facial-recognition-its-time-for-action/</itunes:summary>
      <itunes:subtitle>Not too long ago, the use of facial recognition …</itunes:subtitle>
      <description>Not too long ago, the use of facial recognition in everyday life seemed far off. Now it’s being used and advanced widely, and with increasing speed, which means that our technical capabilities are starting to outpace (if they haven’t already) our consensus as a society about what is acceptable in facial recognition and what isn’t. The threats to privacy, fairness, and freedom are real, and Microsoft has become one of the first large companies using this technology to speak out in specific support of its regulation through legislation. Their arguments are interesting and provocative, and even if you don’t agree with every point they make or harbor some skepticism, there’s a lot to think about in what they’re saying.

https://blogs.microsoft.com/on-the-issues/2018/12/06/facial-recognition-its-time-for-action/</description>
      <enclosure length="20533531" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/555151260-linear-digressions-facial-recognition-society-and-the-law.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/552052683</guid>
      <title>Re-release: Word2Vec</title>
      <pubDate>Mon, 31 Dec 2018 01:56:03 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/re-release-word2vec</link>
      <itunes:duration>00:17:59</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Bringing you another old classic this week, as we gear up for 2019! See you next week with new content.

Word2Vec is probably the go-to algorithm for vectorizing text data these days.  Which makes sense, because it is wicked cool.  Word2Vec has it all: neural networks, skip-grams and bag-of-words implementations, a multiclass classifier that gets swapped out for a binary classifier, made-up dummy words, and a model that isn't actually used to predict anything (usually).  And all that's before we get to the part about how Word2Vec allows you to do algebra with text.  Seriously, this stuff is cool.</itunes:summary>
      <itunes:subtitle>Bringing you another old classic this week, as we…</itunes:subtitle>
      <description>Bringing you another old classic this week, as we gear up for 2019! See you next week with new content.

Word2Vec is probably the go-to algorithm for vectorizing text data these days.  Which makes sense, because it is wicked cool.  Word2Vec has it all: neural networks, skip-grams and bag-of-words implementations, a multiclass classifier that gets swapped out for a binary classifier, made-up dummy words, and a model that isn't actually used to predict anything (usually).  And all that's before we get to the part about how Word2Vec allows you to do algebra with text.  Seriously, this stuff is cool.</description>
      <enclosure length="25893231" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/552052683-linear-digressions-re-release-word2vec.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/549206592</guid>
      <title>Re-Release: The Cold Start Problem</title>
      <pubDate>Sun, 23 Dec 2018 20:23:33 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/re-release-the-cold-start-problem</link>
      <itunes:duration>00:15:37</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>We’re taking a break for the holidays, chilling with the dog and an eggnog (Katie) and the cat and some spiced cider (Ben). Here’s an episode from a while back for you to enjoy. See you again in 2019!

You might sometimes find that it's hard to get started doing something, but once you're going, it gets easier. Turns out machine learning algorithms, and especially recommendation engines, feel the same way. The more they "know" about a user, like what movies they watch and how they rate them, the better they do at suggesting new movies, which is great until you realize that you have to start somewhere. The "cold start" problem will be our focus in this episode, both the heuristic solutions that help deal with it and a bit of realism about the importance of skepticism when someone claims a great solution to cold starts.</itunes:summary>
      <itunes:subtitle>We’re taking a break for the holidays, chilling w…</itunes:subtitle>
      <description>We’re taking a break for the holidays, chilling with the dog and an eggnog (Katie) and the cat and some spiced cider (Ben). Here’s an episode from a while back for you to enjoy. See you again in 2019!

You might sometimes find that it's hard to get started doing something, but once you're going, it gets easier. Turns out machine learning algorithms, and especially recommendation engines, feel the same way. The more they "know" about a user, like what movies they watch and how they rate them, the better they do at suggesting new movies, which is great until you realize that you have to start somewhere. The "cold start" problem will be our focus in this episode, both the heuristic solutions that help deal with it and a bit of realism about the importance of skepticism when someone claims a great solution to cold starts.</description>
      <enclosure length="7496339" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/549206592-linear-digressions-re-release-the-cold-start-problem.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/546033813</guid>
      <title>Convex (and non-convex) Optimization</title>
      <pubDate>Mon, 17 Dec 2018 03:06:42 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/convex-and-non-convex-optimization</link>
      <itunes:duration>00:20:00</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Convex optimization is one of the keys to data science, both because some problems straight-up call for optimization solutions and because popular algorithms, like the gradient descent solution to ordinary least squares, are underpinned by optimization techniques. But there are all kinds of subtleties, starting with convex and non-convex functions, why gradient descent is really an optimization algorithm, and what that means for your average data scientist or statistician.</itunes:summary>
      <itunes:subtitle>Convex optimization is one of the keys to data sc…</itunes:subtitle>
      <description>Convex optimization is one of the keys to data science, both because some problems straight-up call for optimization solutions and because popular algorithms, like the gradient descent solution to ordinary least squares, are underpinned by optimization techniques. But there are all kinds of subtleties, starting with convex and non-convex functions, why gradient descent is really an optimization algorithm, and what that means for your average data scientist or statistician.</description>
      <enclosure length="9599301" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/546033813-linear-digressions-convex-and-non-convex-optimization.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/542133603</guid>
      <title>The Normal Distribution and the Central Limit Theorem</title>
      <pubDate>Sun, 09 Dec 2018 18:58:28 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/the-normal-distribution-and-the-central-limit-theorem</link>
      <itunes:duration>00:27:11</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>When you think about it, it’s pretty amazing that we can draw conclusions about huge populations, even the whole world, based on datasets that are comparatively very small (a few thousand, or a few hundred, or even sometimes a few dozen). That’s the power of statistics, though. This episode is kind of a two-for-one but we’re excited about it—first we’ll talk about the Normal or Gaussian distribution, which is maybe the most famous probability distribution function out there, and then turn to the Central Limit Theorem, which is one of the foundational tenets of statistics and the real reason why the Normal distribution is so important.</itunes:summary>
      <itunes:subtitle>When you think about it, it’s pretty amazing that…</itunes:subtitle>
      <description>When you think about it, it’s pretty amazing that we can draw conclusions about huge populations, even the whole world, based on datasets that are comparatively very small (a few thousand, or a few hundred, or even sometimes a few dozen). That’s the power of statistics, though. This episode is kind of a two-for-one but we’re excited about it—first we’ll talk about the Normal or Gaussian distribution, which is maybe the most famous probability distribution function out there, and then turn to the Central Limit Theorem, which is one of the foundational tenets of statistics and the real reason why the Normal distribution is so important.</description>
      <enclosure length="13051644" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/542133603-linear-digressions-the-normal-distribution-and-the-central-limit-theorem.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/538887468</guid>
      <title>Software 2.0</title>
      <pubDate>Sun, 02 Dec 2018 23:23:05 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/software-20</link>
      <itunes:duration>00:17:22</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Neural nets are a way you can model a system, sure, but if you take a step back, squint, and tilt your head, they can also be called… software? Not in the sense that they’re written in code, but in the sense that the neural net itself operates under the same set of general requirements as does software that a human would write. Namely, neural nets take inputs and create outputs from them according to a set of rules, but the thing about the inside of the neural net black box is that it’s written by a computer, whereas the software we’re more familiar with is written by a human. Neural net researcher and Tesla director of AI Andrej Karpathy has taken to calling neural nets “Software 2.0” as a result, and the implications from this connection are really cool. We’ll talk about it this week.

Relevant links:
https://medium.com/@karpathy/software-2-0-a64152b37c35</itunes:summary>
      <itunes:subtitle>Neural nets are a way you can model a system, sur…</itunes:subtitle>
      <description>Neural nets are a way you can model a system, sure, but if you take a step back, squint, and tilt your head, they can also be called… software? Not in the sense that they’re written in code, but in the sense that the neural net itself operates under the same set of general requirements as does software that a human would write. Namely, neural nets take inputs and create outputs from them according to a set of rules, but the thing about the inside of the neural net black box is that it’s written by a computer, whereas the software we’re more familiar with is written by a human. Neural net researcher and Tesla director of AI Andrej Karpathy has taken to calling neural nets “Software 2.0” as a result, and the implications from this connection are really cool. We’ll talk about it this week.

Relevant links:
https://medium.com/@karpathy/software-2-0-a64152b37c35</description>
      <enclosure length="8339572" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/538887468-linear-digressions-software-20.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/531834792</guid>
      <title>Limitations of Deep Nets for Computer Vision</title>
      <pubDate>Sun, 18 Nov 2018 19:01:28 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/neural-net-limitations-produced</link>
      <itunes:duration>00:27:20</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Deep neural nets have a deserved reputation as the best-in-breed solution for computer vision problems. But there are many aspects of human vision that we take for granted but where neural nets struggle—this episode covers an eye-opening paper that summarizes some of the interesting weak spots of deep neural nets.

Relevant links: https://arxiv.org/abs/1805.04025</itunes:summary>
      <itunes:subtitle>Deep neural nets have a deserved reputation as th…</itunes:subtitle>
      <description>Deep neural nets have a deserved reputation as the best-in-breed solution for computer vision problems. But there are many aspects of human vision that we take for granted but where neural nets struggle—this episode covers an eye-opening paper that summarizes some of the interesting weak spots of deep neural nets.

Relevant links: https://arxiv.org/abs/1805.04025</description>
      <enclosure length="13120398" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/531834792-linear-digressions-neural-net-limitations-produced.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/528439233</guid>
      <title>Building Data Science Teams</title>
      <pubDate>Mon, 12 Nov 2018 03:16:46 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/building-data-science-teams</link>
      <itunes:duration>00:25:09</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>At many places, data scientists don’t work solo anymore—it’s a team sport. But data science teams aren’t simply teams of data scientists working together. Instead, they’re usually cross-functional teams with engineers, managers, data scientists, and sometimes others all working together to build tools and products around data science. This episode talks about some of those roles on a typical data science team, what the responsibilities are for each role, and what skills and traits are most important for each team member to have.</itunes:summary>
      <itunes:subtitle>At many places, data scientists don’t work solo a…</itunes:subtitle>
      <description>At many places, data scientists don’t work solo anymore—it’s a team sport. But data science teams aren’t simply teams of data scientists working together. Instead, they’re usually cross-functional teams with engineers, managers, data scientists, and sometimes others all working together to build tools and products around data science. This episode talks about some of those roles on a typical data science team, what the responsibilities are for each role, and what skills and traits are most important for each team member to have.</description>
      <enclosure length="12076963" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/528439233-linear-digressions-building-data-science-teams.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/524794251</guid>
      <title>Optimized Optimized Web Crawling</title>
      <pubDate>Sun, 04 Nov 2018 21:38:32 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/optimized-optimized-web-crawling</link>
      <itunes:duration>00:19:42</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Last week’s episode, about methods for optimized web crawling logic, left off on a bit of a cliffhanger: the data scientists had found a solution to the problem, but it wasn’t something that the engineers (who own the search codebase, remember) liked very much. It was black-boxy, hard to parallelize, and introduced a lot of complexity to their code. This episode takes a second crack, where we formulate the problem a little differently and end up with a different, arguably more elegant solution. 

Relevant links:
http://www.unofficialgoogledatascience.com/2018/07/by-bill-richoux-critical-decisions-are.html
http://www.csc.kth.se/utbildning/kth/kurser/DD3364/Lectures/KKT.pdf</itunes:summary>
      <itunes:subtitle>Last week’s episode, about methods for optimized …</itunes:subtitle>
      <description>Last week’s episode, about methods for optimized web crawling logic, left off on a bit of a cliffhanger: the data scientists had found a solution to the problem, but it wasn’t something that the engineers (who own the search codebase, remember) liked very much. It was black-boxy, hard to parallelize, and introduced a lot of complexity to their code. This episode takes a second crack, where we formulate the problem a little differently and end up with a different, arguably more elegant solution. 

Relevant links:
http://www.unofficialgoogledatascience.com/2018/07/by-bill-richoux-critical-decisions-are.html
http://www.csc.kth.se/utbildning/kth/kurser/DD3364/Lectures/KKT.pdf</description>
      <enclosure length="9460121" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/524794251-linear-digressions-optimized-optimized-web-crawling.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/521331948</guid>
      <title>Optimized Web Crawling</title>
      <pubDate>Sun, 28 Oct 2018 23:56:36 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/optimized-web-crawling</link>
      <itunes:duration>00:21:32</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Got a fun optimization problem for you this week! It’s a two-for-one: how do you optimize the web crawling logic of an operation like Google search so that the results are, on average, as up-to-date as possible, and how do you optimize your solution of choice so that it’s maintainable by software engineers in a huge distributed system? We’re following an excellent post from the Unofficial Google Data Science blog going through this problem.

Relevant links: http://www.unofficialgoogledatascience.com/2018/07/by-bill-richoux-critical-decisions-are.html</itunes:summary>
      <itunes:subtitle>Got a fun optimization problem for you this week!…</itunes:subtitle>
      <description>Got a fun optimization problem for you this week! It’s a two-for-one: how do you optimize the web crawling logic of an operation like Google search so that the results are, on average, as up-to-date as possible, and how do you optimize your solution of choice so that it’s maintainable by software engineers in a huge distributed system? We’re following an excellent post from the Unofficial Google Data Science blog going through this problem.

Relevant links: http://www.unofficialgoogledatascience.com/2018/07/by-bill-richoux-critical-decisions-are.html</description>
      <enclosure length="10340552" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/521331948-linear-digressions-optimized-web-crawling.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/517831839</guid>
      <title>Better Know a Distribution: The Poisson Distribution</title>
      <pubDate>Mon, 22 Oct 2018 00:53:28 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/better-know-a-distribution-the-poisson-distribution</link>
      <itunes:duration>00:31:51</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>The Poisson distribution is a probability distribution function used to model events that happen in time or space. It’s super handy because it’s pretty simple to use and is applicable for tons of things—there are a lot of interesting processes that boil down to “events that happen in time or space.” This episode is a quick introduction to the distribution, and then a focus on two of our favorite applications: using the Poisson distribution to identify supernovas and study army deaths from horse kicks.</itunes:summary>
      <itunes:subtitle>The Poisson distribution is a probability distrib…</itunes:subtitle>
      <description>The Poisson distribution is a probability distribution function used to model events that happen in time or space. It’s super handy because it’s pretty simple to use and is applicable for tons of things—there are a lot of interesting processes that boil down to “events that happen in time or space.” This episode is a quick introduction to the distribution, and then a focus on two of our favorite applications: using the Poisson distribution to identify supernovas and study army deaths from horse kicks.</description>
      <enclosure length="15293995" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/517831839-linear-digressions-better-know-a-distribution-the-poisson-distribution.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/514406379</guid>
      <title>Searching for Datasets with Google</title>
      <pubDate>Mon, 15 Oct 2018 01:11:58 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/searching-for-datasets-with-google</link>
      <itunes:duration>00:19:54</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>If you wanted to find a dataset of jokes, how would you do it? What about a dataset of podcast episodes? If your answer was “I’d try Google,” you might have been disappointed—Google is a great search engine for many types of web data, but it didn’t have any special tools to navigate the particular challenges of, well, dataset data. But all that is different now: Google recently announced Google Dataset Search, an effort to unify metadata tagging around datasets and complementary efforts on the search side to recognize and organize datasets in a way that’s useful and intuitive. So whether you’re an academic looking for an economics or physics or biology dataset, or a big old nerd modeling jokes or analyzing podcasts, there’s an exciting new way for you to find data.</itunes:summary>
      <itunes:subtitle>If you wanted to find a dataset of jokes, how wou…</itunes:subtitle>
      <description>If you wanted to find a dataset of jokes, how would you do it? What about a dataset of podcast episodes? If your answer was “I’d try Google,” you might have been disappointed—Google is a great search engine for many types of web data, but it didn’t have any special tools to navigate the particular challenges of, well, dataset data. But all that is different now: Google recently announced Google Dataset Search, an effort to unify metadata tagging around datasets and complementary efforts on the search side to recognize and organize datasets in a way that’s useful and intuitive. So whether you’re an academic looking for an economics or physics or biology dataset, or a big old nerd modeling jokes or analyzing podcasts, there’s an exciting new way for you to find data.</description>
      <enclosure length="9550818" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/514406379-linear-digressions-searching-for-datasets-with-google.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/511069371</guid>
      <title>It's our fourth birthday</title>
      <pubDate>Mon, 08 Oct 2018 02:33:55 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/fourth-anniversary-produced</link>
      <itunes:duration>00:22:06</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>We started Linear Digressions 4 years ago… this isn’t a technical episode, just two buddies shooting the breeze about something we’ve somehow built together.</itunes:summary>
      <itunes:subtitle>We started Linear Digressions 4 years ago… this i…</itunes:subtitle>
      <description>We started Linear Digressions 4 years ago… this isn’t a technical episode, just two buddies shooting the breeze about something we’ve somehow built together.</description>
      <enclosure length="10608255" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/511069371-linear-digressions-fourth-anniversary-produced.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/507460461</guid>
      <title>Gigantic Searches in Particle Physics</title>
      <pubDate>Sun, 30 Sep 2018 18:52:04 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/gigantic-searches-in-particle-physics-1</link>
      <itunes:duration>00:24:46</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>This week, we’re dusting off the ol’ particle physics PhD to bring you an episode about ambitious new model-agnostic searches for new particles happening at CERN. Traditionally, new particles have been discovered by “targeted searches,” where scientists have a hypothesis about the particle they’re looking for and where it might be found. However, with the huge amounts of data coming out of CERN, a new type of broader search algorithm is starting to be deployed. It’s a strategy that casts a very wide net, looking in many different places at the same time, which also introduces all kinds of interesting questions—even a one-in-a-thousand occurrence happens when you’re looking in many thousands of places.</itunes:summary>
      <itunes:subtitle>This week, we’re dusting off the ol’ particle phy…</itunes:subtitle>
      <description>This week, we’re dusting off the ol’ particle physics PhD to bring you an episode about ambitious new model-agnostic searches for new particles happening at CERN. Traditionally, new particles have been discovered by “targeted searches,” where scientists have a hypothesis about the particle they’re looking for and where it might be found. However, with the huge amounts of data coming out of CERN, a new type of broader search algorithm is starting to be deployed. It’s a strategy that casts a very wide net, looking in many different places at the same time, which also introduces all kinds of interesting questions—even a one-in-a-thousand occurrence happens when you’re looking in many thousands of places.</description>
      <enclosure length="11893688" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/507460461-linear-digressions-gigantic-searches-in-particle-physics-1.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/504201039</guid>
      <title>Data Engineering</title>
      <pubDate>Mon, 24 Sep 2018 01:10:13 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/data-engineering</link>
      <itunes:duration>00:16:22</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>If you’re a data scientist, you know how important it is to keep your data orderly, clean, moving smoothly between different systems, well-documented… there’s a ton of work that goes into building and maintaining databases and data pipelines. This job, that of owner and maintainer of the data being used for analytics, is often the realm of data engineers. From data extraction, transformation, and loading (ETL) procedures to the data storage strategy and even the definitions of key data quantities that serve as focal points for a whole organization, data engineers keep the plumbing of data analytics running smoothly.</itunes:summary>
      <itunes:subtitle>If you’re a data scientist, you know how importan…</itunes:subtitle>
      <description>If you’re a data scientist, you know how important it is to keep your data orderly, clean, moving smoothly between different systems, well-documented… there’s a ton of work that goes into building and maintaining databases and data pipelines. This job, that of owner and maintainer of the data being used for analytics, is often the realm of data engineers. From data extraction, transformation, and loading (ETL) procedures to the data storage strategy and even the definitions of key data quantities that serve as focal points for a whole organization, data engineers keep the plumbing of data analytics running smoothly.</description>
      <enclosure length="7862054" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/504201039-linear-digressions-data-engineering.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/500766036</guid>
      <title>Text Analysis for Guessing the NYTimes Op-Ed Author</title>
      <pubDate>Sun, 16 Sep 2018 18:13:09 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/text-analysis-for-guessing-the-nytimes-op-ed-author</link>
      <itunes:duration>00:18:37</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>A very intriguing op-ed was published in the NY Times recently, in which the author (a senior official in the Trump White House) claimed to be a minor saboteur of sorts, acting with his or her colleagues to undermine some of Donald Trump’s worst instincts and tendencies. Pretty stunning, right? So who is the author? It’s a mystery—the op-ed was published anonymously. That hasn’t stopped people from speculating though, and some machine learning on the vocabulary used in the op-ed is one way to get clues.</itunes:summary>
      <itunes:subtitle>A very intriguing op-ed was published in the NY T…</itunes:subtitle>
      <description>A very intriguing op-ed was published in the NY Times recently, in which the author (a senior official in the Trump White House) claimed to be a minor saboteur of sorts, acting with his or her colleagues to undermine some of Donald Trump’s worst instincts and tendencies. Pretty stunning, right? So who is the author? It’s a mystery—the op-ed was published anonymously. That hasn’t stopped people from speculating though, and some machine learning on the vocabulary used in the op-ed is one way to get clues.</description>
      <enclosure length="8939970" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/500766036-linear-digressions-text-analysis-for-guessing-the-nytimes-op-ed-author.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/497542941</guid>
      <title>The Three Types of Data Scientists, and What They Actually Do</title>
      <pubDate>Sun, 09 Sep 2018 19:00:09 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/the-three-types-of-data-scientists-and-what-they-actually-do</link>
      <itunes:duration>00:23:25</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>If you've been in data science for more than a year or two, chances are you've noticed changes in the field as it's grown and matured. And if you're newer to the field, you may feel like there's a disconnect between lots of different stories about what data scientists should know, or do, or expect from their job. This week, we cover two thought pieces, one that arose from interviews with 35(!) data scientists speaking about what their jobs actually are (and aren't), and one from the head of data science at Airbnb organizing core data science work into three main specialties.

Relevant links:
https://hbr.org/2018/08/what-data-scientists-really-do-according-to-35-data-scientists
https://www.linkedin.com/pulse/one-data-science-job-doesnt-fit-all-elena-grewal</itunes:summary>
      <itunes:subtitle>If you've been in data science for more than a ye…</itunes:subtitle>
      <description>If you've been in data science for more than a year or two, chances are you've noticed changes in the field as it's grown and matured. And if you're newer to the field, you may feel like there's a disconnect between lots of different stories about what data scientists should know, or do, or expect from their job. This week, we cover two thought pieces, one that arose from interviews with 35(!) data scientists speaking about what their jobs actually are (and aren't), and one from the head of data science at Airbnb organizing core data science work into three main specialties.

Relevant links:
https://hbr.org/2018/08/what-data-scientists-really-do-according-to-35-data-scientists
https://www.linkedin.com/pulse/one-data-science-job-doesnt-fit-all-elena-grewal</description>
      <enclosure length="11241045" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/497542941-linear-digressions-the-three-types-of-data-scientists-and-what-they-actually-do.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/490989069</guid>
      <title>Agile Development for Data Scientists, Part 2: Where Modifications Help</title>
      <pubDate>Sun, 26 Aug 2018 19:59:12 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/agile-development-for-data-scientists-part-2-where-modifications-help</link>
      <itunes:duration>00:27:17</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>There's just too much interesting stuff at the intersection of agile software development and data science for us to be able to cover it all in one episode, so this week we're picking up where we left off last time. We'll give a quick overview of agile for those who missed last week or still have some questions, and then cover some of the aspects of agile that don't work well out-of-the-box when applied to data analytics. Fortunately, though, there are some straightforward modifications to agile that make it work really nicely for data analytics!

Relevant links:
https://www.agilealliance.org/agile101/12-principles-behind-the-agile-manifesto/
https://www.locallyoptimistic.com/post/agile-analytics-p1/
https://www.locallyoptimistic.com/post/agile-analytics-p2/
https://www.locallyoptimistic.com/post/agile-analytics-p3/</itunes:summary>
      <itunes:subtitle>There's just too much interesting stuff at the in…</itunes:subtitle>
      <description>There's just too much interesting stuff at the intersection of agile software development and data science for us to be able to cover it all in one episode, so this week we're picking up where we left off last time. We'll give a quick overview of agile for those who missed last week or still have some questions, and then cover some of the aspects of agile that don't work well out-of-the-box when applied to data analytics. Fortunately, though, there are some straightforward modifications to agile that make it work really nicely for data analytics!

Relevant links:
https://www.agilealliance.org/agile101/12-principles-behind-the-agile-manifesto/
https://www.locallyoptimistic.com/post/agile-analytics-p1/
https://www.locallyoptimistic.com/post/agile-analytics-p2/
https://www.locallyoptimistic.com/post/agile-analytics-p3/</description>
      <enclosure length="13094694" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/490989069-linear-digressions-agile-development-for-data-scientists-part-2-where-modifications-help.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/487715715</guid>
      <title>Agile Development for Data Scientists, Part 1: The Good</title>
      <pubDate>Sun, 19 Aug 2018 18:06:19 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/agile-development-for-data-scientists-part-1-the-good</link>
      <itunes:duration>00:25:56</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>If you're a data scientist at a firm that does a lot of software building, chances are good that you've seen or heard engineers sometimes talking about "agile software development." If you don't work at a software firm, agile practices might be newer to you. In either case, we wanted to go through a great series of blog posts about some of the practices from agile that are relevant for how data scientists work, in hopes of inspiring some transfer learning from software development to data science.

Relevant links:
https://www.locallyoptimistic.com/post/agile-analytics-p1/
https://www.locallyoptimistic.com/post/agile-analytics-p2/
https://www.locallyoptimistic.com/post/agile-analytics-p3/</itunes:summary>
      <itunes:subtitle>If you're a data scientist at a firm that does a …</itunes:subtitle>
      <description>If you're a data scientist at a firm that does a lot of software building, chances are good that you've seen or heard engineers sometimes talking about "agile software development." If you don't work at a software firm, agile practices might be newer to you. In either case, we wanted to go through a great series of blog posts about some of the practices from agile that are relevant for how data scientists work, in hopes of inspiring some transfer learning from software development to data science.

Relevant links:
https://www.locallyoptimistic.com/post/agile-analytics-p1/
https://www.locallyoptimistic.com/post/agile-analytics-p2/
https://www.locallyoptimistic.com/post/agile-analytics-p3/</description>
      <enclosure length="12452499" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/487715715-linear-digressions-agile-development-for-data-scientists-part-1-the-good.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/484756368</guid>
      <title>Re-Release: How To Lose At Kaggle</title>
      <pubDate>Mon, 13 Aug 2018 02:31:51 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/re-release-how-to-lose-at-kaggle</link>
      <itunes:duration>00:17:54</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>We've got a classic for you this week as we take a week off for the dog days of summer. See you again next week!

Competing in a machine learning competition on Kaggle is a kind of rite of passage for data scientists.  Losing unexpectedly at the very end of the contest is also something that a lot of us have experienced.  It's not just bad luck: a very specific combination of overfitting on popular competitions can take someone who is in the top few spots in the final days of a contest and bump them down hundreds of slots in the final tally.</itunes:summary>
      <itunes:subtitle>We've got a classic for you this week as we take …</itunes:subtitle>
      <description>We've got a classic for you this week as we take a week off for the dog days of summer. See you again next week!

Competing in a machine learning competition on Kaggle is a kind of rite of passage for data scientists.  Losing unexpectedly at the very end of the contest is also something that a lot of us have experienced.  It's not just bad luck: a very specific combination of overfitting on popular competitions can take someone who is in the top few spots in the final days of a contest and bump them down hundreds of slots in the final tally.</description>
      <enclosure length="8597662" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/484756368-linear-digressions-re-release-how-to-lose-at-kaggle.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/481642476</guid>
      <title>Troubling Trends In Machine Learning Scholarship</title>
      <pubDate>Mon, 06 Aug 2018 01:31:03 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/troubling-trends-in-machine-learning-scholarship</link>
      <itunes:duration>00:29:35</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>There are a lot of great machine learning papers coming out every day--and, if we're being honest, some papers that are not as great as we'd wish. In some ways this is symptomatic of a field that's growing really quickly, but it's also an artifact of strange incentive structures in academic machine learning, and the fact that sometimes machine learning is just really hard. At the same time, high-quality academic work is critical for maintaining the reputation of the field, so in this episode we walk through a recent paper that spells out some of the most common shortcomings of academic machine learning papers and what we can do to make things better.

Relevant links:
https://arxiv.org/abs/1807.03341</itunes:summary>
      <itunes:subtitle>There's a lot of great machine learning papers co…</itunes:subtitle>
      <description>There are a lot of great machine learning papers coming out every day--and, if we're being honest, some papers that are not as great as we'd wish. In some ways this is symptomatic of a field that's growing really quickly, but it's also an artifact of strange incentive structures in academic machine learning, and the fact that sometimes machine learning is just really hard. At the same time, high-quality academic work is critical for maintaining the reputation of the field, so in this episode we walk through a recent paper that spells out some of the most common shortcomings of academic machine learning papers and what we can do to make things better.

Relevant links:
https://arxiv.org/abs/1807.03341</description>
      <enclosure length="14201032" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/481642476-linear-digressions-troubling-trends-in-machine-learning-scholarship.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/478346772</guid>
      <title>Can Fancy Running Shoes Cause You To Run Faster?</title>
      <pubDate>Sun, 29 Jul 2018 19:12:09 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/can-fancy-running-shoes-cause-you-to-run-faster</link>
      <itunes:duration>00:28:37</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>The stars aligned for me (Katie) this past weekend: I raced my first half-marathon in a long time and got to read a great article from the NY Times about a new running shoe that Nike claims can make its wearers run faster. Causal claims like this one are really tough to verify, because even if the data suggests that people wearing the shoe are faster, that might be because of correlation, not causation, so I loved reading this article that went through an analysis of thousands of runners' data in 4 different ways. Each way has a great explanation with pros and cons (as well as results, of course), so be sure to read the article after you check out this episode!

Relevant links:
https://www.nytimes.com/interactive/2018/07/18/upshot/nike-vaporfly-shoe-strava.html</itunes:summary>
      <itunes:subtitle>The stars aligned for me (Katie) this past weeken…</itunes:subtitle>
      <description>The stars aligned for me (Katie) this past weekend: I raced my first half-marathon in a long time and got to read a great article from the NY Times about a new running shoe that Nike claims can make its wearers run faster. Causal claims like this one are really tough to verify, because even if the data suggests that people wearing the shoe are faster, that might be because of correlation, not causation, so I loved reading this article that went through an analysis of thousands of runners' data in 4 different ways. Each way has a great explanation with pros and cons (as well as results, of course), so be sure to read the article after you check out this episode!

Relevant links:
https://www.nytimes.com/interactive/2018/07/18/upshot/nike-vaporfly-shoe-strava.html</description>
      <enclosure length="13741695" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/478346772-linear-digressions-can-fancy-running-shoes-cause-you-to-run-faster.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/475166751</guid>
      <title>Compliance Bias</title>
      <pubDate>Sun, 22 Jul 2018 16:07:54 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/compliance-bias</link>
      <itunes:duration>00:23:28</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>When you're using an AB test to understand the effect of a treatment, there are a lot of assumptions about how the treatment (and control, for that matter) get applied. For example, it's easy to think that everyone who was assigned to the treatment arm actually gets the treatment, everyone in the control arm doesn't, and that the two groups get their treatment instantaneously. None of these things happen in real life, and if you really care about measuring your treatment effect then that's something you want to understand and correct. In this episode we'll talk through a great blog post that outlines this for mobile experiments. Oh, and Ben sings.</itunes:summary>
      <itunes:subtitle>When you're using an AB test to understand the ef…</itunes:subtitle>
      <description>When you're using an AB test to understand the effect of a treatment, there are a lot of assumptions about how the treatment (and control, for that matter) get applied. For example, it's easy to think that everyone who was assigned to the treatment arm actually gets the treatment, everyone in the control arm doesn't, and that the two groups get their treatment instantaneously. None of these things happen in real life, and if you really care about measuring your treatment effect then that's something you want to understand and correct. In this episode we'll talk through a great blog post that outlines this for mobile experiments. Oh, and Ben sings.</description>
      <enclosure length="11263406" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/475166751-linear-digressions-compliance-bias.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/472083798</guid>
      <title>AI Winter</title>
      <pubDate>Sun, 15 Jul 2018 20:11:52 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/ai-winter</link>
      <itunes:duration>00:19:02</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Artificial Intelligence has been widely lauded as a solution to almost any problem. But as we juxtapose the hype in the field against the real-world benefits we see, it raises the question: Are we coming up on an AI winter?</itunes:summary>
      <itunes:subtitle>Artificial Intelligence has been widely lauded as…</itunes:subtitle>
      <description>Artificial Intelligence has been widely lauded as a solution to almost any problem. But as we juxtapose the hype in the field against the real-world benefits we see, it raises the question: Are we coming up on an AI winter?</description>
      <enclosure length="9140800" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/472083798-linear-digressions-ai-winter.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/469064265</guid>
      <title>Rerelease: How to Find New Things to Learn</title>
      <pubDate>Sun, 08 Jul 2018 22:28:29 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/rerelease-how-to-find-new-things-to-learn</link>
      <itunes:duration>00:18:32</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>We like learning on vacation. And we're on vacation, so we thought we'd re-air this episode about how to learn.

Original Episode: https://lineardigressions.com/episodes/2017/5/14/how-to-find-new-things-to-learn

Original Summary: If you're anything like us, you a) are always curious to learn more about data science and machine learning and stuff, and b) are usually overwhelmed by how much content is out there (not all of it very digestible).  We hope this podcast is a part of the solution for you, but if you're looking to go farther (who isn't?) then we have a few new resources that are presenting high-quality content in a fresh, accessible way.  Boring old PDFs full of inscrutable math notation, your days are numbered!</itunes:summary>
      <itunes:subtitle>We like learning on vacation. And we're on vacati…</itunes:subtitle>
      <description>We like learning on vacation. And we're on vacation, so we thought we'd re-air this episode about how to learn.

Original Episode: https://lineardigressions.com/episodes/2017/5/14/how-to-find-new-things-to-learn

Original Summary: If you're anything like us, you a) are always curious to learn more about data science and machine learning and stuff, and b) are usually overwhelmed by how much content is out there (not all of it very digestible).  We hope this podcast is a part of the solution for you, but if you're looking to go farther (who isn't?) then we have a few new resources that are presenting high-quality content in a fresh, accessible way.  Boring old PDFs full of inscrutable math notation, your days are numbered!</description>
      <enclosure length="8902145" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/469064265-linear-digressions-rerelease-how-to-find-new-things-to-learn.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/466163427</guid>
      <title>Rerelease: Space Codes</title>
      <pubDate>Mon, 02 Jul 2018 04:36:56 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/rerelease-space-codes</link>
      <itunes:duration>00:24:30</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>We're on vacation on Mars, so we won't be communicating with you all directly this week. Though, if we wanted to, we could probably use this episode to help get started.

Original Episode: http://lineardigressions.com/episodes/2017/3/19/space-codes

Original Summary: It's hard to get information to and from Mars.  Mars is very far away, and expensive to get to, and the bandwidth for passing messages with Earth is not huge.  The messages you do pass have to traverse millions of miles, which provides ample opportunity for the message to get corrupted or scrambled.  How, then, can you encode messages so that errors can be detected and corrected?  How does the decoding process allow you to actually find and correct the errors?  In this episode, we'll talk about three pieces of the process (Reed-Solomon codes, convolutional codes, and Viterbi decoding) that allow the scientists at NASA to talk to our rovers on Mars.</itunes:summary>
      <itunes:subtitle>We're on vacation on Mars, so we won't be communi…</itunes:subtitle>
      <description>We're on vacation on Mars, so we won't be communicating with you all directly this week. Though, if we wanted to, we could probably use this episode to help get started.

Original Episode: http://lineardigressions.com/episodes/2017/3/19/space-codes

Original Summary: It's hard to get information to and from Mars.  Mars is very far away, and expensive to get to, and the bandwidth for passing messages with Earth is not huge.  The messages you do pass have to traverse millions of miles, which provides ample opportunity for the message to get corrupted or scrambled.  How, then, can you encode messages so that errors can be detected and corrected?  How does the decoding process allow you to actually find and correct the errors?  In this episode, we'll talk about three pieces of the process (Reed-Solomon codes, convolutional codes, and Viterbi decoding) that allow the scientists at NASA to talk to our rovers on Mars.</description>
      <enclosure length="11761404" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/466163427-linear-digressions-rerelease-space-codes.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/462995712</guid>
      <title>Rerelease: Anscombe's Quartet</title>
      <pubDate>Mon, 25 Jun 2018 01:20:25 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/rerelease-anscombes-quartet</link>
      <itunes:duration>00:16:14</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>We're on vacation, so we hope you enjoy this episode while we each sip cocktails on the beach.

Original Episode: http://lineardigressions.com/episodes/2017/6/18/anscombes-quartet

Original Summary: Anscombe's Quartet is a set of four datasets that have the same mean, variance and correlation but look very different.  It's easy to think that having a good set of summary statistics (like mean, variance and correlation) can tell you everything important about a dataset, or at least enough to know if two datasets are extremely similar or extremely different, but Anscombe's Quartet will always be standing behind you, laughing at how silly that idea is.

Anscombe's Quartet was devised in 1973 as an example of how summary statistics can be misleading, but today we can even do one better: the Datasaurus Dozen is a set of twelve datasets, all extremely visually distinct, that have the same summary stats as a source dataset that, there's no other way to put this, looks like a dinosaur.  It's an example of how datasets can be generated to look like almost anything while still preserving arbitrary summary statistics.  In other words, Anscombe's Quartets can be generated at-will and we all should be reminded to visualize our data (not just compute summary statistics) if we want to claim to really understand it.</itunes:summary>
      <itunes:subtitle>We're on vacation, so we hope you enjoy this epis…</itunes:subtitle>
      <description>We're on vacation, so we hope you enjoy this episode while we each sip cocktails on the beach.

Original Episode: http://lineardigressions.com/episodes/2017/6/18/anscombes-quartet

Original Summary: Anscombe's Quartet is a set of four datasets that have the same mean, variance and correlation but look very different.  It's easy to think that having a good set of summary statistics (like mean, variance and correlation) can tell you everything important about a dataset, or at least enough to know if two datasets are extremely similar or extremely different, but Anscombe's Quartet will always be standing behind you, laughing at how silly that idea is.

Anscombe's Quartet was devised in 1973 as an example of how summary statistics can be misleading, but today we can even do one better: the Datasaurus Dozen is a set of twelve datasets, all extremely visually distinct, that have the same summary stats as a source dataset that, there's no other way to put this, looks like a dinosaur.  It's an example of how datasets can be generated to look like almost anything while still preserving arbitrary summary statistics.  In other words, Anscombe's Quartets can be generated at-will and we all should be reminded to visualize our data (not just compute summary statistics) if we want to claim to really understand it.</description>
      <enclosure length="7794344" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/462995712-linear-digressions-rerelease-anscombes-quartet.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/459950334</guid>
      <title>Rerelease: Hurricanes Produced</title>
      <pubDate>Mon, 18 Jun 2018 17:00:14 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/rerelease-hurricanes-produced</link>
      <itunes:duration>00:28:12</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Now that hurricane season is upon us again (and we are on vacation), we thought a look back on our hurricane forecasting episode was prudent. Stay safe out there.</itunes:summary>
      <itunes:subtitle>Now that hurricane season is upon us again (and w…</itunes:subtitle>
      <description>Now that hurricane season is upon us again (and we are on vacation), we thought a look back on our hurricane forecasting episode was prudent. Stay safe out there.</description>
      <enclosure length="13541074" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/459950334-linear-digressions-rerelease-hurricanes-produced.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/456651444</guid>
      <title>GDPR</title>
      <pubDate>Mon, 11 Jun 2018 02:24:45 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/gdpr</link>
      <itunes:duration>00:18:24</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>By now, you have probably heard of GDPR, the EU's new data privacy law. It's the reason you've been getting so many emails about everyone's updated privacy policy.

In this episode, we talk about some of the potential ramifications of GDPR in the world of data science.</itunes:summary>
      <itunes:subtitle>By now, you have probably heard of GDPR, the EU's…</itunes:subtitle>
      <description>By now, you have probably heard of GDPR, the EU's new data privacy law. It's the reason you've been getting so many emails about everyone's updated privacy policy.

In this episode, we talk about some of the potential ramifications of GDPR in the world of data science.</description>
      <enclosure length="8831928" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/456651444-linear-digressions-gdpr.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/453201831</guid>
      <title>Git for Data Scientists</title>
      <pubDate>Sun, 03 Jun 2018 17:52:23 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/git-for-data-scientists</link>
      <itunes:duration>00:22:05</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>If you're a data scientist, chances are good that you've heard of git, which is a system for version controlling code. Chances are also good that you're not quite as up on git as you want to be--git has a strong following among software engineers but, in our anecdotal experience, data scientists are less likely to know how to use this powerful tool. Never fear: in this episode we'll talk through some of the basics, and what does (and doesn't) translate from version control for regular software to version control for data science software.</itunes:summary>
      <itunes:subtitle>If you're a data scientist, chances are good that…</itunes:subtitle>
      <description>If you're a data scientist, chances are good that you've heard of git, which is a system for version controlling code. Chances are also good that you're not quite as up on git as you want to be--git has a strong following among software engineers but, in our anecdotal experience, data scientists are less likely to know how to use this powerful tool. Never fear: in this episode we'll talk through some of the basics, and what does (and doesn't) translate from version control for regular software to version control for data science software.</description>
      <enclosure length="10599895" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/453201831-linear-digressions-git-for-data-scientists.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/446667069</guid>
      <title>Analytics Maturity</title>
      <pubDate>Sun, 20 May 2018 15:09:39 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/analytics-maturity</link>
      <itunes:duration>00:19:32</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Data science and analytics are hot topics in business these days, but for a lot of folks looking to bring data into their organization, it can be hard to know where to start and what it looks like when they're succeeding. That was the motivation for writing a whitepaper on the analytics maturity of an organization, and that's what we're talking about today. In particular, we break it down into five attributes of an organization that contribute (or not) to their success in analytics, and what each of those mean and why they matter. 

Whitepaper here:
bit.ly/analyticsmaturity</itunes:summary>
      <itunes:subtitle>Data science and analytics are hot topics in busi…</itunes:subtitle>
      <description>Data science and analytics are hot topics in business these days, but for a lot of folks looking to bring data into their organization, it can be hard to know where to start and what it looks like when they're succeeding. That was the motivation for writing a whitepaper on the analytics maturity of an organization, and that's what we're talking about today. In particular, we break it down into five attributes of an organization that contribute (or not) to their success in analytics, and what each of those mean and why they matter. 

Whitepaper here:
bit.ly/analyticsmaturity</description>
      <enclosure length="9380917" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/446667069-linear-digressions-analytics-maturity.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/443436342</guid>
      <title>SHAP: Shapley Values in Machine Learning</title>
      <pubDate>Sun, 13 May 2018 14:24:38 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/shap-shapley-values-in-machine-learning</link>
      <itunes:duration>00:19:12</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Shapley values in machine learning are an interesting and useful enough innovation that we figured hey, why not do a two-parter? Our last episode focused on explaining what Shapley values are: they define a way of assigning credit for outcomes across several contributors, originally to understand how impactful different actors are in building coalitions (hence the game theory background) but now they're being cross-purposed for quantifying feature importance in machine learning models. This episode centers on the computational details that allow Shapley values to be approximated quickly, and a new package called SHAP that makes all this innovation accessible.</itunes:summary>
      <itunes:subtitle>Shapley values in machine learning are an interes…</itunes:subtitle>
      <description>Shapley values in machine learning are an interesting and useful enough innovation that we figured hey, why not do a two-parter? Our last episode focused on explaining what Shapley values are: they define a way of assigning credit for outcomes across several contributors, originally to understand how impactful different actors are in building coalitions (hence the game theory background) but now they're being cross-purposed for quantifying feature importance in machine learning models. This episode centers on the computational details that allow Shapley values to be approximated quickly, and a new package called SHAP that makes all this innovation accessible.</description>
      <enclosure length="9220839" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/443436342-linear-digressions-shap-shapley-values-in-machine-learning.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/440404896</guid>
      <title>Game Theory for Model Interpretability: Shapley Values</title>
      <pubDate>Mon, 07 May 2018 02:17:19 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/shapley-1-produced</link>
      <itunes:duration>00:27:06</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>As machine learning models get into the hands of more and more users, there's an increasing expectation that black box isn't good enough: users want to understand why the model made a given prediction, not just what the prediction itself is. This is motivating a lot of work into feature importance and model interpretability tools, and one of the most exciting new ones is based on Shapley Values from game theory. In this episode, we'll explain what Shapley Values are and how they make a cool approach to feature importance for machine learning.</itunes:summary>
      <itunes:subtitle>As machine learning models get into the hands of …</itunes:subtitle>
      <description>As machine learning models get into the hands of more and more users, there's an increasing expectation that black box isn't good enough: users want to understand why the model made a given prediction, not just what the prediction itself is. This is motivating a lot of work into feature importance and model interpretability tools, and one of the most exciting new ones is based on Shapley Values from game theory. In this episode, we'll explain what Shapley Values are and how they make a cool approach to feature importance for machine learning.</description>
      <enclosure length="13012774" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/440404896-linear-digressions-shapley-1-produced.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/437114427</guid>
      <title>AutoML</title>
      <pubDate>Mon, 30 Apr 2018 02:50:23 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/automl</link>
      <itunes:duration>00:15:24</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>If you were a machine learning researcher or data scientist ten years ago, you might have spent a lot of time implementing individual algorithms like decision trees and neural networks by hand. If you were doing that work five years ago, the algorithms were probably already implemented in popular open-source libraries like scikit-learn, but you still might have spent a lot of time trying different algorithms and tuning hyperparameters to improve performance. If you're doing that work today, scikit-learn and similar libraries don't just have the algorithms nicely implemented--they have tools to help with experimentation and hyperparameter tuning too. Automated machine learning is here, and it's pretty cool.</itunes:summary>
      <itunes:subtitle>If you were a machine learning researcher or data…</itunes:subtitle>
      <description>If you were a machine learning researcher or data scientist ten years ago, you might have spent a lot of time implementing individual algorithms like decision trees and neural networks by hand. If you were doing that work five years ago, the algorithms were probably already implemented in popular open-source libraries like scikit-learn, but you still might have spent a lot of time trying different algorithms and tuning hyperparameters to improve performance. If you're doing that work today, scikit-learn and similar libraries don't just have the algorithms nicely implemented--they have tools to help with experimentation and hyperparameter tuning too. Automated machine learning is here, and it's pretty cool.</description>
      <enclosure length="7392686" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/437114427-linear-digressions-automl.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/433745085</guid>
      <title>CPUs, GPUs, TPUs: Hardware for Deep Learning</title>
      <pubDate>Mon, 23 Apr 2018 02:52:42 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/cpus-gpus-tpus-hardware-for-deep-learning</link>
      <itunes:duration>00:12:40</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>A huge part of the ascent of deep learning in the last few years is related to advances in computer hardware that makes it possible to do the computational heavy lifting required to build models with thousands or even millions of tunable parameters. This week we'll pretend to be electrical engineers and talk about how modern machine learning is enabled by hardware.</itunes:summary>
      <itunes:subtitle>A huge part of the ascent of deep learning in the…</itunes:subtitle>
      <description>A huge part of the ascent of deep learning in the last few years is related to advances in computer hardware that makes it possible to do the computational heavy lifting required to build models with thousands or even millions of tunable parameters. This week we'll pretend to be electrical engineers and talk about how modern machine learning is enabled by hardware.</description>
      <enclosure length="6081757" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/433745085-linear-digressions-cpus-gpus-tpus-hardware-for-deep-learning.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/430280082</guid>
      <title>A Technical Introduction to Capsule Networks</title>
      <pubDate>Mon, 16 Apr 2018 01:12:25 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/a-technical-introduction-to-capsule-networks</link>
      <itunes:duration>00:31:28</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Last episode we talked conceptually about capsule networks, the latest and greatest computer vision innovation to come out of Geoff Hinton's lab. This week we're getting a little more into the technical details, for those of you ready to have your mind stretched.</itunes:summary>
      <itunes:subtitle>Last episode we talked conceptually about capsule…</itunes:subtitle>
      <description>Last episode we talked conceptually about capsule networks, the latest and greatest computer vision innovation to come out of Geoff Hinton's lab. This week we're getting a little more into the technical details, for those of you ready to have your mind stretched.</description>
      <enclosure length="15102361" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/430280082-linear-digressions-a-technical-introduction-to-capsule-networks.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/426861456</guid>
      <title>A Conceptual Introduction to Capsule Networks</title>
      <pubDate>Mon, 09 Apr 2018 01:59:54 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/a-conceptual-introduction-to-capsule-networks</link>
      <itunes:duration>00:14:05</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Convolutional nets are great for image classification... if this were 2016. But it's 2018 and Canada's greatest neural networker Geoff Hinton has some new ideas, namely capsule networks. Capsule nets are a completely new type of neural net architecture designed to do image classification on far fewer training cases than convolutional nets, and they're posting results that are competitive with much more mature technologies.

In this episode, we'll give a light conceptual introduction to capsule nets and get geared up for a future episode that will do a deeper technical dive.</itunes:summary>
      <itunes:subtitle>Convolutional nets are great for image classifica…</itunes:subtitle>
      <description>Convolutional nets are great for image classification... if this were 2016. But it's 2018 and Canada's greatest neural networker Geoff Hinton has some new ideas, namely capsule networks. Capsule nets are a completely new type of neural net architecture designed to do image classification on far fewer training cases than convolutional nets, and they're posting results that are competitive with much more mature technologies.

In this episode, we'll give a light conceptual introduction to capsule nets and get geared up for a future episode that will do a deeper technical dive.</description>
      <enclosure length="6764493" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/426861456-linear-digressions-a-conceptual-introduction-to-capsule-networks.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/423450147</guid>
      <title>Convolutional Neural Nets</title>
      <pubDate>Mon, 02 Apr 2018 01:40:08 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/convolutional-neural-nets</link>
      <itunes:duration>00:21:55</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>If you've done image recognition or computer vision tasks with a neural network, you've probably used a convolutional neural net. This episode is all about the architecture and implementation details of convolutional networks, and the tricks that make them so good at image tasks.</itunes:summary>
      <itunes:subtitle>If you've done image recognition or computer visi…</itunes:subtitle>
      <description>If you've done image recognition or computer vision tasks with a neural network, you've probably used a convolutional neural net. This episode is all about the architecture and implementation details of convolutional networks, and the tricks that make them so good at image tasks.</description>
      <enclosure length="10523618" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/423450147-linear-digressions-convolutional-neural-nets.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/419841044</guid>
      <title>Google Flu Trends</title>
      <pubDate>Mon, 26 Mar 2018 01:20:41 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/google-flu-trends</link>
      <itunes:duration>00:12:46</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>It's been a nasty flu season this year. So we were remembering a story from a few years back (but not covered yet on this podcast) about when Google tried to predict flu outbreaks faster than the Centers for Disease Control by monitoring search traffic and looking for spikes in searches for flu symptoms, doctors' appointments, and other related terms. It's a cool idea, but after a few years it turned into a cautionary tale of what can go wrong, as Google's algorithm systematically overestimated flu incidence for almost 2 years straight.

Relevant link: https://gking.harvard.edu/publications/parable-google-flu%C2%A0traps-big-data-analysis</itunes:summary>
      <itunes:subtitle>It's been a nasty flu season this year. So we wer…</itunes:subtitle>
      <description>It's been a nasty flu season this year. So we were remembering a story from a few years back (but not covered yet on this podcast) about when Google tried to predict flu outbreaks faster than the Centers for Disease Control by monitoring search traffic and looking for spikes in searches for flu symptoms, doctors' appointments, and other related terms. It's a cool idea, but after a few years it turned into a cautionary tale of what can go wrong, as Google's algorithm systematically overestimated flu incidence for almost 2 years straight.

Relevant link: https://gking.harvard.edu/publications/parable-google-flu%C2%A0traps-big-data-analysis</description>
      <enclosure length="6133584" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/419841044-linear-digressions-google-flu-trends.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/415825014</guid>
      <title>How to pick projects for a professional data science team</title>
      <pubDate>Mon, 19 Mar 2018 03:07:33 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/how-to-pick-projects-for-a-professional-data-science-team</link>
      <itunes:duration>00:31:17</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>This week's episode is for data scientists, sure, but also for data science managers and executives at companies with data science teams. These folks all think very differently about the same question: what should a data science team be working on? And how should that decision be made? That's the subject of a talk that I (Katie) gave at Strata Data in early March, about how my co-department head and I select projects for our team to work on.

We have several goals in data science project selection at Civis Analytics (where I work), which can be summarized under "balance the best attributes of bottom-up and top-down decision-making." We achieve this balance, or at least get pretty close, using a process we've come to call the Idea Factory (after a great book about Bell Labs). This talk is about that process, how it works in the real world of a data science company and how we see it working in the data science programs of other companies.

Relevant links: https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/63905</itunes:summary>
      <itunes:subtitle>This week's episodes is for data scientists, sure…</itunes:subtitle>
      <description>This week's episode is for data scientists, sure, but also for data science managers and executives at companies with data science teams. These folks all think very differently about the same question: what should a data science team be working on? And how should that decision be made? That's the subject of a talk that I (Katie) gave at Strata Data in early March, about how my co-department head and I select projects for our team to work on.

We have several goals in data science project selection at Civis Analytics (where I work), which can be summarized under "balance the best attributes of bottom-up and top-down decision-making." We achieve this balance, or at least get pretty close, using a process we've come to call the Idea Factory (after a great book about Bell Labs). This talk is about that process, how it works in the real world of a data science company and how we see it working in the data science programs of other companies.

Relevant links: https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/63905</description>
      <enclosure length="15017724" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/415825014-linear-digressions-how-to-pick-projects-for-a-professional-data-science-team.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/412295232</guid>
      <title>Autoencoders</title>
      <pubDate>Mon, 12 Mar 2018 01:47:10 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/autoencoders</link>
      <itunes:duration>00:12:41</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Autoencoders are neural nets that are optimized for creating outputs that... look like the inputs to the network. Turns out this is a not-too-shabby way to do unsupervised machine learning with neural nets.</itunes:summary>
      <itunes:subtitle>Autoencoders are neural nets that are optimized f…</itunes:subtitle>
      <description>Autoencoders are neural nets that are optimized for creating outputs that... look like the inputs to the network. Turns out this is a not-too-shabby way to do unsupervised machine learning with neural nets.</description>
      <enclosure length="6090534" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/412295232-linear-digressions-autoencoders.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/408811272</guid>
      <title>When Private Data Isn't Private Anymore</title>
      <pubDate>Mon, 05 Mar 2018 03:35:26 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/data-privacy-produced</link>
      <itunes:duration>00:26:20</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>After all the back-patting around making data science datasets and code more openly available, we figured it was time to also dump a bucket of cold water on everyone's heads and talk about the things that can go wrong when data and code is a little too open. 

In this episode, we'll talk about two interesting recent examples: a de-identified medical dataset in Australia that was re-identified so specific celebrities and athletes could be matched to their medical records, and a series of military bases that were spotted in a public fitness tracker dataset.</itunes:summary>
      <itunes:subtitle>After all the back-patting around making data sci…</itunes:subtitle>
      <description>After all the back-patting around making data science datasets and code more openly available, we figured it was time to also dump a bucket of cold water on everyone's heads and talk about the things that can go wrong when data and code is a little too open. 

In this episode, we'll talk about two interesting recent examples: a de-identified medical dataset in Australia that was re-identified so specific celebrities and athletes could be matched to their medical records, and a series of military bases that were spotted in a public fitness tracker dataset.</description>
      <enclosure length="12644552" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/408811272-linear-digressions-data-privacy-produced.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/405276390</guid>
      <title>What makes a machine learning algorithm "superhuman"?</title>
      <pubDate>Mon, 26 Feb 2018 04:52:57 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/what-makes-a-machine-learning-algorithm-superhuman</link>
      <itunes:duration>00:34:48</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>A few weeks ago, we podcasted about a neural network that was being touted as "better than doctors" in diagnosing pneumonia from chest x-rays, and how the underlying dataset used to train the algorithm raised some serious questions. We're back again this week, as the author of the original blog post pointed us toward further developments. All in all, there's a lot more clarity now around how the authors arrived at their original "better than doctors" claim, and a number of adjustments and improvements as the original result was de/re-constructed.

Anyway, there are a few things that are cool about this. First, it's a worthwhile follow-up to a popular recent episode. Second, it goes *inside* an analysis to see what things like imbalanced classes, outliers, and (possible) signal leakage can do to real science. And last, it raises a really interesting question in an age when computers are often claimed to be better than humans: what do those claims really mean?

Relevant links:
https://lukeoakdenrayner.wordpress.com/2018/01/24/chexnet-an-in-depth-review/</itunes:summary>
      <itunes:subtitle>A few weeks ago, we podcasted about a neural netw…</itunes:subtitle>
      <description>A few weeks ago, we podcasted about a neural network that was being touted as "better than doctors" in diagnosing pneumonia from chest x-rays, and how the underlying dataset used to train the algorithm raised some serious questions. We're back again this week, as the author of the original blog post pointed us toward further developments. All in all, there's a lot more clarity now around how the authors arrived at their original "better than doctors" claim, and a number of adjustments and improvements as the original result was de/re-constructed.

Anyway, there are a few things that are cool about this. First, it's a worthwhile follow-up to a popular recent episode. Second, it goes *inside* an analysis to see what things like imbalanced classes, outliers, and (possible) signal leakage can do to real science. And last, it raises a really interesting question in an age when computers are often claimed to be better than humans: what do those claims really mean?

Relevant links:
https://lukeoakdenrayner.wordpress.com/2018/01/24/chexnet-an-in-depth-review/</description>
      <enclosure length="16709623" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/405276390-linear-digressions-what-makes-a-machine-learning-algorithm-superhuman.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/401733012</guid>
      <title>Open Data and Open Science</title>
      <pubDate>Mon, 19 Feb 2018 01:39:16 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/open-data-produced</link>
      <itunes:duration>00:16:54</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>One interesting trend we've noted recently is the proliferation of papers, articles and blog posts about data science that don't just tell the result--they include data and code that allow anyone to repeat the analysis. It's far from universal (for a timely counterpoint, read this article &lt;http://www.sciencemag.org/news/2018/02/missing-data-hinder-replication-artificial-intelligence-studies&gt;), but we seem to be moving toward a new normal where data science conclusions are expected to be shown, not just told.

Relevant links:
https://github.com/fivethirtyeight/data
https://blog.patricktriest.com/police-data-python/</itunes:summary>
      <itunes:subtitle>One interesting trend we've noted recently is the…</itunes:subtitle>
      <description>One interesting trend we've noted recently is the proliferation of papers, articles and blog posts about data science that don't just tell the result--they include data and code that allow anyone to repeat the analysis. It's far from universal (for a timely counterpoint, read this article &lt;http://www.sciencemag.org/news/2018/02/missing-data-hinder-replication-artificial-intelligence-studies&gt;), but we seem to be moving toward a new normal where data science conclusions are expected to be shown, not just told.

Relevant links:
https://github.com/fivethirtyeight/data
https://blog.patricktriest.com/police-data-python/</description>
      <enclosure length="8116382" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/401733012-linear-digressions-open-data-produced.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/398083197</guid>
      <title>Defining the quality of a machine learning production system</title>
      <pubDate>Mon, 12 Feb 2018 02:00:45 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/defining-the-quality-of-a-machine-learning-production-system</link>
      <itunes:duration>00:20:29</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Building a machine learning system and maintaining it in production are two very different things. Some folks over at Google wrote a paper that shares their thoughts around all the items you might want to test or check for your production ML system.

Relevant links:
https://research.google.com/pubs/pub45742.html</itunes:summary>
      <itunes:subtitle>Building a machine learning system and maintainin…</itunes:subtitle>
      <description>Building a machine learning system and maintaining it in production are two very different things. Some folks over at Google wrote a paper that shares their thoughts around all the items you might want to test or check for your production ML system.

Relevant links:
https://research.google.com/pubs/pub45742.html</description>
      <enclosure length="9832522" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/398083197-linear-digressions-defining-the-quality-of-a-machine-learning-production-system.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/394568586</guid>
      <title>Auto-generating websites with deep learning</title>
      <pubDate>Sun, 04 Feb 2018 23:02:11 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/auto-generating-websites-with-deep-learning</link>
      <itunes:duration>00:19:24</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>We've already talked about neural nets in some detail (links below), and in particular we've been blown away by the way that image recognition from convolutional neural nets can be fed into recurrent neural nets that generate descriptions and captions of the images. Our episode today tells a similar tale, except today we're talking about a blog post where the author fed in wireframes of a website design and asked the neural net to generate the HTML and CSS that would actually build a website that looks like the wireframes. If you're a programmer who thinks your job is challenging enough that you're automation-proof, guess again...

Link to blog post: https://blog.floydhub.com/turning-design-mockups-into-code-with-deep-learning/</itunes:summary>
      <itunes:subtitle>We've already talked about neural nets in some de…</itunes:subtitle>
      <description>We've already talked about neural nets in some detail (links below), and in particular we've been blown away by the way that image recognition from convolutional neural nets can be fed into recurrent neural nets that generate descriptions and captions of the images. Our episode today tells a similar tale, except today we're talking about a blog post where the author fed in wireframes of a website design and asked the neural net to generate the HTML and CSS that would actually build a website that looks like the wireframes. If you're a programmer who thinks your job is challenging enough that you're automation-proof, guess again...

Link to blog post: https://blog.floydhub.com/turning-design-mockups-into-code-with-deep-learning/</description>
      <enclosure length="9316134" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/394568586-linear-digressions-auto-generating-websites-with-deep-learning.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/390628386</guid>
      <title>The Case for Learned Index Structures, Part 2: Hash Maps and Bloom Filters</title>
      <pubDate>Mon, 29 Jan 2018 02:15:43 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/bloom-filters-produced</link>
      <itunes:duration>00:20:41</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Last week we started the story of how you could use a machine learning model in place of a data structure, and this week we wrap up with an exploration of Bloom Filters and Hash Maps. Just like last week, when we covered B-trees, we'll walk through both the "classic" implementation of these data structures and how a machine learning model could create the same functionality.</itunes:summary>
      <itunes:subtitle>Last week we started the story of how you could u…</itunes:subtitle>
      <description>Last week we started the story of how you could use a machine learning model in place of a data structure, and this week we wrap up with an exploration of Bloom Filters and Hash Maps. Just like last week, when we covered B-trees, we'll walk through both the "classic" implementation of these data structures and how a machine learning model could create the same functionality.</description>
      <enclosure length="9933459" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/390628386-linear-digressions-bloom-filters-produced.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/387200213</guid>
      <title>The Case for Learned Index Structures, Part 1: B-Trees</title>
      <pubDate>Mon, 22 Jan 2018 02:32:28 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/the-case-for-learned-index-structures-part-1-b-trees</link>
      <itunes:duration>00:18:50</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Jeff Dean and his collaborators at Google are turning the machine learning world upside down (again) with a recent paper about how machine learning models can be used as surprisingly effective substitutes for classic data structures. In this first part of a two-part series, we'll go through a data structure called b-trees. The structural form of b-trees makes them efficient for searching, but if you squint at a b-tree and look at it a little bit sideways, then the search functionality starts to look a little bit like a regression model--hence the relevance of machine learning models. If this sounds kinda weird, or we lost you at b-tree, don't worry--lots more details in the episode itself.</itunes:summary>
      <itunes:subtitle>Jeff Dean and his collaborators at Google are tur…</itunes:subtitle>
      <description>Jeff Dean and his collaborators at Google are turning the machine learning world upside down (again) with a recent paper about how machine learning models can be used as surprisingly effective substitutes for classic data structures. In this first part of a two-part series, we'll go through a data structure called b-trees. The structural form of b-trees makes them efficient for searching, but if you squint at a b-tree and look at it a little bit sideways, then the search functionality starts to look a little bit like a regression model--hence the relevance of machine learning models. If this sounds kinda weird, or we lost you at b-tree, don't worry--lots more details in the episode itself.</description>
      <enclosure length="9045505" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/387200213-linear-digressions-the-case-for-learned-index-structures-part-1-b-trees.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/383718392</guid>
      <title>Challenges with Using Machine Learning to Classify Chest X-Rays</title>
      <pubDate>Mon, 15 Jan 2018 01:57:21 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/challenges-with-using-machine-learning-to-classify-chest-x-rays</link>
      <itunes:duration>00:18:00</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Another installment in our "machine learning might not be a silver bullet for solving medical problems" series. This week, we have a high-profile blog post that has been making the rounds for the last few weeks, in which a neural network trained to visually recognize various diseases in chest x-rays is called into question by a radiologist with machine learning expertise. As it seemingly always does, it comes down to the dataset that's used for training--medical records assume a lot of context that may or may not be available to the algorithm, so it's tough to make something that actually helps (in this case) predict disease that wasn't already diagnosed.</itunes:summary>
      <itunes:subtitle>Another installment in our "machine learning migh…</itunes:subtitle>
      <description>Another installment in our "machine learning might not be a silver bullet for solving medical problems" series. This week, we have a high-profile blog post that has been making the rounds for the last few weeks, in which a neural network trained to visually recognize various diseases in chest x-rays is called into question by a radiologist with machine learning expertise. As it seemingly always does, it comes down to the dataset that's used for training--medical records assume a lot of context that may or may not be available to the algorithm, so it's tough to make something that actually helps (in this case) predict disease that wasn't already diagnosed.</description>
      <enclosure length="8642593" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/383718392-linear-digressions-challenges-with-using-machine-learning-to-classify-chest-x-rays.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/380324828</guid>
      <title>The Fourier Transform</title>
      <pubDate>Mon, 08 Jan 2018 02:07:46 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/the-fourier-transform</link>
      <itunes:duration>00:15:39</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>The Fourier transform is one of the handiest tools in signal processing for dealing with periodic time series data. Using a Fourier transform, you can break apart a complex periodic function into a bunch of sine and cosine waves, and figure out what the amplitude, frequency and offset of those component waves are. It's a really handy way of re-expressing periodic data--you'll never look at a time series graph the same way again.</itunes:summary>
      <itunes:subtitle>The Fourier transform is one of the handiest tool…</itunes:subtitle>
      <description>The Fourier transform is one of the handiest tools in signal processing for dealing with periodic time series data. Using a Fourier transform, you can break apart a complex periodic function into a bunch of sine and cosine waves, and figure out what the amplitude, frequency and offset of those component waves are. It's a really handy way of re-expressing periodic data--you'll never look at a time series graph the same way again.</description>
      <enclosure length="7513267" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/380324828-linear-digressions-the-fourier-transform.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/377561696</guid>
      <title>Statistics of Beer</title>
      <pubDate>Tue, 02 Jan 2018 01:57:54 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/statistics-of-beer</link>
      <itunes:duration>00:15:20</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>What better way to kick off a new year than with an episode on the statistics of brewing beer?</itunes:summary>
      <itunes:subtitle>What better way to kick off a new year than with …</itunes:subtitle>
      <description>What better way to kick off a new year than with an episode on the statistics of brewing beer?</description>
      <enclosure length="7365100" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/377561696-linear-digressions-statistics-of-beer.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/373941077</guid>
      <title>Re - Release: Random Kanye</title>
      <pubDate>Sun, 24 Dec 2017 19:07:48 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/re-release-random-kanye</link>
      <itunes:duration>00:09:33</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>We have a throwback episode for you today as we take the week off to enjoy the holidays. This week: what happens when you have a markov chain that generates mashup Kanye West lyrics with Bible verses? Exactly what you think.</itunes:summary>
      <itunes:subtitle>We have a throwback episode for you today as we t…</itunes:subtitle>
      <description>We have a throwback episode for you today as we take the week off to enjoy the holidays. This week: what happens when you have a markov chain that generates mashup Kanye West lyrics with Bible verses? Exactly what you think.</description>
      <enclosure length="13758831" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/373941077-linear-digressions-re-release-random-kanye.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/371005304</guid>
      <title>Debiasing Word Embeddings</title>
      <pubDate>Mon, 18 Dec 2017 02:31:01 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/debiasing-word-embeddings</link>
      <itunes:duration>00:18:20</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>When we covered the Word2Vec algorithm for embedding words, we mentioned parenthetically that the word embeddings it produces can sometimes be a little bit less than ideal--in particular, gender bias from our society can creep into the embeddings and give results that are sexist. For example, occupational words like "doctor" and "nurse" end up more closely aligned with "man" and "woman" respectively, which can create problems because these word embeddings are used in algorithms that help people find information or make decisions. However, a group of researchers has released a new paper detailing ways to de-bias the embeddings, so we retain gender info that's not particularly problematic (for example, "king" vs. "queen") while correcting bias.</itunes:summary>
      <itunes:subtitle>When we covered the Word2Vec algorithm for embedd…</itunes:subtitle>
      <description>When we covered the Word2Vec algorithm for embedding words, we mentioned parenthetically that the word embeddings it produces can sometimes be a little bit less than ideal--in particular, gender bias from our society can creep into the embeddings and give results that are sexist. For example, occupational words like "doctor" and "nurse" end up more closely aligned with "man" and "woman" respectively, which can create problems because these word embeddings are used in algorithms that help people find information or make decisions. However, a group of researchers has released a new paper detailing ways to de-bias the embeddings, so we retain gender info that's not particularly problematic (for example, "king" vs. "queen") while correcting bias.</description>
      <enclosure length="8804343" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/371005304-linear-digressions-debiasing-word-embeddings.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/367800293</guid>
      <title>The Kernel Trick and Support Vector Machines</title>
      <pubDate>Mon, 11 Dec 2017 01:58:41 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/the-kernel-trick-and-support-vector-machines</link>
      <itunes:duration>00:17:48</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Picking up after last week's episode about maximal margin classifiers, this week we'll go into the kernel trick and how that (combined with maximal margin algorithms) gives us the much-vaunted support vector machine.</itunes:summary>
      <itunes:subtitle>Picking up after last week's episode about maxima…</itunes:subtitle>
      <description>Picking up after last week's episode about maximal margin classifiers, this week we'll go into the kernel trick and how that (combined with maximal margin algorithms) gives us the much-vaunted support vector machine.</description>
      <enclosure length="8543536" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/367800293-linear-digressions-the-kernel-trick-and-support-vector-machines.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/364651949</guid>
      <title>Maximal Margin Classifiers</title>
      <pubDate>Mon, 04 Dec 2017 04:03:02 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/maximal-margin-classifiers</link>
      <itunes:duration>00:14:21</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Maximal margin classifiers are a way of thinking about supervised learning entirely in terms of the decision boundary between two classes, and defining that boundary in a way that maximizes the distance from any given point to the boundary.  It's a neat way to think about statistical learning and a prerequisite for understanding support vector machines, which we'll cover next week--stay tuned!</itunes:summary>
      <itunes:subtitle>Maximal margin classifiers are a way of thinking …</itunes:subtitle>
      <description>Maximal margin classifiers are a way of thinking about supervised learning entirely in terms of the decision boundary between two classes, and defining that boundary in a way that maximizes the distance from any given point to the boundary.  It's a neat way to think about statistical learning and a prerequisite for understanding support vector machines, which we'll cover next week--stay tuned!</description>
      <enclosure length="6891135" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/364651949-linear-digressions-maximal-margin-classifiers.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/361368167</guid>
      <title>Re - Release: The Cocktail Party Problem</title>
      <pubDate>Mon, 27 Nov 2017 02:11:06 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/re-release-the-cocktail-party-problem</link>
      <itunes:duration>00:13:43</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Grab a cocktail, put on your favorite karaoke track, and let’s talk some more about disentangling audio data!</itunes:summary>
      <itunes:subtitle>Grab a cocktail, put on your favorite karaoke tra…</itunes:subtitle>
      <description>Grab a cocktail, put on your favorite karaoke track, and let’s talk some more about disentangling audio data!</description>
      <enclosure length="6583726" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/361368167-linear-digressions-re-release-the-cocktail-party-problem.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/358247057</guid>
      <title>Clustering with DBSCAN</title>
      <pubDate>Mon, 20 Nov 2017 03:08:14 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/clustering-with-dbscan</link>
      <itunes:duration>00:16:14</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>DBSCAN is a density-based clustering algorithm for doing unsupervised learning.  It's pretty nifty: with just two parameters, you can specify "dense" regions in your data, and grow those regions out organically to find clusters.  In particular, it can fit irregularly-shaped clusters, and it can also identify outlier points that don't belong to any of the clusters. Pretty cool!</itunes:summary>
      <itunes:subtitle>DBSCAN is a density-based clustering algorithm fo…</itunes:subtitle>
      <description>DBSCAN is a density-based clustering algorithm for doing unsupervised learning.  It's pretty nifty: with just two parameters, you can specify "dense" regions in your data, and grow those regions out organically to find clusters.  In particular, it can fit irregularly-shaped clusters, and it can also identify outlier points that don't belong to any of the clusters. Pretty cool!</description>
      <enclosure length="7794971" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/358247057-linear-digressions-clustering-with-dbscan.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/354931046</guid>
      <title>The Kaggle Survey on Data Science</title>
      <pubDate>Mon, 13 Nov 2017 02:49:44 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/kaggle-survey-produced</link>
      <itunes:duration>00:25:20</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Want to know what's going on in data science these days?  There's no better way than to analyze a survey with over 16,000 responses that was recently released by Kaggle.  Kaggle asked practicing and aspiring data scientists about themselves, their tools, how they find jobs, what they find challenging about their jobs, and many other questions.  Then Kaggle released an interactive summary of the data, as well as the anonymized dataset itself, to help data scientists understand the trends in the data.  In this episode, we'll go through some of the survey toplines that we found most interesting and counterintuitive.</itunes:summary>
      <itunes:subtitle>Want to know what's going on in data science thes…</itunes:subtitle>
      <description>Want to know what's going on in data science these days?  There's no better way than to analyze a survey with over 16,000 responses that was recently released by Kaggle.  Kaggle asked practicing and aspiring data scientists about themselves, their tools, how they find jobs, what they find challenging about their jobs, and many other questions.  Then Kaggle released an interactive summary of the data, as well as the anonymized dataset itself, to help data scientists understand the trends in the data.  In this episode, we'll go through some of the survey toplines that we found most interesting and counterintuitive.</description>
      <enclosure length="12160137" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/354931046-linear-digressions-kaggle-survey-produced.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/351762985</guid>
      <title>Machine Learning: The High Interest Credit Card of Technical Debt</title>
      <pubDate>Mon, 06 Nov 2017 04:35:17 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/machine-learning-the-high-interest-credit-card-of-technical-debt</link>
      <itunes:duration>00:22:18</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>This week, we've got a fun paper by our friends at Google about the hidden costs of maintaining machine learning workflows.  If you've worked in software before, you're probably familiar with the idea of technical debt, which refers to the inefficiencies that crop up in the code when you're trying to go fast.  You take shortcuts, hard-code variable values, skimp on the documentation, and generally write not-that-great code in order to get something done quickly, and then end up paying for it later on.  This is technical debt, and it's particularly easy to accrue with machine learning workflows.  That's the premise of this episode's paper.</itunes:summary>
      <itunes:subtitle>This week, we've got a fun paper by our friends a…</itunes:subtitle>
      <description>This week, we've got a fun paper by our friends at Google about the hidden costs of maintaining machine learning workflows.  If you've worked in software before, you're probably familiar with the idea of technical debt, which refers to the inefficiencies that crop up in the code when you're trying to go fast.  You take shortcuts, hard-code variable values, skimp on the documentation, and generally write not-that-great code in order to get something done quickly, and then end up paying for it later on.  This is technical debt, and it's particularly easy to accrue with machine learning workflows.  That's the premise of this episode's paper.</description>
      <enclosure length="10702713" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/351762985-linear-digressions-machine-learning-the-high-interest-credit-card-of-technical-debt.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/349204647</guid>
      <title>Improving Upon a First-Draft Data Science Analysis</title>
      <pubDate>Mon, 30 Oct 2017 01:38:28 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/improving-upon-a-first-draft-data-science-analysis</link>
      <itunes:duration>00:15:01</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>There are a lot of good resources out there for getting started with data science and machine learning, where you can walk through starting with a dataset and ending up with a model and set of predictions.  Think something like the homework for your favorite machine learning class, or your most recent online machine learning competition.  However, if you've ever tried to maintain a machine learning workflow (as opposed to building it from scratch), you know that taking a simple modeling script and turning it into clean, well-structured and maintainable software is way harder than most people give it credit for.  That said, if you're a professional data scientist (or want to be one), this is one of the most important skills you can develop.

In this episode, we'll walk through a workshop Katie is giving at the Open Data Science Conference in San Francisco in November 2017, which covers building a machine learning workflow that's more maintainable than a simple script.  If you'll be at ODSC, come say hi, and if you're not, here's a sneak preview!</itunes:summary>
      <itunes:subtitle>There are a lot of good resources out there for g…</itunes:subtitle>
      <description>There are a lot of good resources out there for getting started with data science and machine learning, where you can walk through starting with a dataset and ending up with a model and set of predictions.  Think something like the homework for your favorite machine learning class, or your most recent online machine learning competition.  However, if you've ever tried to maintain a machine learning workflow (as opposed to building it from scratch), you know that taking a simple modeling script and turning it into clean, well-structured and maintainable software is way harder than most people give it credit for.  That said, if you're a professional data scientist (or want to be one), this is one of the most important skills you can develop.

In this episode, we'll walk through a workshop Katie is giving at the Open Data Science Conference in San Francisco in November 2017, which covers building a machine learning workflow that's more maintainable than a simple script.  If you'll be at ODSC, come say hi, and if you're not, here's a sneak preview!</description>
      <enclosure length="7209410" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/349204647-linear-digressions-improving-upon-a-first-draft-data-science-analysis.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/348143133</guid>
      <title>Survey Raking</title>
      <pubDate>Mon, 23 Oct 2017 02:51:49 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/survey-raking</link>
      <itunes:duration>00:17:23</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>It's quite common for survey respondents not to be representative of the larger population from which they are drawn.  But if you're a researcher, you need to study the larger population using data from your survey respondents, so what should you do?  Reweighting the survey data, so that things like demographic distributions look similar between the survey and general populations, is a standard technique, and in this episode we'll talk about survey raking, a way to calculate survey weights when there are several distributions of interest that need to be matched.</itunes:summary>
      <itunes:subtitle>It's quite common for survey respondents not to b…</itunes:subtitle>
      <description>It's quite common for survey respondents not to be representative of the larger population from which they are drawn.  But if you're a researcher, you need to study the larger population using data from your survey respondents, so what should you do?  Reweighting the survey data, so that things like demographic distributions look similar between the survey and general populations, is a standard technique, and in this episode we'll talk about survey raking, a way to calculate survey weights when there are several distributions of interest that need to be matched.</description>
      <enclosure length="25051879" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/348143133-linear-digressions-survey-raking.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/347080930</guid>
      <title>Happy Hacktoberfest</title>
      <pubDate>Mon, 16 Oct 2017 01:46:19 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/happy-hacktoberfest</link>
      <itunes:duration>00:15:40</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>It's the middle of October, so you've already made two pull requests to open source repos, right? If you have no idea what we're talking about, spend the next 20 minutes or so with us talking about the importance of open source software and how you can get involved.  You can even get a free t-shirt!

Hacktoberfest main page: https://hacktoberfest.digitalocean.com/#details</itunes:summary>
      <itunes:subtitle>It's the middle of October, so you've already mad…</itunes:subtitle>
      <description>It's the middle of October, so you've already made two pull requests to open source repos, right? If you have no idea what we're talking about, spend the next 20 minutes or so with us talking about the importance of open source software and how you can get involved.  You can even get a free t-shirt!

Hacktoberfest main page: https://hacktoberfest.digitalocean.com/#details</description>
      <enclosure length="22564813" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/347080930-linear-digressions-happy-hacktoberfest.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/345996498</guid>
      <title>Re - Release: Kalman Runners</title>
      <pubDate>Mon, 09 Oct 2017 02:28:24 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/re-release-kalman-runners</link>
      <itunes:duration>00:17:53</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>In honor of the Chicago marathon this weekend (and due in large part to Katie recovering from running in it...) we have a re-release of an episode about Kalman filters, which is part algorithm part elaborate metaphor for figuring out, if you're running a race but don't have a watch, how fast you're going.

Katie's Chicago race report:

miles 1-13: light ankle pain, lovely cool weather, the most fun EVAR
miles 13-17: no more ankle pain but quads start getting tight, it's a little more effort
miles 17-20: oof, really tight legs but still plenty of gas in the tank.
miles 20-23: it's warmer out now, legs hurt a lot but running through Pilsen and Chinatown is too fun to notice
mile 24: ugh cramp everything hurts
miles 25-26.2: awesome crowd support, really tired and loving every second
Final time: 3:54:35</itunes:summary>
      <itunes:subtitle>In honor of the Chicago marathon this weekend (an…</itunes:subtitle>
      <description>In honor of the Chicago marathon this weekend (and due in large part to Katie recovering from running in it...) we have a re-release of an episode about Kalman filters, which is part algorithm part elaborate metaphor for figuring out, if you're running a race but don't have a watch, how fast you're going.

Katie's Chicago race report:

miles 1-13: light ankle pain, lovely cool weather, the most fun EVAR
miles 13-17: no more ankle pain but quads start getting tight, it's a little more effort
miles 17-20: oof, really tight legs but still plenty of gas in the tank.
miles 20-23: it's warmer out now, legs hurt a lot but running through Pilsen and Chinatown is too fun to notice
mile 24: ugh cramp everything hurts
miles 25-26.2: awesome crowd support, really tired and loving every second
Final time: 3:54:35</description>
      <enclosure length="25768470" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/345996498-linear-digressions-re-release-kalman-runners.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/344945754</guid>
      <title>Neural Net Dropout</title>
      <pubDate>Mon, 02 Oct 2017 03:32:56 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/neural-net-dropout</link>
      <itunes:duration>00:18:53</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Neural networks are complex models with many parameters and can be prone to overfitting.  There's a surprisingly simple way to guard against this: randomly drop hidden units (along with their connections) during training, also known as dropout.  It seems counterintuitive that undermining the structural integrity of the neural net makes it robust against overfitting, but in the world of neural nets, weirdness is just how things go sometimes.

Relevant links: https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf</itunes:summary>
      <itunes:subtitle>Neural networks are complex models with many para…</itunes:subtitle>
      <description>Neural networks are complex models with many parameters and can be prone to overfitting.  There's a surprisingly simple way to guard against this: randomly drop hidden units (along with their connections) during training, also known as dropout.  It seems counterintuitive that undermining the structural integrity of the neural net makes it robust against overfitting, but in the world of neural nets, weirdness is just how things go sometimes.

Relevant links: https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf</description>
      <enclosure length="27204160" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/344945754-linear-digressions-neural-net-dropout.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/343891449</guid>
      <title>Disciplined Data Science</title>
      <pubDate>Mon, 25 Sep 2017 01:49:41 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/disciplined-data-science</link>
      <itunes:duration>00:29:34</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>As data science matures as a field, it's becoming clearer what attributes a data science team needs to have to elevate their work to the next level.  Most of our episodes are about the cool work being done by other people, but this one summarizes some thinking Katie's been doing herself around how to guide data science teams toward more mature, effective practices.  We'll go through five key characteristics of great data science teams, which we collectively refer to as "disciplined data science," and why they matter.</itunes:summary>
      <itunes:subtitle>As data science matures as a field, it's becoming…</itunes:subtitle>
      <description>As data science matures as a field, it's becoming clearer what attributes a data science team needs to have to elevate their work to the next level.  Most of our episodes are about the cool work being done by other people, but this one summarizes some thinking Katie's been doing herself around how to guide data science teams toward more mature, effective practices.  We'll go through five key characteristics of great data science teams, which we collectively refer to as "disciplined data science," and why they matter.</description>
      <enclosure length="42592372" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/343891449-linear-digressions-disciplined-data-science.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/342828639</guid>
      <title>Hurricane Forecasting</title>
      <pubDate>Mon, 18 Sep 2017 01:37:15 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/hurricane-forecasting</link>
      <itunes:duration>00:27:57</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>It's been a busy hurricane season in the Southeastern United States, with millions of people making life-or-death decisions based on the forecasts around where the hurricanes will hit and with what intensity.  In this episode we'll deconstruct those models, talking about the different types of models, the theory behind them, and how they've evolved through the years.</itunes:summary>
      <itunes:subtitle>It's been a busy hurricane season in the Southeas…</itunes:subtitle>
      <description>It's been a busy hurricane season in the Southeastern United States, with millions of people making life-or-death decisions based on the forecasts around where the hurricanes will hit and with what intensity.  In this episode we'll deconstruct those models, talking about the different types of models, the theory behind them, and how they've evolved through the years.</description>
      <enclosure length="40263295" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/342828639-linear-digressions-hurricane-forecasting.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/341822881</guid>
      <title>Finding Spy Planes with Machine Learning</title>
      <pubDate>Mon, 11 Sep 2017 02:11:22 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/finding-spy-planes-with-machine-learning</link>
      <itunes:duration>00:18:09</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>There are law enforcement surveillance aircraft circling over the United States every day, and in this episode, we'll talk about how some folks at BuzzFeed used public data and machine learning to find them.  The fun thing here, in our opinion, is the blend of intrigue (spy planes!) with tech journalism and a heavy dash of publicly available and reproducible analysis code so that you (yes, you!) can see exactly how BuzzFeed identifies the surveillance planes.</itunes:summary>
      <itunes:subtitle>There are law enforcement surveillance aircraft c…</itunes:subtitle>
      <description>There are law enforcement surveillance aircraft circling over the United States every day, and in this episode, we'll talk about how some folks at BuzzFeed used public data and machine learning to find them.  The fun thing here, in our opinion, is the blend of intrigue (spy planes!) with tech journalism and a heavy dash of publicly available and reproducible analysis code so that you (yes, you!) can see exactly how BuzzFeed identifies the surveillance planes.</description>
      <enclosure length="26149649" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/341822881-linear-digressions-finding-spy-planes-with-machine-learning.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/340795203</guid>
      <title>Data Provenance</title>
      <pubDate>Mon, 04 Sep 2017 01:35:00 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/data-provenance</link>
      <itunes:duration>00:22:48</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Software engineers are familiar with the idea of versioning code, so you can go back later and revive a past state of the system.  For data scientists who might want to reconstruct past models, though, it's not just about keeping the modeling code.  It's also about saving a version of the data that made the model.  There are a lot of other benefits to keeping track of datasets, so in this episode we'll talk about data lineage or data provenance.</itunes:summary>
      <itunes:subtitle>Software engineers are familiar with the idea of …</itunes:subtitle>
      <description>Software engineers are familiar with the idea of versioning code, so you can go back later and revive a past state of the system.  For data scientists who might want to reconstruct past models, though, it's not just about keeping the modeling code.  It's also about saving a version of the data that made the model.  There are a lot of other benefits to keeping track of datasets, so in this episode we'll talk about data lineage or data provenance.</description>
      <enclosure length="32846609" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/340795203-linear-digressions-data-provenance.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/339789401</guid>
      <title>Adversarial Examples</title>
      <pubDate>Mon, 28 Aug 2017 02:25:14 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/adversarial-examples</link>
      <itunes:duration>00:16:11</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Even as we rely more and more on machine learning algorithms to help with everyday decision-making, we're learning more and more about how they're frighteningly easy to fool sometimes.  Today we have a roundup of a few successful efforts to create robust adversarial examples, including what it means for an adversarial example to be robust and what this might mean for machine learning in the future.</itunes:summary>
      <itunes:subtitle>Even as we rely more and more on machine learning…</itunes:subtitle>
      <description>Even as we rely more and more on machine learning algorithms to help with everyday decision-making, we're learning more and more about how they're frighteningly easy to fool sometimes.  Today we have a roundup of a few successful efforts to create robust adversarial examples, including what it means for an adversarial example to be robust and what this might mean for machine learning in the future.</description>
      <enclosure length="23312124" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/339789401-linear-digressions-adversarial-examples.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/338764019</guid>
      <title>Jupyter Notebooks</title>
      <pubDate>Mon, 21 Aug 2017 01:09:32 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/jupyter-notebooks</link>
      <itunes:duration>00:15:50</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>This week's episode is just in time for JupyterCon in NYC, August 22-25...

Jupyter notebooks are probably familiar to a lot of data nerds out there as a great open-source tool for exploring data, doing quick visualizations, and packaging code snippets with explanations for sharing your work with others.  If you're not a data person, or you are but you haven't tried out Jupyter notebooks yet, here's your nudge to go give them a try.  In this episode we'll go back to the old days, before notebooks, talk about all the ways that data scientists like to work that weren't particularly well-suited to the command line + text editor setup, and discuss how notebooks have evolved over their lifetime to become even more powerful and well-suited to the data scientist's workflow.</itunes:summary>
      <itunes:subtitle>This week's episode is just in time for JupyterCo…</itunes:subtitle>
      <description>This week's episode is just in time for JupyterCon in NYC, August 22-25...

Jupyter notebooks are probably familiar to a lot of data nerds out there as a great open-source tool for exploring data, doing quick visualizations, and packaging code snippets with explanations for sharing your work with others.  If you're not a data person, or you are but you haven't tried out Jupyter notebooks yet, here's your nudge to go give them a try.  In this episode we'll go back to the old days, before notebooks, talk about all the ways that data scientists like to work that weren't particularly well-suited to the command line + text editor setup, and discuss how notebooks have evolved over their lifetime to become even more powerful and well-suited to the data scientist's workflow.</description>
      <enclosure length="22812454" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/338764019-linear-digressions-jupyter-notebooks.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/337766465</guid>
      <title>Curing Cancer with Machine Learning is Super Hard</title>
      <pubDate>Mon, 14 Aug 2017 01:49:52 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/curing-cancer-with-machine-learning-is-super-hard</link>
      <itunes:duration>00:19:20</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Today, a dispatch on what can go wrong when machine learning hype outpaces reality: a high-profile partnership between IBM Watson and MD Anderson Cancer Center has recently hit the rocks as it turns out to be tougher than expected to cure cancer with artificial intelligence.  There are enough conflicting accounts in the media to make it tough to say exactly what went wrong, but it's a good chance to remind ourselves that even in a post-AI world, hard problems remain hard.</itunes:summary>
      <itunes:subtitle>Today, a dispatch on what can go wrong when machi…</itunes:subtitle>
      <description>Today, a dispatch on what can go wrong when machine learning hype outpaces reality: a high-profile partnership between IBM Watson and MD Anderson Cancer Center has recently hit the rocks as it turns out to be tougher than expected to cure cancer with artificial intelligence.  There are enough conflicting accounts in the media to make it tough to say exactly what went wrong, but it's a good chance to remind ourselves that even in a post-AI world, hard problems remain hard.</description>
      <enclosure length="27857430" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/337766465-linear-digressions-curing-cancer-with-machine-learning-is-super-hard.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/336786494</guid>
      <title>KL Divergence</title>
      <pubDate>Mon, 07 Aug 2017 03:07:15 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/kl-divergence</link>
      <itunes:duration>00:25:38</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Kullback-Leibler divergence, or KL divergence, is a measure of information loss when you try to approximate one distribution with another distribution.  It comes to us originally from information theory, but today underpins other, more machine-learning-focused algorithms like t-SNE.  And boy oh boy can it be tough to explain.  But we're trying our hardest in this episode!</itunes:summary>
      <itunes:subtitle>Kullback-Leibler divergence, or KL divergence, is…</itunes:subtitle>
      <description>Kullback-Leibler divergence, or KL divergence, is a measure of information loss when you try to approximate one distribution with another distribution.  It comes to us originally from information theory, but today underpins other, more machine-learning-focused algorithms like t-SNE.  And boy oh boy can it be tough to explain.  But we're trying our hardest in this episode!</description>
      <enclosure length="36926726" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/336786494-linear-digressions-kl-divergence.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/335578907</guid>
      <title>Sabermetrics</title>
      <pubDate>Mon, 31 Jul 2017 01:15:37 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/sabermetrics</link>
      <itunes:duration>00:25:48</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>It's moneyball time!  SABR (the Society for American Baseball Research) is the world's largest organization of statistics-minded baseball enthusiasts, who are constantly applying scientific analysis to figure out which baseball teams and players are the best.  It can be hard to objectively measure sports greatness, but baseball has a data-rich history and plenty of nerdy fans interested in analyzing that data.  In this episode we'll dissect a few of the metrics from standard baseball and compare them to related metrics from Sabermetrics, so you can nerd out more effectively at your next baseball game.</itunes:summary>
      <itunes:subtitle>It's moneyball time!  SABR (the Society for Ameri…</itunes:subtitle>
      <description>It's moneyball time!  SABR (the Society for American Baseball Research) is the world's largest organization of statistics-minded baseball enthusiasts, who are constantly applying scientific analysis to figure out which baseball teams and players are the best.  It can be hard to objectively measure sports greatness, but baseball has a data-rich history and plenty of nerdy fans interested in analyzing that data.  In this episode we'll dissect a few of the metrics from standard baseball and compare them to related metrics from Sabermetrics, so you can nerd out more effectively at your next baseball game.</description>
      <enclosure length="37171233" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/335578907-linear-digressions-sabermetrics.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/334515037</guid>
      <title>What Data Scientists Can Learn from Software Engineers</title>
      <pubDate>Mon, 24 Jul 2017 01:52:26 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/what-data-scientists-can-learn-from-software-engineers</link>
      <itunes:duration>00:23:46</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>We're back again with friend of the pod Walt, former software engineer extraordinaire and current data scientist extraordinaire, to talk about some best practices from software engineering that are ready to jump the fence over to data science.  If last week's episode was for software engineers who are interested in becoming more like data scientists, then this week's episode is for data scientists who are looking to improve their game with best practices from software engineering.</itunes:summary>
      <itunes:subtitle>We're back again with friend of the pod Walt, for…</itunes:subtitle>
      <description>We're back again with friend of the pod Walt, former software engineer extraordinaire and current data scientist extraordinaire, to talk about some best practices from software engineering that are ready to jump the fence over to data science.  If last week's episode was for software engineers who are interested in becoming more like data scientists, then this week's episode is for data scientists who are looking to improve their game with best practices from software engineering.</description>
      <enclosure length="34243428" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/334515037-linear-digressions-what-data-scientists-can-learn-from-software-engineers.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/333510317</guid>
      <title>Software Engineering to Data Science</title>
      <pubDate>Mon, 17 Jul 2017 02:36:00 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/walt-1-produced</link>
      <itunes:duration>00:19:05</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Data scientists and software engineers often work side by side, building out and scaling technical products and services that are data-heavy but also require a lot of software engineering to build and maintain.  In this episode, we'll chat with a Friend of the Pod named Walt, who started out as a software engineer but works as a data scientist now.  We'll talk about that transition from software engineering to data science, and what special capabilities software engineers have that data scientists might benefit from knowing about (and vice versa).</itunes:summary>
      <itunes:subtitle>Data scientists and software engineers often work…</itunes:subtitle>
      <description>Data scientists and software engineers often work side by side, building out and scaling technical products and services that are data-heavy but also require a lot of software engineering to build and maintain.  In this episode, we'll chat with a Friend of the Pod named Walt, who started out as a software engineer but works as a data scientist now.  We'll talk about that transition from software engineering to data science, and what special capabilities software engineers have that data scientists might benefit from knowing about (and vice versa).</description>
      <enclosure length="27491925" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/333510317-linear-digressions-walt-1-produced.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/332432194</guid>
      <title>Re-Release: Fighting Cholera with Data, 1854</title>
      <pubDate>Mon, 10 Jul 2017 00:19:56 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/re-release-fighting-cholera-with-data-1854</link>
      <itunes:duration>00:12:04</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>This episode was first released in November 2014.

In the 1850s, there were a lot of things we didn’t know yet: how to create an airplane, how to split an atom, or how to control the spread of a common but deadly disease: cholera.

When a cholera outbreak in London killed scores of people, a doctor named John Snow used it as a chance to study whether the cause might be very small organisms that were spreading through the water supply (the prevailing theory at the time was miasma, or “bad air”). By tracing the geography of all the deaths from the outbreak, Snow was practicing elementary data science--and stumbled upon one of history’s most famous outliers.

In this episode, we’ll tell you more about this single data point, a case of cholera that cracked the case wide open for Snow and provided critical validation for the germ theory of disease.</itunes:summary>
      <itunes:subtitle>This episode was first released in November 2014.…</itunes:subtitle>
      <description>This episode was first released in November 2014.

In the 1850s, there were a lot of things we didn’t know yet: how to create an airplane, how to split an atom, or how to control the spread of a common but deadly disease: cholera.

When a cholera outbreak in London killed scores of people, a doctor named John Snow used it as a chance to study whether the cause might be very small organisms that were spreading through the water supply (the prevailing theory at the time was miasma, or “bad air”). By tracing the geography of all the deaths from the outbreak, Snow was practicing elementary data science--and stumbled upon one of history’s most famous outliers.

In this episode, we’ll tell you more about this single data point, a case of cholera that cracked the case wide open for Snow and provided critical validation for the germ theory of disease.</description>
      <enclosure length="17376268" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/332432194-linear-digressions-re-release-fighting-cholera-with-data-1854.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/331217467</guid>
      <title>Re-Release: Data Mining Enron</title>
      <pubDate>Sun, 02 Jul 2017 17:53:42 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/re-release-data-mining-enron</link>
      <itunes:duration>00:32:16</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>This episode was first released in February 2015.

In 2000, Enron was one of the largest companies in the world, praised far and wide for its innovations in energy distribution and many other markets. By 2002, it was apparent that many bad apples had been cooking the books, and billions of dollars and thousands of jobs disappeared.

In the aftermath, surprisingly, one of the greatest datasets in all of machine learning was born--the Enron emails corpus. Hundreds of thousands of emails amongst top executives were made public; there's no realistic chance any dataset like this will ever be made public again.

But the dataset that was released has gone on to immortality, serving as the basis for a huge variety of advances in machine learning and other fields.</itunes:summary>
      <itunes:subtitle>This episode was first released in February 2015.
…</itunes:subtitle>
      <description>This episode was first released in February 2015.

In 2000, Enron was one of the largest companies in the world, praised far and wide for its innovations in energy distribution and many other markets. By 2002, it was apparent that many bad apples had been cooking the books, and billions of dollars and thousands of jobs disappeared.

In the aftermath, surprisingly, one of the greatest datasets in all of machine learning was born--the Enron emails corpus. Hundreds of thousands of emails amongst top executives were made public; there's no realistic chance any dataset like this will ever be made public again.

But the dataset that was released has gone on to immortality, serving as the basis for a huge variety of advances in machine learning and other fields.</description>
      <enclosure length="46466854" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/331217467-linear-digressions-re-release-data-mining-enron.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/330116463</guid>
      <title>Factorization Machines</title>
      <pubDate>Mon, 26 Jun 2017 02:23:14 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/factorization-machines</link>
      <itunes:duration>00:19:54</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>What do you get when you cross a support vector machine with matrix factorization?  You get a factorization machine, and a darn fine algorithm for recommendation engines.</itunes:summary>
      <itunes:subtitle>What do you get when you cross a support vector m…</itunes:subtitle>
      <description>What do you get when you cross a support vector machine with matrix factorization?  You get a factorization machine, and a darn fine algorithm for recommendation engines.</description>
      <enclosure length="28653642" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/330116463-linear-digressions-factorization-machines.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/328763446</guid>
      <title>Anscombe's Quartet</title>
      <pubDate>Mon, 19 Jun 2017 02:19:56 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/anscombes-quartet-produced</link>
      <itunes:duration>00:15:39</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Anscombe's Quartet is a set of four datasets that have the same mean, variance and correlation but look very different.  It's easy to think that having a good set of summary statistics (like mean, variance and correlation) can tell you everything important about a dataset, or at least enough to know if two datasets are extremely similar or extremely different, but Anscombe's Quartet will always be standing behind you, laughing at how silly that idea is.

Anscombe's Quartet was devised in 1973 as an example of how summary statistics can be misleading, but today we can even do one better: the Datasaurus Dozen is a set of twelve datasets, all extremely visually distinct, that have the same summary stats as a source dataset that, there's no other way to put this, looks like a dinosaur.  It's an example of how datasets can be generated to look like almost anything while still preserving arbitrary summary statistics.  In other words, Anscombe's Quartets can be generated at will and we all should be reminded to visualize our data (not just compute summary statistics) if we want to claim to really understand it.</itunes:summary>
      <itunes:subtitle>Anscombe's Quartet is a set of four datasets that…</itunes:subtitle>
      <description>Anscombe's Quartet is a set of four datasets that have the same mean, variance and correlation but look very different.  It's easy to think that having a good set of summary statistics (like mean, variance and correlation) can tell you everything important about a dataset, or at least enough to know if two datasets are extremely similar or extremely different, but Anscombe's Quartet will always be standing behind you, laughing at how silly that idea is.

Anscombe's Quartet was devised in 1973 as an example of how summary statistics can be misleading, but today we can even do one better: the Datasaurus Dozen is a set of twelve datasets, all extremely visually distinct, that have the same summary stats as a source dataset that, there's no other way to put this, looks like a dinosaur.  It's an example of how datasets can be generated to look like almost anything while still preserving arbitrary summary statistics.  In other words, Anscombe's Quartets can be generated at will and we all should be reminded to visualize our data (not just compute summary statistics) if we want to claim to really understand it.</description>
      <enclosure length="22556663" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/328763446-linear-digressions-anscombes-quartet-produced.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/327634971</guid>
      <title>Traffic Metering Algorithms</title>
      <pubDate>Mon, 12 Jun 2017 03:01:49 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/traffic-metering-algorithms-1</link>
      <itunes:duration>00:18:34</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Originally released June 2016

This episode is for all you (us) traffic nerds--we're talking about the hidden structure underlying traffic on-ramp metering systems. These systems slow down the flow of traffic onto highways so that the highways don't get overloaded with cars and clog up. If you're someone who listens to podcasts while commuting, and especially if your area has on-ramp metering, you'll never look at highway access control the same way again (yeah, we know this is super nerdy; it's also super awesome).</itunes:summary>
      <itunes:subtitle>Originally released June 2016

This episode is for…</itunes:subtitle>
      <description>Originally released June 2016

This episode is for all you (us) traffic nerds--we're talking about the hidden structure underlying traffic on-ramp metering systems. These systems slow down the flow of traffic onto highways so that the highways don't get overloaded with cars and clog up. If you're someone who listens to podcasts while commuting, and especially if your area has on-ramp metering, you'll never look at highway access control the same way again (yeah, we know this is super nerdy; it's also super awesome).</description>
      <enclosure length="26750883" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/327634971-linear-digressions-traffic-metering-algorithms-1.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/326076803</guid>
      <title>Page Rank</title>
      <pubDate>Mon, 05 Jun 2017 01:46:35 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/page-rank</link>
      <itunes:duration>00:19:58</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>The year: 1998.  The size of the web: 150 million pages.  The problem: information retrieval.  How do you find the "best" web pages to return in response to a query?  A graduate student named Larry Page had an idea for how it could be done better and created a search engine as a research project.  That search engine was called Google.</itunes:summary>
      <itunes:subtitle>The year: 1998.  The size of the web: 150 million…</itunes:subtitle>
      <description>The year: 1998.  The size of the web: 150 million pages.  The problem: information retrieval.  How do you find the "best" web pages to return in response to a query?  A graduate student named Larry Page had an idea for how it could be done better and created a search engine as a research project.  That search engine was called Google.</description>
      <enclosure length="28762730" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/326076803-linear-digressions-page-rank.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/324969280</guid>
      <title>Fractional Dimensions</title>
      <pubDate>Mon, 29 May 2017 02:54:46 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/fractional-dimensions</link>
      <itunes:duration>00:20:28</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>We chat about fractional dimensions, and what the actual heck those are.</itunes:summary>
      <itunes:subtitle>We chat about fractional dimensions, and what the…</itunes:subtitle>
      <description>We chat about fractional dimensions, and what the actual heck those are.</description>
      <enclosure length="29473678" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/324969280-linear-digressions-fractional-dimensions.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/323852156</guid>
      <title>Things You Learn When Building Models for Big Data</title>
      <pubDate>Mon, 22 May 2017 01:44:13 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/things-you-learn-when-building-models-for-big-data</link>
      <itunes:duration>00:21:39</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>As more and more data gets collected seemingly every day, and data scientists use that data for modeling, the technical limits associated with machine learning on big datasets keep getting pushed back.  This week is a first-hand case study in using scikit-learn (a popular Python machine learning library) on multi-terabyte datasets, which is something that Katie does a lot for her day job at Civis Analytics.  There are a lot of considerations for doing something like this--cloud computing, artful use of parallelization, considerations of model complexity, and the computational demands of training vs. prediction, to name just a few.</itunes:summary>
      <itunes:subtitle>As more and more data gets collected seemingly ev…</itunes:subtitle>
      <description>As more and more data gets collected seemingly every day, and data scientists use that data for modeling, the technical limits associated with machine learning on big datasets keep getting pushed back.  This week is a first-hand case study in using scikit-learn (a popular Python machine learning library) on multi-terabyte datasets, which is something that Katie does a lot for her day job at Civis Analytics.  There are a lot of considerations for doing something like this--cloud computing, artful use of parallelization, considerations of model complexity, and the computational demands of training vs. prediction, to name just a few.</description>
      <enclosure length="31172055" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/323852156-linear-digressions-things-you-learn-when-building-models-for-big-data.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/322657868</guid>
      <title>How to Find New Things to Learn</title>
      <pubDate>Mon, 15 May 2017 01:49:26 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/how-to-find-new-things-to-learn</link>
      <itunes:duration>00:17:54</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>If you're anything like us, you a) always are curious to learn more about data science and machine learning and stuff, and b) are usually overwhelmed by how much content is out there (not all of it very digestible).  We hope this podcast is a part of the solution for you, but if you're looking to go farther (who isn't?) then we have a few new resources that are presenting high-quality content in a fresh, accessible way.  Boring old PDFs full of inscrutable math notation, your days are numbered!</itunes:summary>
      <itunes:subtitle>If you're anything like us, you a) always are cur…</itunes:subtitle>
      <description>If you're anything like us, you a) always are curious to learn more about data science and machine learning and stuff, and b) are usually overwhelmed by how much content is out there (not all of it very digestible).  We hope this podcast is a part of the solution for you, but if you're looking to go farther (who isn't?) then we have a few new resources that are presenting high-quality content in a fresh, accessible way.  Boring old PDFs full of inscrutable math notation, your days are numbered!</description>
      <enclosure length="25794175" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/322657868-linear-digressions-how-to-find-new-things-to-learn.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/321493998</guid>
      <title>Federated Learning</title>
      <pubDate>Mon, 08 May 2017 01:50:40 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/federated-learning</link>
      <itunes:duration>00:14:03</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>As machine learning makes its way into more and more mobile devices, an interesting question presents itself: how can we have an algorithm learn from training data that's being supplied as users interact with the algorithm?  In other words, how do we do machine learning when the training dataset is distributed across many devices, imbalanced, and the usage associated with any one user needs to be obscured somewhat to protect the privacy of that user?  Enter Federated Learning, a set of related algorithms from Google that are designed to help out in exactly this scenario.  If you've used keyboard shortcuts or autocomplete on an Android phone, chances are you've encountered Federated Learning even if you didn't know it.</itunes:summary>
      <itunes:subtitle>As machine learning makes its way into more and m…</itunes:subtitle>
      <description>As machine learning makes its way into more and more mobile devices, an interesting question presents itself: how can we have an algorithm learn from training data that's being supplied as users interact with the algorithm?  In other words, how do we do machine learning when the training dataset is distributed across many devices, imbalanced, and the usage associated with any one user needs to be obscured somewhat to protect the privacy of that user?  Enter Federated Learning, a set of related algorithms from Google that are designed to help out in exactly this scenario.  If you've used keyboard shortcuts or autocomplete on an Android phone, chances are you've encountered Federated Learning even if you didn't know it.</description>
      <enclosure length="20231347" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/321493998-linear-digressions-federated-learning.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/320270672</guid>
      <title>Word2Vec</title>
      <pubDate>Mon, 01 May 2017 02:17:36 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/word2vec</link>
      <itunes:duration>00:17:59</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Word2Vec is probably the go-to algorithm for vectorizing text data these days.  Which makes sense, because it is wicked cool.  Word2Vec has it all: neural networks, skip-grams and bag-of-words implementations, a multiclass classifier that gets swapped out for a binary classifier, made-up dummy words, and a model that isn't actually used to predict anything (usually).  And all that's before we get to the part about how Word2Vec allows you to do algebra with text.  Seriously, this stuff is cool.</itunes:summary>
      <itunes:subtitle>Word2Vec is probably the go-to algorithm for vect…</itunes:subtitle>
      <description>Word2Vec is probably the go-to algorithm for vectorizing text data these days.  Which makes sense, because it is wicked cool.  Word2Vec has it all: neural networks, skip-grams and bag-of-words implementations, a multiclass classifier that gets swapped out for a binary classifier, made-up dummy words, and a model that isn't actually used to predict anything (usually).  And all that's before we get to the part about how Word2Vec allows you to do algebra with text.  Seriously, this stuff is cool.</description>
      <enclosure length="25893231" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/320270672-linear-digressions-word2vec.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/319173085</guid>
      <title>Feature Processing for Text Analytics</title>
      <pubDate>Mon, 24 Apr 2017 02:17:24 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/feature-processing-for-text-analytics</link>
      <itunes:duration>00:17:28</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>It seems like every day there are more and more machine learning problems that involve learning on text data, but text itself makes for fairly lousy inputs to machine learning algorithms.  That's why there are text vectorization algorithms, which re-format text data so it's ready to use for machine learning.  In this episode, we'll go over some of the most common and useful ways to preprocess text data for machine learning.</itunes:summary>
      <itunes:subtitle>It seems like every day there's more and more mac…</itunes:subtitle>
      <description>It seems like every day there are more and more machine learning problems that involve learning on text data, but text itself makes for fairly lousy inputs to machine learning algorithms.  That's why there are text vectorization algorithms, which re-format text data so it's ready to use for machine learning.  In this episode, we'll go over some of the most common and useful ways to preprocess text data for machine learning.</description>
      <enclosure length="25160339" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/319173085-linear-digressions-feature-processing-for-text-analytics.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/318014251</guid>
      <title>Education Analytics</title>
      <pubDate>Mon, 17 Apr 2017 02:09:26 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/education-analytics</link>
      <itunes:duration>00:21:05</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>This week we'll hop into the rapidly developing industry around predictive analytics for education.  For many of the students who eventually drop out, data science is showing that there might be early warning signs that the student is in trouble--we'll talk about what some of those signs are, and then dig into the meatier questions around discrimination, who owns a student's data, and correlation vs. causation.  Spoiler: we have more questions than we have answers on this one.

Bonus appearance from Maeby the dog, who isn't a data scientist but does like to steal food off the counter.</itunes:summary>
      <itunes:subtitle>This week we'll hop into the rapidly developing i…</itunes:subtitle>
      <description>This week we'll hop into the rapidly developing industry around predictive analytics for education.  For many of the students who eventually drop out, data science is showing that there might be early warning signs that the student is in trouble--we'll talk about what some of those signs are, and then dig into the meatier questions around discrimination, who owns a student's data, and correlation vs. causation.  Spoiler: we have more questions than we have answers on this one.

Bonus appearance from Maeby the dog, who isn't a data scientist but does like to steal food off the counter.</description>
      <enclosure length="30377724" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/318014251-linear-digressions-education-analytics.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/316925326</guid>
      <title>A Technical Deep Dive on Stanley, the First Self-Driving Car</title>
      <pubDate>Mon, 10 Apr 2017 01:50:01 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/a-technical-deep-dive-on-stanley-the-first-self-driving-car</link>
      <itunes:duration>00:40:42</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>In our follow-up episode to last week's introduction to the first self-driving car, we will be doing a technical deep dive this week and talking about the most important systems for getting a car to drive itself 140 miles across the desert.  Lidar?  You betcha!  Drive-by-wire?  Of course!  Probabilistic terrain reconstruction?  Absolutely!  All this and more this week on Linear Digressions.</itunes:summary>
      <itunes:subtitle>In our follow-up episode to last week's introduct…</itunes:subtitle>
      <description>In our follow-up episode to last week's introduction to the first self-driving car, we will be doing a technical deep dive this week and talking about the most important systems for getting a car to drive itself 140 miles across the desert.  Lidar?  You betcha!  Drive-by-wire?  Of course!  Probabilistic terrain reconstruction?  Absolutely!  All this and more this week on Linear Digressions.</description>
      <enclosure length="58603762" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/316925326-linear-digressions-a-technical-deep-dive-on-stanley-the-first-self-driving-car.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/315776652</guid>
      <title>An Introduction to Stanley, the First Self-Driving Car</title>
      <pubDate>Mon, 03 Apr 2017 01:34:17 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/an-introduction-to-stanley-the-first-self-driving-car</link>
      <itunes:duration>00:13:07</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>In October 2005, 23 cars lined up in the desert for a 140 mile race.  Not one of those cars had a driver.  This was the DARPA grand challenge to see if anyone could build an autonomous vehicle capable of navigating a desert route (and if so, whose car could do it the fastest); the winning car, Stanley, now sits in the Smithsonian Museum in Washington DC as arguably the world's first real self-driving car.  In this episode (part one of a two-parter), we'll revisit the DARPA grand challenge from 2005 and the rules and constraints of what it took for Stanley to win the competition.  Next week, we'll do a deep dive into Stanley's control systems and overall operation and what the key systems were that allowed Stanley to win the race.</itunes:summary>
      <itunes:subtitle>In October 2005, 23 cars lined up in the desert f…</itunes:subtitle>
      <description>In October 2005, 23 cars lined up in the desert for a 140 mile race.  Not one of those cars had a driver.  This was the DARPA grand challenge to see if anyone could build an autonomous vehicle capable of navigating a desert route (and if so, whose car could do it the fastest); the winning car, Stanley, now sits in the Smithsonian Museum in Washington DC as arguably the world's first real self-driving car.  In this episode (part one of a two-parter), we'll revisit the DARPA grand challenge from 2005 and the rules and constraints of what it took for Stanley to win the competition.  Next week, we'll do a deep dive into Stanley's control systems and overall operation and what the key systems were that allowed Stanley to win the race.</description>
      <enclosure length="18902864" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/315776652-linear-digressions-an-introduction-to-stanley-the-first-self-driving-car.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/314630327</guid>
      <title>Feature Importance</title>
      <pubDate>Mon, 27 Mar 2017 01:53:25 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/feature-importance</link>
      <itunes:duration>00:20:15</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Figuring out which features actually matter in a model is harder than you might first guess.  When a human makes a decision, you can just ask them--why did you do that?  But with machine learning models, not so much.  That's why we wanted to talk a bit about both regularization (again) and also other ways that you can figure out which features have the biggest impact on the predictions of your model.</itunes:summary>
      <itunes:subtitle>Figuring out what features actually matter in a m…</itunes:subtitle>
      <description>Figuring out which features actually matter in a model is harder than you might first guess.  When a human makes a decision, you can just ask them--why did you do that?  But with machine learning models, not so much.  That's why we wanted to talk a bit about both regularization (again) and also other ways that you can figure out which features have the biggest impact on the predictions of your model.</description>
      <enclosure length="29159582" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/314630327-linear-digressions-feature-importance.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/313333794</guid>
      <title>Space Codes!</title>
      <pubDate>Mon, 20 Mar 2017 02:50:57 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/space-codes</link>
      <itunes:duration>00:23:56</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>It's hard to get information to and from Mars.  Mars is very far away, and expensive to get to, and the bandwidth for passing messages with Earth is not huge.  The messages you do pass have to traverse millions of miles, which provides ample opportunity for the message to get corrupted or scrambled.  How, then, can you encode messages so that errors can be detected and corrected?  How does the decoding process allow you to actually find and correct the errors?  In this episode, we'll talk about three pieces of the process (Reed-Solomon codes, convolutional codes, and Viterbi decoding) that allow the scientists at NASA to talk to our rovers on Mars.</itunes:summary>
      <itunes:subtitle>It's hard to get information to and from Mars.  M…</itunes:subtitle>
      <description>It's hard to get information to and from Mars.  Mars is very far away, and expensive to get to, and the bandwidth for passing messages with Earth is not huge.  The messages you do pass have to traverse millions of miles, which provides ample opportunity for the message to get corrupted or scrambled.  How, then, can you encode messages so that errors can be detected and corrected?  How does the decoding process allow you to actually find and correct the errors?  In this episode, we'll talk about three pieces of the process (Reed-Solomon codes, convolutional codes, and Viterbi decoding) that allow the scientists at NASA to talk to our rovers on Mars.</description>
      <enclosure length="34464111" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/313333794-linear-digressions-space-codes.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/312081585</guid>
      <title>Finding (and Studying) Wikipedia Trolls</title>
      <pubDate>Mon, 13 Mar 2017 01:44:55 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/finding-and-studying-wikipedia-trolls</link>
      <itunes:duration>00:15:50</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>You may be shocked to hear this, but sometimes, people on the internet can be mean.  For some of us this is just a minor annoyance, but if you're a maintainer or contributor of a large project like Wikipedia, abusive users can be a huge problem.  Fighting the problem starts with understanding it, and understanding it starts with measuring it; the thing is, for a huge website like Wikipedia, there can be millions of edits and comments where abuse might happen, so measurement isn't a simple task.  That's where machine learning comes in: by building an "abuse classifier" and pointing it at the Wikipedia edit corpus, researchers at Jigsaw and the Wikimedia Foundation are for the first time able to estimate abuse rates and curate a dataset of abusive incidents.  Then those researchers, and others, can use that dataset to study the pathologies and effects of Wikipedia trolls.</itunes:summary>
      <itunes:subtitle>You may be shocked to hear this, but sometimes, p…</itunes:subtitle>
      <description>You may be shocked to hear this, but sometimes, people on the internet can be mean.  For some of us this is just a minor annoyance, but if you're a maintainer or contributor of a large project like Wikipedia, abusive users can be a huge problem.  Fighting the problem starts with understanding it, and understanding it starts with measuring it; the thing is, for a huge website like Wikipedia, there can be millions of edits and comments where abuse might happen, so measurement isn't a simple task.  That's where machine learning comes in: by building an "abuse classifier" and pointing it at the Wikipedia edit corpus, researchers at Jigsaw and the Wikimedia Foundation are for the first time able to estimate abuse rates and curate a dataset of abusive incidents.  Then those researchers, and others, can use that dataset to study the pathologies and effects of Wikipedia trolls.</description>
      <enclosure length="22804931" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/312081585-linear-digressions-finding-and-studying-wikipedia-trolls.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/310922164</guid>
      <title>A Sprint Through What's New in Neural Networks</title>
      <pubDate>Mon, 06 Mar 2017 03:27:12 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/a-sprint-through-whats-new-in-neural-networks</link>
      <itunes:duration>00:16:56</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Advances in neural networks are moving fast enough that, even though it seems like we talk about them all the time around here, it also always seems like we're barely keeping up.  So this week we have another installment in our "neural nets: they so smart!" series, talking about three topics.  And all the topics this week were listener suggestions, too!</itunes:summary>
      <itunes:subtitle>Advances in neural networks are moving fast enoug…</itunes:subtitle>
      <description>Advances in neural networks are moving fast enough that, even though it seems like we talk about them all the time around here, it also always seems like we're barely keeping up.  So this week we have another installment in our "neural nets: they so smart!" series, talking about three topics.  And all the topics this week were listener suggestions, too!</description>
      <enclosure length="24399863" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/310922164-linear-digressions-a-sprint-through-whats-new-in-neural-networks.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/309721482</guid>
      <title>Stein's Paradox</title>
      <pubDate>Mon, 27 Feb 2017 02:51:41 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/steins-paradox</link>
      <itunes:duration>00:27:02</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>When you're estimating something about some object that's a member of a larger group of similar objects (say, the batting average of a baseball player, who belongs to a baseball team), how should you estimate it: use measurements of the individual, or get some extra information from the group?  The James-Stein estimator tells you how to combine individual and group information to make predictions that, taken over the whole group, are more accurate than if you treated each individual, well, individually.</itunes:summary>
      <itunes:subtitle>When you're estimating something about some objec…</itunes:subtitle>
      <description>When you're estimating something about some object that's a member of a larger group of similar objects (say, the batting average of a baseball player, who belongs to a baseball team), how should you estimate it: use measurements of the individual, or get some extra information from the group?  The James-Stein estimator tells you how to combine individual and group information to make predictions that, taken over the whole group, are more accurate than if you treated each individual, well, individually.</description>
      <enclosure length="38944215" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/309721482-linear-digressions-steins-paradox.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/308598786</guid>
      <title>Empirical Bayes</title>
      <pubDate>Mon, 20 Feb 2017 03:30:06 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/empirical-bayes</link>
      <itunes:duration>00:18:57</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Say you're looking to use some Bayesian methods to estimate parameters of a system.  You've got the normalization figured out, and the likelihood, but the prior... what should you use for a prior?  Empirical Bayes has an elegant answer: look to your previous experience, and use past measurements as a starting point in your prior.

Scratching your head about some of those terms, and why they matter?  Lucky for you, you're standing in front of a podcast episode that unpacks all of this.</itunes:summary>
      <itunes:subtitle>Say you're looking to use some Bayesian methods t…</itunes:subtitle>
      <description>Say you're looking to use some Bayesian methods to estimate parameters of a system.  You've got the normalization figured out, and the likelihood, but the prior... what should you use for a prior?  Empirical Bayes has an elegant answer: look to your previous experience, and use past measurements as a starting point in your prior.

Scratching your head about some of those terms, and why they matter?  Lucky for you, you're standing in front of a podcast episode that unpacks all of this.</description>
      <enclosure length="27295066" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/308598786-linear-digressions-empirical-bayes.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/307453756</guid>
      <title>Endogenous Variables and Measuring Protest Effectiveness</title>
      <pubDate>Mon, 13 Feb 2017 03:31:00 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/endogenous-variables-and-measuring-protest-effectiveness</link>
      <itunes:duration>00:16:28</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Have you been out protesting lately, or watching the protests, and wondered how much effect they might have on lawmakers?  It's a tricky question to answer, since usually we need randomly distributed treatments (e.g. big protests) to understand causality, but there's no reason to believe that big protests are actually randomly distributed.  In other words, protest size is endogenous to legislative response, and understanding cause and effect is very challenging.

So, what to do?  Well, at least in the case of studying Tea Party protest effectiveness, researchers have used rainfall, of all things, to understand the impact of a big protest.  In other words, rainfall is the instrumental variable in this analysis that cracks the scientific case open.  What does rainfall have to do with protests?  Do protests actually matter?  What do we mean when we talk about endogenous and instrumental variables?  We wouldn't be very good podcasters if we answered all those questions here--you gotta listen to this episode to find out.</itunes:summary>
      <itunes:subtitle>Have you been out protesting lately, or watching …</itunes:subtitle>
      <description>Have you been out protesting lately, or watching the protests, and wondered how much effect they might have on lawmakers?  It's a tricky question to answer, since usually we need randomly distributed treatments (e.g. big protests) to understand causality, but there's no reason to believe that big protests are actually randomly distributed.  In other words, protest size is endogenous to legislative response, and understanding cause and effect is very challenging.

So, what to do?  Well, at least in the case of studying Tea Party protest effectiveness, researchers have used rainfall, of all things, to understand the impact of a big protest.  In other words, rainfall is the instrumental variable in this analysis that cracks the scientific case open.  What does rainfall have to do with protests?  Do protests actually matter?  What do we mean when we talk about endogenous and instrumental variables?  We wouldn't be very good podcasters if we answered all those questions here--you gotta listen to this episode to find out.</description>
      <enclosure length="23717753" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/307453756-linear-digressions-endogenous-variables-and-measuring-protest-effectiveness.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/306314257</guid>
      <title>Calibrated Models</title>
      <pubDate>Mon, 06 Feb 2017 01:56:12 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/calibrated-models</link>
      <itunes:duration>00:14:32</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Remember last week, when we were talking about how great the ROC curve is for evaluating models?  How things change...  This week, we're exploring calibrated risk models, because that's a kind of model that seems like it would benefit from some nice ROC analysis, but in fact the ROC AUC can steer you wrong there.</itunes:summary>
      <itunes:subtitle>Remember last week, when we were talking about ho…</itunes:subtitle>
      <description>Remember last week, when we were talking about how great the ROC curve is for evaluating models?  How things change...  This week, we're exploring calibrated risk models, because that's a kind of model that seems like it would benefit from some nice ROC analysis, but in fact the ROC AUC can steer you wrong there.</description>
      <enclosure length="20932891" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/306314257-linear-digressions-calibrated-models.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/305186244</guid>
      <title>Rock the ROC Curve</title>
      <pubDate>Mon, 30 Jan 2017 03:38:46 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/rock-the-roc-curve</link>
      <itunes:duration>00:15:52</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>This week: everybody's favorite WWII-era classifier metric!  But it's not just for winning wars, it's a fantastic go-to metric for all your classifier quality needs.</itunes:summary>
      <itunes:subtitle>This week: everybody's favorite WWII-era classifi…</itunes:subtitle>
      <description>This week: everybody's favorite WWII-era classifier metric!  But it's not just for winning wars, it's a fantastic go-to metric for all your classifier quality needs.</description>
      <enclosure length="22864490" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/305186244-linear-digressions-rock-the-roc-curve.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/304054755</guid>
      <title>Ensemble Algorithms</title>
      <pubDate>Mon, 23 Jan 2017 02:31:26 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/ensemble-algorithms</link>
      <itunes:duration>00:13:08</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>If one machine learning model is good, are two models better?  In a lot of cases, the answer is yes.  If you build many ok models, and then bring them all together and use them in combination to make your final predictions, you've just created an ensemble model.  It feels a little bit like cheating, like you just got something for nothing, but the results don't lie: algorithms like Random Forests and Gradient Boosting Trees (two types of ensemble algorithms) are some of the strongest out-of-the-box algorithms for classic supervised classification problems.  What makes a Random Forest random, and what does it mean to gradient boost a tree?  Have a listen and find out.</itunes:summary>
      <itunes:subtitle>If one machine learning model is good, are two mo…</itunes:subtitle>
      <description>If one machine learning model is good, are two models better?  In a lot of cases, the answer is yes.  If you build many ok models, and then bring them all together and use them in combination to make your final predictions, you've just created an ensemble model.  It feels a little bit like cheating, like you just got something for nothing, but the results don't lie: algorithms like Random Forests and Gradient Boosting Trees (two types of ensemble algorithms) are some of the strongest out-of-the-box algorithms for classic supervised classification problems.  What makes a Random Forest random, and what does it mean to gradient boost a tree?  Have a listen and find out.</description>
      <enclosure length="18918537" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/304054755-linear-digressions-ensemble-algorithms.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/302897486</guid>
      <title>How to evaluate a translation: BLEU scores</title>
      <pubDate>Mon, 16 Jan 2017 01:59:01 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/how-to-evaluate-a-translation-bleu-scores</link>
      <itunes:duration>00:17:06</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>As anyone who's encountered a badly translated text could tell you, not all translations are created equal.  Some translations are smooth, fluent and sound like a poet wrote them; some are jerky, non-grammatical and awkward.  When a machine is doing the translating, it's awfully easy to end up with a robotic-sounding text; as the state of the art in machine translation improves, though, a natural question to ask is: according to what measure?  How do we quantify a "good" translation?

Enter the BLEU score, which is the standard metric for quantifying the quality of a machine translation.  BLEU rewards translations that have large overlap with human translations of sentences, with some extra heuristics thrown in to guard against weird pathologies (like full sentences getting translated as one word, redundancies, and repetition).  Nowadays, if there's a machine translation being evaluated or a new state-of-the-art system (like the Google neural machine translation we've discussed on this podcast before), chances are that there's a BLEU score going into that assessment.</itunes:summary>
      <itunes:subtitle>As anyone who's encountered a badly translated te…</itunes:subtitle>
      <description>As anyone who's encountered a badly translated text could tell you, not all translations are created equal.  Some translations are smooth, fluent and sound like a poet wrote them; some are jerky, non-grammatical and awkward.  When a machine is doing the translating, it's awfully easy to end up with a robotic-sounding text; as the state of the art in machine translation improves, though, a natural question to ask is: according to what measure?  How do we quantify a "good" translation?

Enter the BLEU score, which is the standard metric for quantifying the quality of a machine translation.  BLEU rewards translations that have large overlap with human translations of sentences, with some extra heuristics thrown in to guard against weird pathologies (like full sentences getting translated as one word, redundancies, and repetition).  Nowadays, if there's a machine translation being evaluated or a new state-of-the-art system (like the Google neural machine translation we've discussed on this podcast before), chances are that there's a BLEU score going into that assessment.</description>
      <enclosure length="24629322" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/302897486-linear-digressions-how-to-evaluate-a-translation-bleu-scores.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/301617561</guid>
      <title>Zero Shot Translation</title>
      <pubDate>Mon, 09 Jan 2017 03:20:57 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/zero-shot-translation</link>
      <itunes:duration>00:25:32</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Take Google-size data, the flexibility of a neural net, and all (well, most) of the languages of the world, and what you end up with is a pile of surprises.  This episode is about some interesting features of Google's new neural machine translation system, namely that with minimal tweaking, it can accommodate many different languages in a single neural net, that it can do a half-decent job of translating between language pairs it's never been explicitly trained on, and that it seems to have its own internal representation of concepts that's independent of the language those concepts are being represented in.  Intrigued?  You should be...</itunes:summary>
      <itunes:subtitle>Take Google-size data, the flexibility of a neura…</itunes:subtitle>
      <description>Take Google-size data, the flexibility of a neural net, and all (well, most) of the languages of the world, and what you end up with is a pile of surprises.  This episode is about some interesting features of Google's new neural machine translation system, namely that with minimal tweaking, it can accommodate many different languages in a single neural net, that it can do a half-decent job of translating between language pairs it's never been explicitly trained on, and that it seems to have its own internal representation of concepts that's independent of the language those concepts are being represented in.  Intrigued?  You should be...</description>
      <enclosure length="36766230" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/301617561-linear-digressions-zero-shot-translation.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/300514154</guid>
      <title>Google Neural Machine Translation</title>
      <pubDate>Mon, 02 Jan 2017 01:44:23 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/google-neural-machine-translation</link>
      <itunes:duration>00:18:12</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Recently, Google swapped out the backend for Google Translate, moving from a statistical phrase-based method to a recurrent neural network.  This marks a big change in methodology: the tried-and-true statistical translation methods that have been in use for decades are giving way to a neural net that, across the board, appears to be giving more fluent and natural-sounding translations.  This episode recaps statistical phrase-based methods, digs into the RNN architecture a little bit, and recaps the impressive results that are making us all sound a little better in our non-native languages.</itunes:summary>
      <itunes:subtitle>Recently, Google swapped out the backend for Goog…</itunes:subtitle>
      <description>Recently, Google swapped out the backend for Google Translate, moving from a statistical phrase-based method to a recurrent neural network.  This marks a big change in methodology: the tried-and-true statistical translation methods that have been in use for decades are giving way to a neural net that, across the board, appears to be giving more fluent and natural-sounding translations.  This episode recaps statistical phrase-based methods, digs into the RNN architecture a little bit, and recaps the impressive results that are making us all sound a little better in our non-native languages.</description>
      <enclosure length="26206700" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/300514154-linear-digressions-google-neural-machine-translation.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/299603370</guid>
      <title>Data and the Future of Medicine : Interview with Precision Medicine Initiative researcher Matt Might</title>
      <pubDate>Mon, 26 Dec 2016 01:19:24 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/data-and-the-future-of-medicine-interview-with-precision-medicine-initiative-researcher-matt-might</link>
      <itunes:duration>00:34:54</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Today we are delighted to bring you an interview with Matt Might, computer scientist and medical researcher extraordinaire and architect of President Obama's Precision Medicine Initiative.  As the Obama Administration winds down, we're talking with Matt about the goals and accomplishments of precision medicine (and related projects like the Cancer Moonshot) and what he foresees as the future marriage of data and medicine.  Many thanks to Matt, our friends over at Partially Derivative (hi, Jonathon!) and the White House for arranging this opportunity to chat.  Enjoy!</itunes:summary>
      <itunes:subtitle>Today we are delighted to bring you an interview …</itunes:subtitle>
      <description>Today we are delighted to bring you an interview with Matt Might, computer scientist and medical researcher extraordinaire and architect of President Obama's Precision Medicine Initiative.  As the Obama Administration winds down, we're talking with Matt about the goals and accomplishments of precision medicine (and related projects like the Cancer Moonshot) and what he foresees as the future marriage of data and medicine.  Many thanks to Matt, our friends over at Partially Derivative (hi, Jonathon!) and the White House for arranging this opportunity to chat.  Enjoy!</description>
      <enclosure length="50259230" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/299603370-linear-digressions-data-and-the-future-of-medicine-interview-with-precision-medicine-initiative-researcher-matt-might.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/298492727</guid>
      <title>Special Crossover Episode: Partially Derivative interview with White House Data Scientist DJ Patil</title>
      <pubDate>Sun, 18 Dec 2016 17:53:52 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/special-crossover-episode-partially-derivative-interview-with-white-house-data-scientist-dj-patil</link>
      <itunes:duration>00:46:09</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>We have the pleasure of bringing you a very special crossover episode this week: our friends at Partially Derivative (another great podcast about data science, you should check it out) recently interviewed White House Chief Data Scientist DJ Patil.  We think DJ's message about the importance and impact of data science is worth spreading, so it's our pleasure to bring it to you today.  A huge thanks to Jonathon Morgan and Partially Derivative for sharing this interview with us--enjoy!

Relevant links:
http://partiallyderivative.com/podcast/2016/12/13/dj-patil</itunes:summary>
      <itunes:subtitle>We have the pleasure of bringing you a very speci…</itunes:subtitle>
      <description>We have the pleasure of bringing you a very special crossover episode this week: our friends at Partially Derivative (another great podcast about data science, you should check it out) recently interviewed White House Chief Data Scientist DJ Patil.  We think DJ's message about the importance and impact of data science is worth spreading, so it's our pleasure to bring it to you today.  A huge thanks to Jonathon Morgan and Partially Derivative for sharing this interview with us--enjoy!

Relevant links:
http://partiallyderivative.com/podcast/2016/12/13/dj-patil</description>
      <enclosure length="66464320" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/298492727-linear-digressions-special-crossover-episode-partially-derivative-interview-with-white-house-data-scientist-dj-patil.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/297483753</guid>
      <title>How to Lose at Kaggle</title>
      <pubDate>Mon, 12 Dec 2016 04:28:57 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/how-to-lose-at-kaggle</link>
      <itunes:duration>00:17:16</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Competing in a machine learning competition on Kaggle is a kind of rite of passage for data scientists.  Losing unexpectedly at the very end of the contest is also something that a lot of us have experienced.  It's not just bad luck: a very specific kind of overfitting on popular competitions can take someone who is in the top few spots in the final days of a contest and bump them down hundreds of slots in the final tally.</itunes:summary>
      <itunes:subtitle>Competing in a machine learning competition on Ka…</itunes:subtitle>
      <description>Competing in a machine learning competition on Kaggle is a kind of rite of passage for data scientists.  Losing unexpectedly at the very end of the contest is also something that a lot of us have experienced.  It's not just bad luck: a very specific kind of overfitting on popular competitions can take someone who is in the top few spots in the final days of a contest and bump them down hundreds of slots in the final tally.</description>
      <enclosure length="24871321" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/297483753-linear-digressions-how-to-lose-at-kaggle.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/296242241</guid>
      <title>Attacking Discrimination in Machine Learning</title>
      <pubDate>Mon, 05 Dec 2016 03:38:54 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/attacking-discrimination-in-machine-learning</link>
      <itunes:duration>00:23:20</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Imagine there's an important decision to be made about someone, like a bank deciding whether to extend a loan, or a school deciding to admit a student--unfortunately, we're all too aware that discrimination can sneak into these situations (even when everyone is acting with the best of intentions!).  Now, these decisions are often made with the assistance of machine learning and statistical models, but unfortunately these algorithms pick up on the discrimination in the world (it sneaks in through the data, which can capture inequities, which the algorithms then learn) and reproduce it.

This podcast covers some of the most common ways we can try to minimize discrimination, and why none of those ways is perfect at fixing the problem.  Then we'll get to a new idea called "equality of opportunity," which came out of Google recently and takes a pretty practical and well-aimed approach to machine learning bias.</itunes:summary>
      <itunes:subtitle>Imagine there's an important decision to be made …</itunes:subtitle>
      <description>Imagine there's an important decision to be made about someone, like a bank deciding whether to extend a loan, or a school deciding to admit a student--unfortunately, we're all too aware that discrimination can sneak into these situations (even when everyone is acting with the best of intentions!).  Now, these decisions are often made with the assistance of machine learning and statistical models, but unfortunately these algorithms pick up on the discrimination in the world (it sneaks in through the data, which can capture inequities, which the algorithms then learn) and reproduce it.

This podcast covers some of the most common ways we can try to minimize discrimination, and why none of those ways is perfect at fixing the problem.  Then we'll get to a new idea called "equality of opportunity," which came out of Google recently and takes a pretty practical and well-aimed approach to machine learning bias.</description>
      <enclosure length="33607713" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/296242241-linear-digressions-attacking-discrimination-in-machine-learning.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/295082681</guid>
      <title>Recurrent Neural Nets</title>
      <pubDate>Mon, 28 Nov 2016 02:47:44 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/recurrent-neural-nets</link>
      <itunes:duration>00:12:36</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>This week, we're doing a crash course in recurrent neural networks--what the structural pieces are that make a neural net recurrent, how that structure helps RNNs solve certain time series problems, and the importance of forgetfulness in RNNs.  

Relevant links: 
http://colah.github.io/posts/2015-08-Understanding-LSTMs/</itunes:summary>
      <itunes:subtitle>This week, we're doing a crash course in recurren…</itunes:subtitle>
      <description>This week, we're doing a crash course in recurrent neural networks--what the structural pieces are that make a neural net recurrent, how that structure helps RNNs solve certain time series problems, and the importance of forgetfulness in RNNs.  

Relevant links: 
http://colah.github.io/posts/2015-08-Understanding-LSTMs/</description>
      <enclosure length="18157433" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/295082681-linear-digressions-recurrent-neural-nets.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/294007486</guid>
      <title>Stealing a PIN with signal processing and machine learning</title>
      <pubDate>Mon, 21 Nov 2016 02:32:21 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/stealing-a-pin-with-signal-processing-and-machine-learning</link>
      <itunes:duration>00:16:55</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Want another reason to be paranoid when using the free coffee shop wifi?  Allow us to introduce WindTalker, a system that cleverly combines a dose of signal processing with a dash of machine learning to (potentially) steal the PIN from your phone transactions without ever having physical access to your phone.  This episode has it all, folks--channel state information, ICMP echo requests, low-pass filtering, PCA, dynamic time warps, and the PIN for your phone.</itunes:summary>
      <itunes:subtitle>Want another reason to be paranoid when using the…</itunes:subtitle>
      <description>Want another reason to be paranoid when using the free coffee shop wifi?  Allow us to introduce WindTalker, a system that cleverly combines a dose of signal processing with a dash of machine learning to (potentially) steal the PIN from your phone transactions without ever having physical access to your phone.  This episode has it all, folks--channel state information, ICMP echo requests, low-pass filtering, PCA, dynamic time warps, and the PIN for your phone.</description>
      <enclosure length="24378547" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/294007486-linear-digressions-stealing-a-pin-with-signal-processing-and-machine-learning.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/292925887</guid>
      <title>Neural Net Cryptography</title>
      <pubDate>Mon, 14 Nov 2016 04:06:57 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/neural-net-cryptography</link>
      <itunes:duration>00:16:16</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Cryptography used to be the domain of information theorists and spies.  There's a new player now: neural networks.  Given the task of communicating securely, neural networks are inventing new encryption methods that, as best we can tell, are unlike anything humans have ever seen before.

Relevant links:
http://arstechnica.co.uk/information-technology/2016/10/google-ai-neural-network-cryptography/
https://arxiv.org/pdf/1610.06918v1.pdf</itunes:summary>
      <itunes:subtitle>Cryptography used to be the domain of information…</itunes:subtitle>
      <description>Cryptography used to be the domain of information theorists and spies.  There's a new player now: neural networks.  Given the task of communicating securely, neural networks are inventing new encryption methods that, as best we can tell, are unlike anything humans have ever seen before.

Relevant links:
http://arstechnica.co.uk/information-technology/2016/10/google-ai-neural-network-cryptography/
https://arxiv.org/pdf/1610.06918v1.pdf</description>
      <enclosure length="23435631" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/292925887-linear-digressions-neural-net-cryptography.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/291861013</guid>
      <title>Deep Blue</title>
      <pubDate>Mon, 07 Nov 2016 04:20:48 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/deep-blue</link>
      <itunes:duration>00:20:05</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>In 1997, Deep Blue was the IBM algorithm/computer that did what no one, at the time, thought possible: it beat the world's best chess player.  It turns out, though, that one of the most important moves in the matchup, where Deep Blue psyched out its opponent with a weird move, might not have been so inspired after all.  It might have been nothing more than a bug in the program, and it changed computer science history.

Relevant links:
https://www.wired.com/2012/09/deep-blue-computer-bug/</itunes:summary>
      <itunes:subtitle>In 1997, Deep Blue was the IBM algorithm/computer…</itunes:subtitle>
      <description>In 1997, Deep Blue was the IBM algorithm/computer that did what no one, at the time, thought possible: it beat the world's best chess player.  It turns out, though, that one of the most important moves in the matchup, where Deep Blue psyched out its opponent with a weird move, might not have been so inspired after all.  It might have been nothing more than a bug in the program, and it changed computer science history.

Relevant links:
https://www.wired.com/2012/09/deep-blue-computer-bug/</description>
      <enclosure length="28939526" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/291861013-linear-digressions-deep-blue.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/290742629</guid>
      <title>Organizing Google's Datasets</title>
      <pubDate>Mon, 31 Oct 2016 02:17:26 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/organizing-googles-datasets</link>
      <itunes:duration>00:15:00</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>If you're a data scientist, there's a good chance you're used to working with a lot of data.  But there's a lot of data, and then there's Google-scale amounts of data.  Keeping all that data organized is a Google-sized task, and as it happens, they've built a system for that organizational challenge.  This episode is all about that system, called Goods, and in particular we'll dig into some of the details of what makes this so tough.

Relevant links: http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45390.pdf</itunes:summary>
      <itunes:subtitle>If you're a data scientist, there's a good chance…</itunes:subtitle>
      <description>If you're a data scientist, there's a good chance you're used to working with a lot of data.  But there's a lot of data, and then there's Google-scale amounts of data.  Keeping all that data organized is a Google-sized task, and as it happens, they've built a system for that organizational challenge.  This episode is all about that system, called Goods, and in particular we'll dig into some of the details of what makes this so tough.

Relevant links: http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45390.pdf</description>
      <enclosure length="21598700" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/290742629-linear-digressions-organizing-googles-datasets.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/289664543</guid>
      <title>Fighting Cancer with Data Science: Followup</title>
      <pubDate>Mon, 24 Oct 2016 01:58:41 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/fighting-cancer-with-data-science-followup</link>
      <itunes:duration>00:25:48</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>A few months ago, Katie started on a project for the Vice President's Cancer Moonshot surrounding how data can be used to better fight cancer.  The project is all wrapped up now, so we wanted to tell you about how that work went and what changes to cancer data policy were suggested to the Vice President.

See lineardigressions.com for links to the reports discussed on this episode.</itunes:summary>
      <itunes:subtitle>A few months ago, Katie started on a project for …</itunes:subtitle>
      <description>A few months ago, Katie started on a project for the Vice President's Cancer Moonshot surrounding how data can be used to better fight cancer.  The project is all wrapped up now, so we wanted to tell you about how that work went and what changes to cancer data policy were suggested to the Vice President.

See lineardigressions.com for links to the reports discussed on this episode.</description>
      <enclosure length="37169979" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/289664543-linear-digressions-fighting-cancer-with-data-science-followup.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/288496878</guid>
      <title>The 19-year-old determining the US election</title>
      <pubDate>Mon, 17 Oct 2016 01:01:23 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/the-19-year-old-determining-the-us-election</link>
      <itunes:duration>00:12:28</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Sick of the presidential election yet?  We are too, but there's still almost a month to go, so let's just embrace it together.  This week, we'll talk about one of the presidential polls, which has been kind of an outlier for quite a while.  This week, the NY Times took a closer look at this poll, and was able to figure out the reason it's such an outlier.  It all goes back to a 19-year-old African American man, living in Illinois, who really likes Donald Trump...

Relevant Links:
http://www.nytimes.com/2016/10/13/upshot/how-one-19-year-old-illinois-man-is-distorting-national-polling-averages.html

followup article from LA Times, released after recording:
http://www.latimes.com/politics/la-na-pol-daybreak-poll-questions-20161013-snap-story.html</itunes:summary>
      <itunes:subtitle>Sick of the presidential election yet?  We are to…</itunes:subtitle>
      <description>Sick of the presidential election yet?  We are too, but there's still almost a month to go, so let's just embrace it together.  This week, we'll talk about one of the presidential polls, which has been kind of an outlier for quite a while.  This week, the NY Times took a closer look at this poll, and was able to figure out the reason it's such an outlier.  It all goes back to a 19-year-old African American man, living in Illinois, who really likes Donald Trump...

Relevant Links:
http://www.nytimes.com/2016/10/13/upshot/how-one-19-year-old-illinois-man-is-distorting-national-polling-averages.html

followup article from LA Times, released after recording:
http://www.latimes.com/politics/la-na-pol-daybreak-poll-questions-20161013-snap-story.html</description>
      <enclosure length="17963709" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/288496878-linear-digressions-the-19-year-old-determining-the-us-election.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/286906837</guid>
      <title>How to Steal a Model</title>
      <pubDate>Sun, 09 Oct 2016 22:57:24 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/how-to-steal-a-model</link>
      <itunes:duration>00:13:36</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>What does it mean to steal a model?  It means someone (the thief, presumably) can re-create the predictions of the model without having access to the algorithm itself, or the training data.  Sound far-fetched?  It isn't.  If that person can ask for predictions from the model, and he (or she) asks just the right questions, the model can be reverse-engineered right out from under you.  

Relevant links:
https://www.usenix.org/system/files/conference/usenixsecurity16/sec16_paper_tramer.pdf</itunes:summary>
      <itunes:subtitle>What does it mean to steal a model?  It means som…</itunes:subtitle>
      <description>What does it mean to steal a model?  It means someone (the thief, presumably) can re-create the predictions of the model without having access to the algorithm itself, or the training data.  Sound far-fetched?  It isn't.  If that person can ask for predictions from the model, and he (or she) asks just the right questions, the model can be reverse-engineered right out from under you.  

Relevant links:
https://www.usenix.org/system/files/conference/usenixsecurity16/sec16_paper_tramer.pdf</description>
      <enclosure length="19604408" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/286906837-linear-digressions-how-to-steal-a-model.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/285783592</guid>
      <title>Regularization</title>
      <pubDate>Mon, 03 Oct 2016 02:13:50 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/regularization</link>
      <itunes:duration>00:17:27</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Lots of data is usually seen as a good thing.  And it is a good thing--except when it's not.  In a lot of fields, a problem arises when you have many, many features, especially if there's a somewhat smaller number of cases to learn from; supervised machine learning algorithms break, or learn spurious or un-interpretable patterns.  What to do?  Regularization can be one of your best friends here--it's a method that penalizes overly complex models, which keeps the dimensionality of your model under control.</itunes:summary>
      <itunes:subtitle>Lots of data is usually seen as a good thing.  An…</itunes:subtitle>
      <description>Lots of data is usually seen as a good thing.  And it is a good thing--except when it's not.  In a lot of fields, a problem arises when you have many, many features, especially if there's a somewhat smaller number of cases to learn from; supervised machine learning algorithms break, or learn spurious or un-interpretable patterns.  What to do?  Regularization can be one of your best friends here--it's a method that penalizes overly complex models, which keeps the dimensionality of your model under control.</description>
      <enclosure length="25132754" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/285783592-linear-digressions-regularization.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/284650055</guid>
      <title>The Cold Start Problem</title>
      <pubDate>Mon, 26 Sep 2016 02:24:38 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/the-cold-start-problem</link>
      <itunes:duration>00:15:37</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>You might sometimes find that it's hard to get started doing something, but once you're going, it gets easier.  Turns out machine learning algorithms, and especially recommendation engines, feel the same way.  The more they "know" about a user, like what movies they watch and how they rate them, the better they do at suggesting new movies, which is great until you realize that you have to start somewhere.  The "cold start" problem will be our focus in this episode, both the heuristic solutions that help deal with it and a bit of realism about the importance of skepticism when someone claims a great solution to cold starts.

Relevant links:
http://repository.upenn.edu/cgi/viewcontent.cgi?article=1141&amp;context=cis_papers</itunes:summary>
      <itunes:subtitle>You might sometimes find that it's hard to get st…</itunes:subtitle>
      <description>You might sometimes find that it's hard to get started doing something, but once you're going, it gets easier.  Turns out machine learning algorithms, and especially recommendation engines, feel the same way.  The more they "know" about a user, like what movies they watch and how they rate them, the better they do at suggesting new movies, which is great until you realize that you have to start somewhere.  The "cold start" problem will be our focus in this episode, both the heuristic solutions that help deal with it and a bit of realism about the importance of skepticism when someone claims a great solution to cold starts.

Relevant links:
http://repository.upenn.edu/cgi/viewcontent.cgi?article=1141&amp;context=cis_papers</description>
      <enclosure length="22487699" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/284650055-linear-digressions-the-cold-start-problem.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/283568567</guid>
      <title>Open Source Software for Data Science</title>
      <pubDate>Mon, 19 Sep 2016 04:27:40 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/open-source-software-for-data-science</link>
      <itunes:duration>00:20:05</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>If you work in tech, software or data science, there's an excellent chance you use tools that are built upon open source software.  This is software that's built and distributed not for a profit, but because everyone benefits when we work together and share tools.  Tim Head of scikit-optimize chats with us further about what it's like to maintain an open source library, how to get involved in open source, and why people like him need people like you to make it all work.</itunes:summary>
      <itunes:subtitle>If you work in tech, software or data science, th…</itunes:subtitle>
      <description>If you work in tech, software or data science, there's an excellent chance you use tools that are built upon open source software.  This is software that's built and distributed not for a profit, but because everyone benefits when we work together and share tools.  Tim Head of scikit-optimize chats with us further about what it's like to maintain an open source library, how to get involved in open source, and why people like him need people like you to make it all work.</description>
      <enclosure length="28920718" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/283568567-linear-digressions-open-source-software-for-data-science.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/282494421</guid>
      <title>Scikit + Optimization = Scikit-Optimize</title>
      <pubDate>Mon, 12 Sep 2016 01:54:59 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/scikit-optimization-scikit-optimize</link>
      <itunes:duration>00:15:41</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>We're excited to welcome a guest, Tim Head, who is one of the maintainers of the scikit-optimize package.  With all the talk about optimization lately, it felt appropriate to get in a few words with someone who's out there making it happen for Python.  

Relevant links:
https://scikit-optimize.github.io/
http://www.wildtreetech.com/</itunes:summary>
      <itunes:subtitle>We're excited to welcome a guest, Tim Head, who i…</itunes:subtitle>
      <description>We're excited to welcome a guest, Tim Head, who is one of the maintainers of the scikit-optimize package.  With all the talk about optimization lately, it felt appropriate to get in a few words with someone who's out there making it happen for Python.  

Relevant links:
https://scikit-optimize.github.io/
http://www.wildtreetech.com/</description>
      <enclosure length="22598668" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/282494421-linear-digressions-scikit-optimization-scikit-optimize.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/281415661</guid>
      <title>Two Cultures: Machine Learning and Statistics</title>
      <pubDate>Mon, 05 Sep 2016 01:50:05 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/two-cultures-of-statistics</link>
      <itunes:duration>00:17:29</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>It's a funny thing to realize, but data science modeling is usually about either explainability, interpretation and understanding, or it's about predictive accuracy.  But usually not both--optimizing for one tends to compromise the other.   Leo Breiman was one of the titans of both kinds of modeling, a statistician who helped bring machine learning into statistics and vice versa.  In this episode, we unpack one of his seminal papers from 2001, when machine learning was just beginning to take root, and talk about how he made clear what machine learning could do for statistics and why it's so important.

Relevant links:
http://www.math.snu.ac.kr/~hichoi/machinelearning/(Breiman)%20Statistical%20Modeling--The%20Two%20Cultures.pdf</itunes:summary>
      <itunes:subtitle>It's a funny thing to realize, but data science m…</itunes:subtitle>
      <description>It's a funny thing to realize, but data science modeling is usually about either explainability, interpretation and understanding, or it's about predictive accuracy.  But usually not both--optimizing for one tends to compromise the other.   Leo Breiman was one of the titans of both kinds of modeling, a statistician who helped bring machine learning into statistics and vice versa.  In this episode, we unpack one of his seminal papers from 2001, when machine learning was just beginning to take root, and talk about how he made clear what machine learning could do for statistics and why it's so important.

Relevant links:
http://www.math.snu.ac.kr/~hichoi/machinelearning/(Breiman)%20Statistical%20Modeling--The%20Two%20Cultures.pdf</description>
      <enclosure length="25184163" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/281415661-linear-digressions-two-cultures-of-statistics.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/280367844</guid>
      <title>Optimization Solutions</title>
      <pubDate>Mon, 29 Aug 2016 02:01:42 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/optimization-solutions</link>
      <itunes:duration>00:20:07</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>You've got an optimization problem to solve, and a less-than-forever amount of time in which to solve it.  What to do?  Use a heuristic optimization algorithm, like a hill climber or simulated annealing--we cover both in this episode!

Relevant link:
http://www.lizsander.com/programming/2015/08/04/Heuristic-Search-Algorithms.html</itunes:summary>
      <itunes:subtitle>You've got an optimization problem to solve, and …</itunes:subtitle>
      <description>You've got an optimization problem to solve, and a less-than-forever amount of time in which to solve it.  What to do?  Use a heuristic optimization algorithm, like a hill climber or simulated annealing--we cover both in this episode!

Relevant link:
http://www.lizsander.com/programming/2015/08/04/Heuristic-Search-Algorithms.html</description>
      <enclosure length="28967112" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/280367844-linear-digressions-optimization-solutions.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/279300811</guid>
      <title>Optimization Problems</title>
      <pubDate>Mon, 22 Aug 2016 00:25:56 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/optimization-problems</link>
      <itunes:duration>00:17:50</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>If modeling is about predicting the unknown, optimization tries to answer the question of what to do, what decision to make, to get the best results out of a given situation.  Sometimes that's straightforward, but sometimes... not so much.  What makes an optimization problem easy or hard, and what are some of the methods for finding optimal solutions to problems?  Glad you asked!  May we recommend our latest podcast episode to you?</itunes:summary>
      <itunes:subtitle>If modeling is about predicting the unknown, opti…</itunes:subtitle>
      <description>If modeling is about predicting the unknown, optimization tries to answer the question of what to do, what decision to make, to get the best results out of a given situation.  Sometimes that's straightforward, but sometimes... not so much.  What makes an optimization problem easy or hard, and what are some of the methods for finding optimal solutions to problems?  Glad you asked!  May we recommend our latest podcast episode to you?</description>
      <enclosure length="25695118" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/279300811-linear-digressions-optimization-problems.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/278290747</guid>
      <title>Multi-level modeling for understanding DEADLY RADIOACTIVE GAS</title>
      <pubDate>Mon, 15 Aug 2016 01:49:42 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/multi-level-modeling-for-understanding-deadly-radioactive-gas</link>
      <itunes:duration>00:23:34</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Ok, this episode is only sort of about DEADLY RADIOACTIVE GAS.  It's mostly about multilevel modeling, which is a way of building models with data that has distinct, related subgroups within it.  What are multilevel models used for?  Elections (we can't get enough of 'em these days), understanding the effect that a good teacher can have on their students, and DEADLY RADIOACTIVE GAS.

Relevant links:
http://www.stat.columbia.edu/~gelman/research/published/multi2.pdf</itunes:summary>
      <itunes:subtitle>Ok, this episode is only sort of about DEADLY RAD…</itunes:subtitle>
      <description>Ok, this episode is only sort of about DEADLY RADIOACTIVE GAS.  It's mostly about multilevel modeling, which is a way of building models with data that has distinct, related subgroups within it.  What are multilevel models used for?  Elections (we can't get enough of 'em these days), understanding the effect that a good teacher can have on their students, and DEADLY RADIOACTIVE GAS.

Relevant links:
http://www.stat.columbia.edu/~gelman/research/published/multi2.pdf</description>
      <enclosure length="33945006" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/278290747-linear-digressions-multi-level-modeling-for-understanding-deadly-radioactive-gas.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/277286802</guid>
      <title>How Polls Got Brexit "Wrong"</title>
      <pubDate>Mon, 08 Aug 2016 01:37:18 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/how-polls-got-brexit-wrong</link>
      <itunes:duration>00:15:14</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Continuing the discussion of how polls do (and sometimes don't) tell us what to expect in upcoming elections--let's take a concrete example from the recent past, shall we?  The Brexit referendum was, by and large, expected to shake out for "remain", but when the votes were counted, "leave" came out ahead.  Everyone was shocked (SHOCKED!) but maybe the polls weren't as wrong as the pundits like to claim.

Relevant links:
http://www.slate.com/articles/news_and_politics/moneybox/2016/07/why_political_betting_markets_are_failing.html
http://andrewgelman.com/2016/06/24/brexit-polling-what-went-wrong/</itunes:summary>
      <itunes:subtitle>Continuing the discussion of how polls do (and so…</itunes:subtitle>
      <description>Continuing the discussion of how polls do (and sometimes don't) tell us what to expect in upcoming elections--let's take a concrete example from the recent past, shall we?  The Brexit referendum was, by and large, expected to shake out for "remain", but when the votes were counted, "leave" came out ahead.  Everyone was shocked (SHOCKED!) but maybe the polls weren't as wrong as the pundits like to claim.

Relevant links:
http://www.slate.com/articles/news_and_politics/moneybox/2016/07/why_political_betting_markets_are_failing.html
http://andrewgelman.com/2016/06/24/brexit-polling-what-went-wrong/</description>
      <enclosure length="21936620" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/277286802-linear-digressions-how-polls-got-brexit-wrong.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/276264888</guid>
      <title>Election Forecasting</title>
      <pubDate>Mon, 01 Aug 2016 02:40:35 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/election-forecasting</link>
      <itunes:duration>00:28:59</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Not sure if you heard, but there's an election going on right now.  Polls, surveys, and projections abound, as far as the eye can see.  How to make sense of it all?  How are the projections made?  Which are some good ones to follow?  We'll be your trusty guides through a crash course in election forecasting.

Relevant links:
http://www.wired.com/2016/06/civis-election-polling-clinton-sanders-trump/
http://election.princeton.edu/
http://projects.fivethirtyeight.com/2016-election-forecast/
http://www.nytimes.com/interactive/2016/upshot/presidential-polls-forecast.html?rref=collection%2Fsectioncollection%2Fupshot&amp;action=click&amp;contentCollection=upshot&amp;region=rank&amp;module=package&amp;version=highlights&amp;contentPlacement=5&amp;pgtype=sectionfront</itunes:summary>
      <itunes:subtitle>Not sure if you heard, but there's an election go…</itunes:subtitle>
      <description>Not sure if you heard, but there's an election going on right now.  Polls, surveys, and projections abound, as far as the eye can see.  How to make sense of it all?  How are the projections made?  Which are some good ones to follow?  We'll be your trusty guides through a crash course in election forecasting.

Relevant links:
http://www.wired.com/2016/06/civis-election-polling-clinton-sanders-trump/
http://election.princeton.edu/
http://projects.fivethirtyeight.com/2016-election-forecast/
http://www.nytimes.com/interactive/2016/upshot/presidential-polls-forecast.html?rref=collection%2Fsectioncollection%2Fupshot&amp;action=click&amp;contentCollection=upshot&amp;region=rank&amp;module=package&amp;version=highlights&amp;contentPlacement=5&amp;pgtype=sectionfront</description>
      <enclosure length="41739735" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/276264888-linear-digressions-election-forecasting.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/275215540</guid>
      <title>Machine Learning for Genomics</title>
      <pubDate>Mon, 25 Jul 2016 02:14:47 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/machine-learning-for-genomics</link>
      <itunes:duration>00:20:22</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Genomics data is some of the biggest #bigdata, and doing machine learning on it is unlocking new ways of thinking about evolution, genomic diseases like cancer, and what really makes each of us different from everyone else.  This episode touches on some of the things that make machine learning on genomics data so challenging, and the algorithms designed to do it anyway.</itunes:summary>
      <itunes:subtitle>Genomics data is some of the biggest #bigdata, an…</itunes:subtitle>
      <description>Genomics data is some of the biggest #bigdata, and doing machine learning on it is unlocking new ways of thinking about evolution, genomic diseases like cancer, and what really makes each of us different from everyone else.  This episode touches on some of the things that make machine learning on genomics data so challenging, and the algorithms designed to do it anyway.</description>
      <enclosure length="29324467" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/275215540-linear-digressions-machine-learning-for-genomics.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/274163734</guid>
      <title>Climate Modeling</title>
      <pubDate>Mon, 18 Jul 2016 02:26:02 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/climate-modeling</link>
      <itunes:duration>00:19:49</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Hot enough for you?  Climate models suggest that it's only going to get warmer in the coming years.  This episode unpacks those models, so you understand how they work.  

A lot of the episodes we do are about fun studies we hear about, like "if you're interested, this is kinda cool"--this episode is much more important than that.  Understanding these models, and taking action on them where appropriate, will have huge implications in the years to come.

Relevant links:
https://climatesight.org/</itunes:summary>
      <itunes:subtitle>Hot enough for you?  Climate models suggest that …</itunes:subtitle>
      <description>Hot enough for you?  Climate models suggest that it's only going to get warmer in the coming years.  This episode unpacks those models, so you understand how they work.  

A lot of the episodes we do are about fun studies we hear about, like "if you're interested, this is kinda cool"--this episode is much more important than that.  Understanding these models, and taking action on them where appropriate, will have huge implications in the years to come.

Relevant links:
https://climatesight.org/</description>
      <enclosure length="28551451" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/274163734-linear-digressions-climate-modeling.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/273094889</guid>
      <title>Reinforcement Learning Gone Wrong</title>
      <pubDate>Mon, 11 Jul 2016 02:42:49 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/reinforcement-learning-gone-wrong</link>
      <itunes:duration>00:28:16</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Last week’s episode on artificial intelligence gets a huge payoff this week—we’ll explore a wonderful couple of papers about all the ways that artificial intelligence can go wrong.  Malevolent actors?  You bet.  Collateral damage?  Of course.  Reward hacking?  Naturally!  It’s fun to think about, and the discussion starting now will have reverberations for decades to come.

https://www.technologyreview.com/s/601519/how-to-create-a-malevolent-artificial-intelligence/
http://arxiv.org/abs/1605.02817
https://arxiv.org/abs/1606.06565</itunes:summary>
      <itunes:subtitle>Last week’s episode on artificial intelligence ge…</itunes:subtitle>
      <description>Last week’s episode on artificial intelligence gets a huge payoff this week—we’ll explore a wonderful couple of papers about all the ways that artificial intelligence can go wrong.  Malevolent actors?  You bet.  Collateral damage?  Of course.  Reward hacking?  Naturally!  It’s fun to think about, and the discussion starting now will have reverberations for decades to come.

https://www.technologyreview.com/s/601519/how-to-create-a-malevolent-artificial-intelligence/
http://arxiv.org/abs/1605.02817
https://arxiv.org/abs/1606.06565</description>
      <enclosure length="54272868" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/273094889-linear-digressions-reinforcement-learning-gone-wrong.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/272013221</guid>
      <title>Reinforcement Learning for Artificial Intelligence</title>
      <pubDate>Sun, 03 Jul 2016 18:28:57 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/reinforcement-learning-for-artificial-intelligence</link>
      <itunes:duration>00:18:30</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>There’s a ton of excitement about reinforcement learning, a form of semi-supervised machine learning that underpins a lot of today’s cutting-edge artificial intelligence algorithms.  Here’s a crash course in the algorithmic machinery behind AlphaGo, and self-driving cars, and major logistical optimization projects—and the robots that, tomorrow, will clean our houses and (hopefully) not take over the world…</itunes:summary>
      <itunes:subtitle>There’s a ton of excitement about reinforcement l…</itunes:subtitle>
      <description>There’s a ton of excitement about reinforcement learning, a form of semi-supervised machine learning that underpins a lot of today’s cutting-edge artificial intelligence algorithms.  Here’s a crash course in the algorithmic machinery behind AlphaGo, and self-driving cars, and major logistical optimization projects—and the robots that, tomorrow, will clean our houses and (hopefully) not take over the world…</description>
      <enclosure length="35131406" type="audio/x-m4a" url="https://feeds.soundcloud.com/stream/272013221-linear-digressions-reinforcement-learning-for-artificial-intelligence.m4a"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/271005019</guid>
      <title>Differential Privacy: how to study people without being weird and gross</title>
      <pubDate>Mon, 27 Jun 2016 01:53:13 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/differential-privacy-how-to-study-people-without-being-weird-and-gross</link>
      <itunes:duration>00:18:17</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Apple wants to study iPhone users' activities and use that data to improve performance.  Google collects data on what people are doing online to try to improve their Chrome browser.  Do you like the idea of this data being collected?  Maybe not, if it's being collected on you--but you probably also realize that there is some benefit to be had from the improved iPhones and web browsers.  Differential privacy is a set of policies that walks the line between individual privacy and better data, including even some old-school tricks that scientists use to get people to answer embarrassing questions honestly.  

Relevant links: http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/42852.pdf</itunes:summary>
      <itunes:subtitle>Apple wants to study iPhone users' activities and…</itunes:subtitle>
      <description>Apple wants to study iPhone users' activities and use that data to improve performance.  Google collects data on what people are doing online to try to improve their Chrome browser.  Do you like the idea of this data being collected?  Maybe not, if it's being collected on you--but you probably also realize that there is some benefit to be had from the improved iPhones and web browsers.  Differential privacy is a set of policies that walks the line between individual privacy and better data, including even some old-school tricks that scientists use to get people to answer embarrassing questions honestly.  

Relevant links: http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/42852.pdf</description>
      <enclosure length="26344627" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/271005019-linear-digressions-differential-privacy-how-to-study-people-without-being-weird-and-gross.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/269958489</guid>
      <title>How the sausage gets made</title>
      <pubDate>Mon, 20 Jun 2016 02:25:23 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/how-the-sausage-gets-made</link>
      <itunes:duration>00:29:13</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Something a little different in this episode--we'll be talking about the technical plumbing that gets our podcast from our brains to your ears.  As it turns out, it's a multi-step bucket brigade process of RSS feeds, links to downloads, and lots of hand-waving when it comes to trying to figure out how many of you (listeners) are out there.  </itunes:summary>
      <itunes:subtitle>Something a little different in this episode--we'…</itunes:subtitle>
      <description>Something a little different in this episode--we'll be talking about the technical plumbing that gets our podcast from our brains to your ears.  As it turns out, it's a multi-step bucket brigade process of RSS feeds, links to downloads, and lots of hand-waving when it comes to trying to figure out how many of you (listeners) are out there.  </description>
      <enclosure length="42087686" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/269958489-linear-digressions-how-the-sausage-gets-made.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/268819785</guid>
      <title>SMOTE: makin' yourself some fake minority data</title>
      <pubDate>Mon, 13 Jun 2016 03:06:33 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/smote-makin-yourself-some-fake-minority-data</link>
      <itunes:duration>00:14:37</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Machine learning on imbalanced classes: surprisingly tricky.  Many (most?) algorithms tend to just assign the majority class label to all the data and call it a day.  SMOTE is an algorithm for manufacturing new minority class examples for yourself, to help your algorithm better identify them in the wild.

Relevant links:
https://www.jair.org/media/953/live-953-2037-jair.pdf</itunes:summary>
      <itunes:subtitle>Machine learning on imbalanced classes: surprisin…</itunes:subtitle>
      <description>Machine learning on imbalanced classes: surprisingly tricky.  Many (most?) algorithms tend to just assign the majority class label to all the data and call it a day.  SMOTE is an algorithm for manufacturing new minority class examples for yourself, to help your algorithm better identify them in the wild.

Relevant links:
https://www.jair.org/media/953/live-953-2037-jair.pdf</description>
      <enclosure length="21058906" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/268819785-linear-digressions-smote-makin-yourself-some-fake-minority-data.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/267742358</guid>
      <title>Conjoint Analysis: like AB testing, but on steroids</title>
      <pubDate>Mon, 06 Jun 2016 02:13:37 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/conjoint-analysis-like-ab-testing-but-on-steroids</link>
      <itunes:duration>00:18:27</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Conjoint analysis is like AB testing, but bigger and better: instead of testing one or two things, you can test potentially dozens of options.  Where might you use something like this?  Well, if you wanted to design an entire hotel chain completely from scratch, and to do it in a data-driven way.  You'll never look at Courtyard by Marriott the same way again.

Relevant link: https://marketing.wharton.upenn.edu/files/?whdmsaction=public:main.file&amp;fileID=466 </itunes:summary>
      <itunes:subtitle>Conjoint analysis is like AB tester, but more big…</itunes:subtitle>
      <description>Conjoint analysis is like AB testing, but bigger and better: instead of testing one or two things, you can test potentially dozens of options.  Where might you use something like this?  Well, if you wanted to design an entire hotel chain completely from scratch, and to do it in a data-driven way.  You'll never look at Courtyard by Marriott the same way again.

Relevant link: https://marketing.wharton.upenn.edu/files/?whdmsaction=public:main.file&amp;fileID=466 </description>
      <enclosure length="26567817" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/267742358-linear-digressions-conjoint-analysis-like-ab-testing-but-on-steroids.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/266545981</guid>
      <title>Traffic Metering Algorithms</title>
      <pubDate>Mon, 30 May 2016 01:57:10 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/traffic-metering-algorithms</link>
      <itunes:duration>00:17:30</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>This episode is for all you (us) traffic nerds--we're talking about the hidden structure underlying traffic on-ramp metering systems.  These systems slow down the flow of traffic onto highways so that the highways don't get overloaded with cars and clog up.  If you're someone who listens to podcasts while commuting, and especially if your area has on-ramp metering, you'll never look at highway access control the same way again (yeah, we know this is super nerdy; it's also super awesome).

Relevant links:
http://its.berkeley.edu/sites/default/files/publications/UCB/99/PWP/UCB-ITS-PWP-99-19.pdf
http://www.its.uci.edu/~lchu/ramp/Final_report_mou3013.pdf</itunes:summary>
      <itunes:subtitle>This episode is for all you (us) traffic nerds--w…</itunes:subtitle>
      <description>This episode is for all you (us) traffic nerds--we're talking about the hidden structure underlying traffic on-ramp metering systems.  These systems slow down the flow of traffic onto highways so that the highways don't get overloaded with cars and clog up.  If you're someone who listens to podcasts while commuting, and especially if your area has on-ramp metering, you'll never look at highway access control the same way again (yeah, we know this is super nerdy; it's also super awesome).

Relevant links:
http://its.berkeley.edu/sites/default/files/publications/UCB/99/PWP/UCB-ITS-PWP-99-19.pdf
http://www.its.uci.edu/~lchu/ramp/Final_report_mou3013.pdf</description>
      <enclosure length="25214256" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/266545981-linear-digressions-traffic-metering-algorithms.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/265435083</guid>
      <title>Um Detector 2: The Dynamic Time Warp</title>
      <pubDate>Mon, 23 May 2016 02:05:01 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/um-detector-2-the-dynamic-time-warp</link>
      <itunes:duration>00:14:00</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>One tricky thing about working with time series data, like the audio data in our "um" detector (remember that?  because we barely do...), is that sometimes events look really similar but one is a little bit stretched and squeezed relative to the other.  Besides having an amazing name, the dynamic time warp is a handy algorithm for aligning two time series sequences that are close in shape, but don't quite line up out of the box.

Relevant link:
http://www.aaai.org/Papers/Workshops/1994/WS-94-03/WS94-03-031.pdf</itunes:summary>
      <itunes:subtitle>One tricky thing about working with time series d…</itunes:subtitle>
      <description>One tricky thing about working with time series data, like the audio data in our "um" detector (remember that?  because we barely do...), is that sometimes events look really similar but one is a little bit stretched and squeezed relative to the other.  Besides having an amazing name, the dynamic time warp is a handy algorithm for aligning two time series sequences that are close in shape, but don't quite line up out of the box.

Relevant link:
http://www.aaai.org/Papers/Workshops/1994/WS-94-03/WS94-03-031.pdf</description>
      <enclosure length="20161757" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/265435083-linear-digressions-um-detector-2-the-dynamic-time-warp.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/264324749</guid>
      <title>Inside a Data Analysis: Fraud Hunting at Enron</title>
      <pubDate>Mon, 16 May 2016 02:36:10 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/inside-a-data-analysis-fraud-hunting-at-enron</link>
      <itunes:duration>00:30:28</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>It's storytime this week--the story, from beginning to end, of how Katie designed and built the main project for Udacity's Intro to Machine Learning class, when she was developing the course.  The project was to use email and financial data to hunt for signatures of fraud at Enron, one of the biggest cases of corporate fraud in history; that description makes the project sound pretty clean, but getting the data into the right shape, and even doing some dataset merging (that hadn't ever been done before), made this project much more interesting to design than it might appear.  Here's the story of what a data analysis like this looks like...from the inside.</itunes:summary>
      <itunes:subtitle>It's storytime this week--the story, from beginni…</itunes:subtitle>
      <description>It's storytime this week--the story, from beginning to end, of how Katie designed and built the main project for Udacity's Intro to Machine Learning class, when she was developing the course.  The project was to use email and financial data to hunt for signatures of fraud at Enron, one of the biggest cases of corporate fraud in history; that description makes the project sound pretty clean, but getting the data into the right shape, and even doing some dataset merging (that hadn't ever been done before), made this project much more interesting to design than it might appear.  Here's the story of what a data analysis like this looks like...from the inside.</description>
      <enclosure length="43884493" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/264324749-linear-digressions-inside-a-data-analysis-fraud-hunting-at-enron.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/263148994</guid>
      <title>What's the biggest #bigdata?</title>
      <pubDate>Mon, 09 May 2016 01:28:21 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/whats-the-biggest-bigdata</link>
      <itunes:duration>00:25:31</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Data science is often mentioned in the same breath as big data.  But how big is big data?  And who has the biggest big data?  CERN?  YouTube?  

... Something (or someone) else?

Relevant link: http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002195

</itunes:summary>
      <itunes:subtitle>Data science and is often mentioned in the same b…</itunes:subtitle>
      <description>Data science is often mentioned in the same breath as big data.  But how big is big data?  And who has the biggest big data?  CERN?  YouTube?  

... Something (or someone) else?

Relevant link: http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002195

</description>
      <enclosure length="36751811" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/263148994-linear-digressions-whats-the-biggest-bigdata.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/261895067</guid>
      <title>Data Contamination</title>
      <pubDate>Mon, 02 May 2016 02:24:06 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/data-contamination</link>
      <itunes:duration>00:20:58</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Supervised machine learning assumes that the features and labels used for building a classifier are isolated from each other--basically, that you can't cheat by peeking.  Turns out this can be easier said than done.  In this episode, we'll talk about the many (and diverse!) cases where label information contaminates features, ruining data science competitions along the way.

Relevant links:
https://www.researchgate.net/profile/Claudia_Perlich/publication/221653692_Leakage_in_data_mining_Formulation_detection_and_avoidance/links/54418bb80cf2a6a049a5a0ca.pdf</itunes:summary>
      <itunes:subtitle>Supervised machine learning assumes that the feat…</itunes:subtitle>
      <description>Supervised machine learning assumes that the features and labels used for building a classifier are isolated from each other--basically, that you can't cheat by peeking.  Turns out this can be easier said than done.  In this episode, we'll talk about the many (and diverse!) cases where label information contaminates features, ruining data science competitions along the way.

Relevant links:
https://www.researchgate.net/profile/Claudia_Perlich/publication/221653692_Leakage_in_data_mining_Formulation_detection_and_avoidance/links/54418bb80cf2a6a049a5a0ca.pdf</description>
      <enclosure length="30191523" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/261895067-linear-digressions-data-contamination.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/260723988</guid>
      <title>Model Interpretation (and Trust Issues)</title>
      <pubDate>Mon, 25 Apr 2016 00:45:04 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/model-interpretation-and-trust-issues</link>
      <itunes:duration>00:16:57</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Machine learning algorithms can be black boxes--inputs go in, outputs come out, and what happens in the middle is anybody's guess.  But understanding how a model arrives at an answer is critical for interpreting the model, and for knowing if it's doing something reasonable (one could even say... trustworthy).  We'll talk about a new algorithm called LIME that seeks to make any model more understandable and interpretable.

Relevant Links:
http://arxiv.org/abs/1602.04938
https://github.com/marcotcr/lime/tree/master/lime</itunes:summary>
      <itunes:subtitle>Machine learning algorithms can be black boxes--i…</itunes:subtitle>
      <description>Machine learning algorithms can be black boxes--inputs go in, outputs come out, and what happens in the middle is anybody's guess.  But understanding how a model arrives at an answer is critical for interpreting the model, and for knowing if it's doing something reasonable (one could even say... trustworthy).  We'll talk about a new algorithm called LIME that seeks to make any model more understandable and interpretable.

Relevant Links:
http://arxiv.org/abs/1602.04938
https://github.com/marcotcr/lime/tree/master/lime</description>
      <enclosure length="24420552" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/260723988-linear-digressions-model-interpretation-and-trust-issues.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/259611617</guid>
      <title>Updates! Political Science Fraud and AlphaGo</title>
      <pubDate>Mon, 18 Apr 2016 02:48:04 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/updates-political-science-fraud-and-alphago</link>
      <itunes:duration>00:31:43</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>We've got updates for you about topics from past shows!  First, the political science scandal of the year 2015 has a new chapter; we'll remind you about the original story and then dive into what has happened since.  Then, we've got an update on AlphaGo, and his/her/its much-anticipated match against the human champion of the game Go.

Relevant Links:
https://soundcloud.com/linear-digressions/electoral-insights-part-2
https://soundcloud.com/linear-digressions/go-1

http://www.sciencemag.org/news/2016/04/talking-people-about-gay-and-transgender-issues-can-change-their-prejudices
http://science.sciencemag.org/content/sci/352/6282/220.full.pdf

http://qz.com/639952/googles-ai-won-the-game-go-by-defying-millennia-of-basic-human-instinct/
http://www.wired.com/2016/03/two-moves-alphago-lee-sedol-redefined-future/
http://www.wired.com/2016/03/sadness-beauty-watching-googles-ai-play-go/</itunes:summary>
      <itunes:subtitle>We've got updates for you about topics from past …</itunes:subtitle>
      <description>We've got updates for you about topics from past shows!  First, the political science scandal of the year 2015 has a new chapter; we'll remind you about the original story and then dive into what has happened since.  Then, we've got an update on AlphaGo, and his/her/its much-anticipated match against the human champion of the game Go.

Relevant Links:
https://soundcloud.com/linear-digressions/electoral-insights-part-2
https://soundcloud.com/linear-digressions/go-1

http://www.sciencemag.org/news/2016/04/talking-people-about-gay-and-transgender-issues-can-change-their-prejudices
http://science.sciencemag.org/content/sci/352/6282/220.full.pdf

http://qz.com/639952/googles-ai-won-the-game-go-by-defying-millennia-of-basic-human-instinct/
http://www.wired.com/2016/03/two-moves-alphago-lee-sedol-redefined-future/
http://www.wired.com/2016/03/sadness-beauty-watching-googles-ai-play-go/</description>
      <enclosure length="45689450" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/259611617-linear-digressions-updates-political-science-fraud-and-alphago.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/258388115</guid>
      <title>Ecological Inference and Simpson's Paradox</title>
      <pubDate>Mon, 11 Apr 2016 02:43:12 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/ecological-inference-and-simpsons-paradox</link>
      <itunes:duration>00:18:32</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Simpson's paradox is the data science equivalent of looking through one eye and seeing a very clear trend, and then looking through the other eye and seeing the very clear opposite trend.  In one case, you see a trend one way in a group, but then breaking the group into subgroups gives the exact opposite trend.  Confused?  Scratching your head?  Welcome to the tricky world of ecological inference.

Relevant links:
https://gking.harvard.edu/files/gking/files/part1.pdf
http://blog.revolutionanalytics.com/2013/07/a-great-example-of-simpsons-paradox.html</itunes:summary>
      <itunes:subtitle>Simpson's paradox is the data science equivalent …</itunes:subtitle>
      <description>Simpson's paradox is the data science equivalent of looking through one eye and seeing a very clear trend, and then looking through the other eye and seeing the very clear opposite trend.  In one case, you see a trend one way in a group, but then breaking the group into subgroups gives the exact opposite trend.  Confused?  Scratching your head?  Welcome to the tricky world of ecological inference.

Relevant links:
https://gking.harvard.edu/files/gking/files/part1.pdf
http://blog.revolutionanalytics.com/2013/07/a-great-example-of-simpsons-paradox.html</description>
      <enclosure length="26694459" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/258388115-linear-digressions-ecological-inference-and-simpsons-paradox.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/257188961</guid>
      <title>Discriminatory Algorithms</title>
      <pubDate>Mon, 04 Apr 2016 02:30:09 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/discriminatory-algorithms</link>
      <itunes:duration>00:15:21</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Sometimes when we say an algorithm discriminates, we mean it can tell the difference between two types of items.  But in this episode, we'll talk about another, more troublesome side to discrimination: algorithms can be... racist?  Sexist?  Ageist?  Yes to all of the above.  It's an important thing to be aware of, especially when doing people-centered data science.  We'll discuss how and why this happens, and what solutions are out there (or not).

Relevant Links:
http://www.nytimes.com/2015/07/10/upshot/when-algorithms-discriminate.html
http://techcrunch.com/2015/08/02/machine-learning-and-human-bias-an-uneasy-pair/
http://www.sciencefriday.com/segments/why-machines-discriminate-and-how-to-fix-them/
https://medium.com/@geomblog/when-an-algorithm-isn-t-2b9fe01b9bb5#.auxqi5srz</itunes:summary>
      <itunes:subtitle>Sometimes when we say an algorithm discriminates,…</itunes:subtitle>
      <description>Sometimes when we say an algorithm discriminates, we mean it can tell the difference between two types of items.  But in this episode, we'll talk about another, more troublesome side to discrimination: algorithms can be... racist?  Sexist?  Ageist?  Yes to all of the above.  It's an important thing to be aware of, especially when doing people-centered data science.  We'll discuss how and why this happens, and what solutions are out there (or not).

Relevant Links:
http://www.nytimes.com/2015/07/10/upshot/when-algorithms-discriminate.html
http://techcrunch.com/2015/08/02/machine-learning-and-human-bias-an-uneasy-pair/
http://www.sciencefriday.com/segments/why-machines-discriminate-and-how-to-fix-them/
https://medium.com/@geomblog/when-an-algorithm-isn-t-2b9fe01b9bb5#.auxqi5srz</description>
      <enclosure length="22122194" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/257188961-linear-digressions-discriminatory-algorithms.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/255507844</guid>
      <title>Recommendation Engines and Privacy</title>
      <pubDate>Mon, 28 Mar 2016 02:46:45 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/recommendation-engines-and-privacy</link>
      <itunes:duration>00:31:33</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>This episode started out as a discussion of recommendation engines, like Netflix uses to suggest movies.  There's still a lot of that in here.  But a related topic, which is both interesting and important, is how to keep data private in the era of large-scale recommendation engines--what mistakes have been made surrounding supposedly anonymized data, how data ends up de-anonymized, and why it matters for you.

Relevant links:
http://www.netflixprize.com/
http://bits.blogs.nytimes.com/2010/03/12/netflix-cancels-contest-plans-and-settles-suit/?_r=0
http://arxiv.org/PS_cache/cs/pdf/0610/0610105v2.pdf</itunes:summary>
      <itunes:subtitle>This episode started out as a discussion of recom…</itunes:subtitle>
      <description>This episode started out as a discussion of recommendation engines, like Netflix uses to suggest movies.  There's still a lot of that in here.  But a related topic, which is both interesting and important, is how to keep data private in the era of large-scale recommendation engines--what mistakes have been made surrounding supposedly anonymized data, how data ends up de-anonymized, and why it matters for you.

Relevant links:
http://www.netflixprize.com/
http://bits.blogs.nytimes.com/2010/03/12/netflix-cancels-contest-plans-and-settles-suit/?_r=0
http://arxiv.org/PS_cache/cs/pdf/0610/0610105v2.pdf</description>
      <enclosure length="45429270" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/255507844-linear-digressions-recommendation-engines-and-privacy.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/253723817</guid>
      <title>Neural nets play cops and robbers (AKA generative adversarial networks)</title>
      <pubDate>Mon, 21 Mar 2016 02:58:49 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/neural-nets-play-cops-and-robbers-aka-generative-adverserial-networks</link>
      <itunes:duration>00:18:56</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>One neural net is creating counterfeit bills and passing them off to a second neural net, which is trying to distinguish the real money from the fakes.  Result: two neural nets that are better than either one would have been without the competition.

Relevant links:
http://arxiv.org/pdf/1406.2661v1.pdf
http://arxiv.org/pdf/1412.6572v3.pdf
http://soumith.ch/eyescream/</itunes:summary>
      <itunes:subtitle>One neural net is creating counterfeit bills and …</itunes:subtitle>
      <description>One neural net is creating counterfeit bills and passing them off to a second neural net, which is trying to distinguish the real money from the fakes.  Result: two neural nets that are better than either one would have been without the competition.

Relevant links:
http://arxiv.org/pdf/1406.2661v1.pdf
http://arxiv.org/pdf/1412.6572v3.pdf
http://soumith.ch/eyescream/</description>
      <enclosure length="27263719" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/253723817-linear-digressions-neural-nets-play-cops-and-robbers-aka-generative-adverserial-networks.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/251757035</guid>
      <title>A Data Scientist's View of the Fight against Cancer</title>
      <pubDate>Mon, 14 Mar 2016 03:26:27 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/a-data-scientists-view-of-the-fight-against-cancer</link>
      <itunes:duration>00:19:08</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>In this episode, we're taking many episodes' worth of insights and unpacking an extremely complex and important question--in what ways are we winning the fight against cancer, where might that fight go in the coming decade, and how do we know when we're making progress?  No matter how tricky you might think this problem is to solve, the fact is, once you get in there trying to solve it, it's even trickier than you thought.</itunes:summary>
      <itunes:subtitle>In this episode, we're taking many episodes' wort…</itunes:subtitle>
      <description>In this episode, we're taking many episodes' worth of insights and unpacking an extremely complex and important question--in what ways are we winning the fight against cancer, where might that fight go in the coming decade, and how do we know when we're making progress?  No matter how tricky you might think this problem is to solve, the fact is, once you get in there trying to solve it, it's even trickier than you thought.</description>
      <enclosure length="27561515" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/251757035-linear-digressions-a-data-scientists-view-of-the-fight-against-cancer.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/251292556</guid>
      <title>Congress Bots and DeepDrumpf</title>
      <pubDate>Fri, 11 Mar 2016 04:17:29 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/congress-bots-and-deepdrumpf</link>
      <itunes:duration>00:20:47</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Hey, sick of the election yet?  Fear not, there are algorithms that can automagically generate political-ish speech so that we never need to be without an endless supply of Congressional speeches and Donald Trump twitticisms!  

Relevant links:
http://arxiv.org/pdf/1601.03313v2.pdf
http://qz.com/631497/mit-built-a-donald-trump-ai-twitter-bot-that-sounds-scarily-like-him/
https://twitter.com/deepdrumpf</itunes:summary>
      <itunes:subtitle>Hey, sick of the election yet?  Fear not, there a…</itunes:subtitle>
      <description>Hey, sick of the election yet?  Fear not, there are algorithms that can automagically generate political-ish speech so that we never need to be without an endless supply of Congressional speeches and Donald Trump twitticisms!  

Relevant links:
http://arxiv.org/pdf/1601.03313v2.pdf
http://qz.com/631497/mit-built-a-donald-trump-ai-twitter-bot-that-sounds-scarily-like-him/
https://twitter.com/deepdrumpf</description>
      <enclosure length="29945763" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/251292556-linear-digressions-congress-bots-and-deepdrumpf.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/250586847</guid>
      <title>Multi-Armed Bandits</title>
      <pubDate>Mon, 07 Mar 2016 02:44:17 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/multi-armed-bandits</link>
      <itunes:duration>00:11:29</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Multi-armed bandits: how to take your randomized experiment and make it harder better faster stronger.  Basically, a multi-armed bandit experiment allows you to optimize for both learning and making use of your knowledge at the same time.  It's what the pros (like Google Analytics) use, and it's got a great name, so... winner!

Relevant link: https://support.google.com/analytics/answer/2844870?hl=en</itunes:summary>
      <itunes:subtitle>Multi-armed bandits: how to take your randomized …</itunes:subtitle>
      <description>Multi-armed bandits: how to take your randomized experiment and make it harder better faster stronger.  Basically, a multi-armed bandit experiment allows you to optimize for both learning and making use of your knowledge at the same time.  It's what the pros (like Google Analytics) use, and it's got a great name, so... winner!

Relevant link: https://support.google.com/analytics/answer/2844870?hl=en</description>
      <enclosure length="16549962" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/250586847-linear-digressions-multi-armed-bandits.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/250113250</guid>
      <title>Experiments and Messy, Tricky Causality</title>
      <pubDate>Fri, 04 Mar 2016 03:54:04 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/experiments-and-messy-tricky-causality</link>
      <itunes:duration>00:16:59</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>"People with a family history of heart disease are more likely to eat healthy foods, and have a high incidence of heart attacks."  Did the healthy food cause the heart attacks?  Probably not.  But establishing causal links is extremely tricky, and extremely important to get right if you're trying to help students, test new medicines, or just optimize a website.  In this episode, we'll unpack randomized experiments, like AB tests, and maybe you'll be smarter as a result.  Will you be smarter BECAUSE of this episode?  Well, tough to say for sure...

Relevant link:
http://tylervigen.com/spurious-correlations</itunes:summary>
      <itunes:subtitle>"People with a family history of heart disease ar…</itunes:subtitle>
      <description>"People with a family history of heart disease are more likely to eat healthy foods, and have a high incidence of heart attacks."  Did the healthy food cause the heart attacks?  Probably not.  But establishing causal links is extremely tricky, and extremely important to get right if you're trying to help students, test new medicines, or just optimize a website.  In this episode, we'll unpack randomized experiments, like AB tests, and maybe you'll be smarter as a result.  Will you be smarter BECAUSE of this episode?  Well, tough to say for sure...

Relevant link:
http://tylervigen.com/spurious-correlations</description>
      <enclosure length="24459422" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/250113250-linear-digressions-experiments-and-messy-tricky-causality.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/249418845</guid>
      <title>Backpropagation</title>
      <pubDate>Mon, 29 Feb 2016 03:58:10 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/backpropagation</link>
      <itunes:duration>00:12:21</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>The reason that neural nets are taking over the world right now is because they can be efficiently trained with the backpropagation algorithm.  In short, backprop allows you to adjust the weights of the neural net based on how good of a job the neural net is doing at classifying training examples, thereby getting better and better at making predictions.  In this episode: we talk backpropagation, and how it makes it possible to train the neural nets we know and love.</itunes:summary>
      <itunes:subtitle>The reason that neural nets are taking over the w…</itunes:subtitle>
      <description>The reason that neural nets are taking over the world right now is because they can be efficiently trained with the backpropagation algorithm.  In short, backprop allows you to adjust the weights of the neural net based on how good of a job the neural net is doing at classifying training examples, thereby getting better and better at making predictions.  In this episode: we talk backpropagation, and how it makes it possible to train the neural nets we know and love.</description>
      <enclosure length="17797571" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/249418845-linear-digressions-backpropagation.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/248956652</guid>
      <title>Text Analysis on the State Of The Union</title>
      <pubDate>Fri, 26 Feb 2016 03:51:42 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/text-analysis-on-the-state-of-the-union</link>
      <itunes:duration>00:22:22</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>First up in this episode: a crash course in natural language processing, and important steps if you want to use machine learning techniques on text data.  Then we'll take that NLP know-how and talk about a really cool analysis of State of the Union text, which analyzes the topics and word choices of every President from Washington to Obama.

Relevant link:
https://civisanalytics.com/blog/data-science/2016/01/15/data-science-on-state-of-the-union-addresses/</itunes:summary>
      <itunes:subtitle>First up in this episode: a crash course in natur…</itunes:subtitle>
      <description>First up in this episode: a crash course in natural language processing, and important steps if you want to use machine learning techniques on text data.  Then we'll take that NLP know-how and talk about a really cool analysis of State of the Union text, which analyzes the topics and word choices of every President from Washington to Obama.

Relevant link:
https://civisanalytics.com/blog/data-science/2016/01/15/data-science-on-state-of-the-union-addresses/</description>
      <enclosure length="32215908" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/248956652-linear-digressions-text-analysis-on-the-state-of-the-union.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/248275653</guid>
      <title>Paradigms in Artificial Intelligence</title>
      <pubDate>Mon, 22 Feb 2016 04:32:25 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/paradigms-in-artificial-intelligence</link>
      <itunes:duration>00:17:20</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Artificial intelligence includes a number of different strategies for how to make machines more intelligent, and often more human-like, in their ability to learn and solve problems.  An ambitious group of researchers is working right now to classify all the approaches to AI, perhaps as a first step toward unifying these approaches and moving closer to strong AI.  In this episode, we'll touch on some of the most provocative work in many different subfields of artificial intelligence, and their strengths and weaknesses.

Relevant links:
https://www.technologyreview.com/s/544606/can-this-man-make-aimore-human/
https://www.youtube.com/watch?v=B8J4uefCQMc
http://venturebeat.com/2013/11/29/sentient-code-an-inside-look-at-stephen-wolframs-utterly-new-insanely-ambitious-computational-paradigm/
http://www.slate.com/articles/technology/bitwise/2014/03/stephen_wolfram_s_new_programming_language_can_he_make_the_world_computable.html</itunes:summary>
      <itunes:subtitle>Artificial intelligence includes a number of diff…</itunes:subtitle>
      <description>Artificial intelligence includes a number of different strategies for how to make machines more intelligent, and often more human-like, in their ability to learn and solve problems.  An ambitious group of researchers is working right now to classify all the approaches to AI, perhaps as a first step toward unifying these approaches and moving closer to strong AI.  In this episode, we'll touch on some of the most provocative work in many different subfields of artificial intelligence, and their strengths and weaknesses.

Relevant links:
https://www.technologyreview.com/s/544606/can-this-man-make-aimore-human/
https://www.youtube.com/watch?v=B8J4uefCQMc
http://venturebeat.com/2013/11/29/sentient-code-an-inside-look-at-stephen-wolframs-utterly-new-insanely-ambitious-computational-paradigm/
http://www.slate.com/articles/technology/bitwise/2014/03/stephen_wolfram_s_new_programming_language_can_he_make_the_world_computable.html</description>
      <enclosure length="24963481" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/248275653-linear-digressions-paradigms-in-artificial-intelligence.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/247799926</guid>
      <title>Survival Analysis</title>
      <pubDate>Fri, 19 Feb 2016 03:44:06 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/survival-analysis</link>
      <itunes:duration>00:15:21</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Survival analysis is all about studying how long it takes until an event occurs--it's used in marketing to study how long a customer stays with a service, in epidemiology to estimate the duration of survival of a patient with some illness, and in social science to understand how the characteristics of a war inform how long the war goes on.  This episode talks about the special challenges associated with survival analysis, and the tools that (data) scientists use to answer all kinds of duration-related questions.</itunes:summary>
      <itunes:subtitle>Survival analysis is all about studying how long …</itunes:subtitle>
      <description>Survival analysis is all about studying how long it takes until an event occurs--it's used in marketing to study how long a customer stays with a service, in epidemiology to estimate the duration of survival of a patient with some illness, and in social science to understand how the characteristics of a war inform how long the war goes on.  This episode talks about the special challenges associated with survival analysis, and the tools that (data) scientists use to answer all kinds of duration-related questions.</description>
      <enclosure length="22109028" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/247799926-linear-digressions-survival-analysis.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/247102635</guid>
      <title>Gravitational Waves</title>
      <pubDate>Mon, 15 Feb 2016 02:46:22 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/gravitational-waves</link>
      <itunes:duration>00:20:26</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>All aboard the gravitational waves bandwagon--with the first direct observation of gravitational waves announced this week, Katie's dusting off her physics PhD for a very special gravity-related episode.  Discussed in this episode: what are gravitational waves, how are they detected, and what does this announcement mean for future studies of the universe.

Relevant links:
http://www.nytimes.com/2016/02/12/science/ligo-gravitational-waves-black-holes-einstein.html
https://www.ligo.caltech.edu/news/ligo20160211</itunes:summary>
      <itunes:subtitle>All aboard the gravitational waves bandwagon--wit…</itunes:subtitle>
      <description>All aboard the gravitational waves bandwagon--with the first direct observation of gravitational waves announced this week, Katie's dusting off her physics PhD for a very special gravity-related episode.  Discussed in this episode: what are gravitational waves, how are they detected, and what does this announcement mean for future studies of the universe.

Relevant links:
http://www.nytimes.com/2016/02/12/science/ligo-gravitational-waves-black-holes-einstein.html
https://www.ligo.caltech.edu/news/ligo20160211</description>
      <enclosure length="29439824" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/247102635-linear-digressions-gravitational-waves.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/246641616</guid>
      <title>The Turing Test</title>
      <pubDate>Fri, 12 Feb 2016 04:11:23 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/the-turing-test</link>
      <itunes:duration>00:15:15</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Let's imagine a future in which a truly intelligent computer program exists.  How would it convince us (humanity) that it was intelligent?  Alan Turing's answer to this question, proposed over 60 years ago, is that the program could convince a human conversational partner that it, the computer, was in fact a human.  60 years later, the Turing Test endures as a gold standard of artificial intelligence.  It hasn't been beaten, either--yet.

Relevant links:
https://en.wikipedia.org/wiki/Turing_test
http://commonsensereasoning.org/winograd.html
http://consumerist.com/2015/09/29/its-not-just-you-robots-are-also-bad-at-assembling-ikea-furniture/</itunes:summary>
      <itunes:subtitle>Let's imagine a future in which a truly intellige…</itunes:subtitle>
      <description>Let's imagine a future in which a truly intelligent computer program exists.  How would it convince us (humanity) that it was intelligent?  Alan Turing's answer to this question, proposed over 60 years ago, is that the program could convince a human conversational partner that it, the computer, was in fact a human.  60 years later, the Turing Test endures as a gold standard of artificial intelligence.  It hasn't been beaten, either--yet.

Relevant links:
https://en.wikipedia.org/wiki/Turing_test
http://commonsensereasoning.org/winograd.html
http://consumerist.com/2015/09/29/its-not-just-you-robots-are-also-bad-at-assembling-ikea-furniture/</description>
      <enclosure length="21964206" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/246641616-linear-digressions-the-turing-test.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/245976298</guid>
      <title>Item Response Theory: how smart ARE you?</title>
      <pubDate>Mon, 08 Feb 2016 03:37:58 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/item-response-theory-how-smart-are-you</link>
      <itunes:duration>00:11:46</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Psychometrics is all about measuring the psychological characteristics of people; for example, scholastic aptitude.  How is this done?  Tests, of course!  But there's a chicken-and-egg problem here: you need to know both how hard a test is, and how smart the test-taker is, in order to get the results you want.  How to solve this problem, one equation with two unknowns?  Item response theory--the data science behind tests such as the GRE.

Relevant links: 
https://en.wikipedia.org/wiki/Item_response_theory</itunes:summary>
      <itunes:subtitle>Psychometrics is all about measuring the psycholo…</itunes:subtitle>
      <description>Psychometrics is all about measuring the psychological characteristics of people; for example, scholastic aptitude.  How is this done?  Tests, of course!  But there's a chicken-and-egg problem here: you need to know both how hard a test is, and how smart the test-taker is, in order to get the results you want.  How to solve this problem, one equation with two unknowns?  Item response theory--the data science behind tests such as the GRE.

Relevant links: 
https://en.wikipedia.org/wiki/Item_response_theory</description>
      <enclosure length="16959980" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/245976298-linear-digressions-item-response-theory-how-smart-are-you.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/245529396</guid>
      <title>Go!</title>
      <pubDate>Fri, 05 Feb 2016 04:52:36 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/go-1</link>
      <itunes:duration>00:19:59</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>As you may have heard, a computer beat a world-class human player in Go last week.  As recently as a year ago the prediction was that it would take a decade to get to this point, yet here we are, in 2016.  We'll talk about the history and strategy of game-playing computer programs, and what makes Google's AlphaGo so special.

Relevant link:
http://googleresearch.blogspot.com/2016/01/alphago-mastering-ancient-game-of-go.html</itunes:summary>
      <itunes:subtitle>As you may have heard, a computer beat a world-cl…</itunes:subtitle>
      <description>As you may have heard, a computer beat a world-class human player in Go last week.  As recently as a year ago the prediction was that it would take a decade to get to this point, yet here we are, in 2016.  We'll talk about the history and strategy of game-playing computer programs, and what makes Google's AlphaGo so special.

Relevant link:
http://googleresearch.blogspot.com/2016/01/alphago-mastering-ancient-game-of-go.html</description>
      <enclosure length="28773388" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/245529396-linear-digressions-go-1.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/244832046</guid>
      <title>Great Social Networks in History</title>
      <pubDate>Mon, 01 Feb 2016 04:22:02 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/great-social-networks-in-history</link>
      <itunes:duration>00:12:42</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>The Medici were one of the great ruling families of Europe during the Renaissance.  How did they come to rule?  Not power, or money, or armies, but through the strength of their social network.  And speaking of great historical social networks, analysis of the network of letter-writing during the Enlightenment is helping humanities scholars track the dispersion of great ideas across the world during that time, from Voltaire to Benjamin Franklin and everyone in between.

Relevant links:
https://www2.bc.edu/~jonescq/mb851/Mar12/PadgettAnsell_AJS_1993.pdf
http://republicofletters.stanford.edu/index.html</itunes:summary>
      <itunes:subtitle>The Medici were one of the great ruling families …</itunes:subtitle>
      <description>The Medici were one of the great ruling families of Europe during the Renaissance.  How did they come to rule?  Not power, or money, or armies, but through the strength of their social network.  And speaking of great historical social networks, analysis of the network of letter-writing during the Enlightenment is helping humanities scholars track the dispersion of great ideas across the world during that time, from Voltaire to Benjamin Franklin and everyone in between.

Relevant links:
https://www2.bc.edu/~jonescq/mb851/Mar12/PadgettAnsell_AJS_1993.pdf
http://republicofletters.stanford.edu/index.html</description>
      <enclosure length="18290344" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/244832046-linear-digressions-great-social-networks-in-history.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/244359703</guid>
      <title>How Much to Pay a Spy (and a lil' more auctions)</title>
      <pubDate>Fri, 29 Jan 2016 05:36:33 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/how-much-to-pay-a-spy-and-a-lil-more-auctions</link>
      <itunes:duration>00:16:59</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>A few small encores on auction theory, and then--how can you value a piece of information before you know what it is?  Decision theory has some pointers.  Some highly relevant information if you are trying to figure out how much to pay a spy.

Relevant links:
https://tuecontheoryofnetworks.wordpress.com/2013/02/25/the-origin-of-the-dutch-auction/
http://www.nowozin.net/sebastian/blog/the-fair-price-to-pay-a-spy-an-introduction-to-the-value-of-information.html
</itunes:summary>
      <itunes:subtitle>A few small encores on auction theory, and then--…</itunes:subtitle>
      <description>A few small encores on auction theory, and then--how can you value a piece of information before you know what it is?  Decision theory has some pointers.  Some highly relevant information if you are trying to figure out how much to pay a spy.

Relevant links:
https://tuecontheoryofnetworks.wordpress.com/2013/02/25/the-origin-of-the-dutch-auction/
http://www.nowozin.net/sebastian/blog/the-fair-price-to-pay-a-spy-an-introduction-to-the-value-of-information.html
</description>
      <enclosure length="24455660" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/244359703-linear-digressions-how-much-to-pay-a-spy-and-a-lil-more-auctions.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/243636665</guid>
      <title>Sold!  Auctions (Part 2)</title>
      <pubDate>Mon, 25 Jan 2016 02:58:07 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/sold-auctions-part-2</link>
      <itunes:duration>00:17:27</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>The Google ads auction is a special kind of auction, one you might not know as well as the famous English auction (which we talked about in the last episode).  But if it's what Google uses to sell billions of dollars of ad space in real time, you know it must be pretty cool.

Relevant links:
https://en.wikipedia.org/wiki/English_auction
http://people.ischool.berkeley.edu/~hal/Papers/2006/position.pdf
http://www.benedelman.org/publications/gsp-060801.pdf</itunes:summary>
      <itunes:subtitle>The Google ads auction is a special kind of aucti…</itunes:subtitle>
      <description>The Google ads auction is a special kind of auction, one you might not know as well as the famous English auction (which we talked about in the last episode).  But if it's what Google uses to sell billions of dollars of ad space in real time, you know it must be pretty cool.

Relevant links:
https://en.wikipedia.org/wiki/English_auction
http://people.ischool.berkeley.edu/~hal/Papers/2006/position.pdf
http://www.benedelman.org/publications/gsp-060801.pdf</description>
      <enclosure length="25142785" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/243636665-linear-digressions-sold-auctions-part-2.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/243171056</guid>
      <title>Going Once, Going Twice: Auctions (Part 1)</title>
      <pubDate>Fri, 22 Jan 2016 03:40:24 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/going-once-going-twice-auctions-part-1</link>
      <itunes:duration>00:12:39</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>The Google AdWords algorithm is (famously) an auction system for allocating a massive amount of online ad space in real time--with that fascinating use case in mind, this episode is part one in a two-part series all about auctions.  We dive into the theory of auctions, and what makes a "good" auction.   

Relevant links:
https://en.wikipedia.org/wiki/English_auction
http://people.ischool.berkeley.edu/~hal/Papers/2006/position.pdf
http://www.benedelman.org/publications/gsp-060801.pdf</itunes:summary>
      <itunes:subtitle>The Google AdWords algorithm is (famously) an auc…</itunes:subtitle>
      <description>The Google AdWords algorithm is (famously) an auction system for allocating a massive amount of online ad space in real time--with that fascinating use case in mind, this episode is part one in a two-part series all about auctions.  We dive into the theory of auctions, and what makes a "good" auction.   

Relevant links:
https://en.wikipedia.org/wiki/English_auction
http://people.ischool.berkeley.edu/~hal/Papers/2006/position.pdf
http://www.benedelman.org/publications/gsp-060801.pdf</description>
      <enclosure length="18236428" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/243171056-linear-digressions-going-once-going-twice-auctions-part-1.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/242498218</guid>
      <title>Chernoff Faces and Minard Maps</title>
      <pubDate>Mon, 18 Jan 2016 03:38:33 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/chernoff-faces-and-minard-maps</link>
      <itunes:duration>00:15:11</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>A data visualization extravaganza in this episode, as we discuss Chernoff faces (you: "faces? huh?" us: "oh just you wait") and the greatest data visualization of all time, or at least the Napoleonic era.

Relevant links:
http://lya.fciencias.unam.mx/rfuentes/faces-chernoff.pdf
https://en.wikipedia.org/wiki/Charles_Joseph_Minard</itunes:summary>
      <itunes:subtitle>A data visualization extravaganza in this episode…</itunes:subtitle>
      <description>A data visualization extravaganza in this episode, as we discuss Chernoff faces (you: "faces? huh?" us: "oh just you wait") and the greatest data visualization of all time, or at least the Napoleonic era.

Relevant links:
http://lya.fciencias.unam.mx/rfuentes/faces-chernoff.pdf
https://en.wikipedia.org/wiki/Charles_Joseph_Minard</description>
      <enclosure length="21866403" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/242498218-linear-digressions-chernoff-faces-and-minard-maps.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/242037264</guid>
      <title>t-SNE: Reduce Your Dimensions, Keep Your Clusters</title>
      <pubDate>Fri, 15 Jan 2016 04:05:49 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/t-sne-reduce-your-dimensions-keep-your-clusters</link>
      <itunes:duration>00:16:55</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Ever tried to visualize a cluster of data points in 40 dimensions?  Or even 4, for that matter?  We prefer to stick to 2, or maybe 3 if we're feeling well-caffeinated.  The t-SNE algorithm is one of the best tools on the market for doing dimensionality reduction when you have clustering in mind.

Relevant links:
https://www.youtube.com/watch?v=RJVL80Gg3lA</itunes:summary>
      <itunes:subtitle>Ever tried to visualize a cluster of data points …</itunes:subtitle>
      <description>Ever tried to visualize a cluster of data points in 40 dimensions?  Or even 4, for that matter?  We prefer to stick to 2, or maybe 3 if we're feeling well-caffeinated.  The t-SNE algorithm is one of the best tools on the market for doing dimensionality reduction when you have clustering in mind.

Relevant links:
https://www.youtube.com/watch?v=RJVL80Gg3lA</description>
      <enclosure length="24358485" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/242037264-linear-digressions-t-sne-reduce-your-dimensions-keep-your-clusters.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/241376793</guid>
      <title>The [Expletive Deleted] Problem</title>
      <pubDate>Mon, 11 Jan 2016 04:23:53 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/the-expletive-deleted-problem</link>
      <itunes:duration>00:09:54</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>The town of [expletive deleted], England, is responsible for the clbuttic [expletive deleted] problem.  This week on Linear Digressions: we try really hard not to swear too much.

Related links:
https://en.wikipedia.org/wiki/Scunthorpe_problem
https://www.washingtonpost.com/news/worldviews/wp/2016/01/05/where-is-russia-actually-mordor-in-the-world-of-google-translate/</itunes:summary>
      <itunes:subtitle>The town of [expletive deleted], England, is resp…</itunes:subtitle>
      <description>The town of [expletive deleted], England, is responsible for the clbuttic [expletive deleted] problem.  This week on Linear Digressions: we try really hard not to swear too much.

Related links:
https://en.wikipedia.org/wiki/Scunthorpe_problem
https://www.washingtonpost.com/news/worldviews/wp/2016/01/05/where-is-russia-actually-mordor-in-the-world-of-google-translate/</description>
      <enclosure length="14276055" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/241376793-linear-digressions-the-expletive-deleted-problem.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/240917031</guid>
      <title>Unlabeled Supervised Learning--whaaa?</title>
      <pubDate>Fri, 08 Jan 2016 03:26:56 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/unlabeled-supervised-learning-whaaa</link>
      <itunes:duration>00:12:35</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>In order to do supervised learning, you need a labeled training dataset.  Or do you...?

Relevant links:
http://www.cs.columbia.edu/~dplewis/candidacy/goldman00enhancing.pdf</itunes:summary>
      <itunes:subtitle>In order to do supervised learning, you need a la…</itunes:subtitle>
      <description>In order to do supervised learning, you need a labeled training dataset.  Or do you...?

Relevant links:
http://www.cs.columbia.edu/~dplewis/candidacy/goldman00enhancing.pdf</description>
      <enclosure length="18125459" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/240917031-linear-digressions-unlabeled-supervised-learning-whaaa.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/240444897</guid>
      <title>Hacking Neural Nets</title>
      <pubDate>Tue, 05 Jan 2016 02:56:18 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/hacking-neural-nets</link>
      <itunes:duration>00:15:28</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Machine learning: it can be fooled, just like you or me.  Here's one of our favorite examples, a study into hacking neural networks.

Relevant links:
http://arxiv.org/pdf/1412.1897v4.pdf</itunes:summary>
      <itunes:subtitle>Machine learning: it can be fooled, just like you…</itunes:subtitle>
      <description>Machine learning: it can be fooled, just like you or me.  Here's one of our favorite examples, a study into hacking neural networks.

Relevant links:
http://arxiv.org/pdf/1412.1897v4.pdf</description>
      <enclosure length="22272033" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/240444897-linear-digressions-hacking-neural-nets.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/239861862</guid>
      <title>Zipf's Law</title>
      <pubDate>Thu, 31 Dec 2015 18:08:17 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/zipfs-law</link>
      <itunes:duration>00:11:43</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Zipf's law is related to the statistics of how word usage is distributed.  As it turns out, this is also strikingly reminiscent of how income is distributed, and populations of cities, and bug reports in software, as well as tons of other phenomena that we all interact with every day.

Relevant links:
http://economix.blogs.nytimes.com/2010/04/20/a-tale-of-many-cities/
http://arxiv.org/pdf/cond-mat/0412004.pdf
https://terrytao.wordpress.com/2009/07/03/benfords-law-zipfs-law-and-the-pareto-distribution/</itunes:summary>
      <itunes:subtitle>Zipf's law is related to the statistics of how wo…</itunes:subtitle>
      <description>Zipf's law is related to the statistics of how word usage is distributed.  As it turns out, this is also strikingly reminiscent of how income is distributed, and populations of cities, and bug reports in software, as well as tons of other phenomena that we all interact with every day.

Relevant links:
http://economix.blogs.nytimes.com/2010/04/20/a-tale-of-many-cities/
http://arxiv.org/pdf/cond-mat/0412004.pdf
https://terrytao.wordpress.com/2009/07/03/benfords-law-zipfs-law-and-the-pareto-distribution/</description>
      <enclosure length="16880986" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/239861862-linear-digressions-zipfs-law.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/239700354</guid>
      <title>Indie Announcement</title>
      <pubDate>Wed, 30 Dec 2015 15:57:02 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/indie-announcement</link>
      <itunes:duration>00:01:19</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>We've gone indie!  Which shouldn't change anything about the podcast that you know and love, but we're super excited to keep bringing you Linear Digressions as a fully independent podcast.

Some links mentioned in the show:
https://twitter.com/lindigressions
https://twitter.com/benjaffe
https://twitter.com/multiarmbandit
https://soundcloud.com/linear-digressions
http://lineardigressions.com/</itunes:summary>
      <itunes:subtitle>We've gone indie!  Which shouldn't change anythin…</itunes:subtitle>
      <description>We've gone indie!  Which shouldn't change anything about the podcast that you know and love, but we're super excited to keep bringing you Linear Digressions as a fully independent podcast.

Some links mentioned in the show:
https://twitter.com/lindigressions
https://twitter.com/benjaffe
https://twitter.com/multiarmbandit
https://soundcloud.com/linear-digressions
http://lineardigressions.com/</description>
      <enclosure length="1912823" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/239700354-linear-digressions-indie-announcement.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/239285070</guid>
      <title>Portrait Beauty</title>
      <pubDate>Sun, 27 Dec 2015 13:34:44 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/portrait-beauty</link>
      <itunes:duration>00:11:44</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>It's Da Vinci meets Skynet: what makes a portrait beautiful, according to a machine learning algorithm.  Snap a selfie and give us a listen.</itunes:summary>
      <itunes:subtitle>It's Da Vinci meets Skynet: what makes a portrait…</itunes:subtitle>
      <description>It's Da Vinci meets Skynet: what makes a portrait beautiful, according to a machine learning algorithm.  Snap a selfie and give us a listen.</description>
      <enclosure length="16912333" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/239285070-linear-digressions-portrait-beauty.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/238051584</guid>
      <title>The Cocktail Party Problem</title>
      <pubDate>Fri, 18 Dec 2015 00:17:31 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/the-cocktail-party-problem</link>
      <itunes:duration>00:12:04</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Grab a cocktail, put on your favorite karaoke track, and let’s talk some more about disentangling audio data!</itunes:summary>
      <itunes:subtitle>Grab a cocktail, put on your favorite karaoke tra…</itunes:subtitle>
      <description>Grab a cocktail, put on your favorite karaoke track, and let’s talk some more about disentangling audio data!</description>
      <enclosure length="23167510" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/238051584-linear-digressions-the-cocktail-party-problem.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/235997508</guid>
      <title>A Criminally Short Introduction to Semi Supervised Learning</title>
      <pubDate>Fri, 04 Dec 2015 03:13:55 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/a-criminally-short-introduction-to-semi-supervised-learning</link>
      <itunes:duration>00:09:12</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Because there are more interesting problems than there are labeled datasets, semi-supervised learning provides a framework for getting feedback from the environment as a proxy for labels of what's "correct."  Of all the machine learning methodologies, it might also be the closest to how humans usually learn--we go through the world, getting (noisy) feedback on the choices we make and learn from the outcomes of our actions.  </itunes:summary>
      <itunes:subtitle>Because there are more interesting problems than …</itunes:subtitle>
      <description>Because there are more interesting problems than there are labeled datasets, semi-supervised learning provides a framework for getting feedback from the environment as a proxy for labels of what's "correct."  Of all the machine learning methodologies, it might also be the closest to how humans usually learn--we go through the world, getting (noisy) feedback on the choices we make and learn from the outcomes of our actions.  </description>
      <enclosure length="13264803" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/235997508-linear-digressions-a-criminally-short-introduction-to-semi-supervised-learning.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/234982280</guid>
      <title>Thresholdout: Down with Overfitting</title>
      <pubDate>Fri, 27 Nov 2015 17:55:04 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/thresholdout-down-with-overfitting</link>
      <itunes:duration>00:15:52</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Overfitting to your training data can be avoided by evaluating your machine learning algorithm on a holdout test dataset, but what about overfitting to the test data?  Turns out it can be done, easily, and you have to be very careful to avoid it.  But an algorithm from the field of privacy research shows promise for keeping your test data safe from accidental overfitting.</itunes:summary>
      <itunes:subtitle>Overfitting to your training data can be avoided …</itunes:subtitle>
      <description>Overfitting to your training data can be avoided by evaluating your machine learning algorithm on a holdout test dataset, but what about overfitting to the test data?  Turns out it can be done, easily, and you have to be very careful to avoid it.  But an algorithm from the field of privacy research shows promise for keeping your test data safe from accidental overfitting.</description>
      <enclosure length="22852578" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/234982280-linear-digressions-thresholdout-down-with-overfitting.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/232380447</guid>
      <title>The State of Data Science</title>
      <pubDate>Tue, 10 Nov 2015 04:36:40 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/the-state-of-data-science</link>
      <itunes:duration>00:15:40</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>How many data scientists are there, where do they live, where do they work, what kind of tools do they use, and how do they describe themselves?  RJMetrics wanted to know the answers to these questions, so they decided to find out and share their analysis with the world.  In this very special interview episode, we welcome Tristan Handy, VP of Marketing at RJMetrics, who will talk about "The State of Data Science Report."</itunes:summary>
      <itunes:subtitle>How many data scientists are there, where do they…</itunes:subtitle>
      <description>How many data scientists are there, where do they live, where do they work, what kind of tools do they use, and how do they describe themselves?  RJMetrics wanted to know the answers to these questions, so they decided to find out and share their analysis with the world.  In this very special interview episode, we welcome Tristan Handy, VP of Marketing at RJMetrics, who will talk about "The State of Data Science Report."</description>
      <enclosure length="22577352" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/232380447-linear-digressions-the-state-of-data-science.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/231777419</guid>
      <title>Data Science for Making the World a Better Place</title>
      <pubDate>Fri, 06 Nov 2015 03:43:25 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/data-science-for-making-the-world-a-better-place</link>
      <itunes:duration>00:09:31</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>There's a good chance that great data science is going on close to you, and that it's going toward making your city, state, country, and planet a better place.  Not all the data science questions being tackled out there are about finding the sleekest new algorithm or billion-dollar company idea--there's a whole world of social data science that just wants to make the world a better place to live in.</itunes:summary>
      <itunes:subtitle>There's a good chance that great data science is …</itunes:subtitle>
      <description>There's a good chance that great data science is going on close to you, and that it's going toward making your city, state, country, and planet a better place.  Not all the data science questions being tackled out there are about finding the sleekest new algorithm or billion-dollar company idea--there's a whole world of social data science that just wants to make the world a better place to live in.</description>
      <enclosure length="13718707" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/231777419-linear-digressions-data-science-for-making-the-world-a-better-place.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/230576907</guid>
      <title>Kalman Runners</title>
      <pubDate>Thu, 29 Oct 2015 03:10:02 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/kalman-runners</link>
      <itunes:duration>00:14:42</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>The Kalman Filter is an algorithm for taking noisy measurements of dynamic systems and using them to get a better idea of the underlying dynamics than you could get from a simple extrapolation.  If you've ever run a marathon, or been a nuclear missile, you probably know all about these challenges already.  By the way, we neglected to mention in the episode: Katie's marathon time was 3:54:27!</itunes:summary>
      <itunes:subtitle>The Kalman Filter is an algorithm for taking nois…</itunes:subtitle>
      <description>The Kalman Filter is an algorithm for taking noisy measurements of dynamic systems and using them to get a better idea of the underlying dynamics than you could get from a simple extrapolation.  If you've ever run a marathon, or been a nuclear missile, you probably know all about these challenges already.  By the way, we neglected to mention in the episode: Katie's marathon time was 3:54:27!</description>
      <enclosure length="21185548" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/230576907-linear-digressions-kalman-runners.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/229669968</guid>
      <title>Neural Net Inception</title>
      <pubDate>Fri, 23 Oct 2015 02:25:48 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/neural-net-inception</link>
      <itunes:duration>00:15:19</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>When you sleep, the neural pathways in your brain take the "white noise" of your resting brain, mix in your experiences and imagination, and the result is dreams (that is a highly unscientific explanation, but you get the idea).  What happens when neural nets are put through the same process?  Train a neural net to recognize pictures, and then send through an image of white noise, and it will start to see some weird (but cool!) stuff.</itunes:summary>
      <itunes:subtitle>When you sleep, the neural pathways in your brain…</itunes:subtitle>
      <description>When you sleep, the neural pathways in your brain take the "white noise" of your resting brain, mix in your experiences and imagination, and the result is dreams (that is a highly unscientific explanation, but you get the idea).  What happens when neural nets are put through the same process?  Train a neural net to recognize pictures, and then send through an image of white noise, and it will start to see some weird (but cool!) stuff.</description>
      <enclosure length="22063889" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/229669968-linear-digressions-neural-net-inception.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/228617192</guid>
      <title>Benford's Law</title>
      <pubDate>Fri, 16 Oct 2015 03:30:43 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/benford-produced</link>
      <itunes:duration>00:17:42</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Sometimes numbers are... weird.  Benford's Law is a favorite example of this for us--it's a law that governs the distribution of the first digit in certain types of numbers.  As it turns out, if you're looking up the length of a river, the population of a country, the price of a stock... not all first digits are created equal.</itunes:summary>
      <itunes:subtitle>Sometimes numbers are... weird.  Benford's Law is…</itunes:subtitle>
      <description>Sometimes numbers are... weird.  Benford's Law is a favorite example of this for us--it's a law that governs the distribution of the first digit in certain types of numbers.  As it turns out, if you're looking up the length of a river, the population of a country, the price of a stock... not all first digits are created equal.</description>
      <enclosure length="25502021" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/228617192-linear-digressions-benford-produced.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/227272985</guid>
      <title>Guinness</title>
      <pubDate>Wed, 07 Oct 2015 03:30:33 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/guinness</link>
      <itunes:duration>00:14:43</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Not to oversell it, but the Student's t-test has got to have the most interesting history of any statistical test.  Which is saying a lot, right?  Add some boozy statistical trivia to your arsenal in this episode.</itunes:summary>
      <itunes:subtitle>Not to oversell it, but the Student's t-test has …</itunes:subtitle>
      <description>Not to oversell it, but the Student's t-test has got to have the most interesting history of any statistical test.  Which is saying a lot, right?  Add some boozy statistical trivia to your arsenal in this episode.</description>
      <enclosure length="21206237" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/227272985-linear-digressions-guinness.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/221965091</guid>
      <title>PFun with P Values</title>
      <pubDate>Wed, 02 Sep 2015 03:24:36 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/pfun-with-p-values</link>
      <itunes:duration>00:17:07</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Doing some science, and want to know if you might have found something?  Or maybe you've just accomplished the scientific equivalent of going fishing and reeling in an old boot?  Frequentist p-values can help you distinguish between "eh" and "oooh interesting".  Also, there's a lot of physics in this episode, nerds.</itunes:summary>
      <itunes:subtitle>Doing some science, and want to know if you might…</itunes:subtitle>
      <description>Doing some science, and want to know if you might have found something?  Or maybe you've just accomplished the scientific equivalent of going fishing and reeling in an old boot?  Frequentist p-values can help you distinguish between "eh" and "oooh interesting".  Also, there's a lot of physics in this episode, nerds.</description>
      <enclosure length="24660669" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/221965091-linear-digressions-pfun-with-p-values.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/220755207</guid>
      <title>Watson</title>
      <pubDate>Tue, 25 Aug 2015 02:26:20 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/watson</link>
      <itunes:duration>00:15:36</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>This machine learning algorithm beat the human champions at Jeopardy.  What is... Watson?</itunes:summary>
      <itunes:subtitle>This machine learning algorithm beat the human ch…</itunes:subtitle>
      <description>This machine learning algorithm beat the human champions at Jeopardy.  What is... Watson?</description>
      <enclosure length="22480803" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/220755207-linear-digressions-watson.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/219708024</guid>
      <title>Bayesian Psychics</title>
      <pubDate>Tue, 18 Aug 2015 00:05:04 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/bayesian-psychics</link>
      <itunes:duration>00:11:44</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Come get a little "out there" with us this week, as we use a meta-study of extrasensory perception (or ESP, often used in the same sentence as "psychics") to chat about Bayesian vs. frequentist statistics.</itunes:summary>
      <itunes:subtitle>Come get a little "out there" with us this week, …</itunes:subtitle>
      <description>Come get a little "out there" with us this week, as we use a meta-study of extrasensory perception (or ESP, often used in the same sentence as "psychics") to chat about Bayesian vs. frequentist statistics.</description>
      <enclosure length="16897913" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/219708024-linear-digressions-bayesian-psychics.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/218255240</guid>
      <title>Troll Detection</title>
      <pubDate>Fri, 07 Aug 2015 20:56:36 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/troll-detection</link>
      <itunes:duration>00:12:57</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Ever found yourself wasting time reading online comments from trolls?  Of course you have; we've all been there (it's 4 AM but I can't turn off the computer and go to sleep--someone on the internet is WRONG!).  Now there's a way to use machine learning to automatically detect trolls, and minimize the impact when they try to derail online conversations.</itunes:summary>
      <itunes:subtitle>Ever found yourself wasting time reading online c…</itunes:subtitle>
      <description>Ever found yourself wasting time reading online comments from trolls?  Of course you have; we've all been there (it's 4 AM but I can't turn off the computer and go to sleep--someone on the internet is WRONG!).  Now there's a way to use machine learning to automatically detect trolls, and minimize the impact when they try to derail online conversations.</description>
      <enclosure length="18648953" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/218255240-linear-digressions-troll-detection.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/217536447</guid>
      <title>Yiddish Translation</title>
      <pubDate>Mon, 03 Aug 2015 03:06:39 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/yiddish-translation</link>
      <itunes:duration>00:12:15</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Imagine a language that is mostly spoken rather than written, contains many words in other languages, and has relatively little written overlap with English.  Now imagine writing a machine-learning-based translation system that can convert that language to English.  That's the problem that confronted researchers when they set out to automatically translate between Yiddish and English; the tricks they used help us understand a lot about machine translation.</itunes:summary>
      <itunes:subtitle>Imagine a language that is mostly spoken rather t…</itunes:subtitle>
      <description>Imagine a language that is mostly spoken rather than written, contains many words in other languages, and has relatively little written overlap with English.  Now imagine writing a machine-learning-based translation system that can convert that language to English.  That's the problem that confronted researchers when they set out to automatically translate between Yiddish and English; the tricks they used help us understand a lot about machine translation.</description>
      <enclosure length="17657763" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/217536447-linear-digressions-yiddish-translation.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/213562084</guid>
      <title>Modeling Particles in Atomic Bombs</title>
      <pubDate>Mon, 06 Jul 2015 23:30:15 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/modeling-particles-in-atomic-bombs</link>
      <itunes:duration>00:15:38</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>In a fun historical journey, Katie and Ben explore the history of the Manhattan Project, discuss the difficulties in modeling particle movement in atomic bombs with only punch-card computers and ingenuity, and eventually come to present-day uses of the Metropolis-Hastings algorithm... mentioning Solitaire along the way.</itunes:summary>
      <itunes:subtitle>In a fun historical journey, Katie and Ben explor…</itunes:subtitle>
      <description>In a fun historical journey, Katie and Ben explore the history of the Manhattan Project, discuss the difficulties in modeling particle movement in atomic bombs with only punch-card computers and ingenuity, and eventually come to present-day uses of the Metropolis-Hastings algorithm... mentioning Solitaire along the way.</description>
      <enclosure length="22532212" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/213562084-linear-digressions-modeling-particles-in-atomic-bombs.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/211091633</guid>
      <title>Random Number Generation</title>
      <pubDate>Fri, 19 Jun 2015 18:49:55 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/random-number-generation</link>
      <itunes:duration>00:10:26</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Let's talk about randomness! Although randomness is pervasive throughout the natural world, it's surprisingly difficult to generate random numbers. And even if your numbers look random (but actually aren't), it can have interesting consequences on the security of systems, and the accuracy of models and research. 

In this episode, Katie and Ben talk about randomness, its place in machine learning and computation in general, along with some random digressions of their own.</itunes:summary>
      <itunes:subtitle>Let's talk about randomness! Although randomness …</itunes:subtitle>
      <description>Let's talk about randomness! Although randomness is pervasive throughout the natural world, it's surprisingly difficult to generate random numbers. And even if your numbers look random (but actually aren't), it can have interesting consequences on the security of systems, and the accuracy of models and research. 

In this episode, Katie and Ben talk about randomness, its place in machine learning and computation in general, along with some random digressions of their own.</description>
      <enclosure length="15025874" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/211091633-linear-digressions-random-number-generation.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/209462126</guid>
      <title>Electoral Insights (Part 2)</title>
      <pubDate>Tue, 09 Jun 2015 02:46:17 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/electoral-insights-part-2</link>
      <itunes:duration>00:21:18</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Following up on our last episode about how experiments can be performed in political science, now we explore a high-profile case of an experiment gone wrong. 

An extremely high-profile paper that was published in 2014, about how talking to people can convince them to change their minds on topics like abortion and gay marriage, has been exposed as the likely product of a fraudulently produced dataset. We’ll talk about a cool data science tool called the Kolmogorov-Smirnov test, which a pair of graduate students used to reverse-engineer the likely way that the fraudulent data was generated. 

But a bigger question still remains—what does this whole episode tell us about fraud and oversight in science?</itunes:summary>
      <itunes:subtitle>Following up on our last episode about how experi…</itunes:subtitle>
      <description>Following up on our last episode about how experiments can be performed in political science, now we explore a high-profile case of an experiment gone wrong. 

An extremely high-profile paper that was published in 2014, about how talking to people can convince them to change their minds on topics like abortion and gay marriage, has been exposed as the likely product of a fraudulently produced dataset. We’ll talk about a cool data science tool called the Kolmogorov-Smirnov test, which a pair of graduate students used to reverse-engineer the likely way that the fraudulent data was generated. 

But a bigger question still remains—what does this whole episode tell us about fraud and oversight in science?</description>
      <enclosure length="30675520" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/209462126-linear-digressions-electoral-insights-part-2.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/208987996</guid>
      <title>Electoral Insights (Part 1)</title>
      <pubDate>Fri, 05 Jun 2015 20:38:00 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/electoral-insights-part-1</link>
      <itunes:duration>00:09:17</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>The first of our two-parter discussing the recent electoral data fraud case. The results of the study in question were covered widely, including by This American Life (who later had to issue a retraction).

Data science for election research involves studying voters, who are people, and people are tricky to study—every one of them is different, and the same treatment can have different effects on different voters.  But with randomized controlled trials, small variations from person to person can even out when you look at a larger group.  With the advent of randomized experiments in elections a few decades ago, a whole new door was opened for studying the most effective ways to campaign.</itunes:summary>
      <itunes:subtitle>The first of our two-parter discussing the recent…</itunes:subtitle>
      <description>The first of our two-parter discussing the recent electoral data fraud case. The results of the study in question were covered widely, including by This American Life (who later had to issue a retraction).

Data science for election research involves studying voters, who are people, and people are tricky to study—every one of them is different, and the same treatment can have different effects on different voters.  But with randomized controlled trials, small variations from person to person can even out when you look at a larger group.  With the advent of randomized experiments in elections a few decades ago, a whole new door was opened for studying the most effective ways to campaign.</description>
      <enclosure length="13387683" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/208987996-linear-digressions-electoral-insights-part-1.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/208330505</guid>
      <title>Falsifying Data</title>
      <pubDate>Mon, 01 Jun 2015 21:04:10 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/falsifying-data</link>
      <itunes:duration>00:17:46</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>In the first of a few episodes on fraud in election research, we’ll take a look at a case study from a previous Presidential election, where polling results were faked.  

What are some telltale signs that data fraud might be present in a dataset?  We’ll explore that in this episode.</itunes:summary>
      <itunes:subtitle>In the first of a few episodes on fraud in electi…</itunes:subtitle>
      <description>In the first of a few episodes on fraud in election research, we’ll take a look at a case study from a previous Presidential election, where polling results were faked.  

What are some telltale signs that data fraud might be present in a dataset?  We’ll explore that in this episode.</description>
      <enclosure length="25600451" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/208330505-linear-digressions-falsifying-data.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/206453182</guid>
      <title>Reporter Bot</title>
      <pubDate>Wed, 20 May 2015 23:16:18 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/reporter-bot</link>
      <itunes:duration>00:11:15</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>There’s a big difference between a table of numbers or statistics, and the underlying story that a human might tell about how those numbers were generated. 

Think about a baseball game—the game stats and a newspaper story are describing the same thing, but one is a good input for a machine learning algorithm and the other is a good story to read over your morning coffee. Data science and machine learning are starting to bridge this gap, taking the raw data on things like baseball games, financial scenarios, etc. and automatically writing human-readable stories that are increasingly indistinguishable from what a human would write. 

In this episode, we’ll talk about some examples of auto-generated content—you’ll be amazed at how sophisticated some of these reporter-bots can be. By the way, this summary was written by a human. (Or was it?)</itunes:summary>
      <itunes:subtitle>There’s a big difference between a table of numbe…</itunes:subtitle>
      <description>There’s a big difference between a table of numbers or statistics, and the underlying story that a human might tell about how those numbers were generated. 

Think about a baseball game—the game stats and a newspaper story are describing the same thing, but one is a good input for a machine learning algorithm and the other is a good story to read over your morning coffee. Data science and machine learning are starting to bridge this gap, taking the raw data on things like baseball games, financial scenarios, etc. and automatically writing human-readable stories that are increasingly indistinguishable from what a human would write. 

In this episode, we’ll talk about some examples of auto-generated content—you’ll be amazed at how sophisticated some of these reporter-bots can be. By the way, this summary was written by a human. (Or was it?)</description>
      <enclosure length="16220193" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/206453182-linear-digressions-reporter-bot.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/205711835</guid>
      <title>Careers in Data Science</title>
      <pubDate>Sat, 16 May 2015 05:43:44 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/careers-in-data-science</link>
      <itunes:duration>00:16:35</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Let’s talk money. As a “hot” career right now, data science can pay pretty well. But for an individual person matched with a specific job or industry, how much should someone expect to make? 

Since Katie was on the job market lately, this was something she’s been researching, and it turns out that data science itself (in particular linear regressions) has some answers. 

In this episode, we go through a survey of hundreds of data scientists, who report on their job duties, industry, skills, education, location, etc. along with their salaries, and then talk about how this data was fed into a linear regression so that you (yes, you!) can use the patterns in the data to know what kind of salary any particular kind of data scientist might expect.</itunes:summary>
      <itunes:subtitle>Let’s talk money. As a “hot” career right now, da…</itunes:subtitle>
      <description>Let’s talk money. As a “hot” career right now, data science can pay pretty well. But for an individual person matched with a specific job or industry, how much should someone expect to make? 

Since Katie was on the job market lately, this was something she’s been researching, and it turns out that data science itself (in particular linear regressions) has some answers. 

In this episode, we go through a survey of hundreds of data scientists, who report on their job duties, industry, skills, education, location, etc. along with their salaries, and then talk about how this data was fed into a linear regression so that you (yes, you!) can use the patterns in the data to know what kind of salary any particular kind of data scientist might expect.</description>
      <enclosure length="23897685" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/205711835-linear-digressions-careers-in-data-science.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/205472927</guid>
      <title>That's "Dr Katie" to You</title>
      <pubDate>Thu, 14 May 2015 17:37:48 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/thats-dr-katie-to-you</link>
      <itunes:duration>00:03:01</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Katie successfully defended her thesis! We celebrate her return, and talk a bit about what getting a PhD in Physics is like.</itunes:summary>
      <itunes:subtitle>Katie successfully defended her thesis! We celebr…</itunes:subtitle>
      <description>Katie successfully defended her thesis! We celebrate her return, and talk a bit about what getting a PhD in Physics is like.</description>
      <enclosure length="4356630" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/205472927-linear-digressions-thats-dr-katie-to-you.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/204955129</guid>
      <title>Neural Nets (Part 2)</title>
      <pubDate>Mon, 11 May 2015 14:37:51 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/neural-nets-part-2</link>
      <itunes:duration>00:10:55</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>In the last episode, we zipped through neural nets and got a quick idea of how they work and why they can be so powerful. Here’s the real payoff of that work:

In this episode, we’ll talk about a brand-new pair of results, one from Stanford and one from Google, that use neural nets to perform automated picture captioning. One neural net does the object and relationship recognition of the image, a second neural net handles the natural language processing required to express that in an English sentence, and when you put them together you get an automated captioning tool. Two heads are better than one indeed...</itunes:summary>
      <itunes:subtitle>In the last episode, we zipped through neural net…</itunes:subtitle>
      <description>In the last episode, we zipped through neural nets and got a quick idea of how they work and why they can be so powerful. Here’s the real payoff of that work:

In this episode, we’ll talk about a brand-new pair of results, one from Stanford and one from Google, that use neural nets to perform automated picture captioning. One neural net does the object and relationship recognition of the image, a second neural net handles the natural language processing required to express that in an English sentence, and when you put them together you get an automated captioning tool. Two heads are better than one indeed...</description>
      <enclosure length="15734315" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/204955129-linear-digressions-neural-nets-part-2.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/203476719</guid>
      <title>Neural Nets (Part 1)</title>
      <pubDate>Fri, 01 May 2015 18:59:28 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/neural-nets-part-1</link>
      <itunes:duration>00:09:00</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>There is no known learning algorithm that is more flexible and powerful than the human brain. That's quite inspirational, if you think about it--to level up machine learning, maybe we should be going back to biology and letting millions of years of evolution guide the structure of our algorithms. 

This is the idea behind neural nets, which mock up the structure of the brain and are some of the most studied and powerful algorithms out there. In this episode, we’ll lay out the building blocks of the neural net (called neurons, naturally) and the networks that are built out of them. 

We’ll also explore the results that neural nets get when used to do object recognition in photographs.</itunes:summary>
      <itunes:subtitle>There is no known learning algorithm that is more…</itunes:subtitle>
      <description>There is no known learning algorithm that is more flexible and powerful than the human brain. That's quite inspirational, if you think about it--to level up machine learning, maybe we should be going back to biology and letting millions of years of evolution guide the structure of our algorithms. 

This is the idea behind neural nets, which mock up the structure of the brain and are some of the most studied and powerful algorithms out there. In this episode, we’ll lay out the building blocks of the neural net (called neurons, naturally) and the networks that are built out of them. 

We’ll also explore the results that neural nets get when used to do object recognition in photographs.</description>
      <enclosure length="12968261" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/203476719-linear-digressions-neural-nets-part-1.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/202973263</guid>
      <title>Inferring Authorship (Part 2)</title>
      <pubDate>Tue, 28 Apr 2015 16:56:24 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/inferring-authorship-part-2</link>
      <itunes:duration>00:14:04</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Now that we’re up to speed on the classic author ID problem (who wrote the unsigned Federalist Papers?), we move on to a couple more contemporary examples. 

First, J.K. Rowling was famously outed using computational linguistics (and Twitter) when she wrote a book under the pseudonym Robert Galbraith. 

Second, we’ll talk about a mystery that still endures--who is Satoshi Nakamoto? Satoshi is the mysterious person (or people) behind an extremely lucrative cryptocurrency (aka internet money) called Bitcoin; no one knows who he, she or they are, but we have plenty of writing samples in the form of whitepapers and Bitcoin forum posts. We’ll discuss some attempts to link Satoshi Nakamoto with a cryptocurrency expert and computer scientist named Nick Szabo; the links are tantalizing, but not a smoking gun. “Who is Satoshi” remains an example of attempted author identification where the threads are tangled, the conclusions inconclusive and the stakes high.</itunes:summary>
      <itunes:subtitle>Now that we’re up to speed on the classic author …</itunes:subtitle>
      <description>Now that we’re up to speed on the classic author ID problem (who wrote the unsigned Federalist Papers?), we move on to a couple more contemporary examples. 

First, J.K. Rowling was famously outed using computational linguistics (and Twitter) when she wrote a book under the pseudonym Robert Galbraith. 

Second, we’ll talk about a mystery that still endures--who is Satoshi Nakamoto? Satoshi is the mysterious person (or people) behind an extremely lucrative cryptocurrency (aka internet money) called Bitcoin; no one knows who he, she or they are, but we have plenty of writing samples in the form of whitepapers and Bitcoin forum posts. We’ll discuss some attempts to link Satoshi Nakamoto with a cryptocurrency expert and computer scientist named Nick Szabo; the links are tantalizing, but not a smoking gun. “Who is Satoshi” remains an example of attempted author identification where the threads are tangled, the conclusions inconclusive and the stakes high.</description>
      <enclosure length="20271471" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/202973263-linear-digressions-inferring-authorship-part-2.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/201113470</guid>
      <title>Inferring Authorship (Part 1)</title>
      <pubDate>Thu, 16 Apr 2015 17:25:21 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/authorship-part-1</link>
      <itunes:duration>00:08:51</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>This episode is inspired by one of our projects for Intro to Machine Learning: given a writing sample, can you use machine learning to identify who wrote it? It turns out the answer is yes: a person’s writing style is as distinctive as their vocal inflection or their gait. 

By tracing the vocabulary used in a given piece, and comparing those word choices to the ones in writing samples whose authors we know, it can become surprisingly clear who most likely wrote a given piece of text. 

We’ll use a seminal paper from the 1960s as our example here, where the Naive Bayes algorithm was used to determine whether Alexander Hamilton or James Madison was the more likely author of a number of anonymous Federalist Papers.</itunes:summary>
      <itunes:subtitle>This episode is inspired by one of our projects f…</itunes:subtitle>
      <description>This episode is inspired by one of our projects for Intro to Machine Learning: given a writing sample, can you use machine learning to identify who wrote it? It turns out the answer is yes: a person’s writing style is as distinctive as their vocal inflection or their gait. 

By tracing the vocabulary used in a given piece, and comparing those word choices to the ones in writing samples whose authors we know, it can become surprisingly clear who most likely wrote a given piece of text. 
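
To make that concrete, here is a toy sketch of the word-frequency idea in Python. The "known" samples below are invented stand-ins, not the actual Federalist texts; the mechanics--per-author word frequencies scored with a Naive Bayes log-likelihood--are the part that carries over.

```python
# Toy author identification via Naive Bayes over word frequencies.
# The writing samples here are invented for illustration.
import math
from collections import Counter

def word_probs(sample, vocab, smoothing=1.0):
    """Smoothed probability of each vocabulary word in a writing sample."""
    counts = Counter(sample.split())
    total = sum(counts.values()) + smoothing * len(vocab)
    return {w: (counts[w] + smoothing) / total for w in vocab}

def log_likelihood(text, probs):
    """Naive Bayes score: sum of log-probabilities of the words in the text."""
    return sum(math.log(probs[w]) for w in text.split() if w in probs)

known_a = "upon the whole there is upon reflection little doubt"
known_b = "while the people retain while in union their vigilance"
vocab = set(known_a.split()) | set(known_b.split())
probs_a = word_probs(known_a, vocab)
probs_b = word_probs(known_b, vocab)

mystery = "upon reflection upon the whole"
guess = "A" if log_likelihood(mystery, probs_a) > log_likelihood(mystery, probs_b) else "B"
print(guess)
```

With real documents you would use far more text and vocabulary, but the comparison works the same way: the author whose word-frequency profile gives the mystery text the higher likelihood is the more probable author.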

We’ll use a seminal paper from the 1960s as our example here, where the Naive Bayes algorithm was used to determine whether Alexander Hamilton or James Madison was the more likely author of a number of anonymous Federalist Papers.</description>
      <enclosure length="12747579" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/201113470-linear-digressions-authorship-part-1.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/199550270</guid>
      <title>Statistical Mistakes and the Challenger Disaster</title>
      <pubDate>Mon, 06 Apr 2015 19:36:56 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/statistical-mistakes-and-the-challenger-disaster</link>
      <itunes:duration>00:13:09</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>After the Challenger exploded in 1986, killing all 7 astronauts aboard, an investigation into the cause was immediately launched. 

In the cold temperatures the night before the launch, the O-rings that seal the joints of the rocket boosters became inflexible, so they did not seal properly, which led to the fuel tank explosion. NASA knew that there could be O-ring problems, but performed the analysis of their data incorrectly and ended up massively underestimating the risk associated with the cold temperatures. 

In this episode, we'll unpack the mistakes they made. We'll talk about how they excluded data points that they thought were irrelevant but which actually were critical to recognizing a fatal pattern.</itunes:summary>
      <itunes:subtitle>After the Challenger exploded in 1986, killing al…</itunes:subtitle>
      <description>After the Challenger exploded in 1986, killing all 7 astronauts aboard, an investigation into the cause was immediately launched. 

In the cold temperatures the night before the launch, the O-rings that seal the joints of the rocket boosters became inflexible, so they did not seal properly, which led to the fuel tank explosion. NASA knew that there could be O-ring problems, but performed the analysis of their data incorrectly and ended up massively underestimating the risk associated with the cold temperatures. 

In this episode, we'll unpack the mistakes they made. We'll talk about how they excluded data points that they thought were irrelevant but which actually were critical to recognizing a fatal pattern.</description>
      <enclosure length="18947376" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/199550270-linear-digressions-statistical-mistakes-and-the-challenger-disaster.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/197641507</guid>
      <title>Genetics and Um Detection (HMM Part 2)</title>
      <pubDate>Wed, 25 Mar 2015 17:29:32 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/genetics-and-um-detection-part-2</link>
      <itunes:duration>00:14:49</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>In part two of our series on Hidden Markov Models (HMMs), we talk to Katie and special guest Francesco about more useful and novel applications of HMMs. We revisit Katie's "Um Detector," and hear about how HMMs are used in genetics research.</itunes:summary>
      <itunes:subtitle>In part two of our series on Hidden Markov Models…</itunes:subtitle>
      <description>In part two of our series on Hidden Markov Models (HMMs), we talk to Katie and special guest Francesco about more useful and novel applications of HMMs. We revisit Katie's "Um Detector," and hear about how HMMs are used in genetics research.</description>
      <enclosure length="21344163" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/197641507-linear-digressions-genetics-and-um-detection-part-2.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/197459594</guid>
      <title>Introducing Hidden Markov Models (HMM Part 1)</title>
      <pubDate>Tue, 24 Mar 2015 15:57:03 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/introducing-hidden-markov-models</link>
      <itunes:duration>00:14:54</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Wikipedia says, "A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states." What does that even mean?

In part one of a special two-parter on HMMs, Katie, Ben, and special guest Francesco explain the basics of HMMs, and some simple applications of them in the real world. This episode sets the stage for part two, where we explore the use of HMMs in Modern Genetics, and possibly Katie's "Um Detector."</itunes:summary>
      <itunes:subtitle>Wikipedia says, "A hidden Markov model (HMM) is a…</itunes:subtitle>
      <description>Wikipedia says, "A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states." What does that even mean?

In part one of a special two-parter on HMMs, Katie, Ben, and special guest Francesco explain the basics of HMMs, and some simple applications of them in the real world. This episode sets the stage for part two, where we explore the use of HMMs in Modern Genetics, and possibly Katie's "Um Detector."</description>
      <enclosure length="21472059" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/197459594-linear-digressions-introducing-hidden-markov-models.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/195590647</guid>
      <title>Monte Carlo For Physicists</title>
      <pubDate>Thu, 12 Mar 2015 23:18:01 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/monte-carlo-for-physicists</link>
      <itunes:duration>00:08:13</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>This is another physics-centered podcast, about an ML-backed particle identification tool that we use to figure out what kind of particle caused a particular blob in the detector. But in this case, as in many cases, it looks hard at the outset to use ML because we don't have labeled training data. Monte Carlo to the rescue! 

Monte Carlo (MC) data is simulated data that we generate for ourselves, usually following certain sets of rules (often a Markov chain; in physics we generate MC according to the laws of physics as we understand them). Since we generated each event ourselves, we "know" what the correct label is. 

Of course, it's a lot of work to validate your MC, but the payoff is that then you can use Machine Learning where you never could before.</itunes:summary>
      <itunes:subtitle>This is another physics-centered podcast, about a…</itunes:subtitle>
      <description>This is another physics-centered podcast, about an ML-backed particle identification tool that we use to figure out what kind of particle caused a particular blob in the detector. But in this case, as in many cases, it looks hard at the outset to use ML because we don't have labeled training data. Monte Carlo to the rescue! 

Monte Carlo (MC) data is simulated data that we generate for ourselves, usually following certain sets of rules (often a Markov chain; in physics we generate MC according to the laws of physics as we understand them). Since we generated each event ourselves, we "know" what the correct label is. 
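
A toy sketch of that workflow (the particle kinds and the "detector response" rules here are invented, not a real physics simulation):

```python
# Generate labeled events from known rules, so the truth label is free.
import random

def generate_event(kind):
    # Invented detector response: each particle kind leaves a blob whose
    # size is drawn from a different distribution.
    if kind == "electron":
        return {"blob_size": random.gauss(2.0, 0.5), "label": "electron"}
    return {"blob_size": random.gauss(5.0, 1.0), "label": "muon"}

random.seed(0)
events = [generate_event(random.choice(["electron", "muon"]))
          for _ in range(1000)]

# Because we generated the events ourselves, every one comes labeled,
# and the sample can train an ordinary supervised classifier.
labeled = [(e["blob_size"], e["label"]) for e in events]
```

The real work, as the episode notes, is validating that your simulated events actually look like the real detector's events before trusting a classifier trained on them.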

Of course, it's a lot of work to validate your MC, but the payoff is that then you can use Machine Learning where you never could before.</description>
      <enclosure length="11842279" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/195590647-linear-digressions-monte-carlo-for-physicists.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/194262842</guid>
      <title>Random Kanye</title>
      <pubDate>Wed, 04 Mar 2015 23:04:45 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/random-kanye</link>
      <itunes:duration>00:08:44</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Ever feel like you could randomly assemble words from a certain vocabulary and make semi-coherent Kanye West lyrics? Or technical documentation, imitations of local newscasters, your politically outspoken uncle, etc.? Wonder no more, there's a way to do exactly this kind of thing: it's called a Markov Chain, and it's one of the most powerful ways to generate made-up data that you can then use for fun and profit. The idea behind a Markov Chain is that you probabilistically generate a sequence of steps, numbers, words, etc., where each next step/number/word depends only on the previous one, which makes the sequence fast and efficient to generate computationally. Usually Markov Chains are put to serious academic purposes, but this ain't one of them: here they're used to randomly generate rap lyrics based on Kanye West lyrics.</itunes:summary>
      <itunes:subtitle>Ever feel like you could randomly assemble words …</itunes:subtitle>
      <description>Ever feel like you could randomly assemble words from a certain vocabulary and make semi-coherent Kanye West lyrics? Or technical documentation, imitations of local newscasters, your politically outspoken uncle, etc.? Wonder no more, there's a way to do exactly this kind of thing: it's called a Markov Chain, and it's one of the most powerful ways to generate made-up data that you can then use for fun and profit. The idea behind a Markov Chain is that you probabilistically generate a sequence of steps, numbers, words, etc., where each next step/number/word depends only on the previous one, which makes the sequence fast and efficient to generate computationally. Usually Markov Chains are put to serious academic purposes, but this ain't one of them: here they're used to randomly generate rap lyrics based on Kanye West lyrics.</description>
      <enclosure length="12581440" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/194262842-linear-digressions-random-kanye.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/192998182</guid>
      <title>Lie Detectors</title>
      <pubDate>Wed, 25 Feb 2015 18:20:51 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/lie-detectors</link>
      <itunes:duration>00:09:17</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Often machine learning discussions center around algorithms, or features, or datasets--this one centers around interpretation, and ethics. 

Suppose you could use a technology like fMRI to see what regions of a person's brain are active when they answer questions. And also suppose that you could run trials where you watch their brain activity while they lie about some minor issue (say, whether the card in their hand is a spade or a club)--could you use machine learning to analyze those images, and use the patterns in them for lie detection? Well you certainly can try, and indeed researchers have done just that. 

There are important problems though--the images of brains can be high-variance, meaning that for any given person, there might not be a lot of certainty about whether they're lying or not. It's also open to debate whether the training set (in this case, test subjects with playing cards in their hands) really generalizes well to the more important cases, like a person accused of a crime. 

So while machine learning has yielded some impressive gains in lie detection, it is not a solution to these thornier scientific issues.

http://www.amacad.org/pdfs/deceit.pdf</itunes:summary>
      <itunes:subtitle>Often machine learning discussions center around …</itunes:subtitle>
      <description>Often machine learning discussions center around algorithms, or features, or datasets--this one centers around interpretation, and ethics. 

Suppose you could use a technology like fMRI to see what regions of a person's brain are active when they answer questions. And also suppose that you could run trials where you watch their brain activity while they lie about some minor issue (say, whether the card in their hand is a spade or a club)--could you use machine learning to analyze those images, and use the patterns in them for lie detection? Well you certainly can try, and indeed researchers have done just that. 

There are important problems though--the images of brains can be high-variance, meaning that for any given person, there might not be a lot of certainty about whether they're lying or not. It's also open to debate whether the training set (in this case, test subjects with playing cards in their hands) really generalizes well to the more important cases, like a person accused of a crime. 

So while machine learning has yielded some impressive gains in lie detection, it is not a solution to these thornier scientific issues.

http://www.amacad.org/pdfs/deceit.pdf</description>
      <enclosure length="13385175" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/192998182-linear-digressions-lie-detectors.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/191672068</guid>
      <title>The Enron Dataset</title>
      <pubDate>Mon, 09 Feb 2015 00:00:00 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/the-enron-dataset</link>
      <itunes:duration>00:12:27</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>In 2000, Enron was one of the largest companies in the world, praised far and wide for its innovations in energy distribution and many other markets.  By 2002, it was apparent that many bad apples had been cooking the books, and billions of dollars and thousands of jobs disappeared.  

In the aftermath, surprisingly, one of the greatest datasets in all of machine learning was born--the Enron emails corpus.  Hundreds of thousands of emails amongst top executives were made public; there's no realistic chance any dataset like this will ever be made public again.  

But the dataset that was released has gone on to immortality, serving as the basis for a huge variety of advances in machine learning and other fields. 

http://www.technologyreview.com/news/515801/the-immortal-life-of-the-enron-e-mails/</itunes:summary>
      <itunes:subtitle>In 2000, Enron was one of the largest and compani…</itunes:subtitle>
      <description>In 2000, Enron was one of the largest companies in the world, praised far and wide for its innovations in energy distribution and many other markets.  By 2002, it was apparent that many bad apples had been cooking the books, and billions of dollars and thousands of jobs disappeared.  

In the aftermath, surprisingly, one of the greatest datasets in all of machine learning was born--the Enron emails corpus.  Hundreds of thousands of emails amongst top executives were made public; there's no realistic chance any dataset like this will ever be made public again.  

But the dataset that was released has gone on to immortality, serving as the basis for a huge variety of advances in machine learning and other fields. 

http://www.technologyreview.com/news/515801/the-immortal-life-of-the-enron-e-mails/</description>
      <enclosure length="17944901" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/191672068-linear-digressions-the-enron-dataset.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/189385510</guid>
      <title>Labels and Where To Find Them</title>
      <pubDate>Wed, 04 Feb 2015 02:30:47 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/labels-and-where-to-find-them</link>
      <itunes:duration>00:13:15</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Supervised classification is built on the backs of labeled datasets, but a good set of labels can be hard to find.  Great data is everywhere, but the corresponding labels can sometimes be really tricky.  Take a few examples we've already covered, like lie detection with an MRI machine (you have to take pictures of someone's brain while they try to lie, not a trivial task) or automated image captioning (so many images!  so many valid labels!).  

In this episode, we'll dig into this topic in depth, talking about some of the standard ways to get a labeled dataset if your project requires labels and you don't already have them.

www.higgshunters.org</itunes:summary>
      <itunes:subtitle>Supervised classification is built on the backs o…</itunes:subtitle>
      <description>Supervised classification is built on the backs of labeled datasets, but a good set of labels can be hard to find.  Great data is everywhere, but the corresponding labels can sometimes be really tricky.  Take a few examples we've already covered, like lie detection with an MRI machine (you have to take pictures of someone's brain while they try to lie, not a trivial task) or automated image captioning (so many images!  so many valid labels!).  

In this episode, we'll dig into this topic in depth, talking about some of the standard ways to get a labeled dataset if your project requires labels and you don't already have them.

www.higgshunters.org</description>
      <enclosure length="19099095" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/189385510-linear-digressions-labels-and-where-to-find-them.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/187530821</guid>
      <title>Um Detector 1</title>
      <pubDate>Fri, 23 Jan 2015 20:16:12 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/um-detector-1</link>
      <itunes:duration>00:13:19</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>So, um... what about machine learning for audio applications?  In the course of starting this podcast, we've edited out a lot of "um"s from our raw audio files.  It's now gotten to the point that, when we see the waveform in soundstudio, we can almost identify an "um" by eye.  That makes it an interesting problem for machine learning--is there a way we can train an algorithm to recognize the "um" pattern, too?  This has become a little side project for Katie, which is very much still a work in progress.  We'll talk about what's been accomplished so far, some design choices Katie made in getting the project off the ground, and (of course) mistakes made and hopefully corrected.  We always say that the best way to learn something is by doing it, and this is our chance to try our own machine learning project instead of just telling you about what someone else did! </itunes:summary>
      <itunes:subtitle>So, um... what about machine learning for audio a…</itunes:subtitle>
      <description>So, um... what about machine learning for audio applications?  In the course of starting this podcast, we've edited out a lot of "um"s from our raw audio files.  It's now gotten to the point that, when we see the waveform in soundstudio, we can almost identify an "um" by eye.  That makes it an interesting problem for machine learning--is there a way we can train an algorithm to recognize the "um" pattern, too?  This has become a little side project for Katie, which is very much still a work in progress.  We'll talk about what's been accomplished so far, some design choices Katie made in getting the project off the ground, and (of course) mistakes made and hopefully corrected.  We always say that the best way to learn something is by doing it, and this is our chance to try our own machine learning project instead of just telling you about what someone else did! </description>
      <enclosure length="19189375" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/187530821-linear-digressions-um-detector-1.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/184829398</guid>
      <title>Better Facial Recognition with Fisherfaces</title>
      <pubDate>Wed, 07 Jan 2015 01:33:50 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/better-facial-recognition-with-fisherfaces</link>
      <itunes:duration>00:11:56</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Now that we know about eigenfaces (if you don't, listen to the previous episode), let's talk about how it breaks down. 

Variations that are trivial to humans when identifying faces can really mess up computer-driven facial ID--expressions, lighting, and angle are a few. One thing that can easily happen is that an algorithm optimizes to identify one of those traits rather than the underlying question of whether it's the same person (for example, if the training image is me smiling, the algorithm may reject an image of me frowning but accidentally approve an image of another woman smiling). 

Fisherfaces uses a Fisher linear discriminant to project onto the dimension in the data that best separates the classes (large between-class distance, small within-class scatter), rather than the dimension that maximizes the variation overall (we'll unpack this statement), and it is much more robust than our pal eigenfaces when there are shadows, cut-off images, expressions, etc.

http://www.cs.columbia.edu/~belhumeur/journal/fisherface-pami97.pdf</itunes:summary>
      <itunes:subtitle>Now that we know about eigenfaces (if you don't, …</itunes:subtitle>
      <description>Now that we know about eigenfaces (if you don't, listen to the previous episode), let's talk about how it breaks down. 

Variations that are trivial to humans when identifying faces can really mess up computer-driven facial ID--expressions, lighting, and angle are a few. One thing that can easily happen is that an algorithm optimizes to identify one of those traits rather than the underlying question of whether it's the same person (for example, if the training image is me smiling, the algorithm may reject an image of me frowning but accidentally approve an image of another woman smiling). 

Fisherfaces uses a Fisher linear discriminant to project onto the dimension in the data that best separates the classes (large between-class distance, small within-class scatter), rather than the dimension that maximizes the variation overall (we'll unpack this statement), and it is much more robust than our pal eigenfaces when there are shadows, cut-off images, expressions, etc.

http://www.cs.columbia.edu/~belhumeur/journal/fisherface-pami97.pdf</description>
      <enclosure length="17193828" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/184829398-linear-digressions-better-facial-recognition-with-fisherfaces.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/184829050</guid>
      <title>Facial Recognition with Eigenfaces</title>
      <pubDate>Wed, 07 Jan 2015 01:30:40 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/facial-recognition-with-eigenfaces</link>
      <itunes:duration>00:10:01</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>A true classic topic in ML: Facial recognition is very high-dimensional, meaning that each picture can have millions of pixels, each of which can be a single feature. It's computationally expensive to deal with all these features, and it invites overfitting problems. PCA (principal components analysis) is a classic dimensionality reduction tool that compresses these many dimensions into the few that contain the most variation in the data, and those principal components are often then fed into a classic ML algorithm like an SVM. 

One of the best things about eigenfaces is the great example code that you can find in sklearn--you can distinguish pictures of world leaders yourself in just a few minutes!

http://scikit-learn.org/stable/auto_examples/applications/face_recognition.html</itunes:summary>
      <itunes:subtitle>A true classic topic in ML: Facial recognition is…</itunes:subtitle>
      <description>A true classic topic in ML: Facial recognition is very high-dimensional, meaning that each picture can have millions of pixels, each of which can be a single feature. It's computationally expensive to deal with all these features, and it invites overfitting problems. PCA (principal components analysis) is a classic dimensionality reduction tool that compresses these many dimensions into the few that contain the most variation in the data, and those principal components are often then fed into a classic ML algorithm like an SVM. 
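
A rough sketch of that pipeline, on random stand-in data rather than real face images (the sklearn demo linked below uses the Labeled Faces in the Wild set): flatten each image, compress with PCA, then classify the compressed features with an SVM.

```python
# Eigenfaces-style pipeline on synthetic stand-in data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_images, h, w = 200, 32, 32
X = rng.normal(size=(n_images, h * w))   # stand-in for flattened face images
y = rng.integers(0, 2, size=n_images)    # stand-in identity labels

pca = PCA(n_components=20, whiten=True).fit(X)  # keep the top 20 "eigenfaces"
X_pca = pca.transform(X)                        # 1024 features down to 20
clf = SVC(kernel="rbf").fit(X_pca, y)           # classify in the reduced space
print(X_pca.shape)
```

On random noise the classifier learns nothing meaningful, of course; swap in real face images and labels and this is essentially the structure of the sklearn demo.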

One of the best things about eigenfaces is the great example code that you can find in sklearn--you can distinguish pictures of world leaders yourself in just a few minutes!
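
The PCA-then-SVM pipeline described above can be sketched in a few lines (a quick illustration, not the episode's own code: random data stands in for real face pixels, and the 64x64 image size, 50 components, and two-person labels are all hypothetical choices):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4096))   # 200 "images", 64x64 pixels flattened into features
y = rng.integers(0, 2, size=200)   # labels for two hypothetical people

# Compress 4096 pixel features down to 50 principal components,
# then classify in that reduced space with an SVM.
model = make_pipeline(PCA(n_components=50), SVC(kernel="rbf"))
model.fit(X, y)
print(model.predict(X[:5]))
```

The sklearn example linked below follows the same structure, but on real labeled photos of world leaders.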

http://scikit-learn.org/stable/auto_examples/applications/face_recognition.html</description>
      <enclosure length="14435298" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/184829050-linear-digressions-facial-recognition-with-eigenfaces.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/181859732</guid>
      <title>Stats of World Series Streaks</title>
      <pubDate>Wed, 17 Dec 2014 00:41:39 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/stats-of-world-series-streaks</link>
      <itunes:duration>00:12:34</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Baseball is characterized by a high level of equality between teams; even the best teams might only have 55% win percentages (contrast this with college football, where teams go undefeated pretty regularly). In this regime, where 2 outcomes (Giants win/Giants lose) are approximately equally likely, we can model the win/loss chances with a binomial distribution. 

Using the binomial distribution, we can calculate an interesting little result: what's the chance of the world series going to only 4 games? 5? 6? All the way to 7? Then we can compare to decades' worth of world series data, to see how well the data follows the binomial assumption. 

The result tells us a lot about sports psychology--if each game were independent of the others, the binomial model says a 4-game sweep should be the rarest outcome (12.5%) and 6- or 7-game series the most common (31.25% each). The data shows a different trend: 4- and 7-game series happen significantly more often than the model predicts. There's a powerful psychological effect at play--everybody loves the 7th game of the world series, or a good sweep. And it turns out that the baseball teams, whether they intend it or not, oblige our love of short (4) and long (7) world series!

http://blog.philbirnbaum.com/2007/06/winning-world-series-in-x-games.html</itunes:summary>
      <itunes:subtitle>Baseball is characterized by a high level of equa…</itunes:subtitle>
      <description>Baseball is characterized by a high level of equality between teams; even the best teams might only have 55% win percentages (contrast this with college football, where teams go undefeated pretty regularly). In this regime, where 2 outcomes (Giants win/Giants lose) are approximately equally likely, we can model the win/loss chances with a binomial distribution. 

Using the binomial distribution, we can calculate an interesting little result: what's the chance of the world series going to only 4 games? 5? 6? All the way to 7? Then we can compare to decades' worth of world series data, to see how well the data follows the binomial assumption. 
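
That calculation fits in a few lines (a quick sketch to accompany the episode, not from it): the series ends in exactly n games when the winner takes game n plus exactly 3 of the first n-1.

```python
from math import comb

def p_series_length(n, p=0.5):
    """Probability a best-of-7 series ends in exactly n games (4..7),
    when one team wins each game independently with probability p."""
    # Winner takes game n plus exactly 3 of the first n-1 games;
    # add the two cases (either team can be the winner).
    win_a = comb(n - 1, 3) * p**4 * (1 - p)**(n - 4)
    win_b = comb(n - 1, 3) * (1 - p)**4 * p**(n - 4)
    return win_a + win_b

for n in range(4, 8):
    print(n, p_series_length(n))  # 0.125, 0.25, 0.3125, 0.3125
```

With evenly matched teams (p = 0.5), the model expects sweeps to be the rarest outcome and 6- or 7-game series the most common--which is exactly the baseline the real-world data gets compared against.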

The result tells us a lot about sports psychology--if each game were independent of the others, the binomial model says a 4-game sweep should be the rarest outcome (12.5%) and 6- or 7-game series the most common (31.25% each). The data shows a different trend: 4- and 7-game series happen significantly more often than the model predicts. There's a powerful psychological effect at play--everybody loves the 7th game of the world series, or a good sweep. And it turns out that the baseball teams, whether they intend it or not, oblige our love of short (4) and long (7) world series!

http://blog.philbirnbaum.com/2007/06/winning-world-series-in-x-games.html</description>
      <enclosure length="12077765" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/181859732-linear-digressions-stats-of-world-series-streaks.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/178774107</guid>
      <title>Computers Try to Tell Jokes</title>
      <pubDate>Wed, 26 Nov 2014 18:59:56 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/linear-digressions-computers-try-to-tell-jokes</link>
      <itunes:duration>00:09:08</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Computers are capable of many impressive feats, but making you laugh is usually not one of them. Or could it be? This episode will talk about a custom-built machine learning algorithm that searches through text and writes jokes based on what it finds. 

The jokes are formulaic: they're all of the form "I like my X like I like my Y: Z" where X and Y are nouns, and Z is an adjective that can describe both X and Y. For (dumb) example, "I like my men like I like my coffee: steaming hot." The joke is funny when ZX and ZY are both very common phrases, but X and Y are rarely seen together. 

So, given a large enough corpus of text, the algorithm looks for triplets of words that fit this description and writes jokes based on them. Are the jokes funny? You be the judge...

http://homepages.inf.ed.ac.uk/s0894589/petrovic13unsupervised.pdf</itunes:summary>
      <itunes:subtitle>Computers are capable of many impressive feats, b…</itunes:subtitle>
      <description>Computers are capable of many impressive feats, but making you laugh is usually not one of them. Or could it be? This episode will talk about a custom-built machine learning algorithm that searches through text and writes jokes based on what it finds. 

The jokes are formulaic: they're all of the form "I like my X like I like my Y: Z" where X and Y are nouns, and Z is an adjective that can describe both X and Y. For (dumb) example, "I like my men like I like my coffee: steaming hot." The joke is funny when ZX and ZY are both very common phrases, but X and Y are rarely seen together. 

So, given a large enough corpus of text, the algorithm looks for triplets of words that fit this description and writes jokes based on them. Are the jokes funny? You be the judge...
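
A toy sketch of that triplet search (the word counts here are made-up stand-ins for real corpus statistics, and this scoring function is a simplified illustration, not the paper's actual model): score (X, Y, Z) higher when "Z X" and "Z Y" are both common phrases but X and Y rarely appear together.

```python
from itertools import combinations

# Hypothetical corpus counts: how often adjective Z modifies noun X...
phrase_counts = {
    ("hot", "coffee"): 500, ("hot", "men"): 80, ("hot", "soup"): 300,
    ("strong", "coffee"): 400, ("strong", "men"): 350, ("strong", "opinions"): 200,
}
# ...and how often two nouns co-occur in the same sentence.
cooccur = {("coffee", "men"): 2, ("coffee", "soup"): 90, ("men", "opinions"): 60}

def score(x, y, z):
    # Reward common Z-X and Z-Y pairings; penalize nouns seen together often.
    zx = phrase_counts.get((z, x), 0)
    zy = phrase_counts.get((z, y), 0)
    xy = cooccur.get(tuple(sorted((x, y))), 0)
    return zx * zy / (xy + 1)

nouns = ["coffee", "men", "opinions", "soup"]
adjs = ["hot", "strong"]
best = max(((x, y, z) for x, y in combinations(nouns, 2) for z in adjs),
           key=lambda t: score(*t))
x, y, z = best
print(f"I like my {x} like I like my {y}: {z}.")
```

The real algorithm does the same thing at scale, with smarter statistics over a large corpus--but the skeleton is just this brute-force search for surprising triplets.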

http://homepages.inf.ed.ac.uk/s0894589/petrovic13unsupervised.pdf</description>
      <enclosure length="13173270" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/178774107-linear-digressions-linear-digressions-computers-try-to-tell-jokes.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/avatars-Qt8RJQAJnYlM5ez0-rSl9qw-original.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/173279159</guid>
      <title>How Outliers Helped Defeat Cholera</title>
      <pubDate>Sat, 22 Nov 2014 00:00:00 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/linear-digressions-choleric-outliers</link>
      <itunes:duration>00:10:54</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>In the 1850s, there were a lot of things we didn’t know yet: how to create an airplane, how to split an atom, or how to control the spread of a common but deadly disease: cholera.  

When a cholera outbreak in London killed scores of people, a doctor named John Snow used it as a chance to study whether the cause might be very small organisms that were spreading through the water supply (the prevailing theory at the time was miasma, or “bad air”).  By tracing the geography of all the deaths from the outbreak, Snow was practicing elementary data science--and stumbled upon one of history’s most famous outliers.  

In this episode, we’ll tell you more about this single data point, a case of cholera that cracked the case wide open for Snow and provided critical validation for the germ theory of disease.

http://en.wikipedia.org/wiki/1854_Broad_Street_cholera_outbreak</itunes:summary>
      <itunes:subtitle>In the 1850s, there were a lot of things we didn’…</itunes:subtitle>
      <description>In the 1850s, there were a lot of things we didn’t know yet: how to create an airplane, how to split an atom, or how to control the spread of a common but deadly disease: cholera.  

When a cholera outbreak in London killed scores of people, a doctor named John Snow used it as a chance to study whether the cause might be very small organisms that were spreading through the water supply (the prevailing theory at the time was miasma, or “bad air”).  By tracing the geography of all the deaths from the outbreak, Snow was practicing elementary data science--and stumbled upon one of history’s most famous outliers.  

In this episode, we’ll tell you more about this single data point, a case of cholera that cracked the case wide open for Snow and provided critical validation for the germ theory of disease.

http://en.wikipedia.org/wiki/1854_Broad_Street_cholera_outbreak</description>
      <enclosure length="15708611" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/173279159-linear-digressions-linear-digressions-choleric-outliers.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/artworks-000094738684-8zjfvu-t3000x3000.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item><item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/173277531</guid>
      <title>Hunting for the Higgs</title>
      <pubDate>Sun, 16 Nov 2014 00:00:00 +0000</pubDate>
      <link>https://soundcloud.com/linear-digressions/linear-digressions-hunting-for-the-higgs</link>
      <itunes:duration>00:10:16</itunes:duration>
      <itunes:author>Katie Malone</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:summary>Machine learning and particle physics go together like peanut butter and jelly--but this is a relatively new development.  

For many decades, physicists looked through their fairly large datasets using the laws of physics to guide their exploration; that tradition continues today, but as ever-larger datasets get made, machine learning becomes a more tractable way to deal with the deluge.  

With this in mind, ATLAS (one of the major experiments at CERN, the European Organization for Nuclear Research and home laboratory of the recently discovered Higgs boson) ran a machine learning contest over the summer, to see what advances could be found by opening up the dataset to non-physicists.  

The results were impressive--physicists are smart folks, but there are clearly plenty of advances still to be made as machine learning and physics learn from one another.  And who knows--maybe more Nobel prizes to win as well!

https://www.kaggle.com/c/higgs-boson</itunes:summary>
      <itunes:subtitle>Machine learning and particle physics go together…</itunes:subtitle>
      <description>Machine learning and particle physics go together like peanut butter and jelly--but this is a relatively new development.  

For many decades, physicists looked through their fairly large datasets using the laws of physics to guide their exploration; that tradition continues today, but as ever-larger datasets get made, machine learning becomes a more tractable way to deal with the deluge.  

With this in mind, ATLAS (one of the major experiments at CERN, the European Organization for Nuclear Research and home laboratory of the recently discovered Higgs boson) ran a machine learning contest over the summer, to see what advances could be found by opening up the dataset to non-physicists.  

The results were impressive--physicists are smart folks, but there are clearly plenty of advances still to be made as machine learning and physics learn from one another.  And who knows--maybe more Nobel prizes to win as well!

https://www.kaggle.com/c/higgs-boson</description>
      <enclosure length="14782622" type="audio/mpeg" url="https://feeds.soundcloud.com/stream/173277531-linear-digressions-linear-digressions-hunting-for-the-higgs.mp3"/>
      <itunes:image href="https://i1.sndcdn.com/artworks-000094739449-0dtgql-t3000x3000.jpg"/>
    <author>hello@lineardigressions.com (Katie Malone)</author><itunes:keywords>data,science,machine,learning,linear,digressions</itunes:keywords></item>
      </channel>
    </rss>