The post Detecting the inconsistency of BS appeared first on MetaSD.

DARPA put out a request for a BS detector for science. I responded with a strategy for combining the results of multiple models (using Mohammad Jalali’s multivariate meta-analysis with some supporting infrastructure like data archiving) to establish whether new findings are consistent with an existing body of knowledge.

DARPA didn’t bite. I have no idea why, but could speculate from the RFC that they had in mind something more like a big data approach that would use text analysis to evaluate claims. Hopefully not, because a text-only approach will have limited power. Here’s why.

First, here’s why DARPA’s idea has merit in general. In math and logic, as soon as you admit one false proposition (1=2), you can leverage that to prove lots of false stuff. As soon as you start detecting conflicting conclusions, you know you’re hot on the trail of some BS. Physics is a little trickier, because things are happening on many different scales, with a multitude of physical quantities to map to math, and sometimes ambiguous measurements, but it still works. In social sciences it’s still harder, because there isn’t even agreement about the underlying constructs being measured.

For this to work as it does in math, you have to have a reliable way of determining whether claims are compatible with one another. This is seldom binary (“raising the tax rate increases revenue, or not”). You have to control for a variety of influences (What kind of taxes? Was there a war?) and nonmonotonic or state-dependent effects (like the Laffer curve). Text descriptions of models and findings just don’t contain enough information to support that. I recently showed how causal loop diagrams and system archetypes are hopelessly underspecified models. Words are usually even less structured and more ambiguous. So, while there’s lots of potential for using algorithms to discover laws, the fodder for such discovery is big data, not big talk.

Consider the example of back radiation. There’s a cultish corner of the climate denial world that claims there is no greenhouse effect because the atmosphere can’t radiate longwave energy back to the surface. The idea has even made it into the swampy bottom tier of supposedly peer-reviewed journals. Many web pages, like http://nov79.com/gbwm/bacr.html (which I won’t dignify with a link) are dedicated to this idea. There are lots of absurdities on this site that would turn up red flags, even for a pretty dumb robot:

At the claimed global average surface temperature of the earth (15°C), the error in the Stefan-Boltzmann constant appears to be at least a factor of 30, and perhaps 50, as energy flows cannot be balanced properly with such high levels of radiation.

Not the least reason for the error is that Planck’s constant is used to derive the SBC, while there is no Planck’s constant, because the whole concept of photons is absurd and admittedly in conflict with the wave nature of light.

http://nov79.com/gbwm/sbc.html

And my favorite:

Disequilibrium is an impossibility.

http://nov79.com/gbwm/rte.html

But then, many statements are locally correct. The following is (roughly) true, if you’re thinking of mixing *stocks* of hot and cold gases. It’s just out of context, because the real world is not about that; it’s about energy stocks influenced by *flows* from radiation:

Total carbon dioxide is 400 parts per million in the atmosphere. That means 2,500 air molecules surrounding each CO2 molecule. To heat the air 1°C, each CO2 molecule would have to be 2,500°C

http://nov79.com/gbwm/heatg.html

Debunking such tripe often takes a lot more work and words than the original. To an algorithm that doesn’t have any underlying understanding of the problem space, real and fake science may look equally plausible, just as they do to a human with no subject matter expertise. That’s why you need the actual models and data, not just descriptions of the output of them, in order to perform quality checks, make, test and track predictions, and on the whole exercise the system to actively determine whether components are behaving coherently.

If there’s one thing algorithms might detect easily, it’s whether a paper uses models and data to make testable predictions at all. That’s an Achilles’ heel of climate skepticism in general, and certainly there is no coherent framework for the alternative reality on the nov79 site.

Even with a system of interlocking models, I think it may be tough to automatically discover anomalies. Consider this “proof” that a pulled ladder falls infinitely fast. Its math is correct. The logical steps to the conclusion are OK. So where’s the fault? It’s rather subtle. (I’ll leave it to you to work out.) Because this is physics, it’s easy to catch a whiff of BS in the notion that a physical object can move infinitely fast. But what if the subject matter was social dynamics? I think algorithms are a long way from having the kind of general intelligence needed to sniff out the problem.

If we had a BS detector, would it be influential? Certainly it would work for many users, but more broadly, I’m not sure. It’s increasingly evident that there’s no selection pressure against certain kinds of wrong beliefs. And if you can’t reject the specifics of a correct idea that you find distasteful, you can always posit a grand conspiracy or reject science and rational empiricism altogether.

The post Quizzaciously++ appeared first on MetaSD.

Google Trends reflects this. The week of the video’s release, there was a huge spike in interest, followed by a rapid decay, not all the way to zero, but to a slow simmer of interest.

The video discusses power laws as a model of memory. So … has the internet remembered the video according to a power law? Not exactly, but it certainly has a hint of one:

My guess is that the trajectory is modified by word-of-mouth processes that create sustained interest.

The post Thyroid Dynamics appeared first on MetaSD.

That was a conceptual model; this is a mathematical model. This is a Vensim replication of:

Marisa Eisenberg, Mary Samuels, and Joseph J. DiStefano III

Extensions, Validation, and Clinical Applications of a Feedback Control System Simulator of the Hypothalamo-Pituitary-Thyroid Axis

Background: We upgraded our recent feedback control system (FBCS) simulation model of human thyroid hormone (TH) regulation to include explicit representation of hypothalamic and pituitary dynamics, and updated TH distribution and elimination (D&E) parameters. This new model greatly expands the range of clinical and basic science scenarios explorable by computer simulation.

Methods: We quantified the model from pharmacokinetic (PK) and physiological human data and validated it comparatively against several independent clinical data sets. We then explored three contemporary clinical issues with the new model: …

… These results highlight how highly nonlinear feedback in the hypothalamic-pituitary-thyroid axis acts to maintain normal hormone levels, even with severely reduced TSH secretion.

THYROID

Volume 18, Number 10, 2008

DOI: 10.1089/thy.2007.0388

This version is a superset of the authors’ earlier 2006 model, and closely reproduces that with a few parameter changes.

L-T4 Bioequivalence and Hormone Replacement Studies via Feedback Control Simulations

THYROID

Volume 16, Number 12, 2006

The model is used in:

TSH-Based Protocol, Tablet Instability, and Absorption Effects on L-T4 Bioequivalence

THYROID

Volume 19, Number 2, 2009

DOI: 10.1089/thy.2008.0148

This works with any Vensim version:

The post Discrete Time Stinks appeared first on MetaSD.

That’s Samuelson’s multiplier-accelerator model. The same notation is ubiquitous in statistics, economics, ABM and many other areas.

So, what’s the problem?

- Most of the real world does not happen in discrete time. A few decisions, like electric power auctions, happen at regular intervals, but those are the exception. Most of the time we’re modeling on long time scales relative to underlying phenomena, and we have lots of heterogeneous agents or particles or whatever, with diverse delays and decision intervals.
- Discrete time can be artificially unstable. A stable continuous system can be made unstable by simulating at too large a discrete interval. A discrete system may oscillate, where its continuous equivalent would not.
- You can’t easily test for the effect of the time step on stability. Q: If your discrete time model is running with one Excel row per interval, how will you test an interval that’s 1/2 or 1/12 as big for comparison? A: You won’t. Even if it occurs to you to try, it would be too much of a pain.
- The measurement interval isn’t necessarily the relevant dynamic time scale. Often the time step of a discrete model derives from the measurement interval in the data. There’s nothing magic about that interval, with respect to how the system actually *works*.
- The notions of stocks and flows and system state are obscured. (See the diagram from the Samuelson model above.) Lack of stock flow consistency can lead to other problems, like failure to conserve physical quantities.
- Units are ambiguous. This is a consequence of #5. When states and their rates of change appear on an equal footing in an equation, it’s hard to work out what’s what. Discrete models tend to be littered with implicit time constants and other hidden parameters.
- Most delays aren’t discrete. In the Samuelson model, output depends on last year’s output. But why not last week’s, or last century’s? And why should a delay consist of precisely 3 periods, rather than be distributed over time? (This critique applies to some Delay Differential Equations, too.)
- Most logic isn’t discrete. When time is marching along merrily in discrete lockstep, it’s easy to get suckered into discrete thinking: “if the price of corn is lower than last year’s price of corn, buy hogs.” That might be a good model of one farmer, but it lacks nuance, and surely doesn’t represent the aggregate of diverse farmers. This is not a fault of discrete time per se, but the two often go hand in hand. (This is one of many flaws in the famous Levinthal & March model.)
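The artificial instability point in the list above is easy to demonstrate. Here is a minimal sketch, assuming a first-order decay dx/dt = -x/tau as the "true" continuous system; the function name, the value of tau and the step sizes are all illustrative choices of mine, not anything from the Samuelson model.

```python
def simulate_decay(tau, dt, t_end, x0=1.0):
    """Euler-integrate dx/dt = -x/tau and return the list of states."""
    x = x0
    path = [x]
    for _ in range(int(round(t_end / dt))):
        x = x + dt * (-x / tau)  # one discrete step of size dt
        path.append(x)
    return path

tau = 1.0  # hypothetical time constant

# Small step relative to tau: smooth decay, like the continuous system.
fine = simulate_decay(tau, dt=0.125, t_end=10.0)

# Step between tau and 2*tau: the state flips sign every step (spurious oscillation).
oscillating = simulate_decay(tau, dt=1.5, t_end=15.0)

# Step larger than 2*tau: the oscillation grows without bound.
unstable = simulate_decay(tau, dt=2.5, t_end=25.0)

print(fine[-1], oscillating[:4], unstable[-1])
```

The underlying system is identical in all three runs; only the interval changes. That is the test you can't easily run when the interval is baked into one spreadsheet row per period.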

Certainly, there are cases that require a discrete time simulation (here’s a nice chapter on analysis of such systems). But most of the time, a continuous approach is a better starting point, as Jay Forrester wrote 50 years ago. The best approach is sometimes a hybrid, with an undercurrent of continuous time for the “physics” of the model, but with measurement processes represented by explicit sampling at discrete intervals.

So, what if you find a skanky discrete time model in your analytic sock drawer? Fear not, you can convert it.

Consider the adstock model, representing the cumulative effects of advertising:

Ad Effect = f(Adstock)
Adstock(t) = Advertising(t) + k*Adstock(t-1)

Notice that k is related to the lifetime of advertising, but because it’s relative to the discrete interval, it’s misleadingly dimensionless. Also, the interval is fixed at 1 time unit, and can’t be changed without scaling k.

Also notice that the ad effect has an instantaneous component. Usually there’s some delay between ad exposure and action. That delay might be negligible in some cases, like in-app purchases, but it’s typically not negligible for in-store behavior.

You can translate this into Vensim lingo literally by using a discrete delay:

Adstock = Advertising + k*Previous Adstock ~ GRPs
Previous Adstock = DELAY FIXED( Adstock, Ad Life, 0 ) ~ GRPs
Ad Life = ... ~ weeks

That’s functional, but it’s not much of an improvement. Much better is to recognize that Adstock is (surprise!) a stock that changes over time:

Ad Effect = f(Adstock) ~ dimensionless
Adstock = INTEG( Advertising - Forgetting, 0 ) ~ GRPs
Advertising = ... ~ GRPs/week
Forgetting = Adstock / Ad Life ~ GRPs/week
Ad Life = ... ~ weeks

Now the ad life has a dimensioned real-world interpretation and you can simulate with whatever time step you need, independent of the parameters (as long as it’s small enough).
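For readers without Vensim, the stock-and-flow version can be sketched in Python with simple Euler integration. This is an illustration under assumptions of mine: a constant advertising rate of 100 GRPs/week, a 4-week ad life, and the names `simulate_adstock` and `constant_ads`. The mapping from Ad Life to the discrete k is one plausible choice (matching the decay over one interval), not something the original model specifies.

```python
import math

def simulate_adstock(advertising, ad_life, dt, t_end):
    """Euler integration of d(Adstock)/dt = Advertising - Adstock / Ad Life."""
    adstock = 0.0  # GRPs
    for i in range(int(round(t_end / dt))):
        t = i * dt
        forgetting = adstock / ad_life               # GRPs/week
        adstock += dt * (advertising(t) - forgetting)
    return adstock

def constant_ads(t):
    """Hypothetical advertising flow, GRPs/week."""
    return 100.0

ad_life = 4.0  # weeks (assumed)

# Same parameters, three different time steps (weeks).
results = {dt: simulate_adstock(constant_ads, ad_life, dt, t_end=52.0)
           for dt in (1.0, 0.25, 0.0625)}

# One plausible discrete equivalent for a 1-week interval: the fraction of
# adstock that survives one week of forgetting.
k_weekly = math.exp(-1.0 / ad_life)

print(results, k_weekly)
```

With constant advertising, every run settles near Advertising × Ad Life regardless of the step, which is the sense in which the parameters are independent of the time step.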

There’s one fly in the ointment: the instantaneous ad effect I mentioned above. That happens when, for example, the data interval is weekly, and ads released have some effect within their week of release – the Monday sales flyer drives weekend sales, for example.

There are two solutions for this:

- The “cheat” is to include a bit of the current flow of advertising in the *effective* adstock, via a “current week effect” parameter. This is a little tricky, because it locks you into the weekly time step. You can generalize that away at the cost of more complexity in the equations.
- A more fundamental solution is to run the model at a finer time step than the data interval. This gives you a cleaner model, and you lose nothing with respect to calibration (in Vensim/Ventity at least).

Occasionally, you’ll run into more than one delayed state on the right side of the equation, as with the inclusion of Y(t-1) and Y(t-2) in the Samuelson model (top). That generally signals either a delay with a complex structure (e.g., 2nd or higher order), or some other higher-order effect. Generally, you should be able to give a name and interpretation to these states (as with the construction of Y and C in the Samuelson model). If you can’t, don’t pull your hair out. It could be that the original is ill-formulated. Instead, think things through from scratch with stocks and flows in mind.

The post The Dynamics of Initiative Success appeared first on MetaSD.

A Simulation-Based Approach to Understanding the Dynamics of Innovation Implementation

The history of management practice is filled with innovations that failed to live up to the promise suggested by their early success. A paradox currently facing organizational theory is that the failure of these innovations often cannot be attributed to an intrinsic lack of efficacy. To resolve this paradox, in this paper I study the process of innovation implementation. Working from existing theoretical frameworks, I synthesize a model that describes the process through which participants in an organization develop commitment to using a newly adopted innovation. I then translate that framework into a formal model and analyze it using computer simulation. The analysis suggests three new constructs—reversion, regeneration, and the motivation threshold—characterizing the dynamics of implementation. Taken together, the constructs provide an internally consistent theory of how seemingly rational decision rules can create the apparent paradox of innovations that generate early results but fail to produce sustained benefit.

An earlier version is online here.

This is another nice example of tipping points. In this case, an initiative must demonstrate enough early success to grow its support base. If it succeeds, word of mouth takes its commitment level to 100%. If not, the positive feedbacks run as vicious cycles, and the initiative fails.

When initiatives compete for scarce resources, this creates a success to the successful dynamic, in which an initiative that demonstrates early success attracts more support, grows commitment faster, and thereby demonstrates more success.

This version is in Ventity, in order to make it easier to handle multiple competing initiatives, with each as a discrete entity. One initialization dataset for the model creates initiatives at random intervals, with success contingent on the environment (other initiatives) prevailing at the time of launch:

This archive contains two versions of the model: “Intervention2” is the first in the paper, with no resource competition. “Intervention5” is the second, with multiple competing initiatives.

The post The CO2 record is no surprise appeared first on MetaSD.

Here’s the data. The CO2 concentration at Mauna Loa has increased steadily since records began in 1958. Superimposed on the trend is a seasonal oscillation, which you can remove by a moving average over a 12-month window (red):

In a noiseless system driven by increasing emissions, you’d expect every year to be a concentration record, and that’s nearly true here. Almost 99% of 12-month intervals exceed all previous records.

If you look at the year-on-year difference in monthly concentrations, you can see that not only is the concentration rising, but the rate of increase is increasing as well:

This first difference is noisier, but consistent. As a natural consequence, you’d expect a typical point to be higher than any average of the interval preceding.

In other words, a record concentration coinciding with a record increase is not unusual, dynamically or statistically. Until emissions decline significantly, news outlets might as well post a standing item to this effect.

The CO2 concentration trajectory is, incidentally, closer to parabolic than to exponential. That’s because emissions have risen more or less linearly in recent decades:

*CO2 emissions, GtC/yr*

CO2 concentration (roughly) integrates emissions, so if emissions = c1*time, concentration = c2*time^2 is expected. The cause for concern here is that a peak in the rate of increase has occurred at a time with flat emissions for a few years, signalling that saturation of natural sinks may be to blame. I think it’s premature to draw that conclusion, given the level of noise in the system. But sooner or later our luck will run out, so reducing emissions is as important as ever.
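The parabolic claim is just integration of a ramp, and a toy calculation makes it visible. This sketch assumes concentration is a pure accumulation of emissions, with no natural uptake and no unit conversion; the slope `c1` and the function name are arbitrary illustrative choices, not fitted to the Mauna Loa or emissions data.

```python
def integrate_emissions(c1, years, dt=1.0):
    """Accumulate emissions = c1 * time into a concentration-like stock."""
    conc = [0.0]
    for i in range(int(round(years / dt))):
        t = i * dt
        conc.append(conc[-1] + dt * c1 * t)  # stock += flow * dt
    return conc

conc = integrate_emissions(c1=0.1, years=50)

# The year-on-year increase rises linearly...
first_diff = [b - a for a, b in zip(conc, conc[1:])]

# ...so its own increase is constant, the signature of a parabola.
second_diff = [b - a for a, b in zip(first_diff, first_diff[1:])]

print(first_diff[:5], second_diff[:5])
```

An exponential would instead show a second difference that keeps growing, which is one quick way to tell the two shapes apart in a noisy record.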

After emissions do peak, you’d expect CO2 difference records to become rare. However, for CO2 concentrations to stop setting records requires that emissions fall below natural uptake, which will take longer to achieve.

The post The Tesla roof is a luxury product appeared first on MetaSD.

When I see $21.85 per square foot for anything associated with a house, “cheap” is not what comes to mind. That’s in the territory for luxury interior surfaces, not bulk materials like roofing. I’m reminded of the old saw in energy economics (I think from the EMF meetings in Aspen) that *above 7000 feet, the concept of discount rates evaporates.*

So, what are the numbers, really?

The source for all the excitement is frequently this Consumer Reports article – really an offhand blog post. A follow-up does the math again using Tesla’s own calculator. None of these sources has any concept of discounting or opportunity costs.

What if you do believe in time value of money? There are lots of complexities, like how you finance a $65,500 capital outlay, but let’s set that aside and consider a cash purchase of a new roof, with the savings expected to generate a reasonable return.

Then there are four issues:

- What’s a reasonable return? (Let’s say 5%)
- Will the stream appreciate as utility rates escalate? (Let’s assume not for simplicity. If everyone goes solar, grid rates might actually fall.)
- Are there costs, like maintenance and insurance, associated with the investment? (Let’s assume 0 for optimistic simplicity.)
- How long will the stream last? Houses turn over, so there are really several issues here: durability of the system (30 years), time you expect to own it (7 years is typical), and the discount subsequent buyers will apply (presumably large due to widespread ignorance and difficulty of financing – say 50%).

Let’s do my mom’s house in California (an advantageous location for solar). Tesla calculates:

So, the Tesla roof has a $29,400 advantage over a regular roof. However, the value of energy here is undiscounted – it’s a simple accumulation of $136/month over 30 years*12 months.

Here’s the discounted version, with my assumptions above:

*Discounted value of the flow of avoided energy costs over time.*

*Present value of avoided costs cumulated over 30 years. Note that, with discounting, life extension beyond 30 years doesn’t have much impact, because the discounted contribution at 30 years is already small.*

The bottom line is that if you expect an ROI, the present value of the stream of avoided electricity costs is about half as big ($25,400), or a third as big ($17,600) if you sell your house and buyers aren’t sophisticated. That makes the bottom line, net of a conventional roof, somewhere between $5,700 and -$1,800. Note that any advantage is entirely contingent on the Federal tax credit.
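The discounting arithmetic can be roughly reproduced in a few lines. This is a sketch, not the model behind the charts: it assumes continuous discounting at 5%, a flat $136/month saving for 30 years, and a sale at year 7 after which the remaining stream is valued at 50%. The function name and the continuous-compounding form are my assumptions.

```python
import math

def present_value(annual_flow, rate, start, end):
    """PV of a constant annual flow received between start and end (years),
    with continuous discounting at the given rate."""
    return annual_flow * (math.exp(-rate * start) - math.exp(-rate * end)) / rate

annual_savings = 136.0 * 12   # $/year of avoided electricity cost
rate = 0.05                   # assumed required return

# Simple accumulation over 30 years*12 months, with no discounting.
undiscounted = annual_savings * 30

# Hold the house for the full 30-year life of the system.
pv_hold = present_value(annual_savings, rate, 0, 30)

# Sell at year 7; buyers value the remaining stream at 50%.
pv_sell = (present_value(annual_savings, rate, 0, 7)
           + 0.5 * present_value(annual_savings, rate, 7, 30))

print(round(undiscounted), round(pv_hold), round(pv_sell))
```

Under these assumptions the results land close to the $25,400 and $17,600 quoted above, against an undiscounted total near $49,000. Small differences are to be expected if the original calculation compounds or times the flows differently.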

*Other Issues*

- Equity. This isn’t Tesla’s problem; it’s our problem. Does it make sense for the public to pay 30% of the cost of an ultra-premium roof, when much of the area will be non-generating tiles that serve exclusively as roofing? How much of that tax credit will ever accrue to the bottom income quartile?
- Emissions. Rather than subsidizing generation, we really ought to be pricing emissions, so that the energy cost calculations above reflect the public value of avoiding climate change and local health impacts of air quality. That term will vary a lot regionally.
- Baseline. The Tesla roof mingles roofing and generating. So, the point of comparison should be a conventional roof with a conventional PV panel system or separate green power purchase.
- Other opportunities. Efficiency generally beats renewable production, so one should be looking at demand reductions: insulation, windows, etc. These aren’t as sexy but do a lot for comfort as well as energy savings. If you don’t look at these items first, you might be sizing your PV system for unneeded load, which puts you at risk for future expiration of net metering provisions.

*Should you get one?*

Sure. If you need a roof, and have the capital, it looks like a darn nice option, with some distinct benefits beyond the power generated. You’d certainly be doing the global carbon cycle a favor, particularly if you’d otherwise spend the $ on air travel. Pocket your tax credits and rejoice in a small victory over the system. But then write to your representatives and ask them for a big fat carbon tax. That will really motivate a low-carbon economy, and increase the value of your roof!

The post Prediction, in context appeared first on MetaSD.

The hospital compiles a big dataset on patient demographics, health status, exposure to procedures, and infection outcomes. A vendor slurps this up and turns some algorithm loose on the data, seeking the risk factors associated with the infection. It might look like this:

… except that there might be 200 predictors, not six – more than you can handle by eyeballing scatter plots or control charts. Once you have a risk model, you know which patients to target for mitigation, and maybe also which associated factors to pursue further.

However, this is only half the battle. Systems thinkers will recognize this model as a dead buffalo: a laundry list with unidirectional causality. The real situation is rich in feedback, including a lot of things that probably don’t get measured, and therefore don’t end up in the data for consideration by the algorithm. For example:

Infections aren’t just a random event for the patient; they happen for reasons that are larger than the patient. Even worse, there are positive feedbacks that can make prevention of infections, and errors more generally, hard to manage. For example, as the number of patients with infections rises, workload goes up, which creates time pressure and fatigue. That induces shortcuts and errors that create risk for patients, leading to more infections. Infections spread to other patients. Fatigued staff burn out and turn over faster, which dilutes the staff experience that might otherwise mitigate risk. (Experience, like many other dynamics, is not shown above.)

An algorithm that predicts risk in this context is certainly useful, because anything that reduces risk helps to diminish the gain of the vicious cycles. But it’s no longer so clear what to do with the patient assessments. Time spent on staff education and action for risk mitigation has to come from somewhere, and therefore might have unintended consequences that aren’t assessed by the algorithm. The algorithm is actually blind in two ways: it can’t respond to any input (like staff fatigue or skill) that isn’t in the data, and it probably isn’t statistically smart enough to deal with the separation of cause and effect in time and space that arises in a feedback system.

Deep learning systems like AlphaGo Zero might learn to deal with dynamics. But so far, high performance requires very large numbers of exemplars for reinforcement learning, and that’s never going to happen in a community hospital dataset. Then again, we humans aren’t too good at managing dynamic complexity either. But until the machines take over, we can build dynamic models to sort these problems out. By taking an endogenous point of view, we can put machine learning in context, refine our understanding of leverage points, and redesign systems for greater performance.

The post Nelson Rules appeared first on MetaSD.

The idea is pretty simple. You have a time series of measurements, normalized to Z-scores, and therefore varying (most of the time) by plus or minus 3 standard deviations. The Nelson Rules provide a way to detect anomalies: drift, oscillation, high or low variance, etc. Rule 1, for example, is just a threshold for outlier detection: it fires whenever a measurement is more than 3 SD from the mean.
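Rule 1 is simple enough to sketch directly. The version below assumes the reference mean and standard deviation are supplied by the caller (as they would be from an in-control baseline) rather than estimated from the data being tested; the function name and the sample numbers are made up for illustration.

```python
def rule1_flags(measurements, mean, sd, threshold=3.0):
    """Return the indices of points more than `threshold` SD from the mean."""
    flagged = []
    for i, x in enumerate(measurements):
        z = (x - mean) / sd          # normalize to a Z-score
        if abs(z) > threshold:
            flagged.append(i)
    return flagged

# Hypothetical process with baseline mean 10 and SD 2.
data = [10.5, 9.0, 17.0, 11.0, 3.5, 10.0]
flagged = rule1_flags(data, mean=10.0, sd=2.0)
print(flagged)  # indices of the outliers at 17.0 and 3.5
```

Using a fixed baseline matters: if the mean and SD are re-estimated from a short window that contains the outlier, the outlier inflates the SD and can hide itself.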

In the machine learning context, it seems strange to me to use these heuristics when more powerful tests are available. This is not unlike the problem of deciding whether a random number generator is really random. It’s fairly easy to determine whether it’s producing a uniform distribution of values, but what about cycles or other long-term patterns? I spent a lot of time working on this when we replaced the RNG in Vensim. Many standard tests are available. They’re not all directly applicable, but the thinking is.

In any case, I got curious how the Nelson rules performed in the real world, so I developed a test model.

This feeds a test input (Normally distributed random values, with an optional signal superimposed) into a set of accounting variables that track metrics and compare with the rule thresholds. Some of these are complex.

Rule 4, for example, looks for 14 points with alternating differences. That’s a little tricky to track in Vensim, where we’re normally more interested in continuous time. I tackle that with the following structure:

Difference = Measurement-SMOOTH(Measurement,TIME STEP)

Is Positive = IF THEN ELSE(Difference>0,1,-1)

N Switched = INTEG(
  IF THEN ELSE(Is Positive>0 :AND: N Switched<0,
    (1-2*N Switched)/TIME STEP,
  IF THEN ELSE(Is Positive<0 :AND: N Switched>0,
    (-1-2*N Switched)/TIME STEP,
    (Is Positive-N Switched)/TIME STEP)),
  0)

Rule 4 = IF THEN ELSE(ABS(N Switched)>14,1,0)

There’s a trick here. To count alternating differences, we need to know (a) the previous count, and (b) whether the previous difference encountered was positive or negative. Above, *N Switched* stores both pieces of information in a single stock (INTEG). That’s possible because the count is discrete and positive, so we can overload the storage by giving it the sign of the previous difference encountered.

Thus, if the current difference is negative (*Is Positive* < 0) and the previous difference was positive (*N Switched* > 0), we (a) invert the sign of the count by subtracting 2 × *N Switched*, and (b) augment the count, here by subtracting 1 to make it more negative.
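The signed-counter trick may be easier to follow outside of stock-and-flow notation. Here is a rough Python equivalent, assuming one update per measurement and taking the difference against the previous point directly (standing in for the SMOOTH over one TIME STEP). The function name is mine; the threshold of 14 is copied from the Rule 4 equation above.

```python
def alternation_counter(measurements):
    """Track the run length of alternating differences in a single signed value.

    The magnitude is the count of consecutive alternations; the sign records
    the direction of the most recent difference.
    """
    n_switched = 0
    history = []
    for prev, cur in zip(measurements, measurements[1:]):
        is_positive = 1 if cur - prev > 0 else -1
        if is_positive > 0 and n_switched < 0:
            n_switched = 1 - n_switched      # flip sign to positive, add 1
        elif is_positive < 0 and n_switched > 0:
            n_switched = -1 - n_switched     # flip sign to negative, add 1
        else:
            n_switched = is_positive         # no alternation: restart the run
        history.append(n_switched)
    return history

zigzag = alternation_counter([0, 1, 0, 1, 0, 1])
ramp = alternation_counter([0, 1, 2, 3])
rule4 = [abs(n) > 14 for n in zigzag]  # same threshold as the Vensim equation
print(zigzag, ramp, rule4)
```

Each branch corresponds to one arm of the nested IF THEN ELSE: multiplying the Vensim rate by TIME STEP and adding it to the stock gives exactly these assignments.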

Similar tricks are used elsewhere in the structure.

How does it perform? Surprisingly well. Here’s what happens when the measurement distribution shifts by one standard deviation halfway through the simulation:

There are a few false positives in the first 1000 days, but after the shift, there are many more detections from multiple rules.

The rules are pretty good at detecting a variety of pathologies: increases or decreases in variance, shifts in the mean, trends, and oscillations. The rules also have different false positive rates, which might be OK, as long as they catch nonoverlapping problems, and don’t have big differences in sensitivity as well. (The original article may have more to say about this – I haven’t checked.)

However, I’m pretty sure that I could develop some pathological inputs that would sneak past these rules. By contrast, I’m pretty sure I’d have a hard time sneaking anything past the NIST or Diehard RNG test suites.

If I were designing this from scratch, I’d use machine learning tools more directly – there are lots of tests for distributions, changes, trend breaks, oscillation, etc. that can be used online with a consistent likelihood interpretation and optimal false positive/negative tradeoffs.

Here’s the model:

The post Reforesting Iceland appeared first on MetaSD.

This is an example of forest cover tipping points.

Iceland appears to be stuck in a state in which “no trees” is locally stable. So, the system pushes back when you try to reforest, at least until you can cross into another basin of attraction that’s forested.

Interestingly, in the Hirota et al. data above, a stable treeless state is a product of low precipitation. But Iceland is wet. So, deserts are a multidimensional thing.
