The post Young Investigator Special Competition for Time-Sharing Experiment for the Social Sciences appeared first on Statistical Modeling, Causal Inference, and Social Science.

**Time-Sharing Experiments for the Social Sciences Is Having a Special Competition for Young Investigators**

Time-sharing Experiments for the Social Sciences (TESS) is an NSF-funded initiative. Investigators propose survey experiments to be fielded using a nationally representative Internet platform via NORC’s AmeriSpeak Panel (see http://tessexperiments.org for more information). While anyone can submit a proposal to TESS at any time through our regular mechanism, we are having a Special Competition for Young Investigators. Graduate students and individuals who received their PhD in 2016 or after are eligible.

To give some examples of experiments we’ve done: one TESS experiment showed that individuals are more likely to support a business refusing service to a gay couple than to an interracial couple, but were no more supportive of religious reasons for refusal than of nonreligious reasons. Another experiment found that participants were more likely to attribute the illnesses of obese patients to poor lifestyle choices and those of non-obese patients to biological factors, which, in turn, made participants less sympathetic to overweight patients, especially when the patients were female. TESS has also fielded an experiment on whether the opinions of economists influence public opinion on different issues; the study found that they do on relatively technical issues but not so much otherwise.

The proposals that win our Special Competition will be fielded at up to twice the size of a regular TESS study. We will begin accepting proposals for the Special Competition on January 1, 2019, and the deadline is March 1, 2019. Full details about the competition are available at http://www.tessexperiments.org/yic.html.

The post Ethics in statistical practice and communication: Five recommendations. appeared first on Statistical Modeling, Causal Inference, and Social Science.

1. Open data and open methods.

2. Be clear about the information that goes into statistical procedures.

3. Create a culture of respect for data.

4. Publication of criticisms.

5. Respect the limitations of statistics.

The post Predicting spread of flu appeared first on Statistical Modeling, Causal Inference, and Social Science.

The post Fitting the Besag, York, and Mollie spatial autoregression model with discrete data appeared first on Statistical Modeling, Causal Inference, and Social Science.

I am trying to use the Besag, York, and Mollie 1991 (BYM) model to study the sociology of crime, where space/time plays a vital role. Since many of the variables and parameters are discrete in nature, is it possible to develop a BYM model that uses an Integer Auto-Regressive (INAR) process instead of just an AR process?

I’ve seen INAR(1) modeling, even a spatial INAR or SINAR paper, but they seem to be different from the way BYM is specified in the Bayesian framework.

Does it even make sense to have a BYM that is INAR? I can think of discrete jumps in independent variables that affect the dependent variable in discrete jumps. (Also, do these models violate convexity requirements often required for statistical computing?)

My reply:

1. To see how to fit this sort of model in a flexible way, see this Stan case study, Spatial Models in Stan: Intrinsic Auto-Regressive Models for Areal Data, from Mitzi Morris.

2. Rather than trying to get cute with your discrete modeling, I’d suggest a simple two-level approach: use an underlying continuous model (whatever space-time process you want, BYM or otherwise), and then put a discrete data model (for example, negative binomial, that is, overdispersed Poisson) on top of that.
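A toy numerical sketch of this two-level idea (hypothetical throughout; the neighbor-averaging step below is a crude stand-in for a proper spatial prior such as BYM, not the actual model): a continuous latent field determines expected counts per area, and a negative binomial observation model turns those into discrete data, so no INAR machinery is needed.

```python
import numpy as np

rng = np.random.default_rng(42)

n_areas = 10
# Toy adjacency: areas arranged on a line, each bordering the next.
W = np.zeros((n_areas, n_areas))
for i in range(n_areas - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0

# Continuous latent field: independent noise plus a neighbor average,
# a crude stand-in for a real spatial autoregression.
z = rng.normal(0.0, 1.0, n_areas)
eta = 0.5 * z + 0.5 * (W @ z) / W.sum(axis=1)

mu = np.exp(1.0 + eta)  # expected count in each area
phi = 5.0               # overdispersion: var = mu + mu**2 / phi
p = phi / (phi + mu)
y = rng.negative_binomial(phi, p)  # discrete counts on top of the latent field

print(y)
```

The point of the construction is that all the spatial/temporal structure lives in the continuous level (eta), while the discreteness of the outcome is handled entirely by the observation model.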

Tour guides in your travels jot down Mementos and Keepsakes from each Tour[i]. Their scribblings, which may at times include details, at other times just a word or two, may be modified through the Tour, and in response to questions from travelers (so please check back). Since these are just mementos, they should not be seen as replacements for the more careful notions given in the journey (i.e., book) itself. Still, you’re apt to flesh out your notes in greater detail, so please share yours (along with errors you’re bound to spot), and we’ll create Meta-Mementos.

**Excursion 1. Tour I: Beyond Probabilism and Performance**

**Notes from Section 1.1:** Severity Requirement: Bad Evidence, No Test (BENT)

1.1 Terms (quick looks, to be crystallized as we journey on)

- epistemology: The general area of philosophy that deals with knowledge, evidence, inference, and rationality.
- severity requirement. In its weakest form it supplies a minimal requirement for evidence:

severity requirement (weak): One does not have evidence for a claim if little if anything has been done to rule out ways the claim may be false. If data x agree with a claim C but the method used is practically guaranteed to find such agreement, and had little or no capability of finding flaws with C even if they exist, then we have bad evidence, no test (BENT).

- error probabilities of a method: probabilities it leads or would lead to erroneous interpretations of data. (We will formalize this as we proceed.)

error statistical account: one that revolves around the control and assessment of a method’s error probabilities. An inference is qualified by the error probability of the method that led to it.

(This replaces common uses of “frequentist” which actually has many other connotations.)

error statistician: one who uses error statistical methods.

severe testers: a proper subset of error statisticians: those who use error probabilities to assess and control severity. (They may use them for other purposes as well.)

The severe tester also requires reporting what has been poorly probed and inseverely tested.

Error probabilities can, but don’t necessarily, provide assessments of the capability of methods to reveal or avoid mistaken interpretations of data. When they do, they may be used to assess how severely a claim passes a test.

- methodology and meta-methodology: Methods we use to study statistical methods may be called our meta-methodology – it’s one level removed.

We can keep to testing language as part of the meta-language we use to talk about formal statistical methods, where the latter include estimation, exploration, prediction, and data analysis.

There’s a difference between finding H poorly tested by data x, and finding x renders H improbable – in any of the many senses the latter takes on.

H: Isaac knows calculus.

x: results of a coin flipping experiment

Even taking H to be true, data x has done nothing to probe the ways in which H might be false.

5. R.A. Fisher, against isolated statistically significant results (p.4).

[W]e need, not an isolated record, but a reliable method of procedure. In relation to the test of significance, we may say that a phenomenon is experimentally demonstrable when we know how to conduct an experiment which will rarely fail to give us a statistically significant result. (Fisher 1935b/1947, p. 14)

**Notes from section 1.2 of SIST:** How to get beyond the stat wars

6. statistical philosophy (associated with a statistical methodology): core ideas that direct its principles, methods, and interpretations.

two main philosophies about the roles of probability in statistical inference: performance (in the long run) and probabilism.

(i) performance: probability functions to control and assess the relative frequency of erroneous inferences in some long run of applications of the method

(ii) probabilism: probability functions to assign degrees of belief, support, or plausibility to hypotheses. They may be non-comparative (a posterior probability) or comparative (a likelihood ratio or Bayes Factor)

Severe testing introduces a third:

(iii) probativism: probability functions to assess and control a method’s capability of detecting mistaken inferences, i.e., the severity associated with inferences.

• Performance is a necessary but not a sufficient condition for probativeness.

• Just because an account is touted as having a long-run rationale, it does not mean it lacks a short run rationale, or even one relevant for the particular case at hand.

7. Severity strong (argument from coincidence):

We have evidence for a claim C just to the extent it survives a stringent scrutiny. If C passes a test that was highly capable of finding flaws or discrepancies from C, and yet no or few are found, then the passing result, x, is evidence for C.

lift-off vs drag down

(i) lift-off : an overall inference can be more reliable and precise than its premises individually.

(ii) drag-down: An overall inference is only as reliable/precise as is its weakest premise.

• Lift-off is associated with convergent arguments, drag-down with linked arguments.

• statistics is the science par excellence for demonstrating lift-off!

8. arguing from error: there is evidence an error is absent to the extent that a procedure with a high capability of signaling the error, if and only if it is present, nevertheless detects no error.

Bernoulli (coin tossing) model: we record success or failure, assume a fixed probability of success θ on each trial, and that trials are independent. (P-value in the case of the Lady Tasting Tea, pp. 16-17).
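The Lady Tasting Tea P-value mentioned above is a one-line combinatorial calculation: in Fisher’s design there are 8 cups, 4 with milk poured first, and a pure guesser who knows the design picks 4 cups at random, so the chance of identifying all 4 correctly is 1 in C(8, 4).

```python
from math import comb

# Fisher's Lady Tasting Tea: 8 cups, 4 with milk poured first.
# Under the null (pure guessing), the 4 cups labeled "milk first"
# are a random subset of the 8, so P(all 4 correct) = 1 / C(8, 4).
p_all_correct = 1 / comb(8, 4)
print(p_all_correct)  # 1/70, about 0.014
```

This is the exact (hypergeometric) P-value for the most extreme outcome; less extreme outcomes would add further terms.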

Error probabilities can be readily invalidated due to how the data (and hypotheses!) are generated or selected for testing.

9. computed (or nominal) vs actual error probabilities: You may claim it’s very difficult to get such an impressive result due to chance, when in fact it’s very easy to do so, with selective reporting (e.g., your computed P-value can be small, but the actual P-value is high.)
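A small simulation (a hypothetical setup: 20 independent null tests, only the smallest P-value reported) shows how far apart computed and actual error probabilities can be. Under the null each P-value is uniform, so the chance that the cherry-picked minimum falls below 0.05 is 1 - 0.95^20, roughly 0.64, not the nominal 0.05.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sims, n_tests = 10_000, 20

# Under the null, each P-value is uniform on (0, 1).
p_values = rng.uniform(size=(n_sims, n_tests))
reported = p_values.min(axis=1)  # selective reporting: keep only the "best" result

actual = (reported <= 0.05).mean()
print(actual)  # near 1 - 0.95**20 ≈ 0.64, far above the nominal 0.05
```

The reported (computed) P-value in each simulated "study" looks small, but the actual probability of obtaining such a result by chance alone is more than ten times larger.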

Examples: Peirce and Dr. Playfair (a law is inferred even though half of the cases required Playfair to modify the formula after the fact); Texas marksman (shooting prowess inferred from shooting bullets into the side of a barn, and painting a bull’s eye around clusters of bullet holes); Pickrite stock portfolio (Pickrite’s effectiveness at stock picking is inferred based on selecting those stocks on which the “method” did best).

• We appeal to the same statistical reasoning to show the problematic cases as to show genuine arguments from coincidence.

• A key role for statistical inference is to identify ways to spot egregious deceptions and create strong arguments from coincidence.

10. Auditing a P-value (one part): checking if the results are due to selective reporting, cherry picking, trying and trying again, or any number of other similar ruses.

• Replicability isn’t enough: Example. observational studies on Hormone Replacement therapy (HRT) reproducibly showed benefits, but had little capacity to unearth biases due to “the healthy women’s syndrome.”

**Souvenir A.[ii] Postcard to Send: the 4 fallacies from the opening of 1.1.**

• We should oust mechanical, recipe-like uses of statistical methods long lampooned,

• But simple significance tests have their uses, and shouldn’t be ousted simply because some people are liable to violate Fisher’s warnings.

• They have the means by which to register formally the fallacies in the postcard list. (Failed statistical assumptions, selection effects alter a test’s error probing capacities).

• Don’t throw out the error control baby with the bad statistics bathwater.

10. severity requirement (weak): If data x agree with a claim C but the method was practically incapable of finding flaws with C even if they exist, then x is poor evidence for C.

severity (strong): If C passes a test that was highly capable of finding flaws or discrepancies from C, and yet no or few are found, then the passing result, x, is an indication of, or evidence for, C.

**Notes from Section 1.3**: The Current State of Play in Statistical Foundations: A View From a Hot-Air Balloon

The Bayesian versus frequentist dispute parallels disputes between probabilism and performance.

-Using Bayes’ Theorem doesn’t make you a Bayesian.

-subjective Bayesianism and non-subjective (default) Bayesianism

11. Advocates of unifications are keen to show that (i) default Bayesian methods have good performance in a long series of repetitions – so probabilism may yield performance; or alternatively, (ii) frequentist quantities are similar to Bayesian ones (at least in certain cases) – so performance may yield probabilist numbers. Why is this not bliss? Why are so many from all sides dissatisfied?

It had long been assumed that only subjective or personalistic Bayesianism had a shot at providing genuine philosophical foundations, but some Bayesians have come to question whether the widespread use of methods under the Bayesian umbrella, however useful, indicates support for subjective Bayesianism as a foundation.

**Marriages of Convenience?** The current frequentist–Bayesian unifications are often marriages of convenience;

-some are concerned that methodological conflicts are bad for the profession.

-frequentist tribes have not disappeared; scientists still call for error control.

-Frequentists’ incentive to marry: Lacking a suitable epistemic interpretation of error probabilities – significance levels, power, and confidence levels – frequentists are constantly put on the defensive.

**Eclecticism and Ecumenism.** Current-day eclecticisms have a long history – the dabbling in tools from competing statistical tribes has not been thought to pose serious challenges.

**Decoupling.**

**Why Our Journey? To disentangle the jungle.** Being hesitant to reopen wounds from old battles does not heal them. They show up in the current problems of scientific integrity, irreproducibility, questionable research practices, and in the swirl of methodological reforms and guidelines that spin their way down from journals and reports.

How it occurs: the new stat scrutiny (arising from failures of replication) collects from:

-the earlier social science “significance test controversy”

-the traditional frequentist and Bayesian accounts, and corresponding frequentist-Bayesian wars

-the newer Bayesian–frequentist unifications (non-subjective, default Bayesianism)

This jungle has never been disentangled.

**Your Tour Guide invites your questions in the comments.**

[i] As these are scribbled down in notebooks through ocean winds, wetlands and insects, do not expect neatness. Please share improvements and corrections.

[ii] For a free copy of “Statistical Inference as Severe Testing”, send me your conception of Souvenir A, your real souvenir A, or a picture of your real Souvenir A (through Nov 16, 2018).

The post Stan development in RStudio appeared first on Statistical Modeling, Causal Inference, and Social Science.

– Improved, context-aware autocompletion for Stan files and chunks

– A document outline, which allows for easy navigation between Stan code blocks

– Inline diagnostics, which help to find issues while you develop your Stan model

– The ability to interrupt Stan parallel workers launched within the IDE

This is awesome—especially that last feature. RStudio is my hero.

And don’t forget this: If you don’t currently have Stan on your computer, you can play with this demo version on the web, thanks to RStudio Cloud.

The post David Brooks discovers Red State Blue State Rich State Poor State! appeared first on Statistical Modeling, Causal Inference, and Social Science.

Our political conflict is primarily a rich, white civil war. It’s between privileged progressives and privileged conservatives. You could say that tribalism is the fruit of privilege. People with more stresses in their lives necessarily pay less attention to politics. . . .

I’ve had some differences with Brooks in the past, but when he agrees with me, I’m not gonna complain.

As David Park, Boris Shor, Joe Bafumi, Jeronimo Cortina, and I wrote ten years ago:

The cultural divide of the two Americas looms larger at high incomes . . . A theme throughout this book is that the cultural differences between states—the things that make red and blue America feel like different places—boil down to differences among richer people in these states.

And:

Consistent with Ronald Inglehart’s hypothesis of postmaterialism, survey data show social issues to be more important to higher-income voters. This can be viewed as a matter of psychology and economics, with social policy as a luxury that can be afforded once you have achieved some material comfort. Our results stand in contradiction to the commonly held idea that social issues distract lower-income voters from their natural economic concerns.

It took a few years, but it seems that our ideas have finally become conventional wisdom.

The post The AAA tranche of subprime science, revisited appeared first on Statistical Modeling, Causal Inference, and Social Science.

We break down the analysis by market segment defined by loan type (Prime, Alt-A, and Subprime). Table 5 shows the results and documents the third fact: the subprime AAA-rated RMBS did particularly well. AAA-rated Subprime Mortgage Backed Securities were the safest securities among the non-agency RMBS market. As of December 2013 the principal-weighted loss rates of AAA-rated subprime securities were on average 0.42% [2.2% for Prime AAA, same page]. We do not deny that even the seemingly small loss of 0.42% should be considered large for any given AAA security.

Nonetheless, we consider this to be a surprising fact given the conventional narrative for the causes of the financial crisis and its assignment of the considerable blame to the subprime market and its mortgage-backed securities. An example of this narrative is provided by Gelman and Loken (2014):

We have in mind an analogy with the notorious AAA-class bonds created during the mid-2000s that led to the subprime mortgage crisis. Lower-quality mortgages [that is, mortgages with high probability of default and, thus, high uncertainty] were packaged and transformed into financial instruments that were (in retrospect, falsely) characterized as low risk.

OK, our paper wasn’t actually about mortgages; it was about statistics. We were just using mortgages as an example. But if Ospina and Uhlig are correct, we were mistaken in using AAA-rated subprime mortgages as an example of a bad bet. Analogies are tricky things!

**P.S.** Daula adds:

Overall, I think it fits your data collection/measurement theme, and how doing that well can provide novel insights. In that vein, they provide a lot of detail to replicate the results, in case folks disagree. There’s the technical appendix which (p. 39) “serves as guide for replication and for understanding the contents of our database” as well as (p. 7) a replication kit available from the authors. As to the latter, footnote 15 (p. 15) guides the reader to exactly where to look for the one bit of modeling in the paper (“For a detailed list of the covariates employed, refer to MBS Project/Replication/DefaultsAnalysis/Step7”).

The post Toward better measurement in K-12 education research appeared first on Statistical Modeling, Causal Inference, and Social Science.

Buchanan writes:

I don’t think that every classroom educator, instructional coach, principal, or central office administrator needs to be an expert on measurement. I do, however, think that if we are training individuals to be researchers (e.g., PhDs) we have a duty to make sure they are able to conduct the best possible research, understand the various caveats and limitations to their studies, and especially understand how measurement – as the foundational component of all research across all disciplines (yes, even qualitative research) – affects the inferences derived from their data. . . .

In essence, if a researcher wants to use an existing vetted measurement tool for research purposes, they should already have access to the technical documentation and can provide the information up front so as our staff reviews the request we can also evaluate whether or not the measurement tool is appropriate for the study. If a researcher wants to use their own measure, we want them to be prepared to provide sufficient information about the validity of their measurement tool so we can ensure that they publish valid results from their study; this also has an added benefit for the researcher by essentially motivating them to generate another paper – or two or three – from their single study.

He provides some links to resources, and then he continues:

I would like to encourage other K-12 school districts to join with us in requiring researchers to do the highest quality research possible. I, for one, at least feel that the students, families, and communities that we serve deserve nothing less than the best and believe if you feel the same that your organization would adopt similar practices. You can find our policies here: Fayette County Public Schools Research Requests. Or if you would like to use our documentation/information and/or contribute to what we currently have, you can submit an issue to our documentation’s GitHub Repository (https://github.com/fcps/research-request).

The post He had a sudden cardiac arrest. How does this change the probability that he has a particular genetic condition? appeared first on Statistical Modeling, Causal Inference, and Social Science.

I have a friend with a probability problem I don’t know how to solve. He’s 37 and just keeled over with sudden cardiac arrest, and is trying to figure out how to assess the probability that he has a given condition as his doctors work through his case. He knows I’ve been sharply critical of doctors’ failures to properly assess the Type I/Type II tradeoff, so he reached out to me, but we quickly got into math questions above my pay grade, so I volunteered to ask if you would sketch out the correct statistical approach.

To be clear, he’s an engineer, so he’s not asking you to do the work for him! Just to sketch out in a few words how he might approach information gathering and setting up a problem like “given that you’ve had sudden cardiac arrest, what’s the likelihood that a result on a particular genetic test is a false positive?”

My reply:

I agree that the conditional probability should change, given the knowledge that he had the cardiac arrest. Unfortunately it’s hard for me to be helpful here because there are too many moving parts: of course the probability of the heart attack conditional on having the condition or not, but also the relevance of the genetic test to his health condition. This is the kind of problem that is addressed in the medical decision making literature, but I don’t think I have anything useful to add here, beyond emphasizing that the calculation of any such probability is an intermediate step in this person’s goal of figuring out what he should do next regarding his heart condition.
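The structure of the intermediate calculation itself is just two applications of Bayes’ rule, one for the arrest and one for the test result. Here is a sketch with entirely made-up placeholder numbers (none of these rates are medical facts; the real inputs would have to come from the cardiology literature):

```python
# All numbers below are hypothetical placeholders, not medical facts.
prior = 0.001               # P(condition) in the general population
p_arrest_given_c = 0.05     # P(cardiac arrest by age 37 | condition)
p_arrest_given_not = 0.001  # P(cardiac arrest by age 37 | no condition)

# First update: condition on having had the arrest.
post_arrest = (p_arrest_given_c * prior) / (
    p_arrest_given_c * prior + p_arrest_given_not * (1 - prior)
)

# Second update: condition on a positive genetic test.
sensitivity = 0.99  # P(test positive | condition)
false_pos = 0.01    # P(test positive | no condition)
post_test = (sensitivity * post_arrest) / (
    sensitivity * post_arrest + false_pos * (1 - post_arrest)
)

print(round(post_arrest, 3), round(post_test, 3))
```

Note how the arrest alone can move a rare condition from negligible to non-negligible probability, which in turn changes how a positive test should be read: the same test that would mostly yield false positives in the general population becomes much more informative after conditioning on the arrest.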

I’m posting the question here, in case any of you can point to useful materials on this. In addition to the patient’s immediate steps in staying alive and healthy, this is a general statistical issue that has to be coming up in medical testing all the time, in that tests are often done in the context of something that happened to you, so maybe there is some general resource on this topic?
