Draft of paper about Amish

Mike Weight and I have a draft of a manuscript about responses to personality tests by Amish and non-Amish young men from the same county in Indiana. We have mentioned this material before on the blog. The paper is getting ready to ship out: we are hoping to take advantage of our readers and solicit comments and criticisms and outrage and whatever. Drs. Charlton and Thompson may be especially knowledgeable about this approach along with many of our anonymous cowards.

Our approach is to use published data from a personality test from 1970 to construct an index of “Amishness” that we call the AQ, analogous to the well-known IQ of cognitive testing. With that, the whole standard machinery of quantitative genetics is immediately available. Whether or not the genetic model is correct or nearly correct, there is a clear and explicit baseline that alternative models should be able to match. For example, the difference in mean AQ between young Amish men and their non-Amish neighbors is about 2.8 standard deviations. In the IQ world this would correspond to a group difference of 42 points. In the stature world this would correspond to a height difference of about 8 inches.
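
The unit conversions above are simple scaling, and a rough sketch makes the arithmetic explicit. The IQ standard deviation of 15 points is the usual convention; the height standard deviation of 2.8 inches is an assumed round figure for adult men, not a number from the manuscript.

```python
# Convert a gap expressed in standard deviations into more familiar units.
# Both SD values below are assumptions for illustration only.
AQ_GAP_SD = 2.8           # Amish vs. non-Amish gap quoted in the post
IQ_SD_POINTS = 15.0       # conventional IQ scale
HEIGHT_SD_INCHES = 2.8    # assumed SD of adult male height


def gap_in_units(gap_sd: float, unit_sd: float) -> float:
    """Return the size of a gap of `gap_sd` standard deviations in raw units."""
    return gap_sd * unit_sd


iq_gap = gap_in_units(AQ_GAP_SD, IQ_SD_POINTS)
height_gap = gap_in_units(AQ_GAP_SD, HEIGHT_SD_INCHES)

print(f"IQ-equivalent gap: {iq_gap:.0f} points")
print(f"Height-equivalent gap: {height_gap:.1f} inches")
```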



47 Responses to Draft of paper about Amish

  1. Toad says:

    personality tests by Amish and non-Amish young men from the same county in Indiana.

    Should be “non-Amish young men from Switzerland”. Otherwise it’s just comparing Swiss and Irish or English whatever.

  2. Toad says:

    I think it would be more accurate to compare 2nd or 3rd generation ex-members to control for culture and environment.

  3. Typos:

    The new subpopulations now mate at random within within groups.

    boilinig-off” or defection

    communal aide

    we use a conventional for of data reduction

    but thse models hardly even provide much insight.

    horizonta axis

    Amish young me

    standard deviations if the the population mean

    population growht


    it apparently not popularly used

    The purpose of figure 1 is to convey a picture of global group differences but our real interest is in figure 2 showing in detail individual differences among young rural Indiana men. — “our real interest in figure 2 is showing” sounds more natural to me.

    figure 1 or figure 2 <- Not sure about the style for where you're submitting this, but isn't 'figure' usually capitalized?

    If you want to call the most important dimension the Amish Quotient, then you'd get superior discrimination using LDA (linear discriminant analysis) instead of PCA. That PCA does find a good separation between AM and IN in the first dimension is solid evidence, though, and so it seems worth keeping in. (At the risk of insulting you if you're familiar with both, the difference between them is that PCA is unsupervised and LDA is supervised. In PCA, you just find how people vary from each other, and hope that the primary dimension separates the Amish group and the non-Amish group; in LDA, you tell the computer which group is Amish and which group is non-Amish, and it tells you the dimension that maximizes the distance between the Amish and non-Amish groups while minimizing the variance within each group.)
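
    A minimal sketch of the PCA versus LDA distinction, using synthetic two-group data rather than the 16PF scores. The group sizes, the number of traits, and the mean shift are all made-up assumptions; the only point is that PCA never sees the labels while Fisher's discriminant does.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for two groups of 25 subjects on 4 traits (NOT the 16PF data).
n, p = 25, 4
shift = np.array([2.0, 0.5, 0.0, 0.0])      # assumed difference in group means
group_a = rng.normal(size=(n, p)) + shift
group_b = rng.normal(size=(n, p))
X = np.vstack([group_a, group_b])

# PCA (unsupervised): leading eigenvector of the pooled covariance; labels unused.
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
pc1 = eigvecs[:, -1]                        # eigh returns eigenvalues in ascending order

# Fisher LDA (supervised): direction proportional to Sw^{-1} (mean_a - mean_b).
mean_a, mean_b = group_a.mean(axis=0), group_b.mean(axis=0)
Sw = np.cov(group_a, rowvar=False) + np.cov(group_b, rowvar=False)
w = np.linalg.solve(Sw, mean_a - mean_b)
w /= np.linalg.norm(w)


def separation(direction):
    """Gap between group means along `direction`, in pooled within-group SDs."""
    za, zb = group_a @ direction, group_b @ direction
    pooled_sd = np.sqrt((za.var(ddof=1) + zb.var(ddof=1)) / 2)
    return abs(za.mean() - zb.mean()) / pooled_sd


sep_pca, sep_lda = separation(pc1), separation(w)
cosine = abs(pc1 @ w)

print(f"separation along PC1: {sep_pca:.2f} SDs")
print(f"separation along LDA: {sep_lda:.2f} SDs")
print(f"|cosine| between the two directions: {cosine:.2f}")
```

    By construction the LDA direction separates the sample at least as well as PC1. How close the two directions end up depends on how much of the total variance is between-group variance.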

    "With 25 years per generation, “Amishness” will increase by a full standard deviation in 10 generations or 250 years." — Won't the std compress each time there's truncation selection, so it will increase by slightly less than a full std?

    • harpend says:

      To Matthew Graves:

      We are certainly grateful for your ability to spot typos and other awkwardness. Nice job.

      You are likely right that ‘figure’ should be ‘Figure’. My view is that as a learning experience Weight should take care of that sort of thing, in LaTeX, purely for his own good.

      We would get superior discrimination with a Fisher discriminant function, but we are not after discrimination, we are trying to find what Steve Hsu calls a lossy data compression into one dimension, for which PCA is appropriate, although there are other approaches like scaling. In fact the discriminant function is nearly the same as the first PC and the hits/misses table is 48 and 2, missing those two red A’s in the swarm of I’s.

      Yes indeed the std compresses each generation, but not very much, surprisingly. There is a kind of rat’s nest in the literature about that issue, and since in the present case all the data are so sloppy, we try to squeeze past the issue.

      Thanks again for the comments, ….

  4. JayMan says:

    Very interesting. Not much I can think of to add, except it would be fascinating to get PCA from different White Americans in different parts of the country. Would Colin Woodard’s American Nations pop out?

    As well, as for the boiling off selection, in addition to the Amish, a similar process seems to have occurred in the Great Plains:

    More Maps of the American Nations | JayMan’s Blog

    The upper Plains appears to be enriched for traits the Left Coast for example lacks.

    • harpend says:

      It sure would be interesting to do a lot more of this but the 16PF is proprietary and the standard way of reporting results normalizes all the answers to the sample, so every factor has a mean of 7 as I understand it (poorly, I admit). This means data across studies cannot be compared.

      Genetics, lately, has a custom or imposed requirement that data be posted and made available. No such thing in this business that I can detect.

  5. Steve Sailer says:

    I wouldn’t be surprised if the “English” rural Indiana boys in 1968 saw themselves as more radical and innovative than they would have if they weren’t exposed to their Amish neighbors: “Well, sure, I’m just an average Indiana farm boy, but compared to these Amish guys, I’m practically Jimi Hendrix.”

  6. Ron Pavellas says:

    Not my area of expertise, but nonetheless interesting. Just to show you I actually read through to at least page 4, there’s a typo: “bolinig off”. I wonder if the “16 types” relate in any way to the MBTI 16 types? Or was this in here and I missed it?

  7. marcel says:

    Genetics/evolution definitely not my field of expertise, but I try to be well read…

    Whenever I’ve read about assortative mating, it is generally in discussions of income and wealth inequality (essentially of economic class), and whenever I’ve read about sexual selection, it is related to genetics and evolution. The connection just dawned on me, or rather that either can be seen as the flip side of the other, so long as there are females who can be grouped according to the traits they prefer.* Two questions present themselves:

    1. (Substantive) Is this a reasonable way of thinking about the two mechanisms, and further is it an accurate way of thinking about them?
    2. (Vanity) Is this realization (of the connection) original or, as is more likely, has someone else already pointed to it?

    *My understanding of sexual selection is that agency is attributed to the female, presumably because of the relative costs of egg and sperm.

    • harpend says:

      Yes! You got what is IMHO the main point of the paper. Evolutionary biologists are interested in sexual selection, sociologists and psychologists in income and wealth, and no one seems to put them together.

      Your question 1: I think it is. Selection is selection, period. There are numerous ways of talking about it but they boil down to the same thing. Another example is all the hot air generated arguing about whether or not group selection is the same as kin selection. They are different ways of writing the same equations.

      Your question 2: I have not seen anyone make the explicit connection.

      Thanks, Henry

  8. Thanks. Reading the paper, and will post on it as soon as I can.

  9. Nick Rowe says:

    Typo? Page 2: “That figure has remained about the same as the number of such loci available to has has grown from dozens to hundreds of thousands.”

  10. melendwyr says:

    Lots of typos, I’m afraid, with the ones I noticed being already pointed out here.
    – I’m not qualified to critique most of Dr. Thompson’s analysis. I would like to note that the Amish aren’t necessarily being sorted for old-fashioned civility. If they’re being sorted for conformity to social standards, and those standards emphasize civility, you’ll get an increase in civility; if they emphasized independence and rudeness instead, there might well be an increase in those as well.
    – Is there any good way to empirically distinguish between changes in conformity to society’s standards and changes in the underlying impulses towards types of behavior?

    • harpend says:

      I am not sure that we could define very clearly the difference between your two alternatives. For example Chagnon describes selection for being good at violence, loud-mouthed, and so on. Aren’t those “society’s standards” or are they something else?

      • melendwyr says:

        I’ve heard ‘character’ defined as what you are when no one is watching. There is a clear conceptual distinction between behaviors adopted to meet the expectations of others and behaviors made to satisfy direct preferences. It’s how to tell them apart practically that concerns me at the moment.
        For example, I know a person who genuinely dislikes the flavor of pork and wouldn’t eat it even if you offered a modest reward (or threatened a modest punishment). I also know someone who believes pork would be delicious but refrains because it’s against the behavioral code of their religion. Both refuse to eat pork, but the motivation of the first is primary, while the second is secondary motivation; the first goes along with their desire, while the second has two competing desires that must be chosen between.
        I doubt very much that the Amish do not use buttons because they have personal aversions to them, although I’m sure that koumpounophobia could be selected for.

        • harpend says:

          “koumpounophobia”. Ten point blog penalty to you for using that word. Aside from that outrage, very useful points you make.

          thanks, HCH

        • annoynamouse says:

          Amish use buttons; always have.

          • melendwyr says:

            The more conservative Amish do not. Many communities have ‘lightened up’. Yes, they’re so liberal that they allow buttons.

          • annoynamouse says:

            No buttons for the Amish? People who get their information from Amish “reality” shows or Twitter perpetuate this stuff. Amish have always used and continue to use buttons. Buttons are not allowed on suits, but pants, shirts, suspenders, etcetera have always had buttons. From Ohio: Booking photos from beard-cutting; this video from the same community shows universal use of buttons; and a detail shows the buttons on shirt but hook and eye on suit. A more direct statement on this is from Brenda Miller whose “family and all of our neighbors are Amish.” Research and observation will only find more of the same.

            The issue is not the lowly button or the silliness of koumpounophobia (it isn’t a fear and it surely isn’t irrational); a closer joke might be fermouárphobia, but more properly it’s Meidenreißverschluss or its Greek equivalent fermouárapofygí.

            The issue is over statements based on ignorance. Despite working with a set of limited facts about a group who are strange to modern culture, inviting comment or speculation to fill in unexplained results, Henry is to be commended for his paper because he conscientiously avoids judgments outside of his knowledge or subject.

            Not constrained by the formality of publication, I will comment that Amish are well versed in genetic selection; most farmers are, and they are also well aware of (or at least believe in) the heritability of temperaments: IQ and ‘AQ’. Although marriage choices are free, Amish will make suggestions on prospective pairing and the resultant characteristics of the children.

            With regard to the Amish as a closed genetic pool with ‘boiling off’ (not sure I like that term) of those with less AQ, the Old Order (OO) Mennonite communities from which the Amish descended may make a better subject pool. With Amish communities a person is either completely in or completely out. In the OO communities there are a number of choices that can be made, a series of step-downs if you will: a person can stay, or they can join formal Mennonites, Brethren, one of the ‘unitarian’ Progressive Mennonite communities, or of course they can leave completely. It is messier than ‘in or out’, but allows a better sorting for the equivalent to AQ.

          • melendwyr says:

            The most conservative sects don’t permit buttons. More permissive ones don’t permit them on certain pieces of outerwear – the tradition dates back to when buttons were expensive and commonly used as adornment in addition to function. (Think Bilbo Baggins mourning over the brass buttons on his waistcoat when he escaped from the goblin tunnels under the Misty Mountains.)

            That’s why the more permissive sects still insist that hooks-and-eyes be used for the most visible locations where ‘decorative’ buttons might be placed.

            They also don’t use belts.

        • marcel says:

          Both are probably also aware that sex is better

    • Steve Sailer says:

      My vague impression is that the Amish are not enormously polite. Not rude like Hasidic Jews in upstate New York, but they don’t go out of their way to be gracious toward outsiders either.

      For example, horse droppings in the middle of small towns are not real polite by 21st Century American standards, but the Amish have their reasons for using horses rather than cars, so the English around them have to put up with them.

      • harpend says:

        I agree entirely with you based on my experience at home in upstate New York. On the other hand they do leave their neighbors alone, they do not proselytize, their buggies do not beat up our highways and bridges very much, they do not use a lot of our hydrocarbon fuels, they put a lot of sugar in their desserts, and they don’t rob convenience stores. Those are all pluses by my own accounting so I will still go with ‘polite’.


        • BRW says:

          As a neighbor to many an Amishman I will second Henry’s polite assessment. They give a polite wave and keep to themselves. How every great neighbor should be.

      • tokyobling says:

        Where I come from the people would be scooping those horse droppings up and putting them in their compost. Free nutrition for your garden.

      • Jim says:

        Neither the Hasidim nor the Amish are friendly to outsiders but they are not hostile either. They just ignore outsiders. If all my neighbors were Amish I would have nothing to fear. I wouldn’t even bother to lock my doors. But I would probably be pretty lonely.

  11. David says:

    Also not qualified to do more than point out typos. Fascinating though.

    Page 4, bottom paragraph: “aministered”

  12. JK says:

    It’s a shortcoming that you don’t use any explicit psychometric model of personality. It means that there is no way of telling if the differences between the Amish and non-Amish reflect genuine differences in personality or if they are just a measurement artifact. Using an explicit psychometric model, you could test if the test items (or item bundles) function in the same way in the two groups, that is, whether the items are interpreted in the same way across groups. For example, analyses of IQ tests show near-universal measurement bias with respect to older cohorts, which means that rising IQs cannot be used as evidence of rising intelligence. However, given your tiny sample size, it would in practice be difficult to get meaningful results from DIF or factorial invariance analyses.

    The heritability of personality traits appears to be strongly non-additive, so your h^2 value of 0.5 may be too high.

    It’s unfair to say that Turkheimer “threw in the towel” with respect to heritability research. He had never endorsed the blank slate view, so the GCTA results do not challenge his views. Moreover, it’s not correct to say that GCTA renders family studies obsolete. GCTA cannot take into account the influence of rare variants or non-additivity, unlike family designs.

    • harpend says:

      “It is a shortcoming….”. It would be if we had any interest in personality testing. I have always assumed that ability tests measure something real (Can you do this problem?) but that personality tests were vapor. No good reason, just my own bigotry. I did read, on the recommendation of a colleague, Matthews and Deary’s Personality Tests and my bigotry remained. I ought to learn more.

      I have no idea on earth what DIF and factorial invariance analyses are. Once long ago, curious about group differences in cognitive test performance in North America, I looked at the PC’s of US White and Black test scores to see if they pointed in different directions. To my surprise they did not. Is this the sort of thing you are suggesting?
      How on earth does one parse this statement: “rising IQs cannot be used as evidence of rising intelligence”?

      You are right about SNP studies of kinship of course. I meant to attribute to Turkheimer just his denial of the carping criticism in the latter twentieth century by people like Kamin and Gould and Goldberger.

      You have a source that I can understand of DIF and factorial invariance analysis?

      Thanks, Henry

      • JK says:

        Differential item functioning (DIF) and factorial invariance (FI) refer to analyses of whether observed test score differences (on, for example, a test of personality or intelligence) between two groups can be attributed to the latent (unobserved) traits measured by the tests in question, or whether the differences are just test artifacts. In other words, they assess the presence of systematic test bias. DIF is about item-level data, while FI is about scales or subtests that sum several item scores. FI is tested in multi-group confirmatory factor analysis; for DIF there are various methods.

        In DIF and FI analyses, you model the variances, covariances, and means of the observed test scores in terms of latent traits that are assumed to cause individual and group differences in the observed test scores. For example, if you have a bunch of items assumed to measure “extraversion”, you model the item variances, covariances, and means as a latent trait that you may call extraversion, and then test if the latent extraversion factor relates to the items in the same way in both groups. Tests are unbiased indicators of the latent trait if individuals belonging to different groups have the same expected observed test scores when they have the same latent trait scores. If the tests show DIF or violate FI, then observed group differences in test scores may not be attributed to group differences in the latent trait(s) in question. However, if only some items or subtests are biased, you can estimate group differences in the latent traits using the unbiased indicator variables.
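
        To make the idea of item bias concrete, here is a toy simulation, not a formal DIF or invariance test. Two groups share exactly the same latent trait distribution, but one hypothetical item is answered higher in the second group for reasons unrelated to the trait. The loadings, noise level, bias size, and the regression-on-rest-score screen are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

n_per_group, n_items = 5000, 5
loading, noise_sd = 0.7, 0.7        # assumed, identical for every item
biased_item, bias = 2, 0.5          # hypothetical: item 2 is shifted in group 1

# Both groups have the SAME latent trait distribution.
group = np.repeat([0, 1], n_per_group)
latent = rng.normal(size=group.size)
noise = rng.normal(scale=noise_sd, size=(group.size, n_items))
items = loading * latent[:, None] + noise
items[:, biased_item] += bias * group   # group-specific shift unrelated to the trait

# Naive comparison of sum scores shows a group gap even though the
# latent trait does not differ between groups.
sum_score = items.sum(axis=1)
observed_gap = sum_score[group == 1].mean() - sum_score[group == 0].mean()


def group_effect(j):
    """Group coefficient when item j is regressed on the rest score and group."""
    rest = np.delete(items, j, axis=1).sum(axis=1)
    design = np.column_stack([np.ones_like(rest), rest, group])
    coef, *_ = np.linalg.lstsq(design, items[:, j], rcond=None)
    return coef[2]


effects = np.array([group_effect(j) for j in range(n_items)])

print(f"observed sum-score gap: {observed_gap:.2f}")
for j, e in enumerate(effects):
    print(f"item {j}: group effect at equal rest score = {e:+.2f}")
```

        In this setup the shifted item should stand out with the largest group effect. With 25 subjects per group, as in the Amish data, a screen like this would be far too noisy to trust.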

        There’s a detailed article on the DIF in Wikipedia. This is a pretty clear article on factorial invariance (or measurement invariance) with respect to IQ tests. Here’s an application of FI to personality data from different cohorts. The last paper gives an example of the kind of bias that these methods can uncover:

        For instance, from the finding that “self-reports of anxiety/neuroticism have increased substantially from the 1950s to the early 1990s” (Twenge, 2000; p. 1017), Twenge (2000) inferred that “the larger sociocultural environment . . . has a considerable effect on a major personality trait” (p. 1017). However, bearing in mind that successive cohorts may be separated by many years, one may ask whether an item administered to young adults in the 1950s and early 1990s will necessarily measure the same construct. For instance, the response to an item purporting to measure Conscientiousness, such as “It is important to dress to the occasion,” is likely to be dictated by the sociocultural perceptions of what is appropriate with respect to dress code. Changes in the responses to such an item may have more to do with changes in such perceptions than with the latent variable Conscientiousness per se.

        Rising IQ test scores cannot be straightforwardly interpreted as increases in intelligence. This is easy to see when you consider that a literal interpretation of the Flynn effect would indicate that the majority of people 100 years ago were mentally retarded and incapable of leading normal, productive lives. IQ tests do not measure intelligence in an invariant manner across generations. This is because test items are necessarily culturally contingent, strongly reflecting the prior exposure of individuals to relevant cultural contents (this applies also to seemingly culturally neutral tests like Raven’s matrices). However, this does not mean that individual differences within a certain culture are environmentally determined — behavioral genetics indicates that individual differences in IQ within a culture are largely genetically determined.

        Tests of DIF and FI can be used to assess whether two groups are culturally similar enough for their test results to reflect authentic differences in the underlying traits. For example, IQ differences between blacks and whites in America (within the same age cohort) appear to reflect genuine differences in latent abilities, especially g. However, finding that the principal components or factors derived from test scores are similar across groups is insufficient to establish this. Wicherts & Dolan (2010), mentioned above, shows why this is so.

        While the structure of personality is less well understood than that of intelligence (because it’s much more multidimensional), personality tests are hardly “vapor.” For example, personality factors show substantial predictive validity and genetic stability over time, as discussed in the Turkheimer paper you cite. Personality traits like the Big Five may not carve nature at its joints, but neither are they artifactual.

        • JK says:

          “Personality traits like the Big Five may not carve nature at its joints, but neither are they artifactual.”

          I should have said, not completely artifactual. The Big Five are probably best seen as convenient summaries of more basal or “real” traits.

        • harpend says:

          Wow, thanks, I guess. I will dive in but fear I am getting too long in the tooth to learn new stuff. I like the “carve the joints” analogy.

          Regards, again, HCH

          • I should point out, having discussed this matter with Jelte Wicherts, that the requirements for a proper test of factorial invariance are very demanding. You need a very large sample, and even then many of those large samples do not meet the inclusion criteria. I think that it is a counsel of perfection, and that simpler item analyses can go a long way to showing whether a set of test items is behaving in the same way (tapping the same factors) in different populations. The main problem with personality questionnaires, in my view, is that they are based on fallible self-report. This is far more important than obsessing about DIF and FI. Only a few tests have reasonable observer reports. I think those are very informative, particularly for a factor like conscientiousness, in which sloppy people imagine they have it, and very conscientious respondents are painfully aware of their occasional shortcomings.

  13. annoynamouse says:

    Typo: “This technology has instantly rendered nearly obsolete studies of real know family relationships …”
    Awkward use: “more or less” is used twice – consider substituting ‘approximately’, ‘closely’, or a similar word.
    Not sure I am reading this right, but it looks like a missing word or two here: “That figure has remained about the same as the number of such loci available to has grown from dozens to hundreds of thousands.”

  14. Brett Olsen says:

    1) What percentage of the total variance do the first (=AQ) and second principal component directions explain? Is the AQ a highly explanatory vector?

    2) The inclusion of the other groups is slightly confusing. Correct me if I’m wrong, but it sounds like the PCA was performed on only the Amish/Indianan groups, with the UK and Chinese included for comparison. You might consider clarifying this in the text.

    3) So the left panel in Figure 2 is the dual of the PCA – i.e., tells us what the PCA components are in terms of the traits. Does this graph include the mapping of the other groups – i.e., the Chinese nurses? If so, it’s interesting that they map out as fairly high in C, H, B (Unemotional, Socially Bold, Abstract thinking) and low in O, Q4 (Resilient, Relaxed).

    4) I ran some numbers and truncation on the lower 10% of a normal distribution compresses the standard deviation of the remaining Amish population to about 85% of its original value (while increasing its mean by 0.20 standard deviations as you say in the text). I don’t think this is ignorable! The standard deviation of the Amish population under this assumption rapidly decreases, which in turn decreases the effect of selection in the next generation. When I modeled this for h=1.0, I got a maximum increase in AQ for a boiling off of 10% of only +1.25 standard deviations (of the normal population) and essentially a delta function for the Amish distribution as its variance has dropped so much. It’s not clear to me how lower heritability will affect this population compression, so I’m not sure how to model this at lower heritabilities. The observed equal standard deviations of the Amish vs. Indiana populations suggest this compression isn’t happening. So why not?
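
    The truncation numbers in point 4 can be checked with the closed-form moments of a truncated standard normal. This is a sketch under the stated assumption that the trait is exactly normal and that exactly the lowest 10% leave; `truncate_lower` is a helper name made up for this example.

```python
from scipy.stats import norm


def truncate_lower(fraction_removed: float):
    """Mean and variance of a standard normal after removing its lowest tail."""
    cutoff = norm.ppf(fraction_removed)                   # z-score below which people leave
    hazard = norm.pdf(cutoff) / (1.0 - fraction_removed)  # density at cutoff / fraction kept
    mean = hazard                                         # E[z | z > cutoff]
    variance = 1.0 + cutoff * hazard - hazard ** 2        # Var[z | z > cutoff]
    return mean, variance


mean_after, var_after = truncate_lower(0.10)
sd_after = var_after ** 0.5

print(f"mean of stayers:     {mean_after:.3f} sd")
print(f"variance of stayers: {var_after:.3f}")
print(f"sd of stayers:       {sd_after:.3f}")
```

    The mean shift should come out close to the 0.20 quoted from the text, and the standard deviation close to the 85% figure.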

    • harpend says:

      Excellent comments, thanks.

      1. The eigenvalues of the correlation matrix of just the subjects are proportional to:
      0.23, 0.18, 0.10, 0.08, 0.07, 0.06, 0.05, 0.04, 0.04, 0.03, 0.03, 0.02, 0.02, 0.02, 0.01, 0.01

      2. Right, will clarify. There are two different PCAs. The first is the group means as data objects: six objects, Amish, Indiana, UK mean, UK top managers, Chinese Nurses, Excellent Chinese Nurses. This small number of groups reflects the lack of accessible data.

      The second PCA, figure 2, has 51 objects: 25 Amish, 25 Non-Amish, the UK mean.

      3. Figure 2 is made from the 50 young males plus the UK average, not from all of the groups. You are reading it wrong: AQ is the projection onto the x axis, not the y axis. Read it left to right, not down to up! The axes on the left and right panels are the same, so the Amish, on the left, are high on G, Q3, and I, “conscientious”, “tough-minded”, and “controlled”, and low on Q1, “radical”, etc.
      4. You are right on this point. Our model is like cartoon quantitative genetics and there are things we would like to sweep under the rug at this stage. Here goes pass 1. After the selection the mean changes by 0.2 sds and the new variance is 0.712 rather than 1.0 as it was. Now a generation of random mating occurs and, since heritability is 0.5, the mean regresses back to 0.1 sds and the variance after random mating is 0.856. The square root of this is 0.925, i.e. the new sd, while before the episode of selection it was 1.0. This process goes on, under the 10% assumption, and the sd gets pretty shrunk after ten generations or so, i.e. cut in half.
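
      The "pass 1" arithmetic can be iterated in a few lines. This is only the cartoon as described: it assumes heritability stays fixed at 0.5 and that every round of selection keeps the same 0.712 share of the current variance, neither of which a real population has to respect.

```python
# Cartoon iteration of truncation followed by a generation of random mating.
H2 = 0.5                # heritability used in the reply above
SHIFT_IN_SDS = 0.2      # mean shift among stayers, in units of the current sd
VARIANCE_KEPT = 0.712   # variance among stayers relative to before selection


def iterate(generations, h2=H2):
    """Return a list of (generation, mean, sd), all in original sd units."""
    mean, variance = 0.0, 1.0
    history = [(0, mean, 1.0)]
    for generation in range(1, generations + 1):
        sd = variance ** 0.5
        mean += h2 * SHIFT_IN_SDS * sd                       # only the heritable part persists
        variance *= h2 * VARIANCE_KEPT + (1.0 - h2)          # 0.856 when h2 = 0.5
        history.append((generation, mean, variance ** 0.5))
    return history


history = iterate(10)
for generation, mean, sd in history:
    print(f"gen {generation:2d}: mean = {mean:.3f}, sd = {sd:.3f}")
```

      Calling `iterate(50, h2=1.0)` runs the same recursion with complete heritability, where the mean levels off at a ceiling of the same order as the +1.25 sd figure in the comment above.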

      At this point we have to look at the experimental literature, of which there is a lot. There is also disagreement about appropriate assumptions and so on. Lande and Arnold, as I recall, attributed their findings to ongoing mutations that restored variance. I will hit the books.

      Why is there no compression in the data? Beats me, I haven’t given it a lot of thought. We are finding the axis along which variance is maximized in the population of half Amish and half non-Amish: note that on the second PC, up and down in figure 2, the non-Amish are much more dispersed than the Amish.

      I will find all this more interesting if we can find another mating system and data set for which this kind of technology is appropriate. We have a pretty small sample here.

      Happy to send you our data tables, scipy code, BTW.


      • harpend says:

        With respect to the compression or lack of it in the Amish data, Wittmer in his dissertation was concerned with differences in the variability of the 16PF factors in the Amish and non-Amish. Variances were a lot greater among the non-Amish, i.e. they did not behave as our AQ does.

        I took the 50×16 data table, converted each score to a normalized score or z-score, then looked at the Amish and non-Amish variances separately. Here are the results, i.e. what Wittmer found when he looked at these in much greater detail:

        Variances of non-Amish:
        1.24, 0.88, 1.46, 1.30, 1.00, 0.74, 1.35, 1.11, 1.54, 1.19, 1.48, 0.98, 0.88, 1.40, 1.52, 0.97

        Variances of Amish:
        0.40, 0.79, 0.52, 0.50, 0.69, 0.55, 0.63, 0.38, 0.54, 0.83, 0.49, 1.06, 0.65, 0.54, 0.55, 1.08

        List of factors:
        ‘a’, ‘b’, ‘c’, ‘e’, ‘f’, ‘g’, ‘h’, ‘i’, ‘l’, ‘m’, ‘n’, ‘o’, ‘q1’, ‘q2’, ‘q3’, ‘q4’

        The non-Amish variances are a lot bigger.
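
        For anyone wanting to repeat the calculation, the procedure is roughly the following. The table here is randomly generated as a placeholder, since the real 50×16 scores are not reproduced in this thread. The row order (25 Amish, then 25 non-Amish) and the use of the sample standard deviation are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

factors = ["a", "b", "c", "e", "f", "g", "h", "i",
           "l", "m", "n", "o", "q1", "q2", "q3", "q4"]

# Placeholder data, NOT the real scores: rows 0-24 Amish, rows 25-49 non-Amish.
amish = rng.normal(loc=5.0, scale=1.2, size=(25, len(factors)))
non_amish = rng.normal(loc=6.0, scale=2.0, size=(25, len(factors)))
table = np.vstack([amish, non_amish])

# z-score each factor across all 50 subjects, then split by group.
z = (table - table.mean(axis=0)) / table.std(axis=0, ddof=1)
var_amish = z[:25].var(axis=0, ddof=1)
var_non_amish = z[25:].var(axis=0, ddof=1)

for name, va, vn in zip(factors, var_amish, var_non_amish):
    print(f"{name:>2}: Amish {va:.2f}   non-Amish {vn:.2f}")
```

        Because the z-scores are computed over the pooled sample, a difference in group means also contributes to the pooled variance, so the two within-group variances need not average to 1.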

      • Brett Olsen says:

        I’d love to take a look at the data, if you don’t mind – my email is brett.olsen@gmail.com. I’m normally a biophysicist by trade, not a population geneticist, but scipy is my normal working framework and the data sounds quite interesting.

        I’m wondering if an explicit population simulation of a trait governed by a large number of additive genes of small effects might give some insight into what’s going on with the variance compression – although it sounds like the raw trait data does show lower variance.
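
        A bare-bones version of such a simulation might look like the sketch below. Everything in it is an assumption made for illustration: 200 unlinked biallelic loci of equal effect, a starting allele frequency of 0.5, a constant population of 2000, no sexes, and environmental noise fixed so that heritability is 0.5 in the founders.

```python
import numpy as np

rng = np.random.default_rng(3)

N, LOCI, GENERATIONS = 2000, 200, 10     # assumed population size, loci, run length
H2, FRACTION_LEAVING = 0.5, 0.10         # founder heritability, share boiled off

# Diploid genotypes with shape (N, 2, LOCI); each allele is 0 or 1.
genomes = rng.integers(0, 2, size=(N, 2, LOCI))


def genetic_value(g):
    """Additive genetic value: the count of '1' alleles across all loci."""
    return g.sum(axis=(1, 2)).astype(float)


founders = genetic_value(genomes)
base_mean, base_sd = founders.mean(), founders.std()
env_sd = np.sqrt(founders.var() * (1 - H2) / H2)   # sets heritability to H2 at the start


def next_generation(g):
    """Remove the lowest phenotypes, then mate the stayers at random."""
    phenotype = genetic_value(g) + rng.normal(scale=env_sd, size=len(g))
    cutoff = np.quantile(phenotype, FRACTION_LEAVING)
    stayers = g[phenotype > cutoff]
    mothers = stayers[rng.integers(0, len(stayers), size=N)]
    fathers = stayers[rng.integers(0, len(stayers), size=N)]
    # Free recombination: each parent passes on one of its two alleles per locus.
    pick_m = rng.integers(0, 2, size=(N, LOCI))
    pick_f = rng.integers(0, 2, size=(N, LOCI))
    rows = np.arange(N)[:, None]
    cols = np.arange(LOCI)[None, :]
    child = np.empty((N, 2, LOCI), dtype=g.dtype)
    child[:, 0, :] = mothers[rows, pick_m, cols]
    child[:, 1, :] = fathers[rows, pick_f, cols]
    return child


history = []
for generation in range(1, GENERATIONS + 1):
    genomes = next_generation(genomes)
    gv = genetic_value(genomes)
    history.append(((gv.mean() - base_mean) / base_sd, gv.std() / base_sd))
    print(f"gen {generation:2d}: mean = {history[-1][0]:+.2f}, "
          f"genetic sd = {history[-1][1]:.2f} (founder genetic sd units)")
```

        Printing the genetic standard deviation each generation lets the trajectory be compared directly with the compression predicted by the simple truncation argument.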
