Thinking about preparing for a course this Fall, I had a look at the current literature on assortative mating and found a pair of excellent papers on assortment by education [1,2]. Using census data they generated tables by decade from 1940 to 2000 with counts of marriages by level of education of husband and wife. There are two families of tables in these papers, one based on recent marriages (“newlyweds”) and the other based on marriages of five or more years duration (“established”). Since the established series is more up to date, I have looked at that series.

The papers report each table as percentages of the total number of marriages, but the raw counts are simple to generate from the published tables. Here, for example, are the counts for established marriages in a sample from the 1940 US census:

```
1940 Marriages
                          Male Education
                    <10   10-11      12   13-15     16+
            <10   69729    7053    4945    1236     634
          10-11   11618    6150    4137    1093     570
Female       12   10382    5706   12887    3629    3027
Education 13-15    2092    1062    2330    2504    2567
            16+     507     253     745     855    2773
```

The education categories are dated from today’s perspective: fewer than ten years of schooling, ten to eleven years, high school graduate, some college, and college graduate. Of course the significance of these has changed since 1940: some would say that today’s associate degree is equivalent to finishing high school in 1940. Nevertheless we can look at what we have.

While the above table is complete, how do we make sense of it? Some things are easy, for example the largest number of marriages, 69729, were those in which both partners had fewer than ten years of education. In 2000 the largest number of established marriages is between two college graduates. The smallest entry in 1940, 507, is for college graduate women married to men with fewer than ten years of schooling. Interesting numbers, I suppose, but what is the big picture? Is there a little or a lot of assortative mating? For a full analysis using log-linear models and covariates see the above papers. The second, by Schwartz and Mare, is well written, easy to understand, but it is a full course meal. What I want is a snack, a tasty cracker, rather than a full meal.

One attractive approach to generating a light snack is called “raking” the table [3]. This technique was a hot topic way back when I was in graduate school [4]. Iterating, one adjusts the marginals, alternating rows and columns, to some desired values. In more detail: first compute the column sums, then divide each column by its sum, then sum the rows, then divide each row by its sum, repeating these steps until the table stops changing. This adjusts the marginals, i.e. the row and column sums, to 1.
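The steps just described can be sketched in a few lines of NumPy. This is a minimal illustration rather than the code actually used for the post: the function name `rake`, the tolerance, and the iteration cap are assumptions chosen for the example, and the input is the 1940 count table shown above.

```python
import numpy as np

def rake(table, tol=1e-10, max_iter=1000):
    """Alternately rescale columns and rows until the table stops changing."""
    m = np.asarray(table, dtype=float).copy()
    for _ in range(max_iter):
        previous = m.copy()
        m = m / m.sum(axis=0, keepdims=True)  # each column now sums to 1
        m = m / m.sum(axis=1, keepdims=True)  # each row now sums to 1
        if np.abs(m - previous).max() < tol:  # no visible change: stop
            break
    return m

# 1940 counts from the table above: wives in rows, husbands in columns.
counts_1940 = np.array([
    [69729,  7053,  4945, 1236,  634],
    [11618,  6150,  4137, 1093,  570],
    [10382,  5706, 12887, 3629, 3027],
    [ 2092,  1062,  2330, 2504, 2567],
    [  507,   253,   745,  855, 2773],
])

raked_1940 = rake(counts_1940)
print(np.round(raked_1940, 2))
```

Because every cell of the table is positive, the alternating rescaling settles down quickly, and both the row sums and the column sums of the result end up at 1.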

This method, applied to the 1940 marriages, gives this:

```
1940 Marriages
0.59 0.23 0.12 0.04 0.02
0.23 0.42 0.25 0.08 0.02
0.11 0.23 0.36 0.22 0.08
0.04 0.08 0.19 0.41 0.28
0.03 0.03 0.09 0.25 0.60
```

The row and column labels are the same as in the table above. What do these numbers mean? They are (supposed to be) estimates of what we might call “attraction.” (I want to say “preference” but sociologists use that for something else.) Since we have iterated to unit marginals (the rows and columns each sum to unity), the numbers tell us how many marriages of each mating-type (pair of education levels) would occur if the numbers of potential mates from each level were the same. For example, the rows correspond to female partners. The second entry in the first row, 0.23, tells us that if the mating pool numbers were uniform across levels, 23% of the women with fewer than ten years of schooling would marry men with ten or eleven years of schooling. We could have adjusted the marginals to anything we wished, of course, but the reduction to unit marginals in my opinion gives the simplest picture possible of assortative mating by education.

Here is the same table from the 2000 census:

```
2000 Marriages
0.64 0.19 0.11 0.05 0.02
0.19 0.48 0.21 0.09 0.02
0.11 0.21 0.39 0.22 0.08
0.04 0.10 0.22 0.42 0.21
0.01 0.02 0.08 0.22 0.67
```

Now things are starting to get strange. The diagonal entries in the 2000 table are slightly greater, indicating more within-group marriage attraction, but not by much. Other than that it is essentially the same table. Hardly anything has changed in 60 years!

Here is another (to me) surprise. Female hypergamy (marrying up) is often thought to be common. A simple indicator is the average of the raked values in the upper triangle (male education > female), on the diagonal (both partners have the same education level), and in the lower triangle (male education < female). There is scarcely any indication of preferential hypergyny, neither in 1940 nor in 2000 nor in the intervals between. There might be some if I had reported more significant digits but it would not be large enough to be of any interest.

```
Average of upper triangle, diagonal, and lower triangle values
Year Female<Male Same Female>Male
1940 0.37 0.47 0.37
1960 0.37 0.48 0.37
1980 0.38 0.50 0.37
2000 0.38 0.52 0.38
```

So this is the cracker I have gotten ready for my class this fall, perhaps 15 minutes worth. If you are interested in digging deeper I recommend a careful read of Schwartz and Mare [2]: good luck.

If you are interested in the numbers, here are the raw data straight from their paper, in a format such that you can copy one of the blocks and read it directly as a floating point array with NumPy’s loadtxt() function.

```
# year 1940 percentages from schwartz and mare 2005 demography
# men in columns, women in rows, schooling increases across and down
#N=158512
28.83 4.37 3.53 0.62 0.42
7.23 4.96 4.67 0.72 0.22
7.01 5.80 14.29 3.92 2.30
0.74 0.52 1.97 1.97 2.05
0.25 0.12 0.52 0.64 2.32
#1960
#N=203117
9.32 2.94 2.85 0.47 0.19
4.70 5.04 6.94 1.42 0.37
5.84 6.41 22.61 6.93 3.01
0.63 0.76 3.35 5.15 3.81
0.10 0.08 0.84 1.54 4.69
#1980
#N=239980
2.68 1.51 1.92 0.44 0.09
1.51 3.16 5.00 0.94 0.16
2.23 4.33 25.51 8.26 3.03
0.42 1.03 7.08 9.48 5.51
0.09 0.14 1.66 3.37 10.46
#2000
#N=220209
3.47 0.60 1.42 0.52 0.16
0.68 1.01 1.79 0.65 0.13
1.80 2.02 15.54 7.33 2.41
0.76 1.06 9.26 14.91 6.98
0.17 0.18 2.80 6.33 18.02
```
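As a rough sketch of how one of these blocks can be read, the snippet below pastes the 1940 block into a string and hands it to `numpy.loadtxt`, which skips lines starting with `#` by default. The variable names are just for illustration, and the conversion back to approximate counts assumes the entries are percentages of the `N` given in the block's comment line.

```python
import io
import numpy as np

# The 1940 block, pasted as text; '#' lines are ignored by loadtxt.
block_1940 = """\
# 1940, men in columns, women in rows
#N=158512
28.83 4.37 3.53 0.62 0.42
7.23 4.96 4.67 0.72 0.22
7.01 5.80 14.29 3.92 2.30
0.74 0.52 1.97 1.97 2.05
0.25 0.12 0.52 0.64 2.32
"""

pct_1940 = np.loadtxt(io.StringIO(block_1940))  # 5x5 float array

# Assumption: entries are percentages of N, so counts are pct/100 * N.
n_1940 = 158512
approx_counts = np.rint(pct_1940 / 100 * n_1940).astype(int)

print(pct_1940.shape, round(pct_1940.sum(), 2))
print(approx_counts)
```

The percentages are rounded to two decimals in the source, so the reconstructed counts are only approximate and will not sum exactly to `N`.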

REFERENCES

[1] R.D. Mare, Five decades of educational assortative mating, American Sociological Review. (1991) 15–32.

[2] C.R. Schwartz, R.D. Mare, Trends in educational assortative marriage from 1940 to 2003, Demography. 42 (2005) 621–646.

[3] A. Agresti, Categorical data analysis, Wiley, Hoboken, NJ, 2013.

[4] F. Mosteller, Association and estimation in contingency tables, Journal of the American Statistical Association. 63 (1968) 1–28.

“There is scarcely any indication of preferential hypergyny, neither in 1940 nor in 2000 nor in the intervals between.”

Perhaps for marriage. How about affairs, assignations, etc? Men are probably much more cautious regarding marriage.

Really strange. The absence of female hypergamy (symmetry of the matrix) goes against both stereotypes and my personal experience. This is not so common; usually facts collide with the PC view, not folk wisdom or personal experience …

Trophy wives are easily the biggest form of hypergamy in my experience – so maybe female hypergamy is simply swamped by a numerically much larger male hypergamy.

This is very interesting, thanks for the link to the papers.

I’m not quite sure I’m following the results you’re reporting, though. I ran through the numbers myself, and I’m getting identical results for the diagonals, but significantly lower for the off-diagonals (upper and lower triangular regions).

I calculated (based on both raw numbers and the raked affinities):

Sorry, looks like the column labels got displaced. That should be year, fraction of marriages with lower female educational attainment, fraction of marriages with equal educational attainment, and fraction of marriages with greater female education attainment, as calculated from both the raw data and from the adjusted affinities.

I find it interesting that while the raking does improve the balance between the upper and lower triangles, the balance isn’t very far off even in the raw data.

I did find an interesting metric that *does* change over time. We can separate the population of marriages into those with high-ish educational attainment (the lower right triangle) and those with low-ish educational attainment (the upper left triangle). Then we can apply the same analysis of homogamy vs. hypergamy in these categories. Applied to the raw data, we get:

It looks like in 1940 we see two types of hypergamy: less educated males marrying more highly educated women and very highly educated males marrying less educated women. Net, they roughly balance out, so we see about the same number of marriages with more educated women as those of more educated men. This trend diminishes over time, though, with less polarization in the effects on low vs. highly educated marriages.

I think that we need to check the averages here. The off-diagonal sums have to be divided by 10 to get the averages, the diagonal sums by 5. Your three averages sum to unity and they should not, as mine don’t. We should have 10*(average upper triangle) + 10*(average lower triangle) + 5*(average diagonal) = 10.

Right?

Your third post is very interesting indeed: we will have to see if we (i.e. you) can justify doing it. Sounds right though. Back to the computer………

Hmm. What I was calculating was the fractions of marriages in the equal, high-F, and high-M categories, under both raw and normalized population distributions. I.e., taking the matrix, normalizing it so that the sum of all entries in the matrix is 1, then summing the fractions of interest. I think which metric is more useful depends on how you interpret the values of a raked matrix. If you interpret them as relative “attraction” between groups, then I think the average attraction makes sense. If you interpret them as relative populations of marriages under a uniform distribution of educational attainment, then I think the fractional approach is reasonable.
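For concreteness, the fractional approach described in this comment can be sketched as follows. This is not the commenter's own code; it simply applies the stated recipe (normalize the whole matrix to sum to 1, then add up the cells of interest) to the 1940 percentages from the post, with variable names invented for the example.

```python
import numpy as np

# 1940 percentages from the post: wives in rows, husbands in columns.
pct_1940 = np.array([
    [28.83, 4.37,  3.53, 0.62, 0.42],
    [ 7.23, 4.96,  4.67, 0.72, 0.22],
    [ 7.01, 5.80, 14.29, 3.92, 2.30],
    [ 0.74, 0.52,  1.97, 1.97, 2.05],
    [ 0.25, 0.12,  0.52, 0.64, 2.32],
])

frac = pct_1940 / pct_1940.sum()          # all entries now sum to 1

husband_higher = np.triu(frac, k=1).sum() # strictly above the diagonal
same_level = np.trace(frac)               # diagonal only
wife_higher = np.tril(frac, k=-1).sum()   # strictly below the diagonal

print(round(husband_higher, 3), round(same_level, 3), round(wife_higher, 3))
# -> 0.228 0.524 0.248
```

The three fractions sum to 1 by construction, which is the difference from the per-cell averages used in the post.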

That said, I still can’t replicate the results you’re showing here. I think you’re including the diagonal elements in the calculation of the mean upper/lower triangular values.

E.g., from the 2000 raked upper triangle, we have: 0.19, 0.11, 0.05, 0.02, 0.21, 0.09, 0.02, 0.22, 0.08, 0.21, the sum of which is 1.2, divided by 10 should be 0.12. I suspect you’re adding in the diagonal elements 0.64, 0.48, 0.39, 0.42, 0.67 for a total sum of 3.8, and dividing by 10 for a final value of 0.38.

If you’re using numpy as well, you may have run into the same problem I did of numpy.triu() and numpy.tril() including the diagonal values. I put my analysis code up on pastebin (http://pastebin.com/XNtPXCJC) to look at if you like.

Brett you are absolutely right: I forgot that to get the triangles without the diagonal a second argument is required. Pretty high grade clientele we have here. Our substantive conclusions are the same, as you say.
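The second argument in question is the diagonal offset `k`. A short sketch using the rounded 2000 raked table from the post shows the difference; the variable names are illustrative only.

```python
import numpy as np

# Rounded 2000 raked table from the post: wives in rows, husbands in columns.
raked_2000 = np.array([
    [0.64, 0.19, 0.11, 0.05, 0.02],
    [0.19, 0.48, 0.21, 0.09, 0.02],
    [0.11, 0.21, 0.39, 0.22, 0.08],
    [0.04, 0.10, 0.22, 0.42, 0.21],
    [0.01, 0.02, 0.08, 0.22, 0.67],
])
n = raked_2000.shape[0]

# k=1 and k=-1 select the triangles strictly above and below the diagonal.
upper_mean = raked_2000[np.triu_indices(n, k=1)].mean()
lower_mean = raked_2000[np.tril_indices(n, k=-1)].mean()
diag_mean = np.diag(raked_2000).mean()

# With the default k=0, np.triu keeps the diagonal, which inflates the result.
upper_with_diag = np.triu(raked_2000).sum() / 10

print(round(upper_mean, 2), round(diag_mean, 2), round(lower_mean, 2))
# -> 0.12 0.52 0.12
print(round(upper_with_diag, 2))
# -> 0.38
```

The last figure matches the 0.38 reported for 2000 in the post's table, consistent with the diagonal having been included there.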

Still not sure about isolating the upperleft and lowerright triangles–have to give that some thought.

Thanks

I used the GSS to analyze assortative marriage in response to Whiskey/testing99/evil neocon in this post.

Good idea pulling the variables from the GSS–I will give it a shot.

OTOH I could not understand what the argument was about to which your post was a response. Can you give us a summary?

Novaseeker has deleted his blog. His post is archived here though:

http://web.archive.org/web/20101018073214/http://novaseeker.blogspot.com/2009/09/its-all-about-numbers.html

Unfortunately, Mangan both deleted his post and has a robots.txt file preventing its archival.

Error in table labeled “Average of upper triangle, diagonal, …”:

“Year Female<Male Same FemaleMale” ?

Sigh – brackets were eaten by html engine –

Two columns are labeled “Female LT male” – one should be “Female GT Male”

Maybe I’m just being dense here, but how can there be hypergamy if virtually all males and all females get married to somebody, or if those who don’t — a pretty small percentage historically — aren’t so different from one sex to the other? Doesn’t any possibility for hypergamy depend on a difference between the two rather small classes, one of males who don’t get married, and the other of the women who don’t, and that it’s going to require a major difference in those two classes to be detected at all in the tables of those who do get married?

And wouldn’t it be in any case easier to detect the presence of hypergamy by examining those who don’t get married?

The more I try to think about this, I don’t see how hypergamy might possibly work if virtually everybody gets married, or if those who don’t are basically indistinguishable from sex to sex.

There’s got to be an easy theorem.

And of course the same point applies to, say, levels of attractiveness.

Good point. I was thinking along those lines too; it’s the only way I can make sense of the data and get over the impression that there “must” be more women marrying up than men. However, I vaguely remember that the number of unmarried men is not so low…and much higher than that of unmarried women. Or is it childlessness I remember?

Also, recent censuses are not so meaningful given the divorce rate…much higher when wife income > husband income, iirc. Income is correlated with education, so I expect marriages in the lower half of the array to be short-lived. I wonder what the table would look like if we used people*year as the unit, instead of people…

Sure, for every male who marries up there corresponds a female (his wife) who marries down.

kai,

A quick google seems to show that in 1980, the number of men of age 40-44 in the US who were never married was only 6%, but that number rose to 16% in 2008. I’d assume — though I could certainly be wrong — that the numbers in 1940 were fairly similar to those of 1980, and certainly the number of divorces would be small as well. So there should be little hypergamy of any kind in those times.

Not sure what to make of the numbers in 2008, given the various categories of men who might not marry in this more current era, as well as the increased divorce rate.

What the original poster (and myself) meant was that if everybody is getting married (and average education is the same for both sexes), the table must be symmetric (as much up and down marriage for both sexes). Basically, monogamy and everybody getting married ensure symmetry, no place for an average up or down selection, even if there was a preference in the first place.
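One way to make this accounting argument concrete is to score the five education levels 0 to 4 (an assumption made purely for illustration) and compare the average husband-minus-wife gap among married couples with the difference between the two marginal means. Strictly, what the bookkeeping pins down is this average gap rather than cell-by-cell symmetry: the gap is zero exactly when married men and married women have the same mean level. The sketch below checks that identity on the 1940 percentages from the post.

```python
import numpy as np

# 1940 percentages from the post: wives in rows, husbands in columns.
pct_1940 = np.array([
    [28.83, 4.37,  3.53, 0.62, 0.42],
    [ 7.23, 4.96,  4.67, 0.72, 0.22],
    [ 7.01, 5.80, 14.29, 3.92, 2.30],
    [ 0.74, 0.52,  1.97, 1.97, 2.05],
    [ 0.25, 0.12,  0.52, 0.64, 2.32],
])
p = pct_1940 / pct_1940.sum()

levels = np.arange(5)  # assumption: levels scored 0..4, equally spaced

wife_mean = (p.sum(axis=1) * levels).sum()     # mean level of wives
husband_mean = (p.sum(axis=0) * levels).sum()  # mean level of husbands

# gap[i, j] = husband level minus wife level for cell (wife i, husband j)
gap = levels[None, :] - levels[:, None]
mean_gap = (p * gap).sum()

print(round(mean_gap, 3), round(husband_mean - wife_mean, 3))
```

The two printed numbers agree for any table, so any net "marrying up" by one sex has to show up as a difference in the marginal distributions of the married population.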

Historically though most males never married (nor mated!), but it became the norm perhaps after the Industrial Revolution.

Slightly off topic.

The other night I watched John Stossel on TV. He is a libertarian and I’m not, but his show can be interesting. He had a segment on popular myths. One of those was what he called the myth of cousin marriage. He announced that modern science now realizes that it is OK to marry your first cousin.

Maybe I’m behind the times. When I think of consanguinity I remember the opera Don Carlos. The real Carlos was a deformed, psychopathic dwarf – yet another genetically defective Hapsburg.

Is Stossel right?

well, when you think about it, marrying your first cousin is part of a coherent life-style, also characterized by smoking a lot of dope and riding a motorcycle without a helmet.

I’ll take that as a no.

Against a background of very little consanguineous marriage, one instance of cousin marriage would be worse than average but not as catastrophic as the popular view might put it (after all, Darwin married his cousin as well). But if the norm against cousin marriage were to fade, we could wind up with the repeated cousin marriage and inbreeding depression found in a number of Muslim cultures.

This obvious distinction was well understood when the phrase “cousin marriage” signified “stupid white people in Appalachia”. As it comes to be associated more and more with people we’re not supposed to sneer at, we’re all going to need to start pretending that a cousin marriage here and there is exactly the same thing as marrying one’s niece for generation upon generation.

What do you make of this new study:

http://asr.sagepub.com/content/early/2014/05/29/0003122414536391.abstract