Into the Wardrobe — a C. S. Lewis website

by **davidcm** » January 1st, 2008, 4:52 pm

I know there is controversy over who is the actual author of The Dark Tower, whether Lewis wrote it, whether Walter Hooper wrote it, whether Hooper completed something that Lewis began, etc. I have opinions on that subject, on the allegations that have been made, but I will not get into that. My main question is as follows.

Back in the late 1990s or early 2000s, I think, when I was a member of the MereLewis list, it was being discussed that some kind of computerized examination was going to be done on The Dark Tower, comparing it to other writings by Lewis and Hooper and potentially others. It was said that the computer program would be able to examine the use of words, phrases, etc., and that it might be possible to determine the likelihood that Lewis was the author.

Is anyone here familiar with what I am talking about? Does anyone know if that study was ever performed and what the results were?

by **Tuke** » January 9th, 2008, 1:16 am

Have you read this article in Christianity Today? It discusses the study you mention and draws some fairly convincing conclusions.

"Shedding Light on The Dark Tower"
A C.S. Lewis mystery is solved.
http://www.christianitytoday.com/ct/200 ... 28.44.html

by **rusmeister** » January 10th, 2008, 6:08 am

by **Stanley Anderson** » January 10th, 2008, 2:06 pm

All I can say is that if you refuse to read it you are missing out on a great bit of story telling. I've often joked that I wish it were a forgery so that there was hope that the real author was still alive to be able to finish it.

I can respect your reasons, though I disagree with them -- and I think Lewis would disagree too, on the grounds that undue pride about one's opinion of one's own works and how they reflect on one's "greatness" is not to be encouraged. (Of course I recognize that if one thought a piece was unhealthy and unedifying for others to read, it would be wise to get rid of it. Interesting, though, that Lewis apparently didn't destroy the work but kept it around, if that really were the reason for not finishing it.)

--Stanley

by **Tuke** » January 10th, 2008, 8:33 pm

by **larry gilman** » March 26th, 2008, 4:07 pm

by **larry gilman** » March 26th, 2008, 4:24 pm

There have been two numerical analyses of The Dark Tower (TDT). The first -- and the only one whose details have been published -- was by Carla Faust Jones (“The Literary Detective Computer Analysis of Stylistic Differences Between ‘The Dark Tower’ and C. S. Lewis’ ‘Deep Space Trilogy’,” Mythlore, Spring 1989). Jones used a computer program to perform a widely-recognized, albeit crude, form of stylistic analysis of the first 16,336 characters of TDT and of comparable text samples from the openings of the three space-trilogy books. The method involved counting and comparing frequencies (numbers of appearances) of single letters and letter pairs (bigrams) in the text.
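The counting step described above can be sketched in a few lines of Python. This is only an illustration of letter and bigram frequency counting, not Jones's program: the function names are hypothetical, and the preprocessing (lowercasing, dropping everything except a-z, so that bigrams can span word boundaries) is an assumption, since her exact choices are not given here.

```python
from collections import Counter


def normalize(text):
    """Lowercase the text and keep only the letters a-z (an assumed preprocessing step)."""
    return "".join(ch for ch in text.lower() if "a" <= ch <= "z")


def relative_frequencies(text, n):
    """Return the relative frequency of every run of n consecutive letters.

    n=1 gives single-letter frequencies, n=2 gives bigram frequencies.
    """
    letters = normalize(text)
    grams = [letters[i:i + n] for i in range(len(letters) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}
```

Calling `relative_frequencies(sample, 1)` and `relative_frequencies(sample, 2)` on each text sample gives the two frequency tables that any distance measure would then compare.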

Jones found that the quantitative stylistic differences between TDT and the other three space-trilogy books were greater than those between the three trilogy books. In other words, by these standards or metrics applied to this sample of text, TDT was stylistically distinct. That was not a judgement but a numerical fact arising from a reproducible calculation.
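To make the comparison concrete: four text samples give six pairs, three that involve TDT and three within the trilogy. A small sketch of that bookkeeping follows; the short labels and the function name are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def split_pairs(labels, disputed):
    """Split all pairs of labels into those involving the disputed text and those that do not."""
    mixed, within = [], []
    for a, b in combinations(labels, 2):
        if disputed in (a, b):
            mixed.append((a, b))
        else:
            within.append((a, b))
    return mixed, within


# Hypothetical short labels for the four samples.
labels = ["TDT", "OSP", "Perelandra", "THS"]
mixed, within = split_pairs(labels, "TDT")
```

The claim under test is then that a distance computed for each pair in `mixed` tends to be larger than the distance for each pair in `within`.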

The other analysis, which was of much shorter fragments of TDT and the space trilogy books, was performed by A. Q. Morton using his controversial “cusum” (cumulative sum) technique. The Morton analysis is highly questionable. Not only have its details never been published, but the cusum technique has been tested and found wanting by the literary computing expert community: it is used only by a dissident handful of analysts and is widely considered useless. I have myself read Morton’s book and implemented his techniques by hand and by computer. They are full of opportunities for self-deceptive manipulation of the text and over-interpretation of the results, and I decided for myself, even before reading expert opinions in the journal Literary and Linguistic Computing, that the method was useless.

The Christianity Today article referenced above is a hatchet job on Lindskoog. It does not even mention the Jones computer analysis, which is the one to which Lindskoog attributed her first suspicions about The Dark Tower’s authenticity and which is the only analysis yet published using a recognized, objective technique (letter and bigram frequencies). While it is correct, in my opinion, to dismiss the Morton cusum analysis -- although Christianity Today justifies its dismissal only by vague mockery (“This type of style analysis has been used to prove that Shakespeare did not write his plays”) -- to dismiss the cusum results without even mentioning the Jones results, which are prominently discussed in Lindskoog’s books, is yellow journalism.

So, in short, thus far we have one serious, checkable analysis and one bogus analysis. It is regrettable that Lindskoog was taken in by the Morton analysis, which would have been better off undone, but the Jones analysis is a serious piece of work.

Software such as Jones used has long been available only to experts who write the computer code themselves. Her code is simply not publicly available and may no longer even exist. Searching for Jones on the internet turns up no contact information (perhaps a marital name-change). Recently, however, a prototype authorship-attribution program has been made freely available by literary-analysis expert Patrick Juola (see ), with funding from the National Science Foundation. Anybody who has the patience to prepare careful text samples, figure out how to run a Java program, and do the numbers, can now download this program and perform authorship-attribution studies (with limitations). I have done so, or begun to. The obvious place to start is with Jones’s results. Are they reproducible? Does she get it basically right or was it all a mirage from the get-go?

These are my results, below. The first diagram is a two-dimensional plot of Jones’s published 1989 numbers. The vertical axis is text-to-text distance (dissimilarity) for character bigrams (letter pairs). The horizontal axis is textual dissimilarity for single characters. Textual dissimilarities between TDT and each of the space-trilogy books (three pairs of figures) are plotted as green squares; dissimilarities between the space-trilogy books (also three pairs of figures) are plotted as red stars. The greater mutual similarity of the space-trilogy books manifests as clustering of the red stars apart from the green squares:

This is just a handy way of visualizing Jones’s old results.

Now here, below, are the results of two analyses I did recently using JGAAP on the same text samples that Jones used (which I obtained by scanning in the book pages and doing optical character recognition on the scans to produce text-file samples). The first shows textual dissimilarity measured by “histogram distance,” a metric calculated internally by the JGAAP program but generally similar to Jones’s distance measure:

The next shows dissimilarities calculated by JGAAP using an alternative statistical standard, Kolmogorov-Smirnov distance (K-S Distance):

As you can see, clustering is visible in both fresh analyses. So, Jones had it basically right. Indeed, by the K-S distance criterion, the clusters are even more distinct than in Jones’s analysis.
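For readers wondering what the two distance measures might look like in code, here is a rough sketch. It is not JGAAP's implementation, whose internals are not described in this thread. It assumes "histogram distance" means the Euclidean distance between two relative-frequency tables, and it uses the textbook Kolmogorov-Smirnov statistic (largest gap between cumulative distributions) with the letters or bigrams taken in alphabetical order, which is an arbitrary choice for categorical data. Both functions take dictionaries mapping a letter or bigram to its relative frequency.

```python
import math


def histogram_distance(p, q):
    """Euclidean distance between two relative-frequency tables (assumed definition)."""
    keys = set(p) | set(q)
    return math.sqrt(sum((p.get(k, 0.0) - q.get(k, 0.0)) ** 2 for k in keys))


def ks_distance(p, q):
    """Largest gap between the two cumulative distributions.

    Keys are accumulated in sorted (alphabetical) order, an assumption
    made only so that a cumulative sum is defined.
    """
    cum_p = cum_q = 0.0
    largest = 0.0
    for k in sorted(set(p) | set(q)):
        cum_p += p.get(k, 0.0)
        cum_q += q.get(k, 0.0)
        largest = max(largest, abs(cum_p - cum_q))
    return largest
```

Either function returns 0 for two identical frequency tables and grows as the tables diverge, which is all the plots rely on.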

What does this mean? It means that, as far as these textual samples and difference metrics go, we have at least one reproducible observation: TDT is stylistically distinct from the Space Trilogy. It does not prove that TDT was forged. As Jones herself was careful to point out, there are many possible explanations for such a difference -- though TDT authorship by someone other than Lewis is one of those possible explanations. I draw no conclusions at this point and don’t think anyone could.

It also means, incidentally, that Christianity Today’s sneering dismissal of the textual-analysis evidence can itself be dismissed. Such evidence does exist -- though not from Morton’s cusum method. Jones’s results are essentially reproducible, as I have shown -- using completely different software.

Clearly, more work begs to be done. Most pressingly, what about taking text samples of similar length from later in the Trilogy books and TDT? Does one find a consistent stylistic difference across samples? And what about other Lewis writings: does his style (as measured by reproducible, objective metrics) vary between fiction and nonfiction, in the same period as these other sample texts (late 30s, early 40s)? And what about analysis of the other texts that Lindskoog argued had doubtful provenance? (We can dismiss A. Q. Morton’s rarely-cited analysis of “Christian Reunion,” which claimed that “a large section” of that essay “is not by C. S. Lewis and is by some person who cannot be distinguished [stylistically] from Hooper.” His method is simply no good and no conclusion can be drawn here until better analyses are done.)

I have the tools to answer these questions and am keenly interested in doing the work, but it is time-consuming, especially the preparation of clean text samples and the recording and plotting of test results. Just writing up this precis has taken too much of my morning. . . Anyway, I hope this answers the original question. Stay tuned to the Wardrobe for my next round of results, when they are ready.

Sincerely,

Larry Gilman

by **Stanley Anderson** » March 26th, 2008, 4:57 pm

by **larry gilman** » March 26th, 2008, 7:15 pm

Stanley,

Pure carelessness on my part in the quotation thing! Apologies! I have gone back and changed the attribution.

As for interpretation of the plots: email me direct (lnpgilman[[at-sign]]wildblue.net) and I will send you a PDF of the Faust article, if you wish it. She includes a painstaking description of how the metrics are calculated. That would inform your reading of the x and y axes in my plot of her results.

As for the JGAAP results, the axes are not directly comparable to those for Jones. In fact, that is one of their merits. This is completely different software looking at the same basic features of the text (single letters and double-letter pairs) but not deriving exactly the same measures for them -- yet the basic results are the same. As for exactly how they are calculated, that is unimportant: what is important is that anybody can check these results or produce similar ones for other text samples. These are reproducible results that reproduce Jones’s results.

Your clustering vs. distance-from-origin question is relevant. The absolute distance from the origin, along each axis, is smaller for more similar texts. What makes the plots suggestive is that the trilogy-to-trilogy distances (i.e., the distances of the stars from the origin, not the distances between them) are almost always smaller than the TDT-to-trilogy distances (distances of the squares from the origin). Points plotted farther from the origin correspond to more-dissimilar texts.

This results in a sort of clustering: if a group of texts is consistently more similar, it will be plotted closer to the origin. A group with larger metrics on both axes will be farther out from the origin along the x=y line (more or less). Two groupings, or what I've called "clusters," will result, one bunch closer to the origin and the other farther away -- as we see in these plots.

If we were calculating distances between texts by an author all of whose writings had precisely the same statistical properties, all points would land on the origin, at 0 intertextual distance. This of course cannot happen with real texts. What signifies is that the TDT-trilogy distances are farther from the origin than the trilogy-trilogy distances. Probably better not to talk about "clusters" at all.
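The "farther from the origin" reading can be checked mechanically. Below is a minimal sketch that treats each plotted point as an (x, y) pair of distances, single-letter on x and bigram on y, and reports how often a trilogy-to-trilogy point lies closer to the origin than a TDT-to-trilogy point. The function names are hypothetical, and the numbers in the example are invented placeholders, not values from Jones's article or from the JGAAP runs.

```python
import math


def radial_distance(point):
    """Straight-line distance of an (x, y) point from the origin."""
    x, y = point
    return math.hypot(x, y)


def fraction_closer(within, mixed):
    """Fraction of (within, mixed) point pairs in which the within point is nearer the origin."""
    comparisons = [
        radial_distance(w) < radial_distance(m)
        for w in within
        for m in mixed
    ]
    return sum(comparisons) / len(comparisons)


# Invented placeholder points, for illustration only.
trilogy_points = [(1.0, 1.0), (2.0, 1.0)]
tdt_points = [(3.0, 3.0), (4.0, 2.0)]
share = fraction_closer(trilogy_points, tdt_points)
```

A value of 1.0 means every trilogy-to-trilogy point is nearer the origin than every TDT-to-trilogy point; "almost always smaller" would correspond to a value a little below 1.0.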

The scatter within each group, which you mention, tells us nothing in particular about the value of the method because the sample is too small and what is really of interest is the method's decision-making reliability in actual authorship-attribution tasks. The methods lately refined by Juola (and many others in the literary-computing field) seem to perform quite well by this standard: more details on this question can be found at http://www.mathcs.duq.edu/~juola/papers.d/casta.pdf and other articles available through the JGAAP website (see link in my previous posting). Also in the peer-reviewed journal Literary and Linguistic Computing.

JGAAP is a prototype amateur tool, suitable for conceptual studies, not for publishable research. But it is unique in that it does allow us to take a first crack at certain questions without writing our own verified code from the ground up (a work of years). And anyone can check anyone else's results, now -- we can all download the same software. That's pretty cool.

Remember, I am not claiming that my follow-up on Jones proves anything except that she did not make up her results or produce them by some idiosyncratic computational error. Without further testing I, like you, cannot draw any conclusions about authorship here. Maybe not even with further testing. But certainly not without it. And that will take time . . .

So, by the way, any thoughts on Hooper’s fantastically botched ream-out of The Screwtape Letters? (I’ve just repaired that posting so that the links work and the quotations are all present.) At http://cslewis.drzeus.net/forums/viewtopic.php?t=8734 .

Sincerely,

Larry

by **larry gilman** » March 27th, 2008, 8:39 pm

I've put up a PDF of Carla Faust Jones's article on her 1989 computerized comparison of the first 16,000 or so characters of The Dark Tower, Out of the Silent Planet, Perelandra, and That Hideous Strength:

http://www.larrygilman.net/misc_documen ... rticle.pdf

If the copyright holder complains about infringement, I'll take this link down at once. But I doubt they will: the periodical no longer exists.

Regards,

Larry

by **Stanley Anderson** » March 30th, 2008, 2:42 am

Author of The Dark Tower

Can't imagine?

Numerical analysis of The Dark Tower etc.

Re: Can't imagine?

Ooops!!!

Carla Faust Jones Article

Re: Ooops!!!