A link to a paper at First Monday was recently posted to the Distributed Proofreaders discussion forums as a bit of "bad press" for Project Gutenberg (and, therefore, for DP as a closely related project). The paper's author, Paul Duguid, addresses the quality of "peer-production" projects such as PG, Wikipedia and Gracenote (which is actually a commercial effort that uses user-submitted information for its CD database), using what he calls the "laws of quality." These laws are as follows.
- Linus's Law: “given enough eyeballs, all bugs are shallow.” (attributed to Linus Torvalds)
- Graham's Law: “The method of ensuring quality” in peer production is “Darwinian ... People just produce whatever they want; the good stuff spreads, and the bad gets ignored” (attributed to Paul Graham).
This paper interested me because I have long been a contributor to PG, mostly through Distributed Proofreaders but also individually. As such, I feel I have to respond to a couple of the critiques Mr. Duguid raises.
In looking at the quality of PG texts, Mr. Duguid outlines the problems of two etexts specifically: The Life and Opinions of Tristram Shandy, Gentleman by Laurence Sterne, and Pan by Knut Hamsun.
According to Mr. Duguid, the problems of Tristram Shandy are numerous. Because the novel is "very much a book about making a book," the various typographical devices used in the physical book create many problems for the electronic version. Mr. Duguid cites many errors, including missing edition information, missing Greek text, unidentified inline footnotes, and the absence of (significant) blank pages from within the text. These are all valid complaints.
But while Mr. Duguid points out these valid faults, he also points out items that are not faults. About the PG edition of Tristram Shandy he says, "The novel was originally published as nine volumes with two– to three–dozen chapters numbered sequentially within each volume. The Project Gutenberg version appears to come from a four–volume edition that ignored the original volume divisions and chapter numbering." He goes on to complain about possible confusion, or at least the loss of certain jokes intended by the author, due to this renumbering. But this can hardly be counted as a PG deficiency. Although we can't say for sure what the source edition was like (since the etext gives no edition information), neither can we say for sure that the etext does not faithfully represent it. The same confusions and joke-mangling may well be present in the edition used for transcription.
Of course, this brings up one of my own criticisms of PG, namely that for a long time it did not publish, and even actively discouraged adding, edition information to its texts. Fortunately, that sentiment has changed.
Oddly enough, whereas Mr. Duguid faults PG for not faithfully representing the print editions of Tristram Shandy, he seems to make the exact opposite argument with regard to Pan. Namely, he doesn't like the fact that the 1921 edition used as the source for the PG etext was allegedly "bowdlerized" (a loaded term if I ever saw one!) by the printer, according to the most recent Penguin edition of the same book. This might be true, but again, it is not a fault. In fact, nothing prevents an "unbowdlerized" edition of Pan from being submitted to PG, so long as its copyright can be cleared.
However, Mr. Duguid sees the lack of a note at the beginning of the PG etext identifying its bowdlerized state as a negative. He claims that readers are "cheated" by the absence of such a note. In what way? In this case the edition is clearly identified, and readers who want to can look up historical information about it. It is neither PG's responsibility nor its goal to warn its readers that edition X has 10% fewer sex scenes than edition Y. It merely provides the books, as does any modern library.
In sum, I think Mr. Duguid does raise some good points about the quality of PG texts. None of them is necessarily new (many have been raised on various PG mailing lists and discussion boards for years), though his particular examples may not have come up before. However, I disagree with his assessment of all of his examples as "faults." The goal of PG has never been to provide authoritative or critical editions. Overall, the quality of PG texts is very good, close to that of commercial publishers, and in many cases (especially among more recently posted texts) it may be better, since books are now produced using better tools and processes, such as those provided by Distributed Proofreaders. Still, Mr. Duguid's point is certainly valid that PG volunteers need to keep assessing the quality of both currently produced and previously produced books.
Oh, and by the way, if you ever happen to spot an error in a book, you can always submit a fix request to errata@pglaf.org.