Distributed Proofreaders
For the last couple of weeks I've been contributing a little to Distributed Proofreaders (DP). Part of Project Gutenberg, DP:
provides a web-based method to ease the conversion of Public Domain books into e-books. By dividing the workload into individual pages, many volunteers can work on a book at the same time, which significantly speeds up the creation process.
For new contributors like me, this is what it looks like:

which is admittedly not the prettiest UI in the world (the whole site screams "2000s PHP"), but it gets the job done: you get a page image along with the text that OCR extracted from it, and your job is to fix that text according to some proofreading guidelines. You don't need to read the whole thing to get started, and in fact I haven't yet; for most texts you only need to know a small subset of those rules.
That same page, after proofreading, looks like this:

where I removed the header and end-of-line hyphenations, replaced "Senor" with "Señor", and checked in general that the OCR text matched the image.
My fixes might've missed something, but that's fine: this is just the first phase of the book's digitization (P1, to use DP's terminology). The whole project then goes through two more rounds of proofreading (P2 and P3), two rounds of formatting (F1 and F2), and two rounds of post-processing (PP and PPV).[1]
You can't just participate in any round. After signing up and completing some "mandatory training", you can only participate in P1. To get access to the P2 and F1 rounds, you need to proofread 300 P1 pages, plus some other requirements that are relatively easy. To get access to P3, you need to complete 150 P2 pages and 50 F1 pages. You get the idea.
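For the programmers in the audience, here is a minimal sketch of those access rules in Python. It only models the page counts quoted above; the "other requirements" are ignored, and the names (`PageCounts`, `unlocked_rounds`) are made up for illustration, not anything from DP's actual code. It also assumes that reaching a threshold exactly is enough.

```python
from dataclasses import dataclass

# Page thresholds quoted in the post. DP's other requirements are not modeled.
P1_PAGES_FOR_P2_F1 = 300
P2_PAGES_FOR_P3 = 150
F1_PAGES_FOR_P3 = 50


@dataclass
class PageCounts:
    """Pages a volunteer has completed in each round (hypothetical structure)."""
    p1: int = 0
    p2: int = 0
    f1: int = 0


def unlocked_rounds(counts: PageCounts, trained: bool = True) -> list[str]:
    """Return the rounds open to a volunteer, judged by page counts alone."""
    if not trained:
        # Nothing is available before the mandatory training is done.
        return []
    rounds = ["P1"]
    if counts.p1 >= P1_PAGES_FOR_P2_F1:
        rounds += ["P2", "F1"]
        # P3 needs work in both P2 and F1, which are only reachable from here.
        if counts.p2 >= P2_PAGES_FOR_P3 and counts.f1 >= F1_PAGES_FOR_P3:
            rounds.append("P3")
    return rounds
```

For example, `unlocked_rounds(PageCounts(p1=120))` gives just `["P1"]`, which is roughly where I am right now.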
This is a lot of procedure, but it seems to work reasonably well: at the time of writing this, DP has finished 49,365 books, which is more than 60% of the number of books in Project Gutenberg. (Both Gutenberg and DP have detailed statistics of how many books they have completed, so it would be nice and easy to build a stacked bar chart showing how many books Gutenberg published each year, and how many of those came from DP.)
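The data preparation for such a chart could look something like the sketch below. I haven't actually pulled the statistics, so the yearly numbers here are placeholders invented purely to show the shape of the computation, and the function names are hypothetical. It assumes you already have two mappings from year to book count, one for all of Gutenberg and one for DP.

```python
def split_by_source(total_per_year: dict[int, int],
                    dp_per_year: dict[int, int]) -> list[tuple[int, int, int]]:
    """Split each year's total into (year, books from DP, books from elsewhere)."""
    rows = []
    for year in sorted(total_per_year):
        total = total_per_year[year]
        from_dp = dp_per_year.get(year, 0)
        if from_dp > total:
            raise ValueError(f"DP count exceeds total for {year}")
        rows.append((year, from_dp, total - from_dp))
    return rows


def render_rows(rows: list[tuple[int, int, int]]) -> str:
    """Draw a crude text version of the stacked bars: '#' is DP, '.' is the rest."""
    return "\n".join(
        f"{year} " + "#" * from_dp + "." * other
        for year, from_dp, other in rows
    )


# Placeholder numbers, NOT real statistics.
rows = split_by_source({2001: 10, 2002: 20}, {2001: 4, 2002: 15})
print(render_rows(rows))
```

With real data, the same rows could be fed to any plotting library that supports stacked bars.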
I personally find contributing to DP both fun and rewarding. As someone who loves books, public goods, free access to information, and projects that only the internet can enable, this hits all the right notes.
Another nice aspect of DP is that you can easily get feedback from experienced proofreaders. Some projects are marked as "beginners-only" and, if you proofread some of their pages, you'll get a message from a mentor giving you some tips. I got some useful feedback from a kind mentor, and later learned that they have proofread more than 12,000 P1 pages (!!). This kind of thing reminds you of the insane number of unsung heroes there are in the world.
Further reading:
- A great blog post by the aforementioned mentor that illustrates the fun challenges of proofreading old books.
[1] There's also an optional "Smooth Reading" round between the two post-processing rounds, where people can read the almost-finished book for pleasure and report any issues they find. I don't know how many people do this, or how effective it is.