Google digital humanities $$
Google Digital Humanities Research Awards
Google has so far digitized over 12 million books in over 300 languages – a significant fraction of all books ever published. This collection, much of which was previously available only in university libraries, has helped many disciplines in the humanities. Because of this vast increase in digitized information, new avenues of literary research are now possible.
We also know more could be done to facilitate this research. Sometimes humanities research consists of amassing and curating a private data set, and writing or customizing tools specifically for that data set. While that might be the quickest way to answer a particular research question, it does little to help other researchers with similar questions. We want to make it easy for people to share not just results, but the tools and intermediate data upon which future research can build.
Toward these ends, Google is creating a collaborative research program to explore the digital humanities using the Google Books corpus. Disciplines of interest include (but are not limited to):
- Linguistics
- History
- Classics
- Literature
- Philosophy
- Sociology
- Archaeology
- Anthropology
Some example projects to give you an idea of what we’re thinking about:
- Building software for tracking changes in language over time
- Building software for tagging and identifying concepts, structure, or entities in text (possibly tailored to a specific domain or language)
- Creating utilities to discover books and passages of interest to a particular discipline, with support for annotations and collaborative research
- Developing systems for crowdsourced corrections to book data (e.g., OCR text) and metadata
- Generating marked-up, freely usable datasets (e.g., part-of-speech tagging for little-known languages)
- Testing a literary or historical hypothesis through innovative analysis of a book corpus
- Analyzing the generative or creative processes revealed in texts
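To make the first example project a little more concrete, here is a minimal sketch of tracking a word's relative frequency by publication year. It is only an illustration, not anything Google has described: the function names (`tokenize`, `relative_frequency_by_year`) and the three-sentence toy corpus are invented for this example, and a real project would work from far larger data with more careful tokenization.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def relative_frequency_by_year(documents, term):
    """Return {year: occurrences of term / total tokens in that year}.

    `documents` is an iterable of (year, text) pairs (a hypothetical
    input format chosen for this sketch).
    """
    term = term.lower()
    term_counts = Counter()
    total_counts = Counter()
    for year, text in documents:
        tokens = tokenize(text)
        total_counts[year] += len(tokens)
        term_counts[year] += tokens.count(term)
    # Skip years with no tokens to avoid dividing by zero.
    return {
        year: term_counts[year] / total_counts[year]
        for year in sorted(total_counts)
        if total_counts[year] > 0
    }


# Toy corpus invented for illustration only.
toy_corpus = [
    (1850, "Thou art welcome and thou art kind"),
    (1850, "You are welcome here"),
    (1950, "You are welcome and you are kind"),
]

trend = relative_frequency_by_year(toy_corpus, "thou")
for year, freq in trend.items():
    print(year, round(freq, 3))
```

Normalizing by the total number of tokens per year matters because the amount of digitized text varies a great deal from one period to another, so raw counts alone would mostly reflect corpus size.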
These are one-time awards for up to US $50,000. Google may choose to renew the award for another year following review of the research at the conclusion of the first year. Where appropriate, we expect award recipients to make their software, utilities, datasets, or similar results freely available to others to use.
We are requesting proposals in this area from select researchers and faculty members, and we would be delighted with your participation. We expect to make several awards under this program, and welcome proposals that include investigators from multiple organizations. Proposals that share resources or funding with other efforts are also welcome.
Google may offer help in some instances by providing relevant subsets of the Google Books corpus (subject to copyright and metadata licensing) or by hosting data for researchers. For instance, we anticipate being able to provide frequency lists of words categorized by language, publication date, country, and subject; and a limited number of scans and plain text from books in the public domain. If your research requires a specific data set, feel free to contact Jon Orwant (orwant@google.com) about availability.
*Interesting that two of the examples pertain to language documentation. Linguists, are you looking?