
Language Engineering

As the world's best-known global dictionary brand and the publisher of the world-famous Oxford English Dictionary, Oxford University Press is a repository of language expertise. Its teams of editors, in-house and around the world, are constantly engaged in revising and updating the vast databanks which underpin every title or service Oxford publishes. The preparation of lexical reference works, of all types and for all ages, has been a central part of the Press’s activities for more than 100 years. Millions of pounds are invested annually to produce and maintain the ranges of dictionaries, thesauruses, subject reference works, quotation dictionaries, style guides and language learning material in print, online, offline and electronic form. Oxford University Press, which is a department of the University of Oxford, has taken forward its language reference business to make its unique, high-quality language resources available to commercial organizations in the fields of language engineering and natural language processing (NLP).

Oxford is a key data provider to the language technology industry, and its content is currently being used in applications in the following areas, amongst others:

  • Search engine technology
  • Information and knowledge management
  • Machine translation
  • Cross-lingual information retrieval (CLIR)
  • Speech-to-text applications
  • Speech recognition systems


Oxford’s resources include:

  • A wide range of linguistic assets spanning monolingual dictionaries and thesauruses, bilingual dictionaries, technical reference works, natural language databases, audio recordings and more.
  • Resources available in SGML or XML with full metadata.
  • The Oxford Dictionary of English contains 350,000 words, phrases and definitions, 52,000 scientific and technical words and senses and 12,000 encyclopedic entries. We hold an enhanced non-print version with inflectional, frequency, usage and collocational information, as well as information such as lexical set markers, domain labels, multi-word expressions, sense indicators, pronunciation information, audio files and superordinates with mappings to WordNet.
  • Developments currently under way include word sense frequency lists for senses, compounds and phrases; corpus-derived collocation lists including frequency information; word sense disambiguation algorithms for special sets of words; and linked dictionary/thesaurus resources.
  • Monolingual assets cover both British and American English, in collaboration with our American English Dictionary Centre in Connecticut, and we have access to extensive and up-to-date holdings on World English.
  • From over 20 offices worldwide, Oxford publishes bilingual dictionaries in over 40 languages. Oxford also has partnerships with leading dictionary publishers in Europe and the Far East, and is able to source a wide range of materials through such partnerships.
  • In addition to dictionary and subject reference materials, Oxford hosts the British National Corpus and has its own corpus of comparable size, as well as holdings in major foreign languages. A constant and unique worldwide language research programme monitors new words and changes in English.
  • An experienced team of editors, linguists and text processing specialists to tailor material to a project or service by adapting our existing texts or creating customized data.
  • Over 100 specialist subject reference works covering topics as diverse as Physics, Plant Sciences and Medicine.

Distribution Partnership

Oxford University Press has entered into an agreement with the European Language resources Distribution Agency (ELDA) to distribute some of our resources for academic research. If you are an ELRA/ELDA member, please contact ELDA for more details.

If you would like more details and samples of our data, please contact us.