By Kamil Wiśniewski Aug 19th, 2007
Corpus linguistics is not a separate branch of science, but rather a term that denotes a set of methodologies and approaches to the analysis of language. A corpus is a collection of spoken or written texts in a given language (less often in two languages), nowadays usually consisting of more than a million words. Different types of corpora make it possible to analyze various kinds of discourse in order to find quantitative evidence for the existence of patterns in language or to test theories.
At first, corpus studies focused on single words, their frequency and occurrence, yet with the development of technology and more precise search tools the possibilities increased dramatically. It is now possible to search for only those instances of a word that belong to a particular word class, or for entire patterns such as preposition + noun, determiner + noun, or a word followed by a specific word class. Such searches make it easy, for example, for dictionary publishers to find collocations.
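A pattern search of this kind can be illustrated with a minimal Python sketch. It assumes the corpus has already been tagged for word class and is stored as (word, tag) pairs; the sample sentence, the tag names (DET, NOUN, PREP, VERB) and the function name find_pattern are all invented for illustration and do not come from any particular corpus tool.

```python
from typing import List, Sequence, Tuple

Token = Tuple[str, str]  # (word, word-class tag)

# Tiny hand-tagged sample; the sentence and tag labels are hypothetical.
TAGGED: List[Token] = [
    ("the", "DET"), ("cat", "NOUN"), ("sat", "VERB"),
    ("on", "PREP"), ("mats", "NOUN"), ("near", "PREP"),
    ("a", "DET"), ("door", "NOUN"),
]


def find_pattern(tokens: Sequence[Token], pattern: Sequence[str]) -> List[Tuple[str, ...]]:
    """Return every run of adjacent words whose tags match `pattern` in order."""
    n = len(pattern)
    hits = []
    # Slide a window of len(pattern) tokens across the corpus.
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        if all(tag == wanted for (_, tag), wanted in zip(window, pattern)):
            hits.append(tuple(word for word, _ in window))
    return hits


if __name__ == "__main__":
    print(find_pattern(TAGGED, ["DET", "NOUN"]))   # determiner + noun
    print(find_pattern(TAGGED, ["PREP", "NOUN"]))  # preposition + noun
```

Real corpus query tools work on far larger data and offer richer query languages, but the underlying idea of matching a sequence of word classes against adjacent tokens is the same.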
Corpus linguistics is also applied in translation studies, where the use of corpora of two languages made it apparent that words and their supposed equivalents may differ in use or in their collocates. Moreover, because some aspects of grammar are strongly connected to lexis, linguists can show differences in the use of certain grammatical structures in translations, even when similar structures are available in both the source and target languages. In the case of English, differences between its British and American varieties can also be easily analyzed thanks to corpora.
The development of corpora has also made it possible to analyze historical change in the meanings of words and in grammar. Although the number of old texts available in electronic form is much smaller than the number of contemporary texts, the work is doable. For example, changes in the use of the passive voice have been traced, and it turns out that from the 19th century onward the passive voice in English came to be used more and more often.
When both written and spoken corpora became available, linguists started analyzing them to check whether there are any systematic differences between speech and writing. It appears that, apart from some quite obvious features such as false starts and hesitations, which occur in speech but not in writing, deictic expressions are also used far more frequently in oral discourse. Spoken language is probably vaguer because speakers can rely on extralinguistic signals. Additionally, certain grammatical features found in speech might be considered ungrammatical in writing.
Unlike other scholars, linguists who follow the corpus linguistics methodology attempt to describe naturally occurring language, supporting their views with large amounts of evidence found in corpora. Moreover, statistical operations are often involved in work on corpora, especially when the frequency of use of some linguistic feature is measured. Large databases of naturally occurring language have helped to advance the study of phraseology, especially once it was discovered that certain meanings of words correlate with the grammatical structures in which they are used.
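One of the simplest statistical operations of this kind is converting a raw count into a relative frequency, so that figures from corpora of different sizes can be compared. The Python sketch below shows one common convention, occurrences per million words; the function names per_million and relative_frequencies are illustrative assumptions, and the lowercasing step is a simplification that a real study might not want.

```python
from collections import Counter
from typing import Dict, Iterable


def per_million(count: int, corpus_size: int) -> float:
    """Scale a raw count to occurrences per million words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return count * 1_000_000 / corpus_size


def relative_frequencies(tokens: Iterable[str]) -> Dict[str, float]:
    """Map each word form (lowercased) to its frequency per million words."""
    words = [t.lower() for t in tokens]
    counts = Counter(words)
    return {word: per_million(c, len(words)) for word, c in counts.items()}


if __name__ == "__main__":
    # A word seen 50 times in a 2-million-word corpus and 30 times in a
    # 1-million-word corpus: the raw counts mislead, the normalized ones do not.
    print(per_million(50, 2_000_000))
    print(per_million(30, 1_000_000))
```

Normalizing in this way matters, for instance, when a small historical corpus is compared with a much larger contemporary one.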
Corpus linguistics has found application in many fields, such as critical discourse analysis, stylistics and forensic linguistics, as well as translation and language teaching. In translation it is helpful because parallel corpora enable a better choice of equivalents and grammatical structures that reflect the desired meaning. Additionally, studying corpora revealed that translators do not translate individual words in texts, but larger units: clauses or sentences. Corpus studies have probably had an even bigger influence on language teaching. First, they have influenced the way dictionaries are made. Second, learners' language has been studied to improve teachers' knowledge of it. Third, learners are nowadays encouraged to make use of corpora on their own in order to increase their language awareness. Moreover, the results of studying information gathered from corpora have influenced the design and content of language workbooks.
Brown K. (Editor) 2005. Encyclopedia of Language and Linguistics – 2nd Edition. Oxford: Elsevier.