Computational linguistics (CL) examines human language. It uses formal models that can be implemented on a computer. In this way, it gains knowledge about the phonetic, syntactic and semantic structure of languages and about the way humans understand, produce and learn language.

Aims of CL

Computational linguistics is used to create computer programs for natural language processing, natural language generation and translation.
Forget about learning foreign languages

If you are a language freak, you want to learn as many languages as possible, because you like communicating with people from all over the world. You don’t need that in CL. Most of the examples you will study will be in English and/or the language of instruction. Sometimes you will see sentences in exotic languages, but you needn’t understand them. Computational linguists use exotic languages to show properties that are unusual in well-known languages like English. In a course on grammar theory, I once had an example from Warlpiri, a Pama-Nyungan language spoken in northern Australia:

"The two small children are chasing the dog."

You will always get those literal translations annotated with grammatical functions; no need to actually learn Warlpiri…

Fall in love with grammar

What do you love in a language? Its sound, the beauty of the writing system, proverbs, metaphors, poetry? That is mostly irrelevant to CL. You will deal with language on the following levels:
If you want to know how CL deals with those levels, please read the following section from the linguistic point of view.

CL for Linguists

A practical approach with computers

Computational linguistics combines several linguistic disciplines in a practical way. CL is not idle talk or philosophical discussion. Its aim is to make computers process or generate language, so you should know a little about computer programming. Computers are stupid. They can only perform a task if it is explained in sufficient detail (either by you or by the programmer). If you ask people how to say ‘apple’ in Finnish, they will use either their own knowledge or a dictionary to tell you the answer. A computer would have to execute the following program:

    if English-Finnish dictionary database is available then
        return dictionary_entry(database, English-Finnish, "apple")
    else
        check which dictionaries are on the bookshelf
        if book_found(bookshelf, English-Finnish dictionary) then
            search_entry(that book, "apple")
            if entry found then
                return read_entry(that book, found entry)
            else
                return error_message("Word not found in dictionary")
            end if
        else
            return error_message("Dictionary not found on bookshelf")
        end if
    end if

And this is only a simplified sketch; real computer programs are much more complicated!

The language of mathematics

Computational linguists use strict formalisms, like mathematicians and computer scientists. The example below shows a formalization of the sentence ‘I want to put a book on the table’. For computational linguistics, it is important to express everything related to a language (phonetics, phonology, morphology, syntax, semantics, pragmatics) in such a formal way. No philosophical reasoning about metaphors, just plain logical thinking. Why? Because CL is about teaching language to computers. And everything computers understand is formal logic.
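Returning to the dictionary lookup in the pseudocode earlier in this section, here is a minimal runnable sketch in Python. It is only one possible reading of that pseudocode: the names `translate`, `DictionaryError`, `DATABASES` and `BOOKSHELF` are hypothetical, and both the "database" and the "bookshelf" are modelled as plain in-memory dictionaries for illustration. The sketch also reports a missing word when the database is available but lacks the entry, a case the pseudocode leaves open.

```python
class DictionaryError(LookupError):
    """Raised when a word or a whole dictionary cannot be found."""


# Hypothetical stand-ins: electronic databases and printed books,
# both keyed by a (source language, target language) pair.
DATABASES = {("English", "Finnish"): {"apple": "omena"}}
BOOKSHELF = {("English", "Finnish"): {"apple": "omena", "book": "kirja"}}


def translate(word, pair, databases, bookshelf):
    """Look up `word` for the language `pair`, database first, then books."""
    # Step 1: prefer an electronic dictionary database if one is available.
    database = databases.get(pair)
    if database is not None:
        if word in database:
            return database[word]
        # Assumption: the pseudocode does not cover this case.
        raise DictionaryError("Word not found in dictionary")

    # Step 2: otherwise check which dictionaries are on the bookshelf.
    book = bookshelf.get(pair)
    if book is None:
        raise DictionaryError("Dictionary not found on bookshelf")
    if word not in book:
        raise DictionaryError("Word not found in dictionary")
    return book[word]


if __name__ == "__main__":
    print(translate("apple", ("English", "Finnish"), DATABASES, BOOKSHELF))
    try:
        translate("apple", ("English", "Warlpiri"), DATABASES, BOOKSHELF)
    except DictionaryError as error:
        print(error)
```

Even this tiny version has to spell out every failure case explicitly, which is the point of the original example: nothing can be left to common sense.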
CL for Computer Programmers

As a programmer and computer freak, you don’t have much of a problem with the points mentioned in the two previous chapters. But wait a moment, CL still stands for computational linguistics…

CL is about human languages

There are huge differences between programming languages and real human languages. How would you react if someone told you
Of course you use programming languages, or at least stuff like PROLOG or LISP. But that’s not what you’ll implement. Parsing natural languages is much more difficult:
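To give a feel for the kind of knowledge a natural language parser has to encode, here is a toy Python sketch of two rules for English: an auxiliary followed by a noun phrase and a verb signals a question, and a present-tense form of ‘to have’ plus a past participle signals the present perfect. Everything in it is an assumption made for illustration: the tag names (`AUX`, `PRON`, `DET`, `NOUN`, `VERB`, `VERB_PP`), the function names, the three noun phrase patterns, and the sample sentence "Have you seen the dog?", which is a hypothetical stand-in and not taken from the original article. The words are tagged by hand; a real parser would have to work the tags out itself.

```python
from typing import Dict, List, Optional, Tuple

Token = Tuple[str, str]  # (word, hand-assigned part-of-speech tag)

# Hypothetical, deliberately tiny inventory of noun phrase shapes.
NP_PATTERNS = (("DET", "NOUN"), ("PRON",), ("NOUN",))


def match_noun_phrase(tokens: List[Token], start: int) -> int:
    """Return the length of a noun phrase beginning at `start`, or 0."""
    tags = tuple(tag for _, tag in tokens[start:start + 2])
    # The two-word pattern is listed first so "the dog" is matched whole.
    for pattern in NP_PATTERNS:
        if tags[:len(pattern)] == pattern:
            return len(pattern)
    return 0


def analyse(tokens: List[Token]) -> Dict[str, Optional[object]]:
    """Detect Aux-NP-V word order and, if present, the present perfect."""
    result: Dict[str, Optional[object]] = {
        "question": False, "subject": None, "tense": None,
    }
    if not tokens or tokens[0][1] != "AUX":
        return result

    np_length = match_noun_phrase(tokens, 1)
    verb_index = 1 + np_length
    if np_length == 0 or verb_index >= len(tokens):
        return result

    _, verb_tag = tokens[verb_index]
    if verb_tag not in ("VERB", "VERB_PP"):
        return result

    # Aux-NP-V order: a question whose subject is the noun phrase.
    result["question"] = True
    result["subject"] = " ".join(word for word, _ in tokens[1:verb_index])

    # Present tense of "to have" + past participle: present perfect.
    if tokens[0][0].lower() in ("have", "has") and verb_tag == "VERB_PP":
        result["tense"] = "present perfect"
    return result


if __name__ == "__main__":
    sentence = [("Have", "AUX"), ("you", "PRON"), ("seen", "VERB_PP"),
                ("the", "DET"), ("dog", "NOUN"), ("?", "PUNCT")]
    print(analyse(sentence))
```

The sketch covers a single word order and a single tense. Each further construction of English would need rules of its own, and the ambiguities of real sentences are not handled at all.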
Now your parser should know that the order Aux-NP-V indicates a question in English (with that noun phrase as its subject), and that the present tense of the auxiliary ‘to have’ together with the past participle of a verb puts that verb into the present perfect.

CL is a branch of linguistics

A program always has a purpose, even if it’s fun writing it. When you write natural language processing or generation programs, you need to know how natural languages work. Do you know what an adjunct is? Ever heard of equi and raising phenomena? Are you able to recognize the finite verb in a long sentence? You should be. In my CL studies, I was confronted with tons of words like that, and all that was really explained by the teachers was unification, which is not much more than a simple intersection of two sets.

The main problem of CL is the complexity of natural language, especially ambiguities. A word can have several meanings (the bank of a river vs. the Bank of England), and so can a complete sentence ("John follows a gangster in a sports car" – Who drives the sports car?). But even if there is no ambiguity, there are difficulties. The sentence ‘I am cold’ can be a statement about temperature, an indication of a disease or a request to close the window.

CL for You

Are you interested in expressing natural language in a formal way? Interested in programming software that can process and generate natural language? Why don’t you start computational linguistics? :-) CL methods can be used in various applications such as car navigation systems, spell checkers for word processors or better search engines on the WWW. CL is a rather new science that still has a lot to discover. Although computers are getting faster and faster, we’re still far from universal translator devices like those in Star Trek.
Though we can do better than Altavista’s Babelfish, as the example of Verbmobil shows:

To learn more, check out the Association for Computational Linguistics and The Linguist List, or ask UniLangers who have experience with CL. :-)

By the way, I was told the best universities to study computational linguistics are the following three:
Saaropean
Originally published in Babel Babble