Natural Language Processing FAQ

Last-Modified: Fri Feb  2 14:18:48 EST 2001
Posting-Frequency: Monthly
Version: 0.1
Archive-Name: natural-lang-processing-faq

This is the latest release of an FAQ (frequently asked questions and
answers) list for the newsgroup. Please don't
hesitate to send me any comments, be they positive or negative. There
are many blank spots in the FAQ; please help fill them.

Copyright (c) 1994-2001, Dragomir R. Radev. All rights reserved.

Permission to distribute this FAQ by all volatile electronic means
(mailing lists, FTP, WWW, Usenet news, etc.) is hereby given under
the restriction that the file is not modified and all disclaimers and
acknowledgements remain intact.
This permission does NOT apply to CD-ROMS and/or commercial printed
publications. All requests for republication in this case should
be referred to the FAQ maintainer.

Many people have contributed to this FAQ. A list of credits is shown at the
end of the message.


[1] What is this FAQ all about
[2] What is Computational Linguistics
[3] What is
[4] How to get updates to this FAQ
[5] World-Wide Web resources.
[6] Which schools offer graduate programs in CL/NLP
[7] How to apply to graduate school in CL/NLP in the USA
[8] Organizations that are partly related to CL/NLP
[9] Major non-academic research laboratories
[10] What major publications exist in the field
[11] Bibliographies
[12] Electronic mailing lists
[13] Newsgroups
[14] Professional Organizations, Associations
[15] Major Conferences
[16] Evaluation Competitions
[17] How to join a mailing list
[18] How to obtain files by anonymous ftp
[19] FTP repositories
[20] What are some important books in NLP
[21] Encyclopedia of Artificial Intelligence
[22] Machine Translation
[23] What are the major accomplishments of the field
[24] Publishers
[25] Credits

Disclaimers and Notes

 1. Please read this FAQ list before posting to the newsgroup.
 2. The FAQ is a collection of materials, rather than a complete reference.
    Some of the information may be out of date, so please be careful and
    take everything with a grain of salt. The maintainer, Dragomir R. Radev,
    doesn't assume any responsibility for wrong
    information. The list of contributors to the FAQ appears at the end of
    this document.
 3. Any comments, contributions, and corrections are more than welcome.  
    Please help make the FAQ really helpful and interesting. 

[1] What is this FAQ all about

This is an attempt to put together a list of frequently (and not so
frequently) asked questions about Natural Language Processing and their
answers. This document is in no way perfect or complete or 100% accurate.
In no way should the maintainer be held responsible for damage resulting
directly or indirectly from using information in this FAQ.

The FAQ originated from Mark Kantrowitz's FAQ on AI. Some questions in
the present document come directly from Mark's original FAQ.

This FAQ is maintained by Dragomir R. Radev of the University of
Michigan. Please send me all your comments, suggestions, corrections,
additions, and such by e-mail.

[2] What is Computational Linguistics

Computational linguistics (CL) is a discipline between linguistics and 
computer science which is concerned with the computational aspects of the 
human language faculty. It belongs to the cognitive sciences and overlaps 
with the field of artificial intelligence (AI), a branch of computer
science that aims at computational models of human cognition.
There are two components of CL: applied and theoretical.

The applied component of CL is more interested in the practical
outcome of modelling human language use. The goal is to create
software products that have some knowledge of human language.  Such
products are urgently needed for improving human-machine interaction
since the main obstacle in the interaction between human and computer
is one of communication. Today's computers do not understand our
language, and humans have difficulty understanding the computer's
language, which does not correspond to the structure of human thought.

Natural language interfaces enable the user to communicate with the 
computer in German, English or another human language.  Some applications 
of such interfaces are database queries, information retrieval from texts 
and so-called expert systems.  Current advances in recognition of spoken 
language improve the usability of many types of natural language systems.  
Communication with computers using spoken language will have a lasting
impact upon the work environment, opening up completely new areas of
application for information technology.

Although existing CL programs are far from achieving human ability, they 
have numerous possible applications. Even if the language the machine 
understands and its domain of discourse are very restricted, the use of 
human language can increase the acceptance of software and the productivity 
of its users.

Much older than communication problems between human beings and machines 
are those between people with different mother tongues.  One of the 
original goals of applied computational linguistics was fully automatic 
translation between human languages.  From bitter experience scientists 
have realized that they are far from achieving this.  Nevertheless,
computational linguists have created software systems which can simplify 
the work of human translators and clearly improve their productivity.  

The future of applied computational linguistics will be determined by the 
growing need for user-friendly software.  Even though the successful 
simulation of human language competence is not to be expected in the near 
future, computational linguists have numerous immediate research goals 
involving the design, realization and maintenance of systems which 
facilitate everyday work, such as grammar checkers for word processing.

Theoretical CL deals with formal theories about the linguistic
knowledge that a human needs for generating and understanding
language. Today these theories have
reached a degree of complexity that can only be managed by employing
computers.  Computational linguists develop formal models simulating
aspects of the human language faculty and implement them as computer
programmes. These programmes constitute the basis for the evaluation
and further development of the theories. In addition to linguistic
theories, findings from cognitive psychology play a major role in
simulating linguistic competence.  Within psychology, it is mainly the
area of psycholinguistics that examines the cognitive processes
constituting human language use.

The special attraction of computational linguistics lies in the combination 
of methods and strategies from the humanities, natural and behavioural 
sciences, and engineering.  

SEE ALSO:

* Chapter 1 of Christopher D. Manning and Hinrich Schütze, 1999,
  Foundations of Statistical Natural Language Processing, MIT Press,
  Cambridge, MA.
* Chapter 1 of Daniel Jurafsky and James H. Martin, 2000, Speech and 
  Language Processing, Prentice Hall, Upper Saddle River, New Jersey.

[3] What is

Here follows the original charter for the newsgroup:

Moderation:   This group will be unmoderated.

Purpose:      To discuss issues relating to natural language, especially
              computer-related issues from an AI viewpoint. The topics
              that will be discussed in this group will concentrate on, but
              are not limited to, the following:

                   *   Natural Language Understanding
                   *   Natural Language Generation
                   *   Machine Translation
                   *   Dialogue and Discourse Systems
                   *   Natural Language Interfaces
                   *   Parsing
                   *   Computational Linguistics
                   *   Computer-Aided Language Learning

              This group will avoid discussing issues that are more properly
              covered by other newsgroups. For example, speech synthesis
              should be discussed in comp.speech. However, due to the
              interdisciplinary nature of the field, there may be overlap in
              material between other groups. To try to keep this to a 
              minimum, topics should pertain to computer-related aspects
              of natural language.

Rules of Decorum:  Because of the unmoderated format, anyone with access to
                   this newsgroup will be able to post without review.
                   This is meant to encourage discussion of the topics.
                   Please refrain from "flames" or unnecessary criticism
                   of a person's viewpoints or personality in a harsh
                   or insulting manner. Criticisms should be constructive
                   and polite whenever possible.

[4] How to get updates to this FAQ

This FAQ is currently available from several newsgroups, including
comp.answers and news.answers. It is posted once a month, although
updates are made less often.

The official archive of the above newsgroups is at MIT, and a copy of the
FAQ can be obtained there. The FAQ is also carried by other large FAQ
archive sites and can be retrieved over the Web.

[5] World-Wide Web resources.


5.1.  The Association for Computational Linguistics site:

      The Association for Computational Linguistics is the major
      international organization in the field.

5.2.  The ACL NLP/CL Universe:

      The largest index of Computational Linguistics and Natural Language
      Processing resources on the Web. It features a search engine
      which should allow you to find specific NLP-related Web pages.

5.3.  The Computation and Language E-Print Archive

      The Computation and Language E-Print Archive is a fully automated
      electronic archive and distribution server for papers on 
      computational linguistics, natural-language processing, 
      speech processing, and related fields. 

5.4.  The Survey of the State of the Art of Human Language Technology

      This book surveys the state of the art of human language
      technology. The goal of the survey is to provide an interested reader
      with an overview of the field---the main areas of work, the
      capabilities and limitations of current technology, and the technical
      challenges that must be overcome to realize the vision of graceful
      human computer interaction using natural communication skills. 

5.5.  The Linguistic Data Consortium

      The Linguistic Data Consortium is an open consortium of universities,
      companies and government research laboratories. It creates, collects
      and distributes speech and text databases, lexicons, and other
      resources for research and development purposes. The University of
      Pennsylvania is the LDC's host institution. 

5.6.  The Language Technology Helpdesk

      Frequently asked questions of the Human Communication Research
      Centre at the University of Edinburgh.


5.7.  Head-Driven Phrase Structure Grammar

      The HPSG site offers current information relating to various aspects
      of the grammar formalism and linguistic theory of Head-Driven
      Phrase Structure Grammar, a constraint-based, lexicalist
      approach to grammatical theory that seeks to model human
      languages as systems of constraints on typed feature structures.

5.8.  Lexical Functional Grammar

      This site provides access to information about various aspects
      of the grammatical theory known as Lexical Functional Grammar.

5.9.  Word Grammar

      This site houses publications on Word Grammar and has some
      information on the group and its meetings.

[6] Which schools offer graduate programs in CL/NLP

This list is, *of course*, completely preliminary. Please send me 
information about other programs. I will try and get in touch with the
editors of the ACL guide to Graduate Programs in CL for more information.
Universities are given in alphabetical order. If a certain university
is not included now and you feel it must be included, please send me
some information about it.

Australia

Melbourne, University of
Microsoft Institute of Advanced Software Technology in association with
        Macquarie University

Canada

Montreal, University of
Ottawa, University of
Simon Fraser University
Toronto, University of
Waterloo, University of

Finland

Helsinki, University of

France

Paris 7, Jussieu, University of
Provence, University of

Germany

Bonn, University of
Heidelberg, University of
Humboldt University, Berlin
Koblenz-Landau, University of
Munich, University of
Osnabrueck, University of
Saarland, University of the
Potsdam, University of
Stuttgart, University of
Tuebingen, University of

Italy

Pisa, University of
Trento, University of

Japan

Kyoto University

Korea

Pohang University of Science and Technology, Pohang

Netherlands

Amsterdam, University of
Groningen, University of
Nijmegen, University of
Tilburg, University of
Utrecht, University of

Sweden

Goteborg (Gothenburg), University of
Skoevde, University of
Uppsala, University of

Switzerland

Geneva, University of
Zurich, University of

United Kingdom

Brighton, University of
Cambridge, University of
Durham, University of
Essex, University of
Edinburgh, University of
Sheffield, University of
Sussex, University of
University of Manchester Institute of Science and Technology

United States

Brown University
Buffalo, SUNY at
California at Berkeley, University of
California at Los Angeles, University of
Carnegie-Mellon University
Columbia University
Cornell University
Delaware, University of
Duke University
Georgetown University
Georgia, University of
Georgia Institute of Technology
Harvard University
Indiana University
Information Sciences Institute (ISI) at the University of Southern California
Johns Hopkins University
Massachusetts at Amherst, University of
Massachusetts Institute of Technology
Michigan, University of
New Mexico State University
New York University
Ohio State University
Pennsylvania, University of
Rochester, University of
Southern California, University of
Stanford University
SUNY, Buffalo
Utah, University of
Wisconsin - Milwaukee, University of
Yale University

[7] How to apply to graduate school in CL/NLP in the USA

Usually, the best timetable is as follows (where M is the month
when your studies would start, usually September):

        M - 24 : Try to clarify your interests: is it really NLP
                 that you are interested in? What possible
                 subfields might be of interest to you? ...etc.
                 Remember: 5 years working in an area you are
                           not interested in will be very painful.
        M - 18 : Read publications in the area of your interest
                 in order to discover the best places for
                 you. Pay close attention to the specific fields of
                 research: which professors are most active in those
                 fields, and which institutions. 
                 Remember: Unless you are familiar with the most
                           current research, you will not be able
                           to find the best place for you.
        M - 18 : Go to your local library and consult some of the
                 available directories (see [3-3]) - write down
                 as much information as you can about some
                 15-25 universities. These universities form your
                 preliminary list.
                 Remember: There are some 100 universities in the
                           USA offering NLP/CL programs. Some of them
                           will be more attractive to you than others.
        M - 18 : Talk to your advisers at school, talk to other
                 students, post questions on the Internet, visit
                 departmental Web sites.
                 This way you will get advice on a few more
                 universities that you might have skipped until this moment.
                 Remember: Others have faced what you are going
                           through. Use their experience.
        M - 15 : Send letters to the universities that you have
                 on your preliminary list. Make sure you indicate
                 when you want to start, what degree (MA, MS,
                 Ph.D.) you are interested in, whether or not
                 you will be applying for financial aid, whether
                 you will need some special visa...
                 Remember: Ask for all the information that you
                           need; give them all the information they'd
                           need to satisfy your request.
        M - 12 : Read carefully the information that you have 
                 received from the universities. Shorten your list
                 of places to the number that you will eventually
                 apply to (usually 5-8 is a good number). 
                 Remember: Make sure you include both your best choice 
                           schools and some places where you are almost
                           certain of getting accepted.
        M - 10 : Fill in all the forms that are sent to you, 
                 ask your professors to send reference letters to 
                 the schools directly.
                 Remember: Professors will probably be very busy.
                           Give them the reference forms
                           as early as possible and make sure you 
                           specify a reasonable time for them to fill
                           them in and send them out.
        M - 10 : (or earlier) - take the necessary tests (GRE,
                 TOEFL, or others) that the schools want. Make sure
                 you tell the testing service which universities
                 you want them to send your scores to.
                 Remember: Time yourself through several practice
                           tests. The GRE General test, for example,
                           is more about mastery of timing than knowledge.
        M -  9 : (approximately) - mail your forms to the schools,
                 preferably 2-3 weeks before the deadlines.
                 Remember: You don't want your applications to get there
                           at the same time as everyone else. Give the
                           admissions committee some extra time to
                           review your application.
        M -  6 : usually six months before the beginning of the semester
                 that you are applying for, you will get a letter 
                 saying whether you have been accepted.
                 Remember: Usually, thick letters, e-mails, and telegrams
                           mean acceptance. Thin one-sheet letters will
                           most likely be disappointing for you.
        M -  5 : now, you have been accepted to a few schools. Go back
                 to the same resources that you used when you were 
                 deciding where to apply (journals, catalogs,
                 directories, professors, etc.). Ask the schools that
                 accepted you to fly you in for a visit (many will do this).
                 Remember: Don't forget non-academic factors such as
                           location, financial aid, the atmosphere in
                           the department, etc.
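
The offsets in the timetable above are easy to turn into calendar months.
The following is only a rough sketch in Python: the start date (September
2002) is a made-up example, the milestone labels are shortened paraphrases
of the steps above, and the helper names are hypothetical.

```python
from datetime import date

# Offsets in months before the start month M, taken from the timetable above.
# The labels are abbreviated paraphrases, not the FAQ's exact wording.
MILESTONES = [
    (24, "clarify your interests"),
    (18, "read publications, consult directories, ask for advice"),
    (15, "write to the universities on your preliminary list"),
    (12, "shorten the list to the schools you will apply to"),
    (10, "fill in forms, arrange reference letters, take required tests"),
    (9, "mail your applications"),
    (6, "expect admission decisions"),
    (5, "compare offers and visit schools"),
]


def months_before(start: date, months: int) -> date:
    """Return the first day of the month `months` months before `start`."""
    index = start.year * 12 + (start.month - 1) - months
    return date(index // 12, index % 12 + 1, 1)


def timetable(start: date):
    """Pair each milestone label with the calendar month it falls in."""
    return [(months_before(start, m), label) for m, label in MILESTONES]


# Hypothetical example: studies starting in September 2002.
for when, label in timetable(date(2002, 9, 1)):
    print(when.strftime("%Y-%m"), "-", label)
```

Working in whole months avoids day-of-month problems (there is no need to
decide what "18 months before the 31st" means), which is all the precision
the timetable asks for.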

[8] Organizations that are partly related to CL/NLP

International Association for Machine Translation (IAMT) and its daughters
AMTA, EAMT, AAMT

ACM SIGIR (Special Interest Group on Information Retrieval)

[9] Major non-academic research laboratories

AT&T Labs - Research
BBN Systems and Technologies Corporation
DFKI (German research center for AI)
General Electric R&D
IRST, Italy
IBM T.J. Watson Research, NY
Lucent Technologies Bell Labs, Murray Hill, NJ
Microsoft Research, Redmond, WA
NEC Corporation
SRI International, Menlo Park, CA
SRI International, Cambridge, UK
Xerox, Palo Alto, CA
XRCE, Grenoble, France

[10] What major publications exist in the field


Computational Linguistics is the only publication devoted exclusively
to the design and analysis of natural language processing
systems. From this unique quarterly, university and industry
linguists, computational linguists, artificial intelligence (AI)
investigators, cognitive scientists, speech specialists, and
philosophers get information about computational aspects of research
on language, linguistics, and the psychology of language processing
and performance.

Published by The MIT Press for: The Association for Computational Linguistics. 



Dr B. K. Boguraev, IBM Thomas J. Watson Research Center, New York, USA
Professor Roberto Garigliano, University of Durham, UK
Dr John I. Tait, University of Sunderland, UK

Published: March, June, September and December. ISSN: 1351-3249.

Natural Language Engineering is an international journal designed
to meet the needs of professionals and researchers working in all
areas of computerised language processing, whether from the
perspective of theoretical or descriptive linguistics, lexicology,
computer science or engineering. Its principal aim is to bridge the
gap between traditional computational linguistics research and the
implementation of practical applications with potential real-world
use. As well as publishing research articles on a broad range of
topics – from text analysis, machine translation and speech
generation and synthesis to integrated systems and multi modal
interfaces – the journal also publishes book reviews. Its aim is
to provide the essential link between industry and the academic community.


Editors: Prof. S.J. Young & Dr. S.E. Levinson
Send manuscripts (worldwide apart from the Americas) to:
Prof. Steve Young, Cambridge University Engineering Dept.,
Trumpington Street, Cambridge, CB2 1PZ, England. 
Send manuscripts (from the Americas) to:
Dr. Steve Levinson, Head, Linguistics Research,
AT&T Bell Laboratories, 600 Mountain Ave., 
Murray Hill, New Jersey 07974. USA. 
US Subscription rates are $170, with a personal rate of $75.
CS&L is published 4 times per year.
The address for subscription orders is:
Harcourt Brace and Company Limited,
High Street, Foots Cray, 
Sidcup, Kent, DA14 SHP. England.

Published 4 times annually. ISSN 0922-6567.
Subscriptions: Institutions $141 plus $16 postage; Individuals $55
(members of ACL $46).
Kluwer Academic Publishers, PO Box 322, 3300 AH Dordrecht, The
Netherlands, or Kluwer Academic Publishers, PO Box 358, Accord
Station, Hingham, MA 02018-0358. 

Published quarterly, since 1981.
Media Dimensions, New York, NY, USA

Published quarterly. ISSN 0167-806X
Subscriptions: Individual $59,-/Dfl.156,-; Institutional $200,-/Dfl.383,-
including p&h. Kluwer Academic Publishers
USA: Order Dept, Box 358, Accord Station, Hingham, MA 02018-0358. Phone
(617) 871-6600; Fax (617) 871-6528.
Other: P.O.Box 322, 3300 AH Dordrecht, The Netherlands. Phone (31) 78
524400; Fax (31) 78 183273; Telex: kadc nl.

Editors: Coltheart, Davies, Guttenplan, Harris, Humphreys, Leslie,
Smith, Wilson.
4 times annually
Blackwell Publishers, Oxford, UK.

Editor: Peter Gardenfors

[11] Bibliographies


   For information on a fairly complete bibliography of computational
   linguistics and natural language processing work from the 1980s, send
   mail to with the subject HELP. 

   The CSLI linguistics bibliography contains 3,300 entries in
   bib/tib/refer format. The bibliography is heavily slanted towards
   phonetics and phonology but also includes a fair amount of
   computational morphology, syntax, semantics, and psycholinguistics.
   The bibliography can be used with James Alexander's tib
   bibliography system, which is available from several places.
   The bibliography itself is available by anonymous ftp.
   Contributions are welcome, but should be in tib format.
   For more information, contact Andras Kornai.


   Robert Dale's Natural Language Generation (NLG) bibliography is
   available by anonymous ftp.
   Note that it is formatted for A4 paper. Insert the line
      .94 .94 scale
   after the %! line to print on 8.5 x 11 paper. For further information,
   write to Robert Dale, University of Edinburgh, Centre for Cognitive
   Science, 2 Buccleuch Place, Edinburgh EH8 9LW, Scotland.
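
   The scaling fix for 8.5 x 11 paper can be applied by hand in any text
   editor. For those who prefer to script it, here is a minimal sketch in
   Python. The function name is hypothetical, and the sketch assumes the
   PostScript file has a header line beginning with %! as described above;
   it writes the factor as 0.94, which PostScript reads the same as .94.

```python
def scale_postscript(ps_text: str, factor: float = 0.94) -> str:
    """Insert '<factor> <factor> scale' right after the first '%!' line.

    This mirrors the manual edit described in the FAQ; it does not try to
    understand the rest of the PostScript program.
    """
    lines = ps_text.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line.startswith("%!"):
            # Make sure the header line is terminated before inserting.
            if not line.endswith("\n"):
                lines[i] = line + "\n"
            lines.insert(i + 1, f"{factor:g} {factor:g} scale\n")
            return "".join(lines)
    raise ValueError("no '%!' header line found; is this a PostScript file?")
```

   To use it, read the bibliography file into a string, pass it through
   scale_postscript, and write the result to a new file before printing.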

   Mark Kantrowitz's Natural Language Generation (NLG) bibliography is
   available by anonymous ftp.
   In addition to the tech report, the BibTeX file containing the
   bibliography is also available.  The bibliography contains more than
   1,200 entries. A searchable index to the bibliography is
   available on the Web.
   Additions and corrections should be sent to the bibliography's maintainer.

[12] Electronic mailing lists

(This section is out of date - should be fixed for next release.)

Information Retrieval
Natural Language and Knowledge Representation (moderated;
   gatewayed to a newsgroup)

Natural Language Generation

LFG (Lexical-Functional Grammar)

Statistics, Natural Language, and Computing

Colibri (weekly update on Conferences, Seminars, Jobs and Shareware in
   NLP and speech)
Dependency Grammar

Text Analysis and Natural Language Applications
Text Corpora

Speech production and perception

Eastern (European) Language Engineering list
Preprint archive mailing list

  For further information about (among other topics) submission of papers to
  the server, subscribing or canceling your subscription, requesting full
  text of any of the papers above, retrieving macro files for these papers, 
