CS8084 NATURAL LANGUAGE PROCESSING Syllabus 2017 Regulation



CS8084    NATURAL LANGUAGE PROCESSING    L T P C
                                         3 0 0 3


OBJECTIVES:

  • To learn the fundamentals of natural language processing
  • To understand the use of CFG and PCFG in NLP
  • To understand the role of semantics of sentences and pragmatics
  • To apply the NLP techniques to IR applications

UNIT I INTRODUCTION                                                   9

Origins and challenges of NLP – Language Modeling: Grammar-based LM, Statistical LM – Regular Expressions, Finite-State Automata – English Morphology, Transducers for lexicon and rules, Tokenization, Detecting and Correcting Spelling Errors, Minimum Edit Distance
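
As an illustration of the Minimum Edit Distance topic in this unit, the following is a minimal sketch of the standard dynamic-programming formulation. The function name and the `sub_cost` parameter are assumptions made for this example; insertions and deletions cost 1, and the substitution cost is left configurable because textbooks differ on whether it is 1 or 2.

```python
def min_edit_distance(source, target, sub_cost=1):
    """Minimum edit distance between two strings (insert/delete cost 1)."""
    n, m = len(source), len(target)
    # dist[i][j] = cheapest way to turn source[:i] into target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # delete every character of source[:i]
    for j in range(1, m + 1):
        dist[0][j] = j  # insert every character of target[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + 1,                             # deletion
                dist[i][j - 1] + 1,                             # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost), # substitution or match
            )
    return dist[n][m]


if __name__ == "__main__":
    print(min_edit_distance("intention", "execution"))
    print(min_edit_distance("intention", "execution", sub_cost=2))
```

A spelling corrector could rank candidate dictionary words by this distance from the misspelled token, although practical systems usually combine it with word frequencies.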

UNIT II WORD LEVEL ANALYSIS                                   9

Unsmoothed N-grams, Evaluating N-grams, Smoothing, Interpolation and Backoff – Word Classes, Part-of-Speech Tagging, Rule-based, Stochastic and Transformation-based tagging, Issues in PoS tagging – Hidden Markov and Maximum Entropy models.
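
The contrast between unsmoothed N-grams and smoothing can be sketched with a bigram model that uses add-one (Laplace) smoothing. This is only one of the smoothing methods the unit covers; the function names, the `<s>` and `</s>` boundary markers, and the toy corpus are assumptions made for the example.

```python
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def laplace_prob(prev, word, unigrams, bigrams):
    """P(word | prev) with add-one smoothing over the observed vocabulary."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


if __name__ == "__main__":
    corpus = ["I am Sam", "Sam I am", "I do not like green eggs and ham"]
    uni, bi = train_bigram(corpus)
    print(laplace_prob("<s>", "I", uni, bi))    # seen bigram
    print(laplace_prob("<s>", "ham", uni, bi))  # unseen bigram, still non-zero
```

Without the added count, the unseen bigram would receive probability zero, which is the problem that smoothing, interpolation and backoff are meant to address.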

UNIT III SYNTACTIC ANALYSIS                                      9

Context-Free Grammars, Grammar rules for English, Treebanks, Normal Forms for grammar – Dependency Grammar – Syntactic Parsing, Ambiguity, Dynamic Programming parsing – Shallow parsing – Probabilistic CFG, Probabilistic CYK, Probabilistic Lexicalized CFGs – Feature structures, Unification of feature structures.
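
Dynamic-programming parsing over a grammar in Chomsky Normal Form can be sketched with a small CYK recognizer. This is the non-probabilistic version (it only decides whether a sentence is derivable); the grammar encoding as two dictionaries, the function name, and the toy grammar are assumptions made for the example.

```python
def cyk_recognize(words, lexical, binary, start="S"):
    """Return True if `words` is derivable from `start` under a CNF grammar.

    lexical: maps a word to the set of nonterminals A with a rule A -> word
    binary:  maps a pair (B, C) to the set of nonterminals A with A -> B C
    """
    n = len(words)
    # table[i][j] holds the nonterminals that span words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        table[i][i + 1] = set(lexical.get(word, ()))
    for span in range(2, n + 1):          # width of the span
        for i in range(n - span + 1):     # left edge
            j = i + span                  # right edge
            for k in range(i + 1, j):     # split point
                for b in table[i][k]:
                    for c in table[k][j]:
                        table[i][j] |= binary.get((b, c), set())
    return start in table[0][n]


if __name__ == "__main__":
    lexical = {"she": {"NP"}, "eats": {"V"}, "fish": {"NP"}}
    binary = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}}
    print(cyk_recognize("she eats fish".split(), lexical, binary))
    print(cyk_recognize("eats she fish".split(), lexical, binary))
```

The probabilistic variant would store, for each cell and nonterminal, the best rule probability product instead of a plain set membership.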


UNIT IV SEMANTICS AND PRAGMATICS                                   10

Requirements for representation, First-Order Logic, Description Logics – Syntax-Driven Semantic analysis, Semantic attachments – Word Senses, Relations between Senses, Thematic Roles, selectional restrictions – Word Sense Disambiguation, WSD using Supervised, Dictionary & Thesaurus, Bootstrapping methods – Word Similarity using Thesaurus and Distributional methods.
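
Dictionary-based Word Sense Disambiguation can be sketched with a simplified Lesk procedure: choose the sense whose gloss shares the most words with the context. The glosses, sense labels, and stopword list below are hypothetical toy data written for this example and are not taken from WordNet or any real dictionary.

```python
def simplified_lesk(word_senses, context, stopwords=frozenset()):
    """Pick the sense whose gloss overlaps most with the context words.

    word_senses: maps a sense label to its gloss (a plain string)
    context:     the sentence containing the ambiguous word
    """
    context_words = {w.lower() for w in context.split()} - stopwords
    best_sense, best_overlap = None, -1
    for sense, gloss in word_senses.items():
        gloss_words = {w.lower() for w in gloss.split()} - stopwords
        overlap = len(gloss_words & context_words)
        if overlap > best_overlap:  # ties keep the earlier sense
            best_sense, best_overlap = sense, overlap
    return best_sense


if __name__ == "__main__":
    # Hypothetical glosses for two senses of "bank"
    senses = {
        "bank_financial": "a financial institution that accepts deposits and lends money",
        "bank_river": "sloping land beside a body of water such as a river",
    }
    stop = frozenset({"a", "the", "of", "and", "on", "that", "as", "such", "to"})
    print(simplified_lesk(senses, "he sat on the bank of the river and watched the water", stop))
```

Supervised and bootstrapping methods replace this overlap count with a classifier trained on sense-labelled or seed examples.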

UNIT V DISCOURSE ANALYSIS AND LEXICAL RESOURCES                                                                   8

Discourse segmentation, Coherence – Reference Phenomena, Anaphora Resolution using Hobbs and Centering Algorithm – Coreference Resolution – Resources: Porter Stemmer, Lemmatizer, Penn Treebank, Brill’s Tagger, WordNet, PropBank, FrameNet, Brown Corpus, British National Corpus (BNC).

                                                                                                      TOTAL: 45 PERIODS


OUTCOMES:

Upon completion of the course, the students will be able to:

  1. Tag a given text with basic language features
  2. Design an innovative application using NLP components
  3. Implement a rule-based system to tackle the morphology/syntax of a language
  4. Design a tag set to be used for statistical processing in real-time applications
  5. Compare and contrast the use of different statistical approaches for different types of NLP applications


TEXT BOOKS:

  1. Daniel Jurafsky, James H. Martin, “Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition”, Pearson Publication, 2014.
  2. Steven Bird, Ewan Klein and Edward Loper, “Natural Language Processing with Python”, First Edition, O’Reilly Media, 2009.


REFERENCES:

  1. Breck Baldwin, “Language Processing with Java and LingPipe Cookbook”, Atlantic Publisher, 2015.
  2. Richard M Reese, “Natural Language Processing with Java”, O’Reilly Media, 2015.
  3. Nitin Indurkhya and Fred J. Damerau, “Handbook of Natural Language Processing”, Second Edition, Chapman and Hall/CRC Press, 2010.
  4. Tanveer Siddiqui, U.S. Tiwary, “Natural Language Processing and Information Retrieval”, Oxford University Press, 2008.

