CS8084 NATURAL LANGUAGE PROCESSING Syllabus 2017 Regulation
CS8084 NATURAL LANGUAGE PROCESSING    L T P C: 3 0 0 3
OBJECTIVES:
- To learn the fundamentals of natural language processing
- To understand the use of CFG and PCFG in NLP
- To understand the role of semantics of sentences and pragmatics
- To apply the NLP techniques to IR applications
UNIT I INTRODUCTION    9
Origins and challenges of NLP – Language Modeling: Grammar-based LM, Statistical LM – Regular Expressions, Finite-State Automata – English Morphology, Transducers for lexicon and rules, Tokenization, Detecting and Correcting Spelling Errors, Minimum Edit Distance
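As an illustration of the Minimum Edit Distance topic in this unit, the following is a minimal sketch of the standard dynamic-programming computation. The function name `min_edit_distance` and the `sub_cost` parameter are hypothetical names chosen for this example. Insertions and deletions are assumed to cost 1; substitution costs 1 by default (Levenshtein distance), and some textbook presentations use a substitution cost of 2 instead.

```python
def min_edit_distance(source, target, sub_cost=1):
    """Return the minimum edit distance between two strings.

    Assumption: insertion and deletion each cost 1; substitution costs
    `sub_cost` (1 gives the usual Levenshtein distance).
    """
    n, m = len(source), len(target)
    # dist[i][j] = cost of turning source[:i] into target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # delete all i characters of the source prefix
    for j in range(1, m + 1):
        dist[0][j] = j  # insert all j characters of the target prefix
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + 1,                             # deletion
                dist[i][j - 1] + 1,                             # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost)  # match or substitution
            )
    return dist[n][m]


print(min_edit_distance("intention", "execution"))              # 5
print(min_edit_distance("intention", "execution", sub_cost=2))  # 8
```

A spelling corrector could rank candidate dictionary words by this distance from the misspelled input; the table itself can also be traced back to recover the alignment.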
UNIT II WORD LEVEL ANALYSIS    9
Unsmoothed N-grams, Evaluating N-grams, Smoothing, Interpolation and Backoff – Word Classes, Part-of-Speech Tagging, Rule-based, Stochastic and Transformation-based tagging, Issues in PoS tagging – Hidden Markov and Maximum Entropy models.
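To make the contrast between unsmoothed and smoothed N-grams concrete, here is a small sketch of a bigram model with optional add-one (Laplace) smoothing. The function names `train_bigram` and `bigram_prob`, the `<s>` and `</s>` boundary markers, and the toy corpus are assumptions made for this example. As a simplification, the vocabulary size counts every token type, including the boundary markers.

```python
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams, smoothing=True):
    """Estimate P(word | prev), with or without add-one smoothing."""
    if smoothing:
        vocab_size = len(unigrams)  # simplification: includes <s> and </s>
        return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
    if unigrams[prev] == 0:
        return 0.0  # unseen context: the unsmoothed estimate is undefined
    return bigrams[(prev, word)] / unigrams[prev]


corpus = ["I am Sam", "Sam I am", "I do not like green eggs and ham"]
unigrams, bigrams = train_bigram(corpus)
print(bigram_prob("I", "am", unigrams, bigrams, smoothing=False))   # 2/3
print(bigram_prob("I", "ham", unigrams, bigrams, smoothing=False))  # 0.0
print(bigram_prob("I", "ham", unigrams, bigrams))                   # 1/15
```

The unseen bigram ("I", "ham") gets zero probability without smoothing and a small non-zero probability with it, which is the motivation for the smoothing, interpolation and backoff topics listed above.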
UNIT III SYNTACTIC ANALYSIS    9
Context-Free Grammars, Grammar rules for English, Treebanks, Normal Forms for grammar – Dependency Grammar – Syntactic Parsing, Ambiguity, Dynamic Programming parsing – Shallow parsing – Probabilistic CFG, Probabilistic CYK, Probabilistic Lexicalized CFGs – Feature structures, Unification of feature structures.
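The dynamic programming parsing topic can be illustrated with a sketch of a CYK recognizer for a grammar in Chomsky Normal Form. This is the plain, non-probabilistic version; a probabilistic CYK parser would store the best probability per non-terminal in each cell instead of a set. The function name `cyk_recognize`, the dictionary encoding of the rules, and the toy grammar are all assumptions made for this example.

```python
def cyk_recognize(words, lexical, binary, start="S"):
    """Return True if `words` can be derived from `start`.

    Assumed grammar encoding (Chomsky Normal Form):
      lexical: word -> set of non-terminals A with a rule A -> word
      binary:  (B, C) -> set of non-terminals A with a rule A -> B C
    """
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds the non-terminals that span words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        table[i][i + 1] = set(lexical.get(word, ()))
    for span in range(2, n + 1):            # width of the span
        for i in range(0, n - span + 1):    # start of the span
            j = i + span                    # end of the span
            for k in range(i + 1, j):       # split point
                for left in table[i][k]:
                    for right in table[k][j]:
                        table[i][j] |= binary.get((left, right), set())
    return start in table[0][n]


# Toy grammar, hypothetical and for illustration only
lexical = {"the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "chased": {"V"}}
binary = {("Det", "N"): {"NP"}, ("V", "NP"): {"VP"}, ("NP", "VP"): {"S"}}

print(cyk_recognize("the dog chased the cat".split(), lexical, binary))  # True
print(cyk_recognize("dog the chased".split(), lexical, binary))          # False
```

The table is filled from short spans to long ones, so each cell reuses results for its sub-spans; this is what makes the algorithm run in time cubic in the sentence length.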
UNIT IV SEMANTICS AND PRAGMATICS    10
Requirements for representation, First-Order Logic, Description Logics – Syntax-Driven Semantic analysis, Semantic attachments – Word Senses, Relations between Senses, Thematic Roles, selectional restrictions – Word Sense Disambiguation, WSD using Supervised, Dictionary & Thesaurus, Bootstrapping methods – Word Similarity using Thesaurus and Distributional methods.
UNIT V DISCOURSE ANALYSIS AND LEXICAL RESOURCES    8
Discourse segmentation, Coherence – Reference Phenomena, Anaphora Resolution using Hobbs and Centering Algorithm – Coreference Resolution – Resources: Porter Stemmer, Lemmatizer, Penn Treebank, Brill’s Tagger, WordNet, PropBank, FrameNet, Brown Corpus, British National Corpus (BNC).
TOTAL: 45 PERIODS
OUTCOMES:
Upon completion of the course, the students will be able to:
- Tag a given text with basic language features
- Design an innovative application using NLP components
- Implement a rule-based system to tackle the morphology/syntax of a language
- Design a tag set to be used for statistical processing in real-time applications
- Compare and contrast the use of different statistical approaches for different types of NLP applications
TEXT BOOKS:
- Daniel Jurafsky and James H. Martin, "Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech", Pearson Publication, 2014.
- Steven Bird, Ewan Klein and Edward Loper, "Natural Language Processing with Python", First Edition, O'Reilly Media, 2009.
REFERENCES:
- Breck Baldwin, "Language Processing with Java and LingPipe Cookbook", Atlantic Publisher, 2015.
- Richard M Reese, "Natural Language Processing with Java", O'Reilly Media, 2015.
- Nitin Indurkhya and Fred J. Damerau, "Handbook of Natural Language Processing", Second Edition, Chapman and Hall/CRC Press, 2010.
- Tanveer Siddiqui and U.S. Tiwary, "Natural Language Processing and Information Retrieval", Oxford University Press, 2008.