Department of Computer Science Engineering, Sagar Institute of Research & Technology, Session "#$%"#$'
NLP Based Grammar Checker
A grammar checker is one of the basic Natural Language Processing (NLP) tools for any language. The NLP field is relatively new in India and a lot of tools have yet to be developed. One of these is a grammar checker.
Goals
To implement a text processing system which checks the grammar of input text and identifies the types of errors.
Description in detail
1. POS tagging
Before grammar checking can be performed on a text, the text needs to be run through a part-of-speech (POS) tagger and a parser. This enables the grammar checker to recognise the types of words within each sentence. The text is first run through a POS tagger, which generates a tag for each word in a sentence. The tag indicates the word's class. Next, the tagged text is run through a parser, which performs syntactic analysis on it, adding tags to parts of the sentence and marking the phrases within it and their syntactic roles.
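As a minimal illustration of the tagging step, the sketch below assigns a tag to each word by dictionary lookup. The lexicon, the Penn Treebank style tag names and the function names are assumptions made for this example; they are not the tagger used in the project, and a real tagger would resolve ambiguous words from context.

```python
# Toy lexicon-based POS tagger. The lexicon and the tag names are
# illustrative assumptions, not the tagger used in the project.
LEXICON = {
    "the": "DT",
    "students": "NNS",
    "are": "VBP",
    "playing": "VBG",
    "football": "NN",
    "in": "IN",
    "playground": "NN",
}


def tokenize(text):
    """Split on whitespace and strip sentence punctuation from each token."""
    tokens = (tok.strip(".,!?") for tok in text.split())
    return [tok for tok in tokens if tok]


def pos_tag(text, default_tag="NN"):
    """Return (word, tag) pairs; unknown words fall back to default_tag."""
    return [(word, LEXICON.get(word.lower(), default_tag)) for word in tokenize(text)]


tagged = pos_tag("The students are playing football in the playground.")
for word, tag in tagged:
    print(f"{word}/{tag}")
```

The output of this step, one tag per word, is the input that the parser works on.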
2. Making Chunk-Based Sentence Patterns
Making chunks is a process that parses the sentence into a chunk-based sentence structure. A chunk is a textual unit of adjacent POS tags which display the relations between their internal words. The input English sentence is put into chunk structure by using hand-written rules. The structure represents how these chunks fit together to form the constituents of the sentence.
Context-Free Grammar (CFG): CFGs constitute an important class of grammars, with a broad range of applications including programming languages, natural language processing, bioinformatics and so on. CFG rules present a single symbol on the left-hand side and are a sufficiently powerful formalism to describe most of the structure in natural language.
A context-free grammar G = (V, T, S, P) is given by
• A finite set V of variables or non-terminal symbols.
• A finite set T of symbols or terminal symbols. We assume that the sets V and T are disjoint.
• A start symbol S ∈ V.
• A finite set P ⊆ V × (V∪T)* of productions. A production (A, α), where A ∈ V and α ∈ (V∪T)*, is written A → α.
Shift-reduce parsing begins with the input sentence and combines words into higher-level chunks until the unit finally becomes a sentence.
Parsing chunks by using CFG
The syntactic chunk structure of a sentence is necessary to determine its grammatical correctness. In the proposed system, ten general chunk types are used to make the chunk structure, as shown in the table.
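The four-part definition of a grammar G = (V, T, S, P) can be written down directly as a small data structure. The sketch below is only an illustration: the class name, the field names, the chunk labels (NC, VC, PPC) and the toy productions are assumptions made for this example and are not the project's actual grammar.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    variables: frozenset    # V: non-terminal symbols
    terminals: frozenset    # T: terminal symbols (here, POS tags)
    start: str              # S: start symbol
    productions: tuple      # P: pairs (A, alpha), alpha a tuple of symbols

    def is_well_formed(self):
        """Check the conditions stated in the definition of G."""
        symbols = self.variables | self.terminals
        return (
            self.variables.isdisjoint(self.terminals)    # V and T are disjoint
            and self.start in self.variables             # S is in V
            and all(
                lhs in self.variables and all(sym in symbols for sym in rhs)
                for lhs, rhs in self.productions         # P is a subset of V x (V u T)*
            )
        )


# Hypothetical toy grammar over POS tags.
toy = CFG(
    variables=frozenset({"S", "NC", "VC", "PPC"}),
    terminals=frozenset({"DT", "NN", "NNS", "VBP", "VBG", "IN"}),
    start="S",
    productions=(
        ("S", ("NC", "VC", "NC", "PPC", "NC")),
        ("NC", ("DT", "NNS")),
        ("NC", ("DT", "NN")),
        ("NC", ("NN",)),
        ("VC", ("VBP", "VBG")),
        ("PPC", ("IN",)),
    ),
)
print(toy.is_well_formed())
```

Every production has a single non-terminal on its left-hand side, which is what makes the grammar context-free.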
The proposed grammar checker identifies the chunks using CFG-based bottom-up parsing, assembling POS tags into higher-level chunks until a complete sentence has been found. For example, the simple sentence "The students are playing football in the playground." is chunked as follows:
NC VC NC PPC NC END (chunk-based sentence pattern)
NC VC NC PPC NC
NC VC NC PPC
NC VC NC
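The bottom-up procedure described here can be sketched as a small shift-reduce loop over POS tags. This is a simplified illustration under stated assumptions: the chunk rules, the chunk labels (NC, VC, PPC), the list of accepted sentence patterns and the function names are hypothetical, sentence-final punctuation is ignored, and the reduction strategy is greedy (longest rule first), which is enough for this toy rule set but not for an ambiguous grammar.

```python
# Hypothetical hand-written chunk rules: (chunk label, sequence of POS tags).
CHUNK_RULES = [
    ("NC", ("DT", "NNS")),
    ("NC", ("DT", "NN")),
    ("NC", ("NN",)),
    ("VC", ("VBP", "VBG")),
    ("PPC", ("IN",)),
]

# Hypothetical chunk-based sentence patterns accepted as grammatical.
SENTENCE_PATTERNS = {
    ("NC", "VC", "NC"),
    ("NC", "VC", "NC", "PPC", "NC"),
}


def chunk(tags):
    """Shift tags onto a stack one at a time, reducing whenever the top
    of the stack matches the right-hand side of a chunk rule."""
    rules = sorted(CHUNK_RULES, key=lambda rule: len(rule[1]), reverse=True)
    stack = []
    for tag in tags:
        stack.append(tag)                      # shift
        reduced = True
        while reduced:
            reduced = False
            for label, rhs in rules:
                if tuple(stack[-len(rhs):]) == rhs:
                    stack[-len(rhs):] = [label]    # reduce
                    reduced = True
                    break
    return stack


def is_grammatical(tags):
    """A tag sequence is accepted if its chunks form a known pattern."""
    return tuple(chunk(tags)) in SENTENCE_PATTERNS


# POS tags for "The students are playing football in the playground"
tags = ["DT", "NNS", "VBP", "VBG", "NN", "IN", "DT", "NN"]
chunks = chunk(tags)
print(chunks, is_grammatical(tags))
```

A tag sequence whose chunks do not match any stored pattern, such as the tags for "The students playing football", is rejected, which is the point at which an error type could be reported.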
System Components
1. PoS Tagger
2. Chunk Based Grammar Checker
Applications
• Text Processing
• Machine Translation Systems
• Search Engine
• Spell-checker
• Grammar Checker
• Named Entity Identification
• Information Extraction
• Information Retrieval
• Text Classification and Clustering
• Question Answering Systems
• Custom Search Systems