Role of a Lexical Analyser
The lexical analyzer is the first phase of a compiler. A program or function which performs lexical analysis is called a lexical analyzer, lexer, or scanner. A lexer often exists as a single function which is called by a parser or another function.
Its main task is to read the input characters from the source program and produce as output a sequence of tokens that the parser uses for syntax analysis. Its duties include:

- grouping the input characters into lexemes and producing as output a sequence of tokens, the input for the syntactic analyzer
- interacting with the symbol table, e.g. inserting identifiers
- stripping out comments and whitespace (blanks, newlines, tabs, and other separators)
- correlating error messages generated by the compiler with the source program, keeping track of the number of newlines seen so that a line number can be associated with each error message
- expanding macros
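The whitespace- and comment-stripping duties above can be sketched as a small helper. This is a minimal illustration, not the analyzer designed in these notes; the function name and the C-style comment syntax it assumes are choices made for the example. Note how newlines are counted as they are skipped, so the lexer can report a line number with each error message.

```python
def skip_whitespace_and_comments(text, pos, line):
    """Advance `pos` past blanks, tabs, newlines, and C-style
    comments, counting newlines so errors can carry a line number."""
    while pos < len(text):
        ch = text[pos]
        if ch == '\n':
            line += 1
            pos += 1
        elif ch in ' \t':
            pos += 1
        elif text.startswith('//', pos):
            # line comment: skip to end of line
            while pos < len(text) and text[pos] != '\n':
                pos += 1
        elif text.startswith('/*', pos):
            # block comment: skip to the closing */ (assumed present)
            end = text.find('*/', pos + 2)
            line += text.count('\n', pos, end)
            pos = end + 2
        else:
            break  # a real token starts here
    return pos, line
```

A real scanner would also handle an unterminated block comment as a lexical error rather than assuming `*/` is present.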
Upon receiving a "get next token" command from the parser, the lexical analyzer reads input characters until it can identify the next token. It then returns to the parser a representation of the token it has found. The representation is an integer code if the token is a simple construct such as a parenthesis, comma, or colon. It is a pair consisting of an integer code and a pointer to a table if the token is a more complex element such as an identifier or constant; the integer code gives the token type, and the pointer points to the value of that token.
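The two token representations just described can be sketched as follows. This is an assumed layout for illustration (the names `TokType`, `make_token`, and the list-backed symbol table are not from the notes): simple tokens carry only a type code, while identifiers and constants carry a type code plus a pointer, modeled here as an index into a symbol table.

```python
from enum import Enum, auto

class TokType(Enum):
    LPAREN = auto()   # simple constructs: the code alone suffices
    COMMA = auto()
    COLON = auto()
    IDENT = auto()    # complex elements: need a symbol-table entry
    NUMBER = auto()

symbol_table = []     # grows as identifiers and constants are seen

def make_token(tok_type, lexeme=None):
    """Return (code, None) for a simple token, or
    (code, table index) for an identifier or constant."""
    if lexeme is None:
        return (tok_type, None)
    symbol_table.append(lexeme)
    return (tok_type, len(symbol_table) - 1)
```

A production compiler would deduplicate symbol-table entries and store attributes (type, scope) alongside the lexeme; the sketch keeps only the pairing of code and pointer.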
Sometimes, lexical analyzers are divided into a cascade of two phases, the first called "scanning" and the second "lexical analysis".
The scanner is responsible for simple tasks, while the lexical analyzer proper performs the more complex operations. The lexical analyzer we have designed takes its input from an input file. It reads one character at a time and continues until the end of the file is reached. It recognizes valid identifiers and keywords and specifies the token values of the keywords. It also identifies header files, #define statements, numbers, special characters, and various relational and logical operators; it ignores whitespace and comments. It prints its output to a separate file, specifying the line number.
Token

A token is a string of characters, categorized according to the rules as a symbol (e.g., IDENTIFIER, NUMBER, COMMA). The process of forming tokens from an input stream of characters is called tokenization, and the lexer categorizes them according to a symbol type. A token can look like anything that is useful for processing an input text stream or text file. A lexical analyzer generally does nothing with combinations of tokens, a task left for a parser. For example, a typical lexical analyzer recognizes parentheses as tokens but does nothing to ensure that each '(' is matched with a ')'. Consider this expression in the C programming language:

sum=3+2;
Tokenized in the following table:

Lexeme    Token type
sum       Identifier
=         Assignment operator
3         Number
+         Addition operator
2         Number
;         End of statement
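The table above can be reproduced by a toy tokenizer. This is a sketch, not a real C lexer: the token names match the table, but the regular expressions (`TOKEN_SPEC`) are assumptions covering only this one expression.

```python
import re

# Hypothetical token spec, just enough for sum=3+2;
# (order matters: identifiers are tried before numbers, etc.)
TOKEN_SPEC = [
    ("Identifier",          r"[A-Za-z_]\w*"),
    ("Number",              r"\d+"),
    ("Assignment operator", r"="),
    ("Addition operator",   r"\+"),
    ("End of statement",    r";"),
]

def tokenize(source):
    """Return (lexeme, token type) pairs, scanning left to right."""
    pattern = "|".join(f"(?P<g{i}>{rx})" for i, (_, rx) in enumerate(TOKEN_SPEC))
    tokens = []
    for m in re.finditer(pattern, source):
        name = TOKEN_SPEC[int(m.lastgroup[1:])][0]
        tokens.append((m.group(), name))
    return tokens
```

Running `tokenize("sum=3+2;")` yields exactly the lexeme/token pairs of the table, in order.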
Tokens are frequently defined by regular expressions, which are understood by a lexical analyzer generator such as lex. Tokenization is the process of demarcating and possibly classifying sections of a string of input characters. The resulting tokens are then passed on to some other form of processing. The process can be considered a sub-task of parsing input.
Take, for example: The quick brown fox jumps over the lazy dog
The string isn't implicitly segmented on spaces, as an English speaker would do. The raw input, the 43 characters, must be explicitly split into the 9 tokens with a given space delimiter (i.e., matching the string " " or the regular expression /\s{1}/). The tokens could be represented in XML:

<sentence>
  <word>The</word>
  <word>quick</word>
  <word>brown</word>
  <word>fox</word>
  <word>jumps</word>
  <word>over</word>
  <word>the</word>
  <word>lazy</word>
  <word>dog</word>
</sentence>
Or as an s-expression:

(sentence ((word The) (word quick) (word brown) (word fox) (word jumps) (word over) (word the) (word lazy) (word dog)))
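The whitespace-delimited tokenization above can be sketched directly. The function names are illustrative; the split uses the whitespace delimiter mentioned in the text, and the s-expression printer mirrors the representation shown above.

```python
import re

def tokenize_words(text):
    """Explicitly split the raw characters into word tokens
    on the whitespace delimiter (regular expression \\s+)."""
    return [("word", w) for w in re.split(r"\s+", text) if w]

def to_sexpr(tokens):
    """Render the token list in the s-expression form shown above."""
    inner = " ".join(f"({t} {v})" for t, v in tokens)
    return f"(sentence ({inner}))"
```

Applied to "The quick brown fox jumps over the lazy dog", `tokenize_words` produces the 9 word tokens, and `to_sexpr` reproduces the sentence s-expression.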
Examples of Tokens
Dealing With Errors

When the lexical analyzer is unable to proceed because no pattern matches the remaining input, it must recover. Panic-mode recovery deletes successive characters from the remaining input until a token is found. Other possible repair actions are:

- insert a missing character
- delete a character
- replace a character by another
- transpose two adjacent characters
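Panic-mode recovery, the simplest of these strategies, can be sketched as follows. The function name and the `can_start_token` predicate are assumptions for the example; the predicate stands in for "some pattern matches starting at this character".

```python
def panic_mode_recover(source, pos, can_start_token):
    """Delete successive characters from the remaining input
    until a position is reached where some token can start."""
    while pos < len(source) and not can_start_token(source[pos]):
        pos += 1  # discard one offending character
    return pos
```

For example, with a predicate accepting letters, digits, and underscores, recovery on the input "@@#x+1" discards the three illegal characters and resumes scanning at 'x'. The character-level repairs (insert, delete, replace, transpose) are harder, since the lexer must guess which single edit turns the remaining input into a legal token.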