Lecture 2
Lexical Analysis
Lecture
Notes: PDF
Outline
- Lexical analysis
- Lexemes and tokens
- Regular expressions
- Lexer generators
Workshop
Outline
- Lexemes of While language
- Manual implementation of a lexer
- Lexer generator
While Language
While is a simple programming language described in the book “Principles of Program Analysis”. It is used throughout the course to demonstrate implementations of various analysis techniques. The language specification is here.
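To give a feel for what a lexer for such a language does, here is a minimal regex-based sketch in Python. The token set (keywords, `:=` assignment, a few operators) is an assumption based on the usual textbook presentation of While, not the course specification, and the names `Token`, `TOKEN_SPEC`, and `tokenize` are hypothetical.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str   # token category, e.g. "ID" or "NUM"
    text: str   # the lexeme as it appears in the source
    pos: int    # character offset of the lexeme


# Assumed keyword set for a While-like language; the real specification may differ.
KEYWORDS = {"if", "then", "else", "while", "do", "skip", "true", "false", "not"}

# Alternatives are tried left to right, so longer operators come first.
TOKEN_SPEC = [
    ("WS", r"\s+"),
    ("NUM", r"[0-9]+"),
    ("ID", r"[A-Za-z][A-Za-z0-9]*"),
    ("ASSIGN", r":="),
    ("OP", r"<=|>=|[-+*/<>=]"),
    ("PUNCT", r"[;()]"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source: str) -> Iterator[Token]:
    """Yield tokens from source, skipping whitespace."""
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind, text = match.lastgroup, match.group()
        # Keywords share the identifier pattern and are reclassified afterwards.
        if kind == "ID" and text in KEYWORDS:
            kind = "KEYWORD"
        if kind != "WS":
            yield Token(kind, text, pos)
        pos = match.end()
```

For example, `list(tokenize("while x <= 10 do x := x + 1"))` produces a `KEYWORD` token for `while`, an `ID` token for `x`, an `OP` token for `<=`, and so on. Matching keywords with the identifier pattern and reclassifying them afterwards keeps the regular expression small, at the cost of a set lookup per identifier.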
Examples
Tasks
- Compile and run the lexer examples.
- Extend all the lexers to support hexadecimal and binary numbers.
- Extend the hand-made lexer to support _ in identifiers.
- Extend all the lexers to support comments: single-line (//) and multi-line (/*, */).
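As a starting point for the tasks above, the following Python sketch shows one possible set of regular expressions for the new lexemes. The literal syntax (a `0x` prefix for hexadecimal, `0b` for binary) is an assumption, since the tasks do not fix it, and the names `scan` and `literal_value` are hypothetical. It is not the course's reference solution.

```python
import re

# Assumed literal syntax: 0x/0X prefix for hexadecimal, 0b/0B for binary.
SPEC = [
    ("WS", r"\s+"),
    ("HEX", r"0[xX][0-9a-fA-F]+"),
    ("BIN", r"0[bB][01]+"),
    ("DEC", r"[0-9]+"),                   # must come after HEX and BIN
    ("LINE_COMMENT", r"//[^\n]*"),        # runs to the end of the line
    ("BLOCK_COMMENT", r"/\*.*?\*/"),      # non-greedy: stops at the first */
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"), # identifiers may contain _
]
# re.DOTALL lets "." in BLOCK_COMMENT match newlines.
PATTERN = re.compile("|".join(f"(?P<{n}>{p})" for n, p in SPEC), re.DOTALL)

SKIPPED = {"WS", "LINE_COMMENT", "BLOCK_COMMENT"}


def scan(source):
    """Return (kind, text) pairs, dropping whitespace and comments."""
    tokens, pos = [], 0
    while pos < len(source):
        match = PATTERN.match(source, pos)
        if match is None:
            # Also reached for an unterminated /* comment.
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup not in SKIPPED:
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def literal_value(kind, text):
    """Convert a numeric lexeme to its integer value."""
    if kind == "HEX":
        return int(text[2:], 16)
    if kind == "BIN":
        return int(text[2:], 2)
    return int(text, 10)
```

The order of alternatives matters: if `DEC` were tried before `HEX`, the input `0x1F` would be split into the number `0` followed by the identifier `x1F`. In a full lexer with a `/` operator, the comment patterns would likewise have to be tried before the operator.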
References
Theory
- Lexical analysis (Wikipedia)
- [DRAGON] Chapter 3: Lexical Analysis
- [PARR] Chapter 2: Basic Parsing Patterns
- [INTERP] Chapter 4: Scanning
- [ANTLR] Chapter 5: Designing Grammars, Section 5.5: Recognizing Common Lexical Structures
Tools
- Online regular expression parser regex101
- Parser (and lexer) generator ANTLR
- Lexer generator JFlex
- The Fast Lexical Analyzer Flex
- IntelliJ Platform Plugin SDK: Lexer and Parser Definition