CS 3361 | Fall 2020 | Assignment #3 Lexical Analyzer
Assignment #3
Lexical Analyzer
Develop the C or C++ source code required to solve the following problem.
Problem
Develop a lexical analyzer in C or C++ that can identify lexemes and tokens found in a source code file provided by the user. Once the analyzer has identified the lexemes of the language and matched them to a token group, the program should print each lexeme / token pair to the screen.
The source code file provided by the user will be written in a new programming language called “DanC” and is based upon the following grammar (in BNF):
P ::= S
S ::= V:=E | read(V) | write(V) | while C do S od | S;S
C ::= E < E | E > E | E = E | E <> E | E <= E | E >= E
E ::= T | E + T | E - T
T ::= F | T * F | T / F
F ::= (E) | N | V
V ::= a | b | … | y | z | aV | bV | … | yV | zV
N ::= 0 | 1 | … | 8 | 9 | 0N | 1N | … | 8N | 9N
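The V and N rules above say that a variable name is one or more lowercase letters and an integer is one or more digits. A minimal sketch of those two checks in C follows; the function names is_variable and is_integer are hypothetical and are not part of the assignment. Note that keywords such as read and while also satisfy the V rule, so a full analyzer would need to check for keywords before reporting a variable.

```c
#include <stdbool.h>

/* Hypothetical helper: V ::= a | ... | z | aV | ... | zV
   True when s is one or more lowercase letters. */
bool is_variable(const char *s)
{
    if (*s == '\0')
        return false;              /* the grammar has no empty variable */
    for (; *s != '\0'; s++) {
        if (*s < 'a' || *s > 'z')
            return false;
    }
    return true;
}

/* Hypothetical helper: N ::= 0 | ... | 9 | 0N | ... | 9N
   True when s is one or more decimal digits (leading zeros allowed). */
bool is_integer(const char *s)
{
    if (*s == '\0')
        return false;              /* the grammar has no empty integer */
    for (; *s != '\0'; s++) {
        if (*s < '0' || *s > '9')
            return false;
    }
    return true;
}
```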
Your analyzer should accept the source code file as a required command line argument and display an appropriate error message if the argument is not provided or the file does not exist. The command to run your application will look something like this:
Form: danc_analyzer <path_to_source_file>
Example: danc_analyzer test_file.danc
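One possible way to handle the required argument is sketched below. The helper name open_source_file and the wording of the error messages are assumptions made for illustration; the assignment only asks for an appropriate message.

```c
#include <stdio.h>

/* Hypothetical helper: checks the command line and opens the source file.
   Returns an open FILE* on success, or NULL after printing an error. */
FILE *open_source_file(int argc, char *argv[])
{
    FILE *fp;

    if (argc < 2) {
        /* The required argument was not provided. */
        fprintf(stderr, "Usage: danc_analyzer <path_to_source_file>\n");
        return NULL;
    }

    fp = fopen(argv[1], "r");
    if (fp == NULL) {
        /* The file does not exist or cannot be read. */
        fprintf(stderr, "Error: cannot open file '%s'\n", argv[1]);
    }
    return fp;
}
```

A main function could call open_source_file(argc, argv), return a nonzero exit status when the result is NULL, and call fclose on the file once analysis is finished.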
Lexeme formation is guided using the BNF rules / grammar above. Your application should output each lexeme and its associated token. Invalid lexemes should output UNKNOWN as their token group. The following token names should be used to identify each valid lexeme:
Lexeme   Token         Lexeme   Token         Lexeme            Token
:=       ASSIGN_OP     +        ADD_OP        do                KEY_DO
<        LESSER_OP     -        SUB_OP        od                KEY_OD
>        GREATER_OP    *        MULT_OP       <variable name>   IDENT
=        EQUAL_OP      /        DIV_OP        <integer>         INT_LIT
<>       NEQUAL_OP     read     KEY_READ      (                 LEFT_PAREN
<=       LEQUAL_OP     write    KEY_WRITE     )                 RIGHT_PAREN
>=       GEQUAL_OP     while    KEY_WHILE     ;                 SEMICOLON
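The fixed lexemes in the table (operators, keywords, and punctuation) can be matched with a simple lookup table. The sketch below is one possible arrangement, not the required design; the names token_entry, FIXED_TOKENS, and fixed_token_name are hypothetical. Variable names and integers are not fixed strings, so they are left to a separate check.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical pairing of a fixed lexeme with its token name. */
struct token_entry {
    const char *lexeme;
    const char *token;
};

/* Every fixed lexeme from the token table above. */
static const struct token_entry FIXED_TOKENS[] = {
    { ":=",    "ASSIGN_OP"   }, { "<",     "LESSER_OP" },
    { ">",     "GREATER_OP"  }, { "=",     "EQUAL_OP"  },
    { "<>",    "NEQUAL_OP"   }, { "<=",    "LEQUAL_OP" },
    { ">=",    "GEQUAL_OP"   }, { "+",     "ADD_OP"    },
    { "-",     "SUB_OP"      }, { "*",     "MULT_OP"   },
    { "/",     "DIV_OP"      }, { "read",  "KEY_READ"  },
    { "write", "KEY_WRITE"   }, { "while", "KEY_WHILE" },
    { "do",    "KEY_DO"      }, { "od",    "KEY_OD"    },
    { "(",     "LEFT_PAREN"  }, { ")",     "RIGHT_PAREN" },
    { ";",     "SEMICOLON"   }
};

/* Returns the token name for a fixed lexeme, or NULL when the lexeme is
   not in the table (it may still be an IDENT, an INT_LIT, or UNKNOWN). */
const char *fixed_token_name(const char *lexeme)
{
    size_t count = sizeof FIXED_TOKENS / sizeof FIXED_TOKENS[0];
    size_t i;

    for (i = 0; i < count; i++) {
        if (strcmp(FIXED_TOKENS[i].lexeme, lexeme) == 0)
            return FIXED_TOKENS[i].token;
    }
    return NULL;
}
```

Checking this table before testing for a variable name keeps keywords such as while from being reported as IDENT.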
Additional Solution Rules
Your solution must conform to the following rules:
1) Your solution should treat spaces, tabs, and end-of-line characters as delimiters between lexemes. It should ignore these characters rather than report them as lexemes, and it should not require them to separate lexemes of different types.
a. Example: “while i<=n do”
i. This line will generate 5 lexemes: “while”, “i”, “<=”, “n”, and “do”.
ii. This means the space between “while” and “i” separated the two lexemes but wasn’t a lexeme itself.
iii. This also means that no space is required between the lexemes “i”, “<=”, and “n”.
2) Your solution should print out “DanC Analyzer :: R<#>”, where R<#> is your R number, on the first line of output. The double colon “::” is required for correct grading of your submission.
3) Your solution must be tested to ensure compatibility with the GNU C/C++ compiler version 5.4.0.
4) Lexemes that do not match a known token should be reported as an “UNKNOWN” token. This should not stop execution of your program or generate an error message.
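Rules 1 and 4 can be illustrated with a small scanning routine. The sketch below is only one possible approach and makes several assumptions: the name next_lexeme and its signature are hypothetical, the input is already in memory as a string, a letter run and a digit run are always split into separate lexemes, and any character outside the grammar is returned as a one-character lexeme so the caller can label it UNKNOWN and keep going.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>   /* strcmp, for callers that compare lexemes */

/* Hypothetical scanner step: copies the next lexeme of src, starting at
   *pos, into buf and advances *pos past it. Returns 1 when a lexeme was
   found and 0 at end of input. buf must hold at least 3 bytes; longer
   letter or digit runs are truncated to fit. */
int next_lexeme(const char *src, size_t *pos, char *buf, size_t bufsize)
{
    size_t i = *pos;
    size_t len = 0;

    if (bufsize < 3)
        return 0;

    /* Rule 1: whitespace separates lexemes but is never reported. */
    while (src[i] != '\0' && isspace((unsigned char)src[i]))
        i++;
    if (src[i] == '\0') {
        *pos = i;
        return 0;
    }

    if (src[i] >= 'a' && src[i] <= 'z') {
        /* Longest run of lowercase letters: keyword or variable name. */
        while (src[i] >= 'a' && src[i] <= 'z') {
            if (len + 1 < bufsize)
                buf[len++] = src[i];
            i++;
        }
    } else if (src[i] >= '0' && src[i] <= '9') {
        /* Longest run of digits: integer literal. */
        while (src[i] >= '0' && src[i] <= '9') {
            if (len + 1 < bufsize)
                buf[len++] = src[i];
            i++;
        }
    } else {
        /* Operator, punctuation, or a character outside the grammar. */
        char first = src[i++];
        char second = src[i];
        buf[len++] = first;
        /* Prefer the two-character operators :=, <=, <>, and >=. */
        if ((first == ':' && second == '=') ||
            (first == '<' && (second == '=' || second == '>')) ||
            (first == '>' && second == '=')) {
            buf[len++] = second;
            i++;
        }
    }

    buf[len] = '\0';
    *pos = i;
    return 1;
}
```

Calling next_lexeme in a loop on the line “while i<=n do” yields the five lexemes from rule 1 without needing spaces around “<=”. A stray character such as “$” comes back as its own lexeme, which the caller can print with the UNKNOWN token before continuing, as rule 4 requires.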
Hints
1) Draw inspiration by looking at the lexical analyzer code discussed and distributed in class.
2) Start by focusing on writing the program in your usual C/C++ development environment.
3) Once your solution is correct, test it in Linux using the appropriate version of the GNU compiler (gcc).
4) Linux/Makefile tutorials:
a. Linux Video walkthrough: http://www.depts.ttu.edu/hpcc/about/training.php#intro_linux
b. Linux Text walkthrough: http://www.ee.surrey.ac.uk/Teaching/Unix/
c. Makefile tutorial: https://www.tutorialspoint.com/makefile/index.htm
What to turn in to BlackBoard
A zip archive (.zip) containing the following files:
• <FirstName>_<LastName>_<R#>_Assignment3.c / <FirstName>_<LastName>_<R#>_Assignment3.cpp
o C/C++ Source code file
o Example: Eric_Rees_R123456_Assignment3.c
• Makefile
o A makefile for compiling your C/C++ file.
o This makefile must work in the HPCC environment to compile your source code file and output an executable named danc_analyzer.
Example Execution
The example execution below was run on Quanah, one of the HPCC clusters. It shows all the commands used to compile and execute my analyzer. Bolded text comes from the Linux OS, text in red shows the commands I typed and executed, and text in blue shows the output from each step.
quanah:/assignment_3$ make clean
rm -f danc_analyzer
quanah:/assignment_3$ make
gcc -o danc_analyzer Eric_Rees_R123456_Assignment3.c
quanah:/assignment_3$ ./danc_analyzer test.danc
DanC Analyzer :: R123456
f IDENT
:= ASSIGN_OP
1 INT_LIT
; SEMICOLON
i IDENT
:= ASSIGN_OP
1 INT_LIT
; SEMICOLON
read KEY_READ
( LEFT_PAREN
n IDENT
) RIGHT_PAREN
; SEMICOLON
while KEY_WHILE
i IDENT
<= LEQUAL_OP
n IDENT
do KEY_DO
f IDENT
:= ASSIGN_OP
f IDENT
* MULT_OP
i IDENT
; SEMICOLON
i IDENT
:= ASSIGN_OP
i IDENT
+ ADD_OP
1 INT_LIT
od KEY_OD
; SEMICOLON