Original author(s) | Mike Lesk, Eric Schmidt |
---|---|
Initial release | 1975; 49 years ago (1975) |
Repository | |
Written in | C |
Operating system | Unix, Unix-like, Plan 9 |
Platform | Cross-platform |
Type | Command |
License | Plan 9: MIT License |
Lex is a computer program that generates lexical analyzers ("scanners" or "lexers").[1][2] It is commonly used with the yacc parser generator and is the standard lexical analyzer generator on many Unix and Unix-like systems. An equivalent tool is specified as part of the POSIX standard.[3]
Lex reads an input stream specifying the lexical analyzer and writes source code which implements the lexical analyzer in the C programming language.
In addition to C, some old versions of Lex could generate a lexer in Ratfor.[4]
Lex was originally written by Mike Lesk and Eric Schmidt[5] and described in 1975.[6][7] In the following years, Lex became standard lexical analyzer generator on many Unix and Unix-like systems. In 1983, Lex was one of several UNIX tools available for Charles River Data Systems' UNOS operating system under Bell Laboratories license.[8] Although originally distributed as proprietary software, some versions of Lex are now open-source. Open-source versions of Lex, based on the original proprietary code, are now distributed with open-source operating systems such as OpenSolaris and Plan 9 from Bell Labs. One popular open-source version of Lex, called flex, or the "fast lexical analyzer", is not derived from proprietary coding.
The structure of a Lex file is intentionally similar to that of a yacc file: files are divided into three sections, separated by lines that contain only two percent signs, as follows:
The following is an example Lex file for the flex version of Lex. It recognizes strings of numbers (positive integers) in the input, and simply prints them out.
/*** Definition section ***/
%{
/* C code to be copied verbatim */
#include <stdio.h>
%}
%%
/*** Rules section ***/
/* [0-9]+ matches a string of one or more digits */
[0-9]+ {
/* yytext is a string containing the matched text. */
printf("Saw an integer: %s\n", yytext);
}
.|\n { /* Ignore all other characters. */ }
%%
/*** C Code section ***/
int main(void)
{
/* Call the lexer, then quit. */
yylex();
return 0;
}
If this input is given to flex
, it will be converted into a C file, lex.yy.c
. This can be compiled into an executable which matches and outputs strings of integers. For example, given the input:
abc123z.!&*2gj6
the program will print:
Saw an integer: 123 Saw an integer: 2 Saw an integer: 6
Lex, as with other lexical analyzers, limits rules to those which can be described by regular expressions. Due to this, Lex can be implemented by a finite state automata as shown by the Chomsky hierarchy of languages. To recognize more complex languages, Lex is often used with parser generators such as YaccorBison. Parser generators use a formal grammar to parse an input stream.
It is typically preferable to have a parser, one generated by Yacc for instance, accept a stream of tokens (a "token-stream") as input, rather than having to process a stream of characters (a "character-stream") directly. Lex is often used to produce such a token-stream.
Scannerless parsing refers to parsing the input character-stream directly, without a distinct lexer.
make is a utility that can be used to maintain programs involving Lex. Make assumes that a file that has an extension of .l
is a Lex source file. The make internal macro LFLAGS
can be used to specify Lex options to be invoked automatically by make.[9]
lex(1)
– Solaris 11.4 User Commands Reference Manuallex(1)
– Plan 9 Programmer's Manual, Volume 1
| |
---|---|
File system |
|
Processes |
|
User environment |
|
Text processing |
|
Shell builtins |
|
Searching |
|
Documentation |
|
Software development |
|
Miscellaneous |
|
|
| |
---|---|
File system |
|
Processes |
|
User environment |
|
Text processing |
|
Shell builtins |
|
Networking |
|
Searching |
|
Software development |
|
Miscellaneous |
|
|