3º. 2º cuatrimestre. Itinerario de Computación. Grado en Ingeniería Informática. ULL
i
option for “literal”.
[characters]
& { predicate }
PEG.js is a parser generator for JavaScript that produces parsers.
It generates a parser from a Parsing Expression Grammar describing a language.
We can specify what the parser returns (using semantic actions on matched parts of the input).
To use the pegjs command, install globally:
$ npm install -g pegjs
To use the JavaScript API, install locally:
$ npm install pegjs
To use it from the browser, Install it using Bower:
$ bower install pegjs
[~/.../pegjs/examples(master)]$ pegjs --help
Usage: pegjs [options] [--] [<input_file>]
Options:
--allowed-start-rules <rules> comma-separated list of rules the generated
parser will be allowed to start parsing
from (default: the first rule in the
grammar)
--cache make generated parser cache results
-d, --dependency <dependency> use specified dependency (can be specified
multiple times)
-e, --export-var <variable> name of a global variable into which the
parser object is assigned to when no module
loader is detected
--extra-options <options> additional options (in JSON format) to pass
to peg.generate
--extra-options-file <file> file with additional options (in JSON
format) to pass to peg.generate
--format <format> format of the generated parser: amd,
commonjs, globals, umd (default: commonjs)
-h, --help print help and exit
-O, --optimize <goal> select optimization for speed or size
(default: speed)
-o, --output <file> output file
--plugin <plugin> use a specified plugin (can be specified
multiple times)
--trace enable tracing in generated parser
-v, --version print version information and exit
~/.../pegjs/examples(master)]$ node
Welcome to Node.js v12.10.0.
Type ".help" for more information.
> PEG = require("pegjs")
{
VERSION: '0.10.0',
GrammarError: [Function: GrammarError],
parser: {
SyntaxError: [Function: peg$SyntaxError] { buildMessage: [Function] },
parse: [Function: peg$parse]
},
compiler: {
visitor: { build: [Function: build] },
passes: { check: [Object], transform: [Object], generate: [Object] },
compile: [Function: compile]
},
generate: [Function: generate]
}
> parser = PEG.generate("start = ('a' / 'b')+")
{
SyntaxError: [Function: peg$SyntaxError] { buildMessage: [Function] },
parse: [Function: peg$parse]
}
On the top level, the grammar consists of rules (in our example, there is only one). Each rule has
start
) that identifies the rule, and('a' / 'b')+
) that defines a pattern to match against the input text andAn expression matching a literal string produces a JavaScript string containing matched text ('a'
matches a
).
In PEG.js an expression like \(expression_1 / expression_2 / ... / expression_n\) tries to match the first expression, if it does not succeed, try the second one, etc. Return the match result of the first successfully matched expression. If no expression matches, the match fails.
Also an expression like expression +
matches one or more repetitions of the expression and return their match results in an array. The matching is greedy, i.e. the parser tries to match the expression as many times as possible. Unlike in regular expressions, there is no backtracking.
Since an expression matching repeated occurrence of some subexpression produces a JavaScript array with all the matches, the expression ('a' / 'b')+
matches abb
and has as match result the array ['a', 'b', 'b']
.
Using the generated parser is simple — just call its parse
method and
pass an input string as a parameter.
> parser.parse("abba")
[ 'a', 'b', 'b', 'a' ]
The method will return
a parse result (the exact value depends on the grammar used to build the parser) or
throw an exception if the input is invalid.
The exception will contain
location
,
expected
,
found
and message
properties with more details about the error.
If an expression successfully matches a part of the text when running the generated parser, it produces a match result, which is a JavaScript value. For example:
The match results propagate through the rules when the rule names are used in expressions, up to the start rule. The generated parser returns start rule’s match result when parsing is successful.
That is why in our example:
> parser = PEG.generate("start = ('a' / 'b')+")
The match result for input abba
is [ 'a', 'b', 'b', 'a' ]
:
> parser.parse("abba")
[ 'a', 'b', 'b', 'a' ]
PEG parsers don’t backtrack like regexps and other recursive-descent parsers or Prolog do. Rather, when confronted with a choice, a PEG parser will try every option until one succeeds. Once one succeeds, it will commit to it no matter how the rule was invoked.
El siguiente ejemplo ilustra el /
y el backtracking en los PEGs:
[~/.../pegjs/examples(master)]$ cat backtracking.js
const PEG = require ("pegjs");
const grammar1 = `
start = "test" / "test ;"
`;
let parser = PEG.generate(grammar1);
let input = 'test';
console.log("OK: "+parser.parse(input)); // OK: test
try {
// This input will not be accepted
const input = 'test ;';
console.log(parser.parse(input));
}
catch (e) { // Expected end of input but " " found
console.log(e.message);
}
const grammar2 = `
start = "test" !" ;" / "test ;"
`;
parser = PEG.generate(grammar2);
input = 'test ;';
console.log("OK: "+parser.parse(input)); // OK: test ;
The expression ! ";"
is like a regexp negative lookahead:
tries to match ";"
. If the match does not succeed, just
return undefined
and does not advance the parser position, otherwise
considers the match failed giving opportunity for the next choice "test ;"
to work.
Una gramática y un PEG con las mismas reglas no definen el mismo lenguaje. Véase este ejemplo:
[~/.../pegjs/examples(master)]$ cat grammarvspegjs.js
const PEG = require('pegjs');
const grammar = `
A = B 'c'
B = 'b' / 'b' 'a'
`;
const parser = PEG.generate(grammar);
let r = parser.parse("bc");
console.log("r = " + r);
try {
r = parser.parse("bac");
console.log("r = " + r);
} catch(e) { console.log(e.message); }
[~/.../pegjs/examples(master)]$ node grammarvspegjs.js
r = b,c
Expected "c" but "a" found.
Obsérvese que la correspondiente gramática genera el lenguaje:
{ 'bc', 'bac' }
Mientras que el PEG acepta el lenguaje 'bc'
.
En efecto, probemos con una herramienta como Jison que permite procesar gramáticas:
[~/.../pegjs/examples(master)]$ cat grammarvspeg.jison
%lex
%%
. { return yytext; }
/lex
%%
A: B 'c'
;
B: 'b' | 'b' 'a'
;
Cuando parseamos la entrada bac
con esta gramática obtenemos:
[~/.../pegjs/examples(master)]$ jison grammarvspeg.jison
[~/.../pegjs/examples(master)]$ ls -ltr |tail -n 1
-rw-r--r-- 1 casiano staff 20615 10 may 09:18 grammarvspeg.js
[~/.../pegjs/examples(master)]$ ./use.js grammarvspeg bac
Processing <bac>
true
You can tweak parser behavior by passing a second parameter with an
options
object to the parse
method.
The following options are supported:
startRule
Name of the rule to start parsing from.
tracer
Tracer to use. See this example of use
Parsers can also support their own custom options.
Specifying allowedStartRules
we can set the rules the parser will be allowed to start
parsing from (default: the first rule in the grammar).
[~/.../pegjs/examples(master)]$ cat allowedstartrules.js
"use strict";
const util = require('util');
const PEG = require("pegjs");
const grammar = `
a = 'hello' b
b = 'world'
`;
console.log(grammar);
const parser = PEG.generate(grammar,{ allowedStartRules: ['a', 'b'] });
let r = parser.parse("helloworld", { startRule: 'a' });
console.log(r); // [ 'hello', 'world' ]
r = parser.parse("helloworld")
console.log(r); // [ 'hello', 'world' ]
r = parser.parse("world", { startRule: 'b' })
console.log(r); // 'world'
try {
r = parser.parse("world"); // Throws an exception
}
catch(e) {
r = e;
}
console.log(`The error object:`);
console.log(util.inspect(r, {depth: null}));
/*
{
message: 'Expected "hello" but "w" found.',
expected: [ { type: 'literal', text: 'hello', ignoreCase: false } ],
found: 'w',
location: {
start: { offset: 0, line: 1, column: 1 },
end: { offset: 1, line: 1, column: 2 }
},
name: 'SyntaxError'
}
*/
[~/.../pegjs/examples(master)]$ node allowedstartrules.js
a = 'hello' b
b = 'world'
[ 'hello', 'world' ]
[ 'hello', 'world' ]
world
The error object:
peg$SyntaxError: Expected "hello" but "w" found.
at peg$buildStructuredError (eval at compile (/Users/casiano/local/src/javascript/PLgrado/pegjs/examples/node_modules/pegjs/lib/compiler/index.js:67:29), <anonymous>:277:14)
at Object.peg$parse [as parse] (eval at compile (/Users/casiano/local/src/javascript/PLgrado/pegjs/examples/node_modules/pegjs/lib/compiler/index.js:67:29), <anonymous>:336:13)
at Object.<anonymous> (/Users/casiano/local/src/javascript/PLgrado/pegjs/examples/allowedstartrules.js:22:14)
at Module._compile (internal/modules/cjs/loader.js:936:30)
at Object.Module._extensions..js (internal/modules/cjs/loader.js:947:10)
at Module.load (internal/modules/cjs/loader.js:790:32)
at Function.Module._load (internal/modules/cjs/loader.js:703:12)
at Function.Module.runMain (internal/modules/cjs/loader.js:999:10)
at internal/main/run_main_module.js:17:11 {
message: 'Expected "hello" but "w" found.',
expected: [ { type: 'literal', text: 'hello', ignoreCase: false } ],
found: 'w',
location: {
start: { offset: 0, line: 1, column: 1 },
end: { offset: 1, line: 1, column: 2 }
},
name: 'SyntaxError'
}
The error object e
contains
start
and end
objects with offset, line and column)and properties with more details about the error.
output
is set to 'parser'
, the method will return generated parser
object; if set to 'source'
, it will return parser source code as a string> grammar = "a = 'hello' b\nb='world'"
"a = 'hello' b\nb='world'"
> parser = PEG.generate(grammar,{ output: "parser"})
{
SyntaxError: [Function: peg$SyntaxError] { buildMessage: [Function] },
parse: [Function: peg$parse]
}
> let parser = PEG.generate(grammar,{ output: "source"})
undefined
> typeof parser
'string'
> parser.substring(0,100)
'/*\n' +
' * Generated by PEG.js 0.10.0.\n' +
' *\n' +
' * http://pegjs.org/\n' +
' */\n' +
'(function() {\n' +
' "use strict";\n' +
'\n' +
' funct'
The default value is: parser
.
La opción plugins
o en línea de comando plugin
indica que plugin se van a usar.
Véase la lista de plugins
[~/.../pegjs/examples(master)]$ cat plugin.coffee
#!/usr/bin/env coffee
PEG = require 'pegjs'
coffee = require 'pegjs-coffee-plugin'
grammar = """
a = 'hello' _ b { console.log 3; "hello world!" }
b = 'world' { console.log 2 }
_ = [ \t]+ { console.log 1 }
"""
parser = PEG.generate grammar, plugins: [coffee]
r = parser.parse "hello world"
console.log("r = #{r}")
One special case of parser expression is a parser action — a piece of JavaScript code inside curly braces ({
and }
) that takes match results of some of the the preceding expressions and returns a JavaScript value. This value is considered match result of the preceding expression (in other words, the parser action is a match result transformer).
La ejecución nos muestra además el orden de abajo - arriba y de izquierda -derecha en la ejecución de las parser actions o acciones semánticas:
[~/.../pegjs/examples(master)]$ npm i pegjs-coffee-plugin
[~/.../pegjs/examples(master)]$ npm i -g coffeescript
[~/.../pegjs/examples(master)]$ ./plugin.coffee
1
2
3
r = hello world!
The PEG.js syntax is similar to JavaScript in that it is not line-oriented and ignores whitespace between tokens.
You can also use JavaScript-style comments (// ...
and /* ... */
).
Let’s look at example grammar that recognizes simple arithmetic
expressions like 2*(3+4)
.
A parser generated from this grammar computes their values.
[~/.../pegjs/examples(master)]$ cat arithmetics-with-minus.pegjs
/*
* Classic example grammar, which recognizes simple arithmetic expressions like
* "2*(3+4)". The parser generated from this grammar then computes their value.
*/
start
= additive
additive
= left:multiplicative PLUS right:additive { return left + right; }
/ left:multiplicative MINUS right:additive { return left - right; }
/ multiplicative
multiplicative
= left:primary MULT right:multiplicative { return left * right; }
/ left:primary DIV right:multiplicative { return left / right; }
/ primary
primary
= integer
/ LEFTPAR additive:additive RIGHTPAR { return additive; }
integer "integer"
= NUMBER
_ = $[ \t\n\r]*
PLUS = _"+"_
MINUS = _"-"_
MULT = _"*"_
DIV = _"/"_
LEFTPAR = _"("_
RIGHTPAR = _")"_
NUMBER = _ digits:$[0-9]+ _ { return parseInt(digits, 10); }
On the top level, the grammar consists of rules (in our example, there are five of them). Each rule has
integer
) that identifies the rule, anddigits:[0-9]+
) that defines a pattern to match against the input text and{ return parseInt(digits.join(""), 10); }
).A rule can also contain human-readable name that is used in error messages (in our example, only the integer
rule has a human-readable name).
The parsing starts at the first rule, which is also called the start rule.
=
and a parsing expression.;
) after the parsing expression is allowed.In PEG.js the peg expression $ expression
tries to match the expression
. If the match succeeds, return the matched text instead of the match result.
That is why we use the $
inside the rule:
_ = $[ \t\n\r]*
This way, instead of returning an array of individual whites, the rule returns the matched string of whites.
We do the same in:
$[0-9]+
And thus, instead of an array of digits we have the string with the digits, simplifying the parsing to parseInt(digits, 10)
.
In our example there are many parser actions. Consider the action in expression
left:multiplicative PLUS right:additive { return left + right; }
It takes the match result of the appearance of the rule multiplicative
, which is provided in a variable named left
an the match result associated with the appearance of the rule additive
. They must be numbers and the action sets as match result
of the rule the sum of the two numbers { return left + right; }
In this example you see many examples of the syntax:
label : expression
The meaning is that it matches the expression and remembers its match result under given label
.
To produce the parser we compile it with pegjs
:
[~/.../pegjs/examples(master)]$ pegjs arithmetics-with-minus.pegjs
[~/.../pegjs/examples(master)]$ ls -ltr | tail -1
-rw-r--r-- 1 casiano staff 18688 9 may 14:36 arithmetics-with-minus.js
The output file arithmetics-with-minus.js
is a COMMON.js module that we can use
with a require
:
[~/.../pegjs/examples(master)]$ cat use-arithmetics-with-minus.js
var PEG = require("./arithmetics-with-minus.js");
var input = process.argv[2] || "5+3*2";
console.log(`Processing <${input}>`);
var r = PEG.parse(input);
console.log(r);
The execution gives:
[~/.../pegjs/examples(master)]$ node use-arithmetics-with-minus.js
Processing <5+3*2>
11
Una Gramática Independiente del Contexto se dice Recursiva por la Izquierda cuando existe una derivación \(A \stackrel{*}{\Longrightarrow} A \alpha\).
En particular, es recursiva por la izquierda si contiene una regla de producción de la forma \(A \rightarrow A \alpha\). En este caso se dice que la recursión por la izquierda es directa.
Cuando la gramática es recursiva por la izquierda, un método de análisis recursivo descendente
como los de los PEGs no funciona. En ese caso, el procedimiento A
asociado con
\(A\) ciclaría para siempre sin llegar a consumir ningún terminal.
Es por eso que en el ejemplo hemos empezado escribiendo las reglas de la calculadora con recursividad a derechas,
additive
= left:multiplicative PLUS right:additive { return left + right; }
/ left:multiplicative MINUS right:additive { return left - right; }
/ multiplicative
pero eso da lugar a árboles hundidos hacia la derecha y a una aplicación de las reglas semánticas errónea:
[~/.../pegjs/examples(master)]$ node use-arithmetics-with-minus.js 5-3-2
Processing <5-3-2>
4
The first rule can be preceded by an initializer — a piece of JavaScript code in curly braces ({
and }
).
options
variable.Here is an example of use of initializers that also solves the associativity problem posed in the section a very simple calculator:
[~/.../pegjs/examples(master)]$ cat simple_reduce.pegjs
{
function reduce(left, right) {
return right.reduce((sum, [op, num]) => eval(`sum ${op}= num`), left);
}
}
sum = left:product right:($[+-] product)* { return reduce(left, right); }
product = left:value right:($[*/] value)* { return reduce(left, right); }
value = number:$[0-9]+ { return parseInt(number,10); }
/ '(' sum:sum ')' { return sum; }
We have written a helper script use.js
to load a parser and execute it on a
given input:
[~/.../pegjs/examples(master)]$ cat use.js
#!/usr/bin/env node
const [peg, input] = process.argv.splice(2);
const parse = require('./'+peg).parse;
console.log(`Processing <${input}>`);
const r = parse(input);
console.log(r);
Now we have an arithmetics calculator that correctly computes substractions and divisions:
[~/.../pegjs/examples(master)]$ ./use.js 'simple_reduce' 5-3-2
Processing <5-3-2>
0
In this example the variable g
declared inside the initializer is visible
in all the actions. The variable options
contains the options argument
used in the call to the parser:
[~/.../pegjs/examples(master)]$ cat initializer.js
"use strict";
const PEG = require("pegjs");
const grammar = `
{
const util = require("util");
const g = "visible variable";
console.log("Inside Initializer! options = "+util.inspect(options));
}
start = 'a' { console.log('g = '+g); return 1; }
/ & { console.log('inside predicate: g = '+g); return true; } 'b' { return 2; }
`;
const parser = PEG.generate(grammar);
console.log("PARSING 'a'");
let r = parser.parse("a", { x: 'hello' });
console.log(r);
console.log("PARSING 'b'");
r = parser.parse("b");
console.log(r);
[~/.../pegjs/examples(master)]$ node initializer.js
PARSING 'a'
Inside Initializer! options = { x: 'hello' }
g = visible variable
1
PARSING 'b'
Inside Initializer! options = {}
inside predicate: g = visible variable
2
i
option for “literal”Appending i
right after the literal makes the match case-insensitive:
[~/.../pegjs/examples(master)]$ cat ignorecasejs.js
"use strict"
const PEG = require('pegjs');
const grammar = `start = a: 'a'i `;
const parser = PEG.generate(grammar);
let r = parser.parse('A');
console.log(r);
.
Match exactly one character (including \n
) and return it as a string:
[~/.../pegjs/examples(master)]$ cat dotjs.js
const PEG = require('pegjs');
const grammar = 'start = a: ..';
const parser = PEG.generate(grammar);
const r = parser.parse("\n\t");
console.log(r);
[~/.../pegjs/examples(master)]$ node dotjs.js
[ '\n', '\t' ]
[characters]
Match one character from a set and return it as a string.
The characters in the list can be escaped in exactly the same way as in JavaScript string.
The list of characters can also contain ranges (e.g. [a-z]
means all lowercase letters).
Preceding the characters with ^
inverts the matched set (e.g.
[^a-z]
means “all character but lowercase letters).
Appending i
right after the class makes the match case-insensitive.
Example:
[~/.../pegjs/examples(master)]$ cat regexpjs.js
const PEG = require('pegjs');
grammar = 'start = a: [aeiou\u2661]i . [^x-z] ';
parser = PEG.generate(grammar);
r = parser.parse('Abr');
console.log(r);
r = parser.parse('♡br');
console.log(r);
[~/.../pegjs/examples(master)]$ node regexpjs.js
[ 'A', 'b', 'r' ]
[ '♡', 'b', 'r' ]
& expression
Try to match the expression. If the match succeeds, just return undefined
and do not advance the parser position, otherwise consider the match failed.
! expression
Try to match the expression. If the match does not succeed, just
return undefined
and do not advance the parser position, otherwise
consider the match failed.
The language \(\{ a^n b^n c^n / n \in \mathcal{N} \}\) can’t be generated by a context free grammar.
[~/.../pegjs/examples(master)]$ cat anbncn.pegjs
/*
The following parsing expression grammar describes the classic
non-context-free language :
{ a^nb^nc^n / n >= 1 }
*/
S = &(A 'c') 'a'+ B:B !. { return B; }
/* S = &A 'a'+ B:B !. does not work since accepts abbcc */
A = 'a' A:A? 'b' { if (A) { return A+1; } else return 1; }
B = 'b' B:B? 'c' { if (B) { return B+1; } else return 1; }
[~/.../pegjs/examples(master)]$ pegjs anbncn.pegjs
[~/.../pegjs/examples(master)]$ ./use.js anbncn aabcc
Processing <aabcc>
Expected , or undefined but "a" found.
[~/.../pegjs/examples(master)]$ ./use.js anbncn aabbcc
Processing <aabbcc>
2
[~/.../pegjs/examples(master)]$ ./use.js anbncn aabccc
Processing <aabccc>
Expected , or undefined but "a" found.
See also the old version of this example
The following recursive program matches Pascal-style nested comment syntax:
(* which can (* nest *) like this *)
[~/.../pegjs/examples(master)]$ cat pascal_comments.pegjs
/* Pascal nested comments */
P = prog:N+ { return prog; }
N = chars:$(!Begin .)+ { return chars;}
/ C
C = Begin chars:$T* End { return "C: "+chars; }
T = C
/ !Begin !End char:. { return char;}
Begin = '(*'
End = '*)'
$ cat use_pascal_comments.js
var PEG = require("./pascal_comments.js");
var r = PEG.parse(
"not bla bla (* pascal (* nested *) comment *)"+
" pum pum (* another comment *)");
console.log(r);
[~/.../pegjs/examples(master)]$ node use_pascal_comments.js
[
'not bla bla ',
'C: pascal (* nested *) comment ',
' pum pum ',
'C: another comment '
]
Here is an example recognizing JavaScript whitespaces and comments:
[~/srcPLgrado/pegjs/examples(master)]$ cat notpredicate.pegjs
__ = (whitespace / eol / comment)*
/* Modeled after ECMA-262, 5th ed., 7.4. */
comment "comment"
= singleLineComment
/ multiLineComment
singleLineComment
= "//" (!eolChar .)* { return text(); }
multiLineComment
= "/*" (!"*/" .)* "*/" { return text(); }
/* Modeled after ECMA-262, 5th ed., 7.3. */
eol "end of line"
= "\n"
/ "\r\n"
/ "\r"
/ "\u2028"
/ "\u2029"
eolChar
= [\n\r\u2028\u2029]
whitespace "whitespace"
= [ \t\v\f\u00A0\uFEFF\u1680\u180E\u2000-\u200A\u202F\u205F\u3000]
Once it is compiled we can call it from our main program:
[~/srcPLgrado/pegjs/examples(master)]$ cat mainnotpredicate.js
var PEG = require("./notpredicate.js");
var input = process.argv[2] || "// one comment\n"+
"// another comment \t/\n"+
"/* a\n"+
" third comment */";
console.log("\n*****\n"+input+"\n*****\n");
var r = PEG.parse(input);
console.log(r);
This is the output:
[~/srcPLgrado/pegjs/examples(master)]$ pegjs notpredicate.pegjs
[~/srcPLgrado/pegjs/examples(master)]$ node mainnotpredicate.js
*****
// one comment
// another comment /
/* a
third comment */
*****
[ '// one comment',
'\n',
'// another comment \t/',
'\n',
'/* a\n third comment */' ]
& { predicate }
The predicate is a piece of JavaScript code that is executed as if it was inside a function. It gets the match results of labeled expressions in preceding expression as its arguments. It should return some JavaScript value using the return
statement. If the returned value evaluates to true
in boolean context, just return undefined
and do not consume any input; otherwise consider the match failed.
[~/.../pegjs/examples(master)]$ cat semantic_intermediajs.js
const PEG = require('pegjs');
const grammar = `
a = a:$'a'+
& { console.log("acción intermedia. a = "+a); return true; }
b:$'b'+ {
console.log("acción final. b = "+b);
return text();
}
`;
parser = PEG.generate(grammar);
r = parser.parse("aabb");
console.log("r = " + r);
[~/.../pegjs/examples(master)]$ node semantic_intermediajs.js
acción intermedia. a = aa
acción final. b = bb
r = aabb
The code inside the predicate can access all variables and functions defined in the initializer at the beginning of the grammar.
The code inside the predicate can also access location information using the location
function. It returns an object like this:
{
start: { offset: 23, line: 5, column: 6 },
end: { offset: 23, line: 5, column: 6 }
}
The start
and end
properties both refer to the current parse position.
The offset
property contains an offset as a zero-based index and line
and column
properties contain a line and a column as one-based indices.
The code inside the predicate can also access options
passed to the parser using the options variable.
Note that curly braces in the predicate code must be balanced.
[~/.../pegjs/examples(master)]$ cat semantic_predicatejs.js
let PEG, grammar, parser, r;
PEG = require('pegjs');
grammar = `
{
let util = require("util")
let g = "visible variable"
console.log("Inside Initializer! options = "+util.inspect(options))
}
start = 'a' { console.log(g); return 1 }
/ c:'c' '\\n' & {
console.log("inside predicate: g = "+g+" c = "+c);
console.log("options = "+util.inspect(options));
console.log("location = "+util.inspect(location()));
return true
} 'b' { return 2 }
`;
parser = PEG.generate(grammar);
r = parser.parse('a', { x: 'hello' });
console.log(r);
r = parser.parse("c\nb", { y: 'world' });
console.log(r);
Execution:
[~/.../pegjs/examples(master)]$ node semantic_predicatejs.js
Inside Initializer! options = { x: 'hello' }
visible variable
1
Inside Initializer! options = { y: 'world' }
inside predicate: g = visible variable c = c
options = { y: 'world' }
location = {
start: { offset: 2, line: 2, column: 1 },
end: { offset: 2, line: 2, column: 1 }
}
2
The dangling else is a problem in computer programming in which an
optional else clause
in an If–then(–else)
statement results in
nested conditionals being ambiguous.
Formally, the reference context-free grammar of the language is ambiguous, meaning there is more than one correct parse tree.
In many programming languages one may write conditionally executed code in two forms:
the if-then
form, and the if-then-else
form – the else
clause is
optional:
if a then s
if a then s1 else s2
This gives rise to an ambiguity in interpretation when there are nested
statements, specifically whenever an if-then
form appears as s1
in
an if-then-else
form:
if a then if b then s else s2
In this example, s
is unambiguously executed when a
is true
and
b
is true
, but one may interpret s2
as being executed when a
is
false
(thus attaching the else
to the first if
) or when
a
is true
and b
is false
(thus attaching the else
to the
second if
).
In other words, one may see the previous statement as either of the following expressions:
if a then (if b then s) else s2
or
if a then (if b then s else s2)
This is a problem that often comes up in compiler construction, especially scannerless parsing.
The convention when dealing with the dangling else
is to attach the
else
to the nearby if
statement.
Programming languages like Pascal and C follow this convention, so there
is no ambiguity in the semantics of the language, though the use of a
parser generator may lead to ambiguous grammars. In these cases , such
as begin...end
in Pascal and {...}
in C.
Here follows a solution in PEG.js:
$ cat danglingelse.pegjs
/*
S ← 'if' C 'then' S 'else' S / 'if' C 'then' S
*/
S = if C:C then S1:S else S2:S { return [ 'ifthenelse', C, S1, S2 ]; }
/ if C:C then S:S { return [ 'ifthen', C, S]; }
/ O { return 'O'; }
_ = ' '*
C = _'c'_ { return 'c'; }
O = _'o'_ { return 'o'; }
else = _'else'_
if = _'if'_
then = _'then'_
$ cat use_danglingelse.js
var PEG = require("./danglingelse.js");
var r = PEG.parse("if c then if c then o else o");
console.log(r);
$ ../bin/pegjs danglingelse.pegjs
$ node use_danglingelse.js
[ 'ifthen', 'c', [ 'ifthenelse', 'c', 'O', 'O' ] ]
Si invertimos el orden de las alternativas:
[~/srcPLgrado/pegjs/examples(master)]$ cat danglingelse2.pegjs
/*
S ← 'if' C 'then' S 'else' S / 'if' C 'then' S
*/
S = if C:C then S:S { return [ 'ifthen', C, S]; }
/ if C:C then S1:S else S2:S { return [ 'ifthenelse', C, S1, S2 ]; }
/ O { return 'O'; }
_ = ' '*
C = _'c'_ { return 'c'; }
O = _'o'_ { return 'o'; }
else = _'else'_
if = _'if'_
then = _'then'_
el lenguaje reconocido cambia (vease el ejemplo en la sección [subsection:pegvsgrammars]):
[~/srcPLgrado/pegjs/examples(master)]$ pegjs danglingelse2.pegjs
[~/srcPLgrado/pegjs/examples(master)]$ cat use_danglingelse2.js
var PEG = require("./danglingelse2.js");
var r = PEG.parse("if c then if c then o else o");
console.log(JSON.stringify(r));
[~/srcPLgrado/pegjs/examples(master)]$ node use_danglingelse2.js
/Users/casiano/local/src/javascript/PLgrado/pegjs/examples/danglingelse2.js:513
SyntaxError: Expected " " or end of input but "e" found.
Vease PegJS en los Browser
This lab illustrates a problem that arises in C++. The C++ syntax does
not disambiguate between expression statements (stmt
) and declaration
statements (decl
). The ambiguity arises when an expression statement
has a function-style cast as its left-most subexpression. Since C does
not support function-style casts, this ambiguity does not occur in C
programs. For example, the phrase
int (x) = y+z;
parses as either a decl
or a stmt
.
The disambiguation rule used in C++ is that if the statement can be interpreted both as a declaration and as an expression, the statement is interpreted as a declaration statement.
The following examples disambiguate into expression
statements when the potential declarator is followed by
an operator different from equal or semicolon (type_spec
stands for a
type specifier):
type_spec(i)++;
type_spec(i,3)<<d;
type_spec(i)->l=24;
type_spec(*i)(int);
type_spec(j)[5];
type_spec(m) = { 1, 2 };
type_spec(a);
type_spec(*b)();
type_spec(c)=23;
type_spec(d),e,f,g=0;
type_spec(h)(e,3);
Regarding to this problem, Bjarne Stroustrup remarks:
Consider analyzing a statement consisting of a sequence of tokens as follows:
type_spec (dec_or_exp) tail
Here
dec_or_exp
must be a declarator, an expression, or both for the statement to be legal. This implies thattail
must be a semicolon, something that can follow a parenthesized declarator or something that can follow a parenthesized expression, that is, an initializer,const
,volatile
,(
,[
, or a postfix or infix operator. The general cases cannot be resolved without backtracking, nested grammars or similar advanced parsing strategies. In particular, the lookahead needed to disambiguate this case is not limited.
The following grammar depicts an oversimplified version of the C++ ambiguity:
$ cat CplusplusNested.y
%token ID INT NUM
%right '='
%left '+'
%%
prog:
/* empty */
| prog stmt
;
stmt:
expr ';'
| decl
;
expr:
ID
| NUM
| INT '(' expr ')' /* typecast */
| expr '+' expr
| expr '=' expr
;
decl:
INT declarator ';'
| INT declarator '=' expr ';'
;
declarator:
ID
| '(' declarator ')'
;
%%
Escriba un programa PegJS que distinga correctamente entre declaraciones y sentencias. Este es un ejemplo de un programa que usa una solución al problema:
[~/Dropbox/src/javascript/PLgrado/pegjs-coffee-plugin/examples(master)]$ cat use_cplusplus.coffee
PEG = require("./cplusplus.js")
input = "int (a); int c = int (b);"
r = PEG.parse(input)
console.log("input = '#{input}'\noutput="+JSON.stringify r)
input = "int b = 4+2 ; "
r = PEG.parse(input)
console.log("input = '#{input}'\noutput="+JSON.stringify r)
input = "bum = caf = 4-1;\n"
r = PEG.parse(input)
console.log("input = '#{input}'\noutput="+JSON.stringify r)
input = "b2 = int(4);"
r = PEG.parse(input)
console.log("input = '#{input}'\noutput="+JSON.stringify r)
input = "int(4);"
r = PEG.parse(input)
console.log("input = '#{input}'\noutput="+JSON.stringify r)
Y este un ejemplo de salida:
$ pegcoffee cplusplus.pegjs
$ coffee use_cplusplus.coffee
input = 'int (a); int c = int (b);'
output=["decl","decl"]
input = 'int b = 4+2 ; '
output=["decl"]
input = 'bum = caf = 4-1;
'
output=["stmt"]
input = 'b2 = int(4);'
output=["stmt"]
input = 'int(4);'
output=["stmt"]
See the PEG.js grammar here
[~/srcPLgrado/pegjs/examples(master)]$ pwd -P
/Users/casiano/local/src/javascript/PLgrado/pegjs/examples
[~/.../pegjs/examples(master)]$ git remote -v
2020 https://github.com/pegjs/pegjs (fetch)
2020 https://github.com/pegjs/pegjs (push)
dmajda https://github.com/dmajda/pegjs.git (fetch)
dmajda https://github.com/dmajda/pegjs.git (push)
origin git@github.com:crguezl/pegjs.git (fetch)
origin git@github.com:crguezl/pegjs.git (push)