A modern, high-performance lexical analysis and parsing system with comprehensive PCRE2 support and CognitiveGraph integration
This guide explains how to create grammar files for the DevelApp.StepLexer and DevelApp.StepParser system, enabling you to define Domain-Specific Languages (DSLs) with comprehensive parsing and CognitiveGraph integration.
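The grammar system supports a two-phase parsing approach: token rules drive lexical analysis, and production rules drive syntax analysis. A grammar file has this basic structure: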
Grammar: YourDSLName
TokenSplitter: Space
# Token Rules (Lexical Analysis)
<TOKEN_NAME> ::= pattern
<ANOTHER_TOKEN> ::= pattern
# Production Rules (Syntax Analysis)
<rule_name> ::= <TOKEN1> <TOKEN2>
| <alternative_pattern>
Grammar: MyDSL
TokenSplitter: Space
FormatType: EBNF
Inheritable: true
Header Options:
- Grammar: name - Required. Name of your DSL
- TokenSplitter: strategy - Optional. Default: "Space". How to split tokens
- FormatType: type - Optional. Grammar format (EBNF, ANTLR, etc.)
- Inheritable: bool - Optional. Allow other grammars to inherit from this one

Token rules define how to recognize basic lexical elements. They use the pattern <TOKEN_NAME> ::= pattern.
1. Regular Expressions (recommended)
<NUMBER> ::= /[0-9]+/
<IDENTIFIER> ::= /[a-zA-Z][a-zA-Z0-9]*/
<STRING> ::= /"[^"]*"/
<WHITESPACE> ::= /[ \t\r\n]+/
2. Literal Strings
<PLUS> ::= '+'
<EQUALS> ::= '='
<IF> ::= 'if'
<WHILE> ::= 'while'
3. Double-Quoted Strings
<KEYWORD_CLASS> ::= "class"
<SEMICOLON> ::= ";"
# Programming Language Tokens
<NUMBER> ::= /[0-9]+(\.[0-9]+)?/
<IDENTIFIER> ::= /[a-zA-Z_][a-zA-Z0-9_]*/
<STRING_LITERAL> ::= /"([^"\\]|\\.)*"/
<COMMENT> ::= /\/\/[^\r\n]*/
# Operators
<PLUS> ::= '+'
<MINUS> ::= '-'
<MULTIPLY> ::= '*'
<DIVIDE> ::= '/'
<ASSIGN> ::= '='
# Keywords (higher priority than IDENTIFIER)
<IF> ::= 'if'
<ELSE> ::= 'else'
<WHILE> ::= 'while'
<FUNCTION> ::= 'function'
# Delimiters
<LPAREN> ::= '('
<RPAREN> ::= ')'
<LBRACE> ::= '{'
<RBRACE> ::= '}'
# Whitespace (typically skipped)
<WS> ::= /[ \t\r\n]+/ => { skip }
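To make the token rules above concrete, here is a minimal Python sketch of how such rules could drive a lexer. It is not the StepLexer implementation. It assumes longest-match selection with ties going to the rule declared first, which is one common way to give keywords priority over IDENTIFIER; the guide does not say how StepLexer itself resolves priority. The names `TOKEN_RULES` and `tokenize` are hypothetical.

```python
import re

# (name, regex, skip) in declaration order. Keywords are listed before
# IDENTIFIER so they win ties under the tie-break rule assumed below.
TOKEN_RULES = [
    ("IF", r"if", False),
    ("WHILE", r"while", False),
    ("NUMBER", r"[0-9]+(\.[0-9]+)?", False),
    ("IDENTIFIER", r"[a-zA-Z_][a-zA-Z0-9_]*", False),
    ("ASSIGN", r"=", False),
    ("PLUS", r"\+", False),
    ("WS", r"[ \t\r\n]+", True),  # mirrors `=> { skip }`
]
COMPILED = [(name, re.compile(rx), skip) for name, rx, skip in TOKEN_RULES]


def tokenize(text):
    """Return (name, lexeme) pairs, dropping tokens marked as skip."""
    tokens, pos = [], 0
    while pos < len(text):
        best = None
        for name, rx, skip in COMPILED:
            m = rx.match(text, pos)
            # A strictly longer match wins; on a tie the earlier rule is kept.
            if m and m.end() > pos and (best is None or m.end() > best[1]):
                best = (name, m.end(), skip)
        if best is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        name, end, skip = best
        if not skip:
            tokens.append((name, text[pos:end]))
        pos = end
    return tokens
```

Under these assumptions `if` lexes as IF, while `iffy` lexes as a single IDENTIFIER because the longer match wins.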
Production rules define the grammar structure. They reference token rules and other production rules.
<program> ::= <statement_list>
<statement_list> ::= <statement>
| <statement_list> <statement>
<statement> ::= <assignment>
| <if_statement>
| <while_statement>
<assignment> ::= <IDENTIFIER> <ASSIGN> <expression>
<expression> ::= <term>
| <expression> <PLUS> <term>
| <expression> <MINUS> <term>
<term> ::= <factor>
| <term> <MULTIPLY> <factor>
| <term> <DIVIDE> <factor>
<factor> ::= <NUMBER>
| <IDENTIFIER>
| <LPAREN> <expression> <RPAREN>
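The expression, term, and factor rules above are left-recursive, which yields left-associative trees with multiplication binding tighter than addition. The Python sketch below shows the same structure as a hand-written recursive-descent parser, where each left-recursive rule becomes a loop. It illustrates the grammar's shape only; how StepParser processes left recursion internally is not covered in this guide, and all function names here are hypothetical.

```python
import re


def lex(text):
    # Minimal lexer for the sketch: numbers, identifiers, operators, parentheses.
    return re.findall(r"[0-9]+|[a-zA-Z][a-zA-Z0-9]*|[-+*/()]", text)


def parse_expression(tokens, pos=0):
    # <expression> ::= <term> | <expression> <PLUS> <term> | <expression> <MINUS> <term>
    node, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        right, pos = parse_term(tokens, pos + 1)
        node = (op, node, right)  # fold to the left
    return node, pos


def parse_term(tokens, pos):
    # <term> ::= <factor> | <term> <MULTIPLY> <factor> | <term> <DIVIDE> <factor>
    node, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("*", "/"):
        op = tokens[pos]
        right, pos = parse_factor(tokens, pos + 1)
        node = (op, node, right)
    return node, pos


def parse_factor(tokens, pos):
    # <factor> ::= <NUMBER> | <IDENTIFIER> | <LPAREN> <expression> <RPAREN>
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok == "(":
        node, pos = parse_expression(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return node, pos + 1
    if tok in ("+", "-", "*", "/", ")"):
        raise SyntaxError(f"unexpected token {tok!r}")
    return tok, pos + 1


def parse(text):
    tokens = lex(text)
    node, pos = parse_expression(tokens)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return node
```

For example, `parse("1 - 2 - 3")` groups as `(1 - 2) - 3`, and `parse("1 + 2 * 3")` nests the multiplication under the addition.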
For complex rules, you can use continuation lines with |:
<if_statement> ::= <IF> <LPAREN> <expression> <RPAREN> <statement>
| <IF> <LPAREN> <expression> <RPAREN> <statement> <ELSE> <statement>
<function_call> ::= <IDENTIFIER> <LPAREN> <RPAREN>
| <IDENTIFIER> <LPAREN> <argument_list> <RPAREN>
The system supports context-sensitive parsing for advanced scenarios:
# Different rules in different contexts
<string_content[string]> ::= /[^"]*/
<string_start> ::= '"' => { enter_context: string }
<string_end[string]> ::= '"' => { exit_context }
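One way to picture the context rules above is a lexer that keeps a stack of active contexts. The Python sketch below is an illustration under stated assumptions, not the StepLexer algorithm: rules tagged with a context apply only while that context is on top of the stack, untagged rules apply only when the stack is empty, and the string content pattern uses `+` rather than `*` so the loop never emits an empty token. The IDENTIFIER and WS rules and the `RULES` and `tokenize` names are hypothetical additions for the demo.

```python
import re

# (name, regex, required context or None, action)
RULES = [
    ("string_end", re.compile(r'"'), "string", "exit"),
    ("string_content", re.compile(r'[^"]+'), "string", None),
    ("string_start", re.compile(r'"'), None, "enter:string"),
    ("IDENTIFIER", re.compile(r"[a-zA-Z]+"), None, None),
    ("WS", re.compile(r"[ \t]+"), None, "skip"),
]


def tokenize(text):
    tokens, stack, pos = [], [], 0
    while pos < len(text):
        current = stack[-1] if stack else None
        for name, rx, ctx, action in RULES:
            if ctx != current:
                continue  # rule belongs to a different context
            m = rx.match(text, pos)
            if not m or m.end() == pos:
                continue
            if action == "exit":
                stack.pop()  # mirrors `exit_context`
            elif action and action.startswith("enter:"):
                stack.append(action.split(":", 1)[1])  # mirrors `enter_context`
            if action != "skip":
                tokens.append((name, m.group()))
            pos = m.end()
            break
        else:
            raise SyntaxError(f"no rule matches at position {pos}")
    return tokens
```

With these rules, the spaces inside a quoted string stay part of `string_content`, while spaces outside the string are skipped.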
Control operator precedence and associativity:
# Precedence declarations (higher numbers = higher precedence)
%precedence <MULTIPLY> 10
%precedence <DIVIDE> 10
%precedence <PLUS> 5
%precedence <MINUS> 5
# Associativity
%left <PLUS> <MINUS>
%left <MULTIPLY> <DIVIDE>
%right <ASSIGN>
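The effect of these declarations can be sketched with precedence climbing, a standard technique for table-driven operator parsing. The Python below is not StepParser's algorithm. It reuses the precedence values declared above; the precedence of 1 for `=` is an assumption, since the declarations give ASSIGN an associativity but no level. The names `PRECEDENCE`, `RIGHT_ASSOC`, and `tree` are hypothetical.

```python
import re

PRECEDENCE = {"*": 10, "/": 10, "+": 5, "-": 5, "=": 1}  # "=": 1 is assumed
RIGHT_ASSOC = {"="}  # mirrors `%right <ASSIGN>`; all other operators are %left


def lex(text):
    return re.findall(r"[0-9]+|[a-zA-Z][a-zA-Z0-9]*|[-+*/=()]", text)


def parse(tokens, pos=0, min_prec=0):
    left, pos = parse_primary(tokens, pos)
    while pos < len(tokens) and PRECEDENCE.get(tokens[pos], -1) >= min_prec:
        op = tokens[pos]
        prec = PRECEDENCE[op]
        # Left-associative: the right operand may only contain tighter operators.
        # Right-associative: it may also contain operators of the same level.
        next_min = prec if op in RIGHT_ASSOC else prec + 1
        right, pos = parse(tokens, pos + 1, next_min)
        left = (op, left, right)
    return left, pos


def parse_primary(tokens, pos):
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok == "(":
        node, pos = parse(tokens, pos + 1, 0)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return node, pos + 1
    if tok in PRECEDENCE or tok == ")":
        raise SyntaxError(f"unexpected token {tok!r}")
    return tok, pos + 1


def tree(text):
    tokens = lex(text)
    node, pos = parse(tokens)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return node
```

Here `tree("a = b = 1")` nests to the right, while `tree("1 - 2 - 3")` nests to the left, which is the difference between `%right` and `%left`.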
Grammar: Calculator
TokenSplitter: Space
FormatType: EBNF
# Token Rules
<NUMBER> ::= /[0-9]+(\.[0-9]+)?/
<IDENTIFIER> ::= /[a-zA-Z][a-zA-Z0-9]*/
<PLUS> ::= '+'
<MINUS> ::= '-'
<MULTIPLY> ::= '*'
<DIVIDE> ::= '/'
<LPAREN> ::= '('
<RPAREN> ::= ')'
<ASSIGN> ::= '='
<NEWLINE> ::= /\r?\n/
<WS> ::= /[ \t]+/ => { skip }
# Production Rules
<program> ::= <statement_list>
<statement_list> ::= <statement>
| <statement_list> <NEWLINE> <statement>
<statement> ::= <assignment>
| <expression>
<assignment> ::= <IDENTIFIER> <ASSIGN> <expression>
<expression> ::= <term>
| <expression> <PLUS> <term>
| <expression> <MINUS> <term>
<term> ::= <factor>
| <term> <MULTIPLY> <factor>
| <term> <DIVIDE> <factor>
<factor> ::= <NUMBER>
| <IDENTIFIER>
| <LPAREN> <expression> <RPAREN>
# Precedence (optional)
%precedence <MULTIPLY> <DIVIDE> 10
%precedence <PLUS> <MINUS> 5
%left <PLUS> <MINUS> <MULTIPLY> <DIVIDE>
%right <ASSIGN>
Add semantic actions to production rules:
<assignment> ::= <IDENTIFIER> <ASSIGN> <expression> => {
symbol_table.declare($1.value, $3.type);
cognitive_graph.add_assignment_node($1, $3);
}
Define error recovery rules:
<statement> ::= <assignment>
| <expression>
| error <NEWLINE> => { report_error("Invalid statement"); }
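The `error <NEWLINE>` alternative resembles panic-mode recovery: when a statement fails to parse, tokens are discarded up to the next NEWLINE and parsing resumes from there. The Python sketch below shows that idea in isolation. It is an assumption about the intent of the rule, not a description of StepParser's recovery mechanism, and `parse_statement` is a toy stand-in that accepts only IDENTIFIER ASSIGN NUMBER.

```python
def parse_statement(tokens):
    # Toy stand-in for <assignment>: IDENTIFIER ASSIGN NUMBER.
    kinds = [kind for kind, _ in tokens]
    if kinds != ["IDENTIFIER", "ASSIGN", "NUMBER"]:
        raise SyntaxError("Invalid statement")
    return ("assignment", tokens[0][1], tokens[2][1])


def parse_program(tokens):
    """Parse NEWLINE-separated statements, recording errors instead of stopping."""
    statements, errors, current, line = [], [], [], 1
    # A trailing NEWLINE sentinel flushes the final statement.
    for kind, text in tokens + [("NEWLINE", "\n")]:
        if kind != "NEWLINE":
            current.append((kind, text))
            continue
        if current:
            try:
                statements.append(parse_statement(current))
            except SyntaxError as exc:
                # Recovery: drop the bad statement and resume after NEWLINE.
                errors.append(f"line {line}: {exc}")
        current, line = [], line + 1
    return statements, errors
```

A bad statement on one line is reported and skipped, and the statements on the surrounding lines still parse.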
Create reusable grammar components:
Grammar: BaseLanguage
Inheritable: true
<IDENTIFIER> ::= /[a-zA-Z][a-zA-Z0-9]*/
<NUMBER> ::= /[0-9]+/
Grammar: ExtendedLanguage
Inherits: BaseLanguage
<FLOAT> ::= /[0-9]+\.[0-9]+/
<STRING> ::= /"[^"]*"/
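Inheritance between the two grammars above can be pictured as a rule-table merge. The Python sketch below assumes that a child grammar sees all parent rules, that a child rule with the same name overrides the parent's, and that inheriting from a grammar without `Inheritable: true` is an error. The guide does not spell out these semantics, so treat them as assumptions. The dictionary layout and the `resolve_rules` name are hypothetical.

```python
GRAMMARS = {
    "BaseLanguage": {
        "inheritable": True,
        "inherits": None,
        "rules": {
            "IDENTIFIER": r"[a-zA-Z][a-zA-Z0-9]*",
            "NUMBER": r"[0-9]+",
        },
    },
    "ExtendedLanguage": {
        "inheritable": False,
        "inherits": "BaseLanguage",
        "rules": {
            "FLOAT": r"[0-9]+\.[0-9]+",
            "STRING": r'"[^"]*"',
        },
    },
}


def resolve_rules(name, grammars):
    """Return the effective rule table for a grammar, parent rules first."""
    grammar = grammars[name]
    rules = {}
    parent = grammar["inherits"]
    if parent is not None:
        if not grammars[parent]["inheritable"]:
            raise ValueError(f"{parent} is not marked Inheritable")
        rules.update(resolve_rules(parent, grammars))
    rules.update(grammar["rules"])  # child definitions override the parent's
    return rules
```

Calling `resolve_rules("ExtendedLanguage", GRAMMARS)` returns the two base rules followed by FLOAT and STRING.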
Use the test framework to validate your grammar:
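using Xunit; // [Fact] is an xUnit attribute; also import the namespace that defines StepParserEngine

[Fact]
public void MyDSL_Should_ParseCorrectly()
{
    // Inline grammar: two token rules and one production rule.
    var grammar = @"
Grammar: TestDSL
<NUMBER> ::= /[0-9]+/
<PLUS> ::= '+'
<expr> ::= <NUMBER> | <expr> <PLUS> <expr>
";
    var engine = new StepParserEngine();
    engine.LoadGrammarFromContent(grammar);

    // The grammar loaded and exposes the expected rule counts.
    Assert.NotNull(engine.CurrentGrammar);
    Assert.True(engine.CurrentGrammar.TokenRules.Count >= 2);
    Assert.True(engine.CurrentGrammar.ProductionRules.Count >= 1);
}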
The parser automatically integrates with CognitiveGraph for semantic analysis:
// Parse source code into CognitiveGraph
var result = stepParser.Parse(sourceCode);
var cognitiveGraph = result.CognitiveGraph;
// Query the semantic structure
var assignmentNodes = cognitiveGraph.Query("assignment");
var variableDeclarations = cognitiveGraph.Query("variable_declaration");
- Use => { skip } for whitespace tokens
- Write left-associative operators as left-recursive rules (<expr> ::= <expr> <OP> <term>)

<expression> ::= <logical_or>
<logical_or> ::= <logical_and> | <logical_or> <OR> <logical_and>
<logical_and> ::= <equality> | <logical_and> <AND> <equality>
<equality> ::= <comparison> | <equality> <EQ> <comparison>
<comparison> ::= <addition> | <comparison> <LT> <addition>
<addition> ::= <multiplication> | <addition> <PLUS> <multiplication>
<multiplication> ::= <unary> | <multiplication> <MULT> <unary>
<unary> ::= <primary> | <MINUS> <unary> | <NOT> <unary>
<primary> ::= <NUMBER> | <IDENTIFIER> | <LPAREN> <expression> <RPAREN>
<statement_list> ::= <statement>
| <statement_list> <statement>
<block> ::= <LBRACE> <statement_list> <RBRACE>
| <LBRACE> <RBRACE>
<function_declaration> ::= <FUNCTION> <IDENTIFIER> <LPAREN> <parameter_list> <RPAREN> <block>
| <FUNCTION> <IDENTIFIER> <LPAREN> <RPAREN> <block>
<parameter_list> ::= <parameter>
| <parameter_list> <COMMA> <parameter>
This grammar system provides a powerful foundation for creating sophisticated DSLs with full semantic analysis capabilities through CognitiveGraph integration.