A modern, high-performance lexical analysis and parsing system with comprehensive PCRE2 support and CognitiveGraph integration
DevelApp.StepParser is a CognitiveGraph-integrated parser engine that processes tokens from DevelApp.StepLexer to build semantic representations of source code and patterns. It implements GLR-style parsing with context-sensitive grammar support, which suits Domain-Specific Language (DSL) development and advanced code analysis.
The main parser engine that coordinates all parsing operations.
public class StepParserEngine
{
    // Grammar management
    public void LoadGrammar(string grammarFile);
    public void LoadGrammarFromContent(string grammarContent, string fileName = "inline");
    public GrammarDefinition? CurrentGrammar { get; }

    // Parsing operations
    public StepParsingResult Parse(string input, string fileName = "");
    public CognitiveGraph.CognitiveGraph ParseAndMerge(
        CognitiveGraph.CognitiveGraph existingGraph,
        string input,
        string fileName = "");
    public StepParsingResult ParseMultipleFiles(Dictionary<string, string> files);

    // Context management
    public ParseContext Context { get; }
}
Key Methods:
- `LoadGrammar()`: Load a grammar definition from a file
- `LoadGrammarFromContent()`: Load a grammar from string content
- `Parse()`: Parse source code or a token stream

Represents a complete grammar definition loaded from grammar files.
public class GrammarDefinition
{
    public string Name { get; set; }
    public string TokenSplitter { get; set; }
    public List<TokenRule> TokenRules { get; set; }
    public List<ProductionRule> ProductionRules { get; set; }
    public Dictionary<string, int> Precedence { get; set; }
    public Dictionary<string, string> Associativity { get; set; }
    public List<string> Contexts { get; set; }
    public bool IsInheritable { get; set; }
    public string FormatType { get; set; }
}
Properties:
- `TokenRules`: Lexical analysis rules (regex patterns, literals)
- `ProductionRules`: Syntax analysis rules (grammar productions)
- `Precedence`: Operator precedence values
- `Associativity`: Left/right associativity rules
- `Contexts`: Available parsing contexts

Defines lexical analysis rules for token recognition.
public class TokenRule
{
    public string Name { get; set; }
    public string Pattern { get; set; }
    public TokenRuleType Type { get; set; }
    public string? Context { get; set; }
    public Dictionary<string, object> Actions { get; set; }
    public int Priority { get; set; }
}
Token Rule Types:
- `RegexPattern`: Regular expression patterns (`/[0-9]+/`)
- `LiteralString`: Exact string matches (`'+'`, `"class"`)
- `ContextSensitive`: Rules that apply only in specific contexts

Defines syntax analysis rules for parsing.
public class ProductionRule
{
    public string LeftHandSide { get; set; }
    public List<List<string>> RightHandSides { get; set; }
    public string? Context { get; set; }
    public Dictionary<string, object> SemanticActions { get; set; }
    public int Precedence { get; set; }
}
Features:
- Multiple alternatives per rule (`|` in grammar notation)

Contains the complete result of a parsing operation.
public class StepParsingResult
{
    public bool Success { get; set; }
    public List<StepToken> Tokens { get; set; }
    public ParseTree? ParseTree { get; set; }
    public CognitiveGraph.CognitiveGraph? CognitiveGraph { get; set; }
    public List<string> Errors { get; set; }
    public List<string> Warnings { get; set; }
    public Dictionary<string, object> Metadata { get; set; }
}
Result Components:
- `Tokens`: Generated token stream from lexical analysis
- `ParseTree`: Hierarchical parse tree structure
- `CognitiveGraph`: Semantic representation for analysis
- `Errors`/`Warnings`: Diagnostic information

Manages hierarchical parsing contexts for complex language features.
public interface IContextStack
{
    void Push(string context, string? name = null);
    void Pop();
    string Current();
    bool InScope(string context);
    int Depth();
    string[] GetPath();
    bool Contains(string context);
}
Context Management:
Provides scope-aware symbol table management.
public interface IScopeAwareSymbolTable
{
    void Declare(string name, string type, string scope, ICodeLocation location);
    SymbolEntry? Lookup(string name, string scope);
    bool Exists(string name, string scope);
    List<SymbolEntry> GetSymbolsInScope(string scope);
    void EnterScope(string scopeName);
    void ExitScope();
}
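To illustrate the scope-qualified operations in the interface above, here is a minimal sketch of a table keyed by `(scope, name)`. It is not the library's implementation: `FlatSymbolTable`, `SymbolInfo`, and `SourcePosition` are hypothetical stand-ins (the real `SymbolEntry` and `ICodeLocation` types are not shown here), lookups are assumed to check only the named scope, and `EnterScope`/`ExitScope` are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for ICodeLocation and SymbolEntry.
public sealed record SourcePosition(string File, int Line, int Column);
public sealed record SymbolInfo(string Name, string Type, string Scope, SourcePosition Location);

public sealed class FlatSymbolTable
{
    // One entry per (scope, name) pair; the same name may exist in different scopes.
    private readonly Dictionary<(string Scope, string Name), SymbolInfo> _symbols = new();

    public void Declare(string name, string type, string scope, SourcePosition location)
    {
        // Assumption: redeclaring a name within the same scope is an error.
        if (!_symbols.TryAdd((scope, name), new SymbolInfo(name, type, scope, location)))
            throw new InvalidOperationException($"'{name}' is already declared in scope '{scope}'");
    }

    // Assumption: only the named scope is searched, with no fallback to enclosing scopes.
    public SymbolInfo? Lookup(string name, string scope) =>
        _symbols.TryGetValue((scope, name), out var entry) ? entry : null;

    public bool Exists(string name, string scope) => _symbols.ContainsKey((scope, name));

    public List<SymbolInfo> GetSymbolsInScope(string scope) =>
        _symbols.Values.Where(symbol => symbol.Scope == scope).ToList();
}
```

Keying on the pair keeps `Exists` and `Lookup` constant-time. A table that resolves names through enclosing scopes would additionally need the scope nesting that `EnterScope` and `ExitScope` imply.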
Symbol Management:
The StepParser uses a declarative grammar format for defining DSLs:
Grammar: MyLanguage
TokenSplitter: Space
FormatType: EBNF
# Token Rules (Lexical Analysis)
<NUMBER> ::= /[0-9]+(\.[0-9]+)?/
<IDENTIFIER> ::= /[a-zA-Z][a-zA-Z0-9]*/
<STRING> ::= /"([^"\\]|\\.)*"/
<PLUS> ::= '+'
<MINUS> ::= '-'
<TIMES> ::= '*'
<DIVIDE> ::= '/'
<ASSIGN> ::= '='
<LPAREN> ::= '('
<RPAREN> ::= ')'
<WS> ::= /[ \t\r\n]+/ => { skip }
# Production Rules (Syntax Analysis)
<program> ::= <statement_list>
<statement_list> ::= <statement>
    | <statement_list> <statement>
<statement> ::= <assignment>
    | <expression>
<assignment> ::= <IDENTIFIER> <ASSIGN> <expression>
<expression> ::= <term>
    | <expression> <PLUS> <term>
    | <expression> <MINUS> <term>
<term> ::= <factor>
    | <term> <TIMES> <factor>
    | <term> <DIVIDE> <factor>
<factor> ::= <NUMBER>
    | <IDENTIFIER>
    | <LPAREN> <expression> <RPAREN>
# Precedence and Associativity
%precedence <TIMES> <DIVIDE> 10
%precedence <PLUS> <MINUS> 5
%left <PLUS> <MINUS> <TIMES> <DIVIDE>
%right <ASSIGN>
Grammar: MyLanguage # Grammar name (required)
TokenSplitter: Space # Token splitting strategy
FormatType: EBNF # Grammar format type
Inheritable: true # Allow inheritance
# Regular expression patterns
<NUMBER> ::= /[0-9]+/
# Literal strings
<PLUS> ::= '+'
<CLASS> ::= "class"
# Context-sensitive rules
<STRING_CONTENT[string]> ::= /[^"]*/
# Actions
<WS> ::= /[ \t\r\n]+/ => { skip }
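
To make the token rules above concrete, the following sketch applies a regex rule, two literal rules, and a `skip` action by hand using .NET's `System.Text.RegularExpressions`. It is an illustration only, not how StepLexer works internally: `MiniTokenizer`, `Rule`, and the first-match-wins rule ordering are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical stand-in for a token rule: a name, an anchored regex, and a skip flag.
public sealed record Rule(string Name, Regex Pattern, bool Skip = false);

public static class MiniTokenizer
{
    // \G anchors every match at the position passed to Regex.Match.
    private static Rule Pattern(string name, string regex, bool skip = false) =>
        new(name, new Regex(@"\G(?:" + regex + ")"), skip);

    // Literal rules are escaped so that characters such as '+' match themselves.
    private static Rule Literal(string name, string text) =>
        new(name, new Regex(@"\G" + Regex.Escape(text)));

    private static readonly Rule[] Rules =
    {
        Pattern("WS", @"[ \t\r\n]+", skip: true), // <WS> ... => { skip }
        Pattern("NUMBER", "[0-9]+"),              // <NUMBER> ::= /[0-9]+/
        Literal("PLUS", "+"),                     // <PLUS> ::= '+'
        Literal("CLASS", "class"),                // <CLASS> ::= "class"
    };

    public static List<(string Name, string Text)> Tokenize(string input)
    {
        var tokens = new List<(string Name, string Text)>();
        var position = 0;
        while (position < input.Length)
        {
            var matched = false;
            foreach (var rule in Rules)
            {
                var match = rule.Pattern.Match(input, position);
                if (!match.Success || match.Length == 0) continue;

                // Assumption: the first rule that matches wins.
                if (!rule.Skip) tokens.Add((rule.Name, match.Value));
                position += match.Length;
                matched = true;
                break;
            }
            if (!matched)
                throw new FormatException($"Unexpected character '{input[position]}' at offset {position}");
        }
        return tokens;
    }
}
```

Calling `MiniTokenizer.Tokenize("class 42 + 7")` yields four tokens (`CLASS`, `NUMBER`, `PLUS`, `NUMBER`), and the whitespace is consumed without producing a token.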
# Basic productions
<expr> ::= <term>
# Alternatives with |
<statement> ::= <assignment>
    | <expression>
    | <block>
# Semantic actions
<assignment> ::= <IDENTIFIER> <ASSIGN> <expression> => {
    symbol_table.declare($1.value, $3.type);
    cognitive_graph.add_assignment_node($1, $3);
}
The StepParser supports different parsing rules based on context:
# Different rules in different contexts
<expression> ::= <term>
<expression[function]> ::= <call> | <term>
<expression[class]> ::= <member_access> | <term>
# Context transitions
<string_start> ::= '"' => { enter_context: string }
<string_end[string]> ::= '"' => { exit_context }
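
The context rules above can be pictured as a stack plus a rule lookup. The sketch below is not the parser's real machinery: `ContextDemo` and `ResolveRule` are hypothetical names, and falling back to the unqualified rule when the current context has no rule of its own is an assumption.

```csharp
using System;
using System.Collections.Generic;

public static class ContextDemo
{
    // Rules keyed by (nonterminal, context); an empty context marks the default rule.
    private static readonly Dictionary<(string Symbol, string Context), string> Rules = new()
    {
        [("expression", "")] = "<term>",
        [("expression", "function")] = "<call> | <term>",
        [("expression", "class")] = "<member_access> | <term>",
    };

    // Assumption: a rule for the innermost context is preferred,
    // and the default rule is used when none exists.
    public static string ResolveRule(string symbol, Stack<string> contexts)
    {
        var current = contexts.Count > 0 ? contexts.Peek() : "";
        return Rules.TryGetValue((symbol, current), out var rightHandSide)
            ? rightHandSide
            : Rules[(symbol, "")];
    }
}
```

With an empty stack, `ResolveRule("expression", contexts)` returns `<term>`. After `contexts.Push("function")`, which mirrors `enter_context`, it returns `<call> | <term>`. Calling `contexts.Pop()`, which mirrors `exit_context`, restores the default.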
GLR-style parsing handles ambiguous grammars:
# Grammar with ambiguous expressions
# 1 + 2 * 3 can be parsed as:
#   - (1 + 2) * 3 (if + has higher precedence)
#   - 1 + (2 * 3) (if * has higher precedence)

# Precedence resolves the ambiguity: * (10) outranks + (5),
# so the parse 1 + (2 * 3) is selected
%precedence <TIMES> 10
%precedence <PLUS> 5
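
To show how precedence values pick one grouping, here is a small sketch that uses precedence climbing, a simpler technique than GLR and not the StepParser algorithm. It reuses the values 10 and 5 from the declarations above (extended to `/` and `-` as in the full grammar) and assumes left associativity; `PrecedenceDemo` is a hypothetical name, and the input is assumed to be a well-formed token list.

```csharp
using System;
using System.Collections.Generic;

public static class PrecedenceDemo
{
    // Higher value binds tighter.
    private static readonly Dictionary<string, int> Precedence = new()
    {
        ["*"] = 10, ["/"] = 10, ["+"] = 5, ["-"] = 5,
    };

    // Returns a fully parenthesized string so the chosen grouping is visible.
    public static string Parse(IReadOnlyList<string> tokens)
    {
        var position = 0;
        var result = ParseExpression(tokens, ref position, 0);
        if (position != tokens.Count)
            throw new FormatException($"Unexpected token '{tokens[position]}'");
        return result;
    }

    private static string ParseExpression(IReadOnlyList<string> tokens, ref int position, int minPrecedence)
    {
        var left = tokens[position++]; // operand: a number or identifier
        while (position < tokens.Count
               && Precedence.TryGetValue(tokens[position], out var precedence)
               && precedence >= minPrecedence)
        {
            var op = tokens[position++];
            // Left associativity: the right operand may only absorb tighter-binding operators.
            var right = ParseExpression(tokens, ref position, precedence + 1);
            left = $"({left} {op} {right})";
        }
        return left;
    }
}
```

`PrecedenceDemo.Parse(new[] { "1", "+", "2", "*", "3" })` returns `(1 + (2 * 3))`, and `PrecedenceDemo.Parse(new[] { "8", "-", "3", "-", "2" })` returns `((8 - 3) - 2)`, which shows the effect of left associativity.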
Automatic semantic analysis during parsing:
// Parse source code (engine is a StepParserEngine with a grammar already loaded)
var result = engine.Parse(sourceCode);
var cognitiveGraph = result.CognitiveGraph;

// Query semantic information (CognitiveGraph is nullable, so check it first)
if (cognitiveGraph != null)
{
    var variables = cognitiveGraph.Query("variable_declaration");
    var functions = cognitiveGraph.Query("function_definition");
    var dependencies = cognitiveGraph.Query("dependency_relationship");
}
Built-in support for code transformations:
public class RefactoringOperation
{
    public string Name { get; set; }
    public string[] ApplicableContexts { get; set; }
    public Func<ParseContext, bool>? Preconditions { get; set; }
    public Func<ICodeLocation, ParseContext, RefactoringResult>? Execute { get; set; }
}

// Example: Rename variable refactoring
var renameOp = new RefactoringOperation
{
    Name = "Rename Variable",
    ApplicableContexts = new[] { "variable_declaration", "variable_reference" },
    Preconditions = context => context.SymbolTable.Exists(context.SelectedSymbol),
    Execute = (location, context) => RenameVariable(location, context)
};
using DevelApp.StepParser;
using DevelApp.StepLexer;

// Create parser engine
var engine = new StepParserEngine();

// Load grammar from file
engine.LoadGrammar("my_language.grammar");

// Parse source code
var sourceCode = @"
x = 10 + 20;
y = x * 2;
";
var result = engine.Parse(sourceCode);

if (result.Success)
{
    Console.WriteLine("Parse successful!");
    Console.WriteLine($"Tokens: {result.Tokens.Count}");

    // Access semantic information
    var cognitiveGraph = result.CognitiveGraph;
    if (cognitiveGraph != null)
    {
        Console.WriteLine("CognitiveGraph constructed successfully!");
    }
}
else
{
    Console.WriteLine("Parse failed:");
    foreach (var error in result.Errors)
    {
        Console.WriteLine($"  {error}");
    }
}
// Grammar with context-sensitive rules
var grammar = @"
Grammar: ContextExample
TokenSplitter: Space
<IDENTIFIER> ::= /[a-zA-Z][a-zA-Z0-9]*/
<DOT> ::= '.'
<COMMA> ::= ','
<LPAREN> ::= '('
<RPAREN> ::= ')'
# Different behavior in different contexts
<expression> ::= <simple_expr>
<expression[method_call]> ::= <IDENTIFIER> <LPAREN> <args> <RPAREN>
<expression[member_access]> ::= <IDENTIFIER> <DOT> <IDENTIFIER>
<simple_expr> ::= <IDENTIFIER>
# <args> and <COMMA> are required by the method_call rule and the sample input
<args> ::= <IDENTIFIER>
    | <args> <COMMA> <IDENTIFIER>
";
engine.LoadGrammarFromContent(grammar);

// Parse with context awareness
var result = engine.Parse("myMethod(arg1, arg2)");

// Access the parse context
var context = engine.Context;
Console.WriteLine($"Current context: {context.ContextStack.Current()}");
### Symbol Table Integration
```csharp
// Access symbol table from parse result
var symbolTable = result.CognitiveGraph?.SymbolTable;

// Declare symbols during parsing
// (codeLocation is an ICodeLocation for the declaration site, obtained elsewhere)
symbolTable?.Declare("myVariable", "int", "global", codeLocation);

// Look up symbols
var symbol = symbolTable?.Lookup("myVariable", "global");
if (symbol != null)
{
    Console.WriteLine($"Symbol: {symbol.Name}, Type: {symbol.Type}");
}
```
// Grammar fragment with error recovery
// (add it to a complete grammar and load that grammar before parsing)
var grammar = @"
<statement> ::= <assignment>
    | <expression>
    | error ';' => { report_error(""Invalid statement""); }
";

// Parse code with syntax errors
var result = engine.Parse("x = ; y = 10;"); // Invalid assignment

// Check for errors and warnings
if (!result.Success)
{
    foreach (var error in result.Errors)
    {
        Console.WriteLine($"Error: {error}");
    }
    foreach (var warning in result.Warnings)
    {
        Console.WriteLine($"Warning: {warning}");
    }
}
The StepParser provides comprehensive error handling:
try
{
    var result = engine.Parse(sourceCode);
}
catch (ENFA_GrammarBuild_Exception ex)
{
    Console.WriteLine($"Grammar build error: {ex.Message}");
    Console.WriteLine($"Location: {ex.Location}");
}
catch (ENFA_Exception ex)
{
    Console.WriteLine($"General parser error: {ex.Message}");
}
The StepParser works seamlessly with DevelApp.StepLexer:
// StepLexer tokenizes input
var lexer = new StepLexer();
var tokens = lexer.TokenizeSource(sourceCode);
// StepParser builds semantic representation
var parseResult = stepParser.Parse(tokens);
// Access both lexical and semantic information
var tokenStream = parseResult.Tokens;
var semanticGraph = parseResult.CognitiveGraph;
Comprehensive test coverage includes:
For consistency with StepLexer’s forward-only architecture:
These limitations maintain parsing performance and predictability.