A modern, high-performance lexical analysis and parsing system with comprehensive PCRE2 support and CognitiveGraph integration. The system consists of DevelApp.StepLexer for zero-copy tokenization and DevelApp.StepParser for semantic analysis and grammar-based parsing.
ENFAStepLexer-StepParser is a complete parsing solution designed for high-performance pattern recognition and semantic analysis. The system uses a two-phase approach: StepLexer handles zero-copy tokenization with PCRE2 support, while StepParser provides grammar-based parsing with CognitiveGraph integration for semantic analysis and code understanding.
- \A, \Z, \z, \G for precise boundary matching
- \x{FFFF} code points, \p{property} classes, \R newlines
- [:alpha:], [:digit:], [:space:], etc.
- Numbered (\1) and named (\k<name>) references

# Clone the repository
git clone https://github.com/DevelApp-ai/ENFAStepLexer-StepPerser.git
cd ENFAStepLexer-StepPerser
# Restore dependencies
dotnet restore
# Build all projects
dotnet build
# Run tests
dotnet test
# Run the demo
cd src/ENFAStepLexer.Demo
dotnet run
using DevelApp.StepLexer;
using System.Text;
// Create a pattern parser for regex
var parser = new PatternParser(ParserType.Regex);
// Parse a regex pattern with zero-copy
string pattern = @"\d{2,4}-\w+@[a-z]+\.com";
var utf8Pattern = Encoding.UTF8.GetBytes(pattern);
bool success = parser.ParsePattern(utf8Pattern, "email_pattern");
if (success)
{
    Console.WriteLine("Pattern compiled successfully!");
    var results = parser.GetResults();
    Console.WriteLine($"Phase 1 tokens: {results.Phase1TokenCount}");
    Console.WriteLine($"Ambiguous tokens: {results.AmbiguousTokenCount}");
}
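The "zero-copy" idea in the example above can be illustrated without the library. The following is a minimal sketch, not the StepLexer implementation: `ZeroCopySketch` and `SplitOnSpaces` are hypothetical names introduced here, and the code relies only on `ReadOnlySpan<byte>` from the .NET base class library. Tokens are recorded as offset/length pairs into the original UTF-8 buffer, so no intermediate strings or byte arrays are allocated during scanning.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration only; not part of DevelApp.StepLexer.
public static class ZeroCopySketch
{
    // Splits UTF-8 input on ASCII spaces and returns (offset, length) pairs.
    // No substrings or byte[] copies are created while scanning.
    public static List<(int Start, int Length)> SplitOnSpaces(ReadOnlySpan<byte> utf8)
    {
        var ranges = new List<(int Start, int Length)>();
        int start = 0;
        for (int i = 0; i <= utf8.Length; i++)
        {
            bool atEnd = i == utf8.Length;
            if (atEnd || utf8[i] == (byte)' ')
            {
                if (i > start) ranges.Add((start, i - start));
                start = i + 1;
            }
        }
        return ranges;
    }

    public static void Demo()
    {
        byte[] buffer = Encoding.UTF8.GetBytes("x + 42 - y");
        foreach (var (start, length) in SplitOnSpaces(buffer))
        {
            // The span is a view over the original buffer; a string is
            // materialized only here, for display.
            ReadOnlySpan<byte> view = buffer.AsSpan(start, length);
            Console.WriteLine(Encoding.UTF8.GetString(view));
        }
    }
}
```

Keeping tokens as ranges defers any decoding cost until a consumer actually needs the text, which is the general trade-off behind view-based tokenizers.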
using DevelApp.StepLexer;
using System.Text;
using System.IO;
// Create a pattern parser
var parser = new PatternParser(ParserType.Regex);
// Parse pattern from UTF-16 encoded bytes
var pattern = @"\w+@\w+\.\w+";
var utf16Bytes = Encoding.Unicode.GetBytes(pattern);
// Automatically converts UTF-16 to UTF-8 for processing
bool success = parser.ParsePattern(utf16Bytes, Encoding.Unicode, "email_pattern");
// Or use encoding by name - supports hundreds of encodings!
// On .NET Core / .NET 5+, code-page encodings such as shift_jis must be
// registered before Encoding.GetEncoding can resolve them.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
var shiftJISBytes = Encoding.GetEncoding("shift_jis").GetBytes(pattern);
bool sjisSuccess = parser.ParsePattern(shiftJISBytes, "shift_jis", "file_pattern");
// Or auto-detect encoding from BOM in a stream
using var stream = File.OpenRead("pattern.txt");
bool streamSuccess = parser.ParsePatternFromStreamWithAutoDetect(
    stream,
    "file_pattern"
);
if (success || sjisSuccess || streamSuccess)
{
    Console.WriteLine("Pattern parsed with encoding conversion!");
}
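For readers curious what the conversion step amounts to, here is a minimal sketch using only `System.Text` and `System.IO`. It is not the library's internal code: `EncodingSketch`, `ToUtf8`, and `ReadAsUtf8` are hypothetical names, and the BOM handling shown is simply what `StreamReader` does by default.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper for illustration only; not part of DevelApp.StepLexer.
public static class EncodingSketch
{
    // Transcodes bytes from a known source encoding to UTF-8.
    public static byte[] ToUtf8(byte[] input, Encoding source)
        => Encoding.Convert(source, Encoding.UTF8, input);

    // Reads a stream as text, letting StreamReader pick the encoding from a
    // byte order mark (falling back to UTF-8 when none is present), then
    // re-encodes the text as UTF-8 without a BOM.
    public static byte[] ReadAsUtf8(Stream stream)
    {
        using var reader = new StreamReader(
            stream, Encoding.UTF8, detectEncodingFromByteOrderMarks: true);
        return Encoding.UTF8.GetBytes(reader.ReadToEnd());
    }
}
```

Note that `Encoding.Unicode.GetBytes` does not emit a BOM, so BOM-based detection only helps for input that was written with one, such as files saved by most text editors.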
using DevelApp.StepParser;
// Create parser engine
var engine = new StepParserEngine();
// Load grammar for a simple expression language
var grammar = @"
Grammar: SimpleExpr
TokenSplitter: Space
<NUMBER> ::= /[0-9]+/
<IDENTIFIER> ::= /[a-zA-Z][a-zA-Z0-9]*/
<PLUS> ::= '+'
<MINUS> ::= '-'
<WS> ::= /[ \t\r\n]+/ => { skip }
<expr> ::= <expr> <PLUS> <expr>
| <expr> <MINUS> <expr>
| <NUMBER>
| <IDENTIFIER>
";
engine.LoadGrammarFromContent(grammar);
// Parse source code
var result = engine.Parse("x + 42 - y");
if (result.Success)
{
    Console.WriteLine("Parse successful!");
    var cognitiveGraph = result.CognitiveGraph;
    // Access semantic analysis results
}
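To make the token rules in the grammar above concrete, the sketch below applies the same five patterns with .NET's built-in `System.Text.RegularExpressions`. It is an independent illustration, not how StepParser tokenizes internally: `TokenRuleSketch` and `Tokenize` are hypothetical names, and the first-match-wins rule order is an assumption of this sketch. It uses the `\G` anchor so each rule must match exactly at the current position.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of DevelApp.StepParser.
public static class TokenRuleSketch
{
    // Same patterns as the grammar's token rules, anchored with \G so a
    // match must begin at the position passed to Regex.Match.
    private static readonly (string Name, Regex Pattern)[] Rules =
    {
        ("NUMBER", new Regex(@"\G[0-9]+")),
        ("IDENTIFIER", new Regex(@"\G[a-zA-Z][a-zA-Z0-9]*")),
        ("PLUS", new Regex(@"\G\+")),
        ("MINUS", new Regex(@"\G-")),
        ("WS", new Regex(@"\G[ \t\r\n]+")),
    };

    public static List<(string Name, string Text)> Tokenize(string input)
    {
        var tokens = new List<(string Name, string Text)>();
        int pos = 0;
        while (pos < input.Length)
        {
            bool matched = false;
            foreach (var (name, pattern) in Rules)
            {
                Match m = pattern.Match(input, pos);
                if (!m.Success) continue;
                // WS mirrors the grammar's "skip" action: consume, emit nothing.
                if (name != "WS") tokens.Add((name, m.Value));
                pos += m.Length;
                matched = true;
                break;
            }
            if (!matched)
                throw new FormatException($"Unexpected character at position {pos}");
        }
        return tokens;
    }
}
```

Calling `TokenRuleSketch.Tokenize("x + 42 - y")` yields an identifier, a plus, a number, a minus, and an identifier, which is the token stream the `<expr>` production then has to structure.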
CognitiveGraph 1.1.0 introduces a V2 schema optimized for massive cognitive graphs. StepParser supports both V1 (default) and V2 schemas:
using System;
using System.Collections.Generic;
using DevelApp.StepParser;
using CognitiveGraph.Schema;
// Create parser engine with V2 schema for massive graphs
var engine = new StepParserEngine(SchemaVersion.V2);
var grammar = @"
Grammar: LargeCodebase
<NUMBER> ::= /[0-9]+/
<IDENTIFIER> ::= /[a-zA-Z][a-zA-Z0-9]*/
<expression> ::= <NUMBER> | <IDENTIFIER>
";
engine.LoadGrammarFromContent(grammar);
// Parse multiple files and build a massive cognitive graph
var files = new Dictionary<string, string>
{
    { "module1.txt", "identifier1" },
    { "module2.txt", "identifier2" },
    // ... thousands more files
};
var result = engine.ParseMultipleFiles(files);
if (result.Success)
{
    Console.WriteLine($"Schema Version: {result.CognitiveGraph.SchemaVersion}"); // V2
    // Work with massive cognitive graph optimized for large codebases
}
// Default constructor maintains backward compatibility with V1
var engineV1 = new StepParserEngine(); // Uses SchemaVersion.V1
- PatternParser: High-level pattern processing controller
- StepLexer: Core tokenization engine with PCRE2 support
- ZeroCopyStringView: Memory-efficient string operations
- SplittableToken: Ambiguity-aware token representation
- StepParserEngine: Main parsing controller with CognitiveGraph integration
- GrammarDefinition: Complete grammar specification loader
- TokenRule/ProductionRule: Grammar component definitions
- IContextStack: Hierarchical context management
- IScopeAwareSymbolTable: Symbol resolution and scoping

The system uses a two-phase processing approach:
The following features are intentionally excluded due to architectural design decisions:
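- Atomic groups ((?>...))
- Recursive patterns and subroutine calls ((?R), (?&name))
- Possessive quantifiers (*+, ++)
- Conditional patterns ((?(condition)yes|no))
- Inline modifiers ((?i), (?m))

See docs/PCRE2-Support.md for the complete feature matrix and detailed explanations.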
ENFAStepLexer-StepPerser/
├── src/
│ ├── DevelApp.StepLexer/ # Zero-copy lexical analyzer
│ │ ├── StepLexer.cs # Core tokenization engine
│ │ ├── PatternParser.cs # High-level pattern controller
│ │ ├── ZeroCopyStringView.cs # Memory-efficient string operations
│ │ ├── SplittableToken.cs # Ambiguity-aware tokens
│ │ └── ...
│ ├── DevelApp.StepParser/ # Grammar-based semantic parser
│ │ ├── StepParserEngine.cs # Main parsing controller
│ │ ├── GrammarDefinition.cs # Grammar specification
│ │ ├── TokenRule.cs # Lexical analysis rules
│ │ ├── ProductionRule.cs # Syntax analysis rules
│ │ └── ...
│ ├── DevelApp.StepLexer.Tests/ # StepLexer unit tests
│ ├── DevelApp.StepParser.Tests/ # StepParser unit tests
│ └── ENFAStepLexer.Demo/ # Demo console application
├── docs/
│ ├── StepLexer.md # Complete StepLexer documentation
│ ├── StepParser.md # Complete StepParser documentation
│ ├── PCRE2-Support.md # Feature support matrix
│ └── Grammar_File_Creation_Guide.md # DSL development guide
└── README.md # This file
This project welcomes contributions in several areas:
The StepLexer-StepParser architecture provides:
- Inline modifiers ((?i), (?m), etc.) in StepLexer
- Literal text sequences (\Q...\E)
- Comments ((?#...))

This project is derived from @DevelApp/enfaparser but excludes the original license as requested. The enhancements and new code are provided for evaluation and development purposes.