Parser and AST

The parser transforms .cham schema files into a validated Abstract Syntax Tree (AST) using LALRPOP, an LR(1) parser generator for Rust.

Overview

The parsing pipeline consists of:

Lexical Analysis - Tokenize input using LALRPOP’s built-in lexer
Syntax Analysis - Build AST from tokens following grammar rules
Error Enhancement - Add source context and helpful suggestions
AST Construction - Create immutable schema representation

Architecture

.cham source
     ↓
LALRPOP Lexer (schema.lalrpop)
     ↓
Tokens: entity, Ident, "{", ":", etc.
     ↓
LALRPOP Parser (LR(1) grammar)
     ↓
AST (Schema → Entity → Field/Relation)
     ↓
Enhanced Error Reporting
     ↓
Validated Schema

Parser Entry Point

Location: chameleon-core/src/parser/mod.rs:13

pub fn parse_schema(input: &str) -> Result<Schema, ChameleonError> {
    match schema::SchemaParser::new().parse(input) {
        Ok(schema) => Ok(schema),
        Err(e) => {
            let err: ChameleonError = e.into();
            Err(enhance_parse_error(err, input))
        }
    }
}

Key features:

Zero-copy parsing where possible
Enhanced error messages with source snippets
Line/column precision for all syntax errors

LALRPOP Grammar

Location: chameleon-core/src/parser/schema.lalrpop

Entry Point

pub Schema: Schema = {
    <entities:Entity*> => {
        let mut schema = Schema::new();
        for entity in entities {
            schema.add_entity(entity);
        }
        schema
    }
};

Entity Syntax

Entity: Entity = {
    "entity" <name:Ident> "{" <items:EntityItem*> "}" => {
        let mut entity = Entity::new(name);
        for item in items {
            match item {
                EntityItem::Field(f) => entity.add_field(f),
                EntityItem::Relation(r) => entity.add_relation(r),
            }
        }
        entity
    }
};

Field Syntax

Field: Field = {
    <name:Ident> ":" <ft:FieldType> <mods:FieldModifier*> <backend:BackendAnnotation?> "," => {
        let mut field = Field {
            name,
            field_type: ft,
            nullable: false,
            unique: false,
            primary_key: false,
            default: None,
            backend,
        };
        
        for modifier in mods {
            match modifier {
                FieldModifier::Primary => field.primary_key = true,
                FieldModifier::Unique => field.unique = true,
                FieldModifier::Nullable => field.nullable = true,
                FieldModifier::Default(v) => field.default = Some(v),
            }
        }
        
        field
    }
};

Supported Types

FieldType: FieldType = {
    "uuid" => FieldType::UUID,
    "string" => FieldType::String,
    "int" => FieldType::Int,
    "decimal" => FieldType::Decimal,
    "bool" => FieldType::Bool,
    "timestamp" => FieldType::Timestamp,
    "float" => FieldType::Float,
    "vector" "(" <n:NumericLit> ")" => FieldType::Vector(n),
    "[" <inner:FieldType> "]" => FieldType::Array(Box::new(inner)),
};
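To make the mapping from type syntax to AST concrete, here is a minimal hand-written sketch of the same rule. It is an illustration only: the real parser is generated by LALRPOP, and `parse_field_type` is a hypothetical helper that does not exist in the codebase. The enum is a local stand-in for the `FieldType` documented under AST Structures.

```rust
// Local stand-in for the FieldType enum (serde derives omitted).
#[derive(Debug, Clone, PartialEq)]
pub enum FieldType {
    UUID,
    String,
    Int,
    Decimal,
    Bool,
    Timestamp,
    Float,
    Vector(usize),
    Array(Box<FieldType>),
}

/// Hypothetical hand-written equivalent of the FieldType grammar rule.
/// Returns None for anything the rule would reject.
pub fn parse_field_type(src: &str) -> Option<FieldType> {
    let s = src.trim();
    // "[" FieldType "]" => Array(inner), parsed recursively
    if let Some(inner) = s.strip_prefix('[').and_then(|r| r.strip_suffix(']')) {
        return parse_field_type(inner).map(|t| FieldType::Array(Box::new(t)));
    }
    // "vector" "(" NumericLit ")" => Vector(n)
    if let Some(rest) = s.strip_prefix("vector") {
        let dim = rest.trim().strip_prefix('(')?.strip_suffix(')')?;
        return dim.trim().parse::<usize>().ok().map(FieldType::Vector);
    }
    // Plain keyword types
    match s {
        "uuid" => Some(FieldType::UUID),
        "string" => Some(FieldType::String),
        "int" => Some(FieldType::Int),
        "decimal" => Some(FieldType::Decimal),
        "bool" => Some(FieldType::Bool),
        "timestamp" => Some(FieldType::Timestamp),
        "float" => Some(FieldType::Float),
        _ => None,
    }
}
```

Because the array alternative recurses into `FieldType`, nested forms such as `[[int]]` or `[vector(384)]` fall out of the rule without extra cases.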

Whitespace and Comments

match {
    r"\s*" => { },                    // Skip whitespace
    r"//[^\n\r]*[\n\r]*" => { },     // Skip line comments
    _
}
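The effect of the two skip patterns can be approximated without a regex engine. The sketch below is not how LALRPOP's lexer is implemented; `skip_trivia` is a hypothetical name, and the function only mirrors what the patterns discard: runs of whitespace and `//` comments up to the end of the line.

```rust
/// Skips leading whitespace and `//` line comments and returns the
/// remaining input, starting at the first significant character.
pub fn skip_trivia(mut input: &str) -> &str {
    loop {
        let trimmed = input.trim_start();
        if let Some(comment) = trimmed.strip_prefix("//") {
            // Drop the comment body up to (not including) the line break;
            // the next iteration's trim_start() consumes the break itself.
            let end = comment
                .find(|c: char| c == '\n' || c == '\r')
                .unwrap_or(comment.len());
            input = &comment[end..];
        } else {
            return trimmed;
        }
    }
}
```

For example, `skip_trivia("  // note\n  entity User {")` yields `"entity User {"`.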

AST Structures

Location: chameleon-core/src/ast/mod.rs

Schema

#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
pub struct Schema {
    pub entities: Vec<Entity>,
}

impl Schema {
    pub fn new() -> Self
    pub fn add_entity(&mut self, entity: Entity)
    pub fn get_entity(&self, name: &str) -> Option<&Entity>
    pub fn get_entity_mut(&mut self, name: &str) -> Option<&mut Entity>
}
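The method bodies are not shown above. A plausible minimal implementation, assuming lookups are a linear scan over `entities` by name, might look like the following. The `Entity` here is a reduced stand-in with only a `name` field.

```rust
// Reduced stand-in for Entity; the real struct also holds fields and relations.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct Entity {
    pub name: String,
}

#[derive(Debug, Clone, PartialEq, Default)]
pub struct Schema {
    pub entities: Vec<Entity>,
}

impl Schema {
    pub fn new() -> Self {
        Self::default()
    }

    /// Appends an entity, preserving declaration order.
    pub fn add_entity(&mut self, entity: Entity) {
        self.entities.push(entity);
    }

    /// Assumed behavior: first entity whose name matches exactly.
    pub fn get_entity(&self, name: &str) -> Option<&Entity> {
        self.entities.iter().find(|e| e.name == name)
    }

    pub fn get_entity_mut(&mut self, name: &str) -> Option<&mut Entity> {
        self.entities.iter_mut().find(|e| e.name == name)
    }
}
```

A linear scan is usually adequate here because schemas tend to hold tens of entities, not thousands.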

Entity

#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
pub struct Entity {
    pub name: String,
    pub fields: HashMap<String, Field>,
    pub relations: HashMap<String, Relation>,
}

impl Entity {
    pub fn new(name: String) -> Self
    pub fn add_field(&mut self, field: Field)
    pub fn add_relation(&mut self, relation: Relation)
}
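Since `fields` and `relations` are `HashMap`s, two properties follow from the standard library alone: iteration order is not declaration order, and inserting a second item under an existing key replaces the first. The sketch below assumes the maps are keyed by the item's `name`, which the source does not state explicitly. `Field` and `Relation` are reduced stand-ins.

```rust
use std::collections::HashMap;

// Reduced stand-ins; the real structs carry more data.
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Relation {
    pub name: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Entity {
    pub name: String,
    pub fields: HashMap<String, Field>,
    pub relations: HashMap<String, Relation>,
}

impl Entity {
    pub fn new(name: String) -> Self {
        Self { name, fields: HashMap::new(), relations: HashMap::new() }
    }

    /// Assumption: keyed by field name, so a duplicate name overwrites.
    pub fn add_field(&mut self, field: Field) {
        self.fields.insert(field.name.clone(), field);
    }

    /// Assumption: keyed by relation name, same overwrite behavior.
    pub fn add_relation(&mut self, relation: Relation) {
        self.relations.insert(relation.name.clone(), relation);
    }
}
```

If this assumption holds, a schema that declares the same field twice would silently keep only the last declaration unless a later validation pass rejects it.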

Field

#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
pub struct Field {
    pub name: String,
    pub field_type: FieldType,
    pub nullable: bool,
    pub unique: bool,
    pub primary_key: bool,
    pub default: Option<DefaultValue>,
    pub backend: Option<BackendAnnotation>,
}

Field constraints:

primary_key - Entity identifier (required, exactly one per entity)
unique - Unique constraint (enforced at DB level)
nullable - NULL allowed (defaults to NOT NULL)
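The "exactly one per entity" rule for `primary_key` can be expressed as a small check. This is a sketch of how such a validation might be written, not the project's validator: `check_primary_key` is a hypothetical function, and `Field` is reduced to the two members the check needs.

```rust
// Reduced stand-in for Field with only the members this check reads.
pub struct Field {
    pub name: String,
    pub primary_key: bool,
}

/// Returns Ok(()) when exactly one field is marked as primary key,
/// otherwise an error message naming the entity.
pub fn check_primary_key(entity: &str, fields: &[Field]) -> Result<(), String> {
    let keys: Vec<&str> = fields
        .iter()
        .filter(|f| f.primary_key)
        .map(|f| f.name.as_str())
        .collect();

    match keys.len() {
        1 => Ok(()),
        0 => Err(format!("entity '{}' has no primary key", entity)),
        _ => Err(format!(
            "entity '{}' has multiple primary keys: {}",
            entity,
            keys.join(", ")
        )),
    }
}
```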

FieldType

#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
pub enum FieldType {
    UUID,
    String,
    Int,
    Decimal,
    Bool,
    Timestamp,
    Float,
    Vector(usize),              // Vector embeddings with dimension
    Array(Box<FieldType>),      // Arrays of any supported type
}
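Because `Vector` and `Array` carry data, it can help to see a value rendered back into the schema syntax accepted by the grammar. The `Display` implementation below is an illustrative addition, not something the source says exists; the enum is a local stand-in without the serde derives.

```rust
use std::fmt;

// Local stand-in for FieldType (serde derives omitted).
#[derive(Debug, Clone, PartialEq)]
pub enum FieldType {
    UUID,
    String,
    Int,
    Decimal,
    Bool,
    Timestamp,
    Float,
    Vector(usize),
    Array(Box<FieldType>),
}

impl fmt::Display for FieldType {
    /// Writes the type using the keywords from the FieldType grammar rule.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FieldType::UUID => write!(f, "uuid"),
            FieldType::String => write!(f, "string"),
            FieldType::Int => write!(f, "int"),
            FieldType::Decimal => write!(f, "decimal"),
            FieldType::Bool => write!(f, "bool"),
            FieldType::Timestamp => write!(f, "timestamp"),
            FieldType::Float => write!(f, "float"),
            FieldType::Vector(n) => write!(f, "vector({})", n),
            // Box<FieldType> forwards Display, so nesting recurses naturally
            FieldType::Array(inner) => write!(f, "[{}]", inner),
        }
    }
}
```

The `Box` in `Array(Box<FieldType>)` is what lets the enum refer to itself; without the indirection the type would have infinite size.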

Relation

#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
pub struct Relation {
    pub name: String,
    pub kind: RelationKind,
    pub target_entity: String,
    pub foreign_key: Option<String>,
    pub through: Option<String>,       // For many-to-many
}

#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
pub enum RelationKind {
    HasOne,
    HasMany,
    BelongsTo,
    ManyToMany,
}
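The comment on `through` suggests it is meaningful only for many-to-many relations. Assuming that reading is right (the source does not spell out the rule), a consistency check could be sketched as follows. `check_through` is a hypothetical helper, and the types are local copies without the serde derives.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum RelationKind {
    HasOne,
    HasMany,
    BelongsTo,
    ManyToMany,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Relation {
    pub name: String,
    pub kind: RelationKind,
    pub target_entity: String,
    pub foreign_key: Option<String>,
    pub through: Option<String>,
}

/// Assumed rule: `through` is required for ManyToMany and not allowed
/// for the other kinds.
pub fn check_through(rel: &Relation) -> Result<(), String> {
    match (&rel.kind, &rel.through) {
        (RelationKind::ManyToMany, Some(_)) => Ok(()),
        (RelationKind::ManyToMany, None) => Err(format!(
            "relation '{}' is many-to-many but names no join entity",
            rel.name
        )),
        (_, Some(join)) => Err(format!(
            "relation '{}' sets through = {} but is not many-to-many",
            rel.name, join
        )),
        (_, None) => Ok(()),
    }
}
```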

Backend Annotations

#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
pub enum BackendAnnotation {
    OLTP,        // Default (PostgreSQL)
    Cache,       // @cache (planned: Redis)
    OLAP,        // @olap (planned: DuckDB)
    Vector,      // @vector (planned: pgvector/Milvus)
    ML,          // @ml (planned: feature store)
}
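Since `Field::backend` is an `Option` and OLTP is described as the default, one way to read the design is that an unannotated field resolves to OLTP. The sketch below assumes that reading. `parse_annotation` and `effective_backend` are hypothetical helpers, and only the four annotation spellings listed in the comments above are mapped.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum BackendAnnotation {
    OLTP,
    Cache,
    OLAP,
    Vector,
    ML,
}

/// Maps an annotation token to its variant. Only the spellings named in
/// the enum comments are recognized; anything else yields None.
pub fn parse_annotation(token: &str) -> Option<BackendAnnotation> {
    match token {
        "@cache" => Some(BackendAnnotation::Cache),
        "@olap" => Some(BackendAnnotation::OLAP),
        "@vector" => Some(BackendAnnotation::Vector),
        "@ml" => Some(BackendAnnotation::ML),
        _ => None,
    }
}

/// Assumption: a field without an annotation is stored in the OLTP backend.
pub fn effective_backend(backend: Option<BackendAnnotation>) -> BackendAnnotation {
    backend.unwrap_or(BackendAnnotation::OLTP)
}
```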

Error Enhancement

Location: chameleon-core/src/parser/mod.rs:80

The parser enhances errors with:

1. Source Context

fn extract_snippet(source: &str, line: usize, column: usize) -> String {
    let lines: Vec<&str> = source.lines().collect();
    let target_line = lines[line - 1];
    
    // Format: line number │ source code
    //                     │    ^^^^ error pointer
    snippet.push_str(&format!("{:>width$} │ {}\n", line, target_line, ...));
    snippet.push_str(&format!("{:>width$} │ ", "", ...));
    // Add ^ characters to underline the error
}

Example output:

  12 │     email: strnig unique,
     │            ^^^^^^
Did you mean 'string'?
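The excerpt of `extract_snippet` above elides several details. A self-contained sketch that produces the same two-line layout might look like this. It is not the project's implementation: `render_snippet` is a hypothetical name, and the `len` parameter (the width of the underlined span, in characters) is an assumption added so the carets can cover a whole token.

```rust
/// Renders the offending source line followed by a caret underline.
/// `line` and `column` are 1-based; returns None if `line` is out of range.
pub fn render_snippet(source: &str, line: usize, column: usize, len: usize) -> Option<String> {
    let target = source.lines().nth(line.checked_sub(1)?)?;
    // Gutter width is the number of digits in the line number
    let width = line.to_string().len();
    let pad = " ".repeat(column.saturating_sub(1));
    let carets = "^".repeat(len.max(1));
    Some(format!(
        "{:>w$} │ {}\n{:>w$} │ {}{}",
        line,
        target,
        "",
        pad,
        carets,
        w = width
    ))
}
```

The second line reuses the gutter width with an empty string, so the `│` separators line up regardless of how many digits the line number has.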

2. Position Calculation

fn offset_to_position(source: &str, offset: usize) -> (usize, usize) {
    let mut line = 1;
    let mut column = 1;
    
    for (i, ch) in source.chars().enumerate() {
        if i >= offset { break; }
        if ch == '\n' {
            line += 1;
            column = 1;
        } else {
            column += 1;
        }
    }
    
    (line, column)
}
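One caveat worth noting: LALRPOP's built-in lexer reports locations as byte offsets, whereas `chars().enumerate()` counts characters. The two agree for ASCII input but diverge once the schema contains multi-byte UTF-8 (for example in a comment). A variant that walks byte offsets is sketched below; `byte_offset_to_position` is a hypothetical name, not a function in the codebase.

```rust
/// Converts a byte offset into a 1-based (line, column) pair.
/// Columns are counted in characters, not bytes.
pub fn byte_offset_to_position(source: &str, offset: usize) -> (usize, usize) {
    let mut line = 1;
    let mut column = 1;

    // char_indices() yields each character with its starting byte offset
    for (byte_idx, ch) in source.char_indices() {
        if byte_idx >= offset {
            break;
        }
        if ch == '\n' {
            line += 1;
            column = 1;
        } else {
            column += 1;
        }
    }

    (line, column)
}
```

With the input `"é: int,\nid: uuid"`, the `:` sits at byte offset 2 because `é` occupies two bytes; this variant reports it as line 1, column 2, while a character-indexed loop would report column 3.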

3. Smart Suggestions

fn add_suggestions(mut detail: ParseErrorDetail) -> ParseErrorDetail {
    if let Some(token) = &detail.token {
        let token_lower = token.to_lowercase();
        
        // Detect common typos
        if token_lower.contains("entiy") {
            detail.suggestion = Some("Did you mean 'entity'?");
        }
        else if token_lower.contains("primry") {
            detail.suggestion = Some("Did you mean 'primary'?");
        }
        // ... more heuristics
    }
    detail
}

Common suggestions:

Keyword typos: entiy → entity, primry → primary
Missing colons: “Fields must have a type after the colon”
Unclosed braces: “Missing closing brace”
EOF errors: “You may be missing a closing brace }”
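The keyword-typo heuristics above match fixed misspellings with `contains`. As an alternative technique (not what the source does), suggestions can be derived from Levenshtein edit distance against a keyword list, which also catches typos nobody anticipated. In this sketch the keyword list and the distance threshold are illustrative assumptions, and `suggest` is a hypothetical function.

```rust
/// Classic Levenshtein distance using two rolling rows.
fn edit_distance(a: &str, b: &str) -> usize {
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.chars().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let substitute = prev[j] + usize::from(ca != *cb);
            let delete = prev[j + 1] + 1;
            let insert = cur[j] + 1;
            cur.push(substitute.min(delete).min(insert));
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Returns the closest keyword when the token is a near miss.
/// A candidate qualifies if 0 < distance <= keyword length / 3, which
/// keeps short identifiers such as `id` from matching `int`.
pub fn suggest(token: &str) -> Option<&'static str> {
    // Illustrative list: keywords that appear elsewhere in this document
    const KEYWORDS: [&str; 12] = [
        "entity", "primary", "unique", "default", "uuid", "string",
        "int", "decimal", "bool", "timestamp", "float", "vector",
    ];
    let token = token.to_lowercase();
    KEYWORDS
        .iter()
        .copied()
        .map(|k| (edit_distance(&token, k), k))
        .filter(|(d, k)| *d > 0 && *d * 3 <= k.len())
        .min_by_key(|(d, _)| *d)
        .map(|(_, k)| k)
}
```

The returned keyword can then be wrapped in the same "Did you mean '...'?" message format used above.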

Build Process

Location: chameleon-core/build.rs

fn main() {
    // Generate parser from LALRPOP grammar
    lalrpop::Configuration::new()
        .set_out_dir(out_dir)
        .process_dir(parser_dir)
        .expect("Failed to process LALRPOP files");
    
    // Generated parser: OUT_DIR/parser/schema.rs
    // Included via: include!(concat!(env!("OUT_DIR"), "/parser/schema.rs"));
}

Generated artifacts:

$OUT_DIR/parser/schema.rs - LR(1) parser state machine
Compile-time only - not included in library distribution

Performance Characteristics

Operation             Time      Notes
Parse schema (cold)   ~10 ms    One-time cost per schema load
Parse schema (warm)   ~2 ms     With OS page cache
AST construction      ~1 ms     Minimal allocation overhead
Error enhancement     ~0.5 ms   Only on parse failures

Memory usage:

Schema AST: ~50 bytes per field + ~80 bytes per relation
Parser state machine: ~200KB (generated at compile time)
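As a worked example of the per-item figures, the approximate AST footprint of a schema is a simple linear estimate. The function below only restates the arithmetic from the numbers above; `estimate_ast_bytes` is a hypothetical helper, and the result is a rough guide, not a measurement.

```rust
/// Rough AST size estimate using the approximate per-item costs quoted
/// in this section (~50 bytes per field, ~80 bytes per relation).
pub fn estimate_ast_bytes(fields: usize, relations: usize) -> usize {
    const BYTES_PER_FIELD: usize = 50;
    const BYTES_PER_RELATION: usize = 80;
    fields * BYTES_PER_FIELD + relations * BYTES_PER_RELATION
}
```

A schema with 200 fields and 40 relations comes to 200 × 50 + 40 × 80 = 13,200 bytes, small next to the ~200KB parser state machine.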

Example Usage

Basic Parsing

use chameleon::parser::parse_schema;

let input = r#"
    entity User {
        id: uuid primary,
        email: string unique,
        created_at: timestamp default now(),
    }
"#;

let schema = parse_schema(input)?;
assert_eq!(schema.entities.len(), 1);

Accessing AST

let user = schema.get_entity("User").unwrap();
assert_eq!(user.fields.len(), 3);

let id_field = user.fields.get("id").unwrap();
assert!(id_field.primary_key);
assert_eq!(id_field.field_type, FieldType::UUID);

Error Handling

match parse_schema("invalid { syntax") {
    Ok(_) => unreachable!(),
    Err(ChameleonError::ParseError(detail)) => {
        println!("Line {}, Column {}", detail.line, detail.column);
        // The snippet and suggestion are optional, so avoid unwrap() here
        if let Some(snippet) = &detail.snippet {
            println!("{}", snippet);
        }
        if let Some(suggestion) = &detail.suggestion {
            println!("Suggestion: {}", suggestion);
        }
    }
    Err(e) => panic!("Unexpected error: {}", e),
}

Testing

Location: chameleon-core/src/parser/mod.rs:170

Test coverage:

Simple entities with fields
Relations (HasMany, BelongsTo, etc.)
Backend annotations (@cache, @olap, @vector)
Vector types with dimensions
Array types ([string], [decimal])
Complex multi-backend schemas

Example test:

#[test]
fn test_backend_annotations() {
    let input = r#"
        entity Product {
            id: uuid primary,
            views_today: int @cache,
            monthly_sales: decimal @olap,
            embedding: vector(384) @vector,
        }
    "#;
    
    let schema = parse_schema(input).unwrap();
    let product = schema.get_entity("Product").unwrap();
    
    assert_eq!(product.fields.get("views_today").unwrap().backend, 
               Some(BackendAnnotation::Cache));
    assert_eq!(product.fields.get("embedding").unwrap().field_type, 
               FieldType::Vector(384));
}

See Also

Go SDK

Rust Core