
Parser and AST

The parser transforms .cham schema files into a validated Abstract Syntax Tree (AST) using LALRPOP, an LR(1) parser generator for Rust.

Overview

The parsing pipeline consists of:
  1. Lexical Analysis - Tokenize input using LALRPOP’s built-in lexer
  2. Syntax Analysis - Build AST from tokens following grammar rules
  3. Error Enhancement - Add source context and helpful suggestions
  4. AST Construction - Create immutable schema representation

Architecture

.cham source
    ↓
LALRPOP Lexer (schema.lalrpop)
    ↓
Tokens: entity, Ident, "{", ":", etc.
    ↓
LALRPOP Parser (LR(1) grammar)
    ↓
AST (Schema → Entity → Field/Relation)
    ↓
Enhanced Error Reporting
    ↓
Validated Schema

Parser Entry Point

Location: chameleon-core/src/parser/mod.rs:13
pub fn parse_schema(input: &str) -> Result<Schema, ChameleonError> {
    match schema::SchemaParser::new().parse(input) {
        Ok(schema) => Ok(schema),
        Err(e) => {
            let err: ChameleonError = e.into();
            Err(enhance_parse_error(err, input))
        }
    }
}
Key features:
  • Zero-copy parsing where possible
  • Enhanced error messages with source snippets
  • Line/column precision for all syntax errors

LALRPOP Grammar

Location: chameleon-core/src/parser/schema.lalrpop

Entry Point

pub Schema: Schema = {
    <entities:Entity*> => {
        let mut schema = Schema::new();
        for entity in entities {
            schema.add_entity(entity);
        }
        schema
    }
};

Entity Syntax

Entity: Entity = {
    "entity" <name:Ident> "{" <items:EntityItem*> "}" => {
        let mut entity = Entity::new(name);
        for item in items {
            match item {
                EntityItem::Field(f) => entity.add_field(f),
                EntityItem::Relation(r) => entity.add_relation(r),
            }
        }
        entity
    }
};

Field Syntax

Field: Field = {
    <name:Ident> ":" <ft:FieldType> <mods:FieldModifier*> <backend:BackendAnnotation?> "," => {
        let mut field = Field {
            name,
            field_type: ft,
            nullable: false,
            unique: false,
            primary_key: false,
            default: None,
            backend: backend,
        };
        
        for modifier in mods {
            match modifier {
                FieldModifier::Primary => field.primary_key = true,
                FieldModifier::Unique => field.unique = true,
                FieldModifier::Nullable => field.nullable = true,
                FieldModifier::Default(v) => field.default = Some(v),
            }
        }
        
        field
    }
};

Supported Types

FieldType: FieldType = {
    "uuid" => FieldType::UUID,
    "string" => FieldType::String,
    "int" => FieldType::Int,
    "decimal" => FieldType::Decimal,
    "bool" => FieldType::Bool,
    "timestamp" => FieldType::Timestamp,
    "float" => FieldType::Float,
    "vector" "(" <n:NumericLit> ")" => FieldType::Vector(n),
    "[" <inner:FieldType> "]" => FieldType::Array(Box::new(inner)),
};
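
The last two rules are recursive, so array and vector types can nest (for example `[[vector(384)]]`). The sketch below illustrates how such a nested type is represented. It uses a local copy of the `FieldType` enum from this page and a hypothetical helper, `to_schema_syntax`, which is not part of the crate and only renders a type back into the spelling the grammar accepts.

```rust
// Local copy of the FieldType enum documented on this page,
// so the sketch compiles on its own.
#[derive(Debug, Clone, PartialEq)]
pub enum FieldType {
    UUID,
    String,
    Int,
    Decimal,
    Bool,
    Timestamp,
    Float,
    Vector(usize),
    Array(Box<FieldType>),
}

/// Hypothetical helper (not part of the crate): renders a type back into
/// the schema spelling accepted by the `FieldType` grammar rule.
pub fn to_schema_syntax(ty: &FieldType) -> String {
    match ty {
        FieldType::UUID => "uuid".to_string(),
        FieldType::String => "string".to_string(),
        FieldType::Int => "int".to_string(),
        FieldType::Decimal => "decimal".to_string(),
        FieldType::Bool => "bool".to_string(),
        FieldType::Timestamp => "timestamp".to_string(),
        FieldType::Float => "float".to_string(),
        FieldType::Vector(n) => format!("vector({})", n),
        // Recursion mirrors the recursive "[" FieldType "]" rule.
        FieldType::Array(inner) => format!("[{}]", to_schema_syntax(inner)),
    }
}
```

Each level of `[...]` in the source becomes one `FieldType::Array(Box::new(...))` wrapper in the AST; the `Box` is what allows the enum to contain itself.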

Whitespace and Comments

match {
    r"\s*" => { },                    // Skip whitespace
    r"//[^\n\r]*[\n\r]*" => { },     // Skip line comments
    _
}

AST Structures

Location: chameleon-core/src/ast/mod.rs

Schema

#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
pub struct Schema {
    pub entities: Vec<Entity>,
}

impl Schema {
    pub fn new() -> Self
    pub fn add_entity(&mut self, entity: Entity)
    pub fn get_entity(&self, name: &str) -> Option<&Entity>
    pub fn get_entity_mut(&mut self, name: &str) -> Option<&mut Entity>
}

Entity

#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
pub struct Entity {
    pub name: String,
    pub fields: HashMap<String, Field>,
    pub relations: HashMap<String, Relation>,
}

impl Entity {
    pub fn new(name: String) -> Self
    pub fn add_field(&mut self, field: Field)
    pub fn add_relation(&mut self, relation: Relation)
}
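
Because `fields` is a `HashMap`, two consequences follow: iteration order is unspecified, and inserting a second field under an existing key replaces the first. The sketch below shows both, using reduced stand-ins for the AST types. It assumes `add_field` keys each field by its own name (the page does not show the method body), and `sorted_field_names` is a hypothetical helper added for illustration.

```rust
use std::collections::HashMap;

// Reduced stand-ins for the AST types on this page (only what the sketch needs).
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub primary_key: bool,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Entity {
    pub name: String,
    pub fields: HashMap<String, Field>,
}

impl Entity {
    pub fn new(name: String) -> Self {
        Entity { name, fields: HashMap::new() }
    }

    // Assumption: fields are keyed by their own name, so a later field
    // with the same name silently replaces the earlier one.
    pub fn add_field(&mut self, field: Field) {
        self.fields.insert(field.name.clone(), field);
    }

    /// Hypothetical helper: field names in a stable (alphabetical) order,
    /// since HashMap iteration order is unspecified.
    pub fn sorted_field_names(&self) -> Vec<&str> {
        let mut names: Vec<&str> = self.fields.keys().map(String::as_str).collect();
        names.sort_unstable();
        names
    }
}
```

Code that needs deterministic output (generated SQL, snapshots, diffs) would typically sort the keys first, as the helper does, rather than iterate over the map directly.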

Field

#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
pub struct Field {
    pub name: String,
    pub field_type: FieldType,
    pub nullable: bool,
    pub unique: bool,
    pub primary_key: bool,
    pub default: Option<DefaultValue>,
    pub backend: Option<BackendAnnotation>,
}
Field constraints:
  • primary_key - Entity identifier (required, exactly one per entity)
  • unique - Unique constraint (enforced at DB level)
  • nullable - NULL allowed (defaults to NOT NULL)

FieldType

#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
pub enum FieldType {
    UUID,
    String,
    Int,
    Decimal,
    Bool,
    Timestamp,
    Float,
    Vector(usize),              // Vector embeddings with dimension
    Array(Box<FieldType>),      // Arrays of any supported type
}

Relation

#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
pub struct Relation {
    pub name: String,
    pub kind: RelationKind,
    pub target_entity: String,
    pub foreign_key: Option<String>,
    pub through: Option<String>,       // For many-to-many
}

#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
pub enum RelationKind {
    HasOne,
    HasMany,
    BelongsTo,
    ManyToMany,
}

Backend Annotations

#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
pub enum BackendAnnotation {
    OLTP,        // Default (PostgreSQL)
    Cache,       // @cache (planned: Redis)
    OLAP,        // @olap (planned: DuckDB)
    Vector,      // @vector (planned: pgvector/Milvus)
    ML,          // @ml (planned: feature store)
}
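
Since `Field.backend` is an `Option<BackendAnnotation>` and the grammar makes the annotation optional, a field without an annotation carries `None`. The sketch below shows one way to resolve that to a concrete backend. It assumes that `None` means the default OLTP backend, and that the annotation spellings are the ones listed in the comments above; both helper functions are hypothetical and not part of the crate.

```rust
// Local copy of the enum from this page.
#[derive(Debug, Clone, PartialEq)]
pub enum BackendAnnotation {
    OLTP,
    Cache,
    OLAP,
    Vector,
    ML,
}

/// Hypothetical helper: maps an annotation spelling to its variant.
/// Only the spellings mentioned on this page are recognised.
pub fn annotation_from_token(token: &str) -> Option<BackendAnnotation> {
    match token {
        "@cache" => Some(BackendAnnotation::Cache),
        "@olap" => Some(BackendAnnotation::OLAP),
        "@vector" => Some(BackendAnnotation::Vector),
        "@ml" => Some(BackendAnnotation::ML),
        _ => None,
    }
}

/// Assumption: a field with no annotation targets the default OLTP backend.
pub fn effective_backend(backend: Option<&BackendAnnotation>) -> BackendAnnotation {
    backend.cloned().unwrap_or(BackendAnnotation::OLTP)
}
```

For a parsed field this would be called as `effective_backend(field.backend.as_ref())`.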

Error Enhancement

Location: chameleon-core/src/parser/mod.rs:80

The parser enhances errors with:

1. Source Context

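// Simplified sketch: the gutter width and the single-column marker are assumptions.
fn extract_snippet(source: &str, line: usize, column: usize) -> String {
    let lines: Vec<&str> = source.lines().collect();
    // Return an empty snippet instead of panicking on an out-of-range line.
    let Some(target_line) = line.checked_sub(1).and_then(|i| lines.get(i)) else {
        return String::new();
    };
    let width = line.to_string().len();
    let mut snippet = String::new();

    // Format: line number │ source code
    //                     │    ^^^^ error pointer
    snippet.push_str(&format!("{:>width$} │ {}\n", line, target_line, width = width));
    snippet.push_str(&format!("{:>width$} │ ", "", width = width));
    // Pad to the error column and mark it (underlining the full token is elided here)
    snippet.push_str(&" ".repeat(column.saturating_sub(1)));
    snippet.push('^');
    snippet
}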
Example output:
  12 │     email: strnig unique,
     │            ^^^^^^
Did you mean 'string'?

2. Position Calculation

fn offset_to_position(source: &str, offset: usize) -> (usize, usize) {
    let mut line = 1;
    let mut column = 1;
    
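    // char_indices() yields byte offsets, so `i` is comparable to `offset`
    // even when the source contains multi-byte characters.
    for (i, ch) in source.char_indices() {
        if i >= offset { break; }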
        if ch == '\n' {
            line += 1;
            column = 1;
        } else {
            column += 1;
        }
    }
    
    (line, column)
}
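
To make the position calculation concrete, here is a standalone variant with a worked example. It assumes `offset` is a byte offset into the source (which is how string slices are indexed in Rust), so it walks the input with `char_indices` rather than a plain character counter; the two only differ when the schema contains non-ASCII text.

```rust
/// Converts a byte offset into a 1-based (line, column) pair.
/// Columns count characters, not bytes.
pub fn offset_to_position(source: &str, offset: usize) -> (usize, usize) {
    let mut line = 1;
    let mut column = 1;

    // `i` is the byte offset at which `ch` starts.
    for (i, ch) in source.char_indices() {
        if i >= offset {
            break;
        }
        if ch == '\n' {
            line += 1;
            column = 1;
        } else {
            column += 1;
        }
    }

    (line, column)
}
```

For the input `"entity User {\n  id: uuid,\n}"`, the first line occupies byte offsets 0 to 12 and the newline sits at offset 13, so offset 16 (the `i` of `id`) maps to line 2, column 3. For `"é\nx"`, the `x` starts at byte offset 3 because `é` takes two bytes in UTF-8, and the function reports line 2, column 1.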

3. Smart Suggestions

fn add_suggestions(mut detail: ParseErrorDetail) -> ParseErrorDetail {
    if let Some(token) = &detail.token {
        let token_lower = token.to_lowercase();
        
        // Detect common typos
        if token_lower.contains("entiy") {
            detail.suggestion = Some("Did you mean 'entity'?");
        }
        else if token_lower.contains("primry") {
            detail.suggestion = Some("Did you mean 'primary'?");
        }
        // ... more heuristics
    }
    detail
}
Common suggestions:
  • Keyword typos: entiy → entity, primry → primary
  • Missing colons: “Fields must have a type after the colon”
  • Unclosed braces: “Missing closing brace”
  • EOF errors: “You may be missing a closing brace }”
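
The substring checks shown above only catch typos that were anticipated in advance. As an alternative technique (not what the parser is documented to do), a suggestion can be derived from Levenshtein edit distance against a keyword list. The sketch below is a minimal version under stated assumptions: the `KEYWORDS` list is illustrative and drawn from spellings that appear on this page, and `edit_distance` and `suggest_keyword` are hypothetical names.

```rust
/// Levenshtein edit distance over chars, using a two-row dynamic program.
pub fn edit_distance(a: &str, b: &str) -> usize {
    let b_chars: Vec<char> = b.chars().collect();
    // prev[j] = distance between the consumed prefix of `a` and b[..j]
    let mut prev: Vec<usize> = (0..=b_chars.len()).collect();

    for (i, ca) in a.chars().enumerate() {
        let mut curr = vec![i + 1];
        for (j, cb) in b_chars.iter().enumerate() {
            let cost = if ca == *cb { 0 } else { 1 };
            let best = (prev[j] + cost) // substitute (or match)
                .min(prev[j + 1] + 1)   // delete from `a`
                .min(curr[j] + 1);      // insert into `a`
            curr.push(best);
        }
        prev = curr;
    }
    prev[b_chars.len()]
}

// Illustrative keyword list, assembled from spellings used on this page.
const KEYWORDS: &[&str] = &[
    "entity", "primary", "unique", "default", "uuid", "string", "int",
    "decimal", "bool", "timestamp", "float", "vector",
];

/// Returns the closest keyword within two edits, or None if the token is
/// already a keyword or nothing is close enough.
pub fn suggest_keyword(token: &str) -> Option<&'static str> {
    let lower = token.to_lowercase();
    KEYWORDS
        .iter()
        .copied()
        .map(|kw| (edit_distance(&lower, kw), kw))
        .filter(|(d, _)| *d > 0 && *d <= 2)
        .min_by_key(|(d, _)| *d)
        .map(|(_, kw)| kw)
}
```

The threshold of two edits is a judgment call: it covers a swapped pair of letters (which counts as two substitutions, as in `strnig`) while keeping unrelated identifiers from triggering suggestions.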

Build Process

Location: chameleon-core/build.rs
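use std::{env, path::PathBuf};

fn main() {
    // Assumed paths (not shown in the original): grammar sources live in
    // src/parser and output goes under $OUT_DIR/parser.
    let parser_dir = "src/parser";
    let out_dir = PathBuf::from(env::var("OUT_DIR").expect("OUT_DIR not set")).join("parser");

    // Generate parser from LALRPOP grammar
    lalrpop::Configuration::new()
        .set_out_dir(out_dir)
        .process_dir(parser_dir)
        .expect("Failed to process LALRPOP files");

    // Generated parser: OUT_DIR/parser/schema.rs
    // Included via: include!(concat!(env!("OUT_DIR"), "/parser/schema.rs"));
}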
Generated artifacts:
  • $OUT_DIR/parser/schema.rs - LR(1) parser state machine
  • Compile-time only - not included in library distribution

Performance Characteristics

Operation             Time     Notes
Parse schema (cold)   ~10ms    One-time cost per schema load
Parse schema (warm)   ~2ms     With OS page cache
AST construction      ~1ms     Minimal allocation overhead
Error enhancement     ~0.5ms   Only on parse failures
Memory usage:
  • Schema AST: ~50 bytes per field + ~80 bytes per relation
  • Parser state machine: ~200KB (generated at compile time)

Example Usage

Basic Parsing

use chameleon::parser::parse_schema;

let input = r#"
    entity User {
        id: uuid primary,
        email: string unique,
        created_at: timestamp default now(),
    }
"#;

let schema = parse_schema(input)?;
assert_eq!(schema.entities.len(), 1);

Accessing AST

let user = schema.get_entity("User").unwrap();
assert_eq!(user.fields.len(), 3);

let id_field = user.fields.get("id").unwrap();
assert!(id_field.primary_key);
assert_eq!(id_field.field_type, FieldType::UUID);

Error Handling

match parse_schema("invalid { syntax") {
    Ok(_) => unreachable!(),
    Err(ChameleonError::ParseError(detail)) => {
        println!("Line {}, Column {}", detail.line, detail.column);
        // snippet and suggestion are optional, so avoid unwrap() here
        if let Some(snippet) = &detail.snippet {
            println!("{}", snippet);
        }
        if let Some(suggestion) = &detail.suggestion {
            println!("Suggestion: {}", suggestion);
        }
    }
    Err(e) => panic!("Unexpected error: {}", e),
}

Testing

Location: chameleon-core/src/parser/mod.rs:170

Test coverage:
  • Simple entities with fields
  • Relations (HasMany, BelongsTo, etc.)
  • Backend annotations (@cache, @olap, @vector)
  • Vector types with dimensions
  • Array types ([string], [decimal])
  • Complex multi-backend schemas
Example test:
#[test]
fn test_backend_annotations() {
    let input = r#"
        entity Product {
            id: uuid primary,
            views_today: int @cache,
            monthly_sales: decimal @olap,
            embedding: vector(384) @vector,
        }
    "#;
    
    let schema = parse_schema(input).unwrap();
    let product = schema.get_entity("Product").unwrap();
    
    assert_eq!(product.fields.get("views_today").unwrap().backend, 
               Some(BackendAnnotation::Cache));
    assert_eq!(product.fields.get("embedding").unwrap().field_type, 
               FieldType::Vector(384));
}

See Also