# Parser and AST

> LALRPOP-based parser and Abstract Syntax Tree structures

The parser transforms `.cham` schema files into a validated Abstract Syntax Tree (AST) using [LALRPOP](https://github.com/lalrpop/lalrpop), a Rust parser generator that produces LR(1) parsers.

## Overview

The parsing pipeline consists of:

1. **Lexical Analysis** - Tokenize input using LALRPOP's built-in lexer
2. **Syntax Analysis** - Build AST from tokens following grammar rules
3. **Error Enhancement** - Add source context and helpful suggestions
4. **AST Construction** - Create immutable schema representation

## Architecture

```
.cham source
     ↓
LALRPOP Lexer (schema.lalrpop)
     ↓
Tokens: entity, Ident, "{", ":", etc.
     ↓
LALRPOP Parser (LR(1) grammar)
     ↓
AST (Schema → Entity → Field/Relation)
     ↓
Enhanced Error Reporting
     ↓
Validated Schema
```

## Parser Entry Point

Location: `chameleon-core/src/parser/mod.rs:13`

```rust
pub fn parse_schema(input: &str) -> Result<Schema, ChameleonError> {
    match schema::SchemaParser::new().parse(input) {
        Ok(schema) => Ok(schema),
        Err(e) => {
            let err: ChameleonError = e.into();
            Err(enhance_parse_error(err, input))
        }
    }
}
```

**Key features:**

* Zero-copy parsing where possible
* Enhanced error messages with source snippets
* Line/column precision for all syntax errors

## LALRPOP Grammar

Location: `chameleon-core/src/parser/schema.lalrpop`

### Entry Point

```lalrpop
pub Schema: Schema = {
    <entities:Entity*> => {
        let mut schema = Schema::new();
        for entity in entities {
            schema.add_entity(entity);
        }
        schema
    }
};
```

### Entity Syntax

```lalrpop
Entity: Entity = {
    "entity" <name:Ident> "{" <items:EntityItem*> "}" => {
        let mut entity = Entity::new(name);
        for item in items {
            match item {
                EntityItem::Field(f) => entity.add_field(f),
                EntityItem::Relation(r) => entity.add_relation(r),
            }
        }
        entity
    }
};
```

### Field Syntax

```lalrpop
Field: Field = {
    <name:Ident> ":" <ft:FieldType> <mods:FieldModifier*> <backend:BackendAnnotation?> "," => {
        let mut field = Field {
            name,
            field_type: ft,
            nullable: false,
            unique: false,
            primary_key: false,
            default: None,
            backend: backend,
        };
        
        for modifier in mods {
            match modifier {
                FieldModifier::Primary => field.primary_key = true,
                FieldModifier::Unique => field.unique = true,
                FieldModifier::Nullable => field.nullable = true,
                FieldModifier::Default(v) => field.default = Some(v),
            }
        }
        
        field
    }
};
```
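The way the modifier list folds into boolean flags can be tried outside the grammar. The following is a minimal sketch, not the project's code: `Modifier`, `Flags`, and `apply_modifiers` are hypothetical stand-ins for `FieldModifier`, the flag portion of `Field`, and the grammar action, and default values are kept as plain strings for brevity.

```rust
// Hypothetical stand-ins for FieldModifier and the flag portion of Field.
#[derive(Debug, Clone, PartialEq)]
enum Modifier {
    Primary,
    Unique,
    Nullable,
    Default(String), // assumption: the real grammar uses a DefaultValue type
}

#[derive(Debug, Default, PartialEq)]
struct Flags {
    primary_key: bool,
    unique: bool,
    nullable: bool,
    default: Option<String>,
}

// Start from all-false flags and let each modifier switch one on.
// A repeated modifier is harmless; a repeated default keeps the last value.
fn apply_modifiers(mods: &[Modifier]) -> Flags {
    let mut flags = Flags::default();
    for modifier in mods {
        match modifier {
            Modifier::Primary => flags.primary_key = true,
            Modifier::Unique => flags.unique = true,
            Modifier::Nullable => flags.nullable = true,
            Modifier::Default(v) => flags.default = Some(v.clone()),
        }
    }
    flags
}
```

Because every flag starts as `false`, the order of modifiers in the source does not affect the result, except when a default is given more than once.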

### Supported Types

```lalrpop
FieldType: FieldType = {
    "uuid" => FieldType::UUID,
    "string" => FieldType::String,
    "int" => FieldType::Int,
    "decimal" => FieldType::Decimal,
    "bool" => FieldType::Bool,
    "timestamp" => FieldType::Timestamp,
    "float" => FieldType::Float,
    "vector" "(" <n:NumericLit> ")" => FieldType::Vector(n),
    "[" <inner:FieldType> "]" => FieldType::Array(Box::new(inner)),
};
```

### Whitespace and Comments

```lalrpop
match {
    r"\s*" => { },                    // Skip whitespace
    r"//[^\n\r]*[\n\r]*" => { },     // Skip line comments
    _
}
```

## AST Structures

Location: `chameleon-core/src/ast/mod.rs`

### Schema

```rust
#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
pub struct Schema {
    pub entities: Vec<Entity>,
}

impl Schema {
    pub fn new() -> Self
    pub fn add_entity(&mut self, entity: Entity)
    pub fn get_entity(&self, name: &str) -> Option<&Entity>
    pub fn get_entity_mut(&mut self, name: &str) -> Option<&mut Entity>
}
```
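Since `entities` is a `Vec`, lookup by name is most simply a linear scan. The sketch below shows one plausible way the listed methods could behave; the method bodies are assumptions (only the signatures appear above), and `Entity` is reduced to its name.

```rust
// Minimal stand-in; the real Entity also carries fields and relations.
#[derive(Debug, Clone, PartialEq)]
struct Entity {
    name: String,
}

#[derive(Debug, Default)]
struct Schema {
    entities: Vec<Entity>,
}

impl Schema {
    // Append in declaration order.
    fn add_entity(&mut self, entity: Entity) {
        self.entities.push(entity);
    }

    // Linear scan by name; returns the first match in declaration order.
    fn get_entity(&self, name: &str) -> Option<&Entity> {
        self.entities.iter().find(|e| e.name == name)
    }

    // Same scan, but hands out a mutable reference.
    fn get_entity_mut(&mut self, name: &str) -> Option<&mut Entity> {
        self.entities.iter_mut().find(|e| e.name == name)
    }
}
```

A linear scan is usually adequate for schemas with a modest number of entities and keeps declaration order intact.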

### Entity

```rust
#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
pub struct Entity {
    pub name: String,
    pub fields: HashMap<String, Field>,
    pub relations: HashMap<String, Relation>,
}

impl Entity {
    pub fn new(name: String) -> Self
    pub fn add_field(&mut self, field: Field)
    pub fn add_relation(&mut self, relation: Relation)
}
```
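Storing fields in a `HashMap` keyed by name has two consequences worth seeing in code: a duplicate name replaces the earlier entry, and iteration order is unspecified. The sketch below illustrates both. It is an assumption-laden illustration: `Field` is reduced to two members, `add_field` returns the replaced value (the signature above returns nothing), and `field_names_sorted` is a hypothetical helper.

```rust
use std::collections::HashMap;

// Reduced stand-in for the real Field.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    nullable: bool,
}

#[derive(Debug, Default)]
struct Entity {
    fields: HashMap<String, Field>,
}

impl Entity {
    // Key each field by its own name. HashMap::insert returns the previous
    // value, so a duplicate declaration replaces the earlier one unless the
    // caller inspects the return value.
    fn add_field(&mut self, field: Field) -> Option<Field> {
        self.fields.insert(field.name.clone(), field)
    }

    // HashMap iteration order is unspecified, so sort the names when a
    // deterministic order matters (for example, in generated output).
    fn field_names_sorted(&self) -> Vec<&str> {
        let mut names: Vec<&str> = self.fields.keys().map(String::as_str).collect();
        names.sort_unstable();
        names
    }
}
```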

### Field

```rust
#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
pub struct Field {
    pub name: String,
    pub field_type: FieldType,
    pub nullable: bool,
    pub unique: bool,
    pub primary_key: bool,
    pub default: Option<DefaultValue>,
    pub backend: Option<BackendAnnotation>,
}
```

**Field constraints:**

* `primary_key` - Entity identifier (required, exactly one per entity)
* `unique` - Unique constraint (enforced at DB level)
* `nullable` - NULL allowed (defaults to NOT NULL)

### FieldType

```rust
#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
pub enum FieldType {
    UUID,
    String,
    Int,
    Decimal,
    Bool,
    Timestamp,
    Float,
    Vector(usize),              // Vector embeddings with dimension
    Array(Box<FieldType>),      // Arrays of any supported type
}
```
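Because `Array` boxes another `FieldType`, types nest recursively. One way to see this is to render a type back into the schema spelling used by the grammar. The `Display` implementation below is a hypothetical illustration and is not taken from the project; the enum is repeated (without the serde derives) so the snippet stands alone.

```rust
use std::fmt;

#[derive(Debug, Clone, PartialEq)]
enum FieldType {
    UUID,
    String,
    Int,
    Decimal,
    Bool,
    Timestamp,
    Float,
    Vector(usize),
    Array(Box<FieldType>),
}

// Render a type using the token spellings from the grammar's FieldType rule.
impl fmt::Display for FieldType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FieldType::UUID => write!(f, "uuid"),
            FieldType::String => write!(f, "string"),
            FieldType::Int => write!(f, "int"),
            FieldType::Decimal => write!(f, "decimal"),
            FieldType::Bool => write!(f, "bool"),
            FieldType::Timestamp => write!(f, "timestamp"),
            FieldType::Float => write!(f, "float"),
            FieldType::Vector(n) => write!(f, "vector({})", n),
            // Recurse into the element type; nesting depth is unbounded.
            FieldType::Array(inner) => write!(f, "[{}]", inner),
        }
    }
}
```

The recursive `Array` arm mirrors the grammar rule `"[" <inner:FieldType> "]"`, which accepts any field type as the element, including another array.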

### Relation

```rust
#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
pub struct Relation {
    pub name: String,
    pub kind: RelationKind,
    pub target_entity: String,
    pub foreign_key: Option<String>,
    pub through: Option<String>,       // For many-to-many
}

#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
pub enum RelationKind {
    HasOne,
    HasMany,
    BelongsTo,
    ManyToMany,
}
```

### Backend Annotations

```rust
#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
pub enum BackendAnnotation {
    OLTP,        // Default (PostgreSQL)
    Cache,       // @cache (planned: Redis)
    OLAP,        // @olap (planned: DuckDB)
    Vector,      // @vector (planned: pgvector/Milvus)
    ML,          // @ml (planned: feature store)
}
```
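The annotation spellings in the comments above map one-to-one onto variants. The sketch below shows that mapping plus a fallback for unannotated fields. Both functions are hypothetical: the grammar rule for `BackendAnnotation` is not shown on this page, and treating a missing annotation as `OLTP` is an assumption based on the "Default" comment.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum BackendAnnotation {
    OLTP,
    Cache,
    OLAP,
    Vector,
    ML,
}

// Map the annotation spellings listed in the enum comments to variants.
// Unknown text yields None rather than a guess.
fn from_annotation(text: &str) -> Option<BackendAnnotation> {
    match text {
        "@cache" => Some(BackendAnnotation::Cache),
        "@olap" => Some(BackendAnnotation::OLAP),
        "@vector" => Some(BackendAnnotation::Vector),
        "@ml" => Some(BackendAnnotation::ML),
        _ => None,
    }
}

// Assumption: a field without an annotation is served by the OLTP backend.
fn effective_backend(annotation: Option<BackendAnnotation>) -> BackendAnnotation {
    annotation.unwrap_or(BackendAnnotation::OLTP)
}
```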

## Error Enhancement

Location: `chameleon-core/src/parser/mod.rs:80`

The parser enhances errors with:

### 1. Source Context

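```rust
fn extract_snippet(source: &str, line: usize, column: usize) -> String {
    // Simplified sketch: `line` and `column` are 1-based; the gutter width
    // and the underline length are assumptions, not the exact implementation.
    let target_line = source.lines().nth(line.saturating_sub(1)).unwrap_or("");
    let width = 4;
    let mut snippet = String::new();
    // Format: line number │ source code
    //                     │    ^^^^ error pointer
    snippet.push_str(&format!("{:>width$} │ {}\n", line, target_line, width = width));
    snippet.push_str(&format!("{:>width$} │ ", "", width = width));
    // Add ^ characters to underline the identifier-like token at the error column
    let start = column.saturating_sub(1);
    let len = target_line.chars().skip(start)
        .take_while(|c| c.is_alphanumeric() || *c == '_').count().max(1);
    snippet.push_str(&" ".repeat(start));
    snippet.push_str(&"^".repeat(len));
    snippet
}
```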

**Example output:**

```
  12 │     email: strnig unique,
     │            ^^^^^^
Did you mean 'string'?
```

### 2. Position Calculation

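```rust
fn offset_to_position(source: &str, offset: usize) -> (usize, usize) {
    let mut line = 1;
    let mut column = 1;

    // LALRPOP's built-in lexer reports byte offsets, so compare against byte
    // indices (char_indices) rather than character counts (chars().enumerate()).
    for (byte_idx, ch) in source.char_indices() {
        if byte_idx >= offset { break; }
        if ch == '\n' {
            line += 1;
            column = 1;
        } else {
            column += 1;
        }
    }

    // Both values are 1-based; the column counts characters, not bytes
    (line, column)
}
```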
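When many offsets must be converted for the same source, an alternative to rescanning from the start each time is to precompute where every line begins and binary-search that table. This is a different technique from the linear scan above, sketched here with hypothetical names (`line_starts`, `position_from_index`) and assuming the offsets are byte offsets that fall on character boundaries.

```rust
// Byte offsets at which each line starts; line 1 always starts at 0.
fn line_starts(source: &str) -> Vec<usize> {
    std::iter::once(0)
        .chain(source.match_indices('\n').map(|(i, _)| i + 1))
        .collect()
}

// Convert a byte offset to a 1-based (line, column) pair using binary search.
// The column counts characters, so multi-byte UTF-8 text is handled.
// Assumes `offset` lies on a char boundary and is at most `source.len()`.
fn position_from_index(source: &str, starts: &[usize], offset: usize) -> (usize, usize) {
    // partition_point returns how many line starts are <= offset,
    // which is exactly the 1-based line number.
    let line = starts.partition_point(|&s| s <= offset);
    let line_start = starts[line - 1];
    let column = source[line_start..offset].chars().count() + 1;
    (line, column)
}
```

Building the table costs one pass over the source; each lookup afterwards is a binary search plus a scan of a single line.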

### 3. Smart Suggestions

```rust
fn add_suggestions(mut detail: ParseErrorDetail) -> ParseErrorDetail {
    if let Some(token) = &detail.token {
        let token_lower = token.to_lowercase();
        
        // Detect common typos
        if token_lower.contains("entiy") {
            detail.suggestion = Some("Did you mean 'entity'?");
        }
        else if token_lower.contains("primry") {
            detail.suggestion = Some("Did you mean 'primary'?");
        }
        // ... more heuristics
    }
    detail
}
```

**Common suggestions:**

* Keyword typos: `entiy` → `entity`, `primry` → `primary`
* Missing colons: "Fields must have a type after the colon"
* Unclosed braces: "Missing closing brace"
* EOF errors: "You may be missing a closing brace }"
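Keyword typos can also be caught generically by comparing the offending token against known keywords with an edit-distance measure. The sketch below uses Levenshtein distance, which is a swapped-in technique and not the substring heuristics shown above. The `KEYWORDS` list is an assumption assembled from the grammar excerpts and examples on this page, and `suggest` is a hypothetical helper.

```rust
// Classic dynamic-programming Levenshtein distance over chars,
// keeping only the previous row of the table.
fn edit_distance(a: &str, b: &str) -> usize {
    let b_chars: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b_chars.len()).collect();
    for (i, ca) in a.chars().enumerate() {
        let mut curr = vec![i + 1];
        for (j, cb) in b_chars.iter().enumerate() {
            let substitution = prev[j] + usize::from(ca != *cb);
            let deletion = prev[j + 1] + 1;
            let insertion = curr[j] + 1;
            curr.push(substitution.min(deletion).min(insertion));
        }
        prev = curr;
    }
    prev[b_chars.len()]
}

// Assumed keyword list, taken from the excerpts on this page.
const KEYWORDS: &[&str] = &[
    "entity", "primary", "unique", "default", "uuid", "string", "int",
    "decimal", "bool", "timestamp", "float", "vector",
];

// Suggest the closest keyword within two edits; exact matches need no hint.
fn suggest(token: &str) -> Option<String> {
    let lower = token.to_lowercase();
    KEYWORDS
        .iter()
        .map(|kw| (edit_distance(&lower, kw), *kw))
        .filter(|(d, _)| (1..=2).contains(d))
        .min_by_key(|(d, _)| *d)
        .map(|(_, kw)| format!("Did you mean '{}'?", kw))
}
```

A threshold of two edits covers single missing letters (`entiy`) as well as swapped neighbours (`strnig`), which plain Levenshtein counts as two substitutions.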

## Build Process

Location: `chameleon-core/build.rs`

```rust
fn main() {
    // The definitions of `out_dir` and `parser_dir` are omitted from this excerpt

    // Generate parser from LALRPOP grammar
    lalrpop::Configuration::new()
        .set_out_dir(out_dir)
        .process_dir(parser_dir)
        .expect("Failed to process LALRPOP files");

    // Generated parser: OUT_DIR/parser/schema.rs
    // Included via: include!(concat!(env!("OUT_DIR"), "/parser/schema.rs"));
}
```

**Generated artifacts:**

* `$OUT_DIR/parser/schema.rs` - LR(1) parser state machine
* Compile-time only - not included in library distribution

## Performance Characteristics

| Operation           | Time    | Notes                         |
| ------------------- | ------- | ----------------------------- |
| Parse schema (cold) | \~10ms  | One-time cost per schema load |
| Parse schema (warm) | \~2ms   | With OS page cache            |
| AST construction    | \~1ms   | Minimal allocation overhead   |
| Error enhancement   | \~0.5ms | Only on parse failures        |

**Memory usage:**

* Schema AST: \~50 bytes per field + \~80 bytes per relation
* Parser state machine: \~200KB (generated at compile time)
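The per-item figures give a quick back-of-the-envelope estimate for a whole schema. The helper below is a hypothetical sketch that only multiplies out the two approximate numbers quoted above; whether those figures include string contents or per-entity overhead is not stated, so treat the result as a rough lower bound.

```rust
// Approximate per-item sizes quoted in the memory usage list.
const BYTES_PER_FIELD: usize = 50;
const BYTES_PER_RELATION: usize = 80;

// Rough AST size estimate: fields and relations only.
fn estimate_ast_bytes(fields: usize, relations: usize) -> usize {
    fields * BYTES_PER_FIELD + relations * BYTES_PER_RELATION
}
```

For example, a schema with 200 fields and 40 relations comes to 200 × 50 + 40 × 80 = 13,200 bytes under these figures.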

## Example Usage

### Basic Parsing

```rust
use chameleon::parser::parse_schema;

let input = r#"
    entity User {
        id: uuid primary,
        email: string unique,
        created_at: timestamp default now(),
    }
"#;

let schema = parse_schema(input)?;
assert_eq!(schema.entities.len(), 1);
```

### Accessing AST

```rust
let user = schema.get_entity("User").unwrap();
assert_eq!(user.fields.len(), 3);

let id_field = user.fields.get("id").unwrap();
assert!(id_field.primary_key);
assert_eq!(id_field.field_type, FieldType::UUID);
```

### Error Handling

```rust
match parse_schema("invalid { syntax") {
    Ok(_) => unreachable!("input is not a valid schema"),
    Err(ChameleonError::ParseError(detail)) => {
        println!("Line {}, Column {}", detail.line, detail.column);
        // Snippet and suggestion are optional, so avoid unwrapping them
        if let Some(snippet) = &detail.snippet {
            println!("{}", snippet);
        }
        if let Some(suggestion) = &detail.suggestion {
            println!("Suggestion: {}", suggestion);
        }
    }
    Err(e) => panic!("Unexpected error: {}", e),
}
```

## Testing

Location: `chameleon-core/src/parser/mod.rs:170`

**Test coverage:**

* Simple entities with fields
* Relations (HasMany, BelongsTo, etc.)
* Backend annotations (`@cache`, `@olap`, `@vector`)
* Vector types with dimensions
* Array types (`[string]`, `[decimal]`)
* Complex multi-backend schemas

**Example test:**

```rust
#[test]
fn test_backend_annotations() {
    let input = r#"
        entity Product {
            id: uuid primary,
            views_today: int @cache,
            monthly_sales: decimal @olap,
            embedding: vector(384) @vector,
        }
    "#;
    
    let schema = parse_schema(input).unwrap();
    let product = schema.get_entity("Product").unwrap();
    
    assert_eq!(product.fields.get("views_today").unwrap().backend, 
               Some(BackendAnnotation::Cache));
    assert_eq!(product.fields.get("embedding").unwrap().field_type, 
               FieldType::Vector(384));
}
```

## See Also

* [Type Checker](/api/type-checker) - AST validation
* [FFI Interface](/api/ffi-interface) - Exporting AST to Go
* [LALRPOP Documentation](https://lalrpop.github.io/lalrpop/) - Parser generator reference
