# PDSL Language Specification
**Version:** 0.1.0
**Status:** Design Complete
This document provides the complete formal specification for the Probabilistic Domain-Specific Language (PDSL).
## Table of Contents
1. [Overview](#overview)
2. [Lexical Structure](#lexical-structure)
3. [Grammar (EBNF)](#grammar-ebnf)
4. [Type System](#type-system)
5. [Semantics](#semantics)
6. [Operator Precedence](#operator-precedence)
7. [Scoping Rules](#scoping-rules)
8. [Probability Constraints](#probability-constraints)
9. [Type Inference](#type-inference)
10. [Validation Rules](#validation-rules)
11. [Well-Formed Programs](#well-formed-programs)
12. [Error Messages](#error-messages)
13. [Extensions and Future Work](#extensions-and-future-work)
14. [Formal Semantics](#formal-semantics)
15. [Reference Implementation](#reference-implementation)
16. [Examples](#examples-1)
17. [BNF Summary](#bnf-summary)
## Overview
PDSL is a declarative language for expressing probabilistic knowledge. It extends logic programming with probabilistic annotations and compiles to ProbLog.
### Design Goals
1. **Natural syntax** - Readable and writeable by humans and LLMs
2. **Type safety** - Catch errors at parse time
3. **Expressiveness** - Support full ProbLog capabilities
4. **Bidirectional** - PDSL ↔ ProbLog conversion
## Lexical Structure
### Keywords
Reserved words that cannot be used as identifiers:
```
probabilistic_model # Model declaration
observe # Evidence declaration
query # Query declaration
learn # Parameter learning
parameters # Used with learn
from # Used with learn
dataset # Data source
not # Logical negation
true # Boolean constant
false # Boolean constant
```
### Identifiers
**Variables:** Start with uppercase letter
```
Variable := [A-Z][a-zA-Z0-9_]*
Examples: X, Y, Person, PatientId, Temp_1
```
**Constants/Predicates:** Start with lowercase letter
```
Constant := [a-z][a-zA-Z0-9_]*
Examples: alice, flu, temperature, isFriend
```
**Predicate names follow constant syntax.**
### Literals
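**Probability:** Number in the range [0.0, 1.0]. The lexical pattern matches any unsigned number; the range is enforced by validation (see [Probability Constraints](#probability-constraints)).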
```
Probability := [0-9]+ ('.' [0-9]+)?
Examples: 0.5, 0.95, 1.0, 0, 1
```
**Number:** Integer or float
```
Number := [0-9]+ ('.' [0-9]+)?
Examples: 42, 3.14, 100, 0.001
```
**String:** Double-quoted text
```
String := '"' [^"]* '"'
Examples: "medical_data.csv", "error message"
```
### Operators
**Probabilistic Annotation:** `::`
```
0.7 :: sunny
```
**Logical Implication:** `:-`
```
flies(X) :- bird(X)
```
**Annotated Disjunction:** `;`
```
0.3 :: a; 0.7 :: b
```
**Conjunction:** `,`
```
bird(X), not penguin(X)
```
**Negation:** `not`
```
not raining
```
### Delimiters
```
( ) # Parentheses for predicates and grouping
{ } # Braces for model body
[ ] # Brackets for lists (future use)
. # Statement terminator (optional in PDSL)
, # Separator for conjunctions and arguments
; # Separator for disjunctions
: # Used in module syntax (future)
```
### Comments
**Single-line comments:** Start with `#`
```
# This is a comment
0.5 :: rain # Inline comment
```
**Multi-line comments:** Not supported in v0.1.0 (future feature)
### Whitespace
Spaces and tabs are ignored except as token separators. Newlines also separate tokens and, as described in the Grammar Notes, act as statement separators.
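
The lexical rules in this section can be exercised with a small tokenizer. The following TypeScript sketch is illustrative only: the names `TokenKind`, `PdslToken`, `TOKEN_PATTERNS`, and `tokenize` are assumptions rather than part of the specification, and the sketch discards newlines, so it does not model their role as statement separators.

```typescript
// Illustrative token kinds; these names are assumptions, not defined by the spec.
type TokenKind =
  | "keyword" | "variable" | "constant" | "number"
  | "string" | "operator" | "delimiter";

interface PdslToken {
  kind: TokenKind;
  text: string;
}

const KEYWORDS: string[] = [
  "probabilistic_model", "observe", "query", "learn", "parameters",
  "from", "dataset", "not", "true", "false",
];

// Patterns are tried in order at the current position; "skip" entries are dropped.
const TOKEN_PATTERNS: Array<[TokenKind | "skip", RegExp]> = [
  ["skip", /^[ \t\r\n]+/],             // whitespace
  ["skip", /^#[^\n]*/],                // single-line comment
  ["string", /^"[^"]*"/],
  ["number", /^[0-9]+(\.[0-9]+)?/],    // also covers probability literals
  ["operator", /^(::|:-)/],
  ["variable", /^[A-Z][a-zA-Z0-9_]*/],
  ["constant", /^[a-z][a-zA-Z0-9_]*/], // reclassified below if it is a keyword
  ["delimiter", /^[(){}\[\].,;:]/],
];

function tokenize(source: string): PdslToken[] {
  const tokens: PdslToken[] = [];
  let pos = 0;
  while (pos < source.length) {
    const rest = source.slice(pos);
    let matched = false;
    for (const [kind, pattern] of TOKEN_PATTERNS) {
      const m = pattern.exec(rest);
      if (m === null) continue;
      const text = m[0];
      pos += text.length;
      matched = true;
      if (kind !== "skip") {
        const isKeyword = kind === "constant" && KEYWORDS.indexOf(text) !== -1;
        tokens.push({ kind: isKeyword ? "keyword" : kind, text });
      }
      break;
    }
    if (!matched) {
      throw new Error(`Unexpected character '${source[pos]}' at offset ${pos}`);
    }
  }
  return tokens;
}
```

For example, `tokenize("0.7 :: sunny # comment")` yields three tokens: the number `0.7`, the operator `::`, and the constant `sunny`. An identifier such as `_Temp` is rejected because no pattern accepts a leading underscore.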
## Grammar (EBNF)
Complete PDSL grammar in Extended Backus-Naur Form:
```ebnf
(* Top-level structure *)
Program ::= Model+
Model ::= 'probabilistic_model' Identifier '{' Statement* '}'
Statement ::=
    | ProbabilisticFact
    | ProbabilisticRule
    | AnnotatedDisjunction
    | DeterministicFact
    | DeterministicRule
    | Observation
    | Query
    | LearningDirective
    | Comment
(* Probabilistic constructs *)
ProbabilisticFact ::= Probability '::' Atom
ProbabilisticRule ::= Probability '::' Atom ':-' Body
AnnotatedDisjunction ::= ProbabilisticFact (';' ProbabilisticFact)+
(* Deterministic constructs *)
DeterministicFact ::= Atom
DeterministicRule ::= Atom ':-' Body
(* Queries and observations *)
Observation ::= 'observe' Literal
Query ::= 'query' Atom
LearningDirective ::= 'learn' 'parameters' 'from' 'dataset' '(' String ')'
(* Logical expressions *)
Atom ::= Predicate
| Predicate '(' ArgumentList ')'
Predicate ::= Constant
ArgumentList ::= Term (',' Term)*
Term ::= Variable
| Constant
| Number
| String
| Atom
Body ::= Literal (',' Literal)*
Literal ::= Atom
| 'not' Atom
| '(' Body ')'
(* Lexical elements *)
Probability ::= [0-9]+ ('.' [0-9]+)?
Variable ::= [A-Z] [a-zA-Z0-9_]*
Constant ::= [a-z] [a-zA-Z0-9_]*
Number ::= [0-9]+ ('.' [0-9]+)?
String ::= '"' [^"]* '"'
Identifier ::= [a-zA-Z] [a-zA-Z0-9_]*
Comment ::= '#' [^\n]* '\n'
```
### Grammar Notes
1. **Optional statement terminators** - Newlines act as statement separators
2. **Case-sensitive** - Variables start uppercase, constants/predicates lowercase
3. **Implicit conjunction** - Comma binds tighter than semicolon
4. **Associativity** - `,` and `;` associate to the left; `:-` and `::` are non-associative (see [Operator Precedence](#operator-precedence))
## Type System
PDSL's type system consists of a few base types and the composite types built from them:
### Base Types
```typescript
type Probability = number; // [0.0, 1.0]
type Variable = string; // Starts with uppercase
type Constant = string; // Starts with lowercase
type Number = number; // Any numeric value
type String = string; // Quoted text
type Boolean = true | false;
```
### Composite Types
```typescript
// Predicate with arguments
type Predicate = {
  name: string;
  arity: number;
  arguments: Term[];
};
// Terms can be variables, constants, numbers, strings, or nested predicates
type Term = Variable | Constant | Number | String | Predicate;
// Probabilistic fact
type ProbFact = {
  probability: Probability;
  atom: Predicate;
};
// Probabilistic rule
type ProbRule = {
  probability: Probability;
  head: Predicate;
  body: Literal[];
};
// Literal (positive or negative atom)
type Literal = {
  negated: boolean;
  atom: Predicate;
};
```
### Type Predicates
Type checking predicates:
```typescript
// True if the term is a variable (name starts with an uppercase letter)
declare function isVariable(term: Term): boolean;
// True if the term is a constant (name starts with a lowercase letter)
declare function isConstant(term: Term): boolean;
// True if 0.0 <= value <= 1.0
declare function isProbability(value: number): boolean;
// True if the term contains no variables
declare function isGroundTerm(term: Term): boolean;
// True if all arguments of the atom are ground terms
declare function isGroundAtom(atom: Predicate): boolean;
```
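
One way these predicates could be implemented is sketched below. Because `Variable` and `Constant` are both plain strings in the base types, the sketch assumes a tagged representation (`TaggedTerm`, `TaggedAtom`) that the specification does not define; treat it as an illustration rather than the reference implementation.

```typescript
// Hypothetical tagged representation. The spec's Term is an untagged union in
// which Variable and Constant are both strings, so a tag is assumed here.
type TaggedTerm =
  | { kind: "variable"; name: string }
  | { kind: "constant"; name: string }
  | { kind: "number"; value: number }
  | { kind: "string"; value: string }
  | { kind: "compound"; name: string; args: TaggedTerm[] };

interface TaggedAtom {
  name: string;
  args: TaggedTerm[];
}

function isVariable(term: TaggedTerm): boolean {
  return term.kind === "variable" && /^[A-Z][a-zA-Z0-9_]*$/.test(term.name);
}

function isConstant(term: TaggedTerm): boolean {
  return term.kind === "constant" && /^[a-z][a-zA-Z0-9_]*$/.test(term.name);
}

function isProbability(value: number): boolean {
  // NaN fails both comparisons, so it is rejected as well.
  return value >= 0.0 && value <= 1.0;
}

function isGroundTerm(term: TaggedTerm): boolean {
  if (term.kind === "variable") return false;
  if (term.kind === "compound") return term.args.every(isGroundTerm);
  return true; // constants, numbers, and strings contain no variables
}

function isGroundAtom(atom: TaggedAtom): boolean {
  return atom.args.every(isGroundTerm);
}
```

Under this representation, `bird(sparrow)` is a ground atom, while `flies(X)` is not because its argument is a variable.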
## Semantics
### Probabilistic Facts
A probabilistic fact represents an uncertain ground truth:
```pdsl
0.7 :: sunny
```
**Semantics:** There's a 70% probability that `sunny` is true.
**Interpretation:** In ProbLog, this creates a probabilistic choice:
- 70% chance: `sunny = true`
- 30% chance: `sunny = false`
### Probabilistic Rules
A rule defines conditional probabilities:
```pdsl
0.9 :: flies(X) :- bird(X), not penguin(X)
```
**Semantics:** For any X that is a bird and not a penguin, there's a 90% chance X flies.
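**Interpretation:** Whenever the body holds for a given X, the rule derives `flies(X)` with probability 0.9. If no other rule derives `flies(X)`, this gives P(flies(X) | bird(X) ∧ ¬penguin(X)) = 0.9.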
### Annotated Disjunctions
Mutually exclusive alternatives:
```pdsl
0.3 :: weather(rainy); 0.5 :: weather(cloudy); 0.2 :: weather(sunny)
```
**Semantics:** At most one outcome occurs, with the given probabilities; exactly one occurs when the probabilities sum to 1.0.
**Constraints:**
- Probabilities must sum to ≤ 1.0
- If sum < 1.0, implicit "none" option has probability (1.0 - sum)
### Observations
Evidence updates beliefs:
```pdsl
observe fever
```
**Semantics:** Condition all probabilities on `fever = true`.
**Effect:** Apply Bayes' theorem to compute posterior probabilities.
### Queries
Request probability computation:
```pdsl
query flu
```
**Semantics:** Compute P(flu | evidence).
**Output:** Probability that `flu` is true given all observations.
### Deterministic Facts
Facts with probability 1.0:
```pdsl
bird(sparrow)
```
**Semantics:** Equivalent to `1.0 :: bird(sparrow)`.
**Interpretation:** Known with certainty.
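
The constructs in this section map naturally onto ProbLog statements. This specification does not define the translation, so the sketch below shows only one plausible mapping: it assumes ProbLog's `evidence/2` and `query/1` predicates, `\+` for negation, and a period after every statement. The types `PdslStatement` and `PdslLiteral` and the function `emitProbLog` are hypothetical names, and atoms are kept as pre-rendered strings for brevity.

```typescript
// Hypothetical statement representation used only for this sketch.
interface PdslLiteral {
  negated: boolean;
  atom: string;
}

type PdslStatement =
  | { kind: "probFact"; probability: number; atom: string }
  | { kind: "probRule"; probability: number; head: string; body: PdslLiteral[] }
  | { kind: "fact"; atom: string }
  | { kind: "disjunction"; alternatives: Array<{ probability: number; atom: string }> }
  | { kind: "observe"; literal: PdslLiteral }
  | { kind: "query"; atom: string };

// ProbLog writes negation as \+ (assumed mapping for PDSL's `not`).
function emitLiteral(literal: PdslLiteral): string {
  return literal.negated ? `\\+${literal.atom}` : literal.atom;
}

function emitProbLog(stmt: PdslStatement): string {
  switch (stmt.kind) {
    case "probFact":
      return `${stmt.probability}::${stmt.atom}.`;
    case "probRule":
      return `${stmt.probability}::${stmt.head} :- ${stmt.body.map(emitLiteral).join(", ")}.`;
    case "fact":
      return `${stmt.atom}.`;
    case "disjunction":
      return stmt.alternatives.map((a) => `${a.probability}::${a.atom}`).join("; ") + ".";
    case "observe":
      // Negated observations become evidence(atom, false).
      return `evidence(${stmt.literal.atom}, ${stmt.literal.negated ? "false" : "true"}).`;
    case "query":
      return `query(${stmt.atom}).`;
  }
}
```

With this mapping, `observe fever` becomes `evidence(fever, true).` and `query flu` becomes `query(flu).`.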
## Operator Precedence
From highest to lowest precedence:
| Precedence | Operator | Associativity | Description |
|------------|----------|---------------|-------------|
| 1 | `( )` | N/A | Grouping |
| 2 | `not` | Right | Negation |
| 3 | `,` | Left | Conjunction (AND) |
| 4 | `;` | Left | Disjunction (OR) / AD separator |
| 5 | `:-` | Non-associative | Implication |
| 6 | `::` | Non-associative | Probabilistic annotation |
### Examples
```pdsl
# Parentheses override precedence
not (a, b) # not(a AND b)
(not a), b # (not a) AND b
# Conjunction binds tighter than disjunction
a, b; c, d # (a AND b) OR (c AND d)
# Negation has high precedence
not a, b # (not a) AND b
# Probabilistic annotation has lowest precedence
0.7 :: a :- b, c # 0.7 :: (a :- (b AND c))
```
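
The precedence table can be read as a recursive-descent parser with one function per level. The TypeScript sketch below covers only `( )`, `not`, `,`, and `;` over argument-free atoms; `BodyExpr`, `parseBodyExpr`, and `showExpr` are hypothetical names, and characters the tokenizing regex does not recognize are silently skipped to keep the example short.

```typescript
type BodyExpr =
  | { op: "atom"; name: string }
  | { op: "not"; arg: BodyExpr }
  | { op: "and" | "or"; left: BodyExpr; right: BodyExpr };

function parseBodyExpr(source: string): BodyExpr {
  const tokens = source.match(/[a-z][a-zA-Z0-9_]*|[(),;]/g) ?? [];
  let pos = 0;

  const peek = (): string | undefined => tokens[pos];
  const next = (): string => {
    const tok = tokens[pos];
    if (tok === undefined) throw new Error("Unexpected end of input");
    pos += 1;
    return tok;
  };

  // Precedence 4: ';' binds loosest and associates left.
  function parseOr(): BodyExpr {
    let left = parseAnd();
    while (peek() === ";") {
      next();
      left = { op: "or", left, right: parseAnd() };
    }
    return left;
  }

  // Precedence 3: ',' binds tighter than ';' and associates left.
  function parseAnd(): BodyExpr {
    let left = parseNot();
    while (peek() === ",") {
      next();
      left = { op: "and", left, right: parseNot() };
    }
    return left;
  }

  // Precedence 2: prefix 'not' is right-associative.
  function parseNot(): BodyExpr {
    if (peek() === "not") {
      next();
      return { op: "not", arg: parseNot() };
    }
    return parsePrimary();
  }

  // Precedence 1: parentheses or a single atom.
  function parsePrimary(): BodyExpr {
    const tok = next();
    if (tok === "(") {
      const inner = parseOr();
      if (next() !== ")") throw new Error("Expected ')'");
      return inner;
    }
    if (/^[a-z]/.test(tok)) return { op: "atom", name: tok };
    throw new Error(`Unexpected token '${tok}'`);
  }

  const result = parseOr();
  if (pos < tokens.length) throw new Error(`Unexpected token '${tokens[pos]}'`);
  return result;
}

// Render with explicit parentheses so the grouping is visible.
function showExpr(e: BodyExpr): string {
  switch (e.op) {
    case "atom": return e.name;
    case "not": return `(not ${showExpr(e.arg)})`;
    case "and": return `(${showExpr(e.left)} AND ${showExpr(e.right)})`;
    case "or": return `(${showExpr(e.left)} OR ${showExpr(e.right)})`;
  }
}
```

For instance, `showExpr(parseBodyExpr("a, b; c, d"))` returns `((a AND b) OR (c AND d))`, and `showExpr(parseBodyExpr("not a, b"))` returns `((not a) AND b)`.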
## Scoping Rules
### Variable Scope
Variables are scoped to the rule or query in which they appear:
```pdsl
# X is scoped to this rule
0.9 :: flies(X) :- bird(X)
# Different X, different scope
0.8 :: swims(X) :- fish(X)
```
**Rule:** Variables in the head must appear in the body (safety condition).
❌ **Unsafe:**
```pdsl
flies(X) :- bird(Y) # X not bound in body
```
✅ **Safe:**
```pdsl
flies(X) :- bird(X)
```
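
The safety condition amounts to a set-difference check between head and body variables. The following TypeScript sketch works on atom text directly; `variablesIn` and `unsafeHeadVariables` are hypothetical helpers. It treats variables that occur in negated literals as bound, which is an assumption, since this specification does not say whether negated occurrences count.

```typescript
// Collect variable names (identifiers starting with an uppercase letter).
// String literals are removed first so their contents are not misread.
function variablesIn(text: string): string[] {
  const withoutStrings = text.replace(/"[^"]*"/g, "");
  return withoutStrings.match(/\b[A-Z][a-zA-Z0-9_]*/g) ?? [];
}

// Return the head variables that never appear in the body (empty if the rule is safe).
function unsafeHeadVariables(head: string, body: string[]): string[] {
  const bound = new Set<string>();
  for (const literal of body) {
    for (const v of variablesIn(literal)) bound.add(v);
  }
  return variablesIn(head).filter(
    (v, i, all) => !bound.has(v) && all.indexOf(v) === i, // drop duplicates
  );
}
```

`unsafeHeadVariables("flies(X)", ["bird(Y)"])` reports `X`, while `unsafeHeadVariables("flies(X)", ["bird(X)"])` returns an empty list.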
### Predicate Scope
Predicates are globally scoped within a model:
```pdsl
probabilistic_model Example {
    bird(sparrow)               # Define bird/1
    0.9 :: flies(X) :- bird(X)  # Use bird/1
}
```
### Model Scope
Models create namespaces (future feature for imports):
```pdsl
probabilistic_model Medical {
    disease(flu)
}
probabilistic_model Weather {
    # Cannot access Medical.disease without import
}
```
## Probability Constraints
### Valid Probabilities
All probabilities must satisfy:
```
0.0 ≤ P ≤ 1.0
```
**Validation:** Checked at parse time.
❌ **Invalid:**
```pdsl
1.5 :: impossible # Error: P > 1.0
-0.3 :: negative # Error: P < 0.0
```
### Annotated Disjunction Constraints
For annotated disjunctions `P1 :: A1; P2 :: A2; ... ; Pn :: An`:
```
P1 + P2 + ... + Pn ≤ 1.0
```
**Validation:** Checked at parse time.
❌ **Invalid:**
```pdsl
0.6 :: a; 0.5 :: b # Error: 0.6 + 0.5 = 1.1 > 1.0
```
✅ **Valid:**
```pdsl
0.6 :: a; 0.4 :: b # OK: 0.6 + 0.4 = 1.0
0.5 :: a; 0.3 :: b # OK: 0.5 + 0.3 = 0.8 < 1.0 (0.2 implicit)
```
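
A validator for this constraint might look like the sketch below. The function name `checkAnnotatedDisjunction`, the result shape, and the `epsilon` tolerance are all assumptions; the tolerance is there because sums of binary floating-point values can land slightly above 1.0 even when the written decimals sum to exactly 1.0.

```typescript
interface AdCheck {
  valid: boolean;
  sum: number;
  implicitNone: number; // probability left for the implicit "none" outcome
}

function checkAnnotatedDisjunction(probabilities: number[], epsilon = 1e-9): AdCheck {
  const sum = probabilities.reduce((acc, p) => acc + p, 0);
  const eachInRange = probabilities.every((p) => p >= 0 && p <= 1);
  const valid = eachInRange && sum <= 1 + epsilon;
  // Clamp at 0 so rounding never produces a tiny negative remainder.
  return { valid, sum, implicitNone: valid ? Math.max(0, 1 - sum) : 0 };
}
```

Applied to the examples above, `[0.6, 0.5]` is rejected, `[0.6, 0.4]` is accepted with no remainder, and `[0.5, 0.3]` is accepted with roughly 0.2 left for the implicit "none" outcome.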
### Independence Assumptions
Multiple probabilistic facts are independent:
```pdsl
0.3 :: rain
0.8 :: cloudy
```
**Semantics:** P(rain ∧ cloudy) = P(rain) × P(cloudy) = 0.24
To model dependence, use rules:
```pdsl
0.3 :: rain
0.9 :: cloudy :- rain
0.5 :: cloudy :- not rain
```
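
The effect of these rules can be checked with a short calculation. Because the two rule bodies (`rain` and `not rain`) are mutually exclusive, the marginal of `cloudy` follows from the law of total probability. The TypeScript below is a worked example only, and `marginalFromExclusiveRules` is a hypothetical helper.

```typescript
// Marginal of a rule head when two rules have mutually exclusive bodies
// (here: `rain` and `not rain`), via the law of total probability.
function marginalFromExclusiveRules(
  pCondition: number,
  pHeadIfTrue: number,
  pHeadIfFalse: number,
): number {
  return pCondition * pHeadIfTrue + (1 - pCondition) * pHeadIfFalse;
}

const pRain = 0.3;
// 0.3 * 0.9 + 0.7 * 0.5 = 0.27 + 0.35, about 0.62
const pCloudy = marginalFromExclusiveRules(pRain, 0.9, 0.5);
// Joint probability of rain and cloudy: 0.3 * 0.9, about 0.27
const pRainAndCloudy = pRain * 0.9;
// Product of the marginals: 0.3 * 0.62, about 0.186
const productOfMarginals = pRain * pCloudy;
```

Since the joint probability (about 0.27) differs from the product of the marginals (about 0.186), `rain` and `cloudy` are no longer independent under this model.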
## Type Inference
PDSL uses simple type inference:
### Constant Type Inference
```pdsl
bird(sparrow)
```
**Inferred types:**
- `bird`: predicate of arity 1
- `sparrow`: constant (ground term)
### Variable Type Inference
```pdsl
flies(X) :- bird(X)
```
**Inferred types:**
- `X`: variable (must be bound by `bird(X)`)
- Type of `X` determined by domain of `bird/1`
### Arity Inference
```pdsl
person(alice)
person(bob, 30) # Error: arity mismatch
```
**Rule:** All uses of predicate `p` must have same arity.
**Validation:** Checked at parse time.
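
An arity check can be implemented by remembering the first use of each predicate and comparing later uses against it. In the sketch below, the `AtomUse` record, the function `findArityMismatches`, and the message wording are hypothetical and not taken from the reference implementation.

```typescript
interface AtomUse {
  predicate: string;
  arity: number;
  line: number;
}

// Compare every use of a predicate against the arity of its first use.
function findArityMismatches(uses: AtomUse[]): string[] {
  const firstSeen = new Map<string, AtomUse>();
  const errors: string[] = [];
  for (const use of uses) {
    const first = firstSeen.get(use.predicate);
    if (first === undefined) {
      firstSeen.set(use.predicate, use);
    } else if (first.arity !== use.arity) {
      errors.push(
        `Line ${use.line}: ${use.predicate}/${use.arity} conflicts with ` +
        `${use.predicate}/${first.arity} (line ${first.line})`,
      );
    }
  }
  return errors;
}
```

For the example above, passing the uses `person/1` on line 1 and `person/2` on line 2 produces a single message that names both arities.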
### Probability Type Inference
```pdsl
0.7 :: sunny
p1 :: rain # p1 must be in [0, 1]
```
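**Rule:** Probability literals must be numeric values in [0, 1].
**Named probabilities** (like `p1`) are placeholders for parameter learning and must be declared or learned. The `Probability` rule in the v0.1.0 grammar admits only numeric literals, so named probabilities are not yet covered by the EBNF.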
## Validation Rules
### Syntactic Validation
1. **Variable naming:** Variables start with uppercase
2. **Constant naming:** Constants/predicates start with lowercase
3. **Probability range:** All probabilities in [0.0, 1.0]
4. **Annotated disjunction sum:** Sum ≤ 1.0
5. **Matching parentheses:** All `(` have matching `)`
6. **Statement termination:** Statements separated by newlines, optionally terminated by `.`
### Semantic Validation
1. **Variable safety:** All head variables appear in body
2. **Arity consistency:** All uses of predicate have same arity
3. **Grounding:** Queries must be ground or have bound variables
4. **Observation validity:** Observed atoms must be defined
5. **Cyclic dependencies:** Detect and warn about cycles (optional)
### Type Validation
1. **Probability type:** Numeric values in probabilistic annotations
2. **Predicate arguments:** Terms match expected types
3. **Variable binding:** Variables bound before use
4. **Constant usage:** Constants not used as predicates
## Well-Formed Programs
A PDSL program is well-formed if:
1. ✅ **Syntactically valid** - Parses according to grammar
2. ✅ **Type-correct** - All types are valid and consistent
3. ✅ **Probability-valid** - All probabilities in [0, 1], ADs sum to ≤ 1
4. ✅ **Variable-safe** - All variables properly bound
5. ✅ **Arity-consistent** - Predicates used with consistent arity
## Error Messages
PDSL error messages identify the offending line, mark the offending span, and suggest a fix:
### Syntax Errors
```
Error: Invalid probability value
Line 3: 1.5 :: unlikely
        ^^^
Probability must be between 0.0 and 1.0
Suggestion: Did you mean 0.15?
```
### Type Errors
```
Error: Variable not bound in rule body
Line 5: flies(X) :- bird(Y)
              ^
Variable X in head must appear in body
Suggestion: Change to flies(Y) :- bird(Y)
```
### Probability Constraint Errors
```
Error: Annotated disjunction sum exceeds 1.0
Line 7: 0.6 :: a; 0.5 :: b
        ^^^^^^^^^^^^^^^^^^
Sum: 0.6 + 0.5 = 1.1 > 1.0
Suggestion: Probabilities must sum to at most 1.0
```
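
Messages in this layout can be produced from a small diagnostic record. The TypeScript sketch below is one possible formatter; the `PdslDiagnostic` fields and the function `formatDiagnostic` are assumed names, and the caret line is padded so that it sits under the offending span.

```typescript
interface PdslDiagnostic {
  title: string;       // e.g. "Invalid probability value"
  lineNumber: number;  // 1-based line in the source file
  sourceLine: string;  // text of the offending line
  start: number;       // 0-based column where the offending span begins
  length: number;      // number of characters to underline
  detail: string;
  suggestion: string;
}

function formatDiagnostic(d: PdslDiagnostic): string {
  const prefix = `Line ${d.lineNumber}: `;
  // Pad by the prefix width plus the column so the carets align with the span.
  const underline = " ".repeat(prefix.length + d.start) + "^".repeat(d.length);
  return [
    `Error: ${d.title}`,
    prefix + d.sourceLine,
    underline,
    d.detail,
    `Suggestion: ${d.suggestion}`,
  ].join("\n");
}
```

Keeping the span as a column and length, rather than a pre-rendered string, lets the same record drive other front ends such as editor integrations.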
## Extensions and Future Work
### Planned Features (v0.2.0)
1. **Multi-line comments:** `/* ... */`
2. **List syntax:** `[1, 2, 3]` for aggregates
3. **Arithmetic:** `X is Y + 1` for computations
4. **Comparison:** `X > Y`, `X =< Y` for numeric comparisons
5. **Aggregates:** `count`, `sum`, `avg` for probabilistic aggregation
6. **Modules:** Import/export between models
7. **Type annotations:** Explicit type declarations
### Experimental Features (v0.3.0+)
1. **Continuous distributions:** Support for Gaussian, Beta, etc.
2. **Utility theory:** Decision-theoretic reasoning
3. **Temporal reasoning:** Time-indexed predicates
4. **Causal inference:** `do` operator for interventions
## Formal Semantics
### Distribution Semantics
PDSL follows ProbLog's distribution semantics:
1. **Probabilistic facts** define a probability distribution over possible worlds
2. **Rules** are evaluated in each possible world
3. **Queries** compute marginal probabilities over all worlds
4. **Evidence** restricts to worlds where observations hold
### Probability Computation
Given a PDSL program P and query Q:
```
P(Q) = Σ_{ω ∈ Ω} P(ω) × [[Q]]_ω
```
Where:
- Ω = set of all possible worlds
- P(ω) = probability of world ω
- [[Q]]_ω = 1 if Q is true in world ω, 0 otherwise
### Conditional Probability
With evidence E:
```
P(Q | E) = P(Q ∧ E) / P(E)
```
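
For small programs, both formulas can be evaluated by brute-force enumeration of possible worlds. The TypeScript sketch below does this for independent probabilistic facts, with rules and queries written as functions over a world. The model used (`rain`, `sprinkler`, `wet`) is hypothetical, all function names are assumptions, and enumeration is exponential in the number of facts, so this illustrates the semantics rather than a practical inference method.

```typescript
type PossibleWorld = Record<string, boolean>;

interface WeightedWorld {
  world: PossibleWorld;
  probability: number;
}

// Each independent probabilistic fact doubles the number of worlds.
function enumerateWorlds(facts: Record<string, number>): WeightedWorld[] {
  let worlds: WeightedWorld[] = [{ world: {}, probability: 1 }];
  for (const fact of Object.keys(facts)) {
    const p = facts[fact];
    const extended: WeightedWorld[] = [];
    for (const w of worlds) {
      extended.push({ world: { ...w.world, [fact]: true }, probability: w.probability * p });
      extended.push({ world: { ...w.world, [fact]: false }, probability: w.probability * (1 - p) });
    }
    worlds = extended;
  }
  return worlds;
}

// P(Q): sum the probabilities of the worlds in which Q holds.
function probabilityOf(
  facts: Record<string, number>,
  holds: (w: PossibleWorld) => boolean,
): number {
  return enumerateWorlds(facts)
    .filter((w) => holds(w.world))
    .reduce((sum, w) => sum + w.probability, 0);
}

// P(Q | E) = P(Q and E) / P(E)
function conditionalProbability(
  facts: Record<string, number>,
  query: (w: PossibleWorld) => boolean,
  evidence: (w: PossibleWorld) => boolean,
): number {
  const pEvidence = probabilityOf(facts, evidence);
  if (pEvidence === 0) throw new Error("Evidence has probability 0");
  return probabilityOf(facts, (w) => query(w) && evidence(w)) / pEvidence;
}

// Hypothetical model: 0.3 :: rain, 0.5 :: sprinkler, wet :- rain, wet :- sprinkler
const exampleFacts: Record<string, number> = { rain: 0.3, sprinkler: 0.5 };
const isRaining = (w: PossibleWorld): boolean => w["rain"] === true;
const isWet = (w: PossibleWorld): boolean => w["rain"] === true || w["sprinkler"] === true;

const pWet = probabilityOf(exampleFacts, isWet);                               // about 0.65
const pRainGivenWet = conditionalProbability(exampleFacts, isRaining, isWet);  // 0.3 / 0.65
```

Here the deterministic rules for `wet` are evaluated inside each world, which mirrors the way rules are evaluated per world under the distribution semantics.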
## Reference Implementation
See [PROBABILISTIC_DSL_PARSER_DESIGN.md](PROBABILISTIC_DSL_PARSER_DESIGN.md) for implementation details.
## Examples
See [PROBABILISTIC_DSL_EXAMPLES.md](PROBABILISTIC_DSL_EXAMPLES.md) for comprehensive examples.
## BNF Summary
Quick reference BNF (simplified):
```bnf
<program> ::= <model>+
<model> ::= "probabilistic_model" <id> "{" <stmt>* "}"
<stmt> ::= <prob_fact> | <prob_rule> | <fact> | <obs> | <query>
<prob_fact> ::= <prob> "::" <atom>
<prob_rule> ::= <prob> "::" <atom> ":-" <body>
<fact> ::= <atom>
<obs> ::= "observe" <literal>
<query> ::= "query" <atom>
<atom> ::= <pred> | <pred> "(" <terms> ")"
<body> ::= <literal> ("," <literal>)*
<literal> ::= <atom> | "not" <atom>
<terms> ::= <term> ("," <term>)*
<term> ::= <var> | <const> | <number> | <atom>
<prob> ::= [0-9]+ ("." [0-9]+)?
<var> ::= [A-Z][a-zA-Z0-9_]*
<const> ::= [a-z][a-zA-Z0-9_]*
<pred> ::= <const>
```
---
**This specification defines PDSL v0.1.0. For updates, see the project repository.**