The THP Language Specification

This series of pages defines the THP programming language.

THP’s grammar is context-dependent.

The syntax is specified using a mix of Extended Backus-Naur Form (EBNF) and regular-expression notation:

; comments

syntax        = concatenation
concatenation = alternation grouping

alternation   = "a" | "b"
              | "c"
grouping      = ("a", "b")

optional      = "a"?
one_or_more   = "a"+
zero_or_more  = "a"*

range         = "1".."9"
literal       = "a"

Compiler architecture

The compiler consists of 5 common phases:

Source Code representation

Source code is encoded in UTF-8, and a single Unicode code point is treated as a single character. Because THP is implemented in the Rust programming language, it follows Rust’s rules for handling UTF-8.
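To illustrate the difference between bytes and characters, here is a minimal Rust sketch. The helper name `count_characters` is hypothetical and does not come from the THP sources. It relies only on the standard `str::chars` iterator, which yields one `char` (a Unicode scalar value) per step regardless of how many bytes that value occupies in UTF-8.

```rust
/// Counts characters as this section describes them: one Unicode
/// code point (a Rust `char`) is one character.
/// Hypothetical helper for illustration, not part of the THP compiler.
fn count_characters(source: &str) -> usize {
    source.chars().count()
}

/// Counts the bytes of the UTF-8 encoding of the same text.
fn count_bytes(source: &str) -> usize {
    source.len()
}
```

For text that contains only ASCII, both counts are equal. They diverge as soon as a string literal contains a character outside the ASCII range.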

Basic characters

Although the source code must be encoded in UTF-8, most of the actual source code will use only the basic 128 ASCII characters. String contents may contain any Unicode code point.

underscore    = "_"

decimal_digit = "0".."9"
binary_digit  = "0" | "1"
octal_digit   = "0".."7"
hex_digit     = decimal_digit | "a".."f" | "A".."F"

lowercase_letter = "a".."z"
uppercase_letter = "A".."Z"
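The productions above map directly onto character predicates. The following Rust sketch is one possible encoding. The function names are assumptions chosen to mirror the production names, not identifiers taken from the THP compiler.

```rust
// Each function mirrors one production from the grammar above.
// The names are illustrative assumptions.

fn is_underscore(c: char) -> bool {
    c == '_'
}

fn is_decimal_digit(c: char) -> bool {
    matches!(c, '0'..='9')
}

fn is_binary_digit(c: char) -> bool {
    matches!(c, '0' | '1')
}

fn is_octal_digit(c: char) -> bool {
    matches!(c, '0'..='7')
}

// hex_digit = decimal_digit | "a".."f" | "A".."F"
fn is_hex_digit(c: char) -> bool {
    is_decimal_digit(c) || matches!(c, 'a'..='f' | 'A'..='F')
}

fn is_lowercase_letter(c: char) -> bool {
    matches!(c, 'a'..='z')
}

fn is_uppercase_letter(c: char) -> bool {
    matches!(c, 'A'..='Z')
}
```

Because the ranges are restricted to ASCII, letters such as `ñ` are rejected by the letter predicates, which matches the ranges written in the productions.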

Whitespace

THP is partially whitespace-sensitive. The lexer uses three tokens, Indent, Dedent and NewLine, to determine when an expression spans multiple lines.

The lexer stores the indentation level of every line and, when scanning the next line, compares the previous indentation to the new one. If the amount of whitespace is greater than before, it emits an Indent token. If it is lower, it emits a Dedent token. If it is the same, it emits nothing.

1 + 2
    + 3
    + 4

The previous code would emit the following tokens: 1 + 2 NewLine Indent + 3 NewLine + 4 Dedent

Additionally, inconsistent indentation is a lexical error. The lexer stores all previous indentation levels in a stack, and reports an error if a decrease in indentation doesn’t match a previous level.

if true {   // 0 indentation
    print() // 4 indentation
  print()   // 2 indentation. Error. There is no 2-indentation level
}
Lexical error: Indentation error: expected 0 spaces, found 2 at line 3:1
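The stack-based check described above can be sketched in Rust as follows. This is a simplified illustration, not the actual THP lexer. The names `IndentToken`, `IndentError` and `indentation_tokens` are hypothetical. The sketch also assumes that indentation consists of spaces only, that blank lines are skipped, that one Dedent is emitted per closed level, and that any open levels are closed at the end of the input.

```rust
#[derive(Debug, PartialEq)]
enum IndentToken {
    Indent,
    Dedent,
}

#[derive(Debug, PartialEq)]
struct IndentError {
    line: usize,     // 1-based line number
    expected: usize, // nearest enclosing indentation level
    found: usize,    // indentation actually found
}

/// Produces only the Indent/Dedent tokens for `source`
/// (assumption: all other tokens are handled elsewhere).
fn indentation_tokens(source: &str) -> Result<Vec<IndentToken>, IndentError> {
    // Stack of active indentation levels; the bottom entry is always 0.
    let mut levels: Vec<usize> = vec![0];
    let mut tokens = Vec::new();

    for (index, line) in source.lines().enumerate() {
        if line.trim().is_empty() {
            continue; // assumption: blank lines do not affect indentation
        }
        let found = line.chars().take_while(|c| *c == ' ').count();
        let current = *levels.last().unwrap();

        if found > current {
            levels.push(found);
            tokens.push(IndentToken::Indent);
        } else if found < current {
            // Close every level deeper than the new indentation.
            while *levels.last().unwrap() > found {
                levels.pop();
                tokens.push(IndentToken::Dedent);
            }
            // The new indentation must match a previously seen level.
            let expected = *levels.last().unwrap();
            if expected != found {
                return Err(IndentError { line: index + 1, expected, found });
            }
        }
    }

    // Close any levels still open at the end of the input.
    while levels.len() > 1 {
        levels.pop();
        tokens.push(IndentToken::Dedent);
    }
    Ok(tokens)
}
```

With this sketch, the three-line `1 + 2` example yields one Indent and one Dedent, and the `if true` example is rejected at line 3 because 2 spaces match no level on the stack.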

All productions of the grammar ignore whitespace/indentation, except those involved in semicolon inference.

Statement termination / Semicolon inference

Whitespace determines where a statement ends and a new one begins only inside a block of code. Everywhere else, whitespace is ignored.

Statements in THP end when a new line is encountered:

// The statement ends         | here, on the newline
val value = (123 + 456) * 0.75
// Each line contains a different statement. They all end on their new lines

var a = 1 + 2   // a = 3
+ 3             // this is not part of `a`, this is a different statement
Syntax error: Expected an statement or an expresion at the top level. at line 4:1

This is true even if the line ends with an operator:

// These are still different statements

var a = 1 + 2 +     // This is now a compile error: there is a hanging operator
3                   // This is still a different statement

Parenthesis

Exception 1: While a parenthesis is open, all whitespace is ignored until the matching closing parenthesis.

// open parenthesis found, all whitespace is ignored until the closing
name.contains(
"weird"
    )
Syntax error: Unexpected token `.`, expected a new line at line 2:5

However, a parenthesis only has this effect if it is opened on the same line as the expression it belongs to.

// Still 2 statements, because the parenthesis is in a new line
print
(
    "hello"
)

// Now it's one single statement
print(
    "hello"
)
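One way to picture the parenthesis rule is a counter of open parentheses: a new line ends the statement only when the counter is zero. The Rust sketch below works on raw text rather than tokens, which is a simplification. The function name `split_statements` is hypothetical, and the sketch assumes the input contains no comments and no parentheses inside string literals.

```rust
/// Splits `source` into statements, joining the lines of each
/// statement with a single space.
/// Simplified illustration, not the THP parser.
fn split_statements(source: &str) -> Vec<String> {
    let mut statements = Vec::new();
    let mut current: Vec<&str> = Vec::new();
    let mut depth: usize = 0; // number of currently open parentheses

    for line in source.lines() {
        let trimmed = line.trim();
        if trimmed.is_empty() {
            continue;
        }
        current.push(trimmed);

        for c in trimmed.chars() {
            match c {
                '(' => depth += 1,
                ')' => depth = depth.saturating_sub(1),
                _ => {}
            }
        }

        // A new line terminates the statement only when nothing is open.
        if depth == 0 {
            statements.push(current.join(" "));
            current.clear();
        }
    }

    // Keep an unterminated trailing statement instead of dropping it.
    if !current.is_empty() {
        statements.push(current.join(" "));
    }
    statements
}
```

Because a statement already ends at the new line after `print`, a parenthesis that opens on the following line starts a separate statement. The same-line requirement falls out of the counter without extra logic.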
    

Indented binary operator

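Exception 2: When a line ends with a binary operator and the next line is indented, or when an indented line begins with a binary operator, the statement continues on that line.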

val sum = 1 + 2 +   // The line ends with a binary operator
    3               // There is indentation
val sum = 1 + 2
    + 3             // Indentation and a binary operator
Fatal Compiler error: thread 'main' panicked at src/syntax/parsers/expression/utils.rs:79:18: internal error: entered unreachable code: Illegal parser state: Expected DEDENT (count: 1) note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

In these cases, all whitespace will be ignored until the indentation returns to the initial level.

// This method chain is a single statement because of the indentation
val person = PersonBuilder()
    .set_name("john")
    .set_lastname("doe")
    .set_age(32)
    .set_children(2)
    .build()

// Here indentation returns, and a new statement begins
print(person)
Syntax error: Expected an expression after the equal `=` operator at line 2:12
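The indentation rule can be sketched as a line-grouping pass: a statement remembers the indentation of its first line, and deeper lines are attached to it once a binary operator links them. The Rust code below is an illustration under several assumptions, not the THP parser. The names `group_statements` and `joins_with_operator` are hypothetical. The operator list is a small sample, and it treats `.` as a binary operator so that method chains qualify. Comments are assumed to have been removed already, and indentation is assumed to consist of spaces only.

```rust
// Sample operator set (assumption, not the full THP operator list).
const BINARY_OPERATORS: [&str; 5] = ["+", "-", "*", "/", "."];

/// True when `previous` ends with, or `next` begins with, a binary operator.
fn joins_with_operator(previous: &str, next: &str) -> bool {
    BINARY_OPERATORS
        .iter()
        .any(|op| previous.ends_with(*op) || next.starts_with(*op))
}

/// Groups lines into statements, joining continuation lines with a space.
fn group_statements(source: &str) -> Vec<String> {
    let mut statements: Vec<String> = Vec::new();
    let mut current: Vec<&str> = Vec::new();
    let mut base = 0usize; // indentation of the statement's first line
    let mut continuing = false; // true once a continuation has started

    for line in source.lines() {
        let text = line.trim();
        if text.is_empty() {
            continue;
        }
        let indent = line.chars().take_while(|c| *c == ' ').count();

        if let Some(previous) = current.last().copied() {
            let deeper = indent > base;
            if deeper && (continuing || joins_with_operator(previous, text)) {
                // Still inside the indented expression.
                continuing = true;
                current.push(text);
                continue;
            }
            // Indentation returned to the initial level: the statement ends.
            statements.push(current.join(" "));
            current.clear();
        }

        base = indent;
        continuing = false;
        current.push(text);
    }

    if !current.is_empty() {
        statements.push(current.join(" "));
    }
    statements
}
```

Under these assumptions, the builder chain above is grouped into a single statement and `print(person)` starts a new one, while an unindented `+ 3` remains a separate statement.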