Compiler

Scope: This document describes how Hachi’s compiler/transpiler works. It is written as a “tech spec for outsiders”: readers who want to study Hachi’s current compiler pipeline.

Primary implementation directories:

hachi/src/ - compiler + interpreter implementation
hachi/h/ - public headers for compiler runtime
hachi/src/Actions/ - executable “Action” nodes used by both interpreter and C++ codegen
hachi/src/HachiStdLib.cpp - standard library population into the global namespace

1. High-level architecture

Hachi is implemented as a pipeline that takes source text through the following stages:

Tokens (lexing)
AST nodes (parsing, using operator precedence + bracket handling)
Action tree (name/type resolution against namespaces + operator overload resolution)
Either:
- Interpreter execution (Action tree evaluated directly), or
- C++ codegen (Action tree emits a C++ program via CppProgram), optionally compiled and executed.

The central orchestrator is HachiProgram (hachi/src/HachiProgram.cpp).

1.1 The four “tree” layers

Layer	Class / representation	Built by	Purpose
Tokens	`Token` (`hachi/h/Token.h`)	`lexString()` (`hachi/src/Lexer.cpp`)	Line/column-aware token stream with operator tokens attached
AST	`AstNode` variants (`hachi/h/AstNode.h`)	`astNodeFromTokens()` (`hachi/src/Parser.cpp`)	Structural parse using operators/brackets
Action tree	`Action` / `ActionData` (`hachi/h/Action.h`)	`AstNodeBase::getAction()` implementations (`hachi/src/AstNode.cpp`)	Typed executable nodes + C++ emission
Output	Interpreter runtime OR `CppProgram` (`hachi/h/CppProgram.h`)	`ActionData::execute()` or `ActionData::addToProg()`	Run the program or produce C++

2. Entrypoint and CLI modes

The CLI entrypoint is main() in hachi/src/main.cpp.

2.1 Modes

Only a single Hachi source file is supported; multiple inputs are rejected.

Mode	Trigger	What happens
REPL	no file args	Reads lines into `/tmp/tmp-hachi.8`, repeatedly runs `hachi /tmp/tmp-hachi.8 -go`
Interpret	default when a file is given and no `-cpp/-build/-go` flags	Builds Action tree then executes it (`HachiProgram::execute()`)
Transpile to C++	`-cpp <file>`	Builds Action tree then emits C++ (`HachiProgram::getCpp()`), writes to file
Build binary	`-build <file>` or `-b <file>`	Emits C++ then compiles with `clang++ ... -o <file>`
Build+run	`-go` (also aliases `-e/-execute/-ko`)	Emits C++ then compiles then runs `./tmp_hc_compiled ...`
Leak-sanitized build	`-buildml <file>`	Compiles with AddressSanitizer flags
One-liner	`-c "<code>"`	Writes code to `/tmp/tmp-hachi.8`, then runs `hachi /tmp/tmp-hachi.8 -go`

main.cpp builds a clang++ command string and invokes the C++ compiler through system().

3. Pipeline in `HachiProgram::resolveProgram`

HachiProgram::resolveProgram(filename, debug) performs the compilation pipeline:

AllOperators::init()
- Global ops pointer is assigned to a new AllOperators instance.
- Operator tokens and their precedence are declared in hachi/h/AllOperators.h.
populateHachiStdLib()
- Populates the global namespace with built-in types and actions (standard library).
- Implemented in hachi/src/HachiStdLib.cpp (large file).
Load source file: SourceFile(filename, debug)
- Loads file contents and appends a trailing newline (hachi/src/SourceFile.cpp).
Lex: lexString(file, tokens)
- Converts source bytes to a vector of Token objects.
Parse: astRoot = astNodeFromTokens(tokens, 0, tokens.size()-1)
- Produces an AST rooted at an AstNode.
Type + name resolution pass: astRoot->setInput(globalNamespace, true, Void, Void)
- Sets up a root namespace input and recursively configures AST nodes with:
  - namespace
  - dynamic flag
  - input types for each subtree
Build Action tree: actionRoot = astRoot->getAction()
- Each AST node resolves into one or more Action nodes.
Final-destruction wrapper: actionRoot = globalNamespace->wrapInDestroyer(actionRoot)
- Ensures the overall program result is routed through __destroy__ if one exists.

If any stage throws HachiError, it is logged and compilation continues in an error-safe state (often falling back to AstVoid).
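
The log-and-continue behavior can be pictured with a small helper. This is only a sketch under stated assumptions: SketchError stands in for HachiError, and runStage and its fallback parameter are hypothetical names, not functions from HachiProgram.cpp.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for HachiError.
struct SketchError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Runs one pipeline stage. If it throws, the message is logged and a safe
// fallback (for example an empty AST) is returned so later stages still run.
template <typename T>
T runStage(const std::function<T()>& stage, T fallback, std::vector<std::string>& log) {
    assert(stage && "stage must be callable");
    try {
        return stage();
    } catch (const SketchError& e) {
        log.push_back(e.what());
        return fallback;
    }
}
```

In this picture, a failed parse would hand the next stage a void node instead of aborting, which matches the "falling back to AstVoid" note above.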

4. Lexing (tokenization)

Implemented in hachi/src/Lexer.cpp.

4.1 Character classifier

CharClassifier categorizes each character into:

whitespace (' ', '\t', '\r')
newline ('\n') and “line break” (';')
letters ([A-Za-z_])
digits ([0-9] plus special handling for . if followed by a digit)
operator characters (any character that appears in any operator text in ops->getOpsMap())
string delimiter ('"')
single-line comment ('#')
block comment markers:
- start: //
- end: a backslash whose previous character is also backslash (\\)

Note: block comments in this snapshot are unusual. The start marker is //, but the end is detected by \\ (two consecutive backslashes). This is the literal behavior of CharClassifier::get().
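
A simplified classifier along these lines might look as follows. It is a sketch, not the code in Lexer.cpp: the CharKind enum, the classify function and the order of the checks are assumptions, and the stateful block-comment handling is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

enum class CharKind {
    Whitespace, Newline, LineBreak, Letter, Digit,
    Operator, StringQuote, LineComment, Unknown
};

// opChars: every character that appears in some operator spelling
// (assumed to be precomputed from the operator table).
CharKind classify(const std::string& src, std::size_t i, const std::set<char>& opChars) {
    assert(i < src.size());
    char c = src[i];
    if (c == ' ' || c == '\t' || c == '\r') return CharKind::Whitespace;
    if (c == '\n') return CharKind::Newline;
    if (c == ';') return CharKind::LineBreak;
    if (c == '"') return CharKind::StringQuote;
    if (c == '#') return CharKind::LineComment;
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_') return CharKind::Letter;
    if (c >= '0' && c <= '9') return CharKind::Digit;
    // A '.' counts as part of a number only when a digit follows it.
    if (c == '.' && i + 1 < src.size() && src[i + 1] >= '0' && src[i + 1] <= '9')
        return CharKind::Digit;
    if (opChars.count(c)) return CharKind::Operator;
    return CharKind::Unknown;
}
```

The lookahead on '.' is what lets the same character act as a decimal point in a number and as the field-access operator elsewhere.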

4.2 Token types

The lexer emits tokens of (conceptual) types:

IDENTIFIER
LITERAL (numbers)
STRING_LITERAL (content between quotes, escape processed)
OPERATOR (later split into specific Operator objects)
LINE_END (from newline or ;)
Comments are skipped and never emitted as tokens.

4.3 Operator splitting

When a contiguous run is classified as OPERATOR, it is split by calling:

ops->get(tokenTxt, opMatches) (hachi/src/AllOperators.cpp)

This function matches operators greedily from left to right: at each position it starts from the longest remaining substring and shrinks its end until the text is a known operator, then continues from where that match ended.

Each matched operator becomes its own Token carrying a pointer to the matched Operator.
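
The longest-match splitting described above can be sketched like this. The function name splitOperators and the choice to throw when nothing matches are assumptions; the behavior of the real ops->get() on an unknown run is not covered here.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Splits a run of operator characters into known operator spellings, always
// taking the longest spelling that matches at the current position.
std::vector<std::string> splitOperators(const std::string& run,
                                        const std::set<std::string>& known) {
    std::vector<std::string> out;
    std::size_t start = 0;
    while (start < run.size()) {
        std::size_t end = run.size();
        // Shrink the candidate [start, end) until it is a known operator.
        while (end > start && !known.count(run.substr(start, end - start))) --end;
        if (end == start)
            throw std::invalid_argument("unknown operator '" + run.substr(start) + "'");
        out.push_back(run.substr(start, end - start));
        start = end;
    }
    return out;
}
```

With this rule a run such as !=-> is read as != followed by ->, not as ! followed by =, because the longer spelling is tried first.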

4.4 String escapes

Within string literals, backslash escapes are handled:

\n, \t, \", \\ supported
Any other \X throws a source error
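
A minimal sketch of that escape handling, assuming the lexer works on the raw text between the quotes. The function name unescape and the use of std::invalid_argument in place of Hachi's source error are illustrative choices.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Takes the raw text between the quotes and returns the unescaped string.
std::string unescape(const std::string& raw) {
    std::string out;
    for (std::size_t i = 0; i < raw.size(); ++i) {
        if (raw[i] != '\\') { out += raw[i]; continue; }
        if (++i == raw.size()) throw std::invalid_argument("dangling backslash");
        switch (raw[i]) {
            case 'n':  out += '\n'; break;
            case 't':  out += '\t'; break;
            case '"':  out += '"';  break;
            case '\\': out += '\\'; break;
            default:
                throw std::invalid_argument(std::string("invalid escape \\") + raw[i]);
        }
    }
    return out;
}
```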

5. Operators and precedence

Operators are declared in hachi/h/AllOperators.h via the ALL_OPS macro.

Each operator has:

textual spelling
precedence integer
associativity/input kind (OperatorData::LEFT, RIGHT, BOTH)
overloadability flag

Operator list in this snapshot (text → precedence):

@ 5
? 6
| 6
: 24
:: 24
, 35
^ 36
&& 38
= 40
!= 40
> 50
< 50
>= 50
<= 50
+ 61
- 61
* 71
/ 71
% 70
! 74
. 81
-> 83
<- 83
>@ 90

6. Parsing (tokens → AST)

Implemented in hachi/src/Parser.cpp as astNodeFromTokens(tokens, left, right).

6.1 Bracket stripping and empty program

getOuterBracOffset() detects whether a token range is fully wrapped by matching brackets.
If so, the parser strips the outer brackets and re-parses the inside.
If the resulting range is empty, returns AstVoid::make().
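
The "fully wrapped" test is easy to get wrong for inputs like ( a ) + ( b ), so a sketch may help. This is an illustration with plain strings as tokens; isFullyWrapped is a hypothetical name (the real helper is getOuterBracOffset()), and for brevity it counts depth without checking that bracket kinds agree.

```cpp
#include <cassert>
#include <string>
#include <vector>

static bool isOpen(const std::string& t)  { return t == "(" || t == "[" || t == "{"; }
static bool isClose(const std::string& t) { return t == ")" || t == "]" || t == "}"; }

// True when tokens[left..right] (inclusive) is wrapped by one bracket pair,
// e.g. "( a + b )" but not "( a ) + ( b )".
bool isFullyWrapped(const std::vector<std::string>& tokens, int left, int right) {
    assert(left >= 0 && right < static_cast<int>(tokens.size()));
    if (left >= right || !isOpen(tokens[left]) || !isClose(tokens[right])) return false;
    int depth = 0;
    for (int i = left; i <= right; ++i) {
        if (isOpen(tokens[i])) ++depth;
        else if (isClose(tokens[i])) --depth;
        // If the first bracket closes before the end, it does not span the range.
        if (depth == 0 && i < right) return false;
    }
    return depth == 0;
}
```

When the check succeeds, the parser would recurse on the range left+1 .. right-1, and an empty inner range corresponds to the AstVoid case.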

6.2 AST node kinds

The parser produces AST nodes defined in hachi/h/AstNode.h:

AST type	Represents
`AstVoid`	empty / no-op
`AstLiteral`	identifier token, number literal token, or string-literal token
`AstList`	a semicolon/newline-separated list (sequence)
`AstExpression`	a binary operator expression with `left`, `center` token, and `right`
`AstFunctionBody`	`( ... )` or `{ ... }` bodies parsed as function bodies
`AstType`	a type literal or type construct
`AstTypeTuple`	tuple type definition `a:T, b:U, ...`
`AstTypeArrow`	function type `LeftType -> RightType`

6.3 Lists (statement sequences)

At top level (or inside blocks), the parser looks for LINE_END boundaries and builds:

AstList(elements...)

Each element is parsed independently with astNodeFromTokens on its token subrange.

6.4 Operator parse

If the range is not a list:

The parser calls parseOperator(tokens, left, right).
It scans the range for the “best” operator using precedence rules and bracket-depth tracking:
- operators inside nested parentheses/brackets/braces are ignored
- operators whose input-direction conflicts with missing left/right operand are ignored

The chosen operator is called mainOp, and AstExpression(leftSubtree, mainOpToken, rightSubtree) is created.
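
The scan for mainOp can be sketched as below. Two assumptions are made loudly here because the text above does not spell them out: the operator with the smallest precedence number is taken as the root (it binds loosest), and ties go to the rightmost candidate. The input-direction check is omitted, and findMainOp is a hypothetical name.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Returns the index of the operator to split on, or -1 if the range has none.
// Assumption: the smallest precedence number binds loosest, so it becomes the
// root; ties go to the rightmost candidate (left-associative grouping).
int findMainOp(const std::vector<std::string>& tokens, int left, int right,
               const std::map<std::string, int>& precedence) {
    assert(left >= 0 && right < static_cast<int>(tokens.size()));
    int best = -1, bestPrec = 0, depth = 0;
    for (int i = left; i <= right; ++i) {
        const std::string& t = tokens[i];
        if (t == "(" || t == "[" || t == "{") { ++depth; continue; }
        if (t == ")" || t == "]" || t == "}") { --depth; continue; }
        auto it = precedence.find(t);
        if (depth != 0 || it == precedence.end()) continue;  // nested, or not an operator
        if (best == -1 || it->second <= bestPrec) { best = i; bestPrec = it->second; }
    }
    return best;
}
```

Under these assumptions, a + b * c splits at + (61 is lower than 71), while ( a + b ) * c splits at * because the + sits at bracket depth 1.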

6.5 Type parsing

Types are parsed by astNodeTypeFromTokens() and use a limited grammar:

identifier → AstType referencing a named type
{ ... } → tuple type (AstTypeTuple)
A -> B → function type (AstTypeArrow)

Tuple types are parsed from comma-separated name: Type pairs.

7. Namespace model and resolution

Namespaces are implemented by NamespaceData (hachi/h/Namespace.h, hachi/src/Namespace.cpp).

A Namespace is a shared_ptr<NamespaceData>.

7.1 What a namespace stores

A namespace has four “maps” (internally IdMap from string → vector of AST nodes):

types - nodes that represent types (meta-types)
actions - statically resolved actions (pure, not tied to runtime variables)
dynamicActions - actions tied to runtime state (variables, assignments)
whatevActions - templates / AnyT actions that can be specialized at call time

A namespace also:

points to a StackFrame (shared among nested scopes inside a function)
points to a parent namespace (lexical scope chain)
maintains a destructorActions list used to auto-insert cleanup for variables

7.2 Stack frames and input registers (Li / Ri)

Each function-like scope has a StackFrame (see hachi/h/StackFrame.h, hachi/src/StackFrame.cpp).

NamespaceData::setInput(leftType, rightType):

configures the stack frame’s input types
injects special variables into the namespace:
- Li for left input (if creatable)
- Ri for right input (if creatable)

Both getter and setter actions for Li and Ri are added to the namespace.

7.3 Action lookup: `getActionForTokenWithInput`

The core resolver is:

NamespaceData::getActionForTokenWithInput(token, leftType, rightType, dynamic, throwOnError, tokenForError)

Resolution order:

Look up nodes matching the token text (or operator text) in:
- actions (always)
- dynamicActions (only if dynamic == true)
Convert those AST nodes to Actions and keep only the ones whose input types match (leftType, rightType).
Special-case tuple field access:
- If leftType is a tuple and the token is an identifier, getSubType(name) is checked.
- If found and rightType is Void, a synthetic node is created for getElemFromTupleAction(leftType, name).
If no exact matches, attempt AnyT specialization:
- Retrieve nodes from whatevActions
- Ask each node to makeCopyWithSpecificTypes(leftType, rightType)
- If a specialized copy exists, its action is used and the specialized node is cached into actions.
Dynamic variable creation (only if no nodes were found for the name). All of these must hold:
- dynamic == true
- the token is an identifier
- leftType is Void
- rightType is creatable
- If so, addVar(rightType, tokenText) is invoked and the returned setter action is used.

Errors:

If multiple matches exist for the same signature: throws or returns null depending on throwOnError
If name exists but signature mismatch: “correct overload not found…”
If name not found at all: “‘<name>’ not found”
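
The core of the lookup, matching by name and then by (leftType, rightType), can be illustrated with a toy scope chain. Everything here is a simplification: types are plain strings, Scope and Overload are hypothetical, the walk through parent scopes is an assumption about how the lexical chain is consulted, and the error wording is illustrative only. Tuple field access, AnyT specialization and dynamic variable creation are left out.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

struct Overload { std::string left, right, result; };  // types as plain names

struct Scope {
    std::shared_ptr<Scope> parent;
    std::map<std::string, std::vector<Overload>> actions;

    // Collects overloads for `name` from this scope and every enclosing scope.
    std::vector<Overload> collect(const std::string& name) const {
        std::vector<Overload> found;
        for (const Scope* s = this; s; s = s->parent.get()) {
            auto it = s->actions.find(name);
            if (it != s->actions.end())
                found.insert(found.end(), it->second.begin(), it->second.end());
        }
        return found;
    }

    // Picks the single overload whose input types match exactly.
    Overload resolve(const std::string& name, const std::string& left,
                     const std::string& right) const {
        std::vector<Overload> all = collect(name), matches;
        for (const Overload& o : all)
            if (o.left == left && o.right == right) matches.push_back(o);
        if (all.empty()) throw std::runtime_error("'" + name + "' not found");
        if (matches.empty())
            throw std::runtime_error("no overload of '" + name + "' takes these types");
        if (matches.size() > 1)
            throw std::runtime_error("ambiguous overloads of '" + name + "'");
        return matches.front();
    }
};
```

The three failure branches correspond to the three error cases listed above: name missing, name present with no matching signature, and more than one match.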

7.4 Variable creation: `addVar`

NamespaceData::addVar(type, name) allocates storage in the current stack frame and registers:

a getter action
a setter action
(optionally) a copy action wrapper (if __copy__ exists)
(optionally) a destructor wrapper (if __destroy__ exists), appended into destructorActions

Getter/setter selection differs depending on whether this variable is in the global stack frame or a local stack frame.
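
As a rough picture of what "allocate storage, then register a getter and a setter" means, the sketch below reserves bytes in a frame and returns two closures bound to the resulting offset. It is deliberately simplified and hypothetical: Frame, IntVar and addIntVar are not Hachi names, only Int-sized variables are handled, layout and storage are merged into one object, and the copy/destructor wrappers are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <vector>

struct Frame {
    std::vector<unsigned char> bytes;
    // Reserves `size` bytes at the end of the frame and returns their offset.
    std::size_t allocate(std::size_t size) {
        std::size_t offset = bytes.size();
        bytes.resize(offset + size);
        return offset;
    }
};

struct IntVar {
    std::function<int()> get;
    std::function<void(int)> set;
};

// Allocates an int slot in the frame and returns accessors bound to its offset.
// The frame must outlive the returned accessors.
IntVar addIntVar(Frame& frame) {
    std::size_t offset = frame.allocate(sizeof(int));
    Frame* f = &frame;
    return {
        [f, offset] {
            assert(offset + sizeof(int) <= f->bytes.size());
            int v;
            std::memcpy(&v, f->bytes.data() + offset, sizeof v);
            return v;
        },
        [f, offset](int v) {
            assert(offset + sizeof(int) <= f->bytes.size());
            std::memcpy(f->bytes.data() + offset, &v, sizeof v);
        }
    };
}
```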

7.5 Automatic cleanup integration

For any scope, the namespace carries destructorActions for variables declared in that scope.

Later, when an AstList resolves into an Action list, it appends the namespace’s destructorActions to the end of the statement sequence (see §9.2).

8. AST semantic pass: `AstNodeBase::setInput`

Every AST node supports:

setInput(Namespace ns, bool dynamic, Type leftIn, Type rightIn)

This pass:

records the namespace and dynamic setting on the node
computes input types for children
ensures type constructs and function bodies establish correct scoping

Key behavior by node type (from hachi/src/AstNode.cpp):

AstLiteral: stores ns/dynamic/inputs; no children
AstExpression: sets child inputs based on operator and expected operand behavior
AstList: creates a child namespace (ns->makeChild()) and sets each element input to (Void, Void)
AstFunctionBody: creates a child namespace with a new stack frame (ns->makeChildAndFrame(...)) and sets the body input to the function’s parameter types
AstType*: configures type subtrees; produces meta-types

9. AST → Action conversion

Each AST node has getAction(), which lazily calls resolveAction() once and caches the resulting Action.
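
The resolve-once-then-cache pattern looks roughly like this. NodeSketch and ActionSketch are hypothetical stand-ins for AstNodeBase and ActionData; the resolve counter exists only to make the caching observable.

```cpp
#include <cassert>
#include <memory>
#include <string>

struct ActionSketch { std::string description; };

class NodeSketch {
public:
    // Resolves on first use, then keeps returning the same shared object.
    std::shared_ptr<ActionSketch> getAction() {
        if (!cached_) cached_ = resolveAction();
        assert(cached_);
        return cached_;
    }
    int resolveCount() const { return resolveCount_; }

private:
    std::shared_ptr<ActionSketch> resolveAction() {
        ++resolveCount_;
        return std::make_shared<ActionSketch>(ActionSketch{"resolved"});
    }
    std::shared_ptr<ActionSketch> cached_;
    int resolveCount_ = 0;
};
```

Caching matters because resolution can have side effects on the namespace (for example creating a variable), so running it twice for the same node would not be harmless.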

9.1 Action interface

Actions are shared_ptr<ActionData> (typedef in hachi/h/Action.h).

Each Action has:

Type getReturnType()
Type getInLeftType(), getInRightType()
void* execute(void* inLeft, void* inRight) (interpreter path)
void addToProg(Action inLeft, Action inRight, CppProgram* prog) (C++ emission path)

9.2 `AstList` → `ListAction`

AstList::resolveAction():

resolves each element into an Action
appends destructor actions from the list’s namespace (*nameSpace->getDestroyerActions())
returns listAction(actions, destroyers) from hachi/src/Actions/ListAction.cpp

Execution semantics (ListAction::execute)

Executes each action in order; frees intermediate results
Preserves the last action’s output as the list return value
Executes all destroyers at the end and frees their outputs
Returns the last action output (or null if void)
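
The execution order can be sketched without raw pointers by letting std::optional<int> play the role of a result buffer, with std::nullopt standing for a void result. Value, Step and runList are hypothetical names, and explicit freeing is replaced by simply dropping the values.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <vector>

using Value = std::optional<int>;      // std::nullopt stands for Void
using Step  = std::function<Value()>;

// Runs every step in order, then every destroyer, and returns the value of the
// last step. Results of earlier steps and of destroyers are discarded.
Value runList(const std::vector<Step>& steps, const std::vector<Step>& destroyers) {
    Value last;
    for (const Step& step : steps) {
        assert(step);
        last = step();
    }
    for (const Step& destroy : destroyers) {
        assert(destroy);
        destroy();
    }
    return last;
}
```

Note that the destroyers run after the last step has produced its value but before that value is returned, which is the ordering the list above describes.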

C++ codegen (ListAction::addToProg)

Emits a scoped block
If the list is the top-level body of a non-main function and must return a value, it stores the final expression in -out and emits return -out;

9.3 `AstFunctionBody` → `FunctionAction`

AstFunctionBody::resolveAction() builds:

functionAction(bodyNode, returnType, stackFrame) (hachi/src/Actions/FunctionAction.cpp)

FunctionAction supports two construction modes:

from a concrete Action (already resolved)
from an AST node + return type (lazy resolution)

Interpreter semantics

Allocates a new stack buffer sized from the function’s StackFrame
Copies incoming left/right data into the stack frame offsets
Executes the function body action
Frees stack, restores prior stack pointer

C++ codegen

Creates a C++ function in CppProgram if not already present
Names are generated as:
- %<hint>_hc and then suffixed with _1, _2, … to avoid collisions (prog->hasFunc(name))
The function definition emits the body action and returns the result if appropriate.
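
The collision-avoiding suffix scheme can be sketched as follows. The std::set stands in for the prog->hasFunc(name) check, uniqueName is a hypothetical helper, and whether the first occurrence is left unsuffixed is an assumption.

```cpp
#include <cassert>
#include <set>
#include <string>

// Returns base if it is free, otherwise base_1, base_2, ... and records the choice.
std::string uniqueName(const std::string& base, std::set<std::string>& taken) {
    std::string name = base;
    for (int n = 1; taken.count(name); ++n) name = base + "_" + std::to_string(n);
    taken.insert(name);
    return name;
}
```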

9.4 `AstExpression` → operator dispatch

AstExpression::resolveAction() is where most language semantics live.

Resolution is driven by the center token:

If it is an operator token: center->getOp() != nullptr
Else it is an identifier / literal token whose action is looked up by name

The general pattern is:

resolve left/right actions
call NamespaceData::getActionForTokenWithInput(centerToken, leftType, rightType, dynamic, throwOnError, centerToken)
assemble those actions with branchAction(...) as needed

9.4.1 Special forms implemented directly in `AstExpression::resolveAction`

These are hard-coded in hachi/src/AstNode.cpp:

Import (>@)

Expects the right operand to be a string known at compile time: it must be an AstLiteral holding a string literal.
Loads module file:
- this->getToken()->getFile()->getDirPath() + "/hMods/" + modName + ".h"
Lexes and parses that file into modAst
Calls modAst->setInput(nameSpace, true, Void, Void)
Each element of the module (expected to be an AstList) is added to the current namespace using nameSpace->addNode(node, node->getToken()->getText())
The import expression itself returns voidAction.

Dot access (.)

Requires the right operand to be an identifier literal (AstLiteral).
Produces an action equivalent to: get field name from left tuple.

Conditional (?)

Always treats the left subtree as a Bool condition.
The right subtree is either:
- a single if-body, producing ifAction(condition, ifBody)
- or an if/else pair encoded via : inside the right subtree:
  - cond ? (ifExpr : elseExpr) becomes ifElseAction(condition, ifExpr, elseExpr)

Loop (@)

Always treats left subtree as Bool condition.
Right subtree:
- loop body action
- or an end-step + loop body pair encoded via : inside the right subtree:
  - cond @ (endExpr : loopExpr) becomes loopAction(condition, endExpr, loopExpr)
  - (the endExpr is executed after each iteration)

Comma (,) tuple creation

Forms a tuple from comma-separated expressions.
Flattening behavior:
- If the left or right subtree is itself a comma-expression, its tuple elements are concatenated.
Generates makeTupleAction(actions) (hachi/src/Actions/MakeTupleAction.cpp).
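
Because a, b, c parses as nested binary comma nodes, the flattening step is a small tree walk. The sketch below uses a hypothetical Expr struct in place of AstExpression and collects leaf text where the real code would collect Actions.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Expr {
    std::string op;                      // "," for a comma node, "" for a leaf
    std::string leaf;                    // leaf text when op is empty
    std::shared_ptr<Expr> left, right;
};

std::shared_ptr<Expr> leafNode(const std::string& text) {
    return std::make_shared<Expr>(Expr{"", text, nullptr, nullptr});
}

std::shared_ptr<Expr> comma(std::shared_ptr<Expr> l, std::shared_ptr<Expr> r) {
    return std::make_shared<Expr>(Expr{",", "", std::move(l), std::move(r)});
}

// Appends the tuple elements of e, left to right, descending through comma nodes.
void flatten(const std::shared_ptr<Expr>& e, std::vector<std::string>& out) {
    assert(e);
    if (e->op == ",") {
        flatten(e->left, out);
        flatten(e->right, out);
    } else {
        out.push_back(e->leaf);
    }
}
```

Both ((a, b), c) and (a, (b, c)) shaped trees yield the same three-element sequence, so the resulting tuple does not depend on how the parser grouped the commas.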

9.4.2 Everything else: overloaded operator + identifier resolution

If the center token is not one of the special forms above, resolution goes through:

nameSpace->getActionForTokenWithInput(...)

This is used for:

:, ::, assignment behavior, and any stdlib-defined actions
arithmetic operators, comparisons, boolean operators
calling named functions/actions (when center is an identifier token and left/right are the operand actions)

Because operator overloadability is recorded in OperatorData, the standard library can expose multiple overloads for the same operator text and rely on (leftType, rightType) matching to select the correct Action.

10. Action primitives (runtime + codegen)

Hachi’s “executable IR” is the Action system in hachi/src/Actions/.

10.1 `branchAction(...)`

Defined in hachi/src/Actions/BranchAction.cpp.

Purpose:

Compose an operator action with pre-computed left/right input-producing actions, while enforcing:
- input-producing actions take (Void, Void) input
- their return types match the operator action’s input types

Variants:

BranchAction(leftProducer, opAction, rightProducer) when both inputs are non-void
LeftBranchAction(leftProducer, opAction) when right is void
RightBranchAction(opAction, rightProducer) when left is void
returns opAction directly if both producers return void

Interpreter behavior:

Executes producer(s), then opAction, then frees producer outputs.

C++ codegen behavior:

If needed, inserts tuple casts (cppTupleCastAction) to match operand tuple types before emitting.

10.2 `ListAction(...)`

See §9.2.

10.3 Tuple operations

Defined in hachi/src/Actions/MakeTupleAction.cpp:

makeTupleAction(vector<Action>):
- Builds a tuple return type by concatenating each source action’s return type
- Interpreter: allocates a flat buffer of summed sizes and memcpy’s each element sequentially
- C++: emits a tuple constructor expression TupleType(elem1, elem2, ...)
getElemFromTupleAction(tupleType, fieldName):
- Interpreter: allocates field-sized buffer and memcpy’s bytes from the tuple by offset
- C++: emits .fieldName, with optimization if the tuple source is a MakeTupleAction (it can emit the producing expression directly)
cppTupleCastAction(action, targetTupleType):
- Codegen-only cast helper used when tuple shapes “match” but names differ or a narrower/wider tuple is expected.

10.4 Control flow

Defined in:

hachi/src/Actions/IfAction.cpp
hachi/src/Actions/LoopAction.cpp

IfAction:

statement form only (returns void)
emits if (cond) { ... }

IfElseAction:

return type is the common type of the two branches if they match; otherwise Void
if used in expression position (prog->getExprLevel() > 0) and returns a value, emits C++ ternary cond ? a : b
otherwise emits statement if/else blocks and discards return values if needed

LoopAction:

statement form only (returns void)
emits while (cond) { loopBody; endAction; } (endAction optional)
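
The emission rules above can be mimicked with plain string building. This is a sketch only: emitIfElse and emitLoop are hypothetical helpers that return text, whereas the real actions write into CppProgram, and the exact spacing and bracing of the generated code is an assumption.

```cpp
#include <cassert>
#include <string>

// Emits C++ text for an if/else. exprLevel > 0 means the result is consumed by
// an enclosing expression; returnsValue means both branches share a non-void type.
std::string emitIfElse(const std::string& cond, const std::string& a, const std::string& b,
                       int exprLevel, bool returnsValue) {
    if (exprLevel > 0 && returnsValue) return "(" + cond + " ? " + a + " : " + b + ")";
    return "if (" + cond + ") { " + a + "; } else { " + b + "; }";
}

// Emits a while loop; endStep may be empty, in which case it is skipped.
std::string emitLoop(const std::string& cond, const std::string& body,
                     const std::string& endStep) {
    std::string out = "while (" + cond + ") { " + body + "; ";
    if (!endStep.empty()) out += endStep + "; ";
    return out + "}";
}
```

The ternary form is only usable when a value is both produced and consumed; in every other case the statement form is emitted and any branch values are dropped.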

11. Type system (as used by the transpiler)

Types are shared_ptr<TypeBase> (hachi/h/Type.h, hachi/src/Type.cpp).

11.1 Primitive kinds

TypeBase::PrimitiveType includes:

UNKNOWN, VOID, BYTE, DUB, INT, PTR, BOOL, TUPLE, WHATEV, METATYPE

Key globals (declared extern):

Unknown, AnyT, Void, Bool, Byte, Int, Flt (DUB), String (non-const extern in this snapshot)

11.2 Tuple layout

Tuple values are a flat byte concatenation of element values in declared order.

Offsets are computed as the running sum of prior element sizes (TupleType::getSubType()).

There is no alignment padding inserted by the tuple type implementation; the tuple’s getSize() is the sum of element sizes.
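
The packed layout amounts to a running sum, as the sketch below shows. Field, fieldOffset and tupleSize are hypothetical names, and the element sizes in the usage note are illustrative, not Hachi's actual type sizes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Field { std::string name; std::size_t size; };

// Offset of a named field: the sum of the sizes of all fields before it.
// Returns -1 if there is no such field.
long fieldOffset(const std::vector<Field>& fields, const std::string& name) {
    std::size_t offset = 0;
    for (const Field& f : fields) {
        if (f.name == name) return static_cast<long>(offset);
        offset += f.size;
    }
    return -1;
}

// Total tuple size: the plain sum of the field sizes, with no padding.
std::size_t tupleSize(const std::vector<Field>& fields) {
    std::size_t total = 0;
    for (const Field& f : fields) total += f.size;
    return total;
}
```

For a tuple with fields of 1, 4 and 8 bytes this gives offsets 0, 1 and 5 and a total of 13 bytes, whereas a typical C++ struct with the same members would pad the second field to offset 4.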

11.3 Meta-types

Types are represented in the AST as nodes whose return type is TypeBase::METATYPE and whose getSubType() yields the real type.

This is how AstType nodes communicate type objects through the AST/action system.

12. C++ output generation (`CppProgram`)

Actions emit C++ by calling methods on CppProgram (hachi/h/CppProgram.h, hachi/src/CppProgram.cpp).

Key ideas:

CppProgram maintains:
- global include/code sections
- global variable declarations
- global type declarations
- a map of generated functions (funcs)
- an “active function” being emitted
It uses a CppNameContainer hierarchy to map Hachi identifiers to collision-safe C++ identifiers.

Hachi “hardcoded names” in codegen often use special prefixes like:

global funcs: $
global vars: *
local vars: -
types: {...}

(These conventions are documented in comments in CppProgram.h.)

13. Cross-reference: language constructs → implementation

This section maps major Hachi constructs (as implemented) to the AST and Action machinery.

13.1 Core constructs

Construct	Parsed as	Resolved as
Program / block sequence	`AstList`	`ListAction` via `listAction(actions, destroyers)`
Identifier / numeric literal / string literal	`AstLiteral`	`resolveLiteral(...)` or `getActionForTokenWithInput(...)` (depends on token type)
Binary operator expression	`AstExpression`	usually `branchAction(leftProducer, opAction, rightProducer)`
Tuple creation via comma	nested `AstExpression` with `,`	`makeTupleAction(flattenedActions)`
Tuple field access `a.b`	`AstExpression` with `.`	`getElemFromTupleAction(type(a), "b")` (codegen uses `.b`)
If statement `cond ? body`	`AstExpression` with `?`	`ifAction(condAction, bodyAction)`
If-else `cond ? (if : else)`	`AstExpression` `?` with right `:`	`ifElseAction(cond, ifAct, elseAct)`
While loop `cond @ body`	`AstExpression` with `@`	`loopAction(cond, body)`
While loop with end-step `cond @ (end : body)`	`AstExpression` `@` with right `:`	`loopAction(cond, end, body)`
Import `>@ "mod"`	`AstExpression` with `>@`	module load + namespace injection, returns `voidAction`

13.2 Type constructs

Construct	Parsed as	Output
Named type `Int`	`AstType`	meta-type whose subtype is resolved from namespace
Tuple type `{a:Int, b:String}`	`AstTypeTuple`	tuple `Type` built from fields
Arrow type `A -> B`	`AstTypeArrow`	function type encoded as tuple types (implementation-defined in `Type.cpp`)

14. Standard library integration

The compiler calls populateHachiStdLib() during program resolution (HachiProgram::resolveProgram).

What it does (structurally):

Creates the root namespace (globalNamespace, declared extern)
Adds:
- core types (including meta-types)
- operator actions for built-in operators
- helper actions like __destroy__ and __copy__ for selected types
- module-related entries (as used by the import system)

Because stdlib population is the source of most actions/operators, understanding HachiStdLib.cpp is required to fully enumerate built-in operators and functions. This spec describes how the compiler consumes those definitions; it does not attempt to restate stdlib contents.

15. Observable invariants and “gotchas” (from implementation)

These are not opinions; they are direct consequences of the snapshot’s code paths:

The lexer does not flush a final token at EOF
- lexString() pushes tokens only when a type transition occurs. The SourceFile constructor appends a newline, which forces one last transition and flush, so the final token is not lost in practice.
Tuple memory layout is packed without padding
- Tuple element offsets are computed by summing getSize(); no alignment is applied.
Dynamic variable creation is signature-gated
- A new variable is created only when resolving an identifier with (leftType=Void, rightType=creatable) in a dynamic scope and the identifier was not found in any namespace map.
Import expects a compile-time string literal
- The import operator loads a file only when the right operand is a string literal node.
Block comments have nonstandard terminators in this snapshot
- Start marker is // and end is detected by a \\ pattern. (Single-line # comments also exist.)
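
The first item in this list is easiest to see with a toy lexer that, like lexString(), emits a token only when the character class changes. This is a simplified sketch: lexOnTransitions is a hypothetical function that knows just two classes (alphanumeric and everything else).

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Emits a token only when the character class changes from "word" to
// "non-word". Nothing is flushed when the input ends.
std::vector<std::string> lexOnTransitions(const std::string& src) {
    std::vector<std::string> tokens;
    std::string current;
    bool inWord = false;
    for (char c : src) {
        bool isWord = std::isalnum(static_cast<unsigned char>(c)) != 0;
        if (inWord && !isWord) { tokens.push_back(current); current.clear(); }
        if (isWord) current += c;
        inWord = isWord;
    }
    return tokens;  // a trailing word with no following transition is lost
}
```

Lexing "a b" yields only a, while "a b" followed by a newline yields both a and b, which is why appending the newline in SourceFile is enough to hide the missing flush.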

Appendix A - File index for further study

Core pipeline:

hachi/src/main.cpp
hachi/src/HachiProgram.cpp
hachi/src/Lexer.cpp
hachi/src/Parser.cpp
hachi/src/AstNode.cpp
hachi/src/Namespace.cpp
hachi/src/Type.cpp
hachi/src/CppProgram.cpp

Action implementations:

hachi/src/Actions/BranchAction.cpp
hachi/src/Actions/ListAction.cpp
hachi/src/Actions/FunctionAction.cpp
hachi/src/Actions/IfAction.cpp
hachi/src/Actions/LoopAction.cpp
hachi/src/Actions/MakeTupleAction.cpp