Compiler

Scope: This document describes how the compiler/transpiler works. It is written as a “tech spec for outsiders”, meaning readers who want to study Hachi’s current compiler pipeline.

Primary implementation directories:

  • hachi/src/ - compiler + interpreter implementation
  • hachi/h/ - public headers for compiler runtime
  • hachi/src/Actions/ - executable “Action” nodes used by both interpreter and C++ codegen
  • hachi/src/HachiStdLib.cpp - standard library population into the global namespace

1. High-level architecture

Hachi is implemented as a pipeline that turns source text into:

  1. Tokens (lexing)
  2. AST nodes (parsing, using operator precedence + bracket handling)
  3. Action tree (name/type resolution against namespaces + operator overload resolution)
  4. Either:
    • Interpreter execution (Action tree evaluated directly), or
    • C++ codegen (Action tree emits a C++ program via CppProgram), optionally compiled and executed.

The central orchestrator is HachiProgram (hachi/src/HachiProgram.cpp).

1.1 The four “tree” layers

Layer | Class / representation | Built by | Purpose
Tokens | Token (hachi/h/Token.h) | lexString() (hachi/src/Lexer.cpp) | Line/column-aware token stream with operator tokens attached
AST | AstNode variants (hachi/h/AstNode.h) | astNodeFromTokens() (hachi/src/Parser.cpp) | Structural parse using operators/brackets
Action tree | Action / ActionData (hachi/h/Action.h) | AstNodeBase::getAction() implementations (hachi/src/AstNode.cpp) | Typed executable nodes + C++ emission
Output | Interpreter runtime OR CppProgram (hachi/h/CppProgram.h) | ActionData::execute() or ActionData::addToProg() | Run the program or produce C++

2. Entrypoint and CLI modes

The CLI entrypoint is main() in hachi/src/main.cpp.

2.1 Modes

Only a single Hachi source file is supported per invocation (multiple inputs are rejected).

Mode | Trigger | What happens
REPL | no file args | Reads lines into /tmp/tmp-hachi.8, repeatedly runs hachi /tmp/tmp-hachi.8 -go
Interpret | default when a file is given and no -cpp/-build/-go flags | Builds Action tree then executes it (HachiProgram::execute())
Transpile to C++ | -cpp <file> | Builds Action tree then emits C++ (HachiProgram::getCpp()), writes to file
Build binary | -build <file> or -b <file> | Emits C++ then compiles with clang++ ... -o <file>
Build+run | -go (also aliases -e/-execute/-ko) | Emits C++ then compiles then runs ./tmp_hc_compiled ...
Leak-sanitized build | -buildml <file> | Compiles with AddressSanitizer flags
One-liner | -c "<code>" | Writes code to /tmp/tmp-hachi.8, then runs hachi /tmp/tmp-hachi.8 -go

Compiler invocation is performed via system() calls in main.cpp, constructing a clang++ command string.


3. Pipeline in HachiProgram::resolveProgram

HachiProgram::resolveProgram(filename, debug) performs the compilation pipeline:

  1. AllOperators::init()

    • Global ops pointer is assigned to a new AllOperators instance.
    • Operator tokens and their precedence are declared in hachi/h/AllOperators.h.
  2. populateHachiStdLib()

    • Populates the global namespace with built-in types and actions (standard library).
    • Implemented in hachi/src/HachiStdLib.cpp (large file).
  3. Load source file: SourceFile(filename, debug)

    • Loads file contents and appends a trailing newline (hachi/src/SourceFile.cpp).
  4. Lex: lexString(file, tokens)

    • Converts source bytes to a vector of Token objects.
  5. Parse: astRoot = astNodeFromTokens(tokens, 0, tokens.size()-1)

    • Produces an AST rooted at an AstNode.
  6. Type + name resolution pass: astRoot->setInput(globalNamespace, true, Void, Void)

    • Sets up a root namespace input and recursively configures AST nodes with:
      • namespace
      • dynamic flag
      • input types for each subtree
  7. Build Action tree: actionRoot = astRoot->getAction()

    • Each AST node resolves into one or more Action nodes.
  8. Final-destruction wrapper: actionRoot = globalNamespace->wrapInDestroyer(actionRoot)

    • Ensures the overall program result is routed through __destroy__ if one exists.

If any stage throws HachiError, it is logged and compilation continues in an error-safe state (often falling back to AstVoid).


4. Lexing (tokenization)

Implemented in hachi/src/Lexer.cpp.

4.1 Character classifier

CharClassifier categorizes each character into:

  • whitespace (' ', '\t', '\r')
  • newline ('\n') and “line break” (';')
  • letters ([A-Za-z_])
  • digits ([0-9] plus special handling for . if followed by a digit)
  • operator characters (any character that appears in any operator text in ops->getOpsMap())
  • string delimiter ('"')
  • single-line comment ('#')
  • block comment markers:
    • start: //
    • end: a backslash whose previous character is also backslash (\\)

Note: “block comments” in this snapshot are unusual: the start marker is //, while the end condition is detected by \\ (a double backslash). This is literal behavior in CharClassifier::get().

4.2 Token types

The lexer emits tokens of (conceptual) types:

  • IDENTIFIER
  • LITERAL (numbers)
  • STRING_LITERAL (content between quotes, escape processed)
  • OPERATOR (later split into specific Operator objects)
  • LINE_END (from newline or ;)
  • comments are not emitted as tokens

4.3 Operator splitting

When a contiguous run is classified as OPERATOR, it is split by calling:

  • ops->get(tokenTxt, opMatches) (hachi/src/AllOperators.cpp)

This function greedily matches operator substrings from left to right by shrinking end until a known operator text is found, then continuing.

Each matched operator becomes its own Token carrying a pointer to the matched Operator.
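
The greedy split described above can be sketched as follows. This is a minimal illustration, not the code in AllOperators.cpp: splitOperatorRun and the known set are hypothetical stand-ins for ops->get() and the operator table, plain strings replace Operator objects, and the error thrown for an unmatched character is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for ops->get(tokenTxt, opMatches): `known` plays the
// role of the operator table, and plain strings replace Operator objects.
std::vector<std::string> splitOperatorRun(const std::string& run,
                                          const std::set<std::string>& known) {
    std::vector<std::string> matches;
    std::size_t start = 0;
    while (start < run.size()) {
        // Try the longest candidate first, then shrink `end` toward `start`.
        std::size_t end = run.size();
        while (end > start && known.count(run.substr(start, end - start)) == 0) {
            --end;
        }
        if (end == start) {
            // Assumption: an unmatched character is an error; the real
            // lexer's behavior here is not described in this document.
            throw std::runtime_error("unknown operator: " + run.substr(start));
        }
        matches.push_back(run.substr(start, end - start));
        start = end;
    }
    return matches;
}
```

With a table containing -, >, ->, = and >=, the run ->= splits into -> followed by =, not - followed by >=, because the longest match is taken at the leftmost position first.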

4.4 String escapes

Within string literals, backslash escapes are handled:

  • \n, \t, \", \\ supported
  • Any other \X throws a source error
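
A minimal sketch of this escape handling, assuming the lexer works on the text between the quotes. The function name decodeEscapes is hypothetical, std::runtime_error stands in for the source error type, and the handling of a trailing lone backslash is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Decodes the body of a string literal (the text between the quotes).
std::string decodeEscapes(const std::string& body) {
    std::string out;
    for (std::size_t i = 0; i < body.size(); ++i) {
        if (body[i] != '\\') {
            out.push_back(body[i]);
            continue;
        }
        if (i + 1 == body.size()) {
            // Assumption: a lone trailing backslash is rejected.
            throw std::runtime_error("backslash at end of string literal");
        }
        const char code = body[++i];
        switch (code) {
            case 'n':  out.push_back('\n'); break;
            case 't':  out.push_back('\t'); break;
            case '"':  out.push_back('"');  break;
            case '\\': out.push_back('\\'); break;
            default:
                // Any other \X is an error, as listed above.
                throw std::runtime_error(std::string("invalid escape: \\") + code);
        }
    }
    return out;
}
```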

5. Operators and precedence

Operators are declared in hachi/h/AllOperators.h via the ALL_OPS macro.

Each operator has:

  • textual spelling
  • precedence integer
  • associativity/input kind (OperatorData::LEFT, RIGHT, BOTH)
  • overloadability flag

Operator list in this snapshot (text → precedence):

  • @ 5
  • ? 6
  • | 6
  • : 24
  • :: 24
  • , 35
  • ^ 36
  • && 38
  • = 40
  • != 40
  • > 50
  • < 50
  • >= 50
  • <= 50
  • + 61
  • - 61
  • * 71
  • / 71
  • % 70
  • ! 74
  • . 81
  • -> 83
  • <- 83
  • >@ 90

6. Parsing (tokens → AST)

Implemented in hachi/src/Parser.cpp as astNodeFromTokens(tokens, left, right).

6.1 Bracket stripping and empty program

  • getOuterBracOffset() detects whether a token range is fully wrapped by matching brackets.
  • If so, the parser strips the outer brackets and re-parses the inside.
  • If the resulting range is empty, returns AstVoid::make().
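
The bracket-stripping step can be sketched as below. This is an illustration under stated assumptions, not getOuterBracOffset() itself: tokens are plain strings, the Range type and both function names are hypothetical, ranges are half-open rather than the inclusive (left, right) pair the parser uses, and only bracket depth is tracked (bracket kinds are not checked against each other).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Range {
    std::size_t begin, end;  // half-open: [begin, end)
    bool empty() const { return begin == end; }
};

static int bracketDelta(const std::string& t) {
    if (t == "(" || t == "[" || t == "{") return +1;
    if (t == ")" || t == "]" || t == "}") return -1;
    return 0;
}

// True when the bracket opened at r.begin is closed by the last token of the
// range, e.g. "( a + b )" but not "( a ) + ( b )".
bool isFullyWrapped(const std::vector<std::string>& toks, Range r) {
    if (r.end - r.begin < 2 || bracketDelta(toks[r.begin]) != 1) return false;
    int depth = 0;
    for (std::size_t i = r.begin; i < r.end; ++i) {
        depth += bracketDelta(toks[i]);
        if (depth == 0) return i + 1 == r.end;  // the first opener closed here
    }
    return false;  // unbalanced input
}

// Removes every redundant outer pair; an empty result corresponds to the
// AstVoid case described above.
Range stripOuterBrackets(const std::vector<std::string>& toks, Range r) {
    while (isFullyWrapped(toks, r)) {
        ++r.begin;
        --r.end;
    }
    return r;
}
```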

6.2 AST node kinds

The parser produces AST nodes defined in hachi/h/AstNode.h:

AST type | Represents
AstVoid | empty / no-op
AstLiteral | identifier token, number literal token, or string-literal token
AstList | a semicolon/newline-separated list (sequence)
AstExpression | a binary operator expression with left, center token, and right
AstFunctionBody | ( ... ) or { ... } bodies parsed as function bodies
AstType | a type literal or type construct
AstTypeTuple | tuple type definition a:T, b:U, ...
AstTypeArrow | function type LeftType -> RightType

6.3 Lists (statement sequences)

At top level (or inside blocks), the parser looks for LINE_END boundaries and builds:

  • AstList(elements...)

Each element is parsed independently with astNodeFromTokens on its token subrange.

6.4 Operator parse

If the range is not a list:

  • The parser calls parseOperator(tokens, left, right).
  • It scans the range for the “best” operator using precedence rules and bracket-depth tracking:
    • operators inside nested parentheses/brackets/braces are ignored
    • operators whose input-direction conflicts with missing left/right operand are ignored

The chosen operator is called mainOp, and AstExpression(leftSubtree, mainOpToken, rightSubtree) is created.
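
A rough sketch of the main-operator scan follows. It rests on assumptions that this document does not confirm: that the “best” operator is the one with the lowest precedence number at bracket depth zero (consistent with the table in §5, where , is 35 and * is 71), and that ties go to the rightmost candidate. The Tok struct and findMainOp are hypothetical names, and the input-direction rule is left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Tok {
    std::string text;
    int precedence;  // -1 for non-operator tokens
};

// Returns the index of the operator to split on, or -1 if there is none.
int findMainOp(const std::vector<Tok>& toks) {
    int depth = 0;
    int best = -1;
    for (int i = 0; i < static_cast<int>(toks.size()); ++i) {
        const std::string& t = toks[i].text;
        if (t == "(" || t == "[" || t == "{") { ++depth; continue; }
        if (t == ")" || t == "]" || t == "}") { --depth; continue; }
        // Operators nested inside brackets are ignored.
        if (depth != 0 || toks[i].precedence < 0) continue;
        // Assumption: lowest precedence wins; "<=" keeps the rightmost on ties.
        if (best < 0 || toks[i].precedence <= toks[best].precedence) best = i;
    }
    return best;
}
```

Under these assumptions, a + b * c splits at + (index 1), and ( a + b ) * c splits at * because the + is inside brackets.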

6.5 Type parsing

Types are parsed by astNodeTypeFromTokens() and use a limited grammar:

  • identifier → AstType referencing a named type
  • { ... } → tuple type (AstTypeTuple)
  • A -> B → function type (AstTypeArrow)

Tuple types are parsed from comma-separated name: Type pairs.


7. Namespace model and resolution

Namespaces are implemented by NamespaceData (hachi/h/Namespace.h, hachi/src/Namespace.cpp).

A Namespace is a shared_ptr<NamespaceData>.

7.1 What a namespace stores

A namespace has four “maps” (internally IdMap from string → vector of AST nodes):

  • types - nodes that represent types (meta-types)
  • actions - statically resolved actions (pure, not tied to runtime variables)
  • dynamicActions - actions tied to runtime state (variables, assignments)
  • whatevActions - templates / AnyT actions that can be specialized at call time

A namespace also:

  • points to a StackFrame (shared among nested scopes inside a function)
  • points to a parent namespace (lexical scope chain)
  • maintains a destructorActions list used to auto-insert cleanup for variables

7.2 Stack frames and input registers (Li / Ri)

Each function-like scope has a StackFrame (see hachi/h/StackFrame.h, hachi/src/StackFrame.cpp).

NamespaceData::setInput(leftType, rightType):

  • configures the stack frame’s input types
  • injects special variables into the namespace:
    • Li for left input (if creatable)
    • Ri for right input (if creatable)

Both getter and setter actions for Li and Ri are added to the namespace.

7.3 Action lookup: getActionForTokenWithInput

The core resolver is:

NamespaceData::getActionForTokenWithInput(token, leftType, rightType, dynamic, throwOnError, tokenForError)

Resolution order:

  1. Look up nodes matching the token text (or operator text) in:
    • actions (always)
    • dynamicActions (only if dynamic == true)
  2. Convert those AST nodes to Actions and keep only the ones whose input types match (leftType, rightType).
  3. Special-case tuple field access:
    • If leftType is a tuple and the token is an identifier, getSubType(name) is checked.
    • If found and rightType is Void, a synthetic node is created for getElemFromTupleAction(leftType, name).
  4. If no exact matches, attempt AnyT specialization:
    • Retrieve nodes from whatevActions
    • Ask each node to makeCopyWithSpecificTypes(leftType, rightType)
    • If a specialized copy exists, its action is used and the specialized node is cached into actions.
  5. Dynamic variable creation (only if not foundNodes, that is, when the earlier steps found no node for the token). If all of the following hold:
    • dynamic == true
    • the token is an identifier
    • leftType is Void
    • rightType is creatable
    then addVar(rightType, tokenText) is invoked and the returned setter action is used.

Errors:

  • If multiple matches exist for the same signature: throws or returns null depending on the throwOnError flag
  • If name exists but signature mismatch: “correct overload not found”
  • If name not found at all: “‘<name>’ not found”
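
The signature-matching core of this lookup, together with the three error cases, can be sketched as below. It is a simplification: types are plain strings, Candidate, Lookup, Result and resolve are hypothetical names, and the tuple field access, AnyT specialization, and dynamic variable creation steps are omitted.

```cpp
#include <cassert>
#include <map>
#include <string>

struct Candidate {
    std::string left, right;  // input types, e.g. "Int" or "Void"
    std::string label;        // stands in for the resolved Action
};

enum class Lookup { Found, NameNotFound, OverloadNotFound, Ambiguous };

struct Result {
    Lookup status;
    std::string label;
};

// `actions` maps a token text to every candidate registered under that name.
Result resolve(const std::multimap<std::string, Candidate>& actions,
               const std::string& name,
               const std::string& left, const std::string& right) {
    const auto range = actions.equal_range(name);
    if (range.first == range.second) return {Lookup::NameNotFound, ""};

    const Candidate* match = nullptr;
    int count = 0;
    for (auto it = range.first; it != range.second; ++it) {
        if (it->second.left == left && it->second.right == right) {
            match = &it->second;
            ++count;
        }
    }
    if (count == 0) return {Lookup::OverloadNotFound, ""};  // name exists, wrong signature
    if (count > 1) return {Lookup::Ambiguous, ""};          // several identical signatures
    return {Lookup::Found, match->label};
}
```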

7.4 Variable creation: addVar

NamespaceData::addVar(type, name) allocates storage in the current stack frame and registers:

  • a getter action
  • a setter action
  • (optionally) a copy action wrapper (if __copy__ exists)
  • (optionally) a destructor wrapper (if __destroy__ exists), appended into destructorActions

Getter/setter selection differs depending on whether this variable is in the global stack frame or a local stack frame.

7.5 Automatic cleanup integration

For any scope, the namespace carries destructorActions for variables declared in that scope.

Later, when an AstList resolves into an Action list, it appends the namespace’s destructorActions to the end of the statement sequence (see §9.2).


8. AST semantic pass: AstNodeBase::setInput

Every AST node supports:

setInput(Namespace ns, bool dynamic, Type leftIn, Type rightIn)

This pass:

  • records the namespace and dynamic setting on the node
  • computes input types for children
  • ensures type constructs and function bodies establish correct scoping

Key behavior by node type (from hachi/src/AstNode.cpp):

  • AstLiteral: stores ns/dynamic/inputs; no children
  • AstExpression: sets child inputs based on operator and expected operand behavior
  • AstList: creates a child namespace (ns->makeChild()) and sets each element input to (Void, Void)
  • AstFunctionBody: creates a child namespace with a new stack frame (ns->makeChildAndFrame(...)) and sets the body input to the function’s parameter types
  • AstType*: configures type subtrees; produces meta-types

9. AST → Action conversion

Each AST node has getAction(), which lazily calls resolveAction() once and caches the resulting Action.
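
The resolve-once-and-cache pattern can be sketched as follows. The class names NodeSketch and LiteralSketch, the resolveCalls counter, and the description field are hypothetical and exist only for illustration; the real AstNodeBase carries more state.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

struct ActionData {
    std::string description;
};
using Action = std::shared_ptr<ActionData>;

class NodeSketch {
public:
    virtual ~NodeSketch() = default;

    // Resolves on first use, then returns the cached Action.
    Action getAction() {
        if (!cached) cached = resolveAction();
        return cached;
    }

protected:
    virtual Action resolveAction() = 0;

private:
    Action cached;
};

class LiteralSketch : public NodeSketch {
public:
    explicit LiteralSketch(std::string text) : text(std::move(text)) {}
    int resolveCalls = 0;  // demo counter, not part of the real interface

protected:
    Action resolveAction() override {
        ++resolveCalls;
        return std::make_shared<ActionData>(ActionData{"literal " + text});
    }

private:
    std::string text;
};
```

Calling getAction() twice on the same node returns the same shared pointer and runs resolveAction() only once.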

9.1 Action interface

Actions are shared_ptr<ActionData> (typedef in hachi/h/Action.h).

Each Action has:

  • Type getReturnType()
  • Type getInLeftType(), getInRightType()
  • void* execute(void* inLeft, void* inRight) (interpreter path)
  • void addToProg(Action inLeft, Action inRight, CppProgram* prog) (C++ emission path)

9.2 AstList → ListAction

AstList::resolveAction():

  • resolves each element into an Action
  • appends destructor actions from the list’s namespace (*nameSpace->getDestroyerActions())
  • returns listAction(actions, destroyers) from hachi/src/Actions/ListAction.cpp

Execution semantics (ListAction::execute)

  • Executes each action in order; frees intermediate results
  • Preserves the last action’s output as the list return value
  • Executes all destroyers at the end and frees their outputs
  • Returns the last action output (or null if void)
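
These execution semantics can be sketched with raw buffers. The sketch assumes that each action returns either a malloc-allocated result or null for void; RawAction and runList are hypothetical names, and the real execute() also takes left/right inputs.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <vector>

// Assumption: an action returns a malloc'd result buffer, or nullptr for void.
using RawAction = std::function<void*()>;

void* runList(const std::vector<RawAction>& actions,
              const std::vector<RawAction>& destroyers) {
    void* last = nullptr;
    for (const RawAction& action : actions) {
        std::free(last);  // drop the previous intermediate; free(nullptr) is a no-op
        last = action();
    }
    for (const RawAction& destroyer : destroyers) {
        std::free(destroyer());  // destroyer outputs are discarded
    }
    return last;  // owned by the caller; nullptr if the last action was void
}
```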

C++ codegen (ListAction::addToProg)

  • Emits a scoped block
  • If the list is the top-level of a non-main function and must return a value, it stores final expr into -out and emits return -out;

9.3 AstFunctionBody → FunctionAction

AstFunctionBody::resolveAction() builds:

functionAction(bodyNode, returnType, stackFrame) (hachi/src/Actions/FunctionAction.cpp)

FunctionAction supports two construction modes:

  • from a concrete Action (already resolved)
  • from an AST node + return type (lazy resolution)

Interpreter semantics

  • Allocates a new stack buffer sized from the function’s StackFrame
  • Copies incoming left/right data into the stack frame offsets
  • Executes the function body action
  • Frees stack, restores prior stack pointer

C++ codegen

  • Creates a C++ function in CppProgram if not already present
  • Names are generated as:
    • %<hint>_hc and then suffixed with _1, _2, and so on, to avoid collisions (prog->hasFunc(name))
  • The function definition emits the body action and returns the result if appropriate.
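
The collision-avoiding naming scheme might look like the sketch below. It assumes that the unsuffixed name is tried first and that suffixes start at _1; uniqueFuncName is a hypothetical helper, and a std::set stands in for the prog->hasFunc(name) check.

```cpp
#include <cassert>
#include <set>
#include <string>

// `existing` stands in for prog->hasFunc(name).
std::string uniqueFuncName(const std::string& hint,
                           const std::set<std::string>& existing) {
    const std::string base = "%" + hint + "_hc";
    if (existing.count(base) == 0) return base;  // assumption: no suffix if free
    for (int i = 1;; ++i) {
        const std::string candidate = base + "_" + std::to_string(i);
        if (existing.count(candidate) == 0) return candidate;
    }
}
```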

9.4 AstExpression → operator dispatch

AstExpression::resolveAction() is where most language semantics live.

Resolution is driven by the center token:

  • If it is an operator token: center->getOp() != nullptr
  • Else it is an identifier / literal token whose action is looked up by name

The general pattern is:

  1. resolve left/right actions
  2. call NamespaceData::getActionForTokenWithInput(centerToken, leftType, rightType, dynamic, throwOnError, centerToken)
  3. assemble those actions with branchAction(...) as needed

9.4.1 Special forms implemented directly in AstExpression::resolveAction

These are hard-coded in hachi/src/AstNode.cpp:

  1. Import (>@)
    • Expects the right side to evaluate to a String at compile time (must be an AstLiteral string literal).
    • Loads module file:
      • this->getToken()->getFile()->getDirPath() + "/hMods/" + modName + ".h"
    • Lexes and parses that file into modAst
    • Calls modAst->setInput(nameSpace, true, Void, Void)
    • Each element of the module (expected to be an AstList) is added to the current namespace using nameSpace->addNode(node, node->getToken()->getText())
    • The import expression itself returns voidAction.
  2. Dot access (.)
    • Requires the right operand to be an identifier literal (AstLiteral).
    • Produces an action equivalent to: get field name from left tuple.
  3. Conditional (?)
    • Always treats the left subtree as a Bool condition.
    • The right subtree is either:
      • a single if-body, producing ifAction(condition, ifBody)
      • or an if/else pair encoded via : inside the right subtree:
        • cond ? (ifExpr : elseExpr) becomes ifElseAction(condition, ifExpr, elseExpr)
  4. Loop (@)
    • Always treats the left subtree as a Bool condition.
    • The right subtree is either:
      • a loop body action
      • or an end-step + loop-body pair encoded via the : operator inside the right subtree:
        • cond @ (endExpr : loopExpr) becomes loopAction(condition, endExpr, loopExpr)
        • (the endExpr is executed after each iteration)
  5. Comma (,) tuple creation
    • Forms a tuple from comma-separated expressions.
    • Flattening behavior:
      • If the left or right subtree is itself a comma-expression, its tuple elements are concatenated.
    • Generates makeTupleAction(actions) (hachi/src/Actions/MakeTupleAction.cpp).
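
The comma flattening in the last item can be illustrated with a small tree walk. Expr, leaf, comma and flattenComma are hypothetical names; each collected string stands in for the element Action that would be passed to makeTupleAction.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Expr {
    std::string text;                   // operator spelling or leaf text
    std::unique_ptr<Expr> left, right;  // both null for leaves
};

std::unique_ptr<Expr> leaf(std::string text) {
    auto e = std::make_unique<Expr>();
    e->text = std::move(text);
    return e;
}

std::unique_ptr<Expr> comma(std::unique_ptr<Expr> l, std::unique_ptr<Expr> r) {
    auto e = leaf(",");
    e->left = std::move(l);
    e->right = std::move(r);
    return e;
}

// Collects tuple elements left to right; nested comma nodes are spliced in
// rather than becoming nested tuples.
void flattenComma(const Expr& e, std::vector<std::string>& elems) {
    if (e.text == "," && e.left && e.right) {
        flattenComma(*e.left, elems);
        flattenComma(*e.right, elems);
    } else {
        elems.push_back(e.text);  // stands in for resolving the element's Action
    }
}
```

For the tree built from a, b, c (parsed as (a , b) , c), the walk yields the three elements a, b, c in order.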

9.4.2 Everything else: overloaded operator + identifier resolution

If the center token is not one of the special forms above, resolution goes through:

nameSpace->getActionForTokenWithInput(...)

This is used for:

  • :, ::, assignment behavior, and any stdlib-defined actions
  • arithmetic operators, comparisons, boolean operators
  • calling named functions/actions (when center is an identifier token and left/right are the operand actions)

Because operator overloadability is recorded in OperatorData, the standard library can expose multiple overloads for the same operator text and rely on (leftType, rightType) matching to select the correct Action.


10. Action primitives (runtime + codegen)

Hachi’s “executable IR” is the Action system in hachi/src/Actions/.

10.1 branchAction(...)

Defined in hachi/src/Actions/BranchAction.cpp.

Purpose:

  • Compose an operator action with pre-computed left/right input-producing actions, while enforcing:
    • input-producing actions take (Void, Void) input
    • their return types match the operator action’s input types

Variants:

  • BranchAction(leftProducer, opAction, rightProducer) when both inputs are non-void
  • LeftBranchAction(leftProducer, opAction) when right is void
  • RightBranchAction(opAction, rightProducer) when left is void
  • returns opAction directly if both producers return void

Interpreter behavior:

  • Executes producer(s), then opAction, then frees producer outputs.

C++ codegen behavior:

  • If needed, inserts tuple casts (cppTupleCastAction) to match operand tuple types before emitting.
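
The variant selection and the two input checks can be sketched as follows. Types are modeled as strings, Sig, BranchKind and chooseBranch are hypothetical names, and the tuple-cast step from the codegen path is not modeled (types must match exactly here).

```cpp
#include <cassert>
#include <initializer_list>
#include <stdexcept>
#include <string>

struct Sig {
    std::string inLeft, inRight, ret;  // type names; "Void" means no value
};

enum class BranchKind { Both, LeftOnly, RightOnly, Direct };

BranchKind chooseBranch(const Sig& left, const Sig& op, const Sig& right) {
    // Input-producing actions must themselves take (Void, Void).
    for (const Sig* producer : {&left, &right}) {
        if (producer->inLeft != "Void" || producer->inRight != "Void") {
            throw std::invalid_argument("producers must take (Void, Void)");
        }
    }
    // Their return types must match the operator action's input types.
    if (left.ret != op.inLeft || right.ret != op.inRight) {
        throw std::invalid_argument("producer return type does not match operator input");
    }
    const bool hasLeft = left.ret != "Void";
    const bool hasRight = right.ret != "Void";
    if (hasLeft && hasRight) return BranchKind::Both;  // BranchAction
    if (hasLeft) return BranchKind::LeftOnly;          // LeftBranchAction
    if (hasRight) return BranchKind::RightOnly;        // RightBranchAction
    return BranchKind::Direct;                         // opAction itself
}
```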

10.2 ListAction(...)

See §9.2.

10.3 Tuple operations

Defined in hachi/src/Actions/MakeTupleAction.cpp:

  • makeTupleAction(vector<Action>):

    • Builds a tuple return type by concatenating each source action’s return type
    • Interpreter: allocates a flat buffer of summed sizes and memcpy’s each element sequentially
    • C++: emits a tuple constructor expression TupleType(elem1, elem2, ...)
  • getElemFromTupleAction(tupleType, fieldName):

    • Interpreter: allocates field-sized buffer and memcpy’s bytes from the tuple by offset
    • C++: emits .fieldName, with optimization if the tuple source is a MakeTupleAction (it can emit the producing expression directly)
  • cppTupleCastAction(action, targetTupleType):

    • Codegen-only cast helper used when tuple shapes “match” but names differ or a narrower/wider tuple is expected.
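
The interpreter-side copying for makeTupleAction and getElemFromTupleAction can be sketched with byte vectors. Bytes, packTuple, extractField, toBytes and fromBytes are hypothetical helpers; the real code works on raw void* buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

using Bytes = std::vector<unsigned char>;

// Concatenates element buffers back to back, with no padding.
Bytes packTuple(const std::vector<Bytes>& elems) {
    std::size_t total = 0;
    for (const Bytes& e : elems) total += e.size();
    Bytes out(total);
    std::size_t offset = 0;
    for (const Bytes& e : elems) {
        if (!e.empty()) std::memcpy(out.data() + offset, e.data(), e.size());
        offset += e.size();
    }
    return out;
}

// Copies one field out of a packed tuple into a fresh field-sized buffer.
Bytes extractField(const Bytes& tuple, std::size_t offset, std::size_t size) {
    assert(offset + size <= tuple.size());
    Bytes out(size);
    if (size != 0) std::memcpy(out.data(), tuple.data() + offset, size);
    return out;
}

template <typename T>
Bytes toBytes(const T& value) {
    Bytes b(sizeof(T));
    std::memcpy(b.data(), &value, sizeof(T));
    return b;
}

template <typename T>
T fromBytes(const Bytes& b) {
    assert(b.size() == sizeof(T));
    T value;
    std::memcpy(&value, b.data(), sizeof(T));
    return value;
}
```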

10.4 Control flow

Defined in:

  • hachi/src/Actions/IfAction.cpp
  • hachi/src/Actions/LoopAction.cpp

IfAction:

  • statement form only (returns void)
  • emits if (cond) { ... }

IfElseAction:

  • return type is the common type of the two branches if they match; otherwise Void
  • if used in expression position (prog->getExprLevel() > 0) and returns a value, emits C++ ternary cond ? a : b
  • otherwise emits statement if/else blocks and discards return values if needed

LoopAction:

  • statement form only (returns void)
  • emits while (cond) { loopBody; endAction; } (endAction optional)

11. Type system (as used by the transpiler)

Types are shared_ptr<TypeBase> (hachi/h/Type.h, hachi/src/Type.cpp).

11.1 Primitive kinds

TypeBase::PrimitiveType includes:

  • UNKNOWN, VOID, BYTE, DUB, INT, PTR, BOOL, TUPLE, WHATEV, METATYPE

Key globals (declared extern):

  • Unknown, AnyT, Void, Bool, Byte, Int, Flt (DUB), String (non-const extern in this snapshot)

11.2 Tuple layout

Tuple values are a flat byte concatenation of element values in declared order.

Offsets are computed as the running sum of prior element sizes (TupleType::getSubType()).

There is no alignment padding inserted by the tuple type implementation; the tuple’s getSize() is the sum of element sizes.
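
The packed layout rule amounts to a running sum, as in this sketch. Field and PackedTupleLayout are hypothetical names, and the field sizes used in the usage note are illustrative, not Hachi’s actual primitive sizes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Field {
    std::string name;
    std::size_t size;  // size in bytes of the field's type
};

struct PackedTupleLayout {
    std::vector<Field> fields;

    // Total size is the plain sum of field sizes (no padding).
    std::size_t size() const {
        std::size_t total = 0;
        for (const Field& f : fields) total += f.size;
        return total;
    }

    // Writes the byte offset of `name` to `offsetOut`; returns false if absent.
    bool offsetOf(const std::string& name, std::size_t& offsetOut) const {
        std::size_t offset = 0;
        for (const Field& f : fields) {
            if (f.name == name) {
                offsetOut = offset;
                return true;
            }
            offset += f.size;  // running sum of prior element sizes
        }
        return false;
    }
};
```

For fields of 1, 4 and 8 bytes declared in that order, the offsets are 0, 1 and 5 and the total size is 13, where a padded C++ struct with the same members would typically be larger.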

11.3 Meta-types

Types are represented in the AST as nodes whose return type is TypeBase::METATYPE and whose getSubType() yields the real type.

This is how AstType nodes communicate type objects through the AST/action system.


12. C++ output generation (CppProgram)

Actions emit C++ by calling methods on CppProgram (hachi/h/CppProgram.h, hachi/src/CppProgram.cpp).

Key ideas:

  • CppProgram maintains:
    • global include/code sections
    • global variable declarations
    • global type declarations
    • a map of generated functions (funcs)
    • an “active function” being emitted
  • It uses a CppNameContainer hierarchy to map Hachi identifiers to collision-safe C++ identifiers.

Hachi “hardcoded names” in codegen often use special prefixes like:

  • global funcs: $
  • global vars: *
  • local vars: -
  • types: {...}

(These conventions are documented in comments in CppProgram.h.)


13. Cross-reference: language constructs → implementation

This section maps major Hachi constructs (as implemented) to the AST and Action machinery.

13.1 Core constructs

Construct | Parsed as | Resolved as
Program / block sequence | AstList | ListAction via listAction(actions, destroyers)
Identifier / numeric literal / string literal | AstLiteral | resolveLiteral(...) or getActionForTokenWithInput(...) (depends on token type)
Binary operator expression | AstExpression | usually branchAction(leftProducer, opAction, rightProducer)
Tuple creation via comma | nested AstExpression with , | makeTupleAction(flattenedActions)
Tuple field access a.b | AstExpression with . | getElemFromTupleAction(type(a), "b") (codegen uses .b)
If statement cond ? body | AstExpression with ? | ifAction(condAction, bodyAction)
If-else cond ? (if : else) | AstExpression ? with right : | ifElseAction(cond, ifAct, elseAct)
While loop cond @ body | AstExpression with @ | loopAction(cond, body)
While loop with end-step cond @ (end : body) | AstExpression @ with right : | loopAction(cond, end, body)
Import >@ "mod" | AstExpression with >@ | module load + namespace injection, returns voidAction

13.2 Type constructs

Construct | Parsed as | Output
Named type Int | AstType | meta-type whose subtype is resolved from namespace
Tuple type {a:Int, b:String} | AstTypeTuple | tuple Type built from fields
Arrow type A -> B | AstTypeArrow | function type encoded as tuple types (implementation-defined in Type.cpp)

14. Standard library integration

The compiler calls populateHachiStdLib() during program resolution (HachiProgram::resolveProgram).

What it does (structurally):

  • Creates the root namespace (globalNamespace, declared extern)
  • Adds:
    • core types (including meta-types)
    • operator actions for built-in operators
    • helper actions like __destroy__ and __copy__ for selected types
    • module-related entries (as used by the import system)

Because stdlib population is the source of most actions/operators, understanding HachiStdLib.cpp is required to fully enumerate built-in operators and functions. This spec describes how the compiler consumes those definitions; it does not attempt to restate stdlib contents.


15. Observable invariants and “gotchas” (from implementation)

These are not opinions; they are direct consequences of the snapshot’s code paths:

  1. The lexer does not flush a final token at EOF
    lexString() pushes tokens only when a type transition occurs. The SourceFile constructor appends a trailing newline, which forces a final transition and flush, so the last token is not lost in practice.

  2. Tuple memory layout is packed without padding
    Tuple element offsets are computed by summing getSize(); no alignment is applied.

  3. Dynamic variable creation is signature-gated
    A new variable is created only when resolving an identifier with (leftType=Void, rightType=creatable) in a dynamic scope and the identifier was not found in any namespace map.

  4. Import expects a compile-time string literal
    The import operator loads a file only when the right operand is a string literal node.

  5. Block comments have nonstandard terminators in this snapshot
    Start marker is // and end is detected by a \\ pattern. (Single-line # comments also exist.)
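
Item 1 above (flush only on type transition) can be reproduced with a toy lexer. This is an illustration of the mechanism, not lexString() itself: it has only two character classes, and Kind, classify and lexOnTransitions are hypothetical names.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

enum class Kind { Space, Word };

Kind classify(char c) {
    return std::isspace(static_cast<unsigned char>(c)) ? Kind::Space : Kind::Word;
}

// Emits a word only when the character class changes away from Word.
std::vector<std::string> lexOnTransitions(const std::string& src) {
    std::vector<std::string> tokens;
    std::string current;
    Kind currentKind = Kind::Space;
    for (char c : src) {
        const Kind k = classify(c);
        if (k != currentKind) {
            if (currentKind == Kind::Word) tokens.push_back(current);
            current.clear();
            currentKind = k;
        }
        current.push_back(c);
    }
    // Deliberately no flush at end of input, mirroring the behavior in item 1.
    return tokens;
}
```

Lexing "x y" yields only x, because no transition follows y. Lexing "x y\n" yields both x and y, which is why the appended trailing newline matters.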


Appendix A - File index for further study

Core pipeline:

  • hachi/src/main.cpp
  • hachi/src/HachiProgram.cpp
  • hachi/src/Lexer.cpp
  • hachi/src/Parser.cpp
  • hachi/src/AstNode.cpp
  • hachi/src/Namespace.cpp
  • hachi/src/Type.cpp
  • hachi/src/CppProgram.cpp

Action implementations:

  • hachi/src/Actions/BranchAction.cpp
  • hachi/src/Actions/ListAction.cpp
  • hachi/src/Actions/FunctionAction.cpp
  • hachi/src/Actions/IfAction.cpp
  • hachi/src/Actions/LoopAction.cpp
  • hachi/src/Actions/MakeTupleAction.cpp