Compiler
Scope: This document describes how the compiler/transpiler works and is written as a "tech spec for outsiders" who want to study Hachi's current compiler pipeline.
Primary implementation directories:
- hachi/src/ - compiler + interpreter implementation
- hachi/h/ - public headers for compiler runtime
- hachi/src/Actions/ - executable "Action" nodes used by both interpreter and C++ codegen
- hachi/src/HachiStdLib.cpp - standard library population into the global namespace
1. High-level architecture
Hachi is implemented as a pipeline that turns source text into:
- Tokens (lexing)
- AST nodes (parsing, using operator precedence + bracket handling)
- Action tree (name/type resolution against namespaces + operator overload resolution)
- Either:
  - Interpreter execution (Action tree evaluated directly), or
  - C++ codegen (Action tree emits a C++ program via CppProgram), optionally compiled and executed.
The central orchestrator is HachiProgram (hachi/src/HachiProgram.cpp).
1.1 The four "tree" layers
| Layer | Class / representation | Built by | Purpose |
|---|---|---|---|
| Tokens | Token (hachi/h/Token.h) | lexString() (hachi/src/Lexer.cpp) | Line/column-aware token stream with operator tokens attached |
| AST | AstNode variants (hachi/h/AstNode.h) | astNodeFromTokens() (hachi/src/Parser.cpp) | Structural parse using operators/brackets |
| Action tree | Action / ActionData (hachi/h/Action.h) | AstNodeBase::getAction() implementations (hachi/src/AstNode.cpp) | Typed executable nodes + C++ emission |
| Output | Interpreter runtime OR CppProgram (hachi/h/CppProgram.h) | ActionData::execute() or ActionData::addToProg() | Run the program or produce C++ |
2. Entrypoint and CLI modes
The CLI entrypoint is main() in hachi/src/main.cpp.
2.1 Modes
A single Hachi source file is supported (multiple inputs are rejected).
| Mode | Trigger | What happens |
|---|---|---|
| REPL | no file args | Reads lines into /tmp/tmp-hachi.8, repeatedly runs hachi /tmp/tmp-hachi.8 -go |
| Interpret | default when a file is given and no -cpp/-build/-go flags | Builds Action tree then executes it (HachiProgram::execute()) |
| Transpile to C++ | -cpp <file> | Builds Action tree then emits C++ (HachiProgram::getCpp()), writes to file |
| Build binary | -build <file> or -b <file> | Emits C++ then compiles with clang++ ... -o <file> |
| Build+run | -go (also aliases -e/-execute/-ko) | Emits C++ then compiles then runs ./tmp_hc_compiled ... |
| Leak-sanitized build | -buildml <file> | Compiles with AddressSanitizer flags |
| One-liner | -c "<code>" | Writes code to /tmp/tmp-hachi.8, then runs hachi /tmp/tmp-hachi.8 -go |
Compiler invocation is performed via system() calls in main.cpp, constructing a clang++ command string.
3. Pipeline in HachiProgram::resolveProgram
HachiProgram::resolveProgram(filename, debug) performs the compilation pipeline:
- AllOperators::init()
  - Global ops pointer is assigned to a new AllOperators instance.
  - Operator tokens and their precedence are declared in hachi/h/AllOperators.h.
- populateHachiStdLib()
  - Populates the global namespace with built-in types and actions (standard library).
  - Implemented in hachi/src/HachiStdLib.cpp (large file).
- Load source file: SourceFile(filename, debug)
  - Loads file contents and appends a trailing newline (hachi/src/SourceFile.cpp).
- Lex: lexString(file, tokens)
  - Converts source bytes to a vector of Token objects.
- Parse: astRoot = astNodeFromTokens(tokens, 0, tokens.size()-1)
  - Produces an AST rooted at an AstNode.
- Type + name resolution pass: astRoot->setInput(globalNamespace, true, Void, Void)
  - Sets up a root namespace input and recursively configures AST nodes with:
    - namespace
    - dynamic flag
    - input types for each subtree
- Build Action tree: actionRoot = astRoot->getAction()
  - Each AST node resolves into one or more Action nodes.
- Final-destruction wrapper: actionRoot = globalNamespace->wrapInDestroyer(actionRoot)
  - Ensures the overall program result is routed through __destroy__ if one exists.
If any stage throws HachiError, it is logged and compilation continues in an error-safe state (often falling back to AstVoid).
4. Lexing (tokenization)
Implemented in hachi/src/Lexer.cpp.
4.1 Character classifier
CharClassifier categorizes each character into:
- whitespace (' ', '\t', '\r')
- newline ('\n') and "line break" (';')
- letters ([A-Za-z_])
- digits ([0-9] plus special handling for . if followed by a digit)
- operator characters (any character that appears in any operator text in ops->getOpsMap())
- string delimiter ('"')
- single-line comment ('#')
- block comment markers:
  - start: //
  - end: a backslash whose previous character is also backslash (\\)
Note: "block comments" in this snapshot are unusual: the start marker is //, while the end condition is detected by \\ (a double backslash). This is literal behavior in CharClassifier::get().
4.2 Token types
The lexer emits tokens of (conceptual) types:
- IDENTIFIER
- LITERAL (numbers)
- STRING_LITERAL (content between quotes, escape processed)
- OPERATOR (later split into specific Operator objects)
- LINE_END (from newline or ;)
- comments are not emitted as tokens
4.3 Operator splitting
When a contiguous run is classified as OPERATOR, it is split by calling:
ops->get(tokenTxt, opMatches) (hachi/src/AllOperators.cpp)
This function greedily matches operator substrings from left to right by shrinking end until a known operator text is found, then continuing.
Each matched operator becomes its own Token carrying a pointer to the matched Operator.
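The greedy split described above can be sketched as follows. This is an illustration only, not the code in AllOperators.cpp: the function name splitOperatorRun, the use of a std::set for the known operator texts, and the decision to throw when no operator matches are all assumptions made for the example.

```cpp
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Splits a contiguous run of operator characters into known operators.
// At each start position the longest candidate is tried first, and the end
// index shrinks until the substring is a known operator text.
// (Hypothetical helper; error handling is an assumption of this sketch.)
std::vector<std::string> splitOperatorRun(const std::string& run,
                                          const std::set<std::string>& known) {
    std::vector<std::string> out;
    std::size_t start = 0;
    while (start < run.size()) {
        std::size_t end = run.size();
        while (end > start && known.count(run.substr(start, end - start)) == 0) {
            --end;  // shrink the candidate from the right
        }
        if (end == start) {
            throw std::runtime_error("unknown operator '" + run.substr(start) + "'");
        }
        out.push_back(run.substr(start, end - start));
        start = end;  // continue after the matched operator
    }
    return out;
}
```

With the operators `>`, `>=`, and `-` registered, the run `>=-` would split into `>=` followed by `-`, because the longest match at each position wins.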
4.4 String escapes
Within string literals, backslash escapes are handled:
- \n, \t, \", \\ supported
- Any other \X throws a source error
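A minimal sketch of this escape handling is shown below. The function name unescapeStringBody and the use of std::runtime_error are assumptions for illustration; the real lexer reports a source error with location information, which is not modeled here.

```cpp
#include <stdexcept>
#include <string>

// Processes the four supported escapes in the text between the quotes.
// Any other backslash sequence is rejected (hypothetical error type).
std::string unescapeStringBody(const std::string& raw) {
    std::string out;
    for (std::size_t i = 0; i < raw.size(); ++i) {
        if (raw[i] != '\\') {
            out += raw[i];
            continue;
        }
        if (i + 1 >= raw.size()) {
            throw std::runtime_error("dangling backslash at end of string");
        }
        const char c = raw[++i];
        switch (c) {
            case 'n':  out += '\n'; break;
            case 't':  out += '\t'; break;
            case '"':  out += '"';  break;
            case '\\': out += '\\'; break;
            default:
                throw std::runtime_error(std::string("invalid escape '\\") + c + "'");
        }
    }
    return out;
}
```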
5. Operators and precedence
Operators are declared in hachi/h/AllOperators.h via the ALL_OPS macro.
Each operator has:
- textual spelling
- precedence integer
- associativity/input kind (OperatorData::LEFT, RIGHT, BOTH)
- overloadability flag
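Operator list in this snapshot (text → precedence):
| Operator | Precedence |
|---|---|
| @ | 5 |
| ? | 6 |
| \| | 6 |
| : | 24 |
| :: | 24 |
| , | 35 |
| ^ | 36 |
| && | 38 |
| = | 40 |
| != | 40 |
| > | 50 |
| < | 50 |
| >= | 50 |
| <= | 50 |
| + | 61 |
| - | 61 |
| * | 71 |
| / | 71 |
| % | 70 |
| ! | 74 |
| . | 81 |
| -> | 83 |
| <- | 83 |
| >@ | 90 |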
6. Parsing (tokens → AST)
Implemented in hachi/src/Parser.cpp as astNodeFromTokens(tokens, left, right).
6.1 Bracket stripping and empty program
- getOuterBracOffset() detects whether a token range is fully wrapped by matching brackets.
- If so, the parser strips the outer brackets and re-parses the inside.
- If the resulting range is empty, returns AstVoid::make().
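The "fully wrapped" check can be illustrated with a depth counter. The sketch below is not getOuterBracOffset() itself: it works on plain strings instead of Token objects, the helper names are hypothetical, and it only tracks nesting depth without verifying that the bracket kinds pair up.

```cpp
#include <string>
#include <vector>

// Hypothetical helpers: classify a token text as an opening or closing bracket.
bool isOpenBracket(const std::string& t) { return t == "(" || t == "{" || t == "["; }
bool isCloseBracket(const std::string& t) { return t == ")" || t == "}" || t == "]"; }

// True when tokens[left..right] starts with an opening bracket whose matching
// close is exactly tokens[right], so the outer pair can be stripped.
bool isFullyWrapped(const std::vector<std::string>& tokens, int left, int right) {
    if (left >= right || !isOpenBracket(tokens[left])) return false;
    int depth = 0;
    for (int i = left; i <= right; ++i) {
        if (isOpenBracket(tokens[i])) ++depth;
        else if (isCloseBracket(tokens[i])) --depth;
        // The first bracket closes here; it wraps the range only if this is the end.
        if (depth == 0) return i == right;
    }
    return false;  // unbalanced range
}
```

The early return matters for input such as `( a ) + ( b )`: the range starts and ends with brackets, but the first bracket closes before the end, so nothing is stripped.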
6.2 AST node kinds
The parser produces AST nodes defined in hachi/h/AstNode.h:
| AST type | Represents |
|---|---|
| AstVoid | empty / no-op |
| AstLiteral | identifier token, number literal token, or string-literal token |
| AstList | a semicolon/newline-separated list (sequence) |
| AstExpression | a binary operator expression with left, center token, and right |
| AstFunctionBody | ( ... ) or { ... } bodies parsed as function bodies |
| AstType | a type literal or type construct |
| AstTypeTuple | tuple type definition a:T, b:U, ... |
| AstTypeArrow | function type LeftType -> RightType |
6.3 Lists (statement sequences)
At top level (or inside blocks), the parser looks for LINE_END boundaries and builds:
AstList(elements...)
Each element is parsed independently with astNodeFromTokens on its token subrange.
6.4 Operator parse
If the range is not a list:
- The parser calls parseOperator(tokens, left, right).
- It scans the range for the "best" operator using precedence rules and bracket-depth tracking:
  - operators inside nested parentheses/brackets/braces are ignored
  - operators whose input-direction conflicts with missing left/right operand are ignored
The chosen operator is called mainOp, and AstExpression(leftSubtree, mainOpToken, rightSubtree) is created.
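The selection of mainOp can be sketched as a single scan with a depth counter. Two things here are assumptions rather than facts from the source: that a lower precedence number binds more loosely (the operator list suggests it, with = at 40 below + at 61 below * at 71), and that ties go to the rightmost operator, which yields left-to-right grouping. The input-direction rule is not modeled, and findMainOperator is a hypothetical name.

```cpp
#include <map>
#include <string>
#include <vector>

// Returns the index of the operator to split the range on, or -1 if the range
// contains no operator outside brackets.
int findMainOperator(const std::vector<std::string>& tokens,
                     const std::map<std::string, int>& precedence) {
    int best = -1;
    int bestPrec = 0;
    int depth = 0;
    for (int i = 0; i < static_cast<int>(tokens.size()); ++i) {
        const std::string& t = tokens[i];
        if (t == "(" || t == "{" || t == "[") { ++depth; continue; }
        if (t == ")" || t == "}" || t == "]") { --depth; continue; }
        if (depth != 0) continue;  // operators inside brackets are ignored
        const auto it = precedence.find(t);
        if (it == precedence.end()) continue;  // not an operator
        // Assumption: the loosest-binding operator becomes the root;
        // "<=" keeps the rightmost one among equal precedences.
        if (best == -1 || it->second <= bestPrec) {
            best = i;
            bestPrec = it->second;
        }
    }
    return best;
}
```

Under these assumptions, `a + b * c` splits at `+`, while `( a + b ) * c` splits at `*` because the `+` sits at bracket depth 1.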
6.5 Type parsing
Types are parsed by astNodeTypeFromTokens() and use a limited grammar:
- identifier → AstType referencing a named type
- { ... } → tuple type (AstTypeTuple)
- A -> B → function type (AstTypeArrow)
Tuple types are parsed from comma-separated name: Type pairs.
7. Namespace model and resolution
Namespaces are implemented by NamespaceData (hachi/h/Namespace.h, hachi/src/Namespace.cpp).
A Namespace is a shared_ptr<NamespaceData>.
7.1 What a namespace stores
A namespace has four "maps" (internally IdMap from string → vector of AST nodes):
- types - nodes that represent types (meta-types)
- actions - statically resolved actions (pure, not tied to runtime variables)
- dynamicActions - actions tied to runtime state (variables, assignments)
- whatevActions - templates / AnyT actions that can be specialized at call time
A namespace also:
- points to a StackFrame (shared among nested scopes inside a function)
- points to a parent namespace (lexical scope chain)
- maintains a destructorActions list used to auto-insert cleanup for variables
7.2 Stack frames and input registers (Li / Ri)
Each function-like scope has a StackFrame (see hachi/h/StackFrame.h, hachi/src/StackFrame.cpp).
NamespaceData::setInput(leftType, rightType):
- configures the stack frame's input types
- injects special variables into the namespace:
  - Li for left input (if creatable)
  - Ri for right input (if creatable)
Both getter and setter actions for Li and Ri are added to the namespace.
7.3 Action lookup: getActionForTokenWithInput
The core resolver is:
NamespaceData::getActionForTokenWithInput(token, leftType, rightType, dynamic, throwOnError, tokenForError)
Resolution order:
- Look up nodes matching the token text (or operator text) in:
  - actions (always)
  - dynamicActions (only if dynamic == true)
- Convert those AST nodes to Actions and keep only the ones whose input types match (leftType, rightType).
- Special-case tuple field access:
  - If leftType is a tuple and the token is an identifier, getSubType(name) is checked.
  - If found and rightType is Void, a synthetic node is created for getElemFromTupleAction(leftType, name).
- If no exact matches, attempt AnyT specialization:
  - Retrieve nodes from whatevActions
  - Ask each node to makeCopyWithSpecificTypes(leftType, rightType)
  - If a specialized copy exists, its action is used and the specialized node is cached into actions.
- Dynamic variable creation (only if no nodes were found for the name):
  - If dynamic == true
  - token is an identifier
  - leftType is Void
  - rightType is creatable
  - then addVar(rightType, tokenText) is invoked and the returned setter action is used.
Errors:
- If multiple matches exist for the same signature: throws or returns null depending on throwSourceError
- If name exists but signature mismatch: "correct overload not found…"
- If name not found at all: "'<name>' not found"
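The core of this lookup, selecting an overload by its (leftType, rightType) signature and separating the three error cases, can be sketched with plain strings standing in for Type and Action objects. Everything named below (Overload, OverloadTable, resolveOverload, the exact message wording) is hypothetical, and the tuple-field, AnyT, and variable-creation steps are left out.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Stand-in for one registered action: its input signature and a label.
struct Overload {
    std::string leftType;
    std::string rightType;
    std::string actionName;
};

using OverloadTable = std::map<std::string, std::vector<Overload>>;

// Picks the single overload of `name` whose inputs match exactly.
std::string resolveOverload(const OverloadTable& table, const std::string& name,
                            const std::string& leftType, const std::string& rightType) {
    const auto it = table.find(name);
    if (it == table.end()) {
        throw std::runtime_error("'" + name + "' not found");
    }
    const Overload* match = nullptr;
    for (const Overload& o : it->second) {
        if (o.leftType == leftType && o.rightType == rightType) {
            if (match != nullptr) {
                throw std::runtime_error("multiple matches for '" + name + "'");
            }
            match = &o;
        }
    }
    if (match == nullptr) {
        throw std::runtime_error("correct overload of '" + name + "' not found");
    }
    return match->actionName;
}
```

The same table shape explains why the standard library can register several actions under one operator text: the name selects the candidate list, and the input types select the entry.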
7.4 Variable creation: addVar
NamespaceData::addVar(type, name) allocates storage in the current stack frame and registers:
- a getter action
- a setter action
- (optionally) a copy action wrapper (if __copy__ exists)
- (optionally) a destructor wrapper (if __destroy__ exists), appended into destructorActions
Getter/setter selection differs depending on whether this variable is in the global stack frame or a local stack frame.
7.5 Automatic cleanup integration
For any scope, the namespace carries destructorActions for variables declared in that scope.
Later, when an AstList resolves into an Action list, it appends the namespace's destructorActions to the end of the statement sequence (see §9.2).
8. AST semantic pass: AstNodeBase::setInput
Every AST node supports:
setInput(Namespace ns, bool dynamic, Type leftIn, Type rightIn)
This pass:
- records the namespace and dynamic setting on the node
- computes input types for children
- ensures type constructs and function bodies establish correct scoping
Key behavior by node type (from hachi/src/AstNode.cpp):
- AstLiteral: stores ns/dynamic/inputs; no children
- AstExpression: sets child inputs based on operator and expected operand behavior
- AstList: creates a child namespace (ns->makeChild()) and sets each element input to (Void, Void)
- AstFunctionBody: creates a child namespace with a new stack frame (ns->makeChildAndFrame(...)) and sets the body input to the function's parameter types
- AstType*: configures type subtrees; produces meta-types
9. AST → Action conversion
Each AST node has getAction(), which lazily calls resolveAction() once and caches the resulting Action.
9.1 Action interface
Actions are shared_ptr<ActionData> (typedef in hachi/h/Action.h).
Each Action has:
- Type getReturnType()
- Type getInLeftType(), getInRightType()
- void* execute(void* inLeft, void* inRight) (interpreter path)
- void addToProg(Action inLeft, Action inRight, CppProgram* prog) (C++ emission path)
9.2 AstList → ListAction
AstList::resolveAction():
- resolves each element into an Action
- appends destructor actions from the list's namespace (*nameSpace->getDestroyerActions())
- returns listAction(actions, destroyers) from hachi/src/Actions/ListAction.cpp
Execution semantics (ListAction::execute)
- Executes each action in order; frees intermediate results
- Preserves the last action's output as the list return value
- Executes all destroyers at the end and frees their outputs
- Returns the last action output (or null if void)
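The execution order above can be sketched with callables in place of Action objects. This is an illustration under stated assumptions, not ListAction::execute: results are modeled as std::unique_ptr<int> (null meaning void) rather than raw void* buffers, and the names Result, SketchAction, and runList are hypothetical.

```cpp
#include <functional>
#include <memory>
#include <vector>

using Result = std::unique_ptr<int>;           // null stands for a void result
using SketchAction = std::function<Result()>;  // stand-in for an Action

// Runs the statements in order, keeps only the last result, then runs cleanup.
Result runList(const std::vector<SketchAction>& actions,
               const std::vector<SketchAction>& destroyers) {
    Result last;
    for (const SketchAction& a : actions) {
        last = a();  // assigning frees the previous intermediate result
    }
    for (const SketchAction& d : destroyers) {
        d();  // the returned temporary is freed immediately
    }
    return last;  // last statement's output, or null if it was void
}
```

Note that the destroyers run after the final statement has produced its value, so the list's return value is computed before any variable in the scope is cleaned up.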
C++ codegen (ListAction::addToProg)
- Emits a scoped block
- If the list is the top-level of a non-main function and must return a value, it stores final expr into -out and emits return -out;
9.3 AstFunctionBody → FunctionAction
AstFunctionBody::resolveAction() builds:
functionAction(bodyNode, returnType, stackFrame) (hachi/src/Actions/FunctionAction.cpp)
FunctionAction supports two construction modes:
- from a concrete Action (already resolved)
- from an AST node + return type (lazy resolution)
Interpreter semantics
- Allocates a new stack buffer sized from the function's StackFrame
StackFrame - Copies incoming left/right data into the stack frame offsets
- Executes the function body action
- Frees stack, restores prior stack pointer
C++ codegen
- Creates a C++ function in CppProgram if not already present
- Names are generated as:
  - %<hint>_hc and then suffixed with _1, _2, … to avoid collisions (prog->hasFunc(name))
- The function definition emits the body action and returns the result if appropriate.
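The collision-avoiding naming scheme can be sketched as below. The sketch is an assumption-laden illustration: a std::set stands in for the prog->hasFunc(name) check, uniqueFuncName is a hypothetical name, and the leading % marker from the generated names is omitted because its meaning is not described here.

```cpp
#include <set>
#include <string>

// Builds "<hint>_hc" and, if that is taken, appends _1, _2, ... until the
// name is free. `existing` plays the role of the program's function table.
std::string uniqueFuncName(const std::string& hint, const std::set<std::string>& existing) {
    const std::string base = hint + "_hc";
    if (existing.count(base) == 0) return base;
    for (int n = 1;; ++n) {
        const std::string candidate = base + "_" + std::to_string(n);
        if (existing.count(candidate) == 0) return candidate;
    }
}
```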
9.4 AstExpression → operator dispatch
AstExpression::resolveAction() is where most language semantics live.
Resolution is driven by the center token:
- If it is an operator token: center->getOp() != nullptr
- Else it is an identifier / literal token whose action is looked up by name
The general pattern is:
- resolve left/right actions
- call NamespaceData::getActionForTokenWithInput(centerToken, leftType, rightType, dynamic, throwOnError, centerToken)
- assemble those actions with branchAction(...) as needed
9.4.1 Special forms implemented directly in AstExpression::resolveAction
These are hard-coded in hachi/src/AstNode.cpp:
- Import (>@)
  - Expects the right side to evaluate to a String at compile time (must be an AstLiteral string literal).
  - Loads module file: this->getToken()->getFile()->getDirPath() + "/hMods/" + modName + ".h"
  - Lexes and parses that file into modAst
  - Calls modAst->setInput(nameSpace, true, Void, Void)
  - Each element of the module (expected to be an AstList) is added to the current namespace using nameSpace->addNode(node, node->getToken()->getText())
  - The import expression itself returns voidAction.
- Dot access (.)
  - Requires the right operand to be an identifier literal (AstLiteral).
  - Produces an action equivalent to: get field name from left tuple.
- Conditional (?)
  - Always treats the left subtree as a Bool condition.
  - The right subtree is either:
    - a single if-body, producing ifAction(condition, ifBody)
    - or an if/else pair encoded via : inside the right subtree: cond ? (ifExpr : elseExpr) becomes ifElseAction(condition, ifExpr, elseExpr)
- Loop (@)
  - Always treats left subtree as Bool condition.
  - Right subtree:
    - loop body action
    - or a loop body + end-step pair encoded via : where cond @ (endExpr : loopExpr) becomes loopAction(condition, endExpr, loopExpr)
    - (the endExpr is executed after each iteration)
- Comma (,) tuple creation
  - Forms a tuple from comma-separated expressions.
  - Flattening behavior:
    - If the left or right subtree is itself a comma-expression, its tuple elements are concatenated.
  - Generates makeTupleAction(actions) (hachi/src/Actions/MakeTupleAction.cpp).
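The comma flattening can be illustrated on a toy expression tree. The Expr struct and the leaf, comma, and flattenComma helpers below are hypothetical stand-ins for AstExpression nodes; the sketch collects leaf texts where the real code would collect resolved Actions.

```cpp
#include <memory>
#include <string>
#include <vector>

// Toy binary tree: a leaf carries its text, a comma node carries "," and
// two children.
struct Expr {
    std::string text;
    std::shared_ptr<Expr> left;
    std::shared_ptr<Expr> right;
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr leaf(const std::string& t) { return std::make_shared<Expr>(Expr{t, nullptr, nullptr}); }
ExprPtr comma(ExprPtr l, ExprPtr r) { return std::make_shared<Expr>(Expr{",", l, r}); }

// Concatenates the elements of nested comma nodes into one flat list,
// left to right, so (a, b), c yields three elements rather than two.
void flattenComma(const ExprPtr& node, std::vector<std::string>& out) {
    if (node->text == "," && node->left && node->right) {
        flattenComma(node->left, out);
        flattenComma(node->right, out);
    } else {
        out.push_back(node->text);
    }
}
```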
9.4.2 Everything else: overloaded operator + identifier resolution
If the center token is not one of the special forms above, resolution goes through:
nameSpace->getActionForTokenWithInput(...)
This is used for:
- :, ::, assignment behavior, and any stdlib-defined actions
- arithmetic operators, comparisons, boolean operators
- calling named functions/actions (when center is an identifier token and left/right are the operand actions)
Because operator overloadability is recorded in OperatorData, the standard library can expose multiple overloads for the same operator text and rely on (leftType, rightType) matching to select the correct Action.
10. Action primitives (runtime + codegen)
Hachi's "executable IR" is the Action system in hachi/src/Actions/.
10.1 branchAction(...)
Defined in hachi/src/Actions/BranchAction.cpp.
Purpose:
- Compose an operator action with pre-computed left/right input-producing actions, while enforcing:
  - input-producing actions take (Void, Void) input
  - their return types match the operator action's input types
Variants:
- BranchAction(leftProducer, opAction, rightProducer) when both inputs are non-void
- LeftBranchAction(leftProducer, opAction) when right is void
- RightBranchAction(opAction, rightProducer) when left is void
- returns opAction directly if both producers return void
Interpreter behavior:
- Executes producer(s), then opAction, then frees producer outputs.
C++ codegen behavior:
- If needed, inserts tuple casts (cppTupleCastAction) to match operand tuple types before emitting.
10.2 ListAction(...)
See §9.2.
10.3 Tuple operations
Defined in hachi/src/Actions/MakeTupleAction.cpp:
- makeTupleAction(vector<Action>):
  - Builds a tuple return type by concatenating each source action's return type
  - Interpreter: allocates a flat buffer of summed sizes and memcpy's each element sequentially
  - C++: emits a tuple constructor expression TupleType(elem1, elem2, ...)
- getElemFromTupleAction(tupleType, fieldName):
  - Interpreter: allocates field-sized buffer and memcpy's bytes from the tuple by offset
  - C++: emits .fieldName, with optimization if the tuple source is a MakeTupleAction (it can emit the producing expression directly)
- cppTupleCastAction(action, targetTupleType):
  - Codegen-only cast helper used when tuple shapes "match" but names differ or a narrower/wider tuple is expected.
10.4 Control flow
Defined in:
- hachi/src/Actions/IfAction.cpp
- hachi/src/Actions/LoopAction.cpp
IfAction:
- statement form only (returns void)
- emits if (cond) { ... }
IfElseAction:
- return type is the common type of the two branches if they match; otherwise Void
- if used in expression position (prog->getExprLevel() > 0) and returns a value, emits C++ ternary cond ? a : b
- otherwise emits statement if/else blocks and discards return values if needed
LoopAction:
- statement form only (returns void)
- emits while (cond) { loopBody; endAction; } (endAction optional)
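The IfElseAction choice between a ternary and a statement can be sketched as a string-building function. This is only an illustration of the decision rule: emitIfElse is a hypothetical name, the operands are already-rendered C++ fragments, and the exact whitespace and bracing of the real output are assumptions.

```cpp
#include <string>

// Emits a ternary when the if/else sits inside an expression and yields a
// value; otherwise emits a statement-form if/else that discards any values.
std::string emitIfElse(const std::string& cond, const std::string& a, const std::string& b,
                       int exprLevel, bool returnsValue) {
    if (exprLevel > 0 && returnsValue) {
        return cond + " ? " + a + " : " + b;
    }
    return "if (" + cond + ") { " + a + "; } else { " + b + "; }";
}
```

The expression-level check exists because a C++ if statement cannot appear inside an expression, while a ternary with mismatched or void branches is not always valid either.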
11. Type system (as used by the transpiler)
Types are shared_ptr<TypeBase> (hachi/h/Type.h, hachi/src/Type.cpp).
11.1 Primitive kinds
TypeBase::PrimitiveType includes:
- UNKNOWN, VOID, BYTE, DUB, INT, PTR, BOOL, TUPLE, WHATEV, METATYPE
Key globals (declared extern):
- Unknown, AnyT, Void, Bool, Byte, Int, Flt (DUB), String (non-const extern in this snapshot)
11.2 Tuple layout
Tuple values are a flat byte concatenation of element values in declared order.
Offsets are computed as the running sum of prior element sizes (TupleType::getSubType()).
There is no alignment padding inserted by the tuple type implementation; the tuple's getSize() is the sum of element sizes.
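The packed layout can be illustrated with a few lines of arithmetic and memcpy. The helper names (packedOffset, packedSize, packTuple) are hypothetical, and element sizes are passed in explicitly because the sizes of Hachi's primitive types are not stated here.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Offset of element `index`: the running sum of all earlier element sizes.
std::size_t packedOffset(const std::vector<std::size_t>& sizes, std::size_t index) {
    std::size_t offset = 0;
    for (std::size_t i = 0; i < index; ++i) offset += sizes[i];
    return offset;
}

// Total tuple size: the sum of all element sizes, with no padding.
std::size_t packedSize(const std::vector<std::size_t>& sizes) {
    return packedOffset(sizes, sizes.size());
}

// Copies each element's bytes back to back into one flat buffer.
std::vector<unsigned char> packTuple(const std::vector<const void*>& elems,
                                     const std::vector<std::size_t>& sizes) {
    std::vector<unsigned char> buf(packedSize(sizes));
    for (std::size_t i = 0; i < elems.size(); ++i) {
        std::memcpy(buf.data() + packedOffset(sizes, i), elems[i], sizes[i]);
    }
    return buf;
}
```

Because no padding is inserted, an element may land on an address that is not a multiple of its natural alignment, so reading it back should go through memcpy rather than a pointer cast.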
11.3 Meta-types
Types are represented in the AST as nodes whose return type is TypeBase::METATYPE and whose getSubType() yields the real type.
This is how AstType nodes communicate type objects through the AST/action system.
12. C++ output generation (CppProgram)
Actions emit C++ by calling methods on CppProgram (hachi/h/CppProgram.h, hachi/src/CppProgram.cpp).
Key ideas:
- CppProgram maintains:
  - global include/code sections
  - global variable declarations
  - global type declarations
  - a map of generated functions (funcs)
  - an "active function" being emitted
- It uses a CppNameContainer hierarchy to map Hachi identifiers to collision-safe C++ identifiers.
Hachi "hardcoded names" in codegen often use special prefixes like:
- global funcs: $
- global vars: *
- local vars: -
- types: {...}
(These conventions are documented in comments in CppProgram.h.)
13. Cross-reference: language constructs → implementation
This section maps major Hachi constructs (as implemented) to the AST and Action machinery.
13.1 Core constructs
| Construct | Parsed as | Resolved as |
|---|---|---|
| Program / block sequence | AstList | ListAction via listAction(actions, destroyers) |
| Identifier / numeric literal / string literal | AstLiteral | resolveLiteral(...) or getActionForTokenWithInput(...) (depends on token type) |
| Binary operator expression | AstExpression | usually branchAction(leftProducer, opAction, rightProducer) |
| Tuple creation via comma | nested AstExpression with , | makeTupleAction(flattenedActions) |
| Tuple field access a.b | AstExpression with . | getElemFromTupleAction(type(a), "b") (codegen uses .b) |
| If statement cond ? body | AstExpression with ? | ifAction(condAction, bodyAction) |
| If-else cond ? (if : else) | AstExpression ? with right : | ifElseAction(cond, ifAct, elseAct) |
| While loop cond @ body | AstExpression with @ | loopAction(cond, body) |
| While loop with end-step cond @ (end : body) | AstExpression @ with right : | loopAction(cond, end, body) |
| Import >@ "mod" | AstExpression with >@ | module load + namespace injection, returns voidAction |
13.2 Type constructs
| Construct | Parsed as | Output |
|---|---|---|
| Named type Int | AstType | meta-type whose subtype is resolved from namespace |
| Tuple type {a:Int, b:String} | AstTypeTuple | tuple Type built from fields |
| Arrow type A -> B | AstTypeArrow | function type encoded as tuple types (implementation-defined in Type.cpp) |
14. Standard library integration
The compiler calls populateHachiStdLib() during program resolution (HachiProgram::resolveProgram).
What it does (structurally):
- Creates the root namespace (globalNamespace, declared extern)
- Adds:
  - core types (including meta-types)
  - operator actions for built-in operators
  - helper actions like __destroy__ and __copy__ for selected types
  - module-related entries (as used by the import system)
Because stdlib population is the source of most actions/operators, understanding HachiStdLib.cpp is required to fully enumerate built-in operators and functions. This spec describes how the compiler consumes those definitions; it does not attempt to restate stdlib contents.
15. Observable invariants and "gotchas" (from implementation)
These are not opinions; they are direct consequences of the snapshot's code paths:
- The lexer does not flush a final token at EOF.
  lexString() pushes tokens only when a type transition occurs. The SourceFile constructor appends a newline, which forces a flush via a transition at the end, preventing a missing final token in practice.
- Tuple memory layout is packed without padding.
  Tuple element offsets are computed by summing getSize(); no alignment is applied.
- Dynamic variable creation is signature-gated.
  A new variable is created only when resolving an identifier with (leftType=Void, rightType=creatable) in a dynamic scope and the identifier was not found in any namespace map.
- Import expects a compile-time string literal.
  The import operator loads a file only when the right operand is a string literal node.
- Block comments have nonstandard terminators in this snapshot.
  Start marker is // and end is detected by a \\ pattern. (Single-line # comments also exist.)
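The first gotcha is easy to reproduce with a toy tokenizer that, like the description of lexString(), emits a token only when the character class changes. The sketch below is not the real lexer: it knows just two classes (whitespace and everything else), and the names CharKind, kindOf, and lexOnTransitions are hypothetical.

```cpp
#include <cctype>
#include <string>
#include <vector>

enum class CharKind { Space, Word };

static CharKind kindOf(char c) {
    return std::isspace(static_cast<unsigned char>(c)) ? CharKind::Space : CharKind::Word;
}

// Pushes a token only when the character class changes. There is
// deliberately no flush after the loop, so a Word run that reaches the end
// of the input is dropped.
std::vector<std::string> lexOnTransitions(const std::string& text) {
    std::vector<std::string> tokens;
    std::string current;
    CharKind currentKind = CharKind::Space;
    for (char c : text) {
        const CharKind k = kindOf(c);
        if (k != currentKind) {
            if (currentKind == CharKind::Word) tokens.push_back(current);
            current.clear();
            currentKind = k;
        }
        current += c;
    }
    return tokens;
}
```

In this toy version, lexing `ab 12` yields only `ab`, while `ab 12` followed by a newline yields both `ab` and `12`; the appended newline supplies the final transition.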
Appendix A - File index for further study
Core pipeline:
- hachi/src/main.cpp
- hachi/src/HachiProgram.cpp
- hachi/src/Lexer.cpp
- hachi/src/Parser.cpp
- hachi/src/AstNode.cpp
- hachi/src/Namespace.cpp
- hachi/src/Type.cpp
- hachi/src/CppProgram.cpp
Action implementations:
- hachi/src/Actions/BranchAction.cpp
- hachi/src/Actions/ListAction.cpp
- hachi/src/Actions/FunctionAction.cpp
- hachi/src/Actions/IfAction.cpp
- hachi/src/Actions/LoopAction.cpp
- hachi/src/Actions/MakeTupleAction.cpp