# L Language v0.1 实现计划 > **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. **Goal:** 用 C + LLVM 构建 L Language 编译器的完整 v0.1(计算器级别 → 斐波那契可执行文件) **Architecture:** 经典 5 阶段编译器流水线 — 词法分析(Token) → 语法解析(AST) → 语义分析(类型标注) → LLVM IR 生成 → 链接成可执行文件。手写递归下降+Pratt解析器,Arena 内存池简化内存管理。 **Tech Stack:** C17, GCC (MinGW), CMake ≥ 3.20, LLVM 19.x (C API), Windows 11 --- ## 文件蓝图 ``` L Language/ ├── CMakeLists.txt 构建系统 ├── include/ │ └── l_lang.h 全局类型定义(Token、AST、类型枚举共享) ├── src/ │ ├── util/ │ │ ├── arena.h 内存池声明 │ │ └── arena.c 内存池实现(bump allocator) │ ├── lexer/ │ │ ├── token.h Token 结构体 + 创建函数 │ │ ├── token.c Token 实现 │ │ ├── lexer.h 词法分析器声明 │ │ └── lexer.c 核心词法分析(状态机) │ ├── ast/ │ │ ├── ast.h AST 节点类型枚举 + 结构体 │ │ └── ast.c 节点创建/销毁 │ ├── parser/ │ │ ├── parser.h 解析器声明 │ │ └── parser.c 递归下降 + Pratt 表达式解析 │ ├── sema/ │ │ ├── symbol.h 符号表声明 │ │ ├── symbol.c 符号表实现(作用域链) │ │ ├── sema.h 语义分析声明 │ │ └── sema.c 类型推断 + 类型检查 │ ├── codegen/ │ │ ├── codegen.h 代码生成声明 │ │ └── codegen.c AST → LLVM IR │ ├── driver/ │ │ ├── error.h 错误报告声明 │ │ ├── error.c 错误格式化输出 │ │ └── main.c 入口 + 命令行 + 流水线串联 ├── test/ │ ├── test_lexer.c 词法分析单元测试 │ ├── test_parser.c 语法分析单元测试 │ ├── test_sema.c 语义分析单元测试 │ ├── test_utils.h 测试断言宏 │ └── programs/ │ ├── 01_arithmetic.l 四则运算 + print_i64 │ ├── 02_if_else.l if/else 控制流 │ ├── 03_while.l 递归 + if 控制流组合 │ ├── 04_fib_recursive.l 斐波那契递归 │ └── 05_float.l 浮点运算 + 多函数调用 ``` --- ### Task 1: 项目骨架和构建系统 **Files:** - Create: `CMakeLists.txt` - Create: `include/l_lang.h` - Create: `src/util/arena.h`, `src/util/arena.c` - Create: `src/driver/error.h`, `src/driver/error.c` - [ ] **Step 1: 编写 CMakeLists.txt** ```cmake cmake_minimum_required(VERSION 3.20) project(l_lang C) set(CMAKE_C_STANDARD 17) set(CMAKE_C_STANDARD_REQUIRED ON) # 查找 LLVM find_package(LLVM 19 REQUIRED CONFIG) message(STATUS "LLVM found: ${LLVM_DIR}") message(STATUS "LLVM includes: ${LLVM_INCLUDE_DIRS}") message(STATUS "LLVM libraries: ${LLVM_AVAILABLE_LIBS}") # 收集源文件 file(GLOB_RECURSE L_LANG_SOURCES "src/*.c") # 编译器可执行文件 add_executable(l_lang ${L_LANG_SOURCES}) # 包含目录 target_include_directories(l_lang PRIVATE ${CMAKE_SOURCE_DIR}/include ${LLVM_INCLUDE_DIRS} src/util src/lexer src/ast src/parser src/sema src/codegen src/driver ) target_link_libraries(l_lang LLVM ) # 编译选项 target_compile_options(l_lang PRIVATE -Wall -Wextra -g) # 测试可执行文件 file(GLOB TEST_SOURCES "test/test_*.c") add_executable(l_lang_test ${TEST_SOURCES} ${L_LANG_SOURCES}) target_include_directories(l_lang_test PRIVATE ${CMAKE_SOURCE_DIR}/include ${LLVM_INCLUDE_DIRS} src/util src/lexer src/ast src/parser src/sema src/codegen src/driver test ) target_link_libraries(l_lang_test LLVM) ``` - [ ] **Step 2: 编写公共头文件 `include/l_lang.h`** ```c #ifndef L_LANG_H #define L_LANG_H #include #include #include // === 类型系统 === typedef enum { TYPE_I64, TYPE_F64, TYPE_BOOL, TYPE_VOID, TYPE_UNKNOWN, // 尚未推断 TYPE_ERROR, // 类型错误 } TypeKind; static inline const char* type_name(TypeKind kind) { switch (kind) { case TYPE_I64: return "i64"; case TYPE_F64: return "f64"; case TYPE_BOOL: return "bool"; case TYPE_VOID: return "void"; default: return ""; } } // === 向前声明 === typedef struct Token Token; typedef struct AstNode AstNode; typedef struct Scope Scope; typedef struct Arena Arena; // === 跨模块分配器接口(避免循环依赖,各模块通过 void* 使用 arena)=== void* arena_alloc_impl(void* alloc, size_t size); char* arena_strdup_impl(void* alloc, const char* src, size_t len); #endif ``` - [ ] **Step 3: 编写 Arena 内存池 `src/util/arena.h`** ```c #ifndef ARENA_H #define ARENA_H #include typedef struct { char* memory; size_t capacity; size_t offset; } Arena; Arena arena_create(size_t capacity_mb); void arena_destroy(Arena* a); void* arena_alloc(Arena* a, size_t size); char* arena_strdup(Arena* a, const char* src); #endif ``` - [ ] **Step 4: 编写 Arena 实现 `src/util/arena.c`** ```c #include "arena.h" #include #include Arena arena_create(size_t capacity_mb) { Arena a; a.capacity = capacity_mb * 1024 * 1024; a.memory = (char*)malloc(a.capacity); a.offset = 0; return a; } void arena_destroy(Arena* a) { free(a->memory); a->memory = NULL; a->capacity = 0; a->offset = 0; } void* arena_alloc(Arena* a, size_t size) { size = (size + 7) & ~7; // 8 字节对齐 if (a->offset + size > a->capacity) return NULL; void* ptr = a->memory + a->offset; a->offset += size; return ptr; } char* arena_strdup(Arena* a, const char* src) { size_t len = strlen(src) + 1; char* dst = arena_alloc(a, len); memcpy(dst, src, len); return dst; } ``` - [ ] **Step 5: 编写错误报告 `src/driver/error.h`** ```c #ifndef ERROR_H #define ERROR_H #include typedef struct { const char* message; const char* filename; int line; int col; } ErrorInfo; typedef struct { ErrorInfo* errors; size_t count; size_t capacity; } ErrorList; void error_init(ErrorList* list); void error_add(ErrorList* list, const char* filename, int line, int col, const char* fmt, ...); void error_print(const ErrorList* list); #endif ``` - [ ] **Step 6: 编写错误实现 `src/driver/error.c`** ```c #include "error.h" #include #include #include void error_init(ErrorList* list) { list->capacity = 8; list->errors = malloc(list->capacity * sizeof(ErrorInfo)); list->count = 0; } void error_add(ErrorList* list, const char* filename, int line, int col, const char* fmt, ...) { if (list->count >= list->capacity) { list->capacity *= 2; list->errors = realloc(list->errors, list->capacity * sizeof(ErrorInfo)); } char buf[512]; va_list args; va_start(args, fmt); vsnprintf(buf, sizeof(buf), fmt, args); va_end(args); list->errors[list->count++] = (ErrorInfo){ .message = strdup(buf), .filename = filename, .line = line, .col = col, }; } void error_print(const ErrorList* list) { for (size_t i = 0; i < list->count; i++) { ErrorInfo* e = &list->errors[i]; fprintf(stderr, "\033[1;31m错误:\033[0m %s:%d:%d: %s\n", e->filename, e->line, e->col, e->message); } } ``` - [ ] **Step 7: 配置构建并验证编译** ```bash cd "D:\Code\doing_exercises\programs\L Language" mkdir -p build && cd build cmake .. -G "MinGW Makefiles" -DCMAKE_PREFIX_PATH="D:\settings\Language\LLVM" 2>&1 ``` Expected: cmake 配置成功,输出 "LLVM found"。 - [ ] **Step 8: 构建骨架项目** ```bash mingw32-make -j4 ``` Expected: 编译通过,生成 `l_lang.exe`(暂时无功能)。 --- ### Task 2: Token 数据结构 **Files:** - Create: `src/lexer/token.h`, `src/lexer/token.c` - [ ] **Step 1: 编写 `src/lexer/token.h`** ```c #ifndef TOKEN_H #define TOKEN_H #include "l_lang.h" // === Token 类型枚举 === typedef enum { // 关键字 TOK_FN, TOK_LET, TOK_IF, TOK_ELSE, TOK_WHILE, TOK_RETURN, // 类型关键字 TOK_I64, TOK_F64, TOK_BOOL, TOK_VOID, // 字面量 TOK_INT_LIT, TOK_FLOAT_LIT, TOK_TRUE, TOK_FALSE, // 标识符 TOK_IDENT, // 运算符 TOK_PLUS, TOK_MINUS, TOK_STAR, TOK_SLASH, TOK_PERCENT, TOK_EQ, TOK_EQ_EQ, TOK_BANG_EQ, TOK_LT, TOK_GT, TOK_LT_EQ, TOK_GT_EQ, TOK_AND_AND, TOK_PIPE_PIPE, TOK_BANG, TOK_ARROW, // 分隔符 TOK_LPAREN, TOK_RPAREN, TOK_LBRACE, TOK_RBRACE, TOK_COMMA, TOK_COLON, TOK_SEMICOLON, TOK_ASSIGN, // 特殊 TOK_EOF, TOK_ERROR, } TokenKind; // === Token 结构体 === struct Token { TokenKind kind; const char* start; // 指向源码中 token 起始位置 int length; // token 文本长度 int line; int col; }; // === 工具函数 === const char* tok_name(TokenKind kind); bool tok_is_type(TokenKind kind); // 从 Token 提取值(必须在同一个 arena 中) int64_t tok_int_value(const Token* tok); double tok_float_value(const Token* tok); #endif ``` - [ ] **Step 2: 编写 `src/lexer/token.c`** ```c #include "token.h" #include #include #include #include static const char* NAMES[] = { [TOK_FN] = "fn", [TOK_LET] = "let", [TOK_IF] = "if", [TOK_ELSE] = "else", [TOK_WHILE] = "while", [TOK_RETURN] = "return", [TOK_I64] = "i64", [TOK_F64] = "f64", [TOK_BOOL] = "bool", [TOK_VOID] = "void", [TOK_INT_LIT] = "整数", [TOK_FLOAT_LIT] = "浮点数", [TOK_TRUE] = "true", [TOK_FALSE] = "false", [TOK_IDENT] = "标识符", [TOK_PLUS] = "+", [TOK_MINUS] = "-", [TOK_STAR] = "*", [TOK_SLASH] = "/", [TOK_PERCENT] = "%", [TOK_EQ] = "=", [TOK_EQ_EQ] = "==", [TOK_BANG_EQ] = "!=", [TOK_LT] = "<", [TOK_GT] = ">", [TOK_LT_EQ] = "<=", [TOK_GT_EQ] = ">=", [TOK_AND_AND] = "&&", [TOK_PIPE_PIPE] = "||", [TOK_BANG] = "!", [TOK_ARROW] = "->", [TOK_LPAREN] = "(", [TOK_RPAREN] = ")", [TOK_LBRACE] = "{", [TOK_RBRACE] = "}", [TOK_COMMA] = ",", [TOK_COLON] = ":", [TOK_SEMICOLON] = ";", [TOK_ASSIGN] = "=", [TOK_EOF] = "EOF", [TOK_ERROR] = "错误", }; const char* tok_name(TokenKind kind) { return NAMES[kind]; } bool tok_is_type(TokenKind kind) { return kind == TOK_I64 || kind == TOK_F64 || kind == TOK_BOOL || kind == TOK_VOID; } int64_t tok_int_value(const Token* tok) { char buf[32]; memcpy(buf, tok->start, tok->length); buf[tok->length] = '\0'; return strtoll(buf, NULL, 10); } double tok_float_value(const Token* tok) { char buf[64]; memcpy(buf, tok->start, tok->length); buf[tok->length] = '\0'; return strtod(buf, NULL); } ``` --- ### Task 3: 词法分析器 **Files:** - Create: `src/lexer/lexer.h`, `src/lexer/lexer.c` - [ ] **Step 1: 编写 `src/lexer/lexer.h`** ```c #ifndef LEXER_H #define LEXER_H #include "token.h" #include "arena.h" #include "error.h" // 返回 Token 数组(分配在 arena 中),*count 为数量。 // 如遇错误,error 被填充并返回 NULL。 Token* lex(Arena* a, const char* source, const char* filename, size_t* count, ErrorInfo* error); #endif ``` - [ ] **Step 2: 实现核心词法分析器 `src/lexer/lexer.c`** ```c #include "lexer.h" #include #include typedef struct { const char* src; const char* filename; int pos; int line; int col; } Lexer; static char peek(const Lexer* l) { return l->src[l->pos]; } static char peek_next(const Lexer* l) { return l->src[l->pos + 1]; } static void advance(Lexer* l) { if (l->src[l->pos] == '\n') { l->line++; l->col = 1; } else { l->col++; } l->pos++; } static void skip_whitespace(Lexer* l) { while (1) { char c = peek(l); if (c == ' ' || c == '\t' || c == '\r' || c == '\n') { advance(l); continue; } if (c == '/' && peek_next(l) == '/') { while (peek(l) != '\n' && peek(l) != '\0') advance(l); continue; } if (c == '/' && peek_next(l) == '*') { advance(l); advance(l); while (peek(l) != '\0' && !(peek(l) == '*' && peek_next(l) == '/')) advance(l); if (peek(l) != '\0') { advance(l); advance(l); } // skip */ continue; } break; } } static Token make_token(Lexer* l, TokenKind kind, int start_pos, int len) { return (Token){.kind = kind, .start = l->src + start_pos, .length = len, .line = l->line, .col = l->col}; } static Token lex_number(Lexer* l) { int start = l->pos; bool is_float = false; TokenKind kind = TOK_INT_LIT; while (isdigit(peek(l))) advance(l); if (peek(l) == '.') { is_float = true; kind = TOK_FLOAT_LIT; advance(l); while (isdigit(peek(l))) advance(l); } return make_token(l, kind, start, l->pos - start); } static TokenKind check_keyword(const Token* tok) { #define KW(s, k) if (tok->length == sizeof(s)-1 && memcmp(tok->start, s, sizeof(s)-1) == 0) return k KW("fn", TOK_FN); KW("let", TOK_LET); KW("if", TOK_IF); KW("else", TOK_ELSE); KW("while", TOK_WHILE); KW("return", TOK_RETURN); KW("i64", TOK_I64); KW("f64", TOK_F64); KW("bool", TOK_BOOL); KW("void", TOK_VOID); KW("true", TOK_TRUE); KW("false", TOK_FALSE); #undef KW return TOK_IDENT; } static Token lex_ident_or_keyword(Lexer* l) { int start = l->pos; while (isalnum(peek(l)) || peek(l) == '_') advance(l); Token t = make_token(l, TOK_IDENT, start, l->pos - start); t.kind = check_keyword(&t); return t; } Token* lex(Arena* a, const char* source, const char* filename, size_t* count, ErrorInfo* error) { Lexer l = {.src = source, .filename = filename, .pos = 0, .line = 1, .col = 1}; // 预估容量:源码长度的 1/3 size_t cap = strlen(source) / 3 + 16; Token* tokens = arena_alloc(a, cap * sizeof(Token)); size_t idx = 0; while (peek(&l) != '\0') { skip_whitespace(&l); if (peek(&l) == '\0') break; int line = l.line, col = l.col; char c = peek(&l); #define TOK(k) (tokens[idx++] = make_token(&l, k, l.pos, 1), advance(&l)) #define TOK2(k, len) (tokens[idx++] = make_token(&l, k, l.pos, len), advance(&l); advance(&l)) if (isdigit(c)) { tokens[idx++] = lex_number(&l); } else if (isalpha(c) || c == '_') { tokens[idx++] = lex_ident_or_keyword(&l); } else if (c == '+' && peek_next(&l) != '=') TOK(TOK_PLUS) else if (c == '-' && peek_next(&l) != '>') TOK(TOK_MINUS) else if (c == '-' && peek_next(&l) == '>') TOK2(TOK_ARROW, 2) else if (c == '*') TOK(TOK_STAR) else if (c == '/') TOK(TOK_SLASH) else if (c == '%') TOK(TOK_PERCENT) else if (c == '=' && peek_next(&l) == '=') TOK2(TOK_EQ_EQ, 2) else if (c == '=') TOK(TOK_ASSIGN) else if (c == '!' && peek_next(&l) == '=') TOK2(TOK_BANG_EQ, 2) else if (c == '!') TOK(TOK_BANG) else if (c == '<' && peek_next(&l) == '=') TOK2(TOK_LT_EQ, 2) else if (c == '<') TOK(TOK_LT) else if (c == '>' && peek_next(&l) == '=') TOK2(TOK_GT_EQ, 2) else if (c == '>') TOK(TOK_GT) else if (c == '&' && peek_next(&l) == '&') TOK2(TOK_AND_AND, 2) else if (c == '|' && peek_next(&l) == '|') TOK2(TOK_PIPE_PIPE, 2) else if (c == '(') TOK(TOK_LPAREN) else if (c == ')') TOK(TOK_RPAREN) else if (c == '{') TOK(TOK_LBRACE) else if (c == '}') TOK(TOK_RBRACE) else if (c == ',') TOK(TOK_COMMA) else if (c == ':') TOK(TOK_COLON) else if (c == ';') TOK(TOK_SEMICOLON) else { *error = (ErrorInfo){ .message = "无法识别的字符", .filename = filename, .line = line, .col = col }; return NULL; } #undef TOK #undef TOK2 } tokens[idx++] = make_token(&l, TOK_EOF, l.pos, 0); *count = idx; return tokens; } ``` - [ ] **Step 3: 编写词法分析测试 `test/test_lexer.c`** ```c #include "test_utils.h" #include "lexer.h" #include "arena.h" void test_simple_tokens() { Arena a = arena_create(1); const char* src = "fn main() { return 42; }"; size_t count; ErrorInfo error = {0}; Token* tokens = lex(&a, src, "test", &count, &error); ASSERT(tokens != NULL); ASSERT(count >= 8); ASSERT(tokens[0].kind == TOK_FN); ASSERT(tokens[1].kind == TOK_IDENT); ASSERT(tokens[2].kind == TOK_LPAREN); ASSERT(tokens[3].kind == TOK_RPAREN); ASSERT(tokens[4].kind == TOK_LBRACE); ASSERT(tokens[5].kind == TOK_RETURN); ASSERT(tokens[6].kind == TOK_INT_LIT); ASSERT(tok_int_value(&tokens[6]) == 42); arena_destroy(&a); } void test_keywords() { Arena a = arena_create(1); const char* src = "fn let if else while return i64 f64 bool void true false"; TokenKind expected[] = {TOK_FN, TOK_LET, TOK_IF, TOK_ELSE, TOK_WHILE, TOK_RETURN, TOK_I64, TOK_F64, TOK_BOOL, TOK_VOID, TOK_TRUE, TOK_FALSE, TOK_EOF}; size_t count; ErrorInfo error = {0}; Token* tokens = lex(&a, src, "test", &count, &error); ASSERT(tokens != NULL); for (int i = 0; i < 13; i++) ASSERT(tokens[i].kind == expected[i]); arena_destroy(&a); } void test_operators() { Arena a = arena_create(1); const char* src = "+ - * / % == != < > <= >= && || ! ->"; TokenKind expected[] = {TOK_PLUS, TOK_MINUS, TOK_STAR, TOK_SLASH, TOK_PERCENT, TOK_EQ_EQ, TOK_BANG_EQ, TOK_LT, TOK_GT, TOK_LT_EQ, TOK_GT_EQ, TOK_AND_AND, TOK_PIPE_PIPE, TOK_BANG, TOK_ARROW, TOK_EOF}; size_t count; ErrorInfo error = {0}; Token* tokens = lex(&a, src, "test", &count, &error); ASSERT(tokens != NULL); for (int i = 0; i < 16; i++) ASSERT(tokens[i].kind == expected[i]); arena_destroy(&a); } int main(void) { TEST_RUN(test_simple_tokens); TEST_RUN(test_keywords); TEST_RUN(test_operators); return test_summary(); } ``` - [ ] **Step 4: 编写测试工具宏 `test/test_utils.h`** ```c #ifndef TEST_UTILS_H #define TEST_UTILS_H #include #include static int _tests_run = 0; static int _tests_failed = 0; #define ASSERT(expr) do { \ _tests_run++; \ if (!(expr)) { \ fprintf(stderr, "\033[1;31mFAIL\033[0m %s:%d: %s\n", __FILE__, __LINE__, #expr); \ _tests_failed++; \ } \ } while(0) #define TEST_RUN(func) do { \ fprintf(stderr, " RUN %s\n", #func); \ func(); \ } while(0) static inline int test_summary(void) { fprintf(stderr, "\n%d tests, %d passed, %d failed\n", _tests_run, _tests_run - _tests_failed, _tests_failed); return _tests_failed > 0 ? 1 : 0; } #endif ``` - [ ] **Step 5: 构建并运行词法测试** ```bash cd "D:\Code\doing_exercises\programs\L Language\build" cmake .. -G "MinGW Makefiles" -DCMAKE_PREFIX_PATH="D:\settings\Language\LLVM" mingw32-make -j4 ./l_lang_test.exe ``` Expected: 3 个测试全部 PASS。 --- ### Task 4: AST 数据结构 **Files:** - Create: `src/ast/ast.h`, `src/ast/ast.c` - [ ] **Step 1: 编写 `src/ast/ast.h`** ```c #ifndef AST_H #define AST_H #include "l_lang.h" #include typedef enum { AST_PROGRAM, AST_FUNCTION, AST_PARAMETER, AST_BLOCK, AST_LET_STMT, AST_IF_STMT, AST_WHILE_STMT, AST_RETURN_STMT, AST_EXPR_STMT, AST_BINARY_EXPR, AST_UNARY_EXPR, AST_CALL_EXPR, AST_LITERAL_EXPR, AST_IDENT_EXPR, } AstKind; typedef enum { OP_ADD, OP_SUB, OP_MUL, OP_DIV, OP_MOD, OP_EQ, OP_NE, OP_LT, OP_GT, OP_LE, OP_GE, OP_AND, OP_OR, OP_NEG, OP_NOT, } BinaryOp; // 类型信息(语义分析阶段填充) typedef struct { TypeKind kind; } TypeInfo; // AST 节点 struct AstNode { AstKind kind; TypeInfo type; // 语义分析后填充,默认为 TYPE_UNKNOWN int line; // 源文件行号 int col; // 源文件列号 // 节点特有数据(按 kind 解释) union { // AST_PROGRAM struct { AstNode** functions; size_t fn_count; } program; // AST_FUNCTION struct { const char* name; AstNode** params; size_t param_count; TypeKind return_type; AstNode* body; } function; // AST_PARAMETER struct { const char* name; TypeKind type; } parameter; // AST_BLOCK struct { AstNode** stmts; size_t stmt_count; } block; // AST_LET_STMT struct { const char* name; AstNode* init; } let_stmt; // AST_IF_STMT struct { AstNode* cond; AstNode* then_block; AstNode* else_block; } if_stmt; // AST_WHILE_STMT struct { AstNode* cond; AstNode* body; } while_stmt; // AST_RETURN_STMT struct { AstNode* expr; } return_stmt; // AST_EXPR_STMT struct { AstNode* expr; } expr_stmt; // AST_BINARY_EXPR struct { BinaryOp op; AstNode* left; AstNode* right; } binary; // AST_UNARY_EXPR struct { BinaryOp op; AstNode* operand; } unary; // AST_CALL_EXPR struct { const char* name; AstNode** args; size_t arg_count; } call; // AST_LITERAL_EXPR struct { TypeKind lit_type; union { int64_t i64_val; double f64_val; bool bool_val; }; } literal; // AST_IDENT_EXPR struct { const char* name; } ident; } as; }; // 创建节点的辅助函数(内存来自 arena) AstNode* ast_make_program(void* alloc, AstNode** fns, size_t count, int line, int col); AstNode* ast_make_function(void* alloc, const char* name, AstNode** params, size_t pcount, TypeKind ret, AstNode* body, int line, int col); AstNode* ast_make_parameter(void* alloc, const char* name, TypeKind type, int line, int col); AstNode* ast_make_block(void* alloc, AstNode** stmts, size_t count, int line, int col); AstNode* ast_make_let(void* alloc, const char* name, AstNode* init, int line, int col); AstNode* ast_make_if(void* alloc, AstNode* cond, AstNode* then_b, AstNode* else_b, int line, int col); AstNode* ast_make_while(void* alloc, AstNode* cond, AstNode* body, int line, int col); AstNode* ast_make_return(void* alloc, AstNode* expr, int line, int col); AstNode* ast_make_expr_stmt(void* alloc, AstNode* expr, int line, int col); AstNode* ast_make_binary(void* alloc, BinaryOp op, AstNode* left, AstNode* right, int line, int col); AstNode* ast_make_unary(void* alloc, BinaryOp op, AstNode* operand, int line, int col); AstNode* ast_make_call(void* alloc, const char* name, AstNode** args, size_t count, int line, int col); AstNode* ast_make_literal_i64(void* alloc, int64_t val, int line, int col); AstNode* ast_make_literal_f64(void* alloc, double val, int line, int col); AstNode* ast_make_literal_bool(void* alloc, bool val, int line, int col); AstNode* ast_make_ident(void* alloc, const char* name, int line, int col); #endif ``` - [ ] **Step 2: 编写 `src/ast/ast.c`** ```c #include "ast.h" #include "arena.h" #include // === 跨模块分配器(供 parser.c、symbol.c 等复用)=== void* arena_alloc_impl(void* alloc, size_t sz) { return arena_alloc((Arena*)alloc, sz); } char* arena_strdup_impl(void* alloc, const char* src, size_t len) { char* dst = arena_alloc_impl(alloc, len + 1); memcpy(dst, src, len); dst[len] = '\0'; return dst; } // 使用宏简化节点创建 #define NEW(alloc, kind) AstNode* n = (AstNode*)arena_alloc_impl(alloc, sizeof(AstNode)); \ n->kind = (kind); n->type.kind = TYPE_UNKNOWN; \ n->line = line; n->col = col static void* arena_alloc_impl(void* alloc, size_t sz); // forward AstNode* ast_make_program(void* alloc, AstNode** fns, size_t count, int line, int col) { NEW(alloc, AST_PROGRAM); n->as.program.functions = fns; n->as.program.fn_count = count; return n; } AstNode* ast_make_function(void* alloc, const char* name, AstNode** params, size_t pcount, TypeKind ret, AstNode* body, int line, int col) { NEW(alloc, AST_FUNCTION); n->as.function.name = name; n->as.function.params = params; n->as.function.param_count = pcount; n->as.function.return_type = ret; n->as.function.body = body; return n; } AstNode* ast_make_parameter(void* alloc, const char* name, TypeKind type, int line, int col) { NEW(alloc, AST_PARAMETER); n->as.parameter.name = name; n->as.parameter.type = type; return n; } AstNode* ast_make_block(void* alloc, AstNode** stmts, size_t count, int line, int col) { NEW(alloc, AST_BLOCK); n->as.block.stmts = stmts; n->as.block.stmt_count = count; return n; } AstNode* ast_make_let(void* alloc, const char* name, AstNode* init, int line, int col) { NEW(alloc, AST_LET_STMT); n->as.let_stmt.name = name; n->as.let_stmt.init = init; return n; } AstNode* ast_make_if(void* alloc, AstNode* cond, AstNode* then_b, AstNode* else_b, int line, int col) { NEW(alloc, AST_IF_STMT); n->as.if_stmt.cond = cond; n->as.if_stmt.then_block = then_b; n->as.if_stmt.else_block = else_b; return n; } AstNode* ast_make_while(void* alloc, AstNode* cond, AstNode* body, int line, int col) { NEW(alloc, AST_WHILE_STMT); n->as.while_stmt.cond = cond; n->as.while_stmt.body = body; return n; } AstNode* ast_make_return(void* alloc, AstNode* expr, int line, int col) { NEW(alloc, AST_RETURN_STMT); n->as.return_stmt.expr = expr; return n; } AstNode* ast_make_expr_stmt(void* alloc, AstNode* expr, int line, int col) { NEW(alloc, AST_EXPR_STMT); n->as.expr_stmt.expr = expr; return n; } AstNode* ast_make_binary(void* alloc, BinaryOp op, AstNode* left, AstNode* right, int line, int col) { NEW(alloc, AST_BINARY_EXPR); n->as.binary.op = op; n->as.binary.left = left; n->as.binary.right = right; return n; } AstNode* ast_make_unary(void* alloc, BinaryOp op, AstNode* operand, int line, int col) { NEW(alloc, AST_UNARY_EXPR); n->as.unary.op = op; n->as.unary.operand = operand; return n; } AstNode* ast_make_call(void* alloc, const char* name, AstNode** args, size_t count, int line, int col) { NEW(alloc, AST_CALL_EXPR); n->as.call.name = name; n->as.call.args = args; n->as.call.arg_count = count; return n; } AstNode* ast_make_literal_i64(void* alloc, int64_t val, int line, int col) { NEW(alloc, AST_LITERAL_EXPR); n->as.literal.lit_type = TYPE_I64; n->as.literal.i64_val = val; n->type.kind = TYPE_I64; return n; } AstNode* ast_make_literal_f64(void* alloc, double val, int line, int col) { NEW(alloc, AST_LITERAL_EXPR); n->as.literal.lit_type = TYPE_F64; n->as.literal.f64_val = val; n->type.kind = TYPE_F64; return n; } AstNode* ast_make_literal_bool(void* alloc, bool val, int line, int col) { NEW(alloc, AST_LITERAL_EXPR); n->as.literal.lit_type = TYPE_BOOL; n->as.literal.bool_val = val; n->type.kind = TYPE_BOOL; return n; } AstNode* ast_make_ident(void* alloc, const char* name, int line, int col) { NEW(alloc, AST_IDENT_EXPR); n->as.ident.name = name; return n; } ``` - [ ] **Step 3: 验证编译** ```bash cd "D:\Code\doing_exercises\programs\L Language\build" mingw32-make -j4 ``` Expected: 编译通过(含 AST 模块)。 --- ### Task 5: 语法分析器 — 表达式(Pratt Parser) **Files:** - Create: `src/parser/parser.h`, `src/parser/parser.c` - [ ] **Step 1: 编写 `src/parser/parser.h`** ```c #ifndef PARSER_H #define PARSER_H #include "ast.h" #include "token.h" #include "error.h" // 解析 Token 数组,返回 Program 节点(内存来自 arena)。 // 出错时 error 被填充并返回 NULL。 AstNode* parse(Arena* a, const Token* tokens, size_t count, const char* filename, ErrorInfo* error); #endif ``` - [ ] **Step 2: 实现解析器 `src/parser/parser.c`(第一部分:Pratt 表达式解析)** ```c #include "parser.h" #include #include typedef struct { const Token* tokens; size_t count; size_t pos; const char* filename; Arena* arena; } Parser; // === 向前看 === static const Token* peek(const Parser* p) { return &p->tokens[p->pos]; } static const Token* peek_n(const Parser* p, int n) { return &p->tokens[p->pos + n]; } static const Token* advance(Parser* p) { return &p->tokens[p->pos++]; } static bool match(Parser* p, TokenKind k) { if (peek(p)->kind == k) { p->pos++; return true; } return false; } static const Token* expect(Parser* p, TokenKind k, ErrorInfo* e, const char* msg) { if (peek(p)->kind == k) return advance(p); e->message = msg; e->filename = p->filename; e->line = peek(p)->line; e->col = peek(p)->col; return NULL; } // === 运算符优先级定义 === typedef enum { PREC_NONE = 0, PREC_ASSIGN = 10, PREC_OR = 20, PREC_AND = 30, PREC_COMPARE = 40, // == != < > <= >= PREC_TERM = 50, // + - PREC_FACTOR = 60, // * / % PREC_UNARY = 70, // - ! PREC_CALL = 80, } Precedence; static Precedence tok_to_prec(TokenKind kind) { switch (kind) { case TOK_PIPE_PIPE: return PREC_OR; case TOK_AND_AND: return PREC_AND; case TOK_EQ_EQ: case TOK_BANG_EQ: case TOK_LT: case TOK_GT: case TOK_LT_EQ: case TOK_GT_EQ: return PREC_COMPARE; case TOK_PLUS: case TOK_MINUS: return PREC_TERM; case TOK_STAR: case TOK_SLASH: case TOK_PERCENT: return PREC_FACTOR; default: return PREC_NONE; } } static BinaryOp tok_to_binop(TokenKind kind) { switch (kind) { case TOK_PLUS: return OP_ADD; case TOK_MINUS: return OP_SUB; case TOK_STAR: return OP_MUL; case TOK_SLASH: return OP_DIV; case TOK_PERCENT: return OP_MOD; case TOK_EQ_EQ: return OP_EQ; case TOK_BANG_EQ: return OP_NE; case TOK_LT: return OP_LT; case TOK_GT: return OP_GT; case TOK_LT_EQ: return OP_LE; case TOK_GT_EQ: return OP_GE; case TOK_AND_AND: return OP_AND; case TOK_PIPE_PIPE: return OP_OR; default: return OP_ADD; } } // 向前声明 static AstNode* parse_expr(Parser* p, ErrorInfo* error); static AstNode* parse_expr_prec(Parser* p, Precedence prec, ErrorInfo* error); static AstNode* parse_block(Parser* p, ErrorInfo* error); // === 前缀解析:nil、前缀一元运算符、基本表达式 === static AstNode* parse_unary(Parser* p, ErrorInfo* error) { const Token* op = advance(p); AstNode* operand = parse_expr_prec(p, PREC_UNARY, error); if (!operand) return NULL; BinaryOp uop = (op->kind == TOK_MINUS) ? OP_NEG : OP_NOT; return ast_make_unary(p->arena, uop, operand, op->line, op->col); } static AstNode* parse_group(Parser* p, ErrorInfo* error) { advance(p); // 跳过 ( AstNode* expr = parse_expr(p, error); if (!expr) return NULL; if (!expect(p, TOK_RPAREN, error, "缺少 ')'")) return NULL; return expr; } static AstNode* parse_literal(Parser* p) { const Token* t = advance(p); switch (t->kind) { case TOK_INT_LIT: return ast_make_literal_i64(p->arena, tok_int_value(t), t->line, t->col); case TOK_FLOAT_LIT: return ast_make_literal_f64(p->arena, tok_float_value(t), t->line, t->col); case TOK_TRUE: return ast_make_literal_bool(p->arena, true, t->line, t->col); case TOK_FALSE: return ast_make_literal_bool(p->arena, false, t->line, t->col); default: return NULL; } } static AstNode* parse_ident_or_call(Parser* p, ErrorInfo* error) { const Token* name = advance(p); if (match(p, TOK_LPAREN)) { // 函数调用 AstNode* args[16]; int arg_count = 0; while (peek(p)->kind != TOK_RPAREN && !error->message) { args[arg_count] = parse_expr(p, error); arg_count++; if (peek(p)->kind == TOK_COMMA) advance(p); } if (!expect(p, TOK_RPAREN, error, "缺少 ')'")) return NULL; AstNode** arg_arr = arena_alloc_impl(p->arena, arg_count * sizeof(AstNode*)); memcpy(arg_arr, args, arg_count * sizeof(AstNode*)); return ast_make_call(p->arena, arena_strdup_impl(p->arena, name->start, name->length), arg_arr, arg_count, name->line, name->col); } return ast_make_ident(p->arena, arena_strdup_impl(p->arena, name->start, name->length), name->line, name->col); } // === Pratt 主循环 === static AstNode* parse_expr_prec(Parser* p, Precedence min_prec, ErrorInfo* error) { const Token* tok = peek(p); AstNode* left = NULL; // 前缀解析 if (tok->kind == TOK_MINUS || tok->kind == TOK_BANG) { left = parse_unary(p, error); } else if (tok->kind == TOK_LPAREN) { left = parse_group(p, error); } else if (tok->kind == TOK_INT_LIT || tok->kind == TOK_FLOAT_LIT || tok->kind == TOK_TRUE || tok->kind == TOK_FALSE) { left = parse_literal(p); } else if (tok->kind == TOK_IDENT) { left = parse_ident_or_call(p, error); } else { error->message = "无法识别的表达式"; error->filename = p->filename; error->line = tok->line; error->col = tok->col; return NULL; } if (!left) return NULL; // 中缀解析循环 while (!error->message) { TokenKind kind = peek(p)->kind; Precedence prec = tok_to_prec(kind); if (prec <= min_prec) break; const Token* op = advance(p); AstNode* right = parse_expr_prec(p, prec, error); if (!right) return NULL; left = ast_make_binary(p->arena, tok_to_binop(kind), left, right, op->line, op->col); } return left; } static AstNode* parse_expr(Parser* p, ErrorInfo* error) { return parse_expr_prec(p, PREC_NONE, error); } ``` - [ ] **Step 3: 解析器第二部分:语句解析** ```c // === 语句解析(续 parser.c)=== static bool is_type_token(TokenKind k) { return k == TOK_I64 || k == TOK_F64 || k == TOK_BOOL || k == TOK_VOID; } static TypeKind token_to_type(TokenKind k) { switch (k) { case TOK_I64: return TYPE_I64; case TOK_F64: return TYPE_F64; case TOK_BOOL: return TYPE_BOOL; default: return TYPE_VOID; } } static AstNode* parse_statement(Parser* p, ErrorInfo* error); static AstNode* parse_block(Parser* p, ErrorInfo* error) { const Token* open = peek(p); if (!expect(p, TOK_LBRACE, error, "缺少 '{'")) return NULL; AstNode* stmts[256]; int count = 0; while (peek(p)->kind != TOK_RBRACE && peek(p)->kind != TOK_EOF && !error->message) { // 块级表达式作为最后一条语句 if (peek(p)->kind == TOK_RBRACE) break; AstNode* s = parse_statement(p, error); if (!s) return NULL; stmts[count++] = s; } if (!expect(p, TOK_RBRACE, error, "缺少 '}'")) return NULL; AstNode** arr = arena_alloc_impl(p->arena, count * sizeof(AstNode*)); memcpy(arr, stmts, count * sizeof(AstNode*)); return ast_make_block(p->arena, arr, count, open->line, open->col); } static AstNode* parse_statement(Parser* p, ErrorInfo* error) { const Token* t = peek(p); if (t->kind == TOK_LET) { advance(p); const Token* name = expect(p, TOK_IDENT, error, "let 后应为变量名"); if (!name) return NULL; if (!expect(p, TOK_ASSIGN, error, "缺少 '='")) return NULL; AstNode* init = parse_expr(p, error); if (!init) return NULL; if (!expect(p, TOK_SEMICOLON, error, "缺少 ';'")) return NULL; return ast_make_let(p->arena, arena_strdup_impl(p->arena, name->start, name->length), init, t->line, t->col); } if (t->kind == TOK_IF) { advance(p); AstNode* cond = parse_expr(p, error); if (!cond) return NULL; AstNode* then_block = parse_block(p, error); if (!then_block) return NULL; AstNode* else_block = NULL; if (match(p, TOK_ELSE)) { if (peek(p)->kind == TOK_IF) { else_block = parse_statement(p, error); } else { else_block = parse_block(p, error); } if (!else_block) return NULL; } return ast_make_if(p->arena, cond, then_block, else_block, t->line, t->col); } if (t->kind == TOK_WHILE) { advance(p); AstNode* cond = parse_expr(p, error); if (!cond) return NULL; AstNode* body = parse_block(p, error); if (!body) return NULL; return ast_make_while(p->arena, cond, body, t->line, t->col); } if (t->kind == TOK_RETURN) { advance(p); // void return if (match(p, TOK_SEMICOLON)) { return ast_make_return(p->arena, NULL, t->line, t->col); } AstNode* expr = parse_expr(p, error); if (!expr) return NULL; if (!expect(p, TOK_SEMICOLON, error, "缺少 ';'")) return NULL; return ast_make_return(p->arena, expr, t->line, t->col); } // 表达式语句 if (peek(p)->kind == TOK_IDENT && peek_n(p, 1)->kind == TOK_LPAREN) { // 函数调用表达式语句 AstNode* expr = parse_expr(p, error); if (!expr) return NULL; if (!expect(p, TOK_SEMICOLON, error, "缺少 ';'")) return NULL; return ast_make_expr_stmt(p->arena, expr, t->line, t->col); } // 表达式语句(不常见的非函数调用表达式语句) AstNode* expr = parse_expr(p, error); if (!expr) return NULL; if (!expect(p, TOK_SEMICOLON, error, "缺少 ';'")) return NULL; return ast_make_expr_stmt(p->arena, expr, t->line, t->col); } // === 函数和程序解析 === static AstNode* parse_function(Parser* p, ErrorInfo* error) { const Token* fn_tok = advance(p); // fn const Token* name = expect(p, TOK_IDENT, error, "fn 后应为函数名"); if (!name) return NULL; if (!expect(p, TOK_LPAREN, error, "缺少 '('")) return NULL; // 参数列表 AstNode* params[64]; int pcount = 0; while (peek(p)->kind != TOK_RPAREN && !error->message) { const Token* pname = expect(p, TOK_IDENT, error, "参数名"); if (!pname) return NULL; if (!expect(p, TOK_COLON, error, "缺少 ':'")) return NULL; const Token* ptype = advance(p); if (!is_type_token(ptype->kind)) { error->message = "无效的参数类型"; error->filename = p->filename; error->line = ptype->line; error->col = ptype->col; return NULL; } params[pcount++] = ast_make_parameter(p->arena, arena_strdup_impl(p->arena, pname->start, pname->length), token_to_type(ptype->kind), pname->line, pname->col); if (match(p, TOK_COMMA)) continue; } if (!expect(p, TOK_RPAREN, error, "缺少 ')'")) return NULL; // 返回类型 TypeKind ret = TYPE_VOID; if (match(p, TOK_ARROW)) { const Token* rt = advance(p); if (!is_type_token(rt->kind)) { error->message = "无效的返回类型"; error->filename = p->filename; error->line = rt->line; error->col = rt->col; return NULL; } ret = token_to_type(rt->kind); } AstNode* body = parse_block(p, error); if (!body) return NULL; AstNode** parr = arena_alloc_impl(p->arena, pcount * sizeof(AstNode*)); memcpy(parr, params, pcount * sizeof(AstNode*)); return ast_make_function(p->arena, arena_strdup_impl(p->arena, name->start, name->length), parr, pcount, ret, body, fn_tok->line, fn_tok->col); } AstNode* parse(Arena* a, const Token* tokens, size_t count, const char* filename, ErrorInfo* error) { Parser p = {.tokens = tokens, .count = count, .pos = 0, .filename = filename, .arena = a}; AstNode* functions[256]; int fn_count = 0; while (peek(&p)->kind != TOK_EOF && !error->message) { functions[fn_count++] = parse_function(&p, error); } if (error->message) return NULL; AstNode** arr = arena_alloc_impl(a, fn_count * sizeof(AstNode*)); memcpy(arr, functions, fn_count * sizeof(AstNode*)); return ast_make_program(a, arr, fn_count, 0, 0); } ``` - [ ] **Step 4: 编写解析器测试 `test/test_parser.c`** ```c #include "test_utils.h" #include "parser.h" #include "lexer.h" #include "arena.h" static AstNode* parse_string(const char* src, ErrorInfo* error) { Arena a = arena_create(1); size_t tcount; Token* tokens = lex(&a, src, "test", &tcount, error); if (!tokens) { arena_destroy(&a); return NULL; } AstNode* ast = parse(&a, tokens, tcount, "test", error); // Note: arena 必须保持存活直到 AST 不再需要 return ast; } void test_simple_function() { ErrorInfo error = {0}; AstNode* ast = parse_string("fn main() { return 42; }", &error); ASSERT(ast != NULL); ASSERT(ast->kind == AST_PROGRAM); ASSERT(ast->as.program.fn_count == 1); AstNode* fn = ast->as.program.functions[0]; ASSERT(fn->kind == AST_FUNCTION); } void test_arithmetic_expr() { ErrorInfo error = {0}; AstNode* ast = parse_string("fn main() { return 1 + 2 * 3; }", &error); ASSERT(ast != NULL); AstNode* body = ast->as.program.functions[0]->as.function.body; AstNode* ret = body->as.block.stmts[0]; ASSERT(ret->kind == AST_RETURN_STMT); AstNode* expr = ret->as.return_stmt.expr; ASSERT(expr->kind == AST_BINARY_EXPR); ASSERT(expr->as.binary.op == OP_ADD); } void test_if_statement() { ErrorInfo error = {0}; AstNode* ast = parse_string("fn main() { if true { return 1; } else { return 0; } }", &error); ASSERT(ast != NULL); } int main(void) { TEST_RUN(test_simple_function); TEST_RUN(test_arithmetic_expr); TEST_RUN(test_if_statement); return test_summary(); } ``` - [ ] **Step 5: 构建并运行解析测试** ```bash cd "D:\Code\doing_exercises\programs\L Language\build" mingw32-make -j4 ./l_lang_test.exe ``` Expected: 词法 3 个 + 解析 3 个 = 6 个测试 PASS。 --- ### Task 6: 语义分析 — 符号表 **Files:** - Create: `src/sema/symbol.h`, `src/sema/symbol.c` - [ ] **Step 1: 编写 `src/sema/symbol.h`** ```c #ifndef SYMBOL_H #define SYMBOL_H #include "l_lang.h" #include "ast.h" typedef enum { SYM_VARIABLE, SYM_PARAMETER, SYM_FUNCTION } SymbolKind; typedef struct Symbol { const char* name; SymbolKind kind; TypeKind type; // 变量/参数的类型 // 函数特有 TypeKind return_type; TypeKind* param_types; size_t param_count; // 链表(同一作用域内的下一个符号) struct Symbol* next; } Symbol; typedef struct Scope { Symbol* head; // 符号链表头 struct Scope* parent; // 上级作用域 } Scope; // 创建新作用域(子作用域) Scope* scope_new(void* alloc, Scope* parent); // 在当前作用域及其父作用域中查找符号 Symbol* scope_lookup(const Scope* scope, const char* name); // 在当前作用域中插入符号(重复插入返回 NULL) Symbol* scope_insert(Scope* scope, void* alloc, const char* name, SymbolKind kind, TypeKind type); // 插入函数符号 Symbol* scope_insert_function(Scope* scope, void* alloc, const char* name, TypeKind ret, TypeKind* pt, size_t pc); #endif ``` - [ ] **Step 2: 实现 `src/sema/symbol.c`** ```c #include "symbol.h" #include Scope* scope_new(void* alloc, Scope* parent) { Scope* s = (Scope*)arena_alloc_impl(alloc, sizeof(Scope)); s->head = NULL; s->parent = parent; return s; } Symbol* scope_lookup(const Scope* scope, const char* name) { for (const Scope* s = scope; s; s = s->parent) { for (Symbol* sym = s->head; sym; sym = sym->next) { if (strcmp(sym->name, name) == 0) return sym; } } return NULL; } Symbol* scope_insert(Scope* scope, void* alloc, const char* name, SymbolKind kind, TypeKind type) { if (scope_lookup(scope, name)) return NULL; // 检查当前 scope 链 Symbol* sym = (Symbol*)arena_alloc_impl(alloc, sizeof(Symbol)); sym->name = name; sym->kind = kind; sym->type = type; sym->next = scope->head; scope->head = sym; return sym; } Symbol* scope_insert_function(Scope* scope, void* alloc, const char* name, TypeKind ret, TypeKind* pt, size_t pc) { if (scope_lookup(scope, name)) return NULL; Symbol* sym = (Symbol*)arena_alloc_impl(alloc, sizeof(Symbol)); sym->name = name; sym->kind = SYM_FUNCTION; sym->type = TYPE_VOID; sym->return_type = ret; sym->param_types = pt; sym->param_count = pc; sym->next = scope->head; scope->head = sym; return sym; } ``` --- ### Task 7: 语义分析 — 类型推断和检查 **Files:** - Create: `src/sema/sema.h`, `src/sema/sema.c` - [ ] **Step 1: 编写 `src/sema/sema.h`** ```c #ifndef SEMA_H #define SEMA_H #include "ast.h" #include "error.h" #include "symbol.h" // 对 AST 进行语义分析(类型推断 + 类型检查) // 为每个节点填充 type 字段,错误收集到 errors 列表中。 // 参数 arena 用于作用域分配。 void sema_analyze(AstNode* ast, ErrorList* errors, Arena* arena); #endif ``` - [ ] **Step 2: 实现语义分析 `src/sema/sema.c`** ```c #include "sema.h" #include // === 类型关系 === static TypeKind promote(TypeKind a, TypeKind b) { if (a == TYPE_F64 || b == TYPE_F64) return TYPE_F64; if (a == TYPE_I64 || b == TYPE_I64) return TYPE_I64; if (a == TYPE_BOOL || b == TYPE_BOOL) return TYPE_BOOL; return TYPE_ERROR; } static bool is_numeric(TypeKind t) { return t == TYPE_I64 || t == TYPE_F64; } static bool is_comparable(TypeKind a, TypeKind b) { return a == b; } // === 向前声明 === static void analyze_node(AstNode* node, Scope* scope, ErrorList* errors, Arena* a); // === 检查单个节点 === static void analyze_expr(AstNode* node, Scope* scope, ErrorList* errors, Arena* a) { switch (node->kind) { case AST_LITERAL_EXPR: // 类型已在创建时设置,无需处理 break; case AST_IDENT_EXPR: { Symbol* sym = scope_lookup(scope, node->as.ident.name); if (!sym) { error_add(errors, "", node->line, node->col, "未定义的变量 '%s'", node->as.ident.name); node->type.kind = TYPE_ERROR; } else { node->type.kind = sym->type; } break; } case AST_UNARY_EXPR: { analyze_expr(node->as.unary.operand, scope, errors, a); TypeKind inner = node->as.unary.operand->type.kind; if (node->as.unary.op == OP_NEG && !is_numeric(inner)) { error_add(errors, "", node->line, node->col, "一元 '-' 只能用于数值类型"); node->type.kind = TYPE_ERROR; } else if (node->as.unary.op == OP_NOT && inner != TYPE_BOOL) { error_add(errors, "", node->line, node->col, "'!' 只能用于布尔类型"); node->type.kind = TYPE_ERROR; } else { node->type.kind = inner; } break; } case AST_BINARY_EXPR: { analyze_expr(node->as.binary.left, scope, errors, a); analyze_expr(node->as.binary.right, scope, errors, a); TypeKind l = node->as.binary.left->type.kind; TypeKind r = node->as.binary.right->type.kind; switch (node->as.binary.op) { case OP_ADD: case OP_SUB: case OP_MUL: case OP_DIV: case OP_MOD: if (!is_numeric(l) || !is_numeric(r)) { error_add(errors, "", node->line, node->col, "算术运算需要数值类型"); node->type.kind = TYPE_ERROR; } else { node->type.kind = promote(l, r); } break; case OP_EQ: case OP_NE: case OP_LT: case OP_GT: case OP_LE: case OP_GE: if (!is_comparable(l, r)) { error_add(errors, "", node->line, node->col, "类型 '%s' 和 '%s' 无法比较", type_name(l), type_name(r)); node->type.kind = TYPE_ERROR; } else { node->type.kind = TYPE_BOOL; } break; case OP_AND: case OP_OR: if (l != TYPE_BOOL || r != TYPE_BOOL) { error_add(errors, "", node->line, node->col, "逻辑运算需要布尔类型"); node->type.kind = TYPE_ERROR; } else { node->type.kind = TYPE_BOOL; } break; default: break; } break; } case AST_CALL_EXPR: { Symbol* sym = scope_lookup(scope, node->as.call.name); if (!sym || sym->kind != SYM_FUNCTION) { error_add(errors, "", node->line, node->col, "未定义的函数 '%s'", node->as.call.name); node->type.kind = TYPE_ERROR; break; } if (node->as.call.arg_count != sym->param_count) { error_add(errors, "", node->line, node->col, "函数 '%s' 需要 %zu 个参数,但提供了 %zu 个", node->as.call.name, sym->param_count, node->as.call.arg_count); node->type.kind = TYPE_ERROR; break; } for (size_t i = 0; i < node->as.call.arg_count; i++) { analyze_expr(node->as.call.args[i], scope, errors, a); if (node->as.call.args[i]->type.kind != sym->param_types[i]) { error_add(errors, "", node->line, node->col, "参数 %zu 类型不匹配: 期望 '%s',得到 '%s'", i + 1, type_name(sym->param_types[i]), type_name(node->as.call.args[i]->type.kind)); } } node->type.kind = sym->return_type; break; } default: break; } } static void analyze_node(AstNode* node, Scope* scope, ErrorList* errors, Arena* a) { if (!node) return; switch (node->kind) { case AST_PROGRAM: // 第一遍:收集所有函数签名 for (size_t i = 0; i < node->as.program.fn_count; i++) { AstNode* fn = node->as.program.functions[i]; TypeKind* pts = (TypeKind*)arena_alloc_impl(a, fn->as.function.param_count * sizeof(TypeKind)); for (size_t j = 0; j < fn->as.function.param_count; j++) { pts[j] = fn->as.function.params[j]->as.parameter.type; } scope_insert_function(scope, a, fn->as.function.name, fn->as.function.return_type, pts, fn->as.function.param_count); } // 第二遍:分析每个函数体 for (size_t i = 0; i < node->as.program.fn_count; i++) { analyze_node(node->as.program.functions[i], scope, errors, a); } break; case AST_FUNCTION: { Scope* fn_scope = scope_new(a, scope); // 注册参数 for (size_t i = 0; i < node->as.function.param_count; i++) { AstNode* p = node->as.function.params[i]; scope_insert(fn_scope, a, p->as.parameter.name, SYM_PARAMETER, p->as.parameter.type); } analyze_node(node->as.function.body, fn_scope, errors, a); break; } case AST_BLOCK: for (size_t i = 0; i < node->as.block.stmt_count; i++) { analyze_node(node->as.block.stmts[i], scope, errors, a); } break; case AST_LET_STMT: { analyze_expr(node->as.let_stmt.init, scope, errors, a); TypeKind inferred = node->as.let_stmt.init->type.kind; node->type.kind = inferred; if (!scope_insert(scope, a, node->as.let_stmt.name, SYM_VARIABLE, inferred)) { error_add(errors, "", node->line, node->col, "变量 '%s' 重复定义", node->as.let_stmt.name); } break; } case AST_IF_STMT: analyze_expr(node->as.if_stmt.cond, scope, errors, a); if (node->as.if_stmt.cond->type.kind != TYPE_BOOL) { error_add(errors, "", node->line, node->col, "if 条件必须是布尔类型"); } analyze_node(node->as.if_stmt.then_block, scope, errors, a); if (node->as.if_stmt.else_block) { analyze_node(node->as.if_stmt.else_block, scope, errors, a); } break; case AST_WHILE_STMT: analyze_expr(node->as.while_stmt.cond, scope, errors, a); if (node->as.while_stmt.cond->type.kind != TYPE_BOOL) { error_add(errors, "", node->line, node->col, "while 条件必须是布尔类型"); } analyze_node(node->as.while_stmt.body, scope, errors, a); break; case AST_RETURN_STMT: if (node->as.return_stmt.expr) { analyze_expr(node->as.return_stmt.expr, scope, errors, a); node->type.kind = node->as.return_stmt.expr->type.kind; } break; case AST_EXPR_STMT: analyze_expr(node->as.expr_stmt.expr, scope, errors, a); break; default: analyze_expr(node, scope, errors, a); break; } } void sema_analyze(AstNode* ast, ErrorList* errors, Arena* arena) { Scope* global = scope_new(arena, NULL); // 注册内置函数 TypeKind params_i64[] = {TYPE_I64}; scope_insert_function(global, arena, "print_i64", TYPE_VOID, params_i64, 1); TypeKind params_f64[] = {TYPE_F64}; scope_insert_function(global, arena, "print_f64", TYPE_VOID, params_f64, 1); TypeKind params_bool[] = {TYPE_BOOL}; scope_insert_function(global, arena, "print_bool", TYPE_VOID, params_bool, 1); analyze_node(ast, global, errors, arena); } ``` - [ ] **Step 3: 编写语义分析测试 `test/test_sema.c`** ```c #include "test_utils.h" #include "parser.h" #include "lexer.h" #include "sema.h" #include "arena.h" void test_type_error() { Arena a = arena_create(1); size_t tc; ErrorInfo lex_err = {0}; Token* toks = lex(&a, "fn main() { let x = 1; let y = x + true; return; }", "test", &tc, &lex_err); ASSERT(toks != NULL); ErrorInfo parse_err = {0}; AstNode* ast = parse(&a, toks, tc, "test", &parse_err); ASSERT(ast != NULL); ErrorList errors; error_init(&errors); sema_analyze(ast, &errors, &a); ASSERT(errors.count > 0); arena_destroy(&a); } void test_undefined_var() { Arena a = arena_create(1); size_t tc; ErrorInfo lex_err = {0}; Token* toks = lex(&a, "fn main() { let x = y; return; }", "test", &tc, &lex_err); ASSERT(toks != NULL); ErrorInfo parse_err = {0}; AstNode* ast = parse(&a, toks, tc, "test", &parse_err); ASSERT(ast != NULL); ErrorList errors; error_init(&errors); sema_analyze(ast, &errors, &a); ASSERT(errors.count > 0); arena_destroy(&a); } int main(void) { TEST_RUN(test_type_error); TEST_RUN(test_undefined_var); return test_summary(); } ``` - [ ] **Step 4: 构建并运行语义测试** ```bash cd "D:\Code\doing_exercises\programs\L Language\build" mingw32-make -j4 ./l_lang_test.exe ``` Expected: 词法 3 + 解析 3 + 语义 2 = 8 个测试 PASS。 --- ### Task 8: LLVM 代码生成 — 基础设施和表达式 **Files:** - Create: `src/codegen/codegen.h`, `src/codegen/codegen.c` - [ ] **Step 1: 编写 `src/codegen/codegen.h`** ```c #ifndef CODEGEN_H #define CODEGEN_H #include "ast.h" // 生成 LLVM Module。模块已 verify,可直接 dump 或写入文件。 // 出错时返回 NULL 并设置 *error_msg。 LLVMModuleRef codegen_module(AstNode* ast, const char* module_name, const char** error_msg); #endif ``` - [ ] **Step 2: 实现代码生成 `src/codegen/codegen.c`** ```c #include "codegen.h" #include #include #include #include #include // === 内部状态 === typedef struct { LLVMModuleRef module; LLVMBuilderRef builder; // 符号表:变量名 → alloca 地址 struct VarEntry { const char* name; LLVMValueRef alloca; struct VarEntry* next; } *var_table; const char* error; // 内置函数声明 LLVMValueRef fn_print_i64; LLVMValueRef fn_print_f64; LLVMValueRef fn_print_bool; // 已声明的所有 L 函数(名称→LLVMValue) struct FnEntry { const char* name; LLVMValueRef fn; TypeKind ret; TypeKind* params; size_t pc; struct FnEntry* next; } *fn_table; } CgCtx; // === 类型映射 === static LLVMTypeRef to_llvm_type(TypeKind kind) { switch (kind) { case TYPE_I64: return LLVMInt64Type(); case TYPE_F64: return LLVMDoubleType(); case TYPE_BOOL: return LLVMInt1Type(); default: return LLVMVoidType(); } } static LLVMValueRef to_llvm_const(LLVMTypeRef ty, AstNode* lit) { switch (lit->as.literal.lit_type) { case TYPE_I64: return LLVMConstInt(ty, (unsigned long long)lit->as.literal.i64_val, true); case TYPE_F64: return LLVMConstReal(ty, lit->as.literal.f64_val); case TYPE_BOOL: return LLVMConstInt(ty, lit->as.literal.bool_val ? 1 : 0, false); default: return NULL; } } // === 变量表 === static LLVMValueRef find_var(CgCtx* ctx, const char* name) { for (struct VarEntry* e = ctx->var_table; e; e = e->next) if (strcmp(e->name, name) == 0) return e->alloca; return NULL; } static void add_var(CgCtx* ctx, const char* name, LLVMValueRef alloca) { struct VarEntry* e = malloc(sizeof(*e)); e->name = name; e->alloca = alloca; e->next = ctx->var_table; ctx->var_table = e; } // === 函数表 === static LLVMValueRef find_fn(CgCtx* ctx, const char* name) { for (struct FnEntry* e = ctx->fn_table; e; e = e->next) if (strcmp(e->name, name) == 0) return e->fn; return NULL; } // === 向前声明 === static LLVMValueRef codegen_expr(CgCtx* ctx, AstNode* node); static void codegen_stmt(CgCtx* ctx, AstNode* node); // === 注册内置函数 === static void register_builtins(CgCtx* ctx) { // print_i64 LLVMTypeRef pi64_args[] = {LLVMInt64Type()}; LLVMTypeRef pi64_ty = LLVMFunctionType(LLVMVoidType(), pi64_args, 1, false); LLVMValueRef pi64 = LLVMAddFunction(ctx->module, "__builtin_print_i64", pi64_ty); // print_f64 LLVMTypeRef pf64_args[] = {LLVMDoubleType()}; LLVMTypeRef pf64_ty = LLVMFunctionType(LLVMVoidType(), pf64_args, 1, false); LLVMValueRef pf64 = LLVMAddFunction(ctx->module, "__builtin_print_f64", pf64_ty); // print_bool LLVMTypeRef pb_args[] = {LLVMInt1Type()}; LLVMTypeRef pb_ty = LLVMFunctionType(LLVMVoidType(), pb_args, 1, false); LLVMValueRef pb = LLVMAddFunction(ctx->module, "__builtin_print_bool", pb_ty); ctx->fn_print_i64 = pi64; ctx->fn_print_f64 = pf64; ctx->fn_print_bool = pb; } // === 表达式代码生成 === static LLVMValueRef codegen_expr(CgCtx* ctx, AstNode* node) { switch (node->kind) { case AST_LITERAL_EXPR: return to_llvm_const(to_llvm_type(node->type.kind), node); case AST_IDENT_EXPR: { LLVMValueRef ptr = find_var(ctx, node->as.ident.name); return LLVMBuildLoad2(ctx->builder, to_llvm_type(node->type.kind), ptr, "load"); } case AST_UNARY_EXPR: { LLVMValueRef operand = codegen_expr(ctx, node->as.unary.operand); if (node->as.unary.op == OP_NEG) { if (node->type.kind == TYPE_F64) return LLVMBuildFNeg(ctx->builder, operand, "fneg"); else return LLVMBuildNeg(ctx->builder, operand, "ineg"); } else { return LLVMBuildNot(ctx->builder, operand, "not"); } } case AST_BINARY_EXPR: { LLVMValueRef l = codegen_expr(ctx, node->as.binary.left); LLVMValueRef r = codegen_expr(ctx, node->as.binary.right); bool is_float = node->type.kind == TYPE_F64; #define B(op_name, iop, fop) \ if (is_float) return LLVMBuild##fop(ctx->builder, l, r, op_name); \ else return LLVMBuild##iop(ctx->builder, l, r, op_name) switch (node->as.binary.op) { case OP_ADD: B("add", Add, FAdd); case OP_SUB: B("sub", Sub, FSub); case OP_MUL: B("mul", Mul, FMul); case OP_DIV: if (is_float) return LLVMBuildFDiv(ctx->builder, l, r, "fdiv"); else return LLVMBuildSDiv(ctx->builder, l, r, "sdiv"); case OP_MOD: return LLVMBuildSRem(ctx->builder, l, r, "srem"); case OP_EQ: B("eq", ICmp, FCmp); case OP_NE: B("ne", ICmp, FCmp); case OP_LT: B("lt", ICmp, FCmp); case OP_GT: B("gt", ICmp, FCmp); case OP_LE: B("le", ICmp, FCmp); case OP_GE: B("ge", ICmp, FCmp); case OP_AND: return LLVMBuildAnd(ctx->builder, l, r, "and"); case OP_OR: return LLVMBuildOr(ctx->builder, l, r, "or"); default: return NULL; } #undef B } case AST_CALL_EXPR: { LLVMValueRef fn = find_fn(ctx, node->as.call.name); LLVMValueRef args[32]; for (size_t i = 0; i < node->as.call.arg_count; i++) { args[i] = codegen_expr(ctx, node->as.call.args[i]); } return LLVMBuildCall2(ctx->builder, LLVMGetReturnType(LLVMTypeOf(fn)) == LLVMVoidType() ? LLVMVoidType() : LLVMGlobalGetValueType(fn), fn, args, (unsigned)node->as.call.arg_count, "call"); } default: return NULL; } } // === 语句代码生成 === static void codegen_stmt(CgCtx* ctx, AstNode* node) { switch (node->kind) { case AST_LET_STMT: { LLVMValueRef init_val = codegen_expr(ctx, node->as.let_stmt.init); LLVMValueRef alloca = LLVMBuildAlloca(ctx->builder, to_llvm_type(node->type.kind), node->as.let_stmt.name); LLVMBuildStore(ctx->builder, init_val, alloca); add_var(ctx, node->as.let_stmt.name, alloca); break; } case AST_EXPR_STMT: codegen_expr(ctx, node->as.expr_stmt.expr); break; case AST_RETURN_STMT: if (node->as.return_stmt.expr) { LLVMValueRef val = codegen_expr(ctx, node->as.return_stmt.expr); LLVMBuildRet(ctx->builder, val); } else { LLVMBuildRetVoid(ctx->builder); } break; case AST_BLOCK: for (size_t i = 0; i < node->as.block.stmt_count; i++) { codegen_stmt(ctx, node->as.block.stmts[i]); } break; case AST_IF_STMT: { LLVMValueRef cond = codegen_expr(ctx, node->as.if_stmt.cond); LLVMBasicBlockRef then_bb = LLVMAppendBasicBlock( LLVMGetBasicBlockParent(LLVMGetInsertBlock(ctx->builder)), "then"); LLVMBasicBlockRef else_bb = node->as.if_stmt.else_block ? LLVMAppendBasicBlock(LLVMGetBasicBlockParent(LLVMGetInsertBlock(ctx->builder)), "else") : NULL; LLVMBasicBlockRef merge_bb = LLVMAppendBasicBlock( LLVMGetBasicBlockParent(LLVMGetInsertBlock(ctx->builder)), "if_merge"); if (else_bb) LLVMBuildCondBr(ctx->builder, cond, then_bb, else_bb); else LLVMBuildCondBr(ctx->builder, cond, then_bb, merge_bb); LLVMPositionBuilderAtEnd(ctx->builder, then_bb); codegen_stmt(ctx, node->as.if_stmt.then_block); LLVMBuildBr(ctx->builder, merge_bb); if (else_bb) { LLVMPositionBuilderAtEnd(ctx->builder, else_bb); codegen_stmt(ctx, node->as.if_stmt.else_block); LLVMBuildBr(ctx->builder, merge_bb); } LLVMPositionBuilderAtEnd(ctx->builder, merge_bb); break; } case AST_WHILE_STMT: { LLVMBasicBlockRef cond_bb = LLVMAppendBasicBlock( LLVMGetBasicBlockParent(LLVMGetInsertBlock(ctx->builder)), "while_cond"); LLVMBasicBlockRef body_bb = LLVMAppendBasicBlock( LLVMGetBasicBlockParent(LLVMGetInsertBlock(ctx->builder)), "while_body"); LLVMBasicBlockRef exit_bb = LLVMAppendBasicBlock( LLVMGetBasicBlockParent(LLVMGetInsertBlock(ctx->builder)), "while_exit"); LLVMBuildBr(ctx->builder, cond_bb); LLVMPositionBuilderAtEnd(ctx->builder, cond_bb); LLVMValueRef cond = codegen_expr(ctx, node->as.while_stmt.cond); LLVMBuildCondBr(ctx->builder, cond, body_bb, exit_bb); LLVMPositionBuilderAtEnd(ctx->builder, body_bb); codegen_stmt(ctx, node->as.while_stmt.body); LLVMBuildBr(ctx->builder, cond_bb); LLVMPositionBuilderAtEnd(ctx->builder, exit_bb); break; } default: break; } } // === 程序级代码生成 === LLVMModuleRef codegen_module(AstNode* ast, const char* name, const char** error_msg) { CgCtx ctx = {0}; ctx.module = LLVMModuleCreateWithName(name); ctx.builder = LLVMCreateBuilder(); register_builtins(&ctx); // 第一遍:声明所有 L 函数 for (size_t i = 0; i < ast->as.program.fn_count; i++) { AstNode* fn = ast->as.program.functions[i]; LLVMTypeRef* ptypes = malloc(fn->as.function.param_count * sizeof(LLVMTypeRef)); for (size_t j = 0; j < fn->as.function.param_count; j++) ptypes[j] = to_llvm_type(fn->as.function.params[j]->as.parameter.type); LLVMTypeRef fty = LLVMFunctionType(to_llvm_type(fn->as.function.return_type), ptypes, (unsigned)fn->as.function.param_count, false); LLVMValueRef lfn = LLVMAddFunction(ctx.module, fn->as.function.name, fty); struct FnEntry* entry = malloc(sizeof(*entry)); entry->name = fn->as.function.name; entry->fn = lfn; entry->ret = fn->as.function.return_type; entry->next = ctx.fn_table; ctx.fn_table = entry; free(ptypes); } // 第二遍:生成各函数体 for (size_t i = 0; i < ast->as.program.fn_count; i++) { AstNode* fn = ast->as.program.functions[i]; LLVMValueRef lfn = find_fn(&ctx, fn->as.function.name); LLVMBasicBlockRef entry = LLVMAppendBasicBlock(lfn, "entry"); LLVMPositionBuilderAtEnd(ctx.builder, entry); // 清空变量表 ctx.var_table = NULL; // 注册参数变量 for (size_t j = 0; j < fn->as.function.param_count; j++) { LLVMValueRef param = LLVMGetParam(lfn, (unsigned)j); LLVMValueRef alloca = LLVMBuildAlloca(ctx.builder, to_llvm_type(fn->as.function.params[j]->as.parameter.type), fn->as.function.params[j]->as.parameter.name); LLVMBuildStore(ctx.builder, param, alloca); add_var(&ctx, fn->as.function.params[j]->as.parameter.name, alloca); } codegen_stmt(&ctx, fn->as.function.body); } // 验证模块 char* verify_err = NULL; LLVMVerifyModule(ctx.module, LLVMPrintMessageAction, &verify_err); if (verify_err) { *error_msg = verify_err; LLVMDisposeBuilder(ctx.builder); LLVMDisposeModule(ctx.module); return NULL; } LLVMDisposeBuilder(ctx.builder); return ctx.module; } ``` - [ ] **Step 3: 更新 CMakeLists.txt 以链接 LLVM 所有必要库** - [ ] **Step 4: 更新 CMakeLists.txt 以链接 LLVM 所有必要库** ```cmake # 替换原先的 LLVM 链接为: llvm_map_components_to_libnames(LLVM_LIBS core analysis native) target_link_libraries(l_lang ${LLVM_LIBS}) target_link_libraries(l_lang_test ${LLVM_LIBS}) ``` - [ ] **Step 5: 构建验证** ```bash cd "D:\Code\doing_exercises\programs\L Language\build" cmake .. -G "MinGW Makefiles" -DCMAKE_PREFIX_PATH="D:\settings\Language\LLVM" mingw32-make -j4 ``` Expected: 编译通过。 --- ### Task 9: 驱动程序 — 入口和命令行 **Files:** - Create: `src/driver/main.c` - [ ] **Step 1: 编写 `src/driver/main.c`** ```c #include "l_lang.h" #include "lexer.h" #include "parser.h" #include "sema.h" #include "codegen.h" #include "error.h" #include "arena.h" #include #include #include #include #include #include // 读取整个文件到内存 static char* read_file(const char* path, size_t* size) { FILE* f = fopen(path, "rb"); if (!f) { fprintf(stderr, "无法打开文件: %s\n", path); return NULL; } fseek(f, 0, SEEK_END); *size = ftell(f); fseek(f, 0, SEEK_SET); char* buf = malloc(*size + 1); fread(buf, 1, *size, f); buf[*size] = '\0'; fclose(f); return buf; } // 写入字符串到文件 static bool write_file(const char* path, const char* data) { FILE* f = fopen(path, "w"); if (!f) return false; fputs(data, f); fclose(f); return true; } int main(int argc, char** argv) { const char* input = NULL; const char* output = "a.exe"; bool emit_ir = false; // 解析命令行参数 for (int i = 1; i < argc; i++) { if (strcmp(argv[i], "--emit-ir") == 0) { emit_ir = true; } else if (strcmp(argv[i], "-o") == 0 && i + 1 < argc) { output = argv[++i]; } else if (argv[i][0] != '-') { input = argv[i]; } } if (!input) { fprintf(stderr, "用法: l_lang <文件.l> [-o <输出>] [--emit-ir]\n"); return 1; } // 1. 读取源文件 size_t src_size; char* source = read_file(input, &src_size); if (!source) return 1; // 2. 初始化 Arena arena = arena_create(8); // 8 MB ErrorInfo error = {0}; ErrorList error_list; error_init(&error_list); // 3. 词法分析 size_t token_count; Token* tokens = lex(&arena, source, input, &token_count, &error); if (!tokens) { fprintf(stderr, "词法错误: %s:%d:%d: %s\n", error.filename, error.line, error.col, error.message); free(source); arena_destroy(&arena); return 1; } // 4. 语法分析 AstNode* ast = parse(&arena, tokens, token_count, input, &error); if (!ast) { fprintf(stderr, "语法错误: %s:%d:%d: %s\n", error.filename, error.line, error.col, error.message); free(source); arena_destroy(&arena); return 1; } // 5. 语义分析 sema_analyze(ast, &error_list, &arena); if (error_list.count > 0) { error_print(&error_list); free(source); arena_destroy(&arena); return 1; } // 6. LLVM IR 生成 const char* codegen_error = NULL; LLVMModuleRef module = codegen_module(ast, "l_module", &codegen_error); if (!module) { fprintf(stderr, "IR 生成错误: %s\n", codegen_error); free(source); arena_destroy(&arena); return 1; } if (emit_ir) { // 输出 LLVM IR 文本 char* ir = LLVMPrintModuleToString(module); char ir_path[512]; snprintf(ir_path, sizeof(ir_path), "%s.ll", input); write_file(ir_path, ir); printf("IR 已输出到: %s\n", ir_path); LLVMDisposeMessage(ir); } else { // 生成目标文件和可执行文件 LLVMInitializeNativeTarget(); LLVMInitializeNativeAsmPrinter(); char* triple = LLVMGetDefaultTargetTriple(); LLVMTargetRef target; char* target_error = NULL; if (LLVMGetTargetFromTriple(triple, &target, &target_error)) { fprintf(stderr, "目标平台错误: %s\n", target_error); LLVMDisposeMessage(target_error); LLVMDisposeMessage(triple); free(source); arena_destroy(&arena); LLVMDisposeModule(module); return 1; } LLVMTargetMachineRef tm = LLVMCreateTargetMachine( target, triple, "generic", "", LLVMCodeGenLevelDefault, LLVMRelocDefault, LLVMCodeModelDefault); LLVMDisposeMessage(triple); // 输出目标文件 char obj_path[512]; snprintf(obj_path, sizeof(obj_path), "%s.o", input); char* obj_error = NULL; if (LLVMTargetMachineEmitToFile(tm, module, obj_path, LLVMObjectFile, &obj_error)) { fprintf(stderr, "目标代码生成错误: %s\n", obj_error); LLVMDisposeMessage(obj_error); free(source); arena_destroy(&arena); LLVMDisposeTargetMachine(tm); LLVMDisposeModule(module); return 1; } // 调用 clang 链接 char cmd[1024]; snprintf(cmd, sizeof(cmd), "\"D:\\settings\\Language\\LLVM\\bin\\clang.exe\" \"%s\" -o \"%s\" -fuse-ld=lld", obj_path, output); int ret = system(cmd); if (ret != 0) { fprintf(stderr, "链接失败 (exit code %d)\n", ret); } else { printf("编译成功: %s\n", output); } LLVMDisposeTargetMachine(tm); } // 清理 LLVMDisposeModule(module); free(source); arena_destroy(&arena); return 0; } ``` - [ ] **Step 2: 编写第一个测试程序 `test/programs/01_arithmetic.l`** ```rust fn main() -> i64 { let x = 1 + 2 * 3; print_i64(x); return 0; } ``` - [ ] **Step 3: 构建并端到端测试** ```bash cd "D:\Code\doing_exercises\programs\L Language\build" mingw32-make -j4 ./l_lang.exe ../test/programs/01_arithmetic.l -o test_out.exe ./test_out.exe ``` Expected: 输出 `7`。 --- ### Task 10: 集成测试 — 全部语言特性 **Files:** - Create: `test/programs/02_if_else.l` - Create: `test/programs/03_while.l` - Create: `test/programs/04_fib_recursive.l` - Create: `test/programs/05_fib_iterative.l` - [ ] **Step 1: `test/programs/02_if_else.l` — if/else 控制流** ```rust fn main() -> i64 { let x = 10; if x > 5 { print_i64(1); } else { print_i64(0); } return 0; } ``` Expected output: `1` - [ ] **Step 2: `test/programs/03_while.l` — while 循环 + 比较运算** ```rust fn countdown(n: i64) -> i64 { let remaining = n; if remaining > 0 { print_i64(remaining); return countdown(remaining - 1); } return 0; } fn main() -> i64 { print_i64(100); return countdown(5); } ``` Expected output: `100` `5` `4` `3` `2` `1`(每行一个,测试函数递归 + if/else) - [ ] **Step 3: `test/programs/04_fib_recursive.l` — 斐波那契递归** ```rust fn fib(n: i64) -> i64 { if n < 2 { return n; } return fib(n - 1) + fib(n - 2); } fn main() -> i64 { let result = fib(10); print_i64(result); return 0; } ``` Expected output: `55` - [ ] **Step 4: `test/programs/05_float.l` — 浮点运算 + 多函数** ```rust fn square(x: f64) -> f64 { return x * x; } fn add_floats(a: f64, b: f64) -> f64 { return a + b; } fn main() -> i64 { let s = square(3.0); let sum = add_floats(s, 4.0); print_f64(sum); return 0; } ``` Expected output: `13.000000`(3.0² + 4.0 = 13.0,测试浮点运算 + 多函数调用) - [ ] **Step 5: 批量运行所有集成测试** ```bash cd "D:\Code\doing_exercises\programs\L Language" for f in test/programs/01_arithmetic.l test/programs/02_if_else.l \ test/programs/03_while.l test/programs/04_fib_recursive.l \ test/programs/05_float.l; do echo "=== $f ===" ./build/l_lang.exe "$f" -o "./build/test_out.exe" 2>&1 && ./build/test_out.exe 2>&1 echo "" done ``` Expected: 5 个程序全部编译运行,输出正确——四则运算 `7`,if/else `1`,递归 `5 4 3 2 1`,斐波那契 `55`,浮点 `13.000000`。 --- ### Task 11: 完成和文档 **Files:** - Create: `README.md` - [ ] **Step 1: 编写 `README.md`** ```markdown # L Language 一门用 C 语言实现的编译型编程语言,静态类型 + 类型推断,Rust 风格语法。 ## 构建 ```bash mkdir build && cd build cmake .. -G "MinGW Makefiles" -DCMAKE_PREFIX_PATH="D:/settings/Language/LLVM" mingw32-make -j4 ``` ## 使用 ```bash ./l_lang.exe <源文件.l> [-o <输出.exe>] [--emit-ir] ``` ## 语言特性 (v0.1) - 类型: `i64`, `f64`, `bool`, `void` - 控制流: `if/else`, `while` - 函数: 递归 + 多参数 - 变量: `let` 不可变声明,类型推断 - 内置函数: `print_i64`, `print_f64`, `print_bool` ## 示例 ```rust fn fib(n: i64) -> i64 { if n < 2 { return n; } return fib(n - 1) + fib(n - 2); } fn main() -> i64 { print_i64(fib(10)); return 0; } ``` ``` - [ ] **Step 2: 最终验证** ```bash cd "D:\Code\doing_exercises\programs\L Language\build" # 清理重建 rm -rf * cmake .. -G "MinGW Makefiles" -DCMAKE_PREFIX_PATH="D:\settings\Language\LLVM" mingw32-make -j4 # 运行单元测试 ./l_lang_test.exe # 运行集成测试 ./l_lang.exe ../test/programs/04_fib_recursive.l -o fib.exe && ./fib.exe ``` Expected: - `l_lang_test.exe` 8 个测试全部 PASS - `fib.exe` 输出 `55` - 5 个集成测试全部通过:算术 ✓ | if/else ✓ | 递归 ✓ | 斐波那契 ✓ | 浮点多函数 ✓ --- ## 依赖关系图 ``` Task 1 (CMake + 骨架) └─ Task 2 (Token 数据结构) └─ Task 3 (词法分析器) └─ Task 4 (AST 数据结构) ├─ Task 5 (解析器) │ ├─ Task 6 (符号表) │ └─ Task 7 (语义分析) ├─ Task 8 (代码生成) └─ Task 9 (驱动) └─ Task 10 (集成测试) └─ Task 11 (文档) ``` Tasks 1-4 严格顺序依赖。Task 5 和 Task 6 可并行(Task 5 依赖 4,Task 6 独立)。Task 7 依赖 5+6。Task 8 依赖 Task 4(可独立于 5/6/7 先用硬编码 AST 测试 LLVM API)。Task 9 依赖全部前置任务。 ## 积压问题(已知,v0.2 解决) 1. **变量不可重新赋值** — v0.1 的 `let` 变量是不可变的,无法做 `x = x + 1` 这类赋值。因此 while 循环体无法修改循环变量。迭代算法需要改用递归实现。 2. **作用域泄漏** — 变量表使用链表实现,离开作用域后未清理。