c2goasm-usage.md

Description

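c2goasm converts the assembly that clang generates into Go assembly, which can then be embedded directly in a Go program and called from Go.
This means the performance-sensitive parts of your code can be written in C/CPP, converted with c2goasm, and embedded directly in your Go program.
This approach performs better than interfaces such as cgo (close to a 20x speedup; see benchmark-against-cgo).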

Preflight

Here is what you need:

Step 1, Prepare The C Code

benchmark.c

#include <stdint.h>

void benchmark(uint64_t *buf[]){
    // i indexes the next free slot in buf; each match consumes three slots.
    uint64_t i = 0;
    for(uint64_t a=0;a<=1000;a++){
        for(uint64_t b=0;b<=1000;b++){
            for(uint64_t c=0;c<=1000;c++){
                // Keep triples with a*a + b*b == c*c whose sum is 1000.
                if(a*a+b*b==c*c && a+b+c==1000){
                    *buf[i]   = a;
                    *buf[i+1] = b;
                    *buf[i+2] = c;
                    i += 3;
                }
            }
        }
    }
}

Step 2, Generate ASM by Clang

clang -O3 -mavx -mfma -masm=intel -mllvm -inline-threshold=1000 -mstackrealign -fno-asynchronous-unwind-tables -fno-exceptions -fno-rtti -fno-jump-tables -S ./benchmark.c -o ./benchmark.s

Step 3, Prepare Call Code in Go

benchmark_amd64.go

package main

import (
    "fmt"
    "unsafe"
)

//go:noescape
func _benchmark(buf unsafe.Pointer)

func benchmark() {
    var d0  uint64 = 0
    var d1  uint64 = 0
    var d2  uint64 = 0
    var d3  uint64 = 0
    var d4  uint64 = 0
    var d5  uint64 = 0
    var d6  uint64 = 0
    var d7  uint64 = 0
    var d8  uint64 = 0
    var d9  uint64 = 0
    var d10 uint64 = 0
    var d11 uint64 = 0

    a := []unsafe.Pointer{
        unsafe.Pointer(&d0),unsafe.Pointer(&d1),unsafe.Pointer(&d2),
        unsafe.Pointer(&d3),unsafe.Pointer(&d4),unsafe.Pointer(&d5),
        unsafe.Pointer(&d6),unsafe.Pointer(&d7),unsafe.Pointer(&d8),
        unsafe.Pointer(&d9),unsafe.Pointer(&d10),unsafe.Pointer(&d11)}

    p1 := unsafe.Pointer(&a[0])
    // emit
    _benchmark(p1)
    // output
    fmt.Println("-----------------------------")
    fmt.Printf("a:%d \t b:%d \t c:%d\n", d0, d1, d2)
    fmt.Printf("a:%d \t b:%d \t c:%d\n", d3, d4, d5)
    fmt.Printf("a:%d \t b:%d \t c:%d\n", d6, d7, d8)
    fmt.Printf("a:%d \t b:%d \t c:%d\n", d9, d10, d11)

}

Step 4, Generate GOASM by c2goasm

c2goasm -a -f ./benchmark.s ./benchmark_amd64.s

Step 5, Run it!

Prepare main.go

package main

func main() {
    benchmark()
}

Go build & Run

Place benchmark_amd64.s, benchmark_amd64.go, and main.go in the same directory, then run:

go build

Then run the resulting binary directly.
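
To cross-check what the binary prints, the same search can be written in plain Go. The sketch below is not part of c2goasm; `findTriples` is a hypothetical helper, and it derives c from the sum constraint instead of looping over it, so it is a reference for the expected results rather than a translation of the C loop.

```go
package main

import "fmt"

// findTriples returns every (a, b, c) with a*a+b*b == c*c and a+b+c == sum,
// ordered by a and then b, which matches the order the C loops visit them.
// Hypothetical helper for verification only.
func findTriples(sum uint64) [][3]uint64 {
	var out [][3]uint64
	for a := uint64(0); a <= sum; a++ {
		for b := uint64(0); a+b <= sum; b++ {
			c := sum - a - b // the sum constraint fixes c, so no third loop is needed
			if a*a+b*b == c*c {
				out = append(out, [3]uint64{a, b, c})
			}
		}
	}
	return out
}

func main() {
	for _, t := range findTriples(1000) {
		fmt.Printf("a:%d \t b:%d \t c:%d\n", t[0], t[1], t[2])
	}
}
```

For a sum of 1000 this finds four triples, (0, 500, 500), (200, 375, 425), (375, 200, 425) and (500, 0, 500), which is why the Go caller reserves twelve uint64 slots (d0 through d11).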

Debug

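The biggest problem when writing this for the first time is how variables are passed and copied across the boundary between Go and the generated assembly. Careless code easily runs into problems such as "expected to pass a reference but passed a value" or "modified the wrong memory".

Narrow the problem down step by step, as you would in any debugging session, and reduce the code to a minimal case to shrink the search space. A clear debugging methodology is the best way to save debugging time.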
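
The "passed a value instead of a reference" pitfall can be reproduced in pure Go, with no assembly involved, which makes for a useful minimal case. In this sketch `fillValues`, `fillPointers` and `demo` are hypothetical names chosen for illustration.

```go
package main

import "fmt"

// fillValues writes into a slice that holds copies of the caller's
// variables, so the caller's variables never change.
func fillValues(buf []uint64) {
	for i := range buf {
		buf[i] = uint64(i + 1)
	}
}

// fillPointers writes through the addresses of the caller's variables,
// so the writes land in those variables.
func fillPointers(buf []*uint64) {
	for i := range buf {
		*buf[i] = uint64(i + 1)
	}
}

// demo runs both variants and returns the caller's variables afterwards.
func demo() (byValue, byPointer [2]uint64) {
	var v0, v1 uint64
	fillValues([]uint64{v0, v1}) // the slice holds copies of v0 and v1

	var p0, p1 uint64
	fillPointers([]*uint64{&p0, &p1}) // the slice holds addresses

	return [2]uint64{v0, v1}, [2]uint64{p0, p1}
}

func main() {
	byValue, byPointer := demo()
	fmt.Println("by value:  ", byValue)
	fmt.Println("by pointer:", byPointer)
}
```

Only the pointer variant changes the caller's variables. If the output of an assembly routine stays at zero, checking whether the buffer holds values or addresses is a reasonable first step.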

Reference

Data type mapping reference

| Go | C/CPP |
| --- | --- |
| int | stdint.h/int64_t (on 64-bit system) |
| int8 | stdint.h/int8_t |
| int16 | stdint.h/int16_t |
| int32 | stdint.h/int32_t |
| int64 | stdint.h/int64_t |
| uint | stdint.h/uint64_t (on 64-bit system) |
| uint8 | stdint.h/uint8_t |
| uint16 | stdint.h/uint16_t |
| uint32 | stdint.h/uint32_t |
| uint64 | stdint.h/uint64_t |
| uintptr | |
| unsafe.Pointer | * |
| byte (uint8) | uint8_t |
| rune (int32) | int32_t |
| float32 | float |
| float64 | double |
| bool | |
| complex64 | |
| complex128 | |
| string (sequence of bytes) | uint8_t var[] |
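
The widths on the Go side of this mapping can be confirmed with `unsafe.Sizeof`. The sketch below only inspects Go types; `goSizes` is a hypothetical helper, and the sizes of `int`, `uintptr` and `unsafe.Pointer` depend on the target platform.

```go
package main

import (
	"fmt"
	"unsafe"
)

// goSizes returns the size in bytes of Go types from the mapping table.
// Hypothetical helper for illustration.
func goSizes() map[string]uintptr {
	return map[string]uintptr{
		"int8":           unsafe.Sizeof(int8(0)),
		"int16":          unsafe.Sizeof(int16(0)),
		"int32":          unsafe.Sizeof(int32(0)),
		"int64":          unsafe.Sizeof(int64(0)),
		"uint8":          unsafe.Sizeof(uint8(0)),
		"uint64":         unsafe.Sizeof(uint64(0)),
		"float32":        unsafe.Sizeof(float32(0)),
		"float64":        unsafe.Sizeof(float64(0)),
		"int":            unsafe.Sizeof(int(0)),              // platform dependent
		"uintptr":        unsafe.Sizeof(uintptr(0)),          // platform dependent
		"unsafe.Pointer": unsafe.Sizeof(unsafe.Pointer(nil)), // platform dependent
	}
}

func main() {
	sizes := goSizes()
	names := []string{
		"int8", "int16", "int32", "int64", "uint8", "uint64",
		"float32", "float64", "int", "uintptr", "unsafe.Pointer",
	}
	for _, name := range names {
		fmt.Printf("%-15s %d bytes\n", name, sizes[name])
	}
}
```

Comparing these numbers with `sizeof` on the C side for the paired type is a quick way to catch a mismatched argument width before it corrupts memory.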

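Passing data through pointers

When a result has to be returned, pass a pointer type, and every level of the structure must be a pointer. For example, to get an array back as the result, pass in an array of pointers.

For example, this is a pointer array with two elements: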

C:

uint64_t *buf[]

Go:

var d0  uint64 = 0
var d1  uint64 = 0
a := []unsafe.Pointer{unsafe.Pointer(&d0),unsafe.Pointer(&d1)}
p1 := unsafe.Pointer(&a[0])
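
To see what the C statement `*buf[i] = v` does with this layout, the two-step dereference can be imitated in Go. This is only a sketch of the memory access pattern, not what c2goasm generates; `storeThrough` and `demo` are hypothetical names, and `unsafe.Add` assumes Go 1.17 or later.

```go
package main

import (
	"fmt"
	"unsafe"
)

// storeThrough imitates `*buf[i] = v` for a `uint64_t *buf[]` parameter:
// load the i-th pointer from the array, then write through that pointer.
// Hypothetical helper for illustration only.
func storeThrough(buf unsafe.Pointer, i int, v uint64) {
	ptrSize := unsafe.Sizeof(unsafe.Pointer(nil))
	slot := unsafe.Add(buf, uintptr(i)*ptrSize) // address of buf[i]
	target := *(*unsafe.Pointer)(slot)          // value of buf[i], itself a pointer
	*(*uint64)(target) = v                      // write into the pointed-to uint64
}

// demo builds the two-element pointer array and writes through it.
func demo() (uint64, uint64) {
	var d0, d1 uint64
	a := []unsafe.Pointer{unsafe.Pointer(&d0), unsafe.Pointer(&d1)}
	p1 := unsafe.Pointer(&a[0])

	storeThrough(p1, 0, 3)
	storeThrough(p1, 1, 4)
	return d0, d1
}

func main() {
	d0, d1 := demo()
	fmt.Println(d0, d1)
}
```

The writes end up in `d0` and `d1` because `p1` points at the array of addresses, not at the values themselves.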

Issues