
What is the difference between 'asm', '__asm' and '__asm__'?

inline-assembly visual-c++ assembly gcc c

Adrian · 14 years ago

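As far as I can tell, the only difference between `__asm { ... };` and `__asm__("...");` is that the first uses `mov eax, var` and the second uses `movl %0, %%eax` with `:"=r" (var)` at the end. What other differences are there? And what about just `asm`?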

4 answers | last activity 7 years ago

Ben Voigt 14 years ago

Which one you use depends on your compiler. Unlike the C language itself, inline asm is not standardized.
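Because the syntax depends on the compiler, code that must build with more than one toolchain usually picks a dialect using predefined macros. The sketch below is an illustration rather than part of the original answer: `copy_via_asm` is a hypothetical name, and it assumes either MSVC targeting 32-bit x86 (`_M_IX86`) or a GNU C compiler targeting x86 with the default AT&T syntax. On any other compiler or target it falls back to plain C.

```c
/* Hypothetical example: copy an int through a register using whichever
   inline-asm dialect the compiler understands. */
int copy_via_asm(int var) {
    int result;
#if defined(_MSC_VER) && defined(_M_IX86)
    /* MSVC dialect: Intel syntax, C variables are named directly
       and accessed as memory operands. */
    __asm {
        mov eax, var
        mov result, eax
    }
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* GNU C dialect: AT&T syntax (source first, destination second).
       %1 is the input operand (var), %0 is the output operand (result). */
    __asm__ ("movl %1, %0"
             : "=r" (result)
             : "r" (var));
#else
    /* No known inline-asm dialect for this compiler/target: plain C. */
    result = var;
#endif
    return result;
}
```

Calling `copy_via_asm(42)` should return 42 on every branch; only the way the value travels differs.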

Peter Cordes 7 years ago

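There's a massive difference between MSVC inline asm and GNU C inline asm. GCC syntax is designed for optimal output without wasted instructions, for wrapping a single instruction or something similar. MSVC syntax is designed to be fairly simple, but as far as I can tell it's impossible to use without the latency and extra instructions of a round trip through memory for your inputs and outputs.

If you're using inline asm for performance reasons, this makes MSVC inline asm viable only if you write a whole loop entirely in asm, not for wrapping short sequences in an inline function. The example below (wrapping `idiv` in a function) is the kind of thing MSVC is bad at: about 8 extra store/load instructions.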

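MSVC inline asm (used by MSVC and probably icc, and maybe also available in some commercial compilers):

- Looks at your asm to figure out which registers your code steps on.
- Can only transfer data via memory. Data that was live in registers is stored by the compiler to prepare for your `mov ecx, shift_count`, for example. So using a single asm instruction that the compiler won't generate for you involves a round trip through memory on the way in and on the way out.
- More beginner-friendly, but often impossible to avoid the overhead of getting data in and out. Even apart from the syntax limitations, the optimizer in current versions of MSVC isn't good at optimizing around inline asm blocks either.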

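GNU C inline asm is not a good way to learn asm. You have to understand asm very well so you can tell the compiler about your code, and you have to understand what compilers need to know. That answer also links to other inline-asm guides and Q&As. The x86 tag wiki has lots of good material for asm in general, but for GNU inline asm it just links to that answer. (The material in that answer also applies to GNU inline asm on non-x86 platforms.)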

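GNU C inline asm syntax is used by gcc, clang, icc, and maybe some commercial compilers that implement GNU C:

- You have to tell the compiler what you clobber. Failing to do so will break the surrounding code in non-obvious, hard-to-debug ways.
- Powerful, but the syntax for telling the compiler how to supply inputs and where to find outputs is hard to read, learn, and use. For example, `"c" (shift_count)` gets the compiler to put the `shift_count` variable into `ecx` before your inline asm runs.
- Extra clunky for large blocks of code, because the asm has to be inside a string constant. So you typically need
  ```
  "insn   %[inputvar], %%reg\n\t"       // comment
  "insn2  %%reg, %[outputvar]\n\t"
  ```
- Very unforgiving and harder to use, but it allows lower overhead, especially for wrapping single instructions. (Wrapping single instructions was the original design intent, which is why you have to tell the compiler explicitly about early clobbers to stop it from using the same register for an input and an output when that would be a problem.)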
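As a sketch of the `"c"` constraint described in the list above, the hypothetical `rotl32` below wraps a single `rol` instruction. It assumes a GNU C compiler targeting x86 with AT&T syntax; the function name and the portable fallback branch are illustrative additions, not something from the original answer.

```c
#include <stdint.h>

/* Hypothetical helper: rotate value left by shift_count bits. */
static inline uint32_t rotl32(uint32_t value, uint8_t shift_count) {
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ ("roll %%cl, %0"       /* %%cl is a literal register name       */
             : "+r" (value)        /* read-write operand in any register    */
             : "c" (shift_count)   /* compiler places shift_count in ecx/cl */
             : "cc");              /* rol modifies the flags                */
    return value;
#else
    /* Portable fallback for other compilers or targets. */
    shift_count &= 31;
    return (value << shift_count) | (value >> ((32 - shift_count) & 31));
#endif
}
```

Nothing here is forced through memory: the constraints only say which registers the operands must be in, so after inlining the compiler can usually arrange for the values to be there already.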

Example: full-width integer division (`div`)

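On a 32-bit CPU, dividing a 64-bit integer by a 32-bit integer, or doing a full multiply (32x32->64), can benefit from inline asm. gcc and clang don't take advantage of `idiv` for `(int64_t)a / (int32_t)b`, probably because the instruction faults if the result doesn't fit in a 32-bit register. So unlike this Q&A about getting quotient and remainder from one div, this is a use case for inline asm. (Unless there's a way to inform the compiler that the result will fit, so `idiv` won't fault.)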
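The fault condition can be written out in plain C. This is only an illustrative sketch, and the helper name `quotient_fits_int32` is hypothetical: a 64-by-32-bit `idiv` raises a divide error when the divisor is zero or when the quotient falls outside the `int32_t` range, and that is the case a compiler cannot rule out in general.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true when dividend / divisor is representable
   as int32_t, i.e. when a 64/32 -> 32 idiv would not raise a divide error. */
static bool quotient_fits_int32(int64_t dividend, int32_t divisor) {
    if (divisor == 0) {
        return false;                      /* division by zero always faults */
    }
    if (dividend == INT64_MIN && divisor == -1) {
        return false;                      /* would overflow even in 64-bit C */
    }
    int64_t quotient = dividend / divisor; /* safe: done at full 64-bit width */
    return quotient >= INT32_MIN && quotient <= INT32_MAX;
}
```

A caller that cannot prove this condition holds would need the slower general-purpose 64-bit division instead of a bare `idiv`.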

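We'll use calling conventions that put some args in registers (with `hi` even in the right register), to show a situation that's closer to what you'd see when inlining a tiny function like this.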

MSVC

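Be careful with register-arg calling conventions when using inline asm. Apparently the inline-asm support is so badly designed or implemented that the compiler might not save/restore arg registers around the inline asm, if those args aren't used in the inline asm. Thanks to @RossRidge for pointing this out.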

// MSVC.  Be careful with _vectorcall & inline-asm: see above
// we could return a struct, but that would complicate things
int _vectorcall div64(int hi, int lo, int divisor, int *premainder) {
    int quotient, tmp;
    __asm {
        mov   edx, hi;
        mov   eax, lo;
        idiv   divisor
        mov   quotient, eax
        mov   tmp, edx;
        // mov ecx, premainder   // Or this I guess?
        // mov   [ecx], edx
    }
    *premainder = tmp;
    return quotient;     // or omit the return with a value in eax
}

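Update: apparently leaving a value in `eax` or `edx:eax` and then falling off the end of a non-void function (without a `return`) is supported, even when inlining. I assume this works only if there's no code after the `asm` statement. This avoids the store/reload for the output (at least for `quotient`), but we can't do anything about the inputs. In a non-inline function with stack args, they're already in memory, but in this use case we're writing a tiny function that could usefully inline.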

Compiled with MSVC 19.00.23026 `/O2` on rextester (with a `main()` that finds the directory of the exe and dumps the compiler's asm output to stdout).

## My added comments use. ##
; ... define some symbolic constants for stack offsets of parameters
; 48   : int ABI div64(int hi, int lo, int divisor, int *premainder) {
    sub esp, 16                 ; 00000010H
    mov DWORD PTR _lo$[esp+16], edx      ## these symbolic constants match up with the names of the stack args and locals
    mov DWORD PTR _hi$[esp+16], ecx

    ## start of __asm {
    mov edx, DWORD PTR _hi$[esp+16]
    mov eax, DWORD PTR _lo$[esp+16]
    idiv    DWORD PTR _divisor$[esp+12]
    mov DWORD PTR _quotient$[esp+16], eax  ## store to a local temporary, not *premainder
    mov DWORD PTR _tmp$[esp+16], edx
    ## end of __asm block

    mov ecx, DWORD PTR _premainder$[esp+12]
    mov eax, DWORD PTR _tmp$[esp+16]
    mov DWORD PTR [ecx], eax               ## I guess we should have done this inside the inline asm so this would suck slightly less
    mov eax, DWORD PTR _quotient$[esp+16]  ## but this one is unavoidable
    add esp, 16                 ; 00000010H
    ret 8

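There are a ton of extra mov instructions, and the compiler doesn't even come close to optimizing any of them away. I thought maybe it would see and understand the `mov tmp, edx` inside the inline asm, and make that a store to `premainder`. But that would require loading `premainder` from the stack into a register before the inline asm block, I guess.

This function is actually worse with `_vectorcall` than with the normal everything-on-the-stack ABI. With two inputs in registers, it stores them to memory so the inline asm can load them from named variables. If this were inlined, even more of the parameters could be in registers, and it would have to store them all so the asm would have memory operands. So unlike gcc, we don't gain much from inlining this.

Doing `*premainder = tmp` inside the asm block means more code written in asm, but it avoids the totally braindead store/load/store path for the remainder. This reduces the instruction count by 2, down to 11 (not including the `ret`).

I'm trying to get the best possible code out of MSVC, not "use it wrong" and create a straw-man argument. But as far as I can tell, it's horrible for wrapping very short sequences. Presumably there's an intrinsic function for 64/32 -> 32 division that lets the compiler generate good code for this particular case, so the entire premise of using inline asm for this on MSVC could be a straw-man argument. But it does show that intrinsics are much better than inline asm for MSVC.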

GNU C(GCC/CLANG/ICC)

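gcc does even better than the output shown here when inlining div64, because it can typically arrange for the preceding code to generate the 64-bit integer in edx:eax in the first place.

I can't get gcc to compile for the 32-bit vectorcall ABI. Clang can, but it does badly at inline asm with `"rm"` constraints (try it on the godbolt link: it bounces the function arg through memory instead of using the register option in the constraint). The 64-bit MS calling convention is close to the 32-bit vectorcall, with the first two params in ecx and edx. The difference is that 2 more params go in registers before the stack is used (and that the callee doesn't pop the args off the stack, which is what the `ret 8` was about in the MSVC output).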

// GNU C
// change everything to int64_t to do 128b/64b -> 64b division
// MSVC doesn't do x86-64 inline asm, so we'll use 32bit to be comparable
int div64(int lo, int hi, int *premainder, int divisor) {
    int quotient, rem;
    asm ("idivl  %[divsrc]"
          : "=a" (quotient), "=d" (rem)    // a means eax,  d means edx
          : "d" (hi), "a" (lo),
            [divsrc] "rm" (divisor)        // Could have just used %4 instead of naming divsrc
            // note the "rm" to allow the src to be in a register or not, whatever gcc chooses.
            // "rmi" would also allow an immediate, but unlike adc, idiv doesn't have an immediate form
          : // no clobbers
        );
    *premainder = rem;
    return quotient;
}

Compiled with `gcc -m64 -O3 -mabi=ms -fverbose-asm`. With `-m32` you just get 3 loads, idiv, and a store, as you can see by changing the options in that godbolt link.

mov     eax, ecx  # lo, lo
idivl  r9d      # divisor
mov     DWORD PTR [r8], edx       # *premainder_7(D), rem
ret

For 32-bit vectorcall, gcc would do something like

## Not real compiler output, but probably similar to what you'd get
mov     eax, ecx               # lo, lo
mov     ecx, [esp+12]          # premainder
idivl   [esp+16]               # divisor
mov     DWORD PTR [ecx], edx   # *premainder_7(D), rem
ret   8

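MSVC uses 13 instructions (not including the ret), compared to gcc's 4. With inlining, as I said, gcc potentially compiles this to just one instruction, while MSVC would probably still use 9. (It won't need to reserve stack space or load `premainder`; I'm assuming it still has to store about 2 of the 3 inputs. Then it reloads them inside the asm, runs `idiv`, stores two outputs, and reloads them outside the asm. So that's 4 loads/stores for the inputs and another 4 for the outputs.)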
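For experimenting with the GNU C version, a small harness could look like the sketch below. The names `div64_gnu` and `split_and_divide` are hypothetical, and the code assumes a GNU C compiler targeting x86 with AT&T syntax; on other targets it falls back to plain C. The caller is responsible for making sure the quotient fits in 32 bits, since `idiv` faults otherwise.

```c
#include <stdint.h>

/* Same idea as div64 above: divide the 64-bit value hi:lo by divisor
   using a single idiv. */
static int div64_gnu(int lo, int hi, int *premainder, int divisor) {
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int quotient, rem;
    __asm__ ("idivl %[divsrc]"
             : "=a" (quotient), "=d" (rem)      /* outputs in eax, edx   */
             : "d" (hi), "a" (lo),              /* dividend in edx:eax   */
               [divsrc] "rm" (divisor)          /* register or memory    */
             : "cc");                           /* idiv leaves flags undefined */
    *premainder = rem;
    return quotient;
#else
    /* Portable fallback: rebuild the 64-bit dividend and divide in C. */
    int64_t dividend =
        (int64_t)(((uint64_t)(uint32_t)hi << 32) | (uint32_t)lo);
    *premainder = (int)(dividend % divisor);
    return (int)(dividend / divisor);
#endif
}

/* Hypothetical wrapper: split a 64-bit dividend into the two halves.
   Assumes arithmetic right shift and wrapping narrowing conversions,
   which is what GCC and Clang provide. */
static int split_and_divide(int64_t dividend, int divisor, int *premainder) {
    int hi = (int)(dividend >> 32);        /* upper 32 bits, sign included */
    int lo = (int)(uint32_t)dividend;      /* lower 32 bits                */
    return div64_gnu(lo, hi, premainder, divisor);
}
```

Compiling this at `-O2` and reading the generated asm is a quick way to check how much of the wrapper disappears once everything is inlined.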

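oDisPo 13 years ago

With the gcc compiler, there isn't much difference. `asm`, `__asm`, and `__asm__` are the same; the underscore forms exist only to avoid namespace conflicts (for example, a user-defined function named asm).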

Ciro Santilli OurBigBook.com 7 years ago

asm vs __asm__ in GCC

`asm` does not work with `-std=c99`; you have two alternatives:

- use `__asm__`
- use `-std=gnu99`
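A small sketch of the first alternative: `__asm__` is accepted even under `-std=c99`, where plain `asm` is not treated as a keyword. The empty template with a `"+r"` operand emits no instructions, so this should build on any GCC or Clang target; the name `opaque_int` is hypothetical.

```c
/* Build with, for example: gcc -std=c99 -c example.c */
static int opaque_int(int x) {
    /* Empty asm template: no instructions are generated, but the "+r"
       constraint tells the compiler that x is read and rewritten here. */
    __asm__ ("" : "+r" (x));
    return x;
}
```

Replacing `__asm__` with `asm` in this function and keeping `-std=c99` should make the build fail, which is the difference the answer describes.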

__asm vs __asm__ in GCC

I could not find where `__asm` is documented (notably, it is not mentioned at https://gcc.gnu.org/onlinedocs/gcc-7.2.0/gcc/Alternate-Keywords.html#Alternate-Keywords ), but from the GCC 8.1 source they are exactly the same:

  { "__asm",        RID_ASM,    0 },
  { "__asm__",      RID_ASM,    0 },

So I would just use `__asm__`, which is documented.
