代码之家  ›  专栏  ›  技术社区  ›  LuminousNutria

loRecursion ARM Assembly中的示例

  •  -3
  • LuminousNutria  · 技术社区  · 8 年前

    有人能给我举一个例子,说明递归是如何在ARM组装中完成的,只需要这里列出的指令(为了直观起见)?

    https://salmanarif.bitbucket.io/visual/supported_instructions.html

    如果链接不起作用,我将使用visUAL,以下是我唯一可以使用的说明:

    • MVN公司
    • LDR公司
    • ADC公司
    • 附属的
    • RSB公司
    • RSC公司
    • 提高采收率
    • 奥尔
    • LSL
    • LSR
    • 语音识别
    • RRX公司
    • 化学机械抛光
    • CMN
    • TST公司
    • TEQ公司
    • LDR公司
    • LDM公司
    • STM公司
    • B
    • BL公司
    • 填充
    • 结束

    这不会为R4加载较旧的值,因此每次函数调用自身时,R4只会加倍。

        ;VisUAL initializess all registers to 0 except for R13/SP, which is -16777216
    
        MOV     R4, #0
        MOV     R5, #1
    
        MOV     r0, #4
    
        MOV     LR, #16             ;tells program to move to 4th instruction
    
    
    FIB
    
    
        STMDB   SP!, {R4-R6, LR}    ;Stores necessary values on stack (PUSH command)
        LDR     R4, [SP]            ;Loads older value for R4 from memory
        ADD     R4, R4, R5          ;Adds R5 to R4
        STR     R4, [SP], #8        ;stores current value for R4 to memory
        MOV     R5, R4              ;Makes R5 = R4
    
    
        CMP     R4, #144            ;If R4 >= 144:
        BGE     POP                 ;Branch to POP
    
        MOV     PC, LR              ;Moves to STMDB(PUSH) statement
    
    POP
        LDMIA   SP!, {R4-R6, LR}    ;Pops registers off stack
        END                         ;ends program
    
    2 回复  |  直到 7 年前
        1
  •  3
  •   NickJH    8 年前

    您需要使用堆栈、STMDB和LDMIA指令。在具有“统一”符号的真实ARM工具上,它们还具有助记符PUSH和POP。

    START
       MOV R0, #6
       BL FIB
       END ; pseudo-instruction to make your simulator terminate
    
    FIB                                 ; int fib(int i) {
       STMDB SP!, {R4,R5,R6,LR}         ;   int n, tmp;
       MOV R4, R0                       ;   n = i;
       CMP R0, #2                       ;   if (i <= 2) {
       MOV R0, #1                       ;     return 1;
       BLE FIB_END                      ;   }
       SUB R0, R4, #2                   ;   i = n-2;
       BL FIB                           ;   i = fib(i);
       MOV R5, R0                       ;   tmp = i;
       SUB R0, R4, #1                   ;   i = n-1;
       BL FIB                           ;   i = fib(i);
       ADD R0, R0, R5                   ;   i = i + tmp;
    FIB_END                             ;   return i;
       LDMIA SP!, {R4,R5,R6,PC}         ;  }
    

    它应该以包含fib(6)==8的R0终止。当然,这段代码效率很低,因为它对相同的值重复调用FIB。

    需要STM,因此可以使用寄存器r4、r5,因为另一个函数调用可以更改r0-r3和LR。推LR和弹出PC就像B LR。如果您调用C代码,则应该推动偶数个寄存器,以保持SP 64位对齐(在这里我们真的不需要这样做;忽略R6)。

        2
  •  1
  •   old_timer    8 年前

    其他递归函数:

    unsigned int so ( unsigned int x )
    {
        static unsigned int z=0;
        z+=x;
        if(x==0) return(z);
        so(x-1);
        return(z);
    }
    

    构建/拆卸

    arm-none-eabi-gcc -O2 -c Desktop/so.c -o so.o
    arm-none-eabi-objdump -D so.o
    
    
    00000000 <so>:
       0:   e92d4010    push    {r4, lr}
       4:   e59f4034    ldr r4, [pc, #52]   ; 40 <so+0x40>
       8:   e5943000    ldr r3, [r4]
       c:   e3500000    cmp r0, #0
      10:   e0803003    add r3, r0, r3
      14:   e5843000    str r3, [r4]
      18:   1a000002    bne 28 <so+0x28>
      1c:   e1a00003    mov r0, r3
      20:   e8bd4010    pop {r4, lr}
      24:   e12fff1e    bx  lr
      28:   e2400001    sub r0, r0, #1
      2c:   ebfffffe    bl  0 <so>
      30:   e5943000    ldr r3, [r4]
      34:   e8bd4010    pop {r4, lr}
      38:   e1a00003    mov r0, r3
      3c:   e12fff1e    bx  lr
      40:   00000000    
    

    push是stm的伪指令,pop是ldm的伪指令,所以可以使用它们。

    我使用了一个静态局部,我称之为局部全局,它在。数据不在堆栈上(在本例中是bss,因为我将其设置为零)

    Disassembly of section .bss:
    
    00000000 <z.4099>:
       0:   00000000    
    

    第一个加载是将该值加载到r3中。

    调用约定表示r0将在进入函数时包含第一个参数(有例外,但在这种情况下是真的)。

    我们从内存中得到z,r0已经有了参数x,所以我们把x添加到z并保存到内存中

    编译器出于未知的性能原因无序地进行了比较,编写的add和str不修改标志,所以没问题,

    从内存中读回r3(调用约定说r0-r3是易变的函数,您可以随意修改它们,不必保留它们,因此我们在r3中的z版本可能已被破坏,但r4由任何被调用方保留,因此我们将z读回r3。我们从堆栈中弹出r4和返回地址,用z准备返回寄存器r0并执行返回。

    如果x等于零(18上的bne失败,我们运行1c,然后20,然后24),那么我们将z(r3版本)复制到r0中,r0是用于根据此编译器使用的调用约定从此函数返回的寄存器(arms建议)。并返回。

    链接器将把z的地址填充到偏移量0x40,这是一个对象,而不是最终的二进制。。。

    arm-none-eabi-ld -Ttext=0x1000 -Tbss=0x2000 so.o -o so.elf
    arm-none-eabi-ld: warning: cannot find entry symbol _start; defaulting to 0000000000001000
    arm-none-eabi-objdump -D so.elf
    
    so.elf:     file format elf32-littlearm
    
    
    Disassembly of section .text:
    
    00001000 <so>:
        1000:   e92d4010    push    {r4, lr}
        1004:   e59f4034    ldr r4, [pc, #52]   ; 1040 <so+0x40>
        1008:   e5943000    ldr r3, [r4]
        100c:   e3500000    cmp r0, #0
        1010:   e0803003    add r3, r0, r3
        1014:   e5843000    str r3, [r4]
        1018:   1a000002    bne 1028 <so+0x28>
        101c:   e1a00003    mov r0, r3
        1020:   e8bd4010    pop {r4, lr}
        1024:   e12fff1e    bx  lr
        1028:   e2400001    sub r0, r0, #1
        102c:   ebfffff3    bl  1000 <so>
        1030:   e5943000    ldr r3, [r4]
        1034:   e8bd4010    pop {r4, lr}
        1038:   e1a00003    mov r0, r3
        103c:   e12fff1e    bx  lr
        1040:   00002000    
    
    Disassembly of section .bss:
    
    00002000 <z.4099>:
        2000:   00000000    
    

    例如

    如果你有参数r0是第一个,r1是第二个,直到r3(如果它们合适,让你的代码这样做,你有四个或更少的参数) 在调用另一个函数时,需要将lr推到堆栈上 如果需要修改它们,如果需要一些本地存储,可以通过相应地修改堆栈指针(或执行push/stm)来使用堆栈。您可以看到,gcc将寄存器中的内容保存到堆栈中,然后在函数期间使用寄存器,至少有几个局部变量值得使用,除此之外,它还需要在堆栈上大量使用sp relative。 当您执行递归调用时,您可以按照调用约定执行与任何其他正常函数相同的操作,如果您需要在调用之前保存r0-r3,则可以在寄存器r4或更高版本或堆栈中执行,在函数返回后还原。您可以看到,只需将要在函数调用前后保留的值放入寄存器r4或更高版本中就更容易了。 编译器本可以在分支之前对r0进行比较,这样读起来更容易。同样,可以在pop之前将返回值的mov转换为r0

    我没有输入参数,所以我在这里构建的gcc似乎是armv4t,如果我要求更新一点的话

    arm-none-eabi-gcc -O2 -c -mcpu=mpcore Desktop/so.c -o so.o
    arm-none-eabi-objdump -D so.o
    
    so.o:     file format elf32-littlearm
    
    
    Disassembly of section .text:
    
    00000000 <so>:
       0:   e92d4010    push    {r4, lr}
       4:   e59f402c    ldr r4, [pc, #44]   ; 38 <so+0x38>
       8:   e3500000    cmp r0, #0
       c:   e5943000    ldr r3, [r4]
      10:   e0803003    add r3, r0, r3
      14:   e5843000    str r3, [r4]
      18:   1a000001    bne 24 <so+0x24>
      1c:   e1a00003    mov r0, r3
      20:   e8bd8010    pop {r4, pc}
      24:   e2400001    sub r0, r0, #1
      28:   ebfffffe    bl  0 <so>
      2c:   e5943000    ldr r3, [r4]
      30:   e1a00003    mov r0, r3
      34:   e8bd8010    pop {r4, pc}
      38:   00000000 
    

    你可以看到报税表读起来更容易一些

    虽然错过了优化,但它可以执行ldr r0、[r4]并保存一条指令。或者保持尾端不变,bne可能是beq到30(mov r0,r3;pop{r4,pc}),并共享一个出口。

    so:
        push    {r4, lr}
        @ z += x
        ldr r4, zptr
        ldr r3, [r4]
        add r3, r0, r3
        str r3, [r4]
        @ if x==0 return z
        cmp r0, #0
        beq l30
        @ so(x - 1)
        sub r0, r0, #1
        bl so
        ldr r3, [r4]
    l30:
        @ return z
        mov r0, r3
        pop {r4, pc}
    zptr: .word z
    
    .section .bss
    z: .word 0
    
    arm-none-eabi-as so.s -o so.o
    arm-none-eabi-objdump -D so.o
    
    so.o:     file format elf32-littlearm
    
    
    Disassembly of section .text:
    
    00000000 <so>:
       0:   e92d4010    push    {r4, lr}  (stmdb)
       4:   e59f4024    ldr r4, [pc, #36]   ; 30 <zptr>
       8:   e5943000    ldr r3, [r4]
       c:   e0803003    add r3, r0, r3
      10:   e5843000    str r3, [r4]
      14:   e3500000    cmp r0, #0
      18:   0a000002    beq 28 <l30>
      1c:   e2400001    sub r0, r0, #1
      20:   ebfffff6    bl  0 <so>
      24:   e5943000    ldr r3, [r4]
    
    00000028 <l30>:
      28:   e1a00003    mov r0, r3
      2c:   e8bd8010    pop {r4, pc}  (ldmia)
    
    00000030 <zptr>:
      30:   00000000    
    
    Disassembly of section .bss:
    
    00000000 <z>:
       0:   00000000    
    

    编辑

    让我们来看看最后一个。

    push {r4,lr}  which is a pseudo instruction for stmdb sp!,{r4,lr}
    
    Lr is the r14 which is the return address look at the bl instruction
    branch and link, so we branch to some address but lr (link register) is 
    set to the return address, the instruction after the bl.  So when main or some other function calls so(4);  lets assume so is at address 0x1000 so the program counter, r15, pc gets 0x1000, lr will get the value of the instruction after the caller so lets say that is 0x708.  Lets also assume the stack pointer during this first call to so() from main is at 0x8000, and lets say that .bss is at 0x2000 so z lives at address 0x2000 (which also means the value at 0x1030, zptr is 0x2000.
    
    We enter the function for the first time with r0 (x) = 4.
    
    When you read the arm docs for stmdb sp!,{r4,lr} it decrements before (db)  so sp on entry this time is 0x8000 so it decrements for the two items to 0x7FF8, the first item in the list is written there so
    
    0x7FF8 = r4 from main
    0x7FFC = 9x 0x708 return address to main
    
    the ! means sp stays modified so sp-0x7ff8
    
    then ldr r4,zptr  r4 = 0x2000
    ldr r3,[r4] this is an indirect load so what is at address r4 is read to 
    put in r3 so r3 = [0x2000] = 0x0000 at this point  the z variable.
    
    z+=x;  add r3,r0,r3  r3 = r0 + r3 = 4 + 0 = 4
    str r3,[r4]  [r4] = r3, [0x2000] = r3 write 4 to 0x2000
    
    cmp r0,#0   4 != 0
    
    beq to 28 nope, not equal so no branch
    
    sub r0,r0,#1   r0 = 4 - 1 = 3
    
    bl so  so this is so(3); pc = 0x1000 lr = 0x1024
    
    so now we enter so for the second time with r0 = 3
    
    stmdb sp!,{r4,lr}
    
    0x7FF0 = r4 (saving from so(4) call but we dont care its value even though we know it)
    0x7FF4 = lr from so(4) = 0x1024
    sp=0x7FF0
    ldr r4,zptr r4 = 0x2000
    ldr r3,[r4] r3 = [0x2000] = 4
    add r3,r0,r3  r3 = 3 + 4 = 7
    str r3,[r4]  write 7 to 0x2000
    cmp r0,#0 3 != 0
    beq 0x1028 not equal so dont branch
    sub r0,r0,#1   r0 = 3-1 = 2
    bl so  pc=0x1000 lr=0x1024
    
    so(2)
    
    stmdb sp!,{r4,lr}
    0x7FE8 = r4 from caller, just save it
    0x7FEC = lr from caller, 0x1024
    sp=0x7FE8
    ldr r4,zprt  r4=0x2000
    ldr r3,[r4]  r3 = read 7 from 0x2000
    add r3,r0,r3  r3 = 2 + 7 = 9
    str r3,[r4]  write 9 to 0x2000
    cmp r0,#0  2 != 0
    beq 0x1028  not equal so dont branch
    sub r0,r0,#1  r0 = 2 - 1 = 1
    bl 0x1000 pc=0x1000 lr=0x1024
    
    so(1)
    
    stmdb sp!,{r4,lr}
    0x7FE0 = save r4
    0x7FE4 = lr = 0x1024
    sp=0x7FE0
    ldr r4,zptr r4=0x2000
    ldr r3,[r4]  r3 = read 9 from 0x2000
    add r3,r0,r3  r3 = 1 + 9 = 10
    str r3,[r4]  write 10 to 0x2000
    cmp r0,#0  1 != 0
    beq 0x1028  not equal so dont branch
    sub r0,r0,#1  r0 = 1 - 1 = 0
    bl 0x1000  pc=0x1000 lr=0x1024
    
    so(0)
    
    stmdb sp!,{r4,lr}
    0x7FD8 = r4
    0x7FDC = lr = 0x1024
    sp = 0x7FD8
    ldr r4,zptr  r4 = 0x2000
    ldr r3,[r4]  r3 = read 10 from 0x2000
    add r3,r0,r3  r3 = 0 + 10 = 10
    str r0,[r4]  write 10 to 0x2000
    cmp r0,#0  0 = 0  so it matches
    beq 0x1028 it is equal so we finally take this branch
    mov r0,r3  r0 = 10
    ldmia sp!,{r4,pc}
    increment after
    r4 = [sp+0] = [0x7FD8] restore r4 from caller
    pc = [sp+4] = [0x7FDC] = 0x1024
    sp += 8 = 0x7FE0
    (branch to 0x1024)(return from so(0) to so(1))
    ldr r3,[r4]  read 10 from 0x2000
    mov r0,r3  r0 = 10
    ldmia sp!,{r4,pc}
    r4 = [sp+0] = [0x7FE0] restore r4 from caller
    pc = [sp+4] = [0x7FE4] = 0x1024
    sp += 8 = 0x7FE8
    (branch to 0x1024)(return from so(1) to so(2))
    ldr r3,[r4]  read 10 from 0x2000
    mov r0,r3  r0 = 10
    ldmia sp!,{r4,pc}
    r4 = [sp+0] = [0x7FE8] restore r4 from caller
    pc = [sp+4] = [0x7FEC] = 0x1024
    sp += 8 = 0x7FF0
    (branch to 0x1024)(return from so(2) to so(3))
    ldr r3,[r4]  read 10 from 0x2000
    mov r0,r3  r0 = 10
    ldmia sp!,{r4,pc}
    r4 = [sp+0] = [0x7FF0] restore r4 from caller
    pc = [sp+4] = [0x7FF4] = 0x1024
    sp += 8 = 0x7FF8
    (branch to 0x1024)(return from so(3) to so(4))
    ldr r3,[r4]  read 10 from 0x2000
    mov r0,r3  r0 = 10
    ldmia sp!,{r4,pc}
    r4 = [sp+0] = [0x7FF8] restore r4 from caller (main()'s r4)
    pc = [sp+4] = [0x7FFC] = 0x708
    sp += 8 = 0x8000
    (branch to 0x708)(return from so(4) to main())
    
    and we are done.
    

    因此,堆栈是函数的临时存储,在cup上写入一个数据项,然后将其推入保持器(从调用方保存r4)写入另一个项并将其推入保持器(lr,调用方的返回地址)。我们在这里每个函数只使用了两个项目,所以每个函数我可以把两个杯子推到支架上,每次调用函数我都会得到两个新的、唯一的存储位置来存储本地信息。当我退出函数时,我将两个杯子从支架中拉出,并使用它们的值(并丢弃它们)。这在某种程度上是递归的关键,堆栈为每个调用提供了新的本地存储,与之前对同一个函数的调用不同,如果没有其他需要返回地址的话(尽管确实制作了一些更简单的递归示例,但在优化时,它不够聪明,基本上不需要循环)。

    ldr rd,[rn]想象他在说该地址处的项目,因此读取rn中地址处的内存,并将该值保存在rd中。

    str rd,[rn]一个混乱的arm指令,其余的第一个参数是等号的左侧(加r1,r2,r3 r1=r2+r3,ldr r1,[r4]r1=r4])。这一个是向后的[rn]=rd将rd中的值存储到地址r4描述的内存位置,一级间接寻址。

    stmdb sp!,意味着在执行任何操作之前,将堆栈指针减量为列表中寄存器数量的4倍字节,然后将第一个编号最低的寄存器写入[sp+0],然后在[sp+4]旁边,依此类推,最后一个寄存器将比sp的起始值小4个字节!表示函数在sp为递减值时结束。您可以将ldm/stm用于堆栈推送和POP之外的事情。就像memcpy,但那是另一个故事。。。

    所有这些都在信息中心的arm文档中。臂com(arm架构参考手册,如果您没有阅读过armv5,则首选armv5)。