其他递归函数:
unsigned int so ( unsigned int x )
{
static unsigned int z=0;
z+=x;
if(x==0) return(z);
so(x-1);
return(z);
}
构建/拆卸
arm-none-eabi-gcc -O2 -c Desktop/so.c -o so.o
arm-none-eabi-objdump -D so.o
00000000 <so>:
0: e92d4010 push {r4, lr}
4: e59f4034 ldr r4, [pc, #52] ; 40 <so+0x40>
8: e5943000 ldr r3, [r4]
c: e3500000 cmp r0, #0
10: e0803003 add r3, r0, r3
14: e5843000 str r3, [r4]
18: 1a000002 bne 28 <so+0x28>
1c: e1a00003 mov r0, r3
20: e8bd4010 pop {r4, lr}
24: e12fff1e bx lr
28: e2400001 sub r0, r0, #1
2c: ebfffffe bl 0 <so>
30: e5943000 ldr r3, [r4]
34: e8bd4010 pop {r4, lr}
38: e1a00003 mov r0, r3
3c: e12fff1e bx lr
40: 00000000
push是stm的伪指令,pop是ldm的伪指令,所以可以使用它们。
我使用了一个静态局部,我称之为局部全局,它在。数据不在堆栈上(在本例中是bss,因为我将其设置为零)
Disassembly of section .bss:
00000000 <z.4099>:
0: 00000000
第一个加载是将该值加载到r3中。
调用约定表示r0将在进入函数时包含第一个参数(有例外,但在这种情况下是真的)。
我们从内存中得到z,r0已经有了参数x,所以我们把x添加到z并保存到内存中
编译器出于未知的性能原因无序地进行了比较,编写的add和str不修改标志,所以没问题,
从内存中读回r3(调用约定说r0-r3是易变的函数,您可以随意修改它们,不必保留它们,因此我们在r3中的z版本可能已被破坏,但r4由任何被调用方保留,因此我们将z读回r3。我们从堆栈中弹出r4和返回地址,用z准备返回寄存器r0并执行返回。
如果x等于零(18上的bne失败,我们运行1c,然后20,然后24),那么我们将z(r3版本)复制到r0中,r0是用于根据此编译器使用的调用约定从此函数返回的寄存器(arms建议)。并返回。
链接器将把z的地址填充到偏移量0x40,这是一个对象,而不是最终的二进制。。。
arm-none-eabi-ld -Ttext=0x1000 -Tbss=0x2000 so.o -o so.elf
arm-none-eabi-ld: warning: cannot find entry symbol _start; defaulting to 0000000000001000
arm-none-eabi-objdump -D so.elf
so.elf: file format elf32-littlearm
Disassembly of section .text:
00001000 <so>:
1000: e92d4010 push {r4, lr}
1004: e59f4034 ldr r4, [pc, #52] ; 1040 <so+0x40>
1008: e5943000 ldr r3, [r4]
100c: e3500000 cmp r0, #0
1010: e0803003 add r3, r0, r3
1014: e5843000 str r3, [r4]
1018: 1a000002 bne 1028 <so+0x28>
101c: e1a00003 mov r0, r3
1020: e8bd4010 pop {r4, lr}
1024: e12fff1e bx lr
1028: e2400001 sub r0, r0, #1
102c: ebfffff3 bl 1000 <so>
1030: e5943000 ldr r3, [r4]
1034: e8bd4010 pop {r4, lr}
1038: e1a00003 mov r0, r3
103c: e12fff1e bx lr
1040: 00002000
Disassembly of section .bss:
00002000 <z.4099>:
2000: 00000000
例如
如果你有参数r0是第一个,r1是第二个,直到r3(如果它们合适,让你的代码这样做,你有四个或更少的参数)
在调用另一个函数时,需要将lr推到堆栈上
如果需要修改它们,如果需要一些本地存储,可以通过相应地修改堆栈指针(或执行push/stm)来使用堆栈。您可以看到,gcc将寄存器中的内容保存到堆栈中,然后在函数期间使用寄存器,至少有几个局部变量值得使用,除此之外,它还需要在堆栈上大量使用sp relative。
当您执行递归调用时,您可以按照调用约定执行与任何其他正常函数相同的操作,如果您需要在调用之前保存r0-r3,则可以在寄存器r4或更高版本或堆栈中执行,在函数返回后还原。您可以看到,只需将要在函数调用前后保留的值放入寄存器r4或更高版本中就更容易了。
编译器本可以在分支之前对r0进行比较,这样读起来更容易。同样,可以在pop之前将返回值的mov转换为r0
我没有输入参数,所以我在这里构建的gcc似乎是armv4t,如果我要求更新一点的话
arm-none-eabi-gcc -O2 -c -mcpu=mpcore Desktop/so.c -o so.o
arm-none-eabi-objdump -D so.o
so.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <so>:
0: e92d4010 push {r4, lr}
4: e59f402c ldr r4, [pc, #44] ; 38 <so+0x38>
8: e3500000 cmp r0, #0
c: e5943000 ldr r3, [r4]
10: e0803003 add r3, r0, r3
14: e5843000 str r3, [r4]
18: 1a000001 bne 24 <so+0x24>
1c: e1a00003 mov r0, r3
20: e8bd8010 pop {r4, pc}
24: e2400001 sub r0, r0, #1
28: ebfffffe bl 0 <so>
2c: e5943000 ldr r3, [r4]
30: e1a00003 mov r0, r3
34: e8bd8010 pop {r4, pc}
38: 00000000
你可以看到报税表读起来更容易一些
虽然错过了优化,但它可以执行ldr r0、[r4]并保存一条指令。或者保持尾端不变,bne可能是beq到30(mov r0,r3;pop{r4,pc}),并共享一个出口。
so:
push {r4, lr}
@ z += x
ldr r4, zptr
ldr r3, [r4]
add r3, r0, r3
str r3, [r4]
@ if x==0 return z
cmp r0, #0
beq l30
@ so(x - 1)
sub r0, r0, #1
bl so
ldr r3, [r4]
l30:
@ return z
mov r0, r3
pop {r4, pc}
zptr: .word z
.section .bss
z: .word 0
arm-none-eabi-as so.s -o so.o
arm-none-eabi-objdump -D so.o
so.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <so>:
0: e92d4010 push {r4, lr} (stmdb)
4: e59f4024 ldr r4, [pc, #36] ; 30 <zptr>
8: e5943000 ldr r3, [r4]
c: e0803003 add r3, r0, r3
10: e5843000 str r3, [r4]
14: e3500000 cmp r0, #0
18: 0a000002 beq 28 <l30>
1c: e2400001 sub r0, r0, #1
20: ebfffff6 bl 0 <so>
24: e5943000 ldr r3, [r4]
00000028 <l30>:
28: e1a00003 mov r0, r3
2c: e8bd8010 pop {r4, pc} (ldmia)
00000030 <zptr>:
30: 00000000
Disassembly of section .bss:
00000000 <z>:
0: 00000000
编辑
让我们来看看最后一个。
push {r4,lr} which is a pseudo instruction for stmdb sp!,{r4,lr}
Lr is the r14 which is the return address look at the bl instruction
branch and link, so we branch to some address but lr (link register) is
set to the return address, the instruction after the bl. So when main or some other function calls so(4); lets assume so is at address 0x1000 so the program counter, r15, pc gets 0x1000, lr will get the value of the instruction after the caller so lets say that is 0x708. Lets also assume the stack pointer during this first call to so() from main is at 0x8000, and lets say that .bss is at 0x2000 so z lives at address 0x2000 (which also means the value at 0x1030, zptr is 0x2000.
We enter the function for the first time with r0 (x) = 4.
When you read the arm docs for stmdb sp!,{r4,lr} it decrements before (db) so sp on entry this time is 0x8000 so it decrements for the two items to 0x7FF8, the first item in the list is written there so
0x7FF8 = r4 from main
0x7FFC = 9x 0x708 return address to main
the ! means sp stays modified so sp-0x7ff8
then ldr r4,zptr r4 = 0x2000
ldr r3,[r4] this is an indirect load so what is at address r4 is read to
put in r3 so r3 = [0x2000] = 0x0000 at this point the z variable.
z+=x; add r3,r0,r3 r3 = r0 + r3 = 4 + 0 = 4
str r3,[r4] [r4] = r3, [0x2000] = r3 write 4 to 0x2000
cmp r0,#0 4 != 0
beq to 28 nope, not equal so no branch
sub r0,r0,#1 r0 = 4 - 1 = 3
bl so so this is so(3); pc = 0x1000 lr = 0x1024
so now we enter so for the second time with r0 = 3
stmdb sp!,{r4,lr}
0x7FF0 = r4 (saving from so(4) call but we dont care its value even though we know it)
0x7FF4 = lr from so(4) = 0x1024
sp=0x7FF0
ldr r4,zptr r4 = 0x2000
ldr r3,[r4] r3 = [0x2000] = 4
add r3,r0,r3 r3 = 3 + 4 = 7
str r3,[r4] write 7 to 0x2000
cmp r0,#0 3 != 0
beq 0x1028 not equal so dont branch
sub r0,r0,#1 r0 = 3-1 = 2
bl so pc=0x1000 lr=0x1024
so(2)
stmdb sp!,{r4,lr}
0x7FE8 = r4 from caller, just save it
0x7FEC = lr from caller, 0x1024
sp=0x7FE8
ldr r4,zprt r4=0x2000
ldr r3,[r4] r3 = read 7 from 0x2000
add r3,r0,r3 r3 = 2 + 7 = 9
str r3,[r4] write 9 to 0x2000
cmp r0,#0 2 != 0
beq 0x1028 not equal so dont branch
sub r0,r0,#1 r0 = 2 - 1 = 1
bl 0x1000 pc=0x1000 lr=0x1024
so(1)
stmdb sp!,{r4,lr}
0x7FE0 = save r4
0x7FE4 = lr = 0x1024
sp=0x7FE0
ldr r4,zptr r4=0x2000
ldr r3,[r4] r3 = read 9 from 0x2000
add r3,r0,r3 r3 = 1 + 9 = 10
str r3,[r4] write 10 to 0x2000
cmp r0,#0 1 != 0
beq 0x1028 not equal so dont branch
sub r0,r0,#1 r0 = 1 - 1 = 0
bl 0x1000 pc=0x1000 lr=0x1024
so(0)
stmdb sp!,{r4,lr}
0x7FD8 = r4
0x7FDC = lr = 0x1024
sp = 0x7FD8
ldr r4,zptr r4 = 0x2000
ldr r3,[r4] r3 = read 10 from 0x2000
add r3,r0,r3 r3 = 0 + 10 = 10
str r0,[r4] write 10 to 0x2000
cmp r0,#0 0 = 0 so it matches
beq 0x1028 it is equal so we finally take this branch
mov r0,r3 r0 = 10
ldmia sp!,{r4,pc}
increment after
r4 = [sp+0] = [0x7FD8] restore r4 from caller
pc = [sp+4] = [0x7FDC] = 0x1024
sp += 8 = 0x7FE0
(branch to 0x1024)(return from so(0) to so(1))
ldr r3,[r4] read 10 from 0x2000
mov r0,r3 r0 = 10
ldmia sp!,{r4,pc}
r4 = [sp+0] = [0x7FE0] restore r4 from caller
pc = [sp+4] = [0x7FE4] = 0x1024
sp += 8 = 0x7FE8
(branch to 0x1024)(return from so(1) to so(2))
ldr r3,[r4] read 10 from 0x2000
mov r0,r3 r0 = 10
ldmia sp!,{r4,pc}
r4 = [sp+0] = [0x7FE8] restore r4 from caller
pc = [sp+4] = [0x7FEC] = 0x1024
sp += 8 = 0x7FF0
(branch to 0x1024)(return from so(2) to so(3))
ldr r3,[r4] read 10 from 0x2000
mov r0,r3 r0 = 10
ldmia sp!,{r4,pc}
r4 = [sp+0] = [0x7FF0] restore r4 from caller
pc = [sp+4] = [0x7FF4] = 0x1024
sp += 8 = 0x7FF8
(branch to 0x1024)(return from so(3) to so(4))
ldr r3,[r4] read 10 from 0x2000
mov r0,r3 r0 = 10
ldmia sp!,{r4,pc}
r4 = [sp+0] = [0x7FF8] restore r4 from caller (main()'s r4)
pc = [sp+4] = [0x7FFC] = 0x708
sp += 8 = 0x8000
(branch to 0x708)(return from so(4) to main())
and we are done.
因此,堆栈是函数的临时存储,在cup上写入一个数据项,然后将其推入保持器(从调用方保存r4)写入另一个项并将其推入保持器(lr,调用方的返回地址)。我们在这里每个函数只使用了两个项目,所以每个函数我可以把两个杯子推到支架上,每次调用函数我都会得到两个新的、唯一的存储位置来存储本地信息。当我退出函数时,我将两个杯子从支架中拉出,并使用它们的值(并丢弃它们)。这在某种程度上是递归的关键,堆栈为每个调用提供了新的本地存储,与之前对同一个函数的调用不同,如果没有其他需要返回地址的话(尽管确实制作了一些更简单的递归示例,但在优化时,它不够聪明,基本上不需要循环)。
ldr rd,[rn]想象他在说该地址处的项目,因此读取rn中地址处的内存,并将该值保存在rd中。
str rd,[rn]一个混乱的arm指令,其余的第一个参数是等号的左侧(加r1,r2,r3 r1=r2+r3,ldr r1,[r4]r1=r4])。这一个是向后的[rn]=rd将rd中的值存储到地址r4描述的内存位置,一级间接寻址。
stmdb sp!,意味着在执行任何操作之前,将堆栈指针减量为列表中寄存器数量的4倍字节,然后将第一个编号最低的寄存器写入[sp+0],然后在[sp+4]旁边,依此类推,最后一个寄存器将比sp的起始值小4个字节!表示函数在sp为递减值时结束。您可以将ldm/stm用于堆栈推送和POP之外的事情。就像memcpy,但那是另一个故事。。。
所有这些都在信息中心的arm文档中。臂com(arm架构参考手册,如果您没有阅读过armv5,则首选armv5)。