
Accessing struct members and arrays of structs from LLVM IR

jit llvm c++


Paul J. Lucas · 10 years ago

If I have a C++ program that declares a struct, for example:

struct S {
    short s;
    union U {
        bool b;
        void *v;
    };
    U u;
};

and I generate some LLVM IR via the LLVM C++ API to mirror the C++ declaration:

vector<Type*> members;
members.push_back( IntegerType::get( ctx, sizeof( short ) * 8 ) );
// since LLVM doesn't support unions, just use an ArrayType that's the same size
members.push_back( ArrayType::get( IntegerType::get( ctx, 8 ), sizeof( S::U ) ) );

StructType *const llvm_S = StructType::create( ctx, "S" );
llvm_S->setBody( members );

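How do I ensure that sizeof(S) in the C++ code is the same as the size of the StructType in the LLVM IR code? The same goes for the offsets of individual members, e.g., u.b.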

It is also the case that I have an array of S allocated in C++:

S *s_array = new S[10];

I pass s_array to the LLVM IR code, where I access individual elements of the array. For this to work, sizeof(S) must be the same in both C++ and LLVM IR, so that this:

%elt = getelementptr %S* %ptr_to_start, i64 1

will access s_array[1] correctly.
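To make the dependency on the element size concrete: a single-index getelementptr on a pointer to %S advances the address by a multiple of the size LLVM has computed for %S, much as C++ pointer arithmetic advances by sizeof(S). The following is a minimal plain C++ sketch of that arithmetic, not LLVM code; the helper names element_address and lands_on_element are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>

struct S {
    short s;
    union U {
        bool b;
        void *v;
    };
    U u;
};

// Hypothetical helper: the byte address that a one-index GEP would compute
// for element `index` when the IR element type is `elem_size` bytes large.
inline std::uintptr_t element_address(std::uintptr_t base,
                                      std::size_t elem_size,
                                      std::size_t index) {
    return base + elem_size * index;
}

// True only if stepping by `ir_elem_size` lands exactly on s_array[index].
inline bool lands_on_element(const S *s_array, std::size_t ir_elem_size,
                             std::size_t index) {
    const auto base = reinterpret_cast<std::uintptr_t>(s_array);
    const auto want = reinterpret_cast<std::uintptr_t>(s_array + index);
    return element_address(base, ir_elem_size, index) == want;
}
```

Calling lands_on_element(s_array, sizeof(S), 1) succeeds, while any element size that differs from sizeof(S) points into the middle of an element for every index other than 0.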

When I compile and run the program below, it prints:

sizeof(S) = 16
allocSize(S) = 10

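The problem is that LLVM does not insert the 6 bytes of padding between S::s and S::u. The C++ compiler makes the union start on an 8-byte-aligned boundary, whereas LLVM does not.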
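The C++ side of this can be checked directly with offsetof and alignof. The sketch below relies only on the declaration of S from the question; padding_before_u is a made-up helper name.

```cpp
#include <cstddef>

struct S {
    short s;
    union U {
        bool b;
        void *v;
    };
    U u;
};

// Bytes the C++ compiler leaves unused between the end of S::s and the
// start of S::u.
constexpr std::size_t padding_before_u() {
    return offsetof(S, u) - (offsetof(S, s) + sizeof(short));
}

// The union takes the alignment of its most strictly aligned member (the
// void*), and it must start on a boundary that satisfies that alignment.
static_assert(alignof(S::U) == alignof(void *),
              "union alignment follows the pointer member");
static_assert(offsetof(S, u) % alignof(S::U) == 0,
              "the union starts on an aligned boundary");
```

On a target with 8-byte pointers, padding_before_u() evaluates to 6, which is exactly the difference between the two sizes printed above (16 - 10).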

I have been experimenting with DataLayout. On my machine [Mac OS X 10.9.5, g++ Apple LLVM version 6.0 (clang-600.0.57) (based on LLVM 3.5svn)], if I print the data layout string, I get:

e-m:o-i64:64-f80:128-n8:16:32:64-S128

If I force the data layout to be:

e-m:o-i64:64-f80:128-n8:16:32:64-S128-a:64

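where the addition is a:64, which means that objects of aggregate type are aligned on a 64-bit boundary, then I get the same size. So why is the default data layout incorrect?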
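For readers unfamiliar with the format: a data layout string is a list of specifications separated by '-'. In the string above, e means little-endian, m:o selects Mach-O name mangling, i64:64 gives i64 a 64-bit ABI alignment, f80:128 gives the 80-bit float a 128-bit alignment, n8:16:32:64 lists the native integer widths, S128 is the natural stack alignment, and the added a:64 is the aggregate alignment. The sketch below only splits such a string into its specifications; split_layout_specs is a made-up plain string helper, not LLVM's own parser.

```cpp
#include <string>
#include <vector>

// Split a data layout string such as "e-m:o-i64:64" into its
// '-'-separated specifications, skipping empty pieces.
std::vector<std::string> split_layout_specs(const std::string &layout) {
    std::vector<std::string> specs;
    std::string::size_type start = 0;
    while (start <= layout.size()) {
        const std::string::size_type dash = layout.find('-', start);
        const std::string::size_type end =
            dash == std::string::npos ? layout.size() : dash;
        if (end > start)
            specs.push_back(layout.substr(start, end - start));
        if (dash == std::string::npos)
            break;
        start = dash + 1;
    }
    return specs;
}
```

Splitting the two strings from the question shows that they differ only in the trailing a:64 specification.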

The complete working program follows:

// LLVM
#include <llvm/ExecutionEngine/ExecutionEngine.h>
#include <llvm/ExecutionEngine/MCJIT.h>
#include <llvm/IR/DerivedTypes.h>
#include <llvm/IR/LLVMContext.h>
#include <llvm/IR/Module.h>
#include <llvm/IR/Type.h>
#include <llvm/Support/TargetSelect.h>

// standard
#include <cstdlib>
#include <iostream>
#include <memory>
#include <string>
#include <vector>

using namespace std;
using namespace llvm;

struct S {
    short s;
    union U {
        bool b;
        void *v;
    };
    U u;
};

ExecutionEngine* createEngine( Module *module ) {
    InitializeNativeTarget();
    InitializeNativeTargetAsmPrinter();

    unique_ptr<Module> u( module );
    EngineBuilder eb( move( u ) );
    string errStr;
    eb.setErrorStr( &errStr );
    eb.setEngineKind( EngineKind::JIT );
    ExecutionEngine *const exec = eb.create();
    if ( !exec ) {
        cerr << "Could not create ExecutionEngine: " << errStr << endl;
        exit( 1 );
    }
    return exec;
}

int main() {
    LLVMContext ctx;

    vector<Type*> members;
    members.push_back( IntegerType::get( ctx, sizeof( short ) * 8 ) );
    members.push_back( ArrayType::get( IntegerType::get( ctx, 8 ), sizeof( S::U ) ) );

    StructType *const llvm_S = StructType::create( ctx, "S" );
    llvm_S->setBody( members );

    Module *const module = new Module( "size_test", ctx );
    ExecutionEngine *const exec = createEngine( module );
    DataLayout const *const layout = exec->getDataLayout();
    module->setDataLayout( layout );

    cout << "sizeof(S) = " << sizeof( S ) << endl;
    cout << "allocSize(S) = " << layout->getTypeAllocSize( llvm_S ) << endl;

    delete exec;
    return 0;
}

2 answers

Paul J. Lucas · 10 years ago

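Since my original answer is a correct answer to the "pre-edit" question, I am writing a completely new answer for the new question (and my guess that the structs were not actually the same turned out to be right).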

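The problem is not DataLayout as such [although you will need the DataLayout to solve it, so the code has to be changed to create the module before you start building the LLVM IR], but the fact that you are placing a union, which has its own alignment requirement, inside a struct whose other member has a weaker alignment requirement: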

struct S {
    short s;        // Alignment = 2 
    union U {    
        bool b;     // Alignment = 1
        void *v;    // Alignment = 4 or 8
    };
    U u;            // Alignment = 4 or 8
};

Now, in your LLVM code generation:

members.push_back( IntegerType::get( ctx, sizeof( short ) * 8 ) );
members.push_back( ArrayType::get( IntegerType::get( ctx, 8 ), sizeof( S::U ) ) );

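The second element of your struct is effectively char dummy[sizeof(S::U)], which has an alignment requirement of 1. So of course LLVM lays out the struct differently from the C++ compiler, which is working with a stricter alignment requirement for that member.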
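The effect of the alignment on the total size can be reproduced with a small model of non-packed struct layout: each member is placed at the next offset that satisfies its alignment, and the total is rounded up to the largest alignment. This is a simplified plain C++ sketch with made-up names (Field, align_up, struct_alloc_size), not the LLVM API, and the sizes below assume a target with 8-byte pointers.

```cpp
#include <cstddef>
#include <vector>

// Size and ABI alignment of one struct member, in bytes.
struct Field {
    std::size_t size;
    std::size_t align;
};

// Round `offset` up to the next multiple of `align` (align must be nonzero).
inline std::size_t align_up(std::size_t offset, std::size_t align) {
    return (offset + align - 1) / align * align;
}

// Model of a non-packed struct: place each field at the next suitably
// aligned offset, then round the total up to the largest alignment seen.
inline std::size_t struct_alloc_size(const std::vector<Field> &fields) {
    std::size_t offset = 0;
    std::size_t max_align = 1;
    for (const Field &f : fields) {
        offset = align_up(offset, f.align) + f.size;
        if (f.align > max_align)
            max_align = f.align;
    }
    return align_up(offset, max_align);
}
```

With the fields {2, 2} and {8, 1} (an i16 followed by an [8 x i8]) the model gives 10; replacing the second field with {8, 8} (a pointer) gives 16, matching the two numbers printed in the question.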

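In this particular case, using an i8 * (aka void *) instead of the array of i8 does the trick [with, obviously, a bitcast to the other type where needed, for example when accessing the value of b].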

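To fix this in a completely generic way, you need to produce a struct consisting of the element with the largest alignment requirement in the union, and then pad it with enough char elements to make up the size of the largest member.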
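The selection and padding rule just described can be sketched on plain (size, alignment) pairs. This is an illustrative model with made-up names (Member, UnionPlan, plan_union) that works on numbers rather than LLVM types, and it assumes the member list is not empty.

```cpp
#include <cstddef>
#include <vector>

// Size and alignment of one union member, in bytes.
struct Member {
    std::size_t size;
    std::size_t align;
};

struct UnionPlan {
    std::size_t representative; // index of the most strictly aligned member
    std::size_t pad_bytes;      // char elements appended after that member
    std::size_t alloc_size;     // resulting size, rounded up to the alignment
};

// Pick the member with the largest alignment and pad it up to the size of
// the largest member. `members` must not be empty.
inline UnionPlan plan_union(const std::vector<Member> &members) {
    std::size_t max_size = 0;
    std::size_t rep = 0;
    for (std::size_t i = 0; i < members.size(); ++i) {
        if (members[i].size > max_size)
            max_size = members[i].size;
        if (members[i].align > members[rep].align)
            rep = i;
    }
    const std::size_t rep_size = members[rep].size;
    const std::size_t pad = max_size > rep_size ? max_size - rep_size : 0;
    const std::size_t align = members[rep].align;
    const std::size_t total = (rep_size + pad + align - 1) / align * align;
    return UnionPlan{rep, pad, total};
}
```

For the union in the question, an 8-byte pointer and a 1-byte bool, the plan picks the pointer and needs no padding. For an 8-byte member with alignment 8 next to a 12-byte member with alignment 1, it picks the first, adds 4 pad bytes, and the size rounds up to 16.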

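I am going to get something to eat now, but I will come back with some code that solves it properly; it is a bit more complex than I originally thought.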

Here is the main posted above, modified to use a pointer instead of the char array:

int main() {
    LLVMContext ctx;

    vector<Type*> members;
    members.push_back( IntegerType::get( ctx, sizeof( short ) * 8 ) );
    members.push_back( PointerType::getUnqual( IntegerType::get( ctx, 8 ) ) );

    StructType *const llvm_S = StructType::create( ctx, "S" );
    llvm_S->setBody( members );

    Module *const module = new Module( "size_test", ctx );
    ExecutionEngine *const exec = createEngine( module );
    DataLayout const *const layout = exec->getDataLayout();
    module->setDataLayout( *layout );

    cout << "sizeof(S) = " << sizeof( S ) << endl;
    cout << "allocSize(S) = " << layout->getTypeAllocSize( llvm_S ) << endl;

    delete exec;
    return 0;
}

There are also some tiny changes to account for the fact that setDataLayout changed between your version of LLVM and the one I am using.

And finally, a generic version that allows any types to be used:

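// Builds a struct that stands in for a union of the types in `um`: the
// member with the largest alignment, followed by enough i8 padding to reach
// the size of the largest member. `um` must not be empty.
Type* MakeUnionType( Module* module, LLVMContext& ctx, vector<Type*> um )
{
    const DataLayout dl( module );
    size_t maxSize = 0;
    size_t maxAlign = 0;
    Type*  maxAlignTy = nullptr;

    // Find the largest size and the most strictly aligned member.
    for( auto m : um )
    {
        size_t sz = dl.getTypeAllocSize( m );
        size_t al = dl.getPrefTypeAlignment( m );
        if( sz > maxSize )
            maxSize = sz;
        if( al > maxAlign )
        {
            maxAlign = al;
            maxAlignTy = m;
        }
    }

    // Start with the most aligned member, then pad up to the largest size.
    vector<Type*> sv = { maxAlignTy };
    size_t mas = dl.getTypeAllocSize( maxAlignTy );
    if( mas < maxSize )
    {
        size_t n = maxSize - mas;
        sv.push_back( ArrayType::get( IntegerType::get( ctx, 8 ), n ) );
    }
    StructType* u = StructType::create( ctx, "U" );
    u->setBody( sv );
    return u;
}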

int main() {
    LLVMContext ctx;

    Module *const module = new Module( "size_test", ctx );
    ExecutionEngine *const exec = createEngine( module );
    DataLayout const *const layout = exec->getDataLayout();
    module->setDataLayout( *layout );

    vector<Type*> members;
    members.push_back( IntegerType::get( ctx, sizeof( short ) * 8 ) );
    vector<Type*> unionMembers = { PointerType::getUnqual( IntegerType::get( ctx, 8 ) ), 
                   IntegerType::get( ctx, 1 )  };
    members.push_back( MakeUnionType( module, ctx, unionMembers ) );

    StructType *const llvm_S = StructType::create( ctx, "S" );
    llvm_S->setBody( members );

    cout << "sizeof(S) = " << sizeof( S ) << endl;
    cout << "allocSize(S) = " << layout->getTypeAllocSize( llvm_S ) << endl;

    delete exec;
    return 0;
}

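Note that in both cases you need a bitcast operation to convert the address of b. In the second case you also need a bitcast to convert the struct into a void *, but assuming you actually want generic union support, that is how you would have to do it anyway.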
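In C++ terms, the bitcast amounts to treating the address of the pointer-sized slot as the address of a bool. The following is a hedged sketch of that idea: Mirror is a made-up stand-in for an IR struct whose second member is a pointer, read_b and to_mirror are hypothetical helpers, and memcpy is used in place of a pointer cast so that the C++ stays well-defined.

```cpp
#include <cstddef>
#include <cstring>

struct S {
    short s;
    union U {
        bool b;
        void *v;
    };
    U u;
};

// Hypothetical stand-in for the IR struct type { i16, i8* }.
struct Mirror {
    short s;
    void *u_storage;
};

static_assert(sizeof(Mirror) == sizeof(S), "mirror must have the same size");
static_assert(offsetof(Mirror, u_storage) == offsetof(S, u),
              "the union slot must sit at the same offset");

// Copy the raw bytes of an S into the mirror layout.
inline Mirror to_mirror(const S &s) {
    Mirror m;
    std::memcpy(&m, &s, sizeof m);
    return m;
}

// Read S::u.b through the mirror: reinterpret the first byte(s) of the
// pointer-sized slot as a bool, which is what the bitcast achieves in IR.
inline bool read_b(const Mirror &m) {
    bool b;
    std::memcpy(&b, &m.u_storage, sizeof b);
    return b;
}
```

The two static_asserts are the C++ counterpart of the size and offset agreement the question asks for; if either fails on a given target, the mirror type does not match S there.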

Code that generates a union type can be found here; it is for my Pascal compiler's variant [which is Pascal's way of making a union]:

https://github.com/Leporacanthicus/lacsap/blob/master/types.cpp#L525 and the code generation, including the bitcasts: https://github.com/Leporacanthicus/lacsap/blob/master/expr.cpp#L520

Mats Petersson · 10 years ago

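The main purpose of DataLayout is to know the alignment of elements. If you do not need to know the size, alignment, or offsets of elements in your code [and LLVM does not really have a useful way, beyond the GEP instruction, to find the offset, so you can pretty much ignore the offset part], you will not need the data layout until you come to execute (or generate an object file from) the IR.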

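(I did run into some very interesting bugs from trying to compile 32-bit code with a 64-bit "native" data layout when I implemented the -m32 switch for my compiler. Switching the DataLayout in the middle of compilation is not a good idea; I did so because I used the "default" layout and then set a different one when it came to creating the actual object file.)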