
Converting ISO-8859-1 to UTF-8 in C using libiconv

  •  0
  • Guillaume  ·  7 years ago

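    I first used the answer to "Is there a way to convert from UTF8 to iso-8859-1?" to perform the conversion, and it worked.

    I then followed the example provided here: https://www.lemoda.net/c/iconv-example/iconv-example.html

    I wrote a function like this: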

    char *iconvISO2UTF8(char *iso) {
        iconv_t iconvDesc = iconv_open ("ISO-8859-1", "UTF-8//TRANSLIT//IGNORE");
    
        if (iconvDesc == (iconv_t) - 1) {
            /* Something went wrong.  */
            if (errno == EINVAL)
                fprintf(stderr, "conversion from '%s' to '%s' not available", "ISO-8859-1", "UTF-8");
            else
                fprintf(stderr, "LibIcon initialization failure");          
    
            return NULL;
        }
    
        size_t iconv_value;
        char * utf8;
        size_t len;
        size_t utf8len; 
        char * utf8start;
    
        int len_start;
    
    
        len = strlen (iso);
        if (! len) {        
            fprintf(stderr, "iconvISO2UTF8: input String is empty.");           
            return NULL;
        }
    
        /* Assign enough space to put the UTF-8. */
        utf8len = 2 * len;
        utf8 = calloc (utf8len, sizeof (char));
        if (! utf8) {
            fprintf(stderr, "iconvISO2UTF8: Calloc failed.");           
            return NULL;
        }
        /* Keep track of the variables. */
        utf8start = utf8;
        len_start = len;
    
        iconv_value = iconv (iconvDesc, & iso, & len, & utf8, & utf8len);
        /* Handle failures. */
        if (iconv_value == (size_t) - 1) {      
            switch (errno) {
                    /* See "man 3 iconv" for an explanation. */
                case EILSEQ:
                    fprintf(stderr, "iconv failed: Invalid multibyte sequence, in string '%s', length %d, out string '%s', length %d\n", iso, (int) len, utf8start, (int) utf8len);             
                    break;
                case EINVAL:
                    fprintf(stderr, "iconv failed: Incomplete multibyte sequence, in string '%s', length %d, out string '%s', length %d\n", iso, (int) len, utf8start, (int) utf8len);              
                    break;
                case E2BIG:
                    fprintf(stderr, "iconv failed: No more room, in string '%s', length %d, out string '%s', length %d\n", iso, (int)  len, utf8start, (int) utf8len);                              
                    break;
                default:
                    fprintf(stderr, "iconv failed, in string '%s', length %d, out string '%s', length %d\n", iso, (int) len, utf8start, (int) utf8len);                             
            }
            return NULL;
        }
    
    
        if(iconv_close (iconvDesc) != 0) {
            fprintf(stderr, "libicon close failed: %s", strerror (errno));          
        }
    
        return utf8start;
    
    }
    

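    When I call this function with plain ASCII characters such as "abracadabra", iconv works. With non-ASCII characters, however, it fails:

    iconv failed: Invalid multibyte sequence, in string '¨¤', length

    Here is a sample main program that crashes when it is stored in a source file encoded in ISO8859-1 and compiled on a Linux system with ISO8859-1 as the default charset: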

    int main(int argc, char **argv) {
        char *iso1 = "abracadabra";
        char *utf = iconvISO2UTF8(iso1);
        puts(utf);
        free(utf);
    
        char *iso2 = "éàèüöä";
        utf = iconvISO2UTF8(iso2);
        puts(utf);
        free(utf);
    }
    

    Is it possible to perform this kind of conversion with iconv? If so, what is wrong with this code?

    1 answer
  •  5
  •   Antti Haapala -- Слава Україні    7 years ago

    Look at the iconv_open(3) manual page:

    iconv_t iconv_open(const char *tocode, const char *fromcode);
    

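    If you want to convert from ISO-8859-1 to UTF-8, then this call does not match that signature, because the target encoding must come first:

    iconv_t iconvDesc = iconv_open ("ISO-8859-1", "UTF-8//TRANSLIT//IGNORE");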
    

    It should read:

    iconv_t iconvDesc = iconv_open ("UTF-8//TRANSLIT//IGNORE", "ISO-8859-1");
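
    For reference, here is a minimal sketch of the whole conversion with the arguments in the right order. The function name iso8859_1_to_utf8 is hypothetical and not taken from the question. The sketch also assumes two changes the answer above does not discuss: it reserves one extra byte so the result is always NUL-terminated, and it closes the descriptor and frees the buffer on every error path.

```c
#include <errno.h>
#include <iconv.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (name is an assumption, not from the question).
 * Converts a NUL-terminated ISO-8859-1 string into a newly allocated,
 * NUL-terminated UTF-8 string. Returns NULL on failure.
 * The caller must free() the result. */
char *iso8859_1_to_utf8(const char *iso)
{
    /* iconv_open takes the target encoding first, the source second. */
    iconv_t cd = iconv_open("UTF-8", "ISO-8859-1");
    if (cd == (iconv_t)-1) {
        perror("iconv_open");
        return NULL;
    }

    size_t in_left = strlen(iso);
    /* Each ISO-8859-1 byte becomes at most 2 UTF-8 bytes; +1 for the NUL. */
    size_t out_size = 2 * in_left + 1;
    char *out = calloc(out_size, 1);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    /* iconv advances the pointers it is given, so pass working copies
     * and keep `out` pointing at the start of the buffer. */
    char *in_ptr = (char *)iso;
    char *out_ptr = out;
    size_t out_left = out_size - 1;

    if (iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left) == (size_t)-1) {
        perror("iconv");
        free(out);
        iconv_close(cd);
        return NULL;
    }

    *out_ptr = '\0';
    iconv_close(cd);
    return out;
}
```

    The //TRANSLIT//IGNORE suffixes are left out here because every ISO-8859-1 character has a UTF-8 representation, so no transliteration should be needed in this direction. Callers should still check for NULL before passing the result to puts.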
    