
Ruby file encoding

pablorc · 16 years ago

    I receive a URL-encoded string such as "sometext%C3%B3+more+%26+andmore", which I unescape, process, and then save using the windows-1252 encoding.

    The conversion looks like this:

    irb(main) >> value
    => "sometext%C3%B3+more+%26+andmore"
    irb(main) >> CGI::unescape(value)
    => "sometext\303\263 more & andmore"
    irb(main) >> # some code, then saved to a file using open(filename, "w:WINDOWS-1252")
    irb(main) >> # result in the file:
    => sometextĂ³ more & andmore
    

    The result should be: sometextó more & andmore
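Looking at the bytes shows where the mismatch comes from. This is a minimal sketch that assumes Ruby 2.0 or later and uses URI.decode_www_form_component from the standard uri library in place of CGI::unescape (both turn + into a space and decode %XX escapes; the URI method also tags its result as UTF-8). It shows that ó arrives as two UTF-8 bytes but is a single byte in windows-1252; if the two bytes reach the file unconverted, a program reading it with a single-byte code page shows two characters instead of ó.

```ruby
require 'uri'

value = "sometext%C3%B3+more+%26+andmore"

# Decode the form-encoded value: '+' becomes a space, %XX becomes a raw byte.
decoded = URI.decode_www_form_component(value)

# "sometext" is 8 bytes long, so the next character starts at byte index 8.
# In the UTF-8 string, "ó" takes two bytes ...
utf8_bytes = decoded.bytes[8, 2]

# ... but after conversion to windows-1252 it is a single byte.
cp1252_bytes = decoded.encode('windows-1252').bytes[8, 1]

puts decoded.encoding                          # UTF-8
p utf8_bytes.map { |b| format('%02X', b) }    # ["C3", "B3"]
p cp1252_bytes.map { |b| format('%02X', b) }  # ["F3"]
```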

    Answer by Mladen Jablanović · 16 years ago

    Encoding support was added in Ruby 1.9, so the following was run on Ruby 1.9.1:

    require 'cgi'
    #=> true
    s = "sometext%C3%B3+more+%26+andmore"
    #=> "sometext%C3%B3+more+%26+andmore"
    t = CGI::unescape s
    #=> "sometext\xC3\xB3 more & andmore"
    t.force_encoding 'utf-8' # telling Ruby that the string is UTF-8 encoded
    #=> "sometextó more & andmore"
    t.encode! 'windows-1252' # changing encoding to windows-1252
    #=> "sometext? more & andmore" (ó is now the single byte 0xF3; irb may display it as ? or \xF3)
    # here you do whatever you want to do with windows-1252 encoded string
    
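For the file-saving step from the question, a minimal sketch follows. It assumes Ruby 2.0 or later, uses URI.decode_www_form_component as a stand-in for CGI::unescape, and writes to hypothetical file names in a temporary directory. Either open the file with an external encoding such as "w:windows-1252", so Ruby converts the UTF-8 string as it writes, or convert explicitly with encode (as the answer does with encode!) and write the bytes in binary mode. Both paths should yield the single byte 0xF3 for ó. The automatic conversion only gives the right result if the string is tagged with its real encoding (UTF-8 here), which is what force_encoding in the answer ensures.

```ruby
require 'uri'
require 'tmpdir'
require 'fileutils'

text = URI.decode_www_form_component("sometext%C3%B3+more+%26+andmore") # UTF-8

dir = Dir.mktmpdir
begin
  # Option 1: the "w:windows-1252" mode sets the file's external encoding,
  # so Ruby transcodes the UTF-8 string to windows-1252 on write.
  path1 = File.join(dir, 'out.txt') # hypothetical output file name
  File.open(path1, 'w:windows-1252') { |f| f.write(text) }
  bytes1 = File.binread(path1).bytes

  # Option 2: convert explicitly, then write the resulting bytes unchanged.
  path2 = File.join(dir, 'out_explicit.txt') # hypothetical output file name
  File.binwrite(path2, text.encode('windows-1252'))
  bytes2 = File.binread(path2).bytes
ensure
  FileUtils.remove_entry(dir) # clean up the temporary directory
end

p bytes1[8]        # 243, i.e. 0xF3, "ó" in windows-1252
p bytes1 == bytes2 # true
```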


    PS. Ruby 1.8.7 has no built-in encoding support, so you have to do the conversion with a separate library such as iconv:

    require 'iconv'
    #=> true
    require 'cgi'
    #=> true
    s = "sometext%C3%B3+more+%26+andmore"
    #=> "sometext%C3%B3+more+%26+andmore"
    t = CGI::unescape s
    #=> "sometext\303\263 more & andmore"
    Iconv.conv 'windows-1252', 'utf-8', t
    #=> "sometext\363 more & andmore"
    # \363 is ó in windows-1252 encoding
    
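Iconv was deprecated in Ruby 1.9.3 and removed from the standard library in Ruby 2.0 (it survives as the separate iconv gem), so on current Ruby the same conversion is usually written with String#encode. The sketch below assumes Ruby 2.0 or later and again uses URI.decode_www_form_component in place of CGI::unescape. One behavior worth knowing: by default encode raises Encoding::UndefinedConversionError for characters windows-1252 cannot represent, while undef: :replace substitutes a replacement string instead. The Polish word is only an illustrative input.

```ruby
require 'uri'

t = URI.decode_www_form_component("sometext%C3%B3+more+%26+andmore") # UTF-8

# Counterpart of Iconv.conv('windows-1252', 'utf-8', t): the source encoding
# comes from the string's own tag (UTF-8), the target is the argument.
converted = t.encode('windows-1252')
p converted.bytes[8] # 243, i.e. 0xF3 ("\363" in octal, as in the iconv output)

# "ł" (U+0142) and "ź" (U+017A) have no windows-1252 code point.
word = "łódź" # illustrative input

# Default behavior: raise instead of silently losing characters.
raised = false
begin
  word.encode('windows-1252')
rescue Encoding::UndefinedConversionError => e
  raised = true
  puts "not representable: #{e.error_char}"
end

# Lenient behavior: replace unmappable characters with "?".
lenient = word.encode('windows-1252', undef: :replace, replace: '?')
p lenient.bytes # [63, 243, 100, 63]
```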