Hpricot encoding problem

  •  3
  • Sam  · Tech Community  · 15 years ago

    When I try to scrape a web page with Hpricot on Ruby 1.9, I get the following encoding error:

    Encoding::CompatibilityError: incompatible character encodings: ASCII-8BIT and UTF-8
    

    I can reproduce the error with the following steps:

    ska:~ sam$ rvm 1.9.2@hpricot
    ska:~ sam$ ruby -v
    ruby 1.9.2dev (2010-05-31 revision 28117) [x86_64-darwin10.4.0]
    ska:~ sam$ gem list
    
    *** LOCAL GEMS ***
    
    hpricot (0.8.2)
    rake (0.8.7)
    rdoc (2.5.8)
    ska:~ sam$ irb
    ruby-1.9.2-preview3 > require 'rubygems'
     => false 
    ruby-1.9.2-preview3 > require 'hpricot'
     => true 
    ruby-1.9.2-preview3 > require 'open-uri'
     => true 
    
    ruby-1.9.2-preview3 > page = Hpricot(open('http://www.imdb.com/title/tt0435761/'))
     => #<Hpricot::Doc "\n" {doctype "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\" \"http://www.w3.org/TR/html4/loose.dtd\">"} "\n" {elem <html xmlns:og="http://opengraphprotocol.org/schema/" xmlns:fb="http://www.facebook.com/2008/fbml"> "\n" {elem <head> "\n" __TRUNCATED__
    
    
    ruby-1.9.2-preview3 > page.search("//div[@class = 'info-content").collect { |f| f.inner_text }.join(', ')
    
    Encoding::CompatibilityError: incompatible character encodings: ASCII-8BIT and UTF-8
            from /Users/sam/.rvm/gems/ruby-1.9.2-preview3@hpricot/gems/hpricot-0.8.2/lib/hpricot/traverse.rb:160:in `join'
            from /Users/sam/.rvm/gems/ruby-1.9.2-preview3@hpricot/gems/hpricot-0.8.2/lib/hpricot/traverse.rb:160:in `inner_text'
            from /Users/sam/.rvm/gems/ruby-1.9.2-preview3@hpricot/gems/hpricot-0.8.2/lib/hpricot/traverse.rb:160:in `block in inner_text'
            from /Users/sam/.rvm/gems/ruby-1.9.2-preview3@hpricot/gems/hpricot-0.8.2/lib/hpricot/traverse.rb:160:in `map'
            from /Users/sam/.rvm/gems/ruby-1.9.2-preview3@hpricot/gems/hpricot-0.8.2/lib/hpricot/traverse.rb:160:in `inner_text'
            from /Users/sam/.rvm/gems/ruby-1.9.2-preview3@hpricot/gems/hpricot-0.8.2/lib/hpricot/traverse.rb:160:in `block in inner_text'
            from /Users/sam/.rvm/gems/ruby-1.9.2-preview3@hpricot/gems/hpricot-0.8.2/lib/hpricot/traverse.rb:160:in `map'
            from /Users/sam/.rvm/gems/ruby-1.9.2-preview3@hpricot/gems/hpricot-0.8.2/lib/hpricot/traverse.rb:160:in `inner_text'
            from /Users/sam/.rvm/gems/ruby-1.9.2-preview3@hpricot/gems/hpricot-0.8.2/lib/hpricot/traverse.rb:160:in `block in inner_text'
            from /Users/sam/.rvm/gems/ruby-1.9.2-preview3@hpricot/gems/hpricot-0.8.2/lib/hpricot/traverse.rb:160:in `map'
            from /Users/sam/.rvm/gems/ruby-1.9.2-preview3@hpricot/gems/hpricot-0.8.2/lib/hpricot/traverse.rb:160:in `inner_text'
            from /Users/sam/.rvm/gems/ruby-1.9.2-preview3@hpricot/gems/hpricot-0.8.2/lib/hpricot/traverse.rb:160:in `block in inner_text'
            from /Users/sam/.rvm/gems/ruby-1.9.2-preview3@hpricot/gems/hpricot-0.8.2/lib/hpricot/traverse.rb:160:in `map'
            from /Users/sam/.rvm/gems/ruby-1.9.2-preview3@hpricot/gems/hpricot-0.8.2/lib/hpricot/traverse.rb:160:in `inner_text'
            from /Users/sam/.rvm/gems/ruby-1.9.2-preview3@hpricot/gems/hpricot-0.8.2/lib/hpricot/traverse.rb:160:in `block in inner_text'
            from /Users/sam/.rvm/gems/ruby-1.9.2-preview3@hpricot/gems/hpricot-0.8.2/lib/hpricot/traverse.rb:160:in `map'
            from /Users/sam/.rvm/gems/ruby-1.9.2-preview3@hpricot/gems/hpricot-0.8.2/lib/hpricot/traverse.rb:160:in `inner_text'
            from /Users/sam/.rvm/gems/ruby-1.9.2-preview3@hpricot/gems/hpricot-0.8.2/lib/hpricot/traverse.rb:160:in `block in inner_text'
            from /Users/sam/.rvm/gems/ruby-1.9.2-preview3@hpricot/gems/hpricot-0.8.2/lib/hpricot/traverse.rb:160:in `map'
            from /Users/sam/.rvm/gems/ruby-1.9.2-preview3@hpricot/gems/hpricot-0.8.2/lib/hpricot/traverse.rb:160:in `inner_text'
            from /Users/sam/.rvm/gems/ruby-1.9.2-preview3@hpricot/gems/hpricot-0.8.2/lib/hpricot/traverse.rb:160:in `block in inner_text'
            from /Users/sam/.rvm/gems/ruby-1.9.2-preview3@hpricot/gems/hpricot-0.8.2/lib/hpricot/traverse.rb:160:in `map'
            from /Users/sam/.rvm/gems/ruby-1.9.2-preview3@hpricot/gems/hpricot-0.8.2/lib/hpricot/traverse.rb:160:in `inner_text'
            from (irb):5:in `block in irb_binding'
            from (irb):5:in `collect'
            from (irb):5
            from /Users/sam/.rvm/rubies/ruby-1.9.2-preview3/bin/irb:17:in `<main>'
    ruby-1.9.2-preview3 > 
    
    2 replies  |  latest 14 years ago
        1
  •  1
  •   Sam    14 years ago

    Use Nokogiri.

        2
  •  0
  •   the Tin Man    15 years ago

    Try changing your XPath from:

        page.search("//div[@class = 'info-content")
    

    to:

        page.search('//div[@class=info-content]')
    

    Running a sample in IRB gave me:

    ruby-1.9.1-p378 > page.search("//div[@class=info-content]").map{ |i| i.inner_text }[0]
     => "Down 66% in popularity this week. See why on IMDbPro."
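
    For background on the error message in the question: Ruby 1.9 and later raise Encoding::CompatibilityError when join combines a string tagged ASCII-8BIT with a UTF-8 string and both contain non-ASCII bytes. The sketch below reproduces that with made-up strings and no Hpricot at all. The sample data and variable names are assumptions for illustration, and whether retagging is the right fix for a given page depends on the bytes really being UTF-8.

```ruby
# A sketch with made-up data; it does not reflect Hpricot's internals.

# Bytes as they might arrive from a network read, tagged as binary (ASCII-8BIT).
raw = "caf\xC3\xA9".dup.force_encoding(Encoding::ASCII_8BIT)
# An ordinary UTF-8 string that also contains a non-ASCII character.
utf8 = "naïve"

# Joining the two raises, because neither string is plain ASCII
# and their encodings differ.
caught = nil
begin
  [raw, utf8].join(', ')
rescue Encoding::CompatibilityError => e
  caught = e
end
puts caught.class

# Retag (not transcode) the binary string, assuming its bytes really are UTF-8.
fixed = raw.dup.force_encoding(Encoding::UTF_8)
joined = [fixed, utf8].join(', ')
puts joined
```

    force_encoding only changes the encoding label on the string and leaves the bytes untouched, so it is appropriate only when the page's actual encoding is known.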