
java.lang.NullPointerException in OpenNLP using RJB (Ruby Java Bridge)

  •  3
  • Community wiki  · Tech community  · 3 years ago

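    I am trying to use the open-nlp Ruby gem to access the Java OpenNLP processor through RJB (Ruby Java Bridge). I am not a Java programmer, so I don't know how to solve this. Any recommendations on resolving it, debugging it, or collecting more information would be appreciated.

    The environment is Windows 8, Ruby 1.9.3p448, Rails 4.0.0, JDK 1.7.0-40 x586. The gems are rjb 1.4.8 and louismullie/open-nlp 0.1.4. For the record, this file runs in JRuby, but I hit other problems in that environment and would prefer to stay with native Ruby for now.

    In brief, the open-nlp gem fails with java.lang.NullPointerException and Ruby's method_missing error. I hesitate to say why this is happening because I don't know, but it looks to me as if the dynamically loaded Jar object opennlp.tools.postag.POSTaggerME@1b5080a cannot be accessed, perhaps because OpenNLP::Bindings::Utils.tagWithArrayList isn't being set up correctly. OpenNLP::Bindings is Ruby. Utils and its methods are Java. Utils is also supposedly among the "default" Jars and class files, which may be important.

    What am I doing wrong here? Thanks!

    The code I am running is copied straight out of github/open-nlp. My copy of the code is: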

    class OpennlpTryer
    
      $DEBUG=false
    
      # From https://github.com/louismullie/open-nlp
      # Hints: Dir.pwd; File.expand_path('../../Gemfile', __FILE__);
      # Load the module
      require 'open-nlp'
      #require 'jruby-jars'
    
    =begin
      # Alias "write" to "print" to monkeypatch the NoMethod write error
      java_import java.io.PrintStream
      class PrintStream
        java_alias(:write, :print, [java.lang.String])
      end
    =end
    
    =begin
      # Display path of jruby-jars jars...
      puts JRubyJars.core_jar_path # => path to jruby-core-VERSION.jar
      puts JRubyJars.stdlib_jar_path # => path to jruby-stdlib-VERSION.jar
    =end
      puts ENV['CLASSPATH']
    
      # Set an alternative path to look for the JAR files.
      # Default is gem's bin folder.
      # OpenNLP.jar_path = '/path_to_jars/'
    
      OpenNLP.jar_path = File.join(ENV["GEM_HOME"],"gems/open-nlp-0.1.4/bin/")
      puts OpenNLP.jar_path
      # Set an alternative path to look for the model files.
      # Default is gem's bin folder.
      # OpenNLP.model_path = '/path_to_models/'
    
      OpenNLP.model_path = File.join(ENV["GEM_HOME"],"gems/open-nlp-0.1.4/bin/")
      puts OpenNLP.model_path
      # Pass some alternative arguments to the Java VM.
      # Default is ['-Xms512M', '-Xmx1024M'].
      # OpenNLP.jvm_args = ['-option1', '-option2']
      OpenNLP.jvm_args = ['-Xms512M', '-Xmx1024M']
      # Redirect VM output to log.txt
      OpenNLP.log_file = 'log.txt'
      # Set default models for a language.
      # OpenNLP.use :language
      OpenNLP.use :english          # Make sure this is lower case!!!!
    
    # Simple tokenizer
    
      OpenNLP.load
    
      sent = "The death of the poet was kept from his poems."
      tokenizer = OpenNLP::SimpleTokenizer.new
    
      tokens = tokenizer.tokenize(sent).to_a
    # => %w[The death of the poet was kept from his poems .]
      puts "Tokenize #{tokens}"
    
    # Maximum entropy tokenizer, chunker and POS tagger
    
      OpenNLP.load
    
      chunker = OpenNLP::ChunkerME.new
      tokenizer = OpenNLP::TokenizerME.new
      tagger = OpenNLP::POSTaggerME.new
    
      sent = "The death of the poet was kept from his poems."
    
      tokens = tokenizer.tokenize(sent).to_a
    # => %w[The death of the poet was kept from his poems .]
      puts "Tokenize #{tokens}"
    
      tags = tagger.tag(tokens).to_a
    # => %w[DT NN IN DT NN VBD VBN IN PRP$ NNS .]
      puts "Tags #{tags}"
    
      chunks = chunker.chunk(tokens, tags).to_a
    # => %w[B-NP I-NP B-PP B-NP I-NP B-VP I-VP B-PP B-NP I-NP O]
      puts "Chunks #{chunks}"
    
    
    # Abstract Bottom-Up Parser
    
      OpenNLP.load
    
      sent = "The death of the poet was kept from his poems."
      parser = OpenNLP::Parser.new
      parse = parser.parse(sent)
    
    =begin
      parse.get_text.should eql sent
    
      parse.get_span.get_start.should eql 0
      parse.get_span.get_end.should eql 46
      parse.get_child_count.should eql 1
    =end
    
      child = parse.get_children[0]
    
      child.text # => "The death of the poet was kept from his poems."
      child.get_child_count # => 3
      child.get_head_index #=> 5
      child.get_type # => "S"
    
      puts "Child: #{child}"
    
    # Maximum Entropy Name Finder*
    
      OpenNLP.load
    
      # puts File.expand_path('.', __FILE__)
      text = File.read('./spec/sample.txt').gsub("\n", "")  # gsub, not gsub!: gsub! returns nil when nothing is replaced
    
      tokenizer = OpenNLP::TokenizerME.new
      segmenter = OpenNLP::SentenceDetectorME.new
      puts "Tokenizer: #{tokenizer}"
      puts "Segmenter: #{segmenter}"
    
      ner_models = ['person', 'time', 'money']
      ner_finders = ner_models.map do |model|
        OpenNLP::NameFinderME.new("en-ner-#{model}.bin")
      end
      puts "NER Finders: #{ner_finders}"
    
      sentences = segmenter.sent_detect(text)
      puts "Sentences: #{sentences}"
    
      named_entities = []
    
      sentences.each do |sentence|
        tokens = tokenizer.tokenize(sentence)
        ner_models.each_with_index do |model, i|
          finder = ner_finders[i]
          name_spans = finder.find(tokens)
          name_spans.each do |name_span|
            start = name_span.get_start
            stop = name_span.get_end-1
            slice = tokens[start..stop].to_a
            named_entities << [slice, model]
          end
        end
      end
      puts "Named Entities: #{named_entities}"
    
    # Loading specific models
    # Just pass the name of the model file to the constructor. The gem will search for the file in the OpenNLP.model_path folder.
    
      OpenNLP.load
    
      tokenizer = OpenNLP::TokenizerME.new('en-token.bin')
      tagger = OpenNLP::POSTaggerME.new('en-pos-perceptron.bin')
      name_finder = OpenNLP::NameFinderME.new('en-ner-person.bin')
    # etc.
      puts "Tokenizer: #{tokenizer}"
      puts "Tagger: #{tagger}"
      puts "Name Finder: #{name_finder}"
    
    # Loading specific classes
    # You may want to load specific classes from the OpenNLP library that are not loaded by default. The gem provides an API to do this:
    
    # Default base class is opennlp.tools.
      OpenNLP.load_class('SomeClassName')
    # => OpenNLP::SomeClassName
    
    # Here, we specify another base class.
      OpenNLP.load_class('SomeOtherClass', 'opennlp.tools.namefind')
      # => OpenNLP::SomeOtherClass
    
    end
    

    The failing line is line 73 (tokens == the sentence being processed):

      tags = tagger.tag(tokens).to_a  # 
    # => %w[DT NN IN DT NN VBD VBN IN PRP$ NNS .]
    

    tagger.tag calls open-nlp/classes.rb line 13, which is where the error is thrown. The code there is:

    class OpenNLP::POSTaggerME < OpenNLP::Base
    
      unless RUBY_PLATFORM =~ /java/
        def tag(*args)
          OpenNLP::Bindings::Utils.tagWithArrayList(@proxy_inst, args[0])  # <== Line 13
        end
      end
    
    end
    

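    The Ruby error thrown at this point is: `method_missing': unknown exception (NullPointerException). While debugging, I found the underlying error to be java.lang.NullPointerException. args[0] is the sentence being processed. @proxy_inst is opennlp.tools.postag.POSTaggerME@1b5080a.

    OpenNLP::Bindings sets up the Java environment. For example, it sets up the Jars to be loaded and the classes within those Jars. In line 54, it sets up defaults for RJB, which should set up OpenNLP::Bindings::Utils and its methods as follows: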

      # Add in Rjb workarounds.
      unless RUBY_PLATFORM =~ /java/
        self.default_jars << 'utils.jar'
        self.default_classes << ['Utils', '']
      end
    

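    utils.jar and Utils.java are in the CLASSPATH along with the other Jars being loaded. The Jars are being accessed, which I verified because the other Jars throw error messages if they are not present. The CLASSPATH is:

    .;C:\Program Files (x86)\Java\jdk1.7.0_40\lib;C:\Program Files (x86)\Java\jre7\lib;D:\BitNami\rubystack-1.9.3-12\ruby\lib\ruby\gems\1.9.1\gems\open-nlp-0.1.4\bin
    

    The application Jars are in D:\BitNami\rubystack-1.9.3-12\ruby\lib\ruby\gems\1.9.1\gems\open-nlp-0.1.4\bin and, again, if they are not there I get error messages on the other Jars. The Jars and Java files in ...\bin include: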

    jwnl-1.3.3.jar
    opennlp-maxent-3.0.2-incubating.jar
    opennlp-tools-1.5.2-incubating.jar
    opennlp-uima-1.5.2-incubating.jar
    utils.jar
    Utils.java
    

    Utils.java is as follows:

    import java.util.Arrays;
    import java.util.ArrayList;
    import java.lang.String;
    import opennlp.tools.postag.POSTagger;
    import opennlp.tools.chunker.ChunkerME;
    import opennlp.tools.namefind.NameFinderME; // interface instead?
    import opennlp.tools.util.Span;
    
    // javac -cp '.:opennlp.tools.jar' Utils.java
    // jar cf utils.jar Utils.class
    public class Utils {
    
        public static String[] tagWithArrayList(POSTagger posTagger, ArrayList[] objectArray) {
          return posTagger.tag(getStringArray(objectArray));
        }
        public static Object[] findWithArrayList(NameFinderME nameFinder, ArrayList[] tokens) {
          return nameFinder.find(getStringArray(tokens));
        }
        public static Object[] chunkWithArrays(ChunkerME chunker, ArrayList[] tokens, ArrayList[] tags) {
          return chunker.chunk(getStringArray(tokens), getStringArray(tags));
        }
        public static String[] getStringArray(ArrayList[] objectArray) {
          String[] stringArray = Arrays.copyOf(objectArray, objectArray.length, String[].class);
              return stringArray;
        }
    }
    

    So it should define tagWithArrayList and import opennlp.tools.postag.POSTagger.

    As expected, the tools Jar file, opennlp-tools-1.5.2-incubating.jar, includes the postag/POSTagger and POSTaggerME class files.

    The error messages are:

    D:\BitNami\rubystack-1.9.3-12\ruby\bin\ruby.exe -e $stdout.sync=true;$stderr.sync=true;load($0=ARGV.shift) D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb
    .;C:\Program Files (x86)\Java\jdk1.7.0_40\lib;C:\Program Files (x86)\Java\jre7\lib;D:\BitNami\rubystack-1.9.3-12\ruby\lib\ruby\gems\1.9.1\gems\open-nlp-0.1.4\bin
    D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/open-nlp-0.1.4/bin/
    D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/open-nlp-0.1.4/bin/
    Tokenize ["The", "death", "of", "the", "poet", "was", "kept", "from", "his", "poems", "."]
    Tokenize ["The", "death", "of", "the", "poet", "was", "kept", "from", "his", "poems", "."]
    D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/open-nlp-0.1.4/lib/open-nlp/classes.rb:13:in `method_missing': unknown exception (NullPointerException)
        from D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/open-nlp-0.1.4/lib/open-nlp/classes.rb:13:in `tag'
        from D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:73:in `<class:OpennlpTryer>'
        from D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:1:in `<top (required)>'
        from -e:1:in `load'
        from -e:1:in `<main>'
    

    Modified Utils.java:

    import java.util.Arrays;
    import java.util.Object;
    import java.lang.String;
    import opennlp.tools.postag.POSTagger;
    import opennlp.tools.chunker.ChunkerME;
    import opennlp.tools.namefind.NameFinderME; // interface instead?
    import opennlp.tools.util.Span;
    
    // javac -cp '.:opennlp.tools.jar' Utils.java
    // jar cf utils.jar Utils.class
    public class Utils {
    
        public static String[] tagWithArrayList(POSTagger posTagger, Object[] objectArray) {
          return posTagger.tag(getStringArray(objectArray));
        }
        public static Object[] findWithArrayList(NameFinderME nameFinder, Object[] tokens) {
          return nameFinder.find(getStringArray(tokens));
        }
        public static Object[] chunkWithArrays(ChunkerME chunker, Object[] tokens, Object[] tags) {
          return chunker.chunk(getStringArray(tokens), getStringArray(tags));
        }
        public static String[] getStringArray(Object[] objectArray) {
          String[] stringArray = Arrays.copyOf(objectArray, objectArray.length, String[].class);
              return stringArray;
        }
    }
    

    Modified error messages:

    Uncaught exception: uninitialized constant OpennlpTryer::ArrayStoreException
        D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:81:in `rescue in <class:OpennlpTryer>'
        D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:77:in `<class:OpennlpTryer>'
        D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:1:in `<top (required)>'
    

    Modified error with Utils.java revised to "import java.lang.Object;":

    Uncaught exception: uninitialized constant OpennlpTryer::ArrayStoreException
        D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:81:in `rescue in <class:OpennlpTryer>'
        D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:77:in `<class:OpennlpTryer>'
        D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:1:in `<top (required)>'
    

    With the rescue removed from OpennlpTryer, the error shows up as trapped in classes.rb:

    Uncaught exception: uninitialized constant OpenNLP::POSTaggerME::ArrayStoreException
        D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/open-nlp-0.1.4/lib/open-nlp/classes.rb:16:in `rescue in tag'
        D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/open-nlp-0.1.4/lib/open-nlp/classes.rb:13:in `tag'
        D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:78:in `<class:OpennlpTryer>'
        D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:1:in `<top (required)>'
    

    Same error, but with all rescues removed, so it is "native Ruby":

    Uncaught exception: unknown exception
        D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/open-nlp-0.1.4/lib/open-nlp/classes.rb:15:in `method_missing'
        D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/open-nlp-0.1.4/lib/open-nlp/classes.rb:15:in `tag'
        D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:78:in `<class:OpennlpTryer>'
        D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:1:in `<top (required)>'
    

    Revised Utils.java:

    import java.util.Arrays;
    import java.util.ArrayList;
    import java.lang.String;
    import opennlp.tools.postag.POSTagger;
    import opennlp.tools.chunker.ChunkerME;
    import opennlp.tools.namefind.NameFinderME; // interface instead?
    import opennlp.tools.util.Span;
    
    // javac -cp '.:opennlp.tools.jar' Utils.java
    // jar cf utils.jar Utils.class
    public class Utils {
    
        public static String[] tagWithArrayList(POSTagger posTagger, ArrayList[] objectArray) {
          // Debug output: a statement must sit inside the method body, not in the parameter list.
          System.out.println("Tokens: ("+objectArray.getClass().getSimpleName()+"): \n"+objectArray);
          return posTagger.tag(getStringArray(objectArray));
        }
        public static Object[] findWithArrayList(NameFinderME nameFinder, ArrayList[] tokens) {
          return nameFinder.find(getStringArray(tokens));
        }
        public static Object[] chunkWithArrays(ChunkerME chunker, ArrayList[] tokens, ArrayList[] tags) {
          return chunker.chunk(getStringArray(tokens), getStringArray(tags));
        }
        public static String[] getStringArray(ArrayList[] objectArray) {
          String[] stringArray = Arrays.copyOf(objectArray, objectArray.length, String[].class);
              return stringArray;
        }
    }
    

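    I ran cavaj on the Utils.class that I unzipped from utils.jar, and this is what I found. It differs from Utils.java by quite a bit. Both come installed with the open-nlp 1.4.8 gem. I don't know whether this is the root cause of the problem, but this file is at the core of where it breaks, and we have a major discrepancy. Which one should we use?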

    import java.util.ArrayList;
    import java.util.Arrays;
    import opennlp.tools.chunker.ChunkerME;
    import opennlp.tools.namefind.NameFinderME;
    import opennlp.tools.postag.POSTagger;
    
    public class Utils
    {
    
        public Utils()
        {
        }
    
        public static String[] tagWithArrayList(POSTagger postagger, ArrayList aarraylist[])
        {
            return postagger.tag(getStringArray(aarraylist));
        }
    
        public static Object[] findWithArrayList(NameFinderME namefinderme, ArrayList aarraylist[])
        {
            return namefinderme.find(getStringArray(aarraylist));
        }
    
        public static Object[] chunkWithArrays(ChunkerME chunkerme, ArrayList aarraylist[], ArrayList aarraylist1[])
        {
            return chunkerme.chunk(getStringArray(aarraylist), getStringArray(aarraylist1));
        }
    
        public static String[] getStringArray(ArrayList aarraylist[])
        {
            String as[] = (String[])Arrays.copyOf(aarraylist, aarraylist.length, [Ljava/lang/String;);
            return as;
        }
    }
    

    Utils.java in use as of 10/07, compiled and compressed into utils.jar:

    import java.util.Arrays;
    import java.util.ArrayList;
    import java.lang.String;
    import opennlp.tools.postag.POSTagger;
    import opennlp.tools.chunker.ChunkerME;
    import opennlp.tools.namefind.NameFinderME; // interface instead?
    import opennlp.tools.util.Span;
    
    // javac -cp '.:opennlp.tools.jar' Utils.java
    // jar cf utils.jar Utils.class
    public class Utils {
    
        public static String[] tagWithArrayList(POSTagger posTagger, ArrayList[] objectArray) {
          return posTagger.tag(getStringArray(objectArray));
        }
        public static Object[] findWithArrayList(NameFinderME nameFinder, ArrayList[] tokens) {
          return nameFinder.find(getStringArray(tokens));
        }
        public static Object[] chunkWithArrays(ChunkerME chunker, ArrayList[] tokens, ArrayList[] tags) {
          return chunker.chunk(getStringArray(tokens), getStringArray(tags));
        }
        public static String[] getStringArray(ArrayList[] objectArray) {
          String[] stringArray = Arrays.copyOf(objectArray, objectArray.length, String[].class);
          return stringArray;
        }
    }
    

    Failures are occurring in BindIt::Binding::load_klass on line 110, here:

    # Private function to load classes.
    # Doesn't check if initialized.
    def load_klass(klass, base, name=nil)
      base += '.' unless base == ''
      fqcn = "#{base}#{klass}"
      name ||= klass
      if RUBY_PLATFORM =~ /java/
        rb_class = java_import(fqcn)
        if name != klass
          if rb_class.is_a?(Array)
            rb_class = rb_class.first
          end
          const_set(name.intern, rb_class)
        end
      else
        rb_class = Rjb::import(fqcn)             # <== This is line 110
        const_set(name.intern, rb_class)
      end
    end
    

    The messages are as follows; however, they are inconsistent in which class they identify. Each run may name a different one: POSTagger, ChunkerME, or NameFinderME.

    D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/bind-it-0.2.7/lib/bind-it/binding.rb:110:in `import': opennlp/tools/namefind/NameFinderME (NoClassDefFoundError)
        from D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/bind-it-0.2.7/lib/bind-it/binding.rb:110:in `load_klass'
        from D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/bind-it-0.2.7/lib/bind-it/binding.rb:89:in `block in load_default_classes'
        from D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/bind-it-0.2.7/lib/bind-it/binding.rb:87:in `each'
        from D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/bind-it-0.2.7/lib/bind-it/binding.rb:87:in `load_default_classes'
        from D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/bind-it-0.2.7/lib/bind-it/binding.rb:56:in `bind'
        from D:/BitNami/rubystack-1.9.3-12/ruby/lib/ruby/gems/1.9.1/gems/open-nlp-0.1.4/lib/open-nlp.rb:14:in `load'
        from D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:54:in `<class:OpennlpTryer>'
        from D:/BitNami/rubystack-1.9.3-12/projects/RjbTest/app/helpers/opennlp_tryer.rb:1:in `<top (required)>'
        from -e:1:in `load'
        from -e:1:in `<main>'
    

    The interesting point about these errors is that they originate at OpennlpTryer line 54, which is:

      OpenNLP.load
    

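    At this point, OpenNLP fires up RJB, which uses BindIt to load the jars and classes. This is well before the errors I was seeing at the beginning of this question. However, I can't help but think it is all related. I really don't understand the inconsistency of these errors at all.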
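
    One way to narrow down whether a NoClassDefFoundError like the one above is a classpath problem is to ask a plain JVM, outside RJB, whether it can load the same classes. The following is only a minimal sketch under that assumption: ClasspathCheck and isLoadable are hypothetical names that are not part of the gem, and the result is only meaningful when the program is run with the same jars on the classpath.

```java
class ClasspathCheck {

    // Returns true when the named class can be located and loaded
    // from the classpath this JVM was started with.
    static boolean isLoadable(String className) {
        try {
            // initialize=false: load the class without running its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // NoClassDefFoundError is a LinkageError; it usually means a class
            // that this one depends on could not be found.
            return false;
        }
    }

    public static void main(String[] args) {
        // Class names taken from the error messages and bindings in the question.
        String[] names = {
            "opennlp.tools.postag.POSTaggerME",
            "opennlp.tools.chunker.ChunkerME",
            "opennlp.tools.namefind.NameFinderME"
        };
        for (String name : names) {
            System.out.println(name + " loadable: " + isLoadable(name));
        }
    }
}
```

    If all three classes load reliably here but fail intermittently under RJB, that would point away from the jars themselves and toward how the bridge sets up the JVM.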

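    I was able to add the logging function to Utils.java, compile it after adding "import java.io.*", and compress it. However, I pulled it back out because of these errors, since I don't know whether it was involved. I don't think it was. In any case, because these errors occur during the load process, the method is never called, so logging there won't help...

    For each of the other jars, the jar is loaded and then each class is imported using RJB. Utils is handled differently and is specified as the "default". From what I can tell, Utils.class is executed to load its own classes?

    Later update on 10/07:

    Here is where I am, I think. First, I have some problem replacing Utils.java, as I described earlier today. That problem probably needs to be solved before I can install a fix.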

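    Second, I now understand the difference between POSTagger and POSTaggerME: the ME stands for Maximum Entropy. The test code is trying to call POSTaggerME, but it looks to me as if Utils.java, as implemented, supports POSTagger. I tried changing the test code to call POSTagger, but it said it couldn't find an initializer. Looking at the source for each, and I am guessing here, I think POSTagger exists solely to support POSTaggerME, which implements it.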
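
    The "couldn't find an initializer" message is what one would expect if POSTagger is an interface: an interface declares methods but has no constructor, so only a class that implements it can be instantiated, and a method declared to take the interface accepts any such implementation. A minimal sketch of that relationship follows. Tagger and DummyTagger are hypothetical stand-ins, not the real OpenNLP types.

```java
import java.util.Arrays;

class InterfaceDemo {

    // Hypothetical stand-in for an interface such as POSTagger.
    public interface Tagger {
        String[] tag(String[] tokens);
    }

    // Hypothetical stand-in for an implementing class such as POSTaggerME.
    public static class DummyTagger implements Tagger {
        @Override
        public String[] tag(String[] tokens) {
            String[] tags = new String[tokens.length];
            Arrays.fill(tags, "NN"); // placeholder tag, not a real model
            return tags;
        }
    }

    // Number of public constructors a type exposes; always 0 for an interface.
    static int publicConstructorCount(Class<?> type) {
        return type.getConstructors().length;
    }

    // Declared against the interface, so any implementation can be passed in.
    static int countTags(Tagger tagger, String[] tokens) {
        return tagger.tag(tokens).length;
    }

    public static void main(String[] args) {
        System.out.println("Tagger constructors: " + publicConstructorCount(Tagger.class));
        System.out.println("DummyTagger constructors: " + publicConstructorCount(DummyTagger.class));
        System.out.println("Tags produced: " + countTags(new DummyTagger(), new String[] {"poet", "poems"}));
    }
}
```

    Read this way, a Utils method whose parameter type is the interface is not in conflict with test code that constructs the implementing class.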

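    The source is the opennlp-tools file opennlp-tools-1.5.2-incubating-sources.jar.

    What I don't get is the whole reason for Utils in the first place. Why aren't the jars and classes provided in bindings.rb enough? This feels like a bad monkeypatch. I mean, look at what bindings.rb does in the first place: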

      # Default JARs to load.
      self.default_jars = [
        'jwnl-1.3.3.jar',
        'opennlp-tools-1.5.2-incubating.jar',
        'opennlp-maxent-3.0.2-incubating.jar',
        'opennlp-uima-1.5.2-incubating.jar'
      ]
    
      # Default namespace.
      self.default_namespace = 'opennlp.tools'
    
      # Default classes.
      self.default_classes = [
        # OpenNLP classes.
        ['AbstractBottomUpParser', 'opennlp.tools.parser'],
        ['DocumentCategorizerME', 'opennlp.tools.doccat'],
        ['ChunkerME', 'opennlp.tools.chunker'],
        ['DictionaryDetokenizer', 'opennlp.tools.tokenize'],
        ['NameFinderME', 'opennlp.tools.namefind'],
        ['Parser', 'opennlp.tools.parser.chunking'],
        ['Parse', 'opennlp.tools.parser'],
        ['ParserFactory', 'opennlp.tools.parser'],
        ['POSTaggerME', 'opennlp.tools.postag'],
        ['SentenceDetectorME', 'opennlp.tools.sentdetect'],
        ['SimpleTokenizer', 'opennlp.tools.tokenize'],
        ['Span', 'opennlp.tools.util'],
        ['TokenizerME', 'opennlp.tools.tokenize'],
    
        # Generic Java classes.
        ['FileInputStream', 'java.io'],
        ['String', 'java.lang'],
        ['ArrayList', 'java.util']
      ]
    
      # Add in Rjb workarounds.
      unless RUBY_PLATFORM =~ /java/
        self.default_jars << 'utils.jar'
        self.default_classes << ['Utils', '']
      end
    
    2 answers  |  last active 12 years ago
        1
  •  3
  •   R_G    12 years ago

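    See the full code at the end for the complete corrected classes.rb module.

    I ran into the same problem today. I didn't quite understand why the Utils class was being used, so I modified the classes.rb file in the following way: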

    unless RUBY_PLATFORM =~ /java/
      def tag(*args)
        @proxy_inst.tag(args[0])
        #OpenNLP::Bindings::Utils.tagWithArrayList(@proxy_inst, args[0])
      end
    end
    

    With this change, I can make the following test pass:

    sent   = "The death of the poet was kept from his poems."
    tokens = tokenizer.tokenize(sent).to_a
    # => %w[The death of the poet was kept from his poems .]
    tags   = tagger.tag(tokens).to_a
    # => ["prop", "prp", "n", "v-fin", "n", "adj", "prop", "v-fin", "n", "adj", "punc"]
    

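    R_G edit: I tested that change and it eliminated the error. I will have to do more testing to make sure the outcome is what should be expected. However, following the same pattern, I made the following changes in classes.rb as well: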

    def chunk(tokens, tags)
      chunks = @proxy_inst.chunk(tokens, tags)
      # chunks = OpenNLP::Bindings::Utils.chunkWithArrays(@proxy_inst, tokens,tags)
      chunks.map { |c| c.to_s }
    end
    

    ...

    class OpenNLP::NameFinderME < OpenNLP::Base
      unless RUBY_PLATFORM =~ /java/
        def find(*args)
          @proxy_inst.find(args[0])
          # OpenNLP::Bindings::Utils.findWithArrayList(@proxy_inst, args[0])
        end
      end
    end
    

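    This allowed the entire sample test to execute without failure. I will provide a later update regarding verification of the results.

    Final edit and updated classes.rb, per Space Pope and R_G:

    As it turns out, this answer was key to the desired solution. However, the results were inconsistent once it was corrected. We continued to drill down into it and implemented strong typing in the calls, as specified by RJB. This converts each call to use the _invoke method, where the parameters are the desired method, the strong type, and the additional arguments. Andre's recommendation was key to the solution, so kudos to him. Here is the complete module. It eliminates the need for the Utils.class that was attempting to make these calls but failing. We plan to issue a GitHub pull request for the open-nlp gem to update this module: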

    require 'open-nlp/base'
    
    class OpenNLP::SentenceDetectorME < OpenNLP::Base; end
    
    class OpenNLP::SimpleTokenizer < OpenNLP::Base; end
    
    class OpenNLP::TokenizerME < OpenNLP::Base; end
    
    class OpenNLP::POSTaggerME < OpenNLP::Base
    
      unless RUBY_PLATFORM =~ /java/
        def tag(*args)
            @proxy_inst._invoke("tag", "[Ljava.lang.String;", args[0])
        end
    
      end
    end
    
    
    class OpenNLP::ChunkerME < OpenNLP::Base
    
      if RUBY_PLATFORM =~ /java/
    
        def chunk(tokens, tags)
          if !tokens.is_a?(Array)
            tokens = tokens.to_a
            tags = tags.to_a
          end
          tokens = tokens.to_java(:String)
          tags = tags.to_java(:String)
          @proxy_inst.chunk(tokens,tags).to_a
        end
    
      else
    
        def chunk(tokens, tags)
          chunks = @proxy_inst._invoke("chunk", "[Ljava.lang.String;[Ljava.lang.String;", tokens, tags)
          chunks.map { |c| c.to_s }
        end
    
      end
    
    end
    
    class OpenNLP::Parser < OpenNLP::Base
    
      def parse(text)
    
        tokenizer = OpenNLP::TokenizerME.new
        full_span = OpenNLP::Bindings::Span.new(0, text.size)
    
        parse_obj = OpenNLP::Bindings::Parse.new(
        text, full_span, "INC", 1, 0)
    
        tokens = tokenizer.tokenize_pos(text)
    
        tokens.each_with_index do |tok,i|
          start, stop = tok.get_start, tok.get_end
          token = text[start..stop-1]
          span = OpenNLP::Bindings::Span.new(start, stop)
          parse = OpenNLP::Bindings::Parse.new(text, span, "TK", 0, i)
          parse_obj.insert(parse)
        end
    
        @proxy_inst.parse(parse_obj)
    
      end
    
    end
    
    class OpenNLP::NameFinderME < OpenNLP::Base
      unless RUBY_PLATFORM =~ /java/
        def find(*args)
          @proxy_inst._invoke("find", "[Ljava.lang.String;", args[0])
        end
      end
    end
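
    The type string "[Ljava.lang.String;" passed to _invoke in the module above is the name the JVM itself reports for the String[] class. Naming the parameter type explicitly matters when a method is overloaded, because the caller then selects one overload instead of leaving the bridge to guess. The sketch below shows the same idea with plain Java reflection. FakeTagger and its two tag overloads are hypothetical and only stand in for a class with overloaded methods.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;

class SignatureDemo {

    // Hypothetical class with two overloads of tag(), to show why a type is named.
    public static class FakeTagger {
        public String[] tag(String[] tokens) {
            String[] tags = new String[tokens.length];
            Arrays.fill(tags, "X"); // placeholder tag
            return tags;
        }

        public List<String> tag(List<String> tokens) {
            return tokens;
        }
    }

    // The JVM's name for String[]: "[" marks one array dimension,
    // "L...;" wraps the element class name.
    static String stringArrayTypeName() {
        return String[].class.getName();
    }

    // Selects the String[] overload by its parameter type, then invokes it.
    static String[] tagWithExplicitType(String[] tokens) {
        try {
            Method tagMethod = FakeTagger.class.getMethod("tag", String[].class);
            // Cast to Object so the array is passed as one argument, not spread as varargs.
            return (String[]) tagMethod.invoke(new FakeTagger(), (Object) tokens);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(stringArrayTypeName()); // prints [Ljava.lang.String;
        System.out.println(Arrays.toString(tagWithExplicitType(new String[] {"The", "poet"})));
    }
}
```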
    
        2
  •  3
  •   Josh    12 years ago

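    I don't think you're doing anything wrong at all. You're also not the only one with this problem. It looks like a bug in Utils. Creating an ArrayList[] in Java doesn't make much sense: it's technically legal, but it would be an array of ArrayLists, which a) is just plain odd, b) is terrible practice with regard to Java generics, and c) won't cast properly to String[] the way the author intends in getStringArray().

    Given the way the utility is written, and the fact that OpenNLP does in fact expect to receive a String[] as input to its tag() method, my best guess is that the original author meant to have Object[] where they have ArrayList[] in the Utils class.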
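
    Point c) can be checked in isolation. The sketch below applies the same Arrays.copyOf conversion that getStringArray() uses, first to an Object[] whose elements are Strings and then to an array of ArrayLists. ArrayCopyDemo and its method names are made up for illustration, and the sketch says nothing about what RJB actually passes across the bridge.

```java
import java.util.ArrayList;
import java.util.Arrays;

class ArrayCopyDemo {

    // Same conversion as getStringArray(), but declared on Object[].
    static String[] toStringArray(Object[] source) {
        return Arrays.copyOf(source, source.length, String[].class);
    }

    // True when every element can be stored in a String[], false otherwise.
    static boolean canConvert(Object[] source) {
        try {
            toStringArray(source);
            return true;
        } catch (ArrayStoreException e) {
            // Thrown by Arrays.copyOf when an element is not a String.
            return false;
        }
    }

    public static void main(String[] args) {
        Object[] tokens = {"The", "death", "of", "the", "poet"};
        System.out.println(Arrays.toString(toStringArray(tokens))); // [The, death, of, the, poet]

        // An ArrayList[] holds ArrayList objects, which cannot be stored in a String[].
        ArrayList<?>[] lists = { new ArrayList<String>(Arrays.asList("The", "death")) };
        System.out.println("ArrayList[] convertible: " + canConvert(lists)); // false
    }
}
```

    The ArrayStoreException in the second case is the same exception type named in the asker's modified error messages.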

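    Update

    To output to a file in the root of your project directory, try adjusting the logging like this (I added another line for printing the contents of the input array):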

    try {
        File log = new File("log.txt");
        FileWriter fileWriter = new FileWriter(log);
        BufferedWriter bufferedWriter = new BufferedWriter(fileWriter);
        bufferedWriter.write("Tokens ("+objectArray.getClass().getSimpleName()+"): \r\n"+objectArray.toString()+"\r\n");
        bufferedWriter.write(Arrays.toString(objectArray));
        bufferedWriter.close(); 
    }
    catch (Exception e) {
        e.printStackTrace();
    }
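
    The extra Arrays.toString line is there because calling toString() on a Java array prints only its type name and an identity hash, not its elements. A small self-contained sketch of the difference (ArrayPrintDemo is a hypothetical name):

```java
import java.util.Arrays;

class ArrayPrintDemo {

    // Default Object.toString(): runtime type name, '@', and a hex hash code.
    static String viaToString(Object[] array) {
        return array.toString();
    }

    // Arrays.toString(): the elements, comma-separated, inside brackets.
    static String viaArraysToString(Object[] array) {
        return Arrays.toString(array);
    }

    public static void main(String[] args) {
        String[] tokens = {"The", "death"};
        System.out.println(viaToString(tokens));       // starts with [Ljava.lang.String;@
        System.out.println(viaArraysToString(tokens)); // [The, death]
    }
}
```

    The first form is still useful for logging, since the type name at the front shows which array type actually arrived in the method.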