
Mahout clusterdump is not reading its input

  •  0
  • user2148249  · 13 years ago

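    Hi everyone, I'm trying to run clusterdump on the output of the k-means clustering algorithm, and it isn't working. Any ideas? This is the example from Mahout in Action, running on a pseudo-distributed cluster.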

    Also, are there any tools or other ways to visualize the clusterdump output or the k-means output?

    [186946@01HW534064 bin]$ ./mahout clusterdump -dt sequencefile -d /home/186946/reuters-vectors/dictionary.file-0-i reuters-fkmeans-clusters/clusters-3 -o /home/186946/clusters.txt -b 10 -n 10
    Running on hadoop, using HADOOP_HOME=/home/186946/hadoop-0.20.2-cdh3u5
    No HADOOP_CONF_DIR set, using /home/186946/hadoop-0.20.2-cdh3u5/src/conf 
    MAHOUT-JOB: /home/186946/mahout-0.5-cdh3u5/mahout-examples-0.5-cdh3u5-job.jar
    MAHOUT-JOB: /home/186946/mahout-0.5-cdh3u5/mahout-examples-0.5-cdh3u5-job.jar
    13/03/08 17:26:11 ERROR common.AbstractJob: Unexpected reuters-fkmeans-clusters/clusters-3 while processing Job-Specific Options:
    usage: <command> [Generic Options] [Job-Specific Options]
    Generic Options:
     -archives <paths>              comma separated archives to be unarchived
                                    on the compute machines.
     -conf <configuration file>     specify an application configuration file
     -D <property=value>            use value for given property
     -files <paths>                 comma separated files to be copied to the
                                    map reduce cluster
     -fs <local|namenode:port>      specify a namenode
     -jt <local|jobtracker:port>    specify a job tracker
     -libjars <paths>               comma separated jar files to include in
                                    the classpath.
     -tokenCacheFile <tokensFile>   name of the file with the tokens
    Unexpected reuters-fkmeans-clusters/clusters-3 while processing Job-Specific    
    Options:                                                                        
    Usage:                                                                          
     [--seqFileDir <seqFileDir> --output <output> --substring <substring>           
    --numWords <numWords> --pointsDir <pointsDir> --dictionary <dictionary>         
    --dictionaryType <dictionaryType> --help --tempDir <tempDir> --startPhase       
    <startPhase> --endPhase <endPhase>]                                             
    Job-Specific Options:                                                           
      --seqFileDir (-s) seqFileDir             The directory containing Sequence    
                                               Files for the Clusters               
      --output (-o) output                     Optional output directory. Default   
                                               is to output to the console.         
      --substring (-b) substring               The number of chars of the           
                                               asFormatString() to print            
      --numWords (-n) numWords                 The number of top terms to print     
      --pointsDir (-p) pointsDir               The directory containing points      
                                               sequence files mapping input vectors 
                                               to their cluster.  If specified,     
                                               then the program will output the     
                                               points associated with a cluster     
      --dictionary (-d) dictionary             The dictionary file                  
      --dictionaryType (-dt) dictionaryType    The dictionary file type             
                                               (text|sequencefile)                  
      --help (-h)                              Print out help                       
      --tempDir tempDir                        Intermediate output directory        
      --startPhase startPhase                  First phase to run                   
      --endPhase endPhase                      Last phase to run                    
    13/03/08 17:26:11 INFO driver.MahoutDriver: Program took 133 ms
    

    Thanks

    1 reply  |  last active 9 years ago
  •  0
  •   tuxdna    12 years ago
    mahout clusterdump \
    -d output/vectors/dictionary.file-0 \
    -dt sequencefile \
    -i output/clusters/clusters-2-final/part-00000 \
    -n 20 \
    -b 100 \
    -o cdump.txt \
    -p output/clusters/clusteredPoints/
    

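    Just copy and paste all of the lines above into a text editor and substitute your own arguments for -d, -dt, -i, and -p, taking care to keep the same form as mine.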

    P.S. The paths are on HDFS.
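
    One possible reading of the error in the question: the usage text printed by that Mahout 0.5 build lists --seqFileDir (-s) for the cluster directory and shows no -i option, so on that version the cluster path may need to be passed with -s. A minimal sketch under that assumption follows. The dictionary name dictionary.file-0 is a guess that the "-i" in the original command was meant as a separate flag, and clusterdump_cmd is a hypothetical helper that only prints the command so it can be checked before running.

```shell
# Paths taken from the question. DICT is an assumption: it treats the
# trailing "-i" in the original command as a separate flag, not part of the name.
DICT=/home/186946/reuters-vectors/dictionary.file-0
CLUSTERS=reuters-fkmeans-clusters/clusters-3
OUT=/home/186946/clusters.txt

# Hypothetical helper: prints the command line instead of running it.
# -s / --seqFileDir is the option name listed in the Mahout 0.5 usage output.
clusterdump_cmd() {
  echo "./mahout clusterdump -s $CLUSTERS -d $DICT -dt sequencefile -o $OUT -b 10 -n 10"
}

# Show the command for review.
clusterdump_cmd
```

    If the printed command looks right, it can be run from Mahout's bin directory with eval "$(clusterdump_cmd)". On newer Mahout versions the -i form shown in the answer above applies instead.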
