
Mahout clusterdump not reading input

mahout cluster-computing machine-learning


user2148249 · 13 years ago

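Hi guys, I'm trying to run clusterdump on the output of the k-means clustering algorithm, and it isn't working. Any ideas? This is the example from Mahout in Action, run on a pseudo-distributed cluster.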

Also, are there any tools or other ways to visualize the clusterdump output or the k-means output?

[186946@01HW534064 bin]$ ./mahout clusterdump -dt sequencefile -d /home/186946/reuters-vectors/dictionary.file-0-i reuters-fkmeans-clusters/clusters-3 -o /home/186946/clusters.txt -b 10 -n 10
Running on hadoop, using HADOOP_HOME=/home/186946/hadoop-0.20.2-cdh3u5
No HADOOP_CONF_DIR set, using /home/186946/hadoop-0.20.2-cdh3u5/src/conf 
MAHOUT-JOB: /home/186946/mahout-0.5-cdh3u5/mahout-examples-0.5-cdh3u5-job.jar
MAHOUT-JOB: /home/186946/mahout-0.5-cdh3u5/mahout-examples-0.5-cdh3u5-job.jar
13/03/08 17:26:11 ERROR common.AbstractJob: Unexpected reuters-fkmeans-clusters/clusters-3 while processing Job-Specific Options:
usage: <command> [Generic Options] [Job-Specific Options]
Generic Options:
 -archives <paths>              comma separated archives to be unarchived
                                on the compute machines.
 -conf <configuration file>     specify an application configuration file
 -D <property=value>            use value for given property
 -files <paths>                 comma separated files to be copied to the
                                map reduce cluster
 -fs <local|namenode:port>      specify a namenode
 -jt <local|jobtracker:port>    specify a job tracker
 -libjars <paths>               comma separated jar files to include in
                                the classpath.
 -tokenCacheFile <tokensFile>   name of the file with the tokens
Unexpected reuters-fkmeans-clusters/clusters-3 while processing Job-Specific    
Options:                                                                        
Usage:                                                                          
 [--seqFileDir <seqFileDir> --output <output> --substring <substring>           
--numWords <numWords> --pointsDir <pointsDir> --dictionary <dictionary>         
--dictionaryType <dictionaryType> --help --tempDir <tempDir> --startPhase       
<startPhase> --endPhase <endPhase>]                                             
Job-Specific Options:                                                           
  --seqFileDir (-s) seqFileDir             The directory containing Sequence    
                                           Files for the Clusters               
  --output (-o) output                     Optional output directory. Default   
                                           is to output to the console.         
  --substring (-b) substring               The number of chars of the           
                                           asFormatString() to print            
  --numWords (-n) numWords                 The number of top terms to print     
  --pointsDir (-p) pointsDir               The directory containing points      
                                           sequence files mapping input vectors 
                                           to their cluster.  If specified,     
                                           then the program will output the     
                                           points associated with a cluster     
  --dictionary (-d) dictionary             The dictionary file                  
  --dictionaryType (-dt) dictionaryType    The dictionary file type             
                                           (text|sequencefile)                  
  --help (-h)                              Print out help                       
  --tempDir tempDir                        Intermediate output directory        
  --startPhase startPhase                  First phase to run                   
  --endPhase endPhase                      Last phase to run                    
13/03/08 17:26:11 INFO driver.MahoutDriver: Program took 133 ms

Thanks

1 answer | last activity 9 years ago


tuxdna · 12 years ago

mahout clusterdump \
-d output/vectors/dictionary.file-0 \
-dt sequencefile \
-i output/clusters/clusters-2-final/part-00000 \
-n 20 \
-b 100 \
-o cdump.txt \
-p output/clusters/clusteredPoints/

Just copy and paste all of the lines above into a text editor, then fill in your own values for -d, -dt, -i, and -p, as carefully as I did.

P.S. The paths are HDFS paths.
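
A note on the error in the question: the usage text printed with it lists --seqFileDir (-s) as the option for the cluster directory and shows no -i option, while the failing command passes reuters-fkmeans-clusters/clusters-3 with no option flag in front of it. So on that Mahout 0.5 build the directory probably has to be given with -s, and the -i form above may belong to a different Mahout release (that part is an assumption). Below is a minimal sketch that assembles the question's command with -s and prints it for review before running anything. The variable names are made up for illustration, and the paths are copied as-is from the question, including the dictionary path, which should be checked against what actually exists on HDFS.

```shell
# Values copied verbatim from the question; adjust them to your own setup.
DICT="/home/186946/reuters-vectors/dictionary.file-0-i"
CLUSTERS="reuters-fkmeans-clusters/clusters-3"
OUT="/home/186946/clusters.txt"

# Same options as the failing command, except that the cluster directory is
# now passed with -s (--seqFileDir), the option named in the usage text.
ARGS="clusterdump -dt sequencefile -d $DICT -s $CLUSTERS -o $OUT -b 10 -n 10"

# Dry run: print the full command so it can be checked by eye.
echo "./mahout $ARGS"

# Once it looks right, run it from Mahout's bin directory:
# ./mahout $ARGS
```

If clusterdump still rejects the arguments, running ./mahout clusterdump --help should print the option names that the installed version accepts.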