
Counting occurrences of words (and similar words) in a text

  •  3
  • Zeth  · 7 years ago

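    I'm trying to write a function that counts how many times different words occur in a text. The catch is that I want to bundle similar words (and nicknames) together.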

    $interesting_words = [
      'test' => [
        'number_of_occurances' => 0,
        'connected_words' => [
            'TEST',
            'TESTER',
            'TESTING'
          ]
        ],
      'foobar' => [
        'number_of_occurances' => 0,
        'connected_words' => [
            'FOO',
            'FOOBAR',
            'BAR'
          ]
        ]
    ];
    

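    Lorem ipsum TEST sit amet, consectetur TESTER elit. Sed in turpis dui. Maecenas venenatis FOOBAR facilisis. Quisque dictum, diam consequat mollis TESTING, orci tellus aliquet nisl, BAR molestie FOO augue at est. In TESTING vehicula lectus. Curabitur ac varius ligula. Pellentesque orci urdna.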

    Number of occurances for 'test': 4
    Number of occurances for 'foobar': 3
    

    Is there a clever way to do this without 1,000,000 for-loops?

    If it helps, I'm writing this function in Laravel.

    3 Answers
        1
  •  1
  •   Elementary    7 years ago

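    You can use str_word_count and array_count_values to get every word together with its number of occurrences, and strtolower to make the search case-insensitive. For performance, count the words once and then only sum the occurrences, like this: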

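    // Count every word once; lowercasing makes the lookup case-insensitive.
    $words = array_count_values(str_word_count(strtolower($str), 1));
    foreach ($interesting_words as &$details) {
        foreach ($details['connected_words'] as $similar) {
            // A connected word may not occur in the text at all, so default to 0.
            $details['number_of_occurances'] += $words[strtolower($similar)] ?? 0;
        }
    }
    unset($details); // break the reference left behind by the foreach
    print_r($interesting_words);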
    

    Array
    (
        [test] => Array
            (
                [number_of_occurances] => 4
                [connected_words] => Array
                    (
                        [0] => TEST
                        [1] => TESTER
                        [2] => TESTING
                    )
    
            )
    
        [foobar] => Array
            (
                [number_of_occurances] => 3
                [connected_words] => Array
                    (
                        [0] => FOO
                        [1] => FOOBAR
                        [2] => BAR
                    )
    
            )
    
    )
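
The approach above can be wrapped in a reusable function. The following is a minimal sketch, not the answerer's code: the function name `countWordGroups` is hypothetical, and it assumes the `$interesting_words` layout from the question. Note that `str_word_count` only treats letters, apostrophes and hyphens as word characters, so this version would not suit words that contain digits.

```php
<?php
// Hypothetical helper (not from the answer): sums whole-word matches per group.
function countWordGroups(string $text, array $groups): array
{
    // Word => frequency map, lowercased so matching is case-insensitive.
    $frequencies = array_count_values(str_word_count(strtolower($text), 1));

    foreach ($groups as $name => $details) {
        $total = 0;
        foreach ($details['connected_words'] as $word) {
            // Words that never occur have no key in the map, so default to 0.
            $total += $frequencies[strtolower($word)] ?? 0;
        }
        $groups[$name]['number_of_occurances'] = $total;
    }

    return $groups;
}

$groups = [
    'test'   => ['number_of_occurances' => 0, 'connected_words' => ['TEST', 'TESTER', 'TESTING']],
    'foobar' => ['number_of_occurances' => 0, 'connected_words' => ['FOO', 'FOOBAR', 'BAR']],
];
$text = 'Lorem ipsum TEST sit amet, consectetur TESTER elit. Sed in turpis dui. '
      . 'Maecenas venenatis FOOBAR facilisis. Quisque dictum, diam consequat mollis TESTING, '
      . 'orci tellus aliquet nisl, BAR molestie FOO augue at est. In TESTING vehicula lectus.';

$result = countWordGroups($text, $groups);
echo $result['test']['number_of_occurances'], "\n";   // 4
echo $result['foobar']['number_of_occurances'], "\n"; // 3
```

Because the function works on a copy of the array and returns it, no reference variable is left dangling after the loop.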
    
        2
  •  0
  •   Niklesh Raut    7 years ago

    You can use explode and array_count_values. To make it work in the example below, I removed the . and , characters from the text.

    <?php
    $interesting_words = [
      'test' => [
        'number_of_occurances' => 0,
        'connected_words' => [
            'TEST',
            'TESTER',
            'TESTING'
          ]
        ],
      'foobar' => [
        'number_of_occurances' => 0,
        'connected_words' => [
            'FOO',
            'FOOBAR',
            'BAR'
          ]
        ]
    ];
    $str = 'Lorem ipsum TEST sit amet, consectetur TESTER elit. Sed in turpis dui. Maecenas venenatis FOOBAR facilisis. Quisque dictum, diam consequat mollis TESTING, orci tellus aliquet nisl, BAR molestie FOO augue at est. In TESTING vehicula lectus. Curabitur ac varius ligula. Pellentesque orci urdna.';
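    // Strip the punctuation that appears in this sample (only "." and ",").
    $str = preg_replace('/[.,]/', '', $str);
    $str = strtolower($str);
    $str_arr = explode(" ", $str);
    // Word => number of occurrences.
    $str_occurance_counts = array_count_values($str_arr);
    foreach ($interesting_words as $k => &$v) {
      foreach ($v['connected_words'] as $cVal) {
        // Default to 0 so a connected word missing from the text raises no notice.
        $v['number_of_occurances'] += $str_occurance_counts[strtolower($cVal)] ?? 0;
      }
    }
    unset($v); // break the reference left behind by the foreach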
    print_r($interesting_words );
    ?>
    

    Live Demo Server1

    Live Demo Server2
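
Removing only `.` and `,` and splitting on single spaces works for the sample sentence, but other punctuation or line breaks would leave tokens such as `testing!` that never match. One possible alternative, sketched here under the assumption that the input is valid UTF-8, is to split on every run of characters that is neither a letter nor a digit. The helper name `tokenize` is hypothetical.

```php
<?php
// Hypothetical helper: lowercase the text and split it on any run of
// characters that is neither a letter nor a digit.
function tokenize(string $text): array
{
    // strtolower() only folds ASCII letters in PHP 8; that is enough for this sample.
    $tokens = preg_split('/[^\p{L}\p{N}]+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

    // preg_split() returns false on invalid UTF-8 input; fall back to an empty list.
    return $tokens ?: [];
}

$text = "TEST, tester!\n(Foo) foo-bar";
print_r(array_count_values(tokenize($text)));
// Counts: test => 1, tester => 1, foo => 2, bar => 1
```

With this tokenizer a hyphenated word such as `foo-bar` is split into two tokens, which may or may not be what you want for nicknames.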

        3
  •  0
  •   cool_benn    7 years ago
    <?php
    
    
    $interesting_words = [
      'test' => [
        'number_of_occurances' => 0,
        'connected_words' => [
            'TEST',
            'TESTER',
            'TESTING'
          ]
        ],
      'foobar' => [
        'number_of_occurances' => 0,
        'connected_words' => [
            'FOO',
            'FOOBAR',
            'BAR'
          ]
        ]
    ];
    
    $testCount=$interesting_words['test']['number_of_occurances'];
    $foobarCount=$interesting_words['foobar']['number_of_occurances'];
    
    $text="Lorem ipsum TEST sit amet, consectetur TESTER elit. Sed in turpis dui. Maecenas venenatis 
    FOOBAR facilisis. Quisque dictum, diam consequat mollis TESTING, orci tellus aliquet nisl, BAR 
    molestie FOO augue at est. In TESTING vehicula lectus. Curabitur ac varius ligula. 
    Pellentesque orci urdna.";
    
    // Split on any whitespace so the line breaks inside $text do not glue words together.
    $arr = preg_split('/\s+/', $text);
    foreach ($arr as $word) {
        // strpos() is a substring test, so 'TEST' also matches TESTER and TESTING,
        // and 'FOO' also matches FOOBAR; separate branches for those were unreachable.
        if (strpos($word, 'TEST') !== false) {
            $testCount++;
        } elseif (strpos($word, 'FOO') !== false || strpos($word, 'BAR') !== false) {
            $foobarCount++;
        }
    }
    echo "Number of occurances for 'test': " . $testCount;
    echo "<br/>";
    echo "Number of occurances for 'foobar': " . $foobarCount;
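
The substring check with `strpos` above would also count unrelated words such as `CONTEST`, and each new group needs another hand-written branch. As a hedged alternative, the sketch below builds a reverse lookup (connected word to group name) once and then walks the text a single time with exact matches only. The helper names `buildLookup` and `countByGroup` are hypothetical, and the `$interesting_words` layout from the question is assumed.

```php
<?php
// Hypothetical helper: map every connected word (lowercased) to its group name.
function buildLookup(array $groups): array
{
    $lookup = [];
    foreach ($groups as $name => $details) {
        foreach ($details['connected_words'] as $word) {
            $lookup[strtolower($word)] = $name;
        }
    }
    return $lookup;
}

// Hypothetical helper: one pass over the text, counting exact word matches per group.
function countByGroup(string $text, array $groups): array
{
    $lookup = buildLookup($groups);
    // Start every group at zero, in the order the groups were declared.
    $counts = array_fill_keys(array_keys($groups), 0);

    foreach (str_word_count(strtolower($text), 1) as $word) {
        if (isset($lookup[$word])) {
            $counts[$lookup[$word]]++;
        }
    }
    return $counts;
}

$groups = [
    'test'   => ['number_of_occurances' => 0, 'connected_words' => ['TEST', 'TESTER', 'TESTING']],
    'foobar' => ['number_of_occurances' => 0, 'connected_words' => ['FOO', 'FOOBAR', 'BAR']],
];

print_r(countByGroup('TEST contest FOO bar', $groups));
// Counts: test => 1, foobar => 2 ("contest" is not an exact match)
```

The work is one pass over the group definitions plus one pass over the words in the text, so adding more groups does not add more loops over the text.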