
Getting a URL from a string

  •  5
  • Tyler Carter  · Tech Community  · 16 years ago

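    For a while now I have been searching for code that extracts URLs from a string in PHP. Basically, I want to pull a shortened URL out of a message and then make a HEAD request to find the actual link.

    Does anyone have code that returns the URLs from a string?

    Edit for Ghost Dog:

    Here is the sample I am parsing: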

    $test = "I am testing this application for http://test.com YAY!";
    

    $regex = '$\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[-A-Z0-9+&@#/%=~_|]$i';
    
    preg_match_all($regex, $test, $result, PREG_PATTERN_ORDER);
    $A = $result[0];
    
    foreach($A as $B)
    {
        $URL = GetRealURL($B);
        echo "$URL<BR>";    
    }
    
    
    function GetRealURL( $url ) 
    { 
        $options = array(
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_HEADER         => true,
            CURLOPT_FOLLOWLOCATION => true,
            CURLOPT_ENCODING       => "",
            CURLOPT_USERAGENT      => "spider",
            CURLOPT_AUTOREFERER    => true,
            CURLOPT_CONNECTTIMEOUT => 120,
            CURLOPT_TIMEOUT        => 120,
            CURLOPT_MAXREDIRS      => 10,
        ); 
    
        $ch      = curl_init( $url ); 
        curl_setopt_array( $ch, $options ); 
        $content = curl_exec( $ch ); 
        $err     = curl_errno( $ch ); 
        $errmsg  = curl_error( $ch ); 
        $header  = curl_getinfo( $ch ); 
        curl_close( $ch ); 
        return $header['url']; 
    } 
    

    See the answers for details.

    2 replies  |  until 11 years ago
        1
  •  10
  •   dbr    16 years ago

    This code may help (see MadTechie's latest post):

    http://www.phpfreaks.com/forums/index.php/topic,245248.msg1146218.html#msg1146218

    <?php
    $string = "some random text http://tinyurl.com/9uxdwc some http://google.com random text http://tinyurl.com/787988";
    
    $regex = '$\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[-A-Z0-9+&@#/%=~_|]$i';
    
    preg_match_all($regex, $string, $result, PREG_PATTERN_ORDER);
    $A = $result[0];
    
    foreach($A as $B)
    {
       $URL = GetRealURL($B);
       echo "$URL<BR>";   
    }
    
    
    function GetRealURL( $url ) 
    { 
       $options = array(
          CURLOPT_RETURNTRANSFER => true,
          CURLOPT_HEADER         => true,
          CURLOPT_FOLLOWLOCATION => true,
          CURLOPT_ENCODING       => "",
          CURLOPT_USERAGENT      => "spider",
          CURLOPT_AUTOREFERER    => true,
          CURLOPT_CONNECTTIMEOUT => 120,
          CURLOPT_TIMEOUT        => 120,
          CURLOPT_MAXREDIRS      => 10,
       ); 
    
       $ch      = curl_init( $url ); 
       curl_setopt_array( $ch, $options ); 
       $content = curl_exec( $ch ); 
       $err     = curl_errno( $ch ); 
       $errmsg  = curl_error( $ch ); 
       $header  = curl_getinfo( $ch ); 
       curl_close( $ch ); 
       return $header['url']; 
    }  
    
    ?>
    
        2
  •  2
  •   gahooa    16 years ago

    Something like:

    $matches = array();
    preg_match_all('/http:\/\/[a-zA-Z0-9.-]+\/[a-zA-Z0-9.-]+/', $text, $matches);
    print_r($matches);
    

    You will need to tweak the regexp to get the results you want.
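
    As one possible adjustment, here is a minimal sketch that also accepts https, allows URLs without a path, and drops trailing punctuation such as a comma or full stop. The function name extractUrls and the sample text are assumptions made for illustration; the pattern is a rough heuristic, not a full URL grammar.

```php
<?php
// Hypothetical helper: returns every http/https URL found in $text.
function extractUrls(string $text): array
{
    // Scheme, then any run of characters that are not whitespace, angle
    // brackets or quotes, ending on a character that is not common
    // trailing punctuation (so "http://test.com," yields "http://test.com").
    $pattern = '~\bhttps?://[^\s<>"\']*[^\s<>"\'.,;:!?)]~i';
    preg_match_all($pattern, $text, $matches);
    return $matches[0];
}

$text = "I am testing this application for http://test.com YAY! "
      . "See https://example.org/a?b=1, too.";
print_r(extractUrls($text));
```

    The result is a plain array of matched strings, so it can be passed straight to a loop that resolves each short link.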

    To get the real URL behind the short link, consider something simple:

    curl -I http://url.com/path | grep Location: | awk '{print $2}'
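
    The same idea can be sketched in PHP: send a HEAD request and read the Location header from the response. The helper names lastLocation and headLocation, the 10-second timeout, and the sample header text are assumptions for illustration. This version does not follow redirects, so it reports only the first hop.

```php
<?php
// Hypothetical helper: pull the last Location header out of raw response headers.
function lastLocation(string $rawHeaders): ?string
{
    // Header names are case-insensitive; the m flag lets ^ match at each line start.
    if (!preg_match_all('/^Location:[ \t]*(\S+)/mi', $rawHeaders, $m)) {
        return null;
    }
    return end($m[1]);
}

// Hypothetical helper: send a HEAD request and return the redirect target, if any.
function headLocation(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true,  // HEAD request: no body is downloaded
        CURLOPT_HEADER         => true,  // include headers in the returned string
        CURLOPT_RETURNTRANSFER => true,  // return the response instead of printing it
        CURLOPT_TIMEOUT        => 10,    // assumed value, tune as needed
    ]);
    $raw = curl_exec($ch);
    return $raw === false ? null : lastLocation($raw);
}

// Offline demonstration on canned headers (no network access needed).
$sample = "HTTP/1.1 301 Moved Permanently\r\n"
        . "Location: http://example.com/real-page\r\n"
        . "Content-Length: 0\r\n\r\n";
echo lastLocation($sample), "\n";
```

    Calling headLocation() on a short link would return its redirect target, or null when the request fails or the response carries no Location header.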
