
Parsing a Wikipedia URL: failed to open stream: HTTP request failed

php
  •  -1
  • 4532066  · Tech Community  · 7 years ago

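    I'm working on a simple PHP page that does the following:

    1. Gets a search string from the URL query string (e.g. police officer)
    2. Appends the search string to the Wikipedia search URL ( https://en.wikipedia.org/w/index.php?search=police+officer )
    3. Uses cURL to get the final redirected URL for that search string
    4. Checks whether the redirected URL contains index.php?search and, if it does, does nothing
    5. Otherwise, splits the redirected URL and takes the last value from it ( Police_officer )
    6. Appends that value to the Wikipedia URL that returns JSON data for the wiki record ( https://en.wikipedia.org/api/rest_v1/page/summary/Police_officer )
    7. Uses file_get_contents() to read the JSON data and return values from it, e.g. title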
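
    Steps 4 to 6 can be sketched as one small function. This is only an illustration, not the code from the post: the name summary_url_from_redirect is made up, and it assumes the article title is the last path segment, so titles that themselves contain a slash would need extra handling.

```php
<?php
// Hypothetical helper (not from the original post): maps the final redirected
// URL to the summary API URL, or returns null for a search-results page.
function summary_url_from_redirect(string $redirectedUrl): ?string
{
    // Step 4: a search-results page still has the index.php?search=... form.
    if (strpos($redirectedUrl, 'index.php?search') !== false) {
        return null;
    }

    // Use only the path, so a query string or fragment is never taken as the title.
    $path = parse_url($redirectedUrl, PHP_URL_PATH);
    if (!is_string($path) || $path === '') {
        return null;
    }

    // Step 5: the article title is assumed to be the last path segment.
    $segments = explode('/', rtrim($path, '/'));
    $title = end($segments);
    if ($title === '') {
        return null;
    }

    // Step 6: append the title to the summary endpoint used in the post.
    return 'https://en.wikipedia.org/api/rest_v1/page/summary/' . $title;
}

echo summary_url_from_redirect('https://en.wikipedia.org/wiki/Police_officer'), PHP_EOL;
```

    Returning null for the search-results case keeps the "do nothing" branch explicit, and comparing strpos() against false avoids relying on the match position being greater than zero.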

    For some reason, on this line of code:

    $json = file_get_contents($url_json);
    

    where $url_json is

    https://en.wikipedia.org/api/rest_v1/page/summary/Santa_claus
    

    I get this error:

    Warning: file_get_contents(https://en.wikipedia.org/api/rest_v1/page/summary/Santa_claus): failed to open stream: HTTP request failed! HTTP/1.1 404 Not Found in C:\xampp\public_html\test.php on line 49
    

    But I can open that URL in a browser and see the same kind of data as at this URL:

    https://en.wikipedia.org/api/rest_v1/page/summary/Police_officer
    

    For that one, file_get_contents returns data.

    I used this code:

    function get_http_response_code($url) {
        $headers = get_headers($url);
        return substr($headers[0], 9, 3);
    }
    

    to confirm that the response code for both pages is 200.
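
    One thing worth checking here: get_headers() follows redirects by default and returns the headers of every response in the chain, so $headers[0] is the status line of the first response, not necessarily the last one. Also, in the test code the check is called on $redirected_url, not on $url_json. A minimal sketch of reading the last status line instead, where final_status_code is a made-up name and the header list is invented sample data:

```php
<?php
// Hypothetical helper (not from the original post): returns the status code of
// the LAST status line in a header list, or null if there is none.
function final_status_code(array $headers): ?int
{
    $code = null;
    foreach ($headers as $line) {
        // Status lines look like "HTTP/1.1 200 OK" or "HTTP/2 404".
        if (is_string($line) && preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1]; // later status lines overwrite earlier ones
        }
    }
    return $code;
}

// Invented sample in the shape get_headers() returns: a redirect, then a 404.
$sample = [
    'HTTP/1.1 302 Found',
    'Location: https://example.org/next',
    'HTTP/1.1 404 Not Found',
    'Content-Type: application/json',
];
echo final_status_code($sample), PHP_EOL;
```

    With real data this would be called as final_status_code(get_headers($url_json)), after checking that get_headers() did not return false.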

    Here is my basic test code:

    $var = $_GET['var'];
    $var = str_replace(" ", "+", $var);
    
    $url1 = "https://en.wikipedia.org/w/index.php?search=$var";
    
    echo "<hr /> url1: $url1 <hr />";
    
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url1);
    curl_setopt($ch, CURLOPT_HEADER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $a = curl_exec($ch);
    $redirected_url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    
    echo "<hr /> url2: $redirected_url <hr />";
    
    $url_search = strpos($redirected_url, "index.php?search");
    
    echo "<hr /> url_search: $url_search <hr />";
    
    function get_http_response_code($url) {
        $headers = get_headers($url);
        return substr($headers[0], 9, 3);
    }
    
    $url_response = get_http_response_code($redirected_url);
    
    echo "<hr /> url_response: $url_response <hr />";
    
    if ($url_search > 0) {
    
        // do nothing
    
    } else {
    
        $tmp = explode('/', $redirected_url);
        $end = end($tmp);
    
        $url_json = "https://en.wikipedia.org/api/rest_v1/page/summary/$end";
    
        echo "<hr /> url_json: $url_json <hr />";
    
        $json = file_get_contents($url_json);
    
        if ($json) {
    
            $data = json_decode($json, TRUE);
    
            if ($data) {
                $wiki_page = $data['content_urls']['desktop']['page'];
                echo "<hr /> wiki_page: $wiki_page <hr />";
            }
    
        }
    
    }
    

    What am I missing?

    1 answer
  •  0
  •   4532066    7 years ago

    Fixed it: I used cURL instead of file_get_contents.

    $var = $_GET['var'];
    $var = str_replace(" ", "+", $var);
    
    $url1 = "https://en.wikipedia.org/w/index.php?search=$var";
    
    echo "<hr /> url1: $url1 <hr />";
    
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url1);
    curl_setopt($ch, CURLOPT_HEADER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $a = curl_exec($ch);
    $redirected_url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    
    echo "<hr /> url2: $redirected_url <hr />";
    
    $url_search = strpos($redirected_url, "index.php?search");
    
    echo "<hr /> url_search: $url_search <hr />";
    
    function get_http_response_code($url) {
        $headers = get_headers($url);
        return substr($headers[0], 9, 3);
    }
    
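    // cURL-based replacement for file_get_contents(); returns the body or false.
    function file_get_contents_curl($url) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
        // 2 checks that the certificate matches the host name (3 is not a valid value).
        curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
        // Disables certificate verification; only acceptable for local testing.
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
        $html = curl_exec($ch);
        curl_close($ch);
        return $html;
    }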
    
    $url_response = get_http_response_code($redirected_url);
    
    echo "<hr /> url_response: $url_response <hr />";
    
    if ($url_search > 0) {
    
        // do nothing
    
    } else {
    
        $tmp = explode('/', $redirected_url);
        $end = end($tmp);
    
        $url_json = "https://en.wikipedia.org/api/rest_v1/page/summary/$end";
    
        echo "<hr /> url_json: $url_json <hr />";
    
        //$json = file_get_contents($url_json);
    
        $json = file_get_contents_curl($url_json);
    
        echo "<hr /> json: $json <hr />";
    
        if ($json) {
    
            $data = json_decode($json, TRUE);
    
            // $data is an array, so print_r() is needed to display its contents.
            echo "<hr /> data: " . print_r($data, true) . " <hr />";
    
            if ($data) {
                $wiki_page = $data['content_urls']['desktop']['page'];
                echo "<hr /> wiki_page: $wiki_page <hr />";
            }
    
        }
    
    }
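
    The post does not say why the cURL version works where file_get_contents() failed, so the sketch below does not try to explain that. It only shows two checks that make this kind of failure easier to diagnose: returning the HTTP status together with the body, and reading content_urls.desktop.page without assuming the keys exist. The function names, the User-Agent string and the sample JSON are all made up for illustration.

```php
<?php
// Hypothetical helper: fetches a URL with cURL and returns [status, body].
// The body is null when the transfer itself fails.
function fetch_with_status(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    // Placeholder User-Agent; replace it with one that identifies your script.
    curl_setopt($ch, CURLOPT_USERAGENT, 'ExampleWikiTest/0.1 (contact@example.org)');
    $body = curl_exec($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    return [$status, $body === false ? null : $body];
}

// Hypothetical helper: extracts the desktop page URL from a summary JSON
// string, or returns null if the JSON is invalid or the keys are missing.
function desktop_page_from_summary(?string $json): ?string
{
    if ($json === null) {
        return null;
    }
    $data = json_decode($json, true);
    if (!is_array($data)) {
        return null;
    }
    $page = $data['content_urls']['desktop']['page'] ?? null;
    return is_string($page) ? $page : null;
}

// Invented sample with the same shape the code in the post reads.
$sample = '{"title":"Police officer","content_urls":{"desktop":{"page":"https://en.wikipedia.org/wiki/Police_officer"}}}';
echo desktop_page_from_summary($sample), PHP_EOL;
```

    In the script above this would be used as [$status, $json] = fetch_with_status($url_json); followed by a check that $status is 200 before decoding, so a 404 shows up as a status code instead of a warning.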
    