
The correct way to rewrite "curl ..." in Perl

  •  2
  • TheAdam122  ·  7 years ago

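    I wrote a program that requests a web page's source and its response headers, and now I need it to run cross-platform. I currently use the external command curl (on Linux) to do this. I get the source like this: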

    #!/usr/bin/perl -w
    
    use strict;
    
    #declaring variables here#
    
    my $result = `curl 'https://$host$request' -H 'Host: $host' -H 'User-Agent: $useragent' -H 'Accept: $accept' -H 'Accept-Language: $acceptlanguage' --compressed -H 'Cookie: $cookie' -H 'DNT: $dnt' -H 'Connection: $connection' -H 'Upgrade-Insecure-Requests: $upgradeinsecure' -H 'Cache-Control: $cachecontrol'`;
    print "$result\n";
    

    And the response headers like this:

    #!/usr/bin/perl -w
    
    use strict;
    
    #declaring variables here#
    
    my $result = `curl -I 'https://$host$request' -H 'Host: $host' -H 'User-Agent: $useragent' -H 'Accept: $accept' -H 'Accept-Language: $acceptlanguage' --compressed -H 'Cookie: $cookie' -H 'DNT: $dnt' -H 'Connection: $connection' -H 'Upgrade-Insecure-Requests: $upgradeinsecure' -H 'Cache-Control: $cachecontrol'`;
    print "$result\n";
    

    These work fine, but I need to make the requests from within Perl rather than through an external command. I use LWP::UserAgent to get the source:

    #!/usr/bin/perl -w
    
    use strict;
    use LWP::UserAgent;
    
    #declaring variables here#
    
    my $ua = LWP::UserAgent->new;
    my $req = HTTP::Request->new(GET => "https://$host$request HTTP/1.1");
    $req->header('Host' => "$host");
    $req->header('User-Agent' => "$useragent");
    $req->header('Accept' => "$accept");
    $req->header('Accept-Language' => "$acceptlanguage");
    $req->header('Accept-Encoding' => "$acceptencoding");
    $req->header('Cookie' => "$cookie");
    $req->header('DNT' => "$dnt");
    $req->header('Connection' => "$connection");
    $req->header('Upgrade-Insecure-Requests' => "$upgradeinsecure");
    $req->header('Cache-Control' => "$cachecontrol");
    
    my $resp = $ua->request($req);
    if ($resp->is_success) {
        my $message = $resp->decoded_content;
        print "$message\n";
    }
    

    This sometimes works fine, but sometimes decoded_content returns nothing. I do get a response, and I can read it with content, but it is still encoded.

    And requesting the response headers with LWP::UserAgent is not possible, so I use Net::HTTP:

    #!/usr/bin/perl -w
    
    use strict;
    use Net::HTTP;
    
    #declaring variables here#
    
    my $s = Net::HTTP->new(Host => "$host") || die $@;
    $s->write_request(GET => "$request", 'Host' => "$host", 'User-Agent' => "$useragent", 'Accept' => "$accept", 'Accept-Language' => "$acceptlanguage", 'Accept-Encoding' => "$acceptencoding", 'Cookie' => "$cookie", 'DNT' => "$dnt", 'Connection' => "$connection", 'Upgrade-Insecure-Requests' => "$upgradeinsecure", 'Cache-Control' => "$cachecontrol");
    
    my @headers;
    
    while(my $line = <$s>) {
        last unless $line =~ /\S/;
        push @headers, $line;
    }
    print @headers;
    

    This returns:

    HTTP/1.1 302 Found
    Content-Type: text/html; charset=UTF-8
    Connection: close
    Content-Length: 0
    

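    Is something wrong with my syntax? Am I using the wrong tools? I know WWW::Curl::Easy can request the source and the headers at the same time, but I don't know how to pass my variables to its request. Could someone tell me what the problem is, or correctly rewrite these requests with the same variables using WWW::Curl::Easy? A solution that uses WWW::Curl::Easy would be appreciated. Thanks in advance.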

    1 answer
  •  2
  •   Kjetil S.    7 years ago

    With LWP, you can get the response headers in several ways. Demonstrated here:

    use LWP::UserAgent;
    my($host,$request) = ('example.com', '/my/request');
    my @header=( [Host         => $host],
                 ['User-Agent' => 'James Bond 2.0'],
                 [Accept       => 'text/plain'],
                 [Cookie       => 'cookie=x'],
               );
    my $ua = LWP::UserAgent->new;
    my $req = HTTP::Request->new(GET => "https://$host$request");  # don't append "HTTP/1.1" to the URL
    $req->header(@$_) for @header;
    my $resp = $ua->request($req);
    
    if ($resp->is_success) {
    
        my %h; $resp->headers->scan( sub{ $h{shift()}=shift() } );
        printf "Header name: %-30s  Value: %-30s\n", $_, $h{$_} for sort keys %h;
    
        print "\n<<<".$resp->headers()->as_string.">>>\n\n"; #all header lines in one big string
    
        print $resp->header('Content-Type'),"\n\n";  #get one specific header line
    
        my $content = $resp->decoded_content;
        print "$content\n";
    }
    

    Note: "HTTP/1.1" should not be part of the string that follows GET => .
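    To see why the suffix matters, here is a small sketch that builds both requests without sending them and compares the paths they would ask for. It assumes HTTP::Request (installed together with LWP) is available; example.com and /my/request are placeholder values, as in the code above.

```perl
use strict;
use warnings;
use HTTP::Request;

# Placeholder host and path, for illustration only.
my ($host, $request) = ('example.com', '/my/request');

# The question's form: the protocol version is glued onto the URL.
my $bad  = HTTP::Request->new(GET => "https://$host$request HTTP/1.1");
# The corrected form: the URL only.
my $good = HTTP::Request->new(GET => "https://$host$request");

# The URL is parsed as a whole, so the suffix ends up inside the path
# and the server is asked for a different resource.
print "with suffix:    ", $bad->uri->path,  "\n";
print "without suffix: ", $good->uri->path, "\n";
```

    Nothing goes over the network here. The two printed paths differ, which would explain a redirect or an empty body from a server that does not know the longer path.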

    And when calling curl as a subprocess, you don't need to call it twice. You can use -i like this:

    my $response = ` curl -s -i "http://somewhere.com/path" -H 'User-Agent: Yes' `;
    my($headers,$content) = split /\cM?\cJ\cM?\cJ/, $response, 2;
    print "Headers: <<<$headers>>>\n\n";
    print "Content: <<<$content>>>\n\n";
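
    Once the header block is split off, it can be broken down further into a status code and a hash of header fields. The following is a minimal sketch using only core Perl. The canned $response string is made-up data standing in for real curl -s -i output, and the variable names are illustrative.

```perl
use strict;
use warnings;

# Made-up response standing in for the output of `curl -s -i ...`.
my $response = join "\r\n",
    'HTTP/1.1 200 OK',
    'Content-Type: text/html; charset=UTF-8',
    'Content-Length: 13',
    '',
    'Hello, world!';

# Split the header block from the body at the first blank line.
my ($head, $body) = split /\r?\n\r?\n/, $response, 2;

# The first header line is the status line; the rest are "Name: value" pairs.
my ($status_line, @lines) = split /\r?\n/, $head;
my ($status) = $status_line =~ m{^HTTP/\S+\s+(\d{3})};

my %header;
for my $line (@lines) {
    my ($name, $value) = split /:\s*/, $line, 2;
    $header{lc $name} = $value;   # header names are case-insensitive
}

print "Status:       $status\n";
print "Content-Type: $header{'content-type'}\n";
print "Body:         $body\n";
```

    This sketch keeps only the last value when a header name repeats, and it assumes a single header block. If curl follows redirects (for example with -L), the output contains one header block per response, so the split would need to be repeated.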