
What is the best way to avoid a SocketTimeoutException?

jsoup url sockets java

Tony · Tech Community · 11 years ago

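I am writing a small application that scans every page reachable from a URL using depth-first search, so I have to open connections very often. After n pages I usually get a SocketTimeoutException and my application crashes. What is the best way to avoid this? Should I increase the timeout, or do something else? This is how I do it, using recursion: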

public static ArrayList<String> getResponse(String url) throws IOException {
        ArrayList<String> resultList = new ArrayList<>();
        try {
            Document doc = Jsoup.connect(url).get();
            Elements links = doc.select("a");
            int j = 0;

            for (int i = 0; i < links.size(); i++) {
                if (links.get(i).attr("abs:href").contains("http")) {
                    resultList.add(j, links.get(i).attr("abs:href"));
                    j++;
                }
            }
            return resultList;
        } catch (HttpStatusException e) {

            resultList.add(0, "");
            return resultList;
        } catch (SocketTimeoutException e) {
            getResponse(url);
        }
        return resultList;
    }

It should keep sending the request until there is no SocketTimeoutException. Am I right?

2 answers | latest 11 years ago

luksch · 11 years ago

I would change your routine a bit:

public static ArrayList<String> getResponse(String url) throws IOException {
    return getResponse(url, 3);
} 

private static ArrayList<String> getResponse(String url, int retryCount) throws IOException {
    ArrayList<String> resultList = new ArrayList<>();
    if (retryCount <= 0){
        //fail gracefully
        resultList.add(0, "");
        return resultList;
    }
    retryCount--;
    try {
        Document doc = Jsoup.connect(url).timeout(10000).get();
        Elements links = doc.select("a");
        int j = 0;

        for (int i = 0; i < links.size(); i++) {
            if (links.get(i).attr("abs:href").contains("http")) {
                resultList.add(j, links.get(i).attr("abs:href"));
                j++;
            }
        }
        return resultList;
    } catch (HttpStatusException e) {

        resultList.add(0, "");
        return resultList;
    } catch (SocketTimeoutException e) {
        // Timed out: retry with the decremented counter and return that result.
        return getResponse(url, retryCount);
    }
}

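This sets the timeout for each connection to 10 seconds. timeout(0) would wait forever, which is dangerous because your routine might never finish. How long to wait depends on how sure you are that the URL is actually reachable.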

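The second mechanism avoids unbounded recursion, which is probably why your program fails: pass a counter along and retry only while the counter is greater than 0.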
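The same bounded-retry idea can also be written as a loop instead of recursion. The following is only a minimal sketch, not the code from the answer: `RetrySketch`, `LinkFetcher`, and `fetchWithRetries` are hypothetical names, the fetcher stands in for the real network call (for example a Jsoup request), and unlike the routine above it returns an empty list when every attempt times out.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.util.ArrayList;
import java.util.List;

class RetrySketch {

    /** Hypothetical stand-in for the real fetch, e.g. a Jsoup request that returns a page's links. */
    interface LinkFetcher {
        List<String> fetch(String url) throws IOException;
    }

    /** Tries the fetch at most maxAttempts times, retrying only on a socket timeout. */
    static List<String> fetchWithRetries(LinkFetcher fetcher, String url, int maxAttempts)
            throws IOException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return fetcher.fetch(url);
            } catch (SocketTimeoutException e) {
                // Timed out: fall through to the next attempt, if any are left.
            }
        }
        // Every attempt timed out: fail gracefully with an empty result (an assumption of this sketch).
        return new ArrayList<>();
    }

    public static void main(String[] args) throws IOException {
        int[] calls = {0};
        // Simulated fetcher: times out twice, then succeeds.
        LinkFetcher flaky = url -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new SocketTimeoutException("simulated timeout");
            }
            return List.of(url + "/page1");
        };
        System.out.println(fetchWithRetries(flaky, "http://example.com", 3));
        System.out.println("attempts used: " + calls[0]);
    }
}
```

A loop keeps the call stack flat however many retries are allowed, and the attempt limit is visible in one place.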

Jerry Andrews · 11 years ago

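A few things look odd, without digging too deep. (a) What are you using "j" for? (b) It looks like you are opening a new socket for every request (Jsoup.connect(url)), but you never seem to close it. Given the recursion, you could easily have a large number of sockets open at the same time, and the earliest ones will certainly time out and eventually be closed. So I would suggest, first: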

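Close every socket you use once you are done with it, and
Consider limiting the depth of the search in some way, so that you do not end up with thousands of open sockets. Most systems cannot efficiently handle more than a few hundred open sockets at the same time.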
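One way to limit the depth might look like the sketch below. It is an illustration under assumptions, not code from the question: `DepthLimitedCrawl`, `crawl`, and `linksOf` are hypothetical names, and `linksOf` stands in for whatever returns a page's outgoing links (for example the `getResponse` routine). A visited set is added so that pages linking back to each other are not fetched twice.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

class DepthLimitedCrawl {

    /** Returns every URL reached from start by following at most maxDepth links. */
    static Set<String> crawl(String start, int maxDepth,
                             Function<String, List<String>> linksOf) {
        Set<String> visited = new LinkedHashSet<>();
        visit(start, maxDepth, linksOf, visited);
        return visited;
    }

    private static void visit(String url, int depthLeft,
                              Function<String, List<String>> linksOf, Set<String> visited) {
        // Stop when the depth budget is used up or the page was already seen.
        if (depthLeft < 0 || !visited.add(url)) {
            return;
        }
        for (String next : linksOf.apply(url)) {
            visit(next, depthLeft - 1, linksOf, visited);
        }
    }

    public static void main(String[] args) {
        // In-memory link graph standing in for real pages.
        Map<String, List<String>> graph = Map.of(
                "a", List.of("b", "c"),
                "b", List.of("a", "d"),
                "c", List.of(),
                "d", List.of("e"),
                "e", List.of());
        Function<String, List<String>> linksOf = u -> graph.getOrDefault(u, List.of());
        System.out.println(crawl("a", 1, linksOf));
        System.out.println(crawl("a", 2, linksOf));
    }
}
```

Note that a page first reached at the depth limit is marked as visited and not expanded, even if a shorter path to it turns up later; a breadth-first traversal avoids that if it matters.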

I think you need to call "execute()" on the connection object to actually perform the "get()"; I am not sure whether that is related to your problem.