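Below is the code I use to get the "Contents" section of the Wikipedia article for a given topic. I also need help extracting the hierarchy of the headings and adding it to a map. For example, if we search for Coffee, we get: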
1 Etymology
2 History
2.1 Legendary accounts
2.2 Historical transmission
3 Biology
4 Cultivation
4.1 Ecological effects
5 Production
6 Processing
6.1 Roasting
6.2 Grading roasted beans
6.3 Roast characteristics
6.4 Decaffeination
6.5 Storage
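I want to preserve the hierarchy (e.g. 4 and 4.1), that is, each parent node together with its child nodes and their text, and add them to a HashMap as key-value pairs. How can I do this with my code?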
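A minimal sketch of the parent-to-children map, assuming each heading is available as a string such as "2.1 Legendary accounts" (for example, the section number and title read in the loop of the code below, joined by a space). The dotted number already encodes the hierarchy: a heading's parent is the heading whose number is everything before the last dot. The names `TocHierarchy` and `groupByParent` are hypothetical, and keying by title assumes titles are unique within one article.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TocHierarchy {

    // Maps every heading title to the titles of its direct sub-headings.
    // Assumes each line looks like "2.1 Legendary accounts" and that titles are unique.
    public static Map<String, List<String>> groupByParent(List<String> tocLines) {
        // LinkedHashMap keeps the headings in page order
        Map<String, List<String>> hierarchy = new LinkedHashMap<>();
        Map<String, String> titleByNumber = new HashMap<>();
        for (String line : tocLines) {
            String trimmed = line.trim();
            int space = trimmed.indexOf(' ');
            if (space < 0) {
                continue; // not of the form "<number> <title>"
            }
            String number = trimmed.substring(0, space);
            String title = trimmed.substring(space + 1).trim();
            titleByNumber.put(number, title);
            hierarchy.putIfAbsent(title, new ArrayList<>());
            // "4.1" -> parent number "4"; top-level numbers have no dot
            int dot = number.lastIndexOf('.');
            if (dot >= 0) {
                String parentTitle = titleByNumber.get(number.substring(0, dot));
                if (parentTitle != null) {
                    hierarchy.get(parentTitle).add(title);
                }
            }
        }
        return hierarchy;
    }

    public static void main(String[] args) {
        List<String> toc = List.of(
                "1 Etymology", "2 History", "2.1 Legendary accounts",
                "2.2 Historical transmission", "4 Cultivation", "4.1 Ecological effects");
        groupByParent(toc).forEach((parent, children) ->
                System.out.println(parent + " -> " + children));
    }
}
```

Among its output, `main` prints `History -> [Legendary accounts, Historical transmission]` and `Cultivation -> [Ecological effects]`; headings without sub-sections map to an empty list.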
import java.io.IOException;
import java.util.Set;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class WikiToc { // wrapper class name is arbitrary
    // Trailing sections that are not part of the article body
    private static final Set<String> SKIPPED_SECTIONS =
            Set.of("See also", "References", "Bibliography", "External links");

    public static void getWikiNodesForTopic(String url) throws IOException {
        // get() fetches and parses the full page; no parseBodyFragment step needed
        Document doc = Jsoup.connect(url).get();
        Elements elements = doc.select(".toctext");
        for (Element element : elements) {
            if (SKIPPED_SECTIONS.contains(element.text())) {
                continue;
            }
            // In the classic TOC markup the number is a sibling span.tocnumber,
            // so element.select(".tocnumber") on .toctext finds nothing
            System.out.println(element.ownText());
        }
    }
}
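The loop above prints only the titles. To keep the hierarchy at any depth (a sub-section such as 6.1.1 would need a third level), one option is to collect (number, title) pairs and build a tree keyed by section number. Assuming the classic TOC markup, where each `span.toctext` is preceded by a sibling `span.tocnumber`, the number could be read inside the loop with `element.previousElementSibling().text()`; that markup detail is an assumption, and newer Wikipedia skins may use different classes. The sketch uses only the JDK with hard-coded pairs; `TocTree`, `TocNode` and `buildTree` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TocTree {

    // One TOC entry; children keep the page order
    static final class TocNode {
        final String number;
        final String title;
        final List<TocNode> children = new ArrayList<>();

        TocNode(String number, String title) {
            this.number = number;
            this.title = title;
        }
    }

    // Builds a tree from (section number, title) pairs given in page order,
    // e.g. {"6", "Processing"}, {"6.1", "Roasting"}. Works for any depth.
    static TocNode buildTree(List<String[]> entries) {
        TocNode root = new TocNode("", "root");
        Map<String, TocNode> byNumber = new HashMap<>();
        for (String[] entry : entries) {
            TocNode node = new TocNode(entry[0], entry[1]);
            byNumber.put(node.number, node);
            int dot = node.number.lastIndexOf('.');
            // "6.1" -> parent "6"; top-level numbers (and unknown parents) hang off the root
            TocNode parent = dot < 0
                    ? root
                    : byNumber.getOrDefault(node.number.substring(0, dot), root);
            parent.children.add(node);
        }
        return root;
    }

    // Prints the tree with two spaces of indentation per level
    static void print(TocNode node, int depth) {
        for (TocNode child : node.children) {
            System.out.println("  ".repeat(depth) + child.number + " " + child.title);
            print(child, depth + 1);
        }
    }

    public static void main(String[] args) {
        List<String[]> entries = List.of(
                new String[] {"6", "Processing"},
                new String[] {"6.1", "Roasting"},
                new String[] {"6.2", "Grading roasted beans"});
        print(buildTree(entries), 0);
    }
}
```

`main` prints `6 Processing` followed by its two sub-sections indented one level. Keying by section number rather than title avoids collisions when two sections share a name; if a flat `HashMap` is also wanted, a number-to-title map can be filled in the same loop.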