Getting the text after an a tag with XPath in lxml:
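from lxml.html import etree, fromstring

# root is assumed to be the already-parsed document, e.g. root = fromstring(html)
reference_titles = root.xpath("//table[@id='vulnrefstable']/tr/td")
for tree in reference_titles:
    a_tag = tree.xpath('a/@href')[0]  # URL of the link in this cell
    title = tree.xpath('a/following-sibling::text()')  # text nodes after the link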
This works for this HTML:
<tr>
<td class="r_average">
<a href="http://somelink.com" target="_blank" title="External url">
http://somelink.com
</a>
<br/> SECUNIA 27633
</td>
</tr>
But it does not give the full text for this HTML:
<tr>
<td class="r_average">
<a href="http://somelink.com" target="_blank" title="External url">
http://somelink.com
</a>
<br/> SECUNIA 27633 <i>Release Date:</i> tomorrow
</td>
</tr>
For the HTML where the text contains an <i> element, I get:
SECUNIA 27633 tomorrow
but I want:
SECUNIA 27633 Release Date: tomorrow
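Using node() instead of text() returns all of the nodes that follow the link, elements included. So I use this in a for loop to create the final string: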
title = tree.xpath('a/following-sibling::node()')
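
The loop that builds the string is not shown above, so here is a minimal sketch of one way it could look, not necessarily what was originally used. The helper name text_after_link and the sample HTML constant are made up for illustration. It relies on lxml returning text nodes from xpath() as str subclasses and on HtmlElement.text_content() returning an element's inner text without its tail.

```python
from lxml.html import fromstring

# Sample markup copied from the question, wrapped in the table the XPath expects.
HTML = """
<table id="vulnrefstable">
<tr>
<td class="r_average">
<a href="http://somelink.com" target="_blank" title="External url">
http://somelink.com
</a>
<br/> SECUNIA 27633
</td>
</tr>
<tr>
<td class="r_average">
<a href="http://somelink.com" target="_blank" title="External url">
http://somelink.com
</a>
<br/> SECUNIA 27633 <i>Release Date:</i> tomorrow
</td>
</tr>
</table>
"""


def text_after_link(td):
    """Hypothetical helper: join everything that follows the <a> in a cell."""
    parts = []
    for node in td.xpath('a/following-sibling::node()'):
        if isinstance(node, str):
            # Text node: lxml hands these back as str subclasses.
            parts.append(node)
        else:
            # Element such as <br/> or <i>: take its inner text only.
            parts.append(node.text_content())
    # Collapse the newlines and repeated spaces left over from the markup.
    return ' '.join(' '.join(parts).split())


if __name__ == '__main__':
    root = fromstring(HTML)
    for td in root.xpath("//table[@id='vulnrefstable']/tr/td"):
        print(td.xpath('a/@href')[0], '|', text_after_link(td))
```

Elements are handled separately from text nodes because the text inside <i> is a child of that element rather than a sibling of the link, which is why following-sibling::text() skips it.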