
Get the dynamic number that follows a specific word in a result in Python and store it in a database

web-crawler python

Teslaturing · Tech Community · 7 years ago

Hi, I want to get the number that comes after "citedby" in the result. The number changes with every search.

import scholarly
import re

m = next(scholarly.search_pubs_query('Perception of physical stability and center of mass of 3D objects'))
n = re.search(r'citedby (\d+)', m, re.IGNORECASE)  # raises TypeError: m is a Publication object, not a string

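I'm using scholarly to look up a publication's citations and storing the result in the variable m. Now I want to extract the number that comes after "citedby", for example 34567 in 'citedby': 34567. Please help me, I'm new to Python. I've added a sample result and the error message.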

1 answer | last active 7 years ago

Austin · 7 years ago

You can try re.findall, which returns all non-overlapping matches of the pattern in the string as a list of strings.

import re
m = "Example text 'citedby':34567"  # just an example.
n = re.findall(r"'citedby':\s?(\d+)", m, re.IGNORECASE)
print(' '.join(n))  # 34567
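Since the question used re.search, here is a minimal sketch contrasting it with re.findall and converting the captured digits to an int (for example before storing the value). The sample strings are made up for illustration.

```python
import re

PATTERN = r"'citedby':\s?(\d+)"

# Sample text in the same shape as the example string above.
text = "Example text 'citedby':34567"

# re.search returns the first match (or None); group(1) is the captured digits.
match = re.search(PATTERN, text, re.IGNORECASE)
count = int(match.group(1)) if match else None
print(count)  # 34567

# re.findall returns every captured group as a list of strings.
all_counts = re.findall(PATTERN, "'citedby': 12, 'citedby':7")
print(all_counts)  # ['12', '7']
```

re.search is the simpler choice when you only expect one "citedby" value; re.findall is useful when the text may contain several.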

For your specific problem:

import scholarly
import re

m = next(scholarly.search_pubs_query('Perception of physical stability and center of mass of 3D objects'))
n = re.findall(r"'citedby':\s?(\d+)", str(m), re.IGNORECASE)
print(''.join(n))  # 13

Note: here m is a <class 'scholarly.Publication'> object, and str(m) converts it to a <class 'str'>. re.findall only works on strings.
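The title also asks about storing the number in a database, which the answer does not cover. Below is a minimal sketch, assuming the object's string form contains 'citedby': <n> as in the output shown above. FakePublication is a hypothetical stand-in for the real Publication object so the code runs offline, and the helper names and the citations table are made up for illustration; it uses the standard-library sqlite3 module with an in-memory database.

```python
import re
import sqlite3


class FakePublication:
    """Hypothetical stand-in for scholarly's Publication: only str() matters here."""

    def __str__(self):
        return "{'bib': {'title': 'Example'}, 'citedby': 13}"


CITEDBY = re.compile(r"'citedby':\s?(\d+)", re.IGNORECASE)


def extract_citedby(obj):
    """Return the citation count found in str(obj), or None if absent."""
    match = CITEDBY.search(str(obj))  # str() is required: re accepts only str/bytes
    return int(match.group(1)) if match else None


def store_count(conn, title, count):
    """Insert one (title, count) row into the hypothetical 'citations' table."""
    conn.execute("CREATE TABLE IF NOT EXISTS citations (title TEXT, citedby INTEGER)")
    # Use ? placeholders instead of string formatting to avoid SQL injection.
    conn.execute("INSERT INTO citations (title, citedby) VALUES (?, ?)", (title, count))
    conn.commit()


pub = FakePublication()

# Passing the object itself reproduces the error from the question.
try:
    re.findall(r"'citedby':\s?(\d+)", pub)
except TypeError as exc:
    print("TypeError:", exc)

count = extract_citedby(pub)
conn = sqlite3.connect(":memory:")  # pass a file path instead to keep the data
store_count(conn, "Example", count)
print(conn.execute("SELECT title, citedby FROM citations").fetchall())  # [('Example', 13)]
```

With the real library you would pass m instead of pub; the exact text that str(m) produces depends on the scholarly version, so check it before relying on this pattern.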
