
Searching for phrases in a document

python

pylearner · 6 years ago

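The task is to match keywords in a paragraph. What I did was break the paragraph into words, put them into a list, and then match them against the search terms in another list.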

Data:

Automatic Product Title Tagging
Aim: To automate the process of product title tagging using manually tagged data. 

ROUTE OPTIMIZATION – Spring Clean
Aim:  Minimizing the overall travel time using optimization techniques. 

CUSTOMER SEGMENTATION:
Aim:  Develop an engine which segments and provides the score for
      customers based on their behavior and analyze their purchasing pattern.

s = ['tagged', 'product title',  'tagging', 'analyze']

skills = []
for word in data.split():
    word = word.lower()  # str.lower() returns a new string, so keep the result
    if word in s:
        skills.append(word)
skills1 = list(set(skills))  # drop duplicates; set order is arbitrary

print(skills1)

['tagged', 'tagging', 'analyze']

When I use the split function, every word is split off on its own, so I cannot detect the two-word phrase product title.

I would appreciate it if someone could help.

4 replies

icedwater PedroMorgan 6 years ago

Iterate over the list s and check whether each element occurs in the string.

Demo:

data = """
 Automatic Product Title Tagging  
 Aim: To automate the process of product title tagging using manually tagged data.
 ROUTE OPTIMIZATION – Spring Clean
 Aim:  Minimizing the overall travel time using optimization techniques.
 CUSTOMER SEGMENTATION:
 Aim:  Develop an engine which segments and provides the score for  
       customers based on their behavior and analyze their purchasing
       pattern. 
"""
s = ['tagged', 'product title',  'tagging', 'analyze']
data = data.lower()

skills = []
for i in s:
    if i.lower() in data:
        skills.append(i)
print(skills)

# The same check written as a list comprehension:
skills = [i for i in s if i.lower() in data]

Output:

['tagged', 'product title', 'tagging', 'analyze']

Leo K 6 years ago

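What you are searching for are not "keywords" but phrases. One solution is a regular expression search. A simple substring in text test will not work well, because when given product title it would also match inside byproduct titles.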

This should do it:

import re
# Iterate over the search terms in s; re.escape treats each phrase literally.
[k for k in s if re.search(r'\b' + re.escape(k) + r'\b', data, flags=re.IGNORECASE)]
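
To make the difference concrete, here is a minimal sketch comparing the plain substring test with the word-boundary regex. The helper names `substring_matches` and `phrase_matches` and the sample sentence are made up for this illustration, and `re.escape` is used on the assumption that search phrases should be matched literally.

```python
import re

def substring_matches(phrases, text):
    """Plain substring test: also matches inside longer words."""
    lowered = text.lower()
    return [p for p in phrases if p.lower() in lowered]

def phrase_matches(phrases, text):
    """Regex test: the phrase must start and end on a word boundary."""
    return [
        p for p in phrases
        if re.search(r'\b' + re.escape(p) + r'\b', text, flags=re.IGNORECASE)
    ]

# Hypothetical sample text, chosen to trigger a false positive.
sample = "We sell byproduct titles and tagged items."
phrases = ['tagged', 'product title', 'tagging', 'analyze']

print(substring_matches(phrases, sample))  # ['tagged', 'product title']
print(phrase_matches(phrases, sample))     # ['tagged']
```

The substring test reports product title because it occurs inside byproduct titles; the `\b` anchors reject it because the match would begin and end in the middle of a word.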

guroosh 6 years ago

2) If you do split, you can search for a match across the words at indices i and i+1.
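
One way to read this suggestion, generalized from two words to phrases of any length, is to slide a window over the split words. This is only a sketch: the function name `find_phrases` and the punctuation-stripping step are assumptions added for the example, not part of the answer.

```python
def find_phrases(text, phrases):
    """Return the phrases found in text by comparing runs of consecutive words."""
    # Strip surrounding punctuation so that "data." compares equal to "data".
    words = [w.strip('.,:;!?()"\'').lower() for w in text.split()]
    words = [w for w in words if w]
    found = []
    for phrase in phrases:
        target = phrase.lower().split()
        n = len(target)
        if n == 0:
            continue
        # Compare every window of n consecutive words (i .. i+n-1) with the phrase.
        if any(words[i:i + n] == target for i in range(len(words) - n + 1)):
            found.append(phrase)
    return found

text = "Aim: To automate the process of product title tagging using manually tagged data."
print(find_phrases(text, ['tagged', 'product title', 'tagging', 'analyze']))
# ['tagged', 'product title', 'tagging']
```

Because whole words are compared, this avoids partial-word matches without needing a regular expression.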

wailinux 6 years ago

"Aim:" has to appear in each entry of "data", so I would find the index of that word ("Aim:").

p = "Automatic Product Title Tagging  Aim: To automate the process of product title tagging using manually tagged data."
index = p.find("Aim:")  # 33
print(p[index:])
# Output:
# Aim: To automate the process of product title tagging using manually tagged data.
w_length = len("Aim:")  # 4, used to skip the word "Aim:" itself
print(p[index + w_length:])
# Output (note the leading space):
#  To automate the process of product title tagging using manually tagged data.

Example:

s = ['tagged', 'product title',  'tagging', 'analyze']
skills = []
for line in data.split("\n"):
    index = line.find("Aim:")
    if index != -1:  # only process lines that contain "Aim:"
        # Skip past "Aim:" and check each remaining word.
        for word in line[index + len("Aim:"):].split():
            if word.lower() in s:
                skills.append(word)
                print(word)
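
Since the loop above still compares single words, it cannot find product title by itself. The following is a rough sketch that combines the "Aim:" idea with a substring check on the collected text. The function name `aim_text`, the sample string, and the rule that an indented line continues the previous Aim are all assumptions made for this illustration.

```python
def aim_text(data, marker="Aim:"):
    """Collect the text after each marker, plus indented continuation lines."""
    parts = []
    in_aim = False
    for line in data.splitlines():
        index = line.find(marker)
        if index != -1:
            parts.append(line[index + len(marker):].strip())
            in_aim = True
        elif in_aim and line[:1].isspace() and line.strip():
            # Assumption: an indented line continues the previous Aim.
            parts.append(line.strip())
        else:
            in_aim = False
    return " ".join(parts)

# Shortened copy of the question's data, laid out as in the question.
sample = (
    "Automatic Product Title Tagging\n"
    "Aim: To automate the process of product title tagging using manually tagged data.\n"
    "CUSTOMER SEGMENTATION:\n"
    "Aim:  Develop an engine which segments and provides the score for\n"
    "      customers based on their behavior and analyze their purchasing pattern.\n"
)

s = ['tagged', 'product title', 'tagging', 'analyze']
text = aim_text(sample).lower()
skills = [phrase for phrase in s if phrase in text]
print(skills)  # ['tagged', 'product title', 'tagging', 'analyze']
```

The titles are skipped and only the Aim text is searched. The plain `in` test keeps the sketch short; it has the same partial-word caveat described in the regex answer.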
