Python fuzzy string matching in Excel: how do I do fuzzy matching on an Excel file with Pandas?

Time: 2023-03-31 13:20:36

I don't think you need to do this in pandas. Here is my sloppy solution, but it gets the output you want by way of a dictionary.

import pandas as pd
from fuzzywuzzy import process

df = pd.DataFrame([
    ['0016F00001c7GDZQA2', 'Daniela Abriani'],
    ['0016F00001c7GPnQAM', 'Daniel Abriani'],
    ['0016F00001c7JRrQAM', 'Nisha Well'],
    ['0016F00001c7Jv8QAE', 'Katherine'],
    ['0016F00001c7cXiQAI', 'Katerine'],
    ['0016F00001c7dA3QAI', 'Katherin'],
    ['0016F00001c7kHyQAI', 'Nursing and Midwifery Council Research Office'],
    ['0016F00001c8G8OQAU', 'Nisa Well']],
    columns=['ID', 'NAME'])

Get the unique hashes into a dictionary.

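# assumed form of this step: map each ID (hash) to its NAME, which is what the loop further down expects
hashdict = dict(zip(df['ID'], df['NAME']))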

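Define the function checkpair. You need it to remove reciprocal hash pairs. The matching loop adds both (hash1, hash2) and (hash2, hash1), but I assume you only want to keep one of each pair:

def checkpair(a, b, l):
    # remove the mirror-image entry (b, ..., a, ...) of the pair (a, b) from l
    for x in l:
        if (a, b) == (x[2], x[0]):
            l.remove(x)
            break  # at most one mirror entry exists; stop instead of iterating a mutated list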
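Removing entries from a list while looping over it is easy to get wrong, so here is a minimal alternative sketch that builds a new list instead. It assumes each match is an (id1, name1, id2, name2, score) tuple, as in the matching loop of this answer; drop_reciprocal_pairs is a hypothetical helper name, not part of fuzzywuzzy or pandas.

```python
def drop_reciprocal_pairs(matches):
    """Keep the first of each unordered (id1, id2) pair and drop its mirror image."""
    seen = set()
    kept = []
    for id1, name1, id2, name2, score in matches:
        pair = frozenset((id1, id2))  # (A, B) and (B, A) give the same frozenset
        if pair in seen:
            continue
        seen.add(pair)
        kept.append((id1, name1, id2, name2, score))
    return kept


# made-up sample data, only to show the shape of the input
sample = [
    ("A", "Katherine", "B", "Katerine", 94),
    ("B", "Katerine", "A", "Katherine", 94),  # mirror of the first entry
    ("A", "Katherine", "C", "Katherin", 94),
]
deduped = drop_reciprocal_pairs(sample)
```

Because it returns a new list, you would assign the result back (matches = drop_reciprocal_pairs(matches)) rather than relying on in-place removal.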

Now iterate over hashdict.items() to find the top 3 matches. The fuzzywuzzy docs describe the process module in more detail.

matches = []
for k, v in hashdict.items():
    # see the docs for extract; ask for 4 because each name is also compared to itself
    top3 = process.extract(v, hashdict, limit=4)
    # remove the hash ID compared to itself (with a dict, each result is (name, score, key))
    top3 = [h for h in top3 if h[2] != k]
    # append tuples to the list "matches" if they meet the score criterion
    matches.extend((k, v, x[2], x[0], x[1]) for x in top3 if x[1] > 60)  # change score?

# remove reciprocal pairs; checkpair only deletes entries later in the list, so this loop is safe
for m in matches:
    checkpair(m[0], m[2], matches)

df = pd.DataFrame(matches, columns=['id1', 'name1', 'id2', 'name2', 'score'])
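If you want to see the top-matches idea without installing fuzzywuzzy, it can be sketched with the standard library. This is a stand-in, not fuzzywuzzy itself: it scores with difflib.SequenceMatcher, so the numbers will generally differ from what process.extract reports. top_matches and its exclude parameter are hypothetical names chosen for illustration.

```python
from difflib import SequenceMatcher


def top_matches(query, choices, limit=3, exclude=None):
    """Return up to `limit` (name, score, key) triples from a dict, best score first."""
    scored = []
    for key, name in choices.items():
        if key == exclude:
            continue  # skip the entry being compared with itself
        ratio = SequenceMatcher(None, query.lower(), name.lower()).ratio()
        scored.append((name, round(100 * ratio), key))
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:limit]


# made-up keys; the names come from the example data in this answer
names = {"id1": "Katherine", "id2": "Katerine", "id3": "Nisha Well"}
best = top_matches(names["id1"], names, limit=2, exclude="id1")
```

Passing exclude up front avoids asking for one extra result and filtering out the self-match afterwards.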

# write to file (passing the path directly avoids writer.save(), which newer pandas versions no longer provide)
df.to_excel('/path/to/your/file.xlsx', sheet_name='Sheet1')

Output:

   id1                 name1           id2                 name2            score
0  0016F00001c7JRrQAM  Nisha Well      0016F00001c8G8OQAU  Nisa Well        95
1  0016F00001c7GPnQAM  Daniel Abriani  0016F00001c7GDZQA2  Daniela Abriani  97
2  0016F00001c7Jv8QAE  Katherine       0016F00001c7dA3QAI  Katherin         94
3  0016F00001c7Jv8QAE  Katherine       0016F00001c7cXiQAI  Katerine         94
4  0016F00001c7dA3QAI  Katherin        0016F00001c7cXiQAI  Katerine         88
