I have a dataframe that looks like this:
import re
import pandas as pd

# Each 'Name' entry is one string holding several quoted product names.
dataFrame = pd.DataFrame({
    'Name': [
        "' Verbundmörtel ', ' Compound Mortar ', ' Malta per stucchi e per incollaggio '",
        "' StoLevell In Absolute ', ' StoLevell In Absolute '",
        "' Anhydrit-Fließestrich ', ' Anhydrite Flowing Screed ', ' Massetto a base di anidrite '",
        "' Ansetzmörtel SLP ', ' Attachment mortar SLP ', ' Malta minerale adesiva SLP + iQ-Fix '",
        "' AQUAPANEL Cement Mörtel ', ' AQUAPANEL Cement Mortar '",
        "' Armatop por ', ' Armatop por '",
        "' Armatop por ', ' Armatop por '",
    ],
    'File_name': [
        "esiveCoveringPlaster_2",
        "AdhesiveMortarLevellInForAEVERO_720",
        "AnhydriteFlowingScreed_20",
        "AnsetzmoertelSLPRemmers_21",
        "AquaboardMoertel_655",
        "ArmatopPor479korr_797",
        "ArmatopPor_479",
    ],
})
and these are the keywords I am searching for:
words = ['Mortar','hist','lime',
'loam','adhesive','clay',
'cement','insulation','sealing',
'light','base', 'glue',
'gyps', 'mineral', 'fine',
'Levelling', 'mould','Silicate',
'Porous','Concrete','screed',
'Rendering', 'Silicate','Renovation',
'Perlite','Waterproof','Porous',
'Old', 'Inside', 'por']
I want to get a list of the keywords that were found. I have tried two approaches, but neither gives the expected result.
Approach 1
test = ((dataFrame['Name'] + dataFrame['File_name'])).str.findall('|'.join(words),flags=re.IGNORECASE).map(','.join)
Result 1
0 Mortar
1 Adhesive,Mortar
2 Screed,base,Screed
3 mortar,mineral
4 Cement,Cement,Mortar
5 por,por,Por
6 por,por,Por
Approach 2
test = pd.concat([(dataFrame['Name'] + dataFrame['File_name'])
.str
.contains(word, case=False)
.map({True: word, False: ''})
for word in words], axis=1).agg(list, axis=1).str.join(',').str.strip(',')
Result 2
0 Mortar
1 Mortar,,,,adhesive
2 base,,,,,,,,,screed
3 Mortar,,,,,,,,,,,,,mineral
4 Mortar,,,,,,cement
5 por
6 por
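The runs of commas in Result 2 appear to come from the empty strings that Approach 2 produces for every keyword that does not match: joining them keeps one comma per empty entry, and str.strip(',') only trims the two ends. A minimal sketch of that effect in plain Python, where flags is a made-up stand-in for one row of the concatenated frame:

```python
# Hypothetical stand-in for one row: one entry per keyword,
# '' wherever the keyword did not match.
flags = ['Mortar', '', '', '', 'adhesive']

# Joining keeps a comma for every empty entry; strip(',') only trims the ends.
joined = ','.join(flags).strip(',')
print(joined)  # Mortar,,,,adhesive

# Dropping the empty strings before joining leaves only the matches.
kept = [word for word in flags if word]
print(kept)  # ['Mortar', 'adhesive']
```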
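My goal is to find the words that occur in the two columns and then add the result to the dataframe as a new column. This is the list of words I expect as the result: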
words = [['Mortar'],
['Mortar', 'adhesive'],
['Base', 'screed'],
['Mortar', 'mineral'],
['Mortar', 'cement'],
['por'],
['por']]
I am creating a scatter plot, and the "hue" parameter will have to refer to the second column. I hope I have made myself clear.
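One way to get lists like the expected ones might be to test each keyword separately and keep only those that match, so that every keyword appears at most once and in the order of the keyword list. This is a minimal sketch, not a definitive solution: find_keywords is a hypothetical helper name, and the short keyword list and the sample text are cut down from the data above. It returns the spelling used in the keyword list, so it gives 'base' rather than 'Base'.

```python
import re

def find_keywords(text, keywords):
    """Return every keyword found in text (case-insensitive), once each,
    in the order of the keyword list."""
    found = []
    for keyword in keywords:
        if keyword in found:
            continue  # the keyword list itself may contain duplicates
        # re.escape treats the keyword as literal text, not as a pattern
        if re.search(re.escape(keyword), text, flags=re.IGNORECASE):
            found.append(keyword)
    return found

# Shortened keyword list and one concatenated Name + File_name value
keywords = ['Mortar', 'adhesive', 'base', 'screed', 'por']
text = "' StoLevell In Absolute ', ' StoLevell In Absolute 'AdhesiveMortarLevellInForAEVERO_720"
print(find_keywords(text, keywords))  # ['Mortar', 'adhesive']
```

With the dataframe above, the new column could then be built with something like dataFrame['keywords'] = (dataFrame['Name'] + dataFrame['File_name']).map(lambda s: find_keywords(s, words)), where 'keywords' is an assumed column name. For use as a plot category, each list could be joined into a single string with ','.join.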