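The German umlaut characters “ö”, “ä” and “ü” are often replaced with non-accented versions when users type, usually for convenience because they do not have the right keyboard.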
With most accented characters there is a particular non-accented version that most people use. The accented “è”, for instance, is always replaced with a standard “e”.
Umlauts are less consistent: British users will replace them with “o”, “a” and “u” respectively, whereas American users will replace them with “oe”, “ae” and “ue” respectively.
Our search is built on Lucene.Net, and as with any search framework, the technique used to match all combinations of accented characters is to replace them, both when the index is created and when the search criteria are supplied, so that matching can be done purely on non-accented characters.
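To make the replace-on-both-sides idea concrete, here is a minimal sketch in plain Python rather than Lucene.Net. The names `fold_accents`, `build_index` and `search` are made up for illustration, and the dictionary stands in for a real index. It also shows the gap: simple folding covers “Gotz” but not “Goetz”.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Lowercase the text and strip combining marks, e.g. 'Götz' -> 'gotz'."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


def build_index(names):
    """Toy stand-in for an index: folded form -> set of original names."""
    index = {}
    for name in names:
        index.setdefault(fold_accents(name), set()).add(name)
    return index


def search(index, query):
    """Fold the query the same way the indexed terms were folded."""
    return index.get(fold_accents(query), set())


if __name__ == "__main__":
    index = build_index(["Götz", "Hélène"])
    print(search(index, "Götz"))   # German spelling matches
    print(search(index, "Gotz"))   # plain 'o' spelling matches
    print(search(index, "Goetz"))  # 'oe' spelling does not match
```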
How do I parse accented characters to support the following…
A German customer types “Götz”
A British customer types “Gotz”
An American customer types “Goetz”
Given that the name is in our database in its correct form of “Götz”, how would I parse “Götz” so that all three of the users can find it in the index?
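Edit
I found this article on CodeProject, which is exactly what I was looking for. The example shows how to add synonyms of words to the Lucene index so that they are matched at the same time as the original word. With a small adaptation, I was able to do what I wanted.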
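For anyone who wants the gist without Lucene, below is a rough sketch of the synonym idea in plain Python. It is not the code from the CodeProject article and not my Lucene.Net adaptation. The names `UMLAUT_DIGRAPHS`, `index_variants`, `build_index` and `search` are invented for this example, and the dictionary stands in for the real index. Each name is indexed under its original spelling, its stripped spelling and its digraph spelling, so any of the three queries finds it.

```python
import unicodedata

# Assumed digraph convention for the three umlauts discussed above.
UMLAUT_DIGRAPHS = {"ä": "ae", "ö": "oe", "ü": "ue"}


def strip_marks(text: str) -> str:
    """Remove combining marks, e.g. 'götz' -> 'gotz'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def index_variants(term: str) -> set:
    """Return every spelling the term should be indexed under."""
    # NFC first so a decomposed 'o' + diaeresis still hits the digraph table.
    lowered = unicodedata.normalize("NFC", term).lower()
    digraph = "".join(UMLAUT_DIGRAPHS.get(ch, ch) for ch in lowered)
    return {lowered, strip_marks(lowered), strip_marks(digraph)}


def build_index(names):
    """Toy stand-in for an index: each variant -> set of original names."""
    index = {}
    for name in names:
        for variant in index_variants(name):
            index.setdefault(variant, set()).add(name)
    return index


def search(index, query):
    """Look the query up as typed, apart from case and Unicode normalization."""
    key = unicodedata.normalize("NFC", query).lower()
    return index.get(key, set())


if __name__ == "__main__":
    index = build_index(["Götz"])
    for typed in ("Götz", "Gotz", "Goetz"):
        print(typed, search(index, typed))
```

The variants are added only at index time here, so the query does not need any umlaut-specific handling. The cost is a somewhat larger index.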