
Handling different non-accented versions of umlaut characters

  •  1
  • Andy McCluggage hunter  · Tech Community  · 14 years ago

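    The accented German umlaut characters “ö”, “ä” and “ü” are often replaced with non-accented versions when users type them, usually for convenience because they do not have the right keyboard.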

    With most accented characters there is a particular non-accented version that most people use. The accented “è”, for instance, is always replaced with a standard “e”.

    British users will replace them with “o”, “a” and “u” respectively, whereas American users will replace them with “oe”, “ae” and “ue” respectively.

    Our search is built on Lucene.Net, and as with any search framework, the technique used to match all combinations of accented characters is to replace them, both when the index is created and when the search criteria are supplied, so that matching is done purely on non-accented characters.
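To make the replace-on-both-sides idea concrete, here is a minimal sketch in plain Python. It does not use the Lucene.Net API; the dictionary is only a stand-in for a real index, and the names `fold_accents`, `build_index` and `search` are hypothetical. It assumes that "replacing" an accented character means dropping its combining mark after Unicode decomposition.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Lower-case the text and strip combining accent marks."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_index(names):
    """Map each folded name to the set of original names that produced it."""
    index = {}
    for name in names:
        index.setdefault(fold_accents(name), set()).add(name)
    return index


def search(index, query):
    """Fold the query the same way as the indexed terms, then look it up."""
    return index.get(fold_accents(query), set())


# Hypothetical sample data: "Götz" comes from the question, "Irène" is made up.
index = build_index(["Götz", "Irène"])
print(search(index, "irene"))  # matches "Irène"
print(search(index, "Gotz"))   # matches "Götz"
print(search(index, "Goetz"))  # no match: plain folding never produces "oe"
```

The last lookup shows the gap this question is about: folding “ö” to “o” on both sides covers the “Gotz” spelling but not “Goetz”.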

    How would I parse the accented characters to support the following...

    A German customer types “Götz”
    A British customer types “Gotz”
    An American customer types “Goetz”

    Given that the name is in our database in its correct form of “Götz”, then how would I parse “Götz” so that all three of the users can find it in the index?

    Edit

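    I found this article on CodeProject that was exactly what I was looking for. The example shows how synonyms for words can also be added to the Lucene index so that they are matched as well as the original word. With a small adaptation I was able to do exactly what I wanted.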

    2 replies  |  most recent 13 years ago
        1
  •  2
  •   KenE    14 years ago
        2
  •  0
  •   Andy McCluggage hunter    13 years ago

    I found this article on CodeProject that was exactly what I was looking for. The example shows how Synonyms for words can also be added to the Lucene index so that they are matched as well as the original word. With a small adaptation I was able to do exactly what I wanted.
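The synonym idea can be sketched roughly as follows. This is neither the CodeProject article's code nor the Lucene.Net API, just a plain-Python illustration with hypothetical names (`DIGRAPHS`, `variants`, `build_index`, `search`). It assumes each name is indexed under three spellings: the original, the accent-stripped form, and the two-letter form.

```python
import unicodedata

# Two-letter transliterations for the umlauts named in the question.
DIGRAPHS = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue"})


def strip_accents(text):
    """Drop combining marks, so "ö" becomes "o"."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def variants(term):
    """Return the accented, stripped ("o") and digraph ("oe") spellings."""
    lowered = unicodedata.normalize("NFC", term.lower())
    return {lowered, strip_accents(lowered), lowered.translate(DIGRAPHS)}


def build_index(names):
    """Index every name under each of its spellings, like synonyms."""
    index = {}
    for name in names:
        for key in variants(name):
            index.setdefault(key, set()).add(name)
    return index


def search(index, query):
    # Only lower-case the query; every supported spelling is already indexed.
    return index.get(unicodedata.normalize("NFC", query.lower()), set())


index = build_index(["Götz"])
for typed in ("Götz", "Gotz", "Goetz"):
    print(typed, "->", search(index, typed))
```

All three spellings from the question resolve to the stored “Götz”. One limitation of this sketch is that each variant replaces every umlaut in the same way, so a query that mixes styles within one word would not match.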