
Handling the different non-accented versions of umlaut characters

diacritics lucene.net lucene internationalization

Andy McCluggage hunter · 14 years ago

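The accented German umlaut characters "ö", "ä" and "ü" are often replaced with non-accented versions when typed by users, usually for convenience when they don't have the correct keyboard.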

With most accented characters there is a particular non-accented version that most people use. The accented "è", for instance, is always replaced with a standard "e".

The umlauts have no single convention. British users will replace them with "o", "a" and "u" respectively, whereas American users will replace them with "oe", "ae" and "ue" respectively.

Our search is built on Lucene.Net, and as with any search framework, the technique used to match all combinations of accented characters is to replace them, both when the index is created and when the search criteria are supplied, so that the matching is done with purely non-accented characters.
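To make the replace-on-both-sides idea concrete, here is a minimal sketch in plain Python rather than Lucene.Net. The names `fold_accents` and `normalize_term` are hypothetical helpers, and the dictionary only stands in for a real index. It assumes folding means "decompose the character and drop the combining mark", which is one common way to do it.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Map each accented letter to its base letter (hypothetical helper)."""
    # NFD splits "ö" into "o" followed by a combining diaeresis.
    decomposed = unicodedata.normalize("NFD", text)
    # Dropping the combining marks leaves only the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_term(text: str) -> str:
    """Apply the same folding at index time and at query time."""
    return fold_accents(text).lower()


# A dict standing in for the index: folded term -> stored name.
index = {normalize_term(name): name for name in ["Götz", "Père"]}

print(index.get(normalize_term("Götz")))   # Götz
print(index.get(normalize_term("Gotz")))   # Götz
print(index.get(normalize_term("Goetz")))  # None
```

With this kind of one-to-one folding the "Götz" and "Gotz" spellings meet in the index, but "Goetz" does not, which is the gap the question is about.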

How can I parse the accented characters to support the following...

German customer types "Götz"
British customer types "Gotz"
American customer types "Goetz"

Given that the name is in our database in its correct form of "Götz", how would I parse "Götz" so that all three of the users can find it in the index?

Edit

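I found this article on CodeProject that was exactly what I was looking for. The example shows how synonyms for words can also be added to the Lucene index so that they are matched as well as the original word. With a small adaptation I was able to do exactly what I wanted.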

2 replies | latest 13 years ago

KenE 14 years ago

setPositionIncrement(0)

Andy McCluggage hunter 13 years ago

I found this article on CodeProject that was exactly what I was looking for. The example shows how Synonyms for words can also be added to the Lucene index so that they are matched as well as the original word. With a small adaptation I was able to do exactly what I wanted.
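For illustration only, here is a rough sketch of the synonym idea in plain Python. It is not the CodeProject code and not Lucene.Net's API. The mapping `UMLAUT_DIGRAPHS` and the helpers `index_tokens` and `query_term` are hypothetical names. Each word yields its folded form plus a digraph variant, and the variant carries a position increment of 0 so that it sits at the same position as the first token, in the spirit of the `setPositionIncrement(0)` hint in the earlier reply.

```python
import unicodedata

# Assumed digraph convention for German umlauts (ß is not handled here).
UMLAUT_DIGRAPHS = {"ä": "ae", "ö": "oe", "ü": "ue",
                   "Ä": "Ae", "Ö": "Oe", "Ü": "Ue"}


def strip_marks(text):
    """Drop combining marks, so "ö" becomes "o"."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def expand_digraphs(text):
    """Replace each umlaut with its two-letter spelling, so "ö" becomes "oe"."""
    return "".join(UMLAUT_DIGRAPHS.get(ch, ch) for ch in text)


def index_tokens(word):
    """Return (token, position_increment) pairs to index for one word."""
    # Compose first so that "o" + combining diaeresis is seen as "ö".
    composed = unicodedata.normalize("NFC", word)
    variants = [strip_marks(composed).lower()]
    digraph = strip_marks(expand_digraphs(composed)).lower()
    if digraph not in variants:
        variants.append(digraph)
    # The first token advances the position; variants share it (increment 0).
    return [(tok, 1 if i == 0 else 0) for i, tok in enumerate(variants)]


def query_term(text):
    """Fold a search term the simple way; no expansion is needed at query time."""
    return strip_marks(unicodedata.normalize("NFC", text)).lower()


tokens = index_tokens("Götz")
indexed = {tok for tok, _ in tokens}
for typed in ["Götz", "Gotz", "Goetz"]:
    print(typed, query_term(typed) in indexed)
```

Expanding only at index time keeps the query side simple: whichever spelling the customer types is folded once and looked up against both stored variants.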
