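I have the following function, which uses a regular expression to split text into sentences. In testing, however, the regex does not work well on some examples and the text is split in the wrong place. For example, if the text contains "St. Bernard", I do not want the sentence to be split after "St.".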
As a workaround, I modified the regex so that it ignores a set of exceptions. Please see here.
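I now want to build this into Excel so that any user can supply their own exceptions, but I am struggling to pass the user parameter into the function's regex string.
Below is what I am trying to achieve (with |flam|liq|St stated explicitly in the regex):
Regex: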
\s*((?:\b(?:[djms]rs?|flam|liq|St)\.|\b(?:[a-z]\.){2,}|\.\d[\d.]*|\.(?:com|net|org)\b|[^.?!])+(?:[.?!]+|$))
https://regex101.com/r/nXf0TM/6
However, what I actually want to achieve is:
\s*((?:\b(?:[djms]rs?|"&Exceptions&")\.|\b(?:[a-z]\.){2,}|\.\d[\d.]*|\.(?:com|net|org)\b|[^.?!])+(?:[.?!]+|$))
where Exceptions is the user-supplied value, i.e. the output of the Exceptions query shown further down.
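For reference, the intended behaviour can be sketched in plain JavaScript, which is the engine that fnRegexReplace ultimately hands the pattern to. This is only an illustration and not the M code: buildSentenceRegex and splitSentences are hypothetical helper names, and the exceptions are assumed to arrive as plain strings that are joined with | and spliced into the pattern right after [djms]rs?.

```javascript
// Hypothetical helper: builds the sentence pattern from a list of
// user-supplied abbreviations (assumed to be plain strings).
function buildSentenceRegex(exceptions) {
  // Escape regex metacharacters so an entry cannot alter the pattern.
  const escaped = exceptions.map(e => e.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));
  // Put the built-in abbreviations first, then the user's entries.
  const alternation = ["[djms]rs?", ...escaped].join("|");
  return new RegExp(
    String.raw`\s*((?:\b(?:${alternation})\.|\b(?:[a-z]\.){2,}|\.\d[\d.]*|\.(?:com|net|org)\b|[^.?!])+(?:[.?!]+|$))`,
    "gmi"
  );
}

// Hypothetical helper: mirrors the "$1|" replacement followed by a split
// on "|" and removal of empty entries, as the query does.
function splitSentences(text, exceptions) {
  return text
    .replace(buildSentenceRegex(exceptions), "$1|")
    .split("|")
    .filter(s => s !== "");
}

const sample =
  "Highly Flammable Liquid Flam. H223 Liq. H334. St. Bernard Dog was present.";

console.log(splitSentences(sample, ["Flam", "Liq", "St"]));
// [ 'Highly Flammable Liquid Flam. H223 Liq. H334.',
//   'St. Bernard Dog was present.' ]
```

With an empty exception list the same helper splits "St. Bernard Dog was present." into "St." and "Bernard Dog was present.", which is the behaviour I am trying to avoid.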
The M code that attempts this results in an error:
let
    Exceptions = Exceptions,
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Text", type text}}),
    #"Replaced Value1" = Table.ReplaceValue(#"Changed Type","#(lf)"," ",Replacer.ReplaceText,{"Text"}),
    #"Replaced Value" = Table.ReplaceValue(#"Replaced Value1","'","&apos",Replacer.ReplaceText,{"Text"}),
    #"Invoked Custom Function" = Table.AddColumn(#"Replaced Value", "fnRegexReplace", each fnRegexReplace([Text], "\s*((?:\b(?:[djms]rs?"&Exceptions&")\.|\b(?:[a-z]\.){2,}|\.\d[\d.]*|\.(?:com|net|org)\b|[^.?!])+(?:[.?!]+|$))", "$1|")),
    #"Removed Other Columns" = Table.SelectColumns(#"Invoked Custom Function",{"fnRegexReplace"}),
    #"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Removed Other Columns", {{"fnRegexReplace", Splitter.SplitTextByDelimiter("|", QuoteStyle.None), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "fnRegexReplace"),
    #"Filtered Rows" = Table.SelectRows(#"Split Column by Delimiter", each ([fnRegexReplace] <> ""))
in
    #"Filtered Rows"
Exceptions:
let
    Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
    #"Added Custom" = Table.AddColumn(Source, "Custom", each "Exceptions"),
    #"Grouped Rows" = Table.Group(#"Added Custom", {"Custom"}, {{"Exceptions", each Text.Combine([#"Do not split if:"],"|"), type text}})
in
    #"Grouped Rows"
fnRegexReplace
(x,y,z)=>
let
y = Text.Replace(y,"\","\\"),
Source = Web.Page(
"<script>var x="&"'"&x&"'"&";var z="&"'"&z&
"'"&";var y=new RegExp('"&y&"','gmi');
var b=x.replace(y,z);document.write(b);</script>")
[Data]{0}[Children]{0}[Children]{1}[Text]{0}
in
Source
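The Text.Replace(y,"\","\\") step in fnRegexReplace matters because the pattern is pasted into a single-quoted JavaScript string literal, where a lone backslash is read as a string escape. Below is a rough JavaScript sketch of that effect. It uses a shortened stand-in pattern, and asJsLiteralValue is a hypothetical helper that mimics pasting text between single quotes in a script.

```javascript
// Hypothetical helper: evaluates the text as if it had been pasted between
// single quotes in a script, which is what the generated <script> does.
function asJsLiteralValue(patternText) {
  return new Function("return '" + patternText + "';")();
}

// Shortened stand-in for the full pattern (an assumption, for readability).
const pattern = String.raw`\s*\b(?:St)\.`;

// Pasted as-is: \s, \b and \. are consumed as string escapes.
const naive = asJsLiteralValue(pattern);

// With every backslash doubled first, the regex text survives intact.
const doubled = asJsLiteralValue(pattern.replace(/\\/g, "\\\\"));

console.log(naive === pattern);   // false
console.log(doubled === pattern); // true
```

The same reasoning explains why single quotes in the text are replaced before the call: an unescaped ' would end the string literal early.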
Error:
Highly Flammable Liquid Flam. H223 Liq. H334.
St. Bernard Dog was present.
The MW of gold is 100.1. Solubility is 40mg/L.
I am sure this is an easy fix, but whatever I try (e.g. Record.FromTable, {0}, etc.) I get various errors.
It would be great if someone could help.