What regular expression can I use to extract a body of XML from a body of unformatted text?

regex .net

Ben McCormack · 14 years ago

Suppose I have the following text:

Call me Ishmael. Some years ago- never mind how long precisely- having little 
or no money in my purse, and nothing particular to interest me on shore, I 
thought I would sail about a little and see the watery part of the world. It is  
<?xml version="1.0" encoding="utf-8"?>
<RootElement xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
     xmlns:xsd="http://www.w3.org/2001/XMLSchema">
   <ChildElement />
   <ChildElement />
</RootElement>
a way I have of driving off the spleen and regulating the circulation. Whenever  
I find myself growing grim about the mouth; whenever it is a damp, drizzly 
November in my soul;

What regular expression can I use to return the XML embedded in this string?

Note: I can assume that <RootElement> and </RootElement> will always have the same name.

2 answers | last activity 14 years ago

SLaks · 14 years ago

If you know that the root element will always be a <RootElement ...> tag, you can do this:

\<\?xml .+?\</RootElement\>

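This regex lazily matches everything from <?xml up to the first </RootElement>. Because the XML spans several lines, run it with RegexOptions.Singleline so that . also matches newlines.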
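A minimal C# sketch of SLaks's pattern, assuming System.Text.RegularExpressions and a root element that is literally named RootElement. The class and method names (FixedRootExtractor, Extract) are hypothetical, chosen only for illustration. It also shows why RegexOptions.Singleline matters: without it, . stops at the first line break and the multi-line XML is never matched.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of any library.
static class FixedRootExtractor
{
    // SLaks's pattern as a verbatim string; assumes the root is named RootElement.
    const string Pattern = @"\<\?xml .+?\</RootElement\>";

    // Returns the embedded XML, or null if nothing matches.
    // Singleline makes '.' match '\n', so the lazy .+? can span lines.
    public static string Extract(string text)
    {
        Match m = Regex.Match(text, Pattern, RegexOptions.Singleline);
        return m.Success ? m.Value : null;
    }

    static void Main()
    {
        // Abridged version of the question's input.
        string text =
            "see the watery part of the world. It is\n" +
            "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n" +
            "<RootElement>\n" +
            "   <ChildElement />\n" +
            "</RootElement>\n" +
            "a way I have of driving off the spleen";

        Console.WriteLine(Extract(text));

        // Without Singleline, '.' cannot cross the newline after the declaration.
        Console.WriteLine(Regex.IsMatch(text, Pattern)); // False
    }
}
```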

Tim Pietzcker · 14 years ago

If you don't know that the root element is always called RootElement:

<\?xml[^>]+>\s*<\s*(\w+).*?<\s*/\s*\1>

Use it with RegexOptions.Singleline.

In C#:

using System.Text.RegularExpressions;
string resultString = Regex.Match(subjectString, @"<\?xml[^>]+>\s*<\s*(\w+).*?<\s*/\s*\1>", RegexOptions.Singleline).Value;
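A minimal, self-contained C# sketch of Tim's approach, assuming System.Text.RegularExpressions. The class and helper names (EmbeddedXmlExtractor, Extract, RootName) are hypothetical, and the root name Voyage is invented to show that the root no longer has to be RootElement. Group 1 captures the opening tag's name, and the backreference \1 forces the closing tag to use that same name.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of any library.
static class EmbeddedXmlExtractor
{
    // Tim Pietzcker's pattern: (\w+) captures the root name, \1 requires the
    // closing tag to repeat it.
    const string Pattern = @"<\?xml[^>]+>\s*<\s*(\w+).*?<\s*/\s*\1>";

    // Returns the embedded XML document, or null if none is found.
    public static string Extract(string text)
    {
        Match m = Regex.Match(text, Pattern, RegexOptions.Singleline);
        return m.Success ? m.Value : null;
    }

    // Returns the captured root element name, or null if none is found.
    public static string RootName(string text)
    {
        Match m = Regex.Match(text, Pattern, RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value : null;
    }

    static void Main()
    {
        // The root here is deliberately not called RootElement, and its
        // attributes span two lines as in the question.
        string text =
            "little or no money in my purse\n" +
            "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n" +
            "<Voyage xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"\n" +
            "        xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">\n" +
            "   <Port />\n" +
            "</Voyage>\n" +
            "November in my soul;";

        Console.WriteLine(RootName(text)); // Voyage
        Console.WriteLine(Extract(text));
    }
}
```

Two limitations follow from the pattern itself: \w+ does not match a colon, so a prefixed root such as <ns:Root> will typically not be matched, and the lazy .*? stops at the first closing tag with the root's name, so a nested element sharing that name ends the match early.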
