
Is there a way to recursively find the innermost nodes of an XML document using C# or VB?

xpath xml .net

Thunder · Tech Community · 15 years ago

I have an XML file, say:

<items>
  <item1>
    <piece>1300</piece>
    <itemc>665583</itemc>
  </item1>
  <item2>
    <piece>100</piece>
    <itemc>665584</itemc>
  </item2>
</items>

I am trying to write a C# application that gets the XPaths to all of the innermost nodes, for example:

items/item1/piece
items/item1/itemc
items/item2/piece
items/item2/itemc

Is there a way to do this in C# or VB? Thanks in advance for any possible solution.

7 answers | most recent 13 years ago

A9S6 15 years ago

Here you go:

using System.Xml;

class Program
{
    static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.Load(@"C:\Test.xml");

        // Start below the root element; its name seeds every path.
        foreach (XmlNode node in doc.DocumentElement.ChildNodes)
        {
            ProcessNode(node, doc.DocumentElement.Name);
        }
    }

    // Must be static to be callable from the static Main.
    private static void ProcessNode(XmlNode node, string parentPath)
    {
        // A leaf is a node with no children, or with a single text child.
        if (!node.HasChildNodes
            || ((node.ChildNodes.Count == 1) && (node.FirstChild is XmlText)))
        {
            System.Diagnostics.Debug.WriteLine(parentPath + "/" + node.Name);
        }
        else
        {
            foreach (XmlNode child in node.ChildNodes)
            {
                ProcessNode(child, parentPath + "/" + node.Name);
            }
        }
    }
}

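The code above produces the required output for any file of this kind. Add checks where needed. The key point is that text nodes (the text inside an element) are left out of the output.

helios 15 years ago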

//*[not(*)]

is the XPath that finds all descendant elements that have no child elements, so you can do something like

doc.SelectNodes("//*[not(*)]")

but I'm not very sure about the .NET API, so check it.

Reference

// --> descendant (not only children)
*  --> any name
[] --> predicate to evaluate
not(*) --> not having children
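
Since the .NET side is left open here, this is a minimal sketch of that call in C# with XmlDocument.SelectNodes. It assumes the question's sample XML inlined as a string; the class name LeafXPathDemo and the helper LeafNames are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

static class LeafXPathDemo
{
    // Hypothetical helper: names of all elements that have no child elements.
    public static List<string> LeafNames(string xmlText)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xmlText);

        var names = new List<string>();
        // "//*[not(*)]" selects every element without element children.
        foreach (XmlNode node in doc.SelectNodes("//*[not(*)]"))
        {
            names.Add(node.Name);
        }
        return names;
    }

    static void Main()
    {
        // Sample data taken from the question.
        const string sample =
            "<items>" +
            "<item1><piece>1300</piece><itemc>665583</itemc></item1>" +
            "<item2><piece>100</piece><itemc>665584</itemc></item2>" +
            "</items>";

        foreach (string name in LeafNames(sample))
        {
            Console.WriteLine(name);
        }
    }
}
```

This only gives the element names (piece, itemc, piece, itemc for the sample); building the full path still means walking up through ParentNode or carrying the path down during recursion.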

Kevin Nixon 15 years ago

Just to expand slightly on helios's answer, you can qualify the XPath with [text()] to select only the elements that have a text() node:

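// Both snippets need: using System.Xml; using System.Xml.Linq; using System.Xml.XPath;

// XDocument: select the leaf elements that contain text
foreach (XElement leaf in xdoc.XPathSelectElements("//*[not(*)][text()]"))
{
    Console.WriteLine(leaf.Value);
}

// XmlDocument: select the text nodes themselves
// (typed as XmlNode rather than XmlText so a CDATA section does not cause an invalid cast)
foreach (XmlNode textNode in doc.SelectNodes("//*[not(*)]/text()"))
{
    Console.WriteLine(textNode.Value);
}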

Mads Hansen 15 years ago

Here is an XSLT that produces an XPath expression for each innermost element.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:template match="/">
        <xsl:apply-templates />
    </xsl:template>

    <!--Match on all elements that do not contain child elements -->
    <xsl:template match="//*[not(*)]">
        <!--look up the node tree and write out:
           - a slash
           - the name of the element
           - and a predicate filter for the position of the element at each step -->
        <xsl:for-each select="ancestor-or-self::*">
            <xsl:text>/</xsl:text>
            <xsl:value-of select="local-name()"/>
            <!--add a predicate filter to specify the position, in case there are more than one element with that name at that step -->
            <xsl:text>[</xsl:text>
            <xsl:value-of select="count(preceding-sibling::*[name()=name(current())])+1" />
            <xsl:text>]</xsl:text>
        </xsl:for-each>  
        <!--Create a new line after every element -->
        <xsl:text>&#xA;</xsl:text>
    </xsl:template>

    <!--override default template to prevent extra whitespace and carriage return from being copied into the output-->
    <xsl:template match="text()" />

</xsl:stylesheet>

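I added a predicate filter to specify the position of the element. That way, if you have more than one piece or itemc element at the same level, the XPath will identify the correct one.

So, instead of: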

items/item1/piece
items/item1/itemc
items/item2/piece
items/item2/itemc

it produces:

/items[1]/item1[1]/piece[1]
/items[1]/item1[1]/itemc[1]
/items[1]/item2[1]/piece[1]
/items[1]/item2[1]/itemc[1]
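
To run a stylesheet like this from C#, one option is XslCompiledTransform. The sketch below is an illustration under assumptions, not part of the original answer: the stylesheet is condensed and inlined as a string, an xsl:output method='text' element is added so that no XML declaration is written, and the names XsltLeafPaths and Transform are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

static class XsltLeafPaths
{
    // Condensed copy of the stylesheet; xsl:output is an added assumption.
    const string Stylesheet = @"<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>
  <xsl:output method='text'/>
  <xsl:template match='//*[not(*)]'>
    <xsl:for-each select='ancestor-or-self::*'>
      <xsl:text>/</xsl:text>
      <xsl:value-of select='local-name()'/>
      <xsl:text>[</xsl:text>
      <xsl:value-of select='count(preceding-sibling::*[name()=name(current())])+1'/>
      <xsl:text>]</xsl:text>
    </xsl:for-each>
    <xsl:text>&#xA;</xsl:text>
  </xsl:template>
  <xsl:template match='text()'/>
</xsl:stylesheet>";

    // Hypothetical helper: applies the stylesheet to an XML string.
    public static string Transform(string xmlText)
    {
        var xslt = new XslCompiledTransform();
        using (XmlReader styleReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(styleReader);
        }

        using (XmlReader input = XmlReader.Create(new StringReader(xmlText)))
        using (var output = new StringWriter())
        {
            // No stylesheet parameters are needed, so the argument list is null.
            xslt.Transform(input, null, output);
            return output.ToString();
        }
    }

    static void Main()
    {
        // Sample data taken from the question.
        const string sample =
            "<items>" +
            "<item1><piece>1300</piece><itemc>665583</itemc></item1>" +
            "<item2><piece>100</piece><itemc>665584</itemc></item2>" +
            "</items>";

        Console.Write(Transform(sample));
    }
}
```

For the question's sample this should print the same four positional paths listed above. To work with files instead of strings, XmlReader.Create also accepts a file path.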

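Robert Rossney 15 years ago

The following code finds all leaf elements in the document and, for each one, outputs an XPath expression that unambiguously navigates from the document root to that element, including a predicate on each node step to disambiguate between elements of the same name: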

// Requires: using System; using System.Linq; using System.Xml.Linq; using System.Xml.XPath;
static void Main(string[] arguments)
{
    XDocument d = XDocument.Load("xmlfile1.xml");

    foreach (XElement e in d.XPathSelectElements("//*[not(*)]"))
    {
        Console.WriteLine("/" + string.Join("/",
            e.XPathSelectElements("ancestor-or-self::*")
                .Select(x => x.Name.LocalName 
                    + "[" 
                    + (x.ElementsBeforeSelf()
                        .Where(y => y.Name.LocalName == x.Name.LocalName)
                        .Count() + 1)
                    + "]")
                .ToArray()));            
    }

    Console.ReadKey();
}

For example, this input:

<foo>
  <bar>
    <fizz/>
    <baz>
      <bat/>
    </baz>
    <fizz/>
  </bar>
  <buzz></buzz>
</foo>

produces this output:

/foo[1]/bar[1]/fizz[1]
/foo[1]/bar[1]/baz[1]/bat[1]
/foo[1]/bar[1]/fizz[2]
/foo[1]/buzz[1]

gingerbreadboy 15 years ago

It's untested and probably needs some work to get it to compile, but do you want something like this?

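using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;

class Program
{
    static void Main()
    {
        XmlDocument xml = new XmlDocument();
        xml.Load("test.xml");

        var toReturn = new List<string>();
        // DocumentElement is the root element; ChildNodes[0] could be the XML declaration.
        GetPaths(string.Empty, xml.DocumentElement, toReturn);

        foreach (string path in toReturn)
        {
            Console.WriteLine(path);
        }
    }

    public static void GetPaths(string pathSoFar, XmlNode node, List<string> results)
    {
        string scopedPath = pathSoFar.Length == 0 ? node.Name : pathSoFar + "/" + node.Name;

        // Only child elements count; text and comment nodes do not make a node a non-leaf.
        List<XmlNode> childElements = node.ChildNodes.Cast<XmlNode>()
            .Where(child => child.NodeType == XmlNodeType.Element)
            .ToList();

        if (childElements.Count > 0)
        {
            foreach (XmlNode itemNode in childElements)
            {
                GetPaths(scopedPath, itemNode, results);
            }
        }
        else
        {
            results.Add(scopedPath);
        }
    }
}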

It may not be very memory-efficient for large chunks of XML, though.
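
If memory is the concern, one alternative (not part of the original answer) is to stream the document with XmlReader instead of loading it into an XmlDocument. The sketch below keeps only the names of the currently open elements; the names StreamingLeafPaths and LeafPaths are hypothetical, and the sample XML is the one from the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class StreamingLeafPaths
{
    // Hypothetical helper: yields the path of each element without child elements.
    public static IEnumerable<string> LeafPaths(TextReader source)
    {
        var names = new List<string>();   // names of the open elements, root first
        var hasChild = new List<bool>();  // whether each open element has seen a child element

        using (XmlReader reader = XmlReader.Create(source))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    // The enclosing element, if any, is no longer a leaf.
                    if (hasChild.Count > 0) hasChild[hasChild.Count - 1] = true;
                    names.Add(reader.Name);
                    hasChild.Add(false);

                    if (reader.IsEmptyElement)
                    {
                        // <x/> raises no EndElement, so report and close it here.
                        yield return string.Join("/", names.ToArray());
                        names.RemoveAt(names.Count - 1);
                        hasChild.RemoveAt(hasChild.Count - 1);
                    }
                }
                else if (reader.NodeType == XmlNodeType.EndElement)
                {
                    if (!hasChild[hasChild.Count - 1])
                    {
                        yield return string.Join("/", names.ToArray());
                    }
                    names.RemoveAt(names.Count - 1);
                    hasChild.RemoveAt(hasChild.Count - 1);
                }
            }
        }
    }

    static void Main()
    {
        // Sample data taken from the question.
        const string sample =
            "<items>" +
            "<item1><piece>1300</piece><itemc>665583</itemc></item1>" +
            "<item2><piece>100</piece><itemc>665584</itemc></item2>" +
            "</items>";

        foreach (string path in LeafPaths(new StringReader(sample)))
        {
            Console.WriteLine(path);
        }
    }
}
```

For a file on disk, File.OpenText("test.xml") could be passed in place of the StringReader. Memory use then grows with the nesting depth rather than with the size of the document.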

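Jeroen Huinink 15 years ago

Perhaps not the fastest solution, but it shows that an arbitrary XPath expression can be used as the selector, and in my opinion it also expresses the intent of the code most clearly.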

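using System;
using System.Collections;
using System.Xml.Linq;
using System.Xml.XPath;

class Program
{
    static void Main(string[] args)
    {
        // XPathEvaluate is an extension method on LINQ to XML nodes, so load an XDocument.
        XDocument xml = XDocument.Load("test.xml");

        // For a node-set expression the result is an enumerable of the matching nodes.
        IEnumerable innerItems = (IEnumerable)xml.XPathEvaluate("//*[not(*)]");
        foreach (XElement innerItem in innerItems)
        {
            Console.WriteLine(GetPath(innerItem));
        }
    }

    // Builds the path recursively from the root element down to e.
    public static string GetPath(XElement e)
    {
        if (e.Parent == null)
        {
            return "/" + e.Name;
        }
        else
        {
            return GetPath(e.Parent) + "/" + e.Name;
        }
    }
}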