
Special character in XPath query

  •  39
  • Prabhu  · 15 years ago

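    I am using the following XPath query to list the objects under a site: ListObject[@Title='SomeValue'] . SomeValue is dynamic. The query works as long as SomeValue does not contain an apostrophe ('). I also tried using an escape sequence, but that did not work.

    What am I doing wrong?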

    10 answers  |  last activity 7 years ago
        1
  •  57
  •   hanshenrik    7 years ago

    This is surprisingly difficult to do.

    Take a look at the XPath Recommendation , and you'll see that it defines a literal as:

    Literal ::=   '"' [^"]* '"' 
                | "'" [^']* "'"
    

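    That is, a string literal in an XPath expression can contain apostrophes or double quotes, but not both.

    You can't use escaping to get around this. A literal like this: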

    'Some&apos;Value'
    

    will match this XML text:

    Some&amp;apos;Value
    

    This means that there can be a piece of XML text for which you can't write any XPath literal that matches it, e.g.:

    <elm att="&quot;&apos;"/>
    

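    But that doesn't mean it's impossible to match that text with XPath; it's just tricky. Whenever the value you're trying to match contains both single and double quotes, you can construct an expression that uses concat to produce the text to match: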

    elm[@att=concat('"', "'")]
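
    To see the concat form actually match, here is a minimal Java sketch using the standard javax.xml.xpath API. The class name ConcatQuoteDemo and its count helper are made up for this illustration; the XML is the elm example from above, whose attribute value is the two characters " and '.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class ConcatQuoteDemo {
    // Hypothetical helper: counts the nodes selected by an XPath expression in an XML string.
    static int count(String xml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            Double n = (Double) xpath.evaluate("count(" + expression + ")",
                    new InputSource(new StringReader(xml)), XPathConstants.NUMBER);
            return n.intValue();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // First elm has an att value made of a double quote followed by an apostrophe.
        String xml = "<root><elm att=\"&quot;&apos;\"/><elm att=\"other\"/></root>";
        // Neither quoting style can hold both characters, so the value is built with concat().
        System.out.println(count(xml, "/root/elm[@att=concat('\"', \"'\")]"));
    }
}
```

    Only the first elm should be counted, since the second one has a different attribute value.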
    

    So that leads us to this, which is a lot more complicated than I'd like it to be:

    /// <summary>
    /// Produce an XPath literal equal to the value if possible; if not, produce
    /// an XPath expression that will match the value.
    /// 
    /// Note that this function will produce very long XPath expressions if a value
    /// contains a long run of double quotes.
    /// </summary>
    /// <param name="value">The value to match.</param>
    /// <returns>If the value contains only single or double quotes, an XPath
    /// literal equal to the value.  If it contains both, an XPath expression,
    /// using concat(), that evaluates to the value.</returns>
    static string XPathLiteral(string value)
    {
        // if the value contains only single or double quotes, construct
        // an XPath literal
        if (!value.Contains("\""))
        {
            return "\"" + value + "\"";
        }
        if (!value.Contains("'"))
        {
            return "'" + value + "'";
        }
    
        // if the value contains both single and double quotes, construct an
        // expression that concatenates all non-double-quote substrings with
        // the quotes, e.g.:
        //
        //    concat("foo", '"', "bar")
        StringBuilder sb = new StringBuilder();
        sb.Append("concat(");
        string[] substrings = value.Split('\"');
        for (int i = 0; i < substrings.Length; i++ )
        {
            bool needComma = (i>0);
            if (substrings[i] != "")
            {
                if (i > 0)
                {
                    sb.Append(", ");
                }
                sb.Append("\"");
                sb.Append(substrings[i]);
                sb.Append("\"");
                needComma = true;
            }
            if (i < substrings.Length - 1)
            {
                if (needComma)
                {
                    sb.Append(", ");                    
                }
                sb.Append("'\"'");
            }
    
        }
        sb.Append(")");
        return sb.ToString();
    }
    

    And yes, I tested it with all the edge cases. That's why the logic is so complex:

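    // Requires: using System; using System.Text; using System.Xml;
    static void Main(string[] args)
    {
        // Document with an empty root element to hold the test elements
        XmlDocument d = new XmlDocument();
        d.LoadXml("<root/>");

        foreach (string s in new[]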
        {
            "foo",              // no quotes
            "\"foo",            // double quotes only
            "'foo",             // single quotes only
            "'foo\"bar",        // both; double quotes in mid-string
            "'foo\"bar\"baz",   // multiple double quotes in mid-string
            "'foo\"",           // string ends with double quotes
            "'foo\"\"",         // string ends with run of double quotes
            "\"'foo",           // string begins with double quotes
            "\"\"'foo",         // string begins with run of double quotes
            "'foo\"\"bar"       // run of double quotes in mid-string
        })
        {
            Console.Write(s);
            Console.Write(" = ");
            Console.WriteLine(XPathLiteral(s));
            XmlElement elm = d.CreateElement("test");
            d.DocumentElement.AppendChild(elm);
            elm.SetAttribute("value", s);
    
            string xpath = "/root/test[@value = " + XPathLiteral(s) + "]";
            if (d.SelectSingleNode(xpath) == elm)
            {
                Console.WriteLine("OK");
            }
            else
            {
                Console.WriteLine("Should have found a match for {0}, and didn't.", s);
            }
        }
        Console.ReadKey();
    }
    
        2
  •  6
  •   Christian Hayter    15 years ago

    EDIT: After a heavy unit testing session, and after checking the XPath Standards , I have revised my function as follows:

    public static string ToXPath(string value) {
    
        const string apostrophe = "'";
        const string quote = "\"";
    
        if(value.Contains(quote)) {
            if(value.Contains(apostrophe)) {
                throw new XPathException("Illegal XPath string literal.");
            } else {
                return apostrophe + value + apostrophe;
            }
        } else {
            return quote + value + quote;
        }
    }
    

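    It appears that XPath has no character escaping system at all; it's quite primitive really. Evidently my original code only worked by coincidence. My apologies for misleading anyone!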
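
    A minimal Java sketch of that behaviour, using the standard javax.xml.xpath API: an XML entity written inside an XPath literal is not decoded, so it is compared as literal text. The class EntityEscapeDemo and its matches helper are hypothetical names for this illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class EntityEscapeDemo {
    // After XML parsing, the Title attribute holds the five characters Man's
    static final String XML = "<root><ListObject Title=\"Man&apos;s\"/></root>";

    // Hypothetical helper: true if the expression selects at least one node.
    static boolean matches(String xml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return (Boolean) xpath.evaluate(expression,
                    new InputSource(new StringReader(xml)), XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // XPath does not decode &apos; so this literal is the text Man&apos;s
        System.out.println(matches(XML, "//ListObject[@Title='Man&apos;s']"));
        // Switching the delimiter to double quotes lets the apostrophe appear as-is
        System.out.println(matches(XML, "//ListObject[@Title=\"Man's\"]"));
    }
}
```

    The first query should find nothing and the second should find the element, which is why entity-encoding the value does not help.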

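    Original answer below, for reference only. Please ignore it.

    To be safe, make sure that every occurrence of the 5 predefined XML entities in your XPath string is escaped, e.g.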

    public static string ToXPath(string value) {
        return "'" + XmlEncode(value) + "'";
    }
    
    public static string XmlEncode(string value) {
        StringBuilder text = new StringBuilder(value);
        text.Replace("&", "&amp;");
        text.Replace("'", "&apos;");
        text.Replace(@"""", "&quot;");
        text.Replace("<", "&lt;");
        text.Replace(">", "&gt;");
        return text.ToString();
    }
    

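    I have done this before and it works fine. If it doesn't work for you, maybe there is some additional context to the problem that you need to make us aware of.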

        3
  •  5
  •   Cody S    12 years ago

    I ported Robert's answer to Java (tested in 1.6):

    /**
     * Produce an XPath literal equal to the value if possible; if not, produce
     * an XPath expression that will match the value.
     *
     * Note that this function will produce very long XPath expressions if a value
     * contains a long run of double quotes.
     *
     * @param value The value to match.
     * @return If the value contains only single or double quotes, an XPath
     *         literal equal to the value. If it contains both, an XPath expression,
     *         using concat(), that evaluates to the value.
     */
    public static String XPathLiteral(String value) {
        // no quotes of either kind: either delimiter works
        if(!value.contains("\"") && !value.contains("'")) {
            return "'" + value + "'";
        }
        // if the value contains only single or double quotes, construct
        // an XPath literal
        if (!value.contains("\"")) {
            return "\"" + value + "\"";
        }
        if (!value.contains("'")) {
            return "'" + value + "'";
        }
    
        // if the value contains both single and double quotes, construct an
        // expression that concatenates all non-double-quote substrings with
        // the quotes, e.g.:
        //
        //    concat("foo", '"', "bar")
        StringBuilder sb = new StringBuilder();
        sb.append("concat(");
        // A limit of -1 makes split() keep trailing empty strings; the default
        // drops them, which would lose double quotes at the end of the value.
        String[] substrings = value.split("\"", -1);
        for (int i = 0; i < substrings.length; i++) {
            boolean needComma = (i > 0);
            if (!substrings[i].equals("")) {
                if (i > 0) {
                    sb.append(", ");
                }
                sb.append("\"");
                sb.append(substrings[i]);
                sb.append("\"");
                needComma = true;
            }
            if (i < substrings.length - 1) {
                if (needComma) {
                    sb.append(", ");
                }
                sb.append("'\"'");
            }
        }
        sb.append(")");
        return sb.toString();
    }
    

    Hope this helps somebody!

        4
  •  5
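  •   Ian Roberts    11 years ago

    By far the best approach to this problem is to use the facilities provided by your XPath library to declare an XPath-level variable that you can reference in the expression. The variable value can then be any string in the host programming language, and isn't subject to the restrictions of XPath string literals. For example, in Java with javax.xml.xpath :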

    XPathFactory xpf = XPathFactory.newInstance();
    final Map<String, Object> variables = new HashMap<>();
    xpf.setXPathVariableResolver(new XPathVariableResolver() {
      public Object resolveVariable(QName name) {
        return variables.get(name.getLocalPart());
      }
    });
    
    XPath xpath = xpf.newXPath();
    XPathExpression expr = xpath.compile("ListObject[@Title=$val]");
    variables.put("val", someValue);
    NodeList nodes = (NodeList)expr.evaluate(someNode, XPathConstants.NODESET);
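
    For a version that can be run on its own, here is a minimal sketch of the same idea. The class XPathVariableDemo, the countByTitle helper, and the sample XML are assumptions made for this illustration; the resolver is written as a lambda, which requires Java 8 or later.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathVariableDemo {
    // Hypothetical helper: counts ListObject elements whose Title equals the given string.
    static int countByTitle(String xml, String title) {
        Map<String, Object> variables = new HashMap<>();
        variables.put("val", title);

        XPath xpath = XPathFactory.newInstance().newXPath();
        // Resolve $val (and any other variable) from the map by local name.
        xpath.setXPathVariableResolver(name -> variables.get(name.getLocalPart()));
        try {
            // The value never appears in the expression text, so no quoting is needed.
            NodeList nodes = (NodeList) xpath.evaluate("//ListObject[@Title=$val]",
                    new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root>"
                + "<ListObject Title=\"He said &quot;it's&quot;\"/>"
                + "<ListObject Title=\"plain\"/>"
                + "</root>";
        // The title contains both an apostrophe and double quotes.
        System.out.println(countByTitle(xml, "He said \"it's\""));
    }
}
```

    Because the expression text stays fixed, this also avoids building XPath from untrusted input.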
    

    For C# XPathNavigator you would define a custom XsltContext as described in this MSDN article (you only need the variable-related parts of that example, not the extension functions).

        5
  •  3
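  •   Community CDub    8 years ago

    Most of the answers here focus on how to use string manipulation to cobble together an XPath expression that uses string delimiters in a valid way.

    I would say the best practice is not to rely on such complicated and potentially fragile methods.

    The following applies to .NET, since this question is tagged with C#. Ian Roberts has provided what I think is the best solution when you're using XPath in Java.

    Nowadays, you can use Linq-to-Xml to query XML documents in a way that lets you use your variables directly in the query. This is not XPath, but the purpose is the same.

    For the example given in the OP, you could query the nodes you want like this: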

    var value = "Some value with 'apostrophes' and \"quotes\"";
    
    // doc is an instance of XElement or XDocument
    IEnumerable<XElement> nodes = 
                          doc.Descendants("ListObject")
                             .Where(lo => (string)lo.Attribute("Title") == value);
    

    or use the query comprehension syntax:

    IEnumerable<XElement> nodes = from lo in doc.Descendants("ListObject")
                                  where (string)lo.Attribute("Title") == value
                                  select lo;
    

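    .NET also provides a way to use XPath variables in your XPath queries. Sadly, it's not easy to do this out of the box, but with a simple helper class that I provide in this other SO answer , it's quite easy.

    You can use it like this: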

    var value = "Some value with 'apostrophes' and \"quotes\"";
    
    var variableContext = new VariableContext { { "matchValue", value } };
    // ixn is an instance of IXPathNavigable
    XPathNodeIterator nodes = ixn.CreateNavigator()
                                 .SelectNodes("ListObject[@Title = $matchValue]", 
                                              variableContext);
    
        6
  •  2
  •   Jonathan Gilbert    12 years ago

    Here is an alternative to Robert Rossney's StringBuilder approach, perhaps more intuitive:

        /// <summary>
        /// Produce an XPath literal equal to the value if possible; if not, produce
        /// an XPath expression that will match the value.
        /// 
        /// Note that this function will produce very long XPath expressions if a value
        /// contains a long run of double quotes.
        /// 
        /// From: http://stackoverflow.com/questions/1341847/special-character-in-xpath-query
        /// </summary>
        /// <param name="value">The value to match.</param>
        /// <returns>If the value contains only single or double quotes, an XPath
        /// literal equal to the value.  If it contains both, an XPath expression,
        /// using concat(), that evaluates to the value.</returns>
        public static string XPathLiteral(string value)
        {
            // If the value contains only single or double quotes, construct
            // an XPath literal
            if (!value.Contains("\""))
                return "\"" + value + "\"";
    
            if (!value.Contains("'"))
                return "'" + value + "'";
    
            // If the value contains both single and double quotes, construct an
            // expression that concatenates all non-double-quote substrings with
            // the quotes, e.g.:
            //
            //    concat("foo",'"',"bar")
    
            List<string> parts = new List<string>();
    
            // First, put a '"' after each component in the string.
            foreach (var str in value.Split('"'))
            {
                if (!string.IsNullOrEmpty(str))
                    parts.Add('"' + str + '"'); // (edited -- thanks Daniel :-)
    
                parts.Add("'\"'");
            }
    
            // Then remove the extra '"' after the last component.
            parts.RemoveAt(parts.Count - 1);
    
            // Finally, put it together into a concat() function call.
            return "concat(" + string.Join(",", parts) + ")";
        }
    
        7
  •  2
  •   Fortune    11 years ago

    You can quote an XPath string by using search and replace.

    In F#

    let quoteString (s : string) =
        if      not (s.Contains "'" ) then sprintf "'%s'"   s
        else if not (s.Contains "\"") then sprintf "\"%s\"" s
        else "concat('" + s.Replace ("'", "', \"'\", '") + "')"
    

    I haven't tested it extensively, but it seems to work.
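
    The same search-and-replace idea can be sketched in Java, together with a round-trip check that evaluates the quoted form as an XPath string expression. The class XPathQuote and both method names are hypothetical; this is an illustration of the technique, not a tested library routine.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathQuote {
    // Quote a string for use in an XPath 1.0 expression.
    static String quote(String s) {
        if (!s.contains("'")) {
            return "'" + s + "'";
        }
        if (!s.contains("\"")) {
            return "\"" + s + "\"";
        }
        // Both kinds of quote: split at each apostrophe and rejoin the pieces
        // with a double-quoted apostrophe inside concat().
        return "concat('" + s.replace("'", "', \"'\", '") + "')";
    }

    // Round-trip check: evaluate the expression and return its string value.
    static String evaluate(String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            // A dummy document serves as the context; the expression never reads it.
            return xpath.evaluate(expression, new InputSource(new StringReader("<r/>")));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        for (String v : new String[] { "foo", "Man's", "say \"hi\"", "a'b\"c", "'x\"'" }) {
            String quoted = quote(v);
            System.out.println(quoted + " round-trips: " + evaluate(quoted).equals(v));
        }
    }
}
```

    When the value starts or ends with an apostrophe, the result contains an empty '' argument, which is still a valid XPath literal.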

        8
  •  0
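  •   48klocs    15 years ago

    If SomeValue is never going to contain double quotes, you can use escaped double quotes to delimit the value you're searching for in your XPath search string.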

    ListObject[@Title=\"SomeValue\"]
    
        9
  •  0
  •   slavoo user3099232    11 years ago

    You can use double quotes instead of single quotes in the XPath expression.

    For example:

    element.XPathSelectElements(String.Format("//group[@title=\"{0}\"]", "Man's"));
    
        10
  •  -1
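  •   Gyuri    15 years ago

    I had this problem a while back. Seemingly the simplest, though not the fastest, solution is to add a new node to your XML document that has an attribute with the value "SomeValue", and then look for that attribute value using a simple XPath search. When you're finished with the operation, you can delete the "temporary node" from the XML document.

    This way the whole comparison happens "inside", so you don't have to construct a weird XPath query.

    I seem to remember that, to speed things up, you should add the temp value to the root node.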
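
    One possible reading of this approach, sketched in Java with the standard DOM and javax.xml.xpath APIs. The class TempNodeDemo, the helper countByTitle, and the temporary element name TempMatchValue are all assumptions made for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TempNodeDemo {
    // Hypothetical helper: counts ListObject elements whose Title equals the given string.
    static int countByTitle(String xml, String title) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();

            // Store the search value in a temporary child of the root element.
            Element temp = doc.createElement("TempMatchValue");
            temp.setAttribute("value", title);
            root.appendChild(temp);
            try {
                // Compare attribute to attribute, so the value never appears in the XPath text.
                XPath xpath = XPathFactory.newInstance().newXPath();
                NodeList nodes = (NodeList) xpath.evaluate(
                        "//ListObject[@Title = /*/TempMatchValue/@value]",
                        doc, XPathConstants.NODESET);
                return nodes.getLength();
            } finally {
                // Always remove the temporary node, even if evaluation fails.
                root.removeChild(temp);
            }
        } catch (Exception e) {
            throw new IllegalStateException("Lookup failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root>"
                + "<ListObject Title=\"He said &quot;it's&quot;\"/>"
                + "<ListObject Title=\"plain\"/>"
                + "</root>";
        System.out.println(countByTitle(xml, "He said \"it's\""));
    }
}
```

    Note that this modifies the document while the query runs, so it is not safe if other threads read the same document at the same time.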

    Good luck...