
Grouping text nodes and element nodes based on some starting text

  • Rudramuni TP  · 7 years ago

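    Please suggest how to group text() and element nodes based on certain text patterns such as (Fig. | Figs. | Figure | Table | Tables). If the citation text starts with an opening bracket and ends with a closing bracket, i.e. one of the symbols (, [, {, ), ], }, the group should include the brackets as well; otherwise only the Fig./Table word plus the xref elements should be grouped (<col1>***</col1>).

    This grouping should apply to any text() node except those under the "Refs" element.

    Input: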

    <root>
        <Para>The citations are like (Fig. <xref refID="f1">1</xref>).</Para>
        <Para>The <b>citations are like (Fig. <xref refID="f1">1</xref>).</b></Para>
        <Extract>The citations are like (Figs. <xref refID="f1">1</xref> and <xref refID="f2">2</xref>).</Extract>
        <DispQuote>The citations are like (Tables <xref refID="t1">1</xref> and <xref refID="t2">2</xref>).</DispQuote>
        <Para1>The citations are like (Tables <xref refID="t1">1</xref> and <xref refID="t2">2</xref>; Fig. <xref refID="f1">1</xref>).</Para1>
        <Para2>The citations are like (analysation of Fig. <xref refID="f1">1</xref>).</Para2>
        <Para>The citations are like (explained in Figs. <xref refID="f1">1</xref> and <xref refID="f2">2</xref>).</Para>
        <Para>The citations are like (Chapter 1 and 3 are explained in Tables <xref refID="t1">1</xref> and <xref refID="t2">2</xref>).</Para>
        <Refs>The citations are like (Fig. <xref refID="f1">1</xref>).</Refs>
    </root>
    

    XSLT2:

    <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="@*|node()">
        <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
    </xsl:template>
    <xsl:template match="Para">
        <xsl:copy><xsl:call-template name="tempCrossRef1"/></xsl:copy>
    </xsl:template>
    
    <xsl:template name="tempCrossRef1">
        <!--xsl:analyze-string select="." regex="\([ ]+)|([\+])|([=])|([%])|([/])|([\[])|([\]])"-->
        <!-- (Fig. <xref refID="f1">1</xref>) -->
        <!--xsl:analyze-string select="node()" regex="\(Fig. ">
            <xsl:matching-substring>
                <xsl:choose>
                    <xsl:when test="following-sibling::node()[2][parent::*/name()='xref']">
                        <col><xsl:apply-templates select="."/></col>
                    </xsl:when>
                    <xsl:otherwise><xsl:apply-templates select="."/></xsl:otherwise>
                </xsl:choose>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
                <xsl:value-of select="."/>
            </xsl:non-matching-substring>
        </xsl:analyze-string-->
        <xsl:for-each select="node()">
            <xsl:choose>
                <xsl:when test="ends-with(., 'Fig.')">
                    <xsl:for-each-group select="self::node()[ends-with(., 'Fig.')]" group-adjacent="boolean(self::xref)">
                        <xsl:choose>
                            <xsl:when test="current-grouping-key()">
                                <xsl:apply-templates select="current-group()" />
                            </xsl:when>
                            <xsl:otherwise>
                                <p1>
                                    <xsl:apply-templates select="current-group()" />
                                </p1>
                            </xsl:otherwise>
                            </xsl:choose>
                    </xsl:for-each-group>
                </xsl:when>
                <xsl:otherwise>
                    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
                </xsl:otherwise>
            </xsl:choose>
        </xsl:for-each>
    </xsl:template>
    
    <xsl:template match="xref">
        <xref>
            <xsl:apply-templates select="@*"/>
            <xsl:apply-templates />
        </xref>
    </xsl:template>
    </xsl:stylesheet>
    

    Required result:

    <root>
        <Para>The citations are like <col1>(Fig. <xref refID="f1">1</xref>)</col1>.</Para>
        <Para>The <b>citations are like <col1>(Fig. <xref refID="f1">1</xref>)</col1>.</b></Para>
        <Para>The citations are like <col1>(Fig. <xref refID="f1">1</xref>)</col1>.</Para>
        <Extract>The citations are like <col1>(Figs. <xref refID="f1">1</xref> and <xref refID="f2">2</xref>)</col1>.</Extract>
        <DispQuote>The citations are like <col1>(Tables <xref refID="t1">1</xref> and <xref refID="t2">2</xref>)</col1>.</DispQuote>
        <Para1>The citations are like <col1>(Tables <xref refID="t1">1</xref> and <xref refID="t2">2</xref>; Fig. <xref refID="f1">1</xref>)</col1>.</Para1>
        <Para2>The citations are like (analysation of <col1>Fig. <xref refID="f1">1</xref></col1>).</Para2>
        <Para>The citations are like (explained in <col1>Figs. <xref refID="f1">1</xref> and <xref refID="f2">2</xref></col1>).</Para>
        <Para>The citations are like (Chapter 1 and 3 are explained in <col1>Tables <xref refID="t1">1</xref> and <xref refID="t2">2</xref></col1>).</Para>
        <Refs>The citations are like (Fig. <xref refID="f1">1</xref>).</Refs><!-- Within this element, grouping not required-->
    </root>
    
    1 answer

  • Martin Honnen  · 7 years ago

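    Here is an attempt that uses two steps. The first step converts any match of the pattern [(]?(Fig\.|Figs\.|Figure|Table[s]?) into a start element and any match of the end pattern [)] into an end element. The second step then tries to use group-starting-with/group-ending-with to wrap such content in col1: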

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        exclude-result-prefixes="xs"
        version="3.0">
    
      <xsl:param name="start-patterns" as="xs:string">[(]?(Fig\.|Figs\.|Figure|Table[s]?)</xsl:param>
      <xsl:param name="end-patterns" as="xs:string">[)]</xsl:param>
    
      <xsl:mode on-no-match="shallow-copy"/>
    
      <xsl:mode name="text-to-elements" on-no-match="shallow-copy"/>
    
      <xsl:template match="root/*[not(self::Refs)][matches(., $start-patterns)]">
          <xsl:copy>
              <xsl:variable name="text-to-elements" as="node()*">
                  <xsl:apply-templates mode="text-to-elements"/>
              </xsl:variable>
              <xsl:for-each-group select="$text-to-elements" group-starting-with="start">
                  <xsl:choose>
                      <xsl:when test="self::start">
                          <xsl:for-each-group select="current-group()" group-ending-with="end">
                              <xsl:choose>
                                  <xsl:when test="current-group()[last()][self::end]">
                                      <col1>
                                          <xsl:apply-templates select="current-group()"/>
                                      </col1>
                                  </xsl:when>
                                  <xsl:otherwise>
                                      <xsl:apply-templates select="current-group()"/>
                                  </xsl:otherwise>
                              </xsl:choose>
                          </xsl:for-each-group>                      
                      </xsl:when>
                      <xsl:otherwise>
                          <xsl:apply-templates select="current-group()"/>
                      </xsl:otherwise>
                  </xsl:choose>
              </xsl:for-each-group>
          </xsl:copy>
      </xsl:template>
    
      <xsl:template match="start | end">
          <xsl:apply-templates/>
      </xsl:template>
    
      <xsl:template match="text()" mode="text-to-elements">
          <xsl:analyze-string select="." regex="{$start-patterns}">
              <xsl:matching-substring>
                  <start>
                      <xsl:value-of select="."/>
                  </start>
              </xsl:matching-substring>
              <xsl:non-matching-substring>
                  <xsl:analyze-string select="." regex="{$end-patterns}">
                      <xsl:matching-substring>
                          <end>
                              <xsl:value-of select="."/>
                          </end>                      
                      </xsl:matching-substring>
                      <xsl:non-matching-substring>
                          <xsl:value-of select="."/>
                      </xsl:non-matching-substring>
                  </xsl:analyze-string>
              </xsl:non-matching-substring>
          </xsl:analyze-string>
      </xsl:template>
    
    </xsl:stylesheet>
    

    As you can see at https://xsltfiddle.liberty-development.net/pPgCcow , this approach seems to produce the required result for the input you posted, with the exception of the element

    <Para1>The citations are like (Tables <xref refID="t1">1</xref> and <xref refID="t2">2</xref>; Fig. <xref refID="f1">1</xref>).</Para1>
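
    One way to see why that element is the exception is to look at where the two patterns match in its text. Inside the single parenthesis there are two start matches ((Tables and Fig.) but only one end match, so group-starting-with opens a second group at Fig., and the first group, which begins at (Tables, never reaches an end element and is left unwrapped. The following is only a minimal sketch to make that visible, not part of the stylesheet above: the sample parameter (the Para1 text with the xref elements flattened) and the [start:...]/[end:...] text markers are assumptions introduced for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustration only: marks where the two patterns match in the
     flattened text of the Para1 example (xref tags dropped for brevity). -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    version="3.0">

  <xsl:output method="text"/>

  <!-- Same patterns as in the stylesheet above -->
  <xsl:param name="start-patterns" as="xs:string">[(]?(Fig\.|Figs\.|Figure|Table[s]?)</xsl:param>
  <xsl:param name="end-patterns" as="xs:string">[)]</xsl:param>

  <!-- Hypothetical sample: text of Para1 with the xref elements flattened -->
  <xsl:param name="sample" as="xs:string">The citations are like (Tables 1 and 2; Fig. 1).</xsl:param>

  <!-- Run with the default initial template; no source document is needed -->
  <xsl:template name="xsl:initial-template">
    <xsl:analyze-string select="$sample" regex="{$start-patterns}">
      <xsl:matching-substring>
        <!-- Would become a start element in the real stylesheet -->
        <xsl:value-of select="concat('[start:', ., ']')"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:analyze-string select="." regex="{$end-patterns}">
          <xsl:matching-substring>
            <!-- Would become an end element in the real stylesheet -->
            <xsl:value-of select="concat('[end:', ., ']')"/>
          </xsl:matching-substring>
          <xsl:non-matching-substring>
            <xsl:value-of select="."/>
          </xsl:non-matching-substring>
        </xsl:analyze-string>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
<!-- Output:
     The citations are like [start:(Tables] 1 and 2; [start:Fig.] 1[end:)].
-->
```

    With two start markers and one end marker in the same parenthesis, only the part from Fig. to the closing bracket ends up inside col1, which does not match the required result for Para1. Handling this case would presumably need a start pattern that does not fire again before the pending end has been seen, or a different grouping strategy for bracketed citations.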