
Is there something akin to regex in AppleScript, and if not, what's the alternative?

  •  24
  • Yevgeny Simkin  · Tech Community  · 16 years ago

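    I need to parse the first 10 characters of a file name to see whether they are all digits. The obvious way to do this is fileName =~ m/^\d{10}/, but I'm not seeing anything regexy in the AppleScript reference, so I'm curious what other options I have for doing this validation.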

    6 answers  |  last activity 11 years ago
        1
  •  23
  •   stib    16 years ago

    Don't despair: on OS X you also have access to sed and grep through "do shell script". So:

    set thecommandstring to "echo \"" & filename & "\"|sed \"s/[0-9]\\{10\\}/*good*(&)/\"" as string
    set sedResult to do shell script thecommandstring
    set isgood to sedResult starts with "*good*"
    

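    My sed skills aren't too hot, so there may be a more elegant way than prefixing "*good*" to any name that matches [0-9]{10} and then looking for "*good*" at the start of the result. But basically, if filename is "1234567890foo.mov", this will run the command: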

    echo "1234567890foo.mov"|sed "s/[0-9]\{10\}/*good*(&)/"
    

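    Note the escaped quotes (\") and escaped backslashes (\\) in the AppleScript. If you are escaping things in the shell, you have to escape the escapes. So to run a shell script that contains a literal backslash, you have to escape it for the shell as \\ and then escape each of those backslashes for AppleScript, giving \\\\. This can get pretty hard to read.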

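    So anything you can do on the command line you can also do by calling it from AppleScript (woohoo!). Anything written to stdout is returned to the script as the result.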
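
    The shell side of this approach can be tried on its own before wrapping it in "do shell script". The following is only a sketch: the function name has_digit_prefix is hypothetical, and it departs from the answer's command in two ways, by anchoring the pattern with ^ and by using sed -n with the p flag so that output appears only when the substitution succeeds.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original answer): succeeds when the
# first ten characters of $1 are all digits.
has_digit_prefix() {
    # -n suppresses sed's default output; the trailing p prints the line
    # only if the substitution matched. ^ anchors the match to the start.
    marked=$(printf '%s\n' "$1" | sed -n 's/^[0-9]\{10\}/*good*(&)/p')
    # Non-empty output means the ten-digit prefix was found.
    [ -n "$marked" ]
}

if has_digit_prefix "1234567890foo.mov"; then
    echo "good prefix"
else
    echo "bad prefix"
fi
```

    With this variant there is no need to inspect the start of the result string; an empty result already means "no match".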

        2
  •  14
  •   mklement0    11 years ago

    There is an easier way to use the shell for regex matching (works on bash 3.2+):

    set isMatch to "0" = (do shell script ¬
      "[[ " & quoted form of fileName & " =~ ^[[:digit:]]{10} ]]; printf $?")
    

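    Notes:

    • This uses the modern bash test expression [[ ... ]] with the regex-matching operator, =~ ; on bash 3.2+ the right operand (or at least its special regex characters) must NOT be quoted, unless you first run shopt -s compat31;
    • The do shell script statement executes the test and returns its exit code via an additional command (thanks, @LauriRanta); "0" indicates success.
    • Note that the =~ operator does not support shortcut character classes such as \d or assertions such as \b (true as of OS X 10.9.4, and unlikely to change anytime soon).
    • For case-insensitive matching, prepend shopt -s nocasematch; to the command string.
    • For locale awareness, prepend export LANG='" & user locale of (system info) & ".UTF-8'; to the command string.
    • If the regex contains capture groups, you can access the captured strings through the built-in ${BASH_REMATCH[@]} array variable.
    • As in the accepted answer, you have to \-escape double quotes and backslashes.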
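
    To make the notes on quoting and capture groups concrete, here is a minimal bash sketch. The function name digit_prefix is hypothetical; storing the pattern in a variable and expanding it unquoted is one common way to satisfy the "do not quote the right operand" rule.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original answer): prints the
# ten-digit prefix of $1 and succeeds; prints nothing and fails otherwise.
digit_prefix() {
    # Hold the regex in a variable and expand it UNQUOTED on the right of
    # =~, so bash 3.2+ treats it as a regex rather than a literal string.
    local re='^([[:digit:]]{10})'
    if [[ $1 =~ $re ]]; then
        # BASH_REMATCH[0] is the overall match; [1] is the first capture group.
        printf '%s\n' "${BASH_REMATCH[1]}"
    else
        return 1
    fi
}

digit_prefix "1234567890foo.mov"   # prints 1234567890
```

    From AppleScript, a command like this would be passed to "do shell script", with the input supplied through quoted form of.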

    Here is an alternative that uses egrep:

    set isMatch to "0" = (do shell script ¬
      "egrep -q '^\\d{10}' <<<" & quoted form of filename & "; printf $?")
    

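    While this presumably performs worse, it has two advantages:

    • You can use shortcut character classes such as \d and assertions such as \b.
    • You can make matching case-insensitive more easily, by calling egrep with -i.
    • You cannot, however, access sub-matches via capture groups; use the [[ ... =~ ... ]] approach if you need that.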
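
    The case-insensitive variant might look like the sketch below. The function name matches_ignoring_case is hypothetical. It uses grep -E, the POSIX spelling of egrep, and sticks to [0-9] instead of \d, because \d in extended regexes is an extension that not every grep implementation accepts.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original answer): succeeds when the
# string $2 matches the extended regex $1, ignoring case.
matches_ignoring_case() {
    # -E: extended regex; -q: no output, exit status only; -i: ignore case;
    # -e: the next argument is the pattern, even if it starts with a dash.
    printf '%s\n' "$2" | grep -E -q -i -e "$1"
}

if matches_ignoring_case '^[0-9]{10}.*\.mov$' '1234567890FOO.MOV'; then
    echo "match"
fi
```

    As in the egrep example above, appending "; printf $?" to the command lets AppleScript read the exit code as text instead of handling an error.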

    Finally, here are utility functions that package both approaches (the syntax highlighting is off, but they do work):

    # SYNOPSIS
    #   doesMatch(text, regexString) -> Boolean
    # DESCRIPTION
    #   Matches string s against regular expression (string) regex using bash's extended regular expression language *including* 
    #   support for shortcut classes such as `\d`, and assertions such as `\b`, and *returns a Boolean* to indicate if
    #   there is a match or not.
    #    - AppleScript's case sensitivity setting is respected; i.e., matching is case-INsensitive by default, unless inside
    #      a 'considering case' block.
    #    - The current user's locale is respected.
    # EXAMPLE
    #    my doesMatch("127.0.0.1", "^(\\d{1,3}\\.){3}\\d{1,3}$") # -> true
    on doesMatch(s, regex)
        local ignoreCase, extraGrepOption
        set ignoreCase to "a" is "A"
        if ignoreCase then
            set extraGrepOption to "i"
        else
            set extraGrepOption to ""
        end if
        # Note: So that classes such as \w work with different locales, we need to set the shell's locale explicitly to the current user's.
        #       Rather than let the shell command fail we return the exit code and test for "0" to avoid having to deal with exception handling in AppleScript.
        tell me to return "0" = (do shell script "export LANG='" & user locale of (system info) & ".UTF-8'; egrep -q" & extraGrepOption & " " & quoted form of regex & " <<< " & quoted form of s & "; printf $?")
    end doesMatch
    
    # SYNOPSIS
    #   getMatch(text, regexString) -> { overallMatch[, captureGroup1Match ...] } or {}
    # DESCRIPTION
    #   Matches string s against regular expression (string) regex using bash's extended regular expression language and
    #   *returns the matching string and substrings matching capture groups, if any.*
    #   
    #   - AppleScript's case sensitivity setting is respected; i.e., matching is case-INsensitive by default, unless this subroutine is called inside
    #     a 'considering case' block.
    #   - The current user's locale is respected.
    #   
    #   IMPORTANT: 
    #   
    #   Unlike doesMatch(), this subroutine does NOT support shortcut character classes such as \d.
    #   Instead, use one of the following POSIX classes (see `man re_format`):
    #       [[:alpha:]] [[:word:]] [[:lower:]] [[:upper:]] [[:ascii:]]
    #       [[:alnum:]] [[:digit:]] [[:xdigit:]]
    #       [[:blank:]] [[:space:]] [[:punct:]] [[:cntrl:]] 
    #       [[:graph:]]  [[:print:]] 
    #   
    #   Also, `\b`, '\B', '\<', and '\>' are not supported; you can use `[[:<:]]` for '\<' and `[[:>:]]` for `\>`
    #   
    #   Always returns a *list*:
    #    - an empty list, if no match is found
    #    - otherwise, the first list element contains the matching string
    #       - if regex contains capture groups, additional elements return the strings captured by the capture groups; note that *named* capture groups are NOT supported.
    #  EXAMPLE
    #       my getMatch("127.0.0.1", "^([[:digit:]]{1,3})\\.([[:digit:]]{1,3})\\.([[:digit:]]{1,3})\\.([[:digit:]]{1,3})$") # -> { "127.0.0.1", "127", "0", "0", "1" }
    on getMatch(s, regex)
        local ignoreCase, extraCommand
        set ignoreCase to "a" is "A"
        if ignoreCase then
            set extraCommand to "shopt -s nocasematch; "
        else
            set extraCommand to ""
        end if
        # Note: 
        #  So that classes such as [[:alpha:]] work with different locales, we need to set the shell's locale explicitly to the current user's.
        #  Since `quoted form of` encloses its argument in single quotes, we must set compatibility option `shopt -s compat31` for the =~ operator to work.
        #  Rather than let the shell command fail we return '' in case of non-match to avoid having to deal with exception handling in AppleScript.
        tell me to do shell script "export LANG='" & user locale of (system info) & ".UTF-8'; shopt -s compat31; " & extraCommand & "[[ " & quoted form of s & " =~ " & quoted form of regex & " ]] && printf '%s\\n' \"${BASH_REMATCH[@]}\" || printf ''"
        return paragraphs of result
    end getMatch
    
        3
  •  11
  •   Community CDub    8 years ago

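    I recently needed regular expressions in a script and wanted to find a scripting addition to handle them, so that it would be easier to read what was going on. I found Satimage.osax, which lets you use syntax like this: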

    find text "n(.*)" in "to be or not to be" with regexp
    

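    The only downside is that (as of August 11, 2010) it is a 32-bit addition, so it throws errors when called from a 64-bit process. This bit me in a Mail rule for Snow Leopard, because I had to run Mail in 32-bit mode. Called from a standalone script, though, I have no reservations: it's really great, and it lets you pick whatever regex syntax you want and use back-references.

    Update, May 28, 2011

    Thanks to Mitchell Model's comment below for pointing out that it has been updated to 64-bit, so no more reservations: it does everything I need.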

        4
  •  3
  •   Philip Regan    16 years ago

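    I'm sure there is an AppleScript addition or a shell script that can be called to bring regex into the fold, but I avoid dependencies for the simple stuff. I use this style of pattern all the time...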

    set filename to "1234567890abcdefghijkl"
    
    return isPrefixGood(filename)
    
    on isPrefixGood(filename) --returns "good prefix" or "bad prefix"
        set legalCharacters to {"1", "2", "3", "4", "5", "6", "7", "8", "9", "0"}
    
        set thePrefix to (characters 1 thru 10) of filename as text
    
        set badPrefix to false
    
        repeat with thisChr from 1 to (get count of characters in thePrefix)
            set theChr to character thisChr of thePrefix
            if theChr is not in legalCharacters then
                set badPrefix to true
            end if
        end repeat
    
        if badPrefix is true then
            return "bad prefix"
        end if
    
        return "good prefix"
    end isPrefixGood
    
        5
  •  3
  •   Moshu    11 years ago

    Here is another way to check whether the first ten characters of a string are digits.

        on checkFilename(thisName)
            set {n, isOk} to {length of thisName, true}
            try
                repeat with i from 1 to 10
                    set isOk to (isOk and ((character i of thisName) is in "0123456789"))
                end repeat
                return isOk
            on error
                return false
            end try
        end checkFilename
    
        6
  •  1
  •   McUsr    12 years ago

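    I have another option. Until I have implemented character classes for the Thompson NFA algorithm, I have made the bare bones work in AppleScript. If anyone is interested in parsing very basic regexes with AppleScript, the code is posted in the CodeExchange at MacScripters; please have a look!

    Here is a solution for working out whether the first ten characters of a text/string are all digits: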

     set mstr to "1234567889Abcdefg"
    set isnum to prefixIsOnlyDigits for mstr
    to prefixIsOnlyDigits for aText
        set aProbe to text 1 thru 10 of aText
        set isnum to false
        if not ((offset of "," in aProbe) > 0 or (offset of "." in aProbe) > 0 or (offset of "-" in aProbe) > 0) then
            try
                set aNumber to aProbe as number
                set isnum to true
            end try
        end if
        return isnum
    end prefixIsOnlyDigits