
.NET - How do I split a "caps"-delimited string into an array?

  •  111
  • Matias Nino  · Tech Community  · 16 years ago

    Cheers

    17 replies  |  as of 16 years ago
        1
  •  177
  •   Markus Jarderot    8 years ago

    I made this a while ago. It matches each component of a CamelCase name.

    /([A-Z]+(?=$|[A-Z][a-z])|[A-Z]?[a-z]+)/g
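
    The pattern is written with JavaScript-style delimiters. In .NET, a minimal sketch of the same idea might use Regex.Matches to collect the components into an array, which is what the question asks for. The names CamelCaseSplitter and SplitWords are illustrative assumptions, not part of the original answer. Note that characters the pattern does not match, such as digits or underscores, are dropped from the result.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class CamelCaseSplitter
{
    // Same pattern as above, without the /.../g delimiters.
    // Regex.Matches already returns every match, so no "global" flag is needed.
    private static readonly Regex WordPattern =
        new Regex(@"[A-Z]+(?=$|[A-Z][a-z])|[A-Z]?[a-z]+");

    // Returns each matched component as one array element.
    public static string[] SplitWords(string input)
    {
        return WordPattern.Matches(input)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToArray();
    }
}
```

    Calling CamelCaseSplitter.SplitWords(s) on a CamelCase string returns one element per word or acronym.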
    

    For example:

    "SimpleHTTPServer" => ["Simple", "HTTP", "Server"]
    "camelCase" => ["camel", "Case"]
    

    To convert that to just insert spaces between the words:

    Regex.Replace(s, "([a-z](?=[A-Z])|[A-Z](?=[A-Z][a-z]))", "$1 ")
    

    If you need to handle digits:

    /([A-Z]+(?=$|[A-Z][a-z]|[0-9])|[A-Z]?[a-z]+|[0-9]+)/g
    
    Regex.Replace(s,"([a-z](?=[A-Z]|[0-9])|[A-Z](?=[A-Z][a-z]|[0-9])|[0-9](?=[^0-9]))","$1 ")
    
        2
  •  39
  •   Wayne    16 years ago
    Regex.Replace("ThisIsMyCapsDelimitedString", "(\\B[A-Z])", " $1")
    
        3
  •  19
  •   JoshL    16 years ago

    Great answer, MizardX! I tweaked it slightly to treat numerals as separate words, so "AddressLine1" becomes "Address Line 1" instead of "Address Line1":

    Regex.Replace(s, "([a-z](?=[A-Z0-9])|[A-Z](?=[A-Z][a-z]))", "$1 ")
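
    To make the difference concrete, here is a small sketch that runs both replacement patterns side by side: the letters-only one from the first answer and the tweaked one from this answer. The class and method names (SpacingComparison, LettersOnly, DigitsAsWords) are hypothetical and exist only for this illustration.

```csharp
using System.Text.RegularExpressions;

public static class SpacingComparison
{
    // Pattern from the first answer: a digit after a lowercase letter
    // is not a word boundary, so it stays attached to the word.
    public static string LettersOnly(string s)
    {
        return Regex.Replace(s, "([a-z](?=[A-Z])|[A-Z](?=[A-Z][a-z]))", "$1 ");
    }

    // Tweaked pattern: the lookahead [A-Z0-9] also treats a digit
    // following a lowercase letter as the start of a new word.
    public static string DigitsAsWords(string s)
    {
        return Regex.Replace(s, "([a-z](?=[A-Z0-9])|[A-Z](?=[A-Z][a-z]))", "$1 ");
    }
}
```

    For the input "AddressLine1", LettersOnly gives "Address Line1" and DigitsAsWords gives "Address Line 1".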
    
        4
  •  18
  •   jpmc26    6 years ago

    Just for a little variety... Here's an extension method that doesn't use a regex.

    using System;
    using System.Collections.Generic;
    using System.Linq;
    
    public static class CamelSpaceExtensions
    {
        public static string SpaceCamelCase(this String input)
        {
            return new string(Enumerable.Concat(
                input.Take(1), // No space before initial cap
                InsertSpacesBeforeCaps(input.Skip(1))
            ).ToArray());
        }
    
        private static IEnumerable<char> InsertSpacesBeforeCaps(IEnumerable<char> input)
        {
            foreach (char c in input)
            {
                if (char.IsUpper(c)) 
                { 
                    yield return ' '; 
                }
    
                yield return c;
            }
        }
    }
    
        5
  •  12
  •   Pseudo Masochist    16 years ago

    Grant Wagner's excellent comment aside:

    Dim s As String = RegularExpressions.Regex.Replace("ThisIsMyCapsDelimitedString", "([A-Z])", " $1")
    
        6
  •  10
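  •   Dan Malcolm    9 years ago

    I needed a solution that supports acronyms and numbers. This regex-based solution treats the following patterns as individual "words":

    • A capital letter followed by lowercase letters
    • A sequence of consecutive digits
    • Consecutive capital letters (interpreted as an acronym); a new word can begin with the last capital, e.g. HTMLGuide => "HTML Guide", TheATeam => "The A Team"

    You could do it as a one-liner: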

    Regex.Replace(value, @"(?<!^)((?<!\d)\d|(?(?<=[A-Z])[A-Z](?=[a-z])|[A-Z]))", " $1")
    

    A more readable approach might be better:

    using System.Text.RegularExpressions;
    
    namespace Demo
    {
        public class IntercappedStringHelper
        {
            private static readonly Regex SeparatorRegex;
    
            static IntercappedStringHelper()
            {
                const string pattern = @"
                    (?<!^) # Not start
                    (
                        # Digit, not preceded by another digit
                        (?<!\d)\d 
                        |
                        # Upper-case letter, followed by lower-case letter if
                        # preceded by another upper-case letter, e.g. 'G' in HTMLGuide
                        (?(?<=[A-Z])[A-Z](?=[a-z])|[A-Z])
                    )";
    
                var options = RegexOptions.IgnorePatternWhitespace | RegexOptions.Compiled;
    
                SeparatorRegex = new Regex(pattern, options);
            }
    
            public static string SeparateWords(string value, string separator = " ")
            {
                return SeparatorRegex.Replace(value, separator + "$1");
            }
        }
    }
    

    Here's an extract from the (XUnit) tests:

    [Theory]
    [InlineData("PurchaseOrders", "Purchase-Orders")]
    [InlineData("purchaseOrders", "purchase-Orders")]
    [InlineData("2Unlimited", "2-Unlimited")]
    [InlineData("The2Unlimited", "The-2-Unlimited")]
    [InlineData("Unlimited2", "Unlimited-2")]
    [InlineData("222Unlimited", "222-Unlimited")]
    [InlineData("The222Unlimited", "The-222-Unlimited")]
    [InlineData("Unlimited222", "Unlimited-222")]
    [InlineData("ATeam", "A-Team")]
    [InlineData("TheATeam", "The-A-Team")]
    [InlineData("TeamA", "Team-A")]
    [InlineData("HTMLGuide", "HTML-Guide")]
    [InlineData("TheHTMLGuide", "The-HTML-Guide")]
    [InlineData("TheGuideToHTML", "The-Guide-To-HTML")]
    [InlineData("HTMLGuide5", "HTML-Guide-5")]
    [InlineData("TheHTML5Guide", "The-HTML-5-Guide")]
    [InlineData("TheGuideToHTML5", "The-Guide-To-HTML-5")]
    [InlineData("TheUKAllStars", "The-UK-All-Stars")]
    [InlineData("AllStarsUK", "All-Stars-UK")]
    [InlineData("UKAllStars", "UK-All-Stars")]
    
        7
  •  4
  •   Robert Paulson    16 years ago

    For more variety, using plain old C# objects, the following code produces the same output as @MizardX's excellent regular expression.

    public string FromCamelCase(string camel)
    {   // omitted checking camel for null
        StringBuilder sb = new StringBuilder();
        int upperCaseRun = 0;
        foreach (char c in camel)
        {   // append a space only if we're not at the start
            // and we're not already in an all caps string.
            if (char.IsUpper(c))
            {
                if (upperCaseRun == 0 && sb.Length != 0)
                {
                    sb.Append(' ');
                }
                upperCaseRun++;
            }
            else if( char.IsLower(c) )
            {
                if (upperCaseRun > 1) //The first new word will also be capitalized.
                {
                    sb.Insert(sb.Length - 1, ' ');
                }
                upperCaseRun = 0;
            }
            else
            {
                upperCaseRun = 0;
            }
            sb.Append(c);
        }
    
        return sb.ToString();
    }
    
        8
  •  3
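  •   Brantley Blanchard    10 years ago

    Here's a prototype that converts the following to Title Case:

    • Snake Case
    • Camel Case
    • Sentence case
    • Title Case (keep current formatting)

    Obviously, you'd only need the "ToTitleCase" method yourself.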

    using System;
    using System.Collections.Generic;
    using System.Globalization;
    using System.Text.RegularExpressions;
    
    public class Program
    {
        public static void Main()
        {
            var examples = new List<string> { 
                "THEQuickBrownFox",
                "theQUICKBrownFox",
                "TheQuickBrownFOX",
                "TheQuickBrownFox",
                "the_quick_brown_fox",
                "theFOX",
                "FOX",
                "QUICK"
            };
    
            foreach (var example in examples)
            {
                Console.WriteLine(ToTitleCase(example));
            }
        }
    
        private static string ToTitleCase(string example)
        {
            var fromSnakeCase = example.Replace("_", " ");
            var lowerToUpper = Regex.Replace(fromSnakeCase, @"(\p{Ll})(\p{Lu})", "$1 $2");
            var sentenceCase = Regex.Replace(lowerToUpper, @"(\p{Lu}+)(\p{Lu}\p{Ll})", "$1 $2");
            return new CultureInfo("en-US", false).TextInfo.ToTitleCase(sentenceCase);
        }
    }
    

    The console output would be as follows:

    THE Quick Brown Fox
    The QUICK Brown Fox
    The Quick Brown FOX
    The Quick Brown Fox
    The Quick Brown Fox
    The FOX
    FOX
    QUICK
    

    Blog Post Referenced

        9
  •  2
  •   Ferruccio    16 years ago
    string s = "ThisIsMyCapsDelimitedString";
    string t = Regex.Replace(s, "([A-Z])", " $1").Substring(1);
    
        10
  •  2
  •   Zar Shardan    7 years ago

        public static string CamelCaseToSpaceSeparated(this string str)
        {
            if (string.IsNullOrEmpty(str))
            {
                return str;
            }
    
            var res = new StringBuilder();
    
            res.Append(str[0]);
            for (var i = 1; i < str.Length; i++)
            {
                if (char.IsUpper(str[i]))
                {
                    res.Append(' ');
                }
                res.Append(str[i]);
    
            }
            return res.ToString();
        }
    
        11
  •  1
  •   Geoff    16 years ago

    A naive regex solution. It will not handle O'Conner, and it also adds a space at the start of the string.

    string s = "ThisIsMyCapsDelimitedString";
    string split = Regex.Replace(s, "[A-Z0-9]", " $&");
    
        12
  •  0
  •   Max Schmeling    16 years ago

    string myString = "ThisIsMyCapsDelimitedString";
    
    for (int i = 1; i < myString.Length; i++)
    {
         // Only uppercase letters start a new word; the ToUpper() comparison
         // also matched digits, spaces and punctuation.
         if (char.IsUpper(myString[i]))
         {
              myString = myString.Insert(i, " ");
              i++;
         }
    }
    
        13
  •  0
  •   Erxin    10 years ago

    Try using

    "([A-Z]*[^A-Z]*)"
    

    The result works for a mix of letters and digits

    Regex.Replace("AbcDefGH123Weh", "([A-Z]*[^A-Z]*)", "$1 ");
    Abc Def GH123 Weh  
    
    Regex.Replace("camelCase", "([A-Z]*[^A-Z]*)", "$1 ");
    camel Case  
    
        14
  •  0
  •   Community CDub    8 years ago

    https://stackoverflow.com/a/5796394/4279201

        private static StringBuilder camelCaseToRegular(string i_String)
        {
            StringBuilder output = new StringBuilder();
            int i = 0;
            foreach (char character in i_String)
            {
                if (character <= 'Z' && character >= 'A' && i > 0)
                {
                    output.Append(" ");
                }
                output.Append(character);
                i++;
            }
            return output;
        }
    
        15
  •  0
  •   Slai    8 years ago

    Match the position between a non-uppercase character and a character in the Uppercase Letter Unicode category: (?<=\P{Lu})(?=\p{Lu})

    Dim s = Regex.Replace("CorrectHorseBatteryStaple", "(?<=\P{Lu})(?=\p{Lu})", " ")
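
    Because this pattern is zero-width, it can also be passed to Regex.Split to get an array directly instead of a spaced string. The following is a C# sketch of that idea (the answer itself is in VB); BoundarySplitter and SplitWords are assumed names used only for illustration.

```csharp
using System.Text.RegularExpressions;

public static class BoundarySplitter
{
    // Zero-width boundary: a non-uppercase character behind,
    // an uppercase letter ahead. Nothing is consumed by the match.
    private static readonly Regex Boundary =
        new Regex(@"(?<=\P{Lu})(?=\p{Lu})");

    // Splits the input at every boundary position.
    public static string[] SplitWords(string input)
    {
        return Boundary.Split(input);
    }
}
```

    One consequence of this design is that a run of capitals is never split from the word that follows it: "SimpleHTTPServer" yields "Simple" and "HTTPServer".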
    
        16
  •  0
  •   Patrick from NDepend team    7 years ago

    A procedural, fast implementation:

      /// <summary>
      /// Get the words in a code <paramref name="identifier"/>.
      /// </summary>
      /// <param name="identifier">The code <paramref name="identifier"/> to extract words from.</param>
      public static string[] GetWords(this string identifier) {
         Contract.Ensures(Contract.Result<string[]>() != null, "returned array of string is not null but can be empty");
         if (identifier == null) { return new string[0]; }
         if (identifier.Length == 0) { return new string[0]; }
    
         const int MIN_WORD_LENGTH = 2;  //  Ignore one letter or one digit words
    
         var length = identifier.Length;
         var list = new List<string>(1 + length/2); // Set capacity, not possible more words since we discard one char words
         var sb = new StringBuilder();
         CharKind cKindCurrent = GetCharKind(identifier[0]); // length is not zero here
         CharKind cKindNext = length == 1 ? CharKind.End : GetCharKind(identifier[1]);
    
         for (var i = 0; i < length; i++) {
            var c = identifier[i];
            CharKind cKindNextNext = (i >= length - 2) ? CharKind.End : GetCharKind(identifier[i + 2]);
    
            // Process cKindCurrent
            switch (cKindCurrent) {
               case CharKind.Digit:
               case CharKind.LowerCaseLetter:
                  sb.Append(c); // Append digit or lowerCaseLetter to sb
                  if (cKindNext == CharKind.UpperCaseLetter) {
                     goto TURN_SB_INTO_WORD; // Finish word if next char is upper
                  }
                  goto CHAR_PROCESSED;
               case CharKind.Other:
                  goto TURN_SB_INTO_WORD;
               default:  // charCurrent is never Start or End
                  Debug.Assert(cKindCurrent == CharKind.UpperCaseLetter);
                  break;
            }
    
            // Here cKindCurrent is UpperCaseLetter
            // Append UpperCaseLetter to sb anyway
            sb.Append(c); 
    
            switch (cKindNext) {
               default:
                  goto CHAR_PROCESSED;
    
               case CharKind.UpperCaseLetter: 
                  //  "SimpleHTTPServer"  when we are at 'P' we need to see that NextNext is 'e' to get the word!
                  if (cKindNextNext == CharKind.LowerCaseLetter) {
                     goto TURN_SB_INTO_WORD;
                  }
                  goto CHAR_PROCESSED;
    
               case CharKind.End:
               case CharKind.Other:
                  break; // goto TURN_SB_INTO_WORD;
            }
    
            //------------------------------------------------
    
         TURN_SB_INTO_WORD:
            string word = sb.ToString();
            sb.Length = 0;
            if (word.Length >= MIN_WORD_LENGTH) {  
               list.Add(word);
            }
    
         CHAR_PROCESSED:
            // Shift left for next iteration!
            cKindCurrent = cKindNext;
            cKindNext = cKindNextNext;
         }
    
         string lastWord = sb.ToString();
         if (lastWord.Length >= MIN_WORD_LENGTH) {
            list.Add(lastWord);
         }
         return list.ToArray();
      }
      private static CharKind GetCharKind(char c) {
         if (char.IsDigit(c)) { return CharKind.Digit; }
         if (char.IsLetter(c)) {
            if (char.IsUpper(c)) { return CharKind.UpperCaseLetter; }
            Debug.Assert(char.IsLower(c));
            return CharKind.LowerCaseLetter;
         }
         return CharKind.Other;
      }
      enum CharKind {
         End, // For end of string
         Digit,
         UpperCaseLetter,
         LowerCaseLetter,
         Other
      }
    

      [TestCase((string)null, "")]
      [TestCase("", "")]
    
      // Ignore one letter or one digit words
      [TestCase("A", "")]
      [TestCase("4", "")]
      [TestCase("_", "")]
      [TestCase("Word_m_Field", "Word Field")]
      [TestCase("Word_4_Field", "Word Field")]
    
      [TestCase("a4", "a4")]
      [TestCase("ABC", "ABC")]
      [TestCase("abc", "abc")]
      [TestCase("AbCd", "Ab Cd")]
      [TestCase("AbcCde", "Abc Cde")]
      [TestCase("ABCCde", "ABC Cde")]
    
      [TestCase("Abc42Cde", "Abc42 Cde")]
      [TestCase("Abc42cde", "Abc42cde")]
      [TestCase("ABC42Cde", "ABC42 Cde")]
      [TestCase("42ABC", "42 ABC")]
      [TestCase("42abc", "42abc")]
    
      [TestCase("abc_cde", "abc cde")]
      [TestCase("Abc_Cde", "Abc Cde")]
      [TestCase("_Abc__Cde_", "Abc Cde")]
      [TestCase("ABC_CDE_FGH", "ABC CDE FGH")]
      [TestCase("ABC CDE FGH", "ABC CDE FGH")] // Should not happen (white char); anything that is not a letter/digit/'_' is considered a separator
      [TestCase("ABC,CDE;FGH", "ABC CDE FGH")] // Should not happen (,;); anything that is not a letter/digit/'_' is considered a separator
      [TestCase("abc<cde", "abc cde")]
      [TestCase("abc<>cde", "abc cde")]
      [TestCase("abc<D>cde", "abc cde")]  // Ignore one letter or one digit words
      [TestCase("abc<Da>cde", "abc Da cde")]
      [TestCase("abc<cde>", "abc cde")]
    
      [TestCase("SimpleHTTPServer", "Simple HTTP Server")]
      [TestCase("SimpleHTTPS2erver", "Simple HTTPS2erver")]
      [TestCase("camelCase", "camel Case")]
      [TestCase("m_Field", "Field")]
      [TestCase("mm_Field", "mm Field")]
      public void Test_GetWords(string identifier, string expectedWordsStr) {
         var expectedWords = expectedWordsStr.Split(' ');
         if (identifier == null || identifier.Length <= 1) {
            expectedWords = new string[0];
         }
    
         var words = identifier.GetWords();
         Assert.IsTrue(words.SequenceEqual(expectedWords));
      }
    
        17
  •  0
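  •   John Smith jjcaicedo    6 years ago

    A simple solution, which should be an order of magnitude faster than a regex solution (based on tests I ran against the top solutions in this thread), especially as the input string grows: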

    string s1 = "ThisIsATestStringAbcDefGhiJklMnoPqrStuVwxYz";
    string s2;
    StringBuilder sb = new StringBuilder();
    
    foreach (char c in s1)
        sb.Append(char.IsUpper(c)
            ? " " + c.ToString()
            : c.ToString());
    
    s2 = sb.ToString();
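
    The answer does not include its timing code, so the following is only a rough sketch of how such a comparison could be set up with Stopwatch. The names SpacingBenchmark, WithLoop, WithRegex and Time are hypothetical, the regex side uses the simple "([A-Z])" replacement from earlier in the thread, and no particular speedup is implied.

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

public static class SpacingBenchmark
{
    // Loop version: same behavior as the snippet above, including
    // the leading space when the input starts with a capital.
    public static string WithLoop(string input)
    {
        var sb = new StringBuilder(input.Length * 2);
        foreach (char c in input)
        {
            if (char.IsUpper(c)) sb.Append(' ');
            sb.Append(c);
        }
        return sb.ToString();
    }

    // Regex version; equivalent to the loop for ASCII input only,
    // since [A-Z] ignores non-ASCII uppercase letters.
    public static string WithRegex(string input)
    {
        return Regex.Replace(input, "([A-Z])", " $1");
    }

    // Runs f on input the given number of times and
    // returns the elapsed wall-clock milliseconds.
    public static long Time(Func<string, string> f, string input, int iterations)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) f(input);
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }
}
```

    For example, SpacingBenchmark.Time(SpacingBenchmark.WithLoop, s1, 100000) and the same call with WithRegex can be compared; results will vary with runtime, input length, and whether the regex is cached or compiled.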