
Regular expression for reading the string between <title> and </title>

  •  0
  • arachide  · 15 years ago

    I want to read the content between <title> and </title> in an HTML string.

    I think the pattern, written in Objective-C, should be

    @"<title([\\s\\S]*)</title>"
    

    Below is the code I wrote to wrap the regular expression functions

    //source of NSStringCategory.h
    #import <Foundation/Foundation.h>
    #import <regex.h>
    
    
    @interface NSStringCategory:NSObject
    {
        regex_t preg;
    }
    
    -(id)initWithPattern:(NSString *)pattern options:(int)options;
    -(void)dealloc;
    
    -(BOOL)matchesString:(NSString *)string;
    -(NSString *)matchedSubstringOfString:(NSString *)string;
    -(NSArray *)capturedSubstringsOfString:(NSString *)string;
    
    +(NSStringCategory *)regexWithPattern:(NSString *)pattern options:(int)options;
    +(NSStringCategory *)regexWithPattern:(NSString *)pattern;
    
    +(NSString *)null;
    
    +(void)initialize;
    
    @end
    
    
    @interface NSString (NSStringCategory)
    
    
    -(BOOL)matchedByPattern:(NSString *)pattern options:(int)options;
    
    -(BOOL)matchedByPattern:(NSString *)pattern;
    
    -(NSString *)substringMatchedByPattern:(NSString *)pattern options:(int)options;
    
    
    -(NSString *)substringMatchedByPattern:(NSString *)pattern;
    
    
    -(NSArray *)substringsCapturedByPattern:(NSString *)pattern options:(int)options;
    
    
    -(NSArray *)substringsCapturedByPattern:(NSString *)pattern;
    
    
    -(NSString *)escapedPattern;
    
    @end
    

    and the .m file

    #import "NSStringCategory.h"
    static NSString *nullstring=nil;
    
    @implementation NSStringCategory
    
    -(id)initWithPattern:(NSString *)pattern options:(int)options
    {
        if(self=[super init])
        {
            int err=regcomp(&preg,[pattern UTF8String],options|REG_EXTENDED);
            if(err)
            {
                char errbuf[256];
                regerror(err,&preg,errbuf,sizeof(errbuf));
                [NSException raise:@"CSRegexException"
                            format:@"Could not compile regex \"%@\": %s",pattern,errbuf];
            }
        }
        return self;
    }
    
    -(void)dealloc
    {
        regfree(&preg);
        [super dealloc];
    }
    
    -(BOOL)matchesString:(NSString *)string
    {
        if(regexec(&preg,[string UTF8String],0,NULL,0)==0) return YES;
        return NO;
    }
    
    -(NSString *)matchedSubstringOfString:(NSString *)string
    {
        const char *cstr=[string UTF8String];
        regmatch_t match;
        if(regexec(&preg,cstr,1,&match,0)==0)
        {
            return [[[NSString alloc] initWithBytes:cstr+match.rm_so
                                             length:match.rm_eo-match.rm_so encoding:NSUTF8StringEncoding] autorelease];
        }
    
        return nil;
    }
    
    -(NSArray *)capturedSubstringsOfString:(NSString *)string
    {
        const char *cstr=[string UTF8String];
        int num=preg.re_nsub+1;
        regmatch_t *matches=calloc(num,sizeof(regmatch_t)); // calloc takes (count, size)
    
        if(regexec(&preg,cstr,num,matches,0)==0)
        {
            NSMutableArray *array=[NSMutableArray arrayWithCapacity:num];
    
            int i;
            for(i=0;i<num;i++)
            {
                NSString *str;
    
                if(matches[i].rm_so==-1&&matches[i].rm_eo==-1) str=nullstring;
                else str=[[[NSString alloc] initWithBytes:cstr+matches[i].rm_so
                                                   length:matches[i].rm_eo-matches[i].rm_so encoding:NSUTF8StringEncoding] autorelease];
    
                [array addObject:str];
            }
    
            free(matches);
    
            return [NSArray arrayWithArray:array];
        }
    
        free(matches);
    
        return nil;
    }
    
    +(NSStringCategory *)regexWithPattern:(NSString *)pattern options:(int)options
    { return [[[NSStringCategory alloc] initWithPattern:pattern options:options] autorelease]; }
    
    +(NSStringCategory *)regexWithPattern:(NSString *)pattern
    { return [[[NSStringCategory alloc] initWithPattern:pattern options:0] autorelease]; }
    
    +(NSString *)null { return nullstring; }
    
    +(void)initialize
    {
        if(!nullstring) nullstring=[[NSString alloc] initWithString:@""];
    }
    
    @end
    
    @implementation NSString (NSStringCategory)
    
    -(BOOL)matchedByPattern:(NSString *)pattern options:(int)options
    {
        NSStringCategory *re=[NSStringCategory regexWithPattern:pattern options:options|REG_NOSUB];
        return [re matchesString:self];
    }
    
    -(BOOL)matchedByPattern:(NSString *)pattern
    { return [self matchedByPattern:pattern options:0]; }
    
    -(NSString *)substringMatchedByPattern:(NSString *)pattern options:(int)options
    {
        NSStringCategory *re=[NSStringCategory regexWithPattern:pattern options:options];
        return [re matchedSubstringOfString:self];
    }
    
    -(NSString *)substringMatchedByPattern:(NSString *)pattern
    { return [self substringMatchedByPattern:pattern options:0]; }
    
    -(NSArray *)substringsCapturedByPattern:(NSString *)pattern options:(int)options
    {
        NSStringCategory *re=[NSStringCategory regexWithPattern:pattern options:options];
        return [re capturedSubstringsOfString:self];
    }
    
    -(NSArray *)substringsCapturedByPattern:(NSString *)pattern
    { return [self substringsCapturedByPattern:pattern options:0]; }
    
    -(NSString *)escapedPattern
    {
        int len=[self length];
        NSMutableString *escaped=[NSMutableString stringWithCapacity:len];
    
        for(int i=0;i<len;i++)
        {
            unichar c=[self characterAtIndex:i];
            if(c=='^'||c=='.'||c=='['||c=='$'||c=='('||c==')'
               ||c=='|'||c=='*'||c=='+'||c=='?'||c=='{'||c=='\\') [escaped appendFormat:@"\\%C",c];
            else [escaped appendFormat:@"%C",c];
        }
        return [NSString stringWithString:escaped];
    }
    
    
    
    @end
    

    I use the code below to get the string between "<title>" and "</title>"

    NSStringCategory *a=[[NSStringCategory alloc] initWithPattern:@"<title([\s\S]*)</title>" options:0];//
    

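    Unfortunately, [a matchedSubstringOfString:response] always returns nil.

    I don't know whether the regular expression is wrong or whether there is some other reason.

    Comments are welcome.

    Thanks

    interdev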

    3 answers  |  last activity 15 years ago
        1
  •  3
  •   Community CDub    8 years ago

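    (Obligatory warning: you can't parse HTML correctly with Regex )


    You are using regex.h , which provides POSIX regular expressions (ERE in your case). They do not support all of the PCRE syntax, for example \s and \S (and [\s\S] would be pointless anyway, since it matches anything).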
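
    To illustrate the point about \s and \S, here is a minimal sketch in plain C of the portable POSIX spelling: the bracket class [[:space:]] and its negation [^[:space:]]. The helper name ere_matches is hypothetical and not part of the code in the question. Some regex implementations accept \s as an extension, so the sketch only relies on what POSIX itself defines.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not from the question): returns 1 if the POSIX ERE
   pattern matches text, 0 if it does not, and -1 if it fails to compile. */
static int ere_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    /* REG_NOSUB: only a yes/no answer is needed, no match offsets. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

    With this helper, ere_matches("^[[:space:]]+$", " \t\n") reports a match, and so does ere_matches("<title>[^<]*</title>", "<title>a\nb</title>"): unless REG_NEWLINE is set, a negated bracket expression also matches a newline, so no "match anything" construct is needed for multi-line titles.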

    Perhaps you should use

    initWithPattern:@"<title[^>]*>([^<]*)</title>" options:REG_ICASE
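
    As a rough sketch of what that pattern does underneath the wrapper, the following plain C function compiles it with REG_EXTENDED | REG_ICASE and copies out capture group 1. The name extract_title is hypothetical, and the sketch assumes the title text contains no '<' character, which is exactly the kind of input where a regex-based approach breaks down.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the question): returns a malloc'd copy of the
   text inside the first <title>...</title> element, or NULL if nothing matches.
   The caller must free() the result. */
static char *extract_title(const char *html)
{
    regex_t re;
    regmatch_t m[2];   /* m[0] = whole match, m[1] = capture group 1 */
    char *title = NULL;

    if (regcomp(&re, "<title[^>]*>([^<]*)</title>",
                REG_EXTENDED | REG_ICASE) != 0)
        return NULL;

    if (regexec(&re, html, 2, m, 0) == 0 && m[1].rm_so != -1) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        title = malloc(len + 1);
        if (title != NULL) {
            memcpy(title, html + m[1].rm_so, len);
            title[len] = '\0';
        }
    }

    regfree(&re);
    return title;
}
```

    In the wrapper from the question, the equivalent value is element 1 of the array returned by capturedSubstringsOfString: (element 0 is the whole match, tags included), whereas matchedSubstringOfString: returns the whole match.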
    
        2
  •  1
  •   Tomislav Nakic-Alfirevic    15 years ago

    <title[^>]*>\([^<]*\)</title> should do the trick.
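
    One caveat worth checking: the \( ... \) grouping is POSIX basic (BRE) syntax, while initWithPattern: in the question always ORs REG_EXTENDED into the flags, and in ERE mode \( and \) stand for literal parentheses. The sketch below, with the hypothetical names matches_with_flags and BRE_TITLE, compiles the same pattern in both modes so the difference can be observed.

```c
#include <regex.h>
#include <stddef.h>

/* The pattern from this answer, written as a C string literal. */
static const char *const BRE_TITLE = "<title[^>]*>\\([^<]*\\)</title>";

/* Hypothetical helper: compiles pattern with the given regcomp() flags and
   returns 1 on a match, 0 on no match, -1 on a compile error. */
static int matches_with_flags(const char *pattern, int cflags, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, cflags) != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

    Calling matches_with_flags(BRE_TITLE, 0, "<title>Hi</title>") finds a match because cflags 0 selects BRE. With REG_EXTENDED the same input does not match, and only a title wrapped in literal parentheses such as "<title>(Hi)</title>" does. With the question's wrapper, the unescaped ( ... ) form is therefore the one to use.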

        3
  •  0
  •   Stephen Harmon    15 years ago

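    For this particular case, I might try instantiating a WebDocumentRepresentation object from /System/Library/Frameworks/WebKit.framework.

    You can set the WebDocumentRepresentation object's data source to the HTML page you are interested in, then use the object's title method to return the title.

    Here is the Mac OS X Reference Library document on the object.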