How to get 3 lines of text from a paragraph

string parsing .net c#

Keltex · 15 years ago

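I'm trying to create a "snippet" from a paragraph. I have a long paragraph of text with a word highlighted somewhere in the middle. I want to get the line containing the word, plus the line before it and the line after it.

I have the following information:

The text (in a string)
The lines are delimited by a newline character \n
I have the index into the string of the text I want to highlight

A couple of other criteria:

If my word falls on the first line of the paragraph, it should show the first 3 lines
If my word falls on the last line of the paragraph, it should show the last 3 lines
It should show the entire paragraph in the degenerate cases (the paragraph only has 1 or 2 lines)

Here's an example: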

This is the 1st line of CAT text in the paragraph
This is the 2nd line of BIRD text in the paragraph
This is the 3rd line of MOUSE text in the paragraph
This is the 4th line of DOG text in the paragraph
This is the 5th line of RABBIT text in the paragraph

For example, if my index points to BIRD, it should show lines 1, 2 and 3 as one complete string, like this:

This is the 1st line of CAT text in the paragraph
This is the 2nd line of BIRD text in the paragraph
This is the 3rd line of MOUSE text in the paragraph

If my index points to DOG, it should show lines 3, 4 and 5 as one complete string, like this:

This is the 3rd line of MOUSE text in the paragraph
This is the 4th line of DOG text in the paragraph
This is the 5th line of RABBIT text in the paragraph

etc.

Anybody want to help tackle this?

4 answers

Michael Madsen 15 years ago

Using LINQ extension methods to get the right strings:

string[] lines = text.Split('\n');

// Find the right line to work with
int position = 0;
for (int i = 0; i < lines.Count(); i++)
  if (lines[i].Contains(args[0]))
    position = i - 1;

// Get in range if we had a match in the first line
if (position == -1)
  position = 0;

// Adjust the line index so we have 3 lines to work with.
// With fewer than 3 lines this goes negative, which Skip() treats as 0.
if (position > lines.Count() - 3)
  position = lines.Count() - 3;

string result = String.Join("\n", lines.Skip(position).Take(3).ToArray());

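This can of course be optimized a bit, by leaving the for loop as soon as the index is found, and probably in some other ways as well. You could probably even LINQify it further so that you never need to store the extra array, but I can't think of a good way to do that right now.

An alternative to the two checks on position could be something like position = Math.Max(0, Math.Min(position, lines.Count() - 3)); which handles both of them at once.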
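The question actually supplies a character index rather than a search word, so here is a minimal sketch of the same Split-and-clamp idea driven by that index. The class and method names (SnippetSketch, ThreeLinesAround) are made up for illustration; the line number is taken to be the count of '\n' characters before the index.

```csharp
using System;
using System.Linq;

public static class SnippetSketch
{
    // Hypothetical helper: 'index' is the character offset of the
    // highlighted word inside 'text'.
    public static string ThreeLinesAround(string text, int index)
    {
        if (text == null)
            throw new ArgumentNullException("text");
        if (index < 0 || index > text.Length)
            throw new ArgumentOutOfRangeException("index");

        string[] lines = text.Split('\n');

        // The number of '\n' characters before the offset is the
        // zero-based number of the line that contains it.
        int line = text.Take(index).Count(c => c == '\n');

        // Start one line earlier, then clamp so the 3-line window stays
        // inside the array (this also covers 1- and 2-line paragraphs).
        int start = Math.Max(0, Math.Min(line - 1, lines.Length - 3));

        return string.Join("\n", lines.Skip(start).Take(3));
    }
}
```

Called as SnippetSketch.ThreeLinesAround(text, text.IndexOf("DOG")) on the example paragraph, this should give lines 3, 4 and 5.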

Dan Tao 15 years ago

It seems to me that this is a perfect opportunity to use the StringReader class:

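Read your text line by line.
Keep your lines in some kind of buffer (e.g., a Queue<string>), dropping lines you no longer need once a given number of lines has been read.
Once your "needle" is found, read one more line (if possible) and then return what's in your buffer.

In my opinion, this has some advantages over the other approaches suggested:

Since it doesn't use String.Split, it doesn't do more work than you need, i.e., reading the entire string looking for the characters to split on and creating an array of the substrings.
In fact, it doesn't necessarily read the entire string at all, since once it finds the text it's looking for it only goes as far as necessary to get the desired number of padding lines.
It could even be refactored (very easily) to accept any TextReader, e.g., a StreamReader, so it could work with huge files without having to load the entire contents of a file into memory.

Imagine this scenario: you want to find an excerpt of text in a text file that contains the entire text of a novel. (Not that this is your scenario; I'm just speaking hypothetically.) The String.Split approach would require the entire novel to be split on the delimiter you specify, whereas the StringReader (well, in this case, StreamReader) approach only needs to read until the desired text is found, at which point the excerpt is returned.

Again, I realize this isn't necessarily your scenario. I'm just suggesting that scalability is one of this approach's strengths.

Here's a quick implementation: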

// rearranged code to avoid horizontal scrolling
public static string FindSurroundingLines
(string haystack, string needle, int paddingLines) {

    if (string.IsNullOrEmpty(haystack))
        throw new ArgumentException("haystack");
    else if (string.IsNullOrEmpty(needle))
        throw new ArgumentException("needle");
    else if (paddingLines < 0)
        throw new ArgumentOutOfRangeException("paddingLines");

    // buffer needs to accommodate paddingLines on each side
    // plus line containing the needle itself, so:
    // (paddingLines * 2) + 1
    int bufferSize = (paddingLines * 2) + 1;

    var buffer = new Queue<string>(/*capacity*/ bufferSize);

    using (var reader = new StringReader(haystack)) {
        bool needleFound = false;

        while (!needleFound && reader.Peek() != -1) {
            string line = reader.ReadLine();

            if (buffer.Count == bufferSize)
                buffer.Dequeue();

            buffer.Enqueue(line);

            needleFound = line.Contains(needle);
        }

        // at this point either the needle has been found,
        // or we've reached the end of the text (haystack);
        // all that's left to do is make sure the string returned
        // includes the specified number of padding lines
        // on either side
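        // (stop at the end of the text, so that no null lines are
        // enqueued when the text is shorter than the buffer)
        int endingLinesRead = 0;
        while (
            reader.Peek() != -1 &&
            (endingLinesRead++ < paddingLines || buffer.Count < bufferSize)
        ) {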
            if (buffer.Count == bufferSize)
                buffer.Dequeue();

            buffer.Enqueue(reader.ReadLine());
        }

        var resultBuilder = new StringBuilder();
        while (buffer.Count > 0)
            resultBuilder.AppendLine(buffer.Dequeue());

        return resultBuilder.ToString();
    }
}

Some example input/output (with text containing your example input):

Code:

Console.WriteLine(FindSurroundingLines(text, "MOUSE", 1));

Output:

This is the 2nd line of BIRD text in the paragraph
This is the 3rd line of MOUSE text in the paragraph
This is the 4th line of DOG text in the paragraph

Code:

Console.WriteLine(FindSurroundingLines(text, "BIRD", 1));

Output:

This is the 1st line of CAT text in the paragraph
This is the 2nd line of BIRD text in the paragraph
This is the 3rd line of MOUSE text in the paragraph

Code:

Console.WriteLine(FindSurroundingLines(text, "DOG", 0));

Output:

This is the 4th line of DOG text in the paragraph

Code:

Console.WriteLine(FindSurroundingLines(text, "This", 2));

Output:

This is the 1st line of CAT text in the paragraph
This is the 2nd line of BIRD text in the paragraph
This is the 3rd line of MOUSE text in the paragraph
This is the 4th line of DOG text in the paragraph
This is the 5th line of RABBIT text in the paragraph
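The TextReader refactoring mentioned earlier could look roughly like the sketch below. This is only one possible shape, not the code above verbatim: the class name ReaderSketch is made up, the two loops are folded into one, and the lines are joined with Environment.NewLine without a trailing newline. As in the version above, if the needle is never found the last lines read are returned.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ReaderSketch
{
    // Accepts any TextReader (StringReader, StreamReader, ...), so only
    // the buffered lines are ever held in memory.
    public static string FindSurroundingLines(
        TextReader reader, string needle, int paddingLines)
    {
        if (reader == null)
            throw new ArgumentNullException("reader");
        if (string.IsNullOrEmpty(needle))
            throw new ArgumentException("needle");
        if (paddingLines < 0)
            throw new ArgumentOutOfRangeException("paddingLines");

        int bufferSize = (paddingLines * 2) + 1;
        var buffer = new Queue<string>(bufferSize);

        // -1 until the needle is seen; afterwards the number of lines
        // read after the needle's line.
        int linesAfterNeedle = -1;

        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (buffer.Count == bufferSize)
                buffer.Dequeue();
            buffer.Enqueue(line);

            if (linesAfterNeedle >= 0)
                linesAfterNeedle++;
            else if (line.Contains(needle))
                linesAfterNeedle = 0;

            // Stop once the trailing padding has been read and the
            // buffer is full; the rest of the input is never touched.
            if (linesAfterNeedle >= paddingLines && buffer.Count == bufferSize)
                break;
        }

        return string.Join(Environment.NewLine, buffer);
    }
}
```

For a string you would pass new StringReader(text); for a large file, a StreamReader opened on that file inside a using block.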

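NebuSoft 15 years ago

There are several ways to go about this:

First method: use String.IndexOf() and String.LastIndexOf().

You can find the user's current selection with the TextBox.SelectionStart property. Then use LastIndexOf from the selection position to look for '\n' and find the previous line (don't stop at the first LastIndexOf result; once you find one, search again from that position so that you get the beginning of the previous line). Then do the same going forward from the selection point, using IndexOf to look for '\n' to get the end of the line. Once again, don't stop at the first one you find; repeat the search starting from the first found position + 1 to get the end of the following line. Then simply take the substring of the text covering the area you found.

Second method: use String.Split() on the '\n' character (this creates an array of strings, each containing a different line of the text, in array-index order). Find the index of the string that contains the text, then simply grab the lines before, at, and after that index from the array. Hopefully those two methods are clear enough for you to work out your code. If you are still stuck, let me know.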
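As a rough sketch of the first method (nothing here is from the answer itself beyond the IndexOf/LastIndexOf idea; the class and helper names are made up), the code below locates the line around a character offset and then widens the range line by line until it covers three lines. It assumes the text does not end with a trailing '\n'.

```csharp
using System;

public static class IndexOfSketch
{
    public static string ThreeLinesAround(string text, int index)
    {
        if (string.IsNullOrEmpty(text))
            throw new ArgumentException("text");
        if (index < 0 || index >= text.Length)
            throw new ArgumentOutOfRangeException("index");

        // Bounds of the line that contains the offset.
        int start = index == 0 ? 0 : text.LastIndexOf('\n', index - 1) + 1;
        int end = text.IndexOf('\n', index);
        if (end < 0) end = text.Length;

        int extra = 0; // neighbouring lines added so far (we want 2)

        // One line back, if there is one.
        if (start > 0) { start = PreviousLineStart(text, start); extra++; }

        // Forward until we have 3 lines or run out of text.
        while (extra < 2 && end < text.Length)
        {
            end = NextLineEnd(text, end);
            extra++;
        }

        // Still short (the word was on the last line): go further back.
        while (extra < 2 && start > 0)
        {
            start = PreviousLineStart(text, start);
            extra++;
        }

        return text.Substring(start, end - start);
    }

    // 'start' is the first character of a line other than the first,
    // so text[start - 1] is the '\n' that ends the previous line.
    private static int PreviousLineStart(string text, int start)
    {
        return start < 2 ? 0 : text.LastIndexOf('\n', start - 2) + 1;
    }

    // 'end' is the position of the '\n' that ends the current line.
    private static int NextLineEnd(string text, int end)
    {
        int next = text.IndexOf('\n', end + 1);
        return next < 0 ? text.Length : next;
    }
}
```

No array of lines is created here; the only allocation is the final Substring.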

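MindingData 15 years ago

Alrighty. Let me have a crack at it.

I think the first thing I would do is split everything into an array, simply because that gives us an easy way to "count" the lines.

string[] lines = fullstring.Split('\n');

Once we have that, unfortunately I don't know of any built-in way to get the index while going through each element of the array. There probably is one, but without trawling the net I would simply go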

int i = -1;
string animal = "BIRD";

for (int n = 0; n < lines.Length; n++)
{
    if (lines[n].IndexOf(animal) > -1)
    {
        i = n; // remember the line the animal was found on
        break;
    }
}
// we will need an if (i == -1) check: it means we didn't find the animal

Okay, so now we have the line. All we do is...

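// assumes the animal was found and there are at least 3 lines
if (i == 0)
{
    // first line: write out the first three lines
    Console.WriteLine(lines[0]);
    Console.WriteLine(lines[1]);
    Console.WriteLine(lines[2]);
}
else if (i == lines.Length - 1)
{
    // this means last array index: write out the last three lines
    Console.WriteLine(lines[i - 2]);
    Console.WriteLine(lines[i - 1]);
    Console.WriteLine(lines[i]);
}
else
{
    // else we are in the middle. So just write out i - 1, i, i + 1
    Console.WriteLine(lines[i - 1]);
    Console.WriteLine(lines[i]);
    Console.WriteLine(lines[i + 1]);
}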

I know that's extremely messy. But that's the way I would solve it.