This sounds like a job for `re.sub`. The pattern `(?<!\r)\n` matches any LF character (`\n`) that is not preceded by a carriage return (CR, `\r`).
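As a quick check of the negative lookbehind, here is a minimal sketch on an in-memory bytestring. The sample bytes and the name `lone_lf` are made up for illustration; it simply lists the offsets where the pattern matches.

```python
import re

# Raw bytes literal, so the regex engine itself interprets \r and \n.
lone_lf = re.compile(rb'(?<!\r)\n')

# Hypothetical data: one CRLF, one bare LF, one CRLF.
sample = b'a\r\nb\nc\r\n'

# Only the bare LF at offset 4 matches; the LFs at offsets 2 and 7 follow a CR.
offsets = [m.start() for m in lone_lf.finditer(sample)]
print(offsets)  # [4]
```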
Here is a sample file, `sample data.txt`:
(screenshot showing the line endings)
To avoid any line-ending conversion, open the file in binary read mode (`'rb'`):
import re
pattern = b'(?<!\r)\n' # match any \n not preceded by \r
with open(r'<path to>\sample data.txt', 'rb') as file:
    data = file.read()
print('Pre-substitution: ', data)
# replace any matches with a semicolon ';'
result = re.sub(pattern, b';', data)
print('Post-substitution: ', result)
This prints:
Pre-substitution: b'this line ends with CRLF\r\nthis line ends with LF\nthis line ends with CRLF\r\nthis line ends with LF\nthis line ends with CRLF\r\n'
Post-substitution: b'this line ends with CRLF\r\nthis line ends with LF;this line ends with CRLF\r\nthis line ends with LF;this line ends with CRLF\r\n'
It is worth mentioning that consecutive `\n`s are each replaced, so `\n\n\n` becomes `;;;` and `\r\n\n` becomes `\r\n;`.
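These edge cases can be checked directly on small bytestrings. The sketch below is illustrative only; the variable names are made up. The last line shows one possible variation (an assumption, not part of the approach above): adding a `+` quantifier so that a run of bare LFs collapses into a single semicolon.

```python
import re

pattern = rb'(?<!\r)\n'

# Each bare LF in a run is replaced individually.
run_of_lf = re.sub(pattern, b';', b'\n\n\n')

# The first LF belongs to a CRLF pair and is kept; the second is bare.
crlf_then_lf = re.sub(pattern, b';', b'\r\n\n')

# Assumed variation: collapse a run of bare LFs into one semicolon.
collapsed = re.sub(rb'(?<!\r)\n+', b';', b'a\n\n\nb')

print(run_of_lf)     # b';;;'
print(crlf_then_lf)  # b'\r\n;'
print(collapsed)     # b'a;b'
```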
Also note that both the `pattern` string and the replacement value are bytestrings (`b'<str>'`). If they are not, you will get a `TypeError`!
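To illustrate the type mismatch, here is a small sketch with made-up sample data. It applies a `str` pattern to `bytes` data, catches the resulting `TypeError`, and then repeats the call with bytes throughout.

```python
import re

data = b'line one\nline two\r\n'  # hypothetical sample data

error = None
try:
    # A str pattern applied to bytes data: the types do not match.
    re.sub(r'(?<!\r)\n', ';', data)
except TypeError as exc:
    error = exc
    print('TypeError:', exc)

# With bytes for both the pattern and the replacement, the call succeeds.
fixed = re.sub(rb'(?<!\r)\n', b';', data)
print(fixed)  # b'line one;line two\r\n'
```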