
HTML to MD: Java parse exception

  • hdmiimdh · Tech Community · 7 years ago

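    I'm trying to use html to md, but it seems to be badly outdated and no longer works: it fails with the stack trace below. Is there any way, in 2018, to convert HTML to Markdown with any JVM-based language?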

    Both files (the HTML and the XSL) are correctly encoded as UTF-8 and don't contain any unusual characters.
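An editor that reports a file as "UTF-8" does not always show whether it was saved with a byte order mark. A minimal sketch for checking, assuming Java 11+ (the class name `BomCheck` is hypothetical): it reads the first three bytes of each file and compares them with `EF BB BF`, the UTF-8 encoding of U+FEFF.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class BomCheck {
    // U+FEFF (the byte order mark) encoded as UTF-8.
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    /** Returns true if the file's first three bytes are a UTF-8 BOM. */
    public static boolean startsWithUtf8Bom(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            // readNBytes(int) needs Java 11+; a shorter file yields a shorter array.
            byte[] head = in.readNBytes(UTF8_BOM.length);
            return Arrays.equals(head, UTF8_BOM);
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java BomCheck page.html stylesheet.xsl
        for (String arg : args) {
            Path file = Path.of(arg);
            System.out.println(arg + ": " + (startsWithUtf8Bom(file) ? "starts with a UTF-8 BOM" : "no BOM"));
        }
    }
}
```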

    org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
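The message means the parser met something other than markup or whitespace before the root element. A small sketch that reproduces this with the JDK's DOM parser (used instead of the `Transformer` from the question only to keep it short; the class and method names are illustrative): a string whose first character is U+FEFF, which is what a BOM turns into if a file is decoded into a `String` without stripping it, fails, and so does a string holding plain text such as a file path, while the same markup without a prefix parses.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class PrologErrorDemo {
    /** Parses the text as XML; returns the parser's error message, or null if it parsed. */
    public static String parseError(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors without printing the default "[Fatal Error]" line.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parseError("<html><p>ok</p></html>"));       // well-formed
        System.out.println(parseError("\uFEFF<html><p>ok</p></html>")); // BOM character before the root
        System.out.println(parseError("path/to/page.html"));            // plain text before the root
    }
}
```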
    

    Here is the code I'm adapting:

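    import java.io.File;
    import java.io.StringReader;
    import java.io.StringWriter;
    import javax.xml.transform.Source;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerException;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.transform.stream.StreamSource;
    
    public class HtmlToMd {
        // Placeholder paths: the original snippet does not show how these were defined.
        private static final String htmlLocation = "path/to/page.html";
        private static final String xslLocation = "path/to/stylesheet.xsl";
    
        public static void main(String[] args) throws TransformerException {
            final String md = convert(htmlLocation);
            System.out.println(md);
        }
    
        public static String convert(final String htmlLocation) throws TransformerException {
            if (htmlLocation == null) {
                return "";
            }
    
            final File xslFile = new File(xslLocation);
            // Note: StringReader supplies the characters of htmlLocation itself as the XML
            // document, so a file path passed here is parsed as if it were the HTML text.
            final Source htmlSource = new StreamSource(new StringReader(htmlLocation));
            final Source xslSource = new StreamSource(xslFile);
    
            final TransformerFactory transformerFactory = TransformerFactory.newInstance();
            final Transformer transformer = transformerFactory.newTransformer(xslSource);
    
            // Run the transformation and collect the output in memory.
            final StringWriter result = new StringWriter();
            transformer.transform(htmlSource, new StreamResult(result));
    
            return result.toString();
        }
    }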
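A sketch of the same method that hands the parser the files themselves instead of a `StringReader` over the path string, assuming both files are well-formed XML (the class name `FileConvert` is hypothetical). With `StreamSource(File)` the parser reads raw bytes and determines the encoding on its own, and a file path can no longer be mistaken for document text.

```java
import java.io.File;
import java.io.StringWriter;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class FileConvert {
    /** Applies the XSL file to the (X)HTML file and returns the transformation output. */
    public static String convert(File htmlFile, File xslFile) throws TransformerException {
        // StreamSource(File) passes the file itself, not its path string, to the parser.
        Source htmlSource = new StreamSource(htmlFile);
        Source xslSource = new StreamSource(xslFile);

        Transformer transformer = TransformerFactory.newInstance().newTransformer(xslSource);

        StringWriter result = new StringWriter();
        transformer.transform(htmlSource, new StreamResult(result));
        return result.toString();
    }
}
```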
    

    HTML content:

    <html>
        <h1>Lorem ipsum dolor</h1>
        <h2>Lorem ipsum dolor</h2>
        <p>Lorem ipsum dolor</p>
    </html>
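The XSL file used by the library is not shown in the question. To illustrate how an XSLT-based converter can turn HTML like the snippet above into Markdown, here is a hypothetical, deliberately tiny stylesheet that handles only `h1`, `h2` and `p`, run through the JDK's built-in `Transformer`. It is a sketch, not the library's stylesheet; the class name `TinyHtmlToMd` is illustrative.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TinyHtmlToMd {
    // Hypothetical minimal stylesheet: h1 -> "# ", h2 -> "## ", p -> plain paragraph.
    private static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='h1'># <xsl:value-of select='.'/><xsl:text>&#10;&#10;</xsl:text></xsl:template>"
        + "<xsl:template match='h2'>## <xsl:value-of select='.'/><xsl:text>&#10;&#10;</xsl:text></xsl:template>"
        + "<xsl:template match='p'><xsl:value-of select='.'/><xsl:text>&#10;&#10;</xsl:text></xsl:template>"
        + "<xsl:template match='text()'/>" // drop whitespace text between elements
        + "</xsl:stylesheet>";

    /** Converts well-formed XHTML to Markdown using the stylesheet above. */
    public static String convert(String xhtml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        // Here StringReader is used as intended: the string IS the document content.
        t.transform(new StreamSource(new StringReader(xhtml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(convert("<html><h1>Lorem ipsum dolor</h1><p>Lorem ipsum dolor</p></html>"));
    }
}
```

Because XSLT parses its input as XML, this only works on well-formed XHTML such as the snippet above; real-world HTML with unclosed tags like `<br>` would need an HTML parser first.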

    https://github.com/pnikosis/jHTML2Md

  • akash · 7 years ago
    org.xml.sax.SAXParseException; 
    lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
    

    This is caused by a UTF-8 BOM (byte order mark) at the start of the file; see this for a command that removes the BOM.
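Besides re-saving the files without a BOM, another option is to strip it in code before parsing. A minimal sketch, assuming both files are UTF-8 (the class and method names are hypothetical): Java's UTF-8 decoder keeps a leading U+FEFF as an ordinary character, so the sketch removes it explicitly before wrapping the text in a `StringReader`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class BomSafeConvert {
    /** Removes a single leading U+FEFF, if present. */
    public static String stripBom(String text) {
        return text.startsWith("\uFEFF") ? text.substring(1) : text;
    }

    /** Reads the file as UTF-8; the decoder does not drop the BOM, so strip it here. */
    public static String readUtf8WithoutBom(Path file) throws IOException {
        return stripBom(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
    }

    /** Transforms the HTML file with the XSL file after removing any BOM from both. */
    public static String convert(Path htmlFile, Path xslFile) throws IOException, TransformerException {
        String html = readUtf8WithoutBom(htmlFile);
        String xsl = readUtf8WithoutBom(xslFile);

        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(html)), new StreamResult(out));
        return out.toString();
    }
}
```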