
Parsing HTML with BeautifulSoup: print works, but return does not

  • Khushboo  · 1 year ago

    Why does print() output all of the text under these tags, while return does not?

    This is the function I am using:

    def parse_html(data):
        ls = []
        htmlParse = BeautifulSoup(data, 'html.parser')
        for para in htmlParse.find_all(['script', 'head', 'title', 'meta', '[document]', 'p', 'body', 'a', "form", "input", "button", "style"]): 
            ls.append(para.text.strip())
            return ls
    
    Text = '<!DOCTYPE html><html><head>    <meta charset="utf-8">    <meta http-equiv="X-UA-Compatible" content="IE=edge">    <meta name="viewport" content="width=device-width, initial-scale=1">    <title>FlexPortalen - Log ind</title>    <link rel="stylesheet" href="/Content/bootstrap.css" />    <link rel="stylesheet" href="/Content/bootstrap-theme.min.css" />    <link rel="stylesheet" href="/login.css" />    <!--[if lt IE 9]>      <script src="https://oss.maxcdn.com/html5shiv/3.7.3/html5shiv.min.js"></script>      <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>    <![endif]--></head><body>    <div class="container">        <div class="login-box">            <form method="post">                <input name="__RequestVerificationToken" type="hidden" value="w4YgqRKtcaPFQn6ncaavNgPVb5rLp0CtbylMJ3zYYa2fTGoAfkJ97araAO5i4Nbwo0wERIboCQssguo0UviOaM3HvECpjfuokKcq4rt_ADM1" />                <h2 class="text-center login-heading">FlexPortalen</h2>                <div class="form-group">                    <input type="text" class="form-control input-lg" name="username" id="username" placeholder="Brugernavn...    " />                </div>                <div class="form-group">                        <input type="password" class="form-control input-lg" name="password" id="password" placeholder="Adgangskode..." />                </div>                <div class="checkbox text-center">                    <label>                        <input type="checkbox" name="rememberMe" id="rememberMe"  /> Husk mig?                    </label>                </div>                                <p class="text-center">                    <button type="submit" class="btn btn-primary btn-lg">Log ind</button>                </p>            </form>        </div>    </div>    <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.12.4/jquery.min.js"></script>    <script src="/Scripts/bootstrap.min.js"></script></body></html>'
    

    If I print inside the loop, it gives:

    FlexPortalen - Log ind
    FlexPortalen  
    Husk mig?                 
    Log ind
    

    But when I return, it only gives:

    ['FlexPortalen - Log ind']
    
    1 answer
  • HedgeHog  · 1 year ago

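    Check the indentation of your return. To return the list with all of the collected text, place the return outside the for loop. Otherwise the function exits during the first iteration and returns ls with only one element: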

    from bs4 import BeautifulSoup

    def parse_html(data):
        ls = []
        htmlParse = BeautifulSoup(data, 'html.parser')
        for para in htmlParse.find_all(['script', 'head', 'title', 'meta', '[document]', 'p', 'body', 'a', "form", "input", "button", "style"]):
            ls.append(para.text.strip())
        # Return once, after the loop has visited every matched tag
        return ls
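
The difference can be reproduced without BeautifulSoup at all. The following is a minimal sketch, not the original code: the function names and the `sample` list are hypothetical stand-ins for the text of each matched tag. It shows that a `return` inside the loop body ends the function after the first item, whereas a `print()` in the same position would simply run again on every iteration.

```python
def strip_all_early_return(items):
    """Buggy variant: `return` sits inside the loop body."""
    result = []
    for item in items:
        result.append(item.strip())
        return result  # exits the function during the first iteration
    # If `items` is empty the loop never runs and the function returns None.


def strip_all(items):
    """Fixed variant: `return` runs once, after the loop has finished."""
    result = []
    for item in items:
        result.append(item.strip())
    return result


# Hypothetical sample data standing in for `para.text` of each matched tag.
sample = ["  FlexPortalen - Log ind ", " Husk mig? ", "Log ind  "]

print(strip_all_early_return(sample))  # ['FlexPortalen - Log ind']
print(strip_all(sample))  # ['FlexPortalen - Log ind', 'Husk mig?', 'Log ind']
```

In Python, indentation decides which block a statement belongs to, so moving `return` one level to the left is the entire fix.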