
How do I use pandas vectorization correctly?

  • user3768495  · 6 years ago

    According to an article, vectorization is faster than using apply to run a function over a pandas DataFrame column.

    But I have a somewhat special case:

    import pandas as pd
    
    df = pd.DataFrame({'IP': [ '1.0.64.2', '100.23.154.63', '54.62.1.3']})
    
    def compare3rd(ip):
        """Check whether the 3rd part of an IP is greater than 100."""
        ip_3rd = ip.split('.')[2]
        if int(ip_3rd) > 100:
            return True
        else:
            return False
    
    
    # This works, but it is very slow
    df['check_results'] = df['IP'].apply(compare3rd)
    print(df)
    
    # This is supposed to be much faster,
    # but it doesn't work ...
    df['check_results_2'] = compare3rd(df['IP'].values)
    print(df)
    

    The full error traceback is as follows:

    Traceback (most recent call last):
      File "test.py", line 16, in <module>
        df['check_results_2'] = compare3rd(df['IP'].values)
      File "test.py", line 6, in compare3rd
        ip_3rd = ip.split('.')[2]
    AttributeError: 'numpy.ndarray' object has no attribute 'split'
    

    My question is: how do I use this vectorization method correctly in this case?

    1 answer  |  active 6 years ago
  •   BENY    6 years ago

    Use the str accessor in pandas:

    df.IP.str.split('.').str[2].astype(int)>100
    0    False
    1     True
    2    False
    Name: IP, dtype: bool
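
    The call in the question fails because compare3rd receives the whole NumPy array rather than a single string, while the str accessor applies each string method element by element. A minimal sketch, assuming the same df as in the question, that stores the result in the check_results_2 column the question was trying to fill (the intermediate name third_part is only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'IP': ['1.0.64.2', '100.23.154.63', '54.62.1.3']})

# Split every IP on '.', take the third piece, and convert it to int.
third_part = df['IP'].str.split('.').str[2].astype(int)

# The comparison runs on the whole column at once and yields a boolean Series.
df['check_results_2'] = third_part > 100

print(df)
```

    Note that astype(int) will raise if any entry is not a well-formed dotted address, so malformed rows would need to be cleaned first.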
    

    Since you mentioned vectorize:

    import numpy as np
    np.vectorize(compare3rd)(df.IP.values)
    array([False,  True, False])
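
    One caveat: the NumPy documentation describes np.vectorize as a convenience wrapper that is essentially a for loop, so it should not be expected to bring the speedup the question is after. The sketch below only checks that the three approaches agree on the sample data; it assumes a condensed version of compare3rd, and the names via_apply, via_vectorize and via_str are illustrative. Timings are left out because they depend on the data size and the machine.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'IP': ['1.0.64.2', '100.23.154.63', '54.62.1.3']})

def compare3rd(ip):
    """Return True if the 3rd part of a single IP string is greater than 100."""
    return int(ip.split('.')[2]) > 100

# Element-wise call through pandas.
via_apply = df['IP'].apply(compare3rd).to_numpy()

# np.vectorize wraps the scalar function so it accepts an array,
# but it still calls compare3rd once per element.
via_vectorize = np.vectorize(compare3rd)(df['IP'].to_numpy())

# String accessor chain, no user-defined function involved.
via_str = (df['IP'].str.split('.').str[2].astype(int) > 100).to_numpy()

print(np.array_equal(via_apply, via_vectorize))
print(np.array_equal(via_apply, via_str))
```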