Sorting alphabetically in a DataFrame

  •  0
  • sklal  · Tech Community  · 4 years ago

    I have a DataFrame:

    Counties                        Numbers
    Yabucoa Municipio, Puerto Rico  7766
    Marion County, West Virginia    8756
    Barbour County, Alabama         33445
    Santa Cruz County, Arizona      447
    Navajo County, Arizona          1500
    Denver County, Colorado         67990
    

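    I am trying to sort it so that the state names are in alphabetical order, with the county names sorted alphabetically within each state: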

    Counties                        Numbers
    Barbour County, Alabama         33445
    Navajo County, Arizona          1500
    Santa Cruz County, Arizona      447
    Denver County, Colorado         67990
    Yabucoa Municipio, Puerto Rico  7766
    Marion County, West Virginia    8756
    

    Code for the DataFrame:

    import pandas as pd

    df_test = pd.DataFrame([
        {'Counties': 'Yabucoa Municipio, Puerto Rico','Numbers': 7766},
        {'Counties': 'Marion County, West Virginia','Numbers': 8756},
        {'Counties': 'Barbour County, Alabama','Numbers': 33445},
        {'Counties': 'Santa Cruz County, Arizona','Numbers': 447},
        {'Counties': 'Navajo County, Arizona','Numbers': 1500},
        {'Counties': 'Denver County, Colorado','Numbers': 67990}
    ])
    

    I tried a split followed by a sort, but it does not produce the desired output:

    df_test['Counties'] = df_test['Counties'].apply(lambda x: ','.join(sorted(x.split(','))))
    

    1 Answer  |  as of 4 years ago
  •  1
  •   baxx    4 years ago

    df = pd.DataFrame(
        [
            {"Counties": "Yabucoa Municipio, Puerto Rico", "Numbers": 7766},
            {"Counties": "Marion County, West Virginia", "Numbers": 8756},
            {"Counties": "Barbour County, Alabama", "Numbers": 33445},
            {"Counties": "Santa Cruz County, Arizona", "Numbers": 447},
            {"Counties": "Navajo County, Alabama", "Numbers": 1500},
            {"Counties": "Denver County, Colorado", "Numbers": 67990},
        ]
    )
    

    Then create a key to reorder by:

    re_order_key = (
        df["Counties"]
        .str.split(",", expand=True)
        .rename(columns={0: "county", 1: "state"})
        .sort_values(by=["state", "county"])
    )
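
    To make the split step concrete, here is a minimal sketch on three of the rows from the question. The names `counties`, `parts` and `ordered` are illustrative only and are not part of the answer. It also shows that splitting on "," leaves a leading space in the state column; that is harmless when every row has it, but stripping it is a cheap safeguard.

```python
import pandas as pd

# Three rows from the question, enough to show the ordering.
counties = pd.Series([
    "Santa Cruz County, Arizona",
    "Navajo County, Arizona",
    "Barbour County, Alabama",
])

# expand=True returns a DataFrame with one column per piece of the split.
parts = counties.str.split(",", expand=True)
parts.columns = ["county", "state"]

# The text after the comma keeps its leading space, e.g. " Arizona".
print(parts["state"].tolist())

# Stripping it makes the sort independent of spacing in the source strings.
parts["state"] = parts["state"].str.strip()

# Sort by state first, then by county within each state.
ordered = parts.sort_values(by=["state", "county"])
print(ordered.index.tolist())
```

    The index of the sorted frame still carries the original row labels, which is what makes it usable for reordering the original DataFrame.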
    

    Use the index of `re_order_key` with iloc:

    df.iloc[re_order_key.index, :].reset_index(drop=True)
    

    This gives:

                             Counties  Numbers
    0         Barbour County, Alabama    33445
    1          Navajo County, Alabama     1500
    2      Santa Cruz County, Arizona      447
    3         Denver County, Colorado    67990
    4  Yabucoa Municipio, Puerto Rico     7766
    5    Marion County, West Virginia     8756
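
    If the separate key and index lookup feel indirect, roughly the same result can be reached by attaching temporary helper columns and sorting on them. This is a sketch under the assumption that every value has the form "county, state" with a single ", " separator; the helper column names `county` and `state` are hypothetical, and the data is taken from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "Counties": [
        "Yabucoa Municipio, Puerto Rico",
        "Marion County, West Virginia",
        "Barbour County, Alabama",
        "Santa Cruz County, Arizona",
        "Navajo County, Arizona",
        "Denver County, Colorado",
    ],
    "Numbers": [7766, 8756, 33445, 447, 1500, 67990],
})

# Split on the first ", " only, giving column 0 (county) and column 1 (state).
split = df["Counties"].str.split(", ", n=1, expand=True)

result = (
    df.assign(county=split[0], state=split[1])  # temporary helper columns
    .sort_values(by=["state", "county"])        # state first, then county
    .drop(columns=["county", "state"])          # remove the helpers again
    .reset_index(drop=True)
)
print(result)
```

    Because the helper columns travel with the rows during the sort, this version does not depend on the DataFrame having a default integer index.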