
Given data points, formulating the relationship between them

  •  0
  • Yneedtobeserious  · Tech Community  · 2 weeks ago

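    Given a dataset with three columns, x, y, and f_xy, how can I formulate a hypothesis about the relationship between x, y, and f_xy?

    The data can be reproduced with the following Python code: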

    import pandas as pd

    data = {
        'x': {0: 0.3745401188473625, 1: 0.7319939418114051, 2: 0.1560186404424365, 3: 0.0580836121681994, 4: 0.6011150117432088, 5: 0.0205844942958024, 6: 0.8324426408004217, 7: 0.1818249672071006, 8: 0.3042422429595377, 9: 0.4319450186421157, 10: 0.6118528947223795, 11: 0.2921446485352181, 12: 0.4560699842170359, 13: 0.1996737821583597, 14: 0.5924145688620425, 15: 0.6075448519014384, 16: 0.0650515929852795, 17: 0.9656320330745594, 18: 0.3046137691733707, 19: 0.6842330265121569, 20: 0.1220382348447788, 21: 0.0343885211152183, 22: 0.2587799816000169, 23: 0.3117110760894109, 24: 0.5467102793432796, 25: 0.9695846277645586, 26: 0.9394989415641892, 27: 0.5978999788110851, 28: 0.0884925020519195, 29: 0.045227288910538, 30: 0.388677289689482, 31: 0.8287375091519293, 32: 0.2809345096873807, 33: 0.1409242249747626, 34: 0.0745506436797708, 35: 0.7722447692966574, 36: 0.0055221171236023, 37: 0.7068573438476171, 38: 0.7712703466859457, 39: 0.3584657285442726, 40: 0.8631034258755935, 41: 0.3308980248526492, 42: 0.3109823217156622, 43: 0.7296061783380641, 44: 0.8872127425763265, 45: 0.1195942459383017, 46: 0.7607850486168974, 47: 0.770967179954561, 48: 0.5227328293819941, 49: 0.0254191267440951},
        'y': {0: 0.9507143064099162, 1: 0.5986584841970366, 2: 0.1559945203362026, 3: 0.8661761457749352, 4: 0.7080725777960455, 5: 0.9699098521619944, 6: 0.2123391106782761, 7: 0.1834045098534338, 8: 0.5247564316322378, 9: 0.2912291401980419, 10: 0.1394938606520418, 11: 0.3663618432936917, 12: 0.7851759613930136, 13: 0.5142344384136116, 14: 0.0464504127199977, 15: 0.1705241236872915, 16: 0.9488855372533332, 17: 0.8083973481164611, 18: 0.0976721140063838, 19: 0.4401524937396013, 20: 0.4951769101112702, 21: 0.909320402078782, 22: 0.662522284353982, 23: 0.5200680211778108, 24: 0.184854455525527, 25: 0.7751328233611146, 26: 0.8948273504276488, 27: 0.9218742350231168, 28: 0.1959828624191452, 29: 0.3253303307632643, 30: 0.2713490317738959, 31: 0.3567533266935893, 32: 0.5426960831582485, 33: 0.8021969807540397, 34: 0.9868869366005172, 35: 0.1987156815341724, 36: 0.8154614284548342, 37: 0.7290071680409873, 38: 0.0740446517340903, 39: 0.1158690595251297, 40: 0.6232981268275579, 41: 0.0635583502860236, 42: 0.325183322026747, 43: 0.6375574713552131, 44: 0.4722149251619493, 45: 0.713244787222995, 46: 0.5612771975694962, 47: 0.4937955963643907, 48: 0.4275410183585496, 49: 0.1078914269933044},
        'f_xy': {0: 0.157130801863798, 1: 0.1713440485863025, 2: 3.129805110672629, 3: 0.204050792591901, 4: 0.1696344532792451, 5: 0.3507700760080584, 6: 0.6976759541933626, 7: 2.6420140332120847, 8: 0.6010195813501505, 9: 1.0230332050209543, 10: 1.2445986467211745, 11: 1.196766856155742, 12: 0.1560149760139054, 13: 0.9482903687042544, 14: 1.6336636832821407, 15: 1.1513253320985553, 16: 0.2340182514559566, 17: 0.0273325941782971, 18: 2.632976880968226, 19: 0.2266532986907281, 20: 1.16935854022532, 21: 0.3966214152543048, 22: 0.4876505603989647, 23: 0.656928500996059, 24: 1.0169235000222618, 25: 0.0241795100124299, 26: 0.014129531411363, 27: 0.017154888508862, 28: 3.1910192671449296, 29: 2.3823461725425723, 30: 1.4519917624809175, 31: 0.3668196629680049, 32: 0.7050384080679081, 33: 0.4011237323170427, 34: 0.0716906504662595, 35: 0.6570333572025112, 36: 0.5406866248448107, 37: 0.3868260229189749, 38: 0.9660682048266296, 39: 2.257494106637491, 40: 0.1340975089516994, 41: 2.6887050797417227, 42: 1.385803460277527, 43: 0.2538626894067123, 44: 0.2901622850453029, 45: 0.479476802610729, 46: 0.3525629145779723, 47: 0.1171668916298969, 48: 0.6217526074701243, 49: 5.13429928758166},
    }

    df = pd.DataFrame(data)

    2 replies  |  as of 2 weeks ago
        1
  •  0
  •   Jordan    2 weeks ago

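    Some clarification of what f_xy represents would help. Assuming it is a dependent variable influenced by x and y, there are several approaches you can take. One is to plot the data and look for trends, for example x vs. f_xy, y vs. f_xy, and x vs. y. Alternatively, you can apply regression analysis to determine which model best fits the relationship you are exploring.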
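    As a rough illustration of the regression route, the sketch below fits f_xy ≈ a + b·x + c·y by ordinary least squares with NumPy and reports R². The helper name `fit_plane` is hypothetical, the linear form is only one candidate model, and refitting against log(f_xy) is an assumption added here for comparison (it is not something the question states). The sample uses the first five rows of the question's data, rounded to four decimals.

```python
import numpy as np

def fit_plane(x, y, f):
    """Least-squares fit of f ~ a + b*x + c*y; returns (coeffs, r_squared)."""
    x, y, f = (np.asarray(v, dtype=float) for v in (x, y, f))
    # Design matrix: intercept column, then x and y.
    A = np.column_stack([np.ones_like(x), x, y])
    coeffs, *_ = np.linalg.lstsq(A, f, rcond=None)
    residuals = f - A @ coeffs
    ss_res = float(residuals @ residuals)
    ss_tot = float(((f - f.mean()) ** 2).sum())
    return coeffs, 1.0 - ss_res / ss_tot

# First five rows of the question's data, rounded to 4 decimals.
x = [0.3745, 0.7320, 0.1560, 0.0581, 0.6011]
y = [0.9507, 0.5987, 0.1560, 0.8662, 0.7081]
f = [0.1571, 0.1713, 3.1298, 0.2041, 0.1696]

coeffs, r2 = fit_plane(x, y, f)
print("linear fit:     a, b, c =", coeffs, " R^2 =", r2)

# Assumption: f_xy is strictly positive here, so a log transform is defined.
log_coeffs, log_r2 = fit_plane(x, y, np.log(f))
print("log(f_xy) fit:  a, b, c =", log_coeffs, " R^2 =", log_r2)
```

    With the full DataFrame you would pass `df['x']`, `df['y']`, and `df['f_xy']` instead. A clearly higher R² for one form is a hint about which hypothesis to pursue, not proof of it.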

        2
  •  0
  •   Hao Ren    2 weeks ago

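    Before you create any plots, it is essential to know which key relationship you want to illustrate. For example, consider a dataset in which $x$ is a person's age, $y$ is their weight, and $f_{xy}$ is the time they need to run 100 meters. The goal would then be to study how running time changes with age and weight.

    A scatter plot is an effective way to visualize relationships in a dataset, because it shows $x$ and $y$ at the same time. Here the size (and color) of each point encodes the runner's 100 m time ($f_{xy}$), using the seaborn library: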

    import seaborn as sns
    import matplotlib.pyplot as plt

    # df is the DataFrame built in the question; the column is named 'f_xy', not 'f_{xy}'
    plt.figure(figsize=(10, 6))
    # f_xy drives both marker size and color
    sns.scatterplot(x='x', y='y', size='f_xy', hue='f_xy', data=df, palette='viridis', sizes=(20, 200))
    plt.title('Scatter Plot of Age, Weight, and 100m Run Time')
    plt.show()
    

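    Before plotting, consider whether you need pandas to convert (or expand) the data into a DataFrame that carries further information, such as labels for the columns or axes.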
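    A minimal sketch of that conversion, assuming the nested-dict layout from the question. The three rows are copied from the question's data (rounded), and the descriptive labels follow the age/weight/run-time illustration, so they are hypothetical and not the real meaning of the columns.

```python
import pandas as pd

# Three rows from the question's data, rounded to 4 decimals, as a stand-in.
data = {
    "x": {0: 0.3745, 1: 0.7320, 2: 0.1560},
    "y": {0: 0.9507, 1: 0.5987, 2: 0.1560},
    "f_xy": {0: 0.1571, 1: 0.1713, 2: 3.1298},
}
df = pd.DataFrame(data)

# Hypothetical labels taken from the age/weight/100 m illustration.
labels = {"x": "age", "y": "weight", "f_xy": "run_time_100m"}
labeled = df.rename(columns=labels)  # returns a copy; df keeps its names
print(labeled.head())
```

    Plotting functions such as `sns.scatterplot` pick up these column names as axis and legend labels by default, so renaming first saves relabeling the figure by hand.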