I have a pandas DataFrame in the following format:
record_id, f_1 , f_2, f_3, ... , f_n, A, B, C
1, 0.1, 0.2, 0.3, ... , 1.2, 1, 0, 1
2, 0.3, 1.2, 0.5, ... , 2.1, 1, 0, 0
3, 0.2, 3.2, 1.3, ... , 0.4, 1, 1, 0
4, 1.1, 0.1, 0.7, ... , 0.5, 0, 0, 1
5, 2.1, 0.5, 0.8, ... , 1.9, 0, 1, 1
6, 0.5, 0.4, 0.2, ... , 0.8, 1, 1, 1
:
:
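Here A, B and C are 0/1 label columns. I want to duplicate each record once for every label that is set to 1, put that label in a target column, and give each duplicated row a weight of 1/(total_records_after_duplication) for that record, so the result looks like this: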
record_id, f_1 , f_2, f_3, ... , f_n, target, weight
1, 0.1, 0.2, 0.3, ... , 1.2, A, 0.5
1, 0.1, 0.2, 0.3, ... , 1.2, C, 0.5
2, 0.3, 1.2, 0.5, ... , 2.1, A, 1.0
3, 0.2, 3.2, 1.3, ... , 0.4, A, 0.5
3, 0.2, 3.2, 1.3, ... , 0.4, B, 0.5
4, 1.1, 0.1, 0.7, ... , 0.5, C, 1.0
5, 2.1, 0.5, 0.8, ... , 1.9, B, 0.5
5, 2.1, 0.5, 0.8, ... , 1.9, C, 0.5
6, 0.5, 0.4, 0.2, ... , 0.8, A, 0.333
6, 0.5, 0.4, 0.2, ... , 0.8, B, 0.333
6, 0.5, 0.4, 0.2, ... , 0.8, C, 0.333
:
:
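
For reference, here is a minimal sketch of one way the reshaping above might be done, using `DataFrame.melt` followed by a per-record count. It assumes the label columns are exactly A, B and C and contain only 0/1 flags. The sample frame keeps just f_1 and f_2 from the first six rows, since the remaining feature columns are elided above. The names `label_cols`, `feature_cols`, `long_df` and `result` are only illustrative.

```python
import pandas as pd

# Sample data: first six rows from the question, with only f_1 and f_2 kept
# (the other feature columns are elided in the original post).
df = pd.DataFrame({
    "record_id": [1, 2, 3, 4, 5, 6],
    "f_1": [0.1, 0.3, 0.2, 1.1, 2.1, 0.5],
    "f_2": [0.2, 1.2, 3.2, 0.1, 0.5, 0.4],
    "A": [1, 1, 1, 0, 0, 1],
    "B": [0, 0, 1, 0, 1, 1],
    "C": [1, 0, 0, 1, 1, 1],
})

# Assumption: these are the only label columns; everything else is kept as-is.
label_cols = ["A", "B", "C"]
feature_cols = [c for c in df.columns if c not in label_cols]

# Wide -> long: one row per (record, label) pair, with the 0/1 value in "flag".
long_df = df.melt(
    id_vars=feature_cols,
    value_vars=label_cols,
    var_name="target",
    value_name="flag",
)

# Keep only the labels that are actually set for each record.
long_df = long_df[long_df["flag"] == 1].drop(columns="flag")

# weight = 1 / (number of rows the record was duplicated into)
rows_per_record = long_df.groupby("record_id")["target"].transform("count")
long_df["weight"] = 1.0 / rows_per_record

result = long_df.sort_values(["record_id", "target"]).reset_index(drop=True)
print(result)
```

With this approach the weights of each record_id sum to 1, and a record whose labels are all 0 would drop out of the result entirely, which may or may not be what is wanted.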