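Featuretools works on data with or without datetimes. To answer your question, Featuretools can generate features for a single table that has no datetimes. The iris dataset has only one table and nothing to normalize (normalizing means generating new tables from an existing one), so you can use transform primitives to generate new features: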
- Make an EntitySet.
- Add a single entity.
- Use the transform primitives you want.
Here is a complete working example:
from sklearn.datasets import load_iris
import pandas as pd
import featuretools as ft
# Load data and put into dataframe
iris = load_iris()
df = pd.DataFrame(iris.data, columns = iris.feature_names)
df['species'] = iris.target
df['species'] = df['species'].map({0: 'setosa', 1: 'versicolor', 2: 'virginica'})
# Make an entityset and add the entity
es = ft.EntitySet(id='iris')
es.entity_from_dataframe(entity_id='data', dataframe=df,
                         make_index=True, index='index')
# Run deep feature synthesis with transformation primitives
feature_matrix, feature_defs = ft.dfs(entityset=es, target_entity='data',
                                      trans_primitives=['add_numeric', 'multiply_numeric'])
feature_matrix.head()
sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) species petal width (cm) + sepal width (cm) petal length (cm) + petal width (cm) petal length (cm) + sepal length (cm) petal length (cm) + sepal width (cm) sepal length (cm) + sepal width (cm) petal width (cm) + sepal length (cm) petal length (cm) * sepal width (cm) sepal length (cm) * sepal width (cm) petal width (cm) * sepal length (cm) petal width (cm) * sepal width (cm) petal length (cm) * sepal length (cm) petal length (cm) * petal width (cm) petal width (cm) + sepal width (cm) * sepal length (cm) + sepal width (cm) petal width (cm) + sepal width (cm) * sepal length (cm) petal length (cm) + petal width (cm) * petal width (cm) petal width (cm) + sepal length (cm) * sepal length (cm) petal length (cm) * petal width (cm) + sepal length (cm) petal width (cm) * sepal length (cm) + sepal width (cm) petal length (cm) + sepal length (cm) * sepal width (cm) petal length (cm) + petal width (cm) * sepal length (cm) petal length (cm) + sepal length (cm) * petal width (cm) + sepal width (cm) petal length (cm) + sepal length (cm) * sepal length (cm) + sepal width (cm) petal length (cm) * sepal length (cm) + sepal width (cm) petal length (cm) + sepal width (cm) * sepal length (cm) + sepal width (cm) petal length (cm) + sepal width (cm) * sepal length (cm) petal length (cm) + petal width (cm) * petal length (cm) + sepal length (cm) petal length (cm) + sepal length (cm) * petal width (cm) petal length (cm) + sepal width (cm) * sepal width (cm) petal length (cm) + petal width (cm) * petal length (cm) + sepal width (cm) sepal length (cm) + sepal width (cm) * sepal width (cm) petal length (cm) + sepal length (cm) * petal width (cm) + sepal length (cm) petal width (cm) + sepal length (cm) * sepal length (cm) + sepal width (cm) petal length (cm) + petal width (cm) * petal width (cm) + sepal width (cm) petal length (cm) + sepal width (cm) * petal width (cm) + sepal length (cm) petal length (cm) * petal length (cm) 
+ sepal length (cm) petal width (cm) * petal width (cm) + sepal width (cm) petal length (cm) + petal width (cm) * sepal length (cm) + sepal width (cm) petal length (cm) * petal width (cm) + sepal width (cm) petal length (cm) + sepal width (cm) * petal width (cm) petal length (cm) * petal length (cm) + petal width (cm) petal length (cm) + petal width (cm) * sepal width (cm) petal length (cm) + sepal length (cm) * sepal length (cm) petal width (cm) + sepal length (cm) * sepal width (cm) petal length (cm) * petal length (cm) + sepal width (cm) petal width (cm) + sepal width (cm) * sepal width (cm) petal length (cm) + sepal length (cm) * petal length (cm) + sepal width (cm) sepal length (cm) * sepal length (cm) + sepal width (cm) petal width (cm) * petal width (cm) + sepal length (cm) petal length (cm) + sepal width (cm) * petal width (cm) + sepal width (cm) petal length (cm) + petal width (cm) * petal width (cm) + sepal length (cm) petal width (cm) + sepal length (cm) * petal width (cm) + sepal width (cm)
index
0 5.1 3.5 1.4 0.2 setosa 3.7 1.6 6.5 4.9 8.6 5.3 4.90 17.85 1.02 0.70 7.14 0.28 31.82 18.87 0.32 27.03 7.42 1.72 22.75 8.16 24.05 55.90 12.04 42.14 24.99 10.40 1.30 17.15 7.84 30.10 34.45 45.58 5.92 25.97 9.10 0.74 13.76 5.18 0.98 2.24 5.60 33.15 18.55 6.86 12.95 31.85 43.86 1.06 18.13 8.48 19.61
1 4.9 3.0 1.4 0.2 setosa 3.2 1.6 6.3 4.4 7.9 5.1 4.20 14.70 0.98 0.60 6.86 0.28 25.28 15.68 0.32 24.99 7.14 1.58 18.90 7.84 20.16 49.77 11.06 34.76 21.56 10.08 1.26 13.20 7.04 23.70 32.13 40.29 5.12 22.44 8.82 0.64 12.64 4.48 0.88 2.24 4.80 30.87 15.30 6.16 9.60 27.72 38.71 1.02 14.08 8.16 16.32
2 4.7 3.2 1.3 0.2 setosa 3.4 1.5 6.0 4.5 7.9 4.9 4.16 15.04 0.94 0.64 6.11 0.26 26.86 15.98 0.30 23.03 6.37 1.58 19.20 7.05 20.40 47.40 10.27 35.55 21.15 9.00 1.20 14.40 6.75 25.28 29.40 38.71 5.10 22.05 7.80 0.68 11.85 4.42 0.90 1.95 4.80 28.20 15.68 5.85 10.88 27.00 37.13 0.98 15.30 7.35 16.66
3 4.6 3.1 1.5 0.2 setosa 3.3 1.7 6.1 4.6 7.7 4.8 4.65 14.26 0.92 0.62 6.90 0.30 25.41 15.18 0.34 22.08 7.20 1.54 18.91 7.82 20.13 46.97 11.55 35.42 21.16 10.37 1.22 14.26 7.82 23.87 29.28 36.96 5.61 22.08 9.15 0.66 13.09 4.95 0.92 2.55 5.27 28.06 14.88 6.90 10.23 28.06 35.42 0.96 15.18 8.16 15.84
4 5.0 3.6 1.4 0.2 setosa 3.8 1.6 6.4 5.0 8.6 5.2 5.04 18.00 1.00 0.72 7.00 0.28 32.68 19.00 0.32 26.00 7.28 1.72 23.04 8.00 24.32 55.04 12.04 43.00 25.00 10.24 1.28 18.00 8.00 30.96 33.28 44.72 6.08 26.00 8.96 0.76 13.76 5.32 1.00 2.24 5.76 32.00 18.72 7.00 13.68 32.00 43.00 1.04 19.00 8.32 19.76
For more details on primitives, see here.
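To make the wide output above easier to read, here is a minimal pandas sketch of what the first level of `add_numeric` and `multiply_numeric` computes: the sum and the product of each pair of numeric columns. This is not how Featuretools implements it. The helper name `pairwise_features` is made up for illustration, and the column naming only imitates what Featuretools prints.

```python
from itertools import combinations

import pandas as pd


def pairwise_features(df):
    """Return df plus the sum and product of every pair of numeric columns.

    Illustrative helper only; not part of Featuretools.
    """
    numeric_cols = sorted(df.select_dtypes("number").columns)
    out = df.copy()
    for a, b in combinations(numeric_cols, 2):
        out[f"{a} + {b}"] = df[a] + df[b]
        out[f"{a} * {b}"] = df[a] * df[b]
    return out


# First two iris rows, copied from the example output above
sample = pd.DataFrame({
    "sepal length (cm)": [5.1, 4.9],
    "sepal width (cm)": [3.5, 3.0],
    "petal length (cm)": [1.4, 1.4],
    "petal width (cm)": [0.2, 0.2],
    "species": ["setosa", "setosa"],
})

features = pairwise_features(sample)
print(features.filter(like=" + ").round(2))
```

With four numeric columns this yields 6 sums and 6 products, which line up with the first-level columns in the output above (for example, 3.7 for `petal width (cm) + sepal width (cm)` in row 0). The remaining columns in that output appear to come from stacking the primitives, such as multiplying two of the sums together.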
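Featuretools works best on relational datasets with multiple tables, but it also works on a single table. There are several demos that start from one table and normalize an entity to create multiple tables. The taxi trip duration project is a good example of this approach.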
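As a rough illustration of what normalizing a single table buys you, the pandas sketch below splits a repeated column out into its own parent table, aggregates the child rows per parent, and joins the result back. It mimics the idea by hand rather than calling Featuretools, and the `trips` table with its `vendor_id` and `trip_duration` columns is invented for the example, not taken from the demo.

```python
import pandas as pd

# Hypothetical single table: one row per trip
trips = pd.DataFrame({
    "trip_id": [1, 2, 3, 4, 5],
    "vendor_id": ["A", "A", "B", "B", "B"],
    "trip_duration": [300, 500, 200, 400, 900],
})

# "Normalize": pull the repeated column out into its own parent table
vendors = trips[["vendor_id"]].drop_duplicates().reset_index(drop=True)

# Aggregate the child rows for each parent (similar in spirit to
# aggregation primitives such as mean and count)
vendor_stats = (
    trips.groupby("vendor_id")["trip_duration"]
    .agg(["mean", "count"])
    .add_prefix("trip_duration_")
    .reset_index()
)
vendors = vendors.merge(vendor_stats, on="vendor_id")

# Bring the parent-level features back onto each trip
trips_with_features = trips.merge(vendors, on="vendor_id", how="left")
print(trips_with_features)
```

Each trip now carries features that describe its vendor as a whole, which a single flat table cannot express until a parent table exists.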
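The iris dataset has only 4 numeric features (assuming you are predicting the species), and none of them is a value you can directly normalize on. However, you could apply a clustering technique such as KMeans to the numeric features and then create entities based on the cluster assignments. The
predict remaining useful life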