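Scenario
I am developing an ETL process with Azure Data Factory v1 (unfortunately, I cannot use Azure Data Factory v2).
I want to read all .csv files from a given blob storage container and then write the contents of each file to a table in SQL Azure.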
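The problem
The destination table contains all the columns from the csv files. It must also contain a new column holding the name of the file the data came from.
This is where I am stuck: I cannot find a way to pass the file name from the source dataset (the .csv file in the blob storage source) to the destination dataset (the SQL Azure sink).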
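To make the requirement concrete, here is a small Python sketch of the row shape I am after. It runs outside Data Factory and is not a solution, only an illustration. The column name SourceFileName and the sample values are assumptions made up for this example. The semicolon delimiter and the header row match the input dataset definition.

```python
import csv
import io


def add_source_file_name(file_name, csv_text, delimiter=";"):
    """Parse one CSV file's text and tag every row with the file it came from."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    rows = []
    for row in reader:
        tagged = dict(row)
        # "SourceFileName" is a hypothetical name for the extra column.
        tagged["SourceFileName"] = file_name
        rows.append(tagged)
    return rows


# Made-up sample content; the real files have more columns.
sample = "TypeOfRecord;TPMType\nA;X\nB;Y\n"
rows = add_source_file_name("testFile001.csv", sample)
for row in rows:
    print(row)
```

Each row keeps its original csv columns and gains one extra value, the file name. That extra value is what I cannot get the copy activity to produce.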
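What I have already tried
I have implemented a pipeline that reads a file from blob storage and saves it to a table in SQL Azure.
Here is an excerpt of the JSON that copies a single file to SQL Azure: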
{
    "name": "pipelineFileImport",
    "properties": {
        "activities": [
            {
                "type": "Copy",
                "typeProperties": {
                    "source": {
                        "type": "BlobSource",
                        "recursive": false
                    },
                    "sink": {
                        "type": "SqlSink",
                        "writeBatchSize": 0,
                        "writeBatchTimeout": "00:00:00"
                    },
                    "translator": {
                        "type": "TabularTranslator",
                        "columnMappings": "TypeOfRecord:TypeOfRecord,TPMType:TPMType,..."
                    }
                },
                "inputs": [
                    {
                        "name": "InputDataset-cn0"
                    }
                ],
                "outputs": [
                    {
                        "name": "OutputDataset-cn0"
                    }
                ],
                "policy": {
                    "timeout": "1.00:00:00",
                    "concurrency": 1,
                    "executionPriorityOrder": "NewestFirst",
                    "style": "StartOfInterval",
                    "retry": 3,
                    "longRetry": 0,
                    "longRetryInterval": "00:00:00"
                },
                "scheduler": {
                    "frequency": "Day",
                    "interval": 1
                },
                "name": "Activity-0-pipelineFileImport_csv->[staging]_[Files]"
            }
        ],
        "start": "2018-07-20T09:50:55.486Z",
        "end": "2018-07-20T09:50:55.486Z",
        "isPaused": false,
        "hubName": "test_hub",
        "pipelineMode": "OneTime",
        "expirationTime": "3.00:00:00",
        "datasets": [
            {
                "name": "InputDataset-cn0",
                "properties": {
                    "structure": [
                        {
                            "name": "TypeOfRecord",
                            "type": "String"
                        },
                        {
                            "name": "TPMType",
                            "type": "String"
                        },
                        ...
                    ],
                    "published": false,
                    "type": "AzureBlob",
                    "linkedServiceName": "Source-TestBlobStorage",
                    "typeProperties": {
                        "fileName": "testFile001.csv",
                        "folderPath": "fileinput",
                        "format": {
                            "type": "TextFormat",
                            "columnDelimiter": ";",
                            "firstRowAsHeader": true
                        }
                    },
                    "availability": {
                        "frequency": "Day",
                        "interval": 1
                    },
                    "external": true,
                    "policy": {}
                }
            },
            {
                "name": "OutputDataset-cn0",
                "properties": {
                    "structure": [
                        {
                            "name": "TypeOfRecord",
                            "type": "String"
                        },
                        {
                            "name": "TPMType",
                            "type": "String"
                        },...
                    ],
                    "published": false,
                    "type": "AzureSqlTable",
                    "linkedServiceName": "Destination-SQLAzure-cn0",
                    "typeProperties": {
                        "tableName": "[staging].[Files]"
                    },
                    "availability": {
                        "frequency": "Day",
                        "interval": 1
                    },
                    "external": false,
                    "policy": {}
                }
            }
        ]
    }
}
What I need
I need a way to pass the name of the source file to the destination dataset so that it can be written to the SQL Azure database.