The boundary query needs a small change. By default, Sqoop runs the following query to find the boundaries used for creating splits:
SELECT MIN(department_id), MAX(department_id) FROM departments
To import only a subset of the data, supply a boundary query that returns the lower and upper limits directly:
SELECT 3,6 FROM departments
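The two values returned by the boundary query are divided into ranges, one per map task. The Python sketch below illustrates the idea only; it is a simplified model, not Sqoop's actual splitter, and the function names `split_ranges` and `where_clauses` are hypothetical. The count of 4 assumes the default number of map tasks.

```python
def split_ranges(lower, upper, num_splits):
    """Divide the inclusive integer range [lower, upper] into at most
    num_splits contiguous (start, end) ranges, both ends inclusive.

    Simplified illustration; Sqoop's real splitter may place the
    split points differently.
    """
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")

    total = upper - lower + 1
    num_splits = min(num_splits, total)  # avoid creating empty ranges
    base, extra = divmod(total, num_splits)

    ranges = []
    start = lower
    for i in range(num_splits):
        # The first `extra` ranges each take one additional value.
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def where_clauses(column, ranges):
    """Render each range as the kind of filter a map task would apply."""
    return [f"{column} >= {s} AND {column} <= {e}" for s, e in ranges]


if __name__ == "__main__":
    # Bounds 3 and 6 come from the custom boundary query above.
    for clause in where_clauses("department_id", split_ranges(3, 6, 4)):
        print(clause)
```

Under this model, no range extends below 3 or above 6, so rows outside the supplied bounds are never read.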
The following walkthrough provides more detail:
1) Create the database and load the data
mysql> create database retail_db;
mysql> use retail_db;
mysql> create table departments (department_id int primary key, department_name varchar(255));
mysql> insert into departments values(2, 'Fitness');
mysql> insert into departments values(3, 'Footwear');
mysql> insert into departments values(4, 'Apparel');
mysql> insert into departments values(5, 'Golf');
mysql> insert into departments values(6, 'Outdoors');
mysql> insert into departments values(7, 'Fan Shop');
2) Check the data
mysql> select * from departments;
+---------------+-----------------+
| department_id | department_name |
+---------------+-----------------+
|             2 | Fitness         |
|             3 | Footwear        |
|             4 | Apparel         |
|             5 | Golf            |
|             6 | Outdoors        |
|             7 | Fan Shop        |
+---------------+-----------------+
6 rows in set (0.00 sec)
3) Run the Sqoop job
$ sqoop import --connect jdbc:mysql://localhost:3306/retail_db --username user --password password --table departments --target-dir /test/run --boundary-query 'SELECT 3,6 FROM departments'
$ hadoop fs -cat /test/run/part-*
3,Footwear
4,Apparel
5,Golf
6,Outdoors
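Departments 2 and 7 are missing from the output because they fall outside the bounds 3 and 6 returned by the custom boundary query. As a rough check, the Python sketch below applies the same bounds to the sample rows; `DEPARTMENTS` and `rows_in_bounds` are hypothetical names used only for illustration, and the filter is a simplified stand-in for what the map tasks do.

```python
# Sample rows copied from the departments table above.
DEPARTMENTS = [
    (2, "Fitness"),
    (3, "Footwear"),
    (4, "Apparel"),
    (5, "Golf"),
    (6, "Outdoors"),
    (7, "Fan Shop"),
]


def rows_in_bounds(rows, lower, upper):
    """Keep rows whose id lies within the inclusive range [lower, upper]."""
    return [row for row in rows if lower <= row[0] <= upper]


if __name__ == "__main__":
    # Print in the same comma-separated form as the imported part files.
    for dept_id, name in rows_in_bounds(DEPARTMENTS, 3, 6):
        print(f"{dept_id},{name}")
```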