
Parsing two different CSV files with univocity and writing to a new CSV file

  • Bananas  · 7 years ago

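    I always use the univocity parser in my Java programs to compare CSV files. It works well and is much faster.

    This time, however, I am trying to parse two different large CSV files with complex values and print the differences to a new CSV file.

    Following one of the author's examples, I tried using processFile after reading file 1 into a list and converting it to a map, but I still get errors while parsing.

    Below are my sample input files and the expected output file.

    Input - file 1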

    "h1","h2","h3","h4","h5"
    "00000","US","9503.00.0089","USA","9503.0089"
    "","EU","9503.00.7000","EUROPEAN UNION","9503.00.7000"
    "#1200","US","5601.22.0010","USA","5601.22.0010"
    "0180691","US","9503.00.0073","USA","9503.00.0073"
    "DRTY01","CA","9603.01.0088","CAN","9603.01.0088"
    

    Input - file 2

    "h1","h2","h3","h6","h7","h8","h9","h10","h11"
    "018890","US","","2015","101","1","1","All",""
    "00000","US","9503.00.0090","1986","101","1","1","All","9503.00.0090"
    "0180691","US","9503.00.0073","2019","101","1","1","All","9503.00.0073"
    "DRTY01","CA","9603.01.0087","2002","102","1","2","CA","9603.01.0087"
    

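    Select the rows whose h1 and h2 values appear in both file1 and file2, then compare file1's h3 with file2's h3. If the two h3 values are not equal, I want to write h1, h4, h10, h5, h11, h6, h7, h8, h9 to file 3.

    Output - file 3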

    "h1","h4","h10","h5","h11","h6","h7","h8","h9"
    "00000","USA","All","9503.00.0089","9503.00.0090","1986","101","1","1"
    "DRTY01","CAN","CA","9603.01.0088","9603.01.0087","2002","102","1","2"
    
    1 answer
  •   utkarsh31  · 7 years ago

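    I have a solution for your problem, but please regression-test it. My assumption is that the combination of h1 and h2 is unique within each file. I create a HashMap with an instance of a small key class as the key and the entire CSV row as the value. The key class overrides hashCode and equals as follows:

    • hashCode uses only h1 and h2 to generate the hash (since together they are assumed to be unique)
    • equals additionally uses h3 as a comparison condition: it returns false when the two h3 values are the same.

    The logic in equals is: if h1 and h2 are the same in map1 and map2 while h3 differs, give me the rows from map1 and map2. This uses extra space for the maps, but the overall computation drops to O(N). The code below gives you the required rows from the maps. I have not done the I/O and exception handling properly, so please take care of those.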

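    import java.io.File;
    import java.io.FileNotFoundException;
    import java.io.FileReader;
    import java.io.Reader;
    import java.util.Arrays;
    import java.util.List;
    import java.util.Map;

    import com.univocity.parsers.common.processor.RowListProcessor;
    import com.univocity.parsers.csv.CsvParser;
    import com.univocity.parsers.csv.CsvParserSettings;

    public class UnivocityTest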
    {
    
        public static void main(String[] args) throws FileNotFoundException
        {
            // Get data from csv file1
            List<String[]> f1 = getData("example.csv");
            // Get data from csv file2
            List<String[]> f2 = getData("example1.csv");
    
            // Convert data to a Map with HeaderList class and entire row.
            Map<HeaderList, String[]> map1 = convertAndReturn(f1);
            Map<HeaderList, String[]> map2 = convertAndReturn(f2);
    
            //Currently prints the required rows.
            compareData(map1, map2);
        }
    
        // Convert csv to List<String[]>
        private static List<String[]> getData(String file) throws FileNotFoundException
        {
            CsvParserSettings parserSettings = new CsvParserSettings();
            parserSettings.setLineSeparatorDetectionEnabled(true);
            RowListProcessor rowProcessor = new RowListProcessor();
            parserSettings.setProcessor(rowProcessor);
            parserSettings.setHeaderExtractionEnabled(true);
    
            CsvParser parser = new CsvParser(parserSettings);
            parser.parse(getReader(file));
            // String[] headers = rowProcessor.getHeaders();
            List<String[]> rows = rowProcessor.getRows();
    
            return rows;
        }
    
        // get reader object
        private static Reader getReader(String string) throws FileNotFoundException
        {
            // TODO Add proper file handling and exception handling
            return new FileReader(new File(string));
        }
    
        // Return HashMap
        private static Map<HeaderList, String[]> convertAndReturn(List<String[]> f1)
        {
            Map<HeaderList, String[]> map = new java.util.HashMap<>();
    
            for (String[] each : f1)
            {
                // For each row in csv create a corresponding HeaderList object with h1,h2 and h3 as key
                // and row as value.
                HeaderList header = new HeaderList(each[0], each[1], each[2]);
                map.put(header, each);
            }
    
            return map;
        }
    
        private static void compareData(Map<HeaderList, String[]> map1, Map<HeaderList, String[]> map2)
        {
            // Iterates over the map1 keys one by one. For each key we check if there is a matching key
            // in map2. The matching condition will be h1 and h2 should be same while h3 should be
            // different. Once a key like that is found currently I'm printing both the rows, here you
            // can get the rows you want from the map and return them.
    
            for (HeaderList each : map1.keySet())
            {
                if (map2.containsKey(each))
                {
                    // TODO Assume you want columns h3, h4 from file1 and h6, h7 from file2.
                    // map1 holds file1 rows, with h3 and h4 at indexes 2 and 3 of the String[].
                    // map2 holds file2 rows, with h6 and h7 at indexes 3 and 4 of the String[].
                    String h3FromFile1 = map1.get(each)[2];
                    String h4FromFile1 = map1.get(each)[3];
                    String h6FromFile2 = map2.get(each)[3];
                    String h7FromFile2 = map2.get(each)[4];
                    System.out.println("Required Columns: ");
                    System.out.println("h3 file1: "+ h3FromFile1);
                    System.out.println("h4 file1: "+ h4FromFile1);
                    System.out.println("h6 file2: "+ h6FromFile2);
                    System.out.println("h7 file2: " + h7FromFile2);
                    System.out.println(Arrays.toString(map1.get(each)));
                    System.out.println(Arrays.toString(map2.get(each)));
                    System.out.println("-------------------------------");
                }
            }
        }
    
    }
    

    The bean class holding the three columns h1, h2 and h3:

    class HeaderList
            {
    
                private String h1;
    
                private String h2;
    
                private String h3;
    
                public HeaderList(String h1, String h2, String h3)
                {
                    super();
                    this.h1 = h1;
                    this.h2 = h2;
                    this.h3 = h3;
                }
    
                /**
                 * Generates the hash code from h1 and h2 only, so keys that share h1 and h2
                 * get the same hash regardless of h3.
                 */
                @Override
                public int hashCode()
                {
                    final int prime = 31;
                    int result = 1;
                    result = prime * result + ((h1 == null) ? 0 : h1.hashCode());
                    result = prime * result + ((h2 == null) ? 0 : h2.hashCode());
                    return result;
                }
    
                /**
                 * Assumes each csv file row is uniquely identified by h1 and h2 combined. If h1
                 * and h2 do not uniquely identify a row, this may lead to data loss. Two keys
                 * are treated as equal only when h1 and h2 match and h3 differs; keys with the
                 * same h3 (including both null) are reported as not equal.
                 */
                @Override
                public boolean equals(Object obj)
                {
                    if (this == obj)
                        return true;
                    if (obj == null)
                        return false;
                    if (getClass() != obj.getClass())
                        return false;
                    HeaderList other = (HeaderList) obj;
                    if (h1 == null)
                    {
                        if (other.h1 != null)
                            return false;
                    }
                    else if (!h1.equals(other.h1))
                        return false;
                    if (h2 == null)
                    {
                        if (other.h2 != null)
                            return false;
                    }
                    else if (!h2.equals(other.h2))
                        return false;
                    if (h3 == null)
                    {
                        if (other.h3 == null)
                            return false;
                    }
                    else if (h3.equals(other.h3))
                        return false;
                    return true;
                }
    
                /**
                 * @inheritDoc
                 */
                @Override
                public String toString()
                {
                    return "HeaderList [h1=" + h1 + ", h2=" + h2 + ", h3=" + h3 + "]";
                }
    
            }
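
The equals method above deliberately departs from the usual equals contract: two keys with different h3 values are each "equal" to a third key while not being equal to one another when their h3 values coincide, so transitivity does not hold, and the lookup relies on how HashMap happens to call equals. As a hedged alternative, the sketch below keys the map on h1 and h2 only and compares h3 explicitly. The names `H3Diff` and `mismatches` are hypothetical, and the column indexes (h1, h2, h3 at positions 0, 1, 2 in both files) are taken from the sample input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical alternative: key on (h1, h2) only and compare h3 explicitly. */
final class H3Diff {

    /** Returns {file1Row, file2Row} pairs whose h1 and h2 match but whose h3 differs. */
    static List<String[][]> mismatches(List<String[]> file1, List<String[]> file2) {
        // Index file 2 by (h1, h2). Arrays.asList gives value-based equals/hashCode
        // and tolerates null fields, which a parser may return for empty values.
        Map<List<String>, String[]> byKey = new HashMap<>();
        for (String[] row : file2) {
            byKey.put(Arrays.asList(row[0], row[1]), row);
        }

        List<String[][]> result = new ArrayList<>();
        for (String[] row1 : file1) {
            String[] row2 = byKey.get(Arrays.asList(row1[0], row1[1]));
            // h3 sits at index 2 in both files; keep the pair only when it differs.
            if (row2 != null && !Objects.equals(row1[2], row2[2])) {
                result.add(new String[][] {row1, row2});
            }
        }
        return result;
    }
}
```

This keeps the same O(N) behaviour and the same assumption that h1 plus h2 is unique per file; a duplicate key in file 2 would still overwrite the earlier row.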
    

    Output for the given input csv files:

    Required Columns: 
    h3 file1: 9603.01.0088
    h4 file1: CAN
    h6 file2: 2002
    h7 file2: 102
    [DRTY01, CA, 9603.01.0088, CAN, 9603.01.0088]
    [DRTY01, CA, 9603.01.0087, 2002, 102, 1, 2, CA, 9603.01.0087]
    -------------------------------
    Required Columns: 
    h3 file1: 9503.00.0089
    h4 file1: USA
    h6 file2: 1986
    h7 file2: 101
    [00000, US, 9503.00.0089, USA, 9503.0089]
    [00000, US, 9503.00.0090, 1986, 101, 1, 1, All, 9503.00.0090]
    -------------------------------
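
The output above prints the matched rows but does not yet arrange them in the file 3 layout from the question. The following is a minimal sketch of that last step, assuming the column positions of the sample files (file 1: h1..h5 at indexes 0..4; file 2: h1, h2, h3, h6, h7, h8, h9, h10, h11 at indexes 0..8). `File3Rows`, `merge` and `toCsvLine` are hypothetical names, and the quoting is done by hand with the standard library only so the example stays self-contained; a CSV writer from a library could take the place of `toCsvLine`.

```java
import java.util.StringJoiner;

/** Hypothetical helper: arranges a matched pair in file 3 order and formats it. */
final class File3Rows {

    // Column order requested in the question for file 3.
    static final String[] HEADERS = {"h1", "h4", "h10", "h5", "h11", "h6", "h7", "h8", "h9"};

    /**
     * Combines one file 1 row (h1..h5) with the matching file 2 row
     * (h1,h2,h3,h6,h7,h8,h9,h10,h11) into the file 3 column order.
     */
    static String[] merge(String[] f1, String[] f2) {
        return new String[] {
            f1[0], // h1
            f1[3], // h4
            f2[7], // h10
            f1[4], // h5
            f2[8], // h11
            f2[3], // h6
            f2[4], // h7
            f2[5], // h8
            f2[6]  // h9
        };
    }

    /** Quotes every field and doubles embedded quotes; null becomes an empty quoted field. */
    static String toCsvLine(String[] row) {
        StringJoiner line = new StringJoiner(",");
        for (String field : row) {
            String value = (field == null) ? "" : field;
            line.add('"' + value.replace("\"", "\"\"") + '"');
        }
        return line.toString();
    }
}
```

Inside `compareData`, one would call `File3Rows.merge(map1.get(each), map2.get(each))` for each matching key and write `toCsvLine(File3Rows.HEADERS)` once before the data lines. Note that the h5 value comes straight from file 1, so for the `00000` row it is whatever file 1 actually contains.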