
Fetching empty cells in an XLS file using Spreadsheet::ParseExcel

  user2198367 · 13 years ago

    First, see the Stack Overflow question Using Spreadsheet::WriteExcel.

    The data is laid out like this:

          col1      col2    col3   col4   col5
    row1  School    1
    row2  Dean      John
    row3  No.stu.   55
    row4  some irrelevant stuff
    row5  School    2
    row6  Dean      Tony
    row7  No. stu.  60
    row8  some irrelevant stuff
    row9  School    3
    row10 Dean      James
    row11 No.stu.   56
    row12 No. teacher 20
    row13 School    4
    row14 Dean      Tom
    row15 No.stu.   79
    row16 No. teacher 21
    row17 course
    row18           math    2
    row19           eng     4
    row20 teacher   name    age   gender   race
    row21           Jane    20    female   white
    row22 student   name    Lee
    row23           SAT     1434
    row24           gender  male
    

    The output I want to produce is:

          col1    col2   col3     col4         col5         col6        col7          col8         col9            col10         col11         col12        col13
    row1  School  Dean   No.stu.  No. teacher  course_math  course_eng  teacher_name  teacher_age  teacher_gender  teacher_race  student_name  student_SAT  student_gender
    row2  1       John   55
    row3  2       Tony   60
    row4  3       James  56       20
    row5  4       Tom    79       21           2            4           Jane          20           female          white         Lee           1434         male
    

    Thanks to mob, the code I have is:

    use strict;
    use warnings;
    
    use Spreadsheet::ParseExcel;
    use FindBin qw($Bin);
    
    my ($infile) = @ARGV;
    
    my $parser   = Spreadsheet::ParseExcel->new();
    my $workbook = $parser->parse("$Bin/Test.xls");
    die $parser->error unless defined $workbook;
    my ($worksheet) = $workbook->worksheets();
    
    my %data;
    my $row    = 0;
    my $school = "";
    while (1) {
        my $cell = $worksheet->get_cell($row, 0);
        last unless defined($cell);
    
        my $key = $cell->value();
        my $value = $worksheet->get_cell($row++, 1)->value();
    
        if ($key eq "School") {
    
            $school = $value;
            next;
        }
    
        $data{$school}->{$key} = $value;
    }
    sleep 1;
    

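    I started by parsing row17-row19. The first problem I ran into is the empty cell at (row 17, column 3): the code dies when it reaches it. Since Excel distinguishes between "empty" and "blank" cells, I can work around this by setting the format of that cell in the original XLS file to something other than "General". However, that is only a temporary fix. I'd like to know whether there is a method I can use to fetch empty cells. I already tried unformatted() by adding: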

    my $unformattedvalue = $worksheet->get_cell( $row++, 1 )->unformatted();
    

    However, that didn't work.
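Why unformatted() does not help: the call is chained directly onto get_cell(), and Spreadsheet::ParseExcel's get_cell() returns undef when the file has no record for that cell, so the script dies on the method call before unformatted() (or value()) is ever reached. This is likely also why formatting the cell in Excel "fixes" it: a formatted but empty cell gets a blank record, so get_cell() returns a Cell object with an empty value. A minimal sketch of a guard, assuming a hypothetical helper named cell_value() (not part of the module) that returns a default for missing cells. To keep it runnable without an .xls file, the hypothetical MockSheet and MockCell classes stand in for the real worksheet and cell objects and implement only get_cell() and value().

```perl
use strict;
use warnings;

# Hypothetical stand-ins for Spreadsheet::ParseExcel's cell and worksheet
# objects. Like the real get_cell(), MockSheet returns undef for empty cells.
package MockCell {
    sub new   { my ($class, $v) = @_; return bless { v => $v }, $class }
    sub value { return $_[0]{v} }
}
package MockSheet {
    # $grid is an array of rows; undef entries represent empty cells.
    sub new { my ($class, $grid) = @_; return bless { grid => $grid }, $class }
    sub get_cell {
        my ($self, $row, $col) = @_;
        my $v = $self->{grid}[$row][$col];
        return defined $v ? MockCell->new($v) : undef;
    }
}

# Hypothetical helper: the cell's value, or $default ('' if not given)
# when get_cell() returns undef. Never calls a method on undef.
sub cell_value {
    my ($sheet, $row, $col, $default) = @_;
    $default = '' unless defined $default;
    my $cell = $sheet->get_cell($row, $col);
    return defined $cell ? $cell->value() : $default;
}

my $sheet = MockSheet->new([
    ['course'],              # like row17: only column 0 has data
    [undef, 'math', 2],      # like row18
]);

print "[", cell_value($sheet, 0, 1), "]\n";   # empty cell, no error
print "[", cell_value($sheet, 1, 1), "]\n";
```

With a real file, the same helper can be called with the worksheet returned by $workbook->worksheets(), for example cell_value($worksheet, $row, 1) in place of $worksheet->get_cell($row, 1)->value().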

    Then I tried to build the data structure for the "course" state with the following code:

    my %data;
    my $row    = 0;
    my $school = "";
    my $course = ""; #Initial value for the state of course
    while (1) {
        my $cell = $worksheet->get_cell($row, 0);
        last unless defined($cell);
    
        my $key = $cell->value();
        my $value  = $worksheet->get_cell( $row++, 1 )->value();
        my $value1 = $worksheet->get_cell( $row++, 2 )->value(); #Fetching the value in column 3
    
        if ($key eq "School") {
    
            $school = $value;
            next;
        }
    
        if ($key eq "course") { #Just mimicking the how we construct the structure of 'School'
    
             $course = $value1;
             next;
        }
    
        $data{$school}->{$key} = $value;  #Must be something wrong here, but can not figure out
    }
    

    The code fails with Can't call method "value" on an undefined value at xxx line ..., pointing at the line 'my $value1 = $worksheet->get_cell( $row++, 2 )->value();'.

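    In short, my questions are:

    1. In general, how can I fetch empty cells in an XLS file without intervening manually?
    2. How can I parse a structure like the following with Spreadsheet::ParseExcel?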

      row17 course
      row18           math    2
      row19           eng     4
      
    2 answers | last activity 9 years ago
  imran · 13 years ago

    You're doing a $row++ where you shouldn't.
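In the question's second loop, $row++ appears in both the $value line and the $value1 line, so each pass reads column 1 from one row, column 2 from the next row, and then skips ahead two rows. Whenever that next row has nothing in column 2, get_cell() returns undef and ->value() dies. A minimal sketch in plain Perl (no spreadsheet involved) that records which row each read would hit, compared with reading every column from the same row and incrementing once at the end of the loop, as the code below does:

```perl
use strict;
use warnings;

my $last_row = 5;

# Pattern from the question: two post-increments per iteration.
my @buggy;
my $row = 0;
while ($row <= $last_row) {
    my $row_for_col1 = $row++;   # column 1 is read from this row...
    my $row_for_col2 = $row++;   # ...but column 2 from the NEXT row
    push @buggy, [$row_for_col1, $row_for_col2];
}

# Fixed pattern: read all columns from the same row, advance once.
my @fixed;
$row = 0;
while ($row <= $last_row) {
    push @fixed, [$row, $row];   # column 1 and column 2 of one row
    $row++;
}

print join(' ', map { "($_->[0],$_->[1])" } @buggy), "\n";
print join(' ', map { "($_->[0],$_->[1])" } @fixed), "\n";
```

The first loop visits only three (row, row) pairs out of six rows and never pairs a row with itself; the second visits every row once.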

    Try the following code as a starting point:

    my %data;
    my $state = "";
    my $school = "";
    my $student = "";
    my ( $row_min, $row_max ) = $worksheet->row_range();
    my $row = $row_min;
    while ($row <= $row_max) {
        my $cell0 = $worksheet->get_cell( $row, 0 );
        my $cell1 = $worksheet->get_cell( $row, 1 );
    
        if (defined($cell0)) {
            my $key = $cell0->value();
            if ($key eq 'School') {
                $state = 'school';
                $school = $cell1->value();
            } elsif ($key eq 'course') {
                $state = 'course';
            } elsif ($key eq 'teacher') {
                $state = 'teacher';
            } elsif ($key eq 'student') {
                $state = 'student';
                $student = $worksheet->get_cell( $row, 2 )->value();
            } else {
                $data{$school}{$key} = $cell1->value();
            }
        } elsif ($state eq 'course') {
            # process columns for course
        } elsif ($state eq 'teacher') {
            # process columns for teacher
        } elsif ($state eq 'student') {
            # process columns for student
        }
        $row++;
    }
    use Data::Dumper;
    print Dumper(\%data);
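The loop above is a small state machine: column 0 either names a new section (School, course, teacher, student) or holds a plain key, and rows whose column 0 is empty are continuation rows interpreted according to the current $state. A minimal runnable sketch of that idea with the course and student branches filled in (column 1 as key, column 2 as value). Filing them under fixed 'course' and 'student' keys is an assumption, chosen to match the $data{$school}{course} and $data{$school}{student} lookups in the WriteExcel code later in the thread; the teacher branch is left out here. MockSheet and MockCell are hypothetical stand-ins for the Spreadsheet::ParseExcel worksheet and cell objects (only get_cell(), row_range() and value()), so the sketch runs without an .xls file.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical stand-ins; get_cell() returns undef for an empty cell.
package MockCell {
    sub new   { my ($class, $v) = @_; return bless { v => $v }, $class }
    sub value { return $_[0]{v} }
}
package MockSheet {
    sub new       { my ($class, $grid) = @_; return bless { grid => $grid }, $class }
    sub row_range { my ($self) = @_; return (0, $#{ $self->{grid} }) }
    sub get_cell {
        my ($self, $row, $col) = @_;
        my $v = $self->{grid}[$row][$col];
        return defined $v ? MockCell->new($v) : undef;
    }
}

# Value of a cell, or undef when the cell is empty.
sub val {
    my ($sheet, $row, $col) = @_;
    my $cell = $sheet->get_cell($row, $col);
    return defined $cell ? $cell->value() : undef;
}

sub parse_sheet {
    my ($sheet) = @_;
    my %data;
    my $state  = '';
    my $school = '';
    my ($row_min, $row_max) = $sheet->row_range();
    for my $row ($row_min .. $row_max) {          # one increment per row
        my $key = val($sheet, $row, 0);
        if (defined $key) {
            if ($key eq 'School') {
                $state  = 'school';
                $school = val($sheet, $row, 1);
            } elsif ($key eq 'course') {
                $state = 'course';
            } elsif ($key eq 'student') {
                # The 'student' row itself carries the first pair (name => Lee).
                $state = 'student';
                $data{$school}{student}{ val($sheet, $row, 1) } = val($sheet, $row, 2);
            } else {
                $data{$school}{$key} = val($sheet, $row, 1);
            }
        } elsif ($state eq 'course' || $state eq 'student') {
            # Continuation row: column 1 is the key, column 2 the value.
            my $sub_key = val($sheet, $row, 1);
            $data{$school}{$state}{$sub_key} = val($sheet, $row, 2)
                if defined $sub_key;
        }
    }
    return \%data;
}

my $sheet = MockSheet->new([
    ['School', 4],
    ['Dean', 'Tom'],
    ['No.stu.', 79],
    ['course'],
    [undef, 'math', 2],
    [undef, 'eng', 4],
    ['student', 'name', 'Lee'],
    [undef, 'SAT', 1434],
    [undef, 'gender', 'male'],
]);

my $data = parse_sheet($sheet);
$Data::Dumper::Sortkeys = 1;
print Dumper($data);
```

With a real file, pass the worksheet from $workbook->worksheets() instead of the MockSheet object; get_cell(), row_range() and value() are existing Spreadsheet::ParseExcel methods.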
    

    Update:

    To handle the teacher rows, I would first declare a %teacher_columns hash outside the while loop, and get $col_min and $col_max from the col_range() method. Then, inside the $key eq 'teacher' elsif clause, I would do:

    %teacher_columns = (); # clear it out in case column names are different for this school
    for my $col (1 .. $col_max) {   # column 0 holds 'teacher' itself; the header names start in column 1
        my $cell = $worksheet->get_cell( $row, $col );
        $teacher_columns{$cell->value()} = $col if defined($cell) and $cell->value();
    }
    

    Then, inside the $state eq 'teacher' elsif clause, I would do:

    foreach my $key (keys %teacher_columns) {
        my $cell = $worksheet->get_cell($row, $teacher_columns{$key});
        if (defined($cell)) {
          $data{$school}{teacher}{$key} = $cell->value();   # store under the header name
        }
    }
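The two snippets above split the teacher block into two steps: map each header name on the 'teacher' row to its column index, then read each following row through that map. A minimal sketch of the same two steps on plain Perl arrays (header_columns and row_by_header are hypothetical helper names; in the real script each value would come from $worksheet->get_cell($row, $col)->value() after a defined() check):

```perl
use strict;
use warnings;

# Rows as plain arrays; undef marks an empty cell.
my @rows = (
    ['teacher', 'name', 'age', 'gender', 'race'],   # header row (row20)
    [undef,     'Jane', 20,    'female', 'white'],  # data row   (row21)
);

# Step 1: map each non-empty header name to its column index.
sub header_columns {
    my ($header, $first_col) = @_;
    my %columns;
    for my $col ($first_col .. $#$header) {
        my $name = $header->[$col];
        $columns{$name} = $col if defined $name && length $name;
    }
    return \%columns;
}

# Step 2: read one data row through the header map.
sub row_by_header {
    my ($columns, $data_row) = @_;
    my %record;
    for my $name (keys %$columns) {
        my $value = $data_row->[ $columns->{$name} ];
        $record{$name} = $value if defined $value;   # skip empty cells
    }
    return \%record;
}

my $teacher_columns = header_columns($rows[0], 1);   # column 0 is 'teacher'
my $teacher         = row_by_header($teacher_columns, $rows[1]);
print "$_ => $teacher->{$_}\n" for sort keys %$teacher;
```

Because values are looked up by header name rather than by a fixed position, a school whose teacher columns come in a different order still ends up with the same hash keys.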
    
  Peter Mortensen, Pieter Jan Bonestroo · 11 years ago

    Here is what I have so far. I simply added processing for the columns:

    elsif ($state eq 'student') {
        my $key = $cell1->value();
        $data{$school}{$student}{$key} = $cell2->value();
    } elsif ($state eq 'course') {
        my $key = $cell1->value();
        $data{$school}{$course}{$key} = $cell2->value();
    }
    

    where $cell2 is defined as:

    my $cell2 = $worksheet->get_cell( $row, 2 );
    

    This gives me:

    $VAR1 = {
              '4' => {
                       'course' => {
                                     'math' => '2',
                                     'eng' => '4'
                                   },
                       'No.Stu' => '79',
                       'No.Teacher' => '21',
                       'Lee' => {
                                  'SAT' => '1434',
                                  'gender' => 'male'
                                },
                       'Dean' => 'Tom'
                     },
              '1' => {
                       'No.Stu' => '55',
                       'Dean' => 'John'
                     },
              '3' => {
                       'No.Stu' => '56',
                       'No.Teacher' => '20',
                       'Dean' => 'James'
                     },
              '2' => {
                       'No.Stu' => '60',
                       'Dean' => 'Tony'
                     }
            };
    

    That looks good.

    But I don't know how to handle the columns for teacher, because that block has a different structure:

            col1      col2    col3   col4   col5
      row20 teacher   name    age   gender   race
      row21           Jane    20    female   white
    

    I tried looping over the columns to use the cell values as keys, without success:

    my ( $col_min, $col_max ) = $worksheet->col_range();
    my $col = $col_min;
    my $cell3 = $worksheet->get_cell(++$row, $col );
    
    elsif ($state eq 'teacher') {
            while ($col <= $col_max) {
                my $key = $worksheet->get_cell($row, $col++ );
            }
               $data{$school}{$teacher}{$key} =$cell3->value();
        }
    

    Any suggestions?

    Update: I followed @imran's advice and it works well. Then I simply wrote what I parsed into a new XLS file with the following code:

    use Spreadsheet::WriteExcel;
    
    $workbook = Spreadsheet::WriteExcel->new('Result.xls');
    $worksheet = $workbook->add_worksheet();
    
    $col = 0;
    $row = 0;
    
    $worksheet->write( $row++, $col,
        ["School", "No.Stu", "No.Teacher", "Dean","Course_math", "Course_eng", "student_SAT", "student_name", "student_gender", "teacher_race", "teacher_name", "teacher_age", "teacher_gender"] );
    
    foreach my $school( sort keys %data ) {
    
         $worksheet->write( $row++, $col,
            [ $school, @{$data{$school}}{ "No.Stu", "No.Teacher", "Dean"}, @{$data{$school}{course}}{ "math", "eng"}, @{$data{$school}{student}}{ "SAT", "name", "gender"}, @{$data{$school}{teacher}}{ "race", "name", "age", "gender"} ]) ;
    }
    
    $workbook->close();
    

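    Everything works except for one small thing: the teacher_age cell is empty, even though the hash shows that the value has been parsed. I can't figure it out...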

    The teacher part of the hash looks like this:

               'teacher' => {
                              'race' => 'white',
                              'name' => 'Jane',
                              'age ' => '25',
                              'gender' => ' female'
                            },
    

    I am using exactly the code you suggested, but the "teacher_age" cell in the rewritten .xls file is empty.
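The dump above points at a likely cause: the key is 'age ' with a trailing space (and the gender value is ' female' with a leading space). The WriteExcel slice @{$data{$school}{teacher}}{"race", "name", "age", "gender"} looks up 'age', finds nothing, and writes an empty cell. A minimal sketch, assuming the stray spaces come from the source cells, that trims keys and values before storing them (trim() is a hypothetical helper, not a core Perl builtin):

```perl
use strict;
use warnings;

# Hypothetical helper: strip leading/trailing whitespace; undef passes through.
sub trim {
    my ($s) = @_;
    return $s unless defined $s;
    $s =~ s/^\s+|\s+$//g;
    return $s;
}

# Key/value pairs as they appear in the dumped hash ('age ' and ' female').
my @parsed = (['race', 'white'], ['name', 'Jane'], ['age ', '25'], ['gender', ' female']);

my (%raw, %clean);
for my $pair (@parsed) {
    my ($key, $value) = @$pair;
    $raw{$key}          = $value;          # stored as-is
    $clean{ trim($key) } = trim($value);   # trimmed key and value
}

# The same slice the WriteExcel code uses for the teacher columns.
my @raw_row   = @raw{ "race", "name", "age", "gender" };
my @clean_row = @clean{ "race", "name", "age", "gender" };

print join(',', map { defined $_ ? $_ : '(undef)' } @raw_row), "\n";
print join(',', @clean_row), "\n";
```

In the parsing loop, that amounts to storing $data{$school}{teacher}{ trim($key) } = trim($cell->value()) instead of the raw strings.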

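    Update 2: Back to the question about "state": does parsing always start from the row after the state row? What if I have the following data structure: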

     School    1
     course    math
               eng
     ...
     School     2
     course    phy
     ...
     School     3
     course    chem
               gym
               music
    

    If I use the old code and define a course state, I can only parse eng from School 1, and gym and music from School 3. I tried defining the course state one row earlier, but had no luck. Any suggestions?

    Update:

    I followed @imran's advice and changed the course code to:

    } elsif ($key eq 'course') {
                $state = 'course';
                $course = $worksheet->get_cell( $row, 1 )->value();
    }
    

    Correspondingly, the hash assignment is:

    } elsif ($state eq 'course') {
                my $key = $cell1->value();
                $data{$school}{$course}{$key} =$cell1->value();
    }
    

    However, it only parses:

     '1' => {
               'math' => {
                           'eng' => 'eng'
                         },
            },
     '3' => {
               'chem' => {
                           'gym'   => 'gym',
                           'music' => 'music'
                         },
            }
    

    and nothing is parsed for school2.

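    This was solved by simply processing course as soon as the key is reached. That is, instead of having a separate block that defines the course keys and then continuing with the course rows, I now have: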

      } elsif ($key eq 'course') {
                $state = 'course';
                $course = $state;
                my $key = $cell1->value();
                $data{$school}{$course}{$key} = $key;
    }
    

    It really works, but I'm still curious how it works.
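Why it works: the row with course in column 0 also carries the first course name in column 1. The intermediate version only used that row to set $state and $course, so 'math' became a hash key instead of being stored as a course, and School 2's only course, phy, was never stored because School 2 has no continuation rows after its course row. The final version stores column 1 of the course row itself, and every following row with an empty column 0 is handled by the $state eq 'course' branch. A minimal sketch on the Update 2 layout, using a plain array of rows in place of the worksheet (undef marks an empty cell) and a fixed 'course' key, which is what $course = $state yields in the code above:

```perl
use strict;
use warnings;

# Update 2 layout as [column 0, column 1]; undef marks an empty cell.
my @rows = (
    ['School', 1], ['course', 'math'], [undef, 'eng'],
    ['School', 2], ['course', 'phy'],
    ['School', 3], ['course', 'chem'], [undef, 'gym'], [undef, 'music'],
);

my %data;
my $state = '';
my $school;
for my $r (@rows) {
    my ($key, $value) = @$r;
    if (defined $key) {
        if ($key eq 'School') {
            $state  = 'school';
            $school = $value;
        } elsif ($key eq 'course') {
            # Store the course on the 'course' row itself, then switch state.
            $state = 'course';
            $data{$school}{course}{$value} = $value;
        }
    } elsif ($state eq 'course') {
        # Continuation row: column 0 empty, course name in column 1.
        $data{$school}{course}{$value} = $value;
    }
}

for my $s (sort keys %data) {
    print "$s: ", join(', ', sort keys %{ $data{$s}{course} }), "\n";
}
```

Each school now gets every course, including the single-course School 2, because no row is used only as a state marker.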