Home » PHP » How to Read XLSX File Into Array

How to Read XLSX File Into Array

Sometimes you may need to import some data from an XLSX file into your application. The first step is to load data from the file into an array. There are a few libraries that can help with this.

Each library has its advantages and disadvantages and may be the best solution in a specific case. In this article, I will show how to read XLSX file into array using PHPSpreadsheet and OpenSpout.

How to Import XLSX File Into Array

I will use the XLSX file with data about squirrels from cityofnetwork.us for all examples in this article. The original file is too short, so I repeated data a few times to get a large file for 30k lines. It will be enough to test how the libraries handle big files. I assume that it is placed in the current directory and is called squirrel.xlsx. I will explain not only how to load data from the XLSX file to an array but also show some bench-marking data.

I prefer to use data transfer objects instead of associative arrays to store data. It is more convenient to use. Maybe arrays are faster but objects are more predictable and they allow IDE to suggest to you what fields are available. Thus the following DTO will be used for storing data for each squirrel in examples:

class SquirrelData { public ?float $coordinateX; public ?float $coordinateY; public ?string $squirrelId; public ?string $hectare; public ?string $shift; public ?string $date; public ?int $squirrelNumber; public ?string $age; }

I will use the symfony/stopwatch package to get loading time and memory consumption of the script. For example:

$watch = new Stopwatch(); $watch->start("EventName"); //Your code here $info = $watch->stop("EventName"); var_dump((string) $info);

You can read in detailed how to calculate execution time in PHP in this article.

1. PHPSpreadsheet

Very often I see recommendations to use PHPExcel which is called Spreadsheet now. It is a great library that supports many spreadsheet file formats and has a lot of powerful capabilities for editing and styling documents. You can install the library into your project using this command:

composer require phpoffice/phpspreadsheet

First of all, you need to load a document file. Create the Xlsx reader and load the desired file using the load() method. Also, you can configure the reader to read-only data without formatting using the setReadOnly() method.

$reader = new \PhpOffice\PhpSpreadsheet\Reader\Xlsx(); $reader->setReadDataOnly(true); $spreadsheet = $reader->load("squirrels.xlsx");

Each spreadsheet document can contain multiple sheets. In most cases, there is only one sheet and you can just work with the first:

$sheet = $spreadsheet->getSheet($spreadsheet->getFirstSheetIndex());

But if you want to get a specific sheet, you can fetch it by name:

$sheet = $spreadsheet->getSheetByName('2018_Central_Park_Squirrel_Cens');

Now you can use this code to load all data from the sheet into the array using the toArray() method:

$data = $sheet->toArray();

This function returns all sheet data. But It is not desired behavior in many cases. For example, you may want to take only specific columns or skip the first row with a header. You can convert rows into an array manually using this code:

$data = []; $sheet = $spreadsheet->getSheet($spreadsheet->getFirstSheetIndex()); foreach ($sheet->getRowIterator() as $row) { if ($row->getRowIndex() === 1) { continue; //Skip heading } $cells = iterator_to_array($row->getCellIterator("A", "H")); $data[] = [ "x_coordinate" => $cells["A"]->getValue(), "y_coordinate" => $cells["B"]->getValue(), "squirrel_id" => $cells["C"]->getValue(), "hectare" => $cells["D"]->getValue(), "shift" => $cells["E"]->getValue(), "date" => $cells["F"]->getValue(), "squirrel_number" => $cells["G"]->getValue(), "age" => $cells["H"]->getValue(), ]; }

Here the first row is skipped and I took only the first eight columns. In this case, the memory consumption was 592 megabytes and the loading process took 21 seconds. It is a little faster than the previous example but not enough convenient.

As I said before, I prefer to use an array of objects instead of multi-dimensional arrays. Create a new SquirrelData object for each row and fill it:

$sheet = $spreadsheet->getSheet($spreadsheet->getFirstSheetIndex()); foreach ($sheet->getRowIterator() as $row) { if ($row->getRowIndex() === 1) { continue; //Skip heading } $cells = iterator_to_array($row->getCellIterator("A", "H")); $squirrelData = new SquirrelData(); $squirrelData->coordinateX = $cells["A"]->getValue(); $squirrelData->coordinateY = $cells["B"]->getValue(); $squirrelData->squirrelId = $cells["C"]->getValue(); $squirrelData->hectare = $cells["D"]->getValue(); $squirrelData->shift = $cells["E"]->getValue(); $squirrelData->date = $cells["F"]->getValue(); $squirrelData->squirrelNumber = $cells["G"]->getValue(); $squirrelData->age = $cells["H"]->getValue(); $data[] = $squirrelData; }

Now you have an array of object with information about squirrels and can use it everywhere in your application. The memory consumption and loading time are almost the same.

2. OpenSpout

As you can see the PHPSpreadsheet is very heavy and requires a lot of memory for processing large files. If you want only to read XLSX into array without editing and something else, you may want to use a more lightweight library. You can choose openspout/openspout. It is an open-source for the box/spout library which is discontinued now. Use this command to install the package into your project:

composer install openspout/openspout

Now you can load the XLSX file using this code:

$reader = new \OpenSpout\Reader\XLSX\Reader(); $reader->open("./squirrel.xlsx");

This library does not support fetching specific sheet. You can iterate by all sheets, and then by rows and cells. This code will load data only from the first sheet:

foreach ($reader->getSheetIterator() as $sheet) { foreach ($sheet->getRowIterator() as $rowIndex => $row) { if ($rowIndex === 1) { continue; //Skip heading } $cells = $row->getCells(); $data[] = [ "x_coordinate" => $cells[0]->getValue(), "y_coordinate" => $cells[1]->getValue(), "squirrel_id" => $cells[2]->getValue(), "hectare" => $cells[3]->getValue(), "shift" => $cells[4]->getValue(), "date" => $cells[5]->getValue(), "squirrel_number" => $cells[6]->getValue(), "age" => $cells[7]->getValue(), ]; } break; // Only first sheet }

As you can see this library uses only 42 megabytes of memory but the loading process has taken 42 seconds. This is twice more in comparison with the previous library. It is slower but in some cases, memory consumption may be more critical than time.

If you want to get a specific sheet, you can filter it by the sheet name:

foreach ($reader->getSheetIterator() as $sheet) { if ($sheet->getName() !== "2018_Central_Park_Squirrel_Cens") { continue; } //.... } //Don't forget to comment the line with break

You also can use the data transfer object from the previous example to store data in the more conveniently:

foreach ($reader->getSheetIterator() as $sheet) { foreach ($sheet->getRowIterator() as $row) { if ($rowIndex === 1) { continue; //Skip heading } $cells = $row->getCells(); $squirrelData = new SquirrelData(); $squirrelData->coordinateX = $cells[0]->getValue(); $squirrelData->coordinateY = $cells[1]->getValue(); $squirrelData->squirrelId = $cells[2]->getValue(); $squirrelData->hectare = $cells[3]->getValue(); $squirrelData->shift = $cells[4]->getValue(); $squirrelData->date = $cells[5]->getValue(); $squirrelData->squirrelNumber = $cells[6]->getValue(); $squirrelData->age = $cells[7]->getValue(); $data[] = $squirrelData; } breeak; }

Wrapping Up

In this article I have explained how to read Excel file to array PHP using PHPSpreadsheet and Openspout libraries. As you can see the first library is faster but requires a lot of memory. The second library can read large files on systems with limited resources. Which library do you like? Maybe it is not listed in this article? Write about it using the comments form below!

Leave a Comment

Exit mobile version