Home » PHP » How to Read XLSX File Into Array in PHP

How to Read XLSX File Into Array in PHP

Sometimes you may need to import some data from an XLSX file into your application. The first step is to load the data from the file into an array. There are a few libraries that can help with this task.

Each library has its advantages and disadvantages and may be the best solution in a specific case. In this article, I will show how to read XLSX file into array in PHP using PHPSpreadsheet and OpenSpout.


Table of Contents

How to Import an XLSX File Into an Array

Let’s create a simple composer project to perform all the examples from this article. You can run composer init, or you can just create composer.json with the following project configuration:

composer.json{     "name": "haait/xlsx-project",     "type": "project",     "description": "Test Project.",     "require": {         "symfony/stopwatch": "^6.3",         "symfony/var-dumper": "^6.3"     },     "require-dev": {},     "autoload": {         "psr-4": {             "Project\\": "src/"         }     },     "minimum-stability": "dev",     "prefer-stable": true }

And then run the following command:

composer install

I will use the XLSX file with data about squirrels from cityofnetwork.us for all examples. The original file is too short, so I repeated the data a few times to get a large file of almost 30k lines. It will be enough to test how the libraries handle big files. I assume that the file is placed in the current directory and is called squirrels.xlsx. I will explain not only how to load data from the XLSX file to array but also show some bench-marking info.  Here are the example contents of the file:

I prefer to use data transfer objects instead of associative arrays to store data. It is more convenient to use. Maybe arrays are faster, but objects are more predictable, and they allow the IDE to suggest to you what fields are available. Thus, the following DTO will be used for storing data about each squirrel:

src/SquirrelData.php<?php namespace Project; class SquirrelData {     public ?float $coordinateX;     public ?float $coordinateY;     public ?string $squirrelId;     public ?string $hectare;     public ?string $shift;     public ?string $date;     public ?int $squirrelNumber;     public ?string $age; }

Then, let’s create index.php which will be used to read the data from the XLSX file. I will use the symfony/stopwatch package to get loading time and memory consumption of the script and symfony/var-dumper to output variable values. For example:

index.php<?php require "vendor/autoload.php"; use Symfony\Component\Stopwatch\Stopwatch; $watch = new Stopwatch(); $watch->start("EventName"); //code $info = $watch->stop("EventName"); dump((string) $info);

You can read in detail how to calculate execution time in PHP in this article.

1. PHPSpreadsheet

Very often, I see recommendations to use PHPExcel, which was renamed to Spreadsheet some time ago. It is a great library that can read many spreadsheet file formats and has a lot of powerful capabilities for editing and styling documents. You can install the library into your project using this command:

composer require phpoffice/phpspreadsheet

First of all, you need to load a document file. Create the Xlsx reader and load the desired file using the load() method. Also, you can configure the reader to read-only mode, which skips formatting, using the setReadOnly() method.

index.php$reader = new \PhpOffice\PhpSpreadsheet\Reader\Xlsx(); $reader->setReadDataOnly(true); $spreadsheet = $reader->load("squirrels.xlsx");

Each spreadsheet document can contain multiple sheets. In most cases, there is only one sheet, and you can just work with the first:

index.php$sheet = $spreadsheet->getSheet($spreadsheet->getFirstSheetIndex());

But if you want to get a specific sheet, you can access it by name:

index.php$sheet = $spreadsheet->getSheetByName('2018_Central_Park_Squirrel_Cens');

Now you can use this code to load all the data from the sheet into the array using the toArray() method:

index.php$data = $sheet->toArray(); dump($data[0]);

The code above returns all the data on the sheet. But it is not the desired behavior in many cases. For example, you may want to take only specific columns or skip the first row that contains a header. You can convert rows into an array manually using the following code:

index.php$data = []; $sheet = $spreadsheet->getSheet($spreadsheet->getFirstSheetIndex()); foreach ($sheet->getRowIterator() as $row) {     if ($row->getRowIndex() === 1) {         continue; //Skip heading     }     $cells = iterator_to_array($row->getCellIterator("A", "H"));     $data[] = [         "x_coordinate" => $cells["A"]->getValue(),         "y_coordinate" => $cells["B"]->getValue(),         "squirrel_id" => $cells["C"]->getValue(),         "hectare" => $cells["D"]->getValue(),         "shift" => $cells["E"]->getValue(),         "date" => $cells["F"]->getValue(),         "squirrel_number" => $cells["G"]->getValue(),         "age" => $cells["H"]->getValue(),     ]; } dump($data[0]);

Here the first row is skipped, and I took only the first eight columns. In this case, the memory consumption was 438 MB, and the loading process took 21 seconds. It is a little faster than the previous example but still needs a lot of RAM.

As I said before, I prefer to use an array of objects instead of multi-dimensional arrays. Create a new SquirrelData object for each row and fill it:

index.php$sheet = $spreadsheet->getSheet($spreadsheet->getFirstSheetIndex()); foreach ($sheet->getRowIterator() as $row) {     if ($row->getRowIndex() === 1) {         continue; //Skip heading     }     $cells = iterator_to_array($row->getCellIterator("A", "H"));     $squirrelData = new \Project\SquirrelData();     $squirrelData->coordinateX = $cells["A"]->getValue();     $squirrelData->coordinateY = $cells["B"]->getValue();     $squirrelData->squirrelId = $cells["C"]->getValue();     $squirrelData->hectare = $cells["D"]->getValue();     $squirrelData->shift = $cells["E"]->getValue();     $squirrelData->date = $cells["F"]->getValue();     $squirrelData->squirrelNumber = $cells["G"]->getValue();     $squirrelData->age = $cells["H"]->getValue();     $data[] = $squirrelData; } dump($data[0]);

Now you have an array of objects with information about squirrels and can use them everywhere in your application. The memory consumption and loading time are almost the same.

2. OpenSpout

As you can see, the PHPSpreadsheet is very heavy and requires a lot of memory for processing large files. If you want only to read XLSX into array without modifying the file, you may want to use a more lightweight library. You can choose openspout/openspout. It is an open-source fork of the box/spout library which is discontinued now. Use the following command to install the package:

composer require openspout/openspout

Now you can load the XLSX file:

index.php$reader = new \OpenSpout\Reader\XLSX\Reader(); $reader->open("./squirrels.xlsx");

This library does not support accessing a specific sheet directly. You can iterate by all sheets, and then by rows and cells. This code will load data only from the first sheet:

index.phpforeach ($reader->getSheetIterator() as $sheet) {     foreach ($sheet->getRowIterator() as $rowIndex => $row) {         if ($rowIndex === 1) {             continue; //Skip heading         }         $cells = $row->getCells();         $data[] = [             "x_coordinate" => $cells[0]->getValue(),             "y_coordinate" => $cells[1]->getValue(),             "squirrel_id" => $cells[2]->getValue(),             "hectare" => $cells[3]->getValue(),             "shift" => $cells[4]->getValue(),             "date" => $cells[5]->getValue(),             "squirrel_number" => $cells[6]->getValue(),             "age" => $cells[7]->getValue(),         ];     }     break; // Only the first sheet } dump($data[0]);

As you can see, this library uses only 42 MB of memory and is a little faster. If you want to get a specific sheet, you can filter it by the sheet name:

index.phpforeach ($reader->getSheetIterator() as $sheet) {     if ($sheet->getName() !== "2018_Central_Park_Squirrel_Cens") {         continue;     }     //.... } //Don't forget to comment the line with break

You can also use the data transfer object from the previous example to store the data:

index.phpforeach ($reader->getSheetIterator() as $sheet) {     foreach ($sheet->getRowIterator() as $rowIndex => $row) {         if ($rowIndex === 1) {             continue; //Skip heading         }         $cells = $row->getCells();         $squirrelData = new \Project\SquirrelData();         $squirrelData->coordinateX = $cells[0]->getValue();         $squirrelData->coordinateY = $cells[1]->getValue();         $squirrelData->squirrelId = $cells[2]->getValue();         $squirrelData->hectare = $cells[3]->getValue();         $squirrelData->shift = $cells[4]->getValue();         $squirrelData->date = $cells[5]->getValue();         $squirrelData->squirrelNumber = $cells[6]->getValue();         $squirrelData->age = $cells[7]->getValue();         $data[] = $squirrelData;     }     break; } dump($data[0]);

Wrapping Up

In this article, I have explained how to read Excel file to array in PHP using the PHPSpreadsheet and OpenSpout libraries. As you can see, the first library has more features but requires a lot of memory. The second library can read large files on systems with limited resources. Which library do you like? Maybe it is not listed in this article? Write about it using the comments form below!

Leave a Comment

Exit mobile version