If you are building a software for HR or finance domain, there is usually requirement for generating excel reports which are usually across management levels. Apart from reports, you can expect input data for application coming in form of excel sheets and application is expected to support it. These are many open source APIs to handle such scenarios.
Apache POI is one of them and is well trusted over time. In short, you can read and write MS Excel files using Java. In addition, you can read and write MS Word and MS PowerPoint files using Java.
In this post, I am discussing some common activities required to do in real life application.
Apache POI runtime dependencies
If you are working on a maven project, you can include the POI dependency in pom.xml file using this:
If you are not using maven, then you can download maven jar files from POI download page. Include following jar files minimum to run the sample code:
Some useful POI classes
Apache POI main classes usually start with either HSSF. XSSF or SXSSF .
- HSSF is the POI Project’s pure Java implementation of the Excel ’97(-2007) file format. e.g. HSSFWorkbook . HSSFSheet .
- XSSF is the POI Project’s pure Java implementation of the Excel 2007 OOXML (.xlsx) file format. e.g. XSSFWorkbook . XSSFSheet .
- SXSSF (since 3.8-beta3) is an API-compatible streaming extension of XSSF to be used when very large spreadsheets have to be produced, and heap space is limited. e.g. SXSSFWorkbook . SXSSFSheet . SXSSF achieves its low memory footprint by limiting access to the rows that are within a sliding window. while XSSF gives access to all rows in the document.
Apart from above classes, Row and Cell are used to interact with a particular row and a particular cell in excel sheet.
Another useful class FormulaEvaluator is used to evaluate the formula cells in excel sheet.
A wide range of classes like CellStyle . BuiltinFormats . ComparisonOperator . ConditionalFormattingRule . FontFormatting . IndexedColors . PatternFormatting . SheetConditionalFormatting etc. are used when you have to add formatting in a sheet, mostly based on some rules.
We will see the usage of above classes in coming examples.
Writing an excel file
am taking this example first so that we can reuse the excel sheet created by this code to read back in next example.
Writing a file using POI is very simple and involve following steps:
- Create a workbook
- Create a sheet in workbook
- Create a row in sheet
- Add cells in sheet
- Repeat step 3 and 4 to write more data
It seems very simple, right? Lets have a look at the code doing these steps:
Reading an excel file
Reading an excel file is also very simple if we divide this in steps.
- Create workbook instance from excel sheet
- Get to the desired sheet
- Increment row number
- iterate over all cells in a row
- repeat step 3 and 4 until all data is read
Lets see all above steps in code. I am writing the code to read the excel file created in above example.
Using formulas in excel sheet
When working on complex excel sheets, we encounter many cells which have formula to calculate their values. These are formula cells. Apache POI has excellent support for adding formula cells and evaluating already present formula cells also.
Les see one example of how to set formula cells in excel?
In this code, there are four cells in a row and fourth one in multiplication of all previous 3 rows. So the formula will be. A2*B2*C2 (in second row)
Similarly, I you want to read a file which have formula cells in it, use following logic to evaluate the formula cells.
Formatting the cells
So for we have seen the examples of reading/ writing and excel file using apache POI. But, when we are creating a report in excel file and it becomes utmost important to add formatting on cells which fit into any per-determined criteria. This formatting can be a different coloring based on certain value range, based on expiry date limit etc.
In below examples, I am taking couple of such formatting examples for various purposes.
1) Cell value is in between a certain range
This piece of code will color any cell in range whose value is between a configured range. [e.g. between 50 and 70]