When you save an Excel workbook with its default extension, .xlsx, the result isn't really a single file; it is a ZIP container of files.

I was poking around the internet looking for tips on how to import Excel data into LabVIEW without using ActiveX. In one blog, the writer said to just open the .xlsx file and convert the data directly: replace the .xlsx extension with .zip, or add .zip after .xlsx. Intrigued, I created a small table, followed those instructions, and unzipped the .xlsx file.

Top Level:
[_rels]
[docProps]
[xl]
[Content_Types].xml // Directory of files

--- To see what one of these files looks like, drag the .xml file into a blank browser page.

In folder [xl], we find:
[_rels]
[printerSettings]
[theme]
[worksheets]
calcChain.xml
sharedStrings.xml // all the text from all sheets is stored here; think of it as an array of strings
styles.xml // cell styles and formats
workbook.xml // maps sheet1, sheet2, ... to the sheet names, e.g. if you renamed Sheet1 to something else

In folder [worksheets]
[_rels]
sheet1.xml // cell values or pointers here
sheet2.xml
sheet3.xml
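
You don't actually have to rename the file to look inside it. Here is a minimal Python sketch that lists the same layout using only the standard zipfile module. The function names list_xlsx_parts and top_level_entries are my own, and "Book1.xlsx" in the usage note is a hypothetical file name.

```python
import zipfile


def list_xlsx_parts(source):
    """Return the names of all files stored inside an .xlsx container.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        return archive.namelist()


def top_level_entries(names):
    """Collapse full part names to the top-level folders and files."""
    return sorted({name.split("/", 1)[0] for name in names})
```

Calling top_level_entries(list_xlsx_parts("Book1.xlsx")) on a workbook like mine should give the top-level listing shown earlier: _rels, docProps, xl and [Content_Types].xml.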

-----------
Almost done with my decoding of Excel data.

For example, here is a code snippet from sheet1.xml:
<c r="E3">
<v>4</v>
</c>

Here cell E3 holds the numeric value 4.


But if you see this:

<c r="E3" t="s">
<v>4</v>
</c>

Then cell E3 contains text from the file 'sharedStrings.xml', and <v>4</v> is not the value itself but a zero-based index into the string array in 'sharedStrings.xml'.
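
Putting the two cases together, here is a rough Python sketch of the decoding, using only the standard library. The function names read_shared_strings and read_cells are my own, the default sheet path is an assumption based on the listing above, and only plain values and t="s" cells are handled; other cell types (booleans, formula strings, inline strings, dates) are returned as raw text or skipped.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by sheet1.xml and sharedStrings.xml.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def read_shared_strings(archive):
    """Return the shared-string table as a list (empty if the file is absent)."""
    if "xl/sharedStrings.xml" not in archive.namelist():
        return []
    root = ET.fromstring(archive.read("xl/sharedStrings.xml"))
    # Each <si> is one string; formatted text is split over several <t>
    # elements, so join them. (This simple version joins every <t> it finds.)
    return ["".join(t.text or "" for t in si.iterfind(".//m:t", NS))
            for si in root.iterfind("m:si", NS)]


def read_cells(source, sheet_part="xl/worksheets/sheet1.xml"):
    """Map cell references such as 'E3' to their values for one sheet."""
    cells = {}
    with zipfile.ZipFile(source) as archive:
        strings = read_shared_strings(archive)
        root = ET.fromstring(archive.read(sheet_part))
        for c in root.iterfind(".//m:c", NS):
            v = c.find("m:v", NS)
            if v is None or v.text is None:
                continue  # no <v> element: nothing to decode in this sketch
            if c.get("t") == "s":
                # <v> is a zero-based index into the shared-string table.
                cells[c.get("r")] = strings[int(v.text)]
            else:
                # Keep the stored value as text; convert to a number as needed.
                cells[c.get("r")] = v.text
    return cells
```

Values come back as strings on purpose, so the caller decides whether "4" should become an integer, a float or stay text.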

If it works for Excel, it should work with docx, right?

The answer is yes, it does. A .docx file is also a container, and any graphics or images in the document are stored inside it.
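
As a quick check, here is a small Python sketch that lists the pictures stored in a .docx. It assumes the images sit under word/media/ inside the container, which is where Word normally puts them; the function name list_docx_media is my own.

```python
import zipfile


def list_docx_media(source):
    """Return the names of embedded media files in a .docx container.

    Assumes media is stored under word/media/ (the usual location).
    """
    with zipfile.ZipFile(source) as archive:
        return [name for name in archive.namelist()
                if name.startswith("word/media/") and not name.endswith("/")]
```

To pull one of the images out, pass a returned name to zipfile.ZipFile.read() and write the bytes to disk.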