ZIM File Example
Example of a small zim file
0000000: 5a49 4d04 0500 0000 3803 66de 7f7a d95d ZIM.....8.f..z.]
0000010: d81c 7f3d ee01 7596 0200 0000 0100 0000 ...=..u.........
0000020: 6600 0000 0000 0000 7600 0000 0000 0000 f.......v.......
0000030: aa00 0000 0000 0000 5000 0000 0000 0000 ........P.......
0000040: ffff ffff ffff ffff 0301 0000 0000 0000 ................
0000050: 7465 7874 2f68 746d 6c00 7465 7874 2f70 text/html.text/p
0000060: 6c61 696e 0000 7e00 0000 0000 0000 9400 lain..~.........
0000070: 0000 0000 0000 0000 0000 0100 0000 0000 ................
0000080: 0041 0000 0000 0000 0000 0000 0000 4175 .A............Au
0000090: 746f 0000 0100 0042 0000 0000 0000 0000 to.....B........
00000a0: 0100 0000 4175 746f 0000 b200 0000 0000 ....Auto........
00000b0: 0000 04fd 377a 585a 0000 0169 22de 3602 ....7zXZ...i".6.
00000c0: 0021 0110 0000 00a8 708e 86e0 001c 0018 .!......p.......
00000d0: 5e00 0600 34fb de91 72a3 8034 fc31 871f ^...4...r..4.1..
00000e0: e6aa 6104 7025 b184 dc00 00e1 66f9 e700 ..a.p%......f...
00000f0: 0130 1d01 efc6 9c90 4299 0d01 0000 0000 .0......B.......
0000100: 0159 5a78 bdf4 58bd 108c 16c6 9ef5 d42c .YZx..X........,
0000110: a3f4 e3 ...
Header
0000000: 5a49 4d04 0500 0000 0998 0d07 1f75 e653 ZIM..........u.S
This is the magic number 72173914, which is 044D495A in hex. Since all data is little endian, it is stored as the bytes 5a49 4d04. The first three bytes read "ZIM" in ASCII.
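The byte order can be checked with a few lines of Python. This is only a sketch; the variable names are illustrative and not part of the format.

```python
# Check the little-endian encoding of the ZIM magic number.
magic = 72173914

# Hexadecimal form of the magic number.
magic_hex = hex(magic)

# On disk the least significant byte comes first.
on_disk = magic.to_bytes(4, "little")

# Reading the four bytes from the dump back as a little-endian
# integer restores the original value.
decoded = int.from_bytes(bytes.fromhex("5a494d04"), "little")

print(magic_hex, on_disk.hex(" "), decoded)
```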
0000000: 5a49 4d04 0500 0000 0998 0d07 1f75 e653 ZIM..........u.S
The major version number of the zim format is 5 (bytes 0500 at offset 0004)...
0000000: 5a49 4d04 0500 0000 0998 0d07 1f75 e653 ZIM..........u.S
...and the minor version number is 0 (bytes 0000 at offset 0006).
0000000: 5a49 4d04 0500 0000 0998 0d07 1f75 e653 ZIM..........u.S
0000010: 7106 3faf 3c40 cba1 0200 0000 0100 0000 q.?.<@..........
The uuid, which occupies the 16 bytes from offset 0008 to 0017.
0000010: 7106 3faf 3c40 cba1 0200 0000 0100 0000 q.?.<@..........
The file has 2 articles (bytes 0200 0000 at offset 0018).
0000010: 7106 3faf 3c40 cba1 0200 0000 0100 0000 q.?.<@..........
And one cluster (bytes 0100 0000 at offset 001c).
0000020: 6600 0000 0000 0000 7600 0000 0000 0000 f.......v.......
You will find the url pointer list at offset 0000 0000 0000 0066.
0000020: 6600 0000 0000 0000 7600 0000 0000 0000 f.......v.......
The title pointer list follows at offset 0000 0000 0000 0076.
0000030: aa00 0000 0000 0000 5000 0000 0000 0000 ........P.......
The cluster pointer list can be found at offset 0000 0000 0000 00aa.
0000030: aa00 0000 0000 0000 5000 0000 0000 0000 ........P.......
A list of used mime types is at offset 0000 0000 0000 0050.
0000040: ffff ffff ffff ffff 0301 0000 0000 0000 ................
No main page is defined here (the field holds ffff ffff)...
0000040: ffff ffff ffff ffff 0301 0000 0000 0000 ................
...nor a layout page (the second ffff ffff).
0000040: ffff ffff ffff ffff 0301 0000 0000 0000 ................
An MD5 checksum of the file contents up to that point is stored at offset 0000 0000 0000 0103. It occupies the last 16 bytes of the file (0103 to 0112).
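The header fields described above can be decoded in one step. The following is a minimal sketch using Python's struct module. The name ZimHeader and its field names are made up for this example; the field order and sizes are the ones walked through in this section, and the bytes are the five header rows shown above.

```python
import struct
from collections import namedtuple

# The five header rows shown above (80 bytes in total).
HEADER = bytes.fromhex(
    "5a49 4d04 0500 0000 0998 0d07 1f75 e653"
    "7106 3faf 3c40 cba1 0200 0000 0100 0000"
    "6600 0000 0000 0000 7600 0000 0000 0000"
    "aa00 0000 0000 0000 5000 0000 0000 0000"
    "ffff ffff ffff ffff 0301 0000 0000 0000"
)

# Hypothetical container for the decoded fields (names are illustrative).
ZimHeader = namedtuple(
    "ZimHeader",
    "magic major minor uuid article_count cluster_count "
    "url_ptr_pos title_ptr_pos cluster_ptr_pos mime_list_pos "
    "main_page layout_page checksum_pos",
)

# "<" selects little endian without padding:
# I = 4 bytes, H = 2 bytes, 16s = 16 raw bytes, Q = 8 bytes.
HEADER_FORMAT = "<IHH16sIIQQQQIIQ"

# Value that marks "no main page" / "no layout page".
NO_PAGE = 0xFFFFFFFF


def parse_header(data):
    """Decode the fixed-size header at the start of the file."""
    return ZimHeader._make(struct.unpack_from(HEADER_FORMAT, data, 0))


header = parse_header(HEADER)
print(header)
```

Each offset printed here matches the value read by hand from the dump, for example url_ptr_pos is 0x66 and checksum_pos is 0x103.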
Mime type list
0000050: 7465 7874 2f68 746d 6c00 7465 7874 2f70 text/html.text/p
0000060: 6c61 696e 0000 7e00 0000 0000 0000 9400 lain..~.........
As seen in the header, the mime type list starts at 0000 0000 0000 0050.
0000050: 7465 7874 2f68 746d 6c00 7465 7874 2f70 text/html.text/p
0000060: 6c61 696e 0000 7e00 0000 0000 0000 9400 lain..~.........
Mime types are zero terminated.
0000050: 7465 7874 2f68 746d 6c00 7465 7874 2f70 text/html.text/p
0000060: 6c61 696e 0000 7e00 0000 0000 0000 9400 lain..~.........
After the last mime type, a single zero byte follows (here at offset 0065). It acts as an empty string that ends the list.
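Reading the list is a matter of collecting zero terminated strings until an empty one appears. A minimal sketch, assuming the two dump rows above; the function name read_mime_types is made up for this example, and the data is padded with zero bytes so that indices equal file offsets.

```python
# Rows 0x50 and 0x60 of the dump, padded so that indices equal file offsets.
DATA = bytes(0x50) + bytes.fromhex(
    "7465 7874 2f68 746d 6c00 7465 7874 2f70"
    "6c61 696e 0000 7e00 0000 0000 0000 9400"
)


def read_mime_types(data, pos):
    """Return the mime types and the offset just past the list terminator."""
    mime_types = []
    while True:
        end = data.index(b"\x00", pos)  # position of the next zero byte
        if end == pos:
            # An empty string marks the end of the list.
            return mime_types, end + 1
        mime_types.append(data[pos:end].decode("utf-8"))
        pos = end + 1


mime_types, next_pos = read_mime_types(DATA, 0x50)
print(mime_types, hex(next_pos))
```

The returned offset is 0x66, which is exactly where the header says the url pointer list begins.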
Url pointer list
0000060: 6c61 696e 0000 7e00 0000 0000 0000 9400 lain..~.........
0000070: 0000 0000 0000 0000 0000 0100 0000 0000 ................
The url pointer list starting at 0000 0000 0000 0066 has 2 pointers to the 2 directory entries of the 2 articles. The list is ordered by the url of the article. The first directory entry can be found at 0000 0000 0000 007e and the second at 0000 0000 0000 0094.
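The two pointers can be read as consecutive 8-byte little-endian integers. A short sketch under that assumption; read_url_pointers is a made-up helper name, and the data is padded so that indices equal file offsets.

```python
import struct

# Rows 0x60 and 0x70 of the dump, padded so that indices equal file offsets.
DATA = bytes(0x60) + bytes.fromhex(
    "6c61 696e 0000 7e00 0000 0000 0000 9400"
    "0000 0000 0000 0000 0000 0100 0000 0000"
)

URL_PTR_POS = 0x66   # from the header
ARTICLE_COUNT = 2    # from the header


def read_url_pointers(data, pos, count):
    """Read `count` 8-byte offsets to directory entries, in url order."""
    return list(struct.unpack_from(f"<{count}Q", data, pos))


url_pointers = read_url_pointers(DATA, URL_PTR_POS, ARTICLE_COUNT)
print([hex(p) for p in url_pointers])
```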
Title pointer list
0000070: 0000 0000 0000 0000 0000 0100 0000 0000 ................
Here is the title pointer list. It starts at offset 0076 and holds one 4-byte article number per article.
0000070: 0000 0000 0000 0000 0000 0100 0000 0000 ................
When ordering by title, the article number 0 is the first. This actually refers to the 0th article in the url pointer list. To find the directory entry of the article, you have to look up the offset in the url pointer list.
0000070: 0000 0000 0000 0000 0000 0100 0000 0000 ................
When ordering by title, the article number 1 is the second. The offset to its directory entry is stored in the url pointer list at url-pointer-list + 1 * sizeof(pointer) = 0066 + 1 * 8 = 006E, and the value found there is 0094.
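The two-step lookup (title rank to article number, article number to directory entry offset) can be sketched as follows. The helper name dirent_offset_for_title_rank is hypothetical; the pointer sizes are the ones used in the arithmetic above (4 bytes per title pointer, 8 bytes per url pointer).

```python
import struct

# Rows 0x60 and 0x70 of the dump, padded so that indices equal file offsets.
DATA = bytes(0x60) + bytes.fromhex(
    "6c61 696e 0000 7e00 0000 0000 0000 9400"
    "0000 0000 0000 0000 0000 0100 0000 0000"
)

URL_PTR_POS = 0x66    # from the header
TITLE_PTR_POS = 0x76  # from the header


def dirent_offset_for_title_rank(data, rank):
    """Resolve the rank-th article in title order to its directory entry."""
    # Step 1: the title pointer list holds 4-byte article numbers.
    (article_number,) = struct.unpack_from("<I", data, TITLE_PTR_POS + 4 * rank)
    # Step 2: the article number indexes the url pointer list (8-byte offsets).
    slot = URL_PTR_POS + 8 * article_number
    (dirent_offset,) = struct.unpack_from("<Q", data, slot)
    return article_number, slot, dirent_offset


for rank in range(2):
    number, slot, offset = dirent_offset_for_title_rank(DATA, rank)
    print(rank, number, hex(slot), hex(offset))
```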
Directory entry
0000070: 0000 0000 0000 0000 0000 0100 0000 0000 ................
0000080: 0041 0000 0000 0000 0000 0000 0000 4175 .A............Au
0000090: 746f 0000 0100 0042 0000 0000 0000 0000 to.....B........
This is the first directory entry at 007e.
0000070: 0000 0000 0000 0000 0000 0100 0000 0000 ................
The article has the 0th mime type from the list, which is "text/html".
0000080: 0041 0000 0000 0000 0000 0000 0000 4175 .A............Au
The parameter length is 0, so no extra bytes are set.
0000080: 0041 0000 0000 0000 0000 0000 0000 4175 .A............Au
The namespace of the article is 'A'.
0000080: 0041 0000 0000 0000 0000 0000 0000 4175 .A............Au
The version is 0000 0000.
0000080: 0041 0000 0000 0000 0000 0000 0000 4175 .A............Au
The data can be found in cluster 0.
0000080: 0041 0000 0000 0000 0000 0000 0000 4175 .A............Au
And it is the 0th blob in the cluster.
0000080: 0041 0000 0000 0000 0000 0000 0000 4175 .A............Au
0000090: 746f 0000 0100 0042 0000 0000 0000 0000 to.....B........
The url of the article is "Auto" and is zero terminated.
0000090: 746f 0000 0100 0042 0000 0000 0000 0000 to.....B........
The title is empty and hence the title is the same as the url: "Auto".
0000090: 746f 0000 0100 0042 0000 0000 0000 0000 to.....B........
00000a0: 0100 0000 4175 746f 0000 b200 0000 0000 ....Auto........
The next directory entry can be found at offset 0000 0000 0000 0094 and is the article "Auto" of namespace 'B'.
0000090: 746f 0000 0100 0042 0000 0000 0000 0000 to.....B........
The mimetype is entry number 1, that is, the 2nd entry, since counting starts at 0. The mimetype is therefore "text/plain". The rest of the directory entry is similar to the first one, except that the data is blob 1 of cluster 0.
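Both directory entries can be decoded with the same routine. The sketch below follows the field order described above (mime type, parameter length, namespace, version, cluster number, blob number, url, title) and uses the bytes from the full dump at the top of the page. DirEntry, read_cstring and parse_article_entry are made-up names. It only handles article entries like the two in this file; redirect entries, which the format lays out differently, are not covered.

```python
import struct
from collections import namedtuple

# Rows 0x70 to 0xa0 of the full dump, padded so that indices equal file offsets.
DATA = bytes(0x70) + bytes.fromhex(
    "0000 0000 0000 0000 0000 0100 0000 0000"
    "0041 0000 0000 0000 0000 0000 0000 4175"
    "746f 0000 0100 0042 0000 0000 0000 0000"
    "0100 0000 4175 746f 0000 b200 0000 0000"
)

# Hypothetical container; field names are illustrative.
DirEntry = namedtuple(
    "DirEntry", "mimetype namespace version cluster blob url title next_pos"
)


def read_cstring(data, pos):
    """Read a zero terminated string; return it and the offset after the zero."""
    end = data.index(b"\x00", pos)
    return data[pos:end].decode("utf-8"), end + 1


def parse_article_entry(data, pos):
    """Decode one article directory entry starting at `pos`."""
    # Fixed part, 16 bytes: uint16, uint8, char, then three uint32 values.
    mimetype, param_len, namespace, version, cluster, blob = struct.unpack_from(
        "<HBcIII", data, pos
    )
    url, pos = read_cstring(data, pos + 16)
    title, pos = read_cstring(data, pos)
    # An empty title means the title equals the url.
    return DirEntry(
        mimetype, namespace.decode("ascii"), version, cluster, blob,
        url, title or url, pos + param_len,
    )


first = parse_article_entry(DATA, 0x7E)
second = parse_article_entry(DATA, 0x94)
print(first)
print(second)
```

The next_pos of the first entry is 0x94, where the second entry starts, and the next_pos of the second entry is 0xaa, where the cluster pointer list starts.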
Cluster pointer list
00000a0: 0100 0000 4175 746f 0000 b200 0000 0000 ....Auto........
00000b0: 0000 04fd 377a 585a 0000 0169 22de 3602 ....7zXZ...i".6.
The cluster pointer list is just one offset in this case. The cluster can be found at offset 0000 0000 0000 00b2.
Cluster
00000b0: 0000 04fd 377a 585a 0000 0169 22de 3602 ....7zXZ...i".6.
00000c0: 0021 0110 0000 00a8 708e 86e0 001c 0018 .!......p.......
00000d0: 5e00 0600 34fb de91 72a3 8034 fc31 871f ^...4...r..4.1..
00000e0: e6aa 6104 7025 b184 dc00 00e1 66f9 e700 ..a.p%......f...
00000f0: 0130 1d01 efc6 9c90 4299 0d01 0000 0000 .0......B.......
0000100: 0159 5a78 bdf4 58bd 108c 16c6 9ef5 d42c .YZx..X........,
0000110: a3f4 e3 ...
This is the cluster.
00000b0: 0000 04fd 377a 585a 0000 0169 22de 3602 ....7zXZ...i".6.
The first byte specifies the compression algorithm. Its value is 4, which means lzma (or, more precisely, xz) here. The compressed data follows directly and starts with the xz magic bytes fd37 7a58 5a00.
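The following sketch locates the cluster through the cluster pointer list and checks the compression byte and the xz magic. It also includes a hypothetical helper, read_blobs, that splits a cluster into blobs. That helper rests on an assumption not shown on this page: according to the ZIM format description, the decompressed cluster data starts with a list of 4-byte offsets (the first offset divided by 4 gives the number of offsets), followed by the blob data. The helper is demonstrated on a made-up payload, not on the dump itself.

```python
import lzma
import struct

# Rows 0xa0 and 0xb0 of the dump, padded so that indices equal file offsets.
DATA = bytes(0xA0) + bytes.fromhex(
    "0100 0000 4175 746f 0000 b200 0000 0000"
    "0000 04fd 377a 585a 0000 0169 22de 3602"
)

CLUSTER_PTR_POS = 0xAA                     # from the header
XZ_MAGIC = bytes.fromhex("fd377a585a00")   # start of every xz stream

# The cluster pointer list holds one 8-byte offset per cluster.
(cluster_pos,) = struct.unpack_from("<Q", DATA, CLUSTER_PTR_POS)
compression = DATA[cluster_pos]
has_xz_magic = DATA[cluster_pos + 1:cluster_pos + 7] == XZ_MAGIC
print(hex(cluster_pos), compression, has_xz_magic)


def read_blobs(cluster):
    """Split a cluster into blobs (assumed layout, see the text above)."""
    kind = cluster[0]
    if kind == 4:
        # LZMADecompressor stops at the end of the xz stream and
        # tolerates trailing bytes that belong to the rest of the file.
        payload = lzma.LZMADecompressor().decompress(cluster[1:])
    elif kind in (0, 1):
        payload = cluster[1:]  # stored without compression
    else:
        raise ValueError(f"compression type {kind} not handled in this sketch")
    (first_offset,) = struct.unpack_from("<I", payload, 0)
    offsets = struct.unpack_from(f"<{first_offset // 4}I", payload, 0)
    # Consecutive offsets delimit one blob each.
    return [payload[a:b] for a, b in zip(offsets, offsets[1:])]


# Made-up example payload: three offsets (12 bytes), then two blobs.
example_payload = struct.pack("<3I", 12, 17, 20) + b"hello" + b"zim"
example_cluster = b"\x04" + lzma.compress(example_payload)
print(read_blobs(example_cluster))
```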