ZIM File Example

From openZIM
Latest revision as of 19:30, 11 January 2013

Example of a small zim file

This is an example of a small ZIM file. The format itself is documented at ZIM File Format.

0000000: 5a49 4d04 0500 0000 19fd 9100 732b cfb6  ZIM.........s+..
0000010: 3406 5519 ac2e 03c4 0300 0000 0100 0000  4.U.............
0000020: 6600 0000 0000 0000 7e00 0000 0000 0000  f.......~.......
0000030: ce00 0000 0000 0000 5000 0000 0000 0000  ........P.......
0000040: ffff ffff ffff ffff 2701 0000 0000 0000  ........'.......
0000050: 7465 7874 2f68 746d 6c00 7465 7874 2f70  text/html.text/p
0000060: 6c61 696e 0000 8a00 0000 0000 0000 a000  lain............
0000070: 0000 0000 0000 b800 0000 0000 0000 0000  ................
0000080: 0000 0100 0000 0200 0000 0000 0041 0000  .............A..
0000090: 0000 0000 0000 0000 0000 4175 746f 0000  ..........Auto..
00000a0: ffff 0041 0000 0000 0000 0000 4175 746f  ...A........Auto
00000b0: 6d6f 6269 6c65 0000 0100 0042 0000 0000  mobile.....B....
00000c0: 0000 0000 0100 0000 4175 746f 0000 d600  ........Auto....
00000d0: 0000 0000 0000 04fd 377a 585a 0000 0169  ........7zXZ...i
00000e0: 22de 3602 0021 0110 0000 00a8 708e 86e0  ".6..!......p...
00000f0: 001c 0018 5e00 0600 34fb de91 72a3 8034  ....^...4...r..4
0000100: fc31 871f e6aa 6104 7025 b184 dc00 00e1  .1....a.p%......
0000110: 66f9 e700 0130 1d01 efc6 9c90 4299 0d01  f....0......B...
0000120: 0000 0000 0159 5a6c d75d be78 953c 79d9  .....YZl.].x.<y.
0000130: 5054 034b 5726 c4                        PT.KW&....

Header

0000000: 5a49 4d04 0500 0000 19fd 9100 732b cfb6  ZIM.........s+..

This is the magic number 72173914, which is 0x044D495A in hex. Since all integers in a ZIM file are stored little-endian, it appears on disk as the byte sequence 5a 49 4d 04, which shows up as "ZIM." in the ASCII column.
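A quick way to check the byte order is Python's standard struct module: packing the magic number as an unsigned 32-bit little-endian integer ("<I") should produce exactly the four bytes shown above. This is only an illustrative check, not code from any ZIM library.

```python
import struct

MAGIC = 72173914  # the ZIM magic number given above

# "<I" = unsigned 32-bit integer, little-endian byte order
on_disk = struct.pack("<I", MAGIC)
print(hex(MAGIC))          # 0x44d495a
print(on_disk.hex(" "))    # 5a 49 4d 04
print(on_disk)             # b'ZIM\x04'
```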

0000000: 5a49 4d04 0500 0000 19fd 9100 732b cfb6  ZIM.........s+..

The major version number of the ZIM format is 5...

0000000: 5a49 4d04 0500 0000 19fd 9100 732b cfb6  ZIM.........s+..

...and the minor version number is 0.

0000000: 5a49 4d04 0500 0000 19fd 9100 732b cfb6  ZIM.........s+..
0000010: 3406 5519 ac2e 03c4 0300 0000 0100 0000  4.U.............

The next 16 bytes are the UUID of the file.

0000010: 3406 5519 ac2e 03c4 0300 0000 0100 0000  4.U.............

The file has 3 articles.

0000010: 3406 5519 ac2e 03c4 0300 0000 0100 0000  4.U.............

And one cluster.

0000020: 6600 0000 0000 0000 7e00 0000 0000 0000  f.......~.......

You will find the url pointer list at offset 0000 0000 0000 0066.

0000020: 6600 0000 0000 0000 7e00 0000 0000 0000  f.......~.......

The title pointer list follows at offset 0000 0000 0000 007e.

0000030: ce00 0000 0000 0000 5000 0000 0000 0000  ........P.......

The cluster pointer list can be found at offset 0000 0000 0000 00ce.

0000030: ce00 0000 0000 0000 5000 0000 0000 0000  ........P.......

A list of used mime types is at offset 0000 0000 0000 0050.

0000040: ffff ffff ffff ffff 2701 0000 0000 0000  ........'.......

No main page is defined here...

0000040: ffff ffff ffff ffff 2701 0000 0000 0000  ........'.......

...nor a layout page.

0000040: ffff ffff ffff ffff 2701 0000 0000 0000  ........'.......

The MD5 checksum of the file, covering every byte before it, is stored at offset 0000 0000 0000 0127.
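All header fields walked through above have fixed sizes (4 + 2 + 2 + 16 + 4 + 4 + 4×8 + 4 + 4 + 8 = 80 = 0x50 bytes, which is why the MIME type list starts right at 0x50), so the whole header can be unpacked with one struct format string. This is a sketch: the field names and the helpers parse_header and checksum_ok are hypothetical labels. checksum_ok follows the description above (the MD5 covers the file up to the checksum) and is only exercised on synthetic data, because the full file is not embedded here.

```python
import hashlib
import struct

# First 0x50 bytes of the example file, copied from the hex dump.
HEADER_BYTES = bytes.fromhex("""
5a49 4d04 0500 0000 19fd 9100 732b cfb6
3406 5519 ac2e 03c4 0300 0000 0100 0000
6600 0000 0000 0000 7e00 0000 0000 0000
ce00 0000 0000 0000 5000 0000 0000 0000
ffff ffff ffff ffff 2701 0000 0000 0000
""")

# "<" = little-endian, no padding. Field names are illustrative labels.
HEADER_FORMAT = "<IHH16sIIQQQQIIQ"
FIELDS = ("magic", "major", "minor", "uuid", "article_count", "cluster_count",
          "url_ptr_pos", "title_ptr_pos", "cluster_ptr_pos", "mime_list_pos",
          "main_page", "layout_page", "checksum_pos")
ZIM_MAGIC = 72173914
NO_PAGE = 0xFFFFFFFF  # marks "no main page" / "no layout page"

def parse_header(data):
    """Unpack the fixed-size header at the start of a ZIM file into a dict."""
    header = dict(zip(FIELDS, struct.unpack_from(HEADER_FORMAT, data, 0)))
    if header["magic"] != ZIM_MAGIC:
        raise ValueError("bad magic number: not a ZIM file")
    return header

def checksum_ok(data, checksum_pos):
    """Compare the stored MD5 with the MD5 of all bytes before checksum_pos."""
    stored = data[checksum_pos:checksum_pos + 16]
    return hashlib.md5(data[:checksum_pos]).digest() == stored

h = parse_header(HEADER_BYTES)
print(h["major"], h["minor"], h["article_count"], h["cluster_count"])        # 5 0 3 1
print(hex(h["url_ptr_pos"]), hex(h["title_ptr_pos"]), hex(h["cluster_ptr_pos"]))  # 0x66 0x7e 0xce
print(h["main_page"] == NO_PAGE, hex(h["checksum_pos"]))                    # True 0x127
```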

Mime type list

0000050: 7465 7874 2f68 746d 6c00 7465 7874 2f70  text/html.text/p
0000060: 6c61 696e 0000 8a00 0000 0000 0000 a000  lain............

As stated in the header, the MIME type list starts at 0000 0000 0000 0050.

0000050: 7465 7874 2f68 746d 6c00 7465 7874 2f70  text/html.text/p
0000060: 6c61 696e 0000 8a00 0000 0000 0000 a000  lain............

MIME types are stored as zero-terminated strings: "text/html" and "text/plain".

0000050: 7465 7874 2f68 746d 6c00 7465 7874 2f70  text/html.text/p
0000060: 6c61 696e 0000 8a00 0000 0000 0000 a000  lain............

After the last MIME type, one more zero byte (an empty string) marks the end of the list.
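The MIME type list can be read with a small loop that collects zero-terminated strings until it hits an empty one. The function name read_mime_list is hypothetical, and the sketch assumes plain ASCII type names, as in this file.

```python
def read_mime_list(buf, pos):
    """Collect zero-terminated MIME type strings until an empty string ends the list."""
    mime_types = []
    while True:
        end = buf.index(b"\x00", pos)
        if end == pos:                 # the extra zero byte: end of the list
            return mime_types
        mime_types.append(buf[pos:end].decode("ascii"))
        pos = end + 1

# Bytes 0x50..0x65 of the example file.
MIME_BYTES = bytes.fromhex("7465 7874 2f68 746d 6c00 7465 7874 2f70 6c61 696e 0000")
print(read_mime_list(MIME_BYTES, 0))  # ['text/html', 'text/plain']
```

The list occupies 22 bytes here, so whatever follows it starts at 0x50 + 22 = 0x66, which matches the URL pointer list offset from the header.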

Url pointer list

0000060: 6c61 696e 0000 8a00 0000 0000 0000 a000  lain............
0000070: 0000 0000 0000 b800 0000 0000 0000 0000  ................

The URL pointer list starting at 0000 0000 0000 0066 holds 3 pointers of 8 bytes each, one to the directory entry of each of the 3 articles. The list is ordered by namespace and URL (A/Auto, A/Automobile, B/Auto). The first directory entry is at 0000 0000 0000 008a, the second at 0000 0000 0000 00a0 and the third at 0000 0000 0000 00b8.
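Reading the URL pointer list amounts to unpacking article_count little-endian 64-bit integers starting at the list's offset. In this sketch the first 0x60 bytes of the file are replaced by zero-filled placeholders so that all offsets stay absolute file offsets; url_pointers is an illustrative name.

```python
import struct

# Rows 0x60 and 0x70 of the dump; the first 0x60 bytes are zero-filled
# placeholders so that offsets stay absolute file offsets.
data = bytes(0x60) + bytes.fromhex("""
6c61 696e 0000 8a00 0000 0000 0000 a000
0000 0000 0000 b800 0000 0000 0000 0000
""")

def url_pointers(buf, url_ptr_pos, article_count):
    """Unpack article_count little-endian 64-bit directory entry offsets."""
    return list(struct.unpack_from(f"<{article_count}Q", buf, url_ptr_pos))

print([hex(p) for p in url_pointers(data, 0x66, 3)])  # ['0x8a', '0xa0', '0xb8']
```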

Title pointer list

0000070: 0000 0000 0000 b800 0000 0000 0000 0000  ................
0000080: 0000 0100 0000 0200 0000 0000 0041 0000  .............A..

Here is the title pointer list, starting at 0000 0000 0000 007e. Each entry is a 4-byte article number.

0000070: 0000 0000 0000 b800 0000 0000 0000 0000  ................
0000080: 0000 0100 0000 0200 0000 0000 0041 0000  .............A..

When ordering by title, article number 0 comes first. This number refers to the 0th entry of the URL pointer list, so to find the article's directory entry you look up its offset there. That offset is stored at (url-pointer-list + 0*sizeof(pointer)) = 0066 + 0*8 = 0066.

0000080: 0000 0100 0000 0200 0000 0000 0041 0000  .............A.. 

When ordering by title, article number 1 comes second. The offset to its directory entry is stored at (url-pointer-list + 1*sizeof(pointer)) = 0066 + 1*8 = 006E.

0000080: 0000 0100 0000 0200 0000 0000 0041 0000  .............A.. 

When ordering by title, article number 2 comes third. The offset to its directory entry is stored at (url-pointer-list + 2*sizeof(pointer)) = 0066 + 2*8 = 0076.
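The two-step lookup described above (title pointer list, then URL pointer list, then directory entry) can be sketched as follows. It hard-codes the header values 0x66 and 0x7e and zero-fills the first 0x60 bytes, which the lookup never reads, so that offsets match the dump; dirent_offset_by_title is a hypothetical helper.

```python
import struct

# Bytes 0x60..0x89 of the example file (URL and title pointer lists);
# the first 0x60 bytes are zero-filled placeholders so offsets stay absolute.
data = bytes(0x60) + bytes.fromhex("""
6c61 696e 0000 8a00 0000 0000 0000 a000
0000 0000 0000 b800 0000 0000 0000 0000
0000 0100 0000 0200 0000
""")
URL_PTR_POS, TITLE_PTR_POS = 0x66, 0x7e   # from the header
POINTER_SIZE = 8                           # URL pointers are 64-bit offsets

def dirent_offset_by_title(buf, title_index):
    """Return (article number, directory entry offset) for the n-th title."""
    # step 1: the title pointer list holds 4-byte article numbers
    (article_no,) = struct.unpack_from("<I", buf, TITLE_PTR_POS + 4 * title_index)
    # step 2: the article number indexes the URL pointer list
    (offset,) = struct.unpack_from("<Q", buf, URL_PTR_POS + article_no * POINTER_SIZE)
    return article_no, offset

for t in range(3):
    article_no, offset = dirent_offset_by_title(data, t)
    print(t, article_no, hex(offset))
```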

Directory entry

0000080: 0000 0100 0000 0200 0000 0000 0041 0000  .............A..
0000090: 0000 0000 0000 0000 0000 4175 746f 0000  ..........Auto..

This is the first directory entry at 008a.

0000080: 0000 0100 0000 0200 0000 0000 0041 0000  .............A..

The MIME type index is 0, so the article uses the first type in the list: "text/html".

0000080: 0000 0100 0000 0200 0000 0000 0041 0000  .............A..

No extra bytes are set.

0000080: 0000 0100 0000 0200 0000 0000 0041 0000  .............A..

The namespace of the article is 'A'.

0000080: 0000 0100 0000 0200 0000 0000 0041 0000  .............A..
0000090: 0000 0000 0000 0000 0000 4175 746f 0000  ..........Auto..

The version is 0000 0000.

0000090: 0000 0000 0000 0000 0000 4175 746f 0000  ..........Auto..

The data can be found in cluster 0.

0000090: 0000 0000 0000 0000 0000 4175 746f 0000  ..........Auto.. 

And it is the 0th blob in the cluster.

0000090: 0000 0000 0000 0000 0000 4175 746f 0000  ..........Auto..

The url of the article is "Auto" and is zero terminated.

0000090: 0000 0000 0000 0000 0000 4175 746f 0000  ..........Auto.. 

The title is an empty string (a single zero byte), so the title is the same as the URL: "Auto".
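The fields of a content entry can be unpacked with a fixed 16-byte struct format ("<HBcIII": MIME type index, extra-bytes length, namespace, version, cluster number, blob number) followed by two zero-terminated strings. The function names are hypothetical, and the sketch assumes that any extra parameter bytes would follow the title; their length is 0 in this file, so that assumption is never exercised.

```python
import struct

# The 22 bytes of the first directory entry (file offsets 0x8a..0x9f).
ENTRY = bytes.fromhex("0000 0041 0000 0000 0000 0000 0000 0000 4175 746f 0000")

def read_cstring(buf, pos):
    """Return (string, position after its terminating zero byte)."""
    end = buf.index(b"\x00", pos)
    return buf[pos:end].decode("utf-8"), end + 1

def parse_content_entry(buf, pos):
    """Decode a non-redirect directory entry; return (entry, next position)."""
    mime, extra_len, ns, version, cluster, blob = struct.unpack_from("<HBcIII", buf, pos)
    url, pos = read_cstring(buf, pos + 16)
    title, pos = read_cstring(buf, pos)
    entry = {"mime_index": mime, "namespace": ns.decode("ascii"), "version": version,
             "cluster": cluster, "blob": blob, "url": url,
             "title": title or url}          # empty title: use the URL
    return entry, pos + extra_len            # assumed: extra bytes follow the title

entry, size = parse_content_entry(ENTRY, 0)
print(entry)
print(hex(0x8a + size))   # 0xa0: where the next directory entry starts
```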

00000a0: ffff 0041 0000 0000 0000 0000 4175 746f  ...A........Auto
00000b0: 6d6f 6269 6c65 0000 0100 0042 0000 0000  mobile.....B....

The second directory entry, at offset 0000 0000 0000 00a0, is the redirect "Automobile" in namespace 'A'.

00000a0: ffff 0041 0000 0000 0000 0000 4175 746f  ...A........Auto

The MIME type field is ffff, which marks this entry as a redirect.

00000a0: ffff 0041 0000 0000 0000 0000 4175 746f  ...A........Auto 

No extra bytes are set.

00000a0: ffff 0041 0000 0000 0000 0000 4175 746f  ...A........Auto 

The namespace of the redirect is 'A'.

00000a0: ffff 0041 0000 0000 0000 0000 4175 746f  ...A........Auto 

The version is 0000 0000.

00000a0: ffff 0041 0000 0000 0000 0000 4175 746f  ...A........Auto 

It points to article 0 in the URL pointer list, which is A/Auto.

00000a0: ffff 0041 0000 0000 0000 0000 4175 746f  ...A........Auto
00000b0: 6d6f 6269 6c65 0000 0100 0042 0000 0000  mobile.....B....

The URL of the redirect is "Automobile" and is zero-terminated; it is followed by an empty title.

00000b0: 6d6f 6269 6c65 0000 0100 0042 0000 0000  mobile.....B....
00000c0: 0000 0000 0100 0000 4175 746f 0000 d600  ........Auto....
00000d0: 0000 0000 0000 04fd 377a 585a 0000 0169  ........7zXZ...i

The third directory entry, at offset 0000 0000 0000 00b8, is the article "Auto" in namespace 'B'.

00000b0: 6d6f 6269 6c65 0000 0100 0042 0000 0000  mobile.....B....

The MIME type index is 1, i.e. the second entry, since counting starts at 0: "text/plain". The rest of the entry has the same layout as the first one; the data is again in cluster 0, but this time it is blob 1.
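A redirect entry shares its first 8 bytes (MIME type, extra-bytes length, namespace, version) with a content entry; after that it stores a single 4-byte redirect index instead of cluster and blob numbers, which makes it 4 bytes shorter. The sketch below walks the URL pointer list and decodes all three entries of the example file, branching on the MIME type value ffff. The MIME list is hard-coded from the list above, the first 0x60 bytes are zero-filled placeholders, and all names are hypothetical.

```python
import struct

# Bytes 0x60..0xcd of the example file; the first 0x60 bytes are zero-filled
# placeholders so that offsets below are absolute file offsets.
data = bytes(0x60) + bytes.fromhex("""
6c61 696e 0000 8a00 0000 0000 0000 a000
0000 0000 0000 b800 0000 0000 0000 0000
0000 0100 0000 0200 0000 0000 0041 0000
0000 0000 0000 0000 0000 4175 746f 0000
ffff 0041 0000 0000 0000 0000 4175 746f
6d6f 6269 6c65 0000 0100 0042 0000 0000
0000 0000 0100 0000 4175 746f 0000
""")
MIME_TYPES = ["text/html", "text/plain"]   # from the MIME type list at 0x50
REDIRECT = 0xFFFF
URL_PTR_POS, ARTICLE_COUNT = 0x66, 3        # from the header

def read_cstring(buf, pos):
    """Return (string, position after its terminating zero byte)."""
    end = buf.index(b"\x00", pos)
    return buf[pos:end].decode("utf-8"), end + 1

def parse_entry(buf, pos):
    """Decode one directory entry (content or redirect) starting at pos."""
    mime, _extra_len, ns, version = struct.unpack_from("<HBcI", buf, pos)
    entry = {"namespace": ns.decode("ascii"), "version": version}
    if mime == REDIRECT:
        # redirect: one 4-byte index into the URL pointer list
        (entry["redirect_index"],) = struct.unpack_from("<I", buf, pos + 8)
        pos += 12
    else:
        # content: 4-byte cluster number and 4-byte blob number
        entry["mime"] = MIME_TYPES[mime]
        entry["cluster"], entry["blob"] = struct.unpack_from("<II", buf, pos + 8)
        pos += 16
    entry["url"], pos = read_cstring(buf, pos)
    title, pos = read_cstring(buf, pos)
    entry["title"] = title or entry["url"]  # empty title falls back to the URL
    return entry

pointers = struct.unpack_from(f"<{ARTICLE_COUNT}Q", data, URL_PTR_POS)
for index, offset in enumerate(pointers):
    print(index, hex(offset), parse_entry(data, offset))
```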

Cluster pointer list

00000c0: 0000 0000 0100 0000 4175 746f 0000 d600  ........Auto....
00000d0: 0000 0000 0000 04fd 377a 585a 0000 0169  ........7zXZ...i

The cluster pointer list is just one offset in this case. The cluster can be found at offset 0000 0000 0000 00d6.

Cluster

00000d0: 0000 0000 0000 04fd 377a 585a 0000 0169  ........7zXZ...i
00000e0: 22de 3602 0021 0110 0000 00a8 708e 86e0  ".6..!......p...
00000f0: 001c 0018 5e00 0600 34fb de91 72a3 8034  ....^...4...r..4
0000100: fc31 871f e6aa 6104 7025 b184 dc00 00e1  .1....a.p%......
0000110: 66f9 e700 0130 1d01 efc6 9c90 4299 0d01  f....0......B...
0000120: 0000 0000 0159 5a6c d75d be78 953c 79d9  .....YZl.].x.<y.

This is the cluster. Note that its size is not stored anywhere: you only learn how large the cluster is after decompressing its first bytes.

00000d0: 0000 0000 0000 04fd 377a 585a 0000 0169  ........7zXZ...i

The first byte specifies the compression algorithm. Here it is 4, which means LZMA (more precisely, xz); the xz stream that follows starts with the xz magic bytes fd 37 7a 58 5a 00.
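Since compression type 4 means an xz stream, Python's standard lzma module can decompress the cluster body once the type byte is stripped. This sketch handles only type 4 and raises for anything else; decompress_cluster is a hypothetical name. To stay self-contained, the demo compresses the known uncompressed cluster content itself instead of embedding the bytes from the dump.

```python
import lzma

XZ_COMPRESSION = 4  # compression byte value used by this file's cluster

def decompress_cluster(cluster):
    """Strip the compression-type byte and decompress the xz stream behind it."""
    kind, payload = cluster[0], cluster[1:]
    if kind != XZ_COMPRESSION:
        raise NotImplementedError(f"compression type {kind} is not handled in this sketch")
    return lzma.decompress(payload, format=lzma.FORMAT_XZ)

# Self-contained demo: rebuild a cluster from the known uncompressed content.
raw = bytes.fromhex("0c00 0000 1900 0000 1d00 0000") + b"<h1>Auto</h1>Auto"
cluster = bytes([XZ_COMPRESSION]) + lzma.compress(raw, format=lzma.FORMAT_XZ)
print(cluster[1:7].hex(" "))                # fd 37 7a 58 5a 00, the xz magic bytes
print(decompress_cluster(cluster) == raw)   # True
```

With the whole example file loaded as data, decompress_cluster(data[0xd6:0x127]) should return the 29 bytes shown below, assuming the dump above was transcribed exactly.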

Uncompressed cluster

Decompressing the data yields these 0x1d = 29 bytes:

0000000: 0c00 0000 1900 0000 1d00 0000 3c68 313e  ............<h1>
0000010: 4175 746f 3c2f 6831 3e41 7574 6f         Auto</h1>Auto

0000000: 0c00 0000 1900 0000 1d00 0000 3c68 313e  ............<h1>

The offset to the first blob in the uncompressed data is 0000 000c. Since the blob data begins right after the offset list, the list is 0x0c = 12 bytes long, i.e. it holds 3 offsets of 4 bytes each: 2 offsets to the start of each blob, and a last one that always points to the end of the data.

0000000: 0c00 0000 1900 0000 1d00 0000 3c68 313e  ............<h1>

The offset to the second blob in the uncompressed data is 0000 0019. This is also the end of the first blob.

0000000: 0c00 0000 1900 0000 1d00 0000 3c68 313e  ............<h1>

The third offset, 0000 001d, marks the end of the second (last) blob, so the cluster holds 0x1d = 29 bytes of uncompressed data.

0000000: 0c00 0000 1900 0000 1d00 0000 3c68 313e  ............<h1>
0000010: 4175 746f 3c2f 6831 3e41 7574 6f         Auto</h1>Auto

This is the data of the first blob, from offset 000c to 0019-1: "<h1>Auto</h1>", the text/html content of A/Auto.

0000010: 4175 746f 3c2f 6831 3e41 7574 6f         Auto</h1>Auto

This is the data of the second blob, from offset 0019 to 001d-1: "Auto", the text/plain content of B/Auto.
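The offset arithmetic above fits in a few lines: the first offset divided by 4 gives the number of offsets, and blob n spans from offset n up to, but excluding, offset n+1. This sketch assumes the 4-byte offsets used in this file; blob_offsets and get_blob are hypothetical helpers.

```python
import struct

# The 29 uncompressed cluster bytes shown above.
RAW = bytes.fromhex("0c00 0000 1900 0000 1d00 0000") + b"<h1>Auto</h1>Auto"

def blob_offsets(raw):
    """Read the offset list; its length follows from the first offset."""
    (first,) = struct.unpack_from("<I", raw, 0)
    count = first // 4   # the offset list ends where the first blob starts
    return list(struct.unpack_from(f"<{count}I", raw, 0))

def get_blob(raw, n):
    """Blob n spans [offsets[n], offsets[n + 1]); the last offset is the end of the data."""
    offsets = blob_offsets(raw)
    return raw[offsets[n]:offsets[n + 1]]

print([hex(o) for o in blob_offsets(RAW)])   # ['0xc', '0x19', '0x1d']
print(get_blob(RAW, 0), get_blob(RAW, 1))    # b'<h1>Auto</h1>' b'Auto'
```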

See also

* ZIM file format