Difference between revisions of "Zimit"
(Add the link to the source code repositories)
(New "Context" section)
Revision as of 08:17, 22 May 2023
Zimit is a tool for creating a ZIM file of "any" website.
Context
openZIM provides many scrapers, each dedicated to a specific source of content such as TED, Wikipedia (MediaWiki), Project Gutenberg, ... These scrapers produce high-quality ZIM files, but developing and maintaining each of them is costly.
Zimit is our approach to scraping a "random" website and getting an acceptable snapshot that can be used offline.
Principle
URL rewriting
Source code
- Browsertrix, the Web crawler which gathers everything in a WARC file
- Warc2zim, a command line tool which transforms a WARC file into a ZIM file
- Zimit, the packaging of both Browsertrix and Warc2zim within a Docker image
- Zimit frontend, the Web UI used for the Zimit SaaS solution youzim.it
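
The way these components chain together can be sketched as a small data-flow model. This is only an illustration, not Zimit's actual code: the `Stage` class, the `zimit_pipeline` function and the `/output` paths are hypothetical names introduced here, and no real Browsertrix or Warc2zim command is invoked.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Stage:
    """One step of the pipeline: which component runs, its input and its output."""
    component: str
    consumes: str
    produces: str


def zimit_pipeline(url: str, name: str) -> list[Stage]:
    """Describe the two stages that Zimit packages together (illustrative only)."""
    # Hypothetical file locations, chosen for the example.
    warc = str(PurePosixPath("/output") / f"{name}.warc")
    zim = str(PurePosixPath("/output") / f"{name}.zim")
    return [
        # The crawler visits the site and stores what it gathers as WARC.
        Stage("Browsertrix", consumes=url, produces=warc),
        # The converter turns that WARC into a ZIM file.
        Stage("Warc2zim", consumes=warc, produces=zim),
    ]


if __name__ == "__main__":
    for stage in zimit_pipeline("https://example.org", "example"):
        print(f"{stage.component}: {stage.consumes} -> {stage.produces}")
```

The point of the model is that the WARC file is the only hand-off between the crawler and the converter, which is what allows the two tools to be developed separately and then bundled in a single image.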