
Frequently Asked Questions

benoit74 edited this page Oct 3, 2025 · 18 revisions

Command line options

Can all MediaWiki instances be scraped?

MWoffliner cannot scrape just any online MediaWiki instance.

Here are the prerequisites:

  • MediaWiki version must be 1.27 or higher
  • MediaWiki APIs must be reachable and stable; see the documentation about API end-points for details about which ones you need
  • The MediaWiki instance must be stable and able to return proper responses for the requested articles (the scraper tolerates only very few failing articles)
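The version prerequisite can be checked before starting a scrape. The following is a minimal sketch, not part of MWoffliner: it assumes the Action API is reachable and reads the `generator` field returned by `action=query&meta=siteinfo`. The helper names and the User-Agent string are illustrative choices.

```python
import json
import re
from urllib.parse import urlencode
from urllib.request import Request, urlopen

MIN_VERSION = (1, 27)  # minimum MediaWiki version listed in the prerequisites


def parse_generator(generator: str) -> tuple:
    """Extract (major, minor) from a string such as 'MediaWiki 1.39.5'."""
    match = re.search(r"MediaWiki (\d+)\.(\d+)", generator)
    if match is None:
        raise ValueError(f"unrecognised generator string: {generator!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum_version(generator: str) -> bool:
    """Compare the parsed version against the documented minimum."""
    return parse_generator(generator) >= MIN_VERSION


def fetch_generator(action_api_url: str) -> str:
    """Ask the Action API for general site info and return its 'generator' field."""
    query = urlencode(
        {"action": "query", "meta": "siteinfo", "siprop": "general", "format": "json"}
    )
    # Illustrative User-Agent; some hosts reject requests that do not send one.
    request = Request(
        f"{action_api_url}?{query}",
        headers={"User-Agent": "mw-prerequisite-check/0.1 (example)"},
    )
    with urlopen(request, timeout=30) as response:
        return json.load(response)["query"]["general"]["generator"]
```

For example, `meets_minimum_version(fetch_generator("https://en.wikipedia.org/w/api.php"))` performs the check against Wikipedia in English. It says nothing about the stability prerequisites, which can only be judged by observing the instance over time.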

Which value for --mwUrl?

The --mwUrl value is the MediaWiki base URL. Think of it as a URL prefix to which the URL paths (for example the --mwWikiPath value) are appended. Usually the --mwUrl URL consists only of the protocol scheme and the domain name (for example https://en.wikipedia.org), but if the MediaWiki is not served from the root of the host, you might have to add a path. You can find the MediaWiki base URL simply by loading the main page of the remote MediaWiki instance; it is also given on the Special:Version page (see, for example, the one of Wikipedia in English).
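To illustrate the "prefix plus path" idea, here is a small sketch of how a base URL and a path combine. It is an illustration of the description above, not MWoffliner's actual code; `join_url` is a hypothetical helper and `https://example.org/mediawiki` is a made-up host with the wiki installed under a sub-path.

```python
def join_url(base_url: str, path: str) -> str:
    """Append a path to a base URL, normalising the slash between them."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


# MediaWiki served from the root of the host.
root_example = join_url("https://en.wikipedia.org", "/w/api.php")

# Hypothetical MediaWiki served from a sub-path: the path is part of the base URL.
sub_path_example = join_url("https://example.org/mediawiki/", "/api.php")
```

Note that this is plain prefixing. A standards-style resolver such as `urllib.parse.urljoin` would treat `/api.php` as an absolute path and drop the `/mediawiki` part of the base, which is not what the description above implies.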

Which value for --mwWikiPath?

Warning

--mwWikiPath was deprecated in 1.17; you should not use it anymore. It will be removed in the next major release.

The --mwWikiPath value is the MediaWiki wiki base URL path. This is the path, visible in the Web browser, that is configured to access any article; the article ID is appended directly after it. Usually this is just /wiki/. You can also use the index.php end-point path there. For example, for Wikipedia in English, you can configure either /wiki/ or /w/index.php. You can find this path simply by loading the main page of the remote MediaWiki instance; it is also given on the Special:Version page (see, for example, the one of Wikipedia in English).

Which value for --mwActionApiPath?

The --mwActionApiPath value is the MediaWiki "traditional" API path. Usually this path is very similar to the --mwModulePath one, as api.php sits right next to load.php. You can find it on the Special:Version page. For example, for Wikipedia in English, this is /w/api.php.

Which value for --mwModulePath?

The --mwModulePath value is the MediaWiki module load path. Usually this path is very similar to the --mwActionApiPath one, as load.php sits right next to api.php. You can find it on the Special:Version page. For example, for Wikipedia in English, this is /w/load.php.

Which value for --mwRestApiPath?

The --mwRestApiPath value is the MediaWiki REST API URL path used by the RestApi (desktop) HTML renderer. You can find it on the Special:Version page, where the rest.php entry is listed. For example, for Wikipedia in English, this is /w/rest.php.
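Putting the values quoted above for Wikipedia in English together, the options could be assembled as follows. This is only a sketch: `build_arguments` is a hypothetical helper, the `--flag=value` form is one common way to pass options, and any other options a real run needs are deliberately left out.

```python
import shlex

# Values taken from the answers above for Wikipedia in English.
ENGLISH_WIKIPEDIA_OPTIONS = {
    "--mwUrl": "https://en.wikipedia.org",
    "--mwActionApiPath": "/w/api.php",
    "--mwModulePath": "/w/load.php",
    "--mwRestApiPath": "/w/rest.php",
}


def build_arguments(options: dict) -> list:
    """Turn an option mapping into a list of '--flag=value' arguments."""
    return [f"{flag}={value}" for flag, value in options.items()]


arguments = build_arguments(ENGLISH_WIKIPEDIA_OPTIONS)
# Shell-safe, space-separated form that can be appended to a command line.
print(shlex.join(arguments))
```

--mwWikiPath is left out on purpose, since it is deprecated.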

What is the option --forceRender?

To retrieve HTML pages from a remote MediaWiki instance, MWoffliner relies on MediaWiki APIs. MediaWiki provides multiple ways to retrieve HTML pages, but depending on the version of MediaWiki and the way it is set up, many of them might be unavailable. By default, MWoffliner does its best to pick the right API, giving priority to modern and mobile-friendly API end-points (see https://github.com/openzim/mwoffliner/wiki/API-end%E2%80%90points). If you want to force the usage of a specific one, use the --forceRender option.
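The selection logic described above can be pictured as "first available candidate in priority order, unless one is forced". The sketch below is purely conceptual and is not MWoffliner's implementation: the renderer names are placeholders rather than real --forceRender values, and raising an error when the forced renderer is unavailable is an assumption of this sketch.

```python
from typing import Mapping, Optional, Sequence


def pick_renderer(
    priority: Sequence[str],
    available: Mapping[str, bool],
    forced: Optional[str] = None,
) -> str:
    """Return the forced renderer if given, else the first available one."""
    if forced is not None:
        # Assumption of this sketch: a forced but unavailable renderer is an error.
        if not available.get(forced, False):
            raise ValueError(f"forced renderer {forced!r} is not available")
        return forced
    for name in priority:
        if available.get(name, False):
            return name
    raise ValueError("no renderer is available on this MediaWiki instance")


# Placeholder names, ordered from most to least preferred.
PRIORITY = ["renderer-a", "renderer-b", "renderer-c"]
AVAILABLE = {"renderer-a": False, "renderer-b": True, "renderer-c": True}

default_choice = pick_renderer(PRIORITY, AVAILABLE)
forced_choice = pick_renderer(PRIORITY, AVAILABLE, forced="renderer-c")
```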

How can I know if a given website is a MediaWiki?

There is no foolproof way to tell, but the following seems quite reliable:

  • open any article on the website (not the home page, but a child page) in Firefox, Chrome or Edge on your desktop
  • open the DevTools (the shortcut varies; search the Internet for a how-to) and go to the "Console"
  • type mw.config.get('skin') in the Console prompt and press Enter
  • if you get a value, then this is a MediaWiki for sure; if you get an error, then it is very likely not a MediaWiki
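When a browser is not at hand, a different heuristic from the console check above is to look at the page's HTML: MediaWiki normally emits a `<meta name="generator" content="MediaWiki …">` tag. The sketch below is an alternative technique, not the method described above, and it is equally non-foolproof, since a site may remove or alter that tag. The class and function names are illustrative.

```python
from html.parser import HTMLParser


class GeneratorFinder(HTMLParser):
    """Record the content of the first <meta name="generator"> tag seen."""

    def __init__(self) -> None:
        super().__init__()
        self.generator = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.generator is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "generator":
            self.generator = attributes.get("content")


def looks_like_mediawiki(html: str) -> bool:
    """Return True if the page declares MediaWiki as its generator."""
    finder = GeneratorFinder()
    finder.feed(html)
    return (finder.generator or "").startswith("MediaWiki")
```

Feed it the HTML source of any article page, fetched however you prefer. A `False` result only means that the tag was not found, not that the site is definitely something else.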