Szerkesztő:Bináris/TOCbot
This page in a nutshell: TOCbot was created to help you search the archives of village pumps, noticeboards and personal talk pages. The bot currently works in Hungarian Wikipedia for both community and user archives. Internationalization is under way. You may see an example here.
Development and testing of the archive list generator are finished. Recognition of international date formats is in progress. The bot will soon be published for beta testing.
TOCbot is a script developed by User:Bináris within the Pywikipedia framework. It is designed to work in any MediaWiki wiki, not just Wikipedia.
What is this?
People often complain that discussions get hopelessly lost in the numerous archives of a page. Some people think LQT (LiquidThreads) is the final answer to this problem; I want to show another possibility. This bot helps you find old discussions and makes the history of talks clear and easy to survey. It creates the table of contents as a sortable wikitable and puts it on a subpage. For each section, the table contains its title linked to the original section, the number of the archive, the dates of the first and last contributions, an estimated number of contributions (based on dates and templates) and the length of the section. These can help you find an old discussion if you remember something about it. The bot needs dates in your wiki's standard signature format (including hour:minute and time zone, as produced by ~~~~ or ~~~~~) and regularly named archives.
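A minimal sketch of how such a table could be generated, written in Python because Pywikipedia is a Python framework. It is not TOCbot's actual code: the function names are hypothetical, signature timestamps are assumed to follow English Wikipedia's format ("14:30, 5 April 2008 (UTC)"), the contribution estimate here only counts timestamps (the bot also uses templates), and section length is assumed to be measured in UTF-8 bytes.

```python
import re
from datetime import datetime

# Assumption: English-Wikipedia-style signature timestamps,
# e.g. "14:30, 5 April 2008 (UTC)". Each wiki needs its own pattern.
TIMESTAMP = re.compile(r"(\d{1,2}):(\d{2}), (\d{1,2}) (\w+) (\d{4}) \(UTC\)")
MONTHS = {name: i for i, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}

# Level-2 headings only: "== Title ==" on a line of its own.
HEADING = re.compile(r"^==[ \t]*([^=].*?)[ \t]*==[ \t]*$", re.MULTILINE)


def split_sections(wikitext):
    """Yield (title, body) for every level-2 section of an archive page."""
    matches = list(HEADING.finditer(wikitext))
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(wikitext)
        yield m.group(1), wikitext[m.end():end]


def timestamps(body):
    """Return all signature timestamps of a section as datetimes."""
    found = []
    for hh, mm, day, month, year in TIMESTAMP.findall(body):
        if month in MONTHS:
            found.append(datetime(int(year), MONTHS[month], int(day),
                                  int(hh), int(mm)))
    return found


def table_row(archive_title, archive_no, title, body):
    """Build one wikitable row, or None if the section contains no dates."""
    dates = sorted(timestamps(body))
    if not dates:
        return None
    return "|-\n| [[{0}#{1}|{1}]] || {2} || {3} || {4} || {5} || {6}".format(
        archive_title, title, archive_no,
        dates[0].strftime("%Y-%m-%d"), dates[-1].strftime("%Y-%m-%d"),
        len(dates), len(body.encode("utf-8")))


def build_table(archives):
    """archives: iterable of (archive_title, archive_no, wikitext) tuples."""
    lines = ['{| class="wikitable sortable"',
             "! Section !! Archive !! First !! Last !! Posts (est.) !! Length"]
    for archive_title, archive_no, text in archives:
        for title, body in split_sections(text):
            row = table_row(archive_title, archive_no, title, body)
            if row:
                lines.append(row)
    lines.append("|}")
    return "\n".join(lines)
```

The first and last dates are taken from the sorted timestamps rather than from their position in the text, because replies are not always appended at the end of a section.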
I first wrote this bot for the village pumps of Hungarian Wikipedia, then extended it to other language wikis and to user archives. A general solution was not easy, because date formats, archive names and relevant templates vary, but I believe it was worth it.
There have been some remarkable efforts to make such lists. In Finnish Wikipedia, for example, Wikipedia:Kahvihuone (käytännöt)/Arkistohakemisto is maintained semi-manually by Ejs-80 with some Emacs macros. In German Wikipedia, while looking for interesting name patterns, I found Wikipedia:Fragen zur Wikipedia, whose archives have "hand-made" indexes under Wikipedia:Fragen zur Wikipedia/Archiv-Gesamtverzeichnis, maintained by ParaDox. Polish Wikipedia also has such TOCs; pl:Wikipedia:Kawiarenka/Kwestie techniczne dyskusja/Archiwum is updated by MalarzBOT. As far as I understand, MalarzBOT uses a localized version of Misza13's archivebot and appends the section title during archiving. In English Wiktionary, en:wikt:Wiktionary:Beer parlour/timeline was updated by Werdnabot during archiving until April 2007; then Connel MacKenzie tried to follow the events manually, but gave it up by the end of 2007. Rich Farmbrough manually listed his own talk archives until August 2008. These examples show that the need for something like TOCbot has long existed.
Currently the bot works only in Hungarian Wikipedia, for both community pages and personal archives. All the TOCs of Hungarian community pages can be seen here, with the original page in the first column and its TOC in the second. The first user archive processed was my own talk archive. Other examples are listed at Kategória:Személyes tartalomjegyzékek (personal tables of contents).
You may follow the development of TOCbot here. Please leave your messages about TOCbot on the talk page of this page after reading the sections below.
TOCbot and archivebot
Misza13's archivebot.py is now widely used across wikis to archive community pages and personal talk pages. TOCbot borrows some of its ideas and solutions. The two are ideal to use together: use archivebot to archive your pages and TOCbot to create the TOC of your archives.
Why are the dates written in a strange order?
Although several things used and produced by this bot vary from language to language, it always writes dates in Hungarian order: the year, followed by the month and the day. This makes the table sortable by date. This deductive order is natural for Hungarians: just as we (uniquely in Europe) write our personal names in the so-called "Eastern name order", we also write dates, unlike most European nations, starting with the broadest unit and narrowing down. This order is called big-endian, and it is also the date order chosen by ISO 8601. Some people think this habit affects our thinking in some way. Sort the TOC of the archives by date to look for a discussion, and enjoy the advantage of the Hungarian habit.
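Why year-first order helps sorting can be shown with a small Python sketch. It assumes the dates are written in the numeric ISO 8601 form YYYY-MM-DD and that the column is sorted as plain text; the exact separators TOCbot writes are not specified here, and the helper names are hypothetical.

```python
from datetime import date


def big_endian(d):
    """Year first, then month, then day (ISO 8601 style, e.g. 2008-04-05)."""
    return d.strftime("%Y-%m-%d")


def little_endian(d):
    """Day first, as in many European date formats (e.g. 05.04.2008)."""
    return d.strftime("%d.%m.%Y")


if __name__ == "__main__":
    dates = [date(2008, 4, 5), date(2007, 12, 24), date(2008, 1, 30)]
    # Sort the formatted strings as plain text, like a text-sorted table column.
    print(sorted(big_endian(d) for d in dates))
    # ['2007-12-24', '2008-01-30', '2008-04-05']  (chronological)
    print(sorted(little_endian(d) for d in dates))
    # ['05.04.2008', '24.12.2007', '30.01.2008']  (not chronological)
```

With zero-padded fields and the largest unit first, alphabetical order and chronological order coincide, so no date-aware sorting is needed.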
How does the bot work?
The bot has three main parts, listed from the core outward:
- the table generator, which analyzes the archives and creates the collected TOC of them,
- the archive generator, which collects, puts in order and yields the archives of a chosen community page or talk page (see the collection),
- the page generator, which yields the pages to work on.
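The way the three parts feed each other can be sketched with plain Python generators. This is only an illustration under stated assumptions: the function names are hypothetical, the wiki is represented by a set of existing page titles instead of calls to the Pywikipedia framework, archives are assumed to be named "Base/Archive N", and the table generator is a stand-in that only lists the archives instead of analysing their sections.

```python
import re


def page_generator(titles):
    """Page generator: yields the base pages to work on (here a fixed list)."""
    for title in titles:
        yield title


def archive_generator(base_title, existing_titles):
    """Archive generator: collects the archives of base_title and yields
    (number, title) pairs in numeric order.

    Assumes archive names like 'Base/Archive 7'; numeric sorting puts
    'Archive 10' after 'Archive 2', unlike plain text sorting."""
    pattern = re.compile(re.escape(base_title) + r"/Archive (\d+)$")
    found = []
    for title in existing_titles:
        m = pattern.match(title)
        if m:
            found.append((int(m.group(1)), title))
    for number, title in sorted(found):
        yield number, title


def table_generator(base_title, existing_titles):
    """Table generator (stand-in): lists each archive. The real one
    analyses every section of every archive."""
    rows = ["* [[{}]] (archive {})".format(title, number)
            for number, title in archive_generator(base_title, existing_titles)]
    return "\n".join(rows)


def run(titles, existing_titles):
    """Chain the three parts: pages -> archives -> one TOC per page."""
    return {title: table_generator(title, existing_titles)
            for title in page_generator(titles)}
```

Because each layer only consumes the output of the one outside it, the page generator can be swapped (a single page, a category, a list of user archives) without touching the table generator.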
Levels of solution
- Creating the table from archives of a given page and saving it
- Creating the TOC of all the village pumps of Hungarian Wikipedia (they have a regular system for naming the archives)
- Processing other archives with some regular archive name system
- Processing some other Hungarian community pages
- Inserting custom templates as headers and categories as footers
- Internationalization I (gathering and sorting archives)
- Internationalization II (recognition of local dates and templates, sorting the dates of a section, creating the table and the headers), currently in progress
- Making the bot available for any archives, including personal talk pages
- Checking the existing TOC to avoid unnecessary runs
- User guide for end users
- Manual for bot owners
- Adding the script to Pywikipedia framework
How can you localize the bot?
Each wiki that wants to use TOCbot needs
- a Pywikipedia bot owner who maintains the project in that wiki (several may cooperate)
- a bored computer that has spare time (for example a toolserver)
- some localization in the code
- a page explaining to local users how to get their archives listed
The first localization requires a basic knowledge of regular expressions.
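As an illustration of the kind of regular expressions involved, here is a hedged Python sketch of a per-language localization table. The structure and names (LOCALIZATION, find_dates) are hypothetical, not TOCbot's real configuration, and the two signature formats are assumed examples ("14:30, 5 April 2008 (UTC)" for English, "2008. április 5., 14:30 (CEST)" for Hungarian) that should be checked against each wiki's actual signatures.

```python
import re
from datetime import datetime

# Hypothetical per-language settings; verify the signature format on each wiki.
LOCALIZATION = {
    "en": {
        # e.g. "14:30, 5 April 2008 (UTC)"
        "timestamp": re.compile(
            r"(?P<hour>\d\d):(?P<minute>\d\d), (?P<day>\d{1,2}) "
            r"(?P<month>\w+) (?P<year>\d{4}) \((?P<tz>[A-Z]+)\)"),
        "months": ["January", "February", "March", "April", "May", "June",
                   "July", "August", "September", "October", "November",
                   "December"],
    },
    "hu": {
        # e.g. "2008. április 5., 14:30 (CEST)"
        "timestamp": re.compile(
            r"(?P<year>\d{4})\. (?P<month>\w+) (?P<day>\d{1,2})\., "
            r"(?P<hour>\d\d):(?P<minute>\d\d) \((?P<tz>[A-Z]+)\)"),
        "months": ["január", "február", "március", "április", "május",
                   "június", "július", "augusztus", "szeptember", "október",
                   "november", "december"],
    },
}


def find_dates(text, lang):
    """Return the signature timestamps found in text as naive datetimes.

    The time zone abbreviation is captured but not converted here."""
    loc = LOCALIZATION[lang]
    months = {name: number for number, name in enumerate(loc["months"], start=1)}
    found = []
    for m in loc["timestamp"].finditer(text):
        month = months.get(m.group("month"))
        if month is None:  # the word in the month position is not a month name
            continue
        found.append(datetime(int(m.group("year")), month, int(m.group("day")),
                              int(m.group("hour")), int(m.group("minute"))))
    return found
```

Named groups let each language put year, month and day in any order in its pattern, while the parsing code stays the same for every wiki.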
Here is the detailed bot owners' guide.