Decentralized search engine & automatized press reviews

Version 1.8.1: fewer bugs, more sources and no more month_nb

This new version 1.8.1 was intended as a maintenance release:

  • updating dependencies

  • fixing the resulting bugs

  • adding sources

But it turned out to be the release that got rid of one of my own dependencies: month_nb.

On first launch, this version frantically opens tabs… Sorry for the inconvenience; it won’t happen on the second launch (the problem is linked to automated searches and daylight saving time).
Users reported frantic tab opening from the Settings page, recurring every time the page was opened. This bug is fixed in version 1.8.2.

1. month_nb

This library, presented here and there, converts a month name into its corresponding month number without needing to know the language of the month name. It works for 72 languages. It was a minimalist approach based on a compact RegExp tree of the starting letters of month names…

It was worthwhile in JavaScript because it was 20x smaller than the alternatives and unique in not requiring any knowledge of the language.

Using the Intl.DateTimeFormat API (which may have been usable since 2014 but was still changing last August), it’s now possible to replace the ~300 lines of month_nb with 5 lines of JavaScript encapsulated in 3 functions (as this first commit showed).

The principle is simple: the web browser knows how month names are written in 218 locales (in Firefox 102.0a1), so when a month name needs to be converted into its number, Firefox provides the list of month names for that language and I look up the index of the matching one. It worked out of the box for most sources.
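The principle can be sketched as follows. This is an illustration under assumptions, not the project’s actual code: the helper names monthNames and monthNumber are hypothetical. It formats one mid-month UTC date per month with Intl.DateTimeFormat, then looks up the index of the lower-cased name.

```javascript
// Sketch only: helper names are hypothetical, not Meta-Press.es code.

// The 12 month names of `locale`, as the browser writes them.
function monthNames (locale, width = 'long') {
  const fmt = new Intl.DateTimeFormat(locale, { month: width, timeZone: 'UTC' });
  const names = [];
  for (let m = 0; m < 12; m++) {
    // Mid-month in UTC, so no time-zone offset can shift the date into another month.
    const d = new Date(Date.UTC(2022, m, 15));
    names.push(fmt.format(d).toLocaleLowerCase(locale));
  }
  return names;
}

// Month number (1-12) of `name` in `locale`, or -1 if it is not found.
function monthNumber (name, locale) {
  const idx = monthNames(locale).indexOf(name.trim().toLocaleLowerCase(locale));
  return idx === -1 ? -1 : idx + 1;
}
```

For example, monthNumber('mars', 'fr') returns 3 and monthNumber('Juli', 'de') returns 7.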

Notable exceptions were:

  • the Esperanto version of Le Monde Diplomatique, which uses English month names for its dates (so I introduced a new date_locale property in source descriptions)

  • the Dutch (nl) language, whose abbreviation for "March" is not just the first 3 letters of the month name: maart is abbreviated mrt. (so I also compare against the browser-provided "short" versions of the month names)

  • Russian month names from Wikipedia weren’t recognized. After some exchanges on Mastodon, I learnt that this was because the month names are declined (put in another grammatical case) in this context. I added the list of declined month names at the right spot in the code to support this case.

  • And the languages written in Arabic script, such as the Egyptian Wikipedia (ar-eg) or the Iranian (Persian) Le Monde Diplomatique (fa-ir); I think I’ll need help with those, as I tooted via the official Mastodon account.
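The first three fixes above (a per-source date_locale, short month forms, declined forms) could be combined as in this sketch. Assumptions: the helper names and the lang field of the source description are hypothetical. Also, where the post describes a hand-written list of declined Russian names, this sketch swaps in a different technique: it asks the browser for the month name as formatted next to a day number, which CLDR gives in the genitive for Russian ("января"). Whether that covers every form Wikipedia uses is not guaranteed, and it does nothing for the Arabic-script cases.

```javascript
// Sketch only: helper names and the `lang` field are hypothetical;
// `date_locale` is the source-description property mentioned above.

// Lower-case and drop a trailing dot, so "Mrt." and "mrt" compare equal.
const norm = (s, locale) => s.trim().toLocaleLowerCase(locale).replace(/\.$/, '');

// Every form the browser knows for each month of `locale`:
// stand-alone and in-date, long and short.
function monthForms (locale) {
  const forms = Array.from({ length: 12 }, () => new Set());
  for (const width of ['long', 'short']) {
    const alone = new Intl.DateTimeFormat(locale, { month: width, timeZone: 'UTC' });
    const inDate = new Intl.DateTimeFormat(locale, { day: 'numeric', month: width, timeZone: 'UTC' });
    for (let m = 0; m < 12; m++) {
      const d = new Date(Date.UTC(2022, m, 15));
      forms[m].add(norm(alone.format(d), locale));
      // With a day attached, CLDR supplies the "format" form of the month,
      // which for Russian is the genitive used in running text.
      const part = inDate.formatToParts(d).find(p => p.type === 'month');
      if (part) forms[m].add(norm(part.value, locale));
    }
  }
  return forms; // worth caching per locale in real use
}

// Month number (1-12) for a date string of `source`, or -1 if unknown.
function monthNumberForSource (name, source) {
  const locale = source.date_locale || source.lang;
  const wanted = norm(name, locale);
  const idx = monthForms(locale).findIndex(set => set.has(wanted));
  return idx === -1 ? -1 : idx + 1;
}
```

For the Esperanto Le Monde Diplomatique case, a description such as { lang: 'eo', date_locale: 'en' } makes the lookup use English names.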

I created month_nb for the early prototypes of the project and had been working on it since 2013. It cost me nightly tears of blood (and recently daily ones for Christopher), but after hours of right-to-left quirks in our editors, not really knowing what we were doing, we managed to get those Arabic-script languages working via month_nb… although abbreviations could not be supported for all languages (too many collisions). With the new solution, the correct list of the 12 strings used by Wikipedia for the different Arabic locales might be enough (as for Russian).

1.1. Name of languages and countries

Then, while I had the momentum, I also replaced our intern-made list of language names with similar browser introspection via Intl.DisplayNames, and added the country names that we had previously decided to skip. Both are now shown in the browser locale (so presumably in your language), whereas the previous list showed each language only in its own locale.
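A minimal sketch of the Intl.DisplayNames approach, assuming the UI locale comes from something like navigator.language; the displayNames helper is hypothetical. The same language code yields a name in the reader’s locale rather than in the language itself.

```javascript
// Sketch: `uiLocale` would come from the browser (e.g. navigator.language);
// the helper name is hypothetical.
function displayNames (uiLocale, langCodes, countryCodes) {
  const langs = new Intl.DisplayNames([uiLocale], { type: 'language' });
  const regions = new Intl.DisplayNames([uiLocale], { type: 'region' });
  return {
    // e.g. 'fr' -> "French" for an English-speaking user
    languages: Object.fromEntries(langCodes.map(c => [c, langs.of(c)])),
    // region subtags are upper case in BCP 47 ('DE', 'EG')
    countries: Object.fromEntries(countryCodes.map(c => [c, regions.of(c.toUpperCase())]))
  };
}
```

With 'en' as UI locale, 'fr' becomes "French" and 'de' (as a country) becomes "Germany"; with 'fr' as UI locale, 'en' becomes "anglais".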

All in all (as measured by cloc), I have already thrown away 25% of the interns’ lines of code (keeping the features, losing hours).

I also wanted to use a browser-introspected list of all time zones (used in the settings), but the Intl.Locale.timeZones API is not supported by Firefox yet, so we keep our own list for the moment.

I’ll keep an eye on it so I can drop this last list as soon as possible.
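A sketch of the feature detection this implies, keeping a bundled list as the fallback. BUNDLED_TIME_ZONES is a hypothetical, truncated stand-in for the project’s own list. The sketch also probes Intl.supportedValuesOf('timeZone'), a distinct ES2022 API that lists all time zones (not per locale); whether it suits the settings page is an assumption.

```javascript
// Hypothetical stand-in for the project's own (much longer) list.
const BUNDLED_TIME_ZONES = ['Europe/Paris', 'America/New_York', 'Asia/Tokyo'];

// All known time zones: the browser's list if available, else ours.
function allTimeZones () {
  if (typeof Intl.supportedValuesOf === 'function') {
    return Intl.supportedValuesOf('timeZone');
  }
  return BUNDLED_TIME_ZONES;
}

// Time zones used in the region of `localeTag`, or undefined when the
// engine does not expose them (the Firefox case described above).
function timeZonesOf (localeTag) {
  const loc = new Intl.Locale(localeTag);
  // Newer engines expose getTimeZones(); some older ones a timeZones getter.
  if (typeof loc.getTimeZones === 'function') return loc.getTimeZones();
  if (Array.isArray(loc.timeZones)) return loc.timeZones;
  return undefined;
}
```

Dropping the bundled list then amounts to deleting the fallback branch once the detection succeeds in the target browsers.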

2. Dependency updates and bug fixes

Aside from removing month_nb, other dependencies were simply updated for this release:

  • Choices.js v10.1.0 (at last a new release after a long period in which the project looked dead; a new team has gathered around it)

  • Browser-polyfill.js v3.0.1

  • Gettext.js v1.1.0

These updates brought small bugs to fix and deprecation warnings to handle. I also fixed a bug in the ListJS pagination (which scrolled back to the top at each page change), using a solution that I took the time to report upstream. This bug, introduced by ListJS v2.3.1, may have been there for 9 months.

I also fixed result removal, which was slow and did not work for sources with parentheses in their name, such as "Wikipedia (pl)". The new system is 4x faster and displays a waiting cursor while it works.
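The post does not say why parentheses broke removal. One common cause of this symptom is building a RegExp from the raw source name, where "(pl)" becomes a capture group that matches "pl" without the parentheses. A sketch assuming that cause; escapeRegExp and removeResultsFrom are hypothetical helpers, and the result shape is an assumption. Comparing names through a Set avoids regular expressions altogether and is usually faster.

```javascript
// Escape every character that has a special meaning in a RegExp.
function escapeRegExp (text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical result shape: { source: 'Wikipedia (pl)', title: '…' }.
// An exact-match Set lookup needs no escaping at all.
function removeResultsFrom (results, sourceNames) {
  const unwanted = new Set(sourceNames);
  return results.filter(r => !unwanted.has(r.source));
}
```

Unescaped, /^Wikipedia (pl)$/ does not match the string "Wikipedia (pl)"; the escaped pattern does.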

Imports and exports of results should also display this waiting cursor. This might be the third time I announce this feature, but it is trickier to achieve than it seems. The processing must be asynchronous, otherwise the browser just performs it without updating the cursor; and you must enforce the waiting cursor even when the mouse hovers over links, where a pointer cursor is set by default (or by my class that makes buttons look like links). It can feel as if one must shake the mouse to get that damned waiting cursor!
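The two constraints above (yield before the heavy work, and override link cursors) can be sketched as a small wrapper. The waiting class name, the CSS rule and withWaitCursor are hypothetical, not the project’s code. Yielding with setTimeout(0) gives the browser a chance to apply the cursor but does not strictly guarantee a repaint; awaiting two requestAnimationFrame callbacks is a common, stronger alternative. The !important flag and the `.waiting *` selector are what beat the pointer cursor on links.

```javascript
// CSS to put in the extension's stylesheet (shown here as a string):
// the universal selector and !important override cursors set on links.
const WAIT_CURSOR_CSS = '.waiting, .waiting * { cursor: wait !important; }';

// Resolve on a later task, letting the browser process the class change.
function yieldToBrowser () {
  return new Promise(resolve => setTimeout(resolve, 0));
}

// Run `task` (import, export, removal…) with the waiting cursor shown on `root`
// (e.g. document.documentElement).
async function withWaitCursor (root, task) {
  root.classList.add('waiting');
  try {
    await yieldToBrowser();
    return await task();
  } finally {
    root.classList.remove('waiting'); // also restored if the task throws
  }
}
```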

2.1. Browser storage limitations

The previous version introduced another annoying bug in the new list of sources. There is a button to clear the whole selection. It works well in the last two tabs (added sources and removed sources), but if you click it in the list of all sources or in the list of selected sources, you create an empty selection; a popup then tells you so and loads the default tags to give you a non-empty selection of sources. Unfortunately, this action was trapping users in popup loops.

I fixed this, and it taught me a lot. There should not have been a loop. It was caused by the fact that you can’t save big strings (such as the big source-exclusion list that the "remove all" button creates) in the browser.storage.sync area. You are limited to 512 items of 8 KB each (for a total size of 100 KB max).

This corresponds to a custom selection list of approximately 200 sources (added or removed). Bigger lists will work but won’t be saved.

Avoiding saving objects that are too big fixes the loop problem, but a better solution is yet to be found. Currently I use the main URL of the source as the source key: it allows finding a source from an RSS file and guarantees that old exports stay consistent with updated versions (this came "at no cost" before hitting this limit).

Those limits are now monitored and reported in the JavaScript console when you open a new tab.
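A sketch of how such a report could be computed from the object returned by browser.storage.sync.get(null). The quota values are the documented storage.sync limits quoted above (512 items, 8192 bytes per item, 102400 bytes in total); syncUsage and itemBytes are hypothetical names, and counting the UTF-8 bytes of the key plus the JSON-serialized value is an approximation of how the browser measures an item.

```javascript
// Documented storage.sync quotas.
const SYNC_QUOTA = { MAX_ITEMS: 512, QUOTA_BYTES_PER_ITEM: 8192, QUOTA_BYTES: 102400 };

// Approximate size charged for one item: key + JSON of the value, in UTF-8 bytes.
function itemBytes (key, value) {
  return new TextEncoder().encode(key + JSON.stringify(value)).length;
}

// Usage report for an object such as the one storage.sync.get(null) resolves to.
function syncUsage (items) {
  const keys = Object.keys(items);
  const sizes = keys.map(k => itemBytes(k, items[k]));
  const totalBytes = sizes.reduce((a, b) => a + b, 0);
  const tooBig = keys.filter((k, i) => sizes[i] > SYNC_QUOTA.QUOTA_BYTES_PER_ITEM);
  return {
    items: keys.length,
    totalBytes,
    tooBig, // keys that would fail to save
    withinQuota: keys.length <= SYNC_QUOTA.MAX_ITEMS &&
      totalBytes <= SYNC_QUOTA.QUOTA_BYTES && tooBig.length === 0
  };
}
```

At 8192 bytes per item, 200 sources leave roughly 40 bytes each, about the length of a quoted URL, which is consistent with the ~200-source estimate above.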

3. New sources

This release embeds 62 more sources, for a total of 378 sources, 59 countries (+4) and 33 languages (+10).

The Agenda source type now includes (in addition to all the Demosphere instances), for a total of 30 agendas, while waiting for this feature request to be addressed by Mobilizon's developers.

The Encyclo. source type now includes 14 Wikipedia language editions, for a total of 20 indexed encyclopedias.

And a Social Network source type is introduced in this release, with 14 entries for now, including Invidious and several Dailymotion languages. Other "web" sources will follow; feel free to suggest a better name for this category.

But the new star is the Coredem’s Scrutari instance (Scrutari description in English) in a renamed Doc. source type (along with Wikileaks and

Scrutari is a free-software search engine indexing 58,663 resources from 38 sites promoting worldwide democracy. These resources are spread across 14 languages in

4. New files

Finally, 3 files appeared in the repository:

  • .eslintrc.json, which contains configuration directives for ESLint, to help enforce some coding-style rules

  • wiki/incompatible_sources.adoc, which is not really new: it is the project’s former wiki page, moved into the repository itself (it may be more visible here, and it is easier for me to maintain)

  • json/broken_sources.json, a file into which I moved the remains of broken source definitions (around 50 of them) that were either never finished or broke afterwards. The motivation was to avoid needlessly sending 1440 lines of JSON to users… and I fixed some of them along the way

Before trying to add a new source, it wouldn’t be a waste of time to search those last two files for hints of possible previous attempts.