Decentralized search engine & automatized press reviews

Version 1.8.11: 900 sources, event and job result types, #fixtheworld

Over 185 commits were made since the previous release, an extraordinary bump in the statistics made possible by 3 interns who did a great job adding around 250 sources:

  • Jérome Bertin

  • Céline Duguet

  • Vincent Gay

With quite some proofreading work and other additions on my side, the number of sources shipped with this new release is 930 (a 50% growth of the source collection).

1. Source maintenance

In particular, all the compatible members of the SPIIL (the French independent online press union) have been added, raising the number of French "indep." sources to 132 (out of a total of 373 "indep." sources).

There are now 89 reference press sources (and 40 more might be added soon, as the corresponding Wikipedia page has grown).

There are 386 sources providing results as RSS feeds, mainly because it’s the WordPress default and a lot of newspapers use WordPress as a backend. It would be cool if the 20 SPIP sources could do the same!

Illustrations were retrieved from a large proportion of these 386 RSS sources, thanks to a simple regex search trick, and it’s a happy ending for an item that stayed on my todo list for years :)
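To give an idea of what such a regex trick can look like, here is a minimal Python sketch. The pattern and the function name are assumptions made for illustration, not the actual rule shipped in this release: it simply looks for the first img tag in the HTML content of an RSS item and returns its src attribute.

```python
import re
from typing import Optional

# Hypothetical pattern: capture the src attribute of the first <img> tag
# found in an HTML fragment (such as the description of an RSS item).
IMG_SRC_RE = re.compile(
    r'<img\b[^>]*?\ssrc\s*=\s*["\']([^"\']+)["\']',
    re.IGNORECASE,
)


def find_illustration(item_html: str) -> Optional[str]:
    """Return the URL of the first image found in the fragment, or None."""
    match = IMG_SRC_RE.search(item_html)
    return match.group(1) if match else None


if __name__ == "__main__":
    sample = '<p>Intro</p><img class="thumb" src="https://example.org/a.jpg" alt="">'
    print(find_illustration(sample))
```

A regex is fragile compared to a real HTML parser, but it is cheap and good enough to recover a thumbnail when one is present.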

Also, mainly among those RSS sources, the filter_results rule is now applied to 79 sources, ensuring exact results, and the rule was improved to match exact words with better word separators. Despite this quest for exact truth, a setting should soon allow disabling this filtering to let approximate results flow at will if needed.
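The idea behind this kind of filtering can be sketched as follows in Python. This is only an illustration under assumptions: the function names and the title / excerpt fields are hypothetical, and the real rule may use a different set of word separators. Here a search term counts as an exact match only when it is not glued to another letter or digit.

```python
import re
from typing import Dict, List


def contains_exact_words(text: str, search_terms: List[str]) -> bool:
    """True if every search term appears in text as a whole word."""
    for term in search_terms:
        # (?<!\w) and (?!\w) reject matches glued to a letter, digit or underscore.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if not re.search(pattern, text, re.IGNORECASE):
            return False
    return True


def keep_exact_results(results: List[Dict[str, str]], search_terms: List[str]) -> List[Dict[str, str]]:
    """Keep only the results whose title or excerpt holds all the exact terms."""
    return [
        r for r in results
        if contains_exact_words(r.get("title", "") + " " + r.get("excerpt", ""), search_terms)
    ]


if __name__ == "__main__":
    found = [
        {"title": "Press freedom", "excerpt": "A yearly report"},
        {"title": "Impressive", "excerpt": "An express summary"},
    ]
    print(keep_exact_results(found, ["press"]))
```

With the search term "press", the second result is dropped because the term only appears inside longer words.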

1.1. New result types : event and job

To help searching across the 49 agendas, an event result type has been added, and a job result type was added to suit the first 3 job search engines integrated into

It’s not every year that a new type of result is added, and a lot of job search engines are still to be added. But it’s a new door that opens for

2. Source definition evolution

2.1. token_url

A token_url source definition entry has appeared, to indicate that this URL should be called before any search. It allows setting up a cookie (which can be used to set the language of the next search) or grabbing a token that would be mandatory to perform a search. While the language-selection scheme is already in use, no source requiring the token-grabbing one has been met yet.
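The call order described above can be sketched like this in Python. It is not the extension's actual code: the search_url key, its {query} placeholder, the search_source function and the injected fetch callable are all assumptions made to keep the example self-contained and free of network access.

```python
from typing import Callable, Dict, Tuple
from urllib.parse import quote_plus

# Hypothetical fetcher: takes a URL and the current cookies,
# returns the response body and the cookies set by the response.
Fetch = Callable[[str, Dict[str, str]], Tuple[str, Dict[str, str]]]


def search_source(source: dict, query: str, fetch: Fetch) -> str:
    """Call token_url first (when defined), then run the search with its cookies."""
    cookies: Dict[str, str] = {}
    token_url = source.get("token_url")
    if token_url:
        # Preliminary request: its only purpose is to collect cookies
        # (for instance a language choice) before the real search.
        _, new_cookies = fetch(token_url, cookies)
        cookies.update(new_cookies)
    search_url = source["search_url"].format(query=quote_plus(query))
    body, _ = fetch(search_url, cookies)
    return body
```

Passing the fetcher as a parameter keeps the sketch testable: a fake fetch function can record the calls and check that the cookie from the first request is sent with the second one.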

2.2. date_locale

A new special value is now recognized for the date_locale source definition entry: browser.

This means that the dates of this source are displayed using the locale of the user’s browser… so they change from one user to another.

Corriere della Sera behaves like that.
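In other words, the locale used to parse the dates of such a source has to be resolved at search time. A minimal Python sketch of that resolution follows; the function name and the "en" fallback are assumptions, only the date_locale entry and its browser value come from this release.

```python
def effective_date_locale(source: dict, browser_locale: str, default: str = "en") -> str:
    """Return the locale to use for parsing the dates of a source."""
    declared = source.get("date_locale", default)
    # The special value "browser" means: the site formats its dates
    # with the locale of the visitor's browser, so follow that one.
    return browser_locale if declared == "browser" else declared


if __name__ == "__main__":
    print(effective_date_locale({"date_locale": "browser"}, "it-IT"))
    print(effective_date_locale({"date_locale": "fr"}, "it-IT"))
```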

3. Source list pagination bug

In the list of all the sources, you can pick sources one by one. But the pagination was reset at each selection, making this feature hard to work with. In this new release, you’ll stay on your page while selecting everything you want.

4. Fun facts

I noticed that the Monde Diplomatique in Esperanto now serves its dates in Esperanto. It was a pleasant update to make.

Less obvious, El Watan (a famous Algerian newspaper) changed its domain after 30 years of existence, creating a lot of 404 links around the world… It’s now

We were quite surprised to find out that Mediabask produces articles in several languages (fr, es, eus) but doesn’t let you choose which language you want to search in! Results in every language are served each time.

Special mention for the Otago Daily Times, whose domain looks more like a wrongly named LibreOffice Text document than a big newspaper:

And to finish, El País (Spanish reference press outlet) can find you results for:

  • bbbb (it’s a 404)

  • cccc (it’s also a 404)

  • ffff (and it exists)

  • vvvv (and it exists)

  • xxxx (and it exists)

  • zzzz (and it exists)

All these fun facts were pushed to Mastodon with the #metapress hashtag.

4.1. Hacking La Charente Libre for its dates

La Charente Libre is an important regional newspaper in the south-west of France. It offers no date on its results, but many people were interested in searching through it via

The web interface offers no way to sort results by date or so… but if you think of adding &sort=date at the end of a search URL, TADA, dates are added to the results when the web page reloads.

La Charente Libre seems to be using a custom web application, but this trick could be tried for every source missing dates on results that are listed here.

Don’t hesitate to report to me where it works!
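To try the trick on other sources without editing URLs by hand, a small Python helper like the following could be used. It is only a sketch: the function name and the example URL are made up, and nothing guarantees that a given site understands the sort parameter.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_sort_by_date(search_url: str) -> str:
    """Return the search URL with a sort=date query parameter set."""
    parts = urlsplit(search_url)
    # Drop any existing "sort" parameter, then append sort=date at the end.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "sort"
    ]
    params.append(("sort", "date"))
    return urlunsplit(parts._replace(query=urlencode(params)))


if __name__ == "__main__":
    # Hypothetical search URL, for illustration only.
    print(with_sort_by_date("https://example.org/search?q=angouleme"))
```

Going through the URL parser rather than plain string concatenation avoids ending up with two sort parameters, or with a stray & when the URL has no query string yet.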

5. Fix the world

You will also find on Mastodon a dozen opportunities to help without coding, by contacting the sources I mentioned and asking them to fix their problems, for instance:

You can get the full list here : #fixtheworld