Decentralized search engine & automated press reviews


1. Press review

Since 2008, I have been one of the independent citizens working in the same direction as La Quadrature du Net… After a few translations, I willingly took on a demanding task: the press review. It let me feel useful while I waited to gain a better understanding of the complex political machinery of the French and European parliaments.

To begin to understand any of it, I knew I had to read a lot of press articles on the subject, and the press review gave me a choice selection, a few years before my own choices came to make up that selection…

The press review requires a lot of work and constant dedication to stay up to date. A small part of the job was interesting: reading the articles, choosing the ones to keep, extracting the good lines… The other part was bigger and boring: copy/paste the title, copy/paste the date (and damn, where did they hide the date on this page!?), copy/paste the extract, copy/paste the link… This robot-like task unfortunately takes more time than reading the article itself.

For more than 5 years, I devoted most of my spare time to keeping the press review of La Quadrature du Net up to date. Thousands of articles quoted the association's spokesperson [1], sometimes with a simple copy/paste, sometimes with a great analysis, sometimes willfully distorting the words… And we had to look for more! As Jérémie Zimmermann told the small team I was struggling to motivate one night: we were his eyes.

I also had the feeling of being the broom wagon of the collective, always late to the feast. The position was nevertheless a strategic one: it was the quickest way to learn, and it mattered because we had to bring our complex and abstract topics to a broad audience. Indeed, net neutrality does not obviously appear to be a channel for freedom of speech, which itself does not obviously appear to be one of the conditions for a fair society.

We also had to amplify our echo in the press, and with La Quadrature du Net, things went even further… Indeed, beyond rational thinking, the press review also let us shed some light on a human factor in journalism: sometimes a good analysis citing La Quadrature was missing from its press review, and at the next interview the journalist would complain about it, grumpy-faced, disappointed that we had missed the piece or had not selected it.

2. Dedicated search engine

The task was colossal. Within the team, we split up the national and international sources to poll every day. I could always find hours of work for new volunteers. Torn between the need to rise to the challenge and the need to stay independent and consistent with our views, we finally set up some Google News alerts during a shortage of activist manpower.

I looked for alternatives for a while, but found nothing that was both accessible and libre. And how could we compete with the indexing power of one of the biggest web search engines? How could we do without their index? My grandmother already knew the answer; she always told me: « Better to ask 1 who knows than 10 who seek. » [2]

To get our answers without relying on a central oracle, we can simply ask those who know. And here, who knows which articles were published today, if not the newspaper publishers themselves? Let's unite, in a single request, the indexes of the newspapers, to get a distributed search engine! Everyone indexes their own content, and the postmark serves as proof of the date.

Indeed, what determines the relevance of press results is mainly their date. And in this respect, Google News does not add much value in its pages beyond chronological order. Putting results in a proper order is exactly what an ordinateur (the French word for computer, literally a machine that puts things in order) does best. At least French ones did.
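As a rough illustration of this idea, here is a minimal sketch of merging the results of several newspapers into one list ordered by publication date. The result shape (`title`, `url`, `date`), the `mergeByDate` name and the example domains are assumptions made for this sketch, not the actual code of the tool.

```javascript
// Hypothetical result shape: { title, url, date } where date is an ISO 8601 string.
// resultsBySource maps a newspaper name to the results from its own index.
function mergeByDate(resultsBySource) {
  const merged = [];
  for (const [source, results] of Object.entries(resultsBySource)) {
    for (const result of results) {
      // Remember which newspaper each result came from.
      merged.push({ ...result, source });
    }
  }
  // Newest first: the publication date is the only ranking signal used here.
  merged.sort((a, b) => Date.parse(b.date) - Date.parse(a.date));
  return merged;
}

// Made-up sample data standing in for two newspaper indexes.
const sample = {
  'paper-a.example': [
    { title: 'A1', url: 'https://paper-a.example/1', date: '2017-12-14T08:00:00Z' },
    { title: 'A2', url: 'https://paper-a.example/2', date: '2017-12-16T09:30:00Z' },
  ],
  'paper-b.example': [
    { title: 'B1', url: 'https://paper-b.example/1', date: '2017-12-15T12:00:00Z' },
  ],
};

const ordered = mergeByDate(sample);
console.log(ordered.map((r) => `${r.date} ${r.source} ${r.title}`).join('\n'));
```

No relevance score is computed: each newspaper is trusted for the date it gives to its own articles, and the merge only has to sort.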

3. Material and technical considerations, perseverance

I started a first prototype at the end of 2013. It was too slow, and the browser did not allow the requests to be launched from the user's computer… I soon had other rockets to code in JavaScript, and then too many distractions kept me from digging further into the subject. But I kept the idea in a corner of my mind all that time, as Benjamin Bayart once told me, looking me right in the eyes, that circumventing centralized web search engines was one of the major challenges for free software.

Until this summer, when a friend (Taziden) came to me on IRC with a question about a small piece of "JavaScript" to handle his playlists… I, the so-called JavaScript expert being asked for help, stared wide-eyed: I could not recognize the syntactic structures introduced into the language by the latest major update of the standard. After 20 years of immobility and a short decade of disparate moves by web browsers, the new ECMAScript 5/6/7 offered a lot of new features, which I had the time to explore, as I had become a partner in my company in the meantime.

With the arrival of Firefox, a decade ago, the web once again became a field of innovation. JavaScript was one of the features on which web browsers (such as Chrome and IE7-8-9…) competed, both in capabilities and in speed of execution. From a simple form-checking tool, JavaScript became a full-featured programming language, and saw the biggest speed-up of this decade. Several companies even tried to build a full operating system upon it (such as Palm with its WebOS, or Mozilla with its FirefoxOS).

In this context, it is easier to understand why Mozilla decided to build its new add-on platform on JavaScript, even if that meant abandoning previous work. Because yes, nowadays a browser WebExtension for Firefox, Chrome or Opera can boil down to a simple JavaScript file. But JavaScript endowed with super-powers. Since installing an extension is considered strong approval from the user, it even becomes possible to reach the content of iframes, and so to launch the meta-search engine's requests from the user's web browser, instead of from a single web server centralizing the usage.

3 months later, a first prototype was able to discover millions of results in less than 10 seconds, and to actually bring hundreds of them directly to the user's computer: the 10 latest from each of the 30 newspapers queried, from 24 countries. A world tour of the English-speaking press, in one click.
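To give an idea of the mechanics, here is a minimal sketch of such a fan-out: every source is queried at the same time, the 10 latest results of each are kept, and a source that fails is simply skipped. Everything named here (`metaSearch`, `fakeSource`, the source shape, the example domains) is hypothetical, and the fake sources stand in for whatever real request the extension sends to each newspaper.

```javascript
// Hypothetical source shape: { name, search(query) -> Promise<Array<{ title, date }>> }.
async function metaSearch(sources, query, perSource = 10) {
  // Launch all the requests concurrently; allSettled never rejects,
  // so one unreachable newspaper does not break the whole search.
  const outcomes = await Promise.allSettled(
    sources.map(async (source) => {
      const results = await source.search(query);
      return results
        .map((r) => ({ ...r, source: source.name }))
        .sort((a, b) => Date.parse(b.date) - Date.parse(a.date))
        .slice(0, perSource); // keep only the latest results of this source
    })
  );
  return outcomes
    .filter((o) => o.status === 'fulfilled')
    .flatMap((o) => o.value)
    .sort((a, b) => Date.parse(b.date) - Date.parse(a.date));
}

// Fake source for the demo: returns `count` results dated on consecutive days.
function fakeSource(name, count) {
  return {
    name,
    search: async (query) =>
      Array.from({ length: count }, (_, i) => ({
        title: `${name} on ${query} #${i}`,
        date: new Date(Date.UTC(2017, 11, 1 + i)).toISOString(),
      })),
  };
}

const demoSources = [
  fakeSource('paper-a.example', 15),
  fakeSource('paper-b.example', 3),
  // A source that always fails, to show that failures are skipped.
  { name: 'down.example', search: async () => { throw new Error('unreachable'); } },
];

metaSearch(demoSources, 'net neutrality').then((results) => {
  console.log(`${results.length} results, latest: ${results[0].title}`);
});
```

Since the requests run concurrently, the total waiting time is close to that of the slowest source that answers, not the sum of all of them.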

Here is a screenshot of a recent and better-looking version of the tool.

Figure 1. Screenshot of the tool (20171216).

1. I remember my first New York Times. I discovered this American newspaper in a Finnish hotel lobby, during a weekend. Across several columns of the front page I found an interview with the spokesperson! Photo, email, markup, transcription… a little later that night the interview was online in our press review. I was not sleepy anyway.
2. It also works with: « there are 10 kinds of people: those who know (how to count in binary), and the others… »