Decentralized search engine & automatized press reviews

Version 1.8.7: Mobilizon, source testing, end-to-end testing. The project slowly gains maturity, and for this new version the focus was on testing:

  • testing of all the sources

  • testing of all the features

But that’s not all: quite a few bugs were fixed, and, following discussions with the Code Lutin crew, whom I met in Nantes earlier this month, a new source was contributed: Mobilizon!

You can find it under the Agenda source type. This first instance opens the door to the 88 known Mobilizon instances, which will soon join the 27 Demosphere instances, the, while we still wait for the to implement a full-text search.

Users reported that v1.8.8 opened the Welcome tab at every Firefox start… It’s fixed with version (and yes, v1.8.8 is in fact v1.8.7, because of a numbering mistake…).

1. Source testing

A source testing feature already existed, written by Christopher Gauthier. It could verify that all the sources were still reachable and returning results. We tried many approaches for this first version, and it was slow: around 600 seconds for 300 sources.

The new version keeps most of the presentation improvements Christopher made over the first iteration, but I simplified many of the operations. Roughly, I removed 1,000 lines of code from the main page’s JavaScript and moved the feature into separate files, keeping around 300 lines.

Previously, some Firefox settings had to be changed (to allow it to open one tab per tested source). Now it’s just a page to visit and a button to click. All the sources are tested at once, in the same tab, with the same word, because there is one magic word that at least 75% of the sources respond to… and it’s « Europe ». (Yes, some newspapers did not mention COVID at all; and yes, I have 25% of the sources left to fix.)
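As a rough illustration of the « one page, one word, all sources at once » approach, here is a minimal Python sketch. The extension does this in JavaScript inside its test page; here the per-source search is a hypothetical `query` coroutine (faked below), and `check_all_sources` and the three status labels are names made up for the example, not Meta-Press.es code.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

# Hypothetical signature: a coroutine that runs one source's search for
# `term` and returns how many results it parsed.
QueryFn = Callable[[str, str], Awaitable[int]]


async def check_all_sources(sources: List[str], query: QueryFn,
                            term: str = "Europe") -> Dict[str, str]:
    """Query every source at once with the same term and classify each one."""

    async def check(source: str) -> str:
        try:
            count = await query(source, term)
        except Exception as exc:  # unreachable host, parsing failure, ...
            return f"error: {exc}"
        return "ok" if count > 0 else "no results"

    # gather() runs all checks concurrently and keeps the input order.
    statuses = await asyncio.gather(*(check(s) for s in sources))
    return dict(zip(sources, statuses))


# Usage with a fake query function standing in for real HTTP requests.
async def fake_query(source: str, term: str) -> int:
    if source == "down.example":
        raise ConnectionError("timeout")
    return {"a.example": 12, "b.example": 0}[source]


matrix = asyncio.run(
    check_all_sources(["a.example", "b.example", "down.example"], fake_query))
print(matrix)
# {'a.example': 'ok', 'b.example': 'no results', 'down.example': 'error: timeout'}
```

A source answering with zero results is reported separately from a source that fails outright, which is the distinction that matters when about a quarter of the sources do not know the magic word.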

This new version of the test procedure runs in 1'30" with a recent Firefox (it’s 3 times slower with the ASan build of Firefox).

I tested various browsers with no noticeable difference in speed. Firefox is regularly the fastest WebExtension-supporting browser in the world, according to this test.

One funny point to finish: some groups of sources are hosted on the same web server. It can be different language editions of the same newspaper (Euronews), a group of local newspapers (Dauphiné libéré and its numerous offshoots), or even just the image and video versions of the same source (ANSA). And they fail when queried all at once by the source testing procedure. I guess that when their common server sees 10 requests from the same IP within a few milliseconds, it drops the connections.

Fortunately, Firefox introduced a DNS handling API for WebExtensions about a year ago. So the sources’ IPs are collected before sending the queries, and a delay is added between sources that share an IP. This improves the overall behavior, and this need would have been hard to spot without this test procedure.
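The grouping logic can be sketched as follows, assuming each source is identified by its hostname and that its IP has already been resolved (in Firefox, through the `browser.dns.resolve()` WebExtension API). This is a Python model of the idea, not the extension’s code: `schedule_by_ip`, the `resolve` callable and the 500 ms step are hypothetical choices.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical gap between two queries aimed at the same IP, in ms.
SAME_IP_DELAY_MS = 500


def schedule_by_ip(hosts: List[str], resolve: Callable[[str], str],
                   step_ms: int = SAME_IP_DELAY_MS) -> Dict[str, int]:
    """Return a start delay (ms) for each source hostname.

    A host alone on its IP starts immediately; the n-th host (0-based)
    sharing an IP waits n * step_ms, so a shared server never sees a
    burst of simultaneous requests from the test page.
    """
    seen_per_ip: Dict[str, int] = defaultdict(int)
    delays: Dict[str, int] = {}
    for host in hosts:
        ip = resolve(host)  # in the extension: a DNS lookup done beforehand
        delays[host] = seen_per_ip[ip] * step_ms
        seen_per_ip[ip] += 1
    return delays


# Usage with a fake resolver (documentation IP addresses, made-up hosts).
fake_dns = {
    "en.news.example": "192.0.2.10",
    "fr.news.example": "192.0.2.10",
    "de.news.example": "192.0.2.10",
    "paper.example": "192.0.2.20",
}
demo_delays = schedule_by_ip(list(fake_dns), fake_dns.__getitem__)
print(demo_delays)
# {'en.news.example': 0, 'fr.news.example': 500, 'de.news.example': 1000, 'paper.example': 0}
```

Only sources that actually share a server are slowed down; everything else still starts at once, which keeps the total run time close to that of the fully parallel version.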

Bad web browsers (all the others) don’t support DNS handling, so with them I’m left with shuffling the source list and adding a static delay between sources. To be honest, since looking up a domain’s IP takes around 50 ms, the results are comparable…
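For browsers without a DNS API, the shuffle-plus-static-delay fallback can be sketched like this. Again a hypothetical Python model: the function name and the 50 ms step (picked to match the lookup time mentioned above) are assumptions, not values taken from the extension.

```python
import random
from typing import Dict, List, Optional

# Hypothetical fixed gap between two consecutive queries, in ms.
STATIC_DELAY_MS = 50


def schedule_shuffled(hosts: List[str], step_ms: int = STATIC_DELAY_MS,
                      rng: Optional[random.Random] = None) -> Dict[str, int]:
    """Fallback without a DNS API: shuffle the hosts, then space them evenly.

    Sources sharing a server are often listed next to each other
    (e.g. language editions of one newspaper); shuffling spreads them apart,
    and the static delay keeps any two queries at least step_ms apart.
    """
    rng = rng or random.Random()
    order = list(hosts)  # copy: don't reorder the caller's list
    rng.shuffle(order)
    return {host: i * step_ms for i, host in enumerate(order)}


demo = schedule_shuffled(["a.example", "b.example", "c.example", "d.example"],
                         rng=random.Random(42))
print(sorted(demo.values()))  # [0, 50, 100, 150], whatever the shuffle
```

Unlike the DNS-based version, every source is delayed here, so the whole run grows linearly with the number of sources.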

But regarding this core need, Firefox also shines worldwide.

Figure 1. Capture of the source testing result matrix (20220930).

2. End-to-end testing

I should have done it earlier, as it would have avoided quite a few regressions… (for instance in the one-word / many-words tag auto-selection, reported on Mastodon and fixed in this release).

I chose Selenium over Karma to be able to click on JavaScript alerts, and over Puppeteer to be able to run the tests in several browsers.

I used a bit of Python to drive Selenium (which is written in Java), and needed some JavaScript too.

I only laid the foundations of what will still be a long journey to 100% test coverage. But we’re heading there, and we already have 100% coverage for internal image loading, internal link loading and static external link loading. It looks basic, and everyone needs it, but it’s not easy to achieve. To begin with, it required external dependencies (Selenium, Python-Selenium, Geckodriver…) and a combination of Python and JavaScript code.

Then I was left with one question:

Why is #Selenium so famous?
Ok, it can open a headless Firefox and execute some JavaScript in it.
But it can't get you JavaScript exceptions back.
It can't get you the HTTP status.
It can't get you span text if an alert is open.
It can't click on a browser permission popup.
What are we left with? Be water.

To get reliable behavior from Selenium, you need to wait until all the JavaScript of your web page has run, and Selenium has no way of knowing when your own scripts are done. I got two methods working for this: having your page’s JavaScript set a CSS class on the body tag (for instance) once it finishes loading, or having it set a variable on the window object; then you wait for either one from the Selenium side.
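Here is a minimal sketch of the Selenium side of either method. It assumes the page’s own JavaScript adds a `loaded` class to `<body>` or sets `window.pageReady = true` when it is done; both marker names, and `wait_until_ready`, are hypothetical. Only `driver.execute_script()` from Selenium’s Python bindings is used.

```python
import time

# Hypothetical readiness markers, assumed to be set by the page's own
# JavaScript once it has finished initializing:
#   method 1: document.body.classList.add("loaded")
#   method 2: window.pageReady = true
READY_SCRIPT = (
    "return (document.body !== null"
    " && document.body.classList.contains('loaded'))"
    " || window.pageReady === true;"
)


def wait_until_ready(driver, timeout: float = 10.0, poll: float = 0.1) -> None:
    """Poll the page until one of the readiness markers is set.

    `driver` is a Selenium WebDriver; only execute_script() is used.
    Raises TimeoutError if the page never signals it is ready.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if driver.execute_script(READY_SCRIPT):
            return
        time.sleep(poll)
    raise TimeoutError(f"page not ready after {timeout} s")
```

Selenium’s own `WebDriverWait(driver, 10).until(lambda d: d.execute_script(READY_SCRIPT))` (from `selenium.webdriver.support.ui`) performs the same polling; the hand-written loop only keeps the sketch free of dependencies.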

Next, Selenium can’t tell me whether an image has loaded. So I had to figure out how to tell using JavaScript code, injected or called from Selenium.

Here are two ways to find out whether an image has loaded:

  • does the image have a non-zero .naturalWidth property?

  • does a call to the image’s .decode() succeed?
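Both checks can be injected from the Python side. In this sketch, assuming a Selenium Python `driver` already pointed at the page under test, the first script relies on `complete` and `.naturalWidth`, the second on the promise returned by `.decode()`, which rejects when the image cannot be loaded or decoded. `broken_images` and the script constant names are hypothetical.

```python
from typing import List

# Method 1, run with execute_script(): an image counts as loaded when it is
# complete and has a non-zero intrinsic width.
BROKEN_BY_WIDTH_JS = """
return Array.from(document.images)
  .filter(img => !(img.complete && img.naturalWidth > 0))
  .map(img => img.currentSrc || img.src);
"""

# Method 2, run with execute_async_script(): HTMLImageElement.decode()
# returns a promise that rejects when the image can't be loaded or decoded.
# Selenium passes a completion callback as the last argument.
BROKEN_BY_DECODE_JS = """
const done = arguments[arguments.length - 1];
Promise.all(Array.from(document.images).map(img =>
    img.decode().then(() => null, () => img.currentSrc || img.src)))
  .then(results => done(results.filter(src => src !== null)));
"""


def broken_images(driver, use_decode: bool = False) -> List[str]:
    """Return the source URL of every image of the current page that failed."""
    if use_decode:
        return driver.execute_async_script(BROKEN_BY_DECODE_JS)
    return driver.execute_script(BROKEN_BY_WIDTH_JS)
```

The first check is synchronous and only meaningful once the page has finished loading; the second waits for each image by itself, at the cost of an asynchronous script.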

The good thing is: you don’t need a side request to get this information.

But you do need side requests to tell whether links are broken.

As I ranted in the Mastodon toot, Selenium won’t let you access the HTTP status code of a loaded page.

So let’s try with JavaScript. It works great for the WebExtension’s internal links, but requests to external links are blocked by the CORS policy. So those external links have to be tested from the Python driver script, with a side request that Selenium knows nothing about (and you have to handle the error reporting yourself to keep it homogeneous).
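One way to wire this split, sketched under assumptions: internal `moz-extension://` links are fetched from inside the page via `execute_async_script()`, external ones with a plain `urllib` HEAD request from Python, and both end up in the same `url -> status` report. All function and constant names are hypothetical, and some servers answer HEAD requests badly, so a GET fallback may be needed in practice.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable

# Internal links: fetched from inside the extension page, where CORS does
# not block them. Run through execute_async_script(), which appends a
# completion callback as the last argument; arguments[0] is the URL.
INTERNAL_STATUS_JS = """
const done = arguments[arguments.length - 1];
fetch(arguments[0]).then(r => done(String(r.status)), e => done('error: ' + e));
"""


def external_status(url: str, timeout: float = 10.0) -> str:
    """Status of an external link, fetched by Python outside the browser
    (a side request Selenium knows nothing about; CORS doesn't apply)."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return str(resp.status)
    except urllib.error.HTTPError as err:  # 4xx / 5xx answers
        return str(err.code)
    except OSError as err:  # URLError, refused connection, timeout, ...
        return f"error: {err}"


def check_links(driver, internal: Iterable[str], external: Iterable[str],
                fetch_external: Callable[[str], str] = external_status
                ) -> Dict[str, str]:
    """Build one homogeneous report, url -> status string, for both kinds."""
    report = {url: driver.execute_async_script(INTERNAL_STATUS_JS, url)
              for url in internal}
    report.update({url: fetch_external(url) for url in external})
    return report


def broken_links(report: Dict[str, str]) -> Dict[str, str]:
    """Keep only the entries that are not a 2xx/3xx status."""
    return {url: status for url, status in report.items()
            if not (status.isdigit() and 200 <= int(status) < 400)}
```

Because both paths return plain strings, a single `broken_links()` pass can report internal and external failures the same way.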

I’ll skip over the genuine bugs you’ll discover along the way… (that’s exhaustive testing for you).

And eventually it runs and reports no errors: so much emotion in front of this lovely new kind of "nothing".

It will be a long journey to write the tests for all the settings, one by one…