I work extensively with pyppeteer, a Python library that drives a headless Chrome browser under the hood. Because it runs a real browser, it lets you parse virtually any kind of site.
I tend to use it for every JavaScript-based or WebSocket-based site that I parse.
Long story short, at some point my clients started to complain about nasty errors that in most cases broke the parsing process completely, even with all the smart error-handling functions I had written. Adding another level of try/except made the browser restart constantly, but at least it let the parsing work again.
Now, after spending several days on this in total, I have found the reason and... not a solution yet, but a workaround that works.
The full developer discussion is available here: https://github.com/miyakogi/pyppeteer/pull/160. The summary is that this is a Chromium bug where the browser is treated as closed after 20 seconds, and for now all we can do is patch an internal method of the library from our own script to work around it.
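So just add the following lines to your code before instantiating the browser. The patch forces ping_interval and ping_timeout to None on the websocket connection that pyppeteer opens to the browser, which disables the keepalive pings. I call it at the very beginning of the script: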
def patch_pyppeteer():
    import pyppeteer.connection

    # Keep a reference to the original connect function
    original_method = pyppeteer.connection.websockets.client.connect

    def new_method(*args, **kwargs):
        # Disable websocket keepalive pings and their timeout
        kwargs['ping_interval'] = None
        kwargs['ping_timeout'] = None
        return original_method(*args, **kwargs)

    # Replace the connect function that pyppeteer uses
    pyppeteer.connection.websockets.client.connect = new_method

patch_pyppeteer()
That's it, now my program runs smoothly! Thanks to the guys from the forum.
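If you want to see what the patch actually does without launching a browser, here is a minimal sketch of the same wrapping trick applied to a stand-in. Note that fake_connect, fake_client and patch_connect are hypothetical names I made up for the illustration, not part of pyppeteer or websockets; the stand-in only mimics the 20-second ping defaults of the websockets library.

```python
import types


# Hypothetical stand-in for websockets.client.connect: it just reports
# the keepalive settings it was called with (defaults mimic websockets).
def fake_connect(uri, ping_interval=20, ping_timeout=20, **kwargs):
    return {"uri": uri, "ping_interval": ping_interval, "ping_timeout": ping_timeout}


# Hypothetical stand-in for the module namespace that holds connect.
fake_client = types.SimpleNamespace(connect=fake_connect)


def patch_connect(namespace):
    """Wrap namespace.connect so both ping settings are always None."""
    original = namespace.connect

    def patched(*args, **kwargs):
        # Overwrite whatever the caller passed, exactly like the real patch.
        kwargs["ping_interval"] = None
        kwargs["ping_timeout"] = None
        return original(*args, **kwargs)

    namespace.connect = patched


before = fake_client.connect("ws://localhost:9222")
patch_connect(fake_client)
# Even an explicit ping_interval from the caller is replaced by None.
after = fake_client.connect("ws://localhost:9222", ping_interval=5)

print(before)
print(after)
```

The point of the sketch is that the wrapper overrides the keyword arguments on every call, so it does not matter what the calling code passes in. That is also why the real patch has to run before pyppeteer opens its connection: any connection made before the swap still goes through the original function.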