Generic Webcrawler

By silviodc, 3 June 2021

Do you sometimes find yourself returning to a webpage again and again to check whether an update has finally appeared?

While some websites let you sign up for a notification when a product is available again, most sites do not offer this service.
In my particular case, I was looking for a permanent parking spot in a large garage around the corner.

The garage belongs to a bigger chain that is unlikely to change its website, which makes it the ideal “target”.

I have set up a cron job on the amazing free site cron-job.org, which calls a PHP script on my server once a day. You can also configure more frequent calls and specify the time or day quite flexibly.

The call passes along which website to search and which search term to look for, for example:
genericcrawler/?searchurl=www.parkhaus.de&searchterm=dauerparker&sentTo=me@myself.com
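Values such as an email address or a URL with its own query string contain characters that should be URL-encoded before they go into the call. A minimal sketch of how the call could be assembled, assuming a hypothetical helper named buildCrawlerUrl (not part of the original script):

```php
<?php
// Hypothetical helper: builds the URL that the cron service calls.
// http_build_query() URL-encodes each value, so characters like "@" are safe.
function buildCrawlerUrl(string $base, string $searchUrl, string $searchTerm, string $sendTo): string
{
    $query = http_build_query([
        'searchurl'  => $searchUrl,
        'searchterm' => $searchTerm,
        'sentTo'     => $sendTo,
    ]);

    return $base . '?' . $query;
}

echo buildCrawlerUrl('genericcrawler/', 'www.parkhaus.de', 'dauerparker', 'me@myself.com'), PHP_EOL;
```

PHP decodes the values again automatically when it fills $_GET, so the script itself does not need to change.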

In the script we simply read the GET parameters and initialize our variables:

// Fall back to '404' when a parameter is missing from the call
$searchterm = $_GET['searchterm'] ?? '404';
$url = $_GET['searchurl'] ?? '404';
$sentTo = $_GET['sentTo'] ?? '404';
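The '404' fallback doubles as a marker for a missing parameter. A minimal sketch of how the check for the two required parameters could look, assuming a hypothetical helper named readParams that takes the parameter array as an argument (the original script may do this differently):

```php
<?php
// Hypothetical helper: returns the parameters, or null if a required one is missing.
function readParams(array $get): ?array
{
    $searchterm = $get['searchterm'] ?? '404';
    $url        = $get['searchurl'] ?? '404';
    $sentTo     = $get['sentTo'] ?? '404';

    // The search term and the URL are required; without them there is nothing to do.
    if ($searchterm === '404' || $url === '404') {
        return null;
    }

    return ['searchterm' => $searchterm, 'url' => $url, 'sentTo' => $sentTo];
}

// Example with a hand-written array; in the real script you would pass $_GET.
var_dump(readParams(['searchterm' => 'dauerparker']) === null);
```

Passing the array in, instead of reading $_GET inside the function, makes the check easy to try out from the command line.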

If neither of the first two parameters is missing, the PHP script fetches the entire HTML content of the specified website via

$html = file_get_contents($url);
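Two details are worth keeping in mind here. file_get_contents() only fetches over the network when the string starts with a scheme such as https:// (and when allow_url_fopen is enabled); a bare host name like www.parkhaus.de is treated as a local file path. It also returns false when the request fails. A minimal sketch that handles both, assuming hypothetical helpers normalizeUrl and fetchHtml and assuming https is the right default scheme:

```php
<?php
// Hypothetical helper: prepends "https://" when the URL has no http(s) scheme.
// Defaulting to https is an assumption; some older sites may only serve http.
function normalizeUrl(string $url): string
{
    if (preg_match('#^https?://#i', $url) === 1) {
        return $url;
    }

    return 'https://' . $url;
}

// Hypothetical helper: returns the page content, or null if it could not be read.
function fetchHtml(string $url): ?string
{
    // "@" suppresses the PHP warning; the failure is reported via the return value.
    $html = @file_get_contents($url);

    return $html === false ? null : $html;
}
```

In the script this would be used as $html = fetchHtml(normalizeUrl($url)); followed by an early exit when $html is null.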


and searches for the position of the searchterm:

$pos = strpos($html, $searchterm);

If it does not exist, $pos will be false and the script ends.
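One pitfall with strpos(): a match at the very beginning of the content has position 0, which PHP treats as false in a loose comparison. A minimal sketch of a strict check, assuming a hypothetical helper named containsTerm:

```php
<?php
// Hypothetical helper: true if $searchterm occurs anywhere in $html.
function containsTerm(string $html, string $searchterm): bool
{
    // Compare with !== false so that a match at position 0 still counts as found.
    return strpos($html, $searchterm) !== false;
}

var_dump(containsTerm('dauerparker available', 'dauerparker')); // match at position 0
var_dump(containsTerm('<p>sold out</p>', 'dauerparker'));       // no match
```

If upper and lower case should not matter, stripos() can be swapped in with the same arguments.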
If it finds the search term, the script sends an email with the happy news to the stated email address.
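The notification step could be sketched as follows. The helper names, the subject line, and the message text are assumptions for illustration; only PHP's built-in mail() and filter_var() are real functions, and mail() needs a working mail setup on the server.

```php
<?php
// Hypothetical helper: rejects the '404' fallback and other invalid addresses.
function isValidRecipient(string $sentTo): bool
{
    return filter_var($sentTo, FILTER_VALIDATE_EMAIL) !== false;
}

// Hypothetical helper: builds the subject and body (the wording is made up).
function buildNotification(string $url, string $searchterm): array
{
    // Strip line breaks so the subject stays a single header line.
    $subject = str_replace(["\r", "\n"], '', "Found \"$searchterm\" on $url");
    $body    = "The search term \"$searchterm\" now appears on $url.";

    return [$subject, $body];
}

// Hypothetical helper: sends the mail; returns false if nothing was sent.
function notify(string $sentTo, string $url, string $searchterm): bool
{
    if (!isValidRecipient($sentTo)) {
        return false;
    }

    [$subject, $body] = buildNotification($url, $searchterm);

    return mail($sentTo, $subject, $body);
}
```

Checking the address first means a call without the sentTo parameter does not try to send mail to '404'.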

With this super short script, you will always be one of the first people to find out about new updates on not-so-modern sites, without the hassle of manually checking the website every day.