PHP Classes

PHP Image Crawler: Crawl Web site pages to find images in the pages

Ratings: Not yet rated by the users
Unique User Downloads: Total: 93, This week: 1
Download Rankings: All time: 9,876, This week: 560 (up)
Version: image-crawler 1.0.0
License: MIT/X Consortium ...
PHP version: 5
Categories: Graphics, Searching, Web services
Description

This package can crawl Web site pages to find images in the pages.

It provides a command-line script that starts a robot, which retrieves the Web page at a given URL and follows links to other pages on the same site.
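
The same-site link following described above can be sketched as below. This is a minimal illustration and not the package's code: the function name sameSiteLinks() is hypothetical, and the sketch assumes the DOM extension is available. It handles only absolute and root-relative links and ignores query strings.

```php
<?php
// Hypothetical helper, not taken from the package: return the absolute URLs
// of links in $html that stay on the same host as $baseUrl.
function sameSiteLinks(string $html, string $baseUrl): array
{
    $scheme = parse_url($baseUrl, PHP_URL_SCHEME);
    $host = parse_url($baseUrl, PHP_URL_HOST);
    if (!$scheme || !$host || trim($html) === '') {
        return [];
    }

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate invalid markup
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $parts = parse_url(trim($anchor->getAttribute('href')));
        if ($parts === false) {
            continue; // seriously malformed URL
        }
        $linkScheme = strtolower($parts['scheme'] ?? $scheme);
        if (!in_array($linkScheme, ['http', 'https'], true)) {
            continue; // mailto:, javascript:, and so on
        }
        $path = $parts['path'] ?? '';
        if (isset($parts['host'])) {
            if (strcasecmp($parts['host'], $host) !== 0) {
                continue; // the link leaves the site
            }
            $path = ($path === '') ? '/' : $path;
        } elseif (strpos($path, '/') !== 0) {
            continue; // simplification: skip fragment-only and document-relative links
        }
        $links[$linkScheme . '://' . $host . $path] = true; // array keys deduplicate
    }

    return array_keys($links);
}
```

A crawler built on such a helper would typically keep a queue of URLs still to visit and a set of URLs already visited, so that each page is fetched only once.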

The package counts the image tags it finds in the retrieved pages and saves a report to a text file.
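
As a rough illustration of the counting and reporting step, the sketch below counts img elements with the DOM extension and builds a plain-text report. The function names countImageTags() and buildReport(), and the tab-separated report layout, are assumptions made for this example; the package's own report format is not shown on this page.

```php
<?php
// Hypothetical helper: count the <img> elements in an HTML string.
function countImageTags(string $html): int
{
    if (trim($html) === '') {
        return 0;
    }

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate invalid markup
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $doc->getElementsByTagName('img')->length;
}

// Hypothetical report layout: one "URL<TAB>count" line per page, then a total.
function buildReport(array $countsByUrl): string
{
    $lines = [];
    foreach ($countsByUrl as $url => $count) {
        $lines[] = $url . "\t" . $count;
    }
    $lines[] = "Total\t" . array_sum($countsByUrl);

    return implode("\n", $lines) . "\n";
}
```

The resulting string could then be written to disk with file_put_contents(), which is what the example script on this page does with its own report object.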

Author

Name: Igor Dyshlenko (available for paid consulting)
Classes: 3 packages by Igor Dyshlenko
Country: Ukraine
Age: 53
All time rank: 384065 in Ukraine
Week rank: 411 (up), 8 in Ukraine (up)
Innovation award nominee: 1x

Example

#!/usr/bin/env php
<?php
if (!file_exists(__DIR__ . '/vendor/autoload.php')) {
    echo "\nThe crawler utility is not installed. Run \"composer install\" or \"composer update\" to install it.\n\n";
    exit(1);
}

error_reporting(E_ERROR);

include __DIR__ . '/vendor/autoload.php';

use App\Console\ArgumentHolder;
use App\ContentLoader;
use App\ImgCountHandler;
use Domain\Site;

const
    DEFAULT_TIMEOUT = 60,        // script time limit in seconds (option "t")
    DEFAULT_LEVEL = PHP_INT_MAX, // level limit passed to ImgCountHandler (option "l")
    SITE_INDEX = 0;              // index of the site URL among the parameters

$consoleArguments = new ArgumentHolder();

$url = $consoleArguments->getParameter(SITE_INDEX);

// The URL must be syntactically valid and use the http or https scheme.
// FILTER_VALIDATE_URL already requires a scheme and a host; the separate
// FILTER_FLAG_SCHEME_REQUIRED and FILTER_FLAG_HOST_REQUIRED flags were removed in PHP 8.
$parsed = is_string($url) ? parse_url($url) : false;
$paramsError = filter_var($url, FILTER_VALIDATE_URL) === false
    || $parsed === false
    || !in_array($parsed['scheme'] ?? null, ['http', 'https'], true);

if ($paramsError) {
    echo "\nIncorrect URL ", $url, "\n\nUse the utility as follows.\n\n";
}

if ($paramsError || $url === null || $consoleArguments->getOption('h') !== null) {
    // Look for help.txt next to the script rather than in the working directory.
    if ($text = file_get_contents(__DIR__ . '/help.txt')) {
        echo $text;
    } else {
        echo "File \"help.txt\" not found.\n";
    }
    exit(1);
}

$start = microtime(true);
set_time_limit($consoleArguments->getOption('t') ?? DEFAULT_TIMEOUT);

$loader = ContentLoader::getInstance();
$site = new Site($url);
$handler = new ImgCountHandler($site, $url, $loader, [], $consoleArguments->getOption('l') ?? DEFAULT_LEVEL);

// Crawl the site, then write the report to the directory given by option "d".
$report = $handler->handle($url);
$fullFilename = ($consoleArguments->getOption('d') ?? '.') . '/' . $report->getDefaultFilename();

if (file_put_contents($fullFilename, $report->getContent()) === false) {
    echo "\n\nFile ", $fullFilename, " can't be saved.\n";
    exit(1);
}

echo "\nFile ", $fullFilename, " saved.\n", sprintf("Full runtime = %.3f sec.\n", microtime(true) - $start);

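The URL check at the top of the script can be isolated into a small function, which makes it easier to test. The sketch below is an illustration only: the name isCrawlableUrl() is hypothetical and does not exist in the package. Unlike the script, it compares the scheme case-insensitively.

```php
<?php
// Hypothetical helper mirroring the script's checks: the value must be a
// syntactically valid URL whose scheme is http or https.
function isCrawlableUrl($url): bool
{
    if (!is_string($url) || filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    $scheme = parse_url($url, PHP_URL_SCHEME);

    return in_array(strtolower((string) $scheme), ['http', 'https'], true);
}
```

The second check is needed because FILTER_VALIDATE_URL also accepts other schemes such as ftp and mailto, which a Web crawler cannot follow.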

Files

File                             Role     Description
App/ (4 files, 1 directory)
Domain/ (4 files)
Infrastructure/ (1 directory)
tests/ (4 files)
composer.json                    Data     Auxiliary data
composer.lock                    Data     Auxiliary data
crawler                          Example  Example script
help.txt                         Doc.     Documentation
phpunit.xml                      Data     Auxiliary data

Files / App

File                             Role     Description
Console/ (1 file)
ContentLoader.php                Class    Class source
ContentLoaderInterface.php       Class    Class source
ImgCountHandler.php              Class    Class source
UrlFilter.php                    Class    Class source

Files / App / Console

File                             Role     Description
ArgumentHolder.php               Class    Class source

Files / Domain

File                             Role     Description
ImgCountReport.php               Class    Class source
Page.php                         Class    Class source
Report.php                       Class    Class source
Site.php                         Class    Class source

Files / Infrastructure

File                             Role     Description
Repository/ (1 file)

Files / Infrastructure / Repository

File                             Role     Description
PageRepository.php               Class    Class source

Files / tests

File                             Role     Description
ContentLoaderTest.php            Class    Class source
ImgCountHandlerTest.php          Class    Class source
SiteTest.php                     Class    Class source
UrlFilterTest.php                Class    Class source
