Stack of codes.
Please share your opinion in the comments below. We will keep improving and bring you more accurate content.

Tuesday, 19 December 2017

Scraping Data from a Website in PHP


One option is PHP Simple HTML DOM Parser. It is fast, easy to use and flexible.
It loads an entire HTML page into an object, and you can then reach any element of that page through the object's find() method, using CSS-style selectors.

Documentation: http://simplehtmldom.sourceforge.net/

For example, to list all images and links on the Google home page:

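// Include the library (simple_html_dom.php from the download above)
require_once 'simple_html_dom.php';

// Create DOM from URL or file (returns false on failure)
$html = file_get_html('http://www.google.com/');
if ($html === false) {
    die('Could not load the page');
}

// Find all images
foreach ($html->find('img') as $element) {
    echo $element->src . '<br>';
}

// Find all links
foreach ($html->find('a') as $element) {
    echo $element->href . '<br>';
}

$html->clear(); // free the memory held by the parser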
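
If you cannot install the library, PHP's own DOM extension can handle the same task. The following is a minimal sketch that swaps Simple HTML DOM for the built-in DOMDocument class, assuming the page HTML is already in a string. The function name extractImagesAndLinks and the sample markup are hypothetical:

```php
<?php
// Sketch: the same "list images and links" task using PHP's built-in DOM
// extension instead of Simple HTML DOM. The function name and the sample
// markup are hypothetical; in practice, pass in the fetched page HTML.

function extractImagesAndLinks(string $html): array
{
    $doc = new DOMDocument();
    // Real pages are rarely valid HTML; keep libxml warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $images = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $images[] = $img->getAttribute('src');
    }

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $links[] = $a->getAttribute('href');
    }

    return ['images' => $images, 'links' => $links];
}

$sample = '<html><body>'
    . '<img src="logo.png"><p><a href="/about">About</a></p>'
    . '<a href="https://example.com/">Example</a><img src="banner.jpg">'
    . '</body></html>';

$result = extractImagesAndLinks($sample);
foreach (array_merge($result['images'], $result['links']) as $url) {
    echo $url . PHP_EOL; // logo.png, banner.jpg, /about, https://example.com/
}
```

Unlike file_get_html(), DOMDocument::loadHTML() does not fetch anything. To scrape a live page, combine it with file_get_contents().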
 
 

 
Alternatively, we can use the PHPPowertools/DOM-Query library.
Documentation: https://github.com/PHPPowertools/DOM-Query

Under the hood, it uses a customized version of Masterminds/html5-php to parse an HTML5 string into a DOMDocument, and symfony/DomCrawler to convert CSS selectors into XPath selectors.
It always works on the same DOMDocument, even when one object is passed to another, to keep performance reasonable.
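
The CSS-to-XPath conversion is what lets selectors such as div.foo and p + p run against a DOMDocument. The following is a minimal sketch of that idea using PHP's built-in DOMXPath. The XPath strings are hand-written equivalents, not output copied from symfony's converter, and the sample markup is hypothetical:

```php
<?php
// Sketch: evaluate hand-written XPath equivalents of two CSS selectors with
// DOMXPath. The translations are written by hand for illustration; they are
// not the exact strings symfony's CSS-to-XPath converter produces.

$html = '<html><body>'
    . '<div class="foo bar">A</div><div class="foobar">B</div>'
    . '<p>first</p><p>second</p><p>third</p>'
    . '</body></html>';

$doc = new DOMDocument();
$previous = libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);
$xpath = new DOMXPath($doc);

// CSS "div.foo": a div whose class list contains the whole word "foo"
// (padding with spaces keeps "foobar" from matching).
$divFoo = "//div[contains(concat(' ', normalize-space(@class), ' '), ' foo ')]";

// CSS "p + p": a p whose nearest preceding element sibling is also a p.
$pPlusP = "//p[preceding-sibling::*[1][self::p]]";

// Collect the text of each matched node.
$texts = function (DOMNodeList $nodes): array {
    $out = [];
    foreach ($nodes as $node) {
        $out[] = $node->textContent;
    }
    return $out;
};

$fooTexts = $texts($xpath->query($divFoo));
$adjacentP = $texts($xpath->query($pPlusP));

echo implode(', ', $fooTexts) . PHP_EOL;   // A
echo implode(', ', $adjacentP) . PHP_EOL;  // second, third
```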


For example:
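namespace PowerTools;

// Load the library (assumes it was installed with Composer; adjust the path otherwise)
require_once __DIR__ . '/vendor/autoload.php';

// Get the page content as a string (false on failure)
$pagecontent = file_get_contents( 'http://www.4wtech.com/csp/web/Employee/Login.csp' );
if ( $pagecontent === false ) {
    die( 'Could not load the page' );
}

// Create a DOM_Query instance from the HTML string
$H = new DOM_Query( $pagecontent );

// Create a DOM_Query instance from an existing DOM_Query selection
$H = new DOM_Query( $H->select('body') );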

// Passing a string (CSS selector)
$s = $H->select( 'div.foo' );

// Passing an element object ($documentBody must be an existing DOMElement)
$s = $H->select( $documentBody );

// Passing a DOM Query object
$s = $H->select( $H->select('p + p') );

// Select the body tag
$body = $H->select('body');

// Combine different classes as one selector to get all site blocks
$siteblocks = $body->select('.site-header, .masthead, .site-body, .site-footer');

// Chain your methods just like you would with jQuery
$siteblocks->select('button')->add('span')->addClass('icon icon-printer');

// Use a lambda function to set the text of all site blocks
$siteblocks->text(function( $i, $val) {
    return $i . " - " . $val->attr('class');
});

// Append the following HTML to all site blocks
$siteblocks->append('<div class="site-center"></div>');

// Use a child selector to select the .site-center element inside the site's footer
$sitefooter = $body->select('.site-footer > .site-center');

// Set some attributes for the site's footer
$sitefooter->attr(array('id' => 'aweeesome', 'data-val' => 'see'));

// Use a lambda function to set the attributes of all site blocks
$siteblocks->attr('data-val', function( $i, $val) {
    return $i . " - " . $val->attr('class') . " - photo by Kelly Clark";
});

// Select the parent of the site's footer
$sitefooterparent = $sitefooter->parent();

// Remove the class attribute from all <i> elements within the parent of the site's footer
$sitefooterparent->select('i')->removeAttr('class');

// Wrap the site's footer in two new elements (a section containing a div)
$sitefooter->wrap('<section><div class="footer-wrapper"></div></section>');
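
To see what the jQuery-style calls above roughly amount to, here is a sketch of a few of them on a plain DOMDocument, using PHP's built-in DOM extension. This is not DOM-Query's actual implementation. The markup and the helper setAttrEach() are hypothetical, and the callback signature is inferred from the example above:

```php
<?php
// Sketch: what a few of the DOM_Query calls above roughly amount to on a plain
// DOMDocument. The markup and the helper setAttrEach() are hypothetical; this
// is not DOM-Query's actual implementation.

// Hypothetical helper in the style of attr(name, callback): the callback gets
// the element's position and the element, and returns the new attribute value.
function setAttrEach(DOMNodeList $nodes, string $name, callable $fn): void
{
    $i = 0;
    foreach ($nodes as $el) {
        $el->setAttribute($name, $fn($i, $el));
        $i++;
    }
}

$doc = new DOMDocument();
$previous = libxml_use_internal_errors(true); // libxml's HTML parser warns about HTML5 tags
$doc->loadHTML('<html><body>'
    . '<header class="site-header"></header>'
    . '<footer class="site-footer"></footer>'
    . '</body></html>');
libxml_clear_errors();
libxml_use_internal_errors($previous);
$xpath = new DOMXPath($doc);

// Roughly '.site-header, .site-footer': a union of two class tests.
$hasClass = fn(string $c) => "//*[contains(concat(' ', normalize-space(@class), ' '), ' $c ')]";
$blocks = $xpath->query($hasClass('site-header') . ' | ' . $hasClass('site-footer'));

// Roughly attr('data-val', function ($i, $val) { ... })
setAttrEach($blocks, 'data-val', fn(int $i, DOMElement $el) => $i . ' - ' . $el->getAttribute('class'));

// Roughly append('<div class="site-center"></div>'): one new child per block.
foreach ($blocks as $el) {
    $center = $doc->createElement('div');
    $center->setAttribute('class', 'site-center');
    $el->appendChild($center);
}

// Roughly wrap('<section><div class="footer-wrapper"></div></section>') on the footer:
// put the wrapper where the footer was, then move the footer inside it.
$footer = $xpath->query('//footer')->item(0);
$section = $doc->createElement('section');
$wrapper = $doc->createElement('div');
$wrapper->setAttribute('class', 'footer-wrapper');
$section->appendChild($wrapper);
$footer->parentNode->replaceChild($section, $footer);
$wrapper->appendChild($footer);

echo $doc->saveHTML($doc->getElementsByTagName('body')->item(0));
```

DOM-Query hides this bookkeeping (XPath construction, loops, node creation) behind one chained call per step.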



