
PHP DomDocument Tutorial


This is a quick tutorial that shows you how to use PHP's DOMDocument class to parse XML, so you don't have to use PHP's XML Parser functions. You'll see how to loop through an XML file and how to extract specific data from it. For this, we will use an XML file that is available on w3schools.com.

Load Document

The first thing we have to do is create an instance of the DOMDocument class.

$dom = new DOMDocument();

Now that we have an instance, we can load the document. To do that, we'll use the load method, passing the path to our XML file (here, a URL) as the argument.

$dom->load('http://www.w3schools.com/XML/simple.xml');

Now our document is loaded and we can do whatever we want with it.
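Before querying, it can help to confirm that loading actually succeeded, since load() and loadXML() return false on failure. The sketch below is one way to do that, assuming an inline XML string parsed with loadXML() (the string-based counterpart of load()) so it runs without network access; parseXml() is a hypothetical helper name. libxml_use_internal_errors() and libxml_get_errors() are standard PHP functions that collect parser errors instead of printing them as warnings.

```php
<?php
// Buffer libxml parse errors instead of emitting them as PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical helper: parse an XML string, returning null (and printing
// the libxml errors) when the document is not well-formed.
function parseXml(string $xml): ?DOMDocument
{
    $dom = new DOMDocument();
    if (!$dom->loadXML($xml)) {
        foreach (libxml_get_errors() as $error) {
            echo 'XML error: ', trim($error->message), PHP_EOL;
        }
        libxml_clear_errors();
        return null;
    }
    return $dom;
}

$good = parseXml('<breakfast_menu><food><name>French Toast</name></food></breakfast_menu>');
$bad  = parseXml('<breakfast_menu><food>'); // unclosed tags, so loading fails

echo $good->documentElement->nodeName, PHP_EOL; // breakfast_menu
var_dump($bad);                                 // NULL
```

With a file or URL you would call load() instead of loadXML(); the error handling stays the same.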

Loop Through XML

Now we will loop through our XML file. Let's say we want to print the data from the food elements. To do that, we first have to select them. We'll do that with the getElementsByTagName method, passing the name of our element (food) as the argument.

$food = $dom->getElementsByTagName('food');

If you var_dump the $food variable, you'll see that it is an instance of the DOMNodeList class. It has an item() method and a $length property, so you can loop through it with a for loop, or simply use foreach to do the job. On each iteration you get one of the matched elements as a DOMElement, on which you can run new queries or which you can modify. Here we'll just loop.
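A minimal sketch of both looping styles over a DOMNodeList. It assumes a small inline XML string shaped like the w3schools file (only a couple of food elements with their name children) instead of loading the remote URL.

```php
<?php
$dom = new DOMDocument();
$dom->loadXML(
    '<breakfast_menu>'
    . '<food><name>Belgian Waffles</name></food>'
    . '<food><name>French Toast</name></food>'
    . '</breakfast_menu>'
);

$food = $dom->getElementsByTagName('food'); // a DOMNodeList

// DOMNodeList exposes $length and item($index), so a classic for loop works.
$names = [];
for ($i = 0; $i < $food->length; $i++) {
    $elem = $food->item($i); // each item is a DOMElement
    $names[] = $elem->getElementsByTagName('name')->item(0)->nodeValue;
}

// foreach does the same job with less bookkeeping.
$viaForeach = [];
foreach ($food as $elem) {
    $viaForeach[] = $elem->getElementsByTagName('name')->item(0)->nodeValue;
}

echo $food->length, PHP_EOL;         // 2
echo implode(', ', $names), PHP_EOL; // Belgian Waffles, French Toast
```

Both loops visit the elements in document order, so the two arrays end up identical.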

<ul>
<?php
// Loop through each <food> element; $food is the DOMNodeList from above.
// htmlspecialchars() escapes each value before it is printed into the HTML.
foreach ($food as $elem) {
?>
    <li>
        <table>
            <tbody>
                <tr>
                    <td><b>Name:</b></td>
                    <td><?php echo htmlspecialchars($elem->getElementsByTagName('name')
                                        ->item(0)
                                        ->nodeValue); ?></td>
                </tr>
                <tr>
                    <td><b>Description:</b></td>
                    <td><?php echo htmlspecialchars($elem->getElementsByTagName('description')
                                        ->item(0)
                                        ->nodeValue); ?></td>
                </tr>
                <tr>
                    <td><b>Price:</b></td>
                    <td><?php echo htmlspecialchars($elem->getElementsByTagName('price')
                                        ->item(0)
                                        ->nodeValue); ?></td>
                </tr>
                <tr>
                    <td><b>Calories:</b></td>
                    <td><?php echo htmlspecialchars($elem->getElementsByTagName('calories')
                                        ->item(0)
                                        ->nodeValue); ?></td>
                </tr>
            </tbody>
        </table>
    </li>
<?php
}
?>
</ul>

As you can see, we run new queries on each element retrieved in the loop: we want the elements inside it with the tag names name, description, price and calories. Each query returns a list, so we take the first element in that list with item(0) and read its value with nodeValue. This is how we loop through our XML. The result should look something like this:

  • Name: Belgian Waffles
    Description: two of our famous Belgian Waffles with plenty of real maple syrup
    Price: $5.95
    Calories: 650
  • Name: Strawberry Belgian Waffles
    Description: light Belgian waffles covered with strawberries and whipped cream
    Price: $7.95
    Calories: 900
  • Name: Berry-Berry Belgian Waffles
    Description: light Belgian waffles covered with an assortment of fresh berries and whipped cream
    Price: $8.95
    Calories: 900
  • Name: French Toast
    Description: thick slices made from our homemade sourdough bread
    Price: $4.50
    Calories: 600
  • Name: Homestyle Breakfast
    Description: two eggs, bacon or sausage, toast, and our ever-popular hash browns
    Price: $6.95
    Calories: 950
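One thing to watch for with chains like item(0)->nodeValue: item() returns null when there is no element at that index (for example, when the query matched nothing), and chaining a method call or property read onto that null fails with a fatal error or a warning, respectively. A sketch of a guard, where firstText() is a hypothetical helper and the XML is a trimmed inline sample rather than the real w3schools file:

```php
<?php
// Hypothetical helper: text of the first <$tag> inside $parent, or null if absent.
// Checking item(0) for null avoids reading nodeValue on a missing element.
function firstText(DOMElement $parent, string $tag): ?string
{
    $node = $parent->getElementsByTagName($tag)->item(0);
    return $node === null ? null : $node->nodeValue;
}

$dom = new DOMDocument();
$dom->loadXML(
    '<breakfast_menu><food>'
    . '<name>French Toast</name><price>$4.50</price>'
    . '</food></breakfast_menu>'
);

$food = $dom->getElementsByTagName('food')->item(0);
var_dump(firstText($food, 'name'));     // string(12) "French Toast"
var_dump(firstText($food, 'calories')); // NULL
```

Inside the HTML loop, a null result could then be replaced with a placeholder such as an empty string before printing.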

Retrieve Specific Element

Let's say we want to get the value of the name element in the third food element and print it out. Since item() counts from zero, the third element is item(2). We can do it like this:

$third = $dom->getElementsByTagName('food')
             ->item(2);
echo sprintf(
    'Name of third element is: <b>%s</b>',
    $third->getElementsByTagName('name')
          ->item(0)
          ->nodeValue
);

You should get a result like this:

Name of third element is: Berry-Berry Belgian Waffles
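For comparison, the same lookup can be written with DOMXPath, which is a different technique from the getElementsByTagName() chain used above. Note the indexing difference: item() is zero-based, while XPath positions start at 1. This sketch assumes an inline XML string with the same structure (a breakfast_menu root containing repeated food elements).

```php
<?php
$dom = new DOMDocument();
$dom->loadXML(
    '<breakfast_menu>'
    . '<food><name>Belgian Waffles</name></food>'
    . '<food><name>Strawberry Belgian Waffles</name></food>'
    . '<food><name>Berry-Berry Belgian Waffles</name></food>'
    . '</breakfast_menu>'
);

// DOMNodeList::item() is zero-based: item(2) is the third <food>.
$viaItem = $dom->getElementsByTagName('food')->item(2)
               ->getElementsByTagName('name')->item(0)->nodeValue;

// XPath positions are one-based: food[3] is also the third <food>.
// Wrapping the path in string() makes evaluate() return a plain PHP string.
$xpath = new DOMXPath($dom);
$viaXPath = $xpath->evaluate('string(/breakfast_menu/food[3]/name)');

printf("Name of third element is: %s\n", $viaXPath);
```

XPath tends to pay off when the path is deep or needs conditions; for a single level, the item() chain is just as readable.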

Conclusion

DOM classes in PHP are very powerful, and I much prefer them to the XML Parser for parsing XML, because they are built in an object-oriented way and can be extended very easily. Thank you for reading.
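As an illustration of that extensibility, DOMDocument::registerNodeClass() lets a document return your own DOMElement subclass for its elements. A minimal sketch; FoodElement and its childText() method are hypothetical names chosen for this example, and the XML is a small inline sample.

```php
<?php
// Hypothetical subclass that adds a convenience method to every element.
class FoodElement extends DOMElement
{
    // Text of the first descendant element named $tag, or null if none exists.
    public function childText(string $tag): ?string
    {
        $node = $this->getElementsByTagName($tag)->item(0);
        return $node === null ? null : $node->nodeValue;
    }
}

$dom = new DOMDocument();
$dom->loadXML('<breakfast_menu><food><name>French Toast</name></food></breakfast_menu>');

// From here on, elements fetched from $dom come back as FoodElement objects.
$dom->registerNodeClass('DOMElement', 'FoodElement');

$food = $dom->getElementsByTagName('food')->item(0);
echo get_class($food), ': ', $food->childText('name'), PHP_EOL; // FoodElement: French Toast
```

With this in place, a loop like the one in this tutorial could call $elem->childText('name') directly instead of repeating the getElementsByTagName()->item(0)->nodeValue chain.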



32 comments to “PHP DomDocument Tutorial”

  1. Amit says:

    I made a link extractor script after reading your tutorial. Keep posting, dude!

  2. Dalibor says:

    parse_url splits the link into parts, but I don't see what that gets me, because the individual parts of the URL mean nothing to me in this case. parse_str parses the parts of the link into variables… but I want to pull out the "product-card-column-1" div based on the URL.
    Below is the parse_url result I got from the URL:

    https://content.it4profit.com/itshop/itemcard_cs.jsp?ITEM=110829054057608743&THEME=asbis&LANG=hr

    Array ( [scheme] => https [host] => content.it4profit.com [path] => /itshop/itemcard_cs.jsp [query] => ITEM=110829054057608743&THEME=asbis&LANG=hr ) /itshop/itemcard_cs.jsp

  3. Daniel says:

    Hi,
    I got an error like "Fatal error: Call to a member function getElementsByTagName() on a non-object in". How can I solve this?

  4. Dalibor says:

    I solved it after all.
    I just had to add $xml->PRICES->PRICE.
    I don't know why it didn't work right away, because I had tried that.

    While we're at it, do you maybe know how to cut off the top block (ASBIS logo + links) on the PRODUCT_CARD link?
    The link in the PRODUCT_CARD tag is actually the product description that I'm supposed to put into my webshop, but I don't want the wholesaler's links…
    Can DOM extract just part of that card into a variable, which I would then store in the database instead of the link they provide in the XML file?
    Thanks

  5. Dalibor says:

    Hi Marijan,
    Thanks for the tutorial.
    I have one small problem, please help.

    I'm trying to parse the XML file http://c-bit.hr/1/ASBIS/PriceAvail6.xml
    like this:

    $url = "http://c-bit.hr/1/ASBIS/PriceAvail6.xml";
    $xml = simplexml_load_file($url);

    // loop begins
    foreach($xml->PRICE as $PRICE)
    { /* block of statements */ }

    This loop works fine, but only if I first use an editor to cut the CONTENT tag out of the XML file,
    i.e. so that only "PRICES" and "PRICE" remain in it.
    I don't understand why I can't get the $PRICE variable when the file is the original one, as at the link.

    Thanks in advance

  6. Raptorak says:

    Awesome tutorial. Thanks! How would one go about parsing many pages though (for example a paginated URL to pull page 1, page 2, etc. until it returns an empty dataset)?

  7. Jerry Yurow says:

    Very helpful, Marijan. You should follow up this tutorial with others. This one covers the “Read” aspect of XML as a “database.” The other aspects are: Creating nodes, Updating (changing a node’s value), and Deleting a node. Some useful variations would be deleting all nodes with a given value other than the first one with that value. I am sure you can think of others as well.

  8. Jason says:

    Hi,

    Just a quick note to say your tutorial helped me. Thanks for posting.

    Jas

  9. haneef says:

    Working well, thanks for posting.

  10. Sreejith says:

    Thank you

  11. How do you mean “Delete HTML page”? You mean to delete elements from HTML?


Powered by Wordpress | Designed by Elegant Themes | CopyRight ©2014 php4every1.com