How to modify XPath in Helium Scraper?

Post by start_scraping » Sun Apr 20, 2014 6:06 pm

Hi,
I am trying to select a specific part of a table that unfortunately has no distinct class or id, and whose text is always changing apart from the header. I got an XPath expression working in the Scraper extension for Chrome and tried to use it in Helium Scraper, without any luck. My knowledge of XPath and Helium Scraper is fairly basic, so I'm probably missing something. The HTML looks like this:

<tbody><tr>
                                <td colspan="3" class="t-header">Studentenresidenz<span>&nbsp;&nbsp;</span></td>
                            </tr>
                            <tr>
                                <td width="60%" class="t-normal"><span>Magenta House</span> <span>Einzelzimmer mit eigenem Bad&nbsp;Kochmöglichkeit&nbsp;&nbsp;</span></td>
                                <td width="20%" class="t-normal">pro Woche</td>
                                <td width="20%" class="t-normal">295 GBP</td>
                            </tr>
                            <tr>
                                <td width="60%" class="t-normal"><span>NIDO</span> <span>Einzelzimmer mit eigenem Bad&nbsp;Kochmöglichkeit&nbsp;&nbsp;</span></td>
                                <td width="20%" class="t-normal">pro Woche</td>
                                <td width="20%" class="t-normal">355 GBP</td>
                            </tr>
                            <tr>
                                <td width="60%" class="t-normal"><span>Urbanest King's Cross</span> <span>Einzelzimmer mit eigenem Bad&nbsp;Kochmöglichkeit&nbsp;&nbsp;</span></td>
                                <td width="20%" class="t-normal">pro Woche</td>
                                <td width="20%" class="t-normal">320 GBP</td>
                            </tr>
                                    <tr>
                                        <td colspan="3" class="t-normal"><strong>Zusatzinformation</strong></td>
                                    </tr>
                                        <tr>
                                            <td class="t-normal">Hochsaisonzuschlag 21/06/2014 - 23/08/2014</td>
                                            <td class="t-normal">pro Woche</td>
                                            <td class="t-normal"><span>10 GBP</span></td>
                                        </tr>
                                        <tr>
                                            <td class="t-normal">Kaution (vor Ort zahlbar)</td>
                                            <td class="t-normal"></td>
                                            <td class="t-normal"><span>250 GBP</span></td>
                                        </tr>
                                 <tr>
                                    <td colspan="3" class="t-sub_header">&nbsp;</td>
                                 </tr>
                            <tr>
                                <td colspan="3" class="t-header">Gastfamilie<span>&nbsp;&nbsp;</span></td>
                            </tr>
                            <tr>
                                <td width="60%" class="t-normal"><span>Zone 1 + 2</span> <span>Einzelzimmer&nbsp;Halbpension&nbsp;&nbsp;</span></td>
                                <td width="20%" class="t-normal">pro Woche</td>
                                <td width="20%" class="t-normal">220 GBP</td>
                            </tr>
                            <tr>
                                <td width="60%" class="t-normal"><span>Zone 1 + 2</span> <span>Einzelzimmer mit eigenem Bad&nbsp;Halbpension&nbsp;&nbsp;</span></td>
                                <td width="20%" class="t-normal">pro Woche</td>
                                <td width="20%" class="t-normal">280 GBP</td>
                            </tr>
                            <tr>
                                <td width="60%" class="t-normal"><span>Zone 3 + 4</span> <span>Einzelzimmer&nbsp;Halbpension&nbsp;&nbsp;</span></td>
                                <td width="20%" class="t-normal">pro Woche</td>
                                <td width="20%" class="t-normal">170 GBP</td>
                            </tr>
                            <tr>
                                <td width="60%" class="t-normal"><span>Zone 3 + 4</span> <span>Doppelzimmer&nbsp;Halbpension&nbsp;&nbsp;</span></td>
                                <td width="20%" class="t-normal">pro Woche</td>
                                <td width="20%" class="t-normal">140 GBP</td>
                            </tr>
                            <tr>
                                <td width="60%" class="t-normal"><span>Zone 3 + 4</span> <span>Einzelzimmer mit eigenem Bad&nbsp;Halbpension&nbsp;&nbsp;</span></td>
                                <td width="20%" class="t-normal">pro Woche</td>
                                <td width="20%" class="t-normal">210 GBP</td>
                            </tr>
                                    <tr>
                                        <td colspan="3" class="t-normal"><strong>Zusatzinformation</strong></td>
                                    </tr>
                                        <tr>
                                            <td class="t-normal">Hochsaisonzuschlag 21/06/2014 - 23/08/2014</td>
                                            <td class="t-normal">pro Woche</td>
                                            <td class="t-normal"><span>20 GBP</span></td>
                                        </tr>
                                        <tr>
                                            <td class="t-normal">Zuschlag über Weihnachten 20/12/2014 - 27/12/2014</td>
                                            <td class="t-normal">pro Woche</td>
                                            <td class="t-normal"><span>50 GBP</span></td>
                                        </tr>
                                        <tr>
                                            <td class="t-normal">Zuschlag für spezielle Diäten</td>
                                            <td class="t-normal">pro Woche</td>
                                            <td class="t-normal"><span>30 GBP</span></td>
                                        </tr>
                                 <tr>
                                    <td colspan="3" class="t-sub_header">&nbsp;</td>
                                 </tr>

                        <tr>
                            <td colspan="3" class="t-header">Zusatzkosten</td>
                        </tr>
                            <tr>
                                <td class="t-normal">Transfer vom Flughafen London Heathrow zur Unterkunft</td>
                                <td class="t-normal">pro Weg</td>
                                <td class="t-normal"><span>100 GBP</span></td>
                            </tr>
                            <tr>
                                <td class="t-normal">Transfer vom Flughafen London Gatwick zur Unterkunft</td>
                                <td class="t-normal">pro Weg</td>
                                <td class="t-normal"><span>105 GBP</span></td>
                            </tr>
                            <tr>
                                <td class="t-normal">Transfer vom Flughafen London City zur Unterkunft</td>
                                <td class="t-normal">pro Weg</td>
                                <td class="t-normal"><span>100 GBP</span></td>
                            </tr>
                     <tr>
                        <td colspan="3" class="t-header">Anreise / Abreise</td>
                    </tr>
                        <tr>
                            <td class="t-normal" colspan="3">Anreise:&nbsp;Samstag / Sonntag (Residenz: nach 15.00 Uhr)</td> 
                        </tr>
                        <tr>
                            <td class="t-normal" colspan="3">Abreise:&nbsp;Samstag / Sonntag (Residenz: vor 10.00 Uhr)</td> 
                        </tr>
                        <tr>
                            <td class="t-normal" colspan="3">Info:&nbsp;Wenn Sie einen Cambridge Prüfungskurs buchen, überprüfen Sie bitte die aktuellen Prüfungsdaten, damit die Unterkunft entsprechend gebucht werden kann.</td> 
                        </tr>
                    </tbody>


The XPath expression I got working is the following:

//tr[(preceding-sibling::tr[contains (td, 'Zusatzinformation')] and following-sibling::tr[td[@class="t-sub_header"]])]/td[not(@width)][not(@class='t-header')][not(@colspan)]

To get the values into separate columns for each row, I set it up in the Scraper extension as follows:

//tr[(preceding-sibling::tr[contains (td, 'Zusatzinformation')] and following-sibling::tr[td[@class="t-sub_header"]])]

Column1: *[1][not(@width)][not(@class='t-header')][not(@colspan)]
Column2: *[2][not(@width)][not(@class='t-header')][not(@colspan)]
Column3: *[3][not(@width)][not(@class='t-header')][not(@colspan)]
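
For reference, the selection logic of the expression above can be sketched outside of any scraping tool. The following is only a rough illustration in Python, using the standard library's `xml.etree.ElementTree`, which does not support the `preceding-sibling` and `following-sibling` axes, so the sibling conditions are emulated by comparing row positions. The shortened `SAMPLE` markup and the helper names `cell_text` and `extra_info_rows` are made up for this sketch; none of this is Helium Scraper's API.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed copy of the table above. The &nbsp; entities are
# dropped because the stdlib XML parser does not know HTML entities.
SAMPLE = """
<tbody>
  <tr><td colspan="3" class="t-header">Studentenresidenz</td></tr>
  <tr>
    <td width="60%" class="t-normal"><span>NIDO</span></td>
    <td width="20%" class="t-normal">pro Woche</td>
    <td width="20%" class="t-normal">355 GBP</td>
  </tr>
  <tr><td colspan="3" class="t-normal"><strong>Zusatzinformation</strong></td></tr>
  <tr>
    <td class="t-normal">Hochsaisonzuschlag 21/06/2014 - 23/08/2014</td>
    <td class="t-normal">pro Woche</td>
    <td class="t-normal"><span>10 GBP</span></td>
  </tr>
  <tr>
    <td class="t-normal">Kaution (vor Ort zahlbar)</td>
    <td class="t-normal"></td>
    <td class="t-normal"><span>250 GBP</span></td>
  </tr>
  <tr><td colspan="3" class="t-sub_header"> </td></tr>
  <tr><td colspan="3" class="t-header">Zusatzkosten</td></tr>
  <tr>
    <td class="t-normal">Transfer vom Flughafen London City zur Unterkunft</td>
    <td class="t-normal">pro Weg</td>
    <td class="t-normal"><span>100 GBP</span></td>
  </tr>
</tbody>
"""


def cell_text(td):
    """Return the text of a cell including nested elements, trimmed."""
    return "".join(td.itertext()).strip()


def extra_info_rows(tbody):
    """Emulate the XPath: rows with an earlier 'Zusatzinformation' sibling
    and a later 't-sub_header' sibling, keeping only plain cells."""
    rows = tbody.findall("tr")

    # contains(td, '...') tests the string value of the FIRST td of a row.
    first_marker = next(
        (i for i, tr in enumerate(rows)
         if tr.find("td") is not None
         and "Zusatzinformation" in cell_text(tr.find("td"))),
        None,
    )
    # Position of the last row that holds a td with class t-sub_header.
    last_end = max(
        (i for i, tr in enumerate(rows)
         if any(td.get("class") == "t-sub_header" for td in tr.findall("td"))),
        default=None,
    )
    if first_marker is None or last_end is None:
        return []

    result = []
    for tr in rows[first_marker + 1:last_end]:
        # Same cell filters as in the column expressions.
        cells = [
            cell_text(td) for td in tr.findall("td")
            if td.get("width") is None
            and td.get("colspan") is None
            and td.get("class") != "t-header"
        ]
        if cells:  # rows where no cell survives the filters are skipped
            result.append(cells)
    return result


rows = extra_info_rows(ET.fromstring(SAMPLE))
```

In this sketch, the accommodation rows are dropped by the `width` filter and the heading rows by the `colspan` filter, which mirrors what the three `not(...)` predicates do in the column expressions.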


The result looks like this, which is exactly what I want:

Image

----------------------------------

I then tried putting the XPath expression into the JS_XPath field of a Kind, but nothing happens.

How do I have to enter this XPath expression to make it work in Helium Scraper?
Or is there a simpler way to get the same result?
In the end, the extracted rows should also be related to the "Zusatzinformation" header.
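
To make that last requirement concrete, here is a rough sketch of the grouping I have in mind, written in Python with the standard library only. It assumes that "related to the header" means each extracted row should carry the name of the section (`t-header` row) whose "Zusatzinformation" block it sits in. The shortened `SAMPLE` markup and the names `text_of` and `group_extra_info` are hypothetical and have nothing to do with Helium Scraper's own functions.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed stand-in for the table (no &nbsp; entities).
SAMPLE = """<tbody>
  <tr><td colspan="3" class="t-header">Studentenresidenz</td></tr>
  <tr><td width="60%" class="t-normal">NIDO</td><td width="20%" class="t-normal">pro Woche</td><td width="20%" class="t-normal">355 GBP</td></tr>
  <tr><td colspan="3" class="t-normal"><strong>Zusatzinformation</strong></td></tr>
  <tr><td class="t-normal">Kaution (vor Ort zahlbar)</td><td class="t-normal"></td><td class="t-normal"><span>250 GBP</span></td></tr>
  <tr><td colspan="3" class="t-sub_header"> </td></tr>
  <tr><td colspan="3" class="t-header">Gastfamilie</td></tr>
  <tr><td width="60%" class="t-normal">Zone 1 + 2</td><td width="20%" class="t-normal">pro Woche</td><td width="20%" class="t-normal">220 GBP</td></tr>
  <tr><td colspan="3" class="t-normal"><strong>Zusatzinformation</strong></td></tr>
  <tr><td class="t-normal">Zuschlag für spezielle Diäten</td><td class="t-normal">pro Woche</td><td class="t-normal"><span>30 GBP</span></td></tr>
  <tr><td colspan="3" class="t-sub_header"> </td></tr>
  <tr><td colspan="3" class="t-header">Zusatzkosten</td></tr>
  <tr><td class="t-normal">Transfer vom Flughafen London City zur Unterkunft</td><td class="t-normal">pro Weg</td><td class="t-normal"><span>100 GBP</span></td></tr>
</tbody>"""


def text_of(node):
    """Text of a node including nested elements, trimmed."""
    return "".join(node.itertext()).strip()


def group_extra_info(tbody):
    """Walk the rows once and collect the 'Zusatzinformation' rows
    under the section header they belong to."""
    grouped = {}
    section = None   # text of the most recent t-header row
    inside = False   # True between 'Zusatzinformation' and t-sub_header
    for tr in tbody.findall("tr"):
        cells = tr.findall("td")
        if not cells:
            continue
        css = cells[0].get("class")
        if css == "t-header":
            section, inside = text_of(cells[0]), False
        elif css == "t-sub_header":
            inside = False
        elif text_of(cells[0]) == "Zusatzinformation":
            inside = True
        elif inside:
            grouped.setdefault(section, []).append([text_of(td) for td in cells])
    return grouped


grouped = group_extra_info(ET.fromstring(SAMPLE))
```

Unlike the single XPath expression, this walks the rows in document order and keeps track of the current section, so the "Zusatzkosten" rows are left out because no "Zusatzinformation" row precedes them within their section.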

Thanks in advance for your help.

Dominic
