select a kind with a part of its content

Questions and answers about anything related to Helium Scraper
Jmarc
Posts: 4
Joined: Fri Aug 16, 2013 5:45 pm

Post by Jmarc » Sat Aug 17, 2013 10:15 am

I need to extract text from inside markup like this:

<p><strong>Name: </strong>Dylan</p>
<p><strong>First name: </strong>Bob</p>
<p><strong>Age: </strong>26</p>
To get each value (Dylan, Bob and 26), I am trying to define kinds based on part of each text value.

For example, for the first name kind, I need to specify in the kind editor that I want the Inner Text that contains "First name:".

How can I do that? Is it possible to select a property in the kind editor by giving only part of its value?
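To make the goal concrete, here is a minimal sketch in plain JavaScript of the matching I have in mind. It is not Helium Scraper's API: `valueAfterLabel` is a hypothetical helper, and the array of strings only stands in for the Inner Text of each paragraph above.

```javascript
// Hypothetical helper, not part of Helium Scraper.
// Returns the text that follows `label` when `innerText` starts with it,
// or null when the label does not match.
function valueAfterLabel(innerText, label) {
  const text = innerText.trim();
  if (!text.startsWith(label)) {
    return null;
  }
  return text.slice(label.length).trim();
}

// Stand-ins for the Inner Text of the three <p> elements.
const paragraphs = ["Name: Dylan", "First name: Bob", "Age: 26"];

// Keep only the paragraphs labelled "First name:" and strip the label.
const firstNames = paragraphs
  .map((text) => valueAfterLabel(text, "First name:"))
  .filter((value) => value !== null);

console.log(firstNames); // logs an array containing only "Bob"
```

Matching on the start of the text rather than on any occurrence keeps a short label from accidentally matching inside a longer one.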

Thanks for your help
Jean-marc

Re: select a kind with a part of its content

Post by Jmarc » Sat Aug 17, 2013 2:33 pm

I found this article on the Helium blog: The often overlooked JavaScript Gatherer (http://heliumscraper.com/wordpress/?p=115)
It includes an example that extracts links pointing to web pages in one particular domain (http://www.example.com). To do that, it creates a JavaScript gatherer that gets the domain of each link's URL.
This solution sounds quite complicated for a non-developer user :?
In this example, it would have been much more user-friendly to allow the user to edit the kind of the URL links collected on the page and simply specify that the value of the selected URL has to contain the text "www.example.com" (something like: WHERE value LIKE '%www.example.com%')
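For comparison, here is a rough sketch in plain JavaScript of the two approaches: comparing the domain of each link, and the simpler "contains" test. This is not the code from the blog article and not Helium Scraper's API. The helper names `hostOf`, `linksOnHost` and `linksContaining` and the sample links are made up for illustration; only the standard `URL` class and `String.prototype.includes` are real.

```javascript
// Hypothetical helper: returns the host name of an absolute URL,
// or null when the string cannot be parsed as one.
function hostOf(href) {
  try {
    return new URL(href).hostname;
  } catch (error) {
    return null;
  }
}

// Domain approach: keep links whose host is exactly `host`.
function linksOnHost(links, host) {
  return links.filter((href) => hostOf(href) === host);
}

// "Contains" approach, similar in spirit to
// WHERE value LIKE '%www.example.com%'.
function linksContaining(links, fragment) {
  return links.filter((href) => href.includes(fragment));
}

// Made-up sample links.
const links = [
  "http://www.example.com/page1",
  "http://other.example.org/?ref=www.example.com",
  "/relative/path",
];

const byHost = linksOnHost(links, "www.example.com");
const byText = linksContaining(links, "www.example.com");
console.log(byHost.length, byText.length);
```

With these sample links the "contains" test also keeps the second link, because the text appears in its query string, while the domain comparison keeps only the first. The substring filter is easier to write but a little less precise.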
