Several things from one page

Questions and answers about anything related to Helium Scraper
sunshineseattle
Posts: 1
Joined: Sat Apr 16, 2011 7:04 am

Post by sunshineseattle » Sat Apr 16, 2011 7:20 am

Please excuse my ignorance - first day of your free trial.

I would like to scrape several pieces of information (prices, for example) from one page. The interior text (I think that's the right term) and the exterior text are essentially the same for each piece of information. Will HS record that one is in first position, the next in second, and so on, as part of its definition?

As a follow-up, if the answer is yes: in some instances there may not be a price but something else, such as "N/A" or "Inquire". May I add examples of these out of order? That is, can I add the "N/A" from, say, the seventh position as an example alongside the first and subsequent positions without screwing up the logic?

Did that make sense?

webmaster
Site Admin
Posts: 521
Joined: Mon Dec 06, 2010 8:39 am

Re: Several things from one page

Post by webmaster » Sat Apr 16, 2011 9:21 am

This is actually an excellent question.

Helium Scraper uses the HTML structure of the page to distribute the extracted rows properly. This means that if each interior text and the exterior text are children of the same parent in the HTML structure, they will be extracted to the same row even if the interior text is "N/A" or anything else.
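
To illustrate the idea of grouping by a shared parent, here is a minimal JavaScript sketch. It is not Helium Scraper's actual algorithm or API: the page is modelled as a hypothetical tree of plain objects, and the `name` and `price` class names, the `Widget` items, and the `extractRows` helper are all assumptions made for the example. The point is only that each row is built from one parent's children, so a price of "N/A" or "Inquire" still lands in the correct row.

```javascript
// Illustrative only: a toy model of a page as nested plain objects.
// This is NOT Helium Scraper's real algorithm or API.
const page = {
  tag: "ul",
  children: [
    { tag: "li", children: [
      { tag: "span", cls: "name", text: "Widget A" },
      { tag: "span", cls: "price", text: "$10.00" },
    ]},
    { tag: "li", children: [
      { tag: "span", cls: "name", text: "Widget B" },
      { tag: "span", cls: "price", text: "N/A" },
    ]},
    { tag: "li", children: [
      { tag: "span", cls: "name", text: "Widget C" },
      { tag: "span", cls: "price", text: "Inquire" },
    ]},
  ],
};

// Build one row per parent element. Each field is looked up among that
// parent's own children, so the text of a value ("N/A", "$10.00") never
// affects which row it ends up in.
function extractRows(root, fields) {
  return root.children.map((parent) => {
    const row = {};
    for (const field of fields) {
      const match = (parent.children || []).find((c) => c.cls === field);
      row[field] = match ? match.text : null; // null when the field is absent
    }
    return row;
  });
}

const rows = extractRows(page, ["name", "price"]);
console.log(rows);
```

Because rows follow the parent elements rather than a running count of prices, the order in which example values were picked does not matter in this model.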

In most cases, then, Helium Scraper will work out how to extract the information properly. I have seen a few web pages where this doesn't hold because the visual layout and the HTML structure don't match, but these are exceptions. The best thing to do is to test it and see what happens.
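
As a rough picture of what such a mismatch can look like, the JavaScript sketch below assumes a hypothetical page where names and prices are all flat siblings, so there is no per-row parent to group by. The node shape, the `name` and `price` class names, and the `regroup` helper are assumptions for illustration, not anything taken from Helium Scraper. One possible workaround is to walk the siblings in order and start a new row at every name:

```javascript
// Hypothetical flat markup: names and prices are all siblings, so rows
// that look separate on screen share a single parent in the HTML.
const flat = [
  { cls: "name", text: "Widget A" },
  { cls: "price", text: "$10.00" },
  { cls: "name", text: "Widget B" }, // this item has no price element
  { cls: "name", text: "Widget C" },
  { cls: "price", text: "Inquire" },
];

// Walk the siblings in document order. Every "name" opens a new row and
// a following "price" is attached to the most recently opened row.
function regroup(nodes) {
  const out = [];
  for (const node of nodes) {
    if (node.cls === "name") {
      out.push({ name: node.text, price: null });
    } else if (node.cls === "price" && out.length > 0) {
      out[out.length - 1].price = node.text;
    }
  }
  return out;
}

const regrouped = regroup(flat);
console.log(regrouped);
```

Pairing the first name with the first price, the second with the second, and so on would misalign this data, since the missing price for the second item would shift every later price up by one row.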

If it doesn't work, a little JavaScript will do it. If you wish, you can send me more detailed information, such as the URL of the page to be extracted, and I'll be able to give you more specifics.
Juan Soldi
The Helium Scraper Team
