Extraction not working

Eco
Posts: 4
Joined: Mon Oct 06, 2014 7:48 am

Post by Eco » Mon Oct 06, 2014 8:52 am

Hi Guys,

This is my first project, so there is a good chance I'm doing something wrong.
I am following your videos (they are very fast; you should add a voice-over to explain what you are doing and slow them down a bit), and I am trying to extract listings from this site:
naturaltherapypages.com.au /natural_medicine/sa/Naturopath?limitno=10&pageno=10 (I took the http out on purpose and put a space in there so that it will not link).

Anyway, I set up the kinds, and it seems to pick them up properly.
I set up a process tree, but the data is not extracted properly.
The top-level info is extracted, but inside a listing the description is not extracted, although the full address is.

I find it weird, because if I manually go into a listing and click the description kind, the text lights up.
Also, I did a run and got the out-of-resources issue, and the video was too fast to follow.

Any help will be appreciated.
Thanks,
Eco

Eco
Posts: 4
Joined: Mon Oct 06, 2014 7:48 am

Re: Extraction not working

Post by Eco » Tue Oct 07, 2014 9:20 am

Hi Guys,

I am uploading the project I made here. Can you tell me what I'm doing wrong?
The system is running out of resources and is not going to the next page.

Waiting to hear from you ASAP.

Thanks a lot,
Eco
Attachments
natural pages 2.zip (53.51 KiB)

webmaster
Site Admin
Posts: 521
Joined: Mon Dec 06, 2010 8:39 am

Re: Extraction not working

Post by webmaster » Tue Oct 07, 2014 4:57 pm

Hi Eco,

Here is how you're meant to set up your actions:
[Screenshot: main1.png]
Note that it uses a Navigate action instead of a Navigate Each one to navigate through the "Next" button, and the Repeat action is set to repeat 10 times. Now, this method will only let you navigate through a fixed number of pages. Here is how you can make it navigate through as many pages as it finds:
[Screenshot: main2.png]
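For readers who think in code, the difference between the two setups can be sketched in Python. This is only an illustration of the control flow, not Helium Scraper's own action tree: the `PAGES` dictionary and both function names are hypothetical stand-ins for the real site and for the Repeat and Navigate actions.

```python
# Hypothetical in-memory "site": each page holds some listings and the
# name of the page its "Next" button leads to (None on the last page).
PAGES = {
    "page1": {"listings": ["A", "B"], "next": "page2"},
    "page2": {"listings": ["C"], "next": "page3"},
    "page3": {"listings": ["D", "E"], "next": None},
}


def scrape_fixed(start, max_pages):
    """Fixed-count approach: process at most `max_pages` pages."""
    results = []
    page = start
    for _ in range(max_pages):
        if page is None:  # ran out of pages before the count was reached
            break
        results.extend(PAGES[page]["listings"])
        page = PAGES[page]["next"]  # follow the "Next" button
    return results


def scrape_all(start):
    """Open-ended approach: keep going until there is no "Next" button."""
    results = []
    page = start
    while page is not None:
        results.extend(PAGES[page]["listings"])
        page = PAGES[page]["next"]
    return results
```

With a count of 2, `scrape_fixed` stops after the second page even though a third exists, while `scrape_all` stops only when no next page is found.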
Regarding the memory leak, first try upgrading Internet Explorer to the latest version. If this doesn't help, try setting up your actions this way (to start with):
[Screenshot: main3.png]
where the Extract to table: 'Table1' action is set up this way:
[Screenshot: extract.png]
Note that I've added a line to it.

Then add another actions tree (called main4 here):
[Screenshot: main4.png]
What you'd do is first run main3 and then, once you've extracted all the links to Table1, run main4, which will navigate through the links you've extracted to Table1. Note that if you do this, you'd be at the first step of doing a multi-process extraction, which would solve the memory leak (if upgrading IE doesn't). Here's how you'd implement multi-processes.
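The two-pass idea (main3 collects links, main4 visits them) can be sketched outside Helium Scraper as follows. This is a rough Python illustration under stated assumptions: `SITE`, `DETAILS`, and both function names are hypothetical, and an in-memory SQLite table stands in for Table1.

```python
import sqlite3

# Hypothetical result pages: each lists links to listings and a next page.
SITE = {
    "results?page=1": {"links": ["listing/1", "listing/2"], "next": "results?page=2"},
    "results?page=2": {"links": ["listing/3"], "next": None},
}
# Hypothetical listing pages: link -> description text.
DETAILS = {
    "listing/1": "Description one",
    "listing/2": "Description two",
    "listing/3": "Description three",
}


def collect_links(db, start):
    """Pass 1 (like main3): walk the result pages and store only the links."""
    db.execute("CREATE TABLE IF NOT EXISTS Table1 (link TEXT PRIMARY KEY)")
    page = start
    while page is not None:
        for link in SITE[page]["links"]:
            db.execute("INSERT OR IGNORE INTO Table1 (link) VALUES (?)", (link,))
        page = SITE[page]["next"]
    db.commit()


def extract_details(db):
    """Pass 2 (like main4): visit every stored link and extract its data."""
    rows = db.execute("SELECT link FROM Table1 ORDER BY link").fetchall()
    return {link: DETAILS[link] for (link,) in rows}


db = sqlite3.connect(":memory:")
collect_links(db, "results?page=1")
details = extract_details(db)
```

Because the links are stored before any listing is visited, the second pass could be split into batches handled by separate processes, which is the general idea behind the multi-process setup mentioned above.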

Also, you may want to take a look at this video since you're extracting related data to different tables.
Juan Soldi
The Helium Scraper Team
