Hi Guys,
This is my first project, so there is a good chance I'm doing something wrong.
I am following your videos (they are very fast; you should add a voiceover to explain what you are doing, and make them a bit slower), and I am trying to extract listings from this site:
naturaltherapypages.com.au /natural_medicine/sa/Naturopath?limitno=10&pageno=10 (I removed the http and put a space in there on purpose so that it will not turn into a link).
Anyway, I set up the kinds, and it seems to pick them up properly.
I set up a process tree, but the data is not extracted properly.
The top-level info is extracted, but within a listing, the description is not extracted, although the full address is.
I find it weird, because if I manually go into a listing and click the description kind, the text lights up.
Also, I did a run and got an out-of-resources error, and the video was too fast to follow.
Any help will be appreciated.
Thanks,
Eco
Extraction not working
Re: Extraction not working
Hi Guys,
I am uploading the project I made here; can you tell me what I'm doing wrong?
The system is running out of resources and is not going to the next page.
Waiting to hear from you ASAP.
Thanks a lot,
Eco
- Attachments
-
- natural pages 2.zip
- (53.51 KiB) Downloaded 567 times
Re: Extraction not working
Hi Eco,
Here is how you're meant to set up your actions:
Note that it uses a Navigate action instead of a Navigate Each action to navigate through the "Next" button, and the Repeat action is set to repeat 10 times. This method will only let you navigate through a fixed number of pages. Here is how you can make it navigate through as many pages as it finds:
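Since Helium Scraper actions are built in the GUI rather than written as code, here is a rough Python sketch of the difference between the two strategies. The `fetch_page` function and the `PAGES` data are invented stand-ins for the browser navigation the tool performs; only the loop shapes matter:

```python
# Canned stand-in for the site: each "page" has listings and either
# a link to the next page or None when the "Next" button is absent.
PAGES = [
    {"listings": ["A", "B"], "next": 1},
    {"listings": ["C", "D"], "next": 2},
    {"listings": ["E"], "next": None},  # last page: no "Next" button
]

def fetch_page(index):
    """Hypothetical stand-in for navigating the browser to a page."""
    return PAGES[index]

def extract_fixed(repeat_count):
    """Repeat action with a fixed count: stops after repeat_count pages."""
    results, page_index = [], 0
    for _ in range(repeat_count):
        page = fetch_page(page_index)
        results.extend(page["listings"])
        if page["next"] is None:
            break  # ran out of pages before the counter did
        page_index = page["next"]
    return results

def extract_until_no_next():
    """Open-ended loop: keeps following "Next" until it disappears."""
    results, page_index = [], 0
    while page_index is not None:
        page = fetch_page(page_index)
        results.extend(page["listings"])
        page_index = page["next"]
    return results
```

The first function mirrors the Repeat-10-times setup; the second mirrors the open-ended version that handles however many pages the site has.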
Regarding the memory leak, first try upgrading Internet Explorer to the latest version. If this doesn't help, try setting up your actions this way (to start with):
where the Extract to table: 'Table1' action is set up this way:
Note that I've added a line to it.
Then add another actions tree (called main4 here):
What you'd do is run main3 first and then, once you've extracted all the links to Table1, run main4, which will navigate through the links you've extracted to Table1. Note that if you do this, you'd have taken the first step toward a multiple-process extraction, which would solve the memory leak (if upgrading IE doesn't). Here's how you'd implement multiple processes.
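The two-tree idea can be sketched in Python as well. Here `main3` stands for the tree that collects listing links into Table1, and `main4` for the tree that visits each saved link; the page data and field names are invented for illustration:

```python
# Phase 1 (main3): walk the listing pages and save only the links to
# Table1. Phase 2 (main4): visit each saved link and extract details.
# Splitting the work this way is the first step toward a multi-process
# extraction, since each phase can run in its own process.

LISTING_PAGES = {
    "page1": ["/listing/1", "/listing/2"],
    "page2": ["/listing/3"],
}
DETAILS = {
    "/listing/1": {"name": "Alice", "address": "1 Oak St"},
    "/listing/2": {"name": "Bob", "address": "2 Elm St"},
    "/listing/3": {"name": "Carol", "address": "3 Ash St"},
}

def main3():
    """Collect every listing URL into Table1 (a plain list here)."""
    table1 = []
    for links in LISTING_PAGES.values():
        table1.extend(links)
    return table1

def main4(table1):
    """Navigate each URL saved in Table1 and extract its details."""
    return [DETAILS[url] for url in table1]

# Run the phases in order, as described above:
links = main3()
records = main4(links)
```

Because `main4` only depends on the saved links, it can be restarted (or split across processes) without re-crawling the listing pages, which is what keeps memory use bounded.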
Also, you may want to take a look at this video since you're extracting related data to different tables.
Juan Soldi
The Helium Scraper Team