Project running out of resources

Sandor
Posts: 3
Joined: Fri May 11, 2012 8:05 am

Project running out of resources

Post by Sandor » Fri May 11, 2012 8:56 am

Hi,

I'm a non-technical, new Helium Scraper user and I'm having resource problems when I run my project.

I want to collect data from 3 different levels of the website (top level, product group list with multiple pages, product page).
It starts running and puts the data into 3 tables, but the program stops after about 900 products.
The complete selection consists of more than 500,000 products, but I only want to gather around 15,000 of them.

When I run my project it stops with the message 'Helium Scraper is running out of resources'.
I am probably doing something wrong, but I have no clue where to look.
Could someone point me in the right direction?

Thanks.

My system: i3-2310 CPU with 4 GB of memory
Windows 7 Pro
Helium Scraper: 2.3.3.0

Unfortunately I can't upload my project because its size is 1.6 MB?!

webmaster
Site Admin
Posts: 521
Joined: Mon Dec 06, 2010 8:39 am

Re: Project running out of resources

Post by webmaster » Sat May 12, 2012 3:57 am

Which version of Internet Explorer do you have installed?
Juan Soldi
The Helium Scraper Team

Sandor

Re: Project running out of resources

Post by Sandor » Tue May 15, 2012 7:27 am

I'm running IE9 on my PC, and browser emulation is set to standard.

The strange thing is that this problem occurs at work, but when I run the job at home I don't encounter it.
And that is an old AMD 3400+ PC with 2 GB of memory.

I made a new project and have attached it as a zip file. Maybe you can see what I'm doing wrong.
Attachments
Test2.zip
(39.36 KiB) Downloaded 606 times

webmaster

Re: Project running out of resources

Post by webmaster » Thu May 17, 2012 5:17 am

I've looked at your project but can't figure out what the URLs kind is supposed to select (it is not selecting anything on my end). Can you send a screenshot of a selected URL? Also, about how long does it take until you see the error?
Juan Soldi
The Helium Scraper Team

Sandor

Re: Project running out of resources

Post by Sandor » Thu May 17, 2012 12:41 pm

It selects the URLs on the second level, so if you were on the top level you wouldn't have seen anything selected.

Top-level:
Index » Pricewatch » Unsorted
Kind: Categorie

Second-level:
Index » Pricewatch » Unsorted » Geheugen intern (or any other category)
Kind: Productnaam , Prijzen, Datum, Laagste prijs, URL, PRIO

Third-level:
Index » Pricewatch » Unsorted » Apple » Apple Memory Module 1GB 1066MHz DDR3 EC... (or any other product within that category)
Kind: Merk

The resource problem occurs after about 8 to 9 minutes, at around 900 products.
And once this works I hope it also runs over the categories with more than one page :D

Thanks for your time.
Attachments
URL.jpg (189.43 KiB) Viewed 10557 times
Test3.zip
version 1.02 of project
(37.21 KiB) Downloaded 609 times

webmaster

Re: Project running out of resources

Post by webmaster » Sun May 20, 2012 6:05 pm

Hi,

I'm not sure why one of your machines is not running out of resources. This site is leaking memory on my end as well. But here is a solution. The Start Processes action is similar to the Navigate URLs one, except that it navigates to each URL in a new process. You can double click the one in the attached project to see how it is configured. Note that the ID column is checked. This will let you extract the ID of the Cats table from Extract actions in the child processes.
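
For readers who want to see the general idea outside Helium Scraper, here is a minimal Python sketch of "one URL per process". It is not how Start Processes is implemented; the child script, the function names and the example URLs are all hypothetical. The point it illustrates is that any memory leaked while handling a URL is returned to the operating system when the child process exits, and that passing the parent row's ID along lets the child tag what it extracts.

```python
import json
import subprocess
import sys

# Hypothetical child script: a stand-in for "load one URL and extract data".
# It runs in its own interpreter, so its memory is released when it exits.
CHILD_SCRIPT = """
import json, sys
url, parent_id = sys.argv[1], int(sys.argv[2])
row = {"parent_id": parent_id, "url": url,
       "name": url.rstrip("/").rsplit("/", 1)[-1]}
print(json.dumps(row))
"""


def scrape_in_new_process(url, parent_id):
    """Handle one URL in a fresh process and return the row it printed."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT, url, str(parent_id)],
        capture_output=True,
        text=True,
        check=True,  # raise if the child process fails
    )
    return json.loads(result.stdout)


def scrape_all(categories):
    """categories: list of (category_id, url) pairs (made-up sample input)."""
    return [scrape_in_new_process(url, cat_id) for cat_id, url in categories]
```

Calling `scrape_all([(1, "http://example.com/cat/memory")])` starts one short-lived child for that URL and returns its row with `parent_id` set to 1.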

Now, for the data to be extracted from multiple processes into the same database, you need to export and connect to the database. Just open the attached project, then go to the database panel and click Export Database -> Export and Connect, save the database anywhere you want and then save the project. Then run the "action1" tree and you'll see other processes come up. This will also speed up the extraction since you'll have more than one process extracting at the same time.
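
The reason for exporting is that separate processes can only share results through a database that lives outside any one of them. A rough Python illustration of that pattern follows, using SQLite as a stand-in (an assumption; it is not the database Helium Scraper exports). The table layout, worker script and function name are invented for the example.

```python
import sqlite3
import subprocess
import sys

# Hypothetical worker: connects to the shared database file and adds one row.
WORKER_SCRIPT = """
import sqlite3, sys
db_path, cat_id, name = sys.argv[1], int(sys.argv[2]), sys.argv[3]
conn = sqlite3.connect(db_path, timeout=30)  # wait if another writer holds the lock
with conn:  # commits on success
    conn.execute("INSERT INTO products (cat_id, name) VALUES (?, ?)", (cat_id, name))
conn.close()
"""


def run_workers(db_path, jobs):
    """Start one process per (cat_id, name) job, all writing to db_path."""
    conn = sqlite3.connect(db_path)
    with conn:
        conn.execute("CREATE TABLE IF NOT EXISTS products (cat_id INTEGER, name TEXT)")
    conn.close()

    # Launch every worker first so they run at the same time, then wait for all.
    procs = [
        subprocess.Popen([sys.executable, "-c", WORKER_SCRIPT, db_path, str(cat_id), name])
        for cat_id, name in jobs
    ]
    for proc in procs:
        if proc.wait() != 0:
            raise RuntimeError("a worker process failed")

    conn = sqlite3.connect(db_path)
    rows = conn.execute("SELECT cat_id, name FROM products ORDER BY cat_id, name").fetchall()
    conn.close()
    return rows
```

Because every worker opens the same file, the rows end up in one table regardless of which process wrote them, and the `cat_id` column keeps each product linked to its category.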

Also, and this is not related to the memory problem, notice that I'm using the premade Go Through All Pages instead of just navigating through the next button. This premade will turn the pages as long as there is a next button. I haven't tested your NEXT kind but I'm assuming it will select the next button whenever it is present. In case you need it for another project, you can import this premade from inside any actions tree at New action -> Execute Actions Tree -> More....
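
The "turn pages as long as there is a next button" logic can be sketched in a few lines of Python. This only shows the loop, not the premade itself; the `PAGES` dictionary is made-up sample data in which each page lists its products and the page its next button leads to (or `None` when there is no next button).

```python
# Hypothetical paginated category: page -> (products on that page, next page or None).
PAGES = {
    "page1": (["A", "B"], "page2"),
    "page2": (["C"], "page3"),
    "page3": (["D", "E"], None),
}


def go_through_all_pages(first_page, pages):
    """Collect products page by page until no next page is found."""
    collected = []
    visited = set()  # guards against a next button that loops back
    current = first_page
    while current is not None and current not in visited:
        visited.add(current)
        products, next_page = pages[current]
        collected.extend(products)
        current = next_page
    return collected
```

The `visited` set is an extra safety measure added for this sketch, so the loop stops if a site's next button ever points back to a page that was already processed.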

I recommend checking out the documentation on the Start Processes action so you get a clear idea of what's going on.

Let me know if you have any questions.
Attachments
Test4.hsp
(587.63 KiB) Downloaded 630 times
Juan Soldi
The Helium Scraper Team
