Help / Advice needed to extract Hierarchical Data

Questions and Answers about programming Helium Scraper.
Eamonn O.
Posts: 2
Joined: Thu Feb 28, 2013 3:39 pm

Post by Eamonn O. » Thu Feb 28, 2013 4:30 pm

I bought this software a few days ago because I want to extract product details from our site. The products are organised into categories and sub-categories of varying tree depth, between 2 and 5 levels deep.

So really I want to know how to start at the top of the category structure and recurse down through the sub-categories, extracting the category information along the way. Then, when a defined “kind” is empty, signifying the final category of a branch, run an extract-product sub-process.
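To make the idea concrete, here is a minimal sketch of that recursion in plain Python, outside Helium Scraper. It assumes the category pages can be represented as nested dictionaries; `SAMPLE_TREE`, `walk`, and the category and product names are all hypothetical placeholders, not data from the real site. A node with no children plays the role of the empty "kind".

```python
# Hypothetical stand-in for the site's category pages; a real run would
# read sub-category links and product lists from each page instead.
SAMPLE_TREE = {
    "name": "Top",
    "children": [
        {"name": "A", "children": [
            {"name": "A1", "children": [], "products": ["p1", "p2"]},
        ]},
        {"name": "B", "children": [], "products": ["p3"]},
    ],
}


def walk(root):
    """Return (categories, products) collected by a depth-first walk."""
    categories = []  # (name, parent name, depth)
    products = []    # (leaf category name, product)

    def visit(node, parent, depth):
        categories.append((node["name"], parent, depth))
        children = node.get("children", [])
        if children:
            for child in children:
                visit(child, node["name"], depth + 1)
        else:
            # No sub-categories: this is the last category of the branch,
            # so this is where the product extraction would run.
            for product in node.get("products", []):
                products.append((node["name"], product))

    visit(root, None, 0)
    return categories, products
```

Calling `walk(SAMPLE_TREE)` gives one list of categories with their parent and depth, and one list of products tagged with the leaf category they were found in, whatever the depth of each branch.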

Also I do not know how to store the category information. I would like the database structure to allow me to easily re-create the category hierarchy.

Should all the category information be stored in the same table?
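One common way to keep everything in a single table is to give each category row a reference to its parent. The sketch below uses Python's built-in `sqlite3` module purely as an illustration; the table name `category`, its columns, and the sample rows are assumptions, and nothing here is specific to Helium Scraper's own database.

```python
import sqlite3


def build_db():
    """Create an in-memory database with one self-referencing table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE category ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " parent_id INTEGER REFERENCES category(id))"
    )
    # Hypothetical sample rows; parent_id is NULL for the top category.
    conn.executemany(
        "INSERT INTO category (id, name, parent_id) VALUES (?, ?, ?)",
        [(1, "Top", None), (2, "A", 1), (3, "B", 1), (4, "A1", 2)],
    )
    return conn


def full_paths(conn):
    """Rebuild every category's full path with a recursive query."""
    rows = conn.execute(
        """
        WITH RECURSIVE tree(id, path) AS (
            SELECT id, name FROM category WHERE parent_id IS NULL
            UNION ALL
            SELECT c.id, tree.path || ' > ' || c.name
            FROM category AS c JOIN tree ON c.parent_id = tree.id
        )
        SELECT path FROM tree ORDER BY path
        """
    ).fetchall()
    return [row[0] for row in rows]
```

With this layout the hierarchy can be re-created from the `parent_id` column alone, at the cost of needing a recursive query (or repeated lookups) to read a whole branch.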

Again I don’t know how to do this and preserve the hierarchy. Help Please!

On the bright side, I have successfully created extractions for individual categories and the products within them :)

I’m guessing that this must be quite a common task so hopefully someone will know how to do this.

Thanks

webmaster
Site Admin
Posts: 521
Joined: Mon Dec 06, 2010 8:39 am

Re: Help / Advice needed to extract Hierarchical Data

Post by webmaster » Thu Feb 28, 2013 5:27 pm

Can you send me the URL? I'll see if I can come up with a generic solution that everyone can benefit from.
Juan Soldi
The Helium Scraper Team

Eamonn O.
Posts: 2
Joined: Thu Feb 28, 2013 3:39 pm

Re: Help / Advice needed to extract Hierarchical Data

Post by Eamonn O. » Fri Mar 01, 2013 9:22 am

Hi Juan,

The example URL is http://www.karcher.com/int/Products/Home__Garden.htm

I would like to extract the hierarchical category structure as well as the product details shown in the last category of a branch.

I can do some of this using the standard "Extract to table" and "Navigate Each"; however, this only works well when every sub-section has the same depth.
This method also produces one new data table for each sub-level. It would be better to have all the section and sub-section information stored in a single table.

This article clearly explains a data structure for this kind of information.
http://karwin.blogspot.co.uk/2010/03/re ... ables.html

I'm guessing that the recursion down through the sections would be done within a "While Loop", but I cannot see how to make the condition depend on the state of a "Kind" on a given section page.
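As a rough illustration of the loop shape in question, the Python sketch below replaces recursion with a queue of pending pages. The loop runs while pages remain, and a page counts as a final category when it yields no sub-category links. The names `crawl` and `get_subcategories` are hypothetical; `get_subcategories` stands in for whatever selects the sub-category "Kind" on a page.

```python
from collections import deque


def crawl(start, get_subcategories):
    """Visit pages breadth-first and return those with no sub-categories."""
    pending = deque([start])
    leaves = []
    while pending:  # keep going while there are pages left to visit
        page = pending.popleft()
        subs = get_subcategories(page)
        if subs:
            # The sub-category selection is non-empty: queue the children.
            pending.extend(subs)
        else:
            # The selection is empty: treat the page as a final category.
            leaves.append(page)
    return leaves
```

For example, with `links = {"top": ["a", "b"], "a": ["a1"], "b": [], "a1": []}`, calling `crawl("top", lambda page: links.get(page, []))` returns the two final categories, regardless of how deep each branch is.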

Also, I am stuck on how to determine each level's position (e.g. the ancestor and descendant values), which would be needed to store hierarchical information in a "Closure Table" style structure.
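For reference, the ancestor and descendant pairs can be derived once each category's parent is known. The Python sketch below assumes the usual closure-table convention of one row per (ancestor, descendant) pair, including a row linking each node to itself, plus an optional path-length column. `closure_rows`, `parent_of`, and the sample names are hypothetical.

```python
def closure_rows(parent_of):
    """Build (ancestor, descendant, depth) rows from a child -> parent map.

    parent_of maps each category to its parent, with None for the top.
    """
    rows = []
    for node in parent_of:
        ancestor, depth = node, 0
        # Climb from the node to the top, emitting one row per ancestor.
        while ancestor is not None:
            rows.append((ancestor, node, depth))
            ancestor = parent_of[ancestor]
            depth += 1
    return rows
```

So a three-level branch such as `{"top": None, "a": "top", "a1": "a"}` produces six rows: three self-rows at depth 0, two parent rows at depth 1, and one row from `top` to `a1` at depth 2.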

Thanks

webmaster
Site Admin
Posts: 521
Joined: Mon Dec 06, 2010 8:39 am

Re: Help / Advice needed to extract Hierarchical Data

Post by webmaster » Wed Mar 06, 2013 5:23 pm

Hi,

Here is a premade you can use and a sample. If you open CategoriesTemplate.hsp and go to Project -> Notes you'll get some instructions on how to use the project. The CategoriesSample.hsp file is a sample that extracts from the site you provided. To use it, open it, import the CategoriesTemplate.hsp file and then run the Sample actions tree. Note I've set the Max. Depth to 4. To get deeper levels, you'll need to create kinds for them and change the Max. Depth.

I've tested the template in a couple of places and it seemed to work OK. If you have any problem understanding how the template works, or need any help using it, please don't hesitate to write back.
Attachments
CategoriesTemplate.hsp
(525.01 KiB) Downloaded 830 times
CategoriesSample.hsp
(534.83 KiB) Downloaded 834 times
Juan Soldi
The Helium Scraper Team
