Problem scraping text in an article directory

Questions and answers about anything related to Helium Scraper
caliman
Posts: 21
Joined: Tue May 31, 2011 5:12 pm

Problem scraping text in an article directory

Post by caliman » Fri Mar 16, 2012 6:30 am

Hello. When I try to grab text from the site articulo.org, the inner text property shows completely different information.


http://www.articulo.org/articulo/12954/ventajas_y_desventajas_de_trabajar_en_casa.html


Something similar happens with the search results at ezinearticles.com (I cannot grab anything):


http://ezinearticles.com/results/?cx=partner-pub-3754405753000444%3A3ldnyrvij91&cof=FORID%3A10&ie=ISO-8859-1&q=marketing&sa=
Thanks

caliman
Posts: 21
Joined: Tue May 31, 2011 5:12 pm

Re: Problem scraping text in an article directory

Post by caliman » Sat Mar 17, 2012 11:32 pm

Any ideas? Thanks

webmaster
Site Admin
Posts: 521
Joined: Mon Dec 06, 2010 8:39 am

Re: Problem scraping text in an article directory

Post by webmaster » Sun Mar 18, 2012 1:00 am

caliman wrote: Hello. When I try to grab text from the site articulo.org, the inner text property shows completely different information.


http://www.articulo.org/articulo/12954/ventajas_y_desventajas_de_trabajar_en_casa.html
The inner text of which element?
caliman wrote: Something similar happens with the search results at ezinearticles.com (I cannot grab anything):


http://ezinearticles.com/results/?cx=partner-pub-3754405753000444%3A3ldnyrvij91&cof=FORID%3A10&ie=ISO-8859-1&q=marketing&sa=
Thanks
This is an iframe. You'll need to navigate to the URL in its src attribute. I'm attaching a project that does that. All you need to do is create a kind that selects the iframe (you'll know it is selected when the OuterHTML property in the selection panel starts with "<iframe"), then add an Execute Actions Tree action that runs the Go To Frame tree, and select your kind. There is already a kind called Frame that the Sample actions tree uses. If you just run this tree, it should navigate into the iframe on the sample page you posted, and there you'll be able to select the elements you need.
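
For readers who want to see the same idea outside Helium Scraper, here is a minimal Python sketch of "navigating into the src attribute": it finds each iframe tag in a page's HTML and resolves its src against the page URL. This is not how the attached project works internally; the names `IframeSrcFinder` and `iframe_urls`, the example.com URL, and the sample markup are all hypothetical. It also assumes the iframe is present in the static HTML; if the page inserts it with a script, a parse like this will not see it.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeSrcFinder(HTMLParser):
    """Collects the src attribute of every <iframe> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def iframe_urls(page_url, html):
    """Return the absolute URL of each iframe found in html."""
    finder = IframeSrcFinder()
    finder.feed(html)
    # A relative src must be resolved against the URL of the containing page.
    return [urljoin(page_url, src) for src in finder.sources]


# Hypothetical markup, for illustration only.
sample_html = (
    '<html><body><h1>Results</h1>'
    '<iframe name="results" src="/frame/results.html"></iframe>'
    '</body></html>'
)
print(iframe_urls("http://example.com/search/", sample_html))
```

The URLs returned are the pages you would load next; the elements you want to extract live in those documents, not in the outer page.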
Attachments
GoToFrame.hsp
(503 KiB) Downloaded 664 times
Juan Soldi
The Helium Scraper Team

caliman
Posts: 21
Joined: Tue May 31, 2011 5:12 pm

Re: Problem scraping text in an article directory

Post by caliman » Sun Mar 18, 2012 7:56 pm

Thanks, the ezine problem is fixed. As for the other site, when I select the whole text of an article, the inner text column shows different information. This is the image.

Image

Thank You

webmaster
Site Admin
Posts: 521
Joined: Mon Dec 06, 2010 8:39 am

Re: Problem scraping text in an article directory

Post by webmaster » Tue Mar 20, 2012 6:36 am

I see what's going on. What you're seeing is the content of some scripts at the top of the page. You'll need to use text gatherers here. Just select the whole thing, then go to Project -> Text Gatherers, add a Slice step, and use the following as a delimiter (make sure you remove the line break that is there by default):


//-->
Then select From last slice (keep the slice position at 0) and you should have the text on the right side.
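
As a rough illustration of what this Slice step is meant to achieve, the Python sketch below keeps only the text after the last occurrence of the delimiter. It is an assumption about the intended result, not Helium Scraper's actual implementation; the function name `text_after_last` and the sample text are hypothetical, and the whitespace trimming is an extra convenience that the Slice step may or may not perform.

```python
def text_after_last(text, delimiter="//-->"):
    """Return the part of text that follows the last occurrence of delimiter.

    If the delimiter does not occur, the text is returned unchanged
    (apart from the whitespace trimming below).
    """
    # rpartition splits on the LAST occurrence; when there is no match,
    # the whole input ends up in the third element of the tuple.
    _, _, tail = text.rpartition(delimiter)
    # Trimming is an addition for readability, not part of the slice itself.
    return tail.strip()


# Hypothetical inner text: two script blocks followed by the article body.
inner_text = (
    "<!--\nvar a = 1;\n//-->\n"
    "<!--\nvar b = 2;\n//-->\n"
    "Article body text."
)
print(text_after_last(inner_text))
```

Splitting on the last occurrence rather than the first matters when the page has several script blocks, each ending with the same delimiter.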
Juan Soldi
The Helium Scraper Team

caliman
Posts: 21
Joined: Tue May 31, 2011 5:12 pm

Re: Problem scraping text in an article directory

Post by caliman » Fri Mar 23, 2012 4:42 pm

The inner text is still showing the same odd content (like in the image above). Do I have to select another kind? I tried other kinds, but I cannot find any that shows the content of the article. Thanks

webmaster
Site Admin
Posts: 521
Joined: Mon Dec 06, 2010 8:39 am
Contact:

Re: Problem scraping text in an article directory

Post by webmaster » Fri Mar 23, 2012 5:11 pm

You'd use the same kind, but as the property, instead of inner text, choose the name you gave to the property you created with the text gatherer. You can also preview it by clicking the Choose active properties button in the selection panel at the bottom and selecting your property (its name will start with the "JS_" prefix).
Juan Soldi
The Helium Scraper Team

caliman
Posts: 21
Joined: Tue May 31, 2011 5:12 pm

Re: Problem scraping text in an article directory

Post by caliman » Sat Mar 24, 2012 1:41 am

Thanks, works perfectly. :P
