Problem scraping text in an article directory

Posted: Fri Mar 16, 2012 6:30 am
by caliman
Hello. When I try to grab text from the site articulo.org, the inner text property shows completely different information.

Code:

http://www.articulo.org/articulo/12954/ventajas_y_desventajas_de_trabajar_en_casa.html


Something similar happens at ezinearticles.com in the search results (I cannot grab anything).

Code:

http://ezinearticles.com/results/?cx=partner-pub-3754405753000444%3A3ldnyrvij91&cof=FORID%3A10&ie=ISO-8859-1&q=marketing&sa=
Thanks

Re: Problem scraping text in an article directory

Posted: Sat Mar 17, 2012 11:32 pm
by caliman
Any ideas? Thanks

Re: Problem scraping text in an article directory

Posted: Sun Mar 18, 2012 1:00 am
by webmaster
caliman wrote: Hello. When I try to grab text from the site articulo.org, the inner text property shows completely different information.

Code:

http://www.articulo.org/articulo/12954/ventajas_y_desventajas_de_trabajar_en_casa.html

The inner text of which element?

caliman wrote: Something similar happens at ezinearticles.com in the search results (I cannot grab anything).

Code:

http://ezinearticles.com/results/?cx=partner-pub-3754405753000444%3A3ldnyrvij91&cof=FORID%3A10&ie=ISO-8859-1&q=marketing&sa=
Thanks

This is an iframe. You'll need to navigate to the URL in its src attribute. I'm attaching a project that does that. All you need to do is create a kind that selects the iframe (you know it is selected when the OuterHTML property in the selection panel starts with "<iframe"), then add an Execute Actions Tree action that runs the Go To Frame tree, and select your kind. The project already has a kind called Frame, which the Sample actions tree uses. If you just run this tree, it should navigate into the iframe in the sample page you posted, and there you'll be able to select the elements you need.
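
For anyone who wants to see the same idea outside the tool, here is a minimal Python sketch of "find the iframe, then go to its src". It is not what the attached project does internally; the names `IframeSrcParser` and `iframe_urls` are made up for this illustration, and it uses only the standard library.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeSrcParser(HTMLParser):
    """Collects the src attribute of every <iframe> tag it sees."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so "iframe" also matches <IFRAME>.
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def iframe_urls(page_html, page_url):
    """Return absolute URLs for all iframes found in page_html.

    page_url is the address the HTML was loaded from; relative src
    values are resolved against it.
    """
    parser = IframeSrcParser()
    parser.feed(page_html)
    return [urljoin(page_url, src) for src in parser.sources]
```

You would then load each returned URL as its own page and select elements there, which is the same thing the Go To Frame tree does for you.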

Re: Problem scraping text in an article directory

Posted: Sun Mar 18, 2012 7:56 pm
by caliman
Thanks, the ezine problem was fixed. About the other site: when I select the whole text of an article, the inner text column shows different information. This is the image.

Image

Thank You

Re: Problem scraping text in an article directory

Posted: Tue Mar 20, 2012 6:36 am
by webmaster
I see what's going on. What you're seeing are some scripts at the top of the page. You'll need to use text gatherers here. Just select the whole thing, then go to Project -> Text Gatherers, add a Slice step, and use the following as the delimiter (make sure you remove the line break that is there by default):

Code:

//-->

Then select From last slice (keep the slice position at 0) and you should see the text on the right side.
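
In plain code, the Slice step amounts to roughly the following. This is only a sketch of the idea, assuming that "From last slice" with position 0 means "keep the piece after the last occurrence of the delimiter"; the function name `text_after_last` is hypothetical and is not part of the tool.

```python
def text_after_last(text, delimiter="//-->"):
    """Return what follows the last occurrence of delimiter.

    If the delimiter does not occur, the whole text is returned.
    Surrounding whitespace is trimmed from the result.
    """
    slices = text.split(delimiter)
    # The last slice is everything after the final delimiter.
    return slices[-1].strip()


if __name__ == "__main__":
    inner_text = "<!--\nvar x = 1;\n//-->\nArticle body"
    print(text_after_last(inner_text))  # prints "Article body"
```

The script content ends with the `//-->` marker, so everything after its last occurrence is the article itself.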

Re: Problem scraping text in an article directory

Posted: Fri Mar 23, 2012 4:42 pm
by caliman
The inner text kind is still showing the same odd content (as in the image above). Do I have to select another kind? I tried other kinds, but I cannot find any that shows the content of the article. Thanks

Re: Problem scraping text in an article directory

Posted: Fri Mar 23, 2012 5:11 pm
by webmaster
You'd use the same kind, but as the property, choose the name you gave to the property you created with the text gatherer instead of inner text. You can also preview it by clicking the Choose active properties button in the selection panel at the bottom and selecting your property (its name will start with the "JS_" prefix).

Re: Problem scraping text in an article directory

Posted: Sat Mar 24, 2012 1:41 am
by caliman
Thanks, works perfectly. :P