Problem scraping text in an article directory
Posted: Fri Mar 16, 2012 6:30 am
by caliman
Hello. When I try to grab text from the site articulo.org, the inner text property shows completely different information.
http://www.articulo.org/articulo/12954/ventajas_y_desventajas_de_trabajar_en_casa.html
Something similar happens at ezinearticles.com in the search results (I cannot grab anything):
http://ezinearticles.com/results/?cx=partner-pub-3754405753000444%3A3ldnyrvij91&cof=FORID%3A10&ie=ISO-8859-1&q=marketing&sa=
Thanks
Re: Problem scraping text in an article directory
Posted: Sat Mar 17, 2012 11:32 pm
by caliman
Any ideas? Thanks
Re: Problem scraping text in an article directory
Posted: Sun Mar 18, 2012 1:00 am
by webmaster
caliman wrote: Hello. When I try to grab text from the site articulo.org, the inner text property shows completely different information.
http://www.articulo.org/articulo/12954/ventajas_y_desventajas_de_trabajar_en_casa.html
The inner text of which element?
caliman wrote:
Something similar happens at ezinearticles.com in the search results (I cannot grab anything):
http://ezinearticles.com/results/?cx=partner-pub-3754405753000444%3A3ldnyrvij91&cof=FORID%3A10&ie=ISO-8859-1&q=marketing&sa=
Thanks
This is an iframe. You'll need to navigate to the URL in its src attribute. I'm attaching a project that does that. All you need to do is create a kind that selects the iframe (you know it is selected when the OuterHTML property in the selection panel starts with "<iframe"), then add an Execute Actions Tree action that runs the Go To Frame tree, and select your kind. There is already a kind called Frame that the Sample actions tree uses. If you just run this tree, it should navigate into the iframe in the sample page you posted, and there you'll be able to select the elements you need.
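For anyone who wants to see the same idea outside the tool, here is a minimal Python sketch of what "navigating into the iframe" amounts to: find the iframe element, read its src attribute, and resolve it against the page URL. This is not what the attached project does internally; the names IframeSrcCollector and iframe_urls, the example.com URL, and the sample markup are all made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeSrcCollector(HTMLParser):
    """Collect the src attribute of every <iframe> tag in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def iframe_urls(page_url, html):
    """Return the absolute URL of each iframe found in html."""
    collector = IframeSrcCollector()
    collector.feed(html)
    # A src may be relative, so resolve it against the page URL.
    return [urljoin(page_url, src) for src in collector.sources]


# Hypothetical markup, for illustration only.
sample_html = '<html><body><iframe src="/frame/results.html"></iframe></body></html>'
print(iframe_urls("http://example.com/results/", sample_html))
```

The page you would then scrape is the one at the returned URL, not the outer page that merely embeds it.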
Re: Problem scraping text in an article directory
Posted: Sun Mar 18, 2012 7:56 pm
by caliman
Thanks, the ezine problem is fixed. As for the other site, when I select the whole text of an article, the inner text column shows different information. This is the image.
Thank You
Re: Problem scraping text in an article directory
Posted: Tue Mar 20, 2012 6:36 am
by webmaster
I see what's going on. What you're seeing are some scripts at the top of the page. You'll need to use text gatherers here. Just select the whole thing, then go to Project -> Text Gatherers, add a Slice step, and use the following as a delimiter (make sure you remove the line break that is there by default):
Then select From last slice (keep the slice position at 0) and you should have the text on the right side.
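As a rough illustration of what a slice step like this does, here is a small Python sketch: split the gathered text on a delimiter and keep the piece counted from the end. It assumes that "From last slice" with position 0 means "the last piece after splitting", which is my reading of the option and not confirmed behaviour of the tool. The function name slice_from_last, the delimiter "<<END OF SCRIPT>>", and the sample text are placeholders, not the real values for articulo.org.

```python
def slice_from_last(text, delimiter, position=0):
    """Split text on delimiter and return a piece counted from the end.

    position=0 is the last piece, position=1 the one before it, and so on.
    (Assumed semantics; the tool's own Slice step may differ.)
    """
    pieces = text.split(delimiter)
    return pieces[-1 - position].strip()


# Placeholder delimiter and text, for illustration only.
DELIMITER = "<<END OF SCRIPT>>"
raw_inner_text = (
    "var a = 1;" + DELIMITER
    + "var b = 2;" + DELIMITER
    + "  This is the article body.  "
)

print(slice_from_last(raw_inner_text, DELIMITER))
```

Because the script text sits before the last occurrence of the delimiter, taking the last piece leaves only the article body.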
Re: Problem scraping text in an article directory
Posted: Fri Mar 23, 2012 4:42 pm
by caliman
The inner text kind is still showing the same odd content (like in the image above). Do I have to select another kind? I tried other kinds, but I cannot find any that shows the content of the article. Thanks.
Re: Problem scraping text in an article directory
Posted: Fri Mar 23, 2012 5:11 pm
by webmaster
You'd use the same kind, but for the property, choose the name you gave to the property you created with the text gatherer instead of inner text. You can also preview it by clicking the Choose active properties button in the selection panel at the bottom and selecting your property (which will start with the "JS_" prefix).
Re: Problem scraping text in an article directory
Posted: Sat Mar 24, 2012 1:41 am
by caliman
Thanks, it works perfectly.
