Questions and answers about anything related to Helium Scraper
-
kelvincushman
- Posts: 1
- Joined: Thu Oct 13, 2011 7:50 pm
by kelvincushman » Thu Oct 13, 2011 8:07 pm
Hi, please can someone help? I need to extract some data from a site, but the email addresses are protected by JavaScript. Can you tell me how I can extract this information?
Here is the code from the page source:
<LI><A class=hnormal href="JAVASCRIPT:showemail('jackiemaria2001AtRePlAcEyahoodOtRePlAcEcodOtRePlAcEuk','State of Play');">jackiemaria2001<IMG alt="band me up" src="/FIND/pics/ngista.jpg" border=0>yahoo.co.uk</A>
dalton1990
band me uphotmail.co.uk
This is what the JavaScript is replacing:
@ = AtRePlAcE
. = dOtRePlAcE
The email that is being extracted is shown below, and it leaves the @ sign out:
dalton1990hotmail.co.uk
br kelvin
-
webmaster
- Site Admin
- Posts: 521
- Joined: Mon Dec 06, 2010 8:39 am
by webmaster » Fri Oct 14, 2011 7:40 am
Hi,
The attached project should solve your problem. I found the site you are trying to scrape and created a text gatherer that replaces the placeholder strings with @ and the dot (you can create these from Project -> Text Gatherers, and view the created one at Project -> JavaScript Gatherers). I had to edit the code of the generated JavaScript gatherer and change the first line from element.innerText to element.href so that it reads the href property instead of just the link text.
Hope this makes sense. In any case, the attached project already contains this property gatherer, called JS_Email, which you would extract from a kind that selects the A elements, such as the one in the HTML code you pasted.
Remember that you can import this project into your current project from the File -> Import command.
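As a rough illustration of the replacement described above (not the actual code of the JS_Email gatherer in the attached project), a standalone function along these lines would decode an href like the one in the pasted HTML. The function name decodeEmailFromHref and the regular expression are assumptions made for this sketch; only the AtRePlAcE and dOtRePlAcE placeholders come from the page source quoted earlier.

```javascript
// Hypothetical helper for illustration; this is NOT the code Helium Scraper generates.
// Takes the href of a link such as
//   JAVASCRIPT:showemail('userAtRePlAcEexampledOtRePlAcEcom','Name');
// and returns the decoded address, or null if no quoted argument is found.
function decodeEmailFromHref(href) {
  // Grab the first single-quoted argument passed to showemail(...).
  var match = /showemail\(\s*'([^']*)'/i.exec(href);
  if (!match) {
    return null;
  }
  return match[1]
    .split("AtRePlAcE").join("@")    // placeholder standing in for the @ sign
    .split("dOtRePlAcE").join(".");  // placeholder standing in for each dot
}

// The href from the HTML posted above.
var sample = "JAVASCRIPT:showemail('jackiemaria2001AtRePlAcEyahoodOtRePlAcEcodOtRePlAcEuk','State of Play');";
console.log(decodeEmailFromHref(sample)); // prints "jackiemaria2001@yahoo.co.uk"
```

Reading the href rather than the visible text matters here because the link text has an image where the @ sign should be, so the text alone never contains the full address.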
-
Attachments
-
- BandMeUpEmails.hsp
- (276.39 KiB) Downloaded 648 times
Juan Soldi
The Helium Scraper Team