Hello,
I'm really amazed by this program, it helps me a lot!
I don't want to download all the photos from a page, just the one type I registered in Kinds (_F.jpg in my case).
How can I tweak the project to get only one photo per page, please? I don't want the duplicate lines that the program creates automatically for each photo downloaded.
Any advice, please?
Best
extract more photos than i want, and duplicate lines
- Posts: 6
- Joined: Fri Mar 23, 2012 1:07 pm
- Attachments
- Win [En fonction].jpg (164.81 KiB) Viewed 8983 times
Re: extract more photos than i want, and duplicate lines
Hi,
Well, the kind is definitely selecting all the photos. Is the "_F" photo different in any particular way from the rest of them? Perhaps you could try going to Project -> Options -> Select Property Gatherers, selecting all under the Kind Defining tab, and then creating your photo kind again.
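If the kind still matches every photo, another option is to filter the extracted picture URLs afterwards, outside the program. Here is a minimal Python sketch of that idea. The function name and the sample URLs are made up for illustration, and this is not part of Helium Scraper itself; it only assumes you have the URLs as a list of strings.

```python
def keep_only_suffix(urls, suffix="_F.jpg"):
    """Return only the URLs whose file name ends with the given suffix.

    The match is case-sensitive, so "_f.jpg" would not count as "_F.jpg".
    """
    return [url for url in urls if url.endswith(suffix)]


# Hypothetical URLs, as they might come out of an exported table.
sample_urls = [
    "http://example.com/img/101_F.jpg",
    "http://example.com/img/101_B.jpg",
    "http://example.com/img/101_S.jpg",
]
front_only = keep_only_suffix(sample_urls)
```

With this you keep a single "_F" picture per page and can drop the lines that belong to the other pictures.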
If the only reason why you don't want to download the other pictures is to prevent duplicated lines, then you can also extract them to another table and relate them by an Id. Assuming your main table (the one you are not downloading the pictures to) is called "Table1", you'd add a column to your "Pictures" table (the one you'd extract the photos to) that extracts the ID_Table1 property from the BODY kind. Then you'd place the Extract action that extracts to the "Pictures" table right underneath the one that extracts to the "Table1" table. This will keep both tables related and prevent duplicated lines while letting you download more than one picture per page.
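To picture what the two related tables look like, here is a rough sketch in Python with an in-memory SQLite database. It only illustrates the data layout; it is not how Helium Scraper stores data internally. The table names "Table1" and "Pictures" and the ID_Table1 column come from the description above, while the page_text and picture_url columns, the function name, and the sample data are assumptions for the example.

```python
import sqlite3


def build_related_tables(pages):
    """Store each page once in Table1 and each picture once in Pictures.

    pages is a list of (page_text, [picture_urls]) tuples.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Table1 (ID_Table1 INTEGER PRIMARY KEY, page_text TEXT)"
    )
    conn.execute(
        "CREATE TABLE Pictures ("
        "ID_Table1 INTEGER REFERENCES Table1(ID_Table1), picture_url TEXT)"
    )
    for page_text, picture_urls in pages:
        # One line per page in the main table, no matter how many pictures.
        cur = conn.execute(
            "INSERT INTO Table1 (page_text) VALUES (?)", (page_text,)
        )
        page_id = cur.lastrowid
        # Each picture gets its own line, linked back by ID_Table1.
        conn.executemany(
            "INSERT INTO Pictures (ID_Table1, picture_url) VALUES (?, ?)",
            [(page_id, url) for url in picture_urls],
        )
    return conn


conn = build_related_tables([
    ("first page", ["1_F.jpg", "1_B.jpg"]),
    ("second page", ["2_F.jpg"]),
])
main_rows = conn.execute("SELECT COUNT(*) FROM Table1").fetchone()[0]
picture_rows = conn.execute("SELECT COUNT(*) FROM Pictures").fetchone()[0]
```

The main table ends up with one line per page, and the pictures can be looked up later by joining on ID_Table1.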
Juan Soldi
The Helium Scraper Team
- Posts: 6
- Joined: Fri Mar 23, 2012 1:07 pm
Re: extract more photos than i want, and duplicate lines
Great, thank you!
I'll go for double scraping, one for text, the other for images.
But another thing bothers me.
When scraping text from the central part of the page I usually have to select different parts of it. In the example above I have: "citation", "resumé", "notes" and "bio".
I would like to have the entire text wrapped together, instead of 4 different columns. Since I'm scraping thousands of pages, this would be a great help.
Is there any way to get it like that, please?
Best
Re: extract more photos than i want, and duplicate lines
Do you need the text to show up on 4 columns or all in one column? If one column, try selecting any of these items and clicking on the Select Parent button at the bottom until the whole text is selected and then creating your kind.
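If the four columns have already been scraped, they could also be merged afterwards instead of scraping again. Below is a small Python sketch of that post-processing step. The column names are taken from your example, the function name and sample row are made up, and it assumes each exported line is available as a dictionary; it is not a Helium Scraper feature.

```python
def merge_text_columns(row, columns=("citation", "resumé", "notes", "bio"),
                       separator="\n\n"):
    """Join the given columns of one row into a single block of text.

    Missing or empty columns are skipped so no blank gaps are left behind.
    """
    parts = [(row.get(name) or "").strip() for name in columns]
    return separator.join(part for part in parts if part)


# Hypothetical row, as it might look after exporting the table.
sample_row = {
    "citation": "Some citation",
    "resumé": "A short summary",
    "notes": "",
    "bio": "About the author",
}
full_text = merge_text_columns(sample_row)
```

Here the empty "notes" column is left out and the other three parts end up in one text field, separated by blank lines.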
Juan Soldi
The Helium Scraper Team