This is the simplest way to extract data to the database. To create an Extract action, click the Extract menu item, select the Kinds you want to extract, and click OK. If more than one Kind is chosen, Helium Scraper analyzes the HTML structure to determine how to organize the extracted data into rows. Normally, elements that share the same HTML parent node share the same row in the data table.
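The row-grouping rule can be pictured with a small Python sketch. It is not Helium Scraper's actual algorithm. It assumes each selected element has already been reduced to a hypothetical `Element` record holding its Kind name, an identifier for its HTML parent node, and the extracted value, and it simply puts elements with the same parent into the same row.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Element:
    kind: str        # name of the Kind that selected this element
    parent_id: int   # hypothetical identifier of the element's HTML parent node
    value: str       # the extracted property value


def group_into_rows(elements: list[Element]) -> list[dict[str, str]]:
    """Place elements that share a parent node in the same row, keyed by Kind name.

    If two elements of the same Kind share a parent, the later one overwrites
    the earlier one in this simplified sketch.
    """
    rows: dict[int, dict[str, str]] = defaultdict(dict)
    for el in elements:
        rows[el.parent_id][el.kind] = el.value
    # Dicts keep insertion order, so rows follow the order parents were first seen.
    return list(rows.values())
```

For example, two product containers (parents 1 and 2), each holding a Title and a Price element, produce two rows with a Title column and a Price column.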
In the next window, you can modify various extraction parameters and the structure of the output table. Here is the list of parameters and their functions:
| Id Column Name |
If specified, adds an Id column with the given name to the generated table. This Id can then be gathered with the ID_{Table Name} property gatherer to keep rows in different tables consistently related to each other.
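The idea behind the Id column can be sketched with Python's `sqlite3` module standing in for the MS Access database Helium Scraper writes to. The table and column names (`Products`, `Reviews`, `ProductId`) are hypothetical, and reading `lastrowid` only imitates the value an ID_Products gatherer (following the ID_{Table Name} pattern) is described as supplying.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "Id" plays the role of the Id Column Name set on the Extract action.
conn.execute("CREATE TABLE Products (Id INTEGER PRIMARY KEY AUTOINCREMENT, Name TEXT)")
# The child table stores the parent's Id, as a gathered ID_Products value would.
conn.execute("CREATE TABLE Reviews (ProductId INTEGER, Review TEXT)")


def extract_product(name: str, reviews: list[str]) -> int:
    """Insert one product, then its reviews tagged with the product's Id."""
    cur = conn.execute("INSERT INTO Products (Name) VALUES (?)", (name,))
    product_id = cur.lastrowid  # stand-in for the gathered ID_Products value
    conn.executemany(
        "INSERT INTO Reviews (ProductId, Review) VALUES (?, ?)",
        [(product_id, r) for r in reviews],
    )
    return product_id


widget_id = extract_product("Widget", ["Great", "Too small"])
gadget_id = extract_product("Gadget", ["Works fine"])

# Joining on the shared Id recovers which review belongs to which product.
rows = conn.execute(
    "SELECT p.Name, r.Review FROM Reviews r JOIN Products p ON p.Id = r.ProductId "
    "ORDER BY p.Id, r.rowid"
).fetchall()
```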
| Column Name |
The name of the data table column into which the data will be extracted.
| Kind Name |
The Kind to be extracted.
| Property |
The Property to be extracted from the selected Kind. Only Extraction Properties will be listed. These properties can be set from the Project -> Options menu item.
| Req. Mode |
Requirement Mode. Determines whether the number of selected elements must be at least, or exactly, the amount given in the Req. Amount column.
| Req. Amount |
Required Amount. The number of elements that must be selected. If the requirement is not met, a Message Box will appear.
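Together, Req. Mode and Req. Amount amount to a single comparison. The following Python sketch is illustrative only: `ReqMode` and `requirement_met` are hypothetical names, and where Helium Scraper shows a Message Box on failure, this function just returns `False`.

```python
from enum import Enum


class ReqMode(Enum):
    AT_LEAST = "at least"
    EXACTLY = "exactly"


def requirement_met(mode: ReqMode, req_amount: int, selected_count: int) -> bool:
    """Return True if the number of selected elements satisfies the requirement."""
    if mode is ReqMode.AT_LEAST:
        return selected_count >= req_amount
    # ReqMode.EXACTLY
    return selected_count == req_amount
```

With a Req. Amount of 1, selecting three elements passes in "at least" mode but fails in "exactly" mode.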
| Unique |
Makes this column the only column, or one of several columns, that the Extract action uses to uniquely identify a row. If more than one Unique column is used, each distinct combination of values across all Unique columns serves as the row identifier.
This means that a row where Column1's value is A and Column2's value is B is considered different from a row where Column1's value is A and Column2's value is C, even though Column1's value is A in both rows.
Whenever a row is about to be extracted, the target table is queried for a row whose Unique columns match the values being extracted. If a match is found, that row is updated with the new non-Unique values if they differ from the stored ones; otherwise, the extracted row is ignored.
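The lookup-then-update behavior can be sketched in Python, with `sqlite3` standing in for the Access database. This is an illustration, not Helium Scraper's implementation. It assumes that a row with no match is inserted, and the `extract_row` helper and its "inserted"/"updated"/"ignored" results are hypothetical. The usage lines replay the Column1/Column2 example above.

```python
import sqlite3


def extract_row(conn: sqlite3.Connection, table: str, row: dict, unique: list[str]) -> str:
    """Insert, update, or ignore `row` depending on its Unique column values.

    Table and column names are interpolated into SQL, so they must come from
    trusted configuration. NULL Unique values never match with `=` here.
    """
    non_unique = [c for c in row if c not in unique]
    where = " AND ".join(f"{c} = ?" for c in unique)
    key = [row[c] for c in unique]
    select_cols = ", ".join(non_unique) if non_unique else "1"
    stored = conn.execute(f"SELECT {select_cols} FROM {table} WHERE {where}", key).fetchone()
    if stored is None:
        # No row with these Unique values yet (assumed behavior: insert it).
        cols = ", ".join(row)
        marks = ", ".join("?" for _ in row)
        conn.execute(f"INSERT INTO {table} ({cols}) VALUES ({marks})", list(row.values()))
        return "inserted"
    new_values = tuple(row[c] for c in non_unique)
    if not non_unique or tuple(stored) == new_values:
        return "ignored"  # nothing differs from what is already stored
    assignments = ", ".join(f"{c} = ?" for c in non_unique)
    conn.execute(f"UPDATE {table} SET {assignments} WHERE {where}", [*new_values, *key])
    return "updated"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Items (Column1 TEXT, Column2 TEXT, Price TEXT)")
key_cols = ["Column1", "Column2"]
outcomes = [
    extract_row(conn, "Items", {"Column1": "A", "Column2": "B", "Price": "5"}, key_cols),
    extract_row(conn, "Items", {"Column1": "A", "Column2": "C", "Price": "5"}, key_cols),
    extract_row(conn, "Items", {"Column1": "A", "Column2": "B", "Price": "5"}, key_cols),
    extract_row(conn, "Items", {"Column1": "A", "Column2": "B", "Price": "6"}, key_cols),
]
```

The (A, C) row is inserted as a separate row because the combination of Unique values differs. Repeating (A, B) with an unchanged Price is ignored, and changing its Price updates the stored row.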
| Download |
If checked, Helium Scraper will try to download the resource located at the value of the selected Property. For example, setting the Property column to SrcAttribute will download the images if the selected Kind selects image elements. The value stored in the data table will be the name of the downloaded file instead of the value of the selected Property. The Downloads Folder can be set from Project -> Options in the main menu.
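A rough Python sketch of what a Download column does with each value: fetch the resource at the URL, save it in the downloads folder, and keep the file name as the cell value. The naming rule (last segment of the URL path) and the `download_cell` helper are hypothetical; Helium Scraper's own file naming and collision handling are not described here. The `fetch` parameter lets the usage example run without network access.

```python
import os
import tempfile
from typing import Callable
from urllib.parse import urlparse
from urllib.request import urlopen


def default_fetch(url: str) -> bytes:
    """Fetch the resource over the network with the standard library."""
    with urlopen(url) as resp:
        return resp.read()


def download_cell(url: str, downloads_folder: str,
                  fetch: Callable[[str], bytes] = default_fetch) -> str:
    """Save the resource at `url` and return the file name to store in the column.

    The naming rule (last path segment of the URL) is a hypothetical choice;
    an existing file with the same name is overwritten.
    """
    name = os.path.basename(urlparse(url).path) or "download"
    os.makedirs(downloads_folder, exist_ok=True)
    with open(os.path.join(downloads_folder, name), "wb") as f:
        f.write(fetch(url))
    return name


# Usage with a stand-in fetcher, so the example runs without network access.
demo_dir = tempfile.mkdtemp()
stored_value = download_cell("http://example.com/img/logo.png?x=1", demo_dir,
                             fetch=lambda u: b"PNGDATA")
```

Here the table would receive `logo.png` rather than the image's URL.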
| Data Type |
Sets the Data Type of the column to be created. These are the available options:
| Custom DT |
Used when the Data Type field is set to Custom. Lets you use any Jet SQL (MS Access) data type. When used, the Max Length field is ignored, so any parameters must be set in the Custom DT field itself.
| Max Length |
Used with the Text and Numeric data types. See the Data Type field for details.
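The interplay between Data Type, Custom DT, and Max Length can be illustrated by building a Jet SQL column clause in Python. This is a sketch under assumptions: `column_definition` is a hypothetical helper, and only the Custom and Text cases are shown, because the exact mapping for Numeric and the other Data Type options is not spelled out here.

```python
from typing import Optional


def column_definition(name: str, data_type: str,
                      max_length: Optional[int] = None, custom_dt: str = "") -> str:
    """Build one column clause of a Jet SQL CREATE TABLE statement (sketch only)."""
    if data_type == "Custom":
        # Max Length is ignored; parameters must already be part of custom_dt.
        return f"[{name}] {custom_dt}"
    if data_type == "Text":
        # Max Length becomes the TEXT size, as in TEXT(50).
        return f"[{name}] TEXT({max_length})" if max_length else f"[{name}] TEXT"
    raise ValueError(f"data type not covered by this sketch: {data_type}")
```

For instance, a Custom column with Custom DT set to `TEXT(100)` yields `[Notes] TEXT(100)` even if Max Length holds some other value.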
If the Simulate option is checked, no extraction will be performed, but information will be stored in the Virtual Tree as usual, except that elements "inserted" in the data table will have a NULL ID.
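The effect of Simulate on IDs can be sketched in Python: the record is still logged, but nothing is written and its ID is `None` (NULL). The `extract` helper, the `Items` table, and the list standing in for the Virtual Tree are hypothetical, with `sqlite3` standing in for the Access database.

```python
import sqlite3
from typing import Optional


def extract(conn: sqlite3.Connection, tree: list, name: str, simulate: bool) -> Optional[int]:
    """Record an extracted element; write it to the table only when not simulating."""
    if simulate:
        row_id = None  # nothing is written, so the element gets a NULL ID
    else:
        row_id = conn.execute("INSERT INTO Items (Name) VALUES (?)", (name,)).lastrowid
    tree.append({"name": name, "id": row_id})  # stand-in for the Virtual Tree entry
    return row_id


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Items (Id INTEGER PRIMARY KEY, Name TEXT)")
virtual_tree: list = []
simulated_id = extract(conn, virtual_tree, "Widget", simulate=True)
real_id = extract(conn, virtual_tree, "Widget", simulate=False)
```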
This action cannot contain child nodes.