Use Case

Scrape job board to Airtable

CHECK OUT THE BYTELINE FLOW

Here is the Byteline flow used for this use case. You can clone it by following the link.
How to scrape job boards to Airtable?
1
Byteline flow runs on a scheduler
Byteline flow runs in the cloud at the scheduled time. You can configure the scheduled time based on your requirements.
2
Scrape job board using Byteline Web Scraper
Jobs are scraped from ZipRecruiter using Byteline's Web Scraper and its Chrome extension.
3
Jobs to Airtable
Jobs are then sent to Airtable using Byteline's Airtable integration, which handles inserts, updates, and deletes from a single task configuration. Record matching works by designating primary-key columns.

Step by Step Instructions

Byteline step by step blog

Byteline provides an advanced Airtable integration that enables you to insert, update, and delete records without any coding. In this documentation, we are building a job board in Airtable using the Scheduler Trigger, Web Scraper, and Airtable - Update Records nodes.

Here we will configure the following nodes to create the Job Board:

Scheduler Trigger Node - First, we’ll configure the Scheduler node to run the flow at regular intervals.
Web Scraper Node - Next, we’ll configure the Web Scraper node to extract data from a webpage. Here, we will scrape data from the ZipRecruiter site.
Airtable - Update Records Node - Lastly, we’ll configure the Airtable node to store the scraped data, creating the job board.

Let’s get started.


Create flow

In this section, you’ll learn to create the flow. For more details, you can check How to Create your First Flow Design.

 

Step 1: Enter the flow name to create your new flow.




Step 2: Select the Scheduler trigger node to initiate your flow. 


Now, you need to configure the Scheduler node to run the flow at regular intervals.


So, let’s get started with Scheduler node configuration! 

Configuring Scheduler

Step 1: Click on the edit button to configure the scheduler node. 



We will keep the default values for the Scheduler.





The Scheduler configuration is now complete. Now we need to configure the Web Scraper. 

For more information, read how to configure the Web Scraper.


Configuring Web Scraper

Step 1: Click on the add button to view the nodes in the select node window. 




Step 2: Select the Web Scraper node to add it to your flow. 




Step 3: Click on the Edit button to configure the Web Scraper node. 




Step 4: Launch ZipRecruiter in a new browser tab and enable the Byteline Web Scraper Chrome extension




Here, we are scraping fields such as the job title, company name, job link, and location from the ZipRecruiter website.


For title


Step 1: Double-click on the title to select the job title you would like to scrape.




Step 2: Select the Text option to specify the data type for scraping. 



Step 3: Click on Repeating Elements to scrape multiple job titles across the web page. We use this option because multiple jobs are scraped from this page.


 



The Web Scraper will automatically copy the data to the clipboard. 


Step 4: In the Web Scraper configuration window, click on Paste from the Chrome Extension to paste the scraped data.




Step 5: Enter the Array Name to specify the JSON array from which you want to fetch elements.
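As an illustration of what the Array Name refers to, the scraped data is a JSON document whose top-level key names an array with one object per scraped job. The payload below is a made-up example (the exact field names come from your own configuration, not from Byteline):

```python
import json

# Hypothetical payload: the Array Name ("jobs" here) is the key of the JSON
# array that groups one object per repeating element on the page.
payload = json.loads("""
{
  "jobs": [
    {"title": "Backend Engineer", "company": "Acme",   "link": "https://example.com/job/1"},
    {"title": "Data Analyst",     "company": "Globex", "link": "https://example.com/job/2"}
  ]
}
""")

array_name = "jobs"            # the Array Name entered in the configuration
records = payload[array_name]  # one record per scraped job
print(len(records))            # 2
```

Each element of this array later becomes one row in Airtable.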

 

Step 6: Enter Title in the field; its XPath is automatically filled in for scraping the job title.



For Company Name


Step 1: Double-click on the company name to select it for scraping.




Step 2: Select the Text option to specify the data type for scraping. 




Step 3: Click on Repeating Elements to scrape multiple company names across the web page.



The Web Scraper will automatically copy the data to the clipboard.


Step 4: In the Web Scraper configuration window, click on Paste from the Chrome Extension to paste the scraped data.



Step 5: Enter Company in the field; its XPath is automatically filled in for scraping the company name.


For Link


Step 1: Double-click on the job title (which contains the hyperlink) to select the link for scraping.




Step 2: Select the Link option to specify the data type.


Note: You can also preview the link. 




Step 3: Click on Repeating Elements to scrape multiple job links across the web page.




Note: The Web Scraper will automatically copy the data to the clipboard.


Step 4: In the Web Scraper configuration window, click on Paste from the Chrome Extension to paste the scraped data.


Step 5: Enter Link in the field; its XPath is automatically filled in for scraping the link.

For Location


Step 1: Double-click on the location to select the company location for scraping.





Step 2: Select the Text option to specify the data type for scraping. 




Step 3: Click on Repeating Elements to scrape multiple company locations across the web page.




Note: The Web Scraper will automatically copy the data to the clipboard. 


Step 4: In the Web Scraper configuration window, click on Paste from the Chrome Extension to paste the scraped data.



Step 5: Enter Location in the field; its XPath is automatically filled in for the location.



Step 6: Enter the web page URL in the field to scrape the web page data. 




Step 7: Once you’re done with the configuration, click on the Save button. 



You have now configured the Web Scraper to scrape the required job details.
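The "Repeating Elements" idea can be sketched outside the extension: a single selector is matched against every job card on the page, and each match yields one scraped record. The markup and class names below are invented for illustration and do not reflect ZipRecruiter's actual HTML:

```python
import xml.etree.ElementTree as ET

# A minimal stand-in for a job listing page (real markup differs).
page = """
<div>
  <article class='job'><h2>Backend Engineer</h2><span class='co'>Acme</span></article>
  <article class='job'><h2>Data Analyst</h2><span class='co'>Globex</span></article>
</div>
"""

root = ET.fromstring(page)
# A repeating-elements selector matches every job card, not just the first;
# each matched card becomes one record.
jobs = [
    {"title": card.find("h2").text, "company": card.find("span").text}
    for card in root.findall(".//article[@class='job']")
]
print(jobs)
```

This is the same shape of output the Chrome extension copies to your clipboard: one object per repeated element.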

After configuring the flow, perform a test run to make sure the Web Scraper task works.

Run 

Click on the Test Run button to test the flow. 



Now, click on the 'i' (more information) button on the top-right corner of the Web Scraper node to check the data content extracted. 


You will see a SUCCESS window as shown in the snippet below: 




Your Web Scraper node has been configured successfully.  


Airtable - Update Records

Connect your Airtable account with the Airtable - Update Records node to fetch job proposal details for creating the job board. For more, check out our documentation on how to configure Airtable - Update Records.


Follow the steps below to set up the Airtable - Update Records node.

Create 

In this section, you’ll learn how to create the flow. 


Step 1: Click on the add button to view the nodes in the select node window. 




Step 2: Select the Airtable - Update records node to add it to your flow. 



Configure

In this section, you’ll learn how to configure the Airtable - Update Records node. 


Click on the Edit button to configure the Airtable - Update Records node. 




After that, you will need to configure the following in the Airtable configuration window to complete the configuration process: 


  • Airtable Base ID
  • Airtable Table Name
  • Grid View
  • Mapping Airtable Columns Data 
  • Advanced Sections


Airtable Base ID


Step 1: Click on the here link, as shown in the snippet below, to be redirected to your Airtable account, where you can fetch your Airtable Base ID.


Step 2: Select your base from the Airtable API to view your API documentation. This will allow you to fetch your Airtable Base ID. 



Step 3: Copy the Base ID as shown in the snippet below. In this case, the Base ID is "appqwypQsXdide6vu".



Step 4: Paste the copied string into the Base ID field of the Airtable configuration window.
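Airtable base IDs start with the prefix "app" (table IDs start with "tbl"), and current base IDs such as the "appqwypQsXdide6vu" above are 17 alphanumeric characters; the length check below is an assumption based on that observation. A quick sanity check like this can catch pasting the wrong string:

```python
# Sketch of a sanity check for an Airtable Base ID before saving the
# configuration. The 17-character length is an assumption from observed IDs.
def looks_like_base_id(value: str) -> bool:
    return value.startswith("app") and len(value) == 17 and value.isalnum()

print(looks_like_base_id("appqwypQsXdide6vu"))  # True  (the ID from this guide)
print(looks_like_base_id("tblqwypQsXdide6vu"))  # False ("tbl" prefixes a table ID)
```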

Airtable Table Name


Step 1: Go to the base in your Airtable and note the name of the required table.



Step 2: Enter the table name in the table name field of the configuration window.

Step 3: Tick the Loop Over checkbox to enable iteration over an array.

 

Step 4: Select the array from the dropdown menu to iterate over.


For example, here we are selecting the web_scraper array to loop over. In the next section, we will configure each of the records that the Web Scraper retrieves as output.
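Conceptually, looping over the web_scraper array means the node visits each element and maps it onto the Airtable columns configured in the next section. This is a sketch of that behavior, not Byteline's implementation; the column names mirror this walkthrough:

```python
# Hypothetical Web Scraper output: the array selected for "loop over".
web_scraper = [
    {"title": "Backend Engineer", "company": "Acme",
     "link": "https://example.com/job/1", "location": "Remote"},
    {"title": "Data Analyst", "company": "Globex",
     "link": "https://example.com/job/2", "location": "New York, NY"},
]

# Each array element becomes one Airtable row via the column mapping.
rows = [
    {"Title": item["title"], "Company": item["company"],
     "Link": item["link"], "Location": item["location"]}
    for item in web_scraper
]
print(len(rows))  # 2
```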


 

 

Mapping Airtable Columns Data


In this section, we will map each Airtable column to its respective output value. Byteline pre-populates the field names for which data needs to be configured.


If any field is missing, you can add it using the + button.


To delete an unwanted column, click on the delete button next to the field.

Configure the mapping 

You can configure each selected Airtable field by mapping it to the respective data value.


Step 1: Click on the Selector button to fetch the array path in the Value field of the Link column.


Repeat the above step for all the other Fields as shown in the snippet below:



 

 

Advanced

Click on the Collapse button to explore all the advanced options for managing your Airtable data, which include:


  • Filter Existing Items
  • Updates
  • Deleted at Source



Filtering Existing Items


Filter the existing Airtable records that you want to update or delete.


Select the None option if you don't want to apply the filter to the Airtable records.


 

Updates

To manage the existing records, you can select one of the following update strategies:


  • Skip Updates
  • Overwrite
  • Overwrite When


Select the Overwrite option to update the existing data records. 


Deleted at Source

When a record has been deleted from the source data but still exists in your Airtable, you can decide how to handle it.


If you select the Ignore option, records deleted at the source are left untouched in Airtable.


Assign Primary Key

Hover over any of the field names, such as link, title, or company, and select the key option to make that field a primary key.

The primary key uniquely identifies a job record. Every time the flow executes, records are matched on the primary-key fields and updated accordingly.






In this configuration, we’ll assign primary keys to three fields: title, location, and company.
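The matching described above can be sketched as follows. This is an illustration of the idea, not Byteline's actual implementation: scraped records and existing rows are compared on the primary-key fields, which decides what gets inserted, updated (under the Overwrite strategy), or flagged as deleted at the source:

```python
# Sketch of primary-key matching between a scrape run and the Airtable table.
def sync(existing, scraped, keys=("title", "location", "company")):
    pk = lambda rec: tuple(rec[k] for k in keys)
    old = {pk(r): r for r in existing}
    new = {pk(r): r for r in scraped}
    inserts = [new[k] for k in new if k not in old]                    # brand-new jobs
    updates = [new[k] for k in new if k in old and new[k] != old[k]]   # Overwrite strategy
    deletes = [old[k] for k in old if k not in new]                    # "Deleted at Source"
    return inserts, updates, deletes

existing = [{"title": "Data Analyst", "location": "New York, NY",
             "company": "Globex", "link": "https://example.com/old"}]
scraped = [
    {"title": "Data Analyst", "location": "New York, NY",
     "company": "Globex", "link": "https://example.com/new"},
    {"title": "Backend Engineer", "location": "Remote",
     "company": "Acme", "link": "https://example.com/job/1"},
]
ins, upd, dele = sync(existing, scraped)
print(len(ins), len(upd), len(dele))  # 1 1 0
```

Because the Data Analyst record matches on all three key fields, only its changed link is updated; the Backend Engineer record is new, so it is inserted.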

Click on the Save button to save the configuration.



You can now scrape the required job details.


After the configuration of the flow, you will need to deploy it. 

Deploy

Click on the Deploy button on the top right corner of the interface to deploy the flow. 


Run

Click on the Test Run button to test the flow. 




Now, click on the 'i' (more information) button on the top-right corner of the Web Scraper node to check the data content extracted. 



You have successfully scraped the job board from ZipRecruiter to Airtable.

If you have any doubts, feel free to connect with us.