Data scraping eCommerce sites to Google Sheets

July 8, 2021
7 minutes read
Web Scraper, Web Scraper Chrome extension, No-Code, Google Sheets


Web scraping is a fast and cost-effective way to extract large volumes of data from web pages and export it in a structured format such as a spreadsheet. With a web scraping tool, you can pull out text, images, and rich text in a matter of minutes.

With Byteline’s web scraping as a service, you can scrape data from any website without writing any code. Simply install Byteline’s Chrome extension, pick the data to be extracted, and export it to Airtable, Google Sheets, or any other Byteline integration.
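For readers curious about what a scraper like this automates under the hood, here is a minimal Python sketch of the same idea: fetch a product page and extract a title and price with CSS selectors. The URL and selectors are hypothetical placeholders, not anything Byteline generates.

```python
# Minimal sketch of what a web scraper automates: fetch a page and pull
# out fields with CSS selectors. The URL and selectors are placeholders.
import requests
from bs4 import BeautifulSoup

url = "https://www.example.com/product/123"  # hypothetical product page
response = requests.get(url, timeout=10)
response.raise_for_status()

soup = BeautifulSoup(response.text, "html.parser")
title = soup.select_one("h1.product-title")    # hypothetical selector
price = soup.select_one("span.product-price")  # hypothetical selector

print(title.get_text(strip=True) if title else "title not found")
print(price.get_text(strip=True) if price else "price not found")
```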

How to Scrape Data Using Web Scraper 

This blog walks through a few easy steps to extract data and export it to Google Sheets using Byteline’s web scraping tool. As a working example, we will source Etsy URLs from a Google Sheet, scrape each product’s title and price, and export the results to a different tab of the same sheet. We will also use loop over to scrape multiple URLs in a single run.

We have divided this article into three sections, as outlined below:

  1. Extract Data Using Google Sheets - Read Values node
  2. Configure the Web Scraper Node
  3. Export Data Using Google Sheets - Insert All node

Let’s Begin.

Extract Data Using Google Sheets - Read Values node

In this section, you will learn how to create a flow, set the scheduler, and extract data using Byteline’s 'Google Sheets - Read Values' node.  

Step 1: Click on the add button next to the FLOWS tab in the left sidebar of the console to create a new flow.  


Step 2: Assign a name to the flow. 


Step 3: Select the Scheduler node to run the scraping task at regular intervals so that any new entries are picked up.


Step 4: Click on the Edit button to configure the scheduler settings. 


Step 5: Select the time interval at which you want the scraper to run automatically. Here, we have configured the node to run every two hours.
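As an aside for readers who think in cron terms, a two-hour interval corresponds to the expression below. This is purely illustrative; Byteline’s Scheduler node is configured through the UI shown above, so you never write cron yourself.

```python
# Illustration only: a two-hour schedule expressed as a standard cron
# expression (minute 0 of every second hour). Byteline's Scheduler node
# is configured in the UI, not with cron.
CRON_EVERY_TWO_HOURS = "0 */2 * * *"
```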


Step 6: Click on the Save button to save the scheduler settings. 


Step 7: To read the URLs to scrape from a Google Sheets spreadsheet, first add the Google Sheets - Read Values node to your flow.

Simply click on the '+' button and select the Google Sheets - Read Values node.


Step 8: Click on the Edit button to configure the Google Sheets - Read Values node.


Step 9: Click on the SignIn to Google Sheets button to connect the Google account that has access to your spreadsheet.


Step 10: Click on the Allow button to confirm your choice of account. 


Step 11: Now, switch to the Byteline console and click on the Refresh button. You will see the Connected status in green on the console.


Step 12: Enter the Spreadsheet Id in the Spreadsheet Id box. 


Now, let’s find out how you can fetch the spreadsheet Id. 

How to Find the Spreadsheet ID

Step 1: Open the Google Sheet in which you want to insert the data.


Step 2: Your Spreadsheet Id is the long string between /d/ and /edit in the Google Sheet URL, as shown below:
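For a concrete example, here is a small Python sketch that pulls the Id out of a (made-up) Google Sheets URL:

```python
# Extract the spreadsheet Id from a Google Sheets URL. The Id is the
# token that follows "/spreadsheets/d/". The URL below is a made-up example.
import re

url = "https://docs.google.com/spreadsheets/d/1AbC2dEfG3hIjK4lMnO5pQrS6tUvW7xYz/edit#gid=0"
match = re.search(r"/spreadsheets/d/([a-zA-Z0-9_-]+)", url)
spreadsheet_id = match.group(1) if match else None
print(spreadsheet_id)  # -> 1AbC2dEfG3hIjK4lMnO5pQrS6tUvW7xYz
```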


Step 3: Switch to the Byteline console to enter the Spreadsheet Id in the Spreadsheet Id input box and hit the Save button.  


Step 4: The green indicator reflects that the node has been successfully configured. 


Step 5: Perform a test run by clicking on the Deploy button. After that, hit the Run button to make sure you receive the requested data from Google Sheets.


Note: Make sure to deploy the flow whenever you make any changes to it.

Step 6: Once the flow runs successfully, click on the ‘i (info)’ button to view the values read from the spreadsheet.


With the URLs sourced, you are ready to scrape them.

Configure the Web Scraper Node 

In this section, you will learn how to configure Byteline’s Web Scraper node to extract the required data.  

Step 1: Go back to the flow and add the Web Scraper node.


Step 2: Click on the Edit button to configure the Web Scraper node. 


Note: To configure the node, you need to use the Web Scraper Chrome extension. You can learn more about it from this documentation.


Step 3: Using the Chrome extension, we have selected the product title and price on the first Etsy link.

Step 4: Next, we need to configure loop over, which always requires an array field. To understand loop over basics, check out this doc. After checking the ‘loop over’ checkbox, click on the Select Variable Tool button, and you will see all the variables available in the flow.


Step 5: The Output tab shows the output of each node. We will use the output of the Google Sheets - Read Values node and copy it by simply clicking on the transformed array.


Step 6: To select the URL, click on the Select Variable Tool to the right of the URL field and click on the URL value in the selection tool. Since you are using loop over, this sets @.column0 in the URL field, which means the column0 field from each iterated object will be used, as sketched below.
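To make the @.column0 reference concrete, here is a rough Python sketch of the array shape the Read Values node produces and how the expression resolves on each iteration. The exact field names are assumptions; check the node’s info panel for the real output.

```python
# Rough sketch of how "loop over" resolves @.column0. The Read Values
# node returns (roughly) an array of row objects; "column0" is the field
# name assumed here, matching the variable shown in the selection tool.
rows = [
    {"column0": "https://www.etsy.com/listing/111"},
    {"column0": "https://www.etsy.com/listing/222"},
    {"column0": "https://www.etsy.com/listing/333"},
]

# "@" stands for the current element of the iterated array, so @.column0
# resolves to that row's URL on each pass:
for row in rows:
    url = row["column0"]
    print(url)
```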


Step 7: After saving the configuration settings, perform a test run by clicking on the Deploy button. After that, hit the Run button to make sure the node is configured successfully. 


Step 8: Once you get the success message, click on the ‘i (info)’ button to view the output. 




Web Scraper output

Now, let’s see how you can export this extracted data into the Google sheet. 

Export Data Using Google Sheets - Insert All Node 

In this section, you will learn how to export the scraped data to your Google Sheet using Byteline’s Google Sheets - Insert All node.

Step 1: Go back to the flow and add the Google Sheets - Insert All node.


Step 2: Configure the node by clicking on the Edit button. 



Step 3: Enter the Spreadsheet Id and the sheet name where you want to insert the scraped data. 




Note: Do not forget to check the Clear values from spreadsheet checkbox, so that each run clears the existing data before inserting the new values.



Step 4: For spreadsheet data, click on the Select Variable button to select the Web Scraper output. 



Note: We are using an array here because we want to add multiple rows to the Google sheet, as sketched below.
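As a rough illustration of why the node expects an array, each element becomes one spreadsheet row and each field one column. The field names and values below are made up for the example:

```python
# Illustrative only: the Insert All node writes one row per array element
# and one column per field. Field names and values here are made up.
scraper_output = [
    {"url": "https://www.etsy.com/listing/111", "title": "Ceramic Mug", "price": "$18.00"},
    {"url": "https://www.etsy.com/listing/222", "title": "Tote Bag",    "price": "$24.50"},
    {"url": "https://www.etsy.com/listing/333", "title": "Wall Print",  "price": "$12.99"},
]

# Conceptually, the three objects above become three rows of three columns:
for item in scraper_output:
    print([item["url"], item["title"], item["price"]])
```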

Step 5: After saving the configuration settings, perform a test run by clicking on the Deploy button. After that, hit the Run button to make sure the node is configured successfully. 

Step 6: Once you get the success message, click on the ‘i (info)’ button to see the output.


The output here shows that the web scraper tool has updated 3 rows and 3 columns. 

Step 7: Navigate to the Google Sheet and check the ‘Scraped Data’ tab to see the extracted data.



You have successfully imported the extracted data to your Google Sheets. 

Conclusion 

With Byteline’s Web Scraper node, you can easily extract text, rich text, or even images by following the simple steps outlined in this blog. We have shown how to source URLs from a Google Sheet, scrape the data, and export it to another tab of the same sheet. The no-code Web Scraper node lets you extract data conveniently, with no programming knowledge required.

Isn’t it amazing? Do let us know how you have used this Web Scraper tool!

Resources

  • Loop over basics: https://docs.byteline.io/article/wleqk06swt-loop-over
  • Web Scraper: https://docs.byteline.io/article/jpgei032m3-web-scrapper-configuration
  • Google Sheets Insert Values: https://docs.byteline.io/article/66iu5stfiw-google-sheets-insert-all-values


How can I use it?

This feature is generally available, and you can start using it from the Byteline Console at https://console.byteline.io/