Introduction to web scraping: Setup

Google Chrome requirements

For the episode on visual scraping in your web browser, the following are required:

Python requirements

We will be using Anaconda for the episode on scraping in Python, as it includes all of the required packages. If you don’t wish to install Anaconda, you will need to install each requirement separately. This document only details installation with Anaconda.

Please set up your Python environment prior to the workshop. If you encounter problems with the installation procedure, ask your workshop organizers via e-mail for assistance, so you are ready to go when the workshop begins.

Recommended:

Optional (already provided with Anaconda):

Installing all requirements using Anaconda

Python is a popular general-purpose programming language, and it has libraries that help with web scraping. Installing each of the additional tools for this lesson individually can be difficult, so we recommend the all-in-one installer Anaconda.

Regardless of how you choose to install it, please make sure you install Python version 3.x (e.g., Python 3.6).
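
Once Python is installed, a quick way to confirm the version requirement is to ask the interpreter itself. The following is a minimal sketch; the helper name is_python3 is hypothetical and not part of the lesson materials.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the major version number is 3."""
    return version_info[0] == 3


# Report the version of the interpreter running this script.
print("Running Python", sys.version.split()[0])
print("Python 3.x:", is_python3())
```

If the second line prints False, the interpreter being run is not a Python 3 installation.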

Windows - Video tutorial

  1. Open http://continuum.io/downloads with your web browser.

  2. Download the Python 3.x version installer for Windows.

  3. Double-click the executable and install Python 3 using most of the default settings. The only exception is to check the "Make Anaconda the default Python" option.

macOS - Video tutorial

  1. Open http://continuum.io/downloads with your web browser.

  2. Download the Python 3.x version Graphical Installer for macOS.

  3. Install Python 3 using all of the defaults for installation.

Linux

Note that the following installation steps require you to work from the shell. If you run into any difficulties, please request help before the workshop begins.

  1. Open http://continuum.io/downloads with your web browser.

  2. Download the Python 3.x version installer for Linux.

  3. Install Python 3 using all of the defaults for installation.

    a. Open a terminal window.

    b. Navigate to the folder where you downloaded the installer.

    c. Type

    $ bash Anaconda3-
    

    and press tab. The name of the file you just downloaded should appear.

    d. Press enter.

    e. Follow the text-only prompts. When the license agreement appears (a colon will be present at the bottom of the screen), hold the down arrow until you reach the bottom of the text. Type yes and press enter to approve the license. Press enter again to approve the default location for the files. Type yes and press enter to prepend Anaconda to your PATH (this makes the Anaconda distribution the default Python).
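
On any platform, one way to see which Python has become the default is to start python from a new terminal and print the interpreter's location. The sketch below is only a heuristic: it assumes that an Anaconda or Miniconda install lives in a folder whose name contains "conda", which is common but not guaranteed. The helper name looks_like_conda is hypothetical.

```python
import sys


def looks_like_conda(path):
    """Heuristic only (an assumption): conda-based installs usually
    live in a folder whose name contains 'conda'."""
    return "conda" in path.lower()


# sys.executable is the full path of the running interpreter.
executable = sys.executable or ""
print("Interpreter:", executable)
print("Looks like a conda install:", looks_like_conda(executable))
```

If the path points somewhere else, such as a system Python, the PATH change from the installer may not have taken effect in the current terminal session.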

Optional installation method

If you’ve opted to install the requirements separately (not recommended), you will find links to them below.

With Miniconda for Python 3 installed, enter the following on the terminal command line to install all of the required packages:

conda install spyder lxml cssselect requests
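
After the install finishes, it may help to confirm that the libraries can be found by Python. The following is a minimal sketch that checks the three libraries named in the command above; the helper name missing_packages is hypothetical. Spyder is an application rather than a library used in scripts, so it is left out here and is checked simply by starting it.

```python
import importlib.util

# Library names taken from the conda command above.
REQUIRED = ["lxml", "cssselect", "requests"]


def missing_packages(names):
    """Return the names that cannot be found in this Python environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages were found.")
```

Any name reported as missing can be installed by re-running the conda command for that package.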

Starting Python

We will use the Spyder IDE, the same IDE used in the Library Carpentry Python course. If you installed Python using Anaconda, Spyder is already on your system.

To start Spyder, open a terminal and type the command:

On Windows and Linux:

$ spyder

On macOS:

$ spyder3