Pyppeteer: How To Use Puppeteer in Python (Installation Guide and Examples)

Puppeteer is developed for Node.js but you can also use it to automate Chrome/Chromium in Python with Pyppeteer. We'll show you how in this article, with examples like taking screenshots, downloading images, and extracting data from a web page.
by Josephine Loo ·

Contents

    The advancements in technology and the need for increased efficiency and productivity have contributed to the rising popularity of automation. People are looking for solutions to automate monotonous tasks. Among all programming languages, Python is one of the most popular ones for automation. As it offers a simple syntax and extensive libraries that make coding easier, it is widely used to create automated tasks.

    One of the most popular browser automation libraries is Puppeteer. Although it doesn’t have official support for the Python language, the Python community has developed a library for it—Pyppeteer. In this article, we’ll learn how to use this popular browser automation library in Python, with some simple examples.

    What is Puppeteer

    Puppeteer is a Node.js library that is open-source and maintained by Google. It is developed to provide a high-level API to control Chrome/Chromium over the Chrome DevTools Protocol. Although Puppeteer is developed to run on Chromium, their collaboration with Mozilla has made cross-browser testing on Firefox available from version 2.1.0 onwards.

    Puppeteer can automate tasks that are typically performed manually by humans in a web browser. It is commonly used by developers to run website tests, generate website screenshots and PDFs, scrape web pages, automate form submissions, etc.

    While Selenium is the primary browser automation library for Python, it focuses on cross-browser automation. On the other hand, Puppeteer focuses solely on Chrome/Chromium and supports richer functionalities for this particular browser. Therefore, if you need to perform tasks that are specific to Chrome/Chromium, consider using Puppeteer instead.

    Installation

    Note: The Pyppeteer library can only be used with Python v3.6 and above. Check your Python version by running python –v or python3 -v in the terminal/command prompt. If it's below that, you will need to upgrade it.

    Open a terminal/command prompt and run the command below to install Pyppeteer:

    pip3 install pyppeteer
    

    A recent version of Chromium will be downloaded automatically when you run Pyppeteer for the first time and it might take a while. If you would like to install it manually and execute your first Pyppeteer automation faster, run the command below after the Pyppeteer installation command:

    pyppeteer-install
    

    That’s it, Pyppeteer has been installed and we can start using it to automate Chrome/Chromium in Python!

    Getting Started with Pyppeteer

    Let’s start with a simple example that launches the browser. To begin using Pyppeteer, we need to import the necessary modules (asyncio and launch) and initialize a new browser instance.

    import asyncio
    from pyppeteer import launch
    

    Then, define an async function to launch the browser and perform different actions (which we will implement later). After performing the actions, close the browser to end the automation.

    async def main():
        browser = await launch()
        page = await browser.newPage()
    
        # Perform actions on the page
    
        await browser.close()
    

    Finally, run the function.

    asyncio.get_event_loop().run_until_complete(main())
    

    As no action is performed, the browser is launched and then closed immediately and you won't see what is going on even if you launch it in the headful mode (launch(headless=True)). Next, we’ll add actions to the code to perform the tasks below:

    • Taking screenshots of a web page
    • Downloading images from a web page
    • Scraping data from a web page

    Performing Actions with Pyppeteer

    Pyppeteer provides an extensive API for performing various actions in the Chrome/Chromium browser. For the sake of simplicity, we’ll only include the code for performing actions in the following code snippets. Simply insert them into the code above to complete your script.

    Taking Screenshots of a Web Page

    Taking screenshots of a web page with Pyppeteer is easy. First, navigate to the target web page using the goto() method and then capture a screenshot of the page using screenshot(), with the file name (screenshot.png) specified.

    await page.goto('https://www.browserbear.com/blog/how-to-auto-fill-forms-with-data-from-a-csv-file-in-python-using-browserbear/')
    await page.screenshot({'path': 'screenshot.png'})
    

    Execute the script and you will find the screenshot in your project folder. Here’s what the screenshot looks like:

    screenshot of a website taken using Pyppeteer

    You can also set a specific dimension for the viewport:

    await page.setViewport({ 'width': 1280, 'height': 720 })
    await page.goto('https://www.browserbear.com/blog/how-to-auto-fill-forms-with-data-from-a-csv-file-in-python-using-browserbear/')
    await page.screenshot({'path': 'screenshot.png'})
    

    The screenshot will be taken in the dimension specified.

    screenshot of a website taken using Pyppeteer (different dimension)

    To take a full-page screenshot, set fullPage to True in the screenshot() method.

    await page.setViewport({ 'width': 1280, 'height': 720 })
    await page.goto('https://www.browserbear.com/blog/how-to-auto-fill-forms-with-data-from-a-csv-file-in-python-using-browserbear/')
    await page.screenshot({'path': 'screenshot.png'}, 'fullPage': True })
    

    The full page will be captured:

    screenshot of a website taken using Pyppeteer (full screen)

    Downloading Images from a Web Page

    Downloading images manually is unfeasible and inefficient if you’re trying to scrape them from a website that has hundreds or even ten of thousands of images, like e-commerce stores, e-commerce platforms, social media platforms, news sites, etc.

    Instead of doing it manually, use Pypetteer to automate the process. We’ll locate the images using a query selector to retrieve their URLs and then download them into a folder.

    For this task, we’ll need the requests and os modules. Import them in your code:

    import requests
    import os
    

    Then, create a function that saves an image from its URL:

    def save_image_from_url(url, file_path):
        response = requests.get(url)
        if response.status_code == 200:
            print(file_path)
            os.makedirs(os.path.dirname(file_path), exist_ok=True)
            with open(file_path, 'wb') as file:
                file.write(response.content)
                print("Image saved successfully.")
        else:
            print(f"Failed to save image. Status code: {response.status_code}")
    

    Lastly, in your code where the action is performed, add the code below:

    await page.goto('https://www.browserbear.com/blog/how-to-auto-fill-forms-with-data-from-a-csv-file-in-python-using-browserbear/')
      
    image_urls = await page.evaluate('''() => {return Array.from(document.querySelectorAll('img')).map(img => img.src)}''')
    
    for i, url in enumerate(image_urls):
        save_image_from_url(url, "images/" + url.split("/")[-1])
    

    This will go to the target web page, locate elements with the <img> tag, and extract their URLs. When the script is executed, all images on that web page will be downloaded into the images folder.

    images downloaded using Pyppeteer

    Scraping Data from a Web Page

    Scraping data from a web page works similarly to downloading images. First, we’ll locate the target HTML element using a query selector or the evaluate() function. After the elements have been located, extract text or other attributes from them.

    Let’s scrape the job titles from this job board. On the web page, identify the query selector that you need to use to retrieve the target HTML element. In this case, it’s the <a> tag.

    screenshot of the job board

    Then, add the code below to your script to retrieve the text of the <a> elements:

    await page.goto('https://playground.browserbear.com/jobs/')
    jobs = await page.evaluate('''() => {return Array.from(document.querySelectorAll('a')).map(job => job.innerHTML)}''')
    for i, job in enumerate(jobs):
        print(job + "\\n")
    

    This will scrape all job titles from the web page:

    Screenshot 2023-07-19 at 4.13.44 PM.png

    Conclusion

    Using Pyppeteer, you can utilize the powerful browser automation capabilities of Puppeteer in Python to perform various tasks. In this article, we have learned how to install and use it to automate tasks on the browser such as taking screenshots of a web page, downloading images, and extracting data from it.

    These are just some simple usages of Pyppeteer for automating tasks on the browser. Depending on the project requirement and how the website is coded, it could be more complex. For example, a website could implement security measures like CAPTCHA, rate limiting, etc. to prevent being scraped. In this case, using a browser automation tool like Browserbear that has built-in features to help you avoid getting blocked would be easier.

    If you're interested to find out how, you can register a free trial and follow these video or written tutorials to get started.

    About the authorJosephine Loo
    Josephine is an automation enthusiast. She loves automating stuff and helping people to increase productivity with automation.

    How to Generate Wedding Invites for Your Entire Guest List

    Wedding invites set the tone for the event, but they don't need to be created one by one. In this article, you'll learn three easy ways to streamline the process with Bannerbear.

    How to Instantly Create and Send Pet Adoption Certificates by Email (Free Template!)

    Pet shelters and rescues often have limited means, so automating adoption certificate creation can reduce the load on staff and free up resources for other tasks. Here's how to set up a simple nocode automation that generates and sends certificates out in one go!

    How to Automatically Create Multi-Time Zone Event Posters with Node.js and Bannerbear

    Hosting online events, workshops, or webinars for a global audience? Use Bannerbear in Node.js to automatically generate event posters with the correct time zones for different regions!

    Automate & Scale
    Your Marketing

    Bannerbear helps you auto-generate social media visuals, banners and more with our API and nocode integrations

    Pyppeteer: How To Use Puppeteer in Python (Installation Guide and Examples)
    Pyppeteer: How To Use Puppeteer in Python (Installation Guide and Examples)