video-scraper-core

An npm package that provides an abstract class to scrape videos with Puppeteer.

Install

To install video-scraper-core, run:

$ npm install video-scraper-core

This module was written because videos hosted on some websites are difficult to download and can only be watched in the browser. Even with browser tools, downloading the video is sometimes difficult or impossible. One solution that always works is to record the screen while the video plays, but doing that manually is too time-consuming.

This is why I wrote this module, which uses puppeteer and puppeteer-stream under the hood to open a Google Chrome browser, play the video and record it.

The module is written in TypeScript, uses Webpack to reduce the bundle size (even though most of the size comes from the Puppeteer browser), uses euberlog for scoped debug logging, and is highly configurable.

How does it work

The module provides an abstract class that you can extend to create your own scraper. By overriding some simple methods, you can adapt the scraper to your needs.
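
To illustrate the idea of extending an abstract class and overriding a few methods, here is a minimal sketch of the template method pattern. It is not the code of video-scraper-core: MiniScraper, ExampleScraper and plan are hypothetical names made up for this example, and the sketch only builds a list of step descriptions instead of driving a browser.

```typescript
// Hypothetical, simplified stand-in for an abstract scraper class.
// It does not use Puppeteer; it only describes what it would do.
abstract class MiniScraper {
    // Subclasses must say where the player controls are.
    protected abstract getPlayButtonSelector(): string;
    protected abstract getVideoDurationSelector(): string;

    // Optional hook: the default implementation adds no extra steps.
    protected afterPageLoaded(_url: string): string[] {
        return [];
    }

    // The template method: the order of the steps is fixed here,
    // while the details come from the overridden methods.
    public plan(url: string): string[] {
        return [
            `load ${url}`,
            ...this.afterPageLoaded(url),
            `read duration from ${this.getVideoDurationSelector()}`,
            `click ${this.getPlayButtonSelector()}`
        ];
    }
}

// A concrete scraper only supplies the site-specific pieces.
class ExampleScraper extends MiniScraper {
    protected getPlayButtonSelector(): string {
        return '.vjs-play-control';
    }
    protected getVideoDurationSelector(): string {
        return '.vjs-time-range-duration';
    }
    protected afterPageLoaded(_url: string): string[] {
        return ['type passcode', 'submit login form'];
    }
}

const examplePlan = new ExampleScraper().plan('https://videourl.com');
console.log(examplePlan.join('\n'));
```

The base class owns the sequence, so a new website only needs a small subclass with its own selectors and, if required, its own login steps.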

The scraper:

1. launches a browser window
2. loads the page at the given URL
3. calls the afterPageLoaded hook, where you can perform extra steps such as a login
4. parses the video duration, unless the options already specify it
5. clicks the fullscreen button, if the options request fullscreen
6. clicks the play button
7. waits for an optional delay specified by the options
8. starts recording the browser window
9. waits until the video finishes or the specified duration is reached
10. waits for an optional delay specified by the options
11. closes the browser window and saves the video
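
The duration and delay steps above can be sketched as follows. This is only an illustration under assumptions: parseDurationLabel, resolveWaitMs, SketchOptions and its fields durationMs and delayAfterMs are hypothetical names and are not taken from the module's API. The sketch also assumes that the player shows the duration as a label such as "12:34" or "1:02:03".

```typescript
// Hypothetical options for this sketch; the real option names may differ.
interface SketchOptions {
    durationMs?: number;   // explicit duration, skips the parsing step
    delayAfterMs?: number; // optional delay after the video has finished
}

// Converts a label such as "1:02:03" or "12:34" to milliseconds.
function parseDurationLabel(label: string): number {
    const text = label.trim();
    // Accept "ss", "mm:ss" or "hh:mm:ss" made of digits only.
    if (!/^\d+(:\d{1,2}){0,2}$/.test(text)) {
        throw new Error(`Unrecognized duration label: "${label}"`);
    }
    // Fold from the left: ((hours * 60) + minutes) * 60 + seconds.
    const seconds = text
        .split(':')
        .reduce((total, part) => total * 60 + Number(part), 0);
    return seconds * 1000;
}

// Time to keep recording: the explicit duration if given, otherwise the
// parsed one, plus the optional delay after the video has finished.
function resolveWaitMs(options: SketchOptions, durationLabel: string): number {
    const durationMs = options.durationMs ?? parseDurationLabel(durationLabel);
    return durationMs + (options.delayAfterMs ?? 0);
}

console.log(resolveWaitMs({ delayAfterMs: 2000 }, '0:10'));
```

When durationMs is set, the label is never parsed, which mirrors the step "parses the video duration, unless the options already specify it".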

Project usage

An example that creates a scraper for TumConf:

import { VideoScraperCore, ScrapingOptions, BrowserOptions } from 'video-scraper-core';
import { Page } from 'puppeteer';
import { Logger } from 'euberlog';

// Extend VideoScraperCore to create the scraper class
export class TumConfScraper extends VideoScraperCore {
    // The passcode used to log in
    private readonly passcode: string;

    // The constructor that allows the passcode to be specified
    constructor(passcode: string, browserOptions: BrowserOptions) {
        super(browserOptions);
        this.passcode = passcode;
    }

    // The selector of the full screen button
    protected getFullScreenSelector(): string {
        return '.vjs-fullscreen-toggle-control-button';
    }
    // The selector of the play button
    protected getPlayButtonSelector(): string {
        return '.vjs-play-control';
    }
    // The selector of the video time duration
    protected getVideoDurationSelector(): string {
        return '.vjs-time-range-duration';
    }

    // After the page is loaded, log in by using puppeteer
    protected async afterPageLoaded(_options: ScrapingOptions, page: Page, logger: Logger): Promise<void> {
        logger.debug('Putting the passcode to access the video');
        await page.waitForSelector('input#password');
        await page.$eval(
            'input#password',
            (el: HTMLInputElement, passcode: string) => (el.value = passcode),
            this.passcode
        );

        logger.debug('Clicking the button to access the video');
        await page.waitForSelector('.btn-primary.submit');
        await page.$eval('.btn-primary.submit', (button: HTMLButtonElement) => button.click());
    }
}

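async function main() {
    // Create an instance of the scraper
    const scraper = new TumConfScraper('mypasscode', { debug: true });
    // Launch the Chrome browser
    await scraper.launch();
    try {
        // Scrape and save the video
        await scraper.scrape('https://videourl.com', './saved.webm');
    } finally {
        // Close the browser, even if the scraping failed
        await scraper.close();
    }
}
// Report failures instead of leaving the promise rejection unhandled
main().catch(error => {
    console.error(error);
    process.exitCode = 1;
});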

API

The documentation site is: video-scraper-core documentation

The development documentation site is: video-scraper-core dev documentation