Broken links checker for website pages

Working with the command-line interface (CLI)

Broken links checker can be used either as a standalone Node.js application or as an npm dependency plugged into your own package.

In the first case, you should:

In the second case, simply install the project as an npm package:

$ npm install --save bs-broken-links-checker

Usage:

Using the broken-links-checker tool from the CLI consists of three steps:

  1. Generate a configuration file with the config command.
  2. Run the site analysis with the run command.
  3. View the generated *.html report file.
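The first two steps can be scripted from Node.js. The sketch below is a hypothetical helper (buildSteps and runSteps are not part of the package) that assumes it is run from the project root, where bin/blc lives. It only uses the config and run flags shown in this document; whether run needs -u when the URL is already in the configuration is an assumption to verify. Step 3 stays manual, since run prints the report paths to the console.

```javascript
// Hypothetical helper: automates CLI steps 1 (config) and 2 (run) for one host.
// Assumes the current working directory is the broken-links-checker project root.
const { spawnSync } = require('child_process');

// Builds the argument lists for the config and run steps.
// The optional url adds -u, as in the run examples of this document.
function buildSteps(host, url) {
  const configPath = `./configs/${host}.js`;
  const runArgs = ['bin/blc', 'run', '-c', configPath];
  if (url) {
    runArgs.push('-u', url);
  }
  return [['bin/blc', 'config', '-n', host], runArgs];
}

// Runs each step with the current Node binary and stops at the first failure.
function runSteps(host, url) {
  for (const args of buildSteps(host, url)) {
    const result = spawnSync(process.execPath, args, { stdio: 'inherit' });
    if (result.status !== 0) {
      throw new Error(`Step failed: node ${args.join(' ')}`);
    }
  }
}

module.exports = { buildSteps, runSteps };
```

For example, runSteps('my.site.com') would generate ./configs/my.site.com.js and then analyze the site with it.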

Commands

config

Use this command to generate a tool configuration file with the .js extension. Having a configuration file is convenient for two reasons:

Parameters:

Usage example:

$ node bin/blc config -n my.broken-site.com

Expected console output:

INFO acts/config.js: Configuration file: => my.broken-site.com.js has been generated successfully

Note: the generated configuration file my.broken-site.com.js will be placed in the ./configs folder inside the process working directory.

Configuration file structure:

The configuration file is a simple Node.js module that exports an object whose keys are option names and whose values are the option values.

Note: the concurrent option applies only to inner links. External links are always checked 100 at a time.

The excludeLinkPatterns option accepts regular expressions or string patterns (including wildcards) as values.

More examples:

module.exports = {
    // ...other options
    excludeLinkPatterns: [
        /\/contacts/,
        'http://google.com',
        'http://my.site.com/foo/*',
        '*/foo/bar'
    ]
};
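To make the mix of regular expressions and wildcard strings above more concrete, here is a hypothetical sketch of how such a list could be matched against a URL. The library's own matching rules may differ; this sketch assumes '*' matches any run of characters and that a string pattern must match the whole URL.

```javascript
// Hypothetical sketch of excludeLinkPatterns matching (not the library's code).
// Converts a wildcard string into an anchored regular expression.
function wildcardToRegExp(pattern) {
  // Escape regex metacharacters except '*', then turn each '*' into '.*'.
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped.replace(/\*/g, '.*') + '$');
}

// A URL is excluded when it matches any regular expression or wildcard string.
function isExcluded(url, patterns) {
  return patterns.some((pattern) =>
    pattern instanceof RegExp
      ? pattern.test(url)
      : wildcardToRegExp(pattern).test(url)
  );
}
```

With the patterns from the example, http://my.site.com/foo/baz and http://other.org/foo/bar are excluded, while http://my.site.com/about is processed.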

run

Launches the website analysis process to check for broken links.

Parameters:

Note:

Sometimes it is convenient to scan only one section of a website, or even a single page. You can use the mode option for this.

If the mode option is set to 'section', only the pages nested under the url option value will be scanned. For example, if the website my.site.com (whose configuration file is ./configs/my.site.com.js) has the following structure:

/
/foo
/foo/foo1
/foo/foo2
/bar

then running the command with these options:

$ node bin/blc run -c ./configs/my.site.com.js -u http://my.site.com/foo -m section

will analyze only the pages /foo, /foo/foo1 and /foo/foo2. The page /bar will be skipped.

If the mode option is set to 'page', then running:

$ node bin/blc run -c ./configs/my.site.com.js -u http://my.site.com/foo -m page

will analyze links only on the /foo page.
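The scoping rule for the two modes can be sketched as a URL predicate. This is a hypothetical helper (isInScope is not part of the package) that assumes a page belongs to a section when its path equals the section path or continues it after a '/'. It reproduces the /foo example above.

```javascript
// Hypothetical sketch of 'section' and 'page' scoping (not the library's code).
function isInScope(pageUrl, startUrl, mode) {
  const page = new URL(pageUrl);
  const start = new URL(startUrl);
  if (page.host !== start.host) {
    return false; // pages on other hosts are never part of the scan scope
  }
  // Normalize the start path by dropping trailing slashes ('/foo/' -> '/foo').
  const base = start.pathname.replace(/\/+$/, '');
  if (mode === 'page') {
    return page.pathname.replace(/\/+$/, '') === base;
  }
  if (mode === 'section') {
    return page.pathname === base || page.pathname.startsWith(base + '/');
  }
  return true; // any other mode: treat the whole website as in scope
}
```

Checking the startsWith(base + '/') form rather than startsWith(base) keeps a page such as /foobar out of the /foo section.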

Examples of run command usage:

Result of the run command:

After the run command finishes, the overall analysis results are printed to the console, together with the paths of the generated report files.

version

This command simply prints the current application version to the console. Usage example:

$ node bin/blc version

Expected console output (your version may differ):

INFO cli/cmd-version.js: Application name: => bs-broken-links-checker
INFO cli/cmd-version.js: Application version: => 0.0.1

JavaScript API

The package can be installed as a regular npm dependency.

$ npm install --save bs-broken-links-checker

To initialize the tool, create a new instance of the BrokenLinksChecker class.

var BrokenLinksChecker = require('bs-broken-links-checker').BrokenLinksChecker,
    brokenLinksChecker = new BrokenLinksChecker();

Then call the start method with your website URL as the argument, for example:

brokenLinksChecker.start('https://my.site.com');

The BrokenLinksChecker constructor takes an options object as its argument. The available option fields are described below.
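The documented defaults can be collected into one object, which helps when logging the effective settings of a scan. The sketch below is hypothetical: DEFAULT_OPTIONS and withDefaults are not exported by the package, and whether the library merges user options in this way is an assumption.

```javascript
// Defaults as listed under "Options" (hypothetical helper, not a package export).
const DEFAULT_OPTIONS = {
  concurrent: 100,
  requestHeaders: { 'user-agent': 'node-spider' },
  requestRetriesAmount: 5,
  requestTimeout: 5000,
  acceptedSchemes: ['http:', 'https:'],
  checkExternalUrls: false,
  excludeLinkPatterns: []
};

// Shallow merge: keys supplied by the user override the defaults.
function withDefaults(userOptions) {
  return Object.assign({}, DEFAULT_OPTIONS, userOptions);
}

// Usage with the real class (not executed here):
//   new BrokenLinksChecker(withDefaults({ concurrent: 20 }));
```

Because Object.assign is shallow, passing requestHeaders replaces the whole default headers object, so include 'user-agent' yourself if you still want it sent.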

Options

concurrent

Number of inner website links analyzed concurrently. The optimal value should be found empirically: if it is too low, the total analysis time increases; if it is too high, the load on your web server increases, which can cause network errors and corrupted results.

Default value: 100.

Note: this parameter applies only to inner links. External links are always checked 100 at a time.
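What the concurrent option bounds can be illustrated with a small worker pool: at most limit checks are in flight at once, and a new one starts as soon as one finishes. This is a hypothetical sketch (runWithConcurrency is not the library's code); worker stands in for a real HTTP check.

```javascript
// Hypothetical sketch of a concurrency limit like the `concurrent` option.
// Runs worker(item) for every item, with at most `limit` calls pending at once.
async function runWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  // Each runner keeps claiming the next unprocessed item until none are left.
  // Claiming happens synchronously, so two runners never take the same index.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  }

  const runners = [];
  for (let i = 0; i < Math.min(limit, items.length); i++) {
    runners.push(runner());
  }
  await Promise.all(runners);
  return results; // results keep the input order
}
```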

requestHeaders

Sets custom request headers.

Default value: { 'user-agent': 'node-spider' }.

requestRetriesAmount

Maximum number of request attempts for a single URL before it is reported as broken.

Default value: 5.

requestTimeout

Request timeout in milliseconds.

Default value: 5000.
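requestRetriesAmount and requestTimeout work together: each attempt is abandoned after the timeout, and a URL counts as broken only after every attempt has failed. The sketch below illustrates that behaviour under stated assumptions: checkUrl and withTimeout are hypothetical helpers, and request is a stand-in for a real HTTP call that rejects on failure.

```javascript
// Rejects if `promise` does not settle within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  // Clear the timer either way so it does not keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical retry loop: up to `retries` attempts, each bounded by `timeout`.
async function checkUrl(url, request, { retries = 5, timeout = 5000 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      await withTimeout(request(url), timeout);
      return { url, broken: false, attempts: attempt };
    } catch (err) {
      lastError = err; // failed or timed out: try again
    }
  }
  return { url, broken: true, attempts: retries, error: lastError };
}
```

The defaults mirror the documented values (5 attempts, 5000 ms).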

acceptedSchemes

Permitted URL schemes. Links whose URLs use any other scheme are excluded from the analysis.

Default value: ['http:', 'https:'].
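Scheme values include the trailing colon because they are compared with the protocol of a parsed URL. The hypothetical sketch below (hasAcceptedScheme is not a package function) shows such a filter using the WHATWG URL parser built into Node.js; resolving relative links against the page they appear on is an assumption of the sketch.

```javascript
// Hypothetical sketch of acceptedSchemes filtering (not the library's code).
function hasAcceptedScheme(href, pageUrl, acceptedSchemes = ['http:', 'https:']) {
  let url;
  try {
    // Relative links such as '/about' are resolved against the page URL.
    url = new URL(href, pageUrl);
  } catch (err) {
    return false; // unparsable link: treat it as not accepted
  }
  // url.protocol includes the trailing colon, e.g. 'mailto:'.
  return acceptedSchemes.includes(url.protocol);
}
```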

checkExternalUrls

Enables or disables checking of external links. If set to false, only the website's inner links are analyzed.

Default value: false.
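The effect of checkExternalUrls can be sketched as a link filter. This hypothetical example assumes a link is inner when it resolves to the same host as the start URL; the library may draw that line differently, for example around subdomains.

```javascript
// Hypothetical sketch: inner links share the start URL's host (port included).
function isInnerLink(href, startUrl) {
  return new URL(href, startUrl).host === new URL(startUrl).host;
}

// Keeps every link when checkExternalUrls is true, otherwise only inner links.
function linksToCheck(hrefs, startUrl, checkExternalUrls = false) {
  return hrefs.filter((href) => checkExternalUrls || isInnerLink(href, startUrl));
}
```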

excludeLinkPatterns

Excludes URLs matching the given patterns from processing. The value is an array of regular expressions or string patterns (including wildcards); every URL that matches any of the listed patterns is skipped. For example, to exclude pages whose URLs contain /foo or /bar, set this option to [/\/foo/i, /\/bar/i].

Default value: [].

More examples:

module.exports = {
    // ...other options
    excludeLinkPatterns: [
        /\/contacts/,
        'http://google.com',
        'http://my.site.com/foo/*',
        '*/foo/bar'
    ]
};

onDone

Callback function invoked when the analysis finishes. It receives an instance of the Statistic class, which provides the fields and methods needed to work with the scan results.

You can see usage examples here.
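If you prefer to await a scan, the onDone callback can be wrapped in a Promise. The sketch below is a hypothetical helper (checkSite is not part of the package) that relies only on the documented pieces: an options object passed to the constructor, the start method, and onDone receiving a Statistic instance. The checker class is passed in as a parameter so the helper can be exercised with a stub.

```javascript
// Hypothetical helper: resolves with the Statistic instance passed to onDone.
// Note: it replaces any onDone supplied in `options`.
function checkSite(CheckerClass, url, options = {}) {
  return new Promise((resolve) => {
    const checker = new CheckerClass(Object.assign({}, options, {
      onDone: (statistic) => resolve(statistic)
    }));
    // A synchronous throw from start() rejects the returned Promise.
    checker.start(url);
  });
}

// Usage with the real package (not executed here):
//   const { BrokenLinksChecker } = require('bs-broken-links-checker');
//   checkSite(BrokenLinksChecker, 'https://my.site.com').then((statistic) => { /* ... */ });
```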

Testing

Run the tests with istanbul coverage calculation:

$ npm test

Check code style with jshint and jscs:

$ npm run codestyle

Special thanks to:

Developer: Kuznetsov Andrey

You can send your questions and proposals to this address or create issues here.