Crawling Slower Sites
At times, some pages in your project may return status codes like '403 Forbidden', '400 Bad Request', or '500 Internal Server Error'. These pages may appear under the 'Resources with 4xx or 5xx status codes' section even though the URLs are valid and load fine in a browser.
This may happen with large or slow websites hosted on servers with strict security or crawl budget restrictions. To reduce the load and get valid results, adjust the crawling settings in WebSite Auditor to limit the number and speed of requests sent to the site.
When creating or rebuilding a project, enable Expert Options and click Next.
In Robots.txt Instructions, you can enable the option to crawl the website with a specific user-agent (browser). When the option is unchecked, a random user-agent is used.
In the Speed section, you can limit the number of requests sent to the website to prevent excessive load on the server. We recommend setting the limit to 1 request per 5 seconds if the server tends to block requests.
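The effect of these two settings can be sketched in Python: a fetcher that identifies itself with a fixed browser-like user-agent and waits at least 5 seconds between consecutive requests. This is only an illustration of the behavior described above, not WebSite Auditor's actual implementation; the user-agent string is a placeholder.

```python
import time
import urllib.request

# Placeholder browser-like user-agent (not WebSite Auditor's own).
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
MIN_INTERVAL = 5.0  # at most 1 request per 5 seconds

class ThrottledFetcher:
    def __init__(self, user_agent=USER_AGENT, min_interval=MIN_INTERVAL):
        self.user_agent = user_agent
        self.min_interval = min_interval
        self._last_request = 0.0

    def _wait(self):
        # Sleep just long enough to keep at least min_interval
        # between consecutive requests.
        elapsed = time.monotonic() - self._last_request
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last_request = time.monotonic()

    def fetch(self, url):
        # Throttle, then send the request with the fixed user-agent,
        # which some strict servers require to return a 200 instead of 403.
        self._wait()
        request = urllib.request.Request(
            url, headers={"User-Agent": self.user_agent}
        )
        with urllib.request.urlopen(request) as response:
            return response.status, response.read()
```

Servers that throttle aggressive crawlers usually stop returning 4xx/5xx codes once requests arrive at this slower, browser-like pace.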
These settings can be accessed and modified at any time under Preferences > Crawler Settings before rebuilding an existing project.
Additionally, you can limit the number of tasks WebSite Auditor runs simultaneously. Go to Preferences > Misc Global Settings and set the Number of Tasks field to 1-2.
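The idea behind capping simultaneous tasks can be illustrated with a short Python sketch using a worker pool; the crawl_task function here is hypothetical stand-in work, not the program's internals. With at most 2 workers, no more than 2 tasks ever hit the server at the same time:

```python
from concurrent.futures import ThreadPoolExecutor
import threading
import time

# Track how many hypothetical crawl tasks run at once.
active = 0
peak = 0
lock = threading.Lock()

def crawl_task(page_id):
    global active, peak
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.05)  # stand-in for real crawling work
    with lock:
        active -= 1
    return page_id

# Cap concurrency at 2, mirroring the 1-2 tasks recommendation.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(crawl_task, range(6)))

print(peak)  # never exceeds 2
```

Fewer concurrent tasks means fewer overlapping connections to the target server, which is what keeps strict servers from treating the crawler as an attack.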
This way, the load on the server decreases, the program shouldn't get blocked, and you should get accurate results for valid pages.