The SEO Spider is a powerful and flexible site crawler, able to crawl both small and very large websites efficiently, while allowing you to analyse the results in real-time. It gathers key onsite data to allow SEOs to make informed decisions.
Crawl a website instantly and find broken links (404s) and server errors.
Bulk export the errors and source URLs to fix, or send to a developer. Find temporary and permanent redirects, identify redirect chains and loops, or upload a list of URLs to audit in a site migration.
Discover exact duplicate URLs with an md5 algorithmic check, partially duplicated elements such as page titles, descriptions or headings and find low content pages.
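The exact-duplicate check works by hashing each page's content: two URLs that return byte-identical HTML produce the same MD5 digest. A minimal sketch of the idea in Python (hypothetical pages; this illustrates the technique, not Screaming Frog's internal implementation):

```python
import hashlib
from collections import defaultdict

# Hypothetical crawled pages: URL -> raw HTML
pages = {
    "https://example.com/a": "<html><body>Hello</body></html>",
    "https://example.com/a?ref=1": "<html><body>Hello</body></html>",
    "https://example.com/b": "<html><body>Different</body></html>",
}

# Group URLs by the MD5 digest of their HTML
groups = defaultdict(list)
for url, html in pages.items():
    digest = hashlib.md5(html.encode("utf-8")).hexdigest()
    groups[digest].append(url)

# Any group with more than one URL is a set of exact duplicates
duplicates = [urls for urls in groups.values() if len(urls) > 1]
print(duplicates)
```

Partial duplication (matching titles, descriptions or headings) can't be caught by a whole-page hash, which is why those elements are compared separately.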
Web Scraping & Data Extraction Using The SEO Spider Tool
Collect any data from the HTML of a web page using CSS Path, XPath or regex. This might include social meta tags, additional headings, prices, SKUs or more!
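A regex-based extraction of this kind might look like the following in Python (hypothetical markup; in Screaming Frog you would configure a similar pattern under Configuration > Custom > Extraction):

```python
import re

# Hypothetical product page HTML
html = ('<div class="product">'
        '<span class="price">£19.99</span>'
        '<span class="sku">AB-123</span>'
        '</div>')

# Capture the text inside the price and SKU spans
price = re.search(r'<span class="price">([^<]+)</span>', html)
sku = re.search(r'<span class="sku">([^<]+)</span>', html)

print(price.group(1), sku.group(1))
```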
View URLs blocked by robots.txt, meta robots or X-Robots-Tag directives such as ‘noindex’ or ‘nofollow’, as well as canonicals and rel=“next” and rel=“prev”.
Evaluate internal linking and URL structure using interactive crawl and directory force-directed diagrams and tree graph site visualisations. Schedule crawls to run at chosen intervals and auto export crawl data to any location, including Google Sheets. Or automate entirely via command line.
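The command-line automation mentioned above might look something like this; a sketch based on the documented CLI flags, assuming a Linux install where the `screamingfrogseospider` launcher is on the PATH:

```shell
# Run a headless crawl and save the crawl data automatically
screamingfrogseospider --crawl https://www.example.com \
  --headless \
  --save-crawl \
  --output-folder /tmp/crawls
```

Paired with a scheduler such as cron, this is how crawls can run at chosen intervals without opening the UI.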
Track progress of SEO issues and opportunities and see what's changed between crawls. Compare staging against production environments using advanced URL Mapping.
Free version features:

- Find Broken Links, Errors & Redirects.
- Analyse Page Titles & Meta Data.
- Review Meta Robots & Directives.
- Audit hreflang Attributes.
- Discover Exact Duplicate Pages.
- Generate XML Sitemaps.
- Site Visualisations.
- Crawl Limit: 500 URLs.

Paid version features (licences last 1 year; after that you will be required to renew your licence):

- Crawl Configuration.
- Crawl Comparison.
- Near Duplicate Content.
- Custom robots.txt.
- AMP Crawling & Validation.
- Structured Data & Validation.
- Spelling & Grammar Checks.
- Custom Source Code Search.
- Custom Extraction.
- Google Analytics Integration.
- Search Console Integration.
- PageSpeed Insights Integration.
- Link Metrics Integration.
- Forms Based Authentication.
- Store & View Raw & Rendered HTML.
- Free Technical Support.
* The maximum number of URLs you can crawl is dependent on allocated memory and storage. Please see our FAQ.
Updated by Richie Lauridsen & Allison Hahn on February 19, 2020. Originally published on May 11, 2015.

So, I admit it: when we started looking at our own blog traffic, we realized this was one of the most historically popular blog posts on the Seer domain.
Limiting what you crawl keeps file sizes and data exports a bit more manageable; we go over this in further detail below. By default, Screaming Frog only crawls the subdomain that you enter, and any additional subdomains the spider encounters are treated as external links. To crawl your entire site, including all subdomains, you'll need to make a slight adjustment in the Spider Configuration menu: checking 'Crawl All Subdomains' ensures that the spider follows any links it encounters to other subdomains on your site.
In addition, if you're starting your crawl from a specific subfolder or subdirectory and still want Screaming Frog to crawl the whole site, check the box marked 'Crawl Outside of Start Folder'.
- By default, the SEO Spider only crawls forward from the subfolder or subdirectory you start in.
- If you want to crawl the whole site and start from a specific subdirectory, be sure that the configuration is set to crawl outside the start folder.
- To save time and disk space, be mindful of resources that you may not need in your crawl.
If you wish to limit your crawl to a single folder, simply enter the URL and press start without changing any of the default settings. If you’ve overwritten the original default settings, reset the default configuration within the ‘File’ menu. If you wish to start your crawl in a specific folder, but want to continue crawling to the rest of the subdomain, be sure to select ‘Crawl Outside Of Start Folder’ in the Spider Configuration settings before entering your specific starting URL.
If you wish to limit your crawl to a specific set of subdomains or subdirectories, you can use RegEx to set those rules in the Include or Exclude settings in the Configuration menu. In this example, we crawled every page on seerinteractive.com excluding the ‘about’ pages on every subdomain.
Go to Configuration > Exclude, and use a wildcard regular expression to identify the URLs or parameters you want to exclude. Test your regular expression to make sure it's excluding the pages you expect before you start your crawl.
In the example below, we only wanted to crawl the team subfolder on seerinteractive.com. Again, use the “Test” tab to test a few URLs and ensure the RegEx is appropriately configured for your inclusion rule. This is a great way to crawl larger sites; in fact, Screaming Frog recommends this method if you need to divide and conquer a crawl for a bigger domain.
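The Include/Exclude logic can be sanity-checked outside the tool as well. This Python sketch mirrors what the 'Test' tab does, using hypothetical patterns and URLs:

```python
import re

# Hypothetical rules, as you might enter under Configuration > Include / Exclude
include = re.compile(r"https://www\.seerinteractive\.com/team/.*")
exclude = re.compile(r".*/about/.*")

urls = [
    "https://www.seerinteractive.com/team/jane-doe",
    "https://www.seerinteractive.com/about/team",
    "https://www.seerinteractive.com/blog/post",
]

# A URL is crawled only if it matches the include rule and not the exclude rule
crawlable = [u for u in urls if include.match(u) and not exclude.match(u)]
print(crawlable)
```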
Running the spider with these settings unchecked will, in effect, provide you with a list of all of the pages on your site that have internal links pointing to them.
Once the crawl is finished, go to the ‘Internal’ tab and filter your results by ‘HTML’.
Click ‘Export’, and you’ll have the full list in CSV format.
If you tend to use the same settings for each crawl, Screaming Frog now allows you to save your configuration settings.
- Running the spider with these settings unchecked will, in effect, give you a list of all of the pages in your starting folder (as long as they are not orphaned pages).
- There are several different ways to find all of the subdomains on a site.
- Use Screaming Frog to identify all subdomains on a given site.
- Navigate to Configuration > Spider, and ensure that “Crawl all Subdomains” is selected.
- Just like crawling your whole site above, this will help crawl any subdomain that is linked to within the site crawl.
- However, this will not find subdomains that are orphaned or unlinked.
- Use Google to identify all indexed subdomains.
- By using the Scraper Chrome extension and some advanced search operators, we can find all indexable subdomains for a given domain.
- Start by using a site: search operator in Google to restrict results to your specific domain.
- Then, use the -inurl search operator to narrow the search results by removing the main domain.
- You should begin to see a list of subdomains that have been indexed in Google that do not contain the main domain.
- Use the Scraper extension to extract all of the results into a Google Sheet.
- Simply right-click the URL in the SERP, click "Scrape Similar" and export to a Google Doc. In your Google Doc, use a formula to trim each URL down to the subdomain.
- Essentially, the formula above should help strip off any subdirectories, pages, or file names at the end of a site.
- This formula essentially tells Sheets or Excel to return whatever is to the left of the trailing slash.
- The start number of 9 is significant, because we are asking it to start looking for a trailing slash after the 9th character.
This accounts for the protocol, https://, which is 8 characters long. De-duplicate the list, and upload it into Screaming Frog in List Mode: you can paste the list of domains manually or upload a CSV.
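The same trimming logic can be sketched in Python; this is a hypothetical equivalent of a spreadsheet formula like `=LEFT(A1, SEARCH("/", A1, 9))`, not something the tool provides:

```python
def trim_to_subdomain(url: str) -> str:
    """Return everything up to (and including) the first slash after the
    protocol. The search starts after character 8 because "https://" is
    8 characters long, mirroring the start number of 9 in the formula."""
    slash = url.find("/", 8)  # first slash after the protocol
    return url if slash == -1 else url[:slash + 1]

urls = [
    "https://blog.example.com/some/post.html",
    "https://shop.example.com/",
]
subdomains = sorted({trim_to_subdomain(u) for u in urls})  # de-duplicate
print(subdomains)
```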
Enter the root domain URL into tools that look for sites hosted on the same IP, or into search engines designed specifically to find subdomains, such as FindSubdomains.
Create a free account to log in and export a list of subdomains. Then, upload the list to Screaming Frog using List Mode. Once the spider has finished running, you'll be able to see status codes, as well as any links on the subdomain homepages, anchor text, and duplicate page titles, among other things.
- Screaming Frog was not originally built to crawl hundreds of thousands of pages, but thanks to some upgrades, it’s getting closer every day.
- The newest version of Screaming Frog has been updated to rely on database storage for crawls.
- In version 11.0, Screaming Frog allowed users to opt to save all data to disk in a database rather than just keep it in RAM.
- This opened up the possibility of crawling very large sites for the first time.
- In version 12.0, the crawler automatically saves crawls to the database.
- This allows them to be accessed and opened using "File > Crawls" in the top-level menu, in case you panic and wonder where the open command went!
- While using database crawls helps Screaming Frog better manage larger crawls, it’s certainly not the only way to crawl a large site.
- First, you can increase the memory allocation of the spider.
- Second, you can break down the crawl by subdirectory or only crawl certain parts of the site using your Include/Exclude settings.
- By deselecting resource checkboxes such as images, CSS and JavaScript in the Configuration menu, you can save memory by crawling HTML only.
- Until recently, the Screaming Frog SEO Spider might have paused or crashed when crawling a large site.
- Now, with database storage as the default setting, you can recover crawls to pick up where you left off.
You can also access queued URLs, which may give you insight into any additional parameters or rules you may want to exclude in order to crawl a large site.
- In some cases, older servers may not be able to handle the default number of URL requests per second.
- In fact, we recommend setting a limit on the number of URLs crawled per second to be respectful of a site's server.
- It's best to let a client know when you're planning on crawling a site, since they may have protections in place against unknown User Agents.
- They may need to whitelist your IP or User Agent before you crawl the site; the worst-case scenario is that you send too many requests to the server and inadvertently crash the site.
- To change your crawl speed, choose ‘Speed’ in the Configuration menu, and in the pop-up window, select the maximum number of threads that should run concurrently.
- From this menu, you can also choose the maximum number of URLs requested per second.
- If you find that your crawl is resulting in a lot of server errors, go to the ‘Advanced’ tab in the Spider Configuration menu, and increase the value of the ‘Response Timeout’ and of the ‘5xx Response Retries’ to get better results.
- Although search bots don’t accept cookies, if you are crawling a site and need to allow cookies, simply select ‘Allow Cookies’ in the ‘Advanced’ tab of the Spider Configuration menu.
- To crawl using a different user agent, select ‘User Agent’ in the ‘Configuration’ menu, then select a search bot from the drop-down or type in your desired user agent strings.
- As Google is now mobile-first, try crawling the site as Googlebot Smartphone, or modify the User-Agent to be a spoof of Googlebot Smartphone.
- This is important for two different reasons:
- Crawling the site mimicking the Googlebot Smartphone user agent may help determine any issues that Google is having when crawling and rendering your site’s content.
- Using a modified version of the Googlebot Smartphone user agent will help you distinguish between your crawls and Google’s crawls when analyzing server logs.
- When the Screaming Frog spider comes across a page that is password-protected, a pop-up box will appear, in which you can enter the required username and password.
- Note: Forms-Based authentication should be used sparingly, and only by advanced users.
- The crawler is programmed to click every link on a page, which could trigger links that log you out, create posts, or even delete data.
- To manage authentication, navigate to Configuration > Authentication.
- In order to turn off authentication requests, deselect ‘Standards Based Authentication’ in the ‘Authentication’ window from the Configuration menu.
- Once the spider has finished crawling, use the Bulk Export menu to export a CSV of ‘All Links’.
- This will provide you with all of the link locations, as well as the corresponding anchor text, directives, etc.
- The All Inlinks export can be a big report; be mindful of this when exporting.
For a large site, this export can sometimes take minutes to run. For a quick tally of the number of links on each page, go to the ‘Internal’ tab and sort by ‘Outlinks’.
Once the spider has finished crawling, sort the ‘Internal’ tab results by ‘Status Code’.
Any 404s, 301s or other status codes will be easily viewable. Upon clicking on any individual URL in the crawl results, you'll see information change in the bottom window of the program.
By clicking on the ‘In Links’ tab in the bottom window, you’ll find a list of pages that are linking to the selected URL, as well as anchor text and directives used on those links.
You can use this feature to identify pages where internal links need to be updated.
- To export the full list of pages that include broken or redirected links, visit the Bulk Export menu (labelled 'Advanced Export' in older versions), and you'll get a CSV export of the data.
- Scroll down to Response Codes, and look at the following reports:
- No Response Inlinks.
- Redirection (3xx) Inlinks.
- Redirection (Meta Refresh) Inlinks.
- Client Error (4xx) Inlinks.
- Server Error (5xx) Inlinks.

Reviewing all of these reports should give us an adequate representation of which internal links should be updated to ensure they point to the canonical version of the URL and efficiently distribute link equity.
Upon clicking on any individual URL in the crawl results and then clicking on the ‘In Links’ tab in the bottom window, you’ll find a list of pages that are pointing to the selected URL.
You can use this feature to identify pages where outbound links need to be updated. To export your full list of outbound links, click 'External Links' in the Bulk Export menu. For a complete listing of all the locations and anchor text of outbound links, select 'All Outlinks' in the same menu.
The All Outlinks report will include outbound links to your subdomains as well; if you want to exclude your domain, lean on the “External Links” report referenced above.
After the spider has finished crawling, select the ‘Response Codes’ tab from the main UI, and filter by Status Code.
Because Screaming Frog uses Regular Expressions for search, submit the following criteria as a filter: 301|302|307.
This should give you a pretty solid list of all links that came back with some sort of redirect, whether the content was permanently moved, found and redirected, or temporarily redirected due to HSTS settings (this is the likely cause of 307 redirects in Screaming Frog).
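Because the filter is a regular expression, `301|302|307` is a simple alternation that matches any one of the three codes. A minimal Python illustration of the same filter over hypothetical crawl results:

```python
import re

# The same alternation you'd type into the Response Codes filter
pattern = re.compile(r"301|302|307")

# Hypothetical (URL, status code) pairs from a crawl
crawl_results = [
    ("https://example.com/old-page", "301"),
    ("https://example.com/ok", "200"),
    ("https://example.com/temp", "307"),
]

# Keep only rows whose status code matches the redirect pattern
redirects = [(url, code) for url, code in crawl_results if pattern.fullmatch(code)]
print(redirects)
```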
Sort by ‘Status Code’, and you’ll be able to break the results down by type. Click on the ‘In Links’ tab in the bottom window to view all of the pages where the redirecting link is used.
If you export directly from this tab, you will only see the data that is shown in the top window (original URL, status code, and where it redirects to).
To export the full list of pages that include redirected links, you will have to choose ‘Redirection (3xx) In Links’ in the ‘Advanced Export’ menu. This will return a CSV that includes the location of all your redirected links. To show internal redirects only, filter the ‘Destination’ column in the CSV to include only your domain.
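Filtering the exported CSV's 'Destination' column can also be scripted with Python's csv module. The column names below are hypothetical; adjust them to match the headers in your actual export:

```python
import csv
import io

# Hypothetical extract of a redirect inlinks export
csv_data = """Source,Destination,Anchor Text
https://example.com/a,https://example.com/new-a,Read more
https://example.com/b,https://other-site.com/page,Partner
"""

# Keep only rows whose destination is on our own domain
internal = [
    row for row in csv.DictReader(io.StringIO(csv_data))
    if "example.com" in row["Destination"]
]
print([r["Source"] for r in internal])
```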