Orphan Pages: How to Find Them and Fix Them
Did you know that certain technical site errors are like pits that open up when a website has a long history? One of these pits is orphan pages. Cleaning them up quickly is one of the easier ways to improve your SEO. A basic site audit will often turn up a number of orphan pages. These are bad for your site, so if you are looking to fix them, keep on reading to learn more about them.
Orphan pages are pages that search engines may find difficult to discover as they have no internal links from elsewhere on your website.
These URLs tend to fall through the cracks: search engine crawlers can only find them through the sitemap file or external backlinks, and users can only reach them if they know the URL.
Keep on reading.
Often, orphan pages are accidental and occur for a number of reasons. The most common causes are site migrations, navigation changes, site redesigns, out-of-stock products, and leftover testing or dev pages.
Moreover, orphan pages may also be intentional in some cases.
These may be promotional and paid advertising landing pages or any instance where you do not want the page to be a part of the user journey.
Why are Orphan Pages bad for SEO?
Search engines have a hard time finding orphan pages because they use links to discover new content and to understand the significance of a page.
According to Google, it searches the web with automated programs called crawlers, looking for pages that are new or updated.
Crawlers find pages by many different methods, but the main one is following links from pages that they already know.
For instance, let’s say that you publish a new webpage and forget to link to it from elsewhere on your site.
If the page is not in your sitemap and has no backlinks, Google will not find or index it.
This is because their web crawler does not know that the page exists.
What’s even worse is that a page with no links pointing to it cannot receive PageRank.
In general terms, PageRank is Google’s way of understanding the significance of a page by counting the number of “votes” (links) a page gets.
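To make the "votes" idea concrete, here is a minimal sketch of the simplified textbook PageRank calculation. It is not Google's production system; the four-page site, the damping factor of 0.85, and the function name are assumptions made purely for illustration. It shows that a page nobody links to ends up with only the small baseline score every page receives.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its "vote" evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its score over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical site: "/orphan" links out, but nothing links to it.
site = {
    "/": ["/blog", "/about"],
    "/blog": ["/", "/about"],
    "/about": ["/"],
    "/orphan": ["/"],
}
ranks = pagerank(site)
for page, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(f"{page:10s} {score:.4f}")
```

In this toy model the orphan page receives no votes at all, so its score never rises above the baseline, while the linked pages share the rest.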
Characteristics of an Orphan Page
The following are some of the common characteristics of an orphan page:
No Inbound Links: This is one of the defining traits of an orphan page.
When your page has even one inbound internal link, whether from the home page or an old blog post, it is not an orphan.
However, you may want to consider strengthening such pages with more internal linking.
A Live Page: The key feature of an orphan page is that it is live and has value for users, but is unreachable.
Despite having a 200 server status, the fact that users have no way to find that page is part of the problem.
A page may be an orphan even if it is indexed, or even if a tool says it is not.
In some cases, this trait is difficult to verify as it requires some investigative effort.
Some pages flagged as orphans turn out to be just that, while others are flagged only because of the inaccurate methods of some tools.
Orphan vs. Dead End Pages
Before diving into how to find orphan pages, let’s take a moment to clarify the difference between orphan and dead-end pages.
An orphan page is a webpage that is not linked to, or reachable from, any other page on the same website.
A dead-end page, by contrast, is a webpage that does not link to any other internal web pages or any external websites.
Thus, it creates a “dead end”. Moreover, when users land on this page, they can either hit back or just abandon the site.
When search engine crawlers land on the page, they have nowhere to go, and no link equity passes.
It is important to note that you can easily solve the issue of a dead-end page by making sure that the sidebar or footer navigation is populated on every page.
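The distinction can be sketched in a few lines of code. This is a minimal illustration, assuming you already have a map of each page to the pages it links to; the page paths and the function name are hypothetical.

```python
def classify_pages(links):
    """Split pages into orphans (no inbound links) and dead ends (no outbound links).

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    # Every page that some *other* page links to.
    linked_to = {
        target
        for source, targets in links.items()
        for target in targets
        if target != source
    }
    orphans = sorted(pages - linked_to)
    dead_ends = sorted(page for page, targets in links.items() if not targets)
    return orphans, dead_ends


# Hypothetical link map for a small site.
site_links = {
    "/": ["/blog", "/contact"],
    "/blog": ["/", "/contact"],
    "/contact": [],        # links nowhere: a dead end
    "/old-promo": ["/"],   # nothing links here: an orphan
}
orphans, dead_ends = classify_pages(site_links)
print("Orphans:", orphans)
print("Dead ends:", dead_ends)
```

Note that a page can be both at once: live, linked from nowhere, and linking nowhere.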
Now, let’s discuss ways through which you can find orphan pages.
1# Identify Your Crawlable Pages
First, you will need a list of all the URLs on your website that can currently be reached by crawling the links of your site.
For this, you can use an SEO spider; Screaming Frog is a good choice.
Whatever you choose to use, make sure it is set to crawl only those pages that are indexable by search engines.
This means that it should not crawl pages that are:
- hidden from search engines by robots.txt.
Start the crawl from the homepage of your website.
Moreover, make sure to use the canonical URL, which includes proper HTTPS or HTTP, and www or non-www.
After you crawl your website, export the URLs to a spreadsheet.
Once you do this, you will have a list of pages.
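If you would rather script this step than use a dedicated tool, the idea can be sketched as a small breadth-first crawler. This is only a rough sketch using the Python standard library: it follows same-host links from the homepage and does not check robots.txt, noindex, or canonical tags, so it is not a substitute for a proper SEO spider. The function names and the page cap are assumptions.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch_html(url):
    """Default fetcher: download a page and return its HTML as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=fetch_html, max_pages=500):
    """Return the sorted list of same-host URLs reachable from start_url."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    # max_pages is a rough safety cap, checked once per fetched page.
    while queue and len(seen) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to load
        parser = LinkParser()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop #fragments.
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return sorted(seen)
```

Calling `crawl("https://example.com/")` with your own canonical homepage URL returns the list of crawlable pages. The `fetch` parameter exists so the crawler can be tried against saved HTML without touching the network.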
2# Resolve Two Common Causes of Orphan Pages
There are two common causes of orphan pages that you need to address and deal with immediately.
Both of these causes involve duplicate pages that should automatically and consistently redirect to only one URL.
If they do not, you are likely to have some versions of the page that have no links and, as a result, are orphans.
In such a case, the fact that these pages are orphans is not the primary issue; the fact that they are duplicates is.
Moreover, these can come up later while you are looking for orphan pages and you will need to deal with them.
Therefore, it is a good idea to get them out of the way.
Non-Canonical HTTP/HTTPS or www/non-www: Every public page on your site should use either HTTP or HTTPS consistently, and either www or non-www.
To check if this is the case, try typing all of the variations of the homepage of your site into your browser, like:
- http://example.com
- https://example.com
- http://www.example.com
- https://www.example.com
All four of the above variations should redirect automatically to the exact same URL.
Furthermore, for consistency, that page should be canonical to itself.
If one of these variations does not redirect properly, however, it can be a sign of similar issues across the whole site.
Check other URLs using the same variation to see whether it is a widespread issue or not.
You should also test a few other pages of your site and check the .htaccess file of your site to make sure that redirects for these are set up properly.
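The manual check above can also be scripted. The following is a minimal sketch, assuming the standard library's `urlopen` (which follows redirects) is enough to reach your site; the function names are hypothetical, and `example.com` stands in for your own domain.

```python
from urllib.request import urlopen


def homepage_variants(domain):
    """Build the four scheme/www combinations for a domain."""
    bare = domain[4:] if domain.startswith("www.") else domain
    return [
        f"{scheme}://{prefix}{bare}/"
        for scheme in ("http", "https")
        for prefix in ("", "www.")
    ]


def final_url(url):
    """Follow redirects and return the URL that was finally served."""
    with urlopen(url, timeout=10) as response:
        return response.geturl()


def check_variants(domain, resolve=final_url):
    """Return (consistent, destinations) for the four homepage variants."""
    destinations = {variant: resolve(variant) for variant in homepage_variants(domain)}
    consistent = len(set(destinations.values())) == 1
    return consistent, destinations


print(homepage_variants("example.com"))
```

Running `check_variants("example.com")` against a live site reports whether all four variants end up at the same URL, and if not, which variant goes where.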
Trailing Slashes: Another issue to look out for is the consistent use of trailing slashes.
For instance, the following two URLs may produce the same content, but the URLs themselves are not identical:
- https://example.com/page
- https://example.com/page/
Make sure to check a few pages on your site both with and without the trailing slash.
Additionally, make sure that they automatically redirect to the same URL and they do so consistently.
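If you already have a list of URLs from a crawl or an export, a short script can flag pairs that differ only by a trailing slash. This is a rough sketch: it compares plain strings and ignores query strings, and the function name and sample URLs are assumptions.

```python
from collections import defaultdict


def slash_duplicates(urls):
    """Group URLs that are identical except for a trailing slash."""
    groups = defaultdict(set)
    for url in urls:
        groups[url.rstrip("/")].add(url)
    # Keep only the groups where both forms actually appear.
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


sample = [
    "https://example.com/page",
    "https://example.com/page/",
    "https://example.com/about/",
]
print(slash_duplicates(sample))
```

Any pair this reports is worth opening in a browser to confirm that one form redirects to the other.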
3# Get a List from Google Analytics
By definition, crawlers have a difficult time finding orphan pages.
Thus, using an SEO tool is often bound to be problematic.
However, one of the best places to start looking for orphan pages is Google Analytics data, or data from any other analytics package you use.
As long as your pages have Google Analytics installed, if anyone has ever visited a page, there is a record of it somewhere in this tool.
To get a comprehensive list of URLs, go to Behavior > Site Content > All Pages in the left sidebar.
As the orphan pages are difficult to find, the number of times anyone visits them is also likely to be quite low.
Then click “pageviews” so that the arrow is pointing upward. This indicates that the list of URLs is sorted in ascending order from least to most pageviews.
Moreover, this will also move pages that are most likely to be orphans to the top.
To make your list as comprehensive as possible, go to the date range at the top right.
Set the starting date back to when Google Analytics was first put in place and click the Apply button.
Then, you can expand your list of URLs as much as possible.
In the bottom right, click the Show Rows dropdown menu and select the highest number of rows.
It is important to note that one of the biggest issues is that Analytics can only list up to 5,000 URLs at a time.
If you have more than this, you will have to export 5,000 pages at a time, until you have all the visitor data.
Then head up to the top right, select Export, and choose Google Sheets, Excel, or CSV to get the list of URLs.
You can also use the Google Analytics API to speed up the process.
Now, copy the exported pages into your orphan page spreadsheet.
Analytics lists pages as paths rather than full URLs, so you will need to get these into URL format for them to be useful.
To do this, insert a new column and paste the homepage URL down it.
Then use the CONCAT() formula to combine the two into the full URL in the next column over.
Lastly, just drag the formula down to get the full list of URLs.
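The same conversion can be done outside the spreadsheet. Below is a minimal sketch that reads a CSV export and turns each page path into a full URL. It assumes the export has been trimmed to a plain header row plus data rows and that the path column is named "Page"; both are assumptions, so adjust them to match your file.

```python
import csv
import io
from urllib.parse import urljoin


def analytics_urls(csv_text, homepage, column="Page"):
    """Turn the page paths in an analytics CSV export into full URLs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    urls = []
    for row in reader:
        path = (row.get(column) or "").strip()
        if path.startswith("/"):  # skip totals and other non-path rows
            urls.append(urljoin(homepage, path))
    return urls


# Hypothetical export, already sorted by ascending pageviews.
export = "Page,Pageviews\n/old-landing,1\n/blog/post-1,3\n/,120\n"
full_urls = analytics_urls(export, "https://example.com/")
print(full_urls)
```

If you had to export in batches of 5,000 rows, you can run this on each file and concatenate the resulting lists.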
4# Identifying Orphan URLs
In order to identify orphan URLs, you need to compare the list of crawlable URLs with the list of Analytics URLs in your spreadsheet.
Consider the following example.
It is obvious that https://example.com/11 is an orphan page. However, you will almost certainly have far more URLs to sift through, so you will need to automate the process of identifying orphan URLs.
To do so, you need a formula that checks if each URL in your analytics is also found in your list of Crawlable URLs.
You can use the formula: =match(D2,$A$2:$A$11,0)
This formula checks if the URL in cell D2 is in the range $A$2:$A$11.
If you are not too familiar with spreadsheets, the dollar signs make sure that when you drag the formula down the column, the range does not change.
The value “0” tells Google Sheets to look for an exact match, so the column does not need to be sorted. If there is a match, the formula returns its position in the range.
What happens if there is no match?
The formula returns the error “#N/A” for https://example.com/11 as it is not found in your list of crawlable URLs. This means it is an orphan page.
Thus, to get a list of orphan pages, all you need to do is sort by the Match column to collect all the “#N/A” results in one place.
You can also copy the list of orphan URLs and paste them into a new sheet where you can address how to fix them.
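For larger sites, the same comparison can be done in code instead of with a spreadsheet formula. This is a minimal sketch of the idea: any URL that appears in the analytics list but not in the crawl list is a candidate orphan. The sample URLs mirror the example above and are otherwise assumptions.

```python
def find_orphans(crawled_urls, analytics_urls):
    """Return analytics URLs that the crawl never reached, in original order."""
    crawled = set(crawled_urls)
    already_listed = set()
    orphans = []
    for url in analytics_urls:
        if url not in crawled and url not in already_listed:
            orphans.append(url)
            already_listed.add(url)
    return orphans


# Hypothetical data: ten crawlable URLs, eleven URLs seen in analytics.
crawled = [f"https://example.com/{i}" for i in range(1, 11)]
analytics = [f"https://example.com/{i}" for i in range(1, 12)]
orphans = find_orphans(crawled, analytics)
print(orphans)
```

Both lists need to use the same URL format (scheme, www, trailing slash) for the comparison to be meaningful, which is one more reason to resolve the duplicate issues from step 2 first.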
5# Other Places to Look for Orphan URLs
It is important to note that you can repeat this process of identifying orphan URLs using data sources other than Google Analytics.
You can use the following tools to get a list of pages crawled from your site:
- Moz Link Explorer
- Raven Tools
An important thing to note is that both SEMRush and Ahrefs have specific tools and practices that can help you to discover orphan pages.
Moreover, it is possible in some cases that these tools will find pages that are not directly crawlable, because they find them using other means.
You can also look through the log files to find this data.
Log files contain information about who visited your website, where they came from, and what pages they visited.
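As a rough illustration, the requested paths can be pulled out of an access log with a regular expression. This sketch assumes the widely used common/combined log format written by Apache and Nginx; your server's format may differ, and the sample lines and function name are made up for the example.

```python
import re

# Matches the request and status parts of a common/combined log line,
# e.g. "GET /some/path HTTP/1.1" 200
LOG_LINE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3}) ')


def paths_from_log(lines):
    """Return the sorted set of paths that were served with a 200 status."""
    paths = set()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group("status") == "200":
            # Drop the query string so each page is listed once.
            paths.add(match.group("path").split("?", 1)[0])
    return sorted(paths)


sample_log = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /old-promo?utm=x HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '203.0.113.9 - - [10/Oct/2023:13:56:01 +0000] "GET /missing HTTP/1.1" 404 312 "-" "Mozilla/5.0"',
    '203.0.113.5 - - [10/Oct/2023:13:57:12 +0000] "GET /blog HTTP/1.1" 200 830 "-" "Mozilla/5.0"',
]
log_paths = paths_from_log(sample_log)
print(log_paths)
```

The resulting paths can be turned into full URLs and compared against the crawl list in the same way as the analytics data.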
Furthermore, you can also perform a second crawl of your website, ignoring directives like “nofollow” and “noindex”, and compare it to the original crawl.
There may be pages that are only accessible to crawlers that ignore such directives, and those can be another source of orphan pages.
Finally, you can get a list of URLs from Google Search Console as well.
How to Fix Orphan Pages?
After finding the orphan pages, you need to fix the issue. However, before you fix them, consider why these pages became orphans to begin with.
This can help to make sure it does not happen again. Taking this step can also help you identify and implement guidelines for redirects and internal linking that will benefit you in the future.
Let’s discuss how you can fix orphan pages:
Resurrecting an Orphan Page
This is an easy solution.
When you want an orphan page to be found and visited, all you need to do is to create an internal link to it from another page on your website.
You can also achieve this with a link to the page from another website.
However, linking from within your own website is easier, and it is better for search engine crawlers and indexing.
What actually matters is that you create an opportunity for the page to be found by the search crawlers and users.
Fixing an Unnecessary Orphan Page
There are a number of ways to fix the issue of unwanted orphan pages, meaning pages that you do not want to exist.
One option is to archive the page.
In this case, the page and its information are still viewable, but it is no longer a part of the live site.
It is much like when an internal corporate document is no longer in circulation and you archive it for posterity.
Another method is to set up a redirect from the URL to a new location, ideally a relevant equivalent page or the folder/directory it lives in.
Moreover, search crawlers and users that come across this will then be redirected to a page you want them to see and crawlers will index it accordingly.
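If your site runs on Apache and you manage redirects through the .htaccess file, the redirect lines can be generated from a simple mapping. This is only a sketch under that assumption: it emits `Redirect 301` directives (handled by Apache's mod_alias), and the old paths and destination URLs below are hypothetical.

```python
def redirect_rules(mapping):
    """Build one 'Redirect 301' line per old path, sorted by old path."""
    return [f"Redirect 301 {old} {new}" for old, new in sorted(mapping.items())]


# Hypothetical orphan pages and the pages that should replace them.
redirects = {
    "/old-promo": "https://example.com/promotions/",
    "/2019-sale": "https://example.com/sale/",
}
rules = redirect_rules(redirects)
print("\n".join(rules))
```

Review the generated lines before adding them to your server configuration, and test each old URL afterwards to confirm that it lands where you expect.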
Search engines have a hard time finding and indexing orphan pages, especially when they do not show up in your sitemap, and this can create SEO issues. When you go through the steps above and resolve these issues, you are fixing problems that hold back your SEO. Use the methods above to find your orphan pages and get the issue resolved.