Google treats different versions of a web page address, and different server configurations, as separate websites. For example, Google sees http://mywebsite.com as a separate site to https://mywebsite.com. While you won't be penalised for not having proper redirects in place, missing redirects make it harder for Google to work out which site is the "true" site, and they potentially spread out the "link juice".
By itself this is not a major problem: just make sure to redirect all non-HTTPS traffic to the HTTPS version, and Google will detect the redirect and all will be fine.
It becomes a real problem when you use relative URLs in internal links.
A relative URL is one where you specify only the path to the page, with no protocol or domain, for example:
/shop/category/product.aspx
Absolute URLs specify the protocol and the domain, for example:
https://mywebsite.com/shop/category/product.aspx
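Browsers and search engines read a relative URL as saying "no protocol or domain has been specified, so let's assume the same ones we are already on and load the page from here." This means a relative link on a page reached through http://www.mywebsite.com leads to the http://www version of the next page, so each version of the site only ever links to itself.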
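To make that concrete, here is a minimal sketch in Python using the standard library's urljoin, which resolves links in much the same way a browser does. The resolve_all helper and the list of base addresses are made up for illustration.

```python
from urllib.parse import urljoin

# The relative link as it might appear in a page's HTML.
relative_link = "/shop/category/product.aspx"

# The four addresses a visitor or crawler could have used to reach the site.
bases = [
    "http://mywebsite.com/",
    "http://www.mywebsite.com/",
    "https://mywebsite.com/",
    "https://www.mywebsite.com/",
]

def resolve_all(link, base_urls):
    """Resolve one link against each base URL (hypothetical helper)."""
    return [urljoin(base, link) for base in base_urls]

# The relative link resolves to a different address for every base.
resolved = resolve_all(relative_link, bases)
for url in resolved:
    print(url)
```

Passing an absolute URL to the same helper gives the same address back for every base, which is the whole argument for absolute internal links.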
Not only will HTTP vs HTTPS cause duplicate content if not handled correctly, WWW vs non-WWW will as well. If neither is handled properly, you could potentially have four versions of each page listed in Google, and the Googlebot crawler working four times harder to traverse your website.
http://mywebsite.com/shop/category/product.aspx
http://www.mywebsite.com/shop/category/product.aspx
https://mywebsite.com/shop/category/product.aspx
https://www.mywebsite.com/shop/category/product.aspx
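If you have a list of crawled URLs and want to see how many are really the same page, one rough approach is to collapse each address to a single canonical form. The sketch below assumes HTTPS and non-WWW as the preferred version; canonical_url and its parameters are hypothetical names, not part of any tool mentioned here.

```python
from urllib.parse import urlsplit, urlunsplit

def canonical_url(url, scheme="https", strip_www=True):
    """Collapse protocol and WWW variants of a URL into one preferred form."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if strip_www and host.startswith("www."):
        host = host[4:]           # drop the leading "www."
    elif not strip_www and not host.startswith("www."):
        host = "www." + host      # or add it, if WWW is the preferred form
    return urlunsplit((scheme, host, parts.path, parts.query, parts.fragment))

variants = [
    "http://mywebsite.com/shop/category/product.aspx",
    "http://www.mywebsite.com/shop/category/product.aspx",
    "https://mywebsite.com/shop/category/product.aspx",
    "https://www.mywebsite.com/shop/category/product.aspx",
]

# All four variants should collapse to one canonical address.
unique_pages = {canonical_url(v) for v in variants}
print(unique_pages)
```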
Redirect WWW to non-WWW
The solution to this is to correctly redirect WWW to non-WWW (or vice versa if that is your preference). In Apache it is as easy as adding one of these rules to your .htaccess file:
Redirect WWW to non-WWW in Apache
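RewriteEngine On
RewriteBase /
# Match any host that starts with "www." and capture the rest as %1
RewriteCond %{HTTP_HOST} ^www\.(.*)$ [NC]
# Permanently redirect to the same path on the bare domain
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]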
Redirect non-WWW to WWW in Apache
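RewriteEngine On
RewriteBase /
# Match any host that does not already start with "www."
RewriteCond %{HTTP_HOST} !^www\. [NC]
# Permanently redirect to the same path on the www host
RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]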
Redirect HTTP to HTTPS
The next solution is to redirect all traffic to HTTPS (or HTTP if you prefer). Again, just place one of these rules in your .htaccess; adding both would send visitors round in a redirect loop. If you already have RewriteEngine and RewriteBase in the rules you don't need to add them again.
Redirect HTTP to HTTPS in Apache
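RewriteEngine On
# Only match requests that arrived on port 80 (plain HTTP)
RewriteCond %{SERVER_PORT} ^80$
# Permanently redirect to the same host and path over HTTPS
RewriteRule ^(.*)$ https://%{HTTP_HOST}/$1 [R=301,L]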
Redirect HTTPS to HTTP in Apache
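RewriteEngine On
# Only match requests that arrived over HTTPS
RewriteCond %{HTTPS} on
# Permanently redirect to the same host and path over HTTP
RewriteRule ^(.*)$ http://%{HTTP_HOST}/$1 [R=301,L]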
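To get a feel for how a WWW rule and an HTTPS rule interact, here is a simplified model in Python. It is not Apache, just a sketch that mimics the "WWW to non-WWW" and "HTTP to HTTPS" rules as written in this article and follows the redirects until none apply. All function names are hypothetical, and the first matching rule wins on each request.

```python
from urllib.parse import urlsplit, urlunsplit

def www_to_non_www(url):
    """Mimic the WWW to non-WWW rule, which redirects to http:// on the bare host."""
    parts = urlsplit(url)
    if parts.netloc.lower().startswith("www."):
        return urlunsplit(("http", parts.netloc[4:], parts.path, parts.query, ""))
    return None  # rule does not apply

def http_to_https(url):
    """Mimic the HTTP to HTTPS rule: same host and path, https scheme."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        return urlunsplit(("https", parts.netloc, parts.path, parts.query, ""))
    return None  # rule does not apply

RULES = [www_to_non_www, http_to_https]

def redirect_chain(url, rules=RULES, max_hops=10):
    """Return every URL visited, starting with the one requested."""
    chain = [url]
    for _ in range(max_hops):
        for rule in rules:
            target = rule(chain[-1])
            if target is not None:
                chain.append(target)
                break
        else:
            return chain  # no rule matched, so this is the final address
    return chain  # hop limit reached

for start in ("http://www.mywebsite.com/shop/", "https://mywebsite.com/shop/"):
    print(len(redirect_chain(start)) - 1, "redirect(s) from", start)
```

Under this model, both WWW variants take two redirects to reach https://mywebsite.com, because the WWW rule sends visitors to http:// first. Pointing that rule straight at https:// would likely save a hop, assuming the whole site is served over HTTPS.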
Remove Relative URLs
The next step is to remove all the relative URLs and replace them with absolute URLs. This can be a tricky and time-consuming task depending on the website platform and the size of the site. With the rules above working correctly, updating the links isn't as critical, but the site will be slower, as a relative link through to another page may have to be redirected to the HTTPS version and then again to the non-WWW version.
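For static HTML, a small script can do much of the replacing for you. The following is a rough sketch only: it assumes links are written as href="..." with double quotes, and it uses a simple regular expression where a proper HTML parser would be safer. The absolutise_links name and the PAGE_URL value are made up for the example.

```python
import re
from urllib.parse import urljoin

# Assumed canonical address of the page being processed.
PAGE_URL = "https://mywebsite.com/shop/"

# Matches double-quoted href attributes only (a deliberate simplification).
HREF_PATTERN = re.compile(r'href="([^"]*)"')

def absolutise_links(html, page_url):
    """Rewrite relative href values as absolute URLs based on page_url."""
    def replace(match):
        link = match.group(1)
        if link.startswith("#"):
            return match.group(0)  # leave in-page anchors alone
        return 'href="{}"'.format(urljoin(page_url, link))
    return HREF_PATTERN.sub(replace, html)

sample = '<a href="/shop/category/product.aspx">Product</a>'
print(absolutise_links(sample, PAGE_URL))
```

Links that are already absolute pass through unchanged, so running the script twice should do no harm.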
More Reasons to Replace Relative with Absolute URLs
Aside from the above-mentioned duplicate content issues, another good reason to use absolute URLs is that when your site gets scraped, links back to your site are (sometimes) preserved. If your site is scraped with relative URLs, none of the links will point back to you; they will resolve against the scraper's domain instead.
Serving one version of your site instead of four could also save bandwidth, especially once Googlebot is no longer crawling each page four times.
The Downsides of Absolute URLs
Depending on how your content is generated this may not be an issue, but if you are hard-coding static absolute URLs into your pages, just bear in mind what would happen if you ever needed to change the domain name, or at some later date switch to HTTPS only. Would you need to go through and update all the links again? Some platforms such as WordPress allow you to dynamically link to pages so it may not be an issue.
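If your pages are generated by your own code, one way to get absolute URLs without hard-coding them is to keep the site address in a single setting and build every link from it. This is only a sketch of the idea, not how WordPress or any particular platform does it; SITE_URL, site_link and the newdomain.example address are hypothetical.

```python
from urllib.parse import urljoin

# The one place the protocol and domain are defined (hypothetical setting).
SITE_URL = "https://mywebsite.com"

def site_link(path, site_url=SITE_URL):
    """Build an absolute URL for a site path from the configured address."""
    base = site_url.rstrip("/") + "/"
    return urljoin(base, path.lstrip("/"))

# Every link in a template goes through the helper.
print(site_link("/shop/category/product.aspx"))

# Changing domain later means changing one value, not every page.
print(site_link("/shop/category/product.aspx", "https://www.newdomain.example"))
```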