tfw2005
09-18-2007, 03:03 AM
I am noticing that when searching, the results show the SEO URLs for the entries, with ?catid=searchresults&searchid=XXX at the end of them.
This string also gets applied to the URL of every single entry on the search results page.
The searchid number changes with each new search.
If the engines pick up anything other than the straightforward regular SEO URL, as seen on normal Browse Cat pages, it can trigger duplicate content penalties.
FOR BROWSECAT ISSUES
I have added a conditional to the browsecat template for now, which should address the issue partially.
<if condition="$_REQUEST['do'] == 'searchresults'">
<meta name="robots" content="noindex,nofollow">
</if>
This tells the engines not to index the search result pages, since any searchid from 1 to 1000000 can be entered falsely and still return the same page.
Since that same string is applied to all entries on the results page, we have to tell the engines not to follow those URLs either. That makes the page a dead end for spidering, which sucks.
If you could get the entry URLs to appear on result pages without the string attached, you could do the following:
<if condition="$_REQUEST['do'] == 'searchresults'">
<meta name="robots" content="noindex,follow">
</if>
That way, the dynamic URL based pages won't get picked up, but the links on them can be followed and indexed. Currently, I can't tell the engines to follow them because of the potential dup content issue.
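To illustrate what "entry URLs without the string attached" could look like, here is a minimal Python sketch that drops the search parameters from a URL. It is not the product's own code: the function name `strip_search_params` and the example entry path are made up for illustration, and only the `catid=searchresults` and `searchid` parameters come from the URLs described above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_search_params(url: str) -> str:
    """Return url without the search-only query parameters.

    Hypothetical helper: removes searchid, and removes catid only when its
    value is "searchresults", so any other catid value is left alone.
    """
    parts = urlsplit(url)
    kept = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key == "searchid":
            continue
        if key == "catid" and value == "searchresults":
            continue
        kept.append((key, value))
    # An empty query makes urlunsplit omit the "?" entirely.
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Example with a made-up entry path:
clean = strip_search_params(
    "http://www.url.com/dyna/some-entry.html?catid=searchresults&searchid=123"
)
```

Any searchid value collapses to the same clean URL, so the result pages could link to entries without creating a new address for each search.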
A completely separate single SEO command for searches would also fix this, such as http://www.url.com/dyna/search/keyword-keywords, with no variables. It would also allow the search pages themselves to be relevant non-dup content (somewhat; about as much of a risk as multi-level categories, which seem to be handled fine by engines). This is similar to what I suggested for the tags situation in another thread, and I would prefer these be the same: tags/search = one.
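As a rough sketch of that kind of variable-free search URL, the Python below turns a search phrase into a single keyword slug. The base URL is the example from this post; the function name `search_url` and the slug rules (lowercase, ASCII letters and digits only, words joined with hyphens) are assumptions, not anything the product does today.

```python
import re

# Example base URL taken from the post; a real install would use its own.
BASE = "http://www.url.com/dyna/search/"


def search_url(query: str) -> str:
    """Build a search URL with no query variables (hypothetical helper).

    Keeps only runs of ASCII letters and digits, lowercased, joined by
    hyphens, so the same words in the same order always give the same URL.
    """
    words = re.findall(r"[a-z0-9]+", query.lower())
    return BASE + "-".join(words)


url = search_url("Keyword  Keywords!")
```

Since the URL depends only on the keywords and not on a per-search ID, repeated searches for the same phrase land on one address instead of thousands.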
FOR SHOWENTRY ISSUES
Since direct URLs to entry pages with the search string can be copy/pasted to other websites, forums, etc., they could become a spidering point for the engines. If a URL has the string in it, it would be indexed and eventually registered as dup content, even though you nofollowed it on your own site. So, I put this in the showentry template:
<if condition="$_REQUEST['do'] == 'searchresults'">
<meta name="robots" content="noindex,follow">
</if>
This tells engines not to index that page, but to follow the links to the other pages. All the other links appearing there seem good to go. The dynamic previous/next result links go to the next result, but that page loses the search string. I think that's an error on your part, but it's the way I want it :).
This sucks because you are telling the engine to ignore the page it first found, losing relevance if the link on the other site was related to your topic.
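To keep the rules straight, here is a small Python sketch of which robots tag each page ends up with under the conditionals in this post. It is only a summary, not how the templates are evaluated: the function `robots_meta` and its flags `from_search` and `entry_links_clean` are hypothetical names.

```python
def robots_meta(page: str, from_search: bool, entry_links_clean: bool = False) -> str:
    """Return the robots meta tag for a page, or "" when none is needed.

    page: "browsecat" or "showentry" (template names from the post).
    from_search: True when the request carries the search string.
    entry_links_clean: hypothetical flag, True once entry links on the
        results page no longer carry the search string.
    """
    if not from_search:
        # Normal SEO URLs are indexed and followed as usual.
        return ""
    if page == "browsecat" and not entry_links_clean:
        # Links on the results page still carry the string: dead end.
        content = "noindex,nofollow"
    else:
        # Do not index this URL, but let the engine follow its links.
        content = "noindex,follow"
    return f'<meta name="robots" content="{content}">'


tag = robots_meta("showentry", from_search=True)
```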
I don't want to have this great SEO based URL thing going on and have it ruined by 3000 searchids pointing to the same results, each of those results with 3000 searchids attached to the end. All it would take is one malicious person mass copy/pasting links with fake searchid strings at the end onto a couple of different websites; the engines see and follow them, and boom, you are screwed.
I know the variables need to be there for search pagination and probably next/previous situations, but I know WordPress handles this somehow, so maybe look at how they do it.