Is there any way to prevent a spider from grabbing URLs that you want to keep off search engines?
Absolutely. There are several ways, and you should combine them. For a quick overview, search for robots or spiders on Google, visit The Web Robots Page, or read B.4 Notes on helping search engines index your website from the World Wide Web Consortium. The most reliable way to block spiders is to password protect any files that you don’t want indexed by the search engines. See Can a search engine index pages that are password protected?
In general, you should create a robots.txt file for the root folder on your site, use the robots meta tag on pages you don’t want indexed, and password protect any files you’re serious about protecting.
Here is a robots meta tag that asks compliant spiders not to index the page or follow its links:
<meta name="robots" content="noindex, nofollow">
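A robots.txt file works at the site level rather than the page level. The following is a minimal sketch of how a well-behaved spider might interpret one, using Python's standard urllib.robotparser module. The rules, the folder names (/private/, /drafts/), the bot name BadBot, and the example.com URLs are all hypothetical and chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. The paths and the bot name are
# illustrative assumptions, not rules taken from any real site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/

User-agent: BadBot
Disallow: /
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network access is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("Googlebot", "https://example.com/index.html"),
        ("Googlebot", "https://example.com/private/report.html"),
        ("BadBot", "https://example.com/index.html"),
    ]:
        print(agent, url, is_allowed(agent, url))
```

Keep in mind that robots.txt is only a request. A spider that chooses to ignore it can still fetch every URL listed there, so it is no substitute for password protection.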
Why would someone want to block spiders from indexing their pages? The answer is simple. Google and other major search engines aren’t the only ones sending out spiders to traverse your site. Many email marketers send out spiders to harvest email addresses. Some spiders appear to ignore the robots.txt file altogether. Others request files so quickly that they can cause problems for your server.