
[Other] How to check which URLs have been indexed by Google using Python

倾国倾城 posted on 2016-10-6 07:34:38

  There are three main components to organic search: crawling, indexing and ranking. When a search engine like Google arrives at your website, it crawls all of the links it finds. Information about what it finds is then entered into the search engine's index, where different factors are used to determine which pages to fetch, and in what order, for a particular search query.
  As SEOs, we tend to focus our efforts on the ranking component, but if a search engine isn’t able to crawl and index the pages on your site, you’re not going to receive any traffic from Google. Clearly, ensuring your site is properly crawled and indexed by search engines is an important part of SEO.
  But how can you tell if your site is indexed properly?
  If you have access to Google Search Console, it tells you how many pages are contained in your XML sitemap and how many of them are indexed. Unfortunately, it doesn't go so far as to tell you which pages aren't indexed.
  

   This can leave you with a lot of guesswork or manual checking. It’s like looking for a needle in a haystack. No good! Let’s solve this problem with a little technical ingenuity and another free SEO tool of mine.
  Determining if a single URL has been indexed by Google

  To determine if an individual URL has been indexed by Google, we can use the “info:” search operator, like so:
  info:http://searchengineland.com/google-downplays-google-algorithm-ranking-update-week-normal-fluctuations-258923
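To run this check from code rather than from the browser, the query has to be URL-encoded into a Google search URL. A minimal sketch, assuming the standard `https://www.google.com/search` endpoint and its `q` parameter; the function name `build_info_query_url` is hypothetical and not taken from the author's script:

```python
from urllib.parse import urlencode


def build_info_query_url(page_url: str) -> str:
    """Return a Google search URL that runs the info: operator on page_url.

    Assumes the public https://www.google.com/search endpoint with the
    standard `q` parameter; the author's script may build requests differently.
    """
    # urlencode percent-encodes ':' and '/' so the whole operator fits in one parameter.
    return "https://www.google.com/search?" + urlencode({"q": "info:" + page_url})


if __name__ == "__main__":
    print(build_info_query_url(
        "http://searchengineland.com/google-downplays-google-algorithm-ranking-update-week-normal-fluctuations-258923"
    ))
```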
  If the URL is indexed, Google shows a result for that URL.
  

  However, if the URL is not indexed, Google returns an error saying there is no information available for that URL.
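Telling these two outcomes apart in code comes down to inspecting the returned page. The article doesn't show how the author's script does this (it relies on BeautifulSoup), so the following is a hypothetical heuristic using plain standard-library string matching. The "no information available" wording is assumed from the error described above; Google's actual markup may differ, so check it against a real response first.

```python
import re


def looks_indexed(results_html: str, page_url: str) -> bool:
    """Guess whether an info: results page shows page_url as indexed (hypothetical heuristic)."""
    text = results_html.lower()
    # Assumed wording of the "not indexed" error page; not a documented format.
    if re.search(r"no information (?:is )?available", text):
        return False
    # Otherwise, count it as indexed only if the URL (minus its scheme) appears in the results.
    bare_url = page_url.lower().split("://", 1)[-1]
    return bare_url in text
```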
  

  Using Python to bulk-check index status of URLs

  Now that we know how to check if a single URL has been indexed, you might be wondering how you can do this en masse. You could have 1,000 little workers check each one, or, if you prefer, you could use my Python script, indexchecker.py.
   To use the script, make sure you have Python 3 installed. You will also need to install the BeautifulSoup library. To do this, open a terminal or command prompt and run:
  pip install beautifulsoup4
  You can then download the script to your computer. In the same folder as the script, create a text file with a list of URLs, listing each URL on a separate line.
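Loading that file takes only a few lines. A sketch of one way to do it, assuming UTF-8 text with one URL per line; `read_url_list` is a hypothetical helper name, not part of the author's script:

```python
from pathlib import Path
from typing import List


def read_url_list(path: str) -> List[str]:
    """Read one URL per line, stripping whitespace and skipping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]
```

Skipping blank lines means a trailing newline or an accidental empty line in the file won't be sent off as a check.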
  

  Now that your script is ready, we need to set up Tor to act as our free proxy. On Windows, download the Tor Expert Bundle. Extract the zip archive to a local directory and run tor.exe. Feel free to minimize the window.
  

  Next, we need to install Polipo, an HTTP proxy that will forward the script's requests to Tor. Download the latest Windows binary (it will be named "polipo-1.x.x.x-win32.zip") and unzip it to a folder.
  In your Polipo folder, create a text file (e.g., config.txt) with the following contents:
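  # Forward all requests to Tor's SOCKS5 proxy on its default port
  socksParentProxy = "localhost:9050"
  socksProxyType = socks5
  # Disable the on-disk cache and Polipo's local web interface
  diskCacheRoot = ""
  disableLocalInterface = true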
Open a command prompt and navigate to your Polipo directory.
  Run the following command:
  polipo.exe -c config.txt
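Before running the script, it can help to confirm that both proxies are actually listening. A minimal standard-library sketch; it assumes Tor is on its default SOCKS port 9050 (as in config.txt) and Polipo on its default HTTP port 8123 (the port used in the script's proxy settings). `port_is_open` is a hypothetical helper:

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False


if __name__ == "__main__":
    # 9050: Tor's default SOCKS port; 8123: Polipo's default HTTP proxy port.
    for name, port in [("Tor SOCKS", 9050), ("Polipo HTTP", 8123)]:
        status = "listening" if port_is_open("localhost", port) else "NOT reachable"
        print(f"{name} on localhost:{port}: {status}")
```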
  

  At this point, we’re ready to run our actual Python script:
  python indexchecker.py
  

  The script will prompt you to specify the number of seconds to wait between checking each URL.
  It will also prompt you to enter a filename (without the file extension) to output the results to a CSV.
  Finally, it will ask for the filename of the text file that contains the list of URLs to check.
  Enter this information and let the script run.
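The flow described above (a delay, an output name, a URL file, then one check per URL) can be sketched as follows. This is an assumption about the structure, not the actual indexchecker.py code: `check_urls` is a hypothetical name, and `is_indexed` stands in for whatever lookup the real script performs.

```python
import time
from typing import Callable, Dict, Iterable


def check_urls(urls: Iterable[str], delay_seconds: float,
               is_indexed: Callable[[str], bool]) -> Dict[str, bool]:
    """Run is_indexed on each URL, sleeping delay_seconds between consecutive checks."""
    results: Dict[str, bool] = {}
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(delay_seconds)  # throttle so requests aren't fired back to back
        results[url] = is_indexed(url)
    return results


if __name__ == "__main__":
    # The three prompts, in the order the article describes.
    delay = float(input("Seconds to wait between each URL: "))
    output_name = input("Output filename (without extension): ")
    url_file = input("Filename of the URL list: ")
    with open(url_file, encoding="utf-8") as f:
        urls = [line.strip() for line in f if line.strip()]
    # Hypothetical placeholder check; replace with a real lookup through the proxy.
    results = check_urls(urls, delay, lambda url: False)
    print(f"{len(results)} URLs checked; results would go to {output_name}.csv")
```

Passing the per-URL check in as a parameter keeps the throttling loop testable without sending any requests to Google.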
  The end result is a CSV file, which you can open in Excel, marking each page TRUE if it is indexed or FALSE if it isn't.
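Python's built-in csv module is one way to produce that file. A sketch, assuming a two-column layout; the `URL`/`Indexed` headers and the `write_results_csv` name are hypothetical:

```python
import csv
from typing import Dict


def write_results_csv(results: Dict[str, bool], output_name: str) -> str:
    """Write URL/TRUE-or-FALSE rows to <output_name>.csv and return the filename."""
    # The script asks for the name without an extension, so ".csv" is appended here.
    filename = output_name + ".csv"
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["URL", "Indexed"])  # hypothetical column headers
        for url, indexed in results.items():
            writer.writerow([url, "TRUE" if indexed else "FALSE"])
    return filename
```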
  

  If the script doesn't seem to be working, Google has probably blocked Tor. In that case, you can use your own proxy service instead by modifying the following lines of the script:
  # Route both HTTP and HTTPS requests through the proxy; replace localhost:8123 with your own proxy's address
  proxies = {
      'http': 'http://localhost:8123',
      'https': 'http://localhost:8123',
  }
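The article doesn't say which HTTP library the script uses. As an assumption, this sketch uses the standard library's urllib.request, whose ProxyHandler takes the same scheme-to-proxy mapping; HTTPS requests are tunneled through the HTTP proxy with CONNECT. `build_proxied_opener` is a hypothetical helper name:

```python
import urllib.request

# Same mapping as the script's proxies setting; swap in your own proxy's address.
PROXIES = {
    "http": "http://localhost:8123",
    "https": "http://localhost:8123",
}


def build_proxied_opener(proxies: dict) -> urllib.request.OpenerDirector:
    """Return a urllib opener that sends every request through the given proxies."""
    return urllib.request.build_opener(urllib.request.ProxyHandler(proxies))


if __name__ == "__main__":
    opener = build_proxied_opener(PROXIES)
    # Requires the proxy to be running; prints the HTTP status of the test page.
    with opener.open("https://check.torproject.org/", timeout=30) as resp:
        print(resp.status)
```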
Conclusion

  Knowing which pages are indexed by Google is critical to SEO success. You can’t get traffic from Google if your web pages aren’t in Google’s database!
  Unfortunately, Google doesn't make it easy to determine which URLs on a website are indexed. But with a little elbow grease and the indexchecker.py script, we are able to solve this problem.
  Some opinions expressed in this article may be those of a guest author and not necessarily Search Engine Land. Staff authors are listed here.
沛凝 posted on 2016-10-6 08:57:48
Saving a spot, editing...

悄悄话不说 posted on 2016-10-6 09:00:22
People are posts, food is steel; go a day without replying and I get restless.

聊沅 posted on 2016-10-6 09:13:10
When your tears are about to fall, open your eyes wide and don't blink; you'll watch the whole world go from sharp to blurry.

ptyks posted on 2016-10-6 11:01:57
This empress permits the posters below to keep replying...

目送妳旳愛※ posted on 2016-10-6 11:42:01
Whoa, bumping this.

蒋帆 posted on 2016-10-6 11:59:03
I accidentally ate a whole bottle of "Wuji Baifeng Wan". Great, now I get nosebleeds a few days every month.

刘馨 posted on 2016-10-6 13:24:28
From afar it's a beautiful view; up close you want to call the police.

董云 posted on 2016-10-9 05:34:52
The world is so big, I want to go and see it.

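黄建川 posted on 2016-10-10 02:29:56
As a skinny guy who once fully understood what it's like to eat anything without gaining weight, I have now finally experienced gaining weight the moment I eat.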
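Mobile version / c.CoLaBug.com (Guangdong ICP Registration No. 05003221 | Network Culture License [2010] No. 257 | Guangdong Public Security Network Registration No. 44010402000842)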
