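For work, I recently needed to generate a sitemap.xml for a website. I searched Google and Baidu extensively but found no suitable, usable code, so after some thought I decided to write my own. It may have shortcomings, but I wrote it carefully, and I hope it helps others who need the same thing.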
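1. Why write your own script to generate sitemap.xml?
Many people will say there are ready-made online tools that can simply scan the site, so there is no need to write one yourself. That is true. But if the website is updated often, do I really have to regenerate the sitemap by hand every time? I am lazy, so is there a better approach? There is: I can set up a scheduled task that regenerates it every night, and that is where a script comes in.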
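Whether it runs by hand or from a nightly scheduled task, the end product is a sitemap.xml in the sitemaps.org 0.9 format. A minimal sketch of producing that format with PHP's built-in XMLWriter follows. It assumes a hard-coded list of URLs (hypothetical example data) instead of a crawl, and `buildSitemap()` is a name introduced only for this illustration.

```php
<?php
// Minimal sketch (not the article's script): write a sitemaps.org 0.9
// <urlset> with XMLWriter. The URLs below are hypothetical example data.
function buildSitemap(array $entries): string
{
    $writer = new XMLWriter();
    $writer->openMemory();               // build in memory instead of a file
    $writer->setIndent(true);
    $writer->startDocument("1.0", "UTF-8");
    $writer->startElement("urlset");
    $writer->writeAttribute("xmlns", "http://www.sitemaps.org/schemas/sitemap/0.9");
    foreach ($entries as $entry) {
        $writer->startElement("url");
        $writer->writeElement("loc", $entry["loc"]);   // text is XML-escaped
        // protocol values are lowercase: always, hourly, daily, weekly, monthly, yearly, never
        $writer->writeElement("changefreq", $entry["changefreq"] ?? "daily");
        $writer->writeElement("priority", (string) ($entry["priority"] ?? 0.5));
        $writer->endElement();           // </url>
    }
    $writer->endElement();               // </urlset>
    $writer->endDocument();
    return $writer->outputMemory();
}

$xml = buildSitemap([
    ["loc" => "https://www.example.com/", "priority" => "1.0"],
    ["loc" => "https://www.example.com/product/1", "changefreq" => "weekly", "priority" => "0.8"],
]);
echo $xml;
```

openMemory() and outputMemory() keep the sketch easy to inspect; the main file below calls openURI() with the configured file path instead.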
2. Directory layout:
- Config file: config/config.ini.php
- Sitemap main file: SiteMap.class.php

3. Main file code
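SiteMap.class.php reads every setting from config/config.ini.php by name: its loadConfig() method requires the file inside the method and returns the variable whose name it was given, using a PHP variable variable (`$$configName`). A minimal standalone sketch of that pattern follows; the function `loadSetting()`, the temporary file, and its settings are hypothetical stand-ins, not part of the original code.

```php
<?php
// Minimal sketch of the config pattern: the config file only assigns plain
// PHP variables, and the loader requires it inside a function, then reads
// one variable by name. loadSetting() and the demo file are hypothetical.
function loadSetting(string $configPath, string $name)
{
    if (!is_file($configPath)) {
        return false;
    }
    require $configPath;                   // variables land in this function's scope
    return isset($$name) ? $$name : false; // variable variable lookup
}

$configPath = sys_get_temp_dir() . "/sitemap_config_demo.php";
file_put_contents($configPath, '<?php
$SITEMAPPATH = "./sitemap.xml";
$PRIORITYLIST = array("product" => "0.8", "course" => "0.2");
');

var_dump(loadSetting($configPath, "SITEMAPPATH")); // string(13) "./sitemap.xml"
var_dump(loadSetting($configPath, "MISSING"));     // bool(false)
```

Because the file is pulled in with `require` rather than `require_once`, it runs on every call, so every call sees the variables in its own local scope; with `require_once`, only the first call would.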
```php
<?php
/**
 * @version 1.0
 */
namespace Maweibinguo\SiteMap;

class SiteMap
{
    const SCHEMA = "http://www.sitemaps.org/schemas/sitemap/0.9";

    /**
     * @var array webUrlList  URLs already queued for crawling
     * @access public
     */
    public $webUrlList = array();

    /**
     * @var array siteMapList  collected sitemap entries
     * @access public
     */
    public $siteMapList = array();

    /**
     * @var bool isUseCookie
     * @access public
     */
    public $isUseCookie = false;

    /**
     * @var string cookieFilePath
     * @access public
     */
    public $cookieFilePath = "";

    /**
     * @var \XMLWriter xmlWriter
     * @access private
     */
    private $_xmlWriter = null;

    /**
     * init basic config
     *
     * @access public
     */
    public function __construct()
    {
        // leading backslash: XMLWriter is in the global namespace
        $this->_xmlWriter = new \XMLWriter();
        $this->_environmentTest();
    }

    /**
     * test the environment for the script
     *
     * @access private
     */
    private function _environmentTest()
    {
        $sapiType = php_sapi_name();
        if (strtolower($sapiType) != "cli") {
            echo "The Script Must Run In Command Lines", PHP_EOL;
            exit();
        }
    }

    /**
     * load the config value for generating the sitemap by config name
     *
     * @param string $configName
     * @return mixed $configValue, or false if it is not defined
     * @access public
     */
    public function loadConfig($configName)
    {
        $configPath = __DIR__ . "/config/config.ini.php";
        if (!file_exists($configPath)) {
            echo "Can not find config file", PHP_EOL;
            exit();
        }
        // require (not require_once): the file must run on every call so its
        // variables are defined in this method's local scope
        require $configPath;

        // variable variable: "SITEMAPPATH" reads $SITEMAPPATH from the config file
        return isset($$configName) ? $$configName : false;
    }

    /**
     * generate sitemap.xml for the web
     *
     * @param array $siteMapList
     * @return bool
     * @access public
     */
    public function generateSiteMapXml($siteMapList)
    {
        if (!is_array($siteMapList) || count($siteMapList) <= 0) {
            echo "The SiteMap Content Is Empty", PHP_EOL;
            exit();
        }

        $siteMapPath = $this->loadConfig("SITEMAPPATH");
        if (!file_exists($siteMapPath)) {
            // touch() instead of exec("touch ..."): no shell is involved
            touch($siteMapPath);
        }
        if (!is_writable($siteMapPath)) {
            echo "Is Not Writeable", PHP_EOL;
            exit();
        }

        $this->_xmlWriter->openURI($siteMapPath);
        $this->_xmlWriter->startDocument("1.0", "UTF-8");
        $this->_xmlWriter->setIndent(true);
        $this->_xmlWriter->startElement("urlset");
        $this->_xmlWriter->writeAttribute("xmlns", self::SCHEMA);
        foreach ($siteMapList as $siteMapItem) {
            $this->_xmlWriter->startElement("url");
            $this->_xmlWriter->writeElement("loc", $siteMapItem["Url"]);
            // <title> is not part of the sitemap 0.9 schema, so it is not written
            $changefreq = !empty($siteMapItem["ChangeFreq"]) ? $siteMapItem["ChangeFreq"] : "daily";
            // the protocol only accepts lowercase changefreq values
            $this->_xmlWriter->writeElement("changefreq", strtolower($changefreq));
            $priority = !empty($siteMapItem["Priority"]) ? $siteMapItem["Priority"] : 0.5;
            $this->_xmlWriter->writeElement("priority", (string) $priority);
            $this->_xmlWriter->endElement();
        }
        $this->_xmlWriter->endElement();
        $this->_xmlWriter->endDocument();
        // write the buffered XML out to the file
        $this->_xmlWriter->flush();

        return true;
    }

    /**
     * send a request to the target url and get the response
     *
     * @param string $url
     * @return mixed $responseData, the body on HTTP 200, otherwise false
     * @access public
     */
    public function sendRequest($url)
    {
        $responseData = false;
        if (!filter_var($url, FILTER_VALIDATE_URL)) {
            return $responseData;
        }
        $connectTimeOut = $this->loadConfig("CURLOPT_CONNECTTIMEOUT");
        if ($connectTimeOut === false) {
            return $responseData;
        }
        $timeOut = $this->loadConfig("CURLOPT_TIMEOUT");
        if ($timeOut === false) {
            return $responseData;
        }

        $handle = curl_init();
        curl_setopt($handle, CURLOPT_URL, $url);
        curl_setopt($handle, CURLOPT_HEADER, false);
        curl_setopt($handle, CURLOPT_AUTOREFERER, true);
        curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($handle, CURLOPT_CONNECTTIMEOUT, $connectTimeOut);
        curl_setopt($handle, CURLOPT_TIMEOUT, $timeOut);
        curl_setopt($handle, CURLOPT_USERAGENT, "Mozilla/5.0 (compatible; MSIE 5.01; Windows NT 5.0)");
        $headersItem = array(
            "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Connection: Keep-Alive"
        );
        curl_setopt($handle, CURLOPT_HTTPHEADER, $headersItem);
        curl_setopt($handle, CURLOPT_FOLLOWLOCATION, true);

        $cookieList = $this->loadConfig("COOKIELIST");
        $isUseCookie = $cookieList["IsUseCookie"];
        $cookieFilePath = $cookieList["CookiePath"];
        if ($isUseCookie) {
            if (!file_exists($cookieFilePath)) {
                touch($cookieFilePath);
            }
            curl_setopt($handle, CURLOPT_COOKIEFILE, $cookieFilePath);
            curl_setopt($handle, CURLOPT_COOKIEJAR, $cookieFilePath);
        }

        $responseData = curl_exec($handle);
        $httpCode = curl_getinfo($handle, CURLINFO_HTTP_CODE);
        if ($httpCode != 200) {
            $responseData = false;
        }
        curl_close($handle);

        return $responseData;
    }

    /**
     * crawl the url and collect its sitemap entry (url, title, priority,
     * changefreq), then recurse into every new in-domain link
     *
     * @param string $url
     * @access public
     */
    public function generateSiteMapList($url)
    {
        $content = $this->sendRequest($url);
        if ($content !== false) {
            $tagsList = $this->_parseContent($content, $url);
            $urlItem = $tagsList["UrlItem"];
            $title = $tagsList["Title"];

            $siteMapItem = array(
                "Url" => trim($url),
                "Title" => trim($title)
            );
            $siteMapItem["Priority"] = $this->_calculatePriority($siteMapItem["Url"]);
            $siteMapItem["ChangeFreq"] = $this->_calculateChangefreq($siteMapItem["Url"]);
            $this->siteMapList[] = $siteMapItem;

            $skipUrlList = $this->loadConfig("SKIP_URLLIST") ?: array();
            foreach ($urlItem as $nextUrl) {
                if (!in_array($nextUrl, $this->webUrlList)) {
                    foreach ($skipUrlList as $keyWords) {
                        if (stripos($nextUrl, $keyWords) !== false) {
                            continue 2;
                        }
                    }
                    $this->webUrlList[] = $nextUrl;
                    echo $nextUrl, PHP_EOL;
                    // depth-first: crawl each new URL as soon as it is found
                    $this->generateSiteMapList($nextUrl);
                }
            }
        }
    }

    /**
     * get the sitemap list of the web
     *
     * @access public
     * @return array $siteMapList
     */
    public function getSiteMapList()
    {
        return $this->siteMapList;
    }

    /**
     * calculate the priority of the target url (first matching keyword wins)
     *
     * @param string $targetUrl
     * @return float $priority
     * @access private
     */
    private function _calculatePriority($targetUrl)
    {
        $priority = 0.5;
        if (filter_var($targetUrl, FILTER_VALIDATE_URL)) {
            $priorityList = $this->loadConfig("PRIORITYLIST") ?: array();
            foreach ($priorityList as $priorityKey => $priorityValue) {
                if (stripos($targetUrl, $priorityKey) !== false) {
                    $priority = $priorityValue;
                    break;
                }
            }
        }
        return $priority;
    }

    /**
     * calculate the changefreq of the target url (first matching keyword wins)
     *
     * @param string $targetUrl
     * @return string $changefreq
     * @access private
     */
    private function _calculateChangefreq($targetUrl)
    {
        $changefreq = "daily";
        if (filter_var($targetUrl, FILTER_VALIDATE_URL)) {
            $changefreqList = $this->loadConfig("CHANGEFREQLIST") ?: array();
            foreach ($changefreqList as $changefreqKey => $changefreqValue) {
                if (stripos($targetUrl, $changefreqKey) !== false) {
                    $changefreq = $changefreqValue;
                    break;
                }
            }
        }
        return $changefreq;
    }

    /**
     * turn a raw href value into an absolute url ("" if it is unusable)
     *
     * @param string $url
     * @param string $originUrl  the page the link was found on
     * @access private
     * @return string $formatUrl
     */
    private function _formatUrl($url, $originUrl)
    {
        $formatUrl = "";
        if (!empty($url) && !empty($originUrl)) {
            $badUrlItem = array("", "/", "javascript", "javascript:;");
            $formatUrl = trim($url);
            $formatUrl = trim($formatUrl, "#");
            // strip stray double and single quotes
            $formatUrl = trim($formatUrl, "\"'");
            if (stripos($formatUrl, "http") === false && !in_array($formatUrl, $badUrlItem)) {
                if (strpos($formatUrl, "/") === 0) {
                    // root-relative link; DOMAIN_NAME is expected to end with "/"
                    $domainName = $this->loadConfig("DOMAIN_NAME");
                    $formatUrl = $domainName . ltrim($formatUrl, "/");
                } else {
                    // document-relative link: resolve against the current page's directory
                    $formatUrl = substr($originUrl, 0, strrpos($originUrl, "/")) . "/" . $formatUrl;
                }
            } elseif (stripos($formatUrl, "http") === false && in_array($formatUrl, $badUrlItem)) {
                $formatUrl = "";
            }
        }
        return $formatUrl;
    }

    /**
     * check that the url belongs to the configured domain
     *
     * @param string $url
     * @return bool $result
     * @access private
     */
    private function _checkDomain($url)
    {
        $result = false;
        if ($url) {
            $domainName = $this->loadConfig("DOMAIN_NAME");
            if (stripos($url, $domainName) === false) {
                return $result;
            }
            $result = true;
        }
        return $result;
    }

    /**
     * parse the response content to get the in-domain urls and the title
     *
     * @param string $content
     * @param string $originUrl
     * @return array $tagsList  with keys "UrlItem" and "Title"
     * @access public
     */
    public function _parseContent($content, $originUrl)
    {
        $tagsList = array("UrlItem" => array(), "Title" => "");
        if (!empty($content) && !empty($originUrl)) {
            // get the href attribute of <a> tags; the exact pattern in the
            // original post was lost, so this one is an assumption
            $regStrForTagA = '#<a\s[^>]*?href\s*=\s*["\']?([^"\'\s>]+)#i';
            $urlItem = array();
            if (preg_match_all($regStrForTagA, $content, $matches)) {
                $urlItem = array_unique($matches[1]);
                foreach ($urlItem as $urlKey => $url) {
                    $formatUrl = $this->_formatUrl($url, $originUrl);
                    if (empty($formatUrl)) {
                        unset($urlItem[$urlKey]);
                        continue;
                    }
                    $result = $this->_checkDomain($formatUrl);
                    if ($result === false) {
                        unset($urlItem[$urlKey]);
                        continue;
                    }
                    $urlItem[$urlKey] = $formatUrl;
                }
            }
            $tagsList["UrlItem"] = $urlItem;

            /* get the content of the <title> tag */
            $regStrForTitle = "#<title>(.*?)</title>#isu";
            if (preg_match($regStrForTitle, $content, $matches)) {
                $tagsList["Title"] = $matches[1];
            }
        }
        return $tagsList;
    }
}

/* here is an example */
$startTime = microtime(true);
echo "/***********************************************************************/", PHP_EOL;
echo "/* start to run {$startTime} */", PHP_EOL;
echo "/***********************************************************************/", PHP_EOL;

$siteMap = new SiteMap();
$domain = $siteMap->loadConfig("DOMAIN_NAME");
// mark the start URL as visited so a link back to it is not crawled twice
$siteMap->webUrlList[] = $domain;
$siteMap->generateSiteMapList($domain);
$siteMapList = $siteMap->getSiteMapList();
$siteMap->generateSiteMapXml($siteMapList);

$endTime = microtime(true);
$takeTime = $endTime - $startTime;
echo "/***********************************************************************/", PHP_EOL;
echo "/* Done, total time: {$takeTime} seconds */", PHP_EOL;
echo "/***********************************************************************/", PHP_EOL;
?>
```

4. Config file code
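```php
<?php
// Also read by SiteMap::loadConfig() (values not shown here):
//   $DOMAIN_NAME             start URL of the site, ending in "/"
//   $CURLOPT_CONNECTTIMEOUT  connect timeout in seconds
//   $CURLOPT_TIMEOUT         total request timeout in seconds
//   $SKIP_URLLIST            array of keywords; URLs containing any of them are skipped

// cookie settings used by cURL
$COOKIELIST = array(
    "IsUseCookie" => true,
    "CookiePath" => "/tmp/sitemapcookie"
);

// where the sitemap file is saved
$SITEMAPPATH = "./sitemap.xml";

// set priority by link keyword
$PRIORITYLIST = array(
    "product" => "0.8",
    "device" => "0.6",
    "intelligent" => "0.4",
    "course" => "0.2"
);

// set changefreq by link keyword (the sitemap protocol expects lowercase values)
$CHANGEFREQLIST = array(
    "product" => "always",
    "device" => "hourly",
    "intelligent" => "daily",
    "course" => "weekly",
    "login" => "monthly",
    "about" => "yearly"
);
?>
```

5. Get the source package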
Click to download the source code (extraction code: fc1c)
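Copyright of this article belongs to the author. Do not reproduce it without permission. If this article violates any rules, you can contact the administrator to have it removed.
When reproducing, please credit this article's address: http://systransis.cn/yun/21739.html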