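Java URL Class: A Guide to the Pitfalls

zhisheng / 2281 reads

Summary: In the source of the URL class, the handler object's hashCode() method calls getHostAddress(), which consumes a lot of time. Storing URL objects in a hash-table-based container is therefore a disaster. The article compares how URL and URI behave when stored 50 times (the output is 271 and 0), so URL is best avoided in hash-table-based containers. It also suggests that the / at the end of a URL is best kept. These are some pitfalls discovered this weekend.

Background

I have recently been building an RSS reader for my own use. One step fetches a JSON file containing the list of RSS sources from a server, then downloads and parses the RSS content according to that file. The core code is as follows: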

class PresenterImpl(val context: Context, val activity: MainActivity) : IPresenter {
    private val URL_API = "https://vimerzhao.github.io/others/rssreader/RSS.json"

    override fun getRssResource(): RssSource {
        val gson = GsonBuilder().create()
        return gson.fromJson(getFromNet(URL_API), RssSource::class.java)
    }

    private fun getFromNet(url: String): String {
        val result = URL(url).readText()
        return result
    }

    ......
}

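This had always worked well, until two days ago when I bought the domain vimerzhao.top and redirected the original domain vimerzhao.github.io to it. The tool then stopped working, yet entering URL_API in a browser still returned the data.

So why did URL.readText() fail to get the data?

No Support for Redirects

This can be tested with the following code: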

import java.net.*;
import java.io.*;

public class TestRedirect {
    public static void main(String args[]) {
        try {
            URL url1 = new URL("https://vimerzhao.github.io/others/rssreader/RSS.json");
            URL url2 = new URL("http://vimerzhao.top/others/rssreader/RSS.json");
            read(url1);
            System.out.println("=--------------------------------=");
            read(url2);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
    public static void read(URL url) {
        try {
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(url.openStream()));

            String inputLine;
            while ((inputLine = in.readLine()) != null) {
                System.out.println(inputLine);
            }
            in.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

The result is as follows:

301 Moved Permanently

301 Moved Permanently

nginx
=--------------------------------=
{"theme":"tech","author":"zhaoyu","email":"[email protected]","version":"0.01","contents":[{"category":"綜合版塊","websites":[{"tag":"門戶網站","url":["http://geek.csdn.net/admin/news_service/rss","http://blog.jobbole.com/feed/","http://feed.cnblogs.com/blog/sitehome/rss","https://segmentfault.com/feeds","http://www.codeceo.com/article/category/pick/feed"]},{"tag":"知名社區","url":["https://stackoverflow.com/feeds","https://www.v2ex.com/index.xml"]},{"tag":"官方博客","url":["https://www.blog.google/rss/","https://blog.jetbrains.com/feed/"]},{"tag":"個人博客-行業","url":["http://feed.williamlong.info/","https://www.liaoxuefeng.com/feed/articles"]},{"tag":"個人博客-學術","url":["http://www.norvig.com/rss-feed.xml"]}]},{"category":"編程語言","websites":[{"tag":"Kotlin","url":["https://kotliner.cn/api/rss/latest"]},{"tag":"Python","url":["https://www.python.org/dev/peps/peps.rss/"]},{"tag":"Java","url":["http://www.codeceo.com/article/category/develop/java/feed"]}]},{"category":"行業動態","websites":[{"tag":"Android","url":["http://www.codeceo.com/article/category/develop/android/feed"]}]},{"category":"亂七八遭","websites":[{"tag":"Linux-綜合","url":["https://linux.cn/rss.xml","http://www.linuxidc.com/rssFeed.aspx","http://www.codeceo.com/article/tag/linux/feed"]},{"tag":"Linux-發行版","url":["https://blog.linuxmint.com/?feed=rss2","https://manjaro.github.io/feed.xml"]}]}]}

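The HTTP status code is 301, meaning a redirect took place. In a browser the process is so fast that we never see this 301 page. Note that URL.readText() is a Kotlin extension function that ultimately calls the openStream method of the URL class. Part of its source: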

.....
/**
 * Reads the entire content of this URL as a String using UTF-8 or the specified [charset].
 *
 * This method is not recommended on huge files.
 *
 * @param charset a character set to use.
 * @return a string with this URL entire content.
 */
@kotlin.internal.InlineOnly
public inline fun URL.readText(charset: Charset = Charsets.UTF_8): String = readBytes().toString(charset)

/**
 * Reads the entire content of the URL as byte array.
 *
 * This method is not recommended on huge files.
 *
 * @return a byte array with this URL entire content.
 */
public fun URL.readBytes(): ByteArray = openStream().use { it.readBytes() }

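So the test code above explains why URL.readText() failed.
Whether it is reasonable for URL not to follow this redirect, and why it does not, remains to be investigated.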
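
One likely explanation, offered as an assumption rather than something the test above proves: as far as I know, the HttpURLConnection behind openStream() follows redirects by default, but only when the target keeps the same protocol. Here the redirect goes from https://vimerzhao.github.io to http://vimerzhao.top, so the body of the 301 response is returned instead. The sketch below follows redirects by hand. The class name RedirectFollower, its method names, and the limit of 5 hops are hypothetical choices made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original post.
class RedirectFollower {
    private static final int MAX_HOPS = 5; // arbitrary limit to avoid redirect loops

    // True for the 3xx status codes that normally carry a Location header.
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    // Resolves a Location header (absolute or relative) against the current URL.
    static URL resolve(URL current, String location) throws MalformedURLException {
        return new URL(current, location);
    }

    // Reads the body at url, following redirects manually (including https -> http).
    static String read(URL url) throws IOException {
        URL current = url;
        for (int hop = 0; hop <= MAX_HOPS; hop++) {
            HttpURLConnection conn = (HttpURLConnection) current.openConnection();
            conn.setInstanceFollowRedirects(false); // redirects are handled below
            int status = conn.getResponseCode();
            if (isRedirect(status)) {
                String location = conn.getHeaderField("Location");
                conn.disconnect();
                if (location == null) {
                    throw new IOException("Redirect without Location header: " + current);
                }
                current = resolve(current, location);
                continue;
            }
            StringBuilder body = new StringBuilder();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    body.append(line).append('\n');
                }
            }
            return body.toString();
        }
        throw new IOException("Too many redirects starting from " + url);
    }
}
```

Calling RedirectFollower.read(new URL("https://vimerzhao.github.io/others/rssreader/RSS.json")) would be the replacement for read(url1) in the test above. Silently following a redirect from https to http downgrades the connection, so a real tool may prefer to reject such hops or simply point URL_API at the new domain.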

The Unstable equals Method

First, the documentation for equals (URL (Java Platform SE 7)):

Compares this URL for equality with another object.
If the given object is not a URL then this method immediately returns false.
Two URL objects are equal if they have the same protocol, reference equivalent hosts, have the same port number on the host, and the same file and fragment of the file.
Two hosts are considered equivalent if both host names can be resolved into the same IP addresses; else if either host name can't be resolved, the host names must be equal without regard to case; or both host names equal to null.
Since hosts comparison requires name resolution, this operation is a blocking operation.
Note: The defined behavior for equals is known to be inconsistent with virtual hosting in HTTP.

Next, consider this code:

import java.net.*;
public class TestEquals {
    public static void main(String args[]) {
        try {
            // vimerzhao's blog home page
            URL url1 = new URL("https://vimerzhao.github.io/");
            // zhanglanqing's blog home page
            URL url2 = new URL("https://zhanglanqing.github.io/");
            // the domain that vimerzhao's blog home page now redirects to
            URL url3 = new URL("http://vimerzhao.top/");
            System.out.println(url1.equals(url2));
            System.out.println(url1.equals(url3));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

What should the output be according to the definition? Running it gives:

true
false

You may have guessed right, but if I disconnect the computer from the network and run it again, the result is:

false
false

Yet all three domains actually have the same IP address, as ping shows:

zhaoyu@Inspiron ~/Project $ ping vimezhao.github.io
PING sni.github.map.fastly.net (151.101.77.147) 56(84) bytes of data.
64 bytes from 151.101.77.147: icmp_seq=1 ttl=44 time=396 ms
^C
--- sni.github.map.fastly.net ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 396.692/396.692/396.692/0.000 ms
zhaoyu@Inspiron ~/Project $ ping zhanglanqing.github.io
PING sni.github.map.fastly.net (151.101.77.147) 56(84) bytes of data.
64 bytes from 151.101.77.147: icmp_seq=1 ttl=44 time=396 ms
^C
--- sni.github.map.fastly.net ping statistics ---
2 packets transmitted, 1 received, 50% packet loss, time 1000ms
rtt min/avg/max/mdev = 396.009/396.009/396.009/0.000 ms
zhaoyu@Inspiron ~/Project $ ping vimezhao.top
ping: unknown host vimezhao.top
zhaoyu@Inspiron ~/Project $ ping vimerzhao.top
PING sni.github.map.fastly.net (151.101.77.147) 56(84) bytes of data.
64 bytes from 151.101.77.147: icmp_seq=1 ttl=44 time=409 ms
^C
--- sni.github.map.fastly.net ping statistics ---
2 packets transmitted, 1 received, 50% packet loss, time 1001ms
rtt min/avg/max/mdev = 409.978/409.978/409.978/0.000 ms

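Consider first the case with a network connection. vimerzhao.github.io and zhanglanqing.github.io are my blog and a classmate's blog. Their content differs, but they resolve to the same IP and share the same protocol, port, and so on, so they compare equal. vimerzhao.github.io and vimerzhao.top point to the same blog, but one uses https and the other http; the protocols differ, so they compare unequal. I believe this runs against most people's intuition: URLs pointing to different blogs are equal, while URLs pointing to the same blog are not!
Now for the result after disconnecting. First, the source of URL: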

    public boolean equals(Object obj) {
        if (!(obj instanceof URL))
            return false;
        URL u2 = (URL)obj;

        return handler.equals(this, u2);
    }

Then the source of the handler object:

    protected boolean equals(URL u1, URL u2) {
        String ref1 = u1.getRef();
        String ref2 = u2.getRef();
        return (ref1 == ref2 || (ref1 != null && ref1.equals(ref2))) &&
               sameFile(u1, u2);
    }

The source of sameFile:

    protected boolean sameFile(URL u1, URL u2) {
        // Compare the protocols.
        if (!((u1.getProtocol() == u2.getProtocol()) ||
              (u1.getProtocol() != null &&
               u1.getProtocol().equalsIgnoreCase(u2.getProtocol()))))
            return false;

        // Compare the files.
        if (!(u1.getFile() == u2.getFile() ||
              (u1.getFile() != null && u1.getFile().equals(u2.getFile()))))
            return false;

        // Compare the ports.
        int port1, port2;
        port1 = (u1.getPort() != -1) ? u1.getPort() : u1.handler.getDefaultPort();
        port2 = (u2.getPort() != -1) ? u2.getPort() : u2.handler.getDefaultPort();
        if (port1 != port2)
            return false;

        // Compare the hosts.
        if (!hostsEqual(u1, u2))
            return false; // reached when there is no network connection

        return true;
    }

Finally, the source of hostsEqual:

    protected boolean hostsEqual(URL u1, URL u2) {
        InetAddress a1 = getHostAddress(u1);
        InetAddress a2 = getHostAddress(u2);
        // if we have internet address for both, compare them
        if (a1 != null && a2 != null) {
            return a1.equals(a2);
        // else, if both have host names, compare them
        } else if (u1.getHost() != null && u2.getHost() != null)
            return u1.getHost().equalsIgnoreCase(u2.getHost());
         else
            return u1.getHost() == null && u2.getHost() == null;
    }

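With a network connection, a1 and a2 are both non-null, so return a1.equals(a2) is executed and returns true. Without a network, return u1.getHost().equalsIgnoreCase(u2.getHost()); is executed instead, i.e. the second branch. The host of url1 (vimerzhao.github.io) and the host of url2 (zhanglanqing.github.io) are clearly different, so it returns false, the condition if (!hostsEqual(u1, u2)) is true, and return false is executed.
Clearly, the equals method of the URL class is not only counterintuitive but also inconsistent: it gives different results in different environments, which is very dangerous!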
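
A common way around this is to compare java.net.URI values instead, since URI.equals compares the parsed components textually (scheme and host without regard to case) and performs no name resolution. The following is a minimal sketch; the class name UrlCompare and the method sameUrl are hypothetical names introduced here, not anything from the JDK.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical helper, not part of the original post.
class UrlCompare {
    // Compares two URLs through their URI form; no DNS lookup is involved,
    // so the result does not depend on the network being available.
    static boolean sameUrl(URL a, URL b) throws URISyntaxException {
        return a.toURI().equals(b.toURI());
    }

    public static void main(String[] args) throws Exception {
        URL url1 = new URL("https://vimerzhao.github.io/");
        URL url2 = new URL("https://zhanglanqing.github.io/");
        URL url3 = new URL("https://VIMERZHAO.github.io/");
        System.out.println(sameUrl(url1, url2)); // false: the host names differ
        System.out.println(sameUrl(url1, url3)); // true: hosts differ only in case
    }
}
```

With this comparison the two blogs are no longer considered equal, online or offline. It still treats https://vimerzhao.github.io/ and http://vimerzhao.top/ as different, which is unavoidable without actually following the redirect.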

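The Time-Consuming equals Method

In addition, equals is a time-consuming operation, because with a network connection it needs to perform DNS resolution. The same holds for hashCode(), which is used as the example here. The source of hashCode() in the URL class: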

    public synchronized int hashCode() {
        if (hashCode != -1)
            return hashCode;

        hashCode = handler.hashCode(this);
        return hashCode;
    }

The hashCode() method of the handler object:

    protected int hashCode(URL u) {
        int h = 0;

        // Generate the protocol part.
        String protocol = u.getProtocol();
        if (protocol != null)
            h += protocol.hashCode();

        // Generate the host part.
        InetAddress addr = getHostAddress(u);
        if (addr != null) {
            h += addr.hashCode();
        } else {
            String host = u.getHost();
            if (host != null)
                h += host.toLowerCase().hashCode();
        }

        // Generate the file part.
        String file = u.getFile();
        if (file != null)
            h += file.hashCode();

        // Generate the port part.
        if (u.getPort() == -1)
            h += getDefaultPort();
        else
            h += u.getPort();

        // Generate the ref part.
        String ref = u.getRef();
        if (ref != null)
            h += ref.hashCode();

        return h;
    }

Here getHostAddress() consumes a lot of time. So storing URL objects in a hash-table-based container is simply a disaster. The following code compares how URL and URI behave when stored 50 times:

import java.net.*;
import java.util.*;

public class TestHash {
    public static void main(String args[]) {
        HashSet<URL> list1 = new HashSet<>();
        HashSet<URI> list2 = new HashSet<>();
        try {
            URL url1 = new URL("https://vimerzhao.github.io/");
            URI url2 = new URI("https://zhanglanqing.github.io/");
            long cur = System.currentTimeMillis();
            int cnt = 50;
            for (int i = 0; i < cnt; i++) {
                list1.add(url1);
            }
            System.out.println(System.currentTimeMillis() - cur);
            cur = System.currentTimeMillis();
            for (int i = 0; i < cnt; i++) {
                list2.add(url2);
            }
            System.out.println(System.currentTimeMillis() - cur);

        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

The output is:

271
0

So it is best not to use URL in hash-table-based containers.
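
One caveat about the measurement: the loop adds the same URL object each time, and the hashCode() source shown earlier caches its result after the first call, so the 271 ms is probably dominated by the first DNS lookup rather than spread over all 50 insertions. The advice still stands, and a simple way to follow it is to key the container on URI and convert to URL only when a connection is needed. This is a minimal sketch; FeedRegistry and its methods are hypothetical names introduced for illustration.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical container, not part of the original post.
class FeedRegistry {
    // URI keys: hashing and equality are textual, so no DNS lookups happen here.
    private final Set<URI> feeds = new LinkedHashSet<>();

    // Returns true if the address was not registered before.
    boolean add(String address) {
        return feeds.add(URI.create(address));
    }

    int size() {
        return feeds.size();
    }

    // Converts the first registered feed to a URL, at the point where a
    // connection is actually needed. Throws NoSuchElementException if empty.
    URL firstAsUrl() throws MalformedURLException {
        return feeds.iterator().next().toURL();
    }

    public static void main(String[] args) throws Exception {
        FeedRegistry registry = new FeedRegistry();
        System.out.println(registry.add("https://vimerzhao.github.io/"));    // true
        System.out.println(registry.add("https://vimerzhao.github.io/"));    // false: duplicate
        System.out.println(registry.add("https://zhanglanqing.github.io/")); // true
        System.out.println(registry.size());                                 // 2
    }
}
```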

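The Effect of the Trailing Slash

The trailing slash is the slash at the end of the domain name. For example, we see vimerzhao.top in the browser, but after copying and pasting it turns out to be http://vimerzhao.top/. Start by testing with the following code: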

import java.net.*;
import java.io.*;

public class TestTrailingSlash {
    public static void main(String args[]) {
        try {
            URL url1 = new URL("https://vimerzhao.github.io/");
            URL url2 = new URL("https://vimerzhao.github.io");
            System.out.println(url1.equals(url2));
            outputInfo(url1);
            outputInfo(url2);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
    public static void outputInfo(URL url) {
        System.out.println("------" + url.toString() + "----------");
        System.out.println(url.getRef());
        System.out.println(url.getFile());
        System.out.println(url.getHost());
        System.out.println("----------------");
    }
}

The result is as follows:

false
------https://vimerzhao.github.io/----------
null
/
vimerzhao.github.io
----------------
------https://vimerzhao.github.io----------
null

vimerzhao.github.io
----------------

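In fact, whether read with the read() method from earlier or entered directly in the address bar, url1 and url2 return the same content. But a trailing / denotes a directory, while its absence denotes a file, so getFile() gives different results for the two and equals returns false. When typing in the address bar you may not even notice the trailing slash, and the returned content is the same, yet equals returns false. It is really hard to guard against!
This raises another question: if one is a file and the other is a directory, why do both return the same result?
After some investigation: when the request has a trailing /, the server looks for the index.html file in that directory. When it does not, taking vimerzhao.top/tags as an example, the server first looks for tags; if that is not found, it automatically appends a / and then looks for index.html in the tags directory. (See the figure in the original post.)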
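
For the bare-domain case, the difference can be removed before comparing by turning an empty path into /. The following is a minimal sketch under that assumption; SlashNormalizer and withRootSlash are hypothetical names, and the sketch deliberately leaves paths such as /tags alone, because the client cannot know whether tags is a file or a directory on the server.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of the original post.
class SlashNormalizer {
    // Returns the URI with an empty path replaced by "/".
    // Opaque URIs and URIs that already have a path are returned unchanged.
    static URI withRootSlash(URI uri) throws URISyntaxException {
        String path = uri.getPath();
        if (uri.isOpaque() || (path != null && !path.isEmpty())) {
            return uri;
        }
        return new URI(uri.getScheme(), uri.getAuthority(), "/",
                uri.getQuery(), uri.getFragment());
    }

    public static void main(String[] args) throws Exception {
        URI a = withRootSlash(URI.create("https://vimerzhao.github.io"));
        URI b = withRootSlash(URI.create("https://vimerzhao.github.io/"));
        System.out.println(a);           // https://vimerzhao.github.io/
        System.out.println(a.equals(b)); // true
    }
}
```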

Here is an interesting test. Write the following two programs:

import java.net.*;
import java.io.*;

public class TestTrailingSlash {
    public static void main(String args[]) {
        try {
            URL urlWithSlash = new URL("http://vimerzhao.top/tags/");
            int cnt = 5;
            long cur = System.currentTimeMillis();
            for (int i = 0; i < cnt; i++) {
                read(urlWithSlash);
            }
            System.out.println(System.currentTimeMillis() - cur);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
    public static void read(URL url) {
        try {
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(url.openStream()));

            String inputLine;
            while ((inputLine = in.readLine()) != null) {
                //System.out.println(inputLine);
            }
            in.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

import java.net.*;
import java.io.*;

public class TestWithoutTrailingSlash {
    public static void main(String args[]) {
        try {
            URL urlWithoutSlash = new URL("http://vimerzhao.top/tags");
            int cnt = 5;
            long cur = System.currentTimeMillis();
            for (int i = 0; i < cnt; i++) {
                read(urlWithoutSlash);
            }
            System.out.println(System.currentTimeMillis() - cur);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
    public static void read(URL url) {
        try {
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(url.openStream()));

            String inputLine;
            while ((inputLine = in.readLine()) != null) {
                //System.out.println(inputLine);
            }
            in.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Test with the following script:

#!/bin/bash
# {1..20} is a bash feature; >> appends so that all 20 timings are kept
for i in {1..20}; do
    java TestTrailingSlash >> out1
    java TestWithoutTrailingSlash >> out2
done

The printed times were collected into a table (shown in the original post).

The version with the trailing / is faster, because it skips the step of checking whether a tags file exists. The takeaway: it is best to keep the / at the end of a URL!

Those are some of the pitfalls I discovered this weekend.

References

Official Google Webmaster Central Blog: To slash or not to slash

url rewriting - When should I use a trailing slash in my URL? - Stack Overflow

What Does a Slash at the End of a Website's URL Mean?

Mr. Gosling - why did you make URL equals suck?!? - Invert Your Mind

java - URLConnection Doesn't Follow Redirect - Stack Overflow

java - Proper way to check for URL equality - Stack Overflow

http - How to compare two URLs in java? - Stack Overflow
