Garbled Chinese characters when reading a web page's HTML in Java (solved: set the encoding in the BufferedReader to match the page's)
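#The page's HTML is encoded as gb2312
#Reading it in Java as utf-8 garbles the Chinese characters: after the page's HTML is read, the Chinese text prints as garbage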
URLConnection urlConnection = new URL(url).openConnection();
HttpURLConnection connection = (HttpURLConnection) urlConnection;
connection.setRequestMethod("GET");
// Connect
connection.connect();
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader
(connection.getInputStream(), StandardCharsets.UTF_8));
StringBuilder bs = new StringBuilder();
String l;
while ((l = bufferedReader.readLine()) != null) {
if(l.indexOf("
}
}
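Why the loop above produces garbage can be reproduced without a network connection. The sketch below is only an illustration: the class `GarbleDemo` and its helper methods are made up for this note, and it assumes the running JDK ships the GB2312 charset (standard JDK builds do). It encodes two Chinese characters as gb2312 bytes, then decodes the same bytes once with the wrong charset and once with the right one.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original code.
class GarbleDemo {

    // Turns text into bytes the way a gb2312 page would be sent.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    // Turns bytes back into text with the given charset.
    // Bytes that are invalid in that charset become U+FFFD replacement characters.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        Charset gb2312 = Charset.forName("GB2312"); // assumption: available in this JDK
        String original = "\u6c49\u5b57"; // the two characters of the word "hanzi"

        byte[] raw = encode(original, gb2312);

        String wrong = decode(raw, StandardCharsets.UTF_8); // mismatched charset
        String right = decode(raw, gb2312);                 // same charset as the page

        System.out.println("decoded as UTF-8 equals original:  " + wrong.equals(original));
        System.out.println("decoded as GB2312 equals original: " + right.equals(original));
    }
}
```

Once the text has been decoded with the wrong charset, the damage is already in the `String`, so writing it out later with any encoding cannot repair it.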
#Writing the output to a txt file is garbled too (the text was already decoded incorrectly when it was read)
BufferedWriter bWriter = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(new File(filePathName)),
"UTF-8"));
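#Solved (set the encoding in the BufferedReader to match the page's)
###After the connection returns, when reading the response with BufferedReader, set the charset to the same one the page uses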
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader
(connection.getInputStream(),"gb2312"));
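Hard-coding "gb2312" works for this one page. A slightly more general variant, sketched below, takes the charset from the response's Content-Type header when the server sends one. The class `PageReader` and the helpers `charsetFrom` and `readAll` are hypothetical names invented for this note, and the sketch assumes the header, when present, looks like `text/html; charset=gb2312`. Many pages declare their charset only in an HTML `meta` tag, in which case the fallback is used.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper class, not part of the original code.
class PageReader {

    // Extracts the charset from a Content-Type value such as "text/html; charset=gb2312".
    // Returns the fallback when the value is null, has no charset, or names an unknown one.
    static Charset charsetFrom(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name
                    return fallback;
                }
            }
        }
        return fallback;
    }

    // Reads the whole stream line by line, decoding it with the given charset.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for connection.getInputStream(): gb2312 bytes held in memory.
        Charset gb2312 = Charset.forName("GB2312");
        byte[] body = "<p>\u6c49\u5b57</p>\n".getBytes(gb2312);

        Charset detected = charsetFrom("text/html; charset=gb2312", StandardCharsets.UTF_8);
        String html = readAll(new ByteArrayInputStream(body), detected);
        System.out.println("decoded correctly: " + html.equals("<p>\u6c49\u5b57</p>\n"));
    }
}
```

With the connection from the snippet above, the call would look something like `readAll(connection.getInputStream(), charsetFrom(connection.getContentType(), Charset.forName("gb2312")))`.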