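Compared with httpclient, htmlunit has two advantages. First, it simulates a browser: once you have located a button, you can simply call click() on it. Second, you don't need the elaborate code that httpclient requires, such as a pile of request headers and request parameters; you only fill in the username, password, and captcha, as if you were using a browser without a UI. More importantly, enabling JavaScript support in htmlunit is extremely simple.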
1. Add the Maven dependencies (htmlunit, the last entry, is the one that matters here)
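<dependencies>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>${junit.version}</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>com.alibaba</groupId>
        <artifactId>fastjson</artifactId>
        <version>1.2.47</version>
    </dependency>
    <dependency>
        <groupId>org.jsoup</groupId>
        <artifactId>jsoup</artifactId>
        <version>1.11.3</version>
    </dependency>
    <dependency>
        <groupId>net.sourceforge.htmlunit</groupId>
        <artifactId>htmlunit</artifactId>
        <version>2.18</version>
    </dependency>
</dependencies>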
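2. Approach
Locate the username, password, and captcha input elements and fill them in. For the captcha, you can either download the image and type the code in by hand, or use tess4j to recognize the image; this example uses manual input. The site under test uses a pseudo-ajax submit. Repeated testing showed that the captcha has to be entered a second time before the login succeeds. Puzzlingly, the captcha generated on both attempts is the same (if it is different, your first entry was wrong).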
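The "submit again if the first attempt did not get through" behaviour described above can be sketched independently of htmlunit. This is only an illustration under stated assumptions: the class LoginRetry, the method retry, and the "/home/" URL are hypothetical names that do not come from the original code, and the main method fakes the site with a counter rather than a real login.

```java
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical helper: repeat a login attempt until it is accepted. */
final class LoginRetry {

    /**
     * Applies attempt to the current state until accepted returns true
     * or maxAttempts is reached, then returns the last state.
     */
    static <T> T retry(T start, Function<T, T> attempt, Predicate<T> accepted, int maxAttempts) {
        T current = start;
        for (int i = 0; i < maxAttempts; i++) {
            current = attempt.apply(current);
            if (accepted.test(current)) {
                break; // the page changed, so the login went through
            }
        }
        return current;
    }

    public static void main(String[] args) {
        // Stand-in for the real site: the first submit stays on the login URL,
        // the second one moves on to an assumed "/home/" page.
        int[] submits = {0};
        String loginUrl = "/login/";
        String result = retry(
                loginUrl,
                url -> ++submits[0] < 2 ? loginUrl : "/home/",
                url -> !url.equals(loginUrl),
                3);
        System.out.println(result + " after " + submits[0] + " submits");
    }
}
```

With htmlunit, the state would be an HtmlPage, the attempt would fill in and submit the form, and the check would compare the page URL with the login URL. Because form submission throws checked exceptions, the attempt would have to wrap them before it fits into a Function.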
import java.io.File;
import java.util.Scanner;
import java.util.UUID;

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.NicelyResynchronizingAjaxController;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlImage;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlPasswordInput;
import com.gargoylesoftware.htmlunit.html.HtmlTextInput;

// The class name is arbitrary; the original post only shows the two methods.
public class HtmlUnitLogin {

    public static void main(String[] args) throws Exception {
        WebClient webClient = new WebClient(BrowserVersion.CHROME);
        webClient.getOptions().setJavaScriptEnabled(true);
        webClient.getOptions().setCssEnabled(true);
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        //webClient.getOptions().setThrowExceptionOnFailingStatusCode(true);
        webClient.getOptions().setActiveXNative(false);

        // ajax
        webClient.setAjaxController(new NicelyResynchronizingAjaxController());
        webClient.getOptions().setUseInsecureSSL(false);

        // allow redirects
        webClient.getOptions().setRedirectEnabled(true);

        // connection timeout (ms)
        webClient.getOptions().setTimeout(5000);
        // JavaScript execution timeout (ms)
        webClient.setJavaScriptTimeout(10000 * 3);
        // cookies must be enabled for this site
        webClient.getCookieManager().setCookiesEnabled(true);

        // login URL (the host is omitted in the original post)
        String url = "/login/";
        HtmlPage page = webClient.getPage(url);
        webClient.waitForBackgroundJavaScript(5000);

        HtmlPage newPage = readyPage(page, webClient);
        //String content1 = newPage.asXml();
        //IOUtils.write(content1.getBytes(),new FileWriter(new File("f:/content1.txt")));

        // if the page URL has not changed, log in once more
        if (newPage.getUrl().toString().equals(url)) {
            System.out.println("An error occurred, please log in again-------------");
            HtmlPage result = readyPage(newPage, webClient);
            System.out.println("url----------------" + result.getUrl());
            System.out.println("page----" + result.asXml());
            //IOUtils.write(result.asXml(),new FileWriter(new File("f:/content2.txt")));
        }

        webClient.close();
    }

    public static HtmlPage readyPage(HtmlPage page, WebClient webClient) throws Exception {
        // look up the form elements
        HtmlForm form = page.getHtmlElementById("form2");
        HtmlTextInput loginname = form.getInputByName("loginname");
        loginname.setValueAttribute("username"); // replace with your username
        HtmlPasswordInput loginpwd = form.getInputByName("loginpwd");
        loginpwd.setValueAttribute("password"); // replace with your password

        // captcha input box
        HtmlTextInput verifyCode = form.getInputByName("verify_code");

        // captcha image
        HtmlImage verifyImg = (HtmlImage) page.getElementById("verify_img");
        UUID randomUUID = UUID.randomUUID();
        // save it under a unique file name
        verifyImg.saveAs(new File("./src/main/resources/image/verifyimg" + randomUUID.toString() + ".png"));

        System.out.println("Captcha image saved!");
        System.out.println("Please enter the captcha");
        // read the captcha typed in by hand; the scanner is left open
        // because closing it would also close System.in before the second attempt
        Scanner scanner = new Scanner(System.in);
        String code = scanner.nextLine();
        System.out.println("captcha-------------" + code);
        verifyCode.setValueAttribute(code);

        // the login button can also be clicked with
        // page.executeJavaScript("javascript:document.getElementById(\'loginsubmit\').click()").getNewPage();
        HtmlAnchor login = page.getHtmlElementById("loginsubmit");
        HtmlPage newPage = login.click();

        // wait for background JavaScript to finish
        webClient.waitForBackgroundJavaScript(5000);
        return newPage;
    }
}
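The listing saves the captcha into ./src/main/resources/image/. As far as I can tell htmlunit does not create missing parent directories, so the save may fail if that folder does not exist yet. A minimal sketch of a helper that creates the folder on demand and builds the same kind of unique file name follows; the class CaptchaFiles and the method newCaptchaFile are hypothetical names, not part of the original code, and the demo uses a temporary directory instead of the project folder.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

/** Hypothetical helper for choosing where a captcha image is written. */
final class CaptchaFiles {

    /** Creates dir if it is missing and returns a fresh, not yet existing, .png target inside it. */
    static File newCaptchaFile(Path dir) throws IOException {
        Files.createDirectories(dir); // no-op when the directory already exists
        return dir.resolve("verifyimg" + UUID.randomUUID() + ".png").toFile();
    }

    public static void main(String[] args) throws IOException {
        // A temporary directory keeps the demo from touching the project tree.
        Path dir = Files.createTempDirectory("captcha-demo").resolve("image");
        File target = newCaptchaFile(dir);
        System.out.println("Captcha would be saved to " + target.getAbsolutePath());
    }
}
```

In the login code, the returned File could be passed to saveAs in place of the hard-coded path.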
3. Partial screenshot of the console output