How to extract text wrapped in CDATA tags from an XML snippet in Java

I have the following XML:

    
application/xml
local-C++
200 <![CDATA[]]>

I want to parse the following text out of the content node:

<![CDATA[]]>

Note that the content is wrapped in a CDATA tag. How can I do this in Java? Any approach is fine.

Here is my code:

    @Test
    public void testGetDoOrchResponse() throws IOException {
        String path = "/Users/haddad/Git/Tools/ContentUtils/src/test/resources/testdata/doOrch_testfiles/doOrch_response.xml";
        File f = new File(path);
        String response = FileUtils.readFileToString(f);
        String content = getDoOrchResponse(response, "content");
        System.out.println("Content: " + content);
    }

// Output: Content: blank

    static String getDoOrchResponse(String xml, String tagFragment) throws FileNotFoundException {
        String content = new String();
        try {
            Document doc = getDocumentXML(xml);
            NodeList nlNodeExplanationList = doc.getElementsByTagName("response");
            for (int i = 0; i < nlNodeExplanationList.getLength(); i++) {
                Node explanationNode = nlNodeExplanationList.item(i);
                List<String> titleList = getTextValuesByTagName((Element) explanationNode, tagFragment);
                content = titleList.get(0);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return content;
    }

    static List<String> getTextValuesByTagName(Element element, String tagName) {
        NodeList nodeList = element.getElementsByTagName(tagName);
        ArrayList<String> list = new ArrayList<String>();
        for (int i = 0; i < nodeList.getLength(); i++) {
            String textValue = getTextValue(nodeList.item(i));
            // An empty value is reported as the literal string "blank"
            if (textValue.equalsIgnoreCase("")) {
                textValue = "blank";
            }
            list.add(textValue);
        }
        return list;
    }

    static String getTextValue(Node node) {
        StringBuffer textValue = new StringBuffer();
        int length = node.getChildNodes().getLength();
        for (int i = 0; i < length; i++) {
            Node c = node.getChildNodes().item(i);
            // Only direct children of type TEXT_NODE are collected here
            if (c.getNodeType() == Node.TEXT_NODE) {
                textValue.append(c.getNodeValue());
            }
        }
        return textValue.toString().trim();
    }

    static Document getDocumentXML(String xml) throws FileNotFoundException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db;
        Document doc = null;
        try {
            db = dbf.newDocumentBuilder();
            doc = db.parse(new InputSource(new ByteArrayInputStream(xml.getBytes("utf-8"))));
            doc.getDocumentElement().normalize();
        } catch (ParserConfigurationException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (SAXException e) {
            e.printStackTrace();
        }
        return doc;
    }

What exactly am I doing wrong? Why do I get "blank" as the output? I just don't understand.

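If you want to extract the content of an Element node, use the getTextContent() method. If you really need or want the CDATA section markup, you need to serialize the node with LSSerializer or something similar: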

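    import java.io.File;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.ls.DOMImplementationLS;
    import org.w3c.dom.ls.LSSerializer;

    DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
    docFactory.setNamespaceAware(true);
    DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
    Document doc = docBuilder.parse(new File("doc1.xml"));

    Element content = (Element) doc
            .getElementsByTagNameNS("http://comResponse.engine/response", "content")
            .item(0);
    if (content != null) {
        // Plain text of the element, CDATA payload included
        System.out.println(content.getTextContent());

        // Serialize the first child to get the CDATA markup as well
        LSSerializer ser = ((DOMImplementationLS) doc.getImplementation()).createLSSerializer();
        if (content.getFirstChild() != null) {
            System.out.println(ser.writeToString(content.getFirstChild()));
        }
    }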

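That is the theory, at least. For me, with Java JRE 1.8, the output lacks the closing marker of the CDATA section; it looks like LSSerializer does not handle a single CDATA section node correctly.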
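
As for why the question's code prints "blank": with a default (non-coalescing) parser, a CDATA section is exposed as a child of type CDATA_SECTION_NODE, not TEXT_NODE, so a loop that only accepts TEXT_NODE never sees it. The following is a minimal sketch of that behaviour and of one way to rebuild the CDATA markup by hand instead of relying on LSSerializer. The class name `CdataDemo`, its helper methods, and the sample XML (element names and payload) are all hypothetical and only serve the illustration; the namespace from the snippet above is left out for brevity.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class CdataDemo {

    // Hypothetical sample; element names and payload are invented for illustration.
    static final String SAMPLE_XML =
            "<response><status>200</status>"
            + "<content><![CDATA[<b>hello</b>]]></content></response>";

    /** Parses the XML and returns the first element with the given tag name. */
    static Node firstElement(String xml, String tagName) {
        try {
            // Coalescing is off by default, so CDATA sections stay separate nodes.
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            Document doc = dbf.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagName(tagName).item(0);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Walks the direct children and concatenates text, optionally with CDATA. */
    private static String collect(Node node, boolean includeCdata, boolean keepMarkup) {
        StringBuilder sb = new StringBuilder();
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node c = children.item(i);
            if (c.getNodeType() == Node.TEXT_NODE) {
                sb.append(c.getNodeValue());
            } else if (includeCdata && c.getNodeType() == Node.CDATA_SECTION_NODE) {
                if (keepMarkup) {
                    // Re-wrap the payload in the CDATA delimiters by hand.
                    sb.append("<![CDATA[").append(c.getNodeValue()).append("]]>");
                } else {
                    sb.append(c.getNodeValue());
                }
            }
        }
        return sb.toString();
    }

    /** Same filter as the question's getTextValue: TEXT_NODE children only. */
    static String textNodesOnly(Node node) {
        return collect(node, false, false);
    }

    /** Accepts both TEXT_NODE and CDATA_SECTION_NODE children. */
    static String textAndCdata(Node node) {
        return collect(node, true, false);
    }

    /** Like textAndCdata, but keeps the CDATA section delimiters. */
    static String withCdataMarkup(Node node) {
        return collect(node, true, true);
    }

    public static void main(String[] args) {
        Node content = firstElement(SAMPLE_XML, "content");
        System.out.println("[" + textNodesOnly(content) + "]");  // prints []
        System.out.println(content.getTextContent());            // prints <b>hello</b>
        System.out.println(textAndCdata(content));               // prints <b>hello</b>
        System.out.println(withCdataMarkup(content));            // prints <![CDATA[<b>hello</b>]]>
    }
}
```

If the markup itself is not needed, another option is to call `setCoalescing(true)` on the DocumentBuilderFactory, which makes the parser convert CDATA sections into ordinary text nodes; a TEXT_NODE-only loop like the one in the question should then pick up the content.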