Sometimes a web page needs a little "HTML Tidy" treatment before the parsers in the XML package can use it successfully. This function tries to tidy the page using the online HTML Tidy web service before parsing it.
tidyHTML(URL)
| Argument | Description |
|---|---|
| URL | The problematic URL |
The parsed page, ready to be used with readHTMLTable from the XML package.
Still no guarantee it will work! :-)
http://stackoverflow.com/a/12761741/1270695
Ananda Mahto
if (FALSE) {
  ## Can't find an actual example. The URL from the
  ## question is no longer online to test it with.
  Page <- "http://en.wikipedia.org/wiki/List_of_countries_by_population"
  u <- tidyHTML(Page)
  tables <- readHTMLTable(u)
  str(tables)
}
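The description says the page is passed through an online HTML Tidy service before it is parsed. The following is a minimal sketch of what that flow might look like, not the actual source of tidyHTML. The helper names tidy_service_url and tidy_then_parse are hypothetical, and the service endpoint is an assumption that may no longer be reachable.

```r
# Hypothetical helper (not part of the documented API): build the request
# URL for an online HTML Tidy service.
# ASSUMPTION: the default endpoint below is illustrative and may be offline.
tidy_service_url <- function(URL,
                             service = "http://services.w3.org/tidy/tidy?docAddr=") {
  # Percent-encode reserved characters such as ":" and "/" so that the page
  # address can be passed as a single query parameter.
  paste0(service, utils::URLencode(URL, reserved = TRUE))
}

# Hypothetical wrapper: request the tidied page and parse it.
# Requires the XML package and network access when called.
tidy_then_parse <- function(URL) {
  XML::htmlParse(tidy_service_url(URL))
}

# Inspect the request URL without touching the network.
tidy_service_url("http://example.com/page")
```

Keeping the URL construction separate from the parsing step makes it possible to check the request that would be sent before relying on the remote service.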