Sometimes web pages need a little "HTML Tidy" treatment before the parsers in the XML package can use them successfully. This function passes the page through the online HTML Tidy web service and then parses the tidied result.
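Routing a page through an online Tidy service usually means sending the page's address to the service as a query parameter. The following is a minimal base R sketch of that first step only, under stated assumptions: the endpoint https://tidy.example.org/tidy?docAddr= and the helper name tidy_request_url() are hypothetical placeholders, not the service or code that tidyHTML() actually uses.

```r
# HYPOTHETICAL endpoint for an online HTML Tidy service (an assumption;
# not necessarily the service that tidyHTML() contacts).
tidy_endpoint <- "https://tidy.example.org/tidy?docAddr="

# Hypothetical helper: build the request URL for the Tidy service.
# reserved = TRUE percent-encodes ":", "/", "?", "&", etc., so the page
# address travels as a single query-parameter value instead of being
# mistaken for part of the service's own URL.
tidy_request_url <- function(URL, endpoint = tidy_endpoint) {
  paste0(endpoint, utils::URLencode(URL, reserved = TRUE))
}

tidy_request_url("http://en.wikipedia.org/wiki/List_of_countries_by_population")
```

A complete version would then presumably hand the tidied page to a parser such as XML::htmlParse(); that step needs network access and is left out of the sketch.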

tidyHTML(URL)

Arguments

URL

The URL of the problematic web page.

Value

A parsed HTML document, ready to be used with readHTMLTable from the XML package.

Note

Still no guarantee it will work! :-)

References

http://stackoverflow.com/a/12761741/1270695

Author

Ananda Mahto

Examples

if (FALSE) {
  ## Can't find an actual example. The URL from the
  ## question is no longer online to test it with.
  library(XML)  # provides readHTMLTable()
  Page <- "http://en.wikipedia.org/wiki/List_of_countries_by_population"
  u <- tidyHTML(Page)
  tables <- readHTMLTable(u)
  str(tables)
}