Why Would HtmlAgilityPack Miss Some Image Tags?
Solution 1:
Is "//img" just looking for root ones?
No, it looks for descendant nodes at any depth (children, grandchildren, and so on). An XPath expression that begins with "//" starts at the document root, so "//img" selects every image in the document.
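As a quick check of that claim, here is a minimal sketch that parses an in-memory string rather than a live page. The sample markup and the CountImages helper are made up for illustration; only HtmlDocument, LoadHtml, and SelectNodes come from HtmlAgilityPack. Note that SelectNodes returns null rather than an empty collection when nothing matches, which is worth guarding against.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    // Hypothetical helper: counts every <img> element in the markup, at any nesting depth.
    public static int CountImages(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes("//img");
        return nodes?.Count ?? 0;
    }

    static void Main()
    {
        // One image directly under <body>, one nested inside <div><p>.
        const string html =
            "<html><body><img src='a.png'><div><p><img src='b.png'></p></div></body></html>";

        Console.WriteLine(CountImages(html));
    }
}
```

Both the top-level image and the nested one should be counted, since "//img" is not limited to direct children of the root.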
When I go to that page and do $('img').size(); I get 43 back.
My assumption is that some of the images are created dynamically via JavaScript. HtmlAgilityPack does not execute scripts, so it only sees the markup the server returns.
By the way, for http://test.com I got 87 image nodes with HtmlAgilityPack (doc.DocumentNode.SelectNodes("//img").Count()) and 87 image nodes from the Chrome console ($('img').size()).
EDIT: The HtmlWeb.Load() method internally uses the WebRequest class to fetch the data. The role of HtmlAgilityPack is to parse that data correctly. It is entirely possible for a web resource to return different content for the same URI depending on request headers such as User-Agent. The User-Agent header, for example, can be set via the HtmlWeb.UserAgent property.
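If you suspect the server is varying its response by User-Agent, a sketch along these lines may help you compare counts. The CreateClient helper and the User-Agent string are illustrative assumptions, not values from the original question; substitute the string your own browser sends. The URL is the one used above.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    // Hypothetical helper: builds an HtmlWeb client that sends the given User-Agent header.
    public static HtmlWeb CreateClient(string userAgent)
    {
        return new HtmlWeb { UserAgent = userAgent };
    }

    static void Main()
    {
        // Illustrative browser-like value (an assumption, not taken from the question).
        const string userAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)";

        var web = CreateClient(userAgent);
        HtmlDocument doc = web.Load("http://test.com");

        // SelectNodes returns null when there are no matches, so guard the count.
        var images = doc.DocumentNode.SelectNodes("//img");
        Console.WriteLine(images?.Count ?? 0);
    }
}
```

If the count still differs from what the browser console reports after matching the headers, the remaining images are most likely being added by JavaScript after the page loads.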