System.Net.HttpWebRequest - Arabic Data
A few days ago, I had a task that required me to fetch and parse data from
HTML pages on a site. I first tried the System.Net.HttpWebRequest class to
make the request and get the data:
Dim objRequest As System.Net.HttpWebRequest = System.Net.WebRequest.Create(Url)
Dim result As String
objRequest.Method = "GET"
Dim objResponse As System.Net.HttpWebResponse = objRequest.GetResponse()
Dim sr As System.IO.StreamReader
sr = New System.IO.StreamReader(objResponse.GetResponseStream())
result = sr.ReadToEnd()
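It worked, but the Arabic text came back corrupted (the site was in Arabic).
When no encoding is given, StreamReader decodes the bytes as UTF-8, which
garbles a page served in any other encoding. I played with the stream classes
and found the solution: one has to pass the encoding to the StreamReader when
reading the response.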
sr = New System.IO.StreamReader(objResponse.GetResponseStream(),
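The call above stops after its first argument, so the encoding actually used is not shown. As a minimal sketch of the idea, assuming the site serves its pages as windows-1256 (a common Arabic code page, chosen here only for illustration), the second argument would be an Encoding instance. FetchPage and EncodingSketch are hypothetical names, not from the original code.

```vbnet
Imports System
Imports System.IO
Imports System.Net
Imports System.Text

Module EncodingSketch
    ' Hypothetical helper: downloads a page and decodes it with the named encoding.
    Function FetchPage(ByVal url As String, ByVal encodingName As String) As String
        Dim request As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
        request.Method = "GET"
        Using response As HttpWebResponse = CType(request.GetResponse(), HttpWebResponse)
            ' Assumption: the caller knows which encoding the site uses.
            Dim pageEncoding As Encoding = Encoding.GetEncoding(encodingName)
            Using reader As New StreamReader(response.GetResponseStream(), pageEncoding)
                Return reader.ReadToEnd()
            End Using
        End Using
    End Function

    Sub Main(ByVal args As String())
        ' Usage: pass the page URL as the first command-line argument.
        Dim page As String = FetchPage(args(0), "windows-1256")
        Console.WriteLine(page.Length)
    End Sub
End Module
```

If the page turns out to be UTF-8, passing "utf-8" works the same way. On .NET Core and later, code pages such as windows-1256 are only available after the System.Text.Encoding.CodePages provider has been registered.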
After getting the data in the correct format, I tried to load the result into
XmlDocument, but again there was a problem: XmlDocument threw exceptions and
could not load it. After some checking, I found that this was caused by HTML
that is not well-formed XML: tags with no closing tag, e.g. <br>, <hr>, <Img>,
and attributes with no value, e.g. nowrap. Then I applied some formatting to
the string:
strMatter = strMatter.Replace("<BR>", "")
strMatter = strMatter.Replace("nowrap", "")
strMatter = strMatter.Replace("pointer;"">", "pointer;""></IMG>")
strMatter = strMatter.Replace("pointer;"" >", "pointer;""></IMG>")
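String.Replace is case-sensitive, so the lines above only catch the exact spellings found on that one site. As a rough sketch of a more general variant (not the method used above, which removes <BR> instead of closing it), a case-insensitive regular expression can rewrite <br>, <hr> and <img ...> as self-closing tags before the string goes to XmlDocument. CloseVoidTags and the sample markup are hypothetical, and real pages may still fail for other reasons, such as unquoted attributes or entities like &nbsp;.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports System.Xml

Module HtmlCleanupSketch
    ' Hypothetical helper: turns <br>, <hr> and <img ...> into self-closing tags,
    ' whatever their letter case, and leaves already self-closed tags valid.
    Function CloseVoidTags(ByVal html As String) As String
        Return Regex.Replace(html, "<(br|hr|img)\b([^>]*?)/?>", "<$1$2 />", RegexOptions.IgnoreCase)
    End Function

    Sub Main()
        ' Made-up sample markup, for illustration only.
        Dim sample As String = "<div>one<BR>two<hr><Img src=""a.gif""></div>"
        Dim doc As New XmlDocument()
        doc.LoadXml(CloseVoidTags(sample))
        Console.WriteLine(doc.DocumentElement.OuterXml)
    End Sub
End Module
```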
Then I successfully parsed the data and saved it in the database. Is there a
class for HTML, like XmlDocument for XML, that can easily load and parse HTML?