weblogs.com.pk

Aziz ur Rahman

Random Thoughts

System.Net.HttpWebRequest - Arabic Data

A few days back I had a task to fetch and parse data from the HTML pages of a site. I first used the System.Net.HttpWebRequest class to make the request and get the data:

<code>
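' Url is the address of the page to fetch
Dim objRequest As System.Net.HttpWebRequest = DirectCast(System.Net.WebRequest.Create(Url), System.Net.HttpWebRequest)
Dim result As String
objRequest.Method = "GET"

' GetResponse returns a WebResponse, so cast it to HttpWebResponse
Dim objResponse As System.Net.HttpWebResponse = DirectCast(objRequest.GetResponse(), System.Net.HttpWebResponse)
Dim sr As System.IO.StreamReader
sr = New System.IO.StreamReader(objResponse.GetResponseStream())
result = sr.ReadToEnd()
' Closing the reader also closes the underlying response stream
sr.Close()
objResponse.Close()
Return result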

</code>

The request worked, but the Arabic text (the site was in Arabic) came back corrupted. After experimenting with the stream classes I found the solution: pass the encoding explicitly when creating the StreamReader over the response stream.

<code>
sr = New System.IO.StreamReader(objResponse.GetResponseStream(), System.Text.Encoding.UTF8)
</code>
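Hardcoding UTF8 as above works when the site is known to serve UTF-8. If a site declares a different charset instead, for example windows-1256 (the Windows code page for Arabic), one variation is to pick the encoding from the charset the server reports. The sketch below assumes the charset name comes from HttpWebResponse.CharacterSet; `PickEncoding` is a hypothetical helper, and it falls back to UTF-8 when the name is empty or not recognised.

```vbnet
Imports System
Imports System.Text

Module EncodingHelper
    ' Hypothetical helper: turn a charset name (for example the value of
    ' HttpWebResponse.CharacterSet) into an Encoding, falling back to UTF-8
    ' when the name is missing or not supported on this runtime.
    Public Function PickEncoding(ByVal charset As String) As Encoding
        If String.IsNullOrEmpty(charset) Then
            Return Encoding.UTF8
        End If

        ' Some servers quote the value, e.g. charset="utf-8"
        Dim name As String = charset.Trim().Trim(""""c)
        Try
            Return Encoding.GetEncoding(name)
        Catch ex As ArgumentException
            Return Encoding.UTF8
        Catch ex As NotSupportedException
            Return Encoding.UTF8
        End Try
    End Function
End Module
```

Usage would be `sr = New System.IO.StreamReader(objResponse.GetResponseStream(), PickEncoding(objResponse.CharacterSet))`. One caveat: when the Content-Type header carries no charset, CharacterSet may report a default such as ISO-8859-1 rather than nothing, and some pages declare their charset only in a `<meta>` tag, so for a site known to be UTF-8 the explicit UTF8 above may still be the safer choice. On .NET Core and later, code pages such as windows-1256 are only available after registering CodePagesEncodingProvider.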

With the data now decoded correctly, I tried to load the result into an XmlDocument, but that failed too: XmlDocument threw exceptions while loading it. After some checking I found the cause was HTML that is not well-formed XML, such as tags without closing tags (<br>, <hr>, <img>) and attributes without values (nowrap). I then cleaned up the result with a few string replacements:

<code>
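' strMatter holds the downloaded HTML
' Remove <BR> tags (String.Replace is case-sensitive, so only upper-case <BR> is matched)
strMatter = strMatter.Replace("<BR>", "")
' Remove the nowrap attribute, which has no value
strMatter = strMatter.Replace("nowrap", "")
' Close the <IMG> tags on this page, whose markup ends with pointer;">
strMatter = strMatter.Replace("pointer;"">", "pointer;""></IMG>")
strMatter = strMatter.Replace("pointer;"" >", "pointer;""></IMG>")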
</code>
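The replacements above are tied to the exact markup of that one site. A slightly more general sketch of the same idea (patching the HTML into well-formed XML before loading it) uses regular expressions to self-close common void elements and to give the valueless nowrap attribute a value, then loads the result into an XmlDocument. `CloseVoidTags`, `FixNowrap` and `LoadAsXml` are hypothetical names, and this is not an HTML parser: unclosed `<p>` or `<li>` tags, entities such as `&nbsp;`, unquoted attribute values and `>` inside attribute values will still make LoadXml fail.

```vbnet
Imports System.Text.RegularExpressions
Imports System.Xml

Module HtmlToXml
    ' Make void elements self-closing: <br> -> <br />, <IMG src="a.gif"> -> <IMG src="a.gif" />.
    ' Tags already written as <br/> or <br /> come out as <br />.
    Public Function CloseVoidTags(ByVal html As String) As String
        Return Regex.Replace(html, "<(br|hr|img|input|meta|link)\b([^>]*?)\s*/?>", "<$1$2 />", RegexOptions.IgnoreCase)
    End Function

    ' XML attributes need a value, so rewrite <td nowrap> as <td nowrap="nowrap">.
    ' Only matches nowrap inside a tag, not in the page text.
    Public Function FixNowrap(ByVal html As String) As String
        Return Regex.Replace(html, "(<[^>]*\s)nowrap(?=[\s/>])", "$1nowrap=""nowrap""", RegexOptions.IgnoreCase)
    End Function

    ' Apply both fixes, wrap everything in a single root element and load it.
    ' LoadXml still throws XmlException for markup these fixes do not cover.
    Public Function LoadAsXml(ByVal html As String) As XmlDocument
        Dim doc As New XmlDocument()
        doc.LoadXml("<root>" & FixNowrap(CloseVoidTags(html)) & "</root>")
        Return doc
    End Function
End Module
```

Usage would be `Dim doc As XmlDocument = LoadAsXml(result)` followed by XPath queries such as `doc.SelectNodes("//td")`, where `result` is the string read from the response.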

After that I parsed the data and saved it to the database. Is there a class for HTML that corresponds to XmlDocument for XML, one that can load HTML directly and parse it?

Posted: Saturday, February 04, 2006 9:45 AM by aziz
