WebBrowser and Returning the raw HTML

Ok I'm making a fairly simple application. It contains 2
web browsers, the top one is used so that you can view a
website (i.e. one you have created). Every time you load
a page, the HTML which was received is then sent to the
http://validator.w3.org website to validate your HTML /
XHTML.

So far I've got everything to work, even the part where
the HTML is posted to the w3.org website.

But all of the following commands (Browser1 is the main
WebBrowser control) produce a form of HTML for the
document, but all the tags get converted to uppercase and
parts of the document go missing such as the "DOCTYPE"...

Browser1.Document.ToString()
Browser1.Document.documentelement.outerhtml
Browser1.Document.documentelement.innerhtml
Browser1.Document.Body.outerhtml
Browser1.Document.Body.innerhtml
Browser1.Document.All(0).outerhtml
Browser1.Document.All(0).innerhtml
Browser1.Document.All(1).outerhtml
Browser1.Document.All(1).innerhtml
Browser1.Document.All(2).outerhtml
Browser1.Document.All(2).innerhtml

NOTE: The HTML sent to the w3.org website must be exactly
the same as what the server sends, otherwise what's the
point in validating it?

Finally, because it will be used on interactive websites
(with a user login), I can't use a control such as Inet
or Winsock to fetch the HTML separately. The main browser
would make its request to the server (which might delete
a record), and then the second control would repeat the
request and get back a different page (for example, one
saying the record can't be deleted).
Nov 20 '05 #1
Hi Craig

Because the DOCTYPE tag is outside the main document, it is not included
when you retrieve inner and outer HTML. To include the entire file you will
need to use the IPersistStreamInit interface, e.g.

<interface>
Imports System.Runtime.InteropServices

' IPersistStreamInit COM interface, implemented by the HTML document object.
' Note: UCOMIStream is marked obsolete from .NET 2.0 onwards, where
' System.Runtime.InteropServices.ComTypes.IStream replaces it.
<ComVisible(True), ComImport(), _
 Guid("7FD52380-4E07-101B-AE2D-08002B2EC713"), _
 InterfaceTypeAttribute(ComInterfaceType.InterfaceIsIUnknown)> _
Public Interface IPersistStreamInit
    Sub GetClassID(ByRef pClassID As Guid)

    <PreserveSig()> Function IsDirty() As Integer
    <PreserveSig()> Function Load(ByVal pstm As UCOMIStream) As Integer
    ' fClearDirty is a Win32 BOOL, hence the explicit MarshalAs
    <PreserveSig()> Function Save(ByVal pstm As UCOMIStream, _
        <MarshalAs(UnmanagedType.Bool)> ByVal fClearDirty As Boolean) As Integer
    <PreserveSig()> Function GetSizeMax(<InAttribute(), Out(), _
        MarshalAs(UnmanagedType.U8)> ByRef pcbSize As Long) As Integer
    <PreserveSig()> Function InitNew() As Integer
End Interface
</interface>

<code>
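Dim ips As IPersistStreamInit

' The document object implements IPersistStreamInit, so cast to it
ips = DirectCast(Browser1.Document, IPersistStreamInit)

' strm must be a UCOMIStream that has already been created
ips.Save(strm, False)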
</code>

This will save the complete HTML to a stream, which you can turn into a
string.

Regarding the conversion to uppercase, is this actually a problem? The
change of case should not affect the validity of the parsing.

There are also two newsgroups which may give further help:

microsoft.public.inetsdk.programming.mshtml_hosting
microsoft.public.inetsdk.programming.webbrowser_ctl

HTH

Charles
Nov 20 '05 #2
Thank you for your quick reply.

But is that VB code? I've been using VB5/6 for several
years and that looks slightly C like - this project is
being written in VB.NET, but I've only just upgraded and
finding some of these new methods a little strange.

Also RE the tags being changed to uppercase - The reason
I mentioned it was because it shows that the HTML
document is being changed, probably into a form that the
browser can easily understand (and is probably strict XML
even if the input wasn't XML based).

Anyway, thanks for giving me something else to try.

Craig
Nov 20 '05 #3
Got it, you put all the

<interface></interface>

before the "Public Class Form1" bit - so the first part
of the form, then the

<code></code>

in the function which returns the HTML code. Well, that
method doesn't bring up any errors apart from what "strm"
should be declared as - I've never used a stream before.

But thanks again - this is the most progress I've made in
the past 2 days!

Nov 20 '05 #4
Hi Craig

Yes, sorry about that. It's just a habit I have got into to show where code
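and stuff begins and ends. Add the following for the stream handling:

<code>
' These declarations go inside the form class and need
' Imports System.Runtime.InteropServices at the top of the file.

<DllImport("OLE32.DLL")> _
Public Shared Sub CreateStreamOnHGlobal(ByVal hGlobal As IntPtr, _
        ByVal fDelete As Boolean, ByRef stm As UCOMIStream)
    ' Leave this blank - the body is a placeholder for the imported function
End Sub

<DllImport("OLE32.DLL")> _
Public Shared Sub GetHGlobalFromStream(ByVal stm As UCOMIStream, _
        ByRef hGlobal As IntPtr)
    ' Leave this blank - the body is a placeholder for the imported function
End Sub

Private Function GetStream(ByVal size As Integer) As UCOMIStream

    Dim iptr As IntPtr
    Dim strm As UCOMIStream

    ' Allocate the backing memory and zero it, so that the string
    ' conversion below finds a terminating null after the saved HTML
    ' (provided the HTML is shorter than size)
    iptr = Marshal.AllocHGlobal(size)
    For i As Integer = 0 To size - 1
        Marshal.WriteByte(iptr, i, 0)
    Next

    ' True = the memory is freed when the stream is released
    CreateStreamOnHGlobal(iptr, True, strm)

    Return strm

End Function

Private Function StreamToString(ByVal strm As UCOMIStream) As String

    Dim iptr As IntPtr
    Dim s As String

    ' Get back the memory behind the stream and read it as an ANSI
    ' string; PtrToStringAnsi stops at the first null character
    GetHGlobalFromStream(strm, iptr)
    s = Marshal.PtrToStringAnsi(iptr)

    Return s

End Function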
</code>

<code>
Dim strm As UCOMIStream
Dim s As String

' Allocate a reasonably high value!
strm = GetStream(2048)

' Save HTML and convert to a string
ips.Save(strm, False)
s = StreamToString(strm)
</code>

The code above should allow you to get the full HTML. The only issue
with this is the allocation of the stream. IPersistStreamInit.GetSizeMax()
should return a value indicating the size of the stream required, but it
always returns zero. The best way, therefore, is to read the stream a bit at
a time until the buffer is empty, but for simplicity I have just allocated a
stream that should be big enough to take it all in one go. You can make it
bigger, of course, if you need to.
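
For anyone who wants the chunked approach instead, here is a minimal sketch of
reading the stream a bit at a time. It is untested and rests on a few
assumptions: the stream was created with CreateStreamOnHGlobal(IntPtr.Zero,
True, strm), so that it starts empty and grows as the document is saved into
it; the page is in the system ANSI code page; and the name ReadWholeStream and
the 4096-byte chunk size are only illustrative.

<code>
' Hypothetical helper: reads everything in strm, one chunk at a time.
' Needs Imports System.Runtime.InteropServices at the top of the file.
Private Function ReadWholeStream(ByVal strm As UCOMIStream) As String

    Dim chunk(4095) As Byte                 ' 4096-byte read buffer
    Dim html As New System.IO.MemoryStream
    Dim count As Integer

    ' UCOMIStream.Read reports the number of bytes read through a pointer,
    ' so allocate four bytes of unmanaged memory to receive that count
    Dim pCount As IntPtr = Marshal.AllocHGlobal(4)

    Try
        ' Go back to the start in case Save left the position at the end
        ' (dwOrigin 0 = STREAM_SEEK_SET)
        strm.Seek(0, 0, IntPtr.Zero)

        Do
            strm.Read(chunk, chunk.Length, pCount)
            count = Marshal.ReadInt32(pCount)
            If count > 0 Then html.Write(chunk, 0, count)
        Loop While count > 0
    Finally
        Marshal.FreeHGlobal(pCount)
    End Try

    ' Assumption: the bytes are in the system ANSI code page
    Return System.Text.Encoding.Default.GetString(html.ToArray())

End Function
</code>

With that in place, s = ReadWholeStream(strm) would take the place of
s = StreamToString(strm), and there is no buffer size to guess.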

HTH

Charles
Nov 20 '05 #5
You are a f________ god!

Thank you, it works perfectly!!!
Nov 20 '05 #6
Hello,

I don't really understand why you use the WebBrowser control to download the
web page. Why not use, for example, the 'WebRequest' class?

--
Herfried K. Wagner
MVP · VB Classic, VB.NET
http://www.mvps.org/dotnet
Nov 20 '05 #7
> I don't really understand why you use the WebBrowser control to download
> the web page. Why not use, for example, the 'WebRequest' class?

Because I'm fairly new to VB.NET and wanted a simple
application to create - well, what I thought might be
simple.

Also I've used the WebBrowser control before and it was a
simple way to add a browser to the application where the
user could navigate in exactly the same way as in IE.
Nov 20 '05 #8
Cor
Charles,
Just a question: I have seen that you always use mshtml.IHTMLDocument2,
whereas I use mshtml.HTMLDocument.
My impression is that with that I can access all <tags>, including the src,
innerText and innerHTML etc., for each frame page.

What am I missing?
Cor
Nov 20 '05 #9
Hi Cor

Long time no speak.

The simple answer is speed. Try the following on an initialised WebBrowser
control and you may be surprised:

<code>
Dim doc As mshtml.HTMLDocument
Dim doc2 As mshtml.IHTMLDocument2
Dim elem As mshtml.IHTMLElement

Dim dt As Date

MsgBox("Start")

' Time 1000 calls made through the HTMLDocument class
dt = Now

For i As Integer = 1 To 1000
    doc = DirectCast(AxWebBrowser1.Document, mshtml.HTMLDocument)
    elem = doc.createElement("INPUT")
Next i

MsgBox(Now.Subtract(dt).ToString)

' Time the same calls made through the IHTMLDocument2 interface
dt = Now

For i As Integer = 1 To 1000
    doc2 = DirectCast(AxWebBrowser1.Document, mshtml.IHTMLDocument2)
    elem = doc2.createElement("INPUT")
Next i

MsgBox(Now.Subtract(dt).ToString)
</code>

I used mshtml.HTMLDocument once in the earlier post because New doesn't work
on interfaces of course. But otherwise I use the interfaces. It means a bit
more code to cast to the correct one all the time [long live Option Strict
On], but it's worth it in performance.

Regards

Charles
Nov 20 '05 #10
Cor
Charles,
I don't have to test it. I can simply believe you (which of course I always
do), and in this case there was indeed a speed problem.

I wanted to insert your piece of code, and found this line in my program:
Dim tagname As String = iDocument.all.item(i).tagName ' voor snelheid

voor snelheid = for speed

I normally try to avoid putting comments in a program, because I find that
needing them means the programming is not well done,
but this was such a stupid construction.

So I am going to change that big routine and try to use IHTMLDocument2 there.

I have not tested it yet, because I use mshtml in a different class from the
WebBrowser. I set it up that way to get around the slow behaviour of the IDE,
before I discovered that the slowdown could be avoided simply by not putting
the Imports statement in the program.
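
A rough sketch of what that change might look like, walking the elements
through the interfaces rather than through the HTMLDocument class. It is
untested; the name ListTagNames is made up for illustration, and it assumes a
reference to the Microsoft.mshtml interop assembly and a document that has
finished loading.

<code>
' Hypothetical routine: collects the tag name of every element in the document
Private Function ListTagNames(ByVal doc2 As mshtml.IHTMLDocument2) As String()

    ' Fetch the collection once instead of asking the document each time round
    Dim all As mshtml.IHTMLElementCollection = doc2.all
    Dim names(all.length - 1) As String

    For i As Integer = 0 To all.length - 1
        ' item returns Object, so cast to the element interface (Option Strict On)
        Dim elem As mshtml.IHTMLElement = DirectCast(all.item(i), mshtml.IHTMLElement)
        names(i) = elem.tagName
    Next i

    Return names

End Function
</code>

It would be called with something like
ListTagNames(DirectCast(AxWebBrowser1.Document, mshtml.IHTMLDocument2)).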

Again thanks a lot

Cor
Nov 20 '05 #11
Hi Craig,

Using WebRequest will allow you to get the HTML in the raw.

Have a play with the routine below. It will show you how easy it is to get
a web page.

Regards,
Fergus

<code>
'Needs these at the top of the file:
'  Imports System.IO
'  Imports System.Net

Public Sub GetThisWebPage(ByVal sUrl As String)
    'What we want.
    Dim oRequest As WebRequest = WebRequest.Create(sUrl)

    'Go get it (an http:// URL gives back an HttpWebResponse)
    Dim oResponse As HttpWebResponse = _
        DirectCast(oRequest.GetResponse(), HttpWebResponse)

    'Let's see some info about the response.
    Dim S As String = "To: " & sUrl & vbCrLf
    S = S & "From: " & oResponse.ResponseUri.ToString() & vbCrLf & vbCrLf

    S = S & "Headers:" & vbCrLf
    Dim I As Integer
    For I = 0 To oResponse.Headers.Count - 1
        S = S & " <" & oResponse.Headers.GetKey(I) & "> "
        S = S & oResponse.Headers.Get(I) & vbCrLf
    Next
    S = S & vbCrLf
    S = S & "Type: " & oResponse.ContentType & vbCrLf
    S = S & "Len: " & oResponse.ContentLength.ToString() & vbCrLf & vbCrLf
    MsgBox(S)

    'Now the data itself.
    Dim oHtmlStream As New StreamReader(oResponse.GetResponseStream())
    Dim sHtml As String = oHtmlStream.ReadToEnd()
    MsgBox(sHtml)

    'Finish
    oHtmlStream.Close()
    oResponse.Close()
End Sub
</code>
Nov 20 '05 #12
