convert to utf

Hi,

I have text from MIME email messages with different encodings that I want to
convert to UTF-8, but I'm relatively new to encoding problems.

I use the following code, but it doesn't seem to work (the input and output
remain the same):

string x = toUTF8("=?GB2312?B?s8q5q8u+vq3A7aGissbO8bK/w8W1xNK7t+LQxQ==?=", "GB2312");

public static string toUTF8(string messageString, string charset)
{
    Encoding dstEnc = Encoding.UTF8;
    if (charset.Length == 0)
    {
        charset = "us-ascii";
    }
    Encoding srcEnc = Encoding.GetEncoding(charset);
    byte[] srcData = srcEnc.GetBytes(messageString);
    byte[] dstData;
    // see if we need to convert data
    if (dstEnc != srcEnc)
    {
        dstData = Encoding.Convert(srcEnc, dstEnc, srcData);
    }
    else
    {
        dstData = srcData;
    }
    char[] utf8Chars = new char[Encoding.UTF8.GetCharCount(dstData, 0, dstData.Length)];
    Encoding.UTF8.GetChars(dstData, 0, dstData.Length, utf8Chars, 0);
    string utf8String = new string(utf8Chars);
    return utf8String;
}

Any help would be appreciated,

Albert Jan
Nov 16 '05 #1
Hi Albert,

I'm not too familiar with GB2312 (Simplified Chinese), but I believe all
characters in your string have the same value in both GB2312 and UTF-8, so
there would be no change when you recode it (they look the same in both
encodings). For reference, your string looks to me like

"=?GB2312?B?s8q5q8u+vq3A7aGissbO8bK/w8W1xNK7t+LQxQ==?=", which isn't
Chinese in any way.
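
A quick way to check this, as a minimal sketch: every character in the header is 7-bit ASCII, so turning it into bytes and converting those bytes to UTF-8 gives back exactly the same text. The class and method names below are made up for illustration, and the sketch uses Encoding.ASCII in place of GB2312 on the assumption that GB2312 encodes the ASCII range with the same single bytes.

```csharp
using System;
using System.Linq;
using System.Text;

public static class AsciiRoundTrip
{
    // True when every character is in the 7-bit ASCII range.
    public static bool IsPureAscii(string text)
    {
        return text.All(c => c < 128);
    }

    // Encodes the text as ASCII, converts the bytes to UTF-8,
    // and decodes the UTF-8 bytes back into a string.
    public static string ViaUtf8(string text)
    {
        byte[] ascii = Encoding.ASCII.GetBytes(text);
        byte[] utf8 = Encoding.Convert(Encoding.ASCII, Encoding.UTF8, ascii);
        return Encoding.UTF8.GetString(utf8);
    }
}
```

For the header string in question, IsPureAscii returns true and ViaUtf8 returns the input unchanged, which matches the behaviour of toUTF8.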

Although your encoding conversion code is overly elaborate, I see nothing
wrong with it. You could simplify it greatly with something like this:

public static string toUTF8(string messageString, string charset)
{
    Encoding dstEnc = Encoding.UTF8;
    MessageBox.Show(Encoding.Default.ToString());

    if (charset.Length == 0)
    {
        charset = "us-ascii";
    }

    Encoding srcEnc = Encoding.GetEncoding(charset);
    byte[] srcData = srcEnc.GetBytes(messageString);

    string utf8String = dstEnc.GetString(srcData);
    return utf8String;
}
--
Happy Coding!
Morten Wennevik [C# MVP]
Nov 16 '05 #2
Hi Morten,

Thanks for your answer.

The string I use ("=?GB2312?B?s8q5q8u+vq3A7aGissbO8bK/w8W1xNK7t+LQxQ==?=")
comes from a Chinese mail message and was probably converted to
'quoted-printable' by the sender.

So my problem is how to redisplay the original Chinese characters. I suppose
I have to cut out the "GB2312?", but what next?
Regards,

Albert Jan
Nov 16 '05 #3
Hi,

The string looks like base64-encoded data to me.

I suggest decoding the s8q5q8u+vq3A7aGissbO8bK/w8W1xNK7t+LQxQ== part and
then converting the resulting bytes to Unicode using the GB2312 encoding:

string DoConversion(string base64)
{
    // decode the base64 string
    byte[] bytes = Convert.FromBase64String(base64);

    // get the GB2312 encoding
    Encoding encoding = Encoding.GetEncoding(20936);

    // decode the GB2312 encoded byte stream
    string result = encoding.GetString(bytes);

    return result;
}

The result certainly looks like Chinese to me... ;)
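
To handle the whole header in one go rather than cutting out the base64 part by hand, something along these lines might work. It is only a sketch, assuming the value is an RFC 2047 encoded-word of the form =?charset?encoding?text?=, where B means base64 and Q means a quoted-printable variant. The names EncodedWordDecoder, Decode and DecodeQ are made up for this example, and it looks the charset up by name instead of by code page number.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class EncodedWordDecoder
{
    // Matches one encoded-word: =?charset?B-or-Q?encoded-text?=
    private static readonly Regex EncodedWord =
        new Regex(@"=\?([^?]+)\?([bBqQ])\?([^?]*)\?=");

    // Replaces every encoded-word in the header with its decoded text.
    // Text outside encoded-words is left untouched.
    public static string Decode(string header)
    {
        return EncodedWord.Replace(header, m =>
        {
            Encoding charset = Encoding.GetEncoding(m.Groups[1].Value);
            string payload = m.Groups[3].Value;
            bool isBase64 = m.Groups[2].Value.ToUpperInvariant() == "B";
            byte[] bytes = isBase64
                ? Convert.FromBase64String(payload)
                : DecodeQ(payload);
            return charset.GetString(bytes);
        });
    }

    // Q encoding: "_" stands for a space, "=XX" for the byte with hex value XX,
    // and every other character stands for itself.
    private static byte[] DecodeQ(string text)
    {
        var buffer = new List<byte>();
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c == '_')
            {
                buffer.Add(0x20);
            }
            else if (c == '=' && i + 2 < text.Length)
            {
                buffer.Add(Convert.ToByte(text.Substring(i + 1, 2), 16));
                i += 2;
            }
            else
            {
                buffer.Add((byte)c);
            }
        }
        return buffer.ToArray();
    }
}
```

Calling EncodedWordDecoder.Decode with the header from the first post should give the same text as DoConversion, provided the runtime knows the GB2312 charset; on .NET Core and later that probably means calling Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) first. Adjacent encoded-words and malformed input are not handled here.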

HTH,
Stefan
Nov 16 '05 #4
Great, this works!

I have probably overlooked a part of the MIME specification: nowhere in the
mail message that I used as example input is the use of base64
declared.

Thanks,

Albert Jan
Nov 16 '05 #5
