Our problem
Any questions or comments are welcome.
On the Tomcat developer mailing list, we have discussed how to decode FORM data back to the original string; in other words, how to provide the appropriate parameter strings to the servlet writer.
The developers on tomcat-dev have posted many messages that are well worth reading. I appreciate their enthusiastic support for Tomcat.
As for the FORM parameter string sent from the WWW client, the current version of the Servlet Specification provides the servlet writer with the following methods:
    byte[] some = ...;
    // Create the bad string with the default Java encoding
    String bad = new String(some);
    // Get the original byte array back, based on the default
    // Java encoding
    byte[] other = bad.getBytes();
    // And then get the original string with the appropriate Java
    // encoding
    String enc = "us-ascii";
    String good = new String(other, enc);

If the solution above works in all cases, the servlet writer can retrieve the original parameter strings, and this may be enough. (The server implementation can leave such a task to the servlet writer.)
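Whether the round trip above recovers the original string depends on whether the first, wrong decoding loses any bytes. The following is only a minimal sketch of that point, not a description of what Tomcat does: the class name `RoundTripSketch` and the method `recover` are hypothetical, and explicit charsets are passed in place of the default Java encoding so that the result does not depend on the platform.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class RoundTripSketch {

    // Decode the raw bytes with a "wrong" charset, turn the resulting
    // string back into bytes with that same charset, and finally decode
    // those bytes with the charset the client actually used.
    static String recover(byte[] raw, Charset wrong, Charset actual) {
        String bad = new String(raw, wrong);
        byte[] other = bad.getBytes(wrong);
        return new String(other, actual);
    }

    public static void main(String[] args) {
        String original = "\u65e5\u672c\u8a9e"; // three Japanese characters
        byte[] raw = original.getBytes(StandardCharsets.UTF_8);

        // ISO-8859-1 maps every byte value to exactly one character,
        // so no byte is lost and the round trip succeeds.
        String viaLatin1 = recover(raw, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(original.equals(viaLatin1)); // prints true

        // US-ASCII cannot represent bytes above 0x7F; they are replaced
        // during decoding, so the original bytes cannot be recovered.
        String viaAscii = recover(raw, StandardCharsets.US_ASCII, StandardCharsets.UTF_8);
        System.out.println(original.equals(viaAscii)); // prints false
    }
}
```

In this sketch the round trip is safe only when the intermediate encoding preserves every byte value, which is one reason the approach may not work in all cases.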
As I described above, to get the original parameter string, the server implementation must convert the byte array to a string based on the appropriate Java character encoding.
Given the original charset of the client side, the server implementation can determine the corresponding Java character encoding easily.
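As one possible illustration of that mapping, the standard `java.nio.charset.Charset` class can resolve a charset name to a Java encoding. The sketch below assumes a JDK that provides `java.nio.charset` and `java.util.Optional`; the class name `CharsetLookupSketch` and the method `toJavaCharset` are hypothetical names chosen for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.util.Optional;

// Hypothetical helper, for illustration only.
final class CharsetLookupSketch {

    // Resolve a charset name, as it might appear in an HTTP header,
    // to the corresponding Java Charset. Returns an empty Optional
    // when the name is missing, malformed, or not supported.
    static Optional<Charset> toJavaCharset(String name) {
        if (name == null) {
            return Optional.empty();
        }
        String trimmed = name.trim();
        try {
            if (Charset.isSupported(trimmed)) {
                return Optional.of(Charset.forName(trimmed));
            }
            return Optional.empty();
        } catch (IllegalCharsetNameException e) {
            // The name contains characters that are not legal in a charset name.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Charset names are matched case-insensitively.
        System.out.println(toJavaCharset("utf-8").map(Charset::name).orElse("(unknown)"));
        System.out.println(toJavaCharset("x-no-such-charset").isPresent());
    }
}
```

Returning an empty value instead of throwing leaves the caller free to decide what to fall back to when the client's charset is unknown.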
Then how can the server implementation tell the original charset on the client side?
The original charset of the WWW client should be set as the 'charset' attribute of the 'Content-Type' header. But the WWW browsers at this time do not supply the 'charset' attribute, and this makes decoding difficult.
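For the case where a client does send it, the 'charset' attribute could be read from the header value roughly as follows. This is a simplified sketch, not a complete header parser and not code taken from Tomcat: the class name `ContentTypeCharsetSketch` and the method `charsetOf` are hypothetical, and the sketch assumes that attribute values contain no ';' characters.

```java
import java.util.Optional;

// Hypothetical helper, for illustration only.
final class ContentTypeCharsetSketch {

    // Extract the value of the 'charset' attribute from a Content-Type
    // header value such as
    //   application/x-www-form-urlencoded; charset=Shift_JIS
    // Returns an empty Optional when the attribute is absent.
    static Optional<String> charsetOf(String contentType) {
        if (contentType == null) {
            return Optional.empty();
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type itself; the attributes follow it.
        for (int i = 1; i < parts.length; i++) {
            int eq = parts[i].indexOf('=');
            if (eq < 0) {
                continue;
            }
            String key = parts[i].substring(0, eq).trim();
            String value = parts[i].substring(eq + 1).trim();
            if (!key.equalsIgnoreCase("charset")) {
                continue;
            }
            // Strip optional surrounding double quotes.
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            return value.isEmpty() ? Optional.empty() : Optional.of(value);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("application/x-www-form-urlencoded; charset=Shift_JIS").orElse("(none)"));
        System.out.println(charsetOf("application/x-www-form-urlencoded").orElse("(none)"));
    }
}
```

The second call in `main` corresponds to the situation described above: the header carries no 'charset' attribute, so the server learns nothing about the client's encoding from it.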
Thus we, as the developers of the server implementation, have difficulty telling which charset the original FORM string is encoded in.
As long as we can't rely on the WWW client, we have to find another way to determine the charset of the client side. We can list three possible options:
If the charset determined by any of the options above is the right one, the server implementation can provide the original string to the servlet writer. But we can't ensure that this is always true. As you can guess, none of A, B, or C can always supply the right charset.
As a result, we can say that telling the right charset of the client browser is too heavy a responsibility for the server implementation.
Java and all Java-based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries.
ALL CONTENTS COPYRIGHT 2000, Jun Inamori. All rights reserved.