View Issue Details
| ID | Project | Category | View Status | Date Submitted | Last Update |
|---|---|---|---|---|---|
| 0022125 | Open CASCADE | OCCT:Foundation Classes | public | 2010-11-30 14:33 | 2017-10-01 19:33 |
| Reporter | | Assigned To | bugmaster | ||
| Priority | normal | Severity | trivial | ||
| Status | closed | Resolution | fixed | ||
| OS | All | ||||
| Target Version | 6.8.0 | Fixed in Version | 6.8.0 | ||
| Summary | 0022125: TCollection_ExtendedString: conversion from UTF-8 to unicode | ||||
| Description | There is a problem in the following constructor of the TCollection_ExtendedString class: TCollection_ExtendedString(const Standard_CString astring, const Standard_Boolean isMultiByte); When isMultiByte = Standard_True, this constructor restores a Unicode string from its UTF-8 representation. Internally it invokes the ConvertToUnicode3B and ConvertToUnicode2B functions, which are intended to build a single Standard_ExtCharacter from 3 or 2 input chars respectively. The ConvertToUnicodeXB functions use the following data structure: union { struct { unsigned char h; unsigned char l; } hl; Standard_ExtCharacter chr; } EL; For example, take code point 12510 (U+30DE, the Japanese katakana character マ). It has the following 3-byte UTF-8 representation: 1110_0011 10_000011 10_011110, which must be restored to the 16-bit value 0011000011011110. However, ConvertToUnicode3B returns 11011110 00110000 instead (EL.hl.h and EL.hl.l appear in the wrong order). The issue was reproduced on Win32. It was encountered while implementing a unified IGES-reading routine that accepts a UTF-8 string as a filename. Attached is a draft workaround for such a routine (Win32 only), which uses the MultiByteToWideChar Windows function instead. | ||||
| Tags | No tags attached. | ||||
| Test case number | bugs fclasses bug22125 | ||||
|
|
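The 3-byte decoding from the description can also be written with shifts and masks, which avoids depending on how a union lays out its bytes in memory. The following is a minimal sketch under that assumption, not the OCCT implementation; `decodeUtf8ThreeBytes` is a hypothetical helper name introduced here for illustration.

```cpp
#include <cstdint>

// Hypothetical helper (not part of OCCT): decodes one 3-byte UTF-8 sequence
// 1110xxxx 10yyyyyy 10zzzzzz into the 16-bit value xxxxyyyyyyzzzzzz.
// Shifts operate on values rather than on memory layout, so the result is
// the same on little-endian and big-endian hosts.
static std::uint16_t decodeUtf8ThreeBytes (const unsigned char* p)
{
  return static_cast<std::uint16_t> (((p[0] & 0x0Fu) << 12)   // 4 payload bits of the lead byte
                                   | ((p[1] & 0x3Fu) << 6)    // 6 payload bits of the 2nd byte
                                   |  (p[2] & 0x3Fu));        // 6 payload bits of the 3rd byte
}
```

For the bytes E3 83 9E from the description, this yields 0x30DE (12510), the value the constructor is expected to restore.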
2010-11-30 12:33
|
|
|
|
Please provide a test file |
|
2014-10-14 16:23 developer |
|
|
|
Please, find attached an IGES file with Japanese name. |
|
2014-10-14 19:15 reporter |
test_iges_jp.tcl (100 bytes) |
|
|
Test script added. This problem will be resolved after 0025367 integration |
|
|
> This problem will be resolved after 0025367 integration

The problem in the description is unrelated to the 0025367 patch. |
|
|
Dear bugmaster, please switch the bug to "verified". The issue was resolved by the patch for 0022484:
inline Standard_ExtCharacter ConvertToUnicode3B (unsigned char *p)
{
// *p, *(p+1), *(p+2) =>0 , 1, 2
+ // little endian
union {
struct {
- unsigned char h;
unsigned char l;
+ unsigned char h;
} hl;
Available UTF-8/UTF-16 conversion APIs convert the filename "Part1_badname_マヹヱ.igs" from the test case in the same way:
Utf16 SOURCE:
57 00 3A 00 5C 00 50 00|61 00 72 00 74 00 31 00
5F 00 62 00 61 00 64 00|6E 00 61 00 6D 00 65 00
5F 00 DE 30 F9 30 F1 30|2E 00 69 00 67 00 73 00
Utf16 TCol from Utf8:
57 00 3A 00 5C 00 50 00|61 00 72 00 74 00 31 00
5F 00 62 00 61 00 64 00|6E 00 61 00 6D 00 65 00
5F 00 DE 30 F9 30 F1 30|2E 00 69 00 67 00 73 00
Utf16 NCol from Utf8:
57 00 3A 00 5C 00 50 00|61 00 72 00 74 00 31 00
5F 00 62 00 61 00 64 00|6E 00 61 00 6D 00 65 00
5F 00 DE 30 F9 30 F1 30|2E 00 69 00 67 00 73 00
Utf8 WApi from Utf16:
57 3A 5C 50 61 72 74 31|5F 62 61 64 6E 61 6D 65
5F E3 83 9E E3 83 B9 E3|83 B1 2E 69 67 73
Utf8 NCol from Utf16:
57 3A 5C 50 61 72 74 31|5F 62 61 64 6E 61 6D 65
5F E3 83 9E E3 83 B9 E3|83 B1 2E 69 67 73
Utf8 TCol from Utf16:
57 3A 5C 50 61 72 74 31|5F 62 61 64 6E 61 6D 65
5F E3 83 9E E3 83 B9 E3|83 B1 2E 69 67 73
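The relationship between the two kinds of dump can be checked by hand: the "Utf16" rows store each code unit low byte first (DE 30 for U+30DE), and the "Utf8" rows hold the corresponding 3-byte sequences. The sketch below illustrates that mapping for code units in the Basic Multilingual Plane only; `readUtf16LE` and `encodeUtf8` are hypothetical helpers, not OCCT or Windows API functions.

```cpp
#include <cstdint>
#include <string>

// Reads a UTF-16 code unit from two bytes stored in little-endian order,
// as they appear in the "Utf16" dumps (for example DE 30).
static std::uint16_t readUtf16LE (const unsigned char* theBytes)
{
  return static_cast<std::uint16_t> (theBytes[0] | (theBytes[1] << 8));
}

// Hypothetical helper (not part of OCCT): encodes one BMP code unit as UTF-8.
// Surrogate pairs (code points above U+FFFF) are deliberately not handled.
static std::string encodeUtf8 (const std::uint16_t theChar)
{
  std::string aRes;
  if (theChar < 0x80u)
  {
    aRes += static_cast<char> (theChar);                            // 1 byte: 0xxxxxxx
  }
  else if (theChar < 0x800u)
  {
    aRes += static_cast<char> (0xC0u | (theChar >> 6));             // 110xxxxx
    aRes += static_cast<char> (0x80u | (theChar & 0x3Fu));          // 10xxxxxx
  }
  else
  {
    aRes += static_cast<char> (0xE0u | (theChar >> 12));            // 1110xxxx
    aRes += static_cast<char> (0x80u | ((theChar >> 6) & 0x3Fu));   // 10xxxxxx
    aRes += static_cast<char> (0x80u | (theChar & 0x3Fu));          // 10xxxxxx
  }
  return aRes;
}
```

Applied to the three non-ASCII code units of the filename (DE 30, F9 30, F1 30), this gives E3 83 9E, E3 83 B9 and E3 83 B1, matching the "Utf8" rows.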
// Formats a byte buffer as two-digit hex values: '|' follows every 8th byte
// and a line break follows every 16th byte.
static TCollection_AsciiString formatHex (const Standard_Byte* theData,
                                          const Standard_Size  theSize)
{
  TCollection_AsciiString anOut;
  char aByte[4];
  for (size_t aByteId = 0; aByteId < theSize; ++aByteId)
  {
    unsigned char aChar = theData[aByteId];
    char anEsc = ' ';
    if ( (aByteId + 1) % 16 == 0 && aByteId != 0)
    {
      anEsc = '\n';
    }
    else if ((aByteId + 1) % 8 == 0)
    {
      anEsc = '|';
    }
    _snprintf (aByte, 4, "%02X%c", (unsigned int )aChar, anEsc);
    anOut += aByte;
  }
  return anOut;
}
static Standard_Integer testunicode (Draw_Interpretor& /*theDI*/, Standard_Integer , const char** )
{
  // Ask the user for a file; the path is returned as UTF-16 (Windows-only API).
  wchar_t aFilePath [MAX_PATH]; aFilePath [0] = L'\0';
  wchar_t aFileTitle[MAX_PATH]; aFileTitle[0] = L'\0';
  OPENFILENAMEW anOpenStruct; memset (&anOpenStruct, 0, sizeof(OPENFILENAMEW));
  anOpenStruct.lStructSize    = sizeof(OPENFILENAMEW);
  anOpenStruct.nFilterIndex   = 1;
  anOpenStruct.lpstrFile      = aFilePath;
  anOpenStruct.nMaxFile       = MAX_PATH; // buffer size in characters, not bytes
  anOpenStruct.lpstrFileTitle = aFileTitle;
  anOpenStruct.nMaxFileTitle  = MAX_PATH; // buffer size in characters, not bytes
  anOpenStruct.lpstrTitle     = L"No Title";
  anOpenStruct.Flags          = OFN_PATHMUSTEXIST | OFN_FILEMUSTEXIST;
  if (!GetOpenFileNameW (&anOpenStruct)
   || *anOpenStruct.lpstrFile == L'\0')
  {
    return 0;
  }
  // UTF-16 -> UTF-8 using the Windows API
  char aBuffU8[4096];
  WideCharToMultiByte (CP_UTF8, 0, anOpenStruct.lpstrFile, -1, aBuffU8, 4096, NULL, NULL);
  // UTF-16 -> UTF-8 using NCollection_String
  NCollection_String anUtf8NCol (anOpenStruct.lpstrFile, -1);
  // UTF-16 -> UTF-8 using TCollection_ExtendedString
  char aBuffU8UsingExt[4096];
  char* aPtr = aBuffU8UsingExt;
  TCollection_ExtendedString anExtWide ((Standard_ExtString )anOpenStruct.lpstrFile);
  anExtWide.ToUTF8CString (aPtr);
  TCollection_AsciiString aHexUtf16Src = formatHex ((const Standard_Byte* )anOpenStruct.lpstrFile, wcslen (anOpenStruct.lpstrFile) * 2);
  TCollection_AsciiString aHexUtf8WApi = formatHex ((const Standard_Byte* )aBuffU8, strlen(aBuffU8));
  TCollection_AsciiString aHexUtf8NCol = formatHex ((const Standard_Byte* )anUtf8NCol.ToCString(), anUtf8NCol.Size());
  // UTF-8 -> UTF-16 using TCollection_ExtendedString (the constructor from this issue)
  TCollection_ExtendedString anExtWideFromUtf8 (aBuffU8, Standard_True);
  TCollection_AsciiString aHexUtf16ExtFromU8 = formatHex ((const Standard_Byte* )anExtWideFromUtf8.ToExtString(), anExtWideFromUtf8.Length() * 2);
  TCollection_AsciiString aHexUtf8TColEx = formatHex ((const Standard_Byte* )aBuffU8UsingExt, strlen(aBuffU8UsingExt));
  // UTF-8 -> UTF-16 using NCollection_UtfWideString
  NCollection_UtfWideString anUtf16NColFromUtf8 (aBuffU8, -1);
  TCollection_AsciiString aHexUtf16NColFromU8 = formatHex ((const Standard_Byte* )anUtf16NColFromUtf8.ToCString(), anUtf16NColFromUtf8.Size());
  std::cerr << "Utf16 SOURCE:\n" << aHexUtf16Src << "\n"
            << "Utf16 TCol from Utf8:\n" << aHexUtf16ExtFromU8 << "\n"
            << "Utf16 NCol from Utf8:\n" << aHexUtf16NColFromU8 << "\n"
            << "Utf8 WApi from Utf16:\n" << aHexUtf8WApi << "\n"
            << "Utf8 NCol from Utf16:\n" << aHexUtf8NCol << "\n"
            << "Utf8 TCol from Utf16:\n" << aHexUtf8TColEx << "\n";
  return 0;
}
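The hex formatter in the note relies on `_snprintf`, which is specific to the Microsoft C runtime. For comparing dumps on other platforms, a portable variant might look like the sketch below. It assumes C++11, swaps in `std::string` and `std::snprintf` for the OCCT and MSVC facilities, and uses the hypothetical name `formatHexPortable`; the separator rules are the same as in the note.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical portable variant of formatHex: std::snprintf replaces the
// MSVC-specific _snprintf and std::string replaces TCollection_AsciiString.
static std::string formatHexPortable (const unsigned char* theData,
                                      const std::size_t    theSize)
{
  std::string anOut;
  char aByte[4]; // two hex digits + separator + terminating null
  for (std::size_t aByteId = 0; aByteId < theSize; ++aByteId)
  {
    char aSep = ' ';
    if ((aByteId + 1) % 16 == 0)
    {
      aSep = '\n'; // line break after every 16th byte
    }
    else if ((aByteId + 1) % 8 == 0)
    {
      aSep = '|';  // column separator after every 8th byte
    }
    std::snprintf (aByte, sizeof(aByte), "%02X%c",
                   static_cast<unsigned int> (theData[aByteId]), aSep);
    anOut += aByte;
  }
  return anOut;
}
```

Every byte is followed by a separator, so the output of a short buffer ends with a trailing space, as in the original function.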
|
|
|
Mikhail, please create a test case |
|
|
Branch CR22125 has been created by apn.
SHA-1: 28d7ddb64363611911034b716439922bc0b362cf
Detailed log of new commits:
Author: apn
Date: Fri Oct 31 16:46:53 2014 +0300
0022125: TCollection_ExtendedString: conversion from UTF-8 to unicode
Added test case bugs/fclasses/bug22125 |
|
|
The problem is not reproduced on the current master on Windows and Debian60-64 in Release and Debug modes. Branch CR22125 was created. It contains the test case bugs fclasses bug22125 - OK |
|
|
Branch CR22125 has been deleted by kgv. SHA-1: 28d7ddb64363611911034b716439922bc0b362cf |
| Date Modified | Username | Field | Change |
|---|---|---|---|
| 2010-11-30 14:39 | | CC | => pdn, nkv |
| 2011-08-02 11:23 | bugmaster | Category | OCCT:FDC => OCCT:Foundation Classes |
| 2011-12-05 10:45 | | Relationship added | child of 0014673 |
| 2011-12-20 15:02 | | Fixed in Version | EMPTY => |
| 2011-12-20 15:02 | | Target Version | => 6.5.3 |
| 2011-12-20 15:02 | | Description Updated | |
| 2012-02-02 10:15 | | Target Version | 6.5.3 => 6.5.4 |
| 2012-10-21 11:16 | | Target Version | 6.5.4 => 6.6.0 |
| 2013-02-28 17:06 | | Target Version | 6.6.0 => 6.7.0 |
| 2013-11-06 15:10 | kgv | Relationship added | related to 0022484 |
| 2013-11-06 15:11 | kgv | Target Version | 6.7.0 => 6.7.1 |
| 2014-04-04 18:32 | | Target Version | 6.7.1 => 6.8.0 |
| 2014-09-11 10:24 | | Target Version | 6.8.0 => 7.1.0 |
| 2014-10-03 14:07 | | Note Added: 0032629 | |
| 2014-10-03 14:07 | | Assigned To | bugmaster => ssv |
| 2014-10-03 14:07 | | Status | new => feedback |
| 2014-10-14 16:23 | | File Added: Part1_badname.zip | |
| 2014-10-14 16:24 | | Note Added: 0033071 | |
| 2014-10-14 16:24 | | Assigned To | ssv => pdn |
| 2014-10-14 16:29 | | Status | feedback => assigned |
| 2014-10-14 19:15 | | File Added: test_iges_jp.tcl | |
| 2014-10-14 19:16 | | Note Added: 0033080 | |
| 2014-10-14 19:17 | | Assigned To | pdn => kgv |
| 2014-10-14 19:17 | | Status | assigned => resolved |
| 2014-10-14 20:05 | kgv | Assigned To | kgv => pdn |
| 2014-10-14 20:05 | kgv | Status | resolved => assigned |
| 2014-10-14 20:06 | kgv | Note Added: 0033086 | |
| 2014-10-16 10:36 | kgv | Note Added: 0033182 | |
| 2014-10-16 10:36 | kgv | Assigned To | pdn => bugmaster |
| 2014-10-16 10:36 | kgv | Status | assigned => feedback |
| 2014-10-16 10:36 | kgv | Resolution | open => fixed |
| 2014-10-16 10:36 | kgv | Target Version | 7.1.0 => 6.8.0 |
| 2014-10-17 14:12 | bugmaster | Assigned To | bugmaster => mkv |
| 2014-10-17 14:12 | bugmaster | Note Added: 0033257 | |
| 2014-10-20 12:03 | bugmaster | Assigned To | mkv => apn |
| 2014-10-31 16:47 | git | Note Added: 0033966 | |
| 2014-10-31 16:47 | apn | Note Added: 0033967 | |
| 2014-10-31 16:47 | apn | Test case number | => bugs fclasses bug22125 |
| 2014-10-31 16:47 | apn | Assigned To | apn => bugmaster |
| 2014-10-31 16:47 | apn | Status | feedback => tested |
| 2014-11-06 15:18 | bugmaster | Changeset attached | => occt master 5e5ce65b |
| 2014-11-06 15:18 | bugmaster | Status | tested => verified |
| 2014-11-11 12:42 | | Fixed in Version | => 6.8.0 |
| 2014-11-11 13:03 | | Status | verified => closed |
| 2014-11-12 08:55 | git | Note Added: 0034243 | |
| 2017-10-01 19:33 | | Relationship added | related to 0029081 |