memory overrun in ToUTF8CString

Hola~

Not sure if this is the right place, but I am reading a STEP file and an XCAF document. I then use the XCAFPrs_DocumentExplorer and convert the node names from OCCT structures to std::string:

for(
XCAFPrs_DocumentExplorer i(anXdeDoc, XCAFPrs_DocumentExplorerFlags_None);
i.More();
i.Next()) {

const XCAFPrs_DocumentNode& aNode = i.Current();

Handle(TDataStd_Name) aNodeName;
if(
!aNode.Label.FindAttribute(TDataStd_Name::GetID(), aNodeName) ||
!aNode.RefLabel.FindAttribute(TDataStd_Name::GetID(), aNodeName)) {

INFO("Unable to get the node's name: " << aNode.Id);
}
REQUIRE(aNodeName->Get().Length() > 0);

unsigned int aBufferLength = aNodeName->Get().LengthOfCString();
char* aPtr = new char[aBufferLength];
REQUIRE(aNodeName->Get().ToUTF8CString(aPtr) == aBufferLength);

string aNameAsString(aPtr);
delete[] aPtr;
.....
}

If you run this through valgrind, it flags the ::ToUTF8CString for writing a character past the end of aPtr here:

921 Standard_Integer TCollection_ExtendedString::ToUTF8CString(Standard_PCharacter& theCString) const
922 {
923 NCollection_Utf16Iter anIterRead(mystring);
924 Standard_Utf8Char* anIterWrite = theCString;
925 if (*anIterRead == 0)
926 {
927 *anIterWrite = '\0';
928 return 0;
929 }
930
931 for (; *anIterRead != 0; ++anIterRead)
932 {
933 anIterWrite = anIterRead.GetUtf(anIterWrite);
934 }
935 *anIterWrite = '\0';
936 return Standard_Integer(anIterWrite - theCString);
937 }

The *anIterWrite = '\0' on line 935 is one past the end of the string length that is returned from LengthOfCString(). If I allocate a string with space for the null terminator:

char* aPtr = new char[aBufferLength + 1];

valgrind says the memory is correctly handled.

Not sure what the right way to handle this is. I'd argue LengthOfCString() should be at least 1 if in an empty string (line 927) the pointer is written to. My recommendation is not write to the buffer on 927 and then make sure LengthOfCString() accounts for the null on non-empty strings.

MO

Dmitrii Pasukhin's picture

Hello. The recommended place to share your suggestions or bug repors: https://github.com/Open-Cascade-SAS/OCCT/issues/new/choose

Thank you for detailed explanation, I will investigate that issue!

Best regards, Dmitrii

Michael O&#039;Brien's picture

Ack. Sorry I put it here. I can make a new issue as well so you have record of it. Thanks for taking a look.

Michael O&#039;Brien's picture
gkv311 n's picture

TCollection_ExtendedString::LengthOfCString() returns length excluding \0 terminator, replicating TCollection_ExtendedString::Length() behavior. So the only thing that can be improved here is the documentation of these methods.

Anyway, TCollection_ExtendedString::ToUTF8CString() and TCollection_ExtendedString::LengthOfCString() are just artifacts of legacy UTF-8 conversion API which I would rather not recommend to use as it is C-alike, not C++. The most straight-forward way to convert TCollection_ExtendedString to UTF-8 is to pass it to TCollection_AsciiString constructor (and take TCollection_AsciiString::ToCString()):

  //! Creation by converting an extended string to an ascii string.
  //! If replaceNonAscii is non-null character, it will be used
  //! in place of any non-ascii character found in the source string.
  //! Otherwise, creates UTF-8 unicode string.
  TCollection_AsciiString(const TCollection_ExtendedString& astring, const Standard_Character replaceNonAscii = 0);
Michael O&#039;Brien's picture

Roger that. I can update my code to use this instead. Thanks for the help.