Hello,
I am using the Grid database and I found multiple broken characters.
The data field included "â" followed by two squares where a "-" was supposed to be, so I used a formula to replace it. Although the letter "â" was replaced with "-", the two squares were not removed from the data. It seems the actual data does not include the two squares, though...
Does anyone know how to handle this issue?
Sincerely,
Kazumi
Hi @knozawa
One thing that works for me in these cases is to handle all String fields as WString (use a Select Tool to change them).
Can you try this and let us know?
Thanks
Thank you for your suggestion. I converted the field from V_String to WString, but the result was the same. The two squares were still there.
I also tried ConvertFromCodePage() and DecomposeUnicodeForMatch(), but neither worked well. I believe the code is ISO639, but there was no matching option for the ConvertFromCodePage() formula. After using DecomposeUnicodeForMatch(), some of the letters were converted to "?" marks.
Sincerely,
Kazumi
Can you share a sample of the data you're using, so I can see the exact case?
grid.413735.7 (name: Harvardâ“MIT Division of Health Sciences and Technology)
grid.10067.30 (label: Ðациональный унивеÑÑитет «ЛьвовÑÐºÐ°Ñ Ð¿Ð¾Ð»Ð¸Ñ‚ÐµÑ…Ð½Ð¸ÐºÐ°Â»)
grid.10211.33 (label: Lüneburg‚)
Above are some examples that I would like to fix.
Sincerely,
Kazumi
Hi,
Reviewing the data you sent, I got these results:
The encoding of the data seems to be UTF-8.
In which format (.csv, Excel) is the original data?
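To illustrate why a UTF-8 file can show "â" followed by two squares, here is a minimal Python sketch. It assumes the file was decoded as ISO-8859-1 (Latin-1) instead of UTF-8, which is only a guess based on the samples above, and the sample string is my own reconstruction of the first label, not a value taken from the file.

```python
# Assumption: the bytes are valid UTF-8 but were decoded as Latin-1.
# Hypothetical sample: an en dash (U+2013) between the two names.
original = "Harvard\u2013MIT"

# Decode the UTF-8 bytes with the wrong encoding to reproduce the garbling.
garbled = original.encode("utf-8").decode("latin-1")

# The en dash is three UTF-8 bytes (E2 80 93). Latin-1 turns E2 into the
# letter "â" and the other two bytes into invisible control characters,
# which many programs draw as squares.
print(ascii(garbled))

# Undoing the wrong decode recovers the original text.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == original)
```

If this assumption holds, it would also explain why replacing only the "â" leaves the two squares behind: they are separate characters, not part of the "â".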
The original data is csv (type: Microsoft Office Excel Comma Separated Values File). I downloaded the data from the Grid database and combined all 11 files together. I have attached the workflow.
Thank you very much for your help.
Sincerely,
Kazumi
Using your workflow, I changed the Encoding of the label.csv file to UTF-8. All accents, foreign language characters and symbols are there (see below)... I think if you do this for all your inputs, all characters should be right...
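For anyone checking the same thing outside of Alteryx, the effect of the input encoding can be sketched in Python. This is only an illustration: the temporary file, the column names, the ID and the `read_labels` helper are hypothetical stand-ins, not the real label.csv.

```python
import csv
import os
import tempfile


def read_labels(path, encoding):
    """Read all rows of a CSV file using the given text encoding."""
    with open(path, newline="", encoding=encoding) as handle:
        return list(csv.reader(handle))


# Hypothetical stand-in for label.csv, written to disk as UTF-8.
# "\u00fc" is the letter u with an umlaut.
sample_rows = [["grid_id", "label"], ["example.1", "L\u00fcneburg"]]
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", newline="", encoding="utf-8", delete=False
) as tmp:
    csv.writer(tmp).writerows(sample_rows)
    sample_path = tmp.name

wrong = read_labels(sample_path, "latin-1")  # wrong encoding: garbled label
right = read_labels(sample_path, "utf-8")    # the file's real encoding
os.remove(sample_path)

print(ascii(wrong[1][1]))  # two characters where the umlaut should be
print(ascii(right[1][1]))  # a single character for the umlaut
```

The same file gives garbled or clean text depending only on the encoding chosen when reading it, so setting the encoding at the input is usually simpler than repairing the characters afterwards.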