How to read UTF-8 encoding? - 乐筑天下 - Powered by Discuz! Archiver

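mit posted on 2022-7-5 15:12:24

How to read UTF-8 encoding?

Hi everyone,

I am trying to write code that reads data from a .txt file and fills it into an Object Data table in AutoCAD Map, but the text is displayed as ????????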

https://www.cadtutor.net/forum/attachment.php?attachmentid=63717&d=1523370322&thumb=1&stc=1

Could you help me?

test.txt
尚塔博里.dwg
Fill OD_Table.lsp

hanhphuc posted on 2022-7-5 15:25:30

Not sure, but please give this a try. Sorry if it is not a good solution.

Can't display "ດິນບຸກຄົນ" ?
;;;(setq remark (vk_ReadTextStream "C:/test.txt" "UTF-8"))

;Alternative: manually copy the text from the text file, then paste it
(setq remark (getstring "\nPaste your text here -> "))
;"\U+0E94\U+0EB4\U+0E99\U+0E9A\U+0EB8\U+0E81\U+0E84\U+0EBB\U+0E99"

;or dialog
(setq remark (lisped "paste here ") )

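mit posted on 2022-7-5 15:29:02

Thanks hanhphuc, I will try it.

hanhphuc posted on 2022-7-5 15:33:02

The normal open function can display some Asian fonts, but it only supports ANSI.
A file saved as Unicode starts with the byte pair FE FF (hex), i.e. 254 255 (the byte order mark; a little-endian file stores it as FF FE).

Save test.txt as Unicode.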

(setq f (open path "r"))
(setq ret (read-line f)) ;<--test only 1st line
(if f (close f))

Test

(defun foo ( str ) ; read unicode - test version
;hanhphuc 17.04.2018
(apply 'strcat
(mapcar
''( ( x ) (apply 'strcat (vl-list* (chr 92) "U+" (mapcar ''( (x / $)(setq $ ( LM:dec->base x 16))
   (if (or (< x 10) (=(strlen $)1)) (strcat "0" $) $) )
   (reverse x)
)
   )
      )
      )

(
'( ( f ) (f (vl-remove-if
      '(lambda (x) (vl-some '(lambda (y)
      (= x y)
      )
   '( 254 255 ))
      )
(vl-string->list str)
      )
   )
      )
'( ( l ) (if l (cons (list (car l)(cadr l))
   (f (cddr l)))
   )
   )
)
)
)
)

;; Decimal to Base-Lee Mac
;; Converts a decimal number to another base.
;; n - decimal integer
;; b - non-zero positive integer base
;; Returns: Representation of decimal in specified base

(defun LM:dec->base ( n b )
(if (< n b)
   (chr (+ n (if (< n 10) 48 55)))
   (strcat (LM:dec->base (/ n b) b) (LM:dec->base (rem n b) b))
)
)

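Please try it and see whether foo works for your language.
Otherwise, plan B: read the UTF-8 file as a stream with FSO. That is more stable, but pairing up the 1 to 4 bytes of each character is difficult. I will do it if I have time.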
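
As a cross-check for foo, the same byte pairing can be written out step by step. The sketch below is only an illustration under assumptions: the names demo:hex-byte and demo:utf16le->escapes are hypothetical, the input is assumed to be a list of byte values taken from a little-endian Unicode (UTF-16 LE) file, and only code points up to FFFF are handled (no surrogate pairs).

```autolisp
;; Hypothetical helper: one byte (0-255) as two uppercase hex digits.
(defun demo:hex-byte (n / digits)
  (setq digits "0123456789ABCDEF")
  (strcat (substr digits (1+ (/ n 16)) 1)
          (substr digits (1+ (rem n 16)) 1)
  )
)

;; Hypothetical sketch: list of UTF-16 LE bytes -> "\U+XXXX..." string.
(defun demo:utf16le->escapes (bytes / out)
  ;; skip the byte order mark FF FE (255 254) when it is present
  (if (and (= (car bytes) 255) (= (cadr bytes) 254))
    (setq bytes (cddr bytes))
  )
  (setq out "")
  ;; each character is stored low byte first, so the high byte is printed first
  (while (cadr bytes)
    (setq out   (strcat out
                        "\\U+"
                        (demo:hex-byte (cadr bytes))
                        (demo:hex-byte (car bytes))
                )
          bytes (cddr bytes)
    )
  )
  out
)

;; 94 0E and B4 0E are the first two characters of the Lao sample
(demo:utf16le->escapes '(255 254 148 14 180 14))
```

The last call should produce the text \U+0E94\U+0EB4, which matches the start of the escape string quoted earlier in the thread. A trailing odd byte is ignored.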

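mit posted on 2022-7-5 15:42:44

Great!
It works.

Thank you very much, hanhphuc.
Thanks to Lee for the code.

hanhphuc posted on 2022-7-5 15:50:42

You're welcome. I hope you can write the code yourself next time.

If you have problems with the earlier Unicode method, here is my UTF-8 function, which may be useful in the future. Give it a try, and good luck.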

(alert (foo ret ) )
ດິນບຸກຄົນ ??

"\U+0E94\U+0EB4\U+0E99\U+0E9A\U+0EB8\U+0E81\U+0E84\U+0EBB\U+0E99"

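Finally, concatenate all the encoded byte lists:

Some screenshots:
https://i.imgur.com/ooL2kHm.png

Randomly tested with Arabic, Chinese, Hindi, Japanese, Korean, Lao, Punjabi, Russian, Tamil, Vietnamese, etc. There are still some problems.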

mit posted on 2022-7-5 15:54:49

Hello,
Could you help me?
What is wrong with this code?

;Reference, post#138
;https://stackoverflow.com/questions/643694/what-is-the-difference-between-utf-8-and-unicode

(defun UTF8->unicode ( l / ls 8b d2 foo) ; encode UTF-8 to unicode
;;;hanhphuc 17.04.2018
(setq 8b '((s) (while (< (strlen s) 8) (setq s (strcat "0" s))) s) ; left-pad a binary string to 8 digits
d2 '((str) ;split the bit string into 8-character chunks
(if (> (strlen str) 0)
(cons (substr str 1 8) (d2 (setq str (substr str 9 ))))
   )
      )
foo '(($ / pos i) ; base2 to decimal
(setq i 0)
(+ (cond ((while (and (> (strlen $) 0) (setq pos (vl-string-search "1" $)))
      (setq $ (substr $ (+ 2 pos))
   i (+ i (expt 2 (strlen $)))
   )
            )
   )
   (0)
         )
   (atoi $)
            )
   )
ls (mapcar ''((x / $)
      (setq $ (LM:dec->base (foo x) 16))
      (if
      (= (strlen $) 1)
      (strcat "0" $)
      $
      )
      )
   (d2
      (apply 'strcat
   (mapcar ''((a x) (substr (8b a) (- 9 x) x))
   l
   (cdr (assoc (length l) '((1 . (7)) (2 . (5 6)) (3 . (4 6 6)) (4 . (3 6 6 6)))))
   )
   )
      )
   )
)
(apply 'strcat
(vl-list* "\\U"
   (if (> (length ls) 1)
"+"
"+00")
   ls
   )
)
)

(defun U8:bytes (l / x ls)
;hanhphuc 17.04.2018
;UTF-8 split the bytes
(setq x (car l))
(if l
(cons (vl-remove nil (cond ((<= 0 x 191)
(setq ls (list x)
      l(cdr l)
      )
ls
)
((<= 192 x 223)
(setq ls (list x (cadr l))
      l(cddr l)
      )
ls
)
((<= 224 x 239)
(setq ls (list x (cadr l) (caddr l))
      l(cdddr l)
      )
ls
)
((<= 240 x 247)
(setq ls (list x (cadr l) (caddr l) (cadddr l))
      l(cddddr l)
      )
ls
)
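(t ; bytes 248-255 cannot start a UTF-8 sequence; keep the byte alone so the recursion still advances
(setq ls (list x)
      l(cdr l)
      )
ls
)
)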
)
(U8:bytes l)
)
)
)
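
The bit arithmetic behind UTF8->unicode can also be done on numbers instead of binary strings. The following is a minimal sketch, not the author's method: demo:hex-byte, demo:utf8-seq->codepoint and demo:codepoint->escape are hypothetical names, each input is assumed to be one well-formed sequence of 1 to 4 byte values (such as one element of the list returned by U8:bytes), and code points above FFFF are simply replaced by ? because the \U+ escape carries four hex digits.

```autolisp
;; Hypothetical helper: one byte (0-255) as two uppercase hex digits.
(defun demo:hex-byte (n / digits)
  (setq digits "0123456789ABCDEF")
  (strcat (substr digits (1+ (/ n 16)) 1)
          (substr digits (1+ (rem n 16)) 1)
  )
)

;; Hypothetical sketch: one UTF-8 sequence (list of 1-4 bytes) -> code point.
(defun demo:utf8-seq->codepoint (seq / n cp)
  (setq n  (length seq)
        ;; the lead byte carries 7, 5, 4 or 3 payload bits
        cp (logand (car seq)
                   (cond ((= n 1) 127) ((= n 2) 31) ((= n 3) 15) (t 7))
           )
  )
  ;; every continuation byte contributes its low 6 bits
  (foreach b (cdr seq)
    (setq cp (+ (* cp 64) (logand b 63)))
  )
  cp
)

;; Hypothetical sketch: code point -> "\U+XXXX" (or "?" above FFFF).
(defun demo:codepoint->escape (cp)
  (if (> cp 65535)
    "?"
    (strcat "\\U+" (demo:hex-byte (/ cp 256)) (demo:hex-byte (rem cp 256)))
  )
)

;; E0 BA 94 is the UTF-8 form of the first character of the Lao sample
(demo:codepoint->escape (demo:utf8-seq->codepoint '(224 186 148)))
```

The sequence E0 BA 94 decodes to 3732 (hex 0E94), so the last call should yield \U+0E94. A two-byte sequence such as C3 BA decodes to 250 (hex 00FA) and is printed with a leading 00.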

So change it to this:

(setq ret "ï»¿Lee Mac & Marko Ribar\r\nHappy Birthday\r\nç¥ä½*ä»¬ç”Ÿæ—¥å¿«ä¹\r\nå¹¸ç¦\r\nChÃºc má»«ng sinh nháº*t\r\n"
)

Happy coding

P.S.: use read-char to read the Unicode file
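
The postscript can be sketched as a small loop. This is an illustration under assumptions rather than tested production code: demo:read-bytes is a hypothetical name, and how read-char treats line endings, null bytes and non-ANSI content depends on the AutoCAD version, so the returned values should be inspected before relying on them.

```autolisp
;; Hypothetical sketch: collect the character codes of a file one at a time.
(defun demo:read-bytes (path / f b out)
  (if (setq f (open path "r"))
    (progn
      ;; read-char returns an integer code, or nil at the end of the file
      (while (setq b (read-char f))
        (setq out (cons b out))
      )
      (close f)
      ;; codes were consed in reverse order
      (reverse out)
    )
  )
)
```

Calling (demo:read-bytes "C:/test.txt") returns a list of integers, or nil when the file cannot be opened. That list can then be handed to a byte-splitting function such as U8:bytes.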

hanhphuc posted on 2022-7-5 16:01:33

Thank you very much, hanhphuc.

mit posted on 2022-7-5 16:09:28

You're welcome.

Sorry, the reference link contains a typo: #138 should be post #147.

hanhphuc posted on 2022-7-5 16:15:19
