[Lispweb] utf-8 manipulations

Frédéric Gobry frederic.gobry at epfl.ch
Tue Jun 28 13:42:28 CDT 2005


(Please tell me if this is the wrong mailing list; I don't feel like my
question was welcome.)

> Well, trivial ad-hoc algorithms rarely end in libraries...

For my information, which of the two is the trivial ad-hoc algorithm:
general charset conversion, or generic string substitution?

> But if you're asking for url clean, then instead of reading python
> docs you'd better read standards:

I did not have to read the Python docs; I mostly develop in Python these
days.

> http://www.w3.org/International/O-URL-code.html

Thanks, but I don't want to end up with "Fran%C3%A7ois"; I want
"francois".  The aim is to generate a first version of a URL from a
string, which the (non-technical) user can then tweak manually.
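
A minimal sketch of that kind of URL generation, for illustration only.
The names *DIACRITIC-TABLE*, FOLD-CHAR and SLUGIFY are hypothetical, the
table is deliberately partial, and the code assumes that CHAR-CODE
returns Latin-1 / Unicode code points, which the standard does not
guarantee. Characters it does not know are treated as word separators.

```lisp
;; Sketch only: every name here is hypothetical, the table is partial.
;; Assumes CHAR-CODE yields Latin-1 / Unicode code points.
(defparameter *diacritic-table*
  '((224 . "a") (226 . "a") (228 . "a")             ; a grave/circumflex/diaeresis
    (231 . "c")                                     ; c cedilla
    (232 . "e") (233 . "e") (234 . "e") (235 . "e") ; e grave/acute/circumflex/diaeresis
    (238 . "i") (239 . "i")                         ; i circumflex/diaeresis
    (244 . "o") (246 . "o")                         ; o circumflex/diaeresis
    (249 . "u") (251 . "u") (252 . "u"))            ; u grave/circumflex/diaeresis
  "Partial map from lowercase Latin-1 character codes to ASCII strings.")

(defun fold-char (char)
  "Return a lowercase ASCII string for CHAR, or NIL if CHAR is a separator."
  (let ((code (char-code char)))
    ;; In Latin-1, uppercase accented letters (192..222, except the
    ;; multiplication sign 215) sit 32 below their lowercase forms.
    (when (and (<= 192 code 222) (/= code 215))
      (incf code 32))
    (cond ((cdr (assoc code *diacritic-table*)))
          ((and (< code 128) (alphanumericp char))
           (string (char-downcase char)))
          (t nil))))

(defun slugify (string)
  "Turn STRING into a lowercase ASCII slug, words joined by hyphens."
  (with-output-to-string (out)
    (let ((started nil)   ; true once a slug character has been written
          (pending nil))  ; true when a hyphen is owed before the next one
      (loop for char across string
            for folded = (fold-char char)
            do (cond (folded
                      (when pending (write-char #\- out))
                      (write-string folded out)
                      (setf started t pending nil))
                     (started
                      (setf pending t)))))))
```

Runs of punctuation or whitespace collapse into a single hyphen, and
leading or trailing separators are dropped, so the result is something
a user can still edit by hand.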

> 
> (defun encode-for-uri (string)
>   (flet ((hex (nibble) (+ nibble (if (< nibble 10) 48 55))))
>     (let ((bytes (make-array (list (length string)) 
>                              :element-type '(unsigned-byte 8)
>                              :adjustable t :fill-pointer 0)))
>       (loop for byte across (encode-string-to-utf-8 string) 
>             do (if (< byte 128)
>                  (vector-push-extend byte bytes)
>                  (progn
>                    (vector-push-extend 37 bytes) ; ASCII %
>                    (vector-push-extend (hex (truncate byte 16)) bytes)
>                    (vector-push-extend (hex (mod      byte 16)) bytes))))
>       bytes)))
> 
> For encode-string-to-utf-8:
> 
> (defun ENCODE-STRING-TO-UTF-8 (string)
>   #+clisp (EXT:CONVERT-STRING-TO-BYTES string CHARSET:UTF-8))
> 

Thanks for the code, it's always useful to read nice samples.

> I suppose something similar exists in SBCL.  (I wonder why SBCL
> developers did not use the de-facto standard of clisp for these
> functions ;-)

As I said, I'd prefer sticking with Debian sarge, which means no Unicode
support in SBCL. I planned to do something like UTF-8 -> Latin-1 and then
remap some characters in order to remove diacritics.
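
The UTF-8 -> Latin-1 step can be done on raw octets, without any Unicode
support in the implementation. This is a sketch under stated
assumptions, not a full decoder: UTF-8-OCTETS-TO-LATIN-1 is a
hypothetical name, the input is assumed to be a vector of octets, and
anything outside Latin-1 is replaced by "?" rather than reported. It
relies on the fact that code points 128..255 are encoded in UTF-8 as a
lead byte #xC2 or #xC3 followed by one continuation byte.

```lisp
;; Sketch only: UTF-8-OCTETS-TO-LATIN-1 is a hypothetical helper.
(defun utf-8-octets-to-latin-1 (octets &optional (replacement 63))
  "Decode OCTETS (a vector holding UTF-8) into a vector of Latin-1 codes.
Sequences that do not fit in Latin-1 become REPLACEMENT (ASCII ?)."
  (let* ((n (length octets))
         ;; The output is never longer than the input.
         (codes (make-array n :element-type '(unsigned-byte 8)
                              :fill-pointer 0))
         (i 0))
    (loop while (< i n)
          do (let ((byte (aref octets i)))
               (cond ((< byte 128)                ; plain ASCII
                      (vector-push byte codes)
                      (incf i))
                     ((and (<= #xC2 byte #xC3)    ; two-byte form of 128..255
                           (< (1+ i) n)
                           (= (logand (aref octets (1+ i)) #xC0) #x80))
                      (vector-push (logior (ash (logand byte #x03) 6)
                                           (logand (aref octets (1+ i)) #x3F))
                                   codes)
                      (incf i 2))
                     (t                           ; not representable
                      (vector-push replacement codes)
                      (incf i)
                      ;; Skip the continuation bytes of this sequence.
                      (loop while (and (< i n)
                                       (= (logand (aref octets i) #xC0)
                                          #x80))
                            do (incf i))))))
    codes))
```

For example, the octets #xC3 #xA7 (c cedilla in UTF-8) come out as the
single code 231, which a remapping table can then turn into "c".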

Do you think it would be wiser to move to another CL implementation?
I have not yet become strongly attached to any of them, having tried
CMUCL a little and SBCL a little more. So, would CLISP be a better choice?

Frédéric



