[Lispweb] utf-8 manipulations
Pascal Bourguignon
pjb at informatimago.com
Tue Jun 28 10:31:26 CDT 2005
Frédéric Gobry writes:
> Hi,
>
> I'm working on my 1st lisp-based web application (I'm a lisp newcomer
> too :-)). It's based on Araneida, and runs in sbcl (0.8.16 from debian
> sarge, no unicode support).
>
> In my app, users can provide names for certain pages, say:
>
> Page accentuée
>
> which should give birth to an url that matches the name as
> much as possible but remains readable, say:
>
> page-accentuee
>
> I get the user input as an utf-8 string. Is there a library to help me
> processing the raw name into the url-clean version, or should I write
> both an utf-8 parser and the equivalent of python's translate () method?
>
> Any general advice on that topic is welcome, as I don't have much
> experience with lisp.
Well, trivial ad-hoc algorithms rarely end up in libraries...
But if you're asking for URL-clean names, then instead of reading Python
docs you'd do better to read the standards:
http://www.w3.org/International/O-URL-code.html
So, to get a vector of ASCII codes that represents a string encoded in
UTF-8 and percent-escaped for use in a URI, you'd do:
(defun encode-for-uri (string)
  (flet ((hex (nibble) (+ nibble (if (< nibble 10) 48 55))))
    (let ((bytes (make-array (list (length string))
                             :element-type '(unsigned-byte 8)
                             :adjustable t :fill-pointer 0)))
      (loop for byte across (encode-string-to-utf-8 string)
            do (if (< byte 128)
                   (vector-push-extend byte bytes)
                   (progn
                     (vector-push-extend 37 bytes) ; ASCII %
                     (vector-push-extend (hex (truncate byte 16)) bytes)
                     (vector-push-extend (hex (mod byte 16)) bytes))))
      bytes)))
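Note that encode-for-uri copies every byte below 128 unchanged, so a
space or a literal % in the name would reach the URI unescaped. A minimal
sketch of a stricter variant follows, assuming the "unreserved" set of
RFC 3986 (letters, digits, - . _ ~) and an ASCII-compatible CODE-CHAR.
The names unreserved-octet-p and percent-encode-octets are hypothetical,
not part of the code above; the function takes the UTF-8 octets as input
and returns a string rather than a vector of codes.

```lisp
(defun unreserved-octet-p (octet)
  "True if OCTET is an ASCII letter, a digit, or one of - . _ ~"
  (or (<= 48 octet 57)                   ; 0-9
      (<= 65 octet 90)                   ; A-Z
      (<= 97 octet 122)                  ; a-z
      (member octet '(45 46 95 126))))   ; - . _ ~

(defun percent-encode-octets (octets)
  "Return a string where every octet outside the unreserved set is %XX."
  (let ((digits "0123456789ABCDEF"))
    (with-output-to-string (out)
      (loop for octet across octets
            do (if (unreserved-octet-p octet)
                   ;; Assumption: CODE-CHAR maps ASCII codes to ASCII characters.
                   (write-char (code-char octet) out)
                   (progn
                     (write-char #\% out)
                     (write-char (char digits (ash octet -4)) out)
                     (write-char (char digits (logand octet 15)) out)))))))
```

Called on the octets of "François" this gives "Fran%C3%A7ois", and a
space (octet 32) comes out as %20.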
For encode-string-to-utf-8, in clisp:
(defun ENCODE-STRING-TO-UTF-8 (string)
  #+clisp (EXT:CONVERT-STRING-TO-BYTES string CHARSET:UTF-8)
  #-clisp (error "ENCODE-STRING-TO-UTF-8 is not implemented for this Lisp"))
I suppose something similar exists in SBCL. (I wonder why SBCL
developers did not use the de facto standard of clisp for these
functions ;-)
)
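Where no such built-in is at hand, the UTF-8 encoding itself is short
enough to write directly. The following is only a sketch: it assumes that
CHAR-CODE returns Unicode code points (true of Unicode-enabled Lisps, not
of an 8-bit build), it does not reject surrogate code points, and the
name string-to-utf-8-octets is hypothetical. Unicode-enabled SBCL
releases also offer sb-ext:string-to-octets with :external-format :utf-8.

```lisp
(defun string-to-utf-8-octets (string)
  "Encode STRING as a vector of UTF-8 octets.
Assumes CHAR-CODE returns Unicode code points."
  (let ((octets (make-array (length string)
                            :element-type '(unsigned-byte 8)
                            :adjustable t :fill-pointer 0)))
    (flet ((out (octet) (vector-push-extend octet octets)))
      (loop for char across string
            for code = (char-code char)
            do (cond ((< code #x80)         ; 1 octet:  0xxxxxxx
                      (out code))
                     ((< code #x800)        ; 2 octets: 110xxxxx 10xxxxxx
                      (out (logior #xC0 (ash code -6)))
                      (out (logior #x80 (logand code #x3F))))
                     ((< code #x10000)      ; 3 octets: 1110xxxx 10xxxxxx 10xxxxxx
                      (out (logior #xE0 (ash code -12)))
                      (out (logior #x80 (logand (ash code -6) #x3F)))
                      (out (logior #x80 (logand code #x3F))))
                     (t                     ; 4 octets: 11110xxx and three 10xxxxxx
                      (out (logior #xF0 (ash code -18)))
                      (out (logior #x80 (logand (ash code -12) #x3F)))
                      (out (logior #x80 (logand (ash code -6) #x3F)))
                      (out (logior #x80 (logand code #x3F)))))))
    octets))
```

For the single character with code 231 (c with cedilla) this yields the
two octets 195 and 167, that is C3 A7 in hexadecimal.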
[14]> (encode-for-uri "François")
#(70 114 97 110 37 67 51 37 65 55 111 105 115)
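Percent-escapes keep the URI valid but not very readable. For the
readable form asked about in the quoted message (page-accentuee), one
possible approach is a small hand-written folding table, roughly what
Python's translate () would be used for. This is a sketch only: it
assumes the input has already been decoded into characters whose
CHAR-CODE is the Latin-1/Unicode code point, it covers only common
Latin-1 accented letters, and the names fold-latin1-char and slugify are
hypothetical.

```lisp
(defun fold-latin1-char (char)
  "Return a lowercase ASCII replacement string for CHAR, or NIL if none is known."
  (let ((code (char-code char)))
    (cond ((< code 128) (string (char-downcase char)))
          ((or (<= 192 code 197) (<= 224 code 229)) "a")
          ((member code '(199 231)) "c")
          ((or (<= 200 code 203) (<= 232 code 235)) "e")
          ((or (<= 204 code 207) (<= 236 code 239)) "i")
          ((member code '(209 241)) "n")
          ((or (<= 210 code 214) (<= 242 code 246)) "o")
          ((or (<= 217 code 220) (<= 249 code 252)) "u")
          ((member code '(221 253 255)) "y")
          (t nil))))

(defun slugify (name)
  "Lowercase NAME, fold accents, and join runs of letters and digits with hyphens."
  (let ((pending-hyphen nil)
        (started nil))
    (with-output-to-string (out)
      (loop for char across name
            for folded = (fold-latin1-char char)
            do (if (and folded (alphanumericp (char folded 0)))
                   (progn
                     ;; Emit one hyphen for any run of skipped characters,
                     ;; but never at the very start of the slug.
                     (when (and pending-hyphen started)
                       (write-char #\- out))
                     (write-string folded out)
                     (setf pending-hyphen nil
                           started t))
                   (setf pending-hyphen t))))))
```

With the name from the question, built here with CODE-CHAR so that the
source stays ASCII, (slugify (concatenate 'string "Page accentu"
(string (code-char 233)) "e")) returns "page-accentuee".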
--
__Pascal Bourguignon__ http://www.informatimago.com/