[Openmcl-devel] A faster read-line
ron at flownet.com
Wed Oct 20 12:27:52 CDT 2010
Thanks Gary. I guess the pitfall here is remembering that :ascii is not actually a translation-free format, at least not when it's done right.
On Oct 20, 2010, at 10:13 AM, Gary Byers wrote:
> In the most general case, READ-LINE is something like:
> (let* ((temp (make-string-with-fill-pointer)))
>   (loop
>     (let* ((ch (read-char stream nil nil)))
>       (cond ((null ch) (return (values (copy-seq temp) t)))
>             ((eql ch #\newline) (return (values (copy-seq temp) nil)))
>             (t (vector-push-extend ch temp))))))
> where a string-with-fill-pointer might or might not be the best way
> to accumulate characters.
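
For readers who want to run the general case described above, here is a minimal portable sketch. The name `naive-read-line` is hypothetical, and the adjustable array with a fill pointer stands in for the unspecified "string-with-fill-pointer" constructor; this is not CCL's actual READ-LINE. Unlike the standard READ-LINE, it returns an empty string (with a true second value) when called at end of file.

```lisp
;; Hypothetical helper: a character-at-a-time line reader.
;; Assumption: an adjustable vector with a fill pointer plays the role
;; of the "string-with-fill-pointer" mentioned in the message.
(defun naive-read-line (stream)
  "Return the next line of STREAM, plus a flag that is true when the
line was ended by EOF rather than by a newline."
  (let ((temp (make-array 0 :element-type 'character
                            :adjustable t
                            :fill-pointer 0)))
    (loop
      (let ((ch (read-char stream nil nil)))
        (cond ((null ch)                       ; EOF: return what we have
               (return (values (copy-seq temp) t)))
              ((eql ch #\Newline)              ; end of line
               (return (values (copy-seq temp) nil)))
              (t                               ; ordinary character
               (vector-push-extend ch temp)))))))
```

Every character goes through READ-CHAR and VECTOR-PUSH-EXTEND, which is where the per-character cost of the general case comes from.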
> If the stream is buffered (and you know things about how it's buffered),
> no newline translation is going on, and the mapping between octets and
> characters is simple enough, you can do better: you can look for an octet
> with value #xa in the buffer and if you find one, know how many octets are
> used to encode the string (and therefore know the length of the string in
> characters), and there are other things that you can do that can be a lot
> faster than the "just collect characters until EOF or newline" approach.
> The code used in that case (iso-8859-1 encoding, unix line-termination) is
> faster than the general case; it's still likely to be slower than #_fgets
> (read at most N octets into a preallocated buffer, confuse concepts
> "characters" and "octets", etc.)
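
The buffer-scanning idea for the iso-8859-1/unix case can be sketched as follows. The function name `latin1-line-from-buffer` and its calling convention are hypothetical, and the sketch works on a plain octet vector rather than on CCL's real stream buffers. It also assumes an implementation (such as CCL) where CODE-CHAR maps the integers 0 to 255 to the corresponding ISO-8859-1 characters.

```lisp
;; Hypothetical helper: extract one line from an octet buffer.
;; Assumptions: ISO-8859-1 encoding (one octet per character, octet
;; value = character code) and unix line termination (octet #xa).
(defun latin1-line-from-buffer (buffer start end)
  "Scan octets of BUFFER in [START, END) for a newline octet.
Return the line as a string and the index just past the newline,
or NIL and START if no newline is present in that range."
  (let ((pos (position #xa buffer :start start :end end)))
    (if (null pos)
        (values nil start)
        ;; The line length in characters is known before decoding,
        ;; so the string can be allocated once at its final size.
        (let ((line (make-string (- pos start))))
          (loop for i from start below pos
                for j from 0
                do (setf (schar line j) (code-char (aref buffer i))))
          (values line (1+ pos))))))
```

A caller would refill the buffer and retry when the first value is NIL; that bookkeeping is omitted here.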
> There's a lot of room in between the very simple iso-8859-1/unix case and
> the general one (e.g., ASCII/unix is almost as simple as iso-8859-1), but
> CCL doesn't try to do anything special to handle those cases. Most of those
> special things involve trying to determine whether there's a newline in
> the buffer, which depends on what character(s) are used to represent #\newline
> and on what octet(s) are used to represent those characters.
> On Tue, 19 Oct 2010, Ron Garret wrote:
>> It seems to be unicode conversion that is taking all the time. Python yields similar disparities depending on whether you're reading a file opened with open or codecs.open.
>> READ-SEQUENCE is nice and zippy.
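
As an illustration of the READ-SEQUENCE approach, the sketch below reads a whole file as octets, bypassing character decoding, and counts newline octets the way wc -l would. The names `slurp-octets` and `count-newlines` are hypothetical, and the code uses only standard Common Lisp; it is not the benchmark from this thread.

```lisp
;; Hypothetical helpers: ingest a file as raw octets with READ-SEQUENCE.
(defun slurp-octets (path)
  "Return the contents of the file at PATH as a vector of octets."
  (with-open-file (in path :element-type '(unsigned-byte 8))
    (let* ((buffer (make-array (file-length in)
                               :element-type '(unsigned-byte 8)))
           (n (read-sequence buffer in)))
      ;; READ-SEQUENCE returns the index of the first element not
      ;; updated; trim only if fewer octets arrived than expected.
      (if (= n (length buffer))
          buffer
          (subseq buffer 0 n)))))

(defun count-newlines (path)
  "Count newline octets (#xa) in the file at PATH."
  (count #xa (slurp-octets path)))
```

Because no octet-to-character conversion takes place, this only answers questions that can be asked of the raw bytes, such as counting unix line terminators.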
>> On Oct 19, 2010, at 4:36 PM, Greg Pfeil wrote:
>>> On 19 Oct 2010, at 19:27, Ron Garret wrote:
>>>> Without doing anything special, read-line is, empirically, about fifteen times slower than the equivalent C code, even with :external-format :ascii. (My benchmark is comparing (loop while (read-line f nil nil)) with wc.) Lisp also seems to be CPU bound during read-line. What is it doing with all those cycles? Are there any easy ways to speed this up? What's the fastest way to ingest a file in CCL?
>>> I don't know what CCL is doing, but I remember seeing this forever ago: http://www.ymeme.com/slurping-a-file-common-lisp-83.html