• 软件测试技术
  • 软件测试博客
  • 软件测试视频
  • 开源软件测试技术
  • 软件测试论坛
  • 软件测试沙龙
  • 软件测试资料下载
  • 软件测试杂志
  • 软件测试人才招聘
    暂时没有公告

字号: | 推荐给好友 上一篇 | 下一篇

Lua: technical note 9

发布: 2007-7-04 20:48 | 作者: admin | 来源:  网友评论 | 查看: 22次 | 进入软件测试论坛讨论

领测软件测试网

Lua Technical Note 9

Creating Strings Piece by Piece

by Roberto Ierusalimschy

Abstract

In Lua, "accumulation" of string results (that is, loops over a code like s = s..x ) can be quite expensive. This note describes an efficient way to create a string piece by piece in Lua.

The Problem

Suppose you are building a string piecemeal, for instance reading a file line by line. Your typical code would look like this:

-- WARNING: bad code ahead!!
local buff = ""
while 1 do
  local line = read()
  if line == nil then break end
  buff = buff..line.."\n"
end
Despite its innocent look, this code in Lua can cause a huge performance penalty for large files: For instance, it takes almost a minute to read a 350 Kbyte file.

Frequently this is not a problem. For small strings, the above loop is OK. To read a whole file, you can use the "*all" option, that reads it at once. But sometimes you have no such simple solution. Then, the only solution is a more efficient algorithm for your problem. Here we show one (algorithm, not problem).

The Solution

The heart of the algorithm is a stack, that keeps the large strings already created in its bottom, while small strings enter through the top. The main invariant of this stack is similar to that of the popular (among programmers, I mean) "Tower of Hanoy": A string in the stack can never sit over a shorter string. Whenever a new string is pushed over a shorter one, then (and only then) the algorithm concatenates both. This concatenation creates a larger string, that now may be larger than its neighbor in the previous floor. If that happens, they are also joined. Those concatenations go down the stack until the loop reaches a larger string or the stack bottom.
function newBuffer ()
  return {n=0}     -- 'n' counts number of elements in the stack
end

function addString (stack, s)
  tinsert(stack, s)       -- push 's' into the top of the stack
  for i=stack.n-1, 1, -1 do
    if strlen(stack[i]) > strlen(stack[i+1]) then break end
    stack[i] = stack[i]..tremove(stack)
  end
end
To get the final contents of the buffer, we just need to concatenate all strings down the bottom:
function toString (stack)
  for i=stack.n-1, 1, -1 do
    stack[i] = stack[i]..tremove(stack)
  end
  return stack[1]
end

Using this new data structure, we can rewrite our program as follows:

local s = newBuffer()
while 1 do
  local line = read()
  if line == nil then break end
  addString(s, line.."\n")  
end
s = toString(s)
This new program reduces our original time to read a 350 Kbyte file from 40 seconds to 0.5 seconds. (The call read"*all" is still faster, finishing the job in 0.02 seconds.)

Explanation

To understand what happens with the naive approach, let us assume that we are in the middle of the reading; buff is already a string with 50 Kbytes, and each line has 20 bytes. After the assignment

    buff = buff..line.."\n"
buff is a new string with 50,020 bytes, and the old string in now garbage. After two loop cycles, buff is a string with 50,040 bytes, and there are two old strings making a total of more than 100 Kbytes of garbage. Therefore, Lua decides, quite correctly, that it is a good time to run its garbage collector, and so it frees those 100 Kbytes. The problem is that this will happen every two cycles, and so Lua will run its garbage collector two thousand times before finishing the loop. Even with all this work, its memory usage will be around three times the file size. To make things worse, each concatenation must copy the whole string content (50 Kbytes and growing) into the new string.

This problem is not exclusive of Lua: other languages with true garbage collection, and where strings are immutable objects, present a similar behavior (Java being the most famous example).

Our original loop did a "linear" approach to the problem, concatenating small strings one by one into the accumulator. The new algorithm avoids this, using a binary approach. It concatenates many small strings among them, and once in a while it concatenates the resulting large strings into larger ones.


延伸阅读

文章来源于领测软件测试网 https://www.ltesting.net/


关于领测软件测试网 | 领测软件测试网合作伙伴 | 广告服务 | 投稿指南 | 联系我们 | 网站地图 | 友情链接
版权所有(C) 2003-2010 TestAge(领测软件测试网)|领测国际科技(北京)有限公司|软件测试工程师培训网 All Rights Reserved
北京市海淀区中关村南大街9号北京理工科技大厦1402室 京ICP备10010545号-5
技术支持和业务联系:info@testage.com.cn 电话:010-51297073

软件测试 | 领测国际ISTQBISTQB官网TMMiTMMi认证国际软件测试工程师认证领测软件测试网