Hi, about filtering the license information.
Right now I'm preparing to filter the license information out of books downloaded from Project Gutenberg, and I have two questions:
1) I understand I can replace the license with this string: "This eBook is for the use of anyone anywhere at no cost and with almost no restrictions whatsoever. You may copy it, give it away or re-use it under the terms of the Project Gutenberg License included in this eBook or online at www.gutenberg.net". Is that enough?
2) Is there any other markup besides '_' to represent italic boundaries? And does this markup only occur in txt files?
Also, can italic markup span more than one line or paragraph?
On Wed, Dec 23, 2009 at 10:41 PM, Paulo Levi <i30817@gmail.com> wrote:
> 2) Is there any other markup besides '_' to represent italic boundaries? And does this markup only occur in txt files?
Answering myself: they can.
On Wed, Dec 23, 2009 at 11:55 PM, Paulo Levi <i30817@gmail.com> wrote:
> Also, can italic markup span more than one line or paragraph?
This is the code I ended up with. The context is that I don't control how the text is split into lines, so I can't use a regex: I don't know whether I have all the input. I suppose that, to be sure, I should use a StringBuilder to concatenate the lines until they match the end of the "tag".

    boolean isMarkupStart = line.startsWith("\n***");
    if (isMarkupStart) {
        isStart = line.contains(START_TAG);
        isEnd = line.contains(END_TAG);
        inTag = isStart || isEnd;
    }
    if (isInValidText && !inTag) {
        super.insertString(offset, line, attr);
    }
    // Best I can do. If the string breaks exactly at *\n**
    // or something, I suppose this will break horribly.
    if (inTag && line.endsWith("***")) {
        isInValidText = isStart && !isEnd;
        inTag = false;
    }
Removed that possibility by using a StringBuilder.

    boolean isMarkupStart = line.startsWith("\n***");
    if (isMarkupStart) {
        isStart = line.contains(START_TAG);
        isEnd = line.contains(END_TAG);
        inTag = isStart || isEnd;
    }
    if (inTag) {
        tagForMatch.append(line);
    } else if (isInValidText) {
        super.insertString(bypass, offset, line, attr);
    }
    // The StringBuilder makes sure a closing *** that is
    // broken across several lines is still matched.
    if (inTag && tagForMatch.toString().endsWith("***")) {
        isInValidText = isStart && !isEnd;
        inTag = false;
        tagForMatch.setLength(0);
    }
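For comparison, here is a minimal standalone sketch of a slightly different technique: instead of accumulating only the tag, it reassembles complete lines before looking for the markers, so it does not matter where the incoming chunks are split. The class name GutenbergBodyFilter, the methods feed and finish, and the marker fragments "START OF" and "END OF" are all assumptions made for illustration; the real START_TAG and END_TAG values are not shown in this thread.

```java
/** Hypothetical helper: keeps only the text between the START and END marker lines. */
final class GutenbergBodyFilter {
    // Assumed marker fragments; the actual START_TAG / END_TAG values may differ.
    private static final String START_TAG = "START OF";
    private static final String END_TAG = "END OF";

    // Holds the trailing part of the input that has not been terminated by '\n' yet.
    private final StringBuilder pending = new StringBuilder();
    private boolean inBody = false;

    /** Accepts a chunk split at any position; returns the body text that is safe to emit. */
    String feed(String chunk) {
        pending.append(chunk);
        StringBuilder out = new StringBuilder();
        int newline;
        while ((newline = pending.indexOf("\n")) >= 0) {
            String line = pending.substring(0, newline + 1); // includes the '\n'
            pending.delete(0, newline + 1);
            handleLine(line, out);
        }
        return out.toString();
    }

    /** Call once at end of input to process a last line without a trailing newline. */
    String finish() {
        StringBuilder out = new StringBuilder();
        if (pending.length() > 0) {
            handleLine(pending.toString(), out);
            pending.setLength(0);
        }
        return out.toString();
    }

    private void handleLine(String line, StringBuilder out) {
        String trimmed = line.trim();
        if (trimmed.startsWith("***")) {
            String rest = trimmed.substring(3).trim();
            if (rest.startsWith(START_TAG)) {
                inBody = true;  // the book text begins after this line
                return;
            }
            if (rest.startsWith(END_TAG)) {
                inBody = false; // the license footer follows
                return;
            }
        }
        if (inBody) {
            out.append(line);
        }
    }
}
```

The trade-off is that nothing is emitted until a full line has arrived, so finish() has to be called once the input ends.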
Also, using _ to represent italics is awkward, to say the least. Many URLs and other data contain underscores.
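One possible heuristic, sketched below under my own assumptions, is to treat '_' as an italic boundary only when the opening underscore is not preceded by a word character and the closing one is not followed by a word character. That leaves underscores inside URLs and identifiers alone, and the DOTALL flag lets a span cross line breaks. The class name ItalicMarkup and the <i> output tags are made up for illustration, and an unmatched opening underscore can still pair with a much later one.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: converts _text_ spans to <i>text</i>. */
final class ItalicMarkup {
    // (?<!\w)_ : opening underscore not glued to a preceding word character
    // (?=\S)   : italic text starts with a non-space character
    // (.+?)    : shortest possible content; DOTALL lets it include newlines
    // (?<=\S)_ : closing underscore directly after a non-space character
    // (?!\w)   : and not glued to a following word character
    private static final Pattern ITALIC =
            Pattern.compile("(?<!\\w)_(?=\\S)(.+?)(?<=\\S)_(?!\\w)", Pattern.DOTALL);

    static String toHtml(String text) {
        return ITALIC.matcher(text).replaceAll("<i>$1</i>");
    }
}
```

With this pattern, a span such as _very loudly_ is converted even when it wraps onto the next line, while the underscores in something like http://example.org/a_b_c are left untouched because each one directly follows a letter.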