Mercurial > hg > TextShaper
changeset 52:8d8c1ac0e8e1
add a test text and wire some things up
author | Jeff Hammel <k0scist@gmail.com> |
---|---|
date | Sun, 17 May 2015 08:48:56 -0700 |
parents | c3b69728f291 |
children | 3691ffa84a3a |
files | tests/test.txt textshaper/split.py |
diffstat | 2 files changed, 20 insertions(+), 3 deletions(-) |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tests/test.txt Sun May 17 08:48:56 2015 -0700
@@ -0,0 +1,1 @@
+The fog of an October evening occluded the arrival of a carriage to the township of Bronswick so that none could discern through the mists that the townhouse at 18 Merriwether Lane was again to be occupied, save only those neighbors across the way that might have the vantage. The girl of the house, a Miss Anne Danubar, was staring into the gray streets when the clod of horse hooves broke pace and rested. From the carriage stepped two men, an elderly gentleman and a younger, who unlocked the gate to the yard which had never stood unlocked in times rememembered. The driver aided them with their modest luggage and soon they were inside, the carriage leaving.
\ No newline at end of file
--- a/textshaper/split.py Sun May 17 08:33:23 2015 -0700
+++ b/textshaper/split.py Sun May 17 08:48:56 2015 -0700
@@ -38,6 +38,20 @@
 
 def split_sentences(text, ends='.?!'):
     """split a text into sentences"""
+    text = text.strip()
+    sentences = []
+    _indices = indices(text, ends)
+
+    begin = 0
+    for index, value in _indices:
+        sentence = text[begin:index]
+        sentence += value
+        sentence.strip()
+        begin = index
+        if sentence:
+            sentences.append(sentence)
+    import pdb; pdb.set_trace()
+
 
 def split_paragraphs(text):
     lines = [line.strip() for line in text.strip().splitlines()]
@@ -60,11 +74,13 @@
     text = ' '.join(text.split())
     # paragraphs = split_paragraphs(text)
 
+    # find all sentences
     ends = '.?!'
+    sentences = split_sentences(text, ends)
 
-    # find all ending punctuation
-
-
+    # display
+    for sentence in sentences:
+        print (sentence)
 
 if __name__ == '__main__':
     main()
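The `split_sentences` hunk above is work in progress: the result of `sentence.strip()` is discarded, `begin = index` restarts the next slice on the terminator itself, and the function ends at a `pdb` breakpoint without returning `sentences`. The following is a minimal sketch of how the loop might be completed, not the author's eventual implementation. The `indices` helper is not shown in this changeset, so the version here is a hypothetical stand-in that assumes it yields `(position, character)` pairs for each terminator; keeping trailing text that has no terminator is also an assumption.

```python
def indices(text, characters):
    """Hypothetical stand-in for the `indices` helper the diff calls.

    Assumed behavior: return (position, character) pairs for every
    character of `text` that appears in `characters`.
    """
    return [(i, c) for i, c in enumerate(text) if c in characters]


def split_sentences(text, ends='.?!'):
    """split a text into sentences"""
    text = text.strip()
    sentences = []

    begin = 0
    for index, value in indices(text, ends):
        # slice through the terminator and keep the stripped result
        sentence = text[begin:index + 1].strip()
        # resume after the terminator so it is not carried forward
        begin = index + 1
        if sentence:
            sentences.append(sentence)

    # assumption: trailing text without a terminator is kept as a sentence
    tail = text[begin:].strip()
    if tail:
        sentences.append(tail)

    return sentences


if __name__ == '__main__':
    for sentence in split_sentences("The fog lifted. Who arrived? Two men!"):
        print(sentence)
```

With this sketch, a run of terminators such as `...` produces one-character entries after the first sentence; collapsing those would need an extra rule that the changeset does not specify.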