
Python Assignment 4
KWIC! More Python!
Update: Corrected the second example output on 3/30.
(If you already had it working to match the old ordering, I'll be
happy with that. And it was probably more work.)
A Key Word In Context (KWIC) index is a simple index for a list of
lines or titles. This assignment involves creating a KWIC index
for an input list of titles. Here's a small example. For the input:
Casablanca
The Maltese Falcon
The Big Sleep
your program should produce the output:
    3                              The Big Sleep
    1                                  Casablanca
    2                      The Maltese Falcon
    2                              The Maltese Falcon
    3                          The Big Sleep
As you can see, each title is listed for each word (omitting some
minor words). The titles are arranged so that the word being indexed is
shown in a column on the page. The line number (starting from 1) is
shown on the left.
Your Python solution should obey the following rules:
- The input is just a series of titles, one per line.
Any leading or trailing spaces should be removed. Internal
spaces should be retained.
- A word is a maximal sequence of non-blank characters.
- The output line is at most 79 characters wide.
- The number is 5 characters wide, right-justified.
- There is a space after the number.
- The key word starts at position 40
(numbering from 1).
- If the part of the title left of the key word is longer than 33
characters, trim it (on the left) to 33.
- If the key word together with the part of the title to its right
is longer than 40 characters, trim it (on the right) to 40.
- Each title appears in the output once for each word that isn't
“minor.” Any word of length two or less is minor, as are the words
“and” and “the”, regardless of case.
- If a title has a repeated word, it should be listed for
each repetition, in left-to-right order.
- Sorting should be case-insensitive.
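The layout rules above can be sketched as a pair of small helpers. This is only one possible reading of the rules, not a reference solution: the names is_minor and format_line and the constants are made up for illustration, and the sketch assumes the caller passes the text to the left of the key word with its separating blank still attached.

```python
LEFT_WIDTH = 33    # columns 7-39 hold the text left of the key word
RIGHT_WIDTH = 40   # columns 40-79 hold the key word and what follows it
MINOR_WORDS = {"and", "the"}


def is_minor(word):
    """Words of length two or less, plus 'and'/'the' in any case, are minor."""
    return len(word) <= 2 or word.lower() in MINOR_WORDS


def format_line(number, left, right):
    """Lay out one index line.

    `left` is the part of the title before the key word (including the
    blank that separates them); `right` starts with the key word.
    """
    left = left[-LEFT_WIDTH:]     # keep at most the last 33 characters
    right = right[:RIGHT_WIDTH]   # keep at most the first 40 characters
    # 5-wide right-justified number, one space, then the left part
    # right-justified so that the key word starts in column 40.
    return f"{number:5d} {left:>{LEFT_WIDTH}}{right}"


line = format_line(3, "The ", "Big Sleep")
print(line)
print(line.index("Big") + 1)   # column of the key word, counting from 1: 40
```

Because the number field, the separating space, and the left field add up to 5 + 1 + 33 = 39 columns, the key word lands in column 40 and the whole line cannot exceed 79 characters.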
Here's a longer example. For the input:
It's a Mad, Mad, Mad, Mad World
The Most Amazingly Long Title of Anything You Ever Saw In Your Whole Life
See Far Away
Produce the output:
    2                         The Most Amazingly Long Title of Anything You Eve
    2 The Most Amazingly Long Title of Anything You Ever Saw In Your Whole Life
    3                          See Far Away
    2 ingly Long Title of Anything You Ever Saw In Your Whole Life
    3                              See Far Away
    1                                  It's a Mad, Mad, Mad, Mad World
    2 thing You Ever Saw In Your Whole Life
    2               The Most Amazingly Long Title of Anything You Ever Saw In Y
    1            It's a Mad, Mad, Mad, Mad World
    1                           It's a Mad, Mad, Mad, Mad World
    1                      It's a Mad, Mad, Mad, Mad World
    1                 It's a Mad, Mad, Mad, Mad World
    2                              The Most Amazingly Long Title of Anything Yo
    2  Long Title of Anything You Ever Saw In Your Whole Life
    3                                  See Far Away
    2          The Most Amazingly Long Title of Anything You Ever Saw In Your W
    2 of Anything You Ever Saw In Your Whole Life
    1        It's a Mad, Mad, Mad, Mad World
    2 Amazingly Long Title of Anything You Ever Saw In Your Whole Life
    2 itle of Anything You Ever Saw In Your Whole Life
Some hints:
- Read the input and create a list of items. For each title, add
items to the list for each major word.
- Each item could be a tuple containing the title, the word to
sort on, and whatever else is needed to produce a line of output.
- Use the built-in sort function to sort the list.
- Go through the list to generate the lines of the index.
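The hints above can be put together roughly as follows. This is a minimal sketch under stated assumptions, not the expected solution: the function name kwic_index is hypothetical, words are found with the regular expression \S+ (treating any whitespace as a blank), and ties between equal key words are broken by line number and then by position in the title, which the rules only require for repetitions within one title.

```python
import re


def kwic_index(lines):
    """Build the formatted KWIC lines for an iterable of title strings."""
    items = []
    for number, raw in enumerate(lines, start=1):
        title = raw.strip()                        # drop leading/trailing blanks
        for match in re.finditer(r"\S+", title):   # each maximal non-blank run
            word, start = match.group(), match.start()
            if len(word) <= 2 or word.lower() in ("and", "the"):
                continue                           # minor word: not indexed
            # Sort key first, then what is needed to rebuild the line.
            # Ties fall back to line number and position (an assumption).
            items.append((word.lower(), number, start, title))

    items.sort()                                   # built-in list sort

    output = []
    for _key, number, start, title in items:
        left = title[:start][-33:]                 # at most 33 chars, cut on the left
        right = title[start:][:40]                 # at most 40 chars, cut on the right
        output.append(f"{number:5d} {left:>33}{right}")
    return output


for line in kwic_index(["Casablanca", "The Maltese Falcon", "The Big Sleep"]):
    print(line)
```

Putting the lowercased word first in each tuple is what makes the plain sort case-insensitive. To index real input, the same function could be handed sys.stdin instead of the literal list.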
When your program works, submit it here.