Saturday, September 19, 2015

Messing around with book numbers

I picked up a book published in 2006, The 101 Most Influential People Who Never Lived by Allan Lazar, Dan Karlan, and Jeremy Salter. The premise is that some fictional characters have been as influential on opinions and actions in the real world as real people. The authors have created a subjective list of 101 characters and then written a short essay on each of them. The characters were selected based on both the breadth of the population a character affected and the depth of that effect. The combination is intended to screen out popular but inconsequential characters.

Clever but fluffy. Still, it looked intriguing. I bought it and it has been sitting among the stacks for several years now. In cleaning up my library in the vain hope of making room for more books, or at least being able to move around without causing booklanches, 101 surfaced.

Still haven't read it but I did skim. Here is their original list. I added Brer Rabbit and have also included the twenty or so also-rans from the appendix that did not make the final cut.
Person, Rank
The Marlboro Man 1
Big Brother 2
King Arthur 3
Santa Claus 4
Hamlet 5
Dr. Frankenstein's Monster 6
Siegfried 7
Sherlock Holmes 8
Romeo and Juliet 9
Dr. Jekyll and Mr. Hyde 10
Uncle Tom 11
Robin Hood 12
Jim Crow 13
Oedipus 14
Lady Chatterley 15
Ebenezer Scrooge 16
Don Quixote 17
Mickey Mouse 18
The American Cowboy 19
Prince Charming 20
Smokey the Bear 21
Robinson Crusoe 22
Apollo 23
Dionysus 23
Apollo and Dionysus 23
Odysseus 24
Nora Helmer 25
Cinderella 26
Shylock 27
Rosie the Riveter 28
Midas 29
Hester Prynne 30
The Little Engine That Could 31
Archie Bunker 32
Dracula 33
Alice in Wonderland 34
Citizen Kane 35
Faust 36
Figaro 37
Godzilla 38
Mary Richards 39
Don Juan 40
Bambi 41
William Tell 42
Barbie 43
Buffy the Vampire Slayer 44
Venus 45
Cupid 45
Venus and Cupid 45
Prometheus 46
Pandora 47
G.I. Joe 48
Tarzan 49
Captain Kirk 50
Mr. Spock 50
Captain Kirk and Mr. Spock 50
Atticus Finch 51
Hansel and Gretel 52
Captain Ahab 53
Elmer Gantry 54
The Ugly Duckling 55
Loch Ness Monster 56
Saint Valentine 58
Helen of Troy 59
Batman 60
Uncle Sam 61
Nancy Drew 62
J.R. Ewing 63
Superman 64
Huckleberry Finn 65
Tom Sawyer 65
Tom Sawyer and Huckleberry Finn 65
HAL 9000 66
Kermit the Frog 67
Sam Spade 68
The Pied Piper 69
Peter Pan 70
Hiawatha 71
Othello 72
The Little Tramp 73
King Kong 74
Dr. Strangelove 75
Hercules 76
Dick Tracy 77
Joe Camel 78
The Cat in the Hat 79
Icarus 80
Mammy 81
Sinbad the Sailor 82
Amos n' Andy 83
Buck Rogers 84
Luke Skywalker 85
Perry Mason 86
Bond, James Bond 87
Pygmalion 88
Madame Butterfly 89
Hans Beckert 90
Dorothy Gale 91
The Wandering Jew 92
The Great Gatsby 93
Buck 94
Willy Loman 95
Betty Boop 96
Ivanhoe 97
Norman Bates 98
Lilith 99
John Doe 100
Paul Bunyan 101
Brer Rabbit Added
Lancelot Also Rans
Medea Also Rans
Beowulf Also Rans
Gulliver Also Rans
Lolita Also Rans
Pinocchio Also Rans
Raskolnikov Also Rans
Golem Also Rans
Mother Goose Also Rans
The Phoenix Also Rans
Uncle Remus Also Rans
Bugs Bunny Also Rans
Winnie-the-Pooh Also Rans
Dirty Harry Also Rans
Homer Simpson Also Rans
Holden Caulfield Also Rans
Walter Mitty Also Rans
Tom Joad Also Rans
Jewish American Princess Also Rans
George Milton and Lenny Small Also Rans
George Milton Individual
Lenny Small Individual
Reading their list, I could see why most were included. I question some like Willy Loman or Lilith. Why would they be on the list? And who the heck is Raskolnikov? And some of the rankings seemed off.

I got to thinking though. Is there a way to make this less subjective? Indeed there is: Google NGram Viewer, which lets you measure how often a word, name or phrase is mentioned across a large sample of books published up to 2008. There were a few challenges. For example, Siegfried is not just a literary character but also a common German name. Mammy is a not uncommon slang name as well. Buck is a guy's name in addition to being a verb, a noun for a male deer, and the name of the hero dog in Jack London's The Call of the Wild. The American Cowboy seems a stretch. Venus is not just a Roman goddess but also a planet. Still, not a bad rough and ready indicator of relevance.

I did a search on all the names and then ordered them based on frequency of textual reference. The first number is the original ranking and the second number is the ranking based on the corpus search.
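For anyone who wants to repeat the ordering step, here is a minimal sketch in Python. It assumes you have already copied a frequency for each name out of the NGram Viewer by hand; the numbers in `sample_freqs` are made-up placeholders, not real NGram values, and the function name `ordinal_ranks` is my own invention for illustration.

```python
def ordinal_ranks(freqs):
    """Map each name to its rank, where rank 1 is the highest frequency."""
    # Sort names from most to least frequent; ties keep their input order.
    ordered = sorted(freqs, key=lambda name: freqs[name], reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


# Placeholder frequencies (hypothetical, for illustration only).
sample_freqs = {
    "Hamlet": 6.0e-6,
    "Betty Boop": 2.0e-8,
    "Venus": 9.0e-6,
    "Tarzan": 1.0e-6,
}

# Original ranks as given in the book's list above.
original_rank = {"Venus": 45, "Hamlet": 5, "Tarzan": 49, "Betty Boop": 96}

ngram_rank = ordinal_ranks(sample_freqs)

# Print rows in the same shape as the table: name, original rank, corpus rank.
for name in sorted(ngram_rank, key=ngram_rank.get):
    print(f"{name}, {original_rank[name]}, {ngram_rank[name]}")
```

Swapping in the real frequencies for all the names should give an ordering like the one in the table, though the result depends on the date range and corpus selected in the viewer.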

Person, Rank, Ordinal Rank
Venus 45 1
Apollo 23 2
Hamlet 5 3
Buck 94 4
Hercules 76 5
Oedipus 14 6
Faust 36 7
Odysseus 24 8
Othello 72 9
Prometheus 46 10
Dr. Frankenstein's Monster 6 11
Siegfried 7 12
Don Quixote 17 13
Tarzan 49 14
Cinderella 26 15
Lancelot Also Rans 16
Jim Crow 13 17
Superman 64 18
Barbie 43 19
Don Juan 40 20
Dracula 33 21
Cupid 45 22
Pandora 47 23
Dionysus 23 24
Medea Also Rans 25
Beowulf Also Rans 26
King Arthur 3 27
Uncle Tom 11 28
Batman 60 29
Santa Claus 4 30
Uncle Sam 61 31
Romeo and Juliet 9 32
Robin Hood 12 33
Sherlock Holmes 8 34
Gulliver Also Rans 35
Shylock 27 36
Lilith 99 37
Figaro 37 38
Mammy 81 39
Robinson Crusoe 22 40
Sam Spade 68 41
Big Brother 2 42
Icarus 80 43
Bond, James Bond 87 44
Ebenezer Scrooge 16 45
Lolita Also Rans 46
Midas 29 47
Mickey Mouse 18 48
Peter Pan 70 49
Pygmalion 88 50
Huckleberry Finn 65 51
Hiawatha 71 52
Pinocchio Also Rans 53
Ivanhoe 97 54
John Doe 100 55
Raskolnikov Also Rans 56
Tom Sawyer 65 57
Bambi 41 58
Alice in Wonderland 34 59
Godzilla 38 60
King Kong 74 61
Golem Also Rans 62
Mother Goose Also Rans 63
Citizen Kane 35 64
Prince Charming 20 65
The Great Gatsby 93 66
Nancy Drew 62 67
The Phoenix Also Rans 68
Buffy the Vampire Slayer 44 69
Hansel and Gretel 52 70
William Tell 42 71
Lady Chatterley 15 72
Helen of Troy 59 73
Hester Prynne 30 74
Uncle Remus Also Rans 75
Dr. Jekyll and Mr. Hyde 10 76
Bugs Bunny Also Rans 77
Paul Bunyan 101 78
Brer Rabbit Added 79
Winnie-the-Pooh Also Rans 80
Dirty Harry Also Rans 81
Dick Tracy 77 82
Dr. Strangelove 75 83
Perry Mason 86 84
Madame Butterfly 89 85
Rosie the Riveter 28 86
Captain Ahab 53 87
Willy Loman 95 88
Luke Skywalker 85 89
The Ugly Duckling 55 90
Captain Kirk 50 91
The Cat in the Hat 79 92
Buck Rogers 84 93
Amos n' Andy 83 94
Homer Simpson Also Rans 95
Holden Caulfield Also Rans 96
G.I. Joe 48 97
Mr. Spock 50 98
Archie Bunker 32 99
Loch Ness Monster 56 100
Betty Boop 96 101
The Pied Piper 69 102
The Marlboro Man 1 103
Norman Bates 98 104
Atticus Finch 51 105
Walter Mitty Also Rans 106
Elmer Gantry 54 107
Venus and Cupid 45 108
Joe Camel 78 109
The Wandering Jew 92 110
Sinbad the Sailor 82 111
Tom Joad Also Rans 112
Saint Valentine 58 113
Smokey the Bear 21 114
Kermit the Frog 67 115
The Little Engine That Could 31 116
Mary Richards 39 117
Apollo and Dionysus 23 118
Tom Sawyer and Huckleberry Finn 65 119
Dorothy Gale 91 120
HAL 9000 66 121
Jewish American Princess Also Rans 122
Nora Helmer 25 123
J.R. Ewing 63 124
The Little Tramp 73 125
George Milton Individual 126
The American Cowboy 19 127
Hans Beckert 90 128
Captain Kirk and Mr. Spock 50 129
George Milton and Lenny Small Also Rans 130
Lenny Small Individual 131
So in the first list, Lazar, Karlan, and Salter thought that Venus was probably 45 out of the 101 in importance but in the NGram Viewer, she is number one.

It then occurred to me that searching the corpus from 1800 to 2008 was probably unduly weighting the classical names over such contemporary stalwarts as Mickey Mouse and Buffy. I did a search restricting the corpus to 1950-2008, expecting that it might substantially change the ordering of the names. I took a sample of a dozen classic names as well as a dozen more contemporary names and, to my surprise, all of them remained in very close order to the original NGram list. Venus is still #1.

For those concerned about the undermining of Western Civilization, there is some solace to be taken from this exercise. All the classic characters are there: Lancelot, King Arthur, Sherlock Holmes, Hercules, Robin Hood, Uncle Sam, Dracula, Superman, Tarzan, Hamlet, etc. The reading foundations of Western Civilization look pretty immutable on this count.

For those more interested in the Social Justice Warrior aspects, there probably is some concern. Only about 20% of the characters are female and their rankings are on the low side of things. Only seven of the 22 score above average in terms of how often they are discussed in books. And some of them are questionable as role models. I am thinking of Cinderella, Barbie, Mammy, Lolita, Lady Chatterley, Hester Prynne, Betty Boop and Jewish American Princess (revealing of when the authors came of age). Anyway, the 22 characters and their ranks are here:

Person, Ordinal Rank
Venus 1
Cinderella 15
Barbie 19
Pandora 23
Medea 25
Lilith 37
Mammy 39
Lolita 46
Alice in Wonderland 59
Mother Goose 63
Nancy Drew 67
Buffy the Vampire Slayer 69
Lady Chatterley 72
Helen of Troy 73
Hester Prynne 74
Madame Butterfly 85
Rosie the Riveter 86
Betty Boop 101
Mary Richards 117
Dorothy Gale 120
Jewish American Princess 122
Nora Helmer 123
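The arithmetic behind these figures can be checked with a few lines of Python. This is a rough sketch under two assumptions of mine: the denominator is all 131 rows of the NGram table (pairs and split individuals included), and "doing well" means landing in the top half of the ranks. The second is a different yardstick from average frequency of mention, so the count it produces need not match the seven quoted above.

```python
# Ordinal ranks of the 22 female characters, copied from the table above.
female_ranks = [
    1, 15, 19, 23, 25, 37, 39, 46, 59, 63, 67,
    69, 72, 73, 74, 85, 86, 101, 117, 120, 122, 123,
]

# Assumption: every row of the NGram table counts toward the total.
total_entries = 131

# Share of the table made up of female characters, as a percentage.
share_pct = round(100 * len(female_ranks) / total_entries, 1)

# Assumption: "top half" means a rank no greater than half the table size.
cutoff = total_entries // 2
top_half = [rank for rank in female_ranks if rank <= cutoff]

print(f"Female share of the table: {share_pct}%")
print(f"Female characters ranked {cutoff} or better: {len(top_half)}")
```

Frequencies of mention tend to be heavily skewed toward a few top names, so fewer characters clear the mean frequency than clear the middle rank. Counting against the mean would need the underlying frequencies, which are not listed here.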


