Chapter 1 Some general introspection

Three types of results from the Google Vision API:

  • GV_label broad sets of objects (with probabilities)
  • GV_entity uses Google Image Search to find topical entities (with probabilities)
  • GV_ocr for Optical Character Recognition of text

Each GV_entity and GV_label result carries a score that measures how confident Google is in that result.

One type of result from the Clarifai Predict API:

  • CP_label “concepts with corresponding probabilities of how likely it is these concepts are contained within the image”

1.1 Frequency of all labels

1.2 Frequency of unique labels

1.3 Distribution of scores

Range of scores (minimum, maximum):

  • GV_entity: 0, 1.84
  • GV_label: 0.5, 0.99
  • CP_label: 0.6023139, 0.9997798

GV_entity scores exceed 1, so they are not probabilities in the strict sense.
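The ranges above are a per-type minimum and maximum. A minimal sketch of that computation follows, assuming the results are held in a long format with one row per image and label. The row layout, the toy values and the function name `score_ranges` are illustrative assumptions, not the original analysis code.

```python
from collections import defaultdict

# Hypothetical rows: (image_id, type, value, score). Toy values, not the real data.
rows = [
    ("img1", "GV_label", "soil", 0.91),
    ("img1", "GV_entity", "Archaeology", 1.20),
    ("img1", "CP_label", "rock", 0.97),
    ("img2", "GV_label", "sand", 0.55),
    ("img2", "GV_entity", "Soil", 0.30),
    ("img2", "CP_label", "no person", 0.99),
]


def score_ranges(rows):
    """Return {type: (min_score, max_score)} over all rows of each type."""
    scores = defaultdict(list)
    for _image, kind, _value, score in rows:
        scores[kind].append(score)
    return {kind: (min(s), max(s)) for kind, s in scores.items()}


ranges = score_ranges(rows)
for kind in sorted(ranges):
    print(kind, ranges[kind])
```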

1.4 Top ranking labels (on average)

Number of labels with mean score of .8 or higher:

type n
CP_label 607
GV_entity 26
GV_label 62
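A count like the one above can be obtained by averaging the scores of each label within its type and then counting the labels whose mean reaches the cut-off. The sketch below assumes a long-format list of rows. The data and the helper name `count_high_mean_labels` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (image_id, type, value, score). Toy values, not the real data.
rows = [
    ("img1", "CP_label", "rock", 0.90),
    ("img2", "CP_label", "rock", 0.85),
    ("img1", "CP_label", "old", 0.70),
    ("img1", "GV_label", "soil", 0.95),
    ("img2", "GV_label", "soil", 0.85),
    ("img2", "GV_label", "sand", 0.60),
]


def count_high_mean_labels(rows, threshold=0.8):
    """Count, per type, the distinct labels whose mean score is >= threshold."""
    by_label = defaultdict(list)
    for _image, kind, value, score in rows:
        by_label[(kind, value)].append(score)
    counts = defaultdict(int)
    for (kind, _value), scores in by_label.items():
        if mean(scores) >= threshold:
            counts[kind] += 1
    return dict(counts)


high = count_high_mean_labels(rows)
print(high)
```

Because the mean is taken per label, a label seen only once with a high score counts just as much as a frequent one.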

1.5 Score distribution for those top ranking labels

NOTE: Some labels occur only rarely but receive high confidence scores. The number of occurrences (N) is therefore added to the plots.

For Clarifai Predictions [CP], only top labels with a mean score of .95 or higher are shown (N = 31).

1.6 Most used labels (more than 50 times) and their mean scores

type value N mean_score
CP_label no person 670 0.9797910
CP_label rock 442 0.9215385
CP_label stone 439 0.9178132
CP_label dirty 417 0.9257794
CP_label travel 415 0.9055904
CP_label architecture 357 0.9202801
CP_label outdoors 343 0.8917493
CP_label wall 304 0.9121711
CP_label building 290 0.9130690
CP_label sand 278 0.9182014
CP_label old 276 0.8867391
CP_label one 263 0.9039924
CP_label people 261 0.9231801
CP_label industry 251 0.8865339
CP_label soil 249 0.8951406
CP_label ancient 249 0.8971888
CP_label daylight 240 0.8864167
CP_label desktop 233 0.8881116
CP_label geology 218 0.9077523
CP_label landscape 218 0.8983028
CP_label cement 213 0.9102817
CP_label concrete 204 0.9042157
CP_label environment 204 0.8819608
CP_label calamity 202 0.9009901
CP_label indoors 198 0.8920707
CP_label desert 192 0.9014583
CP_label nature 183 0.8873224
CP_label expression 175 0.9064000
CP_label texture 168 0.9040476
CP_label cave 165 0.9269697
CP_label subway system 160 0.9223125
CP_label mine 158 0.8894937
CP_label empty 154 0.8824675
CP_label rough 151 0.8914570
CP_label retro 145 0.8795862
CP_label water 139 0.8866187
CP_label adult 135 0.9471852
CP_label abandoned 133 0.8850376
CP_label wear 123 0.8799187
CP_label art 121 0.8579339
CP_label mud 116 0.8952586
CP_label man 114 0.9398246
CP_label abstract 113 0.8948673
CP_label exploration 102 0.8816667
CP_label pattern 92 0.9020652
CP_label construction 87 0.9050575
CP_label two 87 0.8889655
CP_label seashore 86 0.8874419
CP_label woman 78 0.9238462
CP_label road 77 0.8774026
CP_label limestone 75 0.9012000
CP_label paper 75 0.8938667
CP_label religion 74 0.8879730
CP_label sculpture 71 0.8832394
CP_label wood 70 0.8414286
CP_label food 69 0.8801449
CP_label group 69 0.9259420
CP_label tunnel 69 0.9182609
CP_label recreation 68 0.8886765
CP_label family 67 0.8740299
CP_label dark 65 0.8403077
CP_label child 62 0.9304839
CP_label still life 62 0.8585484
CP_label dust 61 0.8803279
CP_label home 59 0.8893220
CP_label hole 57 0.8931579
CP_label business 56 0.8680357
CP_label beach 55 0.9005455
CP_label archaeology 53 0.8833962
CP_label room 53 0.8864151
GV_entity Soil 360 0.5528611
GV_entity Archaeology 327 0.7094190
GV_entity History 284 0.4787324
GV_entity Ancient history 281 0.4510320
GV_entity Archaeological site 237 0.6183544
GV_entity Geology 144 0.6234028
GV_entity Outcrop 114 0.6414035
GV_entity Wadi 86 0.2995349
GV_entity 55 0.4000000
GV_label soil 442 0.7854977
GV_label geology 438 0.7047945
GV_label rock 367 0.7064033
GV_label sand 357 0.6711485
GV_label ancient history 285 0.7209825
GV_label archaeology 278 0.6617626
GV_label archaeological site 241 0.7300830
GV_label formation 177 0.6158757
GV_label history 161 0.6526087
GV_label outcrop 114 0.5998246
GV_label bedrock 110 0.6481818
GV_label ruins 104 0.7242308
GV_label landscape 94 0.5571277
GV_label wall 84 0.8289286
GV_label artifact 63 0.7053968
GV_label concrete 61 0.5429508
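The table above groups rows by type and label, counts them, and averages their scores, keeping only labels used more than 50 times. A minimal sketch of that aggregation is given here. The row layout, toy values and the name `most_used_labels` are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (image_id, type, value, score). Toy values, not the real data.
rows = [
    ("img1", "CP_label", "rock", 0.92),
    ("img2", "CP_label", "rock", 0.90),
    ("img3", "CP_label", "rock", 0.94),
    ("img1", "CP_label", "cave", 0.95),
    ("img1", "GV_label", "soil", 0.80),
    ("img2", "GV_label", "soil", 0.70),
]


def most_used_labels(rows, min_count=50):
    """Return (type, value, N, mean_score) for labels used more than min_count times."""
    scores = defaultdict(list)
    for _image, kind, value, score in rows:
        scores[(kind, value)].append(score)
    table = [
        (kind, value, len(s), mean(s))
        for (kind, value), s in scores.items()
        if len(s) > min_count
    ]
    # Sort by type, then by descending frequency, as in the table above.
    return sorted(table, key=lambda r: (r[0], -r[2]))


# min_count=1 only because the toy data is tiny; the report uses 50.
table = most_used_labels(rows, min_count=1)
for kind, value, n, avg in table:
    print(kind, value, n, round(avg, 7))
```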

1.7 Highest scoring labels (GV over 0.8, CP over .95) and their frequency

Table 1.1: GV Label
value n
soil 199
archaeological site 87
ancient history 86
rock 83
wall 59
sand 58
geology 43
ruins 39
archaeology 28
artifact 14
structure 11
bedrock 9
history 9
sky 9
text 9
brown 8
architecture 6
fauna 6
historic site 6
stone carving 6
field 5
light 5
tree 5
atmosphere 4
badlands 4
black 4
ecosystem 4
vehicle 4
yellow 4
black and white 3
building 3
fashion accessory 3
font 3
girl 3
man 3
mineral 3
prairie 3
water 3
aerial photography 2
car 2
cliff dwelling 2
cloud 2
construction 2
darkness 2
fault 2
floor 2
grassland 2
male 2
outcrop 2
person 2
photograph 2
plain 2
plant 2
road 2
room 2
social group 2
vacation 2
beach 1
beam 1
beauty 1
bird’s eye view 1
bottle 1
cave 1
climbing 1
community 1
construction worker 1
cumulus 1
daylighting 1
daytime 1
exhibition 1
eyewear 1
facial expression 1
finger 1
floor plan 1
flooring 1
food 1
foundation 1
freezing 1
fun 1
geological phenomenon 1
glasses 1
grass 1
grass family 1
hand 1
headgear 1
highland 1
horizon 1
human hair color 1
infrastructure 1
local food 1
mammal 1
market 1
mast 1
meteorological phenomenon 1
mode of transport 1
morning 1
mortuary temple 1
night 1
orange 1
path 1
people 1
phragmites 1
pink 1
plaid 1
plan 1
plaster 1
produce 1
property 1
rainbow 1
reflection 1
relief 1
sail 1
sailing ship 1
senior citizen 1
sitting 1
smile 1
snapshot 1
snow 1
sport climbing 1
standing 1
tall ship 1
tartan 1
transport 1
vision care 1
wadi 1
wilderness 1
wood 1
youth 1
Table 1.1: GV Entity
value n
Archaeology 155
Çatalhöyük 11
Laborer 4
Car 3
Neolithic 3
Geology 2
Outcrop 2
Art 1
Art exhibition 1
Art museum 1
Boulder 1
Brigantine 1
Brooch 1
Earring 1
Escuela de Aviación Militar Airport 1
Facial hair 1
Floor 1
Floor plan 1
Free University of Berlin 1
Glass bottle 1
Gobustan National Park 1
Konya Archaeological Museum 1
Museum of Anatolian Civilizations 1
Paper 1
Plucked string instrument 1
Poster 1
Rear-view mirror 1
Sail 1
Sport climbing 1
Street 1
Sweet Grass 1
University of Kiel 1
Vegetable 1
Table 1.1: CP Label
value n
no person 603
dirty 155
rock 138
people 126
stone 123
architecture 118
adult 82
travel 81
sand 76
building 73
wall 73
one 73
cave 66
man 61
subway system 60
geology 45
cement 44
texture 42
desktop 38
landscape 32
desert 32
concrete 31
ancient 29
expression 28
group 28
indoors 28
woman 28
soil 27
calamity 27
old 27
industry 25
outdoors 25
child 24
nature 23
food 20
rough 17
two 17
tunnel 16
sculpture 15
abstract 15
pattern 14
paper 13
daylight 13
beach 12
wear 12
mud 11
water 11
mine 10
sky 9
empty 9
hole 9
construction 8
limestone 8
recreation 8
road 8
room 8
Earth surface 8
retro 8
environment 7
seashore 7
furniture 6
abandoned 6
art 5
home 5
light 5
portrait 5
vehicle 5
exploration 5
religion 5
family 4
house 4
mammal 4
storm 4
winter 4
business 4
fabric 4
agriculture 3
archaeology 3
astronomy 3
design 3
farm 3
field 3
interior design 3
moon 3
snow 3
tree 3
window 3
wood 3
background 3
ball-shaped 3
fish 3
glass items 3
ground 3
inside 3
lid 3
modern 3
seat 3
site 3
skill 3
still life 3
sunset 3
togetherness 3
transportation system 3
brick 2
ceiling 2
facial expression 2
fun 2
hand 2
museum 2
danger 2
dark 2
dawn 2
dry 2
education 2
grow 2
many 2
marble 2
solid 2
trading floor 2
wallpaper 2
weather 2
apartment 1
bone 1
car 1
construction worker 1
exhibition 1
flora 1
garden 1
girl 1
grass 1
ice 1
illustration 1
leaf 1
leisure 1
market 1
painting 1
pasture 1
rainbow 1
roof 1
rope 1
sail 1
sailboat 1
ship 1
steel 1
table 1
technology 1
text 1
watercraft 1
action 1
aircraft 1
airplane 1
airport 1
backlit 1
balance 1
barrel 1
bathroom 1
bathtub 1
battle 1
bird 1
broken 1
cane 1
cardboard 1
carpentry 1
city 1
cold 1
color 1
competition 1
computer 1
container 1
contemporary 1
currency 1
demolition 1
desk 1
dig 1
document 1
driver 1
earthquake 1
finance 1
flood 1
frame 1
futuristic 1
geometric 1
graphic 1
group together 1
growth 1
happiness 1
hurricane 1
initiation 1
insubstantial 1
invertebrate 1
level plane 1
lifestyle 1
money 1
offense 1
office 1
parchment 1
pavement 1
perspective 1
pollution 1
prehistoric 1
reed 1
rural 1
scale 1
school 1
seafood 1
several 1
signalise 1
skull 1
stucco 1
summer 1
three 1
vector 1
vehicle window 1
veil 1
volcano 1
war 1
waste 1
wealth 1
wheat 1
worn 1
yacht 1
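The three frequency tables above count individual predictions, not label means: a row is kept when its own score exceeds the cut-off for its type, and the kept rows are then tallied per label. The sketch below shows one way this could be done. The thresholds are read from the section heading, while the data and the name `high_score_frequency` are hypothetical.

```python
from collections import Counter

# Assumed cut-offs taken from the heading: GV over 0.8, CP over 0.95.
THRESHOLDS = {"GV_label": 0.8, "GV_entity": 0.8, "CP_label": 0.95}

# Hypothetical rows: (image_id, type, value, score). Toy values, not the real data.
rows = [
    ("img1", "GV_label", "soil", 0.91),
    ("img2", "GV_label", "soil", 0.85),
    ("img3", "GV_label", "soil", 0.60),
    ("img1", "GV_label", "wall", 0.83),
    ("img1", "CP_label", "no person", 0.99),
    ("img2", "CP_label", "no person", 0.93),
]


def high_score_frequency(rows, kind, thresholds=THRESHOLDS):
    """Count, per label value, the rows of one type whose score exceeds the cut-off."""
    cutoff = thresholds[kind]
    counts = Counter(
        value
        for _image, row_kind, value, score in rows
        if row_kind == kind and score > cutoff
    )
    # most_common() orders labels from most to least frequent.
    return counts.most_common()


gv = high_score_frequency(rows, "GV_label")
cp = high_score_frequency(rows, "CP_label")
print(gv)
print(cp)
```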

1.8 Labels occurring in both sets: GV and CP

Table 1.2: GV AND CP labels
value CP_label GV_label
adventure 3 8
agriculture 9 4
apartment 3 1
arch 2 3
archaeology 53 278
architecture 357 7
art 121 4
asphalt 3 3
astronomy 8 1
atmosphere 1 6
beach 55 2
boat 1 1
bone 2 1
bottle 1 1
boulder 2 11
brick 19 1
building 290 5
cap 2 1
car 2 3
cave 165 11
ceiling 4 4
cement 213 4
ceremony 1 1
child 62 2
clay 30 1
cloud 1 6
collection 2 1
communication 1 2
concrete 204 61
construction 87 13
construction worker 7 10
crop 2 5
crystal 1 9
cuisine 1 1
design 18 8
drink 1 1
dusk 1 1
dust 61 3
energy 4 2
evening 2 1
exhibition 2 1
facial expression 9 1
family 67 1
farm 5 3
field 5 9
finger 1 4
floor 7 42
flower 1 2
food 69 3
foot 1 1
fun 8 29
furniture 25 5
garden 2 1
geology 218 438
girl 18 24
granite 9 1
graphic design 1 1
grass 6 19
grassland 2 5
gun 1 1
hand 6 6
herb 1 1
hill 1 4
home 59 2
house 45 6
human 1 3
hut 1 1
ice 8 2
illustration 12 1
interior design 7 1
label 2 1
landscape 218 94
leisure 19 3
light 42 5
limestone 75 7
line 1 13
lunch 1 1
mammal 24 1
man 114 3
market 2 1
mast 1 1
meat 1 1
monochrome 1 3
mountain 3 3
mud 116 7
museum 27 2
music 2 1
number 4 2
painting 30 1
paper 75 4
pasture 3 3
pattern 92 7
people 261 1
person 3 2
picture frame 3 1
planet 2 1
plaster 7 4
plastic 2 1
portrait 23 1
prairie 2 6
rainbow 1 1
recreation 68 41
reflection 6 1
road 77 5
rock 442 367
roof 6 9
room 53 2
rope 2 1
rust 1 1
sail 1 1
sailboat 1 1
sand 278 357
sculpture 71 15
sea 22 1
shadow 11 4
ship 1 1
sign 9 4
sky 35 28
sleep 1 1
snow 13 2
soil 249 442
space 9 5
sphere 1 1
sports equipment 2 1
square 5 2
stall 1 1
steel 15 1
storm 10 1
street 11 1
table 6 4
technology 16 1
temple 3 11
text 20 9
tile 1 1
tool 4 1
tourism 50 17
travel 415 5
tree 8 25
vacation 10 23
vegetable 1 1
vehicle 47 16
wall 304 84
water 139 7
watercraft 2 1
weapon 13 1
wind 1 1
window 8 13
winter 22 2
wire 1 2
wood 70 30
writing 12 1
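Table 1.2 can be read as the intersection of the two label vocabularies, with each predictor's usage count alongside. A minimal sketch follows, assuming exact string matching between CP_label and GV_label values (GV_entity is left out here because its values are capitalised differently). The data and the name `shared_labels` are illustrative.

```python
from collections import Counter

# Hypothetical rows: (image_id, type, value, score). Toy values, not the real data.
rows = [
    ("img1", "CP_label", "rock", 0.92),
    ("img2", "CP_label", "rock", 0.90),
    ("img1", "CP_label", "travel", 0.91),
    ("img1", "GV_label", "rock", 0.70),
    ("img1", "GV_label", "soil", 0.80),
    ("img2", "GV_entity", "Rock", 0.50),
]


def shared_labels(rows):
    """Return {value: (CP_label count, GV_label count)} for labels used by both."""
    cp = Counter(v for _i, k, v, _s in rows if k == "CP_label")
    gv = Counter(v for _i, k, v, _s in rows if k == "GV_label")
    # Only labels present in both vocabularies survive the key intersection.
    return {value: (cp[value], gv[value]) for value in sorted(cp.keys() & gv.keys())}


shared = shared_labels(rows)
print(shared)
```

The counts here are corpus-wide: a label appears in the result even if the two predictors never used it on the same image.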

1.9 Predictor “agreement”: images

The predictors agree on at least one label in 605 of the 766 images.

The breakdown by number of agreed labels is shown below.
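Image-level agreement can be sketched as a set intersection per image: collect the labels each predictor assigned to an image, intersect the two sets, and count the overlap. The code below assumes that agreement is measured between CP_label and GV_label with exact string matching, which the report does not state explicitly. The data and the name `agreement_by_image` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical rows: (image_id, type, value, score). Toy values, not the real data.
rows = [
    ("img1", "CP_label", "rock", 0.92),
    ("img1", "CP_label", "sand", 0.90),
    ("img1", "GV_label", "rock", 0.70),
    ("img1", "GV_label", "sand", 0.65),
    ("img1", "GV_label", "soil", 0.80),
    ("img2", "CP_label", "travel", 0.91),
    ("img2", "GV_label", "soil", 0.75),
    ("img3", "CP_label", "wall", 0.93),
    ("img3", "GV_label", "wall", 0.85),
]


def agreement_by_image(rows):
    """Return {image_id: number of labels that CP and GV both assigned}."""
    cp, gv = defaultdict(set), defaultdict(set)
    for image, kind, value, _score in rows:
        if kind == "CP_label":
            cp[image].add(value)
        elif kind == "GV_label":
            gv[image].add(value)
    images = set(cp) | set(gv)
    return {image: len(cp[image] & gv[image]) for image in images}


agreed = agreement_by_image(rows)
n_agreeing = sum(1 for n in agreed.values() if n >= 1)
breakdown = Counter(agreed.values())  # number of images per count of agreed labels
print(n_agreeing, "of", len(agreed), "images share at least one label")
print(sorted(breakdown.items()))
```

For the figures reported above, 605 of 766 images corresponds to roughly 79% of the collection.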

1.10 Predictor “agreement”: scores

Labels used by CP and GV to describe the same image. Distribution of scores shown for labels that were used on more than 10 images.
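Comparing scores requires pairing the two predictors' scores for the same label on the same image, then keeping only labels with enough such pairs. A minimal sketch of that pairing is shown here, again assuming exact matching between CP_label and GV_label. The data and the name `paired_scores` are illustrative, not the original code.

```python
from collections import defaultdict

# Hypothetical rows: (image_id, type, value, score). Toy values, not the real data.
rows = [
    ("img1", "CP_label", "rock", 0.92),
    ("img1", "GV_label", "rock", 0.70),
    ("img2", "CP_label", "rock", 0.90),
    ("img2", "GV_label", "rock", 0.75),
    ("img1", "CP_label", "sand", 0.88),
    ("img1", "GV_label", "sand", 0.66),
    ("img3", "CP_label", "wall", 0.93),
]


def paired_scores(rows, min_images=10):
    """Return {label: [(cp_score, gv_score), ...]} for labels shared on > min_images images."""
    cp, gv = {}, {}
    for image, kind, value, score in rows:
        if kind == "CP_label":
            cp[(image, value)] = score
        elif kind == "GV_label":
            gv[(image, value)] = score
    pairs = defaultdict(list)
    # A key present in both dicts means both predictors used that label on that image.
    for image, value in cp.keys() & gv.keys():
        pairs[value].append((cp[(image, value)], gv[(image, value)]))
    return {v: sorted(p) for v, p in pairs.items() if len(p) > min_images}


# min_images=1 only because the toy data is tiny; the report uses 10.
pairs = paired_scores(rows, min_images=1)
print(pairs)
```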
