Friday, March 22, 2013

Questions for 22 March


Smeulders et al.
Rating: 1

1) Why is it important to look at the past? The authors of this paper could have chosen to focus on the future and talk about where content-based image retrieval is going, but instead they look at what was happening in the 10 years before their publication to see which ideas worked out and which ones didn't. What value does this have for present-day researchers?

2) How does the end goal of the user affect the image retrieval tools used? Why do some methods fit some search patterns better than others? When I search a set of images (locally on my computer or online), what happens differently if I'm looking for an image of the Mona Lisa as opposed to an image of a generic tree? Since Smeulders et al. wrote this paper, has the field gotten better at using the right tools for the job?

3) The authors raise the question of how to evaluate system performance in content-based image retrieval but focus mainly on the problems associated with it without discussing possible solutions to those problems. How can some of these challenges be met? What has been tried – successfully or unsuccessfully – since the paper was published?

Saracevic, Tefko
Rating: 2

1) Saracevic claims on page 146 that we need no definition of relevance because its meaning is known to everyone; he compares it to information in this respect, as though information needed no definition either. But in the same paragraph he asserts that relevance “affects the whole process of communication, overtly or subtly.” He calls it a “fundamental aspect of human communication.” If relevance is so important, how can we get by without defining it? Wouldn't a definition help us understand its meaning better than intuition alone? That understanding could, in turn, lead to improved communication.

2) How does this paper relate to the Smeulders paper we read? What is the role of relevance in content-based image retrieval? In the 25 years between the two papers did the concept of relevance evolve?

3) There are many different views on relevance described in the paper. What is the “so what?” of these views? How does this philosophical discussion of relevance affect the practical side of information science? As an example, would a follower of the ‘deductive inference view’ build a different retrieval system than a follower of the ‘pragmatic view’? What differences would there be, and how would the differing schools of thought produce them?

Croft, Metzler, and Strohman
Rating: 3

1) The definition of Information Retrieval that the authors borrow from Gerard Salton is very broad. The benefit of this is that the definition is still applicable today even though it was penned 45 years ago. What is the downside of using a dated definition? Are they missing out on any potential benefits that an updated definition might bring? If they wanted to update the definition, could they? Since IR is broad enough to encompass many fields, is it possible to update the definition without driving a wedge between certain aspects of IR?

2) How does time affect IR? As time passes, objects can change, especially digital objects: a new version of software is released, a website's content is updated, Wikipedia is edited, and so on. If the information I'm seeking is something that existed on a certain day or at a certain time, how does this affect IR?

3) How do concepts like ‘relevance’ and/or ‘evaluation’ transfer from one branch of IR to another? How are these concepts different for, say, a search engine designer and a cartographer (who, by Salton's definition, is in an IR career)? Is there a difference?

Friday, March 15, 2013

Questions for 3-15


JISC Digital Media
Rating: 4

1) What is the risk of limiting your collection to one specific metadata schema? Is there a way to ensure that metadata falling outside your schema is not lost? What are the benefits of having a schema, and how do they outweigh the risks? Do they outweigh the risks at all?

2) How does the JISC definition of metadata compare to Gilliland's definition? What differences and/or similarities are there? Could the differences arise because JISC is specifically speaking to “collections of sharable digital collections [of] still image, moving image or audio collections,” are they due to differences in the authors' opinions, or are they due to some other factor?

3) According to the website, metadata can be kept in the digital file itself, in a database, in an XML document, or in all of the above. What are the benefits of keeping metadata in each of these locations? If each is beneficial, doesn't it make sense to use all three in every case? Why would some organizations choose not to do this?
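As a rough illustration of the trade-offs (this sketch is mine, not from the JISC page, and the record fields are hypothetical), the snippet below stores the same descriptive record in two of the three places mentioned using only Python's standard library. Embedding metadata inside the digital file itself (EXIF in images, ID3 in audio) is format-specific and usually needs a dedicated library, so it is only noted in a comment.

```python
# A hedged sketch, not from the JISC reading: one hypothetical
# descriptive record kept two of the three ways the site mentions.
import sqlite3
import xml.etree.ElementTree as ET

record = {"title": "Pond at dusk", "creator": "Unknown",
          "format": "image/tiff", "date": "2013-03-15"}

# Option 1: an XML sidecar document. Human-readable and easy to share,
# but it can become separated from the file it describes.
root = ET.Element("metadata")
for field, value in record.items():
    ET.SubElement(root, field).text = value
ET.ElementTree(root).write("pond.tif.xml", encoding="utf-8")

# Option 2: a database row. Fast to query across a whole collection,
# but the metadata lives entirely apart from the object.
conn = sqlite3.connect("collection.db")
conn.execute("CREATE TABLE IF NOT EXISTS items (title, creator, format, date)")
conn.execute("INSERT INTO items VALUES (:title, :creator, :format, :date)",
             record)
conn.commit()

# Option 3, embedding in the file itself (e.g. EXIF in a TIFF), is
# format-specific and typically requires a third-party library.
```

Keeping all three copies in sync is exactly the maintenance cost that might lead an organization to pick only one or two of these locations.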

Gilliland
Rating: 3

1) Metadata is currently a blend of manual and automatic processes. What would you estimate the ratio of manual to automated processes to be? How do you think this will change as technology progresses? What are the risks in automating the collection and documentation of metadata, and what benefits outweigh those risks? Regardless of what will happen, what should happen? Should we automate more processes, or should we reintroduce a stronger human element?

2) How is metadata different today than it was 100 years ago? 10 years ago? Is the rate of change slow enough that we will be able to keep up with it?

3) Gilliland says that all information objects have three features: content, context, and structure. How do these features relate to the five attributes of metadata laid out later in the article? Does each attribute detail information about a single feature? Or do all the attributes exist separately for each feature? Should we adopt either the three features or the five attributes on their own, or do they work together to paint a complete picture of the information object?

Nunberg
Rating: 5

1) Here again we see the need for a human element in computer processes. We saw it in the WordNet readings and the Church and Hanks reading from two weeks ago. We saw it in the data mining reading from the week before that. With the advances in technology we've experienced in our lifetimes, why do we still have this dependence on human input? Will computers ever be able to overcome the need for the human element? Should we be worried if that were ever to happen?

2) What kind of metadata is Nunberg talking about here? Are the things he mentions administrative? Descriptive? Technical? Something else? Is there a pattern here where some types of metadata are being bungled and others are not?

3) Why didn’t anyone tell the world that Sigmund Freud and his translator invented the internet back in 1939? Wouldn’t this have brought technology forward much more quickly? How would this have affected world history over the past 70 years?

Wiggins
Rating: 4

1) When they found that the information on the various schools' websites was not up to date, did the authors attempt to contact the schools in question to get more reliable information? If they were already on the schools' websites, the phone number would probably have been right there; it seems like a simple step that would have improved the results of the study. If they did not take it, could there be a good reason why?

2) According to Table 2, UT Austin does not fit the stereotypical iSchool makeup of the members of the iSchool Caucus. We have a much lower share of Computing faculty and the 3rd highest percentage of both Humanities and Communication faculty. What does that say about the degree programs offered here in contrast with those offered at other schools? Why do you think UT has chosen to be different in the composition of its faculty? Was this a good choice?

3) Table 2 is based not on the number of faculty from each field but on the percentage of faculty. How does this affect the comparisons we draw between the schools? As an example, U Illinois has 20% of its faculty coming from the Humanities and UT has 18%; the actual numbers are 6 faculty at U Illinois and 4 at UT, a 33% decrease represented by only two percentage points. More drastic is UCLA in the same category: 10% of its faculty come from the Humanities, which turns out to be 7 faculty, a lower percentage than UT but more actual people. All of these numbers are correct, but the way they are presented can be misleading. Is there a better way to organize this information?
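To make the mismatch concrete, here is a small sketch using the numbers quoted above; the school totals are back-calculated from the stated counts and percentages rather than taken from Wiggins's Table 2 directly, so treat them as approximate.

```python
# Humanities faculty counts from the question above; totals are
# back-calculated from the stated percentages, so approximate.
faculty = {
    "U Illinois": (6, 30),  # 6/30 = 20%
    "UT Austin":  (4, 22),  # 4/22 is roughly 18%
    "UCLA":       (7, 70),  # 7/70 = 10%
}
for school, (humanities, total) in faculty.items():
    share = humanities / total
    print(f"{school}: {humanities} Humanities faculty ({share:.0%} of {total})")
# UCLA has the lowest share but the highest head count: a table sorted
# by percentage and one sorted by raw counts would rank these schools
# differently, which is the ambiguity the question points to.
```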

Friday, March 1, 2013

Questions for March 1


WordNet Readings
Rating: 2

1) Miller talks about how dictionaries leave out a lot of important information because it is assumed that human users already know some important, basic information. For WordNet this information must be explicitly stated, because computers only know what we tell them. He talks about adding additional information about the hypernym, information about coordinate terms, etc. He even mentions in passing that traditional dictionaries are distinct from encyclopedias, at least in part, because of this missing information. My question is: if WordNet is trying to fill in these information gaps and link everything together so that related words have connections between them, how is this project different from Wikipedia? It sounds like WordNet is trying to be more like an encyclopedia than a dictionary, and since it cross-references its entries it sounds like almost the exact same project.
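To see the structure Miller describes in the concrete, the sketch below uses NLTK's WordNet interface (not part of the reading; it assumes nltk is installed and its wordnet data downloaded). Each entry is a synset with a short dictionary-style gloss plus explicit typed pointers, rather than an encyclopedia-style article:

```python
# A small sketch of the links Miller describes, via NLTK's WordNet
# interface (assumes: pip install nltk, then nltk.download("wordnet")).
from nltk.corpus import wordnet as wn

dog = wn.synsets("dog")[0]       # first (canine) sense of "dog"
print(dog.definition())          # the gloss: the dictionary-like part

parent = dog.hypernyms()[0]      # the explicit "is-a" pointer
print(parent.lemma_names())      # e.g. ['canine', 'canid']

# Coordinate terms: the other hyponyms that share dog's hypernym
for sibling in parent.hyponyms():
    print(sibling.lemma_names()[0])  # wolf, fox, jackal, ...
```

The typed pointers carry relations between words but no free-form explanatory prose, which is one possible answer to how WordNet still differs from Wikipedia.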

2) If asked to re-create a project as massive as WordNet, where would you begin? What would your process be? Seeing how detailed each part of speech becomes, it seems an overwhelmingly gargantuan task. And because language is constantly changing, the larger question becomes: can we keep up? Is WordNet constantly becoming more and more up-to-date, or is it simply falling further and further behind as language evolves?

3) If concepts like ‘gradation’ and ‘markedness’ are not used in WordNet, why do Fellbaum and the others include them in the reading? Do these concepts help the reader understand the ideas that WordNet does include? Is it because they wish to be transparent about their known weaknesses? If you were the author of this article would you include these sections?

HP Grice
Rating: 3

1) Does knowing whether or not you are failing to fulfill a particular maxim matter? For example, the second maxim under ‘Quality’ is “Do not say that for which you lack adequate evidence.” If a speaker believes they have adequate evidence to support their point but is unaware that the evidence they are relying on is faulty, how would this be classified? As a VIOLATION? Can a conversational implicature arise from such a situation?

2) How does Grice’s discussion of divergences in logical formal devices at the beginning set the stage for the rest of the article? In other words, why did Grice bother to talk about them? Could he have just started off with the section on Implicature?

3) Near the end of the article Grice asserts “The implicature is not carried by what is said, but only by the saying of what is said.” Does this hold true for any other type of linguistic form? If so, are there commonalities between implicature and the other forms communicated by the saying, not the said?

Church and Hanks
Rating: 2

1) What is the role of Church's association ratio within WordNet? Would this be a benefit to the project, since it helps show how some words are connected? Or would it simply overcomplicate things? Is the relevance of the association ratio different for the various relations observed (fixed, compound, semantic, and lexical)? For example, would WordNet benefit from knowing that ‘bread’ is related to ‘butter’ or ‘United’ is related to ‘States’ but not benefit from knowing ‘refraining’ is related to ‘from’?
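For reference, Church and Hanks define the association ratio as log2(P(x,y) / (P(x)P(y))), a pointwise mutual-information score. The sketch below estimates it from a toy corpus; the paper counts co-occurrences within a window of about five words, while this simplified version uses adjacent bigrams only, so the numbers are illustrative rather than faithful to the paper's setup.

```python
# Church and Hanks's association ratio: log2( P(x,y) / (P(x) P(y)) ).
# Simplification: the paper uses a ~5-word window; this toy version
# counts only adjacent bigrams, so the values are illustrative.
import math
from collections import Counter

corpus = ("we spread butter on bread and more bread with more butter "
          "while refraining from spreading anything else").split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
total = len(corpus)

def association_ratio(x, y):
    p_xy = bigrams[(x, y)] / (total - 1)  # joint probability (bigrams)
    p_x = unigrams[x] / total             # marginal probabilities
    p_y = unigrams[y] / total
    return math.log2(p_xy / (p_x * p_y)) if p_xy else float("-inf")

# Pairs that co-occur more often than chance predicts score high:
print(association_ratio("refraining", "from"))
```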

2) Besides the number of words and a brief mention that one corpus is mostly British English and another American journalese, we don't get much clarification about what is in the corpora used in the study. How would the content affect our perception of the results? For example, if one corpus were all dime-store romance novels and another were filled with various encyclopedias, would that change our view of the results? If so, how?

3) As technology continues to improve, computers become more and more intelligent and are able to perform more tasks that were previously thought to require human effort. The association ratio is simply a tool for lexicographers; it does not replace them. But in the future, how could this technology improve to minimize the work of the lexicographer and maximize the effort of the computer?