Friday, April 19, 2013

Questions for 4-19


Few
Rating:  4

1) How do we balance ensuring that we present our information in multiple formats for higher-level consumption with the issue he raises about our brains having a limited ability to hold multiple items in our minds at the same time? Look at Fig. 35.5. Here is a visualization that represents information in multiple formats, helping the reader to better understand the underlying data. But images such as this can sometimes be intimidating; a quick glance can make the display seem complicated, even if it actually simplifies the data upon closer inspection. When it comes to data visualization, how much is too much and how much is not enough?

2) Do the principles relating to human perception that Few discusses hold true for different types of learners? Some people learn better by seeing or doing or hearing – would the principles of visualization discussed here apply equally to all these groups? Looking at his early example in figures 35.1 and 35.2, isn’t it possible that someone who works well with numbers and learns by doing would see figure 35.1 and compare the values of the numbers in their head almost automatically? Not that this would negate anything Few says. Such a person would still probably glean the same information more quickly from the graphical figure, but for such a person looking at figure 35.1, can we really say “this table fails” as Few does?

3) The node and link visualization can be an excellent tool for showing relationships between entities other than variables. The social media example used in the article is a good one. But I worry that we embrace it so readily because it looks nice. Is this really an effective method for demonstrating relationships? Looking at the figure, can you tell me Amanda’s relationship to Nick? How about Scott’s relationship to Ken? Or Jason’s relationship to Christine? All of this is on the chart, but because the chart is cluttered with so many nodes it is not readily apparent (at least, it isn’t to me). To quote Few’s introduction, “’a picture is worth a thousand words’ - often more - but only when the story is best told graphically rather than verbally and the picture is well designed.” How could this visualization be better designed to overcome some of its weaknesses?

Icke
Rating:  3

1) Does it seem strange to anyone else that Icke spends 6.5 pages talking about the system aspects of VA, and crams user aspects and human-machine collaboration aspects into less than one page total while trying to tell us that user input and system computing power need to be balanced?

2) Icke says that the correct algorithms must be chosen for the dataset with which you are working, but he doesn’t really explain how this is done. What factors are important to consider when deciding on analytics? What are the available options when it comes to analytics? There is a lot of information in the paper about the options available when it comes to visualization types, but it is quiet about the analytics themselves. Since various types of data and visualizations are discussed, wouldn’t it make sense to expect a cursory discussion of available analytical tools?

3) What would Few say about this article? Would he agree with the visualizations Icke uses? Do they incorporate aspects of human perception? Looking at figure 35.9 from Few’s article, are Icke’s visualizations in balance or out of balance when it comes to the brain functions of seeing vs. thinking?

Lam et al.
Rating:  2

1) What is the difference between the process-oriented CTV scenario and the visualization-oriented UP scenario? They both seem to be evaluating how a program conveys information and how that information is received. It just seems to me that these types of evaluations would have a lot of overlap.

2) Why are the visualization scenarios so much more common than the process ones? I understand that some of it comes from the traditions of the HCI and CG fields, but the process scenarios are certainly not unknown since they make up 15% of the evaluations examined in this study, and if they truly provide profitable information wouldn’t companies be interested in exploring those options? I also wonder whether the percentages would have changed significantly if the methodology had included papers from more than the 4 sources listed.

3) Each year the number of papers that include evaluations increases. The authors mention that, according to the review by Barkhuus and Rode, the quality of these evaluations may have remained the same. What is the impact of static evaluations on the future development of visualization resources? Are the current evaluations sufficient or should we be investigating new methods that could spark developers to reach for a higher standard?

Friday, April 12, 2013

Questions for 4-12


Mislove, et al.
Rating: 3

1) Do people behave differently in a social network that is based on sharing a particular type of media (photos in Flickr, videos on YouTube, etc.) as opposed to social networks specifically built for connecting socially (like Orkut)? If so, is this change in behavior significant? How would it manifest itself in this study?

2) The authors mention that their study will be of import to fields outside of just computer science. They mention sociology specifically. But who else benefits from this study? Since we are reading this article we can safely assume that iSchools would be interested, but what about business? Economics? Humanities? How would they find value in this study? What other fields are being overlooked?

3) The article’s methodology was a little different for Orkut than for the other sites because Orkut does not export an API for external developers. This meant that the information had to be gathered a little differently, and the authors explain that this will have skewed the results slightly for Orkut. I get that partial BFS crawls have shown sampling bias in other instances, but couldn’t at least some of the differences in the results stem from differences between Orkut and the other sites themselves? Orkut’s main purpose is social networking; the other sites are focused on blogs or photos or videos. Orkut has a user base that is largely Brazilian and Indian; the other sites are more globally balanced. Couldn’t these differences account for some of the differences in the results?

Bizer et al.
Rating: 2

1) Maybe I’ve just missed the point, but why is the RDF format so much better than using traditional HTML documents? I get that RDF gives additional information about the relationship between two things, but can’t that relationship often be understood without being explicitly stated?
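Just to make the distinction concrete for myself, here is a minimal sketch in Python (the example.org names and the triples are invented, not from Bizer et al.): an HTML link only records that two resources are somehow connected, while an RDF-style triple names the relationship, which is what lets software act on it.

```python
# A plain HTML link says only that two pages are somehow connected:
html_link = '<a href="http://example.org/Berlin">Berlin</a>'

# An RDF-style triple names *how* they are connected. Plain tuples are used here
# for illustration; real Linked Data would use URIs and a library such as rdflib.
triples = [
    ("http://example.org/Germany", "http://example.org/hasCapital", "http://example.org/Berlin"),
    ("http://example.org/Berlin",  "http://example.org/population", "3500000"),
]

# Because the relationship is explicit, a program can answer a question like
# "what is the capital of Germany?" without guessing from surrounding prose.
for subject, predicate, obj in triples:
    if subject.endswith("/Germany") and predicate.endswith("/hasCapital"):
        print("Capital of Germany:", obj)
```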

2) As demonstrated in the article, linked data has the potential to really improve several services/apps we use online and on mobile devices. How would linked data improve apps like eReading technology? Games? Schedulers? Financial planners? The list goes on.

3) Is linked data really the next big thing? Or is it more of a buzzword that people get excited about today, only to forget about tomorrow like their now forgotten Palm Pilots and MySpace accounts? Sure there are advantages to linked data over standard HTML style browsing, but will people be willing to put forth the effort to get there?

I Horrocks
Rating: 4

1) How does WordNet fit into the semantic web? Horrocks talks about an ontology that would allow us to include the term SnowyOwl within the larger classification of Owl, and Owl within Raptor, and so on. Isn’t this closely related to WordNet? Was this a main motivation for developing WordNet?
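As a side note, WordNet’s hypernym (is-a) chains can be walked programmatically. A minimal sketch using NLTK’s WordNet interface (my choice of tooling, not something Horrocks mentions; his SnowyOwl class is invented, so plain “owl” stands in here):

```python
# Climb WordNet's hypernym (is-a) chain for "owl" -- the same broader/narrower
# structure as Horrocks's SnowyOwl -> Owl -> Raptor example.
# Requires: pip install nltk, then nltk.download('wordnet') once.
from nltk.corpus import wordnet as wn

synset = wn.synsets("owl")[0]        # first sense of "owl"
while synset.hypernyms():            # keep stepping toward more general concepts
    parent = synset.hypernyms()[0]
    print(synset.name(), "is a kind of", parent.name())
    synset = parent
```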

2) Horrocks tells us on page 6 that “query answering in OWL is not simply a matter of checking the data, but may require complex reasoning to be performed.” If OWL and RDF searches ever really take off, wouldn’t this extra reasoning mean longer processing times on searches? Will people be willing to wait the extra time for the process to run if it means they get highly integrated answers from linked data? Or will they stick to their old, tried-and-true methods that maybe aren’t as detailed in their responses but can process much faster?
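To see why this is “not simply checking the data,” here is a toy sketch (the classes and facts are invented, not Horrocks’s): the data never states that tweety is a Raptor, so a plain lookup finds nothing, and only chasing the subclass chain produces the answer.

```python
# Minimal forward-style reasoning: answering "what are the Raptors?" requires
# following subClassOf links, not just matching stored facts.
subclass_of = {"SnowyOwl": "Owl", "Owl": "Raptor", "Raptor": "Bird"}
facts = {("tweety", "type", "SnowyOwl")}

def instances_of(target_class):
    # collect target_class plus everything that (transitively) specializes it
    classes = {target_class}
    changed = True
    while changed:
        changed = False
        for sub, sup in subclass_of.items():
            if sup in classes and sub not in classes:
                classes.add(sub)
                changed = True
    return {s for (s, p, o) in facts if p == "type" and o in classes}

print(instances_of("Raptor"))   # {'tweety'} -- found only by reasoning over the ontology
```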

3) Future directions include continuing progress in ontology alignment – where ontologies whose domains overlap are reconciled and all are the better for it. Does this mean that the eventual end goal is one large ontology that covers everything? All subjects are going to overlap somehow with other subjects, and everything can be connected in this way. How would it change the semantic web if we had an ontology covering almost every topic ready to go today?

Friday, April 5, 2013

Questions for 4-5


Golder, Huberman
Rating: 3

1) The largest category of tags from the study was those that identified what or who the URL in question was about. Rather than representing a need for extensive tagging, could this represent a need for better title metadata? Even if this isn’t the case, how can we limit recording redundant information as we tag?

2) Golder talks about some of the strengths and weaknesses of tagging versus taxonomies. Has anyone ever tried to combine these systems? Could we eliminate having to search in as many areas of the taxonomy if we could retrieve results from multiple areas that were tagged similarly? Would this simply result in the same problems we already experience?
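To make the hybrid idea concrete, a tiny sketch (the collection and labels are invented, not from Golder and Huberman): each item has one home in a taxonomy, but a shared tag pulls results from several branches in a single query.

```python
# Each item lives at one taxonomy path, yet tags cut across branches.
items = [
    {"title": "Grilled salmon", "path": "food/fish",      "tags": {"dinner", "healthy"}},
    {"title": "Lentil soup",    "path": "food/legumes",   "tags": {"dinner", "vegetarian"}},
    {"title": "Oat smoothie",   "path": "drinks/blended", "tags": {"breakfast", "healthy"}},
]

def search_by_tag(tag):
    # no need to know in advance which branch of the taxonomy to search
    return [item["path"] + " :: " + item["title"] for item in items if tag in item["tags"]]

print(search_by_tag("healthy"))   # hits come from food/fish and drinks/blended
```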

3) Golder tells us that users are extremely varied in their tagging behavior. Looking at those who use many tags compared to those who use only a few, are there trends in the types of tags used? Do those who only use a few tags use more personalized tags as opposed to descriptive tags? Or is there no correlation at all?

Marshall, Cathy
Rating: 4

1) Just to play devil’s advocate for a moment – why do we even care about the metadata for all these photos, most of which are intended for personal use? We recognize that we can’t possibly archive everything so who cares about the tags or description associated with the photo of Mr. John Tourist standing on bull testicles?

2) How would knowing the tendency for different kinds of metadata (place, artifact, story) to appear in different metadata types (title, tags, captions) help programmers working with image retrieval? Would it make their algorithms more efficient? What other related fields, like IR, would find this study relevant to their work?

3) Marshall mentions the problem associated with some users’ reluctance to refer to the bull’s testicles. When relying on the masses to provide metadata, how can we account for tendencies such as this, where people edit their input based on personal beliefs, biases, opinions, etc.?

Kling
Rating: 2

1) Kling talks about the productivity paradox, explaining that the introduction of computers did not increase productivity as promised. What technologies today are new and promising, and making claims of revolutionizing our lives? Is there anything to separate these claims from those made in the 60s-80s about computing? Or will technology today leave us unsatisfied as well? Is technology destined to fall short of expectations? If not, how does it avoid that pitfall?

2) What was it about The Electronic Journal of Cognitive and Brain Sciences (EJCBS) that caused it to have poorer results than Electronic Transactions on Artificial Intelligence (ETAI)? Are the differences between these journals social or technological? In these journals, and in other settings as well, does social interaction influence technology or is it the other way around?

3) In section 6 “Why Social Informatics Matters” Kling begins by stating, “Social informatics research pertains to information technology use and social change in any sort of social setting, not just organizations.” How would Kling explain the benefits of social informatics to a religious leader? A university professor? A political candidate? 

Friday, March 22, 2013

Questions for 22 March


Smeulders et al.
Rating: 1

1) Why is it important to look at the past? The authors of this paper could have chosen to focus on the future and talked about where content-based image retrieval is going, but instead they are looking at what was happening 10 years before their publication to see what ideas worked out and which ones didn’t.  What value does this have for present researchers?

2) How does the end goal of the user affect the image retrieval tools used? Why do some methods fit some search patterns better than others? When I search a set of images (locally on my computer or on-line) what happens differently if I’m looking for an image of the Mona Lisa as opposed to an image of a generic tree? Since Smeulders wrote this paper has the field improved as far as using the right tools for the job is concerned?

3) The authors raise the question of how to evaluate system performance in content-based image retrieval but focus mainly on the problems associated with it without discussing possible solutions to those problems. How can some of these challenges be met? What has been tried – successfully or unsuccessfully – since the paper was published?

Saracevic, Tefko
Rating: 2

1) Saracevic claims on page 146 that we need no definition for relevance because its meaning is known to everyone. He compares it to information in this way, as though information needs no definition. But in the same paragraph he asserts that relevance “affects the whole process of communication, overtly or subtly.” He calls it a “fundamental aspect of human communication.” If relevance is so important how can we get by without defining it? Wouldn’t a definition help us understand the meaning better than when we rely on intuition alone? That understanding could, in turn, lead to improved communication.

2) How does this paper relate to the Smeulders paper we read? What is the role of relevance in content-based image retrieval? In the 25 years between the two papers did the concept of relevance evolve?

3) There are many different views on relevance described in the paper. What is the “so what?” of these views? How does this philosophical discussion of relevance affect the practical side of information science? As an example, would a follower of the ‘deductive inference view’ build a different retrieval system than a follower of the ‘pragmatic view’? What differences would there be and how would the differing schools of thought give birth to these differences?

Croft, Metzler, and Strohman
Rating: 3

1) The definition of Information Retrieval that the authors borrow from Gerard Salton is very broad. The benefit to this is that the definition is still applicable today even though it was penned 45 years ago. What is the downside to using a dated definition? Are they missing out on any potential benefits that an updated definition might bring? If they wanted to update the definition, could they? Since IR is so broad that it encompasses many fields is it possible to update the definition without driving a wedge between certain aspects of IR?

2) How does time affect IR? As time passes objects can change – especially digital objects. A new version of software is released, a website’s content is updated, Wikipedia is edited, etc. If the information I’m seeking is something that existed on a certain day or at a certain time how does this affect IR?

3) How do concepts like ‘relevance’ and/or ‘evaluation’ transfer from one branch of IR to another? How are these concepts different for, say, a search engine designer and a cartographer (who, by Salton’s definition is in an IR career)? Is there a difference?

Friday, March 15, 2013

Questions for 3-15


JISC Digital Media
Rating: 4

1) What is the risk with limiting your collection to one specific metadata schema? Is there a way to ensure that metadata that falls outside your schema is not lost? What are the benefits of having a schema, and how do they outweigh the risks? Do they outweigh the risks?

2) How does the JISC definition of metadata compare to Gilliland’s definition? What differences and/or similarities are there? Could the differences arise because JISC is specifically speaking to “collections of sharable digital collections [of] still image, moving image or audio collections,” are they due to differences in the authors’ opinions, or are they due to some other factor?

3) According to the website, metadata can be kept in the digital file itself, in a database, in an XML document, or in all of the above. What are the benefits to keeping metadata in each of these locations? If each is beneficial, doesn’t it make sense to use all three in every case? Why would some organizations choose not to do this?
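For concreteness, a small sketch of two of the three options (the filenames and fields are invented; metadata embedded in the file itself, e.g. EXIF inside an image, would be the third): the same record written as a sidecar XML document and as a database row.

```python
import sqlite3
import xml.etree.ElementTree as ET

record = {"file": "photo_0042.jpg", "title": "Harbour at dusk", "creator": "J. Smith"}

# Option 1: a sidecar XML document kept alongside the image
root = ET.Element("metadata")
for field, value in record.items():
    ET.SubElement(root, field).text = value
ET.ElementTree(root).write("photo_0042.xml")

# Option 2: a row in a collection-wide database
db = sqlite3.connect("collection.db")
db.execute("CREATE TABLE IF NOT EXISTS metadata (file TEXT, title TEXT, creator TEXT)")
db.execute("INSERT INTO metadata VALUES (?, ?, ?)",
           (record["file"], record["title"], record["creator"]))
db.commit()
```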

Gilliland
Rating: 3

1) Metadata is currently a blend of manual and automatic processes. What would you estimate the ratio of manual to automated processes to be? How do you think this will change as technology progresses? What are the risks in automating the collection and documentation of metadata, and what benefits outweigh those risks? Regardless of what will happen, what should happen? Should we automate more processes or should we reintroduce a stronger human element?

2) How is metadata different today than it was 100 years ago? 10 years ago? Is the rate of change slow enough that we will be able to keep up with it?

3) Gilliland says that all information objects have 3 features: content, context, and structure. How do these features relate to the 5 attributes of metadata laid out later in the article? Does each attribute detail information about a single feature? Or do all the attributes exist separately for each feature? Should we look to accept either the 3 features or the 5 attributes or do they work together to paint a complete picture of the information object?

Nunberg
Rating: 5

1) Here again we see the need for a human element in computer processes. We saw it in the WordNet readings and the Church and Hanks readings from two weeks ago. We saw it in the data mining reading from the week before that. With the advances in technology we’ve experienced in our lifetime, why do we still have this dependence on human input? Will computers ever be able to overcome the need for the human element? Should we be worried if this were ever to happen?

2) What kind of metadata is Nunberg talking about here? Are the things he mentions administrative? Descriptive? Technical? Something else? Is there a pattern here where some types of metadata are being bungled and others are not?

3) Why didn’t anyone tell the world that Sigmund Freud and his translator invented the internet back in 1939? Wouldn’t this have brought technology forward much more quickly? How would this have affected world history over the past 70 years?

Wiggins
Rating: 4

1) When they found that the information on the various schools’ websites was not up to date, did the authors attempt to contact the schools in question to get more reliable information? I mean, if they were already on the schools’ websites the phone number would probably have been right there. It just seems like a simple step that would have improved the results of the study. If they did not, could there be a good reason why they did not?

2) According to Table 2, UT Austin does not fit the stereotypical iSchool makeup of the members of the iSchool Caucus. We had a much lower number of Computing faculty and we had the 3rd highest percentage of both Humanities and Communication faculty. What does that say about the degree programs offered here in contrast with those offered at other schools? Why do you think UT has chosen to be different in the composition of its faculty? Was this a good choice?

3) Table 2 is not based on the number of faculty from each field but the percentage of faculty. How does this affect the comparisons we draw between the schools? As an example, U Illinois has 20% of their faculty coming from the Humanities and UT has 18%. The actual numbers are that U Illinois has 6 faculty and we have 4. That is a 33% decrease represented by two percentage points. More drastic is UCLA under the same category: 10% of their faculty comes from the Humanities, which turns out to be 7 faculty. All of these numbers are correct, but it can be misleading because of the way they are presented. Is there a better way to organize this information?
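Working only with the figures quoted above, a quick sketch of why the percentages can mislead: the smallest share can hide the largest head count, and a two-point gap can be a one-third drop in actual faculty.

```python
# (humanities faculty count, share of total faculty) as quoted in the question
schools = {
    "U Illinois": (6, 0.20),
    "UT Austin":  (4, 0.18),
    "UCLA":       (7, 0.10),
}
for name, (count, share) in schools.items():
    total = count / share   # implied total faculty size
    print(f"{name}: {count} humanities faculty, {share:.0%} of roughly {total:.0f} total")
# 20% vs 18% is only two percentage points, but 6 -> 4 is a 33% drop in head count,
# and UCLA's 10% share still represents the largest head count of the three.
```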

Friday, March 1, 2013

Questions for March 1


WordNet Readings
Rating: 2

1)  Miller talks about how dictionaries leave out a lot of important information because it is assumed that human users know some important, basic information already. For WordNet this information must be explicitly stated because computers only know what we tell them. He talks about adding additional information about the hypernym, information about coordinate terms, etc. He even mentions in passing that traditional dictionaries are distinct from encyclopedias, at least in part, because of this lack of information. My question is, if WordNet is trying to fill in these information gaps and link everything together so related words have connections between them, how is this project different from Wikipedia? It sounds like WordNet is trying to be more like an encyclopedia than a dictionary, and since it cross-references its entries it sounds like almost the exact same project.

2) If asked to re-create a project as massive as WordNet, where would you begin? What would your process be? Seeing how detailed each part of speech becomes it seems an overwhelmingly gargantuan task. And because language is constantly changing the larger question becomes: can we keep up? Is WordNet constantly becoming more and more up-to-date or simply falling further and further behind as language evolves?

3) If concepts like ‘gradation’ and ‘markedness’ are not used in WordNet, why do Fellbaum and the others include them in the reading? Do these concepts help the reader understand the ideas that WordNet does include? Is it because they wish to be transparent about their known weaknesses? If you were the author of this article would you include these sections?

HP Grice
Rating: 3

1) Does knowing whether or not you are failing to fulfill a particular maxim matter? For example, the second maxim under ‘Quality’ is “Do not say that for which you lack adequate evidence.” If a speaker should have adequate evidence to support their point but is unaware that the evidence they are relying on is faulty, how would this be classified? As a VIOLATION? Can a conversational implicature arise from such a situation?

2) How does Grice’s discussion of divergences in logical formal devices at the beginning set the stage for the rest of the article? In other words, why did Grice bother to talk about them? Could he have just started off with the section on Implicature?

3) Near the end of the article Grice asserts “The implicature is not carried by what is said, but only by the saying of what is said.” Does this hold true for any other type of linguistic form? If so, are there commonalities between implicature and the other forms communicated by the saying, not the said?

Church and Hanks
Rating: 2

1) What is the role of Church’s association ratio within WordNet? Would this be a benefit to the project since it helps show how some words are connected? Or would it simply overcomplicate things? Is the relevance of the association ratio different for the various relations observed (fixed, compound, semantic, and lexical)? For example, would WordNet benefit from knowing that ‘bread’ is related to ‘butter’ or ‘United’ is related to ‘States’ but not benefit from knowing ‘refraining’ is related to ‘from’?
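For reference, a rough sketch of the association ratio itself, which Church and Hanks define as the log (base 2) of how much more often two words co-occur within a window than chance would predict. The toy corpus and the two-word window are my own; the real measure is computed over millions of words with a larger window.

```python
import math
from collections import Counter

corpus = ("bread and butter " * 50 + "kangaroo court " * 50 + "strong tea " * 50).split()
N = len(corpus)
unigrams = Counter(corpus)
WINDOW = 2  # count y occurring within WINDOW words after x

pairs = Counter(
    (corpus[i], corpus[j])
    for i in range(N)
    for j in range(i + 1, min(i + 1 + WINDOW, N))
)

def association_ratio(x, y):
    p_xy = pairs[(x, y)] / N
    p_x, p_y = unigrams[x] / N, unigrams[y] / N
    return math.log2(p_xy / (p_x * p_y)) if p_xy else float("-inf")

print(association_ratio("bread", "butter"))   # strongly positive: habitual neighbours
print(association_ratio("bread", "court"))    # -inf here: never co-occur in this toy corpus
```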

2) Besides the number of words and a brief mention that one is mostly British and another American journalese, we don’t get a lot of clarification about what is in the corpora used in the study. How would the content affect our perception of the results? For example, if one corpus was all dime-store romance novels and another was filled with various encyclopedias, would that change our view of the results? If so, how?

3) As technology continues to improve, computers become more and more intelligent and are able to perform more tasks that were previously thought to require human effort. The association ratio is simply a tool for lexicographers; it does not replace them. But in the future how could this technology improve to minimize the work of the lexicographer and maximize the effort of the computer?


Friday, February 22, 2013

Questions for 22 Feb.


Brookshear
Rating: 4

1)  Are lossless systems always the best way to go? If not, when should lossy systems take precedence? What conditions need to exist in order for the lost information to be acceptable?
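A toy contrast, not an example from Brookshear: a lossless scheme gives back exactly what went in, while a lossy one trades detail away for space, and whether that trade is acceptable depends entirely on what the data is for.

```python
import zlib

text = b"the quick brown fox jumps over the lazy dog " * 100
assert zlib.decompress(zlib.compress(text)) == text      # lossless: a perfect round trip

readings = [20.137, 20.141, 20.139, 20.144]               # e.g. fine-grained temperature readings
lossy = [round(r, 1) for r in readings]                   # keep only one decimal place
print(lossy)   # [20.1, 20.1, 20.1, 20.1] -- cheaper to store, but the detail is gone for good
```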

2) Perhaps I misunderstood something, but isn’t the “commit point” the end of a process? Instead of saying a process has reached the commit point why do we not simply say a process is complete?

3) What are the advantages and disadvantages to indexed filing and hash systems? How can we determine which one works more efficiently in a given situation? Is one always better than the other? Is one usually better than the other?
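A toy contrast between the two strategies (the records are invented): a hash table answers exact-key lookups in roughly constant time, while a sorted index also supports ordered access such as range scans.

```python
import bisect

records = {"smith": "row 17", "jones": "row 4", "garcia": "row 9"}   # hash-style lookup
print(records["jones"])            # exact-key probe, roughly constant time

index = sorted(records)            # index-style: keys kept in sorted order
start = bisect.bisect_left(index, "g")
end = bisect.bisect_left(index, "k")
print(index[start:end])            # ['garcia', 'jones'] -- a range scan a hash table can't do directly
```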

DT Larose
Rating: 4

1)  Larose says that data mining cannot run itself and there is a need for human supervision over any data mining project. Will this always be the case? Computers are getting smarter all the time, right? So will they ever reach the point where they can perform data mining tasks on their own?

2) One of the fallacies related to data mining put forth in the reading is that data mining quickly pays for itself most, if not all, of the time. Is there any way to predict when this will occur?

3) For case study #4 there was no real deployment stage. Does there need to be deployment for a data mining project to have value? Which of the other stages might be skipped over, and how would that affect the value of data mining?

Wayner
Rating: 3

1)  How do you balance the need for greater compression with the need for stability? On pg 20 of chapter two we read that variable-length coding can really compress a file a lot, but it is also very fragile. Maybe we can give up some of the compression if we get a little more stability, but give up too much compression and we have to ask ourselves if compressing is worth it in the first place. What is the balance?
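To see the fragility concretely, a toy prefix code (the code table is my own, not Wayner’s): flip one bit near the front and the symbol boundaries shift, so several decoded symbols change and even the message length is wrong, whereas with a fixed-length code only the one damaged symbol would be affected.

```python
# A tiny variable-length (prefix) code and a greedy decoder.
var_code = {"e": "0", "t": "10", "a": "110", "o": "111"}
decode_table = {bits: ch for ch, bits in var_code.items()}

def decode(bits):
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in decode_table:       # a complete codeword has been read
            out.append(decode_table[buf])
            buf = ""
    return "".join(out)

message = "teatotea"
encoded = "".join(var_code[ch] for ch in message)
corrupted = ("1" if encoded[0] == "0" else "0") + encoded[1:]   # flip the first bit

print(decode(encoded))     # 'teatotea' -- decodes correctly
print(decode(corrupted))   # symbol boundaries shift: several symbols change, length is wrong
```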