Blog on the landscape: 2009

Wednesday, December 23, 2009

11. References and resources

My blog URL: http://www.spangis.blogspot.com/

DITA website exercise: http://www.student.city.ac.uk/~abhp645

Javascript exercise: http://www.student.city.ac.uk/~abhp645/javascript.html

References:

Belkin, NJ, Oddy, RN and Brooks, HM (1982) ASK for Information Retrieval part 1. Journal of Documentation 38(2) p61-71. http://comminfo.rutgers.edu/~belkin/articles/Belkin%20ASK%20p1.pdf [2.62Mb] [accessed 23 December 2009]

Brin, S and Page, L (1998) The Anatomy of a Large-Scale Hypertextual Web Search Engine. Proceedings of the 7th World Wide Web Conference http://infolab.stanford.edu/pub/papers/google.pdf [accessed 23 December 2009]

Day, M (2001) Metadata in a nutshell. Information Europe 6(2), p 11. Draft available online: http://www.ukoln.ac.uk/metadata/publications/nutshell [accessed 23 December 2009]

Meyer, E (2000) What Makes CSS So Great? O’Reillynet http://www.oreillynet.com/pub/a/network/2000/07/21/magazine/css_intro.html?page=1 [accessed 23 December 2009]

Morville, P and Rosenfeld, L (2007) Information Architecture for the World Wide Web, O’Reilly.

Oliver, S (2008) How the Semantic Web Will Change Information Management: Three Predictions http://web.fumsi.com/go/article/manage/3327 [accessed 23 December 2009]

University College London (2008) Information Behaviour of the Researcher of the Future – A Ciber briefing paper http://www.ucl.ac.uk/infostudies/research/ciber/downloads/ggexecutive.pdf [1.7Mb][accessed 23 December 2009]

Resources:

Archives Hub http://www.archiveshub.ac.uk/arch/ead.shtml - Encoded Archival Description (EAD) [accessed 23 December 2009]

Bibliography and referencing packages http://www.endnote.com/ and http://www.refworks.com/ [accessed 23 December 2009]

CERN http://public.web.cern.ch/public/en/About/WebWork-en.html - How the web works [accessed 23 December 2009]

Copyright free images from http://www.copyrightfreephotos.com/ and http://www.freeimages.co.uk/ [accessed 23 December 2009]

CSS Zen Garden http://www.csszengarden.com/ - Samples of cascading style sheets [accessed 23 December 2009]

DARPA and the Internet Revolution by Mitch Waldrop http://www.darpa.mil/Docs/Internet_Development_200807180909255.pdf - History of the Internet [accessed 23 December 2009]

Dublin Core Metadata Initiative http://dublincore.org/ [accessed 23 December 2009]

Google Search Operators http://www.googleguide.com/advanced_operators.html [accessed 23 December 2009]

InfoDesign (2004) http://www.informationdesign.org/special/wurman_interview.htm - Interview with Richard Saul Wurman [accessed 23 December 2009]

Meyerweb http://meyerweb.com/eric/css - CSS resources from Eric Meyer’s website [accessed 23 December 2009]

Midomi http://www.midomi.com/ - Search for music using your voice by humming or singing [accessed 23 December 2009]

My IP Number http://www.myipnumber.com/ - Find out your Internet Protocol address/number [accessed 23 December 2009]

National Archives http://www.nationalarchives.gov.uk/documents/selecting-file-formats.pdf - Digital preservation guidance: selecting file formats for long term preservation [accessed 23 December 2009]

Opera http://dev.opera.com/articles/view/mama-tables/ - 2008 figures showing how many websites use table layout [accessed 23 December 2009]

Problem Site http://www.theproblemsite.com/codes/binary.asp - Text to binary encoder [accessed 23 December 2009]

RNIB http://www.rnib.org.uk/PROFESSIONALS/WEBACCESSIBILITY/Pages/web_accessibility.aspx - web accessibility guide [accessed 23 December 2009]

Search Engine Watch http://searchenginewatch.com/2168031 - Explains how search engines work [accessed 23 December 2009]

Technical dictionary http://dictionary.zdnet.com/ and http://www.techterms.com/ [accessed 23 December 2009]

W3C http://www.w3.org/People/Berners-Lee - Biography of Sir Tim Berners-Lee [accessed 23 December 2009]

W3C http://www.w3.org/2001/sw - Interviews with Sir Tim Berners-Lee on the Semantic Web [accessed 23 December 2009]

W3C http://jigsaw.w3.org/css-validator/#validate_by_input+with_options - CSS validator [accessed 23 December 2009]

Web building tutorials and tips including HTML, XML and Javascript http://www.w3schools.com/ and http://www.webmonkey.com/ [accessed 23 December 2009]

Web Design Practices http://www.webdesignpractices.com/navigation/facets.html - Faceted classification as found in shopping websites [accessed 23 December 2009]

YouTube http://www.youtube.com/watch?v=qdFmSlFojIw - Binary in 60 seconds [accessed 23 December 2009]

Zetoc http://zetoc.mimas.ac.uk/ - British Library database of journals and conference proceedings [accessed 23 December 2009]

Monday, December 7, 2009

10. Building blocks, not stumbling blocks

My final entry is about information architecture. The term is attributed to Richard Saul Wurman who, in the 1970s, aimed to make information more understandable (see interview in InfoDesign, 2004).

Any information retrieval system, including a website, should be aesthetically pleasing and function properly: it should download quickly, be secure and be user-friendly. Navigating and searching for information has to be simple and fast. Morville and Rosenfeld use fishing to describe the way people search the Web, ie the perfect catch (you know exactly what you want), lobster trapping (you're looking for a variety of answers, eg hotels in London) and driftnetting (you want everything on a topic) (Information Architecture for the World Wide Web, 2007, p33-34). By understanding user behaviour, you can develop systems accordingly.

For example, online shopping sites use faceted classification to enhance searching and provide shoppers with time-saving functions. When you log onto the Tesco website, it remembers what you have previously bought so you can select those items again. The public sector is also recognising the importance of personalised websites. When you log onto Redbridge Council's website, you will see information relevant to where you live, such as when your next refuse collection will be.

For library and information professionals, the challenge is in meeting the demands of users who expect a fast 24/7 service. The University College London paper Information Behaviour of the Researcher of the Future describes the way students turn to web search engines first when looking for information and how higher education and research institutions need to adapt their information retrieval systems to cater for future academics.

Technologies such as radio frequency identification (RFID) are being introduced in libraries and there are opportunities to exploit the technology further, for example, by using mobile technology to assist users in locating books and journals that may be of interest to them based on their reading habits.

© City University London Library

The semantic web or Web 3.0 will revolutionise the way information is shared and will require great attention to structure and organisation - the key components of information architecture (see 2008 BBC Radio 4 Today programme interview with Sir Tim Berners-Lee). Taxonomy and ontology, the vocabulary of library and information professionals, are an integral part of this vision. As the technology becomes smarter, information will need to be readily available without users having to ask for it (see article by Silver Oliver).

Monday, November 30, 2009

9. Javawocky and other strange languages

To gain information about people visiting your website, you can use client-side programming such as JavaScript. It is limited in the amount of information it can handle, unlike server-side programming, which can deal with complex queries (eg TfL Journey Planner). JavaScript (not to be confused with Java) can also add interest to a website.

As a complete novice to programming, I found JavaScript daunting, but our lecturer Richard Butterworth recommended we remember the seven pillars of programming wisdom:

1. variables
2. input and output
3. arrays
4. sequence
5. selection or conditions
6. iteration
7. procedures or functions

I produced this webpage using Javascript. A prompt asks if you are interested in news or sport. If you choose news, you are then asked where you live. Based on your response, a link to the BBC News website for your region will appear. If you select sport, you then choose from cycling, golf, football and tennis and a link will appear that will take you to the relevant page on the BBC Sports website. Unfortunately I couldn't figure out how to deal with invalid responses.

If you view the source for the webpage, you will see that I have defined the variables first before adding the conditions and functions. Originally, I didn't use parsing and it worked to an extent. However, it's advisable to add this function to ensure that your commands work properly.
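One way the invalid responses could be handled is sketched below, in Python rather than JavaScript. This is only an illustration: the links are placeholders, not the real BBC addresses, and the function names (`clean`, `first_valid`, `choose_link`) are made up for this example. It tidies each typed answer, keeps asking until an answer matches one of the allowed options, and gives up gracefully if none does.

```python
# Placeholder links for illustration only; these are NOT the real BBC addresses.
LINKS = {
    "news": {
        "london": "https://example.org/news/london",
        "wales": "https://example.org/news/wales",
    },
    "sport": {
        "cycling": "https://example.org/sport/cycling",
        "golf": "https://example.org/sport/golf",
        "football": "https://example.org/sport/football",
        "tennis": "https://example.org/sport/tennis",
    },
}


def clean(answer):
    """Tidy a typed answer so that ' Sport ' and 'sport' are treated alike."""
    return answer.strip().lower()


def first_valid(answers, options):
    """Return the first answer that matches an allowed option, or None."""
    for answer in answers:             # iteration: try each response in turn
        if clean(answer) in options:   # selection: is it one we recognise?
            return clean(answer)
    return None


def choose_link(topic_answers, detail_answers):
    """Pick a link from two rounds of answers; None means nothing was valid."""
    topic = first_valid(topic_answers, LINKS)
    if topic is None:
        return None
    detail = first_valid(detail_answers, LINKS[topic])
    if detail is None:
        return None
    return LINKS[topic][detail]


# "weather" and "rugby" are not on the menus, so they are skipped.
print(choose_link(["weather", "Sport"], ["rugby", "golf"]))
```

In a browser the lists of answers would come from repeated prompts; here they are passed in as lists so the logic can be tried without any user input.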

You might need to check that the web browser isn't blocking your script. My webpage works in Firefox but the prompt was blocked when I tested it in Internet Explorer 8.

This was produced using the alert function. If you click on the box, you will see a message.

Monday, November 23, 2009

8. ASK and it will be given

I use different search methods for different purposes. When looking for resources for this blog, my first port of call is the recommended course reading. After that I turn to Google for quick answers; ZDNet for technical definitions; W3Schools and Webmonkey for tutorials and tips; and ISI Web of Knowledge and City University London library for academic books and journals.

Belkin, Oddy and Brooks refer to Information Retrieval (IR) as "resolving a user's anomalous state of knowledge", or ASK. They define the process as "ASK, query and evaluate" (Journal of Documentation, 38(2) p61-71, June 1982). When searching for this article, I tried the university library first, only to find that access was by subscription only, so I used Google and found it on Nicholas Belkin's website.

Search engines are a fast way to find information. They operate by web crawling, using programs sometimes referred to as spiders or robots (see Search Engine Watch). As a test, I tried to find out about information or library science using Boolean operators:

information OR library AND science

As the results were mostly irrelevant, I changed the search to:

information AND science OR library AND science

This improved the relevance, but unfortunately a band called Library Science was included in the top results.

This shows how framing questions in different ways can affect the results. Different search engines will also return different results (see Google results). Search operators can be used to make the results more meaningful.
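The difference between the two queries can be sketched with sets in Python. This assumes the conventional rule that AND binds more tightly than OR, which a particular search engine may not follow, and the four short "documents" are invented purely for illustration.

```python
# Four made-up documents, keyed by an id number (illustrative data only).
docs = {
    1: "information science degree courses",
    2: "library science careers",
    3: "tourist information for london",
    4: "public library opening hours",
}


def containing(term):
    """Return the ids of the documents whose text contains the given word."""
    return {doc_id for doc_id, text in docs.items() if term in text.split()}


information = containing("information")
library = containing("library")
science = containing("science")

# information OR library AND science
# With AND evaluated first this means: information OR (library AND science),
# so anything mentioning "information" gets in, science or not.
first_query = information | (library & science)

# information AND science OR library AND science
# Both halves now insist on "science".
second_query = (information & science) | (library & science)

print(sorted(first_query))
print(sorted(second_query))
```

The first query lets in the tourist information page, while the second keeps only the two pages about science, which mirrors the improvement in relevance described above.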

IR systems, like search engines, are subjective and differ from data retrieval, which is objective (see SQLs and Pearls). To improve search results, data needs to be indexed by collecting, parsing and storing it. This involves removing stop words, stripping suffixes and constructing a thesaurus. IR systems use inverted files, which map each term to the documents that contain it and so enable fast information retrieval. This link shows how an inverted file works.
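As a rough illustration of the indexing steps, here is a minimal Python sketch that builds an inverted file from three invented documents. The stop word list is a tiny sample and the suffix stripping is deliberately crude (it is not a real stemming algorithm such as Porter's), so treat it as a toy under those assumptions.

```python
from collections import defaultdict

# A tiny sample stop word list; real systems use much longer ones.
STOP_WORDS = {"a", "and", "of", "the"}

# Invented documents for illustration only.
documents = {
    1: "Searching the archives of a library",
    2: "The library holds books and journals",
    3: "Search engines index the web",
}


def strip_suffix(word):
    """Crudely remove 'ing' or 's' endings; not a real stemmer."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_inverted_file(docs):
    """Map each indexed term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():   # parsing
            if word in STOP_WORDS:          # remove stop words
                continue
            index[strip_suffix(word)].add(doc_id)  # storing
    return index


index = build_inverted_file(documents)
print(sorted(index["search"]))
print(sorted(index["library"]))
```

Looking up a term is then a single dictionary access rather than a scan through every document, which is why inverted files make retrieval fast.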

Evaluating the success of the IR process can be qualitative, for example through interviews and questionnaires, or quantitative, by testing in lab conditions (see Brin and Page's evaluation of Google search results). Another method is to assess the financial value of the information.

The ASK model works with text but may not be appropriate for media such as sound - Zdnet gives examples of music search methods, including Midomi, a website that can identify songs using your voice by singing or humming.

Monday, November 16, 2009

7. SQLs and Pearls

You can find databases in the form of contact lists and shopping websites. In library and information management systems, they can be used for storing and retrieving records of, for example, books, journal articles, images and artefacts and where they're located. There are off-the-shelf packages but they tend to restrict how you can manipulate the information. A relational database table can look like this:

Table: Books

Auth_id	Author	Year	Title
001	Russell, B	1912	The problems of philosophy
002	Wollstonecraft, M	1792	A vindication of the rights of women
003	Luther King, M	1964	Why we can't wait
004	Gingrich, N	2008	Real change: from the world that fails to the world that works

Table: Publishers

Pub_id	Company_name	Address	Auth_id
01	Oxford University Press	Oxford	001
02	Prometheus Books	Buffalo, NY	002
03	Harpers and Row	New York	003
04	Regnery Publishers	Washington DC	004

Structured Query Language (SQL) is used in relational database management systems (RDBMS), which are based on the relational model introduced by E F Codd in the 1970s (see his obituary). A query is based on the following commands:

SELECT - what information you wish to retrieve
FROM - from which table(s) you wish to retrieve the information
WHERE - the conditions of the search

A basic search of the table above entitled "Books" could look like this:

SELECT Auth_id, Author, Year, Title FROM Books WHERE Author LIKE 'Russ%' \g

Or you could use * to select all columns:

SELECT * FROM Books WHERE Author LIKE 'Russ%' \g

This shows that you want to retrieve details from the Books table where the author's name begins with "Russ". The % sign is a wildcard that matches any sequence of characters and is useful if, for example, you are unsure of the spelling or you don't know if the database uses American or UK English spelling.
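The wildcard search can be tried out with Python's built-in sqlite3 module, as in the sketch below. It assumes an in-memory SQLite database loaded with the Books rows shown earlier. The \g terminator is left out because sqlite3 runs each statement directly, and the ORDER BY clause is added only to make the output order predictable.

```python
import sqlite3

# An in-memory database holding the Books table from the post.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Books (Auth_id TEXT, Author TEXT, Year INTEGER, Title TEXT)"
)
conn.executemany(
    "INSERT INTO Books VALUES (?, ?, ?, ?)",
    [
        ("001", "Russell, B", 1912, "The problems of philosophy"),
        ("002", "Wollstonecraft, M", 1792, "A vindication of the rights of women"),
        ("003", "Luther King, M", 1964, "Why we can't wait"),
        ("004", "Gingrich, N", 2008,
         "Real change: from the world that fails to the world that works"),
    ],
)

# 'Russ%' matches names that BEGIN with "Russ".
starts_with = conn.execute(
    "SELECT Auth_id, Author, Year, Title FROM Books "
    "WHERE Author LIKE 'Russ%' ORDER BY Auth_id"
).fetchall()

# '%ing%' matches names with "ing" ANYWHERE in them.
anywhere = conn.execute(
    "SELECT Author FROM Books WHERE Author LIKE '%ing%' ORDER BY Auth_id"
).fetchall()

print(starts_with)
print(anywhere)
conn.close()
```

Putting the % on both sides of the search text finds it anywhere in the name, so the second query picks up both Luther King and Gingrich.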

The search becomes slightly more complicated if you want to obtain one set of data from two or more tables. Taking the two tables above, you can see that there is one column that is common to both: Auth_id. It is the primary key of the Books table and appears in the Publishers table as a foreign key, so it can be used to join the two tables and establish a relationship.

If you want to search for titles (from Books table) that are published by Oxford University Press (from Publishers table), you can use the following command:

SELECT Books.Title FROM Books, Publishers WHERE Publishers.Company_name = 'Oxford University Press' AND Publishers.Auth_id = Books.Auth_id \g

This indicates that you are searching for the titles contained in the Books table and cross-referencing them with the Publishers table using the Auth_id column (which is common to both tables) to find titles published by Oxford University Press. \g or ; executes the query.
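The two-table search can also be run with Python's built-in sqlite3 module. The sketch below assumes an in-memory SQLite database, that the shared column is named Auth_id as in the tables above, and it loads only the columns the query needs. The \g terminator is omitted because sqlite3 executes each statement directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Cut-down versions of the two tables from the post.
conn.execute("CREATE TABLE Books (Auth_id TEXT PRIMARY KEY, Author TEXT, Title TEXT)")
conn.execute("CREATE TABLE Publishers (Pub_id TEXT, Company_name TEXT, Auth_id TEXT)")

conn.executemany(
    "INSERT INTO Books VALUES (?, ?, ?)",
    [
        ("001", "Russell, B", "The problems of philosophy"),
        ("002", "Wollstonecraft, M", "A vindication of the rights of women"),
        ("003", "Luther King, M", "Why we can't wait"),
    ],
)
conn.executemany(
    "INSERT INTO Publishers VALUES (?, ?, ?)",
    [
        ("01", "Oxford University Press", "001"),
        ("02", "Prometheus Books", "002"),
        ("03", "Harpers and Row", "003"),
    ],
)

# Join the tables on the shared Auth_id column and filter by publisher.
titles = conn.execute(
    "SELECT Books.Title FROM Books, Publishers "
    "WHERE Publishers.Company_name = 'Oxford University Press' "
    "AND Publishers.Auth_id = Books.Auth_id"
).fetchall()

print(titles)
conn.close()
```

Only the row whose Auth_id appears against Oxford University Press in the Publishers table comes back, which is the cross-referencing described above.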

A quick reference is available on the W3Schools website.

Monday, November 9, 2009

6. CSS not CSI

Websites were initially designed using table layout, and many still are (see findings by Opera). Having worked with such a website, I found it tricky to add and edit content as it's similar to using a spreadsheet for text and images.

Cascading style sheets (CSS) make webpages look presentable and provide a framework for structuring content. XML and HTML describe the hierarchical structure of a document, while CSS controls how that structure is displayed. CSS allows the content to be separate from the presentation. This is important when considering accessibility: for example, someone who has a visual impairment may use a screen reader (see guidance by the RNIB).

A website with good examples of CSS is CSS Zen Garden. If you look at my website you'll see that I've applied a different style sheet to the main page from the html exercise and Epping Forest pages. In reality, it's better to be consistent throughout, particularly to gain brand recognition.

A style sheet determines the formatting of, for example, headers, paragraphs and fonts. This is what CSS markups look like. Note that you need to use American spelling such as color.

You can check if your CSS is valid with the W3C CSS validation service.

Advocates of CSS (for example, Eric Meyer) claim that websites with style sheets load more quickly than those using tables, are easier to maintain and redesign, are more accessible and are more likely to be found by search engines. Critics say CSS-based websites are unstable and do not work in all internet browsers. This article from Smashing Magazine looks at websites with table layouts and how web developers are now using the div tag in the same way.

Monday, October 26, 2009

5. Extensibly Marked Up

I'm new to XML (eXtensible Markup Language) but as I embark on an information career, I will need to be familiar with it. XML defines data and allows information to be moved from one place to another. It is used to organise information. In records management, for example, the Dublin Core Metadata Initiative provides an international standard that ensures records can be shared or transferred, and in archiving, the Encoded Archival Description (EAD) is the internationally recognised schema. XML is also used in bibliography and referencing packages such as EndNote and RefWorks, which enables resources to be transferred to other packages as well as word processing software.

I produced this list of magazines using XML. It's a very basic structure, having only the title, subject and date of issue as the categories. To improve the search and make it more detailed, I could also add a list of contents and writers/contributors and where the magazine is located.

XML is like a family. It contains one root element, the parent, while the elements below the root are the children. In this example, the root element is the magazine list and the child elements are the title, subject and date.

XML must be well-formed, and valid if it is checked against a DTD or schema, otherwise the document will not display properly. Unlike HTML, XML must have matching opening and closing tags and its syntax has to be consistent. Validity goes further: for example, if the permitted values are "yes" or "no", you cannot use other words such as "maybe".
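The family structure and the well-formedness rule can be explored with Python's built-in xml.etree.ElementTree module. The sketch below uses an invented one-entry magazine list; the element names (`magazine_list`, `magazine`, `title`, `subject`, `date`) and the sample values are assumptions for illustration, not the markup of the actual exercise. Note that ElementTree only checks that the XML is well-formed; it does not validate against a DTD.

```python
import xml.etree.ElementTree as ET

# An invented magazine list; names and values are illustrative assumptions.
MAGAZINES = """<magazine_list>
  <magazine>
    <title>Example Magazine</title>
    <subject>Science</subject>
    <date>2009-10-26</date>
  </magazine>
</magazine_list>"""

root = ET.fromstring(MAGAZINES)           # the root element (the parent)
first_magazine = root[0]                  # its first child
child_tags = [child.tag for child in first_magazine]


def is_well_formed(text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(root.tag)
print(child_tags)
print(is_well_formed(MAGAZINES))
print(is_well_formed("<title>No closing tag"))
```

The last check fails because the title element is never closed, which is exactly the kind of slip that stops an XML document from displaying.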

A document type definition (DTD) sets out the rules that the XML must follow to be valid.

Working with XML and DTD requires a really good eye for detail. DTD is very limited as it's not flexible and it looks like a different language to XML, hence the XML Schema is becoming increasingly popular.

Monday, October 19, 2009

4. Worth more than a thousand words

I'm put off by websites that are text-heavy. Images help to improve presentation and can also be a useful aid to people who don't have English as their first language. Information management systems such as bibliography packages tend not to have images other than a logo - functionality is important but personally I think images would make people want to use the systems.

The most commonly used formats for websites are Gif, Jpeg and Png:

Gif (graphics interchange format) images record up to 256 colours and are lossless as they retain information when compressed. Gifs can be used to create animated files by placing one image upon another.

Jpeg (joint photographic experts group) supports more than 16 million colours and is the common format for photographs. Jpegs are lossy as they lose information when compressed.

Png (portable network graphics) is similar to Gif in that it is lossless but is higher quality as it caters for up to 24 bits of colour. However, it cannot be animated.
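The colour counts quoted above follow from the number of bits used per pixel: each extra bit doubles the number of possible values. A short Python sketch of the arithmetic, with a helper name (`colours_for`) made up for this example:

```python
def colours_for(bits_per_pixel):
    """Number of distinct colours that a given number of bits can represent."""
    return 2 ** bits_per_pixel


# 8 bits per pixel gives the 256-colour limit of Gif.
print(colours_for(8))

# 24 bits per pixel gives the "more than 16 million" colours of Jpeg.
print(colours_for(24))
```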

This link illustrates what happens to images when they are resized and resampled. Throughout this blog, I've used different image formats to show the differences.

When digitising records, you need to consider which format is most appropriate for which purpose. For example, in archiving and preservation, images need to be at optimum resolution, whereas for the web, they should be lower quality so that they load quickly. Using formats that support metadata such as Tagged Image File Format (TIFF) is recommended by the National Archives when preserving images as this makes them easier to search.

Images can be added to websites by embedding a file that's saved on your drive (example 1). They can also be linked from an online source (example 2). Be aware that if the file is moved or removed, then you will lose the image from your website.

Also be mindful of copyright issues. There are websites where you can obtain copyright-free photographs, such as copyrightfreephotos.com and freeimages.co.uk.

Monday, October 12, 2009

3. WWW and the Net

I confess that, like a lot of people, I've used the terms Internet and World Wide Web in the wrong context. The Internet is the infrastructure, which allows computers to communicate with each other while the World Wide Web is a system of hyperlinked documents or hypertext that is accessed through the Internet.

The Internet was based on an information sharing system designed by the US military (DARPA) in the 1960s, while the Web was developed at CERN, the European Organization for Nuclear Research, in the early 1990s.

Each computer on the Net has a unique identification number, assigned through the Internet Protocol (IP). You can find your IP number here. Because numbers are hard to remember, domain names are used instead and are translated into IP addresses by the Domain Name System (DNS). The Uniform Resource Locator (URL) enables us to find documents and files on the Internet. The format is: protocol://servername/local file path. For example, my website URL is http://www.student.city.ac.uk/~abhp645/. The CERN website explains how the Web works.
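The three parts of a URL can be pulled apart with Python's built-in urllib.parse module. This small sketch uses the address of the Javascript exercise from the reference list; the variable names are just for illustration.

```python
from urllib.parse import urlparse

# The address of the Javascript exercise listed in the references.
url = "http://www.student.city.ac.uk/~abhp645/javascript.html"

parts = urlparse(url)

protocol = parts.scheme      # how to fetch the document
server_name = parts.netloc   # which computer to ask
file_path = parts.path       # where the file lives on that computer

print(protocol)
print(server_name)
print(file_path)
```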

This system was developed in the 1990s by Sir Tim Berners-Lee, who in October 2009 admitted the format didn't really need the two slashes // (see Daily Telegraph article).

When creating a webpage, you can use marked up text. Markups can represent the style and structure of the webpage. W3Schools website provides an easy to follow tutorial on html.

Here's a link to my website. I added hyperlinks to connect from one document to the main page. This links to the html exercise that I produced. If you view the page source, you will see the html markup. I used the University's Unix system to publish the document onto the Web so that it is now visible to a global audience.

Monday, October 5, 2009

2. Bits, Bytes and Binary

In primary school, my teacher taught us computer science by using an empty cereal packet, knitting needles and hole-punched card. This was before I progressed to binary. This YouTube video shows how simple binary is.

Computers process bits in multiples of eight, eg 8, 16, 32, 64 and 128. A bit can be either 0 or 1, while a byte is a sequence of eight bits. The more bits, the more distinct values can be represented and the more data can be accessed.

Binary can represent text. For example:

01110111 01101111 01110010 01100100

is binary for "word".

Converting text to binary can take a while, so using an encoder such as the one on The Problem Site is really useful.
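The conversion can also be done in a few lines of Python. This is a minimal sketch that assumes plain ASCII text and eight bits per character; the function names are made up for the example.

```python
def text_to_binary(text):
    """Write each ASCII character as eight binary digits, separated by spaces."""
    return " ".join(format(byte, "08b") for byte in text.encode("ascii"))


def binary_to_text(bits):
    """Turn space-separated groups of eight binary digits back into text."""
    return bytes(int(chunk, 2) for chunk in bits.split()).decode("ascii")


encoded = text_to_binary("word")
print(encoded)                  # 01110111 01101111 01110010 01100100
print(binary_to_text(encoded))  # word
```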

From the 1960s, text was commonly encoded using the American Standard Code for Information Interchange (ASCII), which covers only basic text. When you use Notepad, you will see the limitations of ASCII in terms of language and formatting. Unicode, developed later, supports different language scripts such as 中文 (ie Chinese). ASCII is built into Unicode and is compatible with it.
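The relationship between ASCII and Unicode can be checked in Python. The sketch below assumes UTF-8 as the Unicode encoding (one of several) and uses a made-up helper, `fits_in_ascii`, to show where ASCII runs out.

```python
def fits_in_ascii(text):
    """Return True if every character in the text can be encoded as ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


# Plain English text produces identical bytes in ASCII and UTF-8.
same_bytes = "word".encode("ascii") == "word".encode("utf-8")

# Chinese characters are outside ASCII but fine in UTF-8,
# where each of these two characters takes three bytes.
chinese_bytes = "中文".encode("utf-8")

print(same_bytes)
print(fits_in_ascii("中文"))
print(len(chinese_bytes))
```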

I produced my website using Notepad before converting it into a web document by changing the extension to .html. If you right click and view page source, you can see the html markup.

Some documents can only be viewed properly with specific programs, for example, you will need Microsoft Word for a document with .doc extension. Problems can also arise when documents produced in new versions of the software are not compatible with older ones.

Monday, September 28, 2009

1. New kid on the blog

I'm entering the blogosphere for the first time with a weekly account of my learning experience in Digital Information, Technologies and Architecture (DITA), a module of my MSc in Information Science.

Blogs are Web 2.0 technology as they are user content driven; similar to Facebook, MySpace and Twitter. Blogs allow people to communicate topics in detail. For example, I follow blogs such as the UK Freedom of Information as it's a subject I'm interested in.

I chose Blogger after considering WordPress and LiveJournal. Blogger won because it's simple to use; perfect for a beginner like me. I quite fancy WordPress, should I ever become a professional blogger, as there are numerous features to play around with.

I've added functions such as a search box and display of tags/labels to make it easy for visitors to find items. I'll also be building up a library of bookmarks using Diigo, where the references can be exported into Blogger. Social bookmarks contain metadata, which I will be learning more about on my course.

To attract visitors, a blog needs to make an impact, or else it will get lost in cyberspace. You need to understand your audience and how much they know about the subject. Adding a touch of humour and images might improve the format. I like the style of the Wired Science blog; the way the author manages to be succinct and his use of hyperlinks to define points.

With Blogger, I hope to make use of the HTML (HyperText Markup Language) editing option, embed images and video clips and add hyperlinks to websites that provide definitions or illustrate particular points. You can keep up with my DITA lab exercises on my website.
