Geographical Data Sets
Geographic Data Types
Although the two terms, data and information, are often used interchangeably, each has a specific meaning. Data are observations that are collected and stored. Information is data that is useful in answering queries or solving a problem. Digitizing a large number of maps yields a large amount of data after hours of painstaking work, but the data yield useful information only if they are used in analysis.

Spatial and Non-spatial data
Geographic data are organised in a geographic database. This database can be considered a collection of spatially referenced data that acts as a model of reality. Each feature in the database has two important components: its geographic position and its attributes or properties. In other words, spatial data (where is it?) and attribute data (what is it?).

Attribute Data
The attributes refer to the properties of spatial entities. They are often referred to as non-spatial data since they do not in themselves represent location information.

District Name | Area       | Population
Noida         | 395 sq. km | 6,75,341
Ghaziabad     | 385 sq. km | 2,57,086
Mirzapur      | 119 sq. km | 1,72,952

Spatial data
Geographic position refers to the fact that each feature has a location that must be specified in a unique way. To specify the position in an absolute way a coordinate system is used. For small areas, the simplest coordinate system is the regular square grid. For larger areas, certain approved cartographic projections are commonly used. Internationally there are many different coordinate systems in use.

Geographic objects can be shown by four types of representation: points, lines, areas, and continuous surfaces.

Point Data
Points are the simplest type of spatial data. They are zero-dimensional objects with only a position in space but no length.

Line Data
Lines (also termed segments or arcs) are one-dimensional spatial objects. Besides having a position in space, they also have a length.

Area Data
Areas (also termed polygons) are two-dimensional spatial objects with not only a position in space and a length but also a width (in other words they have an area).


Continuous Surface
Continuous surfaces are three-dimensional spatial objects with not only a position in space, a length and a width, but also a depth or height (in other words they have a volume). These spatial objects are not discussed further here because most GIS do not include real volumetric spatial data.

Geographic Data -- Linkages and Matching
A GIS typically links different data sets. Suppose you want to know the mortality rate from cancer among children under 10 years of age in each country. If you have one file that contains the number of children in this age group, and another that contains the number of deaths from cancer in that group, you must first combine or link the two data files. Once this is done, you can divide one figure by the other to obtain the desired answer.


Exact Matching
Exact matching occurs when you have information in one computer file about many geographic features (e.g., towns) and additional information in another file about the same set of features. The operation to bring them together is easily achieved by using a key common to both files -- in this case, the town name. Thus, the record in each file with the same town name is extracted, and the two are joined and stored in another file.

Name | Population
A    | 4038
B    | 7030
C    | 10777
D    | 5798
E    | 5606

Name | Avg. Housing Cost
A    | 30,500
B    | 22,000
C    | 100,000
D    | 24,000
E    | 24,000

Name | Population | Avg. Housing Cost
A    | 4038       | 30,500
B    | 7030       | 22,000
C    | 10777      | 100,000
D    | 5798       | 24,000
E    | 5606       | 24,000
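
The exact match described above can be sketched in Python as a join on the shared key. This is only a minimal illustration, assuming each file has already been read into a dictionary keyed by town name; the names population, avg_housing_cost, and exact_match are hypothetical.

```python
# Two "files" keyed by town name (values taken from the tables above).
population = {"A": 4038, "B": 7030, "C": 10777, "D": 5798, "E": 5606}
avg_housing_cost = {"A": 30500, "B": 22000, "C": 100000, "D": 24000, "E": 24000}


def exact_match(left, right):
    """Join two keyed tables, keeping only keys present in both."""
    return {key: (left[key], right[key]) for key in left if key in right}


# Each joined record holds (population, average housing cost) for one town.
joined = exact_match(population, avg_housing_cost)
for name, (pop, cost) in sorted(joined.items()):
    print(name, pop, cost)
```

A town that appears in only one of the two files is dropped by this sketch; a real system would need a policy for such unmatched records.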


Hierarchical Matching
Some types of information, however, are collected in more detail and less frequently than others. For example, financial and unemployment data covering a large area are collected quite frequently. Population data, on the other hand, are collected for small areas but at less frequent intervals. If the smaller areas nest (i.e., fit exactly) within the larger ones, then the way to make the data match for the same area is hierarchical matching -- add the data for the small areas together until the grouped areas match the bigger ones, and then match them exactly.

The hierarchical structure illustrated in the chart shows that this city is composed of several tracts. To obtain meaningful values for the city, the tract values must be added together.

Tract | Town      | Population
101   | P         | 60,000
102   | Q         | 45,000
103   | R         | 35,000
104   | S         | 36,000
105   | T         | 57,000
106   | Nakkhu    | 25,000
107   | Kupondole | 58,000

[Chart: the city subdivided into Tracts 101 to 107]
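
Hierarchical matching amounts to summing the small-area values up to the larger area they nest within. The sketch below uses the tract populations from the table; the tract_to_city mapping and the label "City" are assumptions made for illustration, since the source does not name the city.

```python
# Tract populations from the table above.
tract_population = {
    101: 60000, 102: 45000, 103: 35000, 104: 36000,
    105: 57000, 106: 25000, 107: 58000,
}

# Hypothetical nesting: every tract belongs to one larger area ("City").
tract_to_city = {tract: "City" for tract in tract_population}


def aggregate(values, parent_of):
    """Sum small-area values up to the larger areas they nest within."""
    totals = {}
    for area, value in values.items():
        parent = parent_of[area]
        totals[parent] = totals.get(parent, 0) + value
    return totals


city_population = aggregate(tract_population, tract_to_city)
print(city_population)  # {'City': 316000}
```

Once the tract values have been grouped this way, the city-level total can be joined to other city-level data with an exact match.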

Fuzzy Matching
On many occasions, the boundaries of the smaller areas do not match those of the larger ones. This occurs often while dealing with environmental data. For example, crop boundaries, usually defined by field edges, rarely match the boundaries between soil types. If you want to determine the most productive soil for a particular crop, you need to overlay the two data sets and compute crop productivity for each and every soil type. In principle, this is like laying one map over another and noting the combinations of soil and productivity.
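
The overlay idea can be sketched with two small grids that cover the same area. This is an illustrative simplification, assuming both layers have already been converted to the same raster grid; the soil classes and yield figures are invented for the example.

```python
# Two hypothetical layers covering the same area on the same grid.
soil = [
    ["clay", "clay", "loam"],
    ["clay", "loam", "loam"],
    ["sand", "sand", "loam"],
]
crop_yield = [  # assumed yield per cell, in tonnes per hectare
    [2.0, 2.4, 3.1],
    [2.2, 3.3, 3.5],
    [1.1, 1.3, 2.9],
]


def mean_yield_by_soil(soil_grid, yield_grid):
    """Overlay the two grids cell by cell and average yield per soil type."""
    sums, counts = {}, {}
    for soil_row, yield_row in zip(soil_grid, yield_grid):
        for soil_type, value in zip(soil_row, yield_row):
            sums[soil_type] = sums.get(soil_type, 0.0) + value
            counts[soil_type] = counts.get(soil_type, 0) + 1
    return {s: sums[s] / counts[s] for s in sums}


result = mean_yield_by_soil(soil, crop_yield)
best = max(result, key=result.get)
print(best, round(result[best], 2))
```

Here the common key is the cell position itself: values are combined only where they refer to the same piece of ground.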

A GIS can carry out all these operations because it uses geography as a common key between the data sets. Information is linked only if it relates to the same geographical area.


Why is data linkage so important? Consider a situation where you have two data sets for a given area, such as yearly income by county and average cost of housing for the same area. Each data set might be analysed and/or mapped individually. Alternatively, the two may be combined. With two data sets, only one combination exists, but the number of possible combinations grows rapidly as more data sets are added. Even if not every combination is meaningful, you will still be able to answer many more questions than if the data sets were kept separate. By bringing them together, you add value to the database. To do this, you need a GIS.
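
To see why linkage adds value, it helps to count the ways data sets can be combined. The sketch below assumes that a "combination" means any selection of two or more data sets; the source does not define the term, so this reading is an assumption, as is the function name.

```python
from math import comb


def combinations_of_layers(n):
    """Count selections of two or more data sets out of n."""
    return sum(comb(n, k) for k in range(2, n + 1))


for n in (2, 3, 5, 10):
    print(n, combinations_of_layers(n))
```

Under this reading, two data sets give one combination, while ten data sets already give 1013.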

Figure 2

Principal Functions of GIS
Data Capture
Data used in GIS are often of many types and are stored in different ways. A GIS provides tools and a method for integrating different data into a format in which they can be compared and analysed. Data are mainly obtained from manual digitization and scanning of aerial photographs, paper maps, and existing digital data sets. Remote-sensing satellite imagery and GPS are promising data input sources for GIS.

Database Management and Update
After data are collected and integrated, the GIS must provide facilities that can store and maintain the data. Effective data management has many definitions but should include all of the following aspects: data security, data integrity, data storage and retrieval, and data maintenance abilities.

Geographic Analysis
Data integration and conversion are only a part of the input phase of GIS. What is required next is the ability to interpret and analyze the collected information quantitatively and qualitatively. For example, a satellite image can assist an agricultural scientist in projecting crop yield per hectare for a particular region. For the same region, the scientist also has rainfall data for the past six months, collected through weather station observations, and a map of the soils that shows fertility and suitability for agriculture. The rainfall observations are point data; they can be interpolated to produce a thematic map showing isohyets (contour lines of equal rainfall).
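
The interpolation step can be sketched as follows. The text does not say which interpolation method is used; inverse distance weighting (IDW) is chosen here only because it is simple, and the station coordinates and rainfall values are invented for the example.

```python
from math import hypot

# Hypothetical rainfall stations: (x, y, rainfall in mm).
stations = [(0.0, 0.0, 100.0), (10.0, 0.0, 200.0), (0.0, 10.0, 300.0)]


def idw(x, y, points, power=2.0):
    """Estimate a value at (x, y) by inverse distance weighting."""
    weighted_sum = 0.0
    weight_total = 0.0
    for px, py, value in points:
        distance = hypot(x - px, y - py)
        if distance == 0.0:
            return value  # exactly on a station: use its observation
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Interpolate onto a coarse grid; isohyets could then be contoured from it.
grid = [[round(idw(x, y, stations), 1) for x in (0, 5, 10)] for y in (0, 5, 10)]
for row in grid:
    print(row)
```

Each grid value is a weighted average of the station observations, with nearer stations counting more. Drawing the contour lines from the grid is a separate step not shown here.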

Presenting Results
One of the most exciting aspects of GIS technology is the variety of different ways in which the information can be presented once it has been processed by GIS. Traditional methods of tabulating and graphing data can be supplemented by maps and three dimensional images. Visual communication is one of the most fascinating aspects of GIS technology and is available in a diverse range of output options.

Data Capture: An Introduction
The functionality of GIS relies on the quality of data available, which, in most developing countries, is either redundant or inaccurate. Although GIS are being used widely, effective and efficient means of data collection have yet to be systematically established. The true value of GIS can only be realized if the proper tools to collect spatial data and integrate them with attribute data are available.

Manual Digitization
Manual digitizing is still the most common method for entering maps into GIS. The map to be digitized is affixed to a digitizing table, and a pointing device (called the digitizing cursor or mouse) is used to trace the features of the map. These features can be boundary lines between mapping units, other linear features (rivers, roads, etc.), or point features (sampling points, rainfall stations, etc.). The digitizing table electronically encodes the position of the cursor with a precision of a fraction of a millimeter. The most common digitizing table uses a fine grid of wires embedded in the table. The vertical wires record the X-coordinates, and the horizontal ones, the Y-coordinates.

The range of digitized coordinates depends upon the density of the wires (called the digitizing resolution) and the settings of the digitizing software. The active area of a digitizing table is normally a rectangle in the middle, separated from the outer boundary of the table by a small rim. Outside this active area, no coordinates are recorded. The lower left corner of the active area has the coordinates x = 0 and y = 0. Therefore, make sure that the map (or the part of it) that you want to digitize is always fixed within the active area.

Scanning System
The second method of obtaining vector data is with the use of scanners. Scanning (or scan digitizing) provides a quicker means of data entry than manual digitizing. In scanning, a digital image of the map is produced by moving an electronic detector across the map surface. The output of a scanner is a digital raster image, consisting of a large number of individual cells ordered in rows and columns. For the conversion to vector format, two types of raster image can be used.

  • In the case of choropleth maps or thematic maps, such as geological maps, the individual mapping units can be separated by the scanner according to their different colours or grey tones. The resulting images are colour or grey-tone images.

  • In the case of scanned line maps, such as topographic maps, the result is a black-and-white image. Black lines are converted to a value of 1, and the white areas in between lines will obtain a value of 0 in the scanned image. These images, with only two possibilities (1 or 0) are also called binary images.

    The raster image is processed by a computer to improve the image quality and is then edited and checked by an operator. It is then converted into vector format by special computer programmes, which are different for colour/grey tone images and binary images.
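
The binary image described for scanned line maps can be illustrated with a simple threshold. This is a sketch only: the pixel values and the threshold of 128 are assumptions, and real scanning software applies more elaborate clean-up before vectorization.

```python
# Hypothetical grey-tone scan: 0 = black ink, 255 = white paper.
scan = [
    [250, 248, 30, 251],
    [249, 25, 28, 247],
    [20, 31, 252, 250],
]


def to_binary(image, threshold=128):
    """Return 1 for dark (line) cells and 0 for light (background) cells."""
    return [[1 if cell < threshold else 0 for cell in row] for row in image]


binary = to_binary(scan)
for row in binary:
    print(row)
```

The cells with value 1 trace the scanned line; these are the cells a vectorization programme would later convert into line segments.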


Scanning works best with maps that are very clean, simple, relate to one feature only, and do not contain extraneous information, such as text or graphic symbols. For example, a contour map should contain only the contour lines, without height indications, drainage network, or infrastructure. In most cases, such maps will not be available and must be drawn especially for the purpose of scanning. Scanning and conversion to vector are therefore beneficial only in large organizations, where a large number of complex maps are entered. In most cases, however, manual digitizing will be the only useful method for entering spatial data in vector format.

Figure 3

Data Conversion
While manipulating and analyzing data, the same format should be used for all data. This implies that, when different layers are to be used simultaneously, they should all be in vector or all in raster format. Usually the conversion is from vector to raster, because the biggest part of the analysis is done in the raster domain. Vector data are transformed to raster data by overlaying a grid with a user-defined cell size.
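
Vector-to-raster conversion can be sketched for a single polygon. The rule used here, marking a cell when its centre falls inside the polygon, is one common convention and is an assumption, as are the function names; the point-in-polygon test is the standard ray-casting method.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the polygon (list of vertices)?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(polygon, width, height, cell_size):
    """Overlay a grid and mark each cell whose centre falls in the polygon."""
    rows = int(height / cell_size)
    cols = int(width / cell_size)
    grid = []
    for r in range(rows):
        # Row 0 is the top of the map, so y decreases as r increases.
        y = height - (r + 0.5) * cell_size
        grid.append([
            1 if point_in_polygon((c + 0.5) * cell_size, y, polygon) else 0
            for c in range(cols)
        ])
    return grid


# Hypothetical mapping unit: a square from (1, 1) to (3, 3) on a 4 x 4 map.
square = [(1, 1), (3, 1), (3, 3), (1, 3)]
for row in rasterize(square, width=4, height=4, cell_size=1):
    print(row)
```

Calling rasterize with a smaller cell_size gives a finer grid that follows the polygon boundary more closely, at the cost of more cells to store.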

Sometimes the data in the raster format are converted into vector format. This is the case especially if one wants to achieve data reduction because the data storage needed for raster data is much larger than for vector data.

A digital data file with spatial and attribute data might already exist in some way or another. There might be a national database or specific databases from ministries, projects, or companies. In some cases a conversion is necessary before these data can be downloaded into the desired database.

The commonly used attribute databases are dBase and Oracle. Sometimes spreadsheet programmes like Lotus, Quattro, or Excel are used, although these cannot be regarded as real database software.

Remote-sensing images are digital datasets recorded by satellite operating agencies and stored in their own image database. They usually have to be converted into the format of the spatial (raster) database before they can be downloaded.


Spatial Data Management

Geo-Relational Data Model
All spatial data files will be geo-referenced. Geo-referencing refers to the location of a layer or coverage in space, defined by the coordinate referencing system. The geo-relational approach involves abstracting geographic information into a series of independent layers or coverages, each representing a selected set of closely associated geographic features (e.g., roads, land use, rivers, settlements, etc.). Each layer has the theme of a geographic feature, and the database is organized in thematic layers.

With this approach users can combine simple feature sets representing complex relationships in the real world. This approach borrows heavily from the concepts of relational DBMS, and it is typically closely integrated with such systems. This is fundamental to database organization in GIS.

Topological Data Structure
Topology is the spatial relationship between connecting and adjacent coverage features (e.g., arcs, nodes, polygons, and points). For instance, the topology of an arc includes its from and to nodes (the beginning and the end of the arc, which give its direction) and its left and right polygons. Topological relationships are built from simple elements into complex elements: points (simplest elements), arcs (sets of connected points), and areas (sets of connected arcs). Topological data structure, in fact, adds intelligence to the GIS database.
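
The arc topology described above can be sketched as a small record per arc. The class, field names, and the three-arc coverage are hypothetical and serve only to show how adjacency can be read from the stored relationships without examining any coordinates.

```python
from dataclasses import dataclass


@dataclass
class Arc:
    arc_id: int
    from_node: int      # node where the arc begins
    to_node: int        # node where the arc ends
    left_polygon: str   # polygon on the left when travelling from -> to
    right_polygon: str  # polygon on the right when travelling from -> to


# Hypothetical coverage: polygons "A" and "B" share arc 2; "outside" is the
# area beyond the coverage boundary.
arcs = [
    Arc(1, 1, 2, "outside", "A"),
    Arc(2, 2, 1, "B", "A"),
    Arc(3, 2, 1, "outside", "B"),
]


def arcs_bounding(polygon, all_arcs):
    """Arcs that form the boundary of a polygon."""
    return [a.arc_id for a in all_arcs
            if polygon in (a.left_polygon, a.right_polygon)]


def adjacent_polygons(all_arcs):
    """Pairs of polygons that share an arc, read directly from topology."""
    return {frozenset((a.left_polygon, a.right_polygon)) for a in all_arcs}


print(arcs_bounding("A", arcs))
print(frozenset(("A", "B")) in adjacent_polygons(arcs))
```

Because each arc stores its left and right polygons, the question "which polygons are neighbours?" becomes a table lookup rather than a geometric computation.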

Attribute Data Management
All data within a GIS (spatial data as well as attribute data) are stored within databases. A database is a collection of information about things and their relationships to each other. For example, you can have an engineering geological database containing information about soil and rock types, field observations and measurements, and laboratory results. This is interesting data, but not very useful if the laboratory data, for example, cannot be related to soil and rock types.

The objective of collecting and maintaining information in a database is to relate facts and situations that were previously separate.

The principal characteristics of a DBMS are:

  • Centralized control over the database is possible, allowing for better quality management and operator-defined access to parts of the database.
  • Data can be shared effectively by different applications.
  • Access to the data is much easier, due to the use of a user interface and user views (specially designed forms for entering and consulting the database).
  • Data redundancy (storage of the same data in more than one place in the database) can be avoided as much as possible. Redundancy, or unnecessary duplication of data, is an annoyance because it makes updating the database much more difficult: one can easily overlook a redundant copy when information changes.
  • The creation of new applications is much easier with a DBMS.

The disadvantages relate to the higher cost of purchasing the software, the increased complexity of management, and the higher risk, as data are centrally managed.


Relational Database -- Concepts & Model
The relational data model is conceived as a series of tables, with no hierarchy or predefined relations. The relations between the various tables are made by the user. This is done by identifying a common field in two tables, which is assigned as the key. This gives more flexibility than the other two data models (hierarchical and network). However, accessing the database is slower than with the other two models. Due to its greater flexibility, the relational data model is used by nearly all GIS systems.
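
Relating two tables through a common field can be sketched with Python's built-in sqlite3 module. The table names, column names, and values below are hypothetical, loosely following the soil and laboratory example given earlier; SQLite is used only because it needs no setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE soil_units (unit_id INTEGER PRIMARY KEY, soil_type TEXT);
    CREATE TABLE lab_results (sample_id INTEGER PRIMARY KEY,
                              unit_id INTEGER, clay_percent REAL);
    INSERT INTO soil_units VALUES (1, 'loam'), (2, 'clay');
    INSERT INTO lab_results VALUES (10, 1, 18.5), (11, 2, 52.0), (12, 2, 48.0);
""")

# The user relates the two tables through the common field unit_id.
rows = conn.execute("""
    SELECT s.soil_type, AVG(l.clay_percent)
    FROM soil_units AS s
    JOIN lab_results AS l ON l.unit_id = s.unit_id
    GROUP BY s.soil_type
    ORDER BY s.soil_type
""").fetchall()
print(rows)  # [('clay', 50.0), ('loam', 18.5)]
conn.close()
```

Nothing in either table declares the relation in advance; it exists only in the query, which is the flexibility the relational model offers.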

Choosing geographic data
The main purpose of purchasing a geographic information system (GIS) is to produce results for your organization. Choosing the right GIS/mapping data will help you produce those results effectively. This paper covers:

  • The role of base-map data in your GIS.
  • The common characteristics of geographic data.
  • The commonly available data sources.
  • Guidelines for evaluating the suitability of any data set for your project.

The world of GIS data is complex, but by choosing the right data set you can save significant amounts of money and, even more importantly, quickly begin your GIS project.

Data: The Core of Your Mapping / GIS Project

When most people begin a GIS project, their immediate concern is with purchasing computer hardware and software. They enter into lengthy discussions with vendors about the merits of various components and carefully budget for acquisitions. Yet they often give little thought to the core of the system, the data that goes inside it. They fail to recognize that the choice of an initial data set has a tremendous influence on the ultimate success of their GIS project.

Data, the core of any GIS project, must be accurate - but accuracy is not enough. Having the appropriate level of accuracy is vital. Since an increase in data accuracy increases acquisition and maintenance costs, data that is too detailed for your needs can hurt a project just as surely as inaccurate data can. All any GIS project needs is data accurate enough to accomplish its objectives and no more. For example, you would not purchase an engineering workstation to run a simple word-processing application. Similarly, you would not need third-order survey accuracy for a GIS-based population study whose smallest unit of measurement is a county. Purchasing such data would be too costly and inappropriate for the project at hand. Even more critically, collecting overly complex data could be so time-consuming that the GIS project might lose support within the organization.

Even so, many people argue that, since GIS data can far outlast the hardware and software on which it runs, no expense should be spared in its creation. Perfection, however, is relative. Projects and data requirements evolve. Rather than overinvest in data, invest reasonably in a well-documented, well-understood data foundation that meets today's needs and provides a path for future enhancements. This approach is a key to successful GIS project implementation.

Are Your Data Needs Simple or Complex?
Before you start your project, take some time to consider your objectives and your GIS data needs. Ask yourself, "Are my data needs complex or simple?"

If you just need a map as a backdrop for other information, your data requirements are simple. You are building a map for your specific project, and you are primarily interested in displaying the necessary information, not in the map itself. You do not need highly accurate measurements of distances or areas or to combine maps from different sources. Nor do you want to edit or add to the map's basic geographic information.

An example of simple data requirements is a map for a newspaper story that shows the location of a fire. Good presentation is important; absolute accuracy is not.

If you have simple data needs, read this paper to get the overall picture of what GIS data is and how it fits into your project. A project with simple data requirements can be started with inexpensive maps. Your primary interests will be quality graphic-display characteristics and finding maps that are easy to use with your software. You need not be as concerned with technical mapping issues. However, basic knowledge of concepts such as coordinate systems, absolute accuracy, and file formats will help you understand your choices and make informed decisions when it's time to add to your system.

What issues suggest more complex GIS data needs?
  • Building a GIS to be used by many people over a long period of time.
  • Storing and maintaining database information about geographic features.
  • Making accurate engineering measurements from the map.
  • Editing or adding to the map.
  • Combining a variety of information from different sources.

An example of a system requiring complex data would be a GIS built to manage infrastructure for an electric utility.

If your data requirements are complex, you ought to pay particular attention to the sections of this paper that discuss data accuracy, coordinate systems, layering, file formats, and the issues involved in combining data from different sources.

Also keep in mind that projects evolve, and simple data needs expand into complex ones as your project moves beyond its original objectives. If you understand the basics of your data set, you will make better decisions as your project grows.


Basics of Digital Mapping

Vector vs. Raster Maps
The most fundamental concept to grasp about any type of graphic data is the distinction between vector data and raster data. These two data types are as different as night and day, yet they can look the same. For example, a question that commonly comes up is "How can I convert my TIFF files into DXF files?" The answer is "With difficulty," because TIFF is a raster data format and DXF™ (drawing interchange format) is a vector format, and converting from raster to vector is not simple. Raster maps are best suited to some applications while vector maps are suited to others.

Figure 4

Raster data represents a graphic object as a pattern of dots, whereas vector data represents the object as a set of lines drawn between specific points. Consider a line drawn diagonally on a piece of paper. A raster file would represent this image by subdividing the paper into a matrix of small rectangles called cells, similar to a sheet of graph paper (figure 1). Each cell is assigned a position in the data file and given a value based on the color at that position. White cells could be given the value 0; black cells, the value 1; grays would fall in between. This data representation allows the user to easily reconstruct or visualize the original image.

Figure 5

A vector representation of the same diagonal line would record the position of the line by simply recording the coordinates of its starting and ending points. Each point would be expressed as two or three numbers (depending on whether the representation is 2D or 3D), often referred to as X,Y or X,Y,Z coordinates (figure 2). The first number, X, is the distance between the point and the left side of the paper; Y, the distance between the point and the bottom of the paper; Z, the point's elevation above or below the paper. The vector is formed by joining the measured points.
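
The two representations of the diagonal line can be compared in a short sketch. The page size of 8 units, the cell size, and the sampling approach are assumptions chosen for brevity; sampling points along the line is a simplification, not the method a scanner or a GIS would use.

```python
# Vector: the diagonal line is just its two end points (X, Y).
line = ((0.0, 0.0), (8.0, 8.0))


def rasterize_line(start, end, size, cell_size=1.0, samples=200):
    """Approximate the line on a grid by sampling points along it.

    Cells touched by a sample get the value 1 (black); all others stay 0.
    """
    n = int(size / cell_size)
    grid = [[0] * n for _ in range(n)]
    (x0, y0), (x1, y1) = start, end
    for i in range(samples + 1):
        t = i / samples
        x = x0 + t * (x1 - x0)
        y = y0 + t * (y1 - y0)
        col = min(int(x / cell_size), n - 1)
        row = min(int(y / cell_size), n - 1)
        grid[n - 1 - row][col] = 1  # row 0 is the top of the page
    return grid


raster = rasterize_line(*line, size=8)
for row in raster:
    print("".join("#" if cell else "." for cell in row))

vector_values = sum(len(point) for point in line)   # two points x two numbers
raster_values = len(raster) * len(raster[0])        # one value per cell
print("vector values stored:", vector_values)
print("raster values stored:", raster_values)
```

The vector form stores four numbers, while even this coarse 8 x 8 raster stores 64 cell values, most of them representing blank paper.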

Some basic properties of raster and vector data are outlined below.
  • Each entity in a vector file appears as an individual data object. It is easy to record information about an object or to compute characteristics such as its exact length or surface area. It is much harder to derive this kind of information from a raster file because raster files contain little (and sometimes no) geometric information.
  • Some applications can be handled much more easily with raster techniques than with vector techniques. Raster works best for surface modeling and for applications where individual features are not important. For example, a raster surface model can be very useful for performing cut-and-fill analyses for road-building applications, but it doesn't tell you much about the characteristics of the road itself. Terrain elevations can be recorded in a raster format and used to construct digital elevation models (DEMs) (figure 3). Some land-use information comes in raster format.

    Figure 6

  • Raster files are often larger than vector files. The raster representation of the line in the example above required a data value for each cell on the page, whereas the vector representation only required the positions of two points.


The size of the cells in a raster file is an important factor. Smaller cells improve image quality because they increase detail. As cell size increases, image definition decreases or blurs. In the example, the position of the line's edge is defined most clearly if the cells are very small. However, there is a trade-off: Dividing the cell size in half increases file size by a factor of four. 
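
The factor of four follows from the cell count growing in both directions at once. A minimal sketch, assuming a hypothetical 200 mm x 100 mm sheet and one stored value per cell:

```python
def cell_count(width, height, cell_size):
    """Number of cells needed to cover a width x height sheet."""
    return round(width / cell_size) * round(height / cell_size)


# Hypothetical 200 mm x 100 mm sheet; cell sizes in mm.
for cell_size in (1.0, 0.5, 0.25):
    print(cell_size, cell_count(200, 100, cell_size))
```

Halving the cell size doubles the number of columns and the number of rows, so the count (and the uncompressed file size) is multiplied by four each time.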

Cell size in a raster file is referred to as resolution. For a given resolution value, the raster cost does not increase with image complexity. That is, any scanner can quickly make a raster file. It takes no more effort to scan a map of a dense urban area than to scan a sparse rural one. On the other hand, a vector file requires careful measuring and recording of each point, so an urban map will be much more time-consuming to draw than a rural map. The process of making vector maps is not easily automated, and cost increases with map complexity. 

Because raster data is often more repetitive and predictable, it can be compressed more easily than vector data. Many raster formats, such as TIFF, have compression options that drastically reduce image sizes, depending upon image complexity and variability.
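
Why repetitive raster data compresses well can be shown with run-length encoding, one of the simplest compression schemes. This is an illustration of the principle only, not a description of how any particular TIFF compression option works; the function names and the sample row are hypothetical.

```python
def run_length_encode(row):
    """Compress a row of cell values into [value, run_length] pairs."""
    runs = []
    for value in row:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return runs


def run_length_decode(runs):
    """Rebuild the original row from its runs."""
    return [value for value, length in runs for _ in range(length)]


# One row of a binary line map: mostly white (0) with a short black run (1).
row = [0] * 12 + [1] * 3 + [0] * 9
encoded = run_length_encode(row)
print(encoded)  # [[0, 12], [1, 3], [0, 9]]
```

The 24 cell values reduce to three runs. A row in which neighbouring cells rarely repeat would gain little or nothing, which is why the achievable compression depends on image complexity and variability.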

Raster files are most often used:

  • For digital representations of aerial photographs, satellite images, scanned paper maps, and other applications with very detailed images.
  • When costs need to be kept down.
  • When the map does not require analysis of individual map features.
  • When "backdrop" maps are required.

In contrast, vector maps are appropriate:

  • For highly precise applications.
  • When file sizes are important.
  • When individual map features require analysis.
  • When descriptive information must be stored.

Raster and vector maps can also be combined visually. For example, a vector street map could be overlaid on a raster aerial photograph. The vector map would provide discrete information about individual street segments, the raster image, a backdrop of the surrounding environment.

Digital Map Formats: How Data Is Stored
The term file format refers to the logical structure used to store information in a GIS file. File formats are important in part because not every GIS software package supports all formats. If you want to use a data set, but it isn't available in a format that your GIS supports, you will have to find a way to transform it, find another data set, or find another GIS.

Almost every GIS has its own internal file format. These formats are designed for optimal use inside the software and are often proprietary. They are not designed for use outside their native systems. Most systems also support transfer file formats. Transfer formats are designed to bring data in and out of the GIS software, so they are usually standardized and well documented.

If your data needs are simple, your main concern will be with the internal format that your GIS software supports. If you have complex data needs, you will want to learn about a wider range of transfer formats, especially if you want to mix data from different sources. Transfer formats will be required to import some data sets into your software.

Vector Formats
Many GIS applications are based on vector technology, so vector formats are the most common. They are also the most complex because there are many ways to store coordinates, attributes, attribute linkages, database structures, and display information. Some of the most common formats are briefly described below.

Common Vector File Formats
Format Name | Software Platform | Internal or Transfer | Developer | Comments
Arc Export | ARC/INFO* | Transfer | Environmental Systems Research Institute, Inc. (ESRI) | Transfers data across ARC/INFO* platforms.
ARC/INFO* Coverages | ARC/INFO* | Internal | ESRI |
AutoCAD Drawing Files (DWG) | AutoCAD* | Internal | Autodesk |
Autodesk Data Interchange File (DXF™) | Many | Transfer | Autodesk | Widely used graphics transfer standard.
Digital Line Graphs (DLG) | Many | Transfer | United States Geological Survey (USGS) | Used to publish USGS digital maps.
Hewlett-Packard Graphic Language (HPGL) | Many | Internal | Hewlett-Packard | Used to control HP plotters.
MapInfo Data Transfer Files (MIF/MID) | MapInfo* | Transfer | MapInfo Corp. |
MapInfo Map Files | MapInfo* | Internal | MapInfo Corp. |
MicroStation Design Files (DGN) | MicroStation* | Internal | Bentley Systems, Inc. |
Spatial Data Transfer Standard (SDTS) | Many (in the future) | Transfer | US Government | New US standard for vector and raster geographic data.
Topologically Integrated Geographic Encoding and Referencing (TIGER) | Many | Transfer | US Census Bureau | Used to publish US Census Bureau maps.
Vector Product Format (VPF) | Military mapping systems | Both | US Defense Mapping Agency | Used to publish Digital Chart of the World.


Raster Formats
Raster files generally are used to store image information, such as scanned paper maps or aerial photographs. They are also used for data captured by satellite and other airborne imaging systems. Images from these systems are often referred to as remote-sensing data. Unlike other raster files, which express resolution in terms of cell size and dots per inch (dpi), resolution in remotely sensed images is expressed in meters, which indicates the size of the ground area covered by each cell.

Some common raster formats are described below.

Format Name | Software Platform | Internal or Transfer | Developer | Comments
Arc Digitized Raster Graphics (ADRG) | Military mapping systems | Both | US Defense Mapping Agency |
Band Interleaved by Line (BIL) | Many | Both | | Common remote-sensing standard.
Band Interleaved by Pixel (BIP) | Many | Both | | Common remote-sensing standard.
Band Sequential (BSQ) | Many | Both | | Common remote-sensing standard.
Digital Elevation Model (DEM) | Many | Transfer | United States Geological Survey (USGS) | USGS standard format for digital terrain models.
PC Paintbrush Exchange (PCX) | PC Paintbrush | Both | Zsoft | Widely used raster format.
Spatial Data Transfer Standard (SDTS) | Many (in the future) | Transfer | US Federal Government | New US standard for both raster and vector geographic data; raster version still under development.
Tagged Image File Format (TIFF) | PageMaker | Both | Aldus | Widely used raster format.

An Example of Raster and Vector Integration

Figure 7: An Example of Raster and Vector Integration


Vector and Raster Data Models: Merits and Demerits

Raster model, merits:
  • Simple data structure
  • Easy and efficient overlaying
  • Compatible with RS imagery
  • High spatial variability is efficiently represented
  • Simple to program
  • Same grid cells for several attributes

Raster model, demerits:
  • Inefficient use of computer storage
  • Errors in perimeter and shape
  • Difficult network analysis
  • Inefficient projection transformations
  • Loss of information when using large cells
  • Less accurate (although interactive) maps

Vector model, merits:
  • Compact data structure
  • Efficient for network analysis
  • Efficient projection transformation
  • Accurate map output

Vector model, demerits:
  • Complex data structure
  • Difficult overlay operations
  • High spatial variability is inefficiently represented
  • Not compatible with RS imagery

Hybrid System
A hybrid system integrates the best of the vector and raster models. GIS technology is moving fast towards the hybrid model.

The Integration of Vector and Raster Systems: Hybrid System

Figure 8: The Integration of Vector and Raster Systems: Hybrid System


Source: GIS Development (