Wish List - Images

Any Image

There are a number of approaches to object recognition and more are emerging on a regular basis. Various web-based services (e.g., Google Images) will perform object recognition on an uploaded 2D image. So, we may only need to convince a service that AxAp is a browser and then present the resulting information in an accessible manner.

Assorted tools (e.g., ImageMagick) can transform (e.g., crop, pan, filter, zoom) images. Images can also be rendered into non-visual media (e.g., stereo audio, tactile graphics), but there will be a considerable loss of resolution. For example, the resolution of a braille embosser is about 0.1", as opposed to the 0.01" resolution of a typical computer display.

The vOICe is experimental software for navigational assistance, based on a form of sensory substitution. It turns an image into a left-to-right stereo sweep, mapping grayscale intensity to audio amplitude and vertical position to pitch. Although The vOICe is normally used to interpret video (e.g., from head-mounted camera), it has also been used to interpret static images.

Input Output Brief Description of Feature ED SI SU WIP Activities
Image (eg, PDF, PNG, SVG) Audio sonify images (eg, via The vOICe) 3 7 7 AUR, DGA
Image (eg, PDF, PNG, SVG) HTML identify types of pictured objects 3 7 7 AUR, DGA
Image (eg, PDF, PNG, SVG) Image transform, filter images (eg, pan, zoom) 3 7 7 AUR, DGA
Image (eg, PDF, PNG, SVG) Tactile render images (eg, via embosser) 3 7 7 AUR, DGA
Back to Summary Table

Diagram

In a graph-based diagram (e.g., data flow diagram, structure chart), most of the information is encoded in the entity labels and connectivity. Entity shapes and connection styles, along with geometric placement, provide further hints about the intended meaning. Various programs (e.g., Dia, DOT, Microsoft Visio, OmniGraffle) generate graph-based diagrams. Assuming that a diagram is encoded in SVG, detailed geometric data will be readily available. However, deeper analysis may be challenging.

If the "path" and "text" entries for a graph entity aren't explicitly grouped, we may need to analyze the entire diagram's geometry to determine which entries go together. Getting connectivity information for entities and edges presents similar problems. Recognizing entity and link types (e.g., "squashed rectangle", "dotted line") could also be challenging, given the range of possible types and variations.

Input Output Brief Description of Feature ED SI SU WIP Activities
Diagram (eg, Visio SVG) SVG extract structure of graph diagrams 5 5 5 AUR, DGA
Back to Summary Table

Infographic

It could be very useful to extract the type (e.g., histogram, pie chart) and semantic content (e.g., data values, labels) from a chart or plot. However, our ability to do this is constrained by the file's data and metadata, which may be only loosely related to the chart's semantic payload.

Assuming that a chart was generated in SVG format (or converted to it), detailed geometric data can be extracted. However, further analysis and interpretation may be challenging. For example, what are the meanings of the geometric constructs? What scale factor(s) were applied? That said, it should be possible to recognize and deconstruct SVG idioms generated by programs such as Microsoft Excel.

Some of these issues can be finessed in SVG images produced by D3.js. The input data, which tends to be stored with domain-relevant names and units, can be accessed by a JavaScript interpreter. However, the D3 programming model may get in the way. Although the base code for an application may have been downloaded, each D3 script is unique. So, recognizing the type of plot (let alone the meaning of the data) could be difficult or even impossible.

That said, D3 offers some interesting possibilities for data sonification. If we could extend the D3 library in various ways, it might be possible to capture and sonify the input data without changing the application code.

Input Output Brief Description of Feature ED SI SU WIP Activities
Infographic (eg, D3 SVG) Audio generate interactive data sonification 7 5 5 AUR
Chart or Plot (eg, D3 SVG) JSON extract type and semantic content 7 5 5 AUR
Chart or Plot (eg, Excel SVG) JSON extract type and semantic content 7 5 5 AUR
Back to Summary Table

Raster Image

In computer graphics, a raster graphics image is a dot matrix data structure, representing a generally rectangular grid of pixels, or points of color, viewable via a monitor, paper, or other display medium. (WP)

Despite the fact that raster images are ubiquitous in digital documents, there are no convenient ways for blind users to extract information from them. Fortunately, a number of promising technologies are available for integration and use.

Optical character recognition (OCR) is a well-developed technology, provided by a large number of software libraries, application packages, and web sites. The input would typically be an image from some web site (e.g., Facebook), slides from a presentation, or a scanned sheet of paper. Ideally, the software would extract structure and style information, along with text.

Let's say that we have a raster image of a plot, histogram, etc. Now, we'd like to get an idea of the represented data. If we can approximate the image in SVG format (e.g., as path elements), we may be able to extract useful numerical information from it.

It may be possible to improve the accessibility of raster images for some visually-impaired users by improving the contrast, color usage, etc.

Input Output Brief Description of Feature ED SI SU WIP Activities
Raster Image (eg, PNG) JSON extract text and formatting, via OCR 3 7 7 AUR, DGA
Raster Image (eg, PNG) PNG adjust contrast, color usage, etc. 3 7 7 AUR, DGA
Raster Image (eg, PNG) SVG approximate image using path, etc. 5 5 5 AUR
Back to Summary Table

Vector Image

Vector graphics is the use of polygons to represent images in computer graphics. Vector graphics are based on vectors, which lead through locations called control points or nodes. Each of these points has a definite position on the x and y axes of the work plane and determines the direction of the path; further, each path may be assigned various attributes, including such values as stroke color, shape, curve, thickness, and fill. (WP)

Vector images are increasingly common in digital documents, largely because of their compact size and ability to handle geometric scaling. Vector images tend to be encoded in a declarative manner, describing a layered process similar to screen printing. So, it's possible to convert vector images from (say) PDF to SVG.

In addition, whereas the text in raster images has been transformed into pixels, the text in most vector images can generally be extracted in a relatively simple fashion. Extracting formatting information may also be possible.

Input Output Brief Description of Feature ED SI SU WIP Activities
Vector Image (eg, PDF) SVG convert description of geometry 5 3 3 AUR
Vector Image (eg, PDF, SVG) JSON extract text and formatting 3 3 3 AUR
Back to Summary Table

See Also

  • SVG - Scalable Vector Graphics breakout page


This wiki page is maintained by Rich Morin, an independent consultant specializing in software design, development, and documentation. Please feel free to email comments, inquiries, suggestions, etc!

Topic revision: r12 - 28 Nov 2016, RichMorin
This site is powered by Foswiki Copyright © by the contributing authors. All material on this wiki is the property of the contributing authors.
Foswiki version v2.1.6, Release Foswiki-2.1.6, Plugin API version 2.4
Ideas, requests, problems regarding CFCL Wiki? Send us email