Extraction of Metadata from Images


Rohit S¹, Dr. M N Nachappa²

¹MCA Student, Jain Deemed-to-be University, Karnataka, India.

²Dean, School of CS & IT, Jain Deemed-to-be University, Karnataka, India.

*Corresponding Author E-mail: rohit.s0898@gmail.com, mnnachappa@gmail.com  



Metadata is information that describes one or more aspects of data. It summarizes basic information about the data, which makes tracking and working with specific data easier. The idea of metadata is often extended to include words or phrases that stand for objects or “entities” in the world, leading to the notion of entity extraction. In this paper, we propose a system that extracts the metadata of the files that the user uploads. It is built with Flask as the web framework and Python as the programming language. Our goal is a free and lightweight metadata extractor that is efficient and user friendly.


KEYWORDS: Metadata, Image Forensic, Image Format, Data Retrieval, Auto Categorization.



Metadata is a kind of highly structured document summary. The idea of metadata is often expanded to include words or phrases that stand for objects or “entities” in the world, leading to the notion of entity extraction. Ordinary documents are full of such terms: phone numbers, fax numbers, street addresses, email addresses, email signatures, abstracts, tables of contents, lists of references, tables, figures, captions, meeting announcements, Web addresses, and more. In addition, there are countless domain-specific entities, such as international standard book numbers (ISBNs), stock symbols, chemical structures, and mathematical equations. These terms act as single vocabulary items, and many document processing tasks can be significantly improved if they are identified as such. They can aid searching, interlinking, and cross-referencing between documents. Many short documents describe a particular kind of object or event, combining entities into a higher-level composite that represents the document’s entire content. The task of identifying the composite structure, which can often be represented as a template with slots that are filled by individual pieces of structured information, is called information extraction. Typical extraction problems require finding the predicate structure of a small set of predetermined propositions.



Nowadays, every modern digital camera can record metadata, including many camera settings and other relevant data, directly into photographs. These settings can later be used to organize photographs, perform searches, and tell photographers how a particular photograph was captured. This metadata can also give away many user details, such as geolocation and the name of the camera or its owner.
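As an illustration of how much the geolocation fields reveal, the sketch below converts a GPS position from the form Exif stores it in (three rational numbers for degrees, minutes, and seconds, plus a hemisphere letter) into decimal degrees that can be pasted into any map service. The function name `dms_to_decimal` and the sample coordinates are hypothetical and are not taken from the system described in this paper.

```python
from fractions import Fraction


def dms_to_decimal(dms, ref):
    """Convert an Exif-style (degrees, minutes, seconds) triple to decimal degrees.

    `dms` holds three (numerator, denominator) pairs, which is how Exif
    stores rational values; `ref` is the hemisphere letter "N", "S", "E" or "W".
    """
    degrees, minutes, seconds = (Fraction(n, d) for n, d in dms)
    value = degrees + minutes / 60 + seconds / 3600
    # Southern and western hemispheres are negative in decimal notation.
    if ref.upper() in ("S", "W"):
        value = -value
    return float(value)


# Hypothetical sample values, not taken from a real photograph.
latitude = dms_to_decimal(((12, 1), (58, 1), (1296, 100)), "N")
longitude = dms_to_decimal(((77, 1), (35, 1), (4068, 100)), "E")
print(round(latitude, 6), round(longitude, 6))
```

Using `Fraction` keeps the arithmetic exact until the final conversion to `float`, which avoids rounding drift when the seconds field has a large denominator.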



3.1 Existing System

Many data extraction tools already exist. Exif derives its structure from TIFF, which locates data through offset pointers, so the data can be spread anywhere within a file. Software that rewrites a file is therefore likely to corrupt any pointers or associated data that it does not decode and re-encode. For this reason, most image editors damage or remove the Exif metadata to some extent upon saving. In addition, there is no standard field for recording the readouts of a camera's accelerometers or inertial navigation system. Such data could help to establish the relationship between the image sensor's XYZ coordinate system and the gravity vector.
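The offset-pointer layout mentioned above can be made concrete with a small sketch. It reads a TIFF header and one image file directory (IFD) using only the standard `struct` module. The helper names `parse_tiff_header` and `read_ifd` and the hand-built sample bytes are illustrative assumptions, not the parser used in this work. The value field of each entry is returned as a raw 32-bit number, and interpreting it according to the entry type is left out.

```python
import struct

EXIF_IFD_POINTER = 0x8769  # tag whose value is the offset of the Exif sub-IFD


def parse_tiff_header(buf):
    """Return (endian_prefix, offset_of_first_ifd) for a TIFF-structured buffer."""
    order = buf[:2]
    if order == b"II":
        endian = "<"  # little-endian ("Intel") byte order
    elif order == b"MM":
        endian = ">"  # big-endian ("Motorola") byte order
    else:
        raise ValueError("not a TIFF header")
    magic, first_ifd = struct.unpack_from(endian + "HI", buf, 2)
    if magic != 42:
        raise ValueError("bad TIFF magic number")
    return endian, first_ifd


def read_ifd(buf, offset, endian):
    """Return ([(tag, type, count, value_or_offset), ...], next_ifd_offset)."""
    (count,) = struct.unpack_from(endian + "H", buf, offset)
    entries = []
    for i in range(count):
        # Each IFD entry is 12 bytes: tag, type, count, value-or-offset.
        entries.append(struct.unpack_from(endian + "HHII", buf, offset + 2 + 12 * i))
    (next_ifd,) = struct.unpack_from(endian + "I", buf, offset + 2 + 12 * count)
    return entries, next_ifd


# Hand-built little-endian sample (hypothetical data): the header points to
# IFD0 at offset 8, whose single entry points to an empty sub-IFD at offset 26.
sample = (
    b"II" + struct.pack("<HI", 42, 8)
    + struct.pack("<H", 1)
    + struct.pack("<HHII", EXIF_IFD_POINTER, 4, 1, 26)
    + struct.pack("<I", 0)
    + struct.pack("<H", 0)
    + struct.pack("<I", 0)
)

endian, first_ifd = parse_tiff_header(sample)
entries, next_ifd = read_ifd(sample, first_ifd, endian)
pointers = {tag: value for tag, _type, _count, value in entries}
print(hex(EXIF_IFD_POINTER), "->", pointers[EXIF_IFD_POINTER])
```

Every value the parser follows is an absolute offset from the start of the TIFF header, which is why an editor that shifts bytes without rewriting these offsets leaves the pointers dangling. In a JPEG file, this structure sits inside the APP1 segment after the `Exif\0\0` marker.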


3.2 Disadvantages

●     Time Consuming

●     Encoding Issues

●     Process Intensive

●     Different Image formats


3.3 Proposed System

Exif information is not difficult to find or to remove, but the steps vary somewhat depending on the device in use. The proposed system warns the user about the metadata in images and lists all Exif content in an image. This metadata can give away many user details, such as geolocation and the name of the camera or its owner. From a security perspective, this information can be identified and used to alert the user, or it can be used to investigate a suspect. Images are usually stored in a file system or a cloud environment. This tool focuses on mining metadata. This work presents a robust analysis of digital images to detect signs of modification, morphing, or editing by using the image's Exif metadata, thumbnail, and camera traces. Consequently, this methodology could be used effectively for authenticating digital images in forensics.
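A minimal sketch of the warning step follows, assuming the extractor has already produced a dictionary that maps Exif tag names to values. The tag names are standard Exif names, but the choice of which tags count as sensitive, the category labels, and the function name `privacy_report` are assumptions made for illustration.

```python
# Tag names follow the Exif standard; which ones count as "sensitive"
# is an assumption made for this sketch, not a list from the paper.
SENSITIVE_TAGS = {
    "GPSLatitude": "location",
    "GPSLongitude": "location",
    "Make": "device",
    "Model": "device",
    "BodySerialNumber": "device",
    "Artist": "owner",
    "CameraOwnerName": "owner",
    "DateTimeOriginal": "time",
}


def privacy_report(metadata):
    """Group the sensitive tags present in `metadata` by the kind of detail they reveal."""
    report = {}
    for tag, value in metadata.items():
        category = SENSITIVE_TAGS.get(tag)
        # Empty or missing values reveal nothing, so they are skipped.
        if category is not None and value not in (None, ""):
            report.setdefault(category, []).append(tag)
    return {category: sorted(tags) for category, tags in report.items()}


# Hypothetical metadata, as an extractor might return it.
example = {
    "Make": "ExampleCam",
    "Model": "X100",
    "Artist": "",
    "GPSLatitude": 12.97,
    "GPSLongitude": 77.59,
    "ExposureTime": "1/250",
}
for category, tags in sorted(privacy_report(example).items()):
    print(f"warning: image reveals {category} details via {', '.join(tags)}")
```

Keeping the sensitive-tag table separate from the reporting logic means an investigator and a privacy-conscious user can share the same extractor while using different tag lists.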


3.4 Advantages

●     Metadata can often contain the needle in the haystack that an investigator is looking for during a forensic examination. In digital forensics, the recovered data must be properly documented.

●     Digital forensics industry standards require certified computer examiners or forensic specialists to follow certain protocols during their examinations.

●     The main objective of a properly conducted examination or analysis of a computer or digital media by an expert examiner is to find possible evidence through seizure, search, and recovery, while maintaining the "data integrity" of the original or suspect media.

●     We are building a web page for metadata extraction. The person or officer who tries to find the metadata of files does not need deep technical knowledge to use our solution.
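The data integrity requirement in the third point is commonly checked by hashing the evidence file before and after it is examined. The sketch below shows one way to do this with SHA-256 from the standard library. The helper names `sha256_of_file` and `verify_unchanged` are hypothetical, and the temporary file stands in for a real image.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large evidence files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_unchanged(path, analyse):
    """Run `analyse(path)` and confirm the file hash is identical before and after."""
    before = sha256_of_file(path)
    result = analyse(path)
    after = sha256_of_file(path)
    if before != after:
        raise RuntimeError(f"{path} was modified during analysis")
    return result, before


# Demonstration on a throwaway file with hypothetical contents.
with tempfile.NamedTemporaryFile(delete=False, suffix=".jpg") as tmp:
    tmp.write(b"\xff\xd8\xff\xe0 hypothetical image bytes")
    sample_path = tmp.name
try:
    size, digest = verify_unchanged(sample_path, os.path.getsize)
    print(size, digest)
finally:
    os.remove(sample_path)
```

Recording the returned digest alongside the extracted metadata gives the examiner a value that can be recomputed later to show that the original media was not altered.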


4. SOFTWARE DESCRIPTION:


Python is a high-level programming language that is widely used in the developer community. Python was designed with an emphasis on code readability, and its syntax allows programmers to express concepts in fewer lines of code. It lets developers work quickly and integrate systems efficiently, and it is comparatively easy to learn and write. Python programs can be written in any plain text editor, such as Notepad or Notepad++. One can also write Python code in an online IDE, or install an IDE locally, because IDEs provide features such as an intuitive code editor, a debugger, and an integrated interpreter.



Flask is a web framework. It helps you build a web application by providing tools, libraries, and technologies. The web application can be a web page, a wiki, or something as large as a web-based calendar application or a commercial website. Flask is classified as a micro-framework, which means that it has few dependencies on external libraries. This has both an advantage and a disadvantage. The advantage is that there are few dependencies to upgrade and to monitor for security bugs. The disadvantage is that adding plugins increases the number of dependencies.


Flask offers several features that are useful when building a web application, such as:

●     Integrated support for unit testing

●     Built-in development server and fast debugger

●     RESTful request dispatching

●     Unicode based

●     Support for cookies

●     Jinja2 templating

●     WSGI 1.0 compliant

Flask also gives the developer a high degree of control over the project:

●     HTTP request handling functions

●     Flask has a modular, lightweight design, so it can be extended into a fuller web framework with a few extensions

●     You can plug in your favourite ORM

●     The core API is well designed and coherent

●     Highly flexible

●     Flask is easy to deploy in production



5.1 Introduction

The most creative and challenging phase of system development is system design. It provides the understanding and the procedural details necessary for implementing the system recommended in the feasibility study. Design goes through logical and physical stages of development, after which the process moves on to program construction and testing. System design can be defined as the process of applying various techniques and principles for the purpose of defining a device, a process, or a system in sufficient detail to permit its physical realization. Thus, system design is a solution to the “how to” approach to the creation of a new system. This important phase provides a data design, an architectural design, and a procedural design.


5.2 System Architecture


Fig 5.2 Architecture



Implementation is the stage that is crucial in the life cycle of the newly designed system. Planning, training, and system testing are the main stages of implementation. Converting a new or revised system into an operational one is called implementation. It includes all the activities involved in converting an old system into a new one. The new system may be a totally new concept or an upgrade of the old one. Proper implementation is necessary for a reliable system, yet it does not guarantee a successful one. If implementation is not carried out properly, the whole system may fail. The essential review method is data collection through interviews, observation, sampling, and record inspection.


●     Functioning of the equipment

●     Data coding

●     Methods on form transaction

●     Decision support events

●     Data handling events

In our project, a web page built with Flask accepts the input files. Once a file is uploaded to the system, the Python code retrieves its metadata and displays the results on the web page.
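The upload flow described above might be sketched as follows. This is a minimal illustration and not the project's actual code: the route `/extract`, the form field name `file`, and the helper `describe_upload` are assumptions, and the helper reports only file-level details (size, format detected from the leading bytes, and a SHA-256 digest) where a full extractor would add the Exif fields.

```python
import hashlib

# Leading "magic" bytes of a few common image formats.
SIGNATURES = (
    (b"\xff\xd8\xff", "JPEG"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"II*\x00", "TIFF"),
    (b"MM\x00*", "TIFF"),
)


def describe_upload(filename, data):
    """Return basic file-level metadata for the uploaded bytes."""
    image_format = next(
        (name for magic, name in SIGNATURES if data.startswith(magic)), "unknown"
    )
    return {
        "filename": filename,
        "size_bytes": len(data),
        "format": image_format,
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def create_app():
    """Build the Flask app; Flask is imported here so the helper above works without it."""
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/extract", methods=["POST"])
    def extract():
        upload = request.files.get("file")  # form field name "file" is an assumption
        if upload is None or upload.filename == "":
            return jsonify(error="no file uploaded"), 400
        # The result is returned as JSON; a template could render it as a page instead.
        return jsonify(describe_upload(upload.filename, upload.read()))

    return app
```

Calling `create_app().run()` starts Flask's built-in development server, after which a file can be posted to `/extract` from an HTML form or a command-line client. Keeping `describe_upload` free of Flask objects makes the extraction logic testable on its own.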



This technique is a reliable and precise solution to the problem of extracting metadata from documents. Future work can improve the graphical user interface of the tool and add a report generation feature that prints the information produced by the Metadata Extractor.



We successfully built a platform that extracts the metadata of the files that the user uploads to the system. It is built with Flask as the web framework and Python as the programming language. Our goal was to make a free and lightweight metadata extractor that is efficient and user friendly, and we were successful in that. Because metadata extraction is used in forensics, a technical expert is usually needed to perform it. Our tool is very simple to use and can be used by people who are not experts in this field.



I would like to express my deep and sincere gratitude to my research supervisor, Dr. M N Nachappa, Dean, School of CS & IT, for giving me the opportunity to do this research and for providing invaluable guidance throughout it.


My special thanks go to my friend and internship reporting manager, Kalathil Karthik, for the keen interest shown in the successful completion of this thesis.








Received on 14.05.2021            Accepted on 22.06.2021     

© EnggResearch.net All Rights Reserved

International J. Technology. 2021; 11(1):19-22.

DOI: 10.52711/2231-3915.2021.00003