1. PDF file structure

The PDF file structure (that is, the physical structure) consists of four parts: the header, the body, the cross-reference table, and the trailer at the end of the document.

The header states the version of the PDF specification that the document complies with. It appears on the first line of the PDF file.
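As a minimal sketch of reading the header, the following assumes the file starts with the usual `%PDF-x.y` marker. The function name `parse_pdf_version` and the sample bytes are illustrative, not taken from any library.

```python
import re
from typing import Optional

def parse_pdf_version(data: bytes) -> Optional[str]:
    """Return the version from a '%PDF-x.y' header, or None if it is absent."""
    # Only the first line of the file is inspected.
    first_line = data.split(b"\n", 1)[0].rstrip(b"\r")
    match = re.match(rb"%PDF-(\d+\.\d+)", first_line)
    return match.group(1).decode("ascii") if match else None

# Hypothetical sample data: a header followed by one empty object.
sample = b"%PDF-1.7\n1 0 obj\n<< >>\nendobj\n"
version = parse_pdf_version(sample)
```

In real use the bytes would come from `open(path, "rb").read(1024)` or similar, since the header is always at the very start of the file.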

The document body consists of a series of PDF indirect objects.

The cross-reference table is an address index of the indirect objects, built so that any indirect object can be accessed randomly.
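To make the idea of an address index concrete, here is a rough sketch that parses a classic plain-text cross-reference section, where each entry holds a 10-digit byte offset, a 5-digit generation number, and a flag (`n` for in use, `f` for free). It assumes the plain-text table form only; cross-reference streams are not handled. The name `parse_xref_section` and the sample text are illustrative.

```python
def parse_xref_section(text):
    """Map object number -> (byte offset, generation, in_use)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or lines[0] != "xref":
        raise ValueError("not a cross-reference section")
    entries = {}
    i = 1
    while i < len(lines):
        parts = lines[i].split()
        if len(parts) != 2:
            break  # reached something other than a subsection header, e.g. 'trailer'
        first, count = int(parts[0]), int(parts[1])
        i += 1
        for n in range(count):
            offset, generation, flag = lines[i + n].split()
            entries[first + n] = (int(offset), int(generation), flag == "n")
        i += count
    return entries

# Hypothetical sample: one subsection covering objects 0 through 2.
sample_xref = """xref
0 3
0000000000 65535 f
0000000017 00000 n
0000000081 00000 n
trailer
"""
table = parse_xref_section(sample_xref)
```

With such a table, a reader can seek directly to the byte offset of any object instead of scanning the whole body.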

The trailer, at the end of the document, states the address of the cross-reference table and specifies the root object (Catalog) of the document body. It also holds encryption and other security management information.
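The sketch below shows one way a reader might locate the cross-reference table and the root object from the end of a file. It assumes a classic trailer with a `trailer` keyword, a `/Root` entry, and a `startxref` offset followed by `%%EOF`. The helper names and the tiny sample file are hypothetical.

```python
import re

def find_startxref(data: bytes) -> int:
    """Return the byte offset recorded after the last 'startxref' keyword."""
    tail = data[-1024:]  # the trailer sits at the very end of the file
    matches = re.findall(rb"startxref\s+(\d+)\s+%%EOF", tail)
    if not matches:
        raise ValueError("no startxref entry found")
    return int(matches[-1])

def find_root_reference(data: bytes):
    """Return (object number, generation) of the /Root entry in the trailer."""
    trailer_pos = data.rfind(b"trailer")
    if trailer_pos < 0:
        raise ValueError("no trailer keyword found")
    match = re.search(rb"/Root\s+(\d+)\s+(\d+)\s+R", data[trailer_pos:])
    if match is None:
        raise ValueError("trailer has no /Root entry")
    return int(match.group(1)), int(match.group(2))

# Hypothetical one-object file, assembled so that the recorded offset is correct.
body = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
xref_offset = len(body)
sample = (
    body
    + b"xref\n0 2\n0000000000 65535 f \n0000000009 00000 n \n"
    + b"trailer\n<< /Size 2 /Root 1 0 R >>\n"
    + b"startxref\n" + str(xref_offset).encode("ascii") + b"\n%%EOF\n"
)
```

Reading from the end in this way is what lets a viewer open a large file without parsing the body first.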

2. PDF document structure

The PDF document structure is the logical organization of a PDF document's content. It reflects the hierarchical relationships between the indirect objects in the document body. The document structure is a tree whose root node is the root object of the PDF document. Below the root node are four subtrees: the page tree (Pages Tree), the bookmark tree (Outline Tree), the thread tree (Article Threads) and the name tree (Named Destinations).

In the page tree, all page objects are leaf nodes, and they inherit attribute values from their parent nodes as defaults for their own corresponding attributes. The bookmark tree organizes bookmarks in a tree hierarchy. Each bookmark is associated with a location on a particular page, which allows the user to reach the content of the document through the bookmark. The thread tree organizes the article threads, and the individual beads under each thread, in a tree structure. The name tree establishes a relationship between a string (i.e., a name) and the corresponding page area. Its leaf nodes hold the strings and their corresponding page areas, while its non-leaf nodes are simply an index that lets applications reach the leaf nodes quickly. The role of the name tree is to let other objects in the PDF document refer to a page area by its string name.
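The inheritance rule for the page tree can be sketched as a walk up the parent chain. The example below models nodes as plain Python dictionaries, which is a simplification, and uses `MediaBox` and `Rotate` as sample inheritable attributes. The function name `resolve_attribute` is illustrative.

```python
def resolve_attribute(node, key):
    """Return the value of key from the node or its nearest ancestor, else None."""
    while node is not None:
        if key in node:
            return node[key]
        node = node.get("Parent")  # climb towards the root of the page tree
    return None

# Hypothetical page tree: one parent node with two leaf pages.
pages_root = {"Type": "Pages", "MediaBox": [0, 0, 612, 792], "Rotate": 0}
page1 = {"Type": "Page", "Parent": pages_root}
page2 = {"Type": "Page", "Parent": pages_root, "MediaBox": [0, 0, 842, 595]}

box1 = resolve_attribute(page1, "MediaBox")  # inherited from the parent
box2 = resolve_attribute(page2, "MediaBox")  # the page's own value wins
```

A value set on a page overrides the inherited one, so the parent's value acts only as a default.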

3. Resources in PDF

In PDF, page content (text, graphics, images, etc.) is stored in the stream object referenced by the Contents key of the page object (hereinafter referred to as the content stream). Many of the basic objects used in the content stream (numbers, strings, etc.) are represented as direct objects. Other objects, such as fonts, are themselves dictionary or stream objects rather than direct objects, and a content stream cannot contain indirect objects (otherwise they could not be distinguished from the data in the content stream itself). Such objects are therefore given names, and the content stream refers to them by name. These named objects are called named resources.

The page object has a Resources key, which lists all the resources used in the content stream and establishes a mapping table between resource names and resource objects. The named resources in PDF include: procedure sets (ProcSet), fonts, color spaces, external objects (including images, forms and PostScript fragments), extended graphics states, patterns, and marked-content property lists.
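The mapping between resource names and resource objects can be illustrated with a small lookup. This sketch assumes a whitespace-separated content stream and handles only two operators, `Tf` (select font) and `Do` (paint external object). The tokenizer is deliberately naive, and the object references such as `"5 0 R"` are made-up placeholders.

```python
# Hypothetical Resources dictionary: category -> {resource name -> object reference}
resources = {
    "Font": {"F1": "5 0 R"},
    "XObject": {"Im0": "8 0 R"},
}

# Hypothetical content stream that uses both named resources.
content = "BT /F1 12 Tf (Hello) Tj ET q 100 0 0 100 0 0 cm /Im0 Do Q"

def resolve_named_resources(content, resources):
    """Return {resource name: object reference} for names used with Tf and Do."""
    found = {}
    tokens = content.split()  # naive: breaks on strings that contain spaces
    for i, token in enumerate(tokens):
        if token == "Tf" and i >= 2:
            category, name = "Font", tokens[i - 2]  # operands: /Name size
        elif token == "Do" and i >= 1:
            category, name = "XObject", tokens[i - 1]  # operand: /Name
        else:
            continue
        key = name.lstrip("/")
        found[key] = resources[category][key]
    return found

used = resolve_named_resources(content, resources)
```

The content stream itself contains only the names `/F1` and `/Im0`. The Resources dictionary is what ties them to actual objects in the document body.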

The non-named resources are Encoding, FontDescriptor, Halftone, Function, and CMap. Because non-named resources are used implicitly, they do not need names.

4. PDF page description instructions

PDF has 60 page description instructions, which describe a series of graphical objects on the page. These graphical objects can be roughly divided into four categories: path objects, text objects, image objects and external objects. They are the basic elements that make up every page.
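As a rough illustration of the four categories, the sketch below tags a handful of common content stream operators by the kind of object they belong to. The table covers only a small subset of the operators. Treating the inline image operators `BI`, `ID` and `EI` as the "image" category is an assumption, and the tokenizer is naive.

```python
# Partial, illustrative operator table (not the full instruction set).
OPERATOR_CATEGORY = {
    "m": "path", "l": "path", "c": "path", "re": "path", "h": "path",
    "S": "path", "f": "path", "B": "path", "n": "path",
    "BT": "text", "ET": "text", "Tf": "text", "Td": "text",
    "Tj": "text", "TJ": "text",
    "BI": "image", "ID": "image", "EI": "image",  # assumption: inline images
    "Do": "external",
}

def categories_used(content):
    """Return the object categories in a content stream, in order of first use."""
    seen = []
    for token in content.split():  # naive: ignores string and comment syntax
        category = OPERATOR_CATEGORY.get(token)
        if category is not None and category not in seen:
            seen.append(category)
    return seen

# Hypothetical content stream: a line, a text run, and an external object.
content = "10 10 m 100 10 l S BT /F1 12 Tf (Hi) Tj ET /Im0 Do"
kinds = categories_used(content)
```

A real interpreter would need a proper tokenizer, because string operands may themselves contain text that looks like an operator.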
