Anatomy of PyGeo code
Preliminaries
The intent here is to provide an entry point for an understanding of the PyGeo application code for those wishing (or willing) to undertake the effort.
We approach the task as document commentary, asking the reader to follow along, with the document in view, as a specific module of PyGeo code is analyzed, first looking at the module structure itself, then that of a specific function and then that of a specific class contained within it.
The module being analyzed is not presented as prototypical of a Python module in any general sense. We are approaching a particular document, in its specifics.
However, in discussing the contents of the module, and later the structure of a function and of a class in Anatomy of a function and Anatomy of a class, we do take the opportunity to introduce, in brief, a number of fundamental Python concepts (data structures, import statements, etc.) as we cross paths with them. The patience of those looking only for PyGeo-related details and already familiar with Python is presumed, such folks being known to be a patient lot.
But before getting to the document itself, a few matters of overview:
- Object-orientation and inheritance
PyGeo undertakes a largely object oriented approach, making extensive use of inheritance with the goal of minimizing code duplication and maximizing modularity and maintainability.
This is not to imply that the PyGeo code should be approached with an expectation of encountering a paradigm of object oriented programming technique. To the contrary, many tenets of object oriented programming orthodoxy are passed over without concern.
But given the general approach and its goals, and once the domain of interest is made specific, a taxonomy of an inheritance structure develops - one likes to believe - largely organically. But if this is true, it is also true that a structure that is organic tends also to be one without defining principles that can be succinctly stated.
Efforts can be made to be descriptive, and such efforts are made herein, but the only complete answer is the highest-level containing object, the code base itself.
- Modules and files, packages and directories
The concept of a "module" in Python is somewhat undefined, beyond that of being a unit of functionality that resides in a single file. Similarly, the concept of package and sub-package is roughly synonymous with the directory structure into which an application is organized.
There are few technical factors dictating how a Python application must be organized into packages, sub-packages and modules. Given that fact, PyGeo takes the opportunity to allow the directory and file structure to be expressive of the code structure and inheritance schemes.
The text-based code files of interest are those with the ".py" extension residing under the "pygeo.base" directory structure. Any ".pyc" and ".pyo" files that cohabit these directories are the same text-based code compiled to byte code for the benefit of the interpreter; they are not intended for human consumption and are not of interest to us here.
We proceed to look, in some detail, at a module and then at units within it, a 'class factory' function and then a class itself, as representative of those with which one intersects most directly when creating a PyGeo construction...
Anatomy of a module
Given the structure described above, those modules most deeply embedded within the directory structure tend to be those furthest along the chain of inheritance, therefore those most functionally actualized and those most directly facing the user. Consistent with the idea of viewing the application as text from the satellite view, and working down towards more refined levels of resolution, we choose to focus on the analysis of such a module.
We take a look at a specific module, in fact the module that implements the Intersect functionality referenced briefly as an example in the Built-in geometric intelligence section of the Overview document.
We find it in the directory structure under .../pygeo/base/geometry_real/points as the file "intersects.py". The directory structure itself tells us something about what we can expect to find in the file, i.e. geometric functionality connected to the domain of real space, and specifically objects that, like the others residing in that same directory, have the geometric characteristics of a point. The file name tells us something more: the concept of intersection is connected to the defining characteristic of these particular points.
(The html colorized version of the intersects.py module)
Opening the module, and remembering that the indentation levels we see have functional significance, we find, reading down the page as one might do with any document:
A sentence embedded in triple quotes
This is the module level documentation string, which is meant to be briefly informative of the contents of the module, and which is computationally functional only to the extent that it is accessible in such processes as automated documentation generation.
Creation of module level attributes, i.e.:
Attributes of the module are created by assignment of data to the following names:
- __Def__
- __Classes__
- __all__
- __doc_hints__
'__Def__' and '__Classes__', as identifiers, have no built-in meaning to Python nor do the leading and trailing underscores provide any pertinent magic here. They are PyGeo specific names, used for the purpose of providing a short 'table of contents' to the module in a manner adequate both for human consumption, when reading a module, and for machine consumption in the context of a mechanism for automated documentation generation.
Assigned to '__Def__' is the name of the factory function by which the classes in the module may be accessed, as described in Built-in geometric intelligence, and to '__Classes__' the names of the specific classes defined in the module.
The names being assigned are enclosed in "[" and "]", which indicates a Python list data type, and the individual names are in quotes (single or double would work equally well in this context), indicating that the names are of the Python string data type. So, for example, __Classes__, to the Python compiler, is a list of strings. To the human eye, it provides the names of the classes we can expect to find defined in the module. The names should provide - within (and probably only within) the context of an understanding of the module's overall thrust - an idea as to the distinguishing characteristics of the classes themselves.
We "add" the contents of __Def__ and __Classes__ and assign the result to __all__. Adding lists in Python concatenates their contents, so that __all__ is now a list containing the name of the module's factory function and those of its classes. The "__all__" name does have some Python-specific magic attached to it: it lists the names that a statement of the form "from module import *" will import from the module.
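The effect of __all__ on a star-import can be sketched without touching PyGeo itself by building a throwaway module in memory. The module name demo_intersects and its stub contents are hypothetical stand-ins chosen for illustration; only the three attribute names come from the module under discussion.

```python
import sys
import types

# Source text for a throwaway module; the stubs are hypothetical stand-ins.
source = '''
__Def__ = ['Intersect']
__Classes__ = ['PlaneLineIntersect']
__all__ = __Def__ + __Classes__   # list "addition" concatenates

def Intersect(): pass
class PlaneLineIntersect: pass
def not_exported(): pass          # defined, but absent from __all__
'''

# Build the module in memory and register it so that it can be imported.
demo = types.ModuleType("demo_intersects")
exec(source, demo.__dict__)
sys.modules["demo_intersects"] = demo

# Run a star-import into a fresh namespace and see which names arrive.
namespace = {}
exec("from demo_intersects import *", namespace)
exported = sorted(name for name in namespace if not name.startswith("__"))
print(exported)
```

Only the names listed in __all__ arrive in the importing namespace; not_exported stays behind even though it is defined in the module.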
"__doc_hints__" is providing some PyGeo specific magic, being used to inform the customized automatic documentation tool utilized by PyGeo to treat the module as one organized around a factory function and related classes, and to document it accordingly.
The data type being assigned to the '__doc_hints__' name is an example of a Python dictionary. It is recognized as such by its enclosures in the "{" and "}" bracketing at the outer level, and the assignment of key/value pairs separated by ":" at the inner. Here there is only one pair. There can be many, which could be created at once by separating the individual pairs with a comma ",".
Functionally, the list and dictionary data types are distinguished by how atoms of the information which they might contain are retrieved - lists by index number, dictionaries by key. So that:
- __Classes__[0] - list indexing being zero-based in Python, will here retrieve the first string, 'PlaneLineIntersect'
- __doc_hints__['m_type'] - quoted since the key is itself a string, will retrieve the value associated with that key, the string 'factory'.
Import of functionality from other modules and packages
Two different species of import statement are encountered
the import of another module's name
The line:
>>> import pygeo.base.abstracts._real as Real
does in fact do more, but most visibly provides us with a synonym for reference to functionality contained in the 'pygeo.base.abstracts._real' module, i.e. that defined in the file residing at ../pygeo/base/abstracts/_real.py.
We can thereafter use dot notation to reference specific functionality from within that module. For example, and as we shall see in more detail in Anatomy of a class, Real._Point becomes a reference to the _Point class as defined in that module.
the import of specific functionality from another module.
The lines of the form:
>>> from pygeo.base.analytics.pygeomath import vector ...
import specific functionality from the specific modules that they reference, essentially in a way that allows one to use that functionality as if it were defined in the containing module directly.
So that:
>>> a = vector(1, 4, 7)
will be understood within the intersects.py module, though its meaning is defined elsewhere.
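For readers without PyGeo at hand, the two species of import can be tried with standard library modules standing in for the PyGeo ones. The choice of collections.abc and fractions, and the synonym Abc, are assumptions made purely for illustration.

```python
# Species 1: import a module under a short synonym, then use dot notation.
import collections.abc as Abc

# Species 2: import a specific name, usable as if it were defined locally.
from fractions import Fraction

# Abc.Mapping refers to the Mapping class defined in collections.abc,
# much as Real._Point refers to the _Point class defined in _real.py.
is_mapping = isinstance({}, Abc.Mapping)

# Fraction is used directly, much as vector is used in intersects.py.
a = Fraction(1, 4)

print(Abc.__name__, is_mapping, a)
```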
What is the significance of these import statements in a reading of the module as text, as a document?
Most significantly, perhaps, it is in focusing the field of view and narrowing the range of concern.
Of the many modules, classes, functions, etc. of which an application like PyGeo is comprised, we can now know in some specifics which are to come into play, helping us form expectations and a context for what is to follow.
Class definitions
What then follows in the document are 3 separate class definitions, in fact the ones we came to expect to find defined in the module earlier by way of their reference in the '__Classes__' module attribute definition, i.e.:
- PlaneLineIntersect
- LinesIntersect
- PlanesIntersect
At this level of view we only want to take the time to notice two things from a visual inspection - how these class definitions are similar, i.e. to what pattern do they seem to be conforming, and in what they are dissimilar, i.e. what are their distinguishing characteristics.
Done.
We undertake a more specific analysis of an individual class, that of the PlanesIntersect, in a separate section of this document, Anatomy of a class. Discussed therein are the pattern to which the class conforms, the specifics of its functionality and how that functionality is achieved. That discussion will necessarily take us beyond the intersects.py module, providing better resolution on the class hierarchy structure which a class such as PlanesIntersect sits atop.
The 'class factory' function
The final unit of functionality within the intersects.py module is the module-level function, 'Intersect'.
It is identifiable as a module level function by two essential characteristics:
- its definition begins with the incantation of "def" before any other naming or functional code.
- it exists at the outermost level of the module, i.e. "def" is unindented
It can sensibly be viewed, to an extent, as a microcosmic unit of functionality in many ways analogous to that of its containing module itself, as indeed much of what has been found in the containing module is found again here:
- the documentation string
- the import statements
- the assignment of attributes
We follow this view by analyzing it as a distinct unit of functionality in the Anatomy of a function section that follows.
Anatomy of a function
We began with the analysis of a particular PyGeo module in Anatomy of a module, that of the intersects.py module, and turn now to discussion of a specific module level function contained within it, the Intersect function.
In the Anatomy of a module it was noted that some of what is being viewed on the scale of the function had already been encountered on the containing module scale.
The discussion is resumed here by noting the unsurprising fact that everything stated about the syntax, data types, etc. in the discussion at the module level applies, without amendment, to the function itself.
So those playing along in sequence, and those who knew it already, recognize that the function level attribute '__sigs__' is being assigned information in the form of a data structure that is a Python list.
We take note of two characteristics that distinguish this list from the __Classes__ list met at the module level.
- it itself contains data structures that are themselves lists; we therefore have a list of lists.
- the data within the inner lists are without quotations; they are something other than Python strings.
What is that data existing at the inner level of the list?
The answer is undoubtedly related to the functional purpose of the list, and the clue is that '__sigs__' is meant to mean 'signatures'. By signatures we mean the constructing arguments applicable to, and sufficient to identify, an individual class within the intersects.py module.
Each item within the inner list is in fact a reference to a data structure, but a custom data structure defined as such by its class definition in the ../pygeo/base/abstracts/_real.py module, which is here being referenced by the shorthand synonym "Real", defined by the import statement near the top of the module, as previously explained.
And we see that each inner list is a set of such data structures, and one can surmise that there is a correspondence between each of these sets and one of the classes defined in the module.
So that the first list within the containing list is:
[Real._Plane,Real._Line]
And in fact we know that PlaneLineIntersect is a class within the intersects.py module that is defined to be the point of intersection of a plane and a line.
Which should begin to give us an idea of where we might be going here.
But, before going further, we need to back off to higher level ground to understand one further aspect of the Intersect function, and of Python more generally.
We need to know that there are two basic forms in which an argument, in Python, can be passed to a class on its construction or to a function being called. Those are positional and keyword.
In:
>>> Intersect(plane,line,color=BLUE)
we see examples of each - the plane and line arguments being positional and the last argument being a keyword argument which presumes that the class or function to which it is provided is designed to find meaning in what in this case would be the identifier "color".
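The distinction can be tried on a small function of our own. The function make_point and its string arguments are hypothetical and stand in for a real PyGeo class and real geometric objects.

```python
def make_point(plane, line, color="white"):
    """Collect the arguments so that we can see how each one arrived."""
    return {"plane": plane, "line": line, "color": color}

# Two positional arguments, matched by position, and one keyword argument,
# matched by the identifier "color".
point = make_point("a plane", "a line", color="blue")

# Leaving the keyword out falls back on the default declared in the signature.
plain_point = make_point("a plane", "a line")

print(point["color"], plain_point["color"])
```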
We already have seen Intersect to be a function occupied, in some manner, with the signatures of the classes within the module it contains.
It is also true that Intersect, as above, is itself a unit of functionality that can process arguments and therefore will have a signature of its own.
And what are the elements of Intersect's signature?
Found in the defining code, contained in the parentheses, "(" and ")", following the naming of the function, are the mysterious incantations of
*args
and
**kws
There is a combination of Python magic and Python convention in these particular signature elements: the "*"s are necessary to achieve the desired result, while the names "args" and "kws" that follow them are convention.
The "*"s here imply what they generally imply by convention within the context of data processing: wildcards. The single "*" with "args" means any positional arguments, and the "**" with "kws" means any keyword arguments. So the signature of Intersect is of the most general kind, declaring its intent to handle, in one way or another, any positional and/or keyword arguments passed to it.
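What a function with this most general signature actually receives can be seen in a short sketch. The function show_arguments is hypothetical; inside it, Python delivers the positional arguments as a tuple and the keyword arguments as a dictionary.

```python
def show_arguments(*args, **kws):
    """Accept anything and hand back what was received."""
    return args, kws

positional, keywords = show_arguments("plane", "line", color="BLUE")

print(positional)   # the positional arguments, collected into a tuple
print(keywords)     # the keyword arguments, collected into a dictionary
```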
This generality is consistent with the function's purpose and design - to which we return.
In effect what is happening is the following:
"method_get", a function defined elsewhere but made available here by the import mechanism, which we should now recognize, is called and provided with two sets of data: the "__sigs__" list of signature lists and "args", the positional arguments that have been received by Intersect.
The method_get function processes the information provided to it and returns two pieces of information:
- the index of the list, within the __sigs__ list of lists, in which an (unordered) match has been found with the arguments it is processing (or None if no match is found).
- the argument list itself, but now processed so as to assure it to be in the order in which it is found in the __sigs__ sub-list with which it has been matched.
From this information, Intersect proceeds to construct the class that has been identified to match the positional arguments, and returns an instance of that class.
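The flow just described can be sketched end to end. This is one plausible way to do unordered signature matching, not PyGeo's actual method_get; the stand-in classes, the [Line, Line] signature, the classes list and the TypeError raised when nothing matches are all assumptions made for illustration.

```python
def method_get(sigs, args):
    """Return (index, ordered_args) for the first signature the arguments
    match in any order, or (None, list(args)) if no signature matches."""
    for index, sig in enumerate(sigs):
        if len(sig) != len(args):
            continue
        remaining = list(args)
        ordered = []
        for wanted in sig:
            # Take the first unused argument that is an instance of the class.
            for position, candidate in enumerate(remaining):
                if isinstance(candidate, wanted):
                    ordered.append(remaining.pop(position))
                    break
            else:
                break           # no argument fits; try the next signature
        else:
            return index, ordered
    return None, list(args)


class Plane:
    """Stand-in for Real._Plane."""

class Line:
    """Stand-in for Real._Line."""

class PlaneLineIntersect:
    def __init__(self, plane, line, **kws):
        self.plane, self.line = plane, line
        self.color = kws.get("color")

class LinesIntersect:
    def __init__(self, line1, line2, **kws):
        self.lines = (line1, line2)
        self.color = kws.get("color")


def Intersect(*args, **kws):
    """Class factory: pick the class whose signature matches the arguments."""
    sigs = [[Plane, Line], [Line, Line]]
    classes = [PlaneLineIntersect, LinesIntersect]
    index, ordered = method_get(sigs, args)
    if index is None:
        raise TypeError("no class signature matches the arguments")
    # Keyword arguments are forwarded to the class constructor untouched.
    return classes[index](*ordered, **kws)


plane, line = Plane(), Line()
point1 = Intersect(plane, line, color="BLUE")
point2 = Intersect(line, plane, color="BLUE")   # order does not matter
```

Both calls produce a PlaneLineIntersect instance whose plane and line attributes are set correctly, because the matching step reorders the arguments before the constructor is called.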
"Returns" meaning what?
"Returns" meaning that when we encounter code in a PyGeo construction that might read:
>>> point1 = Intersect(plane, line, color=BLUE)
or for that matter,
>>> point1 = Intersect(line,plane, color=BLUE)
'point1' is being assigned as a name (by way of the "=" sign) to whatever Intersect as a function returns given the arguments passed to it. That, we now know, is an instance of the PlaneLineIntersect class constructed by reference to, and with dependency upon, the particular line and plane passed as arguments to Intersect.
And what of the keywords?
They are passed to the class constructor just as they are received by the function, but otherwise unprocessed.
So that assuming that the PlaneLineIntersect class definition knows what to do with "color" as a keyword (it does), and assuming that some code has been provided to give meaning to the reference to 'BLUE' (it has) and assuming that meaning is sufficient to induce the rendering engine to elicit a color that is blue (it is), then point1 is blue for the asking.
Anatomy of a class
Placeholder for an analysis of the PlanesIntersect class of the intersects.py module.
"One of my foreign colleagues gives classes on the esthetics of programming. I advised him to throw away the overhead projector and return to the blackboard. It made him find again the joy of teaching".
Edsger Dijkstra