B4J Tutorial [B4X] Native B4X Implementation of Two Pandas-like Classes: DataFrame and Series

Pandas is a Python library heavily used in Machine Learning (ML) applications.
While investigating ML using Pandas with Python, I kept thinking: "I could do this in B4X".
This tutorial is the result of my experimentation. The tutorial comes in 5 parts.

Part 1. Introduction to Pandas methods and B4X implementation challenges

B4X can communicate with Python through PyBridge and therefore use the Pandas library. That works well.
https://www.b4x.com/android/forum/t...ass-to-facilitate-running-python-code.170072/
My self-assigned challenge was to implement the functionality of Pandas in NATIVE B4X (B4J, B4A).

But B4X and Python are very different languages.

An example of a Pandas Series, defined as a 1-dimensional labeled array:
Python:
    import pandas as pd

    x = pd.Series([1, 2, 3, 4, 5])
    x.index = ['a', 'b', 'c', 'd', 'e']
    print(x + 100)
a    101
b    102
c    103
d    104
e    105

How do we do this in B4X?

B4X, like Java, is statically typed: a variable's type is fixed when the code is written. Variables in Python do not have an a priori data type.
Solution: Use the B4X Object type to communicate with the Series class

B4X has a fixed number of arguments in Sub calls. Python can have a variable number of arguments, as well as named arguments.
Solution: Use a List or a Map to pass unnamed and/or named arguments

B4X, in the spirit of avoiding brackets, does not use [] for lists or {} for maps (although there is a long-standing wish for these).
Solution: Use Array(...) for [], and automatically convert to a List where needed. Use CreateMap() for {}.

B4X does not support operator overloading for +, -, *, /. Python does.
Solution: (1) wrap operators in a small sub: x.op("+", 100) and (2) provide an alias for this: x.plus(100)

The above code, using B4X and the Series class:

B4X:
    Private pd As WiLPandas
    pd.Initialize(Me)
  
    Dim x As Series = pd.items(Array(1, 2, 3, 4, 5))
    x.addIndex(Array("a", "b", "c", "d", "e"))
    x.op("+", 100).print
a    101
b    102
c    103
d    104
e    105

Obviously the two versions are not the same, but they are similar in function and 'spirit'.

Note 1: 'as' has different meanings in B4X (instance of) and Python (alias)
Note 2: WiLPandas is the manager class that creates instances of Series and DataFrames. It also has utilities used by these instances.
Note 3: Most methods accept Objects as arguments. These are turned into Lists and Maps for processing; scalars become one-element Lists.
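
The operator-wrapping idea and Note 3 can be sketched in a few lines of Python. This is not the WiLPandas source: `MiniSeries` and `as_list` are hypothetical names invented for the illustration, and only `op` and `plus` mirror method names used above.

```python
import operator

# Hypothetical illustration only; not the WiLPandas implementation.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def as_list(arg):
    """Lists and tuples pass through; a scalar becomes a one-element list."""
    return list(arg) if isinstance(arg, (list, tuple)) else [arg]

class MiniSeries:
    def __init__(self, items, index=None):
        self.items = as_list(items)
        self.index = list(index) if index is not None else list(range(len(self.items)))

    def op(self, symbol, other):
        """Apply a binary operator element-wise; a one-element operand is reused for every row."""
        fn = OPS[symbol]
        values = as_list(other)
        if len(values) == 1:
            values = values * len(self.items)
        return MiniSeries([fn(a, b) for a, b in zip(self.items, values)], self.index)

    def plus(self, other):
        """Alias for op("+", other)."""
        return self.op("+", other)

x = MiniSeries([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])
y = x.plus(100)
for label, value in zip(y.index, y.items):
    print(label, value)
```

Looking the operator up in a table keeps a single code path for all four operators, so the aliases stay one-liners.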

For example code and classes, see Part 2 next.
 

William Lancee

Well-Known Member
Licensed User
Longtime User
Part 2. The B4X Series Class: methods for creation and examples of use

The attached .zip is a demo of some methods in the Series class. These are just a few examples; there are about 130 Public methods overall.
All of the built-in functions of B4X are implemented.

The first thing to note is that when an instance of a Series is created, it is also analysed and a description is stored with the instance.
When ser1.print is invoked, this description is shown in the header of the output.
Let's look at the output of printing a Series of Objects.
B4X:
    Dim ser1 As Series = pd.items(Array(1, Null, 3))
    ser1.name = "Test"
    ser1.print
'________________ Test Series (3) type=Integer  OK=2  miss=1  min=1.0  max=3.0
            Test
        0    1
        1    ―
        2    3

Observations:
- Series have a name (default: Anon), which can be set with an assignment
- This Series has 3 items
- The dominant datatype is "Integer" (defined as >= 95% of non-missing values)
- There is one missing value in this Series
- The minimum is 1 and the maximum is 3
- The implied index is a sequence starting with 0,1,2
- Of the 2 non-missing values, both are OK Integers (not some other data type)
- A missing value is indicated by a horizontal line ― (Chr(0x2015))

Notes:
1. Series can have any type of datatype or any mixture of datatypes
2. If there is no dominant datatype then the default is Object
3. If the data is predominantly non-numeric then there is no minimum or maximum
4. When a Series is created (directly or as a result of computation or selection) a new description is created.
5. The argument of .items is an Object. That way we can pass a Map, List, Array, Scalar, or even a custom type.
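
How such a description could be derived is sketched below in Python. The 95% rule comes from the observations above; everything else is an assumption made for the illustration: the function name `describe_header`, the use of Python type names, reporting OK as 0 when no type dominates, and treating int and float as separate types.

```python
from collections import Counter

def describe_header(items, threshold=0.95):
    """Summarize a list: dominant type, OK count, missing count, min and max."""
    present = [v for v in items if v is not None]
    # type(v).__name__ keeps bool and int apart even though bool subclasses int
    counts = Counter(type(v).__name__ for v in present)
    dominant, ok = "object", 0          # assumed fallback when no type dominates
    if counts:
        name, n = counts.most_common(1)[0]
        if n / len(present) >= threshold:
            dominant, ok = name, n
    numeric = [v for v in present if type(v) in (int, float)]
    show_range = dominant in ("int", "float") and len(numeric) > 0
    return {
        "items": len(items),
        "type": dominant,
        "ok": ok,
        "miss": len(items) - len(present),
        "min": min(numeric) if show_range else None,
        "max": max(numeric) if show_range else None,
    }

print(describe_header([1, None, 3]))
# {'items': 3, 'type': 'int', 'ok': 2, 'miss': 1, 'min': 1, 'max': 3}
```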

Here is another example:
B4X:
    Dim A As Series = pd.range(1, 5)
    Dim B As Series = A.mult(3)
    A.div(B).printN(3)
'________________ Anon Series (5) Type=Double  OK=5  Min=0.3333333333333333  Max=0.3333333333333333
        0    0.333
        1    0.333
        2    0.333
        3    0.333
        4    0.333

Observations:
- Series methods almost always return another Series. Therefore they can be chained.
- Numerical operations work between two series OR between a series and a single value
- All floating-point operations use the Double datatype
- The .printN method is a version of .print where you specify the number of decimal places for numbers
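
The chaining in this example works because every operation returns a new series. A minimal Python sketch of that pattern follows; `MiniSeries`, `_combine`, and `format_n` are hypothetical names, and `mult` and `div` only borrow the names used above, not their implementation.

```python
class MiniSeries:
    """Hypothetical stand-in for a Series; not the WiLPandas class."""

    def __init__(self, items):
        self.items = list(items)

    def _combine(self, other, fn):
        # A MiniSeries operand is paired item by item; anything else is treated as a scalar.
        if isinstance(other, MiniSeries):
            if len(other.items) != len(self.items):
                raise ValueError("series lengths differ")
            pairs = zip(self.items, other.items)
        else:
            pairs = ((a, other) for a in self.items)
        return MiniSeries(fn(a, b) for a, b in pairs)

    def mult(self, other):
        return self._combine(other, lambda a, b: a * b)

    def div(self, other):
        return self._combine(other, lambda a, b: a / b)

    def format_n(self, decimals):
        """Return the items as strings with a fixed number of decimal places."""
        return [f"{v:.{decimals}f}" for v in self.items]

a = MiniSeries(range(1, 6))      # 1, 2, 3, 4, 5
b = a.mult(3)                    # series combined with a scalar
result = a.div(b).format_n(3)    # series combined with a series, then formatted
print(result)
```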

There are many methods in Series. The best way to see them is to use the IDE's autocomplete (intellisense).
After you type the . after a series name, a list pops up. Each method has a description.

The most complex method is ser1.describe. It creates a new labeled series that is a statistical summary of ser1.
B4X:
    Dim x As Series = pd.items(Array( _
            1,2,3,4,4,4,5,5,Null, "AAA", _
            1,2,3,4,4,4,5,5,Null, "BBB", _
            1,2,3,4,4,4,5,5,1,2,3,4,4,4,5,5, _
            1,2,3,4,4,4,5,5,1,2,3,4,4,4,5,5))
    x.describe.print
'________________ Anon Series (15) type=Object
        Index
        Type          Integer
        Decimals      0
        Items         52
        Valid         48
        Missing       2
        Invalid       2
        Exceptions    [AAA, BBB]
        Minimum       1
        Maximum       5
        Unique        5
        MostFreq      4
        Sum           168
        Mean          3.50
        Variance      1.787
        StdDev        1.337

Observations:
- The header is of the new series, i.e. the statistical report - it has 15 rows and no dominant datatype
- Individual items are accessible (like all series), e.g.: x.describe.get("Mean")
- Some of the rows will be different for non-numerical data
- x has a dominant data type: Integer
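
The figures in this report can be cross-checked with Python's standard library. The sketch below rebuilds the same 52 items and recomputes most rows; it is only a check of the arithmetic, not a description of how describe is implemented. The printed Variance of 1.787 matches the sample variance (divisor n - 1); the population variance of the same data would be 1.75.

```python
import statistics
from collections import Counter

group = [1, 2, 3, 4, 4, 4, 5, 5]
data = group + [None, "AAA"] + group + [None, "BBB"] + group * 4

missing = [v for v in data if v is None]
valid = [v for v in data if type(v) is int]
invalid = [v for v in data if v is not None and type(v) is not int]

summary = {
    "Items": len(data),
    "Valid": len(valid),
    "Missing": len(missing),
    "Invalid": len(invalid),
    "Exceptions": invalid,
    "Minimum": min(valid),
    "Maximum": max(valid),
    "Unique": len(set(valid)),
    "MostFreq": Counter(valid).most_common(1)[0][0],
    "Sum": sum(valid),
    "Mean": statistics.mean(valid),
    "Variance": round(statistics.variance(valid), 3),  # sample variance (n - 1)
    "StdDev": round(statistics.stdev(valid), 3),
}
for key, value in summary.items():
    print(f"{key:<12}{value}")
```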
 

Attachments

  • WiLPandas2.zip
    121.3 KB · Views: 48

Part 3. The B4X DataFrame Class: methods for creation and examples of use

Series are useful, especially if you prefer to avoid loops and/or are mathematically inclined.
But DataFrames are 'extremely' useful. In a nutshell, they encapsulate datasets, matrices, and tables.
In B4X terminology, a DataFrame is a Map of Series, plus some extras.
In the key/values of this map, the keys are column names and the values are series of objects.
All series in a given DataFrame have the same length, but they can have any mixture of datatypes.
Objects in a Series are often of the same datatype, but not always.
Because data columns are series, DataFrames inherit most of the methods of Series. 'Inherit' here means 'delegate'.
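
A minimal Python sketch of "a Map of Series" with delegation is shown below. The class names `MiniSeries` and `MiniFrame`, the `apply` method, and the "Minutes" column are all hypothetical and chosen only to illustrate the structure; they are not taken from WiLPandas.

```python
class MiniSeries:
    """Hypothetical stand-in for a Series."""

    def __init__(self, items):
        self.items = list(items)

    def minimum(self):
        return min(self.items)

    def maximum(self):
        return max(self.items)

class MiniFrame:
    """Hypothetical stand-in for a DataFrame: a dict of equally long columns."""

    def __init__(self, columns):
        lengths = {len(series.items) for series in columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        self.columns = dict(columns)

    def apply(self, method_name):
        """Delegate: call the named series method on every column."""
        return {name: getattr(series, method_name)()
                for name, series in self.columns.items()}

df = MiniFrame({
    "Calories": MiniSeries([420, 380, 390]),
    "Minutes": MiniSeries([60, 45, 50]),
})
print(df.apply("maximum"))   # {'Calories': 420, 'Minutes': 60}
```

The frame holds no statistics code of its own; each call is forwarded to the column objects, which is the sense of 'delegate' above.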

Let's look at the output of printing a DataFrame of three columns.
B4X:
    Dim ser1 As Series = pd.items(Array(420, 380, 390))
    Dim ser2 As Series = pd.items(Array("short", "medium", "long"))
    Dim ser3 As Series = pd.items(Array(False, True, False))
    Dim df1 As DataFrame = pd.newDF(CreateMap("Calories": ser1, "Duration": ser2, "AboveMax": ser3))
    df1.print
'________________ DataFrame ncols = 3  nrows = 3
         Calories    Duration    AboveMax
    0    420         short       false
    1    380         medium      true
    2    390         long        false
         Integer     String      Boolean

Observations:
- Like Series, DataFrames have an implicit index 0,1,2,... An explicit string index can be added.
- There is a header that provides the shape of the DataFrame
- The dominant datatype for each series is shown at the bottom of the output
- In the B4X Logs, these sections of the output are color coded - these colors can be set in code

Notes:
1. You can create a DataFrame with mapped series as above, with a map of arrays, or with an array of arrays or lists.
2. If you don't specify column names, then the default column names will be Col_0, Col_1, etc.
3. You can add an explicit index that has the capacity for grouping records/rows in various methods
4. If there is no dominant datatype in a column the datatype will be set to 'Object'
5. In Part 4 of this tutorial you'll find out about browsing a DataFrame with a user interface (B4J and B4A)
6. In Part 5 of this tutorial you'll learn about importing/exporting external files, and about interfacing with Python (B4J)
7. In the attached demo project of Part 3, there are some novel techniques used for displaying both the code and output in the logs.
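
Notes 1 and 2 can be illustrated with a short Python sketch. It assumes that each inner array is one column, which the notes do not state explicitly, and `frame_from_columns` is a hypothetical helper name. Only the default names Col_0, Col_1, ... come from Note 2.

```python
def frame_from_columns(columns, names=None):
    """Build a dict of columns from a list of equally long lists.

    Assumption: each inner list is one column (not one row).
    """
    lengths = {len(col) for col in columns}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    if names is None:
        names = [f"Col_{i}" for i in range(len(columns))]
    if len(names) != len(columns):
        raise ValueError("one name per column is required")
    return {name: list(col) for name, col in zip(names, columns)}

frame = frame_from_columns([
    [420, 380, 390],
    ["short", "medium", "long"],
    [False, True, False],
])
print(list(frame))        # ['Col_0', 'Col_1', 'Col_2']
print(frame["Col_1"])     # ['short', 'medium', 'long']
```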
 

Attachments

  • WiLPandas3.zip
    41.2 KB · Views: 37

Part 4. A dataframe browser and import/export techniques

Neither Series nor DataFrames require a UI, and both work in B4J and B4A without changes.
But as you experiment with DataFrames, it becomes clear that a browser/editor would be useful.
This requires some UI elements and cross-platform adaptations.

Rather than changing the already heavy Series and DataFrame classes, I created this as a separate set of classes.
B4XPages provides the ideal platform for these UI features - the browser is a B4XPage.
I kept it simple - with callback functions to the calling module for handling any user interactions.

B4X has more than a few options for grid/table views - they all have their strengths.
I decided to make something that fits DataFrames well and that can adapt to different devices and orientations.
I did apply my personal preferences, but I tried to keep the style options open.

Here is how you invoke the DataFrame browser - I call it GridPage:
B4X:
    Dim df1 As DataFrame = pd.loadDataFrame(MyDir, "Cereal_Preference_Rating 2026-01-03T014329.dfz")
    GW.createTable("Cereal Preference Rating", df1)
    GW.TitleColor = "Forest Green"
    B4XPages.ShowPage("GW")

And this is what you see (screenshots from B4A):
[Image: landScapeGrid.png]
[Image: portraitGrid.png]

Notes:
1. "Cereal_Preference_Rating 2026-01-03T014329.dfz" is a previously saved DataFrame in binary format (using B4XSerializator)
2. Without handlers the GridPage does not respond to any user actions; it is just a navigable viewport of the DataFrame.
3. There are three common GridPage handlers located in B4XMainPage: GridClicked, ToolClicked, SaveAs. You can make your own tools and handle them.
4. There are several GridPage utilities that are invoked by the GridPage handlers, examples: ShowTools, HighLightRow, ShareSelected, ChooseCol
5. The Report class is used by one of the GridPage handlers to display a statistical report
6. The GridPage class uses three layouts and is dependent on four support classes: Point, Colors, Fonts, and TextFlow (for B4J)
7. Navigation buttons are in the four margins. Buttons are only shown when an action is possible.
8. Column width and alignment are automatically determined by the dominant datatype of the column series.
9. The Fonts class does some magic with font sizes to adapt to screen sizes. The Colors class provides names for color values.

The best way to understand the power of GridPage is to try it out - use the attached .zip.
Navigate, click a header to select a column, then click Tools (top-left cell) and try things.
Also give it a try in B4A.
 

Attachments

  • WiLPandas4.zip
    75.4 KB · Views: 47

Part 5. (B4J only) Techniques for integration with Python. If you try to run this example in B4A, you'll get a message.

Recently I posted a class to facilitate the use of PyBridge.
https://www.b4x.com/android/forum/t...ass-to-facilitate-running-python-code.170072/
We'll use that class here.

'This Python script accepts a B4X DataFrame and returns a row/record
B4X:
    Dim script As String = $"
import pandas as pd
def getRecord(mp, indices, recordId):
    df = pd.DataFrame(mp)
    df.index = indices
    record = df.loc[recordId].to_dict()
    return record
"$
    Wait For (PW.clean(script)) Complete (script As String)
    Wait For (PW.call(script, Array(df1.toMap, df1.indices, "10043"))) Complete (obj As Object)
    Log(obj.As(Map).Get("name"))                        'Maypo


Notes:
1. df1 is the DataFrame created in Part 4 of this Tutorial
2. Communicating DataFrames between B4X and Python is easy: they are simple maps of columns, plus a row index.
3. There are some necessary actions you have to take to use PyWorks and PyBridge, so go to the above link first.
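
To make Note 2 concrete, here is what the lookup in the script amounts to when written without Pandas. This is a plain-Python sketch of the data shape that crosses the bridge; `get_record` is a hypothetical helper and the sample values are made up for the illustration (they are not the cereal dataset).

```python
def get_record(columns, indices, record_id):
    """Return one row as a dict, looked up by its index label."""
    row = indices.index(record_id)   # raises ValueError if the label is absent
    return {name: values[row] for name, values in columns.items()}

# Made-up sample data, for illustration only.
columns = {"name": ["Alpha", "Beta", "Gamma"], "rating": [41.5, 36.0, 50.2]}
indices = ["10041", "10042", "10043"]

record = get_record(columns, indices, "10043")
print(record)   # {'name': 'Gamma', 'rating': 50.2}
```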

The last examples in the attached .zip file are ways of integrating with two SQL databases: SQLite and DuckDB.
Using PyWorks/PyBridge makes it easy to use Python Pandas to do the work and then get the DataFrame map with PyWorks.
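
The reshaping step on the Python side, from query rows to a map of columns, can be sketched with the standard sqlite3 module alone. This is an illustration under assumptions, not the code in the attached .zip: `table_to_columns`, the table name, and the sample rows are all hypothetical.

```python
import sqlite3

def table_to_columns(conn, query):
    """Run a query and return the result as a dict of column name -> list of values."""
    cur = conn.execute(query)
    names = [d[0] for d in cur.description]
    columns = {name: [] for name in names}
    for row in cur:
        for name, value in zip(names, row):
            columns[name].append(value)
    return columns

# In-memory database with made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cereal (name TEXT, rating REAL)")
conn.executemany("INSERT INTO cereal VALUES (?, ?)",
                 [("Alpha", 41.5), ("Beta", 36.0)])
result = table_to_columns(conn, "SELECT name, rating FROM cereal ORDER BY name")
conn.close()
print(result)   # {'name': ['Alpha', 'Beta'], 'rating': [41.5, 36.0]}
```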

The examples I posted show how to use the built-in B4X SQL library.
DuckDB is frequently used in Machine Learning apps because it is column-based rather than row-based (like SQLite).
This makes it faster on column vectors. DuckDB also has sophisticated import/export features.
For example, creating a table directly from a .csv file. Recently I posted an example of its use:

This is the end of this Tutorial. If you have any problems with running the examples, feel free to post here.
This is a complex project, so in all likelihood, despite extensive testing, there are bugs and missed edge cases.
I would appreciate it if you would report these here, so I can fix the bugs and improve the code.

Finally, this project was a joy to work on. I have been using it for a while.
I like the way it simplifies code and makes it more transparent and more compact.
You may find it useful in your own work.
At the very least, the boundary-defying techniques could be educational or even entertaining.

1. Feel free to use it or parts of it for your own purposes.
2. Modify at will. Please rename classes to avoid confusion.
3. Report errors and improvements here. I would like to improve upon it if I can.
4. If in the future the [] and {} brackets are implemented, these classes will still work as is!
 