Casts to/from BooleanType are transformed into comparisons since the JVM does not consider Booleans to be numeric types.
Changes Boolean values to Bytes so that expressions like true < false can be evaluated.
Coerces the type of different branches of a CASE WHEN statement to a common type.
Converts the string "NaN", when it appears in a binary operator with a NaN-able type (Float / Double), to the appropriate numeric equivalent.
Calculates and propagates precision for fixed-precision decimals. Hive has a number of rules for this based on the SQL standard and MS SQL: https://cwiki.apache.org/confluence/download/attachments/27362075/Hive_Decimal_Precision_Scale_Support.pdf
In particular, if we have expressions e1 and e2 with precision/scale p1/s1 and p2/s2 respectively, then the following operations have the following precision / scale:
Operation    Result Precision                        Result Scale
------------------------------------------------------------------------
e1 + e2      max(s1, s2) + max(p1-s1, p2-s2) + 1     max(s1, s2)
e1 - e2      max(s1, s2) + max(p1-s1, p2-s2) + 1     max(s1, s2)
e1 * e2      p1 + p2 + 1                             s1 + s2
e1 / e2      p1 - s1 + s2 + max(6, s1 + p2 + 1)      max(6, s1 + p2 + 1)
e1 % e2      min(p1-s1, p2-s2) + max(s1, s2)         max(s1, s2)
sum(e1)      p1 + 10                                 s1
avg(e1)      p1 + 4                                  s1 + 4
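The table can be read as a small function from operand types to a result type. The following is a minimal sketch in Python, not the Catalyst implementation; the function name `result_type` and the use of plain `(precision, scale)` tuples are assumptions made for illustration.

```python
def result_type(op, p1, s1, p2=None, s2=None):
    """Return (precision, scale) for `op`, following the table above.

    Hypothetical helper: p1/s1 describe e1, p2/s2 describe e2.
    sum and avg take only one operand, so p2/s2 are unused for them.
    """
    if op in ("+", "-"):
        return max(s1, s2) + max(p1 - s1, p2 - s2) + 1, max(s1, s2)
    if op == "*":
        return p1 + p2 + 1, s1 + s2
    if op == "/":
        scale = max(6, s1 + p2 + 1)
        return p1 - s1 + s2 + scale, scale
    if op == "%":
        return min(p1 - s1, p2 - s2) + max(s1, s2), max(s1, s2)
    if op == "sum":
        return p1 + 10, s1
    if op == "avg":
        return p1 + 4, s1 + 4
    raise ValueError("unsupported operation: " + op)


# DECIMAL(10, 2) combined with DECIMAL(5, 3)
print(result_type("+", 10, 2, 5, 3))
print(result_type("/", 10, 2, 5, 3))
```

For example, adding a DECIMAL(10, 2) to a DECIMAL(5, 3) needs 3 fractional digits, 8 integral digits, and one extra digit for a possible carry.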
Catalyst also has unlimited-precision decimals. For those, all ops return unlimited precision.
To implement the rules for fixed-precision types, we introduce casts to turn them to unlimited precision, do the math on unlimited-precision numbers, then introduce casts back to the required fixed precision. This allows us to do all rounding and overflow handling in the cast-to-fixed-precision operator.
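The cast, compute, cast-back pattern described above can be sketched with Python's `decimal` module. This is only an illustration under stated assumptions: a 100-digit context stands in for "unlimited" precision, and the helper names, the half-up rounding mode, and returning `None` on overflow are all hypothetical choices that the text does not specify.

```python
from decimal import Decimal, ROUND_HALF_UP, localcontext


def to_fixed(value, precision, scale):
    """Hypothetical cast to DECIMAL(precision, scale).

    All rounding and overflow handling lives here. Assumptions: values are
    rounded half-up, and overflow yields None.
    """
    with localcontext() as ctx:
        ctx.prec = 100
        quantum = Decimal(1).scaleb(-scale)  # e.g. scale 2 -> 0.01
        rounded = value.quantize(quantum, rounding=ROUND_HALF_UP)
    if len(rounded.as_tuple().digits) > precision:
        return None  # does not fit in `precision` digits
    return rounded


def add_fixed(a, b, precision, scale):
    """Add on 'unlimited' precision, then cast back to the fixed type."""
    with localcontext() as ctx:
        ctx.prec = 100  # stand-in for unlimited precision
        exact = a + b
    return to_fixed(exact, precision, scale)


print(add_fixed(Decimal("12.34"), Decimal("0.005"), 5, 2))
print(add_fixed(Decimal("999.99"), Decimal("0.01"), 5, 2))
```

Keeping the arithmetic itself free of rounding means that only one place, the final cast, has to know about the target precision and scale.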
In addition, when mixing non-decimal types with decimals, we use the following rules:
- BYTE gets turned into DECIMAL(3, 0)
- SHORT gets turned into DECIMAL(5, 0)
- INT gets turned into DECIMAL(10, 0)
- LONG gets turned into DECIMAL(20, 0)
- FLOAT and DOUBLE cause fixed-length decimals to turn into DOUBLE (this is the same as Hive, but note that unlimited decimals are considered bigger than doubles in WidenTypes)
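As a rough illustration, the mixing rules amount to a lookup on the non-decimal operand's type. The sketch below is not taken from Catalyst; the names `INTEGRAL_AS_DECIMAL` and `coerce_against_fixed_decimal` and the string type tags are assumptions, and the precisions are copied from the list above.

```python
# (precision, scale) each integral type is turned into, as listed above.
INTEGRAL_AS_DECIMAL = {
    "BYTE": (3, 0),
    "SHORT": (5, 0),
    "INT": (10, 0),
    "LONG": (20, 0),
}


def coerce_against_fixed_decimal(other_type):
    """Hypothetical helper: what happens when `other_type` meets a fixed decimal.

    Returns ("DECIMAL", precision, scale) when the non-decimal side is turned
    into a decimal, or ("DOUBLE",) when the decimal side turns into DOUBLE.
    """
    if other_type in ("FLOAT", "DOUBLE"):
        return ("DOUBLE",)
    precision, scale = INTEGRAL_AS_DECIMAL[other_type]
    return ("DECIMAL", precision, scale)


print(coerce_against_fixed_decimal("INT"))
print(coerce_against_fixed_decimal("FLOAT"))
```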
Hive only performs integral division with the DIV operator. The arguments to / are always converted to fractional types.
This ensures that the types for various functions are as expected.
Promotes strings that appear in arithmetic expressions.
Applies any changes to AttributeReference data types that are made by other rules to instances higher in the query tree.
When encountering a cast from a string representing a valid fractional number to an integral type, the JVM will throw a java.lang.NumberFormatException. Hive, in contrast, returns the truncated version of this number.
Widens numeric types and converts strings to numbers when appropriate.
Loosely based on the rules in "Hadoop: The Definitive Guide", 2nd edition, by Tom White.
The implicit conversion rules can be summarized as follows:
Additionally, all types when UNION-ed with strings will be promoted to strings. Other string conversions are handled by PromoteStrings.
Widening types might result in loss of precision in the following cases:
- IntegerType to FloatType
- LongType to FloatType
- LongType to DoubleType
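The precision loss comes from the floating-point mantissa being narrower than the integer: 24 bits for a 32-bit float and 53 bits for a 64-bit double. A small Python sketch of this, assuming IEEE 754 floats; `to_float32` is a hypothetical helper that round-trips a value through a 32-bit float using the standard `struct` module.

```python
import struct


def to_float32(n):
    """Round-trip a number through a 32-bit IEEE 754 float."""
    return struct.unpack("<f", struct.pack("<f", float(n)))[0]


# IntegerType to FloatType: 2**24 + 1 does not fit in a 24-bit mantissa.
int_value = 2**24 + 1
print(int_value, "->", int(to_float32(int_value)))

# LongType to DoubleType: 2**53 + 1 does not fit in a 53-bit mantissa.
long_value = 2**53 + 1
print(long_value, "->", int(float(long_value)))
```

In both cases the widened value is off by one from the original integer, even though the target type has a larger range.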
A collection of Rules that can be used to coerce differing types that participate in operations into compatible ones. Most of these rules are based on Hive semantics, but they do not introduce any dependencies on the Hive codebase. For this reason they remain in Catalyst until we have a more standard set of coercions.