Built In Types: Arrays
The preferred object types for providing container-like storage and operations are
vec
, dict
, and keyset
. These supersede the earlier Hack types
Vector, Map, and Set, and the
legacy array type inherited from PHP. Additional special types,
varray and darray, exist to provide a migration
path for code that currently relies on legacy arrays, and should not be used
otherwise.
Note that the plan is for both legacy arrays, and these migration types, to be eventually removed from the Hack language. We expect this to happen at some point during 2020.
Nomenclature
When talking about Container types, Hack programmers have names for different categories of types. They overlap and are confusing to the uninitiated. Here are some of the most common terms.
Arrays
: When not prefixed with any modifiers or other context usually refers to the value Containers. The value Containers are
Hack arrays
: This specifically refers to the last three (dict
, keyset
, and vec
).
Hack Collections
or Legacy Hack Collections
: This refers to the non-value type Containers from Hack's early life. The Hack Collection types are:
ConstMap
,MutableMap
,ImmMap
, andMap
ConstSet
,MutableSet
,ImmSet
, andSet
ConstVector
,MutableVector
,ImmVector
, andVector
Pair
The Const?
and Mutable?
are interfaces. The Imm?
implements Const
and the unprefixed version implements both.
Hack Containers
: This is a combination of the Hack arrays
and Hack collections
.
Value Containers
: See arrays.
vec
A vec (short for vector) is a data structure that contains a collection of zero or more elements whose values are each accessed through a
corresponding int
key. As the number of elements in a vec can change at runtime, a vec type declaration does not include an element count.
Consider the following class properties:
private vec<string> $colorsVec = vec["red", "white", "blue"];
private vec<num> $valuesVec = vec[123, 567.89];
$colorsVec
has an implicit key type of int
, and an explicit value type of string
. The vec is created and initialized using a vec-literal,
which has the general form vec[
...]
. There are three elements, and their corresponding keys are 0, 1, and 2. $valuesVec
has an implicit key
type of int
, and an explicit value type of num
. There are two elements (of type int
and float
, respectively), and their corresponding keys
are 0 and 1.
An existing element is accessed via the subscript operator, []
, and the value of an element can be changed; however, new values can only be added
to the end by using subscript []
, and key values always start at 0, and go up in increments of 1. Consider the following:
$colors = vec[]; // create an empty vec
\var_dump($colors);
$colors[] = "red"; // add element 0 with value "red"
$colors[] = "white"; // add element 1 with value "white"
$colors[] = "blue"; // add element 2 with value "blue"
\var_dump($colors);
$colors[0] = "pink"; // change element 0's value to "pink"
\var_dump($colors);
$colors = vec["green", "yellow"]; // create a vec of two elements
\var_dump($colors);
vec(0) {
}
vec(3) {
string(3) "red"
string(5) "white"
string(4) "blue"
}
vec(3) {
string(4) "pink"
string(5) "white"
string(4) "blue"
}
vec(2) {
string(5) "green"
string(6) "yellow"
}
The foreach
statement can be used to iterate over the elements in a vec, starting at the element with key 0. This statement provides a way to
access the key and value for each element.
Elements cannot be removed from a vec.
dict
While a vec always has a key type of int, a dict (short for dictionary) requires the key type to be specified. As the number of elements in a dict can change at runtime, a dict type declaration does not include an element count. Consider the following class properties:
private dict<int, string> $colorsDict = dict[1 => "white", 2 => "blue", 0 => "red"];
private dict<int, num> $valuesDict = dict[0 => 123, 1 => 567.89];
A dict has an explicit key type, as well as a value type. In this example, the dict is created and initialized using a dict-literal, which
has the general form dict[
...]
, in which explicit key values are required. Unlike a vec, key values in a dict need not start at 0, nor need
they be consecutive.
Note, carefully, that although $colorsVect
and $colorsDict
result in collections of int
/string
pairs, these variables do not have the
same (or even compatible) types! That said, there are library functions to convert between vec and dict objects.
Consider the following:
$colors = dict[]; // create an empty dict
\var_dump($colors);
$colors[4] = "black"; // create element 4 with value "black"
\var_dump($colors);
$colors[4] = "red"; // replace element 4's value with "red"
$colors[8] = "white"; // create element 8 with value "white"
$colors[-3] = "blue"; // create element -3 with value "blue"
\var_dump($colors);
$colors = dict[
-10 => "white",
12 => "blue",
0 => "red",
]; // create a dict with 3 elements
\var_dump($colors);
dict(0) {
}
dict(1) {
[4]=>
string(5) "black"
}
dict(3) {
[4]=>
string(3) "red"
[8]=>
string(5) "white"
[-3]=>
string(4) "blue"
}
dict(3) {
[-10]=>
string(5) "white"
[12]=>
string(4) "blue"
[0]=>
string(3) "red"
}
The key type can be string
; for example:
private dict<string, int> $fruitDict = dict['oranges' => 25, 'apples' => 12, 'pears' => 17];
The key type must be int
or string
. The arraykey
type, representing a
union of int
and string
, is also allowed, resulting in a dict that accepts
both int
and string
keys (but, barring some special cases such as migrating
from untyped code, it's usually better to be explicit and choose only one key
type).
A dict is represented as an ordered map in which each entry is a key/value pair that represents an element. Duplicate keys are not permitted. The order of the elements in the map is the order in which the elements were inserted into the dict. An element is said to exist once it has been inserted into the dict with a corresponding key. A dict is extended by initializing a previously non-existent element using a new key.
The foreach
statement can be used to iterate over the elements in a dict, in their order of insertion. This statement provides a way to access
the key and value for each element.
To remove an element from a dict, use unset
.
keyset
A keyset is a data structure that contains an ordered collection of zero or more elements whose values are the keys. And as array keys must have
type int
or string
, keysets are restricted to homogenous collections of
values of those types (the arraykey
type can be used to declare a keyset that
accepts both int
and string
values, but it's usually better to be explicit
and choose only one type). As the number of elements in a keyset
can change at runtime, a keyset type declaration does not include an element count. Consider the following class properties:
private keyset<string> $colorsSet = keyset["red", "white", "blue"];
private keyset<int> $valuesSet = keyset[22, 55, 77, 99];
private keyset<arraykey> $thingsSet = keyset["red", "white", 123];
$colorsSet
has a key and value type of string
. The keyset is created and initialized using a keyset-literal, which has the general
form keyset[
...]
. There are three elements, and their corresponding keys are their values. $valuesSet
has a key and value type of int
.
$thingsSet
has a key and value type of arraykey
, so it can contain both int
- and string
-typed elements.
An existing element is accessed via the subscript operator, []
; the value of an element cannot be changed. And new values can only be added
by using subscript []
. Consider the following:
$colors = keyset[]; // create an empty keyset
\var_dump($colors);
$colors[] = "red"; // add element with key/value "red"
$colors[] = "white"; // add element with key/value "white"
$colors[] = "blue"; // add element with key/value "blue"
\var_dump($colors);
$colors = keyset[
"green",
"yellow",
"green",
]; // create a keyset of two elements
\var_dump($colors);
echo "\$colors[\"green\"] = ".$colors["green"]."\n";
keyset(0) {
}
keyset(3) {
string(3) "red"
string(5) "white"
string(4) "blue"
}
keyset(2) {
string(5) "green"
string(6) "yellow"
}
$colors["green"] = green
Attempts to add duplicate keys are accepted but ignored.
The foreach
statement can be used to iterate over the elements in a keyset, in their order of insertion.
A keyset can also be constructed from any Traversable type, using the traversable constructor. For example:
$array = array('a', 'b', 'c');
$set = keyset($array); // constructs $set from an array
To check membership in a keyset, use C\contains
or C\contains_key
. To remove an element from a keyset, use unset
.
Similar to dict
, the order of the elements in a keyset is the order in which the elements were inserted.
Using dicts, keysets, and vecs
Hack arrays are generally manipulated using the C\
, Dict\
, Keyset\
and Vec\
functions from the Hack Standard Library; the most common operations are:
Operation | vec |
dict |
keyset |
---|---|---|---|
Initialize empty | $v = vec[]; |
$d = dict[]; |
$k = keyset[]; |
Literal | $v = vec[1, 2, 3]; |
$d = dict['foo' => 1]; |
$k = keyset['foo', 'bar']; |
From Another Container* | $v = vec($container); |
$d = dict($keyed_container); |
$k = keyset($container); |
Keys from Container* | $v = Vec\keys($container); |
N/A | $k = Keyset\keys($container); |
Add Elements | $v[] = 4; |
$d['baz'] = 2; |
$k[] = 'baz'; |
Bulk Add Elements | $v = Vec\concat($t1, $t2) |
$d = Dict\merge($kt1, $kt2) |
$k = Keyset\union($t1, $t2) |
Remove Elements | $v = Vec\concat(Vec\take($t1, $n), Vec\drop($t1, $n + 1)) |
unset($d['baz']); |
unset($k['baz']); |
Key Existence | C\contains_key($v, 1) |
C\contains_key($d, 'foo') |
C\contains_key($k, 'foo') |
Value Existence | C\contains($v, 3) |
C\contains($d, 2) |
N/A |
Get or default | idx($v, 0, 999) |
idx($d, 'foo', 'default') |
N/A |
Equality (Order-Dependent) | $v1 === $v2 |
$d1 === $d2 |
$k1 === $k2 |
Equality (Order-Independent) | N/A | Dict\equal($d1, $d2) |
Keyset\equal($k1, $k2) |
Count Elements (i.e., length, size of array) | C\count($v) |
C\count($d) |
C\count($k) |
Type Signature | vec<Tv> |
dict<Tk, Tv> |
keyset<Tk> |
Type Refinement | $v is vec<_> |
$d is dict<_, _> |
$k is keyset<_> |
Awaitable Consolidation |
Vec\from_async($v) Vec\gen($v) |
Dict\from_async($d) Dict\gen($d) |
Keyset\from_async($x) Keyset\gen($x) |
$container
can be a dict
, keyset
, vec
, or one of the legacy types of containers described below.
Legacy Vector, Map, and Set
These container types should be avoided in new code; use dict<Tk, Tv>
, keyset<T>
, and vec<T>
instead.
Early in Hack's life, the library provided mutable and immutable generic class types called: Vector
, ImmVector
, Map
, ImmMap
, Set
,
and ImmSet
. However, these have been replaced by vec
, dict
, and keyset
, whose use is recommended in all new code. Each generic type
had a corresponding literal form. For example, a variable of type Vector<int>
might be initialized using Vector {22, 33, $v}
, where $v
is a variable of type int
.
Vector
s, Map
s and Set
s are generally interacted with via instance methods on the classes.
Operation | Vector |
Map |
Set |
---|---|---|---|
Initialize empty | $v = Vector {}; |
$m = Map {}; |
$s = Set {}; |
Literal | $v = Vector {1, 2, 3}; |
$m = Map {'foo' => 1}; |
$s = Set {'foo', 'bar'}; |
Add Elements | $v[] = 4; |
$m['baz'] = 2; |
$s[] = 'baz'; |
PHP arrays: array, varray and darray
These container types should not be used in new code, and we aim to remove them from the language; use dict<Tk, Tv>
, keyset<T>
, and vec<T>
An array
is a legacy container type inherited from PHP; it can represent a keyed container (array<Tk, Tv>
like dict<Tk, Tv>
) or
an unkeyed container (array<Tv>
like vec<Tv>
) - or these behaviors can be mixed, or unspecified (array
without generics).
We aim to remove array
from the language by making it practical to replace array
s with either vec
or dict
; the varray
(vec-like array) and darray
(dict-like array) types exist to make this practical.
While migrating directly from array
, varray
, and darray
to vec
/dict
/keyset
may be desirable and practical in small projects, it is extremely difficult
to make this change safely in large legacy codebases, so we are instead adjusting the way the types work over time:
- identify which
array
s should probably bevec
s and which should bedict
s - mark these with the special types
varray<T>
(instead ofvec<T>
) anddarray<Tk, Tv>
(instead ofdict<Tk, Tv>
) - use runtime logging to locate when a
varray
is used like a keyed container and when adarray
is used like an unkeyed container (e.g. when appended to) - gradually make
varray
behave more likevec
anddarray
behave more likedict
- remove the
varray
type when it behaves identically tovec
, and remove thedarray
type when it behaves identically todict
Common operations include:
Operation | varray |
darray |
---|---|---|
Initialize empty | $v = varray[]; |
$m = darray[]; |
Literal | $v = varray[1, 2, 3]; |
$m = darray['foo' => 1]; |
Add Elements | $v[] = 4; |
$m['baz'] = 2; |
Remove elements | unset($v[$x]); (may convert to darray if not last element) |
unset($m['baz']) |
varray
and darray
also include additional legacy behaviors from PHP arrays that do not fit with the Hack type system; for example, in HHVM 4.36, invalid array keys can be used at runtime
and are silently converted to a valid arraykey
:
$x = darray[false => 123];
var_dump(array_keys($x)[0]);
// int(0), not `bool(false)`
Additionally, the varray_or_darray<T>
type is an alias of varray<T> | darray<_, T>
; on HHVM 4.36, array
is an alias of this type, but this will be removed in the near future. This type should
be avoided wherever possible.
Runtime options
By default, varray
and darray
are interchangeable at runtime; this can be changed, as can some legacy PHP array behaviors, depending on the HHVM version.
The available runtime options change frequently; to get an up-to-date list, search ini_get_all()
for settings beginning with hhvm.hack_arr
; in general:
- a value of
0
means no logging 1
means a warning or notice is raised2
means a recoverable error is raised
The hhvm.hack_arr_compat_notices
option must be set to true for any of the hhvm.hack_arr_
options to have an effect.
Individual runtime settings are documented here.
.hhconfig
options
disallow_array_literal=true
disallowsarray(...)
and[...]
, forcing all such literals to be converted to eithervarray[...]
ordarray[...]
. This option causes errors on HHVM versions prior to 4.25 due to variousarray
literals in built-in functions, so it's only useful with newer versions of HHVM.disallow_array_typehint=true
disallows usingarray
, with or without generic type parameters, in any type annotations (function parameters, return values, instance variables, etc.). Currently, this option also causes errors because of variousarray
typehints in built-in functions, which makes it hard to use, but will likely become more useful in the future, as these functions are migrated.
Converting to legacy collections and PHP arrays
Converting | To Vector |
To Map |
To Set |
To varray |
To darray |
---|---|---|---|---|---|
dict |
N/A | new Map($d) |
N/A | N/A | darray($d) |
dict keys |
Vector::fromKeysOf($d) |
N/A | Set::fromKeysOf($d) |
array_keys($d) PHP\array_keys($d) |
N/A |
dict values |
new Vector($d) |
N/A | new Set($d) |
varray($d) |
N/A |
vec |
new Vector($v) |
new Map($v) |
new Set($v) |
varray($v) |
darray($v) |
keyset |
new Vector($k) |
new Map($k) |
new Set($k) |
varray($k) |
darray($k) |
Value vs. reference semantics
An important distinction between arrays
and Collections
is their interation with value semantics.
All arrays
(recognizable by their lowercase names) have value semantics.
All Collections
(recognizable by their uppercase names) have reference semantics.
What are reference semantics (disambiguation)
Reference semantics
does NOT refer to the deprecated &$references
.
It addresses how an instance behaves when assigned / passed into or returned from functions.
An object has reference semantics.
If I pass you a Person
instance and you set the last name to "Doe"
, I'll be able to observe that change on my instance ($emma
).
<<__EntryPoint>>
function main(): void {
$john = new Person('John', 'Doe');
$emma = new Person('Emma', 'Smith');
echo Str\format(
"%s's last name was %s before she got married.\n",
$emma->firstName,
$emma->lastName,
);
marry($john, $emma);
echo Str\format(
"%s's last name became %s after she got married.\n",
$emma->firstName,
$emma->lastName,
);
}
function marry(Person $a, Person $b): void {
$b->lastName = $a->lastName;
}
class Person {
public function __construct(
public string $firstName,
public string $lastName,
) {}
}
Emma's last name was Smith before she got married.
Emma's last name became Doe after she got married.
However, if you were to do the same with an array type, you'd not observe the changes in your variable ($emma
).
<<__EntryPoint>>
function main(): void {
$john = make_person('John', 'Doe');
$emma = make_person('Emma', 'Smith');
echo Str\format(
"%s's last name was %s before she got married.\n",
$emma['first'],
$emma['last'],
);
marry($john, $emma);
echo Str\format(
"%s's last name is still %s after she got married.\n",
$emma['first'],
$emma['last'],
);
}
/**
* This function is now not doing anything,
* since the modifications are function local.
*/
function marry(dict<string, string> $a, dict<string, string> $b): void {
$b['last'] = $a['last'];
}
function make_person(
string $first_name,
string $last_name,
): dict<string, string> {
return dict[
'first' => $first_name,
'last' => $last_name,
];
}
Emma's last name was Smith before she got married.
Emma's last name is still Smith after she got married.
If you want to modify a value type parameter, see inout
.
This will copy out the changes made in marry()
when the function finishes.
Watch out! Collections have reference semantics!
<<__EntryPoint>>
function main(): void {
$john = make_person('John', 'Doe');
$emma = make_person('Emma', 'Smith');
echo Str\format(
"%s's last name was %s before she got married.\n",
$emma['first'],
$emma['last'],
);
marry($john, $emma);
echo Str\format(
"%s's last name is now %s after she got married.\n",
$emma['first'],
$emma['last'],
);
}
/**
* I modify $b and you'll be able to observe these changes in your $emma.
*/
function marry(Map<string, string> $a, Map<string, string> $b): void {
$b['last'] = $a['last'];
}
function make_person(
string $first_name,
string $last_name,
): Map<string, string> {
return Map {
'first' => $first_name,
'last' => $last_name,
};
}
Emma's last name was Smith before she got married.
Emma's last name is now Doe after she got married.
This difference makes it very difficult to migrate from Containers to arrays.
It is recommended to not share a mutable reference to a Collection accross a codebase.
If this is required for the correctness of your program, but you are migrating to Hack arrays, you can use HH\Lib\Ref<T>
.
Pass a Ref<dict<string, string>>
around your program instead of a Map<string, string>
.
This makes it obvious that you intend to mutate the dict.
When do two value Containers become separate things?
From the perspective of the programmer, as soon as you give the value a (new) name.
$emma = dict['first' => 'Emma', 'last' => 'Smith'];
$another_emma = $emma;
echo Str\format("$%s's last name is %s.\n", 'emma', $emma['last']);
$emma['last'] = 'Doe';
echo Str\format(
"$%s's last name has been changed to %s.\n",
'emma',
$emma['last'],
);
echo Str\format(
"$%s's last name has not been changed and is still %s.\n",
'another_emma',
$another_emma['last'],
);
$emma's last name is Smith.
$emma's last name has been changed to Doe.
$another_emma's last name has not been changed and is still Smith.
Within the assignment $another_emma = $emma
, a logical copy was made.
From the programmer's perspective, you now have two copies of the emma dict.
The hhvm engine is free to defer this copy at runtime until changes would be observable.
If you'd never change any elements of $emma
, $another_emma
could point to the same memory as $emma
.
This saves memory and time.
However, when you modify $emma
, $another_emma
MUST NOT reflect these changes.
This optimization is called Copy on Write or CoW
for short.
This is also why you'll sometimes hear that arrays have copy-on-write semantics.
This is however an implementation detail.
HHVM could just make a copy every time a logical copy is made and be correct.